Reinforcement Learning - 2025-10

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
| --- | --- | --- | --- | --- | --- | --- |
| 2025-10-31 | Reinforcement Learning for Resource Allocation in Vehicular Multi-Fog Computing | Mohammad Hadi Akbarzadeh et al. | 2511.00276 | translate | read | null |
| 2025-10-31 | Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning | Michiel Straat et al. | 2511.00272 | translate | read | null |
| 2025-10-31 | Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning | Marwa Abdulhai et al. | 2511.00222 | translate | read | null |
| 2025-10-31 | Iterative Foundation Model Fine-Tuning on Multiple Rewards | Pouya M. Ghari et al. | 2511.00220 | translate | read | null |
| 2025-10-31 | Deep reinforcement learning for optimal trading with partial information | Andrea Macrì et al. | 2511.00190 | translate | read | null |
| 2025-10-31 | Study on Supply Chain Finance Decision-Making Model and Enterprise Economic Performance Prediction Based on Deep Reinforcement Learning | Shiman Zhang et al. | 2511.00166 | translate | read | null |
| 2025-10-31 | EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations | Justin Yu et al. | 2511.00153 | translate | read | null |
| 2025-10-31 | A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control | Qing Guo et al. | 2511.00136 | translate | read | null |
| 2025-10-31 | DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads | Antonio Guillen-Perez et al. | 2511.00117 | translate | read | null |
| 2025-10-31 | LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers | Avisek Naug et al. | 2511.00116 | translate | read | null |
| 2025-10-31 | End-to-End Framework Integrating Generative AI and Deep Reinforcement Learning for Autonomous Ultrasound Scanning | Hanae Elmekki et al. | 2511.00114 | translate | read | null |
| 2025-10-30 | Real-DRL: Teach and Learn in Reality | Yanbing Mao et al. | 2511.00112 | translate | read | null |
| 2025-10-30 | Self-Improving Vision-Language-Action Models with Data Generation via Residual RL | Wenli Xiao et al. | 2511.00091 | translate | read | null |
| 2025-10-30 | Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail | NVIDIA et al. | 2511.00088 | translate | read | null |
| 2025-10-29 | Token-Regulated Group Relative Policy Optimization for Stable Reinforcement Learning in Large Language Models | Tue Le et al. | 2511.00066 | translate | read | null |
| 2025-10-31 | Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems | Alireza Saleh Abadi et al. | 2510.27659 | translate | read | null |
| 2025-10-31 | Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning | Yuhong Liu et al. | 2510.27606 | translate | read | link |
| 2025-10-31 | MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval | Qi Luo et al. | 2510.27569 | translate | read | null |
| 2025-10-31 | Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval | Yulong Hui et al. | 2510.27566 | translate | read | null |
| 2025-10-31 | VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision | Xuan Gong et al. | 2510.27462 | translate | read | null |
| 2025-10-31 | Learning Soft Robotic Dynamics with Active Exploration | Hehui Zheng et al. | 2510.27428 | translate | read | null |
| 2025-10-31 | DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains | Tian Liang et al. | 2510.27419 | translate | read | null |
| 2025-10-31 | Realistic pedestrian-driver interaction modelling using multi-agent RL with human perceptual-motor constraints | Yueyang Wang et al. | 2510.27383 | translate | read | null |
| 2025-10-31 | Reasoning Models Sometimes Output Illegible Chains of Thought | Arun Jose et al. | 2510.27338 | translate | read | null |
| 2025-10-31 | When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making | Ali Raza Jafree et al. | 2510.27334 | translate | read | null |
| 2025-10-31 | Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines | Kristina Levina et al. | 2510.27329 | translate | read | null |
| 2025-10-31 | A Digital Twin-based Multi-Agent Reinforcement Learning Framework for Vehicle-to-Grid Coordination | Zhengchang Hua et al. | 2510.27289 | translate | read | null |
| 2025-10-31 | Inferring trust in recommendation systems from brain, behavioural, and physiological data | Vincent K. M. Cheung et al. | 2510.27272 | translate | read | null |
| 2025-10-31 | MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models | Kangkun Mao et al. | 2510.27267 | translate | read | null |
| 2025-10-31 | GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation | Tao Liu et al. | 2510.27210 | translate | read | null |
| 2025-10-31 | ShapleyPipe: Hierarchical Shapley Search for Data Preparation Pipeline Construction | Jing Chang et al. | 2510.27168 | translate | read | null |
| 2025-10-31 | Disrupting Networks: Amplifying Social Dissensus via Opinion Perturbation and Large Language Models | Erica Coppolillo et al. | 2510.27152 | translate | read | null |
| 2025-10-31 | AURA: A Reinforcement Learning Framework for AI-Driven Adaptive Conversational Surveys | Jinwen Tang et al. | 2510.27126 | translate | read | null |
| 2025-10-31 | Towards Understanding Self-play for LLM Reasoning | Justin Yang Chae et al. | 2510.27072 | translate | read | null |
| 2025-10-31 | Distributed Precoding for Cell-free Massive MIMO in O-RAN: A Multi-agent Deep Reinforcement Learning Framework | Mohammad Hossein Shokouhi et al. | 2510.27069 | translate | read | null |
| 2025-10-31 | Adaptive Human-Computer Interaction Strategies Through Reinforcement Learning in Complex | Rui Liu et al. | 2510.27058 | translate | read | null |
| 2025-10-30 | SpikeATac: A Multimodal Tactile Finger with Taxelized Dynamic Sensing for Dexterous Manipulation | Eric T. Chang et al. | 2510.27048 | translate | read | null |
| 2025-10-30 | Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning | Md Tanvirul Alam et al. | 2510.27044 | translate | read | link |
| 2025-10-30 | e1: Learning Adaptive Control of Reasoning Effort | Michael Kleinman et al. | 2510.27042 | translate | read | null |
| 2025-10-30 | Algorithmic Predation: Equilibrium Analysis in Dynamic Oligopolies with Smooth Market Sharing | Fabian Raoul Pieroth et al. | 2510.27008 | translate | read | null |
| 2025-10-30 | A Framework for Fair Evaluation of Variance-Aware Bandit Algorithms | Elise Wolf et al. | 2510.27001 | translate | read | null |
| 2025-10-30 | Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench | Fenfen Lin et al. | 2510.26865 | translate | read | link |
| 2025-10-30 | Defeating the Training-Inference Mismatch via FP16 | Penghui Qi et al. | 2510.26788 | translate | read | link |
| 2025-10-30 | A General Incentives-Based Framework for Fairness in Multi-agent Resource Allocation | Ashwin Kumar et al. | 2510.26740 | translate | read | null |
| 2025-10-30 | Stabilizing Rayleigh-Benard convection with reinforcement learning trained on a reduced-order model | Qiwei Chen et al. | 2510.26705 | translate | read | null |
| 2025-10-30 | Kimi Linear: An Expressive, Efficient Attention Architecture | Kimi Team et al. | 2510.26692 | translate | read | link |
| 2025-10-30 | Action-Driven Processes for Continuous-Time Control | Ruimin He et al. | 2510.26672 | translate | read | null |
| 2025-10-30 | Hybrid Consistency Policy: Decoupling Multi-Modal Diversity and Real-Time Efficiency in Robotic Manipulation | Qianyou Zhao et al. | 2510.26670 | translate | read | null |
| 2025-10-30 | The Era of Agentic Organization: Learning to Organize with Language Models | Zewen Chi et al. | 2510.26658 | translate | read | null |
| 2025-10-30 | Hybrid DQN-TD3 Reinforcement Learning for Autonomous Navigation in Dynamic Environments | Xiaoyi He et al. | 2510.26646 | translate | read | null |
| 2025-10-30 | Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications | Chuang Zhang et al. | 2510.26628 | translate | read | null |
| 2025-10-30 | A DRL-Empowered Multi-Level Jamming Approach for Secure Semantic Communication | Weixuan Chen et al. | 2510.26610 | translate | read | null |
| 2025-10-30 | Emu3.5: Native Multimodal Models are World Learners | Yufeng Cui et al. | 2510.26583 | translate | read | link |
| 2025-10-30 | InfoFlow: Reinforcing Search Agent Via Reward Density Optimization | Kun Luo et al. | 2510.26575 | translate | read | null |
| 2025-10-30 | Adaptive Inverse Kinematics Framework for Learning Variable-Length Tool Manipulation in Robotics | Prathamesh Kothavale et al. | 2510.26551 | translate | read | null |
| 2025-10-30 | Think Outside the Policy: In-Context Steered Policy Optimization | Hsiu-Yuan Huang et al. | 2510.26519 | translate | read | null |
| 2025-10-30 | Data-Efficient RLVR via Off-Policy Influence Guidance | Erle Zhu et al. | 2510.26491 | translate | read | null |
| 2025-10-30 | ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems | Qiaoling Chen et al. | 2510.26475 | translate | read | null |
| 2025-10-30 | PolarZero: A Reinforcement Learning Approach for Low-Complexity Polarization Kernel Design | Yi-Ting Hong et al. | 2510.26452 | translate | read | null |
| 2025-10-30 | An Impulse Control Approach to Market Making in a Hawkes LOB Market | Konark Jain et al. | 2510.26438 | translate | read | null |
| 2025-10-30 | Human-in-the-loop Online Rejection Sampling for Robotic Manipulation | Guanxing Lu et al. | 2510.26406 | translate | read | null |
| 2025-10-30 | Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning | Wenchang Duan et al. | 2510.26389 | translate | read | null |
| 2025-10-30 | Towards Reinforcement Learning Based Log Loading Automation | Ilya Kurinov et al. | 2510.26363 | translate | read | null |
| 2025-10-30 | Reinforcement Learning for Pollution Detection in a Randomized, Sparse and Nonstationary Environment with an Autonomous Underwater Vehicle | Sebastian Zieglmeier et al. | 2510.26347 | translate | read | null |
| 2025-10-30 | Offline Clustering of Preference Learning with Active-data Augmentation | Jingyuan Liu et al. | 2510.26301 | translate | read | null |
| 2025-10-30 | Beyond Imitation: Constraint-Aware Trajectory Generation with Flow Matching For End-to-End Autonomous Driving | Lin Liu et al. | 2510.26292 | translate | read | null |
| 2025-10-30 | Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search | Guochang Li et al. | 2510.26287 | translate | read | null |
| 2025-10-30 | Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments | Gangyang Li et al. | 2510.26280 | translate | read | null |
| 2025-10-30 | Graph-Enhanced Policy Optimization in LLM Agent Training | Jiazhen Yuan et al. | 2510.26270 | translate | read | null |
| 2025-10-30 | A Game-Theoretic Spatio-Temporal Reinforcement Learning Framework for Collaborative Public Resource Allocation | Songxin Lei et al. | 2510.26184 | translate | read | null |
| 2025-10-30 | One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning | Renhao Li et al. | 2510.26167 | translate | read | null |
| 2025-10-30 | Learning to Manage Investment Portfolios beyond Simple Utility Functions | Maarten P. Scholl et al. | 2510.26165 | translate | read | null |
| 2025-10-30 | Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math | Bo Pang et al. | 2510.26143 | translate | read | null |
| 2025-10-30 | EgoExo-Con: Exploring View-Invariant Video Temporal Understanding | Minjoon Jung et al. | 2510.26113 | translate | read | null |
| 2025-10-30 | Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error | Chenming Tang et al. | 2510.26109 | translate | read | null |
| 2025-10-30 | GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks | Chenrui Shi et al. | 2510.26098 | translate | read | null |
| 2025-10-30 | Network-Constrained Policy Optimization for Adaptive Multi-agent Vehicle Routing | Fazel Arasteh et al. | 2510.26089 | translate | read | null |
| 2025-10-30 | Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion | Chi Zhang et al. | 2510.26067 | translate | read | null |
| 2025-10-30 | Accelerating Real-World Overtaking in F1TENTH Racing Employing Reinforcement Learning Methods | Emily Steiner et al. | 2510.26040 | translate | read | null |
| 2025-10-29 | Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation | Feichen Gan et al. | 2510.26026 | translate | read | null |
| 2025-10-29 | PORTool: Tool-Use LLM Training with Rewarded Tree | Feijie Wu et al. | 2510.26020 | translate | read | null |
| 2025-10-29 | Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning | Yihe Deng et al. | 2510.25992 | translate | read | null |
| 2025-10-29 | Estimating cognitive biases with attention-aware inverse planning | Sounak Banerjee et al. | 2510.25951 | translate | read | null |
| 2025-10-29 | InputDSA: Demixing then Comparing Recurrent and Externally Driven Dynamics | Ann Huang et al. | 2510.25943 | translate | read | null |
| 2025-10-29 | Multi-Agent Reinforcement Learning for Market Making: Competition without Collusion | Ziyi Wang et al. | 2510.25929 | translate | read | null |
| 2025-10-29 | $π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models | Kang Chen et al. | 2510.25889 | translate | read | null |
| 2025-10-29 | Approximating Human Preferences Using a Multi-Judge Learned System | Eitán Sprejer et al. | 2510.25884 | translate | read | null |
| 2025-10-29 | MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs | Xiaoke Huang et al. | 2510.25867 | translate | read | null |
| 2025-10-29 | Adversarial Pre-Padding: Generating Evasive Network Traffic Against Transformer-Based Classifiers | Quanliang Jing et al. | 2510.25810 | translate | read | null |
| 2025-10-29 | MetaLore: Learning to Orchestrate Communication and Computation for Metaverse Synchronization | Elif Ebru Ohri et al. | 2510.25705 | translate | read | null |
| 2025-10-29 | PairUni: Pairwise Training for Unified Multimodal Language Models | Jiani Zheng et al. | 2510.25682 | translate | read | null |
| 2025-10-29 | Navigation in a Three-Dimensional Urban Flow using Deep Reinforcement Learning | Federica Tonti et al. | 2510.25679 | translate | read | null |
| 2025-10-29 | ALDEN: Reinforcement Learning for Active Navigation and Evidence Gathering in Long Documents | Tianyu Yang et al. | 2510.25668 | translate | read | null |
| 2025-10-29 | Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills | Weikang Wan et al. | 2510.25634 | translate | read | null |
| 2025-10-29 | EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis | Yusheng Liao et al. | 2510.25628 | translate | read | null |
| 2025-10-29 | On the instability of local learning algorithms: Q-learning can fail in infinite state spaces | Urtzi Ayesta et al. | 2510.25572 | translate | read | null |
| 2025-10-29 | Deep Reinforcement Learning-Based Cooperative Rate Splitting for Satellite-to-Underground Communication Networks | Kaiqiang Lin et al. | 2510.25562 | translate | read | null |
| 2025-10-29 | Off-policy Reinforcement Learning with Model-based Exploration Augmentation | Likun Wang et al. | 2510.25529 | translate | read | null |
| 2025-10-29 | Zero Reinforcement Learning Towards General Domains | Yuyuan Zeng et al. | 2510.25528 | translate | read | null |
| 2025-10-29 | MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL | Zekun Xu et al. | 2510.25510 | translate | read | null |
| 2025-10-29 | Dynamic Beamforming and Power Allocation in ISAC via Deep Reinforcement Learning | Duc Nguyen Dao et al. | 2510.25496 | translate | read | null |
| 2025-10-29 | Reinforcement Learning techniques for the flavor problem in particle physics | A. Giarnetti et al. | 2510.25495 | translate | read | null |
| 2025-10-29 | Generalized Pseudo-Relevance Feedback | Yiteng Tu et al. | 2510.25488 | translate | read | null |
| 2025-10-29 | Sim-to-Real Gentle Manipulation of Deformable and Fragile Objects with Stress-Guided Reinforcement Learning | Kei Ikemura et al. | 2510.25405 | translate | read | null |
| 2025-10-29 | Model-Free Robust Beamforming in Satellite Downlink using Reinforcement Learning | Alea Schröder et al. | 2510.25393 | translate | read | null |
| 2025-10-29 | Multi-party Agent Relation Sampling for Multi-party Ad Hoc Teamwork | Beiwen Zhang et al. | 2510.25340 | translate | read | null |
| 2025-10-29 | GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning | Jiaqi Wu et al. | 2510.25320 | translate | read | null |
| 2025-10-29 | Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning | Sagalpreet Singh et al. | 2510.25311 | translate | read | null |
| 2025-10-29 | Adaptive Design of mmWave Initial Access Codebooks using Reinforcement Learning | Sabrine Aroua et al. | 2510.25271 | translate | read | null |
| 2025-10-29 | The influence of the random numbers quality on the results in stochastic simulations and machine learning | Benjamin A. Antunes et al. | 2510.25269 | translate | read | null |
| 2025-10-29 | SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation | Wang zhi et al. | 2510.25268 | translate | read | null |
| 2025-10-29 | One-shot Humanoid Whole-body Motion Learning | Hao Huang et al. | 2510.25241 | translate | read | null |
