Reinforcement Learning - 2025-07

Publish Date Title Authors PDF Translate Read Code
2025-07-29 Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics Leonard Hinckeldey et.al. 2507.21638 translate read null
2025-07-23 Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains Anisha Gunjal et.al. 2507.17746 translate read null
2025-07-23 Megrez2 Technical Report Boxun Li et.al. 2507.17728 translate read null
2025-07-23 How Should We Meta-Learn Reinforcement Learning Algorithms? Alexander David Goldie et.al. 2507.17668 translate read null
2025-07-23 CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning Lingxiao Tang et.al. 2507.17548 translate read null
2025-07-23 Generalized Advantage Estimation for Distributional Policy Gradients Shahil Shaik et.al. 2507.17530 translate read null
2025-07-23 Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice Shanbo Cheng et.al. 2507.17527 translate read null
2025-07-23 URPO: A Unified Reward & Policy Optimization Framework for Large Language Models Songshuo Lu et.al. 2507.17515 translate read null
2025-07-23 Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning Yu Li et.al. 2507.17512 translate read null
2025-07-23 ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents Chang Nie et.al. 2507.17462 translate read null
2025-07-23 Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning Situo Zhang et.al. 2507.17448 translate read null
2025-07-22 Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning Junhao Shen et.al. 2507.16814 translate read null
2025-07-22 Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty Mehul Damani et.al. 2507.16806 translate read null
2025-07-22 Uncertainty-Aware Knowledge Transformers for Peer-to-Peer Energy Trading with Multi-Agent Reinforcement Learning Mian Ibad Ali Shah et.al. 2507.16796 translate read null
2025-07-22 Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Ang Li et.al. 2507.16746 translate read link
2025-07-23 Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints Zhenyun Yin et.al. 2507.16727 translate read null
2025-07-22 Adaptive Inventory Strategies using Deep Reinforcement Learning for Dynamic Agri-Food Supply Chains Amandeep Kaur et.al. 2507.16670 translate read null
2025-07-22 FOGNITE: Federated Learning-Enhanced Fog-Cloud Architecture Somayeh Sobati-M et.al. 2507.16668 translate read null
2025-07-22 Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis Sara Giordano et.al. 2507.16641 translate read null
2025-07-22 Novel Multi-Agent Action Masked Deep Reinforcement Learning for General Industrial Assembly Lines Balancing Problems Ali Mohamed Ali et.al. 2507.16635 translate read null
2025-07-22 Step-Audio 2 Technical Report Boyong Wu et.al. 2507.16632 translate read link
2025-07-21 The Impact of Language Mixing on Bilingual LLM Reasoning Yihao Li et.al. 2507.15849 translate read null
2025-07-21 GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding Fei Tang et.al. 2507.15846 translate read link
2025-07-22 Hierarchical Budget Policy Optimization for Adaptive Reasoning Shangke Lyu et.al. 2507.15844 translate read link
2025-07-21 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra Seth Karten et.al. 2507.15815 translate read link
2025-07-21 Power-Constrained Policy Gradient Methods for LQR Ashwin Verma et.al. 2507.15806 translate read null
2025-07-21 Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning Sneheel Sarangi et.al. 2507.15788 translate read null
2025-07-21 Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR Jiakang Wang et.al. 2507.15778 translate read link
2025-07-21 LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization Xingyu Wu et.al. 2507.15758 translate read link
2025-07-21 EMP: Executable Motion Prior for Humanoid Robot Standing Upper-body Motion Imitation Haocheng Xu et.al. 2507.15649 translate read null
2025-07-21 Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training Kailai Yang et.al. 2507.15640 translate read null
2025-07-18 CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning Xiaoya Li et.al. 2507.14111 translate read link
2025-07-18 Preference-based Multi-Objective Reinforcement Learning Ni Mu et.al. 2507.14066 translate read null
2025-07-18 Reframing attention as a reinforcement learning problem for causal discovery Turan Orujlu et.al. 2507.13920 translate read null
2025-07-18 Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments Kathrin Korte et.al. 2507.13846 translate read null
2025-07-18 Scalable Submodular Policy Optimization via Pruned Submodularity Graph Aditi Anand et.al. 2507.13834 translate read null
2025-07-18 DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training Zhixin Wang et.al. 2507.13833 translate read null
2025-07-18 Efficient and Scalable Self-Healing Databases Using Meta-Learning and Dependency-Driven Recovery Joydeep Chandra et.al. 2507.13757 translate read null
2025-07-18 LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction Jing Chang et.al. 2507.13712 translate read null
2025-07-18 CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation Jing Chang et.al. 2507.13710 translate read null
2025-07-18 State Space Models Naturally Produce Traveling Waves, Time Cells, and Scale to Abstract Cognitive Functions Sen Lu et.al. 2507.13638 translate read null
2025-07-17 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Senqiao Yang et.al. 2507.13348 translate read link
2025-07-17 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Zhouqi Hua et.al. 2507.13332 translate read null
2025-07-17 Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour Emma M. A. Harrison et.al. 2507.13277 translate read null
2025-07-17 QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation Jiazheng Li et.al. 2507.13266 translate read null
2025-07-17 Signal Temporal Logic Compliant Co-design of Planning and Control Manas Sashank Juvvi et.al. 2507.13225 translate read null
2025-07-17 Spectral Bellman Method: Unifying Representation and Exploration in RL Ofir Nabati et.al. 2507.13181 translate read null
2025-07-17 Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback Suzie Kim et.al. 2507.13171 translate read null
2025-07-17 Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Hao Sun et.al. 2507.13158 translate read null
2025-07-17 From Roots to Rewards: Dynamic Tree Reasoning with RL Ahmed Bahloul et.al. 2507.13142 translate read null
2025-07-17 ZipMPC: Compressed Context-Dependent MPC Cost via Imitation Learning Rahel Rickenbach et.al. 2507.13088 translate read null
2025-07-16 EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos Ruihan Yang et.al. 2507.12440 translate read null
2025-07-16 Improving Reinforcement Learning Sample-Efficiency using Local Approximation Mohit Prashant et.al. 2507.12383 translate read null
2025-07-16 Thought Purity: Defense Paradigm For Chain-of-Thought Attack Zihao Xue et.al. 2507.12314 translate read null
2025-07-16 Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning Yuhao Chen et.al. 2507.12215 translate read null
2025-07-16 BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search Azhar Ikhtiarudin et.al. 2507.12189 translate read link
2025-07-17 Efficient Preparation of Fermionic Superfluids in an Optical Dipole Trap through Reinforcement Learning Yueyang Min et.al. 2507.12152 translate read null
2025-07-16 Topology Enhanced MARL for Multi-Vehicle Cooperative Decision-Making of CAVs Ye Han et.al. 2507.12110 translate read null
2025-07-16 Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics Muleilan Pei et.al. 2507.12083 translate read null
2025-07-16 Towards Ultra-Reliable 6G in-X Subnetworks: Dynamic Link Adaptation by Deep Reinforcement Learning Fateme Salehi et.al. 2507.12031 translate read null
2025-07-16 QAS-QTNs: Curriculum Reinforcement Learning-Driven Quantum Architecture Search for Quantum Tensor Networks Siddhant Dutta et.al. 2507.12013 translate read null
2025-07-15 Robot Drummer: Learning Rhythmic Skills for Humanoid Drumming Asad Ali Shahid et.al. 2507.11498 translate read null
2025-07-15 Exploring the robustness of TractOracle methods in RL-based tractography Jeremi Levesque et.al. 2507.11486 translate read null
2025-07-15 Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light Mani Hamidi et.al. 2507.11482 translate read null
2025-07-15 Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs Gabriel Bo et.al. 2507.11371 translate read null
2025-07-15 Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning Daniel Tanneberg et.al. 2507.11367 translate read null
2025-07-15 Sensing Accuracy Optimization for Multi-UAV SAR Interferometry with Data Offloading Mohamed-Amine Lahmeri et.al. 2507.11284 translate read null
2025-07-15 Ocean Diviner: A Diffusion-Augmented Reinforcement Learning for AUV Robust Control in the Underwater Tasks Weiyi Liu et.al. 2507.11283 translate read null
2025-07-15 Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound Tal Fiskus et.al. 2507.11269 translate read null
2025-07-15 Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction Deepak Kumar Panda et.al. 2507.11173 translate read null
2025-07-15 Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities Yiting Qu et.al. 2507.11155 translate read null
2025-07-14 EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Mingxian Lin et.al. 2507.10548 translate read link
2025-07-14 Disentangling Neural Disjunctive Normal Form Models Kexin Gu Baugh et.al. 2507.10546 translate read null
2025-07-14 Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Mingqi Wu et.al. 2507.10532 translate read link
2025-07-14 Some remarks on gradient dominance and LQR policy optimization Eduardo D. Sontag et.al. 2507.10452 translate read null
2025-07-14 Prompt Informed Reinforcement Learning for Visual Coverage Path Planning Venkat Margapuri et.al. 2507.10284 translate read null
2025-07-14 Cross-Timeslot Optimization for Distributed GPU Inference Using Reinforcement Learning Chengze Du et.al. 2507.10259 translate read null
2025-07-14 ToMacVF: Temporal Macro-action Value Factorization for Asynchronous Multi-Agent Reinforcement Learning Wenjing Zhang et.al. 2507.10251 translate read null
2025-07-14 Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning? Yumi Omori et.al. 2507.10174 translate read null
2025-07-14 Robust RL Control for Bipedal Locomotion with Closed Kinematic Chains Egor Maslennikov et.al. 2507.10164 translate read null
2025-07-14 Adaptability in Multi-Agent Reinforcement Learning: A Framework and Unified Review Siyi Hu et.al. 2507.10142 translate read null
2025-07-11 One Token to Fool LLM-as-a-Judge Yulai Zhao et.al. 2507.08794 translate read null
2025-07-11 Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning James McCarthy et.al. 2507.08793 translate read null
2025-07-11 Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data Jeonghye Kim et.al. 2507.08761 translate read null
2025-07-11 On the Effect of Regularization in Policy Mirror Descent Jan Felix Kleuker et.al. 2507.08718 translate read null
2025-07-11 SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations Peter Crowley et.al. 2507.08707 translate read null
2025-07-11 elsciRL: Integrating Language Solutions into Reinforcement Learning Problem Settings Philip Osborne et.al. 2507.08705 translate read null
2025-07-11 Multi-critic Learning for Whole-body End-effector Twist Tracking Aravind Elanjimattathil Vijayan et.al. 2507.08656 translate read null
2025-07-11 Safe Deep Reinforcement Learning for Resource Allocation with Peak Age of Information Violation Guarantees Berire Gunes Reyhan et.al. 2507.08653 translate read null
2025-07-11 Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning Xingguang Ji et.al. 2507.08649 translate read link
2025-07-11 Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data Parag Dutta et.al. 2507.08610 translate read null
2025-07-10 Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology Haochen Wang et.al. 2507.07999 translate read link
2025-07-10 Single-pass Adaptive Image Tokenization for Minimum Program Search Shivam Duggal et.al. 2507.07995 translate read null
2025-07-10 EXPO: Stable Reinforcement Learning with Expressive Policies Perry Dong et.al. 2507.07986 translate read null
2025-07-10 Reinforcement Learning with Action Chunking Qiyang Li et.al. 2507.07969 translate read null
2025-07-10 Scaling RL to Long Videos Yukang Chen et.al. 2507.07966 translate read link
2025-07-10 Excess Observables Reveal Nonreciprocity in Integrated Covariance Timur Aslyamov et.al. 2507.07876 translate read null
2025-07-10 “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents Giovanni Dispoto et.al. 2507.07848 translate read null
2025-07-10 Beyond Robustness: Learning Unknown Dynamic Load Adaptation for Quadruped Locomotion on Rough Terrain Leixin Chang et.al. 2507.07825 translate read null
2025-07-10 BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning Ruohong Liu et.al. 2507.07769 translate read null
2025-07-10 Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization Chengtao Jian et.al. 2507.07723 translate read null
2025-07-09 Graph-Based Complexity Metrics for Multi-Agent Curriculum Learning: A Validated Approach to Task Ordering in Cooperative Coordination Environments Farhaan Ebadulla et.al. 2507.07074 translate read null
2025-07-09 First Return, Entropy-Eliciting Explore Tianyu Zheng et.al. 2507.07017 translate read null
2025-07-09 Federated Learning-based MARL for Strengthening Physical-Layer Security in B5G Networks Deemah H. Tashman et.al. 2507.06997 translate read null
2025-07-09 Optimizing Cognitive Networks: Reinforcement Learning Meets Energy Harvesting Over Cascaded Channels Deemah H. Tashman et.al. 2507.06981 translate read null
2025-07-09 Bounomodes: the grazing ox algorithm for exploration of clustered anomalies Samuel Matloob et.al. 2507.06960 translate read null
2025-07-10 Rethinking Verification for LLM Code Generation: From Generation to Testing Zihan Ma et.al. 2507.06920 translate read link
2025-07-09 Designing Adaptive Algorithms Based on Reinforcement Learning for Dynamic Optimization of Sliding Window Size in Multi-Dimensional Data Streams Abolfazl Zarghani et.al. 2507.06901 translate read null
2025-07-09 Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model Jing Liang et.al. 2507.06892 translate read null
2025-07-09 Episodic Contextual Bandits with Knapsacks under Conversion Models Zitian Li et.al. 2507.06859 translate read null
2025-07-10 Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning Matej Straka et.al. 2507.06825 translate read link
2025-07-08 EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow Yixiang Chen et.al. 2507.06224 translate read null
2025-07-08 CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Zhongyuan Peng et.al. 2507.06181 translate read link
2025-07-08 Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model Koki Yamane et.al. 2507.06174 translate read null
2025-07-08 Learning Agile Tensile Perching for Aerial Robots from Demonstrations Kangle Yuan et.al. 2507.06172 translate read null
2025-07-08 Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation Mohamad H. Danesh et.al. 2507.06111 translate read null
2025-07-08 AI-Based Demand Forecasting and Load Balancing for Optimising Energy use in Healthcare Systems: A real case study Iman Rahimi et.al. 2507.06077 translate read null
2025-07-09 FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models Bo Pang et.al. 2507.06057 translate read null
2025-07-08 CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation Kushal Gajjar et.al. 2507.06013 translate read null
2025-07-08 From General Relation Patterns to Task-Specific Decision-Making in Continual Multi-Agent Coordination Chang Yao et.al. 2507.06004 translate read null
2025-07-08 BlueLM-2.5-3B Technical Report Baojiao Xiong et.al. 2507.05934 translate read null
2025-07-07 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Yana Wei et.al. 2507.05255 translate read link
2025-07-07 Action Space Reduction Strategies for Reinforcement Learning in Autonomous Driving Elahe Delavari et.al. 2507.05251 translate read null
2025-07-07 NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving Qucheng Peng et.al. 2507.05227 translate read null
2025-07-07 EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling Boyuan Wang et.al. 2507.05198 translate read null
2025-07-07 Sequential Attention-based Sampling for Histopathological Analysis Tarun G et.al. 2507.05077 translate read null
2025-07-07 Replacing thinking with tool usage enables reasoning in small language models Corrado Rainone et.al. 2507.05065 translate read null
2025-07-07 When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning Maxence Boels et.al. 2507.05011 translate read null
2025-07-07 Linking Homeostasis to Reinforcement Learning: Internal State Control of Motivated Behavior Naoto Yoshida et.al. 2507.04998 translate read null
2025-07-07 Object-centric Denoising Diffusion Models for Physical Reasoning Moritz Lange et.al. 2507.04920 translate read null
2025-07-07 Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning Sanyam Vyas et.al. 2507.04883 translate read null
2025-07-03 MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs Purbesh Mitra et.al. 2507.02851 translate read link
2025-07-03 StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason Kaiyi Zhang et.al. 2507.02841 translate read null
2025-07-03 ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning Ruiyang Zhou et.al. 2507.02834 translate read null
2025-07-03 Generalizing Verifiable Instruction Following Valentina Pyatkin et.al. 2507.02833 translate read null
2025-07-03 Multimodal Mathematical Reasoning with Diverse Solving Perspective Wenhao Shi et.al. 2507.02804 translate read null
2025-07-03 A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control Zilin Kang et.al. 2507.02712 translate read null
2025-07-03 Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions Thomas Hazenberg et.al. 2507.02698 translate read null
2025-07-03 RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes Jiaxing Wang et.al. 2507.02690 translate read null
2025-07-03 TUC-PPO: Team Utility-Constrained Proximal Policy Optimization for Spatial Public Goods Games Zhaoqilin Yang et.al. 2507.02675 translate read null
2025-07-03 On Efficient Bayesian Exploration in Model-Based Reinforcement Learning Alberto Caron et.al. 2507.02639 translate read null
2025-07-02 Kwai Keye-VL Technical Report Kwai Keye Team et.al. 2507.01949 translate read link
2025-07-02 NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks Yang Li et.al. 2507.01921 translate read null
2025-07-02 Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models Chengao Li et.al. 2507.01915 translate read null
2025-07-02 TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types Yuhao Lin et.al. 2507.01857 translate read null
2025-07-02 TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents Dmytro Kuzmenko et.al. 2507.01823 translate read null
2025-07-02 Quantum reinforcement learning in dynamic environments Oliver Sefrin et.al. 2507.01691 translate read null
2025-07-02 AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training Zhenyu Han et.al. 2507.01663 translate read null
2025-07-02 Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning Wu Fei et.al. 2507.01551 translate read null
2025-07-02 Chargax: A JAX Accelerated EV Charging Simulator Koen Ponse et.al. 2507.01522 translate read null
2025-07-02 Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning Yanfei Zhang et.al. 2507.01489 translate read null
2025-07-01 SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Bo Liu et.al. 2506.24119 translate read link
2025-07-01 Adapt Your Body: Mitigating Proprioception Shifts in Imitation Learning Fuhang Kuang et.al. 2506.23944 translate read null
