Reinforcement Learning - 2025-05

| Publish Date | Title | Authors | PDF (arXiv) | Code |
|:---|:---|:---|:---|:---|
| 2025-05-30 | ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL | Yu Zhang et al. | 2505.24875 | null |
| 2025-05-30 | ProxyThinker: Test-Time Guidance through Small Visual Reasoners | Zilin Xiao et al. | 2505.24872 | null |
| 2025-05-30 | MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning | Yiqing Liang et al. | 2505.24871 | null |
| 2025-05-30 | ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | Mingjie Liu et al. | 2505.24864 | null |
| 2025-05-30 | MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning | Jingyan Shen et al. | 2505.24846 | null |
| 2025-05-30 | AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models | Conor Heins et al. | 2505.24784 | null |
| 2025-05-30 | Diffusion-Based Symbolic Regression | Zachary Bastiani et al. | 2505.24776 | null |
| 2025-05-30 | REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards | Zafir Stojanovski et al. | 2505.24760 | link |
| 2025-05-30 | Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning | Shelly Bensal et al. | 2505.24726 | null |
| 2025-05-29 | ZeroGUI: Automating Online GUI Learning at Zero Human Cost | Chenyu Yang et al. | 2505.23762 | link |
| 2025-05-29 | DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning | Ziyin Zhang et al. | 2505.23754 | link |
| 2025-05-29 | PixelThink: Towards Efficient Chain-of-Pixel Reasoning | Song Wang et al. | 2505.23727 | null |
| 2025-05-29 | ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering | Zexi Liu et al. | 2505.23723 | link |
| 2025-05-29 | AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning | Lucas N. Alegre et al. | 2505.23708 | null |
| 2025-05-29 | Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability | Ruida Wang et al. | 2505.23703 | null |
| 2025-05-29 | Grounded Reinforcement Learning for Visual Reasoning | Gabriel Sarch et al. | 2505.23678 | null |
| 2025-05-29 | Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models | Lang Cao et al. | 2505.23667 | null |
| 2025-05-29 | AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction | Niklas Freymuth et al. | 2505.23663 | link |
| 2025-05-29 | Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation | Hongxiang Zhang et al. | 2505.23657 | null |
| 2025-05-28 | Maximizing Confidence Alone Improves Reasoning | Mihir Prabhudesai et al. | 2505.22660 | null |
| 2025-05-28 | The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason | Ang Lv et al. | 2505.22653 | null |
| 2025-05-28 | WebDancer: Towards Autonomous Information Seeking Agency | Jialong Wu et al. | 2505.22648 | null |
| 2025-05-28 | FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control | Younggyo Seo et al. | 2505.22642 | null |
| 2025-05-28 | SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning | Yu Zhang et al. | 2505.22626 | null |
| 2025-05-28 | The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models | Ganqu Cui et al. | 2505.22617 | null |
| 2025-05-28 | HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym | Ngoc La et al. | 2505.22597 | null |
| 2025-05-28 | SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning | Jiaqi Huang et al. | 2505.22596 | null |
| 2025-05-28 | Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs | Changhao Song et al. | 2505.22548 | null |
| 2025-05-28 | Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation | Hongyi Zhou et al. | 2505.22492 | null |
| 2025-05-27 | Reinforcing General Reasoning without Verifiers | Xiangxin Zhou et al. | 2505.21493 | null |
| 2025-05-27 | Policy Optimized Text-to-Image Pipeline Design | Uri Gadot et al. | 2505.21478 | null |
| 2025-05-27 | Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO | Muzhi Zhu et al. | 2505.21457 | null |
| 2025-05-27 | Can Large Reasoning Models Self-Train? | Sheikh Shafayat et al. | 2505.21444 | null |
| 2025-05-27 | A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment | Brett Bissey et al. | 2505.21414 | null |
| 2025-05-27 | MRSD: Multi-Resolution Skill Discovery for HRL Agents | Shashank Sharma et al. | 2505.21410 | null |
| 2025-05-27 | Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features | Zixuan Xie et al. | 2505.21391 | null |
| 2025-05-27 | EgoWalk: A Multimodal Dataset for Robot Navigation in the Wild | Timur Akhtyamov et al. | 2505.21282 | null |
| 2025-05-27 | Data-Driven Cellular Mobility Management via Bayesian Optimization and Reinforcement Learning | Mohamed Benzaghta et al. | 2505.21249 | null |
| 2025-05-27 | Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies | Felix Chalumeau et al. | 2505.21236 | null |
| 2025-05-26 | FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities | Jin Wang et al. | 2505.20147 | null |
| 2025-05-26 | MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning | Yuanxin Zhuang et al. | 2505.20131 | null |
| 2025-05-26 | Proxy-Free GFlowNet | Ruishuo Chen et al. | 2505.20110 | null |
| 2025-05-26 | Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning | Ziyi Zhang et al. | 2505.20107 | null |
| 2025-05-26 | Adaptive Deep Reasoning: Triggering Deep Thinking When Needed | Yunhao Wang et al. | 2505.20101 | null |
| 2025-05-26 | SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale | Qi Li et al. | 2505.20094 | null |
| 2025-05-26 | Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback | Mengdi Li et al. | 2505.20075 | null |
| 2025-05-26 | Incentivizing Reasoning from Weak Supervision | Yige Yuan et al. | 2505.20072 | null |
| 2025-05-26 | SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety | Geon-Hyeong Kim et al. | 2505.20065 | null |
| 2025-05-26 | REARANK: Reasoning Re-ranking Agent via Reinforcement Learning | Le Zhang et al. | 2505.20046 | null |
| 2025-05-23 | One RL to See Them All: Visual Triple Unified Reinforcement Learning | Yan Ma et al. | 2505.18129 | null |
| 2025-05-23 | Reward Model Overoptimisation in Iterated RLHF | Lorenz Wolf et al. | 2505.18126 | null |
| 2025-05-23 | ProgRM: Build Better GUI Agents with Progress Rewards | Danyang Zhang et al. | 2505.18121 | null |
| 2025-05-23 | Bridging Supervised Learning and Reinforcement Learning in Math Reasoning | Huayu Chen et al. | 2505.18116 | null |
| 2025-05-23 | Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL | Joey Hong et al. | 2505.18098 | null |
| 2025-05-23 | Stable Reinforcement Learning for Efficient Reasoning | Muzhi Dai et al. | 2505.18086 | null |
| 2025-05-23 | What Do You Need for Diverse Trajectory Stitching in Diffusion Planning? | Quentin Clark et al. | 2505.18083 | null |
| 2025-05-23 | Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals | Jia-Nan Li et al. | 2505.18071 | null |
| 2025-05-23 | Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective | Jintian Shao et al. | 2505.17997 | null |
| 2025-05-23 | Outcome-based Reinforcement Learning to Predict the Future | Benjamin Turtel et al. | 2505.17989 | null |
| 2025-05-22 | GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | Chengqi Duan et al. | 2505.17022 | link |
| 2025-05-22 | SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward | Kaixuan Fan et al. | 2505.17018 | link |
| 2025-05-22 | Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO | Chengzhuo Tong et al. | 2505.17017 | link |
| 2025-05-22 | Interactive Post-Training for Vision-Language-Action Models | Shuhan Tan et al. | 2505.17016 | null |
| 2025-05-22 | R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | Huatong Song et al. | 2505.17005 | link |
| 2025-05-22 | $\text{R}^2\text{ec}$: Towards Large Recommender Models with Reasoning | Runyang You et al. | 2505.16994 | link |
| 2025-05-22 | SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | Yaxin Du et al. | 2505.16975 | link |
| 2025-05-22 | Risk-Averse Reinforcement Learning with Itakura-Saito Loss | Igor Udovichenko et al. | 2505.16925 | null |
| 2025-05-22 | LARES: Latent Reasoning for Sequential Recommendation | Enze Liu et al. | 2505.16865 | null |
| 2025-05-22 | Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | Wei Xiao et al. | 2505.16856 | null |
| 2025-05-21 | GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents | Yuqi Zhou et al. | 2505.15810 | link |
| 2025-05-21 | MMaDA: Multimodal Large Diffusion Language Models | Ling Yang et al. | 2505.15809 | link |
| 2025-05-21 | STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | Zongzhao Li et al. | 2505.15804 | null |
| 2025-05-21 | VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models | Yuchen Yan et al. | 2505.15801 | null |
| 2025-05-21 | Reverse Engineering Human Preferences with Reinforcement Learning | Lisa Alazraki et al. | 2505.15795 | null |
| 2025-05-21 | HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving | Zhiwen Chen et al. | 2505.15793 | null |
| 2025-05-21 | VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL | Fengyuan Dai et al. | 2505.15791 | null |
| 2025-05-21 | ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning | Changtai Zhu et al. | 2505.15776 | null |
| 2025-05-21 | Improving planning and MBRL with temporally-extended actions | Palash Chatterjee et al. | 2505.15754 | null |
| 2025-05-21 | UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning | Xiangyu Wang et al. | 2505.15725 | null |
| 2025-05-20 | Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning | Haolei Xu et al. | 2505.14684 | link |
| 2025-05-20 | Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning | Jiaer Xia et al. | 2505.14677 | link |
| 2025-05-20 | Reward Reasoning Model | Jiaxin Guo et al. | 2505.14674 | null |
| 2025-05-20 | General-Reasoner: Advancing LLM Reasoning Across All Domains | Xueguang Ma et al. | 2505.14652 | link |
| 2025-05-20 | Think Only When You Need with Large Hybrid-Reasoning Models | Lingjie Jiang et al. | 2505.14631 | null |
| 2025-05-20 | TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | Zhangchen Xu et al. | 2505.14625 | link |
| 2025-05-20 | Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning | Wenbin Hu et al. | 2505.14585 | null |
| 2025-05-20 | Performance Optimization of Energy-Harvesting Underlay Cognitive Radio Networks Using Reinforcement Learning | Deemah H. Tashman et al. | 2505.14581 | null |
| 2025-05-20 | KIPPO: Koopman-Inspired Proximal Policy Optimization | Andrei Cozma et al. | 2505.14566 | null |
| 2025-05-20 | Bellman operator convergence enhancements in reinforcement learning algorithms | David Krame Kadurha et al. | 2505.14564 | null |
| 2025-05-19 | Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards | Xiaoyuan Liu et al. | 2505.13445 | link |
| 2025-05-19 | Optimizing Anytime Reasoning via Budget Relative Policy Optimization | Penghui Qi et al. | 2505.13438 | link |
| 2025-05-19 | KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture | R. James Cotton et al. | 2505.13436 | null |
| 2025-05-19 | G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning | Liang Chen et al. | 2505.13426 | link |
| 2025-05-20 | A Dataless Reinforcement Learning Approach to Rounding Hyperplane Optimization for Max-Cut | Gabriel Malikal et al. | 2505.13405 | null |
| 2025-05-19 | Thinkless: LLM Learns When to Think | Gongfan Fang et al. | 2505.13379 | link |
| 2025-05-19 | Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning | Irene Brugnara et al. | 2505.13372 | null |
| 2025-05-19 | J4R: Learning to Judge with Equivalent Initial State Group Relative Preference Optimization | Austin Xu et al. | 2505.13346 | null |
| 2025-05-19 | Neural-Enhanced Rate Adaptation and Computation Distribution for Emerging mmWave Multi-User 3D Video Streaming Systems | Babak Badnava et al. | 2505.13337 | null |
| 2025-05-19 | CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning | Lei Sheng et al. | 2505.13271 | link |
| 2025-05-16 | SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics | Lizhi Yang et al. | 2505.11494 | null |
| 2025-05-16 | Improving Assembly Code Performance with Large Language Models via Reinforcement Learning | Anjiang Wei et al. | 2505.11480 | null |
| 2025-05-16 | Automatic Reward Shaping from Confounded Offline Data | Mingxuan Li et al. | 2505.11478 | null |
| 2025-05-16 | HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages | Zhilin Wang et al. | 2505.11475 | null |
| 2025-05-16 | Disentangling Reasoning and Knowledge in Medical Large Language Models | Rahul Thapa et al. | 2505.11462 | null |
| 2025-05-16 | Signal attenuation enables scalable decentralized multi-agent reinforcement learning over networks | Wesley A Suttle et al. | 2505.11461 | null |
| 2025-05-16 | Visual Planning: Let’s Think Only with Images | Yi Xu et al. | 2505.11409 | link |
| 2025-05-16 | Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Wenchuan Zhang et al. | 2505.11404 | link |
| 2025-05-16 | Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space | Ali Rabiee et al. | 2505.11366 | null |
| 2025-05-16 | Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics | Ardian Selmonaj et al. | 2505.11311 | null |
| 2025-05-15 | Beyond ‘Aha!’: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Zhiyuan Hu et al. | 2505.10554 | link |
| 2025-05-15 | Knowledge capture, adaptation and composition (KCAC): A framework for cross-task curriculum learning in robotic manipulation | Xinrui Wang et al. | 2505.10522 | null |
| 2025-05-15 | Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning | Andrea Baisero et al. | 2505.10484 | null |
| 2025-05-15 | Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps | Ningyuan Yang et al. | 2505.10482 | null |
| 2025-05-15 | Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | Zemin Huang et al. | 2505.10446 | null |
| 2025-05-15 | IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning | Dechen Gao et al. | 2505.10442 | null |
| 2025-05-15 | Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs | Jingyao Wang et al. | 2505.10425 | null |
| 2025-05-15 | Decomposed Inductive Procedure Learning: Learning Academic Tasks with Human-Like Data Efficiency | Daniel Weitekamp et al. | 2505.10422 | null |
| 2025-05-15 | Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change | Jonathan Clifford Balloch et al. | 2505.10330 | null |
| 2025-05-15 | J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning | Chenxi Whitehouse et al. | 2505.10320 | null |
| 2025-05-14 | DataMIL: Selecting Data for Robot Imitation Learning with Datamodels | Shivin Dass et al. | 2505.09603 | null |
| 2025-05-14 | Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware | Justin Yu et al. | 2505.09601 | link |
| 2025-05-14 | VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation | Chaofan Zhang et al. | 2505.09577 | null |
| 2025-05-14 | Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach | Shannon Lodoen et al. | 2505.09576 | null |
| 2025-05-14 | Learning Long-Context Diffusion Policies via Past-Token Prediction | Marcel Torne et al. | 2505.09561 | null |
| 2025-05-14 | WavReward: Spoken Dialogue Models With Generalist Reward Evaluators | Shengpeng Ji et al. | 2505.09558 | link |
| 2025-05-14 | Distilling Realizable Students from Unrealizable Teachers | Yujin Kim et al. | 2505.09546 | null |
| 2025-05-14 | Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data | Rui Miao et al. | 2505.09496 | null |
| 2025-05-14 | Preserving Plasticity in Continual Learning with Adaptive Linearity Injection | Seyed Roozbeh Razavi Rohani et al. | 2505.09486 | null |
| 2025-05-14 | Quantum state-agnostic work extraction (almost) without dissipation | Josep Lumbreras et al. | 2505.09456 | null |
| 2025-05-13 | Generative Molecular Design with Steerable and Granular Synthesizability Control | Jeff Guo et al. | 2505.08774 | null |
| 2025-05-13 | Preference Optimization for Combinatorial Optimization Problems | Mingjun Pan et al. | 2505.08735 | null |
| 2025-05-13 | A Study of Data-driven Methods for Inventory Optimization | Lee Yeung Ping et al. | 2505.08673 | null |
| 2025-05-13 | Credit Assignment and Efficient Exploration based on Influence Scope in Multi-agent Reinforcement Learning | Shuai Han et al. | 2505.08630 | null |
| 2025-05-13 | Cost Function Estimation Using Inverse Reinforcement Learning with Minimal Observations | Sarmad Mehrdad et al. | 2505.08619 | null |
| 2025-05-13 | OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | Zhaochen Su et al. | 2505.08617 | link |
| 2025-05-13 | Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection | Ayush K. Rai et al. | 2505.08561 | null |
| 2025-05-13 | Strategy-Augmented Planning for Large Language Models via Opponent Exploitation | Shuai Xu et al. | 2505.08459 | null |
| 2025-05-13 | Zero-Shot Sim-to-Real Reinforcement Learning for Fruit Harvesting | Emlyn Williams et al. | 2505.08458 | null |
| 2025-05-13 | Parameter Estimation using Reinforcement Learning Causal Curiosity: Limits and Challenges | Miguel Arana-Catania et al. | 2505.08453 | null |
| 2025-05-12 | DanceGRPO: Unleashing GRPO on Visual Generation | Zeyue Xue et al. | 2505.07818 | link |
| 2025-05-12 | A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values | Daniel Beechey et al. | 2505.07797 | link |
| 2025-05-12 | MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering | Rushi Qiang et al. | 2505.07782 | link |
| 2025-05-12 | Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Xinji Mai et al. | 2505.07773 | link |
| 2025-05-12 | Guiding Data Collection via Factored Scaling Curves | Lihan Zha et al. | 2505.07728 | link |
| 2025-05-12 | S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | Muzhi Dai et al. | 2505.07686 | null |
| 2025-05-12 | A comparative study of Bitcoin and Ripple cryptocurrencies trading using Deep Reinforcement Learning algorithms | Dieu-Donne Fangnon et al. | 2505.07660 | null |
| 2025-05-12 | MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining | Xiaomi LLM-Core Team et al. | 2505.07608 | link |
| 2025-05-12 | Multi-Objective Reinforcement Learning for Energy-Efficient Industrial Control | Georg Schäfer et al. | 2505.07607 | null |
| 2025-05-12 | Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent | Ziyang Huang et al. | 2505.07596 | link |
| 2025-05-09 | VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction | Noah Frahm et al. | 2505.06219 | null |
| 2025-05-09 | Let Humanoids Hike! Integrative Skill Development on Complex Trails | Kwan-Yee Lin et al. | 2505.06218 | null |
| 2025-05-09 | Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach | Tim Schneider et al. | 2505.06182 | null |
| 2025-05-09 | Interaction-Aware Parameter Privacy-Preserving Data Sharing in Coupled Systems via Particle Filter Reinforcement Learning | Haokun Yu et al. | 2505.06122 | null |
| 2025-05-09 | TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations | Shuaiyi Huang et al. | 2505.06079 | null |
| 2025-05-09 | Safe-EF: Error Feedback for Nonsmooth Constrained Optimization | Rustem Islamov et al. | 2505.06053 | null |
| 2025-05-09 | Efficient Information Updates in Compute-First Networking via Reinforcement Learning with Joint AoI and VoI | Jianpeng Qi et al. | 2505.06025 | null |
| 2025-05-09 | Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models | Lennart Stöpler et al. | 2505.05970 | null |
| 2025-05-09 | Offline Multi-agent Reinforcement Learning via Score Decomposition | Dan Qiao et al. | 2505.05968 | null |
| 2025-05-09 | Learning Power Control Protocol for In-Factory 6G Subnetworks | Uyoata E. Uyoata et al. | 2505.05967 | null |
| 2025-05-08 | Flow-GRPO: Training Flow Matching Models via Online RL | Jie Liu et al. | 2505.05470 | link |
| 2025-05-08 | RL-DAUNCE: Reinforcement Learning-Driven Data Assimilation with Uncertainty-Aware Constrained Ensembles | Pouria Behnoudfar et al. | 2505.05452 | null |
| 2025-05-08 | Reasoning Models Don’t Always Say What They Think | Yanda Chen et al. | 2505.05410 | null |
| 2025-05-08 | Repair Crew Routing for Infrastructure Network Restoration under Incomplete Information | Subhojit Biswas et al. | 2505.05297 | null |
| 2025-05-08 | Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation | Zechu Li et al. | 2505.05287 | null |
| 2025-05-08 | Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration | Andreas Kontogiannis et al. | 2505.05262 | null |
| 2025-05-08 | High Altitude Platform-Based Caching and Multicasting for Rural Connectivity | Yongqiang Zhang et al. | 2505.05251 | null |
| 2025-05-08 | Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation | Luca Marzari et al. | 2505.05235 | null |
| 2025-05-08 | Adaptive Biased User Scheduling for Heterogeneous Wireless Federate Learning Network | Changxiang Wu et al. | 2505.05231 | null |
| 2025-05-08 | Multi-Objective Reinforcement Learning for Adaptive Personalized Autonomous Driving | Hendrik Surmann et al. | 2505.05223 | null |
| 2025-05-07 | EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning | Zhenghao Xing et al. | 2505.04623 | link |
| 2025-05-07 | Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation | Abdulaziz Almuzairee et al. | 2505.04619 | null |
| 2025-05-07 | ZeroSearch: Incentivize the Search Capability of LLMs without Searching | Hao Sun et al. | 2505.04588 | link |
| 2025-05-07 | Active Sampling for MRI-based Sequential Decision Making | Yuning Du et al. | 2505.04586 | link |
| 2025-05-07 | Implicitly Aligning Humans and Autonomous Agents through Shared Task Abstractions | Stéphane Aroca-Ouellette et al. | 2505.04579 | null |
| 2025-05-07 | Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization | Wenjun Cao et al. | 2505.04578 | null |
| 2025-05-07 | Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions | Shanyu Han et al. | 2505.04553 | null |
| 2025-05-07 | A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance | Axel Friedrich Wolter et al. | 2505.04494 | null |
| 2025-05-07 | RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation | Jing Hu et al. | 2505.04424 | link |
| 2025-05-07 | A Heuristic-Integrated DRL Approach for Phase Optimization in Large-Scale RISs | Wei Wang et al. | 2505.04401 | null |
| 2025-05-06 | AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control | Jialong Li et al. | 2505.03738 | null |
| 2025-05-06 | Sustainable Smart Farm Networks: Enhancing Resilience and Efficiency with Decision Theory-Guided Deep Reinforcement Learning | Dian Chen et al. | 2505.03721 | null |
| 2025-05-06 | Actor-Critics Can Achieve Optimal Sample Efficiency | Kevin Tan et al. | 2505.03710 | null |
| 2025-05-06 | Policy Gradient Adaptive Control for the LQR: Indirect and Direct Approaches | Feiran Zhao et al. | 2505.03706 | null |
| 2025-05-06 | Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation | Songchen Fu et al. | 2505.03586 | null |
| 2025-05-06 | Ergodic Generative Flows | Leo Maxime Brunswic et al. | 2505.03561 | null |
| 2025-05-06 | Multi-Agent Reinforcement Learning Scheduling to Support Low Latency in Teleoperated Driving | Giacomo Avanzi et al. | 2505.03558 | null |
| 2025-05-06 | Small-Scale-Fading-Aware Resource Allocation in Wireless Federated Learning | Jiacheng Wang et al. | 2505.03533 | null |
| 2025-05-06 | The Steganographic Potentials of Language Models | Artem Karpov et al. | 2505.03439 | null |
| 2025-05-06 | Wasserstein Convergence of Score-based Generative Models under Semiconvexity and Discontinuous Gradients | Stefano Bruno et al. | 2505.03432 | null |
| 2025-05-05 | R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | Yi-Fan Zhang et al. | 2505.02835 | link |
| 2025-05-05 | TWIST: Teleoperated Whole-Body Imitation System | Yanjie Ze et al. | 2505.02833 | null |
| 2025-05-05 | Knowing You Don’t Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing | Diji Yang et al. | 2505.02811 | link |
| 2025-05-05 | Teaching the social media generation: rethinking learning without sacrificing quality | Sepinoud Azimi et al. | 2505.02770 | null |
| 2025-05-05 | The use of Artificial Intelligence for Intervention and Assessment in Individuals with ASD | Aggeliki Sideraki et al. | 2505.02747 | null |
| 2025-05-05 | Enhancing LLMs’ Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry | Junu Kim et al. | 2505.02722 | link |
| 2025-05-05 | Graph Neural Network-Based Reinforcement Learning for Controlling Biological Networks: The GATTACA Framework | Andrzej Mizera et al. | 2505.02712 | null |
| 2025-05-05 | Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models | Xiaobao Wu et al. | 2505.02686 | link |
| 2025-05-05 | Online Phase Estimation of Human Oscillatory Motions using Deep Learning | Antonio Grotta et al. | 2505.02668 | null |
| 2025-05-05 | A Survey on Progress in LLM Alignment from the Perspective of Reward Design | Miaomiao Ji et al. | 2505.02666 | null |
| 2025-05-02 | FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research | Yan Miao et al. | 2505.01383 | null |
| 2025-05-02 | Stabilizing Temporal Difference Learning via Implicit Stochastic Approximation | Hwanwoo Kim et al. | 2505.01361 | null |
| 2025-05-02 | Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story | Vincenzo De Paola et al. | 2505.01336 | null |
| 2025-05-02 | Integration of Multi-Mode Preference into Home Energy Management System Using Deep Reinforcement Learning | Mohammed Sumayli et al. | 2505.01332 | null |
| 2025-05-02 | Exploring Equity of Climate Policies using Multi-Agent Multi-Objective Reinforcement Learning | Palok Biswas et al. | 2505.01115 | null |
| 2025-05-02 | Multi-Objective Reinforcement Learning for Water Management | Zuzanna Osika et al. | 2505.01094 | null |
| 2025-05-02 | Llama-Nemotron: Efficient Reasoning Models | Akhiad Bercovich et al. | 2505.00949 | null |
| 2025-05-01 | Learning Neural Control Barrier Functions from Offline Data with Conservatism | Ihab Tabbara et al. | 2505.00908 | null |
| 2025-05-01 | SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation | Quang P. M. Pham et al. | 2505.00831 | null |
| 2025-05-01 | Constructing an Optimal Behavior Basis for the Option Keyboard | Lucas N. Alegre et al. | 2505.00787 | null |
| 2025-05-01 | T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT | Dongzhi Jiang et al. | 2505.00703 | link |
| 2025-05-01 | Multi-Constraint Safe Reinforcement Learning via Closed-form Solution for Log-Sum-Exp Approximation of Control Barrier Functions | Chenggang Wang et al. | 2505.00671 | null |
| 2025-05-01 | Deep Reinforcement Learning for Urban Air Quality Management: Multi-Objective Optimization of Pollution Mitigation Booth Placement in Metropolitan Environments | Kirtan Rajesh et al. | 2505.00668 | null |
| 2025-05-01 | Wasserstein Policy Optimization | David Pfau et al. | 2505.00663 | null |
| 2025-05-01 | DeepCritic: Deliberate Critique with Large Language Models | Wenkai Yang et al. | 2505.00662 | link |
| 2025-05-02 | 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models | Chong Zhang et al. | 2505.00551 | null |
| 2025-05-01 | Directly Forecasting Belief for Reinforcement Learning with Delays | Qingyuan Wu et al. | 2505.00546 | null |
| 2025-05-01 | Emergence of Roles in Robotic Teams with Model Sharing and Limited Communication | Ian O’Flynn et al. | 2505.00540 | null |
| 2025-05-01 | Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks | Xinyu Wang et al. | 2505.00530 | null |
| 2025-05-01 | DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation | Zixuan Chen et al. | 2505.00527 | null |
