Reinforcement Learning - 2025-06

Publish Date Title Authors PDF Translate Read Code
2025-06-30 Scaling Human Judgment in Community Notes with LLMs Haiwen Li et.al. 2506.24118 translate read null
2025-06-30 Constructing Non-Markovian Decision Process via History Aggregator Yongyi Wang et.al. 2506.24026 translate read null
2025-06-30 Provably Efficient and Agile Randomized Q-Learning He Wang et.al. 2506.24005 translate read null
2025-06-30 Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning Seungjun Yi et.al. 2506.23998 translate read null
2025-06-30 ADReFT: Adaptive Decision Repair for Safe Autonomous Driving via Reinforcement Fine-Tuning Mingfei Cheng et.al. 2506.23960 translate read null
2025-06-30 Reinforcement Learning for Synchronised Flow Control in a Dual-Gate Resin Infusion System Miguel Camacho-Sánchez et.al. 2506.23923 translate read null
2025-06-30 The Trilemma of Truth in Large Language Models Germans Savcisens et.al. 2506.23921 translate read link
2025-06-30 Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning Anton Andreychuk et.al. 2506.23793 translate read link
2025-06-27 MiCo: Multi-image Contrast for Reinforcement Visual Reasoning Xi Chen et.al. 2506.22434 translate read null
2025-06-27 ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks Pritam Dash et.al. 2506.22423 translate read null
2025-06-27 HyperCLOVA X THINK Technical Report NAVER Cloud HyperCLOVA X Team et.al. 2506.22403 translate read null
2025-06-27 Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL Tong Yang et.al. 2506.22401 translate read null
2025-06-27 Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation Tao Li et.al. 2506.22365 translate read null
2025-06-27 Education-Oriented Graph Retrieval-Augmented Generation for Learning Path Recommendation Xinghe Cheng et.al. 2506.22303 translate read null
2025-06-27 ReF-LLE: Personalized Low-Light Enhancement via Reference-Guided Deep Reinforcement Learning Ming Zhao et.al. 2506.22216 translate read null
2025-06-27 A Reinforcement Learning Framework for Some Singular Stochastic Control Problems Zongxia Liang et.al. 2506.22203 translate read null
2025-06-27 EFRame: Deeper Reasoning via Exploration-Filtering-Replay Reinforcement Learning Framework Chen Wang et.al. 2506.22200 translate read link
2025-06-27 ASVSim (AirSim for Surface Vehicles): A High-Fidelity Simulation Framework for Autonomous Surface Vehicle Research Bavo Lesy et.al. 2506.22174 translate read null
2025-06-26 Joint Scheduling of DER under Demand Charges: Structure and Approximation Ruixiao Yang et.al. 2506.21510 translate read null
2025-06-26 Bridging Offline and Online Reinforcement Learning for LLMs Jack Lanchantin et.al. 2506.21495 translate read null
2025-06-26 Reinforcement Learning for Optimal Control of Spin Magnetometers Logan W. Cooke et.al. 2506.21475 translate read null
2025-06-26 Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage Gavin Lee Goodship et.al. 2506.21465 translate read null
2025-06-26 Spatial Mental Modeling from Limited Views Baiqiao Yin et.al. 2506.21458 translate read null
2025-06-26 Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning Prajwal Koirala et.al. 2506.21427 translate read null
2025-06-26 rQdia: Regularizing Q-Value Distributions With Image Augmentation Sam Lerman et.al. 2506.21367 translate read null
2025-06-26 HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Qize Yang et.al. 2506.21277 translate read link
2025-06-26 World-aware Planning Narratives Enhance Large Vision-Language Model Planner Junhao Shi et.al. 2506.21230 translate read null
2025-06-26 Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design Hampus Gummesson Svensson et.al. 2506.21158 translate read null
2025-06-25 MMSearch-R1: Incentivizing LMMs to Search Jinming Wu et.al. 2506.20670 translate read link
2025-06-25 DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy Sungjae Park et.al. 2506.20668 translate read null
2025-06-25 The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind Andrei Lupu et.al. 2506.20664 translate read null
2025-06-25 DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation Shansan Gong et.al. 2506.20639 translate read link
2025-06-25 PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models Soufiane Hayou et.al. 2506.20629 translate read link
2025-06-25 Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control Andrew Mole et.al. 2506.20554 translate read null
2025-06-25 Demonstration of effective UCB-based routing in skill-based queues on real-world data Sanne van Kempen et.al. 2506.20543 translate read null
2025-06-25 Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards Charles Arnal et.al. 2506.20520 translate read null
2025-06-25 OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Zengzhi Wang et.al. 2506.20512 translate read link
2025-06-25 ReCode: Updating Code API Knowledge with Reinforcement Learning Haoze Wu et.al. 2506.20495 translate read link
2025-06-24 JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning Ai Han et.al. 2506.19846 translate read null
2025-06-24 Temporal-IRL: Modeling Port Congestion and Berth Scheduling with Inverse Reinforcement Learning Guo Li et.al. 2506.19843 translate read null
2025-06-24 Persona Features Control Emergent Misalignment Miles Wang et.al. 2506.19823 translate read null
2025-06-24 KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality Baochang Ren et.al. 2506.19807 translate read null
2025-06-24 Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning Menglong Zhang et.al. 2506.19785 translate read null
2025-06-24 SAGE: Strategy-Adaptive Generation Engine for Query Rewriting Teng Wang et.al. 2506.19783 translate read null
2025-06-24 Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment Yuhui Sun et.al. 2506.19780 translate read null
2025-06-24 SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning Yuqian Fu et.al. 2506.19767 translate read null
2025-06-24 Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks Nathan Maurer et.al. 2506.19703 translate read null
2025-06-24 From memories to maps: Mechanisms of in context reinforcement learning in transformers Ching Fang et.al. 2506.19686 translate read null
2025-06-23 ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Jiaru Zou et.al. 2506.18896 translate read null
2025-06-23 Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning Anthony Kobanda et.al. 2506.18847 translate read null
2025-06-23 LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Yuhao Wu et.al. 2506.18841 translate read null
2025-06-23 SViP: Sequencing Bimanual Visuomotor Policies with Object-Centric Motion Primitives Yizhou Chen et.al. 2506.18825 translate read null
2025-06-23 MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image Segmentation Ruicheng Zhang et.al. 2506.18679 translate read null
2025-06-23 Harnessing the Power of Reinforcement Learning for Language-Model-Based Information Retriever via Query-Document Co-Augmentation Jingming Liu et.al. 2506.18670 translate read null
2025-06-23 RL-Driven Semantic Compression Model Selection and Resource Allocation in Semantic Communication Systems Xinyi Lin et.al. 2506.18660 translate read null
2025-06-23 Dual-level Behavioral Consistency for Inter-group and Intra-group Coordination in Multi-Agent Systems Shuocun Yang et.al. 2506.18651 translate read null
2025-06-23 Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits Yannik Mahlau et.al. 2506.18627 translate read null
2025-06-23 Policy gradient methods for ordinal policies Simón Weinberger et.al. 2506.18614 translate read null
2025-06-20 No Free Lunch: Rethinking Internal Feedback for LLM Reasoning Yanzhi Zhang et.al. 2506.17219 translate read null
2025-06-20 Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Zeyuan Yang et.al. 2506.17218 translate read null
2025-06-20 BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning Xuechen Zhang et.al. 2506.17211 translate read null
2025-06-20 Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning Guozheng Ma et.al. 2506.17204 translate read null
2025-06-20 Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity Samin Yeasar Arnob et.al. 2506.17155 translate read null
2025-06-20 When Can Model-Free Reinforcement Learning be Enough for Thinking? Josiah P. Hanna et.al. 2506.17124 translate read null
2025-06-20 TransDreamerV3: Implanting Transformer In DreamerV3 Shruti Sadanand Dongare et.al. 2506.17103 translate read null
2025-06-20 Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs Ricardo Rei et.al. 2506.17080 translate read null
2025-06-20 Scalable and Reliable Multi-agent Reinforcement Learning for Traffic Assignment Leizhen Wang et.al. 2506.17029 translate read null
2025-06-20 Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators Marco Jiralerspong et.al. 2506.17007 translate read null
2025-06-18 Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards Qingming Liu et.al. 2506.15684 translate read null
2025-06-18 CC-LEARN: Cohort-based Consistency Learning Xiao Ye et.al. 2506.15662 translate read null
2025-06-18 CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization Ranting Hu et.al. 2506.15654 translate read null
2025-06-18 AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning Tevin Wang et.al. 2506.15651 translate read null
2025-06-18 Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement Weixiang Zhao et.al. 2506.15647 translate read null
2025-06-18 Learning to flock in open space by avoiding collisions and staying together Martino Brambati et.al. 2506.15587 translate read null
2025-06-18 Design of an all-facet illuminator for high NA EUV lithography exposure tool based on deep reinforcement learning Tong Li et.al. 2506.15558 translate read null
2025-06-18 Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning Roger Creus Castanyer et.al. 2506.15544 translate read link
2025-06-18 Lessons from Training Grounded LLMs with Verifiable Rewards Shang Hong Sim et.al. 2506.15522 translate read null
2025-06-18 Zero-Shot Reinforcement Learning Under Partial Observability Scott Jeen et.al. 2506.15446 translate read null
2025-06-17 Reasoning with Exploration: An Entropy Perspective Daixuan Cheng et.al. 2506.14758 translate read null
2025-06-17 Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation Carolina Higuera et.al. 2506.14754 translate read null
2025-06-17 Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs Ring Team et.al. 2506.14731 translate read null
2025-06-17 Adaptive Accompaniment with ReaLchords Yusong Wu et.al. 2506.14723 translate read null
2025-06-17 SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning Hexian Ni et.al. 2506.14648 translate read null
2025-06-17 On Quantum BSDE Solver for High-Dimensional Parabolic PDEs Howard Su et.al. 2506.14612 translate read null
2025-06-17 TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization Mingkang Zhu et.al. 2506.14574 translate read null
2025-06-17 Toward Safety-First Human-Like Decision Making for Autonomous Vehicles in Time-Varying Traffic Flow Xiao Wang et.al. 2506.14502 translate read null
2025-06-17 Zeroth-Order Optimization is Secretly Single-Step Policy Optimization Junbin Qiu et.al. 2506.14460 translate read null
2025-06-17 Toward Rich Video Human-Motion2D Generation Ruihao Xi et.al. 2506.14428 translate read null
2025-06-16 Touch begins where vision ends: Generalizable policies for contact-rich manipulation Zifan Zhao et.al. 2506.13762 translate read null
2025-06-16 MARCO: Hardware-Aware Neural Architecture Search for Edge Devices with Multi-Agent Reinforcement Learning and Conformal Prediction Filtering Arya Fayyazi et.al. 2506.13755 translate read null
2025-06-16 LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction Haoru Xue et.al. 2506.13751 translate read null
2025-06-16 PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning Brahim Driss et.al. 2506.13741 translate read null
2025-06-16 TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning Junru Zhang et.al. 2506.13705 translate read link
2025-06-16 Value-Free Policy Optimization via Reward Partitioning Bilal Faye et.al. 2506.13702 translate read null
2025-06-16 OneRec Technical Report Guorui Zhou et.al. 2506.13695 translate read null
2025-06-16 Meta-learning how to Share Credit among Macro-Actions Ionel-Alexandru Hosu et.al. 2506.13690 translate read null
2025-06-16 The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning Jiashun Liu et.al. 2506.13672 translate read null
2025-06-16 We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems Junfeng Fang et.al. 2506.13666 translate read null
2025-06-13 Schema-R1: A reasoning training approach for schema linking in Text-to-SQL Task Wuzhenghong Wen et.al. 2506.11986 translate read null
2025-06-13 Self-Regulating Cars: Automating Traffic Control in Free Flow Road Networks Ankit Bhardwaj et.al. 2506.11973 translate read null
2025-06-13 Visual Pre-Training on Unlabeled Images using Reinforcement Learning Dibya Ghosh et.al. 2506.11967 translate read null
2025-06-13 Automated Treatment Planning for Interstitial HDR Brachytherapy for Locally Advanced Cervical Cancer using Deep Reinforcement Learning Mohammadamin Moradi et.al. 2506.11957 translate read null
2025-06-13 SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies Nadun Ranawaka Arachchige et.al. 2506.11948 translate read null
2025-06-13 Breaking Habits: On the Role of the Advantage Function in Learning Causal State Representations Miguel Suau et.al. 2506.11912 translate read null
2025-06-13 Palpation Alters Auditory Pain Expressions with Gender-Specific Variations in Robopatients Chapa Sirithunge et.al. 2506.11906 translate read null
2025-06-13 TreeRL: LLM Reinforcement Learning with On-Policy Tree Search Zhenyu Hou et.al. 2506.11902 translate read link
2025-06-13 An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing Haochen Sun et.al. 2506.11882 translate read null
2025-06-13 LLM-based Dynamic Differential Testing for Database Connectors with Reinforcement Learning-Guided Prompt Selection Ce Lyu et.al. 2506.11870 translate read null
2025-06-12 Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop Justin Kerr et.al. 2506.10968 translate read null
2025-06-12 Spurious Rewards: Rethinking Training Signals in RLVR Rulin Shao et.al. 2506.10947 translate read link
2025-06-12 Self-Adapting Language Models Adam Zweiger et.al. 2506.10943 translate read null
2025-06-12 Magistral Mistral-AI et.al. 2506.10910 translate read null
2025-06-12 Adaptive Job Scheduling in Quantum Clouds Using Reinforcement Learning Waylon Luo et.al. 2506.10889 translate read null
2025-06-12 Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization Pierre-François Massiani et.al. 2506.10871 translate read null
2025-06-13 Joint Beamforming with Extremely Large Scale RIS: A Sequential Multi-Agent A2C Approach Zhi Chai et.al. 2506.10815 translate read null
2025-06-12 Human-Robot Navigation using Event-based Cameras and Reinforcement Learning Ignacio Bugueno-Cordova et.al. 2506.10790 translate read null
2025-06-12 PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework SiXiang Chen et.al. 2506.10741 translate read link
2025-06-12 Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs Yucong Luo et.al. 2506.10630 translate read null
2025-06-11 Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing Junfei Wu et.al. 2506.09965 translate read link
2025-06-11 VerIF: Verification Engineering for Reinforcement Learning in Instruction Following Hao Peng et.al. 2506.09942 translate read link
2025-06-11 The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability Jiachen Hu et.al. 2506.09940 translate read null
2025-06-11 From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models Irving Fang et.al. 2506.09930 translate read link
2025-06-11 “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) Noel Brindise et.al. 2506.09901 translate read null
2025-06-11 Hierarchical Learning-Enhanced MPC for Safe Crowd Navigation with Heterogeneous Constraints Huajian Liu et.al. 2506.09859 translate read null
2025-06-11 Foundation Model-Aided Deep Reinforcement Learning for RIS-Assisted Wireless Communication Mohammad Ghassemi et.al. 2506.09855 translate read null
2025-06-11 CoRT: Code-integrated Reasoning within Thinking Chengpeng Li et.al. 2506.09820 translate read link
2025-06-11 Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy Tonghe Wang et.al. 2506.09805 translate read null
2025-06-11 Reinforced Refinement with Self-Aware Expansion for End-to-End Autonomous Driving Haochen Liu et.al. 2506.09800 translate read null
2025-06-09 Play to Generalize: Learning to Reason Through Game Play Yunfei Xie et.al. 2506.08011 translate read link
2025-06-09 Reinforcement Pre-Training Qingxiu Dong et.al. 2506.08007 translate read null
2025-06-09 Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator Alberto Bazán-Guillén et.al. 2506.07980 translate read null
2025-06-09 Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction Junhong Shen et.al. 2506.07976 translate read link
2025-06-09 A Generative Physics-Informed Reinforcement Learning-Based Approach for Construction of Representative Drive Cycle Amirreza Yasami et.al. 2506.07929 translate read null
2025-06-09 LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement Dimitris Panagopoulos et.al. 2506.07915 translate read null
2025-06-09 WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning Jie Yang et.al. 2506.07905 translate read link
2025-06-09 MiniCPM4: Ultra-Efficient LLMs on End Devices MiniCPM Team et.al. 2506.07900 translate read link
2025-06-09 Diffusion-RL for Scalable Resource Allocation for 6G Networks Salar Nouri et.al. 2506.07880 translate read null
2025-06-09 Versatile Loco-Manipulation through Flexible Interlimb Coordination Xinghao Zhu et.al. 2506.07876 translate read null
2025-06-06 Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens Jihwan Jeong et.al. 2506.06261 translate read null
2025-06-06 How to craft a deep reinforcement learning policy for wind farm flow control Elie Kadoche et.al. 2506.06204 translate read null
2025-06-06 Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization Jonathan Yang et.al. 2506.06196 translate read null
2025-06-06 A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization Muhammed Ustaomeroglu et.al. 2506.06179 translate read null
2025-06-06 Reusing Trajectories in Policy Gradients Enables Fast Convergence Alessandro Montenegro et.al. 2506.06178 translate read null
2025-06-06 Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach James Ford et.al. 2506.06175 translate read null
2025-06-06 Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models Rihui Jin et.al. 2506.06137 translate read null
2025-06-06 Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library Weixun Wang et.al. 2506.06122 translate read link
2025-06-06 On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems Elim Kwan et.al. 2506.06094 translate read null
2025-06-06 Reinforcing Code Generation: Improving Text-to-SQL with Execution-Based Learning Atharv Kulkarni et.al. 2506.06093 translate read null
2025-06-05 ContentV: Efficient Training of Video Generation Models with Limited Compute Wenfeng Lin et.al. 2506.05343 translate read null
2025-06-05 AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs Lidong Lu et.al. 2506.05328 translate read link
2025-06-05 Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay Yifan Sun et.al. 2506.05316 translate read null
2025-06-05 Estimation of Treatment Effects Under Nonstationarity via Truncated Difference-in-Q’s Ramesh Johari et.al. 2506.05308 translate read null
2025-06-05 A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search Arnav Kumar Jain et.al. 2506.05294 translate read link
2025-06-06 Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning Violet Xiang et.al. 2506.05256 translate read null
2025-06-05 Towards Language-Augmented Multi-Agent Deep Reinforcement Learning Maxime Toquebiau et.al. 2506.05236 translate read null
2025-06-05 Optimal-PhiBE: A PDE-based Model-free framework for Continuous-time Reinforcement Learning Yuhua Zhu et.al. 2506.05208 translate read null
2025-06-05 TreeRPO: Tree Relative Policy Optimization Zhicheng Yang et.al. 2506.05183 translate read link
2025-06-05 Fabrica: Dual-Arm Assembly of General Multi-Part Objects via Integrated Planning and Learning Yunsheng Tian et.al. 2506.05168 translate read null
2025-06-04 Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Shuang Chen et.al. 2506.04207 translate read link
2025-06-04 MACS: Multi-Agent Reinforcement Learning for Optimization of Crystal Structures Elena Zamaraeva et.al. 2506.04195 translate read null
2025-06-04 R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning Qingfei Zhao et.al. 2506.04185 translate read link
2025-06-04 Horizon Reduction Makes RL Scalable Seohong Park et.al. 2506.04168 translate read null
2025-06-04 SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL Jiaheng Hu et.al. 2506.04147 translate read null
2025-06-04 Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning Muling Wu et.al. 2506.04065 translate read null
2025-06-04 Crowd-SFT: Crowdsourcing for LLM Alignment Alex Sotiropoulos et.al. 2506.04063 translate read null
2025-06-04 Autonomous Vehicle Lateral Control Using Deep Reinforcement Learning with MPC-PID Demonstration Chengdong Wu et.al. 2506.04040 translate read null
2025-06-04 Interpretability by Design for Efficient Multi-Objective Reinforcement Learning Qiyue Xia et.al. 2506.04022 translate read null
2025-06-04 Boosting Open-Source LLMs for Program Repair via Reasoning Transfer and LLM-Guided Reinforcement Learning Xunzhu Tang et.al. 2506.03921 translate read null
2025-06-03 Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning Yinjie Wang et.al. 2506.03136 translate read link
2025-06-03 AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation Prashanth Vijayaraghavan et.al. 2506.03122 translate read null
2025-06-03 Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback Xiaoying Zhang et.al. 2506.03106 translate read link
2025-06-03 EgoVLM: Policy Optimization for Egocentric Video Understanding Ashwin Vinod et.al. 2506.03097 translate read link
2025-06-03 DPO Learning with LLMs-Judge Signal for Computer Use Agents Man Luo et.al. 2506.03095 translate read null
2025-06-03 Provable Reinforcement Learning from Human Feedback with an Unknown Link Function Qining Zhang et.al. 2506.03066 translate read null
2025-06-03 EDEN: Entorhinal Driven Egocentric Navigation Toward Robotic Deployment Mikolaj Walczak et.al. 2506.03046 translate read null
2025-06-03 Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective Jintian Shao et.al. 2506.03038 translate read null
2025-06-03 MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver Yuepeng Zheng et.al. 2506.02935 translate read null
2025-06-03 Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning Yin Fang et.al. 2506.02911 translate read link
2025-06-03 Reinforcing Video Reasoning with Focused Thinking Jisheng Dang et.al. 2505.24718 translate read link
