Reinforcement Learning - 2025-06
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-06-30 | Scaling Human Judgment in Community Notes with LLMs | Haiwen Li et.al. | 2506.24118 | translate | read | null |
| 2025-06-30 | Constructing Non-Markovian Decision Process via History Aggregator | Yongyi Wang et.al. | 2506.24026 | translate | read | null |
| 2025-06-30 | Provably Efficient and Agile Randomized Q-Learning | He Wang et.al. | 2506.24005 | translate | read | null |
| 2025-06-30 | Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning | Seungjun Yi et.al. | 2506.23998 | translate | read | null |
| 2025-06-30 | ADReFT: Adaptive Decision Repair for Safe Autonomous Driving via Reinforcement Fine-Tuning | Mingfei Cheng et.al. | 2506.23960 | translate | read | null |
| 2025-06-30 | Reinforcement Learning for Synchronised Flow Control in a Dual-Gate Resin Infusion System | Miguel Camacho-Sánchez et.al. | 2506.23923 | translate | read | null |
| 2025-06-30 | The Trilemma of Truth in Large Language Models | Germans Savcisens et.al. | 2506.23921 | translate | read | link |
| 2025-06-30 | Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning | Anton Andreychuk et.al. | 2506.23793 | translate | read | link |
| 2025-06-27 | MiCo: Multi-image Contrast for Reinforcement Visual Reasoning | Xi Chen et.al. | 2506.22434 | translate | read | null |
| 2025-06-27 | ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks | Pritam Dash et.al. | 2506.22423 | translate | read | null |
| 2025-06-27 | HyperCLOVA X THINK Technical Report | NAVER Cloud HyperCLOVA X Team et.al. | 2506.22403 | translate | read | null |
| 2025-06-27 | Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL | Tong Yang et.al. | 2506.22401 | translate | read | null |
| 2025-06-27 | Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation | Tao Li et.al. | 2506.22365 | translate | read | null |
| 2025-06-27 | Education-Oriented Graph Retrieval-Augmented Generation for Learning Path Recommendation | Xinghe Cheng et.al. | 2506.22303 | translate | read | null |
| 2025-06-27 | ReF-LLE: Personalized Low-Light Enhancement via Reference-Guided Deep Reinforcement Learning | Ming Zhao et.al. | 2506.22216 | translate | read | null |
| 2025-06-27 | A Reinforcement Learning Framework for Some Singular Stochastic Control Problems | Zongxia Liang et.al. | 2506.22203 | translate | read | null |
| 2025-06-27 | EFRame: Deeper Reasoning via Exploration-Filtering-Replay Reinforcement Learning Framework | Chen Wang et.al. | 2506.22200 | translate | read | link |
| 2025-06-27 | ASVSim (AirSim for Surface Vehicles): A High-Fidelity Simulation Framework for Autonomous Surface Vehicle Research | Bavo Lesy et.al. | 2506.22174 | translate | read | null |
| 2025-06-26 | Joint Scheduling of DER under Demand Charges: Structure and Approximation | Ruixiao Yang et.al. | 2506.21510 | translate | read | null |
| 2025-06-26 | Bridging Offline and Online Reinforcement Learning for LLMs | Jack Lanchantin et.al. | 2506.21495 | translate | read | null |
| 2025-06-26 | Reinforcement Learning for Optimal Control of Spin Magnetometers | Logan W. Cooke et.al. | 2506.21475 | translate | read | null |
| 2025-06-26 | Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage | Gavin Lee Goodship et.al. | 2506.21465 | translate | read | null |
| 2025-06-26 | Spatial Mental Modeling from Limited Views | Baiqiao Yin et.al. | 2506.21458 | translate | read | null |
| 2025-06-26 | Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | Prajwal Koirala et.al. | 2506.21427 | translate | read | null |
| 2025-06-26 | rQdia: Regularizing Q-Value Distributions With Image Augmentation | Sam Lerman et.al. | 2506.21367 | translate | read | null |
| 2025-06-26 | HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Qize Yang et.al. | 2506.21277 | translate | read | link |
| 2025-06-26 | World-aware Planning Narratives Enhance Large Vision-Language Model Planner | Junhao Shi et.al. | 2506.21230 | translate | read | null |
| 2025-06-26 | Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design | Hampus Gummesson Svensson et.al. | 2506.21158 | translate | read | null |
| 2025-06-25 | MMSearch-R1: Incentivizing LMMs to Search | Jinming Wu et.al. | 2506.20670 | translate | read | link |
| 2025-06-25 | DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy | Sungjae Park et.al. | 2506.20668 | translate | read | null |
| 2025-06-25 | The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind | Andrei Lupu et.al. | 2506.20664 | translate | read | null |
| 2025-06-25 | DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | Shansan Gong et.al. | 2506.20639 | translate | read | link |
| 2025-06-25 | PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models | Soufiane Hayou et.al. | 2506.20629 | translate | read | link |
| 2025-06-25 | Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control | Andrew Mole et.al. | 2506.20554 | translate | read | null |
| 2025-06-25 | Demonstration of effective UCB-based routing in skill-based queues on real-world data | Sanne van Kempen et.al. | 2506.20543 | translate | read | null |
| 2025-06-25 | Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards | Charles Arnal et.al. | 2506.20520 | translate | read | null |
| 2025-06-25 | OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Zengzhi Wang et.al. | 2506.20512 | translate | read | link |
| 2025-06-25 | ReCode: Updating Code API Knowledge with Reinforcement Learning | Haoze Wu et.al. | 2506.20495 | translate | read | link |
| 2025-06-24 | JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning | Ai Han et.al. | 2506.19846 | translate | read | null |
| 2025-06-24 | Temporal-IRL: Modeling Port Congestion and Berth Scheduling with Inverse Reinforcement Learning | Guo Li et.al. | 2506.19843 | translate | read | null |
| 2025-06-24 | Persona Features Control Emergent Misalignment | Miles Wang et.al. | 2506.19823 | translate | read | null |
| 2025-06-24 | KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Baochang Ren et.al. | 2506.19807 | translate | read | null |
| 2025-06-24 | Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning | Menglong Zhang et.al. | 2506.19785 | translate | read | null |
| 2025-06-24 | SAGE: Strategy-Adaptive Generation Engine for Query Rewriting | Teng Wang et.al. | 2506.19783 | translate | read | null |
| 2025-06-24 | Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment | Yuhui Sun et.al. | 2506.19780 | translate | read | null |
| 2025-06-24 | SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning | Yuqian Fu et.al. | 2506.19767 | translate | read | null |
| 2025-06-24 | Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks | Nathan Maurer et.al. | 2506.19703 | translate | read | null |
| 2025-06-24 | From memories to maps: Mechanisms of in context reinforcement learning in transformers | Ching Fang et.al. | 2506.19686 | translate | read | null |
| 2025-06-23 | ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Jiaru Zou et.al. | 2506.18896 | translate | read | null |
| 2025-06-23 | Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning | Anthony Kobanda et.al. | 2506.18847 | translate | read | null |
| 2025-06-23 | LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning | Yuhao Wu et.al. | 2506.18841 | translate | read | null |
| 2025-06-23 | SViP: Sequencing Bimanual Visuomotor Policies with Object-Centric Motion Primitives | Yizhou Chen et.al. | 2506.18825 | translate | read | null |
| 2025-06-23 | MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image Segmentation | Ruicheng Zhang et.al. | 2506.18679 | translate | read | null |
| 2025-06-23 | Harnessing the Power of Reinforcement Learning for Language-Model-Based Information Retriever via Query-Document Co-Augmentation | Jingming Liu et.al. | 2506.18670 | translate | read | null |
| 2025-06-23 | RL-Driven Semantic Compression Model Selection and Resource Allocation in Semantic Communication Systems | Xinyi Lin et.al. | 2506.18660 | translate | read | null |
| 2025-06-23 | Dual-level Behavioral Consistency for Inter-group and Intra-group Coordination in Multi-Agent Systems | Shuocun Yang et.al. | 2506.18651 | translate | read | null |
| 2025-06-23 | Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits | Yannik Mahlau et.al. | 2506.18627 | translate | read | null |
| 2025-06-23 | Policy gradient methods for ordinal policies | Simón Weinberger et.al. | 2506.18614 | translate | read | null |
| 2025-06-20 | No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | Yanzhi Zhang et.al. | 2506.17219 | translate | read | null |
| 2025-06-20 | Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | Zeyuan Yang et.al. | 2506.17218 | translate | read | null |
| 2025-06-20 | BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning | Xuechen Zhang et.al. | 2506.17211 | translate | read | null |
| 2025-06-20 | Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning | Guozheng Ma et.al. | 2506.17204 | translate | read | null |
| 2025-06-20 | Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Samin Yeasar Arnob et.al. | 2506.17155 | translate | read | null |
| 2025-06-20 | When Can Model-Free Reinforcement Learning be Enough for Thinking? | Josiah P. Hanna et.al. | 2506.17124 | translate | read | null |
| 2025-06-20 | TransDreamerV3: Implanting Transformer In DreamerV3 | Shruti Sadanand Dongare et.al. | 2506.17103 | translate | read | null |
| 2025-06-20 | Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs | Ricardo Rei et.al. | 2506.17080 | translate | read | null |
| 2025-06-20 | Scalable and Reliable Multi-agent Reinforcement Learning for Traffic Assignment | Leizhen Wang et.al. | 2506.17029 | translate | read | null |
| 2025-06-20 | Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators | Marco Jiralerspong et.al. | 2506.17007 | translate | read | null |
| 2025-06-18 | Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards | Qingming Liu et.al. | 2506.15684 | translate | read | null |
| 2025-06-18 | CC-LEARN: Cohort-based Consistency Learning | Xiao Ye et.al. | 2506.15662 | translate | read | null |
| 2025-06-18 | CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Ranting Hu et.al. | 2506.15654 | translate | read | null |
| 2025-06-18 | AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning | Tevin Wang et.al. | 2506.15651 | translate | read | null |
| 2025-06-18 | Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement | Weixiang Zhao et.al. | 2506.15647 | translate | read | null |
| 2025-06-18 | Learning to flock in open space by avoiding collisions and staying together | Martino Brambati et.al. | 2506.15587 | translate | read | null |
| 2025-06-18 | Design of an all-facet illuminator for high NA EUV lithography exposure tool based on deep reinforcement learning | Tong Li et.al. | 2506.15558 | translate | read | null |
| 2025-06-18 | Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning | Roger Creus Castanyer et.al. | 2506.15544 | translate | read | link |
| 2025-06-18 | Lessons from Training Grounded LLMs with Verifiable Rewards | Shang Hong Sim et.al. | 2506.15522 | translate | read | null |
| 2025-06-18 | Zero-Shot Reinforcement Learning Under Partial Observability | Scott Jeen et.al. | 2506.15446 | translate | read | null |
| 2025-06-17 | Reasoning with Exploration: An Entropy Perspective | Daixuan Cheng et.al. | 2506.14758 | translate | read | null |
| 2025-06-17 | Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation | Carolina Higuera et.al. | 2506.14754 | translate | read | null |
| 2025-06-17 | Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Ring Team et.al. | 2506.14731 | translate | read | null |
| 2025-06-17 | Adaptive Accompaniment with ReaLchords | Yusong Wu et.al. | 2506.14723 | translate | read | null |
| 2025-06-17 | SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning | Hexian Ni et.al. | 2506.14648 | translate | read | null |
| 2025-06-17 | On Quantum BSDE Solver for High-Dimensional Parabolic PDEs | Howard Su et.al. | 2506.14612 | translate | read | null |
| 2025-06-17 | TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization | Mingkang Zhu et.al. | 2506.14574 | translate | read | null |
| 2025-06-17 | Toward Safety-First Human-Like Decision Making for Autonomous Vehicles in Time-Varying Traffic Flow | Xiao Wang et.al. | 2506.14502 | translate | read | null |
| 2025-06-17 | Zeroth-Order Optimization is Secretly Single-Step Policy Optimization | Junbin Qiu et.al. | 2506.14460 | translate | read | null |
| 2025-06-17 | Toward Rich Video Human-Motion2D Generation | Ruihao Xi et.al. | 2506.14428 | translate | read | null |
| 2025-06-16 | Touch begins where vision ends: Generalizable policies for contact-rich manipulation | Zifan Zhao et.al. | 2506.13762 | translate | read | null |
| 2025-06-16 | MARCO: Hardware-Aware Neural Architecture Search for Edge Devices with Multi-Agent Reinforcement Learning and Conformal Prediction Filtering | Arya Fayyazi et.al. | 2506.13755 | translate | read | null |
| 2025-06-16 | LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction | Haoru Xue et.al. | 2506.13751 | translate | read | null |
| 2025-06-16 | PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning | Brahim Driss et.al. | 2506.13741 | translate | read | null |
| 2025-06-16 | TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning | Junru Zhang et.al. | 2506.13705 | translate | read | link |
| 2025-06-16 | Value-Free Policy Optimization via Reward Partitioning | Bilal Faye et.al. | 2506.13702 | translate | read | null |
| 2025-06-16 | OneRec Technical Report | Guorui Zhou et.al. | 2506.13695 | translate | read | null |
| 2025-06-16 | Meta-learning how to Share Credit among Macro-Actions | Ionel-Alexandru Hosu et.al. | 2506.13690 | translate | read | null |
| 2025-06-16 | The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning | Jiashun Liu et.al. | 2506.13672 | translate | read | null |
| 2025-06-16 | We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems | Junfeng Fang et.al. | 2506.13666 | translate | read | null |
| 2025-06-13 | Schema-R1: A reasoning training approach for schema linking in Text-to-SQL Task | Wuzhenghong Wen et.al. | 2506.11986 | translate | read | null |
| 2025-06-13 | Self-Regulating Cars: Automating Traffic Control in Free Flow Road Networks | Ankit Bhardwaj et.al. | 2506.11973 | translate | read | null |
| 2025-06-13 | Visual Pre-Training on Unlabeled Images using Reinforcement Learning | Dibya Ghosh et.al. | 2506.11967 | translate | read | null |
| 2025-06-13 | Automated Treatment Planning for Interstitial HDR Brachytherapy for Locally Advanced Cervical Cancer using Deep Reinforcement Learning | Mohammadamin Moradi et.al. | 2506.11957 | translate | read | null |
| 2025-06-13 | SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies | Nadun Ranawaka Arachchige et.al. | 2506.11948 | translate | read | null |
| 2025-06-13 | Breaking Habits: On the Role of the Advantage Function in Learning Causal State Representations | Miguel Suau et.al. | 2506.11912 | translate | read | null |
| 2025-06-13 | Palpation Alters Auditory Pain Expressions with Gender-Specific Variations in Robopatients | Chapa Sirithunge et.al. | 2506.11906 | translate | read | null |
| 2025-06-13 | TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Zhenyu Hou et.al. | 2506.11902 | translate | read | link |
| 2025-06-13 | An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing | Haochen Sun et.al. | 2506.11882 | translate | read | null |
| 2025-06-13 | LLM-based Dynamic Differential Testing for Database Connectors with Reinforcement Learning-Guided Prompt Selection | Ce Lyu et.al. | 2506.11870 | translate | read | null |
| 2025-06-12 | Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop | Justin Kerr et.al. | 2506.10968 | translate | read | null |
| 2025-06-12 | Spurious Rewards: Rethinking Training Signals in RLVR | Rulin Shao et.al. | 2506.10947 | translate | read | link |
| 2025-06-12 | Self-Adapting Language Models | Adam Zweiger et.al. | 2506.10943 | translate | read | null |
| 2025-06-12 | Magistral | Mistral-AI et.al. | 2506.10910 | translate | read | null |
| 2025-06-12 | Adaptive Job Scheduling in Quantum Clouds Using Reinforcement Learning | Waylon Luo et.al. | 2506.10889 | translate | read | null |
| 2025-06-12 | Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization | Pierre-François Massiani et.al. | 2506.10871 | translate | read | null |
| 2025-06-13 | Joint Beamforming with Extremely Large Scale RIS: A Sequential Multi-Agent A2C Approach | Zhi Chai et.al. | 2506.10815 | translate | read | null |
| 2025-06-12 | Human-Robot Navigation using Event-based Cameras and Reinforcement Learning | Ignacio Bugueno-Cordova et.al. | 2506.10790 | translate | read | null |
| 2025-06-12 | PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework | SiXiang Chen et.al. | 2506.10741 | translate | read | link |
| 2025-06-12 | Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs | Yucong Luo et.al. | 2506.10630 | translate | read | null |
| 2025-06-11 | Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Junfei Wu et.al. | 2506.09965 | translate | read | link |
| 2025-06-11 | VerIF: Verification Engineering for Reinforcement Learning in Instruction Following | Hao Peng et.al. | 2506.09942 | translate | read | link |
| 2025-06-11 | The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability | Jiachen Hu et.al. | 2506.09940 | translate | read | null |
| 2025-06-11 | From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models | Irving Fang et.al. | 2506.09930 | translate | read | link |
| 2025-06-11 | “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) | Noel Brindise et.al. | 2506.09901 | translate | read | null |
| 2025-06-11 | Hierarchical Learning-Enhanced MPC for Safe Crowd Navigation with Heterogeneous Constraints | Huajian Liu et.al. | 2506.09859 | translate | read | null |
| 2025-06-11 | Foundation Model-Aided Deep Reinforcement Learning for RIS-Assisted Wireless Communication | Mohammad Ghassemi et.al. | 2506.09855 | translate | read | null |
| 2025-06-11 | CoRT: Code-integrated Reasoning within Thinking | Chengpeng Li et.al. | 2506.09820 | translate | read | link |
| 2025-06-11 | Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy | Tonghe Wang et.al. | 2506.09805 | translate | read | null |
| 2025-06-11 | Reinforced Refinement with Self-Aware Expansion for End-to-End Autonomous Driving | Haochen Liu et.al. | 2506.09800 | translate | read | null |
| 2025-06-09 | Play to Generalize: Learning to Reason Through Game Play | Yunfei Xie et.al. | 2506.08011 | translate | read | link |
| 2025-06-09 | Reinforcement Pre-Training | Qingxiu Dong et.al. | 2506.08007 | translate | read | null |
| 2025-06-09 | Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator | Alberto Bazán-Guillén et.al. | 2506.07980 | translate | read | null |
| 2025-06-09 | Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction | Junhong Shen et.al. | 2506.07976 | translate | read | link |
| 2025-06-09 | A Generative Physics-Informed Reinforcement Learning-Based Approach for Construction of Representative Drive Cycle | Amirreza Yasami et.al. | 2506.07929 | translate | read | null |
| 2025-06-09 | LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement | Dimitris Panagopoulos et.al. | 2506.07915 | translate | read | null |
| 2025-06-09 | WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Jie Yang et.al. | 2506.07905 | translate | read | link |
| 2025-06-09 | MiniCPM4: Ultra-Efficient LLMs on End Devices | MiniCPM Team et.al. | 2506.07900 | translate | read | link |
| 2025-06-09 | Diffusion-RL for Scalable Resource Allocation for 6G Networks | Salar Nouri et.al. | 2506.07880 | translate | read | null |
| 2025-06-09 | Versatile Loco-Manipulation through Flexible Interlimb Coordination | Xinghao Zhu et.al. | 2506.07876 | translate | read | null |
| 2025-06-06 | Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens | Jihwan Jeong et.al. | 2506.06261 | translate | read | null |
| 2025-06-06 | How to craft a deep reinforcement learning policy for wind farm flow control | Elie Kadoche et.al. | 2506.06204 | translate | read | null |
| 2025-06-06 | Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization | Jonathan Yang et.al. | 2506.06196 | translate | read | null |
| 2025-06-06 | A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization | Muhammed Ustaomeroglu et.al. | 2506.06179 | translate | read | null |
| 2025-06-06 | Reusing Trajectories in Policy Gradients Enables Fast Convergence | Alessandro Montenegro et.al. | 2506.06178 | translate | read | null |
| 2025-06-06 | Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach | James Ford et.al. | 2506.06175 | translate | read | null |
| 2025-06-06 | Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models | Rihui Jin et.al. | 2506.06137 | translate | read | null |
| 2025-06-06 | Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library | Weixun Wang et.al. | 2506.06122 | translate | read | link |
| 2025-06-06 | On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems | Elim Kwan et.al. | 2506.06094 | translate | read | null |
| 2025-06-06 | Reinforcing Code Generation: Improving Text-to-SQL with Execution-Based Learning | Atharv Kulkarni et.al. | 2506.06093 | translate | read | null |
| 2025-06-05 | ContentV: Efficient Training of Video Generation Models with Limited Compute | Wenfeng Lin et.al. | 2506.05343 | translate | read | null |
| 2025-06-05 | AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs | Lidong Lu et.al. | 2506.05328 | translate | read | link |
| 2025-06-05 | Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay | Yifan Sun et.al. | 2506.05316 | translate | read | null |
| 2025-06-05 | Estimation of Treatment Effects Under Nonstationarity via Truncated Difference-in-Q’s | Ramesh Johari et.al. | 2506.05308 | translate | read | null |
| 2025-06-05 | A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search | Arnav Kumar Jain et.al. | 2506.05294 | translate | read | link |
| 2025-06-06 | Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning | Violet Xiang et.al. | 2506.05256 | translate | read | null |
| 2025-06-05 | Towards Language-Augmented Multi-Agent Deep Reinforcement Learning | Maxime Toquebiau et.al. | 2506.05236 | translate | read | null |
| 2025-06-05 | Optimal-PhiBE: A PDE-based Model-free framework for Continuous-time Reinforcement Learning | Yuhua Zhu et.al. | 2506.05208 | translate | read | null |
| 2025-06-05 | TreeRPO: Tree Relative Policy Optimization | Zhicheng Yang et.al. | 2506.05183 | translate | read | link |
| 2025-06-05 | Fabrica: Dual-Arm Assembly of General Multi-Part Objects via Integrated Planning and Learning | Yunsheng Tian et.al. | 2506.05168 | translate | read | null |
| 2025-06-04 | Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | Shuang Chen et.al. | 2506.04207 | translate | read | link |
| 2025-06-04 | MACS: Multi-Agent Reinforcement Learning for Optimization of Crystal Structures | Elena Zamaraeva et.al. | 2506.04195 | translate | read | null |
| 2025-06-04 | R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning | Qingfei Zhao et.al. | 2506.04185 | translate | read | link |
| 2025-06-04 | Horizon Reduction Makes RL Scalable | Seohong Park et.al. | 2506.04168 | translate | read | null |
| 2025-06-04 | SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL | Jiaheng Hu et.al. | 2506.04147 | translate | read | null |
| 2025-06-04 | Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning | Muling Wu et.al. | 2506.04065 | translate | read | null |
| 2025-06-04 | Crowd-SFT: Crowdsourcing for LLM Alignment | Alex Sotiropoulos et.al. | 2506.04063 | translate | read | null |
| 2025-06-04 | Autonomous Vehicle Lateral Control Using Deep Reinforcement Learning with MPC-PID Demonstration | Chengdong Wu et.al. | 2506.04040 | translate | read | null |
| 2025-06-04 | Interpretability by Design for Efficient Multi-Objective Reinforcement Learning | Qiyue Xia et.al. | 2506.04022 | translate | read | null |
| 2025-06-04 | Boosting Open-Source LLMs for Program Repair via Reasoning Transfer and LLM-Guided Reinforcement Learning | Xunzhu Tang et.al. | 2506.03921 | translate | read | null |
| 2025-06-03 | Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning | Yinjie Wang et.al. | 2506.03136 | translate | read | link |
| 2025-06-03 | AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation | Prashanth Vijayaraghavan et.al. | 2506.03122 | translate | read | null |
| 2025-06-03 | Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback | Xiaoying Zhang et.al. | 2506.03106 | translate | read | link |
| 2025-06-03 | EgoVLM: Policy Optimization for Egocentric Video Understanding | Ashwin Vinod et.al. | 2506.03097 | translate | read | link |
| 2025-06-03 | DPO Learning with LLMs-Judge Signal for Computer Use Agents | Man Luo et.al. | 2506.03095 | translate | read | null |
| 2025-06-03 | Provable Reinforcement Learning from Human Feedback with an Unknown Link Function | Qining Zhang et.al. | 2506.03066 | translate | read | null |
| 2025-06-03 | EDEN: Entorhinal Driven Egocentric Navigation Toward Robotic Deployment | Mikolaj Walczak et.al. | 2506.03046 | translate | read | null |
| 2025-06-03 | Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective | Jintian Shao et.al. | 2506.03038 | translate | read | null |
| 2025-06-03 | MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver | Yuepeng Zheng et.al. | 2506.02935 | translate | read | null |
| 2025-06-03 | Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning | Yin Fang et.al. | 2506.02911 | translate | read | link |
| 2025-06-03 | Reinforcing Video Reasoning with Focused Thinking | Jisheng Dang et.al. | 2505.24718 | translate | read | link |
(<a href="../Reinforcement_Learning.md">back to Reinforcement Learning</a>)