Reinforcement Learning - 2025-05

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-05-30	ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL	Yu Zhang et.al.	2505.24875	translate	read	null
2025-05-30	ProxyThinker: Test-Time Guidance through Small Visual Reasoners	Zilin Xiao et.al.	2505.24872	translate	read	null
2025-05-30	MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning	Yiqing Liang et.al.	2505.24871	translate	read	null
2025-05-30	ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models	Mingjie Liu et.al.	2505.24864	translate	read	null
2025-05-30	MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning	Jingyan Shen et.al.	2505.24846	translate	read	null
2025-05-30	AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models	Conor Heins et.al.	2505.24784	translate	read	null
2025-05-30	Diffusion-Based Symbolic Regression	Zachary Bastiani et.al.	2505.24776	translate	read	null
2025-05-30	REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards	Zafir Stojanovski et.al.	2505.24760	translate	read	link
2025-05-30	Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning	Shelly Bensal et.al.	2505.24726	translate	read	null
2025-05-29	ZeroGUI: Automating Online GUI Learning at Zero Human Cost	Chenyu Yang et.al.	2505.23762	translate	read	link
2025-05-29	DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning	Ziyin Zhang et.al.	2505.23754	translate	read	link
2025-05-29	PixelThink: Towards Efficient Chain-of-Pixel Reasoning	Song Wang et.al.	2505.23727	translate	read	null
2025-05-29	ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering	Zexi Liu et.al.	2505.23723	translate	read	link
2025-05-29	AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning	Lucas N. Alegre et.al.	2505.23708	translate	read	null
2025-05-29	Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability	Ruida Wang et.al.	2505.23703	translate	read	null
2025-05-29	Grounded Reinforcement Learning for Visual Reasoning	Gabriel Sarch et.al.	2505.23678	translate	read	null
2025-05-29	Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models	Lang Cao et.al.	2505.23667	translate	read	null
2025-05-29	AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction	Niklas Freymuth et.al.	2505.23663	translate	read	link
2025-05-29	Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation	Hongxiang Zhang et.al.	2505.23657	translate	read	null
2025-05-28	Maximizing Confidence Alone Improves Reasoning	Mihir Prabhudesai et.al.	2505.22660	translate	read	null
2025-05-28	The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason	Ang Lv et.al.	2505.22653	translate	read	null
2025-05-28	WebDancer: Towards Autonomous Information Seeking Agency	Jialong Wu et.al.	2505.22648	translate	read	null
2025-05-28	FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control	Younggyo Seo et.al.	2505.22642	translate	read	null
2025-05-28	SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning	Yu Zhang et.al.	2505.22626	translate	read	null
2025-05-28	The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models	Ganqu Cui et.al.	2505.22617	translate	read	null
2025-05-28	HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym	Ngoc La et.al.	2505.22597	translate	read	null
2025-05-28	SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning	Jiaqi Huang et.al.	2505.22596	translate	read	null
2025-05-28	Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs	Changhao Song et.al.	2505.22548	translate	read	null
2025-05-28	Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation	Hongyi Zhou et.al.	2505.22492	translate	read	null
2025-05-27	Reinforcing General Reasoning without Verifiers	Xiangxin Zhou et.al.	2505.21493	translate	read	null
2025-05-27	Policy Optimized Text-to-Image Pipeline Design	Uri Gadot et.al.	2505.21478	translate	read	null
2025-05-27	Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO	Muzhi Zhu et.al.	2505.21457	translate	read	null
2025-05-27	Can Large Reasoning Models Self-Train?	Sheikh Shafayat et.al.	2505.21444	translate	read	null
2025-05-27	A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment	Brett Bissey et.al.	2505.21414	translate	read	null
2025-05-27	MRSD: Multi-Resolution Skill Discovery for HRL Agents	Shashank Sharma et.al.	2505.21410	translate	read	null
2025-05-27	Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features	Zixuan Xie et.al.	2505.21391	translate	read	null
2025-05-27	EgoWalk: A Multimodal Dataset for Robot Navigation in the Wild	Timur Akhtyamov et.al.	2505.21282	translate	read	null
2025-05-27	Data-Driven Cellular Mobility Management via Bayesian Optimization and Reinforcement Learning	Mohamed Benzaghta et.al.	2505.21249	translate	read	null
2025-05-27	Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies	Felix Chalumeau et.al.	2505.21236	translate	read	null
2025-05-26	FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities	Jin Wang et.al.	2505.20147	translate	read	null
2025-05-26	MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning	Yuanxin Zhuang et.al.	2505.20131	translate	read	null
2025-05-26	Proxy-Free GFlowNet	Ruishuo Chen et.al.	2505.20110	translate	read	null
2025-05-26	Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning	Ziyi Zhang et.al.	2505.20107	translate	read	null
2025-05-26	Adaptive Deep Reasoning: Triggering Deep Thinking When Needed	Yunhao Wang et.al.	2505.20101	translate	read	null
2025-05-26	SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale	Qi Li et.al.	2505.20094	translate	read	null
2025-05-26	Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback	Mengdi Li et.al.	2505.20075	translate	read	null
2025-05-26	Incentivizing Reasoning from Weak Supervision	Yige Yuan et.al.	2505.20072	translate	read	null
2025-05-26	SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety	Geon-Hyeong Kim et.al.	2505.20065	translate	read	null
2025-05-26	REARANK: Reasoning Re-ranking Agent via Reinforcement Learning	Le Zhang et.al.	2505.20046	translate	read	null
2025-05-23	One RL to See Them All: Visual Triple Unified Reinforcement Learning	Yan Ma et.al.	2505.18129	translate	read	null
2025-05-23	Reward Model Overoptimisation in Iterated RLHF	Lorenz Wolf et.al.	2505.18126	translate	read	null
2025-05-23	ProgRM: Build Better GUI Agents with Progress Rewards	Danyang Zhang et.al.	2505.18121	translate	read	null
2025-05-23	Bridging Supervised Learning and Reinforcement Learning in Math Reasoning	Huayu Chen et.al.	2505.18116	translate	read	null
2025-05-23	Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL	Joey Hong et.al.	2505.18098	translate	read	null
2025-05-23	Stable Reinforcement Learning for Efficient Reasoning	Muzhi Dai et.al.	2505.18086	translate	read	null
2025-05-23	What Do You Need for Diverse Trajectory Stitching in Diffusion Planning?	Quentin Clark et.al.	2505.18083	translate	read	null
2025-05-23	Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals	Jia-Nan Li et.al.	2505.18071	translate	read	null
2025-05-23	Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective	Jintian Shao et.al.	2505.17997	translate	read	null
2025-05-23	Outcome-based Reinforcement Learning to Predict the Future	Benjamin Turtel et.al.	2505.17989	translate	read	null
2025-05-22	GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning	Chengqi Duan et.al.	2505.17022	translate	read	link
2025-05-22	SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward	Kaixuan Fan et.al.	2505.17018	translate	read	link
2025-05-22	Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO	Chengzhuo Tong et.al.	2505.17017	translate	read	link
2025-05-22	Interactive Post-Training for Vision-Language-Action Models	Shuhan Tan et.al.	2505.17016	translate	read	null
2025-05-22	R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning	Huatong Song et.al.	2505.17005	translate	read	link
2025-05-22	$\text{R}^2\text{ec}$ : Towards Large Recommender Models with Reasoning	Runyang You et.al.	2505.16994	translate	read	link
2025-05-22	SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development	Yaxin Du et.al.	2505.16975	translate	read	link
2025-05-22	Risk-Averse Reinforcement Learning with Itakura-Saito Loss	Igor Udovichenko et.al.	2505.16925	translate	read	null
2025-05-22	LARES: Latent Reasoning for Sequential Recommendation	Enze Liu et.al.	2505.16865	translate	read	null
2025-05-22	Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only	Wei Xiao et.al.	2505.16856	translate	read	null
2025-05-21	GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents	Yuqi Zhou et.al.	2505.15810	translate	read	link
2025-05-21	MMaDA: Multimodal Large Diffusion Language Models	Ling Yang et.al.	2505.15809	translate	read	link
2025-05-21	STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs	Zongzhao Li et.al.	2505.15804	translate	read	null
2025-05-21	VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models	Yuchen Yan et.al.	2505.15801	translate	read	null
2025-05-21	Reverse Engineering Human Preferences with Reinforcement Learning	Lisa Alazraki et.al.	2505.15795	translate	read	null
2025-05-21	HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving	Zhiwen Chen et.al.	2505.15793	translate	read	null
2025-05-21	VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL	Fengyuan Dai et.al.	2505.15791	translate	read	null
2025-05-21	ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning	Changtai Zhu et.al.	2505.15776	translate	read	null
2025-05-21	Improving planning and MBRL with temporally-extended actions	Palash Chatterjee et.al.	2505.15754	translate	read	null
2025-05-21	UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning	Xiangyu Wang et.al.	2505.15725	translate	read	null
2025-05-20	Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning	Haolei Xu et.al.	2505.14684	translate	read	link
2025-05-20	Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning	Jiaer Xia et.al.	2505.14677	translate	read	link
2025-05-20	Reward Reasoning Model	Jiaxin Guo et.al.	2505.14674	translate	read	null
2025-05-20	General-Reasoner: Advancing LLM Reasoning Across All Domains	Xueguang Ma et.al.	2505.14652	translate	read	link
2025-05-20	Think Only When You Need with Large Hybrid-Reasoning Models	Lingjie Jiang et.al.	2505.14631	translate	read	null
2025-05-20	TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning	Zhangchen Xu et.al.	2505.14625	translate	read	link
2025-05-20	Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning	Wenbin Hu et.al.	2505.14585	translate	read	null
2025-05-20	Performance Optimization of Energy-Harvesting Underlay Cognitive Radio Networks Using Reinforcement Learning	Deemah H. Tashman et.al.	2505.14581	translate	read	null
2025-05-20	KIPPO: Koopman-Inspired Proximal Policy Optimization	Andrei Cozma et.al.	2505.14566	translate	read	null
2025-05-20	Bellman operator convergence enhancements in reinforcement learning algorithms	David Krame Kadurha et.al.	2505.14564	translate	read	null
2025-05-19	Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards	Xiaoyuan Liu et.al.	2505.13445	translate	read	link
2025-05-19	Optimizing Anytime Reasoning via Budget Relative Policy Optimization	Penghui Qi et.al.	2505.13438	translate	read	link
2025-05-19	KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture	R. James Cotton et.al.	2505.13436	translate	read	null
2025-05-19	G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning	Liang Chen et.al.	2505.13426	translate	read	link
2025-05-20	A Dataless Reinforcement Learning Approach to Rounding Hyperplane Optimization for Max-Cut	Gabriel Malikal et.al.	2505.13405	translate	read	null
2025-05-19	Thinkless: LLM Learns When to Think	Gongfan Fang et.al.	2505.13379	translate	read	link
2025-05-19	Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning	Irene Brugnara et.al.	2505.13372	translate	read	null
2025-05-19	J4R: Learning to Judge with Equivalent Initial State Group Relative Preference Optimization	Austin Xu et.al.	2505.13346	translate	read	null
2025-05-19	Neural-Enhanced Rate Adaptation and Computation Distribution for Emerging mmWave Multi-User 3D Video Streaming Systems	Babak Badnava et.al.	2505.13337	translate	read	null
2025-05-19	CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning	Lei Sheng et.al.	2505.13271	translate	read	link
2025-05-16	SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics	Lizhi Yang et.al.	2505.11494	translate	read	null
2025-05-16	Improving Assembly Code Performance with Large Language Models via Reinforcement Learning	Anjiang Wei et.al.	2505.11480	translate	read	null
2025-05-16	Automatic Reward Shaping from Confounded Offline Data	Mingxuan Li et.al.	2505.11478	translate	read	null
2025-05-16	HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages	Zhilin Wang et.al.	2505.11475	translate	read	null
2025-05-16	Disentangling Reasoning and Knowledge in Medical Large Language Models	Rahul Thapa et.al.	2505.11462	translate	read	null
2025-05-16	Signal attenuation enables scalable decentralized multi-agent reinforcement learning over networks	Wesley A Suttle et.al.	2505.11461	translate	read	null
2025-05-16	Visual Planning: Let’s Think Only with Images	Yi Xu et.al.	2505.11409	translate	read	link
2025-05-16	Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner	Wenchuan Zhang et.al.	2505.11404	translate	read	link
2025-05-16	Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space	Ali Rabiee et.al.	2505.11366	translate	read	null
2025-05-16	Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics	Ardian Selmonaj et.al.	2505.11311	translate	read	null
2025-05-15	Beyond ‘Aha!’: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models	Zhiyuan Hu et.al.	2505.10554	translate	read	link
2025-05-15	Knowledge capture, adaptation and composition (KCAC): A framework for cross-task curriculum learning in robotic manipulation	Xinrui Wang et.al.	2505.10522	translate	read	null
2025-05-15	Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning	Andrea Baisero et.al.	2505.10484	translate	read	null
2025-05-15	Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps	Ningyuan Yang et.al.	2505.10482	translate	read	null
2025-05-15	Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models	Zemin Huang et.al.	2505.10446	translate	read	null
2025-05-15	IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning	Dechen Gao et.al.	2505.10442	translate	read	null
2025-05-15	Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs	Jingyao Wang et.al.	2505.10425	translate	read	null
2025-05-15	Decomposed Inductive Procedure Learning: Learning Academic Tasks with Human-Like Data Efficiency	Daniel Weitekamp et.al.	2505.10422	translate	read	null
2025-05-15	Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change	Jonathan Clifford Balloch et.al.	2505.10330	translate	read	null
2025-05-15	J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning	Chenxi Whitehouse et.al.	2505.10320	translate	read	null
2025-05-14	DataMIL: Selecting Data for Robot Imitation Learning with Datamodels	Shivin Dass et.al.	2505.09603	translate	read	null
2025-05-14	Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware	Justin Yu et.al.	2505.09601	translate	read	link
2025-05-14	VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation	Chaofan Zhang et.al.	2505.09577	translate	read	null
2025-05-14	Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach	Shannon Lodoen et.al.	2505.09576	translate	read	null
2025-05-14	Learning Long-Context Diffusion Policies via Past-Token Prediction	Marcel Torne et.al.	2505.09561	translate	read	null
2025-05-14	WavReward: Spoken Dialogue Models With Generalist Reward Evaluators	Shengpeng Ji et.al.	2505.09558	translate	read	link
2025-05-14	Distilling Realizable Students from Unrealizable Teachers	Yujin Kim et.al.	2505.09546	translate	read	null
2025-05-14	Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data	Rui Miao et.al.	2505.09496	translate	read	null
2025-05-14	Preserving Plasticity in Continual Learning with Adaptive Linearity Injection	Seyed Roozbeh Razavi Rohani et.al.	2505.09486	translate	read	null
2025-05-14	Quantum state-agnostic work extraction (almost) without dissipation	Josep Lumbreras et.al.	2505.09456	translate	read	null
2025-05-13	Generative Molecular Design with Steerable and Granular Synthesizability Control	Jeff Guo et.al.	2505.08774	translate	read	null
2025-05-13	Preference Optimization for Combinatorial Optimization Problems	Mingjun Pan et.al.	2505.08735	translate	read	null
2025-05-13	A Study of Data-driven Methods for Inventory Optimization	Lee Yeung Ping et.al.	2505.08673	translate	read	null
2025-05-13	Credit Assignment and Efficient Exploration based on Influence Scope in Multi-agent Reinforcement Learning	Shuai Han et.al.	2505.08630	translate	read	null
2025-05-13	Cost Function Estimation Using Inverse Reinforcement Learning with Minimal Observations	Sarmad Mehrdad et.al.	2505.08619	translate	read	null
2025-05-13	OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning	Zhaochen Su et.al.	2505.08617	translate	read	link
2025-05-13	Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection	Ayush K. Rai et.al.	2505.08561	translate	read	null
2025-05-13	Strategy-Augmented Planning for Large Language Models via Opponent Exploitation	Shuai Xu et.al.	2505.08459	translate	read	null
2025-05-13	Zero-Shot Sim-to-Real Reinforcement Learning for Fruit Harvesting	Emlyn Williams et.al.	2505.08458	translate	read	null
2025-05-13	Parameter Estimation using Reinforcement Learning Causal Curiosity: Limits and Challenges	Miguel Arana-Catania et.al.	2505.08453	translate	read	null
2025-05-12	DanceGRPO: Unleashing GRPO on Visual Generation	Zeyue Xue et.al.	2505.07818	translate	read	link
2025-05-12	A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values	Daniel Beechey et.al.	2505.07797	translate	read	link
2025-05-12	MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering	Rushi Qiang et.al.	2505.07782	translate	read	link
2025-05-12	Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving	Xinji Mai et.al.	2505.07773	translate	read	link
2025-05-12	Guiding Data Collection via Factored Scaling Curves	Lihan Zha et.al.	2505.07728	translate	read	link
2025-05-12	S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models	Muzhi Dai et.al.	2505.07686	translate	read	null
2025-05-12	A comparative study of Bitcoin and Ripple cryptocurrencies trading using Deep Reinforcement Learning algorithms	Dieu-Donne Fangnon et.al.	2505.07660	translate	read	null
2025-05-12	MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining	Xiaomi LLM-Core Team et.al.	2505.07608	translate	read	link
2025-05-12	Multi-Objective Reinforcement Learning for Energy-Efficient Industrial Control	Georg Schäfer et.al.	2505.07607	translate	read	null
2025-05-12	Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent	Ziyang Huang et.al.	2505.07596	translate	read	link
2025-05-09	VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction	Noah Frahm et.al.	2505.06219	translate	read	null
2025-05-09	Let Humanoids Hike! Integrative Skill Development on Complex Trails	Kwan-Yee Lin et.al.	2505.06218	translate	read	null
2025-05-09	Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach	Tim Schneider et.al.	2505.06182	translate	read	null
2025-05-09	Interaction-Aware Parameter Privacy-Preserving Data Sharing in Coupled Systems via Particle Filter Reinforcement Learning	Haokun Yu et.al.	2505.06122	translate	read	null
2025-05-09	TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations	Shuaiyi Huang et.al.	2505.06079	translate	read	null
2025-05-09	Safe-EF: Error Feedback for Nonsmooth Constrained Optimization	Rustem Islamov et.al.	2505.06053	translate	read	null
2025-05-09	Efficient Information Updates in Compute-First Networking via Reinforcement Learning with Joint AoI and VoI	Jianpeng Qi et.al.	2505.06025	translate	read	null
2025-05-09	Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models	Lennart Stöpler et.al.	2505.05970	translate	read	null
2025-05-09	Offline Multi-agent Reinforcement Learning via Score Decomposition	Dan Qiao et.al.	2505.05968	translate	read	null
2025-05-09	Learning Power Control Protocol for In-Factory 6G Subnetworks	Uyoata E. Uyoata et.al.	2505.05967	translate	read	null
2025-05-08	Flow-GRPO: Training Flow Matching Models via Online RL	Jie Liu et.al.	2505.05470	translate	read	link
2025-05-08	RL-DAUNCE: Reinforcement Learning-Driven Data Assimilation with Uncertainty-Aware Constrained Ensembles	Pouria Behnoudfar et.al.	2505.05452	translate	read	null
2025-05-08	Reasoning Models Don’t Always Say What They Think	Yanda Chen et.al.	2505.05410	translate	read	null
2025-05-08	Repair Crew Routing for Infrastructure Network Restoration under Incomplete Information	Subhojit Biswas et.al.	2505.05297	translate	read	null
2025-05-08	Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation	Zechu Li et.al.	2505.05287	translate	read	null
2025-05-08	Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration	Andreas Kontogiannis et.al.	2505.05262	translate	read	null
2025-05-08	High Altitude Platform-Based Caching and Multicasting for Rural Connectivity	Yongqiang Zhang et.al.	2505.05251	translate	read	null
2025-05-08	Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation	Luca Marzari et.al.	2505.05235	translate	read	null
2025-05-08	Adaptive Biased User Scheduling for Heterogeneous Wireless Federate Learning Network	Changxiang Wu et.al.	2505.05231	translate	read	null
2025-05-08	Multi-Objective Reinforcement Learning for Adaptive Personalized Autonomous Driving	Hendrik Surmann et.al.	2505.05223	translate	read	null
2025-05-07	EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning	Zhenghao Xing et.al.	2505.04623	translate	read	link
2025-05-07	Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation	Abdulaziz Almuzairee et.al.	2505.04619	translate	read	null
2025-05-07	ZeroSearch: Incentivize the Search Capability of LLMs without Searching	Hao Sun et.al.	2505.04588	translate	read	link
2025-05-07	Active Sampling for MRI-based Sequential Decision Making	Yuning Du et.al.	2505.04586	translate	read	link
2025-05-07	Implicitly Aligning Humans and Autonomous Agents through Shared Task Abstractions	Stéphane Aroca-Ouellette et.al.	2505.04579	translate	read	null
2025-05-07	Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization	Wenjun Cao et.al.	2505.04578	translate	read	null
2025-05-07	Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions	Shanyu Han et.al.	2505.04553	translate	read	null
2025-05-07	A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance	Axel Friedrich Wolter et.al.	2505.04494	translate	read	null
2025-05-07	RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation	Jing Hu et.al.	2505.04424	translate	read	link
2025-05-07	A Heuristic-Integrated DRL Approach for Phase Optimization in Large-Scale RISs	Wei Wang et.al.	2505.04401	translate	read	null
2025-05-06	AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control	Jialong Li et.al.	2505.03738	translate	read	null
2025-05-06	Sustainable Smart Farm Networks: Enhancing Resilience and Efficiency with Decision Theory-Guided Deep Reinforcement Learning	Dian Chen et.al.	2505.03721	translate	read	null
2025-05-06	Actor-Critics Can Achieve Optimal Sample Efficiency	Kevin Tan et.al.	2505.03710	translate	read	null
2025-05-06	Policy Gradient Adaptive Control for the LQR: Indirect and Direct Approaches	Feiran Zhao et.al.	2505.03706	translate	read	null
2025-05-06	Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation	Songchen Fu et.al.	2505.03586	translate	read	null
2025-05-06	Ergodic Generative Flows	Leo Maxime Brunswic et.al.	2505.03561	translate	read	null
2025-05-06	Multi-Agent Reinforcement Learning Scheduling to Support Low Latency in Teleoperated Driving	Giacomo Avanzi et.al.	2505.03558	translate	read	null
2025-05-06	Small-Scale-Fading-Aware Resource Allocation in Wireless Federated Learning	Jiacheng Wang et.al.	2505.03533	translate	read	null
2025-05-06	The Steganographic Potentials of Language Models	Artem Karpov et.al.	2505.03439	translate	read	null
2025-05-06	Wasserstein Convergence of Score-based Generative Models under Semiconvexity and Discontinuous Gradients	Stefano Bruno et.al.	2505.03432	translate	read	null
2025-05-05	R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning	Yi-Fan Zhang et.al.	2505.02835	translate	read	link
2025-05-05	TWIST: Teleoperated Whole-Body Imitation System	Yanjie Ze et.al.	2505.02833	translate	read	null
2025-05-05	Knowing You Don’t Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing	Diji Yang et.al.	2505.02811	translate	read	link
2025-05-05	Teaching the social media generation: rethinking learning without sacrificing quality	Sepinoud Azimi et.al.	2505.02770	translate	read	null
2025-05-05	The use of Artificial Intelligence for Intervention and Assessment in Individuals with ASD	Aggeliki Sideraki et.al.	2505.02747	translate	read	null
2025-05-05	Enhancing LLMs’ Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry	Junu Kim et.al.	2505.02722	translate	read	link
2025-05-05	Graph Neural Network-Based Reinforcement Learning for Controlling Biological Networks: The GATTACA Framework	Andrzej Mizera et.al.	2505.02712	translate	read	null
2025-05-05	Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models	Xiaobao Wu et.al.	2505.02686	translate	read	link
2025-05-05	Online Phase Estimation of Human Oscillatory Motions using Deep Learning	Antonio Grotta et.al.	2505.02668	translate	read	null
2025-05-05	A Survey on Progress in LLM Alignment from the Perspective of Reward Design	Miaomiao Ji et.al.	2505.02666	translate	read	null
2025-05-02	FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research	Yan Miao et.al.	2505.01383	translate	read	null
2025-05-02	Stabilizing Temporal Difference Learning via Implicit Stochastic Approximation	Hwanwoo Kim et.al.	2505.01361	translate	read	null
2025-05-02	Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story	Vincenzo De Paola et.al.	2505.01336	translate	read	null
2025-05-02	Integration of Multi-Mode Preference into Home Energy Management System Using Deep Reinforcement Learning	Mohammed Sumayli et.al.	2505.01332	translate	read	null
2025-05-02	Exploring Equity of Climate Policies using Multi-Agent Multi-Objective Reinforcement Learning	Palok Biswas et.al.	2505.01115	translate	read	null
2025-05-02	Multi-Objective Reinforcement Learning for Water Management	Zuzanna Osika et.al.	2505.01094	translate	read	null
2025-05-02	Llama-Nemotron: Efficient Reasoning Models	Akhiad Bercovich et.al.	2505.00949	translate	read	null
2025-05-01	Learning Neural Control Barrier Functions from Offline Data with Conservatism	Ihab Tabbara et.al.	2505.00908	translate	read	null
2025-05-01	SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation	Quang P. M. Pham et.al.	2505.00831	translate	read	null
2025-05-01	Constructing an Optimal Behavior Basis for the Option Keyboard	Lucas N. Alegre et.al.	2505.00787	translate	read	null
2025-05-01	T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT	Dongzhi Jiang et.al.	2505.00703	translate	read	link
2025-05-01	Multi-Constraint Safe Reinforcement Learning via Closed-form Solution for Log-Sum-Exp Approximation of Control Barrier Functions	Chenggang Wang et.al.	2505.00671	translate	read	null
2025-05-01	Deep Reinforcement Learning for Urban Air Quality Management: Multi-Objective Optimization of Pollution Mitigation Booth Placement in Metropolitan Environments	Kirtan Rajesh et.al.	2505.00668	translate	read	null
2025-05-01	Wasserstein Policy Optimization	David Pfau et.al.	2505.00663	translate	read	null
2025-05-01	DeepCritic: Deliberate Critique with Large Language Models	Wenkai Yang et.al.	2505.00662	translate	read	link
2025-05-02	100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models	Chong Zhang et.al.	2505.00551	translate	read	null
2025-05-01	Directly Forecasting Belief for Reinforcement Learning with Delays	Qingyuan Wu et.al.	2505.00546	translate	read	null
2025-05-01	Emergence of Roles in Robotic Teams with Model Sharing and Limited Communication	Ian O’Flynn et.al.	2505.00540	translate	read	null
2025-05-01	Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks	Xinyu Wang et.al.	2505.00530	translate	read	null
2025-05-01	DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation	Zixuan Chen et.al.	2505.00527	translate	read	null
