Reinforcement Learning - 2025-07

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-07-29	Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics	Leonard Hinckeldey et.al.	2507.21638	translate	read	null
2025-07-23	Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains	Anisha Gunjal et.al.	2507.17746	translate	read	null
2025-07-23	Megrez2 Technical Report	Boxun Li et.al.	2507.17728	translate	read	null
2025-07-23	How Should We Meta-Learn Reinforcement Learning Algorithms?	Alexander David Goldie et.al.	2507.17668	translate	read	null
2025-07-23	CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning	Lingxiao Tang et.al.	2507.17548	translate	read	null
2025-07-23	Generalized Advantage Estimation for Distributional Policy Gradients	Shahil Shaik et.al.	2507.17530	translate	read	null
2025-07-23	Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice	Shanbo Cheng et.al.	2507.17527	translate	read	null
2025-07-23	URPO: A Unified Reward & Policy Optimization Framework for Large Language Models	Songshuo Lu et.al.	2507.17515	translate	read	null
2025-07-23	Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning	Yu Li et.al.	2507.17512	translate	read	null
2025-07-23	ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents	Chang Nie et.al.	2507.17462	translate	read	null
2025-07-23	Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning	Situo Zhang et.al.	2507.17448	translate	read	null
2025-07-22	Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning	Junhao Shen et.al.	2507.16814	translate	read	null
2025-07-22	Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty	Mehul Damani et.al.	2507.16806	translate	read	null
2025-07-22	Uncertainty-Aware Knowledge Transformers for Peer-to-Peer Energy Trading with Multi-Agent Reinforcement Learning	Mian Ibad Ali Shah et.al.	2507.16796	translate	read	null
2025-07-22	Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning	Ang Li et.al.	2507.16746	translate	read	link
2025-07-23	Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints	Zhenyun Yin et.al.	2507.16727	translate	read	null
2025-07-22	Adaptive Inventory Strategies using Deep Reinforcement Learning for Dynamic Agri-Food Supply Chains	Amandeep Kaur et.al.	2507.16670	translate	read	null
2025-07-22	FOGNITE: Federated Learning-Enhanced Fog-Cloud Architecture	Somayeh Sobati-M et.al.	2507.16668	translate	read	null
2025-07-22	Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis	Sara Giordano et.al.	2507.16641	translate	read	null
2025-07-22	Novel Multi-Agent Action Masked Deep Reinforcement Learning for General Industrial Assembly Lines Balancing Problems	Ali Mohamed Ali et.al.	2507.16635	translate	read	null
2025-07-22	Step-Audio 2 Technical Report	Boyong Wu et.al.	2507.16632	translate	read	link
2025-07-21	The Impact of Language Mixing on Bilingual LLM Reasoning	Yihao Li et.al.	2507.15849	translate	read	null
2025-07-21	GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding	Fei Tang et.al.	2507.15846	translate	read	link
2025-07-22	Hierarchical Budget Policy Optimization for Adaptive Reasoning	Shangke Lyu et.al.	2507.15844	translate	read	link
2025-07-21	LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra	Seth Karten et.al.	2507.15815	translate	read	link
2025-07-21	Power-Constrained Policy Gradient Methods for LQR	Ashwin Verma et.al.	2507.15806	translate	read	null
2025-07-21	Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning	Sneheel Sarangi et.al.	2507.15788	translate	read	null
2025-07-21	Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR	Jiakang Wang et.al.	2507.15778	translate	read	link
2025-07-21	LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization	Xingyu Wu et.al.	2507.15758	translate	read	link
2025-07-21	EMP: Executable Motion Prior for Humanoid Robot Standing Upper-body Motion Imitation	Haocheng Xu et.al.	2507.15649	translate	read	null
2025-07-21	Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training	Kailai Yang et.al.	2507.15640	translate	read	null
2025-07-18	CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning	Xiaoya Li et.al.	2507.14111	translate	read	link
2025-07-18	Preference-based Multi-Objective Reinforcement Learning	Ni Mu et.al.	2507.14066	translate	read	null
2025-07-18	Reframing attention as a reinforcement learning problem for causal discovery	Turan Orujlu et.al.	2507.13920	translate	read	null
2025-07-18	Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments	Kathrin Korte et.al.	2507.13846	translate	read	null
2025-07-18	Scalable Submodular Policy Optimization via Pruned Submodularity Graph	Aditi Anand et.al.	2507.13834	translate	read	null
2025-07-18	DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training	Zhixin Wang et.al.	2507.13833	translate	read	null
2025-07-18	Efficient and Scalable Self-Healing Databases Using Meta-Learning and Dependency-Driven Recovery	Joydeep Chandra et.al.	2507.13757	translate	read	null
2025-07-18	LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction	Jing Chang et.al.	2507.13712	translate	read	null
2025-07-18	CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation	Jing Chang et.al.	2507.13710	translate	read	null
2025-07-18	State Space Models Naturally Produce Traveling Waves, Time Cells, and Scale to Abstract Cognitive Functions	Sen Lu et.al.	2507.13638	translate	read	null
2025-07-17	VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning	Senqiao Yang et.al.	2507.13348	translate	read	link
2025-07-17	The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner	Zhouqi Hua et.al.	2507.13332	translate	read	null
2025-07-17	Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour	Emma M. A. Harrison et.al.	2507.13277	translate	read	null
2025-07-17	QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation	Jiazheng Li et.al.	2507.13266	translate	read	null
2025-07-17	Signal Temporal Logic Compliant Co-design of Planning and Control	Manas Sashank Juvvi et.al.	2507.13225	translate	read	null
2025-07-17	Spectral Bellman Method: Unifying Representation and Exploration in RL	Ofir Nabati et.al.	2507.13181	translate	read	null
2025-07-17	Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback	Suzie Kim et.al.	2507.13171	translate	read	null
2025-07-17	Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities	Hao Sun et.al.	2507.13158	translate	read	null
2025-07-17	From Roots to Rewards: Dynamic Tree Reasoning with RL	Ahmed Bahloul et.al.	2507.13142	translate	read	null
2025-07-17	ZipMPC: Compressed Context-Dependent MPC Cost via Imitation Learning	Rahel Rickenbach et.al.	2507.13088	translate	read	null
2025-07-16	EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos	Ruihan Yang et.al.	2507.12440	translate	read	null
2025-07-16	Improving Reinforcement Learning Sample-Efficiency using Local Approximation	Mohit Prashant et.al.	2507.12383	translate	read	null
2025-07-16	Thought Purity: Defense Paradigm For Chain-of-Thought Attack	Zihao Xue et.al.	2507.12314	translate	read	null
2025-07-16	Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning	Yuhao Chen et.al.	2507.12215	translate	read	null
2025-07-16	BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search	Azhar Ikhtiarudin et.al.	2507.12189	translate	read	link
2025-07-17	Efficient Preparation of Fermionic Superfluids in an Optical Dipole Trap through Reinforcement Learning	Yueyang Min et.al.	2507.12152	translate	read	null
2025-07-16	Topology Enhanced MARL for Multi-Vehicle Cooperative Decision-Making of CAVs	Ye Han et.al.	2507.12110	translate	read	null
2025-07-16	Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics	Muleilan Pei et.al.	2507.12083	translate	read	null
2025-07-16	Towards Ultra-Reliable 6G in-X Subnetworks: Dynamic Link Adaptation by Deep Reinforcement Learning	Fateme Salehi et.al.	2507.12031	translate	read	null
2025-07-16	QAS-QTNs: Curriculum Reinforcement Learning-Driven Quantum Architecture Search for Quantum Tensor Networks	Siddhant Dutta et.al.	2507.12013	translate	read	null
2025-07-15	Robot Drummer: Learning Rhythmic Skills for Humanoid Drumming	Asad Ali Shahid et.al.	2507.11498	translate	read	null
2025-07-15	Exploring the robustness of TractOracle methods in RL-based tractography	Jeremi Levesque et.al.	2507.11486	translate	read	null
2025-07-15	Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light	Mani Hamidi et.al.	2507.11482	translate	read	null
2025-07-15	Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs	Gabriel Bo et.al.	2507.11371	translate	read	null
2025-07-15	Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning	Daniel Tanneberg et.al.	2507.11367	translate	read	null
2025-07-15	Sensing Accuracy Optimization for Multi-UAV SAR Interferometry with Data Offloading	Mohamed-Amine Lahmeri et.al.	2507.11284	translate	read	null
2025-07-15	Ocean Diviner: A Diffusion-Augmented Reinforcement Learning for AUV Robust Control in the Underwater Tasks	Weiyi Liu et.al.	2507.11283	translate	read	null
2025-07-15	Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound	Tal Fiskus et.al.	2507.11269	translate	read	null
2025-07-15	Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction	Deepak Kumar Panda et.al.	2507.11173	translate	read	null
2025-07-15	Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities	Yiting Qu et.al.	2507.11155	translate	read	null
2025-07-14	EmbRACE-3K: Embodied Reasoning and Action in Complex Environments	Mingxian Lin et.al.	2507.10548	translate	read	link
2025-07-14	Disentangling Neural Disjunctive Normal Form Models	Kexin Gu Baugh et.al.	2507.10546	translate	read	null
2025-07-14	Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination	Mingqi Wu et.al.	2507.10532	translate	read	link
2025-07-14	Some remarks on gradient dominance and LQR policy optimization	Eduardo D. Sontag et.al.	2507.10452	translate	read	null
2025-07-14	Prompt Informed Reinforcement Learning for Visual Coverage Path Planning	Venkat Margapuri et.al.	2507.10284	translate	read	null
2025-07-14	Cross-Timeslot Optimization for Distributed GPU Inference Using Reinforcement Learning	Chengze Du et.al.	2507.10259	translate	read	null
2025-07-14	ToMacVF: Temporal Macro-action Value Factorization for Asynchronous Multi-Agent Reinforcement Learning	Wenjing Zhang et.al.	2507.10251	translate	read	null
2025-07-14	Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning?	Yumi Omori et.al.	2507.10174	translate	read	null
2025-07-14	Robust RL Control for Bipedal Locomotion with Closed Kinematic Chains	Egor Maslennikov et.al.	2507.10164	translate	read	null
2025-07-14	Adaptability in Multi-Agent Reinforcement Learning: A Framework and Unified Review	Siyi Hu et.al.	2507.10142	translate	read	null
2025-07-11	One Token to Fool LLM-as-a-Judge	Yulai Zhao et.al.	2507.08794	translate	read	null
2025-07-11	Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning	James McCarthy et.al.	2507.08793	translate	read	null
2025-07-11	Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data	Jeonghye Kim et.al.	2507.08761	translate	read	null
2025-07-11	On the Effect of Regularization in Policy Mirror Descent	Jan Felix Kleuker et.al.	2507.08718	translate	read	null
2025-07-11	SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations	Peter Crowley et.al.	2507.08707	translate	read	null
2025-07-11	elsciRL: Integrating Language Solutions into Reinforcement Learning Problem Settings	Philip Osborne et.al.	2507.08705	translate	read	null
2025-07-11	Multi-critic Learning for Whole-body End-effector Twist Tracking	Aravind Elanjimattathil Vijayan et.al.	2507.08656	translate	read	null
2025-07-11	Safe Deep Reinforcement Learning for Resource Allocation with Peak Age of Information Violation Guarantees	Berire Gunes Reyhan et.al.	2507.08653	translate	read	null
2025-07-11	Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning	Xingguang Ji et.al.	2507.08649	translate	read	link
2025-07-11	Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data	Parag Dutta et.al.	2507.08610	translate	read	null
2025-07-10	Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology	Haochen Wang et.al.	2507.07999	translate	read	link
2025-07-10	Single-pass Adaptive Image Tokenization for Minimum Program Search	Shivam Duggal et.al.	2507.07995	translate	read	null
2025-07-10	EXPO: Stable Reinforcement Learning with Expressive Policies	Perry Dong et.al.	2507.07986	translate	read	null
2025-07-10	Reinforcement Learning with Action Chunking	Qiyang Li et.al.	2507.07969	translate	read	null
2025-07-10	Scaling RL to Long Videos	Yukang Chen et.al.	2507.07966	translate	read	link
2025-07-10	Excess Observables Reveal Nonreciprocity in Integrated Covariance	Timur Aslyamov et.al.	2507.07876	translate	read	null
2025-07-10	“So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents	Giovanni Dispoto et.al.	2507.07848	translate	read	null
2025-07-10	Beyond Robustness: Learning Unknown Dynamic Load Adaptation for Quadruped Locomotion on Rough Terrain	Leixin Chang et.al.	2507.07825	translate	read	null
2025-07-10	BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning	Ruohong Liu et.al.	2507.07769	translate	read	null
2025-07-10	Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization	Chengtao Jian et.al.	2507.07723	translate	read	null
2025-07-09	Graph-Based Complexity Metrics for Multi-Agent Curriculum Learning: A Validated Approach to Task Ordering in Cooperative Coordination Environments	Farhaan Ebadulla et.al.	2507.07074	translate	read	null
2025-07-09	First Return, Entropy-Eliciting Explore	Tianyu Zheng et.al.	2507.07017	translate	read	null
2025-07-09	Federated Learning-based MARL for Strengthening Physical-Layer Security in B5G Networks	Deemah H. Tashman et.al.	2507.06997	translate	read	null
2025-07-09	Optimizing Cognitive Networks: Reinforcement Learning Meets Energy Harvesting Over Cascaded Channels	Deemah H. Tashman et.al.	2507.06981	translate	read	null
2025-07-09	Bounomodes: the grazing ox algorithm for exploration of clustered anomalies	Samuel Matloob et.al.	2507.06960	translate	read	null
2025-07-10	Rethinking Verification for LLM Code Generation: From Generation to Testing	Zihan Ma et.al.	2507.06920	translate	read	link
2025-07-09	Designing Adaptive Algorithms Based on Reinforcement Learning for Dynamic Optimization of Sliding Window Size in Multi-Dimensional Data Streams	Abolfazl Zarghani et.al.	2507.06901	translate	read	null
2025-07-09	Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model	Jing Liang et.al.	2507.06892	translate	read	null
2025-07-09	Episodic Contextual Bandits with Knapsacks under Conversion Models	Zitian Li et.al.	2507.06859	translate	read	null
2025-07-10	Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning	Matej Straka et.al.	2507.06825	translate	read	link
2025-07-08	EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow	Yixiang Chen et.al.	2507.06224	translate	read	null
2025-07-08	CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization	Zhongyuan Peng et.al.	2507.06181	translate	read	link
2025-07-08	Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model	Koki Yamane et.al.	2507.06174	translate	read	null
2025-07-08	Learning Agile Tensile Perching for Aerial Robots from Demonstrations	Kangle Yuan et.al.	2507.06172	translate	read	null
2025-07-08	Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation	Mohamad H. Danesh et.al.	2507.06111	translate	read	null
2025-07-08	AI-Based Demand Forecasting and Load Balancing for Optimising Energy use in Healthcare Systems: A real case study	Iman Rahimi et.al.	2507.06077	translate	read	null
2025-07-09	FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models	Bo Pang et.al.	2507.06057	translate	read	null
2025-07-08	CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation	Kushal Gajjar et.al.	2507.06013	translate	read	null
2025-07-08	From General Relation Patterns to Task-Specific Decision-Making in Continual Multi-Agent Coordination	Chang Yao et.al.	2507.06004	translate	read	null
2025-07-08	BlueLM-2.5-3B Technical Report	Baojiao Xiong et.al.	2507.05934	translate	read	null
2025-07-07	Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning	Yana Wei et.al.	2507.05255	translate	read	link
2025-07-07	Action Space Reduction Strategies for Reinforcement Learning in Autonomous Driving	Elahe Delavari et.al.	2507.05251	translate	read	null
2025-07-07	NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving	Qucheng Peng et.al.	2507.05227	translate	read	null
2025-07-07	EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling	Boyuan Wang et.al.	2507.05198	translate	read	null
2025-07-07	Sequential Attention-based Sampling for Histopathological Analysis	Tarun G et.al.	2507.05077	translate	read	null
2025-07-07	Replacing thinking with tool usage enables reasoning in small language models	Corrado Rainone et.al.	2507.05065	translate	read	null
2025-07-07	When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning	Maxence Boels et.al.	2507.05011	translate	read	null
2025-07-07	Linking Homeostasis to Reinforcement Learning: Internal State Control of Motivated Behavior	Naoto Yoshida et.al.	2507.04998	translate	read	null
2025-07-07	Object-centric Denoising Diffusion Models for Physical Reasoning	Moritz Lange et.al.	2507.04920	translate	read	null
2025-07-07	Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning	Sanyam Vyas et.al.	2507.04883	translate	read	null
2025-07-03	MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs	Purbesh Mitra et.al.	2507.02851	translate	read	link
2025-07-03	StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason	Kaiyi Zhang et.al.	2507.02841	translate	read	null
2025-07-03	ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning	Ruiyang Zhou et.al.	2507.02834	translate	read	null
2025-07-03	Generalizing Verifiable Instruction Following	Valentina Pyatkin et.al.	2507.02833	translate	read	null
2025-07-03	Multimodal Mathematical Reasoning with Diverse Solving Perspective	Wenhao Shi et.al.	2507.02804	translate	read	null
2025-07-03	A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control	Zilin Kang et.al.	2507.02712	translate	read	null
2025-07-03	Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions	Thomas Hazenberg et.al.	2507.02698	translate	read	null
2025-07-03	RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes	Jiaxing Wang et.al.	2507.02690	translate	read	null
2025-07-03	TUC-PPO: Team Utility-Constrained Proximal Policy Optimization for Spatial Public Goods Games	Zhaoqilin Yang et.al.	2507.02675	translate	read	null
2025-07-03	On Efficient Bayesian Exploration in Model-Based Reinforcement Learning	Alberto Caron et.al.	2507.02639	translate	read	null
2025-07-02	Kwai Keye-VL Technical Report	Kwai Keye Team et.al.	2507.01949	translate	read	link
2025-07-02	NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks	Yang Li et.al.	2507.01921	translate	read	null
2025-07-02	Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models	Chengao Li et.al.	2507.01915	translate	read	null
2025-07-02	TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types	Yuhao Lin et.al.	2507.01857	translate	read	null
2025-07-02	TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents	Dmytro Kuzmenko et.al.	2507.01823	translate	read	null
2025-07-02	Quantum reinforcement learning in dynamic environments	Oliver Sefrin et.al.	2507.01691	translate	read	null
2025-07-02	AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training	Zhenyu Han et.al.	2507.01663	translate	read	null
2025-07-02	Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning	Wu Fei et.al.	2507.01551	translate	read	null
2025-07-02	Chargax: A JAX Accelerated EV Charging Simulator	Koen Ponse et.al.	2507.01522	translate	read	null
2025-07-02	Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning	Yanfei Zhang et.al.	2507.01489	translate	read	null
2025-07-01	SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning	Bo Liu et.al.	2506.24119	translate	read	link
2025-07-01	Adapt Your Body: Mitigating Proprioception Shifts in Imitation Learning	Fuhang Kuang et.al.	2506.23944	translate	read	null

(<a href=../Reinforcement_Learning.md>back to Reinforcement Learning</a>)