Reinforcement Learning - 2025-04

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-04-30	DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition	Z. Z. Ren et.al.	2504.21801	translate	read	link
2025-04-30	Reconciling Discrete-Time Mixed Policies and Continuous-Time Relaxed Controls in Reinforcement Learning and Stochastic Control	Rene Carmona et.al.	2504.21793	translate	read	null
2025-04-30	MAGNET: an open-source library for mesh agglomeration by Graph Neural Networks	Paola F. Antonietti et.al.	2504.21780	translate	read	null
2025-04-30	LLM-based Interactive Imitation Learning for Robotic Manipulation	Jonas Werner et.al.	2504.21769	translate	read	null
2025-04-30	LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning	Yiyang Shao et.al.	2504.21738	translate	read	null
2025-04-30	Adaptive 3D UI Placement in Mixed Reality Using Deep Reinforcement Learning	Feiyu Lu et.al.	2504.21731	translate	read	null
2025-04-30	MovementVR: An open-source tool for the study of motor control and learning in virtual reality	Cristina Rossi et.al.	2504.21696	translate	read	null
2025-04-30	Designing Control Barrier Function via Probabilistic Enumeration for Safe Reinforcement Learning Navigation	Luca Marzari et.al.	2504.21643	translate	read	null
2025-04-30	Multi-Goal Dexterous Hand Manipulation using Probabilistic Model-based Reinforcement Learning	Yingzhuo Jiang et.al.	2504.21585	translate	read	null
2025-04-30	SimPRIVE: a Simulation framework for Physical Robot Interaction with Virtual Environments	Federico Nesti et.al.	2504.21454	translate	read	null
2025-04-29	Toward Efficient Exploration by Large Language Model Agents	Dilip Arumugam et.al.	2504.20997	translate	read	null
2025-04-29	XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search	Yiting Zhang et.al.	2504.20969	translate	read	null
2025-04-29	Improvements of Dark Experience Replay and Reservoir Sampling towards Better Balance between Consolidation and Plasticity	Taisuke Kobayashi et.al.	2504.20932	translate	read	null
2025-04-29	ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification	Ziqing Fan et.al.	2504.20930	translate	read	link
2025-04-29	Exploiting inter-agent coupling information for efficient reinforcement learning of cooperative LQR	Shahbaz P Qadri Syed et.al.	2504.20927	translate	read	null
2025-04-29	A Domain-Agnostic Scalable AI Safety Ensuring Framework	Beomjun Kim et.al.	2504.20924	translate	read	null
2025-04-29	Reinforcement Learning for LLM Reasoning Under Memory Constraints	Alan Lee et.al.	2504.20834	translate	read	null
2025-04-29	A Teacher-Student MPC-PPO Coupled Reinforcement Learning Framework for Winter Temperature Control of Solar Greenhouses in Northern China	Jingxin Yu et.al.	2504.20815	translate	read	null
2025-04-29	SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings	Florian Vahl et.al.	2504.20808	translate	read	null
2025-04-29	Q-Fusion: Diffusing Quantum Circuits	Collin Beaudoin et.al.	2504.20794	translate	read	null
2025-04-28	SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning	Wufei Ma et.al.	2504.20024	translate	read	null
2025-04-28	Socially-Aware Autonomous Driving: Inferring Yielding Intentions for Safer Interactions	Jing Wang et.al.	2504.20004	translate	read	null
2025-04-28	Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets	Adam Younsi et.al.	2504.19981	translate	read	null
2025-04-28	Mesh-Learner: Texturing Mesh with Spherical Harmonics	Yunfei Wan et.al.	2504.19938	translate	read	null
2025-04-28	Automated decision-making for dynamic task assignment at scale	Riccardo Lo Bianco et.al.	2504.19933	translate	read	null
2025-04-28	GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets	Mingqian He et.al.	2504.19898	translate	read	null
2025-04-28	Optimizing the Charging of Open Quantum Batteries using Long Short-Term Memory-Driven Reinforcement Learning	Shadab Zakavati et.al.	2504.19840	translate	read	null
2025-04-28	LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects	Guangyi Liu et.al.	2504.19838	translate	read	link
2025-04-28	Reinforcement Learning-Based Heterogeneous Multi-Task Optimization in Semantic Broadcast Communications	Zhilin Lu et.al.	2504.19806	translate	read	null
2025-04-28	Model-based controller assisted domain randomization in deep reinforcement learning: application to nonlinear powertrain control	Heisei Yonezawa et.al.	2504.19715	translate	read	null
2025-04-25	Generalization Capability for Imitation Learning	Yixiao Wang et.al.	2504.18538	translate	read	null
2025-04-25	Intelligent Attacks and Defense Methods in Federated Learning-enabled Energy-Efficient Wireless Networks	Han Zhang et.al.	2504.18519	translate	read	null
2025-04-25	Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation	Peiyuan Jing et.al.	2504.18453	translate	read	null
2025-04-25	Pushing the boundary on Natural Language Inference	Pablo Miralles-González et.al.	2504.18376	translate	read	null
2025-04-25	Explainable AI for UAV Mobility Management: A Deep Q-Network Approach for Handover Minimization	Irshad A. Meer et.al.	2504.18371	translate	read	null
2025-04-25	Deep Reinforcement Learning Based Navigation with Macro Actions and Topological Maps	Simon Hakenes et.al.	2504.18300	translate	read	null
2025-04-25	Depth-Constrained ASV Navigation with Deep RL and Limited Sensing	Amirhossein Zhalehmehrabi et.al.	2504.18253	translate	read	null
2025-04-25	Aligning Language Models for Icelandic Legal Text Summarization	Þórir Hrafn Harðarson et.al.	2504.18180	translate	read	null
2025-04-25	Offline Learning of Controllable Diverse Behaviors	Mathieu Petitbois et.al.	2504.18160	translate	read	null
2025-04-25	Learning from Less: SINDy Surrogates in RL	Aniket Dixit et.al.	2504.18113	translate	read	null
2025-04-24	Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control	Haochen Wang et.al.	2504.17771	translate	read	null
2025-04-24	Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence	Edward Collins et.al.	2504.17703	translate	read	null
2025-04-24	Applied Sheaf Theory For Multi-agent Artificial Intelligence (Reinforcement Learning) Systems: A Prospectus	Eric Schmid et.al.	2504.17700	translate	read	null
2025-04-24	SAPO-RL: Sequential Actuator Placement Optimization for Fuselage Assembly via Reinforcement Learning	Peng Ye et.al.	2504.17603	translate	read	null
2025-04-24	Mitigating xApp conflicts for efficient network slicing in 6G O-RAN: a graph convolutional-based attention network approach	Sihem Bakri et.al.	2504.17590	translate	read	null
2025-04-24	Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization	Hongshu Guo et.al.	2504.17578	translate	read	null
2025-04-24	Cooperative Task Offloading through Asynchronous Deep Reinforcement Learning in Mobile Edge Computing for Future Networks	Yuelin Liu et.al.	2504.17526	translate	read	null
2025-04-24	Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning	Mingqi Yuan et.al.	2504.17490	translate	read	null
2025-04-24	Comprehend, Divide, and Conquer: Feature Subspace Exploration via Multi-Agent Hierarchical Reinforcement Learning	Weiliang Zhang et.al.	2504.17356	translate	read	null
2025-04-24	Collaborative Multi-Agent Reinforcement Learning for Automated Feature Transformation with Graph-Driven Path Optimization	Xiaohan Huang et.al.	2504.17355	translate	read	null
2025-04-23	Latent Diffusion Planning for Imitation Learning	Amber Xie et.al.	2504.16925	translate	read	null
2025-04-23	Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms	Hsin-Jung Yang et.al.	2504.16916	translate	read	null
2025-04-23	Hybrid Reinforcement Learning and Model Predictive Control for Adaptive Control of Hydrogen-Diesel Dual-Fuel Combustion	Julian Bedei et.al.	2504.16875	translate	read	null
2025-04-23	Monte Carlo Planning with Large Language Model for Text-Based Game Agents	Zijing Shi et.al.	2504.16855	translate	read	null
2025-04-23	SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward	Nicolas Jonason et.al.	2504.16839	translate	read	null
2025-04-23	MEC Task Offloading in AIoT: A User-Centric DRL Model Splitting Inference Scheme	Weixi Li et.al.	2504.16729	translate	read	null
2025-04-23	PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation	Wenxuan Li et.al.	2504.16693	translate	read	null
2025-04-23	Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator	Chenhao Li et.al.	2504.16680	translate	read	null
2025-04-23	Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning	Chris et.al.	2504.16656	translate	read	link
2025-04-23	Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models	Fredy Pokou et.al.	2504.16635	translate	read	null
2025-04-22	TTRL: Test-Time Reinforcement Learning	Yuxin Zuo et.al.	2504.16084	translate	read	link
2025-04-22	LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities	Thomas Schmied et.al.	2504.16078	translate	read	null
2025-04-22	Reinforcement Learning and Metaheuristics for Feynman Integral Reduction	Mao Zeng et.al.	2504.16045	translate	read	null
2025-04-22	The Formation of Production Networks: How Supply Chains Arise from Simple Learning with Minimal Information	Tuong Manh Vu et.al.	2504.16010	translate	read	null
2025-04-22	Making Neural Networks More Suitable for Approximate Clifford+T Circuit Synthesis	Mathias Weiden et.al.	2504.15990	translate	read	null
2025-04-22	Neuroadaptive Haptics: Comparing Reinforcement Learning from Explicit Ratings and Neural Signals for Adaptive XR Systems	Lukas Gehrke et.al.	2504.15984	translate	read	null
2025-04-22	Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning	Wang Lin et.al.	2504.15932	translate	read	null
2025-04-22	StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation	Yinmin Zhong et.al.	2504.15930	translate	read	null
2025-04-22	New Recipe for Semi-supervised Community Detection: Clique Annealing under Crystallization Kinetics	Ling Cheng et.al.	2504.15927	translate	read	null
2025-04-22	GraphEdge: Dynamic Graph Partition and Task Scheduling for GNNs Computing in Edge Network	Wenjing Xiao et.al.	2504.15905	translate	read	null
2025-04-21	VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models	Weiye Xu et.al.	2504.15279	translate	read	null
2025-04-21	Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning	Jie Cheng et.al.	2504.15275	translate	read	link
2025-04-21	FlowReasoner: Reinforcing Query-Level Meta-Agents	Hongcheng Gao et.al.	2504.15257	translate	read	link
2025-04-21	DRAGON: Distributional Rewards Optimize Diffusion Generative Models	Yatong Bai et.al.	2504.15217	translate	read	null
2025-04-21	Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs	Marina Sakharova et.al.	2504.15210	translate	read	null
2025-04-21	Beyond Binary Opinions: A Deep Reinforcement Learning-Based Approach to Uncertainty-Aware Competitive Influence Maximization	Qi Zhang et.al.	2504.15131	translate	read	null
2025-04-21	A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment	Kangyao Huang et.al.	2504.15129	translate	read	null
2025-04-21	Fast-Slow Co-advancing Optimizer: Toward Harmonious Adversarial Training of GAN	Lin Wang et.al.	2504.15099	translate	read	null
2025-04-21	Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL	Simone Papicchio et.al.	2504.15077	translate	read	null
2025-04-21	Energy-Efficient UAV-Mounted RIS for IoT: A Hybrid Energy Harvesting and DRL Approach	Mahmoud M. Salim et.al.	2504.15043	translate	read	null
2025-04-18	Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?	Yang Yue et.al.	2504.13837	translate	read	link
2025-04-18	Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning	Yixuan Even Xu et.al.	2504.13818	translate	read	null
2025-04-18	DiffOG: Differentiable Policy Trajectory Optimization with Generalizability	Zhengtong Xu et.al.	2504.13807	translate	read	null
2025-04-18	Imitation Learning with Precisely Labeled Human Demonstrations	Yilong Song et.al.	2504.13803	translate	read	null
2025-04-18	Bake Two Cakes with One Oven: RL for Defusing Popularity Bias and Cold-start in Third-Party Library Recommendations	Minh Hoang Vuong et.al.	2504.13772	translate	read	null
2025-04-18	A Reinforcement Learning Method to Factual and Counterfactual Explanations for Session-based Recommendation	Han Zhou et.al.	2504.13632	translate	read	null
2025-04-18	Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning	Rohan P. Singh et.al.	2504.13619	translate	read	null
2025-04-18	On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting	Niklas Funk et.al.	2504.13618	translate	read	null
2025-04-18	Compile Scene Graphs with Reinforcement Learning	Zuyao Chen et.al.	2504.13617	translate	read	null
2025-04-18	Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling	Zihao Feng et.al.	2504.13592	translate	read	null
2025-04-17	Energy-Based Reward Models for Robust Language Model Alignment	Anamika Lochab et.al.	2504.13134	translate	read	null
2025-04-17	LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard	Varun Rao et.al.	2504.13125	translate	read	null
2025-04-17	SkyReels-V2: Infinite-length Film Generative Model	Guibin Chen et.al.	2504.13074	translate	read	link
2025-04-17	NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation	Xiangyan Liu et.al.	2504.13055	translate	read	link
2025-04-17	InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning	Zheng Wang et.al.	2504.13032	translate	read	null
2025-04-17	QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning?	Zhouyang Jiang et.al.	2504.12961	translate	read	null
2025-04-17	RL-PINNs: Reinforcement Learning-Driven Adaptive Sampling for Efficient Training of PINNs	Zhenao Song et.al.	2504.12949	translate	read	null
2025-04-17	Image-Editing Specialists: An RLAIF Approach for Diffusion Models	Elior Benarous et.al.	2504.12833	translate	read	link
2025-04-17	Multi-Agent Reinforcement Learning Simulation for Environmental Policy Synthesis	James Rudd-Jones et.al.	2504.12777	translate	read	null
2025-04-17	GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks	Hao Xu et.al.	2504.12764	translate	read	link
2025-04-16	Adapting a World Model for Trajectory Following in a 3D Game	Marko Tot et.al.	2504.12299	translate	read	null
2025-04-16	d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning	Siyan Zhao et.al.	2504.12216	translate	read	link
2025-04-16	Reasoning-Based AI for Startup Evaluation (R.A.I.S.E.): A Memory-Augmented, Multi-Step Decision Framework	Jack Preuveneers et.al.	2504.12090	translate	read	null
2025-04-16	pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild	Jonas Myhre Schiøtt et.al.	2504.12045	translate	read	null
2025-04-16	Evolutionary Reinforcement Learning for Interpretable Decision-Making in Supply Chain Management	Stefano Genetti et.al.	2504.12023	translate	read	null
2025-04-16	Control of Rayleigh-Bénard Convection: Effectiveness of Reinforcement Learning in the Turbulent Regime	Thorben Markmann et.al.	2504.12000	translate	read	null
2025-04-16	A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs	Kihyuk Hong et.al.	2504.11997	translate	read	null
2025-04-16	Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions	Yifei Dong et.al.	2504.11967	translate	read	null
2025-04-16	R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors	Haoyang Wang et.al.	2504.11946	translate	read	null
2025-04-16	VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning	Xuyang Chen et.al.	2504.11944	translate	read	null
2025-04-15	DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning	Zhiwei He et.al.	2504.11456	translate	read	link
2025-04-15	A Clean Slate for Offline Reinforcement Learning	Matthew Thomas Jackson et.al.	2504.11453	translate	read	null
2025-04-15	Embodied World Models Emerge from Navigational Task in Open-Ended Environments	Li Jin et.al.	2504.11419	translate	read	null
2025-04-15	Measures of Variability for Risk-averse Policy Gradient	Yudong Luo et.al.	2504.11412	translate	read	null
2025-04-15	Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning	Haiming Wang et.al.	2504.11354	translate	read	null
2025-04-15	A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce	Wei Xiong et.al.	2504.11343	translate	read	link
2025-04-15	Multi-Agent Reinforcement Learning for Greenhouse Gas Offset Credit Markets	Liam Welsh et.al.	2504.11258	translate	read	null
2025-04-15	A Rollout-Based Algorithm and Reward Function for Efficient Resource Allocation in Business Processes	Jeroen Middelhuis et.al.	2504.11250	translate	read	null
2025-04-15	Next-Future: Sample-Efficient Policy Learning for Robotic-Arm Tasks	Fikrican Özgür et.al.	2504.11247	translate	read	null
2025-04-15	Revealing Covert Attention by Analyzing Human and Reinforcement Learning Agent Gameplay	Henrik Krauss et.al.	2504.11118	translate	read	null
2025-04-14	Weight Ensembling Improves Reasoning in Language Models	Xingyu Dang et.al.	2504.10478	translate	read	null
2025-04-14	Co-optimizing Physical Reconfiguration Parameters and Controllers for an Origami-inspired Reconfigurable Manipulator	Zhe Chen et.al.	2504.10474	translate	read	null
2025-04-14	GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents	Xiaobo Xia et.al.	2504.10458	translate	read	link
2025-04-14	The Communication and Computation Trade-off in Wireless Semantic Communications	Xuyang Chen et.al.	2504.10357	translate	read	null
2025-04-14	Heimdall: test-time scaling on the generative verification	Wenlei Shi et.al.	2504.10337	translate	read	null
2025-04-14	Flying Hand: End-Effector-Centric Framework for Versatile Aerial Manipulation Teleoperation and Policy Learning	Guanqi He et.al.	2504.10334	translate	read	null
2025-04-14	InstructEngine: Instruction-driven Text-to-Image Alignment	Xingyu Lu et.al.	2504.10329	translate	read	null
2025-04-14	Vision based driving agent for race car simulation environments	Gergely Bári et.al.	2504.10266	translate	read	null
2025-04-14	Adaptive Sensor Steering Strategy Using Deep Reinforcement Learning for Dynamic Data Acquisition in Digital Twins	Collins O. Ogbodo et.al.	2504.10248	translate	read	null
2025-04-14	Deep Reasoning Translation via Reinforcement Learning	Jiaan Wang et.al.	2504.10187	translate	read	null
2025-04-11	Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing	Vinal Asodia et.al.	2504.08704	translate	read	null
2025-04-11	Pobogot – An Open-Hardware Open-Source Low Cost Robot for Swarm Robotics	Alessia Loi et.al.	2504.08686	translate	read	null
2025-04-11	Reinforcement Learning-Driven Plant-Wide Refinery Planning Using Model Decomposition	Zhouchang Li et.al.	2504.08642	translate	read	null
2025-04-11	Neural Fidelity Calibration for Informative Sim-to-Real Adaptation	Youwei Yu et.al.	2504.08604	translate	read	null
2025-04-11	SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning	Peixian Ma et.al.	2504.08600	translate	read	link
2025-04-11	Playpen: An Environment for Exploring Learning Through Conversational Interaction	Nicola Horst et.al.	2504.08590	translate	read	link
2025-04-11	Slicing the Gaussian Mixture Wasserstein Distance	Moritz Piening et.al.	2504.08544	translate	read	null
2025-04-11	Diffusion Models for Robotic Manipulation: A Survey	Rosa Wolf et.al.	2504.08438	translate	read	null
2025-04-11	Belief States for Cooperative Multi-Agent Reinforcement Learning under Partial Observability	Paul J. Pritz et.al.	2504.08417	translate	read	null
2025-04-11	Scalable Conflict-free Decision Making with Photons	Kohei Konaka et.al.	2504.08331	translate	read	null
2025-04-10	Perception-R1: Pioneering Perception Policy with Reinforcement Learning	En Yu et.al.	2504.07954	translate	read	link
2025-04-10	Echo: An Open-Source, Low-Cost Teleoperation System with Force Feedback for Dataset Collection in Robot Learning	Artem Bazhenov et.al.	2504.07939	translate	read	null
2025-04-10	Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining	Rosie Zhao et.al.	2504.07912	translate	read	link
2025-04-10	Fast Adaptation with Behavioral Foundation Models	Harshit Sikchi et.al.	2504.07896	translate	read	null
2025-04-10	2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization	Mengyang Li et.al.	2504.07856	translate	read	null
2025-04-10	Genetic Programming with Reinforcement Learning Trained Transformer for Real-World Dynamic Scheduling Problems	Xian Chen et.al.	2504.07779	translate	read	null
2025-04-10	Harnessing Equivariance: Modeling Turbulence with Graph Neural Networks	Marius Kurz et.al.	2504.07741	translate	read	null
2025-04-10	Relaxing the Markov Requirements on Reinforcement Learning Under Weak Partial Ignorability	MaryLena Bleile et.al.	2504.07722	translate	read	null
2025-04-10	Sim-to-Real Transfer in Reinforcement Learning for Maneuver Control of a Variable-Pitch MAV	Zhikun Wang et.al.	2504.07694	translate	read	null
2025-04-10	VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model	Haozhan Shen et.al.	2504.07615	translate	read	link
2025-04-09	Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning	Chenjie Hao et.al.	2504.07095	translate	read	link
2025-04-09	AssistanceZero: Scalably Solving Assistance Games	Cassidy Laidlaw et.al.	2504.07091	translate	read	link
2025-04-09	A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility	Andreas Hochlehnert et.al.	2504.07086	translate	read	link
2025-04-09	To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning	Tian Qin et.al.	2504.07052	translate	read	null
2025-04-09	Free Random Projection for In-Context Reinforcement Learning	Tomohiro Hayase et.al.	2504.06983	translate	read	null
2025-04-09	VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning	Xinhao Li et.al.	2504.06958	translate	read	link
2025-04-09	Regret Bounds for Robust Online Decision Making	Alexander Appel et.al.	2504.06820	translate	read	null
2025-04-09	Interactive Expressive Motion Generation Using Dynamic Movement Primitives	Till Hielscher et.al.	2504.06735	translate	read	null
2025-04-09	Learning global control of underactuated systems with Model-Based Reinforcement Learning	Niccolò Turcato et.al.	2504.06721	translate	read	null
2025-04-09	SDHN: Skewness-Driven Hypergraph Networks for Enhanced Localized Multi-Robot Coordination	Delin Zhao et.al.	2504.06684	translate	read	null
2025-04-08	ViTaMIn: Learning Contact-Rich Tasks Through Robot-Free Visuo-Tactile Manipulation Interface	Fangchen Liu et.al.	2504.06156	translate	read	null
2025-04-08	Adversarial Training of Reward Models	Alexander Bukharin et.al.	2504.06141	translate	read	null
2025-04-08	A Multimedia Analytics Model for the Foundation Model Era	Marcel Worring et.al.	2504.06138	translate	read	null
2025-04-08	Accelerating Vehicle Routing via AI-Initialized Genetic Algorithms	Ido Greenberg et.al.	2504.06126	translate	read	null
2025-04-08	Robo-taxi Fleet Coordination at Scale via Reinforcement Learning	Luigi Tresca et.al.	2504.06125	translate	read	link
2025-04-09	Leanabell-Prover: Posttraining Scaling in Formal Reasoning	Jingyuan Zhang et.al.	2504.06122	translate	read	link
2025-04-08	Trust-Region Twisted Policy Improvement	Joery A. de Vries et.al.	2504.06048	translate	read	null
2025-04-08	Information-Theoretic Reward Decomposition for Generalizable RLHF	Liyuan Mao et.al.	2504.06020	translate	read	null
2025-04-08	Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models	J. S. van Hulst et.al.	2504.05978	translate	read	null
2025-04-08	AEGIS: Human Attention-based Explainable Guidance for Intelligent Vehicle Systems	Zhuoli Zhuang et.al.	2504.05950	translate	read	null
2025-04-07	RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception	Hui Zhang et.al.	2504.05287	translate	read	link
2025-04-07	Concise Reasoning via Reinforcement Learning	Mehdi Fatemi et.al.	2504.05185	translate	read	link
2025-04-07	Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval	Kidist Amde Mekonnen et.al.	2504.05181	translate	read	link
2025-04-07	RLBayes: a Bayesian Network Structure Learning Algorithm via Reinforcement Learning-Based Search Strategy	Mingcan Wang et.al.	2504.05167	translate	read	null
2025-04-07	A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks	Leonardo Kanashiro Felizardo et.al.	2504.05150	translate	read	link
2025-04-08	VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks	Yu Yue et.al.	2504.05118	translate	read	null
2025-04-07	Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning	Anja Surina et.al.	2504.05108	translate	read	null
2025-04-08	Attention-Augmented Inverse Reinforcement Learning with Graph Convolutions for Multi-Agent Task Allocation	Huilin Yin et.al.	2504.05045	translate	read	null
2025-04-07	Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning	Bibek Poudel et.al.	2504.05018	translate	read	null
2025-04-07	Wavelet Policy: Imitation Policy Learning in Frequency Domain with Wavelet Transforms	Changchuan Yang et.al.	2504.04991	translate	read	link
2025-04-04	Align to Structure: Aligning Large Language Models with Structural Information	Zae Myung Kim et.al.	2504.03622	translate	read	null
2025-04-04	Optimization of a Triangular Delaunay Mesh Generator using Reinforcement Learning	Will Thacher et.al.	2504.03610	translate	read	null
2025-04-04	Dexterous Manipulation through Imitation Learning: A Survey	Shan An et.al.	2504.03515	translate	read	null
2025-04-04	Learning Dual-Arm Coordination for Grasping Large Flat Objects	Yongliang Wang et.al.	2504.03500	translate	read	null
2025-04-04	Optimizing Quantum Circuits via ZX Diagrams using Reinforcement Learning and Graph Neural Networks	Alexander Mattick et.al.	2504.03429	translate	read	null
2025-04-04	DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models	Sathish Kumar et.al.	2504.03423	translate	read	null
2025-04-04	Autonomous state-space segmentation for Deep-RL sparse reward scenarios	Gianluca Maselli et.al.	2504.03420	translate	read	null
2025-04-04	Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning	Sanghwan Bae et.al.	2504.03380	translate	read	null
2025-04-04	Verification of Autonomous Neural Car Control with KeYmaera X	Enguerrand Prebet et.al.	2504.03272	translate	read	null
2025-04-04	Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward	Yanming Wan et.al.	2504.03206	translate	read	null
2025-04-03	Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets	Chuning Zhu et.al.	2504.02792	translate	read	link
2025-04-03	A Numerically Efficient Method to Enhance Model Predictive Control Performance with a Reinforcement Learning Policy	Andrea Ghezzi et.al.	2504.02710	translate	read	null
2025-04-03	Handover and SINR-Aware Path Optimization in 5G-UAV mmWave Communication using DRL	Achilles Kiwanuka Machumilane et.al.	2504.02688	translate	read	null
2025-04-03	Integrating Human Knowledge Through Action Masking in Reinforcement Learning for Operations Research	Mirko Stappert et.al.	2504.02662	translate	read	null
2025-04-03	SymDQN: Symbolic Knowledge and Reasoning in Neural Network-based Reinforcement Learning	Ivo Amador et.al.	2504.02654	translate	read	null
2025-04-03	Solving the Paint Shop Problem with Flexible Management of Multi-Lane Buffers Using Reinforcement Learning and Action Masking	Mirko Stappert et.al.	2504.02644	translate	read	null
2025-04-03	Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving	Daoguang Zan et.al.	2504.02605	translate	read	link
2025-04-03	Regulating Spatial Fairness in a Tripartite Micromobility Sharing System via Reinforcement Learning	Matteo Cederle et.al.	2504.02597	translate	read	null
2025-04-03	LexPam: Legal Procedure Awareness-Guided Mathematical Reasoning	Kepu Zhang et.al.	2504.02590	translate	read	null
2025-04-04	Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme	Yan Ma et.al.	2504.02587	translate	read	link
2025-04-02	OpenCodeReasoning: Advancing Data Distillation for Competitive Coding	Wasi Uddin Ahmad et.al.	2504.01943	translate	read	null
2025-04-02	Overcoming Deceptiveness in Fitness Optimization with Unsupervised Quality-Diversity	Lisa Coiffard et.al.	2504.01915	translate	read	null
2025-04-02	GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning	Yanzhou Su et.al.	2504.01886	translate	read	link
2025-04-02	Interpreting Emergent Planning in Model-Free Reinforcement Learning	Thomas Bush et.al.	2504.01871	translate	read	null
2025-04-02	Learning with Imperfect Models: When Multi-step Prediction Mitigates Compounding Error	Anne Somalwar et.al.	2504.01766	translate	read	null
2025-04-03	Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning	Ke Jiang et.al.	2504.01719	translate	read	null
2025-04-02	ToM-RL: Reinforcement Learning Unlocks Theory of Mind in Small LLMs	Yi-Long Lu et.al.	2504.01698	translate	read	null
2025-04-02	8-DoFs Cable Driven Parallel Robots for Bimanual Teleportation	Hung Hon Cheng et.al.	2504.01554	translate	read	null
2025-04-02	A Robust Model-Based Approach for Continuous-Time Policy Evaluation with Unknown Lévy Process Dynamics	Qihao Ye et.al.	2504.01482	translate	read	null
2025-04-02	Probabilistic Curriculum Learning for Goal-Based Reinforcement Learning	Llewyn Salt et.al.	2504.01459	translate	read	null
