Reinforcement Learning - 2025-04

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
| --- | --- | --- | --- | --- | --- | --- |
| 2025-04-30 | DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | Z. Z. Ren et.al. | 2504.21801 | translate | read | link |
| 2025-04-30 | Reconciling Discrete-Time Mixed Policies and Continuous-Time Relaxed Controls in Reinforcement Learning and Stochastic Control | Rene Carmona et.al. | 2504.21793 | translate | read | null |
| 2025-04-30 | MAGNET: an open-source library for mesh agglomeration by Graph Neural Networks | Paola F. Antonietti et.al. | 2504.21780 | translate | read | null |
| 2025-04-30 | LLM-based Interactive Imitation Learning for Robotic Manipulation | Jonas Werner et.al. | 2504.21769 | translate | read | null |
| 2025-04-30 | LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning | Yiyang Shao et.al. | 2504.21738 | translate | read | null |
| 2025-04-30 | Adaptive 3D UI Placement in Mixed Reality Using Deep Reinforcement Learning | Feiyu Lu et.al. | 2504.21731 | translate | read | null |
| 2025-04-30 | MovementVR: An open-source tool for the study of motor control and learning in virtual reality | Cristina Rossi et.al. | 2504.21696 | translate | read | null |
| 2025-04-30 | Designing Control Barrier Function via Probabilistic Enumeration for Safe Reinforcement Learning Navigation | Luca Marzari et.al. | 2504.21643 | translate | read | null |
| 2025-04-30 | Multi-Goal Dexterous Hand Manipulation using Probabilistic Model-based Reinforcement Learning | Yingzhuo Jiang et.al. | 2504.21585 | translate | read | null |
| 2025-04-30 | SimPRIVE: a Simulation framework for Physical Robot Interaction with Virtual Environments | Federico Nesti et.al. | 2504.21454 | translate | read | null |
| 2025-04-29 | Toward Efficient Exploration by Large Language Model Agents | Dilip Arumugam et.al. | 2504.20997 | translate | read | null |
| 2025-04-29 | XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search | Yiting Zhang et.al. | 2504.20969 | translate | read | null |
| 2025-04-29 | Improvements of Dark Experience Replay and Reservoir Sampling towards Better Balance between Consolidation and Plasticity | Taisuke Kobayashi et.al. | 2504.20932 | translate | read | null |
| 2025-04-29 | ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Ziqing Fan et.al. | 2504.20930 | translate | read | link |
| 2025-04-29 | Exploiting inter-agent coupling information for efficient reinforcement learning of cooperative LQR | Shahbaz P Qadri Syed et.al. | 2504.20927 | translate | read | null |
| 2025-04-29 | A Domain-Agnostic Scalable AI Safety Ensuring Framework | Beomjun Kim et.al. | 2504.20924 | translate | read | null |
| 2025-04-29 | Reinforcement Learning for LLM Reasoning Under Memory Constraints | Alan Lee et.al. | 2504.20834 | translate | read | null |
| 2025-04-29 | A Teacher-Student MPC-PPO Coupled Reinforcement Learning Framework for Winter Temperature Control of Solar Greenhouses in Northern China | Jingxin Yu et.al. | 2504.20815 | translate | read | null |
| 2025-04-29 | SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings | Florian Vahl et.al. | 2504.20808 | translate | read | null |
| 2025-04-29 | Q-Fusion: Diffusing Quantum Circuits | Collin Beaudoin et.al. | 2504.20794 | translate | read | null |
| 2025-04-28 | SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | Wufei Ma et.al. | 2504.20024 | translate | read | null |
| 2025-04-28 | Socially-Aware Autonomous Driving: Inferring Yielding Intentions for Safer Interactions | Jing Wang et.al. | 2504.20004 | translate | read | null |
| 2025-04-28 | Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Adam Younsi et.al. | 2504.19981 | translate | read | null |
| 2025-04-28 | Mesh-Learner: Texturing Mesh with Spherical Harmonics | Yunfei Wan et.al. | 2504.19938 | translate | read | null |
| 2025-04-28 | Automated decision-making for dynamic task assignment at scale | Riccardo Lo Bianco et.al. | 2504.19933 | translate | read | null |
| 2025-04-28 | GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets | Mingqian He et.al. | 2504.19898 | translate | read | null |
| 2025-04-28 | Optimizing the Charging of Open Quantum Batteries using Long Short-Term Memory-Driven Reinforcement Learning | Shadab Zakavati et.al. | 2504.19840 | translate | read | null |
| 2025-04-28 | LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects | Guangyi Liu et.al. | 2504.19838 | translate | read | link |
| 2025-04-28 | Reinforcement Learning-Based Heterogeneous Multi-Task Optimization in Semantic Broadcast Communications | Zhilin Lu et.al. | 2504.19806 | translate | read | null |
| 2025-04-28 | Model-based controller assisted domain randomization in deep reinforcement learning: application to nonlinear powertrain control | Heisei Yonezawa et.al. | 2504.19715 | translate | read | null |
| 2025-04-25 | Generalization Capability for Imitation Learning | Yixiao Wang et.al. | 2504.18538 | translate | read | null |
| 2025-04-25 | Intelligent Attacks and Defense Methods in Federated Learning-enabled Energy-Efficient Wireless Networks | Han Zhang et.al. | 2504.18519 | translate | read | null |
| 2025-04-25 | Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation | Peiyuan Jing et.al. | 2504.18453 | translate | read | null |
| 2025-04-25 | Pushing the boundary on Natural Language Inference | Pablo Miralles-González et.al. | 2504.18376 | translate | read | null |
| 2025-04-25 | Explainable AI for UAV Mobility Management: A Deep Q-Network Approach for Handover Minimization | Irshad A. Meer et.al. | 2504.18371 | translate | read | null |
| 2025-04-25 | Deep Reinforcement Learning Based Navigation with Macro Actions and Topological Maps | Simon Hakenes et.al. | 2504.18300 | translate | read | null |
| 2025-04-25 | Depth-Constrained ASV Navigation with Deep RL and Limited Sensing | Amirhossein Zhalehmehrabi et.al. | 2504.18253 | translate | read | null |
| 2025-04-25 | Aligning Language Models for Icelandic Legal Text Summarization | Þórir Hrafn Harðarson et.al. | 2504.18180 | translate | read | null |
| 2025-04-25 | Offline Learning of Controllable Diverse Behaviors | Mathieu Petitbois et.al. | 2504.18160 | translate | read | null |
| 2025-04-25 | Learning from Less: SINDy Surrogates in RL | Aniket Dixit et.al. | 2504.18113 | translate | read | null |
| 2025-04-24 | Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control | Haochen Wang et.al. | 2504.17771 | translate | read | null |
| 2025-04-24 | Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence | Edward Collins et.al. | 2504.17703 | translate | read | null |
| 2025-04-24 | Applied Sheaf Theory For Multi-agent Artificial Intelligence (Reinforcement Learning) Systems: A Prospectus | Eric Schmid et.al. | 2504.17700 | translate | read | null |
| 2025-04-24 | SAPO-RL: Sequential Actuator Placement Optimization for Fuselage Assembly via Reinforcement Learning | Peng Ye et.al. | 2504.17603 | translate | read | null |
| 2025-04-24 | Mitigating xApp conflicts for efficient network slicing in 6G O-RAN: a graph convolutional-based attention network approach | Sihem Bakri et.al. | 2504.17590 | translate | read | null |
| 2025-04-24 | Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization | Hongshu Guo et.al. | 2504.17578 | translate | read | null |
| 2025-04-24 | Cooperative Task Offloading through Asynchronous Deep Reinforcement Learning in Mobile Edge Computing for Future Networks | Yuelin Liu et.al. | 2504.17526 | translate | read | null |
| 2025-04-24 | Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning | Mingqi Yuan et.al. | 2504.17490 | translate | read | null |
| 2025-04-24 | Comprehend, Divide, and Conquer: Feature Subspace Exploration via Multi-Agent Hierarchical Reinforcement Learning | Weiliang Zhang et.al. | 2504.17356 | translate | read | null |
| 2025-04-24 | Collaborative Multi-Agent Reinforcement Learning for Automated Feature Transformation with Graph-Driven Path Optimization | Xiaohan Huang et.al. | 2504.17355 | translate | read | null |
| 2025-04-23 | Latent Diffusion Planning for Imitation Learning | Amber Xie et.al. | 2504.16925 | translate | read | null |
| 2025-04-23 | Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms | Hsin-Jung Yang et.al. | 2504.16916 | translate | read | null |
| 2025-04-23 | Hybrid Reinforcement Learning and Model Predictive Control for Adaptive Control of Hydrogen-Diesel Dual-Fuel Combustion | Julian Bedei et.al. | 2504.16875 | translate | read | null |
| 2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et.al. | 2504.16855 | translate | read | null |
| 2025-04-23 | SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward | Nicolas Jonason et.al. | 2504.16839 | translate | read | null |
| 2025-04-23 | MEC Task Offloading in AIoT: A User-Centric DRL Model Splitting Inference Scheme | Weixi Li et.al. | 2504.16729 | translate | read | null |
| 2025-04-23 | PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation | Wenxuan Li et.al. | 2504.16693 | translate | read | null |
| 2025-04-23 | Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator | Chenhao Li et.al. | 2504.16680 | translate | read | null |
| 2025-04-23 | Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning | Chris et.al. | 2504.16656 | translate | read | link |
| 2025-04-23 | Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models | Fredy Pokou et.al. | 2504.16635 | translate | read | null |
| 2025-04-22 | TTRL: Test-Time Reinforcement Learning | Yuxin Zuo et.al. | 2504.16084 | translate | read | link |
| 2025-04-22 | LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities | Thomas Schmied et.al. | 2504.16078 | translate | read | null |
| 2025-04-22 | Reinforcement Learning and Metaheuristics for Feynman Integral Reduction | Mao Zeng et.al. | 2504.16045 | translate | read | null |
| 2025-04-22 | The Formation of Production Networks: How Supply Chains Arise from Simple Learning with Minimal Information | Tuong Manh Vu et.al. | 2504.16010 | translate | read | null |
| 2025-04-22 | Making Neural Networks More Suitable for Approximate Clifford+T Circuit Synthesis | Mathias Weiden et.al. | 2504.15990 | translate | read | null |
| 2025-04-22 | Neuroadaptive Haptics: Comparing Reinforcement Learning from Explicit Ratings and Neural Signals for Adaptive XR Systems | Lukas Gehrke et.al. | 2504.15984 | translate | read | null |
| 2025-04-22 | Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning | Wang Lin et.al. | 2504.15932 | translate | read | null |
| 2025-04-22 | StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation | Yinmin Zhong et.al. | 2504.15930 | translate | read | null |
| 2025-04-22 | New Recipe for Semi-supervised Community Detection: Clique Annealing under Crystallization Kinetics | Ling Cheng et.al. | 2504.15927 | translate | read | null |
| 2025-04-22 | GraphEdge: Dynamic Graph Partition and Task Scheduling for GNNs Computing in Edge Network | Wenjing Xiao et.al. | 2504.15905 | translate | read | null |
| 2025-04-21 | VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | Weiye Xu et.al. | 2504.15279 | translate | read | null |
| 2025-04-21 | Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Jie Cheng et.al. | 2504.15275 | translate | read | link |
| 2025-04-21 | FlowReasoner: Reinforcing Query-Level Meta-Agents | Hongcheng Gao et.al. | 2504.15257 | translate | read | link |
| 2025-04-21 | DRAGON: Distributional Rewards Optimize Diffusion Generative Models | Yatong Bai et.al. | 2504.15217 | translate | read | null |
| 2025-04-21 | Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs | Marina Sakharova et.al. | 2504.15210 | translate | read | null |
| 2025-04-21 | Beyond Binary Opinions: A Deep Reinforcement Learning-Based Approach to Uncertainty-Aware Competitive Influence Maximization | Qi Zhang et.al. | 2504.15131 | translate | read | null |
| 2025-04-21 | A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment | Kangyao Huang et.al. | 2504.15129 | translate | read | null |
| 2025-04-21 | Fast-Slow Co-advancing Optimizer: Toward Harmonious Adversarial Training of GAN | Lin Wang et.al. | 2504.15099 | translate | read | null |
| 2025-04-21 | Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL | Simone Papicchio et.al. | 2504.15077 | translate | read | null |
| 2025-04-21 | Energy-Efficient UAV-Mounted RIS for IoT: A Hybrid Energy Harvesting and DRL Approach | Mahmoud M. Salim et.al. | 2504.15043 | translate | read | null |
| 2025-04-18 | Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | Yang Yue et.al. | 2504.13837 | translate | read | link |
| 2025-04-18 | Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Yixuan Even Xu et.al. | 2504.13818 | translate | read | null |
| 2025-04-18 | DiffOG: Differentiable Policy Trajectory Optimization with Generalizability | Zhengtong Xu et.al. | 2504.13807 | translate | read | null |
| 2025-04-18 | Imitation Learning with Precisely Labeled Human Demonstrations | Yilong Song et.al. | 2504.13803 | translate | read | null |
| 2025-04-18 | Bake Two Cakes with One Oven: RL for Defusing Popularity Bias and Cold-start in Third-Party Library Recommendations | Minh Hoang Vuong et.al. | 2504.13772 | translate | read | null |
| 2025-04-18 | A Reinforcement Learning Method to Factual and Counterfactual Explanations for Session-based Recommendation | Han Zhou et.al. | 2504.13632 | translate | read | null |
| 2025-04-18 | Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning | Rohan P. Singh et.al. | 2504.13619 | translate | read | null |
| 2025-04-18 | On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting | Niklas Funk et.al. | 2504.13618 | translate | read | null |
| 2025-04-18 | Compile Scene Graphs with Reinforcement Learning | Zuyao Chen et.al. | 2504.13617 | translate | read | null |
| 2025-04-18 | Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling | Zihao Feng et.al. | 2504.13592 | translate | read | null |
| 2025-04-17 | Energy-Based Reward Models for Robust Language Model Alignment | Anamika Lochab et.al. | 2504.13134 | translate | read | null |
| 2025-04-17 | LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard | Varun Rao et.al. | 2504.13125 | translate | read | null |
| 2025-04-17 | SkyReels-V2: Infinite-length Film Generative Model | Guibin Chen et.al. | 2504.13074 | translate | read | link |
| 2025-04-17 | NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation | Xiangyan Liu et.al. | 2504.13055 | translate | read | link |
| 2025-04-17 | InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning | Zheng Wang et.al. | 2504.13032 | translate | read | null |
| 2025-04-17 | QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning? | Zhouyang Jiang et.al. | 2504.12961 | translate | read | null |
| 2025-04-17 | RL-PINNs: Reinforcement Learning-Driven Adaptive Sampling for Efficient Training of PINNs | Zhenao Song et.al. | 2504.12949 | translate | read | null |
| 2025-04-17 | Image-Editing Specialists: An RLAIF Approach for Diffusion Models | Elior Benarous et.al. | 2504.12833 | translate | read | link |
| 2025-04-17 | Multi-Agent Reinforcement Learning Simulation for Environmental Policy Synthesis | James Rudd-Jones et.al. | 2504.12777 | translate | read | null |
| 2025-04-17 | GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks | Hao Xu et.al. | 2504.12764 | translate | read | link |
| 2025-04-16 | Adapting a World Model for Trajectory Following in a 3D Game | Marko Tot et.al. | 2504.12299 | translate | read | null |
| 2025-04-16 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | Siyan Zhao et.al. | 2504.12216 | translate | read | link |
| 2025-04-16 | Reasoning-Based AI for Startup Evaluation (R.A.I.S.E.): A Memory-Augmented, Multi-Step Decision Framework | Jack Preuveneers et.al. | 2504.12090 | translate | read | null |
| 2025-04-16 | pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild | Jonas Myhre Schiøtt et.al. | 2504.12045 | translate | read | null |
| 2025-04-16 | Evolutionary Reinforcement Learning for Interpretable Decision-Making in Supply Chain Management | Stefano Genetti et.al. | 2504.12023 | translate | read | null |
| 2025-04-16 | Control of Rayleigh-Bénard Convection: Effectiveness of Reinforcement Learning in the Turbulent Regime | Thorben Markmann et.al. | 2504.12000 | translate | read | null |
| 2025-04-16 | A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs | Kihyuk Hong et.al. | 2504.11997 | translate | read | null |
| 2025-04-16 | Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions | Yifei Dong et.al. | 2504.11967 | translate | read | null |
| 2025-04-16 | R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors | Haoyang Wang et.al. | 2504.11946 | translate | read | null |
| 2025-04-16 | VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning | Xuyang Chen et.al. | 2504.11944 | translate | read | null |
| 2025-04-15 | DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | Zhiwei He et.al. | 2504.11456 | translate | read | link |
| 2025-04-15 | A Clean Slate for Offline Reinforcement Learning | Matthew Thomas Jackson et.al. | 2504.11453 | translate | read | null |
| 2025-04-15 | Embodied World Models Emerge from Navigational Task in Open-Ended Environments | Li Jin et.al. | 2504.11419 | translate | read | null |
| 2025-04-15 | Measures of Variability for Risk-averse Policy Gradient | Yudong Luo et.al. | 2504.11412 | translate | read | null |
| 2025-04-15 | Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning | Haiming Wang et.al. | 2504.11354 | translate | read | null |
| 2025-04-15 | A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | Wei Xiong et.al. | 2504.11343 | translate | read | link |
| 2025-04-15 | Multi-Agent Reinforcement Learning for Greenhouse Gas Offset Credit Markets | Liam Welsh et.al. | 2504.11258 | translate | read | null |
| 2025-04-15 | A Rollout-Based Algorithm and Reward Function for Efficient Resource Allocation in Business Processes | Jeroen Middelhuis et.al. | 2504.11250 | translate | read | null |
| 2025-04-15 | Next-Future: Sample-Efficient Policy Learning for Robotic-Arm Tasks | Fikrican Özgür et.al. | 2504.11247 | translate | read | null |
| 2025-04-15 | Revealing Covert Attention by Analyzing Human and Reinforcement Learning Agent Gameplay | Henrik Krauss et.al. | 2504.11118 | translate | read | null |
| 2025-04-14 | Weight Ensembling Improves Reasoning in Language Models | Xingyu Dang et.al. | 2504.10478 | translate | read | null |
| 2025-04-14 | Co-optimizing Physical Reconfiguration Parameters and Controllers for an Origami-inspired Reconfigurable Manipulator | Zhe Chen et.al. | 2504.10474 | translate | read | null |
| 2025-04-14 | GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | Xiaobo Xia et.al. | 2504.10458 | translate | read | link |
| 2025-04-14 | The Communication and Computation Trade-off in Wireless Semantic Communications | Xuyang Chen et.al. | 2504.10357 | translate | read | null |
| 2025-04-14 | Heimdall: test-time scaling on the generative verification | Wenlei Shi et.al. | 2504.10337 | translate | read | null |
| 2025-04-14 | Flying Hand: End-Effector-Centric Framework for Versatile Aerial Manipulation Teleoperation and Policy Learning | Guanqi He et.al. | 2504.10334 | translate | read | null |
| 2025-04-14 | InstructEngine: Instruction-driven Text-to-Image Alignment | Xingyu Lu et.al. | 2504.10329 | translate | read | null |
| 2025-04-14 | Vision based driving agent for race car simulation environments | Gergely Bári et.al. | 2504.10266 | translate | read | null |
| 2025-04-14 | Adaptive Sensor Steering Strategy Using Deep Reinforcement Learning for Dynamic Data Acquisition in Digital Twins | Collins O. Ogbodo et.al. | 2504.10248 | translate | read | null |
| 2025-04-14 | Deep Reasoning Translation via Reinforcement Learning | Jiaan Wang et.al. | 2504.10187 | translate | read | null |
| 2025-04-11 | Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing | Vinal Asodia et.al. | 2504.08704 | translate | read | null |
| 2025-04-11 | Pobogot – An Open-Hardware Open-Source Low Cost Robot for Swarm Robotics | Alessia Loi et.al. | 2504.08686 | translate | read | null |
| 2025-04-11 | Reinforcement Learning-Driven Plant-Wide Refinery Planning Using Model Decomposition | Zhouchang Li et.al. | 2504.08642 | translate | read | null |
| 2025-04-11 | Neural Fidelity Calibration for Informative Sim-to-Real Adaptation | Youwei Yu et.al. | 2504.08604 | translate | read | null |
| 2025-04-11 | SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning | Peixian Ma et.al. | 2504.08600 | translate | read | link |
| 2025-04-11 | Playpen: An Environment for Exploring Learning Through Conversational Interaction | Nicola Horst et.al. | 2504.08590 | translate | read | link |
| 2025-04-11 | Slicing the Gaussian Mixture Wasserstein Distance | Moritz Piening et.al. | 2504.08544 | translate | read | null |
| 2025-04-11 | Diffusion Models for Robotic Manipulation: A Survey | Rosa Wolf et.al. | 2504.08438 | translate | read | null |
| 2025-04-11 | Belief States for Cooperative Multi-Agent Reinforcement Learning under Partial Observability | Paul J. Pritz et.al. | 2504.08417 | translate | read | null |
| 2025-04-11 | Scalable Conflict-free Decision Making with Photons | Kohei Konaka et.al. | 2504.08331 | translate | read | null |
| 2025-04-10 | Perception-R1: Pioneering Perception Policy with Reinforcement Learning | En Yu et.al. | 2504.07954 | translate | read | link |
| 2025-04-10 | Echo: An Open-Source, Low-Cost Teleoperation System with Force Feedback for Dataset Collection in Robot Learning | Artem Bazhenov et.al. | 2504.07939 | translate | read | null |
| 2025-04-10 | Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining | Rosie Zhao et.al. | 2504.07912 | translate | read | link |
| 2025-04-10 | Fast Adaptation with Behavioral Foundation Models | Harshit Sikchi et.al. | 2504.07896 | translate | read | null |
| 2025-04-10 | 2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization | Mengyang Li et.al. | 2504.07856 | translate | read | null |
| 2025-04-10 | Genetic Programming with Reinforcement Learning Trained Transformer for Real-World Dynamic Scheduling Problems | Xian Chen et.al. | 2504.07779 | translate | read | null |
| 2025-04-10 | Harnessing Equivariance: Modeling Turbulence with Graph Neural Networks | Marius Kurz et.al. | 2504.07741 | translate | read | null |
| 2025-04-10 | Relaxing the Markov Requirements on Reinforcement Learning Under Weak Partial Ignorability | MaryLena Bleile et.al. | 2504.07722 | translate | read | null |
| 2025-04-10 | Sim-to-Real Transfer in Reinforcement Learning for Maneuver Control of a Variable-Pitch MAV | Zhikun Wang et.al. | 2504.07694 | translate | read | null |
| 2025-04-10 | VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Haozhan Shen et.al. | 2504.07615 | translate | read | link |
| 2025-04-09 | Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning | Chenjie Hao et.al. | 2504.07095 | translate | read | link |
| 2025-04-09 | AssistanceZero: Scalably Solving Assistance Games | Cassidy Laidlaw et.al. | 2504.07091 | translate | read | link |
| 2025-04-09 | A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility | Andreas Hochlehnert et.al. | 2504.07086 | translate | read | link |
| 2025-04-09 | To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning | Tian Qin et.al. | 2504.07052 | translate | read | null |
| 2025-04-09 | Free Random Projection for In-Context Reinforcement Learning | Tomohiro Hayase et.al. | 2504.06983 | translate | read | null |
| 2025-04-09 | VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | Xinhao Li et.al. | 2504.06958 | translate | read | link |
| 2025-04-09 | Regret Bounds for Robust Online Decision Making | Alexander Appel et.al. | 2504.06820 | translate | read | null |
| 2025-04-09 | Interactive Expressive Motion Generation Using Dynamic Movement Primitives | Till Hielscher et.al. | 2504.06735 | translate | read | null |
| 2025-04-09 | Learning global control of underactuated systems with Model-Based Reinforcement Learning | Niccolò Turcato et.al. | 2504.06721 | translate | read | null |
| 2025-04-09 | SDHN: Skewness-Driven Hypergraph Networks for Enhanced Localized Multi-Robot Coordination | Delin Zhao et.al. | 2504.06684 | translate | read | null |
| 2025-04-08 | ViTaMIn: Learning Contact-Rich Tasks Through Robot-Free Visuo-Tactile Manipulation Interface | Fangchen Liu et.al. | 2504.06156 | translate | read | null |
| 2025-04-08 | Adversarial Training of Reward Models | Alexander Bukharin et.al. | 2504.06141 | translate | read | null |
| 2025-04-08 | A Multimedia Analytics Model for the Foundation Model Era | Marcel Worring et.al. | 2504.06138 | translate | read | null |
| 2025-04-08 | Accelerating Vehicle Routing via AI-Initialized Genetic Algorithms | Ido Greenberg et.al. | 2504.06126 | translate | read | null |
| 2025-04-08 | Robo-taxi Fleet Coordination at Scale via Reinforcement Learning | Luigi Tresca et.al. | 2504.06125 | translate | read | link |
| 2025-04-09 | Leanabell-Prover: Posttraining Scaling in Formal Reasoning | Jingyuan Zhang et.al. | 2504.06122 | translate | read | link |
| 2025-04-08 | Trust-Region Twisted Policy Improvement | Joery A. de Vries et.al. | 2504.06048 | translate | read | null |
| 2025-04-08 | Information-Theoretic Reward Decomposition for Generalizable RLHF | Liyuan Mao et.al. | 2504.06020 | translate | read | null |
| 2025-04-08 | Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models | J. S. van Hulst et.al. | 2504.05978 | translate | read | null |
| 2025-04-08 | AEGIS: Human Attention-based Explainable Guidance for Intelligent Vehicle Systems | Zhuoli Zhuang et.al. | 2504.05950 | translate | read | null |
| 2025-04-07 | RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception | Hui Zhang et.al. | 2504.05287 | translate | read | link |
| 2025-04-07 | Concise Reasoning via Reinforcement Learning | Mehdi Fatemi et.al. | 2504.05185 | translate | read | link |
| 2025-04-07 | Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval | Kidist Amde Mekonnen et.al. | 2504.05181 | translate | read | link |
| 2025-04-07 | RLBayes: a Bayesian Network Structure Learning Algorithm via Reinforcement Learning-Based Search Strategy | Mingcan Wang et.al. | 2504.05167 | translate | read | null |
| 2025-04-07 | A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks | Leonardo Kanashiro Felizardo et.al. | 2504.05150 | translate | read | link |
| 2025-04-08 | VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks | Yu Yue et.al. | 2504.05118 | translate | read | null |
| 2025-04-07 | Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning | Anja Surina et.al. | 2504.05108 | translate | read | null |
| 2025-04-08 | Attention-Augmented Inverse Reinforcement Learning with Graph Convolutions for Multi-Agent Task Allocation | Huilin Yin et.al. | 2504.05045 | translate | read | null |
| 2025-04-07 | Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning | Bibek Poudel et.al. | 2504.05018 | translate | read | null |
| 2025-04-07 | Wavelet Policy: Imitation Policy Learning in Frequency Domain with Wavelet Transforms | Changchuan Yang et.al. | 2504.04991 | translate | read | link |
| 2025-04-04 | Align to Structure: Aligning Large Language Models with Structural Information | Zae Myung Kim et.al. | 2504.03622 | translate | read | null |
| 2025-04-04 | Optimization of a Triangular Delaunay Mesh Generator using Reinforcement Learning | Will Thacher et.al. | 2504.03610 | translate | read | null |
| 2025-04-04 | Dexterous Manipulation through Imitation Learning: A Survey | Shan An et.al. | 2504.03515 | translate | read | null |
| 2025-04-04 | Learning Dual-Arm Coordination for Grasping Large Flat Objects | Yongliang Wang et.al. | 2504.03500 | translate | read | null |
| 2025-04-04 | Optimizing Quantum Circuits via ZX Diagrams using Reinforcement Learning and Graph Neural Networks | Alexander Mattick et.al. | 2504.03429 | translate | read | null |
| 2025-04-04 | DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models | Sathish Kumar et.al. | 2504.03423 | translate | read | null |
| 2025-04-04 | Autonomous state-space segmentation for Deep-RL sparse reward scenarios | Gianluca Maselli et.al. | 2504.03420 | translate | read | null |
| 2025-04-04 | Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning | Sanghwan Bae et.al. | 2504.03380 | translate | read | null |
| 2025-04-04 | Verification of Autonomous Neural Car Control with KeYmaera X | Enguerrand Prebet et.al. | 2504.03272 | translate | read | null |
| 2025-04-04 | Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward | Yanming Wan et.al. | 2504.03206 | translate | read | null |
| 2025-04-03 | Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets | Chuning Zhu et.al. | 2504.02792 | translate | read | link |
| 2025-04-03 | A Numerically Efficient Method to Enhance Model Predictive Control Performance with a Reinforcement Learning Policy | Andrea Ghezzi et.al. | 2504.02710 | translate | read | null |
| 2025-04-03 | Handover and SINR-Aware Path Optimization in 5G-UAV mmWave Communication using DRL | Achilles Kiwanuka Machumilane et.al. | 2504.02688 | translate | read | null |
| 2025-04-03 | Integrating Human Knowledge Through Action Masking in Reinforcement Learning for Operations Research | Mirko Stappert et.al. | 2504.02662 | translate | read | null |
| 2025-04-03 | SymDQN: Symbolic Knowledge and Reasoning in Neural Network-based Reinforcement Learning | Ivo Amador et.al. | 2504.02654 | translate | read | null |
| 2025-04-03 | Solving the Paint Shop Problem with Flexible Management of Multi-Lane Buffers Using Reinforcement Learning and Action Masking | Mirko Stappert et.al. | 2504.02644 | translate | read | null |
| 2025-04-03 | Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving | Daoguang Zan et.al. | 2504.02605 | translate | read | link |
| 2025-04-03 | Regulating Spatial Fairness in a Tripartite Micromobility Sharing System via Reinforcement Learning | Matteo Cederle et.al. | 2504.02597 | translate | read | null |
| 2025-04-03 | LexPam: Legal Procedure Awareness-Guided Mathematical Reasoning | Kepu Zhang et.al. | 2504.02590 | translate | read | null |
| 2025-04-04 | Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme | Yan Ma et.al. | 2504.02587 | translate | read | link |
| 2025-04-02 | OpenCodeReasoning: Advancing Data Distillation for Competitive Coding | Wasi Uddin Ahmad et.al. | 2504.01943 | translate | read | null |
| 2025-04-02 | Overcoming Deceptiveness in Fitness Optimization with Unsupervised Quality-Diversity | Lisa Coiffard et.al. | 2504.01915 | translate | read | null |
| 2025-04-02 | GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | Yanzhou Su et.al. | 2504.01886 | translate | read | link |
| 2025-04-02 | Interpreting Emergent Planning in Model-Free Reinforcement Learning | Thomas Bush et.al. | 2504.01871 | translate | read | null |
| 2025-04-02 | Learning with Imperfect Models: When Multi-step Prediction Mitigates Compounding Error | Anne Somalwar et.al. | 2504.01766 | translate | read | null |
| 2025-04-03 | Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning | Ke Jiang et.al. | 2504.01719 | translate | read | null |
| 2025-04-02 | ToM-RL: Reinforcement Learning Unlocks Theory of Mind in Small LLMs | Yi-Long Lu et.al. | 2504.01698 | translate | read | null |
| 2025-04-02 | 8-DoFs Cable Driven Parallel Robots for Bimanual Teleportation | Hung Hon Cheng et.al. | 2504.01554 | translate | read | null |
| 2025-04-02 | A Robust Model-Based Approach for Continuous-Time Policy Evaluation with Unknown Lévy Process Dynamics | Qihao Ye et.al. | 2504.01482 | translate | read | null |
| 2025-04-02 | Probabilistic Curriculum Learning for Goal-Based Reinforcement Learning | Llewyn Salt et.al. | 2504.01459 | translate | read | null |

(<a href=../Reinforcement_Learning.md>back to Reinforcement Learning</a>)