Reinforcement Learning - 2025-03

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|:---|:---|:---|:---|:---|:---|:---|
| 2025-03-31 | Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Yi Chen et al. | 2503.24376 | translate | read | link |
| 2025-03-31 | Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning | Yubo Zhang et al. | 2503.24296 | translate | read | null |
| 2025-03-31 | Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model | Jingcheng Hu et al. | 2503.24290 | translate | read | link |
| 2025-03-31 | Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning | Jiacheng Lin et al. | 2503.24289 | translate | read | link |
| 2025-03-31 | Moving Edge for On-Demand Edge Computing: An Uncertainty-aware Approach | Fangtong Zhou et al. | 2503.24214 | translate | read | null |
| 2025-03-31 | Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantees via Constrained Mean-Field Reinforcement Learning | Matej Jusup et al. | 2503.24183 | translate | read | link |
| 2025-03-31 | Learning a Canonical Basis of Human Preferences from Binary Ratings | Kailas Vodrahalli et al. | 2503.24150 | translate | read | null |
| 2025-03-31 | Reinforcement Learning for Safe Autonomous Two Device Navigation of Cerebral Vessels in Mechanical Thrombectomy | Harry Robertshaw et al. | 2503.24140 | translate | read | null |
| 2025-03-31 | Level the Level: Balancing Game Levels for Asymmetric Player Archetypes With Reinforcement Learning | Florian Rupp et al. | 2503.24099 | translate | read | null |
| 2025-03-31 | HACTS: a Human-As-Copilot Teleoperation System for Robot Learning | Zhiyuan Xu et al. | 2503.24070 | translate | read | null |
| 2025-03-28 | Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Weiqi Li et al. | 2503.22679 | translate | read | link |
| 2025-03-28 | Empirical Analysis of Sim-and-Real Cotraining Of Diffusion Policies For Planar Pushing from Pixels | Adam Wei et al. | 2503.22634 | translate | read | null |
| 2025-03-28 | Reinforcement Learning for Machine Learning Model Deployment: Evaluating Multi-Armed Bandits in ML Ops Environments | S. Aaron McClendon et al. | 2503.22595 | translate | read | null |
| 2025-03-28 | On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations | Rajdeep Singh Hundal et al. | 2503.22575 | translate | read | null |
| 2025-03-28 | Robust Offline Imitation Learning Through State-level Trajectory Stitching | Shuze Wang et al. | 2503.22524 | translate | read | null |
| 2025-03-28 | Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments | Luke Rowe et al. | 2503.22496 | translate | read | null |
| 2025-03-28 | Probabilistic Uncertain Reward Model: A Natural Generalization of Bradley-Terry Reward Model | Wangtao Sun et al. | 2503.22480 | translate | read | null |
| 2025-03-28 | Control of Humanoid Robots with Parallel Mechanisms using Kinematic Actuation Models | Victor Lutz et al. | 2503.22459 | translate | read | null |
| 2025-03-28 | Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning | Abdullah Vanlioglu et al. | 2503.22456 | translate | read | null |
| 2025-03-28 | Reinforcement learning for efficient and robust multi-setpoint and multi-trajectory tracking in bioprocesses | Sebastián Espinel-Ríos et al. | 2503.22409 | translate | read | null |
| 2025-03-27 | Video-R1: Reinforcing Video Reasoning in MLLMs | Kaituo Feng et al. | 2503.21776 | translate | read | link |
| 2025-03-27 | ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation | Zhicheng Lee et al. | 2503.21729 | translate | read | link |
| 2025-03-27 | Collab: Controlled Decoding using Mixture of Agents for LLM Alignment | Souradip Chakraborty et al. | 2503.21720 | translate | read | null |
| 2025-03-27 | Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks | Wenqi Zhang et al. | 2503.21696 | translate | read | link |
| 2025-03-27 | LLM-Gomoku: A Large Language Model-Based System for Strategic Gomoku with Self-Play and Reinforcement Learning | Hui Wang et al. | 2503.21683 | translate | read | null |
| 2025-03-27 | A tale of two goals: leveraging sequentiality in multi-goal scenarios | Olivier Serris et al. | 2503.21677 | translate | read | null |
| 2025-03-27 | UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning | Zhengxi Lu et al. | 2503.21620 | translate | read | link |
| 2025-03-27 | A Deep Reinforcement Learning-based Approach for Adaptive Handover Protocols | Johannes Voigt et al. | 2503.21601 | translate | read | null |
| 2025-03-27 | DATA-WA: Demand-based Adaptive Task Assignment with Dynamic Worker Availability Windows | Jinwen Chen et al. | 2503.21458 | translate | read | null |
| 2025-03-27 | On Learning-Based Traffic Monitoring With a Swarm of Drones | Marko Maljkovic et al. | 2503.21433 | translate | read | null |
| 2025-03-26 | Understanding R1-Zero-Like Training: A Critical Perspective | Zichen Liu et al. | 2503.20783 | translate | read | link |
| 2025-03-27 | Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning | Huajie Tan et al. | 2503.20752 | translate | read | link |
| 2025-03-26 | Graph-Enhanced Model-Free Reinforcement Learning Agents for Efficient Power Grid Topological Control | Eloy Anguiano Batanero et al. | 2503.20688 | translate | read | null |
| 2025-03-26 | Flip Learning: Weakly Supervised Erase to Segment Nodules in Breast Ultrasound | Yuhao Huang et al. | 2503.20685 | translate | read | null |
| 2025-03-26 | Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | Han Wu et al. | 2503.20641 | translate | read | link |
| 2025-03-26 | State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning | Zongyuan Zhang et al. | 2503.20613 | translate | read | null |
| 2025-03-26 | Optimizing Case-Based Reasoning System for Functional Test Script Generation with Large Language Models | Siyuan Guo et al. | 2503.20576 | translate | read | null |
| 2025-03-26 | Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems | Rakesh Nadig et al. | 2503.20507 | translate | read | null |
| 2025-03-26 | Multi-agent Uncertainty-Aware Pessimistic Model-Based Reinforcement Learning for Connected Autonomous Vehicles | Ruoqi Wen et al. | 2503.20462 | translate | read | null |
| 2025-03-26 | The Crucial Role of Problem Formulation in Real-World Reinforcement Learning | Georg Schäfer et al. | 2503.20442 | translate | read | null |
| 2025-03-25 | Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | Xiaoyu Tian et al. | 2503.19855 | translate | read | link |
| 2025-03-25 | Optimal Path Planning and Cost Minimization for a Drone Delivery System Via Model Predictive Control | Muhammad Al-Zafar Khan et al. | 2503.19699 | translate | read | null |
| 2025-03-25 | Risk-Aware Reinforcement Learning for Autonomous Driving: Improving Safety When Driving through Intersection | Bo Leng et al. | 2503.19690 | translate | read | null |
| 2025-03-25 | Learning to chain-of-thought with Jensen’s evidence lower bound | Yunhao Tang et al. | 2503.19618 | translate | read | null |
| 2025-03-25 | RL-finetuning LLMs from on- and off-policy data with a single algorithm | Yunhao Tang et al. | 2503.19612 | translate | read | null |
| 2025-03-25 | Optimizing Language Models for Inference Time Objectives using Reinforcement Learning | Yunhao Tang et al. | 2503.19595 | translate | read | null |
| 2025-03-25 | One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF | Xin Cai et al. | 2503.19523 | translate | read | null |
| 2025-03-25 | ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning | Mingyang Chen et al. | 2503.19470 | translate | read | link |
| 2025-03-25 | Multi-Agent Deep Reinforcement Learning for Safe Autonomous Driving with RICS-Assisted MEC | Xueyao Zhang et al. | 2503.19418 | translate | read | null |
| 2025-03-25 | NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios | Songyi Gao et al. | 2503.19267 | translate | read | link |
| 2025-03-24 | Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | Brian R. Bartoldson et al. | 2503.18929 | translate | read | link |
| 2025-03-24 | SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Weihao Zeng et al. | 2503.18892 | translate | read | link |
| 2025-03-24 | Bootstrapped Model Predictive Control | Yuhang Wang et al. | 2503.18871 | translate | read | link |
| 2025-03-24 | Learning Multi-Robot Coordination through Locality-Based Factorized Multi-Agent Actor-Critic Algorithm | Chak Lam Shek et al. | 2503.18816 | translate | read | null |
| 2025-03-24 | Sample-Efficient Reinforcement Learning of Koopman eNMPC | Daniel Mayfrank et al. | 2503.18787 | translate | read | null |
| 2025-03-24 | Simulation-Driven Balancing of Competitive Game Levels with Reinforcement Learning | Florian Rupp et al. | 2503.18748 | translate | read | null |
| 2025-03-24 | RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation | Chengbo Yuan et al. | 2503.18738 | translate | read | null |
| 2025-03-24 | FF-SRL: High Performance GPU-Based Surgical Simulation For Robot Learning | Diego Dall’Alba et al. | 2503.18616 | translate | read | null |
| 2025-03-24 | Adventurer: Exploration with BiGAN for Deep Reinforcement Learning | Yongshuai Liu et al. | 2503.18612 | translate | read | null |
| 2025-03-24 | Reinforcement Learning in Switching Non-Stationary Markov Decision Processes: Algorithms and Convergence Analysis | Mohsen Amiri et al. | 2503.18607 | translate | read | null |
| 2025-03-21 | OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | Yihe Deng et al. | 2503.17352 | translate | read | link |
| 2025-03-21 | Capturing Individual Human Preferences with Reward Features | André Barreto et al. | 2503.17338 | translate | read | null |
| 2025-03-21 | FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | Mingyang Song et al. | 2503.17287 | translate | read | link |
| 2025-03-21 | Curriculum RL meets Monte Carlo Planning: Optimization of a Real World Container Management Problem | Abhijeet Pendyala et al. | 2503.17194 | translate | read | null |
| 2025-03-21 | Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | Chan Kim et al. | 2503.17125 | translate | read | null |
| 2025-03-21 | Neural-Guided Equation Discovery | Jannis Brugger et al. | 2503.16953 | translate | read | null |
| 2025-03-21 | A New Segment Routing method with Swap Node Selection Strategy Based on Deep Reinforcement Learning for Software Defined Network | Miao Ye et al. | 2503.16914 | translate | read | null |
| 2025-03-21 | Federated Digital Twin Construction via Distributed Sensing: A Game-Theoretic Online Optimization with Overlapping Coalitions | Ruoyang Chen et al. | 2503.16823 | translate | read | null |
| 2025-03-21 | BEAC: Imitating Complex Exploration and Task-oriented Behaviors for Invisible Object Nonprehensile Manipulation | Hirotaka Tahara et al. | 2503.16803 | translate | read | null |
| 2025-03-21 | Causally Aligned Curriculum Learning | Mingxuan Li et al. | 2503.16799 | translate | read | null |
| 2025-03-20 | Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Yang Sui et al. | 2503.16419 | translate | read | link |
| 2025-03-20 | RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints | Yiran Qin et al. | 2503.16408 | translate | read | null |
| 2025-03-20 | Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming | Minori Narita et al. | 2503.16371 | translate | read | null |
| 2025-03-20 | JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse | Muyao Li et al. | 2503.16365 | translate | read | link |
| 2025-03-21 | Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Zhaowei Liu et al. | 2503.16252 | translate | read | link |
| 2025-03-20 | Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t | Quy-Anh Dang et al. | 2503.16219 | translate | read | link |
| 2025-03-20 | Explosive Jumping with Rigid and Articulated Soft Quadrupeds via Example Guided Reinforcement Learning | Georgios Apostolides et al. | 2503.16197 | translate | read | null |
| 2025-03-20 | Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning | Yuki Akiyama et al. | 2503.16192 | translate | read | null |
| 2025-03-20 | CLS-RL: Image Classification with Rule-Based Reinforcement Learning | Ming Li et al. | 2503.16188 | translate | read | link |
| 2025-03-20 | Cultural Alignment in Large Language Models Using Soft Prompt Tuning | Reem I. Masoud et al. | 2503.16094 | translate | read | null |
| 2025-03-19 | Learning to Play Piano in the Real World | Yves-Simon Zeulner et al. | 2503.15481 | translate | read | null |
| 2025-03-19 | What Makes a Reward Model a Good Teacher? An Optimization Perspective | Noam Razin et al. | 2503.15477 | translate | read | link |
| 2025-03-19 | CCDP: Composition of Conditional Diffusion Policies with Guided Sampling | Amirreza Razmjoo et al. | 2503.15386 | translate | read | null |
| 2025-03-19 | Online Imitation Learning for Manipulation via Decaying Relative Correction through Teleoperation | Cheng Pan et al. | 2503.15368 | translate | read | null |
| 2025-03-19 | Optimizing Decomposition for Optimal Claim Verification | Yining Lu et al. | 2503.15354 | translate | read | link |
| 2025-03-19 | aiXcoder-7B-v2: Training LLMs to Fully Utilize the Long Context in Repository-level Code Completion | Jia Li et al. | 2503.15301 | translate | read | null |
| 2025-03-19 | Reinforcement Learning for Robust Athletic Intelligence: Lessons from the 2nd ‘AI Olympics with RealAIGym’ Competition | Felix Wiebe et al. | 2503.15290 | translate | read | null |
| 2025-03-19 | DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning | Ruowen Zhao et al. | 2503.15265 | translate | read | link |
| 2025-03-19 | Partially Observable Reinforcement Learning with Memory Traces | Onno Eberhard et al. | 2503.15200 | translate | read | null |
| 2025-03-19 | Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach | Mohamed Hassouna et al. | 2503.15190 | translate | read | null |
| 2025-03-18 | DAPO: An Open-Source LLM Reinforcement Learning System at Scale | Qiying Yu et al. | 2503.14476 | translate | read | null |
| 2025-03-18 | Pauli Network Circuit Synthesis with Reinforcement Learning | Ayushi Dubal et al. | 2503.14448 | translate | read | null |
| 2025-03-18 | Flying in Highly Dynamic Environments with End-to-end Learning Approach | Xiyu Fan et al. | 2503.14352 | translate | read | null |
| 2025-03-18 | MANTRA: Enhancing Automated Method-Level Refactoring with Contextual RAG and Multi-Agent LLM Collaboration | Yisen Xu et al. | 2503.14340 | translate | read | null |
| 2025-03-18 | Revealing higher-order neural representations with generative artificial intelligence | Hojjat Azimi Asrari et al. | 2503.14333 | translate | read | null |
| 2025-03-18 | Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | Nicolas Le Roux et al. | 2503.14286 | translate | read | null |
| 2025-03-18 | Integral modelling and Reinforcement Learning control of 3D liquid metal coating on a moving substrate | Fabio Pino et al. | 2503.14270 | translate | read | null |
| 2025-03-18 | Automating Experimental Optics with Sample Efficient Machine Learning Methods | Arindam Saha et al. | 2503.14260 | translate | read | null |
| 2025-03-18 | Quantization-Free Autoregressive Action Transformer | Ziyad Sheebaelhamd et al. | 2503.14259 | translate | read | null |
| 2025-03-18 | CTSAC: Curriculum-Based Transformer Soft Actor-Critic for Goal-Oriented Robot Exploration | Chunyu Yang et al. | 2503.14254 | translate | read | null |
| 2025-03-17 | Uncovering Utility Functions from Observed Outcomes | Marta Grzeskiewicz et al. | 2503.13432 | translate | read | null |
| 2025-03-17 | FLEX: A Framework for Learning Robot-Agnostic Force-based Skills Involving Sustained Contact Object Manipulation | Shijie Fang et al. | 2503.13418 | translate | read | null |
| 2025-03-17 | A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives | Weiqiang Jin et al. | 2503.13415 | translate | read | null |
| 2025-03-17 | TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM | Ye Wang et al. | 2503.13377 | translate | read | link |
| 2025-03-17 | Agents Play Thousands of 3D Video Games | Zhongwen Xu et al. | 2503.13356 | translate | read | null |
| 2025-03-17 | Local-Global Learning of Interpretable Control Policies: The Interface between MPC and Reinforcement Learning | Thomas Banker et al. | 2503.13289 | translate | read | null |
| 2025-03-17 | Timing the Match: A Deep Reinforcement Learning Approach for Ride-Hailing and Ride-Pooling Services | Yiman Bao et al. | 2503.13200 | translate | read | null |
| 2025-03-17 | A representational framework for learning and encoding structurally enriched trajectories in complex agent environments | Corina Catarau-Cotutiu et al. | 2503.13194 | translate | read | null |
| 2025-03-17 | HybridGen: VLM-Guided Hybrid Planning for Scalable Data Generation of Imitation Learning | Wensheng Wang et al. | 2503.13171 | translate | read | null |
| 2025-03-17 | Efficient Imitation Under Misspecification | Nicolas Espinosa-Dice et al. | 2503.13162 | translate | read | null |
| 2025-03-14 | Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning | Siyuan Huang et al. | 2503.11646 | translate | read | null |
| 2025-03-14 | Scaling the Automated Discovery of Quantum Circuits via Reinforcement Learning with Gadgets | Jan Olle et al. | 2503.11638 | translate | read | null |
| 2025-03-14 | Unicorn: A Universal and Collaborative Reinforcement Learning Approach Towards Generalizable Network-Wide Traffic Signal Control | Yifeng Zhang et al. | 2503.11488 | translate | read | null |
| 2025-03-14 | A Review of DeepSeek Models’ Key Innovative Techniques | Chengen Wang et al. | 2503.11486 | translate | read | null |
| 2025-03-14 | Dynamic Obstacle Avoidance with Bounded Rationality Adversarial Reinforcement Learning | Jose-Luis Holgado-Alvarez et al. | 2503.11467 | translate | read | null |
| 2025-03-14 | Optimizing 6G Dense Network Deployment for the Metaverse Using Deep Reinforcement Learning | Jie Zhang et al. | 2503.11449 | translate | read | null |
| 2025-03-14 | Adaptive Torque Control of Exoskeletons under Spasticity Conditions via Reinforcement Learning | Andrés Chavarrías et al. | 2503.11433 | translate | read | null |
| 2025-03-14 | TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation | Hongxiang Zhao et al. | 2503.11423 | translate | read | null |
| 2025-03-14 | Reinforcement Learning-Based Controlled Switching Approach for Inrush Current Minimization in Power Transformers | Jone Ugarte Valdivielso et al. | 2503.11398 | translate | read | null |
| 2025-03-14 | Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model | Moritz A. Zanger et al. | 2503.11339 | translate | read | null |
| 2025-03-13 | NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models | Mert Albaba et al. | 2503.10626 | translate | read | null |
| 2025-03-13 | R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization | Yi Yang et al. | 2503.10615 | translate | read | link |
| 2025-03-13 | The Lagrangian Method for Solving Constrained Markov Games | Soham Das et al. | 2503.10561 | translate | read | null |
| 2025-03-13 | Towards Safe Path Tracking Using the Simplex Architecture | Georg Jäger et al. | 2503.10559 | translate | read | null |
| 2025-03-13 | SySLLM: Generating Synthesized Policy Summaries for Reinforcement Learning Agents Using Large Language Models | Sahar Admoni et al. | 2503.10509 | translate | read | null |
| 2025-03-13 | Learning Robotic Policy with Imagined Transition: Mitigating the Trade-off between Robustness and Optimality | Wei Xiao et al. | 2503.10484 | translate | read | null |
| 2025-03-13 | SortingEnv: An Extendable RL-Environment for an Industrial Sorting Process | Tom Maus et al. | 2503.10466 | translate | read | null |
| 2025-03-13 | Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Liang Wen et al. | 2503.10460 | translate | read | link |
| 2025-03-13 | Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback | Derun Li et al. | 2503.10434 | translate | read | null |
| 2025-03-13 | Towards Constraint-Based Adaptive Hypergraph Learning for Solving Vehicle Routing: An End-to-End Solution | Zhenwei Wang et al. | 2503.10421 | translate | read | null |
| 2025-03-12 | Strategyproof Reinforcement Learning from Human Feedback | Thomas Kleine Buening et al. | 2503.09561 | translate | read | null |
| 2025-03-12 | Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Bowen Jin et al. | 2503.09516 | translate | read | link |
| 2025-03-12 | RESTRAIN: Reinforcement Learning-Based Secure Framework for Trigger-Action IoT Environment | Md Morshed Alam et al. | 2503.09513 | translate | read | null |
| 2025-03-12 | Reinforcement Learning is all You Need | Yongsheng Lian et al. | 2503.09512 | translate | read | null |
| 2025-03-12 | ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning | Ziyu Wan et al. | 2503.09501 | translate | read | link |
| 2025-03-12 | Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data Traffic | Kexuan Wang et al. | 2503.09391 | translate | read | null |
| 2025-03-12 | Evaluating Reinforcement Learning Safety and Trustworthiness in Cyber-Physical Systems | Katherine Dearstyne et al. | 2503.09388 | translate | read | null |
| 2025-03-12 | Rule-Guided Reinforcement Learning Policy Evaluation and Improvement | Martin Tappler et al. | 2503.09270 | translate | read | null |
| 2025-03-12 | Large-scale Regional Traffic Signal Control Based on Single-Agent Reinforcement Learning | Qiang Li et al. | 2503.09252 | translate | read | null |
| 2025-03-12 | MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics | Shuguang Chu et al. | 2503.09203 | translate | read | null |
| 2025-03-11 | MoE-Loco: Mixture of Experts for Multitask Locomotion | Runhan Huang et al. | 2503.08564 | translate | read | null |
| 2025-03-11 | Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies | Chen Xu et al. | 2503.08558 | translate | read | null |
| 2025-03-11 | TLA: Tactile-Language-Action Model for Contact-Rich Manipulation | Peng Hao et al. | 2503.08548 | translate | read | null |
| 2025-03-11 | GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training | Tong Wei et al. | 2503.08525 | translate | read | null |
| 2025-03-11 | Hierarchical Multi Agent DRL for Soft Handovers Between Edge Clouds in Open RAN | F. Giarrè et al. | 2503.08493 | translate | read | null |
| 2025-03-11 | Hybrid Deep Reinforcement Learning for Radio Tracer Localisation in Robotic-assisted Radioguided Surgery | Hanyi Zhang et al. | 2503.08492 | translate | read | null |
| 2025-03-12 | An Autonomous RL Agent Methodology for Dynamic Web UI Testing in a BDD Framework | Ali Hassaan Mughal et al. | 2503.08464 | translate | read | null |
| 2025-03-11 | V-Max: Making RL practical for Autonomous Driving | Valentin Charraut et al. | 2503.08388 | translate | read | link |
| 2025-03-11 | Gait in Eight: Efficient On-Robot Learning for Omnidirectional Quadruped Locomotion | Nico Bohlinger et al. | 2503.08375 | translate | read | null |
| 2025-03-11 | LiPS: Large-Scale Humanoid Robot Reinforcement Learning with Parallel-Series Structures | Qiang Zhang et al. | 2503.08349 | translate | read | null |
| 2025-03-10 | Is a Good Foundation Necessary for Efficient Reinforcement Learning? The Computational Role of the Base Model in Exploration | Dylan J. Foster et al. | 2503.07453 | translate | read | null |
| 2025-03-10 | DRESS: Diffusion Reasoning-based Reward Shaping Scheme For Intelligent Networks | Feiran You et al. | 2503.07433 | translate | read | null |
| 2025-03-10 | The Interplay of AI-and-RAN: Dynamic Resource Allocation for Converged 6G Platform | Syed Danial Ali Shah et al. | 2503.07420 | translate | read | null |
| 2025-03-10 | Cost-Effective Design of Grid-tied Community Microgrid | Moslem Uddin et al. | 2503.07414 | translate | read | null |
| 2025-03-10 | PER-DPP Sampling Framework and Its Application in Path Planning | Junzhe Wang et al. | 2503.07411 | translate | read | null |
| 2025-03-10 | Towards Safe Robot Foundation Models | Maximilian Tölle et al. | 2503.07404 | translate | read | null |
| 2025-03-10 | Q-MARL: A quantum-inspired algorithm using neural message passing for large-scale multi-agent reinforcement learning | Kha Vo et al. | 2503.07397 | translate | read | null |
| 2025-03-10 | AttentionSwarm: Reinforcement Learning with Attention Control Barier Function for Crazyflie Drones in Dynamic Environments | Grik Tadevosyan et al. | 2503.07376 | translate | read | null |
| 2025-03-10 | MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning | Fanqing Meng et al. | 2503.07365 | translate | read | link |
| 2025-03-10 | Artificial Utopia: Simulation and Intelligent Agents for a Democratised Future | Yannick Oswald et al. | 2503.07364 | translate | read | null |
| 2025-03-07 | Multi-Fidelity Policy Gradient Algorithms | Xinjie Liu et al. | 2503.05696 | translate | read | null |
| 2025-03-07 | dARt Vinci: Egocentric Data Collection for Surgical Robot Learning at Scale | Yihao Liu et al. | 2503.05646 | translate | read | null |
| 2025-03-07 | R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | Huatong Song et al. | 2503.05592 | translate | read | null |
| 2025-03-07 | InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model | Feeza Khan Khanzada et al. | 2503.05573 | translate | read | null |
| 2025-03-07 | Tractable Representations for Convergent Approximation of Distributional HJB Equations | Julie Alhosh et al. | 2503.05563 | translate | read | null |
| 2025-03-07 | Impoola: The Power of Average Pooling for Image-Based Deep Reinforcement Learning | Raphael Trumpp et al. | 2503.05546 | translate | read | null |
| 2025-03-07 | RiLoCo: An ISAC-oriented AI Solution to Build RIS-empowered Networks | Guillermo Encinas-Lago et al. | 2503.05480 | translate | read | null |
| 2025-03-07 | Controllable Complementarity: Subjective Preferences in Human-AI Collaboration | Chase McDonald et al. | 2503.05455 | translate | read | null |
| 2025-03-07 | R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning | Jiaxing Zhao et al. | 2503.05379 | translate | read | null |
| 2025-03-07 | Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning | Hyungkyu Kang et al. | 2503.05306 | translate | read | null |
| 2025-03-06 | Sample-Optimal Agnostic Boosting with Unlabeled Data | Udaya Ghai et al. | 2503.04706 | translate | read | null |
| 2025-03-06 | L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | Pranjal Aggarwal et al. | 2503.04697 | translate | read | null |
| 2025-03-06 | Multi-Agent Inverse Q-Learning from Demonstrations | Nathaniel Haynam et al. | 2503.04679 | translate | read | null |
| 2025-03-06 | Learning Generalizable Language-Conditioned Cloth Manipulation from Long Demonstrations | Hanyi Zhao et al. | 2503.04557 | translate | read | null |
| 2025-03-06 | PALo: Learning Posture-Aware Locomotion for Quadruped Robots | Xiangyu Miao et al. | 2503.04462 | translate | read | null |
| 2025-03-06 | AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services | Xiaoqi Wang et al. | 2503.04418 | translate | read | null |
| 2025-03-06 | Learning Transformer-based World Models with Contrastive Predictive Coding | Maxime Burchi et al. | 2503.04416 | translate | read | null |
| 2025-03-06 | Energy-Aware Task Offloading for Rotatable STAR-RIS-Enhanced Mobile Edge Computing Systems | Dongdong Yang et al. | 2503.04397 | translate | read | null |
| 2025-03-06 | Delay-Aware Digital Twin Synchronization in Mobile Edge Networks with Semantic Communications | Bin Li et al. | 2503.04387 | translate | read | null |
| 2025-03-06 | Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models | Niccolò Turcato et al. | 2503.04280 | translate | read | null |
| 2025-03-05 | Curating Demonstrations using Online Experience | Annie S. Chen et al. | 2503.03707 | translate | read | null |
| 2025-03-05 | A Generative Approach to High Fidelity 3D Reconstruction from Text Data | Venkat Kumar R et al. | 2503.03664 | translate | read | null |
| 2025-03-05 | Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns | Dong Tian et al. | 2503.03660 | translate | read | null |
| 2025-03-05 | Improving Neutral Point of View Text Generation through Parameter-Efficient Reinforcement Learning and a Small-Scale High-Quality Dataset | Jessica Hoffmann et al. | 2503.03654 | translate | read | null |
| 2025-03-05 | Olympus: A Jumping Quadruped for Planetary Exploration Utilizing Reinforcement Learning for In-Flight Attitude Control | Jørgen Anker Olsen et al. | 2503.03574 | translate | read | null |
| 2025-03-05 | Probabilistic Insights for Efficient Exploration Strategies in Reinforcement Learning | Ernesto Garcia et al. | 2503.03565 | translate | read | null |
| 2025-03-05 | DO-IQS: Dynamics-Aware Offline Inverse Q-Learning for Optimal Stopping with Unknown Gain Functions | Anna Kuchko et al. | 2503.03515 | translate | read | null |
| 2025-03-05 | SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning | Borong Zhang et al. | 2503.03480 | translate | read | null |
| 2025-03-05 | Continuous Control of Diverse Skills in Quadruped Robots Without Complete Expert Datasets | Jiaxin Tu et al. | 2503.03476 | translate | read | null |
| 2025-03-05 | Navigating Intelligence: A Survey of Google OR-Tools and Machine Learning for Global Path Planning in Autonomous Vehicles | Alexandre Benoit et al. | 2503.03338 | translate | read | null |
| 2025-03-04 | Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation | Han Xue et al. | 2503.02881 | translate | read | null |
| 2025-03-04 | AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation | Songming Zhang et al. | 2503.02832 | translate | read | null |
| 2025-03-04 | Meta-Learning to Explore via Memory Density Feedback | Kevin L. McKee et al. | 2503.02831 | translate | read | null |
| 2025-03-04 | Quantitative Resilience Modeling for Autonomous Cyber Defense | Xavier Cadet et al. | 2503.02780 | translate | read | null |
| 2025-03-04 | Variable-Friction In-Hand Manipulation for Arbitrary Objects via Diffusion-Based Imitation Learning | Qiyang Yan et al. | 2503.02738 | translate | read | null |
| 2025-03-04 | Learning-Based Passive Fault-Tolerant Control of a Quadrotor with Rotor Failure | Jiehao Chen et al. | 2503.02649 | translate | read | null |
| 2025-03-04 | Human-aligned Safe Reinforcement Learning for Highway On-Ramp Merging in Dense Traffic | Yang Li et al. | 2503.02624 | translate | read | null |
| 2025-03-04 | Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models | Paul Stangel et al. | 2503.02623 | translate | read | null |
| 2025-03-04 | Reinforcement Learning-based Threat Assessment | Wuzhou Sun et al. | 2503.02612 | translate | read | null |
| 2025-03-04 | What Makes a Model Breathe? Understanding Reinforcement Learning Reward Function Design in Biomechanical User Simulation | Hannah Selder et al. | 2503.02571 | translate | read | null |

(<a href=../Reinforcement_Learning.md>back to Reinforcement Learning</a>)