Reinforcement Learning - 2025-03
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-03-31 | Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Yi Chen et.al. | 2503.24376 | translate | read | link |
| 2025-03-31 | Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning | Yubo Zhang et.al. | 2503.24296 | translate | read | null |
| 2025-03-31 | Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model | Jingcheng Hu et.al. | 2503.24290 | translate | read | link |
| 2025-03-31 | Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning | Jiacheng Lin et.al. | 2503.24289 | translate | read | link |
| 2025-03-31 | Moving Edge for On-Demand Edge Computing: An Uncertainty-aware Approach | Fangtong Zhou et.al. | 2503.24214 | translate | read | null |
| 2025-03-31 | Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantees via Constrained Mean-Field Reinforcement Learning | Matej Jusup et.al. | 2503.24183 | translate | read | link |
| 2025-03-31 | Learning a Canonical Basis of Human Preferences from Binary Ratings | Kailas Vodrahalli et.al. | 2503.24150 | translate | read | null |
| 2025-03-31 | Reinforcement Learning for Safe Autonomous Two Device Navigation of Cerebral Vessels in Mechanical Thrombectomy | Harry Robertshaw et.al. | 2503.24140 | translate | read | null |
| 2025-03-31 | Level the Level: Balancing Game Levels for Asymmetric Player Archetypes With Reinforcement Learning | Florian Rupp et.al. | 2503.24099 | translate | read | null |
| 2025-03-31 | HACTS: a Human-As-Copilot Teleoperation System for Robot Learning | Zhiyuan Xu et.al. | 2503.24070 | translate | read | null |
| 2025-03-28 | Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Weiqi Li et.al. | 2503.22679 | translate | read | link |
| 2025-03-28 | Empirical Analysis of Sim-and-Real Cotraining Of Diffusion Policies For Planar Pushing from Pixels | Adam Wei et.al. | 2503.22634 | translate | read | null |
| 2025-03-28 | Reinforcement Learning for Machine Learning Model Deployment: Evaluating Multi-Armed Bandits in ML Ops Environments | S. Aaron McClendon et.al. | 2503.22595 | translate | read | null |
| 2025-03-28 | On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations | Rajdeep Singh Hundal et.al. | 2503.22575 | translate | read | null |
| 2025-03-28 | Robust Offline Imitation Learning Through State-level Trajectory Stitching | Shuze Wang et.al. | 2503.22524 | translate | read | null |
| 2025-03-28 | Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments | Luke Rowe et.al. | 2503.22496 | translate | read | null |
| 2025-03-28 | Probabilistic Uncertain Reward Model: A Natural Generalization of Bradley-Terry Reward Model | Wangtao Sun et.al. | 2503.22480 | translate | read | null |
| 2025-03-28 | Control of Humanoid Robots with Parallel Mechanisms using Kinematic Actuation Models | Victor Lutz et.al. | 2503.22459 | translate | read | null |
| 2025-03-28 | Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning | Abdullah Vanlioglu et.al. | 2503.22456 | translate | read | null |
| 2025-03-28 | Reinforcement learning for efficient and robust multi-setpoint and multi-trajectory tracking in bioprocesses | Sebastián Espinel-Ríos et.al. | 2503.22409 | translate | read | null |
| 2025-03-27 | Video-R1: Reinforcing Video Reasoning in MLLMs | Kaituo Feng et.al. | 2503.21776 | translate | read | link |
| 2025-03-27 | ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation | Zhicheng Lee et.al. | 2503.21729 | translate | read | link |
| 2025-03-27 | Collab: Controlled Decoding using Mixture of Agents for LLM Alignment | Souradip Chakraborty et.al. | 2503.21720 | translate | read | null |
| 2025-03-27 | Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks | Wenqi Zhang et.al. | 2503.21696 | translate | read | link |
| 2025-03-27 | LLM-Gomoku: A Large Language Model-Based System for Strategic Gomoku with Self-Play and Reinforcement Learning | Hui Wang et.al. | 2503.21683 | translate | read | null |
| 2025-03-27 | A tale of two goals: leveraging sequentiality in multi-goal scenarios | Olivier Serris et.al. | 2503.21677 | translate | read | null |
| 2025-03-27 | UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning | Zhengxi Lu et.al. | 2503.21620 | translate | read | link |
| 2025-03-27 | A Deep Reinforcement Learning-based Approach for Adaptive Handover Protocols | Johannes Voigt et.al. | 2503.21601 | translate | read | null |
| 2025-03-27 | DATA-WA: Demand-based Adaptive Task Assignment with Dynamic Worker Availability Windows | Jinwen Chen et.al. | 2503.21458 | translate | read | null |
| 2025-03-27 | On Learning-Based Traffic Monitoring With a Swarm of Drones | Marko Maljkovic et.al. | 2503.21433 | translate | read | null |
| 2025-03-26 | Understanding R1-Zero-Like Training: A Critical Perspective | Zichen Liu et.al. | 2503.20783 | translate | read | link |
| 2025-03-27 | Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning | Huajie Tan et.al. | 2503.20752 | translate | read | link |
| 2025-03-26 | Graph-Enhanced Model-Free Reinforcement Learning Agents for Efficient Power Grid Topological Control | Eloy Anguiano Batanero et.al. | 2503.20688 | translate | read | null |
| 2025-03-26 | Flip Learning: Weakly Supervised Erase to Segment Nodules in Breast Ultrasound | Yuhao Huang et.al. | 2503.20685 | translate | read | null |
| 2025-03-26 | Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | Han Wu et.al. | 2503.20641 | translate | read | link |
| 2025-03-26 | State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning | Zongyuan Zhang et.al. | 2503.20613 | translate | read | null |
| 2025-03-26 | Optimizing Case-Based Reasoning System for Functional Test Script Generation with Large Language Models | Siyuan Guo et.al. | 2503.20576 | translate | read | null |
| 2025-03-26 | Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems | Rakesh Nadig et.al. | 2503.20507 | translate | read | null |
| 2025-03-26 | Multi-agent Uncertainty-Aware Pessimistic Model-Based Reinforcement Learning for Connected Autonomous Vehicles | Ruoqi Wen et.al. | 2503.20462 | translate | read | null |
| 2025-03-26 | The Crucial Role of Problem Formulation in Real-World Reinforcement Learning | Georg Schäfer et.al. | 2503.20442 | translate | read | null |
| 2025-03-25 | Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | Xiaoyu Tian et.al. | 2503.19855 | translate | read | link |
| 2025-03-25 | Optimal Path Planning and Cost Minimization for a Drone Delivery System Via Model Predictive Control | Muhammad Al-Zafar Khan et.al. | 2503.19699 | translate | read | null |
| 2025-03-25 | Risk-Aware Reinforcement Learning for Autonomous Driving: Improving Safety When Driving through Intersection | Bo Leng et.al. | 2503.19690 | translate | read | null |
| 2025-03-25 | Learning to chain-of-thought with Jensen’s evidence lower bound | Yunhao Tang et.al. | 2503.19618 | translate | read | null |
| 2025-03-25 | RL-finetuning LLMs from on- and off-policy data with a single algorithm | Yunhao Tang et.al. | 2503.19612 | translate | read | null |
| 2025-03-25 | Optimizing Language Models for Inference Time Objectives using Reinforcement Learning | Yunhao Tang et.al. | 2503.19595 | translate | read | null |
| 2025-03-25 | One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF | Xin Cai et.al. | 2503.19523 | translate | read | null |
| 2025-03-25 | ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning | Mingyang Chen et.al. | 2503.19470 | translate | read | link |
| 2025-03-25 | Multi-Agent Deep Reinforcement Learning for Safe Autonomous Driving with RICS-Assisted MEC | Xueyao Zhang et.al. | 2503.19418 | translate | read | null |
| 2025-03-25 | NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios | Songyi Gao et.al. | 2503.19267 | translate | read | link |
| 2025-03-24 | Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | Brian R. Bartoldson et.al. | 2503.18929 | translate | read | link |
| 2025-03-24 | SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Weihao Zeng et.al. | 2503.18892 | translate | read | link |
| 2025-03-24 | Bootstrapped Model Predictive Control | Yuhang Wang et.al. | 2503.18871 | translate | read | link |
| 2025-03-24 | Learning Multi-Robot Coordination through Locality-Based Factorized Multi-Agent Actor-Critic Algorithm | Chak Lam Shek et.al. | 2503.18816 | translate | read | null |
| 2025-03-24 | Sample-Efficient Reinforcement Learning of Koopman eNMPC | Daniel Mayfrank et.al. | 2503.18787 | translate | read | null |
| 2025-03-24 | Simulation-Driven Balancing of Competitive Game Levels with Reinforcement Learning | Florian Rupp et.al. | 2503.18748 | translate | read | null |
| 2025-03-24 | RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation | Chengbo Yuan et.al. | 2503.18738 | translate | read | null |
| 2025-03-24 | FF-SRL: High Performance GPU-Based Surgical Simulation For Robot Learning | Diego Dall’Alba et.al. | 2503.18616 | translate | read | null |
| 2025-03-24 | Adventurer: Exploration with BiGAN for Deep Reinforcement Learning | Yongshuai Liu et.al. | 2503.18612 | translate | read | null |
| 2025-03-24 | Reinforcement Learning in Switching Non-Stationary Markov Decision Processes: Algorithms and Convergence Analysis | Mohsen Amiri et.al. | 2503.18607 | translate | read | null |
| 2025-03-21 | OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | Yihe Deng et.al. | 2503.17352 | translate | read | link |
| 2025-03-21 | Capturing Individual Human Preferences with Reward Features | André Barreto et.al. | 2503.17338 | translate | read | null |
| 2025-03-21 | FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | Mingyang Song et.al. | 2503.17287 | translate | read | link |
| 2025-03-21 | Curriculum RL meets Monte Carlo Planning: Optimization of a Real World Container Management Problem | Abhijeet Pendyala et.al. | 2503.17194 | translate | read | null |
| 2025-03-21 | Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | Chan Kim et.al. | 2503.17125 | translate | read | null |
| 2025-03-21 | Neural-Guided Equation Discovery | Jannis Brugger et.al. | 2503.16953 | translate | read | null |
| 2025-03-21 | A New Segment Routing method with Swap Node Selection Strategy Based on Deep Reinforcement Learning for Software Defined Network | Miao Ye et.al. | 2503.16914 | translate | read | null |
| 2025-03-21 | Federated Digital Twin Construction via Distributed Sensing: A Game-Theoretic Online Optimization with Overlapping Coalitions | Ruoyang Chen et.al. | 2503.16823 | translate | read | null |
| 2025-03-21 | BEAC: Imitating Complex Exploration and Task-oriented Behaviors for Invisible Object Nonprehensile Manipulation | Hirotaka Tahara et.al. | 2503.16803 | translate | read | null |
| 2025-03-21 | Causally Aligned Curriculum Learning | Mingxuan Li et.al. | 2503.16799 | translate | read | null |
| 2025-03-20 | Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Yang Sui et.al. | 2503.16419 | translate | read | link |
| 2025-03-20 | RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints | Yiran Qin et.al. | 2503.16408 | translate | read | null |
| 2025-03-20 | Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming | Minori Narita et.al. | 2503.16371 | translate | read | null |
| 2025-03-20 | JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse | Muyao Li et.al. | 2503.16365 | translate | read | link |
| 2025-03-21 | Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Zhaowei Liu et.al. | 2503.16252 | translate | read | link |
| 2025-03-20 | Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t | Quy-Anh Dang et.al. | 2503.16219 | translate | read | link |
| 2025-03-20 | Explosive Jumping with Rigid and Articulated Soft Quadrupeds via Example Guided Reinforcement Learning | Georgios Apostolides et.al. | 2503.16197 | translate | read | null |
| 2025-03-20 | Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning | Yuki Akiyama et.al. | 2503.16192 | translate | read | null |
| 2025-03-20 | CLS-RL: Image Classification with Rule-Based Reinforcement Learning | Ming Li et.al. | 2503.16188 | translate | read | link |
| 2025-03-20 | Cultural Alignment in Large Language Models Using Soft Prompt Tuning | Reem I. Masoud et.al. | 2503.16094 | translate | read | null |
| 2025-03-19 | Learning to Play Piano in the Real World | Yves-Simon Zeulner et.al. | 2503.15481 | translate | read | null |
| 2025-03-19 | What Makes a Reward Model a Good Teacher? An Optimization Perspective | Noam Razin et.al. | 2503.15477 | translate | read | link |
| 2025-03-19 | CCDP: Composition of Conditional Diffusion Policies with Guided Sampling | Amirreza Razmjoo et.al. | 2503.15386 | translate | read | null |
| 2025-03-19 | Online Imitation Learning for Manipulation via Decaying Relative Correction through Teleoperation | Cheng Pan et.al. | 2503.15368 | translate | read | null |
| 2025-03-19 | Optimizing Decomposition for Optimal Claim Verification | Yining Lu et.al. | 2503.15354 | translate | read | link |
| 2025-03-19 | aiXcoder-7B-v2: Training LLMs to Fully Utilize the Long Context in Repository-level Code Completion | Jia Li et.al. | 2503.15301 | translate | read | null |
| 2025-03-19 | Reinforcement Learning for Robust Athletic Intelligence: Lessons from the 2nd ‘AI Olympics with RealAIGym’ Competition | Felix Wiebe et.al. | 2503.15290 | translate | read | null |
| 2025-03-19 | DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning | Ruowen Zhao et.al. | 2503.15265 | translate | read | link |
| 2025-03-19 | Partially Observable Reinforcement Learning with Memory Traces | Onno Eberhard et.al. | 2503.15200 | translate | read | null |
| 2025-03-19 | Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach | Mohamed Hassouna et.al. | 2503.15190 | translate | read | null |
| 2025-03-18 | DAPO: An Open-Source LLM Reinforcement Learning System at Scale | Qiying Yu et.al. | 2503.14476 | translate | read | null |
| 2025-03-18 | Pauli Network Circuit Synthesis with Reinforcement Learning | Ayushi Dubal et.al. | 2503.14448 | translate | read | null |
| 2025-03-18 | Flying in Highly Dynamic Environments with End-to-end Learning Approach | Xiyu Fan et.al. | 2503.14352 | translate | read | null |
| 2025-03-18 | MANTRA: Enhancing Automated Method-Level Refactoring with Contextual RAG and Multi-Agent LLM Collaboration | Yisen Xu et.al. | 2503.14340 | translate | read | null |
| 2025-03-18 | Revealing higher-order neural representations with generative artificial intelligence | Hojjat Azimi Asrari et.al. | 2503.14333 | translate | read | null |
| 2025-03-18 | Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | Nicolas Le Roux et.al. | 2503.14286 | translate | read | null |
| 2025-03-18 | Integral modelling and Reinforcement Learning control of 3D liquid metal coating on a moving substrate | Fabio Pino et.al. | 2503.14270 | translate | read | null |
| 2025-03-18 | Automating Experimental Optics with Sample Efficient Machine Learning Methods | Arindam Saha et.al. | 2503.14260 | translate | read | null |
| 2025-03-18 | Quantization-Free Autoregressive Action Transformer | Ziyad Sheebaelhamd et.al. | 2503.14259 | translate | read | null |
| 2025-03-18 | CTSAC: Curriculum-Based Transformer Soft Actor-Critic for Goal-Oriented Robot Exploration | Chunyu Yang et.al. | 2503.14254 | translate | read | null |
| 2025-03-17 | Uncovering Utility Functions from Observed Outcomes | Marta Grzeskiewicz et.al. | 2503.13432 | translate | read | null |
| 2025-03-17 | FLEX: A Framework for Learning Robot-Agnostic Force-based Skills Involving Sustained Contact Object Manipulation | Shijie Fang et.al. | 2503.13418 | translate | read | null |
| 2025-03-17 | A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives | Weiqiang Jin et.al. | 2503.13415 | translate | read | null |
| 2025-03-17 | TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM | Ye Wang et.al. | 2503.13377 | translate | read | link |
| 2025-03-17 | Agents Play Thousands of 3D Video Games | Zhongwen Xu et.al. | 2503.13356 | translate | read | null |
| 2025-03-17 | Local-Global Learning of Interpretable Control Policies: The Interface between MPC and Reinforcement Learning | Thomas Banker et.al. | 2503.13289 | translate | read | null |
| 2025-03-17 | Timing the Match: A Deep Reinforcement Learning Approach for Ride-Hailing and Ride-Pooling Services | Yiman Bao et.al. | 2503.13200 | translate | read | null |
| 2025-03-17 | A representational framework for learning and encoding structurally enriched trajectories in complex agent environments | Corina Catarau-Cotutiu et.al. | 2503.13194 | translate | read | null |
| 2025-03-17 | HybridGen: VLM-Guided Hybrid Planning for Scalable Data Generation of Imitation Learning | Wensheng Wang et.al. | 2503.13171 | translate | read | null |
| 2025-03-17 | Efficient Imitation Under Misspecification | Nicolas Espinosa-Dice et.al. | 2503.13162 | translate | read | null |
| 2025-03-14 | Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning | Siyuan Huang et.al. | 2503.11646 | translate | read | null |
| 2025-03-14 | Scaling the Automated Discovery of Quantum Circuits via Reinforcement Learning with Gadgets | Jan Olle et.al. | 2503.11638 | translate | read | null |
| 2025-03-14 | Unicorn: A Universal and Collaborative Reinforcement Learning Approach Towards Generalizable Network-Wide Traffic Signal Control | Yifeng Zhang et.al. | 2503.11488 | translate | read | null |
| 2025-03-14 | A Review of DeepSeek Models’ Key Innovative Techniques | Chengen Wang et.al. | 2503.11486 | translate | read | null |
| 2025-03-14 | Dynamic Obstacle Avoidance with Bounded Rationality Adversarial Reinforcement Learning | Jose-Luis Holgado-Alvarez et.al. | 2503.11467 | translate | read | null |
| 2025-03-14 | Optimizing 6G Dense Network Deployment for the Metaverse Using Deep Reinforcement Learning | Jie Zhang et.al. | 2503.11449 | translate | read | null |
| 2025-03-14 | Adaptive Torque Control of Exoskeletons under Spasticity Conditions via Reinforcement Learning | Andrés Chavarrías et.al. | 2503.11433 | translate | read | null |
| 2025-03-14 | TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation | Hongxiang Zhao et.al. | 2503.11423 | translate | read | null |
| 2025-03-14 | Reinforcement Learning-Based Controlled Switching Approach for Inrush Current Minimization in Power Transformers | Jone Ugarte Valdivielso et.al. | 2503.11398 | translate | read | null |
| 2025-03-14 | Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model | Moritz A. Zanger et.al. | 2503.11339 | translate | read | null |
| 2025-03-13 | NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models | Mert Albaba et.al. | 2503.10626 | translate | read | null |
| 2025-03-13 | R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization | Yi Yang et.al. | 2503.10615 | translate | read | link |
| 2025-03-13 | The Lagrangian Method for Solving Constrained Markov Games | Soham Das et.al. | 2503.10561 | translate | read | null |
| 2025-03-13 | Towards Safe Path Tracking Using the Simplex Architecture | Georg Jäger et.al. | 2503.10559 | translate | read | null |
| 2025-03-13 | SySLLM: Generating Synthesized Policy Summaries for Reinforcement Learning Agents Using Large Language Models | Sahar Admoni et.al. | 2503.10509 | translate | read | null |
| 2025-03-13 | Learning Robotic Policy with Imagined Transition: Mitigating the Trade-off between Robustness and Optimality | Wei Xiao et.al. | 2503.10484 | translate | read | null |
| 2025-03-13 | SortingEnv: An Extendable RL-Environment for an Industrial Sorting Process | Tom Maus et.al. | 2503.10466 | translate | read | null |
| 2025-03-13 | Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Liang Wen et.al. | 2503.10460 | translate | read | link |
| 2025-03-13 | Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback | Derun Li et.al. | 2503.10434 | translate | read | null |
| 2025-03-13 | Towards Constraint-Based Adaptive Hypergraph Learning for Solving Vehicle Routing: An End-to-End Solution | Zhenwei Wang et.al. | 2503.10421 | translate | read | null |
| 2025-03-12 | Strategyproof Reinforcement Learning from Human Feedback | Thomas Kleine Buening et.al. | 2503.09561 | translate | read | null |
| 2025-03-12 | Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Bowen Jin et.al. | 2503.09516 | translate | read | link |
| 2025-03-12 | RESTRAIN: Reinforcement Learning-Based Secure Framework for Trigger-Action IoT Environment | Md Morshed Alam et.al. | 2503.09513 | translate | read | null |
| 2025-03-12 | Reinforcement Learning is all You Need | Yongsheng Lian et.al. | 2503.09512 | translate | read | null |
| 2025-03-12 | ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning | Ziyu Wan et.al. | 2503.09501 | translate | read | link |
| 2025-03-12 | Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data Traffic | Kexuan Wang et.al. | 2503.09391 | translate | read | null |
| 2025-03-12 | Evaluating Reinforcement Learning Safety and Trustworthiness in Cyber-Physical Systems | Katherine Dearstyne et.al. | 2503.09388 | translate | read | null |
| 2025-03-12 | Rule-Guided Reinforcement Learning Policy Evaluation and Improvement | Martin Tappler et.al. | 2503.09270 | translate | read | null |
| 2025-03-12 | Large-scale Regional Traffic Signal Control Based on Single-Agent Reinforcement Learning | Qiang Li et.al. | 2503.09252 | translate | read | null |
| 2025-03-12 | MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics | Shuguang Chu et.al. | 2503.09203 | translate | read | null |
| 2025-03-11 | MoE-Loco: Mixture of Experts for Multitask Locomotion | Runhan Huang et.al. | 2503.08564 | translate | read | null |
| 2025-03-11 | Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies | Chen Xu et.al. | 2503.08558 | translate | read | null |
| 2025-03-11 | TLA: Tactile-Language-Action Model for Contact-Rich Manipulation | Peng Hao et.al. | 2503.08548 | translate | read | null |
| 2025-03-11 | GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training | Tong Wei et.al. | 2503.08525 | translate | read | null |
| 2025-03-11 | Hierarchical Multi Agent DRL for Soft Handovers Between Edge Clouds in Open RAN | F. Giarrè et.al. | 2503.08493 | translate | read | null |
| 2025-03-11 | Hybrid Deep Reinforcement Learning for Radio Tracer Localisation in Robotic-assisted Radioguided Surgery | Hanyi Zhang et.al. | 2503.08492 | translate | read | null |
| 2025-03-12 | An Autonomous RL Agent Methodology for Dynamic Web UI Testing in a BDD Framework | Ali Hassaan Mughal et.al. | 2503.08464 | translate | read | null |
| 2025-03-11 | V-Max: Making RL practical for Autonomous Driving | Valentin Charraut et.al. | 2503.08388 | translate | read | link |
| 2025-03-11 | Gait in Eight: Efficient On-Robot Learning for Omnidirectional Quadruped Locomotion | Nico Bohlinger et.al. | 2503.08375 | translate | read | null |
| 2025-03-11 | LiPS: Large-Scale Humanoid Robot Reinforcement Learning with Parallel-Series Structures | Qiang Zhang et.al. | 2503.08349 | translate | read | null |
| 2025-03-10 | Is a Good Foundation Necessary for Efficient Reinforcement Learning? The Computational Role of the Base Model in Exploration | Dylan J. Foster et.al. | 2503.07453 | translate | read | null |
| 2025-03-10 | DRESS: Diffusion Reasoning-based Reward Shaping Scheme For Intelligent Networks | Feiran You et.al. | 2503.07433 | translate | read | null |
| 2025-03-10 | The Interplay of AI-and-RAN: Dynamic Resource Allocation for Converged 6G Platform | Syed Danial Ali Shah et.al. | 2503.07420 | translate | read | null |
| 2025-03-10 | Cost-Effective Design of Grid-tied Community Microgrid | Moslem Uddin et.al. | 2503.07414 | translate | read | null |
| 2025-03-10 | PER-DPP Sampling Framework and Its Application in Path Planning | Junzhe Wang et.al. | 2503.07411 | translate | read | null |
| 2025-03-10 | Towards Safe Robot Foundation Models | Maximilian Tölle et.al. | 2503.07404 | translate | read | null |
| 2025-03-10 | Q-MARL: A quantum-inspired algorithm using neural message passing for large-scale multi-agent reinforcement learning | Kha Vo et.al. | 2503.07397 | translate | read | null |
| 2025-03-10 | AttentionSwarm: Reinforcement Learning with Attention Control Barier Function for Crazyflie Drones in Dynamic Environments | Grik Tadevosyan et.al. | 2503.07376 | translate | read | null |
| 2025-03-10 | MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning | Fanqing Meng et.al. | 2503.07365 | translate | read | link |
| 2025-03-10 | Artificial Utopia: Simulation and Intelligent Agents for a Democratised Future | Yannick Oswald et.al. | 2503.07364 | translate | read | null |
| 2025-03-07 | Multi-Fidelity Policy Gradient Algorithms | Xinjie Liu et.al. | 2503.05696 | translate | read | null |
| 2025-03-07 | dARt Vinci: Egocentric Data Collection for Surgical Robot Learning at Scale | Yihao Liu et.al. | 2503.05646 | translate | read | null |
| 2025-03-07 | R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | Huatong Song et.al. | 2503.05592 | translate | read | null |
| 2025-03-07 | InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model | Feeza Khan Khanzada et.al. | 2503.05573 | translate | read | null |
| 2025-03-07 | Tractable Representations for Convergent Approximation of Distributional HJB Equations | Julie Alhosh et.al. | 2503.05563 | translate | read | null |
| 2025-03-07 | Impoola: The Power of Average Pooling for Image-Based Deep Reinforcement Learning | Raphael Trumpp et.al. | 2503.05546 | translate | read | null |
| 2025-03-07 | RiLoCo: An ISAC-oriented AI Solution to Build RIS-empowered Networks | Guillermo Encinas-Lago et.al. | 2503.05480 | translate | read | null |
| 2025-03-07 | Controllable Complementarity: Subjective Preferences in Human-AI Collaboration | Chase McDonald et.al. | 2503.05455 | translate | read | null |
| 2025-03-07 | R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning | Jiaxing Zhao et.al. | 2503.05379 | translate | read | null |
| 2025-03-07 | Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning | Hyungkyu Kang et.al. | 2503.05306 | translate | read | null |
| 2025-03-06 | Sample-Optimal Agnostic Boosting with Unlabeled Data | Udaya Ghai et.al. | 2503.04706 | translate | read | null |
| 2025-03-06 | L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | Pranjal Aggarwal et.al. | 2503.04697 | translate | read | null |
| 2025-03-06 | Multi-Agent Inverse Q-Learning from Demonstrations | Nathaniel Haynam et.al. | 2503.04679 | translate | read | null |
| 2025-03-06 | Learning Generalizable Language-Conditioned Cloth Manipulation from Long Demonstrations | Hanyi Zhao et.al. | 2503.04557 | translate | read | null |
| 2025-03-06 | PALo: Learning Posture-Aware Locomotion for Quadruped Robots | Xiangyu Miao et.al. | 2503.04462 | translate | read | null |
| 2025-03-06 | AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services | Xiaoqi Wang et.al. | 2503.04418 | translate | read | null |
| 2025-03-06 | Learning Transformer-based World Models with Contrastive Predictive Coding | Maxime Burchi et.al. | 2503.04416 | translate | read | null |
| 2025-03-06 | Energy-Aware Task Offloading for Rotatable STAR-RIS-Enhanced Mobile Edge Computing Systems | Dongdong Yang et.al. | 2503.04397 | translate | read | null |
| 2025-03-06 | Delay-Aware Digital Twin Synchronization in Mobile Edge Networks with Semantic Communications | Bin Li et.al. | 2503.04387 | translate | read | null |
| 2025-03-06 | Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models | Niccolò Turcato et.al. | 2503.04280 | translate | read | null |
| 2025-03-05 | Curating Demonstrations using Online Experience | Annie S. Chen et.al. | 2503.03707 | translate | read | null |
| 2025-03-05 | A Generative Approach to High Fidelity 3D Reconstruction from Text Data | Venkat Kumar R et.al. | 2503.03664 | translate | read | null |
| 2025-03-05 | Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns | Dong Tian et.al. | 2503.03660 | translate | read | null |
| 2025-03-05 | Improving Neutral Point of View Text Generation through Parameter-Efficient Reinforcement Learning and a Small-Scale High-Quality Dataset | Jessica Hoffmann et.al. | 2503.03654 | translate | read | null |
| 2025-03-05 | Olympus: A Jumping Quadruped for Planetary Exploration Utilizing Reinforcement Learning for In-Flight Attitude Control | Jørgen Anker Olsen et.al. | 2503.03574 | translate | read | null |
| 2025-03-05 | Probabilistic Insights for Efficient Exploration Strategies in Reinforcement Learning | Ernesto Garcia et.al. | 2503.03565 | translate | read | null |
| 2025-03-05 | DO-IQS: Dynamics-Aware Offline Inverse Q-Learning for Optimal Stopping with Unknown Gain Functions | Anna Kuchko et.al. | 2503.03515 | translate | read | null |
| 2025-03-05 | SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning | Borong Zhang et.al. | 2503.03480 | translate | read | null |
| 2025-03-05 | Continuous Control of Diverse Skills in Quadruped Robots Without Complete Expert Datasets | Jiaxin Tu et.al. | 2503.03476 | translate | read | null |
| 2025-03-05 | Navigating Intelligence: A Survey of Google OR-Tools and Machine Learning for Global Path Planning in Autonomous Vehicles | Alexandre Benoit et.al. | 2503.03338 | translate | read | null |
| 2025-03-04 | Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation | Han Xue et.al. | 2503.02881 | translate | read | null |
| 2025-03-04 | AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation | Songming Zhang et.al. | 2503.02832 | translate | read | null |
| 2025-03-04 | Meta-Learning to Explore via Memory Density Feedback | Kevin L. McKee et.al. | 2503.02831 | translate | read | null |
| 2025-03-04 | Quantitative Resilience Modeling for Autonomous Cyber Defense | Xavier Cadet et.al. | 2503.02780 | translate | read | null |
| 2025-03-04 | Variable-Friction In-Hand Manipulation for Arbitrary Objects via Diffusion-Based Imitation Learning | Qiyang Yan et.al. | 2503.02738 | translate | read | null |
| 2025-03-04 | Learning-Based Passive Fault-Tolerant Control of a Quadrotor with Rotor Failure | Jiehao Chen et.al. | 2503.02649 | translate | read | null |
| 2025-03-04 | Human-aligned Safe Reinforcement Learning for Highway On-Ramp Merging in Dense Traffic | Yang Li et.al. | 2503.02624 | translate | read | null |
| 2025-03-04 | Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models | Paul Stangel et.al. | 2503.02623 | translate | read | null |
| 2025-03-04 | Reinforcement Learning-based Threat Assessment | Wuzhou Sun et.al. | 2503.02612 | translate | read | null |
| 2025-03-04 | What Makes a Model Breathe? Understanding Reinforcement Learning Reward Function Design in Biomechanical User Simulation | Hannah Selder et.al. | 2503.02571 | translate | read | null |
(<a href=../Reinforcement_Learning.md>back to Reinforcement Learning</a>)