Reinforcement Learning

Publish Date Title Authors PDF Code
2025-12-18 Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification Qihao Liu et.al. 2512.16921 null
2025-12-18 AdaTooler-V: Adaptive Tool-Use for Images and Videos Chaoyang Wang et.al. 2512.16918 null
2025-12-18 Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning Qihao Liu et.al. 2512.16917 null
2025-12-18 Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward Peter Chen et.al. 2512.16912 null
2025-12-18 Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning Andrew Wagenmaker et.al. 2512.16911 null
2025-12-18 MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning Yuanchen Ju et.al. 2512.16909 null
2025-12-18 AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning Tzu-Han Lin et.al. 2512.16883 null
2025-12-18 A survey of the orienteering problem: model evolution, algorithmic advances, and future directions Songhao Shen et.al. 2512.16865 null
2025-12-18 RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing Tianyuan Qu et.al. 2512.16864 null
2025-12-18 ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning Zihan Zhou et.al. 2512.16861 null
2025-12-18 Meta-RL Induces Exploration in Language Agents Yulun Jiang et.al. 2512.16848 null
2025-12-18 Coordinated Anti-Jamming Resilience in Swarm Networks via Multi-Agent Reinforcement Learning Bahman Abolhassani et.al. 2512.16813 null
2025-12-18 Olaf: Bringing an Animated Character to Life in the Physical World David Müller et.al. 2512.16705 null
2025-12-18 JustRL: Scaling a 1.5B LLM with a Simple RL Recipe Bingxiang He et.al. 2512.16649 null
2025-12-18 Implementing a Sharia Chatbot as a Consultation Medium for Questions About Islam Wisnu Uriawan et.al. 2512.16644 null
2025-12-18 Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game Barna Pásztor et.al. 2512.16626 null
2025-12-18 Non-Asymptotic Global Convergence of PPO-Clip Yin Liu et.al. 2512.16565 null
2025-12-18 ParamExplorer: A framework for exploring parameters in generative art Julien Gachadoat et.al. 2512.16529 null
2025-12-18 Guiding Perception-Reasoning Closer to Human in Blind Image Quality Assessment Yuan Li et.al. 2512.16484 null
2025-12-18 E-SDS: Environment-aware See it, Do it, Sorted - Automated Environment-Aware Reinforcement Learning for Humanoid Locomotion Enis Yalcin et.al. 2512.16446 null
2025-12-18 StarCraft+: Benchmarking Multi-agent Algorithms in Adversary Paradigm Yadong Li et.al. 2512.16444 null
2025-12-18 NDRL: Cotton Irrigation and Nitrogen Application with Nested Dual-Agent Reinforcement Learning Ruifeng Xu et.al. 2512.16408 null
2025-12-18 Hypernetworks That Evolve Themselves Joachim Winther Pedersen et.al. 2512.16406 null
2025-12-18 Machine Learning-based Optimal Control for Colloidal Self-Assembly Andres Lizano-Villalobos et.al. 2512.16402 null
2025-12-18 ManiLong-Shot: Interaction-Aware One-Shot Imitation Learning for Long-Horizon Manipulation Zixuan Chen et.al. 2512.16302 null
2025-12-18 Simultaneous Secrecy and Covert Communications (SSACC) in Mobility-Aware RIS-Aided Networks Yanyu Cheng et.al. 2512.16224 null
2025-12-18 Visual Alignment of Medical Vision-Language Models for Grounded Radiology Report Generation Sarosij Bose et.al. 2512.16201 null
2025-12-18 MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation Pengyu Wang et.al. 2512.16145 null
2025-12-18 INTELLECT-3: Technical Report Prime Intellect Team et.al. 2512.16144 null
2025-12-17 Techno-economic optimization of a heat-pipe microreactor, part I: theory and cost optimization Paul Seurin et.al. 2512.16032 null
2025-12-17 Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models Caner Erden et.al. 2512.15973 null
2025-12-17 Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning Polaris Jhandi et.al. 2512.15943 null
2025-12-17 DSO: Direct Steering Optimization for Bias Mitigation Lucas Monteiro Paes et.al. 2512.15926 null
2025-12-15 Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT) Akhil Sharma et.al. 2512.15790 null
2025-12-17 Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Zhenwen Liang et.al. 2512.15687 null
2025-12-17 Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning Jiaqi Xu et.al. 2512.15662 null
2025-12-17 Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction Mathieu Blondel et.al. 2512.15605 null
2025-12-17 Deep Reinforcement Learning for EH-Enabled Cognitive-IoT Under Jamming Attacks Nadia Abdolkhani et.al. 2512.15558 null
2025-12-17 Autonomous Pressure Control in MuVacAS via Deep Reinforcement Learning and Deep Learning Surrogate Models Guillermo Rodriguez-Llorente et.al. 2512.15521 null
2025-12-17 Double Horizon Model-Based Policy Optimization Akihiro Kubo et.al. 2512.15439 null
2025-12-17 FM-EAC: Feature Model-based Enhanced Actor-Critic for Multi-Task Control in Dynamic Environments Quanxi Zhou et.al. 2512.15430 null
2025-12-17 Can AI Generate more Comprehensive Test Scenarios? Review on Automated Driving Systems Test Scenario Generation Methods Ji Zhou et.al. 2512.15422 null
2025-12-17 EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning Jianfei Ma et.al. 2512.15405 null
2025-12-17 Graph Contextual Reinforcement Learning for Efficient Directed Controller Synthesis Toshihide Ubukata et.al. 2512.15295 null
2025-12-17 Learning-Based Phase Shift Optimization of Liquid Crystal RIS in Dynamic mmWave Networks Le Hao et.al. 2512.15279 null
2025-12-17 Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning Yiliu Sun et.al. 2512.15274 null
2025-12-17 EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence Jiaxu Wan et.al. 2512.15160 null
2025-12-17 Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning Weiqin Wang et.al. 2512.15146 null
2025-12-17 Automatic Reward Shaping from Multi-Objective Human Heuristics Yuqing Xie et.al. 2512.15120 null
2025-12-17 QoS-Aware Hierarchical Reinforcement Learning for Joint Link Selection and Trajectory Optimization in SAGIN-Supported UAV Mobility Management Jiayang Wan et.al. 2512.15119 null
2025-12-17 Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models Jinwu Hu et.al. 2512.15089 null
2025-12-17 Deep Reinforcement Learning for Joint Time and Power Management in SWIPT-EH CIoT Nadia Abdolkhani et.al. 2512.15062 null
2025-12-17 Spectral Representation-based Reinforcement Learning Chenxiao Gao et.al. 2512.15036 null
2025-12-17 ISS Policy: Scalable Diffusion Policy with Implicit Scene Supervision Wenlong Xia et.al. 2512.15020 null
2025-12-17 Multi-Objective Bayesian Optimization of Deep Reinforcement Learning for Environmental, Social, and Governance (ESG) Financial Portfolio Management E. C. Garrido-Merchán et.al. 2512.14992 null
2025-12-17 Adaptive Partitioning and Learning for Stochastic Control of Diffusion Processes Hanqing Jin et.al. 2512.14991 null
2025-12-16 Puzzle Curriculum GRPO for Vision-Centric Reasoning Ahmadreza Jeddi et.al. 2512.14944 null
2025-12-16 Imitation Learning for Multi-turn LM Agents via On-policy Expert Corrections Niklas Lauffer et.al. 2512.14895 null
2025-12-16 Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse Jingwei Chen et.al. 2512.14879 null
2025-12-16 TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs Jun Zhang et.al. 2512.14698 null
2025-12-16 CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives Zihan Wang et.al. 2512.14696 null
2025-12-16 Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes Alessandro Trapasso et.al. 2512.14617 null
2025-12-16 RecGPT-V2 Technical Report Chao Yi et.al. 2512.14503 null
2025-12-16 Hybrid Cognitive IoT with Cooperative Caching and SWIPT-EH: A Hierarchical Reinforcement Learning Framework Nadia Abdolkhani et.al. 2512.14488 null
2025-12-16 Context-Picker: Dynamic context selection using multi-stage reinforcement learning Siyuan Zhu et.al. 2512.14465 null
2025-12-16 A data-physics hybrid generative model for patient-specific post-stroke motor rehabilitation using wearable sensor data Yanning Dai et.al. 2512.14329 null
2025-12-16 Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations Xudong Han et.al. 2512.14321 null
2025-12-16 A Threshold-Triggered Deep Q-Network-Based Framework for Self-Healing in Autonomic Software-Defined IIoT-Edge Networks Agrippina Mwangi et.al. 2512.14297 null
2025-12-16 GLM-TTS Technical Report Jiayan Cui et.al. 2512.14291 null
2025-12-16 Understanding and Improving Hyperbolic Deep Reinforcement Learning Timo Klein et.al. 2512.14202 null
2025-12-16 Incentivizing Tool-augmented Thinking with Images for Medical Image Analysis Yankai Jiang et.al. 2512.14157 null
2025-12-16 A First-Order Logic-Based Alternative to Reward Models in RLHF Chunjin Jian et.al. 2512.14100 null
2025-12-16 RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees Junjie Ma et.al. 2512.14069 null
2025-12-16 Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning Amir M. Soufi Enayati et.al. 2512.14057 null
2025-12-16 OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving Zhenguo Zhang et.al. 2512.14044 null
2025-12-16 Sample-Efficient Robot Skill Learning for Construction Tasks: Benchmarking Hierarchical Reinforcement Learning and Vision-Language-Action VLA Model Zhaofeng Hu et.al. 2512.14031 null
2025-12-16 Cooperative Caching Towards Efficient Spectrum Utilization in Cognitive-IoT Networks Nadia Abdolkhani et.al. 2512.14029 null
2025-12-16 Hierarchical Deep Reinforcement Learning for Robust Access in Cognitive IoT Networks under Smart Jamming Attacks Nadia Abdolkhani et.al. 2512.14013 null
2025-12-15 Adaptive digital twins for predictive decision-making: Online Bayesian learning of transition dynamics Eugenio Varetti et.al. 2512.13919 null
2025-12-15 Group-Theoretic Reinforcement Learning of Dynamical Decoupling Sequences Charles Marrder et.al. 2512.13890 null
2025-12-15 SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning Jitesh Jain et.al. 2512.13874 null
2025-12-15 Explainable reinforcement learning from human feedback to improve alignment Shicheng Liu et.al. 2512.13837 null
2025-12-13 RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing Yuhan Tang et.al. 2512.13727 null
2025-12-13 Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce Sayak Chakrabarty et.al. 2512.13726 null
2025-12-15 AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection Junwen Miao et.al. 2512.13671 null
2025-12-15 A Scientific Reasoning Model for Organic Synthesis Procedure Generation Guoqing Liu et.al. 2512.13668 null
2025-12-15 Advancing Machine Learning Optimization of Chiral Photonic Metasurface: Comparative Study of Neural Network and Genetic Algorithm Approaches Davide Filippozzi et.al. 2512.13656 null
2025-12-15 MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning Haoyu Fu et.al. 2512.13636 null
2025-12-15 SCR2-ST: Combine Single Cell with Spatial Transcriptomics for Efficient Active Sampling via Reinforcement Learning Junchao Zhu et.al. 2512.13635 null
2025-12-15 Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models Boxin Wang et.al. 2512.13607 null
2025-12-15 Image Diffusion Preview with Consistency Solver Fu-Yun Wang et.al. 2512.13592 link
2025-12-15 MMhops-R1: Multimodal Multi-hop Reasoning Tao Zhang et.al. 2512.13573 null
2025-12-15 Memory in the Age of AI Agents Yuyang Hu et.al. 2512.13564 link
2025-12-15 How Low Can You Go? The Data-Light SE Challenge Kishan Kumar Ganguly et.al. 2512.13524 null
2025-12-15 Reinforcement Learning based 6-DoF Maneuvers for Microgravity Intravehicular Docking: A Simulation Study with Int-Ball2 in ISS-JEM Aman Arora et.al. 2512.13514 null
2025-12-15 MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph Linjie Mu et.al. 2512.13510 null
2025-12-15 Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model Heyi Chen et.al. 2512.13507 null
2025-12-15 Differentiable Evolutionary Reinforcement Learning Sitao Cheng et.al. 2512.13399 null
2025-12-15 QoS-Aware State-Augmented Learnable Framework for 5G NR-U/Wi-Fi Coexistence: Impact of Parameter Selection and Enhanced Collision Resolution Mohammad Reza Fasihi et.al. 2512.13393 null
2025-12-15 Universal Dexterous Functional Grasping via Demonstration-Editing Reinforcement Learning Chuan Mao et.al. 2512.13380 null
2025-12-15 Fast Policy Learning for 6-DOF Position Control of Underwater Vehicles Sümer Tunçay et.al. 2512.13359 null
2025-12-15 Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3) Zeyad Gamal et.al. 2512.13356 null
2025-12-15 Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration Hao Fu et.al. 2512.13293 null
2025-12-15 AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning Jiaru Zou et.al. 2512.13278 null
2025-12-15 SPARS: A Reinforcement Learning-Enabled Simulator for Power Management in HPC Job Scheduling Muhammad Alfian Amrizal et.al. 2512.13268 null
2025-12-15 Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving Hyunki Seong et.al. 2512.13262 null
2025-12-15 Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection Zihui Zhao et.al. 2512.13240 null
2025-12-15 SACn: Soft Actor-Critic with n-step Returns Jakub Łyskawa et.al. 2512.13165 null
2025-12-15 SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning Emre Can Acikgoz et.al. 2512.13159 null
2025-12-15 TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning Shenzhi Yang et.al. 2512.13106 null
2025-12-15 Toward Self-Healing Networks-on-Chip: RL-Driven Routing in 2D Torus Architectures Mohammad Walid Charrwi et.al. 2512.13096 null
2025-12-15 ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning Feng Zhang et.al. 2512.13095 null
2025-12-15 Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation Xiang Li et.al. 2512.13094 null
2025-12-15 PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations Mingqi Yuan et.al. 2512.13093 null
2025-12-15 M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization Bizhe Bai et.al. 2512.13070 null
2025-12-15 Deep Q-Learning-Based Intelligent Scheduling for ETL Optimization in Heterogeneous Data Environments Kangning Gao et.al. 2512.13060 null
2025-12-15 GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training Tong Wei et.al. 2512.13043 null
2025-12-15 What Happens Next? Next Scene Prediction with a Unified Video Model Xinjie Li et.al. 2512.13015 null
2025-12-15 Learning Terrain Aware Bipedal Locomotion via Reduced Dimensional Perceptual Representations Guillermo A. Castillo et.al. 2512.12993 null
2025-12-15 Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning Amin Jalal Aghdasian et.al. 2512.12987 null
2025-12-15 QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management Weizhou Shen et.al. 2512.12967 null
2025-12-15 Interpretable Hypothesis-Driven Trading: A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals Gagan Deep et.al. 2512.12924 null
2025-12-15 LLM-based Personalized Portfolio Recommender: Integrating Large Language Models and Reinforcement Learning for Intelligent Investment Strategy Optimization Bangyu Li et.al. 2512.12922 null
2025-12-15 Meta-GPT: Decoding the Metasurface Genome with Generative Artificial Intelligence David Dang et.al. 2512.12888 null
2025-12-14 Information-Consistent Language Model Recommendations through Group Relative Policy Optimization Sonal Prabhune et.al. 2512.12858 null
2025-12-14 MPC-Guided Safe Reinforcement Learning and Lipschitz-Based Filtering for Structured Nonlinear Systems Patrick Kostelac et.al. 2512.12855 null
2025-12-14 Distributed Reinforcement Learning using Local Smart Meter Data for Voltage Regulation in Distribution Networks Dong Liu et.al. 2512.12803 null
2025-12-14 CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning Xuanzhang Liu et.al. 2512.12716 null
2025-12-14 Self-Motivated Growing Neural Network for Adaptive Architecture via Local Structural Plasticity Yiyang Jia et.al. 2512.12713 null
2025-12-14 Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning Enhong Mu et.al. 2512.12706 null
2025-12-14 Reassessing the Role of Supervised Fine-Tuning: An Empirical Study in VLM Reasoning Yongcan Yu et.al. 2512.12690 null
2025-12-14 CogDoc: Towards Unified thinking in Documents Qixin Xu et.al. 2512.12658 null
2025-12-14 Coupled Variational Reinforcement Learning for Language Model General Reasoning Xueru Wen et.al. 2512.12576 null
2025-12-14 World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents Yesid Fonseca et.al. 2512.12548 null
2025-12-13 Adaptive Detector-Verifier Framework for Zero-Shot Polyp Detection in Open-World Settings Shengkai Xu et.al. 2512.12492 null
2025-12-13 More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models Hoang Anh Just et.al. 2512.12487 null
2025-12-13 HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments Yongjun He et.al. 2512.12476 null
2025-12-13 Sim2Real Reinforcement Learning for Soccer skills Jonathan Spraggett et.al. 2512.12437 null
2025-12-13 Deep Hedging with Reinforcement Learning: A Practical Framework for Option Risk Management Travon Lucius et.al. 2512.12420 null
2025-12-13 ElasticVR: Elastic Task Computing in Multi-User Multi-Connectivity Wireless Virtual Reality (VR) Systems Babak Badnava et.al. 2512.12366 null
2025-12-13 The Role of AI in Modern Penetration Testing J. Alexander Curtis et.al. 2512.12326 null
2025-12-13 A Conflict-Aware Resource Management Framework for the Computing Continuum Vlad Popescu-Vifor et.al. 2512.12299 null
2025-12-13 Moment and Highlight Detection via MLLM Frame Segmentation I Putu Andika Bagas Jiwanta et.al. 2512.12246 null
2025-12-13 Learning to Get Up Across Morphologies: Zero-Shot Recovery with a Unified Humanoid Policy Jonathan Spraggett et.al. 2512.12230 null
2025-12-12 Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning Vittorio Giammarino et.al. 2512.12046 null
2025-12-12 Policy Gradient Algorithms for Age-of-Information Cost Minimization José-Ramón Vidal et.al. 2512.11990 null
2025-12-12 Learning to Extract Context for Context-Aware LLM Inference Minseon Kim et.al. 2512.11986 null
2025-12-12 A Review of Learning-Based Motion Planning: Toward a Data-Driven Optimal Control Approach Jia Hu et.al. 2512.11944 null
2025-12-12 Evolutionary Reinforcement Learning based AI tutor for Socratic Interdisciplinary Instruction Mei Jiang et.al. 2512.11930 null
2025-12-12 AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis Junjie Ye et.al. 2512.11797 null
2025-12-12 Agile Flight Emerges from Multi-Agent Competitive Racing Vineet Pasumarti et.al. 2512.11781 null
2025-12-12 SUMFORU: An LLM-Based Review Summarization Framework for Personalized Purchase Decision Support Yuming Feng et.al. 2512.11755 null
2025-12-12 UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations Tingyu Yuan et.al. 2512.11609 null
2025-12-12 DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry Zhenyang Cai et.al. 2512.11558 null
2025-12-12 Rethinking Expert Trajectory Utilization in LLM Post-training Bowen Ding et.al. 2512.11470 null
2025-12-12 Three methods, one problem: Classical and AI approaches to no-three-in-line Pranav Ramanathan et.al. 2512.11469 null
2025-12-12 Towards Trustworthy Multi-Turn LLM Agents via Behavioral Guidance Gonca Gürsun et.al. 2512.11421 null
2025-12-12 Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization Yifan Niu et.al. 2512.11391 null
2025-12-12 Symmetry-Aware Steering of Equivariant Diffusion Policies: Benefits and Limits Minwoo Park et.al. 2512.11345 null
2025-12-12 DAPO: Design Structure-Aware Pass Ordering in High-Level Synthesis with Graph Contrastive and Reinforcement Learning Jinming Ge et.al. 2512.11342 null
2025-12-12 RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training Tianyuan Wu et.al. 2512.11306 null
2025-12-12 When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents Mrinal Rawat et.al. 2512.11277 null
2025-12-12 A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation Hong Je-Gal et.al. 2512.11270 null
2025-12-12 Multi-Objective Reinforcement Learning for Large-Scale Mixed Traffic Control Iftekharul Islam et.al. 2512.11247 null
2025-12-11 Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning Wei Duan et.al. 2512.11179 null
2025-12-11 Learning Category-level Last-meter Navigation from RGB Demonstrations of a Single-instance Tzu-Hsien Lee et.al. 2512.11173 null
2025-12-11 CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound Akhil S Anand et.al. 2512.11169 null
2025-12-11 Benchmarking RL-Enhanced Spatial Indices Against Traditional, Advanced, and Learned Counterparts Guanli Liu et.al. 2512.11161 null
2025-12-11 In-Context Multi-Objective Optimization Xinyu Zhang et.al. 2512.11114 null
2025-12-11 Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation Yiwen Tang et.al. 2512.10949 link
2025-12-11 Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit Zamirddine Mari et.al. 2512.10934 null
2025-12-11 Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation Zamirddine Mari et.al. 2512.10925 null
2025-12-11 Reinforcement Learning in Financial Decision Making: A Systematic Review of Performance, Challenges, and Implementation Strategies Mohammad Rezoanul Hoque et.al. 2512.10913 null
2025-12-11 Iterative Compositional Data Generation for Robot Control Anh-Quan Pham et.al. 2512.10891 null
2025-12-11 Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments Atahan Cilan et.al. 2512.10835 null
2025-12-11 OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification Zijian Wu et.al. 2512.10756 null
2025-12-11 Learning to Split: A Reinforcement-Learning-Guided Splitting Heuristic for Neural Network Verification Maya Swisa et.al. 2512.10747 null
2025-12-11 Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Songyang Gao et.al. 2512.10739 null
2025-12-11 How to Brake? Ethical Emergency Braking with Deep Reinforcement Learning Jianbo Wang et.al. 2512.10698 null
2025-12-11 Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning Benjamin Gundersen et.al. 2512.10691 null
2025-12-11 AgriGPT-Omni: A Unified Speech-Vision-Text Framework for Multilingual Agricultural Intelligence Bo Yang et.al. 2512.10624 null
2025-12-11 Multi-Objective Reward and Preference Optimization: Theory and Algorithms Akhil Agnihotri et.al. 2512.10601 null
2025-12-11 Grounding Everything in Tokens for Multimodal Large Language Models Xiangxuan Ren et.al. 2512.10554 null
2025-12-11 Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning Haiteng Zhao et.al. 2512.10534 null
2025-12-11 Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning Chihyeon Song et.al. 2512.10510 null
2025-12-11 UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning Jiaxi Wu et.al. 2512.10492 null
2025-12-11 Shot and Architecture Adaptive Subspace Variational Quantum Eigensolver for Microwave Simulation Zhixiu Han et.al. 2512.10458 null
2025-12-11 HypeR Adaptivity: Joint $hr$-Adaptive Meshing via Hypergraph Multi-Agent Deep Reinforcement Learning Niccolò Grillo et.al. 2512.10439 null
2025-12-11 Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention Yang Yu et.al. 2512.10414 null
2025-12-11 A Privacy-Preserving Cloud Architecture for Distributed Machine Learning at Scale Vinoth Punniyamoorthy et.al. 2512.10341 null
2025-12-11 Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters Shruti Dongare et.al. 2512.10271 null
2025-12-11 Multi-dimensional Preference Alignment by Conditioning Reward Itself Jiho Jang et.al. 2512.10237 null
2025-12-11 Task-Oriented Grasping Using Reinforcement Learning with a Contextual Reward Machine Hui Li et.al. 2512.10235 null
2025-12-11 Latent Chain-of-Thought World Modeling for End-to-End Driving Shuhan Tan et.al. 2512.10226 null
2025-12-11 An exploration for higher efficiency in multi objective optimisation with reinforcement learning Mehmet Emin Aydin et.al. 2512.10208 null
2025-12-10 Explicit Control Barrier Function-based Safety Filters and their Resource-Aware Computation Pol Mestres et.al. 2512.10118 null
2025-12-10 Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation Steven Caro et.al. 2512.10099 null
2025-12-10 SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation Jongmin Lee et.al. 2512.10042 null
2025-12-10 Diffusion Is Your Friend in Show, Suggest and Tell Jia Cheng Hu et.al. 2512.10038 null
2025-12-10 Latent Action World Models for Control with Unlabeled Trajectories Marvin Alles et.al. 2512.10016 null
2025-12-10 TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0 Jinyu Chen et.al. 2512.09961 null
2025-12-10 STACHE: Local Black-Box Explanations for Reinforcement Learning Policies Andrew Elashkin et.al. 2512.09909 null
2025-12-10 FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning Khurram Khalil et.al. 2512.09872 null
2025-12-10 Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation Yuyang Li et.al. 2512.09851 link
2025-12-10 ChronusOmni: Improving Time Awareness of Omni Large Language Models Yijing Chen et.al. 2512.09841 null
2025-12-10 RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning Khurram Khalil et.al. 2512.09829 null
2025-12-10 Prefrontal scaling of reward prediction error readout gates reinforcement-derived adaptive behavior in primates Tian Sang et.al. 2512.09761 null
2025-12-10 MOA: Multi-Objective Alignment for Role-Playing Agents Chonghua Liao et.al. 2512.09756 null
2025-12-10 Flexible Reconfigurable Intelligent Surface-Aided Covert Communications in UAV Networks Chong Huang et.al. 2512.09714 null
2025-12-10 Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning Kaichen He et.al. 2512.09706 null
2025-12-10 Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies Mika Persson et.al. 2512.09682 null
2025-12-10 d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models Leyi Pan et.al. 2512.09675 null
2025-12-10 SynthPix: A lightspeed PIV images generator Antonio Terpin et.al. 2512.09664 null
2025-12-10 Mastering Diverse, Unknown, and Cluttered Tracks for Robust Vision-Based Drone Racing Feng Yu et.al. 2512.09571 null
2025-12-10 Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search Junkai Ji et.al. 2512.09566 null
2025-12-10 REASAN: Learning Reactive Safe Navigation for Legged Robots Qihao Yuan et.al. 2512.09537 null
2025-12-10 RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning Yucan Guo et.al. 2512.09487 null
2025-12-10 Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation Jialin Ying et.al. 2512.09410 null
2025-12-10 CFLight: Enhancing Safety with Traffic Signal Control through Counterfactual Learning Mingyuan Li et.al. 2512.09368 null
2025-12-10 COVLM-RL: Critical Object-Oriented Reasoning for Autonomous Driving Using VLM-Guided Reinforcement Learning Lin Li et.al. 2512.09349 null
2025-12-10 Tyche: A Hybrid Computation Framework of Illumination Pattern for Satellite Beam Hopping Ziheng Yang et.al. 2512.09312 null
2025-12-10 One-Shot Real-World Demonstration Synthesis for Scalable Bimanual Manipulation Huayi Zhou et.al. 2512.09297 null
2025-12-10 Electric Arc Furnaces Scheduling under Electricity Price Volatility with Reinforcement Learning Ruonan Pi et.al. 2512.09293 null
2025-12-10 Exploratory Mean-Variance with Jumps: An Equilibrium Approach Yuling Max Chen et.al. 2512.09224 null
2025-12-09 Learning Unmasking Policies for Diffusion Language Models Metod Jazbec et.al. 2512.09106 null
2025-12-09 Masked Generative Policy for Robotic Control Lipeng Zhuang et.al. 2512.09101 null
2025-12-09 No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers Damiano Marsili et.al. 2512.08889 null
2025-12-09 IPPO Learns the Game, Not the Team: A Study on Generalization in Heterogeneous Agent Teams Ryan LeRoy et.al. 2512.08877 null
2025-12-09 Reinforcement Learning From State and Temporal Differences Lex Weaver et.al. 2512.08855 null
2025-12-09 Optimal navigation in two-dimensional regular and turbulent flows Vladimir Parfenyev et.al. 2512.08766 null
2025-12-09 Learning and Editing Universal Graph Prompt Tuning via Reinforcement Learning Jinfeng Xu et.al. 2512.08763 null
2025-12-09 Direct transfer of optimized controllers to similar systems using dimensionless MPC Josip Kir Hromatko et.al. 2512.08667 null
2025-12-09 Sim2Swim: Zero-Shot Velocity Control for Agile AUV Maneuvering in 3 Minutes Lauritz Rismark Fosso et.al. 2512.08656 null
2025-12-09 Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis Orit Davidovich et.al. 2512.08601 null
2025-12-09 Mind to Hand: Purposeful Robotic Control via Embodied Reasoning Peijun Tang et.al. 2512.08580 null
2025-12-09 Thinking with Images via Self-Calling Agent Wenxi Yang et.al. 2512.08511 link
2025-12-09 Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning Junnan Qiu et.al. 2512.08485 null
2025-12-09 Using reinforcement learning to probe the role of feedback in skill acquisition Antonio Terpin et.al. 2512.08463 null
2025-12-09 From Accuracy to Impact: The Impact-Driven AI Framework (IDAIF) for Aligning Engineering Architecture with Theory of Change Yong-Woon Kim et.al. 2512.08449 null
2025-12-09 Turning Threat into Opportunity: DRL-Powered Anti-Jamming via Energy Harvesting in UAV-Disrupted Channels Ngoc-Tan Nguyen et.al. 2512.08351 null
2025-12-09 Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks Thai Duong Nguyen et.al. 2512.08341 null
2025-12-09 Collaborative Intelligence for UAV-Satellite Network Slicing: Towards a Joint QoS-Energy-Fairness MADRL Optimization Thanh-Dao Nguyen et.al. 2512.08322 null
2025-12-09 rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection Sijia Chen et.al. 2512.08300 null
2025-12-09 Empowerment Gain and Causal Model Construction: Children and adults are sensitive to controllability and variability in their causal interventions Eunice Yiu et.al. 2512.08230 null
2025-12-09 Primal-dual policy learning for mean-field stochastic LQR problem Xiushan Jiang et.al. 2512.08205 null
2025-12-09 TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models Zheng Ding et.al. 2512.08153 null
2025-12-09 Robust Agents in Open-Ended Worlds Mikayel Samvelyan et.al. 2512.08139 null
2025-12-09 Universal Adversarial Suffixes for Language Models Using Reinforcement Learning with Calibrated Reward Sampriti Soor et.al. 2512.08131 null
2025-12-08 Scalable Offline Model-Based RL with Action Chunks Kwanyoung Park et.al. 2512.08108 null
2025-12-08 Training LLMs for Honesty via Confessions Manas Joglekar et.al. 2512.08093 null
2025-12-08 An Introduction to Deep Reinforcement and Imitation Learning Pedro Santana et.al. 2512.08052 null
2025-12-08 F2: Offline Reinforcement Learning for Hamiltonian Simulation via Free-Fermionic Subroutine Compilation Ethan Decker et.al. 2512.08023 null
2025-12-08 Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care Aryaman Bansal et.al. 2512.08012 null
2025-12-08 VLD: Visual Language Goal Distance for Reinforcement Learning Navigation Lazar Milikic et.al. 2512.07976 null
2025-12-08 Agentic Artificial Intelligence for Ethical Cybersecurity in Uganda: A Reinforcement Learning Framework for Threat Detection in Resource-Constrained Environments Ibrahim Adabara et.al. 2512.07909 null
2025-12-08 An Adaptive Multi-Layered Honeynet Architecture for Threat Behavior Analysis via Deep Learning Lukas Johannes Möller et.al. 2512.07827 null
2025-12-08 On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Charlie Zhang et.al. 2512.07783 null
2025-12-08 RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models Xiqiao Xiong et.al. 2512.07761 null
2025-12-08 DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving Jialv Zou et.al. 2512.07745 null
2025-12-08 SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery Meng Cao et.al. 2512.07733 null
2025-12-08 Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE Anxiang Zeng et.al. 2512.07710 null
2025-12-08 Delay-Aware Diffusion Policy: Bridging the Observation-Execution Gap in Dynamic Tasks Aileen Liao et.al. 2512.07697 null
2025-12-08 The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds Shahar Lutati et.al. 2512.07631 null
2025-12-08 Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement Yongsheng Lian et.al. 2512.07611 null
2025-12-08 Understanding Individual Decision-Making in Multi-Agent Reinforcement Learning: A Dynamical Systems Approach James Rudd-Jones et.al. 2512.07588 null
2025-12-08 ReLaX: Reasoning with Latent Exploration for Large Reasoning Models Shimin Zhang et.al. 2512.07558 null
2025-12-08 Model-Based Reinforcement Learning Under Confounding Nishanth Venkatesh et.al. 2512.07528 null
2025-12-08 How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations JV Roig et.al. 2512.07497 null
2025-12-08 Enhancing Agentic RL with Progressive Reward Shaping and Value-based Sampling Policy Optimization Zhuoran Zhuang et.al. 2512.07478 null
2025-12-08 Gait-Adaptive Perceptive Humanoid Locomotion with Real-Time Under-Base Terrain Reconstruction Haolin Song et.al. 2512.07464 null
2025-12-08 Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning Tong Wu et.al. 2512.07461 null
2025-12-08 From Show Programmes to Data: Designing a Workflow to Make Performing Arts Ephemera Accessible Through Language Models Clarisse Bardiot et.al. 2512.07452 null
2025-12-08 KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models Chenwei Shi et.al. 2512.07437 null
2025-12-08 Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models Haidong Kang et.al. 2512.07419 null
2025-12-08 Adaptive Tuning of Parameterized Traffic Controllers via Multi-Agent Reinforcement Learning Giray Önür et.al. 2512.07417 null
2025-12-08 Training Language Models to Use Prolog as a Tool Niklas Mellgren et.al. 2512.07407 null
2025-12-08 Control and Reinforcement Learning through the Lens of Optimization: An Algorithmic Perspective Tolga Ok et.al. 2512.07377 null
2025-12-08 ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning Byungju Kim et.al. 2512.07371 null
2025-12-08 Multi-Rigid-Body Approximation of Human Hands with Application to Digital Twin Bin Zhao et.al. 2512.07359 null
2025-12-08 PrivORL: Differentially Private Synthetic Dataset for Offline Reinforcement Learning Chen Gong et.al. 2512.07342 null
2025-12-08 RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language Translation Zhi Rao et.al. 2512.07273 null
2025-12-08 SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks Florian Tretter et.al. 2512.07266 null
2025-12-08 Benchmarking Humanoid Imitation Learning with Motion Difficulty Zhaorui Meng et.al. 2512.07248 null
2025-12-08 Towards Robust Protective Perturbation against DeepFake Face Swapping Hengyang Yao et.al. 2512.07228 null
2025-12-08 Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation Zhaoyang Liu et.al. 2512.07212 null
2025-12-08 MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning Xuhui Zheng et.al. 2512.07203 null
2025-12-08 Less is More: Non-uniform Road Segments are Efficient for Bus Arrival Prediction Zhen Huang et.al. 2512.07200 null
2025-12-08 Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models Fenghua Weng et.al. 2512.07141 null
2025-12-08 TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning Zebin Xing et.al. 2512.07135 null
2025-12-08 Surrogate compliance modeling enables reinforcement learned locomotion gaits for soft robots Jue Wang et.al. 2512.07114 null
2025-12-07 A Hetero-Associative Sequential Memory Model Utilizing Neuromorphic Signals: Validated on a Mobile Manipulator Runcong Wang et.al. 2512.07032 null
2025-12-07 Utilizing Multi-Agent Reinforcement Learning with Encoder-Decoder Architecture Agents to Identify Optimal Resection Location in Glioblastoma Multiforme Patients Krishna Arun et.al. 2512.06990 null
2025-12-07 LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding Yu Yu et.al. 2512.06982 null
2025-12-07 Neuro-Vesicles: Neuromodulation Should Be a Dynamical System, Not a Tensor Decoration Zilin Li et.al. 2512.06966 null
2025-12-07 Statistical analysis of Inverse Entropy-regularized Reinforcement Learning Denis Belomestny et.al. 2512.06956 null
2025-12-07 Deep Reinforcement Learning for Phishing Detection with Transformer-Based Semantic Features Aseer Al Faisal et.al. 2512.06925 null
2025-12-07 Parent-Guided Semantic Reward Model (PGSRM): Embedding-Based Reward Functions for Reinforcement Learning of Transformer Language Models Alexandr Plashchinsky et.al. 2512.06920 null
2025-12-07 Know your Trajectory – Trustworthy Reinforcement Learning deployment through Importance-Based Trajectory Analysis Clifford F et.al. 2512.06917 null
2025-12-07 Khalasi: Energy-Efficient Navigation for Surface Vehicles in Vortical Flow Fields Rushiraj Gadhvi et.al. 2512.06912 null
2025-12-07 An Analysis of Large Language Models for Simulating User Responses in Surveys Ziyun Yu et.al. 2512.06874 null
2025-12-07 JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models Ce Chi et.al. 2512.06859 null
2025-12-07 Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning Tingyu Li et.al. 2512.06835 null
2025-12-07 MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning Yueqian Wang et.al. 2512.06810 null
2025-12-07 PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance Jifar Wakuma Ayana et.al. 2512.06747 null
2025-12-07 The Role of Entropy in Visual Grounding: Analysis and Optimization Shuo Li et.al. 2512.06726 null
2025-12-07 RunawayEvil: Jailbreaking the Image-to-Video Generative Models Songping Wang et.al. 2512.06674 null
2025-12-07 LightSearcher: Efficient DeepSearch via Experiential Memory Hengzhi Lan et.al. 2512.06653 null
2025-12-07 Analyzing Collision Rates in Large-Scale Mixed Traffic Control via Multi-Agent Reinforcement Learning Muyang Fan et.al. 2512.06645 null
2025-12-07 Learning to Hedge Swaptions Zaniar Ahmadi et.al. 2512.06639 null
2025-12-07 MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment Ruicheng Zhang et.al. 2512.06628 null
2025-12-07 A New Trajectory-Oriented Approach to Enhancing Comprehensive Crowd Navigation Performance Xinyu Zhou et.al. 2512.06608 null
2025-12-06 MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding Yuhao Su et.al. 2512.06581 null
2025-12-06 Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input Zifan Xu et.al. 2512.06571 null
2025-12-06 A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation Xiaocan Li et.al. 2512.06547 null
2025-12-06 Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning Ming Chen et.al. 2512.06533 null
2025-12-06 Entropy-Controlled Intrinsic Motivation Reinforcement Learning for Quadruped Robot Locomotion in Complex Terrains Wanru Gong et.al. 2512.06486 null
2025-12-06 Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control Nathan P. Lawrence et.al. 2512.06471 null
2025-12-06 RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs Runlong Zhou et.al. 2512.06392 null
2025-12-06 VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning Yuji Wang et.al. 2512.06373 null
2025-12-06 LLM-Upgraded Graph Reinforcement Learning for Carbon-Aware Job Scheduling in Smart Manufacturing Zhiying Yang et.al. 2512.06351 null
2025-12-06 ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models Jiahao Li et.al. 2512.06328 null
2025-12-06 A Hybrid Physics-Based and Reinforcement Learning Framework for Electric Vehicle Charging Time Prediction Praharshitha Aryasomayajula et.al. 2512.06287 null
2025-12-06 Networked Restless Multi-Arm Bandits with Reinforcement Learning Hanmo Zhang et.al. 2512.06274 null
2025-12-06 Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models Chen Yang et.al. 2512.06266 null
2025-12-06 Learning Without Time-Based Embodiment Resets in Soft-Actor Critic Homayoon Farrahi et.al. 2512.06252 null
2025-12-06 Learning When to Switch: Adaptive Policy Selection via Reinforcement Learning Chris Tava et.al. 2512.06250 null
2025-12-06 Auto-exploration for online reinforcement learning Caleb Ju et.al. 2512.06244 null
2025-12-06 AI Application in Anti-Money Laundering for Sustainable and Transparent Financial Systems Chuanhao Nie et.al. 2512.06240 null
2025-12-05 Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration Huizhen Yu et.al. 2512.06218 null
2025-12-05 Quantifying Memory Use in Reinforcement Learning with Temporal Range Rodney Lafuente-Mercado et.al. 2512.06204 null
2025-12-05 JaxWildfire: A GPU-Accelerated Wildfire Simulator for Reinforcement Learning Ufuk Çakır et.al. 2512.06102 null
2025-12-05 Empathy by Design: Aligning Large Language Models for Healthcare Dialogue Emre Umucu et.al. 2512.06097 null
2025-12-05 Comparative Analysis of Autonomous and Systematic Control Strategies for Hole-Doped Hubbard Clusters: Reinforcement Learning versus Physics-Guided Design Shivanshu Dwivedi et.al. 2512.06095 null
2025-12-05 Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring Mohanakrishnan Hariharan et.al. 2512.06060 null
2025-12-05 EditThinker: Unlocking Iterative Reasoning for Any Image Editor Hongyu Li et.al. 2512.05965 null
2025-12-05 Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity Germán Kruszewski et.al. 2512.05962 null
2025-12-05 Correspondence-Oriented Imitation Learning: Flexible Visuomotor Control with 3D Conditioning Yunhao Cao et.al. 2512.05953 null
2025-12-05 Variational Quantum Rainbow Deep Q-Network for Optimizing Resource Allocation Problem Truong Thanh Hung Nguyen et.al. 2512.05946 null
2025-12-05 Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation Fabian Konstantinidis et.al. 2512.05812 null
2025-12-05 Real-time Remote Tracking and Autonomous Planning for Whale Rendezvous using Robots Sushmita Bhattacharya et.al. 2512.05808 null
2025-12-05 A Fast Anti-Jamming Cognitive Radar Deployment Algorithm Based on Reinforcement Learning Wencheng Cai et.al. 2512.05753 null
2025-12-05 A High-Order Immersed Boundary Method for Fluid-Structure Interaction Problems Yingjie Xia et.al. 2512.05733 null
2025-12-05 Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning Ali Krayani et.al. 2512.05711 null
2025-12-05 LA-RL: Language Action-guided Reinforcement Learning with Safety Guarantees for Autonomous Highway Driving Yiming Shu et.al. 2512.05686 null
2025-12-05 MedTutor-R1: Socratic Personalized Medical Teaching with Multi-Agent Simulation Zhitao He et.al. 2512.05671 null
2025-12-05 Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning Zhenpeng Su et.al. 2512.05591 null
2025-12-05 Distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning Pengcheng Dai et.al. 2512.05447 null
2025-12-05 ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction Jiangtong Tan et.al. 2512.05422 null
2025-12-05 State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning Yuxiang Liu et.al. 2512.05335 null
2025-12-04 Enhancing Deep Deterministic Policy Gradients on Continuous Control Tasks with Decoupled Prioritized Experience Replay Mehmet Efe Lorasdagi et.al. 2512.05320 null
2025-12-04 Bridging Interpretability and Optimization: Provably Attribution-Weighted Actor-Critic in Reproducing Kernel Hilbert Spaces Na Li et.al. 2512.05291 null
2025-12-04 Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem Ali Al Housseini et.al. 2512.05207 null
2025-12-04 ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Shengyuan Ding et.al. 2512.05111 null
2025-12-04 STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models Feng Xu et.al. 2512.05107 null
2025-12-04 Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning Purbesh Mitra et.al. 2512.05105 link
2025-11-06 FoodRL: A Reinforcement Learning Ensembling Framework For In-Kind Food Donation Forecasting Esha Sharma et.al. 2511.04865 null
2025-11-06 Quantum Boltzmann Machines for Sample-Efficient Reinforcement Learning Thore Gerlach et.al. 2511.04856 null
2025-11-06 Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning NVIDIA et.al. 2511.04831 null
2025-11-06 Unified Multimodal Diffusion Forcing for Forceful Manipulation Zixuan Huang et.al. 2511.04812 null
2025-11-06 Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models Chenxi Liu et.al. 2511.04800 null
2025-11-05 SMART-WRITE: Adaptive Learning-based Write Energy Optimization for Phase Change Memory Mahek Desai et.al. 2511.04713 null
2025-11-05 NCSAC: Effective Neural Community Search via Attribute-augmented Conductance Longlong Lin et.al. 2511.04712 null
2025-11-06 GentleHumanoid: Learning Upper-body Compliance for Contact-rich Human and Object Interaction Qingzhou Lu et.al. 2511.04679 null
2025-11-06 Forgetting is Everywhere Ben Sanati et.al. 2511.04666 null
2025-11-06 Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning Hampus Åström et.al. 2511.04598 null
2025-11-06 End-to-End Reinforcement Learning of Koopman Models for eNMPC of an Air Separation Unit Daniel Mayfrank et.al. 2511.04522 null
2025-11-06 V-Thinker: Interactive Thinking with Images Runqi Qiao et.al. 2511.04460 null
2025-11-06 Fitting Reinforcement Learning Model to Behavioral Data under Bandits Hao Zhu et.al. 2511.04454 null
2025-11-06 The Peril of Preference: Why GRPO fails on Ordinal Rewards Anisha Garg et.al. 2511.04439 null
2025-11-06 Temporal Action Selection for Action Chunking Yueyang Weng et.al. 2511.04421 null
2025-11-06 GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies Maëlic Neau et.al. 2511.04357 null
2025-11-06 MacroNav: Multi-Task Context Representation Learning Enables Efficient Navigation in Unknown Environments Kuankuan Sima et.al. 2511.04320 null
2025-11-06 GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents Jian Mu et.al. 2511.04307 null
2025-11-06 Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference Matteo Cercola et.al. 2511.04286 null
2025-11-06 RLoop: A Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization Zeng Zhiyuan et.al. 2511.04285 null
2025-11-06 SSPO: Subsentence-level Policy Optimization Kun Yang et.al. 2511.04256 null
2025-11-06 Can Context Bridge the Reality Gap? Sim-to-Real Transfer of Context-Aware Policies Marco Iannotta et.al. 2511.04249 null
2025-11-06 Shared Spatial Memory Through Predictive Coding Zhengru Fang et.al. 2511.04235 null
2025-11-06 Opus: A Quantitative Framework for Workflow Evaluation Alan Seroul et.al. 2511.04220 null
2025-11-06 Black-Box Guardrail Reverse-engineering Attack Hongwei Yao et.al. 2511.04215 null
2025-11-06 PUL-SLAM: Path-Uncertainty Co-Optimization with Lightweight Stagnation Detection for Efficient Robotic Exploration Yizhen Yin et.al. 2511.04180 null
2025-11-06 Deep reinforcement learning based navigation of a jellyfish-like swimmer in flows with obstacles Yihao Chen et.al. 2511.04156 null
2025-11-06 Exchange Policy Optimization Algorithm for Semi-Infinite Safe Reinforcement Learning Jiaming Zhang et.al. 2511.04147 null
2025-11-06 BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning Yitang Li et.al. 2511.04131 null
2025-11-06 RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning Xinyuan Li et.al. 2511.04120 null
2025-11-06 CBMC-V3: A CNS-inspired Control Framework Towards Manipulation Agility with SNN Yanbo Pang et.al. 2511.04109 null
2025-11-06 Necessary and Sufficient Conditions for the Optimization-Based Concurrent Execution of Learned Robotic Tasks Sheikh A. Tahmid et.al. 2511.04054 null
2025-11-06 Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots Yushi Wang et.al. 2511.03996 null
2025-11-06 Adaptive Temporal Refinement: Continuous Depth Allocation and Distance Regression for Efficient Action Localization Ibne Farabi Shihab et.al. 2511.03943 null
2025-11-06 RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods Raghav Sharma et.al. 2511.03939 null
2025-11-05 Learning to shine: Neuroevolution enables optical control of phase transitions Sraddha Agrawal et.al. 2511.03895 null
2025-11-05 Investigating Robot Control Policy Learning for Autonomous X-ray-guided Spine Procedures Florence Klitzner et.al. 2511.03882 null
2025-11-05 From Static to Dynamic: Enhancing Offline-to-Online Reinforcement Learning via Energy-Guided Diffusion Stratification Lipeng Zu et.al. 2511.03828 null
2025-11-05 Scaling Agent Learning via Experience Synthesis Zhaorun Chen et.al. 2511.03773 null
2025-11-05 Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning Richard Dewey et.al. 2511.03724 null
2025-11-05 Shrinking the Variance: Shrinkage Baselines for Reinforcement Learning with Verifiable Rewards Guanning Zeng et.al. 2511.03710 null
2025-11-05 AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing Mohsen Ahmadzadeh et.al. 2511.03697 null
2025-11-05 Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL Lipeng Zu et.al. 2511.03695 null
2025-11-05 Simulation-Based Validation of an Integrated 4D/5D Digital-Twin Framework for Predictive Construction Control Atena Khoshkonesh et.al. 2511.03684 null
2025-11-05 DQN Performance with Epsilon Greedy Policies and Prioritized Experience Replay Daniel Perkins et.al. 2511.03670 null
2025-11-05 Towards Formalizing Reinforcement Learning Theory Shangtong Zhang et.al. 2511.03618 null
2025-11-05 Going Beyond Expert Performance via Deep Implicit Imitation Reinforcement Learning Iason Chrysomallis et.al. 2511.03616 null
2025-11-05 Tensor-Efficient High-Dimensional Q-learning Junyi Wu et.al. 2511.03595 null
2025-11-05 PerfDojo: Automated ML Library Generation for Heterogeneous Architectures Andrei Ivanov et.al. 2511.03586 null
2025-11-05 Imitation Learning in the Deep Learning Era: A Novel Taxonomy and Recent Advances Iason Chrysomallis et.al. 2511.03565 null
2025-11-05 Learning Without Critics? Revisiting GRPO in Classical Reinforcement Learning Environments Bryan L. M. de Oliveira et.al. 2511.03527 null
2025-11-05 Reinforcement Learning Using known Invariances Alexandru Cioba et.al. 2511.03473 null
2025-11-05 Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG Longpeng Qiu et.al. 2511.03410 null
2025-11-05 Adaptable Hindsight Experience Replay for Search-Based Learning Alexandros Vazaios et.al. 2511.03405 null
2025-11-05 Learning Communication Skills in Multi-task Multi-agent Deep Reinforcement Learning Changxi Zhu et.al. 2511.03348 null
2025-11-05 DRL-Based Robust Multi-Timescale Anti-Jamming Approaches under State Uncertainty Haoqin Zhao et.al. 2511.03305 null
2025-11-05 Multi-Objective Adaptive Rate Limiting in Microservices Using Deep Reinforcement Learning Ning Lyu et.al. 2511.03279 null
2025-11-05 Climate Adaptation with Reinforcement Learning: Economic vs. Quality of Life Adaptation Pathways Miguel Costa et.al. 2511.03243 null
2025-11-05 Incorporating Quality of Life in Climate Adaptation Planning via Reinforcement Learning Miguel Costa et.al. 2511.03238 null
2025-11-05 Collaborative Assembly Policy Learning of a Sightless Robot Zeqing Zhang et.al. 2511.03189 null
2025-11-05 Periodic Skill Discovery Jonghae Park et.al. 2511.03187 null
2025-11-05 Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control Rewida Ali et.al. 2511.03181 null
2025-11-05 Optimizing Earth-Moon Transfer and Cislunar Navigation: Integrating Low-Energy Trajectories, AI Techniques and GNSS-R Technologies Arsalan Muhammad et.al. 2511.03173 null
2025-11-05 Learning Natural and Robust Hexapod Locomotion over Complex Terrains via Motion Priors based on Deep Reinforcement Learning Xin Liu et.al. 2511.03167 null
2025-11-05 Accelerating inverse materials design using generative diffusion models with reinforcement learning Junwu Chen et.al. 2511.03112 null
2025-11-05 Scaling Multi-Agent Environment Co-Design with Diffusion Models Hao Xiang Li et.al. 2511.03100 null
2025-11-04 Leveraging Discrete Function Decomposability for Scientific Design James C. Bowden et.al. 2511.03032 null
2025-11-04 Value of Information-Enhanced Exploration in Bootstrapped DQN Stergios Plataniotis et.al. 2511.02969 null
2025-11-04 Digital Twin-Driven Pavement Health Monitoring and Maintenance Optimization Using Graph Neural Networks Mohsin Mahmud Topu et.al. 2511.02957 null
2025-11-04 Audience Amplified: Virtual Audiences in Asynchronously Performed AR Theater You-Jin Kim et.al. 2511.02807 null
2025-11-04 MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning Qianhao Yuan et.al. 2511.02805 null
2025-11-04 From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos Xun Wang et.al. 2511.02762 null
2025-11-04 Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning Bowen Jin et.al. 2511.02755 null
2025-11-04 VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models Zhicheng Zhang et.al. 2511.02712 null
2025-11-04 Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs Georgios Tzannetos et.al. 2511.02690 null
2025-11-04 RL-Aided Cognitive ISAC: Robust Detection and Sensing-Communication Trade-offs Adam Umra et.al. 2511.02672 null
2025-11-04 Natural-gas storage modelling by deep reinforcement learning Tiziano Balaconi et.al. 2511.02646 null
2025-11-04 Adaptive GR(1) Specification Repair for Liveness-Preserving Shielding in Reinforcement Learning Tiberiu-Andrei Georgescu et.al. 2511.02605 null
2025-11-04 Directional-Clamp PPO Gilad Karpel et.al. 2511.02577 null
2025-11-04 Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning Yixiu Mao et.al. 2511.02567 null
2025-11-04 An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems Changhao Miao et.al. 2511.02525 null
2025-11-04 Dexterous Robotic Piano Playing at Scale Le Chen et.al. 2511.02504 null
2025-11-04 Auditable-choice reframing unlocks RL-based verification for open-ended tasks Mengyu Zhang et.al. 2511.02463 null
2025-11-04 ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension Duo Xu et.al. 2511.02415 null
2025-11-04 Large-scale automatic carbon ion treatment planning for head and neck cancers via parallel multi-agent reinforcement learning Jueye Zhang et.al. 2511.02314 null
2025-11-04 Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning Beyazit Yalcinkaya et.al. 2511.02304 null
2025-11-04 Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation Zhiwei Zhang et.al. 2511.02303 null
2025-11-04 Reinforcement learning based data assimilation for unknown state model Ziyi Wang et.al. 2511.02286 null
2025-11-04 SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning Fangxun Shu et.al. 2511.02280 null
2025-11-04 Structural Plasticity as Active Inference: A Biologically-Inspired Architecture for Homeostatic Control Brennen A. Hill et.al. 2511.02241 null
2025-11-04 Learning Interactive World Model for Object-Centric Reinforcement Learning Fan Feng et.al. 2511.02225 null
2025-11-04 Optimizing Multi-Lane Intersection Performance in Mixed Autonomy Environments Manonmani Sekar et.al. 2511.02217 null
2025-11-04 Adaptive Cooperative Transmission Design for Ultra-Reliable Low-Latency Communications via Deep Reinforcement Learning Hyemin Yu et.al. 2511.02216 null
2025-11-04 Training Proactive and Personalized LLM Agents Weiwei Sun et.al. 2511.02208 null
2025-11-04 A Quantitative Comparison of Centralised and Distributed Reinforcement Learning-Based Control for Soft Robotic Arms Linxin Hou et.al. 2511.02192 null
2025-11-03 JaxMARL-HFT: GPU-Accelerated Large-Scale Multi-Agent Reinforcement Learning for High-Frequency Trading Valentin Mohl et.al. 2511.02136 null
2025-11-03 Second-Order Policy Gradient Methods for the Linear Quadratic Regulator Amirreza Valaei et.al. 2511.02095 null
2025-11-03 Automated Reward Design for Gran Turismo Michel Ma et.al. 2511.02094 null
2025-11-03 Deep Reinforcement Learning for Multi-flow Routing in Heterogeneous Wireless Networks Brian Kim et.al. 2511.02030 null
2025-11-03 ABIDES-MARL: A Multi-Agent Reinforcement Learning Environment for Endogenous Price Formation and Execution in a Limit Order Book Patrick Cheridito et.al. 2511.02016 null
2025-11-02 Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR Abdelaziz Bounhar et.al. 2511.01937 link
2025-11-02 Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch Yirong Zeng et.al. 2511.01934 null
2025-11-03 GenDexHand: Generative Simulation for Dexterous Hands Feng Chen et.al. 2511.01791 null
2025-11-03 MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll Alexander Schperberg et.al. 2511.01774 null
2025-11-03 RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks Mian Wu et.al. 2511.01758 null
2025-11-03 Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding Jungyeon Koh et.al. 2511.01695 null
2025-11-03 Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward Xiaogang Xu et.al. 2511.01645 null
2025-11-03 Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models Xiaoyu Zhan et.al. 2511.01618 null
2025-11-03 L2T-Tune:LLM-Guided Hybrid Database Tuning with LHS and TD3 Xinyue Yang et.al. 2511.01602 null
2025-11-03 Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning Aditya Kapoor et.al. 2511.01554 null
2025-11-03 TPS-Bench: Evaluating AI Agents’ Tool Planning & Scheduling Abilities in Compounding Tasks Hanwen Xu et.al. 2511.01527 null
2025-11-03 BARD: budget-aware reasoning distillation Lujie Niu et.al. 2511.01470 null
2025-11-03 Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis Yuhang Huang et.al. 2511.01425 null
2025-11-03 Modulation of temporal decision-making in a deep reinforcement learning agent under the dual-task paradigm Amrapali Pednekar et.al. 2511.01415 null
2025-11-03 AoI-Aware Machine Learning for Constrained Multimodal Sensing-Aided Communications Abolfazl Zakeri et.al. 2511.01406 null
2025-11-03 Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization Ziqi Wang et.al. 2511.01374 null
2025-11-03 Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series Wenrui Cai et.al. 2511.01354 null
2025-11-03 Diffusion-Based Solver for CNF Placement on the Cloud-Continuum Álvaro Vázquez Rodríguez et.al. 2511.01343 null
2025-11-03 RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models Hongyin Zhang et.al. 2511.01331 null
2025-11-03 From Pixels to Cooperation Multi Agent Reinforcement Learning based on Multimodal World Models Sureyya Akin et.al. 2511.01310 null
2025-11-03 Optimizing Electric Vehicle Charging Station Placement Using Reinforcement Learning and Agent-Based Simulations Minh-Duc Nguyen et.al. 2511.01218 null
2025-11-03 Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering Riddhi Jain et.al. 2511.01213 null
2025-11-03 DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection Guoxin Ma et.al. 2511.01192 null
2025-11-03 Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning Ru Wang et.al. 2511.01191 null
2025-11-03 DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models Ruofan Zhang et.al. 2511.01170 null
2025-11-02 SLAP: Shortcut Learning for Abstract Planning Y. Isabel Liu et.al. 2511.01107 null
2025-11-02 HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning Yujian Liu et.al. 2511.01104 null
2025-11-02 Deployable Vision-driven UAV River Navigation via Human-in-the-loop Preference Alignment Zihan Wang et.al. 2511.01083 null
2025-11-02 Predictive Auxiliary Learning for Belief-based Multi-Agent Systems Qinwei Huang et.al. 2511.01078 null
2025-11-02 Quantum Reinforcement Learning for 6G and Beyond Wireless Networks Dinh-Hieu Tran et.al. 2511.01070 null
2025-11-02 Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning Wenjin Liu et.al. 2511.01016 link
2025-11-02 IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation Bosi Wen et.al. 2511.01014 null
2025-11-02 MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL Haolin Yang et.al. 2511.01008 link
2025-11-02 GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies Ziye Wang et.al. 2511.00998 null
2025-11-02 Optimizing Energy and Latency in 6G Smart Cities with Edge CyberTwins Amine Abouaomar et.al. 2511.00955 null
2025-11-02 KFCPO: Kronecker-Factored Approximated Constrained Policy Optimization Joonyoung Lim et.al. 2511.00880 null
2025-11-02 Optimal Undulatory Swimming with Constrained Deformation and Actuation Intervals Fumiya Tokoro et.al. 2511.00816 null
2025-11-02 Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games Runyu Lu et.al. 2511.00811 null
2025-11-02 Do Math Reasoning LLMs Help Predict the Impact of Public Transit Events? Bowen Fang et.al. 2511.00808 null
2025-11-02 Logic-informed reinforcement learning for cross-domain optimization of large-scale cyber-physical systems Guangxi Wan et.al. 2511.00806 null
2025-11-02 GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents Jie JW Wu et.al. 2511.00802 null
2025-11-02 Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration Yan Sun et.al. 2511.00794 null
2025-11-02 Power Control Based on Multi-Agent Deep Q Network for D2D Communication Shi Gengtian et.al. 2511.00767 null
2025-11-01 Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries Minghe Shen et.al. 2511.00710 null
2025-11-01 PreferThinker: Reasoning-based Personalized Image Preference Assessment Shengqi Xu et.al. 2511.00609 null
2025-11-01 OpenSIR: Open-Ended Self-Improving Reasoner Wai-Chung Kwan et.al. 2511.00602 link
2025-11-01 Improving Robustness to Out-of-Distribution States in Imitation Learning via Deep Koopman-Boosted Diffusion Policy Dianye Huang et.al. 2511.00555 null
2025-11-01 Single-agent Reinforcement Learning Model for Regional Adaptive Traffic Signal Control Qiang Li et.al. 2511.00551 null
2025-11-01 Robust Single-Agent Reinforcement Learning for Regional Traffic Signal Control Under Demand Fluctuations Qiang Li et.al. 2511.00549 null
2025-11-01 ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation Panwang Pan et.al. 2511.00511 null
2025-11-01 GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining Chunyu Wei et.al. 2511.00457 null
2025-11-01 Bootstrap Off-policy with World Model Guojian Zhan et.al. 2511.00423 null
2025-11-01 UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings Zhibin Lan et.al. 2511.00405 link
2025-11-01 CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks Long Li et.al. 2511.00396 null
2025-11-01 VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning Xuanle Zhao et.al. 2511.00391 link
2025-11-01 Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond Fan Zhang et.al. 2511.00389 null
2025-11-01 Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict Chaochen Wu et.al. 2511.00370 null
2025-10-31 Reinforcement Learning for Resource Allocation in Vehicular Multi-Fog Computing Mohammad Hadi Akbarzadeh et.al. 2511.00276 null
2025-10-31 Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning Michiel Straat et.al. 2511.00272 null
2025-10-31 Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning Marwa Abdulhai et.al. 2511.00222 null
2025-10-31 Iterative Foundation Model Fine-Tuning on Multiple Rewards Pouya M. Ghari et.al. 2511.00220 null
2025-10-31 Deep reinforcement learning for optimal trading with partial information Andrea Macrì et.al. 2511.00190 null
2025-10-31 Study on Supply Chain Finance Decision-Making Model and Enterprise Economic Performance Prediction Based on Deep Reinforcement Learning Shiman Zhang et.al. 2511.00166 null
2025-10-31 EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations Justin Yu et.al. 2511.00153 null
2025-10-31 A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control Qing Guo et.al. 2511.00136 null
2025-10-31 DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads Antonio Guillen-Perez et.al. 2511.00117 null
2025-10-31 LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers Avisek Naug et.al. 2511.00116 null
2025-10-31 End-to-End Framework Integrating Generative AI and Deep Reinforcement Learning for Autonomous Ultrasound Scanning Hanae Elmekki et.al. 2511.00114 null
2025-10-30 Real-DRL: Teach and Learn in Reality Yanbing Mao et.al. 2511.00112 null
2025-10-30 Self-Improving Vision-Language-Action Models with Data Generation via Residual RL Wenli Xiao et.al. 2511.00091 null
2025-10-30 Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail NVIDIA et.al. 2511.00088 null
2025-10-29 Token-Regulated Group Relative Policy Optimization for Stable Reinforcement Learning in Large Language Models Tue Le et.al. 2511.00066 null
2025-10-31 Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems Alireza Saleh Abadi et.al. 2510.27659 null
2025-10-31 Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning Yuhong Liu et.al. 2510.27606 link
2025-10-31 MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval Qi Luo et.al. 2510.27569 null
2025-10-31 Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval Yulong Hui et.al. 2510.27566 null
2025-10-31 VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision Xuan Gong et.al. 2510.27462 null
2025-10-31 Learning Soft Robotic Dynamics with Active Exploration Hehui Zheng et.al. 2510.27428 null
2025-10-31 DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains Tian Liang et.al. 2510.27419 null
2025-10-31 Realistic pedestrian-driver interaction modelling using multi-agent RL with human perceptual-motor constraints Yueyang Wang et.al. 2510.27383 null
2025-10-31 Reasoning Models Sometimes Output Illegible Chains of Thought Arun Jose et.al. 2510.27338 null
2025-10-31 When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making Ali Raza Jafree et.al. 2510.27334 null
2025-10-31 Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines Kristina Levina et.al. 2510.27329 null
2025-10-31 A Digital Twin-based Multi-Agent Reinforcement Learning Framework for Vehicle-to-Grid Coordination Zhengchang Hua et.al. 2510.27289 null
2025-10-31 Inferring trust in recommendation systems from brain, behavioural, and physiological data Vincent K. M. Cheung et.al. 2510.27272 null
2025-10-31 MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models Kangkun Mao et.al. 2510.27267 null
2025-10-31 GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation Tao Liu et.al. 2510.27210 null
2025-10-31 ShapleyPipe: Hierarchical Shapley Search for Data Preparation Pipeline Construction Jing Chang et.al. 2510.27168 null
2025-10-31 Disrupting Networks: Amplifying Social Dissensus via Opinion Perturbation and Large Language Models Erica Coppolillo et.al. 2510.27152 null
2025-10-31 AURA: A Reinforcement Learning Framework for AI-Driven Adaptive Conversational Surveys Jinwen Tang et.al. 2510.27126 null
2025-10-31 Towards Understanding Self-play for LLM Reasoning Justin Yang Chae et.al. 2510.27072 null
2025-10-31 Distributed Precoding for Cell-free Massive MIMO in O-RAN: A Multi-agent Deep Reinforcement Learning Framework Mohammad Hossein Shokouhi et.al. 2510.27069 null
2025-10-31 Adaptive Human-Computer Interaction Strategies Through Reinforcement Learning in Complex Rui Liu et.al. 2510.27058 null
2025-10-30 SpikeATac: A Multimodal Tactile Finger with Taxelized Dynamic Sensing for Dexterous Manipulation Eric T. Chang et.al. 2510.27048 null
2025-10-30 Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning Md Tanvirul Alam et.al. 2510.27044 link
2025-10-30 e1: Learning Adaptive Control of Reasoning Effort Michael Kleinman et.al. 2510.27042 null
2025-10-30 Algorithmic Predation: Equilibrium Analysis in Dynamic Oligopolies with Smooth Market Sharing Fabian Raoul Pieroth et.al. 2510.27008 null
2025-10-30 A Framework for Fair Evaluation of Variance-Aware Bandit Algorithms Elise Wolf et.al. 2510.27001 null
2025-10-30 Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench Fenfen Lin et.al. 2510.26865 link
2025-10-30 Defeating the Training-Inference Mismatch via FP16 Penghui Qi et.al. 2510.26788 link
2025-10-30 A General Incentives-Based Framework for Fairness in Multi-agent Resource Allocation Ashwin Kumar et.al. 2510.26740 null
2025-10-30 Stabilizing Rayleigh-Benard convection with reinforcement learning trained on a reduced-order model Qiwei Chen et.al. 2510.26705 null
2025-10-30 Kimi Linear: An Expressive, Efficient Attention Architecture Kimi Team et.al. 2510.26692 link
2025-10-30 Action-Driven Processes for Continuous-Time Control Ruimin He et.al. 2510.26672 null
2025-10-30 Hybrid Consistency Policy: Decoupling Multi-Modal Diversity and Real-Time Efficiency in Robotic Manipulation Qianyou Zhao et.al. 2510.26670 null
2025-10-30 The Era of Agentic Organization: Learning to Organize with Language Models Zewen Chi et.al. 2510.26658 null
2025-10-30 Hybrid DQN-TD3 Reinforcement Learning for Autonomous Navigation in Dynamic Environments Xiaoyi He et.al. 2510.26646 null
2025-10-30 Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications Chuang Zhang et.al. 2510.26628 null
2025-10-30 A DRL-Empowered Multi-Level Jamming Approach for Secure Semantic Communication Weixuan Chen et.al. 2510.26610 null
2025-10-30 Emu3.5: Native Multimodal Models are World Learners Yufeng Cui et.al. 2510.26583 link
2025-10-30 InfoFlow: Reinforcing Search Agent Via Reward Density Optimization Kun Luo et.al. 2510.26575 null
2025-10-30 Adaptive Inverse Kinematics Framework for Learning Variable-Length Tool Manipulation in Robotics Prathamesh Kothavale et.al. 2510.26551 null
2025-10-30 Think Outside the Policy: In-Context Steered Policy Optimization Hsiu-Yuan Huang et.al. 2510.26519 null
2025-10-30 Data-Efficient RLVR via Off-Policy Influence Guidance Erle Zhu et.al. 2510.26491 null
2025-10-30 ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems Qiaoling Chen et.al. 2510.26475 null
2025-10-30 PolarZero: A Reinforcement Learning Approach for Low-Complexity Polarization Kernel Design Yi-Ting Hong et.al. 2510.26452 null
2025-10-30 An Impulse Control Approach to Market Making in a Hawkes LOB Market Konark Jain et.al. 2510.26438 null
2025-10-30 Human-in-the-loop Online Rejection Sampling for Robotic Manipulation Guanxing Lu et.al. 2510.26406 null
2025-10-30 Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning Wenchang Duan et.al. 2510.26389 null
2025-10-30 Towards Reinforcement Learning Based Log Loading Automation Ilya Kurinov et.al. 2510.26363 null
2025-10-30 Reinforcement Learning for Pollution Detection in a Randomized, Sparse and Nonstationary Environment with an Autonomous Underwater Vehicle Sebastian Zieglmeier et.al. 2510.26347 null
2025-10-30 Offline Clustering of Preference Learning with Active-data Augmentation Jingyuan Liu et.al. 2510.26301 null
2025-10-30 Beyond Imitation: Constraint-Aware Trajectory Generation with Flow Matching For End-to-End Autonomous Driving Lin Liu et.al. 2510.26292 null
2025-10-30 Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search Guochang Li et.al. 2510.26287 null
2025-10-30 Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments Gangyang Li et.al. 2510.26280 null
2025-10-30 Graph-Enhanced Policy Optimization in LLM Agent Training Jiazhen Yuan et.al. 2510.26270 null
2025-10-30 A Game-Theoretic Spatio-Temporal Reinforcement Learning Framework for Collaborative Public Resource Allocation Songxin Lei et.al. 2510.26184 null
2025-10-30 One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning Renhao Li et.al. 2510.26167 null
2025-10-30 Learning to Manage Investment Portfolios beyond Simple Utility Functions Maarten P. Scholl et.al. 2510.26165 null
2025-10-30 Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math Bo Pang et.al. 2510.26143 null
2025-10-30 EgoExo-Con: Exploring View-Invariant Video Temporal Understanding Minjoon Jung et.al. 2510.26113 null
2025-10-30 Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error Chenming Tang et.al. 2510.26109 null
2025-10-30 GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks Chenrui Shi et.al. 2510.26098 null
2025-10-30 Network-Constrained Policy Optimization for Adaptive Multi-agent Vehicle Routing Fazel Arasteh et.al. 2510.26089 null
2025-10-30 Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion Chi Zhang et.al. 2510.26067 null
2025-10-30 Accelerating Real-World Overtaking in F1TENTH Racing Employing Reinforcement Learning Methods Emily Steiner et.al. 2510.26040 null
2025-10-29 Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation Feichen Gan et.al. 2510.26026 null
2025-10-29 PORTool: Tool-Use LLM Training with Rewarded Tree Feijie Wu et.al. 2510.26020 null
2025-10-29 Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Yihe Deng et.al. 2510.25992 null
2025-10-29 Estimating cognitive biases with attention-aware inverse planning Sounak Banerjee et.al. 2510.25951 null
2025-10-29 InputDSA: Demixing then Comparing Recurrent and Externally Driven Dynamics Ann Huang et.al. 2510.25943 null
2025-10-29 Multi-Agent Reinforcement Learning for Market Making: Competition without Collusion Ziyi Wang et.al. 2510.25929 null
2025-10-29 $π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models Kang Chen et.al. 2510.25889 null
2025-10-29 Approximating Human Preferences Using a Multi-Judge Learned System Eitán Sprejer et.al. 2510.25884 null
2025-10-29 MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs Xiaoke Huang et.al. 2510.25867 null
2025-10-29 Adversarial Pre-Padding: Generating Evasive Network Traffic Against Transformer-Based Classifiers Quanliang Jing et.al. 2510.25810 null
2025-10-29 MetaLore: Learning to Orchestrate Communication and Computation for Metaverse Synchronization Elif Ebru Ohri et.al. 2510.25705 null
2025-10-29 PairUni: Pairwise Training for Unified Multimodal Language Models Jiani Zheng et.al. 2510.25682 null
2025-10-29 Navigation in a Three-Dimensional Urban Flow using Deep Reinforcement Learning Federica Tonti et.al. 2510.25679 null
2025-10-29 ALDEN: Reinforcement Learning for Active Navigation and Evidence Gathering in Long Documents Tianyu Yang et.al. 2510.25668 null
2025-10-29 Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills Weikang Wan et.al. 2510.25634 null
2025-10-29 EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis Yusheng Liao et.al. 2510.25628 null
2025-10-29 On the instability of local learning algorithms: Q-learning can fail in infinite state spaces Urtzi Ayesta et.al. 2510.25572 null
2025-10-29 Deep Reinforcement Learning-Based Cooperative Rate Splitting for Satellite-to-Underground Communication Networks Kaiqiang Lin et.al. 2510.25562 null
2025-10-29 Off-policy Reinforcement Learning with Model-based Exploration Augmentation Likun Wang et.al. 2510.25529 null
2025-10-29 Zero Reinforcement Learning Towards General Domains Yuyuan Zeng et.al. 2510.25528 null
2025-10-29 MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL Zekun Xu et.al. 2510.25510 null
2025-10-29 Dynamic Beamforming and Power Allocation in ISAC via Deep Reinforcement Learning Duc Nguyen Dao et.al. 2510.25496 null
2025-10-29 Reinforcement Learning techniques for the flavor problem in particle physics A. Giarnetti et.al. 2510.25495 null
2025-10-29 Generalized Pseudo-Relevance Feedback Yiteng Tu et.al. 2510.25488 null
2025-10-29 Sim-to-Real Gentle Manipulation of Deformable and Fragile Objects with Stress-Guided Reinforcement Learning Kei Ikemura et.al. 2510.25405 null
2025-10-29 Model-Free Robust Beamforming in Satellite Downlink using Reinforcement Learning Alea Schröder et.al. 2510.25393 null
2025-10-29 Multi-party Agent Relation Sampling for Multi-party Ad Hoc Teamwork Beiwen Zhang et.al. 2510.25340 null
2025-10-29 GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning Jiaqi Wu et.al. 2510.25320 null
2025-10-29 Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning Sagalpreet Singh et.al. 2510.25311 null
2025-10-29 Adaptive Design of mmWave Initial Access Codebooks using Reinforcement Learning Sabrine Aroua et.al. 2510.25271 null
2025-10-29 The influence of the random numbers quality on the results in stochastic simulations and machine learning Benjamin A. Antunes et.al. 2510.25269 null
2025-10-29 SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation Wang zhi et.al. 2510.25268 null
2025-10-29 One-shot Humanoid Whole-body Motion Learning Hao Huang et.al. 2510.25241 null
2025-09-26 Impact of Collective Behaviors of Autonomous Vehicles on Urban Traffic Dynamics: A Multi-Agent Reinforcement Learning Approach Ahmet Onur Akman et.al. 2509.22216 null
2025-07-29 Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics Leonard Hinckeldey et.al. 2507.21638 null
2025-07-23 Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains Anisha Gunjal et.al. 2507.17746 null
2025-07-23 Megrez2 Technical Report Boxun Li et.al. 2507.17728 null
2025-07-23 How Should We Meta-Learn Reinforcement Learning Algorithms? Alexander David Goldie et.al. 2507.17668 null
2025-07-23 CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning Lingxiao Tang et.al. 2507.17548 null
2025-07-23 Generalized Advantage Estimation for Distributional Policy Gradients Shahil Shaik et.al. 2507.17530 null
2025-07-23 Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice Shanbo Cheng et.al. 2507.17527 null
2025-07-23 URPO: A Unified Reward & Policy Optimization Framework for Large Language Models Songshuo Lu et.al. 2507.17515 null
2025-07-23 Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning Yu Li et.al. 2507.17512 null
2025-07-23 ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents Chang Nie et.al. 2507.17462 null
2025-07-23 Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning Situo Zhang et.al. 2507.17448 null
2025-07-22 Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning Junhao Shen et.al. 2507.16814 null
2025-07-22 Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty Mehul Damani et.al. 2507.16806 null
2025-07-22 Uncertainty-Aware Knowledge Transformers for Peer-to-Peer Energy Trading with Multi-Agent Reinforcement Learning Mian Ibad Ali Shah et.al. 2507.16796 null
2025-07-22 Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Ang Li et.al. 2507.16746 link
2025-07-23 Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints Zhenyun Yin et.al. 2507.16727 null
2025-07-22 Adaptive Inventory Strategies using Deep Reinforcement Learning for Dynamic Agri-Food Supply Chains Amandeep Kaur et.al. 2507.16670 null
2025-07-22 FOGNITE: Federated Learning-Enhanced Fog-Cloud Architecture Somayeh Sobati-M et.al. 2507.16668 null
2025-07-22 Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis Sara Giordano et.al. 2507.16641 null
2025-07-22 Novel Multi-Agent Action Masked Deep Reinforcement Learning for General Industrial Assembly Lines Balancing Problems Ali Mohamed Ali et.al. 2507.16635 null
2025-07-22 Step-Audio 2 Technical Report Boyong Wu et.al. 2507.16632 link
2025-07-21 The Impact of Language Mixing on Bilingual LLM Reasoning Yihao Li et.al. 2507.15849 null
2025-07-21 GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding Fei Tang et.al. 2507.15846 link
2025-07-22 Hierarchical Budget Policy Optimization for Adaptive Reasoning Shangke Lyu et.al. 2507.15844 link
2025-07-21 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra Seth Karten et.al. 2507.15815 link
2025-07-21 Power-Constrained Policy Gradient Methods for LQR Ashwin Verma et.al. 2507.15806 null
2025-07-21 Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning Sneheel Sarangi et.al. 2507.15788 null
2025-07-21 Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR Jiakang Wang et.al. 2507.15778 link
2025-07-21 LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization Xingyu Wu et.al. 2507.15758 link
2025-07-21 EMP: Executable Motion Prior for Humanoid Robot Standing Upper-body Motion Imitation Haocheng Xu et.al. 2507.15649 null
2025-07-21 Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training Kailai Yang et.al. 2507.15640 null
2025-07-18 CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning Xiaoya Li et.al. 2507.14111 link
2025-07-18 Preference-based Multi-Objective Reinforcement Learning Ni Mu et.al. 2507.14066 null
2025-07-18 Reframing attention as a reinforcement learning problem for causal discovery Turan Orujlu et.al. 2507.13920 null
2025-07-18 Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments Kathrin Korte et.al. 2507.13846 null
2025-07-18 Scalable Submodular Policy Optimization via Pruned Submodularity Graph Aditi Anand et.al. 2507.13834 null
2025-07-18 DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training Zhixin Wang et.al. 2507.13833 null
2025-07-18 Efficient and Scalable Self-Healing Databases Using Meta-Learning and Dependency-Driven Recovery Joydeep Chandra et.al. 2507.13757 null
2025-07-18 LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction Jing Chang et.al. 2507.13712 null
2025-07-18 CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation Jing Chang et.al. 2507.13710 null
2025-07-18 State Space Models Naturally Produce Traveling Waves, Time Cells, and Scale to Abstract Cognitive Functions Sen Lu et.al. 2507.13638 null
2025-07-17 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Senqiao Yang et.al. 2507.13348 link
2025-07-17 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Zhouqi Hua et.al. 2507.13332 null
2025-07-17 Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour Emma M. A. Harrison et.al. 2507.13277 null
2025-07-17 QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation Jiazheng Li et.al. 2507.13266 null
2025-07-17 Signal Temporal Logic Compliant Co-design of Planning and Control Manas Sashank Juvvi et.al. 2507.13225 null
2025-07-17 Spectral Bellman Method: Unifying Representation and Exploration in RL Ofir Nabati et.al. 2507.13181 null
2025-07-17 Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback Suzie Kim et.al. 2507.13171 null
2025-07-17 Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Hao Sun et.al. 2507.13158 null
2025-07-17 From Roots to Rewards: Dynamic Tree Reasoning with RL Ahmed Bahloul et.al. 2507.13142 null
2025-07-17 ZipMPC: Compressed Context-Dependent MPC Cost via Imitation Learning Rahel Rickenbach et.al. 2507.13088 null
2025-07-16 EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos Ruihan Yang et.al. 2507.12440 null
2025-07-16 Improving Reinforcement Learning Sample-Efficiency using Local Approximation Mohit Prashant et.al. 2507.12383 null
2025-07-16 Thought Purity: Defense Paradigm For Chain-of-Thought Attack Zihao Xue et.al. 2507.12314 null
2025-07-16 Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning Yuhao Chen et.al. 2507.12215 null
2025-07-16 BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search Azhar Ikhtiarudin et.al. 2507.12189 link
2025-07-17 Efficient Preparation of Fermionic Superfluids in an Optical Dipole Trap through Reinforcement Learning Yueyang Min et.al. 2507.12152 null
2025-07-16 Topology Enhanced MARL for Multi-Vehicle Cooperative Decision-Making of CAVs Ye Han et.al. 2507.12110 null
2025-07-16 Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics Muleilan Pei et.al. 2507.12083 null
2025-07-16 Towards Ultra-Reliable 6G in-X Subnetworks: Dynamic Link Adaptation by Deep Reinforcement Learning Fateme Salehi et.al. 2507.12031 null
2025-07-16 QAS-QTNs: Curriculum Reinforcement Learning-Driven Quantum Architecture Search for Quantum Tensor Networks Siddhant Dutta et.al. 2507.12013 null
2025-07-15 Robot Drummer: Learning Rhythmic Skills for Humanoid Drumming Asad Ali Shahid et.al. 2507.11498 null
2025-07-15 Exploring the robustness of TractOracle methods in RL-based tractography Jeremi Levesque et.al. 2507.11486 null
2025-07-15 Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light Mani Hamidi et.al. 2507.11482 null
2025-07-15 Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs Gabriel Bo et.al. 2507.11371 null
2025-07-15 Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning Daniel Tanneberg et.al. 2507.11367 null
2025-07-15 Sensing Accuracy Optimization for Multi-UAV SAR Interferometry with Data Offloading Mohamed-Amine Lahmeri et.al. 2507.11284 null
2025-07-15 Ocean Diviner: A Diffusion-Augmented Reinforcement Learning for AUV Robust Control in the Underwater Tasks Weiyi Liu et.al. 2507.11283 null
2025-07-15 Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound Tal Fiskus et.al. 2507.11269 null
2025-07-15 Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction Deepak Kumar Panda et.al. 2507.11173 null
2025-07-15 Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities Yiting Qu et.al. 2507.11155 null
2025-07-14 EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Mingxian Lin et.al. 2507.10548 link
2025-07-14 Disentangling Neural Disjunctive Normal Form Models Kexin Gu Baugh et.al. 2507.10546 null
2025-07-14 Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Mingqi Wu et.al. 2507.10532 link
2025-07-14 Some remarks on gradient dominance and LQR policy optimization Eduardo D. Sontag et.al. 2507.10452 null
2025-07-14 Prompt Informed Reinforcement Learning for Visual Coverage Path Planning Venkat Margapuri et.al. 2507.10284 null
2025-07-14 Cross-Timeslot Optimization for Distributed GPU Inference Using Reinforcement Learning Chengze Du et.al. 2507.10259 null
2025-07-14 ToMacVF: Temporal Macro-action Value Factorization for Asynchronous Multi-Agent Reinforcement Learning Wenjing Zhang et.al. 2507.10251 null
2025-07-14 Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning? Yumi Omori et.al. 2507.10174 null
2025-07-14 Robust RL Control for Bipedal Locomotion with Closed Kinematic Chains Egor Maslennikov et.al. 2507.10164 null
2025-07-14 Adaptability in Multi-Agent Reinforcement Learning: A Framework and Unified Review Siyi Hu et.al. 2507.10142 null
2025-07-11 One Token to Fool LLM-as-a-Judge Yulai Zhao et.al. 2507.08794 null
2025-07-11 Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning James McCarthy et.al. 2507.08793 null
2025-07-11 Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data Jeonghye Kim et.al. 2507.08761 null
2025-07-11 On the Effect of Regularization in Policy Mirror Descent Jan Felix Kleuker et.al. 2507.08718 null
2025-07-11 SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations Peter Crowley et.al. 2507.08707 null
2025-07-11 elsciRL: Integrating Language Solutions into Reinforcement Learning Problem Settings Philip Osborne et.al. 2507.08705 null
2025-07-11 Multi-critic Learning for Whole-body End-effector Twist Tracking Aravind Elanjimattathil Vijayan et.al. 2507.08656 null
2025-07-11 Safe Deep Reinforcement Learning for Resource Allocation with Peak Age of Information Violation Guarantees Berire Gunes Reyhan et.al. 2507.08653 null
2025-07-11 Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning Xingguang Ji et.al. 2507.08649 link
2025-07-11 Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data Parag Dutta et.al. 2507.08610 null
2025-07-10 Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology Haochen Wang et.al. 2507.07999 link
2025-07-10 Single-pass Adaptive Image Tokenization for Minimum Program Search Shivam Duggal et.al. 2507.07995 null
2025-07-10 EXPO: Stable Reinforcement Learning with Expressive Policies Perry Dong et.al. 2507.07986 null
2025-07-10 Reinforcement Learning with Action Chunking Qiyang Li et.al. 2507.07969 null
2025-07-10 Scaling RL to Long Videos Yukang Chen et.al. 2507.07966 link
2025-07-10 Excess Observables Reveal Nonreciprocity in Integrated Covariance Timur Aslyamov et.al. 2507.07876 null
2025-07-10 “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents Giovanni Dispoto et.al. 2507.07848 null
2025-07-10 Beyond Robustness: Learning Unknown Dynamic Load Adaptation for Quadruped Locomotion on Rough Terrain Leixin Chang et.al. 2507.07825 null
2025-07-10 BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning Ruohong Liu et.al. 2507.07769 null
2025-07-10 Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization Chengtao Jian et.al. 2507.07723 null
2025-07-09 Graph-Based Complexity Metrics for Multi-Agent Curriculum Learning: A Validated Approach to Task Ordering in Cooperative Coordination Environments Farhaan Ebadulla et.al. 2507.07074 null
2025-07-09 First Return, Entropy-Eliciting Explore Tianyu Zheng et.al. 2507.07017 null
2025-07-09 Federated Learning-based MARL for Strengthening Physical-Layer Security in B5G Networks Deemah H. Tashman et.al. 2507.06997 null
2025-07-09 Optimizing Cognitive Networks: Reinforcement Learning Meets Energy Harvesting Over Cascaded Channels Deemah H. Tashman et.al. 2507.06981 null
2025-07-09 Bounomodes: the grazing ox algorithm for exploration of clustered anomalies Samuel Matloob et.al. 2507.06960 null
2025-07-10 Rethinking Verification for LLM Code Generation: From Generation to Testing Zihan Ma et.al. 2507.06920 link
2025-07-09 Designing Adaptive Algorithms Based on Reinforcement Learning for Dynamic Optimization of Sliding Window Size in Multi-Dimensional Data Streams Abolfazl Zarghani et.al. 2507.06901 null
2025-07-09 Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model Jing Liang et.al. 2507.06892 null
2025-07-09 Episodic Contextual Bandits with Knapsacks under Conversion Models Zitian Li et.al. 2507.06859 null
2025-07-10 Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning Matej Straka et.al. 2507.06825 link
2025-07-08 EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow Yixiang Chen et.al. 2507.06224 null
2025-07-08 CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Zhongyuan Peng et.al. 2507.06181 link
2025-07-08 Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model Koki Yamane et.al. 2507.06174 null
2025-07-08 Learning Agile Tensile Perching for Aerial Robots from Demonstrations Kangle Yuan et.al. 2507.06172 null
2025-07-08 Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation Mohamad H. Danesh et.al. 2507.06111 null
2025-07-08 AI-Based Demand Forecasting and Load Balancing for Optimising Energy use in Healthcare Systems: A real case study Iman Rahimi et.al. 2507.06077 null
2025-07-09 FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models Bo Pang et.al. 2507.06057 null
2025-07-08 CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation Kushal Gajjar et.al. 2507.06013 null
2025-07-08 From General Relation Patterns to Task-Specific Decision-Making in Continual Multi-Agent Coordination Chang Yao et.al. 2507.06004 null
2025-07-08 BlueLM-2.5-3B Technical Report Baojiao Xiong et.al. 2507.05934 null
2025-07-07 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Yana Wei et.al. 2507.05255 link
2025-07-07 Action Space Reduction Strategies for Reinforcement Learning in Autonomous Driving Elahe Delavari et.al. 2507.05251 null
2025-07-07 NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving Qucheng Peng et.al. 2507.05227 null
2025-07-07 EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling Boyuan Wang et.al. 2507.05198 null
2025-07-07 Sequential Attention-based Sampling for Histopathological Analysis Tarun G et.al. 2507.05077 null
2025-07-07 Replacing thinking with tool usage enables reasoning in small language models Corrado Rainone et.al. 2507.05065 null
2025-07-07 When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning Maxence Boels et.al. 2507.05011 null
2025-07-07 Linking Homeostasis to Reinforcement Learning: Internal State Control of Motivated Behavior Naoto Yoshida et.al. 2507.04998 null
2025-07-07 Object-centric Denoising Diffusion Models for Physical Reasoning Moritz Lange et.al. 2507.04920 null
2025-07-07 Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning Sanyam Vyas et.al. 2507.04883 null
2025-07-03 MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs Purbesh Mitra et.al. 2507.02851 link
2025-07-03 StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason Kaiyi Zhang et.al. 2507.02841 null
2025-07-03 ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning Ruiyang Zhou et.al. 2507.02834 null
2025-07-03 Generalizing Verifiable Instruction Following Valentina Pyatkin et.al. 2507.02833 null
2025-07-03 Multimodal Mathematical Reasoning with Diverse Solving Perspective Wenhao Shi et.al. 2507.02804 null
2025-07-03 A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control Zilin Kang et.al. 2507.02712 null
2025-07-03 Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions Thomas Hazenberg et.al. 2507.02698 null
2025-07-03 RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes Jiaxing Wang et.al. 2507.02690 null
2025-07-03 TUC-PPO: Team Utility-Constrained Proximal Policy Optimization for Spatial Public Goods Games Zhaoqilin Yang et.al. 2507.02675 null
2025-07-03 On Efficient Bayesian Exploration in Model-Based Reinforcement Learning Alberto Caron et.al. 2507.02639 null
2025-07-02 Kwai Keye-VL Technical Report Kwai Keye Team et.al. 2507.01949 link
2025-07-02 NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks Yang Li et.al. 2507.01921 null
2025-07-02 Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models Chengao Li et.al. 2507.01915 null
2025-07-02 TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types Yuhao Lin et.al. 2507.01857 null
2025-07-02 TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents Dmytro Kuzmenko et.al. 2507.01823 null
2025-07-02 Quantum reinforcement learning in dynamic environments Oliver Sefrin et.al. 2507.01691 null
2025-07-02 AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training Zhenyu Han et.al. 2507.01663 null
2025-07-02 Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning Wu Fei et.al. 2507.01551 null
2025-07-02 Chargax: A JAX Accelerated EV Charging Simulator Koen Ponse et.al. 2507.01522 null
2025-07-02 Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning Yanfei Zhang et.al. 2507.01489 null
2025-07-01 SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Bo Liu et.al. 2506.24119 link
2025-06-30 Scaling Human Judgment in Community Notes with LLMs Haiwen Li et.al. 2506.24118 null
2025-06-30 Constructing Non-Markovian Decision Process via History Aggregator Yongyi Wang et.al. 2506.24026 null
2025-06-30 Provably Efficient and Agile Randomized Q-Learning He Wang et.al. 2506.24005 null
2025-06-30 Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning Seungjun Yi et.al. 2506.23998 null
2025-06-30 ADReFT: Adaptive Decision Repair for Safe Autonomous Driving via Reinforcement Fine-Tuning Mingfei Cheng et.al. 2506.23960 null
2025-07-01 Adapt Your Body: Mitigating Proprioception Shifts in Imitation Learning Fuhang Kuang et.al. 2506.23944 null
2025-06-30 Reinforcement Learning for Synchronised Flow Control in a Dual-Gate Resin Infusion System Miguel Camacho-Sánchez et.al. 2506.23923 null
2025-06-30 The Trilemma of Truth in Large Language Models Germans Savcisens et.al. 2506.23921 link
2025-06-30 Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning Anton Andreychuk et.al. 2506.23793 link
2025-06-27 MiCo: Multi-image Contrast for Reinforcement Visual Reasoning Xi Chen et.al. 2506.22434 null
2025-06-27 ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks Pritam Dash et.al. 2506.22423 null
2025-06-27 HyperCLOVA X THINK Technical Report NAVER Cloud HyperCLOVA X Team et.al. 2506.22403 null
2025-06-27 Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL Tong Yang et.al. 2506.22401 null
2025-06-27 Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation Tao Li et.al. 2506.22365 null
2025-06-27 Education-Oriented Graph Retrieval-Augmented Generation for Learning Path Recommendation Xinghe Cheng et.al. 2506.22303 null
2025-06-27 ReF-LLE: Personalized Low-Light Enhancement via Reference-Guided Deep Reinforcement Learning Ming Zhao et.al. 2506.22216 null
2025-06-27 A Reinforcement Learning Framework for Some Singular Stochastic Control Problems Zongxia Liang et.al. 2506.22203 null
2025-06-27 EFRame: Deeper Reasoning via Exploration-Filtering-Replay Reinforcement Learning Framework Chen Wang et.al. 2506.22200 link
2025-06-27 ASVSim (AirSim for Surface Vehicles): A High-Fidelity Simulation Framework for Autonomous Surface Vehicle Research Bavo Lesy et.al. 2506.22174 null
2025-06-26 Joint Scheduling of DER under Demand Charges: Structure and Approximation Ruixiao Yang et.al. 2506.21510 null
2025-06-26 Bridging Offline and Online Reinforcement Learning for LLMs Jack Lanchantin et.al. 2506.21495 null
2025-06-26 Reinforcement Learning for Optimal Control of Spin Magnetometers Logan W. Cooke et.al. 2506.21475 null
2025-06-26 Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage Gavin Lee Goodship et.al. 2506.21465 null
2025-06-26 Spatial Mental Modeling from Limited Views Baiqiao Yin et.al. 2506.21458 null
2025-06-26 Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning Prajwal Koirala et.al. 2506.21427 null
2025-06-26 rQdia: Regularizing Q-Value Distributions With Image Augmentation Sam Lerman et.al. 2506.21367 null
2025-06-26 HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Qize Yang et.al. 2506.21277 link
2025-06-26 World-aware Planning Narratives Enhance Large Vision-Language Model Planner Junhao Shi et.al. 2506.21230 null
2025-06-26 Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design Hampus Gummesson Svensson et.al. 2506.21158 null
2025-06-25 MMSearch-R1: Incentivizing LMMs to Search Jinming Wu et.al. 2506.20670 link
2025-06-25 DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy Sungjae Park et.al. 2506.20668 null
2025-06-25 The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind Andrei Lupu et.al. 2506.20664 null
2025-06-25 DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation Shansan Gong et.al. 2506.20639 link
2025-06-25 PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models Soufiane Hayou et.al. 2506.20629 link
2025-06-25 Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control Andrew Mole et.al. 2506.20554 null
2025-06-25 Demonstration of effective UCB-based routing in skill-based queues on real-world data Sanne van Kempen et.al. 2506.20543 null
2025-06-25 Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards Charles Arnal et.al. 2506.20520 null
2025-06-25 OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Zengzhi Wang et.al. 2506.20512 link
2025-06-25 ReCode: Updating Code API Knowledge with Reinforcement Learning Haoze Wu et.al. 2506.20495 link
2025-06-24 JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning Ai Han et.al. 2506.19846 null
2025-06-24 Temporal-IRL: Modeling Port Congestion and Berth Scheduling with Inverse Reinforcement Learning Guo Li et.al. 2506.19843 null
2025-06-24 Persona Features Control Emergent Misalignment Miles Wang et.al. 2506.19823 null
2025-06-24 KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality Baochang Ren et.al. 2506.19807 null
2025-06-24 Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning Menglong Zhang et.al. 2506.19785 null
2025-06-24 SAGE: Strategy-Adaptive Generation Engine for Query Rewriting Teng Wang et.al. 2506.19783 null
2025-06-24 Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment Yuhui Sun et.al. 2506.19780 null
2025-06-24 SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning Yuqian Fu et.al. 2506.19767 null
2025-06-24 Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks Nathan Maurer et.al. 2506.19703 null
2025-06-24 From memories to maps: Mechanisms of in context reinforcement learning in transformers Ching Fang et.al. 2506.19686 null
2025-06-23 ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Jiaru Zou et.al. 2506.18896 null
2025-06-23 Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning Anthony Kobanda et.al. 2506.18847 null
2025-06-23 LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Yuhao Wu et.al. 2506.18841 null
2025-06-23 SViP: Sequencing Bimanual Visuomotor Policies with Object-Centric Motion Primitives Yizhou Chen et.al. 2506.18825 null
2025-06-23 MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image Segmentation Ruicheng Zhang et.al. 2506.18679 null
2025-06-23 Harnessing the Power of Reinforcement Learning for Language-Model-Based Information Retriever via Query-Document Co-Augmentation Jingming Liu et.al. 2506.18670 null
2025-06-23 RL-Driven Semantic Compression Model Selection and Resource Allocation in Semantic Communication Systems Xinyi Lin et.al. 2506.18660 null
2025-06-23 Dual-level Behavioral Consistency for Inter-group and Intra-group Coordination in Multi-Agent Systems Shuocun Yang et.al. 2506.18651 null
2025-06-23 Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits Yannik Mahlau et.al. 2506.18627 null
2025-06-23 Policy gradient methods for ordinal policies Simón Weinberger et.al. 2506.18614 null
2025-06-20 No Free Lunch: Rethinking Internal Feedback for LLM Reasoning Yanzhi Zhang et.al. 2506.17219 null
2025-06-20 Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Zeyuan Yang et.al. 2506.17218 null
2025-06-20 BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning Xuechen Zhang et.al. 2506.17211 null
2025-06-20 Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning Guozheng Ma et.al. 2506.17204 null
2025-06-20 Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity Samin Yeasar Arnob et.al. 2506.17155 null
2025-06-20 When Can Model-Free Reinforcement Learning be Enough for Thinking? Josiah P. Hanna et.al. 2506.17124 null
2025-06-20 TransDreamerV3: Implanting Transformer In DreamerV3 Shruti Sadanand Dongare et.al. 2506.17103 null
2025-06-20 Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs Ricardo Rei et.al. 2506.17080 null
2025-06-20 Scalable and Reliable Multi-agent Reinforcement Learning for Traffic Assignment Leizhen Wang et.al. 2506.17029 null
2025-06-20 Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators Marco Jiralerspong et.al. 2506.17007 null
2025-06-18 Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards Qingming Liu et.al. 2506.15684 null
2025-06-18 CC-LEARN: Cohort-based Consistency Learning Xiao Ye et.al. 2506.15662 null
2025-06-18 CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization Ranting Hu et.al. 2506.15654 null
2025-06-18 AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning Tevin Wang et.al. 2506.15651 null
2025-06-18 Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement Weixiang Zhao et.al. 2506.15647 null
2025-06-18 Learning to flock in open space by avoiding collisions and staying together Martino Brambati et.al. 2506.15587 null
2025-06-18 Design of an all-facet illuminator for high NA EUV lithography exposure tool based on deep reinforcement learning Tong Li et.al. 2506.15558 null
2025-06-18 Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning Roger Creus Castanyer et.al. 2506.15544 link
2025-06-18 Lessons from Training Grounded LLMs with Verifiable Rewards Shang Hong Sim et.al. 2506.15522 null
2025-06-18 Zero-Shot Reinforcement Learning Under Partial Observability Scott Jeen et.al. 2506.15446 null
2025-06-17 Reasoning with Exploration: An Entropy Perspective Daixuan Cheng et.al. 2506.14758 null
2025-06-17 Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation Carolina Higuera et.al. 2506.14754 null
2025-06-17 Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs Ring Team et.al. 2506.14731 null
2025-06-17 Adaptive Accompaniment with ReaLchords Yusong Wu et.al. 2506.14723 null
2025-06-17 SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning Hexian Ni et.al. 2506.14648 null
2025-06-17 On Quantum BSDE Solver for High-Dimensional Parabolic PDEs Howard Su et.al. 2506.14612 null
2025-06-17 TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization Mingkang Zhu et.al. 2506.14574 null
2025-06-17 Toward Safety-First Human-Like Decision Making for Autonomous Vehicles in Time-Varying Traffic Flow Xiao Wang et.al. 2506.14502 null
2025-06-17 Zeroth-Order Optimization is Secretly Single-Step Policy Optimization Junbin Qiu et.al. 2506.14460 null
2025-06-17 Toward Rich Video Human-Motion2D Generation Ruihao Xi et.al. 2506.14428 null
2025-06-16 Touch begins where vision ends: Generalizable policies for contact-rich manipulation Zifan Zhao et.al. 2506.13762 null
2025-06-16 MARCO: Hardware-Aware Neural Architecture Search for Edge Devices with Multi-Agent Reinforcement Learning and Conformal Prediction Filtering Arya Fayyazi et.al. 2506.13755 null
2025-06-16 LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction Haoru Xue et.al. 2506.13751 null
2025-06-16 PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning Brahim Driss et.al. 2506.13741 null
2025-06-16 TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning Junru Zhang et.al. 2506.13705 link
2025-06-16 Value-Free Policy Optimization via Reward Partitioning Bilal Faye et.al. 2506.13702 null
2025-06-16 OneRec Technical Report Guorui Zhou et.al. 2506.13695 null
2025-06-16 Meta-learning how to Share Credit among Macro-Actions Ionel-Alexandru Hosu et.al. 2506.13690 null
2025-06-16 The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning Jiashun Liu et.al. 2506.13672 null
2025-06-16 We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems Junfeng Fang et.al. 2506.13666 null
2025-06-13 Schema-R1: A reasoning training approach for schema linking in Text-to-SQL Task Wuzhenghong Wen et.al. 2506.11986 null
2025-06-13 Self-Regulating Cars: Automating Traffic Control in Free Flow Road Networks Ankit Bhardwaj et.al. 2506.11973 null
2025-06-13 Visual Pre-Training on Unlabeled Images using Reinforcement Learning Dibya Ghosh et.al. 2506.11967 null
2025-06-13 Automated Treatment Planning for Interstitial HDR Brachytherapy for Locally Advanced Cervical Cancer using Deep Reinforcement Learning Mohammadamin Moradi et.al. 2506.11957 null
2025-06-13 SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies Nadun Ranawaka Arachchige et.al. 2506.11948 null
2025-06-13 Breaking Habits: On the Role of the Advantage Function in Learning Causal State Representations Miguel Suau et.al. 2506.11912 null
2025-06-13 Palpation Alters Auditory Pain Expressions with Gender-Specific Variations in Robopatients Chapa Sirithunge et.al. 2506.11906 null
2025-06-13 TreeRL: LLM Reinforcement Learning with On-Policy Tree Search Zhenyu Hou et.al. 2506.11902 link
2025-06-13 An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing Haochen Sun et.al. 2506.11882 null
2025-06-13 LLM-based Dynamic Differential Testing for Database Connectors with Reinforcement Learning-Guided Prompt Selection Ce Lyu et.al. 2506.11870 null
2025-06-12 Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop Justin Kerr et.al. 2506.10968 null
2025-06-12 Spurious Rewards: Rethinking Training Signals in RLVR Rulin Shao et.al. 2506.10947 link
2025-06-12 Self-Adapting Language Models Adam Zweiger et.al. 2506.10943 null
2025-06-12 Magistral Mistral-AI et.al. 2506.10910 null
2025-06-12 Adaptive Job Scheduling in Quantum Clouds Using Reinforcement Learning Waylon Luo et.al. 2506.10889 null
2025-06-12 Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization Pierre-François Massiani et.al. 2506.10871 null
2025-06-13 Joint Beamforming with Extremely Large Scale RIS: A Sequential Multi-Agent A2C Approach Zhi Chai et.al. 2506.10815 null
2025-06-12 Human-Robot Navigation using Event-based Cameras and Reinforcement Learning Ignacio Bugueno-Cordova et.al. 2506.10790 null
2025-06-12 PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework SiXiang Chen et.al. 2506.10741 link
2025-06-12 Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs Yucong Luo et.al. 2506.10630 null
2025-06-11 Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing Junfei Wu et.al. 2506.09965 link
2025-06-11 VerIF: Verification Engineering for Reinforcement Learning in Instruction Following Hao Peng et.al. 2506.09942 link
2025-06-11 The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability Jiachen Hu et.al. 2506.09940 null
2025-06-11 From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models Irving Fang et.al. 2506.09930 link
2025-06-11 “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) Noel Brindise et.al. 2506.09901 null
2025-06-11 Hierarchical Learning-Enhanced MPC for Safe Crowd Navigation with Heterogeneous Constraints Huajian Liu et.al. 2506.09859 null
2025-06-11 Foundation Model-Aided Deep Reinforcement Learning for RIS-Assisted Wireless Communication Mohammad Ghassemi et.al. 2506.09855 null
2025-06-11 CoRT: Code-integrated Reasoning within Thinking Chengpeng Li et.al. 2506.09820 link
2025-06-11 Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy Tonghe Wang et.al. 2506.09805 null
2025-06-11 Reinforced Refinement with Self-Aware Expansion for End-to-End Autonomous Driving Haochen Liu et.al. 2506.09800 null
2025-06-09 Play to Generalize: Learning to Reason Through Game Play Yunfei Xie et.al. 2506.08011 link
2025-06-09 Reinforcement Pre-Training Qingxiu Dong et.al. 2506.08007 null
2025-06-09 Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator Alberto Bazán-Guillén et.al. 2506.07980 null
2025-06-09 Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction Junhong Shen et.al. 2506.07976 link
2025-06-09 A Generative Physics-Informed Reinforcement Learning-Based Approach for Construction of Representative Drive Cycle Amirreza Yasami et.al. 2506.07929 null
2025-06-09 LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement Dimitris Panagopoulos et.al. 2506.07915 null
2025-06-09 WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning Jie Yang et.al. 2506.07905 link
2025-06-09 MiniCPM4: Ultra-Efficient LLMs on End Devices MiniCPM Team et.al. 2506.07900 link
2025-06-09 Diffusion-RL for Scalable Resource Allocation for 6G Networks Salar Nouri et.al. 2506.07880 null
2025-06-09 Versatile Loco-Manipulation through Flexible Interlimb Coordination Xinghao Zhu et.al. 2506.07876 null
2025-06-06 Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens Jihwan Jeong et.al. 2506.06261 null
2025-06-06 How to craft a deep reinforcement learning policy for wind farm flow control Elie Kadoche et.al. 2506.06204 null
2025-06-06 Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization Jonathan Yang et.al. 2506.06196 null
2025-06-06 A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization Muhammed Ustaomeroglu et.al. 2506.06179 null
2025-06-06 Reusing Trajectories in Policy Gradients Enables Fast Convergence Alessandro Montenegro et.al. 2506.06178 null
2025-06-06 Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach James Ford et.al. 2506.06175 null
2025-06-06 Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models Rihui Jin et.al. 2506.06137 null
2025-06-06 Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library Weixun Wang et.al. 2506.06122 link
2025-06-06 On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems Elim Kwan et.al. 2506.06094 null
2025-06-06 Reinforcing Code Generation: Improving Text-to-SQL with Execution-Based Learning Atharv Kulkarni et.al. 2506.06093 null
2025-06-05 ContentV: Efficient Training of Video Generation Models with Limited Compute Wenfeng Lin et.al. 2506.05343 null
2025-06-05 AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs Lidong Lu et.al. 2506.05328 link
2025-06-05 Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay Yifan Sun et.al. 2506.05316 null
2025-06-05 Estimation of Treatment Effects Under Nonstationarity via Truncated Difference-in-Q’s Ramesh Johari et.al. 2506.05308 null
2025-06-05 A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search Arnav Kumar Jain et.al. 2506.05294 link
2025-06-06 Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning Violet Xiang et.al. 2506.05256 null
2025-06-05 Towards Language-Augmented Multi-Agent Deep Reinforcement Learning Maxime Toquebiau et.al. 2506.05236 null
2025-06-05 Optimal-PhiBE: A PDE-based Model-free framework for Continuous-time Reinforcement Learning Yuhua Zhu et.al. 2506.05208 null
2025-06-05 TreeRPO: Tree Relative Policy Optimization Zhicheng Yang et.al. 2506.05183 link
2025-06-05 Fabrica: Dual-Arm Assembly of General Multi-Part Objects via Integrated Planning and Learning Yunsheng Tian et.al. 2506.05168 null
2025-06-04 Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Shuang Chen et.al. 2506.04207 link
2025-06-04 MACS: Multi-Agent Reinforcement Learning for Optimization of Crystal Structures Elena Zamaraeva et.al. 2506.04195 null
2025-06-04 R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning Qingfei Zhao et.al. 2506.04185 link
2025-06-04 Horizon Reduction Makes RL Scalable Seohong Park et.al. 2506.04168 null
2025-06-04 SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL Jiaheng Hu et.al. 2506.04147 null
2025-06-04 Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning Muling Wu et.al. 2506.04065 null
2025-06-04 Crowd-SFT: Crowdsourcing for LLM Alignment Alex Sotiropoulos et.al. 2506.04063 null
2025-06-04 Autonomous Vehicle Lateral Control Using Deep Reinforcement Learning with MPC-PID Demonstration Chengdong Wu et.al. 2506.04040 null
2025-06-04 Interpretability by Design for Efficient Multi-Objective Reinforcement Learning Qiyue Xia et.al. 2506.04022 null
2025-06-04 Boosting Open-Source LLMs for Program Repair via Reasoning Transfer and LLM-Guided Reinforcement Learning Xunzhu Tang et.al. 2506.03921 null
2025-06-03 Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning Yinjie Wang et.al. 2506.03136 link
2025-06-03 AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation Prashanth Vijayaraghavan et.al. 2506.03122 null
2025-06-03 Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback Xiaoying Zhang et.al. 2506.03106 link
2025-06-03 EgoVLM: Policy Optimization for Egocentric Video Understanding Ashwin Vinod et.al. 2506.03097 link
2025-06-03 DPO Learning with LLMs-Judge Signal for Computer Use Agents Man Luo et.al. 2506.03095 null
2025-06-03 Provable Reinforcement Learning from Human Feedback with an Unknown Link Function Qining Zhang et.al. 2506.03066 null
2025-06-03 EDEN: Entorhinal Driven Egocentric Navigation Toward Robotic Deployment Mikolaj Walczak et.al. 2506.03046 null
2025-06-03 Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective Jintian Shao et.al. 2506.03038 null
2025-06-03 MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver Yuepeng Zheng et.al. 2506.02935 null
2025-06-03 Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning Yin Fang et.al. 2506.02911 link
2025-05-30 ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL Yu Zhang et.al. 2505.24875 null
2025-05-30 ProxyThinker: Test-Time Guidance through Small Visual Reasoners Zilin Xiao et.al. 2505.24872 null
2025-05-30 MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning Yiqing Liang et.al. 2505.24871 null
2025-05-30 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Mingjie Liu et.al. 2505.24864 null
2025-05-30 MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning Jingyan Shen et.al. 2505.24846 null
2025-05-30 AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models Conor Heins et.al. 2505.24784 null
2025-05-30 Diffusion-Based Symbolic Regression Zachary Bastiani et.al. 2505.24776 null
2025-05-30 REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards Zafir Stojanovski et.al. 2505.24760 link
2025-05-30 Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Shelly Bensal et.al. 2505.24726 null
2025-06-03 Reinforcing Video Reasoning with Focused Thinking Jisheng Dang et.al. 2505.24718 link
2025-05-29 ZeroGUI: Automating Online GUI Learning at Zero Human Cost Chenyu Yang et.al. 2505.23762 link
2025-05-29 DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning Ziyin Zhang et.al. 2505.23754 link
2025-05-29 PixelThink: Towards Efficient Chain-of-Pixel Reasoning Song Wang et.al. 2505.23727 null
2025-05-29 ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering Zexi Liu et.al. 2505.23723 link
2025-05-29 AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning Lucas N. Alegre et.al. 2505.23708 null
2025-05-29 Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability Ruida Wang et.al. 2505.23703 null
2025-05-29 Grounded Reinforcement Learning for Visual Reasoning Gabriel Sarch et.al. 2505.23678 null
2025-05-29 Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models Lang Cao et.al. 2505.23667 null
2025-05-29 AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction Niklas Freymuth et.al. 2505.23663 link
2025-05-29 Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation Hongxiang Zhang et.al. 2505.23657 null
2025-05-28 Maximizing Confidence Alone Improves Reasoning Mihir Prabhudesai et.al. 2505.22660 null
2025-05-28 The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason Ang Lv et.al. 2505.22653 null
2025-05-28 WebDancer: Towards Autonomous Information Seeking Agency Jialong Wu et.al. 2505.22648 null
2025-05-28 FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control Younggyo Seo et.al. 2505.22642 null
2025-05-28 SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning Yu Zhang et.al. 2505.22626 null
2025-05-28 The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Ganqu Cui et.al. 2505.22617 null
2025-05-28 HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym Ngoc La et.al. 2505.22597 null
2025-05-28 SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning Jiaqi Huang et.al. 2505.22596 null
2025-05-28 Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs Changhao Song et.al. 2505.22548 null
2025-05-28 Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation Hongyi Zhou et.al. 2505.22492 null
2025-05-27 Reinforcing General Reasoning without Verifiers Xiangxin Zhou et.al. 2505.21493 null
2025-05-27 Policy Optimized Text-to-Image Pipeline Design Uri Gadot et.al. 2505.21478 null
2025-05-27 Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO Muzhi Zhu et.al. 2505.21457 null
2025-05-27 Can Large Reasoning Models Self-Train? Sheikh Shafayat et.al. 2505.21444 null
2025-05-27 A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment Brett Bissey et.al. 2505.21414 null
2025-05-27 MRSD: Multi-Resolution Skill Discovery for HRL Agents Shashank Sharma et.al. 2505.21410 null
2025-05-27 Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features Zixuan Xie et.al. 2505.21391 null
2025-05-27 EgoWalk: A Multimodal Dataset for Robot Navigation in the Wild Timur Akhtyamov et.al. 2505.21282 null
2025-05-27 Data-Driven Cellular Mobility Management via Bayesian Optimization and Reinforcement Learning Mohamed Benzaghta et.al. 2505.21249 null
2025-05-27 Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies Felix Chalumeau et.al. 2505.21236 null
2025-05-26 FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities Jin Wang et.al. 2505.20147 null
2025-05-26 MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning Yuanxin Zhuang et.al. 2505.20131 null
2025-05-26 Proxy-Free GFlowNet Ruishuo Chen et.al. 2505.20110 null
2025-05-26 Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning Ziyi Zhang et.al. 2505.20107 null
2025-05-26 Adaptive Deep Reasoning: Triggering Deep Thinking When Needed Yunhao Wang et.al. 2505.20101 null
2025-05-26 SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale Qi Li et.al. 2505.20094 null
2025-05-26 Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback Mengdi Li et.al. 2505.20075 null
2025-05-26 Incentivizing Reasoning from Weak Supervision Yige Yuan et.al. 2505.20072 null
2025-05-26 SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety Geon-Hyeong Kim et.al. 2505.20065 null
2025-05-26 REARANK: Reasoning Re-ranking Agent via Reinforcement Learning Le Zhang et.al. 2505.20046 null
2025-05-23 One RL to See Them All: Visual Triple Unified Reinforcement Learning Yan Ma et.al. 2505.18129 null
2025-05-23 Reward Model Overoptimisation in Iterated RLHF Lorenz Wolf et.al. 2505.18126 null
2025-05-23 ProgRM: Build Better GUI Agents with Progress Rewards Danyang Zhang et.al. 2505.18121 null
2025-05-23 Bridging Supervised Learning and Reinforcement Learning in Math Reasoning Huayu Chen et.al. 2505.18116 null
2025-05-23 Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL Joey Hong et.al. 2505.18098 null
2025-05-23 Stable Reinforcement Learning for Efficient Reasoning Muzhi Dai et.al. 2505.18086 null
2025-05-23 What Do You Need for Diverse Trajectory Stitching in Diffusion Planning? Quentin Clark et.al. 2505.18083 null
2025-05-23 Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals Jia-Nan Li et.al. 2505.18071 null
2025-05-23 Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective Jintian Shao et.al. 2505.17997 null
2025-05-23 Outcome-based Reinforcement Learning to Predict the Future Benjamin Turtel et.al. 2505.17989 null
2025-05-22 GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning Chengqi Duan et.al. 2505.17022 link
2025-05-22 SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward Kaixuan Fan et.al. 2505.17018 link
2025-05-22 Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO Chengzhuo Tong et.al. 2505.17017 link
2025-05-22 Interactive Post-Training for Vision-Language-Action Models Shuhan Tan et.al. 2505.17016 null
2025-05-22 R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning Huatong Song et.al. 2505.17005 link
2025-05-22 $\text{R}^2\text{ec}$: Towards Large Recommender Models with Reasoning Runyang You et.al. 2505.16994 link
2025-05-22 SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development Yaxin Du et.al. 2505.16975 link
2025-05-22 Risk-Averse Reinforcement Learning with Itakura-Saito Loss Igor Udovichenko et.al. 2505.16925 null
2025-05-22 LARES: Latent Reasoning for Sequential Recommendation Enze Liu et.al. 2505.16865 null
2025-05-22 Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only Wei Xiao et.al. 2505.16856 null
2025-05-21 GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents Yuqi Zhou et.al. 2505.15810 link
2025-05-21 MMaDA: Multimodal Large Diffusion Language Models Ling Yang et.al. 2505.15809 link
2025-05-21 STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs Zongzhao Li et.al. 2505.15804 null
2025-05-21 VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models Yuchen Yan et.al. 2505.15801 null
2025-05-21 Reverse Engineering Human Preferences with Reinforcement Learning Lisa Alazraki et.al. 2505.15795 null
2025-05-21 HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving Zhiwen Chen et.al. 2505.15793 null
2025-05-21 VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL Fengyuan Dai et.al. 2505.15791 null
2025-05-21 ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning Changtai Zhu et.al. 2505.15776 null
2025-05-21 Improving planning and MBRL with temporally-extended actions Palash Chatterjee et.al. 2505.15754 null
2025-05-21 UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning Xiangyu Wang et.al. 2505.15725 null
2025-05-20 Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning Haolei Xu et.al. 2505.14684 link
2025-05-20 Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning Jiaer Xia et.al. 2505.14677 link
2025-05-20 Reward Reasoning Model Jiaxin Guo et.al. 2505.14674 null
2025-05-20 General-Reasoner: Advancing LLM Reasoning Across All Domains Xueguang Ma et.al. 2505.14652 link
2025-05-20 Think Only When You Need with Large Hybrid-Reasoning Models Lingjie Jiang et.al. 2505.14631 null
2025-05-20 TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning Zhangchen Xu et.al. 2505.14625 link
2025-05-20 Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning Wenbin Hu et.al. 2505.14585 null
2025-05-20 Performance Optimization of Energy-Harvesting Underlay Cognitive Radio Networks Using Reinforcement Learning Deemah H. Tashman et.al. 2505.14581 null
2025-05-20 KIPPO: Koopman-Inspired Proximal Policy Optimization Andrei Cozma et.al. 2505.14566 null
2025-05-20 Bellman operator convergence enhancements in reinforcement learning algorithms David Krame Kadurha et.al. 2505.14564 null
2025-05-19 Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards Xiaoyuan Liu et.al. 2505.13445 link
2025-05-19 Optimizing Anytime Reasoning via Budget Relative Policy Optimization Penghui Qi et.al. 2505.13438 link
2025-05-19 KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture R. James Cotton et.al. 2505.13436 null
2025-05-19 G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning Liang Chen et.al. 2505.13426 link
2025-05-20 A Dataless Reinforcement Learning Approach to Rounding Hyperplane Optimization for Max-Cut Gabriel Malikal et.al. 2505.13405 null
2025-05-19 Thinkless: LLM Learns When to Think Gongfan Fang et.al. 2505.13379 link
2025-05-19 Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning Irene Brugnara et.al. 2505.13372 null
2025-05-19 J4R: Learning to Judge with Equivalent Initial State Group Relative Preference Optimization Austin Xu et.al. 2505.13346 null
2025-05-19 Neural-Enhanced Rate Adaptation and Computation Distribution for Emerging mmWave Multi-User 3D Video Streaming Systems Babak Badnava et.al. 2505.13337 null
2025-05-19 CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning Lei Sheng et.al. 2505.13271 link
2025-05-16 SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics Lizhi Yang et.al. 2505.11494 null
2025-05-16 Improving Assembly Code Performance with Large Language Models via Reinforcement Learning Anjiang Wei et.al. 2505.11480 null
2025-05-16 Automatic Reward Shaping from Confounded Offline Data Mingxuan Li et.al. 2505.11478 null
2025-05-16 HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages Zhilin Wang et.al. 2505.11475 null
2025-05-16 Disentangling Reasoning and Knowledge in Medical Large Language Models Rahul Thapa et.al. 2505.11462 null
2025-05-16 Signal attenuation enables scalable decentralized multi-agent reinforcement learning over networks Wesley A Suttle et.al. 2505.11461 null
2025-05-16 Visual Planning: Let’s Think Only with Images Yi Xu et.al. 2505.11409 link
2025-05-16 Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner Wenchuan Zhang et.al. 2505.11404 link
2025-05-16 Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space Ali Rabiee et.al. 2505.11366 null
2025-05-16 Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics Ardian Selmonaj et.al. 2505.11311 null
2025-05-15 Beyond ‘Aha!’: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models Zhiyuan Hu et.al. 2505.10554 link
2025-05-15 Knowledge capture, adaptation and composition (KCAC): A framework for cross-task curriculum learning in robotic manipulation Xinrui Wang et.al. 2505.10522 null
2025-05-15 Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning Andrea Baisero et.al. 2505.10484 null
2025-05-15 Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps Ningyuan Yang et.al. 2505.10482 null
2025-05-15 Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models Zemin Huang et.al. 2505.10446 null
2025-05-15 IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning Dechen Gao et.al. 2505.10442 null
2025-05-15 Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs Jingyao Wang et.al. 2505.10425 null
2025-05-15 Decomposed Inductive Procedure Learning: Learning Academic Tasks with Human-Like Data Efficiency Daniel Weitekamp et.al. 2505.10422 null
2025-05-15 Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change Jonathan Clifford Balloch et.al. 2505.10330 null
2025-05-15 J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning Chenxi Whitehouse et.al. 2505.10320 null
2025-05-14 DataMIL: Selecting Data for Robot Imitation Learning with Datamodels Shivin Dass et.al. 2505.09603 null
2025-05-14 Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware Justin Yu et.al. 2505.09601 link
2025-05-14 VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation Chaofan Zhang et.al. 2505.09577 null
2025-05-14 Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach Shannon Lodoen et.al. 2505.09576 null
2025-05-14 Learning Long-Context Diffusion Policies via Past-Token Prediction Marcel Torne et.al. 2505.09561 null
2025-05-14 WavReward: Spoken Dialogue Models With Generalist Reward Evaluators Shengpeng Ji et.al. 2505.09558 link
2025-05-14 Distilling Realizable Students from Unrealizable Teachers Yujin Kim et.al. 2505.09546 null
2025-05-14 Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data Rui Miao et.al. 2505.09496 null
2025-05-14 Preserving Plasticity in Continual Learning with Adaptive Linearity Injection Seyed Roozbeh Razavi Rohani et.al. 2505.09486 null
2025-05-14 Quantum state-agnostic work extraction (almost) without dissipation Josep Lumbreras et.al. 2505.09456 null
2025-05-13 Generative Molecular Design with Steerable and Granular Synthesizability Control Jeff Guo et.al. 2505.08774 null
2025-05-13 Preference Optimization for Combinatorial Optimization Problems Mingjun Pan et.al. 2505.08735 null
2025-05-13 A Study of Data-driven Methods for Inventory Optimization Lee Yeung Ping et.al. 2505.08673 null
2025-05-13 Credit Assignment and Efficient Exploration based on Influence Scope in Multi-agent Reinforcement Learning Shuai Han et.al. 2505.08630 null
2025-05-13 Cost Function Estimation Using Inverse Reinforcement Learning with Minimal Observations Sarmad Mehrdad et.al. 2505.08619 null
2025-05-13 OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning Zhaochen Su et.al. 2505.08617 link
2025-05-13 Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection Ayush K. Rai et.al. 2505.08561 null
2025-05-13 Strategy-Augmented Planning for Large Language Models via Opponent Exploitation Shuai Xu et.al. 2505.08459 null
2025-05-13 Zero-Shot Sim-to-Real Reinforcement Learning for Fruit Harvesting Emlyn Williams et.al. 2505.08458 null
2025-05-13 Parameter Estimation using Reinforcement Learning Causal Curiosity: Limits and Challenges Miguel Arana-Catania et.al. 2505.08453 null
2025-05-12 DanceGRPO: Unleashing GRPO on Visual Generation Zeyue Xue et.al. 2505.07818 link
2025-05-12 A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values Daniel Beechey et.al. 2505.07797 link
2025-05-12 MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering Rushi Qiang et.al. 2505.07782 link
2025-05-12 Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving Xinji Mai et.al. 2505.07773 link
2025-05-12 Guiding Data Collection via Factored Scaling Curves Lihan Zha et.al. 2505.07728 link
2025-05-12 S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models Muzhi Dai et.al. 2505.07686 null
2025-05-12 A comparative study of Bitcoin and Ripple cryptocurrencies trading using Deep Reinforcement Learning algorithms Dieu-Donne Fangnon et.al. 2505.07660 null
2025-05-12 MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining Xiaomi LLM-Core Team et.al. 2505.07608 link
2025-05-12 Multi-Objective Reinforcement Learning for Energy-Efficient Industrial Control Georg Schäfer et.al. 2505.07607 null
2025-05-12 Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent Ziyang Huang et.al. 2505.07596 link
2025-05-09 VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction Noah Frahm et.al. 2505.06219 null
2025-05-09 Let Humanoids Hike! Integrative Skill Development on Complex Trails Kwan-Yee Lin et.al. 2505.06218 null
2025-05-09 Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach Tim Schneider et.al. 2505.06182 null
2025-05-09 Interaction-Aware Parameter Privacy-Preserving Data Sharing in Coupled Systems via Particle Filter Reinforcement Learning Haokun Yu et.al. 2505.06122 null
2025-05-09 TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations Shuaiyi Huang et.al. 2505.06079 null
2025-05-09 Safe-EF: Error Feedback for Nonsmooth Constrained Optimization Rustem Islamov et.al. 2505.06053 null
2025-05-09 Efficient Information Updates in Compute-First Networking via Reinforcement Learning with Joint AoI and VoI Jianpeng Qi et.al. 2505.06025 null
2025-05-09 Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models Lennart Stöpler et.al. 2505.05970 null
2025-05-09 Offline Multi-agent Reinforcement Learning via Score Decomposition Dan Qiao et.al. 2505.05968 null
2025-05-09 Learning Power Control Protocol for In-Factory 6G Subnetworks Uyoata E. Uyoata et.al. 2505.05967 null
2025-05-08 Flow-GRPO: Training Flow Matching Models via Online RL Jie Liu et.al. 2505.05470 link
2025-05-08 RL-DAUNCE: Reinforcement Learning-Driven Data Assimilation with Uncertainty-Aware Constrained Ensembles Pouria Behnoudfar et.al. 2505.05452 null
2025-05-08 Reasoning Models Don’t Always Say What They Think Yanda Chen et.al. 2505.05410 null
2025-05-08 Repair Crew Routing for Infrastructure Network Restoration under Incomplete Information Subhojit Biswas et.al. 2505.05297 null
2025-05-08 Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation Zechu Li et.al. 2505.05287 null
2025-05-08 Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration Andreas Kontogiannis et.al. 2505.05262 null
2025-05-08 High Altitude Platform-Based Caching and Multicasting for Rural Connectivity Yongqiang Zhang et.al. 2505.05251 null
2025-05-08 Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation Luca Marzari et.al. 2505.05235 null
2025-05-08 Adaptive Biased User Scheduling for Heterogeneous Wireless Federate Learning Network Changxiang Wu et.al. 2505.05231 null
2025-05-08 Multi-Objective Reinforcement Learning for Adaptive Personalized Autonomous Driving Hendrik Surmann et.al. 2505.05223 null
2025-05-07 EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning Zhenghao Xing et.al. 2505.04623 link
2025-05-07 Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation Abdulaziz Almuzairee et.al. 2505.04619 null
2025-05-07 ZeroSearch: Incentivize the Search Capability of LLMs without Searching Hao Sun et.al. 2505.04588 link
2025-05-07 Active Sampling for MRI-based Sequential Decision Making Yuning Du et.al. 2505.04586 link
2025-05-07 Implicitly Aligning Humans and Autonomous Agents through Shared Task Abstractions Stéphane Aroca-Ouellette et.al. 2505.04579 null
2025-05-07 Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization Wenjun Cao et.al. 2505.04578 null
2025-05-07 Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions Shanyu Han et.al. 2505.04553 null
2025-05-07 A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance Axel Friedrich Wolter et.al. 2505.04494 null
2025-05-07 RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation Jing Hu et.al. 2505.04424 link
2025-05-07 A Heuristic-Integrated DRL Approach for Phase Optimization in Large-Scale RISs Wei Wang et.al. 2505.04401 null
2025-05-06 AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control Jialong Li et.al. 2505.03738 null
2025-05-06 Sustainable Smart Farm Networks: Enhancing Resilience and Efficiency with Decision Theory-Guided Deep Reinforcement Learning Dian Chen et.al. 2505.03721 null
2025-05-06 Actor-Critics Can Achieve Optimal Sample Efficiency Kevin Tan et.al. 2505.03710 null
2025-05-06 Policy Gradient Adaptive Control for the LQR: Indirect and Direct Approaches Feiran Zhao et.al. 2505.03706 null
2025-05-06 Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation Songchen Fu et.al. 2505.03586 null
2025-05-06 Ergodic Generative Flows Leo Maxime Brunswic et.al. 2505.03561 null
2025-05-06 Multi-Agent Reinforcement Learning Scheduling to Support Low Latency in Teleoperated Driving Giacomo Avanzi et.al. 2505.03558 null
2025-05-06 Small-Scale-Fading-Aware Resource Allocation in Wireless Federated Learning Jiacheng Wang et.al. 2505.03533 null
2025-05-06 The Steganographic Potentials of Language Models Artem Karpov et.al. 2505.03439 null
2025-05-06 Wasserstein Convergence of Score-based Generative Models under Semiconvexity and Discontinuous Gradients Stefano Bruno et.al. 2505.03432 null
2025-05-05 R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning Yi-Fan Zhang et.al. 2505.02835 link
2025-05-05 TWIST: Teleoperated Whole-Body Imitation System Yanjie Ze et.al. 2505.02833 null
2025-05-05 Knowing You Don’t Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing Diji Yang et.al. 2505.02811 link
2025-05-05 Teaching the social media generation: rethinking learning without sacrificing quality Sepinoud Azimi et.al. 2505.02770 null
2025-05-05 The use of Artificial Intelligence for Intervention and Assessment in Individuals with ASD Aggeliki Sideraki et.al. 2505.02747 null
2025-05-05 Enhancing LLMs’ Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry Junu Kim et.al. 2505.02722 link
2025-05-05 Graph Neural Network-Based Reinforcement Learning for Controlling Biological Networks: The GATTACA Framework Andrzej Mizera et.al. 2505.02712 null
2025-05-05 Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models Xiaobao Wu et.al. 2505.02686 link
2025-05-05 Online Phase Estimation of Human Oscillatory Motions using Deep Learning Antonio Grotta et.al. 2505.02668 null
2025-05-05 A Survey on Progress in LLM Alignment from the Perspective of Reward Design Miaomiao Ji et.al. 2505.02666 null
2025-05-02 FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research Yan Miao et.al. 2505.01383 null
2025-05-02 Stabilizing Temporal Difference Learning via Implicit Stochastic Approximation Hwanwoo Kim et.al. 2505.01361 null
2025-05-02 Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story Vincenzo De Paola et.al. 2505.01336 null
2025-05-02 Integration of Multi-Mode Preference into Home Energy Management System Using Deep Reinforcement Learning Mohammed Sumayli et.al. 2505.01332 null
2025-05-02 Exploring Equity of Climate Policies using Multi-Agent Multi-Objective Reinforcement Learning Palok Biswas et.al. 2505.01115 null
2025-05-02 Multi-Objective Reinforcement Learning for Water Management Zuzanna Osika et.al. 2505.01094 null
2025-05-02 Llama-Nemotron: Efficient Reasoning Models Akhiad Bercovich et.al. 2505.00949 null
2025-05-01 Learning Neural Control Barrier Functions from Offline Data with Conservatism Ihab Tabbara et.al. 2505.00908 null
2025-05-01 SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation Quang P. M. Pham et.al. 2505.00831 null
2025-05-01 Constructing an Optimal Behavior Basis for the Option Keyboard Lucas N. Alegre et.al. 2505.00787 null
2025-05-01 T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Dongzhi Jiang et.al. 2505.00703 link
2025-05-01 Multi-Constraint Safe Reinforcement Learning via Closed-form Solution for Log-Sum-Exp Approximation of Control Barrier Functions Chenggang Wang et.al. 2505.00671 null
2025-05-01 Deep Reinforcement Learning for Urban Air Quality Management: Multi-Objective Optimization of Pollution Mitigation Booth Placement in Metropolitan Environments Kirtan Rajesh et.al. 2505.00668 null
2025-05-01 Wasserstein Policy Optimization David Pfau et.al. 2505.00663 null
2025-05-01 DeepCritic: Deliberate Critique with Large Language Models Wenkai Yang et.al. 2505.00662 link
2025-05-02 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Chong Zhang et.al. 2505.00551 null
2025-05-01 Directly Forecasting Belief for Reinforcement Learning with Delays Qingyuan Wu et.al. 2505.00546 null
2025-05-01 Emergence of Roles in Robotic Teams with Model Sharing and Limited Communication Ian O’Flynn et.al. 2505.00540 null
2025-05-01 Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks Xinyu Wang et.al. 2505.00530 null
2025-05-01 DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation Zixuan Chen et.al. 2505.00527 null
2025-04-30 DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition Z. Z. Ren et.al. 2504.21801 link
2025-04-30 Reconciling Discrete-Time Mixed Policies and Continuous-Time Relaxed Controls in Reinforcement Learning and Stochastic Control Rene Carmona et.al. 2504.21793 null
2025-04-30 MAGNET: an open-source library for mesh agglomeration by Graph Neural Networks Paola F. Antonietti et.al. 2504.21780 null
2025-04-30 LLM-based Interactive Imitation Learning for Robotic Manipulation Jonas Werner et.al. 2504.21769 null
2025-04-30 LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning Yiyang Shao et.al. 2504.21738 null
2025-04-30 Adaptive 3D UI Placement in Mixed Reality Using Deep Reinforcement Learning Feiyu Lu et.al. 2504.21731 null
2025-04-30 MovementVR: An open-source tool for the study of motor control and learning in virtual reality Cristina Rossi et.al. 2504.21696 null
2025-04-30 Designing Control Barrier Function via Probabilistic Enumeration for Safe Reinforcement Learning Navigation Luca Marzari et.al. 2504.21643 null
2025-04-30 Multi-Goal Dexterous Hand Manipulation using Probabilistic Model-based Reinforcement Learning Yingzhuo Jiang et.al. 2504.21585 null
2025-04-30 SimPRIVE: a Simulation framework for Physical Robot Interaction with Virtual Environments Federico Nesti et.al. 2504.21454 null
2025-04-29 Toward Efficient Exploration by Large Language Model Agents Dilip Arumugam et.al. 2504.20997 null
2025-04-29 XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search Yiting Zhang et.al. 2504.20969 null
2025-04-29 Improvements of Dark Experience Replay and Reservoir Sampling towards Better Balance between Consolidation and Plasticity Taisuke Kobayashi et.al. 2504.20932 null
2025-04-29 ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification Ziqing Fan et.al. 2504.20930 link
2025-04-29 Exploiting inter-agent coupling information for efficient reinforcement learning of cooperative LQR Shahbaz P Qadri Syed et.al. 2504.20927 null
2025-04-29 A Domain-Agnostic Scalable AI Safety Ensuring Framework Beomjun Kim et.al. 2504.20924 null
2025-04-29 Reinforcement Learning for LLM Reasoning Under Memory Constraints Alan Lee et.al. 2504.20834 null
2025-04-29 A Teacher-Student MPC-PPO Coupled Reinforcement Learning Framework for Winter Temperature Control of Solar Greenhouses in Northern China Jingxin Yu et.al. 2504.20815 null
2025-04-29 SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings Florian Vahl et.al. 2504.20808 null
2025-04-29 Q-Fusion: Diffusing Quantum Circuits Collin Beaudoin et.al. 2504.20794 null
2025-04-28 SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning Wufei Ma et.al. 2504.20024 null
2025-04-28 Socially-Aware Autonomous Driving: Inferring Yielding Intentions for Safer Interactions Jing Wang et.al. 2504.20004 null
2025-04-28 Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets Adam Younsi et.al. 2504.19981 null
2025-04-28 Mesh-Learner: Texturing Mesh with Spherical Harmonics Yunfei Wan et.al. 2504.19938 null
2025-04-28 Automated decision-making for dynamic task assignment at scale Riccardo Lo Bianco et.al. 2504.19933 null
2025-04-28 GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets Mingqian He et.al. 2504.19898 null
2025-04-28 Optimizing the Charging of Open Quantum Batteries using Long Short-Term Memory-Driven Reinforcement Learning Shadab Zakavati et.al. 2504.19840 null
2025-04-28 LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects Guangyi Liu et.al. 2504.19838 link
2025-04-28 Reinforcement Learning-Based Heterogeneous Multi-Task Optimization in Semantic Broadcast Communications Zhilin Lu et.al. 2504.19806 null
2025-04-28 Model-based controller assisted domain randomization in deep reinforcement learning: application to nonlinear powertrain control Heisei Yonezawa et.al. 2504.19715 null
2025-04-25 Generalization Capability for Imitation Learning Yixiao Wang et.al. 2504.18538 null
2025-04-25 Intelligent Attacks and Defense Methods in Federated Learning-enabled Energy-Efficient Wireless Networks Han Zhang et.al. 2504.18519 null
2025-04-25 Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation Peiyuan Jing et.al. 2504.18453 null
2025-04-25 Pushing the boundary on Natural Language Inference Pablo Miralles-González et.al. 2504.18376 null
2025-04-25 Explainable AI for UAV Mobility Management: A Deep Q-Network Approach for Handover Minimization Irshad A. Meer et.al. 2504.18371 null
2025-04-25 Deep Reinforcement Learning Based Navigation with Macro Actions and Topological Maps Simon Hakenes et.al. 2504.18300 null
2025-04-25 Depth-Constrained ASV Navigation with Deep RL and Limited Sensing Amirhossein Zhalehmehrabi et.al. 2504.18253 null
2025-04-25 Aligning Language Models for Icelandic Legal Text Summarization Þórir Hrafn Harðarson et.al. 2504.18180 null
2025-04-25 Offline Learning of Controllable Diverse Behaviors Mathieu Petitbois et.al. 2504.18160 null
2025-04-25 Learning from Less: SINDy Surrogates in RL Aniket Dixit et.al. 2504.18113 null
2025-04-24 Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control Haochen Wang et.al. 2504.17771 null
2025-04-24 Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence Edward Collins et.al. 2504.17703 null
2025-04-24 Applied Sheaf Theory For Multi-agent Artificial Intelligence (Reinforcement Learning) Systems: A Prospectus Eric Schmid et.al. 2504.17700 null
2025-04-24 SAPO-RL: Sequential Actuator Placement Optimization for Fuselage Assembly via Reinforcement Learning Peng Ye et.al. 2504.17603 null
2025-04-24 Mitigating xApp conflicts for efficient network slicing in 6G O-RAN: a graph convolutional-based attention network approach Sihem Bakri et.al. 2504.17590 null
2025-04-24 Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization Hongshu Guo et.al. 2504.17578 null
2025-04-24 Cooperative Task Offloading through Asynchronous Deep Reinforcement Learning in Mobile Edge Computing for Future Networks Yuelin Liu et.al. 2504.17526 null
2025-04-24 Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning Mingqi Yuan et.al. 2504.17490 null
2025-04-24 Comprehend, Divide, and Conquer: Feature Subspace Exploration via Multi-Agent Hierarchical Reinforcement Learning Weiliang Zhang et.al. 2504.17356 null
2025-04-24 Collaborative Multi-Agent Reinforcement Learning for Automated Feature Transformation with Graph-Driven Path Optimization Xiaohan Huang et.al. 2504.17355 null
2025-04-23 Latent Diffusion Planning for Imitation Learning Amber Xie et.al. 2504.16925 null
2025-04-23 Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms Hsin-Jung Yang et.al. 2504.16916 null
2025-04-23 Hybrid Reinforcement Learning and Model Predictive Control for Adaptive Control of Hydrogen-Diesel Dual-Fuel Combustion Julian Bedei et.al. 2504.16875 null
2025-04-23 Monte Carlo Planning with Large Language Model for Text-Based Game Agents Zijing Shi et.al. 2504.16855 null
2025-04-23 SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward Nicolas Jonason et.al. 2504.16839 null
2025-04-23 MEC Task Offloading in AIoT: A User-Centric DRL Model Splitting Inference Scheme Weixi Li et.al. 2504.16729 null
2025-04-23 PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation Wenxuan Li et.al. 2504.16693 null
2025-04-23 Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator Chenhao Li et.al. 2504.16680 null
2025-04-23 Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning Chris et.al. 2504.16656 link
2025-04-23 Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models Fredy Pokou et.al. 2504.16635 null
2025-04-22 TTRL: Test-Time Reinforcement Learning Yuxin Zuo et.al. 2504.16084 link
2025-04-22 LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Thomas Schmied et.al. 2504.16078 null
2025-04-22 Reinforcement Learning and Metaheuristics for Feynman Integral Reduction Mao Zeng et.al. 2504.16045 null
2025-04-22 The Formation of Production Networks: How Supply Chains Arise from Simple Learning with Minimal Information Tuong Manh Vu et.al. 2504.16010 null
2025-04-22 Making Neural Networks More Suitable for Approximate Clifford+T Circuit Synthesis Mathias Weiden et.al. 2504.15990 null
2025-04-22 Neuroadaptive Haptics: Comparing Reinforcement Learning from Explicit Ratings and Neural Signals for Adaptive XR Systems Lukas Gehrke et.al. 2504.15984 null
2025-04-22 Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning Wang Lin et.al. 2504.15932 null
2025-04-22 StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation Yinmin Zhong et.al. 2504.15930 null
2025-04-22 New Recipe for Semi-supervised Community Detection: Clique Annealing under Crystallization Kinetics Ling Cheng et.al. 2504.15927 null
2025-04-22 GraphEdge: Dynamic Graph Partition and Task Scheduling for GNNs Computing in Edge Network Wenjing Xiao et.al. 2504.15905 null
2025-04-21 VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Weiye Xu et.al. 2504.15279 null
2025-04-21 Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning Jie Cheng et.al. 2504.15275 link
2025-04-21 FlowReasoner: Reinforcing Query-Level Meta-Agents Hongcheng Gao et.al. 2504.15257 link
2025-04-21 DRAGON: Distributional Rewards Optimize Diffusion Generative Models Yatong Bai et.al. 2504.15217 null
2025-04-21 Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs Marina Sakharova et.al. 2504.15210 null
2025-04-21 Beyond Binary Opinions: A Deep Reinforcement Learning-Based Approach to Uncertainty-Aware Competitive Influence Maximization Qi Zhang et.al. 2504.15131 null
2025-04-21 A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment Kangyao Huang et.al. 2504.15129 null
2025-04-21 Fast-Slow Co-advancing Optimizer: Toward Harmonious Adversarial Training of GAN Lin Wang et.al. 2504.15099 null
2025-04-21 Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL Simone Papicchio et.al. 2504.15077 null
2025-04-21 Energy-Efficient UAV-Mounted RIS for IoT: A Hybrid Energy Harvesting and DRL Approach Mahmoud M. Salim et.al. 2504.15043 null
2025-04-18 Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Yang Yue et.al. 2504.13837 null
2025-04-18 Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning Yixuan Even Xu et.al. 2504.13818 null
2025-04-18 DiffOG: Differentiable Policy Trajectory Optimization with Generalizability Zhengtong Xu et.al. 2504.13807 null
2025-04-18 Imitation Learning with Precisely Labeled Human Demonstrations Yilong Song et.al. 2504.13803 null
2025-04-18 Bake Two Cakes with One Oven: RL for Defusing Popularity Bias and Cold-start in Third-Party Library Recommendations Minh Hoang Vuong et.al. 2504.13772 null
2025-04-18 A Reinforcement Learning Method to Factual and Counterfactual Explanations for Session-based Recommendation Han Zhou et.al. 2504.13632 null
2025-04-18 Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning Rohan P. Singh et.al. 2504.13619 null
2025-04-18 On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting Niklas Funk et.al. 2504.13618 null
2025-04-18 Compile Scene Graphs with Reinforcement Learning Zuyao Chen et.al. 2504.13617 null
2025-04-18 Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling Zihao Feng et.al. 2504.13592 null
2025-04-17 Energy-Based Reward Models for Robust Language Model Alignment Anamika Lochab et.al. 2504.13134 null
2025-04-17 LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard Varun Rao et.al. 2504.13125 null
2025-04-17 SkyReels-V2: Infinite-length Film Generative Model Guibin Chen et.al. 2504.13074 null
2025-04-17 NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation Xiangyan Liu et.al. 2504.13055 null
2025-04-17 InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning Zheng Wang et.al. 2504.13032 null
2025-04-17 QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning? Zhouyang Jiang et.al. 2504.12961 null
2025-04-17 RL-PINNs: Reinforcement Learning-Driven Adaptive Sampling for Efficient Training of PINNs Zhenao Song et.al. 2504.12949 null
2025-04-17 Image-Editing Specialists: An RLAIF Approach for Diffusion Models Elior Benarous et.al. 2504.12833 link
2025-04-17 Multi-Agent Reinforcement Learning Simulation for Environmental Policy Synthesis James Rudd-Jones et.al. 2504.12777 null
2025-04-17 GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks Hao Xu et.al. 2504.12764 null
2025-04-16 Adapting a World Model for Trajectory Following in a 3D Game Marko Tot et.al. 2504.12299 null
2025-04-16 d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning Siyan Zhao et.al. 2504.12216 null
2025-04-16 Reasoning-Based AI for Startup Evaluation (R.A.I.S.E.): A Memory-Augmented, Multi-Step Decision Framework Jack Preuveneers et.al. 2504.12090 null
2025-04-16 pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild Jonas Myhre Schiøtt et.al. 2504.12045 null
2025-04-16 Evolutionary Reinforcement Learning for Interpretable Decision-Making in Supply Chain Management Stefano Genetti et.al. 2504.12023 null
2025-04-16 Control of Rayleigh-Bénard Convection: Effectiveness of Reinforcement Learning in the Turbulent Regime Thorben Markmann et.al. 2504.12000 null
2025-04-16 A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs Kihyuk Hong et.al. 2504.11997 null
2025-04-16 Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions Yifei Dong et.al. 2504.11967 null
2025-04-16 R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors Haoyang Wang et.al. 2504.11946 null
2025-04-16 VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning Xuyang Chen et.al. 2504.11944 null
2025-04-15 DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Zhiwei He et.al. 2504.11456 null
2025-04-15 A Clean Slate for Offline Reinforcement Learning Matthew Thomas Jackson et.al. 2504.11453 null
2025-04-15 Embodied World Models Emerge from Navigational Task in Open-Ended Environments Li Jin et.al. 2504.11419 null
2025-04-15 Measures of Variability for Risk-averse Policy Gradient Yudong Luo et.al. 2504.11412 null
2025-04-15 Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning Haiming Wang et.al. 2504.11354 null
2025-04-15 A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Wei Xiong et.al. 2504.11343 null
2025-04-15 Multi-Agent Reinforcement Learning for Greenhouse Gas Offset Credit Markets Liam Welsh et.al. 2504.11258 null
2025-04-15 A Rollout-Based Algorithm and Reward Function for Efficient Resource Allocation in Business Processes Jeroen Middelhuis et.al. 2504.11250 null
2025-04-15 Next-Future: Sample-Efficient Policy Learning for Robotic-Arm Tasks Fikrican Özgür et.al. 2504.11247 null
2025-04-15 Revealing Covert Attention by Analyzing Human and Reinforcement Learning Agent Gameplay Henrik Krauss et.al. 2504.11118 null
2025-04-14 Weight Ensembling Improves Reasoning in Language Models Xingyu Dang et.al. 2504.10478 null
2025-04-14 Co-optimizing Physical Reconfiguration Parameters and Controllers for an Origami-inspired Reconfigurable Manipulator Zhe Chen et.al. 2504.10474 null
2025-04-14 GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents Xiaobo Xia et.al. 2504.10458 null
2025-04-14 The Communication and Computation Trade-off in Wireless Semantic Communications Xuyang Chen et.al. 2504.10357 null
2025-04-14 Heimdall: test-time scaling on the generative verification Wenlei Shi et.al. 2504.10337 null
2025-04-14 Flying Hand: End-Effector-Centric Framework for Versatile Aerial Manipulation Teleoperation and Policy Learning Guanqi He et.al. 2504.10334 null
2025-04-14 InstructEngine: Instruction-driven Text-to-Image Alignment Xingyu Lu et.al. 2504.10329 null
2025-04-14 Vision based driving agent for race car simulation environments Gergely Bári et.al. 2504.10266 null
2025-04-14 Adaptive Sensor Steering Strategy Using Deep Reinforcement Learning for Dynamic Data Acquisition in Digital Twins Collins O. Ogbodo et.al. 2504.10248 null
2025-04-14 Deep Reasoning Translation via Reinforcement Learning Jiaan Wang et.al. 2504.10187 null
2025-04-11 Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing Vinal Asodia et.al. 2504.08704 null
2025-04-11 Pobogot – An Open-Hardware Open-Source Low Cost Robot for Swarm Robotics Alessia Loi et.al. 2504.08686 null
2025-04-11 Reinforcement Learning-Driven Plant-Wide Refinery Planning Using Model Decomposition Zhouchang Li et.al. 2504.08642 null
2025-04-11 Neural Fidelity Calibration for Informative Sim-to-Real Adaptation Youwei Yu et.al. 2504.08604 null
2025-04-11 SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning Peixian Ma et.al. 2504.08600 link
2025-04-11 Playpen: An Environment for Exploring Learning Through Conversational Interaction Nicola Horst et.al. 2504.08590 null
2025-04-11 Slicing the Gaussian Mixture Wasserstein Distance Moritz Piening et.al. 2504.08544 null
2025-04-11 Diffusion Models for Robotic Manipulation: A Survey Rosa Wolf et.al. 2504.08438 null
2025-04-11 Belief States for Cooperative Multi-Agent Reinforcement Learning under Partial Observability Paul J. Pritz et.al. 2504.08417 null
2025-04-11 Scalable Conflict-free Decision Making with Photons Kohei Konaka et.al. 2504.08331 null
2025-04-10 Perception-R1: Pioneering Perception Policy with Reinforcement Learning En Yu et.al. 2504.07954 link
2025-04-10 Echo: An Open-Source, Low-Cost Teleoperation System with Force Feedback for Dataset Collection in Robot Learning Artem Bazhenov et.al. 2504.07939 null
2025-04-10 Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining Rosie Zhao et.al. 2504.07912 link
2025-04-10 Fast Adaptation with Behavioral Foundation Models Harshit Sikchi et.al. 2504.07896 null
2025-04-10 2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization Mengyang Li et.al. 2504.07856 null
2025-04-10 Genetic Programming with Reinforcement Learning Trained Transformer for Real-World Dynamic Scheduling Problems Xian Chen et.al. 2504.07779 null
2025-04-10 Harnessing Equivariance: Modeling Turbulence with Graph Neural Networks Marius Kurz et.al. 2504.07741 null
2025-04-10 Relaxing the Markov Requirements on Reinforcement Learning Under Weak Partial Ignorability MaryLena Bleile et.al. 2504.07722 null
2025-04-10 Sim-to-Real Transfer in Reinforcement Learning for Maneuver Control of a Variable-Pitch MAV Zhikun Wang et.al. 2504.07694 null
2025-04-10 VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Haozhan Shen et.al. 2504.07615 link
2025-04-09 Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning Chenjie Hao et.al. 2504.07095 link
2025-04-09 AssistanceZero: Scalably Solving Assistance Games Cassidy Laidlaw et.al. 2504.07091 link
2025-04-09 A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility Andreas Hochlehnert et.al. 2504.07086 link
2025-04-09 To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning Tian Qin et.al. 2504.07052 null
2025-04-09 Free Random Projection for In-Context Reinforcement Learning Tomohiro Hayase et.al. 2504.06983 null
2025-04-09 VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Xinhao Li et.al. 2504.06958 link
2025-04-09 Regret Bounds for Robust Online Decision Making Alexander Appel et.al. 2504.06820 null
2025-04-09 Interactive Expressive Motion Generation Using Dynamic Movement Primitives Till Hielscher et.al. 2504.06735 null
2025-04-09 Learning global control of underactuated systems with Model-Based Reinforcement Learning Niccolò Turcato et.al. 2504.06721 null
2025-04-09 SDHN: Skewness-Driven Hypergraph Networks for Enhanced Localized Multi-Robot Coordination Delin Zhao et.al. 2504.06684 null
2025-04-08 ViTaMIn: Learning Contact-Rich Tasks Through Robot-Free Visuo-Tactile Manipulation Interface Fangchen Liu et.al. 2504.06156 null
2025-04-08 Adversarial Training of Reward Models Alexander Bukharin et.al. 2504.06141 null
2025-04-08 A Multimedia Analytics Model for the Foundation Model Era Marcel Worring et.al. 2504.06138 null
2025-04-08 Accelerating Vehicle Routing via AI-Initialized Genetic Algorithms Ido Greenberg et.al. 2504.06126 null
2025-04-08 Robo-taxi Fleet Coordination at Scale via Reinforcement Learning Luigi Tresca et.al. 2504.06125 link
2025-04-09 Leanabell-Prover: Posttraining Scaling in Formal Reasoning Jingyuan Zhang et.al. 2504.06122 link
2025-04-08 Trust-Region Twisted Policy Improvement Joery A. de Vries et.al. 2504.06048 null
2025-04-08 Information-Theoretic Reward Decomposition for Generalizable RLHF Liyuan Mao et.al. 2504.06020 null
2025-04-08 Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models J. S. van Hulst et.al. 2504.05978 null
2025-04-08 AEGIS: Human Attention-based Explainable Guidance for Intelligent Vehicle Systems Zhuoli Zhuang et.al. 2504.05950 null
2025-04-07 RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception Hui Zhang et.al. 2504.05287 null
2025-04-07 Concise Reasoning via Reinforcement Learning Mehdi Fatemi et.al. 2504.05185 link
2025-04-07 Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval Kidist Amde Mekonnen et.al. 2504.05181 link
2025-04-07 RLBayes: a Bayesian Network Structure Learning Algorithm via Reinforcement Learning-Based Search Strategy Mingcan Wang et.al. 2504.05167 null
2025-04-07 A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks Leonardo Kanashiro Felizardo et.al. 2504.05150 link
2025-04-08 VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks Yu Yue et.al. 2504.05118 null
2025-04-07 Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning Anja Surina et.al. 2504.05108 null
2025-04-08 Attention-Augmented Inverse Reinforcement Learning with Graph Convolutions for Multi-Agent Task Allocation Huilin Yin et.al. 2504.05045 null
2025-04-07 Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning Bibek Poudel et.al. 2504.05018 null
2025-04-07 Wavelet Policy: Imitation Policy Learning in Frequency Domain with Wavelet Transforms Changchuan Yang et.al. 2504.04991 link
2025-04-04 Align to Structure: Aligning Large Language Models with Structural Information Zae Myung Kim et.al. 2504.03622 null
2025-04-04 Optimization of a Triangular Delaunay Mesh Generator using Reinforcement Learning Will Thacher et.al. 2504.03610 null
2025-04-04 Dexterous Manipulation through Imitation Learning: A Survey Shan An et.al. 2504.03515 null
2025-04-04 Learning Dual-Arm Coordination for Grasping Large Flat Objects Yongliang Wang et.al. 2504.03500 null
2025-04-04 Optimizing Quantum Circuits via ZX Diagrams using Reinforcement Learning and Graph Neural Networks Alexander Mattick et.al. 2504.03429 null
2025-04-04 DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models Sathish Kumar et.al. 2504.03423 null
2025-04-04 Autonomous state-space segmentation for Deep-RL sparse reward scenarios Gianluca Maselli et.al. 2504.03420 null
2025-04-04 Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning Sanghwan Bae et.al. 2504.03380 null
2025-04-04 Verification of Autonomous Neural Car Control with KeYmaera X Enguerrand Prebet et.al. 2504.03272 null
2025-04-04 Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward Yanming Wan et.al. 2504.03206 null
2025-04-03 Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets Chuning Zhu et.al. 2504.02792 link
2025-04-03 A Numerically Efficient Method to Enhance Model Predictive Control Performance with a Reinforcement Learning Policy Andrea Ghezzi et.al. 2504.02710 null
2025-04-03 Handover and SINR-Aware Path Optimization in 5G-UAV mmWave Communication using DRL Achilles Kiwanuka Machumilane et.al. 2504.02688 null
2025-04-03 Integrating Human Knowledge Through Action Masking in Reinforcement Learning for Operations Research Mirko Stappert et.al. 2504.02662 null
2025-04-03 SymDQN: Symbolic Knowledge and Reasoning in Neural Network-based Reinforcement Learning Ivo Amador et.al. 2504.02654 null
2025-04-03 Solving the Paint Shop Problem with Flexible Management of Multi-Lane Buffers Using Reinforcement Learning and Action Masking Mirko Stappert et.al. 2504.02644 null
2025-04-03 Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving Daoguang Zan et.al. 2504.02605 link
2025-04-03 Regulating Spatial Fairness in a Tripartite Micromobility Sharing System via Reinforcement Learning Matteo Cederle et.al. 2504.02597 null
2025-04-03 LexPam: Legal Procedure Awareness-Guided Mathematical Reasoning Kepu Zhang et.al. 2504.02590 null
2025-04-04 Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Yan Ma et.al. 2504.02587 link
2025-04-02 OpenCodeReasoning: Advancing Data Distillation for Competitive Coding Wasi Uddin Ahmad et.al. 2504.01943 null
2025-04-02 Overcoming Deceptiveness in Fitness Optimization with Unsupervised Quality-Diversity Lisa Coiffard et.al. 2504.01915 null
2025-04-02 GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning Yanzhou Su et.al. 2504.01886 link
2025-04-02 Interpreting Emergent Planning in Model-Free Reinforcement Learning Thomas Bush et.al. 2504.01871 null
2025-04-02 Learning with Imperfect Models: When Multi-step Prediction Mitigates Compounding Error Anne Somalwar et.al. 2504.01766 null
2025-04-03 Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning Ke Jiang et.al. 2504.01719 null
2025-04-02 ToM-RL: Reinforcement Learning Unlocks Theory of Mind in Small LLMs Yi-Long Lu et.al. 2504.01698 null
2025-04-02 8-DoFs Cable Driven Parallel Robots for Bimanual Teleportation Hung Hon Cheng et.al. 2504.01554 null
2025-04-02 A Robust Model-Based Approach for Continuous-Time Policy Evaluation with Unknown Lévy Process Dynamics Qihao Ye et.al. 2504.01482 null
2025-04-02 Probabilistic Curriculum Learning for Goal-Based Reinforcement Learning Llewyn Salt et.al. 2504.01459 null
2025-03-31 Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Yi Chen et.al. 2503.24376 link
2025-03-31 Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning Yubo Zhang et.al. 2503.24296 null
2025-03-31 Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Jingcheng Hu et.al. 2503.24290 link
2025-03-31 Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning Jiacheng Lin et.al. 2503.24289 link
2025-03-31 Moving Edge for On-Demand Edge Computing: An Uncertainty-aware Approach Fangtong Zhou et.al. 2503.24214 null
2025-03-31 Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantees via Constrained Mean-Field Reinforcement Learning Matej Jusup et.al. 2503.24183 link
2025-03-31 Learning a Canonical Basis of Human Preferences from Binary Ratings Kailas Vodrahalli et.al. 2503.24150 null
2025-03-31 Reinforcement Learning for Safe Autonomous Two Device Navigation of Cerebral Vessels in Mechanical Thrombectomy Harry Robertshaw et.al. 2503.24140 null
2025-03-31 Level the Level: Balancing Game Levels for Asymmetric Player Archetypes With Reinforcement Learning Florian Rupp et.al. 2503.24099 null
2025-03-31 HACTS: a Human-As-Copilot Teleoperation System for Robot Learning Zhiyuan Xu et.al. 2503.24070 null
2025-03-28 Q-Insight: Understanding Image Quality via Visual Reinforcement Learning Weiqi Li et.al. 2503.22679 link
2025-03-28 Empirical Analysis of Sim-and-Real Cotraining Of Diffusion Policies For Planar Pushing from Pixels Adam Wei et.al. 2503.22634 null
2025-03-28 Reinforcement Learning for Machine Learning Model Deployment: Evaluating Multi-Armed Bandits in ML Ops Environments S. Aaron McClendon et.al. 2503.22595 null
2025-03-28 On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations Rajdeep Singh Hundal et.al. 2503.22575 null
2025-03-28 Robust Offline Imitation Learning Through State-level Trajectory Stitching Shuze Wang et.al. 2503.22524 null
2025-03-28 Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments Luke Rowe et.al. 2503.22496 null
2025-03-28 Probabilistic Uncertain Reward Model: A Natural Generalization of Bradley-Terry Reward Model Wangtao Sun et.al. 2503.22480 null
2025-03-28 Control of Humanoid Robots with Parallel Mechanisms using Kinematic Actuation Models Victor Lutz et.al. 2503.22459 null
2025-03-28 Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning Abdullah Vanlioglu et.al. 2503.22456 null
2025-03-28 Reinforcement learning for efficient and robust multi-setpoint and multi-trajectory tracking in bioprocesses Sebastián Espinel-Ríos et.al. 2503.22409 null
2025-03-27 Video-R1: Reinforcing Video Reasoning in MLLMs Kaituo Feng et.al. 2503.21776 link
2025-03-27 ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation Zhicheng Lee et.al. 2503.21729 link
2025-03-27 Collab: Controlled Decoding using Mixture of Agents for LLM Alignment Souradip Chakraborty et.al. 2503.21720 null
2025-03-27 Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks Wenqi Zhang et.al. 2503.21696 link
2025-03-27 LLM-Gomoku: A Large Language Model-Based System for Strategic Gomoku with Self-Play and Reinforcement Learning Hui Wang et.al. 2503.21683 null
2025-03-27 A tale of two goals: leveraging sequentiality in multi-goal scenarios Olivier Serris et.al. 2503.21677 null
2025-03-27 UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Zhengxi Lu et.al. 2503.21620 link
2025-03-27 A Deep Reinforcement Learning-based Approach for Adaptive Handover Protocols Johannes Voigt et.al. 2503.21601 null
2025-03-27 DATA-WA: Demand-based Adaptive Task Assignment with Dynamic Worker Availability Windows Jinwen Chen et.al. 2503.21458 null
2025-03-27 On Learning-Based Traffic Monitoring With a Swarm of Drones Marko Maljkovic et.al. 2503.21433 null
2025-03-26 Understanding R1-Zero-Like Training: A Critical Perspective Zichen Liu et.al. 2503.20783 link
2025-03-27 Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning Huajie Tan et.al. 2503.20752 link
2025-03-26 Graph-Enhanced Model-Free Reinforcement Learning Agents for Efficient Power Grid Topological Control Eloy Anguiano Batanero et.al. 2503.20688 null
2025-03-26 Flip Learning: Weakly Supervised Erase to Segment Nodules in Breast Ultrasound Yuhao Huang et.al. 2503.20685 null
2025-03-26 Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging Han Wu et.al. 2503.20641 link
2025-03-26 State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning Zongyuan Zhang et.al. 2503.20613 null
2025-03-26 Optimizing Case-Based Reasoning System for Functional Test Script Generation with Large Language Models Siyuan Guo et.al. 2503.20576 null
2025-03-26 Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems Rakesh Nadig et.al. 2503.20507 null
2025-03-26 Multi-agent Uncertainty-Aware Pessimistic Model-Based Reinforcement Learning for Connected Autonomous Vehicles Ruoqi Wen et.al. 2503.20462 null
2025-03-26 The Crucial Role of Problem Formulation in Real-World Reinforcement Learning Georg Schäfer et.al. 2503.20442 null
2025-03-25 Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking Xiaoyu Tian et.al. 2503.19855 link
2025-03-25 Optimal Path Planning and Cost Minimization for a Drone Delivery System Via Model Predictive Control Muhammad Al-Zafar Khan et.al. 2503.19699 null
2025-03-25 Risk-Aware Reinforcement Learning for Autonomous Driving: Improving Safety When Driving through Intersection Bo Leng et.al. 2503.19690 null
2025-03-25 Learning to chain-of-thought with Jensen’s evidence lower bound Yunhao Tang et.al. 2503.19618 null
2025-03-25 RL-finetuning LLMs from on- and off-policy data with a single algorithm Yunhao Tang et.al. 2503.19612 null
2025-03-25 Optimizing Language Models for Inference Time Objectives using Reinforcement Learning Yunhao Tang et.al. 2503.19595 null
2025-03-25 One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF Xin Cai et.al. 2503.19523 null
2025-03-25 ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning Mingyang Chen et.al. 2503.19470 link
2025-03-25 Multi-Agent Deep Reinforcement Learning for Safe Autonomous Driving with RICS-Assisted MEC Xueyao Zhang et.al. 2503.19418 null
2025-03-25 NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios Songyi Gao et.al. 2503.19267 link
2025-03-24 Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training Brian R. Bartoldson et.al. 2503.18929 link
2025-03-24 SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild Weihao Zeng et.al. 2503.18892 link
2025-03-24 Bootstrapped Model Predictive Control Yuhang Wang et.al. 2503.18871 link
2025-03-24 Learning Multi-Robot Coordination through Locality-Based Factorized Multi-Agent Actor-Critic Algorithm Chak Lam Shek et.al. 2503.18816 null
2025-03-24 Sample-Efficient Reinforcement Learning of Koopman eNMPC Daniel Mayfrank et.al. 2503.18787 null
2025-03-24 Simulation-Driven Balancing of Competitive Game Levels with Reinforcement Learning Florian Rupp et.al. 2503.18748 null
2025-03-24 RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation Chengbo Yuan et.al. 2503.18738 null
2025-03-24 FF-SRL: High Performance GPU-Based Surgical Simulation For Robot Learning Diego Dall’Alba et.al. 2503.18616 null
2025-03-24 Adventurer: Exploration with BiGAN for Deep Reinforcement Learning Yongshuai Liu et.al. 2503.18612 null
2025-03-24 Reinforcement Learning in Switching Non-Stationary Markov Decision Processes: Algorithms and Convergence Analysis Mohsen Amiri et.al. 2503.18607 null
2025-03-21 OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Yihe Deng et.al. 2503.17352 link
2025-03-21 Capturing Individual Human Preferences with Reward Features André Barreto et.al. 2503.17338 null
2025-03-21 FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models Mingyang Song et.al. 2503.17287 link
2025-03-21 Curriculum RL meets Monte Carlo Planning: Optimization of a Real World Container Management Problem Abhijeet Pendyala et.al. 2503.17194 null
2025-03-21 Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning Chan Kim et.al. 2503.17125 null
2025-03-21 Neural-Guided Equation Discovery Jannis Brugger et.al. 2503.16953 null
2025-03-21 A New Segment Routing method with Swap Node Selection Strategy Based on Deep Reinforcement Learning for Software Defined Network Miao Ye et.al. 2503.16914 null
2025-03-21 Federated Digital Twin Construction via Distributed Sensing: A Game-Theoretic Online Optimization with Overlapping Coalitions Ruoyang Chen et.al. 2503.16823 null
2025-03-21 BEAC: Imitating Complex Exploration and Task-oriented Behaviors for Invisible Object Nonprehensile Manipulation Hirotaka Tahara et.al. 2503.16803 null
2025-03-21 Causally Aligned Curriculum Learning Mingxuan Li et.al. 2503.16799 null
2025-03-20 Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Yang Sui et.al. 2503.16419 link
2025-03-20 RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints Yiran Qin et.al. 2503.16408 null
2025-03-20 Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming Minori Narita et.al. 2503.16371 null
2025-03-20 JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Muyao Li et.al. 2503.16365 link
2025-03-21 Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning Zhaowei Liu et.al. 2503.16252 link
2025-03-20 Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t Quy-Anh Dang et.al. 2503.16219 link
2025-03-20 Explosive Jumping with Rigid and Articulated Soft Quadrupeds via Example Guided Reinforcement Learning Georgios Apostolides et.al. 2503.16197 null
2025-03-20 Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning Yuki Akiyama et.al. 2503.16192 null
2025-03-20 CLS-RL: Image Classification with Rule-Based Reinforcement Learning Ming Li et.al. 2503.16188 link
2025-03-20 Cultural Alignment in Large Language Models Using Soft Prompt Tuning Reem I. Masoud et.al. 2503.16094 null
2025-03-19 Learning to Play Piano in the Real World Yves-Simon Zeulner et.al. 2503.15481 null
2025-03-19 What Makes a Reward Model a Good Teacher? An Optimization Perspective Noam Razin et.al. 2503.15477 link
2025-03-19 CCDP: Composition of Conditional Diffusion Policies with Guided Sampling Amirreza Razmjoo et.al. 2503.15386 null
2025-03-19 Online Imitation Learning for Manipulation via Decaying Relative Correction through Teleoperation Cheng Pan et.al. 2503.15368 null
2025-03-19 Optimizing Decomposition for Optimal Claim Verification Yining Lu et.al. 2503.15354 link
2025-03-19 aiXcoder-7B-v2: Training LLMs to Fully Utilize the Long Context in Repository-level Code Completion Jia Li et.al. 2503.15301 null
2025-03-19 Reinforcement Learning for Robust Athletic Intelligence: Lessons from the 2nd ‘AI Olympics with RealAIGym’ Competition Felix Wiebe et.al. 2503.15290 null
2025-03-19 DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning Ruowen Zhao et.al. 2503.15265 link
2025-03-19 Partially Observable Reinforcement Learning with Memory Traces Onno Eberhard et.al. 2503.15200 null
2025-03-19 Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach Mohamed Hassouna et.al. 2503.15190 null
2025-03-18 DAPO: An Open-Source LLM Reinforcement Learning System at Scale Qiying Yu et.al. 2503.14476 null
2025-03-18 Pauli Network Circuit Synthesis with Reinforcement Learning Ayushi Dubal et.al. 2503.14448 null
2025-03-18 Flying in Highly Dynamic Environments with End-to-end Learning Approach Xiyu Fan et.al. 2503.14352 null
2025-03-18 MANTRA: Enhancing Automated Method-Level Refactoring with Contextual RAG and Multi-Agent LLM Collaboration Yisen Xu et.al. 2503.14340 null
2025-03-18 Revealing higher-order neural representations with generative artificial intelligence Hojjat Azimi Asrari et.al. 2503.14333 null
2025-03-18 Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs Nicolas Le Roux et.al. 2503.14286 null
2025-03-18 Integral modelling and Reinforcement Learning control of 3D liquid metal coating on a moving substrate Fabio Pino et.al. 2503.14270 null
2025-03-18 Automating Experimental Optics with Sample Efficient Machine Learning Methods Arindam Saha et.al. 2503.14260 null
2025-03-18 Quantization-Free Autoregressive Action Transformer Ziyad Sheebaelhamd et.al. 2503.14259 null
2025-03-18 CTSAC: Curriculum-Based Transformer Soft Actor-Critic for Goal-Oriented Robot Exploration Chunyu Yang et.al. 2503.14254 null
2025-03-17 Uncovering Utility Functions from Observed Outcomes Marta Grzeskiewicz et.al. 2503.13432 null
2025-03-17 FLEX: A Framework for Learning Robot-Agnostic Force-based Skills Involving Sustained Contact Object Manipulation Shijie Fang et.al. 2503.13418 null
2025-03-17 A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives Weiqiang Jin et.al. 2503.13415 null
2025-03-17 TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM Ye Wang et.al. 2503.13377 link
2025-03-17 Agents Play Thousands of 3D Video Games Zhongwen Xu et.al. 2503.13356 null
2025-03-17 Local-Global Learning of Interpretable Control Policies: The Interface between MPC and Reinforcement Learning Thomas Banker et.al. 2503.13289 null
2025-03-17 Timing the Match: A Deep Reinforcement Learning Approach for Ride-Hailing and Ride-Pooling Services Yiman Bao et.al. 2503.13200 null
2025-03-17 A representational framework for learning and encoding structurally enriched trajectories in complex agent environments Corina Catarau-Cotutiu et.al. 2503.13194 null
2025-03-17 HybridGen: VLM-Guided Hybrid Planning for Scalable Data Generation of Imitation Learning Wensheng Wang et.al. 2503.13171 null
2025-03-17 Efficient Imitation Under Misspecification Nicolas Espinosa-Dice et.al. 2503.13162 null
2025-03-14 Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning Siyuan Huang et.al. 2503.11646 null
2025-03-14 Scaling the Automated Discovery of Quantum Circuits via Reinforcement Learning with Gadgets Jan Olle et.al. 2503.11638 null
2025-03-14 Unicorn: A Universal and Collaborative Reinforcement Learning Approach Towards Generalizable Network-Wide Traffic Signal Control Yifeng Zhang et.al. 2503.11488 null
2025-03-14 A Review of DeepSeek Models’ Key Innovative Techniques Chengen Wang et.al. 2503.11486 null
2025-03-14 Dynamic Obstacle Avoidance with Bounded Rationality Adversarial Reinforcement Learning Jose-Luis Holgado-Alvarez et.al. 2503.11467 null
2025-03-14 Optimizing 6G Dense Network Deployment for the Metaverse Using Deep Reinforcement Learning Jie Zhang et.al. 2503.11449 null
2025-03-14 Adaptive Torque Control of Exoskeletons under Spasticity Conditions via Reinforcement Learning Andrés Chavarrías et.al. 2503.11433 null
2025-03-14 TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation Hongxiang Zhao et.al. 2503.11423 null
2025-03-14 Reinforcement Learning-Based Controlled Switching Approach for Inrush Current Minimization in Power Transformers Jone Ugarte Valdivielso et.al. 2503.11398 null
2025-03-14 Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model Moritz A. Zanger et.al. 2503.11339 null
2025-03-13 NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models Mert Albaba et.al. 2503.10626 null
2025-03-13 R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Yi Yang et.al. 2503.10615 link
2025-03-13 The Lagrangian Method for Solving Constrained Markov Games Soham Das et.al. 2503.10561 null
2025-03-13 Towards Safe Path Tracking Using the Simplex Architecture Georg Jäger et.al. 2503.10559 null
2025-03-13 SySLLM: Generating Synthesized Policy Summaries for Reinforcement Learning Agents Using Large Language Models Sahar Admoni et.al. 2503.10509 null
2025-03-13 Learning Robotic Policy with Imagined Transition: Mitigating the Trade-off between Robustness and Optimality Wei Xiao et.al. 2503.10484 null
2025-03-13 SortingEnv: An Extendable RL-Environment for an Industrial Sorting Process Tom Maus et.al. 2503.10466 null
2025-03-13 Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Liang Wen et.al. 2503.10460 link
2025-03-13 Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback Derun Li et.al. 2503.10434 null
2025-03-13 Towards Constraint-Based Adaptive Hypergraph Learning for Solving Vehicle Routing: An End-to-End Solution Zhenwei Wang et.al. 2503.10421 null
2025-03-12 Strategyproof Reinforcement Learning from Human Feedback Thomas Kleine Buening et.al. 2503.09561 null
2025-03-12 Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Bowen Jin et.al. 2503.09516 link
2025-03-12 RESTRAIN: Reinforcement Learning-Based Secure Framework for Trigger-Action IoT Environment Md Morshed Alam et.al. 2503.09513 null
2025-03-12 Reinforcement Learning is all You Need Yongsheng Lian et.al. 2503.09512 null
2025-03-12 ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning Ziyu Wan et.al. 2503.09501 link
2025-03-12 Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data Traffic Kexuan Wang et.al. 2503.09391 null
2025-03-12 Evaluating Reinforcement Learning Safety and Trustworthiness in Cyber-Physical Systems Katherine Dearstyne et.al. 2503.09388 null
2025-03-12 Rule-Guided Reinforcement Learning Policy Evaluation and Improvement Martin Tappler et.al. 2503.09270 null
2025-03-12 Large-scale Regional Traffic Signal Control Based on Single-Agent Reinforcement Learning Qiang Li et.al. 2503.09252 null
2025-03-12 MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics Shuguang Chu et.al. 2503.09203 null
2025-03-11 MoE-Loco: Mixture of Experts for Multitask Locomotion Runhan Huang et.al. 2503.08564 null
2025-03-11 Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies Chen Xu et.al. 2503.08558 null
2025-03-11 TLA: Tactile-Language-Action Model for Contact-Rich Manipulation Peng Hao et.al. 2503.08548 null
2025-03-11 GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training Tong Wei et.al. 2503.08525 null
2025-03-11 Hierarchical Multi Agent DRL for Soft Handovers Between Edge Clouds in Open RAN F. Giarrè et.al. 2503.08493 null
2025-03-11 Hybrid Deep Reinforcement Learning for Radio Tracer Localisation in Robotic-assisted Radioguided Surgery Hanyi Zhang et.al. 2503.08492 null
2025-03-12 An Autonomous RL Agent Methodology for Dynamic Web UI Testing in a BDD Framework Ali Hassaan Mughal et.al. 2503.08464 null
2025-03-11 V-Max: Making RL practical for Autonomous Driving Valentin Charraut et.al. 2503.08388 link
2025-03-11 Gait in Eight: Efficient On-Robot Learning for Omnidirectional Quadruped Locomotion Nico Bohlinger et.al. 2503.08375 null
2025-03-11 LiPS: Large-Scale Humanoid Robot Reinforcement Learning with Parallel-Series Structures Qiang Zhang et.al. 2503.08349 null
2025-03-10 Is a Good Foundation Necessary for Efficient Reinforcement Learning? The Computational Role of the Base Model in Exploration Dylan J. Foster et.al. 2503.07453 null
2025-03-10 DRESS: Diffusion Reasoning-based Reward Shaping Scheme For Intelligent Networks Feiran You et.al. 2503.07433 null
2025-03-10 The Interplay of AI-and-RAN: Dynamic Resource Allocation for Converged 6G Platform Syed Danial Ali Shah et.al. 2503.07420 null
2025-03-10 Cost-Effective Design of Grid-tied Community Microgrid Moslem Uddin et.al. 2503.07414 null
2025-03-10 PER-DPP Sampling Framework and Its Application in Path Planning Junzhe Wang et.al. 2503.07411 null
2025-03-10 Towards Safe Robot Foundation Models Maximilian Tölle et.al. 2503.07404 null
2025-03-10 Q-MARL: A quantum-inspired algorithm using neural message passing for large-scale multi-agent reinforcement learning Kha Vo et.al. 2503.07397 null
2025-03-10 AttentionSwarm: Reinforcement Learning with Attention Control Barier Function for Crazyflie Drones in Dynamic Environments Grik Tadevosyan et.al. 2503.07376 null
2025-03-10 MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Fanqing Meng et.al. 2503.07365 link
2025-03-10 Artificial Utopia: Simulation and Intelligent Agents for a Democratised Future Yannick Oswald et.al. 2503.07364 null
2025-03-07 Multi-Fidelity Policy Gradient Algorithms Xinjie Liu et.al. 2503.05696 null
2025-03-07 dARt Vinci: Egocentric Data Collection for Surgical Robot Learning at Scale Yihao Liu et.al. 2503.05646 null
2025-03-07 R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning Huatong Song et.al. 2503.05592 null
2025-03-07 InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model Feeza Khan Khanzada et.al. 2503.05573 null
2025-03-07 Tractable Representations for Convergent Approximation of Distributional HJB Equations Julie Alhosh et.al. 2503.05563 null
2025-03-07 Impoola: The Power of Average Pooling for Image-Based Deep Reinforcement Learning Raphael Trumpp et.al. 2503.05546 null
2025-03-07 RiLoCo: An ISAC-oriented AI Solution to Build RIS-empowered Networks Guillermo Encinas-Lago et.al. 2503.05480 null
2025-03-07 Controllable Complementarity: Subjective Preferences in Human-AI Collaboration Chase McDonald et.al. 2503.05455 null
2025-03-07 R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning Jiaxing Zhao et.al. 2503.05379 null
2025-03-07 Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning Hyungkyu Kang et.al. 2503.05306 null
2025-03-06 Sample-Optimal Agnostic Boosting with Unlabeled Data Udaya Ghai et.al. 2503.04706 null
2025-03-06 L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning Pranjal Aggarwal et.al. 2503.04697 null
2025-03-06 Multi-Agent Inverse Q-Learning from Demonstrations Nathaniel Haynam et.al. 2503.04679 null
2025-03-06 Learning Generalizable Language-Conditioned Cloth Manipulation from Long Demonstrations Hanyi Zhao et.al. 2503.04557 null
2025-03-06 PALo: Learning Posture-Aware Locomotion for Quadruped Robots Xiangyu Miao et.al. 2503.04462 null
2025-03-06 AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services Xiaoqi Wang et.al. 2503.04418 null
2025-03-06 Learning Transformer-based World Models with Contrastive Predictive Coding Maxime Burchi et.al. 2503.04416 null
2025-03-06 Energy-Aware Task Offloading for Rotatable STAR-RIS-Enhanced Mobile Edge Computing Systems Dongdong Yang et.al. 2503.04397 null
2025-03-06 Delay-Aware Digital Twin Synchronization in Mobile Edge Networks with Semantic Communications Bin Li et.al. 2503.04387 null
2025-03-06 Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models Niccolò Turcato et.al. 2503.04280 null
2025-03-05 Curating Demonstrations using Online Experience Annie S. Chen et.al. 2503.03707 null
2025-03-05 A Generative Approach to High Fidelity 3D Reconstruction from Text Data Venkat Kumar R et.al. 2503.03664 null
2025-03-05 Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns Dong Tian et.al. 2503.03660 null
2025-03-05 Improving Neutral Point of View Text Generation through Parameter-Efficient Reinforcement Learning and a Small-Scale High-Quality Dataset Jessica Hoffmann et.al. 2503.03654 null
2025-03-05 Olympus: A Jumping Quadruped for Planetary Exploration Utilizing Reinforcement Learning for In-Flight Attitude Control Jørgen Anker Olsen et.al. 2503.03574 null
2025-03-05 Probabilistic Insights for Efficient Exploration Strategies in Reinforcement Learning Ernesto Garcia et.al. 2503.03565 null
2025-03-05 DO-IQS: Dynamics-Aware Offline Inverse Q-Learning for Optimal Stopping with Unknown Gain Functions Anna Kuchko et.al. 2503.03515 null
2025-03-05 SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning Borong Zhang et.al. 2503.03480 null
2025-03-05 Continuous Control of Diverse Skills in Quadruped Robots Without Complete Expert Datasets Jiaxin Tu et.al. 2503.03476 null
2025-03-05 Navigating Intelligence: A Survey of Google OR-Tools and Machine Learning for Global Path Planning in Autonomous Vehicles Alexandre Benoit et.al. 2503.03338 null
2025-03-04 Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation Han Xue et.al. 2503.02881 null
2025-03-04 AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation Songming Zhang et.al. 2503.02832 null
2025-03-04 Meta-Learning to Explore via Memory Density Feedback Kevin L. McKee et.al. 2503.02831 null
2025-03-04 Quantitative Resilience Modeling for Autonomous Cyber Defense Xavier Cadet et.al. 2503.02780 null
2025-03-04 Variable-Friction In-Hand Manipulation for Arbitrary Objects via Diffusion-Based Imitation Learning Qiyang Yan et.al. 2503.02738 null
2025-03-04 Learning-Based Passive Fault-Tolerant Control of a Quadrotor with Rotor Failure Jiehao Chen et.al. 2503.02649 null
2025-03-04 Human-aligned Safe Reinforcement Learning for Highway On-Ramp Merging in Dense Traffic Yang Li et.al. 2503.02624 null
2025-03-04 Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models Paul Stangel et.al. 2503.02623 null
2025-03-04 Reinforcement Learning-based Threat Assessment Wuzhou Sun et.al. 2503.02612 null
2025-03-04 What Makes a Model Breathe? Understanding Reinforcement Learning Reward Function Design in Biomechanical User Simulation Hannah Selder et.al. 2503.02571 null
2025-02-28 LLM Post-Training: A Deep Dive into Reasoning Large Language Models Komal Kumar et.al. 2502.21321 null
2025-02-28 ReaLJam: Real-Time Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers Alexander Scarlatos et.al. 2502.21267 null
2025-02-28 ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs Hao Ge et.al. 2502.21231 null
2025-02-28 A Method of Selective Attention for Reservoir Based Agents Kevin McKee et.al. 2502.21229 null
2025-02-28 Reducing Reward Dependence in RL Through Adaptive Confidence Discounting Muhammed Yusuf Satici et.al. 2502.21181 null
2025-02-28 Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning Léopold Maytié et.al. 2502.21142 null
2025-02-28 Dynamically Local-Enhancement Planner for Large-Scale Autonomous Driving Nanshan Deng et.al. 2502.21134 null
2025-02-28 AuthSim: Towards Authentic and Effective Safety-critical Scenario Generation for Autonomous Driving Tests Yukuan Yang et.al. 2502.21100 null
2025-02-28 Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control Taeho Lee et.al. 2502.21057 null
2025-02-28 Motion ReTouch: Motion Modification Using Four-Channel Bilateral Control Koki Inami et.al. 2502.20982 null
2025-02-27 Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids Toru Lin et.al. 2502.20396 null
2025-02-27 Multi-Turn Code Generation Through Single-Step Rewards Arnav Kumar Jain et.al. 2502.20380 null
2025-02-27 The Role of Tactile Sensing for Learning Reach and Grasp Boya Zhang et.al. 2502.20367 null
2025-02-27 Improving the Efficiency of a Deep Reinforcement Learning-Based Power Management System for HPC Clusters Using Curriculum Learning Thomas Budiarjo et.al. 2502.20348 null
2025-02-27 Safety Representations for Safer Policy Learning Kaustubh Mani et.al. 2502.20341 null
2025-02-27 Deep Reinforcement Learning based Autonomous Decision-Making for Cooperative UAVs: A Search and Rescue Real World Application Thomas Hickling et.al. 2502.20326 null
2025-02-27 On the Importance of Reward Design in Reinforcement Learning-based Dynamic Algorithm Configuration: A Case Study on OneMax with (1+($λ$,$λ$))-GA Tai Nguyen et.al. 2502.20265 null
2025-02-27 Explainable physics-based constraints on reinforcement learning for accelerator controls Jonathan Colen et.al. 2502.20247 null
2025-02-27 MARVEL: Multi-Agent Reinforcement Learning for constrained field-of-View multi-robot Exploration in Large-scale environments Jimmy Chiun et.al. 2502.20217 null
2025-02-27 Highly Parallelized Reinforcement Learning Training with Relaxed Assignment Dependencies Zhouyu He et.al. 2502.20190 null
2025-02-26 Recurrent Auto-Encoders for Enhanced Deep Reinforcement Learning in Wilderness Search and Rescue Planning Jan-Hendrik Ewers et.al. 2502.19356 null
2025-02-26 Hybrid Robot Learning for Automatic Robot Motion Planning in Manufacturing Siddharth Singh et.al. 2502.19340 null
2025-02-26 WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies William Solow et.al. 2502.19308 null
2025-02-26 Combining Planning and Reinforcement Learning for Solving Relational Multiagent Domains Nikhilesh Prabhakar et.al. 2502.19297 null
2025-02-26 Deep Computerized Adaptive Testing Jiguang Li et.al. 2502.19275 null
2025-02-26 Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective Jiawei Huang et.al. 2502.19255 null
2025-02-26 ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration Minjie Zhu et.al. 2502.19250 null
2025-02-26 Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time Jiazheng Li et.al. 2502.19230 null
2025-02-26 When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning Yijiang River Dong et.al. 2502.19158 null
2025-02-26 Policy Testing with MDPFuzz (Replicability Study) Quentin Mazouni et.al. 2502.19116 null
2025-02-25 SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Yuxiang Wei et.al. 2502.18449 null
2025-02-25 MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning Chanwoo Park et.al. 2502.18439 null
2025-02-25 Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand Fengshuo Bai et.al. 2502.18423 null
2025-02-25 Enhancing Reusability of Learned Skills for Robot Manipulation via Gaze and Bottleneck Ryo Takizawa et.al. 2502.18121 null
2025-02-25 Controlling dynamics of stochastic systems with deep reinforcement learning Ruslan Mukhamadiarov et.al. 2502.18111 null
2025-02-25 From planning to policy: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation Haewon Jung et.al. 2502.18015 null
2025-02-25 NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms Yashan Wang et.al. 2502.18008 null
2025-02-25 Provable Performance Bounds for Digital Twin-driven Deep Reinforcement Learning in Wireless Networks: A Novel Digital-Twin Bisimulation Metric Zhenyu Tao et.al. 2502.17983 null
2025-02-25 FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real Weiheng Liu et.al. 2502.17894 null
2025-02-25 Sample-efficient diffusion-based control of complex nonlinear systems Hongyi Chen et.al. 2502.17893 null
2025-02-24 Event-Based Limit Order Book Simulation under a Neural Hawkes Process: Application in Market-Making Luca Lalor et.al. 2502.17417 null
2025-02-24 Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models Alon Albalak et.al. 2502.17387 link
2025-02-24 Distributed Coordination for Heterogeneous Non-Terrestrial Networks Jikang Deng et.al. 2502.17366 null
2025-02-24 TDMPBC: Self-Imitative Reinforcement Learning for Humanoid Robot Control Zifeng Zhuang et.al. 2502.17322 null
2025-02-24 Survey on Strategic Mining in Blockchain: A Reinforcement Learning Approach Jichen Li et.al. 2502.17307 null
2025-02-24 A Reinforcement Learning Approach to Non-prehensile Manipulation through Sliding Hamidreza Raei et.al. 2502.17221 null
2025-02-24 Humanoid Whole-Body Locomotion on Narrow Terrain via Dynamic Balance and Reinforcement Learning Weiji Xie et.al. 2502.17219 null
2025-02-24 Teleology-Driven Affective Computing: A Causal Framework for Sustained Well-Being Bin Yin et.al. 2502.17172 null
2025-02-24 A Novel Multiple Access Scheme for Heterogeneous Wireless Communications using Symmetry-aware Continual Deep Reinforcement Learning Hamidreza Mazandarani et.al. 2502.17167 null
2025-02-24 MA2RL: Masked Autoencoders for Generalizable Multi-Agent Reinforcement Learning Jinyuan Feng et.al. 2502.17046 null
2025-02-21 BOSS: Benchmark for Observation Space Shift in Long-Horizon Task Yue Yang et.al. 2502.15679 null
2025-02-21 VaViM and VaVAM: Autonomous Driving through Video Generative Modeling Florent Bartoccioni et.al. 2502.15672 link
2025-02-21 Automating Curriculum Learning for Reinforcement Learning using a Skill-Based Bayesian Network Vincent Hsiao et.al. 2502.15662 null
2025-02-21 A Simulation Pipeline to Facilitate Real-World Robotic Reinforcement Learning Applications Jefferson Silveira et.al. 2502.15649 null
2025-02-21 Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach Xiangtong Yao et.al. 2502.15613 null
2025-02-21 SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning Xuyang Li et.al. 2502.15512 null
2025-02-21 Learning Long-Horizon Robot Manipulation Skills via Privileged Action Xiaofeng Mao et.al. 2502.15442 null
2025-02-21 TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning Giuseppe Paolo et.al. 2502.15425 null
2025-02-21 Hyperspherical Normalization for Scalable Deep Reinforcement Learning Hojoon Lee et.al. 2502.15280 null
2025-02-21 CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models Shunchang Liu et.al. 2502.15278 null
2025-02-20 Generating $π$-Functional Molecules Using STGG+ with Active Learning Alexia Jolicoeur-Martineau et.al. 2502.14842 link
2025-02-20 Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models Vlad Sobal et.al. 2502.14819 null
2025-02-20 Making Universal Policies Universal Niklas Höpner et.al. 2502.14777 null
2025-02-20 Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Tian Xie et.al. 2502.14768 link
2025-02-20 Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse Michael Doherty et.al. 2502.14741 null
2025-02-20 Length-Controlled Margin-Based Preference Optimization without Reference Model Gengxu Li et.al. 2502.14643 null
2025-02-20 Curiosity Driven Multi-agent Reinforcement Learning for 3D Game Testing Raihana Ferdous et.al. 2502.14606 null
2025-02-20 ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification Hyunseok Lee et.al. 2502.14565 link
2025-02-20 MLGym: A New Framework and Benchmark for Advancing AI Research Agents Deepak Nathani et.al. 2502.14499 link
2025-02-20 Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization Zhitao He et.al. 2502.14496 link
2025-02-19 A Training-Free Framework for Precise Mobile Manipulation of Small Everyday Objects Arjun Gupta et.al. 2502.13964 null
2025-02-19 Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks Guilherme Palma et.al. 2502.13918 null
2025-02-19 Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning Antoine Moulin et.al. 2502.13900 null
2025-02-19 NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants Yiran Qin et.al. 2502.13894 null
2025-02-19 Uncertainty quantification for Markov chains with application to temporal difference learning Weichen Wu et.al. 2502.13822 null
2025-02-19 Learning to explore when mistakes are not allowed Charly Pecqueux-Guézénec et.al. 2502.13801 null
2025-02-19 User Agency and System Automation in Interactive Intelligent Systems Thomas Langerak et.al. 2502.13779 null
2025-02-19 Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values Hongbo Zhang et.al. 2502.13723 null
2025-02-19 Hierarchical RL-MPC for Demand Response Scheduling Maximilian Bloor et.al. 2502.13714 null
2025-02-19 User Association and Coordinated Beamforming in Cognitive Aerial-Terrestrial Networks: A Safe Reinforcement Learning Approach Zizhen Zhou et.al. 2502.13663 null
2025-02-18 Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization Shuo Xing et.al. 2502.13146 link
2025-02-18 RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning Hao Gao et.al. 2502.13144 link
2025-02-18 Theorem Prover as a Judge for Synthetic Data Generation Joshua Ong Jun Leang et.al. 2502.13137 null
2025-02-18 Text2World: Benchmarking Large Language Models for Symbolic World Model Generation Mengkang Hu et.al. 2502.13092 link
2025-02-18 Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation Sha Li et.al. 2502.13019 null
2025-02-18 HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit Qingwei Ben et.al. 2502.13013 link
2025-02-18 Integrating Reinforcement Learning, Action Model Learning, and Numeric Planning for Tackling Complex Tasks Yarin Benyamin et.al. 2502.13006 link
2025-02-18 Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options Lakshmi Nair et.al. 2502.12929 link
2025-02-18 Continuous Learning Conversational AI: A Personalized Agent Framework via A2C Reinforcement Learning Nandakishor M et.al. 2502.12876 null
2025-02-18 A Survey on DRL based UAV Communications and Networking: DRL Fundamentals, Applications and Implementations Wei Zhao et.al. 2502.12875 null
2025-02-17 Scaling Test-Time Compute Without Verification or RL is Suboptimal Amrith Setlur et.al. 2502.12118 null
2025-02-17 Unhackable Temporal Rewarding for Scalable Video MLLMs En Yu et.al. 2502.12081 link
2025-02-17 How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines Ayan Sengupta et.al. 2502.12051 null
2025-02-17 Theoretical Barriers in Bellman-Based Reinforcement Learning Brieuc Pinon et.al. 2502.11968 null
2025-02-17 Massively Scaling Explicit Policy-conditioned Value Functions Nico Bohlinger et.al. 2502.11949 null
2025-02-17 FitLight: Federated Imitation Learning for Plug-and-Play Autonomous Traffic Signal Control Yutong Ye et.al. 2502.11937 null
2025-02-17 VLP: Vision-Language Preference Learning for Embodied Manipulation Runze Liu et.al. 2502.11918 null
2025-02-17 CAMEL: Continuous Action Masking Enabled by Large Language Models for Reinforcement Learning Yanxiao Zhao et.al. 2502.11896 null
2025-02-17 Does Knowledge About Perceptual Uncertainty Help an Agent in Automated Driving? Natalie Grabowsky et.al. 2502.11864 null
2025-02-17 Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces Eric Eaton et.al. 2502.11828 null
2025-02-14 BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds Huayi Wang et.al. 2502.10363 null
2025-02-14 Reinforcement Learning in Strategy-Based and Atari Games: A Review of Google DeepMinds Innovations Abdelrhman Shaheen et.al. 2502.10303 null
2025-02-14 Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding Laurin Luttmann et.al. 2502.10233 null
2025-02-14 Dynamic Reinforcement Learning for Actors Katsunari Shibata et.al. 2502.10200 null
2025-02-14 Reinforcement Learning based Constrained Optimal Control: an Interpretable Reward Design Jingjie Ni et.al. 2502.10187 null
2025-02-14 Combinatorial Reinforcement Learning with Preference Feedback Joongkyu Lee et.al. 2502.10158 null
2025-02-14 MonoForce: Learnable Image-conditioned Physics Engine Ruslan Agishev et.al. 2502.10156 null
2025-02-14 Cooperative Multi-Agent Planning with Adaptive Skill Synthesis Zhiyuan Li et.al. 2502.10148 null
2025-02-14 Provably Efficient RL under Episode-Wise Safety in Linear CMDPs Toshinori Kitamura et.al. 2502.10138 null
2025-02-14 Causal Information Prioritization for Efficient Reinforcement Learning Hongye Cao et.al. 2502.10097 null
2025-02-13 DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References Xueyi Liu et.al. 2502.09614 link
2025-02-13 Coupled Rendezvous and Docking Maneuver control of satellite using Reinforcement learning-based Adaptive Fixed-Time Sliding Mode Controller Rakesh Kumar Sahoo et.al. 2502.09517 null
2025-02-13 Variable Stiffness for Robust Locomotion through Reinforcement Learning Dario Spoljaric et.al. 2502.09436 null
2025-02-13 A Survey of Reinforcement Learning for Optimization in Automation Ahmad Farooq et.al. 2502.09417 null
2025-02-13 Generalizable Reinforcement Learning with Biologically Inspired Hyperdimensional Occupancy Grid Maps for Exploration and Goal-Directed Path Planning Shay Snyder et.al. 2502.09393 null
2025-02-13 Machine learning for modelling unstructured grid data in computational physics: a review Sibo Cheng et.al. 2502.09346 null
2025-02-13 Revisiting Topological Interference Management: A Learning-to-Code on Graphs Perspective Zhiwei Shan et.al. 2502.09344 null
2025-02-13 Convex Is Back: Solving Belief MDPs With Convexity-Informed Deep Reinforcement Learning Daniel Koutas et.al. 2502.09298 null
2025-02-13 Autonomous Task Completion Based on Goal-directed Answer Set Programming Alexis R. Tudor et.al. 2502.09208 null
2025-02-13 Logical Reasoning in Large Language Models: A Survey Hanmeng Liu et.al. 2502.09100 link
2025-02-12 Re$^3$Sim: Generating High-Fidelity Simulation Data via 3D-Photorealistic Real-to-Sim for Robotic Manipulation Xiaoshen Han et.al. 2502.08645 link
2025-02-12 A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards Shivansh Patel et.al. 2502.08643 null
2025-02-12 Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning Dhruv Rohatgi et.al. 2502.08632 null
2025-02-12 Robot Data Curation with Mutual Information Estimators Joey Hejna et.al. 2502.08623 null
2025-02-12 Learning to Group and Grasp Multiple Objects Takahiro Yonemaru et.al. 2502.08452 null
2025-02-12 CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World Yankai Fu et.al. 2502.08449 null
2025-02-12 Acceleration of crystal structure relaxation with Deep Reinforcement Learning Elena Trukhan et.al. 2502.08405 null
2025-02-12 Learning Humanoid Standing-up Control across Diverse Postures Tao Huang et.al. 2502.08378 link
2025-02-12 Towards Principled Multi-Agent Task Agnostic Exploration Riccardo Zamboni et.al. 2502.08365 null
2025-02-12 Deterministic generation of non-classical mechanical states in cavity optomechanics via reinforcement learning Yu-Hong Liu et.al. 2502.08350 null
2025-02-11 Polynomial-Time Approximability of Constrained Reinforcement Learning Jeremy McMahan et.al. 2502.07764 null
2025-02-11 DOGlove: Dexterous Manipulation with a Low-Cost Open-Source Haptic Force Feedback Glove Han Zhang et.al. 2502.07730 null
2025-02-11 Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning Aya Kayal et.al. 2502.07715 null
2025-02-11 A Unifying Framework for Causal Imitation Learning with Hidden Confounders Daqian Shao et.al. 2502.07656 null
2025-02-11 Beyond Behavior Cloning: Robustness through Interactive Imitation and Contrastive Learning Zhaoting Li et.al. 2502.07645 null
2025-02-11 Distributed Value Decomposition Networks with Networked Agents Guilherme S. Varela et.al. 2502.07635 null
2025-02-11 Evolution of cooperation in a bimodal mixture of conditional cooperators Chenyang Zhao et.al. 2502.07537 null
2025-02-11 Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization Daniel Palenicek et.al. 2502.07523 null
2025-02-11 Logarithmic Regret for Online KL-Regularized Reinforcement Learning Heyang Zhao et.al. 2502.07460 null
2025-02-11 Towards a Formal Theory of the Need for Competence via Computational Intrinsic Motivation Erik M. Lintunen et.al. 2502.07423 null
2025-02-10 Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Chengqi Lyu et.al. 2502.06781 link
2025-02-10 On the Emergence of Thinking in LLMs I: Searching for the Right Intuition Guanghao Ye et.al. 2502.06773 link
2025-02-10 ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Ling Yang et.al. 2502.06772 link
2025-02-10 AgilePilot: DRL-Based Drone Agent for Real-Time Motion Planning in Dynamic Environments by Leveraging Object Detection Roohan Ahmed Khan et.al. 2502.06725 null
2025-02-10 Discovery of skill switching criteria for learning agile quadruped locomotion Wanming Yu et.al. 2502.06676 null
2025-02-10 Deep Reinforcement Learning based Triggering Function for Early Classifiers of Time Series Aurélien Renault et.al. 2502.06584 null
2025-02-10 Predictive Red Teaming: Breaking Policies Without Breaking Robots Anirudha Majumdar et.al. 2502.06575 null
2025-02-10 Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning Jean Vassoyan et.al. 2502.06533 link
2025-02-10 Model-Based Offline Reinforcement Learning with Reliability-Guaranteed Sequence Modeling Shenghong He et.al. 2502.06491 null
2025-02-10 SIGMA: Sheaf-Informed Geometric Multi-Agent Pathfinding Shuhao Liao et.al. 2502.06440 null
2025-02-07 DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails Yihe Deng et.al. 2502.05163 link
2025-02-07 Use of Winsome Robots for Understanding Human Feedback (UWU) Jessica Eggers et.al. 2502.05118 null
2025-02-07 3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery Xiuyuan Hu et.al. 2502.05107 link
2025-02-07 Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures Tushar Pandey et.al. 2502.05078 link
2025-02-07 Exploring the Generalizability of Geomagnetic Navigation: A Deep Reinforcement Learning approach with Policy Distillation Wenqi Bai et.al. 2502.05069 null
2025-02-07 Seasonal Station-Keeping of Short Duration High Altitude Balloons using Deep Reinforcement Learning Tristan K. Schuler et.al. 2502.05014 null
2025-02-07 A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning Enhanced Approach Taiyi Wang et.al. 2502.05001 null
2025-02-07 Enhancing Pre-Trained Decision Transformers with Prompt-Tuning Bandits Finn Rietz et.al. 2502.04979 null
2025-02-07 Towards Smarter Sensing: 2D Clutter Mitigation in RL-Driven Cognitive MIMO Radar Adam Umra et.al. 2502.04967 null
2025-02-07 Fast Adaptive Anti-Jamming Channel Access via Deep Q Learning and Coarse-Grained Spectrum Prediction Jianshu Zhang et.al. 2502.04963 null
2025-02-06 DexterityGen: Foundation Controller for Unprecedented Dexterity Zhao-Heng Yin et.al. 2502.04307 null
2025-02-06 PILAF: Optimal Human Preference Sampling for Reward Modeling Yunzhen Feng et.al. 2502.04270 null
2025-02-06 Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning Wesley A. Suttle et.al. 2502.04141 null
2025-02-06 Simulating the Emergence of Differential Case Marking with Communicating Neural-Network Agents Yuchen Lian et.al. 2502.04038 null
2025-02-06 Deep Meta Coordination Graphs for Multi-agent Reinforcement Learning Nikunj Gupta et.al. 2502.04028 link
2025-02-06 Bilevel Multi-Armed Bandit-Based Hierarchical Reinforcement Learning for Interaction-Aware Self-Driving at Unsignalized Intersections Zengqi Peng et.al. 2502.03960 null
2025-02-06 Fairness Aware Reinforcement Learning via Proximal Policy Optimization Gabriele La Malfa et.al. 2502.03953 null
2025-02-06 CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning Yousef Koka et.al. 2502.03946 null
2025-02-06 Mirror Descent Actor Critic via Bounded Advantage Learning Ryo Iwaki et.al. 2502.03854 null
2025-02-06 PAGNet: Pluggable Adaptive Generative Networks for Information Completion in Multi-Agent Communication Zhuohui Zhang et.al. 2502.03845 null
2025-02-05 Deep Reinforcement Learning-Based Optimization of Second-Life Battery Utilization in Electric Vehicles Charging Stations Rouzbeh Haghighi et.al. 2502.03412 null
2025-02-05 Lightweight Authenticated Task Offloading in 6G-Cloud Vehicular Twin Networks Sarah Al-Shareeda et.al. 2502.03403 null
2025-02-05 Energy-Efficient Flying LoRa Gateways: A Multi-Agent Reinforcement Learning Approach Abdullahi Isa Ahmed et.al. 2502.03377 null
2025-02-05 Demystifying Long Chain-of-Thought Reasoning in LLMs Edward Yeo et.al. 2502.03373 link
2025-02-05 Learning from Active Human Involvement through Proxy Value Propagation Zhenghao Peng et.al. 2502.03369 null
2025-02-05 Conditional Prediction by Simulation for Automated Driving Fabian Konstantinidis et.al. 2502.03286 null
2025-02-05 Calibrated Unsupervised Anomaly Detection in Multivariate Time-series using Reinforcement Learning Saba Sanami et.al. 2502.03245 null
2025-02-05 Underwater Soft Fin Flapping Motion with Deep Neural Network Based Surrogate Model Yuya Hamamatsu et.al. 2502.03135 null
2025-02-05 Double Distillation Network for Multi-Agent Reinforcement Learning Yang Zhou et.al. 2502.03125 null
2025-02-05 HiLo: Learning Whole-Body Human-like Locomotion with Motion Tracking Controller Qiyuan Zhang et.al. 2502.03122 null
2025-02-04 Flow Q-Learning Seohong Park et.al. 2502.02538 null
2025-02-04 Brief analysis of DeepSeek R1 and its implications for Generative AI Sarah Mercer et.al. 2502.02523 null
2025-02-04 Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search Maohao Shen et.al. 2502.02508 null
2025-02-04 Towards Fast Graph Generation via Autoregressive Noisy Filtration Modeling Markus Krimmel et.al. 2502.02415 null
2025-02-04 Achieving Hiding and Smart Anti-Jamming Communication: A Parallel DRL Approach against Moving Reactive Jammer Yangyang Li et.al. 2502.02385 null
2025-02-04 Circular Microalgae-Based Carbon Control for Net Zero Federico Zocco et.al. 2502.02382 null
2025-02-04 Coreset-Based Task Selection for Sample-Efficient Meta-Reinforcement Learning Donglin Zhan et.al. 2502.02332 null
2025-02-04 Policy-Guided Causal State Representation for Offline Reinforcement Learning Recommendation Siyu Wang et.al. 2502.02327 null
2025-02-04 DIME: Diffusion-Based Maximum Entropy Reinforcement Learning Onur Celik et.al. 2502.02316 null
2025-02-04 MAGNNET: Multi-Agent Graph Neural Network-based Efficient Task Allocation for Autonomous Vehicles with Deep Reinforcement Learning Lavanya Ratnabala et.al. 2502.02311 null
2025-01-31 Vintix: Action Model via In-Context Reinforcement Learning Andrey Polubarov et.al. 2501.19400 link
2025-01-31 The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking Yuchun Miao et.al. 2501.19358 null
2025-01-31 Jackpot! Alignment as a Maximal Lottery Roberto-Rafael Maura-Rivero et.al. 2501.19266 null
2025-01-31 Objective Metrics for Human-Subjects Evaluation in Explainable Reinforcement Learning Balint Gyevnar et.al. 2501.19256 null
2025-01-31 Linear $Q$-Learning Does Not Diverge: Convergence Rates to a Bounded Set Xinyu Liu et.al. 2501.19254 null
2025-02-03 SHARPIE: A Modular Framework for Reinforcement Learning and Human-AI Interaction Experiments Hüseyin Aydın et.al. 2501.19245 null
2025-01-31 An Empirical Game-Theoretic Analysis of Autonomous Cyber-Defence Agents Gregory Palmer et.al. 2501.19206 null
2025-01-31 APEX: Automated Parameter Exploration for Low-Power Wireless Protocols Mohamed Hassaan M. Hydher et.al. 2501.19194 null
2025-01-31 Test-Time Training Scaling for Chemical Exploration in Drug Design Morgan Thomas et.al. 2501.19153 null
2025-01-31 Decorrelated Soft Actor-Critic for Efficient Deep Reinforcement Learning Burcu Küçükoğlu et.al. 2501.19133 null
2025-01-30 Design and Validation of Learning Aware HMI For Learning-Enabled Increasingly Autonomous Systems Parth Ganeriwala et.al. 2501.18506 null
2025-01-30 Curriculum-based Sample Efficient Reinforcement Learning for Robust Stabilization of a Quadrotor Fausto Mauricio Lagos Suarez et.al. 2501.18490 null
2025-01-30 Model-Free RL Agents Demonstrate System 1-Like Intentionality Hal Ashton et.al. 2501.18299 null
2025-01-30 Neural Operator based Reinforcement Learning for Control of first-order PDEs with Spatially-Varying State Delay Jiaqi Hu et.al. 2501.18201 null
2025-01-30 QNN-QRL: Quantum Neural Network Integrated with Quantum Reinforcement Learning for Quantum Key Distribution Bikash K. Behera et.al. 2501.18188 null
2025-01-30 Investigating Tax Evasion Emergence Using Dual Large Language Model and Deep Reinforcement Learning Powered Agent-based Simulation Teddy Lazebnik et.al. 2501.18177 null
2025-01-30 B3C: A Minimalist Approach to Offline Multi-Agent Reinforcement Learning Woojun Kim et.al. 2501.18138 null
2025-01-30 Diverse Preference Optimization Jack Lanchantin et.al. 2501.18101 null
2025-01-30 Reward Prediction Error Prioritisation in Experience Replay: The RPE-PER Method Hoda Yamani et.al. 2501.18093 null
2025-01-30 DIAL: Distribution-Informed Adaptive Learning of Multi-Task Constraints for Safety-Critical Systems Se-Wook Yoo et.al. 2501.18086 null
2025-01-29 From Sparse to Dense: Toddler-inspired Reward Transition in Goal-Oriented Reinforcement Learning Junseok Park et.al. 2501.17842 null
2025-01-29 Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning Haque Ishfaq et.al. 2501.17827 null
2025-01-29 Consensus Based Stochastic Control Liyao Lyu et.al. 2501.17801 null
2025-01-29 CAMP in the Odyssey: Provably Robust Reinforcement Learning with Certified Radius Maximization Derui Wang et.al. 2501.17667 link
2025-01-29 Accelerated DC loadflow solver for topology optimization Nico Westerbeck et.al. 2501.17529 null
2025-01-29 Human-Aligned Skill Discovery: Balancing Behaviour Exploration and Alignment Maxence Hussonnois et.al. 2501.17431 null
2025-01-29 Certificated Actor-Critic: Hierarchical Reinforcement Learning with Control Barrier Functions for Safe Navigation Junjun Xie et.al. 2501.17424 null
2025-01-29 Value Function Decomposition in Markov Recommendation Process Xiaobei Wang et.al. 2501.17409 null
2025-01-29 A Dual-Agent Adversarial Framework for Robust Generalization in Deep Reinforcement Learning Zhengpeng Xie et.al. 2501.17384 null
2025-01-29 ASAP: Learning Generalizable Online Bin Packing via Adaptive Selection After Pruning Han Fang et.al. 2501.17377 null
2025-01-28 SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Tianzhe Chu et.al. 2501.17161 null
2025-01-28 Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning Rémy Hosseinkhan Boucher et.al. 2501.17115 null
2025-01-28 Unlocking Transparent Alignment Through Enhanced Inverse Constitutional AI for Principle Extraction Carl-Leander Henneking et.al. 2501.17112 null
2025-01-28 COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models Tobias Materzok et.al. 2501.17104 null
2025-01-28 Learning Mean Field Control on Sparse Graphs Christian Fabian et.al. 2501.17079 null
2025-01-28 Induced Modularity and Community Detection for Functionally Interpretable Reinforcement Learning Anna Soligo et.al. 2501.17077 null
2025-01-28 Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies Manojkumar Parmar et.al. 2501.17030 null
2025-01-28 Network Slice-based Low-Altitude Intelligent Network for Advanced Air Mobility Kai Xiong et.al. 2501.17014 null
2025-01-28 Heterogeneity-aware Personalized Federated Learning via Adaptive Dual-Agent Reinforcement Learning Xi Chen et.al. 2501.16966 null
2025-01-28 On Rollouts in Model-Based Reinforcement Learning Bernd Frauenknecht et.al. 2501.16918 link
2025-01-27 Upside Down Reinforcement Learning with Policy Generators Jacopo Di Ventura et.al. 2501.16288 link
2025-01-27 Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach Yang Xu et.al. 2501.16243 null
2025-01-27 Towards General-Purpose Model-Free Reinforcement Learning Scott Fujimoto et.al. 2501.16142 link
2025-01-27 Quantifying the Self-Interest Level of Markov Social Dilemmas Richard Willis et.al. 2501.16138 null
2025-01-27 ReFill: Reinforcement Learning for Fill-In Minimization Elfarouk Harb et.al. 2501.16130 null
2025-01-27 Multi-Agent Meta-Offline Reinforcement Learning for Timely UAV Path Planning and Data Collection Eslam Eldeeb et.al. 2501.16098 null
2025-01-27 Flexible Blood Glucose Control: Offline Reinforcement Learning from Human Feedback Harry Emerson et.al. 2501.15972 null
2025-01-27 REINFORCE-ING Chemical Language Models in Drug Design Morgan Thomas et.al. 2501.15971 null
2025-01-27 Inverse Reinforcement Learning via Convex Optimization Hao Zhu et.al. 2501.15957 null
2025-01-27 Generative AI for Lyapunov Optimization Theory in UAV-based Low-Altitude Economy Networking Zhang Liu et.al. 2501.15928 null
2025-01-24 An Attentive Graph Agent for Topology-Adaptive Cyber Defence Ilya Orson Sandoval et.al. 2501.14700 link
2025-01-24 ACT-JEPA: Joint-Embedding Predictive Architecture Improves Policy Representation Learning Aleksandar Vujinovic et.al. 2501.14622 null
2025-01-24 COMIX: Generalized Conflict Management in O-RAN xApps – Architecture, Workflow, and a Power Control case Anastasios Giannopoulos et.al. 2501.14619 null
2025-01-24 Age and Power Minimization via Meta-Deep Reinforcement Learning in UAV Networks Sankani Sarathchandra et.al. 2501.14603 null
2025-01-24 Reducing Action Space for Deep Reinforcement Learning via Causal Effect Estimation Wenzhang Liu et.al. 2501.14543 link
2025-01-24 Breaking the Pre-Planning Barrier: Real-Time Adaptive Coordination of Mission and Charging UAVs Using Graph Reinforcement Learning Yuhan Hu et.al. 2501.14488 null
2025-01-24 MARL-OT: Multi-Agent Reinforcement Learning Guided Online Fuzzing to Detect Safety Violation in Autonomous Driving Systems Linfeng Liang et.al. 2501.14451 null
2025-01-24 Learning more with the same effort: how randomization improves the robustness of a robotic deep reinforcement learning agent Lucía Güitta-López et.al. 2501.14443 null
2025-01-24 SKIL: Semantic Keypoint Imitation Learning for Generalizable Data-efficient Manipulation Shengjie Wang et.al. 2501.14400 null
2025-01-24 Reinforcement Learning for Efficient Returns Management Pascal Linden et.al. 2501.14394 null
2025-01-23 CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation Guofeng Cui et.al. 2501.13927 null
2025-01-23 Improving Video Generation with Human Feedback Jie Liu et.al. 2501.13918 link
2025-01-23 GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration Yue Fan et.al. 2501.13896 null
2025-01-23 Utilizing Evolution Strategies to Train Transformers in Reinforcement Learning Matyáš Lorenc et.al. 2501.13883 link
2025-01-23 A space-decoupling framework for optimization on bounded-rank matrices with orthogonally invariant constraints Yan Yang et.al. 2501.13830 null
2025-01-23 Large Language Model driven Policy Exploration for Recommender Systems Jie Wang et.al. 2501.13816 null
2025-01-23 Integrating Causality with Neurochaos Learning: Proposed Approach and Research Agenda Nanjangud C. Narendra et.al. 2501.13763 null
2025-01-23 Scalable Safe Multi-Agent Reinforcement Learning for Multi-Agent System Haikuo Du et.al. 2501.13727 null
2025-01-23 WFCRL: A Multi-Agent Reinforcement Learning Benchmark for Wind Farm Control Claire Bizon Monroc et.al. 2501.13592 link
2025-01-23 Explainable AI-aided Feature Selection and Model Reduction for DRL-based V2X Resource Allocation Nasir Khan et.al. 2501.13552 null
2025-01-22 Which Sensor to Observe? Timely Tracking of a Joint Markov Source with Model Predictive Control Ismail Cosandal et.al. 2501.13099 null
2025-01-22 Attention-Driven Hierarchical Reinforcement Learning with Particle Filtering for Source Localization in Dynamic Fields Yiwei Shi et.al. 2501.13084 null
2025-01-22 Evolution and The Knightian Blindspot of Machine Learning Joel Lehman et.al. 2501.13075 null
2025-01-22 AdaWM: Adaptive World Model based Planning for Autonomous Driving Hang Wang et.al. 2501.13072 null
2025-01-22 Optimizing Return Distributions with Distributional Dynamic Programming Bernardo Ávila Pires et.al. 2501.13028 null
2025-01-22 MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking Sebastian Farquhar et.al. 2501.13011 null
2025-01-22 An Offline Multi-Agent Reinforcement Learning Framework for Radio Resource Management Eslam Eldeeb et.al. 2501.12991 null
2025-01-22 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning DeepSeek-AI et.al. 2501.12948 link
2025-01-22 Offline Critic-Guided Diffusion Policy for Multi-User Delay-Constrained Scheduling Zhuoran Li et.al. 2501.12942 null
2025-01-22 Reinforcement learning Based Automated Design of Differential Evolution Algorithm for Black-box Optimization Xu Yang et.al. 2501.12881 null
2025-01-21 InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Yuhang Zang et.al. 2501.12368 link
2025-01-21 ARM-IRL: Adaptive Resilience Metric Quantification Using Inverse Reinforcement Learning Abhijeet Sahu et.al. 2501.12362 null
2025-01-21 Sum Rate Enhancement using Machine Learning for Semi-Self Sensing Hybrid RIS-Enabled ISAC in THz Bands Sara Farrag Mobarak et.al. 2501.12353 null
2025-01-21 Towards neural reinforcement learning for large deviations in nonequilibrium systems with memory Venkata D. Pamulaparthy et.al. 2501.12333 null
2025-01-21 Heuristic Deep Reinforcement Learning for Phase Shift Optimization in RIS-assisted Secure Satellite Communication Systems with RSMA Tingnan Bao et.al. 2501.12311 null
2025-01-21 RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression Uri Gadot et.al. 2501.12216 null
2025-01-21 Experience-replay Innovative Dynamics Tuo Zhang et.al. 2501.12199 null
2025-01-21 Extend Adversarial Policy Against Neural Machine Translation via Unknown Token Wei Zou et.al. 2501.12183 null
2025-01-21 DNRSelect: Active Best View Selection for Deferred Neural Rendering Dongli Wu et.al. 2501.12150 null
2025-01-21 Tackling Uncertainties in Multi-Agent Reinforcement Learning through Integration of Agent Termination Dynamics Somnath Hazra et.al. 2501.12061 link
2025-01-17 DexForce: Extracting Force-informed Actions from Kinesthetic Demonstrations for Dexterous Manipulation Claire Chen et.al. 2501.10356 null
2025-01-17 Enhancing AI Transparency: XRL-Based Resource Management and RAN Slicing for 6G ORAN Architecture Suvidha Mhatre et.al. 2501.10292 null
2025-01-17 Enhancing UAV Path Planning Efficiency Through Accelerated Learning Joseanne Viana et.al. 2501.10141 null
2025-01-17 Spatio-temporal Graph Learning on Adaptive Mined Key Frames for High-performance Multi-Object Tracking Futian Wang et.al. 2501.10129 null
2025-01-17 PaSa: An LLM Agent for Comprehensive Academic Paper Search Yichen He et.al. 2501.10120 link
2025-01-17 GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning Zifeng Shi et.al. 2501.10116 null
2025-01-17 Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics Chenhao Li et.al. 2501.10100 null
2025-01-17 ForestProtector: An IoT Architecture Integrating Machine Vision and Deep Reinforcement Learning for Efficient Wildfire Monitoring Kenneth Bonilla-Ormachea et.al. 2501.09926 null
2025-01-17 SLIM: Sim-to-Real Legged Instructive Manipulation via Long-Horizon Visuomotor Learning Haichao Zhang et.al. 2501.09905 null
2025-01-16 From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation Peilang Li et.al. 2501.09858 null
2025-01-16 Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Fengli Xu et.al. 2501.09686 null
2025-01-16 Optimizing hypergraph product codes with random walks, simulated annealing and reinforcement learning Bruno C. A. Freire et.al. 2501.09622 null
2025-01-16 Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment Chaoqi Wang et.al. 2501.09620 null
2025-01-16 EVaDE: Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning Siddharth Aravindan et.al. 2501.09611 null
2025-01-16 RE-POSE: Synergizing Reinforcement Learning-Based Partitioning and Offloading for Edge Object Detection Jianrui Shi et.al. 2501.09465 null
2025-01-16 ADAGE: A generic two-layer framework for adaptive agent based modelling Benjamin Patrick Evans et.al. 2501.09429 null
2025-01-16 Fast Searching of Extreme Operating Conditions for Relay Protection Setting Calculation Based on Graph Neural Network and Reinforcement Learning Yan Li et.al. 2501.09399 null
2025-01-16 Contract-Inspired Contest Theory for Controllable Image Generation in Mobile Edge Metaverse Guangyuan Liu et.al. 2501.09391 null
2025-01-16 Adaptive Contextual Caching for Mobile Edge Large Language Model Service Guangyuan Liu et.al. 2501.09383 null
2025-01-16 Solving Infinite-Player Games with Player-to-Strategy Networks Carlos Martin et.al. 2501.09330 null
2025-01-15 Computing Approximated Fixpoints via Dampened Mann Iteration Paolo Baldan et.al. 2501.08950 null
2025-01-15 A Reinforcement Learning Approach to Quiet and Safe UAM Traffic Management Surya Murthy et.al. 2501.08941 null
2025-01-15 Reinforcement learning-based adaptive time-integration for nonsmooth dynamics David Riley et.al. 2501.08934 null
2025-01-15 Projection Implicit Q-Learning with Support Constraint for Offline Reinforcement Learning Xinchen Han et.al. 2501.08907 null
2025-01-15 Deep Learning Meets Queue-Reactive: A Framework for Realistic Limit Order Book Simulation Hamza Bodor et.al. 2501.08822 null
2025-01-15 Multi-visual modality micro drone-based structural damage detection Isaac Osei Agyemanga et.al. 2501.08807 null
2025-01-15 Networked Agents in the Dark: Team Value Learning under Partial Observability Guilherme S. Varela et.al. 2501.08778 null
2025-01-15 SPEQ: Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning Carlo Romeo et.al. 2501.08669 null
2025-01-15 Application of Deep Reinforcement Learning to UAV Swarming for Ground Surveillance Raúl Arranz et.al. 2501.08655 null
2025-01-15 RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation Kaiqu Liang et.al. 2501.08617 null
2025-01-14 FDPP: Fine-tune Diffusion Policy with Human Preference Yuxin Chen et.al. 2501.08259 null
2025-01-14 Dynamic Pricing in High-Speed Railways Using Multi-Agent Reinforcement Learning Enrique Adrian Villarrubia-Martin et.al. 2501.08234 null
2025-01-14 Optimization of Link Configuration for Satellite Communication Using Reinforcement Learning Tobias Rohe et.al. 2501.08220 null
2025-01-14 In-situ graph reasoning and knowledge expansion using Graph-PReFLexOR Markus J. Buehler et.al. 2501.08120 null
2025-01-14 Data-driven inventory management for new products: A warm-start and adjusted Dyna-$Q$ approach Xinyu Qu et.al. 2501.08109 null
2025-01-14 Hybrid Action Based Reinforcement Learning for Multi-Objective Compatible Autonomous Driving Guizhe Jin et.al. 2501.08096 null
2025-01-14 CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning Guoliang He et.al. 2501.08071 null
2025-01-14 Continual Reinforcement Learning for Digital Twin Synchronization Optimization Haonan Tong et.al. 2501.08045 null
2025-01-14 READ: Reinforcement-based Adversarial Learning for Text Classification with Limited Labeled Data Rohit Sharma et.al. 2501.08035 null
2025-01-14 Cooperative Patrol Routing: Optimizing Urban Crime Surveillance through Multi-Agent Reinforcement Learning Juan Palma-Borda et.al. 2501.08020 null
2025-01-13 SafeSwarm: Decentralized Safe RL for the Swarm of Drones Landing in Dense Crowds Grik Tadevosyan et.al. 2501.07566 null
2025-01-13 Improving DeFi Accessibility through Efficient Liquidity Provisioning with Deep Reinforcement Learning Haonan Xu et.al. 2501.07508 null
2025-01-13 RbRL2.0: Integrated Reward and Policy Learning for Rating-based Reinforcement Learning Mingkang Wu et.al. 2501.07502 null
2025-01-13 Online inductive learning from answer sets for efficient reinforcement learning exploration Celeste Veronese et.al. 2501.07445 null
2025-01-13 Attention when you need Lokesh Boominathan et.al. 2501.07440 null
2025-01-13 Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data Shilong Deng et.al. 2501.07346 link
2025-01-13 Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring Buse Sibel Korkmaz et.al. 2501.07324 link
2025-01-13 Mining Intraday Risk Factor Collections via Hierarchical Reinforcement Learning based on Transferred Options Wenyan Xu et.al. 2501.07274 null
2025-01-13 Future-Conditioned Recommendations with Multi-Objective Controllable Decision Transformer Chongming Gao et.al. 2501.07212 null
2025-01-13 Generalizable Graph Neural Networks for Robust Power Grid Topology Control Matthijs de Jong et.al. 2501.07186 null
2025-01-10 From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training Julius Berner et.al. 2501.06148 link
2025-01-10 Vehicle-in-Virtual-Environment (VVE) Based Autonomous Driving Function Development and Evaluation Methodology for Vulnerable Road User Safety Haochong Chen et.al. 2501.06113 null
2025-01-10 Learning Flexible Heterogeneous Coordination with Capability-Aware Shared Hypernetworks Kevin Fu et.al. 2501.06058 null
2025-01-10 Investigating the Impact of Observation Space Design Choices On Training Reinforcement Learning Solutions for Spacecraft Problems Nathaniel Hamilton et.al. 2501.06016 null
2025-01-10 The Safe Trusted Autonomy for Responsible Space Program Kerianne L. Hobbs et.al. 2501.05984 null
2025-01-10 A Practical Demonstration of DRL-Based Dynamic Resource Allocation xApp Using OpenAirInterface Onur Sever et.al. 2501.05879 null
2025-01-10 Diffusion Models for Smarter UAVs: Decision-Making and Modeling Yousef Emami et.al. 2501.05819 null
2025-01-10 Real-Time Integrated Dispatching and Idle Fleet Steering with Deep Reinforcement Learning for A Meal Delivery Platform Jingyi Cheng et.al. 2501.05808 null
2025-01-10 Understanding Impact of Human Feedback via Influence Functions Taywon Min et.al. 2501.05790 link
2025-01-09 Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning Tao Liu et.al. 2501.05591 null
2025-01-09 TimeRL: Efficient Deep Reinforcement Learning with Polyhedral Dependence Graphs Pedro F. Silvestre et.al. 2501.05408 null
2025-01-09 Search-o1: Agentic Search-Enhanced Large Reasoning Models Xiaoxi Li et.al. 2501.05366 link
2025-01-09 Knowledge Transfer in Model-Based Reinforcement Learning Agents for Efficient Multi-Task Learning Dmytro Kuzmenko et.al. 2501.05329 null
2025-01-09 Design and Control of a Bipedal Robotic Character Ruben Grandia et.al. 2501.05204 null
2025-01-09 Constrained Optimization of Charged Particle Tracking with Multi-Agent Reinforcement Learning Tobias Kortus et.al. 2501.05113 null
2025-01-09 LearningFlow: Automated Policy Learning Workflow for Urban Driving with Large Language Models Zengqi Peng et.al. 2501.05057 null
2025-01-09 CuRLA: Curriculum Learning Based Deep Reinforcement Learning for Autonomous Driving Bhargava Uppuluri et.al. 2501.04982 null
2025-01-09 Promoting Shared Energy Storage Aggregation among High Price-Tolerance Prosumer: An Incentive Deposit and Withdrawal Service Xin Lu et.al. 2501.04964 null
2025-01-09 Balancing Exploration and Cybersickness: Investigating Curiosity-Driven Behavior in Virtual Environments Tangyao Li et.al. 2501.04905 null
2025-01-08 Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning Sergio Rozada et.al. 2501.04879 null
2025-01-08 Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Violet Xiang et.al. 2501.04682 null
2025-01-08 Framework for Integrating Machine Learning Methods for Path-Aware Source Routing Anees Al-Najjar et.al. 2501.04624 null
2025-01-08 MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data Zifan Wang et.al. 2501.04595 null
2025-01-08 HypeRL: Parameter-Informed Reinforcement Learning for Parametric PDEs Nicolò Botteghi et.al. 2501.04538 null
2025-01-08 Safe Reinforcement Learning with Minimal Supervision Alexander Quessy et.al. 2501.04481 null
2025-01-08 Research on environment perception and behavior prediction of intelligent UAV based on semantic communication Kechong Ren et.al. 2501.04480 null
2025-01-08 Hybrid Artificial Intelligence Strategies for Drone Navigation Rubén San-Segundo et.al. 2501.04472 null
2025-01-08 Risk-averse policies for natural gas futures trading using distributional reinforcement learning Félicien Hêche et.al. 2501.04421 null
2025-01-08 Constraints as Rewards: Reinforcement Learning for Robots without Reward Functions Yu Ishihara et.al. 2501.04228 null
2025-01-07 Explainable Reinforcement Learning via Temporal Policy Decomposition Franco Ruggeri et.al. 2501.03902 null
2025-01-07 Neural DNF-MT: A Neuro-symbolic Approach for Learning Interpretable and Editable Policies Kexin Gu Baugh et.al. 2501.03888 null
2025-01-07 AlphaPO – Reward shape matters for LLM alignment Aman Gupta et.al. 2501.03884 null
2025-01-07 Online Reinforcement Learning-Based Dynamic Adaptive Evaluation Function for Real-Time Strategy Tasks Weilong Yang et.al. 2501.03824 null
2025-01-07 Run-and-tumble chemotaxis using reinforcement learning Ramesh Pramanik et.al. 2501.03687 null
2025-01-07 IEEE 802.11bn Multi-AP Coordinated Spatial Reuse with Hierarchical Multi-Armed Bandits Maksymilian Wojnar et.al. 2501.03680 null
2025-01-07 SALE-Based Offline Reinforcement Learning with Ensemble Q-Networks Zheng Chun et.al. 2501.03676 null
2025-01-07 Imitation Learning of MPC with Neural Networks: Error Guarantees and Sparsification Hendrik Alsmeier et.al. 2501.03671 null
2025-01-07 Rethinking Adversarial Attacks in Reinforcement Learning from Policy Distribution Perspective Tianyang Duan et.al. 2501.03562 null
2025-01-07 Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment Prashant Trivedi et.al. 2501.03486 null
2025-01-06 Turn-based Multi-Agent Reinforcement Learning Model Checking Dennis Gross et.al. 2501.03187 null
2025-01-06 Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies Dennis Gross et.al. 2501.03142 null
2025-01-06 CALM: Curiosity-Driven Auditing for Large Language Models Xiang Zheng et.al. 2501.02997 null
2025-01-06 CAMP: Collaborative Attention Model with Profiles for Vehicle Routing Problems Chuanbo Hua et.al. 2501.02977 null
2025-01-06 Sim-to-Real Transfer for Mobile Robots with Reinforcement Learning: from NVIDIA Isaac Sim to Gazebo and Real ROS 2 Robots Sahar Salimpour et.al. 2501.02902 link
2025-01-06 Revisiting Communication Efficiency in Multi-Agent Reinforcement Learning from the Dimensional Analysis Perspective Chuxiong Sun et.al. 2501.02888 null
2025-01-06 First-place Solution for Streetscape Shop Sign Recognition Competition Bin Wang et.al. 2501.02811 null
2025-01-06 Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model Yueqin Yin et.al. 2501.02790 null
2025-01-06 Joint Optimization of UAV-Carried IRS for Urban Low Altitude mmWave Communications with Deep Reinforcement Learning Wenwen Xie et.al. 2501.02787 null
2025-01-06 Learn A Flexible Exploration Model for Parameterized Action Markov Decision Processes Zijian Wang et.al. 2501.02774 null
2025-01-03 Evaluating Scenario-based Decision-making for Interactive Autonomous Driving Using Rational Criteria: A Survey Zhen Tian et.al. 2501.01886 null
2025-01-03 Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models Yanjiang Liu et.al. 2501.01830 null
2025-01-03 Genetic algorithm enhanced Solovay-Kitaev algorithm for quantum compiling Jiangwei Long et.al. 2501.01746 null
2025-01-03 Proposing Hierarchical Goal-Conditioned Policy Planning in Multi-Goal Reinforcement Learning Gavin B. Rens et.al. 2501.01727 null
2025-01-03 Inversely Learning Transferable Rewards via Abstracted States Yikang Gui et.al. 2501.01669 null
2025-01-03 BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems Yinbo Yu et.al. 2501.01593 null
2025-01-02 Reinforcement-learning-based control of turbulent channel flows at high Reynolds numbers Zisong Zhou et.al. 2501.01573 null
2025-01-02 Reinforcement Learning for Respondent-Driven Sampling Justin Weltz et.al. 2501.01505 null
2025-01-02 Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension Yanbo Fang et.al. 2501.01332 null
2025-01-02 Towards Intelligent Antenna Positioning: Leveraging DRL for FAS-Aided ISAC Systems Shunxing Yang et.al. 2501.01281 null
2025-01-02 PIMAEX: Multi-Agent Exploration through Peer Incentivization Michael Kölle et.al. 2501.01266 null
2025-01-02 Embodied AI-Enhanced Vehicular Networks: An Integrated Large Language Models and Reinforcement Learning Method Ruichen Zhang et.al. 2501.01141 null
2025-01-02 Communicating Unexpectedness for Out-of-Distribution Multi-Agent Reinforcement Learning Min Whoo Lee et.al. 2501.01140 null
2025-01-02 Symmetries-enhanced Multi-Agent Reinforcement Learning Nikolaos Bousias et.al. 2501.01136 null
2025-01-02 Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning Chenglu Sun et.al. 2501.01085 null
2025-01-02 Enhancing Neural Adaptive Wireless Video Streaming via Lower-Layer Information Exposure and Online Tuning Lingzhi Zhao et.al. 2501.01044 null
2025-01-02 Energy-Efficient and Intelligent ISAC in V2X Networks with Spiking Neural Networks-Driven DRL Chen Shang et.al. 2501.01038 null
2025-01-02 Deep Reinforcement Learning for Job Scheduling and Resource Management in Cloud Computing: An Algorithm-Level Review Yan Gu et.al. 2501.01007 null
2024-12-30 Advances in Multi-agent Reinforcement Learning: Persistent Autonomy and Robot Learning Lab Report 2024 Reza Azadeh et.al. 2412.21088 null
2024-12-30 Learning Epidemiological Dynamics via the Finite Expression Method Jianda Du et.al. 2412.21049 null
2024-12-30 Weber-Fechner Law in Temporal Difference learning derived from Control as Inference Keiichiro Takahashi et.al. 2412.21004 null
2024-12-30 LEASE: Offline Preference-based Reinforcement Learning with High Sample Efficiency Xiao-Yin Liu et.al. 2412.21001 link
2024-12-30 UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI Fangwei Zhong et.al. 2412.20977 null
2024-12-30 Data-Based Efficient Off-Policy Stabilizing Optimal Control Algorithms for Discrete-Time Linear Systems via Damping Coefficients Dongdong Li et.al. 2412.20845 null
2024-12-30 Isoperimetry is All We Need: Langevin Posterior Sampling for RL with Sublinear Regret Emilio Jorge et.al. 2412.20824 null
2024-12-29 The intrinsic motivation of reinforcement and imitation learning for sequential tasks Sao Mai Nguyen et.al. 2412.20573 null
2024-12-29 Diminishing Return of Value Expansion Methods Daniel Palenicek et.al. 2412.20537 link
2024-12-29 Game Theory and Multi-Agent Reinforcement Learning: From Nash Equilibria to Evolutionary Dynamics Neil De La Fuente et.al. 2412.20523 null
2024-12-27 From Ceilings to Walls: Universal Dynamic Perching of Small Aerial Robots on Surfaces with Variable Orientations Bryan Habas et.al. 2412.19765 null
2024-12-27 Adaptive Context-Aware Multi-Path Transmission Control for VR/AR Content: A Deep Reinforcement Learning Approach Shakil Ahmed et.al. 2412.19737 null
2024-12-27 Goal-oriented Communications based on Recursive Early Exit Neural Networks Jary Pomponi et.al. 2412.19587 null
2024-12-27 Graph-attention-based Casual Discovery with Trust Region-navigated Clipping Policy Optimization Shixuan Liu et.al. 2412.19578 null
2024-12-27 Reinforced Label Denoising for Weakly-Supervised Audio-Visual Video Parsing Yongbiao Gao et.al. 2412.19563 null
2024-12-27 Scalable Hierarchical Reinforcement Learning for Hyper Scale Multi-Robot Task Planning Xuan Zhou et.al. 2412.19538 null
2024-12-27 An Overview of Machine Learning-Driven Resource Allocation in IoT Networks Zhengdong Li et.al. 2412.19478 null
2024-12-27 DeepSeek-V3 Technical Report DeepSeek-AI et.al. 2412.19437 link
2024-12-27 Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback Seong Jin Lee et.al. 2412.19436 null
2024-12-27 Comparing Few to Rank Many: Active Human Preference Learning using Randomized Frank-Wolfe Kiran Koshy Thekumparampil et.al. 2412.19396 null
2024-12-24 Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making David Shoresh et.al. 2412.18593 null
2024-12-24 Dynamic Optimization of Portfolio Allocation Using Deep Reinforcement Learning Gang Huang et.al. 2412.18563 link
2024-12-24 Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving Hao Pang et.al. 2412.18511 null
2024-12-24 Joint Adaptive OFDM and Reinforcement Learning Design for Autonomous Vehicles: Leveraging Age of Updates Mamady Delamou et.al. 2412.18500 null
2024-12-24 Contrastive Representation for Interactive Recommendation Jingyu Li et.al. 2412.18396 link
2024-12-24 Navigating Data Corruption in Machine Learning: Balancing Quality, Quantity, and Imputation Strategies Qi Liu et.al. 2412.18296 null
2024-12-24 Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization Jiacai Liu et.al. 2412.18279 null
2024-12-24 Accelerating AIGC Services with Latent Action Diffusion Scheduling in Edge Networks Changfu Xu et.al. 2412.18212 link
2024-12-24 Quantum framework for Reinforcement Learning: integrating Markov Decision Process, quantum arithmetic, and trajectory search Thet Htar Su et.al. 2412.18208 null
2024-12-24 Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models Xiaomeng Hu et.al. 2412.18171 null
2024-12-23 HyperQ-Opt: Q-learning for Hyperparameter Optimization Md. Tarek Hasan et.al. 2412.17765 null
2024-12-23 Mimicking-Bench: A Benchmark for Generalizable Humanoid-Scene Interaction Learning via Human Mimicking Yun Liu et.al. 2412.17730 null
2024-12-23 SMAC-Hard: Enabling Mixed Opponent Strategy Script and Self-play on SMAC Yue Deng et.al. 2412.17707 link
2024-12-23 Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning Huchen Jiang et.al. 2412.17397 null
2024-12-23 Reinforcement Learning with a Focus on Adjusting Policies to Reach Targets Akane Tsuboya et.al. 2412.17344 null
2024-12-23 Multimodal Deep Reinforcement Learning for Portfolio Optimization Sumit Nawathe et.al. 2412.17293 null
2024-12-23 LMD-PGN: Cross-Modal Knowledge Distillation from First-Person-View Images to Third-Person-View BEV Maps for Universal Point Goal Navigation Riku Uemura et.al. 2412.17282 null
2024-12-23 ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models Chengran Yang et.al. 2412.17264 null
2024-12-23 A Coalition Game for On-demand Multi-modal 3D Automated Delivery System Farzan Moosavi et.al. 2412.17252 null
2024-12-23 Model-free stochastic linear quadratic design by semidefinite programming Jing Guo et.al. 2412.17230 null
2024-12-20 Offline Reinforcement Learning for LLM Multi-Step Reasoning Huaijie Wang et.al. 2412.16145 null
2024-12-20 APIRL: Deep Reinforcement Learning for REST API Fuzzing Myles Foley et.al. 2412.15991 link
2024-12-20 Active Flow Control for Bluff Body under High Reynolds Number Turbulent Flow Conditions Using Deep Reinforcement Learning Jingbo Chen et.al. 2412.15975 null
2024-12-20 From General to Specific: Tailoring Large Language Models for Personalized Healthcare Ruize Shi et.al. 2412.15957 null
2024-12-20 What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning Yiran Ma et.al. 2412.15904 null
2024-12-20 Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback Jiaming Ji et.al. 2412.15838 link
2024-12-20 MacLight: Multi-scene Aggregation Convolutional Learning for Traffic Signal Control Sunbowen Lee et.al. 2412.15703 link
2024-12-20 AIR: Unifying Individual and Cooperative Exploration in Collective Multi-Agent Reinforcement Learning Guangchong Zhou et.al. 2412.15700 link
2024-12-20 Tacit Learning with Adaptive Information Selection for Cooperative Multi-Agent Reinforcement Learning Lunjun Liu et.al. 2412.15639 null
2024-12-20 Dexterous Manipulation Based on Prior Dexterous Grasp Pose Knowledge Hengxu Yan et.al. 2412.15587 null
2024-12-19 Qwen2.5 Technical Report Qwen et.al. 2412.15115 null
2024-12-19 Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination Leonardo Barcellona et.al. 2412.14957 null
2024-12-19 Effective Method with Compression for Distributed and Federated Cocoercive Variational Inequalities Daniil Medyakov et.al. 2412.14935 null
2024-12-19 Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning Anthony Kobanda et.al. 2412.14865 null
2024-12-19 Entropy Regularized Task Representation Learning for Offline Meta-Reinforcement Learning Mohammadreza Nakhaei et.al. 2412.14834 link
2024-12-19 Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning Aditya Kapoor et.al. 2412.14779 null
2024-12-19 Learning to Generate Research Idea with Dynamic Control Ruochen Li et.al. 2412.14626 null
2024-12-19 Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues Tao He et.al. 2412.14584 null
2024-12-19 Single-Loop Federated Actor-Critic across Heterogeneous Environments Ye Zhu et.al. 2412.14555 null
2024-12-18 Implementing TD3 to train a Neural Network to fly a Quadcopter through an FPV Gate Patrick Thomas et.al. 2412.14367 null
2024-12-18 Learning from Massive Human Videos for Universal Humanoid Pose Control Jiageng Mao et.al. 2412.14172 null
2024-12-18 Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective Zhiyuan Zeng et.al. 2412.14135 null
2024-12-18 Alignment faking in large language models Ryan Greenblatt et.al. 2412.14093 link
2024-12-18 Spatio-Temporal SIR Model of Pandemic Spread During Warfare with Optimal Dual-use Healthcare System Administration using Deep Reinforcement Learning Adi Shuchami et.al. 2412.14039 null
2024-12-18 Robust Optimal Safe and Stability Guaranteeing Reinforcement Learning Control for Quadcopter Sanghyoup Gu et.al. 2412.14003 null
2024-12-18 Harvesting energy from turbulent winds with Reinforcement Learning Lorenzo Basile et.al. 2412.13961 null
2024-12-18 RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation Kun Wu et.al. 2412.13877 null
2024-12-18 AI-Powered Algorithm-Centric Quantum Processor Topology Design Tian Li et.al. 2412.13805 link
2024-12-18 Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN Pengxiang Li et.al. 2412.13795 link
2024-12-18 A hybrid learning agent for episodic learning tasks with unknown target distance Oliver Sefrin et.al. 2412.13686 null
2024-12-17 ExBody2: Advanced Expressive Humanoid Whole-Body Control Mazeyu Ji et.al. 2412.13196 null
2024-12-17 Tilted Quantile Gradient Updates for Quantile-Constrained Reinforcement Learning Chenglin Li et.al. 2412.13184 link
2024-12-17 Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions Juan Del Aguila Ferrandis et.al. 2412.13157 null
2024-12-17 Practicable Black-box Evasion Attacks on Link Prediction in Dynamic Graphs – A Graph Sequential Embedding Method Jiate Li et.al. 2412.13134 link
2024-12-17 Active Reinforcement Learning Strategies for Offline Policy Improvement Ambedkar Dukkipati et.al. 2412.13106 null
2024-12-17 Reservoir Computing for Fast, Simplified Reinforcement Learning on Memory Tasks Kevin McKee et.al. 2412.13093 null
2024-12-17 SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks Mátyás Vincze et.al. 2412.13053 null
2024-12-17 Relational Neurosymbolic Markov Models Lennert De Smet et.al. 2412.13023 null
2024-12-17 Future Aspects in Human Action Recognition: Exploring Emerging Techniques and Ethical Influences Antonios Gasteratos et.al. 2412.12990 null
2024-12-17 Guiding Generative Protein Language Models with Reinforcement Learning Filippo Stocco et.al. 2412.12979 null
2024-12-16 MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization Bhavya Sukhija et.al. 2412.12098 null
2024-12-16 Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation Eliot Xing et.al. 2412.12089 null
2024-12-16 Artificial Intelligence in Traffic Systems Ritwik Raj Saxena et.al. 2412.12046 null
2024-12-16 Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps Linfeng Zhao et.al. 2412.12024 null
2024-12-16 Agentic AI-Driven Technical Troubleshooting for Enterprise Systems: A Novel Weighted Retrieval-Augmented Generation Paradigm Rajat Khanda et.al. 2412.12006 null
2024-12-16 AlphaZero Neural Scaling and Zipf’s Law: a Tale of Board Games and Power Laws Oren Neumann et.al. 2412.11979 link
2024-12-16 Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning Qi Sun et.al. 2412.11974 link
2024-12-16 Hierarchical Meta-Reinforcement Learning via Automated Macro-Action Discovery Minjae Cho et.al. 2412.11930 null
2024-12-16 Generalized Bayesian deep reinforcement learning Shreya Sinha Roy et.al. 2412.11743 null
2024-12-16 Learning UAV-based path planning for efficient localization of objects using prior knowledge Rick van Essen et.al. 2412.11717 null
2024-12-13 A Novel Framework Using Deep Reinforcement Learning for Join Order Selection Chang Liu et.al. 2412.10253 null
2024-12-13 Physics Instrument Design with Reinforcement Learning Shah Rukh Qasim et.al. 2412.10237 null
2024-12-13 Scaling Combinatorial Optimization Neural Improvement Heuristics with Online Search and Adaptation Federico Julian Camerota Verdù et.al. 2412.10163 null
2024-12-13 AMUSE: Adaptive Model Updating using a Simulated Environment Louis Chislett et.al. 2412.10119 null
2024-12-13 Reward Machine Inference for Robotic Manipulation Mattijs Baert et.al. 2412.10096 null
2024-12-13 Optimized Coordination Strategy for Multi-Aerospace Systems in Pick-and-Place Tasks By Deep Neural Network Ye Zhang et.al. 2412.09877 null
2024-12-13 RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning Charles Xu et.al. 2412.09858 null
2024-12-13 ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression Kai Yao et.al. 2412.09812 null
2024-12-12 GainAdaptor: Learning Quadrupedal Locomotion with Dual Actors for Adaptable and Energy-Efficient Walking on Various Terrains Mincheol Kim et.al. 2412.09520 null
2024-12-12 Distributional Reinforcement Learning based Integrated Decision Making and Control for Autonomous Surface Vehicles Xi Lin et.al. 2412.09466 link
2024-12-12 Learning to Adapt: Bio-Inspired Gait Strategies for Versatile Quadruped Locomotion Joseph Humphreys et.al. 2412.09440 null
2024-12-12 Reinforcement Learning Within the Classical Robotics Stack: A Case Study in Robot Soccer Adam Labiosa et.al. 2412.09417 null
2024-12-12 Does Low Spoilage Under Cold Conditions Foster Cultural Complexity During the Foraging Era? – A Theoretical and Computational Inquiry Minhyeok Lee et.al. 2412.09335 null
2024-12-12 Learning to be Indifferent in Complex Decisions: A Coarse Payoff-Assessment Model Philippe Jehiel et.al. 2412.09321 null
2024-12-12 Learning Novel Skills from Language-Generated Demonstrations Ao-Qun Jin et.al. 2412.09286 null
2024-12-12 Student-Informed Teacher Training Nico Messikommer et.al. 2412.09149 null
2024-12-12 Reconfigurable Intelligent Surface for Internet of Robotic Things Wanli Ni et.al. 2412.09117 null
2024-12-12 In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning Songjun Tu et.al. 2412.09104 null
2024-12-11 Learning Sketch Decompositions in Planning via Deep Reinforcement Learning Michael Aichmüller et.al. 2412.08574 null
2024-12-11 GenPlan: Generative sequence models as adaptive planners Akash Karthikeyan et.al. 2412.08565 null
2024-12-11 An End-to-End Collaborative Learning Approach for Connected Autonomous Vehicles in Occluded Scenarios Leandro Parada et.al. 2412.08562 null
2024-12-11 MaestroMotif: Skill Design from Artificial Intelligence Feedback Martin Klissarov et.al. 2412.08542 null
2024-12-11 Subspace-wise Hybrid RL for Articulated Object Manipulation Yujin Kim et.al. 2412.08522 null
2024-12-11 Multi-perspective Alignment for Increasing Naturalness in Neural Machine Translation Huiyuan Lai et.al. 2412.08473 null
2024-12-11 IRL for Restless Multi-Armed Bandits with Applications in Maternal and Child Health Gauri Jain et.al. 2412.08463 link
2024-12-11 SINERGYM – A virtual testbed for building energy optimization with Reinforcement Learning Alejandro Campoy-Nieves et.al. 2412.08293 link
2024-12-11 Coarse-to-Fine: A Dual-Phase Channel-Adaptive Method for Wireless Image Transmission Hanlei Li et.al. 2412.08211 null
2024-12-11 Learn How to Query from Unlabeled Data Streams in Federated Learning Yuchang Sun et.al. 2412.08138 link
2024-12-10 Mobile-TeleVision: Predictive Motion Priors for Humanoid Whole-Body Control Chenhao Lu et.al. 2412.07773 null
2024-12-10 Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data Zhiyuan Zhou et.al. 2412.07762 null
2024-12-10 Optimizing Sensor Redundancy in Sequential Decision-Making Problems Jonas Nüßlein et.al. 2412.07686 null
2024-12-10 Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization Zongkai Liu et.al. 2412.07639 null
2024-12-10 Swarm Behavior Cloning Jonas Nüßlein et.al. 2412.07617 null
2024-12-10 Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery Amin Abyaneh et.al. 2412.07544 null
2024-12-10 ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning Hongshu Guo et.al. 2412.07507 null
2024-12-10 Optimizing pulsed blowing parameters for active separation control in a one-sided diffuser using reinforcement learning Alexandra Müller et.al. 2412.07480 null
2024-12-10 Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulation for Time-Efficient Fine-Resolution Policy Learning Yuki Kadokawa et.al. 2412.07477 null
2024-12-10 RLT4Rec: Reinforcement Learning Transformer for User Cold Start and Item Recommendation Dilina Chandika Rajapakse et.al. 2412.07403 null
2024-12-09 Partially Observed Optimal Stochastic Control: Regularity, Optimality, Approximations, and Learning Ali Devran Kara et.al. 2412.06735 null
2024-12-09 Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone Max Sobol Mark et.al. 2412.06685 null
2024-12-09 Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures Adrien Bolland et.al. 2412.06655 null
2024-12-09 Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation Egor Cherepanov et.al. 2412.06531 null
2024-12-09 SimuDICE: Offline Policy Optimization Through World Model Updates and DICE Estimation Catalin E. Brita et.al. 2412.06486 link
2024-12-09 Edge Delayed Deep Deterministic Policy Gradient: efficient continuous control for edge scenarios Alberto Sinigaglia et.al. 2412.06390 null
2024-12-09 Tracking control of latent dynamic systems with application to spacecraft attitude control Congxi Zhang et.al. 2412.06342 null
2024-12-09 Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi F. Bredell et.al. 2412.06333 null
2024-12-09 Vision-Based Deep Reinforcement Learning of UAV Autonomous Navigation Using Privileged Information Junqiao Wang et.al. 2412.06313 null
2024-12-09 A Scalable Decentralized Reinforcement Learning Framework for UAV Target Localization Using Recurrent PPO Leon Fernando et.al. 2412.06231 null
2024-12-06 Reinforcement Learning: An Overview Kevin Murphy et.al. 2412.05265 null
2024-12-06 TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft Qian Long et.al. 2412.05255 link
2024-12-06 LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds James Beetham et.al. 2412.05232 null
2024-12-06 FlowPolicy: Enabling Fast and Robust 3D Flow-based Policy via Consistency Flow Matching for Robot Manipulation Qinglun Zhang et.al. 2412.04987 null
2024-12-06 Putting the Iterative Training of Decision Trees to the Test on a Real-World Robotic Task Raphael C. Engelhardt et.al. 2412.04974 null
2024-12-06 DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling Minzheng Wang et.al. 2412.04905 link
2024-12-06 Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment Ran Tian et.al. 2412.04835 null
2024-12-06 Learning-based Control for Tendon-Driven Continuum Robotic Arms Nima Maghooli et.al. 2412.04829 null
2024-12-06 A Temporally Correlated Latent Exploration for Reinforcement Learning SuMin Oh et.al. 2412.04775 null
2024-12-06 Measuring Goal-Directedness Matt MacDermott et.al. 2412.04758 null
2024-12-05 Marvel: Accelerating Safe Online Reinforcement Learning with Finetuned Offline Policy Keru Chen et.al. 2412.04426 null
2024-12-05 Intersection-Aware Assessment of EMS Accessibility in NYC: A Data-Driven Approach Haoran Su et.al. 2412.04369 null
2024-12-05 Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting Edoardo Cetin et.al. 2412.04368 null
2024-12-05 Reinforcement Learning for Freeway Lane-Change Regulation via Connected Vehicles Ke Sun et.al. 2412.04341 null
2024-12-05 Action Mapping for Reinforcement Learning in Continuous Environments with Constraints Mirco Theile et.al. 2412.04327 null
2024-12-05 GRAM: Generalization in Deep RL with a Robust Adaptation Module James Queeney et.al. 2412.04323 link
2024-12-05 Reinforcement Learning from Wild Animal Videos Elliot Chane-Sane et.al. 2412.04273 null
2024-12-05 HyperMARL: Adaptive Hypernetworks for Multi-Agent RL Kale-ab Abebe Tessera et.al. 2412.04233 null
2024-12-05 A Dynamic Safety Shield for Safe and Efficient Reinforcement Learning of Navigation Tasks Murad Dawood et.al. 2412.04153 null
2024-12-05 Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning Shicheng Zhou et.al. 2412.04078 link
2024-12-04 AI-Driven Day-to-Day Route Choice Leizhen Wang et.al. 2412.03338 null
2024-12-04 Rotograb: Combining Biomimetic Hands with Industrial Grippers using a Rotating Thumb Arnaud Bersier et.al. 2412.03279 null
2024-12-04 Learning on One Mode: Addressing Multi-Modality in Offline Reinforcement Learning Mianchu Wang et.al. 2412.03258 null
2024-12-04 Alignment at Pre-training! Towards Native Alignment for Arabic LLMs Juhao Liang et.al. 2412.03253 link
2024-12-04 Variable-Speed Teaching-Playback as Real-World Data Augmentation for Imitation Learning Nozomu Masuya et.al. 2412.03252 null
2024-12-04 Using Deep Reinforcement Learning to Enhance Channel Sampling Patterns in Integrated Sensing and Communication Federico Mason et.al. 2412.03157 null
2024-12-04 Experience-driven discovery of planning strategies Ruiqi He et.al. 2412.03111 null
2024-12-04 Less is More: A Stealthy and Efficient Adversarial Attack Method for DRL-based Autonomous Driving Policies Junchao Fan et.al. 2412.03051 null
2024-12-04 Learning Whole-Body Loco-Manipulation for Omni-Directional Task Space Pose Tracking with a Wheeled-Quadrupedal-Manipulator Kaiwen Jiang et.al. 2412.03012 null
2024-12-04 Data Acquisition for Improving Model Fairness using Reinforcement Learning Jahid Hasan et.al. 2412.03009 null
2024-12-03 UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping Wenbo Wang et.al. 2412.02699 link
2024-12-03 Preliminary Investigation into Data Scaling Laws for Imitation Learning-Based End-to-End Autonomous Driving Yupeng Zheng et.al. 2412.02689 null
2024-12-03 T-REG: Preference Optimization with Token-Level Reward Regularization Wenxuan Zhou et.al. 2412.02685 link
2024-12-03 AI-Driven Resource Allocation Framework for Microservices in Hybrid Cloud Platforms Biman Barua et.al. 2412.02610 null
2024-12-03 Explainable CTR Prediction via LLM Reasoning Xiaohan Yu et.al. 2412.02588 null
2024-12-03 Mobile Cell-Free Massive MIMO with Multi-Agent Reinforcement Learning: A Scalable Framework Ziheng Liu et.al. 2412.02581 null
2024-12-03 Generating Critical Scenarios for Testing Automated Driving Systems Trung-Hieu Nguyen et.al. 2412.02574 link
2024-12-03 Cooperative Cruising: Reinforcement Learning based Time-Headway Control for Increased Traffic Efficiency Yaron Veksler et.al. 2412.02520 null
2024-12-03 Reinforcement learning to learn quantum states for Heisenberg scaling accuracy Jeongwoo Jae et.al. 2412.02334 null
2024-12-03 Optimizing Plastic Waste Collection in Water Bodies Using Heterogeneous Autonomous Surface Vehicles with Deep Reinforcement Learning Alejandro Mendoza Barrionuevo et.al. 2412.02316 null
2024-11-29 PDDLFuse: A Tool for Generating Diverse Planning Domains Vedant Khandelwal et.al. 2411.19886 null
2024-11-29 CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives Armin Saghafian et.al. 2411.19787 link
2024-11-29 HVAC-DPT: A Decision Pretrained Transformer for HVAC Control Anaïs Berkes et.al. 2411.19746 null
2024-11-29 Improving generalization of robot locomotion policies via Sharpness-Aware Reinforcement Learning Severin Bochem et.al. 2411.19732 null
2024-11-29 RMIO: A Model-Based MARL Framework for Scenarios with Observation Loss in Some Agents Shi Zifeng et.al. 2411.19639 null
2024-11-29 Build An Influential Bot In Social Media Simulations With Large Language Models Bailu Jin et.al. 2411.19635 null
2024-11-29 Adaptive dynamics of Ising spins in one dimension leveraging Reinforcement Learning Anish Kumar et.al. 2411.19602 null
2024-11-29 Solving Rubik’s Cube Without Tricky Sampling Yicheng Lin et.al. 2411.19583 null
2024-11-29 Training Agents with Weakly Supervised Feedback from Large Language Models Dihong Gong et.al. 2411.19547 null
2024-11-29 A Local Information Aggregation based Multi-Agent Reinforcement Learning for Robot Swarm Dynamic Task Allocation Yang Lv et.al. 2411.19526 null
2024-11-27 Robust Offline Reinforcement Learning with Linearly Structured $f$-Divergence Regularization Cheng Tang et.al. 2411.18612 null
2024-11-27 A Talent-infused Policy-gradient Approach to Efficient Co-Design of Morphology and Task Allocation Behavior of Multi-Robot Systems Prajit KrisshnaKumar et.al. 2411.18519 null
2024-11-27 G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation Tianxing Chen et.al. 2411.18369 null
2024-11-27 Two-Timescale Digital Twin Assisted Model Interference and Retraining over Wireless Network Jiayi Cong et.al. 2411.18329 null
2024-11-27 Application of Soft Actor-Critic Algorithms in Optimizing Wastewater Treatment with Time Delays Integration Esmaeel Mohammadi et.al. 2411.18305 null
2024-11-27 NeoHebbian Synapses to Accelerate Online Training of Neuromorphic Hardware Shubham Pande et.al. 2411.18272 null
2024-11-27 Dynamic Retail Pricing via Q-Learning – A Reinforcement Learning Framework for Enhanced Revenue Management Mohit Apte et.al. 2411.18261 null
2024-11-27 Dependency-Aware CAV Task Scheduling via Diffusion-Based Reinforcement Learning Xiang Cheng et.al. 2411.18230 null
2024-11-27 Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning Di Zhang et.al. 2411.18203 link
2024-11-27 Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation Jie-Jing Shao et.al. 2411.18201 link
2024-11-26 Multi-Objective Reinforcement Learning for Automated Resilient Cyber Defence Ross O’Driscoll et.al. 2411.17585 null
2024-11-26 Ensuring Safety in Target Pursuit Control: A CBF-Safe Reinforcement Learning Approach Yaosheng Deng et.al. 2411.17552 null
2024-11-26 IMPROVE: Improving Medical Plausibility without Reliance on HumanValidation – An Enhanced Prototype-Guided Diffusion Framework Anurag Shandilya et.al. 2411.17535 null
2024-11-26 Spatially Visual Perception for End-to-End Robotic Learning Travis Davies et.al. 2411.17458 null
2024-11-26 BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving Teng Wang et.al. 2411.17404 null
2024-11-26 Joint Combinatorial Node Selection and Resource Allocations in the Lightning Network using Attention-based Reinforcement Learning Mahdi Salahshour et.al. 2411.17353 null
2024-11-26 SIL-RRT*: Learning Sampling Distribution through Self Imitation Learning Xuzhe Dang et.al. 2411.17293 null
2024-11-26 LHPF: Look back the History and Plan for the Future in Autonomous Driving Sheng Wang et.al. 2411.17253 null
2024-11-26 Self-reconfiguration Strategies for Space-distributed Spacecraft Tianle Liu et.al. 2411.17137 null
2024-11-26 LLM-Based Offline Learning for Embodied Agents via Consistency-Guided Reward Ensemble Yujeong Lee et.al. 2411.17135 null
2024-11-25 Self-Generated Critiques Boost Reward Modeling for Language Models Yue Yu et.al. 2411.16646 null
2024-11-25 Continual Deep Reinforcement Learning with Task-Agnostic Policy Distillation Muhammad Burhan Hafez et.al. 2411.16532 link
2024-11-25 Reinforcement Learning for Bidding Strategy Optimization in Day-Ahead Energy Market Luca Di Persio et.al. 2411.16519 null
2024-11-25 Unsupervised Event Outlier Detection in Continuous Time Somjit Nath et.al. 2411.16427 null
2024-11-25 CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning Duo Wu et.al. 2411.16313 null
2024-11-25 Probing for Consciousness in Machines Mathis Immertreu et.al. 2411.16262 null
2024-11-25 Multi-Robot Reliable Navigation in Uncertain Topological Environments with Graph Attention Networks Zhuoyuan Yu et.al. 2411.16134 null
2024-11-25 End-to-End Steering for Autonomous Vehicles via Conditional Imitation Co-Learning Mahmoud M. Kishky et.al. 2411.16131 null
2024-11-25 Why the Agent Made that Decision: Explaining Deep Reinforcement Learning with Vision Masks Rui Zuo et.al. 2411.16120 null
2024-11-25 M3: Mamba-assisted Multi-Circuit Optimization via MBRL with Effective Scheduling Youngmin Oh et.al. 2411.16019 null
2024-11-22 WildLMa: Long Horizon Loco-Manipulation in the Wild Ri-Zhao Qiu et.al. 2411.15131 null
2024-11-22 Learning-based Trajectory Tracking for Bird-inspired Flapping-Wing Robots Jiaze Cai et.al. 2411.15130 null
2024-11-22 TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Nathan Lambert et.al. 2411.15124 link
2024-11-22 On Multi-Agent Inverse Reinforcement Learning Till Freihaut et.al. 2411.15046 null
2024-11-22 Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium Zeyang Li et.al. 2411.15036 null
2024-11-22 On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations Guojun Xiong et.al. 2411.15014 null
2024-11-22 Free Energy Projective Simulation (FEPS): Active inference with interpretability Joséphine Pazem et.al. 2411.14991 null
2024-11-22 Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation Huy Le et.al. 2411.14913 null
2024-11-22 Segmenting Action-Value Functions Over Time-Scales in SARSA using TD($Δ$) Mahammad Humayoo et.al. 2411.14783 null
2024-11-22 Enhancing Molecular Design through Graph-based Topological Reinforcement Learning Xiangyu Zhang et.al. 2411.14726 null
2024-11-21 Multi-Agent Environments for Vehicle Routing Problems Ricardo Gama et.al. 2411.14411 null
2024-11-21 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Yu Zhao et.al. 2411.14405 link
2024-11-21 23 DoF Grasping Policies from a Raw Point Cloud Martin Matak et.al. 2411.14400 null
2024-11-21 Model Checking for Reinforcement Learning in Autonomous Driving: One Can Do More Than You Think! Rong Gu et.al. 2411.14375 null
2024-11-21 Convex Approximation of Probabilistic Reachable Sets from Small Samples Using Self-supervised Neural Networks Jun Xiang et.al. 2411.14356 null
2024-11-21 Logarithmic Neyman Regret for Adaptive Estimation of the Average Treatment Effect Ojash Neopane et.al. 2411.14341 null
2024-11-21 Explainable Multi-Agent Reinforcement Learning for Extended Reality Codec Adaptation Pedro Enrique Iturria-Rivera et.al. 2411.14264 null
2024-11-21 Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs Zeyu Dong et.al. 2411.14256 null
2024-11-21 Natural Language Reinforcement Learning Xidong Feng et.al. 2411.14251 link
2024-11-21 Umbrella Reinforcement Learning – computationally efficient tool for hard non-linear problems Egor E. Nuzhin et.al. 2411.14117 null
2024-11-20 BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Davide Paglieri et.al. 2411.13543 link
2024-11-20 Metacognition for Unknown Situations and Environments (MUSE) Rodolfo Valiente et.al. 2411.13537 null
2024-11-20 Robust Monocular Visual Odometry using Curriculum Learning Assaf Lahiany et.al. 2411.13438 null
2024-11-20 A Survey On Enhancing Reinforcement Learning in Complex Environments: Insights from Human and LLM Feedback Alireza Rashidi Laleh et.al. 2411.13410 null
2024-11-20 Fine-tuning Myoelectric Control through Reinforcement Learning in a Game Environment Kilian Freitag et.al. 2411.13327 null
2024-11-20 Backward Stochastic Control System with Entropy Regularization Ziyue Chen et.al. 2411.13219 null
2024-11-20 ViSTa Dataset: Do vision-language models understand sequential tasks? Evžen Wybitul et.al. 2411.13211 link
2024-11-20 Engagement-Driven Content Generation with Large Language Models Erica Coppolillo et.al. 2411.13187 null
2024-11-20 Learning Time-Optimal and Speed-Adjustable Tactile In-Hand Manipulation Johannes Pitz et.al. 2411.13148 null
2024-11-20 ReinFog: A DRL Empowered Framework for Resource Management in Edge and Cloud Computing Environments Zhiyu Wang et.al. 2411.13121 null
2024-11-19 ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models Salma Kharrat et.al. 2411.12736 link
2024-11-19 Reinforcement Learning, Collusion, and the Folk Theorem Galit Askenazi-Golan et.al. 2411.12725 null
2024-11-19 UBSoft: A Simulation Platform for Robotic Skill Learning in Unbounded Soft Environments Chunru Lin et.al. 2411.12711 null
2024-11-19 Instant Policy: In-Context Imitation Learning via Graph Diffusion Vitalis Vosylius et.al. 2411.12633 null
2024-11-19 Robotic transcatheter tricuspid valve replacement with hybrid enhanced intelligence: a new paradigm and first-in-vivo study Shuangyi Wang et.al. 2411.12478 null
2024-11-19 Variable-Frequency Imitation Learning for Variable-Speed Motion Nozomu Masuya et.al. 2411.12310 null
2024-11-19 Emergence of Implicit World Models from Mortal Agents Kazuya Horibe et.al. 2411.12304 null
2024-11-19 DT-RaDaR: Digital Twin Assisted Robot Navigation using Differential Ray-Tracing Sunday Amatare et.al. 2411.12284 null
2024-11-19 Error-Feedback Model for Output Correction in Bilateral Control-Based Imitation Learning Hiroshi Sato et.al. 2411.12255 null
2024-11-19 Efficient Training in Multi-Agent Reinforcement Learning: A Communication-Free Framework for the Box-Pushing Problem David Ge et.al. 2411.12246 null
2024-11-18 Design And Optimization Of Multi-rendezvous Manoeuvres Based On Reinforcement Learning And Convex Optimization Antonio López Rivera et.al. 2411.11778 null
2024-11-18 High-Speed Cornering Control and Real-Vehicle Deployment for Autonomous Electric Vehicles Shiyue Zhao et.al. 2411.11762 null
2024-11-18 Mapping out the Space of Human Feedback for Reinforcement Learning: A Conceptual Framework Yannick Metz et.al. 2411.11761 null
2024-11-18 Aligning Few-Step Diffusion Models with Dense Reward Difference Learning Ziyi Zhang et.al. 2411.11727 link
2024-11-18 Bitcoin Under Volatile Block Rewards: How Mempool Statistics Can Influence Bitcoin Mining Roozbeh Sarenche et.al. 2411.11702 null
2024-11-18 Robust Reinforcement Learning under Diffusion Models for Data with Jumps Chenyang Jiang et.al. 2411.11697 null
2024-11-18 Coevolution of Opinion Dynamics and Recommendation System: Modeling Analysis and Reinforcement Learning Based Manipulation Yuhong Chen et.al. 2411.11687 null
2024-11-18 No-regret Exploration in Shuffle Private Reinforcement Learning Shaojie Bai et.al. 2411.11647 null
2024-11-18 Signaling and Social Learning in Swarms of Robots Leo Cazenille et.al. 2411.11616 null
2024-11-18 A Pre-Trained Graph-Based Model for Adaptive Sequencing of Educational Documents Jean Vassoyan et.al. 2411.11520 null
2024-11-15 Mitigating Parameter Degeneracy using Joint Conditional Diffusion Model for WECC Composite Load Model in Power Systems Feiqin Zhu et.al. 2411.10431 null
2024-11-15 Continual Adversarial Reinforcement Learning (CARL) of False Data Injection detection: forgetting and explainability Pooja Aslami et.al. 2411.10367 null
2024-11-15 BMP: Bridging the Gap between B-Spline and Movement Primitives Weiran Liao et.al. 2411.10336 null
2024-11-15 Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review Hossein Hassani et.al. 2411.10268 null
2024-11-15 Learning Generalizable 3D Manipulation With 10 Demonstrations Yu Ren et.al. 2411.10203 null
2024-11-15 The Surprising Ineffectiveness of Pre-Trained Visual Representations for Model-Based Reinforcement Learning Moritz Schneider et.al. 2411.10175 null
2024-11-15 Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles Anant Garg et.al. 2411.10171 null
2024-11-15 Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention Libo Wang et.al. 2411.10156 link
2024-11-15 That Chip Has Sailed: A Critique of Unfounded Skepticism Around AI for Chip Design Anna Goldie et.al. 2411.10053 null
2024-11-15 Enforcing Cooperative Safety for Reinforcement Learning-based Mixed-Autonomy Platoon Control Jingyuan Zhou et.al. 2411.10031 null
2024-11-14 A Risk Sensitive Contract-unified Reinforcement Learning Approach for Option Hedging Xianhua Peng et.al. 2411.09659 null
2024-11-14 Motion Before Action: Diffusing Object Motion as Manipulation Condition Yup Su et.al. 2411.09658 null
2024-11-14 Tailoring interactions between active nematic defects with reinforcement learning Carlos Floyd et.al. 2411.09588 null
2024-11-14 Developement of Reinforcement Learning based Optimisation Method for Side-Sill Design Aditya Borse et.al. 2411.09499 null
2024-11-14 Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment Yuang Cai et.al. 2411.09341 null
2024-11-14 Socio-Economic Consequences of Generative AI: A Review of Methodological Approaches Carlos J. Costa et.al. 2411.09313 null
2024-11-14 Enhancing reinforcement learning for population setpoint tracking in co-cultures Sebastián Espinel-Ríos et.al. 2411.09177 null
2024-11-14 Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging Bo Wang et.al. 2411.09176 null
2024-11-14 Rationality based Innate-Values-driven Reinforcement Learning Qin Yang et.al. 2411.09160 null
2024-11-14 Secrecy Energy Efficiency Maximization in IRS-Assisted VLC MISO Networks with RSMA: A DS-PPO approach Yangbo Guo et.al. 2411.09146 null
2024-11-13 LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs Piyush Jha et.al. 2411.08862 null
2024-11-13 Goal-oriented Semantic Communication for Robot Arm Reconstruction in Digital Twin: Feature and Temporal Selections Shutong Chen et.al. 2411.08835 null
2024-11-13 Recommender systems and reinforcement learning for building control and occupant interaction: A text-mining driven review of scientific literature Wenhao Zhang et.al. 2411.08734 null
2024-11-13 Joint Model Caching and Resource Allocation in Generative AI-Enabled Wireless Edge Networks Zhang Liu et.al. 2411.08672 null
2024-11-13 Estimating unknown parameters in differential equations with a reinforcement learning based PSO method Wenkui Sun et.al. 2411.08651 null
2024-11-13 Towards Secure Intelligent O-RAN Architecture: Vulnerabilities, Threats and Promising Technical Solutions using LLMs Mojdeh Karbalaee Motalleb et.al. 2411.08640 null
2024-11-13 Robot See, Robot Do: Imitation Reward for Noisy Financial Environments Sven Goluža et.al. 2411.08637 null
2024-11-13 Precision-Focused Reinforcement Learning Model for Robotic Object Pushing Lara Bergmann et.al. 2411.08622 link
2024-11-13 Grammarization-Based Grasping with Deep Multi-Autoencoder Latent Space Exploration by Reinforcement Learning Agent Leonidas Askianakis et.al. 2411.08566 null
2024-11-13 Towards Practical Deep Schedulers for Allocating Cellular Radio Resources Petteri Kela et.al. 2411.08529 null
2024-11-12 Learning Memory Mechanisms for Decision Making through Demonstrations William Yue et.al. 2411.07954 link
2024-11-12 Doubly Mild Generalization for Offline Reinforcement Learning Yixiu Mao et.al. 2411.07934 link
2024-11-12 Scaling policy iteration based reinforcement learning for unknown discrete-time linear systems Zhen Pang et.al. 2411.07825 null
2024-11-12 Navigation with QPHIL: Quantizing Planner for Hierarchical Implicit Q-Learning Alexi Canesse et.al. 2411.07760 null
2024-11-12 Optimizing Traffic Signal Control using High-Dimensional State Representation and Efficient Deep Reinforcement Learning Lawrence Francis et.al. 2411.07759 null
2024-11-12 EMPERROR: A Flexible Generative Perception Error Model for Probing Self-Driving Planners Niklas Hanselmann et.al. 2411.07719 null
2024-11-12 Test Where Decisions Matter: Importance-driven Testing for Deep Reinforcement Learning Stefan Pranger et.al. 2411.07700 null
2024-11-12 Exploring Multi-Agent Reinforcement Learning for Unrelated Parallel Machine Scheduling Maria Zampella et.al. 2411.07634 null
2024-11-12 Direct Preference Optimization Using Sparse Feature-Level Constraints Qingyu Yin et.al. 2411.07618 null
2024-11-12 Entropy Controllable Direct Preference Optimization Motoki Omura et.al. 2411.07595 null
2024-11-11 ‘Explaining RL Decisions with Trajectories’: A Reproducibility Study Karim Abdel Sadek et.al. 2411.07200 link
2024-11-11 Joint Age-State Belief is All You Need: Minimizing AoII via Pull-Based Remote Estimation Ismail Cosandal et.al. 2411.07179 null
2024-11-11 Learning Multi-Agent Collaborative Manipulation for Long-Horizon Quadrupedal Pushing Chuye Hong et.al. 2411.07104 null
2024-11-11 A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs Myeongsoo Kim et.al. 2411.07098 null
2024-11-11 OCMDP: Observation-Constrained Markov Decision Process Taiyi Wang et.al. 2411.07087 null
2024-11-11 To Train or Not to Train: Balancing Efficiency and Training Cost in Deep Reinforcement Learning for Mobile Edge Computing Maddalena Boscaro et.al. 2411.07086 null
2024-11-11 Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching Arnav Kumar Jain et.al. 2411.07007 link
2024-11-11 Enhancing Robot Assistive Behaviour with Reinforcement Learning and Theory of Mind Antonio Andriella et.al. 2411.07003 link
2024-11-11 Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration Xingrui Yu et.al. 2411.06965 null
2024-11-11 Streetwise Agents: Empowering Offline RL Policies to Outsmart Exogenous Stochastic Disturbances in RTC Aditya Soni et.al. 2411.06815 null
2024-11-08 Safe Reinforcement Learning of Robot Trajectories in the Presence of Moving Obstacles Jonas Kiemel et.al. 2411.05784 null
2024-11-08 Tract-RLFormer: A Tract-Specific RL policy based Decoder-only Transformer Network Ankita Joshi et.al. 2411.05757 null
2024-11-08 Topology-aware Reinforcement Feature Space Reconstruction for Graph Data Wangyang Ying et.al. 2411.05742 null
2024-11-08 Renewable Energy Powered and Open RAN-based Architecture for 5G Fixed Wireless Access Provisioning in Rural Areas Anselme Ndikumana et.al. 2411.05699 null
2024-11-08 Data-Driven Distributed Common Operational Picture from Heterogeneous Platforms using Multi-Agent Reinforcement Learning Indranil Sur et.al. 2411.05683 null
2024-11-08 Digital Twin Backed Closed-Loops for Energy-Aware and Open RAN-based Fixed Wireless Access Serving Rural Areas Anselme Ndikumana et.al. 2411.05664 null
2024-11-08 Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey Zhihong Liu et.al. 2411.05614 null
2024-11-08 Smart navigation through a rotating barrier: Deep reinforcement learning with application to size-based separation of active microagents Mohammad Hossein Masoudi et.al. 2411.05587 null
2024-11-08 Tangled Program Graphs as an alternative to DRL-based control algorithms for UAVs Hubert Szolc et.al. 2411.05586 null
2024-11-08 Towards Active Flow Control Strategies Through Deep Reinforcement Learning Ricard Montalà et.al. 2411.05536 null
2024-11-07 Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games Usman Anwar et.al. 2411.04976 link
2024-11-07 A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model Panwen Hu et.al. 2411.04942 null
2024-11-07 Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion Kaizhe Hu et.al. 2411.04919 link
2024-11-07 Evaluating Robustness of Reinforcement Learning Algorithms for Autonomous Shipping Bavo Lesy et.al. 2411.04915 null
2024-11-07 Think Smart, Act SMARL! Analyzing Probabilistic Logic Driven Safety in Multi-Agent Reinforcement Learning Satchit Chatterji et.al. 2411.04867 link
2024-11-07 Asymptotic regularity of a generalised stochastic Halpern scheme with applications Nicholas Pischke et.al. 2411.04845 null
2024-11-07 Plasticity Loss in Deep Reinforcement Learning: A Survey Timo Klein et.al. 2411.04832 null
2024-11-07 Harnessing the Power of Gradient-Based Simulations for Multi-Objective Optimization in Particle Accelerators Kishansingh Rajput et.al. 2411.04817 null
2024-11-07 AllGaits: Learning All Quadruped Gaits and Transitions Guillaume Bellegarda et.al. 2411.04787 null
2024-11-07 Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning Zuzanna Osika et.al. 2411.04784 link
2024-11-06 A Comparative Study of Deep Reinforcement Learning for Crop Production Management Joseph Balderas et.al. 2411.04106 null
2024-11-06 Interpretable and Efficient Data-driven Discovery and Control of Distributed Systems Florian Wolf et.al. 2411.04098 null
2024-11-06 Memorized action chunking with Transformers: Imitation learning for vision-based tissue surface scanning Bochen Yang et.al. 2411.04050 null
2024-11-06 Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset Alexandre Galashov et.al. 2411.04034 null
2024-11-06 Predicting and Publishing Accurate Imbalance Prices Using Monte Carlo Tree Search Fabio Pavirani et.al. 2411.04011 null
2024-11-06 Object-Centric Dexterous Manipulation from Human Motion Data Yuanpei Chen et.al. 2411.04005 null
2024-11-06 ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy Chenrui Tie et.al. 2411.03990 null
2024-11-06 AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making Yizhe Huang et.al. 2411.03865 link
2024-11-06 Beyond The Rainbow: High Performance Deep Reinforcement Learning On A Desktop PC Tyler Clark et.al. 2411.03820 null
2024-11-06 From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning Zhirui Deng et.al. 2411.03817 null
2024-11-05 Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy For Visuomotor Imitation Learning George Jiayuan Gao et.al. 2411.03294 null
2024-11-05 Pre-trained Visual Dynamics Representations for Efficient Policy Learning Hao Luo et.al. 2411.03169 null
2024-11-05 Hierarchical Orchestra of Policies Thomas P Cannon et.al. 2411.03008 null
2024-11-05 Accelerating Task Generalisation with Multi-Level Hierarchical Options Thomas P Cannon et.al. 2411.02998 null
2024-11-05 Autonomous Decision Making for UAV Cooperative Pursuit-Evasion Game with Reinforcement Learning Yang Zhao et.al. 2411.02983 null
2024-11-05 Transformer-Based Fault-Tolerant Control for Fixed-Wing UAVs Using Knowledge Distillation and In-Context Adaptation Francisco Giral et.al. 2411.02975 null
2024-11-05 Embedding Safety into RL: A New Take on Trust Region Methods Nikola Milosevic et.al. 2411.02957 null
2024-11-05 The Unreasonable Effectiveness of LLMs for Query Optimization Peter Akioyamen et.al. 2411.02862 link
2024-11-05 ADOPT: Modified Adam Can Converge with Any $β_2$ with the Optimal Rate Shohei Taniguchi et.al. 2411.02853 link
2024-11-05 When to Localize? A Risk-Constrained Reinforcement Learning Approach Chak Lam Shek et.al. 2411.02788 null
2024-11-04 Simulation of Nanorobots with Artificial Intelligence and Reinforcement Learning for Advanced Cancer Cell Detection and Tracking Shahab Kavousinejad et.al. 2411.02345 link
2024-11-04 WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Zehan Qi et.al. 2411.02337 null
2024-11-04 Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback Marcus Williams et.al. 2411.02306 link
2024-11-04 N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs Ilya Zisman et.al. 2411.01958 null
2024-11-04 RoboCrowd: Scaling Robot Data Collection through Crowdsourcing Suvir Mirchandani et.al. 2411.01915 null
2024-11-04 Efficient Active Imitation Learning with Random Network Distillation Emilien Biré et.al. 2411.01894 null
2024-11-04 Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback Guan-Ting Lin et.al. 2411.01834 null
2024-11-04 Risk-sensitive control as inference with Rényi divergence Kaito Ito et.al. 2411.01827 null
2024-11-04 IRS-Enhanced Secure Semantic Communication Networks: Cross-Layer and Context-Awared Resource Allocation Lingyi Wang et.al. 2411.01821 null
2024-11-04 So You Think You Can Scale Up Autonomous Robot Data Collection? Suvir Mirchandani et.al. 2411.01813 null
2024-10-31 EgoMimic: Scaling Imitation Learning via Egocentric Video Simar Kareer et.al. 2410.24221 link
2024-10-31 Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use Jiajun Xi et.al. 2410.24218 link
2024-10-31 ARQ: A Mixed-Precision Quantization Framework for Accurate and Certifiably Robust DNNs Yuchen Yang et.al. 2410.24214 null
2024-10-31 Zonal RL-RRT: Integrated RL-RRT Path Planning with Collision Probability and Zone Connectivity AmirMohammad Tahmasbi et.al. 2410.24205 link
2024-10-31 DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning Zhenyu Jiang et.al. 2410.24185 null
2024-10-31 Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning Jiaqi Liu et.al. 2410.24152 null
2024-10-31 Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers Kai Yan et.al. 2410.24108 link
2024-10-31 Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning Nabil Omi et.al. 2410.24096 null
2024-10-31 3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing Binghao Huang et.al. 2410.24091 null
2024-10-31 Demystifying Linear MDPs and Novel Dynamics Aggregation Framework Joongkyu Lee et.al. 2410.24089 null
2024-10-30 Keypoint Abstraction using Large Models for Object-Relative Imitation Learning Xiaolin Fang et.al. 2410.23254 null
2024-10-30 Carrot and Stick: Eliciting Comparison Data and Beyond Yiling Chen et.al. 2410.23243 null
2024-10-30 A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment Matteo G. Mecattaf et.al. 2410.23242 null
2024-10-30 COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences Yixin Liu et.al. 2410.23223 link
2024-10-31 Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval Sheryl Hsu et.al. 2410.23214 null
2024-10-30 Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks Michael Matthews et.al. 2410.23208 null
2024-10-30 Energy-Efficient Intra-Domain Network Slicing for Multi-Layer Orchestration in Intelligent-Driven Distributed 6G Networks: Learning Generic Assignment Skills with Unsupervised Reinforcement Learning Navideh Ghafouri et.al. 2410.23161 null
2024-10-30 VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning Yichao Liang et.al. 2410.23156 null
2024-10-30 From Hype to Reality: The Road Ahead of Deploying DRL in 6G Networks Haiyuan Li et.al. 2410.23086 null
2024-10-30 Offline Reinforcement Learning and Sequence Modeling for Downlink Link Adaptation Samuele Peri et.al. 2410.23031 null
2024-10-29 Environment as Policy: Learning to Race in Unseen Tracks Hongze Wang et.al. 2410.22308 null
2024-10-29 EconoJax: A Fast & Scalable Economic Simulation in Jax Koen Ponse et.al. 2410.22165 link
2024-10-29 Learning Successor Features the Simple Way Raymond Chua et.al. 2410.22133 null
2024-10-29 PC-Gym: Benchmark Environments For Process Control Problems Maximilian Bloor et.al. 2410.22093 null
2024-10-29 PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference Kendong Liu et.al. 2410.21966 null
2024-10-29 Human-Readable Programs as Actors of Reinforcement Learning Agents Using Critic-Moderated Evolution Senne Deproost et.al. 2410.21940 link
2024-10-29 Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning Jianlan Luo et.al. 2410.21845 link
2024-10-29 Robot Policy Learning with Temporal Optimal Transport Reward Yuwei Fu et.al. 2410.21795 link
2024-10-29 Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem Shaan Ul Haque et.al. 2410.21704 null
2024-10-29 Sequential choice in ordered bundles Rajeev Kohli et.al. 2410.21670 null
2024-10-28 LongReward: Improving Long-context Large Language Models with AI Feedback Jiajie Zhang et.al. 2410.21252 link
2024-10-28 Quantum Reinforcement Learning-Based Two-Stage Unit Commitment Framework for Enhanced Power Systems Robustness Xiang Wei et.al. 2410.21240 null
2024-10-28 Offline Reinforcement Learning With Combinatorial Action Spaces Matthew Landers et.al. 2410.21151 null
2024-10-28 Robustness and Generalization in Quantum Reinforcement Learning via Lipschitz Regularization Nico Meyer et.al. 2410.21117 link
2024-10-28 Dual-Agent Deep Reinforcement Learning for Dynamic Pricing and Replenishment Yi Zheng et.al. 2410.21109 null
2024-10-28 Stronger Regret Bounds for Safe Online Reinforcement Learning in the Linear Quadratic Regulator Benjamin Schiffer et.al. 2410.21081 null
2024-10-28 Getting By Goal Misgeneralization With a Little Help From a Mentor Tu Trinh et.al. 2410.21052 null
2024-10-28 FairStream: Fair Multimedia Streaming Benchmark for Reinforcement Learning Agents Jannis Weil et.al. 2410.21029 null
2024-10-28 Reference-Free Formula Drift with Reinforcement Learning: From Driving Data to Tire Energy-Inspired, Real-World Policies Franck Djeumou et.al. 2410.20990 null
2024-10-28 BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks Yunhan Zhao et.al. 2410.20971 null
2024-10-25 Adversarial Environment Design via Regret-Guided Diffusion Models Hojun Chung et.al. 2410.19715 null
2024-10-25 DA-VIL: Adaptive Dual-Arm Manipulation with Reinforcement Learning and Variable Impedance Control Md Faizal Karim et.al. 2410.19712 null
2024-10-25 MILES: Making Imitation Learning Easy with Self-Supervision Georgios Papagiannis et.al. 2410.19693 null
2024-10-25 Automated generation of photonic circuits for Bell tests with homodyne measurements Corentin Lanore et.al. 2410.19670 null
2024-10-25 MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services Hongjia Wu et.al. 2410.19665 null
2024-10-25 Shared Control with Black Box Agents using Oracle Queries Inbal Avraham et.al. 2410.19612 null
2024-10-25 OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization Hongliang He et.al. 2410.19609 link
2024-10-25 Diverse Sign Language Translation Xin Shen et.al. 2410.19586 null
2024-10-25 Robotic Learning in your Backyard: A Neural Simulator from Open Source Components Liyou Zhou et.al. 2410.19564 null
2024-10-25 AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design Francisco Erivaldo Fernandes Junior et.al. 2410.19528 null
2024-10-24 SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment Caelan Garrett et.al. 2410.18907 null
2024-10-24 Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks Graziano A. Manduzio et.al. 2410.18890 null
2024-10-24 Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences Weijian Luo et.al. 2410.18881 null
2024-10-24 Learning Collusion in Episodic, Inventory-Constrained Markets Paul Friedrich et.al. 2410.18871 null
2024-10-24 Towards Visual Text Design Transfer Across Languages Yejin Choi et.al. 2410.18823 null
2024-10-24 PointPatchRL – Masked Reconstruction Improves Reinforcement Learning on Point Clouds Balázs Gyenes et.al. 2410.18800 null
2024-10-24 Adapting MLOps for Diverse In-Network Intelligence in 6G Era: Challenges and Solutions Peizheng Li et.al. 2410.18793 null
2024-10-24 Data Scaling Laws in Imitation Learning for Robotic Manipulation Fanqi Lin et.al. 2410.18647 link
2024-10-24 Multi-agent cooperation through learning-aware policy gradients Alexander Meulemans et.al. 2410.18636 null
2024-10-24 Leveraging Graph Neural Networks and Multi-Agent Reinforcement Learning for Inventory Control in Supply Chains Niki Kotecha et.al. 2410.18631 null
2024-10-23 Prioritized Generative Replay Renhao Wang et.al. 2410.18082 null
2024-10-23 Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration Max Wilcoxson et.al. 2410.18076 link
2024-10-23 SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation Zihan Zhou et.al. 2410.18065 null
2024-10-23 Cross-lingual Transfer of Reward Models in Multilingual Alignment Jiwoo Hong et.al. 2410.18027 link
2024-10-23 Dynamic Spectrum Access for Ambient Backscatter Communication-assisted D2D Systems with Quantum Reinforcement Learning Nguyen Van Huynh et.al. 2410.17971 null
2024-10-23 Slot: Provenance-Driven APT Detection through Graph Reinforcement Learning Wei Qiao et.al. 2410.17910 null
2024-10-23 Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity Philip Amortila et.al. 2410.17904 null
2024-10-23 Scalable Offline Reinforcement Learning for Mean Field Games Axel Brunnbauer et.al. 2410.17898 null
2024-10-23 Learning Versatile Skills with Curriculum Masking Yao Tang et.al. 2410.17744 link
2024-10-23 Optimizing Load Scheduling in Power Grids Using Reinforcement Learning and Markov Decision Processes Dongwen Luo et.al. 2410.17696 null
2024-10-22 Few-shot In-Context Preference Learning Using Large Language Models Chao Yu et.al. 2410.17233 null
2024-10-22 DyPNIPP: Predicting Environment Dynamics for RL-based Robust Informative Path Planning Srujan Deolasee et.al. 2410.17186 null
2024-10-22 Reinforcement learning on structure-conditioned categorical diffusion for protein inverse folding Yasha Ektefaie et.al. 2410.17173 link
2024-10-22 Reinforcement Learning for Data-Driven Workflows in Radio Interferometry. I. Principal Demonstration in Calibration Brian M. Kirk et.al. 2410.17135 null
2024-10-22 Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards Alexander G. Padula et.al. 2410.17126 link
2024-10-22 Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning Haining Wang et.al. 2410.17088 link
2024-10-22 Delay-Constrained Grant-Free Random Access in MIMO Systems: Distributed Pilot Allocation and Power Control Jianan Bai et.al. 2410.17068 null
2024-10-22 Optimal Design for Reward Modeling in RLHF Antoine Scheid et.al. 2410.17055 null
2024-10-22 Proleptic Temporal Ensemble for Improving the Speed of Robot Tasks Generated by Imitation Learning Hyeonjun Park et.al. 2410.16981 null
2024-10-22 Safe Load Balancing in Software-Defined-Networking Lam Dinh et.al. 2410.16846 null
2024-10-21 Improve Vision Language Model Chain-of-thought Reasoning Ruohong Zhang et.al. 2410.16198 link
2024-10-21 RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style Yantao Liu et.al. 2410.16184 link
2024-10-21 SMART: Self-learning Meta-strategy Agent for Reasoning Tasks Rongxing Liu et.al. 2410.16128 link
2024-10-21 Statistical Inference for Temporal Difference Learning with Linear Function Approximation Weichen Wu et.al. 2410.16106 null
2024-10-21 A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models Yue Deng et.al. 2410.16024 link
2024-10-21 Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality Raghav Bongole et.al. 2410.16013 null
2024-10-21 ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning Yue Yang et.al. 2410.15994 null
2024-10-21 Learning Quadrotor Control From Visual Features Using Differentiable Simulation Johannes Heeg et.al. 2410.15979 null
2024-10-21 Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning Hanlin Yang et.al. 2410.15910 null
2024-10-21 FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL Woosung Koh et.al. 2410.15876 link
2024-10-18 Online Reinforcement Learning with Passive Memory Anay Pattanaik et.al. 2410.14665 null
2024-10-18 A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning Shengjie Sun et.al. 2410.14660 null
2024-10-18 Harnessing Causality in Reinforcement Learning With Bagged Decision Times Daiqi Gao et.al. 2410.14659 null
2024-10-18 Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments Mariusz Wisniewski et.al. 2410.14616 link
2024-10-18 Streaming Deep Reinforcement Learning Finally Works Mohamed Elsayed et.al. 2410.14606 link
2024-10-18 Reinforcement Learning in Non-Markov Market-Making Luca Lalor et.al. 2410.14504 null
2024-10-18 Transfer Reinforcement Learning in Heterogeneous Action Spaces using Subgoal Mapping Kavinayan P. Sivakumar et.al. 2410.14484 null
2024-10-18 DRL Optimization Trajectory Generation via Wireless Network Intent-Guided Diffusion Models for Optimizing Resource Allocation Junjie Wu et.al. 2410.14481 null
2024-10-18 From Simple to Complex: Knowledge Transfer in Safe and Efficient Reinforcement Learning for Autonomous Driving Rongliang Zhou et.al. 2410.14468 null
2024-10-18 MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation Toby Godfrey et.al. 2410.14383 null
2024-10-17 Diffusing States and Matching Scores: A New Framework for Imitation Learning Runzhe Wu et.al. 2410.13855 link
2024-10-17 ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization Chen Bo Calvin Zhang et.al. 2410.13837 link
2024-10-17 A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement Hui Yuan et.al. 2410.13828 link
2024-10-17 Guided Reinforcement Learning for Robust Multi-Contact Loco-Manipulation Jean-Pierre Sleiman et.al. 2410.13817 null
2024-10-17 Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible? Argyrios Gerogiannis et.al. 2410.13772 null
2024-10-17 Transformer Guided Coevolution: Improved Team Formation in Multiagent Adversarial Games Pranav Rajbhandari et.al. 2410.13769 null
2024-10-17 Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design Chenyu Wang et.al. 2410.13643 link
2024-10-17 Ornstein-Uhlenbeck Adaptation as a Mechanism for Learning in Brains and Machines Jesus Garcia Fernandez et.al. 2410.13563 null
2024-10-17 Contracting With a Reinforcement Learning Agent by Playing Trick or Treat Matteo Bollini et.al. 2410.13520 null
2024-10-17 Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning Yoav Alon et.al. 2410.13501 null
2024-10-16 Neural-based Control for CubeSat Docking Maneuvers Matteo Stoisa et.al. 2410.12703 null
2024-10-16 Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach Henrique Donâncio et.al. 2410.12598 null
2024-10-16 Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving Sihao Wu et.al. 2410.12568 null
2024-10-16 Spectrum Sharing using Deep Reinforcement Learning in Vehicular Networks Riya Dinesh Deshpande et.al. 2410.12521 null
2024-10-16 Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL Jared Joselowitz et.al. 2410.12491 null
2024-10-16 SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling Loris Gaven et.al. 2410.12481 null
2024-10-16 Sharpness-Aware Black-Box Optimization Feiyang Ye et.al. 2410.12457 null
2024-10-16 AoI-Aware Resource Allocation for Smart Multi-QoS Provisioning Jingqing Wang et.al. 2410.12384 null
2024-10-16 PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking Markus J. Buehler et.al. 2410.12375 link
2024-10-16 GAN Based Top-Down View Synthesis in Reinforcement Learning Environments Usama Younus et.al. 2410.12372 null
2024-10-15 Molecular Quantum Control Algorithm Design by Reinforcement Learning Anastasia Pipi et.al. 2410.11839 null
2024-10-15 Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions Ayush Jain et.al. 2410.11833 null
2024-10-15 Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies Zixuan Chen et.al. 2410.11825 null
2024-10-15 Solving The Dynamic Volatility Fitting Problem: A Deep Reinforcement Learning Approach Emmanuel Gnabeyeu et.al. 2410.11789 null
2024-10-15 Zero-shot Model-based Reinforcement Learning using Large Language Models Abdelhakim Benechehab et.al. 2410.11711 link
2024-10-15 BlendRL: A Framework for Merging Symbolic and Neural Policy Learning Hikaru Shindo et.al. 2410.11689 null
2024-10-15 Understanding Likelihood Over-optimisation in Direct Alignment Algorithms Zhengyan Shi et.al. 2410.11677 null
2024-10-15 Safety Filtering While Training: Improving the Performance and Sample Efficiency of Reinforcement Learning Agents Federico Pizarro Bejarano et.al. 2410.11671 link
2024-10-15 Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search Jiamian Li et.al. 2410.11642 null
2024-10-15 DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment Wendi Chen et.al. 2410.11584 link
2024-10-14 Adaptive Diffusion Terrain Generator for Autonomous Uneven Terrain Navigation Youwei Yu et.al. 2410.10766 null
2024-10-14 Online Statistical Inference for Time-varying Sample-averaged Q-learning Saunak Kumar Panda et.al. 2410.10737 null
2024-10-14 Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach Rory Young et.al. 2410.10674 null
2024-10-14 Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning William A. Stigall et.al. 2410.10660 null
2024-10-14 DR-MPC: Deep Residual Model Predictive Control for Real-world Social Navigation James R. Han et.al. 2410.10646 null
2024-10-14 Traversability-Aware Legged Navigation by Learning from Real-World Visual Data Hongbo Zhang et.al. 2410.10621 null
2024-10-14 Online waveform selection for cognitive radar Thulasi Tholeti et.al. 2410.10591 null
2024-10-14 STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with FeedBack Naman Gupta et.al. 2410.10584 null
2024-10-14 Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes Juan Sebastian Rojas et.al. 2410.10578 null
2024-10-14 Continual Deep Reinforcement Learning to Prevent Catastrophic Forgetting in Jamming Mitigation Kemal Davaslioglu et.al. 2410.10521 null
2024-10-11 Hierarchical Universal Value Function Approximators Rushiv Arora et.al. 2410.08997 null
2024-10-11 Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control Devdhar Patel et.al. 2410.08979 null
2024-10-11 MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL Claas A Voelcker et.al. 2410.08896 null
2024-10-11 Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient Wenlong Wang et.al. 2410.08893 link
2024-10-11 Adaptive optimization of wave energy conversion in oscillatory wave surge converters via SPH simulation and deep reinforcement learning Mai Ye et.al. 2410.08871 null
2024-10-11 Can we hop in general? A discussion of benchmark selection and design using the Hopper environment Claas A Voelcker et.al. 2410.08870 null
2024-10-11 Hybrid LLM-DDQN based Joint Optimization of V2I Communication and Autonomous Driving Zijiang Yan et.al. 2410.08854 null
2024-10-11 Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback Michelle Zhao et.al. 2410.08852 null
2024-10-11 Public Transport Network Design for Equality of Accessibility via Message Passing Neural Networks and Reinforcement Learning Duo Wang et.al. 2410.08841 null
2024-10-11 SOLD: Reinforcement Learning with Slot Object-Centric Latent Dynamics Malte Mosbach et.al. 2410.08822 null
2024-10-10 GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment Yuancheng Xu et.al. 2410.08193 null
2024-10-10 Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning Amrith Setlur et.al. 2410.08146 null
2024-10-10 VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers Jianing Qi et.al. 2410.08048 null
2024-10-10 Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching Xiaoshan Lin et.al. 2410.08022 null
2024-10-10 Neuroplastic Expansion in Deep Reinforcement Learning Jiashun Liu et.al. 2410.07994 null
2024-10-10 Variational Inequality Methods for Multi-Agent Reinforcement Learning: Performance and Stability Gains Baraah A. M. Sidahmed et.al. 2410.07976 null
2024-10-10 AI Surrogate Model for Distributed Computing Workloads David K. Park et.al. 2410.07940 null
2024-10-10 Offline Hierarchical Reinforcement Learning via Inverse Optimization Carolin Schmidt et.al. 2410.07933 null
2024-10-10 Efficient Reinforcement Learning with Large Language Model Priors Xue Yan et.al. 2410.07927 null
2024-10-10 Meta-Learning Integration in Hierarchical Reinforcement Learning for Advanced Task Complexity Arash Khajooeinejad et.al. 2410.07921 link
2024-10-09 One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation Fabian Paischer et.al. 2410.07170 null
2024-10-09 Retrieval-Augmented Decision Transformer: External Memory for In-context RL Thomas Schmied et.al. 2410.07071 null
2024-10-09 Safe Reinforcement Learning Filter for Multicopter Collision-Free Tracking under disturbances Qihan Qi et.al. 2410.06852 null
2024-10-09 A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering Qihan Qi et.al. 2410.06847 null
2024-10-09 Transfer Learning for a Class of Cascade Dynamical Systems Shima Rabiei et.al. 2410.06828 null
2024-10-09 Deep End-to-End Survival Analysis with Temporal Consistency Mariana Vargas Vieyra et.al. 2410.06786 null
2024-10-09 Q-WSL: Leveraging Dynamic Programming for Weighted Supervised Learning in Goal-conditioned RL Xing Lei et.al. 2410.06648 null
2024-10-09 Variations in Multi-Agent Actor-Critic Frameworks for Joint Optimizations in UAV Swarm Networks: Recent Evolution, Challenges, and Directions Muhammad Morshed Alam et.al. 2410.06627 null
2024-10-09 Effective Exploration Based on the Structural Information Principles Xianghua Zeng et.al. 2410.06621 null
2024-10-09 Disturbance Observer-based Control Barrier Functions with Residual Model Learning for Safe Reinforcement Learning Dvij Kalaria et.al. 2410.06570 null
2024-10-07 DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control Kaifeng Zhao et.al. 2410.05260 null
2024-10-07 SePPO: Semi-Policy Preference Optimization for Diffusion Alignment Daoan Zhang et.al. 2410.05255 link
2024-10-07 ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control Ehsan Futuhi et.al. 2410.05225 null
2024-10-07 Smart Jamming Attack and Mitigation on Deep Transfer Reinforcement Learning Enabled Resource Allocation for Network Slicing Shavbo Salehi et.al. 2410.05153 null
2024-10-07 PAMLR: A Passive-Active Multi-Armed Bandit-Based Solution for LoRa Channel Allocation Jihoon Yun et.al. 2410.05147 null
2024-10-07 Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning Ayano Hiranaka et.al. 2410.05116 null
2024-10-07 AlphaRouter: Quantum Circuit Routing with Reinforcement Learning and Tree Search Wei Tang et.al. 2410.05115 null
2024-10-07 Reinforcement Learning Control for Autonomous Hydraulic Material Handling Machines with Underactuated Tools Filippo A. Spinelli et.al. 2410.05093 null
2024-10-07 HE-Drive: Human-Like End-to-End Driving with Vision Language Models Junming Wang et.al. 2410.05051 null
2024-10-07 Active Fine-Tuning of Generalist Policies Marco Bagatella et.al. 2410.05026 null
2024-10-04 Learning Humanoid Locomotion over Challenging Terrain Ilija Radosavovic et.al. 2410.03654 null
2024-10-04 Aligning LLMs with Individual Preferences via Interaction Shujin Wu et.al. 2410.03642 link
2024-10-04 Robust Offline Imitation Learning from Diverse Auxiliary Data Udita Ghosh et.al. 2410.03626 null
2024-10-04 Open-World Reinforcement Learning over Long Short-Term Imagination Jiajian Li et.al. 2410.03618 null
2024-10-04 Training on more Reachable Tasks for Generalisation in Reinforcement Learning Max Weltevrede et.al. 2410.03565 null
2024-10-04 GAP-RL: Grasps As Points for RL Towards Dynamic Object Grasping Pengwei Xie et.al. 2410.03509 null
2024-10-04 STREAMS: An Assistive Multimodal AI Framework for Empowering Biosignal Based Robotic Controls Ali Rabiee et.al. 2410.03486 null
2024-10-04 Deep Reinforcement Learning for Delay-Optimized Task Offloading in Vehicular Fog Computing Mohammad Parsa Toopchinezhad et.al. 2410.03472 null
2024-10-04 CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control Guy Tevet et.al. 2410.03441 link
2024-10-04 ToolGen: Unified Tool Retrieval and Calling via Generation Renxi Wang et.al. 2410.03439 link
2024-10-03 ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI Ahmad Elawady et.al. 2410.02751 link
2024-10-03 MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions Yekun Chai et.al. 2410.02743 link
2024-10-03 DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects Zhaowei Wang et.al. 2410.02730 link
2024-10-03 Grounded Answers for Multi-agent Decision-making Problem through Generative World Model Zeyang Liu et.al. 2410.02664 null
2024-10-03 Beyond Expected Returns: A Policy Gradient Algorithm for Cumulative Prospect Theoretic Reinforcement Learning Olivier Lepel et.al. 2410.02605 null
2024-10-03 Boosting Sample Efficiency and Generalization in Multi-agent Reinforcement Learning via Equivariance Joshua McClellan et.al. 2410.02581 null
2024-10-03 Machine Learning Approaches for Active Queue Management: A Survey, Taxonomy, and Future Directions Mohammad Parsa Toopchinezhad et.al. 2410.02563 null
2024-10-03 Semantic-Guided RL for Interpretable Feature Engineering Mohamed Bouadi et.al. 2410.02519 null
2024-10-03 Learning Emergence of Interaction Patterns across Independent RL Agents in Multi-Agent Environments Vasanth Reddy Baddam et.al. 2410.02516 null
2024-10-03 A Hitchhiker’s Guide To Active Motion Tobias Plasczyk et.al. 2410.02515 null
2024-10-02 Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space Yangming Li et.al. 2410.01796 null
2024-10-02 Open Human-Robot Collaboration using Decentralized Inverse Reinforcement Learning Prasanth Sengadu Suresh et.al. 2410.01790 null
2024-10-02 Investigating on RLHF methodology Alexey Kutalev et.al. 2410.01789 null
2024-10-02 Social coordination perpetuates stereotypic expectations and behaviors across generations in deep multi-agent reinforcement learning Rebekah A. Gelpí et.al. 2410.01763 null
2024-10-02 PreND: Enhancing Intrinsic Motivation in Reinforcement Learning through Pre-trained Network Distillation Mohammadamin Davoodabadi et.al. 2410.01745 null
2024-10-02 Mimicking Human Intuition: Cognitive Belief-Driven Q-Learning Xingrui Gu et.al. 2410.01739 null
2024-10-02 Evaluating Robustness of Reward Models for Mathematical Reasoning Sunghwan Kim et.al. 2410.01729 null
2024-10-02 Performant, Memory Efficient and Scalable Multi-Agent Reinforcement Learning Omayma Mahjoub et.al. 2410.01706 null
2024-10-02 VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment Amirhossein Kazemnejad et.al. 2410.01679 link
2024-10-02 Finding path and cycle counting formulae in graphs with Deep Reinforcement Learning Jason Piquenot et.al. 2410.01661 null
2024-09-30 Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning Zhishuai Liu et.al. 2409.20521 null
2024-09-30 Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation Fukang Liu et.al. 2409.20514 null
2024-09-30 The Perfect Blend: Redefining RLHF with Mixture of Judges Tengyu Xu et.al. 2409.20370 null
2024-10-01 Enhancing GANs with Contrastive Learning-Based Multistage Progressive Finetuning SNN and RL-Based External Optimization Osama Mustafa et.al. 2409.20340 null
2024-09-30 MARLadona – Towards Cooperative Team Play Using Multi-Agent Reinforcement Learning Zichong Li et.al. 2409.20326 null
2024-09-30 RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning Yuxuan Wu et.al. 2409.20291 null
2024-09-30 Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning Junlin Lu et.al. 2409.20258 link
2024-09-30 Professor X: Manipulating EEG BCI with Invisible and Robust Backdoor Attack Xuan-Hao Liu et.al. 2409.20158 null
2024-09-30 GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation Yangtao Chen et.al. 2409.20154 null
2024-09-30 DRLinSPH: An open-source platform using deep reinforcement learning and SPHinXsys for fluid-structure-interaction problems Mai Ye et.al. 2409.20134 null
2024-09-27 Robust Deep Reinforcement Learning for Volt-VAR Optimization in Active Distribution System under Uncertainty Zhengrong Chen et.al. 2409.18937 null
2024-09-27 HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models Yu Zhou et.al. 2409.18893 null
2024-09-27 ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning Jannis Becktepe et.al. 2409.18827 link
2024-09-27 LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis Hamed Babaei Giglou et.al. 2409.18812 null
2024-09-27 Autoregressive Policy Optimization for Constrained Allocation Tasks David Winkel et.al. 2409.18735 link
2024-09-27 Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning Sheikh Salman Hassan et.al. 2409.18718 null
2024-09-27 Refutation of Spectral Graph Theory Conjectures with Search Algorithms Milo Roucairol et.al. 2409.18626 null
2024-09-27 TemporalPaD: a reinforcement-learning framework for temporal feature representation and dimension reduction Xuechen Mu et.al. 2409.18597 null
2024-09-27 Climate Adaptation with Reinforcement Learning: Experiments with Flooding and Transportation in Copenhagen Miguel Costa et.al. 2409.18574 null
2024-09-27 Cost-Aware Dynamic Cloud Workflow Scheduling using Self-Attention and Evolutionary Reinforcement Learning Ya Shen et.al. 2409.18444 null
2024-09-26 Inverse Reinforcement Learning with Multiple Planning Horizons Jiayu Yao et.al. 2409.18051 null
2024-09-26 Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles Lewei He et.al. 2409.18014 null
2024-09-26 LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots Peilin Wu et.al. 2409.17992 null
2024-09-26 Navigation in a simplified Urban Flow through Deep Reinforcement Learning Federica Tonti et.al. 2409.17922 null
2024-09-26 Model-Free versus Model-Based Reinforcement Learning for Fixed-Wing UAV Attitude Control Under Varying Wind Conditions David Olivares et.al. 2409.17896 null
2024-09-26 Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness Jian Li et.al. 2409.17791 link
2024-09-26 Robust Ladder Climbing with a Quadrupedal Robot Dylan Vogel et.al. 2409.17731 null
2024-09-26 Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization Kaden Uhlig et.al. 2409.17673 null
2024-09-26 Hierarchical End-to-End Autonomous Driving: Integrating BEV Perception with Deep Reinforcement Learning Siyi Lu et.al. 2409.17659 null
2024-09-26 FactorSim: Generative Simulation via Factorized Representation Fan-Yun Sun et.al. 2409.17652 null
2024-09-25 Learning with Dynamics: Autonomous Regulation of UAV Based Communication Networks with Dynamic UAV Crew Ran Zhang et.al. 2409.17139 null
2024-09-25 Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action Xin Chen et.al. 2409.17138 null
2024-09-25 On-orbit Servicing for Spacecraft Collision Avoidance With Autonomous Decision Making Susmitha Patnala et.al. 2409.17125 null
2024-09-25 AI-Driven Risk-Aware Scheduling for Active Debris Removal Missions Antoine Poupon et.al. 2409.17012 null
2024-09-25 Multi-Robot Informative Path Planning for Efficient Target Mapping using Deep Reinforcement Learning Apoorva Vashisth et.al. 2409.16967 link
2024-09-25 Dynamic Obstacle Avoidance through Uncertainty-Based Adaptive Planning with Diffusion Vineet Punyamoorty et.al. 2409.16950 null
2024-09-25 Enhancing Temporal Sensitivity and Reasoning for Time-Sensitive Question Answering Wanqi Yang et.al. 2409.16909 null
2024-09-25 Revisiting Space Mission Planning: A Reinforcement Learning-Guided Approach for Multi-Debris Rendezvous Agni Bandyopadhyay et.al. 2409.16882 null
2024-09-25 Behavior evolution-inspired approach to walking gait reinforcement training for quadruped robots Yu Wang et.al. 2409.16862 null
2024-09-25 Asynchronous Fractional Multi-Agent Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing Lyudong Jin et.al. 2409.16832 null
2024-09-24 A Critical Review of Safe Reinforcement Learning Techniques in Smart Grid Applications Van-Hai Bui et.al. 2409.16256 null
2024-09-24 Context-Based Meta Reinforcement Learning for Robust and Adaptable Peg-in-Hole Assembly Tasks Ahmed Shokry et.al. 2409.16208 null
2024-09-24 Microsecond-Latency Feedback at a Particle Accelerator by Online Reinforcement Learning on Hardware Luca Scomparin et.al. 2409.16177 null
2024-09-24 The Digital Transformation in Health: How AI Can Improve the Performance of Health Systems África Periáñez et.al. 2409.16098 null
2024-09-24 Whole-body end-effector pose tracking Tifanny Portela et.al. 2409.16048 null
2024-09-24 Bridging Environments and Language with Rendering Functions and Vision-Language Models Theo Cachet et.al. 2409.16024 null
2024-09-24 Provably Efficient Exploration in Inverse Constrained Reinforcement Learning Bo Yue et.al. 2409.15963 null
2024-09-24 Overcoming Reward Model Noise in Instruction-Guided Reinforcement Learning Sukai Huang et.al. 2409.15922 null
2024-09-24 Multi-UAV Pursuit-Evasion with Online Planning in Unknown Environments by Deep Reinforcement Learning Jiayu Chen et.al. 2409.15866 null
2024-09-24 Adaptive Learn-then-Test: Statistically Valid and Efficient Hyperparameter Selection Matteo Zecchin et.al. 2409.15844 null
2024-09-18 DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control Zichen Jeff Cui et.al. 2409.12192 null
2024-09-18 Robots that Learn to Safely Influence via Prediction-Informed Reach-Avoid Dynamic Games Ravi Pandya et.al. 2409.12153 null
2024-09-18 Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features Jiuqi Wang et.al. 2409.12135 null
2024-09-18 Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement An Yang et.al. 2409.12122 null
2024-09-18 IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition Rui Liu et.al. 2409.12092 null
2024-09-18 Generalized Robot Learning Framework Jiahuan Yan et.al. 2409.12061 null
2024-09-23 Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning Jonas Günster et.al. 2409.12045 link
2024-09-18 Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning Claude Formanek et.al. 2409.12001 null
2024-09-18 Data-Efficient Quadratic Q-Learning Using LMIs J. S. van Hulst et.al. 2409.11986 null
2024-09-18 Reinforcement Learning with Lie Group Orientations for Robotics Martin Schuck et.al. 2409.11935 null
2024-09-17 UniLCD: Unified Local-Cloud Decision-Making via Reinforcement Learning Kathakoli Sengupta et.al. 2409.11403 null
2024-09-17 Integrating Reinforcement Learning and Model Predictive Control with Applications to Microgrids Caio Fabio Oliveira da Silva et.al. 2409.11267 null
2024-09-17 Attacking Slicing Network via Side-channel Reinforcement Learning Attack Wei Shao et.al. 2409.11258 null
2024-09-17 LLM-as-a-Judge & Reward Model: What They Can and Cannot Do Guijin Son et.al. 2409.11239 null
2024-09-17 Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems Jake Welde et.al. 2409.11238 null
2024-09-17 Linear Jamming Bandits: Learning to Jam 5G-based Coded Communications Systems Zachary Schutz et.al. 2409.11191 null
2024-09-17 Preventing Unconstrained CBF Safety Filters Caused by Invalid Relative Degree Assumptions Lukas Brunke et.al. 2409.11171 null
2024-09-17 Co-Designing Tools and Control Policies for Robust Manipulation Yifei Dong et.al. 2409.11113 null
2024-09-17 Reactive Environments for Active Inference Agents with RxEnvironments.jl Wouter W. L. Nuijten et.al. 2409.11087 link
2024-09-17 A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler Nazim Bendib et.al. 2409.11068 null
2024-09-16 Instigating Cooperation among LLM Agents Using Adaptive Information Modulation Qiliang Chen et.al. 2409.10372 null
2024-09-16 Catch It! Learning to Catch in Flight with Mobile Dexterous Hands Yuanhang Zhang et.al. 2409.10319 null
2024-09-16 ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework Jiahao Yuan et.al. 2409.10289 null
2024-09-16 Safety-Oriented Pruning and Interpretation of Reinforcement Learning Policies Dennis Gross et.al. 2409.10218 null
2024-09-16 Enhancing RL Safety with Counterfactual LLM Reasoning Dennis Gross et.al. 2409.10188 null
2024-09-16 Safe and Stable Closed-Loop Learning for Neural-Network-Supported Model Predictive Control Sebastian Hirt et.al. 2409.10171 null
2024-09-16 Quantile Regression for Distributional Reward Models in RLHF Nicolai Dorka et.al. 2409.10164 link
2024-09-16 Robust Reinforcement Learning with Dynamic Distortion Risk Measures Anthony Coache et.al. 2409.10096 null
2024-09-16 Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments Wessel Ledder et.al. 2409.10048 null
2024-09-16 Reinforcement learning-based statistical search strategy for an axion model from flavor Satsuki Nishimura et.al. 2409.10023 null
2024-09-13 The unknotting number, hard unknot diagrams, and reinforcement learning Taylor Applebaum et.al. 2409.09032 null
2024-09-13 Modeling Rational Adaptation of Visual Search to Hierarchical Structures Saku Sourulahti et.al. 2409.08967 null
2024-09-13 Average-Reward Maximum Entropy Reinforcement Learning for Underactuated Double Pendulum Tasks Jean Seong Bjorn Choe et.al. 2409.08938 null
2024-09-13 AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models Yifei Yao et.al. 2409.08904 null
2024-09-13 Deep reinforcement learning for tracking a moving target in jellyfish-like swimming Yihao Chen et.al. 2409.08815 null
2024-09-13 DexSim2Real$^{2}$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation Taoran Jiang et.al. 2409.08750 null
2024-09-13 Quasimetric Value Functions with Dense Rewards Khadichabonu Valieva et.al. 2409.08724 null
2024-09-13 Secure Offloading in NOMA-Aided Aerial MEC Systems Based on Deep Reinforcement Learning Hongjiang Lei et.al. 2409.08579 null
2024-09-13 Batch Ensemble for Variance Dependent Regret in Stochastic Bandits Asaf Cassel et.al. 2409.08570 null
2024-09-13 OIDM: An Observability-based Intelligent Distributed Edge Sensing Method for Industrial Cyber-Physical Systems Shigeng Wang et.al. 2409.08549 null
2024-09-12 Hand-Object Interaction Pretraining from Videos Himanshu Gaurav Singh et.al. 2409.08273 null
2024-09-12 Multi-Model based Federated Learning Against Model Poisoning Attack: A Deep Learning Based Model Selection for MEC Systems Somayeh Kianpisheh et.al. 2409.08237 null
2024-09-12 Towards Online Safety Corrections for Robotic Manipulation Policies Ariana Spalter et.al. 2409.08233 null
2024-09-12 Design Optimization of Nuclear Fusion Reactor through Deep Reinforcement Learning Jinsu Kim et.al. 2409.08231 null
2024-09-12 Adaptive Language-Guided Abstraction from Contrastive Explanations Andi Peng et.al. 2409.08212 null
2024-09-12 Optimal Management of Grid-Interactive Efficient Buildings via Safe Reinforcement Learning Xiang Huo et.al. 2409.08132 null
2024-09-12 Linear Complementary Dual Codes Constructed from Reinforcement Learning Yansheng Wu et.al. 2409.08114 null
2024-09-12 Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning Teng Yan et.al. 2409.08062 null
2024-09-12 Learning Causally Invariant Reward Functions from Diverse Demonstrations Ivan Ovinnikov et.al. 2409.08012 null
2024-09-12 Digital Twin for Autonomous Guided Vehicles based on Integrated Sensing and Communications Van-Phuc Bui et.al. 2409.08005 null
2024-09-11 Autonomous loading of ore piles with Load-Haul-Dump machines using Deep Reinforcement Learning Rodrigo Salas et.al. 2409.07449 null
2024-09-11 Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation Luo Ji et.al. 2409.07416 null
2024-09-11 Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching Eugenio Chisari et.al. 2409.07343 null
2024-09-11 Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence Luo Ji et.al. 2409.07341 null
2024-09-11 A Framework for Predicting the Impact of Game Balance Changes through Meta Discovery Akash Saravanan et.al. 2409.07340 null
2024-09-11 Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences Ziang Liu et.al. 2409.07268 null
2024-09-11 Perceptive Pedipulation with Local Obstacle Avoidance Jonas Stolle et.al. 2409.07195 null
2024-09-11 A Perspective on AI-Guided Molecular Simulations in VR: Exploring Strategies for Imitation Learning in Hyperdimensional Molecular Systems Mohamed Dhouioui et.al. 2409.07189 null
2024-09-11 Learning Efficient Recursive Numeral Systems via Reinforcement Learning Jonathan D. Thomas et.al. 2409.07170 null
2024-09-11 DCMAC: Demand-aware Customized Multi-Agent Communication via Upper Bound Training Dongkun Huo et.al. 2409.07127 null
2024-09-10 DemoStart: Demonstration-led auto-curriculum applied to sim-to-real with multi-fingered robots Maria Bauza et.al. 2409.06613 null
2024-09-10 Advancements in Gesture Recognition Techniques and Machine Learning for Enhanced Human-Robot Interaction: A Comprehensive Review Sajjad Hussain et.al. 2409.06503 null
2024-09-10 Superior Computer Chess with Model Predictive Control, Reinforcement Learning, and Rollout Atharva Gundawar et.al. 2409.06477 null
2024-09-10 Learning Generative Interactive Environments By Trained Agent Exploration Naser Kazemi et.al. 2409.06445 link
2024-09-10 Length Desensitization in Directed Preference Optimization Wei Liu et.al. 2409.06411 null
2024-09-10 One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion Nico Bohlinger et.al. 2409.06366 null
2024-09-10 Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning Shreyas S R et.al. 2409.06356 null
2024-09-10 Learning Augmentation Policies from A Model Zoo for Time Series Forecasting Haochen Yuan et.al. 2409.06282 null
2024-09-09 Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments Haritheja Etukuru et.al. 2409.05865 link
2024-09-09 An Introduction to Quantum Reinforcement Learning (QRL) Samuel Yen-Chi Chen et.al. 2409.05846 null
2024-09-09 Learning control of underactuated double pendulum with Model-Based Reinforcement Learning Niccolò Turcato et.al. 2409.05811 null
2024-09-09 Markov Chain Variance Estimation: A Stochastic Approximation Approach Shubhada Agrawal et.al. 2409.05733 null
2024-09-09 Cooperative Decision-Making for CAVs at Unsignalized Intersections: A MARL Approach with Attention and Hierarchical Game Priors Jiaqi Liu et.al. 2409.05712 null
2024-09-09 Interactive incremental learning of generalizable skills with local trajectory modulation Markus Knauer et.al. 2409.05655 null
2024-09-09 Forward KL Regularized Preference Optimization for Aligning Diffusion Policies Zhao Shan et.al. 2409.05622 null
2024-09-09 Adaptive Multi-Layer Deployment for A Digital Twin Empowered Satellite-Terrestrial Integrated Network Yihong Tao et.al. 2409.05480 null
2024-09-09 Reinforcement Learning for Variational Quantum Circuits Design Simone Foderà et.al. 2409.05475 null
2024-09-09 Semifactual Explanations for Reinforcement Learning Jasmina Gajcin et.al. 2409.05435 null
2024-09-06 RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs Jiaxing Wu et.al. 2409.04421 null
2024-09-06 Gaussian-Mixture-Model Q-Functions for Reinforcement Learning by Riemannian Optimization Minh Vu et.al. 2409.04374 null
2024-09-06 Refined Bounds on Near Optimality Finite Window Policies in POMDPs and Their Reinforcement Learning Yunus Emre Demirci et.al. 2409.04351 null
2024-09-06 Advancing Multi-Organ Disease Care: A Hierarchical Multi-Agent Reinforcement Learning Framework Daniel J. Tan et.al. 2409.04224 null
2024-09-06 The Prevalence of Neural Collapse in Neural Multivariate Regression George Andriopoulos et.al. 2409.04180 null
2024-09-06 Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering Jan Hofmann et.al. 2409.04122 null
2024-09-05 DRAL: Deep Reinforcement Adaptive Learning for Multi-UAVs Navigation in Unknown Indoor Environment Kangtong Mo et.al. 2409.03930 null
2024-09-05 Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning Huizhen Yu et.al. 2409.03915 null
2024-09-05 On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments Muxing Wang et.al. 2409.03897 null
2024-09-05 Multi-agent Path Finding for Mixed Autonomy Traffic Coordination Han Zheng et.al. 2409.03881 null
2024-09-05 Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron Christian Schmid et.al. 2409.03749 null
2024-09-05 Differentiable Discrete Event Simulation for Queuing Network Control Ethan Che et.al. 2409.03740 null
2024-09-05 On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization Yong Lin et.al. 2409.03650 null
2024-09-05 Modular Parallel Manipulator for Long-Term Soft Robotic Data Collection Kiyn Chin et.al. 2409.03614 null
2024-09-05 CHIRPs: Change-Induced Regret Proxy metrics for Lifelong Reinforcement Learning John Birkbeck et.al. 2409.03577 null
2024-09-05 Sparsifying Parametric Models with L0 Regularization Nicolò Botteghi et.al. 2409.03489 null
2024-09-05 Reinforcement Learning Approach to Optimizing Profilometric Sensor Trajectories for Surface Inspection Sara Roos-Hoefgeest et.al. 2409.03429 null
2024-09-05 Game On: Towards Language Models as RL Experimenters Jingwei Zhang et.al. 2409.03402 null
2024-09-05 ELO-Rated Sequence Rewards: Advancing Reinforcement Learning Models Qi Ju et.al. 2409.03301 link
2024-09-05 Robust synchronization and policy adaptation for networked heterogeneous agents Miguel F. Arevalo-Castiblanco et.al. 2409.03273 null
2024-09-04 Hybrid Imitation-Learning Motion Planner for Urban Driving Cristian Gariboldi et.al. 2409.02871 null
2024-09-04 Knowledge Transfer for Collaborative Misbehavior Detection in Untrusted Vehicular Environments Roshan Sedar et.al. 2409.02844 null
2024-09-04 Tractable Offline Learning of Regular Decision Processes Ahana Deb et.al. 2409.02747 null
2024-09-04 Surgical Task Automation Using Actor-Critic Frameworks and Self-Supervised Imitation Learning Jingshuai Liu et.al. 2409.02724 null
2024-09-04 Decision Transformer for Enhancing Neural Local Search on the Job Shop Scheduling Problem Constantin Waubert de Puiseau et.al. 2409.02697 null
2024-09-04 Causality-Aware Transformer Networks for Robotic Navigation Ruoyu Wang et.al. 2409.02669 null
2024-09-04 A Survey on Emergent Language Jannik Peters et.al. 2409.02645 null
2024-09-04 Mamba as a motion encoder for robotic imitation learning Toshiaki Tsuji et.al. 2409.02636 null
2024-09-04 Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal Jifeng Hu et.al. 2409.02512 null
2024-09-04 USV-AUV Collaboration Framework for Underwater Tasks under Extreme Sea Conditions Jingzehua Xu et.al. 2409.02444 null
2024-08-30 Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control Zihao Sheng et.al. 2408.17380 link
2024-08-30 Stationary Policies are Optimal in Risk-averse Total-reward MDPs with EVaR Xihong Su et.al. 2408.17286 null
2024-08-30 Using Quantum Solved Deep Boltzmann Machines to Increase the Data Efficiency of RL Agents Daniel Kent et.al. 2408.17240 null
2024-08-30 MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models Yujing Wang et.al. 2408.17072 null
2024-08-30 Efficient Camera Exposure Control for Visual Odometry via Deep Reinforcement Learning Shuyang Zhang et.al. 2408.17005 link
2024-08-30 A Tighter Convergence Proof of Reverse Experience Replay Nan Jiang et.al. 2408.16999 link
2024-08-30 Discovery of False Data Injection Schemes on Frequency Controllers with Reinforcement Learning Romesh Prasad et.al. 2408.16958 null
2024-08-29 FlowRetrieval: Flow-Guided Data Retrieval for Few-Shot Imitation Learning Li-Heng Lin et.al. 2408.16944 null
2024-08-29 Manipulating OpenFlow Link Discovery Packet Forwarding for Topology Poisoning Mingming Chen et.al. 2408.16940 null
2024-08-29 Coverage Analysis of Multi-Environment Q-Learning Algorithms for Wireless Network Optimization Talha Bozkus et.al. 2408.16882 null
2024-08-29 Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models Alec Solway et.al. 2408.16753 null
2024-08-29 A GREAT Architecture for Edge-Based Graph Problems Like TSP Attila Lischka et.al. 2408.16717 null
2024-08-29 RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model Zhuan Shi et.al. 2408.16634 null
2024-08-29 Optimizing Automated Picking Systems in Warehouse Robots Using Machine Learning Keqin Li et.al. 2408.16633 null
2024-08-29 Phase Optimization and Relay Selection for Joint Relay and IRS-Assisted Communication Uyoata E. Uyoata et.al. 2408.16399 null
2024-08-29 EasyChauffeur: A Baseline Advancing Simplicity and Efficiency on Waymax Lingyu Xiao et.al. 2408.16375 null
2024-08-29 Efficient Multi-agent Navigation with Lightweight DRL Policy Xingrong Diao et.al. 2408.16370 null
2024-08-29 On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes Yi Wan et.al. 2408.16262 null
2024-08-28 DECAF: a Discrete-Event based Collaborative Human-Robot Framework for Furniture Assembly Giulio Giacomuzzo et.al. 2408.16125 null
2024-08-28 RAIN: Reinforcement Algorithms for Improving Numerical Weather and Climate Models Pritthijit Nath et.al. 2408.16118 link
2024-08-28 In-Context Imitation Learning via Next-Token Prediction Letian Fu et.al. 2408.15980 link
2024-08-28 Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games Nicholas R. Waytowich et.al. 2408.15950 null
2024-08-28 DeMoBot: Deformable Mobile Manipulation with Vision-based Sub-goal Retrieval Yuying Zhang et.al. 2408.15919 null
2024-08-28 Adaptive Traffic Signal Control Using Reinforcement Learning Muhammad Tahir Rafique et.al. 2408.15751 null
2024-08-28 Deep Reinforcement Learning for Radiative Heat Transfer Optimization Problems Eva Ortiz-Mansilla et.al. 2408.15727 null
2024-08-28 Comparison of Model Predictive Control and Proximal Policy Optimization for a 1-DOF Helicopter System Georg Schäfer et.al. 2408.15633 null
2024-08-28 Structural Optimization of Lightweight Bipedal Robot via SERL Yi Cheng et.al. 2408.15632 null
2024-08-28 Statistical QoS Provision in Business-Centric Networks Chang Wu et.al. 2408.15609 null
2024-08-28 Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning Minjong Yoo et.al. 2408.15593 null
2024-08-28 Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits Woojin Jeong et.al. 2408.15535 null
2024-08-27 SpecGuard: Specification Aware Recovery for Robotic Autonomous Vehicles from Physical Attacks Pritam Dash et.al. 2408.15200 null
2024-08-27 Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning Batuhan Yardim et.al. 2408.15173 null
2024-08-27 Applications in CityLearn Gym Environment for Multi-Objective Control Benchmarking in Grid-Interactive Buildings and Districts Kingsley Nweye et.al. 2408.15170 null
2024-08-27 muPRL: A Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults Deepak-George Thomas et.al. 2408.15150 null
2024-08-27 No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery Alexander Rutherford et.al. 2408.15099 link
2024-08-27 MiWaves Reinforcement Learning Algorithm Susobhan Ghosh et.al. 2408.15076 null
2024-08-27 Earth Observation Satellite Scheduling with Graph Neural Networks Antoine Jacquet et.al. 2408.15041 null
2024-08-27 Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data Han Xia et.al. 2408.14874 null
2024-08-27 Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation Haozhe Lou et.al. 2408.14873 null
2024-08-27 Learning Robust Reward Machines from Noisy Labels Roko Parac et.al. 2408.14871 link
2024-08-26 Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning Xinyang Gu et.al. 2408.14472 null
2024-08-26 Equivariant Reinforcement Learning under Partial Observability Hai Nguyen et.al. 2408.14336 null
2024-08-26 Efficient Active Flow Control Strategy for Confined Square Cylinder Wake Using Deep Learning-Based Surrogate Model and Reinforcement Learning Meng Zhang et.al. 2408.14232 null
2024-08-26 DynamicRouteGPT: A Real-Time Multi-Vehicle Dynamic Navigation Framework Based on Large Language Models Ziai Zhou et.al. 2408.14185 null
2024-08-26 Robot Navigation with Entity-Based Collision Avoidance using Deep Reinforcement Learning Yury Kolomeytsev et.al. 2408.14183 null
2024-08-26 ReLExS: Reinforcement Learning Explanations for Stackelberg No-Regret Learners Xiangge Huang et.al. 2408.14086 null
2024-08-26 Bridging the gap between Learning-to-plan, Motion Primitives and Safe Reinforcement Learning Piotr Kicki et.al. 2408.14063 null
2024-08-26 Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning Joey Hejna et.al. 2408.14037 link
2024-08-26 Optimizing TD3 for 7-DOF Robotic Arm Grasping: Overcoming Suboptimality with Exploration-Enhanced Contrastive Learning Wen-Han Hsieh et.al. 2408.14009 null
2024-08-26 Quantitative Representation of Scenario Difficulty for Autonomous Driving Based on Adversarial Policy Search Shuo Yang et.al. 2408.14000 null
2024-08-23 Optimally Solving Simultaneous-Move Dec-POMDPs: The Sequential Central Planning Approach Johan Peralez et.al. 2408.13139 null
2024-08-23 Diffusion-based Episodes Augmentation for Offline Multi-Agent Reinforcement Learning Jihwan Oh et.al. 2408.13092 null
2024-08-23 Guiding IoT-Based Healthcare Alert Systems with Large Language Models Yulan Gao et.al. 2408.13071 null
2024-08-23 cc-DRL: a Convex Combined Deep Reinforcement Learning Flight Control Design for a Morphing Quadrotor Tao Yang et.al. 2408.13054 null
2024-08-23 In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting Haowei Du et.al. 2408.13028 null
2024-08-23 Robust Iterative Value Conversion: Deep Reinforcement Learning for Neurochip-driven Edge Robots Yuki Kadokawa et.al. 2408.13018 null
2024-08-23 SUMO: Search-Based Uncertainty Estimation for Model-Based Offline Reinforcement Learning Zhongjian Qiao et.al. 2408.12970 null
2024-08-23 SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning Wang Luo et.al. 2408.12830 null
2024-08-23 DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation Xiaowei Mao et.al. 2408.12809 null
2024-08-23 Intelligent OPC Engineer Assistant for Semiconductor Manufacturing Guojin Chen et.al. 2408.12775 null
2024-08-22 Controllable Text Generation for Large Language Models: A Survey Xun Liang et.al. 2408.12599 link
2024-08-22 Automating Deformable Gasket Assembly Simeon Adebola et.al. 2408.12593 null
2024-08-22 Human-In-The-Loop Machine Learning for Safe and Ethical Autonomous Vehicles: Principles, Challenges, and Opportunities Yousef Emami et.al. 2408.12548 null
2024-08-22 PCGRL+: Scaling, Control and Generalization in Reinforcement Learning Level Generators Sam Earle et.al. 2408.12525 null
2024-08-22 EX-DRL: Hedging Against Heavy Losses with EXtreme Distributional Reinforcement Learning Parvin Malekzadeh et.al. 2408.12446 null
2024-08-22 Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning Yen-Ru Lai et.al. 2408.12307 null
2024-08-22 Domino-cooling Oscillator Networks with Deep Reinforcement Learning Sampreet Kalita et.al. 2408.12271 null
2024-08-22 UNCO: Towards Unifying Neural Combinatorial Optimization through Large Language Model Xia Jiang et.al. 2408.12214 null
2024-08-22 A Safety-Oriented Self-Learning Algorithm for Autonomous Driving: Evolution Starting from a Basic Model Shuo Yang et.al. 2408.12190 null
2024-08-22 A Safe and Efficient Self-evolving Algorithm for Decision-making and Control of Autonomous Driving Systems Shuo Yang et.al. 2408.12187 null
2024-08-21 Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction Anthony GX-Chen et.al. 2408.11816 null
2024-08-21 ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation Shiqi Yang et.al. 2408.11805 null
2024-08-21 Critique-out-Loud Reward Models Zachary Ankner et.al. 2408.11791 link
2024-08-21 Deviations from the Nash equilibrium and emergence of tacit collusion in a two-player optimal execution game with reinforcement learning Fabrizio Lillo et.al. 2408.11773 null
2024-08-21 Bayesian Optimization Framework for Efficient Fleet Design in Autonomous Multi-Robot Exploration David Molina Concha et.al. 2408.11751 null
2024-08-21 Optimizing Interpretable Decision Tree Policies for Reinforcement Learning Daniël Vos et.al. 2408.11632 link
2024-08-21 A Survey of Embodied Learning for Object-Centric Robotic Manipulation Ying Zheng et.al. 2408.11537 link
2024-08-22 Using Part-based Representations for Explainable Deep Reinforcement Learning Manos Kirtas et.al. 2408.11455 null
2024-08-21 Subgoal-based Hierarchical Reinforcement Learning for Multi-Agent Collaboration Cheng Xu et.al. 2408.11416 link
2024-08-21 Reflex-Based Open-Vocabulary Navigation without Prior Knowledge Using Omnidirectional Camera and Multiple Vision-Language Models Kento Kawaharazuka et.al. 2408.11380 null
2024-08-20 Accelerating Goal-Conditioned RL Algorithms and Research Michał Bortkiewicz et.al. 2408.11052 link
2024-08-20 RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands Yi Zhao et.al. 2408.11048 null
2024-08-20 Quantum Machine Learning Algorithms for Anomaly Detection: a Survey Sebastiano Corli et.al. 2408.11047 null
2024-08-20 Deep Reinforcement Learning for Network Energy Saving in 6G and Beyond Networks Dinh-Hieu Tran et.al. 2408.10974 null
2024-08-20 The Evolution of Reinforcement Learning in Quantitative Finance Nikolaos Pippas et.al. 2408.10932 null
2024-08-20 Knowledge Sharing and Transfer via Centralized Reward Agent for Multi-Task Reinforcement Learning Haozhe Ma et.al. 2408.10858 link
2024-08-20 Offline Model-Based Reinforcement Learning with Anti-Exploration Padmanaba Srinivasan et.al. 2408.10713 null
2024-08-20 Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation Shiming Xie et.al. 2408.10642 null
2024-08-20 Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search Jonathan Light et.al. 2408.10635 link
2024-08-20 Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams Litian Huang et.al. 2408.10592 link
2024-08-19 LEAD: Towards Learning-Based Equity-Aware Decarbonization in Ridesharing Platforms Mahsa Sahebdel et.al. 2408.10201 null
2024-08-19 Physics-Aware Combinatorial Assembly Planning using Deep Reinforcement Learning Ruixuan Liu et.al. 2408.10162 null
2024-08-19 $R^2$-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement Haoyang Wang et.al. 2408.10135 null
2024-08-19 Enhancing Reinforcement Learning Through Guided Search Jérôme Arjonilla et.al. 2408.10113 null
2024-08-19 Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning Sriyash Poddar et.al. 2408.10075 null
2024-08-19 Efficient Exploration in Deep Reinforcement Learning: A Novel Bayesian Actor-Critic Algorithm Nikolai Rozanov et.al. 2408.10055 null
2024-08-19 Adaptive BESS and Grid Setpoints Optimization: A Model-Free Framework for Efficient Battery Management under Dynamic Tariff Pricing Alaa Selim et.al. 2408.09989 null
2024-08-19 The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective Renye Yan et.al. 2408.09974 null
2024-08-19 GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits Gongpu Chen et.al. 2408.09882 null
2024-08-19 ShortCircuit: AlphaZero-Driven Circuit Design Dimitrios Tsaras et.al. 2408.09858 null
2024-08-16 HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis Zhi-Bo Liu et.al. 2408.08847 link
2024-08-16 CAT: Caution Aware Transfer in Reinforcement Learning via Distributional Risk Mohamad Fares El Hajj Chehade et.al. 2408.08812 null
2024-08-16 Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions Bhuvanashree Murugadoss et.al. 2408.08781 null
2024-08-16 SYMPOL: Symbolic Tree-Based On-Policy Reinforcement Learning Sascha Marton et.al. 2408.08761 link
2024-08-16 Efficient Multi-Policy Evaluation for Reinforcement Learning Shuze Liu et.al. 2408.08706 null
2024-08-16 Neural Reward Machines Elena Umili et.al. 2408.08677 link
2024-08-16 Fine-tuning LLMs for Autonomous Spacecraft Control: A Case Study Using Kerbal Space Program Alejandro Carrasco et.al. 2408.08676 link
2024-08-16 DeepREST: Automated Test Case Generation for REST APIs Exploiting Deep Reinforcement Learning Davide Corradini et.al. 2408.08594 null
2024-08-16 Multilevel Graph Reinforcement Learning for Consistent Cognitive Decision-making in Heterogeneous Mixed Autonomy Xin Gao et.al. 2408.08516 null
2024-08-16 Deep multi-intentional inverse reinforcement learning for cognitive multi-function radar inverse cognition Hancong Feng et.al. 2408.08478 null
2024-08-15 A Conflicts-free, Speed-lossless KAN-based Reinforcement Learning Decision System for Interactive Driving in Roundabouts Zhihao Lin et.al. 2408.08242 null
2024-08-15 Explaining an Agent’s Future Beliefs through Temporally Decomposing Future Reward Estimators Mark Towers et.al. 2408.08230 link
2024-08-15 DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search Huajian Xin et.al. 2408.08152 link
2024-08-15 Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players Pragnya Alatur et.al. 2408.08075 null
2024-08-15 An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation Jun Wang et.al. 2408.08047 null
2024-08-15 Adaptive User Journeys in Pharma E-Commerce with Reinforcement Learning: Insights from SwipeRx Ana Fernández del Río et.al. 2408.08024 null
2024-08-15 Experimental evaluation of offline reinforcement learning for HVAC control in buildings Jun Wang et.al. 2408.07986 link
2024-08-15 Meta SAC-Lag: Towards Deployable Safe Reinforcement Learning via MetaGradient-based Hyperparameter Tuning Homayoun Honari et.al. 2408.07962 null
2024-08-15 Solving a Rubik’s Cube Using its Local Graph Structure Shunyu Yao et.al. 2408.07945 null
2024-08-15 IReCa: Intrinsic Reward-enhanced Context-aware Reinforcement Learning for Human-AI Coordination Xin Hao et.al. 2408.07877 null
2024-08-14 Off-Policy Reinforcement Learning with High Dimensional Reward Dong Neuck Lee et.al. 2408.07660 null
2024-08-14 Adaptive Behavioral AI: Reinforcement Learning to Enhance Pharmacy Services Ana Fernández del Río et.al. 2408.07647 null
2024-08-14 SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning Jianye Xu et.al. 2408.07644 link
2024-08-14 Optimizing HIV Patient Engagement with Reinforcement Learning in Resource-Limited Settings África Periáñez et.al. 2408.07629 null
2024-08-14 A Nested Graph Reinforcement Learning-based Decision-making Strategy for Eco-platooning Xin Gao et.al. 2408.07578 null
2024-08-14 Large Language Models Know What Makes Exemplary Contexts Quanyu Long et.al. 2408.07505 null
2024-08-14 Large Language Models Prompting With Episodic Memory Dai Do et.al. 2408.07465 null
2024-08-14 Real-world validation of safe reinforcement learning, model predictive control and decision tree-based home energy management systems Julian Ruddick et.al. 2408.07435 null
2024-08-14 Bridging Training and Execution via Dynamic Directed Graph-Based Communication in Cooperative Multi-Agent Systems Zhuohui Zhang et.al. 2408.07397 null
2024-08-14 Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space Xiaoyang Yu et.al. 2408.07395 null
2024-08-13 LLMs can Schedule Henrik Abgaryan et.al. 2408.06993 link
2024-08-13 IRS-Assisted Lossy Communications Under Correlated Rayleigh Fading: Outage Probability Analysis and Optimization Guanchang Li et.al. 2408.06969 null
2024-08-13 Heavy-Ball Momentum Accelerated Actor-Critic With Function Approximation Yanjie Dong et.al. 2408.06945 null
2024-08-13 Multi-Agent Continuous Control with Generative Flow Networks Shuang Luo et.al. 2408.06920 link
2024-08-13 Personalized Dynamic Difficulty Adjustment – Imitation Learning Meets Reinforcement Learning Ronja Fuchs et.al. 2408.06818 link
2024-08-13 Integrating Saliency Ranking and Reinforcement Learning for Enhanced Object Detection Matthias Bartolo et.al. 2408.06803 link
2024-08-13 Residual Deep Reinforcement Learning for Inverter-based Volt-Var Control Qiong Liu et.al. 2408.06790 null
2024-08-13 Deep reinforcement learning for the management of the wall regeneration cycle in wall-bounded turbulent flows Giorgio Maria Cavallazzi et.al. 2408.06783 null
2024-08-13 Robust Deep Reinforcement Learning for Inverter-based Volt-Var Control in Partially Observable Distribution Networks Qiong Liu et.al. 2408.06776 null
2024-08-13 MAPPO-PIS: A Multi-Agent Proximal Policy Optimization Method with Prior Intent Sharing for CAVs’ Cooperative Decision-Making Yicheng Guo et.al. 2408.06656 link
2024-08-12 Body Transformer: Leveraging Robot Embodiment for Policy Learning Carmelo Sferrazza et.al. 2408.06316 null
2024-08-12 Inverse designing metamaterials with programmable nonlinear functional responses in graph space Marco Maurizi et.al. 2408.06300 null
2024-08-12 EyeSight Hand: Design of a Fully-Actuated Dexterous Robot Hand with Integrated Vision-Based Tactile Sensors and Compliant Actuation Branden Romero et.al. 2408.06265 null
2024-08-12 Stable-BC: Controlling Covariate Shift with Stable Behavior Cloning Shaunak A. Mehta et.al. 2408.06246 null
2024-08-12 Building Decision Making Models Through Language Model Regime Yu Zhang et.al. 2408.06087 null
2024-08-12 Sequential sampling without comparison to boundary through model-free reinforcement learning Jamal Esmaily et.al. 2408.06080 null
2024-08-12 Online Optimization of Curriculum Learning Schedules using Evolutionary Optimization Mohit Jiwatode et.al. 2408.06068 null
2024-08-12 GFlowNet Training by Policy Gradients Puhua Niu et.al. 2408.05885 link
2024-08-12 Multi-Agent Deep Reinforcement Learning Framework for Wireless MAC Protocol Design and Optimization Navid Keshtiarast et.al. 2408.05884 null
2024-08-11 Root Cause Attribution of Delivery Risks via Causal Discovery with Reinforcement Learning Shi Bo et.al. 2408.05860 null
2024-08-09 Deterministic remote entanglement using a chiral quantum interconnect Aziza Almanakly et.al. 2408.05164 null
2024-08-09 Kolmogorov-Arnold Network for Online Reinforcement Learning Victor Augusto Kich et.al. 2408.04841 null
2024-08-09 Multi-User MISO with Stacked Intelligent Metasurfaces: A DRL-Based Sum-Rate Optimization Approach Hao Liu et.al. 2408.04837 null
2024-08-09 Next-Generation Wi-Fi Networks with Generative AI: Design and Insights Jingyu Wang et.al. 2408.04835 null
2024-08-08 Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity Martin Smit et.al. 2408.04549 link
2024-08-08 Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs Kevin Tan et.al. 2408.04526 null
2024-08-08 Model-Based Transfer Learning for Contextual Reinforcement Learning Jung-Hoon Cho et.al. 2408.04498 null
2024-08-08 Reinforcement Learning from Human Feedback for Lane Changing of Autonomous Vehicles in Mixed Traffic Yuting Wang et.al. 2408.04447 null
2024-08-08 Non-maximizing policies that fulfill multi-criterion aspirations in expectation Simon Dima et.al. 2408.04385 null
2024-08-08 Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations Julen Urain et.al. 2408.04380 null
2024-08-08 Deep Reinforcement Learning for the Design of Metamaterial Mechanisms with Functional Compliance Control Yejun Choi et.al. 2408.04376 null
2024-08-08 Goal-Oriented UAV Communication Design and Optimization for Target Tracking: A Machine Learning Approach Wenchao Wu et.al. 2408.04358 null
2024-08-08 KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination Yin Gu et.al. 2408.04336 null
2024-08-08 Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization Aditya Kapoor et.al. 2408.04295 null
2024-08-07 Traffic and Obstacle-aware UAV Positioning in Urban Environments Using Reinforcement Learning Kamran Shafafi et.al. 2408.03894 null
2024-08-07 Navigating the Human Maze: Real-Time Robot Pathfinding with Generative Imitation Learning Martin Moder et.al. 2408.03807 null
2024-08-07 HDPlanner: Advancing Autonomous Deployments in Unknown Environments through Hierarchical Decision Networks Jingsong Liang et.al. 2408.03768 null
2024-08-07 Asynchronous Credit Assignment Framework for Multi-Agent Reinforcement Learning Yongheng Liang et.al. 2408.03692 null
2024-08-07 RL-ADN: A High-Performance Deep Reinforcement Learning Environment for Optimal Energy Storage Systems Dispatch in Active Distribution Networks Shengren Hou et.al. 2408.03685 null
2024-08-07 AI-Driven approach for sustainable extraction of earth’s subsurface renewable energy while minimizing seismic activity Diego Gutierrez-Oribio et.al. 2408.03664 null
2024-08-07 A Comparison of LLM Finetuning Methods & Evaluation Metrics with Travel Chatbot Use Case Sonia Meyer et.al. 2408.03562 null
2024-08-07 Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes Chen Tang et.al. 2408.03539 null
2024-08-06 Spacecraft inertial parameters estimation using time series clustering and reinforcement learning Konstantinos Platanitis et.al. 2408.03445 null
2024-08-06 Communication-Aware Consistent Edge Selection for Mobile Users and Autonomous Vehicles Nazish Tahir et.al. 2408.03435 null
2024-08-07 Adversarial Safety-Critical Scenario Generation using Naturalistic Human Driving Priors Kunkun Hao et.al. 2408.03200 null
2024-08-06 RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning Jiapeng Zhu et.al. 2408.03195 null
2024-08-06 Integrated Intention Prediction and Decision-Making with Spectrum Attention Net and Proximal Policy Optimization Xiao Zhou et.al. 2408.03191 null
2024-08-06 CADRL: Category-aware Dual-agent Reinforcement Learning for Explainable Recommendations over Knowledge Graphs Shangfei Zheng et.al. 2408.03166 null
2024-08-06 QADQN: Quantum Attention Deep Q-Network for Financial Market Prediction Siddhant Dutta et.al. 2408.03088 null
2024-08-06 Research on Autonomous Driving Decision-making Strategies based Deep Reinforcement Learning Zixiang Wang et.al. 2408.03084 null
2024-08-06 Model-free optimal controller for discrete-time Markovian jump linear systems: A Q-learning approach Ehsan Badfar et.al. 2408.03077 null
2024-08-06 Learning to Turn: Diffusion Imitation for Robust Row Turning in Under-Canopy Robots Arun N. Sivakumar et.al. 2408.03059 null
2024-08-06 A Course in Dynamic Optimization Bar Light et.al. 2408.03034 null
2024-08-07 Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning Haozhe Ma et.al. 2408.03029 null
2024-08-05 Integrating Model-Based Footstep Planning with Model-Free Reinforcement Learning for Dynamic Legged Locomotion Ho Jae Lee et.al. 2408.02662 null
2024-08-05 Context-aware Mamba-based Reinforcement Learning for social robot navigation Syed Muhammad Mustafa et.al. 2408.02661 null
2024-08-05 Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models? Mohammad Bahrami Karkevandi et.al. 2408.02651 null
2024-08-05 Backward explanations via redefinition of predicates Léo Saulières et.al. 2408.02606 null
2024-08-05 Progressively Selective Label Enhancement for Language Model Alignment Biao Liu et.al. 2408.02599 null
2024-08-05 Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information Yauwai Yim et.al. 2408.02559 null
2024-08-05 Counterfactual Shapley Values for Explaining Reinforcement Learning Yiwei Shi et.al. 2408.02529 null
2024-08-05 Fair Resource Allocation For Hierarchical Federated Edge Learning in Space-Air-Ground Integrated Networks via Deep Reinforcement Learning with Hybrid Control Chong Huang et.al. 2408.02501 null
2024-08-05 Full error analysis of policy gradient learning algorithms for exploratory linear quadratic mean-field control problem in continuous time with common noise Noufel Frikha et.al. 2408.02489 null
2024-08-05 Terracorder: Sense Long and Prosper Josh Millar et.al. 2408.02407 null
2024-08-02 Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer Yu Yang et.al. 2408.01402 null
2024-08-02 NOLO: Navigate Only Look Once Bohan Zhou et.al. 2408.01384 null
2024-08-02 Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation Ruoxuan Feng et.al. 2408.01366 null
2024-08-02 Jacta: A Versatile Planner for Learning Dexterous and Whole-body Manipulation Jan Brüdigam et.al. 2408.01258 null
2024-08-02 Deep progressive reinforcement learning-based flexible resource scheduling framework for IRS and UAV-assisted MEC system Li Dong et.al. 2408.01248 null
2024-08-02 Multi-Objective Deep Reinforcement Learning for Optimisation in Autonomous Systems Juan C. Rosero et.al. 2408.01188 null
2024-08-02 Optimizing Variational Quantum Circuits Using Metaheuristic Strategies in Reinforcement Learning Michael Kölle et.al. 2408.01187 null
2024-08-02 TCR-GPT: Integrating Autoregressive Model and Reinforcement Learning for T-Cell Receptor Repertoires Generation Yicheng Lin et.al. 2408.01156 null
2024-08-02 Actra: Optimized Transformer Architecture for Vision-Language-Action Models in Robot Learning Yueen Ma et.al. 2408.01147 null
2024-08-02 A Survey on Self-play Methods in Reinforcement Learning Ruize Zhang et.al. 2408.01072 null
2024-08-01 A Policy-Gradient Approach to Solving Imperfect-Information Games with Iterate Convergence Mingyang Liu et.al. 2408.00751 null
2024-08-01 Insurance Portfolio Pursuit with Reinforcement Learning Edward James Young et.al. 2408.00713 null
2024-08-01 Learning in Multi-Objective Public Goods Games with Non-Linear Utilities Nicole Orzan et.al. 2408.00682 null
2024-08-01 Discretizing Continuous Action Space with Unimodal Probability Distributions for On-Policy Reinforcement Learning Yuanyang Zhu et.al. 2408.00309 null
2024-08-01 A Reinforcement Learning Based Motion Planner for Quadrotor Autonomous Flight in Dense Environment Zhaohong Liu et.al. 2408.00275 null
2024-08-01 Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control Hao Zhou et.al. 2408.00214 null
2024-07-31 CREW: Facilitating Human-AI Teaming Research Lingyu Zhang et.al. 2408.00170 null
2024-07-31 Formal Ethical Obligations in Reinforcement Learning Agents: Verification and Policy Updates Colin Shea-Blymyer et.al. 2408.00147 null
2024-07-31 Adaptive Transit Signal Priority based on Deep Reinforcement Learning and Connected Vehicles in a Traffic Microsimulation Environment Dickness Kwesiga et.al. 2408.00098 null
2024-07-31 Berkeley Humanoid: A Research Platform for Learning-based Control Qiayuan Liao et.al. 2407.21781 null
2024-07-31 Human-Machine Co-Adaptation for Robot-Assisted Rehabilitation via Dual-Agent Multiple Model Reinforcement Learning (DAMMRL) Yang An et.al. 2407.21734 null
2024-07-31 Multi-agent reinforcement learning for the control of three-dimensional Rayleigh-Bénard convection Joel Vasanth et.al. 2407.21565 null
2024-07-31 Black box meta-learning intrinsic rewards for sparse-reward environments Octavio Pappalardo et.al. 2407.21546 null
2024-07-31 Multi-agent Assessment with QoS Enhancement for HD Map Updates in a Vehicular Network Jeffrey Redondo et.al. 2407.21460 null
2024-07-31 ProSpec RL: Plan Ahead, then Execute Liangliang Liu et.al. 2407.21359 null
2024-07-31 Image-Based Deep Reinforcement Learning with Intrinsically Motivated Stimuli: On the Execution of Complex Robotic Tasks David Valencia et.al. 2407.21338 null
2024-07-31 Tractable and Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation Taehyun Cho et.al. 2407.21260 null
2024-07-30 VITAL: Visual Teleoperation to Enhance Robot Learning through Human-in-the-Loop Corrections Hamidreza Kasaei et.al. 2407.21244 null
2024-07-30 Learning Stable Robot Grasping with Transformer-based Tactile Control Policies En Yen Puang et.al. 2407.21172 link
2024-07-30 Securing Proof of Stake Blockchains: Leveraging Multi-Agent Reinforcement Learning for Detecting and Mitigating Malicious Nodes Faisal Haque Bappy et.al. 2407.20983 null
2024-07-30 How to Choose a Reinforcement-Learning Algorithm Fabian Bongratz et.al. 2407.20917 null
2024-07-30 ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning Hosung Lee et.al. 2407.20806 link
2024-07-30 Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning Norman Di Palo et.al. 2407.20798 null
2024-07-30 Architectural Influence on Variational Quantum Circuits in Multi-Agent Reinforcement Learning: Evolutionary Strategies for Optimization Michael Kölle et.al. 2407.20739 null
2024-07-30 Online Prediction-Assisted Safe Reinforcement Learning for Electric Vehicle Charging Station Recommendation in Dynamically Coupled Transportation-Power Systems Qionghua Liao et.al. 2407.20679 null
2024-07-30 Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations Yupei Yang et.al. 2407.20651 null
2024-07-30 Wireless Multi-User Interactive Virtual Reality in Metaverse with Edge-Device Collaborative Computing Caolu Xu et.al. 2407.20523 null
2024-07-30 Boosting Efficiency in Task-Agnostic Exploration through Causal Knowledge Yupei Yang et.al. 2407.20506 link
2024-07-29 A Method for Fast Autonomy Transfer in Reinforcement Learning Dinuka Sahabandu et.al. 2407.20466 null
2024-07-29 SAPG: Split and Aggregate Policy Gradients Jayesh Singla et.al. 2407.20230 null
2024-07-29 Privileged Reinforcement and Communication Learning for Distributed, Bandwidth-limited Multi-robot Exploration Yixiao Ma et.al. 2407.20203 null
2024-07-29 Language-Conditioned Offline RL for Multi-Robot Navigation Steven Morad et.al. 2407.20164 null
2024-07-29 Quantum Machine Learning Architecture Search via Deep Reinforcement Learning Xin Dai et.al. 2407.20147 null
2024-07-29 Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning Liyuan Mao et.al. 2407.20109 null
2024-07-29 Counterfactual rewards promote collective transport using individually controlled swarm microrobots Veit-Lorenz Heuthe et.al. 2407.20041 null
2024-07-29 Collision Probability Distribution Estimation via Temporal Difference Learning Thomas Steinecker et.al. 2407.20000 link
2024-07-29 Integrated Communications and Security: RIS-Assisted Simultaneous Transmission and Generation of Secret Keys Ning Gao et.al. 2407.19960 null
2024-07-29 A Differential Dynamic Programming Framework for Inverse Reinforcement Learning Kun Cao et.al. 2407.19902 null
2024-07-29 Imitation Learning for Intra-Day Power Grid Operation through Topology Actions Matthijs de Jong et.al. 2407.19865 null
2024-07-26 SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments Shu Ishida et.al. 2407.18913 null
2024-07-26 Lessons from Learning to Spin “Pens” Jun Wang et.al. 2407.18902 null
2024-07-26 SHANGUS: Deep Reinforcement Learning Meets Heuristic Optimization for Speedy Frontier-Based Exploration of Autonomous Vehicles in Unknown Spaces Seunghyeop Nam et.al. 2407.18892 null
2024-07-26 An Accelerated Multi-level Monte Carlo Approach for Average Reward Reinforcement Learning with General Policy Parametrization Swetha Ganesh et.al. 2407.18878 null
2024-07-26 QT-TDM: Planning with Transformer Dynamics Model and Autoregressive Q-Learning Mostafa Kotb et.al. 2407.18841 null
2024-07-26 The Cross-environment Hyperparameter Setting Benchmark for Reinforcement Learning Andrew Patterson et.al. 2407.18840 null
2024-07-26 Learning a Shape-Conditioned Agent for Purely Tactile In-Hand Manipulation of Various Objects Johannes Pitz et.al. 2407.18834 null
2024-07-26 Online Planning in POMDPs with State-Requests Raphael Avalos et.al. 2407.18812 null
2024-07-26 Tuning the kinetics of intracellular transport Ardra Suchitran et.al. 2407.18784 null
2024-07-26 A Deep Reinforcement Learning Approach to Wavefront Control for Exoplanet Imaging Yann Gutierrez et.al. 2407.18733 null
2024-07-25 Recursive Introspection: Teaching Language Model Agents How to Self-Improve Yuxiao Qu et.al. 2407.18219 null
2024-07-25 Differentiable Quantum Architecture Search in Asynchronous Quantum Reinforcement Learning Samuel Yen-Chi Chen et.al. 2407.18202 null
2024-07-25 Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation Jean Seong Bjorn Choe et.al. 2407.18143 null
2024-07-25 MapTune: Advancing ASIC Technology Mapping via Reinforcement Learning Guided Library Tuning Mingju Liu et.al. 2407.18110 link
2024-07-25 Principal-Agent Reinforcement Learning Dima Ivanov et.al. 2407.18074 null
2024-07-25 Multi-Agent Deep Reinforcement Learning for Resilience Optimization in 5G RAN Soumeya Kaada et.al. 2407.18066 null
2024-07-25 Personalized and Context-aware Route Planning for Edge-assisted Vehicles Dinesh Cyril Selvaraj et.al. 2407.17980 null
2024-07-25 Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization Feihu Huang et.al. 2407.17823 null
2024-07-25 Advanced deep-reinforcement-learning methods for flow control: group-invariant and positional-encoding networks improve learning speed and quality Joogoo Jeon et.al. 2407.17822 null
2024-07-25 Preliminary Results of Neuromorphic Controller Design and a Parkinson’s Disease Dataset Building for Closed-Loop Deep Brain Stimulation Ananna Biswas et.al. 2407.17756 null
2024-07-24 Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning Shuang Qiu et.al. 2407.17466 null
2024-07-24 Toward human-centered shared autonomy AI paradigms for human-robot teaming in healthcare Reza Abiri et.al. 2407.17464 null
2024-07-24 SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning Jianpeng Yao et.al. 2407.17460 null
2024-07-24 Joint Transmit and Jamming Power Optimization for Secrecy in Energy Harvesting Networks: A Reinforcement Learning Approach Shalini Tripathi et.al. 2407.17435 null
2024-07-24 Market Making with Exogenous Competition Robert Boyce et.al. 2407.17393 null
2024-07-24 MoveLight: Enhancing Traffic Signal Control through Movement-Centric Deep Reinforcement Learning Junqi Shao et.al. 2407.17303 null
2024-07-24 Pretrained Visual Representations in Reinforcement Learning Emlyn Williams et.al. 2407.17238 null
2024-07-24 Sublinear Regret for An Actor-Critic Algorithm in Continuous-Time Linear-Quadratic Reinforcement Learning Yilie Huang et.al. 2407.17226 null
2024-07-24 Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization Jonathan Pirnay et.al. 2407.17206 link
2024-07-24 Path Following and Stabilisation of a Bicycle Model using a Reinforcement Learning Approach Sebastian Weyrer et.al. 2407.17156 null
2024-07-23 A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data Adrian Remonda et.al. 2407.16680 link
2024-07-23 From Imitation to Refinement – Residual RL for Precise Visual Assembly Lars Ankile et.al. 2407.16677 null
2024-07-23 Efficient Discovery of Actual Causality using Abstraction-Refinement Arshia Rafieioskouei et.al. 2407.16629 null
2024-07-23 Functional Acceleration for Policy Mirror Descent Veronica Chelu et.al. 2407.16602 null
2024-07-23 Real-Time Interactions Between Human Controllers and Remote Devices in Metaverse Kan Chen et.al. 2407.16591 null
2024-07-23 TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback Eunseop Yoon et.al. 2407.16574 null
2024-07-23 Cross Anything: General Quadruped Robot Navigation through Complex Terrains Shaoting Zhu et.al. 2407.16412 null
2024-07-23 Evaluating Uncertainties in Electricity Markets via Machine Learning and Quantum Computing Shuyang Zhu et.al. 2407.16404 null
2024-07-23 Reinforcement Learning-based Adaptive Mitigation of Uncorrected DRAM Errors in the Field Isaac Boixaderas et.al. 2407.16377 null
2024-07-23 Arbitrary quantum states preparation aided by deep reinforcement learning Zhao-Wei Wang et.al. 2407.16368 null
2024-07-22 WayEx: Waypoint Exploration using a Single Demonstration Mara Levy et.al. 2407.15849 null
2024-07-23 QueST: Self-Supervised Skill Abstractions for Learning Continuous Control Atharva Mete et.al. 2407.15840 null
2024-07-22 Importance Sampling-Guided Meta-Training for Intelligent Agents in Highly Interactive Environments Mansur Arief et.al. 2407.15839 null
2024-07-22 On shallow planning under partial observability Randy Lefebvre et.al. 2407.15820 null
2024-07-22 Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning Zhecheng Yuan et.al. 2407.15815 null
2024-07-22 Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels Zhuorui Ye et.al. 2407.15786 null
2024-07-22 Diffusion Model Based Resource Allocation Strategy in Ultra-Reliable Wireless Networked Control Systems Amirhassan Babazadeh Darabi et.al. 2407.15784 null
2024-07-22 How to Shrink Confidence Sets for Many Equivalent Discrete Distributions? Odalric-Ambrym Maillard et.al. 2407.15662 null
2024-07-22 Evaluation of Reinforcement Learning for Autonomous Penetration Testing using A3C, Q-learning and DQN Norman Becker et.al. 2407.15656 null
2024-07-22 Reinforcement Learning Meets Visual Odometry Nico Messikommer et.al. 2407.15626 null
2024-07-19 Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification Thomas Kwa et.al. 2407.14503 null
2024-07-19 Explainable Post hoc Portfolio Management Financial Policy of a Deep Reinforcement Learning agent Alejandra de la Rica Escudero et.al. 2407.14486 link
2024-07-19 Data-Centric Human Preference Optimization with Rationales Hoang Anh Just et.al. 2407.14477 null
2024-07-19 FuzzTheREST: An Intelligent Automated Black-box RESTful API Fuzzer Tiago Dias et.al. 2407.14361 null
2024-07-19 Hyperparameter Optimization for Driving Strategies Based on Reinforcement Learning Nihal Acharya Adde et.al. 2407.14262 null
2024-07-19 On Policy Evaluation Algorithms in Distributional Reinforcement Learning Julian Gerstenberg et.al. 2407.14175 null
2024-07-19 A Comparative Study of Deep Reinforcement Learning Models: DQN vs PPO vs A2C Neil De La Fuente et.al. 2407.14151 link
2024-07-19 Track-MDP: Reinforcement Learning for Target Tracking with Controlled Sensing Adarsh M. Subramaniam et.al. 2407.13995 null
2024-07-19 The Effect of Training Schedules on Morphological Robustness and Generalization Edoardo Barba et.al. 2407.13965 link
2024-07-18 Event-Triggered Reinforcement Learning Based Joint Resource Allocation for Ultra-Reliable Low-Latency V2X Communications Nasir Khan et.al. 2407.13947 null
2024-07-18 Random Latent Exploration for Deep Reinforcement Learning Srinath Mahankali et.al. 2407.13755 null
2024-07-18 Optimistic Q-learning for average reward and episodic reinforcement learning Priyank Agrawal et.al. 2407.13743 null
2024-07-18 Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review Masatoshi Uehara et.al. 2407.13734 null
2024-07-18 A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice Shaina Raza et.al. 2407.13699 null
2024-07-18 Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error Ally Yalei Du et.al. 2407.13622 null
2024-07-18 Hyp2Nav: Hyperbolic Planning and Curiosity for Crowd Navigation Alessandro Flaborea et.al. 2407.13567 null
2024-07-18 Model-based Policy Optimization using Symbolic World Model Andrey Gorodetskiy et.al. 2407.13518 null
2024-07-18 Instance Selection for Dynamic Algorithm Configuration with Reinforcement Learning: Improving Generalization Carolin Benjamins et.al. 2407.13513 null
2024-07-18 LIMT: Language-Informed Multi-Task Visual World Models Elie Aljalbout et.al. 2407.13466 null
2024-07-18 The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations Jan Ole von Hartz et.al. 2407.13432 null
2024-07-17 Navigating the Smog: A Cooperative Multi-Agent RL for Accurate Air Pollution Mapping through Data Assimilation Ichrak Mokhtari et.al. 2407.12539 null
2024-07-17 Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models Xihe Qiu et.al. 2407.12532 null
2024-07-17 Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments Runfa Chen et.al. 2407.12505 null
2024-07-17 Estimating Reaction Barriers with Deep Reinforcement Learning Adittya Pal et.al. 2407.12453 null
2024-07-17 Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning Xu-Hui Liu et.al. 2407.12448 link
2024-07-17 Variable-Agnostic Causal Exploration for Reinforcement Learning Minh Hoang Nguyen et.al. 2407.12437 null
2024-07-17 Flow Matching Imitation Learning for Multi-Support Manipulation Quentin Rouxel et.al. 2407.12381 null
2024-07-17 A foundation model approach to guide antimicrobial peptide design in the era of artificial intelligence driven scientific discovery Jike Wang et.al. 2407.12296 null
2024-07-17 Chip Placement with Diffusion Vint Lee et.al. 2407.12282 null
2024-07-17 Individualized Federated Learning for Traffic Prediction with Error Driven Aggregation Hang Chen et.al. 2407.12226 link
2024-07-16 Why long model-based rollouts are no reason for bad Q-value estimates Philipp Wissmann et.al. 2407.11751 null
2024-07-16 Pareto local search for a multi-objective demand response problem in residential areas with heat pumps and electric vehicles Thomas Dengiz et.al. 2407.11719 null
2024-07-16 A Comparative Analysis of Interactive Reinforcement Learning Algorithms in Warehouse Robot Grid Based Environment Arunabh Bora et.al. 2407.11671 null
2024-07-16 Exciting Action: Investigating Efficient Exploration for Learning Musculoskeletal Humanoid Locomotion Henri-Jacques Geiß et.al. 2407.11658 null
2024-07-16 Building Resilience in Wireless Communication Systems With a Secret-Key Budget Karl-Ludwig Besser et.al. 2407.11604 null
2024-07-16 Learning to Imitate Spatial Organization in Multi-robot Systems Ayomide O. Agunloye et.al. 2407.11592 null
2024-07-16 Green Resource Allocation in Cloud-Native O-RAN Enabled Small Cell Networks Rana M. Sohaib et.al. 2407.11563 null
2024-07-16 RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards Fatemeh Zargarbashi et.al. 2407.11562 null
2024-07-16 Imitation learning with artificial neural networks for demand response with a heuristic control approach for heat pumps Thomas Dengiz et.al. 2407.11561 null
2024-07-16 DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN Rana M. Sohaib et.al. 2407.11558 null
2024-07-15 Walking the Values in Bayesian Inverse Reinforcement Learning Ondrej Bajgar et.al. 2407.10971 null
2024-07-15 BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning Haohong Lin et.al. 2407.10967 null
2024-07-15 Hedging Beyond the Mean: A Distributional Reinforcement Learning Perspective for Hedging Portfolios with Structured Products Anil Sharma et.al. 2407.10903 null
2024-07-15 Offline Reinforcement Learning with Imputed Rewards Carlo Romeo et.al. 2407.10839 null
2024-07-15 Exploration in Knowledge Transfer Utilizing Reinforcement Learning Adam Jedlička et.al. 2407.10835 null
2024-07-15 GuideLight: “Industrial Solution” Guidance for More Practical Traffic Signal Control Agents Haoyuan Jiang et.al. 2407.10811 null
2024-07-15 DINO Pre-training for Vision-based End-to-end Autonomous Driving Shubham Juneja et.al. 2407.10803 null
2024-07-15 Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning Alessandro Montenegro et.al. 2407.10775 null
2024-07-16 Back to Newton’s Laws: Learning Vision-based Agile Flight via Differentiable Physics Yuang Zhang et.al. 2407.10648 null
2024-07-15 Balancing the Scales: Reinforcement Learning for Fair Classification Leon Eshuijs et.al. 2407.10629 null
2024-07-12 Learning Coordinated Maneuver in Adversarial Environments Zechen Hu et.al. 2407.09469 null
2024-07-12 ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts Amelia F. Hardy et.al. 2407.09447 null
2024-07-12 A Benchmark Environment for Offline Reinforcement Learning in Racing Games Girolamo Macaluso et.al. 2407.09415 link
2024-07-12 Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments Zoya Volovikova et.al. 2407.09287 null
2024-07-12 GNN with Model-based RL for Multi-agent Systems Hanxiao Chen et.al. 2407.09249 null
2024-07-12 Constrained Intrinsic Motivation for Reinforcement Learning Xiang Zheng et.al. 2407.09247 null
2024-07-12 Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network Shun Kotoku et.al. 2407.09124 null
2024-07-12 New Desiderata for Direct Preference Optimization Xiangkun Hu et.al. 2407.09072 null
2024-07-12 Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control Huayu Chen et.al. 2407.09024 null
2024-07-12 Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control Sicong Jiang et.al. 2407.08964 null
2024-07-11 MetaUrban: A Simulation Platform for Embodied AI in Urban Spaces Wayne Wu et.al. 2407.08725 null
2024-07-11 RoboMorph: Evolving Robot Morphology using Large Language Models Kevin Qiu et.al. 2407.08626 null
2024-07-11 A Review of Nine Physics Engines for Reinforcement Learning Research Michael Kaup et.al. 2407.08590 null
2024-07-11 HACMan++: Spatially-Grounded Motion Primitives for Manipulation Bowen Jiang et.al. 2407.08585 null
2024-07-11 Imitation Learning for Robotic Assisted Ultrasound Examination of Deep Venous Thrombosis using Kernelized Movement Primitives Diego Dall’Alba et.al. 2407.08506 null
2024-07-11 TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations Junik Bae et.al. 2407.08464 null
2024-07-11 Distributed Deep Reinforcement Learning Based Gradient Quantization for Federated Learning Enabled Vehicle Edge Computing Cui Zhang et.al. 2407.08462 null
2024-07-11 Joint Optimization of Age of Information and Energy Consumption in NR-V2X System based on Deep Reinforcement Learning Shulin Song et.al. 2407.08458 link
2024-07-11 A Cantor-Kantorovich Metric Between Markov Decision Processes with Application to Transfer Learning Adrien Banse et.al. 2407.08324 null
2024-07-11 A Deep Reinforcement Learning Framework and Methodology for Reducing the Sim-to-Real Gap in ASV Navigation Luis F W Batista et.al. 2407.08263 null
2024-07-10 Learning In-Hand Translation Using Tactile Skin With Shear and Normal Force Sensing Jessica Yin et.al. 2407.07885 null
2024-07-10 Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation Eugene Teoh et.al. 2407.07868 null
2024-07-10 Reinforcement Learning of Adaptive Acquisition Policies for Inverse Problems Gianluigi Silvestri et.al. 2407.07794 null
2024-07-11 BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark Nikita Chernyadev et.al. 2407.07788 null
2024-07-10 Continuous Control with Coarse-to-fine Reinforcement Learning Younggyo Seo et.al. 2407.07787 null
2024-07-10 Towards Human-Like Driving: Active Inference in Autonomous Vehicle Control Elahe Delavari et.al. 2407.07684 null
2024-07-10 Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning Dake Zhang et.al. 2407.07631 null
2024-07-10 Resource Allocation for Twin Maintenance and Computing Task Processing in Digital Twin Vehicular Edge Computing Network Yu Xie et.al. 2407.07575 link
2024-07-10 CM-DQN: A Value-Based Deep Reinforcement Learning Model to Simulate Confirmation Bias Jiacheng Shen et.al. 2407.07454 link
2024-07-10 Real-time system optimal traffic routing under uncertainties – Can physics models boost reinforcement learning? Zemian Ke et.al. 2407.07364 null
2024-07-09 Safe and Reliable Training of Learning-Based Aerospace Controllers Udayan Mandal et.al. 2407.07088 null
2024-07-09 Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models Logan Cross et.al. 2407.07086 link
2024-07-09 Can Learned Optimization Make Reinforcement Learning Less Difficult? Alexander David Goldie et.al. 2407.07082 link
2024-07-09 A Unified Approach to Multi-task Legged Navigation: Temporal Logic Meets Reinforcement Learning Jesse Jiang et.al. 2407.06931 null
2024-07-09 Intercepting Unauthorized Aerial Robots in Controlled Airspace Using Reinforcement Learning Francisco Giral et.al. 2407.06909 null
2024-07-09 Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective Shahana Ibrahim et.al. 2407.06902 null
2024-07-09 Energy Efficient Fair STAR-RIS for Mobile Users Ashok S. Kumar et.al. 2407.06868 null
2024-07-09 Frequency and Generalisation of Periodic Activation Functions in Reinforcement Learning Augustine N. Mavor-Parker et.al. 2407.06756 null
2024-07-09 Hierarchical Average-Reward Linearly-solvable Markov Decision Processes Guillermo Infante et.al. 2407.06690 null
2024-07-09 Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning Fanyue Wei et.al. 2407.06642 link
2024-07-08 Periodic agent-state based Q-learning for POMDPs Amit Sinha et.al. 2407.06121 null
2024-07-08 QTRL: Toward Practical Quantum Reinforcement Learning via Quantum-Train Chen-Yu Liu et.al. 2407.06103 null
2024-07-08 Stranger Danger! Identifying and Avoiding Unpredictable Pedestrians in RL-based Social Robot Navigation Sara Pohland et.al. 2407.06056 link
2024-07-08 iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement Aoyu Pang et.al. 2407.06025 link
2024-07-08 Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals Moritz Reuss et.al. 2407.05996 null
2024-07-08 On Bellman equations for continuous-time policy evaluation I: discretization and approximation Wenlong Mou et.al. 2407.05966 null
2024-07-08 Graph Anomaly Detection with Noisy Labels by Reinforcement Learning Zhu Wang et.al. 2407.05934 null
2024-07-08 FedMRL: Data Heterogeneity Aware Federated Multi-agent Deep Reinforcement Learning for Medical Imaging Pranab Sahoo et.al. 2407.05800 link
2024-07-08 Structural Generalization in Autonomous Cyber Incident Response with Message-Passing Neural Networks and Reinforcement Learning Jakob Nyberg et.al. 2407.05775 link
2024-07-08 Multi-agent Reinforcement Learning-based Network Intrusion Detection System Amine Tellache et.al. 2407.05766 null
2024-07-05 Graph Reinforcement Learning in Power Grids: A Survey Mohamed Hassouna et.al. 2407.04522 null
2024-07-05 Using Petri Nets as an Integrated Constraint Mechanism for Reinforcement Learning Tasks Timon Sachweh et.al. 2407.04481 null
2024-07-05 Hindsight Preference Learning for Offline Preference-based Reinforcement Learning Chen-Xiao Gao et.al. 2407.04451 link
2024-07-05 Enhancing Safety for Autonomous Agents in Partly Concealed Urban Traffic Environments Through Representation-Based Shielding Pierre Haritz et.al. 2407.04343 null
2024-07-05 Gradient-based Regularization for Action Smoothness in Robotic Control with Reinforcement Learning I Lee et.al. 2407.04315 null
2024-07-05 Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling Jiawei Xu et.al. 2407.04285 null
2024-07-05 Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator Mehryar Abbasi et.al. 2407.04258 null
2024-07-05 PA-LOCO: Learning Perturbation-Adaptive Locomotion for Quadruped Robots Zhiyuan Xiao et.al. 2407.04224 null
2024-07-05 Autoverse: An Evolvable Game Language for Learning Robust Embodied Agents Sam Earle et.al. 2407.04221 null
2024-07-04 Orchestrating LLMs with Different Personalizations Jin Peng Zhou et.al. 2407.04181 null
2024-07-03 Value-Penalized Auxiliary Control from Examples for Learning without Rewards or Demonstrations Trevor Ablett et.al. 2407.03311 link
2024-07-03 A Review of the Applications of Deep Learning-Based Emergent Communication Brendon Boldt et.al. 2407.03302 null
2024-07-03 Cooperative Multi-Agent Deep Reinforcement Learning Methods for UAV-aided Mobile Edge Computing Networks Mintae Kim et.al. 2407.03280 null
2024-07-03 Policy-guided Monte Carlo on general state spaces: Application to glass-forming mixtures Leonardo Galliano et.al. 2407.03275 null
2024-07-03 PPO-based Dynamic Control of Uncertain Floating Platforms in the Zero-G Environment Mahya Ramezani et.al. 2407.03224 null
2024-07-03 Combining AI Control Systems and Human Decision Support via Robustness and Criticality Walt Woods et.al. 2407.03210 null
2024-07-03 Bunny-VisionPro: Real-Time Bimanual Dexterous Teleoperation for Imitation Learning Runyu Ding et.al. 2407.03162 null
2024-07-03 Reinforcement Learning for Sequence Design Leveraging Protein Language Models Jithendaraa Subramanian et.al. 2407.03154 null
2024-07-03 Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes Asaf Cassel et.al. 2407.03065 null
2024-07-03 Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment Janghwan Lee et.al. 2407.03051 null
2024-07-02 PWM: Policy Learning with Large World Models Ignat Georgiev et.al. 2407.02466 null
2024-07-02 Predicting Visual Attention in Graphic Design Documents Souradeep Chakraborty et.al. 2407.02439 null
2024-07-02 Reinforcement Learning and Machine Ethics: A Systematic Review Ajay Vishwanath et.al. 2407.02425 null
2024-07-02 Talking to Machines: do you read me? Lina M. Rojas-Barahona et.al. 2407.02354 null
2024-07-02 DextrAH-G: Pixels-to-Action Dexterous Arm-Hand Grasping with Geometric Fabrics Tyler Ga Wei Lum et.al. 2407.02274 null
2024-07-02 Safe CoR: A Dual-Expert Approach to Integrating Imitation Learning and Safe Reinforcement Learning Using Constraint Rewards Hyeokjin Kwon et.al. 2407.02245 null
2024-07-02 Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization Yuchen Hu et.al. 2407.02243 null
2024-07-02 Safety-Driven Deep Reinforcement Learning Framework for Cobots: A Sim2Real Approach Ammar N. Abbas et.al. 2407.02231 link
2024-07-02 Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning Zakariae El Asri et.al. 2407.02217 null
2024-07-02 Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning Yifang Chen et.al. 2407.02119 null
2024-06-28 PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators Kuo-Hao Zeng et.al. 2406.20083 null
2024-06-28 Applying RLAIF for Code Generation with API-usage in Lightweight LLMs Sujan Dutta et.al. 2406.20060 null
2024-06-28 HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid Xinyu Xu et.al. 2406.19972 null
2024-06-28 Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies Pingcheng Jian et.al. 2406.19971 null
2024-06-28 Operator World Models for Reinforcement Learning Pietro Novelli et.al. 2406.19861 null
2024-06-28 3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints Yoonkyu Yoo et.al. 2406.19848 null
2024-06-28 Reinforcement Learning for Efficient Design and Control Co-optimisation of Energy Systems Marine Cauz et.al. 2406.19825 null
2024-06-28 Identifying Ordinary Differential Equations for Data-efficient Model-based Reinforcement Learning Tobias Nagel et.al. 2406.19817 null
2024-06-28 Fuzzy Logic Guided Reward Function Variation: An Oracle for Testing Reinforcement Learning Programs Shiyu Zhang et.al. 2406.19812 null
2024-06-28 Decision Transformer for IRS-Assisted Systems with Diffusion-Driven Generative Channels Jie Zhang et.al. 2406.19769 null
2024-06-27 Efficient World Models with Context-Aware Tokenization Vincent Micheli et.al. 2406.19320 link
2024-06-27 Averaging log-likelihoods in direct alignment Nathan Grinsztajn et.al. 2406.19188 null
2024-06-27 Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion Yannis Flet-Berliac et.al. 2406.19185 null
2024-06-27 Learning Pareto Set for Multi-Objective Continuous Robot Control Tianye Shu et.al. 2406.18924 link
2024-06-27 Autonomous Control of a Novel Closed Chain Five Bar Active Suspension via Deep Reinforcement Learning Nishesh Singh et.al. 2406.18899 null
2024-06-27 State and Input Constrained Output-Feedback Adaptive Optimal Control of Affine Nonlinear Systems Tochukwu Elijah Ogri et.al. 2406.18804 null
2024-06-26 Decentralized Semantic Traffic Control in AVs Using RL and DQN for Dynamic Roadblocks Emanuel Figetakis et.al. 2406.18741 null
2024-06-26 Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs Tian Tian et.al. 2406.18529 null
2024-06-26 Mental Modeling of Reinforcement Learning Agents by Language Models Wenhao Lu et.al. 2406.18505 null
2024-06-26 Preference Elicitation for Offline Reinforcement Learning Alizée Pace et.al. 2406.18450 null
2024-06-26 Mixture of Experts in a Mixture of RL settings Timon Willi et.al. 2406.18420 null
2024-06-26 AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors Hao Shi et.al. 2406.18394 null
2024-06-26 Reinforcement Learning with Intrinsically Motivated Feedback Graph for Lost-sales Inventory Control Zifan Liu et.al. 2406.18351 null
2024-06-26 AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations Adam Dahlgren Lindström et.al. 2406.18346 null
2024-06-26 Spatial-temporal Hierarchical Reinforcement Learning for Interpretable Pathology Image Super-Resolution Wenting Chen et.al. 2406.18310 link
2024-06-26 Combining Automated Optimisation of Hyperparameters and Reward Shape Julian Dierkes et.al. 2406.18293 link
2024-06-26 Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems Italo Luis da Silva et.al. 2406.18245 link
2024-06-25 EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data Jesse Zhang et.al. 2406.17768 null
2024-06-25 When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning Claas Voelcker et.al. 2406.17718 null
2024-06-25 Privacy Preserving Reinforcement Learning for Population Processes Samuel Yang-Zhao et.al. 2406.17649 null
2024-06-25 KANQAS: Kolmogorov Arnold Network for Quantum Architecture Search Akash Kundu et.al. 2406.17630 link
2024-06-25 Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations Cheng Wang et.al. 2406.17576 null
2024-06-25 On the consistency of hyper-parameter selection in value-based deep reinforcement learning Johan Obando-Ceron et.al. 2406.17523 null
2024-06-25 BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO Sebastian Dittert et.al. 2406.17490 null
2024-06-25 CuDA2: An approach for Incorporating Traitor Agents into Cooperative Multi-Agent Systems Zhen Chen et.al. 2406.17425 null
2024-06-25 Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning Tianfu Wang et.al. 2406.17334 link
2024-06-25 The State-Action-Reward-State-Action Algorithm in Spatial Prisoner’s Dilemma Game Lanyu Yang et.al. 2406.17326 null
2024-06-24 Confidence Aware Inverse Constrained Reinforcement Learning Sriram Ganapathi Subramanian et.al. 2406.16782 null
2024-06-24 WARP: On the Benefits of Weight Averaged Rewarded Policies Alexandre Ramé et.al. 2406.16768 null
2024-06-24 The MRI Scanner as a Diagnostic: Image-less Active Sampling Yuning Du et.al. 2406.16754 null
2024-06-24 OCALM: Object-Centric Assessment with Language Models Timo Kaufmann et.al. 2406.16748 null
2024-06-24 Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization Zhengyue Zhao et.al. 2406.16743 null
2024-06-24 Probabilistic Subgoal Representations for Hierarchical Reinforcement learning Vivienne Huiling Wang et.al. 2406.16707 null
2024-06-24 Decentralized RL-Based Data Transmission Scheme for Energy Efficient Harvesting Rafaela Scaciota et.al. 2406.16624 null
2024-06-24 Towards Physically Talented Aerial Robots with Tactically Smart Swarm Behavior thereof: An Efficient Co-design Approach Prajit KrisshnaKumar et.al. 2406.16612 null
2024-06-24 $\text{Alpha}^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning Feng Xu et.al. 2406.16505 link
2024-06-24 Towards Comprehensive Preference Data Collection for Reward Modeling Yulan Hu et.al. 2406.16486 null
2024-06-21 MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Xuan He et.al. 2406.15252 null
2024-06-21 Open Problem: Order Optimal Regret Bounds for Kernel-Based Reinforcement Learning Sattar Vakili et.al. 2406.15250 null
2024-06-21 Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting Jiyong Oh et.al. 2406.15225 null
2024-06-21 Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks Alex Quach et.al. 2406.15149 null
2024-06-21 KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty Philipp Becker et.al. 2406.15131 null
2024-06-21 A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning Gianluca Drappo et.al. 2406.15124 null
2024-06-21 Towards General Negotiation Strategies with End-to-End Reinforcement Learning Bram M. Renting et.al. 2406.15096 null
2024-06-21 KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning Jiahan Chen et.al. 2406.15073 null
2024-06-21 Behaviour Distillation Andrei Lupu et.al. 2406.15042 link
2024-06-21 SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning Matthias Weissenbacher et.al. 2406.15025 null
2024-06-20 CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics Jiawei Gao et.al. 2406.14558 null
2024-06-20 MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading Chuqiao Zong et.al. 2406.14537 link
2024-06-20 RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold Amrith Setlur et.al. 2406.14532 link
2024-06-20 Learning telic-controllable state representations Nadav Amir et.al. 2406.14476 null
2024-06-20 Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue Huifang Du et.al. 2406.14457 null
2024-06-20 Revealing the learning process in reinforcement learning agents through attention-oriented metrics Charlotte Beylier et.al. 2406.14324 null
2024-06-20 Resource Optimization for Tail-Based Control in Wireless Networked Control Systems Rasika Vijithasena et.al. 2406.14301 null
2024-06-21 REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability Shuang Ao et.al. 2406.14214 link
2024-06-20 Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning Amit Sharma et.al. 2406.14169 null
2024-06-20 Iterative Sizing Field Prediction for Adaptive Mesh Generation From Expert Demonstrations Niklas Freymuth et.al. 2406.14161 link
2024-06-18 Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts Haoxiang Wang et.al. 2406.12845 link
2024-06-18 Injection Optimization at Particle Accelerators via Reinforcement Learning: From Simulation to Real-World Application Awal Awal et.al. 2406.12735 null
2024-06-18 A Systematization of the Wagner Framework: Graph Theory Conjectures and Reinforcement Learning Flora Angileri et.al. 2406.12667 null
2024-06-18 Reinforcement-Learning based routing for packet-optical networks with hybrid telemetry A. L. García Navarro et.al. 2406.12602 null
2024-06-18 Discovering Minimal Reinforcement Learning Environments Jarek Liesen et.al. 2406.12589 null
2024-06-18 RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation Shuting Wang et.al. 2406.12566 null
2024-06-18 A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo Miguel Vasco et.al. 2406.12563 null
2024-06-18 Offline Imitation Learning with Model-based Reverse Augmentation Jie-Jing Shao et.al. 2406.12550 null
2024-06-18 Demonstrating Agile Flight from Pixels without State Estimation Ismail Geles et.al. 2406.12505 null
2024-06-18 Autonomous navigation of catheters and guidewires in mechanical thrombectomy using inverse reinforcement learning Harry Robertshaw et.al. 2406.12499 null
2024-06-17 WPO: Enhancing RLHF with Weighted Preference Optimization Wenxuan Zhou et.al. 2406.11827 link
2024-06-17 Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics Runzhe Wu et.al. 2406.11810 null
2024-06-17 Run Time Assured Reinforcement Learning for Six Degree-of-Freedom Spacecraft Inspection Kyle Dunlap et.al. 2406.11795 null
2024-06-17 FetchBench: A Simulation Benchmark for Robot Fetching Beining Han et.al. 2406.11793 null
2024-06-17 Optimal Transport-Assisted Risk-Sensitive Q-Learning Zahra Shahrooei et.al. 2406.11774 null
2024-06-17 Measuring memorization in RLHF for code completion Aneesh Pappu et.al. 2406.11715 null
2024-06-17 The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation Noah Golowich et.al. 2406.11686 null
2024-06-17 Communication-Efficient MARL for Platoon Stability and Energy-efficiency Co-optimization in Cooperative Adaptive Cruise Control of CAVs Min Hua et.al. 2406.11653 null
2024-06-17 Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions Noah Golowich et.al. 2406.11640 null
2024-06-17 Style Transfer with Multi-iteration Preference Optimization Shuai Liu et.al. 2406.11581 null
2024-06-14 Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs Rui Yang et.al. 2406.10216 null
2024-06-14 A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors Naaman Tan et.al. 2406.10203 null
2024-06-14 Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication Sanjali Yadav et.al. 2406.10166 null
2024-06-14 Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models Carson Denison et.al. 2406.10162 link
2024-06-14 BiKC: Keypose-Conditioned Consistency Policy for Bimanual Robotic Manipulation Dongjie Yu et.al. 2406.10093 null
2024-06-14 PRIMER: Perception-Aware Robust Learning-based Multiagent Trajectory Planner Kota Kondo et.al. 2406.10060 null
2024-06-14 Bridging the Communication Gap: Artificial Agents Learning Sign Language through Imitation Federico Tavella et.al. 2406.10043 null
2024-06-14 ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR Vishwanath Pratap Singh et.al. 2406.09999 null
2024-06-14 Robust Model-Based Reinforcement Learning with an Adversarial Auxiliary Model Siemen Herremans et.al. 2406.09976 link
2024-06-14 InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning Tiancheng Li et.al. 2406.09973 null
2024-06-13 Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms Miaosen Zhang et.al. 2406.09397 null
2024-06-13 Is Value Learning Really the Main Bottleneck in Offline RL? Seohong Park et.al. 2406.09329 null
2024-06-13 OpenVLA: An Open-Source Vision-Language-Action Model Moo Jin Kim et.al. 2406.09246 null
2024-06-13 AutomaChef: A Physics-informed Demonstration-guided Learning Framework for Granular Material Manipulation Minglun Wei et.al. 2406.09178 null
2024-06-13 Direct Imitation Learning-based Visual Servoing using the Large Projection Formulation Sayantan Auddy et.al. 2406.09120 null
2024-06-13 Adaptive Actor-Critic Based Optimal Regulation for Drift-Free Uncertain Nonlinear Systems Ashwin P. Dani et.al. 2406.09097 null
2024-06-13 DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning Xuemin Hu et.al. 2406.09089 null
2024-06-13 Data-driven modeling and supervisory control system optimization for plug-in hybrid electric vehicles Hao Zhang et.al. 2406.09082 null
2024-06-13 Latent Assistance Networks: Rediscovering Hyperbolic Tangents in RL Jacob E. Kooi et.al. 2406.09079 null
2024-06-13 Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation Claude Formanek et.al. 2406.09068 null
2024-06-12 RILe: Reinforced Imitation Learning Mert Albaba et.al. 2406.08472 null
2024-06-12 Adaptive Swarm Mesh Refinement using Deep Reinforcement Learning with Local Rewards Niklas Freymuth et.al. 2406.08440 null
2024-06-12 RRLS: Robust Reinforcement Learning Suite Adil Zouitine et.al. 2406.08406 link
2024-06-12 Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning Yuhui Wang et.al. 2406.08404 null
2024-06-12 Time-Constrained Robust MDPs Adil Zouitine et.al. 2406.08395 null
2024-06-12 Residual Learning and Context Encoding for Adaptive Offline-to-Online Reinforcement Learning Mohammadreza Nakhaei et.al. 2406.08238 link
2024-06-12 MaIL: Improving Imitation Learning with Mamba Xiaogang Jia et.al. 2406.08234 null
2024-06-12 Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning Max Weltevrede et.al. 2406.08069 null
2024-06-12 Deep reinforcement learning with positional context for intraday trading Sven Goluža et.al. 2406.08013 null
2024-06-12 Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning Yizhe Huang et.al. 2406.08002 null
2024-06-11 CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning Zeyuan Liu et.al. 2406.07541 null
2024-06-11 BAKU: An Efficient Transformer for Multi-Task Policy Learning Siddhant Haldar et.al. 2406.07539 null
2024-06-11 Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis Qining Zhang et.al. 2406.07455 null
2024-06-11 Enhanced Gene Selection in Single-Cell Genomics: Pre-Filtering Synergy and Reinforced Optimization Weiliang Zhang et.al. 2406.07418 null
2024-06-11 Federated Multi-Agent DRL for Radio Resource Management in Industrial 6G in-X subnetworks Bjarke Madsen et.al. 2406.07383 null
2024-06-11 World Models with Hints of Large Language Models for Goal Achieving Zeyuan Liu et.al. 2406.07381 null
2024-06-11 EdgeTimer: Adaptive Multi-Timescale Scheduling in Mobile Edge Computing with Deep Reinforcement Learning Yijun Hao et.al. 2406.07342 null
2024-06-11 Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling Constantin Waubert de Puiseau et.al. 2406.07325 null
2024-06-11 Multi-objective Reinforcement learning from AI Feedback Marcus Williams et.al. 2406.07295 null
2024-06-11 Hybrid Reinforcement Learning from Offline Observation Alone Yuda Song et.al. 2406.07253 null
2024-06-10 Verification-Guided Shielding for Deep Reinforcement Learning Davide Corsi et.al. 2406.06507 null
2024-06-10 Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation Mohidul Haque Mridul et.al. 2406.06500 null
2024-06-10 Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity Calarina Muslimani et.al. 2406.06495 null
2024-06-10 Towards Real-World Efficiency: Domain Randomization in Reinforcement Learning for Pre-Capture of Free-Floating Moving Targets by Autonomous Robots Bahador Beigomi et.al. 2406.06460 link
2024-06-10 Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? Denis Tarasov et.al. 2406.06309 link
2024-06-10 Learning-based cognitive architecture for enhancing coordination in human groups Antonio Grotta et.al. 2406.06297 null
2024-06-10 Deep Multi-Objective Reinforcement Learning for Utility-Based Infrastructural Maintenance Optimization Jesse van Remmerden et.al. 2406.06184 null
2024-06-10 Mastering truss structure optimization with tree search Gabriel E. Garayalde et.al. 2406.06145 null
2024-06-10 EXPIL: Explanatory Predicate Invention for Learning in Games Jingyuan Sha et.al. 2406.06107 null
2024-06-10 Sim-To-Real Transfer for Visual Reinforcement Learning of Deformable Object Manipulation for Robot-Assisted Surgery Paul Maria Scheikl et.al. 2406.06092 null
2024-06-07 LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration Tavor Lipman et.al. 2406.05107 null
2024-06-07 Massively Multiagent Minigames for Training Generalist Agents Kyoung Whan Choe et.al. 2406.05071 link
2024-06-07 Online Frequency Scheduling by Learning Parallel Actions Anastasios Giovanidis et.al. 2406.05041 null
2024-06-07 Optimizing Automatic Differentiation with Deep Reinforcement Learning Jamie Lohoff et.al. 2406.05027 null
2024-06-07 Designs for Enabling Collaboration in Human-Machine Teaming via Interactive and Explainable Systems Rohan Paleja et.al. 2406.05003 null
2024-06-07 SLOPE: Search with Learned Optimal Pruning-based Expansion Davor Bokan et.al. 2406.04935 link
2024-06-07 Sim-to-real Transfer of Deep Reinforcement Learning Agents for Online Coverage Path Planning Arvi Jonnarth et.al. 2406.04920 null
2024-06-07 Online Adaptation for Enhancing Imitation Learning Policies Federico Malato et.al. 2406.04913 link
2024-06-07 Stabilizing Extreme Q-learning by Maclaurin Expansion Motoki Omura et.al. 2406.04896 null
2024-06-07 Primitive Agentic First-Order Optimization R. Sala et.al. 2406.04841 null
2024-06-06 ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories Qianlan Yang et.al. 2406.04323 null
2024-06-06 Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models Xiang Ji et.al. 2406.04274 null
2024-06-06 Multi-Agent Imitation Learning: Value is Easy, Regret is Hard Jingwu Tang et.al. 2406.04219 null
2024-06-06 Aligning Agents like Large Language Models Adam Jelley et.al. 2406.04208 null
2024-06-06 MARLander: A Local Path Planning for Drone Swarms using Multiagent Deep Reinforcement Learning Demetros Aschu et.al. 2406.04159 null
2024-06-06 Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning Abdullah Akgül et.al. 2406.04088 null
2024-06-06 Bootstrapping Expectiles in Reinforcement Learning Pierre Clavier et.al. 2406.04081 null
2024-06-06 Spatio-temporal Early Prediction based on Multi-objective Reinforcement Learning Wei Shao et.al. 2406.04035 link
2024-06-06 Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents Yoann Poupart et.al. 2406.04028 link
2024-06-06 HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning Quentin Delfosse et.al. 2406.03997 link
2024-06-05 Automating Turkish Educational Quiz Generation Using Large Language Models Kamyar Zeinalipour et.al. 2406.03397 null
2024-06-05 LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback Timon Ziegenbein et.al. 2406.03363 link
2024-06-05 UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning Yu Zhang et.al. 2406.03324 null
2024-06-05 Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning Mohamed Elsayed et.al. 2406.03276 null
2024-06-05 Prompt-based Visual Alignment for Zero-shot Policy Transfer Haihan Gao et.al. 2406.03250 null
2024-06-05 Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning Inwoo Hwang et.al. 2406.03234 link
2024-06-05 CommonPower: Supercharging Machine Learning for Smart Grids Michael Eichelbeck et.al. 2406.03231 link
2024-06-05 Object Manipulation in Marine Environments using Reinforcement Learning Ahmed Nader et.al. 2406.03223 null
2024-06-05 Adaptive Distance Functions via Kelvin Transformation Rafael I. Cabral Muchacho et.al. 2406.03200 null
2024-06-05 DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays Bo Xia et.al. 2406.03102 null
2024-06-04 RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots Soroush Nasiriany et.al. 2406.02523 link
2024-06-04 Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs Filippo Valdettaro et.al. 2406.02456 null
2024-06-04 A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies Md Mirajul Islam et.al. 2406.02450 null
2024-06-04 Algorithmic Collusion in Dynamic Pricing with Deep Reinforcement Learning Shidi Deng et.al. 2406.02437 null
2024-06-04 Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Philip Anastassiou et.al. 2406.02430 link
2024-06-04 Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning Jiaxu Wang et.al. 2406.02370 null
2024-06-04 How to Explore with Belief: State Entropy Maximization in POMDPs Riccardo Zamboni et.al. 2406.02295 null
2024-06-04 Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling Arthur Müller et.al. 2406.02294 null
2024-06-04 Test-Time Regret Minimization in Meta Reinforcement Learning Mirco Mutti et.al. 2406.02282 null
2024-06-04 Reinforcement Learning with Lookahead Information Nadav Merlis et.al. 2406.02258 null
2024-05-31 Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF Tengyang Xie et.al. 2405.21046 null
2024-05-31 Direct Alignment of Language Models via Quality-Aware Self-Refinement Runsheng Yu et.al. 2405.21040 null
2024-06-03 Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles Jiesong Lian et.al. 2405.21027 null
2024-05-31 Generating Triangulations and Fibrations with Reinforcement Learning Per Berglund et.al. 2405.21017 null
2024-05-31 Bayesian Design Principles for Offline-to-Online Reinforcement Learning Hao Hu et.al. 2405.20984 null
2024-05-31 Goal-Oriented Sensor Reporting Scheduling for Non-linear Dynamic System Monitoring Prasoon Raghuwanshi et.al. 2405.20983 null
2024-05-31 SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales Tianyang Xu et.al. 2405.20974 link
2024-05-31 Amortizing intractable inference in diffusion models for vision, language, and control Siddarth Venkatraman et.al. 2405.20971 link
2024-05-31 Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation Shangding Gu et.al. 2405.20860 null
2024-05-31 Improving Reward Models with Synthetic Critiques Zihuiwen Ye et.al. 2405.20850 null
2024-05-30 Group Robust Preference Optimization in Reward-free RLHF Shyam Sundhar Ramesh et.al. 2405.20304 link
2024-05-30 Evaluating Large Language Model Biases in Persona-Steered Generation Andy Liu et.al. 2405.20253 link
2024-05-30 InstructionCP: A fast approach to transfer Large Language Models into target language Kuang-Ming Chen et.al. 2405.20175 null
2024-05-30 Enhancing Battlefield Awareness: An Aerial RIS-assisted ISAC System with Deep Reinforcement Learning Hyunsang Cho et.al. 2405.20168 null
2024-05-30 Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation Wooseong Cho et.al. 2405.20165 null
2024-05-30 NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models Kai Wu et.al. 2405.20081 null
2024-05-30 Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads Avelina Asada Hadji-Kyriacou et.al. 2405.20053 link
2024-05-30 Deep Reinforcement Learning for Intrusion Detection in IoT: A Survey Afrah Gueriani et.al. 2405.20038 null
2024-05-30 Safe Multi-agent Reinforcement Learning with Natural Language Constraints Ziyan Wang et.al. 2405.20018 null
2024-05-30 LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning Hyungho Na et.al. 2405.19998 null
2024-05-29 Self-Exploring Language Models: Active Preference Elicitation for Online Alignment Shenao Zhang et.al. 2405.19332 link
2024-05-29 Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF Shicong Cen et.al. 2405.19320 null
2024-05-29 Robust Preference Optimization through Reward Model Distillation Adam Fisch et.al. 2405.19316 null
2024-05-29 Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels Abhay Deshpande et.al. 2405.19307 null
2024-05-29 Act Natural! Projecting Autonomous System Trajectories Into Naturalistic Behavior Sets Hamzah I. Khan et.al. 2405.19292 null
2024-05-29 Rich-Observation Reinforcement Learning with Continuous Latent Dynamics Yuda Song et.al. 2405.19269 null
2024-05-29 Exploring the impact of traffic signal control and connected and automated vehicles on intersections safety: A deep reinforcement learning approach Amir Hossein Karbasi et.al. 2405.19236 null
2024-05-29 Diffusion-based Dynamics Models for Long-Horizon Rollout in Offline Reinforcement Learning Hanye Zhao et.al. 2405.19189 null
2024-05-29 Conditional Latent ODEs for Motion Prediction in Autonomous Driving Khang Truong Giang et.al. 2405.19183 null
2024-05-29 A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning Arthur Juliani et.al. 2405.19153 null
2024-05-28 Hierarchical World Models as Visual Whole-Body Humanoid Controllers Nicklas Hansen et.al. 2405.18418 null
2024-05-28 Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study Shreyas Bhat et.al. 2405.18324 null
2024-05-28 Highway Reinforcement Learning Yuhui Wang et.al. 2405.18289 null
2024-05-28 Extreme Value Monte Carlo Tree Search Masataro Asai et.al. 2405.18248 null
2024-05-28 Recurrent Natural Policy Gradient for POMDPs Semih Cayci et.al. 2405.18221 null
2024-05-28 Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving Zhi Zheng et.al. 2405.18209 link
2024-05-28 Mutation-Bias Learning in Games Johann Bauer et.al. 2405.18190 null
2024-05-28 Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding Daniel Bethell et.al. 2405.18180 link
2024-05-28 Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing Wei Zhao et.al. 2405.18166 link
2024-05-28 PyTAG: Tabletop Games for Multi-Agent Reinforcement Learning Martin Balla et.al. 2405.18123 link
2024-05-27 A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning Abdulaziz Almuzairee et.al. 2405.17416 null
2024-05-27 Rethinking Transformers in Solving POMDPs Chenhao Lu et.al. 2405.17358 link
2024-05-27 Opinion-Guided Reinforcement Learning Kyanna Dagenais et.al. 2405.17287 null
2024-05-27 DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems Zhi Zheng et.al. 2405.17272 link
2024-05-27 Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning Adriana Hugessen et.al. 2405.17243 null
2024-05-27 InsigHTable: Insight-driven Hierarchical Table Visualization with Reinforcement Learning Guozheng Li et.al. 2405.17229 null
2024-05-27 Learning Generic and Dynamic Locomotion of Humanoids Across Discrete Terrains Shangqun Yu et.al. 2405.17227 null
2024-05-27 Flow control of three-dimensional cylinders transitioning to turbulence via multi-agent reinforcement learning P. Suárez et.al. 2405.17210 null
2024-05-27 CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control Jingqing Ruan et.al. 2405.17152 link
2024-05-27 Q-value Regularized Transformer for Offline Reinforcement Learning Shengchao Hu et.al. 2405.17098 null
2024-05-24 Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment Hao Sun et.al. 2405.15624 null
2024-05-24 Neuromorphic dreaming: A pathway to efficient learning in artificial agents Ingo Blakowski et.al. 2405.15616 null
2024-05-24 OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code Maxence Faldor et.al. 2405.15568 link
2024-05-24 Learning Generalizable Human Motion Generator with Reinforcement Learning Yunyao Mao et.al. 2405.15541 null
2024-05-24 Randomized algorithms and PAC bounds for inverse reinforcement learning in continuous spaces Angeliki Kamoutsi et.al. 2405.15509 null
2024-05-24 Human-in-the-loop Reinforcement Learning for Data Quality Monitoring in Particle Physics Experiments Olivia Jullian Parra et.al. 2405.15508 null
2024-05-24 TD3 Based Collision Free Motion Planning for Robot Navigation Hao Liu et.al. 2405.15460 null
2024-05-24 Counterexample-Guided Repair of Reinforcement Learning Systems Using Safety Critics David Boetius et.al. 2405.15430 null
2024-05-24 Model-free reinforcement learning with noisy actions for automated experimental control in optics Lea Richtmann et.al. 2405.15421 null
2024-05-24 Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate Fan-Ming Luo et.al. 2405.15384 null
2024-05-23 Privileged Sensing Scaffolds Reinforcement Learning Edward S. Hu et.al. 2405.14853 null
2024-05-23 Axioms for AI Alignment from Human Feedback Luise Ge et.al. 2405.14758 null
2024-05-23 AGILE: A Novel Framework of LLM Agents Peiyuan Feng et.al. 2405.14751 link
2024-05-23 Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence Minheng Xiao et.al. 2405.14749 null
2024-05-23 SimPO: Simple Preference Optimization with a Reference-Free Reward Yu Meng et.al. 2405.14734 link
2024-05-23 Multi-turn Reinforcement Learning from Preference Human Feedback Lior Shani et.al. 2405.14655 null
2024-05-23 Reinforcement Learning for Fine-tuning Text-to-speech Diffusion Models Jingyi Chen et.al. 2405.14632 null
2024-05-23 Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences Takuya Hiraoka et.al. 2405.14629 null
2024-05-23 Closed-form Symbolic Solutions: A New Perspective on Solving Partial Differential Equations Shu Wei et.al. 2405.14620 null
2024-05-23 Discretization of continuous input spaces in the hippocampal autoencoder Adrian F. Amil et.al. 2405.14600 null
2024-05-21 Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale Shriram Chennakesavalu et.al. 2405.12961 null
2024-05-21 Effect of Synthetic Jets Actuator Parameters on Deep Reinforcement Learning-Based Flow Control Performance in a Square Cylinder Wang Jia et.al. 2405.12834 null
2024-05-21 Deep Reinforcement Learning for Time-Critical Wilderness Search And Rescue Using Drones Jan-Hendrik Ewers et.al. 2405.12800 null
2024-05-21 Generative AI and Large Language Models for Cyber Security: All Insights You Need Mohamed Amine Ferrag et.al. 2405.12750 null
2024-05-21 Reinforcement Learning Enabled Peer-to-Peer Energy Trading for Dairy Farms Mian Ibad Ali Shah et.al. 2405.12716 null
2024-05-21 A Multimodal Learning-based Approach for Autonomous Landing of UAV Francisco Neves et.al. 2405.12681 null
2024-05-21 Learning Causal Dynamics Models in Object-Oriented Environments Zhongwei Yu et.al. 2405.12615 null
2024-05-21 PhiBE: A PDE-based Bellman Equation for Continuous Time Policy Evaluation Yuhua Zhu et.al. 2405.12535 null
2024-05-21 GASE: Graph Attention Sampling with Edges Fusion for Solving Vehicle Routing Problems Zhenwei Wang et.al. 2405.12475 null
2024-05-21 Physics-based Scene Layout Generation from Human Motion Jianan Li et.al. 2405.12460 null
2024-05-20 Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? Yang Dai et.al. 2405.12094 null
2024-05-20 PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation Zhuobin Huang et.al. 2405.12079 null
2024-05-20 Scrutinize What We Ignore: Reining Task Representation Shift In Context-Based Offline Meta Reinforcement Learning Hai Zhang et.al. 2405.12001 null
2024-05-20 Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space Qianmei Liu et.al. 2405.11982 null
2024-05-20 A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers Tom Roth et.al. 2405.11904 null
2024-05-20 Intuitive Fine-Tuning: Towards Unifying SFT and RLHF into a Single Process Ermo Hua et.al. 2405.11870 link
2024-05-20 Reward-Punishment Reinforcement Learning with Maximum Entropy Jiexin Wang et.al. 2405.11784 null
2024-05-20 Efficient Multi-agent Reinforcement Learning by Planning Qihan Liu et.al. 2405.11778 link
2024-05-20 Learning Future Representation with Synthetic Observations for Sample-efficient Reinforcement Learning Xin Liu et.al. 2405.11740 null
2024-05-20 Highway Graph to Accelerate Reinforcement Learning Zidu Yin et.al. 2405.11727 link
2024-05-17 Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review Hongyi Yang et.al. 2405.10883 null
2024-05-17 Automated Radiology Report Generation: A Review of Recent Advances Phillip Sloan et.al. 2405.10842 null
2024-05-17 Combining Teacher-Student with Representation Learning: A Concurrent Teacher-Student Reinforcement Learning Paradigm for Legged Locomotion Hongxi Wang et.al. 2405.10830 null
2024-05-17 Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities Hao Zhou et.al. 2405.10825 null
2024-05-17 A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization Andrzej Ruszczyński et.al. 2405.10815 null
2024-05-17 SignLLM: Sign Languages Production Large Language Models Sen Fang et.al. 2405.10718 null
2024-05-17 Sample-Efficient Constrained Reinforcement Learning with General Parameterization Washim Uddin Mondal et.al. 2405.10624 null
2024-05-17 An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems Jiyue Tao et.al. 2405.10576 null
2024-05-17 Time-Varying Constraint-Aware Reinforcement Learning for Energy Storage Control Jaeik Jeong et.al. 2405.10536 null
2024-05-17 Towards Better Question Generation in QA-Based Event Extraction Zijin Hong et.al. 2405.10517 null
2024-05-16 Stochastic Q-learning for Large Discrete Action Spaces Fares Fourati et.al. 2405.10310 null
2024-05-16 Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning Yuexiang Zhai et.al. 2405.10292 null
2024-05-16 Keep It Private: Unsupervised Privatization of Online Text Calvin Bao et.al. 2405.10260 link
2024-05-16 A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy Zhaoxing Li et.al. 2405.10214 null
2024-05-16 Continuous Transfer Learning for UAV Communication-aware Trajectory Design Chenrui Sun et.al. 2405.10087 null
2024-05-16 Optimizing Search and Rescue UAV Connectivity in Challenging Terrain through Multi Q-Learning Mohammed M. H. Qazzaz et.al. 2405.10042 null
2024-05-16 Reward Centering Abhishek Naik et.al. 2405.09999 null
2024-05-16 Combining RL and IL using a dynamic, performance-based modulation over learning signals and its application to local planning Francisco Leiva et.al. 2405.09760 null
2024-05-16 NIFTY Financial News Headlines Dataset Raeid Saqur et.al. 2405.09747 null
2024-05-15 Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning Sihan Zeng et.al. 2405.09660 null
2024-05-15 Reinforcement Learning-Based Framework for the Intelligent Adaptation of User Interfaces Daniel Gaspar-Figueiredo et.al. 2405.09255 null
2024-05-15 DVS-RG: Differential Variable Speed Limits Control using Deep Reinforcement Learning with Graph State Representation Jingwen Yang et.al. 2405.09163 null
2024-05-15 CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving Dechen Gao et.al. 2405.09111 null
2024-05-15 Chaos-based reinforcement learning with TD3 Toshitaka Matsuki et.al. 2405.09086 null
2024-05-15 Deep Learning in Earthquake Engineering: A Comprehensive Review Yazhou Xie et.al. 2405.09021 null
2024-05-14 Large Language Models for Human-Machine Collaborative Particle Accelerator Tuning through Natural Language Jan Kaiser et.al. 2405.08888 null
2024-05-14 Stable Inverse Reinforcement Learning: Policies from Control Lyapunov Landscapes Samuel Tesfazgi et.al. 2405.08756 null
2024-05-14 Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach Urvij Saroliya et.al. 2405.08754 null
2024-05-14 Reinformer: Max-Return Sequence Modeling for offline RL Zifeng Zhuang et.al. 2405.08740 null
2024-05-14 I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning Yashuai Yan et.al. 2405.08726 null
2024-05-15 Enhancing Reinforcement Learning in Sensor Fusion: A Comparative Analysis of Cubature and Sampling-based Integration Methods for Rover Search Planning Jan-Hendrik Ewers et.al. 2405.08691 null
2024-05-14 A Distributed Approach to Autonomous Intersection Management via Multi-Agent Reinforcement Learning Matteo Cederle et.al. 2405.08655 link
2024-05-14 vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement Yiwen Zhu et.al. 2405.08638 null
2024-05-14 Optimizing Deep Reinforcement Learning for American Put Option Hedging Reilly Pickard et.al. 2405.08602 null
2024-05-14 Python-Based Reinforcement Learning on Simulink Models Georg Schäfer et.al. 2405.08567 null
2024-05-14 Growing Artificial Neural Networks for Control: the Role of Neuronal Diversity Eleni Nisioti et.al. 2405.08510 null
2024-05-13 Hierarchical Decision Mamba André Correia et.al. 2405.07943 link
2024-05-13 RLHF Workflow: From Reward Modeling to Online RLHF Hanze Dong et.al. 2405.07863 link
2024-05-13 Adaptive Exploration for Data-Efficient General Value Function Evaluations Arushi Jain et.al. 2405.07838 null
2024-05-13 Fixed Point Theory Analysis of a Lambda Policy Iteration with Randomization for the Ćirić Contraction Operator Abdelkader Belhenniche et.al. 2405.07824 null
2024-05-13 Hamiltonian-based Quantum Reinforcement Learning for Neural Combinatorial Optimization Georg Kruse et.al. 2405.07790 null
2024-05-13 Hype or Heuristic? Quantum Reinforcement Learning for Join Order Optimisation Maja Franz et.al. 2405.07770 null
2024-05-13 CAGES: Cost-Aware Gradient Entropy Search for Efficient Local Multi-Fidelity Bayesian Optimization Wei-Ting Tang et.al. 2405.07760 null
2024-05-13 MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction Haopeng Wang et.al. 2405.07759 null
2024-05-13 Neural Network Compression for Reinforcement Learning Tasks Dmitry A. Ivanov et.al. 2405.07748 null
2024-05-13 Backdoor Removal for Generative Large Language Models Haoran Li et.al. 2405.07667 null
2024-05-10 Value Augmented Sampling for Language Model Alignment and Personalization Seungwook Han et.al. 2405.06639 link
2024-05-10 EcoEdgeTwin: Enhanced 6G Network via Mobile Edge Computing and Digital Twin Integration Synthia Hossain Karobi et.al. 2405.06507 null
2024-05-10 Advantageous and disadvantageous inequality aversion can be taught through vicarious learning of others’ preferences Shen Zhang et.al. 2405.06500 null
2024-05-10 Contextual Affordances for Safe Exploration in Robotic Scenarios William Z. Ye et.al. 2405.06422 null
2024-05-10 Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs Davide Maran et.al. 2405.06363 null
2024-05-10 Learning Latent Dynamic Robust Representations for World Models Ruixiang Sun et.al. 2405.06263 link
2024-05-10 Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning Xiaoyu Wen et.al. 2405.06192 link
2024-05-10 (A Partial Survey of) Decentralized, Cooperative Multi-Agent Reinforcement Learning Christopher Amato et.al. 2405.06161 null
2024-05-09 An RNN-policy gradient approach for quantum architecture search Gang Wang et.al. 2405.05892 null
2024-05-09 Safe Exploration Using Bayesian World Models and Log-Barrier Optimization Yarden As et.al. 2405.05890 null
2024-05-09 ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers Liangliang Chen et.al. 2405.05861 null
2024-05-09 Policy Gradient with Active Importance Sampling Matteo Papini et.al. 2405.05630 null
2024-05-09 An Automatic Prompt Generation System for Tabular Data Tasks Ashlesha Akella et.al. 2405.05618 null
2024-05-09 Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning Yuchen Shi et.al. 2405.05542 link
2024-05-08 Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data Kishan Panaganti et.al. 2405.05468 null
2024-05-08 Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management Gang Hu et.al. 2405.05449 null
2024-05-08 Learning to Play Pursuit-Evasion with Dynamic and Sensor Constraints Burak M. Gonultas et.al. 2405.05372 null
2024-05-08 Offline Model-Based Optimization via Policy-Guided Gradient Search Yassine Chemingui et.al. 2405.05349 link
2024-05-08 Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models Aylin Gunal et.al. 2405.05060 null
2024-05-08 Fault Identification Enhancement with Reinforcement Learning (FIERL) Valentina Zaccaria et.al. 2405.04938 link
2024-05-07 RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes Kyle Stachowicz et.al. 2405.04714 null
2024-05-07 Proximal Policy Optimization with Adaptive Exploration Andrei Lixandru et.al. 2405.04664 null
2024-05-07 ACEGEN: Reinforcement learning of generative chemical agents for drug discovery Albert Bou et.al. 2405.04657 link
2024-05-07 TorchDriveEnv: A Reinforcement Learning Benchmark for Autonomous Driving with Reactive, Realistic, and Diverse Non-Playable Characters Jonathan Wilder Lavington et.al. 2405.04491 null
2024-05-07 Designing, Developing, and Validating Network Intelligence for Scaling in Service-Based Architectures based on Deep Reinforcement Learning Paola Soto et.al. 2405.04441 null
2024-05-08 DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model DeepSeek-AI et.al. 2405.04434 link
2024-05-07 The Curse of Diversity in Ensemble-Based Exploration Zhixuan Lin et.al. 2405.04342 link
2024-05-07 Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation Atharvan Dogra et.al. 2405.04325 null
2024-05-07 Genetic Drift Regularization: on preventing Actor Injection from breaking Evolution Strategies Paul Templier et.al. 2405.04322 null
2024-05-07 Improving Offline Reinforcement Learning with Inaccurate Simulators Yiwen Hou et.al. 2405.04307 null
2024-05-07 Deep Reinforcement Learning for Multi-User RF Charging with Non-linear Energy Harvesters Amirhossein Azarbahram et.al. 2405.04218 null
2024-05-07 In-context Learning for Automated Driving Scenarios Ziqi Zhou et.al. 2405.04135 null
2024-05-07 Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning Chunlin Tian et.al. 2405.04122 null
2024-05-06 $ε$-Policy Gradient for Online Pricing Lukasz Szpruch et.al. 2405.03624 null
2024-05-06 Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions Xingyou Song et.al. 2405.03547 null
2024-05-06 ReinWiFi: A Reinforcement-Learning-Based Framework for the Application-Layer QoS Optimization of WiFi Networks Qianren Li et.al. 2405.03526 null
2024-05-06 Robotic Constrained Imitation Learning for the Peg Transfer Task in Fundamentals of Laparoscopic Surgery Kento Kawaharazuka et.al. 2405.03440 null
2024-05-06 Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning Stone Tao et.al. 2405.03379 null
2024-05-06 Enhancing Q-Learning with Large Language Model Heuristics Xiefeng Wu et.al. 2405.03341 null
2024-05-06 Artificial Intelligence in the Autonomous Navigation of Endovascular Interventions: A Systematic Review Harry Robertshaw et.al. 2405.03305 null
2024-05-06 End-to-End Reinforcement Learning of Curative Curtailment with Partial Measurement Availability Hinrikus Wolf et.al. 2405.03262 null
2024-05-06 Federated Reinforcement Learning with Constraint Heterogeneity Hao Jin et.al. 2405.03236 null
2024-05-06 Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning Caleb Chuck et.al. 2405.03113 null
2024-05-03 Geometric Fabrics: a Safe Guiding Medium for Policy Learning Karl Van Wyk et.al. 2405.02250 null
2024-05-03 Learning Optimal Deterministic Policies with Stochastic Policy Gradients Alessandro Montenegro et.al. 2405.02235 null
2024-05-03 The Cambridge RoboMaster: An Agile Multi-Robot Research Platform Jan Blumenkamp et.al. 2405.02198 null
2024-05-03 Imitation Learning in Discounted Linear MDPs without exploration assumptions Luca Viano et.al. 2405.02181 null
2024-05-03 Simulating the economic impact of rationality through reinforcement learning and agent-based modelling Simone Brusatin et.al. 2405.02161 null
2024-05-03 Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach Anton Plaksin et.al. 2405.02044 null
2024-05-03 Model-based reinforcement learning for protein backbone design Frederic Renard et.al. 2405.01983 null
2024-05-03 Rescale-Invariant Federated Reinforcement Learning for Resource Allocation in V2X Networks Kaidi Xu et.al. 2405.01961 null
2024-05-03 Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization Changliang Zhou et.al. 2405.01906 null
2024-05-03 Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants Francesco Maldonato et.al. 2405.01889 link
2024-05-02 Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks Murtaza Dalal et.al. 2405.01534 null
2024-05-02 FLAME: Factuality-Aware Alignment for Large Language Models Sheng-Chieh Lin et.al. 2405.01525 null
2024-05-02 NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment Gerald Shen et.al. 2405.01481 link
2024-05-02 IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning Ryan Hoque et.al. 2405.01472 null
2024-05-02 Goal-conditioned reinforcement learning for ultrasound navigation guidance Abdoul Aziz Amadou et.al. 2405.01409 null
2024-05-02 Learning Force Control for Legged Manipulation Tifanny Portela et.al. 2405.01402 null
2024-05-02 Constrained Reinforcement Learning Under Model Mismatch Zhongchang Sun et.al. 2405.01327 null
2024-05-02 Non-iterative Optimization of Trajectory and Radio Resource for Aerial Network Hyeonsu Lyu et.al. 2405.01314 null
2024-05-02 Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning Liu Qiyuan et.al. 2405.01284 null
2024-05-02 Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation Hao Wang et.al. 2405.01280 null
2024-05-01 Self-Play Preference Optimization for Language Model Alignment Yue Wu et.al. 2405.00675 null
2024-05-01 No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO Skander Moalla et.al. 2405.00662 link
2024-05-01 HUGO – Highlighting Unseen Grid Options: Combining Deep Reinforcement Learning with a Heuristic Target Topology Approach Malte Lehna et.al. 2405.00629 null
2024-05-01 Koopman-based Deep Learning for Nonlinear System Estimation Zexin Sun et.al. 2405.00627 null
2024-05-01 Queue-based Eco-Driving at Roundabouts with Reinforcement Learning Anna-Lena Schlamp et.al. 2405.00625 null
2024-05-01 The Real, the Better: Aligning Large Language Models with Online Human Behaviors Guanying Jiang et.al. 2405.00578 null
2024-05-01 Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment Zhili Liu et.al. 2405.00557 null
2024-05-01 Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning Lucas-Andreï Thil et.al. 2405.00516 null
2024-05-01 MetaRM: Shifted Distributions Alignment via Meta-Learning Shihan Dou et.al. 2405.00438 null
2024-05-01 UCB-driven Utility Function Search for Multi-objective Reinforcement Learning Yucheng Shi et.al. 2405.00410 link
2024-04-30 Collaborative Control Method of Transit Signal Priority Based on Cooperative Game and Reinforcement Learning Hao Qin et.al. 2404.19683 null
2024-04-30 Towards Generalist Robot Learning from Internet Video: A Survey Robert McCarthy et.al. 2404.19664 null
2024-04-30 Short term vs. long term: optimization of microswimmer navigation on different time horizons Navid Mousavi et.al. 2404.19561 null
2024-04-30 Continual Model-based Reinforcement Learning for Data Efficient Wireless Network Optimisation Cengis Hasan et.al. 2404.19462 null
2024-04-30 Imitation Learning: A Survey of Learning Methods, Environments and Metrics Nathan Gavenski et.al. 2404.19456 null
2024-04-30 Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning Mathieu Rita et.al. 2404.19409 link
2024-04-30 Numeric Reward Machines Kristina Levina et.al. 2404.19370 null
2024-04-30 Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning Chenjia Bai et.al. 2404.19346 link
2024-04-30 Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning Qiaosheng Zhang et.al. 2404.19292 null
2024-04-30 DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets Xiaoyu Huang et.al. 2404.19264 null
2024-04-29 DPO Meets PPO: Reinforced Token Optimization for RLHF Han Zhong et.al. 2404.18922 null
2024-04-29 Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty Laixi Shi et.al. 2404.18909 null
2024-04-29 Overcoming Knowledge Barriers: Online Imitation Learning from Observation with Pretrained World Models Xingyuan Zhang et.al. 2404.18896 null
2024-04-29 More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness Aaron J. Li et.al. 2404.18870 link
2024-04-29 Performance-Aligned LLMs for Generating Fast Code Daniel Nichols et.al. 2404.18864 null
2024-04-29 PlanNetX: Learning an Efficient Neural Network Planner from MPC for Longitudinal Control Jasper Hoffmann et.al. 2404.18863 null
2024-04-30 Winning the Social Media Influence Battle: Uncertainty-Aware Opinions to Understand and Spread True Information via Competitive Influence Maximization Qi Zhang et.al. 2404.18826 null
2024-04-29 Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies Seyed Soroush Karimi Madahi et.al. 2404.18821 null
2024-04-29 Multi-Agent Synchronization Tasks Rolando Fernandez et.al. 2404.18798 null
2024-04-29 Resource-rational reinforcement learning and sensorimotor causal states Sarah Marzen et.al. 2404.18775 null
2024-04-26 Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo Stephen Zhao et.al. 2404.17546 null
2024-04-26 Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations Puhao Li et.al. 2404.17521 link
2024-04-26 Quantum Multi-Agent Reinforcement Learning for Aerial Ad-hoc Networks Theodora-Augustina Drăgan et.al. 2404.17499 null
2024-04-26 Q-Learning to navigate turbulence without a map Marco Rando et.al. 2404.17495 null
2024-04-26 Adaptive speed planning for Unmanned Vehicle Based on Deep Reinforcement Learning Hao Liu et.al. 2404.17379 null
2024-04-26 When to Trust LLMs: Aligning Confidence with Response Quality Shuchang Tao et.al. 2404.17287 null
2024-04-26 Enhancing Privacy and Security of Autonomous UAV Navigation Vatsal Aggarwal et.al. 2404.17225 null
2024-04-26 Beyond Imitation: A Life-long Policy Learning Framework for Path Tracking Control of Autonomous Driving C. Gong et.al. 2404.17198 null
2024-04-26 An Explainable Deep Reinforcement Learning Model for Warfarin Maintenance Dosing Using Policy Distillation and Action Forging Sadjad Anzabi Zadeh et.al. 2404.17187 null
2024-04-25 Compiler for Distributed Quantum Computing: a Reinforcement Learning Approach Panagiotis Promponas et.al. 2404.17077 null
2024-04-25 REBEL: Reinforcement Learning via Regressing Relative Rewards Zhaolin Gao et.al. 2404.16767 null
2024-04-25 Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods Min Kyu Shin et.al. 2404.16721 null
2024-04-25 RUMOR: Reinforcement learning for Understanding a Model of the Real World for Navigation in Dynamic Environments Diego Martinez-Baselga et.al. 2404.16672 null
2024-04-25 Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare Emre Can Acikgoz et.al. 2404.16621 null
2024-04-25 Exploring the Dynamics of Data Transmission in 5G Networks: A Conceptual Analysis Nikita Smirnov et.al. 2404.16508 null
2024-04-25 Leveraging Pretrained Latent Representations for Few-Shot Imitation Learning on a Dexterous Robotic Hand Davide Liconti et.al. 2404.16483 null
2024-04-25 A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints Bram De Cooman et.al. 2404.16468 null
2024-04-25 Offline Reinforcement Learning with Behavioral Supervisor Tuning Padmanaba Srinivasan et.al. 2404.16399 null
2024-04-25 SwarmRL: Building the Future of Smart Active Systems Samuel Tovey et.al. 2404.16388 link
2024-04-25 Reinforcement Learning with Generative Models for Compact Support Sets Nico Schiavone et.al. 2404.16300 link
2024-04-24 DPO: Differential reinforcement learning with application to optimal configuration search Chandrajit Bajaj et.al. 2404.15617 null
2024-04-24 GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL Lang Qin et.al. 2404.15597 null
2024-04-24 Multi-Agent Reinforcement Learning for Energy Networks: Computational Challenges, Progress and Open Problems Sarah Keren et.al. 2404.15583 null
2024-04-23 An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models Yangchen Pan et.al. 2404.15518 null
2024-04-23 The Power of Resets in Online Reinforcement Learning Zakaria Mhammedi et.al. 2404.15417 null
2024-04-23 Planning the path with Reinforcement Learning: Optimal Robot Motion Planning in RoboCup Small Size League Environments Mateus G. Machado et.al. 2404.15410 link
2024-04-23 Reinforcement Learning with Adaptive Control Regularization for Safe Control of Critical Systems Haozhe Tian et.al. 2404.15199 null
2024-04-23 Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation Xun Wu et.al. 2404.15100 null
2024-04-23 Impedance Matching: Enabling an RL-Based Running Jump in a Quadruped Robot Neil Guan et.al. 2404.15096 null
2024-04-23 Using deep reinforcement learning to promote sustainable human behaviour on a common pool resource problem Raphael Koster et.al. 2404.15059 null
2024-04-23 Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems Xiaoshuang Chen et.al. 2404.14961 null
2024-04-23 Multi-Objective Deep Reinforcement Learning for 5G Base Station Placement to Support Localisation for Future Sustainable Traffic Ahmed Al-Tahmeesschi et.al. 2404.14954 null
2024-04-23 MultiSTOP: Solving Functional Equations with Reinforcement Learning Alessandro Trenta et.al. 2404.14909 null
2024-04-23 Unitary Synthesis of Clifford+T Circuits with Reinforcement Learning Sebastian Rietsch et.al. 2404.14865 null
2024-04-23 Evolutionary Reinforcement Learning via Cooperative Coevolution Chengpeng Hu et.al. 2404.14763 null
2024-04-23 Rank2Reward: Learning Shaped Reward Functions from Passive Video Daniel Yang et.al. 2404.14735 null
2024-04-22 Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data Fahim Tajwar et.al. 2404.14367 link
2024-04-22 PLUTO: Pushing the Limit of Imitation Learning-based Planning for Autonomous Driving Jie Cheng et.al. 2404.14327 null
2024-04-22 Multi-Agent Hybrid SAC for Joint SS-DSA in CRNs David R. Nickel et.al. 2404.14319 null
2024-04-22 LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots Dongge Han et.al. 2404.14285 null
2024-04-22 Beyond the Edge: An Advanced Exploration of Reinforcement Learning for Mobile Edge Computing, its Applications, and Future Research Trajectories Ning Yang et.al. 2404.14238 null
2024-04-22 Multi-agent Reinforcement Learning-based Joint Precoding and Phase Shift Optimization for RIS-aided Cell-Free Massive MIMO Systems Yiyang Zhu et.al. 2404.14092 null
2024-04-22 Mechanistic Interpretability for AI Safety – A Review Leonard Bereska et.al. 2404.14082 null
2024-04-22 Research on Robot Path Planning Based on Reinforcement Learning Wang Ruiqi et.al. 2404.14077 link
2024-04-22 Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras Mhairi Dunion et.al. 2404.14064 link
2024-04-22 A survey of air combat behavior modeling using machine learning Patrick Ribu Gorton et.al. 2404.13954 null
2024-04-19 Mapping Social Choice Theory to RLHF Jessica Dai et.al. 2404.13038 null
2024-04-19 Deep Reinforcement Learning-Based Active Flow Control of an Elliptical Cylinder: Transitioning from an Elliptical Cylinder to a Circular Cylinder and a Flat Plate Wang Jia et.al. 2404.13003 null
2024-04-19 Goal Exploration via Adaptive Skill Distribution for Goal-Conditioned Reinforcement Learning Lisheng Wu et.al. 2404.12999 null
2024-04-19 MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering Avinash Anand et.al. 2404.12926 null
2024-04-19 Zero-Shot Stitching in Reinforcement Learning using Relative Representations Antonio Pio Ricciardi et.al. 2404.12917 null
2024-04-19 MAexp: A Generic Platform for RL-based Multi-Agent Exploration Shaohao Zhu et.al. 2404.12824 link
2024-04-19 Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation Qiang He et.al. 2404.12754 link
2024-04-19 Demonstration of quantum projective simulation on a single-photon-based quantum computer Giacomo Franceschetto et.al. 2404.12729 null
2024-04-19 Energy Conserved Failure Detection for NS-IoT Systems Guojin Liu et.al. 2404.12713 null
2024-04-19 Single-Task Continual Offline Reinforcement Learning Sibo Gai et.al. 2404.12639 null
2024-04-18 From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function Rafael Rafailov et.al. 2404.12358 null
2024-04-18 Improving the interpretability of GNN predictions through conformal-based graph sparsification Pablo Sanchez-Martin et.al. 2404.12356 link
2024-04-18 Practical Considerations for Discrete-Time Implementations of Continuous-Time Control Barrier Function-Based Safety Filters Lukas Brunke et.al. 2404.12329 null
2024-04-18 ASID: Active Exploration for System Identification in Robotic Manipulation Marius Memmel et.al. 2404.12308 null
2024-04-18 RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective Chenxi Wang et.al. 2404.12281 null
2024-04-18 Privacy-Preserving UCB Decision Process Verification via zk-SNARKs Xikun Jiang et.al. 2404.12186 null
2024-04-18 Aligning language models with human preferences Tomasz Korbak et.al. 2404.12150 link
2024-04-19 Robust and Adaptive Deep Reinforcement Learning for Enhancing Flow Control around a Square Cylinder with Varying Reynolds Numbers Wang Jia et.al. 2404.12123 null
2024-04-18 X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner Haoyuan Jiang et.al. 2404.12090 link
2024-04-18 Trajectory Planning for Autonomous Vehicle Using Iterative Reward Prediction in Reinforcement Learning Hyunwoo Park et.al. 2404.12079 null
2024-04-17 Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding Zezhong Fan et.al. 2404.11589 null
2024-04-17 Deep Policy Optimization with Temporal Logic Constraints Ameesh Shah et.al. 2404.11578 null
2024-04-17 Spatio-Temporal Motion Retargeting for Quadruped Robots Taerim Yoon et.al. 2404.11557 null
2024-04-17 VC Theory for Inventory Policies Yaqi Xie et.al. 2404.11509 null
2024-04-17 Learn to Tour: Operator Design For Solution Feasibility Mapping in Pickup-and-delivery Traveling Salesman Problem Bowen Fang et.al. 2404.11458 null
2024-04-17 What-if Analysis Framework for Digital Twins in 6G Wireless Network Management Elif Ak et.al. 2404.11394 null
2024-04-17 Convergence of Policy Gradient for Stochastic Linear-Quadratic Control Problem in Infinite Horizon Xinpei Zhang et.al. 2404.11382 null
2024-04-17 Following the Human Thread in Social Navigation Luca Scofano et.al. 2404.11327 link
2024-04-17 On Learning Parities with Dependent Noise Noah Golowich et.al. 2404.11325 null
2024-04-17 Physics-informed Actor-Critic for Coordination of Virtual Inertia from Power Distribution Systems Simon Stock et.al. 2404.11149 null
2024-04-16 Settling Constant Regrets in Linear Markov Decision Processes Weitong Zhang et.al. 2404.10745 null
2024-04-16 N-Agent Ad Hoc Teamwork Caroline Wang et.al. 2404.10740 null
2024-04-16 Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration Benjamin A Newman et.al. 2404.10733 null
2024-04-16 Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning Hao-Lun Hsu et.al. 2404.10728 null
2024-04-16 Automatic re-calibration of quantum devices by reinforcement learning T. Crosta et.al. 2404.10726 null
2024-04-16 Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study Shusheng Xu et.al. 2404.10719 null
2024-04-16 Simplex Decomposition for Portfolio Allocation Constraints in Reinforcement Learning David Winkel et.al. 2404.10683 null
2024-04-16 SCALE: Self-Correcting Visual Navigation for Mobile Robots via Anti-Novelty Estimation Chang Chen et.al. 2404.10675 null
2024-04-16 Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay Jinmei Liu et.al. 2404.10662 link
2024-04-16 Trajectory Planning using Reinforcement Learning for Interactive Overtaking Maneuvers in Autonomous Racing Scenarios Levent Ögretmen et.al. 2404.10658 null
2024-04-15 Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model Hyunsoo Cho et.al. 2404.09717 null
2024-04-15 Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning Linjie Xu et.al. 2404.09715 null
2024-04-15 Learn Your Reference Model for Real Good Alignment Alexey Gorbatovski et.al. 2404.09656 null
2024-04-15 Reliability Estimation of News Media Sources: Birds of a Feather Flock Together Sergio Burdisso et.al. 2404.09565 null
2024-04-15 Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning Tidiane Camaret Ndir et.al. 2404.09521 link
2024-04-14 Correlated Mean Field Imitation Learning Zhiyu Zhao et.al. 2404.09324 null
2024-04-14 Egret: Reinforcement Mechanism for Sequential Computation Offloading in Edge Computing Haosong Peng et.al. 2404.09285 null
2024-04-14 A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs Elliot Kolker-Hicks et.al. 2404.09264 null
2024-04-14 Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts Jing-Cheng Pang et.al. 2404.09248 null
2024-04-14 Advanced Intelligent Optimization Algorithms for Multi-Objective Optimal Power Flow in Future Power Systems: A Review Yuyan Li et.al. 2404.09203 null
2024-04-12 Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation Hanlin Tian et.al. 2404.08570 null
2024-04-12 RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs Shreyas Chaudhari et.al. 2404.08555 null
2024-04-12 Advancing Forest Fire Prevention: Deep Reinforcement Learning for Effective Firebreak Placement Lucas Murray et.al. 2404.08523 null
2024-04-12 Adversarial Imitation Learning via Boosting Jonathan D. Chang et.al. 2404.08513 null
2024-04-12 Prescribing Optimal Health-Aware Operation for Urban Air Mobility with Deep Reinforcement Learning Mina Montazeri et.al. 2404.08497 null
2024-04-12 Dataset Reset Policy Optimization for RLHF Jonathan D. Chang et.al. 2404.08495 link
2024-04-12 Anti-Byzantine Attacks Enabled Vehicle Selection for Asynchronous Federated Learning in Vehicular Edge Computing Cui Zhang et.al. 2404.08444 null
2024-04-12 SIR-RL: Reinforcement Learning for Optimized Policy Control during Epidemiological Outbreaks in Emerging Market and Developing Economies Maeghal Jain et.al. 2404.08423 null
2024-04-12 TDANet: Target-Directed Attention Network For Object-Goal Visual Navigation With Zero-Shot Ability Shiwei Lian et.al. 2404.08353 null
2024-04-12 Agile and versatile bipedal robot tracking control through reinforcement learning Jiayi Li et.al. 2404.08246 null
2024-04-11 High-Dimension Human Value Representation in Large Language Models Samuel Cahyawijaya et.al. 2404.07900 null
2024-04-11 Data-Driven System Identification of Quadrotors Subject to Motor Delays Jonas Eschmann et.al. 2404.07837 null
2024-04-11 On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning Giuseppe Canonaco et.al. 2404.07826 null
2024-04-11 An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization Minshuo Chen et.al. 2404.07771 null
2024-04-11 Differentially Private Reinforcement Learning with Self-Play Dan Qiao et.al. 2404.07559 null
2024-04-11 Enhancing Policy Gradient with the Polyak Step-Size Adaption Yunxiang Li et.al. 2404.07525 null
2024-04-11 Generative Probabilistic Planning for Optimizing Supply Chain Networks Hyung-il Ahn et.al. 2404.07511 null
2024-04-11 Neural Fault Injection: Generating Software Faults from Natural Language Domenico Cotroneo et.al. 2404.07491 null
2024-04-11 Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains Soichiro Nishimori et.al. 2404.07465 null
2024-04-11 UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning Saichao Liu et.al. 2404.07453 null
2024-04-10 Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery Zohre Karimi et.al. 2404.07185 null
2024-04-10 Adaptive behavior with stable synapses Cristiano Capone et.al. 2404.07150 null
2024-04-10 How Consistent are Clinicians? Evaluating the Predictability of Sepsis Disease Progression with Dynamics Models Unnseo Park et.al. 2404.07148 null
2024-04-10 Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection Linas Nasvytis et.al. 2404.07099 link
2024-04-10 Improving Language Model Reasoning with Self-motivated Learning Yunlong Feng et.al. 2404.07017 null
2024-04-10 Agent-driven Generative Semantic Communication for Remote Surveillance Wanting Yang et.al. 2404.06997 null
2024-04-10 Deep Reinforcement Learning for Mobile Robot Path Planning Hao Liu et.al. 2404.06974 null
2024-04-10 UAV-Assisted Enhanced Coverage and Capacity in Dynamic MU-mMIMO IoT Systems: A Deep Reinforcement Learning Approach MohammadMahdi Ghadaksaz et.al. 2404.06726 null
2024-04-10 Dual Ensemble Kalman Filter for Stochastic Optimal Control Anant A. Joshi et.al. 2404.06696 null
2024-04-09 Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective Victor-Alexandru Darvariu et.al. 2404.06492 null
2024-04-09 Deep Reinforcement Learning-Based Approach for a Single Vehicle Persistent Surveillance Problem with Fuel Constraints Hritik Bana et.al. 2404.06423 null
2024-04-09 The Power in Communication: Power Regularization of Communication for Autonomy in Cooperative Multi-Agent Reinforcement Learning Nancirose Piazza et.al. 2404.06387 null
2024-04-09 Policy-Guided Diffusion Matthew Thomas Jackson et.al. 2404.06356 link
2024-04-09 Generative Pre-Trained Transformer for Symbolic Regression Base In-Context Reinforcement Learning Yanjie Li et.al. 2404.06330 null
2024-04-09 Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning Xudong Yu et.al. 2404.06188 null
2024-04-09 A quantum information theoretic analysis of reinforcement learning-assisted quantum architecture search Abhishek Sadhu et.al. 2404.06174 null
2024-04-09 Adaptable Recovery Behaviors in Robotics: A Behavior Trees and Motion Generators (BTMG) Approach for Failure Management Faseeh Ahmad et.al. 2404.06129 null
2024-04-09 Automatic Configuration Tuning on Cloud Database: A Survey Limeng Zhang et.al. 2404.06043 null
2024-04-09 Commute with Community: Enhancing Shared Travel through Social Networks Tian Siyuan et.al. 2404.05987 null
2024-04-08 Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer Xinyang Gu et.al. 2404.05695 null
2024-04-08 YaART: Yet Another ART Rendering Technology Sergey Kastryulin et.al. 2404.05666 null
2024-04-08 Dynamic Backtracking in GFlowNet: Enhancing Decision Steps with Reward-Dependent Adjustment Mechanisms Shuai Guo et.al. 2404.05576 null
2024-04-08 Optimal Flow Admission Control in Edge Computing via Safe Reinforcement Learning A. Fox et.al. 2404.05564 null
2024-04-08 Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data Tim Baumgärtner et.al. 2404.05530 null
2024-04-08 CNN-based Game State Detection for a Foosball Table David Hagens et.al. 2404.05357 null
2024-04-08 Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models Yutao Ouyang et.al. 2404.05291 null
2024-04-08 SAFE-GIL: SAFEty Guided Imitation Learning Yusuf Umut Ciftci et.al. 2404.05249 null
2024-04-08 MeSA-DRL: Memory-Enhanced Deep Reinforcement Learning for Advanced Socially Aware Robot Navigation in Crowded Environments Mannan Saeed Muhammad et.al. 2404.05203 null
2024-04-08 Decision Transformer for Wireless Communications: A New Paradigm of Resource Management Jie Zhang et.al. 2404.05199 null
2024-04-05 Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution Tim Seyde et.al. 2404.04253 null
2024-04-05 Continual Policy Distillation of Reinforcement Learning-based Controllers for Soft Robotic In-Hand Manipulation Lanpei Li et.al. 2404.04219 null
2024-04-05 Enhancing IoT Intelligence: A Transformer-based Reinforcement Learning Methodology Gaith Rjoub et.al. 2404.04205 null
2024-04-05 Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report Jerrod Wigmore et.al. 2404.04106 null
2024-04-05 Dynamic Prompt Optimizing for Text-to-Image Generation Wenyi Mo et.al. 2404.04095 link
2024-04-05 Demonstration Guided Multi-Objective Reinforcement Learning Junlin Lu et.al. 2404.03997 null
2024-04-05 A proximal policy optimization based intelligent home solar management Kode Creer et.al. 2404.03888 null
2024-04-05 Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration Xudong Guo et.al. 2404.03869 null
2024-04-04 Exploration is Harder than Prediction: Cryptographically Separating Reinforcement Learning from Supervised Learning Noah Golowich et.al. 2404.03774 null
2024-04-04 A Reinforcement Learning based Reset Policy for CDCL SAT Solvers Chunxiao Li et.al. 2404.03753 null
2024-04-04 AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent Hanyu Lai et.al. 2404.03648 link
2024-04-04 Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention Ziru Liu et.al. 2404.03637 link
2024-04-04 Laser Learning Environment: A new environment for coordination-critical multi-agent tasks Yannick Molinghen et.al. 2404.03596 link
2024-04-04 Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm Miao Lu et.al. 2404.03578 null
2024-04-04 Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity Jake Varley et.al. 2404.03570 null
2024-04-04 AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale Adam Pardyl et.al. 2404.03482 link
2024-04-04 Integrating Hyperparameter Search into GramML Hernán Ceferino Vázquez et.al. 2404.03419 link
2024-04-04 Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought Jooyoung Lee et.al. 2404.03414 null
2024-04-04 SENSOR: Imitate Third-Person Expert’s Behaviors via Active Sensoring Kaichen Huang et.al. 2404.03386 null
2024-04-04 DIDA: Denoised Imitation Learning based on Domain Adaptation Kaichen Huang et.al. 2404.03382 null
2024-04-03 Learning Quadrupedal Locomotion via Differentiable Simulation Clemens Schwarke et.al. 2404.02887 null
2024-04-03 Unsupervised Learning of Effective Actions in Robotics Marko Zaric et.al. 2404.02728 link
2024-04-03 Reinforcement Learning in Categorical Cybernetics Jules Hedges et.al. 2404.02688 null
2024-04-03 Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering Abhijeet Pendyala et.al. 2404.02577 null
2024-04-03 SliceIt! – A Dual Simulator Framework for Learning Robot Food Slicing Cristian C. Beltran-Hernandez et.al. 2404.02569 link
2024-04-03 Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning Yi Shen et.al. 2404.02545 link
2024-04-03 Versatile Scene-Consistent Traffic Scenario Generation as Optimization with Diffusion Zhiyu Huang et.al. 2404.02524 null
2024-04-03 Joint Optimization on Uplink OFDMA and MU-MIMO for IEEE 802.11ax: Deep Hierarchical Reinforcement Learning Approach Hyeonho Noh et.al. 2404.02486 null
2024-04-03 Deep Reinforcement Learning for Traveling Purchaser Problems Haofeng Yuan et.al. 2404.02476 null
2024-04-03 Electric Vehicle Routing Problem for Emergency Power Supply: Towards Telecom Base Station Relief Daisuke Kikuta et.al. 2404.02448 link
2024-04-02 Tuning for the Unknown: Revisiting Evaluation Strategies for Lifelong RL Golnaz Mesbahi et.al. 2404.02113 null
2024-04-02 Emergence of Chemotactic Strategies with Multi-Agent Reinforcement Learning Samuel Tovey et.al. 2404.01999 null
2024-04-02 VLRM: Vision-Language Models act as Reward Models for Image Captioning Maksim Dzabraev et.al. 2404.01911 null
2024-04-02 Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation Carlos Plou et.al. 2404.01867 null
2024-04-02 Keeping Behavioral Programs Alive: Specifying and Executing Liveness Requirements Tom Yaacov et.al. 2404.01858 null
2024-04-02 EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking Stavros Orfanoudakis et.al. 2404.01849 null
2024-04-02 Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy Kyungbok Lee et.al. 2404.01830 null
2024-04-02 Imitation Game: A Model-based and Imitation Learning Deep Reinforcement Learning Hybrid Eric MSP Veith et.al. 2404.01794 null
2024-04-02 Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems Dapeng Zhi et.al. 2404.01769 null
2024-04-02 Asymptotics of Language Model Alignment Joy Qiping Yang et.al. 2404.01730 null
2024-03-29 Learning Visual Quadrupedal Loco-Manipulation from Demonstrations Zhengmao He et.al. 2403.20328 null
2024-03-29 Active flow control of a turbulent separation bubble through deep reinforcement learning Bernat Font et.al. 2403.20295 null
2024-03-29 Functional Bilevel Optimization for Machine Learning Ieva Petrulionyte et.al. 2403.20233 null
2024-03-29 Decentralized Multimedia Data Sharing in IoV: A Learning-based Equilibrium of Supply and Demand Jiani Fan et.al. 2403.20218 null
2024-03-29 Biologically-Plausible Topology Improved Spiking Actor Network for Efficient Deep Reinforcement Learning Duzhen Zhang et.al. 2403.20163 null
2024-03-29 CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening Hei Yi Mak et.al. 2403.20156 null
2024-03-29 A Learning-based Incentive Mechanism for Mobile AIGC Service in Decentralized Internet of Vehicles Jiani Fan et.al. 2403.20151 null
2024-03-29 Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation Jinyeong Park et.al. 2403.20109 link
2024-03-29 Reinforcement learning for graph theory, II. Small Ramsey numbers Mohammad Ghebleh et.al. 2403.20055 null
2024-03-29 Nonparametric Bellman Mappings for Reinforcement Learning: Application to Robust Adaptive Filtering Yuki Akiyama et.al. 2403.20020 null
2024-03-28 Human-compatible driving partners through data-regularized self-play reinforcement learning Daphne Cornelisse et.al. 2403.19648 link
2024-03-28 Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics Norman Di Palo et.al. 2403.19578 null
2024-03-28 Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment Alireza Ganjdanesh et.al. 2403.19490 null
2024-03-28 Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization Teodor V. Marinov et.al. 2403.19462 null
2024-03-28 RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation Chongkai Gao et.al. 2403.19460 null
2024-03-28 EDA-Driven Preprocessing for SAT Solving Zhengyuan Shi et.al. 2403.19446 null
2024-03-28 Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model Qi Gou et.al. 2403.19443 null
2024-03-28 Fine-Tuning Language Models with Reward Learning on Policy Hao Lang et.al. 2403.19279 link
2024-03-28 Removing the need for ground truth UWB data collection: self-supervised ranging error correction using deep reinforcement learning Dieter Coppens et.al. 2403.19262 null
2024-03-28 Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement Learning Wei Duan et.al. 2403.19253 null
2024-03-27 Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment Li Siyao et.al. 2403.18811 null
2024-03-27 CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning Elliot Chane-Sane et.al. 2403.18765 null
2024-03-27 Probabilistic Model Checking of Stochastic Reinforcement Learning Policies Dennis Gross et.al. 2403.18725 null
2024-03-27 FPGA-Based Neural Thrust Controller for UAVs Sharif Azem et.al. 2403.18703 null
2024-03-27 Safe and Robust Reinforcement-Learning: Principles and Practice Taku Yamagata et.al. 2403.18539 null
2024-03-27 Bridging the Gap: Regularized Reinforcement Learning for Improved Classical Motion Planning with Safety Modules Elias Goldsztejn et.al. 2403.18524 null
2024-03-27 VersaT2I: Improving Text-to-Image Models with Versatile Reward Jianshu Guo et.al. 2403.18493 null
2024-03-27 Scaling Vision-and-Language Navigation With Offline RL Valay Bundele et.al. 2403.18454 null
2024-03-27 FRESCO: Federated Reinforcement Energy System for Cooperative Optimization Nicolas Mauricio Cuadrado et.al. 2403.18444 null
2024-03-27 Reinforcement learning for graph theory, I. Reimplementation of Wagner’s approach Salem Al-Yakoob et.al. 2403.18429 null
2024-03-26 TractOracle: towards an anatomically-informed reward function for RL-based tractography Antoine Théberge et.al. 2403.17845 null
2024-03-26 Learning the Optimal Power Flow: Environment Design Matters Thomas Wolgast et.al. 2403.17831 link
2024-03-26 Depending on yourself when you should: Mentoring LLM with RL agents to become the master in cybersecurity games Yikuan Yan et.al. 2403.17674 null
2024-03-26 Learning Goal-Directed Object Pushing in Cluttered Scenes with Location-Based Attention Nils Dengler et.al. 2403.17667 null
2024-03-26 Uncertainty-aware Distributional Offline Reinforcement Learning Xiaocong Chen et.al. 2403.17646 null
2024-03-26 PeersimGym: An Environment for Solving the Task Offloading Problem with Reinforcement Learning Frederico Metelo et.al. 2403.17637 null
2024-03-26 Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems Siyu Wang et.al. 2403.17634 null
2024-03-26 LASIL: Learner-Aware Supervised Imitation Learning For Long-term Microscopic Traffic Simulation Ke Guo et.al. 2403.17601 link
2024-03-26 Towards a Zero-Data, Controllable, Adaptive Dialog System Dirk Väth et.al. 2403.17582 null
2024-03-26 VDSC: Enhancing Exploration Timing with Value Discrepancy and State Counts Marius Captari et.al. 2403.17542 null
2024-03-25 An LLM-Based Digital Twin for Optimizing Human-in-the Loop Systems Hanqing Yang et.al. 2403.16809 null
2024-03-25 Enhancing Software Effort Estimation through Reinforcement Learning-based Project Management-Oriented Feature Selection Haoyang Chen et.al. 2403.16749 null
2024-03-25 Deep Reinforcement Learning and Mean-Variance Strategies for Responsible Portfolio Optimization Fernando Acero et.al. 2403.16667 null
2024-03-25 Skill Q-Network: Learning Adaptive Skill Ensemble for Mapless Navigation in Unknown Environments Hyunki Seong et.al. 2403.16664 null
2024-03-25 Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL Osama Ahmad et.al. 2403.16652 null
2024-03-25 CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment Feiteng Fang et.al. 2403.16649 link
2024-03-25 Counter-example guided Imitation Learning of Feedback Controllers from Temporal Logic Specifications Thao Dang et.al. 2403.16593 null
2024-03-25 Arm-Constrained Curriculum Learning for Loco-Manipulation of the Wheel-Legged Robot Zifan Wang et.al. 2403.16535 link
2024-03-25 Towards Cooperative Maneuver Planning in Mixed Traffic at Urban Intersections Marvin Klimke et.al. 2403.16478 null
2024-03-25 If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions Reza Esfandiarpoor et.al. 2403.16442 link
2024-03-25 Physics-informed RL for Maximal Safety Probability Estimation Hikaru Hoshino et.al. 2403.16391 null
2024-03-25 Learning Action-based Representations Using Invariance Max Rudolph et.al. 2403.16369 null
2024-03-22 Can large language models explore in-context? Akshay Krishnamurthy et.al. 2403.15371 null
2024-03-22 Planning with a Learned Policy Basis to Optimally Solve Complex Tasks Guillermo Infante et.al. 2403.15301 null
2024-03-22 Blockchain-based Pseudonym Management for Vehicle Twin Migrations in Vehicular Edge Metaverse Jiawen Kang et.al. 2403.15285 null
2024-03-22 Parametric PDE Control with Deep Reinforcement Learning and Differentiable L0-Sparse Polynomial Policies Nicolò Botteghi et.al. 2403.15267 null
2024-03-22 Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement Jonathan Pirnay et.al. 2403.15180 null
2024-03-22 Subequivariant Reinforcement Learning Framework for Coordinated Motion Control Haoyu Wang et.al. 2403.15100 null
2024-03-22 Improved Long Short-Term Memory-based Wastewater Treatment Simulators for Deep Reinforcement Learning Esmaeel Mohammadi et.al. 2403.15091 null
2024-03-22 Automated Feature Selection for Inverse Reinforcement Learning Daulet Baimukashev et.al. 2403.15079 null
2024-03-22 Testing for Fault Diversity in Reinforcement Learning Quentin Mazouni et.al. 2403.15065 null
2024-03-22 Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation Zhenrui Yue et.al. 2403.14952 null
2024-03-21 Rethinking Adversarial Inverse Reinforcement Learning: From the Angles of Policy Imitation and Transferable Reward Recovery Yangchun Zhang et.al. 2403.14593 null
2024-03-21 A Mathematical Introduction to Deep Reinforcement Learning for 5G/6G Applications Farhad Rezazadeh et.al. 2403.14516 null
2024-03-21 Constrained Reinforcement Learning with Smoothed Log Barrier Function Baohe Zhang et.al. 2403.14508 null
2024-03-21 On the continuity and smoothness of the value function in reinforcement learning and optimal control Hans Harder et.al. 2403.14432 null
2024-03-21 Emergent communication and learning pressures in language models: a language evolution perspective Lukas Galke et.al. 2403.14427 null
2024-03-21 Task-optimal data-driven surrogate models for eNMPC via differentiable simulation and optimization Daniel Mayfrank et.al. 2403.14425 null
2024-03-21 A reinforcement learning guided hybrid evolutionary algorithm for the latency location routing problem Yuji Zou et.al. 2403.14405 link
2024-03-21 Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression Fernando Acero et.al. 2403.14328 null
2024-03-21 Bayesian Optimization for Sample-Efficient Policy Improvement in Robotic Manipulation Adrian Röfer et.al. 2403.14305 null
2024-03-21 Reactor Optimization Benchmark by Reinforcement Learning Deborah Schwarcz et.al. 2403.14273 link
2024-03-20 Information-Theoretic Distillation for Reference-less Summarization Jaehun Jung et.al. 2403.13780 null
2024-03-20 Towards Principled Representation Learning from Videos for Reinforcement Learning Dipendra Misra et.al. 2403.13765 null
2024-03-20 Reinforcement Learning for Online Testing of Autonomous Driving Systems: a Replication and Extension Study Luca Giamattei et.al. 2403.13729 null
2024-03-20 Reward-Driven Automated Curriculum Learning for Interaction-Aware Self-Driving at Unsignalized Intersections Zengqi Peng et.al. 2403.13674 null
2024-03-20 Multi-agent Reinforcement Traffic Signal Control based on Interpretable Influence Mechanism and Biased ReLU Approximation Zhiyue Luo et.al. 2403.13639 null
2024-03-20 Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation Do June Min et.al. 2403.13578 link
2024-03-20 GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot Wenxuan Song et.al. 2403.13358 null
2024-03-20 Waypoint-Based Reinforcement Learning for Robot Manipulation Tasks Shaunak A. Mehta et.al. 2403.13281 null
2024-03-20 Federated reinforcement learning for robot motion planning with zero-shot generalization Zhenyuan Yuan et.al. 2403.13245 null
2024-03-20 Graph Attention Network-based Block Propagation with Optimal AoI and Reputation in Web 3.0 Jiana Liao et.al. 2403.13237 null
2024-03-19 Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes He Wang et.al. 2403.12946 null
2024-03-19 Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers Vidhi Jain et.al. 2403.12943 null
2024-03-19 Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types Rui Liu et.al. 2403.12891 null
2024-03-19 HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning Fucai Ke et.al. 2403.12884 null
2024-03-19 Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning Mirco Theile et.al. 2403.12856 null
2024-03-19 Policy Bifurcation in Safe Reinforcement Learning Wenjun Zou et.al. 2403.12847 link
2024-03-19 AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents Jieming Cui et.al. 2403.12835 null
2024-03-19 Oriented and Non-oriented Cubical Surfaces in The Penteract Manuel Estevez et.al. 2403.12825 null
2024-03-19 Dynamic Manipulation of Deformable Objects using Imitation Learning with Adaptation to Hardware Constraints Eric Hannus et.al. 2403.12685 null
2024-03-19 Automated Contrastive Learning Strategy Search for Time Series Baoyu Jing et.al. 2403.12641 null
2024-03-18 The Value of Reward Lookahead in Reinforcement Learning Nadav Merlis et.al. 2403.11637 null
2024-03-18 Offline Multitask Representation Learning for Reinforcement Learning Haque Ishfaq et.al. 2403.11574 null
2024-03-18 Reinforcement Learning with Token-level Feedback for Controllable Text Generation Wendi Li et.al. 2403.11558 null
2024-03-18 TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling Weiran Chen et.al. 2403.11550 null
2024-03-18 State-Separated SARSA: A Practical Sequential Decision-Making Algorithm with Recovering Rewards Yuto Tanimoto et.al. 2403.11520 link
2024-03-18 Demystifying Deep Reinforcement Learning-Based Autonomous Vehicle Decision-Making Hanxi Wan et.al. 2403.11432 null
2024-03-18 Variational Sampling of Temporal Trajectories Jurijs Nazarovs et.al. 2403.11418 null
2024-03-17 Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective Muhammad Aneeq uz Zaman et.al. 2403.11345 null
2024-03-17 Causality from Bottom to Top: A Survey Abraham Itzhak Weinberg et.al. 2403.11219 null
2024-03-17 Continuous Jumping of a Parallel Wire-Driven Monopedal Robot RAMIEL Using Reinforcement Learning Kento Kawaharazuka et.al. 2403.11205 null
2024-03-14 Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning Zhishuai Liu et.al. 2403.09621 null
2024-03-14 ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models Runyu Ma et.al. 2403.09583 null
2024-03-14 A Reinforcement Learning Approach to Dairy Farm Battery Management using Q Learning Nawazish Ali et.al. 2403.09499 null
2024-03-14 Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision Zhiqing Sun et.al. 2403.09472 link
2024-03-14 A Deep Reinforcement Learning Approach for Autonomous Reconfigurable Intelligent Surfaces Hyuckjin Choi et.al. 2403.09270 null
2024-03-14 Leveraging Constraint Programming in a Deep Learning Approach for Dynamically Solving the Flexible Job-Shop Scheduling Problem Imanol Echeverria et.al. 2403.09249 null
2024-03-14 Rumor Mitigation in Social Media Platforms with Deep Reinforcement Learning Hongyuan Su et.al. 2403.09217 null
2024-03-14 MetroGNN: Metro Network Expansion with Reinforcement Learning Hongyuan Su et.al. 2403.09197 null
2024-03-14 SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning Nicholas Zolman et.al. 2403.09110 link
2024-03-14 CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences Martin Weyssow et.al. 2403.09032 link
2024-03-13 TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning Shangding Gu et.al. 2403.08694 null
2024-03-13 Digital Twin-assisted Reinforcement Learning for Resource-aware Microservice Offloading in Edge Computing Xiangchun Chen et.al. 2403.08687 null
2024-03-13 Meta Reinforcement Learning for Resource Allocation in Aerial Active-RIS-assisted Networks with Rate-Splitting Multiple Access Sajad Faramarzi et.al. 2403.08648 null
2024-03-13 Human Alignment of Large Language Models through Online Preference Optimisation Daniele Calandriello et.al. 2403.08635 null
2024-03-13 Specification Overfitting in Artificial Intelligence Benjamin Roth et.al. 2403.08425 null
2024-03-13 Optimizing Risk-averse Human-AI Hybrid Teams Andrew Fuchs et.al. 2403.08386 null
2024-03-13 Learning to Describe for Predicting Zero-shot Drug-Drug Interactions Fangqi Zhu et.al. 2403.08377 link
2024-03-13 LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments Maonan Wang et.al. 2403.08337 link
2024-03-14 HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback Ang Li et.al. 2403.08309 null
2024-03-13 SpaceOctopus: An Octopus-inspired Motion Planning Framework for Multi-arm Space Robot Wenbo Zhao et.al. 2403.08219 null
2024-03-12 TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation Shivin Dass et.al. 2403.07869 null
2024-03-12 Exploring Safety Generalization Challenges of Large Language Models via Code Qibing Ren et.al. 2403.07865 null
2024-03-12 DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation Chen Wang et.al. 2403.07788 null
2024-03-12 Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards Wei Shen et.al. 2403.07708 null
2024-03-12 Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning Motoki Omura et.al. 2403.07704 null
2024-03-12 Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation Michael Ogezi et.al. 2403.07605 null
2024-03-12 An Improved Strategy for Blood Glucose Control Using Multi-Step Deep Reinforcement Learning Weiwei Gu et.al. 2403.07566 null
2024-03-12 Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding Huijie Tang et.al. 2403.07559 link
2024-03-12 Constrained Optimal Fuel Consumption of HEV: A Constrained Reinforcement Learning Approach Shuchang Yan et.al. 2403.07503 null
2024-03-12 Optimization of Pressure Management Strategies for Geological CO2 Sequestration Using Surrogate Model-based Reinforcement Learning Jungang Chen et.al. 2403.07360 null
2024-03-11 Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts Onur Celik et.al. 2403.06966 null
2024-03-11 Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning Junseok Park et.al. 2403.06880 null
2024-03-11 Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification Joar Skalse et.al. 2403.06854 null
2024-03-11 In-context Exploration-Exploitation for Reinforcement Learning Zhenwen Dai et.al. 2403.06826 null
2024-03-11 ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment Hao-Lun Hsu et.al. 2403.06814 null
2024-03-11 From Factor Models to Deep Learning: Machine Learning in Reshaping Empirical Asset Pricing Junyi Ye et.al. 2403.06779 null
2024-03-11 ALaRM: Align Language Models via Hierarchical Rewards Modeling Yuhang Lai et.al. 2403.06754 null
2024-03-11 Generalising Multi-Agent Cooperation through Task-Agnostic Communication Dulhan Jayalath et.al. 2403.06750 link
2024-03-11 Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback Adarsh N L et.al. 2403.06735 null
2024-03-11 Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning Zijian Zhou et.al. 2403.06728 null
2024-03-08 Will GPT-4 Run DOOM? Adrian de Wynter et.al. 2403.05468 null
2024-03-08 Switching the Loss Reduces the Cost in Batch Reinforcement Learning Alex Ayoub et.al. 2403.05385 null
2024-03-08 Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation Xiaoying Zhang et.al. 2403.05171 null
2024-03-08 Inverse Design of Photonic Crystal Surface Emitting Lasers is a Sequence Modeling Problem Ceyao Zhang et.al. 2403.05149 null
2024-03-08 ChatUIE: Exploring Chat-based Unified Information Extraction using Large Language Models Jun Xu et.al. 2403.05132 null
2024-03-08 RLPeri: Accelerating Visual Perimetry Test with Reinforcement Learning and Convolutional Feature Extraction Tanvi Verma et.al. 2403.05112 null
2024-03-08 Efficient Data Collection for Robotic Manipulation via Compositional Generalization Jensen Gao et.al. 2403.05110 null
2024-03-08 Simulating Battery-Powered TinyML Systems Optimised using Reinforcement Learning in Image-Based Anomaly Detection Jared M. Ping et.al. 2403.05106 null
2024-03-08 Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning Hongjoon Ahn et.al. 2403.05066 null
2024-03-08 Aligning Large Language Models for Controllable Recommendations Wensheng Lu et.al. 2403.05063 null
2024-03-07 Teaching Large Language Models to Reason with Reinforcement Learning Alex Havrilla et.al. 2403.04642 null
2024-03-07 Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace Léopold Maytié et.al. 2403.04588 null
2024-03-07 Learning Agility Adaptation for Flight in Clutter Guangyu Zhao et.al. 2403.04586 null
2024-03-07 Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition Long-Fei Li et.al. 2403.04568 null
2024-03-07 Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation Fabian Otto et.al. 2403.04453 null
2024-03-07 Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation Tairan He et.al. 2403.04436 null
2024-03-07 iTRPL: An Intelligent and Trusted RPL Protocol based on Multi-Agent Reinforcement Learning Debasmita Dey et.al. 2403.04416 null
2024-03-07 Model-free $H_{\infty}$ control of Itô stochastic system via off-policy reinforcement learning Jing Guo et.al. 2403.04412 null
2024-03-07 Model-Free Load Frequency Control of Nonlinear Power Systems Based on Deep Reinforcement Learning Xiaodi Chen et.al. 2403.04374 null
2024-03-07 Symmetry Considerations for Learning Task Symmetric Robot Policies Mayank Mittal et.al. 2403.04359 null
2024-03-06 3D Diffusion Policy Yanjie Ze et.al. 2403.03954 link
2024-03-06 Stop Regressing: Training Value Functions via Classification for Scalable Deep RL Jesse Farebrother et.al. 2403.03950 null
2024-03-06 Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation Marcel Torne et.al. 2403.03949 null
2024-03-06 Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning Zifan Xu et.al. 2403.03848 null
2024-03-06 A Survey on Applications of Reinforcement Learning in Spatial Resource Allocation Di Zhang et.al. 2403.03643 null
2024-03-06 Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem Yuhong Sun et.al. 2403.03558 link
2024-03-06 Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning Zida Wu et.al. 2403.03552 null
2024-03-05 RACE-SM: Reinforcement Learning Based Autonomous Control for Social On-Ramp Merging Jordan Poots et.al. 2403.03359 null
2024-03-05 Bi-KVIL: Keypoints-based Visual Imitation Learning of Bimanual Manipulation Tasks Jianfeng Gao et.al. 2403.03270 null
2024-03-05 Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination Liangzhou Wang et.al. 2403.03172 null
2024-03-05 Leveraging Federated Learning and Edge Computing for Recommendation Systems within Cloud Computing Networks Yaqian Qi et.al. 2403.03165 null
2024-03-05 Language Guided Exploration for RL Agents in Text Environments Hitesh Golchha et.al. 2403.03141 null
2024-03-05 SplAgger: Split Aggregation for Meta-Reinforcement Learning Jacob Beck et.al. 2403.03020 null
2024-03-05 Autonomous vehicle decision and control through reinforcement learning with traffic flow randomization Yuan Lin et.al. 2403.02882 null
2024-03-05 SpaceHopper: A Small-Scale Legged Robot for Exploring Low-Gravity Celestial Bodies Alexander Spiridonov et.al. 2403.02831 null
2024-03-05 A Zero-Shot Reinforcement Learning Strategy for Autonomous Guidewire Navigation Valentina Scarponi et.al. 2403.02777 null
2024-03-05 RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches Priya Sundaresan et.al. 2403.02709 null
2024-03-05 Fighting Game Adaptive Background Music for Improved Gameplay Ibrahim Khan et.al. 2403.02701 null
2024-03-05 PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning Ke Zhang et.al. 2403.02635 null
2024-03-02 Improving the Validity of Automatically Generated Feedback via Reinforcement Learning Alexander Scarlatos et.al. 2403.01304 link
2024-03-02 Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey Hamza Kheddar et.al. 2403.01255 null
2024-03-02 Balancing Exploration and Exploitation in LLM using Soft RLLF for Enhanced Negation Understanding Ha-Thanh Nguyen et.al. 2403.01185 null
2024-03-02 Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning Hyungho Na et.al. 2403.01112 null
2024-03-02 Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL) Noah Ford et.al. 2403.01059 null
2024-03-01 A Holistic Power Optimization Approach for Microgrid Control Based on Deep Reinforcement Learning Fulong Yao et.al. 2403.01013 null
2024-03-01 Policy Optimization for PDE Control with a Warm Start Xiangyuan Zhang et.al. 2403.01005 null
2024-03-01 On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games Awni Altabaa et.al. 2403.00993 null
2024-03-01 SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation Noriaki Hirose et.al. 2403.00991 null
2024-03-01 Scale-free Adversarial Reinforcement Learning Mingyu Chen et.al. 2403.00930 null
2024-02-29 Curiosity-driven Red-teaming for Large Language Models Zhang-Wei Hong et.al. 2402.19464 link
2024-02-29 ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL Yifei Zhou et.al. 2402.19446 link
2024-02-29 Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation Jonathan Yang et.al. 2402.19432 null
2024-02-29 Understanding Iterative Combinatorial Auction Designs via Multi-Agent Reinforcement Learning Greg d’Eon et.al. 2402.19420 null
2024-02-29 RL-GPT: Integrating Reinforcement Learning and Code-as-policy Shaoteng Liu et.al. 2402.19299 null
2024-02-29 StiefelGen: A Simple, Model Agnostic Approach for Time Series Data Augmentation over Riemannian Manifolds Prasad Cheema et.al. 2402.19287 null
2024-02-29 Adaptive Testing Environment Generation for Connected and Automated Vehicles with Dense Reinforcement Learning Jingxuan Yang et.al. 2402.19275 null
2024-02-29 Deep Reinforcement Learning: A Convex Optimization Approach Ather Gattami et.al. 2402.19212 null
2024-02-29 ARMCHAIR: integrated inverse reinforcement learning and model predictive control for human-robot collaboration Angelo Caregnato-Neto et.al. 2402.19128 null
2024-02-29 Temporal-Aware Deep Reinforcement Learning for Energy Storage Bidding in Energy and Contingency Reserve Markets Jinhao Li et.al. 2402.19110 null
2024-02-28 Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards Haoxiang Wang et.al. 2402.18571 link
2024-02-28 Unifying F1TENTH Autonomous Racing: Survey, Methods and Benchmarks Benjamin David Evans et.al. 2402.18558 null
2024-02-28 Human-Centric Aware UAV Trajectory Planning in Search and Rescue Missions Employing Multi-Objective Reinforcement Learning with AHP and Similarity-Based Experience Replay Mahya Ramezani et.al. 2402.18487 null
2024-02-28 FinAgent: A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist Wentao Zhang et.al. 2402.18485 null
2024-02-28 Implementing Online Reinforcement Learning with Clustering Neural Networks James E. Smith et.al. 2402.18472 null
2024-02-28 Why Do Animals Need Shaping? A Theory of Task Composition and Curriculum Learning Jin Hwa Lee et.al. 2402.18361 null
2024-02-28 Solving Multi-Entity Robotic Problems Using Permutation Invariant Neural Networks Tianxu An et.al. 2402.18345 null
2024-02-28 Whole-body Humanoid Robot Locomotion with Human Reference Qiang Zhang et.al. 2402.18294 null
2024-02-28 Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of Pre-trained Language Models with Proximal Policy Optimization Shuo Yang et.al. 2402.18284 null
2024-02-28 Reinforcement Learning and Graph Neural Networks for Probabilistic Risk Assessment Joachim Grimstad et.al. 2402.18246 null
