Reinforcement Learning

| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-18 | Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification | Qihao Liu et.al. | 2512.16921 | null |
| 2025-12-18 | AdaTooler-V: Adaptive Tool-Use for Images and Videos | Chaoyang Wang et.al. | 2512.16918 | null |
| 2025-12-18 | Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning | Qihao Liu et.al. | 2512.16917 | null |
| 2025-12-18 | Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward | Peter Chen et.al. | 2512.16912 | null |
| 2025-12-18 | Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning | Andrew Wagenmaker et.al. | 2512.16911 | null |
| 2025-12-18 | MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning | Yuanchen Ju et.al. | 2512.16909 | null |
| 2025-12-18 | AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning | Tzu-Han Lin et.al. | 2512.16883 | null |
| 2025-12-18 | A survey of the orienteering problem: model evolution, algorithmic advances, and future directions | Songhao Shen et.al. | 2512.16865 | null |
| 2025-12-18 | RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing | Tianyuan Qu et.al. | 2512.16864 | null |
| 2025-12-18 | ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning | Zihan Zhou et.al. | 2512.16861 | null |
| 2025-12-18 | Meta-RL Induces Exploration in Language Agents | Yulun Jiang et.al. | 2512.16848 | null |
| 2025-12-18 | Coordinated Anti-Jamming Resilience in Swarm Networks via Multi-Agent Reinforcement Learning | Bahman Abolhassani et.al. | 2512.16813 | null |
| 2025-12-18 | Olaf: Bringing an Animated Character to Life in the Physical World | David Müller et.al. | 2512.16705 | null |
| 2025-12-18 | JustRL: Scaling a 1.5B LLM with a Simple RL Recipe | Bingxiang He et.al. | 2512.16649 | null |
| 2025-12-18 | Implementing a Sharia Chatbot as a Consultation Medium for Questions About Islam | Wisnu Uriawan et.al. | 2512.16644 | null |
| 2025-12-18 | Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game | Barna Pásztor et.al. | 2512.16626 | null |
| 2025-12-18 | Non-Asymptotic Global Convergence of PPO-Clip | Yin Liu et.al. | 2512.16565 | null |
| 2025-12-18 | ParamExplorer: A framework for exploring parameters in generative art | Julien Gachadoat et.al. | 2512.16529 | null |
| 2025-12-18 | Guiding Perception-Reasoning Closer to Human in Blind Image Quality Assessment | Yuan Li et.al. | 2512.16484 | null |
| 2025-12-18 | E-SDS: Environment-aware See it, Do it, Sorted - Automated Environment-Aware Reinforcement Learning for Humanoid Locomotion | Enis Yalcin et.al. | 2512.16446 | null |
| 2025-12-18 | StarCraft+: Benchmarking Multi-agent Algorithms in Adversary Paradigm | Yadong Li et.al. | 2512.16444 | null |
| 2025-12-18 | NDRL: Cotton Irrigation and Nitrogen Application with Nested Dual-Agent Reinforcement Learning | Ruifeng Xu et.al. | 2512.16408 | null |
| 2025-12-18 | Hypernetworks That Evolve Themselves | Joachim Winther Pedersen et.al. | 2512.16406 | null |
| 2025-12-18 | Machine Learning-based Optimal Control for Colloidal Self-Assembly | Andres Lizano-Villalobos et.al. | 2512.16402 | null |
| 2025-12-18 | ManiLong-Shot: Interaction-Aware One-Shot Imitation Learning for Long-Horizon Manipulation | Zixuan Chen et.al. | 2512.16302 | null |
| 2025-12-18 | Simultaneous Secrecy and Covert Communications (SSACC) in Mobility-Aware RIS-Aided Networks | Yanyu Cheng et.al. | 2512.16224 | null |
| 2025-12-18 | Visual Alignment of Medical Vision-Language Models for Grounded Radiology Report Generation | Sarosij Bose et.al. | 2512.16201 | null |
| 2025-12-18 | MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation | Pengyu Wang et.al. | 2512.16145 | null |
| 2025-12-18 | INTELLECT-3: Technical Report | Prime Intellect Team et.al. | 2512.16144 | null |
| 2025-12-17 | Techno-economic optimization of a heat-pipe microreactor, part I: theory and cost optimization | Paul Seurin et.al. | 2512.16032 | null |
| 2025-12-17 | Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models | Caner Erden et.al. | 2512.15973 | null |
| 2025-12-17 | Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning | Polaris Jhandi et.al. | 2512.15943 | null |
| 2025-12-17 | DSO: Direct Steering Optimization for Bias Mitigation | Lucas Monteiro Paes et.al. | 2512.15926 | null |
| 2025-12-15 | Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT) | Akhil Sharma et.al. | 2512.15790 | null |
| 2025-12-17 | Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning | Zhenwen Liang et.al. | 2512.15687 | null |
| 2025-12-17 | Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning | Jiaqi Xu et.al. | 2512.15662 | null |
| 2025-12-17 | Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction | Mathieu Blondel et.al. | 2512.15605 | null |
| 2025-12-17 | Deep Reinforcement Learning for EH-Enabled Cognitive-IoT Under Jamming Attacks | Nadia Abdolkhani et.al. | 2512.15558 | null |
| 2025-12-17 | Autonomous Pressure Control in MuVacAS via Deep Reinforcement Learning and Deep Learning Surrogate Models | Guillermo Rodriguez-Llorente et.al. | 2512.15521 | null |
| 2025-12-17 | Double Horizon Model-Based Policy Optimization | Akihiro Kubo et.al. | 2512.15439 | null |
| 2025-12-17 | FM-EAC: Feature Model-based Enhanced Actor-Critic for Multi-Task Control in Dynamic Environments | Quanxi Zhou et.al. | 2512.15430 | null |
| 2025-12-17 | Can AI Generate more Comprehensive Test Scenarios? Review on Automated Driving Systems Test Scenario Generation Methods | Ji Zhou et.al. | 2512.15422 | null |
| 2025-12-17 | EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning | Jianfei Ma et.al. | 2512.15405 | null |
| 2025-12-17 | Graph Contextual Reinforcement Learning for Efficient Directed Controller Synthesis | Toshihide Ubukata et.al. | 2512.15295 | null |
| 2025-12-17 | Learning-Based Phase Shift Optimization of Liquid Crystal RIS in Dynamic mmWave Networks | Le Hao et.al. | 2512.15279 | null |
| 2025-12-17 | Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning | Yiliu Sun et.al. | 2512.15274 | null |
| 2025-12-17 | EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence | Jiaxu Wan et.al. | 2512.15160 | null |
| 2025-12-17 | Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning | Weiqin Wang et.al. | 2512.15146 | null |
| 2025-12-17 | Automatic Reward Shaping from Multi-Objective Human Heuristics | Yuqing Xie et.al. | 2512.15120 | null |
| 2025-12-17 | QoS-Aware Hierarchical Reinforcement Learning for Joint Link Selection and Trajectory Optimization in SAGIN-Supported UAV Mobility Management | Jiayang Wan et.al. | 2512.15119 | null |
| 2025-12-17 | Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models | Jinwu Hu et.al. | 2512.15089 | null |
| 2025-12-17 | Deep Reinforcement Learning for Joint Time and Power Management in SWIPT-EH CIoT | Nadia Abdolkhani et.al. | 2512.15062 | null |
| 2025-12-17 | Spectral Representation-based Reinforcement Learning | Chenxiao Gao et.al. | 2512.15036 | null |
| 2025-12-17 | ISS Policy: Scalable Diffusion Policy with Implicit Scene Supervision | Wenlong Xia et.al. | 2512.15020 | null |
| 2025-12-17 | Multi-Objective Bayesian Optimization of Deep Reinforcement Learning for Environmental, Social, and Governance (ESG) Financial Portfolio Management | E. C. Garrido-Merchán et.al. | 2512.14992 | null |
| 2025-12-17 | Adaptive Partitioning and Learning for Stochastic Control of Diffusion Processes | Hanqing Jin et.al. | 2512.14991 | null |
| 2025-12-16 | Puzzle Curriculum GRPO for Vision-Centric Reasoning | Ahmadreza Jeddi et.al. | 2512.14944 | null |
| 2025-12-16 | Imitation Learning for Multi-turn LM Agents via On-policy Expert Corrections | Niklas Lauffer et.al. | 2512.14895 | null |
| 2025-12-16 | Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse | Jingwei Chen et.al. | 2512.14879 | null |
| 2025-12-16 | TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs | Jun Zhang et.al. | 2512.14698 | null |
| 2025-12-16 | CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives | Zihan Wang et.al. | 2512.14696 | null |
| 2025-12-16 | Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes | Alessandro Trapasso et.al. | 2512.14617 | null |
| 2025-12-16 | RecGPT-V2 Technical Report | Chao Yi et.al. | 2512.14503 | null |
| 2025-12-16 | Hybrid Cognitive IoT with Cooperative Caching and SWIPT-EH: A Hierarchical Reinforcement Learning Framework | Nadia Abdolkhani et.al. | 2512.14488 | null |
| 2025-12-16 | Context-Picker: Dynamic context selection using multi-stage reinforcement learning | Siyuan Zhu et.al. | 2512.14465 | null |
| 2025-12-16 | A data-physics hybrid generative model for patient-specific post-stroke motor rehabilitation using wearable sensor data | Yanning Dai et.al. | 2512.14329 | null |
| 2025-12-16 | Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations | Xudong Han et.al. | 2512.14321 | null |
| 2025-12-16 | A Threshold-Triggered Deep Q-Network-Based Framework for Self-Healing in Autonomic Software-Defined IIoT-Edge Networks | Agrippina Mwangi et.al. | 2512.14297 | null |
| 2025-12-16 | GLM-TTS Technical Report | Jiayan Cui et.al. | 2512.14291 | null |
| 2025-12-16 | Understanding and Improving Hyperbolic Deep Reinforcement Learning | Timo Klein et.al. | 2512.14202 | null |
| 2025-12-16 | Incentivizing Tool-augmented Thinking with Images for Medical Image Analysis | Yankai Jiang et.al. | 2512.14157 | null |
| 2025-12-16 | A First-Order Logic-Based Alternative to Reward Models in RLHF | Chunjin Jian et.al. | 2512.14100 | null |
| 2025-12-16 | RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees | Junjie Ma et.al. | 2512.14069 | null |
| 2025-12-16 | Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning | Amir M. Soufi Enayati et.al. | 2512.14057 | null |
| 2025-12-16 | OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving | Zhenguo Zhang et.al. | 2512.14044 | null |
| 2025-12-16 | Sample-Efficient Robot Skill Learning for Construction Tasks: Benchmarking Hierarchical Reinforcement Learning and Vision-Language-Action VLA Model | Zhaofeng Hu et.al. | 2512.14031 | null |
| 2025-12-16 | Cooperative Caching Towards Efficient Spectrum Utilization in Cognitive-IoT Networks | Nadia Abdolkhani et.al. | 2512.14029 | null |
| 2025-12-16 | Hierarchical Deep Reinforcement Learning for Robust Access in Cognitive IoT Networks under Smart Jamming Attacks | Nadia Abdolkhani et.al. | 2512.14013 | null |
| 2025-12-15 | Adaptive digital twins for predictive decision-making: Online Bayesian learning of transition dynamics | Eugenio Varetti et.al. | 2512.13919 | null |
| 2025-12-15 | Group-Theoretic Reinforcement Learning of Dynamical Decoupling Sequences | Charles Marrder et.al. | 2512.13890 | null |
| 2025-12-15 | SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning | Jitesh Jain et.al. | 2512.13874 | null |
| 2025-12-15 | Explainable reinforcement learning from human feedback to improve alignment | Shicheng Liu et.al. | 2512.13837 | null |
| 2025-12-13 | RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing | Yuhan Tang et.al. | 2512.13727 | null |
| 2025-12-13 | Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce | Sayak Chakrabarty et.al. | 2512.13726 | null |
| 2025-12-15 | AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection | Junwen Miao et.al. | 2512.13671 | null |
| 2025-12-15 | A Scientific Reasoning Model for Organic Synthesis Procedure Generation | Guoqing Liu et.al. | 2512.13668 | null |
| 2025-12-15 | Advancing Machine Learning Optimization of Chiral Photonic Metasurface: Comparative Study of Neural Network and Genetic Algorithm Approaches | Davide Filippozzi et.al. | 2512.13656 | null |
| 2025-12-15 | MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning | Haoyu Fu et.al. | 2512.13636 | null |
| 2025-12-15 | SCR2-ST: Combine Single Cell with Spatial Transcriptomics for Efficient Active Sampling via Reinforcement Learning | Junchao Zhu et.al. | 2512.13635 | null |
| 2025-12-15 | Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models | Boxin Wang et.al. | 2512.13607 | null |
| 2025-12-15 | Image Diffusion Preview with Consistency Solver | Fu-Yun Wang et.al. | 2512.13592 | link |
| 2025-12-15 | MMhops-R1: Multimodal Multi-hop Reasoning | Tao Zhang et.al. | 2512.13573 | null |
| 2025-12-15 | Memory in the Age of AI Agents | Yuyang Hu et.al. | 2512.13564 | link |
| 2025-12-15 | How Low Can You Go? The Data-Light SE Challenge | Kishan Kumar Ganguly et.al. | 2512.13524 | null |
| 2025-12-15 | Reinforcement Learning based 6-DoF Maneuvers for Microgravity Intravehicular Docking: A Simulation Study with Int-Ball2 in ISS-JEM | Aman Arora et.al. | 2512.13514 | null |
| 2025-12-15 | MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph | Linjie Mu et.al. | 2512.13510 | null |
| 2025-12-15 | Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model | Heyi Chen et.al. | 2512.13507 | null |
| 2025-12-15 | Differentiable Evolutionary Reinforcement Learning | Sitao Cheng et.al. | 2512.13399 | null |
| 2025-12-15 | QoS-Aware State-Augmented Learnable Framework for 5G NR-U/Wi-Fi Coexistence: Impact of Parameter Selection and Enhanced Collision Resolution | Mohammad Reza Fasihi et.al. | 2512.13393 | null |
| 2025-12-15 | Universal Dexterous Functional Grasping via Demonstration-Editing Reinforcement Learning | Chuan Mao et.al. | 2512.13380 | null |
| 2025-12-15 | Fast Policy Learning for 6-DOF Position Control of Underwater Vehicles | Sümer Tunçay et.al. | 2512.13359 | null |
| 2025-12-15 | Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3) | Zeyad Gamal et.al. | 2512.13356 | null |
| 2025-12-15 | Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration | Hao Fu et.al. | 2512.13293 | null |
| 2025-12-15 | AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning | Jiaru Zou et.al. | 2512.13278 | null |
| 2025-12-15 | SPARS: A Reinforcement Learning-Enabled Simulator for Power Management in HPC Job Scheduling | Muhammad Alfian Amrizal et.al. | 2512.13268 | null |
| 2025-12-15 | Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving | Hyunki Seong et.al. | 2512.13262 | null |
| 2025-12-15 | Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection | Zihui Zhao et.al. | 2512.13240 | null |
| 2025-12-15 | SACn: Soft Actor-Critic with n-step Returns | Jakub Łyskawa et.al. | 2512.13165 | null |
| 2025-12-15 | SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning | Emre Can Acikgoz et.al. | 2512.13159 | null |
| 2025-12-15 | TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning | Shenzhi Yang et.al. | 2512.13106 | null |
| 2025-12-15 | Toward Self-Healing Networks-on-Chip: RL-Driven Routing in 2D Torus Architectures | Mohammad Walid Charrwi et.al. | 2512.13096 | null |
| 2025-12-15 | ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning | Feng Zhang et.al. | 2512.13095 | null |
| 2025-12-15 | Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation | Xiang Li et.al. | 2512.13094 | null |
| 2025-12-15 | PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations | Mingqi Yuan et.al. | 2512.13093 | null |
| 2025-12-15 | M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization | Bizhe Bai et.al. | 2512.13070 | null |
| 2025-12-15 | Deep Q-Learning-Based Intelligent Scheduling for ETL Optimization in Heterogeneous Data Environments | Kangning Gao et.al. | 2512.13060 | null |
| 2025-12-15 | GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training | Tong Wei et.al. | 2512.13043 | null |
| 2025-12-15 | What Happens Next? Next Scene Prediction with a Unified Video Model | Xinjie Li et.al. | 2512.13015 | null |
| 2025-12-15 | Learning Terrain Aware Bipedal Locomotion via Reduced Dimensional Perceptual Representations | Guillermo A. Castillo et.al. | 2512.12993 | null |
| 2025-12-15 | Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning | Amin Jalal Aghdasian et.al. | 2512.12987 | null |
| 2025-12-15 | QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management | Weizhou Shen et.al. | 2512.12967 | null |
| 2025-12-15 | Interpretable Hypothesis-Driven Trading: A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals | Gagan Deep et.al. | 2512.12924 | null |
| 2025-12-15 | LLM-based Personalized Portfolio Recommender: Integrating Large Language Models and Reinforcement Learning for Intelligent Investment Strategy Optimization | Bangyu Li et.al. | 2512.12922 | null |
| 2025-12-15 | Meta-GPT: Decoding the Metasurface Genome with Generative Artificial Intelligence | David Dang et.al. | 2512.12888 | null |
| 2025-12-14 | Information-Consistent Language Model Recommendations through Group Relative Policy Optimization | Sonal Prabhune et.al. | 2512.12858 | null |
| 2025-12-14 | MPC-Guided Safe Reinforcement Learning and Lipschitz-Based Filtering for Structured Nonlinear Systems | Patrick Kostelac et.al. | 2512.12855 | null |
| 2025-12-14 | Distributed Reinforcement Learning using Local Smart Meter Data for Voltage Regulation in Distribution Networks | Dong Liu et.al. | 2512.12803 | null |
| 2025-12-14 | CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning | Xuanzhang Liu et.al. | 2512.12716 | null |
| 2025-12-14 | Self-Motivated Growing Neural Network for Adaptive Architecture via Local Structural Plasticity | Yiyang Jia et.al. | 2512.12713 | null |
| 2025-12-14 | Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning | Enhong Mu et.al. | 2512.12706 | null |
| 2025-12-14 | Reassessing the Role of Supervised Fine-Tuning: An Empirical Study in VLM Reasoning | Yongcan Yu et.al. | 2512.12690 | null |
| 2025-12-14 | CogDoc: Towards Unified thinking in Documents | Qixin Xu et.al. | 2512.12658 | null |
| 2025-12-14 | Coupled Variational Reinforcement Learning for Language Model General Reasoning | Xueru Wen et.al. | 2512.12576 | null |
| 2025-12-14 | World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents | Yesid Fonseca et.al. | 2512.12548 | null |
| 2025-12-13 | Adaptive Detector-Verifier Framework for Zero-Shot Polyp Detection in Open-World Settings | Shengkai Xu et.al. | 2512.12492 | null |
| 2025-12-13 | More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models | Hoang Anh Just et.al. | 2512.12487 | null |
| 2025-12-13 | HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments | Yongjun He et.al. | 2512.12476 | null |
| 2025-12-13 | Sim2Real Reinforcement Learning for Soccer skills | Jonathan Spraggett et.al. | 2512.12437 | null |
| 2025-12-13 | Deep Hedging with Reinforcement Learning: A Practical Framework for Option Risk Management | Travon Lucius et.al. | 2512.12420 | null |
| 2025-12-13 | ElasticVR: Elastic Task Computing in Multi-User Multi-Connectivity Wireless Virtual Reality (VR) Systems | Babak Badnava et.al. | 2512.12366 | null |
| 2025-12-13 | The Role of AI in Modern Penetration Testing | J. Alexander Curtis et.al. | 2512.12326 | null |
| 2025-12-13 | A Conflict-Aware Resource Management Framework for the Computing Continuum | Vlad Popescu-Vifor et.al. | 2512.12299 | null |
| 2025-12-13 | Moment and Highlight Detection via MLLM Frame Segmentation | I Putu Andika Bagas Jiwanta et.al. | 2512.12246 | null |
| 2025-12-13 | Learning to Get Up Across Morphologies: Zero-Shot Recovery with a Unified Humanoid Policy | Jonathan Spraggett et.al. | 2512.12230 | null |
| 2025-12-12 | Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning | Vittorio Giammarino et.al. | 2512.12046 | null |
| 2025-12-12 | Policy Gradient Algorithms for Age-of-Information Cost Minimization | José-Ramón Vidal et.al. | 2512.11990 | null |
| 2025-12-12 | Learning to Extract Context for Context-Aware LLM Inference | Minseon Kim et.al. | 2512.11986 | null |
| 2025-12-12 | A Review of Learning-Based Motion Planning: Toward a Data-Driven Optimal Control Approach | Jia Hu et.al. | 2512.11944 | null |
| 2025-12-12 | Evolutionary Reinforcement Learning based AI tutor for Socratic Interdisciplinary Instruction | Mei Jiang et.al. | 2512.11930 | null |
| 2025-12-12 | AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis | Junjie Ye et.al. | 2512.11797 | null |
| 2025-12-12 | Agile Flight Emerges from Multi-Agent Competitive Racing | Vineet Pasumarti et.al. | 2512.11781 | null |
| 2025-12-12 | SUMFORU: An LLM-Based Review Summarization Framework for Personalized Purchase Decision Support | Yuming Feng et.al. | 2512.11755 | null |
| 2025-12-12 | UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations | Tingyu Yuan et.al. | 2512.11609 | null |
| 2025-12-12 | DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry | Zhenyang Cai et.al. | 2512.11558 | null |
| 2025-12-12 | Rethinking Expert Trajectory Utilization in LLM Post-training | Bowen Ding et.al. | 2512.11470 | null |
| 2025-12-12 | Three methods, one problem: Classical and AI approaches to no-three-in-line | Pranav Ramanathan et.al. | 2512.11469 | null |
| 2025-12-12 | Towards Trustworthy Multi-Turn LLM Agents via Behavioral Guidance | Gonca Gürsun et.al. | 2512.11421 | null |
| 2025-12-12 | Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization | Yifan Niu et.al. | 2512.11391 | null |
| 2025-12-12 | Symmetry-Aware Steering of Equivariant Diffusion Policies: Benefits and Limits | Minwoo Park et.al. | 2512.11345 | null |
| 2025-12-12 | DAPO: Design Structure-Aware Pass Ordering in High-Level Synthesis with Graph Contrastive and Reinforcement Learning | Jinming Ge et.al. | 2512.11342 | null |
| 2025-12-12 | RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training | Tianyuan Wu et.al. | 2512.11306 | null |
| 2025-12-12 | When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents | Mrinal Rawat et.al. | 2512.11277 | null |
| 2025-12-12 | A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation | Hong Je-Gal et.al. | 2512.11270 | null |
| 2025-12-12 | Multi-Objective Reinforcement Learning for Large-Scale Mixed Traffic Control | Iftekharul Islam et.al. | 2512.11247 | null |
| 2025-12-11 | Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning | Wei Duan et.al. | 2512.11179 | null |
| 2025-12-11 | Learning Category-level Last-meter Navigation from RGB Demonstrations of a Single-instance | Tzu-Hsien Lee et.al. | 2512.11173 | null |
| 2025-12-11 | CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound | Akhil S Anand et.al. | 2512.11169 | null |
| 2025-12-11 | Benchmarking RL-Enhanced Spatial Indices Against Traditional, Advanced, and Learned Counterparts | Guanli Liu et.al. | 2512.11161 | null |
| 2025-12-11 | In-Context Multi-Objective Optimization | Xinyu Zhang et.al. | 2512.11114 | null |
| 2025-12-11 | Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation | Yiwen Tang et.al. | 2512.10949 | link |
| 2025-12-11 | Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit | Zamirddine Mari et.al. | 2512.10934 | null |
| 2025-12-11 | Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation | Zamirddine Mari et.al. | 2512.10925 | null |
| 2025-12-11 | Reinforcement Learning in Financial Decision Making: A Systematic Review of Performance, Challenges, and Implementation Strategies | Mohammad Rezoanul Hoque et.al. | 2512.10913 | null |
| 2025-12-11 | Iterative Compositional Data Generation for Robot Control | Anh-Quan Pham et.al. | 2512.10891 | null |
| 2025-12-11 | Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments | Atahan Cilan et.al. | 2512.10835 | null |
| 2025-12-11 | OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification | Zijian Wu et.al. | 2512.10756 | null |
| 2025-12-11 | Learning to Split: A Reinforcement-Learning-Guided Splitting Heuristic for Neural Network Verification | Maya Swisa et.al. | 2512.10747 | null |
| 2025-12-11 | Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving | Songyang Gao et.al. | 2512.10739 | null |
| 2025-12-11 | How to Brake? Ethical Emergency Braking with Deep Reinforcement Learning | Jianbo Wang et.al. | 2512.10698 | null |
| 2025-12-11 | Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning | Benjamin Gundersen et.al. | 2512.10691 | null |
| 2025-12-11 | AgriGPT-Omni: A Unified Speech-Vision-Text Framework for Multilingual Agricultural Intelligence | Bo Yang et.al. | 2512.10624 | null |
| 2025-12-11 | Multi-Objective Reward and Preference Optimization: Theory and Algorithms | Akhil Agnihotri et.al. | 2512.10601 | null |
| 2025-12-11 | Grounding Everything in Tokens for Multimodal Large Language Models | Xiangxuan Ren et.al. | 2512.10554 | null |
| 2025-12-11 | Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning | Haiteng Zhao et.al. | 2512.10534 | null |
| 2025-12-11 | Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning | Chihyeon Song et.al. | 2512.10510 | null |
| 2025-12-11 | UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning | Jiaxi Wu et.al. | 2512.10492 | null |
| 2025-12-11 | Shot and Architecture Adaptive Subspace Variational Quantum Eigensolver for Microwave Simulation | Zhixiu Han et.al. | 2512.10458 | null |
| 2025-12-11 | HypeR Adaptivity: Joint $hr$-Adaptive Meshing via Hypergraph Multi-Agent Deep Reinforcement Learning | Niccolò Grillo et.al. | 2512.10439 | null |
| 2025-12-11 | Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention | Yang Yu et.al. | 2512.10414 | null |
| 2025-12-11 | A Privacy-Preserving Cloud Architecture for Distributed Machine Learning at Scale | Vinoth Punniyamoorthy et.al. | 2512.10341 | null |
| 2025-12-11 | Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters | Shruti Dongare et.al. | 2512.10271 | null |
| 2025-12-11 | Multi-dimensional Preference Alignment by Conditioning Reward Itself | Jiho Jang et.al. | 2512.10237 | null |
| 2025-12-11 | Task-Oriented Grasping Using Reinforcement Learning with a Contextual Reward Machine | Hui Li et.al. | 2512.10235 | null |
| 2025-12-11 | Latent Chain-of-Thought World Modeling for End-to-End Driving | Shuhan Tan et.al. | 2512.10226 | null |
| 2025-12-11 | An exploration for higher efficiency in multi objective optimisation with reinforcement learning | Mehmet Emin Aydin et.al. | 2512.10208 | null |
| 2025-12-10 | Explicit Control Barrier Function-based Safety Filters and their Resource-Aware Computation | Pol Mestres et.al. | 2512.10118 | null |
| 2025-12-10 | Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation | Steven Caro et.al. | 2512.10099 | null |
| 2025-12-10 | SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation | Jongmin Lee et.al. | 2512.10042 | null |
| 2025-12-10 | Diffusion Is Your Friend in Show, Suggest and Tell | Jia Cheng Hu et.al. | 2512.10038 | null |
| 2025-12-10 | Latent Action World Models for Control with Unlabeled Trajectories | Marvin Alles et.al. | 2512.10016 | null |
| 2025-12-10 | TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0 | Jinyu Chen et.al. | 2512.09961 | null |
| 2025-12-10 | STACHE: Local Black-Box Explanations for Reinforcement Learning Policies | Andrew Elashkin et.al. | 2512.09909 | null |
| 2025-12-10 | FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning | Khurram Khalil et.al. | 2512.09872 | null |
| 2025-12-10 | Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation | Yuyang Li et.al. | 2512.09851 | link |
| 2025-12-10 | ChronusOmni: Improving Time Awareness of Omni Large Language Models | Yijing Chen et.al. | 2512.09841 | null |
| 2025-12-10 | RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning | Khurram Khalil et.al. | 2512.09829 | null |
| 2025-12-10 | Prefrontal scaling of reward prediction error readout gates reinforcement-derived adaptive behavior in primates | Tian Sang et.al. | 2512.09761 | null |
| 2025-12-10 | MOA: Multi-Objective Alignment for Role-Playing Agents | Chonghua Liao et.al. | 2512.09756 | null |
| 2025-12-10 | Flexible Reconfigurable Intelligent Surface-Aided Covert Communications in UAV Networks | Chong Huang et.al. | 2512.09714 | null |
| 2025-12-10 | Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning | Kaichen He et.al. | 2512.09706 | null |
| 2025-12-10 | Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies | Mika Persson et.al. | 2512.09682 | null |
| 2025-12-10 | d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models | Leyi Pan et.al. | 2512.09675 | null |
| 2025-12-10 | SynthPix: A lightspeed PIV images generator | Antonio Terpin et.al. | 2512.09664 | null |
| 2025-12-10 | Mastering Diverse, Unknown, and Cluttered Tracks for Robust Vision-Based Drone Racing | Feng Yu et.al. | 2512.09571 | null |
| 2025-12-10 | Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search | Junkai Ji et.al. | 2512.09566 | null |
| 2025-12-10 | REASAN: Learning Reactive Safe Navigation for Legged Robots | Qihao Yuan et.al. | 2512.09537 | null |
| 2025-12-10 | RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning | Yucan Guo et.al. | 2512.09487 | null |
| 2025-12-10 | Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation | Jialin Ying et.al. | 2512.09410 | null |
| 2025-12-10 | CFLight: Enhancing Safety with Traffic Signal Control through Counterfactual Learning | Mingyuan Li et.al. | 2512.09368 | null |
| 2025-12-10 | COVLM-RL: Critical Object-Oriented Reasoning for Autonomous Driving Using VLM-Guided Reinforcement Learning | Lin Li et.al. | 2512.09349 | null |
| 2025-12-10 | Tyche: A Hybrid Computation Framework of Illumination Pattern for Satellite Beam Hopping | Ziheng Yang et.al. | 2512.09312 | null |
| 2025-12-10 | One-Shot Real-World Demonstration Synthesis for Scalable Bimanual Manipulation | Huayi Zhou et.al. | 2512.09297 | null |
| 2025-12-10 | Electric Arc Furnaces Scheduling under Electricity Price Volatility with Reinforcement Learning | Ruonan Pi et.al. | 2512.09293 | null |
| 2025-12-10 | Exploratory Mean-Variance with Jumps: An Equilibrium Approach | Yuling Max Chen et.al. | 2512.09224 | null |
| 2025-12-09 | Learning Unmasking Policies for Diffusion Language Models | Metod Jazbec et.al. | 2512.09106 | null |
| 2025-12-09 | Masked Generative Policy for Robotic Control | Lipeng Zhuang et.al. | 2512.09101 | null |
| 2025-12-09 | No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers | Damiano Marsili et.al. | 2512.08889 | null |
| 2025-12-09 | IPPO Learns the Game, Not the Team: A Study on Generalization in Heterogeneous Agent Teams | Ryan LeRoy et.al. | 2512.08877 | null |
| 2025-12-09 | Reinforcement Learning From State and Temporal Differences | Lex Weaver et.al. | 2512.08855 | null |
| 2025-12-09 | Optimal navigation in two-dimensional regular and turbulent flows | Vladimir Parfenyev et.al. | 2512.08766 | null |
| 2025-12-09 | Learning and Editing Universal Graph Prompt Tuning via Reinforcement Learning | Jinfeng Xu et.al. | 2512.08763 | null |
| 2025-12-09 | Direct transfer of optimized controllers to similar systems using dimensionless MPC | Josip Kir Hromatko et.al. | 2512.08667 | null |
| 2025-12-09 | Sim2Swim: Zero-Shot Velocity Control for Agile AUV Maneuvering in 3 Minutes | Lauritz Rismark Fosso et.al. | 2512.08656 | null |
| 2025-12-09 | Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis | Orit Davidovich et.al. | 2512.08601 | null |
| 2025-12-09 | Mind to Hand: Purposeful Robotic Control via Embodied Reasoning | Peijun Tang et.al. | 2512.08580 | null |
| 2025-12-09 | Thinking with Images via Self-Calling Agent | Wenxi Yang et.al. | 2512.08511 | link |
| 2025-12-09 | Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning | Junnan Qiu et.al. | 2512.08485 | null |
| 2025-12-09 | Using reinforcement learning to probe the role of feedback in skill acquisition | Antonio Terpin et.al. | 2512.08463 | null |
| 2025-12-09 | From Accuracy to Impact: The Impact-Driven AI Framework (IDAIF) for Aligning Engineering Architecture with Theory of Change | Yong-Woon Kim et.al. | 2512.08449 | null |
| 2025-12-09 | Turning Threat into Opportunity: DRL-Powered Anti-Jamming via Energy Harvesting in UAV-Disrupted Channels | Ngoc-Tan Nguyen et.al. | 2512.08351 | null |
| 2025-12-09 | Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks | Thai Duong Nguyen et.al. | 2512.08341 | null |
| 2025-12-09 | Collaborative Intelligence for UAV-Satellite Network Slicing: Towards a Joint QoS-Energy-Fairness MADRL Optimization | Thanh-Dao Nguyen et.al. | 2512.08322 | null |
| 2025-12-09 | rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection | Sijia Chen et.al. | 2512.08300 | null |
| 2025-12-09 | Empowerment Gain and Causal Model Construction: Children and adults are sensitive to controllability and variability in their causal interventions | Eunice Yiu et.al. | 2512.08230 | null |
| 2025-12-09 | Primal-dual policy learning for mean-field stochastic LQR problem | Xiushan Jiang et.al. | 2512.08205 | null |
| 2025-12-09 | TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models | Zheng Ding et.al. | 2512.08153 | null |
| 2025-12-09 | Robust Agents in Open-Ended Worlds | Mikayel Samvelyan et.al. | 2512.08139 | null |
| 2025-12-09 | Universal Adversarial Suffixes for Language Models Using Reinforcement Learning with Calibrated Reward | Sampriti Soor et.al. | 2512.08131 | null |
| 2025-12-08 | Scalable Offline Model-Based RL with Action Chunks | Kwanyoung Park et.al. | 2512.08108 | null |
| 2025-12-08 | Training LLMs for Honesty via Confessions | Manas Joglekar et.al. | 2512.08093 | null |
| 2025-12-08 | An Introduction to Deep Reinforcement and Imitation Learning | Pedro Santana et.al. | 2512.08052 | null |
| 2025-12-08 | F2: Offline Reinforcement Learning for Hamiltonian Simulation via Free-Fermionic Subroutine Compilation | Ethan Decker et.al. | 2512.08023 | null |
| 2025-12-08 | Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care | Aryaman Bansal et.al. | 2512.08012 | null |
| 2025-12-08 | VLD: Visual Language Goal Distance for Reinforcement Learning Navigation | Lazar Milikic et.al. | 2512.07976 | null |
| 2025-12-08 | Agentic Artificial Intelligence for Ethical Cybersecurity in Uganda: A Reinforcement Learning Framework for Threat Detection in Resource-Constrained Environments | Ibrahim Adabara et.al. | 2512.07909 | null |
| 2025-12-08 | An Adaptive Multi-Layered Honeynet Architecture for Threat Behavior Analysis via Deep Learning | Lukas Johannes Möller et.al. | 2512.07827 | null |
| 2025-12-08 | On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models | Charlie Zhang et.al. | 2512.07783 | null |
| 2025-12-08 | RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models | Xiqiao Xiong et.al. | 2512.07761 | null |
| 2025-12-08 | DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving | Jialv Zou et.al. | 2512.07745 | null |
| 2025-12-08 | SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery | Meng Cao et.al. | 2512.07733 | null |
| 2025-12-08 | Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE | Anxiang Zeng et.al. | 2512.07710 | null |
| 2025-12-08 | Delay-Aware Diffusion Policy: Bridging the Observation-Execution Gap in Dynamic Tasks | Aileen Liao et.al. | 2512.07697 | null |
| 2025-12-08 | The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds | Shahar Lutati et.al. | 2512.07631 | null |
| 2025-12-08 | Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement | Yongsheng Lian et.al. | 2512.07611 | null |
| 2025-12-08 | Understanding Individual Decision-Making in Multi-Agent Reinforcement Learning: A Dynamical Systems Approach | James Rudd-Jones et.al. | 2512.07588 | null |
| 2025-12-08 | ReLaX: Reasoning with Latent Exploration for Large Reasoning Models | Shimin Zhang et.al. | 2512.07558 | null |
| 2025-12-08 | Model-Based Reinforcement Learning Under Confounding | Nishanth Venkatesh et.al. | 2512.07528 | null |
| 2025-12-08 | How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations | JV Roig et.al. | 2512.07497 | null |
| 2025-12-08 | Enhancing Agentic RL with Progressive Reward Shaping and Value-based Sampling Policy Optimization | Zhuoran Zhuang et.al. | 2512.07478 | null |
| 2025-12-08 | Gait-Adaptive Perceptive Humanoid Locomotion with Real-Time Under-Base Terrain Reconstruction | Haolin Song et.al. | 2512.07464 | null |
| 2025-12-08 | Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning | Tong Wu et.al. | 2512.07461 | null |
| 2025-12-08 | From Show Programmes to Data: Designing a Workflow to Make Performing Arts Ephemera Accessible Through Language Models | Clarisse Bardiot et.al. | 2512.07452 | null |
| 2025-12-08 | KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models | Chenwei Shi et.al. | 2512.07437 | null |
| 2025-12-08 | Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models | Haidong Kang et.al. | 2512.07419 | null |
| 2025-12-08 | Adaptive Tuning of Parameterized Traffic Controllers via Multi-Agent Reinforcement Learning | Giray Önür et.al. | 2512.07417 | null |
| 2025-12-08 | Training Language Models to Use Prolog as a Tool | Niklas Mellgren et.al. | 2512.07407 | null |
| 2025-12-08 | Control and Reinforcement Learning through the Lens of Optimization: An Algorithmic Perspective | Tolga Ok et.al. | 2512.07377 | null |
| 2025-12-08 | ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning | Byungju Kim et.al. | 2512.07371 | null |
| 2025-12-08 | Multi-Rigid-Body Approximation of Human Hands with Application to Digital Twin | Bin Zhao et.al. | 2512.07359 | null |
| 2025-12-08 | PrivORL: Differentially Private Synthetic Dataset for Offline Reinforcement Learning | Chen Gong et.al. | 2512.07342 | null |
| 2025-12-08 | RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language Translation | Zhi Rao et.al. | 2512.07273 | null |
| 2025-12-08 | SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks | Florian Tretter et.al. | 2512.07266 | null |
| 2025-12-08 | Benchmarking Humanoid Imitation Learning with Motion Difficulty | Zhaorui Meng et.al. | 2512.07248 | null |
| 2025-12-08 | Towards Robust Protective Perturbation against DeepFake Face Swapping | Hengyang Yao et.al. | 2512.07228 | null |
| 2025-12-08 | Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation | Zhaoyang Liu et.al. | 2512.07212 | null |
| 2025-12-08 | MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning | Xuhui Zheng et.al. | 2512.07203 | null |
| 2025-12-08 | Less is More: Non-uniform Road Segments are Efficient for Bus Arrival Prediction | Zhen Huang et.al. | 2512.07200 | null |
| 2025-12-08 | Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models | Fenghua Weng et.al. | 2512.07141 | null |
| 2025-12-08 | TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning | Zebin Xing et.al. | 2512.07135 | null |
| 2025-12-08 | Surrogate compliance modeling enables reinforcement learned locomotion gaits for soft robots | Jue Wang et.al. | 2512.07114 | null |
| 2025-12-07 | A Hetero-Associative Sequential Memory Model Utilizing Neuromorphic Signals: Validated on a Mobile Manipulator | Runcong Wang et.al. | 2512.07032 | null |
| 2025-12-07 | Utilizing Multi-Agent Reinforcement Learning with Encoder-Decoder Architecture Agents to Identify Optimal Resection Location in Glioblastoma Multiforme Patients | Krishna Arun et.al. | 2512.06990 | null |
| 2025-12-07 | LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding | Yu Yu et.al. | 2512.06982 | null |
| 2025-12-07 | Neuro-Vesicles: Neuromodulation Should Be a Dynamical System, Not a Tensor Decoration | Zilin Li et.al. | 2512.06966 | null |
| 2025-12-07 | Statistical analysis of Inverse Entropy-regularized Reinforcement Learning | Denis Belomestny et.al. | 2512.06956 | null |
| 2025-12-07 | Deep Reinforcement Learning for Phishing Detection with Transformer-Based Semantic Features | Aseer Al Faisal et.al. | 2512.06925 | null |
| 2025-12-07 | Parent-Guided Semantic Reward Model (PGSRM): Embedding-Based Reward Functions for Reinforcement Learning of Transformer Language Models | Alexandr Plashchinsky et.al. | 2512.06920 | null |
| 2025-12-07 | Know your Trajectory – Trustworthy Reinforcement Learning deployment through Importance-Based Trajectory Analysis | Clifford F et.al. | 2512.06917 | null |
| 2025-12-07 | Khalasi: Energy-Efficient Navigation for Surface Vehicles in Vortical Flow Fields | Rushiraj Gadhvi et.al. | 2512.06912 | null |
| 2025-12-07 | An Analysis of Large Language Models for Simulating User Responses in Surveys | Ziyun Yu et.al. | 2512.06874 | null |
| 2025-12-07 | JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models | Ce Chi et.al. | 2512.06859 | null |
| 2025-12-07 | Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning | Tingyu Li et.al. | 2512.06835 | null |
| 2025-12-07 | MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning | Yueqian Wang et.al. | 2512.06810 | null |
| 2025-12-07 | PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance | Jifar Wakuma Ayana et.al. | 2512.06747 | null |
| 2025-12-07 | The Role of Entropy in Visual Grounding: Analysis and Optimization | Shuo Li et.al. | 2512.06726 | null |
| 2025-12-07 | RunawayEvil: Jailbreaking the Image-to-Video Generative Models | Songping Wang et.al. | 2512.06674 | null |
| 2025-12-07 | LightSearcher: Efficient DeepSearch via Experiential Memory | Hengzhi Lan et.al. | 2512.06653 | null |
| 2025-12-07 | Analyzing Collision Rates in Large-Scale Mixed Traffic Control via Multi-Agent Reinforcement Learning | Muyang Fan et.al. | 2512.06645 | null |
| 2025-12-07 | Learning to Hedge Swaptions | Zaniar Ahmadi et.al. | 2512.06639 | null |
| 2025-12-07 | MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment | Ruicheng Zhang et.al. | 2512.06628 | null |
| 2025-12-07 | A New Trajectory-Oriented Approach to Enhancing Comprehensive Crowd Navigation Performance | Xinyu Zhou et.al. | 2512.06608 | null |
| 2025-12-06 | MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding | Yuhao Su et.al. | 2512.06581 | null |
| 2025-12-06 | Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input | Zifan Xu et.al. | 2512.06571 | null |
| 2025-12-06 | A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation | Xiaocan Li et.al. | 2512.06547 | null |
| 2025-12-06 | Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning | Ming Chen et.al. | 2512.06533 | null |
| 2025-12-06 | Entropy-Controlled Intrinsic Motivation Reinforcement Learning for Quadruped Robot Locomotion in Complex Terrains | Wanru Gong et.al. | 2512.06486 | null |
| 2025-12-06 | Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control | Nathan P. Lawrence et.al. | 2512.06471 | null |
| 2025-12-06 | RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs | Runlong Zhou et.al. | 2512.06392 | null |
| 2025-12-06 | VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning | Yuji Wang et.al. | 2512.06373 | null |
| 2025-12-06 | LLM-Upgraded Graph Reinforcement Learning for Carbon-Aware Job Scheduling in Smart Manufacturing | Zhiying Yang et.al. | 2512.06351 | null |
| 2025-12-06 | ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models | Jiahao Li et.al. | 2512.06328 | null |
| 2025-12-06 | A Hybrid Physics-Based and Reinforcement Learning Framework for Electric Vehicle Charging Time Prediction | Praharshitha Aryasomayajula et.al. | 2512.06287 | null |
| 2025-12-06 | Networked Restless Multi-Arm Bandits with Reinforcement Learning | Hanmo Zhang et.al. | 2512.06274 | null |
| 2025-12-06 | Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models | Chen Yang et.al. | 2512.06266 | null |
| 2025-12-06 | Learning Without Time-Based Embodiment Resets in Soft-Actor Critic | Homayoon Farrahi et.al. | 2512.06252 | null |
| 2025-12-06 | Learning When to Switch: Adaptive Policy Selection via Reinforcement Learning | Chris Tava et.al. | 2512.06250 | null |
| 2025-12-06 | Auto-exploration for online reinforcement learning | Caleb Ju et.al. | 2512.06244 | null |
| 2025-12-06 | AI Application in Anti-Money Laundering for Sustainable and Transparent Financial Systems | Chuanhao Nie et.al. | 2512.06240 | null |
| 2025-12-05 | Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration | Huizhen Yu et.al. | 2512.06218 | null |
| 2025-12-05 | Quantifying Memory Use in Reinforcement Learning with Temporal Range | Rodney Lafuente-Mercado et.al. | 2512.06204 | null |
| 2025-12-05 | JaxWildfire: A GPU-Accelerated Wildfire Simulator for Reinforcement Learning | Ufuk Çakır et.al. | 2512.06102 | null |
| 2025-12-05 | Empathy by Design: Aligning Large Language Models for Healthcare Dialogue | Emre Umucu et.al. | 2512.06097 | null |
| 2025-12-05 | Comparative Analysis of Autonomous and Systematic Control Strategies for Hole-Doped Hubbard Clusters: Reinforcement Learning versus Physics-Guided Design | Shivanshu Dwivedi et.al. | 2512.06095 | null |
| 2025-12-05 | Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring | Mohanakrishnan Hariharan et.al. | 2512.06060 | null |
| 2025-12-05 | EditThinker: Unlocking Iterative Reasoning for Any Image Editor | Hongyu Li et.al. | 2512.05965 | null |
| 2025-12-05 | Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity | Germán Kruszewski et.al. | 2512.05962 | null |
| 2025-12-05 | Correspondence-Oriented Imitation Learning: Flexible Visuomotor Control with 3D Conditioning | Yunhao Cao et.al. | 2512.05953 | null |
| 2025-12-05 | Variational Quantum Rainbow Deep Q-Network for Optimizing Resource Allocation Problem | Truong Thanh Hung Nguyen et.al. | 2512.05946 | null |
| 2025-12-05 | Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation | Fabian Konstantinidis et.al. | 2512.05812 | null |
| 2025-12-05 | Real-time Remote Tracking and Autonomous Planning for Whale Rendezvous using Robots | Sushmita Bhattacharya et.al. | 2512.05808 | null |
| 2025-12-05 | A Fast Anti-Jamming Cognitive Radar Deployment Algorithm Based on Reinforcement Learning | Wencheng Cai et.al. | 2512.05753 | null |
| 2025-12-05 | A High-Order Immersed Boundary Method for Fluid-Structure Interaction Problems | Yingjie Xia et.al. | 2512.05733 | null |
| 2025-12-05 | Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning | Ali Krayani et.al. | 2512.05711 | null |
| 2025-12-05 | LA-RL: Language Action-guided Reinforcement Learning with Safety Guarantees for Autonomous Highway Driving | Yiming Shu et.al. | 2512.05686 | null |
| 2025-12-05 | MedTutor-R1: Socratic Personalized Medical Teaching with Multi-Agent Simulation | Zhitao He et.al. | 2512.05671 | null |
| 2025-12-05 | Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning | Zhenpeng Su et.al. | 2512.05591 | null |
| 2025-12-05 | Distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning | Pengcheng Dai et.al. | 2512.05447 | null |
| 2025-12-05 | ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction | Jiangtong Tan et.al. | 2512.05422 | null |
| 2025-12-05 | State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning | Yuxiang Liu et.al. | 2512.05335 | null |
| 2025-12-04 | Enhancing Deep Deterministic Policy Gradients on Continuous Control Tasks with Decoupled Prioritized Experience Replay | Mehmet Efe Lorasdagi et.al. | 2512.05320 | null |
| 2025-12-04 | Bridging Interpretability and Optimization: Provably Attribution-Weighted Actor-Critic in Reproducing Kernel Hilbert Spaces | Na Li et.al. | 2512.05291 | null |
| 2025-12-04 | Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem | Ali Al Housseini et.al. | 2512.05207 | null |
| 2025-12-04 | ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning | Shengyuan Ding et.al. | 2512.05111 | null |
| 2025-12-04 | STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models | Feng Xu et.al. | 2512.05107 | null |
| 2025-12-04 | Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning | Purbesh Mitra et.al. | 2512.05105 | link |
| 2025-11-06 | FoodRL: A Reinforcement Learning Ensembling Framework For In-Kind Food Donation Forecasting | Esha Sharma et.al. | 2511.04865 | null |
| 2025-11-06 | Quantum Boltzmann Machines for Sample-Efficient Reinforcement Learning | Thore Gerlach et.al. | 2511.04856 | null |
| 2025-11-06 | Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning | NVIDIA et.al. | 2511.04831 | null |
| 2025-11-06 | Unified Multimodal Diffusion Forcing for Forceful Manipulation | Zixuan Huang et.al. | 2511.04812 | null |
| 2025-11-06 | Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models | Chenxi Liu et.al. | 2511.04800 | null |
| 2025-11-05 | SMART-WRITE: Adaptive Learning-based Write Energy Optimization for Phase Change Memory | Mahek Desai et.al. | 2511.04713 | null |
| 2025-11-05 | NCSAC: Effective Neural Community Search via Attribute-augmented Conductance | Longlong Lin et.al. | 2511.04712 | null |
| 2025-11-06 | GentleHumanoid: Learning Upper-body Compliance for Contact-rich Human and Object Interaction | Qingzhou Lu et.al. | 2511.04679 | null |
| 2025-11-06 | Forgetting is Everywhere | Ben Sanati et.al. | 2511.04666 | null |
| 2025-11-06 | Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning | Hampus Åström et.al. | 2511.04598 | null |
| 2025-11-06 | End-to-End Reinforcement Learning of Koopman Models for eNMPC of an Air Separation Unit | Daniel Mayfrank et.al. | 2511.04522 | null |
| 2025-11-06 | V-Thinker: Interactive Thinking with Images | Runqi Qiao et.al. | 2511.04460 | null |
| 2025-11-06 | Fitting Reinforcement Learning Model to Behavioral Data under Bandits | Hao Zhu et.al. | 2511.04454 | null |
| 2025-11-06 | The Peril of Preference: Why GRPO fails on Ordinal Rewards | Anisha Garg et.al. | 2511.04439 | null |
| 2025-11-06 | Temporal Action Selection for Action Chunking | Yueyang Weng et.al. | 2511.04421 | null |
| 2025-11-06 | GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies | Maëlic Neau et.al. | 2511.04357 | null |
| 2025-11-06 | MacroNav: Multi-Task Context Representation Learning Enables Efficient Navigation in Unknown Environments | Kuankuan Sima et.al. | 2511.04320 | null |
| 2025-11-06 | GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents | Jian Mu et.al. | 2511.04307 | null |
| 2025-11-06 | Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference | Matteo Cercola et.al. | 2511.04286 | null |
| 2025-11-06 | RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization | Zeng Zhiyuan et.al. | 2511.04285 | null |
| 2025-11-06 | SSPO: Subsentence-level Policy Optimization | Kun Yang et.al. | 2511.04256 | null |
| 2025-11-06 | Can Context Bridge the Reality Gap? Sim-to-Real Transfer of Context-Aware Policies | Marco Iannotta et.al. | 2511.04249 | null |
| 2025-11-06 | Shared Spatial Memory Through Predictive Coding | Zhengru Fang et.al. | 2511.04235 | null |
| 2025-11-06 | Opus: A Quantitative Framework for Workflow Evaluation | Alan Seroul et.al. | 2511.04220 | null |
| 2025-11-06 | Black-Box Guardrail Reverse-engineering Attack | Hongwei Yao et.al. | 2511.04215 | null |
| 2025-11-06 | PUL-SLAM: Path-Uncertainty Co-Optimization with Lightweight Stagnation Detection for Efficient Robotic Exploration | Yizhen Yin et.al. | 2511.04180 | null |
| 2025-11-06 | Deep reinforcement learning based navigation of a jellyfish-like swimmer in flows with obstacles | Yihao Chen et.al. | 2511.04156 | null |
| 2025-11-06 | Exchange Policy Optimization Algorithm for Semi-Infinite Safe Reinforcement Learning | Jiaming Zhang et.al. | 2511.04147 | null |
| 2025-11-06 | BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning | Yitang Li et.al. | 2511.04131 | null |
| 2025-11-06 | RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning | Xinyuan Li et.al. | 2511.04120 | null |
| 2025-11-06 | CBMC-V3: A CNS-inspired Control Framework Towards Manipulation Agility with SNN | Yanbo Pang et.al. | 2511.04109 | null |
| 2025-11-06 | Necessary and Sufficient Conditions for the Optimization-Based Concurrent Execution of Learned Robotic Tasks | Sheikh A. Tahmid et.al. | 2511.04054 | null |
| 2025-11-06 | Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots | Yushi Wang et.al. | 2511.03996 | null |
| 2025-11-06 | Adaptive Temporal Refinement: Continuous Depth Allocation and Distance Regression for Efficient Action Localization | Ibne Farabi Shihab et.al. | 2511.03943 | null |
| 2025-11-06 | RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods | Raghav Sharma et.al. | 2511.03939 | null |
| 2025-11-05 | Learning to shine: Neuroevolution enables optical control of phase transitions | Sraddha Agrawal et.al. | 2511.03895 | null |
| 2025-11-05 | Investigating Robot Control Policy Learning for Autonomous X-ray-guided Spine Procedures | Florence Klitzner et.al. | 2511.03882 | null |
| 2025-11-05 | From Static to Dynamic: Enhancing Offline-to-Online Reinforcement Learning via Energy-Guided Diffusion Stratification | Lipeng Zu et.al. | 2511.03828 | null |
| 2025-11-05 | Scaling Agent Learning via Experience Synthesis | Zhaorun Chen et.al. | 2511.03773 | null |
| 2025-11-05 | Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning | Richard Dewey et.al. | 2511.03724 | null |
| 2025-11-05 | Shrinking the Variance: Shrinkage Baselines for Reinforcement Learning with Verifiable Rewards | Guanning Zeng et.al. | 2511.03710 | null |
| 2025-11-05 | AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing | Mohsen Ahmadzadeh et.al. | 2511.03697 | null |
| 2025-11-05 | Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL | Lipeng Zu et.al. | 2511.03695 | null |
| 2025-11-05 | Simulation-Based Validation of an Integrated 4D/5D Digital-Twin Framework for Predictive Construction Control | Atena Khoshkonesh et.al. | 2511.03684 | null |
| 2025-11-05 | DQN Performance with Epsilon Greedy Policies and Prioritized Experience Replay | Daniel Perkins et.al. | 2511.03670 | null |
| 2025-11-05 | Towards Formalizing Reinforcement Learning Theory | Shangtong Zhang et.al. | 2511.03618 | null |
| 2025-11-05 | Going Beyond Expert Performance via Deep Implicit Imitation Reinforcement Learning | Iason Chrysomallis et.al. | 2511.03616 | null |
| 2025-11-05 | Tensor-Efficient High-Dimensional Q-learning | Junyi Wu et.al. | 2511.03595 | null |
| 2025-11-05 | PerfDojo: Automated ML Library Generation for Heterogeneous Architectures | Andrei Ivanov et.al. | 2511.03586 | null |
| 2025-11-05 | Imitation Learning in the Deep Learning Era: A Novel Taxonomy and Recent Advances | Iason Chrysomallis et.al. | 2511.03565 | null |
| 2025-11-05 | Learning Without Critics? Revisiting GRPO in Classical Reinforcement Learning Environments | Bryan L. M. de Oliveira et.al. | 2511.03527 | null |
| 2025-11-05 | Reinforcement Learning Using known Invariances | Alexandru Cioba et.al. | 2511.03473 | null |
| 2025-11-05 | Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG | Longpeng Qiu et.al. | 2511.03410 | null |
| 2025-11-05 | Adaptable Hindsight Experience Replay for Search-Based Learning | Alexandros Vazaios et.al. | 2511.03405 | null |
| 2025-11-05 | Learning Communication Skills in Multi-task Multi-agent Deep Reinforcement Learning | Changxi Zhu et.al. | 2511.03348 | null |
| 2025-11-05 | DRL-Based Robust Multi-Timescale Anti-Jamming Approaches under State Uncertainty | Haoqin Zhao et.al. | 2511.03305 | null |
| 2025-11-05 | Multi-Objective Adaptive Rate Limiting in Microservices Using Deep Reinforcement Learning | Ning Lyu et.al. | 2511.03279 | null |
| 2025-11-05 | Climate Adaptation with Reinforcement Learning: Economic vs. Quality of Life Adaptation Pathways | Miguel Costa et.al. | 2511.03243 | null |
| 2025-11-05 | Incorporating Quality of Life in Climate Adaptation Planning via Reinforcement Learning | Miguel Costa et.al. | 2511.03238 | null |
| 2025-11-05 | Collaborative Assembly Policy Learning of a Sightless Robot | Zeqing Zhang et.al. | 2511.03189 | null |
| 2025-11-05 | Periodic Skill Discovery | Jonghae Park et.al. | 2511.03187 | null |
| 2025-11-05 | Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control | Rewida Ali et.al. | 2511.03181 | null |
| 2025-11-05 | Optimizing Earth-Moon Transfer and Cislunar Navigation: Integrating Low-Energy Trajectories, AI Techniques and GNSS-R Technologies | Arsalan Muhammad et.al. | 2511.03173 | null |
| 2025-11-05 | Learning Natural and Robust Hexapod Locomotion over Complex Terrains via Motion Priors based on Deep Reinforcement Learning | Xin Liu et.al. | 2511.03167 | null |
| 2025-11-05 | Accelerating inverse materials design using generative diffusion models with reinforcement learning | Junwu Chen et.al. | 2511.03112 | null |
| 2025-11-05 | Scaling Multi-Agent Environment Co-Design with Diffusion Models | Hao Xiang Li et.al. | 2511.03100 | null |
| 2025-11-04 | Leveraging Discrete Function Decomposability for Scientific Design | James C. Bowden et.al. | 2511.03032 | null |
| 2025-11-04 | Value of Information-Enhanced Exploration in Bootstrapped DQN | Stergios Plataniotis et.al. | 2511.02969 | null |
| 2025-11-04 | Digital Twin-Driven Pavement Health Monitoring and Maintenance Optimization Using Graph Neural Networks | Mohsin Mahmud Topu et.al. | 2511.02957 | null |
| 2025-11-04 | Audience Amplified: Virtual Audiences in Asynchronously Performed AR Theater | You-Jin Kim et.al. | 2511.02807 | null |
| 2025-11-04 | MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning | Qianhao Yuan et.al. | 2511.02805 | null |
| 2025-11-04 | From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos | Xun Wang et.al. | 2511.02762 | null |
| 2025-11-04 | Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning | Bowen Jin et.al. | 2511.02755 | null |
| 2025-11-04 | VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models | Zhicheng Zhang et.al. | 2511.02712 | null |
| 2025-11-04 | Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs | Georgios Tzannetos et.al. | 2511.02690 | null |
| 2025-11-04 | RL-Aided Cognitive ISAC: Robust Detection and Sensing-Communication Trade-offs | Adam Umra et.al. | 2511.02672 | null |
| 2025-11-04 | Natural-gas storage modelling by deep reinforcement learning | Tiziano Balaconi et.al. | 2511.02646 | null |
| 2025-11-04 | Adaptive GR(1) Specification Repair for Liveness-Preserving Shielding in Reinforcement Learning | Tiberiu-Andrei Georgescu et.al. | 2511.02605 | null |
| 2025-11-04 | Directional-Clamp PPO | Gilad Karpel et.al. | 2511.02577 | null |
| 2025-11-04 | Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning | Yixiu Mao et.al. | 2511.02567 | null |
| 2025-11-04 | An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems | Changhao Miao et.al. | 2511.02525 | null |
| 2025-11-04 | Dexterous Robotic Piano Playing at Scale | Le Chen et.al. | 2511.02504 | null |
| 2025-11-04 | Auditable-choice reframing unlocks RL-based verification for open-ended tasks | Mengyu Zhang et.al. | 2511.02463 | null |
| 2025-11-04 | ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension | Duo Xu et.al. | 2511.02415 | null |
| 2025-11-04 | Large-scale automatic carbon ion treatment planning for head and neck cancers via parallel multi-agent reinforcement learning | Jueye Zhang et.al. | 2511.02314 | null |
| 2025-11-04 | Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning | Beyazit Yalcinkaya et.al. | 2511.02304 | null |
| 2025-11-04 | Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation | Zhiwei Zhang et.al. | 2511.02303 | null |
| 2025-11-04 | Reinforcement learning based data assimilation for unknown state model | Ziyi Wang et.al. | 2511.02286 | null |
| 2025-11-04 | SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning | Fangxun Shu et.al. | 2511.02280 | null |
| 2025-11-04 | Structural Plasticity as Active Inference: A Biologically-Inspired Architecture for Homeostatic Control | Brennen A. Hill et.al. | 2511.02241 | null |
| 2025-11-04 | Learning Interactive World Model for Object-Centric Reinforcement Learning | Fan Feng et.al. | 2511.02225 | null |
| 2025-11-04 | Optimizing Multi-Lane Intersection Performance in Mixed Autonomy Environments | Manonmani Sekar et.al. | 2511.02217 | null |
| 2025-11-04 | Adaptive Cooperative Transmission Design for Ultra-Reliable Low-Latency Communications via Deep Reinforcement Learning | Hyemin Yu et.al. | 2511.02216 | null |
| 2025-11-04 | Training Proactive and Personalized LLM Agents | Weiwei Sun et.al. | 2511.02208 | null |
| 2025-11-04 | A Quantitative Comparison of Centralised and Distributed Reinforcement Learning-Based Control for Soft Robotic Arms | Linxin Hou et.al. | 2511.02192 | null |
| 2025-11-03 | JaxMARL-HFT: GPU-Accelerated Large-Scale Multi-Agent Reinforcement Learning for High-Frequency Trading | Valentin Mohl et.al. | 2511.02136 | null |
| 2025-11-03 | Second-Order Policy Gradient Methods for the Linear Quadratic Regulator | Amirreza Valaei et.al. | 2511.02095 | null |
| 2025-11-03 | Automated Reward Design for Gran Turismo | Michel Ma et.al. | 2511.02094 | null |
| 2025-11-03 | Deep Reinforcement Learning for Multi-flow Routing in Heterogeneous Wireless Networks | Brian Kim et.al. | 2511.02030 | null |
| 2025-11-03 | ABIDES-MARL: A Multi-Agent Reinforcement Learning Environment for Endogenous Price Formation and Execution in a Limit Order Book | Patrick Cheridito et.al. | 2511.02016 | null |
| 2025-11-02 | Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR | Abdelaziz Bounhar et.al. | 2511.01937 | link |
| 2025-11-02 | Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch | Yirong Zeng et.al. | 2511.01934 | null |
| 2025-11-03 | GenDexHand: Generative Simulation for Dexterous Hands | Feng Chen et.al. | 2511.01791 | null |
| 2025-11-03 | MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll | Alexander Schperberg et.al. | 2511.01774 | null |
| 2025-11-03 | RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks | Mian Wu et.al. | 2511.01758 | null |
| 2025-11-03 | Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding | Jungyeon Koh et.al. | 2511.01695 | null |
| 2025-11-03 | Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward | Xiaogang Xu et.al. | 2511.01645 | null |
| 2025-11-03 | Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models | Xiaoyu Zhan et.al. | 2511.01618 | null |
| 2025-11-03 | L2T-Tune: LLM-Guided Hybrid Database Tuning with LHS and TD3 | Xinyue Yang et.al. | 2511.01602 | null |
| 2025-11-03 | Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning | Aditya Kapoor et.al. | 2511.01554 | null |
| 2025-11-03 | TPS-Bench: Evaluating AI Agents’ Tool Planning & Scheduling Abilities in Compounding Tasks | Hanwen Xu et.al. | 2511.01527 | null |
| 2025-11-03 | BARD: budget-aware reasoning distillation | Lujie Niu et.al. | 2511.01470 | null |
| 2025-11-03 | Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis | Yuhang Huang et.al. | 2511.01425 | null |
| 2025-11-03 | Modulation of temporal decision-making in a deep reinforcement learning agent under the dual-task paradigm | Amrapali Pednekar et.al. | 2511.01415 | null |
| 2025-11-03 | AoI-Aware Machine Learning for Constrained Multimodal Sensing-Aided Communications | Abolfazl Zakeri et.al. | 2511.01406 | null |
| 2025-11-03 | Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization | Ziqi Wang et.al. | 2511.01374 | null |
| 2025-11-03 | Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series | Wenrui Cai et.al. | 2511.01354 | null |
| 2025-11-03 | Diffusion-Based Solver for CNF Placement on the Cloud-Continuum | Álvaro Vázquez Rodríguez et.al. | 2511.01343 | null |
| 2025-11-03 | RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models | Hongyin Zhang et.al. | 2511.01331 | null |
| 2025-11-03 | From Pixels to Cooperation Multi Agent Reinforcement Learning based on Multimodal World Models | Sureyya Akin et.al. | 2511.01310 | null |
| 2025-11-03 | Optimizing Electric Vehicle Charging Station Placement Using Reinforcement Learning and Agent-Based Simulations | Minh-Duc Nguyen et.al. | 2511.01218 | null |
| 2025-11-03 | Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering | Riddhi Jain et.al. | 2511.01213 | null |
| 2025-11-03 | DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection | Guoxin Ma et.al. | 2511.01192 | null |
| 2025-11-03 | Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning | Ru Wang et.al. | 2511.01191 | null |
| 2025-11-03 | DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models | Ruofan Zhang et.al. | 2511.01170 | null |
| 2025-11-02 | SLAP: Shortcut Learning for Abstract Planning | Y. Isabel Liu et.al. | 2511.01107 | null |
| 2025-11-02 | HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning | Yujian Liu et.al. | 2511.01104 | null |
| 2025-11-02 | Deployable Vision-driven UAV River Navigation via Human-in-the-loop Preference Alignment | Zihan Wang et.al. | 2511.01083 | null |
| 2025-11-02 | Predictive Auxiliary Learning for Belief-based Multi-Agent Systems | Qinwei Huang et.al. | 2511.01078 | null |
| 2025-11-02 | Quantum Reinforcement Learning for 6G and Beyond Wireless Networks | Dinh-Hieu Tran et.al. | 2511.01070 | null |
| 2025-11-02 | Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning | Wenjin Liu et.al. | 2511.01016 | link |
| 2025-11-02 | IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation | Bosi Wen et.al. | 2511.01014 | null |
| 2025-11-02 | MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL | Haolin Yang et.al. | 2511.01008 | link |
| 2025-11-02 | GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies | Ziye Wang et.al. | 2511.00998 | null |
| 2025-11-02 | Optimizing Energy and Latency in 6G Smart Cities with Edge CyberTwins | Amine Abouaomar et.al. | 2511.00955 | null |
| 2025-11-02 | KFCPO: Kronecker-Factored Approximated Constrained Policy Optimization | Joonyoung Lim et.al. | 2511.00880 | null |
| 2025-11-02 | Optimal Undulatory Swimming with Constrained Deformation and Actuation Intervals | Fumiya Tokoro et.al. | 2511.00816 | null |
| 2025-11-02 | Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games | Runyu Lu et.al. | 2511.00811 | null |
| 2025-11-02 | Do Math Reasoning LLMs Help Predict the Impact of Public Transit Events? | Bowen Fang et.al. | 2511.00808 | null |
| 2025-11-02 | Logic-informed reinforcement learning for cross-domain optimization of large-scale cyber-physical systems | Guangxi Wan et.al. | 2511.00806 | null |
| 2025-11-02 | GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents | Jie JW Wu et.al. | 2511.00802 | null |
| 2025-11-02 | Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration | Yan Sun et.al. | 2511.00794 | null |
| 2025-11-02 | Power Control Based on Multi-Agent Deep Q Network for D2D Communication | Shi Gengtian et.al. | 2511.00767 | null |
| 2025-11-01 | Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries | Minghe Shen et.al. | 2511.00710 | null |
| 2025-11-01 | PreferThinker: Reasoning-based Personalized Image Preference Assessment | Shengqi Xu et.al. | 2511.00609 | null |
| 2025-11-01 | OpenSIR: Open-Ended Self-Improving Reasoner | Wai-Chung Kwan et.al. | 2511.00602 | link |
| 2025-11-01 | Improving Robustness to Out-of-Distribution States in Imitation Learning via Deep Koopman-Boosted Diffusion Policy | Dianye Huang et.al. | 2511.00555 | null |
| 2025-11-01 | Single-agent Reinforcement Learning Model for Regional Adaptive Traffic Signal Control | Qiang Li et.al. | 2511.00551 | null |
| 2025-11-01 | Robust Single-Agent Reinforcement Learning for Regional Traffic Signal Control Under Demand Fluctuations | Qiang Li et.al. | 2511.00549 | null |
| 2025-11-01 | ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation | Panwang Pan et.al. | 2511.00511 | null |
| 2025-11-01 | GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining | Chunyu Wei et.al. | 2511.00457 | null |
| 2025-11-01 | Bootstrap Off-policy with World Model | Guojian Zhan et.al. | 2511.00423 | null |
| 2025-11-01 | UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings | Zhibin Lan et.al. | 2511.00405 | link |
| 2025-11-01 | CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks | Long Li et.al. | 2511.00396 | null |
| 2025-11-01 | VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning | Xuanle Zhao et.al. | 2511.00391 | link |
| 2025-11-01 | Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond | Fan Zhang et.al. | 2511.00389 | null |
| 2025-11-01 | Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict | Chaochen Wu et.al. | 2511.00370 | null |
| 2025-10-31 | Reinforcement Learning for Resource Allocation in Vehicular Multi-Fog Computing | Mohammad Hadi Akbarzadeh et.al. | 2511.00276 | null |
| 2025-10-31 | Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning | Michiel Straat et.al. | 2511.00272 | null |
| 2025-10-31 | Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning | Marwa Abdulhai et.al. | 2511.00222 | null |
| 2025-10-31 | Iterative Foundation Model Fine-Tuning on Multiple Rewards | Pouya M. Ghari et.al. | 2511.00220 | null |
| 2025-10-31 | Deep reinforcement learning for optimal trading with partial information | Andrea Macrì et.al. | 2511.00190 | null |
| 2025-10-31 | Study on Supply Chain Finance Decision-Making Model and Enterprise Economic Performance Prediction Based on Deep Reinforcement Learning | Shiman Zhang et.al. | 2511.00166 | null |
| 2025-10-31 | EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations | Justin Yu et.al. | 2511.00153 | null |
| 2025-10-31 | A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control | Qing Guo et.al. | 2511.00136 | null |
| 2025-10-31 | DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads | Antonio Guillen-Perez et.al. | 2511.00117 | null |
| 2025-10-31 | LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers | Avisek Naug et.al. | 2511.00116 | null |
| 2025-10-31 | End-to-End Framework Integrating Generative AI and Deep Reinforcement Learning for Autonomous Ultrasound Scanning | Hanae Elmekki et.al. | 2511.00114 | null |
| 2025-10-30 | Real-DRL: Teach and Learn in Reality | Yanbing Mao et.al. | 2511.00112 | null |
| 2025-10-30 | Self-Improving Vision-Language-Action Models with Data Generation via Residual RL | Wenli Xiao et.al. | 2511.00091 | null |
| 2025-10-30 | Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail | NVIDIA et.al. | 2511.00088 | null |
| 2025-10-29 | Token-Regulated Group Relative Policy Optimization for Stable Reinforcement Learning in Large Language Models | Tue Le et.al. | 2511.00066 | null |
| 2025-10-31 | Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems | Alireza Saleh Abadi et.al. | 2510.27659 | null |
| 2025-10-31 | Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning | Yuhong Liu et.al. | 2510.27606 | link |
| 2025-10-31 | MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval | Qi Luo et.al. | 2510.27569 | null |
| 2025-10-31 | Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval | Yulong Hui et.al. | 2510.27566 | null |
| 2025-10-31 | VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision | Xuan Gong et.al. | 2510.27462 | null |
| 2025-10-31 | Learning Soft Robotic Dynamics with Active Exploration | Hehui Zheng et.al. | 2510.27428 | null |
| 2025-10-31 | DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains | Tian Liang et.al. | 2510.27419 | null |
| 2025-10-31 | Realistic pedestrian-driver interaction modelling using multi-agent RL with human perceptual-motor constraints | Yueyang Wang et.al. | 2510.27383 | null |
| 2025-10-31 | Reasoning Models Sometimes Output Illegible Chains of Thought | Arun Jose et.al. | 2510.27338 | null |
| 2025-10-31 | When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making | Ali Raza Jafree et.al. | 2510.27334 | null |
| 2025-10-31 | Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines | Kristina Levina et.al. | 2510.27329 | null |
| 2025-10-31 | A Digital Twin-based Multi-Agent Reinforcement Learning Framework for Vehicle-to-Grid Coordination | Zhengchang Hua et.al. | 2510.27289 | null |
| 2025-10-31 | Inferring trust in recommendation systems from brain, behavioural, and physiological data | Vincent K. M. Cheung et.al. | 2510.27272 | null |
| 2025-10-31 | MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models | Kangkun Mao et.al. | 2510.27267 | null |
| 2025-10-31 | GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation | Tao Liu et.al. | 2510.27210 | null |
| 2025-10-31 | ShapleyPipe: Hierarchical Shapley Search for Data Preparation Pipeline Construction | Jing Chang et.al. | 2510.27168 | null |
| 2025-10-31 | Disrupting Networks: Amplifying Social Dissensus via Opinion Perturbation and Large Language Models | Erica Coppolillo et.al. | 2510.27152 | null |
| 2025-10-31 | AURA: A Reinforcement Learning Framework for AI-Driven Adaptive Conversational Surveys | Jinwen Tang et.al. | 2510.27126 | null |
| 2025-10-31 | Towards Understanding Self-play for LLM Reasoning | Justin Yang Chae et.al. | 2510.27072 | null |
| 2025-10-31 | Distributed Precoding for Cell-free Massive MIMO in O-RAN: A Multi-agent Deep Reinforcement Learning Framework | Mohammad Hossein Shokouhi et.al. | 2510.27069 | null |
| 2025-10-31 | Adaptive Human-Computer Interaction Strategies Through Reinforcement Learning in Complex | Rui Liu et.al. | 2510.27058 | null |
| 2025-10-30 | SpikeATac: A Multimodal Tactile Finger with Taxelized Dynamic Sensing for Dexterous Manipulation | Eric T. Chang et.al. | 2510.27048 | null |
| 2025-10-30 | Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning | Md Tanvirul Alam et.al. | 2510.27044 | link |
| 2025-10-30 | e1: Learning Adaptive Control of Reasoning Effort | Michael Kleinman et.al. | 2510.27042 | null |
| 2025-10-30 | Algorithmic Predation: Equilibrium Analysis in Dynamic Oligopolies with Smooth Market Sharing | Fabian Raoul Pieroth et.al. | 2510.27008 | null |
| 2025-10-30 | A Framework for Fair Evaluation of Variance-Aware Bandit Algorithms | Elise Wolf et.al. | 2510.27001 | null |
| 2025-10-30 | Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench | Fenfen Lin et.al. | 2510.26865 | link |
| 2025-10-30 | Defeating the Training-Inference Mismatch via FP16 | Penghui Qi et.al. | 2510.26788 | link |
| 2025-10-30 | A General Incentives-Based Framework for Fairness in Multi-agent Resource Allocation | Ashwin Kumar et.al. | 2510.26740 | null |
| 2025-10-30 | Stabilizing Rayleigh-Benard convection with reinforcement learning trained on a reduced-order model | Qiwei Chen et.al. | 2510.26705 | null |
| 2025-10-30 | Kimi Linear: An Expressive, Efficient Attention Architecture | Kimi Team et.al. | 2510.26692 | link |
| 2025-10-30 | Action-Driven Processes for Continuous-Time Control | Ruimin He et.al. | 2510.26672 | null |
| 2025-10-30 | Hybrid Consistency Policy: Decoupling Multi-Modal Diversity and Real-Time Efficiency in Robotic Manipulation | Qianyou Zhao et.al. | 2510.26670 | null |
| 2025-10-30 | The Era of Agentic Organization: Learning to Organize with Language Models | Zewen Chi et.al. | 2510.26658 | null |
| 2025-10-30 | Hybrid DQN-TD3 Reinforcement Learning for Autonomous Navigation in Dynamic Environments | Xiaoyi He et.al. | 2510.26646 | null |
| 2025-10-30 | Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications | Chuang Zhang et.al. | 2510.26628 | null |
| 2025-10-30 | A DRL-Empowered Multi-Level Jamming Approach for Secure Semantic Communication | Weixuan Chen et.al. | 2510.26610 | null |
| 2025-10-30 | Emu3.5: Native Multimodal Models are World Learners | Yufeng Cui et.al. | 2510.26583 | link |
| 2025-10-30 | InfoFlow: Reinforcing Search Agent Via Reward Density Optimization | Kun Luo et.al. | 2510.26575 | null |
| 2025-10-30 | Adaptive Inverse Kinematics Framework for Learning Variable-Length Tool Manipulation in Robotics | Prathamesh Kothavale et.al. | 2510.26551 | null |
| 2025-10-30 | Think Outside the Policy: In-Context Steered Policy Optimization | Hsiu-Yuan Huang et.al. | 2510.26519 | null |
| 2025-10-30 | Data-Efficient RLVR via Off-Policy Influence Guidance | Erle Zhu et.al. | 2510.26491 | null |
| 2025-10-30 | ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems | Qiaoling Chen et.al. | 2510.26475 | null |
| 2025-10-30 | PolarZero: A Reinforcement Learning Approach for Low-Complexity Polarization Kernel Design | Yi-Ting Hong et.al. | 2510.26452 | null |
| 2025-10-30 | An Impulse Control Approach to Market Making in a Hawkes LOB Market | Konark Jain et.al. | 2510.26438 | null |
| 2025-10-30 | Human-in-the-loop Online Rejection Sampling for Robotic Manipulation | Guanxing Lu et.al. | 2510.26406 | null |
| 2025-10-30 | Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning | Wenchang Duan et.al. | 2510.26389 | null |
| 2025-10-30 | Towards Reinforcement Learning Based Log Loading Automation | Ilya Kurinov et.al. | 2510.26363 | null |
| 2025-10-30 | Reinforcement Learning for Pollution Detection in a Randomized, Sparse and Nonstationary Environment with an Autonomous Underwater Vehicle | Sebastian Zieglmeier et.al. | 2510.26347 | null |
| 2025-10-30 | Offline Clustering of Preference Learning with Active-data Augmentation | Jingyuan Liu et.al. | 2510.26301 | null |
| 2025-10-30 | Beyond Imitation: Constraint-Aware Trajectory Generation with Flow Matching For End-to-End Autonomous Driving | Lin Liu et.al. | 2510.26292 | null |
| 2025-10-30 | Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search | Guochang Li et.al. | 2510.26287 | null |
| 2025-10-30 | Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments | Gangyang Li et.al. | 2510.26280 | null |
| 2025-10-30 | Graph-Enhanced Policy Optimization in LLM Agent Training | Jiazhen Yuan et.al. | 2510.26270 | null |
| 2025-10-30 | A Game-Theoretic Spatio-Temporal Reinforcement Learning Framework for Collaborative Public Resource Allocation | Songxin Lei et.al. | 2510.26184 | null |
| 2025-10-30 | One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning | Renhao Li et.al. | 2510.26167 | null |
| 2025-10-30 | Learning to Manage Investment Portfolios beyond Simple Utility Functions | Maarten P. Scholl et.al. | 2510.26165 | null |
| 2025-10-30 | Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math | Bo Pang et.al. | 2510.26143 | null |
| 2025-10-30 | EgoExo-Con: Exploring View-Invariant Video Temporal Understanding | Minjoon Jung et.al. | 2510.26113 | null |
| 2025-10-30 | Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error | Chenming Tang et.al. | 2510.26109 | null |
| 2025-10-30 | GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks | Chenrui Shi et.al. | 2510.26098 | null |
| 2025-10-30 | Network-Constrained Policy Optimization for Adaptive Multi-agent Vehicle Routing | Fazel Arasteh et.al. | 2510.26089 | null |
| 2025-10-30 | Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion | Chi Zhang et.al. | 2510.26067 | null |
| 2025-10-30 | Accelerating Real-World Overtaking in F1TENTH Racing Employing Reinforcement Learning Methods | Emily Steiner et.al. | 2510.26040 | null |
| 2025-10-29 | Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation | Feichen Gan et.al. | 2510.26026 | null |
| 2025-10-29 | PORTool: Tool-Use LLM Training with Rewarded Tree | Feijie Wu et.al. | 2510.26020 | null |
| 2025-10-29 | Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning | Yihe Deng et.al. | 2510.25992 | null |
| 2025-10-29 | Estimating cognitive biases with attention-aware inverse planning | Sounak Banerjee et.al. | 2510.25951 | null |
| 2025-10-29 | InputDSA: Demixing then Comparing Recurrent and Externally Driven Dynamics | Ann Huang et.al. | 2510.25943 | null |
| 2025-10-29 | Multi-Agent Reinforcement Learning for Market Making: Competition without Collusion | Ziyi Wang et.al. | 2510.25929 | null |
| 2025-10-29 | $\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models | Kang Chen et.al. | 2510.25889 | null |
| 2025-10-29 | Approximating Human Preferences Using a Multi-Judge Learned System | Eitán Sprejer et.al. | 2510.25884 | null |
| 2025-10-29 | MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs | Xiaoke Huang et.al. | 2510.25867 | null |
| 2025-10-29 | Adversarial Pre-Padding: Generating Evasive Network Traffic Against Transformer-Based Classifiers | Quanliang Jing et.al. | 2510.25810 | null |
| 2025-10-29 | MetaLore: Learning to Orchestrate Communication and Computation for Metaverse Synchronization | Elif Ebru Ohri et.al. | 2510.25705 | null |
| 2025-10-29 | PairUni: Pairwise Training for Unified Multimodal Language Models | Jiani Zheng et.al. | 2510.25682 | null |
| 2025-10-29 | Navigation in a Three-Dimensional Urban Flow using Deep Reinforcement Learning | Federica Tonti et.al. | 2510.25679 | null |
| 2025-10-29 | ALDEN: Reinforcement Learning for Active Navigation and Evidence Gathering in Long Documents | Tianyu Yang et.al. | 2510.25668 | null |
| 2025-10-29 | Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills | Weikang Wan et.al. | 2510.25634 | null |
| 2025-10-29 | EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis | Yusheng Liao et.al. | 2510.25628 | null |
| 2025-10-29 | On the instability of local learning algorithms: Q-learning can fail in infinite state spaces | Urtzi Ayesta et.al. | 2510.25572 | null |
| 2025-10-29 | Deep Reinforcement Learning-Based Cooperative Rate Splitting for Satellite-to-Underground Communication Networks | Kaiqiang Lin et.al. | 2510.25562 | null |
| 2025-10-29 | Off-policy Reinforcement Learning with Model-based Exploration Augmentation | Likun Wang et.al. | 2510.25529 | null |
| 2025-10-29 | Zero Reinforcement Learning Towards General Domains | Yuyuan Zeng et.al. | 2510.25528 | null |
| 2025-10-29 | MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL | Zekun Xu et.al. | 2510.25510 | null |
| 2025-10-29 | Dynamic Beamforming and Power Allocation in ISAC via Deep Reinforcement Learning | Duc Nguyen Dao et.al. | 2510.25496 | null |
| 2025-10-29 | Reinforcement Learning techniques for the flavor problem in particle physics | A. Giarnetti et.al. | 2510.25495 | null |
| 2025-10-29 | Generalized Pseudo-Relevance Feedback | Yiteng Tu et.al. | 2510.25488 | null |
| 2025-10-29 | Sim-to-Real Gentle Manipulation of Deformable and Fragile Objects with Stress-Guided Reinforcement Learning | Kei Ikemura et.al. | 2510.25405 | null |
| 2025-10-29 | Model-Free Robust Beamforming in Satellite Downlink using Reinforcement Learning | Alea Schröder et.al. | 2510.25393 | null |
| 2025-10-29 | Multi-party Agent Relation Sampling for Multi-party Ad Hoc Teamwork | Beiwen Zhang et.al. | 2510.25340 | null |
| 2025-10-29 | GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning | Jiaqi Wu et.al. | 2510.25320 | null |
| 2025-10-29 | Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning | Sagalpreet Singh et.al. | 2510.25311 | null |
| 2025-10-29 | Adaptive Design of mmWave Initial Access Codebooks using Reinforcement Learning | Sabrine Aroua et.al. | 2510.25271 | null |
| 2025-10-29 | The influence of the random numbers quality on the results in stochastic simulations and machine learning | Benjamin A. Antunes et.al. | 2510.25269 | null |
| 2025-10-29 | SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation | Wang Zhi et.al. | 2510.25268 | null |
| 2025-10-29 | One-shot Humanoid Whole-body Motion Learning | Hao Huang et.al. | 2510.25241 | null |
| 2025-09-26 | Impact of Collective Behaviors of Autonomous Vehicles on Urban Traffic Dynamics: A Multi-Agent Reinforcement Learning Approach | Ahmet Onur Akman et.al. | 2509.22216 | null |
| 2025-07-29 | Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics | Leonard Hinckeldey et.al. | 2507.21638 | null |
| 2025-07-23 | Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains | Anisha Gunjal et.al. | 2507.17746 | null |
| 2025-07-23 | Megrez2 Technical Report | Boxun Li et.al. | 2507.17728 | null |
| 2025-07-23 | How Should We Meta-Learn Reinforcement Learning Algorithms? | Alexander David Goldie et.al. | 2507.17668 | null |
| 2025-07-23 | CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning | Lingxiao Tang et.al. | 2507.17548 | null |
| 2025-07-23 | Generalized Advantage Estimation for Distributional Policy Gradients | Shahil Shaik et.al. | 2507.17530 | null |
| 2025-07-23 | Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice | Shanbo Cheng et.al. | 2507.17527 | null |
| 2025-07-23 | URPO: A Unified Reward & Policy Optimization Framework for Large Language Models | Songshuo Lu et.al. | 2507.17515 | null |
| 2025-07-23 | Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning | Yu Li et.al. | 2507.17512 | null |
| 2025-07-23 | ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents | Chang Nie et.al. | 2507.17462 | null |
| 2025-07-23 | Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning | Situo Zhang et.al. | 2507.17448 | null |
| 2025-07-22 | Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning | Junhao Shen et.al. | 2507.16814 | null |
| 2025-07-22 | Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty | Mehul Damani et.al. | 2507.16806 | null |
| 2025-07-22 | Uncertainty-Aware Knowledge Transformers for Peer-to-Peer Energy Trading with Multi-Agent Reinforcement Learning | Mian Ibad Ali Shah et.al. | 2507.16796 | null |
| 2025-07-22 | Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning | Ang Li et.al. | 2507.16746 | link |
| 2025-07-23 | Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints | Zhenyun Yin et.al. | 2507.16727 | null |
| 2025-07-22 | Adaptive Inventory Strategies using Deep Reinforcement Learning for Dynamic Agri-Food Supply Chains | Amandeep Kaur et.al. | 2507.16670 | null |
| 2025-07-22 | FOGNITE: Federated Learning-Enhanced Fog-Cloud Architecture | Somayeh Sobati-M et.al. | 2507.16668 | null |
| 2025-07-22 | Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis | Sara Giordano et.al. | 2507.16641 | null |
| 2025-07-22 | Novel Multi-Agent Action Masked Deep Reinforcement Learning for General Industrial Assembly Lines Balancing Problems | Ali Mohamed Ali et.al. | 2507.16635 | null |
| 2025-07-22 | Step-Audio 2 Technical Report | Boyong Wu et.al. | 2507.16632 | link |
| 2025-07-21 | The Impact of Language Mixing on Bilingual LLM Reasoning | Yihao Li et.al. | 2507.15849 | null |
| 2025-07-21 | GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding | Fei Tang et.al. | 2507.15846 | link |
| 2025-07-22 | Hierarchical Budget Policy Optimization for Adaptive Reasoning | Shangke Lyu et.al. | 2507.15844 | link |
| 2025-07-21 | LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra | Seth Karten et.al. | 2507.15815 | link |
| 2025-07-21 | Power-Constrained Policy Gradient Methods for LQR | Ashwin Verma et.al. | 2507.15806 | null |
| 2025-07-21 | Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning | Sneheel Sarangi et.al. | 2507.15788 | null |
| 2025-07-21 | Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR | Jiakang Wang et.al. | 2507.15778 | link |
| 2025-07-21 | LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization | Xingyu Wu et.al. | 2507.15758 | link |
| 2025-07-21 | EMP: Executable Motion Prior for Humanoid Robot Standing Upper-body Motion Imitation | Haocheng Xu et.al. | 2507.15649 | null |
| 2025-07-21 | Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training | Kailai Yang et.al. | 2507.15640 | null |
| 2025-07-18 | CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | Xiaoya Li et.al. | 2507.14111 | link |
| 2025-07-18 | Preference-based Multi-Objective Reinforcement Learning | Ni Mu et.al. | 2507.14066 | null |
| 2025-07-18 | Reframing attention as a reinforcement learning problem for causal discovery | Turan Orujlu et.al. | 2507.13920 | null |
| 2025-07-18 | Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments | Kathrin Korte et.al. | 2507.13846 | null |
| 2025-07-18 | Scalable Submodular Policy Optimization via Pruned Submodularity Graph | Aditi Anand et.al. | 2507.13834 | null |
| 2025-07-18 | DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training | Zhixin Wang et.al. | 2507.13833 | null |
| 2025-07-18 | Efficient and Scalable Self-Healing Databases Using Meta-Learning and Dependency-Driven Recovery | Joydeep Chandra et.al. | 2507.13757 | null |
| 2025-07-18 | LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction | Jing Chang et.al. | 2507.13712 | null |
| 2025-07-18 | CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation | Jing Chang et.al. | 2507.13710 | null |
| 2025-07-18 | State Space Models Naturally Produce Traveling Waves, Time Cells, and Scale to Abstract Cognitive Functions | Sen Lu et.al. | 2507.13638 | null |
| 2025-07-17 | VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning | Senqiao Yang et.al. | 2507.13348 | link |
| 2025-07-17 | The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner | Zhouqi Hua et.al. | 2507.13332 | null |
| 2025-07-17 | Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour | Emma M. A. Harrison et.al. | 2507.13277 | null |
| 2025-07-17 | QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation | Jiazheng Li et.al. | 2507.13266 | null |
| 2025-07-17 | Signal Temporal Logic Compliant Co-design of Planning and Control | Manas Sashank Juvvi et.al. | 2507.13225 | null |
| 2025-07-17 | Spectral Bellman Method: Unifying Representation and Exploration in RL | Ofir Nabati et.al. | 2507.13181 | null |
| 2025-07-17 | Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback | Suzie Kim et.al. | 2507.13171 | null |
| 2025-07-17 | Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities | Hao Sun et.al. | 2507.13158 | null |
| 2025-07-17 | From Roots to Rewards: Dynamic Tree Reasoning with RL | Ahmed Bahloul et.al. | 2507.13142 | null |
| 2025-07-17 | ZipMPC: Compressed Context-Dependent MPC Cost via Imitation Learning | Rahel Rickenbach et.al. | 2507.13088 | null |
| 2025-07-16 | EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos | Ruihan Yang et.al. | 2507.12440 | null |
| 2025-07-16 | Improving Reinforcement Learning Sample-Efficiency using Local Approximation | Mohit Prashant et.al. | 2507.12383 | null |
| 2025-07-16 | Thought Purity: Defense Paradigm For Chain-of-Thought Attack | Zihao Xue et.al. | 2507.12314 | null |
| 2025-07-16 | Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning | Yuhao Chen et.al. | 2507.12215 | null |
| 2025-07-16 | BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search | Azhar Ikhtiarudin et.al. | 2507.12189 | link |
| 2025-07-17 | Efficient Preparation of Fermionic Superfluids in an Optical Dipole Trap through Reinforcement Learning | Yueyang Min et.al. | 2507.12152 | null |
| 2025-07-16 | Topology Enhanced MARL for Multi-Vehicle Cooperative Decision-Making of CAVs | Ye Han et.al. | 2507.12110 | null |
| 2025-07-16 | Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics | Muleilan Pei et.al. | 2507.12083 | null |
| 2025-07-16 | Towards Ultra-Reliable 6G in-X Subnetworks: Dynamic Link Adaptation by Deep Reinforcement Learning | Fateme Salehi et.al. | 2507.12031 | null |
| 2025-07-16 | QAS-QTNs: Curriculum Reinforcement Learning-Driven Quantum Architecture Search for Quantum Tensor Networks | Siddhant Dutta et.al. | 2507.12013 | null |
| 2025-07-15 | Robot Drummer: Learning Rhythmic Skills for Humanoid Drumming | Asad Ali Shahid et.al. | 2507.11498 | null |
| 2025-07-15 | Exploring the robustness of TractOracle methods in RL-based tractography | Jeremi Levesque et.al. | 2507.11486 | null |
| 2025-07-15 | Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light | Mani Hamidi et.al. | 2507.11482 | null |
| 2025-07-15 | Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs | Gabriel Bo et.al. | 2507.11371 | null |
| 2025-07-15 | Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning | Daniel Tanneberg et.al. | 2507.11367 | null |
| 2025-07-15 | Sensing Accuracy Optimization for Multi-UAV SAR Interferometry with Data Offloading | Mohamed-Amine Lahmeri et.al. | 2507.11284 | null |
| 2025-07-15 | Ocean Diviner: A Diffusion-Augmented Reinforcement Learning for AUV Robust Control in the Underwater Tasks | Weiyi Liu et.al. | 2507.11283 | null |
| 2025-07-15 | Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound | Tal Fiskus et.al. | 2507.11269 | null |
| 2025-07-15 | Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction | Deepak Kumar Panda et.al. | 2507.11173 | null |
| 2025-07-15 | Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities | Yiting Qu et.al. | 2507.11155 | null |
| 2025-07-14 | EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | Mingxian Lin et.al. | 2507.10548 | link |
| 2025-07-14 | Disentangling Neural Disjunctive Normal Form Models | Kexin Gu Baugh et.al. | 2507.10546 | null |
| 2025-07-14 | Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Mingqi Wu et.al. | 2507.10532 | link |
| 2025-07-14 | Some remarks on gradient dominance and LQR policy optimization | Eduardo D. Sontag et.al. | 2507.10452 | null |
| 2025-07-14 | Prompt Informed Reinforcement Learning for Visual Coverage Path Planning | Venkat Margapuri et.al. | 2507.10284 | null |
| 2025-07-14 | Cross-Timeslot Optimization for Distributed GPU Inference Using Reinforcement Learning | Chengze Du et.al. | 2507.10259 | null |
| 2025-07-14 | ToMacVF : Temporal Macro-action Value Factorization for Asynchronous Multi-Agent Reinforcement Learning | Wenjing Zhang et.al. | 2507.10251 | null |
| 2025-07-14 | Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning? | Yumi Omori et.al. | 2507.10174 | null |
| 2025-07-14 | Robust RL Control for Bipedal Locomotion with Closed Kinematic Chains | Egor Maslennikov et.al. | 2507.10164 | null |
| 2025-07-14 | Adaptability in Multi-Agent Reinforcement Learning: A Framework and Unified Review | Siyi Hu et.al. | 2507.10142 | null |
| 2025-07-11 | One Token to Fool LLM-as-a-Judge | Yulai Zhao et.al. | 2507.08794 | null |
| 2025-07-11 | Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning | James McCarthy et.al. | 2507.08793 | null |
| 2025-07-11 | Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data | Jeonghye Kim et.al. | 2507.08761 | null |
| 2025-07-11 | On the Effect of Regularization in Policy Mirror Descent | Jan Felix Kleuker et.al. | 2507.08718 | null |
| 2025-07-11 | SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations | Peter Crowley et.al. | 2507.08707 | null |
| 2025-07-11 | elsciRL: Integrating Language Solutions into Reinforcement Learning Problem Settings | Philip Osborne et.al. | 2507.08705 | null |
| 2025-07-11 | Multi-critic Learning for Whole-body End-effector Twist Tracking | Aravind Elanjimattathil Vijayan et.al. | 2507.08656 | null |
| 2025-07-11 | Safe Deep Reinforcement Learning for Resource Allocation with Peak Age of Information Violation Guarantees | Berire Gunes Reyhan et.al. | 2507.08653 | null |
| 2025-07-11 | Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning | Xingguang Ji et.al. | 2507.08649 | link |
| 2025-07-11 | Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data | Parag Dutta et.al. | 2507.08610 | null |
| 2025-07-10 | Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology | Haochen Wang et.al. | 2507.07999 | link |
| 2025-07-10 | Single-pass Adaptive Image Tokenization for Minimum Program Search | Shivam Duggal et.al. | 2507.07995 | null |
| 2025-07-10 | EXPO: Stable Reinforcement Learning with Expressive Policies | Perry Dong et.al. | 2507.07986 | null |
| 2025-07-10 | Reinforcement Learning with Action Chunking | Qiyang Li et.al. | 2507.07969 | null |
| 2025-07-10 | Scaling RL to Long Videos | Yukang Chen et.al. | 2507.07966 | link |
| 2025-07-10 | Excess Observables Reveal Nonreciprocity in Integrated Covariance | Timur Aslyamov et.al. | 2507.07876 | null |
| 2025-07-10 | “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents | Giovanni Dispoto et.al. | 2507.07848 | null |
| 2025-07-10 | Beyond Robustness: Learning Unknown Dynamic Load Adaptation for Quadruped Locomotion on Rough Terrain | Leixin Chang et.al. | 2507.07825 | null |
| 2025-07-10 | BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning | Ruohong Liu et.al. | 2507.07769 | null |
| 2025-07-10 | Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization | Chengtao Jian et.al. | 2507.07723 | null |
| 2025-07-09 | Graph-Based Complexity Metrics for Multi-Agent Curriculum Learning: A Validated Approach to Task Ordering in Cooperative Coordination Environments | Farhaan Ebadulla et.al. | 2507.07074 | null |
| 2025-07-09 | First Return, Entropy-Eliciting Explore | Tianyu Zheng et.al. | 2507.07017 | null |
| 2025-07-09 | Federated Learning-based MARL for Strengthening Physical-Layer Security in B5G Networks | Deemah H. Tashman et.al. | 2507.06997 | null |
| 2025-07-09 | Optimizing Cognitive Networks: Reinforcement Learning Meets Energy Harvesting Over Cascaded Channels | Deemah H. Tashman et.al. | 2507.06981 | null |
| 2025-07-09 | Bounomodes: the grazing ox algorithm for exploration of clustered anomalies | Samuel Matloob et.al. | 2507.06960 | null |
| 2025-07-10 | Rethinking Verification for LLM Code Generation: From Generation to Testing | Zihan Ma et.al. | 2507.06920 | link |
| 2025-07-09 | Designing Adaptive Algorithms Based on Reinforcement Learning for Dynamic Optimization of Sliding Window Size in Multi-Dimensional Data Streams | Abolfazl Zarghani et.al. | 2507.06901 | null |
| 2025-07-09 | Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model | Jing Liang et.al. | 2507.06892 | null |
| 2025-07-09 | Episodic Contextual Bandits with Knapsacks under Conversion Models | Zitian Li et.al. | 2507.06859 | null |
| 2025-07-10 | Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning | Matej Straka et.al. | 2507.06825 | link |
| 2025-07-08 | EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow | Yixiang Chen et.al. | 2507.06224 | null |
| 2025-07-08 | CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization | Zhongyuan Peng et.al. | 2507.06181 | link |
| 2025-07-08 | Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model | Koki Yamane et.al. | 2507.06174 | null |
| 2025-07-08 | Learning Agile Tensile Perching for Aerial Robots from Demonstrations | Kangle Yuan et.al. | 2507.06172 | null |
| 2025-07-08 | Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation | Mohamad H. Danesh et.al. | 2507.06111 | null |
| 2025-07-08 | AI-Based Demand Forecasting and Load Balancing for Optimising Energy use in Healthcare Systems: A real case study | Iman Rahimi et.al. | 2507.06077 | null |
| 2025-07-09 | FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models | Bo Pang et.al. | 2507.06057 | null |
| 2025-07-08 | CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation | Kushal Gajjar et.al. | 2507.06013 | null |
| 2025-07-08 | From General Relation Patterns to Task-Specific Decision-Making in Continual Multi-Agent Coordination | Chang Yao et.al. | 2507.06004 | null |
| 2025-07-08 | BlueLM-2.5-3B Technical Report | Baojiao Xiong et.al. | 2507.05934 | null |
| 2025-07-07 | Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Yana Wei et.al. | 2507.05255 | link |
| 2025-07-07 | Action Space Reduction Strategies for Reinforcement Learning in Autonomous Driving | Elahe Delavari et.al. | 2507.05251 | null |
| 2025-07-07 | NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving | Qucheng Peng et.al. | 2507.05227 | null |
| 2025-07-07 | EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling | Boyuan Wang et.al. | 2507.05198 | null |
| 2025-07-07 | Sequential Attention-based Sampling for Histopathological Analysis | Tarun G et.al. | 2507.05077 | null |
| 2025-07-07 | Replacing thinking with tool usage enables reasoning in small language models | Corrado Rainone et.al. | 2507.05065 | null |
| 2025-07-07 | When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning | Maxence Boels et.al. | 2507.05011 | null |
| 2025-07-07 | Linking Homeostasis to Reinforcement Learning: Internal State Control of Motivated Behavior | Naoto Yoshida et.al. | 2507.04998 | null |
| 2025-07-07 | Object-centric Denoising Diffusion Models for Physical Reasoning | Moritz Lange et.al. | 2507.04920 | null |
| 2025-07-07 | Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning | Sanyam Vyas et.al. | 2507.04883 | null |
| 2025-07-03 | MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs | Purbesh Mitra et.al. | 2507.02851 | link |
| 2025-07-03 | StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason | Kaiyi Zhang et.al. | 2507.02841 | null |
| 2025-07-03 | ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning | Ruiyang Zhou et.al. | 2507.02834 | null |
| 2025-07-03 | Generalizing Verifiable Instruction Following | Valentina Pyatkin et.al. | 2507.02833 | null |
| 2025-07-03 | Multimodal Mathematical Reasoning with Diverse Solving Perspective | Wenhao Shi et.al. | 2507.02804 | null |
| 2025-07-03 | A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control | Zilin Kang et.al. | 2507.02712 | null |
| 2025-07-03 | Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions | Thomas Hazenberg et.al. | 2507.02698 | null |
| 2025-07-03 | RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes | Jiaxing Wang et.al. | 2507.02690 | null |
| 2025-07-03 | TUC-PPO: Team Utility-Constrained Proximal Policy Optimization for Spatial Public Goods Games | Zhaoqilin Yang et.al. | 2507.02675 | null |
| 2025-07-03 | On Efficient Bayesian Exploration in Model-Based Reinforcement Learning | Alberto Caron et.al. | 2507.02639 | null |
| 2025-07-02 | Kwai Keye-VL Technical Report | Kwai Keye Team et.al. | 2507.01949 | link |
| 2025-07-02 | NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks | Yang Li et.al. | 2507.01921 | null |
| 2025-07-02 | Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models | Chengao Li et.al. | 2507.01915 | null |
| 2025-07-02 | TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types | Yuhao Lin et.al. | 2507.01857 | null |
| 2025-07-02 | TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents | Dmytro Kuzmenko et.al. | 2507.01823 | null |
| 2025-07-02 | Quantum reinforcement learning in dynamic environments | Oliver Sefrin et.al. | 2507.01691 | null |
| 2025-07-02 | AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training | Zhenyu Han et.al. | 2507.01663 | null |
| 2025-07-02 | Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning | Wu Fei et.al. | 2507.01551 | null |
| 2025-07-02 | Chargax: A JAX Accelerated EV Charging Simulator | Koen Ponse et.al. | 2507.01522 | null |
| 2025-07-02 | Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning | Yanfei Zhang et.al. | 2507.01489 | null |
| 2025-07-01 | SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | Bo Liu et.al. | 2506.24119 | link |
| 2025-06-30 | Scaling Human Judgment in Community Notes with LLMs | Haiwen Li et.al. | 2506.24118 | null |
| 2025-06-30 | Constructing Non-Markovian Decision Process via History Aggregator | Yongyi Wang et.al. | 2506.24026 | null |
| 2025-06-30 | Provably Efficient and Agile Randomized Q-Learning | He Wang et.al. | 2506.24005 | null |
| 2025-06-30 | Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning | Seungjun Yi et.al. | 2506.23998 | null |
| 2025-06-30 | ADReFT: Adaptive Decision Repair for Safe Autonomous Driving via Reinforcement Fine-Tuning | Mingfei Cheng et.al. | 2506.23960 | null |
| 2025-07-01 | Adapt Your Body: Mitigating Proprioception Shifts in Imitation Learning | Fuhang Kuang et.al. | 2506.23944 | null |
| 2025-06-30 | Reinforcement Learning for Synchronised Flow Control in a Dual-Gate Resin Infusion System | Miguel Camacho-Sánchez et.al. | 2506.23923 | null |
| 2025-06-30 | The Trilemma of Truth in Large Language Models | Germans Savcisens et.al. | 2506.23921 | link |
| 2025-06-30 | Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning | Anton Andreychuk et.al. | 2506.23793 | link |
| 2025-06-27 | MiCo: Multi-image Contrast for Reinforcement Visual Reasoning | Xi Chen et.al. | 2506.22434 | null |
| 2025-06-27 | ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks | Pritam Dash et.al. | 2506.22423 | null |
| 2025-06-27 | HyperCLOVA X THINK Technical Report | NAVER Cloud HyperCLOVA X Team et.al. | 2506.22403 | null |
| 2025-06-27 | Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL | Tong Yang et.al. | 2506.22401 | null |
| 2025-06-27 | Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation | Tao Li et.al. | 2506.22365 | null |
| 2025-06-27 | Education-Oriented Graph Retrieval-Augmented Generation for Learning Path Recommendation | Xinghe Cheng et.al. | 2506.22303 | null |
| 2025-06-27 | ReF-LLE: Personalized Low-Light Enhancement via Reference-Guided Deep Reinforcement Learning | Ming Zhao et.al. | 2506.22216 | null |
| 2025-06-27 | A Reinforcement Learning Framework for Some Singular Stochastic Control Problems | Zongxia Liang et.al. | 2506.22203 | null |
| 2025-06-27 | EFRame: Deeper Reasoning via Exploration-Filtering-Replay Reinforcement Learning Framework | Chen Wang et.al. | 2506.22200 | link |
| 2025-06-27 | ASVSim (AirSim for Surface Vehicles): A High-Fidelity Simulation Framework for Autonomous Surface Vehicle Research | Bavo Lesy et.al. | 2506.22174 | null |
| 2025-06-26 | Joint Scheduling of DER under Demand Charges: Structure and Approximation | Ruixiao Yang et.al. | 2506.21510 | null |
| 2025-06-26 | Bridging Offline and Online Reinforcement Learning for LLMs | Jack Lanchantin et.al. | 2506.21495 | null |
| 2025-06-26 | Reinforcement Learning for Optimal Control of Spin Magnetometers | Logan W. Cooke et.al. | 2506.21475 | null |
| 2025-06-26 | Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage | Gavin Lee Goodship et.al. | 2506.21465 | null |
| 2025-06-26 | Spatial Mental Modeling from Limited Views | Baiqiao Yin et.al. | 2506.21458 | null |
| 2025-06-26 | Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | Prajwal Koirala et.al. | 2506.21427 | null |
| 2025-06-26 | rQdia: Regularizing Q-Value Distributions With Image Augmentation | Sam Lerman et.al. | 2506.21367 | null |
| 2025-06-26 | HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Qize Yang et.al. | 2506.21277 | link |
| 2025-06-26 | World-aware Planning Narratives Enhance Large Vision-Language Model Planner | Junhao Shi et.al. | 2506.21230 | null |
| 2025-06-26 | Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design | Hampus Gummesson Svensson et.al. | 2506.21158 | null |
| 2025-06-25 | MMSearch-R1: Incentivizing LMMs to Search | Jinming Wu et.al. | 2506.20670 | link |
| 2025-06-25 | DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy | Sungjae Park et.al. | 2506.20668 | null |
| 2025-06-25 | The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind | Andrei Lupu et.al. | 2506.20664 | null |
| 2025-06-25 | DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | Shansan Gong et.al. | 2506.20639 | link |
| 2025-06-25 | PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models | Soufiane Hayou et.al. | 2506.20629 | link |
| 2025-06-25 | Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control | Andrew Mole et.al. | 2506.20554 | null |
| 2025-06-25 | Demonstration of effective UCB-based routing in skill-based queues on real-world data | Sanne van Kempen et.al. | 2506.20543 | null |
| 2025-06-25 | Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards | Charles Arnal et.al. | 2506.20520 | null |
| 2025-06-25 | OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Zengzhi Wang et.al. | 2506.20512 | link |
| 2025-06-25 | ReCode: Updating Code API Knowledge with Reinforcement Learning | Haoze Wu et.al. | 2506.20495 | link |
| 2025-06-24 | JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning | Ai Han et.al. | 2506.19846 | null |
| 2025-06-24 | Temporal-IRL: Modeling Port Congestion and Berth Scheduling with Inverse Reinforcement Learning | Guo Li et.al. | 2506.19843 | null |
| 2025-06-24 | Persona Features Control Emergent Misalignment | Miles Wang et.al. | 2506.19823 | null |
| 2025-06-24 | KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Baochang Ren et.al. | 2506.19807 | null |
| 2025-06-24 | Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning | Menglong Zhang et.al. | 2506.19785 | null |
| 2025-06-24 | SAGE: Strategy-Adaptive Generation Engine for Query Rewriting | Teng Wang et.al. | 2506.19783 | null |
| 2025-06-24 | Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment | Yuhui Sun et.al. | 2506.19780 | null |
| 2025-06-24 | SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning | Yuqian Fu et.al. | 2506.19767 | null |
| 2025-06-24 | Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks | Nathan Maurer et.al. | 2506.19703 | null |
| 2025-06-24 | From memories to maps: Mechanisms of in context reinforcement learning in transformers | Ching Fang et.al. | 2506.19686 | null |
| 2025-06-23 | ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Jiaru Zou et.al. | 2506.18896 | null |
| 2025-06-23 | Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning | Anthony Kobanda et.al. | 2506.18847 | null |
| 2025-06-23 | LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning | Yuhao Wu et.al. | 2506.18841 | null |
| 2025-06-23 | SViP: Sequencing Bimanual Visuomotor Policies with Object-Centric Motion Primitives | Yizhou Chen et.al. | 2506.18825 | null |
| 2025-06-23 | MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image Segmentation | Ruicheng Zhang et.al. | 2506.18679 | null |
| 2025-06-23 | Harnessing the Power of Reinforcement Learning for Language-Model-Based Information Retriever via Query-Document Co-Augmentation | Jingming Liu et.al. | 2506.18670 | null |
| 2025-06-23 | RL-Driven Semantic Compression Model Selection and Resource Allocation in Semantic Communication Systems | Xinyi Lin et.al. | 2506.18660 | null |
| 2025-06-23 | Dual-level Behavioral Consistency for Inter-group and Intra-group Coordination in Multi-Agent Systems | Shuocun Yang et.al. | 2506.18651 | null |
| 2025-06-23 | Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits | Yannik Mahlau et.al. | 2506.18627 | null |
| 2025-06-23 | Policy gradient methods for ordinal policies | Simón Weinberger et.al. | 2506.18614 | null |
| 2025-06-20 | No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | Yanzhi Zhang et.al. | 2506.17219 | null |
| 2025-06-20 | Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | Zeyuan Yang et.al. | 2506.17218 | null |
| 2025-06-20 | BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning | Xuechen Zhang et.al. | 2506.17211 | null |
| 2025-06-20 | Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning | Guozheng Ma et.al. | 2506.17204 | null |
| 2025-06-20 | Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Samin Yeasar Arnob et.al. | 2506.17155 | null |
| 2025-06-20 | When Can Model-Free Reinforcement Learning be Enough for Thinking? | Josiah P. Hanna et.al. | 2506.17124 | null |
| 2025-06-20 | TransDreamerV3: Implanting Transformer In DreamerV3 | Shruti Sadanand Dongare et.al. | 2506.17103 | null |
| 2025-06-20 | Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs | Ricardo Rei et.al. | 2506.17080 | null |
| 2025-06-20 | Scalable and Reliable Multi-agent Reinforcement Learning for Traffic Assignment | Leizhen Wang et.al. | 2506.17029 | null |
| 2025-06-20 | Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators | Marco Jiralerspong et.al. | 2506.17007 | null |
| 2025-06-18 | Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards | Qingming Liu et.al. | 2506.15684 | null |
| 2025-06-18 | CC-LEARN: Cohort-based Consistency Learning | Xiao Ye et.al. | 2506.15662 | null |
| 2025-06-18 | CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Ranting Hu et.al. | 2506.15654 | null |
| 2025-06-18 | AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning | Tevin Wang et.al. | 2506.15651 | null |
| 2025-06-18 | Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement | Weixiang Zhao et.al. | 2506.15647 | null |
| 2025-06-18 | Learning to flock in open space by avoiding collisions and staying together | Martino Brambati et.al. | 2506.15587 | null |
| 2025-06-18 | Design of an all-facet illuminator for high NA EUV lithography exposure tool based on deep reinforcement learning | Tong Li et.al. | 2506.15558 | null |
| 2025-06-18 | Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning | Roger Creus Castanyer et.al. | 2506.15544 | link |
| 2025-06-18 | Lessons from Training Grounded LLMs with Verifiable Rewards | Shang Hong Sim et.al. | 2506.15522 | null |
| 2025-06-18 | Zero-Shot Reinforcement Learning Under Partial Observability | Scott Jeen et.al. | 2506.15446 | null |
| 2025-06-17 | Reasoning with Exploration: An Entropy Perspective | Daixuan Cheng et.al. | 2506.14758 | null |
| 2025-06-17 | Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation | Carolina Higuera et.al. | 2506.14754 | null |
| 2025-06-17 | Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Ring Team et.al. | 2506.14731 | null |
| 2025-06-17 | Adaptive Accompaniment with ReaLchords | Yusong Wu et.al. | 2506.14723 | null |
| 2025-06-17 | SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning | Hexian Ni et.al. | 2506.14648 | null |
| 2025-06-17 | On Quantum BSDE Solver for High-Dimensional Parabolic PDEs | Howard Su et.al. | 2506.14612 | null |
| 2025-06-17 | TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization | Mingkang Zhu et.al. | 2506.14574 | null |
| 2025-06-17 | Toward Safety-First Human-Like Decision Making for Autonomous Vehicles in Time-Varying Traffic Flow | Xiao Wang et.al. | 2506.14502 | null |
| 2025-06-17 | Zeroth-Order Optimization is Secretly Single-Step Policy Optimization | Junbin Qiu et.al. | 2506.14460 | null |
| 2025-06-17 | Toward Rich Video Human-Motion2D Generation | Ruihao Xi et.al. | 2506.14428 | null |
| 2025-06-16 | Touch begins where vision ends: Generalizable policies for contact-rich manipulation | Zifan Zhao et.al. | 2506.13762 | null |
| 2025-06-16 | MARCO: Hardware-Aware Neural Architecture Search for Edge Devices with Multi-Agent Reinforcement Learning and Conformal Prediction Filtering | Arya Fayyazi et.al. | 2506.13755 | null |
| 2025-06-16 | LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction | Haoru Xue et.al. | 2506.13751 | null |
| 2025-06-16 | PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning | Brahim Driss et.al. | 2506.13741 | null |
| 2025-06-16 | TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning | Junru Zhang et.al. | 2506.13705 | link |
| 2025-06-16 | Value-Free Policy Optimization via Reward Partitioning | Bilal Faye et.al. | 2506.13702 | null |
| 2025-06-16 | OneRec Technical Report | Guorui Zhou et.al. | 2506.13695 | null |
| 2025-06-16 | Meta-learning how to Share Credit among Macro-Actions | Ionel-Alexandru Hosu et.al. | 2506.13690 | null |
| 2025-06-16 | The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning | Jiashun Liu et.al. | 2506.13672 | null |
| 2025-06-16 | We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems | Junfeng Fang et.al. | 2506.13666 | null |
| 2025-06-13 | Schema-R1: A reasoning training approach for schema linking in Text-to-SQL Task | Wuzhenghong Wen et.al. | 2506.11986 | null |
| 2025-06-13 | Self-Regulating Cars: Automating Traffic Control in Free Flow Road Networks | Ankit Bhardwaj et.al. | 2506.11973 | null |
| 2025-06-13 | Visual Pre-Training on Unlabeled Images using Reinforcement Learning | Dibya Ghosh et.al. | 2506.11967 | null |
| 2025-06-13 | Automated Treatment Planning for Interstitial HDR Brachytherapy for Locally Advanced Cervical Cancer using Deep Reinforcement Learning | Mohammadamin Moradi et.al. | 2506.11957 | null |
| 2025-06-13 | SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies | Nadun Ranawaka Arachchige et.al. | 2506.11948 | null |
| 2025-06-13 | Breaking Habits: On the Role of the Advantage Function in Learning Causal State Representations | Miguel Suau et.al. | 2506.11912 | null |
| 2025-06-13 | Palpation Alters Auditory Pain Expressions with Gender-Specific Variations in Robopatients | Chapa Sirithunge et.al. | 2506.11906 | null |
| 2025-06-13 | TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Zhenyu Hou et.al. | 2506.11902 | link |
| 2025-06-13 | An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing | Haochen Sun et.al. | 2506.11882 | null |
| 2025-06-13 | LLM-based Dynamic Differential Testing for Database Connectors with Reinforcement Learning-Guided Prompt Selection | Ce Lyu et.al. | 2506.11870 | null |
| 2025-06-12 | Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop | Justin Kerr et.al. | 2506.10968 | null |
| 2025-06-12 | Spurious Rewards: Rethinking Training Signals in RLVR | Rulin Shao et.al. | 2506.10947 | link |
| 2025-06-12 | Self-Adapting Language Models | Adam Zweiger et.al. | 2506.10943 | null |
| 2025-06-12 | Magistral | Mistral-AI et.al. | 2506.10910 | null |
| 2025-06-12 | Adaptive Job Scheduling in Quantum Clouds Using Reinforcement Learning | Waylon Luo et.al. | 2506.10889 | null |
| 2025-06-12 | Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization | Pierre-François Massiani et.al. | 2506.10871 | null |
| 2025-06-13 | Joint Beamforming with Extremely Large Scale RIS: A Sequential Multi-Agent A2C Approach | Zhi Chai et.al. | 2506.10815 | null |
| 2025-06-12 | Human-Robot Navigation using Event-based Cameras and Reinforcement Learning | Ignacio Bugueno-Cordova et.al. | 2506.10790 | null |
| 2025-06-12 | PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework | SiXiang Chen et.al. | 2506.10741 | link |
| 2025-06-12 | Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs | Yucong Luo et.al. | 2506.10630 | null |
| 2025-06-11 | Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Junfei Wu et.al. | 2506.09965 | link |
| 2025-06-11 | VerIF: Verification Engineering for Reinforcement Learning in Instruction Following | Hao Peng et.al. | 2506.09942 | link |
| 2025-06-11 | The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability | Jiachen Hu et.al. | 2506.09940 | null |
| 2025-06-11 | From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models | Irving Fang et.al. | 2506.09930 | link |
| 2025-06-11 | “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) | Noel Brindise et.al. | 2506.09901 | null |
| 2025-06-11 | Hierarchical Learning-Enhanced MPC for Safe Crowd Navigation with Heterogeneous Constraints | Huajian Liu et.al. | 2506.09859 | null |
| 2025-06-11 | Foundation Model-Aided Deep Reinforcement Learning for RIS-Assisted Wireless Communication | Mohammad Ghassemi et.al. | 2506.09855 | null |
| 2025-06-11 | CoRT: Code-integrated Reasoning within Thinking | Chengpeng Li et.al. | 2506.09820 | link |
| 2025-06-11 | Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy | Tonghe Wang et.al. | 2506.09805 | null |
| 2025-06-11 | Reinforced Refinement with Self-Aware Expansion for End-to-End Autonomous Driving | Haochen Liu et.al. | 2506.09800 | null |
| 2025-06-09 | Play to Generalize: Learning to Reason Through Game Play | Yunfei Xie et.al. | 2506.08011 | link |
| 2025-06-09 | Reinforcement Pre-Training | Qingxiu Dong et.al. | 2506.08007 | null |
| 2025-06-09 | Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator | Alberto Bazán-Guillén et.al. | 2506.07980 | null |
| 2025-06-09 | Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction | Junhong Shen et.al. | 2506.07976 | link |
| 2025-06-09 | A Generative Physics-Informed Reinforcement Learning-Based Approach for Construction of Representative Drive Cycle | Amirreza Yasami et.al. | 2506.07929 | null |
| 2025-06-09 | LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement | Dimitris Panagopoulos et.al. | 2506.07915 | null |
| 2025-06-09 | WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Jie Yang et.al. | 2506.07905 | link |
| 2025-06-09 | MiniCPM4: Ultra-Efficient LLMs on End Devices | MiniCPM Team et.al. | 2506.07900 | link |
| 2025-06-09 | Diffusion-RL for Scalable Resource Allocation for 6G Networks | Salar Nouri et.al. | 2506.07880 | null |
| 2025-06-09 | Versatile Loco-Manipulation through Flexible Interlimb Coordination | Xinghao Zhu et.al. | 2506.07876 | null |
| 2025-06-06 | Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens | Jihwan Jeong et.al. | 2506.06261 | null |
| 2025-06-06 | How to craft a deep reinforcement learning policy for wind farm flow control | Elie Kadoche et.al. | 2506.06204 | null |
| 2025-06-06 | Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization | Jonathan Yang et.al. | 2506.06196 | null |
| 2025-06-06 | A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization | Muhammed Ustaomeroglu et.al. | 2506.06179 | null |
| 2025-06-06 | Reusing Trajectories in Policy Gradients Enables Fast Convergence | Alessandro Montenegro et.al. | 2506.06178 | null |
| 2025-06-06 | Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach | James Ford et.al. | 2506.06175 | null |
| 2025-06-06 | Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models | Rihui Jin et.al. | 2506.06137 | null |
| 2025-06-06 | Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library | Weixun Wang et.al. | 2506.06122 | link |
| 2025-06-06 | On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems | Elim Kwan et.al. | 2506.06094 | null |
| 2025-06-06 | Reinforcing Code Generation: Improving Text-to-SQL with Execution-Based Learning | Atharv Kulkarni et.al. | 2506.06093 | null |
| 2025-06-05 | ContentV: Efficient Training of Video Generation Models with Limited Compute | Wenfeng Lin et.al. | 2506.05343 | null |
| 2025-06-05 | AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs | Lidong Lu et.al. | 2506.05328 | link |
| 2025-06-05 | Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay | Yifan Sun et.al. | 2506.05316 | null |
| 2025-06-05 | Estimation of Treatment Effects Under Nonstationarity via Truncated Difference-in-Q’s | Ramesh Johari et.al. | 2506.05308 | null |
| 2025-06-05 | A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search | Arnav Kumar Jain et.al. | 2506.05294 | link |
| 2025-06-06 | Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning | Violet Xiang et.al. | 2506.05256 | null |
| 2025-06-05 | Towards Language-Augmented Multi-Agent Deep Reinforcement Learning | Maxime Toquebiau et.al. | 2506.05236 | null |
| 2025-06-05 | Optimal-PhiBE: A PDE-based Model-free framework for Continuous-time Reinforcement Learning | Yuhua Zhu et.al. | 2506.05208 | null |
| 2025-06-05 | TreeRPO: Tree Relative Policy Optimization | Zhicheng Yang et.al. | 2506.05183 | link |
| 2025-06-05 | Fabrica: Dual-Arm Assembly of General Multi-Part Objects via Integrated Planning and Learning | Yunsheng Tian et.al. | 2506.05168 | null |
| 2025-06-04 | Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | Shuang Chen et.al. | 2506.04207 | link |
| 2025-06-04 | MACS: Multi-Agent Reinforcement Learning for Optimization of Crystal Structures | Elena Zamaraeva et.al. | 2506.04195 | null |
| 2025-06-04 | R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning | Qingfei Zhao et.al. | 2506.04185 | link |
| 2025-06-04 | Horizon Reduction Makes RL Scalable | Seohong Park et.al. | 2506.04168 | null |
| 2025-06-04 | SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL | Jiaheng Hu et.al. | 2506.04147 | null |
| 2025-06-04 | Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning | Muling Wu et.al. | 2506.04065 | null |
| 2025-06-04 | Crowd-SFT: Crowdsourcing for LLM Alignment | Alex Sotiropoulos et.al. | 2506.04063 | null |
| 2025-06-04 | Autonomous Vehicle Lateral Control Using Deep Reinforcement Learning with MPC-PID Demonstration | Chengdong Wu et.al. | 2506.04040 | null |
| 2025-06-04 | Interpretability by Design for Efficient Multi-Objective Reinforcement Learning | Qiyue Xia et.al. | 2506.04022 | null |
| 2025-06-04 | Boosting Open-Source LLMs for Program Repair via Reasoning Transfer and LLM-Guided Reinforcement Learning | Xunzhu Tang et.al. | 2506.03921 | null |
| 2025-06-03 | Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning | Yinjie Wang et.al. | 2506.03136 | link |
| 2025-06-03 | AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation | Prashanth Vijayaraghavan et.al. | 2506.03122 | null |
| 2025-06-03 | Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback | Xiaoying Zhang et.al. | 2506.03106 | link |
| 2025-06-03 | EgoVLM: Policy Optimization for Egocentric Video Understanding | Ashwin Vinod et.al. | 2506.03097 | link |
| 2025-06-03 | DPO Learning with LLMs-Judge Signal for Computer Use Agents | Man Luo et.al. | 2506.03095 | null |
| 2025-06-03 | Provable Reinforcement Learning from Human Feedback with an Unknown Link Function | Qining Zhang et.al. | 2506.03066 | null |
| 2025-06-03 | EDEN: Entorhinal Driven Egocentric Navigation Toward Robotic Deployment | Mikolaj Walczak et.al. | 2506.03046 | null |
| 2025-06-03 | Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective | Jintian Shao et.al. | 2506.03038 | null |
| 2025-06-03 | MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver | Yuepeng Zheng et.al. | 2506.02935 | null |
| 2025-06-03 | Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning | Yin Fang et.al. | 2506.02911 | link |
| 2025-05-30 | ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL | Yu Zhang et.al. | 2505.24875 | null |
| 2025-05-30 | ProxyThinker: Test-Time Guidance through Small Visual Reasoners | Zilin Xiao et.al. | 2505.24872 | null |
| 2025-05-30 | MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning | Yiqing Liang et.al. | 2505.24871 | null |
| 2025-05-30 | ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | Mingjie Liu et.al. | 2505.24864 | null |
| 2025-05-30 | MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning | Jingyan Shen et.al. | 2505.24846 | null |
| 2025-05-30 | AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models | Conor Heins et.al. | 2505.24784 | null |
| 2025-05-30 | Diffusion-Based Symbolic Regression | Zachary Bastiani et.al. | 2505.24776 | null |
| 2025-05-30 | REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards | Zafir Stojanovski et.al. | 2505.24760 | link |
| 2025-05-30 | Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning | Shelly Bensal et.al. | 2505.24726 | null |
| 2025-06-03 | Reinforcing Video Reasoning with Focused Thinking | Jisheng Dang et.al. | 2505.24718 | link |
| 2025-05-29 | ZeroGUI: Automating Online GUI Learning at Zero Human Cost | Chenyu Yang et.al. | 2505.23762 | link |
| 2025-05-29 | DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning | Ziyin Zhang et.al. | 2505.23754 | link |
| 2025-05-29 | PixelThink: Towards Efficient Chain-of-Pixel Reasoning | Song Wang et.al. | 2505.23727 | null |
| 2025-05-29 | ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering | Zexi Liu et.al. | 2505.23723 | link |
| 2025-05-29 | AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning | Lucas N. Alegre et.al. | 2505.23708 | null |
| 2025-05-29 | Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability | Ruida Wang et.al. | 2505.23703 | null |
| 2025-05-29 | Grounded Reinforcement Learning for Visual Reasoning | Gabriel Sarch et.al. | 2505.23678 | null |
| 2025-05-29 | Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models | Lang Cao et.al. | 2505.23667 | null |
| 2025-05-29 | AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction | Niklas Freymuth et.al. | 2505.23663 | link |
| 2025-05-29 | Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation | Hongxiang Zhang et.al. | 2505.23657 | null |
| 2025-05-28 | Maximizing Confidence Alone Improves Reasoning | Mihir Prabhudesai et.al. | 2505.22660 | null |
| 2025-05-28 | The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason | Ang Lv et.al. | 2505.22653 | null |
| 2025-05-28 | WebDancer: Towards Autonomous Information Seeking Agency | Jialong Wu et.al. | 2505.22648 | null |
| 2025-05-28 | FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control | Younggyo Seo et.al. | 2505.22642 | null |
| 2025-05-28 | SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning | Yu Zhang et.al. | 2505.22626 | null |
| 2025-05-28 | The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models | Ganqu Cui et.al. | 2505.22617 | null |
| 2025-05-28 | HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym | Ngoc La et.al. | 2505.22597 | null |
| 2025-05-28 | SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning | Jiaqi Huang et.al. | 2505.22596 | null |
| 2025-05-28 | Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs | Changhao Song et.al. | 2505.22548 | null |
| 2025-05-28 | Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation | Hongyi Zhou et.al. | 2505.22492 | null |
| 2025-05-27 | Reinforcing General Reasoning without Verifiers | Xiangxin Zhou et.al. | 2505.21493 | null |
| 2025-05-27 | Policy Optimized Text-to-Image Pipeline Design | Uri Gadot et.al. | 2505.21478 | null |
| 2025-05-27 | Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO | Muzhi Zhu et.al. | 2505.21457 | null |
| 2025-05-27 | Can Large Reasoning Models Self-Train? | Sheikh Shafayat et.al. | 2505.21444 | null |
| 2025-05-27 | A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment | Brett Bissey et.al. | 2505.21414 | null |
| 2025-05-27 | MRSD: Multi-Resolution Skill Discovery for HRL Agents | Shashank Sharma et.al. | 2505.21410 | null |
| 2025-05-27 | Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features | Zixuan Xie et.al. | 2505.21391 | null |
| 2025-05-27 | EgoWalk: A Multimodal Dataset for Robot Navigation in the Wild | Timur Akhtyamov et.al. | 2505.21282 | null |
| 2025-05-27 | Data-Driven Cellular Mobility Management via Bayesian Optimization and Reinforcement Learning | Mohamed Benzaghta et.al. | 2505.21249 | null |
| 2025-05-27 | Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies | Felix Chalumeau et.al. | 2505.21236 | null |
| 2025-05-26 | FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities | Jin Wang et.al. | 2505.20147 | null |
| 2025-05-26 | MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning | Yuanxin Zhuang et.al. | 2505.20131 | null |
| 2025-05-26 | Proxy-Free GFlowNet | Ruishuo Chen et.al. | 2505.20110 | null |
| 2025-05-26 | Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning | Ziyi Zhang et.al. | 2505.20107 | null |
| 2025-05-26 | Adaptive Deep Reasoning: Triggering Deep Thinking When Needed | Yunhao Wang et.al. | 2505.20101 | null |
| 2025-05-26 | SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale | Qi Li et.al. | 2505.20094 | null |
| 2025-05-26 | Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback | Mengdi Li et.al. | 2505.20075 | null |
| 2025-05-26 | Incentivizing Reasoning from Weak Supervision | Yige Yuan et.al. | 2505.20072 | null |
| 2025-05-26 | SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety | Geon-Hyeong Kim et.al. | 2505.20065 | null |
| 2025-05-26 | REARANK: Reasoning Re-ranking Agent via Reinforcement Learning | Le Zhang et.al. | 2505.20046 | null |
| 2025-05-23 | One RL to See Them All: Visual Triple Unified Reinforcement Learning | Yan Ma et.al. | 2505.18129 | null |
| 2025-05-23 | Reward Model Overoptimisation in Iterated RLHF | Lorenz Wolf et.al. | 2505.18126 | null |
| 2025-05-23 | ProgRM: Build Better GUI Agents with Progress Rewards | Danyang Zhang et.al. | 2505.18121 | null |
| 2025-05-23 | Bridging Supervised Learning and Reinforcement Learning in Math Reasoning | Huayu Chen et.al. | 2505.18116 | null |
| 2025-05-23 | Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL | Joey Hong et.al. | 2505.18098 | null |
| 2025-05-23 | Stable Reinforcement Learning for Efficient Reasoning | Muzhi Dai et.al. | 2505.18086 | null |
| 2025-05-23 | What Do You Need for Diverse Trajectory Stitching in Diffusion Planning? | Quentin Clark et.al. | 2505.18083 | null |
| 2025-05-23 | Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals | Jia-Nan Li et.al. | 2505.18071 | null |
| 2025-05-23 | Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective | Jintian Shao et.al. | 2505.17997 | null |
| 2025-05-23 | Outcome-based Reinforcement Learning to Predict the Future | Benjamin Turtel et.al. | 2505.17989 | null |
| 2025-05-22 | GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | Chengqi Duan et.al. | 2505.17022 | link |
| 2025-05-22 | SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward | Kaixuan Fan et.al. | 2505.17018 | link |
| 2025-05-22 | Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO | Chengzhuo Tong et.al. | 2505.17017 | link |
| 2025-05-22 | Interactive Post-Training for Vision-Language-Action Models | Shuhan Tan et.al. | 2505.17016 | null |
| 2025-05-22 | R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | Huatong Song et.al. | 2505.17005 | link |
| 2025-05-22 | $\text{R}^2\text{ec}$: Towards Large Recommender Models with Reasoning | Runyang You et.al. | 2505.16994 | link |
| 2025-05-22 | SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | Yaxin Du et.al. | 2505.16975 | link |
| 2025-05-22 | Risk-Averse Reinforcement Learning with Itakura-Saito Loss | Igor Udovichenko et.al. | 2505.16925 | null |
| 2025-05-22 | LARES: Latent Reasoning for Sequential Recommendation | Enze Liu et.al. | 2505.16865 | null |
| 2025-05-22 | Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | Wei Xiao et.al. | 2505.16856 | null |
| 2025-05-21 | GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents | Yuqi Zhou et.al. | 2505.15810 | link |
| 2025-05-21 | MMaDA: Multimodal Large Diffusion Language Models | Ling Yang et.al. | 2505.15809 | link |
| 2025-05-21 | STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | Zongzhao Li et.al. | 2505.15804 | null |
| 2025-05-21 | VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models | Yuchen Yan et.al. | 2505.15801 | null |
| 2025-05-21 | Reverse Engineering Human Preferences with Reinforcement Learning | Lisa Alazraki et.al. | 2505.15795 | null |
| 2025-05-21 | HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving | Zhiwen Chen et.al. | 2505.15793 | null |
| 2025-05-21 | VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL | Fengyuan Dai et.al. | 2505.15791 | null |
| 2025-05-21 | ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning | Changtai Zhu et.al. | 2505.15776 | null |
| 2025-05-21 | Improving planning and MBRL with temporally-extended actions | Palash Chatterjee et.al. | 2505.15754 | null |
| 2025-05-21 | UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning | Xiangyu Wang et.al. | 2505.15725 | null |
| 2025-05-20 | Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning | Haolei Xu et.al. | 2505.14684 | link |
| 2025-05-20 | Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning | Jiaer Xia et.al. | 2505.14677 | link |
| 2025-05-20 | Reward Reasoning Model | Jiaxin Guo et.al. | 2505.14674 | null |
| 2025-05-20 | General-Reasoner: Advancing LLM Reasoning Across All Domains | Xueguang Ma et.al. | 2505.14652 | link |
| 2025-05-20 | Think Only When You Need with Large Hybrid-Reasoning Models | Lingjie Jiang et.al. | 2505.14631 | null |
| 2025-05-20 | TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | Zhangchen Xu et.al. | 2505.14625 | link |
| 2025-05-20 | Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning | Wenbin Hu et.al. | 2505.14585 | null |
| 2025-05-20 | Performance Optimization of Energy-Harvesting Underlay Cognitive Radio Networks Using Reinforcement Learning | Deemah H. Tashman et.al. | 2505.14581 | null |
| 2025-05-20 | KIPPO: Koopman-Inspired Proximal Policy Optimization | Andrei Cozma et.al. | 2505.14566 | null |
| 2025-05-20 | Bellman operator convergence enhancements in reinforcement learning algorithms | David Krame Kadurha et.al. | 2505.14564 | null |
| 2025-05-19 | Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards | Xiaoyuan Liu et.al. | 2505.13445 | link |
| 2025-05-19 | Optimizing Anytime Reasoning via Budget Relative Policy Optimization | Penghui Qi et.al. | 2505.13438 | link |
| 2025-05-19 | KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture | R. James Cotton et.al. | 2505.13436 | null |
| 2025-05-19 | G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning | Liang Chen et.al. | 2505.13426 | link |
| 2025-05-20 | A Dataless Reinforcement Learning Approach to Rounding Hyperplane Optimization for Max-Cut | Gabriel Malikal et.al. | 2505.13405 | null |
| 2025-05-19 | Thinkless: LLM Learns When to Think | Gongfan Fang et.al. | 2505.13379 | link |
| 2025-05-19 | Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning | Irene Brugnara et.al. | 2505.13372 | null |
| 2025-05-19 | J4R: Learning to Judge with Equivalent Initial State Group Relative Preference Optimization | Austin Xu et.al. | 2505.13346 | null |
| 2025-05-19 | Neural-Enhanced Rate Adaptation and Computation Distribution for Emerging mmWave Multi-User 3D Video Streaming Systems | Babak Badnava et.al. | 2505.13337 | null |
| 2025-05-19 | CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning | Lei Sheng et.al. | 2505.13271 | link |
| 2025-05-16 | SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics | Lizhi Yang et.al. | 2505.11494 | null |
| 2025-05-16 | Improving Assembly Code Performance with Large Language Models via Reinforcement Learning | Anjiang Wei et.al. | 2505.11480 | null |
| 2025-05-16 | Automatic Reward Shaping from Confounded Offline Data | Mingxuan Li et.al. | 2505.11478 | null |
| 2025-05-16 | HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages | Zhilin Wang et.al. | 2505.11475 | null |
| 2025-05-16 | Disentangling Reasoning and Knowledge in Medical Large Language Models | Rahul Thapa et.al. | 2505.11462 | null |
| 2025-05-16 | Signal attenuation enables scalable decentralized multi-agent reinforcement learning over networks | Wesley A Suttle et.al. | 2505.11461 | null |
| 2025-05-16 | Visual Planning: Let’s Think Only with Images | Yi Xu et.al. | 2505.11409 | link |
| 2025-05-16 | Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Wenchuan Zhang et.al. | 2505.11404 | link |
| 2025-05-16 | Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space | Ali Rabiee et.al. | 2505.11366 | null |
| 2025-05-16 | Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics | Ardian Selmonaj et.al. | 2505.11311 | null |
| 2025-05-15 | Beyond ‘Aha!’: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Zhiyuan Hu et.al. | 2505.10554 | link |
| 2025-05-15 | Knowledge capture, adaptation and composition (KCAC): A framework for cross-task curriculum learning in robotic manipulation | Xinrui Wang et.al. | 2505.10522 | null |
| 2025-05-15 | Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning | Andrea Baisero et.al. | 2505.10484 | null |
| 2025-05-15 | Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps | Ningyuan Yang et.al. | 2505.10482 | null |
| 2025-05-15 | Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | Zemin Huang et.al. | 2505.10446 | null |
| 2025-05-15 | IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning | Dechen Gao et.al. | 2505.10442 | null |
| 2025-05-15 | Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs | Jingyao Wang et.al. | 2505.10425 | null |
| 2025-05-15 | Decomposed Inductive Procedure Learning: Learning Academic Tasks with Human-Like Data Efficiency | Daniel Weitekamp et.al. | 2505.10422 | null |
| 2025-05-15 | Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change | Jonathan Clifford Balloch et.al. | 2505.10330 | null |
| 2025-05-15 | J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning | Chenxi Whitehouse et.al. | 2505.10320 | null |
| 2025-05-14 | DataMIL: Selecting Data for Robot Imitation Learning with Datamodels | Shivin Dass et.al. | 2505.09603 | null |
| 2025-05-14 | Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware | Justin Yu et.al. | 2505.09601 | link |
| 2025-05-14 | VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation | Chaofan Zhang et.al. | 2505.09577 | null |
| 2025-05-14 | Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach | Shannon Lodoen et.al. | 2505.09576 | null |
| 2025-05-14 | Learning Long-Context Diffusion Policies via Past-Token Prediction | Marcel Torne et.al. | 2505.09561 | null |
| 2025-05-14 | WavReward: Spoken Dialogue Models With Generalist Reward Evaluators | Shengpeng Ji et.al. | 2505.09558 | link |
| 2025-05-14 | Distilling Realizable Students from Unrealizable Teachers | Yujin Kim et.al. | 2505.09546 | null |
| 2025-05-14 | Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data | Rui Miao et.al. | 2505.09496 | null |
| 2025-05-14 | Preserving Plasticity in Continual Learning with Adaptive Linearity Injection | Seyed Roozbeh Razavi Rohani et.al. | 2505.09486 | null |
| 2025-05-14 | Quantum state-agnostic work extraction (almost) without dissipation | Josep Lumbreras et.al. | 2505.09456 | null |
| 2025-05-13 | Generative Molecular Design with Steerable and Granular Synthesizability Control | Jeff Guo et.al. | 2505.08774 | null |
| 2025-05-13 | Preference Optimization for Combinatorial Optimization Problems | Mingjun Pan et.al. | 2505.08735 | null |
| 2025-05-13 | A Study of Data-driven Methods for Inventory Optimization | Lee Yeung Ping et.al. | 2505.08673 | null |
| 2025-05-13 | Credit Assignment and Efficient Exploration based on Influence Scope in Multi-agent Reinforcement Learning | Shuai Han et.al. | 2505.08630 | null |
| 2025-05-13 | Cost Function Estimation Using Inverse Reinforcement Learning with Minimal Observations | Sarmad Mehrdad et.al. | 2505.08619 | null |
| 2025-05-13 | OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | Zhaochen Su et.al. | 2505.08617 | link |
| 2025-05-13 | Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection | Ayush K. Rai et.al. | 2505.08561 | null |
| 2025-05-13 | Strategy-Augmented Planning for Large Language Models via Opponent Exploitation | Shuai Xu et.al. | 2505.08459 | null |
| 2025-05-13 | Zero-Shot Sim-to-Real Reinforcement Learning for Fruit Harvesting | Emlyn Williams et.al. | 2505.08458 | null |
| 2025-05-13 | Parameter Estimation using Reinforcement Learning Causal Curiosity: Limits and Challenges | Miguel Arana-Catania et.al. | 2505.08453 | null |
| 2025-05-12 | DanceGRPO: Unleashing GRPO on Visual Generation | Zeyue Xue et.al. | 2505.07818 | link |
| 2025-05-12 | A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values | Daniel Beechey et.al. | 2505.07797 | link |
| 2025-05-12 | MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering | Rushi Qiang et.al. | 2505.07782 | link |
| 2025-05-12 | Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Xinji Mai et.al. | 2505.07773 | link |
| 2025-05-12 | Guiding Data Collection via Factored Scaling Curves | Lihan Zha et.al. | 2505.07728 | link |
| 2025-05-12 | S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | Muzhi Dai et.al. | 2505.07686 | null |
| 2025-05-12 | A comparative study of Bitcoin and Ripple cryptocurrencies trading using Deep Reinforcement Learning algorithms | Dieu-Donne Fangnon et.al. | 2505.07660 | null |
| 2025-05-12 | MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining | Xiaomi LLM-Core Team et.al. | 2505.07608 | link |
| 2025-05-12 | Multi-Objective Reinforcement Learning for Energy-Efficient Industrial Control | Georg Schäfer et.al. | 2505.07607 | null |
| 2025-05-12 | Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent | Ziyang Huang et.al. | 2505.07596 | link |
| 2025-05-09 | VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction | Noah Frahm et.al. | 2505.06219 | null |
| 2025-05-09 | Let Humanoids Hike! Integrative Skill Development on Complex Trails | Kwan-Yee Lin et.al. | 2505.06218 | null |
| 2025-05-09 | Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach | Tim Schneider et.al. | 2505.06182 | null |
| 2025-05-09 | Interaction-Aware Parameter Privacy-Preserving Data Sharing in Coupled Systems via Particle Filter Reinforcement Learning | Haokun Yu et.al. | 2505.06122 | null |
| 2025-05-09 | TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations | Shuaiyi Huang et.al. | 2505.06079 | null |
| 2025-05-09 | Safe-EF: Error Feedback for Nonsmooth Constrained Optimization | Rustem Islamov et.al. | 2505.06053 | null |
| 2025-05-09 | Efficient Information Updates in Compute-First Networking via Reinforcement Learning with Joint AoI and VoI | Jianpeng Qi et.al. | 2505.06025 | null |
| 2025-05-09 | Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models | Lennart Stöpler et.al. | 2505.05970 | null |
| 2025-05-09 | Offline Multi-agent Reinforcement Learning via Score Decomposition | Dan Qiao et.al. | 2505.05968 | null |
| 2025-05-09 | Learning Power Control Protocol for In-Factory 6G Subnetworks | Uyoata E. Uyoata et.al. | 2505.05967 | null |
| 2025-05-08 | Flow-GRPO: Training Flow Matching Models via Online RL | Jie Liu et.al. | 2505.05470 | link |
| 2025-05-08 | RL-DAUNCE: Reinforcement Learning-Driven Data Assimilation with Uncertainty-Aware Constrained Ensembles | Pouria Behnoudfar et.al. | 2505.05452 | null |
| 2025-05-08 | Reasoning Models Don’t Always Say What They Think | Yanda Chen et.al. | 2505.05410 | null |
| 2025-05-08 | Repair Crew Routing for Infrastructure Network Restoration under Incomplete Information | Subhojit Biswas et.al. | 2505.05297 | null |
| 2025-05-08 | Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation | Zechu Li et.al. | 2505.05287 | null |
| 2025-05-08 | Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration | Andreas Kontogiannis et.al. | 2505.05262 | null |
| 2025-05-08 | High Altitude Platform-Based Caching and Multicasting for Rural Connectivity | Yongqiang Zhang et.al. | 2505.05251 | null |
| 2025-05-08 | Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation | Luca Marzari et.al. | 2505.05235 | null |
| 2025-05-08 | Adaptive Biased User Scheduling for Heterogeneous Wireless Federate Learning Network | Changxiang Wu et.al. | 2505.05231 | null |
| 2025-05-08 | Multi-Objective Reinforcement Learning for Adaptive Personalized Autonomous Driving | Hendrik Surmann et.al. | 2505.05223 | null |
| 2025-05-07 | EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning | Zhenghao Xing et.al. | 2505.04623 | link |
| 2025-05-07 | Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation | Abdulaziz Almuzairee et.al. | 2505.04619 | null |
| 2025-05-07 | ZeroSearch: Incentivize the Search Capability of LLMs without Searching | Hao Sun et.al. | 2505.04588 | link |
| 2025-05-07 | Active Sampling for MRI-based Sequential Decision Making | Yuning Du et.al. | 2505.04586 | link |
| 2025-05-07 | Implicitly Aligning Humans and Autonomous Agents through Shared Task Abstractions | Stéphane Aroca-Ouellette et.al. | 2505.04579 | null |
| 2025-05-07 | Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization | Wenjun Cao et.al. | 2505.04578 | null |
| 2025-05-07 | Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions | Shanyu Han et.al. | 2505.04553 | null |
| 2025-05-07 | A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance | Axel Friedrich Wolter et.al. | 2505.04494 | null |
| 2025-05-07 | RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation | Jing Hu et.al. | 2505.04424 | link |
| 2025-05-07 | A Heuristic-Integrated DRL Approach for Phase Optimization in Large-Scale RISs | Wei Wang et.al. | 2505.04401 | null |
| 2025-05-06 | AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control | Jialong Li et.al. | 2505.03738 | null |
| 2025-05-06 | Sustainable Smart Farm Networks: Enhancing Resilience and Efficiency with Decision Theory-Guided Deep Reinforcement Learning | Dian Chen et.al. | 2505.03721 | null |
| 2025-05-06 | Actor-Critics Can Achieve Optimal Sample Efficiency | Kevin Tan et.al. | 2505.03710 | null |
| 2025-05-06 | Policy Gradient Adaptive Control for the LQR: Indirect and Direct Approaches | Feiran Zhao et.al. | 2505.03706 | null |
| 2025-05-06 | Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation | Songchen Fu et.al. | 2505.03586 | null |
| 2025-05-06 | Ergodic Generative Flows | Leo Maxime Brunswic et.al. | 2505.03561 | null |
| 2025-05-06 | Multi-Agent Reinforcement Learning Scheduling to Support Low Latency in Teleoperated Driving | Giacomo Avanzi et.al. | 2505.03558 | null |
| 2025-05-06 | Small-Scale-Fading-Aware Resource Allocation in Wireless Federated Learning | Jiacheng Wang et.al. | 2505.03533 | null |
| 2025-05-06 | The Steganographic Potentials of Language Models | Artem Karpov et.al. | 2505.03439 | null |
| 2025-05-06 | Wasserstein Convergence of Score-based Generative Models under Semiconvexity and Discontinuous Gradients | Stefano Bruno et.al. | 2505.03432 | null |
| 2025-05-05 | R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | Yi-Fan Zhang et.al. | 2505.02835 | link |
| 2025-05-05 | TWIST: Teleoperated Whole-Body Imitation System | Yanjie Ze et.al. | 2505.02833 | null |
| 2025-05-05 | Knowing You Don’t Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing | Diji Yang et.al. | 2505.02811 | link |
| 2025-05-05 | Teaching the social media generation: rethinking learning without sacrificing quality | Sepinoud Azimi et.al. | 2505.02770 | null |
| 2025-05-05 | The use of Artificial Intelligence for Intervention and Assessment in Individuals with ASD | Aggeliki Sideraki et.al. | 2505.02747 | null |
| 2025-05-05 | Enhancing LLMs’ Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry | Junu Kim et.al. | 2505.02722 | link |
| 2025-05-05 | Graph Neural Network-Based Reinforcement Learning for Controlling Biological Networks: The GATTACA Framework | Andrzej Mizera et.al. | 2505.02712 | null |
| 2025-05-05 | Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models | Xiaobao Wu et.al. | 2505.02686 | link |
| 2025-05-05 | Online Phase Estimation of Human Oscillatory Motions using Deep Learning | Antonio Grotta et.al. | 2505.02668 | null |
| 2025-05-05 | A Survey on Progress in LLM Alignment from the Perspective of Reward Design | Miaomiao Ji et.al. | 2505.02666 | null |
| 2025-05-02 | FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research | Yan Miao et.al. | 2505.01383 | null |
| 2025-05-02 | Stabilizing Temporal Difference Learning via Implicit Stochastic Approximation | Hwanwoo Kim et.al. | 2505.01361 | null |
| 2025-05-02 | Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story | Vincenzo De Paola et.al. | 2505.01336 | null |
| 2025-05-02 | Integration of Multi-Mode Preference into Home Energy Management System Using Deep Reinforcement Learning | Mohammed Sumayli et.al. | 2505.01332 | null |
| 2025-05-02 | Exploring Equity of Climate Policies using Multi-Agent Multi-Objective Reinforcement Learning | Palok Biswas et.al. | 2505.01115 | null |
| 2025-05-02 | Multi-Objective Reinforcement Learning for Water Management | Zuzanna Osika et.al. | 2505.01094 | null |
| 2025-05-02 | Llama-Nemotron: Efficient Reasoning Models | Akhiad Bercovich et.al. | 2505.00949 | null |
| 2025-05-01 | Learning Neural Control Barrier Functions from Offline Data with Conservatism | Ihab Tabbara et.al. | 2505.00908 | null |
| 2025-05-01 | SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation | Quang P. M. Pham et.al. | 2505.00831 | null |
| 2025-05-01 | Constructing an Optimal Behavior Basis for the Option Keyboard | Lucas N. Alegre et.al. | 2505.00787 | null |
| 2025-05-01 | T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT | Dongzhi Jiang et.al. | 2505.00703 | link |
| 2025-05-01 | Multi-Constraint Safe Reinforcement Learning via Closed-form Solution for Log-Sum-Exp Approximation of Control Barrier Functions | Chenggang Wang et.al. | 2505.00671 | null |
| 2025-05-01 | Deep Reinforcement Learning for Urban Air Quality Management: Multi-Objective Optimization of Pollution Mitigation Booth Placement in Metropolitan Environments | Kirtan Rajesh et.al. | 2505.00668 | null |
| 2025-05-01 | Wasserstein Policy Optimization | David Pfau et.al. | 2505.00663 | null |
| 2025-05-01 | DeepCritic: Deliberate Critique with Large Language Models | Wenkai Yang et.al. | 2505.00662 | link |
| 2025-05-02 | 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models | Chong Zhang et.al. | 2505.00551 | null |
| 2025-05-01 | Directly Forecasting Belief for Reinforcement Learning with Delays | Qingyuan Wu et.al. | 2505.00546 | null |
| 2025-05-01 | Emergence of Roles in Robotic Teams with Model Sharing and Limited Communication | Ian O’Flynn et.al. | 2505.00540 | null |
| 2025-05-01 | Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks | Xinyu Wang et.al. | 2505.00530 | null |
| 2025-05-01 | DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation | Zixuan Chen et.al. | 2505.00527 | null |
| 2025-04-30 | DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | Z. Z. Ren et.al. | 2504.21801 | link |
| 2025-04-30 | Reconciling Discrete-Time Mixed Policies and Continuous-Time Relaxed Controls in Reinforcement Learning and Stochastic Control | Rene Carmona et.al. | 2504.21793 | null |
| 2025-04-30 | MAGNET: an open-source library for mesh agglomeration by Graph Neural Networks | Paola F. Antonietti et.al. | 2504.21780 | null |
| 2025-04-30 | LLM-based Interactive Imitation Learning for Robotic Manipulation | Jonas Werner et.al. | 2504.21769 | null |
| 2025-04-30 | LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning | Yiyang Shao et.al. | 2504.21738 | null |
| 2025-04-30 | Adaptive 3D UI Placement in Mixed Reality Using Deep Reinforcement Learning | Feiyu Lu et.al. | 2504.21731 | null |
| 2025-04-30 | MovementVR: An open-source tool for the study of motor control and learning in virtual reality | Cristina Rossi et.al. | 2504.21696 | null |
| 2025-04-30 | Designing Control Barrier Function via Probabilistic Enumeration for Safe Reinforcement Learning Navigation | Luca Marzari et.al. | 2504.21643 | null |
| 2025-04-30 | Multi-Goal Dexterous Hand Manipulation using Probabilistic Model-based Reinforcement Learning | Yingzhuo Jiang et.al. | 2504.21585 | null |
| 2025-04-30 | SimPRIVE: a Simulation framework for Physical Robot Interaction with Virtual Environments | Federico Nesti et.al. | 2504.21454 | null |
| 2025-04-29 | Toward Efficient Exploration by Large Language Model Agents | Dilip Arumugam et.al. | 2504.20997 | null |
| 2025-04-29 | XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search | Yiting Zhang et.al. | 2504.20969 | null |
| 2025-04-29 | Improvements of Dark Experience Replay and Reservoir Sampling towards Better Balance between Consolidation and Plasticity | Taisuke Kobayashi et.al. | 2504.20932 | null |
| 2025-04-29 | ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Ziqing Fan et.al. | 2504.20930 | link |
| 2025-04-29 | Exploiting inter-agent coupling information for efficient reinforcement learning of cooperative LQR | Shahbaz P Qadri Syed et.al. | 2504.20927 | null |
| 2025-04-29 | A Domain-Agnostic Scalable AI Safety Ensuring Framework | Beomjun Kim et.al. | 2504.20924 | null |
| 2025-04-29 | Reinforcement Learning for LLM Reasoning Under Memory Constraints | Alan Lee et.al. | 2504.20834 | null |
| 2025-04-29 | A Teacher-Student MPC-PPO Coupled Reinforcement Learning Framework for Winter Temperature Control of Solar Greenhouses in Northern China | Jingxin Yu et.al. | 2504.20815 | null |
| 2025-04-29 | SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings | Florian Vahl et.al. | 2504.20808 | null |
| 2025-04-29 | Q-Fusion: Diffusing Quantum Circuits | Collin Beaudoin et.al. | 2504.20794 | null |
| 2025-04-28 | SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | Wufei Ma et.al. | 2504.20024 | null |
| 2025-04-28 | Socially-Aware Autonomous Driving: Inferring Yielding Intentions for Safer Interactions | Jing Wang et.al. | 2504.20004 | null |
| 2025-04-28 | Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Adam Younsi et.al. | 2504.19981 | null |
| 2025-04-28 | Mesh-Learner: Texturing Mesh with Spherical Harmonics | Yunfei Wan et.al. | 2504.19938 | null |
| 2025-04-28 | Automated decision-making for dynamic task assignment at scale | Riccardo Lo Bianco et.al. | 2504.19933 | null |
| 2025-04-28 | GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets | Mingqian He et.al. | 2504.19898 | null |
| 2025-04-28 | Optimizing the Charging of Open Quantum Batteries using Long Short-Term Memory-Driven Reinforcement Learning | Shadab Zakavati et.al. | 2504.19840 | null |
| 2025-04-28 | LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects | Guangyi Liu et.al. | 2504.19838 | link |
| 2025-04-28 | Reinforcement Learning-Based Heterogeneous Multi-Task Optimization in Semantic Broadcast Communications | Zhilin Lu et.al. | 2504.19806 | null |
| 2025-04-28 | Model-based controller assisted domain randomization in deep reinforcement learning: application to nonlinear powertrain control | Heisei Yonezawa et.al. | 2504.19715 | null |
| 2025-04-25 | Generalization Capability for Imitation Learning | Yixiao Wang et.al. | 2504.18538 | null |
| 2025-04-25 | Intelligent Attacks and Defense Methods in Federated Learning-enabled Energy-Efficient Wireless Networks | Han Zhang et.al. | 2504.18519 | null |
| 2025-04-25 | Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation | Peiyuan Jing et.al. | 2504.18453 | null |
| 2025-04-25 | Pushing the boundary on Natural Language Inference | Pablo Miralles-González et.al. | 2504.18376 | null |
| 2025-04-25 | Explainable AI for UAV Mobility Management: A Deep Q-Network Approach for Handover Minimization | Irshad A. Meer et.al. | 2504.18371 | null |
| 2025-04-25 | Deep Reinforcement Learning Based Navigation with Macro Actions and Topological Maps | Simon Hakenes et.al. | 2504.18300 | null |
| 2025-04-25 | Depth-Constrained ASV Navigation with Deep RL and Limited Sensing | Amirhossein Zhalehmehrabi et.al. | 2504.18253 | null |
| 2025-04-25 | Aligning Language Models for Icelandic Legal Text Summarization | Þórir Hrafn Harðarson et.al. | 2504.18180 | null |
| 2025-04-25 | Offline Learning of Controllable Diverse Behaviors | Mathieu Petitbois et.al. | 2504.18160 | null |
| 2025-04-25 | Learning from Less: SINDy Surrogates in RL | Aniket Dixit et.al. | 2504.18113 | null |
| 2025-04-24 | Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control | Haochen Wang et.al. | 2504.17771 | null |
| 2025-04-24 | Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence | Edward Collins et.al. | 2504.17703 | null |
| 2025-04-24 | Applied Sheaf Theory For Multi-agent Artificial Intelligence (Reinforcement Learning) Systems: A Prospectus | Eric Schmid et.al. | 2504.17700 | null |
| 2025-04-24 | SAPO-RL: Sequential Actuator Placement Optimization for Fuselage Assembly via Reinforcement Learning | Peng Ye et.al. | 2504.17603 | null |
| 2025-04-24 | Mitigating xApp conflicts for efficient network slicing in 6G O-RAN: a graph convolutional-based attention network approach | Sihem Bakri et.al. | 2504.17590 | null |
| 2025-04-24 | Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization | Hongshu Guo et.al. | 2504.17578 | null |
| 2025-04-24 | Cooperative Task Offloading through Asynchronous Deep Reinforcement Learning in Mobile Edge Computing for Future Networks | Yuelin Liu et.al. | 2504.17526 | null |
| 2025-04-24 | Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning | Mingqi Yuan et.al. | 2504.17490 | null |
| 2025-04-24 | Comprehend, Divide, and Conquer: Feature Subspace Exploration via Multi-Agent Hierarchical Reinforcement Learning | Weiliang Zhang et.al. | 2504.17356 | null |
| 2025-04-24 | Collaborative Multi-Agent Reinforcement Learning for Automated Feature Transformation with Graph-Driven Path Optimization | Xiaohan Huang et.al. | 2504.17355 | null |
| 2025-04-23 | Latent Diffusion Planning for Imitation Learning | Amber Xie et.al. | 2504.16925 | null |
| 2025-04-23 | Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms | Hsin-Jung Yang et.al. | 2504.16916 | null |
| 2025-04-23 | Hybrid Reinforcement Learning and Model Predictive Control for Adaptive Control of Hydrogen-Diesel Dual-Fuel Combustion | Julian Bedei et.al. | 2504.16875 | null |
| 2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et.al. | 2504.16855 | null |
| 2025-04-23 | SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward | Nicolas Jonason et.al. | 2504.16839 | null |
| 2025-04-23 | MEC Task Offloading in AIoT: A User-Centric DRL Model Splitting Inference Scheme | Weixi Li et.al. | 2504.16729 | null |
| 2025-04-23 | PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation | Wenxuan Li et.al. | 2504.16693 | null |
| 2025-04-23 | Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator | Chenhao Li et.al. | 2504.16680 | null |
| 2025-04-23 | Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning | Chris et.al. | 2504.16656 | link |
| 2025-04-23 | Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models | Fredy Pokou et.al. | 2504.16635 | null |
| 2025-04-22 | TTRL: Test-Time Reinforcement Learning | Yuxin Zuo et.al. | 2504.16084 | link |
| 2025-04-22 | LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities | Thomas Schmied et.al. | 2504.16078 | null |
| 2025-04-22 | Reinforcement Learning and Metaheuristics for Feynman Integral Reduction | Mao Zeng et.al. | 2504.16045 | null |
| 2025-04-22 | The Formation of Production Networks: How Supply Chains Arise from Simple Learning with Minimal Information | Tuong Manh Vu et.al. | 2504.16010 | null |
| 2025-04-22 | Making Neural Networks More Suitable for Approximate Clifford+T Circuit Synthesis | Mathias Weiden et.al. | 2504.15990 | null |
| 2025-04-22 | Neuroadaptive Haptics: Comparing Reinforcement Learning from Explicit Ratings and Neural Signals for Adaptive XR Systems | Lukas Gehrke et.al. | 2504.15984 | null |
| 2025-04-22 | Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning | Wang Lin et.al. | 2504.15932 | null |
| 2025-04-22 | StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation | Yinmin Zhong et.al. | 2504.15930 | null |
| 2025-04-22 | New Recipe for Semi-supervised Community Detection: Clique Annealing under Crystallization Kinetics | Ling Cheng et.al. | 2504.15927 | null |
| 2025-04-22 | GraphEdge: Dynamic Graph Partition and Task Scheduling for GNNs Computing in Edge Network | Wenjing Xiao et.al. | 2504.15905 | null |
| 2025-04-21 | VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | Weiye Xu et.al. | 2504.15279 | null |
| 2025-04-21 | Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Jie Cheng et.al. | 2504.15275 | link |
| 2025-04-21 | FlowReasoner: Reinforcing Query-Level Meta-Agents | Hongcheng Gao et.al. | 2504.15257 | link |
| 2025-04-21 | DRAGON: Distributional Rewards Optimize Diffusion Generative Models | Yatong Bai et.al. | 2504.15217 | null |
| 2025-04-21 | Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs | Marina Sakharova et.al. | 2504.15210 | null |
| 2025-04-21 | Beyond Binary Opinions: A Deep Reinforcement Learning-Based Approach to Uncertainty-Aware Competitive Influence Maximization | Qi Zhang et.al. | 2504.15131 | null |
| 2025-04-21 | A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment | Kangyao Huang et.al. | 2504.15129 | null |
| 2025-04-21 | Fast-Slow Co-advancing Optimizer: Toward Harmonious Adversarial Training of GAN | Lin Wang et.al. | 2504.15099 | null |
| 2025-04-21 | Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL | Simone Papicchio et.al. | 2504.15077 | null |
| 2025-04-21 | Energy-Efficient UAV-Mounted RIS for IoT: A Hybrid Energy Harvesting and DRL Approach | Mahmoud M. Salim et.al. | 2504.15043 | null |
| 2025-04-18 | Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | Yang Yue et.al. | 2504.13837 | null |
| 2025-04-18 | Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Yixuan Even Xu et.al. | 2504.13818 | null |
| 2025-04-18 | DiffOG: Differentiable Policy Trajectory Optimization with Generalizability | Zhengtong Xu et.al. | 2504.13807 | null |
| 2025-04-18 | Imitation Learning with Precisely Labeled Human Demonstrations | Yilong Song et.al. | 2504.13803 | null |
| 2025-04-18 | Bake Two Cakes with One Oven: RL for Defusing Popularity Bias and Cold-start in Third-Party Library Recommendations | Minh Hoang Vuong et.al. | 2504.13772 | null |
| 2025-04-18 | A Reinforcement Learning Method to Factual and Counterfactual Explanations for Session-based Recommendation | Han Zhou et.al. | 2504.13632 | null |
| 2025-04-18 | Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning | Rohan P. Singh et.al. | 2504.13619 | null |
| 2025-04-18 | On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting | Niklas Funk et.al. | 2504.13618 | null |
| 2025-04-18 | Compile Scene Graphs with Reinforcement Learning | Zuyao Chen et.al. | 2504.13617 | null |
| 2025-04-18 | Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling | Zihao Feng et.al. | 2504.13592 | null |
| 2025-04-17 | Energy-Based Reward Models for Robust Language Model Alignment | Anamika Lochab et.al. | 2504.13134 | null |
| 2025-04-17 | LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard | Varun Rao et.al. | 2504.13125 | null |
| 2025-04-17 | SkyReels-V2: Infinite-length Film Generative Model | Guibin Chen et.al. | 2504.13074 | null |
| 2025-04-17 | NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation | Xiangyan Liu et.al. | 2504.13055 | null |
| 2025-04-17 | InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning | Zheng Wang et.al. | 2504.13032 | null |
| 2025-04-17 | QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning? | Zhouyang Jiang et.al. | 2504.12961 | null |
| 2025-04-17 | RL-PINNs: Reinforcement Learning-Driven Adaptive Sampling for Efficient Training of PINNs | Zhenao Song et.al. | 2504.12949 | null |
| 2025-04-17 | Image-Editing Specialists: An RLAIF Approach for Diffusion Models | Elior Benarous et.al. | 2504.12833 | link |
| 2025-04-17 | Multi-Agent Reinforcement Learning Simulation for Environmental Policy Synthesis | James Rudd-Jones et.al. | 2504.12777 | null |
| 2025-04-17 | GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks | Hao Xu et.al. | 2504.12764 | null |
| 2025-04-16 | Adapting a World Model for Trajectory Following in a 3D Game | Marko Tot et.al. | 2504.12299 | null |
| 2025-04-16 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | Siyan Zhao et.al. | 2504.12216 | null |
| 2025-04-16 | Reasoning-Based AI for Startup Evaluation (R.A.I.S.E.): A Memory-Augmented, Multi-Step Decision Framework | Jack Preuveneers et.al. | 2504.12090 | null |
| 2025-04-16 | pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild | Jonas Myhre Schiøtt et.al. | 2504.12045 | null |
| 2025-04-16 | Evolutionary Reinforcement Learning for Interpretable Decision-Making in Supply Chain Management | Stefano Genetti et.al. | 2504.12023 | null |
| 2025-04-16 | Control of Rayleigh-Bénard Convection: Effectiveness of Reinforcement Learning in the Turbulent Regime | Thorben Markmann et.al. | 2504.12000 | null |
| 2025-04-16 | A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs | Kihyuk Hong et.al. | 2504.11997 | null |
| 2025-04-16 | Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions | Yifei Dong et.al. | 2504.11967 | null |
| 2025-04-16 | R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors | Haoyang Wang et.al. | 2504.11946 | null |
| 2025-04-16 | VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning | Xuyang Chen et.al. | 2504.11944 | null |
| 2025-04-15 | DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | Zhiwei He et.al. | 2504.11456 | null |
| 2025-04-15 | A Clean Slate for Offline Reinforcement Learning | Matthew Thomas Jackson et.al. | 2504.11453 | null |
| 2025-04-15 | Embodied World Models Emerge from Navigational Task in Open-Ended Environments | Li Jin et.al. | 2504.11419 | null |
| 2025-04-15 | Measures of Variability for Risk-averse Policy Gradient | Yudong Luo et.al. | 2504.11412 | null |
| 2025-04-15 | Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning | Haiming Wang et.al. | 2504.11354 | null |
| 2025-04-15 | A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | Wei Xiong et.al. | 2504.11343 | null |
| 2025-04-15 | Multi-Agent Reinforcement Learning for Greenhouse Gas Offset Credit Markets | Liam Welsh et.al. | 2504.11258 | null |
| 2025-04-15 | A Rollout-Based Algorithm and Reward Function for Efficient Resource Allocation in Business Processes | Jeroen Middelhuis et.al. | 2504.11250 | null |
| 2025-04-15 | Next-Future: Sample-Efficient Policy Learning for Robotic-Arm Tasks | Fikrican Özgür et.al. | 2504.11247 | null |
| 2025-04-15 | Revealing Covert Attention by Analyzing Human and Reinforcement Learning Agent Gameplay | Henrik Krauss et.al. | 2504.11118 | null |
| 2025-04-14 | Weight Ensembling Improves Reasoning in Language Models | Xingyu Dang et.al. | 2504.10478 | null |
| 2025-04-14 | Co-optimizing Physical Reconfiguration Parameters and Controllers for an Origami-inspired Reconfigurable Manipulator | Zhe Chen et.al. | 2504.10474 | null |
| 2025-04-14 | GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | Xiaobo Xia et.al. | 2504.10458 | null |
| 2025-04-14 | The Communication and Computation Trade-off in Wireless Semantic Communications | Xuyang Chen et.al. | 2504.10357 | null |
| 2025-04-14 | Heimdall: test-time scaling on the generative verification | Wenlei Shi et.al. | 2504.10337 | null |
| 2025-04-14 | Flying Hand: End-Effector-Centric Framework for Versatile Aerial Manipulation Teleoperation and Policy Learning | Guanqi He et.al. | 2504.10334 | null |
| 2025-04-14 | InstructEngine: Instruction-driven Text-to-Image Alignment | Xingyu Lu et.al. | 2504.10329 | null |
| 2025-04-14 | Vision based driving agent for race car simulation environments | Gergely Bári et.al. | 2504.10266 | null |
| 2025-04-14 | Adaptive Sensor Steering Strategy Using Deep Reinforcement Learning for Dynamic Data Acquisition in Digital Twins | Collins O. Ogbodo et.al. | 2504.10248 | null |
| 2025-04-14 | Deep Reasoning Translation via Reinforcement Learning | Jiaan Wang et.al. | 2504.10187 | null |
| 2025-04-11 | Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing | Vinal Asodia et.al. | 2504.08704 | null |
| 2025-04-11 | Pobogot – An Open-Hardware Open-Source Low Cost Robot for Swarm Robotics | Alessia Loi et.al. | 2504.08686 | null |
| 2025-04-11 | Reinforcement Learning-Driven Plant-Wide Refinery Planning Using Model Decomposition | Zhouchang Li et.al. | 2504.08642 | null |
| 2025-04-11 | Neural Fidelity Calibration for Informative Sim-to-Real Adaptation | Youwei Yu et.al. | 2504.08604 | null |
| 2025-04-11 | SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning | Peixian Ma et.al. | 2504.08600 | link |
| 2025-04-11 | Playpen: An Environment for Exploring Learning Through Conversational Interaction | Nicola Horst et.al. | 2504.08590 | null |
| 2025-04-11 | Slicing the Gaussian Mixture Wasserstein Distance | Moritz Piening et.al. | 2504.08544 | null |
| 2025-04-11 | Diffusion Models for Robotic Manipulation: A Survey | Rosa Wolf et.al. | 2504.08438 | null |
| 2025-04-11 | Belief States for Cooperative Multi-Agent Reinforcement Learning under Partial Observability | Paul J. Pritz et.al. | 2504.08417 | null |
| 2025-04-11 | Scalable Conflict-free Decision Making with Photons | Kohei Konaka et.al. | 2504.08331 | null |
| 2025-04-10 | Perception-R1: Pioneering Perception Policy with Reinforcement Learning | En Yu et.al. | 2504.07954 | link |
| 2025-04-10 | Echo: An Open-Source, Low-Cost Teleoperation System with Force Feedback for Dataset Collection in Robot Learning | Artem Bazhenov et.al. | 2504.07939 | null |
| 2025-04-10 | Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining | Rosie Zhao et.al. | 2504.07912 | link |
| 2025-04-10 | Fast Adaptation with Behavioral Foundation Models | Harshit Sikchi et.al. | 2504.07896 | null |
| 2025-04-10 | 2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization | Mengyang Li et.al. | 2504.07856 | null |
| 2025-04-10 | Genetic Programming with Reinforcement Learning Trained Transformer for Real-World Dynamic Scheduling Problems | Xian Chen et.al. | 2504.07779 | null |
| 2025-04-10 | Harnessing Equivariance: Modeling Turbulence with Graph Neural Networks | Marius Kurz et.al. | 2504.07741 | null |
| 2025-04-10 | Relaxing the Markov Requirements on Reinforcement Learning Under Weak Partial Ignorability | MaryLena Bleile et.al. | 2504.07722 | null |
| 2025-04-10 | Sim-to-Real Transfer in Reinforcement Learning for Maneuver Control of a Variable-Pitch MAV | Zhikun Wang et.al. | 2504.07694 | null |
| 2025-04-10 | VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Haozhan Shen et.al. | 2504.07615 | link |
| 2025-04-09 | Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning | Chenjie Hao et.al. | 2504.07095 | link |
| 2025-04-09 | AssistanceZero: Scalably Solving Assistance Games | Cassidy Laidlaw et.al. | 2504.07091 | link |
| 2025-04-09 | A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility | Andreas Hochlehnert et.al. | 2504.07086 | link |
| 2025-04-09 | To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning | Tian Qin et.al. | 2504.07052 | null |
| 2025-04-09 | Free Random Projection for In-Context Reinforcement Learning | Tomohiro Hayase et.al. | 2504.06983 | null |
| 2025-04-09 | VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | Xinhao Li et.al. | 2504.06958 | link |
| 2025-04-09 | Regret Bounds for Robust Online Decision Making | Alexander Appel et.al. | 2504.06820 | null |
| 2025-04-09 | Interactive Expressive Motion Generation Using Dynamic Movement Primitives | Till Hielscher et.al. | 2504.06735 | null |
| 2025-04-09 | Learning global control of underactuated systems with Model-Based Reinforcement Learning | Niccolò Turcato et.al. | 2504.06721 | null |
| 2025-04-09 | SDHN: Skewness-Driven Hypergraph Networks for Enhanced Localized Multi-Robot Coordination | Delin Zhao et.al. | 2504.06684 | null |
| 2025-04-08 | ViTaMIn: Learning Contact-Rich Tasks Through Robot-Free Visuo-Tactile Manipulation Interface | Fangchen Liu et.al. | 2504.06156 | null |
| 2025-04-08 | Adversarial Training of Reward Models | Alexander Bukharin et.al. | 2504.06141 | null |
| 2025-04-08 | A Multimedia Analytics Model for the Foundation Model Era | Marcel Worring et.al. | 2504.06138 | null |
| 2025-04-08 | Accelerating Vehicle Routing via AI-Initialized Genetic Algorithms | Ido Greenberg et.al. | 2504.06126 | null |
| 2025-04-08 | Robo-taxi Fleet Coordination at Scale via Reinforcement Learning | Luigi Tresca et.al. | 2504.06125 | link |
| 2025-04-09 | Leanabell-Prover: Posttraining Scaling in Formal Reasoning | Jingyuan Zhang et.al. | 2504.06122 | link |
| 2025-04-08 | Trust-Region Twisted Policy Improvement | Joery A. de Vries et.al. | 2504.06048 | null |
| 2025-04-08 | Information-Theoretic Reward Decomposition for Generalizable RLHF | Liyuan Mao et.al. | 2504.06020 | null |
| 2025-04-08 | Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models | J. S. van Hulst et.al. | 2504.05978 | null |
| 2025-04-08 | AEGIS: Human Attention-based Explainable Guidance for Intelligent Vehicle Systems | Zhuoli Zhuang et.al. | 2504.05950 | null |
| 2025-04-07 | RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception | Hui Zhang et.al. | 2504.05287 | null |
| 2025-04-07 | Concise Reasoning via Reinforcement Learning | Mehdi Fatemi et.al. | 2504.05185 | link |
| 2025-04-07 | Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval | Kidist Amde Mekonnen et.al. | 2504.05181 | link |
| 2025-04-07 | RLBayes: a Bayesian Network Structure Learning Algorithm via Reinforcement Learning-Based Search Strategy | Mingcan Wang et.al. | 2504.05167 | null |
| 2025-04-07 | A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks | Leonardo Kanashiro Felizardo et.al. | 2504.05150 | link |
| 2025-04-08 | VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks | Yu Yue et.al. | 2504.05118 | null |
| 2025-04-07 | Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning | Anja Surina et.al. | 2504.05108 | null |
| 2025-04-08 | Attention-Augmented Inverse Reinforcement Learning with Graph Convolutions for Multi-Agent Task Allocation | Huilin Yin et.al. | 2504.05045 | null |
| 2025-04-07 | Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning | Bibek Poudel et.al. | 2504.05018 | null |
| 2025-04-07 | Wavelet Policy: Imitation Policy Learning in Frequency Domain with Wavelet Transforms | Changchuan Yang et.al. | 2504.04991 | link |
| 2025-04-04 | Align to Structure: Aligning Large Language Models with Structural Information | Zae Myung Kim et.al. | 2504.03622 | null |
| 2025-04-04 | Optimization of a Triangular Delaunay Mesh Generator using Reinforcement Learning | Will Thacher et.al. | 2504.03610 | null |
| 2025-04-04 | Dexterous Manipulation through Imitation Learning: A Survey | Shan An et.al. | 2504.03515 | null |
| 2025-04-04 | Learning Dual-Arm Coordination for Grasping Large Flat Objects | Yongliang Wang et.al. | 2504.03500 | null |
| 2025-04-04 | Optimizing Quantum Circuits via ZX Diagrams using Reinforcement Learning and Graph Neural Networks | Alexander Mattick et.al. | 2504.03429 | null |
| 2025-04-04 | DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models | Sathish Kumar et.al. | 2504.03423 | null |
| 2025-04-04 | Autonomous state-space segmentation for Deep-RL sparse reward scenarios | Gianluca Maselli et.al. | 2504.03420 | null |
| 2025-04-04 | Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning | Sanghwan Bae et.al. | 2504.03380 | null |
| 2025-04-04 | Verification of Autonomous Neural Car Control with KeYmaera X | Enguerrand Prebet et.al. | 2504.03272 | null |
| 2025-04-04 | Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward | Yanming Wan et.al. | 2504.03206 | null |
| 2025-04-03 | Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets | Chuning Zhu et.al. | 2504.02792 | link |
| 2025-04-03 | A Numerically Efficient Method to Enhance Model Predictive Control Performance with a Reinforcement Learning Policy | Andrea Ghezzi et.al. | 2504.02710 | null |
| 2025-04-03 | Handover and SINR-Aware Path Optimization in 5G-UAV mmWave Communication using DRL | Achilles Kiwanuka Machumilane et.al. | 2504.02688 | null |
| 2025-04-03 | Integrating Human Knowledge Through Action Masking in Reinforcement Learning for Operations Research | Mirko Stappert et.al. | 2504.02662 | null |
| 2025-04-03 | SymDQN: Symbolic Knowledge and Reasoning in Neural Network-based Reinforcement Learning | Ivo Amador et.al. | 2504.02654 | null |
| 2025-04-03 | Solving the Paint Shop Problem with Flexible Management of Multi-Lane Buffers Using Reinforcement Learning and Action Masking | Mirko Stappert et.al. | 2504.02644 | null |
| 2025-04-03 | Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving | Daoguang Zan et.al. | 2504.02605 | link |
| 2025-04-03 | Regulating Spatial Fairness in a Tripartite Micromobility Sharing System via Reinforcement Learning | Matteo Cederle et.al. | 2504.02597 | null |
| 2025-04-03 | LexPam: Legal Procedure Awareness-Guided Mathematical Reasoning | Kepu Zhang et.al. | 2504.02590 | null |
| 2025-04-04 | Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme | Yan Ma et.al. | 2504.02587 | link |
| 2025-04-02 | OpenCodeReasoning: Advancing Data Distillation for Competitive Coding | Wasi Uddin Ahmad et.al. | 2504.01943 | null |
| 2025-04-02 | Overcoming Deceptiveness in Fitness Optimization with Unsupervised Quality-Diversity | Lisa Coiffard et.al. | 2504.01915 | null |
| 2025-04-02 | GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | Yanzhou Su et.al. | 2504.01886 | link |
| 2025-04-02 | Interpreting Emergent Planning in Model-Free Reinforcement Learning | Thomas Bush et.al. | 2504.01871 | null |
| 2025-04-02 | Learning with Imperfect Models: When Multi-step Prediction Mitigates Compounding Error | Anne Somalwar et.al. | 2504.01766 | null |
| 2025-04-03 | Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning | Ke Jiang et.al. | 2504.01719 | null |
| 2025-04-02 | ToM-RL: Reinforcement Learning Unlocks Theory of Mind in Small LLMs | Yi-Long Lu et.al. | 2504.01698 | null |
| 2025-04-02 | 8-DoFs Cable Driven Parallel Robots for Bimanual Teleportation | Hung Hon Cheng et.al. | 2504.01554 | null |
| 2025-04-02 | A Robust Model-Based Approach for Continuous-Time Policy Evaluation with Unknown Lévy Process Dynamics | Qihao Ye et.al. | 2504.01482 | null |
| 2025-04-02 | Probabilistic Curriculum Learning for Goal-Based Reinforcement Learning | Llewyn Salt et.al. | 2504.01459 | null |
| 2025-03-31 | Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Yi Chen et.al. | 2503.24376 | link |
| 2025-03-31 | Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning | Yubo Zhang et.al. | 2503.24296 | null |
| 2025-03-31 | Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model | Jingcheng Hu et.al. | 2503.24290 | link |
| 2025-03-31 | Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning | Jiacheng Lin et.al. | 2503.24289 | link |
| 2025-03-31 | Moving Edge for On-Demand Edge Computing: An Uncertainty-aware Approach | Fangtong Zhou et.al. | 2503.24214 | null |
| 2025-03-31 | Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantees via Constrained Mean-Field Reinforcement Learning | Matej Jusup et.al. | 2503.24183 | link |
| 2025-03-31 | Learning a Canonical Basis of Human Preferences from Binary Ratings | Kailas Vodrahalli et.al. | 2503.24150 | null |
| 2025-03-31 | Reinforcement Learning for Safe Autonomous Two Device Navigation of Cerebral Vessels in Mechanical Thrombectomy | Harry Robertshaw et.al. | 2503.24140 | null |
| 2025-03-31 | Level the Level: Balancing Game Levels for Asymmetric Player Archetypes With Reinforcement Learning | Florian Rupp et.al. | 2503.24099 | null |
| 2025-03-31 | HACTS: a Human-As-Copilot Teleoperation System for Robot Learning | Zhiyuan Xu et.al. | 2503.24070 | null |
| 2025-03-28 | Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Weiqi Li et.al. | 2503.22679 | link |
| 2025-03-28 | Empirical Analysis of Sim-and-Real Cotraining Of Diffusion Policies For Planar Pushing from Pixels | Adam Wei et.al. | 2503.22634 | null |
| 2025-03-28 | Reinforcement Learning for Machine Learning Model Deployment: Evaluating Multi-Armed Bandits in ML Ops Environments | S. Aaron McClendon et.al. | 2503.22595 | null |
| 2025-03-28 | On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations | Rajdeep Singh Hundal et.al. | 2503.22575 | null |
| 2025-03-28 | Robust Offline Imitation Learning Through State-level Trajectory Stitching | Shuze Wang et.al. | 2503.22524 | null |
| 2025-03-28 | Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments | Luke Rowe et.al. | 2503.22496 | null |
| 2025-03-28 | Probabilistic Uncertain Reward Model: A Natural Generalization of Bradley-Terry Reward Model | Wangtao Sun et.al. | 2503.22480 | null |
| 2025-03-28 | Control of Humanoid Robots with Parallel Mechanisms using Kinematic Actuation Models | Victor Lutz et.al. | 2503.22459 | null |
| 2025-03-28 | Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning | Abdullah Vanlioglu et.al. | 2503.22456 | null |
| 2025-03-28 | Reinforcement learning for efficient and robust multi-setpoint and multi-trajectory tracking in bioprocesses | Sebastián Espinel-Ríos et.al. | 2503.22409 | null |
| 2025-03-27 | Video-R1: Reinforcing Video Reasoning in MLLMs | Kaituo Feng et.al. | 2503.21776 | link |
| 2025-03-27 | ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation | Zhicheng Lee et.al. | 2503.21729 | link |
| 2025-03-27 | Collab: Controlled Decoding using Mixture of Agents for LLM Alignment | Souradip Chakraborty et.al. | 2503.21720 | null |
| 2025-03-27 | Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks | Wenqi Zhang et.al. | 2503.21696 | link |
| 2025-03-27 | LLM-Gomoku: A Large Language Model-Based System for Strategic Gomoku with Self-Play and Reinforcement Learning | Hui Wang et.al. | 2503.21683 | null |
| 2025-03-27 | A tale of two goals: leveraging sequentiality in multi-goal scenarios | Olivier Serris et.al. | 2503.21677 | null |
| 2025-03-27 | UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning | Zhengxi Lu et.al. | 2503.21620 | link |
| 2025-03-27 | A Deep Reinforcement Learning-based Approach for Adaptive Handover Protocols | Johannes Voigt et.al. | 2503.21601 | null |
| 2025-03-27 | DATA-WA: Demand-based Adaptive Task Assignment with Dynamic Worker Availability Windows | Jinwen Chen et.al. | 2503.21458 | null |
| 2025-03-27 | On Learning-Based Traffic Monitoring With a Swarm of Drones | Marko Maljkovic et.al. | 2503.21433 | null |
| 2025-03-26 | Understanding R1-Zero-Like Training: A Critical Perspective | Zichen Liu et.al. | 2503.20783 | link |
| 2025-03-27 | Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning | Huajie Tan et.al. | 2503.20752 | link |
| 2025-03-26 | Graph-Enhanced Model-Free Reinforcement Learning Agents for Efficient Power Grid Topological Control | Eloy Anguiano Batanero et.al. | 2503.20688 | null |
| 2025-03-26 | Flip Learning: Weakly Supervised Erase to Segment Nodules in Breast Ultrasound | Yuhao Huang et.al. | 2503.20685 | null |
| 2025-03-26 | Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | Han Wu et.al. | 2503.20641 | link |
| 2025-03-26 | State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning | Zongyuan Zhang et.al. | 2503.20613 | null |
| 2025-03-26 | Optimizing Case-Based Reasoning System for Functional Test Script Generation with Large Language Models | Siyuan Guo et.al. | 2503.20576 | null |
| 2025-03-26 | Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems | Rakesh Nadig et.al. | 2503.20507 | null |
| 2025-03-26 | Multi-agent Uncertainty-Aware Pessimistic Model-Based Reinforcement Learning for Connected Autonomous Vehicles | Ruoqi Wen et.al. | 2503.20462 | null |
| 2025-03-26 | The Crucial Role of Problem Formulation in Real-World Reinforcement Learning | Georg Schäfer et.al. | 2503.20442 | null |
| 2025-03-25 | Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | Xiaoyu Tian et.al. | 2503.19855 | link |
| 2025-03-25 | Optimal Path Planning and Cost Minimization for a Drone Delivery System Via Model Predictive Control | Muhammad Al-Zafar Khan et.al. | 2503.19699 | null |
| 2025-03-25 | Risk-Aware Reinforcement Learning for Autonomous Driving: Improving Safety When Driving through Intersection | Bo Leng et.al. | 2503.19690 | null |
| 2025-03-25 | Learning to chain-of-thought with Jensen’s evidence lower bound | Yunhao Tang et.al. | 2503.19618 | null |
| 2025-03-25 | RL-finetuning LLMs from on- and off-policy data with a single algorithm | Yunhao Tang et.al. | 2503.19612 | null |
| 2025-03-25 | Optimizing Language Models for Inference Time Objectives using Reinforcement Learning | Yunhao Tang et.al. | 2503.19595 | null |
| 2025-03-25 | One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF | Xin Cai et.al. | 2503.19523 | null |
| 2025-03-25 | ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning | Mingyang Chen et.al. | 2503.19470 | link |
| 2025-03-25 | Multi-Agent Deep Reinforcement Learning for Safe Autonomous Driving with RICS-Assisted MEC | Xueyao Zhang et.al. | 2503.19418 | null |
| 2025-03-25 | NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios | Songyi Gao et.al. | 2503.19267 | link |
| 2025-03-24 | Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | Brian R. Bartoldson et.al. | 2503.18929 | link |
| 2025-03-24 | SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Weihao Zeng et.al. | 2503.18892 | link |
| 2025-03-24 | Bootstrapped Model Predictive Control | Yuhang Wang et.al. | 2503.18871 | link |
| 2025-03-24 | Learning Multi-Robot Coordination through Locality-Based Factorized Multi-Agent Actor-Critic Algorithm | Chak Lam Shek et.al. | 2503.18816 | null |
| 2025-03-24 | Sample-Efficient Reinforcement Learning of Koopman eNMPC | Daniel Mayfrank et.al. | 2503.18787 | null |
| 2025-03-24 | Simulation-Driven Balancing of Competitive Game Levels with Reinforcement Learning | Florian Rupp et.al. | 2503.18748 | null |
| 2025-03-24 | RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation | Chengbo Yuan et.al. | 2503.18738 | null |
| 2025-03-24 | FF-SRL: High Performance GPU-Based Surgical Simulation For Robot Learning | Diego Dall’Alba et.al. | 2503.18616 | null |
| 2025-03-24 | Adventurer: Exploration with BiGAN for Deep Reinforcement Learning | Yongshuai Liu et.al. | 2503.18612 | null |
| 2025-03-24 | Reinforcement Learning in Switching Non-Stationary Markov Decision Processes: Algorithms and Convergence Analysis | Mohsen Amiri et.al. | 2503.18607 | null |
| 2025-03-21 | OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | Yihe Deng et.al. | 2503.17352 | link |
| 2025-03-21 | Capturing Individual Human Preferences with Reward Features | André Barreto et.al. | 2503.17338 | null |
| 2025-03-21 | FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | Mingyang Song et.al. | 2503.17287 | link |
| 2025-03-21 | Curriculum RL meets Monte Carlo Planning: Optimization of a Real World Container Management Problem | Abhijeet Pendyala et.al. | 2503.17194 | null |
| 2025-03-21 | Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | Chan Kim et.al. | 2503.17125 | null |
| 2025-03-21 | Neural-Guided Equation Discovery | Jannis Brugger et.al. | 2503.16953 | null |
| 2025-03-21 | A New Segment Routing method with Swap Node Selection Strategy Based on Deep Reinforcement Learning for Software Defined Network | Miao Ye et.al. | 2503.16914 | null |
| 2025-03-21 | Federated Digital Twin Construction via Distributed Sensing: A Game-Theoretic Online Optimization with Overlapping Coalitions | Ruoyang Chen et.al. | 2503.16823 | null |
| 2025-03-21 | BEAC: Imitating Complex Exploration and Task-oriented Behaviors for Invisible Object Nonprehensile Manipulation | Hirotaka Tahara et.al. | 2503.16803 | null |
| 2025-03-21 | Causally Aligned Curriculum Learning | Mingxuan Li et.al. | 2503.16799 | null |
| 2025-03-20 | Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Yang Sui et.al. | 2503.16419 | link |
| 2025-03-20 | RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints | Yiran Qin et.al. | 2503.16408 | null |
| 2025-03-20 | Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming | Minori Narita et.al. | 2503.16371 | null |
| 2025-03-20 | JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse | Muyao Li et.al. | 2503.16365 | link |
| 2025-03-21 | Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Zhaowei Liu et.al. | 2503.16252 | link |
| 2025-03-20 | Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t | Quy-Anh Dang et.al. | 2503.16219 | link |
| 2025-03-20 | Explosive Jumping with Rigid and Articulated Soft Quadrupeds via Example Guided Reinforcement Learning | Georgios Apostolides et.al. | 2503.16197 | null |
| 2025-03-20 | Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning | Yuki Akiyama et.al. | 2503.16192 | null |
| 2025-03-20 | CLS-RL: Image Classification with Rule-Based Reinforcement Learning | Ming Li et.al. | 2503.16188 | link |
| 2025-03-20 | Cultural Alignment in Large Language Models Using Soft Prompt Tuning | Reem I. Masoud et.al. | 2503.16094 | null |
| 2025-03-19 | Learning to Play Piano in the Real World | Yves-Simon Zeulner et.al. | 2503.15481 | null |
| 2025-03-19 | What Makes a Reward Model a Good Teacher? An Optimization Perspective | Noam Razin et.al. | 2503.15477 | link |
| 2025-03-19 | CCDP: Composition of Conditional Diffusion Policies with Guided Sampling | Amirreza Razmjoo et.al. | 2503.15386 | null |
| 2025-03-19 | Online Imitation Learning for Manipulation via Decaying Relative Correction through Teleoperation | Cheng Pan et.al. | 2503.15368 | null |
| 2025-03-19 | Optimizing Decomposition for Optimal Claim Verification | Yining Lu et.al. | 2503.15354 | link |
| 2025-03-19 | aiXcoder-7B-v2: Training LLMs to Fully Utilize the Long Context in Repository-level Code Completion | Jia Li et.al. | 2503.15301 | null |
| 2025-03-19 | Reinforcement Learning for Robust Athletic Intelligence: Lessons from the 2nd ‘AI Olympics with RealAIGym’ Competition | Felix Wiebe et.al. | 2503.15290 | null |
| 2025-03-19 | DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning | Ruowen Zhao et.al. | 2503.15265 | link |
| 2025-03-19 | Partially Observable Reinforcement Learning with Memory Traces | Onno Eberhard et.al. | 2503.15200 | null |
| 2025-03-19 | Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach | Mohamed Hassouna et.al. | 2503.15190 | null |
| 2025-03-18 | DAPO: An Open-Source LLM Reinforcement Learning System at Scale | Qiying Yu et.al. | 2503.14476 | null |
| 2025-03-18 | Pauli Network Circuit Synthesis with Reinforcement Learning | Ayushi Dubal et.al. | 2503.14448 | null |
| 2025-03-18 | Flying in Highly Dynamic Environments with End-to-end Learning Approach | Xiyu Fan et.al. | 2503.14352 | null |
| 2025-03-18 | MANTRA: Enhancing Automated Method-Level Refactoring with Contextual RAG and Multi-Agent LLM Collaboration | Yisen Xu et.al. | 2503.14340 | null |
| 2025-03-18 | Revealing higher-order neural representations with generative artificial intelligence | Hojjat Azimi Asrari et.al. | 2503.14333 | null |
| 2025-03-18 | Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | Nicolas Le Roux et.al. | 2503.14286 | null |
| 2025-03-18 | Integral modelling and Reinforcement Learning control of 3D liquid metal coating on a moving substrate | Fabio Pino et.al. | 2503.14270 | null |
| 2025-03-18 | Automating Experimental Optics with Sample Efficient Machine Learning Methods | Arindam Saha et.al. | 2503.14260 | null |
| 2025-03-18 | Quantization-Free Autoregressive Action Transformer | Ziyad Sheebaelhamd et.al. | 2503.14259 | null |
| 2025-03-18 | CTSAC: Curriculum-Based Transformer Soft Actor-Critic for Goal-Oriented Robot Exploration | Chunyu Yang et.al. | 2503.14254 | null |
| 2025-03-17 | Uncovering Utility Functions from Observed Outcomes | Marta Grzeskiewicz et.al. | 2503.13432 | null |
| 2025-03-17 | FLEX: A Framework for Learning Robot-Agnostic Force-based Skills Involving Sustained Contact Object Manipulation | Shijie Fang et.al. | 2503.13418 | null |
| 2025-03-17 | A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives | Weiqiang Jin et.al. | 2503.13415 | null |
| 2025-03-17 | TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM | Ye Wang et.al. | 2503.13377 | link |
| 2025-03-17 | Agents Play Thousands of 3D Video Games | Zhongwen Xu et.al. | 2503.13356 | null |
| 2025-03-17 | Local-Global Learning of Interpretable Control Policies: The Interface between MPC and Reinforcement Learning | Thomas Banker et.al. | 2503.13289 | null |
| 2025-03-17 | Timing the Match: A Deep Reinforcement Learning Approach for Ride-Hailing and Ride-Pooling Services | Yiman Bao et.al. | 2503.13200 | null |
| 2025-03-17 | A representational framework for learning and encoding structurally enriched trajectories in complex agent environments | Corina Catarau-Cotutiu et.al. | 2503.13194 | null |
| 2025-03-17 | HybridGen: VLM-Guided Hybrid Planning for Scalable Data Generation of Imitation Learning | Wensheng Wang et.al. | 2503.13171 | null |
| 2025-03-17 | Efficient Imitation Under Misspecification | Nicolas Espinosa-Dice et.al. | 2503.13162 | null |
| 2025-03-14 | Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning | Siyuan Huang et.al. | 2503.11646 | null |
| 2025-03-14 | Scaling the Automated Discovery of Quantum Circuits via Reinforcement Learning with Gadgets | Jan Olle et.al. | 2503.11638 | null |
| 2025-03-14 | Unicorn: A Universal and Collaborative Reinforcement Learning Approach Towards Generalizable Network-Wide Traffic Signal Control | Yifeng Zhang et.al. | 2503.11488 | null |
| 2025-03-14 | A Review of DeepSeek Models’ Key Innovative Techniques | Chengen Wang et.al. | 2503.11486 | null |
| 2025-03-14 | Dynamic Obstacle Avoidance with Bounded Rationality Adversarial Reinforcement Learning | Jose-Luis Holgado-Alvarez et.al. | 2503.11467 | null |
| 2025-03-14 | Optimizing 6G Dense Network Deployment for the Metaverse Using Deep Reinforcement Learning | Jie Zhang et.al. | 2503.11449 | null |
| 2025-03-14 | Adaptive Torque Control of Exoskeletons under Spasticity Conditions via Reinforcement Learning | Andrés Chavarrías et.al. | 2503.11433 | null |
| 2025-03-14 | TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation | Hongxiang Zhao et.al. | 2503.11423 | null |
| 2025-03-14 | Reinforcement Learning-Based Controlled Switching Approach for Inrush Current Minimization in Power Transformers | Jone Ugarte Valdivielso et.al. | 2503.11398 | null |
| 2025-03-14 | Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model | Moritz A. Zanger et.al. | 2503.11339 | null |
| 2025-03-13 | NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models | Mert Albaba et.al. | 2503.10626 | null |
| 2025-03-13 | R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization | Yi Yang et.al. | 2503.10615 | link |
| 2025-03-13 | The Lagrangian Method for Solving Constrained Markov Games | Soham Das et.al. | 2503.10561 | null |
| 2025-03-13 | Towards Safe Path Tracking Using the Simplex Architecture | Georg Jäger et.al. | 2503.10559 | null |
| 2025-03-13 | SySLLM: Generating Synthesized Policy Summaries for Reinforcement Learning Agents Using Large Language Models | Sahar Admoni et.al. | 2503.10509 | null |
| 2025-03-13 | Learning Robotic Policy with Imagined Transition: Mitigating the Trade-off between Robustness and Optimality | Wei Xiao et.al. | 2503.10484 | null |
| 2025-03-13 | SortingEnv: An Extendable RL-Environment for an Industrial Sorting Process | Tom Maus et.al. | 2503.10466 | null |
| 2025-03-13 | Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Liang Wen et.al. | 2503.10460 | link |
| 2025-03-13 | Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback | Derun Li et.al. | 2503.10434 | null |
| 2025-03-13 | Towards Constraint-Based Adaptive Hypergraph Learning for Solving Vehicle Routing: An End-to-End Solution | Zhenwei Wang et.al. | 2503.10421 | null |
| 2025-03-12 | Strategyproof Reinforcement Learning from Human Feedback | Thomas Kleine Buening et.al. | 2503.09561 | null |
| 2025-03-12 | Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Bowen Jin et.al. | 2503.09516 | link |
| 2025-03-12 | RESTRAIN: Reinforcement Learning-Based Secure Framework for Trigger-Action IoT Environment | Md Morshed Alam et.al. | 2503.09513 | null |
| 2025-03-12 | Reinforcement Learning is all You Need | Yongsheng Lian et.al. | 2503.09512 | null |
| 2025-03-12 | ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning | Ziyu Wan et.al. | 2503.09501 | link |
| 2025-03-12 | Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data Traffic | Kexuan Wang et.al. | 2503.09391 | null |
| 2025-03-12 | Evaluating Reinforcement Learning Safety and Trustworthiness in Cyber-Physical Systems | Katherine Dearstyne et.al. | 2503.09388 | null |
| 2025-03-12 | Rule-Guided Reinforcement Learning Policy Evaluation and Improvement | Martin Tappler et.al. | 2503.09270 | null |
| 2025-03-12 | Large-scale Regional Traffic Signal Control Based on Single-Agent Reinforcement Learning | Qiang Li et.al. | 2503.09252 | null |
| 2025-03-12 | MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics | Shuguang Chu et.al. | 2503.09203 | null |
| 2025-03-11 | MoE-Loco: Mixture of Experts for Multitask Locomotion | Runhan Huang et.al. | 2503.08564 | null |
| 2025-03-11 | Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies | Chen Xu et.al. | 2503.08558 | null |
| 2025-03-11 | TLA: Tactile-Language-Action Model for Contact-Rich Manipulation | Peng Hao et.al. | 2503.08548 | null |
| 2025-03-11 | GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training | Tong Wei et.al. | 2503.08525 | null |
| 2025-03-11 | Hierarchical Multi Agent DRL for Soft Handovers Between Edge Clouds in Open RAN | F. Giarrè et.al. | 2503.08493 | null |
| 2025-03-11 | Hybrid Deep Reinforcement Learning for Radio Tracer Localisation in Robotic-assisted Radioguided Surgery | Hanyi Zhang et.al. | 2503.08492 | null |
| 2025-03-12 | An Autonomous RL Agent Methodology for Dynamic Web UI Testing in a BDD Framework | Ali Hassaan Mughal et.al. | 2503.08464 | null |
| 2025-03-11 | V-Max: Making RL practical for Autonomous Driving | Valentin Charraut et.al. | 2503.08388 | link |
| 2025-03-11 | Gait in Eight: Efficient On-Robot Learning for Omnidirectional Quadruped Locomotion | Nico Bohlinger et.al. | 2503.08375 | null |
| 2025-03-11 | LiPS: Large-Scale Humanoid Robot Reinforcement Learning with Parallel-Series Structures | Qiang Zhang et.al. | 2503.08349 | null |
| 2025-03-10 | Is a Good Foundation Necessary for Efficient Reinforcement Learning? The Computational Role of the Base Model in Exploration | Dylan J. Foster et.al. | 2503.07453 | null |
| 2025-03-10 | DRESS: Diffusion Reasoning-based Reward Shaping Scheme For Intelligent Networks | Feiran You et.al. | 2503.07433 | null |
| 2025-03-10 | The Interplay of AI-and-RAN: Dynamic Resource Allocation for Converged 6G Platform | Syed Danial Ali Shah et.al. | 2503.07420 | null |
| 2025-03-10 | Cost-Effective Design of Grid-tied Community Microgrid | Moslem Uddin et.al. | 2503.07414 | null |
| 2025-03-10 | PER-DPP Sampling Framework and Its Application in Path Planning | Junzhe Wang et.al. | 2503.07411 | null |
| 2025-03-10 | Towards Safe Robot Foundation Models | Maximilian Tölle et.al. | 2503.07404 | null |
| 2025-03-10 | Q-MARL: A quantum-inspired algorithm using neural message passing for large-scale multi-agent reinforcement learning | Kha Vo et.al. | 2503.07397 | null |
| 2025-03-10 | AttentionSwarm: Reinforcement Learning with Attention Control Barier Function for Crazyflie Drones in Dynamic Environments | Grik Tadevosyan et.al. | 2503.07376 | null |
| 2025-03-10 | MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning | Fanqing Meng et.al. | 2503.07365 | link |
| 2025-03-10 | Artificial Utopia: Simulation and Intelligent Agents for a Democratised Future | Yannick Oswald et.al. | 2503.07364 | null |
| 2025-03-07 | Multi-Fidelity Policy Gradient Algorithms | Xinjie Liu et.al. | 2503.05696 | null |
| 2025-03-07 | dARt Vinci: Egocentric Data Collection for Surgical Robot Learning at Scale | Yihao Liu et.al. | 2503.05646 | null |
| 2025-03-07 | R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | Huatong Song et.al. | 2503.05592 | null |
| 2025-03-07 | InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model | Feeza Khan Khanzada et.al. | 2503.05573 | null |
| 2025-03-07 | Tractable Representations for Convergent Approximation of Distributional HJB Equations | Julie Alhosh et.al. | 2503.05563 | null |
| 2025-03-07 | Impoola: The Power of Average Pooling for Image-Based Deep Reinforcement Learning | Raphael Trumpp et.al. | 2503.05546 | null |
| 2025-03-07 | RiLoCo: An ISAC-oriented AI Solution to Build RIS-empowered Networks | Guillermo Encinas-Lago et.al. | 2503.05480 | null |
| 2025-03-07 | Controllable Complementarity: Subjective Preferences in Human-AI Collaboration | Chase McDonald et.al. | 2503.05455 | null |
| 2025-03-07 | R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning | Jiaxing Zhao et.al. | 2503.05379 | null |
| 2025-03-07 | Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning | Hyungkyu Kang et.al. | 2503.05306 | null |
| 2025-03-06 | Sample-Optimal Agnostic Boosting with Unlabeled Data | Udaya Ghai et.al. | 2503.04706 | null |
| 2025-03-06 | L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | Pranjal Aggarwal et.al. | 2503.04697 | null |
| 2025-03-06 | Multi-Agent Inverse Q-Learning from Demonstrations | Nathaniel Haynam et.al. | 2503.04679 | null |
| 2025-03-06 | Learning Generalizable Language-Conditioned Cloth Manipulation from Long Demonstrations | Hanyi Zhao et.al. | 2503.04557 | null |
| 2025-03-06 | PALo: Learning Posture-Aware Locomotion for Quadruped Robots | Xiangyu Miao et.al. | 2503.04462 | null |
| 2025-03-06 | AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services | Xiaoqi Wang et.al. | 2503.04418 | null |
| 2025-03-06 | Learning Transformer-based World Models with Contrastive Predictive Coding | Maxime Burchi et.al. | 2503.04416 | null |
| 2025-03-06 | Energy-Aware Task Offloading for Rotatable STAR-RIS-Enhanced Mobile Edge Computing Systems | Dongdong Yang et.al. | 2503.04397 | null |
| 2025-03-06 | Delay-Aware Digital Twin Synchronization in Mobile Edge Networks with Semantic Communications | Bin Li et.al. | 2503.04387 | null |
| 2025-03-06 | Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models | Niccolò Turcato et.al. | 2503.04280 | null |
| 2025-03-05 | Curating Demonstrations using Online Experience | Annie S. Chen et.al. | 2503.03707 | null |
| 2025-03-05 | A Generative Approach to High Fidelity 3D Reconstruction from Text Data | Venkat Kumar R et.al. | 2503.03664 | null |
| 2025-03-05 | Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns | Dong Tian et.al. | 2503.03660 | null |
| 2025-03-05 | Improving Neutral Point of View Text Generation through Parameter-Efficient Reinforcement Learning and a Small-Scale High-Quality Dataset | Jessica Hoffmann et.al. | 2503.03654 | null |
| 2025-03-05 | Olympus: A Jumping Quadruped for Planetary Exploration Utilizing Reinforcement Learning for In-Flight Attitude Control | Jørgen Anker Olsen et.al. | 2503.03574 | null |
| 2025-03-05 | Probabilistic Insights for Efficient Exploration Strategies in Reinforcement Learning | Ernesto Garcia et.al. | 2503.03565 | null |
| 2025-03-05 | DO-IQS: Dynamics-Aware Offline Inverse Q-Learning for Optimal Stopping with Unknown Gain Functions | Anna Kuchko et.al. | 2503.03515 | null |
| 2025-03-05 | SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning | Borong Zhang et.al. | 2503.03480 | null |
| 2025-03-05 | Continuous Control of Diverse Skills in Quadruped Robots Without Complete Expert Datasets | Jiaxin Tu et.al. | 2503.03476 | null |
| 2025-03-05 | Navigating Intelligence: A Survey of Google OR-Tools and Machine Learning for Global Path Planning in Autonomous Vehicles | Alexandre Benoit et.al. | 2503.03338 | null |
| 2025-03-04 | Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation | Han Xue et.al. | 2503.02881 | null |
| 2025-03-04 | AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation | Songming Zhang et.al. | 2503.02832 | null |
| 2025-03-04 | Meta-Learning to Explore via Memory Density Feedback | Kevin L. McKee et.al. | 2503.02831 | null |
| 2025-03-04 | Quantitative Resilience Modeling for Autonomous Cyber Defense | Xavier Cadet et.al. | 2503.02780 | null |
| 2025-03-04 | Variable-Friction In-Hand Manipulation for Arbitrary Objects via Diffusion-Based Imitation Learning | Qiyang Yan et.al. | 2503.02738 | null |
| 2025-03-04 | Learning-Based Passive Fault-Tolerant Control of a Quadrotor with Rotor Failure | Jiehao Chen et.al. | 2503.02649 | null |
| 2025-03-04 | Human-aligned Safe Reinforcement Learning for Highway On-Ramp Merging in Dense Traffic | Yang Li et.al. | 2503.02624 | null |
| 2025-03-04 | Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models | Paul Stangel et.al. | 2503.02623 | null |
| 2025-03-04 | Reinforcement Learning-based Threat Assessment | Wuzhou Sun et.al. | 2503.02612 | null |
| 2025-03-04 | What Makes a Model Breathe? Understanding Reinforcement Learning Reward Function Design in Biomechanical User Simulation | Hannah Selder et.al. | 2503.02571 | null |
| 2025-02-28 | LLM Post-Training: A Deep Dive into Reasoning Large Language Models | Komal Kumar et.al. | 2502.21321 | null |
| 2025-02-28 | ReaLJam: Real-Time Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers | Alexander Scarlatos et.al. | 2502.21267 | null |
| 2025-02-28 | ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs | Hao Ge et.al. | 2502.21231 | null |
| 2025-02-28 | A Method of Selective Attention for Reservoir Based Agents | Kevin McKee et.al. | 2502.21229 | null |
| 2025-02-28 | Reducing Reward Dependence in RL Through Adaptive Confidence Discounting | Muhammed Yusuf Satici et.al. | 2502.21181 | null |
| 2025-02-28 | Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning | Léopold Maytié et.al. | 2502.21142 | null |
| 2025-02-28 | Dynamically Local-Enhancement Planner for Large-Scale Autonomous Driving | Nanshan Deng et.al. | 2502.21134 | null |
| 2025-02-28 | AuthSim: Towards Authentic and Effective Safety-critical Scenario Generation for Autonomous Driving Tests | Yukuan Yang et.al. | 2502.21100 | null |
| 2025-02-28 | Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control | Taeho Lee et.al. | 2502.21057 | null |
| 2025-02-28 | Motion ReTouch: Motion Modification Using Four-Channel Bilateral Control | Koki Inami et.al. | 2502.20982 | null |
| 2025-02-27 | Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids | Toru Lin et.al. | 2502.20396 | null |
| 2025-02-27 | Multi-Turn Code Generation Through Single-Step Rewards | Arnav Kumar Jain et.al. | 2502.20380 | null |
| 2025-02-27 | The Role of Tactile Sensing for Learning Reach and Grasp | Boya Zhang et.al. | 2502.20367 | null |
| 2025-02-27 | Improving the Efficiency of a Deep Reinforcement Learning-Based Power Management System for HPC Clusters Using Curriculum Learning | Thomas Budiarjo et.al. | 2502.20348 | null |
| 2025-02-27 | Safety Representations for Safer Policy Learning | Kaustubh Mani et.al. | 2502.20341 | null |
| 2025-02-27 | Deep Reinforcement Learning based Autonomous Decision-Making for Cooperative UAVs: A Search and Rescue Real World Application | Thomas Hickling et.al. | 2502.20326 | null |
| 2025-02-27 | On the Importance of Reward Design in Reinforcement Learning-based Dynamic Algorithm Configuration: A Case Study on OneMax with (1+($λ$,$λ$))-GA | Tai Nguyen et.al. | 2502.20265 | null |
| 2025-02-27 | Explainable physics-based constraints on reinforcement learning for accelerator controls | Jonathan Colen et.al. | 2502.20247 | null |
| 2025-02-27 | MARVEL: Multi-Agent Reinforcement Learning for constrained field-of-View multi-robot Exploration in Large-scale environments | Jimmy Chiun et.al. | 2502.20217 | null |
| 2025-02-27 | Highly Parallelized Reinforcement Learning Training with Relaxed Assignment Dependencies | Zhouyu He et.al. | 2502.20190 | null |
| 2025-02-26 | Recurrent Auto-Encoders for Enhanced Deep Reinforcement Learning in Wilderness Search and Rescue Planning | Jan-Hendrik Ewers et.al. | 2502.19356 | null |
| 2025-02-26 | Hybrid Robot Learning for Automatic Robot Motion Planning in Manufacturing | Siddharth Singh et.al. | 2502.19340 | null |
| 2025-02-26 | WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies | William Solow et.al. | 2502.19308 | null |
| 2025-02-26 | Combining Planning and Reinforcement Learning for Solving Relational Multiagent Domains | Nikhilesh Prabhakar et.al. | 2502.19297 | null |
| 2025-02-26 | Deep Computerized Adaptive Testing | Jiguang Li et.al. | 2502.19275 | null |
| 2025-02-26 | Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective | Jiawei Huang et.al. | 2502.19255 | null |
| 2025-02-26 | ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration | Minjie Zhu et.al. | 2502.19250 | null |
| 2025-02-26 | Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time | Jiazheng Li et.al. | 2502.19230 | null |
| 2025-02-26 | When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning | Yijiang River Dong et.al. | 2502.19158 | null |
| 2025-02-26 | Policy Testing with MDPFuzz (Replicability Study) | Quentin Mazouni et.al. | 2502.19116 | null |
| 2025-02-25 | SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | Yuxiang Wei et.al. | 2502.18449 | null |
| 2025-02-25 | MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning | Chanwoo Park et.al. | 2502.18439 | null |
| 2025-02-25 | Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand | Fengshuo Bai et.al. | 2502.18423 | null |
| 2025-02-25 | Enhancing Reusability of Learned Skills for Robot Manipulation via Gaze and Bottleneck | Ryo Takizawa et.al. | 2502.18121 | null |
| 2025-02-25 | Controlling dynamics of stochastic systems with deep reinforcement learning | Ruslan Mukhamadiarov et.al. | 2502.18111 | null |
| 2025-02-25 | From planning to policy: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation | Haewon Jung et.al. | 2502.18015 | null |
| 2025-02-25 | NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms | Yashan Wang et.al. | 2502.18008 | null |
| 2025-02-25 | Provable Performance Bounds for Digital Twin-driven Deep Reinforcement Learning in Wireless Networks: A Novel Digital-Twin Bisimulation Metric | Zhenyu Tao et.al. | 2502.17983 | null |
| 2025-02-25 | FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real | Weiheng Liu et.al. | 2502.17894 | null |
| 2025-02-25 | Sample-efficient diffusion-based control of complex nonlinear systems | Hongyi Chen et.al. | 2502.17893 | null |
| 2025-02-24 | Event-Based Limit Order Book Simulation under a Neural Hawkes Process: Application in Market-Making | Luca Lalor et.al. | 2502.17417 | null |
| 2025-02-24 | Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Alon Albalak et.al. | 2502.17387 | link |
| 2025-02-24 | Distributed Coordination for Heterogeneous Non-Terrestrial Networks | Jikang Deng et.al. | 2502.17366 | null |
| 2025-02-24 | TDMPBC: Self-Imitative Reinforcement Learning for Humanoid Robot Control | Zifeng Zhuang et.al. | 2502.17322 | null |
| 2025-02-24 | Survey on Strategic Mining in Blockchain: A Reinforcement Learning Approach | Jichen Li et.al. | 2502.17307 | null |
| 2025-02-24 | A Reinforcement Learning Approach to Non-prehensile Manipulation through Sliding | Hamidreza Raei et.al. | 2502.17221 | null |
| 2025-02-24 | Humanoid Whole-Body Locomotion on Narrow Terrain via Dynamic Balance and Reinforcement Learning | Weiji Xie et.al. | 2502.17219 | null |
| 2025-02-24 | Teleology-Driven Affective Computing: A Causal Framework for Sustained Well-Being | Bin Yin et.al. | 2502.17172 | null |
| 2025-02-24 | A Novel Multiple Access Scheme for Heterogeneous Wireless Communications using Symmetry-aware Continual Deep Reinforcement Learning | Hamidreza Mazandarani et.al. | 2502.17167 | null |
| 2025-02-24 | MA2RL: Masked Autoencoders for Generalizable Multi-Agent Reinforcement Learning | Jinyuan Feng et.al. | 2502.17046 | null |
| 2025-02-21 | BOSS: Benchmark for Observation Space Shift in Long-Horizon Task | Yue Yang et.al. | 2502.15679 | null |
| 2025-02-21 | VaViM and VaVAM: Autonomous Driving through Video Generative Modeling | Florent Bartoccioni et.al. | 2502.15672 | link |
| 2025-02-21 | Automating Curriculum Learning for Reinforcement Learning using a Skill-Based Bayesian Network | Vincent Hsiao et.al. | 2502.15662 | null |
| 2025-02-21 | A Simulation Pipeline to Facilitate Real-World Robotic Reinforcement Learning Applications | Jefferson Silveira et.al. | 2502.15649 | null |
| 2025-02-21 | Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach | Xiangtong Yao et.al. | 2502.15613 | null |
| 2025-02-21 | SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning | Xuyang Li et.al. | 2502.15512 | null |
| 2025-02-21 | Learning Long-Horizon Robot Manipulation Skills via Privileged Action | Xiaofeng Mao et.al. | 2502.15442 | null |
| 2025-02-21 | TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning | Giuseppe Paolo et.al. | 2502.15425 | null |
| 2025-02-21 | Hyperspherical Normalization for Scalable Deep Reinforcement Learning | Hojoon Lee et.al. | 2502.15280 | null |
| 2025-02-21 | CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models | Shunchang Liu et.al. | 2502.15278 | null |
| 2025-02-20 | Generating $π$-Functional Molecules Using STGG+ with Active Learning | Alexia Jolicoeur-Martineau et.al. | 2502.14842 | link |
| 2025-02-20 | Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models | Vlad Sobal et.al. | 2502.14819 | null |
| 2025-02-20 | Making Universal Policies Universal | Niklas Höpner et.al. | 2502.14777 | null |
| 2025-02-20 | Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Tian Xie et.al. | 2502.14768 | link |
| 2025-02-20 | Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse | Michael Doherty et.al. | 2502.14741 | null |
| 2025-02-20 | Length-Controlled Margin-Based Preference Optimization without Reference Model | Gengxu Li et.al. | 2502.14643 | null |
| 2025-02-20 | Curiosity Driven Multi-agent Reinforcement Learning for 3D Game Testing | Raihana Ferdous et.al. | 2502.14606 | null |
| 2025-02-20 | ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification | Hyunseok Lee et.al. | 2502.14565 | link |
| 2025-02-20 | MLGym: A New Framework and Benchmark for Advancing AI Research Agents | Deepak Nathani et.al. | 2502.14499 | link |
| 2025-02-20 | Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization | Zhitao He et.al. | 2502.14496 | link |
| 2025-02-19 | A Training-Free Framework for Precise Mobile Manipulation of Small Everyday Objects | Arjun Gupta et.al. | 2502.13964 | null |
| 2025-02-19 | Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks | Guilherme Palma et.al. | 2502.13918 | null |
| 2025-02-19 | Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning | Antoine Moulin et.al. | 2502.13900 | null |
| 2025-02-19 | NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants | Yiran Qin et.al. | 2502.13894 | null |
| 2025-02-19 | Uncertainty quantification for Markov chains with application to temporal difference learning | Weichen Wu et.al. | 2502.13822 | null |
| 2025-02-19 | Learning to explore when mistakes are not allowed | Charly Pecqueux-Guézénec et.al. | 2502.13801 | null |
| 2025-02-19 | User Agency and System Automation in Interactive Intelligent Systems | Thomas Langerak et.al. | 2502.13779 | null |
| 2025-02-19 | Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values | Hongbo Zhang et.al. | 2502.13723 | null |
| 2025-02-19 | Hierarchical RL-MPC for Demand Response Scheduling | Maximilian Bloor et.al. | 2502.13714 | null |
| 2025-02-19 | User Association and Coordinated Beamforming in Cognitive Aerial-Terrestrial Networks: A Safe Reinforcement Learning Approach | Zizhen Zhou et.al. | 2502.13663 | null |
| 2025-02-18 | Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Shuo Xing et.al. | 2502.13146 | link |
| 2025-02-18 | RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning | Hao Gao et.al. | 2502.13144 | link |
| 2025-02-18 | Theorem Prover as a Judge for Synthetic Data Generation | Joshua Ong Jun Leang et.al. | 2502.13137 | null |
| 2025-02-18 | Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | Mengkang Hu et.al. | 2502.13092 | link |
| 2025-02-18 | Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation | Sha Li et.al. | 2502.13019 | null |
| 2025-02-18 | HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit | Qingwei Ben et.al. | 2502.13013 | link |
| 2025-02-18 | Integrating Reinforcement Learning, Action Model Learning, and Numeric Planning for Tackling Complex Tasks | Yarin Benyamin et.al. | 2502.13006 | link |
| 2025-02-18 | Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options | Lakshmi Nair et.al. | 2502.12929 | link |
| 2025-02-18 | Continuous Learning Conversational AI: A Personalized Agent Framework via A2C Reinforcement Learning | Nandakishor M et.al. | 2502.12876 | null |
| 2025-02-18 | A Survey on DRL based UAV Communications and Networking: DRL Fundamentals, Applications and Implementations | Wei Zhao et.al. | 2502.12875 | null |
| 2025-02-17 | Scaling Test-Time Compute Without Verification or RL is Suboptimal | Amrith Setlur et.al. | 2502.12118 | null |
| 2025-02-17 | Unhackable Temporal Rewarding for Scalable Video MLLMs | En Yu et.al. | 2502.12081 | link |
| 2025-02-17 | How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | Ayan Sengupta et.al. | 2502.12051 | null |
| 2025-02-17 | Theoretical Barriers in Bellman-Based Reinforcement Learning | Brieuc Pinon et.al. | 2502.11968 | null |
| 2025-02-17 | Massively Scaling Explicit Policy-conditioned Value Functions | Nico Bohlinger et.al. | 2502.11949 | null |
| 2025-02-17 | FitLight: Federated Imitation Learning for Plug-and-Play Autonomous Traffic Signal Control | Yutong Ye et.al. | 2502.11937 | null |
| 2025-02-17 | VLP: Vision-Language Preference Learning for Embodied Manipulation | Runze Liu et.al. | 2502.11918 | null |
| 2025-02-17 | CAMEL: Continuous Action Masking Enabled by Large Language Models for Reinforcement Learning | Yanxiao Zhao et.al. | 2502.11896 | null |
| 2025-02-17 | Does Knowledge About Perceptual Uncertainty Help an Agent in Automated Driving? | Natalie Grabowsky et.al. | 2502.11864 | null |
| 2025-02-17 | Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces | Eric Eaton et.al. | 2502.11828 | null |
| 2025-02-14 | BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds | Huayi Wang et.al. | 2502.10363 | null |
| 2025-02-14 | Reinforcement Learning in Strategy-Based and Atari Games: A Review of Google DeepMinds Innovations | Abdelrhman Shaheen et.al. | 2502.10303 | null |
| 2025-02-14 | Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding | Laurin Luttmann et.al. | 2502.10233 | null |
| 2025-02-14 | Dynamic Reinforcement Learning for Actors | Katsunari Shibata et.al. | 2502.10200 | null |
| 2025-02-14 | Reinforcement Learning based Constrained Optimal Control: an Interpretable Reward Design | Jingjie Ni et.al. | 2502.10187 | null |
| 2025-02-14 | Combinatorial Reinforcement Learning with Preference Feedback | Joongkyu Lee et.al. | 2502.10158 | null |
| 2025-02-14 | MonoForce: Learnable Image-conditioned Physics Engine | Ruslan Agishev et.al. | 2502.10156 | null |
| 2025-02-14 | Cooperative Multi-Agent Planning with Adaptive Skill Synthesis | Zhiyuan Li et.al. | 2502.10148 | null |
| 2025-02-14 | Provably Efficient RL under Episode-Wise Safety in Linear CMDPs | Toshinori Kitamura et.al. | 2502.10138 | null |
| 2025-02-14 | Causal Information Prioritization for Efficient Reinforcement Learning | Hongye Cao et.al. | 2502.10097 | null |
| 2025-02-13 | DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References | Xueyi Liu et.al. | 2502.09614 | link |
| 2025-02-13 | Coupled Rendezvous and Docking Maneuver control of satellite using Reinforcement learning-based Adaptive Fixed-Time Sliding Mode Controller | Rakesh Kumar Sahoo et.al. | 2502.09517 | null |
| 2025-02-13 | Variable Stiffness for Robust Locomotion through Reinforcement Learning | Dario Spoljaric et.al. | 2502.09436 | null |
| 2025-02-13 | A Survey of Reinforcement Learning for Optimization in Automation | Ahmad Farooq et.al. | 2502.09417 | null |
| 2025-02-13 | Generalizable Reinforcement Learning with Biologically Inspired Hyperdimensional Occupancy Grid Maps for Exploration and Goal-Directed Path Planning | Shay Snyder et.al. | 2502.09393 | null |
| 2025-02-13 | Machine learning for modelling unstructured grid data in computational physics: a review | Sibo Cheng et.al. | 2502.09346 | null |
| 2025-02-13 | Revisiting Topological Interference Management: A Learning-to-Code on Graphs Perspective | Zhiwei Shan et.al. | 2502.09344 | null |
| 2025-02-13 | Convex Is Back: Solving Belief MDPs With Convexity-Informed Deep Reinforcement Learning | Daniel Koutas et.al. | 2502.09298 | null |
| 2025-02-13 | Autonomous Task Completion Based on Goal-directed Answer Set Programming | Alexis R. Tudor et.al. | 2502.09208 | null |
| 2025-02-13 | Logical Reasoning in Large Language Models: A Survey | Hanmeng Liu et.al. | 2502.09100 | link |
| 2025-02-12 | Re$^3$Sim: Generating High-Fidelity Simulation Data via 3D-Photorealistic Real-to-Sim for Robotic Manipulation | Xiaoshen Han et.al. | 2502.08645 | link |
| 2025-02-12 | A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards | Shivansh Patel et.al. | 2502.08643 | null |
| 2025-02-12 | Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning | Dhruv Rohatgi et.al. | 2502.08632 | null |
| 2025-02-12 | Robot Data Curation with Mutual Information Estimators | Joey Hejna et.al. | 2502.08623 | null |
| 2025-02-12 | Learning to Group and Grasp Multiple Objects | Takahiro Yonemaru et.al. | 2502.08452 | null |
| 2025-02-12 | CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World | Yankai Fu et.al. | 2502.08449 | null |
| 2025-02-12 | Acceleration of crystal structure relaxation with Deep Reinforcement Learning | Elena Trukhan et.al. | 2502.08405 | null |
| 2025-02-12 | Learning Humanoid Standing-up Control across Diverse Postures | Tao Huang et.al. | 2502.08378 | link |
| 2025-02-12 | Towards Principled Multi-Agent Task Agnostic Exploration | Riccardo Zamboni et.al. | 2502.08365 | null |
| 2025-02-12 | Deterministic generation of non-classical mechanical states in cavity optomechanics via reinforcement learning | Yu-Hong Liu et.al. | 2502.08350 | null |
| 2025-02-11 | Polynomial-Time Approximability of Constrained Reinforcement Learning | Jeremy McMahan et.al. | 2502.07764 | null |
| 2025-02-11 | DOGlove: Dexterous Manipulation with a Low-Cost Open-Source Haptic Force Feedback Glove | Han Zhang et.al. | 2502.07730 | null |
| 2025-02-11 | Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning | Aya Kayal et.al. | 2502.07715 | null |
| 2025-02-11 | A Unifying Framework for Causal Imitation Learning with Hidden Confounders | Daqian Shao et.al. | 2502.07656 | null |
| 2025-02-11 | Beyond Behavior Cloning: Robustness through Interactive Imitation and Contrastive Learning | Zhaoting Li et.al. | 2502.07645 | null |
| 2025-02-11 | Distributed Value Decomposition Networks with Networked Agents | Guilherme S. Varela et.al. | 2502.07635 | null |
| 2025-02-11 | Evolution of cooperation in a bimodal mixture of conditional cooperators | Chenyang Zhao et.al. | 2502.07537 | null |
| 2025-02-11 | Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization | Daniel Palenicek et.al. | 2502.07523 | null |
| 2025-02-11 | Logarithmic Regret for Online KL-Regularized Reinforcement Learning | Heyang Zhao et.al. | 2502.07460 | null |
| 2025-02-11 | Towards a Formal Theory of the Need for Competence via Computational Intrinsic Motivation | Erik M. Lintunen et.al. | 2502.07423 | null |
| 2025-02-10 | Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Chengqi Lyu et.al. | 2502.06781 | link |
| 2025-02-10 | On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | Guanghao Ye et.al. | 2502.06773 | link |
| 2025-02-10 | ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Ling Yang et.al. | 2502.06772 | link |
| 2025-02-10 | AgilePilot: DRL-Based Drone Agent for Real-Time Motion Planning in Dynamic Environments by Leveraging Object Detection | Roohan Ahmed Khan et.al. | 2502.06725 | null |
| 2025-02-10 | Discovery of skill switching criteria for learning agile quadruped locomotion | Wanming Yu et.al. | 2502.06676 | null |
| 2025-02-10 | Deep Reinforcement Learning based Triggering Function for Early Classifiers of Time Series | Aurélien Renault et.al. | 2502.06584 | null |
| 2025-02-10 | Predictive Red Teaming: Breaking Policies Without Breaking Robots | Anirudha Majumdar et.al. | 2502.06575 | null |
| 2025-02-10 | Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning | Jean Vassoyan et.al. | 2502.06533 | link |
| 2025-02-10 | Model-Based Offline Reinforcement Learning with Reliability-Guaranteed Sequence Modeling | Shenghong He et.al. | 2502.06491 | null |
| 2025-02-10 | SIGMA: Sheaf-Informed Geometric Multi-Agent Pathfinding | Shuhao Liao et.al. | 2502.06440 | null |
| 2025-02-07 | DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails | Yihe Deng et.al. | 2502.05163 | link |
| 2025-02-07 | Use of Winsome Robots for Understanding Human Feedback (UWU) | Jessica Eggers et.al. | 2502.05118 | null |
| 2025-02-07 | 3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery | Xiuyuan Hu et.al. | 2502.05107 | link |
| 2025-02-07 | Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Tushar Pandey et.al. | 2502.05078 | link |
| 2025-02-07 | Exploring the Generalizability of Geomagnetic Navigation: A Deep Reinforcement Learning approach with Policy Distillation | Wenqi Bai et.al. | 2502.05069 | null |
| 2025-02-07 | Seasonal Station-Keeping of Short Duration High Altitude Balloons using Deep Reinforcement Learning | Tristan K. Schuler et.al. | 2502.05014 | null |
| 2025-02-07 | A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning Enhanced Approach | Taiyi Wang et.al. | 2502.05001 | null |
| 2025-02-07 | Enhancing Pre-Trained Decision Transformers with Prompt-Tuning Bandits | Finn Rietz et.al. | 2502.04979 | null |
| 2025-02-07 | Towards Smarter Sensing: 2D Clutter Mitigation in RL-Driven Cognitive MIMO Radar | Adam Umra et.al. | 2502.04967 | null |
| 2025-02-07 | Fast Adaptive Anti-Jamming Channel Access via Deep Q Learning and Coarse-Grained Spectrum Prediction | Jianshu Zhang et.al. | 2502.04963 | null |
| 2025-02-06 | DexterityGen: Foundation Controller for Unprecedented Dexterity | Zhao-Heng Yin et.al. | 2502.04307 | null |
| 2025-02-06 | PILAF: Optimal Human Preference Sampling for Reward Modeling | Yunzhen Feng et.al. | 2502.04270 | null |
| 2025-02-06 | Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning | Wesley A. Suttle et.al. | 2502.04141 | null |
| 2025-02-06 | Simulating the Emergence of Differential Case Marking with Communicating Neural-Network Agents | Yuchen Lian et.al. | 2502.04038 | null |
| 2025-02-06 | Deep Meta Coordination Graphs for Multi-agent Reinforcement Learning | Nikunj Gupta et.al. | 2502.04028 | link |
| 2025-02-06 | Bilevel Multi-Armed Bandit-Based Hierarchical Reinforcement Learning for Interaction-Aware Self-Driving at Unsignalized Intersections | Zengqi Peng et.al. | 2502.03960 | null |
| 2025-02-06 | Fairness Aware Reinforcement Learning via Proximal Policy Optimization | Gabriele La Malfa et.al. | 2502.03953 | null |
| 2025-02-06 | CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning | Yousef Koka et.al. | 2502.03946 | null |
| 2025-02-06 | Mirror Descent Actor Critic via Bounded Advantage Learning | Ryo Iwaki et.al. | 2502.03854 | null |
| 2025-02-06 | PAGNet: Pluggable Adaptive Generative Networks for Information Completion in Multi-Agent Communication | Zhuohui Zhang et.al. | 2502.03845 | null |
| 2025-02-05 | Deep Reinforcement Learning-Based Optimization of Second-Life Battery Utilization in Electric Vehicles Charging Stations | Rouzbeh Haghighi et.al. | 2502.03412 | null |
| 2025-02-05 | Lightweight Authenticated Task Offloading in 6G-Cloud Vehicular Twin Networks | Sarah Al-Shareeda et.al. | 2502.03403 | null |
| 2025-02-05 | Energy-Efficient Flying LoRa Gateways: A Multi-Agent Reinforcement Learning Approach | Abdullahi Isa Ahmed et.al. | 2502.03377 | null |
| 2025-02-05 | Demystifying Long Chain-of-Thought Reasoning in LLMs | Edward Yeo et.al. | 2502.03373 | link |
| 2025-02-05 | Learning from Active Human Involvement through Proxy Value Propagation | Zhenghao Peng et.al. | 2502.03369 | null |
| 2025-02-05 | Conditional Prediction by Simulation for Automated Driving | Fabian Konstantinidis et.al. | 2502.03286 | null |
| 2025-02-05 | Calibrated Unsupervised Anomaly Detection in Multivariate Time-series using Reinforcement Learning | Saba Sanami et.al. | 2502.03245 | null |
| 2025-02-05 | Underwater Soft Fin Flapping Motion with Deep Neural Network Based Surrogate Model | Yuya Hamamatsu et.al. | 2502.03135 | null |
| 2025-02-05 | Double Distillation Network for Multi-Agent Reinforcement Learning | Yang Zhou et.al. | 2502.03125 | null |
| 2025-02-05 | HiLo: Learning Whole-Body Human-like Locomotion with Motion Tracking Controller | Qiyuan Zhang et.al. | 2502.03122 | null |
| 2025-02-04 | Flow Q-Learning | Seohong Park et.al. | 2502.02538 | null |
| 2025-02-04 | Brief analysis of DeepSeek R1 and its implications for Generative AI | Sarah Mercer et.al. | 2502.02523 | null |
| 2025-02-04 | Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | Maohao Shen et.al. | 2502.02508 | null |
| 2025-02-04 | Towards Fast Graph Generation via Autoregressive Noisy Filtration Modeling | Markus Krimmel et.al. | 2502.02415 | null |
| 2025-02-04 | Achieving Hiding and Smart Anti-Jamming Communication: A Parallel DRL Approach against Moving Reactive Jammer | Yangyang Li et.al. | 2502.02385 | null |
| 2025-02-04 | Circular Microalgae-Based Carbon Control for Net Zero | Federico Zocco et.al. | 2502.02382 | null |
| 2025-02-04 | Coreset-Based Task Selection for Sample-Efficient Meta-Reinforcement Learning | Donglin Zhan et.al. | 2502.02332 | null |
| 2025-02-04 | Policy-Guided Causal State Representation for Offline Reinforcement Learning Recommendation | Siyu Wang et.al. | 2502.02327 | null |
| 2025-02-04 | DIME: Diffusion-Based Maximum Entropy Reinforcement Learning | Onur Celik et.al. | 2502.02316 | null |
| 2025-02-04 | MAGNNET: Multi-Agent Graph Neural Network-based Efficient Task Allocation for Autonomous Vehicles with Deep Reinforcement Learning | Lavanya Ratnabala et.al. | 2502.02311 | null |
| 2025-01-31 | Vintix: Action Model via In-Context Reinforcement Learning | Andrey Polubarov et.al. | 2501.19400 | link |
| 2025-01-31 | The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking | Yuchun Miao et.al. | 2501.19358 | null |
| 2025-01-31 | Jackpot! Alignment as a Maximal Lottery | Roberto-Rafael Maura-Rivero et.al. | 2501.19266 | null |
| 2025-01-31 | Objective Metrics for Human-Subjects Evaluation in Explainable Reinforcement Learning | Balint Gyevnar et.al. | 2501.19256 | null |
| 2025-01-31 | Linear $Q$-Learning Does Not Diverge: Convergence Rates to a Bounded Set | Xinyu Liu et.al. | 2501.19254 | null |
| 2025-02-03 | SHARPIE: A Modular Framework for Reinforcement Learning and Human-AI Interaction Experiments | Hüseyin Aydın et.al. | 2501.19245 | null |
| 2025-01-31 | An Empirical Game-Theoretic Analysis of Autonomous Cyber-Defence Agents | Gregory Palmer et.al. | 2501.19206 | null |
| 2025-01-31 | APEX: Automated Parameter Exploration for Low-Power Wireless Protocols | Mohamed Hassaan M. Hydher et.al. | 2501.19194 | null |
| 2025-01-31 | Test-Time Training Scaling for Chemical Exploration in Drug Design | Morgan Thomas et.al. | 2501.19153 | null |
| 2025-01-31 | Decorrelated Soft Actor-Critic for Efficient Deep Reinforcement Learning | Burcu Küçükoğlu et.al. | 2501.19133 | null |
| 2025-01-30 | Design and Validation of Learning Aware HMI For Learning-Enabled Increasingly Autonomous Systems | Parth Ganeriwala et.al. | 2501.18506 | null |
| 2025-01-30 | Curriculum-based Sample Efficient Reinforcement Learning for Robust Stabilization of a Quadrotor | Fausto Mauricio Lagos Suarez et.al. | 2501.18490 | null |
| 2025-01-30 | Model-Free RL Agents Demonstrate System 1-Like Intentionality | Hal Ashton et.al. | 2501.18299 | null |
| 2025-01-30 | Neural Operator based Reinforcement Learning for Control of first-order PDEs with Spatially-Varying State Delay | Jiaqi Hu et.al. | 2501.18201 | null |
| 2025-01-30 | QNN-QRL: Quantum Neural Network Integrated with Quantum Reinforcement Learning for Quantum Key Distribution | Bikash K. Behera et.al. | 2501.18188 | null |
| 2025-01-30 | Investigating Tax Evasion Emergence Using Dual Large Language Model and Deep Reinforcement Learning Powered Agent-based Simulation | Teddy Lazebnik et.al. | 2501.18177 | null |
| 2025-01-30 | B3C: A Minimalist Approach to Offline Multi-Agent Reinforcement Learning | Woojun Kim et.al. | 2501.18138 | null |
| 2025-01-30 | Diverse Preference Optimization | Jack Lanchantin et.al. | 2501.18101 | null |
| 2025-01-30 | Reward Prediction Error Prioritisation in Experience Replay: The RPE-PER Method | Hoda Yamani et.al. | 2501.18093 | null |
| 2025-01-30 | DIAL: Distribution-Informed Adaptive Learning of Multi-Task Constraints for Safety-Critical Systems | Se-Wook Yoo et.al. | 2501.18086 | null |
| 2025-01-29 | From Sparse to Dense: Toddler-inspired Reward Transition in Goal-Oriented Reinforcement Learning | Junseok Park et.al. | 2501.17842 | null |
| 2025-01-29 | Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Haque Ishfaq et.al. | 2501.17827 | null |
| 2025-01-29 | Consensus Based Stochastic Control | Liyao Lyu et.al. | 2501.17801 | null |
| 2025-01-29 | CAMP in the Odyssey: Provably Robust Reinforcement Learning with Certified Radius Maximization | Derui Wang et.al. | 2501.17667 | link |
| 2025-01-29 | Accelerated DC loadflow solver for topology optimization | Nico Westerbeck et.al. | 2501.17529 | null |
| 2025-01-29 | Human-Aligned Skill Discovery: Balancing Behaviour Exploration and Alignment | Maxence Hussonnois et.al. | 2501.17431 | null |
| 2025-01-29 | Certificated Actor-Critic: Hierarchical Reinforcement Learning with Control Barrier Functions for Safe Navigation | Junjun Xie et.al. | 2501.17424 | null |
| 2025-01-29 | Value Function Decomposition in Markov Recommendation Process | Xiaobei Wang et.al. | 2501.17409 | null |
| 2025-01-29 | A Dual-Agent Adversarial Framework for Robust Generalization in Deep Reinforcement Learning | Zhengpeng Xie et.al. | 2501.17384 | null |
| 2025-01-29 | ASAP: Learning Generalizable Online Bin Packing via Adaptive Selection After Pruning | Han Fang et.al. | 2501.17377 | null |
| 2025-01-28 | SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | Tianzhe Chu et.al. | 2501.17161 | null |
| 2025-01-28 | Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning | Rémy Hosseinkhan Boucher et.al. | 2501.17115 | null |
| 2025-01-28 | Unlocking Transparent Alignment Through Enhanced Inverse Constitutional AI for Principle Extraction | Carl-Leander Henneking et.al. | 2501.17112 | null |
| 2025-01-28 | COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models | Tobias Materzok et.al. | 2501.17104 | null |
| 2025-01-28 | Learning Mean Field Control on Sparse Graphs | Christian Fabian et.al. | 2501.17079 | null |
| 2025-01-28 | Induced Modularity and Community Detection for Functionally Interpretable Reinforcement Learning | Anna Soligo et.al. | 2501.17077 | null |
| 2025-01-28 | Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies | Manojkumar Parmar et.al. | 2501.17030 | null |
| 2025-01-28 | Network Slice-based Low-Altitude Intelligent Network for Advanced Air Mobility | Kai Xiong et.al. | 2501.17014 | null |
| 2025-01-28 | Heterogeneity-aware Personalized Federated Learning via Adaptive Dual-Agent Reinforcement Learning | Xi Chen et.al. | 2501.16966 | null |
| 2025-01-28 | On Rollouts in Model-Based Reinforcement Learning | Bernd Frauenknecht et.al. | 2501.16918 | link |
| 2025-01-27 | Upside Down Reinforcement Learning with Policy Generators | Jacopo Di Ventura et.al. | 2501.16288 | link |
| 2025-01-27 | Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach | Yang Xu et.al. | 2501.16243 | null |
| 2025-01-27 | Towards General-Purpose Model-Free Reinforcement Learning | Scott Fujimoto et.al. | 2501.16142 | link |
| 2025-01-27 | Quantifying the Self-Interest Level of Markov Social Dilemmas | Richard Willis et.al. | 2501.16138 | null |
| 2025-01-27 | ReFill: Reinforcement Learning for Fill-In Minimization | Elfarouk Harb et.al. | 2501.16130 | null |
| 2025-01-27 | Multi-Agent Meta-Offline Reinforcement Learning for Timely UAV Path Planning and Data Collection | Eslam Eldeeb et.al. | 2501.16098 | null |
| 2025-01-27 | Flexible Blood Glucose Control: Offline Reinforcement Learning from Human Feedback | Harry Emerson et.al. | 2501.15972 | null |
| 2025-01-27 | REINFORCE-ING Chemical Language Models in Drug Design | Morgan Thomas et.al. | 2501.15971 | null |
| 2025-01-27 | Inverse Reinforcement Learning via Convex Optimization | Hao Zhu et.al. | 2501.15957 | null |
| 2025-01-27 | Generative AI for Lyapunov Optimization Theory in UAV-based Low-Altitude Economy Networking | Zhang Liu et.al. | 2501.15928 | null |
| 2025-01-24 | An Attentive Graph Agent for Topology-Adaptive Cyber Defence | Ilya Orson Sandoval et.al. | 2501.14700 | link |
| 2025-01-24 | ACT-JEPA: Joint-Embedding Predictive Architecture Improves Policy Representation Learning | Aleksandar Vujinovic et.al. | 2501.14622 | null |
| 2025-01-24 | COMIX: Generalized Conflict Management in O-RAN xApps – Architecture, Workflow, and a Power Control case | Anastasios Giannopoulos et.al. | 2501.14619 | null |
| 2025-01-24 | Age and Power Minimization via Meta-Deep Reinforcement Learning in UAV Networks | Sankani Sarathchandra et.al. | 2501.14603 | null |
| 2025-01-24 | Reducing Action Space for Deep Reinforcement Learning via Causal Effect Estimation | Wenzhang Liu et.al. | 2501.14543 | link |
| 2025-01-24 | Breaking the Pre-Planning Barrier: Real-Time Adaptive Coordination of Mission and Charging UAVs Using Graph Reinforcement Learning | Yuhan Hu et.al. | 2501.14488 | null |
| 2025-01-24 | MARL-OT: Multi-Agent Reinforcement Learning Guided Online Fuzzing to Detect Safety Violation in Autonomous Driving Systems | Linfeng Liang et.al. | 2501.14451 | null |
| 2025-01-24 | Learning more with the same effort: how randomization improves the robustness of a robotic deep reinforcement learning agent | Lucía Güitta-López et.al. | 2501.14443 | null |
| 2025-01-24 | SKIL: Semantic Keypoint Imitation Learning for Generalizable Data-efficient Manipulation | Shengjie Wang et.al. | 2501.14400 | null |
| 2025-01-24 | Reinforcement Learning for Efficient Returns Management | Pascal Linden et.al. | 2501.14394 | null |
| 2025-01-23 | CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation | Guofeng Cui et.al. | 2501.13927 | null |
| 2025-01-23 | Improving Video Generation with Human Feedback | Jie Liu et.al. | 2501.13918 | link |
| 2025-01-23 | GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration | Yue Fan et.al. | 2501.13896 | null |
| 2025-01-23 | Utilizing Evolution Strategies to Train Transformers in Reinforcement Learning | Matyáš Lorenc et.al. | 2501.13883 | link |
| 2025-01-23 | A space-decoupling framework for optimization on bounded-rank matrices with orthogonally invariant constraints | Yan Yang et.al. | 2501.13830 | null |
| 2025-01-23 | Large Language Model driven Policy Exploration for Recommender Systems | Jie Wang et.al. | 2501.13816 | null |
| 2025-01-23 | Integrating Causality with Neurochaos Learning: Proposed Approach and Research Agenda | Nanjangud C. Narendra et.al. | 2501.13763 | null |
| 2025-01-23 | Scalable Safe Multi-Agent Reinforcement Learning for Multi-Agent System | Haikuo Du et.al. | 2501.13727 | null |
| 2025-01-23 | WFCRL: A Multi-Agent Reinforcement Learning Benchmark for Wind Farm Control | Claire Bizon Monroc et.al. | 2501.13592 | link |
| 2025-01-23 | Explainable AI-aided Feature Selection and Model Reduction for DRL-based V2X Resource Allocation | Nasir Khan et.al. | 2501.13552 | null |
| 2025-01-22 | Which Sensor to Observe? Timely Tracking of a Joint Markov Source with Model Predictive Control | Ismail Cosandal et.al. | 2501.13099 | null |
| 2025-01-22 | Attention-Driven Hierarchical Reinforcement Learning with Particle Filtering for Source Localization in Dynamic Fields | Yiwei Shi et.al. | 2501.13084 | null |
| 2025-01-22 | Evolution and The Knightian Blindspot of Machine Learning | Joel Lehman et.al. | 2501.13075 | null |
| 2025-01-22 | AdaWM: Adaptive World Model based Planning for Autonomous Driving | Hang Wang et.al. | 2501.13072 | null |
| 2025-01-22 | Optimizing Return Distributions with Distributional Dynamic Programming | Bernardo Ávila Pires et.al. | 2501.13028 | null |
| 2025-01-22 | MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking | Sebastian Farquhar et.al. | 2501.13011 | null |
| 2025-01-22 | An Offline Multi-Agent Reinforcement Learning Framework for Radio Resource Management | Eslam Eldeeb et.al. | 2501.12991 | null |
| 2025-01-22 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | DeepSeek-AI et.al. | 2501.12948 | link |
| 2025-01-22 | Offline Critic-Guided Diffusion Policy for Multi-User Delay-Constrained Scheduling | Zhuoran Li et.al. | 2501.12942 | null |
| 2025-01-22 | Reinforcement learning Based Automated Design of Differential Evolution Algorithm for Black-box Optimization | Xu Yang et.al. | 2501.12881 | null |
| 2025-01-21 | InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model | Yuhang Zang et.al. | 2501.12368 | link |
| 2025-01-21 | ARM-IRL: Adaptive Resilience Metric Quantification Using Inverse Reinforcement Learning | Abhijeet Sahu et.al. | 2501.12362 | null |
| 2025-01-21 | Sum Rate Enhancement using Machine Learning for Semi-Self Sensing Hybrid RIS-Enabled ISAC in THz Bands | Sara Farrag Mobarak et.al. | 2501.12353 | null |
| 2025-01-21 | Towards neural reinforcement learning for large deviations in nonequilibrium systems with memory | Venkata D. Pamulaparthy et.al. | 2501.12333 | null |
| 2025-01-21 | Heuristic Deep Reinforcement Learning for Phase Shift Optimization in RIS-assisted Secure Satellite Communication Systems with RSMA | Tingnan Bao et.al. | 2501.12311 | null |
| 2025-01-21 | RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression | Uri Gadot et.al. | 2501.12216 | null |
| 2025-01-21 | Experience-replay Innovative Dynamics | Tuo Zhang et.al. | 2501.12199 | null |
| 2025-01-21 | Extend Adversarial Policy Against Neural Machine Translation via Unknown Token | Wei Zou et.al. | 2501.12183 | null |
| 2025-01-21 | DNRSelect: Active Best View Selection for Deferred Neural Rendering | Dongli Wu et.al. | 2501.12150 | null |
| 2025-01-21 | Tackling Uncertainties in Multi-Agent Reinforcement Learning through Integration of Agent Termination Dynamics | Somnath Hazra et.al. | 2501.12061 | link |
| 2025-01-17 | DexForce: Extracting Force-informed Actions from Kinesthetic Demonstrations for Dexterous Manipulation | Claire Chen et.al. | 2501.10356 | null |
| 2025-01-17 | Enhancing AI Transparency: XRL-Based Resource Management and RAN Slicing for 6G ORAN Architecture | Suvidha Mhatre et.al. | 2501.10292 | null |
| 2025-01-17 | Enhancing UAV Path Planning Efficiency Through Accelerated Learning | Joseanne Viana et.al. | 2501.10141 | null |
| 2025-01-17 | Spatio-temporal Graph Learning on Adaptive Mined Key Frames for High-performance Multi-Object Tracking | Futian Wang et.al. | 2501.10129 | null |
| 2025-01-17 | PaSa: An LLM Agent for Comprehensive Academic Paper Search | Yichen He et.al. | 2501.10120 | link |
| 2025-01-17 | GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning | Zifeng Shi et.al. | 2501.10116 | null |
| 2025-01-17 | Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics | Chenhao Li et.al. | 2501.10100 | null |
| 2025-01-17 | ForestProtector: An IoT Architecture Integrating Machine Vision and Deep Reinforcement Learning for Efficient Wildfire Monitoring | Kenneth Bonilla-Ormachea et.al. | 2501.09926 | null |
| 2025-01-17 | SLIM: Sim-to-Real Legged Instructive Manipulation via Long-Horizon Visuomotor Learning | Haichao Zhang et.al. | 2501.09905 | null |
| 2025-01-16 | From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation | Peilang Li et.al. | 2501.09858 | null |
| 2025-01-16 | Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models | Fengli Xu et.al. | 2501.09686 | null |
| 2025-01-16 | Optimizing hypergraph product codes with random walks, simulated annealing and reinforcement learning | Bruno C. A. Freire et.al. | 2501.09622 | null |
| 2025-01-16 | Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment | Chaoqi Wang et.al. | 2501.09620 | null |
| 2025-01-16 | EVaDE : Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning | Siddharth Aravindan et.al. | 2501.09611 | null |
| 2025-01-16 | RE-POSE: Synergizing Reinforcement Learning-Based Partitioning and Offloading for Edge Object Detection | Jianrui Shi et.al. | 2501.09465 | null |
| 2025-01-16 | ADAGE: A generic two-layer framework for adaptive agent based modelling | Benjamin Patrick Evans et.al. | 2501.09429 | null |
| 2025-01-16 | Fast Searching of Extreme Operating Conditions for Relay Protection Setting Calculation Based on Graph Neural Network and Reinforcement Learning | Yan Li et.al. | 2501.09399 | null |
| 2025-01-16 | Contract-Inspired Contest Theory for Controllable Image Generation in Mobile Edge Metaverse | Guangyuan Liu et.al. | 2501.09391 | null |
| 2025-01-16 | Adaptive Contextual Caching for Mobile Edge Large Language Model Service | Guangyuan Liu et.al. | 2501.09383 | null |
| 2025-01-16 | Solving Infinite-Player Games with Player-to-Strategy Networks | Carlos Martin et.al. | 2501.09330 | null |
| 2025-01-15 | Computing Approximated Fixpoints via Dampened Mann Iteration | Paolo Baldan et.al. | 2501.08950 | null |
| 2025-01-15 | A Reinforcement Learning Approach to Quiet and Safe UAM Traffic Management | Surya Murthy et.al. | 2501.08941 | null |
| 2025-01-15 | Reinforcement learning-based adaptive time-integration for nonsmooth dynamics | David Riley et.al. | 2501.08934 | null |
| 2025-01-15 | Projection Implicit Q-Learning with Support Constraint for Offline Reinforcement Learning | Xinchen Han et.al. | 2501.08907 | null |
| 2025-01-15 | Deep Learning Meets Queue-Reactive: A Framework for Realistic Limit Order Book Simulation | Hamza Bodor et.al. | 2501.08822 | null |
| 2025-01-15 | Multi-visual modality micro drone-based structural damage detection | Isaac Osei Agyemanga et.al. | 2501.08807 | null |
| 2025-01-15 | Networked Agents in the Dark: Team Value Learning under Partial Observability | Guilherme S. Varela et.al. | 2501.08778 | null |
| 2025-01-15 | SPEQ: Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning | Carlo Romeo et.al. | 2501.08669 | null |
| 2025-01-15 | Application of Deep Reinforcement Learning to UAV Swarming for Ground Surveillance | Raúl Arranz et.al. | 2501.08655 | null |
| 2025-01-15 | RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation | Kaiqu Liang et.al. | 2501.08617 | null |
| 2025-01-14 | FDPP: Fine-tune Diffusion Policy with Human Preference | Yuxin Chen et.al. | 2501.08259 | null |
| 2025-01-14 | Dynamic Pricing in High-Speed Railways Using Multi-Agent Reinforcement Learning | Enrique Adrian Villarrubia-Martin et.al. | 2501.08234 | null |
| 2025-01-14 | Optimization of Link Configuration for Satellite Communication Using Reinforcement Learning | Tobias Rohe et.al. | 2501.08220 | null |
| 2025-01-14 | In-situ graph reasoning and knowledge expansion using Graph-PReFLexOR | Markus J. Buehler et.al. | 2501.08120 | null |
| 2025-01-14 | Data-driven inventory management for new products: A warm-start and adjusted Dyna-$Q$ approach | Xinyu Qu et.al. | 2501.08109 | null |
| 2025-01-14 | Hybrid Action Based Reinforcement Learning for Multi-Objective Compatible Autonomous Driving | Guizhe Jin et.al. | 2501.08096 | null |
| 2025-01-14 | CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning | Guoliang He et.al. | 2501.08071 | null |
| 2025-01-14 | Continual Reinforcement Learning for Digital Twin Synchronization Optimization | Haonan Tong et.al. | 2501.08045 | null |
| 2025-01-14 | READ: Reinforcement-based Adversarial Learning for Text Classification with Limited Labeled Data | Rohit Sharma et.al. | 2501.08035 | null |
| 2025-01-14 | Cooperative Patrol Routing: Optimizing Urban Crime Surveillance through Multi-Agent Reinforcement Learning | Juan Palma-Borda et.al. | 2501.08020 | null |
| 2025-01-13 | SafeSwarm: Decentralized Safe RL for the Swarm of Drones Landing in Dense Crowds | Grik Tadevosyan et.al. | 2501.07566 | null |
| 2025-01-13 | Improving DeFi Accessibility through Efficient Liquidity Provisioning with Deep Reinforcement Learning | Haonan Xu et.al. | 2501.07508 | null |
| 2025-01-13 | RbRL2.0: Integrated Reward and Policy Learning for Rating-based Reinforcement Learning | Mingkang Wu et.al. | 2501.07502 | null |
| 2025-01-13 | Online inductive learning from answer sets for efficient reinforcement learning exploration | Celeste Veronese et.al. | 2501.07445 | null |
| 2025-01-13 | Attention when you need | Lokesh Boominathan et.al. | 2501.07440 | null |
| 2025-01-13 | Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data | Shilong Deng et.al. | 2501.07346 | link |
| 2025-01-13 | Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring | Buse Sibel Korkmaz et.al. | 2501.07324 | link |
| 2025-01-13 | Mining Intraday Risk Factor Collections via Hierarchical Reinforcement Learning based on Transferred Options | Wenyan Xu et.al. | 2501.07274 | null |
| 2025-01-13 | Future-Conditioned Recommendations with Multi-Objective Controllable Decision Transformer | Chongming Gao et.al. | 2501.07212 | null |
| 2025-01-13 | Generalizable Graph Neural Networks for Robust Power Grid Topology Control | Matthijs de Jong et.al. | 2501.07186 | null |
| 2025-01-10 | From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training | Julius Berner et.al. | 2501.06148 | link |
| 2025-01-10 | Vehicle-in-Virtual-Environment (VVE) Based Autonomous Driving Function Development and Evaluation Methodology for Vulnerable Road User Safety | Haochong Chen et.al. | 2501.06113 | null |
| 2025-01-10 | Learning Flexible Heterogeneous Coordination with Capability-Aware Shared Hypernetworks | Kevin Fu et.al. | 2501.06058 | null |
| 2025-01-10 | Investigating the Impact of Observation Space Design Choices On Training Reinforcement Learning Solutions for Spacecraft Problems | Nathaniel Hamilton et.al. | 2501.06016 | null |
| 2025-01-10 | The Safe Trusted Autonomy for Responsible Space Program | Kerianne L. Hobbs et.al. | 2501.05984 | null |
| 2025-01-10 | A Practical Demonstration of DRL-Based Dynamic Resource Allocation xApp Using OpenAirInterface | Onur Sever et.al. | 2501.05879 | null |
| 2025-01-10 | Diffusion Models for Smarter UAVs: Decision-Making and Modeling | Yousef Emami et.al. | 2501.05819 | null |
| 2025-01-10 | Real-Time Integrated Dispatching and Idle Fleet Steering with Deep Reinforcement Learning for A Meal Delivery Platform | Jingyi Cheng et.al. | 2501.05808 | null |
| 2025-01-10 | Understanding Impact of Human Feedback via Influence Functions | Taywon Min et.al. | 2501.05790 | link |
| 2025-01-09 | Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning | Tao Liu et.al. | 2501.05591 | null |
| 2025-01-09 | TimeRL: Efficient Deep Reinforcement Learning with Polyhedral Dependence Graphs | Pedro F. Silvestre et.al. | 2501.05408 | null |
| 2025-01-09 | Search-o1: Agentic Search-Enhanced Large Reasoning Models | Xiaoxi Li et.al. | 2501.05366 | link |
| 2025-01-09 | Knowledge Transfer in Model-Based Reinforcement Learning Agents for Efficient Multi-Task Learning | Dmytro Kuzmenko et.al. | 2501.05329 | null |
| 2025-01-09 | Design and Control of a Bipedal Robotic Character | Ruben Grandia et.al. | 2501.05204 | null |
| 2025-01-09 | Constrained Optimization of Charged Particle Tracking with Multi-Agent Reinforcement Learning | Tobias Kortus et.al. | 2501.05113 | null |
| 2025-01-09 | LearningFlow: Automated Policy Learning Workflow for Urban Driving with Large Language Models | Zengqi Peng et.al. | 2501.05057 | null |
| 2025-01-09 | CuRLA: Curriculum Learning Based Deep Reinforcement Learning for Autonomous Driving | Bhargava Uppuluri et.al. | 2501.04982 | null |
| 2025-01-09 | Promoting Shared Energy Storage Aggregation among High Price-Tolerance Prosumer: An Incentive Deposit and Withdrawal Service | Xin Lu et.al. | 2501.04964 | null |
| 2025-01-09 | Balancing Exploration and Cybersickness: Investigating Curiosity-Driven Behavior in Virtual Environments | Tangyao Li et.al. | 2501.04905 | null |
| 2025-01-08 | Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning | Sergio Rozada et.al. | 2501.04879 | null |
| 2025-01-08 | Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought | Violet Xiang et.al. | 2501.04682 | null |
| 2025-01-08 | Framework for Integrating Machine Learning Methods for Path-Aware Source Routing | Anees Al-Najjar et.al. | 2501.04624 | null |
| 2025-01-08 | MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data | Zifan Wang et.al. | 2501.04595 | null |
| 2025-01-08 | HypeRL: Parameter-Informed Reinforcement Learning for Parametric PDEs | Nicolò Botteghi et.al. | 2501.04538 | null |
| 2025-01-08 | Safe Reinforcement Learning with Minimal Supervision | Alexander Quessy et.al. | 2501.04481 | null |
| 2025-01-08 | Research on environment perception and behavior prediction of intelligent UAV based on semantic communication | Kechong Ren et.al. | 2501.04480 | null |
| 2025-01-08 | Hybrid Artificial Intelligence Strategies for Drone Navigation | Rubén San-Segundo et.al. | 2501.04472 | null |
| 2025-01-08 | Risk-averse policies for natural gas futures trading using distributional reinforcement learning | Félicien Hêche et.al. | 2501.04421 | null |
| 2025-01-08 | Constraints as Rewards: Reinforcement Learning for Robots without Reward Functions | Yu Ishihara et.al. | 2501.04228 | null |
| 2025-01-07 | Explainable Reinforcement Learning via Temporal Policy Decomposition | Franco Ruggeri et.al. | 2501.03902 | null |
| 2025-01-07 | Neural DNF-MT: A Neuro-symbolic Approach for Learning Interpretable and Editable Policies | Kexin Gu Baugh et.al. | 2501.03888 | null |
| 2025-01-07 | AlphaPO – Reward shape matters for LLM alignment | Aman Gupta et.al. | 2501.03884 | null |
| 2025-01-07 | Online Reinforcement Learning-Based Dynamic Adaptive Evaluation Function for Real-Time Strategy Tasks | Weilong Yang et.al. | 2501.03824 | null |
| 2025-01-07 | Run-and-tumble chemotaxis using reinforcement learning | Ramesh Pramanik et.al. | 2501.03687 | null |
| 2025-01-07 | IEEE 802.11bn Multi-AP Coordinated Spatial Reuse with Hierarchical Multi-Armed Bandits | Maksymilian Wojnar et.al. | 2501.03680 | null |
| 2025-01-07 | SALE-Based Offline Reinforcement Learning with Ensemble Q-Networks | Zheng Chun et.al. | 2501.03676 | null |
| 2025-01-07 | Imitation Learning of MPC with Neural Networks: Error Guarantees and Sparsification | Hendrik Alsmeier et.al. | 2501.03671 | null |
| 2025-01-07 | Rethinking Adversarial Attacks in Reinforcement Learning from Policy Distribution Perspective | Tianyang Duan et.al. | 2501.03562 | null |
| 2025-01-07 | Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment | Prashant Trivedi et.al. | 2501.03486 | null |
| 2025-01-06 | Turn-based Multi-Agent Reinforcement Learning Model Checking | Dennis Gross et.al. | 2501.03187 | null |
| 2025-01-06 | Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies | Dennis Gross et.al. | 2501.03142 | null |
| 2025-01-06 | CALM: Curiosity-Driven Auditing for Large Language Models | Xiang Zheng et.al. | 2501.02997 | null |
| 2025-01-06 | CAMP: Collaborative Attention Model with Profiles for Vehicle Routing Problems | Chuanbo Hua et.al. | 2501.02977 | null |
| 2025-01-06 | Sim-to-Real Transfer for Mobile Robots with Reinforcement Learning: from NVIDIA Isaac Sim to Gazebo and Real ROS 2 Robots | Sahar Salimpour et.al. | 2501.02902 | link |
| 2025-01-06 | Revisiting Communication Efficiency in Multi-Agent Reinforcement Learning from the Dimensional Analysis Perspective | Chuxiong Sun et.al. | 2501.02888 | null |
| 2025-01-06 | First-place Solution for Streetscape Shop Sign Recognition Competition | Bin Wang et.al. | 2501.02811 | null |
| 2025-01-06 | Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model | Yueqin Yin et.al. | 2501.02790 | null |
| 2025-01-06 | Joint Optimization of UAV-Carried IRS for Urban Low Altitude mmWave Communications with Deep Reinforcement Learning | Wenwen Xie et.al. | 2501.02787 | null |
| 2025-01-06 | Learn A Flexible Exploration Model for Parameterized Action Markov Decision Processes | Zijian Wang et.al. | 2501.02774 | null |
| 2025-01-03 | Evaluating Scenario-based Decision-making for Interactive Autonomous Driving Using Rational Criteria: A Survey | Zhen Tian et.al. | 2501.01886 | null |
| 2025-01-03 | Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models | Yanjiang Liu et.al. | 2501.01830 | null |
| 2025-01-03 | Genetic algorithm enhanced Solovay-Kitaev algorithm for quantum compiling | Jiangwei Long et.al. | 2501.01746 | null |
| 2025-01-03 | Proposing Hierarchical Goal-Conditioned Policy Planning in Multi-Goal Reinforcement Learning | Gavin B. Rens et.al. | 2501.01727 | null |
| 2025-01-03 | Inversely Learning Transferable Rewards via Abstracted States | Yikang Gui et.al. | 2501.01669 | null |
| 2025-01-03 | BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems | Yinbo Yu et.al. | 2501.01593 | null |
| 2025-01-02 | Reinforcement-learning-based control of turbulent channel flows at high Reynolds numbers | Zisong Zhou et.al. | 2501.01573 | null |
| 2025-01-02 | Reinforcement Learning for Respondent-Driven Sampling | Justin Weltz et.al. | 2501.01505 | null |
| 2025-01-02 | Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension | Yanbo Fang et.al. | 2501.01332 | null |
| 2025-01-02 | Towards Intelligent Antenna Positioning: Leveraging DRL for FAS-Aided ISAC Systems | Shunxing Yang et.al. | 2501.01281 | null |
| 2025-01-02 | PIMAEX: Multi-Agent Exploration through Peer Incentivization | Michael Kölle et.al. | 2501.01266 | null |
| 2025-01-02 | Embodied AI-Enhanced Vehicular Networks: An Integrated Large Language Models and Reinforcement Learning Method | Ruichen Zhang et.al. | 2501.01141 | null |
| 2025-01-02 | Communicating Unexpectedness for Out-of-Distribution Multi-Agent Reinforcement Learning | Min Whoo Lee et.al. | 2501.01140 | null |
| 2025-01-02 | Symmetries-enhanced Multi-Agent Reinforcement Learning | Nikolaos Bousias et.al. | 2501.01136 | null |
| 2025-01-02 | Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning | Chenglu Sun et.al. | 2501.01085 | null |
| 2025-01-02 | Enhancing Neural Adaptive Wireless Video Streaming via Lower-Layer Information Exposure and Online Tuning | Lingzhi Zhao et.al. | 2501.01044 | null |
| 2025-01-02 | Energy-Efficient and Intelligent ISAC in V2X Networks with Spiking Neural Networks-Driven DRL | Chen Shang et.al. | 2501.01038 | null |
| 2025-01-02 | Deep Reinforcement Learning for Job Scheduling and Resource Management in Cloud Computing: An Algorithm-Level Review | Yan Gu et.al. | 2501.01007 | null |
| 2024-12-30 | Advances in Multi-agent Reinforcement Learning: Persistent Autonomy and Robot Learning Lab Report 2024 | Reza Azadeh et.al. | 2412.21088 | null |
| 2024-12-30 | Learning Epidemiological Dynamics via the Finite Expression Method | Jianda Du et.al. | 2412.21049 | null |
| 2024-12-30 | Weber-Fechner Law in Temporal Difference learning derived from Control as Inference | Keiichiro Takahashi et.al. | 2412.21004 | null |
| 2024-12-30 | LEASE: Offline Preference-based Reinforcement Learning with High Sample Efficiency | Xiao-Yin Liu et.al. | 2412.21001 | link |
| 2024-12-30 | UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI | Fangwei Zhong et.al. | 2412.20977 | null |
| 2024-12-30 | Data-Based Efficient Off-Policy Stabilizing Optimal Control Algorithms for Discrete-Time Linear Systems via Damping Coefficients | Dongdong Li et.al. | 2412.20845 | null |
| 2024-12-30 | Isoperimetry is All We Need: Langevin Posterior Sampling for RL with Sublinear Regret | Emilio Jorge et.al. | 2412.20824 | null |
| 2024-12-29 | The intrinsic motivation of reinforcement and imitation learning for sequential tasks | Sao Mai Nguyen et.al. | 2412.20573 | null |
| 2024-12-29 | Diminishing Return of Value Expansion Methods | Daniel Palenicek et.al. | 2412.20537 | link |
| 2024-12-29 | Game Theory and Multi-Agent Reinforcement Learning : From Nash Equilibria to Evolutionary Dynamics | Neil De La Fuente et.al. | 2412.20523 | null |
| 2024-12-27 | From Ceilings to Walls: Universal Dynamic Perching of Small Aerial Robots on Surfaces with Variable Orientations | Bryan Habas et.al. | 2412.19765 | null |
| 2024-12-27 | Adaptive Context-Aware Multi-Path Transmission Control for VR/AR Content: A Deep Reinforcement Learning Approach | Shakil Ahmed et.al. | 2412.19737 | null |
| 2024-12-27 | Goal-oriented Communications based on Recursive Early Exit Neural Networks | Jary Pomponi et.al. | 2412.19587 | null |
| 2024-12-27 | Graph-attention-based Casual Discovery with Trust Region-navigated Clipping Policy Optimization | Shixuan Liu et.al. | 2412.19578 | null |
| 2024-12-27 | Reinforced Label Denoising for Weakly-Supervised Audio-Visual Video Parsing | Yongbiao Gao et.al. | 2412.19563 | null |
| 2024-12-27 | Scalable Hierarchical Reinforcement Learning for Hyper Scale Multi-Robot Task Planning | Xuan Zhou et.al. | 2412.19538 | null |
| 2024-12-27 | An Overview of Machine Learning-Driven Resource Allocation in IoT Networks | Zhengdong Li et.al. | 2412.19478 | null |
| 2024-12-27 | DeepSeek-V3 Technical Report | DeepSeek-AI et.al. | 2412.19437 | link |
| 2024-12-27 | Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback | Seong Jin Lee et.al. | 2412.19436 | null |
| 2024-12-27 | Comparing Few to Rank Many: Active Human Preference Learning using Randomized Frank-Wolfe | Kiran Koshy Thekumparampil et.al. | 2412.19396 | null |
| 2024-12-24 | Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making | David Shoresh et.al. | 2412.18593 | null |
| 2024-12-24 | Dynamic Optimization of Portfolio Allocation Using Deep Reinforcement Learning | Gang Huang et.al. | 2412.18563 | link |
| 2024-12-24 | Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving | Hao Pang et.al. | 2412.18511 | null |
| 2024-12-24 | Joint Adaptive OFDM and Reinforcement Learning Design for Autonomous Vehicles: Leveraging Age of Updates | Mamady Delamou et.al. | 2412.18500 | null |
| 2024-12-24 | Contrastive Representation for Interactive Recommendation | Jingyu Li et.al. | 2412.18396 | link |
| 2024-12-24 | Navigating Data Corruption in Machine Learning: Balancing Quality, Quantity, and Imputation Strategies | Qi Liu et.al. | 2412.18296 | null |
| 2024-12-24 | Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization | Jiacai Liu et.al. | 2412.18279 | null |
| 2024-12-24 | Accelerating AIGC Services with Latent Action Diffusion Scheduling in Edge Networks | Changfu Xu et.al. | 2412.18212 | link |
| 2024-12-24 | Quantum framework for Reinforcement Learning: integrating Markov Decision Process, quantum arithmetic, and trajectory search | Thet Htar Su et.al. | 2412.18208 | null |
| 2024-12-24 | Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models | Xiaomeng Hu et.al. | 2412.18171 | null |
| 2024-12-23 | HyperQ-Opt: Q-learning for Hyperparameter Optimization | Md. Tarek Hasan et.al. | 2412.17765 | null |
| 2024-12-23 | Mimicking-Bench: A Benchmark for Generalizable Humanoid-Scene Interaction Learning via Human Mimicking | Yun Liu et.al. | 2412.17730 | null |
| 2024-12-23 | SMAC-Hard: Enabling Mixed Opponent Strategy Script and Self-play on SMAC | Yue Deng et.al. | 2412.17707 | link |
| 2024-12-23 | Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | Huchen Jiang et.al. | 2412.17397 | null |
| 2024-12-23 | Reinforcement Learning with a Focus on Adjusting Policies to Reach Targets | Akane Tsuboya et.al. | 2412.17344 | null |
| 2024-12-23 | Multimodal Deep Reinforcement Learning for Portfolio Optimization | Sumit Nawathe et.al. | 2412.17293 | null |
| 2024-12-23 | LMD-PGN: Cross-Modal Knowledge Distillation from First-Person-View Images to Third-Person-View BEV Maps for Universal Point Goal Navigation | Riku Uemura et.al. | 2412.17282 | null |
| 2024-12-23 | ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models | Chengran Yang et.al. | 2412.17264 | null |
| 2024-12-23 | A Coalition Game for On-demand Multi-modal 3D Automated Delivery System | Farzan Moosavi et.al. | 2412.17252 | null |
| 2024-12-23 | Model-free stochastic linear quadratic design by semidefinite programming | Jing Guo et.al. | 2412.17230 | null |
| 2024-12-20 | Offline Reinforcement Learning for LLM Multi-Step Reasoning | Huaijie Wang et.al. | 2412.16145 | null |
| 2024-12-20 | APIRL: Deep Reinforcement Learning for REST API Fuzzing | Myles Foley et.al. | 2412.15991 | link |
| 2024-12-20 | Active Flow Control for Bluff Body under High Reynolds Number Turbulent Flow Conditions Using Deep Reinforcement Learning | Jingbo Chen et.al. | 2412.15975 | null |
| 2024-12-20 | From General to Specific: Tailoring Large Language Models for Personalized Healthcare | Ruize Shi et.al. | 2412.15957 | null |
| 2024-12-20 | What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning | Yiran Ma et.al. | 2412.15904 | null |
| 2024-12-20 | Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback | Jiaming Ji et.al. | 2412.15838 | link |
| 2024-12-20 | MacLight: Multi-scene Aggregation Convolutional Learning for Traffic Signal Control | Sunbowen Lee et.al. | 2412.15703 | link |
| 2024-12-20 | AIR: Unifying Individual and Cooperative Exploration in Collective Multi-Agent Reinforcement Learning | Guangchong Zhou et.al. | 2412.15700 | link |
| 2024-12-20 | Tacit Learning with Adaptive Information Selection for Cooperative Multi-Agent Reinforcement Learning | Lunjun Liu et.al. | 2412.15639 | null |
| 2024-12-20 | Dexterous Manipulation Based on Prior Dexterous Grasp Pose Knowledge | Hengxu Yan et.al. | 2412.15587 | null |
| 2024-12-19 | Qwen2.5 Technical Report | Qwen et.al. | 2412.15115 | null |
| 2024-12-19 | Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination | Leonardo Barcellona et.al. | 2412.14957 | null |
| 2024-12-19 | Effective Method with Compression for Distributed and Federated Cocoercive Variational Inequalities | Daniil Medyakov et.al. | 2412.14935 | null |
| 2024-12-19 | Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning | Anthony Kobanda et.al. | 2412.14865 | null |
| 2024-12-19 | Entropy Regularized Task Representation Learning for Offline Meta-Reinforcement Learning | Mohammadreza Nakhaei et.al. | 2412.14834 | link |
| 2024-12-19 | Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning | Aditya Kapoor et.al. | 2412.14779 | null |
| 2024-12-19 | Learning to Generate Research Idea with Dynamic Control | Ruochen Li et.al. | 2412.14626 | null |
| 2024-12-19 | Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues | Tao He et.al. | 2412.14584 | null |
| 2024-12-19 | Single-Loop Federated Actor-Critic across Heterogeneous Environments | Ye Zhu et.al. | 2412.14555 | null |
| 2024-12-18 | Implementing TD3 to train a Neural Network to fly a Quadcopter through an FPV Gate | Patrick Thomas et.al. | 2412.14367 | null |
| 2024-12-18 | Learning from Massive Human Videos for Universal Humanoid Pose Control | Jiageng Mao et.al. | 2412.14172 | null |
| 2024-12-18 | Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective | Zhiyuan Zeng et.al. | 2412.14135 | null |
| 2024-12-18 | Alignment faking in large language models | Ryan Greenblatt et.al. | 2412.14093 | link |
| 2024-12-18 | Spatio-Temporal SIR Model of Pandemic Spread During Warfare with Optimal Dual-use Healthcare System Administration using Deep Reinforcement Learning | Adi Shuchami et.al. | 2412.14039 | null |
| 2024-12-18 | Robust Optimal Safe and Stability Guaranteeing Reinforcement Learning Control for Quadcopter | Sanghyoup Gu et.al. | 2412.14003 | null |
| 2024-12-18 | Harvesting energy from turbulent winds with Reinforcement Learning | Lorenzo Basile et.al. | 2412.13961 | null |
| 2024-12-18 | RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation | Kun Wu et.al. | 2412.13877 | null |
| 2024-12-18 | AI-Powered Algorithm-Centric Quantum Processor Topology Design | Tian Li et.al. | 2412.13805 | link |
| 2024-12-18 | Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN | Pengxiang Li et.al. | 2412.13795 | link |
| 2024-12-18 | A hybrid learning agent for episodic learning tasks with unknown target distance | Oliver Sefrin et.al. | 2412.13686 | null |
| 2024-12-17 | ExBody2: Advanced Expressive Humanoid Whole-Body Control | Mazeyu Ji et.al. | 2412.13196 | null |
| 2024-12-17 | Tilted Quantile Gradient Updates for Quantile-Constrained Reinforcement Learning | Chenglin Li et.al. | 2412.13184 | link |
| 2024-12-17 | Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions | Juan Del Aguila Ferrandis et.al. | 2412.13157 | null |
| 2024-12-17 | Practicable Black-box Evasion Attacks on Link Prediction in Dynamic Graphs – A Graph Sequential Embedding Method | Jiate Li et.al. | 2412.13134 | link |
| 2024-12-17 | Active Reinforcement Learning Strategies for Offline Policy Improvement | Ambedkar Dukkipati et.al. | 2412.13106 | null |
| 2024-12-17 | Reservoir Computing for Fast, Simplified Reinforcement Learning on Memory Tasks | Kevin McKee et.al. | 2412.13093 | null |
| 2024-12-17 | SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks | Mátyás Vincze et.al. | 2412.13053 | null |
| 2024-12-17 | Relational Neurosymbolic Markov Models | Lennert De Smet et.al. | 2412.13023 | null |
| 2024-12-17 | Future Aspects in Human Action Recognition: Exploring Emerging Techniques and Ethical Influences | Antonios Gasteratos et.al. | 2412.12990 | null |
| 2024-12-17 | Guiding Generative Protein Language Models with Reinforcement Learning | Filippo Stocco et.al. | 2412.12979 | null |
| 2024-12-16 | MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization | Bhavya Sukhija et.al. | 2412.12098 | null |
| 2024-12-16 | Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation | Eliot Xing et.al. | 2412.12089 | null |
| 2024-12-16 | Artificial Intelligence in Traffic Systems | Ritwik Raj Saxena et.al. | 2412.12046 | null |
| 2024-12-16 | Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps | Linfeng Zhao et.al. | 2412.12024 | null |
| 2024-12-16 | Agentic AI-Driven Technical Troubleshooting for Enterprise Systems: A Novel Weighted Retrieval-Augmented Generation Paradigm | Rajat Khanda et.al. | 2412.12006 | null |
| 2024-12-16 | AlphaZero Neural Scaling and Zipf’s Law: a Tale of Board Games and Power Laws | Oren Neumann et.al. | 2412.11979 | link |
| 2024-12-16 | Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Qi Sun et.al. | 2412.11974 | link |
| 2024-12-16 | Hierarchical Meta-Reinforcement Learning via Automated Macro-Action Discovery | Minjae Cho et.al. | 2412.11930 | null |
| 2024-12-16 | Generalized Bayesian deep reinforcement learning | Shreya Sinha Roy et.al. | 2412.11743 | null |
| 2024-12-16 | Learning UAV-based path planning for efficient localization of objects using prior knowledge | Rick van Essen et.al. | 2412.11717 | null |
| 2024-12-13 | A Novel Framework Using Deep Reinforcement Learning for Join Order Selection | Chang Liu et.al. | 2412.10253 | null |
| 2024-12-13 | Physics Instrument Design with Reinforcement Learning | Shah Rukh Qasim et.al. | 2412.10237 | null |
| 2024-12-13 | Scaling Combinatorial Optimization Neural Improvement Heuristics with Online Search and Adaptation | Federico Julian Camerota Verdù et.al. | 2412.10163 | null |
| 2024-12-13 | AMUSE: Adaptive Model Updating using a Simulated Environment | Louis Chislett et.al. | 2412.10119 | null |
| 2024-12-13 | Reward Machine Inference for Robotic Manipulation | Mattijs Baert et.al. | 2412.10096 | null |
| 2024-12-13 | Optimized Coordination Strategy for Multi-Aerospace Systems in Pick-and-Place Tasks By Deep Neural Network | Ye Zhang et.al. | 2412.09877 | null |
| 2024-12-13 | RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning | Charles Xu et.al. | 2412.09858 | null |
| 2024-12-13 | ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression | Kai Yao et.al. | 2412.09812 | null |
| 2024-12-12 | GainAdaptor: Learning Quadrupedal Locomotion with Dual Actors for Adaptable and Energy-Efficient Walking on Various Terrains | Mincheol Kim et.al. | 2412.09520 | null |
| 2024-12-12 | Distributional Reinforcement Learning based Integrated Decision Making and Control for Autonomous Surface Vehicles | Xi Lin et.al. | 2412.09466 | link |
| 2024-12-12 | Learning to Adapt: Bio-Inspired Gait Strategies for Versatile Quadruped Locomotion | Joseph Humphreys et.al. | 2412.09440 | null |
| 2024-12-12 | Reinforcement Learning Within the Classical Robotics Stack: A Case Study in Robot Soccer | Adam Labiosa et.al. | 2412.09417 | null |
| 2024-12-12 | Does Low Spoilage Under Cold Conditions Foster Cultural Complexity During the Foraging Era? – A Theoretical and Computational Inquiry | Minhyeok Lee et.al. | 2412.09335 | null |
| 2024-12-12 | Learning to be Indifferent in Complex Decisions: A Coarse Payoff-Assessment Model | Philippe Jehiel et.al. | 2412.09321 | null |
| 2024-12-12 | Learning Novel Skills from Language-Generated Demonstrations | Ao-Qun Jin et.al. | 2412.09286 | null |
| 2024-12-12 | Student-Informed Teacher Training | Nico Messikommer et.al. | 2412.09149 | null |
| 2024-12-12 | Reconfigurable Intelligent Surface for Internet of Robotic Things | Wanli Ni et.al. | 2412.09117 | null |
| 2024-12-12 | In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning | Songjun Tu et.al. | 2412.09104 | null |
| 2024-12-11 | Learning Sketch Decompositions in Planning via Deep Reinforcement Learning | Michael Aichmüller et.al. | 2412.08574 | null |
| 2024-12-11 | GenPlan: Generative sequence models as adaptive planners | Akash Karthikeyan et.al. | 2412.08565 | null |
| 2024-12-11 | An End-to-End Collaborative Learning Approach for Connected Autonomous Vehicles in Occluded Scenarios | Leandro Parada et.al. | 2412.08562 | null |
| 2024-12-11 | MaestroMotif: Skill Design from Artificial Intelligence Feedback | Martin Klissarov et.al. | 2412.08542 | null |
| 2024-12-11 | Subspace-wise Hybrid RL for Articulated Object Manipulation | Yujin Kim et.al. | 2412.08522 | null |
| 2024-12-11 | Multi-perspective Alignment for Increasing Naturalness in Neural Machine Translation | Huiyuan Lai et.al. | 2412.08473 | null |
| 2024-12-11 | IRL for Restless Multi-Armed Bandits with Applications in Maternal and Child Health | Gauri Jain et.al. | 2412.08463 | link |
| 2024-12-11 | SINERGYM – A virtual testbed for building energy optimization with Reinforcement Learning | Alejandro Campoy-Nieves et.al. | 2412.08293 | link |
| 2024-12-11 | Coarse-to-Fine: A Dual-Phase Channel-Adaptive Method for Wireless Image Transmission | Hanlei Li et.al. | 2412.08211 | null |
| 2024-12-11 | Learn How to Query from Unlabeled Data Streams in Federated Learning | Yuchang Sun et.al. | 2412.08138 | link |
| 2024-12-10 | Mobile-TeleVision: Predictive Motion Priors for Humanoid Whole-Body Control | Chenhao Lu et.al. | 2412.07773 | null |
| 2024-12-10 | Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data | Zhiyuan Zhou et.al. | 2412.07762 | null |
| 2024-12-10 | Optimizing Sensor Redundancy in Sequential Decision-Making Problems | Jonas Nüßlein et.al. | 2412.07686 | null |
| 2024-12-10 | Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization | Zongkai Liu et.al. | 2412.07639 | null |
| 2024-12-10 | Swarm Behavior Cloning | Jonas Nüßlein et.al. | 2412.07617 | null |
| 2024-12-10 | Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery | Amin Abyaneh et.al. | 2412.07544 | null |
| 2024-12-10 | ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning | Hongshu Guo et.al. | 2412.07507 | null |
| 2024-12-10 | Optimizing pulsed blowing parameters for active separation control in a one-sided diffuser using reinforcement learning | Alexandra Müller et.al. | 2412.07480 | null |
| 2024-12-10 | Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulation for Time-Efficient Fine-Resolution Policy Learning | Yuki Kadokawa et.al. | 2412.07477 | null |
| 2024-12-10 | RLT4Rec: Reinforcement Learning Transformer for User Cold Start and Item Recommendation | Dilina Chandika Rajapakse et.al. | 2412.07403 | null |
| 2024-12-09 | Partially Observed Optimal Stochastic Control: Regularity, Optimality, Approximations, and Learning | Ali Devran Kara et.al. | 2412.06735 | null |
| 2024-12-09 | Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone | Max Sobol Mark et.al. | 2412.06685 | null |
| 2024-12-09 | Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures | Adrien Bolland et.al. | 2412.06655 | null |
| 2024-12-09 | Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation | Egor Cherepanov et.al. | 2412.06531 | null |
| 2024-12-09 | SimuDICE: Offline Policy Optimization Through World Model Updates and DICE Estimation | Catalin E. Brita et.al. | 2412.06486 | link |
| 2024-12-09 | Edge Delayed Deep Deterministic Policy Gradient: efficient continuous control for edge scenarios | Alberto Sinigaglia et.al. | 2412.06390 | null |
| 2024-12-09 | Tracking control of latent dynamic systems with application to spacecraft attitude control | Congxi Zhang et.al. | 2412.06342 | null |
| 2024-12-09 | Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi | F. Bredell et.al. | 2412.06333 | null |
| 2024-12-09 | Vision-Based Deep Reinforcement Learning of UAV Autonomous Navigation Using Privileged Information | Junqiao Wang et.al. | 2412.06313 | null |
| 2024-12-09 | A Scalable Decentralized Reinforcement Learning Framework for UAV Target Localization Using Recurrent PPO | Leon Fernando et.al. | 2412.06231 | null |
| 2024-12-06 | Reinforcement Learning: An Overview | Kevin Murphy et.al. | 2412.05265 | null |
| 2024-12-06 | TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | Qian Long et.al. | 2412.05255 | link |
| 2024-12-06 | LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds | James Beetham et.al. | 2412.05232 | null |
| 2024-12-06 | FlowPolicy: Enabling Fast and Robust 3D Flow-based Policy via Consistency Flow Matching for Robot Manipulation | Qinglun Zhang et.al. | 2412.04987 | null |
| 2024-12-06 | Putting the Iterative Training of Decision Trees to the Test on a Real-World Robotic Task | Raphael C. Engelhardt et.al. | 2412.04974 | null |
| 2024-12-06 | DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling | Minzheng Wang et.al. | 2412.04905 | link |
| 2024-12-06 | Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment | Ran Tian et.al. | 2412.04835 | null |
| 2024-12-06 | Learning-based Control for Tendon-Driven Continuum Robotic Arms | Nima Maghooli et.al. | 2412.04829 | null |
| 2024-12-06 | A Temporally Correlated Latent Exploration for Reinforcement Learning | SuMin Oh et.al. | 2412.04775 | null |
| 2024-12-06 | Measuring Goal-Directedness | Matt MacDermott et.al. | 2412.04758 | null |
| 2024-12-05 | Marvel: Accelerating Safe Online Reinforcement Learning with Finetuned Offline Policy | Keru Chen et.al. | 2412.04426 | null |
| 2024-12-05 | Intersection-Aware Assessment of EMS Accessibility in NYC: A Data-Driven Approach | Haoran Su et.al. | 2412.04369 | null |
| 2024-12-05 | Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting | Edoardo Cetin et.al. | 2412.04368 | null |
| 2024-12-05 | Reinforcement Learning for Freeway Lane-Change Regulation via Connected Vehicles | Ke Sun et.al. | 2412.04341 | null |
| 2024-12-05 | Action Mapping for Reinforcement Learning in Continuous Environments with Constraints | Mirco Theile et.al. | 2412.04327 | null |
| 2024-12-05 | GRAM: Generalization in Deep RL with a Robust Adaptation Module | James Queeney et.al. | 2412.04323 | link |
| 2024-12-05 | Reinforcement Learning from Wild Animal Videos | Elliot Chane-Sane et.al. | 2412.04273 | null |
| 2024-12-05 | HyperMARL: Adaptive Hypernetworks for Multi-Agent RL | Kale-ab Abebe Tessera et.al. | 2412.04233 | null |
| 2024-12-05 | A Dynamic Safety Shield for Safe and Efficient Reinforcement Learning of Navigation Tasks | Murad Dawood et.al. | 2412.04153 | null |
| 2024-12-05 | Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning | Shicheng Zhou et.al. | 2412.04078 | link |
| 2024-12-04 | AI-Driven Day-to-Day Route Choice | Leizhen Wang et.al. | 2412.03338 | null |
| 2024-12-04 | Rotograb: Combining Biomimetic Hands with Industrial Grippers using a Rotating Thumb | Arnaud Bersier et.al. | 2412.03279 | null |
| 2024-12-04 | Learning on One Mode: Addressing Multi-Modality in Offline Reinforcement Learning | Mianchu Wang et.al. | 2412.03258 | null |
| 2024-12-04 | Alignment at Pre-training! Towards Native Alignment for Arabic LLMs | Juhao Liang et.al. | 2412.03253 | link |
| 2024-12-04 | Variable-Speed Teaching-Playback as Real-World Data Augmentation for Imitation Learning | Nozomu Masuya et.al. | 2412.03252 | null |
| 2024-12-04 | Using Deep Reinforcement Learning to Enhance Channel Sampling Patterns in Integrated Sensing and Communication | Federico Mason et.al. | 2412.03157 | null |
| 2024-12-04 | Experience-driven discovery of planning strategies | Ruiqi He et.al. | 2412.03111 | null |
| 2024-12-04 | Less is More: A Stealthy and Efficient Adversarial Attack Method for DRL-based Autonomous Driving Policies | Junchao Fan et.al. | 2412.03051 | null |
| 2024-12-04 | Learning Whole-Body Loco-Manipulation for Omni-Directional Task Space Pose Tracking with a Wheeled-Quadrupedal-Manipulator | Kaiwen Jiang et.al. | 2412.03012 | null |
| 2024-12-04 | Data Acquisition for Improving Model Fairness using Reinforcement Learning | Jahid Hasan et.al. | 2412.03009 | null |
| 2024-12-03 | UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping | Wenbo Wang et.al. | 2412.02699 | link |
| 2024-12-03 | Preliminary Investigation into Data Scaling Laws for Imitation Learning-Based End-to-End Autonomous Driving | Yupeng Zheng et.al. | 2412.02689 | null |
| 2024-12-03 | T-REG: Preference Optimization with Token-Level Reward Regularization | Wenxuan Zhou et.al. | 2412.02685 | link |
| 2024-12-03 | AI-Driven Resource Allocation Framework for Microservices in Hybrid Cloud Platforms | Biman Barua et.al. | 2412.02610 | null |
| 2024-12-03 | Explainable CTR Prediction via LLM Reasoning | Xiaohan Yu et.al. | 2412.02588 | null |
| 2024-12-03 | Mobile Cell-Free Massive MIMO with Multi-Agent Reinforcement Learning: A Scalable Framework | Ziheng Liu et.al. | 2412.02581 | null |
| 2024-12-03 | Generating Critical Scenarios for Testing Automated Driving Systems | Trung-Hieu Nguyen et.al. | 2412.02574 | link |
| 2024-12-03 | Cooperative Cruising: Reinforcement Learning based Time-Headway Control for Increased Traffic Efficiency | Yaron Veksler et.al. | 2412.02520 | null |
| 2024-12-03 | Reinforcement learning to learn quantum states for Heisenberg scaling accuracy | Jeongwoo Jae et.al. | 2412.02334 | null |
| 2024-12-03 | Optimizing Plastic Waste Collection in Water Bodies Using Heterogeneous Autonomous Surface Vehicles with Deep Reinforcement Learning | Alejandro Mendoza Barrionuevo et.al. | 2412.02316 | null |
| 2024-11-29 | PDDLFuse: A Tool for Generating Diverse Planning Domains | Vedant Khandelwal et.al. | 2411.19886 | null |
| 2024-11-29 | CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives | Armin Saghafian et.al. | 2411.19787 | link |
| 2024-11-29 | HVAC-DPT: A Decision Pretrained Transformer for HVAC Control | Anaïs Berkes et.al. | 2411.19746 | null |
| 2024-11-29 | Improving generalization of robot locomotion policies via Sharpness-Aware Reinforcement Learning | Severin Bochem et.al. | 2411.19732 | null |
| 2024-11-29 | RMIO: A Model-Based MARL Framework for Scenarios with Observation Loss in Some Agents | Shi Zifeng et.al. | 2411.19639 | null |
| 2024-11-29 | Build An Influential Bot In Social Media Simulations With Large Language Models | Bailu Jin et.al. | 2411.19635 | null |
| 2024-11-29 | Adaptive dynamics of Ising spins in one dimension leveraging Reinforcement Learning | Anish Kumar et.al. | 2411.19602 | null |
| 2024-11-29 | Solving Rubik’s Cube Without Tricky Sampling | Yicheng Lin et.al. | 2411.19583 | null |
| 2024-11-29 | Training Agents with Weakly Supervised Feedback from Large Language Models | Dihong Gong et.al. | 2411.19547 | null |
| 2024-11-29 | A Local Information Aggregation based Multi-Agent Reinforcement Learning for Robot Swarm Dynamic Task Allocation | Yang Lv et.al. | 2411.19526 | null |
| 2024-11-27 | Robust Offline Reinforcement Learning with Linearly Structured $f$-Divergence Regularization | Cheng Tang et.al. | 2411.18612 | null |
| 2024-11-27 | A Talent-infused Policy-gradient Approach to Efficient Co-Design of Morphology and Task Allocation Behavior of Multi-Robot Systems | Prajit KrisshnaKumar et.al. | 2411.18519 | null |
| 2024-11-27 | G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation | Tianxing Chen et.al. | 2411.18369 | null |
| 2024-11-27 | Two-Timescale Digital Twin Assisted Model Interference and Retraining over Wireless Network | Jiayi Cong et.al. | 2411.18329 | null |
| 2024-11-27 | Application of Soft Actor-Critic Algorithms in Optimizing Wastewater Treatment with Time Delays Integration | Esmaeel Mohammadi et.al. | 2411.18305 | null |
| 2024-11-27 | NeoHebbian Synapses to Accelerate Online Training of Neuromorphic Hardware | Shubham Pande et.al. | 2411.18272 | null |
| 2024-11-27 | Dynamic Retail Pricing via Q-Learning – A Reinforcement Learning Framework for Enhanced Revenue Management | Mohit Apte et.al. | 2411.18261 | null |
| 2024-11-27 | Dependency-Aware CAV Task Scheduling via Diffusion-Based Reinforcement Learning | Xiang Cheng et.al. | 2411.18230 | null |
| 2024-11-27 | Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning | Di Zhang et.al. | 2411.18203 | link |
| 2024-11-27 | Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation | Jie-Jing Shao et.al. | 2411.18201 | link |
| 2024-11-26 | Multi-Objective Reinforcement Learning for Automated Resilient Cyber Defence | Ross O’Driscoll et.al. | 2411.17585 | null |
| 2024-11-26 | Ensuring Safety in Target Pursuit Control: A CBF-Safe Reinforcement Learning Approach | Yaosheng Deng et.al. | 2411.17552 | null |
| 2024-11-26 | IMPROVE: Improving Medical Plausibility without Reliance on Human Validation – An Enhanced Prototype-Guided Diffusion Framework | Anurag Shandilya et.al. | 2411.17535 | null |
| 2024-11-26 | Spatially Visual Perception for End-to-End Robotic Learning | Travis Davies et.al. | 2411.17458 | null |
| 2024-11-26 | BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving | Teng Wang et.al. | 2411.17404 | null |
| 2024-11-26 | Joint Combinatorial Node Selection and Resource Allocations in the Lightning Network using Attention-based Reinforcement Learning | Mahdi Salahshour et.al. | 2411.17353 | null |
| 2024-11-26 | SIL-RRT*: Learning Sampling Distribution through Self Imitation Learning | Xuzhe Dang et.al. | 2411.17293 | null |
| 2024-11-26 | LHPF: Look back the History and Plan for the Future in Autonomous Driving | Sheng Wang et.al. | 2411.17253 | null |
| 2024-11-26 | Self-reconfiguration Strategies for Space-distributed Spacecraft | Tianle Liu et.al. | 2411.17137 | null |
| 2024-11-26 | LLM-Based Offline Learning for Embodied Agents via Consistency-Guided Reward Ensemble | Yujeong Lee et.al. | 2411.17135 | null |
| 2024-11-25 | Self-Generated Critiques Boost Reward Modeling for Language Models | Yue Yu et.al. | 2411.16646 | null |
| 2024-11-25 | Continual Deep Reinforcement Learning with Task-Agnostic Policy Distillation | Muhammad Burhan Hafez et.al. | 2411.16532 | link |
| 2024-11-25 | Reinforcement Learning for Bidding Strategy Optimization in Day-Ahead Energy Market | Luca Di Persio et.al. | 2411.16519 | null |
| 2024-11-25 | Unsupervised Event Outlier Detection in Continuous Time | Somjit Nath et.al. | 2411.16427 | null |
| 2024-11-25 | CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning | Duo Wu et.al. | 2411.16313 | null |
| 2024-11-25 | Probing for Consciousness in Machines | Mathis Immertreu et.al. | 2411.16262 | null |
| 2024-11-25 | Multi-Robot Reliable Navigation in Uncertain Topological Environments with Graph Attention Networks | Zhuoyuan Yu et.al. | 2411.16134 | null |
| 2024-11-25 | End-to-End Steering for Autonomous Vehicles via Conditional Imitation Co-Learning | Mahmoud M. Kishky et.al. | 2411.16131 | null |
| 2024-11-25 | Why the Agent Made that Decision: Explaining Deep Reinforcement Learning with Vision Masks | Rui Zuo et.al. | 2411.16120 | null |
| 2024-11-25 | M3: Mamba-assisted Multi-Circuit Optimization via MBRL with Effective Scheduling | Youngmin Oh et.al. | 2411.16019 | null |
| 2024-11-22 | WildLMa: Long Horizon Loco-Manipulation in the Wild | Ri-Zhao Qiu et.al. | 2411.15131 | null |
| 2024-11-22 | Learning-based Trajectory Tracking for Bird-inspired Flapping-Wing Robots | Jiaze Cai et.al. | 2411.15130 | null |
| 2024-11-22 | TÜLU 3: Pushing Frontiers in Open Language Model Post-Training | Nathan Lambert et.al. | 2411.15124 | link |
| 2024-11-22 | On Multi-Agent Inverse Reinforcement Learning | Till Freihaut et.al. | 2411.15046 | null |
| 2024-11-22 | Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium | Zeyang Li et.al. | 2411.15036 | null |
| 2024-11-22 | On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations | Guojun Xiong et.al. | 2411.15014 | null |
| 2024-11-22 | Free Energy Projective Simulation (FEPS): Active inference with interpretability | Joséphine Pazem et.al. | 2411.14991 | null |
| 2024-11-22 | Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation | Huy Le et.al. | 2411.14913 | null |
| 2024-11-22 | Segmenting Action-Value Functions Over Time-Scales in SARSA using TD($Δ$) | Mahammad Humayoo et.al. | 2411.14783 | null |
| 2024-11-22 | Enhancing Molecular Design through Graph-based Topological Reinforcement Learning | Xiangyu Zhang et.al. | 2411.14726 | null |
| 2024-11-21 | Multi-Agent Environments for Vehicle Routing Problems | Ricardo Gama et.al. | 2411.14411 | null |
| 2024-11-21 | Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions | Yu Zhao et.al. | 2411.14405 | link |
| 2024-11-21 | 23 DoF Grasping Policies from a Raw Point Cloud | Martin Matak et.al. | 2411.14400 | null |
| 2024-11-21 | Model Checking for Reinforcement Learning in Autonomous Driving: One Can Do More Than You Think! | Rong Gu et.al. | 2411.14375 | null |
| 2024-11-21 | Convex Approximation of Probabilistic Reachable Sets from Small Samples Using Self-supervised Neural Networks | Jun Xiang et.al. | 2411.14356 | null |
| 2024-11-21 | Logarithmic Neyman Regret for Adaptive Estimation of the Average Treatment Effect | Ojash Neopane et.al. | 2411.14341 | null |
| 2024-11-21 | Explainable Multi-Agent Reinforcement Learning for Extended Reality Codec Adaptation | Pedro Enrique Iturria-Rivera et.al. | 2411.14264 | null |
| 2024-11-21 | Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs | Zeyu Dong et.al. | 2411.14256 | null |
| 2024-11-21 | Natural Language Reinforcement Learning | Xidong Feng et.al. | 2411.14251 | link |
| 2024-11-21 | Umbrella Reinforcement Learning – computationally efficient tool for hard non-linear problems | Egor E. Nuzhin et.al. | 2411.14117 | null |
| 2024-11-20 | BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Davide Paglieri et.al. | 2411.13543 | link |
| 2024-11-20 | Metacognition for Unknown Situations and Environments (MUSE) | Rodolfo Valiente et.al. | 2411.13537 | null |
| 2024-11-20 | Robust Monocular Visual Odometry using Curriculum Learning | Assaf Lahiany et.al. | 2411.13438 | null |
| 2024-11-20 | A Survey On Enhancing Reinforcement Learning in Complex Environments: Insights from Human and LLM Feedback | Alireza Rashidi Laleh et.al. | 2411.13410 | null |
| 2024-11-20 | Fine-tuning Myoelectric Control through Reinforcement Learning in a Game Environment | Kilian Freitag et.al. | 2411.13327 | null |
| 2024-11-20 | Backward Stochastic Control System with Entropy Regularization | Ziyue Chen et.al. | 2411.13219 | null |
| 2024-11-20 | ViSTa Dataset: Do vision-language models understand sequential tasks? | Evžen Wybitul et.al. | 2411.13211 | link |
| 2024-11-20 | Engagement-Driven Content Generation with Large Language Models | Erica Coppolillo et.al. | 2411.13187 | null |
| 2024-11-20 | Learning Time-Optimal and Speed-Adjustable Tactile In-Hand Manipulation | Johannes Pitz et.al. | 2411.13148 | null |
| 2024-11-20 | ReinFog: A DRL Empowered Framework for Resource Management in Edge and Cloud Computing Environments | Zhiyu Wang et.al. | 2411.13121 | null |
| 2024-11-19 | ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models | Salma Kharrat et.al. | 2411.12736 | link |
| 2024-11-19 | Reinforcement Learning, Collusion, and the Folk Theorem | Galit Askenazi-Golan et.al. | 2411.12725 | null |
| 2024-11-19 | UBSoft: A Simulation Platform for Robotic Skill Learning in Unbounded Soft Environments | Chunru Lin et.al. | 2411.12711 | null |
| 2024-11-19 | Instant Policy: In-Context Imitation Learning via Graph Diffusion | Vitalis Vosylius et.al. | 2411.12633 | null |
| 2024-11-19 | Robotic transcatheter tricuspid valve replacement with hybrid enhanced intelligence: a new paradigm and first-in-vivo study | Shuangyi Wang et.al. | 2411.12478 | null |
| 2024-11-19 | Variable-Frequency Imitation Learning for Variable-Speed Motion | Nozomu Masuya et.al. | 2411.12310 | null |
| 2024-11-19 | Emergence of Implicit World Models from Mortal Agents | Kazuya Horibe et.al. | 2411.12304 | null |
| 2024-11-19 | DT-RaDaR: Digital Twin Assisted Robot Navigation using Differential Ray-Tracing | Sunday Amatare et.al. | 2411.12284 | null |
| 2024-11-19 | Error-Feedback Model for Output Correction in Bilateral Control-Based Imitation Learning | Hiroshi Sato et.al. | 2411.12255 | null |
| 2024-11-19 | Efficient Training in Multi-Agent Reinforcement Learning: A Communication-Free Framework for the Box-Pushing Problem | David Ge et.al. | 2411.12246 | null |
| 2024-11-18 | Design And Optimization Of Multi-rendezvous Manoeuvres Based On Reinforcement Learning And Convex Optimization | Antonio López Rivera et.al. | 2411.11778 | null |
| 2024-11-18 | High-Speed Cornering Control and Real-Vehicle Deployment for Autonomous Electric Vehicles | Shiyue Zhao et.al. | 2411.11762 | null |
| 2024-11-18 | Mapping out the Space of Human Feedback for Reinforcement Learning: A Conceptual Framework | Yannick Metz et.al. | 2411.11761 | null |
| 2024-11-18 | Aligning Few-Step Diffusion Models with Dense Reward Difference Learning | Ziyi Zhang et.al. | 2411.11727 | link |
| 2024-11-18 | Bitcoin Under Volatile Block Rewards: How Mempool Statistics Can Influence Bitcoin Mining | Roozbeh Sarenche et.al. | 2411.11702 | null |
| 2024-11-18 | Robust Reinforcement Learning under Diffusion Models for Data with Jumps | Chenyang Jiang et.al. | 2411.11697 | null |
| 2024-11-18 | Coevolution of Opinion Dynamics and Recommendation System: Modeling Analysis and Reinforcement Learning Based Manipulation | Yuhong Chen et.al. | 2411.11687 | null |
| 2024-11-18 | No-regret Exploration in Shuffle Private Reinforcement Learning | Shaojie Bai et.al. | 2411.11647 | null |
| 2024-11-18 | Signaling and Social Learning in Swarms of Robots | Leo Cazenille et.al. | 2411.11616 | null |
| 2024-11-18 | A Pre-Trained Graph-Based Model for Adaptive Sequencing of Educational Documents | Jean Vassoyan et.al. | 2411.11520 | null |
| 2024-11-15 | Mitigating Parameter Degeneracy using Joint Conditional Diffusion Model for WECC Composite Load Model in Power Systems | Feiqin Zhu et.al. | 2411.10431 | null |
| 2024-11-15 | Continual Adversarial Reinforcement Learning (CARL) of False Data Injection detection: forgetting and explainability | Pooja Aslami et.al. | 2411.10367 | null |
| 2024-11-15 | BMP: Bridging the Gap between B-Spline and Movement Primitives | Weiran Liao et.al. | 2411.10336 | null |
| 2024-11-15 | Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review | Hossein Hassani et.al. | 2411.10268 | null |
| 2024-11-15 | Learning Generalizable 3D Manipulation With 10 Demonstrations | Yu Ren et.al. | 2411.10203 | null |
| 2024-11-15 | The Surprising Ineffectiveness of Pre-Trained Visual Representations for Model-Based Reinforcement Learning | Moritz Schneider et.al. | 2411.10175 | null |
| 2024-11-15 | Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles | Anant Garg et.al. | 2411.10171 | null |
| 2024-11-15 | Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention | Libo Wang et.al. | 2411.10156 | link |
| 2024-11-15 | That Chip Has Sailed: A Critique of Unfounded Skepticism Around AI for Chip Design | Anna Goldie et.al. | 2411.10053 | null |
| 2024-11-15 | Enforcing Cooperative Safety for Reinforcement Learning-based Mixed-Autonomy Platoon Control | Jingyuan Zhou et.al. | 2411.10031 | null |
| 2024-11-14 | A Risk Sensitive Contract-unified Reinforcement Learning Approach for Option Hedging | Xianhua Peng et.al. | 2411.09659 | null |
| 2024-11-14 | Motion Before Action: Diffusing Object Motion as Manipulation Condition | Yup Su et.al. | 2411.09658 | null |
| 2024-11-14 | Tailoring interactions between active nematic defects with reinforcement learning | Carlos Floyd et.al. | 2411.09588 | null |
| 2024-11-14 | Developement of Reinforcement Learning based Optimisation Method for Side-Sill Design | Aditya Borse et.al. | 2411.09499 | null |
| 2024-11-14 | Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment | Yuang Cai et.al. | 2411.09341 | null |
| 2024-11-14 | Socio-Economic Consequences of Generative AI: A Review of Methodological Approaches | Carlos J. Costa et.al. | 2411.09313 | null |
| 2024-11-14 | Enhancing reinforcement learning for population setpoint tracking in co-cultures | Sebastián Espinel-Ríos et.al. | 2411.09177 | null |
| 2024-11-14 | Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging | Bo Wang et.al. | 2411.09176 | null |
| 2024-11-14 | Rationality based Innate-Values-driven Reinforcement Learning | Qin Yang et.al. | 2411.09160 | null |
| 2024-11-14 | Secrecy Energy Efficiency Maximization in IRS-Assisted VLC MISO Networks with RSMA: A DS-PPO approach | Yangbo Guo et.al. | 2411.09146 | null |
| 2024-11-13 | LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs | Piyush Jha et.al. | 2411.08862 | null |
| 2024-11-13 | Goal-oriented Semantic Communication for Robot Arm Reconstruction in Digital Twin: Feature and Temporal Selections | Shutong Chen et.al. | 2411.08835 | null |
| 2024-11-13 | Recommender systems and reinforcement learning for building control and occupant interaction: A text-mining driven review of scientific literature | Wenhao Zhang et.al. | 2411.08734 | null |
| 2024-11-13 | Joint Model Caching and Resource Allocation in Generative AI-Enabled Wireless Edge Networks | Zhang Liu et.al. | 2411.08672 | null |
| 2024-11-13 | Estimating unknown parameters in differential equations with a reinforcement learning based PSO method | Wenkui Sun et.al. | 2411.08651 | null |
| 2024-11-13 | Towards Secure Intelligent O-RAN Architecture: Vulnerabilities, Threats and Promising Technical Solutions using LLMs | Mojdeh Karbalaee Motalleb et.al. | 2411.08640 | null |
| 2024-11-13 | Robot See, Robot Do: Imitation Reward for Noisy Financial Environments | Sven Goluža et.al. | 2411.08637 | null |
| 2024-11-13 | Precision-Focused Reinforcement Learning Model for Robotic Object Pushing | Lara Bergmann et.al. | 2411.08622 | link |
| 2024-11-13 | Grammarization-Based Grasping with Deep Multi-Autoencoder Latent Space Exploration by Reinforcement Learning Agent | Leonidas Askianakis et.al. | 2411.08566 | null |
| 2024-11-13 | Towards Practical Deep Schedulers for Allocating Cellular Radio Resources | Petteri Kela et.al. | 2411.08529 | null |
| 2024-11-12 | Learning Memory Mechanisms for Decision Making through Demonstrations | William Yue et.al. | 2411.07954 | link |
| 2024-11-12 | Doubly Mild Generalization for Offline Reinforcement Learning | Yixiu Mao et.al. | 2411.07934 | link |
| 2024-11-12 | Scaling policy iteration based reinforcement learning for unknown discrete-time linear systems | Zhen Pang et.al. | 2411.07825 | null |
| 2024-11-12 | Navigation with QPHIL: Quantizing Planner for Hierarchical Implicit Q-Learning | Alexi Canesse et.al. | 2411.07760 | null |
| 2024-11-12 | Optimizing Traffic Signal Control using High-Dimensional State Representation and Efficient Deep Reinforcement Learning | Lawrence Francis et.al. | 2411.07759 | null |
| 2024-11-12 | EMPERROR: A Flexible Generative Perception Error Model for Probing Self-Driving Planners | Niklas Hanselmann et.al. | 2411.07719 | null |
| 2024-11-12 | Test Where Decisions Matter: Importance-driven Testing for Deep Reinforcement Learning | Stefan Pranger et.al. | 2411.07700 | null |
| 2024-11-12 | Exploring Multi-Agent Reinforcement Learning for Unrelated Parallel Machine Scheduling | Maria Zampella et.al. | 2411.07634 | null |
| 2024-11-12 | Direct Preference Optimization Using Sparse Feature-Level Constraints | Qingyu Yin et.al. | 2411.07618 | null |
| 2024-11-12 | Entropy Controllable Direct Preference Optimization | Motoki Omura et.al. | 2411.07595 | null |
| 2024-11-11 | ‘Explaining RL Decisions with Trajectories’: A Reproducibility Study | Karim Abdel Sadek et.al. | 2411.07200 | link |
| 2024-11-11 | Joint Age-State Belief is All You Need: Minimizing AoII via Pull-Based Remote Estimation | Ismail Cosandal et.al. | 2411.07179 | null |
| 2024-11-11 | Learning Multi-Agent Collaborative Manipulation for Long-Horizon Quadrupedal Pushing | Chuye Hong et.al. | 2411.07104 | null |
| 2024-11-11 | A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs | Myeongsoo Kim et.al. | 2411.07098 | null |
| 2024-11-11 | OCMDP: Observation-Constrained Markov Decision Process | Taiyi Wang et.al. | 2411.07087 | null |
| 2024-11-11 | To Train or Not to Train: Balancing Efficiency and Training Cost in Deep Reinforcement Learning for Mobile Edge Computing | Maddalena Boscaro et.al. | 2411.07086 | null |
| 2024-11-11 | Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching | Arnav Kumar Jain et.al. | 2411.07007 | link |
| 2024-11-11 | Enhancing Robot Assistive Behaviour with Reinforcement Learning and Theory of Mind | Antonio Andriella et.al. | 2411.07003 | link |
| 2024-11-11 | Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration | Xingrui Yu et.al. | 2411.06965 | null |
| 2024-11-11 | Streetwise Agents: Empowering Offline RL Policies to Outsmart Exogenous Stochastic Disturbances in RTC | Aditya Soni et.al. | 2411.06815 | null |
| 2024-11-08 | Safe Reinforcement Learning of Robot Trajectories in the Presence of Moving Obstacles | Jonas Kiemel et.al. | 2411.05784 | null |
| 2024-11-08 | Tract-RLFormer: A Tract-Specific RL policy based Decoder-only Transformer Network | Ankita Joshi et.al. | 2411.05757 | null |
| 2024-11-08 | Topology-aware Reinforcement Feature Space Reconstruction for Graph Data | Wangyang Ying et.al. | 2411.05742 | null |
| 2024-11-08 | Renewable Energy Powered and Open RAN-based Architecture for 5G Fixed Wireless Access Provisioning in Rural Areas | Anselme Ndikumana et.al. | 2411.05699 | null |
| 2024-11-08 | Data-Driven Distributed Common Operational Picture from Heterogeneous Platforms using Multi-Agent Reinforcement Learning | Indranil Sur et.al. | 2411.05683 | null |
| 2024-11-08 | Digital Twin Backed Closed-Loops for Energy-Aware and Open RAN-based Fixed Wireless Access Serving Rural Areas | Anselme Ndikumana et.al. | 2411.05664 | null |
| 2024-11-08 | Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey | Zhihong Liu et.al. | 2411.05614 | null |
| 2024-11-08 | Smart navigation through a rotating barrier: Deep reinforcement learning with application to size-based separation of active microagents | Mohammad Hossein Masoudi et.al. | 2411.05587 | null |
| 2024-11-08 | Tangled Program Graphs as an alternative to DRL-based control algorithms for UAVs | Hubert Szolc et.al. | 2411.05586 | null |
| 2024-11-08 | Towards Active Flow Control Strategies Through Deep Reinforcement Learning | Ricard Montalà et.al. | 2411.05536 | null |
| 2024-11-07 | Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games | Usman Anwar et.al. | 2411.04976 | link |
| 2024-11-07 | A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model | Panwen Hu et.al. | 2411.04942 | null |
| 2024-11-07 | Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion | Kaizhe Hu et.al. | 2411.04919 | link |
| 2024-11-07 | Evaluating Robustness of Reinforcement Learning Algorithms for Autonomous Shipping | Bavo Lesy et.al. | 2411.04915 | null |
| 2024-11-07 | Think Smart, Act SMARL! Analyzing Probabilistic Logic Driven Safety in Multi-Agent Reinforcement Learning | Satchit Chatterji et.al. | 2411.04867 | link |
| 2024-11-07 | Asymptotic regularity of a generalised stochastic Halpern scheme with applications | Nicholas Pischke et.al. | 2411.04845 | null |
| 2024-11-07 | Plasticity Loss in Deep Reinforcement Learning: A Survey | Timo Klein et.al. | 2411.04832 | null |
| 2024-11-07 | Harnessing the Power of Gradient-Based Simulations for Multi-Objective Optimization in Particle Accelerators | Kishansingh Rajput et.al. | 2411.04817 | null |
| 2024-11-07 | AllGaits: Learning All Quadruped Gaits and Transitions | Guillaume Bellegarda et.al. | 2411.04787 | null |
| 2024-11-07 | Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning | Zuzanna Osika et.al. | 2411.04784 | link |
| 2024-11-06 | A Comparative Study of Deep Reinforcement Learning for Crop Production Management | Joseph Balderas et.al. | 2411.04106 | null |
| 2024-11-06 | Interpretable and Efficient Data-driven Discovery and Control of Distributed Systems | Florian Wolf et.al. | 2411.04098 | null |
| 2024-11-06 | Memorized action chunking with Transformers: Imitation learning for vision-based tissue surface scanning | Bochen Yang et.al. | 2411.04050 | null |
| 2024-11-06 | Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset | Alexandre Galashov et.al. | 2411.04034 | null |
| 2024-11-06 | Predicting and Publishing Accurate Imbalance Prices Using Monte Carlo Tree Search | Fabio Pavirani et.al. | 2411.04011 | null |
| 2024-11-06 | Object-Centric Dexterous Manipulation from Human Motion Data | Yuanpei Chen et.al. | 2411.04005 | null |
| 2024-11-06 | ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy | Chenrui Tie et.al. | 2411.03990 | null |
| 2024-11-06 | AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making | Yizhe Huang et.al. | 2411.03865 | link |
| 2024-11-06 | Beyond The Rainbow: High Performance Deep Reinforcement Learning On A Desktop PC | Tyler Clark et.al. | 2411.03820 | null |
| 2024-11-06 | From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning | Zhirui Deng et.al. | 2411.03817 | null |
| 2024-11-05 | Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy For Visuomotor Imitation Learning | George Jiayuan Gao et.al. | 2411.03294 | null |
| 2024-11-05 | Pre-trained Visual Dynamics Representations for Efficient Policy Learning | Hao Luo et.al. | 2411.03169 | null |
| 2024-11-05 | Hierarchical Orchestra of Policies | Thomas P Cannon et.al. | 2411.03008 | null |
| 2024-11-05 | Accelerating Task Generalisation with Multi-Level Hierarchical Options | Thomas P Cannon et.al. | 2411.02998 | null |
| 2024-11-05 | Autonomous Decision Making for UAV Cooperative Pursuit-Evasion Game with Reinforcement Learning | Yang Zhao et.al. | 2411.02983 | null |
| 2024-11-05 | Transformer-Based Fault-Tolerant Control for Fixed-Wing UAVs Using Knowledge Distillation and In-Context Adaptation | Francisco Giral et.al. | 2411.02975 | null |
| 2024-11-05 | Embedding Safety into RL: A New Take on Trust Region Methods | Nikola Milosevic et.al. | 2411.02957 | null |
| 2024-11-05 | The Unreasonable Effectiveness of LLMs for Query Optimization | Peter Akioyamen et.al. | 2411.02862 | link |
| 2024-11-05 | ADOPT: Modified Adam Can Converge with Any $β_2$ with the Optimal Rate | Shohei Taniguchi et.al. | 2411.02853 | link |
| 2024-11-05 | When to Localize? A Risk-Constrained Reinforcement Learning Approach | Chak Lam Shek et.al. | 2411.02788 | null |
| 2024-11-04 | Simulation of Nanorobots with Artificial Intelligence and Reinforcement Learning for Advanced Cancer Cell Detection and Tracking | Shahab Kavousinejad et.al. | 2411.02345 | link |
| 2024-11-04 | WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | Zehan Qi et.al. | 2411.02337 | null |
| 2024-11-04 | Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback | Marcus Williams et.al. | 2411.02306 | link |
| 2024-11-04 | N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs | Ilya Zisman et.al. | 2411.01958 | null |
| 2024-11-04 | RoboCrowd: Scaling Robot Data Collection through Crowdsourcing | Suvir Mirchandani et.al. | 2411.01915 | null |
| 2024-11-04 | Efficient Active Imitation Learning with Random Network Distillation | Emilien Biré et.al. | 2411.01894 | null |
| 2024-11-04 | Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback | Guan-Ting Lin et.al. | 2411.01834 | null |
| 2024-11-04 | Risk-sensitive control as inference with Rényi divergence | Kaito Ito et.al. | 2411.01827 | null |
| 2024-11-04 | IRS-Enhanced Secure Semantic Communication Networks: Cross-Layer and Context-Awared Resource Allocation | Lingyi Wang et.al. | 2411.01821 | null |
| 2024-11-04 | So You Think You Can Scale Up Autonomous Robot Data Collection? | Suvir Mirchandani et.al. | 2411.01813 | null |
| 2024-10-31 | EgoMimic: Scaling Imitation Learning via Egocentric Video | Simar Kareer et.al. | 2410.24221 | link |
| 2024-10-31 | Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use | Jiajun Xi et.al. | 2410.24218 | link |
| 2024-10-31 | ARQ: A Mixed-Precision Quantization Framework for Accurate and Certifiably Robust DNNs | Yuchen Yang et.al. | 2410.24214 | null |
| 2024-10-31 | Zonal RL-RRT: Integrated RL-RRT Path Planning with Collision Probability and Zone Connectivity | AmirMohammad Tahmasbi et.al. | 2410.24205 | link |
| 2024-10-31 | DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning | Zhenyu Jiang et.al. | 2410.24185 | null |
| 2024-10-31 | Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning | Jiaqi Liu et.al. | 2410.24152 | null |
| 2024-10-31 | Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers | Kai Yan et.al. | 2410.24108 | link |
| 2024-10-31 | Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning | Nabil Omi et.al. | 2410.24096 | null |
| 2024-10-31 | 3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing | Binghao Huang et.al. | 2410.24091 | null |
| 2024-10-31 | Demystifying Linear MDPs and Novel Dynamics Aggregation Framework | Joongkyu Lee et.al. | 2410.24089 | null |
| 2024-10-30 | Keypoint Abstraction using Large Models for Object-Relative Imitation Learning | Xiaolin Fang et.al. | 2410.23254 | null |
| 2024-10-30 | Carrot and Stick: Eliciting Comparison Data and Beyond | Yiling Chen et.al. | 2410.23243 | null |
| 2024-10-30 | A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment | Matteo G. Mecattaf et.al. | 2410.23242 | null |
| 2024-10-30 | COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences | Yixin Liu et.al. | 2410.23223 | link |
| 2024-10-31 | Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval | Sheryl Hsu et.al. | 2410.23214 | null |
| 2024-10-30 | Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks | Michael Matthews et.al. | 2410.23208 | null |
| 2024-10-30 | Energy-Efficient Intra-Domain Network Slicing for Multi-Layer Orchestration in Intelligent-Driven Distributed 6G Networks: Learning Generic Assignment Skills with Unsupervised Reinforcement Learning | Navideh Ghafouri et.al. | 2410.23161 | null |
| 2024-10-30 | VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning | Yichao Liang et.al. | 2410.23156 | null |
| 2024-10-30 | From Hype to Reality: The Road Ahead of Deploying DRL in 6G Networks | Haiyuan Li et.al. | 2410.23086 | null |
| 2024-10-30 | Offline Reinforcement Learning and Sequence Modeling for Downlink Link Adaptation | Samuele Peri et.al. | 2410.23031 | null |
| 2024-10-29 | Environment as Policy: Learning to Race in Unseen Tracks | Hongze Wang et.al. | 2410.22308 | null |
| 2024-10-29 | EconoJax: A Fast & Scalable Economic Simulation in Jax | Koen Ponse et.al. | 2410.22165 | link |
| 2024-10-29 | Learning Successor Features the Simple Way | Raymond Chua et.al. | 2410.22133 | null |
| 2024-10-29 | PC-Gym: Benchmark Environments For Process Control Problems | Maximilian Bloor et.al. | 2410.22093 | null |
| 2024-10-29 | PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference | Kendong Liu et.al. | 2410.21966 | null |
| 2024-10-29 | Human-Readable Programs as Actors of Reinforcement Learning Agents Using Critic-Moderated Evolution | Senne Deproost et.al. | 2410.21940 | link |
| 2024-10-29 | Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning | Jianlan Luo et.al. | 2410.21845 | link |
| 2024-10-29 | Robot Policy Learning with Temporal Optimal Transport Reward | Yuwei Fu et.al. | 2410.21795 | link |
| 2024-10-29 | Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem | Shaan Ul Haque et.al. | 2410.21704 | null |
| 2024-10-29 | Sequential choice in ordered bundles | Rajeev Kohli et.al. | 2410.21670 | null |
| 2024-10-28 | LongReward: Improving Long-context Large Language Models with AI Feedback | Jiajie Zhang et.al. | 2410.21252 | link |
| 2024-10-28 | Quantum Reinforcement Learning-Based Two-Stage Unit Commitment Framework for Enhanced Power Systems Robustness | Xiang Wei et.al. | 2410.21240 | null |
| 2024-10-28 | Offline Reinforcement Learning With Combinatorial Action Spaces | Matthew Landers et.al. | 2410.21151 | null |
| 2024-10-28 | Robustness and Generalization in Quantum Reinforcement Learning via Lipschitz Regularization | Nico Meyer et.al. | 2410.21117 | link |
| 2024-10-28 | Dual-Agent Deep Reinforcement Learning for Dynamic Pricing and Replenishment | Yi Zheng et.al. | 2410.21109 | null |
| 2024-10-28 | Stronger Regret Bounds for Safe Online Reinforcement Learning in the Linear Quadratic Regulator | Benjamin Schiffer et.al. | 2410.21081 | null |
| 2024-10-28 | Getting By Goal Misgeneralization With a Little Help From a Mentor | Tu Trinh et.al. | 2410.21052 | null |
| 2024-10-28 | FairStream: Fair Multimedia Streaming Benchmark for Reinforcement Learning Agents | Jannis Weil et.al. | 2410.21029 | null |
| 2024-10-28 | Reference-Free Formula Drift with Reinforcement Learning: From Driving Data to Tire Energy-Inspired, Real-World Policies | Franck Djeumou et.al. | 2410.20990 | null |
| 2024-10-28 | BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks | Yunhan Zhao et.al. | 2410.20971 | null |
| 2024-10-25 | Adversarial Environment Design via Regret-Guided Diffusion Models | Hojun Chung et.al. | 2410.19715 | null |
| 2024-10-25 | DA-VIL: Adaptive Dual-Arm Manipulation with Reinforcement Learning and Variable Impedance Control | Md Faizal Karim et.al. | 2410.19712 | null |
| 2024-10-25 | MILES: Making Imitation Learning Easy with Self-Supervision | Georgios Papagiannis et.al. | 2410.19693 | null |
| 2024-10-25 | Automated generation of photonic circuits for Bell tests with homodyne measurements | Corentin Lanore et.al. | 2410.19670 | null |
| 2024-10-25 | MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services | Hongjia Wu et.al. | 2410.19665 | null |
| 2024-10-25 | Shared Control with Black Box Agents using Oracle Queries | Inbal Avraham et.al. | 2410.19612 | null |
| 2024-10-25 | OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization | Hongliang He et.al. | 2410.19609 | link |
| 2024-10-25 | Diverse Sign Language Translation | Xin Shen et.al. | 2410.19586 | null |
| 2024-10-25 | Robotic Learning in your Backyard: A Neural Simulator from Open Source Components | Liyou Zhou et.al. | 2410.19564 | null |
| 2024-10-25 | AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design | Francisco Erivaldo Fernandes Junior et.al. | 2410.19528 | null |
| 2024-10-24 | SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment | Caelan Garrett et.al. | 2410.18907 | null |
| 2024-10-24 | Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks | Graziano A. Manduzio et.al. | 2410.18890 | null |
| 2024-10-24 | Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences | Weijian Luo et.al. | 2410.18881 | null |
| 2024-10-24 | Learning Collusion in Episodic, Inventory-Constrained Markets | Paul Friedrich et.al. | 2410.18871 | null |
| 2024-10-24 | Towards Visual Text Design Transfer Across Languages | Yejin Choi et.al. | 2410.18823 | null |
| 2024-10-24 | PointPatchRL – Masked Reconstruction Improves Reinforcement Learning on Point Clouds | Balázs Gyenes et.al. | 2410.18800 | null |
| 2024-10-24 | Adapting MLOps for Diverse In-Network Intelligence in 6G Era: Challenges and Solutions | Peizheng Li et.al. | 2410.18793 | null |
| 2024-10-24 | Data Scaling Laws in Imitation Learning for Robotic Manipulation | Fanqi Lin et.al. | 2410.18647 | link |
| 2024-10-24 | Multi-agent cooperation through learning-aware policy gradients | Alexander Meulemans et.al. | 2410.18636 | null |
| 2024-10-24 | Leveraging Graph Neural Networks and Multi-Agent Reinforcement Learning for Inventory Control in Supply Chains | Niki Kotecha et.al. | 2410.18631 | null |
| 2024-10-23 | Prioritized Generative Replay | Renhao Wang et.al. | 2410.18082 | null |
| 2024-10-23 | Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration | Max Wilcoxson et.al. | 2410.18076 | link |
| 2024-10-23 | SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation | Zihan Zhou et.al. | 2410.18065 | null |
| 2024-10-23 | Cross-lingual Transfer of Reward Models in Multilingual Alignment | Jiwoo Hong et.al. | 2410.18027 | link |
| 2024-10-23 | Dynamic Spectrum Access for Ambient Backscatter Communication-assisted D2D Systems with Quantum Reinforcement Learning | Nguyen Van Huynh et.al. | 2410.17971 | null |
| 2024-10-23 | Slot: Provenance-Driven APT Detection through Graph Reinforcement Learning | Wei Qiao et.al. | 2410.17910 | null |
| 2024-10-23 | Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity | Philip Amortila et.al. | 2410.17904 | null |
| 2024-10-23 | Scalable Offline Reinforcement Learning for Mean Field Games | Axel Brunnbauer et.al. | 2410.17898 | null |
| 2024-10-23 | Learning Versatile Skills with Curriculum Masking | Yao Tang et.al. | 2410.17744 | link |
| 2024-10-23 | Optimizing Load Scheduling in Power Grids Using Reinforcement Learning and Markov Decision Processes | Dongwen Luo et.al. | 2410.17696 | null |
| 2024-10-22 | Few-shot In-Context Preference Learning Using Large Language Models | Chao Yu et.al. | 2410.17233 | null |
| 2024-10-22 | DyPNIPP: Predicting Environment Dynamics for RL-based Robust Informative Path Planning | Srujan Deolasee et.al. | 2410.17186 | null |
| 2024-10-22 | Reinforcement learning on structure-conditioned categorical diffusion for protein inverse folding | Yasha Ektefaie et.al. | 2410.17173 | link |
| 2024-10-22 | Reinforcement Learning for Data-Driven Workflows in Radio Interferometry. I. Principal Demonstration in Calibration | Brian M. Kirk et.al. | 2410.17135 | null |
| 2024-10-22 | Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards | Alexander G. Padula et.al. | 2410.17126 | link |
| 2024-10-22 | Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning | Haining Wang et.al. | 2410.17088 | link |
| 2024-10-22 | Delay-Constrained Grant-Free Random Access in MIMO Systems: Distributed Pilot Allocation and Power Control | Jianan Bai et.al. | 2410.17068 | null |
| 2024-10-22 | Optimal Design for Reward Modeling in RLHF | Antoine Scheid et.al. | 2410.17055 | null |
| 2024-10-22 | Proleptic Temporal Ensemble for Improving the Speed of Robot Tasks Generated by Imitation Learning | Hyeonjun Park et.al. | 2410.16981 | null |
| 2024-10-22 | Safe Load Balancing in Software-Defined-Networking | Lam Dinh et.al. | 2410.16846 | null |
| 2024-10-21 | Improve Vision Language Model Chain-of-thought Reasoning | Ruohong Zhang et.al. | 2410.16198 | link |
| 2024-10-21 | RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style | Yantao Liu et.al. | 2410.16184 | link |
| 2024-10-21 | SMART: Self-learning Meta-strategy Agent for Reasoning Tasks | Rongxing Liu et.al. | 2410.16128 | link |
| 2024-10-21 | Statistical Inference for Temporal Difference Learning with Linear Function Approximation | Weichen Wu et.al. | 2410.16106 | null |
| 2024-10-21 | A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models | Yue Deng et.al. | 2410.16024 | link |
| 2024-10-21 | Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality | Raghav Bongole et.al. | 2410.16013 | null |
| 2024-10-21 | ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning | Yue Yang et.al. | 2410.15994 | null |
| 2024-10-21 | Learning Quadrotor Control From Visual Features Using Differentiable Simulation | Johannes Heeg et.al. | 2410.15979 | null |
| 2024-10-21 | Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning | Hanlin Yang et.al. | 2410.15910 | null |
| 2024-10-21 | FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL | Woosung Koh et.al. | 2410.15876 | link |
| 2024-10-18 | Online Reinforcement Learning with Passive Memory | Anay Pattanaik et.al. | 2410.14665 | null |
| 2024-10-18 | A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning | Shengjie Sun et.al. | 2410.14660 | null |
| 2024-10-18 | Harnessing Causality in Reinforcement Learning With Bagged Decision Times | Daiqi Gao et.al. | 2410.14659 | null |
| 2024-10-18 | Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments | Mariusz Wisniewski et.al. | 2410.14616 | link |
| 2024-10-18 | Streaming Deep Reinforcement Learning Finally Works | Mohamed Elsayed et.al. | 2410.14606 | link |
| 2024-10-18 | Reinforcement Learning in Non-Markov Market-Making | Luca Lalor et.al. | 2410.14504 | null |
| 2024-10-18 | Transfer Reinforcement Learning in Heterogeneous Action Spaces using Subgoal Mapping | Kavinayan P. Sivakumar et.al. | 2410.14484 | null |
| 2024-10-18 | DRL Optimization Trajectory Generation via Wireless Network Intent-Guided Diffusion Models for Optimizing Resource Allocation | Junjie Wu et.al. | 2410.14481 | null |
| 2024-10-18 | From Simple to Complex: Knowledge Transfer in Safe and Efficient Reinforcement Learning for Autonomous Driving | Rongliang Zhou et.al. | 2410.14468 | null |
| 2024-10-18 | MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation | Toby Godfrey et.al. | 2410.14383 | null |
| 2024-10-17 | Diffusing States and Matching Scores: A New Framework for Imitation Learning | Runzhe Wu et.al. | 2410.13855 | link |
| 2024-10-17 | ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization | Chen Bo Calvin Zhang et.al. | 2410.13837 | link |
| 2024-10-17 | A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement | Hui Yuan et.al. | 2410.13828 | link |
| 2024-10-17 | Guided Reinforcement Learning for Robust Multi-Contact Loco-Manipulation | Jean-Pierre Sleiman et.al. | 2410.13817 | null |
| 2024-10-17 | Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible? | Argyrios Gerogiannis et.al. | 2410.13772 | null |
| 2024-10-17 | Transformer Guided Coevolution: Improved Team Formation in Multiagent Adversarial Games | Pranav Rajbhandari et.al. | 2410.13769 | null |
| 2024-10-17 | Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design | Chenyu Wang et.al. | 2410.13643 | link |
| 2024-10-17 | Ornstein-Uhlenbeck Adaptation as a Mechanism for Learning in Brains and Machines | Jesus Garcia Fernandez et.al. | 2410.13563 | null |
| 2024-10-17 | Contracting With a Reinforcement Learning Agent by Playing Trick or Treat | Matteo Bollini et.al. | 2410.13520 | null |
| 2024-10-17 | Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning | Yoav Alon et.al. | 2410.13501 | null |
| 2024-10-16 | Neural-based Control for CubeSat Docking Maneuvers | Matteo Stoisa et.al. | 2410.12703 | null |
| 2024-10-16 | Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach | Henrique Donâncio et.al. | 2410.12598 | null |
| 2024-10-16 | Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving | Sihao Wu et.al. | 2410.12568 | null |
| 2024-10-16 | Spectrum Sharing using Deep Reinforcement Learning in Vehicular Networks | Riya Dinesh Deshpande et.al. | 2410.12521 | null |
| 2024-10-16 | Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL | Jared Joselowitz et.al. | 2410.12491 | null |
| 2024-10-16 | SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling | Loris Gaven et.al. | 2410.12481 | null |
| 2024-10-16 | Sharpness-Aware Black-Box Optimization | Feiyang Ye et.al. | 2410.12457 | null |
| 2024-10-16 | AoI-Aware Resource Allocation for Smart Multi-QoS Provisioning | Jingqing Wang et.al. | 2410.12384 | null |
| 2024-10-16 | PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | Markus J. Buehler et.al. | 2410.12375 | link |
| 2024-10-16 | GAN Based Top-Down View Synthesis in Reinforcement Learning Environments | Usama Younus et.al. | 2410.12372 | null |
| 2024-10-15 | Molecular Quantum Control Algorithm Design by Reinforcement Learning | Anastasia Pipi et.al. | 2410.11839 | null |
| 2024-10-15 | Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions | Ayush Jain et.al. | 2410.11833 | null |
| 2024-10-15 | Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies | Zixuan Chen et.al. | 2410.11825 | null |
| 2024-10-15 | Solving The Dynamic Volatility Fitting Problem: A Deep Reinforcement Learning Approach | Emmanuel Gnabeyeu et.al. | 2410.11789 | null |
| 2024-10-15 | Zero-shot Model-based Reinforcement Learning using Large Language Models | Abdelhakim Benechehab et.al. | 2410.11711 | link |
| 2024-10-15 | BlendRL: A Framework for Merging Symbolic and Neural Policy Learning | Hikaru Shindo et.al. | 2410.11689 | null |
| 2024-10-15 | Understanding Likelihood Over-optimisation in Direct Alignment Algorithms | Zhengyan Shi et.al. | 2410.11677 | null |
| 2024-10-15 | Safety Filtering While Training: Improving the Performance and Sample Efficiency of Reinforcement Learning Agents | Federico Pizarro Bejarano et.al. | 2410.11671 | link |
| 2024-10-15 | Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search | Jiamian Li et.al. | 2410.11642 | null |
| 2024-10-15 | DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment | Wendi Chen et.al. | 2410.11584 | link |
| 2024-10-14 | Adaptive Diffusion Terrain Generator for Autonomous Uneven Terrain Navigation | Youwei Yu et.al. | 2410.10766 | null |
| 2024-10-14 | Online Statistical Inference for Time-varying Sample-averaged Q-learning | Saunak Kumar Panda et.al. | 2410.10737 | null |
| 2024-10-14 | Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach | Rory Young et.al. | 2410.10674 | null |
| 2024-10-14 | Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning | William A. Stigall et.al. | 2410.10660 | null |
| 2024-10-14 | DR-MPC: Deep Residual Model Predictive Control for Real-world Social Navigation | James R. Han et.al. | 2410.10646 | null |
| 2024-10-14 | Traversability-Aware Legged Navigation by Learning from Real-World Visual Data | Hongbo Zhang et.al. | 2410.10621 | null |
| 2024-10-14 | Online waveform selection for cognitive radar | Thulasi Tholeti et.al. | 2410.10591 | null |
| 2024-10-14 | STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with FeedBack | Naman Gupta et.al. | 2410.10584 | null |
| 2024-10-14 | Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes | Juan Sebastian Rojas et.al. | 2410.10578 | null |
| 2024-10-14 | Continual Deep Reinforcement Learning to Prevent Catastrophic Forgetting in Jamming Mitigation | Kemal Davaslioglu et.al. | 2410.10521 | null |
| 2024-10-11 | Hierarchical Universal Value Function Approximators | Rushiv Arora et.al. | 2410.08997 | null |
| 2024-10-11 | Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control | Devdhar Patel et.al. | 2410.08979 | null |
| 2024-10-11 | MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL | Claas A Voelcker et.al. | 2410.08896 | null |
| 2024-10-11 | Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient | Wenlong Wang et.al. | 2410.08893 | link |
| 2024-10-11 | Adaptive optimization of wave energy conversion in oscillatory wave surge converters via SPH simulation and deep reinforcement learning | Mai Ye et.al. | 2410.08871 | null |
| 2024-10-11 | Can we hop in general? A discussion of benchmark selection and design using the Hopper environment | Claas A Voelcker et.al. | 2410.08870 | null |
| 2024-10-11 | Hybrid LLM-DDQN based Joint Optimization of V2I Communication and Autonomous Driving | Zijiang Yan et.al. | 2410.08854 | null |
| 2024-10-11 | Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback | Michelle Zhao et.al. | 2410.08852 | null |
| 2024-10-11 | Public Transport Network Design for Equality of Accessibility via Message Passing Neural Networks and Reinforcement Learning | Duo Wang et.al. | 2410.08841 | null |
| 2024-10-11 | SOLD: Reinforcement Learning with Slot Object-Centric Latent Dynamics | Malte Mosbach et.al. | 2410.08822 | null |
| 2024-10-10 | GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment | Yuancheng Xu et.al. | 2410.08193 | null |
| 2024-10-10 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | Amrith Setlur et.al. | 2410.08146 | null |
| 2024-10-10 | VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers | Jianing Qi et.al. | 2410.08048 | null |
| 2024-10-10 | Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching | Xiaoshan Lin et.al. | 2410.08022 | null |
| 2024-10-10 | Neuroplastic Expansion in Deep Reinforcement Learning | Jiashun Liu et.al. | 2410.07994 | null |
| 2024-10-10 | Variational Inequality Methods for Multi-Agent Reinforcement Learning: Performance and Stability Gains | Baraah A. M. Sidahmed et.al. | 2410.07976 | null |
| 2024-10-10 | AI Surrogate Model for Distributed Computing Workloads | David K. Park et.al. | 2410.07940 | null |
| 2024-10-10 | Offline Hierarchical Reinforcement Learning via Inverse Optimization | Carolin Schmidt et.al. | 2410.07933 | null |
| 2024-10-10 | Efficient Reinforcement Learning with Large Language Model Priors | Xue Yan et.al. | 2410.07927 | null |
| 2024-10-10 | Meta-Learning Integration in Hierarchical Reinforcement Learning for Advanced Task Complexity | Arash Khajooeinejad et.al. | 2410.07921 | link |
| 2024-10-09 | One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation | Fabian Paischer et.al. | 2410.07170 | null |
| 2024-10-09 | Retrieval-Augmented Decision Transformer: External Memory for In-context RL | Thomas Schmied et.al. | 2410.07071 | null |
| 2024-10-09 | Safe Reinforcement Learning Filter for Multicopter Collision-Free Tracking under disturbances | Qihan Qi et.al. | 2410.06852 | null |
| 2024-10-09 | A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering | Qihan Qi et.al. | 2410.06847 | null |
| 2024-10-09 | Transfer Learning for a Class of Cascade Dynamical Systems | Shima Rabiei et.al. | 2410.06828 | null |
| 2024-10-09 | Deep End-to-End Survival Analysis with Temporal Consistency | Mariana Vargas Vieyra et.al. | 2410.06786 | null |
| 2024-10-09 | Q-WSL: Leveraging Dynamic Programming for Weighted Supervised Learning in Goal-conditioned RL | Xing Lei et.al. | 2410.06648 | null |
| 2024-10-09 | Variations in Multi-Agent Actor-Critic Frameworks for Joint Optimizations in UAV Swarm Networks: Recent Evolution, Challenges, and Directions | Muhammad Morshed Alam et.al. | 2410.06627 | null |
| 2024-10-09 | Effective Exploration Based on the Structural Information Principles | Xianghua Zeng et.al. | 2410.06621 | null |
| 2024-10-09 | Disturbance Observer-based Control Barrier Functions with Residual Model Learning for Safe Reinforcement Learning | Dvij Kalaria et.al. | 2410.06570 | null |
| 2024-10-07 | DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control | Kaifeng Zhao et.al. | 2410.05260 | null |
| 2024-10-07 | SePPO: Semi-Policy Preference Optimization for Diffusion Alignment | Daoan Zhang et.al. | 2410.05255 | link |
| 2024-10-07 | ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control | Ehsan Futuhi et.al. | 2410.05225 | null |
| 2024-10-07 | Smart Jamming Attack and Mitigation on Deep Transfer Reinforcement Learning Enabled Resource Allocation for Network Slicing | Shavbo Salehi et.al. | 2410.05153 | null |
| 2024-10-07 | PAMLR: A Passive-Active Multi-Armed Bandit-Based Solution for LoRa Channel Allocation | Jihoon Yun et.al. | 2410.05147 | null |
| 2024-10-07 | Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning | Ayano Hiranaka et.al. | 2410.05116 | null |
| 2024-10-07 | AlphaRouter: Quantum Circuit Routing with Reinforcement Learning and Tree Search | Wei Tang et.al. | 2410.05115 | null |
| 2024-10-07 | Reinforcement Learning Control for Autonomous Hydraulic Material Handling Machines with Underactuated Tools | Filippo A. Spinelli et.al. | 2410.05093 | null |
| 2024-10-07 | HE-Drive: Human-Like End-to-End Driving with Vision Language Models | Junming Wang et.al. | 2410.05051 | null |
| 2024-10-07 | Active Fine-Tuning of Generalist Policies | Marco Bagatella et.al. | 2410.05026 | null |
| 2024-10-04 | Learning Humanoid Locomotion over Challenging Terrain | Ilija Radosavovic et.al. | 2410.03654 | null |
| 2024-10-04 | Aligning LLMs with Individual Preferences via Interaction | Shujin Wu et.al. | 2410.03642 | link |
| 2024-10-04 | Robust Offline Imitation Learning from Diverse Auxiliary Data | Udita Ghosh et.al. | 2410.03626 | null |
| 2024-10-04 | Open-World Reinforcement Learning over Long Short-Term Imagination | Jiajian Li et.al. | 2410.03618 | null |
| 2024-10-04 | Training on more Reachable Tasks for Generalisation in Reinforcement Learning | Max Weltevrede et.al. | 2410.03565 | null |
| 2024-10-04 | GAP-RL: Grasps As Points for RL Towards Dynamic Object Grasping | Pengwei Xie et.al. | 2410.03509 | null |
| 2024-10-04 | STREAMS: An Assistive Multimodal AI Framework for Empowering Biosignal Based Robotic Controls | Ali Rabiee et.al. | 2410.03486 | null |
| 2024-10-04 | Deep Reinforcement Learning for Delay-Optimized Task Offloading in Vehicular Fog Computing | Mohammad Parsa Toopchinezhad et.al. | 2410.03472 | null |
| 2024-10-04 | CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control | Guy Tevet et.al. | 2410.03441 | link |
| 2024-10-04 | ToolGen: Unified Tool Retrieval and Calling via Generation | Renxi Wang et.al. | 2410.03439 | link |
| 2024-10-03 | ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI | Ahmad Elawady et.al. | 2410.02751 | link |
| 2024-10-03 | MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions | Yekun Chai et.al. | 2410.02743 | link |
| 2024-10-03 | DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects | Zhaowei Wang et.al. | 2410.02730 | link |
| 2024-10-03 | Grounded Answers for Multi-agent Decision-making Problem through Generative World Model | Zeyang Liu et.al. | 2410.02664 | null |
| 2024-10-03 | Beyond Expected Returns: A Policy Gradient Algorithm for Cumulative Prospect Theoretic Reinforcement Learning | Olivier Lepel et.al. | 2410.02605 | null |
| 2024-10-03 | Boosting Sample Efficiency and Generalization in Multi-agent Reinforcement Learning via Equivariance | Joshua McClellan et.al. | 2410.02581 | null |
| 2024-10-03 | Machine Learning Approaches for Active Queue Management: A Survey, Taxonomy, and Future Directions | Mohammad Parsa Toopchinezhad et.al. | 2410.02563 | null |
| 2024-10-03 | Semantic-Guided RL for Interpretable Feature Engineering | Mohamed Bouadi et.al. | 2410.02519 | null |
| 2024-10-03 | Learning Emergence of Interaction Patterns across Independent RL Agents in Multi-Agent Environments | Vasanth Reddy Baddam et.al. | 2410.02516 | null |
| 2024-10-03 | A Hitchhiker’s Guide To Active Motion | Tobias Plasczyk et.al. | 2410.02515 | null |
| 2024-10-02 | Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space | Yangming Li et.al. | 2410.01796 | null |
| 2024-10-02 | Open Human-Robot Collaboration using Decentralized Inverse Reinforcement Learning | Prasanth Sengadu Suresh et.al. | 2410.01790 | null |
| 2024-10-02 | Investigating on RLHF methodology | Alexey Kutalev et.al. | 2410.01789 | null |
| 2024-10-02 | Social coordination perpetuates stereotypic expectations and behaviors across generations in deep multi-agent reinforcement learning | Rebekah A. Gelpí et.al. | 2410.01763 | null |
| 2024-10-02 | PreND: Enhancing Intrinsic Motivation in Reinforcement Learning through Pre-trained Network Distillation | Mohammadamin Davoodabadi et.al. | 2410.01745 | null |
| 2024-10-02 | Mimicking Human Intuition: Cognitive Belief-Driven Q-Learning | Xingrui Gu et.al. | 2410.01739 | null |
| 2024-10-02 | Evaluating Robustness of Reward Models for Mathematical Reasoning | Sunghwan Kim et.al. | 2410.01729 | null |
| 2024-10-02 | Performant, Memory Efficient and Scalable Multi-Agent Reinforcement Learning | Omayma Mahjoub et.al. | 2410.01706 | null |
| 2024-10-02 | VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | Amirhossein Kazemnejad et.al. | 2410.01679 | link |
| 2024-10-02 | Finding path and cycle counting formulae in graphs with Deep Reinforcement Learning | Jason Piquenot et.al. | 2410.01661 | null |
| 2024-09-30 | Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning | Zhishuai Liu et.al. | 2409.20521 | null |
| 2024-09-30 | Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation | Fukang Liu et.al. | 2409.20514 | null |
| 2024-09-30 | The Perfect Blend: Redefining RLHF with Mixture of Judges | Tengyu Xu et.al. | 2409.20370 | null |
| 2024-10-01 | Enhancing GANs with Contrastive Learning-Based Multistage Progressive Finetuning SNN and RL-Based External Optimization | Osama Mustafa et.al. | 2409.20340 | null |
| 2024-09-30 | MARLadona – Towards Cooperative Team Play Using Multi-Agent Reinforcement Learning | Zichong Li et.al. | 2409.20326 | null |
| 2024-09-30 | RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning | Yuxuan Wu et.al. | 2409.20291 | null |
| 2024-09-30 | Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning | Junlin Lu et.al. | 2409.20258 | link |
| 2024-09-30 | Professor X: Manipulating EEG BCI with Invisible and Robust Backdoor Attack | Xuan-Hao Liu et.al. | 2409.20158 | null |
| 2024-09-30 | GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation | Yangtao Chen et.al. | 2409.20154 | null |
| 2024-09-30 | DRLinSPH: An open-source platform using deep reinforcement learning and SPHinXsys for fluid-structure-interaction problems | Mai Ye et.al. | 2409.20134 | null |
| 2024-09-27 | Robust Deep Reinforcement Learning for Volt-VAR Optimization in Active Distribution System under Uncertainty | Zhengrong Chen et.al. | 2409.18937 | null |
| 2024-09-27 | HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models | Yu Zhou et.al. | 2409.18893 | null |
| 2024-09-27 | ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning | Jannis Becktepe et.al. | 2409.18827 | link |
| 2024-09-27 | LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis | Hamed Babaei Giglou et.al. | 2409.18812 | null |
| 2024-09-27 | Autoregressive Policy Optimization for Constrained Allocation Tasks | David Winkel et.al. | 2409.18735 | link |
| 2024-09-27 | Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning | Sheikh Salman Hassan et.al. | 2409.18718 | null |
| 2024-09-27 | Refutation of Spectral Graph Theory Conjectures with Search Algorithms | Milo Roucairol et.al. | 2409.18626 | null |
| 2024-09-27 | TemporalPaD: a reinforcement-learning framework for temporal feature representation and dimension reduction | Xuechen Mu et.al. | 2409.18597 | null |
| 2024-09-27 | Climate Adaptation with Reinforcement Learning: Experiments with Flooding and Transportation in Copenhagen | Miguel Costa et.al. | 2409.18574 | null |
| 2024-09-27 | Cost-Aware Dynamic Cloud Workflow Scheduling using Self-Attention and Evolutionary Reinforcement Learning | Ya Shen et.al. | 2409.18444 | null |
| 2024-09-26 | Inverse Reinforcement Learning with Multiple Planning Horizons | Jiayu Yao et.al. | 2409.18051 | null |
| 2024-09-26 | Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles | Lewei He et.al. | 2409.18014 | null |
| 2024-09-26 | LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots | Peilin Wu et.al. | 2409.17992 | null |
| 2024-09-26 | Navigation in a simplified Urban Flow through Deep Reinforcement Learning | Federica Tonti et.al. | 2409.17922 | null |
| 2024-09-26 | Model-Free versus Model-Based Reinforcement Learning for Fixed-Wing UAV Attitude Control Under Varying Wind Conditions | David Olivares et.al. | 2409.17896 | null |
| 2024-09-26 | Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness | Jian Li et.al. | 2409.17791 | link |
| 2024-09-26 | Robust Ladder Climbing with a Quadrupedal Robot | Dylan Vogel et.al. | 2409.17731 | null |
| 2024-09-26 | Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization | Kaden Uhlig et.al. | 2409.17673 | null |
| 2024-09-26 | Hierarchical End-to-End Autonomous Driving: Integrating BEV Perception with Deep Reinforcement Learning | Siyi Lu et.al. | 2409.17659 | null |
| 2024-09-26 | FactorSim: Generative Simulation via Factorized Representation | Fan-Yun Sun et.al. | 2409.17652 | null |
| 2024-09-25 | Learning with Dynamics: Autonomous Regulation of UAV Based Communication Networks with Dynamic UAV Crew | Ran Zhang et.al. | 2409.17139 | null |
| 2024-09-25 | Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action | Xin Chen et.al. | 2409.17138 | null |
| 2024-09-25 | On-orbit Servicing for Spacecraft Collision Avoidance With Autonomous Decision Making | Susmitha Patnala et.al. | 2409.17125 | null |
| 2024-09-25 | AI-Driven Risk-Aware Scheduling for Active Debris Removal Missions | Antoine Poupon et.al. | 2409.17012 | null |
| 2024-09-25 | Multi-Robot Informative Path Planning for Efficient Target Mapping using Deep Reinforcement Learning | Apoorva Vashisth et.al. | 2409.16967 | link |
| 2024-09-25 | Dynamic Obstacle Avoidance through Uncertainty-Based Adaptive Planning with Diffusion | Vineet Punyamoorty et.al. | 2409.16950 | null |
| 2024-09-25 | Enhancing Temporal Sensitivity and Reasoning for Time-Sensitive Question Answering | Wanqi Yang et.al. | 2409.16909 | null |
| 2024-09-25 | Revisiting Space Mission Planning: A Reinforcement Learning-Guided Approach for Multi-Debris Rendezvous | Agni Bandyopadhyay et.al. | 2409.16882 | null |
| 2024-09-25 | Behavior evolution-inspired approach to walking gait reinforcement training for quadruped robots | Yu Wang et.al. | 2409.16862 | null |
| 2024-09-25 | Asynchronous Fractional Multi-Agent Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing | Lyudong Jin et.al. | 2409.16832 | null |
| 2024-09-24 | A Critical Review of Safe Reinforcement Learning Techniques in Smart Grid Applications | Van-Hai Bui et.al. | 2409.16256 | null |
| 2024-09-24 | Context-Based Meta Reinforcement Learning for Robust and Adaptable Peg-in-Hole Assembly Tasks | Ahmed Shokry et.al. | 2409.16208 | null |
| 2024-09-24 | Microsecond-Latency Feedback at a Particle Accelerator by Online Reinforcement Learning on Hardware | Luca Scomparin et.al. | 2409.16177 | null |
| 2024-09-24 | The Digital Transformation in Health: How AI Can Improve the Performance of Health Systems | África Periáñez et.al. | 2409.16098 | null |
| 2024-09-24 | Whole-body end-effector pose tracking | Tifanny Portela et.al. | 2409.16048 | null |
| 2024-09-24 | Bridging Environments and Language with Rendering Functions and Vision-Language Models | Theo Cachet et.al. | 2409.16024 | null |
| 2024-09-24 | Provably Efficient Exploration in Inverse Constrained Reinforcement Learning | Bo Yue et.al. | 2409.15963 | null |
| 2024-09-24 | Overcoming Reward Model Noise in Instruction-Guided Reinforcement Learning | Sukai Huang et.al. | 2409.15922 | null |
| 2024-09-24 | Multi-UAV Pursuit-Evasion with Online Planning in Unknown Environments by Deep Reinforcement Learning | Jiayu Chen et.al. | 2409.15866 | null |
| 2024-09-24 | Adaptive Learn-then-Test: Statistically Valid and Efficient Hyperparameter Selection | Matteo Zecchin et.al. | 2409.15844 | null |
| 2024-09-18 | DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control | Zichen Jeff Cui et.al. | 2409.12192 | null |
| 2024-09-18 | Robots that Learn to Safely Influence via Prediction-Informed Reach-Avoid Dynamic Games | Ravi Pandya et.al. | 2409.12153 | null |
| 2024-09-18 | Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features | Jiuqi Wang et.al. | 2409.12135 | null |
| 2024-09-18 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | An Yang et.al. | 2409.12122 | null |
| 2024-09-18 | IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition | Rui Liu et.al. | 2409.12092 | null |
| 2024-09-18 | Generalized Robot Learning Framework | Jiahuan Yan et.al. | 2409.12061 | null |
| 2024-09-23 | Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning | Jonas Günster et.al. | 2409.12045 | link |
| 2024-09-18 | Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning | Claude Formanek et.al. | 2409.12001 | null |
| 2024-09-18 | Data-Efficient Quadratic Q-Learning Using LMIs | J. S. van Hulst et.al. | 2409.11986 | null |
| 2024-09-18 | Reinforcement Learning with Lie Group Orientations for Robotics | Martin Schuck et.al. | 2409.11935 | null |
| 2024-09-17 | UniLCD: Unified Local-Cloud Decision-Making via Reinforcement Learning | Kathakoli Sengupta et.al. | 2409.11403 | null |
| 2024-09-17 | Integrating Reinforcement Learning and Model Predictive Control with Applications to Microgrids | Caio Fabio Oliveira da Silva et.al. | 2409.11267 | null |
| 2024-09-17 | Attacking Slicing Network via Side-channel Reinforcement Learning Attack | Wei Shao et.al. | 2409.11258 | null |
| 2024-09-17 | LLM-as-a-Judge & Reward Model: What They Can and Cannot Do | Guijin Son et.al. | 2409.11239 | null |
| 2024-09-17 | Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems | Jake Welde et.al. | 2409.11238 | null |
| 2024-09-17 | Linear Jamming Bandits: Learning to Jam 5G-based Coded Communications Systems | Zachary Schutz et.al. | 2409.11191 | null |
| 2024-09-17 | Preventing Unconstrained CBF Safety Filters Caused by Invalid Relative Degree Assumptions | Lukas Brunke et.al. | 2409.11171 | null |
| 2024-09-17 | Co-Designing Tools and Control Policies for Robust Manipulation | Yifei Dong et.al. | 2409.11113 | null |
| 2024-09-17 | Reactive Environments for Active Inference Agents with RxEnvironments.jl | Wouter W. L. Nuijten et.al. | 2409.11087 | link |
| 2024-09-17 | A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler | Nazim Bendib et.al. | 2409.11068 | null |
| 2024-09-16 | Instigating Cooperation among LLM Agents Using Adaptive Information Modulation | Qiliang Chen et.al. | 2409.10372 | null |
| 2024-09-16 | Catch It! Learning to Catch in Flight with Mobile Dexterous Hands | Yuanhang Zhang et.al. | 2409.10319 | null |
| 2024-09-16 | ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework | Jiahao Yuan et.al. | 2409.10289 | null |
| 2024-09-16 | Safety-Oriented Pruning and Interpretation of Reinforcement Learning Policies | Dennis Gross et.al. | 2409.10218 | null |
| 2024-09-16 | Enhancing RL Safety with Counterfactual LLM Reasoning | Dennis Gross et.al. | 2409.10188 | null |
| 2024-09-16 | Safe and Stable Closed-Loop Learning for Neural-Network-Supported Model Predictive Control | Sebastian Hirt et.al. | 2409.10171 | null |
| 2024-09-16 | Quantile Regression for Distributional Reward Models in RLHF | Nicolai Dorka et.al. | 2409.10164 | link |
| 2024-09-16 | Robust Reinforcement Learning with Dynamic Distortion Risk Measures | Anthony Coache et.al. | 2409.10096 | null |
| 2024-09-16 | Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments | Wessel Ledder et.al. | 2409.10048 | null |
| 2024-09-16 | Reinforcement learning-based statistical search strategy for an axion model from flavor | Satsuki Nishimura et.al. | 2409.10023 | null |
| 2024-09-13 | The unknotting number, hard unknot diagrams, and reinforcement learning | Taylor Applebaum et.al. | 2409.09032 | null |
| 2024-09-13 | Modeling Rational Adaptation of Visual Search to Hierarchical Structures | Saku Sourulahti et.al. | 2409.08967 | null |
| 2024-09-13 | Average-Reward Maximum Entropy Reinforcement Learning for Underactuated Double Pendulum Tasks | Jean Seong Bjorn Choe et.al. | 2409.08938 | null |
| 2024-09-13 | AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models | Yifei Yao et.al. | 2409.08904 | null |
| 2024-09-13 | Deep reinforcement learning for tracking a moving target in jellyfish-like swimming | Yihao Chen et.al. | 2409.08815 | null |
| 2024-09-13 | DexSim2Real$^{2}$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation | Taoran Jiang et.al. | 2409.08750 | null |
| 2024-09-13 | Quasimetric Value Functions with Dense Rewards | Khadichabonu Valieva et.al. | 2409.08724 | null |
| 2024-09-13 | Secure Offloading in NOMA-Aided Aerial MEC Systems Based on Deep Reinforcement Learning | Hongjiang Lei et.al. | 2409.08579 | null |
| 2024-09-13 | Batch Ensemble for Variance Dependent Regret in Stochastic Bandits | Asaf Cassel et.al. | 2409.08570 | null |
| 2024-09-13 | OIDM: An Observability-based Intelligent Distributed Edge Sensing Method for Industrial Cyber-Physical Systems | Shigeng Wang et.al. | 2409.08549 | null |
| 2024-09-12 | Hand-Object Interaction Pretraining from Videos | Himanshu Gaurav Singh et.al. | 2409.08273 | null |
| 2024-09-12 | Multi-Model based Federated Learning Against Model Poisoning Attack: A Deep Learning Based Model Selection for MEC Systems | Somayeh Kianpisheh et.al. | 2409.08237 | null |
| 2024-09-12 | Towards Online Safety Corrections for Robotic Manipulation Policies | Ariana Spalter et.al. | 2409.08233 | null |
| 2024-09-12 | Design Optimization of Nuclear Fusion Reactor through Deep Reinforcement Learning | Jinsu Kim et.al. | 2409.08231 | null |
| 2024-09-12 | Adaptive Language-Guided Abstraction from Contrastive Explanations | Andi Peng et.al. | 2409.08212 | null |
| 2024-09-12 | Optimal Management of Grid-Interactive Efficient Buildings via Safe Reinforcement Learning | Xiang Huo et.al. | 2409.08132 | null |
| 2024-09-12 | Linear Complementary Dual Codes Constructed from Reinforcement Learning | Yansheng Wu et.al. | 2409.08114 | null |
| 2024-09-12 | Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning | Teng Yan et.al. | 2409.08062 | null |
| 2024-09-12 | Learning Causally Invariant Reward Functions from Diverse Demonstrations | Ivan Ovinnikov et.al. | 2409.08012 | null |
| 2024-09-12 | Digital Twin for Autonomous Guided Vehicles based on Integrated Sensing and Communications | Van-Phuc Bui et.al. | 2409.08005 | null |
| 2024-09-11 | Autonomous loading of ore piles with Load-Haul-Dump machines using Deep Reinforcement Learning | Rodrigo Salas et.al. | 2409.07449 | null |
| 2024-09-11 | Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation | Luo Ji et.al. | 2409.07416 | null |
| 2024-09-11 | Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching | Eugenio Chisari et.al. | 2409.07343 | null |
| 2024-09-11 | Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence | Luo Ji et.al. | 2409.07341 | null |
| 2024-09-11 | A Framework for Predicting the Impact of Game Balance Changes through Meta Discovery | Akash Saravanan et.al. | 2409.07340 | null |
| 2024-09-11 | Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences | Ziang Liu et.al. | 2409.07268 | null |
| 2024-09-11 | Perceptive Pedipulation with Local Obstacle Avoidance | Jonas Stolle et.al. | 2409.07195 | null |
| 2024-09-11 | A Perspective on AI-Guided Molecular Simulations in VR: Exploring Strategies for Imitation Learning in Hyperdimensional Molecular Systems | Mohamed Dhouioui et.al. | 2409.07189 | null |
| 2024-09-11 | Learning Efficient Recursive Numeral Systems via Reinforcement Learning | Jonathan D. Thomas et.al. | 2409.07170 | null |
| 2024-09-11 | DCMAC: Demand-aware Customized Multi-Agent Communication via Upper Bound Training | Dongkun Huo et.al. | 2409.07127 | null |
| 2024-09-10 | DemoStart: Demonstration-led auto-curriculum applied to sim-to-real with multi-fingered robots | Maria Bauza et.al. | 2409.06613 | null |
| 2024-09-10 | Advancements in Gesture Recognition Techniques and Machine Learning for Enhanced Human-Robot Interaction: A Comprehensive Review | Sajjad Hussain et.al. | 2409.06503 | null |
| 2024-09-10 | Superior Computer Chess with Model Predictive Control, Reinforcement Learning, and Rollout | Atharva Gundawar et.al. | 2409.06477 | null |
| 2024-09-10 | Learning Generative Interactive Environments By Trained Agent Exploration | Naser Kazemi et.al. | 2409.06445 | link |
| 2024-09-10 | Length Desensitization in Directed Preference Optimization | Wei Liu et.al. | 2409.06411 | null |
| 2024-09-10 | One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion | Nico Bohlinger et.al. | 2409.06366 | null |
| 2024-09-10 | Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning | Shreyas S R et.al. | 2409.06356 | null |
| 2024-09-10 | Learning Augmentation Policies from A Model Zoo for Time Series Forecasting | Haochen Yuan et.al. | 2409.06282 | null |
| 2024-09-09 | Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments | Haritheja Etukuru et.al. | 2409.05865 | link |
| 2024-09-09 | An Introduction to Quantum Reinforcement Learning (QRL) | Samuel Yen-Chi Chen et.al. | 2409.05846 | null |
| 2024-09-09 | Learning control of underactuated double pendulum with Model-Based Reinforcement Learning | Niccolò Turcato et.al. | 2409.05811 | null |
| 2024-09-09 | Markov Chain Variance Estimation: A Stochastic Approximation Approach | Shubhada Agrawal et.al. | 2409.05733 | null |
| 2024-09-09 | Cooperative Decision-Making for CAVs at Unsignalized Intersections: A MARL Approach with Attention and Hierarchical Game Priors | Jiaqi Liu et.al. | 2409.05712 | null |
| 2024-09-09 | Interactive incremental learning of generalizable skills with local trajectory modulation | Markus Knauer et.al. | 2409.05655 | null |
| 2024-09-09 | Forward KL Regularized Preference Optimization for Aligning Diffusion Policies | Zhao Shan et.al. | 2409.05622 | null |
| 2024-09-09 | Adaptive Multi-Layer Deployment for A Digital Twin Empowered Satellite-Terrestrial Integrated Network | Yihong Tao et.al. | 2409.05480 | null |
| 2024-09-09 | Reinforcement Learning for Variational Quantum Circuits Design | Simone Foderà et.al. | 2409.05475 | null |
| 2024-09-09 | Semifactual Explanations for Reinforcement Learning | Jasmina Gajcin et.al. | 2409.05435 | null |
| 2024-09-06 | RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs | Jiaxing Wu et.al. | 2409.04421 | null |
| 2024-09-06 | Gaussian-Mixture-Model Q-Functions for Reinforcement Learning by Riemannian Optimization | Minh Vu et.al. | 2409.04374 | null |
| 2024-09-06 | Refined Bounds on Near Optimality Finite Window Policies in POMDPs and Their Reinforcement Learning | Yunus Emre Demirci et.al. | 2409.04351 | null |
| 2024-09-06 | Advancing Multi-Organ Disease Care: A Hierarchical Multi-Agent Reinforcement Learning Framework | Daniel J. Tan et.al. | 2409.04224 | null |
| 2024-09-06 | The Prevalence of Neural Collapse in Neural Multivariate Regression | George Andriopoulos et.al. | 2409.04180 | null |
| 2024-09-06 | Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering | Jan Hofmann et.al. | 2409.04122 | null |
| 2024-09-05 | DRAL: Deep Reinforcement Adaptive Learning for Multi-UAVs Navigation in Unknown Indoor Environment | Kangtong Mo et.al. | 2409.03930 | null |
| 2024-09-05 | Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning | Huizhen Yu et.al. | 2409.03915 | null |
| 2024-09-05 | On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments | Muxing Wang et.al. | 2409.03897 | null |
| 2024-09-05 | Multi-agent Path Finding for Mixed Autonomy Traffic Coordination | Han Zheng et.al. | 2409.03881 | null |
| 2024-09-05 | Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron | Christian Schmid et.al. | 2409.03749 | null |
| 2024-09-05 | Differentiable Discrete Event Simulation for Queuing Network Control | Ethan Che et.al. | 2409.03740 | null |
| 2024-09-05 | On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization | Yong Lin et.al. | 2409.03650 | null |
| 2024-09-05 | 1 Modular Parallel Manipulator for Long-Term Soft Robotic Data Collection | Kiyn Chin et.al. | 2409.03614 | null |
| 2024-09-05 | CHIRPs: Change-Induced Regret Proxy metrics for Lifelong Reinforcement Learning | John Birkbeck et.al. | 2409.03577 | null |
| 2024-09-05 | Sparsifying Parametric Models with L0 Regularization | Nicolò Botteghi et.al. | 2409.03489 | null |
| 2024-09-05 | Reinforcement Learning Approach to Optimizing Profilometric Sensor Trajectories for Surface Inspection | Sara Roos-Hoefgeest et.al. | 2409.03429 | null |
| 2024-09-05 | Game On: Towards Language Models as RL Experimenters | Jingwei Zhang et.al. | 2409.03402 | null |
| 2024-09-05 | ELO-Rated Sequence Rewards: Advancing Reinforcement Learning Models | Qi Ju et.al. | 2409.03301 | link |
| 2024-09-05 | Robust synchronization and policy adaptation for networked heterogeneous agents | Miguel F. Arevalo-Castiblanco et.al. | 2409.03273 | null |
| 2024-09-04 | Hybrid Imitation-Learning Motion Planner for Urban Driving | Cristian Gariboldi et.al. | 2409.02871 | null |
| 2024-09-04 | Knowledge Transfer for Collaborative Misbehavior Detection in Untrusted Vehicular Environments | Roshan Sedar et.al. | 2409.02844 | null |
| 2024-09-04 | Tractable Offline Learning of Regular Decision Processes | Ahana Deb et.al. | 2409.02747 | null |
| 2024-09-04 | Surgical Task Automation Using Actor-Critic Frameworks and Self-Supervised Imitation Learning | Jingshuai Liu et.al. | 2409.02724 | null |
| 2024-09-04 | Decision Transformer for Enhancing Neural Local Search on the Job Shop Scheduling Problem | Constantin Waubert de Puiseau et.al. | 2409.02697 | null |
| 2024-09-04 | Causality-Aware Transformer Networks for Robotic Navigation | Ruoyu Wang et.al. | 2409.02669 | null |
| 2024-09-04 | A Survey on Emergent Language | Jannik Peters et.al. | 2409.02645 | null |
| 2024-09-04 | Mamba as a motion encoder for robotic imitation learning | Toshiaki Tsuji et.al. | 2409.02636 | null |
| 2024-09-04 | Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal | Jifeng Hu et.al. | 2409.02512 | null |
| 2024-09-04 | USV-AUV Collaboration Framework for Underwater Tasks under Extreme Sea Conditions | Jingzehua Xu et.al. | 2409.02444 | null |
| 2024-08-30 | Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control | Zihao Sheng et.al. | 2408.17380 | link |
| 2024-08-30 | Stationary Policies are Optimal in Risk-averse Total-reward MDPs with EVaR | Xihong Su et.al. | 2408.17286 | null |
| 2024-08-30 | Using Quantum Solved Deep Boltzmann Machines to Increase the Data Efficiency of RL Agents | Daniel Kent et.al. | 2408.17240 | null |
| 2024-08-30 | MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models | Yujing Wang et.al. | 2408.17072 | null |
| 2024-08-30 | Efficient Camera Exposure Control for Visual Odometry via Deep Reinforcement Learning | Shuyang Zhang et.al. | 2408.17005 | link |
| 2024-08-30 | A Tighter Convergence Proof of Reverse Experience Replay | Nan Jiang et.al. | 2408.16999 | link |
| 2024-08-30 | Discovery of False Data Injection Schemes on Frequency Controllers with Reinforcement Learning | Romesh Prasad et.al. | 2408.16958 | null |
| 2024-08-29 | FlowRetrieval: Flow-Guided Data Retrieval for Few-Shot Imitation Learning | Li-Heng Lin et.al. | 2408.16944 | null |
| 2024-08-29 | Manipulating OpenFlow Link Discovery Packet Forwarding for Topology Poisoning | Mingming Chen et.al. | 2408.16940 | null |
| 2024-08-29 | Coverage Analysis of Multi-Environment Q-Learning Algorithms for Wireless Network Optimization | Talha Bozkus et.al. | 2408.16882 | null |
| 2024-08-29 | Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models | Alec Solway et.al. | 2408.16753 | null |
| 2024-08-29 | A GREAT Architecture for Edge-Based Graph Problems Like TSP | Attila Lischka et.al. | 2408.16717 | null |
| 2024-08-29 | RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model | Zhuan Shi et.al. | 2408.16634 | null |
| 2024-08-29 | Optimizing Automated Picking Systems in Warehouse Robots Using Machine Learning | Keqin Li et.al. | 2408.16633 | null |
| 2024-08-29 | Phase Optimization and Relay Selection for Joint Relay and IRS-Assisted Communication | Uyoata E. Uyoata et.al. | 2408.16399 | null |
| 2024-08-29 | EasyChauffeur: A Baseline Advancing Simplicity and Efficiency on Waymax | Lingyu Xiao et.al. | 2408.16375 | null |
| 2024-08-29 | Efficient Multi-agent Navigation with Lightweight DRL Policy | Xingrong Diao et.al. | 2408.16370 | null |
| 2024-08-29 | On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes | Yi Wan et.al. | 2408.16262 | null |
| 2024-08-28 | DECAF: a Discrete-Event based Collaborative Human-Robot Framework for Furniture Assembly | Giulio Giacomuzzo et.al. | 2408.16125 | null |
| 2024-08-28 | RAIN: Reinforcement Algorithms for Improving Numerical Weather and Climate Models | Pritthijit Nath et.al. | 2408.16118 | link |
| 2024-08-28 | In-Context Imitation Learning via Next-Token Prediction | Letian Fu et.al. | 2408.15980 | link |
| 2024-08-28 | Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games | Nicholas R. Waytowich et.al. | 2408.15950 | null |
| 2024-08-28 | DeMoBot: Deformable Mobile Manipulation with Vision-based Sub-goal Retrieval | Yuying Zhang et.al. | 2408.15919 | null |
| 2024-08-28 | Adaptive Traffic Signal Control Using Reinforcement Learning | Muhammad Tahir Rafique et.al. | 2408.15751 | null |
| 2024-08-28 | Deep Reinforcement Learning for Radiative Heat Transfer Optimization Problems | Eva Ortiz-Mansilla et.al. | 2408.15727 | null |
| 2024-08-28 | Comparison of Model Predictive Control and Proximal Policy Optimization for a 1-DOF Helicopter System | Georg Schäfer et.al. | 2408.15633 | null |
| 2024-08-28 | Structural Optimization of Lightweight Bipedal Robot via SERL | Yi Cheng et.al. | 2408.15632 | null |
| 2024-08-28 | Statistical QoS Provision in Business-Centric Networks | Chang Wu et.al. | 2408.15609 | null |
| 2024-08-28 | Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning | Minjong Yoo et.al. | 2408.15593 | null |
| 2024-08-28 | Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits | Woojin Jeong et.al. | 2408.15535 | null |
| 2024-08-27 | SpecGuard: Specification Aware Recovery for Robotic Autonomous Vehicles from Physical Attacks | Pritam Dash et.al. | 2408.15200 | null |
| 2024-08-27 | Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning | Batuhan Yardim et.al. | 2408.15173 | null |
| 2024-08-27 | Applications in CityLearn Gym Environment for Multi-Objective Control Benchmarking in Grid-Interactive Buildings and Districts | Kingsley Nweye et.al. | 2408.15170 | null |
| 2024-08-27 | muPRL: A Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults | Deepak-George Thomas et.al. | 2408.15150 | null |
| 2024-08-27 | No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery | Alexander Rutherford et.al. | 2408.15099 | link |
| 2024-08-27 | MiWaves Reinforcement Learning Algorithm | Susobhan Ghosh et.al. | 2408.15076 | null |
| 2024-08-27 | Earth Observation Satellite Scheduling with Graph Neural Networks | Antoine Jacquet et.al. | 2408.15041 | null |
| 2024-08-27 | Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data | Han Xia et.al. | 2408.14874 | null |
| 2024-08-27 | Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation | Haozhe Lou et.al. | 2408.14873 | null |
| 2024-08-27 | Learning Robust Reward Machines from Noisy Labels | Roko Parac et.al. | 2408.14871 | link |
| 2024-08-26 | Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning | Xinyang Gu et.al. | 2408.14472 | null |
| 2024-08-26 | Equivariant Reinforcement Learning under Partial Observability | Hai Nguyen et.al. | 2408.14336 | null |
| 2024-08-26 | Efficient Active Flow Control Strategy for Confined Square Cylinder Wake Using Deep Learning-Based Surrogate Model and Reinforcement Learning | Meng Zhang et.al. | 2408.14232 | null |
| 2024-08-26 | DynamicRouteGPT: A Real-Time Multi-Vehicle Dynamic Navigation Framework Based on Large Language Models | Ziai Zhou et.al. | 2408.14185 | null |
| 2024-08-26 | Robot Navigation with Entity-Based Collision Avoidance using Deep Reinforcement Learning | Yury Kolomeytsev et.al. | 2408.14183 | null |
| 2024-08-26 | ReLExS: Reinforcement Learning Explanations for Stackelberg No-Regret Learners | Xiangge Huang et.al. | 2408.14086 | null |
| 2024-08-26 | Bridging the gap between Learning-to-plan, Motion Primitives and Safe Reinforcement Learning | Piotr Kicki et.al. | 2408.14063 | null |
| 2024-08-26 | Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning | Joey Hejna et.al. | 2408.14037 | link |
| 2024-08-26 | Optimizing TD3 for 7-DOF Robotic Arm Grasping: Overcoming Suboptimality with Exploration-Enhanced Contrastive Learning | Wen-Han Hsieh et.al. | 2408.14009 | null |
| 2024-08-26 | Quantitative Representation of Scenario Difficulty for Autonomous Driving Based on Adversarial Policy Search | Shuo Yang et.al. | 2408.14000 | null |
| 2024-08-23 | Optimally Solving Simultaneous-Move Dec-POMDPs: The Sequential Central Planning Approach | Johan Peralez et.al. | 2408.13139 | null |
| 2024-08-23 | Diffusion-based Episodes Augmentation for Offline Multi-Agent Reinforcement Learning | Jihwan Oh et.al. | 2408.13092 | null |
| 2024-08-23 | Guiding IoT-Based Healthcare Alert Systems with Large Language Models | Yulan Gao et.al. | 2408.13071 | null |
| 2024-08-23 | cc-DRL: a Convex Combined Deep Reinforcement Learning Flight Control Design for a Morphing Quadrotor | Tao Yang et.al. | 2408.13054 | null |
| 2024-08-23 | In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting | Haowei Du et.al. | 2408.13028 | null |
| 2024-08-23 | Robust Iterative Value Conversion: Deep Reinforcement Learning for Neurochip-driven Edge Robots | Yuki Kadokawa et.al. | 2408.13018 | null |
| 2024-08-23 | SUMO: Search-Based Uncertainty Estimation for Model-Based Offline Reinforcement Learning | Zhongjian Qiao et.al. | 2408.12970 | null |
| 2024-08-23 | SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning | Wang Luo et.al. | 2408.12830 | null |
| 2024-08-23 | DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation | Xiaowei Mao et.al. | 2408.12809 | null |
| 2024-08-23 | Intelligent OPC Engineer Assistant for Semiconductor Manufacturing | Guojin Chen et.al. | 2408.12775 | null |
| 2024-08-22 | Controllable Text Generation for Large Language Models: A Survey | Xun Liang et.al. | 2408.12599 | link |
| 2024-08-22 | Automating Deformable Gasket Assembly | Simeon Adebola et.al. | 2408.12593 | null |
| 2024-08-22 | Human-In-The-Loop Machine Learning for Safe and Ethical Autonomous Vehicles: Principles, Challenges, and Opportunities | Yousef Emami et.al. | 2408.12548 | null |
| 2024-08-22 | PCGRL+: Scaling, Control and Generalization in Reinforcement Learning Level Generators | Sam Earle et.al. | 2408.12525 | null |
| 2024-08-22 | EX-DRL: Hedging Against Heavy Losses with EXtreme Distributional Reinforcement Learning | Parvin Malekzadeh et.al. | 2408.12446 | null |
| 2024-08-22 | Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning | Yen-Ru Lai et.al. | 2408.12307 | null |
| 2024-08-22 | Domino-cooling Oscillator Networks with Deep Reinforcement Learning | Sampreet Kalita et.al. | 2408.12271 | null |
| 2024-08-22 | UNCO: Towards Unifying Neural Combinatorial Optimization through Large Language Model | Xia Jiang et.al. | 2408.12214 | null |
| 2024-08-22 | A Safety-Oriented Self-Learning Algorithm for Autonomous Driving: Evolution Starting from a Basic Model | Shuo Yang et.al. | 2408.12190 | null |
| 2024-08-22 | A Safe and Efficient Self-evolving Algorithm for Decision-making and Control of Autonomous Driving Systems | Shuo Yang et.al. | 2408.12187 | null |
| 2024-08-21 | Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction | Anthony GX-Chen et.al. | 2408.11816 | null |
| 2024-08-21 | ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation | Shiqi Yang et.al. | 2408.11805 | null |
| 2024-08-21 | Critique-out-Loud Reward Models | Zachary Ankner et.al. | 2408.11791 | link |
| 2024-08-21 | Deviations from the Nash equilibrium and emergence of tacit collusion in a two-player optimal execution game with reinforcement learning | Fabrizio Lillo et.al. | 2408.11773 | null |
| 2024-08-21 | Bayesian Optimization Framework for Efficient Fleet Design in Autonomous Multi-Robot Exploration | David Molina Concha et.al. | 2408.11751 | null |
| 2024-08-21 | Optimizing Interpretable Decision Tree Policies for Reinforcement Learning | Daniël Vos et.al. | 2408.11632 | link |
| 2024-08-21 | A Survey of Embodied Learning for Object-Centric Robotic Manipulation | Ying Zheng et.al. | 2408.11537 | link |
| 2024-08-22 | Using Part-based Representations for Explainable Deep Reinforcement Learning | Manos Kirtas et.al. | 2408.11455 | null |
| 2024-08-21 | Subgoal-based Hierarchical Reinforcement Learning for Multi-Agent Collaboration | Cheng Xu et.al. | 2408.11416 | link |
| 2024-08-21 | Reflex-Based Open-Vocabulary Navigation without Prior Knowledge Using Omnidirectional Camera and Multiple Vision-Language Models | Kento Kawaharazuka et.al. | 2408.11380 | null |
| 2024-08-20 | Accelerating Goal-Conditioned RL Algorithms and Research | Michał Bortkiewicz et.al. | 2408.11052 | link |
| 2024-08-20 | RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands | Yi Zhao et.al. | 2408.11048 | null |
| 2024-08-20 | Quantum Machine Learning Algorithms for Anomaly Detection: a Survey | Sebastiano Corli et.al. | 2408.11047 | null |
| 2024-08-20 | Deep Reinforcement Learning for Network Energy Saving in 6G and Beyond Networks | Dinh-Hieu Tran et.al. | 2408.10974 | null |
| 2024-08-20 | The Evolution of Reinforcement Learning in Quantitative Finance | Nikolaos Pippas et.al. | 2408.10932 | null |
| 2024-08-20 | Knowledge Sharing and Transfer via Centralized Reward Agent for Multi-Task Reinforcement Learning | Haozhe Ma et.al. | 2408.10858 | link |
| 2024-08-20 | Offline Model-Based Reinforcement Learning with Anti-Exploration | Padmanaba Srinivasan et.al. | 2408.10713 | null |
| 2024-08-20 | Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation | Shiming Xie et.al. | 2408.10642 | null |
| 2024-08-20 | Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search | Jonathan Light et.al. | 2408.10635 | link |
| 2024-08-20 | Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams | Litian Huang et.al. | 2408.10592 | link |
| 2024-08-19 | LEAD: Towards Learning-Based Equity-Aware Decarbonization in Ridesharing Platforms | Mahsa Sahebdel et.al. | 2408.10201 | null |
| 2024-08-19 | Physics-Aware Combinatorial Assembly Planning using Deep Reinforcement Learning | Ruixuan Liu et.al. | 2408.10162 | null |
| 2024-08-19 | $R^2$-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement | Haoyang Wang et.al. | 2408.10135 | null |
| 2024-08-19 | Enhancing Reinforcement Learning Through Guided Search | Jérôme Arjonilla et.al. | 2408.10113 | null |
| 2024-08-19 | Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning | Sriyash Poddar et.al. | 2408.10075 | null |
| 2024-08-19 | Efficient Exploration in Deep Reinforcement Learning: A Novel Bayesian Actor-Critic Algorithm | Nikolai Rozanov et.al. | 2408.10055 | null |
| 2024-08-19 | Adaptive BESS and Grid Setpoints Optimization: A Model-Free Framework for Efficient Battery Management under Dynamic Tariff Pricing | Alaa Selim et.al. | 2408.09989 | null |
| 2024-08-19 | The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective | Renye Yan et.al. | 2408.09974 | null |
| 2024-08-19 | GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits | Gongpu Chen et.al. | 2408.09882 | null |
| 2024-08-19 | ShortCircuit: AlphaZero-Driven Circuit Design | Dimitrios Tsaras et.al. | 2408.09858 | null |
| 2024-08-16 | HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis | Zhi-Bo Liu et.al. | 2408.08847 | link |
| 2024-08-16 | CAT: Caution Aware Transfer in Reinforcement Learning via Distributional Risk | Mohamad Fares El Hajj Chehade et.al. | 2408.08812 | null |
| 2024-08-16 | Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions | Bhuvanashree Murugadoss et.al. | 2408.08781 | null |
| 2024-08-16 | SYMPOL: Symbolic Tree-Based On-Policy Reinforcement Learning | Sascha Marton et.al. | 2408.08761 | link |
| 2024-08-16 | Efficient Multi-Policy Evaluation for Reinforcement Learning | Shuze Liu et.al. | 2408.08706 | null |
| 2024-08-16 | Neural Reward Machines | Elena Umili et.al. | 2408.08677 | link |
| 2024-08-16 | Fine-tuning LLMs for Autonomous Spacecraft Control: A Case Study Using Kerbal Space Program | Alejandro Carrasco et.al. | 2408.08676 | link |
| 2024-08-16 | DeepREST: Automated Test Case Generation for REST APIs Exploiting Deep Reinforcement Learning | Davide Corradini et.al. | 2408.08594 | null |
| 2024-08-16 | Multilevel Graph Reinforcement Learning for Consistent Cognitive Decision-making in Heterogeneous Mixed Autonomy | Xin Gao et.al. | 2408.08516 | null |
| 2024-08-16 | Deep multi-intentional inverse reinforcement learning for cognitive multi-function radar inverse cognition | Hancong Feng et.al. | 2408.08478 | null |
| 2024-08-15 | A Conflicts-free, Speed-lossless KAN-based Reinforcement Learning Decision System for Interactive Driving in Roundabouts | Zhihao Lin et.al. | 2408.08242 | null |
| 2024-08-15 | Explaining an Agent’s Future Beliefs through Temporally Decomposing Future Reward Estimators | Mark Towers et.al. | 2408.08230 | link |
| 2024-08-15 | DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search | Huajian Xin et.al. | 2408.08152 | link |
| 2024-08-15 | Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players | Pragnya Alatur et.al. | 2408.08075 | null |
| 2024-08-15 | An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation | Jun Wang et.al. | 2408.08047 | null |
| 2024-08-15 | Adaptive User Journeys in Pharma E-Commerce with Reinforcement Learning: Insights from SwipeRx | Ana Fernández del Río et.al. | 2408.08024 | null |
| 2024-08-15 | Experimental evaluation of offline reinforcement learning for HVAC control in buildings | Jun Wang et.al. | 2408.07986 | link |
| 2024-08-15 | Meta SAC-Lag: Towards Deployable Safe Reinforcement Learning via MetaGradient-based Hyperparameter Tuning | Homayoun Honari et.al. | 2408.07962 | null |
| 2024-08-15 | Solving a Rubik’s Cube Using its Local Graph Structure | Shunyu Yao et.al. | 2408.07945 | null |
| 2024-08-15 | IReCa: Intrinsic Reward-enhanced Context-aware Reinforcement Learning for Human-AI Coordination | Xin Hao et.al. | 2408.07877 | null |
| 2024-08-14 | Off-Policy Reinforcement Learning with High Dimensional Reward | Dong Neuck Lee et.al. | 2408.07660 | null |
| 2024-08-14 | Adaptive Behavioral AI: Reinforcement Learning to Enhance Pharmacy Services | Ana Fernández del Río et.al. | 2408.07647 | null |
| 2024-08-14 | SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning | Jianye Xu et.al. | 2408.07644 | link |
| 2024-08-14 | Optimizing HIV Patient Engagement with Reinforcement Learning in Resource-Limited Settings | África Periáñez et.al. | 2408.07629 | null |
| 2024-08-14 | A Nested Graph Reinforcement Learning-based Decision-making Strategy for Eco-platooning | Xin Gao et.al. | 2408.07578 | null |
| 2024-08-14 | Large Language Models Know What Makes Exemplary Contexts | Quanyu Long et.al. | 2408.07505 | null |
| 2024-08-14 | Large Language Models Prompting With Episodic Memory | Dai Do et.al. | 2408.07465 | null |
| 2024-08-14 | Real-world validation of safe reinforcement learning, model predictive control and decision tree-based home energy management systems | Julian Ruddick et.al. | 2408.07435 | null |
| 2024-08-14 | Bridging Training and Execution via Dynamic Directed Graph-Based Communication in Cooperative Multi-Agent Systems | Zhuohui Zhang et.al. | 2408.07397 | null |
| 2024-08-14 | Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space | Xiaoyang Yu et.al. | 2408.07395 | null |
| 2024-08-13 | LLMs can Schedule | Henrik Abgaryan et.al. | 2408.06993 | link |
| 2024-08-13 | IRS-Assisted Lossy Communications Under Correlated Rayleigh Fading: Outage Probability Analysis and Optimization | Guanchang Li et.al. | 2408.06969 | null |
| 2024-08-13 | Heavy-Ball Momentum Accelerated Actor-Critic With Function Approximation | Yanjie Dong et.al. | 2408.06945 | null |
| 2024-08-13 | Multi-Agent Continuous Control with Generative Flow Networks | Shuang Luo et.al. | 2408.06920 | link |
| 2024-08-13 | Personalized Dynamic Difficulty Adjustment – Imitation Learning Meets Reinforcement Learning | Ronja Fuchs et.al. | 2408.06818 | link |
| 2024-08-13 | Integrating Saliency Ranking and Reinforcement Learning for Enhanced Object Detection | Matthias Bartolo et.al. | 2408.06803 | link |
| 2024-08-13 | Residual Deep Reinforcement Learning for Inverter-based Volt-Var Control | Qiong Liu et.al. | 2408.06790 | null |
| 2024-08-13 | Deep reinforcement learning for the management of the wall regeneration cycle in wall-bounded turbulent flows | Giorgio Maria Cavallazzi et.al. | 2408.06783 | null |
| 2024-08-13 | Robust Deep Reinforcement Learning for Inverter-based Volt-Var Control in Partially Observable Distribution Networks | Qiong Liu et.al. | 2408.06776 | null |
| 2024-08-13 | MAPPO-PIS: A Multi-Agent Proximal Policy Optimization Method with Prior Intent Sharing for CAVs’ Cooperative Decision-Making | Yicheng Guo et.al. | 2408.06656 | link |
| 2024-08-12 | Body Transformer: Leveraging Robot Embodiment for Policy Learning | Carmelo Sferrazza et.al. | 2408.06316 | null |
| 2024-08-12 | Inverse designing metamaterials with programmable nonlinear functional responses in graph space | Marco Maurizi et.al. | 2408.06300 | null |
| 2024-08-12 | EyeSight Hand: Design of a Fully-Actuated Dexterous Robot Hand with Integrated Vision-Based Tactile Sensors and Compliant Actuation | Branden Romero et.al. | 2408.06265 | null |
| 2024-08-12 | Stable-BC: Controlling Covariate Shift with Stable Behavior Cloning | Shaunak A. Mehta et.al. | 2408.06246 | null |
| 2024-08-12 | Building Decision Making Models Through Language Model Regime | Yu Zhang et.al. | 2408.06087 | null |
| 2024-08-12 | Sequential sampling without comparison to boundary through model-free reinforcement learning | Jamal Esmaily et.al. | 2408.06080 | null |
| 2024-08-12 | Online Optimization of Curriculum Learning Schedules using Evolutionary Optimization | Mohit Jiwatode et.al. | 2408.06068 | null |
| 2024-08-12 | GFlowNet Training by Policy Gradients | Puhua Niu et.al. | 2408.05885 | link |
| 2024-08-12 | Multi-Agent Deep Reinforcement Learning Framework for Wireless MAC Protocol Design and Optimization | Navid Keshtiarast et.al. | 2408.05884 | null |
| 2024-08-11 | Root Cause Attribution of Delivery Risks via Causal Discovery with Reinforcement Learning | Shi Bo et.al. | 2408.05860 | null |
| 2024-08-09 | Deterministic remote entanglement using a chiral quantum interconnect | Aziza Almanakly et.al. | 2408.05164 | null |
| 2024-08-09 | Kolmogorov-Arnold Network for Online Reinforcement Learning | Victor Augusto Kich et.al. | 2408.04841 | null |
| 2024-08-09 | Multi-User MISO with Stacked Intelligent Metasurfaces: A DRL-Based Sum-Rate Optimization Approach | Hao Liu et.al. | 2408.04837 | null |
| 2024-08-09 | Next-Generation Wi-Fi Networks with Generative AI: Design and Insights | Jingyu Wang et.al. | 2408.04835 | null |
| 2024-08-08 | Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity | Martin Smit et.al. | 2408.04549 | link |
| 2024-08-08 | Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs | Kevin Tan et.al. | 2408.04526 | null |
| 2024-08-08 | Model-Based Transfer Learning for Contextual Reinforcement Learning | Jung-Hoon Cho et.al. | 2408.04498 | null |
| 2024-08-08 | Reinforcement Learning from Human Feedback for Lane Changing of Autonomous Vehicles in Mixed Traffic | Yuting Wang et.al. | 2408.04447 | null |
| 2024-08-08 | Non-maximizing policies that fulfill multi-criterion aspirations in expectation | Simon Dima et.al. | 2408.04385 | null |
| 2024-08-08 | Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations | Julen Urain et.al. | 2408.04380 | null |
| 2024-08-08 | Deep Reinforcement Learning for the Design of Metamaterial Mechanisms with Functional Compliance Control | Yejun Choi et.al. | 2408.04376 | null |
| 2024-08-08 | Goal-Oriented UAV Communication Design and Optimization for Target Tracking: A Machine Learning Approach | Wenchao Wu et.al. | 2408.04358 | null |
| 2024-08-08 | KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination | Yin Gu et.al. | 2408.04336 | null |
| 2024-08-08 | Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization | Aditya Kapoor et.al. | 2408.04295 | null |
| 2024-08-07 | Traffic and Obstacle-aware UAV Positioning in Urban Environments Using Reinforcement Learning | Kamran Shafafi et.al. | 2408.03894 | null |
| 2024-08-07 | Navigating the Human Maze: Real-Time Robot Pathfinding with Generative Imitation Learning | Martin Moder et.al. | 2408.03807 | null |
| 2024-08-07 | HDPlanner: Advancing Autonomous Deployments in Unknown Environments through Hierarchical Decision Networks | Jingsong Liang et.al. | 2408.03768 | null |
| 2024-08-07 | Asynchronous Credit Assignment Framework for Multi-Agent Reinforcement Learning | Yongheng Liang et.al. | 2408.03692 | null |
| 2024-08-07 | RL-ADN: A High-Performance Deep Reinforcement Learning Environment for Optimal Energy Storage Systems Dispatch in Active Distribution Networks | Shengren Hou et.al. | 2408.03685 | null |
| 2024-08-07 | AI-Driven approach for sustainable extraction of earth’s subsurface renewable energy while minimizing seismic activity | Diego Gutierrez-Oribio et.al. | 2408.03664 | null |
| 2024-08-07 | A Comparison of LLM Finetuning Methods & Evaluation Metrics with Travel Chatbot Use Case | Sonia Meyer et.al. | 2408.03562 | null |
| 2024-08-07 | Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes | Chen Tang et.al. | 2408.03539 | null |
| 2024-08-06 | Spacecraft inertial parameters estimation using time series clustering and reinforcement learning | Konstantinos Platanitis et.al. | 2408.03445 | null |
| 2024-08-06 | Communication-Aware Consistent Edge Selection for Mobile Users and Autonomous Vehicles | Nazish Tahir et.al. | 2408.03435 | null |
| 2024-08-07 | Adversarial Safety-Critical Scenario Generation using Naturalistic Human Driving Priors | Kunkun Hao et.al. | 2408.03200 | null |
| 2024-08-06 | RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning | Jiapeng Zhu et.al. | 2408.03195 | null |
| 2024-08-06 | Integrated Intention Prediction and Decision-Making with Spectrum Attention Net and Proximal Policy Optimization | Xiao Zhou et.al. | 2408.03191 | null |
| 2024-08-06 | CADRL: Category-aware Dual-agent Reinforcement Learning for Explainable Recommendations over Knowledge Graphs | Shangfei Zheng et.al. | 2408.03166 | null |
| 2024-08-06 | QADQN: Quantum Attention Deep Q-Network for Financial Market Prediction | Siddhant Dutta et.al. | 2408.03088 | null |
| 2024-08-06 | Research on Autonomous Driving Decision-making Strategies based Deep Reinforcement Learning | Zixiang Wang et.al. | 2408.03084 | null |
| 2024-08-06 | Model-free optimal controller for discrete-time Markovian jump linear systems: A Q-learning approach | Ehsan Badfar et.al. | 2408.03077 | null |
| 2024-08-06 | Learning to Turn: Diffusion Imitation for Robust Row Turning in Under-Canopy Robots | Arun N. Sivakumar et.al. | 2408.03059 | null |
| 2024-08-06 | A Course in Dynamic Optimization | Bar Light et.al. | 2408.03034 | null |
| 2024-08-07 | Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning | Haozhe Ma et.al. | 2408.03029 | null |
| 2024-08-05 | Integrating Model-Based Footstep Planning with Model-Free Reinforcement Learning for Dynamic Legged Locomotion | Ho Jae Lee et.al. | 2408.02662 | null |
| 2024-08-05 | Context-aware Mamba-based Reinforcement Learning for social robot navigation | Syed Muhammad Mustafa et.al. | 2408.02661 | null |
| 2024-08-05 | Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models? | Mohammad Bahrami Karkevandi et.al. | 2408.02651 | null |
| 2024-08-05 | Backward explanations via redefinition of predicates | Léo Saulières et.al. | 2408.02606 | null |
| 2024-08-05 | Progressively Selective Label Enhancement for Language Model Alignment | Biao Liu et.al. | 2408.02599 | null |
| 2024-08-05 | Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information | Yauwai Yim et.al. | 2408.02559 | null |
| 2024-08-05 | Counterfactual Shapley Values for Explaining Reinforcement Learning | Yiwei Shi et.al. | 2408.02529 | null |
| 2024-08-05 | Fair Resource Allocation For Hierarchical Federated Edge Learning in Space-Air-Ground Integrated Networks via Deep Reinforcement Learning with Hybrid Control | Chong Huang et.al. | 2408.02501 | null |
| 2024-08-05 | Full error analysis of policy gradient learning algorithms for exploratory linear quadratic mean-field control problem in continuous time with common noise | Noufel Frikha et.al. | 2408.02489 | null |
| 2024-08-05 | Terracorder: Sense Long and Prosper | Josh Millar et.al. | 2408.02407 | null |
| 2024-08-02 | Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer | Yu Yang et.al. | 2408.01402 | null |
| 2024-08-02 | NOLO: Navigate Only Look Once | Bohan Zhou et.al. | 2408.01384 | null |
| 2024-08-02 | Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation | Ruoxuan Feng et.al. | 2408.01366 | null |
| 2024-08-02 | Jacta: A Versatile Planner for Learning Dexterous and Whole-body Manipulation | Jan Brüdigam et.al. | 2408.01258 | null |
| 2024-08-02 | Deep progressive reinforcement learning-based flexible resource scheduling framework for IRS and UAV-assisted MEC system | Li Dong et.al. | 2408.01248 | null |
| 2024-08-02 | Multi-Objective Deep Reinforcement Learning for Optimisation in Autonomous Systems | Juan C. Rosero et.al. | 2408.01188 | null |
| 2024-08-02 | Optimizing Variational Quantum Circuits Using Metaheuristic Strategies in Reinforcement Learning | Michael Kölle et.al. | 2408.01187 | null |
| 2024-08-02 | TCR-GPT: Integrating Autoregressive Model and Reinforcement Learning for T-Cell Receptor Repertoires Generation | Yicheng Lin et.al. | 2408.01156 | null |
| 2024-08-02 | Actra: Optimized Transformer Architecture for Vision-Language-Action Models in Robot Learning | Yueen Ma et.al. | 2408.01147 | null |
| 2024-08-02 | A Survey on Self-play Methods in Reinforcement Learning | Ruize Zhang et.al. | 2408.01072 | null |
| 2024-08-01 | A Policy-Gradient Approach to Solving Imperfect-Information Games with Iterate Convergence | Mingyang Liu et.al. | 2408.00751 | null |
| 2024-08-01 | Insurance Portfolio Pursuit with Reinforcement Learning | Edward James Young et.al. | 2408.00713 | null |
| 2024-08-01 | Learning in Multi-Objective Public Goods Games with Non-Linear Utilities | Nicole Orzan et.al. | 2408.00682 | null |
| 2024-08-01 | Discretizing Continuous Action Space with Unimodal Probability Distributions for On-Policy Reinforcement Learning | Yuanyang Zhu et.al. | 2408.00309 | null |
| 2024-08-01 | A Reinforcement Learning Based Motion Planner for Quadrotor Autonomous Flight in Dense Environment | Zhaohong Liu et.al. | 2408.00275 | null |
| 2024-08-01 | Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control | Hao Zhou et.al. | 2408.00214 | null |
| 2024-07-31 | CREW: Facilitating Human-AI Teaming Research | Lingyu Zhang et.al. | 2408.00170 | null |
| 2024-07-31 | Formal Ethical Obligations in Reinforcement Learning Agents: Verification and Policy Updates | Colin Shea-Blymyer et.al. | 2408.00147 | null |
| 2024-07-31 | Adaptive Transit Signal Priority based on Deep Reinforcement Learning and Connected Vehicles in a Traffic Microsimulation Environment | Dickness Kwesiga et.al. | 2408.00098 | null |
| 2024-07-31 | Berkeley Humanoid: A Research Platform for Learning-based Control | Qiayuan Liao et.al. | 2407.21781 | null |
| 2024-07-31 | Human-Machine Co-Adaptation for Robot-Assisted Rehabilitation via Dual-Agent Multiple Model Reinforcement Learning (DAMMRL) | Yang An et.al. | 2407.21734 | null |
| 2024-07-31 | Multi-agent reinforcement learning for the control of three-dimensional Rayleigh-Bénard convection | Joel Vasanth et.al. | 2407.21565 | null |
| 2024-07-31 | Black box meta-learning intrinsic rewards for sparse-reward environments | Octavio Pappalardo et.al. | 2407.21546 | null |
| 2024-07-31 | Multi-agent Assessment with QoS Enhancement for HD Map Updates in a Vehicular Network | Jeffrey Redondo et.al. | 2407.21460 | null |
| 2024-07-31 | ProSpec RL: Plan Ahead, then Execute | Liangliang Liu et.al. | 2407.21359 | null |
| 2024-07-31 | Image-Based Deep Reinforcement Learning with Intrinsically Motivated Stimuli: On the Execution of Complex Robotic Tasks | David Valencia et.al. | 2407.21338 | null |
| 2024-07-31 | Tractable and Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation | Taehyun Cho et.al. | 2407.21260 | null |
| 2024-07-30 | VITAL: Visual Teleoperation to Enhance Robot Learning through Human-in-the-Loop Corrections | Hamidreza Kasaei et.al. | 2407.21244 | null |
| 2024-07-30 | Learning Stable Robot Grasping with Transformer-based Tactile Control Policies | En Yen Puang et.al. | 2407.21172 | link |
| 2024-07-30 | Securing Proof of Stake Blockchains: Leveraging Multi-Agent Reinforcement Learning for Detecting and Mitigating Malicious Nodes | Faisal Haque Bappy et.al. | 2407.20983 | null |
| 2024-07-30 | How to Choose a Reinforcement-Learning Algorithm | Fabian Bongratz et.al. | 2407.20917 | null |
| 2024-07-30 | ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning | Hosung Lee et.al. | 2407.20806 | link |
| 2024-07-30 | Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning | Norman Di Palo et.al. | 2407.20798 | null |
| 2024-07-30 | Architectural Influence on Variational Quantum Circuits in Multi-Agent Reinforcement Learning: Evolutionary Strategies for Optimization | Michael Kölle et.al. | 2407.20739 | null |
| 2024-07-30 | Online Prediction-Assisted Safe Reinforcement Learning for Electric Vehicle Charging Station Recommendation in Dynamically Coupled Transportation-Power Systems | Qionghua Liao et.al. | 2407.20679 | null |
| 2024-07-30 | Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations | Yupei Yang et.al. | 2407.20651 | null |
| 2024-07-30 | Wireless Multi-User Interactive Virtual Reality in Metaverse with Edge-Device Collaborative Computing | Caolu Xu et.al. | 2407.20523 | null |
| 2024-07-30 | Boosting Efficiency in Task-Agnostic Exploration through Causal Knowledge | Yupei Yang et.al. | 2407.20506 | link |
| 2024-07-29 | A Method for Fast Autonomy Transfer in Reinforcement Learning | Dinuka Sahabandu et.al. | 2407.20466 | null |
| 2024-07-29 | SAPG: Split and Aggregate Policy Gradients | Jayesh Singla et.al. | 2407.20230 | null |
| 2024-07-29 | Privileged Reinforcement and Communication Learning for Distributed, Bandwidth-limited Multi-robot Exploration | Yixiao Ma et.al. | 2407.20203 | null |
| 2024-07-29 | Language-Conditioned Offline RL for Multi-Robot Navigation | Steven Morad et.al. | 2407.20164 | null |
| 2024-07-29 | Quantum Machine Learning Architecture Search via Deep Reinforcement Learning | Xin Dai et.al. | 2407.20147 | null |
| 2024-07-29 | Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning | Liyuan Mao et.al. | 2407.20109 | null |
| 2024-07-29 | Counterfactual rewards promote collective transport using individually controlled swarm microrobots | Veit-Lorenz Heuthe et.al. | 2407.20041 | null |
| 2024-07-29 | Collision Probability Distribution Estimation via Temporal Difference Learning | Thomas Steinecker et.al. | 2407.20000 | link |
| 2024-07-29 | Integrated Communications and Security: RIS-Assisted Simultaneous Transmission and Generation of Secret Keys | Ning Gao et.al. | 2407.19960 | null |
| 2024-07-29 | A Differential Dynamic Programming Framework for Inverse Reinforcement Learning | Kun Cao et.al. | 2407.19902 | null |
| 2024-07-29 | Imitation Learning for Intra-Day Power Grid Operation through Topology Actions | Matthijs de Jong et.al. | 2407.19865 | null |
| 2024-07-26 | SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments | Shu Ishida et.al. | 2407.18913 | null |
| 2024-07-26 | Lessons from Learning to Spin “Pens” | Jun Wang et.al. | 2407.18902 | null |
| 2024-07-26 | SHANGUS: Deep Reinforcement Learning Meets Heuristic Optimization for Speedy Frontier-Based Exploration of Autonomous Vehicles in Unknown Spaces | Seunghyeop Nam et.al. | 2407.18892 | null |
| 2024-07-26 | An Accelerated Multi-level Monte Carlo Approach for Average Reward Reinforcement Learning with General Policy Parametrization | Swetha Ganesh et.al. | 2407.18878 | null |
| 2024-07-26 | QT-TDM: Planning with Transformer Dynamics Model and Autoregressive Q-Learning | Mostafa Kotb et.al. | 2407.18841 | null |
| 2024-07-26 | The Cross-environment Hyperparameter Setting Benchmark for Reinforcement Learning | Andrew Patterson et.al. | 2407.18840 | null |
| 2024-07-26 | Learning a Shape-Conditioned Agent for Purely Tactile In-Hand Manipulation of Various Objects | Johannes Pitz et.al. | 2407.18834 | null |
| 2024-07-26 | Online Planning in POMDPs with State-Requests | Raphael Avalos et.al. | 2407.18812 | null |
| 2024-07-26 | Tuning the kinetics of intracellular transport | Ardra Suchitran et.al. | 2407.18784 | null |
| 2024-07-26 | A Deep Reinforcement Learning Approach to Wavefront Control for Exoplanet Imaging | Yann Gutierrez et.al. | 2407.18733 | null |
| 2024-07-25 | Recursive Introspection: Teaching Language Model Agents How to Self-Improve | Yuxiao Qu et.al. | 2407.18219 | null |
| 2024-07-25 | Differentiable Quantum Architecture Search in Asynchronous Quantum Reinforcement Learning | Samuel Yen-Chi Chen et.al. | 2407.18202 | null |
| 2024-07-25 | Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation | Jean Seong Bjorn Choe et.al. | 2407.18143 | null |
| 2024-07-25 | MapTune: Advancing ASIC Technology Mapping via Reinforcement Learning Guided Library Tuning | Mingju Liu et.al. | 2407.18110 | link |
| 2024-07-25 | Principal-Agent Reinforcement Learning | Dima Ivanov et.al. | 2407.18074 | null |
| 2024-07-25 | Multi-Agent Deep Reinforcement Learning for Resilience Optimization in 5G RAN | Soumeya Kaada et.al. | 2407.18066 | null |
| 2024-07-25 | Personalized and Context-aware Route Planning for Edge-assisted Vehicles | Dinesh Cyril Selvaraj et.al. | 2407.17980 | null |
| 2024-07-25 | Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization | Feihu Huang et.al. | 2407.17823 | null |
| 2024-07-25 | Advanced deep-reinforcement-learning methods for flow control: group-invariant and positional-encoding networks improve learning speed and quality | Joogoo Jeon et.al. | 2407.17822 | null |
| 2024-07-25 | Preliminary Results of Neuromorphic Controller Design and a Parkinson’s Disease Dataset Building for Closed-Loop Deep Brain Stimulation | Ananna Biswas et.al. | 2407.17756 | null |
| 2024-07-24 | Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning | Shuang Qiu et.al. | 2407.17466 | null |
| 2024-07-24 | Toward human-centered shared autonomy AI paradigms for human-robot teaming in healthcare | Reza Abiri et.al. | 2407.17464 | null |
| 2024-07-24 | SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning | Jianpeng Yao et.al. | 2407.17460 | null |
| 2024-07-24 | Joint Transmit and Jamming Power Optimization for Secrecy in Energy Harvesting Networks: A Reinforcement Learning Approach | Shalini Tripathi et.al. | 2407.17435 | null |
| 2024-07-24 | Market Making with Exogenous Competition | Robert Boyce et.al. | 2407.17393 | null |
| 2024-07-24 | MoveLight: Enhancing Traffic Signal Control through Movement-Centric Deep Reinforcement Learning | Junqi Shao et.al. | 2407.17303 | null |
| 2024-07-24 | Pretrained Visual Representations in Reinforcement Learning | Emlyn Williams et.al. | 2407.17238 | null |
| 2024-07-24 | Sublinear Regret for An Actor-Critic Algorithm in Continuous-Time Linear-Quadratic Reinforcement Learning | Yilie Huang et.al. | 2407.17226 | null |
| 2024-07-24 | Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization | Jonathan Pirnay et.al. | 2407.17206 | link |
| 2024-07-24 | Path Following and Stabilisation of a Bicycle Model using a Reinforcement Learning Approach | Sebastian Weyrer et.al. | 2407.17156 | null |
| 2024-07-23 | A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data | Adrian Remonda et.al. | 2407.16680 | link |
| 2024-07-23 | From Imitation to Refinement – Residual RL for Precise Visual Assembly | Lars Ankile et.al. | 2407.16677 | null |
| 2024-07-23 | Efficient Discovery of Actual Causality using Abstraction-Refinement | Arshia Rafieioskouei et.al. | 2407.16629 | null |
| 2024-07-23 | Functional Acceleration for Policy Mirror Descent | Veronica Chelu et.al. | 2407.16602 | null |
| 2024-07-23 | Real-Time Interactions Between Human Controllers and Remote Devices in Metaverse | Kan Chen et.al. | 2407.16591 | null |
| 2024-07-23 | TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback | Eunseop Yoon et.al. | 2407.16574 | null |
| 2024-07-23 | Cross Anything: General Quadruped Robot Navigation through Complex Terrains | Shaoting Zhu et.al. | 2407.16412 | null |
| 2024-07-23 | Evaluating Uncertainties in Electricity Markets via Machine Learning and Quantum Computing | Shuyang Zhu et.al. | 2407.16404 | null |
| 2024-07-23 | Reinforcement Learning-based Adaptive Mitigation of Uncorrected DRAM Errors in the Field | Isaac Boixaderas et.al. | 2407.16377 | null |
| 2024-07-23 | Arbitrary quantum states preparation aided by deep reinforcement learning | Zhao-Wei Wang et.al. | 2407.16368 | null |
| 2024-07-22 | WayEx: Waypoint Exploration using a Single Demonstration | Mara Levy et.al. | 2407.15849 | null |
| 2024-07-23 | QueST: Self-Supervised Skill Abstractions for Learning Continuous Control | Atharva Mete et.al. | 2407.15840 | null |
| 2024-07-22 | Importance Sampling-Guided Meta-Training for Intelligent Agents in Highly Interactive Environments | Mansur Arief et.al. | 2407.15839 | null |
| 2024-07-22 | On shallow planning under partial observability | Randy Lefebvre et.al. | 2407.15820 | null |
| 2024-07-22 | Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning | Zhecheng Yuan et.al. | 2407.15815 | null |
| 2024-07-22 | Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels | Zhuorui Ye et.al. | 2407.15786 | null |
| 2024-07-22 | Diffusion Model Based Resource Allocation Strategy in Ultra-Reliable Wireless Networked Control Systems | Amirhassan Babazadeh Darabi et.al. | 2407.15784 | null |
| 2024-07-22 | How to Shrink Confidence Sets for Many Equivalent Discrete Distributions? | Odalric-Ambrym Maillard et.al. | 2407.15662 | null |
| 2024-07-22 | Evaluation of Reinforcement Learning for Autonomous Penetration Testing using A3C, Q-learning and DQN | Norman Becker et.al. | 2407.15656 | null |
| 2024-07-22 | Reinforcement Learning Meets Visual Odometry | Nico Messikommer et.al. | 2407.15626 | null |
| 2024-07-19 | Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification | Thomas Kwa et.al. | 2407.14503 | null |
| 2024-07-19 | Explainable Post hoc Portfolio Management Financial Policy of a Deep Reinforcement Learning agent | Alejandra de la Rica Escudero et.al. | 2407.14486 | link |
| 2024-07-19 | Data-Centric Human Preference Optimization with Rationales | Hoang Anh Just et.al. | 2407.14477 | null |
| 2024-07-19 | FuzzTheREST: An Intelligent Automated Black-box RESTful API Fuzzer | Tiago Dias et.al. | 2407.14361 | null |
| 2024-07-19 | Hyperparameter Optimization for Driving Strategies Based on Reinforcement Learning | Nihal Acharya Adde et.al. | 2407.14262 | null |
| 2024-07-19 | On Policy Evaluation Algorithms in Distributional Reinforcement Learning | Julian Gerstenberg et.al. | 2407.14175 | null |
| 2024-07-19 | A Comparative Study of Deep Reinforcement Learning Models: DQN vs PPO vs A2C | Neil De La Fuente et.al. | 2407.14151 | link |
| 2024-07-19 | Track-MDP: Reinforcement Learning for Target Tracking with Controlled Sensing | Adarsh M. Subramaniam et.al. | 2407.13995 | null |
| 2024-07-19 | The Effect of Training Schedules on Morphological Robustness and Generalization | Edoardo Barba et.al. | 2407.13965 | link |
| 2024-07-18 | Event-Triggered Reinforcement Learning Based Joint Resource Allocation for Ultra-Reliable Low-Latency V2X Communications | Nasir Khan et.al. | 2407.13947 | null |
| 2024-07-18 | Random Latent Exploration for Deep Reinforcement Learning | Srinath Mahankali et.al. | 2407.13755 | null |
| 2024-07-18 | Optimistic Q-learning for average reward and episodic reinforcement learning | Priyank Agrawal et.al. | 2407.13743 | null |
| 2024-07-18 | Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review | Masatoshi Uehara et.al. | 2407.13734 | null |
| 2024-07-18 | A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice | Shaina Raza et.al. | 2407.13699 | null |
| 2024-07-18 | Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error | Ally Yalei Du et.al. | 2407.13622 | null |
| 2024-07-18 | Hyp2Nav: Hyperbolic Planning and Curiosity for Crowd Navigation | Alessandro Flaborea et.al. | 2407.13567 | null |
| 2024-07-18 | Model-based Policy Optimization using Symbolic World Model | Andrey Gorodetskiy et.al. | 2407.13518 | null |
| 2024-07-18 | Instance Selection for Dynamic Algorithm Configuration with Reinforcement Learning: Improving Generalization | Carolin Benjamins et.al. | 2407.13513 | null |
| 2024-07-18 | LIMT: Language-Informed Multi-Task Visual World Models | Elie Aljalbout et.al. | 2407.13466 | null |
| 2024-07-18 | The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations | Jan Ole von Hartz et.al. | 2407.13432 | null |
| 2024-07-17 | Navigating the Smog: A Cooperative Multi-Agent RL for Accurate Air Pollution Mapping through Data Assimilation | Ichrak Mokhtari et.al. | 2407.12539 | null |
| 2024-07-17 | Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models | Xihe Qiu et.al. | 2407.12532 | null |
| 2024-07-17 | Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments | Runfa Chen et.al. | 2407.12505 | null |
| 2024-07-17 | Estimating Reaction Barriers with Deep Reinforcement Learning | Adittya Pal et.al. | 2407.12453 | null |
| 2024-07-17 | Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning | Xu-Hui Liu et.al. | 2407.12448 | link |
| 2024-07-17 | Variable-Agnostic Causal Exploration for Reinforcement Learning | Minh Hoang Nguyen et.al. | 2407.12437 | null |
| 2024-07-17 | Flow Matching Imitation Learning for Multi-Support Manipulation | Quentin Rouxel et.al. | 2407.12381 | null |
| 2024-07-17 | A foundation model approach to guide antimicrobial peptide design in the era of artificial intelligence driven scientific discovery | Jike Wang et.al. | 2407.12296 | null |
| 2024-07-17 | Chip Placement with Diffusion | Vint Lee et.al. | 2407.12282 | null |
| 2024-07-17 | Individualized Federated Learning for Traffic Prediction with Error Driven Aggregation | Hang Chen et.al. | 2407.12226 | link |
| 2024-07-16 | Why long model-based rollouts are no reason for bad Q-value estimates | Philipp Wissmann et.al. | 2407.11751 | null |
| 2024-07-16 | Pareto local search for a multi-objective demand response problem in residential areas with heat pumps and electric vehicles | Thomas Dengiz et.al. | 2407.11719 | null |
| 2024-07-16 | A Comparative Analysis of Interactive Reinforcement Learning Algorithms in Warehouse Robot Grid Based Environment | Arunabh Bora et.al. | 2407.11671 | null |
| 2024-07-16 | Exciting Action: Investigating Efficient Exploration for Learning Musculoskeletal Humanoid Locomotion | Henri-Jacques Geiß et.al. | 2407.11658 | null |
| 2024-07-16 | Building Resilience in Wireless Communication Systems With a Secret-Key Budget | Karl-Ludwig Besser et.al. | 2407.11604 | null |
| 2024-07-16 | Learning to Imitate Spatial Organization in Multi-robot Systems | Ayomide O. Agunloye et.al. | 2407.11592 | null |
| 2024-07-16 | Green Resource Allocation in Cloud-Native O-RAN Enabled Small Cell Networks | Rana M. Sohaib et.al. | 2407.11563 | null |
| 2024-07-16 | RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards | Fatemeh Zargarbashi et.al. | 2407.11562 | null |
| 2024-07-16 | Imitation learning with artificial neural networks for demand response with a heuristic control approach for heat pumps | Thomas Dengiz et.al. | 2407.11561 | null |
| 2024-07-16 | DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN | Rana M. Sohaib et.al. | 2407.11558 | null |
| 2024-07-15 | Walking the Values in Bayesian Inverse Reinforcement Learning | Ondrej Bajgar et.al. | 2407.10971 | null |
| 2024-07-15 | BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning | Haohong Lin et.al. | 2407.10967 | null |
| 2024-07-15 | Hedging Beyond the Mean: A Distributional Reinforcement Learning Perspective for Hedging Portfolios with Structured Products | Anil Sharma et.al. | 2407.10903 | null |
| 2024-07-15 | Offline Reinforcement Learning with Imputed Rewards | Carlo Romeo et.al. | 2407.10839 | null |
| 2024-07-15 | Exploration in Knowledge Transfer Utilizing Reinforcement Learning | Adam Jedlička et.al. | 2407.10835 | null |
| 2024-07-15 | GuideLight: “Industrial Solution” Guidance for More Practical Traffic Signal Control Agents | Haoyuan Jiang et.al. | 2407.10811 | null |
| 2024-07-15 | DINO Pre-training for Vision-based End-to-end Autonomous Driving | Shubham Juneja et.al. | 2407.10803 | null |
| 2024-07-15 | Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning | Alessandro Montenegro et.al. | 2407.10775 | null |
| 2024-07-16 | Back to Newton’s Laws: Learning Vision-based Agile Flight via Differentiable Physics | Yuang Zhang et.al. | 2407.10648 | null |
| 2024-07-15 | Balancing the Scales: Reinforcement Learning for Fair Classification | Leon Eshuijs et.al. | 2407.10629 | null |
| 2024-07-12 | Learning Coordinated Maneuver in Adversarial Environments | Zechen Hu et.al. | 2407.09469 | null |
| 2024-07-12 | ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts | Amelia F. Hardy et.al. | 2407.09447 | null |
| 2024-07-12 | A Benchmark Environment for Offline Reinforcement Learning in Racing Games | Girolamo Macaluso et.al. | 2407.09415 | link |
| 2024-07-12 | Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments | Zoya Volovikova et.al. | 2407.09287 | null |
| 2024-07-12 | GNN with Model-based RL for Multi-agent Systems | Hanxiao Chen et.al. | 2407.09249 | null |
| 2024-07-12 | Constrained Intrinsic Motivation for Reinforcement Learning | Xiang Zheng et.al. | 2407.09247 | null |
| 2024-07-12 | Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network | Shun Kotoku et.al. | 2407.09124 | null |
| 2024-07-12 | New Desiderata for Direct Preference Optimization | Xiangkun Hu et.al. | 2407.09072 | null |
| 2024-07-12 | Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control | Huayu Chen et.al. | 2407.09024 | null |
| 2024-07-12 | Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control | Sicong Jiang et.al. | 2407.08964 | null |
| 2024-07-11 | MetaUrban: A Simulation Platform for Embodied AI in Urban Spaces | Wayne Wu et.al. | 2407.08725 | null |
| 2024-07-11 | RoboMorph: Evolving Robot Morphology using Large Language Models | Kevin Qiu et.al. | 2407.08626 | null |
| 2024-07-11 | A Review of Nine Physics Engines for Reinforcement Learning Research | Michael Kaup et.al. | 2407.08590 | null |
| 2024-07-11 | HACMan++: Spatially-Grounded Motion Primitives for Manipulation | Bowen Jiang et.al. | 2407.08585 | null |
| 2024-07-11 | Imitation Learning for Robotic Assisted Ultrasound Examination of Deep Venous Thrombosis using Kernelized Movement Primitives | Diego Dall’Alba et.al. | 2407.08506 | null |
| 2024-07-11 | TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations | Junik Bae et.al. | 2407.08464 | null |
| 2024-07-11 | Distributed Deep Reinforcement Learning Based Gradient Quantization for Federated Learning Enabled Vehicle Edge Computing | Cui Zhang et.al. | 2407.08462 | null |
| 2024-07-11 | Joint Optimization of Age of Information and Energy Consumption in NR-V2X System based on Deep Reinforcement Learning | Shulin Song et.al. | 2407.08458 | link |
| 2024-07-11 | A Cantor-Kantorovich Metric Between Markov Decision Processes with Application to Transfer Learning | Adrien Banse et.al. | 2407.08324 | null |
| 2024-07-11 | A Deep Reinforcement Learning Framework and Methodology for Reducing the Sim-to-Real Gap in ASV Navigation | Luis F W Batista et.al. | 2407.08263 | null |
| 2024-07-10 | Learning In-Hand Translation Using Tactile Skin With Shear and Normal Force Sensing | Jessica Yin et.al. | 2407.07885 | null |
| 2024-07-10 | Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation | Eugene Teoh et.al. | 2407.07868 | null |
| 2024-07-10 | Reinforcement Learning of Adaptive Acquisition Policies for Inverse Problems | Gianluigi Silvestri et.al. | 2407.07794 | null |
| 2024-07-11 | BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark | Nikita Chernyadev et.al. | 2407.07788 | null |
| 2024-07-10 | Continuous Control with Coarse-to-fine Reinforcement Learning | Younggyo Seo et.al. | 2407.07787 | null |
| 2024-07-10 | Towards Human-Like Driving: Active Inference in Autonomous Vehicle Control | Elahe Delavari et.al. | 2407.07684 | null |
| 2024-07-10 | Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning | Dake Zhang et.al. | 2407.07631 | null |
| 2024-07-10 | Resource Allocation for Twin Maintenance and Computing Task Processing in Digital Twin Vehicular Edge Computing Network | Yu Xie et.al. | 2407.07575 | link |
| 2024-07-10 | CM-DQN: A Value-Based Deep Reinforcement Learning Model to Simulate Confirmation Bias | Jiacheng Shen et.al. | 2407.07454 | link |
| 2024-07-10 | Real-time system optimal traffic routing under uncertainties – Can physics models boost reinforcement learning? | Zemian Ke et.al. | 2407.07364 | null |
| 2024-07-09 | Safe and Reliable Training of Learning-Based Aerospace Controllers | Udayan Mandal et.al. | 2407.07088 | null |
| 2024-07-09 | Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models | Logan Cross et.al. | 2407.07086 | link |
| 2024-07-09 | Can Learned Optimization Make Reinforcement Learning Less Difficult? | Alexander David Goldie et.al. | 2407.07082 | link |
| 2024-07-09 | A Unified Approach to Multi-task Legged Navigation: Temporal Logic Meets Reinforcement Learning | Jesse Jiang et.al. | 2407.06931 | null |
| 2024-07-09 | Intercepting Unauthorized Aerial Robots in Controlled Airspace Using Reinforcement Learning | Francisco Giral et.al. | 2407.06909 | null |
| 2024-07-09 | Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective | Shahana Ibrahim et.al. | 2407.06902 | null |
| 2024-07-09 | Energy Efficient Fair STAR-RIS for Mobile Users | Ashok S. Kumar et.al. | 2407.06868 | null |
| 2024-07-09 | Frequency and Generalisation of Periodic Activation Functions in Reinforcement Learning | Augustine N. Mavor-Parker et.al. | 2407.06756 | null |
| 2024-07-09 | Hierarchical Average-Reward Linearly-solvable Markov Decision Processes | Guillermo Infante et.al. | 2407.06690 | null |
| 2024-07-09 | Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning | Fanyue Wei et.al. | 2407.06642 | link |
| 2024-07-08 | Periodic agent-state based Q-learning for POMDPs | Amit Sinha et.al. | 2407.06121 | null |
| 2024-07-08 | QTRL: Toward Practical Quantum Reinforcement Learning via Quantum-Train | Chen-Yu Liu et.al. | 2407.06103 | null |
| 2024-07-08 | Stranger Danger! Identifying and Avoiding Unpredictable Pedestrians in RL-based Social Robot Navigation | Sara Pohland et.al. | 2407.06056 | link |
| 2024-07-08 | iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement | Aoyu Pang et.al. | 2407.06025 | link |
| 2024-07-08 | Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals | Moritz Reuss et.al. | 2407.05996 | null |
| 2024-07-08 | On Bellman equations for continuous-time policy evaluation I: discretization and approximation | Wenlong Mou et.al. | 2407.05966 | null |
| 2024-07-08 | Graph Anomaly Detection with Noisy Labels by Reinforcement Learning | Zhu Wang et.al. | 2407.05934 | null |
| 2024-07-08 | FedMRL: Data Heterogeneity Aware Federated Multi-agent Deep Reinforcement Learning for Medical Imaging | Pranab Sahoo et.al. | 2407.05800 | link |
| 2024-07-08 | Structural Generalization in Autonomous Cyber Incident Response with Message-Passing Neural Networks and Reinforcement Learning | Jakob Nyberg et.al. | 2407.05775 | link |
| 2024-07-08 | Multi-agent Reinforcement Learning-based Network Intrusion Detection System | Amine Tellache et.al. | 2407.05766 | null |
| 2024-07-05 | Graph Reinforcement Learning in Power Grids: A Survey | Mohamed Hassouna et.al. | 2407.04522 | null |
| 2024-07-05 | Using Petri Nets as an Integrated Constraint Mechanism for Reinforcement Learning Tasks | Timon Sachweh et.al. | 2407.04481 | null |
| 2024-07-05 | Hindsight Preference Learning for Offline Preference-based Reinforcement Learning | Chen-Xiao Gao et.al. | 2407.04451 | link |
| 2024-07-05 | Enhancing Safety for Autonomous Agents in Partly Concealed Urban Traffic Environments Through Representation-Based Shielding | Pierre Haritz et.al. | 2407.04343 | null |
| 2024-07-05 | Gradient-based Regularization for Action Smoothness in Robotic Control with Reinforcement Learning | I Lee et.al. | 2407.04315 | null |
| 2024-07-05 | Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling | Jiawei Xu et.al. | 2407.04285 | null |
| 2024-07-05 | Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator | Mehryar Abbasi et.al. | 2407.04258 | null |
| 2024-07-05 | PA-LOCO: Learning Perturbation-Adaptive Locomotion for Quadruped Robots | Zhiyuan Xiao et.al. | 2407.04224 | null |
| 2024-07-05 | Autoverse: An Evolvable Game Language for Learning Robust Embodied Agents | Sam Earle et.al. | 2407.04221 | null |
| 2024-07-04 | Orchestrating LLMs with Different Personalizations | Jin Peng Zhou et.al. | 2407.04181 | null |
| 2024-07-03 | Value-Penalized Auxiliary Control from Examples for Learning without Rewards or Demonstrations | Trevor Ablett et.al. | 2407.03311 | link |
| 2024-07-03 | A Review of the Applications of Deep Learning-Based Emergent Communication | Brendon Boldt et.al. | 2407.03302 | null |
| 2024-07-03 | Cooperative Multi-Agent Deep Reinforcement Learning Methods for UAV-aided Mobile Edge Computing Networks | Mintae Kim et.al. | 2407.03280 | null |
| 2024-07-03 | Policy-guided Monte Carlo on general state spaces: Application to glass-forming mixtures | Leonardo Galliano et.al. | 2407.03275 | null |
| 2024-07-03 | PPO-based Dynamic Control of Uncertain Floating Platforms in the Zero-G Environment | Mahya Ramezani et.al. | 2407.03224 | null |
| 2024-07-03 | Combining AI Control Systems and Human Decision Support via Robustness and Criticality | Walt Woods et.al. | 2407.03210 | null |
| 2024-07-03 | Bunny-VisionPro: Real-Time Bimanual Dexterous Teleoperation for Imitation Learning | Runyu Ding et.al. | 2407.03162 | null |
| 2024-07-03 | Reinforcement Learning for Sequence Design Leveraging Protein Language Models | Jithendaraa Subramanian et.al. | 2407.03154 | null |
| 2024-07-03 | Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes | Asaf Cassel et.al. | 2407.03065 | null |
| 2024-07-03 | Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment | Janghwan Lee et.al. | 2407.03051 | null |
| 2024-07-02 | PWM: Policy Learning with Large World Models | Ignat Georgiev et.al. | 2407.02466 | null |
| 2024-07-02 | Predicting Visual Attention in Graphic Design Documents | Souradeep Chakraborty et.al. | 2407.02439 | null |
| 2024-07-02 | Reinforcement Learning and Machine ethics:a systematic review | Ajay Vishwanath et.al. | 2407.02425 | null |
| 2024-07-02 | Talking to Machines: do you read me? | Lina M. Rojas-Barahona et.al. | 2407.02354 | null |
| 2024-07-02 | DextrAH-G: Pixels-to-Action Dexterous Arm-Hand Grasping with Geometric Fabrics | Tyler Ga Wei Lum et.al. | 2407.02274 | null |
| 2024-07-02 | Safe CoR: A Dual-Expert Approach to Integrating Imitation Learning and Safe Reinforcement Learning Using Constraint Rewards | Hyeokjin Kwon et.al. | 2407.02245 | null |
| 2024-07-02 | Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | Yuchen Hu et.al. | 2407.02243 | null |
| 2024-07-02 | Safety-Driven Deep Reinforcement Learning Framework for Cobots: A Sim2Real Approach | Ammar N. Abbas et.al. | 2407.02231 | link |
| 2024-07-02 | Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning | Zakariae El Asri et.al. | 2407.02217 | null |
| 2024-07-02 | Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning | Yifang Chen et.al. | 2407.02119 | null |
| 2024-06-28 | PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators | Kuo-Hao Zeng et.al. | 2406.20083 | null |
| 2024-06-28 | Applying RLAIF for Code Generation with API-usage in Lightweight LLMs | Sujan Dutta et.al. | 2406.20060 | null |
| 2024-06-28 | HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid | Xinyu Xu et.al. | 2406.19972 | null |
| 2024-06-28 | Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies | Pingcheng Jian et.al. | 2406.19971 | null |
| 2024-06-28 | Operator World Models for Reinforcement Learning | Pietro Novelli et.al. | 2406.19861 | null |
| 2024-06-28 | 3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints | Yoonkyu Yoo et.al. | 2406.19848 | null |
| 2024-06-28 | Reinforcement Learning for Efficient Design and Control Co-optimisation of Energy Systems | Marine Cauz et.al. | 2406.19825 | null |
| 2024-06-28 | Identifying Ordinary Differential Equations for Data-efficient Model-based Reinforcement Learning | Tobias Nagel et.al. | 2406.19817 | null |
| 2024-06-28 | Fuzzy Logic Guided Reward Function Variation: An Oracle for Testing Reinforcement Learning Programs | Shiyu Zhang et.al. | 2406.19812 | null |
| 2024-06-28 | Decision Transformer for IRS-Assisted Systems with Diffusion-Driven Generative Channels | Jie Zhang et.al. | 2406.19769 | null |
| 2024-06-27 | Efficient World Models with Context-Aware Tokenization | Vincent Micheli et.al. | 2406.19320 | link |
| 2024-06-27 | Averaging log-likelihoods in direct alignment | Nathan Grinsztajn et.al. | 2406.19188 | null |
| 2024-06-27 | Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion | Yannis Flet-Berliac et.al. | 2406.19185 | null |
| 2024-06-27 | Learning Pareto Set for Multi-Objective Continuous Robot Control | Tianye Shu et.al. | 2406.18924 | link |
| 2024-06-27 | Autonomous Control of a Novel Closed Chain Five Bar Active Suspension via Deep Reinforcement Learning | Nishesh Singh et.al. | 2406.18899 | null |
| 2024-06-27 | State and Input Constrained Output-Feedback Adaptive Optimal Control of Affine Nonlinear Systems | Tochukwu Elijah Ogri et.al. | 2406.18804 | null |
| 2024-06-26 | Decentralized Semantic Traffic Control in AVs Using RL and DQN for Dynamic Roadblocks | Emanuel Figetakis et.al. | 2406.18741 | null |
| 2024-06-26 | Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs | Tian Tian et.al. | 2406.18529 | null |
| 2024-06-26 | Mental Modeling of Reinforcement Learning Agents by Language Models | Wenhao Lu et.al. | 2406.18505 | null |
| 2024-06-26 | Preference Elicitation for Offline Reinforcement Learning | Alizée Pace et.al. | 2406.18450 | null |
| 2024-06-26 | Mixture of Experts in a Mixture of RL settings | Timon Willi et.al. | 2406.18420 | null |
| 2024-06-26 | AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors | Hao Shi et.al. | 2406.18394 | null |
| 2024-06-26 | Reinforcement Learning with Intrinsically Motivated Feedback Graph for Lost-sales Inventory Control | Zifan Liu et.al. | 2406.18351 | null |
| 2024-06-26 | AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations | Adam Dahlgren Lindström et.al. | 2406.18346 | null |
| 2024-06-26 | Spatial-temporal Hierarchical Reinforcement Learning for Interpretable Pathology Image Super-Resolution | Wenting Chen et.al. | 2406.18310 | link |
| 2024-06-26 | Combining Automated Optimisation of Hyperparameters and Reward Shape | Julian Dierkes et.al. | 2406.18293 | link |
| 2024-06-26 | Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems | Italo Luis da Silva et.al. | 2406.18245 | link |
| 2024-06-25 | EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data | Jesse Zhang et.al. | 2406.17768 | null |
| 2024-06-25 | When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning | Claas Voelcker et.al. | 2406.17718 | null |
| 2024-06-25 | Privacy Preserving Reinforcement Learning for Population Processes | Samuel Yang-Zhao et.al. | 2406.17649 | null |
| 2024-06-25 | KANQAS: Kolmogorov Arnold Network for Quantum Architecture Search | Akash Kundu et.al. | 2406.17630 | link |
| 2024-06-25 | Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations | Cheng Wang et.al. | 2406.17576 | null |
| 2024-06-25 | On the consistency of hyper-parameter selection in value-based deep reinforcement learning | Johan Obando-Ceron et.al. | 2406.17523 | null |
| 2024-06-25 | BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO | Sebastian Dittert et.al. | 2406.17490 | null |
| 2024-06-25 | CuDA2: An approach for Incorporating Traitor Agents into Cooperative Multi-Agent Systems | Zhen Chen et.al. | 2406.17425 | null |
| 2024-06-25 | Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning | Tianfu Wang et.al. | 2406.17334 | link |
| 2024-06-25 | The State-Action-Reward-State-Action Algorithm in Spatial Prisoner’s Dilemma Game | Lanyu Yang et.al. | 2406.17326 | null |
| 2024-06-24 | Confidence Aware Inverse Constrained Reinforcement Learning | Sriram Ganapathi Subramanian et.al. | 2406.16782 | null |
| 2024-06-24 | WARP: On the Benefits of Weight Averaged Rewarded Policies | Alexandre Ramé et.al. | 2406.16768 | null |
| 2024-06-24 | The MRI Scanner as a Diagnostic: Image-less Active Sampling | Yuning Du et.al. | 2406.16754 | null |
| 2024-06-24 | OCALM: Object-Centric Assessment with Language Models | Timo Kaufmann et.al. | 2406.16748 | null |
| 2024-06-24 | Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization | Zhengyue Zhao et.al. | 2406.16743 | null |
| 2024-06-24 | Probabilistic Subgoal Representations for Hierarchical Reinforcement learning | Vivienne Huiling Wang et.al. | 2406.16707 | null |
| 2024-06-24 | Decentralized RL-Based Data Transmission Scheme for Energy Efficient Harvesting | Rafaela Scaciota et.al. | 2406.16624 | null |
| 2024-06-24 | Towards Physically Talented Aerial Robots with Tactically Smart Swarm Behavior thereof: An Efficient Co-design Approach | Prajit KrisshnaKumar et.al. | 2406.16612 | null |
| 2024-06-24 | $\text{Alpha}^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning | Feng Xu et.al. | 2406.16505 | link |
| 2024-06-24 | Towards Comprehensive Preference Data Collection for Reward Modeling | Yulan Hu et.al. | 2406.16486 | null |
| 2024-06-21 | MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | Xuan He et.al. | 2406.15252 | null |
| 2024-06-21 | Open Problem: Order Optimal Regret Bounds for Kernel-Based Reinforcement Learning | Sattar Vakili et.al. | 2406.15250 | null |
| 2024-06-21 | Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting | Jiyong Oh et.al. | 2406.15225 | null |
| 2024-06-21 | Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks | Alex Quach et.al. | 2406.15149 | null |
| 2024-06-21 | KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty | Philipp Becker et.al. | 2406.15131 | null |
| 2024-06-21 | A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning | Gianluca Drappo et.al. | 2406.15124 | null |
| 2024-06-21 | Towards General Negotiation Strategies with End-to-End Reinforcement Learning | Bram M. Renting et.al. | 2406.15096 | null |
| 2024-06-21 | KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning | Jiahan Chen et.al. | 2406.15073 | null |
| 2024-06-21 | Behaviour Distillation | Andrei Lupu et.al. | 2406.15042 | link |
| 2024-06-21 | SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning | Matthias Weissenbacher et.al. | 2406.15025 | null |
| 2024-06-20 | CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics | Jiawei Gao et.al. | 2406.14558 | null |
| 2024-06-20 | MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading | Chuqiao Zong et.al. | 2406.14537 | link |
| 2024-06-20 | RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold | Amrith Setlur et.al. | 2406.14532 | link |
| 2024-06-20 | Learning telic-controllable state representations | Nadav Amir et.al. | 2406.14476 | null |
| 2024-06-20 | Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue | Huifang Du et.al. | 2406.14457 | null |
| 2024-06-20 | Revealing the learning process in reinforcement learning agents through attention-oriented metrics | Charlotte Beylier et.al. | 2406.14324 | null |
| 2024-06-20 | Resource Optimization for Tail-Based Control in Wireless Networked Control Systems | Rasika Vijithasena et.al. | 2406.14301 | null |
| 2024-06-21 | REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability | Shuang Ao et.al. | 2406.14214 | link |
| 2024-06-20 | Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning | Amit Sharma et.al. | 2406.14169 | null |
| 2024-06-20 | Iterative Sizing Field Prediction for Adaptive Mesh Generation From Expert Demonstrations | Niklas Freymuth et.al. | 2406.14161 | link |
| 2024-06-18 | Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Haoxiang Wang et.al. | 2406.12845 | link |
| 2024-06-18 | Injection Optimization at Particle Accelerators via Reinforcement Learning: From Simulation to Real-World Application | Awal Awal et.al. | 2406.12735 | null |
| 2024-06-18 | A Systematization of the Wagner Framework: Graph Theory Conjectures and Reinforcement Learning | Flora Angileri et.al. | 2406.12667 | null |
| 2024-06-18 | Reinforcement-Learning based routing for packet-optical networks with hybrid telemetry | A. L. García Navarro et.al. | 2406.12602 | null |
| 2024-06-18 | Discovering Minimal Reinforcement Learning Environments | Jarek Liesen et.al. | 2406.12589 | null |
| 2024-06-18 | RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation | Shuting Wang et.al. | 2406.12566 | null |
| 2024-06-18 | A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo | Miguel Vasco et.al. | 2406.12563 | null |
| 2024-06-18 | Offline Imitation Learning with Model-based Reverse Augmentation | Jie-Jing Shao et.al. | 2406.12550 | null |
| 2024-06-18 | Demonstrating Agile Flight from Pixels without State Estimation | Ismail Geles et.al. | 2406.12505 | null |
| 2024-06-18 | Autonomous navigation of catheters and guidewires in mechanical thrombectomy using inverse reinforcement learning | Harry Robertshaw et.al. | 2406.12499 | null |
| 2024-06-17 | WPO: Enhancing RLHF with Weighted Preference Optimization | Wenxuan Zhou et.al. | 2406.11827 | link |
| 2024-06-17 | Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics | Runzhe Wu et.al. | 2406.11810 | null |
| 2024-06-17 | Run Time Assured Reinforcement Learning for Six Degree-of-Freedom Spacecraft Inspection | Kyle Dunlap et.al. | 2406.11795 | null |
| 2024-06-17 | FetchBench: A Simulation Benchmark for Robot Fetching | Beining Han et.al. | 2406.11793 | null |
| 2024-06-17 | Optimal Transport-Assisted Risk-Sensitive Q-Learning | Zahra Shahrooei et.al. | 2406.11774 | null |
| 2024-06-17 | Measuring memorization in RLHF for code completion | Aneesh Pappu et.al. | 2406.11715 | null |
| 2024-06-17 | The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation | Noah Golowich et.al. | 2406.11686 | null |
| 2024-06-17 | Communication-Efficient MARL for Platoon Stability and Energy-efficiency Co-optimization in Cooperative Adaptive Cruise Control of CAVs | Min Hua et.al. | 2406.11653 | null |
| 2024-06-17 | Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions | Noah Golowich et.al. | 2406.11640 | null |
| 2024-06-17 | Style Transfer with Multi-iteration Preference Optimization | Shuai Liu et.al. | 2406.11581 | null |
| 2024-06-14 | Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs | Rui Yang et.al. | 2406.10216 | null |
| 2024-06-14 | A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors | Naaman Tan et.al. | 2406.10203 | null |
| 2024-06-14 | Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication | Sanjali Yadav et.al. | 2406.10166 | null |
| 2024-06-14 | Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models | Carson Denison et.al. | 2406.10162 | link |
| 2024-06-14 | BiKC: Keypose-Conditioned Consistency Policy for Bimanual Robotic Manipulation | Dongjie Yu et.al. | 2406.10093 | null |
| 2024-06-14 | PRIMER: Perception-Aware Robust Learning-based Multiagent Trajectory Planner | Kota Kondo et.al. | 2406.10060 | null |
| 2024-06-14 | Bridging the Communication Gap: Artificial Agents Learning Sign Language through Imitation | Federico Tavella et.al. | 2406.10043 | null |
| 2024-06-14 | ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR | Vishwanath Pratap Singh et.al. | 2406.09999 | null |
| 2024-06-14 | Robust Model-Based Reinforcement Learning with an Adversarial Auxiliary Model | Siemen Herremans et.al. | 2406.09976 | link |
| 2024-06-14 | InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning | Tiancheng Li et.al. | 2406.09973 | null |
| 2024-06-13 | Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms | Miaosen Zhang et.al. | 2406.09397 | null |
| 2024-06-13 | Is Value Learning Really the Main Bottleneck in Offline RL? | Seohong Park et.al. | 2406.09329 | null |
| 2024-06-13 | OpenVLA: An Open-Source Vision-Language-Action Model | Moo Jin Kim et.al. | 2406.09246 | null |
| 2024-06-13 | AutomaChef: A Physics-informed Demonstration-guided Learning Framework for Granular Material Manipulation | Minglun Wei et.al. | 2406.09178 | null |
| 2024-06-13 | Direct Imitation Learning-based Visual Servoing using the Large Projection Formulation | Sayantan Auddy et.al. | 2406.09120 | null |
| 2024-06-13 | Adaptive Actor-Critic Based Optimal Regulation for Drift-Free Uncertain Nonlinear Systems | Ashwin P. Dani et.al. | 2406.09097 | null |
| 2024-06-13 | DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning | Xuemin Hu et.al. | 2406.09089 | null |
| 2024-06-13 | Data-driven modeling and supervisory control system optimization for plug-in hybrid electric vehicles | Hao Zhang et.al. | 2406.09082 | null |
| 2024-06-13 | Latent Assistance Networks: Rediscovering Hyperbolic Tangents in RL | Jacob E. Kooi et.al. | 2406.09079 | null |
| 2024-06-13 | Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation | Claude Formanek et.al. | 2406.09068 | null |
| 2024-06-12 | RILe: Reinforced Imitation Learning | Mert Albaba et.al. | 2406.08472 | null |
| 2024-06-12 | Adaptive Swarm Mesh Refinement using Deep Reinforcement Learning with Local Rewards | Niklas Freymuth et.al. | 2406.08440 | null |
| 2024-06-12 | RRLS : Robust Reinforcement Learning Suite | Adil Zouitine et.al. | 2406.08406 | link |
| 2024-06-12 | Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning | Yuhui Wang et.al. | 2406.08404 | null |
| 2024-06-12 | Time-Constrained Robust MDPs | Adil Zouitine et.al. | 2406.08395 | null |
| 2024-06-12 | Residual Learning and Context Encoding for Adaptive Offline-to-Online Reinforcement Learning | Mohammadreza Nakhaei et.al. | 2406.08238 | link |
| 2024-06-12 | MaIL: Improving Imitation Learning with Mamba | Xiaogang Jia et.al. | 2406.08234 | null |
| 2024-06-12 | Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning | Max Weltevrede et.al. | 2406.08069 | null |
| 2024-06-12 | Deep reinforcement learning with positional context for intraday trading | Sven Goluža et.al. | 2406.08013 | null |
| 2024-06-12 | Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning | Yizhe Huang et.al. | 2406.08002 | null |
| 2024-06-11 | CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning | Zeyuan Liu et.al. | 2406.07541 | null |
| 2024-06-11 | BAKU: An Efficient Transformer for Multi-Task Policy Learning | Siddhant Haldar et.al. | 2406.07539 | null |
| 2024-06-11 | Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis | Qining Zhang et.al. | 2406.07455 | null |
| 2024-06-11 | Enhanced Gene Selection in Single-Cell Genomics: Pre-Filtering Synergy and Reinforced Optimization | Weiliang Zhang et.al. | 2406.07418 | null |
| 2024-06-11 | Federated Multi-Agent DRL for Radio Resource Management in Industrial 6G in-X subnetworks | Bjarke Madsen et.al. | 2406.07383 | null |
| 2024-06-11 | World Models with Hints of Large Language Models for Goal Achieving | Zeyuan Liu et.al. | 2406.07381 | null |
| 2024-06-11 | EdgeTimer: Adaptive Multi-Timescale Scheduling in Mobile Edge Computing with Deep Reinforcement Learning | Yijun Hao et.al. | 2406.07342 | null |
| 2024-06-11 | Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling | Constantin Waubert de Puiseau et.al. | 2406.07325 | null |
| 2024-06-11 | Multi-objective Reinforcement learning from AI Feedback | Marcus Williams et.al. | 2406.07295 | null |
| 2024-06-11 | Hybrid Reinforcement Learning from Offline Observation Alone | Yuda Song et.al. | 2406.07253 | null |
| 2024-06-10 | Verification-Guided Shielding for Deep Reinforcement Learning | Davide Corsi et.al. | 2406.06507 | null |
| 2024-06-10 | Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation | Mohidul Haque Mridul et.al. | 2406.06500 | null |
| 2024-06-10 | Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity | Calarina Muslimani et.al. | 2406.06495 | null |
| 2024-06-10 | Towards Real-World Efficiency: Domain Randomization in Reinforcement Learning for Pre-Capture of Free-Floating Moving Targets by Autonomous Robots | Bahador Beigomi et.al. | 2406.06460 | link |
| 2024-06-10 | Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? | Denis Tarasov et.al. | 2406.06309 | link |
| 2024-06-10 | Learning-based cognitive architecture for enhancing coordination in human groups | Antonio Grotta et.al. | 2406.06297 | null |
| 2024-06-10 | Deep Multi-Objective Reinforcement Learning for Utility-Based Infrastructural Maintenance Optimization | Jesse van Remmerden et.al. | 2406.06184 | null |
| 2024-06-10 | Mastering truss structure optimization with tree search | Gabriel E. Garayalde et.al. | 2406.06145 | null |
| 2024-06-10 | EXPIL: Explanatory Predicate Invention for Learning in Games | Jingyuan Sha et.al. | 2406.06107 | null |
| 2024-06-10 | Sim-To-Real Transfer for Visual Reinforcement Learning of Deformable Object Manipulation for Robot-Assisted Surgery | Paul Maria Scheikl et.al. | 2406.06092 | null |
| 2024-06-07 | LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration | Tavor Lipman et.al. | 2406.05107 | null |
| 2024-06-07 | Massively Multiagent Minigames for Training Generalist Agents | Kyoung Whan Choe et.al. | 2406.05071 | link |
| 2024-06-07 | Online Frequency Scheduling by Learning Parallel Actions | Anastasios Giovanidis et.al. | 2406.05041 | null |
| 2024-06-07 | Optimizing Automatic Differentiation with Deep Reinforcement Learning | Jamie Lohoff et.al. | 2406.05027 | null |
| 2024-06-07 | Designs for Enabling Collaboration in Human-Machine Teaming via Interactive and Explainable Systems | Rohan Paleja et.al. | 2406.05003 | null |
| 2024-06-07 | SLOPE: Search with Learned Optimal Pruning-based Expansion | Davor Bokan et.al. | 2406.04935 | link |
| 2024-06-07 | Sim-to-real Transfer of Deep Reinforcement Learning Agents for Online Coverage Path Planning | Arvi Jonnarth et.al. | 2406.04920 | null |
| 2024-06-07 | Online Adaptation for Enhancing Imitation Learning Policies | Federico Malato et.al. | 2406.04913 | link |
| 2024-06-07 | Stabilizing Extreme Q-learning by Maclaurin Expansion | Motoki Omura et.al. | 2406.04896 | null |
| 2024-06-07 | Primitive Agentic First-Order Optimization | R. Sala et.al. | 2406.04841 | null |
| 2024-06-06 | ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories | Qianlan Yang et.al. | 2406.04323 | null |
| 2024-06-06 | Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models | Xiang Ji et.al. | 2406.04274 | null |
| 2024-06-06 | Multi-Agent Imitation Learning: Value is Easy, Regret is Hard | Jingwu Tang et.al. | 2406.04219 | null |
| 2024-06-06 | Aligning Agents like Large Language Models | Adam Jelley et.al. | 2406.04208 | null |
| 2024-06-06 | MARLander: A Local Path Planning for Drone Swarms using Multiagent Deep Reinforcement Learning | Demetros Aschu et.al. | 2406.04159 | null |
| 2024-06-06 | Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning | Abdullah Akgül et.al. | 2406.04088 | null |
| 2024-06-06 | Bootstrapping Expectiles in Reinforcement Learning | Pierre Clavier et.al. | 2406.04081 | null |
| 2024-06-06 | Spatio-temporal Early Prediction based on Multi-objective Reinforcement Learning | Wei Shao et.al. | 2406.04035 | link |
| 2024-06-06 | Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents | Yoann Poupart et.al. | 2406.04028 | link |
| 2024-06-06 | HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning | Quentin Delfosse et.al. | 2406.03997 | link |
| 2024-06-05 | Automating Turkish Educational Quiz Generation Using Large Language Models | Kamyar Zeinalipour et.al. | 2406.03397 | null |
| 2024-06-05 | LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback | Timon Ziegenbein et.al. | 2406.03363 | link |
| 2024-06-05 | UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning | Yu Zhang et.al. | 2406.03324 | null |
| 2024-06-05 | Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning | Mohamed Elsayed et.al. | 2406.03276 | null |
| 2024-06-05 | Prompt-based Visual Alignment for Zero-shot Policy Transfer | Haihan Gao et.al. | 2406.03250 | null |
| 2024-06-05 | Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning | Inwoo Hwang et.al. | 2406.03234 | link |
| 2024-06-05 | CommonPower: Supercharging Machine Learning for Smart Grids | Michael Eichelbeck et.al. | 2406.03231 | link |
| 2024-06-05 | Object Manipulation in Marine Environments using Reinforcement Learning | Ahmed Nader et.al. | 2406.03223 | null |
| 2024-06-05 | Adaptive Distance Functions via Kelvin Transformation | Rafael I. Cabral Muchacho et.al. | 2406.03200 | null |
| 2024-06-05 | DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays | Bo Xia et.al. | 2406.03102 | null |
| 2024-06-04 | RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots | Soroush Nasiriany et.al. | 2406.02523 | link |
| 2024-06-04 | Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs | Filippo Valdettaro et.al. | 2406.02456 | null |
| 2024-06-04 | A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies | Md Mirajul Islam et.al. | 2406.02450 | null |
| 2024-06-04 | Algorithmic Collusion in Dynamic Pricing with Deep Reinforcement Learning | Shidi Deng et.al. | 2406.02437 | null |
| 2024-06-04 | Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Philip Anastassiou et.al. | 2406.02430 | link |
| 2024-06-04 | Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning | Jiaxu Wang et.al. | 2406.02370 | null |
| 2024-06-04 | How to Explore with Belief: State Entropy Maximization in POMDPs | Riccardo Zamboni et.al. | 2406.02295 | null |
| 2024-06-04 | Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling | Arthur Müller et.al. | 2406.02294 | null |
| 2024-06-04 | Test-Time Regret Minimization in Meta Reinforcement Learning | Mirco Mutti et.al. | 2406.02282 | null |
| 2024-06-04 | Reinforcement Learning with Lookahead Information | Nadav Merlis et.al. | 2406.02258 | null |
| 2024-05-31 | Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF | Tengyang Xie et.al. | 2405.21046 | null |
| 2024-05-31 | Direct Alignment of Language Models via Quality-Aware Self-Refinement | Runsheng Yu et.al. | 2405.21040 | null |
| 2024-06-03 | Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles | Jiesong Lian et.al. | 2405.21027 | null |
| 2024-05-31 | Generating Triangulations and Fibrations with Reinforcement Learning | Per Berglund et.al. | 2405.21017 | null |
| 2024-05-31 | Bayesian Design Principles for Offline-to-Online Reinforcement Learning | Hao Hu et.al. | 2405.20984 | null |
| 2024-05-31 | Goal-Oriented Sensor Reporting Scheduling for Non-linear Dynamic System Monitoring | Prasoon Raghuwanshi et.al. | 2405.20983 | null |
| 2024-05-31 | SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Tianyang Xu et.al. | 2405.20974 | link |
| 2024-05-31 | Amortizing intractable inference in diffusion models for vision, language, and control | Siddarth Venkatraman et.al. | 2405.20971 | link |
| 2024-05-31 | Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation | Shangding Gu et.al. | 2405.20860 | null |
| 2024-05-31 | Improving Reward Models with Synthetic Critiques | Zihuiwen Ye et.al. | 2405.20850 | null |
| 2024-05-30 | Group Robust Preference Optimization in Reward-free RLHF | Shyam Sundhar Ramesh et.al. | 2405.20304 | link |
| 2024-05-30 | Evaluating Large Language Model Biases in Persona-Steered Generation | Andy Liu et.al. | 2405.20253 | link |
| 2024-05-30 | InstructionCP: A fast approach to transfer Large Language Models into target language | Kuang-Ming Chen et.al. | 2405.20175 | null |
| 2024-05-30 | Enhancing Battlefield Awareness: An Aerial RIS-assisted ISAC System with Deep Reinforcement Learning | Hyunsang Cho et.al. | 2405.20168 | null |
| 2024-05-30 | Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation | Wooseong Cho et.al. | 2405.20165 | null |
| 2024-05-30 | NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models | Kai Wu et.al. | 2405.20081 | null |
| 2024-05-30 | Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads | Avelina Asada Hadji-Kyriacou et.al. | 2405.20053 | link |
| 2024-05-30 | Deep Reinforcement Learning for Intrusion Detection in IoT: A Survey | Afrah Gueriani et.al. | 2405.20038 | null |
| 2024-05-30 | Safe Multi-agent Reinforcement Learning with Natural Language Constraints | Ziyan Wang et.al. | 2405.20018 | null |
| 2024-05-30 | LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning | Hyungho Na et.al. | 2405.19998 | null |
| 2024-05-29 | Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | Shenao Zhang et.al. | 2405.19332 | link |
| 2024-05-29 | Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | Shicong Cen et.al. | 2405.19320 | null |
| 2024-05-29 | Robust Preference Optimization through Reward Model Distillation | Adam Fisch et.al. | 2405.19316 | null |
| 2024-05-29 | Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels | Abhay Deshpande et.al. | 2405.19307 | null |
| 2024-05-29 | Act Natural! Projecting Autonomous System Trajectories Into Naturalistic Behavior Sets | Hamzah I. Khan et.al. | 2405.19292 | null |
| 2024-05-29 | Rich-Observation Reinforcement Learning with Continuous Latent Dynamics | Yuda Song et.al. | 2405.19269 | null |
| 2024-05-29 | Exploring the impact of traffic signal control and connected and automated vehicles on intersections safety: A deep reinforcement learning approach | Amir Hossein Karbasi et.al. | 2405.19236 | null |
| 2024-05-29 | Diffusion-based Dynamics Models for Long-Horizon Rollout in Offline Reinforcement Learning | Hanye Zhao et.al. | 2405.19189 | null |
| 2024-05-29 | Conditional Latent ODEs for Motion Prediction in Autonomous Driving | Khang Truong Giang et.al. | 2405.19183 | null |
| 2024-05-29 | A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning | Arthur Juliani et.al. | 2405.19153 | null |
| 2024-05-28 | Hierarchical World Models as Visual Whole-Body Humanoid Controllers | Nicklas Hansen et.al. | 2405.18418 | null |
| 2024-05-28 | Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study | Shreyas Bhat et.al. | 2405.18324 | null |
| 2024-05-28 | Highway Reinforcement Learning | Yuhui Wang et.al. | 2405.18289 | null |
| 2024-05-28 | Extreme Value Monte Carlo Tree Search | Masataro Asai et.al. | 2405.18248 | null |
| 2024-05-28 | Recurrent Natural Policy Gradient for POMDPs | Semih Cayci et.al. | 2405.18221 | null |
| 2024-05-28 | Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving | Zhi Zheng et.al. | 2405.18209 | link |
| 2024-05-28 | Mutation-Bias Learning in Games | Johann Bauer et.al. | 2405.18190 | null |
| 2024-05-28 | Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding | Daniel Bethell et.al. | 2405.18180 | link |
| 2024-05-28 | Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing | Wei Zhao et.al. | 2405.18166 | link |
| 2024-05-28 | PyTAG: Tabletop Games for Multi-Agent Reinforcement Learning | Martin Balla et.al. | 2405.18123 | link |
| 2024-05-27 | A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning | Abdulaziz Almuzairee et.al. | 2405.17416 | null |
| 2024-05-27 | Rethinking Transformers in Solving POMDPs | Chenhao Lu et.al. | 2405.17358 | link |
| 2024-05-27 | Opinion-Guided Reinforcement Learning | Kyanna Dagenais et.al. | 2405.17287 | null |
| 2024-05-27 | DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems | Zhi Zheng et.al. | 2405.17272 | link |
| 2024-05-27 | Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning | Adriana Hugessen et.al. | 2405.17243 | null |
| 2024-05-27 | InsigHTable: Insight-driven Hierarchical Table Visualization with Reinforcement Learning | Guozheng Li et.al. | 2405.17229 | null |
| 2024-05-27 | Learning Generic and Dynamic Locomotion of Humanoids Across Discrete Terrains | Shangqun Yu et.al. | 2405.17227 | null |
| 2024-05-27 | Flow control of three-dimensional cylinders transitioning to turbulence via multi-agent reinforcement learning | P. Suárez et.al. | 2405.17210 | null |
| 2024-05-27 | CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control | Jingqing Ruan et.al. | 2405.17152 | link |
| 2024-05-27 | Q-value Regularized Transformer for Offline Reinforcement Learning | Shengchao Hu et.al. | 2405.17098 | null |
| 2024-05-24 | Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment | Hao Sun et.al. | 2405.15624 | null |
| 2024-05-24 | Neuromorphic dreaming: A pathway to efficient learning in artificial agents | Ingo Blakowski et.al. | 2405.15616 | null |
| 2024-05-24 | OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code | Maxence Faldor et.al. | 2405.15568 | link |
| 2024-05-24 | Learning Generalizable Human Motion Generator with Reinforcement Learning | Yunyao Mao et.al. | 2405.15541 | null |
| 2024-05-24 | Randomized algorithms and PAC bounds for inverse reinforcement learning in continuous spaces | Angeliki Kamoutsi et.al. | 2405.15509 | null |
| 2024-05-24 | Human-in-the-loop Reinforcement Learning for Data Quality Monitoring in Particle Physics Experiments | Olivia Jullian Parra et.al. | 2405.15508 | null |
| 2024-05-24 | TD3 Based Collision Free Motion Planning for Robot Navigation | Hao Liu et.al. | 2405.15460 | null |
| 2024-05-24 | Counterexample-Guided Repair of Reinforcement Learning Systems Using Safety Critics | David Boetius et.al. | 2405.15430 | null |
| 2024-05-24 | Model-free reinforcement learning with noisy actions for automated experimental control in optics | Lea Richtmann et.al. | 2405.15421 | null |
| 2024-05-24 | Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate | Fan-Ming Luo et.al. | 2405.15384 | null |
| 2024-05-23 | Privileged Sensing Scaffolds Reinforcement Learning | Edward S. Hu et.al. | 2405.14853 | null |
| 2024-05-23 | Axioms for AI Alignment from Human Feedback | Luise Ge et.al. | 2405.14758 | null |
| 2024-05-23 | AGILE: A Novel Framework of LLM Agents | Peiyuan Feng et.al. | 2405.14751 | link |
| 2024-05-23 | Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence | Minheng Xiao et.al. | 2405.14749 | null |
| 2024-05-23 | SimPO: Simple Preference Optimization with a Reference-Free Reward | Yu Meng et.al. | 2405.14734 | link |
| 2024-05-23 | Multi-turn Reinforcement Learning from Preference Human Feedback | Lior Shani et.al. | 2405.14655 | null |
| 2024-05-23 | Reinforcement Learning for Fine-tuning Text-to-speech Diffusion Models | Jingyi Chen et.al. | 2405.14632 | null |
| 2024-05-23 | Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences | Takuya Hiraoka et.al. | 2405.14629 | null |
| 2024-05-23 | Closed-form Symbolic Solutions: A New Perspective on Solving Partial Differential Equations | Shu Wei et.al. | 2405.14620 | null |
| 2024-05-23 | Discretization of continuous input spaces in the hippocampal autoencoder | Adrian F. Amil et.al. | 2405.14600 | null |
| 2024-05-21 | Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale | Shriram Chennakesavalu et.al. | 2405.12961 | null |
| 2024-05-21 | Effect of Synthetic Jets Actuator Parameters on Deep Reinforcement Learning-Based Flow Control Performance in a Square Cylinder | Wang Jia et.al. | 2405.12834 | null |
| 2024-05-21 | Deep Reinforcement Learning for Time-Critical Wilderness Search And Rescue Using Drones | Jan-Hendrik Ewers et.al. | 2405.12800 | null |
| 2024-05-21 | Generative AI and Large Language Models for Cyber Security: All Insights You Need | Mohamed Amine Ferrag et.al. | 2405.12750 | null |
| 2024-05-21 | Reinforcement Learning Enabled Peer-to-Peer Energy Trading for Dairy Farms | Mian Ibad Ali Shah et.al. | 2405.12716 | null |
| 2024-05-21 | A Multimodal Learning-based Approach for Autonomous Landing of UAV | Francisco Neves et.al. | 2405.12681 | null |
| 2024-05-21 | Learning Causal Dynamics Models in Object-Oriented Environments | Zhongwei Yu et.al. | 2405.12615 | null |
| 2024-05-21 | PhiBE: A PDE-based Bellman Equation for Continuous Time Policy Evaluation | Yuhua Zhu et.al. | 2405.12535 | null |
| 2024-05-21 | GASE: Graph Attention Sampling with Edges Fusion for Solving Vehicle Routing Problems | Zhenwei Wang et.al. | 2405.12475 | null |
| 2024-05-21 | Physics-based Scene Layout Generation from Human Motion | Jianan Li et.al. | 2405.12460 | null |
| 2024-05-20 | Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? | Yang Dai et.al. | 2405.12094 | null |
| 2024-05-20 | PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation | Zhuobin Huang et.al. | 2405.12079 | null |
| 2024-05-20 | Scrutinize What We Ignore: Reining Task Representation Shift In Context-Based Offline Meta Reinforcement Learning | Hai Zhang et.al. | 2405.12001 | null |
| 2024-05-20 | Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space | Qianmei Liu et.al. | 2405.11982 | null |
| 2024-05-20 | A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers | Tom Roth et.al. | 2405.11904 | null |
| 2024-05-20 | Intuitive Fine-Tuning: Towards Unifying SFT and RLHF into a Single Process | Ermo Hua et.al. | 2405.11870 | link |
| 2024-05-20 | Reward-Punishment Reinforcement Learning with Maximum Entropy | Jiexin Wang et.al. | 2405.11784 | null |
| 2024-05-20 | Efficient Multi-agent Reinforcement Learning by Planning | Qihan Liu et.al. | 2405.11778 | link |
| 2024-05-20 | Learning Future Representation with Synthetic Observations for Sample-efficient Reinforcement Learning | Xin Liu et.al. | 2405.11740 | null |
| 2024-05-20 | Highway Graph to Accelerate Reinforcement Learning | Zidu Yin et.al. | 2405.11727 | link |
| 2024-05-17 | Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review | Hongyi Yang et.al. | 2405.10883 | null |
| 2024-05-17 | Automated Radiology Report Generation: A Review of Recent Advances | Phillip Sloan et.al. | 2405.10842 | null |
| 2024-05-17 | Combining Teacher-Student with Representation Learning: A Concurrent Teacher-Student Reinforcement Learning Paradigm for Legged Locomotion | Hongxi Wang et.al. | 2405.10830 | null |
| 2024-05-17 | Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities | Hao Zhou et.al. | 2405.10825 | null |
| 2024-05-17 | A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization | Andrzej Ruszczyński et.al. | 2405.10815 | null |
| 2024-05-17 | SignLLM: Sign Languages Production Large Language Models | Sen Fang et.al. | 2405.10718 | null |
| 2024-05-17 | Sample-Efficient Constrained Reinforcement Learning with General Parameterization | Washim Uddin Mondal et.al. | 2405.10624 | null |
| 2024-05-17 | An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems | Jiyue Tao et.al. | 2405.10576 | null |
| 2024-05-17 | Time-Varying Constraint-Aware Reinforcement Learning for Energy Storage Control | Jaeik Jeong et.al. | 2405.10536 | null |
| 2024-05-17 | Towards Better Question Generation in QA-Based Event Extraction | Zijin Hong et.al. | 2405.10517 | null |
| 2024-05-16 | Stochastic Q-learning for Large Discrete Action Spaces | Fares Fourati et.al. | 2405.10310 | null |
| 2024-05-16 | Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | Yuexiang Zhai et.al. | 2405.10292 | null |
| 2024-05-16 | Keep It Private: Unsupervised Privatization of Online Text | Calvin Bao et.al. | 2405.10260 | link |
| 2024-05-16 | A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy | Zhaoxing Li et.al. | 2405.10214 | null |
| 2024-05-16 | Continuous Transfer Learning for UAV Communication-aware Trajectory Design | Chenrui Sun et.al. | 2405.10087 | null |
| 2024-05-16 | Optimizing Search and Rescue UAV Connectivity in Challenging Terrain through Multi Q-Learning | Mohammed M. H. Qazzaz et.al. | 2405.10042 | null |
| 2024-05-16 | Reward Centering | Abhishek Naik et.al. | 2405.09999 | null |
| 2024-05-16 | Combining RL and IL using a dynamic, performance-based modulation over learning signals and its application to local planning | Francisco Leiva et.al. | 2405.09760 | null |
| 2024-05-16 | NIFTY Financial News Headlines Dataset | Raeid Saqur et.al. | 2405.09747 | null |
| 2024-05-15 | Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning | Sihan Zeng et.al. | 2405.09660 | null |
| 2024-05-15 | Reinforcement Learning-Based Framework for the Intelligent Adaptation of User Interfaces | Daniel Gaspar-Figueiredo et.al. | 2405.09255 | null |
| 2024-05-15 | DVS-RG: Differential Variable Speed Limits Control using Deep Reinforcement Learning with Graph State Representation | Jingwen Yang et.al. | 2405.09163 | null |
| 2024-05-15 | CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving | Dechen Gao et.al. | 2405.09111 | null |
| 2024-05-15 | Chaos-based reinforcement learning with TD3 | Toshitaka Matsuki et.al. | 2405.09086 | null |
| 2024-05-15 | Deep Learning in Earthquake Engineering: A Comprehensive Review | Yazhou Xie et.al. | 2405.09021 | null |
| 2024-05-14 | Large Language Models for Human-Machine Collaborative Particle Accelerator Tuning through Natural Language | Jan Kaiser et.al. | 2405.08888 | null |
| 2024-05-14 | Stable Inverse Reinforcement Learning: Policies from Control Lyapunov Landscapes | Samuel Tesfazgi et.al. | 2405.08756 | null |
| 2024-05-14 | Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach | Urvij Saroliya et.al. | 2405.08754 | null |
| 2024-05-14 | Reinformer: Max-Return Sequence Modeling for offline RL | Zifeng Zhuang et.al. | 2405.08740 | null |
| 2024-05-14 | I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning | Yashuai Yan et.al. | 2405.08726 | null |
| 2024-05-15 | Enhancing Reinforcement Learning in Sensor Fusion: A Comparative Analysis of Cubature and Sampling-based Integration Methods for Rover Search Planning | Jan-Hendrik Ewers et.al. | 2405.08691 | null |
| 2024-05-14 | A Distributed Approach to Autonomous Intersection Management via Multi-Agent Reinforcement Learning | Matteo Cederle et.al. | 2405.08655 | link |
| 2024-05-14 | vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement | Yiwen Zhu et.al. | 2405.08638 | null |
| 2024-05-14 | Optimizing Deep Reinforcement Learning for American Put Option Hedging | Reilly Pickard et.al. | 2405.08602 | null |
| 2024-05-14 | Python-Based Reinforcement Learning on Simulink Models | Georg Schäfer et.al. | 2405.08567 | null |
| 2024-05-14 | Growing Artificial Neural Networks for Control: the Role of Neuronal Diversity | Eleni Nisioti et.al. | 2405.08510 | null |
| 2024-05-13 | Hierarchical Decision Mamba | André Correia et.al. | 2405.07943 | link |
| 2024-05-13 | RLHF Workflow: From Reward Modeling to Online RLHF | Hanze Dong et.al. | 2405.07863 | link |
| 2024-05-13 | Adaptive Exploration for Data-Efficient General Value Function Evaluations | Arushi Jain et.al. | 2405.07838 | null |
| 2024-05-13 | Fixed Point Theory Analysis of a Lambda Policy Iteration with Randomization for the Ćirić Contraction Operator | Abdelkader Belhenniche et.al. | 2405.07824 | null |
| 2024-05-13 | Hamiltonian-based Quantum Reinforcement Learning for Neural Combinatorial Optimization | Georg Kruse et.al. | 2405.07790 | null |
| 2024-05-13 | Hype or Heuristic? Quantum Reinforcement Learning for Join Order Optimisation | Maja Franz et.al. | 2405.07770 | null |
| 2024-05-13 | CAGES: Cost-Aware Gradient Entropy Search for Efficient Local Multi-Fidelity Bayesian Optimization | Wei-Ting Tang et.al. | 2405.07760 | null |
| 2024-05-13 | MADRL-Based Rate Adaptation for 360$\degree$ Video Streaming with Multi-Viewpoint Prediction | Haopeng Wang et.al. | 2405.07759 | null |
| 2024-05-13 | Neural Network Compression for Reinforcement Learning Tasks | Dmitry A. Ivanov et.al. | 2405.07748 | null |
| 2024-05-13 | Backdoor Removal for Generative Large Language Models | Haoran Li et.al. | 2405.07667 | null |
| 2024-05-10 | Value Augmented Sampling for Language Model Alignment and Personalization | Seungwook Han et.al. | 2405.06639 | link |
| 2024-05-10 | EcoEdgeTwin: Enhanced 6G Network via Mobile Edge Computing and Digital Twin Integration | Synthia Hossain Karobi et.al. | 2405.06507 | null |
| 2024-05-10 | Advantageous and disadvantageous inequality aversion can be taught through vicarious learning of others’ preferences | Shen Zhang et.al. | 2405.06500 | null |
| 2024-05-10 | Contextual Affordances for Safe Exploration in Robotic Scenarios | William Z. Ye et.al. | 2405.06422 | null |
| 2024-05-10 | Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs | Davide Maran et.al. | 2405.06363 | null |
| 2024-05-10 | Learning Latent Dynamic Robust Representations for World Models | Ruixiang Sun et.al. | 2405.06263 | link |
| 2024-05-10 | Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning | Xiaoyu Wen et.al. | 2405.06192 | link |
| 2024-05-10 | (A Partial Survey of) Decentralized, Cooperative Multi-Agent Reinforcement Learning | Christopher Amato et.al. | 2405.06161 | null |
| 2024-05-09 | An RNN-policy gradient approach for quantum architecture search | Gang Wang et.al. | 2405.05892 | null |
| 2024-05-09 | Safe Exploration Using Bayesian World Models and Log-Barrier Optimization | Yarden As et.al. | 2405.05890 | null |
| 2024-05-09 | ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers | Liangliang Chen et.al. | 2405.05861 | null |
| 2024-05-09 | Policy Gradient with Active Importance Sampling | Matteo Papini et.al. | 2405.05630 | null |
| 2024-05-09 | An Automatic Prompt Generation System for Tabular Data Tasks | Ashlesha Akella et.al. | 2405.05618 | null |
| 2024-05-09 | Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning | Yuchen Shi et.al. | 2405.05542 | link |
| 2024-05-08 | Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data | Kishan Panaganti et.al. | 2405.05468 | null |
| 2024-05-08 | Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management | Gang Hu et.al. | 2405.05449 | null |
| 2024-05-08 | Learning to Play Pursuit-Evasion with Dynamic and Sensor Constraints | Burak M. Gonultas et.al. | 2405.05372 | null |
| 2024-05-08 | Offline Model-Based Optimization via Policy-Guided Gradient Search | Yassine Chemingui et.al. | 2405.05349 | link |
| 2024-05-08 | Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models | Aylin Gunal et.al. | 2405.05060 | null |
| 2024-05-08 | Fault Identification Enhancement with Reinforcement Learning (FIERL) | Valentina Zaccaria et.al. | 2405.04938 | link |
| 2024-05-07 | RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes | Kyle Stachowicz et.al. | 2405.04714 | null |
| 2024-05-07 | Proximal Policy Optimization with Adaptive Exploration | Andrei Lixandru et.al. | 2405.04664 | null |
| 2024-05-07 | ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | Albert Bou et.al. | 2405.04657 | link |
| 2024-05-07 | TorchDriveEnv: A Reinforcement Learning Benchmark for Autonomous Driving with Reactive, Realistic, and Diverse Non-Playable Characters | Jonathan Wilder Lavington et.al. | 2405.04491 | null |
| 2024-05-07 | Designing, Developing, and Validating Network Intelligence for Scaling in Service-Based Architectures based on Deep Reinforcement Learning | Paola Soto et.al. | 2405.04441 | null |
| 2024-05-08 | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | DeepSeek-AI et.al. | 2405.04434 | link |
| 2024-05-07 | The Curse of Diversity in Ensemble-Based Exploration | Zhixuan Lin et.al. | 2405.04342 | link |
| 2024-05-07 | Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation | Atharvan Dogra et.al. | 2405.04325 | null |
| 2024-05-07 | Genetic Drift Regularization: on preventing Actor Injection from breaking Evolution Strategies | Paul Templier et.al. | 2405.04322 | null |
| 2024-05-07 | Improving Offline Reinforcement Learning with Inaccurate Simulators | Yiwen Hou et.al. | 2405.04307 | null |
| 2024-05-07 | Deep Reinforcement Learning for Multi-User RF Charging with Non-linear Energy Harvesters | Amirhossein Azarbahram et.al. | 2405.04218 | null |
| 2024-05-07 | In-context Learning for Automated Driving Scenarios | Ziqi Zhou et.al. | 2405.04135 | null |
| 2024-05-07 | Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning | Chunlin Tian et.al. | 2405.04122 | null |
| 2024-05-06 | $ε$-Policy Gradient for Online Pricing | Lukasz Szpruch et.al. | 2405.03624 | null |
| 2024-05-06 | Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions | Xingyou Song et.al. | 2405.03547 | null |
| 2024-05-06 | ReinWiFi: A Reinforcement-Learning-Based Framework for the Application-Layer QoS Optimization of WiFi Networks | Qianren Li et.al. | 2405.03526 | null |
| 2024-05-06 | Robotic Constrained Imitation Learning for the Peg Transfer Task in Fundamentals of Laparoscopic Surgery | Kento Kawaharazuka et.al. | 2405.03440 | null |
| 2024-05-06 | Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning | Stone Tao et.al. | 2405.03379 | null |
| 2024-05-06 | Enhancing Q-Learning with Large Language Model Heuristics | Xiefeng Wu et.al. | 2405.03341 | null |
| 2024-05-06 | Artificial Intelligence in the Autonomous Navigation of Endovascular Interventions: A Systematic Review | Harry Robertshaw et.al. | 2405.03305 | null |
| 2024-05-06 | End-to-End Reinforcement Learning of Curative Curtailment with Partial Measurement Availability | Hinrikus Wolf et.al. | 2405.03262 | null |
| 2024-05-06 | Federated Reinforcement Learning with Constraint Heterogeneity | Hao Jin et.al. | 2405.03236 | null |
| 2024-05-06 | Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning | Caleb Chuck et.al. | 2405.03113 | null |
| 2024-05-03 | Geometric Fabrics: a Safe Guiding Medium for Policy Learning | Karl Van Wyk et.al. | 2405.02250 | null |
| 2024-05-03 | Learning Optimal Deterministic Policies with Stochastic Policy Gradients | Alessandro Montenegro et.al. | 2405.02235 | null |
| 2024-05-03 | The Cambridge RoboMaster: An Agile Multi-Robot Research Platform | Jan Blumenkamp et.al. | 2405.02198 | null |
| 2024-05-03 | Imitation Learning in Discounted Linear MDPs without exploration assumptions | Luca Viano et.al. | 2405.02181 | null |
| 2024-05-03 | Simulating the economic impact of rationality through reinforcement learning and agent-based modelling | Simone Brusatin et.al. | 2405.02161 | null |
| 2024-05-03 | Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach | Anton Plaksin et.al. | 2405.02044 | null |
| 2024-05-03 | Model-based reinforcement learning for protein backbone design | Frederic Renard et.al. | 2405.01983 | null |
| 2024-05-03 | Rescale-Invariant Federated Reinforcement Learning for Resource Allocation in V2X Networks | Kaidi Xu et.al. | 2405.01961 | null |
| 2024-05-03 | Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization | Changliang Zhou et.al. | 2405.01906 | null |
| 2024-05-03 | Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants | Francesco Maldonato et.al. | 2405.01889 | link |
| 2024-05-02 | Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks | Murtaza Dalal et.al. | 2405.01534 | null |
| 2024-05-02 | FLAME: Factuality-Aware Alignment for Large Language Models | Sheng-Chieh Lin et.al. | 2405.01525 | null |
| 2024-05-02 | NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | Gerald Shen et.al. | 2405.01481 | link |
| 2024-05-02 | IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning | Ryan Hoque et.al. | 2405.01472 | null |
| 2024-05-02 | Goal-conditioned reinforcement learning for ultrasound navigation guidance | Abdoul Aziz Amadou et.al. | 2405.01409 | null |
| 2024-05-02 | Learning Force Control for Legged Manipulation | Tifanny Portela et.al. | 2405.01402 | null |
| 2024-05-02 | Constrained Reinforcement Learning Under Model Mismatch | Zhongchang Sun et.al. | 2405.01327 | null |
| 2024-05-02 | Non-iterative Optimization of Trajectory and Radio Resource for Aerial Network | Hyeonsu Lyu et.al. | 2405.01314 | null |
| 2024-05-02 | Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning | Liu Qiyuan et.al. | 2405.01284 | null |
| 2024-05-02 | Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation | Hao Wang et.al. | 2405.01280 | null |
| 2024-05-01 | Self-Play Preference Optimization for Language Model Alignment | Yue Wu et.al. | 2405.00675 | null |
| 2024-05-01 | No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO | Skander Moalla et.al. | 2405.00662 | link |
| 2024-05-01 | HUGO – Highlighting Unseen Grid Options: Combining Deep Reinforcement Learning with a Heuristic Target Topology Approach | Malte Lehna et.al. | 2405.00629 | null |
| 2024-05-01 | Koopman-based Deep Learning for Nonlinear System Estimation | Zexin Sun et.al. | 2405.00627 | null |
| 2024-05-01 | Queue-based Eco-Driving at Roundabouts with Reinforcement Learning | Anna-Lena Schlamp et.al. | 2405.00625 | null |
| 2024-05-01 | The Real, the Better: Aligning Large Language Models with Online Human Behaviors | Guanying Jiang et.al. | 2405.00578 | null |
| 2024-05-01 | Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment | Zhili Liu et.al. | 2405.00557 | null |
| 2024-05-01 | Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | Lucas-Andreï Thil et.al. | 2405.00516 | null |
| 2024-05-01 | MetaRM: Shifted Distributions Alignment via Meta-Learning | Shihan Dou et.al. | 2405.00438 | null |
| 2024-05-01 | UCB-driven Utility Function Search for Multi-objective Reinforcement Learning | Yucheng Shi et.al. | 2405.00410 | link |
| 2024-04-30 | Collaborative Control Method of Transit Signal Priority Based on Cooperative Game and Reinforcement Learning | Hao Qin et.al. | 2404.19683 | null |
| 2024-04-30 | Towards Generalist Robot Learning from Internet Video: A Survey | Robert McCarthy et.al. | 2404.19664 | null |
| 2024-04-30 | Short term vs. long term: optimization of microswimmer navigation on different time horizons | Navid Mousavi et.al. | 2404.19561 | null |
| 2024-04-30 | Continual Model-based Reinforcement Learning for Data Efficient Wireless Network Optimisation | Cengis Hasan et.al. | 2404.19462 | null |
| 2024-04-30 | Imitation Learning: A Survey of Learning Methods, Environments and Metrics | Nathan Gavenski et.al. | 2404.19456 | null |
| 2024-04-30 | Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning | Mathieu Rita et.al. | 2404.19409 | link |
| 2024-04-30 | Numeric Reward Machines | Kristina Levina et.al. | 2404.19370 | null |
| 2024-04-30 | Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning | Chenjia Bai et.al. | 2404.19346 | link |
| 2024-04-30 | Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning | Qiaosheng Zhang et.al. | 2404.19292 | null |
| 2024-04-30 | DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets | Xiaoyu Huang et.al. | 2404.19264 | null |
| 2024-04-29 | DPO Meets PPO: Reinforced Token Optimization for RLHF | Han Zhong et.al. | 2404.18922 | null |
| 2024-04-29 | Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty | Laixi Shi et.al. | 2404.18909 | null |
| 2024-04-29 | Overcoming Knowledge Barriers: Online Imitation Learning from Observation with Pretrained World Models | Xingyuan Zhang et.al. | 2404.18896 | null |
| 2024-04-29 | More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness | Aaron J. Li et.al. | 2404.18870 | link |
| 2024-04-29 | Performance-Aligned LLMs for Generating Fast Code | Daniel Nichols et.al. | 2404.18864 | null |
| 2024-04-29 | PlanNetX: Learning an Efficient Neural Network Planner from MPC for Longitudinal Control | Jasper Hoffmann et.al. | 2404.18863 | null |
| 2024-04-30 | Winning the Social Media Influence Battle: Uncertainty-Aware Opinions to Understand and Spread True Information via Competitive Influence Maximization | Qi Zhang et.al. | 2404.18826 | null |
| 2024-04-29 | Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies | Seyed Soroush Karimi Madahi et.al. | 2404.18821 | null |
| 2024-04-29 | Multi-Agent Synchronization Tasks | Rolando Fernandez et.al. | 2404.18798 | null |
| 2024-04-29 | Resource-rational reinforcement learning and sensorimotor causal states | Sarah Marzen et.al. | 2404.18775 | null |
| 2024-04-26 | Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo | Stephen Zhao et.al. | 2404.17546 | null |
| 2024-04-26 | Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations | Puhao Li et.al. | 2404.17521 | link |
| 2024-04-26 | Quantum Multi-Agent Reinforcement Learning for Aerial Ad-hoc Networks | Theodora-Augustina Drăgan et.al. | 2404.17499 | null |
| 2024-04-26 | Q-Learning to navigate turbulence without a map | Marco Rando et.al. | 2404.17495 | null |
| 2024-04-26 | Adaptive speed planning for Unmanned Vehicle Based on Deep Reinforcement Learning | Hao Liu et.al. | 2404.17379 | null |
| 2024-04-26 | When to Trust LLMs: Aligning Confidence with Response Quality | Shuchang Tao et.al. | 2404.17287 | null |
| 2024-04-26 | Enhancing Privacy and Security of Autonomous UAV Navigation | Vatsal Aggarwal et.al. | 2404.17225 | null |
| 2024-04-26 | Beyond Imitation: A Life-long Policy Learning Framework for Path Tracking Control of Autonomous Driving | C. Gong et.al. | 2404.17198 | null |
| 2024-04-26 | An Explainable Deep Reinforcement Learning Model for Warfarin Maintenance Dosing Using Policy Distillation and Action Forging | Sadjad Anzabi Zadeh et.al. | 2404.17187 | null |
| 2024-04-25 | Compiler for Distributed Quantum Computing: a Reinforcement Learning Approach | Panagiotis Promponas et.al. | 2404.17077 | null |
| 2024-04-25 | REBEL: Reinforcement Learning via Regressing Relative Rewards | Zhaolin Gao et.al. | 2404.16767 | null |
| 2024-04-25 | Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods | Min Kyu Shin et.al. | 2404.16721 | null |
| 2024-04-25 | RUMOR: Reinforcement learning for Understanding a Model of the Real World for Navigation in Dynamic Environments | Diego Martinez-Baselga et.al. | 2404.16672 | null |
| 2024-04-25 | Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare | Emre Can Acikgoz et.al. | 2404.16621 | null |
| 2024-04-25 | Exploring the Dynamics of Data Transmission in 5G Networks: A Conceptual Analysis | Nikita Smirnov et.al. | 2404.16508 | null |
| 2024-04-25 | Leveraging Pretrained Latent Representations for Few-Shot Imitation Learning on a Dexterous Robotic Hand | Davide Liconti et.al. | 2404.16483 | null |
| 2024-04-25 | A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints | Bram De Cooman et.al. | 2404.16468 | null |
| 2024-04-25 | Offline Reinforcement Learning with Behavioral Supervisor Tuning | Padmanaba Srinivasan et.al. | 2404.16399 | null |
| 2024-04-25 | SwarmRL: Building the Future of Smart Active Systems | Samuel Tovey et.al. | 2404.16388 | link |
| 2024-04-25 | Reinforcement Learning with Generative Models for Compact Support Sets | Nico Schiavone et.al. | 2404.16300 | link |
| 2024-04-24 | DPO: Differential reinforcement learning with application to optimal configuration search | Chandrajit Bajaj et.al. | 2404.15617 | null |
| 2024-04-24 | GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL | Lang Qin et.al. | 2404.15597 | null |
| 2024-04-24 | Multi-Agent Reinforcement Learning for Energy Networks: Computational Challenges, Progress and Open Problems | Sarah Keren et.al. | 2404.15583 | null |
| 2024-04-23 | An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models | Yangchen Pan et.al. | 2404.15518 | null |
| 2024-04-23 | The Power of Resets in Online Reinforcement Learning | Zakaria Mhammedi et.al. | 2404.15417 | null |
| 2024-04-23 | Planning the path with Reinforcement Learning: Optimal Robot Motion Planning in RoboCup Small Size League Environments | Mateus G. Machado et.al. | 2404.15410 | link |
| 2024-04-23 | Reinforcement Learning with Adaptive Control Regularization for Safe Control of Critical Systems | Haozhe Tian et.al. | 2404.15199 | null |
| 2024-04-23 | Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation | Xun Wu et.al. | 2404.15100 | null |
| 2024-04-23 | Impedance Matching: Enabling an RL-Based Running Jump in a Quadruped Robot | Neil Guan et.al. | 2404.15096 | null |
| 2024-04-23 | Using deep reinforcement learning to promote sustainable human behaviour on a common pool resource problem | Raphael Koster et.al. | 2404.15059 | null |
| 2024-04-23 | Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems | Xiaoshuang Chen et.al. | 2404.14961 | null |
| 2024-04-23 | Multi-Objective Deep Reinforcement Learning for 5G Base Station Placement to Support Localisation for Future Sustainable Traffic | Ahmed Al-Tahmeesschi et.al. | 2404.14954 | null |
| 2024-04-23 | MultiSTOP: Solving Functional Equations with Reinforcement Learning | Alessandro Trenta et.al. | 2404.14909 | null |
| 2024-04-23 | Unitary Synthesis of Clifford+T Circuits with Reinforcement Learning | Sebastian Rietsch et.al. | 2404.14865 | null |
| 2024-04-23 | Evolutionary Reinforcement Learning via Cooperative Coevolution | Chengpeng Hu et.al. | 2404.14763 | null |
| 2024-04-23 | Rank2Reward: Learning Shaped Reward Functions from Passive Video | Daniel Yang et.al. | 2404.14735 | null |
| 2024-04-22 | Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data | Fahim Tajwar et.al. | 2404.14367 | link |
| 2024-04-22 | PLUTO: Pushing the Limit of Imitation Learning-based Planning for Autonomous Driving | Jie Cheng et.al. | 2404.14327 | null |
| 2024-04-22 | Multi-Agent Hybrid SAC for Joint SS-DSA in CRNs | David R. Nickel et.al. | 2404.14319 | null |
| 2024-04-22 | LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots | Dongge Han et.al. | 2404.14285 | null |
| 2024-04-22 | Beyond the Edge: An Advanced Exploration of Reinforcement Learning for Mobile Edge Computing, its Applications, and Future Research Trajectories | Ning Yang et.al. | 2404.14238 | null |
| 2024-04-22 | Multi-agent Reinforcement Learning-based Joint Precoding and Phase Shift Optimization for RIS-aided Cell-Free Massive MIMO Systems | Yiyang Zhu et.al. | 2404.14092 | null |
| 2024-04-22 | Mechanistic Interpretability for AI Safety – A Review | Leonard Bereska et.al. | 2404.14082 | null |
| 2024-04-22 | Research on Robot Path Planning Based on Reinforcement Learning | Wang Ruiqi et.al. | 2404.14077 | link |
| 2024-04-22 | Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras | Mhairi Dunion et.al. | 2404.14064 | link |
| 2024-04-22 | A survey of air combat behavior modeling using machine learning | Patrick Ribu Gorton et.al. | 2404.13954 | null |
| 2024-04-19 | Mapping Social Choice Theory to RLHF | Jessica Dai et.al. | 2404.13038 | null |
| 2024-04-19 | Deep Reinforcement Learning-Based Active Flow Control of an Elliptical Cylinder: Transitioning from an Elliptical Cylinder to a Circular Cylinder and a Flat Plate | Wang Jia et.al. | 2404.13003 | null |
| 2024-04-19 | Goal Exploration via Adaptive Skill Distribution for Goal-Conditioned Reinforcement Learning | Lisheng Wu et.al. | 2404.12999 | null |
| 2024-04-19 | MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering | Avinash Anand et.al. | 2404.12926 | null |
| 2024-04-19 | Zero-Shot Stitching in Reinforcement Learning using Relative Representations | Antonio Pio Ricciardi et.al. | 2404.12917 | null |
| 2024-04-19 | MAexp: A Generic Platform for RL-based Multi-Agent Exploration | Shaohao Zhu et.al. | 2404.12824 | link |
| 2024-04-19 | Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation | Qiang He et.al. | 2404.12754 | link |
| 2024-04-19 | Demonstration of quantum projective simulation on a single-photon-based quantum computer | Giacomo Franceschetto et.al. | 2404.12729 | null |
| 2024-04-19 | Energy Conserved Failure Detection for NS-IoT Systems | Guojin Liu et.al. | 2404.12713 | null |
| 2024-04-19 | Single-Task Continual Offline Reinforcement Learning | Sibo Gai et.al. | 2404.12639 | null |
| 2024-04-18 | From $r$ to $Q^*$ : Your Language Model is Secretly a Q-Function | Rafael Rafailov et.al. | 2404.12358 | null |
| 2024-04-18 | Improving the interpretability of GNN predictions through conformal-based graph sparsification | Pablo Sanchez-Martin et.al. | 2404.12356 | link |
| 2024-04-18 | Practical Considerations for Discrete-Time Implementations of Continuous-Time Control Barrier Function-Based Safety Filters | Lukas Brunke et.al. | 2404.12329 | null |
| 2024-04-18 | ASID: Active Exploration for System Identification in Robotic Manipulation | Marius Memmel et.al. | 2404.12308 | null |
| 2024-04-18 | RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective | Chenxi Wang et.al. | 2404.12281 | null |
| 2024-04-18 | Privacy-Preserving UCB Decision Process Verification via zk-SNARKs | Xikun Jiang et.al. | 2404.12186 | null |
| 2024-04-18 | Aligning language models with human preferences | Tomasz Korbak et.al. | 2404.12150 | link |
| 2024-04-19 | Robust and Adaptive Deep Reinforcement Learning for Enhancing Flow Control around a Square Cylinder with Varying Reynolds Numbers | Wang Jia et.al. | 2404.12123 | null |
| 2024-04-18 | X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner | Haoyuan Jiang et.al. | 2404.12090 | link |
| 2024-04-18 | Trajectory Planning for Autonomous Vehicle Using Iterative Reward Prediction in Reinforcement Learning | Hyunwoo Park et.al. | 2404.12079 | null |
| 2024-04-17 | Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding | Zezhong Fan et.al. | 2404.11589 | null |
| 2024-04-17 | Deep Policy Optimization with Temporal Logic Constraints | Ameesh Shah et.al. | 2404.11578 | null |
| 2024-04-17 | Spatio-Temporal Motion Retargeting for Quadruped Robots | Taerim Yoon et.al. | 2404.11557 | null |
| 2024-04-17 | VC Theory for Inventory Policies | Yaqi Xie et.al. | 2404.11509 | null |
| 2024-04-17 | Learn to Tour: Operator Design For Solution Feasibility Mapping in Pickup-and-delivery Traveling Salesman Problem | Bowen Fang et.al. | 2404.11458 | null |
| 2024-04-17 | What-if Analysis Framework for Digital Twins in 6G Wireless Network Management | Elif Ak et.al. | 2404.11394 | null |
| 2024-04-17 | Convergence of Policy Gradient for Stochastic Linear-Quadratic Control Problem in Infinite Horizon | Xinpei Zhang et.al. | 2404.11382 | null |
| 2024-04-17 | Following the Human Thread in Social Navigation | Luca Scofano et.al. | 2404.11327 | link |
| 2024-04-17 | On Learning Parities with Dependent Noise | Noah Golowich et.al. | 2404.11325 | null |
| 2024-04-17 | Physics-informed Actor-Critic for Coordination of Virtual Inertia from Power Distribution Systems | Simon Stock et.al. | 2404.11149 | null |
| 2024-04-16 | Settling Constant Regrets in Linear Markov Decision Processes | Weitong Zhang et.al. | 2404.10745 | null |
| 2024-04-16 | N-Agent Ad Hoc Teamwork | Caroline Wang et.al. | 2404.10740 | null |
| 2024-04-16 | Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration | Benjamin A Newman et.al. | 2404.10733 | null |
| 2024-04-16 | Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning | Hao-Lun Hsu et.al. | 2404.10728 | null |
| 2024-04-16 | Automatic re-calibration of quantum devices by reinforcement learning | T. Crosta et.al. | 2404.10726 | null |
| 2024-04-16 | Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study | Shusheng Xu et.al. | 2404.10719 | null |
| 2024-04-16 | Simplex Decomposition for Portfolio Allocation Constraints in Reinforcement Learning | David Winkel et.al. | 2404.10683 | null |
| 2024-04-16 | SCALE: Self-Correcting Visual Navigation for Mobile Robots via Anti-Novelty Estimation | Chang Chen et.al. | 2404.10675 | null |
| 2024-04-16 | Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay | Jinmei Liu et.al. | 2404.10662 | link |
| 2024-04-16 | Trajectory Planning using Reinforcement Learning for Interactive Overtaking Maneuvers in Autonomous Racing Scenarios | Levent Ögretmen et.al. | 2404.10658 | null |
| 2024-04-15 | Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model | Hyunsoo Cho et.al. | 2404.09717 | null |
| 2024-04-15 | Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning | Linjie Xu et.al. | 2404.09715 | null |
| 2024-04-15 | Learn Your Reference Model for Real Good Alignment | Alexey Gorbatovski et.al. | 2404.09656 | null |
| 2024-04-15 | Reliability Estimation of News Media Sources: Birds of a Feather Flock Together | Sergio Burdisso et.al. | 2404.09565 | null |
| 2024-04-15 | Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning | Tidiane Camaret Ndir et.al. | 2404.09521 | link |
| 2024-04-14 | Correlated Mean Field Imitation Learning | Zhiyu Zhao et.al. | 2404.09324 | null |
| 2024-04-14 | Egret: Reinforcement Mechanism for Sequential Computation Offloading in Edge Computing | Haosong Peng et.al. | 2404.09285 | null |
| 2024-04-14 | A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs | Elliot Kolker-Hicks et.al. | 2404.09264 | null |
| 2024-04-14 | Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts | Jing-Cheng Pang et.al. | 2404.09248 | null |
| 2024-04-14 | Advanced Intelligent Optimization Algorithms for Multi-Objective Optimal Power Flow in Future Power Systems: A Review | Yuyan Li et.al. | 2404.09203 | null |
| 2024-04-12 | Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation | Hanlin Tian et.al. | 2404.08570 | null |
| 2024-04-12 | RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs | Shreyas Chaudhari et.al. | 2404.08555 | null |
| 2024-04-12 | Advancing Forest Fire Prevention: Deep Reinforcement Learning for Effective Firebreak Placement | Lucas Murray et.al. | 2404.08523 | null |
| 2024-04-12 | Adversarial Imitation Learning via Boosting | Jonathan D. Chang et.al. | 2404.08513 | null |
| 2024-04-12 | Prescribing Optimal Health-Aware Operation for Urban Air Mobility with Deep Reinforcement Learning | Mina Montazeri et.al. | 2404.08497 | null |
| 2024-04-12 | Dataset Reset Policy Optimization for RLHF | Jonathan D. Chang et.al. | 2404.08495 | link |
| 2024-04-12 | Anti-Byzantine Attacks Enabled Vehicle Selection for Asynchronous Federated Learning in Vehicular Edge Computing | Cui Zhang et.al. | 2404.08444 | null |
| 2024-04-12 | SIR-RL: Reinforcement Learning for Optimized Policy Control during Epidemiological Outbreaks in Emerging Market and Developing Economies | Maeghal Jain et.al. | 2404.08423 | null |
| 2024-04-12 | TDANet: Target-Directed Attention Network For Object-Goal Visual Navigation With Zero-Shot Ability | Shiwei Lian et.al. | 2404.08353 | null |
| 2024-04-12 | Agile and versatile bipedal robot tracking control through reinforcement learning | Jiayi Li et.al. | 2404.08246 | null |
| 2024-04-11 | High-Dimension Human Value Representation in Large Language Models | Samuel Cahyawijaya et.al. | 2404.07900 | null |
| 2024-04-11 | Data-Driven System Identification of Quadrotors Subject to Motor Delays | Jonas Eschmann et.al. | 2404.07837 | null |
| 2024-04-11 | On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning | Giuseppe Canonaco et.al. | 2404.07826 | null |
| 2024-04-11 | An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization | Minshuo Chen et.al. | 2404.07771 | null |
| 2024-04-11 | Differentially Private Reinforcement Learning with Self-Play | Dan Qiao et.al. | 2404.07559 | null |
| 2024-04-11 | Enhancing Policy Gradient with the Polyak Step-Size Adaption | Yunxiang Li et.al. | 2404.07525 | null |
| 2024-04-11 | Generative Probabilistic Planning for Optimizing Supply Chain Networks | Hyung-il Ahn et.al. | 2404.07511 | null |
| 2024-04-11 | Neural Fault Injection: Generating Software Faults from Natural Language | Domenico Cotroneo et.al. | 2404.07491 | null |
| 2024-04-11 | Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains | Soichiro Nishimori et.al. | 2404.07465 | null |
| 2024-04-11 | UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning | Saichao Liu et.al. | 2404.07453 | null |
| 2024-04-10 | Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery | Zohre Karimi et.al. | 2404.07185 | null |
| 2024-04-10 | Adaptive behavior with stable synapses | Cristiano Capone et.al. | 2404.07150 | null |
| 2024-04-10 | How Consistent are Clinicians? Evaluating the Predictability of Sepsis Disease Progression with Dynamics Models | Unnseo Park et.al. | 2404.07148 | null |
| 2024-04-10 | Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection | Linas Nasvytis et.al. | 2404.07099 | link |
| 2024-04-10 | Improving Language Model Reasoning with Self-motivated Learning | Yunlong Feng et.al. | 2404.07017 | null |
| 2024-04-10 | Agent-driven Generative Semantic Communication for Remote Surveillance | Wanting Yang et.al. | 2404.06997 | null |
| 2024-04-10 | Deep Reinforcement Learning for Mobile Robot Path Planning | Hao Liu et.al. | 2404.06974 | null |
| 2024-04-10 | UAV-Assisted Enhanced Coverage and Capacity in Dynamic MU-mMIMO IoT Systems: A Deep Reinforcement Learning Approach | MohammadMahdi Ghadaksaz et.al. | 2404.06726 | null |
| 2024-04-10 | Dual Ensemble Kalman Filter for Stochastic Optimal Control | Anant A. Joshi et.al. | 2404.06696 | null |
| 2024-04-09 | Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective | Victor-Alexandru Darvariu et.al. | 2404.06492 | null |
| 2024-04-09 | Deep Reinforcement Learning-Based Approach for a Single Vehicle Persistent Surveillance Problem with Fuel Constraints | Hritik Bana et.al. | 2404.06423 | null |
| 2024-04-09 | The Power in Communication: Power Regularization of Communication for Autonomy in Cooperative Multi-Agent Reinforcement Learning | Nancirose Piazza et.al. | 2404.06387 | null |
| 2024-04-09 | Policy-Guided Diffusion | Matthew Thomas Jackson et.al. | 2404.06356 | link |
| 2024-04-09 | Generative Pre-Trained Transformer for Symbolic Regression Base In-Context Reinforcement Learning | Yanjie Li et.al. | 2404.06330 | null |
| 2024-04-09 | Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning | Xudong Yu et.al. | 2404.06188 | null |
| 2024-04-09 | A quantum information theoretic analysis of reinforcement learning-assisted quantum architecture search | Abhishek Sadhu et.al. | 2404.06174 | null |
| 2024-04-09 | Adaptable Recovery Behaviors in Robotics: A Behavior Trees and Motion Generators(BTMG) Approach for Failure Management | Faseeh Ahmad et.al. | 2404.06129 | null |
| 2024-04-09 | Automatic Configuration Tuning on Cloud Database: A Survey | Limeng Zhang et.al. | 2404.06043 | null |
| 2024-04-09 | Commute with Community: Enhancing Shared Travel through Social Networks | Tian Siyuan et.al. | 2404.05987 | null |
| 2024-04-08 | Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer | Xinyang Gu et.al. | 2404.05695 | null |
| 2024-04-08 | YaART: Yet Another ART Rendering Technology | Sergey Kastryulin et.al. | 2404.05666 | null |
| 2024-04-08 | Dynamic Backtracking in GFlowNet: Enhancing Decision Steps with Reward-Dependent Adjustment Mechanisms | Shuai Guo et.al. | 2404.05576 | null |
| 2024-04-08 | Optimal Flow Admission Control in Edge Computing via Safe Reinforcement Learning | A. Fox et.al. | 2404.05564 | null |
| 2024-04-08 | Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data | Tim Baumgärtner et.al. | 2404.05530 | null |
| 2024-04-08 | CNN-based Game State Detection for a Foosball Table | David Hagens et.al. | 2404.05357 | null |
| 2024-04-08 | Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models | Yutao Ouyang et.al. | 2404.05291 | null |
| 2024-04-08 | SAFE-GIL: SAFEty Guided Imitation Learning | Yusuf Umut Ciftci et.al. | 2404.05249 | null |
| 2024-04-08 | MeSA-DRL: Memory-Enhanced Deep Reinforcement Learning for Advanced Socially Aware Robot Navigation in Crowded Environments | Mannan Saeed Muhammad et.al. | 2404.05203 | null |
| 2024-04-08 | Decision Transformer for Wireless Communications: A New Paradigm of Resource Management | Jie Zhang et.al. | 2404.05199 | null |
| 2024-04-05 | Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution | Tim Seyde et.al. | 2404.04253 | null |
| 2024-04-05 | Continual Policy Distillation of Reinforcement Learning-based Controllers for Soft Robotic In-Hand Manipulation | Lanpei Li et.al. | 2404.04219 | null |
| 2024-04-05 | Enhancing IoT Intelligence: A Transformer-based Reinforcement Learning Methodology | Gaith Rjoub et.al. | 2404.04205 | null |
| 2024-04-05 | Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report | Jerrod Wigmore et.al. | 2404.04106 | null |
| 2024-04-05 | Dynamic Prompt Optimizing for Text-to-Image Generation | Wenyi Mo et.al. | 2404.04095 | link |
| 2024-04-05 | Demonstration Guided Multi-Objective Reinforcement Learning | Junlin Lu et.al. | 2404.03997 | null |
| 2024-04-05 | A proximal policy optimization based intelligent home solar management | Kode Creer et.al. | 2404.03888 | null |
| 2024-04-05 | Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration | Xudong Guo et.al. | 2404.03869 | null |
| 2024-04-04 | Exploration is Harder than Prediction: Cryptographically Separating Reinforcement Learning from Supervised Learning | Noah Golowich et.al. | 2404.03774 | null |
| 2024-04-04 | A Reinforcement Learning based Reset Policy for CDCL SAT Solvers | Chunxiao Li et.al. | 2404.03753 | null |
| 2024-04-04 | AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent | Hanyu Lai et.al. | 2404.03648 | link |
| 2024-04-04 | Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention | Ziru Liu et.al. | 2404.03637 | link |
| 2024-04-04 | Laser Learning Environment: A new environment for coordination-critical multi-agent tasks | Yannick Molinghen et.al. | 2404.03596 | link |
| 2024-04-04 | Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm | Miao Lu et.al. | 2404.03578 | null |
| 2024-04-04 | Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity | Jake Varley et.al. | 2404.03570 | null |
| 2024-04-04 | AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale | Adam Pardyl et.al. | 2404.03482 | link |
| 2024-04-04 | Integrating Hyperparameter Search into GramML | Hernán Ceferino Vázquez et.al. | 2404.03419 | link |
| 2024-04-04 | Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought | Jooyoung Lee et.al. | 2404.03414 | null |
| 2024-04-04 | SENSOR: Imitate Third-Person Expert’s Behaviors via Active Sensoring | Kaichen Huang et.al. | 2404.03386 | null |
| 2024-04-04 | DIDA: Denoised Imitation Learning based on Domain Adaptation | Kaichen Huang et.al. | 2404.03382 | null |
| 2024-04-03 | Learning Quadrupedal Locomotion via Differentiable Simulation | Clemens Schwarke et.al. | 2404.02887 | null |
| 2024-04-03 | Unsupervised Learning of Effective Actions in Robotics | Marko Zaric et.al. | 2404.02728 | link |
| 2024-04-03 | Reinforcement Learning in Categorical Cybernetics | Jules Hedges et.al. | 2404.02688 | null |
| 2024-04-03 | Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering | Abhijeet Pendyala et.al. | 2404.02577 | null |
| 2024-04-03 | SliceIt! – A Dual Simulator Framework for Learning Robot Food Slicing | Cristian C. Beltran-Hernandez et.al. | 2404.02569 | link |
| 2024-04-03 | Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning | Yi Shen et.al. | 2404.02545 | link |
| 2024-04-03 | Versatile Scene-Consistent Traffic Scenario Generation as Optimization with Diffusion | Zhiyu Huang et.al. | 2404.02524 | null |
| 2024-04-03 | Joint Optimization on Uplink OFDMA and MU-MIMO for IEEE 802.11ax: Deep Hierarchical Reinforcement Learning Approach | Hyeonho Noh et.al. | 2404.02486 | null |
| 2024-04-03 | Deep Reinforcement Learning for Traveling Purchaser Problems | Haofeng Yuan et.al. | 2404.02476 | null |
| 2024-04-03 | Electric Vehicle Routing Problem for Emergency Power Supply: Towards Telecom Base Station Relief | Daisuke Kikuta et.al. | 2404.02448 | link |
| 2024-04-02 | Tuning for the Unknown: Revisiting Evaluation Strategies for Lifelong RL | Golnaz Mesbahi et.al. | 2404.02113 | null |
| 2024-04-02 | Emergence of Chemotactic Strategies with Multi-Agent Reinforcement Learning | Samuel Tovey et.al. | 2404.01999 | null |
| 2024-04-02 | VLRM: Vision-Language Models act as Reward Models for Image Captioning | Maksim Dzabraev et.al. | 2404.01911 | null |
| 2024-04-02 | Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation | Carlos Plou et.al. | 2404.01867 | null |
| 2024-04-02 | Keeping Behavioral Programs Alive: Specifying and Executing Liveness Requirements | Tom Yaacov et.al. | 2404.01858 | null |
| 2024-04-02 | EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking | Stavros Orfanoudakis et.al. | 2404.01849 | null |
| 2024-04-02 | Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy | Kyungbok Lee et.al. | 2404.01830 | null |
| 2024-04-02 | Imitation Game: A Model-based and Imitation Learning Deep Reinforcement Learning Hybrid | Eric MSP Veith et.al. | 2404.01794 | null |
| 2024-04-02 | Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems | Dapeng Zhi et.al. | 2404.01769 | null |
| 2024-04-02 | Asymptotics of Language Model Alignment | Joy Qiping Yang et.al. | 2404.01730 | null |
| 2024-03-29 | Learning Visual Quadrupedal Loco-Manipulation from Demonstrations | Zhengmao He et.al. | 2403.20328 | null |
| 2024-03-29 | Active flow control of a turbulent separation bubble through deep reinforcement learning | Bernat Font et.al. | 2403.20295 | null |
| 2024-03-29 | Functional Bilevel Optimization for Machine Learning | Ieva Petrulionyte et.al. | 2403.20233 | null |
| 2024-03-29 | Decentralized Multimedia Data Sharing in IoV: A Learning-based Equilibrium of Supply and Demand | Jiani Fan et.al. | 2403.20218 | null |
| 2024-03-29 | Biologically-Plausible Topology Improved Spiking Actor Network for Efficient Deep Reinforcement Learning | Duzhen Zhang et.al. | 2403.20163 | null |
| 2024-03-29 | CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening | Hei Yi Mak et.al. | 2403.20156 | null |
| 2024-03-29 | A Learning-based Incentive Mechanism for Mobile AIGC Service in Decentralized Internet of Vehicles | Jiani Fan et.al. | 2403.20151 | null |
| 2024-03-29 | Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation | Jinyeong Park et.al. | 2403.20109 | link |
| 2024-03-29 | Reinforcement learning for graph theory, II. Small Ramsey numbers | Mohammad Ghebleh et.al. | 2403.20055 | null |
| 2024-03-29 | Nonparametric Bellman Mappings for Reinforcement Learning: Application to Robust Adaptive Filtering | Yuki Akiyama et.al. | 2403.20020 | null |
| 2024-03-28 | Human-compatible driving partners through data-regularized self-play reinforcement learning | Daphne Cornelisse et.al. | 2403.19648 | link |
| 2024-03-28 | Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics | Norman Di Palo et.al. | 2403.19578 | null |
| 2024-03-28 | Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment | Alireza Ganjdanesh et.al. | 2403.19490 | null |
| 2024-03-28 | Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization | Teodor V. Marinov et.al. | 2403.19462 | null |
| 2024-03-28 | RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation | Chongkai Gao et.al. | 2403.19460 | null |
| 2024-03-28 | EDA-Driven Preprocessing for SAT Solving | Zhengyuan Shi et.al. | 2403.19446 | null |
| 2024-03-28 | Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model | Qi Gou et.al. | 2403.19443 | null |
| 2024-03-28 | Fine-Tuning Language Models with Reward Learning on Policy | Hao Lang et.al. | 2403.19279 | link |
| 2024-03-28 | Removing the need for ground truth UWB data collection: self-supervised ranging error correction using deep reinforcement learning | Dieter Coppens et.al. | 2403.19262 | null |
| 2024-03-28 | Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement Learning | Wei Duan et.al. | 2403.19253 | null |
| 2024-03-27 | Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment | Li Siyao et.al. | 2403.18811 | null |
| 2024-03-27 | CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning | Elliot Chane-Sane et.al. | 2403.18765 | null |
| 2024-03-27 | Probabilistic Model Checking of Stochastic Reinforcement Learning Policies | Dennis Gross et.al. | 2403.18725 | null |
| 2024-03-27 | Fpga-Based Neural Thrust Controller for UAVs | Sharif Azem et.al. | 2403.18703 | null |
| 2024-03-27 | Safe and Robust Reinforcement-Learning: Principles and Practice | Taku Yamagata et.al. | 2403.18539 | null |
| 2024-03-27 | Bridging the Gap: Regularized Reinforcement Learning for Improved Classical Motion Planning with Safety Modules | Elias Goldsztejn et.al. | 2403.18524 | null |
| 2024-03-27 | VersaT2I: Improving Text-to-Image Models with Versatile Reward | Jianshu Guo et.al. | 2403.18493 | null |
| 2024-03-27 | Scaling Vision-and-Language Navigation With Offline RL | Valay Bundele et.al. | 2403.18454 | null |
| 2024-03-27 | FRESCO: Federated Reinforcement Energy System for Cooperative Optimization | Nicolas Mauricio Cuadrado et.al. | 2403.18444 | null |
| 2024-03-27 | Reinforcement learning for graph theory, I. Reimplementation of Wagner’s approach | Salem Al-Yakoob et.al. | 2403.18429 | null |
| 2024-03-26 | TractOracle: towards an anatomically-informed reward function for RL-based tractography | Antoine Théberge et.al. | 2403.17845 | null |
| 2024-03-26 | Learning the Optimal Power Flow: Environment Design Matters | Thomas Wolgast et.al. | 2403.17831 | link |
| 2024-03-26 | Depending on yourself when you should: Mentoring LLM with RL agents to become the master in cybersecurity games | Yikuan Yan et.al. | 2403.17674 | null |
| 2024-03-26 | Learning Goal-Directed Object Pushing in Cluttered Scenes with Location-Based Attention | Nils Dengler et.al. | 2403.17667 | null |
| 2024-03-26 | Uncertainty-aware Distributional Offline Reinforcement Learning | Xiaocong Chen et.al. | 2403.17646 | null |
| 2024-03-26 | PeersimGym: An Environment for Solving the Task Offloading Problem with Reinforcement Learning | Frederico Metelo et.al. | 2403.17637 | null |
| 2024-03-26 | Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems | Siyu Wang et.al. | 2403.17634 | null |
| 2024-03-26 | LASIL: Learner-Aware Supervised Imitation Learning For Long-term Microscopic Traffic Simulation | Ke Guo et.al. | 2403.17601 | link |
| 2024-03-26 | Towards a Zero-Data, Controllable, Adaptive Dialog System | Dirk Väth et.al. | 2403.17582 | null |
| 2024-03-26 | VDSC: Enhancing Exploration Timing with Value Discrepancy and State Counts | Marius Captari et.al. | 2403.17542 | null |
| 2024-03-25 | An LLM-Based Digital Twin for Optimizing Human-in-the Loop Systems | Hanqing Yang et.al. | 2403.16809 | null |
| 2024-03-25 | Enhancing Software Effort Estimation through Reinforcement Learning-based Project Management-Oriented Feature Selection | Haoyang Chen et.al. | 2403.16749 | null |
| 2024-03-25 | Deep Reinforcement Learning and Mean-Variance Strategies for Responsible Portfolio Optimization | Fernando Acero et.al. | 2403.16667 | null |
| 2024-03-25 | Skill Q-Network: Learning Adaptive Skill Ensemble for Mapless Navigation in Unknown Environments | Hyunki Seong et.al. | 2403.16664 | null |
| 2024-03-25 | Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL | Osama Ahmad et.al. | 2403.16652 | null |
| 2024-03-25 | CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment | Feiteng Fang et.al. | 2403.16649 | link |
| 2024-03-25 | Counter-example guided Imitation Learning of Feedback Controllers from Temporal Logic Specifications | Thao Dang et.al. | 2403.16593 | null |
| 2024-03-25 | Arm-Constrained Curriculum Learning for Loco-Manipulation of the Wheel-Legged Robot | Zifan Wang et.al. | 2403.16535 | link |
| 2024-03-25 | Towards Cooperative Maneuver Planning in Mixed Traffic at Urban Intersections | Marvin Klimke et.al. | 2403.16478 | null |
| 2024-03-25 | If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions | Reza Esfandiarpoor et.al. | 2403.16442 | link |
| 2024-03-25 | Physics-informed RL for Maximal Safety Probability Estimation | Hikaru Hoshino et.al. | 2403.16391 | null |
| 2024-03-25 | Learning Action-based Representations Using Invariance | Max Rudolph et.al. | 2403.16369 | null |
| 2024-03-22 | Can large language models explore in-context? | Akshay Krishnamurthy et.al. | 2403.15371 | null |
| 2024-03-22 | Planning with a Learned Policy Basis to Optimally Solve Complex Tasks | Guillermo Infante et.al. | 2403.15301 | null |
| 2024-03-22 | Blockchain-based Pseudonym Management for Vehicle Twin Migrations in Vehicular Edge Metaverse | Jiawen Kang et.al. | 2403.15285 | null |
| 2024-03-22 | Parametric PDE Control with Deep Reinforcement Learning and Differentiable L0-Sparse Polynomial Policies | Nicolò Botteghi et.al. | 2403.15267 | null |
| 2024-03-22 | Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement | Jonathan Pirnay et.al. | 2403.15180 | null |
| 2024-03-22 | Subequivariant Reinforcement Learning Framework for Coordinated Motion Control | Haoyu Wang et.al. | 2403.15100 | null |
| 2024-03-22 | Improved Long Short-Term Memory-based Wastewater Treatment Simulators for Deep Reinforcement Learning | Esmaeel Mohammadi et.al. | 2403.15091 | null |
| 2024-03-22 | Automated Feature Selection for Inverse Reinforcement Learning | Daulet Baimukashev et.al. | 2403.15079 | null |
| 2024-03-22 | Testing for Fault Diversity in Reinforcement Learning | Quentin Mazouni et.al. | 2403.15065 | null |
| 2024-03-22 | Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation | Zhenrui Yue et.al. | 2403.14952 | null |
| 2024-03-21 | Rethinking Adversarial Inverse Reinforcement Learning: From the Angles of Policy Imitation and Transferable Reward Recovery | Yangchun Zhang et.al. | 2403.14593 | null |
| 2024-03-21 | A Mathematical Introduction to Deep Reinforcement Learning for 5G/6G Applications | Farhad Rezazadeh et.al. | 2403.14516 | null |
| 2024-03-21 | Constrained Reinforcement Learning with Smoothed Log Barrier Function | Baohe Zhang et.al. | 2403.14508 | null |
| 2024-03-21 | On the continuity and smoothness of the value function in reinforcement learning and optimal control | Hans Harder et.al. | 2403.14432 | null |
| 2024-03-21 | Emergent communication and learning pressures in language models: a language evolution perspective | Lukas Galke et.al. | 2403.14427 | null |
| 2024-03-21 | Task-optimal data-driven surrogate models for eNMPC via differentiable simulation and optimization | Daniel Mayfrank et.al. | 2403.14425 | null |
| 2024-03-21 | A reinforcement learning guided hybrid evolutionary algorithm for the latency location routing problem | Yuji Zou et.al. | 2403.14405 | link |
| 2024-03-21 | Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression | Fernando Acero et.al. | 2403.14328 | null |
| 2024-03-21 | Bayesian Optimization for Sample-Efficient Policy Improvement in Robotic Manipulation | Adrian Röfer et.al. | 2403.14305 | null |
| 2024-03-21 | Reactor Optimization Benchmark by Reinforcement Learning | Deborah Schwarcz et.al. | 2403.14273 | link |
| 2024-03-20 | Information-Theoretic Distillation for Reference-less Summarization | Jaehun Jung et.al. | 2403.13780 | null |
| 2024-03-20 | Towards Principled Representation Learning from Videos for Reinforcement Learning | Dipendra Misra et.al. | 2403.13765 | null |
| 2024-03-20 | Reinforcement Learning for Online Testing of Autonomous Driving Systems: a Replication and Extension Study | Luca Giamattei et.al. | 2403.13729 | null |
| 2024-03-20 | Reward-Driven Automated Curriculum Learning for Interaction-Aware Self-Driving at Unsignalized Intersections | Zengqi Peng et.al. | 2403.13674 | null |
| 2024-03-20 | Multi-agent Reinforcement Traffic Signal Control based on Interpretable Influence Mechanism and Biased ReLU Approximation | Zhiyue Luo et.al. | 2403.13639 | null |
| 2024-03-20 | Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation | Do June Min et.al. | 2403.13578 | link |
| 2024-03-20 | GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot | Wenxuan Song et.al. | 2403.13358 | null |
| 2024-03-20 | Waypoint-Based Reinforcement Learning for Robot Manipulation Tasks | Shaunak A. Mehta et.al. | 2403.13281 | null |
| 2024-03-20 | Federated reinforcement learning for robot motion planning with zero-shot generalization | Zhenyuan Yuan et.al. | 2403.13245 | null |
| 2024-03-20 | Graph Attention Network-based Block Propagation with Optimal AoI and Reputation in Web 3.0 | Jiana Liao et.al. | 2403.13237 | null |
| 2024-03-19 | Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes | He Wang et.al. | 2403.12946 | null |
| 2024-03-19 | Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers | Vidhi Jain et.al. | 2403.12943 | null |
| 2024-03-19 | Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types | Rui Liu et.al. | 2403.12891 | null |
| 2024-03-19 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning | Fucai Ke et.al. | 2403.12884 | null |
| 2024-03-19 | Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning | Mirco Theile et.al. | 2403.12856 | null |
| 2024-03-19 | Policy Bifurcation in Safe Reinforcement Learning | Wenjun Zou et.al. | 2403.12847 | link |
| 2024-03-19 | AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents | Jieming Cui et.al. | 2403.12835 | null |
| 2024-03-19 | Oriented and Non-oriented Cubical Surfaces in The Penteract | Manuel Estevez et.al. | 2403.12825 | null |
| 2024-03-19 | Dynamic Manipulation of Deformable Objects using Imitation Learning with Adaptation to Hardware Constraints | Eric Hannus et.al. | 2403.12685 | null |
| 2024-03-19 | Automated Contrastive Learning Strategy Search for Time Series | Baoyu Jing et.al. | 2403.12641 | null |
| 2024-03-18 | The Value of Reward Lookahead in Reinforcement Learning | Nadav Merlis et.al. | 2403.11637 | null |
| 2024-03-18 | Offline Multitask Representation Learning for Reinforcement Learning | Haque Ishfaq et.al. | 2403.11574 | null |
| 2024-03-18 | Reinforcement Learning with Token-level Feedback for Controllable Text Generation | Wendi Li et.al. | 2403.11558 | null |
| 2024-03-18 | TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling | Weiran Chen et.al. | 2403.11550 | null |
| 2024-03-18 | State-Separated SARSA: A Practical Sequential Decision-Making Algorithm with Recovering Rewards | Yuto Tanimoto et.al. | 2403.11520 | link |
| 2024-03-18 | Demystifying Deep Reinforcement Learning-Based Autonomous Vehicle Decision-Making | Hanxi Wan et.al. | 2403.11432 | null |
| 2024-03-18 | Variational Sampling of Temporal Trajectories | Jurijs Nazarovs et.al. | 2403.11418 | null |
| 2024-03-17 | Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective | Muhammad Aneeq uz Zaman et.al. | 2403.11345 | null |
| 2024-03-17 | Causality from Bottom to Top: A Survey | Abraham Itzhak Weinberg et.al. | 2403.11219 | null |
| 2024-03-17 | Continuous Jumping of a Parallel Wire-Driven Monopedal Robot RAMIEL Using Reinforcement Learning | Kento Kawaharazuka et.al. | 2403.11205 | null |
| 2024-03-14 | Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning | Zhishuai Liu et.al. | 2403.09621 | null |
| 2024-03-14 | ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models | Runyu Ma et.al. | 2403.09583 | null |
| 2024-03-14 | A Reinforcement Learning Approach to Dairy Farm Battery Management using Q Learning | Nawazish Ali et.al. | 2403.09499 | null |
| 2024-03-14 | Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision | Zhiqing Sun et.al. | 2403.09472 | link |
| 2024-03-14 | A Deep Reinforcement Learning Approach for Autonomous Reconfigurable Intelligent Surfaces | Hyuckjin Choi et.al. | 2403.09270 | null |
| 2024-03-14 | Leveraging Constraint Programming in a Deep Learning Approach for Dynamically Solving the Flexible Job-Shop Scheduling Problem | Imanol Echeverria et.al. | 2403.09249 | null |
| 2024-03-14 | Rumor Mitigation in Social Media Platforms with Deep Reinforcement Learning | Hongyuan Su et.al. | 2403.09217 | null |
| 2024-03-14 | MetroGNN: Metro Network Expansion with Reinforcement Learning | Hongyuan Su et.al. | 2403.09197 | null |
| 2024-03-14 | SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning | Nicholas Zolman et.al. | 2403.09110 | link |
| 2024-03-14 | CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences | Martin Weyssow et.al. | 2403.09032 | link |
| 2024-03-13 | TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning | Shangding Gu et.al. | 2403.08694 | null |
| 2024-03-13 | Digital Twin-assisted Reinforcement Learning for Resource-aware Microservice Offloading in Edge Computing | Xiangchun Chen et.al. | 2403.08687 | null |
| 2024-03-13 | Meta Reinforcement Learning for Resource Allocation in Aerial Active-RIS-assisted Networks with Rate-Splitting Multiple Access | Sajad Faramarzi et.al. | 2403.08648 | null |
| 2024-03-13 | Human Alignment of Large Language Models through Online Preference Optimisation | Daniele Calandriello et.al. | 2403.08635 | null |
| 2024-03-13 | Specification Overfitting in Artificial Intelligence | Benjamin Roth et.al. | 2403.08425 | null |
| 2024-03-13 | Optimizing Risk-averse Human-AI Hybrid Teams | Andrew Fuchs et.al. | 2403.08386 | null |
| 2024-03-13 | Learning to Describe for Predicting Zero-shot Drug-Drug Interactions | Fangqi Zhu et.al. | 2403.08377 | link |
| 2024-03-13 | LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments | Maonan Wang et.al. | 2403.08337 | link |
| 2024-03-14 | HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback | Ang Li et.al. | 2403.08309 | null |
| 2024-03-13 | SpaceOctopus: An Octopus-inspired Motion Planning Framework for Multi-arm Space Robot | Wenbo Zhao et.al. | 2403.08219 | null |
| 2024-03-12 | TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation | Shivin Dass et.al. | 2403.07869 | null |
| 2024-03-12 | Exploring Safety Generalization Challenges of Large Language Models via Code | Qibing Ren et.al. | 2403.07865 | null |
| 2024-03-12 | DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation | Chen Wang et.al. | 2403.07788 | null |
| 2024-03-12 | Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards | Wei Shen et.al. | 2403.07708 | null |
| 2024-03-12 | Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning | Motoki Omura et.al. | 2403.07704 | null |
| 2024-03-12 | Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation | Michael Ogezi et.al. | 2403.07605 | null |
| 2024-03-12 | An Improved Strategy for Blood Glucose Control Using Multi-Step Deep Reinforcement Learning | Weiwei Gu et.al. | 2403.07566 | null |
| 2024-03-12 | Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding | Huijie Tang et.al. | 2403.07559 | link |
| 2024-03-12 | Constrained Optimal Fuel Consumption of HEV: A Constrained Reinforcement Learning Approach | Shuchang Yan et.al. | 2403.07503 | null |
| 2024-03-12 | Optimization of Pressure Management Strategies for Geological CO2 Sequestration Using Surrogate Model-based Reinforcement Learning | Jungang Chen et.al. | 2403.07360 | null |
| 2024-03-11 | Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts | Onur Celik et.al. | 2403.06966 | null |
| 2024-03-11 | Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning | Junseok Park et.al. | 2403.06880 | null |
| 2024-03-11 | Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification | Joar Skalse et.al. | 2403.06854 | null |
| 2024-03-11 | In-context Exploration-Exploitation for Reinforcement Learning | Zhenwen Dai et.al. | 2403.06826 | null |
| 2024-03-11 | ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment | Hao-Lun Hsu et.al. | 2403.06814 | null |
| 2024-03-11 | From Factor Models to Deep Learning: Machine Learning in Reshaping Empirical Asset Pricing | Junyi Ye et.al. | 2403.06779 | null |
| 2024-03-11 | ALaRM: Align Language Models via Hierarchical Rewards Modeling | Yuhang Lai et.al. | 2403.06754 | null |
| 2024-03-11 | Generalising Multi-Agent Cooperation through Task-Agnostic Communication | Dulhan Jayalath et.al. | 2403.06750 | link |
| 2024-03-11 | Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | Adarsh N L et.al. | 2403.06735 | null |
| 2024-03-11 | Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning | Zijian Zhou et.al. | 2403.06728 | null |
| 2024-03-08 | Will GPT-4 Run DOOM? | Adrian de Wynter et.al. | 2403.05468 | null |
| 2024-03-08 | Switching the Loss Reduces the Cost in Batch Reinforcement Learning | Alex Ayoub et.al. | 2403.05385 | null |
| 2024-03-08 | Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation | Xiaoying Zhang et.al. | 2403.05171 | null |
| 2024-03-08 | Inverse Design of Photonic Crystal Surface Emitting Lasers is a Sequence Modeling Problem | Ceyao Zhang et.al. | 2403.05149 | null |
| 2024-03-08 | ChatUIE: Exploring Chat-based Unified Information Extraction using Large Language Models | Jun Xu et.al. | 2403.05132 | null |
| 2024-03-08 | RLPeri: Accelerating Visual Perimetry Test with Reinforcement Learning and Convolutional Feature Extraction | Tanvi Verma et.al. | 2403.05112 | null |
| 2024-03-08 | Efficient Data Collection for Robotic Manipulation via Compositional Generalization | Jensen Gao et.al. | 2403.05110 | null |
| 2024-03-08 | Simulating Battery-Powered TinyML Systems Optimised using Reinforcement Learning in Image-Based Anomaly Detection | Jared M. Ping et.al. | 2403.05106 | null |
| 2024-03-08 | Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning | Hongjoon Ahn et.al. | 2403.05066 | null |
| 2024-03-08 | Aligning Large Language Models for Controllable Recommendations | Wensheng Lu et.al. | 2403.05063 | null |
| 2024-03-07 | Teaching Large Language Models to Reason with Reinforcement Learning | Alex Havrilla et.al. | 2403.04642 | null |
| 2024-03-07 | Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace | Léopold Maytié et.al. | 2403.04588 | null |
| 2024-03-07 | Learning Agility Adaptation for Flight in Clutter | Guangyu Zhao et.al. | 2403.04586 | null |
| 2024-03-07 | Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition | Long-Fei Li et.al. | 2403.04568 | null |
| 2024-03-07 | Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation | Fabian Otto et.al. | 2403.04453 | null |
| 2024-03-07 | Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation | Tairan He et.al. | 2403.04436 | null |
| 2024-03-07 | iTRPL: An Intelligent and Trusted RPL Protocol based on Multi-Agent Reinforcement Learning | Debasmita Dey et.al. | 2403.04416 | null |
| 2024-03-07 | Model-free $H_{\infty}$ control of Itô stochastic system via off-policy reinforcement learning | Jing Guo et.al. | 2403.04412 | null |
| 2024-03-07 | Model-Free Load Frequency Control of Nonlinear Power Systems Based on Deep Reinforcement Learning | Xiaodi Chen et.al. | 2403.04374 | null |
| 2024-03-07 | Symmetry Considerations for Learning Task Symmetric Robot Policies | Mayank Mittal et.al. | 2403.04359 | null |
| 2024-03-06 | 3D Diffusion Policy | Yanjie Ze et.al. | 2403.03954 | link |
| 2024-03-06 | Stop Regressing: Training Value Functions via Classification for Scalable Deep RL | Jesse Farebrother et.al. | 2403.03950 | null |
| 2024-03-06 | Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation | Marcel Torne et.al. | 2403.03949 | null |
| 2024-03-06 | Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning | Zifan Xu et.al. | 2403.03848 | null |
| 2024-03-06 | A Survey on Applications of Reinforcement Learning in Spatial Resource Allocation | Di Zhang et.al. | 2403.03643 | null |
| 2024-03-06 | Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Yuhong Sun et.al. | 2403.03558 | link |
| 2024-03-06 | Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning | Zida Wu et.al. | 2403.03552 | null |
| 2024-03-05 | RACE-SM: Reinforcement Learning Based Autonomous Control for Social On-Ramp Merging | Jordan Poots et.al. | 2403.03359 | null |
| 2024-03-05 | Bi-KVIL: Keypoints-based Visual Imitation Learning of Bimanual Manipulation Tasks | Jianfeng Gao et.al. | 2403.03270 | null |
| 2024-03-05 | Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination | Liangzhou Wang et.al. | 2403.03172 | null |
| 2024-03-05 | Leveraging Federated Learning and Edge Computing for Recommendation Systems within Cloud Computing Networks | Yaqian Qi et.al. | 2403.03165 | null |
| 2024-03-05 | Language Guided Exploration for RL Agents in Text Environments | Hitesh Golchha et.al. | 2403.03141 | null |
| 2024-03-05 | SplAgger: Split Aggregation for Meta-Reinforcement Learning | Jacob Beck et.al. | 2403.03020 | null |
| 2024-03-05 | Autonomous vehicle decision and control through reinforcement learning with traffic flow randomization | Yuan Lin et.al. | 2403.02882 | null |
| 2024-03-05 | SpaceHopper: A Small-Scale Legged Robot for Exploring Low-Gravity Celestial Bodies | Alexander Spiridonov et.al. | 2403.02831 | null |
| 2024-03-05 | A Zero-Shot Reinforcement Learning Strategy for Autonomous Guidewire Navigation | Valentina Scarponi et.al. | 2403.02777 | null |
| 2024-03-05 | RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches | Priya Sundaresan et.al. | 2403.02709 | null |
| 2024-03-05 | Fighting Game Adaptive Background Music for Improved Gameplay | Ibrahim Khan et.al. | 2403.02701 | null |
| 2024-03-05 | PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning | Ke Zhang et.al. | 2403.02635 | null |
| 2024-03-02 | Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Alexander Scarlatos et.al. | 2403.01304 | link |
| 2024-03-02 | Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey | Hamza Kheddar et.al. | 2403.01255 | null |
| 2024-03-02 | Balancing Exploration and Exploitation in LLM using Soft RLLF for Enhanced Negation Understanding | Ha-Thanh Nguyen et.al. | 2403.01185 | null |
| 2024-03-02 | Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning | Hyungho Na et.al. | 2403.01112 | null |
| 2024-03-02 | Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL) | Noah Ford et.al. | 2403.01059 | null |
| 2024-03-01 | A Holistic Power Optimization Approach for Microgrid Control Based on Deep Reinforcement Learning | Fulong Yao et.al. | 2403.01013 | null |
| 2024-03-01 | Policy Optimization for PDE Control with a Warm Start | Xiangyuan Zhang et.al. | 2403.01005 | null |
| 2024-03-01 | On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games | Awni Altabaa et.al. | 2403.00993 | null |
| 2024-03-01 | SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation | Noriaki Hirose et.al. | 2403.00991 | null |
| 2024-03-01 | Scale-free Adversarial Reinforcement Learning | Mingyu Chen et.al. | 2403.00930 | null |
| 2024-02-29 | Curiosity-driven Red-teaming for Large Language Models | Zhang-Wei Hong et.al. | 2402.19464 | link |
| 2024-02-29 | ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL | Yifei Zhou et.al. | 2402.19446 | link |
| 2024-02-29 | Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation | Jonathan Yang et.al. | 2402.19432 | null |
| 2024-02-29 | Understanding Iterative Combinatorial Auction Designs via Multi-Agent Reinforcement Learning | Greg d’Eon et.al. | 2402.19420 | null |
| 2024-02-29 | RL-GPT: Integrating Reinforcement Learning and Code-as-policy | Shaoteng Liu et.al. | 2402.19299 | null |
| 2024-02-29 | StiefelGen: A Simple, Model Agnostic Approach for Time Series Data Augmentation over Riemannian Manifolds | Prasad Cheema et.al. | 2402.19287 | null |
| 2024-02-29 | Adaptive Testing Environment Generation for Connected and Automated Vehicles with Dense Reinforcement Learning | Jingxuan Yang et.al. | 2402.19275 | null |
| 2024-02-29 | Deep Reinforcement Learning: A Convex Optimization Approach | Ather Gattami et.al. | 2402.19212 | null |
| 2024-02-29 | ARMCHAIR: integrated inverse reinforcement learning and model predictive control for human-robot collaboration | Angelo Caregnato-Neto et.al. | 2402.19128 | null |
| 2024-02-29 | Temporal-Aware Deep Reinforcement Learning for Energy Storage Bidding in Energy and Contingency Reserve Markets | Jinhao Li et.al. | 2402.19110 | null |
| 2024-02-28 | Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards | Haoxiang Wang et.al. | 2402.18571 | link |
| 2024-02-28 | Unifying F1TENTH Autonomous Racing: Survey, Methods and Benchmarks | Benjamin David Evans et.al. | 2402.18558 | null |
| 2024-02-28 | Human-Centric Aware UAV Trajectory Planning in Search and Rescue Missions Employing Multi-Objective Reinforcement Learning with AHP and Similarity-Based Experience Replay | Mahya Ramezani et.al. | 2402.18487 | null |
| 2024-02-28 | FinAgent: A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist | Wentao Zhang et.al. | 2402.18485 | null |
| 2024-02-28 | Implementing Online Reinforcement Learning with Clustering Neural Networks | James E. Smith et.al. | 2402.18472 | null |
| 2024-02-28 | Why Do Animals Need Shaping? A Theory of Task Composition and Curriculum Learning | Jin Hwa Lee et.al. | 2402.18361 | null |
| 2024-02-28 | Solving Multi-Entity Robotic Problems Using Permutation Invariant Neural Networks | Tianxu An et.al. | 2402.18345 | null |
| 2024-02-28 | Whole-body Humanoid Robot Locomotion with Human Reference | Qiang Zhang et.al. | 2402.18294 | null |
| 2024-02-28 | Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of Pre-trained Language Models with Proximal Policy Optimization | Shuo Yang et.al. | 2402.18284 | null |
| 2024-02-28 | Reinforcement Learning and Graph Neural Networks for Probabilistic Risk Assessment | Joachim Grimstad et.al. | 2402.18246 | null |
(<a href=../README.md>back to main</a>)