Reinforcement Learning

| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-18 | Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification | Qihao Liu et.al. | 2512.16921 | null |
| 2025-12-18 | AdaTooler-V: Adaptive Tool-Use for Images and Videos | Chaoyang Wang et.al. | 2512.16918 | null |
| 2025-12-18 | Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning | Qihao Liu et.al. | 2512.16917 | null |
| 2025-12-18 | Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward | Peter Chen et.al. | 2512.16912 | null |
| 2025-12-18 | Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning | Andrew Wagenmaker et.al. | 2512.16911 | null |
| 2025-12-18 | MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning | Yuanchen Ju et.al. | 2512.16909 | null |
| 2025-12-18 | AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning | Tzu-Han Lin et.al. | 2512.16883 | null |
| 2025-12-18 | A survey of the orienteering problem: model evolution, algorithmic advances, and future directions | Songhao Shen et.al. | 2512.16865 | null |
| 2025-12-18 | RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing | Tianyuan Qu et.al. | 2512.16864 | null |
| 2025-12-18 | ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning | Zihan Zhou et.al. | 2512.16861 | null |
| 2025-12-18 | Meta-RL Induces Exploration in Language Agents | Yulun Jiang et.al. | 2512.16848 | null |
| 2025-12-18 | Coordinated Anti-Jamming Resilience in Swarm Networks via Multi-Agent Reinforcement Learning | Bahman Abolhassani et.al. | 2512.16813 | null |
| 2025-12-18 | Olaf: Bringing an Animated Character to Life in the Physical World | David Müller et.al. | 2512.16705 | null |
| 2025-12-18 | JustRL: Scaling a 1.5B LLM with a Simple RL Recipe | Bingxiang He et.al. | 2512.16649 | null |
| 2025-12-18 | Implementing a Sharia Chatbot as a Consultation Medium for Questions About Islam | Wisnu Uriawan et.al. | 2512.16644 | null |
| 2025-12-18 | Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game | Barna Pásztor et.al. | 2512.16626 | null |
| 2025-12-18 | Non-Asymptotic Global Convergence of PPO-Clip | Yin Liu et.al. | 2512.16565 | null |
| 2025-12-18 | ParamExplorer: A framework for exploring parameters in generative art | Julien Gachadoat et.al. | 2512.16529 | null |
| 2025-12-18 | Guiding Perception-Reasoning Closer to Human in Blind Image Quality Assessment | Yuan Li et.al. | 2512.16484 | null |
| 2025-12-18 | E-SDS: Environment-aware See it, Do it, Sorted - Automated Environment-Aware Reinforcement Learning for Humanoid Locomotion | Enis Yalcin et.al. | 2512.16446 | null |
| 2025-12-18 | StarCraft+: Benchmarking Multi-agent Algorithms in Adversary Paradigm | Yadong Li et.al. | 2512.16444 | null |
| 2025-12-18 | NDRL: Cotton Irrigation and Nitrogen Application with Nested Dual-Agent Reinforcement Learning | Ruifeng Xu et.al. | 2512.16408 | null |
| 2025-12-18 | Hypernetworks That Evolve Themselves | Joachim Winther Pedersen et.al. | 2512.16406 | null |
| 2025-12-18 | Machine Learning-based Optimal Control for Colloidal Self-Assembly | Andres Lizano-Villalobos et.al. | 2512.16402 | null |
| 2025-12-18 | ManiLong-Shot: Interaction-Aware One-Shot Imitation Learning for Long-Horizon Manipulation | Zixuan Chen et.al. | 2512.16302 | null |
| 2025-12-18 | Simultaneous Secrecy and Covert Communications (SSACC) in Mobility-Aware RIS-Aided Networks | Yanyu Cheng et.al. | 2512.16224 | null |
| 2025-12-18 | Visual Alignment of Medical Vision-Language Models for Grounded Radiology Report Generation | Sarosij Bose et.al. | 2512.16201 | null |
| 2025-12-18 | MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation | Pengyu Wang et.al. | 2512.16145 | null |
| 2025-12-18 | INTELLECT-3: Technical Report | Prime Intellect Team et.al. | 2512.16144 | null |
| 2025-12-17 | Techno-economic optimization of a heat-pipe microreactor, part I: theory and cost optimization | Paul Seurin et.al. | 2512.16032 | null |
| 2025-12-17 | Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models | Caner Erden et.al. | 2512.15973 | null |
| 2025-12-17 | Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning | Polaris Jhandi et.al. | 2512.15943 | null |
| 2025-12-17 | DSO: Direct Steering Optimization for Bias Mitigation | Lucas Monteiro Paes et.al. | 2512.15926 | null |
| 2025-12-15 | Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT) | Akhil Sharma et.al. | 2512.15790 | null |
| 2025-12-17 | Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning | Zhenwen Liang et.al. | 2512.15687 | null |
| 2025-12-17 | Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning | Jiaqi Xu et.al. | 2512.15662 | null |
| 2025-12-17 | Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction | Mathieu Blondel et.al. | 2512.15605 | null |
| 2025-12-17 | Deep Reinforcement Learning for EH-Enabled Cognitive-IoT Under Jamming Attacks | Nadia Abdolkhani et.al. | 2512.15558 | null |
| 2025-12-17 | Autonomous Pressure Control in MuVacAS via Deep Reinforcement Learning and Deep Learning Surrogate Models | Guillermo Rodriguez-Llorente et.al. | 2512.15521 | null |
| 2025-12-17 | Double Horizon Model-Based Policy Optimization | Akihiro Kubo et.al. | 2512.15439 | null |
| 2025-12-17 | FM-EAC: Feature Model-based Enhanced Actor-Critic for Multi-Task Control in Dynamic Environments | Quanxi Zhou et.al. | 2512.15430 | null |
| 2025-12-17 | Can AI Generate more Comprehensive Test Scenarios? Review on Automated Driving Systems Test Scenario Generation Methods | Ji Zhou et.al. | 2512.15422 | null |
| 2025-12-17 | EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning | Jianfei Ma et.al. | 2512.15405 | null |
| 2025-12-17 | Graph Contextual Reinforcement Learning for Efficient Directed Controller Synthesis | Toshihide Ubukata et.al. | 2512.15295 | null |
| 2025-12-17 | Learning-Based Phase Shift Optimization of Liquid Crystal RIS in Dynamic mmWave Networks | Le Hao et.al. | 2512.15279 | null |
| 2025-12-17 | Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning | Yiliu Sun et.al. | 2512.15274 | null |
| 2025-12-17 | EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence | Jiaxu Wan et.al. | 2512.15160 | null |
| 2025-12-17 | Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning | Weiqin Wang et.al. | 2512.15146 | null |
| 2025-12-17 | Automatic Reward Shaping from Multi-Objective Human Heuristics | Yuqing Xie et.al. | 2512.15120 | null |
| 2025-12-17 | QoS-Aware Hierarchical Reinforcement Learning for Joint Link Selection and Trajectory Optimization in SAGIN-Supported UAV Mobility Management | Jiayang Wan et.al. | 2512.15119 | null |
| 2025-12-17 | Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models | Jinwu Hu et.al. | 2512.15089 | null |
| 2025-12-17 | Deep Reinforcement Learning for Joint Time and Power Management in SWIPT-EH CIoT | Nadia Abdolkhani et.al. | 2512.15062 | null |
| 2025-12-17 | Spectral Representation-based Reinforcement Learning | Chenxiao Gao et.al. | 2512.15036 | null |
| 2025-12-17 | ISS Policy: Scalable Diffusion Policy with Implicit Scene Supervision | Wenlong Xia et.al. | 2512.15020 | null |
| 2025-12-17 | Multi-Objective Bayesian Optimization of Deep Reinforcement Learning for Environmental, Social, and Governance (ESG) Financial Portfolio Management | E. C. Garrido-Merchán et.al. | 2512.14992 | null |
| 2025-12-17 | Adaptive Partitioning and Learning for Stochastic Control of Diffusion Processes | Hanqing Jin et.al. | 2512.14991 | null |
| 2025-12-16 | Puzzle Curriculum GRPO for Vision-Centric Reasoning | Ahmadreza Jeddi et.al. | 2512.14944 | null |
| 2025-12-16 | Imitation Learning for Multi-turn LM Agents via On-policy Expert Corrections | Niklas Lauffer et.al. | 2512.14895 | null |
| 2025-12-16 | Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse | Jingwei Chen et.al. | 2512.14879 | null |
| 2025-12-16 | TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs | Jun Zhang et.al. | 2512.14698 | null |
| 2025-12-16 | CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives | Zihan Wang et.al. | 2512.14696 | null |
| 2025-12-16 | Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes | Alessandro Trapasso et.al. | 2512.14617 | null |
| 2025-12-16 | RecGPT-V2 Technical Report | Chao Yi et.al. | 2512.14503 | null |
| 2025-12-16 | Hybrid Cognitive IoT with Cooperative Caching and SWIPT-EH: A Hierarchical Reinforcement Learning Framework | Nadia Abdolkhani et.al. | 2512.14488 | null |
| 2025-12-16 | Context-Picker: Dynamic context selection using multi-stage reinforcement learning | Siyuan Zhu et.al. | 2512.14465 | null |
| 2025-12-16 | A data-physics hybrid generative model for patient-specific post-stroke motor rehabilitation using wearable sensor data | Yanning Dai et.al. | 2512.14329 | null |
| 2025-12-16 | Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations | Xudong Han et.al. | 2512.14321 | null |
| 2025-12-16 | A Threshold-Triggered Deep Q-Network-Based Framework for Self-Healing in Autonomic Software-Defined IIoT-Edge Networks | Agrippina Mwangi et.al. | 2512.14297 | null |
| 2025-12-16 | GLM-TTS Technical Report | Jiayan Cui et.al. | 2512.14291 | null |
| 2025-12-16 | Understanding and Improving Hyperbolic Deep Reinforcement Learning | Timo Klein et.al. | 2512.14202 | null |
| 2025-12-16 | Incentivizing Tool-augmented Thinking with Images for Medical Image Analysis | Yankai Jiang et.al. | 2512.14157 | null |
| 2025-12-16 | A First-Order Logic-Based Alternative to Reward Models in RLHF | Chunjin Jian et.al. | 2512.14100 | null |
| 2025-12-16 | RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees | Junjie Ma et.al. | 2512.14069 | null |
| 2025-12-16 | Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning | Amir M. Soufi Enayati et.al. | 2512.14057 | null |
| 2025-12-16 | OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving | Zhenguo Zhang et.al. | 2512.14044 | null |
| 2025-12-16 | Sample-Efficient Robot Skill Learning for Construction Tasks: Benchmarking Hierarchical Reinforcement Learning and Vision-Language-Action VLA Model | Zhaofeng Hu et.al. | 2512.14031 | null |
| 2025-12-16 | Cooperative Caching Towards Efficient Spectrum Utilization in Cognitive-IoT Networks | Nadia Abdolkhani et.al. | 2512.14029 | null |
| 2025-12-16 | Hierarchical Deep Reinforcement Learning for Robust Access in Cognitive IoT Networks under Smart Jamming Attacks | Nadia Abdolkhani et.al. | 2512.14013 | null |
| 2025-12-15 | Adaptive digital twins for predictive decision-making: Online Bayesian learning of transition dynamics | Eugenio Varetti et.al. | 2512.13919 | null |
| 2025-12-15 | Group-Theoretic Reinforcement Learning of Dynamical Decoupling Sequences | Charles Marrder et.al. | 2512.13890 | null |
| 2025-12-15 | SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning | Jitesh Jain et.al. | 2512.13874 | null |
| 2025-12-15 | Explainable reinforcement learning from human feedback to improve alignment | Shicheng Liu et.al. | 2512.13837 | null |
| 2025-12-13 | RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing | Yuhan Tang et.al. | 2512.13727 | null |
| 2025-12-13 | Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce | Sayak Chakrabarty et.al. | 2512.13726 | null |
| 2025-12-15 | AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection | Junwen Miao et.al. | 2512.13671 | null |
| 2025-12-15 | A Scientific Reasoning Model for Organic Synthesis Procedure Generation | Guoqing Liu et.al. | 2512.13668 | null |
| 2025-12-15 | Advancing Machine Learning Optimization of Chiral Photonic Metasurface: Comparative Study of Neural Network and Genetic Algorithm Approaches | Davide Filippozzi et.al. | 2512.13656 | null |
| 2025-12-15 | MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning | Haoyu Fu et.al. | 2512.13636 | null |
| 2025-12-15 | SCR2-ST: Combine Single Cell with Spatial Transcriptomics for Efficient Active Sampling via Reinforcement Learning | Junchao Zhu et.al. | 2512.13635 | null |
| 2025-12-15 | Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models | Boxin Wang et.al. | 2512.13607 | null |
| 2025-12-15 | Image Diffusion Preview with Consistency Solver | Fu-Yun Wang et.al. | 2512.13592 | link |
| 2025-12-15 | MMhops-R1: Multimodal Multi-hop Reasoning | Tao Zhang et.al. | 2512.13573 | null |
| 2025-12-15 | Memory in the Age of AI Agents | Yuyang Hu et.al. | 2512.13564 | link |
| 2025-12-15 | How Low Can You Go? The Data-Light SE Challenge | Kishan Kumar Ganguly et.al. | 2512.13524 | null |
| 2025-12-15 | Reinforcement Learning based 6-DoF Maneuvers for Microgravity Intravehicular Docking: A Simulation Study with Int-Ball2 in ISS-JEM | Aman Arora et.al. | 2512.13514 | null |
| 2025-12-15 | MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph | Linjie Mu et.al. | 2512.13510 | null |
| 2025-12-15 | Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model | Heyi Chen et.al. | 2512.13507 | null |
| 2025-12-15 | Differentiable Evolutionary Reinforcement Learning | Sitao Cheng et.al. | 2512.13399 | null |
| 2025-12-15 | QoS-Aware State-Augmented Learnable Framework for 5G NR-U/Wi-Fi Coexistence: Impact of Parameter Selection and Enhanced Collision Resolution | Mohammad Reza Fasihi et.al. | 2512.13393 | null |
| 2025-12-15 | Universal Dexterous Functional Grasping via Demonstration-Editing Reinforcement Learning | Chuan Mao et.al. | 2512.13380 | null |
| 2025-12-15 | Fast Policy Learning for 6-DOF Position Control of Underwater Vehicles | Sümer Tunçay et.al. | 2512.13359 | null |
| 2025-12-15 | Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3) | Zeyad Gamal et.al. | 2512.13356 | null |
| 2025-12-15 | Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration | Hao Fu et.al. | 2512.13293 | null |
| 2025-12-15 | AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning | Jiaru Zou et.al. | 2512.13278 | null |
| 2025-12-15 | SPARS: A Reinforcement Learning-Enabled Simulator for Power Management in HPC Job Scheduling | Muhammad Alfian Amrizal et.al. | 2512.13268 | null |
| 2025-12-15 | Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving | Hyunki Seong et.al. | 2512.13262 | null |
| 2025-12-15 | Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection | Zihui Zhao et.al. | 2512.13240 | null |
| 2025-12-15 | SACn: Soft Actor-Critic with n-step Returns | Jakub Łyskawa et.al. | 2512.13165 | null |
| 2025-12-15 | SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning | Emre Can Acikgoz et.al. | 2512.13159 | null |
| 2025-12-15 | TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning | Shenzhi Yang et.al. | 2512.13106 | null |
| 2025-12-15 | Toward Self-Healing Networks-on-Chip: RL-Driven Routing in 2D Torus Architectures | Mohammad Walid Charrwi et.al. | 2512.13096 | null |
| 2025-12-15 | ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning | Feng Zhang et.al. | 2512.13095 | null |
| 2025-12-15 | Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation | Xiang Li et.al. | 2512.13094 | null |
| 2025-12-15 | PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations | Mingqi Yuan et.al. | 2512.13093 | null |
| 2025-12-15 | M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization | Bizhe Bai et.al. | 2512.13070 | null |
| 2025-12-15 | Deep Q-Learning-Based Intelligent Scheduling for ETL Optimization in Heterogeneous Data Environments | Kangning Gao et.al. | 2512.13060 | null |
| 2025-12-15 | GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training | Tong Wei et.al. | 2512.13043 | null |
| 2025-12-15 | What Happens Next? Next Scene Prediction with a Unified Video Model | Xinjie Li et.al. | 2512.13015 | null |
| 2025-12-15 | Learning Terrain Aware Bipedal Locomotion via Reduced Dimensional Perceptual Representations | Guillermo A. Castillo et.al. | 2512.12993 | null |
| 2025-12-15 | Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning | Amin Jalal Aghdasian et.al. | 2512.12987 | null |
| 2025-12-15 | QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management | Weizhou Shen et.al. | 2512.12967 | null |
| 2025-12-15 | Interpretable Hypothesis-Driven Trading: A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals | Gagan Deep et.al. | 2512.12924 | null |
| 2025-12-15 | LLM-based Personalized Portfolio Recommender: Integrating Large Language Models and Reinforcement Learning for Intelligent Investment Strategy Optimization | Bangyu Li et.al. | 2512.12922 | null |
| 2025-12-15 | Meta-GPT: Decoding the Metasurface Genome with Generative Artificial Intelligence | David Dang et.al. | 2512.12888 | null |
| 2025-12-14 | Information-Consistent Language Model Recommendations through Group Relative Policy Optimization | Sonal Prabhune et.al. | 2512.12858 | null |
| 2025-12-14 | MPC-Guided Safe Reinforcement Learning and Lipschitz-Based Filtering for Structured Nonlinear Systems | Patrick Kostelac et.al. | 2512.12855 | null |
| 2025-12-14 | Distributed Reinforcement Learning using Local Smart Meter Data for Voltage Regulation in Distribution Networks | Dong Liu et.al. | 2512.12803 | null |
| 2025-12-14 | CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning | Xuanzhang Liu et.al. | 2512.12716 | null |
| 2025-12-14 | Self-Motivated Growing Neural Network for Adaptive Architecture via Local Structural Plasticity | Yiyang Jia et.al. | 2512.12713 | null |
| 2025-12-14 | Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning | Enhong Mu et.al. | 2512.12706 | null |
| 2025-12-14 | Reassessing the Role of Supervised Fine-Tuning: An Empirical Study in VLM Reasoning | Yongcan Yu et.al. | 2512.12690 | null |
| 2025-12-14 | CogDoc: Towards Unified thinking in Documents | Qixin Xu et.al. | 2512.12658 | null |
| 2025-12-14 | Coupled Variational Reinforcement Learning for Language Model General Reasoning | Xueru Wen et.al. | 2512.12576 | null |
| 2025-12-14 | World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents | Yesid Fonseca et.al. | 2512.12548 | null |
| 2025-12-13 | Adaptive Detector-Verifier Framework for Zero-Shot Polyp Detection in Open-World Settings | Shengkai Xu et.al. | 2512.12492 | null |
| 2025-12-13 | More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models | Hoang Anh Just et.al. | 2512.12487 | null |
| 2025-12-13 | HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments | Yongjun He et.al. | 2512.12476 | null |
| 2025-12-13 | Sim2Real Reinforcement Learning for Soccer skills | Jonathan Spraggett et.al. | 2512.12437 | null |
| 2025-12-13 | Deep Hedging with Reinforcement Learning: A Practical Framework for Option Risk Management | Travon Lucius et.al. | 2512.12420 | null |
| 2025-12-13 | ElasticVR: Elastic Task Computing in Multi-User Multi-Connectivity Wireless Virtual Reality (VR) Systems | Babak Badnava et.al. | 2512.12366 | null |
| 2025-12-13 | The Role of AI in Modern Penetration Testing | J. Alexander Curtis et.al. | 2512.12326 | null |
| 2025-12-13 | A Conflict-Aware Resource Management Framework for the Computing Continuum | Vlad Popescu-Vifor et.al. | 2512.12299 | null |
| 2025-12-13 | Moment and Highlight Detection via MLLM Frame Segmentation | I Putu Andika Bagas Jiwanta et.al. | 2512.12246 | null |
| 2025-12-13 | Learning to Get Up Across Morphologies: Zero-Shot Recovery with a Unified Humanoid Policy | Jonathan Spraggett et.al. | 2512.12230 | null |
| 2025-12-12 | Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning | Vittorio Giammarino et.al. | 2512.12046 | null |
| 2025-12-12 | Policy Gradient Algorithms for Age-of-Information Cost Minimization | José-Ramón Vidal et.al. | 2512.11990 | null |
| 2025-12-12 | Learning to Extract Context for Context-Aware LLM Inference | Minseon Kim et.al. | 2512.11986 | null |
| 2025-12-12 | A Review of Learning-Based Motion Planning: Toward a Data-Driven Optimal Control Approach | Jia Hu et.al. | 2512.11944 | null |
| 2025-12-12 | Evolutionary Reinforcement Learning based AI tutor for Socratic Interdisciplinary Instruction | Mei Jiang et.al. | 2512.11930 | null |
| 2025-12-12 | AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis | Junjie Ye et.al. | 2512.11797 | null |
| 2025-12-12 | Agile Flight Emerges from Multi-Agent Competitive Racing | Vineet Pasumarti et.al. | 2512.11781 | null |
| 2025-12-12 | SUMFORU: An LLM-Based Review Summarization Framework for Personalized Purchase Decision Support | Yuming Feng et.al. | 2512.11755 | null |
| 2025-12-12 | UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations | Tingyu Yuan et.al. | 2512.11609 | null |
| 2025-12-12 | DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry | Zhenyang Cai et.al. | 2512.11558 | null |
| 2025-12-12 | Rethinking Expert Trajectory Utilization in LLM Post-training | Bowen Ding et.al. | 2512.11470 | null |
| 2025-12-12 | Three methods, one problem: Classical and AI approaches to no-three-in-line | Pranav Ramanathan et.al. | 2512.11469 | null |
| 2025-12-12 | Towards Trustworthy Multi-Turn LLM Agents via Behavioral Guidance | Gonca Gürsun et.al. | 2512.11421 | null |
| 2025-12-12 | Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization | Yifan Niu et.al. | 2512.11391 | null |
| 2025-12-12 | Symmetry-Aware Steering of Equivariant Diffusion Policies: Benefits and Limits | Minwoo Park et.al. | 2512.11345 | null |
| 2025-12-12 | DAPO: Design Structure-Aware Pass Ordering in High-Level Synthesis with Graph Contrastive and Reinforcement Learning | Jinming Ge et.al. | 2512.11342 | null |
| 2025-12-12 | RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training | Tianyuan Wu et.al. | 2512.11306 | null |
| 2025-12-12 | When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents | Mrinal Rawat et.al. | 2512.11277 | null |
| 2025-12-12 | A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation | Hong Je-Gal et.al. | 2512.11270 | null |
| 2025-12-12 | Multi-Objective Reinforcement Learning for Large-Scale Mixed Traffic Control | Iftekharul Islam et.al. | 2512.11247 | null |
| 2025-12-11 | Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning | Wei Duan et.al. | 2512.11179 | null |
| 2025-12-11 | Learning Category-level Last-meter Navigation from RGB Demonstrations of a Single-instance | Tzu-Hsien Lee et.al. | 2512.11173 | null |
| 2025-12-11 | CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound | Akhil S Anand et.al. | 2512.11169 | null |
| 2025-12-11 | Benchmarking RL-Enhanced Spatial Indices Against Traditional, Advanced, and Learned Counterparts | Guanli Liu et.al. | 2512.11161 | null |
| 2025-12-11 | In-Context Multi-Objective Optimization | Xinyu Zhang et.al. | 2512.11114 | null |
| 2025-12-11 | Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation | Yiwen Tang et.al. | 2512.10949 | link |
| 2025-12-11 | Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit | Zamirddine Mari et.al. | 2512.10934 | null |
| 2025-12-11 | Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation | Zamirddine Mari et.al. | 2512.10925 | null |
| 2025-12-11 | Reinforcement Learning in Financial Decision Making: A Systematic Review of Performance, Challenges, and Implementation Strategies | Mohammad Rezoanul Hoque et.al. | 2512.10913 | null |
| 2025-12-11 | Iterative Compositional Data Generation for Robot Control | Anh-Quan Pham et.al. | 2512.10891 | null |
| 2025-12-11 | Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments | Atahan Cilan et.al. | 2512.10835 | null |
| 2025-12-11 | OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification | Zijian Wu et.al. | 2512.10756 | null |
| 2025-12-11 | Learning to Split: A Reinforcement-Learning-Guided Splitting Heuristic for Neural Network Verification | Maya Swisa et.al. | 2512.10747 | null |
| 2025-12-11 | Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving | Songyang Gao et.al. | 2512.10739 | null |
| 2025-12-11 | How to Brake? Ethical Emergency Braking with Deep Reinforcement Learning | Jianbo Wang et.al. | 2512.10698 | null |
| 2025-12-11 | Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning | Benjamin Gundersen et.al. | 2512.10691 | null |
| 2025-12-11 | AgriGPT-Omni: A Unified Speech-Vision-Text Framework for Multilingual Agricultural Intelligence | Bo Yang et.al. | 2512.10624 | null |
| 2025-12-11 | Multi-Objective Reward and Preference Optimization: Theory and Algorithms | Akhil Agnihotri et.al. | 2512.10601 | null |
| 2025-12-11 | Grounding Everything in Tokens for Multimodal Large Language Models | Xiangxuan Ren et.al. | 2512.10554 | null |
| 2025-12-11 | Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning | Haiteng Zhao et.al. | 2512.10534 | null |
| 2025-12-11 | Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning | Chihyeon Song et.al. | 2512.10510 | null |
| 2025-12-11 | UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning | Jiaxi Wu et.al. | 2512.10492 | null |
| 2025-12-11 | Shot and Architecture Adaptive Subspace Variational Quantum Eigensolver for Microwave Simulation | Zhixiu Han et.al. | 2512.10458 | null |
| 2025-12-11 | HypeR Adaptivity: Joint $hr$-Adaptive Meshing via Hypergraph Multi-Agent Deep Reinforcement Learning | Niccolò Grillo et.al. | 2512.10439 | null |
| 2025-12-11 | Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention | Yang Yu et.al. | 2512.10414 | null |
| 2025-12-11 | A Privacy-Preserving Cloud Architecture for Distributed Machine Learning at Scale | Vinoth Punniyamoorthy et.al. | 2512.10341 | null |
| 2025-12-11 | Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters | Shruti Dongare et.al. | 2512.10271 | null |
| 2025-12-11 | Multi-dimensional Preference Alignment by Conditioning Reward Itself | Jiho Jang et.al. | 2512.10237 | null |
| 2025-12-11 | Task-Oriented Grasping Using Reinforcement Learning with a Contextual Reward Machine | Hui Li et.al. | 2512.10235 | null |
| 2025-12-11 | Latent Chain-of-Thought World Modeling for End-to-End Driving | Shuhan Tan et.al. | 2512.10226 | null |
| 2025-12-11 | An exploration for higher efficiency in multi objective optimisation with reinforcement learning | Mehmet Emin Aydin et.al. | 2512.10208 | null |
| 2025-12-10 | Explicit Control Barrier Function-based Safety Filters and their Resource-Aware Computation | Pol Mestres et.al. | 2512.10118 | null |
| 2025-12-10 | Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation | Steven Caro et.al. | 2512.10099 | null |
| 2025-12-10 | SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation | Jongmin Lee et.al. | 2512.10042 | null |
| 2025-12-10 | Diffusion Is Your Friend in Show, Suggest and Tell | Jia Cheng Hu et.al. | 2512.10038 | null |
| 2025-12-10 | Latent Action World Models for Control with Unlabeled Trajectories | Marvin Alles et.al. | 2512.10016 | null |
| 2025-12-10 | TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0 | Jinyu Chen et.al. | 2512.09961 | null |
| 2025-12-10 | STACHE: Local Black-Box Explanations for Reinforcement Learning Policies | Andrew Elashkin et.al. | 2512.09909 | null |
| 2025-12-10 | FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning | Khurram Khalil et.al. | 2512.09872 | null |
| 2025-12-10 | Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation | Yuyang Li et.al. | 2512.09851 | link |
| 2025-12-10 | ChronusOmni: Improving Time Awareness of Omni Large Language Models | Yijing Chen et.al. | 2512.09841 | null |
| 2025-12-10 | RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning | Khurram Khalil et.al. | 2512.09829 | null |
| 2025-12-10 | Prefrontal scaling of reward prediction error readout gates reinforcement-derived adaptive behavior in primates | Tian Sang et.al. | 2512.09761 | null |
| 2025-12-10 | MOA: Multi-Objective Alignment for Role-Playing Agents | Chonghua Liao et.al. | 2512.09756 | null |
| 2025-12-10 | Flexible Reconfigurable Intelligent Surface-Aided Covert Communications in UAV Networks | Chong Huang et.al. | 2512.09714 | null |
| 2025-12-10 | Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning | Kaichen He et.al. | 2512.09706 | null |
| 2025-12-10 | Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies | Mika Persson et.al. | 2512.09682 | null |
| 2025-12-10 | d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models | Leyi Pan et.al. | 2512.09675 | null |
| 2025-12-10 | SynthPix: A lightspeed PIV images generator | Antonio Terpin et.al. | 2512.09664 | null |
| 2025-12-10 | Mastering Diverse, Unknown, and Cluttered Tracks for Robust Vision-Based Drone Racing | Feng Yu et.al. | 2512.09571 | null |
| 2025-12-10 | Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search | Junkai Ji et.al. | 2512.09566 | null |
| 2025-12-10 | REASAN: Learning Reactive Safe Navigation for Legged Robots | Qihao Yuan et.al. | 2512.09537 | null |
| 2025-12-10 | RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning | Yucan Guo et.al. | 2512.09487 | null |
| 2025-12-10 | Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation | Jialin Ying et.al. | 2512.09410 | null |
| 2025-12-10 | CFLight: Enhancing Safety with Traffic Signal Control through Counterfactual Learning | Mingyuan Li et.al. | 2512.09368 | null |
| 2025-12-10 | COVLM-RL: Critical Object-Oriented Reasoning for Autonomous Driving Using VLM-Guided Reinforcement Learning | Lin Li et.al. | 2512.09349 | null |
| 2025-12-10 | Tyche: A Hybrid Computation Framework of Illumination Pattern for Satellite Beam Hopping | Ziheng Yang et.al. | 2512.09312 | null |
| 2025-12-10 | One-Shot Real-World Demonstration Synthesis for Scalable Bimanual Manipulation | Huayi Zhou et.al. | 2512.09297 | null |
| 2025-12-10 | Electric Arc Furnaces Scheduling under Electricity Price Volatility with Reinforcement Learning | Ruonan Pi et.al. | 2512.09293 | null |
| 2025-12-10 | Exploratory Mean-Variance with Jumps: An Equilibrium Approach | Yuling Max Chen et.al. | 2512.09224 | null |
| 2025-12-09 | Learning Unmasking Policies for Diffusion Language Models | Metod Jazbec et.al. | 2512.09106 | null |
| 2025-12-09 | Masked Generative Policy for Robotic Control | Lipeng Zhuang et.al. | 2512.09101 | null |
| 2025-12-09 | No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers | Damiano Marsili et.al. | 2512.08889 | null |
| 2025-12-09 | IPPO Learns the Game, Not the Team: A Study on Generalization in Heterogeneous Agent Teams | Ryan LeRoy et.al. | 2512.08877 | null |
| 2025-12-09 | Reinforcement Learning From State and Temporal Differences | Lex Weaver et.al. | 2512.08855 | null |
| 2025-12-09 | Optimal navigation in two-dimensional regular and turbulent flows | Vladimir Parfenyev et.al. | 2512.08766 | null |
| 2025-12-09 | Learning and Editing Universal Graph Prompt Tuning via Reinforcement Learning | Jinfeng Xu et.al. | 2512.08763 | null |
| 2025-12-09 | Direct transfer of optimized controllers to similar systems using dimensionless MPC | Josip Kir Hromatko et.al. | 2512.08667 | null |
| 2025-12-09 | Sim2Swim: Zero-Shot Velocity Control for Agile AUV Maneuvering in 3 Minutes | Lauritz Rismark Fosso et.al. | 2512.08656 | null |
| 2025-12-09 | Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis | Orit Davidovich et.al. | 2512.08601 | null |
| 2025-12-09 | Mind to Hand: Purposeful Robotic Control via Embodied Reasoning | Peijun Tang et.al. | 2512.08580 | null |
| 2025-12-09 | Thinking with Images via Self-Calling Agent | Wenxi Yang et.al. | 2512.08511 | link |
| 2025-12-09 | Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning | Junnan Qiu et.al. | 2512.08485 | null |
| 2025-12-09 | Using reinforcement learning to probe the role of feedback in skill acquisition | Antonio Terpin et.al. | 2512.08463 | null |
| 2025-12-09 | From Accuracy to Impact: The Impact-Driven AI Framework (IDAIF) for Aligning Engineering Architecture with Theory of Change | Yong-Woon Kim et.al. | 2512.08449 | null |
| 2025-12-09 | Turning Threat into Opportunity: DRL-Powered Anti-Jamming via Energy Harvesting in UAV-Disrupted Channels | Ngoc-Tan Nguyen et.al. | 2512.08351 | null |
| 2025-12-09 | Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks | Thai Duong Nguyen et.al. | 2512.08341 | null |
| 2025-12-09 | Collaborative Intelligence for UAV-Satellite Network Slicing: Towards a Joint QoS-Energy-Fairness MADRL Optimization | Thanh-Dao Nguyen et.al. | 2512.08322 | null |
| 2025-12-09 | rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection | Sijia Chen et.al. | 2512.08300 | null |
| 2025-12-09 | Empowerment Gain and Causal Model Construction: Children and adults are sensitive to controllability and variability in their causal interventions | Eunice Yiu et.al. | 2512.08230 | null |
| 2025-12-09 | Primal-dual policy learning for mean-field stochastic LQR problem | Xiushan Jiang et.al. | 2512.08205 | null |
| 2025-12-09 | TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models | Zheng Ding et.al. | 2512.08153 | null |
| 2025-12-09 | Robust Agents in Open-Ended Worlds | Mikayel Samvelyan et.al. | 2512.08139 | null |
| 2025-12-09 | Universal Adversarial Suffixes for Language Models Using Reinforcement Learning with Calibrated Reward | Sampriti Soor et.al. | 2512.08131 | null |
| 2025-12-08 | Scalable Offline Model-Based RL with Action Chunks | Kwanyoung Park et.al. | 2512.08108 | null |
| 2025-12-08 | Training LLMs for Honesty via Confessions | Manas Joglekar et.al. | 2512.08093 | null |
| 2025-12-08 | An Introduction to Deep Reinforcement and Imitation Learning | Pedro Santana et.al. | 2512.08052 | null |
| 2025-12-08 | F2: Offline Reinforcement Learning for Hamiltonian Simulation via Free-Fermionic Subroutine Compilation | Ethan Decker et.al. | 2512.08023 | null |
| 2025-12-08 | Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care | Aryaman Bansal et.al. | 2512.08012 | null |
| 2025-12-08 | VLD: Visual Language Goal Distance for Reinforcement Learning Navigation | Lazar Milikic et.al. | 2512.07976 | null |
| 2025-12-08 | Agentic Artificial Intelligence for Ethical Cybersecurity in Uganda: A Reinforcement Learning Framework for Threat Detection in Resource-Constrained Environments | Ibrahim Adabara et.al. | 2512.07909 | null |
| 2025-12-08 | An Adaptive Multi-Layered Honeynet Architecture for Threat Behavior Analysis via Deep Learning | Lukas Johannes Möller et.al. | 2512.07827 | null |
| 2025-12-08 | On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models | Charlie Zhang et.al. | 2512.07783 | null |
| 2025-12-08 | RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models | Xiqiao Xiong et.al. | 2512.07761 | null |
| 2025-12-08 | DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving | Jialv Zou et.al. | 2512.07745 | null |
| 2025-12-08 | SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery | Meng Cao et.al. | 2512.07733 | null |
| 2025-12-08 | Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE | Anxiang Zeng et.al. | 2512.07710 | null |
| 2025-12-08 | Delay-Aware Diffusion Policy: Bridging the Observation-Execution Gap in Dynamic Tasks | Aileen Liao et.al. | 2512.07697 | null |
| 2025-12-08 | The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds | Shahar Lutati et.al. | 2512.07631 | null |
| 2025-12-08 | Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement | Yongsheng Lian et.al. | 2512.07611 | null |
| 2025-12-08 | Understanding Individual Decision-Making in Multi-Agent Reinforcement Learning: A Dynamical Systems Approach | James Rudd-Jones et.al. | 2512.07588 | null |
| 2025-12-08 | ReLaX: Reasoning with Latent Exploration for Large Reasoning Models | Shimin Zhang et.al. | 2512.07558 | null |
| 2025-12-08 | Model-Based Reinforcement Learning Under Confounding | Nishanth Venkatesh et.al. | 2512.07528 | null |
| 2025-12-08 | How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations | JV Roig et.al. | 2512.07497 | null |
| 2025-12-08 | Enhancing Agentic RL with Progressive Reward Shaping and Value-based Sampling Policy Optimization | Zhuoran Zhuang et.al. | 2512.07478 | null |
| 2025-12-08 | Gait-Adaptive Perceptive Humanoid Locomotion with Real-Time Under-Base Terrain Reconstruction | Haolin Song et.al. | 2512.07464 | null |
| 2025-12-08 | Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning | Tong Wu et.al. | 2512.07461 | null |
| 2025-12-08 | From Show Programmes to Data: Designing a Workflow to Make Performing Arts Ephemera Accessible Through Language Models | Clarisse Bardiot et.al. | 2512.07452 | null |
| 2025-12-08 | KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models | Chenwei Shi et.al. | 2512.07437 | null |
| 2025-12-08 | Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models | Haidong Kang et.al. | 2512.07419 | null |
| 2025-12-08 | Adaptive Tuning of Parameterized Traffic Controllers via Multi-Agent Reinforcement Learning | Giray Önür et.al. | 2512.07417 | null |
| 2025-12-08 | Training Language Models to Use Prolog as a Tool | Niklas Mellgren et.al. | 2512.07407 | null |
| 2025-12-08 | Control and Reinforcement Learning through the Lens of Optimization: An Algorithmic Perspective | Tolga Ok et.al. | 2512.07377 | null |
| 2025-12-08 | ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning | Byungju Kim et.al. | 2512.07371 | null |
| 2025-12-08 | Multi-Rigid-Body Approximation of Human Hands with Application to Digital Twin | Bin Zhao et.al. | 2512.07359 | null |
| 2025-12-08 | PrivORL: Differentially Private Synthetic Dataset for Offline Reinforcement Learning | Chen Gong et.al. | 2512.07342 | null |
| 2025-12-08 | RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language Translation | Zhi Rao et.al. | 2512.07273 | null |
| 2025-12-08 | SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks | Florian Tretter et.al. | 2512.07266 | null |
| 2025-12-08 | Benchmarking Humanoid Imitation Learning with Motion Difficulty | Zhaorui Meng et.al. | 2512.07248 | null |
| 2025-12-08 | Towards Robust Protective Perturbation against DeepFake Face Swapping | Hengyang Yao et.al. | 2512.07228 | null |
| 2025-12-08 | Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation | Zhaoyang Liu et.al. | 2512.07212 | null |
| 2025-12-08 | MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning | Xuhui Zheng et.al. | 2512.07203 | null |
| 2025-12-08 | Less is More: Non-uniform Road Segments are Efficient for Bus Arrival Prediction | Zhen Huang et.al. | 2512.07200 | null |
| 2025-12-08 | Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models | Fenghua Weng et.al. | 2512.07141 | null |
| 2025-12-08 | TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning | Zebin Xing et.al. | 2512.07135 | null |
| 2025-12-08 | Surrogate compliance modeling enables reinforcement learned locomotion gaits for soft robots | Jue Wang et.al. | 2512.07114 | null |
| 2025-12-07 | A Hetero-Associative Sequential Memory Model Utilizing Neuromorphic Signals: Validated on a Mobile Manipulator | Runcong Wang et.al. | 2512.07032 | null |
| 2025-12-07 | Utilizing Multi-Agent Reinforcement Learning with Encoder-Decoder Architecture Agents to Identify Optimal Resection Location in Glioblastoma Multiforme Patients | Krishna Arun et.al. | 2512.06990 | null |
| 2025-12-07 | LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding | Yu Yu et.al. | 2512.06982 | null |
| 2025-12-07 | Neuro-Vesicles: Neuromodulation Should Be a Dynamical System, Not a Tensor Decoration | Zilin Li et.al. | 2512.06966 | null |
| 2025-12-07 | Statistical analysis of Inverse Entropy-regularized Reinforcement Learning | Denis Belomestny et.al. | 2512.06956 | null |
| 2025-12-07 | Deep Reinforcement Learning for Phishing Detection with Transformer-Based Semantic Features | Aseer Al Faisal et.al. | 2512.06925 | null |
| 2025-12-07 | Parent-Guided Semantic Reward Model (PGSRM): Embedding-Based Reward Functions for Reinforcement Learning of Transformer Language Models | Alexandr Plashchinsky et.al. | 2512.06920 | null |
| 2025-12-07 | Know your Trajectory – Trustworthy Reinforcement Learning deployment through Importance-Based Trajectory Analysis | Clifford F et.al. | 2512.06917 | null |
| 2025-12-07 | Khalasi: Energy-Efficient Navigation for Surface Vehicles in Vortical Flow Fields | Rushiraj Gadhvi et.al. | 2512.06912 | null |
| 2025-12-07 | An Analysis of Large Language Models for Simulating User Responses in Surveys | Ziyun Yu et.al. | 2512.06874 | null |
| 2025-12-07 | JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models | Ce Chi et.al. | 2512.06859 | null |
| 2025-12-07 | Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning | Tingyu Li et.al. | 2512.06835 | null |
| 2025-12-07 | MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning | Yueqian Wang et.al. | 2512.06810 | null |
| 2025-12-07 | PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance | Jifar Wakuma Ayana et.al. | 2512.06747 | null |
| 2025-12-07 | The Role of Entropy in Visual Grounding: Analysis and Optimization | Shuo Li et.al. | 2512.06726 | null |
| 2025-12-07 | RunawayEvil: Jailbreaking the Image-to-Video Generative Models | Songping Wang et.al. | 2512.06674 | null |
| 2025-12-07 | LightSearcher: Efficient DeepSearch via Experiential Memory | Hengzhi Lan et.al. | 2512.06653 | null |
| 2025-12-07 | Analyzing Collision Rates in Large-Scale Mixed Traffic Control via Multi-Agent Reinforcement Learning | Muyang Fan et.al. | 2512.06645 | null |
| 2025-12-07 | Learning to Hedge Swaptions | Zaniar Ahmadi et.al. | 2512.06639 | null |
| 2025-12-07 | MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment | Ruicheng Zhang et.al. | 2512.06628 | null |
| 2025-12-07 | A New Trajectory-Oriented Approach to Enhancing Comprehensive Crowd Navigation Performance | Xinyu Zhou et.al. | 2512.06608 | null |
| 2025-12-06 | MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding | Yuhao Su et.al. | 2512.06581 | null |
| 2025-12-06 | Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input | Zifan Xu et.al. | 2512.06571 | null |
| 2025-12-06 | A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation | Xiaocan Li et.al. | 2512.06547 | null |
| 2025-12-06 | Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning | Ming Chen et.al. | 2512.06533 | null |
| 2025-12-06 | Entropy-Controlled Intrinsic Motivation Reinforcement Learning for Quadruped Robot Locomotion in Complex Terrains | Wanru Gong et.al. | 2512.06486 | null |
| 2025-12-06 | Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control | Nathan P. Lawrence et.al. | 2512.06471 | null |
| 2025-12-06 | RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs | Runlong Zhou et.al. | 2512.06392 | null |
| 2025-12-06 | VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning | Yuji Wang et.al. | 2512.06373 | null |
| 2025-12-06 | LLM-Upgraded Graph Reinforcement Learning for Carbon-Aware Job Scheduling in Smart Manufacturing | Zhiying Yang et.al. | 2512.06351 | null |
| 2025-12-06 | ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models | Jiahao Li et.al. | 2512.06328 | null |
| 2025-12-06 | A Hybrid Physics-Based and Reinforcement Learning Framework for Electric Vehicle Charging Time Prediction | Praharshitha Aryasomayajula et.al. | 2512.06287 | null |
| 2025-12-06 | Networked Restless Multi-Arm Bandits with Reinforcement Learning | Hanmo Zhang et.al. | 2512.06274 | null |
| 2025-12-06 | Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models | Chen Yang et.al. | 2512.06266 | null |
| 2025-12-06 | Learning Without Time-Based Embodiment Resets in Soft-Actor Critic | Homayoon Farrahi et.al. | 2512.06252 | null |
| 2025-12-06 | Learning When to Switch: Adaptive Policy Selection via Reinforcement Learning | Chris Tava et.al. | 2512.06250 | null |
| 2025-12-06 | Auto-exploration for online reinforcement learning | Caleb Ju et.al. | 2512.06244 | null |
| 2025-12-06 | AI Application in Anti-Money Laundering for Sustainable and Transparent Financial Systems | Chuanhao Nie et.al. | 2512.06240 | null |
| 2025-12-05 | Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration | Huizhen Yu et.al. | 2512.06218 | null |
| 2025-12-05 | Quantifying Memory Use in Reinforcement Learning with Temporal Range | Rodney Lafuente-Mercado et.al. | 2512.06204 | null |
| 2025-12-05 | JaxWildfire: A GPU-Accelerated Wildfire Simulator for Reinforcement Learning | Ufuk Çakır et.al. | 2512.06102 | null |
| 2025-12-05 | Empathy by Design: Aligning Large Language Models for Healthcare Dialogue | Emre Umucu et.al. | 2512.06097 | null |
| 2025-12-05 | Comparative Analysis of Autonomous and Systematic Control Strategies for Hole-Doped Hubbard Clusters: Reinforcement Learning versus Physics-Guided Design | Shivanshu Dwivedi et.al. | 2512.06095 | null |
| 2025-12-05 | Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring | Mohanakrishnan Hariharan et.al. | 2512.06060 | null |
| 2025-12-05 | EditThinker: Unlocking Iterative Reasoning for Any Image Editor | Hongyu Li et.al. | 2512.05965 | null |
| 2025-12-05 | Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity | Germán Kruszewski et.al. | 2512.05962 | null |
| 2025-12-05 | Correspondence-Oriented Imitation Learning: Flexible Visuomotor Control with 3D Conditioning | Yunhao Cao et.al. | 2512.05953 | null |
| 2025-12-05 | Variational Quantum Rainbow Deep Q-Network for Optimizing Resource Allocation Problem | Truong Thanh Hung Nguyen et.al. | 2512.05946 | null |
| 2025-12-05 | Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation | Fabian Konstantinidis et.al. | 2512.05812 | null |
| 2025-12-05 | Real-time Remote Tracking and Autonomous Planning for Whale Rendezvous using Robots | Sushmita Bhattacharya et.al. | 2512.05808 | null |
| 2025-12-05 | A Fast Anti-Jamming Cognitive Radar Deployment Algorithm Based on Reinforcement Learning | Wencheng Cai et.al. | 2512.05753 | null |
| 2025-12-05 | A High-Order Immersed Boundary Method for Fluid-Structure Interaction Problems | Yingjie Xia et.al. | 2512.05733 | null |
| 2025-12-05 | Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning | Ali Krayani et.al. | 2512.05711 | null |
| 2025-12-05 | LA-RL: Language Action-guided Reinforcement Learning with Safety Guarantees for Autonomous Highway Driving | Yiming Shu et.al. | 2512.05686 | null |
| 2025-12-05 | MedTutor-R1: Socratic Personalized Medical Teaching with Multi-Agent Simulation | Zhitao He et.al. | 2512.05671 | null |
| 2025-12-05 | Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning | Zhenpeng Su et.al. | 2512.05591 | null |
| 2025-12-05 | Distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning | Pengcheng Dai et.al. | 2512.05447 | null |
| 2025-12-05 | ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction | Jiangtong Tan et.al. | 2512.05422 | null |
| 2025-12-05 | State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning | Yuxiang Liu et.al. | 2512.05335 | null |
| 2025-12-04 | Enhancing Deep Deterministic Policy Gradients on Continuous Control Tasks with Decoupled Prioritized Experience Replay | Mehmet Efe Lorasdagi et.al. | 2512.05320 | null |
| 2025-12-04 | Bridging Interpretability and Optimization: Provably Attribution-Weighted Actor-Critic in Reproducing Kernel Hilbert Spaces | Na Li et.al. | 2512.05291 | null |
| 2025-12-04 | Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem | Ali Al Housseini et.al. | 2512.05207 | null |
| 2025-12-04 | ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning | Shengyuan Ding et.al. | 2512.05111 | null |
| 2025-12-04 | STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models | Feng Xu et.al. | 2512.05107 | null |
| 2025-12-04 | Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning | Purbesh Mitra et.al. | 2512.05105 | link |
| 2025-11-06 | FoodRL: A Reinforcement Learning Ensembling Framework For In-Kind Food Donation Forecasting | Esha Sharma et.al. | 2511.04865 | null |
| 2025-11-06 | Quantum Boltzmann Machines for Sample-Efficient Reinforcement Learning | Thore Gerlach et.al. | 2511.04856 | null |
| 2025-11-06 | Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning | NVIDIA et.al. | 2511.04831 | null |
| 2025-11-06 | Unified Multimodal Diffusion Forcing for Forceful Manipulation | Zixuan Huang et.al. | 2511.04812 | null |
| 2025-11-06 | Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models | Chenxi Liu et.al. | 2511.04800 | null |
| 2025-11-05 | SMART-WRITE: Adaptive Learning-based Write Energy Optimization for Phase Change Memory | Mahek Desai et.al. | 2511.04713 | null |
| 2025-11-05 | NCSAC: Effective Neural Community Search via Attribute-augmented Conductance | Longlong Lin et.al. | 2511.04712 | null |
| 2025-11-06 | GentleHumanoid: Learning Upper-body Compliance for Contact-rich Human and Object Interaction | Qingzhou Lu et.al. | 2511.04679 | null |
| 2025-11-06 | Forgetting is Everywhere | Ben Sanati et.al. | 2511.04666 | null |
| 2025-11-06 | Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning | Hampus Åström et.al. | 2511.04598 | null |
| 2025-11-06 | End-to-End Reinforcement Learning of Koopman Models for eNMPC of an Air Separation Unit | Daniel Mayfrank et.al. | 2511.04522 | null |
| 2025-11-06 | V-Thinker: Interactive Thinking with Images | Runqi Qiao et.al. | 2511.04460 | null |
| 2025-11-06 | Fitting Reinforcement Learning Model to Behavioral Data under Bandits | Hao Zhu et.al. | 2511.04454 | null |
| 2025-11-06 | The Peril of Preference: Why GRPO fails on Ordinal Rewards | Anisha Garg et.al. | 2511.04439 | null |
| 2025-11-06 | Temporal Action Selection for Action Chunking | Yueyang Weng et.al. | 2511.04421 | null |
| 2025-11-06 | GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies | Maëlic Neau et.al. | 2511.04357 | null |
| 2025-11-06 | MacroNav: Multi-Task Context Representation Learning Enables Efficient Navigation in Unknown Environments | Kuankuan Sima et.al. | 2511.04320 | null |
| 2025-11-06 | GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents | Jian Mu et.al. | 2511.04307 | null |
| 2025-11-06 | Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference | Matteo Cercola et.al. | 2511.04286 | null |
| 2025-11-06 | RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization | Zeng Zhiyuan et.al. | 2511.04285 | null |
| 2025-11-06 | SSPO: Subsentence-level Policy Optimization | Kun Yang et.al. | 2511.04256 | null |
| 2025-11-06 | Can Context Bridge the Reality Gap? Sim-to-Real Transfer of Context-Aware Policies | Marco Iannotta et.al. | 2511.04249 | null |
| 2025-11-06 | Shared Spatial Memory Through Predictive Coding | Zhengru Fang et.al. | 2511.04235 | null |
| 2025-11-06 | Opus: A Quantitative Framework for Workflow Evaluation | Alan Seroul et.al. | 2511.04220 | null |
| 2025-11-06 | Black-Box Guardrail Reverse-engineering Attack | Hongwei Yao et.al. | 2511.04215 | null |
| 2025-11-06 | PUL-SLAM: Path-Uncertainty Co-Optimization with Lightweight Stagnation Detection for Efficient Robotic Exploration | Yizhen Yin et.al. | 2511.04180 | null |
| 2025-11-06 | Deep reinforcement learning based navigation of a jellyfish-like swimmer in flows with obstacles | Yihao Chen et.al. | 2511.04156 | null |
| 2025-11-06 | Exchange Policy Optimization Algorithm for Semi-Infinite Safe Reinforcement Learning | Jiaming Zhang et.al. | 2511.04147 | null |
| 2025-11-06 | BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning | Yitang Li et.al. | 2511.04131 | null |
| 2025-11-06 | RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning | Xinyuan Li et.al. | 2511.04120 | null |
| 2025-11-06 | CBMC-V3: A CNS-inspired Control Framework Towards Manipulation Agility with SNN | Yanbo Pang et.al. | 2511.04109 | null |
| 2025-11-06 | Necessary and Sufficient Conditions for the Optimization-Based Concurrent Execution of Learned Robotic Tasks | Sheikh A. Tahmid et.al. | 2511.04054 | null |
| 2025-11-06 | Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots | Yushi Wang et.al. | 2511.03996 | null |
| 2025-11-06 | Adaptive Temporal Refinement: Continuous Depth Allocation and Distance Regression for Efficient Action Localization | Ibne Farabi Shihab et.al. | 2511.03943 | null |
| 2025-11-06 | RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods | Raghav Sharma et.al. | 2511.03939 | null |
| 2025-11-05 | Learning to shine: Neuroevolution enables optical control of phase transitions | Sraddha Agrawal et.al. | 2511.03895 | null |
| 2025-11-05 | Investigating Robot Control Policy Learning for Autonomous X-ray-guided Spine Procedures | Florence Klitzner et.al. | 2511.03882 | null |
| 2025-11-05 | From Static to Dynamic: Enhancing Offline-to-Online Reinforcement Learning via Energy-Guided Diffusion Stratification | Lipeng Zu et.al. | 2511.03828 | null |
| 2025-11-05 | Scaling Agent Learning via Experience Synthesis | Zhaorun Chen et.al. | 2511.03773 | null |
| 2025-11-05 | Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning | Richard Dewey et.al. | 2511.03724 | null |
| 2025-11-05 | Shrinking the Variance: Shrinkage Baselines for Reinforcement Learning with Verifiable Rewards | Guanning Zeng et.al. | 2511.03710 | null |
| 2025-11-05 | AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing | Mohsen Ahmadzadeh et.al. | 2511.03697 | null |
| 2025-11-05 | Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL | Lipeng Zu et.al. | 2511.03695 | null |
| 2025-11-05 | Simulation-Based Validation of an Integrated 4D/5D Digital-Twin Framework for Predictive Construction Control | Atena Khoshkonesh et.al. | 2511.03684 | null |
| 2025-11-05 | DQN Performance with Epsilon Greedy Policies and Prioritized Experience Replay | Daniel Perkins et.al. | 2511.03670 | null |
| 2025-11-05 | Towards Formalizing Reinforcement Learning Theory | Shangtong Zhang et.al. | 2511.03618 | null |
| 2025-11-05 | Going Beyond Expert Performance via Deep Implicit Imitation Reinforcement Learning | Iason Chrysomallis et.al. | 2511.03616 | null |
| 2025-11-05 | Tensor-Efficient High-Dimensional Q-learning | Junyi Wu et.al. | 2511.03595 | null |
| 2025-11-05 | PerfDojo: Automated ML Library Generation for Heterogeneous Architectures | Andrei Ivanov et.al. | 2511.03586 | null |
| 2025-11-05 | Imitation Learning in the Deep Learning Era: A Novel Taxonomy and Recent Advances | Iason Chrysomallis et.al. | 2511.03565 | null |
| 2025-11-05 | Learning Without Critics? Revisiting GRPO in Classical Reinforcement Learning Environments | Bryan L. M. de Oliveira et.al. | 2511.03527 | null |
| 2025-11-05 | Reinforcement Learning Using known Invariances | Alexandru Cioba et.al. | 2511.03473 | null |
| 2025-11-05 | Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG | Longpeng Qiu et.al. | 2511.03410 | null |
| 2025-11-05 | Adaptable Hindsight Experience Replay for Search-Based Learning | Alexandros Vazaios et.al. | 2511.03405 | null |
| 2025-11-05 | Learning Communication Skills in Multi-task Multi-agent Deep Reinforcement Learning | Changxi Zhu et.al. | 2511.03348 | null |
| 2025-11-05 | DRL-Based Robust Multi-Timescale Anti-Jamming Approaches under State Uncertainty | Haoqin Zhao et.al. | 2511.03305 | null |
| 2025-11-05 | Multi-Objective Adaptive Rate Limiting in Microservices Using Deep Reinforcement Learning | Ning Lyu et.al. | 2511.03279 | null |
| 2025-11-05 | Climate Adaptation with Reinforcement Learning: Economic vs. Quality of Life Adaptation Pathways | Miguel Costa et.al. | 2511.03243 | null |
| 2025-11-05 | Incorporating Quality of Life in Climate Adaptation Planning via Reinforcement Learning | Miguel Costa et.al. | 2511.03238 | null |
| 2025-11-05 | Collaborative Assembly Policy Learning of a Sightless Robot | Zeqing Zhang et.al. | 2511.03189 | null |
| 2025-11-05 | Periodic Skill Discovery | Jonghae Park et.al. | 2511.03187 | null |
| 2025-11-05 | Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control | Rewida Ali et.al. | 2511.03181 | null |
| 2025-11-05 | Optimizing Earth-Moon Transfer and Cislunar Navigation: Integrating Low-Energy Trajectories, AI Techniques and GNSS-R Technologies | Arsalan Muhammad et.al. | 2511.03173 | null |
| 2025-11-05 | Learning Natural and Robust Hexapod Locomotion over Complex Terrains via Motion Priors based on Deep Reinforcement Learning | Xin Liu et.al. | 2511.03167 | null |
| 2025-11-05 | Accelerating inverse materials design using generative diffusion models with reinforcement learning | Junwu Chen et.al. | 2511.03112 | null |
| 2025-11-05 | Scaling Multi-Agent Environment Co-Design with Diffusion Models | Hao Xiang Li et.al. | 2511.03100 | null |
| 2025-11-04 | Leveraging Discrete Function Decomposability for Scientific Design | James C. Bowden et.al. | 2511.03032 | null |
| 2025-11-04 | Value of Information-Enhanced Exploration in Bootstrapped DQN | Stergios Plataniotis et.al. | 2511.02969 | null |
| 2025-11-04 | Digital Twin-Driven Pavement Health Monitoring and Maintenance Optimization Using Graph Neural Networks | Mohsin Mahmud Topu et.al. | 2511.02957 | null |
| 2025-11-04 | Audience Amplified: Virtual Audiences in Asynchronously Performed AR Theater | You-Jin Kim et.al. | 2511.02807 | null |
| 2025-11-04 | MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning | Qianhao Yuan et.al. | 2511.02805 | null |
| 2025-11-04 | From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos | Xun Wang et.al. | 2511.02762 | null |
| 2025-11-04 | Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning | Bowen Jin et.al. | 2511.02755 | null |
| 2025-11-04 | VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models | Zhicheng Zhang et.al. | 2511.02712 | null |
| 2025-11-04 | Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs | Georgios Tzannetos et.al. | 2511.02690 | null |
| 2025-11-04 | RL-Aided Cognitive ISAC: Robust Detection and Sensing-Communication Trade-offs | Adam Umra et.al. | 2511.02672 | null |
| 2025-11-04 | Natural-gas storage modelling by deep reinforcement learning | Tiziano Balaconi et.al. | 2511.02646 | null |
| 2025-11-04 | Adaptive GR(1) Specification Repair for Liveness-Preserving Shielding in Reinforcement Learning | Tiberiu-Andrei Georgescu et.al. | 2511.02605 | null |
| 2025-11-04 | Directional-Clamp PPO | Gilad Karpel et.al. | 2511.02577 | null |
| 2025-11-04 | Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning | Yixiu Mao et.al. | 2511.02567 | null |
| 2025-11-04 | An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems | Changhao Miao et.al. | 2511.02525 | null |
| 2025-11-04 | Dexterous Robotic Piano Playing at Scale | Le Chen et.al. | 2511.02504 | null |
| 2025-11-04 | Auditable-choice reframing unlocks RL-based verification for open-ended tasks | Mengyu Zhang et.al. | 2511.02463 | null |
| 2025-11-04 | ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension | Duo Xu et.al. | 2511.02415 | null |
| 2025-11-04 | Large-scale automatic carbon ion treatment planning for head and neck cancers via parallel multi-agent reinforcement learning | Jueye Zhang et.al. | 2511.02314 | null |
| 2025-11-04 | Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning | Beyazit Yalcinkaya et.al. | 2511.02304 | null |
| 2025-11-04 | Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation | Zhiwei Zhang et.al. | 2511.02303 | null |
| 2025-11-04 | Reinforcement learning based data assimilation for unknown state model | Ziyi Wang et.al. | 2511.02286 | null |
| 2025-11-04 | SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning | Fangxun Shu et.al. | 2511.02280 | null |
| 2025-11-04 | Structural Plasticity as Active Inference: A Biologically-Inspired Architecture for Homeostatic Control | Brennen A. Hill et.al. | 2511.02241 | null |
| 2025-11-04 | Learning Interactive World Model for Object-Centric Reinforcement Learning | Fan Feng et.al. | 2511.02225 | null |
| 2025-11-04 | Optimizing Multi-Lane Intersection Performance in Mixed Autonomy Environments | Manonmani Sekar et.al. | 2511.02217 | null |
| 2025-11-04 | Adaptive Cooperative Transmission Design for Ultra-Reliable Low-Latency Communications via Deep Reinforcement Learning | Hyemin Yu et.al. | 2511.02216 | null |
| 2025-11-04 | Training Proactive and Personalized LLM Agents | Weiwei Sun et.al. | 2511.02208 | null |
| 2025-11-04 | A Quantitative Comparison of Centralised and Distributed Reinforcement Learning-Based Control for Soft Robotic Arms | Linxin Hou et.al. | 2511.02192 | null |
| 2025-11-03 | JaxMARL-HFT: GPU-Accelerated Large-Scale Multi-Agent Reinforcement Learning for High-Frequency Trading | Valentin Mohl et.al. | 2511.02136 | null |
| 2025-11-03 | Second-Order Policy Gradient Methods for the Linear Quadratic Regulator | Amirreza Valaei et.al. | 2511.02095 | null |
| 2025-11-03 | Automated Reward Design for Gran Turismo | Michel Ma et.al. | 2511.02094 | null |
| 2025-11-03 | Deep Reinforcement Learning for Multi-flow Routing in Heterogeneous Wireless Networks | Brian Kim et.al. | 2511.02030 | null |
| 2025-11-03 | ABIDES-MARL: A Multi-Agent Reinforcement Learning Environment for Endogenous Price Formation and Execution in a Limit Order Book | Patrick Cheridito et.al. | 2511.02016 | null |
| 2025-11-02 | Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR | Abdelaziz Bounhar et.al. | 2511.01937 | link |
| 2025-11-02 | Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch | Yirong Zeng et.al. | 2511.01934 | null |
| 2025-11-03 | GenDexHand: Generative Simulation for Dexterous Hands | Feng Chen et.al. | 2511.01791 | null |
| 2025-11-03 | MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll | Alexander Schperberg et.al. | 2511.01774 | null |
| 2025-11-03 | RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks | Mian Wu et.al. | 2511.01758 | null |
| 2025-11-03 | Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding | Jungyeon Koh et.al. | 2511.01695 | null |
| 2025-11-03 | Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward | Xiaogang Xu et.al. | 2511.01645 | null |
| 2025-11-03 | Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models | Xiaoyu Zhan et.al. | 2511.01618 | null |
| 2025-11-03 | L2T-Tune: LLM-Guided Hybrid Database Tuning with LHS and TD3 | Xinyue Yang et.al. | 2511.01602 | null |
| 2025-11-03 | Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning | Aditya Kapoor et.al. | 2511.01554 | null |
| 2025-11-03 | TPS-Bench: Evaluating AI Agents’ Tool Planning & Scheduling Abilities in Compounding Tasks | Hanwen Xu et.al. | 2511.01527 | null |
| 2025-11-03 | BARD: budget-aware reasoning distillation | Lujie Niu et.al. | 2511.01470 | null |
| 2025-11-03 | Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis | Yuhang Huang et.al. | 2511.01425 | null |
| 2025-11-03 | Modulation of temporal decision-making in a deep reinforcement learning agent under the dual-task paradigm | Amrapali Pednekar et.al. | 2511.01415 | null |
| 2025-11-03 | AoI-Aware Machine Learning for Constrained Multimodal Sensing-Aided Communications | Abolfazl Zakeri et.al. | 2511.01406 | null |
| 2025-11-03 | Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization | Ziqi Wang et.al. | 2511.01374 | null |
| 2025-11-03 | Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series | Wenrui Cai et.al. | 2511.01354 | null |
| 2025-11-03 | Diffusion-Based Solver for CNF Placement on the Cloud-Continuum | Álvaro Vázquez Rodríguez et.al. | 2511.01343 | null |
| 2025-11-03 | RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models | Hongyin Zhang et.al. | 2511.01331 | null |
| 2025-11-03 | From Pixels to Cooperation Multi Agent Reinforcement Learning based on Multimodal World Models | Sureyya Akin et.al. | 2511.01310 | null |
| 2025-11-03 | Optimizing Electric Vehicle Charging Station Placement Using Reinforcement Learning and Agent-Based Simulations | Minh-Duc Nguyen et.al. | 2511.01218 | null |
| 2025-11-03 | Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering | Riddhi Jain et.al. | 2511.01213 | null |
| 2025-11-03 | DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection | Guoxin Ma et.al. | 2511.01192 | null |
| 2025-11-03 | Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning | Ru Wang et.al. | 2511.01191 | null |
| 2025-11-03 | DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models | Ruofan Zhang et.al. | 2511.01170 | null |
| 2025-11-02 | SLAP: Shortcut Learning for Abstract Planning | Y. Isabel Liu et.al. | 2511.01107 | null |
| 2025-11-02 | HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning | Yujian Liu et.al. | 2511.01104 | null |
| 2025-11-02 | Deployable Vision-driven UAV River Navigation via Human-in-the-loop Preference Alignment | Zihan Wang et.al. | 2511.01083 | null |
| 2025-11-02 | Predictive Auxiliary Learning for Belief-based Multi-Agent Systems | Qinwei Huang et.al. | 2511.01078 | null |
| 2025-11-02 | Quantum Reinforcement Learning for 6G and Beyond Wireless Networks | Dinh-Hieu Tran et.al. | 2511.01070 | null |
| 2025-11-02 | Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning | Wenjin Liu et.al. | 2511.01016 | link |
| 2025-11-02 | IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation | Bosi Wen et.al. | 2511.01014 | null |
| 2025-11-02 | MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL | Haolin Yang et.al. | 2511.01008 | link |
| 2025-11-02 | GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies | Ziye Wang et.al. | 2511.00998 | null |
| 2025-11-02 | Optimizing Energy and Latency in 6G Smart Cities with Edge CyberTwins | Amine Abouaomar et.al. | 2511.00955 | null |
| 2025-11-02 | KFCPO: Kronecker-Factored Approximated Constrained Policy Optimization | Joonyoung Lim et.al. | 2511.00880 | null |
| 2025-11-02 | Optimal Undulatory Swimming with Constrained Deformation and Actuation Intervals | Fumiya Tokoro et.al. | 2511.00816 | null |
| 2025-11-02 | Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games | Runyu Lu et.al. | 2511.00811 | null |
| 2025-11-02 | Do Math Reasoning LLMs Help Predict the Impact of Public Transit Events? | Bowen Fang et.al. | 2511.00808 | null |
| 2025-11-02 | Logic-informed reinforcement learning for cross-domain optimization of large-scale cyber-physical systems | Guangxi Wan et.al. | 2511.00806 | null |
| 2025-11-02 | GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents | Jie JW Wu et.al. | 2511.00802 | null |
| 2025-11-02 | Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration | Yan Sun et.al. | 2511.00794 | null |
| 2025-11-02 | Power Control Based on Multi-Agent Deep Q Network for D2D Communication | Shi Gengtian et.al. | 2511.00767 | null |
| 2025-11-01 | Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries | Minghe Shen et.al. | 2511.00710 | null |
| 2025-11-01 | PreferThinker: Reasoning-based Personalized Image Preference Assessment | Shengqi Xu et.al. | 2511.00609 | null |
| 2025-11-01 | OpenSIR: Open-Ended Self-Improving Reasoner | Wai-Chung Kwan et.al. | 2511.00602 | link |
| 2025-11-01 | Improving Robustness to Out-of-Distribution States in Imitation Learning via Deep Koopman-Boosted Diffusion Policy | Dianye Huang et.al. | 2511.00555 | null |
| 2025-11-01 | Single-agent Reinforcement Learning Model for Regional Adaptive Traffic Signal Control | Qiang Li et.al. | 2511.00551 | null |
| 2025-11-01 | Robust Single-Agent Reinforcement Learning for Regional Traffic Signal Control Under Demand Fluctuations | Qiang Li et.al. | 2511.00549 | null |
| 2025-11-01 | ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation | Panwang Pan et.al. | 2511.00511 | null |
| 2025-11-01 | GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining | Chunyu Wei et.al. | 2511.00457 | null |
| 2025-11-01 | Bootstrap Off-policy with World Model | Guojian Zhan et.al. | 2511.00423 | null |
| 2025-11-01 | UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings | Zhibin Lan et.al. | 2511.00405 | link |
| 2025-11-01 | CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks | Long Li et.al. | 2511.00396 | null |
| 2025-11-01 | VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning | Xuanle Zhao et.al. | 2511.00391 | link |
| 2025-11-01 | Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond | Fan Zhang et.al. | 2511.00389 | null |
| 2025-11-01 | Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict | Chaochen Wu et.al. | 2511.00370 | null |
| 2025-10-31 | Reinforcement Learning for Resource Allocation in Vehicular Multi-Fog Computing | Mohammad Hadi Akbarzadeh et.al. | 2511.00276 | null |
| 2025-10-31 | Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning | Michiel Straat et.al. | 2511.00272 | null |
| 2025-10-31 | Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning | Marwa Abdulhai et.al. | 2511.00222 | null |
| 2025-10-31 | Iterative Foundation Model Fine-Tuning on Multiple Rewards | Pouya M. Ghari et.al. | 2511.00220 | null |
| 2025-10-31 | Deep reinforcement learning for optimal trading with partial information | Andrea Macrì et.al. | 2511.00190 | null |
| 2025-10-31 | Study on Supply Chain Finance Decision-Making Model and Enterprise Economic Performance Prediction Based on Deep Reinforcement Learning | Shiman Zhang et.al. | 2511.00166 | null |
| 2025-10-31 | EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations | Justin Yu et.al. | 2511.00153 | null |
| 2025-10-31 | A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control | Qing Guo et.al. | 2511.00136 | null |
| 2025-10-31 | DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads | Antonio Guillen-Perez et.al. | 2511.00117 | null |
| 2025-10-31 | LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers | Avisek Naug et.al. | 2511.00116 | null |
| 2025-10-31 | End-to-End Framework Integrating Generative AI and Deep Reinforcement Learning for Autonomous Ultrasound Scanning | Hanae Elmekki et.al. | 2511.00114 | null |
| 2025-10-30 | Real-DRL: Teach and Learn in Reality | Yanbing Mao et.al. | 2511.00112 | null |
| 2025-10-30 | Self-Improving Vision-Language-Action Models with Data Generation via Residual RL | Wenli Xiao et.al. | 2511.00091 | null |
| 2025-10-30 | Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail | NVIDIA et.al. | 2511.00088 | null |
| 2025-10-29 | Token-Regulated Group Relative Policy Optimization for Stable Reinforcement Learning in Large Language Models | Tue Le et.al. | 2511.00066 | null |
| 2025-10-31 | Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems | Alireza Saleh Abadi et.al. | 2510.27659 | null |
| 2025-10-31 | Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning | Yuhong Liu et.al. | 2510.27606 | link |
| 2025-10-31 | MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval | Qi Luo et.al. | 2510.27569 | null |
| 2025-10-31 | Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval | Yulong Hui et.al. | 2510.27566 | null |
| 2025-10-31 | VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision | Xuan Gong et.al. | 2510.27462 | null |
| 2025-10-31 | Learning Soft Robotic Dynamics with Active Exploration | Hehui Zheng et.al. | 2510.27428 | null |
| 2025-10-31 | DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains | Tian Liang et.al. | 2510.27419 | null |
| 2025-10-31 | Realistic pedestrian-driver interaction modelling using multi-agent RL with human perceptual-motor constraints | Yueyang Wang et.al. | 2510.27383 | null |
| 2025-10-31 | Reasoning Models Sometimes Output Illegible Chains of Thought | Arun Jose et.al. | 2510.27338 | null |
| 2025-10-31 | When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making | Ali Raza Jafree et.al. | 2510.27334 | null |
| 2025-10-31 | Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines | Kristina Levina et.al. | 2510.27329 | null |
| 2025-10-31 | A Digital Twin-based Multi-Agent Reinforcement Learning Framework for Vehicle-to-Grid Coordination | Zhengchang Hua et.al. | 2510.27289 | null |
| 2025-10-31 | Inferring trust in recommendation systems from brain, behavioural, and physiological data | Vincent K. M. Cheung et.al. | 2510.27272 | null |
| 2025-10-31 | MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models | Kangkun Mao et.al. | 2510.27267 | null |
| 2025-10-31 | GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation | Tao Liu et.al. | 2510.27210 | null |
| 2025-10-31 | ShapleyPipe: Hierarchical Shapley Search for Data Preparation Pipeline Construction | Jing Chang et.al. | 2510.27168 | null |
| 2025-10-31 | Disrupting Networks: Amplifying Social Dissensus via Opinion Perturbation and Large Language Models | Erica Coppolillo et.al. | 2510.27152 | null |
| 2025-10-31 | AURA: A Reinforcement Learning Framework for AI-Driven Adaptive Conversational Surveys | Jinwen Tang et.al. | 2510.27126 | null |
| 2025-10-31 | Towards Understanding Self-play for LLM Reasoning | Justin Yang Chae et.al. | 2510.27072 | null |
| 2025-10-31 | Distributed Precoding for Cell-free Massive MIMO in O-RAN: A Multi-agent Deep Reinforcement Learning Framework | Mohammad Hossein Shokouhi et.al. | 2510.27069 | null |
| 2025-10-31 | Adaptive Human-Computer Interaction Strategies Through Reinforcement Learning in Complex | Rui Liu et.al. | 2510.27058 | null |
| 2025-10-30 | SpikeATac: A Multimodal Tactile Finger with Taxelized Dynamic Sensing for Dexterous Manipulation | Eric T. Chang et.al. | 2510.27048 | null |
| 2025-10-30 | Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning | Md Tanvirul Alam et.al. | 2510.27044 | link |
| 2025-10-30 | e1: Learning Adaptive Control of Reasoning Effort | Michael Kleinman et.al. | 2510.27042 | null |
| 2025-10-30 | Algorithmic Predation: Equilibrium Analysis in Dynamic Oligopolies with Smooth Market Sharing | Fabian Raoul Pieroth et.al. | 2510.27008 | null |
| 2025-10-30 | A Framework for Fair Evaluation of Variance-Aware Bandit Algorithms | Elise Wolf et.al. | 2510.27001 | null |
| 2025-10-30 | Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench | Fenfen Lin et.al. | 2510.26865 | link |
| 2025-10-30 | Defeating the Training-Inference Mismatch via FP16 | Penghui Qi et.al. | 2510.26788 | link |
| 2025-10-30 | A General Incentives-Based Framework for Fairness in Multi-agent Resource Allocation | Ashwin Kumar et.al. | 2510.26740 | null |
| 2025-10-30 | Stabilizing Rayleigh-Benard convection with reinforcement learning trained on a reduced-order model | Qiwei Chen et.al. | 2510.26705 | null |
| 2025-10-30 | Kimi Linear: An Expressive, Efficient Attention Architecture | Kimi Team et.al. | 2510.26692 | link |
| 2025-10-30 | Action-Driven Processes for Continuous-Time Control | Ruimin He et.al. | 2510.26672 | null |
| 2025-10-30 | Hybrid Consistency Policy: Decoupling Multi-Modal Diversity and Real-Time Efficiency in Robotic Manipulation | Qianyou Zhao et.al. | 2510.26670 | null |
| 2025-10-30 | The Era of Agentic Organization: Learning to Organize with Language Models | Zewen Chi et.al. | 2510.26658 | null |
| 2025-10-30 | Hybrid DQN-TD3 Reinforcement Learning for Autonomous Navigation in Dynamic Environments | Xiaoyi He et.al. | 2510.26646 | null |
| 2025-10-30 | Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications | Chuang Zhang et.al. | 2510.26628 | null |
| 2025-10-30 | A DRL-Empowered Multi-Level Jamming Approach for Secure Semantic Communication | Weixuan Chen et.al. | 2510.26610 | null |
| 2025-10-30 | Emu3.5: Native Multimodal Models are World Learners | Yufeng Cui et.al. | 2510.26583 | link |
| 2025-10-30 | InfoFlow: Reinforcing Search Agent Via Reward Density Optimization | Kun Luo et.al. | 2510.26575 | null |
| 2025-10-30 | Adaptive Inverse Kinematics Framework for Learning Variable-Length Tool Manipulation in Robotics | Prathamesh Kothavale et.al. | 2510.26551 | null |
| 2025-10-30 | Think Outside the Policy: In-Context Steered Policy Optimization | Hsiu-Yuan Huang et.al. | 2510.26519 | null |
| 2025-10-30 | Data-Efficient RLVR via Off-Policy Influence Guidance | Erle Zhu et.al. | 2510.26491 | null |
| 2025-10-30 | ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems | Qiaoling Chen et.al. | 2510.26475 | null |
| 2025-10-30 | PolarZero: A Reinforcement Learning Approach for Low-Complexity Polarization Kernel Design | Yi-Ting Hong et.al. | 2510.26452 | null |
| 2025-10-30 | An Impulse Control Approach to Market Making in a Hawkes LOB Market | Konark Jain et.al. | 2510.26438 | null |
| 2025-10-30 | Human-in-the-loop Online Rejection Sampling for Robotic Manipulation | Guanxing Lu et.al. | 2510.26406 | null |
| 2025-10-30 | Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning | Wenchang Duan et.al. | 2510.26389 | null |
| 2025-10-30 | Towards Reinforcement Learning Based Log Loading Automation | Ilya Kurinov et.al. | 2510.26363 | null |
| 2025-10-30 | Reinforcement Learning for Pollution Detection in a Randomized, Sparse and Nonstationary Environment with an Autonomous Underwater Vehicle | Sebastian Zieglmeier et.al. | 2510.26347 | null |
| 2025-10-30 | Offline Clustering of Preference Learning with Active-data Augmentation | Jingyuan Liu et.al. | 2510.26301 | null |
| 2025-10-30 | Beyond Imitation: Constraint-Aware Trajectory Generation with Flow Matching For End-to-End Autonomous Driving | Lin Liu et.al. | 2510.26292 | null |
| 2025-10-30 | Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search | Guochang Li et.al. | 2510.26287 | null |
| 2025-10-30 | Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments | Gangyang Li et.al. | 2510.26280 | null |
| 2025-10-30 | Graph-Enhanced Policy Optimization in LLM Agent Training | Jiazhen Yuan et.al. | 2510.26270 | null |
| 2025-10-30 | A Game-Theoretic Spatio-Temporal Reinforcement Learning Framework for Collaborative Public Resource Allocation | Songxin Lei et.al. | 2510.26184 | null |
| 2025-10-30 | One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning | Renhao Li et.al. | 2510.26167 | null |
| 2025-10-30 | Learning to Manage Investment Portfolios beyond Simple Utility Functions | Maarten P. Scholl et.al. | 2510.26165 | null |
| 2025-10-30 | Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math | Bo Pang et.al. | 2510.26143 | null |
| 2025-10-30 | EgoExo-Con: Exploring View-Invariant Video Temporal Understanding | Minjoon Jung et.al. | 2510.26113 | null |
| 2025-10-30 | Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error | Chenming Tang et.al. | 2510.26109 | null |
| 2025-10-30 | GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks | Chenrui Shi et.al. | 2510.26098 | null |
| 2025-10-30 | Network-Constrained Policy Optimization for Adaptive Multi-agent Vehicle Routing | Fazel Arasteh et.al. | 2510.26089 | null |
| 2025-10-30 | Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion | Chi Zhang et.al. | 2510.26067 | null |
| 2025-10-30 | Accelerating Real-World Overtaking in F1TENTH Racing Employing Reinforcement Learning Methods | Emily Steiner et.al. | 2510.26040 | null |
| 2025-10-29 | Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation | Feichen Gan et.al. | 2510.26026 | null |
| 2025-10-29 | PORTool: Tool-Use LLM Training with Rewarded Tree | Feijie Wu et.al. | 2510.26020 | null |
| 2025-10-29 | Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning | Yihe Deng et.al. | 2510.25992 | null |
| 2025-10-29 | Estimating cognitive biases with attention-aware inverse planning | Sounak Banerjee et.al. | 2510.25951 | null |
| 2025-10-29 | InputDSA: Demixing then Comparing Recurrent and Externally Driven Dynamics | Ann Huang et.al. | 2510.25943 | null |
| 2025-10-29 | Multi-Agent Reinforcement Learning for Market Making: Competition without Collusion | Ziyi Wang et.al. | 2510.25929 | null |
| 2025-10-29 | $\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models | Kang Chen et.al. | 2510.25889 | null |
| 2025-10-29 | Approximating Human Preferences Using a Multi-Judge Learned System | Eitán Sprejer et.al. | 2510.25884 | null |
| 2025-10-29 | MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs | Xiaoke Huang et.al. | 2510.25867 | null |
| 2025-10-29 | Adversarial Pre-Padding: Generating Evasive Network Traffic Against Transformer-Based Classifiers | Quanliang Jing et.al. | 2510.25810 | null |
| 2025-10-29 | MetaLore: Learning to Orchestrate Communication and Computation for Metaverse Synchronization | Elif Ebru Ohri et.al. | 2510.25705 | null |
| 2025-10-29 | PairUni: Pairwise Training for Unified Multimodal Language Models | Jiani Zheng et.al. | 2510.25682 | null |
| 2025-10-29 | Navigation in a Three-Dimensional Urban Flow using Deep Reinforcement Learning | Federica Tonti et.al. | 2510.25679 | null |
| 2025-10-29 | ALDEN: Reinforcement Learning for Active Navigation and Evidence Gathering in Long Documents | Tianyu Yang et.al. | 2510.25668 | null |
| 2025-10-29 | Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills | Weikang Wan et.al. | 2510.25634 | null |
| 2025-10-29 | EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis | Yusheng Liao et.al. | 2510.25628 | null |
| 2025-10-29 | On the instability of local learning algorithms: Q-learning can fail in infinite state spaces | Urtzi Ayesta et.al. | 2510.25572 | null |
| 2025-10-29 | Deep Reinforcement Learning-Based Cooperative Rate Splitting for Satellite-to-Underground Communication Networks | Kaiqiang Lin et.al. | 2510.25562 | null |
| 2025-10-29 | Off-policy Reinforcement Learning with Model-based Exploration Augmentation | Likun Wang et.al. | 2510.25529 | null |
| 2025-10-29 | Zero Reinforcement Learning Towards General Domains | Yuyuan Zeng et.al. | 2510.25528 | null |
| 2025-10-29 | MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL | Zekun Xu et.al. | 2510.25510 | null |
| 2025-10-29 | Dynamic Beamforming and Power Allocation in ISAC via Deep Reinforcement Learning | Duc Nguyen Dao et.al. | 2510.25496 | null |
| 2025-10-29 | Reinforcement Learning techniques for the flavor problem in particle physics | A. Giarnetti et.al. | 2510.25495 | null |
| 2025-10-29 | Generalized Pseudo-Relevance Feedback | Yiteng Tu et.al. | 2510.25488 | null |
| 2025-10-29 | Sim-to-Real Gentle Manipulation of Deformable and Fragile Objects with Stress-Guided Reinforcement Learning | Kei Ikemura et.al. | 2510.25405 | null |
| 2025-10-29 | Model-Free Robust Beamforming in Satellite Downlink using Reinforcement Learning | Alea Schröder et.al. | 2510.25393 | null |
| 2025-10-29 | Multi-party Agent Relation Sampling for Multi-party Ad Hoc Teamwork | Beiwen Zhang et.al. | 2510.25340 | null |
| 2025-10-29 | GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning | Jiaqi Wu et.al. | 2510.25320 | null |
| 2025-10-29 | Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning | Sagalpreet Singh et.al. | 2510.25311 | null |
| 2025-10-29 | Adaptive Design of mmWave Initial Access Codebooks using Reinforcement Learning | Sabrine Aroua et.al. | 2510.25271 | null |
| 2025-10-29 | The influence of the random numbers quality on the results in stochastic simulations and machine learning | Benjamin A. Antunes et.al. | 2510.25269 | null |
| 2025-10-29 | SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation | Wang Zhi et.al. | 2510.25268 | null |
| 2025-10-29 | One-shot Humanoid Whole-body Motion Learning | Hao Huang et.al. | 2510.25241 | null |
| 2025-09-26 | Impact of Collective Behaviors of Autonomous Vehicles on Urban Traffic Dynamics: A Multi-Agent Reinforcement Learning Approach | Ahmet Onur Akman et.al. | 2509.22216 | null |
| 2025-07-29 | Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics | Leonard Hinckeldey et.al. | 2507.21638 | null |
| 2025-07-23 | Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains | Anisha Gunjal et.al. | 2507.17746 | null |
| 2025-07-23 | Megrez2 Technical Report | Boxun Li et.al. | 2507.17728 | null |
| 2025-07-23 | How Should We Meta-Learn Reinforcement Learning Algorithms? | Alexander David Goldie et.al. | 2507.17668 | null |
| 2025-07-23 | CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning | Lingxiao Tang et.al. | 2507.17548 | null |
| 2025-07-23 | Generalized Advantage Estimation for Distributional Policy Gradients | Shahil Shaik et.al. | 2507.17530 | null |
| 2025-07-23 | Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice | Shanbo Cheng et.al. | 2507.17527 | null |
| 2025-07-23 | URPO: A Unified Reward & Policy Optimization Framework for Large Language Models | Songshuo Lu et.al. | 2507.17515 | null |
| 2025-07-23 | Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning | Yu Li et.al. | 2507.17512 | null |
| 2025-07-23 | ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents | Chang Nie et.al. | 2507.17462 | null |
| 2025-07-23 | Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning | Situo Zhang et.al. | 2507.17448 | null |
| 2025-07-22 | Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning | Junhao Shen et.al. | 2507.16814 | null |
| 2025-07-22 | Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty | Mehul Damani et.al. | 2507.16806 | null |
| 2025-07-22 | Uncertainty-Aware Knowledge Transformers for Peer-to-Peer Energy Trading with Multi-Agent Reinforcement Learning | Mian Ibad Ali Shah et.al. | 2507.16796 | null |
| 2025-07-22 | Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning | Ang Li et.al. | 2507.16746 | link |
| 2025-07-23 | Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints | Zhenyun Yin et.al. | 2507.16727 | null |
| 2025-07-22 | Adaptive Inventory Strategies using Deep Reinforcement Learning for Dynamic Agri-Food Supply Chains | Amandeep Kaur et.al. | 2507.16670 | null |
| 2025-07-22 | FOGNITE: Federated Learning-Enhanced Fog-Cloud Architecture | Somayeh Sobati-M et.al. | 2507.16668 | null |
| 2025-07-22 | Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis | Sara Giordano et.al. | 2507.16641 | null |
| 2025-07-22 | Novel Multi-Agent Action Masked Deep Reinforcement Learning for General Industrial Assembly Lines Balancing Problems | Ali Mohamed Ali et.al. | 2507.16635 | null |
| 2025-07-22 | Step-Audio 2 Technical Report | Boyong Wu et.al. | 2507.16632 | link |
| 2025-07-21 | The Impact of Language Mixing on Bilingual LLM Reasoning | Yihao Li et.al. | 2507.15849 | null |
| 2025-07-21 | GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding | Fei Tang et.al. | 2507.15846 | link |
| 2025-07-22 | Hierarchical Budget Policy Optimization for Adaptive Reasoning | Shangke Lyu et.al. | 2507.15844 | link |
| 2025-07-21 | LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra | Seth Karten et.al. | 2507.15815 | link |
| 2025-07-21 | Power-Constrained Policy Gradient Methods for LQR | Ashwin Verma et.al. | 2507.15806 | null |
| 2025-07-21 | Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning | Sneheel Sarangi et.al. | 2507.15788 | null |
| 2025-07-21 | Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR | Jiakang Wang et.al. | 2507.15778 | link |
| 2025-07-21 | LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization | Xingyu Wu et.al. | 2507.15758 | link |
| 2025-07-21 | EMP: Executable Motion Prior for Humanoid Robot Standing Upper-body Motion Imitation | Haocheng Xu et.al. | 2507.15649 | null |
| 2025-07-21 | Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training | Kailai Yang et.al. | 2507.15640 | null |
| 2025-07-18 | CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | Xiaoya Li et.al. | 2507.14111 | link |
| 2025-07-18 | Preference-based Multi-Objective Reinforcement Learning | Ni Mu et.al. | 2507.14066 | null |
| 2025-07-18 | Reframing attention as a reinforcement learning problem for causal discovery | Turan Orujlu et.al. | 2507.13920 | null |
| 2025-07-18 | Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments | Kathrin Korte et.al. | 2507.13846 | null |
| 2025-07-18 | Scalable Submodular Policy Optimization via Pruned Submodularity Graph | Aditi Anand et.al. | 2507.13834 | null |
| 2025-07-18 | DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training | Zhixin Wang et.al. | 2507.13833 | null |
| 2025-07-18 | Efficient and Scalable Self-Healing Databases Using Meta-Learning and Dependency-Driven Recovery | Joydeep Chandra et.al. | 2507.13757 | null |
| 2025-07-18 | LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction | Jing Chang et.al. | 2507.13712 | null |
| 2025-07-18 | CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation | Jing Chang et.al. | 2507.13710 | null |
| 2025-07-18 | State Space Models Naturally Produce Traveling Waves, Time Cells, and Scale to Abstract Cognitive Functions | Sen Lu et.al. | 2507.13638 | null |
| 2025-07-17 | VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning | Senqiao Yang et.al. | 2507.13348 | link |
| 2025-07-17 | The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner | Zhouqi Hua et.al. | 2507.13332 | null |
| 2025-07-17 | Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour | Emma M. A. Harrison et.al. | 2507.13277 | null |
| 2025-07-17 | QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation | Jiazheng Li et.al. | 2507.13266 | null |
| 2025-07-17 | Signal Temporal Logic Compliant Co-design of Planning and Control | Manas Sashank Juvvi et.al. | 2507.13225 | null |
| 2025-07-17 | Spectral Bellman Method: Unifying Representation and Exploration in RL | Ofir Nabati et.al. | 2507.13181 | null |
| 2025-07-17 | Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback | Suzie Kim et.al. | 2507.13171 | null |
| 2025-07-17 | Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities | Hao Sun et.al. | 2507.13158 | null |
| 2025-07-17 | From Roots to Rewards: Dynamic Tree Reasoning with RL | Ahmed Bahloul et.al. | 2507.13142 | null |
| 2025-07-17 | ZipMPC: Compressed Context-Dependent MPC Cost via Imitation Learning | Rahel Rickenbach et.al. | 2507.13088 | null |
| 2025-07-16 | EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos | Ruihan Yang et.al. | 2507.12440 | null |
| 2025-07-16 | Improving Reinforcement Learning Sample-Efficiency using Local Approximation | Mohit Prashant et.al. | 2507.12383 | null |
| 2025-07-16 | Thought Purity: Defense Paradigm For Chain-of-Thought Attack | Zihao Xue et.al. | 2507.12314 | null |
| 2025-07-16 | Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning | Yuhao Chen et.al. | 2507.12215 | null |
| 2025-07-16 | BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search | Azhar Ikhtiarudin et.al. | 2507.12189 | link |
| 2025-07-17 | Efficient Preparation of Fermionic Superfluids in an Optical Dipole Trap through Reinforcement Learning | Yueyang Min et.al. | 2507.12152 | null |
| 2025-07-16 | Topology Enhanced MARL for Multi-Vehicle Cooperative Decision-Making of CAVs | Ye Han et.al. | 2507.12110 | null |
| 2025-07-16 | Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics | Muleilan Pei et.al. | 2507.12083 | null |
| 2025-07-16 | Towards Ultra-Reliable 6G in-X Subnetworks: Dynamic Link Adaptation by Deep Reinforcement Learning | Fateme Salehi et.al. | 2507.12031 | null |
| 2025-07-16 | QAS-QTNs: Curriculum Reinforcement Learning-Driven Quantum Architecture Search for Quantum Tensor Networks | Siddhant Dutta et.al. | 2507.12013 | null |
| 2025-07-15 | Robot Drummer: Learning Rhythmic Skills for Humanoid Drumming | Asad Ali Shahid et.al. | 2507.11498 | null |
| 2025-07-15 | Exploring the robustness of TractOracle methods in RL-based tractography | Jeremi Levesque et.al. | 2507.11486 | null |
| 2025-07-15 | Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light | Mani Hamidi et.al. | 2507.11482 | null |
| 2025-07-15 | Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs | Gabriel Bo et.al. | 2507.11371 | null |
| 2025-07-15 | Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning | Daniel Tanneberg et.al. | 2507.11367 | null |
| 2025-07-15 | Sensing Accuracy Optimization for Multi-UAV SAR Interferometry with Data Offloading | Mohamed-Amine Lahmeri et.al. | 2507.11284 | null |
| 2025-07-15 | Ocean Diviner: A Diffusion-Augmented Reinforcement Learning for AUV Robust Control in the Underwater Tasks | Weiyi Liu et.al. | 2507.11283 | null |
| 2025-07-15 | Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound | Tal Fiskus et.al. | 2507.11269 | null |
| 2025-07-15 | Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction | Deepak Kumar Panda et.al. | 2507.11173 | null |
| 2025-07-15 | Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities | Yiting Qu et.al. | 2507.11155 | null |
| 2025-07-14 | EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | Mingxian Lin et.al. | 2507.10548 | link |
| 2025-07-14 | Disentangling Neural Disjunctive Normal Form Models | Kexin Gu Baugh et.al. | 2507.10546 | null |
| 2025-07-14 | Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Mingqi Wu et.al. | 2507.10532 | link |
| 2025-07-14 | Some remarks on gradient dominance and LQR policy optimization | Eduardo D. Sontag et.al. | 2507.10452 | null |
| 2025-07-14 | Prompt Informed Reinforcement Learning for Visual Coverage Path Planning | Venkat Margapuri et.al. | 2507.10284 | null |
| 2025-07-14 | Cross-Timeslot Optimization for Distributed GPU Inference Using Reinforcement Learning | Chengze Du et.al. | 2507.10259 | null |
| 2025-07-14 | ToMacVF : Temporal Macro-action Value Factorization for Asynchronous Multi-Agent Reinforcement Learning | Wenjing Zhang et.al. | 2507.10251 | null |
| 2025-07-14 | Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning? | Yumi Omori et.al. | 2507.10174 | null |
| 2025-07-14 | Robust RL Control for Bipedal Locomotion with Closed Kinematic Chains | Egor Maslennikov et.al. | 2507.10164 | null |
| 2025-07-14 | Adaptability in Multi-Agent Reinforcement Learning: A Framework and Unified Review | Siyi Hu et.al. | 2507.10142 | null |
| 2025-07-11 | One Token to Fool LLM-as-a-Judge | Yulai Zhao et.al. | 2507.08794 | null |
| 2025-07-11 | Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning | James McCarthy et.al. | 2507.08793 | null |
| 2025-07-11 | Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data | Jeonghye Kim et.al. | 2507.08761 | null |
| 2025-07-11 | On the Effect of Regularization in Policy Mirror Descent | Jan Felix Kleuker et.al. | 2507.08718 | null |
| 2025-07-11 | SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations | Peter Crowley et.al. | 2507.08707 | null |
| 2025-07-11 | elsciRL: Integrating Language Solutions into Reinforcement Learning Problem Settings | Philip Osborne et.al. | 2507.08705 | null |
| 2025-07-11 | Multi-critic Learning for Whole-body End-effector Twist Tracking | Aravind Elanjimattathil Vijayan et.al. | 2507.08656 | null |
| 2025-07-11 | Safe Deep Reinforcement Learning for Resource Allocation with Peak Age of Information Violation Guarantees | Berire Gunes Reyhan et.al. | 2507.08653 | null |
| 2025-07-11 | Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning | Xingguang Ji et.al. | 2507.08649 | link |
| 2025-07-11 | Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data | Parag Dutta et.al. | 2507.08610 | null |
| 2025-07-10 | Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology | Haochen Wang et.al. | 2507.07999 | link |
| 2025-07-10 | Single-pass Adaptive Image Tokenization for Minimum Program Search | Shivam Duggal et.al. | 2507.07995 | null |
| 2025-07-10 | EXPO: Stable Reinforcement Learning with Expressive Policies | Perry Dong et.al. | 2507.07986 | null |
| 2025-07-10 | Reinforcement Learning with Action Chunking | Qiyang Li et.al. | 2507.07969 | null |
| 2025-07-10 | Scaling RL to Long Videos | Yukang Chen et.al. | 2507.07966 | link |
| 2025-07-10 | Excess Observables Reveal Nonreciprocity in Integrated Covariance | Timur Aslyamov et.al. | 2507.07876 | null |
| 2025-07-10 | “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents | Giovanni Dispoto et.al. | 2507.07848 | null |
| 2025-07-10 | Beyond Robustness: Learning Unknown Dynamic Load Adaptation for Quadruped Locomotion on Rough Terrain | Leixin Chang et.al. | 2507.07825 | null |
| 2025-07-10 | BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning | Ruohong Liu et.al. | 2507.07769 | null |
| 2025-07-10 | Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization | Chengtao Jian et.al. | 2507.07723 | null |
| 2025-07-09 | Graph-Based Complexity Metrics for Multi-Agent Curriculum Learning: A Validated Approach to Task Ordering in Cooperative Coordination Environments | Farhaan Ebadulla et.al. | 2507.07074 | null |
| 2025-07-09 | First Return, Entropy-Eliciting Explore | Tianyu Zheng et.al. | 2507.07017 | null |
| 2025-07-09 | Federated Learning-based MARL for Strengthening Physical-Layer Security in B5G Networks | Deemah H. Tashman et.al. | 2507.06997 | null |
| 2025-07-09 | Optimizing Cognitive Networks: Reinforcement Learning Meets Energy Harvesting Over Cascaded Channels | Deemah H. Tashman et.al. | 2507.06981 | null |
| 2025-07-09 | Bounomodes: the grazing ox algorithm for exploration of clustered anomalies | Samuel Matloob et.al. | 2507.06960 | null |
| 2025-07-10 | Rethinking Verification for LLM Code Generation: From Generation to Testing | Zihan Ma et.al. | 2507.06920 | link |
| 2025-07-09 | Designing Adaptive Algorithms Based on Reinforcement Learning for Dynamic Optimization of Sliding Window Size in Multi-Dimensional Data Streams | Abolfazl Zarghani et.al. | 2507.06901 | null |
| 2025-07-09 | Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model | Jing Liang et.al. | 2507.06892 | null |
| 2025-07-09 | Episodic Contextual Bandits with Knapsacks under Conversion Models | Zitian Li et.al. | 2507.06859 | null |
| 2025-07-10 | Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning | Matej Straka et.al. | 2507.06825 | link |
| 2025-07-08 | EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow | Yixiang Chen et.al. | 2507.06224 | null |
| 2025-07-08 | CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization | Zhongyuan Peng et.al. | 2507.06181 | link |
| 2025-07-08 | Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model | Koki Yamane et.al. | 2507.06174 | null |
| 2025-07-08 | Learning Agile Tensile Perching for Aerial Robots from Demonstrations | Kangle Yuan et.al. | 2507.06172 | null |
| 2025-07-08 | Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation | Mohamad H. Danesh et.al. | 2507.06111 | null |
| 2025-07-08 | AI-Based Demand Forecasting and Load Balancing for Optimising Energy use in Healthcare Systems: A real case study | Iman Rahimi et.al. | 2507.06077 | null |
| 2025-07-09 | FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models | Bo Pang et.al. | 2507.06057 | null |
| 2025-07-08 | CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation | Kushal Gajjar et.al. | 2507.06013 | null |
| 2025-07-08 | From General Relation Patterns to Task-Specific Decision-Making in Continual Multi-Agent Coordination | Chang Yao et.al. | 2507.06004 | null |
| 2025-07-08 | BlueLM-2.5-3B Technical Report | Baojiao Xiong et.al. | 2507.05934 | null |
| 2025-07-07 | Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Yana Wei et.al. | 2507.05255 | link |
| 2025-07-07 | Action Space Reduction Strategies for Reinforcement Learning in Autonomous Driving | Elahe Delavari et.al. | 2507.05251 | null |
| 2025-07-07 | NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving | Qucheng Peng et.al. | 2507.05227 | null |
| 2025-07-07 | EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling | Boyuan Wang et.al. | 2507.05198 | null |
| 2025-07-07 | Sequential Attention-based Sampling for Histopathological Analysis | Tarun G et.al. | 2507.05077 | null |
| 2025-07-07 | Replacing thinking with tool usage enables reasoning in small language models | Corrado Rainone et.al. | 2507.05065 | null |
| 2025-07-07 | When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning | Maxence Boels et.al. | 2507.05011 | null |
| 2025-07-07 | Linking Homeostasis to Reinforcement Learning: Internal State Control of Motivated Behavior | Naoto Yoshida et.al. | 2507.04998 | null |
| 2025-07-07 | Object-centric Denoising Diffusion Models for Physical Reasoning | Moritz Lange et.al. | 2507.04920 | null |
| 2025-07-07 | Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning | Sanyam Vyas et.al. | 2507.04883 | null |
| 2025-07-03 | MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs | Purbesh Mitra et.al. | 2507.02851 | link |
| 2025-07-03 | StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason | Kaiyi Zhang et.al. | 2507.02841 | null |
| 2025-07-03 | ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning | Ruiyang Zhou et.al. | 2507.02834 | null |
| 2025-07-03 | Generalizing Verifiable Instruction Following | Valentina Pyatkin et.al. | 2507.02833 | null |
| 2025-07-03 | Multimodal Mathematical Reasoning with Diverse Solving Perspective | Wenhao Shi et.al. | 2507.02804 | null |
| 2025-07-03 | A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control | Zilin Kang et.al. | 2507.02712 | null |
| 2025-07-03 | Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions | Thomas Hazenberg et.al. | 2507.02698 | null |
| 2025-07-03 | RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes | Jiaxing Wang et.al. | 2507.02690 | null |
| 2025-07-03 | TUC-PPO: Team Utility-Constrained Proximal Policy Optimization for Spatial Public Goods Games | Zhaoqilin Yang et.al. | 2507.02675 | null |
| 2025-07-03 | On Efficient Bayesian Exploration in Model-Based Reinforcement Learning | Alberto Caron et.al. | 2507.02639 | null |
| 2025-07-02 | Kwai Keye-VL Technical Report | Kwai Keye Team et.al. | 2507.01949 | link |
| 2025-07-02 | NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks | Yang Li et.al. | 2507.01921 | null |
| 2025-07-02 | Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models | Chengao Li et.al. | 2507.01915 | null |
| 2025-07-02 | TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types | Yuhao Lin et.al. | 2507.01857 | null |
| 2025-07-02 | TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents | Dmytro Kuzmenko et.al. | 2507.01823 | null |
| 2025-07-02 | Quantum reinforcement learning in dynamic environments | Oliver Sefrin et.al. | 2507.01691 | null |
| 2025-07-02 | AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training | Zhenyu Han et.al. | 2507.01663 | null |
| 2025-07-02 | Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning | Wu Fei et.al. | 2507.01551 | null |
| 2025-07-02 | Chargax: A JAX Accelerated EV Charging Simulator | Koen Ponse et.al. | 2507.01522 | null |
| 2025-07-02 | Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning | Yanfei Zhang et.al. | 2507.01489 | null |
| 2025-07-01 | SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | Bo Liu et.al. | 2506.24119 | link |
| 2025-06-30 | Scaling Human Judgment in Community Notes with LLMs | Haiwen Li et.al. | 2506.24118 | null |
| 2025-06-30 | Constructing Non-Markovian Decision Process via History Aggregator | Yongyi Wang et.al. | 2506.24026 | null |
| 2025-06-30 | Provably Efficient and Agile Randomized Q-Learning | He Wang et.al. | 2506.24005 | null |
| 2025-06-30 | Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning | Seungjun Yi et.al. | 2506.23998 | null |
| 2025-06-30 | ADReFT: Adaptive Decision Repair for Safe Autonomous Driving via Reinforcement Fine-Tuning | Mingfei Cheng et.al. | 2506.23960 | null |
| 2025-07-01 | Adapt Your Body: Mitigating Proprioception Shifts in Imitation Learning | Fuhang Kuang et.al. | 2506.23944 | null |
| 2025-06-30 | Reinforcement Learning for Synchronised Flow Control in a Dual-Gate Resin Infusion System | Miguel Camacho-Sánchez et.al. | 2506.23923 | null |
| 2025-06-30 | The Trilemma of Truth in Large Language Models | Germans Savcisens et.al. | 2506.23921 | link |
| 2025-06-30 | Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning | Anton Andreychuk et.al. | 2506.23793 | link |
| 2025-06-27 | MiCo: Multi-image Contrast for Reinforcement Visual Reasoning | Xi Chen et.al. | 2506.22434 | null |
| 2025-06-27 | ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks | Pritam Dash et.al. | 2506.22423 | null |
| 2025-06-27 | HyperCLOVA X THINK Technical Report | NAVER Cloud HyperCLOVA X Team et.al. | 2506.22403 | null |
| 2025-06-27 | Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL | Tong Yang et.al. | 2506.22401 | null |
| 2025-06-27 | Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation | Tao Li et.al. | 2506.22365 | null |
| 2025-06-27 | Education-Oriented Graph Retrieval-Augmented Generation for Learning Path Recommendation | Xinghe Cheng et.al. | 2506.22303 | null |
| 2025-06-27 | ReF-LLE: Personalized Low-Light Enhancement via Reference-Guided Deep Reinforcement Learning | Ming Zhao et.al. | 2506.22216 | null |
| 2025-06-27 | A Reinforcement Learning Framework for Some Singular Stochastic Control Problems | Zongxia Liang et.al. | 2506.22203 | null |
| 2025-06-27 | EFRame: Deeper Reasoning via Exploration-Filtering-Replay Reinforcement Learning Framework | Chen Wang et.al. | 2506.22200 | link |
| 2025-06-27 | ASVSim (AirSim for Surface Vehicles): A High-Fidelity Simulation Framework for Autonomous Surface Vehicle Research | Bavo Lesy et.al. | 2506.22174 | null |
| 2025-06-26 | Joint Scheduling of DER under Demand Charges: Structure and Approximation | Ruixiao Yang et.al. | 2506.21510 | null |
| 2025-06-26 | Bridging Offline and Online Reinforcement Learning for LLMs | Jack Lanchantin et.al. | 2506.21495 | null |
| 2025-06-26 | Reinforcement Learning for Optimal Control of Spin Magnetometers | Logan W. Cooke et.al. | 2506.21475 | null |
| 2025-06-26 | Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage | Gavin Lee Goodship et.al. | 2506.21465 | null |
| 2025-06-26 | Spatial Mental Modeling from Limited Views | Baiqiao Yin et.al. | 2506.21458 | null |
| 2025-06-26 | Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | Prajwal Koirala et.al. | 2506.21427 | null |
| 2025-06-26 | rQdia: Regularizing Q-Value Distributions With Image Augmentation | Sam Lerman et.al. | 2506.21367 | null |
| 2025-06-26 | HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Qize Yang et.al. | 2506.21277 | link |
| 2025-06-26 | World-aware Planning Narratives Enhance Large Vision-Language Model Planner | Junhao Shi et.al. | 2506.21230 | null |
| 2025-06-26 | Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design | Hampus Gummesson Svensson et.al. | 2506.21158 | null |
| 2025-06-25 | MMSearch-R1: Incentivizing LMMs to Search | Jinming Wu et.al. | 2506.20670 | link |
| 2025-06-25 | DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy | Sungjae Park et.al. | 2506.20668 | null |
| 2025-06-25 | The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind | Andrei Lupu et.al. | 2506.20664 | null |
| 2025-06-25 | DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | Shansan Gong et.al. | 2506.20639 | link |
| 2025-06-25 | PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models | Soufiane Hayou et.al. | 2506.20629 | link |
| 2025-06-25 | Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control | Andrew Mole et.al. | 2506.20554 | null |
| 2025-06-25 | Demonstration of effective UCB-based routing in skill-based queues on real-world data | Sanne van Kempen et.al. | 2506.20543 | null |
| 2025-06-25 | Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards | Charles Arnal et.al. | 2506.20520 | null |
| 2025-06-25 | OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Zengzhi Wang et.al. | 2506.20512 | link |
| 2025-06-25 | ReCode: Updating Code API Knowledge with Reinforcement Learning | Haoze Wu et.al. | 2506.20495 | link |
| 2025-06-24 | JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning | Ai Han et.al. | 2506.19846 | null |
| 2025-06-24 | Temporal-IRL: Modeling Port Congestion and Berth Scheduling with Inverse Reinforcement Learning | Guo Li et.al. | 2506.19843 | null |
| 2025-06-24 | Persona Features Control Emergent Misalignment | Miles Wang et.al. | 2506.19823 | null |
| 2025-06-24 | KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Baochang Ren et.al. | 2506.19807 | null |
| 2025-06-24 | Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning | Menglong Zhang et.al. | 2506.19785 | null |
| 2025-06-24 | SAGE: Strategy-Adaptive Generation Engine for Query Rewriting | Teng Wang et.al. | 2506.19783 | null |
| 2025-06-24 | Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment | Yuhui Sun et.al. | 2506.19780 | null |
| 2025-06-24 | SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning | Yuqian Fu et.al. | 2506.19767 | null |
| 2025-06-24 | Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks | Nathan Maurer et.al. | 2506.19703 | null |
| 2025-06-24 | From memories to maps: Mechanisms of in context reinforcement learning in transformers | Ching Fang et.al. | 2506.19686 | null |
| 2025-06-23 | ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Jiaru Zou et.al. | 2506.18896 | null |
| 2025-06-23 | Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning | Anthony Kobanda et.al. | 2506.18847 | null |
| 2025-06-23 | LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning | Yuhao Wu et.al. | 2506.18841 | null |
| 2025-06-23 | SViP: Sequencing Bimanual Visuomotor Policies with Object-Centric Motion Primitives | Yizhou Chen et.al. | 2506.18825 | null |
| 2025-06-23 | MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image Segmentation | Ruicheng Zhang et.al. | 2506.18679 | null |
| 2025-06-23 | Harnessing the Power of Reinforcement Learning for Language-Model-Based Information Retriever via Query-Document Co-Augmentation | Jingming Liu et.al. | 2506.18670 | null |
| 2025-06-23 | RL-Driven Semantic Compression Model Selection and Resource Allocation in Semantic Communication Systems | Xinyi Lin et.al. | 2506.18660 | null |
| 2025-06-23 | Dual-level Behavioral Consistency for Inter-group and Intra-group Coordination in Multi-Agent Systems | Shuocun Yang et.al. | 2506.18651 | null |
| 2025-06-23 | Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits | Yannik Mahlau et.al. | 2506.18627 | null |
| 2025-06-23 | Policy gradient methods for ordinal policies | Simón Weinberger et.al. | 2506.18614 | null |
| 2025-06-20 | No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | Yanzhi Zhang et.al. | 2506.17219 | null |
| 2025-06-20 | Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | Zeyuan Yang et.al. | 2506.17218 | null |
| 2025-06-20 | BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning | Xuechen Zhang et.al. | 2506.17211 | null |
| 2025-06-20 | Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning | Guozheng Ma et.al. | 2506.17204 | null |
| 2025-06-20 | Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Samin Yeasar Arnob et.al. | 2506.17155 | null |
| 2025-06-20 | When Can Model-Free Reinforcement Learning be Enough for Thinking? | Josiah P. Hanna et.al. | 2506.17124 | null |
| 2025-06-20 | TransDreamerV3: Implanting Transformer In DreamerV3 | Shruti Sadanand Dongare et.al. | 2506.17103 | null |
| 2025-06-20 | Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs | Ricardo Rei et.al. | 2506.17080 | null |
| 2025-06-20 | Scalable and Reliable Multi-agent Reinforcement Learning for Traffic Assignment | Leizhen Wang et.al. | 2506.17029 | null |
| 2025-06-20 | Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators | Marco Jiralerspong et.al. | 2506.17007 | null |
| 2025-06-18 | Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards | Qingming Liu et.al. | 2506.15684 | null |
| 2025-06-18 | CC-LEARN: Cohort-based Consistency Learning | Xiao Ye et.al. | 2506.15662 | null |
| 2025-06-18 | CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Ranting Hu et.al. | 2506.15654 | null |
| 2025-06-18 | AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning | Tevin Wang et.al. | 2506.15651 | null |
| 2025-06-18 | Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement | Weixiang Zhao et.al. | 2506.15647 | null |
| 2025-06-18 | Learning to flock in open space by avoiding collisions and staying together | Martino Brambati et.al. | 2506.15587 | null |
| 2025-06-18 | Design of an all-facet illuminator for high NA EUV lithography exposure tool based on deep reinforcement learning | Tong Li et.al. | 2506.15558 | null |
| 2025-06-18 | Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning | Roger Creus Castanyer et.al. | 2506.15544 | link |
| 2025-06-18 | Lessons from Training Grounded LLMs with Verifiable Rewards | Shang Hong Sim et.al. | 2506.15522 | null |
| 2025-06-18 | Zero-Shot Reinforcement Learning Under Partial Observability | Scott Jeen et.al. | 2506.15446 | null |
| 2025-06-17 | Reasoning with Exploration: An Entropy Perspective | Daixuan Cheng et.al. | 2506.14758 | null |
| 2025-06-17 | Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation | Carolina Higuera et.al. | 2506.14754 | null |
| 2025-06-17 | Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Ring Team et.al. | 2506.14731 | null |
| 2025-06-17 | Adaptive Accompaniment with ReaLchords | Yusong Wu et.al. | 2506.14723 | null |
| 2025-06-17 | SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning | Hexian Ni et.al. | 2506.14648 | null |
| 2025-06-17 | On Quantum BSDE Solver for High-Dimensional Parabolic PDEs | Howard Su et.al. | 2506.14612 | null |
| 2025-06-17 | TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization | Mingkang Zhu et.al. | 2506.14574 | null |
| 2025-06-17 | Toward Safety-First Human-Like Decision Making for Autonomous Vehicles in Time-Varying Traffic Flow | Xiao Wang et.al. | 2506.14502 | null |
| 2025-06-17 | Zeroth-Order Optimization is Secretly Single-Step Policy Optimization | Junbin Qiu et.al. | 2506.14460 | null |
| 2025-06-17 | Toward Rich Video Human-Motion2D Generation | Ruihao Xi et.al. | 2506.14428 | null |
| 2025-06-16 | Touch begins where vision ends: Generalizable policies for contact-rich manipulation | Zifan Zhao et.al. | 2506.13762 | null |
| 2025-06-16 | MARCO: Hardware-Aware Neural Architecture Search for Edge Devices with Multi-Agent Reinforcement Learning and Conformal Prediction Filtering | Arya Fayyazi et.al. | 2506.13755 | null |
| 2025-06-16 | LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction | Haoru Xue et.al. | 2506.13751 | null |
| 2025-06-16 | PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning | Brahim Driss et.al. | 2506.13741 | null |
| 2025-06-16 | TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning | Junru Zhang et.al. | 2506.13705 | link |
| 2025-06-16 | Value-Free Policy Optimization via Reward Partitioning | Bilal Faye et.al. | 2506.13702 | null |
| 2025-06-16 | OneRec Technical Report | Guorui Zhou et.al. | 2506.13695 | null |
| 2025-06-16 | Meta-learning how to Share Credit among Macro-Actions | Ionel-Alexandru Hosu et.al. | 2506.13690 | null |
| 2025-06-16 | The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning | Jiashun Liu et.al. | 2506.13672 | null |
| 2025-06-16 | We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems | Junfeng Fang et.al. | 2506.13666 | null |
| 2025-06-13 | Schema-R1: A reasoning training approach for schema linking in Text-to-SQL Task | Wuzhenghong Wen et.al. | 2506.11986 | null |
| 2025-06-13 | Self-Regulating Cars: Automating Traffic Control in Free Flow Road Networks | Ankit Bhardwaj et.al. | 2506.11973 | null |
| 2025-06-13 | Visual Pre-Training on Unlabeled Images using Reinforcement Learning | Dibya Ghosh et.al. | 2506.11967 | null |
| 2025-06-13 | Automated Treatment Planning for Interstitial HDR Brachytherapy for Locally Advanced Cervical Cancer using Deep Reinforcement Learning | Mohammadamin Moradi et.al. | 2506.11957 | null |
| 2025-06-13 | SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies | Nadun Ranawaka Arachchige et.al. | 2506.11948 | null |
| 2025-06-13 | Breaking Habits: On the Role of the Advantage Function in Learning Causal State Representations | Miguel Suau et.al. | 2506.11912 | null |
| 2025-06-13 | Palpation Alters Auditory Pain Expressions with Gender-Specific Variations in Robopatients | Chapa Sirithunge et.al. | 2506.11906 | null |
| 2025-06-13 | TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Zhenyu Hou et.al. | 2506.11902 | link |
| 2025-06-13 | An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing | Haochen Sun et.al. | 2506.11882 | null |
| 2025-06-13 | LLM-based Dynamic Differential Testing for Database Connectors with Reinforcement Learning-Guided Prompt Selection | Ce Lyu et.al. | 2506.11870 | null |
| 2025-06-12 | Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop | Justin Kerr et.al. | 2506.10968 | null |
| 2025-06-12 | Spurious Rewards: Rethinking Training Signals in RLVR | Rulin Shao et.al. | 2506.10947 | link |
| 2025-06-12 | Self-Adapting Language Models | Adam Zweiger et.al. | 2506.10943 | null |
| 2025-06-12 | Magistral | Mistral-AI et.al. | 2506.10910 | null |
| 2025-06-12 | Adaptive Job Scheduling in Quantum Clouds Using Reinforcement Learning | Waylon Luo et.al. | 2506.10889 | null |
| 2025-06-12 | Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization | Pierre-François Massiani et.al. | 2506.10871 | null |
| 2025-06-13 | Joint Beamforming with Extremely Large Scale RIS: A Sequential Multi-Agent A2C Approach | Zhi Chai et.al. | 2506.10815 | null |
| 2025-06-12 | Human-Robot Navigation using Event-based Cameras and Reinforcement Learning | Ignacio Bugueno-Cordova et.al. | 2506.10790 | null |
| 2025-06-12 | PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework | SiXiang Chen et.al. | 2506.10741 | link |
| 2025-06-12 | Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs | Yucong Luo et.al. | 2506.10630 | null |
| 2025-06-11 | Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Junfei Wu et.al. | 2506.09965 | link |
| 2025-06-11 | VerIF: Verification Engineering for Reinforcement Learning in Instruction Following | Hao Peng et.al. | 2506.09942 | link |
| 2025-06-11 | The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability | Jiachen Hu et.al. | 2506.09940 | null |
| 2025-06-11 | From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models | Irving Fang et.al. | 2506.09930 | link |
| 2025-06-11 | “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) | Noel Brindise et.al. | 2506.09901 | null |
| 2025-06-11 | Hierarchical Learning-Enhanced MPC for Safe Crowd Navigation with Heterogeneous Constraints | Huajian Liu et.al. | 2506.09859 | null |
| 2025-06-11 | Foundation Model-Aided Deep Reinforcement Learning for RIS-Assisted Wireless Communication | Mohammad Ghassemi et.al. | 2506.09855 | null |
| 2025-06-11 | CoRT: Code-integrated Reasoning within Thinking | Chengpeng Li et.al. | 2506.09820 | link |
| 2025-06-11 | Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy | Tonghe Wang et.al. | 2506.09805 | null |
| 2025-06-11 | Reinforced Refinement with Self-Aware Expansion for End-to-End Autonomous Driving | Haochen Liu et.al. | 2506.09800 | null |
| 2025-06-09 | Play to Generalize: Learning to Reason Through Game Play | Yunfei Xie et.al. | 2506.08011 | link |
| 2025-06-09 | Reinforcement Pre-Training | Qingxiu Dong et.al. | 2506.08007 | null |
| 2025-06-09 | Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator | Alberto Bazán-Guillén et.al. | 2506.07980 | null |
| 2025-06-09 | Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction | Junhong Shen et.al. | 2506.07976 | link |
| 2025-06-09 | A Generative Physics-Informed Reinforcement Learning-Based Approach for Construction of Representative Drive Cycle | Amirreza Yasami et.al. | 2506.07929 | null |
| 2025-06-09 | LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement | Dimitris Panagopoulos et.al. | 2506.07915 | null |
| 2025-06-09 | WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Jie Yang et.al. | 2506.07905 | link |
| 2025-06-09 | MiniCPM4: Ultra-Efficient LLMs on End Devices | MiniCPM Team et.al. | 2506.07900 | link |
| 2025-06-09 | Diffusion-RL for Scalable Resource Allocation for 6G Networks | Salar Nouri et.al. | 2506.07880 | null |
| 2025-06-09 | Versatile Loco-Manipulation through Flexible Interlimb Coordination | Xinghao Zhu et.al. | 2506.07876 | null |
| 2025-06-06 | Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens | Jihwan Jeong et.al. | 2506.06261 | null |
| 2025-06-06 | How to craft a deep reinforcement learning policy for wind farm flow control | Elie Kadoche et.al. | 2506.06204 | null |
| 2025-06-06 | Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization | Jonathan Yang et.al. | 2506.06196 | null |
| 2025-06-06 | A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization | Muhammed Ustaomeroglu et.al. | 2506.06179 | null |
| 2025-06-06 | Reusing Trajectories in Policy Gradients Enables Fast Convergence | Alessandro Montenegro et.al. | 2506.06178 | null |
| 2025-06-06 | Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach | James Ford et.al. | 2506.06175 | null |
| 2025-06-06 | Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models | Rihui Jin et.al. | 2506.06137 | null |
| 2025-06-06 | Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library | Weixun Wang et.al. | 2506.06122 | link |
| 2025-06-06 | On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems | Elim Kwan et.al. | 2506.06094 | null |
| 2025-06-06 | Reinforcing Code Generation: Improving Text-to-SQL with Execution-Based Learning | Atharv Kulkarni et.al. | 2506.06093 | null |
| 2025-06-05 | ContentV: Efficient Training of Video Generation Models with Limited Compute | Wenfeng Lin et.al. | 2506.05343 | null |
| 2025-06-05 | AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs | Lidong Lu et.al. | 2506.05328 | link |
| 2025-06-05 | Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay | Yifan Sun et.al. | 2506.05316 | null |
| 2025-06-05 | Estimation of Treatment Effects Under Nonstationarity via Truncated Difference-in-Q’s | Ramesh Johari et.al. | 2506.05308 | null |
| 2025-06-05 | A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search | Arnav Kumar Jain et.al. | 2506.05294 | link |
| 2025-06-06 | Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning | Violet Xiang et.al. | 2506.05256 | null |
| 2025-06-05 | Towards Language-Augmented Multi-Agent Deep Reinforcement Learning | Maxime Toquebiau et.al. | 2506.05236 | null |
| 2025-06-05 | Optimal-PhiBE: A PDE-based Model-free framework for Continuous-time Reinforcement Learning | Yuhua Zhu et.al. | 2506.05208 | null |
| 2025-06-05 | TreeRPO: Tree Relative Policy Optimization | Zhicheng Yang et.al. | 2506.05183 | link |
| 2025-06-05 | Fabrica: Dual-Arm Assembly of General Multi-Part Objects via Integrated Planning and Learning | Yunsheng Tian et.al. | 2506.05168 | null |
| 2025-06-04 | Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | Shuang Chen et.al. | 2506.04207 | link |
| 2025-06-04 | MACS: Multi-Agent Reinforcement Learning for Optimization of Crystal Structures | Elena Zamaraeva et.al. | 2506.04195 | null |
| 2025-06-04 | R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning | Qingfei Zhao et.al. | 2506.04185 | link |
| 2025-06-04 | Horizon Reduction Makes RL Scalable | Seohong Park et.al. | 2506.04168 | null |
| 2025-06-04 | SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL | Jiaheng Hu et.al. | 2506.04147 | null |
| 2025-06-04 | Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning | Muling Wu et.al. | 2506.04065 | null |
| 2025-06-04 | Crowd-SFT: Crowdsourcing for LLM Alignment | Alex Sotiropoulos et.al. | 2506.04063 | null |
| 2025-06-04 | Autonomous Vehicle Lateral Control Using Deep Reinforcement Learning with MPC-PID Demonstration | Chengdong Wu et.al. | 2506.04040 | null |
| 2025-06-04 | Interpretability by Design for Efficient Multi-Objective Reinforcement Learning | Qiyue Xia et.al. | 2506.04022 | null |
| 2025-06-04 | Boosting Open-Source LLMs for Program Repair via Reasoning Transfer and LLM-Guided Reinforcement Learning | Xunzhu Tang et.al. | 2506.03921 | null |
| 2025-06-03 | Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning | Yinjie Wang et.al. | 2506.03136 | link |
| 2025-06-03 | AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation | Prashanth Vijayaraghavan et.al. | 2506.03122 | null |
| 2025-06-03 | Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback | Xiaoying Zhang et.al. | 2506.03106 | link |
| 2025-06-03 | EgoVLM: Policy Optimization for Egocentric Video Understanding | Ashwin Vinod et.al. | 2506.03097 | link |
| 2025-06-03 | DPO Learning with LLMs-Judge Signal for Computer Use Agents | Man Luo et.al. | 2506.03095 | null |
| 2025-06-03 | Provable Reinforcement Learning from Human Feedback with an Unknown Link Function | Qining Zhang et.al. | 2506.03066 | null |
| 2025-06-03 | EDEN: Entorhinal Driven Egocentric Navigation Toward Robotic Deployment | Mikolaj Walczak et.al. | 2506.03046 | null |
| 2025-06-03 | Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective | Jintian Shao et.al. | 2506.03038 | null |
| 2025-06-03 | MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver | Yuepeng Zheng et.al. | 2506.02935 | null |
| 2025-06-03 | Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning | Yin Fang et.al. | 2506.02911 | link |
| 2025-05-30 | ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL | Yu Zhang et.al. | 2505.24875 | null |
| 2025-05-30 | ProxyThinker: Test-Time Guidance through Small Visual Reasoners | Zilin Xiao et.al. | 2505.24872 | null |
| 2025-05-30 | MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning | Yiqing Liang et.al. | 2505.24871 | null |
| 2025-05-30 | ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | Mingjie Liu et.al. | 2505.24864 | null |
| 2025-05-30 | MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning | Jingyan Shen et.al. | 2505.24846 | null |
| 2025-05-30 | AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models | Conor Heins et.al. | 2505.24784 | null |
| 2025-05-30 | Diffusion-Based Symbolic Regression | Zachary Bastiani et.al. | 2505.24776 | null |
| 2025-05-30 | REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards | Zafir Stojanovski et.al. | 2505.24760 | link |
| 2025-05-30 | Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning | Shelly Bensal et.al. | 2505.24726 | null |
| 2025-06-03 | Reinforcing Video Reasoning with Focused Thinking | Jisheng Dang et.al. | 2505.24718 | link |
| 2025-05-29 | ZeroGUI: Automating Online GUI Learning at Zero Human Cost | Chenyu Yang et.al. | 2505.23762 | link |
| 2025-05-29 | DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning | Ziyin Zhang et.al. | 2505.23754 | link |
| 2025-05-29 | PixelThink: Towards Efficient Chain-of-Pixel Reasoning | Song Wang et.al. | 2505.23727 | null |
| 2025-05-29 | ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering | Zexi Liu et.al. | 2505.23723 | link |
| 2025-05-29 | AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning | Lucas N. Alegre et.al. | 2505.23708 | null |
| 2025-05-29 | Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability | Ruida Wang et.al. | 2505.23703 | null |
| 2025-05-29 | Grounded Reinforcement Learning for Visual Reasoning | Gabriel Sarch et.al. | 2505.23678 | null |
| 2025-05-29 | Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models | Lang Cao et.al. | 2505.23667 | null |
| 2025-05-29 | AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction | Niklas Freymuth et.al. | 2505.23663 | link |
| 2025-05-29 | Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation | Hongxiang Zhang et.al. | 2505.23657 | null |
| 2025-05-28 | Maximizing Confidence Alone Improves Reasoning | Mihir Prabhudesai et.al. | 2505.22660 | null |
| 2025-05-28 | The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason | Ang Lv et.al. | 2505.22653 | null |
| 2025-05-28 | WebDancer: Towards Autonomous Information Seeking Agency | Jialong Wu et.al. | 2505.22648 | null |
| 2025-05-28 | FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control | Younggyo Seo et.al. | 2505.22642 | null |
| 2025-05-28 | SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning | Yu Zhang et.al. | 2505.22626 | null |
| 2025-05-28 | The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models | Ganqu Cui et.al. | 2505.22617 | null |
| 2025-05-28 | HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym | Ngoc La et.al. | 2505.22597 | null |
| 2025-05-28 | SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning | Jiaqi Huang et.al. | 2505.22596 | null |
| 2025-05-28 | Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs | Changhao Song et.al. | 2505.22548 | null |
| 2025-05-28 | Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation | Hongyi Zhou et.al. | 2505.22492 | null |
| 2025-05-27 | Reinforcing General Reasoning without Verifiers | Xiangxin Zhou et.al. | 2505.21493 | null |
| 2025-05-27 | Policy Optimized Text-to-Image Pipeline Design | Uri Gadot et.al. | 2505.21478 | null |
| 2025-05-27 | Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO | Muzhi Zhu et.al. | 2505.21457 | null |
| 2025-05-27 | Can Large Reasoning Models Self-Train? | Sheikh Shafayat et.al. | 2505.21444 | null |
| 2025-05-27 | A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment | Brett Bissey et.al. | 2505.21414 | null |
| 2025-05-27 | MRSD: Multi-Resolution Skill Discovery for HRL Agents | Shashank Sharma et.al. | 2505.21410 | null |
| 2025-05-27 | Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features | Zixuan Xie et.al. | 2505.21391 | null |
| 2025-05-27 | EgoWalk: A Multimodal Dataset for Robot Navigation in the Wild | Timur Akhtyamov et.al. | 2505.21282 | null |
| 2025-05-27 | Data-Driven Cellular Mobility Management via Bayesian Optimization and Reinforcement Learning | Mohamed Benzaghta et.al. | 2505.21249 | null |
| 2025-05-27 | Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies | Felix Chalumeau et.al. | 2505.21236 | null |
| 2025-05-26 | FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities | Jin Wang et.al. | 2505.20147 | null |
| 2025-05-26 | MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning | Yuanxin Zhuang et.al. | 2505.20131 | null |
| 2025-05-26 | Proxy-Free GFlowNet | Ruishuo Chen et.al. | 2505.20110 | null |
| 2025-05-26 | Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning | Ziyi Zhang et.al. | 2505.20107 | null |
| 2025-05-26 | Adaptive Deep Reasoning: Triggering Deep Thinking When Needed | Yunhao Wang et.al. | 2505.20101 | null |
| 2025-05-26 | SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale | Qi Li et.al. | 2505.20094 | null |
| 2025-05-26 | Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback | Mengdi Li et.al. | 2505.20075 | null |
| 2025-05-26 | Incentivizing Reasoning from Weak Supervision | Yige Yuan et.al. | 2505.20072 | null |
| 2025-05-26 | SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety | Geon-Hyeong Kim et.al. | 2505.20065 | null |
| 2025-05-26 | REARANK: Reasoning Re-ranking Agent via Reinforcement Learning | Le Zhang et.al. | 2505.20046 | null |
| 2025-05-23 | One RL to See Them All: Visual Triple Unified Reinforcement Learning | Yan Ma et.al. | 2505.18129 | null |
| 2025-05-23 | Reward Model Overoptimisation in Iterated RLHF | Lorenz Wolf et.al. | 2505.18126 | null |
| 2025-05-23 | ProgRM: Build Better GUI Agents with Progress Rewards | Danyang Zhang et.al. | 2505.18121 | null |
| 2025-05-23 | Bridging Supervised Learning and Reinforcement Learning in Math Reasoning | Huayu Chen et.al. | 2505.18116 | null |
| 2025-05-23 | Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL | Joey Hong et.al. | 2505.18098 | null |
| 2025-05-23 | Stable Reinforcement Learning for Efficient Reasoning | Muzhi Dai et.al. | 2505.18086 | null |
| 2025-05-23 | What Do You Need for Diverse Trajectory Stitching in Diffusion Planning? | Quentin Clark et.al. | 2505.18083 | null |
| 2025-05-23 | Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals | Jia-Nan Li et.al. | 2505.18071 | null |
| 2025-05-23 | Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective | Jintian Shao et.al. | 2505.17997 | null |
| 2025-05-23 | Outcome-based Reinforcement Learning to Predict the Future | Benjamin Turtel et.al. | 2505.17989 | null |
| 2025-05-22 | GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | Chengqi Duan et.al. | 2505.17022 | link |
| 2025-05-22 | SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward | Kaixuan Fan et.al. | 2505.17018 | link |
| 2025-05-22 | Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO | Chengzhuo Tong et.al. | 2505.17017 | link |
| 2025-05-22 | Interactive Post-Training for Vision-Language-Action Models | Shuhan Tan et.al. | 2505.17016 | null |
| 2025-05-22 | R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | Huatong Song et.al. | 2505.17005 | link |
| 2025-05-22 | $\text{R}^2\text{ec}$: Towards Large Recommender Models with Reasoning | Runyang You et.al. | 2505.16994 | link |
| 2025-05-22 | SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | Yaxin Du et.al. | 2505.16975 | link |
| 2025-05-22 | Risk-Averse Reinforcement Learning with Itakura-Saito Loss | Igor Udovichenko et.al. | 2505.16925 | null |
| 2025-05-22 | LARES: Latent Reasoning for Sequential Recommendation | Enze Liu et.al. | 2505.16865 | null |
| 2025-05-22 | Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | Wei Xiao et.al. | 2505.16856 | null |
| 2025-05-21 | GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents | Yuqi Zhou et.al. | 2505.15810 | link |
| 2025-05-21 | MMaDA: Multimodal Large Diffusion Language Models | Ling Yang et.al. | 2505.15809 | link |
| 2025-05-21 | STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | Zongzhao Li et.al. | 2505.15804 | null |
| 2025-05-21 | VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models | Yuchen Yan et.al. | 2505.15801 | null |
| 2025-05-21 | Reverse Engineering Human Preferences with Reinforcement Learning | Lisa Alazraki et.al. | 2505.15795 | null |
| 2025-05-21 | HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving | Zhiwen Chen et.al. | 2505.15793 | null |
| 2025-05-21 | VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL | Fengyuan Dai et.al. | 2505.15791 | null |
| 2025-05-21 | ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning | Changtai Zhu et.al. | 2505.15776 | null |
| 2025-05-21 | Improving planning and MBRL with temporally-extended actions | Palash Chatterjee et.al. | 2505.15754 | null |
| 2025-05-21 | UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning | Xiangyu Wang et.al. | 2505.15725 | null |
| 2025-05-20 | Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning | Haolei Xu et.al. | 2505.14684 | link |
| 2025-05-20 | Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning | Jiaer Xia et.al. | 2505.14677 | link |
| 2025-05-20 | Reward Reasoning Model | Jiaxin Guo et.al. | 2505.14674 | null |
| 2025-05-20 | General-Reasoner: Advancing LLM Reasoning Across All Domains | Xueguang Ma et.al. | 2505.14652 | link |
| 2025-05-20 | Think Only When You Need with Large Hybrid-Reasoning Models | Lingjie Jiang et.al. | 2505.14631 | null |
| 2025-05-20 | TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | Zhangchen Xu et.al. | 2505.14625 | link |
| 2025-05-20 | Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning | Wenbin Hu et.al. | 2505.14585 | null |
| 2025-05-20 | Performance Optimization of Energy-Harvesting Underlay Cognitive Radio Networks Using Reinforcement Learning | Deemah H. Tashman et.al. | 2505.14581 | null |
| 2025-05-20 | KIPPO: Koopman-Inspired Proximal Policy Optimization | Andrei Cozma et.al. | 2505.14566 | null |
| 2025-05-20 | Bellman operator convergence enhancements in reinforcement learning algorithms | David Krame Kadurha et.al. | 2505.14564 | null |
| 2025-05-19 | Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards | Xiaoyuan Liu et.al. | 2505.13445 | link |
| 2025-05-19 | Optimizing Anytime Reasoning via Budget Relative Policy Optimization | Penghui Qi et.al. | 2505.13438 | link |
| 2025-05-19 | KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture | R. James Cotton et.al. | 2505.13436 | null |
| 2025-05-19 | G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning | Liang Chen et.al. | 2505.13426 | link |
| 2025-05-20 | A Dataless Reinforcement Learning Approach to Rounding Hyperplane Optimization for Max-Cut | Gabriel Malikal et.al. | 2505.13405 | null |
| 2025-05-19 | Thinkless: LLM Learns When to Think | Gongfan Fang et.al. | 2505.13379 | link |
| 2025-05-19 | Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning | Irene Brugnara et.al. | 2505.13372 | null |
| 2025-05-19 | J4R: Learning to Judge with Equivalent Initial State Group Relative Preference Optimization | Austin Xu et.al. | 2505.13346 | null |
| 2025-05-19 | Neural-Enhanced Rate Adaptation and Computation Distribution for Emerging mmWave Multi-User 3D Video Streaming Systems | Babak Badnava et.al. | 2505.13337 | null |
| 2025-05-19 | CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning | Lei Sheng et.al. | 2505.13271 | link |
| 2025-05-16 | SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics | Lizhi Yang et.al. | 2505.11494 | null |
| 2025-05-16 | Improving Assembly Code Performance with Large Language Models via Reinforcement Learning | Anjiang Wei et.al. | 2505.11480 | null |
| 2025-05-16 | Automatic Reward Shaping from Confounded Offline Data | Mingxuan Li et.al. | 2505.11478 | null |
| 2025-05-16 | HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages | Zhilin Wang et.al. | 2505.11475 | null |
| 2025-05-16 | Disentangling Reasoning and Knowledge in Medical Large Language Models | Rahul Thapa et.al. | 2505.11462 | null |
| 2025-05-16 | Signal attenuation enables scalable decentralized multi-agent reinforcement learning over networks | Wesley A Suttle et.al. | 2505.11461 | null |
| 2025-05-16 | Visual Planning: Let’s Think Only with Images | Yi Xu et.al. | 2505.11409 | link |
| 2025-05-16 | Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Wenchuan Zhang et.al. | 2505.11404 | link |
| 2025-05-16 | Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space | Ali Rabiee et.al. | 2505.11366 | null |
| 2025-05-16 | Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics | Ardian Selmonaj et.al. | 2505.11311 | null |
| 2025-05-15 | Beyond ‘Aha!’: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Zhiyuan Hu et.al. | 2505.10554 | link |
| 2025-05-15 | Knowledge capture, adaptation and composition (KCAC): A framework for cross-task curriculum learning in robotic manipulation | Xinrui Wang et.al. | 2505.10522 | null |
| 2025-05-15 | Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning | Andrea Baisero et.al. | 2505.10484 | null |
| 2025-05-15 | Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps | Ningyuan Yang et.al. | 2505.10482 | null |
| 2025-05-15 | Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | Zemin Huang et.al. | 2505.10446 | null |
| 2025-05-15 | IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning | Dechen Gao et.al. | 2505.10442 | null |
| 2025-05-15 | Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs | Jingyao Wang et.al. | 2505.10425 | null |
| 2025-05-15 | Decomposed Inductive Procedure Learning: Learning Academic Tasks with Human-Like Data Efficiency | Daniel Weitekamp et.al. | 2505.10422 | null |
| 2025-05-15 | Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change | Jonathan Clifford Balloch et.al. | 2505.10330 | null |
| 2025-05-15 | J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning | Chenxi Whitehouse et.al. | 2505.10320 | null |
| 2025-05-14 | DataMIL: Selecting Data for Robot Imitation Learning with Datamodels | Shivin Dass et.al. | 2505.09603 | null |
| 2025-05-14 | Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware | Justin Yu et.al. | 2505.09601 | link |
| 2025-05-14 | VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation | Chaofan Zhang et.al. | 2505.09577 | null |
| 2025-05-14 | Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach | Shannon Lodoen et.al. | 2505.09576 | null |
| 2025-05-14 | Learning Long-Context Diffusion Policies via Past-Token Prediction | Marcel Torne et.al. | 2505.09561 | null |
| 2025-05-14 | WavReward: Spoken Dialogue Models With Generalist Reward Evaluators | Shengpeng Ji et.al. | 2505.09558 | link |
| 2025-05-14 | Distilling Realizable Students from Unrealizable Teachers | Yujin Kim et.al. | 2505.09546 | null |
| 2025-05-14 | Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data | Rui Miao et.al. | 2505.09496 | null |
| 2025-05-14 | Preserving Plasticity in Continual Learning with Adaptive Linearity Injection | Seyed Roozbeh Razavi Rohani et.al. | 2505.09486 | null |
| 2025-05-14 | Quantum state-agnostic work extraction (almost) without dissipation | Josep Lumbreras et.al. | 2505.09456 | null |
| 2025-05-13 | Generative Molecular Design with Steerable and Granular Synthesizability Control | Jeff Guo et.al. | 2505.08774 | null |
| 2025-05-13 | Preference Optimization for Combinatorial Optimization Problems | Mingjun Pan et.al. | 2505.08735 | null |
| 2025-05-13 | A Study of Data-driven Methods for Inventory Optimization | Lee Yeung Ping et.al. | 2505.08673 | null |
| 2025-05-13 | Credit Assignment and Efficient Exploration based on Influence Scope in Multi-agent Reinforcement Learning | Shuai Han et.al. | 2505.08630 | null |
| 2025-05-13 | Cost Function Estimation Using Inverse Reinforcement Learning with Minimal Observations | Sarmad Mehrdad et.al. | 2505.08619 | null |
| 2025-05-13 | OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | Zhaochen Su et.al. | 2505.08617 | link |
| 2025-05-13 | Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection | Ayush K. Rai et.al. | 2505.08561 | null |
| 2025-05-13 | Strategy-Augmented Planning for Large Language Models via Opponent Exploitation | Shuai Xu et.al. | 2505.08459 | null |
| 2025-05-13 | Zero-Shot Sim-to-Real Reinforcement Learning for Fruit Harvesting | Emlyn Williams et.al. | 2505.08458 | null |
| 2025-05-13 | Parameter Estimation using Reinforcement Learning Causal Curiosity: Limits and Challenges | Miguel Arana-Catania et.al. | 2505.08453 | null |
| 2025-05-12 | DanceGRPO: Unleashing GRPO on Visual Generation | Zeyue Xue et.al. | 2505.07818 | link |
| 2025-05-12 | A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values | Daniel Beechey et.al. | 2505.07797 | link |
| 2025-05-12 | MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering | Rushi Qiang et.al. | 2505.07782 | link |
| 2025-05-12 | Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Xinji Mai et.al. | 2505.07773 | link |
| 2025-05-12 | Guiding Data Collection via Factored Scaling Curves | Lihan Zha et.al. | 2505.07728 | link |
| 2025-05-12 | S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | Muzhi Dai et.al. | 2505.07686 | null |
| 2025-05-12 | A comparative study of Bitcoin and Ripple cryptocurrencies trading using Deep Reinforcement Learning algorithms | Dieu-Donne Fangnon et.al. | 2505.07660 | null |
| 2025-05-12 | MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining | Xiaomi LLM-Core Team et.al. | 2505.07608 | link |
| 2025-05-12 | Multi-Objective Reinforcement Learning for Energy-Efficient Industrial Control | Georg Schäfer et.al. | 2505.07607 | null |
| 2025-05-12 | Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent | Ziyang Huang et.al. | 2505.07596 | link |
| 2025-05-09 | VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction | Noah Frahm et.al. | 2505.06219 | null |
| 2025-05-09 | Let Humanoids Hike! Integrative Skill Development on Complex Trails | Kwan-Yee Lin et.al. | 2505.06218 | null |
| 2025-05-09 | Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach | Tim Schneider et.al. | 2505.06182 | null |
| 2025-05-09 | Interaction-Aware Parameter Privacy-Preserving Data Sharing in Coupled Systems via Particle Filter Reinforcement Learning | Haokun Yu et.al. | 2505.06122 | null |
| 2025-05-09 | TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations | Shuaiyi Huang et.al. | 2505.06079 | null |
| 2025-05-09 | Safe-EF: Error Feedback for Nonsmooth Constrained Optimization | Rustem Islamov et.al. | 2505.06053 | null |
| 2025-05-09 | Efficient Information Updates in Compute-First Networking via Reinforcement Learning with Joint AoI and VoI | Jianpeng Qi et.al. | 2505.06025 | null |
| 2025-05-09 | Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models | Lennart Stöpler et.al. | 2505.05970 | null |
| 2025-05-09 | Offline Multi-agent Reinforcement Learning via Score Decomposition | Dan Qiao et.al. | 2505.05968 | null |
| 2025-05-09 | Learning Power Control Protocol for In-Factory 6G Subnetworks | Uyoata E. Uyoata et.al. | 2505.05967 | null |
| 2025-05-08 | Flow-GRPO: Training Flow Matching Models via Online RL | Jie Liu et.al. | 2505.05470 | link |
| 2025-05-08 | RL-DAUNCE: Reinforcement Learning-Driven Data Assimilation with Uncertainty-Aware Constrained Ensembles | Pouria Behnoudfar et.al. | 2505.05452 | null |
| 2025-05-08 | Reasoning Models Don’t Always Say What They Think | Yanda Chen et.al. | 2505.05410 | null |
| 2025-05-08 | Repair Crew Routing for Infrastructure Network Restoration under Incomplete Information | Subhojit Biswas et.al. | 2505.05297 | null |
| 2025-05-08 | Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation | Zechu Li et.al. | 2505.05287 | null |
| 2025-05-08 | Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration | Andreas Kontogiannis et.al. | 2505.05262 | null |
| 2025-05-08 | High Altitude Platform-Based Caching and Multicasting for Rural Connectivity | Yongqiang Zhang et.al. | 2505.05251 | null |
| 2025-05-08 | Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation | Luca Marzari et.al. | 2505.05235 | null |
| 2025-05-08 | Adaptive Biased User Scheduling for Heterogeneous Wireless Federate Learning Network | Changxiang Wu et.al. | 2505.05231 | null |
| 2025-05-08 | Multi-Objective Reinforcement Learning for Adaptive Personalized Autonomous Driving | Hendrik Surmann et.al. | 2505.05223 | null |
| 2025-05-07 | EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning | Zhenghao Xing et.al. | 2505.04623 | link |
| 2025-05-07 | Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation | Abdulaziz Almuzairee et.al. | 2505.04619 | null |
| 2025-05-07 | ZeroSearch: Incentivize the Search Capability of LLMs without Searching | Hao Sun et.al. | 2505.04588 | link |
| 2025-05-07 | Active Sampling for MRI-based Sequential Decision Making | Yuning Du et.al. | 2505.04586 | link |
| 2025-05-07 | Implicitly Aligning Humans and Autonomous Agents through Shared Task Abstractions | Stéphane Aroca-Ouellette et.al. | 2505.04579 | null |
| 2025-05-07 | Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization | Wenjun Cao et.al. | 2505.04578 | null |
| 2025-05-07 | Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions | Shanyu Han et.al. | 2505.04553 | null |
| 2025-05-07 | A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance | Axel Friedrich Wolter et.al. | 2505.04494 | null |
| 2025-05-07 | RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation | Jing Hu et.al. | 2505.04424 | link |
| 2025-05-07 | A Heuristic-Integrated DRL Approach for Phase Optimization in Large-Scale RISs | Wei Wang et.al. | 2505.04401 | null |
| 2025-05-06 | AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control | Jialong Li et.al. | 2505.03738 | null |
| 2025-05-06 | Sustainable Smart Farm Networks: Enhancing Resilience and Efficiency with Decision Theory-Guided Deep Reinforcement Learning | Dian Chen et.al. | 2505.03721 | null |
| 2025-05-06 | Actor-Critics Can Achieve Optimal Sample Efficiency | Kevin Tan et.al. | 2505.03710 | null |
| 2025-05-06 | Policy Gradient Adaptive Control for the LQR: Indirect and Direct Approaches | Feiran Zhao et.al. | 2505.03706 | null |
| 2025-05-06 | Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation | Songchen Fu et.al. | 2505.03586 | null |
| 2025-05-06 | Ergodic Generative Flows | Leo Maxime Brunswic et.al. | 2505.03561 | null |
| 2025-05-06 | Multi-Agent Reinforcement Learning Scheduling to Support Low Latency in Teleoperated Driving | Giacomo Avanzi et.al. | 2505.03558 | null |
| 2025-05-06 | Small-Scale-Fading-Aware Resource Allocation in Wireless Federated Learning | Jiacheng Wang et.al. | 2505.03533 | null |
| 2025-05-06 | The Steganographic Potentials of Language Models | Artem Karpov et.al. | 2505.03439 | null |
| 2025-05-06 | Wasserstein Convergence of Score-based Generative Models under Semiconvexity and Discontinuous Gradients | Stefano Bruno et.al. | 2505.03432 | null |
| 2025-05-05 | R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | Yi-Fan Zhang et.al. | 2505.02835 | link |
| 2025-05-05 | TWIST: Teleoperated Whole-Body Imitation System | Yanjie Ze et.al. | 2505.02833 | null |
| 2025-05-05 | Knowing You Don’t Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing | Diji Yang et.al. | 2505.02811 | link |
| 2025-05-05 | Teaching the social media generation: rethinking learning without sacrificing quality | Sepinoud Azimi et.al. | 2505.02770 | null |
| 2025-05-05 | The use of Artificial Intelligence for Intervention and Assessment in Individuals with ASD | Aggeliki Sideraki et.al. | 2505.02747 | null |
| 2025-05-05 | Enhancing LLMs’ Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry | Junu Kim et.al. | 2505.02722 | link |
| 2025-05-05 | Graph Neural Network-Based Reinforcement Learning for Controlling Biological Networks: The GATTACA Framework | Andrzej Mizera et.al. | 2505.02712 | null |
| 2025-05-05 | Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models | Xiaobao Wu et.al. | 2505.02686 | link |
| 2025-05-05 | Online Phase Estimation of Human Oscillatory Motions using Deep Learning | Antonio Grotta et.al. | 2505.02668 | null |
| 2025-05-05 | A Survey on Progress in LLM Alignment from the Perspective of Reward Design | Miaomiao Ji et.al. | 2505.02666 | null |
| 2025-05-02 | FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research | Yan Miao et.al. | 2505.01383 | null |
| 2025-05-02 | Stabilizing Temporal Difference Learning via Implicit Stochastic Approximation | Hwanwoo Kim et.al. | 2505.01361 | null |
| 2025-05-02 | Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story | Vincenzo De Paola et.al. | 2505.01336 | null |
| 2025-05-02 | Integration of Multi-Mode Preference into Home Energy Management System Using Deep Reinforcement Learning | Mohammed Sumayli et.al. | 2505.01332 | null |
| 2025-05-02 | Exploring Equity of Climate Policies using Multi-Agent Multi-Objective Reinforcement Learning | Palok Biswas et.al. | 2505.01115 | null |
| 2025-05-02 | Multi-Objective Reinforcement Learning for Water Management | Zuzanna Osika et.al. | 2505.01094 | null |
| 2025-05-02 | Llama-Nemotron: Efficient Reasoning Models | Akhiad Bercovich et.al. | 2505.00949 | null |
| 2025-05-01 | Learning Neural Control Barrier Functions from Offline Data with Conservatism | Ihab Tabbara et.al. | 2505.00908 | null |
| 2025-05-01 | SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation | Quang P. M. Pham et.al. | 2505.00831 | null |
| 2025-05-01 | Constructing an Optimal Behavior Basis for the Option Keyboard | Lucas N. Alegre et.al. | 2505.00787 | null |
| 2025-05-01 | T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT | Dongzhi Jiang et.al. | 2505.00703 | link |
| 2025-05-01 | Multi-Constraint Safe Reinforcement Learning via Closed-form Solution for Log-Sum-Exp Approximation of Control Barrier Functions | Chenggang Wang et.al. | 2505.00671 | null |
| 2025-05-01 | Deep Reinforcement Learning for Urban Air Quality Management: Multi-Objective Optimization of Pollution Mitigation Booth Placement in Metropolitan Environments | Kirtan Rajesh et.al. | 2505.00668 | null |
| 2025-05-01 | Wasserstein Policy Optimization | David Pfau et.al. | 2505.00663 | null |
| 2025-05-01 | DeepCritic: Deliberate Critique with Large Language Models | Wenkai Yang et.al. | 2505.00662 | link |
| 2025-05-02 | 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models | Chong Zhang et.al. | 2505.00551 | null |
| 2025-05-01 | Directly Forecasting Belief for Reinforcement Learning with Delays | Qingyuan Wu et.al. | 2505.00546 | null |
| 2025-05-01 | Emergence of Roles in Robotic Teams with Model Sharing and Limited Communication | Ian O’Flynn et.al. | 2505.00540 | null |
| 2025-05-01 | Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks | Xinyu Wang et.al. | 2505.00530 | null |
| 2025-05-01 | DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation | Zixuan Chen et.al. | 2505.00527 | null |
| 2025-04-30 | DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | Z. Z. Ren et.al. | 2504.21801 | link |
| 2025-04-30 | Reconciling Discrete-Time Mixed Policies and Continuous-Time Relaxed Controls in Reinforcement Learning and Stochastic Control | Rene Carmona et.al. | 2504.21793 | null |
| 2025-04-30 | MAGNET: an open-source library for mesh agglomeration by Graph Neural Networks | Paola F. Antonietti et.al. | 2504.21780 | null |
| 2025-04-30 | LLM-based Interactive Imitation Learning for Robotic Manipulation | Jonas Werner et.al. | 2504.21769 | null |
| 2025-04-30 | LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning | Yiyang Shao et.al. | 2504.21738 | null |
| 2025-04-30 | Adaptive 3D UI Placement in Mixed Reality Using Deep Reinforcement Learning | Feiyu Lu et.al. | 2504.21731 | null |
| 2025-04-30 | MovementVR: An open-source tool for the study of motor control and learning in virtual reality | Cristina Rossi et.al. | 2504.21696 | null |
| 2025-04-30 | Designing Control Barrier Function via Probabilistic Enumeration for Safe Reinforcement Learning Navigation | Luca Marzari et.al. | 2504.21643 | null |
| 2025-04-30 | Multi-Goal Dexterous Hand Manipulation using Probabilistic Model-based Reinforcement Learning | Yingzhuo Jiang et.al. | 2504.21585 | null |
| 2025-04-30 | SimPRIVE: a Simulation framework for Physical Robot Interaction with Virtual Environments | Federico Nesti et.al. | 2504.21454 | null |
| 2025-04-29 | Toward Efficient Exploration by Large Language Model Agents | Dilip Arumugam et.al. | 2504.20997 | null |
| 2025-04-29 | XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search | Yiting Zhang et.al. | 2504.20969 | null |
| 2025-04-29 | Improvements of Dark Experience Replay and Reservoir Sampling towards Better Balance between Consolidation and Plasticity | Taisuke Kobayashi et.al. | 2504.20932 | null |
| 2025-04-29 | ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Ziqing Fan et.al. | 2504.20930 | link |
| 2025-04-29 | Exploiting inter-agent coupling information for efficient reinforcement learning of cooperative LQR | Shahbaz P Qadri Syed et.al. | 2504.20927 | null |
| 2025-04-29 | A Domain-Agnostic Scalable AI Safety Ensuring Framework | Beomjun Kim et.al. | 2504.20924 | null |
| 2025-04-29 | Reinforcement Learning for LLM Reasoning Under Memory Constraints | Alan Lee et.al. | 2504.20834 | null |
| 2025-04-29 | A Teacher-Student MPC-PPO Coupled Reinforcement Learning Framework for Winter Temperature Control of Solar Greenhouses in Northern China | Jingxin Yu et.al. | 2504.20815 | null |
| 2025-04-29 | SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings | Florian Vahl et.al. | 2504.20808 | null |
| 2025-04-29 | Q-Fusion: Diffusing Quantum Circuits | Collin Beaudoin et.al. | 2504.20794 | null |
| 2025-04-28 | SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | Wufei Ma et.al. | 2504.20024 | null |
| 2025-04-28 | Socially-Aware Autonomous Driving: Inferring Yielding Intentions for Safer Interactions | Jing Wang et.al. | 2504.20004 | null |
| 2025-04-28 | Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Adam Younsi et.al. | 2504.19981 | null |
| 2025-04-28 | Mesh-Learner: Texturing Mesh with Spherical Harmonics | Yunfei Wan et.al. | 2504.19938 | null |
| 2025-04-28 | Automated decision-making for dynamic task assignment at scale | Riccardo Lo Bianco et.al. | 2504.19933 | null |
| 2025-04-28 | GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets | Mingqian He et.al. | 2504.19898 | null |
| 2025-04-28 | Optimizing the Charging of Open Quantum Batteries using Long Short-Term Memory-Driven Reinforcement Learning | Shadab Zakavati et.al. | 2504.19840 | null |
| 2025-04-28 | LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects | Guangyi Liu et.al. | 2504.19838 | link |
| 2025-04-28 | Reinforcement Learning-Based Heterogeneous Multi-Task Optimization in Semantic Broadcast Communications | Zhilin Lu et.al. | 2504.19806 | null |
| 2025-04-28 | Model-based controller assisted domain randomization in deep reinforcement learning: application to nonlinear powertrain control | Heisei Yonezawa et.al. | 2504.19715 | null |
| 2025-04-25 | Generalization Capability for Imitation Learning | Yixiao Wang et.al. | 2504.18538 | null |
| 2025-04-25 | Intelligent Attacks and Defense Methods in Federated Learning-enabled Energy-Efficient Wireless Networks | Han Zhang et.al. | 2504.18519 | null |
| 2025-04-25 | Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation | Peiyuan Jing et.al. | 2504.18453 | null |
| 2025-04-25 | Pushing the boundary on Natural Language Inference | Pablo Miralles-González et.al. | 2504.18376 | null |
| 2025-04-25 | Explainable AI for UAV Mobility Management: A Deep Q-Network Approach for Handover Minimization | Irshad A. Meer et.al. | 2504.18371 | null |
| 2025-04-25 | Deep Reinforcement Learning Based Navigation with Macro Actions and Topological Maps | Simon Hakenes et.al. | 2504.18300 | null |
| 2025-04-25 | Depth-Constrained ASV Navigation with Deep RL and Limited Sensing | Amirhossein Zhalehmehrabi et.al. | 2504.18253 | null |
| 2025-04-25 | Aligning Language Models for Icelandic Legal Text Summarization | Þórir Hrafn Harðarson et.al. | 2504.18180 | null |
| 2025-04-25 | Offline Learning of Controllable Diverse Behaviors | Mathieu Petitbois et.al. | 2504.18160 | null |
| 2025-04-25 | Learning from Less: SINDy Surrogates in RL | Aniket Dixit et.al. | 2504.18113 | null |
| 2025-04-24 | Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control | Haochen Wang et.al. | 2504.17771 | null |
| 2025-04-24 | Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence | Edward Collins et.al. | 2504.17703 | null |
| 2025-04-24 | Applied Sheaf Theory For Multi-agent Artificial Intelligence (Reinforcement Learning) Systems: A Prospectus | Eric Schmid et.al. | 2504.17700 | null |
| 2025-04-24 | SAPO-RL: Sequential Actuator Placement Optimization for Fuselage Assembly via Reinforcement Learning | Peng Ye et.al. | 2504.17603 | null |
| 2025-04-24 | Mitigating xApp conflicts for efficient network slicing in 6G O-RAN: a graph convolutional-based attention network approach | Sihem Bakri et.al. | 2504.17590 | null |
| 2025-04-24 | Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization | Hongshu Guo et.al. | 2504.17578 | null |
| 2025-04-24 | Cooperative Task Offloading through Asynchronous Deep Reinforcement Learning in Mobile Edge Computing for Future Networks | Yuelin Liu et.al. | 2504.17526 | null |
| 2025-04-24 | Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning | Mingqi Yuan et.al. | 2504.17490 | null |
| 2025-04-24 | Comprehend, Divide, and Conquer: Feature Subspace Exploration via Multi-Agent Hierarchical Reinforcement Learning | Weiliang Zhang et.al. | 2504.17356 | null |
| 2025-04-24 | Collaborative Multi-Agent Reinforcement Learning for Automated Feature Transformation with Graph-Driven Path Optimization | Xiaohan Huang et.al. | 2504.17355 | null |
| 2025-04-23 | Latent Diffusion Planning for Imitation Learning | Amber Xie et.al. | 2504.16925 | null |
| 2025-04-23 | Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms | Hsin-Jung Yang et.al. | 2504.16916 | null |
| 2025-04-23 | Hybrid Reinforcement Learning and Model Predictive Control for Adaptive Control of Hydrogen-Diesel Dual-Fuel Combustion | Julian Bedei et.al. | 2504.16875 | null |
| 2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et.al. | 2504.16855 | null |
| 2025-04-23 | SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward | Nicolas Jonason et.al. | 2504.16839 | null |
| 2025-04-23 | MEC Task Offloading in AIoT: A User-Centric DRL Model Splitting Inference Scheme | Weixi Li et.al. | 2504.16729 | null |
| 2025-04-23 | PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation | Wenxuan Li et.al. | 2504.16693 | null |
| 2025-04-23 | Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator | Chenhao Li et.al. | 2504.16680 | null |
| 2025-04-23 | Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning | Chris et.al. | 2504.16656 | link |
| 2025-04-23 | Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models | Fredy Pokou et.al. | 2504.16635 | null |
| 2025-04-22 | TTRL: Test-Time Reinforcement Learning | Yuxin Zuo et.al. | 2504.16084 | link |
| 2025-04-22 | LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities | Thomas Schmied et.al. | 2504.16078 | null |
| 2025-04-22 | Reinforcement Learning and Metaheuristics for Feynman Integral Reduction | Mao Zeng et.al. | 2504.16045 | null |
| 2025-04-22 | The Formation of Production Networks: How Supply Chains Arise from Simple Learning with Minimal Information | Tuong Manh Vu et.al. | 2504.16010 | null |
| 2025-04-22 | Making Neural Networks More Suitable for Approximate Clifford+T Circuit Synthesis | Mathias Weiden et.al. | 2504.15990 | null |
| 2025-04-22 | Neuroadaptive Haptics: Comparing Reinforcement Learning from Explicit Ratings and Neural Signals for Adaptive XR Systems | Lukas Gehrke et.al. | 2504.15984 | null |
| 2025-04-22 | Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning | Wang Lin et.al. | 2504.15932 | null |
| 2025-04-22 | StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation | Yinmin Zhong et.al. | 2504.15930 | null |
| 2025-04-22 | New Recipe for Semi-supervised Community Detection: Clique Annealing under Crystallization Kinetics | Ling Cheng et.al. | 2504.15927 | null |
| 2025-04-22 | GraphEdge: Dynamic Graph Partition and Task Scheduling for GNNs Computing in Edge Network | Wenjing Xiao et.al. | 2504.15905 | null |
| 2025-04-21 | VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | Weiye Xu et.al. | 2504.15279 | null |
| 2025-04-21 | Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Jie Cheng et.al. | 2504.15275 | link |
| 2025-04-21 | FlowReasoner: Reinforcing Query-Level Meta-Agents | Hongcheng Gao et.al. | 2504.15257 | link |
| 2025-04-21 | DRAGON: Distributional Rewards Optimize Diffusion Generative Models | Yatong Bai et.al. | 2504.15217 | null |
| 2025-04-21 | Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs | Marina Sakharova et.al. | 2504.15210 | null |
| 2025-04-21 | Beyond Binary Opinions: A Deep Reinforcement Learning-Based Approach to Uncertainty-Aware Competitive Influence Maximization | Qi Zhang et.al. | 2504.15131 | null |
| 2025-04-21 | A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment | Kangyao Huang et.al. | 2504.15129 | null |
| 2025-04-21 | Fast-Slow Co-advancing Optimizer: Toward Harmonious Adversarial Training of GAN | Lin Wang et.al. | 2504.15099 | null |
| 2025-04-21 | Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL | Simone Papicchio et.al. | 2504.15077 | null |
| 2025-04-21 | Energy-Efficient UAV-Mounted RIS for IoT: A Hybrid Energy Harvesting and DRL Approach | Mahmoud M. Salim et.al. | 2504.15043 | null |
| 2025-04-18 | Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | Yang Yue et.al. | 2504.13837 | null |
| 2025-04-18 | Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Yixuan Even Xu et.al. | 2504.13818 | null |
| 2025-04-18 | DiffOG: Differentiable Policy Trajectory Optimization with Generalizability | Zhengtong Xu et.al. | 2504.13807 | null |
| 2025-04-18 | Imitation Learning with Precisely Labeled Human Demonstrations | Yilong Song et.al. | 2504.13803 | null |
| 2025-04-18 | Bake Two Cakes with One Oven: RL for Defusing Popularity Bias and Cold-start in Third-Party Library Recommendations | Minh Hoang Vuong et.al. | 2504.13772 | null |
| 2025-04-18 | A Reinforcement Learning Method to Factual and Counterfactual Explanations for Session-based Recommendation | Han Zhou et.al. | 2504.13632 | null |
| 2025-04-18 | Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning | Rohan P. Singh et.al. | 2504.13619 | null |
| 2025-04-18 | On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting | Niklas Funk et.al. | 2504.13618 | null |
| 2025-04-18 | Compile Scene Graphs with Reinforcement Learning | Zuyao Chen et.al. | 2504.13617 | null |
| 2025-04-18 | Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling | Zihao Feng et.al. | 2504.13592 | null |
| 2025-04-17 | Energy-Based Reward Models for Robust Language Model Alignment | Anamika Lochab et.al. | 2504.13134 | null |
| 2025-04-17 | LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard | Varun Rao et.al. | 2504.13125 | null |
| 2025-04-17 | SkyReels-V2: Infinite-length Film Generative Model | Guibin Chen et.al. | 2504.13074 | null |
| 2025-04-17 | NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation | Xiangyan Liu et.al. | 2504.13055 | null |
| 2025-04-17 | InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning | Zheng Wang et.al. | 2504.13032 | null |
| 2025-04-17 | QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning? | Zhouyang Jiang et.al. | 2504.12961 | null |
| 2025-04-17 | RL-PINNs: Reinforcement Learning-Driven Adaptive Sampling for Efficient Training of PINNs | Zhenao Song et.al. | 2504.12949 | null |
| 2025-04-17 | Image-Editing Specialists: An RLAIF Approach for Diffusion Models | Elior Benarous et.al. | 2504.12833 | link |
| 2025-04-17 | Multi-Agent Reinforcement Learning Simulation for Environmental Policy Synthesis | James Rudd-Jones et.al. | 2504.12777 | null |
| 2025-04-17 | GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks | Hao Xu et.al. | 2504.12764 | null |
| 2025-04-16 | Adapting a World Model for Trajectory Following in a 3D Game | Marko Tot et.al. | 2504.12299 | null |
| 2025-04-16 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | Siyan Zhao et.al. | 2504.12216 | null |
| 2025-04-16 | Reasoning-Based AI for Startup Evaluation (R.A.I.S.E.): A Memory-Augmented, Multi-Step Decision Framework | Jack Preuveneers et.al. | 2504.12090 | null |
| 2025-04-16 | pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild | Jonas Myhre Schiøtt et.al. | 2504.12045 | null |
| 2025-04-16 | Evolutionary Reinforcement Learning for Interpretable Decision-Making in Supply Chain Management | Stefano Genetti et.al. | 2504.12023 | null |
| 2025-04-16 | Control of Rayleigh-Bénard Convection: Effectiveness of Reinforcement Learning in the Turbulent Regime | Thorben Markmann et.al. | 2504.12000 | null |
| 2025-04-16 | A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs | Kihyuk Hong et.al. | 2504.11997 | null |
| 2025-04-16 | Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions | Yifei Dong et.al. | 2504.11967 | null |
| 2025-04-16 | R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors | Haoyang Wang et.al. | 2504.11946 | null |
| 2025-04-16 | VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning | Xuyang Chen et.al. | 2504.11944 | null |
| 2025-04-15 | DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | Zhiwei He et.al. | 2504.11456 | null |
| 2025-04-15 | A Clean Slate for Offline Reinforcement Learning | Matthew Thomas Jackson et.al. | 2504.11453 | null |
| 2025-04-15 | Embodied World Models Emerge from Navigational Task in Open-Ended Environments | Li Jin et.al. | 2504.11419 | null |
| 2025-04-15 | Measures of Variability for Risk-averse Policy Gradient | Yudong Luo et.al. | 2504.11412 | null |
| 2025-04-15 | Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning | Haiming Wang et.al. | 2504.11354 | null |
| 2025-04-15 | A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | Wei Xiong et.al. | 2504.11343 | null |
| 2025-04-15 | Multi-Agent Reinforcement Learning for Greenhouse Gas Offset Credit Markets | Liam Welsh et.al. | 2504.11258 | null |
| 2025-04-15 | A Rollout-Based Algorithm and Reward Function for Efficient Resource Allocation in Business Processes | Jeroen Middelhuis et.al. | 2504.11250 | null |
| 2025-04-15 | Next-Future: Sample-Efficient Policy Learning for Robotic-Arm Tasks | Fikrican Özgür et.al. | 2504.11247 | null |
| 2025-04-15 | Revealing Covert Attention by Analyzing Human and Reinforcement Learning Agent Gameplay | Henrik Krauss et.al. | 2504.11118 | null |
| 2025-04-14 | Weight Ensembling Improves Reasoning in Language Models | Xingyu Dang et.al. | 2504.10478 | null |
| 2025-04-14 | Co-optimizing Physical Reconfiguration Parameters and Controllers for an Origami-inspired Reconfigurable Manipulator | Zhe Chen et.al. | 2504.10474 | null |
| 2025-04-14 | GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | Xiaobo Xia et.al. | 2504.10458 | null |
| 2025-04-14 | The Communication and Computation Trade-off in Wireless Semantic Communications | Xuyang Chen et.al. | 2504.10357 | null |
| 2025-04-14 | Heimdall: test-time scaling on the generative verification | Wenlei Shi et.al. | 2504.10337 | null |
| 2025-04-14 | Flying Hand: End-Effector-Centric Framework for Versatile Aerial Manipulation Teleoperation and Policy Learning | Guanqi He et.al. | 2504.10334 | null |
| 2025-04-14 | InstructEngine: Instruction-driven Text-to-Image Alignment | Xingyu Lu et.al. | 2504.10329 | null |
| 2025-04-14 | Vision based driving agent for race car simulation environments | Gergely Bári et.al. | 2504.10266 | null |
| 2025-04-14 | Adaptive Sensor Steering Strategy Using Deep Reinforcement Learning for Dynamic Data Acquisition in Digital Twins | Collins O. Ogbodo et.al. | 2504.10248 | null |
| 2025-04-14 | Deep Reasoning Translation via Reinforcement Learning | Jiaan Wang et.al. | 2504.10187 | null |
| 2025-04-11 | Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing | Vinal Asodia et.al. | 2504.08704 | null |
| 2025-04-11 | Pobogot – An Open-Hardware Open-Source Low Cost Robot for Swarm Robotics | Alessia Loi et.al. | 2504.08686 | null |
| 2025-04-11 | Reinforcement Learning-Driven Plant-Wide Refinery Planning Using Model Decomposition | Zhouchang Li et.al. | 2504.08642 | null |
| 2025-04-11 | Neural Fidelity Calibration for Informative Sim-to-Real Adaptation | Youwei Yu et.al. | 2504.08604 | null |
| 2025-04-11 | SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning | Peixian Ma et.al. | 2504.08600 | link |
| 2025-04-11 | Playpen: An Environment for Exploring Learning Through Conversational Interaction | Nicola Horst et.al. | 2504.08590 | null |
| 2025-04-11 | Slicing the Gaussian Mixture Wasserstein Distance | Moritz Piening et.al. | 2504.08544 | null |
| 2025-04-11 | Diffusion Models for Robotic Manipulation: A Survey | Rosa Wolf et.al. | 2504.08438 | null |
| 2025-04-11 | Belief States for Cooperative Multi-Agent Reinforcement Learning under Partial Observability | Paul J. Pritz et.al. | 2504.08417 | null |
| 2025-04-11 | Scalable Conflict-free Decision Making with Photons | Kohei Konaka et.al. | 2504.08331 | null |
| 2025-04-10 | Perception-R1: Pioneering Perception Policy with Reinforcement Learning | En Yu et.al. | 2504.07954 | link |
| 2025-04-10 | Echo: An Open-Source, Low-Cost Teleoperation System with Force Feedback for Dataset Collection in Robot Learning | Artem Bazhenov et.al. | 2504.07939 | null |
| 2025-04-10 | Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining | Rosie Zhao et.al. | 2504.07912 | link |
| 2025-04-10 | Fast Adaptation with Behavioral Foundation Models | Harshit Sikchi et.al. | 2504.07896 | null |
| 2025-04-10 | 2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization | Mengyang Li et.al. | 2504.07856 | null |
| 2025-04-10 | Genetic Programming with Reinforcement Learning Trained Transformer for Real-World Dynamic Scheduling Problems | Xian Chen et.al. | 2504.07779 | null |
| 2025-04-10 | Harnessing Equivariance: Modeling Turbulence with Graph Neural Networks | Marius Kurz et.al. | 2504.07741 | null |
| 2025-04-10 | Relaxing the Markov Requirements on Reinforcement Learning Under Weak Partial Ignorability | MaryLena Bleile et.al. | 2504.07722 | null |
| 2025-04-10 | Sim-to-Real Transfer in Reinforcement Learning for Maneuver Control of a Variable-Pitch MAV | Zhikun Wang et.al. | 2504.07694 | null |
| 2025-04-10 | VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Haozhan Shen et.al. | 2504.07615 | link |
| 2025-04-09 | Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning | Chenjie Hao et.al. | 2504.07095 | link |
| 2025-04-09 | AssistanceZero: Scalably Solving Assistance Games | Cassidy Laidlaw et.al. | 2504.07091 | link |
| 2025-04-09 | A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility | Andreas Hochlehnert et.al. | 2504.07086 | link |
| 2025-04-09 | To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning | Tian Qin et.al. | 2504.07052 | null |
| 2025-04-09 | Free Random Projection for In-Context Reinforcement Learning | Tomohiro Hayase et.al. | 2504.06983 | null |
| 2025-04-09 | VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | Xinhao Li et.al. | 2504.06958 | link |
| 2025-04-09 | Regret Bounds for Robust Online Decision Making | Alexander Appel et.al. | 2504.06820 | null |
| 2025-04-09 | Interactive Expressive Motion Generation Using Dynamic Movement Primitives | Till Hielscher et.al. | 2504.06735 | null |
| 2025-04-09 | Learning global control of underactuated systems with Model-Based Reinforcement Learning | Niccolò Turcato et.al. | 2504.06721 | null |
| 2025-04-09 | SDHN: Skewness-Driven Hypergraph Networks for Enhanced Localized Multi-Robot Coordination | Delin Zhao et.al. | 2504.06684 | null |
| 2025-04-08 | ViTaMIn: Learning Contact-Rich Tasks Through Robot-Free Visuo-Tactile Manipulation Interface | Fangchen Liu et.al. | 2504.06156 | null |
| 2025-04-08 | Adversarial Training of Reward Models | Alexander Bukharin et.al. | 2504.06141 | null |
| 2025-04-08 | A Multimedia Analytics Model for the Foundation Model Era | Marcel Worring et.al. | 2504.06138 | null |
| 2025-04-08 | Accelerating Vehicle Routing via AI-Initialized Genetic Algorithms | Ido Greenberg et.al. | 2504.06126 | null |
| 2025-04-08 | Robo-taxi Fleet Coordination at Scale via Reinforcement Learning | Luigi Tresca et.al. | 2504.06125 | link |
| 2025-04-09 | Leanabell-Prover: Posttraining Scaling in Formal Reasoning | Jingyuan Zhang et.al. | 2504.06122 | link |
| 2025-04-08 | Trust-Region Twisted Policy Improvement | Joery A. de Vries et.al. | 2504.06048 | null |
| 2025-04-08 | Information-Theoretic Reward Decomposition for Generalizable RLHF | Liyuan Mao et.al. | 2504.06020 | null |
| 2025-04-08 | Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models | J. S. van Hulst et.al. | 2504.05978 | null |
| 2025-04-08 | AEGIS: Human Attention-based Explainable Guidance for Intelligent Vehicle Systems | Zhuoli Zhuang et.al. | 2504.05950 | null |
| 2025-04-07 | RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception | Hui Zhang et.al. | 2504.05287 | null |
| 2025-04-07 | Concise Reasoning via Reinforcement Learning | Mehdi Fatemi et.al. | 2504.05185 | link |
| 2025-04-07 | Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval | Kidist Amde Mekonnen et.al. | 2504.05181 | link |
| 2025-04-07 | RLBayes: a Bayesian Network Structure Learning Algorithm via Reinforcement Learning-Based Search Strategy | Mingcan Wang et.al. | 2504.05167 | null |
| 2025-04-07 | A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks | Leonardo Kanashiro Felizardo et.al. | 2504.05150 | link |
| 2025-04-08 | VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks | Yu Yue et.al. | 2504.05118 | null |
| 2025-04-07 | Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning | Anja Surina et.al. | 2504.05108 | null |
| 2025-04-08 | Attention-Augmented Inverse Reinforcement Learning with Graph Convolutions for Multi-Agent Task Allocation | Huilin Yin et.al. | 2504.05045 | null |
| 2025-04-07 | Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning | Bibek Poudel et.al. | 2504.05018 | null |
| 2025-04-07 | Wavelet Policy: Imitation Policy Learning in Frequency Domain with Wavelet Transforms | Changchuan Yang et.al. | 2504.04991 | link |
| 2025-04-04 | Align to Structure: Aligning Large Language Models with Structural Information | Zae Myung Kim et.al. | 2504.03622 | null |
| 2025-04-04 | Optimization of a Triangular Delaunay Mesh Generator using Reinforcement Learning | Will Thacher et.al. | 2504.03610 | null |
| 2025-04-04 | Dexterous Manipulation through Imitation Learning: A Survey | Shan An et.al. | 2504.03515 | null |
| 2025-04-04 | Learning Dual-Arm Coordination for Grasping Large Flat Objects | Yongliang Wang et.al. | 2504.03500 | null |
| 2025-04-04 | Optimizing Quantum Circuits via ZX Diagrams using Reinforcement Learning and Graph Neural Networks | Alexander Mattick et.al. | 2504.03429 | null |
| 2025-04-04 | DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models | Sathish Kumar et.al. | 2504.03423 | null |
| 2025-04-04 | Autonomous state-space segmentation for Deep-RL sparse reward scenarios | Gianluca Maselli et.al. | 2504.03420 | null |
| 2025-04-04 | Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning | Sanghwan Bae et.al. | 2504.03380 | null |
| 2025-04-04 | Verification of Autonomous Neural Car Control with KeYmaera X | Enguerrand Prebet et.al. | 2504.03272 | null |
| 2025-04-04 | Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward | Yanming Wan et.al. | 2504.03206 | null |
| 2025-04-03 | Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets | Chuning Zhu et.al. | 2504.02792 | link |
| 2025-04-03 | A Numerically Efficient Method to Enhance Model Predictive Control Performance with a Reinforcement Learning Policy | Andrea Ghezzi et.al. | 2504.02710 | null |
| 2025-04-03 | Handover and SINR-Aware Path Optimization in 5G-UAV mmWave Communication using DRL | Achilles Kiwanuka Machumilane et.al. | 2504.02688 | null |
| 2025-04-03 | Integrating Human Knowledge Through Action Masking in Reinforcement Learning for Operations Research | Mirko Stappert et.al. | 2504.02662 | null |
| 2025-04-03 | SymDQN: Symbolic Knowledge and Reasoning in Neural Network-based Reinforcement Learning | Ivo Amador et.al. | 2504.02654 | null |
| 2025-04-03 | Solving the Paint Shop Problem with Flexible Management of Multi-Lane Buffers Using Reinforcement Learning and Action Masking | Mirko Stappert et.al. | 2504.02644 | null |
| 2025-04-03 | Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving | Daoguang Zan et.al. | 2504.02605 | link |
| 2025-04-03 | Regulating Spatial Fairness in a Tripartite Micromobility Sharing System via Reinforcement Learning | Matteo Cederle et.al. | 2504.02597 | null |
| 2025-04-03 | LexPam: Legal Procedure Awareness-Guided Mathematical Reasoning | Kepu Zhang et.al. | 2504.02590 | null |
| 2025-04-04 | Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme | Yan Ma et.al. | 2504.02587 | link |
| 2025-04-02 | OpenCodeReasoning: Advancing Data Distillation for Competitive Coding | Wasi Uddin Ahmad et.al. | 2504.01943 | null |
| 2025-04-02 | Overcoming Deceptiveness in Fitness Optimization with Unsupervised Quality-Diversity | Lisa Coiffard et.al. | 2504.01915 | null |
| 2025-04-02 | GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | Yanzhou Su et.al. | 2504.01886 | link |
| 2025-04-02 | Interpreting Emergent Planning in Model-Free Reinforcement Learning | Thomas Bush et.al. | 2504.01871 | null |
| 2025-04-02 | Learning with Imperfect Models: When Multi-step Prediction Mitigates Compounding Error | Anne Somalwar et.al. | 2504.01766 | null |
| 2025-04-03 | Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning | Ke Jiang et.al. | 2504.01719 | null |
| 2025-04-02 | ToM-RL: Reinforcement Learning Unlocks Theory of Mind in Small LLMs | Yi-Long Lu et.al. | 2504.01698 | null |
| 2025-04-02 | 8-DoFs Cable Driven Parallel Robots for Bimanual Teleportation | Hung Hon Cheng et.al. | 2504.01554 | null |
| 2025-04-02 | A Robust Model-Based Approach for Continuous-Time Policy Evaluation with Unknown Lévy Process Dynamics | Qihao Ye et.al. | 2504.01482 | null |
| 2025-04-02 | Probabilistic Curriculum Learning for Goal-Based Reinforcement Learning | Llewyn Salt et.al. | 2504.01459 | null |
| 2025-03-31 | Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Yi Chen et.al. | 2503.24376 | link |
| 2025-03-31 | Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning | Yubo Zhang et.al. | 2503.24296 | null |
| 2025-03-31 | Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model | Jingcheng Hu et.al. | 2503.24290 | link |
| 2025-03-31 | Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning | Jiacheng Lin et.al. | 2503.24289 | link |
| 2025-03-31 | Moving Edge for On-Demand Edge Computing: An Uncertainty-aware Approach | Fangtong Zhou et.al. | 2503.24214 | null |
| 2025-03-31 | Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantees via Constrained Mean-Field Reinforcement Learning | Matej Jusup et.al. | 2503.24183 | link |
| 2025-03-31 | Learning a Canonical Basis of Human Preferences from Binary Ratings | Kailas Vodrahalli et.al. | 2503.24150 | null |
| 2025-03-31 | Reinforcement Learning for Safe Autonomous Two Device Navigation of Cerebral Vessels in Mechanical Thrombectomy | Harry Robertshaw et.al. | 2503.24140 | null |
| 2025-03-31 | Level the Level: Balancing Game Levels for Asymmetric Player Archetypes With Reinforcement Learning | Florian Rupp et.al. | 2503.24099 | null |
| 2025-03-31 | HACTS: a Human-As-Copilot Teleoperation System for Robot Learning | Zhiyuan Xu et.al. | 2503.24070 | null |
| 2025-03-28 | Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Weiqi Li et.al. | 2503.22679 | link |
| 2025-03-28 | Empirical Analysis of Sim-and-Real Cotraining Of Diffusion Policies For Planar Pushing from Pixels | Adam Wei et.al. | 2503.22634 | null |
| 2025-03-28 | Reinforcement Learning for Machine Learning Model Deployment: Evaluating Multi-Armed Bandits in ML Ops Environments | S. Aaron McClendon et.al. | 2503.22595 | null |
| 2025-03-28 | On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations | Rajdeep Singh Hundal et.al. | 2503.22575 | null |
| 2025-03-28 | Robust Offline Imitation Learning Through State-level Trajectory Stitching | Shuze Wang et.al. | 2503.22524 | null |
| 2025-03-28 | Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments | Luke Rowe et.al. | 2503.22496 | null |
| 2025-03-28 | Probabilistic Uncertain Reward Model: A Natural Generalization of Bradley-Terry Reward Model | Wangtao Sun et.al. | 2503.22480 | null |
| 2025-03-28 | Control of Humanoid Robots with Parallel Mechanisms using Kinematic Actuation Models | Victor Lutz et.al. | 2503.22459 | null |
| 2025-03-28 | Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning | Abdullah Vanlioglu et.al. | 2503.22456 | null |
| 2025-03-28 | Reinforcement learning for efficient and robust multi-setpoint and multi-trajectory tracking in bioprocesses | Sebastián Espinel-Ríos et.al. | 2503.22409 | null |
| 2025-03-27 | Video-R1: Reinforcing Video Reasoning in MLLMs | Kaituo Feng et.al. | 2503.21776 | link |
| 2025-03-27 | ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation | Zhicheng Lee et.al. | 2503.21729 | link |
| 2025-03-27 | Collab: Controlled Decoding using Mixture of Agents for LLM Alignment | Souradip Chakraborty et.al. | 2503.21720 | null |
| 2025-03-27 | Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks | Wenqi Zhang et.al. | 2503.21696 | link |
| 2025-03-27 | LLM-Gomoku: A Large Language Model-Based System for Strategic Gomoku with Self-Play and Reinforcement Learning | Hui Wang et.al. | 2503.21683 | null |
| 2025-03-27 | A tale of two goals: leveraging sequentiality in multi-goal scenarios | Olivier Serris et.al. | 2503.21677 | null |
| 2025-03-27 | UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning | Zhengxi Lu et.al. | 2503.21620 | link |
| 2025-03-27 | A Deep Reinforcement Learning-based Approach for Adaptive Handover Protocols | Johannes Voigt et.al. | 2503.21601 | null |
| 2025-03-27 | DATA-WA: Demand-based Adaptive Task Assignment with Dynamic Worker Availability Windows | Jinwen Chen et.al. | 2503.21458 | null |
| 2025-03-27 | On Learning-Based Traffic Monitoring With a Swarm of Drones | Marko Maljkovic et.al. | 2503.21433 | null |
| 2025-03-26 | Understanding R1-Zero-Like Training: A Critical Perspective | Zichen Liu et.al. | 2503.20783 | link |
| 2025-03-27 | Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning | Huajie Tan et.al. | 2503.20752 | link |
| 2025-03-26 | Graph-Enhanced Model-Free Reinforcement Learning Agents for Efficient Power Grid Topological Control | Eloy Anguiano Batanero et.al. | 2503.20688 | null |
| 2025-03-26 | Flip Learning: Weakly Supervised Erase to Segment Nodules in Breast Ultrasound | Yuhao Huang et.al. | 2503.20685 | null |
| 2025-03-26 | Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | Han Wu et.al. | 2503.20641 | link |
| 2025-03-26 | State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning | Zongyuan Zhang et.al. | 2503.20613 | null |
| 2025-03-26 | Optimizing Case-Based Reasoning System for Functional Test Script Generation with Large Language Models | Siyuan Guo et.al. | 2503.20576 | null |
| 2025-03-26 | Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems | Rakesh Nadig et.al. | 2503.20507 | null |
| 2025-03-26 | Multi-agent Uncertainty-Aware Pessimistic Model-Based Reinforcement Learning for Connected Autonomous Vehicles | Ruoqi Wen et.al. | 2503.20462 | null |
| 2025-03-26 | The Crucial Role of Problem Formulation in Real-World Reinforcement Learning | Georg Schäfer et.al. | 2503.20442 | null |
| 2025-03-25 | Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | Xiaoyu Tian et.al. | 2503.19855 | link |
| 2025-03-25 | Optimal Path Planning and Cost Minimization for a Drone Delivery System Via Model Predictive Control | Muhammad Al-Zafar Khan et.al. | 2503.19699 | null |
| 2025-03-25 | Risk-Aware Reinforcement Learning for Autonomous Driving: Improving Safety When Driving through Intersection | Bo Leng et.al. | 2503.19690 | null |
| 2025-03-25 | Learning to chain-of-thought with Jensen’s evidence lower bound | Yunhao Tang et.al. | 2503.19618 | null |
| 2025-03-25 | RL-finetuning LLMs from on- and off-policy data with a single algorithm | Yunhao Tang et.al. | 2503.19612 | null |
| 2025-03-25 | Optimizing Language Models for Inference Time Objectives using Reinforcement Learning | Yunhao Tang et.al. | 2503.19595 | null |
| 2025-03-25 | One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF | Xin Cai et.al. | 2503.19523 | null |
| 2025-03-25 | ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning | Mingyang Chen et.al. | 2503.19470 | link |
| 2025-03-25 | Multi-Agent Deep Reinforcement Learning for Safe Autonomous Driving with RICS-Assisted MEC | Xueyao Zhang et.al. | 2503.19418 | null |
| 2025-03-25 | NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios | Songyi Gao et.al. | 2503.19267 | link |
| 2025-03-24 | Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | Brian R. Bartoldson et.al. | 2503.18929 | link |
| 2025-03-24 | SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Weihao Zeng et.al. | 2503.18892 | link |
| 2025-03-24 | Bootstrapped Model Predictive Control | Yuhang Wang et.al. | 2503.18871 | link |
| 2025-03-24 | Learning Multi-Robot Coordination through Locality-Based Factorized Multi-Agent Actor-Critic Algorithm | Chak Lam Shek et.al. | 2503.18816 | null |
| 2025-03-24 | Sample-Efficient Reinforcement Learning of Koopman eNMPC | Daniel Mayfrank et.al. | 2503.18787 | null |
| 2025-03-24 | Simulation-Driven Balancing of Competitive Game Levels with Reinforcement Learning | Florian Rupp et.al. | 2503.18748 | null |
| 2025-03-24 | RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation | Chengbo Yuan et.al. | 2503.18738 | null |
| 2025-03-24 | FF-SRL: High Performance GPU-Based Surgical Simulation For Robot Learning | Diego Dall’Alba et.al. | 2503.18616 | null |
| 2025-03-24 | Adventurer: Exploration with BiGAN for Deep Reinforcement Learning | Yongshuai Liu et.al. | 2503.18612 | null |
| 2025-03-24 | Reinforcement Learning in Switching Non-Stationary Markov Decision Processes: Algorithms and Convergence Analysis | Mohsen Amiri et.al. | 2503.18607 | null |
| 2025-03-21 | OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | Yihe Deng et.al. | 2503.17352 | link |
| 2025-03-21 | Capturing Individual Human Preferences with Reward Features | André Barreto et.al. | 2503.17338 | null |
| 2025-03-21 | FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | Mingyang Song et.al. | 2503.17287 | link |
| 2025-03-21 | Curriculum RL meets Monte Carlo Planning: Optimization of a Real World Container Management Problem | Abhijeet Pendyala et.al. | 2503.17194 | null |
| 2025-03-21 | Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | Chan Kim et.al. | 2503.17125 | null |
| 2025-03-21 | Neural-Guided Equation Discovery | Jannis Brugger et.al. | 2503.16953 | null |
| 2025-03-21 | A New Segment Routing method with Swap Node Selection Strategy Based on Deep Reinforcement Learning for Software Defined Network | Miao Ye et.al. | 2503.16914 | null |
| 2025-03-21 | Federated Digital Twin Construction via Distributed Sensing: A Game-Theoretic Online Optimization with Overlapping Coalitions | Ruoyang Chen et.al. | 2503.16823 | null |
| 2025-03-21 | BEAC: Imitating Complex Exploration and Task-oriented Behaviors for Invisible Object Nonprehensile Manipulation | Hirotaka Tahara et.al. | 2503.16803 | null |
| 2025-03-21 | Causally Aligned Curriculum Learning | Mingxuan Li et.al. | 2503.16799 | null |
| 2025-03-20 | Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Yang Sui et.al. | 2503.16419 | link |
| 2025-03-20 | RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints | Yiran Qin et.al. | 2503.16408 | null |
| 2025-03-20 | Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming | Minori Narita et.al. | 2503.16371 | null |
| 2025-03-20 | JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse | Muyao Li et.al. | 2503.16365 | link |
| 2025-03-21 | Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Zhaowei Liu et.al. | 2503.16252 | link |
| 2025-03-20 | Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t | Quy-Anh Dang et.al. | 2503.16219 | link |
| 2025-03-20 | Explosive Jumping with Rigid and Articulated Soft Quadrupeds via Example Guided Reinforcement Learning | Georgios Apostolides et.al. | 2503.16197 | null |
| 2025-03-20 | Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning | Yuki Akiyama et.al. | 2503.16192 | null |
| 2025-03-20 | CLS-RL: Image Classification with Rule-Based Reinforcement Learning | Ming Li et.al. | 2503.16188 | link |
| 2025-03-20 | Cultural Alignment in Large Language Models Using Soft Prompt Tuning | Reem I. Masoud et.al. | 2503.16094 | null |
| 2025-03-19 | Learning to Play Piano in the Real World | Yves-Simon Zeulner et.al. | 2503.15481 | null |
| 2025-03-19 | What Makes a Reward Model a Good Teacher? An Optimization Perspective | Noam Razin et.al. | 2503.15477 | link |
| 2025-03-19 | CCDP: Composition of Conditional Diffusion Policies with Guided Sampling | Amirreza Razmjoo et.al. | 2503.15386 | null |
| 2025-03-19 | Online Imitation Learning for Manipulation via Decaying Relative Correction through Teleoperation | Cheng Pan et.al. | 2503.15368 | null |
| 2025-03-19 | Optimizing Decomposition for Optimal Claim Verification | Yining Lu et.al. | 2503.15354 | link |
| 2025-03-19 | aiXcoder-7B-v2: Training LLMs to Fully Utilize the Long Context in Repository-level Code Completion | Jia Li et.al. | 2503.15301 | null |
| 2025-03-19 | Reinforcement Learning for Robust Athletic Intelligence: Lessons from the 2nd ‘AI Olympics with RealAIGym’ Competition | Felix Wiebe et.al. | 2503.15290 | null |
| 2025-03-19 | DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning | Ruowen Zhao et.al. | 2503.15265 | link |
| 2025-03-19 | Partially Observable Reinforcement Learning with Memory Traces | Onno Eberhard et.al. | 2503.15200 | null |
| 2025-03-19 | Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach | Mohamed Hassouna et.al. | 2503.15190 | null |
| 2025-03-18 | DAPO: An Open-Source LLM Reinforcement Learning System at Scale | Qiying Yu et.al. | 2503.14476 | null |
| 2025-03-18 | Pauli Network Circuit Synthesis with Reinforcement Learning | Ayushi Dubal et.al. | 2503.14448 | null |
| 2025-03-18 | Flying in Highly Dynamic Environments with End-to-end Learning Approach | Xiyu Fan et.al. | 2503.14352 | null |
| 2025-03-18 | MANTRA: Enhancing Automated Method-Level Refactoring with Contextual RAG and Multi-Agent LLM Collaboration | Yisen Xu et.al. | 2503.14340 | null |
| 2025-03-18 | Revealing higher-order neural representations with generative artificial intelligence | Hojjat Azimi Asrari et.al. | 2503.14333 | null |
| 2025-03-18 | Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | Nicolas Le Roux et.al. | 2503.14286 | null |
| 2025-03-18 | Integral modelling and Reinforcement Learning control of 3D liquid metal coating on a moving substrate | Fabio Pino et.al. | 2503.14270 | null |
| 2025-03-18 | Automating Experimental Optics with Sample Efficient Machine Learning Methods | Arindam Saha et.al. | 2503.14260 | null |
| 2025-03-18 | Quantization-Free Autoregressive Action Transformer | Ziyad Sheebaelhamd et.al. | 2503.14259 | null |
| 2025-03-18 | CTSAC: Curriculum-Based Transformer Soft Actor-Critic for Goal-Oriented Robot Exploration | Chunyu Yang et.al. | 2503.14254 | null |
| 2025-03-17 | Uncovering Utility Functions from Observed Outcomes | Marta Grzeskiewicz et.al. | 2503.13432 | null |
| 2025-03-17 | FLEX: A Framework for Learning Robot-Agnostic Force-based Skills Involving Sustained Contact Object Manipulation | Shijie Fang et.al. | 2503.13418 | null |
| 2025-03-17 | A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives | Weiqiang Jin et.al. | 2503.13415 | null |
| 2025-03-17 | TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM | Ye Wang et.al. | 2503.13377 | link |
| 2025-03-17 | Agents Play Thousands of 3D Video Games | Zhongwen Xu et.al. | 2503.13356 | null |
| 2025-03-17 | Local-Global Learning of Interpretable Control Policies: The Interface between MPC and Reinforcement Learning | Thomas Banker et.al. | 2503.13289 | null |
| 2025-03-17 | Timing the Match: A Deep Reinforcement Learning Approach for Ride-Hailing and Ride-Pooling Services | Yiman Bao et.al. | 2503.13200 | null |
| 2025-03-17 | A representational framework for learning and encoding structurally enriched trajectories in complex agent environments | Corina Catarau-Cotutiu et.al. | 2503.13194 | null |
| 2025-03-17 | HybridGen: VLM-Guided Hybrid Planning for Scalable Data Generation of Imitation Learning | Wensheng Wang et.al. | 2503.13171 | null |
| 2025-03-17 | Efficient Imitation Under Misspecification | Nicolas Espinosa-Dice et.al. | 2503.13162 | null |
| 2025-03-14 | Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning | Siyuan Huang et.al. | 2503.11646 | null |
| 2025-03-14 | Scaling the Automated Discovery of Quantum Circuits via Reinforcement Learning with Gadgets | Jan Olle et.al. | 2503.11638 | null |
| 2025-03-14 | Unicorn: A Universal and Collaborative Reinforcement Learning Approach Towards Generalizable Network-Wide Traffic Signal Control | Yifeng Zhang et.al. | 2503.11488 | null |
| 2025-03-14 | A Review of DeepSeek Models’ Key Innovative Techniques | Chengen Wang et.al. | 2503.11486 | null |
| 2025-03-14 | Dynamic Obstacle Avoidance with Bounded Rationality Adversarial Reinforcement Learning | Jose-Luis Holgado-Alvarez et.al. | 2503.11467 | null |
| 2025-03-14 | Optimizing 6G Dense Network Deployment for the Metaverse Using Deep Reinforcement Learning | Jie Zhang et.al. | 2503.11449 | null |
| 2025-03-14 | Adaptive Torque Control of Exoskeletons under Spasticity Conditions via Reinforcement Learning | Andrés Chavarrías et.al. | 2503.11433 | null |
| 2025-03-14 | TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation | Hongxiang Zhao et.al. | 2503.11423 | null |
| 2025-03-14 | Reinforcement Learning-Based Controlled Switching Approach for Inrush Current Minimization in Power Transformers | Jone Ugarte Valdivielso et.al. | 2503.11398 | null |
| 2025-03-14 | Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model | Moritz A. Zanger et.al. | 2503.11339 | null |
| 2025-03-13 | NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models | Mert Albaba et.al. | 2503.10626 | null |
| 2025-03-13 | R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization | Yi Yang et.al. | 2503.10615 | link |
| 2025-03-13 | The Lagrangian Method for Solving Constrained Markov Games | Soham Das et.al. | 2503.10561 | null |
| 2025-03-13 | Towards Safe Path Tracking Using the Simplex Architecture | Georg Jäger et.al. | 2503.10559 | null |
| 2025-03-13 | SySLLM: Generating Synthesized Policy Summaries for Reinforcement Learning Agents Using Large Language Models | Sahar Admoni et.al. | 2503.10509 | null |
| 2025-03-13 | Learning Robotic Policy with Imagined Transition: Mitigating the Trade-off between Robustness and Optimality | Wei Xiao et.al. | 2503.10484 | null |
| 2025-03-13 | SortingEnv: An Extendable RL-Environment for an Industrial Sorting Process | Tom Maus et.al. | 2503.10466 | null |
| 2025-03-13 | Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Liang Wen et.al. | 2503.10460 | link |
| 2025-03-13 | Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback | Derun Li et.al. | 2503.10434 | null |
| 2025-03-13 | Towards Constraint-Based Adaptive Hypergraph Learning for Solving Vehicle Routing: An End-to-End Solution | Zhenwei Wang et.al. | 2503.10421 | null |
| 2025-03-12 | Strategyproof Reinforcement Learning from Human Feedback | Thomas Kleine Buening et.al. | 2503.09561 | null |
| 2025-03-12 | Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Bowen Jin et.al. | 2503.09516 | link |
| 2025-03-12 | RESTRAIN: Reinforcement Learning-Based Secure Framework for Trigger-Action IoT Environment | Md Morshed Alam et.al. | 2503.09513 | null |
| 2025-03-12 | Reinforcement Learning is all You Need | Yongsheng Lian et.al. | 2503.09512 | null |
| 2025-03-12 | ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning | Ziyu Wan et.al. | 2503.09501 | link |
| 2025-03-12 | Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data Traffic | Kexuan Wang et.al. | 2503.09391 | null |
| 2025-03-12 | Evaluating Reinforcement Learning Safety and Trustworthiness in Cyber-Physical Systems | Katherine Dearstyne et.al. | 2503.09388 | null |
| 2025-03-12 | Rule-Guided Reinforcement Learning Policy Evaluation and Improvement | Martin Tappler et.al. | 2503.09270 | null |
| 2025-03-12 | Large-scale Regional Traffic Signal Control Based on Single-Agent Reinforcement Learning | Qiang Li et.al. | 2503.09252 | null |
| 2025-03-12 | MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics | Shuguang Chu et.al. | 2503.09203 | null |
| 2025-03-11 | MoE-Loco: Mixture of Experts for Multitask Locomotion | Runhan Huang et.al. | 2503.08564 | null |
| 2025-03-11 | Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies | Chen Xu et.al. | 2503.08558 | null |
| 2025-03-11 | TLA: Tactile-Language-Action Model for Contact-Rich Manipulation | Peng Hao et.al. | 2503.08548 | null |
| 2025-03-11 | GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training | Tong Wei et.al. | 2503.08525 | null |
| 2025-03-11 | Hierarchical Multi Agent DRL for Soft Handovers Between Edge Clouds in Open RAN | F. Giarrè et.al. | 2503.08493 | null |
| 2025-03-11 | Hybrid Deep Reinforcement Learning for Radio Tracer Localisation in Robotic-assisted Radioguided Surgery | Hanyi Zhang et.al. | 2503.08492 | null |
| 2025-03-12 | An Autonomous RL Agent Methodology for Dynamic Web UI Testing in a BDD Framework | Ali Hassaan Mughal et.al. | 2503.08464 | null |
| 2025-03-11 | V-Max: Making RL practical for Autonomous Driving | Valentin Charraut et.al. | 2503.08388 | link |
| 2025-03-11 | Gait in Eight: Efficient On-Robot Learning for Omnidirectional Quadruped Locomotion | Nico Bohlinger et.al. | 2503.08375 | null |
| 2025-03-11 | LiPS: Large-Scale Humanoid Robot Reinforcement Learning with Parallel-Series Structures | Qiang Zhang et.al. | 2503.08349 | null |
| 2025-03-10 | Is a Good Foundation Necessary for Efficient Reinforcement Learning? The Computational Role of the Base Model in Exploration | Dylan J. Foster et.al. | 2503.07453 | null |
| 2025-03-10 | DRESS: Diffusion Reasoning-based Reward Shaping Scheme For Intelligent Networks | Feiran You et.al. | 2503.07433 | null |
| 2025-03-10 | The Interplay of AI-and-RAN: Dynamic Resource Allocation for Converged 6G Platform | Syed Danial Ali Shah et.al. | 2503.07420 | null |
| 2025-03-10 | Cost-Effective Design of Grid-tied Community Microgrid | Moslem Uddin et.al. | 2503.07414 | null |
| 2025-03-10 | PER-DPP Sampling Framework and Its Application in Path Planning | Junzhe Wang et.al. | 2503.07411 | null |
| 2025-03-10 | Towards Safe Robot Foundation Models | Maximilian Tölle et.al. | 2503.07404 | null |
| 2025-03-10 | Q-MARL: A quantum-inspired algorithm using neural message passing for large-scale multi-agent reinforcement learning | Kha Vo et.al. | 2503.07397 | null |
| 2025-03-10 | AttentionSwarm: Reinforcement Learning with Attention Control Barier Function for Crazyflie Drones in Dynamic Environments | Grik Tadevosyan et.al. | 2503.07376 | null |
| 2025-03-10 | MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning | Fanqing Meng et.al. | 2503.07365 | link |
| 2025-03-10 | Artificial Utopia: Simulation and Intelligent Agents for a Democratised Future | Yannick Oswald et.al. | 2503.07364 | null |
| 2025-03-07 | Multi-Fidelity Policy Gradient Algorithms | Xinjie Liu et.al. | 2503.05696 | null |
| 2025-03-07 | dARt Vinci: Egocentric Data Collection for Surgical Robot Learning at Scale | Yihao Liu et.al. | 2503.05646 | null |
| 2025-03-07 | R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | Huatong Song et.al. | 2503.05592 | null |
| 2025-03-07 | InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model | Feeza Khan Khanzada et.al. | 2503.05573 | null |
| 2025-03-07 | Tractable Representations for Convergent Approximation of Distributional HJB Equations | Julie Alhosh et.al. | 2503.05563 | null |
| 2025-03-07 | Impoola: The Power of Average Pooling for Image-Based Deep Reinforcement Learning | Raphael Trumpp et.al. | 2503.05546 | null |
| 2025-03-07 | RiLoCo: An ISAC-oriented AI Solution to Build RIS-empowered Networks | Guillermo Encinas-Lago et.al. | 2503.05480 | null |
| 2025-03-07 | Controllable Complementarity: Subjective Preferences in Human-AI Collaboration | Chase McDonald et.al. | 2503.05455 | null |
| 2025-03-07 | R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning | Jiaxing Zhao et.al. | 2503.05379 | null |
| 2025-03-07 | Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning | Hyungkyu Kang et.al. | 2503.05306 | null |
| 2025-03-06 | Sample-Optimal Agnostic Boosting with Unlabeled Data | Udaya Ghai et.al. | 2503.04706 | null |
| 2025-03-06 | L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | Pranjal Aggarwal et.al. | 2503.04697 | null |
| 2025-03-06 | Multi-Agent Inverse Q-Learning from Demonstrations | Nathaniel Haynam et.al. | 2503.04679 | null |
| 2025-03-06 | Learning Generalizable Language-Conditioned Cloth Manipulation from Long Demonstrations | Hanyi Zhao et.al. | 2503.04557 | null |
| 2025-03-06 | PALo: Learning Posture-Aware Locomotion for Quadruped Robots | Xiangyu Miao et.al. | 2503.04462 | null |
| 2025-03-06 | AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services | Xiaoqi Wang et.al. | 2503.04418 | null |
| 2025-03-06 | Learning Transformer-based World Models with Contrastive Predictive Coding | Maxime Burchi et.al. | 2503.04416 | null |
| 2025-03-06 | Energy-Aware Task Offloading for Rotatable STAR-RIS-Enhanced Mobile Edge Computing Systems | Dongdong Yang et.al. | 2503.04397 | null |
| 2025-03-06 | Delay-Aware Digital Twin Synchronization in Mobile Edge Networks with Semantic Communications | Bin Li et.al. | 2503.04387 | null |
| 2025-03-06 | Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models | Niccolò Turcato et.al. | 2503.04280 | null |
| 2025-03-05 | Curating Demonstrations using Online Experience | Annie S. Chen et.al. | 2503.03707 | null |
| 2025-03-05 | A Generative Approach to High Fidelity 3D Reconstruction from Text Data | Venkat Kumar R et.al. | 2503.03664 | null |
| 2025-03-05 | Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns | Dong Tian et.al. | 2503.03660 | null |
| 2025-03-05 | Improving Neutral Point of View Text Generation through Parameter-Efficient Reinforcement Learning and a Small-Scale High-Quality Dataset | Jessica Hoffmann et.al. | 2503.03654 | null |
| 2025-03-05 | Olympus: A Jumping Quadruped for Planetary Exploration Utilizing Reinforcement Learning for In-Flight Attitude Control | Jørgen Anker Olsen et.al. | 2503.03574 | null |
| 2025-03-05 | Probabilistic Insights for Efficient Exploration Strategies in Reinforcement Learning | Ernesto Garcia et.al. | 2503.03565 | null |
| 2025-03-05 | DO-IQS: Dynamics-Aware Offline Inverse Q-Learning for Optimal Stopping with Unknown Gain Functions | Anna Kuchko et.al. | 2503.03515 | null |
| 2025-03-05 | SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning | Borong Zhang et.al. | 2503.03480 | null |
| 2025-03-05 | Continuous Control of Diverse Skills in Quadruped Robots Without Complete Expert Datasets | Jiaxin Tu et.al. | 2503.03476 | null |
| 2025-03-05 | Navigating Intelligence: A Survey of Google OR-Tools and Machine Learning for Global Path Planning in Autonomous Vehicles | Alexandre Benoit et.al. | 2503.03338 | null |
| 2025-03-04 | Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation | Han Xue et.al. | 2503.02881 | null |
| 2025-03-04 | AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation | Songming Zhang et.al. | 2503.02832 | null |
| 2025-03-04 | Meta-Learning to Explore via Memory Density Feedback | Kevin L. McKee et.al. | 2503.02831 | null |
| 2025-03-04 | Quantitative Resilience Modeling for Autonomous Cyber Defense | Xavier Cadet et.al. | 2503.02780 | null |
| 2025-03-04 | Variable-Friction In-Hand Manipulation for Arbitrary Objects via Diffusion-Based Imitation Learning | Qiyang Yan et.al. | 2503.02738 | null |
| 2025-03-04 | Learning-Based Passive Fault-Tolerant Control of a Quadrotor with Rotor Failure | Jiehao Chen et.al. | 2503.02649 | null |
| 2025-03-04 | Human-aligned Safe Reinforcement Learning for Highway On-Ramp Merging in Dense Traffic | Yang Li et.al. | 2503.02624 | null |
| 2025-03-04 | Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models | Paul Stangel et.al. | 2503.02623 | null |
| 2025-03-04 | Reinforcement Learning-based Threat Assessment | Wuzhou Sun et.al. | 2503.02612 | null |
| 2025-03-04 | What Makes a Model Breathe? Understanding Reinforcement Learning Reward Function Design in Biomechanical User Simulation | Hannah Selder et.al. | 2503.02571 | null |
| 2025-02-28 | LLM Post-Training: A Deep Dive into Reasoning Large Language Models | Komal Kumar et.al. | 2502.21321 | null |
| 2025-02-28 | ReaLJam: Real-Time Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers | Alexander Scarlatos et.al. | 2502.21267 | null |
| 2025-02-28 | ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs | Hao Ge et.al. | 2502.21231 | null |
| 2025-02-28 | A Method of Selective Attention for Reservoir Based Agents | Kevin McKee et.al. | 2502.21229 | null |
| 2025-02-28 | Reducing Reward Dependence in RL Through Adaptive Confidence Discounting | Muhammed Yusuf Satici et.al. | 2502.21181 | null |
| 2025-02-28 | Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning | Léopold Maytié et.al. | 2502.21142 | null |
| 2025-02-28 | Dynamically Local-Enhancement Planner for Large-Scale Autonomous Driving | Nanshan Deng et.al. | 2502.21134 | null |
| 2025-02-28 | AuthSim: Towards Authentic and Effective Safety-critical Scenario Generation for Autonomous Driving Tests | Yukuan Yang et.al. | 2502.21100 | null |
| 2025-02-28 | Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control | Taeho Lee et.al. | 2502.21057 | null |
| 2025-02-28 | Motion ReTouch: Motion Modification Using Four-Channel Bilateral Control | Koki Inami et.al. | 2502.20982 | null |
| 2025-02-27 | Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids | Toru Lin et.al. | 2502.20396 | null |
| 2025-02-27 | Multi-Turn Code Generation Through Single-Step Rewards | Arnav Kumar Jain et.al. | 2502.20380 | null |
| 2025-02-27 | The Role of Tactile Sensing for Learning Reach and Grasp | Boya Zhang et.al. | 2502.20367 | null |
| 2025-02-27 | Improving the Efficiency of a Deep Reinforcement Learning-Based Power Management System for HPC Clusters Using Curriculum Learning | Thomas Budiarjo et.al. | 2502.20348 | null |
| 2025-02-27 | Safety Representations for Safer Policy Learning | Kaustubh Mani et.al. | 2502.20341 | null |
| 2025-02-27 | Deep Reinforcement Learning based Autonomous Decision-Making for Cooperative UAVs: A Search and Rescue Real World Application | Thomas Hickling et.al. | 2502.20326 | null |
| 2025-02-27 | On the Importance of Reward Design in Reinforcement Learning-based Dynamic Algorithm Configuration: A Case Study on OneMax with (1+($λ$,$λ$))-GA | Tai Nguyen et.al. | 2502.20265 | null |
| 2025-02-27 | Explainable physics-based constraints on reinforcement learning for accelerator controls | Jonathan Colen et.al. | 2502.20247 | null |
| 2025-02-27 | MARVEL: Multi-Agent Reinforcement Learning for constrained field-of-View multi-robot Exploration in Large-scale environments | Jimmy Chiun et.al. | 2502.20217 | null |
| 2025-02-27 | Highly Parallelized Reinforcement Learning Training with Relaxed Assignment Dependencies | Zhouyu He et.al. | 2502.20190 | null |
| 2025-02-26 | Recurrent Auto-Encoders for Enhanced Deep Reinforcement Learning in Wilderness Search and Rescue Planning | Jan-Hendrik Ewers et.al. | 2502.19356 | null |
| 2025-02-26 | Hybrid Robot Learning for Automatic Robot Motion Planning in Manufacturing | Siddharth Singh et.al. | 2502.19340 | null |
| 2025-02-26 | WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies | William Solow et.al. | 2502.19308 | null |
| 2025-02-26 | Combining Planning and Reinforcement Learning for Solving Relational Multiagent Domains | Nikhilesh Prabhakar et.al. | 2502.19297 | null |
| 2025-02-26 | Deep Computerized Adaptive Testing | Jiguang Li et.al. | 2502.19275 | null |
| 2025-02-26 | Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective | Jiawei Huang et.al. | 2502.19255 | null |
| 2025-02-26 | ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration | Minjie Zhu et.al. | 2502.19250 | null |
| 2025-02-26 | Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time | Jiazheng Li et.al. | 2502.19230 | null |
| 2025-02-26 | When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning | Yijiang River Dong et.al. | 2502.19158 | null |
| 2025-02-26 | Policy Testing with MDPFuzz (Replicability Study) | Quentin Mazouni et.al. | 2502.19116 | null |
| 2025-02-25 | SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | Yuxiang Wei et.al. | 2502.18449 | null |
| 2025-02-25 | MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning | Chanwoo Park et.al. | 2502.18439 | null |
| 2025-02-25 | Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand | Fengshuo Bai et.al. | 2502.18423 | null |
| 2025-02-25 | Enhancing Reusability of Learned Skills for Robot Manipulation via Gaze and Bottleneck | Ryo Takizawa et.al. | 2502.18121 | null |
| 2025-02-25 | Controlling dynamics of stochastic systems with deep reinforcement learning | Ruslan Mukhamadiarov et.al. | 2502.18111 | null |
| 2025-02-25 | From planning to policy: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation | Haewon Jung et.al. | 2502.18015 | null |
| 2025-02-25 | NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms | Yashan Wang et.al. | 2502.18008 | null |
| 2025-02-25 | Provable Performance Bounds for Digital Twin-driven Deep Reinforcement Learning in Wireless Networks: A Novel Digital-Twin Bisimulation Metric | Zhenyu Tao et.al. | 2502.17983 | null |
| 2025-02-25 | FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real | Weiheng Liu et.al. | 2502.17894 | null |
| 2025-02-25 | Sample-efficient diffusion-based control of complex nonlinear systems | Hongyi Chen et.al. | 2502.17893 | null |
| 2025-02-24 | Event-Based Limit Order Book Simulation under a Neural Hawkes Process: Application in Market-Making | Luca Lalor et.al. | 2502.17417 | null |
| 2025-02-24 | Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Alon Albalak et.al. | 2502.17387 | link |
| 2025-02-24 | Distributed Coordination for Heterogeneous Non-Terrestrial Networks | Jikang Deng et.al. | 2502.17366 | null |
| 2025-02-24 | TDMPBC: Self-Imitative Reinforcement Learning for Humanoid Robot Control | Zifeng Zhuang et.al. | 2502.17322 | null |
| 2025-02-24 | Survey on Strategic Mining in Blockchain: A Reinforcement Learning Approach | Jichen Li et.al. | 2502.17307 | null |
| 2025-02-24 | A Reinforcement Learning Approach to Non-prehensile Manipulation through Sliding | Hamidreza Raei et.al. | 2502.17221 | null |
| 2025-02-24 | Humanoid Whole-Body Locomotion on Narrow Terrain via Dynamic Balance and Reinforcement Learning | Weiji Xie et.al. | 2502.17219 | null |
| 2025-02-24 | Teleology-Driven Affective Computing: A Causal Framework for Sustained Well-Being | Bin Yin et.al. | 2502.17172 | null |
| 2025-02-24 | A Novel Multiple Access Scheme for Heterogeneous Wireless Communications using Symmetry-aware Continual Deep Reinforcement Learning | Hamidreza Mazandarani et.al. | 2502.17167 | null |
| 2025-02-24 | MA2RL: Masked Autoencoders for Generalizable Multi-Agent Reinforcement Learning | Jinyuan Feng et.al. | 2502.17046 | null |
| 2025-02-21 | BOSS: Benchmark for Observation Space Shift in Long-Horizon Task | Yue Yang et.al. | 2502.15679 | null |
| 2025-02-21 | VaViM and VaVAM: Autonomous Driving through Video Generative Modeling | Florent Bartoccioni et.al. | 2502.15672 | link |
| 2025-02-21 | Automating Curriculum Learning for Reinforcement Learning using a Skill-Based Bayesian Network | Vincent Hsiao et.al. | 2502.15662 | null |
| 2025-02-21 | A Simulation Pipeline to Facilitate Real-World Robotic Reinforcement Learning Applications | Jefferson Silveira et.al. | 2502.15649 | null |
| 2025-02-21 | Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach | Xiangtong Yao et.al. | 2502.15613 | null |
| 2025-02-21 | SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning | Xuyang Li et.al. | 2502.15512 | null |
| 2025-02-21 | Learning Long-Horizon Robot Manipulation Skills via Privileged Action | Xiaofeng Mao et.al. | 2502.15442 | null |
| 2025-02-21 | TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning | Giuseppe Paolo et.al. | 2502.15425 | null |
| 2025-02-21 | Hyperspherical Normalization for Scalable Deep Reinforcement Learning | Hojoon Lee et.al. | 2502.15280 | null |
| 2025-02-21 | CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models | Shunchang Liu et.al. | 2502.15278 | null |
| 2025-02-20 | Generating $π$-Functional Molecules Using STGG+ with Active Learning | Alexia Jolicoeur-Martineau et.al. | 2502.14842 | link |
| 2025-02-20 | Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models | Vlad Sobal et.al. | 2502.14819 | null |
| 2025-02-20 | Making Universal Policies Universal | Niklas Höpner et.al. | 2502.14777 | null |
| 2025-02-20 | Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Tian Xie et.al. | 2502.14768 | link |
| 2025-02-20 | Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse | Michael Doherty et.al. | 2502.14741 | null |
| 2025-02-20 | Length-Controlled Margin-Based Preference Optimization without Reference Model | Gengxu Li et.al. | 2502.14643 | null |
| 2025-02-20 | Curiosity Driven Multi-agent Reinforcement Learning for 3D Game Testing | Raihana Ferdous et.al. | 2502.14606 | null |
| 2025-02-20 | ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification | Hyunseok Lee et.al. | 2502.14565 | link |
| 2025-02-20 | MLGym: A New Framework and Benchmark for Advancing AI Research Agents | Deepak Nathani et.al. | 2502.14499 | link |
| 2025-02-20 | Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization | Zhitao He et.al. | 2502.14496 | link |
| 2025-02-19 | A Training-Free Framework for Precise Mobile Manipulation of Small Everyday Objects | Arjun Gupta et.al. | 2502.13964 | null |
| 2025-02-19 | Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks | Guilherme Palma et.al. | 2502.13918 | null |
| 2025-02-19 | Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning | Antoine Moulin et.al. | 2502.13900 | null |
| 2025-02-19 | NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants | Yiran Qin et.al. | 2502.13894 | null |
| 2025-02-19 | Uncertainty quantification for Markov chains with application to temporal difference learning | Weichen Wu et.al. | 2502.13822 | null |
| 2025-02-19 | Learning to explore when mistakes are not allowed | Charly Pecqueux-Guézénec et.al. | 2502.13801 | null |
| 2025-02-19 | User Agency and System Automation in Interactive Intelligent Systems | Thomas Langerak et.al. | 2502.13779 | null |
| 2025-02-19 | Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values | Hongbo Zhang et.al. | 2502.13723 | null |
| 2025-02-19 | Hierarchical RL-MPC for Demand Response Scheduling | Maximilian Bloor et.al. | 2502.13714 | null |
| 2025-02-19 | User Association and Coordinated Beamforming in Cognitive Aerial-Terrestrial Networks: A Safe Reinforcement Learning Approach | Zizhen Zhou et.al. | 2502.13663 | null |
| 2025-02-18 | Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Shuo Xing et.al. | 2502.13146 | link |
| 2025-02-18 | RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning | Hao Gao et.al. | 2502.13144 | link |
| 2025-02-18 | Theorem Prover as a Judge for Synthetic Data Generation | Joshua Ong Jun Leang et.al. | 2502.13137 | null |
| 2025-02-18 | Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | Mengkang Hu et.al. | 2502.13092 | link |
| 2025-02-18 | Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation | Sha Li et.al. | 2502.13019 | null |
| 2025-02-18 | HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit | Qingwei Ben et.al. | 2502.13013 | link |
| 2025-02-18 | Integrating Reinforcement Learning, Action Model Learning, and Numeric Planning for Tackling Complex Tasks | Yarin Benyamin et.al. | 2502.13006 | link |
| 2025-02-18 | Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options | Lakshmi Nair et.al. | 2502.12929 | link |
| 2025-02-18 | Continuous Learning Conversational AI: A Personalized Agent Framework via A2C Reinforcement Learning | Nandakishor M et.al. | 2502.12876 | null |
| 2025-02-18 | A Survey on DRL based UAV Communications and Networking: DRL Fundamentals, Applications and Implementations | Wei Zhao et.al. | 2502.12875 | null |
| 2025-02-17 | Scaling Test-Time Compute Without Verification or RL is Suboptimal | Amrith Setlur et.al. | 2502.12118 | null |
| 2025-02-17 | Unhackable Temporal Rewarding for Scalable Video MLLMs | En Yu et.al. | 2502.12081 | link |
| 2025-02-17 | How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | Ayan Sengupta et.al. | 2502.12051 | null |
| 2025-02-17 | Theoretical Barriers in Bellman-Based Reinforcement Learning | Brieuc Pinon et.al. | 2502.11968 | null |
| 2025-02-17 | Massively Scaling Explicit Policy-conditioned Value Functions | Nico Bohlinger et.al. | 2502.11949 | null |
| 2025-02-17 | FitLight: Federated Imitation Learning for Plug-and-Play Autonomous Traffic Signal Control | Yutong Ye et.al. | 2502.11937 | null |
| 2025-02-17 | VLP: Vision-Language Preference Learning for Embodied Manipulation | Runze Liu et.al. | 2502.11918 | null |
| 2025-02-17 | CAMEL: Continuous Action Masking Enabled by Large Language Models for Reinforcement Learning | Yanxiao Zhao et.al. | 2502.11896 | null |
| 2025-02-17 | Does Knowledge About Perceptual Uncertainty Help an Agent in Automated Driving? | Natalie Grabowsky et.al. | 2502.11864 | null |
| 2025-02-17 | Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces | Eric Eaton et.al. | 2502.11828 | null |
| 2025-02-14 | BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds | Huayi Wang et.al. | 2502.10363 | null |
| 2025-02-14 | Reinforcement Learning in Strategy-Based and Atari Games: A Review of Google DeepMinds Innovations | Abdelrhman Shaheen et.al. | 2502.10303 | null |
| 2025-02-14 | Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding | Laurin Luttmann et.al. | 2502.10233 | null |
| 2025-02-14 | Dynamic Reinforcement Learning for Actors | Katsunari Shibata et.al. | 2502.10200 | null |
| 2025-02-14 | Reinforcement Learning based Constrained Optimal Control: an Interpretable Reward Design | Jingjie Ni et.al. | 2502.10187 | null |
| 2025-02-14 | Combinatorial Reinforcement Learning with Preference Feedback | Joongkyu Lee et.al. | 2502.10158 | null |
| 2025-02-14 | MonoForce: Learnable Image-conditioned Physics Engine | Ruslan Agishev et.al. | 2502.10156 | null |
| 2025-02-14 | Cooperative Multi-Agent Planning with Adaptive Skill Synthesis | Zhiyuan Li et.al. | 2502.10148 | null |
| 2025-02-14 | Provably Efficient RL under Episode-Wise Safety in Linear CMDPs | Toshinori Kitamura et.al. | 2502.10138 | null |
| 2025-02-14 | Causal Information Prioritization for Efficient Reinforcement Learning | Hongye Cao et.al. | 2502.10097 | null |
| 2025-02-13 | DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References | Xueyi Liu et.al. | 2502.09614 | link |
| 2025-02-13 | Coupled Rendezvous and Docking Maneuver control of satellite using Reinforcement learning-based Adaptive Fixed-Time Sliding Mode Controller | Rakesh Kumar Sahoo et.al. | 2502.09517 | null |
| 2025-02-13 | Variable Stiffness for Robust Locomotion through Reinforcement Learning | Dario Spoljaric et.al. | 2502.09436 | null |
| 2025-02-13 | A Survey of Reinforcement Learning for Optimization in Automation | Ahmad Farooq et.al. | 2502.09417 | null |
| 2025-02-13 | Generalizable Reinforcement Learning with Biologically Inspired Hyperdimensional Occupancy Grid Maps for Exploration and Goal-Directed Path Planning | Shay Snyder et.al. | 2502.09393 | null |
| 2025-02-13 | Machine learning for modelling unstructured grid data in computational physics: a review | Sibo Cheng et.al. | 2502.09346 | null |
| 2025-02-13 | Revisiting Topological Interference Management: A Learning-to-Code on Graphs Perspective | Zhiwei Shan et.al. | 2502.09344 | null |
| 2025-02-13 | Convex Is Back: Solving Belief MDPs With Convexity-Informed Deep Reinforcement Learning | Daniel Koutas et.al. | 2502.09298 | null |
| 2025-02-13 | Autonomous Task Completion Based on Goal-directed Answer Set Programming | Alexis R. Tudor et.al. | 2502.09208 | null |
| 2025-02-13 | Logical Reasoning in Large Language Models: A Survey | Hanmeng Liu et.al. | 2502.09100 | link |
| 2025-02-12 | Re$^3$Sim: Generating High-Fidelity Simulation Data via 3D-Photorealistic Real-to-Sim for Robotic Manipulation | Xiaoshen Han et.al. | 2502.08645 | link |
| 2025-02-12 | A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards | Shivansh Patel et.al. | 2502.08643 | null |
| 2025-02-12 | Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning | Dhruv Rohatgi et.al. | 2502.08632 | null |
| 2025-02-12 | Robot Data Curation with Mutual Information Estimators | Joey Hejna et.al. | 2502.08623 | null |
| 2025-02-12 | Learning to Group and Grasp Multiple Objects | Takahiro Yonemaru et.al. | 2502.08452 | null |
| 2025-02-12 | CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World | Yankai Fu et.al. | 2502.08449 | null |
| 2025-02-12 | Acceleration of crystal structure relaxation with Deep Reinforcement Learning | Elena Trukhan et.al. | 2502.08405 | null |
| 2025-02-12 | Learning Humanoid Standing-up Control across Diverse Postures | Tao Huang et.al. | 2502.08378 | link |
| 2025-02-12 | Towards Principled Multi-Agent Task Agnostic Exploration | Riccardo Zamboni et.al. | 2502.08365 | null |
| 2025-02-12 | Deterministic generation of non-classical mechanical states in cavity optomechanics via reinforcement learning | Yu-Hong Liu et.al. | 2502.08350 | null |
| 2025-02-11 | Polynomial-Time Approximability of Constrained Reinforcement Learning | Jeremy McMahan et.al. | 2502.07764 | null |
| 2025-02-11 | DOGlove: Dexterous Manipulation with a Low-Cost Open-Source Haptic Force Feedback Glove | Han Zhang et.al. | 2502.07730 | null |
| 2025-02-11 | Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning | Aya Kayal et.al. | 2502.07715 | null |
| 2025-02-11 | A Unifying Framework for Causal Imitation Learning with Hidden Confounders | Daqian Shao et.al. | 2502.07656 | null |
| 2025-02-11 | Beyond Behavior Cloning: Robustness through Interactive Imitation and Contrastive Learning | Zhaoting Li et.al. | 2502.07645 | null |
| 2025-02-11 | Distributed Value Decomposition Networks with Networked Agents | Guilherme S. Varela et.al. | 2502.07635 | null |
| 2025-02-11 | Evolution of cooperation in a bimodal mixture of conditional cooperators | Chenyang Zhao et.al. | 2502.07537 | null |
| 2025-02-11 | Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization | Daniel Palenicek et.al. | 2502.07523 | null |
| 2025-02-11 | Logarithmic Regret for Online KL-Regularized Reinforcement Learning | Heyang Zhao et.al. | 2502.07460 | null |
| 2025-02-11 | Towards a Formal Theory of the Need for Competence via Computational Intrinsic Motivation | Erik M. Lintunen et.al. | 2502.07423 | null |
| 2025-02-10 | Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Chengqi Lyu et.al. | 2502.06781 | link |
| 2025-02-10 | On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | Guanghao Ye et.al. | 2502.06773 | link |
| 2025-02-10 | ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Ling Yang et.al. | 2502.06772 | link |
| 2025-02-10 | AgilePilot: DRL-Based Drone Agent for Real-Time Motion Planning in Dynamic Environments by Leveraging Object Detection | Roohan Ahmed Khan et.al. | 2502.06725 | null |
| 2025-02-10 | Discovery of skill switching criteria for learning agile quadruped locomotion | Wanming Yu et.al. | 2502.06676 | null |
| 2025-02-10 | Deep Reinforcement Learning based Triggering Function for Early Classifiers of Time Series | Aurélien Renault et.al. | 2502.06584 | null |
| 2025-02-10 | Predictive Red Teaming: Breaking Policies Without Breaking Robots | Anirudha Majumdar et.al. | 2502.06575 | null |
| 2025-02-10 | Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning | Jean Vassoyan et.al. | 2502.06533 | link |
| 2025-02-10 | Model-Based Offline Reinforcement Learning with Reliability-Guaranteed Sequence Modeling | Shenghong He et.al. | 2502.06491 | null |
| 2025-02-10 | SIGMA: Sheaf-Informed Geometric Multi-Agent Pathfinding | Shuhao Liao et.al. | 2502.06440 | null |
| 2025-02-07 | DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails | Yihe Deng et.al. | 2502.05163 | link |
| 2025-02-07 | Use of Winsome Robots for Understanding Human Feedback (UWU) | Jessica Eggers et.al. | 2502.05118 | null |
| 2025-02-07 | 3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery | Xiuyuan Hu et.al. | 2502.05107 | link |
| 2025-02-07 | Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Tushar Pandey et.al. | 2502.05078 | link |
| 2025-02-07 | Exploring the Generalizability of Geomagnetic Navigation: A Deep Reinforcement Learning approach with Policy Distillation | Wenqi Bai et.al. | 2502.05069 | null |
| 2025-02-07 | Seasonal Station-Keeping of Short Duration High Altitude Balloons using Deep Reinforcement Learning | Tristan K. Schuler et.al. | 2502.05014 | null |
| 2025-02-07 | A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning Enhanced Approach | Taiyi Wang et.al. | 2502.05001 | null |
| 2025-02-07 | Enhancing Pre-Trained Decision Transformers with Prompt-Tuning Bandits | Finn Rietz et.al. | 2502.04979 | null |
| 2025-02-07 | Towards Smarter Sensing: 2D Clutter Mitigation in RL-Driven Cognitive MIMO Radar | Adam Umra et.al. | 2502.04967 | null |
| 2025-02-07 | Fast Adaptive Anti-Jamming Channel Access via Deep Q Learning and Coarse-Grained Spectrum Prediction | Jianshu Zhang et.al. | 2502.04963 | null |
| 2025-02-06 | DexterityGen: Foundation Controller for Unprecedented Dexterity | Zhao-Heng Yin et.al. | 2502.04307 | null |
| 2025-02-06 | PILAF: Optimal Human Preference Sampling for Reward Modeling | Yunzhen Feng et.al. | 2502.04270 | null |
| 2025-02-06 | Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning | Wesley A. Suttle et.al. | 2502.04141 | null |
| 2025-02-06 | Simulating the Emergence of Differential Case Marking with Communicating Neural-Network Agents | Yuchen Lian et.al. | 2502.04038 | null |
| 2025-02-06 | Deep Meta Coordination Graphs for Multi-agent Reinforcement Learning | Nikunj Gupta et.al. | 2502.04028 | link |
| 2025-02-06 | Bilevel Multi-Armed Bandit-Based Hierarchical Reinforcement Learning for Interaction-Aware Self-Driving at Unsignalized Intersections | Zengqi Peng et.al. | 2502.03960 | null |
| 2025-02-06 | Fairness Aware Reinforcement Learning via Proximal Policy Optimization | Gabriele La Malfa et.al. | 2502.03953 | null |
| 2025-02-06 | CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning | Yousef Koka et.al. | 2502.03946 | null |
| 2025-02-06 | Mirror Descent Actor Critic via Bounded Advantage Learning | Ryo Iwaki et.al. | 2502.03854 | null |
| 2025-02-06 | PAGNet: Pluggable Adaptive Generative Networks for Information Completion in Multi-Agent Communication | Zhuohui Zhang et.al. | 2502.03845 | null |
| 2025-02-05 | Deep Reinforcement Learning-Based Optimization of Second-Life Battery Utilization in Electric Vehicles Charging Stations | Rouzbeh Haghighi et.al. | 2502.03412 | null |
| 2025-02-05 | Lightweight Authenticated Task Offloading in 6G-Cloud Vehicular Twin Networks | Sarah Al-Shareeda et.al. | 2502.03403 | null |
| 2025-02-05 | Energy-Efficient Flying LoRa Gateways: A Multi-Agent Reinforcement Learning Approach | Abdullahi Isa Ahmed et.al. | 2502.03377 | null |
| 2025-02-05 | Demystifying Long Chain-of-Thought Reasoning in LLMs | Edward Yeo et.al. | 2502.03373 | link |
| 2025-02-05 | Learning from Active Human Involvement through Proxy Value Propagation | Zhenghao Peng et.al. | 2502.03369 | null |
| 2025-02-05 | Conditional Prediction by Simulation for Automated Driving | Fabian Konstantinidis et.al. | 2502.03286 | null |
| 2025-02-05 | Calibrated Unsupervised Anomaly Detection in Multivariate Time-series using Reinforcement Learning | Saba Sanami et.al. | 2502.03245 | null |
| 2025-02-05 | Underwater Soft Fin Flapping Motion with Deep Neural Network Based Surrogate Model | Yuya Hamamatsu et.al. | 2502.03135 | null |
| 2025-02-05 | Double Distillation Network for Multi-Agent Reinforcement Learning | Yang Zhou et.al. | 2502.03125 | null |
| 2025-02-05 | HiLo: Learning Whole-Body Human-like Locomotion with Motion Tracking Controller | Qiyuan Zhang et.al. | 2502.03122 | null |
| 2025-02-04 | Flow Q-Learning | Seohong Park et.al. | 2502.02538 | null |
| 2025-02-04 | Brief analysis of DeepSeek R1 and its implications for Generative AI | Sarah Mercer et.al. | 2502.02523 | null |
| 2025-02-04 | Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | Maohao Shen et.al. | 2502.02508 | null |
| 2025-02-04 | Towards Fast Graph Generation via Autoregressive Noisy Filtration Modeling | Markus Krimmel et.al. | 2502.02415 | null |
| 2025-02-04 | Achieving Hiding and Smart Anti-Jamming Communication: A Parallel DRL Approach against Moving Reactive Jammer | Yangyang Li et.al. | 2502.02385 | null |
| 2025-02-04 | Circular Microalgae-Based Carbon Control for Net Zero | Federico Zocco et.al. | 2502.02382 | null |
| 2025-02-04 | Coreset-Based Task Selection for Sample-Efficient Meta-Reinforcement Learning | Donglin Zhan et.al. | 2502.02332 | null |
| 2025-02-04 | Policy-Guided Causal State Representation for Offline Reinforcement Learning Recommendation | Siyu Wang et.al. | 2502.02327 | null |
| 2025-02-04 | DIME: Diffusion-Based Maximum Entropy Reinforcement Learning | Onur Celik et.al. | 2502.02316 | null |
| 2025-02-04 | MAGNNET: Multi-Agent Graph Neural Network-based Efficient Task Allocation for Autonomous Vehicles with Deep Reinforcement Learning | Lavanya Ratnabala et.al. | 2502.02311 | null |
| 2025-01-31 | Vintix: Action Model via In-Context Reinforcement Learning | Andrey Polubarov et.al. | 2501.19400 | link |
| 2025-01-31 | The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking | Yuchun Miao et.al. | 2501.19358 | null |
| 2025-01-31 | Jackpot! Alignment as a Maximal Lottery | Roberto-Rafael Maura-Rivero et.al. | 2501.19266 | null |
| 2025-01-31 | Objective Metrics for Human-Subjects Evaluation in Explainable Reinforcement Learning | Balint Gyevnar et.al. | 2501.19256 | null |
| 2025-01-31 | Linear $Q$-Learning Does Not Diverge: Convergence Rates to a Bounded Set | Xinyu Liu et.al. | 2501.19254 | null |
| 2025-02-03 | SHARPIE: A Modular Framework for Reinforcement Learning and Human-AI Interaction Experiments | Hüseyin Aydın et.al. | 2501.19245 | null |
| 2025-01-31 | An Empirical Game-Theoretic Analysis of Autonomous Cyber-Defence Agents | Gregory Palmer et.al. | 2501.19206 | null |
| 2025-01-31 | APEX: Automated Parameter Exploration for Low-Power Wireless Protocols | Mohamed Hassaan M. Hydher et.al. | 2501.19194 | null |
| 2025-01-31 | Test-Time Training Scaling for Chemical Exploration in Drug Design | Morgan Thomas et.al. | 2501.19153 | null |
| 2025-01-31 | Decorrelated Soft Actor-Critic for Efficient Deep Reinforcement Learning | Burcu Küçükoğlu et.al. | 2501.19133 | null |
| 2025-01-30 | Design and Validation of Learning Aware HMI For Learning-Enabled Increasingly Autonomous Systems | Parth Ganeriwala et.al. | 2501.18506 | null |
| 2025-01-30 | Curriculum-based Sample Efficient Reinforcement Learning for Robust Stabilization of a Quadrotor | Fausto Mauricio Lagos Suarez et.al. | 2501.18490 | null |
| 2025-01-30 | Model-Free RL Agents Demonstrate System 1-Like Intentionality | Hal Ashton et.al. | 2501.18299 | null |
| 2025-01-30 | Neural Operator based Reinforcement Learning for Control of first-order PDEs with Spatially-Varying State Delay | Jiaqi Hu et.al. | 2501.18201 | null |
| 2025-01-30 | QNN-QRL: Quantum Neural Network Integrated with Quantum Reinforcement Learning for Quantum Key Distribution | Bikash K. Behera et.al. | 2501.18188 | null |
| 2025-01-30 | Investigating Tax Evasion Emergence Using Dual Large Language Model and Deep Reinforcement Learning Powered Agent-based Simulation | Teddy Lazebnik et.al. | 2501.18177 | null |
| 2025-01-30 | B3C: A Minimalist Approach to Offline Multi-Agent Reinforcement Learning | Woojun Kim et.al. | 2501.18138 | null |
| 2025-01-30 | Diverse Preference Optimization | Jack Lanchantin et.al. | 2501.18101 | null |
| 2025-01-30 | Reward Prediction Error Prioritisation in Experience Replay: The RPE-PER Method | Hoda Yamani et.al. | 2501.18093 | null |
| 2025-01-30 | DIAL: Distribution-Informed Adaptive Learning of Multi-Task Constraints for Safety-Critical Systems | Se-Wook Yoo et.al. | 2501.18086 | null |
| 2025-01-29 | From Sparse to Dense: Toddler-inspired Reward Transition in Goal-Oriented Reinforcement Learning | Junseok Park et.al. | 2501.17842 | null |
| 2025-01-29 | Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Haque Ishfaq et.al. | 2501.17827 | null |
| 2025-01-29 | Consensus Based Stochastic Control | Liyao Lyu et.al. | 2501.17801 | null |
| 2025-01-29 | CAMP in the Odyssey: Provably Robust Reinforcement Learning with Certified Radius Maximization | Derui Wang et.al. | 2501.17667 | link |
| 2025-01-29 | Accelerated DC loadflow solver for topology optimization | Nico Westerbeck et.al. | 2501.17529 | null |
| 2025-01-29 | Human-Aligned Skill Discovery: Balancing Behaviour Exploration and Alignment | Maxence Hussonnois et.al. | 2501.17431 | null |
| 2025-01-29 | Certificated Actor-Critic: Hierarchical Reinforcement Learning with Control Barrier Functions for Safe Navigation | Junjun Xie et.al. | 2501.17424 | null |
| 2025-01-29 | Value Function Decomposition in Markov Recommendation Process | Xiaobei Wang et.al. | 2501.17409 | null |
| 2025-01-29 | A Dual-Agent Adversarial Framework for Robust Generalization in Deep Reinforcement Learning | Zhengpeng Xie et.al. | 2501.17384 | null |
| 2025-01-29 | ASAP: Learning Generalizable Online Bin Packing via Adaptive Selection After Pruning | Han Fang et.al. | 2501.17377 | null |
| 2025-01-28 | SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | Tianzhe Chu et.al. | 2501.17161 | null |
| 2025-01-28 | Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning | Rémy Hosseinkhan Boucher et.al. | 2501.17115 | null |
| 2025-01-28 | Unlocking Transparent Alignment Through Enhanced Inverse Constitutional AI for Principle Extraction | Carl-Leander Henneking et.al. | 2501.17112 | null |
| 2025-01-28 | COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models | Tobias Materzok et.al. | 2501.17104 | null |
| 2025-01-28 | Learning Mean Field Control on Sparse Graphs | Christian Fabian et.al. | 2501.17079 | null |
| 2025-01-28 | Induced Modularity and Community Detection for Functionally Interpretable Reinforcement Learning | Anna Soligo et.al. | 2501.17077 | null |
| 2025-01-28 | Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies | Manojkumar Parmar et.al. | 2501.17030 | null |
| 2025-01-28 | Network Slice-based Low-Altitude Intelligent Network for Advanced Air Mobility | Kai Xiong et.al. | 2501.17014 | null |
| 2025-01-28 | Heterogeneity-aware Personalized Federated Learning via Adaptive Dual-Agent Reinforcement Learning | Xi Chen et.al. | 2501.16966 | null |
| 2025-01-28 | On Rollouts in Model-Based Reinforcement Learning | Bernd Frauenknecht et.al. | 2501.16918 | link |
| 2025-01-27 | Upside Down Reinforcement Learning with Policy Generators | Jacopo Di Ventura et.al. | 2501.16288 | link |
| 2025-01-27 | Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach | Yang Xu et.al. | 2501.16243 | null |
| 2025-01-27 | Towards General-Purpose Model-Free Reinforcement Learning | Scott Fujimoto et.al. | 2501.16142 | link |
| 2025-01-27 | Quantifying the Self-Interest Level of Markov Social Dilemmas | Richard Willis et.al. | 2501.16138 | null |
| 2025-01-27 | ReFill: Reinforcement Learning for Fill-In Minimization | Elfarouk Harb et.al. | 2501.16130 | null |
| 2025-01-27 | Multi-Agent Meta-Offline Reinforcement Learning for Timely UAV Path Planning and Data Collection | Eslam Eldeeb et.al. | 2501.16098 | null |
| 2025-01-27 | Flexible Blood Glucose Control: Offline Reinforcement Learning from Human Feedback | Harry Emerson et.al. | 2501.15972 | null |
| 2025-01-27 | REINFORCE-ING Chemical Language Models in Drug Design | Morgan Thomas et.al. | 2501.15971 | null |
| 2025-01-27 | Inverse Reinforcement Learning via Convex Optimization | Hao Zhu et.al. | 2501.15957 | null |
| 2025-01-27 | Generative AI for Lyapunov Optimization Theory in UAV-based Low-Altitude Economy Networking | Zhang Liu et.al. | 2501.15928 | null |
| 2025-01-24 | An Attentive Graph Agent for Topology-Adaptive Cyber Defence | Ilya Orson Sandoval et.al. | 2501.14700 | link |
| 2025-01-24 | ACT-JEPA: Joint-Embedding Predictive Architecture Improves Policy Representation Learning | Aleksandar Vujinovic et.al. | 2501.14622 | null |
| 2025-01-24 | COMIX: Generalized Conflict Management in O-RAN xApps – Architecture, Workflow, and a Power Control case | Anastasios Giannopoulos et.al. | 2501.14619 | null |
| 2025-01-24 | Age and Power Minimization via Meta-Deep Reinforcement Learning in UAV Networks | Sankani Sarathchandra et.al. | 2501.14603 | null |
| 2025-01-24 | Reducing Action Space for Deep Reinforcement Learning via Causal Effect Estimation | Wenzhang Liu et.al. | 2501.14543 | link |
| 2025-01-24 | Breaking the Pre-Planning Barrier: Real-Time Adaptive Coordination of Mission and Charging UAVs Using Graph Reinforcement Learning | Yuhan Hu et.al. | 2501.14488 | null |
| 2025-01-24 | MARL-OT: Multi-Agent Reinforcement Learning Guided Online Fuzzing to Detect Safety Violation in Autonomous Driving Systems | Linfeng Liang et.al. | 2501.14451 | null |
| 2025-01-24 | Learning more with the same effort: how randomization improves the robustness of a robotic deep reinforcement learning agent | Lucía Güitta-López et.al. | 2501.14443 | null |
| 2025-01-24 | SKIL: Semantic Keypoint Imitation Learning for Generalizable Data-efficient Manipulation | Shengjie Wang et.al. | 2501.14400 | null |
| 2025-01-24 | Reinforcement Learning for Efficient Returns Management | Pascal Linden et.al. | 2501.14394 | null |
| 2025-01-23 | CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation | Guofeng Cui et.al. | 2501.13927 | null |
| 2025-01-23 | Improving Video Generation with Human Feedback | Jie Liu et.al. | 2501.13918 | link |
| 2025-01-23 | GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration | Yue Fan et.al. | 2501.13896 | null |
| 2025-01-23 | Utilizing Evolution Strategies to Train Transformers in Reinforcement Learning | Matyáš Lorenc et.al. | 2501.13883 | link |
| 2025-01-23 | A space-decoupling framework for optimization on bounded-rank matrices with orthogonally invariant constraints | Yan Yang et.al. | 2501.13830 | null |
| 2025-01-23 | Large Language Model driven Policy Exploration for Recommender Systems | Jie Wang et.al. | 2501.13816 | null |
| 2025-01-23 | Integrating Causality with Neurochaos Learning: Proposed Approach and Research Agenda | Nanjangud C. Narendra et.al. | 2501.13763 | null |
| 2025-01-23 | Scalable Safe Multi-Agent Reinforcement Learning for Multi-Agent System | Haikuo Du et.al. | 2501.13727 | null |
| 2025-01-23 | WFCRL: A Multi-Agent Reinforcement Learning Benchmark for Wind Farm Control | Claire Bizon Monroc et.al. | 2501.13592 | link |
| 2025-01-23 | Explainable AI-aided Feature Selection and Model Reduction for DRL-based V2X Resource Allocation | Nasir Khan et.al. | 2501.13552 | null |
| 2025-01-22 | Which Sensor to Observe? Timely Tracking of a Joint Markov Source with Model Predictive Control | Ismail Cosandal et.al. | 2501.13099 | null |
| 2025-01-22 | Attention-Driven Hierarchical Reinforcement Learning with Particle Filtering for Source Localization in Dynamic Fields | Yiwei Shi et.al. | 2501.13084 | null |
| 2025-01-22 | Evolution and The Knightian Blindspot of Machine Learning | Joel Lehman et.al. | 2501.13075 | null |
| 2025-01-22 | AdaWM: Adaptive World Model based Planning for Autonomous Driving | Hang Wang et.al. | 2501.13072 | null |
| 2025-01-22 | Optimizing Return Distributions with Distributional Dynamic Programming | Bernardo Ávila Pires et.al. | 2501.13028 | null |
| 2025-01-22 | MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking | Sebastian Farquhar et.al. | 2501.13011 | null |
| 2025-01-22 | An Offline Multi-Agent Reinforcement Learning Framework for Radio Resource Management | Eslam Eldeeb et.al. | 2501.12991 | null |
| 2025-01-22 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | DeepSeek-AI et.al. | 2501.12948 | link |
| 2025-01-22 | Offline Critic-Guided Diffusion Policy for Multi-User Delay-Constrained Scheduling | Zhuoran Li et.al. | 2501.12942 | null |
| 2025-01-22 | Reinforcement learning Based Automated Design of Differential Evolution Algorithm for Black-box Optimization | Xu Yang et.al. | 2501.12881 | null |
| 2025-01-21 | InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model | Yuhang Zang et.al. | 2501.12368 | link |
| 2025-01-21 | ARM-IRL: Adaptive Resilience Metric Quantification Using Inverse Reinforcement Learning | Abhijeet Sahu et.al. | 2501.12362 | null |
| 2025-01-21 | Sum Rate Enhancement using Machine Learning for Semi-Self Sensing Hybrid RIS-Enabled ISAC in THz Bands | Sara Farrag Mobarak et.al. | 2501.12353 | null |
| 2025-01-21 | Towards neural reinforcement learning for large deviations in nonequilibrium systems with memory | Venkata D. Pamulaparthy et.al. | 2501.12333 | null |
| 2025-01-21 | Heuristic Deep Reinforcement Learning for Phase Shift Optimization in RIS-assisted Secure Satellite Communication Systems with RSMA | Tingnan Bao et.al. | 2501.12311 | null |
| 2025-01-21 | RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression | Uri Gadot et.al. | 2501.12216 | null |
| 2025-01-21 | Experience-replay Innovative Dynamics | Tuo Zhang et.al. | 2501.12199 | null |
| 2025-01-21 | Extend Adversarial Policy Against Neural Machine Translation via Unknown Token | Wei Zou et.al. | 2501.12183 | null |
| 2025-01-21 | DNRSelect: Active Best View Selection for Deferred Neural Rendering | Dongli Wu et.al. | 2501.12150 | null |
| 2025-01-21 | Tackling Uncertainties in Multi-Agent Reinforcement Learning through Integration of Agent Termination Dynamics | Somnath Hazra et.al. | 2501.12061 | link |
| 2025-01-17 | DexForce: Extracting Force-informed Actions from Kinesthetic Demonstrations for Dexterous Manipulation | Claire Chen et.al. | 2501.10356 | null |
| 2025-01-17 | Enhancing AI Transparency: XRL-Based Resource Management and RAN Slicing for 6G ORAN Architecture | Suvidha Mhatre et.al. | 2501.10292 | null |
| 2025-01-17 | Enhancing UAV Path Planning Efficiency Through Accelerated Learning | Joseanne Viana et.al. | 2501.10141 | null |
| 2025-01-17 | Spatio-temporal Graph Learning on Adaptive Mined Key Frames for High-performance Multi-Object Tracking | Futian Wang et.al. | 2501.10129 | null |
| 2025-01-17 | PaSa: An LLM Agent for Comprehensive Academic Paper Search | Yichen He et.al. | 2501.10120 | link |
| 2025-01-17 | GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning | Zifeng Shi et.al. | 2501.10116 | null |
| 2025-01-17 | Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics | Chenhao Li et.al. | 2501.10100 | null |
| 2025-01-17 | ForestProtector: An IoT Architecture Integrating Machine Vision and Deep Reinforcement Learning for Efficient Wildfire Monitoring | Kenneth Bonilla-Ormachea et.al. | 2501.09926 | null |
| 2025-01-17 | SLIM: Sim-to-Real Legged Instructive Manipulation via Long-Horizon Visuomotor Learning | Haichao Zhang et.al. | 2501.09905 | null |
| 2025-01-16 | From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation | Peilang Li et.al. | 2501.09858 | null |
| 2025-01-16 | Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models | Fengli Xu et.al. | 2501.09686 | null |
| 2025-01-16 | Optimizing hypergraph product codes with random walks, simulated annealing and reinforcement learning | Bruno C. A. Freire et.al. | 2501.09622 | null |
| 2025-01-16 | Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment | Chaoqi Wang et.al. | 2501.09620 | null |
| 2025-01-16 | EVaDE : Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning | Siddharth Aravindan et.al. | 2501.09611 | null |
| 2025-01-16 | RE-POSE: Synergizing Reinforcement Learning-Based Partitioning and Offloading for Edge Object Detection | Jianrui Shi et.al. | 2501.09465 | null |
| 2025-01-16 | ADAGE: A generic two-layer framework for adaptive agent based modelling | Benjamin Patrick Evans et.al. | 2501.09429 | null |
| 2025-01-16 | Fast Searching of Extreme Operating Conditions for Relay Protection Setting Calculation Based on Graph Neural Network and Reinforcement Learning | Yan Li et.al. | 2501.09399 | null |
| 2025-01-16 | Contract-Inspired Contest Theory for Controllable Image Generation in Mobile Edge Metaverse | Guangyuan Liu et.al. | 2501.09391 | null |
| 2025-01-16 | Adaptive Contextual Caching for Mobile Edge Large Language Model Service | Guangyuan Liu et.al. | 2501.09383 | null |
| 2025-01-16 | Solving Infinite-Player Games with Player-to-Strategy Networks | Carlos Martin et.al. | 2501.09330 | null |
| 2025-01-15 | Computing Approximated Fixpoints via Dampened Mann Iteration | Paolo Baldan et.al. | 2501.08950 | null |
| 2025-01-15 | A Reinforcement Learning Approach to Quiet and Safe UAM Traffic Management | Surya Murthy et.al. | 2501.08941 | null |
| 2025-01-15 | Reinforcement learning-based adaptive time-integration for nonsmooth dynamics | David Riley et.al. | 2501.08934 | null |
| 2025-01-15 | Projection Implicit Q-Learning with Support Constraint for Offline Reinforcement Learning | Xinchen Han et.al. | 2501.08907 | null |
| 2025-01-15 | Deep Learning Meets Queue-Reactive: A Framework for Realistic Limit Order Book Simulation | Hamza Bodor et.al. | 2501.08822 | null |
| 2025-01-15 | Multi-visual modality micro drone-based structural damage detection | Isaac Osei Agyemanga et.al. | 2501.08807 | null |
| 2025-01-15 | Networked Agents in the Dark: Team Value Learning under Partial Observability | Guilherme S. Varela et.al. | 2501.08778 | null |
| 2025-01-15 | SPEQ: Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning | Carlo Romeo et.al. | 2501.08669 | null |
| 2025-01-15 | Application of Deep Reinforcement Learning to UAV Swarming for Ground Surveillance | Raúl Arranz et.al. | 2501.08655 | null |
| 2025-01-15 | RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation | Kaiqu Liang et.al. | 2501.08617 | null |
| 2025-01-14 | FDPP: Fine-tune Diffusion Policy with Human Preference | Yuxin Chen et.al. | 2501.08259 | null |
| 2025-01-14 | Dynamic Pricing in High-Speed Railways Using Multi-Agent Reinforcement Learning | Enrique Adrian Villarrubia-Martin et.al. | 2501.08234 | null |
| 2025-01-14 | Optimization of Link Configuration for Satellite Communication Using Reinforcement Learning | Tobias Rohe et.al. | 2501.08220 | null |
| 2025-01-14 | In-situ graph reasoning and knowledge expansion using Graph-PReFLexOR | Markus J. Buehler et.al. | 2501.08120 | null |
| 2025-01-14 | Data-driven inventory management for new products: A warm-start and adjusted Dyna-$Q$ approach | Xinyu Qu et.al. | 2501.08109 | null |
| 2025-01-14 | Hybrid Action Based Reinforcement Learning for Multi-Objective Compatible Autonomous Driving | Guizhe Jin et.al. | 2501.08096 | null |
| 2025-01-14 | CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning | Guoliang He et.al. | 2501.08071 | null |
| 2025-01-14 | Continual Reinforcement Learning for Digital Twin Synchronization Optimization | Haonan Tong et.al. | 2501.08045 | null |
| 2025-01-14 | READ: Reinforcement-based Adversarial Learning for Text Classification with Limited Labeled Data | Rohit Sharma et.al. | 2501.08035 | null |
| 2025-01-14 | Cooperative Patrol Routing: Optimizing Urban Crime Surveillance through Multi-Agent Reinforcement Learning | Juan Palma-Borda et.al. | 2501.08020 | null |
| 2025-01-13 | SafeSwarm: Decentralized Safe RL for the Swarm of Drones Landing in Dense Crowds | Grik Tadevosyan et.al. | 2501.07566 | null |
| 2025-01-13 | Improving DeFi Accessibility through Efficient Liquidity Provisioning with Deep Reinforcement Learning | Haonan Xu et.al. | 2501.07508 | null |
| 2025-01-13 | RbRL2.0: Integrated Reward and Policy Learning for Rating-based Reinforcement Learning | Mingkang Wu et.al. | 2501.07502 | null |
| 2025-01-13 | Online inductive learning from answer sets for efficient reinforcement learning exploration | Celeste Veronese et.al. | 2501.07445 | null |
| 2025-01-13 | Attention when you need | Lokesh Boominathan et.al. | 2501.07440 | null |
| 2025-01-13 | Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data | Shilong Deng et.al. | 2501.07346 | link |
| 2025-01-13 | Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring | Buse Sibel Korkmaz et.al. | 2501.07324 | link |
| 2025-01-13 | Mining Intraday Risk Factor Collections via Hierarchical Reinforcement Learning based on Transferred Options | Wenyan Xu et.al. | 2501.07274 | null |
| 2025-01-13 | Future-Conditioned Recommendations with Multi-Objective Controllable Decision Transformer | Chongming Gao et.al. | 2501.07212 | null |
| 2025-01-13 | Generalizable Graph Neural Networks for Robust Power Grid Topology Control | Matthijs de Jong et.al. | 2501.07186 | null |
| 2025-01-10 | From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training | Julius Berner et.al. | 2501.06148 | link |
| 2025-01-10 | Vehicle-in-Virtual-Environment (VVE) Based Autonomous Driving Function Development and Evaluation Methodology for Vulnerable Road User Safety | Haochong Chen et.al. | 2501.06113 | null |
| 2025-01-10 | Learning Flexible Heterogeneous Coordination with Capability-Aware Shared Hypernetworks | Kevin Fu et.al. | 2501.06058 | null |
| 2025-01-10 | Investigating the Impact of Observation Space Design Choices On Training Reinforcement Learning Solutions for Spacecraft Problems | Nathaniel Hamilton et.al. | 2501.06016 | null |
| 2025-01-10 | The Safe Trusted Autonomy for Responsible Space Program | Kerianne L. Hobbs et.al. | 2501.05984 | null |
| 2025-01-10 | A Practical Demonstration of DRL-Based Dynamic Resource Allocation xApp Using OpenAirInterface | Onur Sever et.al. | 2501.05879 | null |
| 2025-01-10 | Diffusion Models for Smarter UAVs: Decision-Making and Modeling | Yousef Emami et.al. | 2501.05819 | null |
| 2025-01-10 | Real-Time Integrated Dispatching and Idle Fleet Steering with Deep Reinforcement Learning for A Meal Delivery Platform | Jingyi Cheng et.al. | 2501.05808 | null |
| 2025-01-10 | Understanding Impact of Human Feedback via Influence Functions | Taywon Min et.al. | 2501.05790 | link |
| 2025-01-09 | Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning | Tao Liu et.al. | 2501.05591 | null |
| 2025-01-09 | TimeRL: Efficient Deep Reinforcement Learning with Polyhedral Dependence Graphs | Pedro F. Silvestre et.al. | 2501.05408 | null |
| 2025-01-09 | Search-o1: Agentic Search-Enhanced Large Reasoning Models | Xiaoxi Li et.al. | 2501.05366 | link |
| 2025-01-09 | Knowledge Transfer in Model-Based Reinforcement Learning Agents for Efficient Multi-Task Learning | Dmytro Kuzmenko et.al. | 2501.05329 | null |
| 2025-01-09 | Design and Control of a Bipedal Robotic Character | Ruben Grandia et.al. | 2501.05204 | null |
| 2025-01-09 | Constrained Optimization of Charged Particle Tracking with Multi-Agent Reinforcement Learning | Tobias Kortus et.al. | 2501.05113 | null |
| 2025-01-09 | LearningFlow: Automated Policy Learning Workflow for Urban Driving with Large Language Models | Zengqi Peng et.al. | 2501.05057 | null |
| 2025-01-09 | CuRLA: Curriculum Learning Based Deep Reinforcement Learning for Autonomous Driving | Bhargava Uppuluri et.al. | 2501.04982 | null |
| 2025-01-09 | Promoting Shared Energy Storage Aggregation among High Price-Tolerance Prosumer: An Incentive Deposit and Withdrawal Service | Xin Lu et.al. | 2501.04964 | null |
| 2025-01-09 | Balancing Exploration and Cybersickness: Investigating Curiosity-Driven Behavior in Virtual Environments | Tangyao Li et.al. | 2501.04905 | null |
| 2025-01-08 | Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning | Sergio Rozada et.al. | 2501.04879 | null |
| 2025-01-08 | Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought | Violet Xiang et.al. | 2501.04682 | null |
| 2025-01-08 | Framework for Integrating Machine Learning Methods for Path-Aware Source Routing | Anees Al-Najjar et.al. | 2501.04624 | null |
| 2025-01-08 | MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data | Zifan Wang et.al. | 2501.04595 | null |
| 2025-01-08 | HypeRL: Parameter-Informed Reinforcement Learning for Parametric PDEs | Nicolò Botteghi et.al. | 2501.04538 | null |
| 2025-01-08 | Safe Reinforcement Learning with Minimal Supervision | Alexander Quessy et.al. | 2501.04481 | null |
| 2025-01-08 | Research on environment perception and behavior prediction of intelligent UAV based on semantic communication | Kechong Ren et.al. | 2501.04480 | null |
| 2025-01-08 | Hybrid Artificial Intelligence Strategies for Drone Navigation | Rubén San-Segundo et.al. | 2501.04472 | null |
| 2025-01-08 | Risk-averse policies for natural gas futures trading using distributional reinforcement learning | Félicien Hêche et.al. | 2501.04421 | null |
| 2025-01-08 | Constraints as Rewards: Reinforcement Learning for Robots without Reward Functions | Yu Ishihara et.al. | 2501.04228 | null |
| 2025-01-07 | Explainable Reinforcement Learning via Temporal Policy Decomposition | Franco Ruggeri et.al. | 2501.03902 | null |
| 2025-01-07 | Neural DNF-MT: A Neuro-symbolic Approach for Learning Interpretable and Editable Policies | Kexin Gu Baugh et.al. | 2501.03888 | null |
| 2025-01-07 | AlphaPO – Reward shape matters for LLM alignment | Aman Gupta et.al. | 2501.03884 | null |
| 2025-01-07 | Online Reinforcement Learning-Based Dynamic Adaptive Evaluation Function for Real-Time Strategy Tasks | Weilong Yang et.al. | 2501.03824 | null |
| 2025-01-07 | Run-and-tumble chemotaxis using reinforcement learning | Ramesh Pramanik et.al. | 2501.03687 | null |
| 2025-01-07 | IEEE 802.11bn Multi-AP Coordinated Spatial Reuse with Hierarchical Multi-Armed Bandits | Maksymilian Wojnar et.al. | 2501.03680 | null |
| 2025-01-07 | SALE-Based Offline Reinforcement Learning with Ensemble Q-Networks | Zheng Chun et.al. | 2501.03676 | null |
| 2025-01-07 | Imitation Learning of MPC with Neural Networks: Error Guarantees and Sparsification | Hendrik Alsmeier et.al. | 2501.03671 | null |
| 2025-01-07 | Rethinking Adversarial Attacks in Reinforcement Learning from Policy Distribution Perspective | Tianyang Duan et.al. | 2501.03562 | null |
| 2025-01-07 | Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment | Prashant Trivedi et.al. | 2501.03486 | null |
| 2025-01-06 | Turn-based Multi-Agent Reinforcement Learning Model Checking | Dennis Gross et.al. | 2501.03187 | null |
| 2025-01-06 | Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies | Dennis Gross et.al. | 2501.03142 | null |
| 2025-01-06 | CALM: Curiosity-Driven Auditing for Large Language Models | Xiang Zheng et.al. | 2501.02997 | null |
| 2025-01-06 | CAMP: Collaborative Attention Model with Profiles for Vehicle Routing Problems | Chuanbo Hua et.al. | 2501.02977 | null |
| 2025-01-06 | Sim-to-Real Transfer for Mobile Robots with Reinforcement Learning: from NVIDIA Isaac Sim to Gazebo and Real ROS 2 Robots | Sahar Salimpour et.al. | 2501.02902 | link |
| 2025-01-06 | Revisiting Communication Efficiency in Multi-Agent Reinforcement Learning from the Dimensional Analysis Perspective | Chuxiong Sun et.al. | 2501.02888 | null |
| 2025-01-06 | First-place Solution for Streetscape Shop Sign Recognition Competition | Bin Wang et.al. | 2501.02811 | null |
| 2025-01-06 | Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model | Yueqin Yin et.al. | 2501.02790 | null |
| 2025-01-06 | Joint Optimization of UAV-Carried IRS for Urban Low Altitude mmWave Communications with Deep Reinforcement Learning | Wenwen Xie et.al. | 2501.02787 | null |
| 2025-01-06 | Learn A Flexible Exploration Model for Parameterized Action Markov Decision Processes | Zijian Wang et.al. | 2501.02774 | null |
| 2025-01-03 | Evaluating Scenario-based Decision-making for Interactive Autonomous Driving Using Rational Criteria: A Survey | Zhen Tian et.al. | 2501.01886 | null |
| 2025-01-03 | Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models | Yanjiang Liu et.al. | 2501.01830 | null |
| 2025-01-03 | Genetic algorithm enhanced Solovay-Kitaev algorithm for quantum compiling | Jiangwei Long et.al. | 2501.01746 | null |
| 2025-01-03 | Proposing Hierarchical Goal-Conditioned Policy Planning in Multi-Goal Reinforcement Learning | Gavin B. Rens et.al. | 2501.01727 | null |
| 2025-01-03 | Inversely Learning Transferable Rewards via Abstracted States | Yikang Gui et.al. | 2501.01669 | null |
| 2025-01-03 | BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems | Yinbo Yu et.al. | 2501.01593 | null |
| 2025-01-02 | Reinforcement-learning-based control of turbulent channel flows at high Reynolds numbers | Zisong Zhou et.al. | 2501.01573 | null |
| 2025-01-02 | Reinforcement Learning for Respondent-Driven Sampling | Justin Weltz et.al. | 2501.01505 | null |
| 2025-01-02 | Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension | Yanbo Fang et.al. | 2501.01332 | null |
| 2025-01-02 | Towards Intelligent Antenna Positioning: Leveraging DRL for FAS-Aided ISAC Systems | Shunxing Yang et.al. | 2501.01281 | null |
| 2025-01-02 | PIMAEX: Multi-Agent Exploration through Peer Incentivization | Michael Kölle et.al. | 2501.01266 | null |
| 2025-01-02 | Embodied AI-Enhanced Vehicular Networks: An Integrated Large Language Models and Reinforcement Learning Method | Ruichen Zhang et.al. | 2501.01141 | null |
| 2025-01-02 | Communicating Unexpectedness for Out-of-Distribution Multi-Agent Reinforcement Learning | Min Whoo Lee et.al. | 2501.01140 | null |
| 2025-01-02 | Symmetries-enhanced Multi-Agent Reinforcement Learning | Nikolaos Bousias et.al. | 2501.01136 | null |
| 2025-01-02 | Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning | Chenglu Sun et.al. | 2501.01085 | null |
| 2025-01-02 | Enhancing Neural Adaptive Wireless Video Streaming via Lower-Layer Information Exposure and Online Tuning | Lingzhi Zhao et.al. | 2501.01044 | null |
| 2025-01-02 | Energy-Efficient and Intelligent ISAC in V2X Networks with Spiking Neural Networks-Driven DRL | Chen Shang et.al. | 2501.01038 | null |
| 2025-01-02 | Deep Reinforcement Learning for Job Scheduling and Resource Management in Cloud Computing: An Algorithm-Level Review | Yan Gu et.al. | 2501.01007 | null |
| 2024-12-30 | Advances in Multi-agent Reinforcement Learning: Persistent Autonomy and Robot Learning Lab Report 2024 | Reza Azadeh et.al. | 2412.21088 | null |
| 2024-12-30 | Learning Epidemiological Dynamics via the Finite Expression Method | Jianda Du et.al. | 2412.21049 | null |
| 2024-12-30 | Weber-Fechner Law in Temporal Difference learning derived from Control as Inference | Keiichiro Takahashi et.al. | 2412.21004 | null |
| 2024-12-30 | LEASE: Offline Preference-based Reinforcement Learning with High Sample Efficiency | Xiao-Yin Liu et.al. | 2412.21001 | link |
| 2024-12-30 | UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI | Fangwei Zhong et.al. | 2412.20977 | null |
| 2024-12-30 | Data-Based Efficient Off-Policy Stabilizing Optimal Control Algorithms for Discrete-Time Linear Systems via Damping Coefficients | Dongdong Li et.al. | 2412.20845 | null |
| 2024-12-30 | Isoperimetry is All We Need: Langevin Posterior Sampling for RL with Sublinear Regret | Emilio Jorge et.al. | 2412.20824 | null |
| 2024-12-29 | The intrinsic motivation of reinforcement and imitation learning for sequential tasks | Sao Mai Nguyen et.al. | 2412.20573 | null |
| 2024-12-29 | Diminishing Return of Value Expansion Methods | Daniel Palenicek et.al. | 2412.20537 | link |
| 2024-12-29 | Game Theory and Multi-Agent Reinforcement Learning : From Nash Equilibria to Evolutionary Dynamics | Neil De La Fuente et.al. | 2412.20523 | null |
| 2024-12-27 | From Ceilings to Walls: Universal Dynamic Perching of Small Aerial Robots on Surfaces with Variable Orientations | Bryan Habas et.al. | 2412.19765 | null |
| 2024-12-27 | Adaptive Context-Aware Multi-Path Transmission Control for VR/AR Content: A Deep Reinforcement Learning Approach | Shakil Ahmed et.al. | 2412.19737 | null |
| 2024-12-27 | Goal-oriented Communications based on Recursive Early Exit Neural Networks | Jary Pomponi et.al. | 2412.19587 | null |
| 2024-12-27 | Graph-attention-based Casual Discovery with Trust Region-navigated Clipping Policy Optimization | Shixuan Liu et.al. | 2412.19578 | null |
| 2024-12-27 | Reinforced Label Denoising for Weakly-Supervised Audio-Visual Video Parsing | Yongbiao Gao et.al. | 2412.19563 | null |
| 2024-12-27 | Scalable Hierarchical Reinforcement Learning for Hyper Scale Multi-Robot Task Planning | Xuan Zhou et.al. | 2412.19538 | null |
| 2024-12-27 | An Overview of Machine Learning-Driven Resource Allocation in IoT Networks | Zhengdong Li et.al. | 2412.19478 | null |
| 2024-12-27 | DeepSeek-V3 Technical Report | DeepSeek-AI et.al. | 2412.19437 | link |
| 2024-12-27 | Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback | Seong Jin Lee et.al. | 2412.19436 | null |
| 2024-12-27 | Comparing Few to Rank Many: Active Human Preference Learning using Randomized Frank-Wolfe | Kiran Koshy Thekumparampil et.al. | 2412.19396 | null |
| 2024-12-24 | Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making | David Shoresh et.al. | 2412.18593 | null |
| 2024-12-24 | Dynamic Optimization of Portfolio Allocation Using Deep Reinforcement Learning | Gang Huang et.al. | 2412.18563 | link |
| 2024-12-24 | Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving | Hao Pang et.al. | 2412.18511 | null |
| 2024-12-24 | Joint Adaptive OFDM and Reinforcement Learning Design for Autonomous Vehicles: Leveraging Age of Updates | Mamady Delamou et.al. | 2412.18500 | null |
| 2024-12-24 | Contrastive Representation for Interactive Recommendation | Jingyu Li et.al. | 2412.18396 | link |
| 2024-12-24 | Navigating Data Corruption in Machine Learning: Balancing Quality, Quantity, and Imputation Strategies | Qi Liu et.al. | 2412.18296 | null |
| 2024-12-24 | Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization | Jiacai Liu et.al. | 2412.18279 | null |
| 2024-12-24 | Accelerating AIGC Services with Latent Action Diffusion Scheduling in Edge Networks | Changfu Xu et.al. | 2412.18212 | link |
| 2024-12-24 | Quantum framework for Reinforcement Learning: integrating Markov Decision Process, quantum arithmetic, and trajectory search | Thet Htar Su et.al. | 2412.18208 | null |
| 2024-12-24 | Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models | Xiaomeng Hu et.al. | 2412.18171 | null |
| 2024-12-23 | HyperQ-Opt: Q-learning for Hyperparameter Optimization | Md. Tarek Hasan et.al. | 2412.17765 | null |
| 2024-12-23 | Mimicking-Bench: A Benchmark for Generalizable Humanoid-Scene Interaction Learning via Human Mimicking | Yun Liu et.al. | 2412.17730 | null |
| 2024-12-23 | SMAC-Hard: Enabling Mixed Opponent Strategy Script and Self-play on SMAC | Yue Deng et.al. | 2412.17707 | link |
| 2024-12-23 | Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | Huchen Jiang et.al. | 2412.17397 | null |
| 2024-12-23 | Reinforcement Learning with a Focus on Adjusting Policies to Reach Targets | Akane Tsuboya et.al. | 2412.17344 | null |
| 2024-12-23 | Multimodal Deep Reinforcement Learning for Portfolio Optimization | Sumit Nawathe et.al. | 2412.17293 | null |
| 2024-12-23 | LMD-PGN: Cross-Modal Knowledge Distillation from First-Person-View Images to Third-Person-View BEV Maps for Universal Point Goal Navigation | Riku Uemura et.al. | 2412.17282 | null |
| 2024-12-23 | ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models | Chengran Yang et.al. | 2412.17264 | null |
| 2024-12-23 | A Coalition Game for On-demand Multi-modal 3D Automated Delivery System | Farzan Moosavi et.al. | 2412.17252 | null |
| 2024-12-23 | Model-free stochastic linear quadratic design by semidefinite programming | Jing Guo et.al. | 2412.17230 | null |
| 2024-12-20 | Offline Reinforcement Learning for LLM Multi-Step Reasoning | Huaijie Wang et.al. | 2412.16145 | null |
| 2024-12-20 | APIRL: Deep Reinforcement Learning for REST API Fuzzing | Myles Foley et.al. | 2412.15991 | link |
| 2024-12-20 | Active Flow Control for Bluff Body under High Reynolds Number Turbulent Flow Conditions Using Deep Reinforcement Learning | Jingbo Chen et.al. | 2412.15975 | null |
| 2024-12-20 | From General to Specific: Tailoring Large Language Models for Personalized Healthcare | Ruize Shi et.al. | 2412.15957 | null |
| 2024-12-20 | What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning | Yiran Ma et.al. | 2412.15904 | null |
| 2024-12-20 | Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback | Jiaming Ji et.al. | 2412.15838 | link |
| 2024-12-20 | MacLight: Multi-scene Aggregation Convolutional Learning for Traffic Signal Control | Sunbowen Lee et.al. | 2412.15703 | link |
| 2024-12-20 | AIR: Unifying Individual and Cooperative Exploration in Collective Multi-Agent Reinforcement Learning | Guangchong Zhou et.al. | 2412.15700 | link |
| 2024-12-20 | Tacit Learning with Adaptive Information Selection for Cooperative Multi-Agent Reinforcement Learning | Lunjun Liu et.al. | 2412.15639 | null |
| 2024-12-20 | Dexterous Manipulation Based on Prior Dexterous Grasp Pose Knowledge | Hengxu Yan et.al. | 2412.15587 | null |
| 2024-12-19 | Qwen2.5 Technical Report | Qwen et.al. | 2412.15115 | null |
| 2024-12-19 | Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination | Leonardo Barcellona et.al. | 2412.14957 | null |
| 2024-12-19 | Effective Method with Compression for Distributed and Federated Cocoercive Variational Inequalities | Daniil Medyakov et.al. | 2412.14935 | null |
| 2024-12-19 | Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning | Anthony Kobanda et.al. | 2412.14865 | null |
| 2024-12-19 | Entropy Regularized Task Representation Learning for Offline Meta-Reinforcement Learning | Mohammadreza Nakhaei et.al. | 2412.14834 | link |
| 2024-12-19 | Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning | Aditya Kapoor et.al. | 2412.14779 | null |
| 2024-12-19 | Learning to Generate Research Idea with Dynamic Control | Ruochen Li et.al. | 2412.14626 | null |
| 2024-12-19 | Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues | Tao He et.al. | 2412.14584 | null |
| 2024-12-19 | Single-Loop Federated Actor-Critic across Heterogeneous Environments | Ye Zhu et.al. | 2412.14555 | null |
| 2024-12-18 | Implementing TD3 to train a Neural Network to fly a Quadcopter through an FPV Gate | Patrick Thomas et.al. | 2412.14367 | null |
| 2024-12-18 | Learning from Massive Human Videos for Universal Humanoid Pose Control | Jiageng Mao et.al. | 2412.14172 | null |
| 2024-12-18 | Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective | Zhiyuan Zeng et.al. | 2412.14135 | null |
| 2024-12-18 | Alignment faking in large language models | Ryan Greenblatt et.al. | 2412.14093 | link |
| 2024-12-18 | Spatio-Temporal SIR Model of Pandemic Spread During Warfare with Optimal Dual-use Healthcare System Administration using Deep Reinforcement Learning | Adi Shuchami et.al. | 2412.14039 | null |
| 2024-12-18 | Robust Optimal Safe and Stability Guaranteeing Reinforcement Learning Control for Quadcopter | Sanghyoup Gu et.al. | 2412.14003 | null |
| 2024-12-18 | Harvesting energy from turbulent winds with Reinforcement Learning | Lorenzo Basile et.al. | 2412.13961 | null |
| 2024-12-18 | RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation | Kun Wu et.al. | 2412.13877 | null |
| 2024-12-18 | AI-Powered Algorithm-Centric Quantum Processor Topology Design | Tian Li et.al. | 2412.13805 | link |
| 2024-12-18 | Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN | Pengxiang Li et.al. | 2412.13795 | link |
| 2024-12-18 | A hybrid learning agent for episodic learning tasks with unknown target distance | Oliver Sefrin et.al. | 2412.13686 | null |
| 2024-12-17 | ExBody2: Advanced Expressive Humanoid Whole-Body Control | Mazeyu Ji et.al. | 2412.13196 | null |
| 2024-12-17 | Tilted Quantile Gradient Updates for Quantile-Constrained Reinforcement Learning | Chenglin Li et.al. | 2412.13184 | link |
| 2024-12-17 | Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions | Juan Del Aguila Ferrandis et.al. | 2412.13157 | null |
| 2024-12-17 | Practicable Black-box Evasion Attacks on Link Prediction in Dynamic Graphs – A Graph Sequential Embedding Method | Jiate Li et.al. | 2412.13134 | link |
| 2024-12-17 | Active Reinforcement Learning Strategies for Offline Policy Improvement | Ambedkar Dukkipati et.al. | 2412.13106 | null |
| 2024-12-17 | Reservoir Computing for Fast, Simplified Reinforcement Learning on Memory Tasks | Kevin McKee et.al. | 2412.13093 | null |
| 2024-12-17 | SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks | Mátyás Vincze et.al. | 2412.13053 | null |
| 2024-12-17 | Relational Neurosymbolic Markov Models | Lennert De Smet et.al. | 2412.13023 | null |
| 2024-12-17 | Future Aspects in Human Action Recognition: Exploring Emerging Techniques and Ethical Influences | Antonios Gasteratos et.al. | 2412.12990 | null |
| 2024-12-17 | Guiding Generative Protein Language Models with Reinforcement Learning | Filippo Stocco et.al. | 2412.12979 | null |
| 2024-12-16 | MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization | Bhavya Sukhija et.al. | 2412.12098 | null |
| 2024-12-16 | Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation | Eliot Xing et.al. | 2412.12089 | null |
| 2024-12-16 | Artificial Intelligence in Traffic Systems | Ritwik Raj Saxena et.al. | 2412.12046 | null |
| 2024-12-16 | Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps | Linfeng Zhao et.al. | 2412.12024 | null |
| 2024-12-16 | Agentic AI-Driven Technical Troubleshooting for Enterprise Systems: A Novel Weighted Retrieval-Augmented Generation Paradigm | Rajat Khanda et.al. | 2412.12006 | null |
| 2024-12-16 | AlphaZero Neural Scaling and Zipf’s Law: a Tale of Board Games and Power Laws | Oren Neumann et.al. | 2412.11979 | link |
| 2024-12-16 | Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Qi Sun et.al. | 2412.11974 | link |
| 2024-12-16 | Hierarchical Meta-Reinforcement Learning via Automated Macro-Action Discovery | Minjae Cho et.al. | 2412.11930 | null |
| 2024-12-16 | Generalized Bayesian deep reinforcement learning | Shreya Sinha Roy et.al. | 2412.11743 | null |
| 2024-12-16 | Learning UAV-based path planning for efficient localization of objects using prior knowledge | Rick van Essen et.al. | 2412.11717 | null |
| 2024-12-13 | A Novel Framework Using Deep Reinforcement Learning for Join Order Selection | Chang Liu et.al. | 2412.10253 | null |
| 2024-12-13 | Physics Instrument Design with Reinforcement Learning | Shah Rukh Qasim et.al. | 2412.10237 | null |
| 2024-12-13 | Scaling Combinatorial Optimization Neural Improvement Heuristics with Online Search and Adaptation | Federico Julian Camerota Verdù et.al. | 2412.10163 | null |
| 2024-12-13 | AMUSE: Adaptive Model Updating using a Simulated Environment | Louis Chislett et.al. | 2412.10119 | null |
| 2024-12-13 | Reward Machine Inference for Robotic Manipulation | Mattijs Baert et.al. | 2412.10096 | null |
| 2024-12-13 | Optimized Coordination Strategy for Multi-Aerospace Systems in Pick-and-Place Tasks By Deep Neural Network | Ye Zhang et.al. | 2412.09877 | null |
| 2024-12-13 | RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning | Charles Xu et.al. | 2412.09858 | null |
| 2024-12-13 | ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression | Kai Yao et.al. | 2412.09812 | null |
| 2024-12-12 | GainAdaptor: Learning Quadrupedal Locomotion with Dual Actors for Adaptable and Energy-Efficient Walking on Various Terrains | Mincheol Kim et.al. | 2412.09520 | null |
| 2024-12-12 | Distributional Reinforcement Learning based Integrated Decision Making and Control for Autonomous Surface Vehicles | Xi Lin et.al. | 2412.09466 | link |
| 2024-12-12 | Learning to Adapt: Bio-Inspired Gait Strategies for Versatile Quadruped Locomotion | Joseph Humphreys et.al. | 2412.09440 | null |
| 2024-12-12 | Reinforcement Learning Within the Classical Robotics Stack: A Case Study in Robot Soccer | Adam Labiosa et.al. | 2412.09417 | null |
| 2024-12-12 | Does Low Spoilage Under Cold Conditions Foster Cultural Complexity During the Foraging Era? – A Theoretical and Computational Inquiry | Minhyeok Lee et.al. | 2412.09335 | null |
| 2024-12-12 | Learning to be Indifferent in Complex Decisions: A Coarse Payoff-Assessment Model | Philippe Jehiel et.al. | 2412.09321 | null |
| 2024-12-12 | Learning Novel Skills from Language-Generated Demonstrations | Ao-Qun Jin et.al. | 2412.09286 | null |
| 2024-12-12 | Student-Informed Teacher Training | Nico Messikommer et.al. | 2412.09149 | null |
| 2024-12-12 | Reconfigurable Intelligent Surface for Internet of Robotic Things | Wanli Ni et.al. | 2412.09117 | null |
| 2024-12-12 | In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning | Songjun Tu et.al. | 2412.09104 | null |
| 2024-12-11 | Learning Sketch Decompositions in Planning via Deep Reinforcement Learning | Michael Aichmüller et.al. | 2412.08574 | null |
| 2024-12-11 | GenPlan: Generative sequence models as adaptive planners | Akash Karthikeyan et.al. | 2412.08565 | null |
| 2024-12-11 | An End-to-End Collaborative Learning Approach for Connected Autonomous Vehicles in Occluded Scenarios | Leandro Parada et.al. | 2412.08562 | null |
| 2024-12-11 | MaestroMotif: Skill Design from Artificial Intelligence Feedback | Martin Klissarov et.al. | 2412.08542 | null |
| 2024-12-11 | Subspace-wise Hybrid RL for Articulated Object Manipulation | Yujin Kim et.al. | 2412.08522 | null |
| 2024-12-11 | Multi-perspective Alignment for Increasing Naturalness in Neural Machine Translation | Huiyuan Lai et.al. | 2412.08473 | null |
| 2024-12-11 | IRL for Restless Multi-Armed Bandits with Applications in Maternal and Child Health | Gauri Jain et.al. | 2412.08463 | link |
| 2024-12-11 | SINERGYM – A virtual testbed for building energy optimization with Reinforcement Learning | Alejandro Campoy-Nieves et.al. | 2412.08293 | link |
| 2024-12-11 | Coarse-to-Fine: A Dual-Phase Channel-Adaptive Method for Wireless Image Transmission | Hanlei Li et.al. | 2412.08211 | null |
| 2024-12-11 | Learn How to Query from Unlabeled Data Streams in Federated Learning | Yuchang Sun et.al. | 2412.08138 | link |
| 2024-12-10 | Mobile-TeleVision: Predictive Motion Priors for Humanoid Whole-Body Control | Chenhao Lu et.al. | 2412.07773 | null |
| 2024-12-10 | Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data | Zhiyuan Zhou et.al. | 2412.07762 | null |
| 2024-12-10 | Optimizing Sensor Redundancy in Sequential Decision-Making Problems | Jonas Nüßlein et.al. | 2412.07686 | null |
| 2024-12-10 | Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization | Zongkai Liu et.al. | 2412.07639 | null |
| 2024-12-10 | Swarm Behavior Cloning | Jonas Nüßlein et.al. | 2412.07617 | null |
| 2024-12-10 | Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery | Amin Abyaneh et.al. | 2412.07544 | null |
| 2024-12-10 | ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning | Hongshu Guo et.al. | 2412.07507 | null |
| 2024-12-10 | Optimizing pulsed blowing parameters for active separation control in a one-sided diffuser using reinforcement learning | Alexandra Müller et.al. | 2412.07480 | null |
| 2024-12-10 | Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulation for Time-Efficient Fine-Resolution Policy Learning | Yuki Kadokawa et.al. | 2412.07477 | null |
| 2024-12-10 | RLT4Rec: Reinforcement Learning Transformer for User Cold Start and Item Recommendation | Dilina Chandika Rajapakse et.al. | 2412.07403 | null |
| 2024-12-09 | Partially Observed Optimal Stochastic Control: Regularity, Optimality, Approximations, and Learning | Ali Devran Kara et.al. | 2412.06735 | null |
| 2024-12-09 | Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone | Max Sobol Mark et.al. | 2412.06685 | null |
| 2024-12-09 | Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures | Adrien Bolland et.al. | 2412.06655 | null |
| 2024-12-09 | Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation | Egor Cherepanov et.al. | 2412.06531 | null |
| 2024-12-09 | SimuDICE: Offline Policy Optimization Through World Model Updates and DICE Estimation | Catalin E. Brita et.al. | 2412.06486 | link |
| 2024-12-09 | Edge Delayed Deep Deterministic Policy Gradient: efficient continuous control for edge scenarios | Alberto Sinigaglia et.al. | 2412.06390 | null |
| 2024-12-09 | Tracking control of latent dynamic systems with application to spacecraft attitude control | Congxi Zhang et.al. | 2412.06342 | null |
| 2024-12-09 | Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi | F. Bredell et.al. | 2412.06333 | null |
| 2024-12-09 | Vision-Based Deep Reinforcement Learning of UAV Autonomous Navigation Using Privileged Information | Junqiao Wang et.al. | 2412.06313 | null |
| 2024-12-09 | A Scalable Decentralized Reinforcement Learning Framework for UAV Target Localization Using Recurrent PPO | Leon Fernando et.al. | 2412.06231 | null |
| 2024-12-06 | Reinforcement Learning: An Overview | Kevin Murphy et.al. | 2412.05265 | null |
| 2024-12-06 | TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | Qian Long et.al. | 2412.05255 | link |
| 2024-12-06 | LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds | James Beetham et.al. | 2412.05232 | null |
| 2024-12-06 | FlowPolicy: Enabling Fast and Robust 3D Flow-based Policy via Consistency Flow Matching for Robot Manipulation | Qinglun Zhang et.al. | 2412.04987 | null |
| 2024-12-06 | Putting the Iterative Training of Decision Trees to the Test on a Real-World Robotic Task | Raphael C. Engelhardt et.al. | 2412.04974 | null |
| 2024-12-06 | DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling | Minzheng Wang et.al. | 2412.04905 | link |
| 2024-12-06 | Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment | Ran Tian et.al. | 2412.04835 | null |
| 2024-12-06 | Learning-based Control for Tendon-Driven Continuum Robotic Arms | Nima Maghooli et.al. | 2412.04829 | null |
| 2024-12-06 | A Temporally Correlated Latent Exploration for Reinforcement Learning | SuMin Oh et.al. | 2412.04775 | null |
| 2024-12-06 | Measuring Goal-Directedness | Matt MacDermott et.al. | 2412.04758 | null |
| 2024-12-05 | Marvel: Accelerating Safe Online Reinforcement Learning with Finetuned Offline Policy | Keru Chen et.al. | 2412.04426 | null |
| 2024-12-05 | Intersection-Aware Assessment of EMS Accessibility in NYC: A Data-Driven Approach | Haoran Su et.al. | 2412.04369 | null |
| 2024-12-05 | Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting | Edoardo Cetin et.al. | 2412.04368 | null |
| 2024-12-05 | Reinforcement Learning for Freeway Lane-Change Regulation via Connected Vehicles | Ke Sun et.al. | 2412.04341 | null |
| 2024-12-05 | Action Mapping for Reinforcement Learning in Continuous Environments with Constraints | Mirco Theile et.al. | 2412.04327 | null |
| 2024-12-05 | GRAM: Generalization in Deep RL with a Robust Adaptation Module | James Queeney et.al. | 2412.04323 | link |
| 2024-12-05 | Reinforcement Learning from Wild Animal Videos | Elliot Chane-Sane et.al. | 2412.04273 | null |
| 2024-12-05 | HyperMARL: Adaptive Hypernetworks for Multi-Agent RL | Kale-ab Abebe Tessera et.al. | 2412.04233 | null |
| 2024-12-05 | A Dynamic Safety Shield for Safe and Efficient Reinforcement Learning of Navigation Tasks | Murad Dawood et.al. | 2412.04153 | null |
| 2024-12-05 | Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning | Shicheng Zhou et.al. | 2412.04078 | link |
| 2024-12-04 | AI-Driven Day-to-Day Route Choice | Leizhen Wang et.al. | 2412.03338 | null |
| 2024-12-04 | Rotograb: Combining Biomimetic Hands with Industrial Grippers using a Rotating Thumb | Arnaud Bersier et.al. | 2412.03279 | null |
| 2024-12-04 | Learning on One Mode: Addressing Multi-Modality in Offline Reinforcement Learning | Mianchu Wang et.al. | 2412.03258 | null |
| 2024-12-04 | Alignment at Pre-training! Towards Native Alignment for Arabic LLMs | Juhao Liang et.al. | 2412.03253 | link |
| 2024-12-04 | Variable-Speed Teaching-Playback as Real-World Data Augmentation for Imitation Learning | Nozomu Masuya et.al. | 2412.03252 | null |
| 2024-12-04 | Using Deep Reinforcement Learning to Enhance Channel Sampling Patterns in Integrated Sensing and Communication | Federico Mason et.al. | 2412.03157 | null |
| 2024-12-04 | Experience-driven discovery of planning strategies | Ruiqi He et.al. | 2412.03111 | null |
| 2024-12-04 | Less is More: A Stealthy and Efficient Adversarial Attack Method for DRL-based Autonomous Driving Policies | Junchao Fan et.al. | 2412.03051 | null |
| 2024-12-04 | Learning Whole-Body Loco-Manipulation for Omni-Directional Task Space Pose Tracking with a Wheeled-Quadrupedal-Manipulator | Kaiwen Jiang et.al. | 2412.03012 | null |
| 2024-12-04 | Data Acquisition for Improving Model Fairness using Reinforcement Learning | Jahid Hasan et.al. | 2412.03009 | null |
| 2024-12-03 | UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping | Wenbo Wang et.al. | 2412.02699 | link |
| 2024-12-03 | Preliminary Investigation into Data Scaling Laws for Imitation Learning-Based End-to-End Autonomous Driving | Yupeng Zheng et.al. | 2412.02689 | null |
| 2024-12-03 | T-REG: Preference Optimization with Token-Level Reward Regularization | Wenxuan Zhou et.al. | 2412.02685 | link |
| 2024-12-03 | AI-Driven Resource Allocation Framework for Microservices in Hybrid Cloud Platforms | Biman Barua et.al. | 2412.02610 | null |
| 2024-12-03 | Explainable CTR Prediction via LLM Reasoning | Xiaohan Yu et.al. | 2412.02588 | null |
| 2024-12-03 | Mobile Cell-Free Massive MIMO with Multi-Agent Reinforcement Learning: A Scalable Framework | Ziheng Liu et.al. | 2412.02581 | null |
| 2024-12-03 | Generating Critical Scenarios for Testing Automated Driving Systems | Trung-Hieu Nguyen et.al. | 2412.02574 | link |
| 2024-12-03 | Cooperative Cruising: Reinforcement Learning based Time-Headway Control for Increased Traffic Efficiency | Yaron Veksler et.al. | 2412.02520 | null |
| 2024-12-03 | Reinforcement learning to learn quantum states for Heisenberg scaling accuracy | Jeongwoo Jae et.al. | 2412.02334 | null |
| 2024-12-03 | Optimizing Plastic Waste Collection in Water Bodies Using Heterogeneous Autonomous Surface Vehicles with Deep Reinforcement Learning | Alejandro Mendoza Barrionuevo et.al. | 2412.02316 | null |
| 2024-11-29 | PDDLFuse: A Tool for Generating Diverse Planning Domains | Vedant Khandelwal et.al. | 2411.19886 | null |
| 2024-11-29 | CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives | Armin Saghafian et.al. | 2411.19787 | link |
| 2024-11-29 | HVAC-DPT: A Decision Pretrained Transformer for HVAC Control | Anaïs Berkes et.al. | 2411.19746 | null |
| 2024-11-29 | Improving generalization of robot locomotion policies via Sharpness-Aware Reinforcement Learning | Severin Bochem et.al. | 2411.19732 | null |
| 2024-11-29 | RMIO: A Model-Based MARL Framework for Scenarios with Observation Loss in Some Agents | Shi Zifeng et.al. | 2411.19639 | null |
| 2024-11-29 | Build An Influential Bot In Social Media Simulations With Large Language Models | Bailu Jin et.al. | 2411.19635 | null |
| 2024-11-29 | Adaptive dynamics of Ising spins in one dimension leveraging Reinforcement Learning | Anish Kumar et.al. | 2411.19602 | null |
| 2024-11-29 | Solving Rubik’s Cube Without Tricky Sampling | Yicheng Lin et.al. | 2411.19583 | null |
| 2024-11-29 | Training Agents with Weakly Supervised Feedback from Large Language Models | Dihong Gong et.al. | 2411.19547 | null |
| 2024-11-29 | A Local Information Aggregation based Multi-Agent Reinforcement Learning for Robot Swarm Dynamic Task Allocation | Yang Lv et.al. | 2411.19526 | null |
| 2024-11-27 | Robust Offline Reinforcement Learning with Linearly Structured $f$-Divergence Regularization | Cheng Tang et.al. | 2411.18612 | null |
| 2024-11-27 | A Talent-infused Policy-gradient Approach to Efficient Co-Design of Morphology and Task Allocation Behavior of Multi-Robot Systems | Prajit KrisshnaKumar et.al. | 2411.18519 | null |
| 2024-11-27 | G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation | Tianxing Chen et.al. | 2411.18369 | null |
| 2024-11-27 | Two-Timescale Digital Twin Assisted Model Interference and Retraining over Wireless Network | Jiayi Cong et.al. | 2411.18329 | null |
| 2024-11-27 | Application of Soft Actor-Critic Algorithms in Optimizing Wastewater Treatment with Time Delays Integration | Esmaeel Mohammadi et.al. | 2411.18305 | null |
| 2024-11-27 | NeoHebbian Synapses to Accelerate Online Training of Neuromorphic Hardware | Shubham Pande et.al. | 2411.18272 | null |
| 2024-11-27 | Dynamic Retail Pricing via Q-Learning – A Reinforcement Learning Framework for Enhanced Revenue Management | Mohit Apte et.al. | 2411.18261 | null |
| 2024-11-27 | Dependency-Aware CAV Task Scheduling via Diffusion-Based Reinforcement Learning | Xiang Cheng et.al. | 2411.18230 | null |
| 2024-11-27 | Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning | Di Zhang et.al. | 2411.18203 | link |
| 2024-11-27 | Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation | Jie-Jing Shao et.al. | 2411.18201 | link |
| 2024-11-26 | Multi-Objective Reinforcement Learning for Automated Resilient Cyber Defence | Ross O’Driscoll et.al. | 2411.17585 | null |
| 2024-11-26 | Ensuring Safety in Target Pursuit Control: A CBF-Safe Reinforcement Learning Approach | Yaosheng Deng et.al. | 2411.17552 | null |
| 2024-11-26 | IMPROVE: Improving Medical Plausibility without Reliance on Human Validation – An Enhanced Prototype-Guided Diffusion Framework | Anurag Shandilya et.al. | 2411.17535 | null |
| 2024-11-26 | Spatially Visual Perception for End-to-End Robotic Learning | Travis Davies et.al. | 2411.17458 | null |
| 2024-11-26 | BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving | Teng Wang et.al. | 2411.17404 | null |
| 2024-11-26 | Joint Combinatorial Node Selection and Resource Allocations in the Lightning Network using Attention-based Reinforcement Learning | Mahdi Salahshour et.al. | 2411.17353 | null |
| 2024-11-26 | SIL-RRT*: Learning Sampling Distribution through Self Imitation Learning | Xuzhe Dang et.al. | 2411.17293 | null |
| 2024-11-26 | LHPF: Look back the History and Plan for the Future in Autonomous Driving | Sheng Wang et.al. | 2411.17253 | null |
| 2024-11-26 | Self-reconfiguration Strategies for Space-distributed Spacecraft | Tianle Liu et.al. | 2411.17137 | null |
| 2024-11-26 | LLM-Based Offline Learning for Embodied Agents via Consistency-Guided Reward Ensemble | Yujeong Lee et.al. | 2411.17135 | null |
| 2024-11-25 | Self-Generated Critiques Boost Reward Modeling for Language Models | Yue Yu et.al. | 2411.16646 | null |
| 2024-11-25 | Continual Deep Reinforcement Learning with Task-Agnostic Policy Distillation | Muhammad Burhan Hafez et.al. | 2411.16532 | link |
| 2024-11-25 | Reinforcement Learning for Bidding Strategy Optimization in Day-Ahead Energy Market | Luca Di Persio et.al. | 2411.16519 | null |
| 2024-11-25 | Unsupervised Event Outlier Detection in Continuous Time | Somjit Nath et.al. | 2411.16427 | null |
| 2024-11-25 | CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning | Duo Wu et.al. | 2411.16313 | null |
| 2024-11-25 | Probing for Consciousness in Machines | Mathis Immertreu et.al. | 2411.16262 | null |
| 2024-11-25 | Multi-Robot Reliable Navigation in Uncertain Topological Environments with Graph Attention Networks | Zhuoyuan Yu et.al. | 2411.16134 | null |
| 2024-11-25 | End-to-End Steering for Autonomous Vehicles via Conditional Imitation Co-Learning | Mahmoud M. Kishky et.al. | 2411.16131 | null |
| 2024-11-25 | Why the Agent Made that Decision: Explaining Deep Reinforcement Learning with Vision Masks | Rui Zuo et.al. | 2411.16120 | null |
| 2024-11-25 | M3: Mamba-assisted Multi-Circuit Optimization via MBRL with Effective Scheduling | Youngmin Oh et.al. | 2411.16019 | null |
| 2024-11-22 | WildLMa: Long Horizon Loco-Manipulation in the Wild | Ri-Zhao Qiu et.al. | 2411.15131 | null |
| 2024-11-22 | Learning-based Trajectory Tracking for Bird-inspired Flapping-Wing Robots | Jiaze Cai et.al. | 2411.15130 | null |
| 2024-11-22 | TÜLU 3: Pushing Frontiers in Open Language Model Post-Training | Nathan Lambert et.al. | 2411.15124 | link |
| 2024-11-22 | On Multi-Agent Inverse Reinforcement Learning | Till Freihaut et.al. | 2411.15046 | null |
| 2024-11-22 | Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium | Zeyang Li et.al. | 2411.15036 | null |
| 2024-11-22 | On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations | Guojun Xiong et.al. | 2411.15014 | null |
| 2024-11-22 | Free Energy Projective Simulation (FEPS): Active inference with interpretability | Joséphine Pazem et.al. | 2411.14991 | null |
| 2024-11-22 | Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation | Huy Le et.al. | 2411.14913 | null |
| 2024-11-22 | Segmenting Action-Value Functions Over Time-Scales in SARSA using TD($Δ$) | Mahammad Humayoo et.al. | 2411.14783 | null |
| 2024-11-22 | Enhancing Molecular Design through Graph-based Topological Reinforcement Learning | Xiangyu Zhang et.al. | 2411.14726 | null |
| 2024-11-21 | Multi-Agent Environments for Vehicle Routing Problems | Ricardo Gama et.al. | 2411.14411 | null |
| 2024-11-21 | Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions | Yu Zhao et.al. | 2411.14405 | link |
| 2024-11-21 | 23 DoF Grasping Policies from a Raw Point Cloud | Martin Matak et.al. | 2411.14400 | null |
| 2024-11-21 | Model Checking for Reinforcement Learning in Autonomous Driving: One Can Do More Than You Think! | Rong Gu et.al. | 2411.14375 | null |
| 2024-11-21 | Convex Approximation of Probabilistic Reachable Sets from Small Samples Using Self-supervised Neural Networks | Jun Xiang et.al. | 2411.14356 | null |
| 2024-11-21 | Logarithmic Neyman Regret for Adaptive Estimation of the Average Treatment Effect | Ojash Neopane et.al. | 2411.14341 | null |
| 2024-11-21 | Explainable Multi-Agent Reinforcement Learning for Extended Reality Codec Adaptation | Pedro Enrique Iturria-Rivera et.al. | 2411.14264 | null |
| 2024-11-21 | Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs | Zeyu Dong et.al. | 2411.14256 | null |
| 2024-11-21 | Natural Language Reinforcement Learning | Xidong Feng et.al. | 2411.14251 | link |
| 2024-11-21 | Umbrella Reinforcement Learning – computationally efficient tool for hard non-linear problems | Egor E. Nuzhin et.al. | 2411.14117 | null |
| 2024-11-20 | BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Davide Paglieri et.al. | 2411.13543 | link |
| 2024-11-20 | Metacognition for Unknown Situations and Environments (MUSE) | Rodolfo Valiente et.al. | 2411.13537 | null |
| 2024-11-20 | Robust Monocular Visual Odometry using Curriculum Learning | Assaf Lahiany et.al. | 2411.13438 | null |
| 2024-11-20 | A Survey On Enhancing Reinforcement Learning in Complex Environments: Insights from Human and LLM Feedback | Alireza Rashidi Laleh et.al. | 2411.13410 | null |
| 2024-11-20 | Fine-tuning Myoelectric Control through Reinforcement Learning in a Game Environment | Kilian Freitag et.al. | 2411.13327 | null |
| 2024-11-20 | Backward Stochastic Control System with Entropy Regularization | Ziyue Chen et.al. | 2411.13219 | null |
| 2024-11-20 | ViSTa Dataset: Do vision-language models understand sequential tasks? | Evžen Wybitul et.al. | 2411.13211 | link |
| 2024-11-20 | Engagement-Driven Content Generation with Large Language Models | Erica Coppolillo et.al. | 2411.13187 | null |
| 2024-11-20 | Learning Time-Optimal and Speed-Adjustable Tactile In-Hand Manipulation | Johannes Pitz et.al. | 2411.13148 | null |
| 2024-11-20 | ReinFog: A DRL Empowered Framework for Resource Management in Edge and Cloud Computing Environments | Zhiyu Wang et.al. | 2411.13121 | null |
| 2024-11-19 | ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models | Salma Kharrat et.al. | 2411.12736 | link |
| 2024-11-19 | Reinforcement Learning, Collusion, and the Folk Theorem | Galit Askenazi-Golan et.al. | 2411.12725 | null |
| 2024-11-19 | UBSoft: A Simulation Platform for Robotic Skill Learning in Unbounded Soft Environments | Chunru Lin et.al. | 2411.12711 | null |
| 2024-11-19 | Instant Policy: In-Context Imitation Learning via Graph Diffusion | Vitalis Vosylius et.al. | 2411.12633 | null |
| 2024-11-19 | Robotic transcatheter tricuspid valve replacement with hybrid enhanced intelligence: a new paradigm and first-in-vivo study | Shuangyi Wang et.al. | 2411.12478 | null |
| 2024-11-19 | Variable-Frequency Imitation Learning for Variable-Speed Motion | Nozomu Masuya et.al. | 2411.12310 | null |
| 2024-11-19 | Emergence of Implicit World Models from Mortal Agents | Kazuya Horibe et.al. | 2411.12304 | null |
| 2024-11-19 | DT-RaDaR: Digital Twin Assisted Robot Navigation using Differential Ray-Tracing | Sunday Amatare et.al. | 2411.12284 | null |
| 2024-11-19 | Error-Feedback Model for Output Correction in Bilateral Control-Based Imitation Learning | Hiroshi Sato et.al. | 2411.12255 | null |
| 2024-11-19 | Efficient Training in Multi-Agent Reinforcement Learning: A Communication-Free Framework for the Box-Pushing Problem | David Ge et.al. | 2411.12246 | null |
| 2024-11-18 | Design And Optimization Of Multi-rendezvous Manoeuvres Based On Reinforcement Learning And Convex Optimization | Antonio López Rivera et.al. | 2411.11778 | null |
| 2024-11-18 | High-Speed Cornering Control and Real-Vehicle Deployment for Autonomous Electric Vehicles | Shiyue Zhao et.al. | 2411.11762 | null |
| 2024-11-18 | Mapping out the Space of Human Feedback for Reinforcement Learning: A Conceptual Framework | Yannick Metz et.al. | 2411.11761 | null |
| 2024-11-18 | Aligning Few-Step Diffusion Models with Dense Reward Difference Learning | Ziyi Zhang et.al. | 2411.11727 | link |
| 2024-11-18 | Bitcoin Under Volatile Block Rewards: How Mempool Statistics Can Influence Bitcoin Mining | Roozbeh Sarenche et.al. | 2411.11702 | null |
| 2024-11-18 | Robust Reinforcement Learning under Diffusion Models for Data with Jumps | Chenyang Jiang et.al. | 2411.11697 | null |
| 2024-11-18 | Coevolution of Opinion Dynamics and Recommendation System: Modeling Analysis and Reinforcement Learning Based Manipulation | Yuhong Chen et.al. | 2411.11687 | null |
| 2024-11-18 | No-regret Exploration in Shuffle Private Reinforcement Learning | Shaojie Bai et.al. | 2411.11647 | null |
| 2024-11-18 | Signaling and Social Learning in Swarms of Robots | Leo Cazenille et.al. | 2411.11616 | null |
| 2024-11-18 | A Pre-Trained Graph-Based Model for Adaptive Sequencing of Educational Documents | Jean Vassoyan et.al. | 2411.11520 | null |
| 2024-11-15 | Mitigating Parameter Degeneracy using Joint Conditional Diffusion Model for WECC Composite Load Model in Power Systems | Feiqin Zhu et.al. | 2411.10431 | null |
| 2024-11-15 | Continual Adversarial Reinforcement Learning (CARL) of False Data Injection detection: forgetting and explainability | Pooja Aslami et.al. | 2411.10367 | null |
| 2024-11-15 | BMP: Bridging the Gap between B-Spline and Movement Primitives | Weiran Liao et.al. | 2411.10336 | null |
| 2024-11-15 | Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review | Hossein Hassani et.al. | 2411.10268 | null |
| 2024-11-15 | Learning Generalizable 3D Manipulation With 10 Demonstrations | Yu Ren et.al. | 2411.10203 | null |
| 2024-11-15 | The Surprising Ineffectiveness of Pre-Trained Visual Representations for Model-Based Reinforcement Learning | Moritz Schneider et.al. | 2411.10175 | null |
| 2024-11-15 | Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles | Anant Garg et.al. | 2411.10171 | null |
| 2024-11-15 | Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention | Libo Wang et.al. | 2411.10156 | link |
| 2024-11-15 | That Chip Has Sailed: A Critique of Unfounded Skepticism Around AI for Chip Design | Anna Goldie et.al. | 2411.10053 | null |
| 2024-11-15 | Enforcing Cooperative Safety for Reinforcement Learning-based Mixed-Autonomy Platoon Control | Jingyuan Zhou et.al. | 2411.10031 | null |
| 2024-11-14 | A Risk Sensitive Contract-unified Reinforcement Learning Approach for Option Hedging | Xianhua Peng et.al. | 2411.09659 | null |
| 2024-11-14 | Motion Before Action: Diffusing Object Motion as Manipulation Condition | Yup Su et.al. | 2411.09658 | null |
| 2024-11-14 | Tailoring interactions between active nematic defects with reinforcement learning | Carlos Floyd et.al. | 2411.09588 | null |
| 2024-11-14 | Developement of Reinforcement Learning based Optimisation Method for Side-Sill Design | Aditya Borse et.al. | 2411.09499 | null |
| 2024-11-14 | Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment | Yuang Cai et.al. | 2411.09341 | null |
| 2024-11-14 | Socio-Economic Consequences of Generative AI: A Review of Methodological Approaches | Carlos J. Costa et.al. | 2411.09313 | null |
| 2024-11-14 | Enhancing reinforcement learning for population setpoint tracking in co-cultures | Sebastián Espinel-Ríos et.al. | 2411.09177 | null |
| 2024-11-14 | Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging | Bo Wang et.al. | 2411.09176 | null |
| 2024-11-14 | Rationality based Innate-Values-driven Reinforcement Learning | Qin Yang et.al. | 2411.09160 | null |
| 2024-11-14 | Secrecy Energy Efficiency Maximization in IRS-Assisted VLC MISO Networks with RSMA: A DS-PPO approach | Yangbo Guo et.al. | 2411.09146 | null |
| 2024-11-13 | LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs | Piyush Jha et.al. | 2411.08862 | null |
| 2024-11-13 | Goal-oriented Semantic Communication for Robot Arm Reconstruction in Digital Twin: Feature and Temporal Selections | Shutong Chen et.al. | 2411.08835 | null |
| 2024-11-13 | Recommender systems and reinforcement learning for building control and occupant interaction: A text-mining driven review of scientific literature | Wenhao Zhang et.al. | 2411.08734 | null |
| 2024-11-13 | Joint Model Caching and Resource Allocation in Generative AI-Enabled Wireless Edge Networks | Zhang Liu et.al. | 2411.08672 | null |
| 2024-11-13 | Estimating unknown parameters in differential equations with a reinforcement learning based PSO method | Wenkui Sun et.al. | 2411.08651 | null |
| 2024-11-13 | Towards Secure Intelligent O-RAN Architecture: Vulnerabilities, Threats and Promising Technical Solutions using LLMs | Mojdeh Karbalaee Motalleb et.al. | 2411.08640 | null |
| 2024-11-13 | Robot See, Robot Do: Imitation Reward for Noisy Financial Environments | Sven Goluža et.al. | 2411.08637 | null |
| 2024-11-13 | Precision-Focused Reinforcement Learning Model for Robotic Object Pushing | Lara Bergmann et.al. | 2411.08622 | link |
| 2024-11-13 | Grammarization-Based Grasping with Deep Multi-Autoencoder Latent Space Exploration by Reinforcement Learning Agent | Leonidas Askianakis et.al. | 2411.08566 | null |
| 2024-11-13 | Towards Practical Deep Schedulers for Allocating Cellular Radio Resources | Petteri Kela et.al. | 2411.08529 | null |
| 2024-11-12 | Learning Memory Mechanisms for Decision Making through Demonstrations | William Yue et.al. | 2411.07954 | link |
| 2024-11-12 | Doubly Mild Generalization for Offline Reinforcement Learning | Yixiu Mao et.al. | 2411.07934 | link |
| 2024-11-12 | Scaling policy iteration based reinforcement learning for unknown discrete-time linear systems | Zhen Pang et.al. | 2411.07825 | null |
| 2024-11-12 | Navigation with QPHIL: Quantizing Planner for Hierarchical Implicit Q-Learning | Alexi Canesse et.al. | 2411.07760 | null |
| 2024-11-12 | Optimizing Traffic Signal Control using High-Dimensional State Representation and Efficient Deep Reinforcement Learning | Lawrence Francis et.al. | 2411.07759 | null |
| 2024-11-12 | EMPERROR: A Flexible Generative Perception Error Model for Probing Self-Driving Planners | Niklas Hanselmann et.al. | 2411.07719 | null |
| 2024-11-12 | Test Where Decisions Matter: Importance-driven Testing for Deep Reinforcement Learning | Stefan Pranger et.al. | 2411.07700 | null |
| 2024-11-12 | Exploring Multi-Agent Reinforcement Learning for Unrelated Parallel Machine Scheduling | Maria Zampella et.al. | 2411.07634 | null |
| 2024-11-12 | Direct Preference Optimization Using Sparse Feature-Level Constraints | Qingyu Yin et.al. | 2411.07618 | null |
| 2024-11-12 | Entropy Controllable Direct Preference Optimization | Motoki Omura et.al. | 2411.07595 | null |
| 2024-11-11 | ‘Explaining RL Decisions with Trajectories’: A Reproducibility Study | Karim Abdel Sadek et.al. | 2411.07200 | link |
| 2024-11-11 | Joint Age-State Belief is All You Need: Minimizing AoII via Pull-Based Remote Estimation | Ismail Cosandal et.al. | 2411.07179 | null |
| 2024-11-11 | Learning Multi-Agent Collaborative Manipulation for Long-Horizon Quadrupedal Pushing | Chuye Hong et.al. | 2411.07104 | null |
| 2024-11-11 | A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs | Myeongsoo Kim et.al. | 2411.07098 | null |
| 2024-11-11 | OCMDP: Observation-Constrained Markov Decision Process | Taiyi Wang et.al. | 2411.07087 | null |
| 2024-11-11 | To Train or Not to Train: Balancing Efficiency and Training Cost in Deep Reinforcement Learning for Mobile Edge Computing | Maddalena Boscaro et.al. | 2411.07086 | null |
| 2024-11-11 | Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching | Arnav Kumar Jain et.al. | 2411.07007 | link |
| 2024-11-11 | Enhancing Robot Assistive Behaviour with Reinforcement Learning and Theory of Mind | Antonio Andriella et.al. | 2411.07003 | link |
| 2024-11-11 | Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration | Xingrui Yu et.al. | 2411.06965 | null |
| 2024-11-11 | Streetwise Agents: Empowering Offline RL Policies to Outsmart Exogenous Stochastic Disturbances in RTC | Aditya Soni et.al. | 2411.06815 | null |
| 2024-11-08 | Safe Reinforcement Learning of Robot Trajectories in the Presence of Moving Obstacles | Jonas Kiemel et.al. | 2411.05784 | null |
| 2024-11-08 | Tract-RLFormer: A Tract-Specific RL policy based Decoder-only Transformer Network | Ankita Joshi et.al. | 2411.05757 | null |
| 2024-11-08 | Topology-aware Reinforcement Feature Space Reconstruction for Graph Data | Wangyang Ying et.al. | 2411.05742 | null |
| 2024-11-08 | Renewable Energy Powered and Open RAN-based Architecture for 5G Fixed Wireless Access Provisioning in Rural Areas | Anselme Ndikumana et.al. | 2411.05699 | null |
| 2024-11-08 | Data-Driven Distributed Common Operational Picture from Heterogeneous Platforms using Multi-Agent Reinforcement Learning | Indranil Sur et.al. | 2411.05683 | null |
| 2024-11-08 | Digital Twin Backed Closed-Loops for Energy-Aware and Open RAN-based Fixed Wireless Access Serving Rural Areas | Anselme Ndikumana et.al. | 2411.05664 | null |
| 2024-11-08 | Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey | Zhihong Liu et.al. | 2411.05614 | null |
| 2024-11-08 | Smart navigation through a rotating barrier: Deep reinforcement learning with application to size-based separation of active microagents | Mohammad Hossein Masoudi et.al. | 2411.05587 | null |
| 2024-11-08 | Tangled Program Graphs as an alternative to DRL-based control algorithms for UAVs | Hubert Szolc et.al. | 2411.05586 | null |
| 2024-11-08 | Towards Active Flow Control Strategies Through Deep Reinforcement Learning | Ricard Montalà et.al. | 2411.05536 | null |
| 2024-11-07 | Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games | Usman Anwar et.al. | 2411.04976 | link |
| 2024-11-07 | A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model | Panwen Hu et.al. | 2411.04942 | null |
| 2024-11-07 | Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion | Kaizhe Hu et.al. | 2411.04919 | link |
| 2024-11-07 | Evaluating Robustness of Reinforcement Learning Algorithms for Autonomous Shipping | Bavo Lesy et.al. | 2411.04915 | null |
| 2024-11-07 | Think Smart, Act SMARL! Analyzing Probabilistic Logic Driven Safety in Multi-Agent Reinforcement Learning | Satchit Chatterji et.al. | 2411.04867 | link |
| 2024-11-07 | Asymptotic regularity of a generalised stochastic Halpern scheme with applications | Nicholas Pischke et.al. | 2411.04845 | null |
| 2024-11-07 | Plasticity Loss in Deep Reinforcement Learning: A Survey | Timo Klein et.al. | 2411.04832 | null |
| 2024-11-07 | Harnessing the Power of Gradient-Based Simulations for Multi-Objective Optimization in Particle Accelerators | Kishansingh Rajput et.al. | 2411.04817 | null |
| 2024-11-07 | AllGaits: Learning All Quadruped Gaits and Transitions | Guillaume Bellegarda et.al. | 2411.04787 | null |
| 2024-11-07 | Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning | Zuzanna Osika et.al. | 2411.04784 | link |
| 2024-11-06 | A Comparative Study of Deep Reinforcement Learning for Crop Production Management | Joseph Balderas et.al. | 2411.04106 | null |
| 2024-11-06 | Interpretable and Efficient Data-driven Discovery and Control of Distributed Systems | Florian Wolf et.al. | 2411.04098 | null |
| 2024-11-06 | Memorized action chunking with Transformers: Imitation learning for vision-based tissue surface scanning | Bochen Yang et.al. | 2411.04050 | null |
| 2024-11-06 | Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset | Alexandre Galashov et.al. | 2411.04034 | null |
| 2024-11-06 | Predicting and Publishing Accurate Imbalance Prices Using Monte Carlo Tree Search | Fabio Pavirani et.al. | 2411.04011 | null |
| 2024-11-06 | Object-Centric Dexterous Manipulation from Human Motion Data | Yuanpei Chen et.al. | 2411.04005 | null |
| 2024-11-06 | ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy | Chenrui Tie et.al. | 2411.03990 | null |
| 2024-11-06 | AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making | Yizhe Huang et.al. | 2411.03865 | link |
| 2024-11-06 | Beyond The Rainbow: High Performance Deep Reinforcement Learning On A Desktop PC | Tyler Clark et.al. | 2411.03820 | null |
| 2024-11-06 | From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning | Zhirui Deng et.al. | 2411.03817 | null |
| 2024-11-05 | Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy For Visuomotor Imitation Learning | George Jiayuan Gao et.al. | 2411.03294 | null |
| 2024-11-05 | Pre-trained Visual Dynamics Representations for Efficient Policy Learning | Hao Luo et.al. | 2411.03169 | null |
| 2024-11-05 | Hierarchical Orchestra of Policies | Thomas P Cannon et.al. | 2411.03008 | null |
| 2024-11-05 | Accelerating Task Generalisation with Multi-Level Hierarchical Options | Thomas P Cannon et.al. | 2411.02998 | null |
| 2024-11-05 | Autonomous Decision Making for UAV Cooperative Pursuit-Evasion Game with Reinforcement Learning | Yang Zhao et.al. | 2411.02983 | null |
| 2024-11-05 | Transformer-Based Fault-Tolerant Control for Fixed-Wing UAVs Using Knowledge Distillation and In-Context Adaptation | Francisco Giral et.al. | 2411.02975 | null |
| 2024-11-05 | Embedding Safety into RL: A New Take on Trust Region Methods | Nikola Milosevic et.al. | 2411.02957 | null |
| 2024-11-05 | The Unreasonable Effectiveness of LLMs for Query Optimization | Peter Akioyamen et.al. | 2411.02862 | link |
| 2024-11-05 | ADOPT: Modified Adam Can Converge with Any $β_2$ with the Optimal Rate | Shohei Taniguchi et.al. | 2411.02853 | link |
| 2024-11-05 | When to Localize? A Risk-Constrained Reinforcement Learning Approach | Chak Lam Shek et.al. | 2411.02788 | null |
| 2024-11-04 | Simulation of Nanorobots with Artificial Intelligence and Reinforcement Learning for Advanced Cancer Cell Detection and Tracking | Shahab Kavousinejad et.al. | 2411.02345 | link |
| 2024-11-04 | WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | Zehan Qi et.al. | 2411.02337 | null |
| 2024-11-04 | Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback | Marcus Williams et.al. | 2411.02306 | link |
| 2024-11-04 | N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs | Ilya Zisman et.al. | 2411.01958 | null |
| 2024-11-04 | RoboCrowd: Scaling Robot Data Collection through Crowdsourcing | Suvir Mirchandani et.al. | 2411.01915 | null |
| 2024-11-04 | Efficient Active Imitation Learning with Random Network Distillation | Emilien Biré et.al. | 2411.01894 | null |
| 2024-11-04 | Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback | Guan-Ting Lin et.al. | 2411.01834 | null |
| 2024-11-04 | Risk-sensitive control as inference with Rényi divergence | Kaito Ito et.al. | 2411.01827 | null |
| 2024-11-04 | IRS-Enhanced Secure Semantic Communication Networks: Cross-Layer and Context-Awared Resource Allocation | Lingyi Wang et.al. | 2411.01821 | null |
| 2024-11-04 | So You Think You Can Scale Up Autonomous Robot Data Collection? | Suvir Mirchandani et.al. | 2411.01813 | null |
| 2024-10-31 | EgoMimic: Scaling Imitation Learning via Egocentric Video | Simar Kareer et.al. | 2410.24221 | link |
| 2024-10-31 | Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use | Jiajun Xi et.al. | 2410.24218 | link |
| 2024-10-31 | ARQ: A Mixed-Precision Quantization Framework for Accurate and Certifiably Robust DNNs | Yuchen Yang et.al. | 2410.24214 | null |
| 2024-10-31 | Zonal RL-RRT: Integrated RL-RRT Path Planning with Collision Probability and Zone Connectivity | AmirMohammad Tahmasbi et.al. | 2410.24205 | link |
| 2024-10-31 | DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning | Zhenyu Jiang et.al. | 2410.24185 | null |
| 2024-10-31 | Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning | Jiaqi Liu et.al. | 2410.24152 | null |
| 2024-10-31 | Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers | Kai Yan et.al. | 2410.24108 | link |
| 2024-10-31 | Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning | Nabil Omi et.al. | 2410.24096 | null |
| 2024-10-31 | 3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing | Binghao Huang et.al. | 2410.24091 | null |
| 2024-10-31 | Demystifying Linear MDPs and Novel Dynamics Aggregation Framework | Joongkyu Lee et.al. | 2410.24089 | null |
| 2024-10-30 | Keypoint Abstraction using Large Models for Object-Relative Imitation Learning | Xiaolin Fang et.al. | 2410.23254 | null |
| 2024-10-30 | Carrot and Stick: Eliciting Comparison Data and Beyond | Yiling Chen et.al. | 2410.23243 | null |
| 2024-10-30 | A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment | Matteo G. Mecattaf et.al. | 2410.23242 | null |
| 2024-10-30 | COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences | Yixin Liu et.al. | 2410.23223 | link |
| 2024-10-31 | Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval | Sheryl Hsu et.al. | 2410.23214 | null |
| 2024-10-30 | Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks | Michael Matthews et.al. | 2410.23208 | null |
| 2024-10-30 | Energy-Efficient Intra-Domain Network Slicing for Multi-Layer Orchestration in Intelligent-Driven Distributed 6G Networks: Learning Generic Assignment Skills with Unsupervised Reinforcement Learning | Navideh Ghafouri et.al. | 2410.23161 | null |
| 2024-10-30 | VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning | Yichao Liang et.al. | 2410.23156 | null |
| 2024-10-30 | From Hype to Reality: The Road Ahead of Deploying DRL in 6G Networks | Haiyuan Li et.al. | 2410.23086 | null |
| 2024-10-30 | Offline Reinforcement Learning and Sequence Modeling for Downlink Link Adaptation | Samuele Peri et.al. | 2410.23031 | null |
| 2024-10-29 | Environment as Policy: Learning to Race in Unseen Tracks | Hongze Wang et.al. | 2410.22308 | null |
| 2024-10-29 | EconoJax: A Fast & Scalable Economic Simulation in Jax | Koen Ponse et.al. | 2410.22165 | link |
| 2024-10-29 | Learning Successor Features the Simple Way | Raymond Chua et.al. | 2410.22133 | null |
| 2024-10-29 | PC-Gym: Benchmark Environments For Process Control Problems | Maximilian Bloor et.al. | 2410.22093 | null |
| 2024-10-29 | PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference | Kendong Liu et.al. | 2410.21966 | null |
| 2024-10-29 | Human-Readable Programs as Actors of Reinforcement Learning Agents Using Critic-Moderated Evolution | Senne Deproost et.al. | 2410.21940 | link |
| 2024-10-29 | Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning | Jianlan Luo et.al. | 2410.21845 | link |
| 2024-10-29 | Robot Policy Learning with Temporal Optimal Transport Reward | Yuwei Fu et.al. | 2410.21795 | link |
| 2024-10-29 | Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem | Shaan Ul Haque et.al. | 2410.21704 | null |
| 2024-10-29 | Sequential choice in ordered bundles | Rajeev Kohli et.al. | 2410.21670 | null |
| 2024-10-28 | LongReward: Improving Long-context Large Language Models with AI Feedback | Jiajie Zhang et.al. | 2410.21252 | link |
| 2024-10-28 | Quantum Reinforcement Learning-Based Two-Stage Unit Commitment Framework for Enhanced Power Systems Robustness | Xiang Wei et.al. | 2410.21240 | null |
| 2024-10-28 | Offline Reinforcement Learning With Combinatorial Action Spaces | Matthew Landers et.al. | 2410.21151 | null |
| 2024-10-28 | Robustness and Generalization in Quantum Reinforcement Learning via Lipschitz Regularization | Nico Meyer et.al. | 2410.21117 | link |
| 2024-10-28 | Dual-Agent Deep Reinforcement Learning for Dynamic Pricing and Replenishment | Yi Zheng et.al. | 2410.21109 | null |
| 2024-10-28 | Stronger Regret Bounds for Safe Online Reinforcement Learning in the Linear Quadratic Regulator | Benjamin Schiffer et.al. | 2410.21081 | null |
| 2024-10-28 | Getting By Goal Misgeneralization With a Little Help From a Mentor | Tu Trinh et.al. | 2410.21052 | null |
| 2024-10-28 | FairStream: Fair Multimedia Streaming Benchmark for Reinforcement Learning Agents | Jannis Weil et.al. | 2410.21029 | null |
| 2024-10-28 | Reference-Free Formula Drift with Reinforcement Learning: From Driving Data to Tire Energy-Inspired, Real-World Policies | Franck Djeumou et.al. | 2410.20990 | null |
| 2024-10-28 | BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks | Yunhan Zhao et.al. | 2410.20971 | null |
| 2024-10-25 | Adversarial Environment Design via Regret-Guided Diffusion Models | Hojun Chung et.al. | 2410.19715 | null |
| 2024-10-25 | DA-VIL: Adaptive Dual-Arm Manipulation with Reinforcement Learning and Variable Impedance Control | Md Faizal Karim et.al. | 2410.19712 | null |
| 2024-10-25 | MILES: Making Imitation Learning Easy with Self-Supervision | Georgios Papagiannis et.al. | 2410.19693 | null |
| 2024-10-25 | Automated generation of photonic circuits for Bell tests with homodyne measurements | Corentin Lanore et.al. | 2410.19670 | null |
| 2024-10-25 | MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services | Hongjia Wu et.al. | 2410.19665 | null |
| 2024-10-25 | Shared Control with Black Box Agents using Oracle Queries | Inbal Avraham et.al. | 2410.19612 | null |
| 2024-10-25 | OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization | Hongliang He et.al. | 2410.19609 | link |
| 2024-10-25 | Diverse Sign Language Translation | Xin Shen et.al. | 2410.19586 | null |
| 2024-10-25 | Robotic Learning in your Backyard: A Neural Simulator from Open Source Components | Liyou Zhou et.al. | 2410.19564 | null |
| 2024-10-25 | AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design | Francisco Erivaldo Fernandes Junior et.al. | 2410.19528 | null |
| 2024-10-24 | SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment | Caelan Garrett et.al. | 2410.18907 | null |
| 2024-10-24 | Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks | Graziano A. Manduzio et.al. | 2410.18890 | null |
| 2024-10-24 | Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences | Weijian Luo et.al. | 2410.18881 | null |
| 2024-10-24 | Learning Collusion in Episodic, Inventory-Constrained Markets | Paul Friedrich et.al. | 2410.18871 | null |
| 2024-10-24 | Towards Visual Text Design Transfer Across Languages | Yejin Choi et.al. | 2410.18823 | null |
| 2024-10-24 | PointPatchRL – Masked Reconstruction Improves Reinforcement Learning on Point Clouds | Balázs Gyenes et.al. | 2410.18800 | null |
| 2024-10-24 | Adapting MLOps for Diverse In-Network Intelligence in 6G Era: Challenges and Solutions | Peizheng Li et.al. | 2410.18793 | null |
| 2024-10-24 | Data Scaling Laws in Imitation Learning for Robotic Manipulation | Fanqi Lin et.al. | 2410.18647 | link |
| 2024-10-24 | Multi-agent cooperation through learning-aware policy gradients | Alexander Meulemans et.al. | 2410.18636 | null |
| 2024-10-24 | Leveraging Graph Neural Networks and Multi-Agent Reinforcement Learning for Inventory Control in Supply Chains | Niki Kotecha et.al. | 2410.18631 | null |
| 2024-10-23 | Prioritized Generative Replay | Renhao Wang et.al. | 2410.18082 | null |
| 2024-10-23 | Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration | Max Wilcoxson et.al. | 2410.18076 | link |
| 2024-10-23 | SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation | Zihan Zhou et.al. | 2410.18065 | null |
| 2024-10-23 | Cross-lingual Transfer of Reward Models in Multilingual Alignment | Jiwoo Hong et.al. | 2410.18027 | link |
| 2024-10-23 | Dynamic Spectrum Access for Ambient Backscatter Communication-assisted D2D Systems with Quantum Reinforcement Learning | Nguyen Van Huynh et.al. | 2410.17971 | null |
| 2024-10-23 | Slot: Provenance-Driven APT Detection through Graph Reinforcement Learning | Wei Qiao et.al. | 2410.17910 | null |
| 2024-10-23 | Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity | Philip Amortila et.al. | 2410.17904 | null |
| 2024-10-23 | Scalable Offline Reinforcement Learning for Mean Field Games | Axel Brunnbauer et.al. | 2410.17898 | null |
| 2024-10-23 | Learning Versatile Skills with Curriculum Masking | Yao Tang et.al. | 2410.17744 | link |
| 2024-10-23 | Optimizing Load Scheduling in Power Grids Using Reinforcement Learning and Markov Decision Processes | Dongwen Luo et.al. | 2410.17696 | null |
| 2024-10-22 | Few-shot In-Context Preference Learning Using Large Language Models | Chao Yu et.al. | 2410.17233 | null |
| 2024-10-22 | DyPNIPP: Predicting Environment Dynamics for RL-based Robust Informative Path Planning | Srujan Deolasee et.al. | 2410.17186 | null |
| 2024-10-22 | Reinforcement learning on structure-conditioned categorical diffusion for protein inverse folding | Yasha Ektefaie et.al. | 2410.17173 | link |
| 2024-10-22 | Reinforcement Learning for Data-Driven Workflows in Radio Interferometry. I. Principal Demonstration in Calibration | Brian M. Kirk et.al. | 2410.17135 | null |
| 2024-10-22 | Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards | Alexander G. Padula et.al. | 2410.17126 | link |
| 2024-10-22 | Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning | Haining Wang et.al. | 2410.17088 | link |
| 2024-10-22 | Delay-Constrained Grant-Free Random Access in MIMO Systems: Distributed Pilot Allocation and Power Control | Jianan Bai et.al. | 2410.17068 | null |
| 2024-10-22 | Optimal Design for Reward Modeling in RLHF | Antoine Scheid et.al. | 2410.17055 | null |
| 2024-10-22 | Proleptic Temporal Ensemble for Improving the Speed of Robot Tasks Generated by Imitation Learning | Hyeonjun Park et.al. | 2410.16981 | null |
| 2024-10-22 | Safe Load Balancing in Software-Defined-Networking | Lam Dinh et.al. | 2410.16846 | null |
| 2024-10-21 | Improve Vision Language Model Chain-of-thought Reasoning | Ruohong Zhang et.al. | 2410.16198 | link |
| 2024-10-21 | RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style | Yantao Liu et.al. | 2410.16184 | link |
| 2024-10-21 | SMART: Self-learning Meta-strategy Agent for Reasoning Tasks | Rongxing Liu et.al. | 2410.16128 | link |
| 2024-10-21 | Statistical Inference for Temporal Difference Learning with Linear Function Approximation | Weichen Wu et.al. | 2410.16106 | null |
| 2024-10-21 | A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models | Yue Deng et.al. | 2410.16024 | link |
| 2024-10-21 | Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality | Raghav Bongole et.al. | 2410.16013 | null |
| 2024-10-21 | ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning | Yue Yang et.al. | 2410.15994 | null |
| 2024-10-21 | Learning Quadrotor Control From Visual Features Using Differentiable Simulation | Johannes Heeg et.al. | 2410.15979 | null |
| 2024-10-21 | Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning | Hanlin Yang et.al. | 2410.15910 | null |
| 2024-10-21 | FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL | Woosung Koh et.al. | 2410.15876 | link |
| 2024-10-18 | Online Reinforcement Learning with Passive Memory | Anay Pattanaik et.al. | 2410.14665 | null |
| 2024-10-18 | A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning | Shengjie Sun et.al. | 2410.14660 | null |
| 2024-10-18 | Harnessing Causality in Reinforcement Learning With Bagged Decision Times | Daiqi Gao et.al. | 2410.14659 | null |
| 2024-10-18 | Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments | Mariusz Wisniewski et.al. | 2410.14616 | link |
| 2024-10-18 | Streaming Deep Reinforcement Learning Finally Works | Mohamed Elsayed et.al. | 2410.14606 | link |
| 2024-10-18 | Reinforcement Learning in Non-Markov Market-Making | Luca Lalor et.al. | 2410.14504 | null |
| 2024-10-18 | Transfer Reinforcement Learning in Heterogeneous Action Spaces using Subgoal Mapping | Kavinayan P. Sivakumar et.al. | 2410.14484 | null |
| 2024-10-18 | DRL Optimization Trajectory Generation via Wireless Network Intent-Guided Diffusion Models for Optimizing Resource Allocation | Junjie Wu et.al. | 2410.14481 | null |
| 2024-10-18 | From Simple to Complex: Knowledge Transfer in Safe and Efficient Reinforcement Learning for Autonomous Driving | Rongliang Zhou et.al. | 2410.14468 | null |
| 2024-10-18 | MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation | Toby Godfrey et.al. | 2410.14383 | null |
| 2024-10-17 | Diffusing States and Matching Scores: A New Framework for Imitation Learning | Runzhe Wu et.al. | 2410.13855 | link |
| 2024-10-17 | ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization | Chen Bo Calvin Zhang et.al. | 2410.13837 | link |
| 2024-10-17 | A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement | Hui Yuan et.al. | 2410.13828 | link |
| 2024-10-17 | Guided Reinforcement Learning for Robust Multi-Contact Loco-Manipulation | Jean-Pierre Sleiman et.al. | 2410.13817 | null |
| 2024-10-17 | Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible? | Argyrios Gerogiannis et.al. | 2410.13772 | null |
| 2024-10-17 | Transformer Guided Coevolution: Improved Team Formation in Multiagent Adversarial Games | Pranav Rajbhandari et.al. | 2410.13769 | null |
| 2024-10-17 | Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design | Chenyu Wang et.al. | 2410.13643 | link |
| 2024-10-17 | Ornstein-Uhlenbeck Adaptation as a Mechanism for Learning in Brains and Machines | Jesus Garcia Fernandez et.al. | 2410.13563 | null |
| 2024-10-17 | Contracting With a Reinforcement Learning Agent by Playing Trick or Treat | Matteo Bollini et.al. | 2410.13520 | null |
| 2024-10-17 | Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning | Yoav Alon et.al. | 2410.13501 | null |
| 2024-10-16 | Neural-based Control for CubeSat Docking Maneuvers | Matteo Stoisa et.al. | 2410.12703 | null |
| 2024-10-16 | Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach | Henrique Donâncio et.al. | 2410.12598 | null |
| 2024-10-16 | Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving | Sihao Wu et.al. | 2410.12568 | null |
| 2024-10-16 | Spectrum Sharing using Deep Reinforcement Learning in Vehicular Networks | Riya Dinesh Deshpande et.al. | 2410.12521 | null |
| 2024-10-16 | Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL | Jared Joselowitz et.al. | 2410.12491 | null |
| 2024-10-16 | SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling | Loris Gaven et.al. | 2410.12481 | null |
| 2024-10-16 | Sharpness-Aware Black-Box Optimization | Feiyang Ye et.al. | 2410.12457 | null |
| 2024-10-16 | AoI-Aware Resource Allocation for Smart Multi-QoS Provisioning | Jingqing Wang et.al. | 2410.12384 | null |
| 2024-10-16 | PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | Markus J. Buehler et.al. | 2410.12375 | link |
| 2024-10-16 | GAN Based Top-Down View Synthesis in Reinforcement Learning Environments | Usama Younus et.al. | 2410.12372 | null |
| 2024-10-15 | Molecular Quantum Control Algorithm Design by Reinforcement Learning | Anastasia Pipi et.al. | 2410.11839 | null |
| 2024-10-15 | Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions | Ayush Jain et.al. | 2410.11833 | null |
| 2024-10-15 | Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies | Zixuan Chen et.al. | 2410.11825 | null |
| 2024-10-15 | Solving The Dynamic Volatility Fitting Problem: A Deep Reinforcement Learning Approach | Emmanuel Gnabeyeu et.al. | 2410.11789 | null |
| 2024-10-15 | Zero-shot Model-based Reinforcement Learning using Large Language Models | Abdelhakim Benechehab et.al. | 2410.11711 | link |
| 2024-10-15 | BlendRL: A Framework for Merging Symbolic and Neural Policy Learning | Hikaru Shindo et.al. | 2410.11689 | null |
| 2024-10-15 | Understanding Likelihood Over-optimisation in Direct Alignment Algorithms | Zhengyan Shi et.al. | 2410.11677 | null |
| 2024-10-15 | Safety Filtering While Training: Improving the Performance and Sample Efficiency of Reinforcement Learning Agents | Federico Pizarro Bejarano et.al. | 2410.11671 | link |
| 2024-10-15 | Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search | Jiamian Li et.al. | 2410.11642 | null |
| 2024-10-15 | DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment | Wendi Chen et.al. | 2410.11584 | link |
| 2024-10-14 | Adaptive Diffusion Terrain Generator for Autonomous Uneven Terrain Navigation | Youwei Yu et.al. | 2410.10766 | null |
| 2024-10-14 | Online Statistical Inference for Time-varying Sample-averaged Q-learning | Saunak Kumar Panda et.al. | 2410.10737 | null |
| 2024-10-14 | Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach | Rory Young et.al. | 2410.10674 | null |
| 2024-10-14 | Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning | William A. Stigall et.al. | 2410.10660 | null |
| 2024-10-14 | DR-MPC: Deep Residual Model Predictive Control for Real-world Social Navigation | James R. Han et.al. | 2410.10646 | null |
| 2024-10-14 | Traversability-Aware Legged Navigation by Learning from Real-World Visual Data | Hongbo Zhang et.al. | 2410.10621 | null |
| 2024-10-14 | Online waveform selection for cognitive radar | Thulasi Tholeti et.al. | 2410.10591 | null |
| 2024-10-14 | STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with FeedBack | Naman Gupta et.al. | 2410.10584 | null |
| 2024-10-14 | Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes | Juan Sebastian Rojas et.al. | 2410.10578 | null |
| 2024-10-14 | Continual Deep Reinforcement Learning to Prevent Catastrophic Forgetting in Jamming Mitigation | Kemal Davaslioglu et.al. | 2410.10521 | null |
| 2024-10-11 | Hierarchical Universal Value Function Approximators | Rushiv Arora et.al. | 2410.08997 | null |
| 2024-10-11 | Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control | Devdhar Patel et.al. | 2410.08979 | null |
| 2024-10-11 | MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL | Claas A Voelcker et.al. | 2410.08896 | null |
| 2024-10-11 | Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient | Wenlong Wang et.al. | 2410.08893 | link |
| 2024-10-11 | Adaptive optimization of wave energy conversion in oscillatory wave surge converters via SPH simulation and deep reinforcement learning | Mai Ye et.al. | 2410.08871 | null |
| 2024-10-11 | Can we hop in general? A discussion of benchmark selection and design using the Hopper environment | Claas A Voelcker et.al. | 2410.08870 | null |
| 2024-10-11 | Hybrid LLM-DDQN based Joint Optimization of V2I Communication and Autonomous Driving | Zijiang Yan et.al. | 2410.08854 | null |
| 2024-10-11 | Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback | Michelle Zhao et.al. | 2410.08852 | null |
| 2024-10-11 | Public Transport Network Design for Equality of Accessibility via Message Passing Neural Networks and Reinforcement Learning | Duo Wang et.al. | 2410.08841 | null |
| 2024-10-11 | SOLD: Reinforcement Learning with Slot Object-Centric Latent Dynamics | Malte Mosbach et.al. | 2410.08822 | null |
| 2024-10-10 | GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment | Yuancheng Xu et.al. | 2410.08193 | null |
| 2024-10-10 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | Amrith Setlur et.al. | 2410.08146 | null |
| 2024-10-10 | VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers | Jianing Qi et.al. | 2410.08048 | null |
| 2024-10-10 | Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching | Xiaoshan Lin et.al. | 2410.08022 | null |
| 2024-10-10 | Neuroplastic Expansion in Deep Reinforcement Learning | Jiashun Liu et.al. | 2410.07994 | null |
| 2024-10-10 | Variational Inequality Methods for Multi-Agent Reinforcement Learning: Performance and Stability Gains | Baraah A. M. Sidahmed et.al. | 2410.07976 | null |
| 2024-10-10 | AI Surrogate Model for Distributed Computing Workloads | David K. Park et.al. | 2410.07940 | null |
| 2024-10-10 | Offline Hierarchical Reinforcement Learning via Inverse Optimization | Carolin Schmidt et.al. | 2410.07933 | null |
| 2024-10-10 | Efficient Reinforcement Learning with Large Language Model Priors | Xue Yan et.al. | 2410.07927 | null |
| 2024-10-10 | Meta-Learning Integration in Hierarchical Reinforcement Learning for Advanced Task Complexity | Arash Khajooeinejad et.al. | 2410.07921 | link |
| 2024-10-09 | One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation | Fabian Paischer et.al. | 2410.07170 | null |
| 2024-10-09 | Retrieval-Augmented Decision Transformer: External Memory for In-context RL | Thomas Schmied et.al. | 2410.07071 | null |
| 2024-10-09 | Safe Reinforcement Learning Filter for Multicopter Collision-Free Tracking under disturbances | Qihan Qi et.al. | 2410.06852 | null |
| 2024-10-09 | A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering | Qihan Qi et.al. | 2410.06847 | null |
| 2024-10-09 | Transfer Learning for a Class of Cascade Dynamical Systems | Shima Rabiei et.al. | 2410.06828 | null |
| 2024-10-09 | Deep End-to-End Survival Analysis with Temporal Consistency | Mariana Vargas Vieyra et.al. | 2410.06786 | null |
| 2024-10-09 | Q-WSL: Leveraging Dynamic Programming for Weighted Supervised Learning in Goal-conditioned RL | Xing Lei et.al. | 2410.06648 | null |
| 2024-10-09 | Variations in Multi-Agent Actor-Critic Frameworks for Joint Optimizations in UAV Swarm Networks: Recent Evolution, Challenges, and Directions | Muhammad Morshed Alam et.al. | 2410.06627 | null |
| 2024-10-09 | Effective Exploration Based on the Structural Information Principles | Xianghua Zeng et.al. | 2410.06621 | null |
| 2024-10-09 | Disturbance Observer-based Control Barrier Functions with Residual Model Learning for Safe Reinforcement Learning | Dvij Kalaria et.al. | 2410.06570 | null |
| 2024-10-07 | DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control | Kaifeng Zhao et.al. | 2410.05260 | null |
| 2024-10-07 | SePPO: Semi-Policy Preference Optimization for Diffusion Alignment | Daoan Zhang et.al. | 2410.05255 | link |
| 2024-10-07 | ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control | Ehsan Futuhi et.al. | 2410.05225 | null |
| 2024-10-07 | Smart Jamming Attack and Mitigation on Deep Transfer Reinforcement Learning Enabled Resource Allocation for Network Slicing | Shavbo Salehi et.al. | 2410.05153 | null |
| 2024-10-07 | PAMLR: A Passive-Active Multi-Armed Bandit-Based Solution for LoRa Channel Allocation | Jihoon Yun et.al. | 2410.05147 | null |
| 2024-10-07 | Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning | Ayano Hiranaka et.al. | 2410.05116 | null |
| 2024-10-07 | AlphaRouter: Quantum Circuit Routing with Reinforcement Learning and Tree Search | Wei Tang et.al. | 2410.05115 | null |
| 2024-10-07 | Reinforcement Learning Control for Autonomous Hydraulic Material Handling Machines with Underactuated Tools | Filippo A. Spinelli et.al. | 2410.05093 | null |
| 2024-10-07 | HE-Drive: Human-Like End-to-End Driving with Vision Language Models | Junming Wang et.al. | 2410.05051 | null |
| 2024-10-07 | Active Fine-Tuning of Generalist Policies | Marco Bagatella et.al. | 2410.05026 | null |
| 2024-10-04 | Learning Humanoid Locomotion over Challenging Terrain | Ilija Radosavovic et.al. | 2410.03654 | null |
| 2024-10-04 | Aligning LLMs with Individual Preferences via Interaction | Shujin Wu et.al. | 2410.03642 | link |
| 2024-10-04 | Robust Offline Imitation Learning from Diverse Auxiliary Data | Udita Ghosh et.al. | 2410.03626 | null |
| 2024-10-04 | Open-World Reinforcement Learning over Long Short-Term Imagination | Jiajian Li et.al. | 2410.03618 | null |
| 2024-10-04 | Training on more Reachable Tasks for Generalisation in Reinforcement Learning | Max Weltevrede et.al. | 2410.03565 | null |
| 2024-10-04 | GAP-RL: Grasps As Points for RL Towards Dynamic Object Grasping | Pengwei Xie et.al. | 2410.03509 | null |
| 2024-10-04 | STREAMS: An Assistive Multimodal AI Framework for Empowering Biosignal Based Robotic Controls | Ali Rabiee et.al. | 2410.03486 | null |
| 2024-10-04 | Deep Reinforcement Learning for Delay-Optimized Task Offloading in Vehicular Fog Computing | Mohammad Parsa Toopchinezhad et.al. | 2410.03472 | null |
| 2024-10-04 | CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control | Guy Tevet et.al. | 2410.03441 | link |
| 2024-10-04 | ToolGen: Unified Tool Retrieval and Calling via Generation | Renxi Wang et.al. | 2410.03439 | link |
| 2024-10-03 | ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI | Ahmad Elawady et.al. | 2410.02751 | link |
| 2024-10-03 | MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions | Yekun Chai et.al. | 2410.02743 | link |
| 2024-10-03 | DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects | Zhaowei Wang et.al. | 2410.02730 | link |
| 2024-10-03 | Grounded Answers for Multi-agent Decision-making Problem through Generative World Model | Zeyang Liu et.al. | 2410.02664 | null |
| 2024-10-03 | Beyond Expected Returns: A Policy Gradient Algorithm for Cumulative Prospect Theoretic Reinforcement Learning | Olivier Lepel et.al. | 2410.02605 | null |
| 2024-10-03 | Boosting Sample Efficiency and Generalization in Multi-agent Reinforcement Learning via Equivariance | Joshua McClellan et.al. | 2410.02581 | null |
| 2024-10-03 | Machine Learning Approaches for Active Queue Management: A Survey, Taxonomy, and Future Directions | Mohammad Parsa Toopchinezhad et.al. | 2410.02563 | null |
| 2024-10-03 | Semantic-Guided RL for Interpretable Feature Engineering | Mohamed Bouadi et.al. | 2410.02519 | null |
| 2024-10-03 | Learning Emergence of Interaction Patterns across Independent RL Agents in Multi-Agent Environments | Vasanth Reddy Baddam et.al. | 2410.02516 | null |
| 2024-10-03 | A Hitchhiker’s Guide To Active Motion | Tobias Plasczyk et.al. | 2410.02515 | null |
| 2024-10-02 | Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space | Yangming Li et.al. | 2410.01796 | null |
| 2024-10-02 | Open Human-Robot Collaboration using Decentralized Inverse Reinforcement Learning | Prasanth Sengadu Suresh et.al. | 2410.01790 | null |
| 2024-10-02 | Investigating on RLHF methodology | Alexey Kutalev et.al. | 2410.01789 | null |
| 2024-10-02 | Social coordination perpetuates stereotypic expectations and behaviors across generations in deep multi-agent reinforcement learning | Rebekah A. Gelpí et.al. | 2410.01763 | null |
| 2024-10-02 | PreND: Enhancing Intrinsic Motivation in Reinforcement Learning through Pre-trained Network Distillation | Mohammadamin Davoodabadi et.al. | 2410.01745 | null |
| 2024-10-02 | Mimicking Human Intuition: Cognitive Belief-Driven Q-Learning | Xingrui Gu et.al. | 2410.01739 | null |
| 2024-10-02 | Evaluating Robustness of Reward Models for Mathematical Reasoning | Sunghwan Kim et.al. | 2410.01729 | null |
| 2024-10-02 | Performant, Memory Efficient and Scalable Multi-Agent Reinforcement Learning | Omayma Mahjoub et.al. | 2410.01706 | null |
| 2024-10-02 | VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | Amirhossein Kazemnejad et.al. | 2410.01679 | link |
| 2024-10-02 | Finding path and cycle counting formulae in graphs with Deep Reinforcement Learning | Jason Piquenot et.al. | 2410.01661 | null |
| 2024-09-30 | Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning | Zhishuai Liu et.al. | 2409.20521 | null |
| 2024-09-30 | Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation | Fukang Liu et.al. | 2409.20514 | null |
| 2024-09-30 | The Perfect Blend: Redefining RLHF with Mixture of Judges | Tengyu Xu et.al. | 2409.20370 | null |
| 2024-10-01 | Enhancing GANs with Contrastive Learning-Based Multistage Progressive Finetuning SNN and RL-Based External Optimization | Osama Mustafa et.al. | 2409.20340 | null |
| 2024-09-30 | MARLadona – Towards Cooperative Team Play Using Multi-Agent Reinforcement Learning | Zichong Li et.al. | 2409.20326 | null |
| 2024-09-30 | RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning | Yuxuan Wu et.al. | 2409.20291 | null |
| 2024-09-30 | Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning | Junlin Lu et.al. | 2409.20258 | link |
| 2024-09-30 | Professor X: Manipulating EEG BCI with Invisible and Robust Backdoor Attack | Xuan-Hao Liu et.al. | 2409.20158 | null |
| 2024-09-30 | GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation | Yangtao Chen et.al. | 2409.20154 | null |
| 2024-09-30 | DRLinSPH: An open-source platform using deep reinforcement learning and SPHinXsys for fluid-structure-interaction problems | Mai Ye et.al. | 2409.20134 | null |
| 2024-09-27 | Robust Deep Reinforcement Learning for Volt-VAR Optimization in Active Distribution System under Uncertainty | Zhengrong Chen et.al. | 2409.18937 | null |
| 2024-09-27 | HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models | Yu Zhou et.al. | 2409.18893 | null |
| 2024-09-27 | ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning | Jannis Becktepe et.al. | 2409.18827 | link |
| 2024-09-27 | LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis | Hamed Babaei Giglou et.al. | 2409.18812 | null |
| 2024-09-27 | Autoregressive Policy Optimization for Constrained Allocation Tasks | David Winkel et.al. | 2409.18735 | link |
| 2024-09-27 | Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning | Sheikh Salman Hassan et.al. | 2409.18718 | null |
| 2024-09-27 | Refutation of Spectral Graph Theory Conjectures with Search Algorithms | Milo Roucairol et.al. | 2409.18626 | null |
| 2024-09-27 | TemporalPaD: a reinforcement-learning framework for temporal feature representation and dimension reduction | Xuechen Mu et.al. | 2409.18597 | null |
| 2024-09-27 | Climate Adaptation with Reinforcement Learning: Experiments with Flooding and Transportation in Copenhagen | Miguel Costa et.al. | 2409.18574 | null |
| 2024-09-27 | Cost-Aware Dynamic Cloud Workflow Scheduling using Self-Attention and Evolutionary Reinforcement Learning | Ya Shen et.al. | 2409.18444 | null |
| 2024-09-26 | Inverse Reinforcement Learning with Multiple Planning Horizons | Jiayu Yao et.al. | 2409.18051 | null |
| 2024-09-26 | Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles | Lewei He et.al. | 2409.18014 | null |
| 2024-09-26 | LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots | Peilin Wu et.al. | 2409.17992 | null |
| 2024-09-26 | Navigation in a simplified Urban Flow through Deep Reinforcement Learning | Federica Tonti et.al. | 2409.17922 | null |
| 2024-09-26 | Model-Free versus Model-Based Reinforcement Learning for Fixed-Wing UAV Attitude Control Under Varying Wind Conditions | David Olivares et.al. | 2409.17896 | null |
| 2024-09-26 | Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness | Jian Li et.al. | 2409.17791 | link |
| 2024-09-26 | Robust Ladder Climbing with a Quadrupedal Robot | Dylan Vogel et.al. | 2409.17731 | null |
| 2024-09-26 | Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization | Kaden Uhlig et.al. | 2409.17673 | null |
| 2024-09-26 | Hierarchical End-to-End Autonomous Driving: Integrating BEV Perception with Deep Reinforcement Learning | Siyi Lu et.al. | 2409.17659 | null |
| 2024-09-26 | FactorSim: Generative Simulation via Factorized Representation | Fan-Yun Sun et.al. | 2409.17652 | null |
| 2024-09-25 | Learning with Dynamics: Autonomous Regulation of UAV Based Communication Networks with Dynamic UAV Crew | Ran Zhang et.al. | 2409.17139 | null |
| 2024-09-25 | Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action | Xin Chen et.al. | 2409.17138 | null |
| 2024-09-25 | On-orbit Servicing for Spacecraft Collision Avoidance With Autonomous Decision Making | Susmitha Patnala et.al. | 2409.17125 | null |
| 2024-09-25 | AI-Driven Risk-Aware Scheduling for Active Debris Removal Missions | Antoine Poupon et.al. | 2409.17012 | null |
| 2024-09-25 | Multi-Robot Informative Path Planning for Efficient Target Mapping using Deep Reinforcement Learning | Apoorva Vashisth et.al. | 2409.16967 | link |
| 2024-09-25 | Dynamic Obstacle Avoidance through Uncertainty-Based Adaptive Planning with Diffusion | Vineet Punyamoorty et.al. | 2409.16950 | null |
| 2024-09-25 | Enhancing Temporal Sensitivity and Reasoning for Time-Sensitive Question Answering | Wanqi Yang et.al. | 2409.16909 | null |
| 2024-09-25 | Revisiting Space Mission Planning: A Reinforcement Learning-Guided Approach for Multi-Debris Rendezvous | Agni Bandyopadhyay et.al. | 2409.16882 | null |
| 2024-09-25 | Behavior evolution-inspired approach to walking gait reinforcement training for quadruped robots | Yu Wang et.al. | 2409.16862 | null |
| 2024-09-25 | Asynchronous Fractional Multi-Agent Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing | Lyudong Jin et.al. | 2409.16832 | null |
| 2024-09-24 | A Critical Review of Safe Reinforcement Learning Techniques in Smart Grid Applications | Van-Hai Bui et.al. | 2409.16256 | null |
| 2024-09-24 | Context-Based Meta Reinforcement Learning for Robust and Adaptable Peg-in-Hole Assembly Tasks | Ahmed Shokry et.al. | 2409.16208 | null |
| 2024-09-24 | Microsecond-Latency Feedback at a Particle Accelerator by Online Reinforcement Learning on Hardware | Luca Scomparin et.al. | 2409.16177 | null |
| 2024-09-24 | The Digital Transformation in Health: How AI Can Improve the Performance of Health Systems | África Periáñez et.al. | 2409.16098 | null |
| 2024-09-24 | Whole-body end-effector pose tracking | Tifanny Portela et.al. | 2409.16048 | null |
| 2024-09-24 | Bridging Environments and Language with Rendering Functions and Vision-Language Models | Theo Cachet et.al. | 2409.16024 | null |
| 2024-09-24 | Provably Efficient Exploration in Inverse Constrained Reinforcement Learning | Bo Yue et.al. | 2409.15963 | null |
| 2024-09-24 | Overcoming Reward Model Noise in Instruction-Guided Reinforcement Learning | Sukai Huang et.al. | 2409.15922 | null |
| 2024-09-24 | Multi-UAV Pursuit-Evasion with Online Planning in Unknown Environments by Deep Reinforcement Learning | Jiayu Chen et.al. | 2409.15866 | null |
| 2024-09-24 | Adaptive Learn-then-Test: Statistically Valid and Efficient Hyperparameter Selection | Matteo Zecchin et.al. | 2409.15844 | null |
| 2024-09-18 | DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control | Zichen Jeff Cui et.al. | 2409.12192 | null |
| 2024-09-18 | Robots that Learn to Safely Influence via Prediction-Informed Reach-Avoid Dynamic Games | Ravi Pandya et.al. | 2409.12153 | null |
| 2024-09-18 | Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features | Jiuqi Wang et.al. | 2409.12135 | null |
| 2024-09-18 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | An Yang et.al. | 2409.12122 | null |
| 2024-09-18 | IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition | Rui Liu et.al. | 2409.12092 | null |
| 2024-09-18 | Generalized Robot Learning Framework | Jiahuan Yan et.al. | 2409.12061 | null |
| 2024-09-23 | Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning | Jonas Günster et.al. | 2409.12045 | link |
| 2024-09-18 | Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning | Claude Formanek et.al. | 2409.12001 | null |
| 2024-09-18 | Data-Efficient Quadratic Q-Learning Using LMIs | J. S. van Hulst et.al. | 2409.11986 | null |
| 2024-09-18 | Reinforcement Learning with Lie Group Orientations for Robotics | Martin Schuck et.al. | 2409.11935 | null |
| 2024-09-17 | UniLCD: Unified Local-Cloud Decision-Making via Reinforcement Learning | Kathakoli Sengupta et.al. | 2409.11403 | null |
| 2024-09-17 | Integrating Reinforcement Learning and Model Predictive Control with Applications to Microgrids | Caio Fabio Oliveira da Silva et.al. | 2409.11267 | null |
| 2024-09-17 | Attacking Slicing Network via Side-channel Reinforcement Learning Attack | Wei Shao et.al. | 2409.11258 | null |
| 2024-09-17 | LLM-as-a-Judge & Reward Model: What They Can and Cannot Do | Guijin Son et.al. | 2409.11239 | null |
| 2024-09-17 | Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems | Jake Welde et.al. | 2409.11238 | null |
| 2024-09-17 | Linear Jamming Bandits: Learning to Jam 5G-based Coded Communications Systems | Zachary Schutz et.al. | 2409.11191 | null |
| 2024-09-17 | Preventing Unconstrained CBF Safety Filters Caused by Invalid Relative Degree Assumptions | Lukas Brunke et.al. | 2409.11171 | null |
| 2024-09-17 | Co-Designing Tools and Control Policies for Robust Manipulation | Yifei Dong et.al. | 2409.11113 | null |
| 2024-09-17 | Reactive Environments for Active Inference Agents with RxEnvironments.jl | Wouter W. L. Nuijten et.al. | 2409.11087 | link |
| 2024-09-17 | A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler | Nazim Bendib et.al. | 2409.11068 | null |
| 2024-09-16 | Instigating Cooperation among LLM Agents Using Adaptive Information Modulation | Qiliang Chen et.al. | 2409.10372 | null |
| 2024-09-16 | Catch It! Learning to Catch in Flight with Mobile Dexterous Hands | Yuanhang Zhang et.al. | 2409.10319 | null |
| 2024-09-16 | ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework | Jiahao Yuan et.al. | 2409.10289 | null |
| 2024-09-16 | Safety-Oriented Pruning and Interpretation of Reinforcement Learning Policies | Dennis Gross et.al. | 2409.10218 | null |
| 2024-09-16 | Enhancing RL Safety with Counterfactual LLM Reasoning | Dennis Gross et.al. | 2409.10188 | null |
| 2024-09-16 | Safe and Stable Closed-Loop Learning for Neural-Network-Supported Model Predictive Control | Sebastian Hirt et.al. | 2409.10171 | null |
| 2024-09-16 | Quantile Regression for Distributional Reward Models in RLHF | Nicolai Dorka et.al. | 2409.10164 | link |
| 2024-09-16 | Robust Reinforcement Learning with Dynamic Distortion Risk Measures | Anthony Coache et.al. | 2409.10096 | null |
| 2024-09-16 | Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments | Wessel Ledder et.al. | 2409.10048 | null |
| 2024-09-16 | Reinforcement learning-based statistical search strategy for an axion model from flavor | Satsuki Nishimura et.al. | 2409.10023 | null |
| 2024-09-13 | The unknotting number, hard unknot diagrams, and reinforcement learning | Taylor Applebaum et.al. | 2409.09032 | null |
| 2024-09-13 | Modeling Rational Adaptation of Visual Search to Hierarchical Structures | Saku Sourulahti et.al. | 2409.08967 | null |
| 2024-09-13 | Average-Reward Maximum Entropy Reinforcement Learning for Underactuated Double Pendulum Tasks | Jean Seong Bjorn Choe et.al. | 2409.08938 | null |
| 2024-09-13 | AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models | Yifei Yao et.al. | 2409.08904 | null |
| 2024-09-13 | Deep reinforcement learning for tracking a moving target in jellyfish-like swimming | Yihao Chen et.al. | 2409.08815 | null |
| 2024-09-13 | DexSim2Real$^{2}$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation | Taoran Jiang et.al. | 2409.08750 | null |
| 2024-09-13 | Quasimetric Value Functions with Dense Rewards | Khadichabonu Valieva et.al. | 2409.08724 | null |
| 2024-09-13 | Secure Offloading in NOMA-Aided Aerial MEC Systems Based on Deep Reinforcement Learning | Hongjiang Lei et.al. | 2409.08579 | null |
| 2024-09-13 | Batch Ensemble for Variance Dependent Regret in Stochastic Bandits | Asaf Cassel et.al. | 2409.08570 | null |
| 2024-09-13 | OIDM: An Observability-based Intelligent Distributed Edge Sensing Method for Industrial Cyber-Physical Systems | Shigeng Wang et.al. | 2409.08549 | null |
| 2024-09-12 | Hand-Object Interaction Pretraining from Videos | Himanshu Gaurav Singh et.al. | 2409.08273 | null |
| 2024-09-12 | Multi-Model based Federated Learning Against Model Poisoning Attack: A Deep Learning Based Model Selection for MEC Systems | Somayeh Kianpisheh et.al. | 2409.08237 | null |
| 2024-09-12 | Towards Online Safety Corrections for Robotic Manipulation Policies | Ariana Spalter et.al. | 2409.08233 | null |
| 2024-09-12 | Design Optimization of Nuclear Fusion Reactor through Deep Reinforcement Learning | Jinsu Kim et.al. | 2409.08231 | null |
| 2024-09-12 | Adaptive Language-Guided Abstraction from Contrastive Explanations | Andi Peng et.al. | 2409.08212 | null |
| 2024-09-12 | Optimal Management of Grid-Interactive Efficient Buildings via Safe Reinforcement Learning | Xiang Huo et.al. | 2409.08132 | null |
| 2024-09-12 | Linear Complementary Dual Codes Constructed from Reinforcement Learning | Yansheng Wu et.al. | 2409.08114 | null |
| 2024-09-12 | Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning | Teng Yan et.al. | 2409.08062 | null |
| 2024-09-12 | Learning Causally Invariant Reward Functions from Diverse Demonstrations | Ivan Ovinnikov et.al. | 2409.08012 | null |
| 2024-09-12 | Digital Twin for Autonomous Guided Vehicles based on Integrated Sensing and Communications | Van-Phuc Bui et.al. | 2409.08005 | null |
| 2024-09-11 | Autonomous loading of ore piles with Load-Haul-Dump machines using Deep Reinforcement Learning | Rodrigo Salas et.al. | 2409.07449 | null |
| 2024-09-11 | Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation | Luo Ji et.al. | 2409.07416 | null |
| 2024-09-11 | Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching | Eugenio Chisari et.al. | 2409.07343 | null |
| 2024-09-11 | Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence | Luo Ji et.al. | 2409.07341 | null |
| 2024-09-11 | A Framework for Predicting the Impact of Game Balance Changes through Meta Discovery | Akash Saravanan et.al. | 2409.07340 | null |
| 2024-09-11 | Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences | Ziang Liu et.al. | 2409.07268 | null |
| 2024-09-11 | Perceptive Pedipulation with Local Obstacle Avoidance | Jonas Stolle et.al. | 2409.07195 | null |
| 2024-09-11 | A Perspective on AI-Guided Molecular Simulations in VR: Exploring Strategies for Imitation Learning in Hyperdimensional Molecular Systems | Mohamed Dhouioui et.al. | 2409.07189 | null |
| 2024-09-11 | Learning Efficient Recursive Numeral Systems via Reinforcement Learning | Jonathan D. Thomas et.al. | 2409.07170 | null |
| 2024-09-11 | DCMAC: Demand-aware Customized Multi-Agent Communication via Upper Bound Training | Dongkun Huo et.al. | 2409.07127 | null |
| 2024-09-10 | DemoStart: Demonstration-led auto-curriculum applied to sim-to-real with multi-fingered robots | Maria Bauza et.al. | 2409.06613 | null |
| 2024-09-10 | Advancements in Gesture Recognition Techniques and Machine Learning for Enhanced Human-Robot Interaction: A Comprehensive Review | Sajjad Hussain et.al. | 2409.06503 | null |
| 2024-09-10 | Superior Computer Chess with Model Predictive Control, Reinforcement Learning, and Rollout | Atharva Gundawar et.al. | 2409.06477 | null |
| 2024-09-10 | Learning Generative Interactive Environments By Trained Agent Exploration | Naser Kazemi et.al. | 2409.06445 | link |
| 2024-09-10 | Length Desensitization in Directed Preference Optimization | Wei Liu et.al. | 2409.06411 | null |
| 2024-09-10 | One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion | Nico Bohlinger et.al. | 2409.06366 | null |
| 2024-09-10 | Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning | Shreyas S R et.al. | 2409.06356 | null |
| 2024-09-10 | Learning Augmentation Policies from A Model Zoo for Time Series Forecasting | Haochen Yuan et.al. | 2409.06282 | null |
| 2024-09-09 | Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments | Haritheja Etukuru et.al. | 2409.05865 | link |
| 2024-09-09 | An Introduction to Quantum Reinforcement Learning (QRL) | Samuel Yen-Chi Chen et.al. | 2409.05846 | null |
| 2024-09-09 | Learning control of underactuated double pendulum with Model-Based Reinforcement Learning | Niccolò Turcato et.al. | 2409.05811 | null |
| 2024-09-09 | Markov Chain Variance Estimation: A Stochastic Approximation Approach | Shubhada Agrawal et.al. | 2409.05733 | null |
| 2024-09-09 | Cooperative Decision-Making for CAVs at Unsignalized Intersections: A MARL Approach with Attention and Hierarchical Game Priors | Jiaqi Liu et.al. | 2409.05712 | null |
| 2024-09-09 | Interactive incremental learning of generalizable skills with local trajectory modulation | Markus Knauer et.al. | 2409.05655 | null |
| 2024-09-09 | Forward KL Regularized Preference Optimization for Aligning Diffusion Policies | Zhao Shan et.al. | 2409.05622 | null |
| 2024-09-09 | Adaptive Multi-Layer Deployment for A Digital Twin Empowered Satellite-Terrestrial Integrated Network | Yihong Tao et.al. | 2409.05480 | null |
| 2024-09-09 | Reinforcement Learning for Variational Quantum Circuits Design | Simone Foderà et.al. | 2409.05475 | null |
| 2024-09-09 | Semifactual Explanations for Reinforcement Learning | Jasmina Gajcin et.al. | 2409.05435 | null |
| 2024-09-06 | RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs | Jiaxing Wu et.al. | 2409.04421 | null |
| 2024-09-06 | Gaussian-Mixture-Model Q-Functions for Reinforcement Learning by Riemannian Optimization | Minh Vu et.al. | 2409.04374 | null |
| 2024-09-06 | Refined Bounds on Near Optimality Finite Window Policies in POMDPs and Their Reinforcement Learning | Yunus Emre Demirci et.al. | 2409.04351 | null |
| 2024-09-06 | Advancing Multi-Organ Disease Care: A Hierarchical Multi-Agent Reinforcement Learning Framework | Daniel J. Tan et.al. | 2409.04224 | null |
| 2024-09-06 | The Prevalence of Neural Collapse in Neural Multivariate Regression | George Andriopoulos et.al. | 2409.04180 | null |
| 2024-09-06 | Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering | Jan Hofmann et.al. | 2409.04122 | null |
| 2024-09-05 | DRAL: Deep Reinforcement Adaptive Learning for Multi-UAVs Navigation in Unknown Indoor Environment | Kangtong Mo et.al. | 2409.03930 | null |
| 2024-09-05 | Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning | Huizhen Yu et.al. | 2409.03915 | null |
| 2024-09-05 | On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments | Muxing Wang et.al. | 2409.03897 | null |
| 2024-09-05 | Multi-agent Path Finding for Mixed Autonomy Traffic Coordination | Han Zheng et.al. | 2409.03881 | null |
| 2024-09-05 | Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron | Christian Schmid et.al. | 2409.03749 | null |
| 2024-09-05 | Differentiable Discrete Event Simulation for Queuing Network Control | Ethan Che et.al. | 2409.03740 | null |
| 2024-09-05 | On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization | Yong Lin et.al. | 2409.03650 | null |
| 2024-09-05 | 1 Modular Parallel Manipulator for Long-Term Soft Robotic Data Collection | Kiyn Chin et.al. | 2409.03614 | null |
| 2024-09-05 | CHIRPs: Change-Induced Regret Proxy metrics for Lifelong Reinforcement Learning | John Birkbeck et.al. | 2409.03577 | null |
| 2024-09-05 | Sparsifying Parametric Models with L0 Regularization | Nicolò Botteghi et.al. | 2409.03489 | null |
| 2024-09-05 | Reinforcement Learning Approach to Optimizing Profilometric Sensor Trajectories for Surface Inspection | Sara Roos-Hoefgeest et.al. | 2409.03429 | null |
| 2024-09-05 | Game On: Towards Language Models as RL Experimenters | Jingwei Zhang et.al. | 2409.03402 | null |
| 2024-09-05 | ELO-Rated Sequence Rewards: Advancing Reinforcement Learning Models | Qi Ju et.al. | 2409.03301 | link |
| 2024-09-05 | Robust synchronization and policy adaptation for networked heterogeneous agents | Miguel F. Arevalo-Castiblanco et.al. | 2409.03273 | null |
| 2024-09-04 | Hybrid Imitation-Learning Motion Planner for Urban Driving | Cristian Gariboldi et.al. | 2409.02871 | null |
| 2024-09-04 | Knowledge Transfer for Collaborative Misbehavior Detection in Untrusted Vehicular Environments | Roshan Sedar et.al. | 2409.02844 | null |
| 2024-09-04 | Tractable Offline Learning of Regular Decision Processes | Ahana Deb et.al. | 2409.02747 | null |
| 2024-09-04 | Surgical Task Automation Using Actor-Critic Frameworks and Self-Supervised Imitation Learning | Jingshuai Liu et.al. | 2409.02724 | null |
| 2024-09-04 | Decision Transformer for Enhancing Neural Local Search on the Job Shop Scheduling Problem | Constantin Waubert de Puiseau et.al. | 2409.02697 | null |
| 2024-09-04 | Causality-Aware Transformer Networks for Robotic Navigation | Ruoyu Wang et.al. | 2409.02669 | null |
| 2024-09-04 | A Survey on Emergent Language | Jannik Peters et.al. | 2409.02645 | null |
| 2024-09-04 | Mamba as a motion encoder for robotic imitation learning | Toshiaki Tsuji et.al. | 2409.02636 | null |
| 2024-09-04 | Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal | Jifeng Hu et.al. | 2409.02512 | null |
| 2024-09-04 | USV-AUV Collaboration Framework for Underwater Tasks under Extreme Sea Conditions | Jingzehua Xu et.al. | 2409.02444 | null |
| 2024-08-30 | Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control | Zihao Sheng et.al. | 2408.17380 | link |
| 2024-08-30 | Stationary Policies are Optimal in Risk-averse Total-reward MDPs with EVaR | Xihong Su et.al. | 2408.17286 | null |
| 2024-08-30 | Using Quantum Solved Deep Boltzmann Machines to Increase the Data Efficiency of RL Agents | Daniel Kent et.al. | 2408.17240 | null |
| 2024-08-30 | MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models | Yujing Wang et.al. | 2408.17072 | null |
| 2024-08-30 | Efficient Camera Exposure Control for Visual Odometry via Deep Reinforcement Learning | Shuyang Zhang et.al. | 2408.17005 | link |
| 2024-08-30 | A Tighter Convergence Proof of Reverse Experience Replay | Nan Jiang et.al. | 2408.16999 | link |
| 2024-08-30 | Discovery of False Data Injection Schemes on Frequency Controllers with Reinforcement Learning | Romesh Prasad et.al. | 2408.16958 | null |
| 2024-08-29 | FlowRetrieval: Flow-Guided Data Retrieval for Few-Shot Imitation Learning | Li-Heng Lin et.al. | 2408.16944 | null |
| 2024-08-29 | Manipulating OpenFlow Link Discovery Packet Forwarding for Topology Poisoning | Mingming Chen et.al. | 2408.16940 | null |
| 2024-08-29 | Coverage Analysis of Multi-Environment Q-Learning Algorithms for Wireless Network Optimization | Talha Bozkus et.al. | 2408.16882 | null |
| 2024-08-29 | Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models | Alec Solway et.al. | 2408.16753 | null |
| 2024-08-29 | A GREAT Architecture for Edge-Based Graph Problems Like TSP | Attila Lischka et.al. | 2408.16717 | null |
| 2024-08-29 | RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model | Zhuan Shi et.al. | 2408.16634 | null |
| 2024-08-29 | Optimizing Automated Picking Systems in Warehouse Robots Using Machine Learning | Keqin Li et.al. | 2408.16633 | null |
| 2024-08-29 | Phase Optimization and Relay Selection for Joint Relay and IRS-Assisted Communication | Uyoata E. Uyoata et.al. | 2408.16399 | null |
| 2024-08-29 | EasyChauffeur: A Baseline Advancing Simplicity and Efficiency on Waymax | Lingyu Xiao et.al. | 2408.16375 | null |
| 2024-08-29 | Efficient Multi-agent Navigation with Lightweight DRL Policy | Xingrong Diao et.al. | 2408.16370 | null |
| 2024-08-29 | On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes | Yi Wan et.al. | 2408.16262 | null |
| 2024-08-28 | DECAF: a Discrete-Event based Collaborative Human-Robot Framework for Furniture Assembly | Giulio Giacomuzzo et.al. | 2408.16125 | null |
| 2024-08-28 | RAIN: Reinforcement Algorithms for Improving Numerical Weather and Climate Models | Pritthijit Nath et.al. | 2408.16118 | link |
| 2024-08-28 | In-Context Imitation Learning via Next-Token Prediction | Letian Fu et.al. | 2408.15980 | link |
| 2024-08-28 | Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games | Nicholas R. Waytowich et.al. | 2408.15950 | null |
| 2024-08-28 | DeMoBot: Deformable Mobile Manipulation with Vision-based Sub-goal Retrieval | Yuying Zhang et.al. | 2408.15919 | null |
| 2024-08-28 | Adaptive Traffic Signal Control Using Reinforcement Learning | Muhammad Tahir Rafique et.al. | 2408.15751 | null |
| 2024-08-28 | Deep Reinforcement Learning for Radiative Heat Transfer Optimization Problems | Eva Ortiz-Mansilla et.al. | 2408.15727 | null |
| 2024-08-28 | Comparison of Model Predictive Control and Proximal Policy Optimization for a 1-DOF Helicopter System | Georg Schäfer et.al. | 2408.15633 | null |
| 2024-08-28 | Structural Optimization of Lightweight Bipedal Robot via SERL | Yi Cheng et.al. | 2408.15632 | null |
| 2024-08-28 | Statistical QoS Provision in Business-Centric Networks | Chang Wu et.al. | 2408.15609 | null |
| 2024-08-28 | Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning | Minjong Yoo et.al. | 2408.15593 | null |
| 2024-08-28 | Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits | Woojin Jeong et.al. | 2408.15535 | null |
| 2024-08-27 | SpecGuard: Specification Aware Recovery for Robotic Autonomous Vehicles from Physical Attacks | Pritam Dash et.al. | 2408.15200 | null |
| 2024-08-27 | Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning | Batuhan Yardim et.al. | 2408.15173 | null |
| 2024-08-27 | Applications in CityLearn Gym Environment for Multi-Objective Control Benchmarking in Grid-Interactive Buildings and Districts | Kingsley Nweye et.al. | 2408.15170 | null |
| 2024-08-27 | muPRL: A Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults | Deepak-George Thomas et.al. | 2408.15150 | null |
| 2024-08-27 | No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery | Alexander Rutherford et.al. | 2408.15099 | link |
| 2024-08-27 | MiWaves Reinforcement Learning Algorithm | Susobhan Ghosh et.al. | 2408.15076 | null |
| 2024-08-27 | Earth Observation Satellite Scheduling with Graph Neural Networks | Antoine Jacquet et.al. | 2408.15041 | null |
| 2024-08-27 | Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data | Han Xia et.al. | 2408.14874 | null |
| 2024-08-27 | Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation | Haozhe Lou et.al. | 2408.14873 | null |
| 2024-08-27 | Learning Robust Reward Machines from Noisy Labels | Roko Parac et.al. | 2408.14871 | link |
| 2024-08-26 | Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning | Xinyang Gu et.al. | 2408.14472 | null |
| 2024-08-26 | Equivariant Reinforcement Learning under Partial Observability | Hai Nguyen et.al. | 2408.14336 | null |
| 2024-08-26 | Efficient Active Flow Control Strategy for Confined Square Cylinder Wake Using Deep Learning-Based Surrogate Model and Reinforcement Learning | Meng Zhang et.al. | 2408.14232 | null |
| 2024-08-26 | DynamicRouteGPT: A Real-Time Multi-Vehicle Dynamic Navigation Framework Based on Large Language Models | Ziai Zhou et.al. | 2408.14185 | null |
| 2024-08-26 | Robot Navigation with Entity-Based Collision Avoidance using Deep Reinforcement Learning | Yury Kolomeytsev et.al. | 2408.14183 | null |
| 2024-08-26 | ReLExS: Reinforcement Learning Explanations for Stackelberg No-Regret Learners | Xiangge Huang et.al. | 2408.14086 | null |
| 2024-08-26 | Bridging the gap between Learning-to-plan, Motion Primitives and Safe Reinforcement Learning | Piotr Kicki et.al. | 2408.14063 | null |
| 2024-08-26 | Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning | Joey Hejna et.al. | 2408.14037 | link |
| 2024-08-26 | Optimizing TD3 for 7-DOF Robotic Arm Grasping: Overcoming Suboptimality with Exploration-Enhanced Contrastive Learning | Wen-Han Hsieh et.al. | 2408.14009 | null |
| 2024-08-26 | Quantitative Representation of Scenario Difficulty for Autonomous Driving Based on Adversarial Policy Search | Shuo Yang et.al. | 2408.14000 | null |
| 2024-08-23 | Optimally Solving Simultaneous-Move Dec-POMDPs: The Sequential Central Planning Approach | Johan Peralez et.al. | 2408.13139 | null |
| 2024-08-23 | Diffusion-based Episodes Augmentation for Offline Multi-Agent Reinforcement Learning | Jihwan Oh et.al. | 2408.13092 | null |
| 2024-08-23 | Guiding IoT-Based Healthcare Alert Systems with Large Language Models | Yulan Gao et.al. | 2408.13071 | null |
| 2024-08-23 | cc-DRL: a Convex Combined Deep Reinforcement Learning Flight Control Design for a Morphing Quadrotor | Tao Yang et.al. | 2408.13054 | null |
| 2024-08-23 | In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting | Haowei Du et.al. | 2408.13028 | null |
| 2024-08-23 | Robust Iterative Value Conversion: Deep Reinforcement Learning for Neurochip-driven Edge Robots | Yuki Kadokawa et.al. | 2408.13018 | null |
| 2024-08-23 | SUMO: Search-Based Uncertainty Estimation for Model-Based Offline Reinforcement Learning | Zhongjian Qiao et.al. | 2408.12970 | null |
| 2024-08-23 | SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning | Wang Luo et.al. | 2408.12830 | null |
| 2024-08-23 | DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation | Xiaowei Mao et.al. | 2408.12809 | null |
| 2024-08-23 | Intelligent OPC Engineer Assistant for Semiconductor Manufacturing | Guojin Chen et.al. | 2408.12775 | null |
| 2024-08-22 | Controllable Text Generation for Large Language Models: A Survey | Xun Liang et.al. | 2408.12599 | link |
| 2024-08-22 | Automating Deformable Gasket Assembly | Simeon Adebola et.al. | 2408.12593 | null |
| 2024-08-22 | Human-In-The-Loop Machine Learning for Safe and Ethical Autonomous Vehicles: Principles, Challenges, and Opportunities | Yousef Emami et.al. | 2408.12548 | null |
| 2024-08-22 | PCGRL+: Scaling, Control and Generalization in Reinforcement Learning Level Generators | Sam Earle et.al. | 2408.12525 | null |
| 2024-08-22 | EX-DRL: Hedging Against Heavy Losses with EXtreme Distributional Reinforcement Learning | Parvin Malekzadeh et.al. | 2408.12446 | null |
| 2024-08-22 | Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning | Yen-Ru Lai et.al. | 2408.12307 | null |
| 2024-08-22 | Domino-cooling Oscillator Networks with Deep Reinforcement Learning | Sampreet Kalita et.al. | 2408.12271 | null |
| 2024-08-22 | UNCO: Towards Unifying Neural Combinatorial Optimization through Large Language Model | Xia Jiang et.al. | 2408.12214 | null |
| 2024-08-22 | A Safety-Oriented Self-Learning Algorithm for Autonomous Driving: Evolution Starting from a Basic Model | Shuo Yang et.al. | 2408.12190 | null |
| 2024-08-22 | A Safe and Efficient Self-evolving Algorithm for Decision-making and Control of Autonomous Driving Systems | Shuo Yang et.al. | 2408.12187 | null |
| 2024-08-21 | Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction | Anthony GX-Chen et.al. | 2408.11816 | null |
| 2024-08-21 | ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation | Shiqi Yang et.al. | 2408.11805 | null |
| 2024-08-21 | Critique-out-Loud Reward Models | Zachary Ankner et.al. | 2408.11791 | link |
| 2024-08-21 | Deviations from the Nash equilibrium and emergence of tacit collusion in a two-player optimal execution game with reinforcement learning | Fabrizio Lillo et.al. | 2408.11773 | null |
| 2024-08-21 | Bayesian Optimization Framework for Efficient Fleet Design in Autonomous Multi-Robot Exploration | David Molina Concha et.al. | 2408.11751 | null |
| 2024-08-21 | Optimizing Interpretable Decision Tree Policies for Reinforcement Learning | Daniël Vos et.al. | 2408.11632 | link |
| 2024-08-21 | A Survey of Embodied Learning for Object-Centric Robotic Manipulation | Ying Zheng et.al. | 2408.11537 | link |
| 2024-08-22 | Using Part-based Representations for Explainable Deep Reinforcement Learning | Manos Kirtas et.al. | 2408.11455 | null |
| 2024-08-21 | Subgoal-based Hierarchical Reinforcement Learning for Multi-Agent Collaboration | Cheng Xu et.al. | 2408.11416 | link |
| 2024-08-21 | Reflex-Based Open-Vocabulary Navigation without Prior Knowledge Using Omnidirectional Camera and Multiple Vision-Language Models | Kento Kawaharazuka et.al. | 2408.11380 | null |
| 2024-08-20 | Accelerating Goal-Conditioned RL Algorithms and Research | Michał Bortkiewicz et.al. | 2408.11052 | link |
| 2024-08-20 | RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands | Yi Zhao et.al. | 2408.11048 | null |
| 2024-08-20 | Quantum Machine Learning Algorithms for Anomaly Detection: a Survey | Sebastiano Corli et.al. | 2408.11047 | null |
| 2024-08-20 | Deep Reinforcement Learning for Network Energy Saving in 6G and Beyond Networks | Dinh-Hieu Tran et.al. | 2408.10974 | null |
| 2024-08-20 | The Evolution of Reinforcement Learning in Quantitative Finance | Nikolaos Pippas et.al. | 2408.10932 | null |
| 2024-08-20 | Knowledge Sharing and Transfer via Centralized Reward Agent for Multi-Task Reinforcement Learning | Haozhe Ma et.al. | 2408.10858 | link |
| 2024-08-20 | Offline Model-Based Reinforcement Learning with Anti-Exploration | Padmanaba Srinivasan et.al. | 2408.10713 | null |
| 2024-08-20 | Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation | Shiming Xie et.al. | 2408.10642 | null |
| 2024-08-20 | Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search | Jonathan Light et.al. | 2408.10635 | link |
| 2024-08-20 | Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams | Litian Huang et.al. | 2408.10592 | link |
| 2024-08-19 | LEAD: Towards Learning-Based Equity-Aware Decarbonization in Ridesharing Platforms | Mahsa Sahebdel et.al. | 2408.10201 | null |
| 2024-08-19 | Physics-Aware Combinatorial Assembly Planning using Deep Reinforcement Learning | Ruixuan Liu et.al. | 2408.10162 | null |
| 2024-08-19 | $R^2$-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement | Haoyang Wang et.al. | 2408.10135 | null |
| 2024-08-19 | Enhancing Reinforcement Learning Through Guided Search | Jérôme Arjonilla et.al. | 2408.10113 | null |
| 2024-08-19 | Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning | Sriyash Poddar et.al. | 2408.10075 | null |
| 2024-08-19 | Efficient Exploration in Deep Reinforcement Learning: A Novel Bayesian Actor-Critic Algorithm | Nikolai Rozanov et.al. | 2408.10055 | null |
| 2024-08-19 | Adaptive BESS and Grid Setpoints Optimization: A Model-Free Framework for Efficient Battery Management under Dynamic Tariff Pricing | Alaa Selim et.al. | 2408.09989 | null |
| 2024-08-19 | The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective | Renye Yan et.al. | 2408.09974 | null |
| 2024-08-19 | GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits | Gongpu Chen et.al. | 2408.09882 | null |
| 2024-08-19 | ShortCircuit: AlphaZero-Driven Circuit Design | Dimitrios Tsaras et.al. | 2408.09858 | null |
| 2024-08-16 | HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis | Zhi-Bo Liu et.al. | 2408.08847 | link |
| 2024-08-16 | CAT: Caution Aware Transfer in Reinforcement Learning via Distributional Risk | Mohamad Fares El Hajj Chehade et.al. | 2408.08812 | null |
| 2024-08-16 | Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions | Bhuvanashree Murugadoss et.al. | 2408.08781 | null |
| 2024-08-16 | SYMPOL: Symbolic Tree-Based On-Policy Reinforcement Learning | Sascha Marton et.al. | 2408.08761 | link |
| 2024-08-16 | Efficient Multi-Policy Evaluation for Reinforcement Learning | Shuze Liu et.al. | 2408.08706 | null |
| 2024-08-16 | Neural Reward Machines | Elena Umili et.al. | 2408.08677 | link |
| 2024-08-16 | Fine-tuning LLMs for Autonomous Spacecraft Control: A Case Study Using Kerbal Space Program | Alejandro Carrasco et.al. | 2408.08676 | link |
| 2024-08-16 | DeepREST: Automated Test Case Generation for REST APIs Exploiting Deep Reinforcement Learning | Davide Corradini et.al. | 2408.08594 | null |
| 2024-08-16 | Multilevel Graph Reinforcement Learning for Consistent Cognitive Decision-making in Heterogeneous Mixed Autonomy | Xin Gao et.al. | 2408.08516 | null |
| 2024-08-16 | Deep multi-intentional inverse reinforcement learning for cognitive multi-function radar inverse cognition | Hancong Feng et.al. | 2408.08478 | null |
| 2024-08-15 | A Conflicts-free, Speed-lossless KAN-based Reinforcement Learning Decision System for Interactive Driving in Roundabouts | Zhihao Lin et.al. | 2408.08242 | null |
| 2024-08-15 | Explaining an Agent’s Future Beliefs through Temporally Decomposing Future Reward Estimators | Mark Towers et.al. | 2408.08230 | link |
| 2024-08-15 | DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search | Huajian Xin et.al. | 2408.08152 | link |
| 2024-08-15 | Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players | Pragnya Alatur et.al. | 2408.08075 | null |
| 2024-08-15 | An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation | Jun Wang et.al. | 2408.08047 | null |
| 2024-08-15 | Adaptive User Journeys in Pharma E-Commerce with Reinforcement Learning: Insights from SwipeRx | Ana Fernández del Río et.al. | 2408.08024 | null |
| 2024-08-15 | Experimental evaluation of offline reinforcement learning for HVAC control in buildings | Jun Wang et.al. | 2408.07986 | link |
| 2024-08-15 | Meta SAC-Lag: Towards Deployable Safe Reinforcement Learning via MetaGradient-based Hyperparameter Tuning | Homayoun Honari et.al. | 2408.07962 | null |
| 2024-08-15 | Solving a Rubik’s Cube Using its Local Graph Structure | Shunyu Yao et.al. | 2408.07945 | null |
| 2024-08-15 | IReCa: Intrinsic Reward-enhanced Context-aware Reinforcement Learning for Human-AI Coordination | Xin Hao et.al. | 2408.07877 | null |
| 2024-08-14 | Off-Policy Reinforcement Learning with High Dimensional Reward | Dong Neuck Lee et.al. | 2408.07660 | null |
| 2024-08-14 | Adaptive Behavioral AI: Reinforcement Learning to Enhance Pharmacy Services | Ana Fernández del Río et.al. | 2408.07647 | null |
| 2024-08-14 | SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning | Jianye Xu et.al. | 2408.07644 | link |
| 2024-08-14 | Optimizing HIV Patient Engagement with Reinforcement Learning in Resource-Limited Settings | África Periáñez et.al. | 2408.07629 | null |
| 2024-08-14 | A Nested Graph Reinforcement Learning-based Decision-making Strategy for Eco-platooning | Xin Gao et.al. | 2408.07578 | null |
| 2024-08-14 | Large Language Models Know What Makes Exemplary Contexts | Quanyu Long et.al. | 2408.07505 | null |
| 2024-08-14 | Large Language Models Prompting With Episodic Memory | Dai Do et.al. | 2408.07465 | null |
| 2024-08-14 | Real-world validation of safe reinforcement learning, model predictive control and decision tree-based home energy management systems | Julian Ruddick et.al. | 2408.07435 | null |
| 2024-08-14 | Bridging Training and Execution via Dynamic Directed Graph-Based Communication in Cooperative Multi-Agent Systems | Zhuohui Zhang et.al. | 2408.07397 | null |
| 2024-08-14 | Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space | Xiaoyang Yu et.al. | 2408.07395 | null |
| 2024-08-13 | LLMs can Schedule | Henrik Abgaryan et.al. | 2408.06993 | link |
| 2024-08-13 | IRS-Assisted Lossy Communications Under Correlated Rayleigh Fading: Outage Probability Analysis and Optimization | Guanchang Li et.al. | 2408.06969 | null |
| 2024-08-13 | Heavy-Ball Momentum Accelerated Actor-Critic With Function Approximation | Yanjie Dong et.al. | 2408.06945 | null |
| 2024-08-13 | Multi-Agent Continuous Control with Generative Flow Networks | Shuang Luo et.al. | 2408.06920 | link |
| 2024-08-13 | Personalized Dynamic Difficulty Adjustment – Imitation Learning Meets Reinforcement Learning | Ronja Fuchs et.al. | 2408.06818 | link |
| 2024-08-13 | Integrating Saliency Ranking and Reinforcement Learning for Enhanced Object Detection | Matthias Bartolo et.al. | 2408.06803 | link |
| 2024-08-13 | Residual Deep Reinforcement Learning for Inverter-based Volt-Var Control | Qiong Liu et.al. | 2408.06790 | null |
| 2024-08-13 | Deep reinforcement learning for the management of the wall regeneration cycle in wall-bounded turbulent flows | Giorgio Maria Cavallazzi et.al. | 2408.06783 | null |
| 2024-08-13 | Robust Deep Reinforcement Learning for Inverter-based Volt-Var Control in Partially Observable Distribution Networks | Qiong Liu et.al. | 2408.06776 | null |
| 2024-08-13 | MAPPO-PIS: A Multi-Agent Proximal Policy Optimization Method with Prior Intent Sharing for CAVs’ Cooperative Decision-Making | Yicheng Guo et.al. | 2408.06656 | link |
| 2024-08-12 | Body Transformer: Leveraging Robot Embodiment for Policy Learning | Carmelo Sferrazza et.al. | 2408.06316 | null |
| 2024-08-12 | Inverse designing metamaterials with programmable nonlinear functional responses in graph space | Marco Maurizi et.al. | 2408.06300 | null |
| 2024-08-12 | EyeSight Hand: Design of a Fully-Actuated Dexterous Robot Hand with Integrated Vision-Based Tactile Sensors and Compliant Actuation | Branden Romero et.al. | 2408.06265 | null |
| 2024-08-12 | Stable-BC: Controlling Covariate Shift with Stable Behavior Cloning | Shaunak A. Mehta et.al. | 2408.06246 | null |
| 2024-08-12 | Building Decision Making Models Through Language Model Regime | Yu Zhang et.al. | 2408.06087 | null |
| 2024-08-12 | Sequential sampling without comparison to boundary through model-free reinforcement learning | Jamal Esmaily et.al. | 2408.06080 | null |
| 2024-08-12 | Online Optimization of Curriculum Learning Schedules using Evolutionary Optimization | Mohit Jiwatode et.al. | 2408.06068 | null |
| 2024-08-12 | GFlowNet Training by Policy Gradients | Puhua Niu et.al. | 2408.05885 | link |
| 2024-08-12 | Multi-Agent Deep Reinforcement Learning Framework for Wireless MAC Protocol Design and Optimization | Navid Keshtiarast et.al. | 2408.05884 | null |
| 2024-08-11 | Root Cause Attribution of Delivery Risks via Causal Discovery with Reinforcement Learning | Shi Bo et.al. | 2408.05860 | null |
| 2024-08-09 | Deterministic remote entanglement using a chiral quantum interconnect | Aziza Almanakly et.al. | 2408.05164 | null |
| 2024-08-09 | Kolmogorov-Arnold Network for Online Reinforcement Learning | Victor Augusto Kich et.al. | 2408.04841 | null |
| 2024-08-09 | Multi-User MISO with Stacked Intelligent Metasurfaces: A DRL-Based Sum-Rate Optimization Approach | Hao Liu et.al. | 2408.04837 | null |
| 2024-08-09 | Next-Generation Wi-Fi Networks with Generative AI: Design and Insights | Jingyu Wang et.al. | 2408.04835 | null |
| 2024-08-08 | Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity | Martin Smit et.al. | 2408.04549 | link |
| 2024-08-08 | Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs | Kevin Tan et.al. | 2408.04526 | null |
| 2024-08-08 | Model-Based Transfer Learning for Contextual Reinforcement Learning | Jung-Hoon Cho et.al. | 2408.04498 | null |
| 2024-08-08 | Reinforcement Learning from Human Feedback for Lane Changing of Autonomous Vehicles in Mixed Traffic | Yuting Wang et.al. | 2408.04447 | null |
| 2024-08-08 | Non-maximizing policies that fulfill multi-criterion aspirations in expectation | Simon Dima et.al. | 2408.04385 | null |
| 2024-08-08 | Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations | Julen Urain et.al. | 2408.04380 | null |
| 2024-08-08 | Deep Reinforcement Learning for the Design of Metamaterial Mechanisms with Functional Compliance Control | Yejun Choi et.al. | 2408.04376 | null |
| 2024-08-08 | Goal-Oriented UAV Communication Design and Optimization for Target Tracking: A Machine Learning Approach | Wenchao Wu et.al. | 2408.04358 | null |
| 2024-08-08 | KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination | Yin Gu et.al. | 2408.04336 | null |
| 2024-08-08 | Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization | Aditya Kapoor et.al. | 2408.04295 | null |
| 2024-08-07 | Traffic and Obstacle-aware UAV Positioning in Urban Environments Using Reinforcement Learning | Kamran Shafafi et.al. | 2408.03894 | null |
| 2024-08-07 | Navigating the Human Maze: Real-Time Robot Pathfinding with Generative Imitation Learning | Martin Moder et.al. | 2408.03807 | null |
| 2024-08-07 | HDPlanner: Advancing Autonomous Deployments in Unknown Environments through Hierarchical Decision Networks | Jingsong Liang et.al. | 2408.03768 | null |
| 2024-08-07 | Asynchronous Credit Assignment Framework for Multi-Agent Reinforcement Learning | Yongheng Liang et.al. | 2408.03692 | null |
| 2024-08-07 | RL-ADN: A High-Performance Deep Reinforcement Learning Environment for Optimal Energy Storage Systems Dispatch in Active Distribution Networks | Shengren Hou et.al. | 2408.03685 | null |
| 2024-08-07 | AI-Driven approach for sustainable extraction of earth’s subsurface renewable energy while minimizing seismic activity | Diego Gutierrez-Oribio et.al. | 2408.03664 | null |
| 2024-08-07 | A Comparison of LLM Finetuning Methods & Evaluation Metrics with Travel Chatbot Use Case | Sonia Meyer et.al. | 2408.03562 | null |
| 2024-08-07 | Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes | Chen Tang et.al. | 2408.03539 | null |
| 2024-08-06 | Spacecraft inertial parameters estimation using time series clustering and reinforcement learning | Konstantinos Platanitis et.al. | 2408.03445 | null |
| 2024-08-06 | Communication-Aware Consistent Edge Selection for Mobile Users and Autonomous Vehicles | Nazish Tahir et.al. | 2408.03435 | null |
| 2024-08-07 | Adversarial Safety-Critical Scenario Generation using Naturalistic Human Driving Priors | Kunkun Hao et.al. | 2408.03200 | null |
| 2024-08-06 | RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning | Jiapeng Zhu et.al. | 2408.03195 | null |
| 2024-08-06 | Integrated Intention Prediction and Decision-Making with Spectrum Attention Net and Proximal Policy Optimization | Xiao Zhou et.al. | 2408.03191 | null |
| 2024-08-06 | CADRL: Category-aware Dual-agent Reinforcement Learning for Explainable Recommendations over Knowledge Graphs | Shangfei Zheng et.al. | 2408.03166 | null |
| 2024-08-06 | QADQN: Quantum Attention Deep Q-Network for Financial Market Prediction | Siddhant Dutta et.al. | 2408.03088 | null |
| 2024-08-06 | Research on Autonomous Driving Decision-making Strategies based Deep Reinforcement Learning | Zixiang Wang et.al. | 2408.03084 | null |
| 2024-08-06 | Model-free optimal controller for discrete-time Markovian jump linear systems: A Q-learning approach | Ehsan Badfar et.al. | 2408.03077 | null |
| 2024-08-06 | Learning to Turn: Diffusion Imitation for Robust Row Turning in Under-Canopy Robots | Arun N. Sivakumar et.al. | 2408.03059 | null |
| 2024-08-06 | A Course in Dynamic Optimization | Bar Light et.al. | 2408.03034 | null |
| 2024-08-07 | Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning | Haozhe Ma et.al. | 2408.03029 | null |
| 2024-08-05 | Integrating Model-Based Footstep Planning with Model-Free Reinforcement Learning for Dynamic Legged Locomotion | Ho Jae Lee et.al. | 2408.02662 | null |
| 2024-08-05 | Context-aware Mamba-based Reinforcement Learning for social robot navigation | Syed Muhammad Mustafa et.al. | 2408.02661 | null |
| 2024-08-05 | Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models? | Mohammad Bahrami Karkevandi et.al. | 2408.02651 | null |
| 2024-08-05 | Backward explanations via redefinition of predicates | Léo Saulières et.al. | 2408.02606 | null |
| 2024-08-05 | Progressively Selective Label Enhancement for Language Model Alignment | Biao Liu et.al. | 2408.02599 | null |
| 2024-08-05 | Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information | Yauwai Yim et.al. | 2408.02559 | null |
| 2024-08-05 | Counterfactual Shapley Values for Explaining Reinforcement Learning | Yiwei Shi et.al. | 2408.02529 | null |
| 2024-08-05 | Fair Resource Allocation For Hierarchical Federated Edge Learning in Space-Air-Ground Integrated Networks via Deep Reinforcement Learning with Hybrid Control | Chong Huang et.al. | 2408.02501 | null |
| 2024-08-05 | Full error analysis of policy gradient learning algorithms for exploratory linear quadratic mean-field control problem in continuous time with common noise | Noufel Frikha et.al. | 2408.02489 | null |
| 2024-08-05 | Terracorder: Sense Long and Prosper | Josh Millar et.al. | 2408.02407 | null |
| 2024-08-02 | Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer | Yu Yang et.al. | 2408.01402 | null |
| 2024-08-02 | NOLO: Navigate Only Look Once | Bohan Zhou et.al. | 2408.01384 | null |
| 2024-08-02 | Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation | Ruoxuan Feng et.al. | 2408.01366 | null |
| 2024-08-02 | Jacta: A Versatile Planner for Learning Dexterous and Whole-body Manipulation | Jan Brüdigam et.al. | 2408.01258 | null |
| 2024-08-02 | Deep progressive reinforcement learning-based flexible resource scheduling framework for IRS and UAV-assisted MEC system | Li Dong et.al. | 2408.01248 | null |
| 2024-08-02 | Multi-Objective Deep Reinforcement Learning for Optimisation in Autonomous Systems | Juan C. Rosero et.al. | 2408.01188 | null |
| 2024-08-02 | Optimizing Variational Quantum Circuits Using Metaheuristic Strategies in Reinforcement Learning | Michael Kölle et.al. | 2408.01187 | null |
| 2024-08-02 | TCR-GPT: Integrating Autoregressive Model and Reinforcement Learning for T-Cell Receptor Repertoires Generation | Yicheng Lin et.al. | 2408.01156 | null |
| 2024-08-02 | Actra: Optimized Transformer Architecture for Vision-Language-Action Models in Robot Learning | Yueen Ma et.al. | 2408.01147 | null |
| 2024-08-02 | A Survey on Self-play Methods in Reinforcement Learning | Ruize Zhang et.al. | 2408.01072 | null |
| 2024-08-01 | A Policy-Gradient Approach to Solving Imperfect-Information Games with Iterate Convergence | Mingyang Liu et.al. | 2408.00751 | null |
| 2024-08-01 | Insurance Portfolio Pursuit with Reinforcement Learning | Edward James Young et.al. | 2408.00713 | null |
| 2024-08-01 | Learning in Multi-Objective Public Goods Games with Non-Linear Utilities | Nicole Orzan et.al. | 2408.00682 | null |
| 2024-08-01 | Discretizing Continuous Action Space with Unimodal Probability Distributions for On-Policy Reinforcement Learning | Yuanyang Zhu et.al. | 2408.00309 | null |
| 2024-08-01 | A Reinforcement Learning Based Motion Planner for Quadrotor Autonomous Flight in Dense Environment | Zhaohong Liu et.al. | 2408.00275 | null |
| 2024-08-01 | Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control | Hao Zhou et.al. | 2408.00214 | null |
| 2024-07-31 | CREW: Facilitating Human-AI Teaming Research | Lingyu Zhang et.al. | 2408.00170 | null |
| 2024-07-31 | Formal Ethical Obligations in Reinforcement Learning Agents: Verification and Policy Updates | Colin Shea-Blymyer et.al. | 2408.00147 | null |
| 2024-07-31 | Adaptive Transit Signal Priority based on Deep Reinforcement Learning and Connected Vehicles in a Traffic Microsimulation Environment | Dickness Kwesiga et.al. | 2408.00098 | null |
| 2024-07-31 | Berkeley Humanoid: A Research Platform for Learning-based Control | Qiayuan Liao et.al. | 2407.21781 | null |
| 2024-07-31 | Human-Machine Co-Adaptation for Robot-Assisted Rehabilitation via Dual-Agent Multiple Model Reinforcement Learning (DAMMRL) | Yang An et.al. | 2407.21734 | null |
| 2024-07-31 | Multi-agent reinforcement learning for the control of three-dimensional Rayleigh-Bénard convection | Joel Vasanth et.al. | 2407.21565 | null |
| 2024-07-31 | Black box meta-learning intrinsic rewards for sparse-reward environments | Octavio Pappalardo et.al. | 2407.21546 | null |
| 2024-07-31 | Multi-agent Assessment with QoS Enhancement for HD Map Updates in a Vehicular Network | Jeffrey Redondo et.al. | 2407.21460 | null |
| 2024-07-31 | ProSpec RL: Plan Ahead, then Execute | Liangliang Liu et.al. | 2407.21359 | null |
| 2024-07-31 | Image-Based Deep Reinforcement Learning with Intrinsically Motivated Stimuli: On the Execution of Complex Robotic Tasks | David Valencia et.al. | 2407.21338 | null |
| 2024-07-31 | Tractable and Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation | Taehyun Cho et.al. | 2407.21260 | null |
| 2024-07-30 | VITAL: Visual Teleoperation to Enhance Robot Learning through Human-in-the-Loop Corrections | Hamidreza Kasaei et.al. | 2407.21244 | null |
| 2024-07-30 | Learning Stable Robot Grasping with Transformer-based Tactile Control Policies | En Yen Puang et.al. | 2407.21172 | link |
| 2024-07-30 | Securing Proof of Stake Blockchains: Leveraging Multi-Agent Reinforcement Learning for Detecting and Mitigating Malicious Nodes | Faisal Haque Bappy et.al. | 2407.20983 | null |
| 2024-07-30 | How to Choose a Reinforcement-Learning Algorithm | Fabian Bongratz et.al. | 2407.20917 | null |
| 2024-07-30 | ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning | Hosung Lee et.al. | 2407.20806 | link |
| 2024-07-30 | Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning | Norman Di Palo et.al. | 2407.20798 | null |
| 2024-07-30 | Architectural Influence on Variational Quantum Circuits in Multi-Agent Reinforcement Learning: Evolutionary Strategies for Optimization | Michael Kölle et.al. | 2407.20739 | null |
| 2024-07-30 | Online Prediction-Assisted Safe Reinforcement Learning for Electric Vehicle Charging Station Recommendation in Dynamically Coupled Transportation-Power Systems | Qionghua Liao et.al. | 2407.20679 | null |
| 2024-07-30 | Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations | Yupei Yang et.al. | 2407.20651 | null |
| 2024-07-30 | Wireless Multi-User Interactive Virtual Reality in Metaverse with Edge-Device Collaborative Computing | Caolu Xu et.al. | 2407.20523 | null |
| 2024-07-30 | Boosting Efficiency in Task-Agnostic Exploration through Causal Knowledge | Yupei Yang et.al. | 2407.20506 | link |
| 2024-07-29 | A Method for Fast Autonomy Transfer in Reinforcement Learning | Dinuka Sahabandu et.al. | 2407.20466 | null |
| 2024-07-29 | SAPG: Split and Aggregate Policy Gradients | Jayesh Singla et.al. | 2407.20230 | null |
| 2024-07-29 | Privileged Reinforcement and Communication Learning for Distributed, Bandwidth-limited Multi-robot Exploration | Yixiao Ma et.al. | 2407.20203 | null |
| 2024-07-29 | Language-Conditioned Offline RL for Multi-Robot Navigation | Steven Morad et.al. | 2407.20164 | null |
| 2024-07-29 | Quantum Machine Learning Architecture Search via Deep Reinforcement Learning | Xin Dai et.al. | 2407.20147 | null |
| 2024-07-29 | Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning | Liyuan Mao et.al. | 2407.20109 | null |
| 2024-07-29 | Counterfactual rewards promote collective transport using individually controlled swarm microrobots | Veit-Lorenz Heuthe et.al. | 2407.20041 | null |
| 2024-07-29 | Collision Probability Distribution Estimation via Temporal Difference Learning | Thomas Steinecker et.al. | 2407.20000 | link |
| 2024-07-29 | Integrated Communications and Security: RIS-Assisted Simultaneous Transmission and Generation of Secret Keys | Ning Gao et.al. | 2407.19960 | null |
| 2024-07-29 | A Differential Dynamic Programming Framework for Inverse Reinforcement Learning | Kun Cao et.al. | 2407.19902 | null |
| 2024-07-29 | Imitation Learning for Intra-Day Power Grid Operation through Topology Actions | Matthijs de Jong et.al. | 2407.19865 | null |
| 2024-07-26 | SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments | Shu Ishida et.al. | 2407.18913 | null |
| 2024-07-26 | Lessons from Learning to Spin “Pens” | Jun Wang et.al. | 2407.18902 | null |
| 2024-07-26 | SHANGUS: Deep Reinforcement Learning Meets Heuristic Optimization for Speedy Frontier-Based Exploration of Autonomous Vehicles in Unknown Spaces | Seunghyeop Nam et.al. | 2407.18892 | null |
| 2024-07-26 | An Accelerated Multi-level Monte Carlo Approach for Average Reward Reinforcement Learning with General Policy Parametrization | Swetha Ganesh et.al. | 2407.18878 | null |
| 2024-07-26 | QT-TDM: Planning with Transformer Dynamics Model and Autoregressive Q-Learning | Mostafa Kotb et.al. | 2407.18841 | null |
| 2024-07-26 | The Cross-environment Hyperparameter Setting Benchmark for Reinforcement Learning | Andrew Patterson et.al. | 2407.18840 | null |
| 2024-07-26 | Learning a Shape-Conditioned Agent for Purely Tactile In-Hand Manipulation of Various Objects | Johannes Pitz et.al. | 2407.18834 | null |
| 2024-07-26 | Online Planning in POMDPs with State-Requests | Raphael Avalos et.al. | 2407.18812 | null |
| 2024-07-26 | Tuning the kinetics of intracellular transport | Ardra Suchitran et.al. | 2407.18784 | null |
| 2024-07-26 | A Deep Reinforcement Learning Approach to Wavefront Control for Exoplanet Imaging | Yann Gutierrez et.al. | 2407.18733 | null |
| 2024-07-25 | Recursive Introspection: Teaching Language Model Agents How to Self-Improve | Yuxiao Qu et.al. | 2407.18219 | null |
| 2024-07-25 | Differentiable Quantum Architecture Search in Asynchronous Quantum Reinforcement Learning | Samuel Yen-Chi Chen et.al. | 2407.18202 | null |
| 2024-07-25 | Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation | Jean Seong Bjorn Choe et.al. | 2407.18143 | null |
| 2024-07-25 | MapTune: Advancing ASIC Technology Mapping via Reinforcement Learning Guided Library Tuning | Mingju Liu et.al. | 2407.18110 | link |
| 2024-07-25 | Principal-Agent Reinforcement Learning | Dima Ivanov et.al. | 2407.18074 | null |
| 2024-07-25 | Multi-Agent Deep Reinforcement Learning for Resilience Optimization in 5G RAN | Soumeya Kaada et.al. | 2407.18066 | null |
| 2024-07-25 | Personalized and Context-aware Route Planning for Edge-assisted Vehicles | Dinesh Cyril Selvaraj et.al. | 2407.17980 | null |
| 2024-07-25 | Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization | Feihu Huang et.al. | 2407.17823 | null |
| 2024-07-25 | Advanced deep-reinforcement-learning methods for flow control: group-invariant and positional-encoding networks improve learning speed and quality | Joogoo Jeon et.al. | 2407.17822 | null |
| 2024-07-25 | Preliminary Results of Neuromorphic Controller Design and a Parkinson’s Disease Dataset Building for Closed-Loop Deep Brain Stimulation | Ananna Biswas et.al. | 2407.17756 | null |
| 2024-07-24 | Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning | Shuang Qiu et.al. | 2407.17466 | null |
| 2024-07-24 | Toward human-centered shared autonomy AI paradigms for human-robot teaming in healthcare | Reza Abiri et.al. | 2407.17464 | null |
| 2024-07-24 | SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning | Jianpeng Yao et.al. | 2407.17460 | null |
| 2024-07-24 | Joint Transmit and Jamming Power Optimization for Secrecy in Energy Harvesting Networks: A Reinforcement Learning Approach | Shalini Tripathi et.al. | 2407.17435 | null |
| 2024-07-24 | Market Making with Exogenous Competition | Robert Boyce et.al. | 2407.17393 | null |
| 2024-07-24 | MoveLight: Enhancing Traffic Signal Control through Movement-Centric Deep Reinforcement Learning | Junqi Shao et.al. | 2407.17303 | null |
| 2024-07-24 | Pretrained Visual Representations in Reinforcement Learning | Emlyn Williams et.al. | 2407.17238 | null |
| 2024-07-24 | Sublinear Regret for An Actor-Critic Algorithm in Continuous-Time Linear-Quadratic Reinforcement Learning | Yilie Huang et.al. | 2407.17226 | null |
| 2024-07-24 | Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization | Jonathan Pirnay et.al. | 2407.17206 | link |
| 2024-07-24 | Path Following and Stabilisation of a Bicycle Model using a Reinforcement Learning Approach | Sebastian Weyrer et.al. | 2407.17156 | null |
| 2024-07-23 | A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data | Adrian Remonda et.al. | 2407.16680 | link |
| 2024-07-23 | From Imitation to Refinement – Residual RL for Precise Visual Assembly | Lars Ankile et.al. | 2407.16677 | null |
| 2024-07-23 | Efficient Discovery of Actual Causality using Abstraction-Refinement | Arshia Rafieioskouei et.al. | 2407.16629 | null |
| 2024-07-23 | Functional Acceleration for Policy Mirror Descent | Veronica Chelu et.al. | 2407.16602 | null |
| 2024-07-23 | Real-Time Interactions Between Human Controllers and Remote Devices in Metaverse | Kan Chen et.al. | 2407.16591 | null |
| 2024-07-23 | TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback | Eunseop Yoon et.al. | 2407.16574 | null |
| 2024-07-23 | Cross Anything: General Quadruped Robot Navigation through Complex Terrains | Shaoting Zhu et.al. | 2407.16412 | null |
| 2024-07-23 | Evaluating Uncertainties in Electricity Markets via Machine Learning and Quantum Computing | Shuyang Zhu et.al. | 2407.16404 | null |
| 2024-07-23 | Reinforcement Learning-based Adaptive Mitigation of Uncorrected DRAM Errors in the Field | Isaac Boixaderas et.al. | 2407.16377 | null |
| 2024-07-23 | Arbitrary quantum states preparation aided by deep reinforcement learning | Zhao-Wei Wang et.al. | 2407.16368 | null |
| 2024-07-22 | WayEx: Waypoint Exploration using a Single Demonstration | Mara Levy et.al. | 2407.15849 | null |
| 2024-07-23 | QueST: Self-Supervised Skill Abstractions for Learning Continuous Control | Atharva Mete et.al. | 2407.15840 | null |
| 2024-07-22 | Importance Sampling-Guided Meta-Training for Intelligent Agents in Highly Interactive Environments | Mansur Arief et.al. | 2407.15839 | null |
| 2024-07-22 | On shallow planning under partial observability | Randy Lefebvre et.al. | 2407.15820 | null |
| 2024-07-22 | Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning | Zhecheng Yuan et.al. | 2407.15815 | null |
| 2024-07-22 | Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels | Zhuorui Ye et.al. | 2407.15786 | null |
| 2024-07-22 | Diffusion Model Based Resource Allocation Strategy in Ultra-Reliable Wireless Networked Control Systems | Amirhassan Babazadeh Darabi et.al. | 2407.15784 | null |
| 2024-07-22 | How to Shrink Confidence Sets for Many Equivalent Discrete Distributions? | Odalric-Ambrym Maillard et.al. | 2407.15662 | null |
| 2024-07-22 | Evaluation of Reinforcement Learning for Autonomous Penetration Testing using A3C, Q-learning and DQN | Norman Becker et.al. | 2407.15656 | null |
| 2024-07-22 | Reinforcement Learning Meets Visual Odometry | Nico Messikommer et.al. | 2407.15626 | null |
| 2024-07-19 | Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification | Thomas Kwa et.al. | 2407.14503 | null |
| 2024-07-19 | Explainable Post hoc Portfolio Management Financial Policy of a Deep Reinforcement Learning agent | Alejandra de la Rica Escudero et.al. | 2407.14486 | link |
| 2024-07-19 | Data-Centric Human Preference Optimization with Rationales | Hoang Anh Just et.al. | 2407.14477 | null |
| 2024-07-19 | FuzzTheREST: An Intelligent Automated Black-box RESTful API Fuzzer | Tiago Dias et.al. | 2407.14361 | null |
| 2024-07-19 | Hyperparameter Optimization for Driving Strategies Based on Reinforcement Learning | Nihal Acharya Adde et.al. | 2407.14262 | null |
| 2024-07-19 | On Policy Evaluation Algorithms in Distributional Reinforcement Learning | Julian Gerstenberg et.al. | 2407.14175 | null |
| 2024-07-19 | A Comparative Study of Deep Reinforcement Learning Models: DQN vs PPO vs A2C | Neil De La Fuente et.al. | 2407.14151 | link |
| 2024-07-19 | Track-MDP: Reinforcement Learning for Target Tracking with Controlled Sensing | Adarsh M. Subramaniam et.al. | 2407.13995 | null |
| 2024-07-19 | The Effect of Training Schedules on Morphological Robustness and Generalization | Edoardo Barba et.al. | 2407.13965 | link |
| 2024-07-18 | Event-Triggered Reinforcement Learning Based Joint Resource Allocation for Ultra-Reliable Low-Latency V2X Communications | Nasir Khan et.al. | 2407.13947 | null |
| 2024-07-18 | Random Latent Exploration for Deep Reinforcement Learning | Srinath Mahankali et.al. | 2407.13755 | null |
| 2024-07-18 | Optimistic Q-learning for average reward and episodic reinforcement learning | Priyank Agrawal et.al. | 2407.13743 | null |
| 2024-07-18 | Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review | Masatoshi Uehara et.al. | 2407.13734 | null |
| 2024-07-18 | A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice | Shaina Raza et.al. | 2407.13699 | null |
| 2024-07-18 | Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error | Ally Yalei Du et.al. | 2407.13622 | null |
| 2024-07-18 | Hyp2Nav: Hyperbolic Planning and Curiosity for Crowd Navigation | Alessandro Flaborea et.al. | 2407.13567 | null |
| 2024-07-18 | Model-based Policy Optimization using Symbolic World Model | Andrey Gorodetskiy et.al. | 2407.13518 | null |
| 2024-07-18 | Instance Selection for Dynamic Algorithm Configuration with Reinforcement Learning: Improving Generalization | Carolin Benjamins et.al. | 2407.13513 | null |
| 2024-07-18 | LIMT: Language-Informed Multi-Task Visual World Models | Elie Aljalbout et.al. | 2407.13466 | null |
| 2024-07-18 | The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations | Jan Ole von Hartz et.al. | 2407.13432 | null |
| 2024-07-17 | Navigating the Smog: A Cooperative Multi-Agent RL for Accurate Air Pollution Mapping through Data Assimilation | Ichrak Mokhtari et.al. | 2407.12539 | null |
| 2024-07-17 | Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models | Xihe Qiu et.al. | 2407.12532 | null |
| 2024-07-17 | Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments | Runfa Chen et.al. | 2407.12505 | null |
| 2024-07-17 | Estimating Reaction Barriers with Deep Reinforcement Learning | Adittya Pal et.al. | 2407.12453 | null |
| 2024-07-17 | Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning | Xu-Hui Liu et.al. | 2407.12448 | link |
| 2024-07-17 | Variable-Agnostic Causal Exploration for Reinforcement Learning | Minh Hoang Nguyen et.al. | 2407.12437 | null |
| 2024-07-17 | Flow Matching Imitation Learning for Multi-Support Manipulation | Quentin Rouxel et.al. | 2407.12381 | null |
| 2024-07-17 | A foundation model approach to guide antimicrobial peptide design in the era of artificial intelligence driven scientific discovery | Jike Wang et.al. | 2407.12296 | null |
| 2024-07-17 | Chip Placement with Diffusion | Vint Lee et.al. | 2407.12282 | null |
| 2024-07-17 | Individualized Federated Learning for Traffic Prediction with Error Driven Aggregation | Hang Chen et.al. | 2407.12226 | link |
| 2024-07-16 | Why long model-based rollouts are no reason for bad Q-value estimates | Philipp Wissmann et.al. | 2407.11751 | null |
| 2024-07-16 | Pareto local search for a multi-objective demand response problem in residential areas with heat pumps and electric vehicles | Thomas Dengiz et.al. | 2407.11719 | null |
| 2024-07-16 | A Comparative Analysis of Interactive Reinforcement Learning Algorithms in Warehouse Robot Grid Based Environment | Arunabh Bora et.al. | 2407.11671 | null |
| 2024-07-16 | Exciting Action: Investigating Efficient Exploration for Learning Musculoskeletal Humanoid Locomotion | Henri-Jacques Geiß et.al. | 2407.11658 | null |
| 2024-07-16 | Building Resilience in Wireless Communication Systems With a Secret-Key Budget | Karl-Ludwig Besser et.al. | 2407.11604 | null |
| 2024-07-16 | Learning to Imitate Spatial Organization in Multi-robot Systems | Ayomide O. Agunloye et.al. | 2407.11592 | null |
| 2024-07-16 | Green Resource Allocation in Cloud-Native O-RAN Enabled Small Cell Networks | Rana M. Sohaib et.al. | 2407.11563 | null |
| 2024-07-16 | RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards | Fatemeh Zargarbashi et.al. | 2407.11562 | null |
| 2024-07-16 | Imitation learning with artificial neural networks for demand response with a heuristic control approach for heat pumps | Thomas Dengiz et.al. | 2407.11561 | null |
| 2024-07-16 | DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN | Rana M. Sohaib et.al. | 2407.11558 | null |
| 2024-07-15 | Walking the Values in Bayesian Inverse Reinforcement Learning | Ondrej Bajgar et.al. | 2407.10971 | null |
| 2024-07-15 | BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning | Haohong Lin et.al. | 2407.10967 | null |
| 2024-07-15 | Hedging Beyond the Mean: A Distributional Reinforcement Learning Perspective for Hedging Portfolios with Structured Products | Anil Sharma et.al. | 2407.10903 | null |
| 2024-07-15 | Offline Reinforcement Learning with Imputed Rewards | Carlo Romeo et.al. | 2407.10839 | null |
| 2024-07-15 | Exploration in Knowledge Transfer Utilizing Reinforcement Learning | Adam Jedlička et.al. | 2407.10835 | null |
| 2024-07-15 | GuideLight: “Industrial Solution” Guidance for More Practical Traffic Signal Control Agents | Haoyuan Jiang et.al. | 2407.10811 | null |
| 2024-07-15 | DINO Pre-training for Vision-based End-to-end Autonomous Driving | Shubham Juneja et.al. | 2407.10803 | null |
| 2024-07-15 | Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning | Alessandro Montenegro et.al. | 2407.10775 | null |
| 2024-07-16 | Back to Newton’s Laws: Learning Vision-based Agile Flight via Differentiable Physics | Yuang Zhang et.al. | 2407.10648 | null |
| 2024-07-15 | Balancing the Scales: Reinforcement Learning for Fair Classification | Leon Eshuijs et.al. | 2407.10629 | null |
| 2024-07-12 | Learning Coordinated Maneuver in Adversarial Environments | Zechen Hu et.al. | 2407.09469 | null |
| 2024-07-12 | ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts | Amelia F. Hardy et.al. | 2407.09447 | null |
| 2024-07-12 | A Benchmark Environment for Offline Reinforcement Learning in Racing Games | Girolamo Macaluso et.al. | 2407.09415 | link |
| 2024-07-12 | Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments | Zoya Volovikova et.al. | 2407.09287 | null |
| 2024-07-12 | GNN with Model-based RL for Multi-agent Systems | Hanxiao Chen et.al. | 2407.09249 | null |
| 2024-07-12 | Constrained Intrinsic Motivation for Reinforcement Learning | Xiang Zheng et.al. | 2407.09247 | null |
| 2024-07-12 | Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network | Shun Kotoku et.al. | 2407.09124 | null |
| 2024-07-12 | New Desiderata for Direct Preference Optimization | Xiangkun Hu et.al. | 2407.09072 | null |
| 2024-07-12 | Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control | Huayu Chen et.al. | 2407.09024 | null |
| 2024-07-12 | Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control | Sicong Jiang et.al. | 2407.08964 | null |
| 2024-07-11 | MetaUrban: A Simulation Platform for Embodied AI in Urban Spaces | Wayne Wu et.al. | 2407.08725 | null |
| 2024-07-11 | RoboMorph: Evolving Robot Morphology using Large Language Models | Kevin Qiu et.al. | 2407.08626 | null |
| 2024-07-11 | A Review of Nine Physics Engines for Reinforcement Learning Research | Michael Kaup et.al. | 2407.08590 | null |
| 2024-07-11 | HACMan++: Spatially-Grounded Motion Primitives for Manipulation | Bowen Jiang et.al. | 2407.08585 | null |
| 2024-07-11 | Imitation Learning for Robotic Assisted Ultrasound Examination of Deep Venous Thrombosis using Kernelized Movement Primitives | Diego Dall’Alba et.al. | 2407.08506 | null |
| 2024-07-11 | TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations | Junik Bae et.al. | 2407.08464 | null |
| 2024-07-11 | Distributed Deep Reinforcement Learning Based Gradient Quantization for Federated Learning Enabled Vehicle Edge Computing | Cui Zhang et.al. | 2407.08462 | null |
| 2024-07-11 | Joint Optimization of Age of Information and Energy Consumption in NR-V2X System based on Deep Reinforcement Learning | Shulin Song et.al. | 2407.08458 | link |
| 2024-07-11 | A Cantor-Kantorovich Metric Between Markov Decision Processes with Application to Transfer Learning | Adrien Banse et.al. | 2407.08324 | null |
| 2024-07-11 | A Deep Reinforcement Learning Framework and Methodology for Reducing the Sim-to-Real Gap in ASV Navigation | Luis F W Batista et.al. | 2407.08263 | null |
| 2024-07-10 | Learning In-Hand Translation Using Tactile Skin With Shear and Normal Force Sensing | Jessica Yin et.al. | 2407.07885 | null |
| 2024-07-10 | Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation | Eugene Teoh et.al. | 2407.07868 | null |
| 2024-07-10 | Reinforcement Learning of Adaptive Acquisition Policies for Inverse Problems | Gianluigi Silvestri et.al. | 2407.07794 | null |
| 2024-07-11 | BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark | Nikita Chernyadev et.al. | 2407.07788 | null |
| 2024-07-10 | Continuous Control with Coarse-to-fine Reinforcement Learning | Younggyo Seo et.al. | 2407.07787 | null |
| 2024-07-10 | Towards Human-Like Driving: Active Inference in Autonomous Vehicle Control | Elahe Delavari et.al. | 2407.07684 | null |
| 2024-07-10 | Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning | Dake Zhang et.al. | 2407.07631 | null |
| 2024-07-10 | Resource Allocation for Twin Maintenance and Computing Task Processing in Digital Twin Vehicular Edge Computing Network | Yu Xie et.al. | 2407.07575 | link |
| 2024-07-10 | CM-DQN: A Value-Based Deep Reinforcement Learning Model to Simulate Confirmation Bias | Jiacheng Shen et.al. | 2407.07454 | link |
| 2024-07-10 | Real-time system optimal traffic routing under uncertainties – Can physics models boost reinforcement learning? | Zemian Ke et.al. | 2407.07364 | null |
| 2024-07-09 | Safe and Reliable Training of Learning-Based Aerospace Controllers | Udayan Mandal et.al. | 2407.07088 | null |
| 2024-07-09 | Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models | Logan Cross et.al. | 2407.07086 | link |
| 2024-07-09 | Can Learned Optimization Make Reinforcement Learning Less Difficult? | Alexander David Goldie et.al. | 2407.07082 | link |
| 2024-07-09 | A Unified Approach to Multi-task Legged Navigation: Temporal Logic Meets Reinforcement Learning | Jesse Jiang et.al. | 2407.06931 | null |
| 2024-07-09 | Intercepting Unauthorized Aerial Robots in Controlled Airspace Using Reinforcement Learning | Francisco Giral et.al. | 2407.06909 | null |
| 2024-07-09 | Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective | Shahana Ibrahim et.al. | 2407.06902 | null |
| 2024-07-09 | Energy Efficient Fair STAR-RIS for Mobile Users | Ashok S. Kumar et.al. | 2407.06868 | null |
| 2024-07-09 | Frequency and Generalisation of Periodic Activation Functions in Reinforcement Learning | Augustine N. Mavor-Parker et.al. | 2407.06756 | null |
| 2024-07-09 | Hierarchical Average-Reward Linearly-solvable Markov Decision Processes | Guillermo Infante et.al. | 2407.06690 | null |
| 2024-07-09 | Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning | Fanyue Wei et.al. | 2407.06642 | link |
| 2024-07-08 | Periodic agent-state based Q-learning for POMDPs | Amit Sinha et.al. | 2407.06121 | null |
| 2024-07-08 | QTRL: Toward Practical Quantum Reinforcement Learning via Quantum-Train | Chen-Yu Liu et.al. | 2407.06103 | null |
| 2024-07-08 | Stranger Danger! Identifying and Avoiding Unpredictable Pedestrians in RL-based Social Robot Navigation | Sara Pohland et.al. | 2407.06056 | link |
| 2024-07-08 | iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement | Aoyu Pang et.al. | 2407.06025 | link |
| 2024-07-08 | Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals | Moritz Reuss et.al. | 2407.05996 | null |
| 2024-07-08 | On Bellman equations for continuous-time policy evaluation I: discretization and approximation | Wenlong Mou et.al. | 2407.05966 | null |
| 2024-07-08 | Graph Anomaly Detection with Noisy Labels by Reinforcement Learning | Zhu Wang et.al. | 2407.05934 | null |
| 2024-07-08 | FedMRL: Data Heterogeneity Aware Federated Multi-agent Deep Reinforcement Learning for Medical Imaging | Pranab Sahoo et.al. | 2407.05800 | link |
| 2024-07-08 | Structural Generalization in Autonomous Cyber Incident Response with Message-Passing Neural Networks and Reinforcement Learning | Jakob Nyberg et.al. | 2407.05775 | link |
| 2024-07-08 | Multi-agent Reinforcement Learning-based Network Intrusion Detection System | Amine Tellache et.al. | 2407.05766 | null |
| 2024-07-05 | Graph Reinforcement Learning in Power Grids: A Survey | Mohamed Hassouna et.al. | 2407.04522 | null |
| 2024-07-05 | Using Petri Nets as an Integrated Constraint Mechanism for Reinforcement Learning Tasks | Timon Sachweh et.al. | 2407.04481 | null |
| 2024-07-05 | Hindsight Preference Learning for Offline Preference-based Reinforcement Learning | Chen-Xiao Gao et.al. | 2407.04451 | link |
| 2024-07-05 | Enhancing Safety for Autonomous Agents in Partly Concealed Urban Traffic Environments Through Representation-Based Shielding | Pierre Haritz et.al. | 2407.04343 | null |
| 2024-07-05 | Gradient-based Regularization for Action Smoothness in Robotic Control with Reinforcement Learning | I Lee et.al. | 2407.04315 | null |
| 2024-07-05 | Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling | Jiawei Xu et.al. | 2407.04285 | null |
| 2024-07-05 | Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator | Mehryar Abbasi et.al. | 2407.04258 | null |
| 2024-07-05 | PA-LOCO: Learning Perturbation-Adaptive Locomotion for Quadruped Robots | Zhiyuan Xiao et.al. | 2407.04224 | null |
| 2024-07-05 | Autoverse: An Evolvable Game Language for Learning Robust Embodied Agents | Sam Earle et.al. | 2407.04221 | null |
| 2024-07-04 | Orchestrating LLMs with Different Personalizations | Jin Peng Zhou et.al. | 2407.04181 | null |
| 2024-07-03 | Value-Penalized Auxiliary Control from Examples for Learning without Rewards or Demonstrations | Trevor Ablett et.al. | 2407.03311 | link |
| 2024-07-03 | A Review of the Applications of Deep Learning-Based Emergent Communication | Brendon Boldt et.al. | 2407.03302 | null |
| 2024-07-03 | Cooperative Multi-Agent Deep Reinforcement Learning Methods for UAV-aided Mobile Edge Computing Networks | Mintae Kim et.al. | 2407.03280 | null |
| 2024-07-03 | Policy-guided Monte Carlo on general state spaces: Application to glass-forming mixtures | Leonardo Galliano et.al. | 2407.03275 | null |
| 2024-07-03 | PPO-based Dynamic Control of Uncertain Floating Platforms in the Zero-G Environment | Mahya Ramezani et.al. | 2407.03224 | null |
| 2024-07-03 | Combining AI Control Systems and Human Decision Support via Robustness and Criticality | Walt Woods et.al. | 2407.03210 | null |
| 2024-07-03 | Bunny-VisionPro: Real-Time Bimanual Dexterous Teleoperation for Imitation Learning | Runyu Ding et.al. | 2407.03162 | null |
| 2024-07-03 | Reinforcement Learning for Sequence Design Leveraging Protein Language Models | Jithendaraa Subramanian et.al. | 2407.03154 | null |
| 2024-07-03 | Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes | Asaf Cassel et.al. | 2407.03065 | null |
| 2024-07-03 | Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment | Janghwan Lee et.al. | 2407.03051 | null |
| 2024-07-02 | PWM: Policy Learning with Large World Models | Ignat Georgiev et.al. | 2407.02466 | null |
| 2024-07-02 | Predicting Visual Attention in Graphic Design Documents | Souradeep Chakraborty et.al. | 2407.02439 | null |
| 2024-07-02 | Reinforcement Learning and Machine ethics:a systematic review | Ajay Vishwanath et.al. | 2407.02425 | null |
| 2024-07-02 | Talking to Machines: do you read me? | Lina M. Rojas-Barahona et.al. | 2407.02354 | null |
| 2024-07-02 | DextrAH-G: Pixels-to-Action Dexterous Arm-Hand Grasping with Geometric Fabrics | Tyler Ga Wei Lum et.al. | 2407.02274 | null |
| 2024-07-02 | Safe CoR: A Dual-Expert Approach to Integrating Imitation Learning and Safe Reinforcement Learning Using Constraint Rewards | Hyeokjin Kwon et.al. | 2407.02245 | null |
| 2024-07-02 | Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | Yuchen Hu et.al. | 2407.02243 | null |
| 2024-07-02 | Safety-Driven Deep Reinforcement Learning Framework for Cobots: A Sim2Real Approach | Ammar N. Abbas et.al. | 2407.02231 | link |
| 2024-07-02 | Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning | Zakariae El Asri et.al. | 2407.02217 | null |
| 2024-07-02 | Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning | Yifang Chen et.al. | 2407.02119 | null |
| 2024-06-28 | PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators | Kuo-Hao Zeng et.al. | 2406.20083 | null |
| 2024-06-28 | Applying RLAIF for Code Generation with API-usage in Lightweight LLMs | Sujan Dutta et.al. | 2406.20060 | null |
| 2024-06-28 | HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid | Xinyu Xu et.al. | 2406.19972 | null |
| 2024-06-28 | Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies | Pingcheng Jian et.al. | 2406.19971 | null |
| 2024-06-28 | Operator World Models for Reinforcement Learning | Pietro Novelli et.al. | 2406.19861 | null |
| 2024-06-28 | 3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints | Yoonkyu Yoo et.al. | 2406.19848 | null |
| 2024-06-28 | Reinforcement Learning for Efficient Design and Control Co-optimisation of Energy Systems | Marine Cauz et.al. | 2406.19825 | null |
| 2024-06-28 | Identifying Ordinary Differential Equations for Data-efficient Model-based Reinforcement Learning | Tobias Nagel et.al. | 2406.19817 | null |
| 2024-06-28 | Fuzzy Logic Guided Reward Function Variation: An Oracle for Testing Reinforcement Learning Programs | Shiyu Zhang et.al. | 2406.19812 | null |
| 2024-06-28 | Decision Transformer for IRS-Assisted Systems with Diffusion-Driven Generative Channels | Jie Zhang et.al. | 2406.19769 | null |
| 2024-06-27 | Efficient World Models with Context-Aware Tokenization | Vincent Micheli et.al. | 2406.19320 | link |
| 2024-06-27 | Averaging log-likelihoods in direct alignment | Nathan Grinsztajn et.al. | 2406.19188 | null |
| 2024-06-27 | Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion | Yannis Flet-Berliac et.al. | 2406.19185 | null |
| 2024-06-27 | Learning Pareto Set for Multi-Objective Continuous Robot Control | Tianye Shu et.al. | 2406.18924 | link |
| 2024-06-27 | Autonomous Control of a Novel Closed Chain Five Bar Active Suspension via Deep Reinforcement Learning | Nishesh Singh et.al. | 2406.18899 | null |
| 2024-06-27 | State and Input Constrained Output-Feedback Adaptive Optimal Control of Affine Nonlinear Systems | Tochukwu Elijah Ogri et.al. | 2406.18804 | null |
| 2024-06-26 | Decentralized Semantic Traffic Control in AVs Using RL and DQN for Dynamic Roadblocks | Emanuel Figetakis et.al. | 2406.18741 | null |
| 2024-06-26 | Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs | Tian Tian et.al. | 2406.18529 | null |
| 2024-06-26 | Mental Modeling of Reinforcement Learning Agents by Language Models | Wenhao Lu et.al. | 2406.18505 | null |
| 2024-06-26 | Preference Elicitation for Offline Reinforcement Learning | Alizée Pace et.al. | 2406.18450 | null |
| 2024-06-26 | Mixture of Experts in a Mixture of RL settings | Timon Willi et.al. | 2406.18420 | null |
| 2024-06-26 | AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors | Hao Shi et.al. | 2406.18394 | null |
| 2024-06-26 | Reinforcement Learning with Intrinsically Motivated Feedback Graph for Lost-sales Inventory Control | Zifan Liu et.al. | 2406.18351 | null |
| 2024-06-26 | AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations | Adam Dahlgren Lindström et.al. | 2406.18346 | null |
| 2024-06-26 | Spatial-temporal Hierarchical Reinforcement Learning for Interpretable Pathology Image Super-Resolution | Wenting Chen et.al. | 2406.18310 | link |
| 2024-06-26 | Combining Automated Optimisation of Hyperparameters and Reward Shape | Julian Dierkes et.al. | 2406.18293 | link |
| 2024-06-26 | Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems | Italo Luis da Silva et.al. | 2406.18245 | link |
| 2024-06-25 | EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data | Jesse Zhang et.al. | 2406.17768 | null |
| 2024-06-25 | When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning | Claas Voelcker et.al. | 2406.17718 | null |
| 2024-06-25 | Privacy Preserving Reinforcement Learning for Population Processes | Samuel Yang-Zhao et.al. | 2406.17649 | null |
| 2024-06-25 | KANQAS: Kolmogorov Arnold Network for Quantum Architecture Search | Akash Kundu et.al. | 2406.17630 | link |
| 2024-06-25 | Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations | Cheng Wang et.al. | 2406.17576 | null |
| 2024-06-25 | On the consistency of hyper-parameter selection in value-based deep reinforcement learning | Johan Obando-Ceron et.al. | 2406.17523 | null |
| 2024-06-25 | BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO | Sebastian Dittert et.al. | 2406.17490 | null |
| 2024-06-25 | CuDA2: An approach for Incorporating Traitor Agents into Cooperative Multi-Agent Systems | Zhen Chen et.al. | 2406.17425 | null |
| 2024-06-25 | Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning | Tianfu Wang et.al. | 2406.17334 | link |
| 2024-06-25 | The State-Action-Reward-State-Action Algorithm in Spatial Prisoner’s Dilemma Game | Lanyu Yang et.al. | 2406.17326 | null |
| 2024-06-24 | Confidence Aware Inverse Constrained Reinforcement Learning | Sriram Ganapathi Subramanian et.al. | 2406.16782 | null |
| 2024-06-24 | WARP: On the Benefits of Weight Averaged Rewarded Policies | Alexandre Ramé et.al. | 2406.16768 | null |
| 2024-06-24 | The MRI Scanner as a Diagnostic: Image-less Active Sampling | Yuning Du et.al. | 2406.16754 | null |
| 2024-06-24 | OCALM: Object-Centric Assessment with Language Models | Timo Kaufmann et.al. | 2406.16748 | null |
| 2024-06-24 | Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization | Zhengyue Zhao et.al. | 2406.16743 | null |
| 2024-06-24 | Probabilistic Subgoal Representations for Hierarchical Reinforcement learning | Vivienne Huiling Wang et.al. | 2406.16707 | null |
| 2024-06-24 | Decentralized RL-Based Data Transmission Scheme for Energy Efficient Harvesting | Rafaela Scaciota et.al. | 2406.16624 | null |
| 2024-06-24 | Towards Physically Talented Aerial Robots with Tactically Smart Swarm Behavior thereof: An Efficient Co-design Approach | Prajit KrisshnaKumar et.al. | 2406.16612 | null |
| 2024-06-24 | $\text{Alpha}^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning | Feng Xu et.al. | 2406.16505 | link |
| 2024-06-24 | Towards Comprehensive Preference Data Collection for Reward Modeling | Yulan Hu et.al. | 2406.16486 | null |
| 2024-06-21 | MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | Xuan He et.al. | 2406.15252 | null |
| 2024-06-21 | Open Problem: Order Optimal Regret Bounds for Kernel-Based Reinforcement Learning | Sattar Vakili et.al. | 2406.15250 | null |
| 2024-06-21 | Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting | Jiyong Oh et.al. | 2406.15225 | null |
| 2024-06-21 | Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks | Alex Quach et.al. | 2406.15149 | null |
| 2024-06-21 | KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty | Philipp Becker et.al. | 2406.15131 | null |
| 2024-06-21 | A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning | Gianluca Drappo et.al. | 2406.15124 | null |
| 2024-06-21 | Towards General Negotiation Strategies with End-to-End Reinforcement Learning | Bram M. Renting et.al. | 2406.15096 | null |
| 2024-06-21 | KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning | Jiahan Chen et.al. | 2406.15073 | null |
| 2024-06-21 | Behaviour Distillation | Andrei Lupu et.al. | 2406.15042 | link |
| 2024-06-21 | SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning | Matthias Weissenbacher et.al. | 2406.15025 | null |
| 2024-06-20 | CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics | Jiawei Gao et.al. | 2406.14558 | null |
| 2024-06-20 | MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading | Chuqiao Zong et.al. | 2406.14537 | link |
| 2024-06-20 | RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold | Amrith Setlur et.al. | 2406.14532 | link |
| 2024-06-20 | Learning telic-controllable state representations | Nadav Amir et.al. | 2406.14476 | null |
| 2024-06-20 | Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue | Huifang Du et.al. | 2406.14457 | null |
| 2024-06-20 | Revealing the learning process in reinforcement learning agents through attention-oriented metrics | Charlotte Beylier et.al. | 2406.14324 | null |
| 2024-06-20 | Resource Optimization for Tail-Based Control in Wireless Networked Control Systems | Rasika Vijithasena et.al. | 2406.14301 | null |
| 2024-06-21 | REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability | Shuang Ao et.al. | 2406.14214 | link |
| 2024-06-20 | Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning | Amit Sharma et.al. | 2406.14169 | null |
| 2024-06-20 | Iterative Sizing Field Prediction for Adaptive Mesh Generation From Expert Demonstrations | Niklas Freymuth et.al. | 2406.14161 | link |
| 2024-06-18 | Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Haoxiang Wang et.al. | 2406.12845 | link |
| 2024-06-18 | Injection Optimization at Particle Accelerators via Reinforcement Learning: From Simulation to Real-World Application | Awal Awal et.al. | 2406.12735 | null |
| 2024-06-18 | A Systematization of the Wagner Framework: Graph Theory Conjectures and Reinforcement Learning | Flora Angileri et.al. | 2406.12667 | null |
| 2024-06-18 | Reinforcement-Learning based routing for packet-optical networks with hybrid telemetry | A. L. García Navarro et.al. | 2406.12602 | null |
| 2024-06-18 | Discovering Minimal Reinforcement Learning Environments | Jarek Liesen et.al. | 2406.12589 | null |
| 2024-06-18 | RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation | Shuting Wang et.al. | 2406.12566 | null |
| 2024-06-18 | A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo | Miguel Vasco et.al. | 2406.12563 | null |
| 2024-06-18 | Offline Imitation Learning with Model-based Reverse Augmentation | Jie-Jing Shao et.al. | 2406.12550 | null |
| 2024-06-18 | Demonstrating Agile Flight from Pixels without State Estimation | Ismail Geles et.al. | 2406.12505 | null |
| 2024-06-18 | Autonomous navigation of catheters and guidewires in mechanical thrombectomy using inverse reinforcement learning | Harry Robertshaw et.al. | 2406.12499 | null |
| 2024-06-17 | WPO: Enhancing RLHF with Weighted Preference Optimization | Wenxuan Zhou et.al. | 2406.11827 | link |
| 2024-06-17 | Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics | Runzhe Wu et.al. | 2406.11810 | null |
| 2024-06-17 | Run Time Assured Reinforcement Learning for Six Degree-of-Freedom Spacecraft Inspection | Kyle Dunlap et.al. | 2406.11795 | null |
| 2024-06-17 | FetchBench: A Simulation Benchmark for Robot Fetching | Beining Han et.al. | 2406.11793 | null |
| 2024-06-17 | Optimal Transport-Assisted Risk-Sensitive Q-Learning | Zahra Shahrooei et.al. | 2406.11774 | null |
| 2024-06-17 | Measuring memorization in RLHF for code completion | Aneesh Pappu et.al. | 2406.11715 | null |
| 2024-06-17 | The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation | Noah Golowich et.al. | 2406.11686 | null |
| 2024-06-17 | Communication-Efficient MARL for Platoon Stability and Energy-efficiency Co-optimization in Cooperative Adaptive Cruise Control of CAVs | Min Hua et.al. | 2406.11653 | null |
| 2024-06-17 | Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions | Noah Golowich et.al. | 2406.11640 | null |
| 2024-06-17 | Style Transfer with Multi-iteration Preference Optimization | Shuai Liu et.al. | 2406.11581 | null |
| 2024-06-14 | Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs | Rui Yang et.al. | 2406.10216 | null |
| 2024-06-14 | A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors | Naaman Tan et.al. | 2406.10203 | null |
| 2024-06-14 | Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication | Sanjali Yadav et.al. | 2406.10166 | null |
| 2024-06-14 | Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models | Carson Denison et.al. | 2406.10162 | link |
| 2024-06-14 | BiKC: Keypose-Conditioned Consistency Policy for Bimanual Robotic Manipulation | Dongjie Yu et.al. | 2406.10093 | null |
| 2024-06-14 | PRIMER: Perception-Aware Robust Learning-based Multiagent Trajectory Planner | Kota Kondo et.al. | 2406.10060 | null |
| 2024-06-14 | Bridging the Communication Gap: Artificial Agents Learning Sign Language through Imitation | Federico Tavella et.al. | 2406.10043 | null |
| 2024-06-14 | ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR | Vishwanath Pratap Singh et.al. | 2406.09999 | null |
| 2024-06-14 | Robust Model-Based Reinforcement Learning with an Adversarial Auxiliary Model | Siemen Herremans et.al. | 2406.09976 | link |
| 2024-06-14 | InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning | Tiancheng Li et.al. | 2406.09973 | null |
| 2024-06-13 | Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms | Miaosen Zhang et.al. | 2406.09397 | null |
| 2024-06-13 | Is Value Learning Really the Main Bottleneck in Offline RL? | Seohong Park et.al. | 2406.09329 | null |
| 2024-06-13 | OpenVLA: An Open-Source Vision-Language-Action Model | Moo Jin Kim et.al. | 2406.09246 | null |
| 2024-06-13 | AutomaChef: A Physics-informed Demonstration-guided Learning Framework for Granular Material Manipulation | Minglun Wei et.al. | 2406.09178 | null |
| 2024-06-13 | Direct Imitation Learning-based Visual Servoing using the Large Projection Formulation | Sayantan Auddy et.al. | 2406.09120 | null |
| 2024-06-13 | Adaptive Actor-Critic Based Optimal Regulation for Drift-Free Uncertain Nonlinear Systems | Ashwin P. Dani et.al. | 2406.09097 | null |
| 2024-06-13 | DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning | Xuemin Hu et.al. | 2406.09089 | null |
| 2024-06-13 | Data-driven modeling and supervisory control system optimization for plug-in hybrid electric vehicles | Hao Zhang et.al. | 2406.09082 | null |
| 2024-06-13 | Latent Assistance Networks: Rediscovering Hyperbolic Tangents in RL | Jacob E. Kooi et.al. | 2406.09079 | null |
| 2024-06-13 | Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation | Claude Formanek et.al. | 2406.09068 | null |
| 2024-06-12 | RILe: Reinforced Imitation Learning | Mert Albaba et.al. | 2406.08472 | null |
| 2024-06-12 | Adaptive Swarm Mesh Refinement using Deep Reinforcement Learning with Local Rewards | Niklas Freymuth et.al. | 2406.08440 | null |
| 2024-06-12 | RRLS : Robust Reinforcement Learning Suite | Adil Zouitine et.al. | 2406.08406 | link |
| 2024-06-12 | Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning | Yuhui Wang et.al. | 2406.08404 | null |
| 2024-06-12 | Time-Constrained Robust MDPs | Adil Zouitine et.al. | 2406.08395 | null |
| 2024-06-12 | Residual Learning and Context Encoding for Adaptive Offline-to-Online Reinforcement Learning | Mohammadreza Nakhaei et.al. | 2406.08238 | link |
| 2024-06-12 | MaIL: Improving Imitation Learning with Mamba | Xiaogang Jia et.al. | 2406.08234 | null |
| 2024-06-12 | Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning | Max Weltevrede et.al. | 2406.08069 | null |
| 2024-06-12 | Deep reinforcement learning with positional context for intraday trading | Sven Goluža et.al. | 2406.08013 | null |
| 2024-06-12 | Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning | Yizhe Huang et.al. | 2406.08002 | null |
| 2024-06-11 | CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning | Zeyuan Liu et.al. | 2406.07541 | null |
| 2024-06-11 | BAKU: An Efficient Transformer for Multi-Task Policy Learning | Siddhant Haldar et.al. | 2406.07539 | null |
| 2024-06-11 | Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis | Qining Zhang et.al. | 2406.07455 | null |
| 2024-06-11 | Enhanced Gene Selection in Single-Cell Genomics: Pre-Filtering Synergy and Reinforced Optimization | Weiliang Zhang et.al. | 2406.07418 | null |
| 2024-06-11 | Federated Multi-Agent DRL for Radio Resource Management in Industrial 6G in-X subnetworks | Bjarke Madsen et.al. | 2406.07383 | null |
| 2024-06-11 | World Models with Hints of Large Language Models for Goal Achieving | Zeyuan Liu et.al. | 2406.07381 | null |
| 2024-06-11 | EdgeTimer: Adaptive Multi-Timescale Scheduling in Mobile Edge Computing with Deep Reinforcement Learning | Yijun Hao et.al. | 2406.07342 | null |
| 2024-06-11 | Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling | Constantin Waubert de Puiseau et.al. | 2406.07325 | null |
| 2024-06-11 | Multi-objective Reinforcement learning from AI Feedback | Marcus Williams et.al. | 2406.07295 | null |
| 2024-06-11 | Hybrid Reinforcement Learning from Offline Observation Alone | Yuda Song et.al. | 2406.07253 | null |
| 2024-06-10 | Verification-Guided Shielding for Deep Reinforcement Learning | Davide Corsi et.al. | 2406.06507 | null |
| 2024-06-10 | Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation | Mohidul Haque Mridul et.al. | 2406.06500 | null |
| 2024-06-10 | Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity | Calarina Muslimani et.al. | 2406.06495 | null |
| 2024-06-10 | Towards Real-World Efficiency: Domain Randomization in Reinforcement Learning for Pre-Capture of Free-Floating Moving Targets by Autonomous Robots | Bahador Beigomi et.al. | 2406.06460 | link |
| 2024-06-10 | Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? | Denis Tarasov et.al. | 2406.06309 | link |
| 2024-06-10 | Learning-based cognitive architecture for enhancing coordination in human groups | Antonio Grotta et.al. | 2406.06297 | null |
| 2024-06-10 | Deep Multi-Objective Reinforcement Learning for Utility-Based Infrastructural Maintenance Optimization | Jesse van Remmerden et.al. | 2406.06184 | null |
| 2024-06-10 | Mastering truss structure optimization with tree search | Gabriel E. Garayalde et.al. | 2406.06145 | null |
| 2024-06-10 | EXPIL: Explanatory Predicate Invention for Learning in Games | Jingyuan Sha et.al. | 2406.06107 | null |
| 2024-06-10 | Sim-To-Real Transfer for Visual Reinforcement Learning of Deformable Object Manipulation for Robot-Assisted Surgery | Paul Maria Scheikl et.al. | 2406.06092 | null |
| 2024-06-07 | LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration | Tavor Lipman et.al. | 2406.05107 | null |
| 2024-06-07 | Massively Multiagent Minigames for Training Generalist Agents | Kyoung Whan Choe et.al. | 2406.05071 | link |
| 2024-06-07 | Online Frequency Scheduling by Learning Parallel Actions | Anastasios Giovanidis et.al. | 2406.05041 | null |
| 2024-06-07 | Optimizing Automatic Differentiation with Deep Reinforcement Learning | Jamie Lohoff et.al. | 2406.05027 | null |
| 2024-06-07 | Designs for Enabling Collaboration in Human-Machine Teaming via Interactive and Explainable Systems | Rohan Paleja et.al. | 2406.05003 | null |
| 2024-06-07 | SLOPE: Search with Learned Optimal Pruning-based Expansion | Davor Bokan et.al. | 2406.04935 | link |
| 2024-06-07 | Sim-to-real Transfer of Deep Reinforcement Learning Agents for Online Coverage Path Planning | Arvi Jonnarth et.al. | 2406.04920 | null |
| 2024-06-07 | Online Adaptation for Enhancing Imitation Learning Policies | Federico Malato et.al. | 2406.04913 | link |
| 2024-06-07 | Stabilizing Extreme Q-learning by Maclaurin Expansion | Motoki Omura et.al. | 2406.04896 | null |
| 2024-06-07 | Primitive Agentic First-Order Optimization | R. Sala et.al. | 2406.04841 | null |
| 2024-06-06 | ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories | Qianlan Yang et.al. | 2406.04323 | null |
| 2024-06-06 | Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models | Xiang Ji et.al. | 2406.04274 | null |
| 2024-06-06 | Multi-Agent Imitation Learning: Value is Easy, Regret is Hard | Jingwu Tang et.al. | 2406.04219 | null |
| 2024-06-06 | Aligning Agents like Large Language Models | Adam Jelley et.al. | 2406.04208 | null |
| 2024-06-06 | MARLander: A Local Path Planning for Drone Swarms using Multiagent Deep Reinforcement Learning | Demetros Aschu et.al. | 2406.04159 | null |
| 2024-06-06 | Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning | Abdullah Akgül et.al. | 2406.04088 | null |
| 2024-06-06 | Bootstrapping Expectiles in Reinforcement Learning | Pierre Clavier et.al. | 2406.04081 | null |
| 2024-06-06 | Spatio-temporal Early Prediction based on Multi-objective Reinforcement Learning | Wei Shao et.al. | 2406.04035 | link |
| 2024-06-06 | Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents | Yoann Poupart et.al. | 2406.04028 | link |
| 2024-06-06 | HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning | Quentin Delfosse et.al. | 2406.03997 | link |
| 2024-06-05 | Automating Turkish Educational Quiz Generation Using Large Language Models | Kamyar Zeinalipour et.al. | 2406.03397 | null |
| 2024-06-05 | LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback | Timon Ziegenbein et.al. | 2406.03363 | link |
| 2024-06-05 | UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning | Yu Zhang et.al. | 2406.03324 | null |
| 2024-06-05 | Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning | Mohamed Elsayed et.al. | 2406.03276 | null |
| 2024-06-05 | Prompt-based Visual Alignment for Zero-shot Policy Transfer | Haihan Gao et.al. | 2406.03250 | null |
| 2024-06-05 | Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning | Inwoo Hwang et.al. | 2406.03234 | link |
| 2024-06-05 | CommonPower: Supercharging Machine Learning for Smart Grids | Michael Eichelbeck et.al. | 2406.03231 | link |
| 2024-06-05 | Object Manipulation in Marine Environments using Reinforcement Learning | Ahmed Nader et.al. | 2406.03223 | null |
| 2024-06-05 | Adaptive Distance Functions via Kelvin Transformation | Rafael I. Cabral Muchacho et.al. | 2406.03200 | null |
| 2024-06-05 | DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays | Bo Xia et.al. | 2406.03102 | null |
| 2024-06-04 | RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots | Soroush Nasiriany et.al. | 2406.02523 | link |
| 2024-06-04 | Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs | Filippo Valdettaro et.al. | 2406.02456 | null |
| 2024-06-04 | A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies | Md Mirajul Islam et.al. | 2406.02450 | null |
| 2024-06-04 | Algorithmic Collusion in Dynamic Pricing with Deep Reinforcement Learning | Shidi Deng et.al. | 2406.02437 | null |
| 2024-06-04 | Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Philip Anastassiou et.al. | 2406.02430 | link |
| 2024-06-04 | Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning | Jiaxu Wang et.al. | 2406.02370 | null |
| 2024-06-04 | How to Explore with Belief: State Entropy Maximization in POMDPs | Riccardo Zamboni et.al. | 2406.02295 | null |
| 2024-06-04 | Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling | Arthur Müller et.al. | 2406.02294 | null |
| 2024-06-04 | Test-Time Regret Minimization in Meta Reinforcement Learning | Mirco Mutti et.al. | 2406.02282 | null |
| 2024-06-04 | Reinforcement Learning with Lookahead Information | Nadav Merlis et.al. | 2406.02258 | null |
| 2024-05-31 | Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF | Tengyang Xie et.al. | 2405.21046 | null |
| 2024-05-31 | Direct Alignment of Language Models via Quality-Aware Self-Refinement | Runsheng Yu et.al. | 2405.21040 | null |
| 2024-06-03 | Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles | Jiesong Lian et.al. | 2405.21027 | null |
| 2024-05-31 | Generating Triangulations and Fibrations with Reinforcement Learning | Per Berglund et.al. | 2405.21017 | null |
| 2024-05-31 | Bayesian Design Principles for Offline-to-Online Reinforcement Learning | Hao Hu et.al. | 2405.20984 | null |
| 2024-05-31 | Goal-Oriented Sensor Reporting Scheduling for Non-linear Dynamic System Monitoring | Prasoon Raghuwanshi et.al. | 2405.20983 | null |
| 2024-05-31 | SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Tianyang Xu et.al. | 2405.20974 | link |
| 2024-05-31 | Amortizing intractable inference in diffusion models for vision, language, and control | Siddarth Venkatraman et.al. | 2405.20971 | link |
| 2024-05-31 | Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation | Shangding Gu et.al. | 2405.20860 | null |
| 2024-05-31 | Improving Reward Models with Synthetic Critiques | Zihuiwen Ye et.al. | 2405.20850 | null |
| 2024-05-30 | Group Robust Preference Optimization in Reward-free RLHF | Shyam Sundhar Ramesh et.al. | 2405.20304 | link |
| 2024-05-30 | Evaluating Large Language Model Biases in Persona-Steered Generation | Andy Liu et.al. | 2405.20253 | link |
| 2024-05-30 | InstructionCP: A fast approach to transfer Large Language Models into target language | Kuang-Ming Chen et.al. | 2405.20175 | null |
| 2024-05-30 | Enhancing Battlefield Awareness: An Aerial RIS-assisted ISAC System with Deep Reinforcement Learning | Hyunsang Cho et.al. | 2405.20168 | null |
| 2024-05-30 | Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation | Wooseong Cho et.al. | 2405.20165 | null |
| 2024-05-30 | NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models | Kai Wu et.al. | 2405.20081 | null |
| 2024-05-30 | Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads | Avelina Asada Hadji-Kyriacou et.al. | 2405.20053 | link |
| 2024-05-30 | Deep Reinforcement Learning for Intrusion Detection in IoT: A Survey | Afrah Gueriani et.al. | 2405.20038 | null |
| 2024-05-30 | Safe Multi-agent Reinforcement Learning with Natural Language Constraints | Ziyan Wang et.al. | 2405.20018 | null |
| 2024-05-30 | LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning | Hyungho Na et.al. | 2405.19998 | null |
| 2024-05-29 | Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | Shenao Zhang et.al. | 2405.19332 | link |
| 2024-05-29 | Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | Shicong Cen et.al. | 2405.19320 | null |
| 2024-05-29 | Robust Preference Optimization through Reward Model Distillation | Adam Fisch et.al. | 2405.19316 | null |
| 2024-05-29 | Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels | Abhay Deshpande et.al. | 2405.19307 | null |
| 2024-05-29 | Act Natural! Projecting Autonomous System Trajectories Into Naturalistic Behavior Sets | Hamzah I. Khan et.al. | 2405.19292 | null |
| 2024-05-29 | Rich-Observation Reinforcement Learning with Continuous Latent Dynamics | Yuda Song et.al. | 2405.19269 | null |
| 2024-05-29 | Exploring the impact of traffic signal control and connected and automated vehicles on intersections safety: A deep reinforcement learning approach | Amir Hossein Karbasi et.al. | 2405.19236 | null |
| 2024-05-29 | Diffusion-based Dynamics Models for Long-Horizon Rollout in Offline Reinforcement Learning | Hanye Zhao et.al. | 2405.19189 | null |
| 2024-05-29 | Conditional Latent ODEs for Motion Prediction in Autonomous Driving | Khang Truong Giang et.al. | 2405.19183 | null |
| 2024-05-29 | A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning | Arthur Juliani et.al. | 2405.19153 | null |
| 2024-05-28 | Hierarchical World Models as Visual Whole-Body Humanoid Controllers | Nicklas Hansen et.al. | 2405.18418 | null |
| 2024-05-28 | Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study | Shreyas Bhat et.al. | 2405.18324 | null |
| 2024-05-28 | Highway Reinforcement Learning | Yuhui Wang et.al. | 2405.18289 | null |
| 2024-05-28 | Extreme Value Monte Carlo Tree Search | Masataro Asai et.al. | 2405.18248 | null |
| 2024-05-28 | Recurrent Natural Policy Gradient for POMDPs | Semih Cayci et.al. | 2405.18221 | null |
| 2024-05-28 | Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving | Zhi Zheng et.al. | 2405.18209 | link |
| 2024-05-28 | Mutation-Bias Learning in Games | Johann Bauer et.al. | 2405.18190 | null |
| 2024-05-28 | Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding | Daniel Bethell et.al. | 2405.18180 | link |
| 2024-05-28 | Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing | Wei Zhao et.al. | 2405.18166 | link |
| 2024-05-28 | PyTAG: Tabletop Games for Multi-Agent Reinforcement Learning | Martin Balla et.al. | 2405.18123 | link |
| 2024-05-27 | A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning | Abdulaziz Almuzairee et.al. | 2405.17416 | null |
| 2024-05-27 | Rethinking Transformers in Solving POMDPs | Chenhao Lu et.al. | 2405.17358 | link |
| 2024-05-27 | Opinion-Guided Reinforcement Learning | Kyanna Dagenais et.al. | 2405.17287 | null |
| 2024-05-27 | DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems | Zhi Zheng et.al. | 2405.17272 | link |
| 2024-05-27 | Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning | Adriana Hugessen et.al. | 2405.17243 | null |
| 2024-05-27 | InsigHTable: Insight-driven Hierarchical Table Visualization with Reinforcement Learning | Guozheng Li et.al. | 2405.17229 | null |
| 2024-05-27 | Learning Generic and Dynamic Locomotion of Humanoids Across Discrete Terrains | Shangqun Yu et.al. | 2405.17227 | null |
| 2024-05-27 | Flow control of three-dimensional cylinders transitioning to turbulence via multi-agent reinforcement learning | P. Suárez et.al. | 2405.17210 | null |
| 2024-05-27 | CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control | Jingqing Ruan et.al. | 2405.17152 | link |
| 2024-05-27 | Q-value Regularized Transformer for Offline Reinforcement Learning | Shengchao Hu et.al. | 2405.17098 | null |
| 2024-05-24 | Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment | Hao Sun et.al. | 2405.15624 | null |
| 2024-05-24 | Neuromorphic dreaming: A pathway to efficient learning in artificial agents | Ingo Blakowski et.al. | 2405.15616 | null |
| 2024-05-24 | OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code | Maxence Faldor et.al. | 2405.15568 | link |
| 2024-05-24 | Learning Generalizable Human Motion Generator with Reinforcement Learning | Yunyao Mao et.al. | 2405.15541 | null |
| 2024-05-24 | Randomized algorithms and PAC bounds for inverse reinforcement learning in continuous spaces | Angeliki Kamoutsi et.al. | 2405.15509 | null |
| 2024-05-24 | Human-in-the-loop Reinforcement Learning for Data Quality Monitoring in Particle Physics Experiments | Olivia Jullian Parra et.al. | 2405.15508 | null |
| 2024-05-24 | TD3 Based Collision Free Motion Planning for Robot Navigation | Hao Liu et.al. | 2405.15460 | null |
| 2024-05-24 | Counterexample-Guided Repair of Reinforcement Learning Systems Using Safety Critics | David Boetius et.al. | 2405.15430 | null |
| 2024-05-24 | Model-free reinforcement learning with noisy actions for automated experimental control in optics | Lea Richtmann et.al. | 2405.15421 | null |
| 2024-05-24 | Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate | Fan-Ming Luo et.al. | 2405.15384 | null |
| 2024-05-23 | Privileged Sensing Scaffolds Reinforcement Learning | Edward S. Hu et.al. | 2405.14853 | null |
| 2024-05-23 | Axioms for AI Alignment from Human Feedback | Luise Ge et.al. | 2405.14758 | null |
| 2024-05-23 | AGILE: A Novel Framework of LLM Agents | Peiyuan Feng et.al. | 2405.14751 | link |
| 2024-05-23 | Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence | Minheng Xiao et.al. | 2405.14749 | null |
| 2024-05-23 | SimPO: Simple Preference Optimization with a Reference-Free Reward | Yu Meng et.al. | 2405.14734 | link |
| 2024-05-23 | Multi-turn Reinforcement Learning from Preference Human Feedback | Lior Shani et.al. | 2405.14655 | null |
| 2024-05-23 | Reinforcement Learning for Fine-tuning Text-to-speech Diffusion Models | Jingyi Chen et.al. | 2405.14632 | null |
| 2024-05-23 | Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences | Takuya Hiraoka et.al. | 2405.14629 | null |
| 2024-05-23 | Closed-form Symbolic Solutions: A New Perspective on Solving Partial Differential Equations | Shu Wei et.al. | 2405.14620 | null |
| 2024-05-23 | Discretization of continuous input spaces in the hippocampal autoencoder | Adrian F. Amil et.al. | 2405.14600 | null |
| 2024-05-21 | Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale | Shriram Chennakesavalu et.al. | 2405.12961 | null |
| 2024-05-21 | Effect of Synthetic Jets Actuator Parameters on Deep Reinforcement Learning-Based Flow Control Performance in a Square Cylinder | Wang Jia et.al. | 2405.12834 | null |
| 2024-05-21 | Deep Reinforcement Learning for Time-Critical Wilderness Search And Rescue Using Drones | Jan-Hendrik Ewers et.al. | 2405.12800 | null |
| 2024-05-21 | Generative AI and Large Language Models for Cyber Security: All Insights You Need | Mohamed Amine Ferrag et.al. | 2405.12750 | null |
| 2024-05-21 | Reinforcement Learning Enabled Peer-to-Peer Energy Trading for Dairy Farms | Mian Ibad Ali Shah et.al. | 2405.12716 | null |
| 2024-05-21 | A Multimodal Learning-based Approach for Autonomous Landing of UAV | Francisco Neves et.al. | 2405.12681 | null |
| 2024-05-21 | Learning Causal Dynamics Models in Object-Oriented Environments | Zhongwei Yu et.al. | 2405.12615 | null |
| 2024-05-21 | PhiBE: A PDE-based Bellman Equation for Continuous Time Policy Evaluation | Yuhua Zhu et.al. | 2405.12535 | null |
| 2024-05-21 | GASE: Graph Attention Sampling with Edges Fusion for Solving Vehicle Routing Problems | Zhenwei Wang et.al. | 2405.12475 | null |
| 2024-05-21 | Physics-based Scene Layout Generation from Human Motion | Jianan Li et.al. | 2405.12460 | null |
| 2024-05-20 | Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? | Yang Dai et.al. | 2405.12094 | null |
| 2024-05-20 | PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation | Zhuobin Huang et.al. | 2405.12079 | null |
| 2024-05-20 | Scrutinize What We Ignore: Reining Task Representation Shift In Context-Based Offline Meta Reinforcement Learning | Hai Zhang et.al. | 2405.12001 | null |
| 2024-05-20 | Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space | Qianmei Liu et.al. | 2405.11982 | null |
| 2024-05-20 | A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers | Tom Roth et.al. | 2405.11904 | null |
| 2024-05-20 | Intuitive Fine-Tuning: Towards Unifying SFT and RLHF into a Single Process | Ermo Hua et.al. | 2405.11870 | link |
| 2024-05-20 | Reward-Punishment Reinforcement Learning with Maximum Entropy | Jiexin Wang et.al. | 2405.11784 | null |
| 2024-05-20 | Efficient Multi-agent Reinforcement Learning by Planning | Qihan Liu et.al. | 2405.11778 | link |
| 2024-05-20 | Learning Future Representation with Synthetic Observations for Sample-efficient Reinforcement Learning | Xin Liu et.al. | 2405.11740 | null |
| 2024-05-20 | Highway Graph to Accelerate Reinforcement Learning | Zidu Yin et.al. | 2405.11727 | link |
| 2024-05-17 | Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review | Hongyi Yang et.al. | 2405.10883 | null |
| 2024-05-17 | Automated Radiology Report Generation: A Review of Recent Advances | Phillip Sloan et.al. | 2405.10842 | null |
| 2024-05-17 | Combining Teacher-Student with Representation Learning: A Concurrent Teacher-Student Reinforcement Learning Paradigm for Legged Locomotion | Hongxi Wang et.al. | 2405.10830 | null |
| 2024-05-17 | Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities | Hao Zhou et.al. | 2405.10825 | null |
| 2024-05-17 | A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization | Andrzej Ruszczyński et.al. | 2405.10815 | null |
| 2024-05-17 | SignLLM: Sign Languages Production Large Language Models | Sen Fang et.al. | 2405.10718 | null |
| 2024-05-17 | Sample-Efficient Constrained Reinforcement Learning with General Parameterization | Washim Uddin Mondal et.al. | 2405.10624 | null |
| 2024-05-17 | An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems | Jiyue Tao et.al. | 2405.10576 | null |
| 2024-05-17 | Time-Varying Constraint-Aware Reinforcement Learning for Energy Storage Control | Jaeik Jeong et.al. | 2405.10536 | null |
| 2024-05-17 | Towards Better Question Generation in QA-Based Event Extraction | Zijin Hong et.al. | 2405.10517 | null |
| 2024-05-16 | Stochastic Q-learning for Large Discrete Action Spaces | Fares Fourati et.al. | 2405.10310 | null |
| 2024-05-16 | Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | Yuexiang Zhai et.al. | 2405.10292 | null |
| 2024-05-16 | Keep It Private: Unsupervised Privatization of Online Text | Calvin Bao et.al. | 2405.10260 | link |
| 2024-05-16 | A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy | Zhaoxing Li et.al. | 2405.10214 | null |
| 2024-05-16 | Continuous Transfer Learning for UAV Communication-aware Trajectory Design | Chenrui Sun et.al. | 2405.10087 | null |
| 2024-05-16 | Optimizing Search and Rescue UAV Connectivity in Challenging Terrain through Multi Q-Learning | Mohammed M. H. Qazzaz et.al. | 2405.10042 | null |
| 2024-05-16 | Reward Centering | Abhishek Naik et.al. | 2405.09999 | null |
| 2024-05-16 | Combining RL and IL using a dynamic, performance-based modulation over learning signals and its application to local planning | Francisco Leiva et.al. | 2405.09760 | null |
| 2024-05-16 | NIFTY Financial News Headlines Dataset | Raeid Saqur et.al. | 2405.09747 | null |
| 2024-05-15 | Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning | Sihan Zeng et.al. | 2405.09660 | null |
| 2024-05-15 | Reinforcement Learning-Based Framework for the Intelligent Adaptation of User Interfaces | Daniel Gaspar-Figueiredo et.al. | 2405.09255 | null |
| 2024-05-15 | DVS-RG: Differential Variable Speed Limits Control using Deep Reinforcement Learning with Graph State Representation | Jingwen Yang et.al. | 2405.09163 | null |
| 2024-05-15 | CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving | Dechen Gao et.al. | 2405.09111 | null |
| 2024-05-15 | Chaos-based reinforcement learning with TD3 | Toshitaka Matsuki et.al. | 2405.09086 | null |
| 2024-05-15 | Deep Learning in Earthquake Engineering: A Comprehensive Review | Yazhou Xie et.al. | 2405.09021 | null |
| 2024-05-14 | Large Language Models for Human-Machine Collaborative Particle Accelerator Tuning through Natural Language | Jan Kaiser et.al. | 2405.08888 | null |
| 2024-05-14 | Stable Inverse Reinforcement Learning: Policies from Control Lyapunov Landscapes | Samuel Tesfazgi et.al. | 2405.08756 | null |
| 2024-05-14 | Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach | Urvij Saroliya et.al. | 2405.08754 | null |
| 2024-05-14 | Reinformer: Max-Return Sequence Modeling for offline RL | Zifeng Zhuang et.al. | 2405.08740 | null |
| 2024-05-14 | I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning | Yashuai Yan et.al. | 2405.08726 | null |
| 2024-05-15 | Enhancing Reinforcement Learning in Sensor Fusion: A Comparative Analysis of Cubature and Sampling-based Integration Methods for Rover Search Planning | Jan-Hendrik Ewers et.al. | 2405.08691 | null |
| 2024-05-14 | A Distributed Approach to Autonomous Intersection Management via Multi-Agent Reinforcement Learning | Matteo Cederle et.al. | 2405.08655 | link |
| 2024-05-14 | vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement | Yiwen Zhu et.al. | 2405.08638 | null |
| 2024-05-14 | Optimizing Deep Reinforcement Learning for American Put Option Hedging | Reilly Pickard et.al. | 2405.08602 | null |
| 2024-05-14 | Python-Based Reinforcement Learning on Simulink Models | Georg Schäfer et.al. | 2405.08567 | null |
| 2024-05-14 | Growing Artificial Neural Networks for Control: the Role of Neuronal Diversity | Eleni Nisioti et.al. | 2405.08510 | null |
| 2024-05-13 | Hierarchical Decision Mamba | André Correia et.al. | 2405.07943 | link |
| 2024-05-13 | RLHF Workflow: From Reward Modeling to Online RLHF | Hanze Dong et.al. | 2405.07863 | link |
| 2024-05-13 | Adaptive Exploration for Data-Efficient General Value Function Evaluations | Arushi Jain et.al. | 2405.07838 | null |
| 2024-05-13 | Fixed Point Theory Analysis of a Lambda Policy Iteration with Randomization for the Ćirić Contraction Operator | Abdelkader Belhenniche et.al. | 2405.07824 | null |
| 2024-05-13 | Hamiltonian-based Quantum Reinforcement Learning for Neural Combinatorial Optimization | Georg Kruse et.al. | 2405.07790 | null |
| 2024-05-13 | Hype or Heuristic? Quantum Reinforcement Learning for Join Order Optimisation | Maja Franz et.al. | 2405.07770 | null |
| 2024-05-13 | CAGES: Cost-Aware Gradient Entropy Search for Efficient Local Multi-Fidelity Bayesian Optimization | Wei-Ting Tang et.al. | 2405.07760 | null |
| 2024-05-13 | MADRL-Based Rate Adaptation for 360$\degree$ Video Streaming with Multi-Viewpoint Prediction | Haopeng Wang et.al. | 2405.07759 | null |
| 2024-05-13 | Neural Network Compression for Reinforcement Learning Tasks | Dmitry A. Ivanov et.al. | 2405.07748 | null |
| 2024-05-13 | Backdoor Removal for Generative Large Language Models | Haoran Li et.al. | 2405.07667 | null |
| 2024-05-10 | Value Augmented Sampling for Language Model Alignment and Personalization | Seungwook Han et.al. | 2405.06639 | link |
| 2024-05-10 | EcoEdgeTwin: Enhanced 6G Network via Mobile Edge Computing and Digital Twin Integration | Synthia Hossain Karobi et.al. | 2405.06507 | null |
| 2024-05-10 | Advantageous and disadvantageous inequality aversion can be taught through vicarious learning of others’ preferences | Shen Zhang et.al. | 2405.06500 | null |
| 2024-05-10 | Contextual Affordances for Safe Exploration in Robotic Scenarios | William Z. Ye et.al. | 2405.06422 | null |
| 2024-05-10 | Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs | Davide Maran et.al. | 2405.06363 | null |
| 2024-05-10 | Learning Latent Dynamic Robust Representations for World Models | Ruixiang Sun et.al. | 2405.06263 | link |
| 2024-05-10 | Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning | Xiaoyu Wen et.al. | 2405.06192 | link |
| 2024-05-10 | (A Partial Survey of) Decentralized, Cooperative Multi-Agent Reinforcement Learning | Christopher Amato et.al. | 2405.06161 | null |
| 2024-05-09 | An RNN-policy gradient approach for quantum architecture search | Gang Wang et.al. | 2405.05892 | null |
| 2024-05-09 | Safe Exploration Using Bayesian World Models and Log-Barrier Optimization | Yarden As et.al. | 2405.05890 | null |
| 2024-05-09 | ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers | Liangliang Chen et.al. | 2405.05861 | null |
| 2024-05-09 | Policy Gradient with Active Importance Sampling | Matteo Papini et.al. | 2405.05630 | null |
| 2024-05-09 | An Automatic Prompt Generation System for Tabular Data Tasks | Ashlesha Akella et.al. | 2405.05618 | null |
| 2024-05-09 | Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning | Yuchen Shi et.al. | 2405.05542 | link |
| 2024-05-08 | Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data | Kishan Panaganti et.al. | 2405.05468 | null |
| 2024-05-08 | Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management | Gang Hu et.al. | 2405.05449 | null |
| 2024-05-08 | Learning to Play Pursuit-Evasion with Dynamic and Sensor Constraints | Burak M. Gonultas et.al. | 2405.05372 | null |
| 2024-05-08 | Offline Model-Based Optimization via Policy-Guided Gradient Search | Yassine Chemingui et.al. | 2405.05349 | link |
| 2024-05-08 | Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models | Aylin Gunal et.al. | 2405.05060 | null |
| 2024-05-08 | Fault Identification Enhancement with Reinforcement Learning (FIERL) | Valentina Zaccaria et.al. | 2405.04938 | link |
| 2024-05-07 | RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes | Kyle Stachowicz et.al. | 2405.04714 | null |
| 2024-05-07 | Proximal Policy Optimization with Adaptive Exploration | Andrei Lixandru et.al. | 2405.04664 | null |
| 2024-05-07 | ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | Albert Bou et.al. | 2405.04657 | link |
| 2024-05-07 | TorchDriveEnv: A Reinforcement Learning Benchmark for Autonomous Driving with Reactive, Realistic, and Diverse Non-Playable Characters | Jonathan Wilder Lavington et.al. | 2405.04491 | null |
| 2024-05-07 | Designing, Developing, and Validating Network Intelligence for Scaling in Service-Based Architectures based on Deep Reinforcement Learning | Paola Soto et.al. | 2405.04441 | null |
| 2024-05-08 | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | DeepSeek-AI et.al. | 2405.04434 | link |
| 2024-05-07 | The Curse of Diversity in Ensemble-Based Exploration | Zhixuan Lin et.al. | 2405.04342 | link |
| 2024-05-07 | Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation | Atharvan Dogra et.al. | 2405.04325 | null |
| 2024-05-07 | Genetic Drift Regularization: on preventing Actor Injection from breaking Evolution Strategies | Paul Templier et.al. | 2405.04322 | null |
| 2024-05-07 | Improving Offline Reinforcement Learning with Inaccurate Simulators | Yiwen Hou et.al. | 2405.04307 | null |
| 2024-05-07 | Deep Reinforcement Learning for Multi-User RF Charging with Non-linear Energy Harvesters | Amirhossein Azarbahram et.al. | 2405.04218 | null |
| 2024-05-07 | In-context Learning for Automated Driving Scenarios | Ziqi Zhou et.al. | 2405.04135 | null |
| 2024-05-07 | Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning | Chunlin Tian et.al. | 2405.04122 | null |
| 2024-05-06 | $ε$-Policy Gradient for Online Pricing | Lukasz Szpruch et.al. | 2405.03624 | null |
| 2024-05-06 | Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions | Xingyou Song et.al. | 2405.03547 | null |
| 2024-05-06 | ReinWiFi: A Reinforcement-Learning-Based Framework for the Application-Layer QoS Optimization of WiFi Networks | Qianren Li et.al. | 2405.03526 | null |
| 2024-05-06 | Robotic Constrained Imitation Learning for the Peg Transfer Task in Fundamentals of Laparoscopic Surgery | Kento Kawaharazuka et.al. | 2405.03440 | null |
| 2024-05-06 | Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning | Stone Tao et.al. | 2405.03379 | null |
| 2024-05-06 | Enhancing Q-Learning with Large Language Model Heuristics | Xiefeng Wu et.al. | 2405.03341 | null |
| 2024-05-06 | Artificial Intelligence in the Autonomous Navigation of Endovascular Interventions: A Systematic Review | Harry Robertshaw et.al. | 2405.03305 | null |
| 2024-05-06 | End-to-End Reinforcement Learning of Curative Curtailment with Partial Measurement Availability | Hinrikus Wolf et.al. | 2405.03262 | null |
| 2024-05-06 | Federated Reinforcement Learning with Constraint Heterogeneity | Hao Jin et.al. | 2405.03236 | null |
| 2024-05-06 | Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning | Caleb Chuck et.al. | 2405.03113 | null |
| 2024-05-03 | Geometric Fabrics: a Safe Guiding Medium for Policy Learning | Karl Van Wyk et.al. | 2405.02250 | null |
| 2024-05-03 | Learning Optimal Deterministic Policies with Stochastic Policy Gradients | Alessandro Montenegro et.al. | 2405.02235 | null |
| 2024-05-03 | The Cambridge RoboMaster: An Agile Multi-Robot Research Platform | Jan Blumenkamp et.al. | 2405.02198 | null |
| 2024-05-03 | Imitation Learning in Discounted Linear MDPs without exploration assumptions | Luca Viano et.al. | 2405.02181 | null |
| 2024-05-03 | Simulating the economic impact of rationality through reinforcement learning and agent-based modelling | Simone Brusatin et.al. | 2405.02161 | null |
| 2024-05-03 | Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach | Anton Plaksin et.al. | 2405.02044 | null |
| 2024-05-03 | Model-based reinforcement learning for protein backbone design | Frederic Renard et.al. | 2405.01983 | null |
| 2024-05-03 | Rescale-Invariant Federated Reinforcement Learning for Resource Allocation in V2X Networks | Kaidi Xu et.al. | 2405.01961 | null |
| 2024-05-03 | Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization | Changliang Zhou et.al. | 2405.01906 | null |
| 2024-05-03 | Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants | Francesco Maldonato et.al. | 2405.01889 | link |
| 2024-05-02 | Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks | Murtaza Dalal et.al. | 2405.01534 | null |
| 2024-05-02 | FLAME: Factuality-Aware Alignment for Large Language Models | Sheng-Chieh Lin et.al. | 2405.01525 | null |
| 2024-05-02 | NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | Gerald Shen et.al. | 2405.01481 | link |
| 2024-05-02 | IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning | Ryan Hoque et.al. | 2405.01472 | null |
| 2024-05-02 | Goal-conditioned reinforcement learning for ultrasound navigation guidance | Abdoul Aziz Amadou et.al. | 2405.01409 | null |
| 2024-05-02 | Learning Force Control for Legged Manipulation | Tifanny Portela et.al. | 2405.01402 | null |
| 2024-05-02 | Constrained Reinforcement Learning Under Model Mismatch | Zhongchang Sun et.al. | 2405.01327 | null |
| 2024-05-02 | Non-iterative Optimization of Trajectory and Radio Resource for Aerial Network | Hyeonsu Lyu et.al. | 2405.01314 | null |
| 2024-05-02 | Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning | Liu Qiyuan et.al. | 2405.01284 | null |
| 2024-05-02 | Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation | Hao Wang et.al. | 2405.01280 | null |
| 2024-05-01 | Self-Play Preference Optimization for Language Model Alignment | Yue Wu et.al. | 2405.00675 | null |
| 2024-05-01 | No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO | Skander Moalla et.al. | 2405.00662 | link |
| 2024-05-01 | HUGO – Highlighting Unseen Grid Options: Combining Deep Reinforcement Learning with a Heuristic Target Topology Approach | Malte Lehna et.al. | 2405.00629 | null |
| 2024-05-01 | Koopman-based Deep Learning for Nonlinear System Estimation | Zexin Sun et.al. | 2405.00627 | null |
| 2024-05-01 | Queue-based Eco-Driving at Roundabouts with Reinforcement Learning | Anna-Lena Schlamp et.al. | 2405.00625 | null |
| 2024-05-01 | The Real, the Better: Aligning Large Language Models with Online Human Behaviors | Guanying Jiang et.al. | 2405.00578 | null |
| 2024-05-01 | Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment | Zhili Liu et.al. | 2405.00557 | null |
| 2024-05-01 | Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | Lucas-Andreï Thil et.al. | 2405.00516 | null |
| 2024-05-01 | MetaRM: Shifted Distributions Alignment via Meta-Learning | Shihan Dou et.al. | 2405.00438 | null |
| 2024-05-01 | UCB-driven Utility Function Search for Multi-objective Reinforcement Learning | Yucheng Shi et.al. | 2405.00410 | link |
| 2024-04-30 | Collaborative Control Method of Transit Signal Priority Based on Cooperative Game and Reinforcement Learning | Hao Qin et.al. | 2404.19683 | null |
| 2024-04-30 | Towards Generalist Robot Learning from Internet Video: A Survey | Robert McCarthy et.al. | 2404.19664 | null |
| 2024-04-30 | Short term vs. long term: optimization of microswimmer navigation on different time horizons | Navid Mousavi et.al. | 2404.19561 | null |
| 2024-04-30 | Continual Model-based Reinforcement Learning for Data Efficient Wireless Network Optimisation | Cengis Hasan et.al. | 2404.19462 | null |
| 2024-04-30 | Imitation Learning: A Survey of Learning Methods, Environments and Metrics | Nathan Gavenski et.al. | 2404.19456 | null |
| 2024-04-30 | Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning | Mathieu Rita et.al. | 2404.19409 | link |
| 2024-04-30 | Numeric Reward Machines | Kristina Levina et.al. | 2404.19370 | null |
| 2024-04-30 | Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning | Chenjia Bai et.al. | 2404.19346 | link |
| 2024-04-30 | Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning | Qiaosheng Zhang et.al. | 2404.19292 | null |
| 2024-04-30 | DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets | Xiaoyu Huang et.al. | 2404.19264 | null |
| 2024-04-29 | DPO Meets PPO: Reinforced Token Optimization for RLHF | Han Zhong et.al. | 2404.18922 | null |
| 2024-04-29 | Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty | Laixi Shi et.al. | 2404.18909 | null |
| 2024-04-29 | Overcoming Knowledge Barriers: Online Imitation Learning from Observation with Pretrained World Models | Xingyuan Zhang et.al. | 2404.18896 | null |
| 2024-04-29 | More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness | Aaron J. Li et.al. | 2404.18870 | link |
| 2024-04-29 | Performance-Aligned LLMs for Generating Fast Code | Daniel Nichols et.al. | 2404.18864 | null |
| 2024-04-29 | PlanNetX: Learning an Efficient Neural Network Planner from MPC for Longitudinal Control | Jasper Hoffmann et.al. | 2404.18863 | null |
| 2024-04-30 | Winning the Social Media Influence Battle: Uncertainty-Aware Opinions to Understand and Spread True Information via Competitive Influence Maximization | Qi Zhang et.al. | 2404.18826 | null |
| 2024-04-29 | Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies | Seyed Soroush Karimi Madahi et.al. | 2404.18821 | null |
| 2024-04-29 | Multi-Agent Synchronization Tasks | Rolando Fernandez et.al. | 2404.18798 | null |
| 2024-04-29 | Resource-rational reinforcement learning and sensorimotor causal states | Sarah Marzen et.al. | 2404.18775 | null |
| 2024-04-26 | Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo | Stephen Zhao et.al. | 2404.17546 | null |
| 2024-04-26 | Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations | Puhao Li et.al. | 2404.17521 | link |
| 2024-04-26 | Quantum Multi-Agent Reinforcement Learning for Aerial Ad-hoc Networks | Theodora-Augustina Drăgan et.al. | 2404.17499 | null |
| 2024-04-26 | Q-Learning to navigate turbulence without a map | Marco Rando et.al. | 2404.17495 | null |
| 2024-04-26 | Adaptive speed planning for Unmanned Vehicle Based on Deep Reinforcement Learning | Hao Liu et.al. | 2404.17379 | null |
| 2024-04-26 | When to Trust LLMs: Aligning Confidence with Response Quality | Shuchang Tao et.al. | 2404.17287 | null |
| 2024-04-26 | Enhancing Privacy and Security of Autonomous UAV Navigation | Vatsal Aggarwal et.al. | 2404.17225 | null |
| 2024-04-26 | Beyond Imitation: A Life-long Policy Learning Framework for Path Tracking Control of Autonomous Driving | C. Gong et.al. | 2404.17198 | null |
| 2024-04-26 | An Explainable Deep Reinforcement Learning Model for Warfarin Maintenance Dosing Using Policy Distillation and Action Forging | Sadjad Anzabi Zadeh et.al. | 2404.17187 | null |
| 2024-04-25 | Compiler for Distributed Quantum Computing: a Reinforcement Learning Approach | Panagiotis Promponas et.al. | 2404.17077 | null |
| 2024-04-25 | REBEL: Reinforcement Learning via Regressing Relative Rewards | Zhaolin Gao et.al. | 2404.16767 | null |
| 2024-04-25 | Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods | Min Kyu Shin et.al. | 2404.16721 | null |
| 2024-04-25 | RUMOR: Reinforcement learning for Understanding a Model of the Real World for Navigation in Dynamic Environments | Diego Martinez-Baselga et.al. | 2404.16672 | null |
| 2024-04-25 | Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare | Emre Can Acikgoz et.al. | 2404.16621 | null |
| 2024-04-25 | Exploring the Dynamics of Data Transmission in 5G Networks: A Conceptual Analysis | Nikita Smirnov et.al. | 2404.16508 | null |
| 2024-04-25 | Leveraging Pretrained Latent Representations for Few-Shot Imitation Learning on a Dexterous Robotic Hand | Davide Liconti et.al. | 2404.16483 | null |
| 2024-04-25 | A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints | Bram De Cooman et.al. | 2404.16468 | null |
| 2024-04-25 | Offline Reinforcement Learning with Behavioral Supervisor Tuning | Padmanaba Srinivasan et.al. | 2404.16399 | null |
| 2024-04-25 | SwarmRL: Building the Future of Smart Active Systems | Samuel Tovey et.al. | 2404.16388 | link |
| 2024-04-25 | Reinforcement Learning with Generative Models for Compact Support Sets | Nico Schiavone et.al. | 2404.16300 | link |
| 2024-04-24 | DPO: Differential reinforcement learning with application to optimal configuration search | Chandrajit Bajaj et.al. | 2404.15617 | null |
| 2024-04-24 | GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL | Lang Qin et.al. | 2404.15597 | null |
| 2024-04-24 | Multi-Agent Reinforcement Learning for Energy Networks: Computational Challenges, Progress and Open Problems | Sarah Keren et.al. | 2404.15583 | null |
| 2024-04-23 | An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models | Yangchen Pan et.al. | 2404.15518 | null |
| 2024-04-23 | The Power of Resets in Online Reinforcement Learning | Zakaria Mhammedi et.al. | 2404.15417 | null |
| 2024-04-23 | Planning the path with Reinforcement Learning: Optimal Robot Motion Planning in RoboCup Small Size League Environments | Mateus G. Machado et.al. | 2404.15410 | link |
| 2024-04-23 | Reinforcement Learning with Adaptive Control Regularization for Safe Control of Critical Systems | Haozhe Tian et.al. | 2404.15199 | null |
| 2024-04-23 | Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation | Xun Wu et.al. | 2404.15100 | null |
| 2024-04-23 | Impedance Matching: Enabling an RL-Based Running Jump in a Quadruped Robot | Neil Guan et.al. | 2404.15096 | null |
| 2024-04-23 | Using deep reinforcement learning to promote sustainable human behaviour on a common pool resource problem | Raphael Koster et.al. | 2404.15059 | null |
| 2024-04-23 | Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems | Xiaoshuang Chen et.al. | 2404.14961 | null |
| 2024-04-23 | Multi-Objective Deep Reinforcement Learning for 5G Base Station Placement to Support Localisation for Future Sustainable Traffic | Ahmed Al-Tahmeesschi et.al. | 2404.14954 | null |
| 2024-04-23 | MultiSTOP: Solving Functional Equations with Reinforcement Learning | Alessandro Trenta et.al. | 2404.14909 | null |
| 2024-04-23 | Unitary Synthesis of Clifford+T Circuits with Reinforcement Learning | Sebastian Rietsch et.al. | 2404.14865 | null |
| 2024-04-23 | Evolutionary Reinforcement Learning via Cooperative Coevolution | Chengpeng Hu et.al. | 2404.14763 | null |
| 2024-04-23 | Rank2Reward: Learning Shaped Reward Functions from Passive Video | Daniel Yang et.al. | 2404.14735 | null |
| 2024-04-22 | Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data | Fahim Tajwar et.al. | 2404.14367 | link |
| 2024-04-22 | PLUTO: Pushing the Limit of Imitation Learning-based Planning for Autonomous Driving | Jie Cheng et.al. | 2404.14327 | null |
| 2024-04-22 | Multi-Agent Hybrid SAC for Joint SS-DSA in CRNs | David R. Nickel et.al. | 2404.14319 | null |
| 2024-04-22 | LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots | Dongge Han et.al. | 2404.14285 | null |
| 2024-04-22 | Beyond the Edge: An Advanced Exploration of Reinforcement Learning for Mobile Edge Computing, its Applications, and Future Research Trajectories | Ning Yang et.al. | 2404.14238 | null |
| 2024-04-22 | Multi-agent Reinforcement Learning-based Joint Precoding and Phase Shift Optimization for RIS-aided Cell-Free Massive MIMO Systems | Yiyang Zhu et.al. | 2404.14092 | null |
| 2024-04-22 | Mechanistic Interpretability for AI Safety – A Review | Leonard Bereska et.al. | 2404.14082 | null |
| 2024-04-22 | Research on Robot Path Planning Based on Reinforcement Learning | Wang Ruiqi et.al. | 2404.14077 | link |
| 2024-04-22 | Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras | Mhairi Dunion et.al. | 2404.14064 | link |
| 2024-04-22 | A survey of air combat behavior modeling using machine learning | Patrick Ribu Gorton et.al. | 2404.13954 | null |
| 2024-04-19 | Mapping Social Choice Theory to RLHF | Jessica Dai et.al. | 2404.13038 | null |
| 2024-04-19 | Deep Reinforcement Learning-Based Active Flow Control of an Elliptical Cylinder: Transitioning from an Elliptical Cylinder to a Circular Cylinder and a Flat Plate | Wang Jia et.al. | 2404.13003 | null |
| 2024-04-19 | Goal Exploration via Adaptive Skill Distribution for Goal-Conditioned Reinforcement Learning | Lisheng Wu et.al. | 2404.12999 | null |
| 2024-04-19 | MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering | Avinash Anand et.al. | 2404.12926 | null |
| 2024-04-19 | Zero-Shot Stitching in Reinforcement Learning using Relative Representations | Antonio Pio Ricciardi et.al. | 2404.12917 | null |
| 2024-04-19 | MAexp: A Generic Platform for RL-based Multi-Agent Exploration | Shaohao Zhu et.al. | 2404.12824 | link |
| 2024-04-19 | Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation | Qiang He et.al. | 2404.12754 | link |
| 2024-04-19 | Demonstration of quantum projective simulation on a single-photon-based quantum computer | Giacomo Franceschetto et.al. | 2404.12729 | null |
| 2024-04-19 | Energy Conserved Failure Detection for NS-IoT Systems | Guojin Liu et.al. | 2404.12713 | null |
| 2024-04-19 | Single-Task Continual Offline Reinforcement Learning | Sibo Gai et.al. | 2404.12639 | null |
| 2024-04-18 | From $r$ to $Q^*$ : Your Language Model is Secretly a Q-Function | Rafael Rafailov et.al. | 2404.12358 | null |
| 2024-04-18 | Improving the interpretability of GNN predictions through conformal-based graph sparsification | Pablo Sanchez-Martin et.al. | 2404.12356 | link |
| 2024-04-18 | Practical Considerations for Discrete-Time Implementations of Continuous-Time Control Barrier Function-Based Safety Filters | Lukas Brunke et.al. | 2404.12329 | null |
| 2024-04-18 | ASID: Active Exploration for System Identification in Robotic Manipulation | Marius Memmel et.al. | 2404.12308 | null |
| 2024-04-18 | RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective | Chenxi Wang et.al. | 2404.12281 | null |
| 2024-04-18 | Privacy-Preserving UCB Decision Process Verification via zk-SNARKs | Xikun Jiang et.al. | 2404.12186 | null |
| 2024-04-18 | Aligning language models with human preferences | Tomasz Korbak et.al. | 2404.12150 | link |
| 2024-04-19 | Robust and Adaptive Deep Reinforcement Learning for Enhancing Flow Control around a Square Cylinder with Varying Reynolds Numbers | Wang Jia et.al. | 2404.12123 | null |
| 2024-04-18 | X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner | Haoyuan Jiang et.al. | 2404.12090 | link |
| 2024-04-18 | Trajectory Planning for Autonomous Vehicle Using Iterative Reward Prediction in Reinforcement Learning | Hyunwoo Park et.al. | 2404.12079 | null |
| 2024-04-17 | Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding | Zezhong Fan et.al. | 2404.11589 | null |
| 2024-04-17 | Deep Policy Optimization with Temporal Logic Constraints | Ameesh Shah et.al. | 2404.11578 | null |
| 2024-04-17 | Spatio-Temporal Motion Retargeting for Quadruped Robots | Taerim Yoon et.al. | 2404.11557 | null |
| 2024-04-17 | VC Theory for Inventory Policies | Yaqi Xie et.al. | 2404.11509 | null |
| 2024-04-17 | Learn to Tour: Operator Design For Solution Feasibility Mapping in Pickup-and-delivery Traveling Salesman Problem | Bowen Fang et.al. | 2404.11458 | null |
| 2024-04-17 | What-if Analysis Framework for Digital Twins in 6G Wireless Network Management | Elif Ak et.al. | 2404.11394 | null |
| 2024-04-17 | Convergence of Policy Gradient for Stochastic Linear-Quadratic Control Problem in Infinite Horizon | Xinpei Zhang et.al. | 2404.11382 | null |
| 2024-04-17 | Following the Human Thread in Social Navigation | Luca Scofano et.al. | 2404.11327 | link |
| 2024-04-17 | On Learning Parities with Dependent Noise | Noah Golowich et.al. | 2404.11325 | null |
| 2024-04-17 | Physics-informed Actor-Critic for Coordination of Virtual Inertia from Power Distribution Systems | Simon Stock et.al. | 2404.11149 | null |
| 2024-04-16 | Settling Constant Regrets in Linear Markov Decision Processes | Weitong Zhang et.al. | 2404.10745 | null |
| 2024-04-16 | N-Agent Ad Hoc Teamwork | Caroline Wang et.al. | 2404.10740 | null |
| 2024-04-16 | Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration | Benjamin A Newman et.al. | 2404.10733 | null |
| 2024-04-16 | Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning | Hao-Lun Hsu et.al. | 2404.10728 | null |
| 2024-04-16 | Automatic re-calibration of quantum devices by reinforcement learning | T. Crosta et.al. | 2404.10726 | null |
| 2024-04-16 | Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study | Shusheng Xu et.al. | 2404.10719 | null |
| 2024-04-16 | Simplex Decomposition for Portfolio Allocation Constraints in Reinforcement Learning | David Winkel et.al. | 2404.10683 | null |
| 2024-04-16 | SCALE: Self-Correcting Visual Navigation for Mobile Robots via Anti-Novelty Estimation | Chang Chen et.al. | 2404.10675 | null |
| 2024-04-16 | Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay | Jinmei Liu et.al. | 2404.10662 | link |
| 2024-04-16 | Trajectory Planning using Reinforcement Learning for Interactive Overtaking Maneuvers in Autonomous Racing Scenarios | Levent Ögretmen et.al. | 2404.10658 | null |
| 2024-04-15 | Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model | Hyunsoo Cho et.al. | 2404.09717 | null |
| 2024-04-15 | Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning | Linjie Xu et.al. | 2404.09715 | null |
| 2024-04-15 | Learn Your Reference Model for Real Good Alignment | Alexey Gorbatovski et.al. | 2404.09656 | null |
| 2024-04-15 | Reliability Estimation of News Media Sources: Birds of a Feather Flock Together | Sergio Burdisso et.al. | 2404.09565 | null |
| 2024-04-15 | Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning | Tidiane Camaret Ndir et.al. | 2404.09521 | link |
| 2024-04-14 | Correlated Mean Field Imitation Learning | Zhiyu Zhao et.al. | 2404.09324 | null |
| 2024-04-14 | Egret: Reinforcement Mechanism for Sequential Computation Offloading in Edge Computing | Haosong Peng et.al. | 2404.09285 | null |
| 2024-04-14 | A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs | Elliot Kolker-Hicks et.al. | 2404.09264 | null |
| 2024-04-14 | Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts | Jing-Cheng Pang et.al. | 2404.09248 | null |
| 2024-04-14 | Advanced Intelligent Optimization Algorithms for Multi-Objective Optimal Power Flow in Future Power Systems: A Review | Yuyan Li et.al. | 2404.09203 | null |
| 2024-04-12 | Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation | Hanlin Tian et.al. | 2404.08570 | null |
| 2024-04-12 | RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs | Shreyas Chaudhari et.al. | 2404.08555 | null |
| 2024-04-12 | Advancing Forest Fire Prevention: Deep Reinforcement Learning for Effective Firebreak Placement | Lucas Murray et.al. | 2404.08523 | null |
| 2024-04-12 | Adversarial Imitation Learning via Boosting | Jonathan D. Chang et.al. | 2404.08513 | null |
| 2024-04-12 | Prescribing Optimal Health-Aware Operation for Urban Air Mobility with Deep Reinforcement Learning | Mina Montazeri et.al. | 2404.08497 | null |
| 2024-04-12 | Dataset Reset Policy Optimization for RLHF | Jonathan D. Chang et.al. | 2404.08495 | link |
| 2024-04-12 | Anti-Byzantine Attacks Enabled Vehicle Selection for Asynchronous Federated Learning in Vehicular Edge Computing | Cui Zhang et.al. | 2404.08444 | null |
| 2024-04-12 | SIR-RL: Reinforcement Learning for Optimized Policy Control during Epidemiological Outbreaks in Emerging Market and Developing Economies | Maeghal Jain et.al. | 2404.08423 | null |
| 2024-04-12 | TDANet: Target-Directed Attention Network For Object-Goal Visual Navigation With Zero-Shot Ability | Shiwei Lian et.al. | 2404.08353 | null |
| 2024-04-12 | Agile and versatile bipedal robot tracking control through reinforcement learning | Jiayi Li et.al. | 2404.08246 | null |
| 2024-04-11 | High-Dimension Human Value Representation in Large Language Models | Samuel Cahyawijaya et.al. | 2404.07900 | null |
| 2024-04-11 | Data-Driven System Identification of Quadrotors Subject to Motor Delays | Jonas Eschmann et.al. | 2404.07837 | null |
| 2024-04-11 | On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning | Giuseppe Canonaco et.al. | 2404.07826 | null |
| 2024-04-11 | An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization | Minshuo Chen et.al. | 2404.07771 | null |
| 2024-04-11 | Differentially Private Reinforcement Learning with Self-Play | Dan Qiao et.al. | 2404.07559 | null |
| 2024-04-11 | Enhancing Policy Gradient with the Polyak Step-Size Adaption | Yunxiang Li et.al. | 2404.07525 | null |
| 2024-04-11 | Generative Probabilistic Planning for Optimizing Supply Chain Networks | Hyung-il Ahn et.al. | 2404.07511 | null |
| 2024-04-11 | Neural Fault Injection: Generating Software Faults from Natural Language | Domenico Cotroneo et.al. | 2404.07491 | null |
| 2024-04-11 | Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains | Soichiro Nishimori et.al. | 2404.07465 | null |
| 2024-04-11 | UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning | Saichao Liu et.al. | 2404.07453 | null |
| 2024-04-10 | Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery | Zohre Karimi et.al. | 2404.07185 | null |
| 2024-04-10 | Adaptive behavior with stable synapses | Cristiano Capone et.al. | 2404.07150 | null |
| 2024-04-10 | How Consistent are Clinicians? Evaluating the Predictability of Sepsis Disease Progression with Dynamics Models | Unnseo Park et.al. | 2404.07148 | null |
| 2024-04-10 | Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection | Linas Nasvytis et.al. | 2404.07099 | link |
| 2024-04-10 | Improving Language Model Reasoning with Self-motivated Learning | Yunlong Feng et.al. | 2404.07017 | null |
| 2024-04-10 | Agent-driven Generative Semantic Communication for Remote Surveillance | Wanting Yang et.al. | 2404.06997 | null |
| 2024-04-10 | Deep Reinforcement Learning for Mobile Robot Path Planning | Hao Liu et.al. | 2404.06974 | null |
| 2024-04-10 | UAV-Assisted Enhanced Coverage and Capacity in Dynamic MU-mMIMO IoT Systems: A Deep Reinforcement Learning Approach | MohammadMahdi Ghadaksaz et.al. | 2404.06726 | null |
| 2024-04-10 | Dual Ensemble Kalman Filter for Stochastic Optimal Control | Anant A. Joshi et.al. | 2404.06696 | null |
| 2024-04-09 | Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective | Victor-Alexandru Darvariu et.al. | 2404.06492 | null |
| 2024-04-09 | Deep Reinforcement Learning-Based Approach for a Single Vehicle Persistent Surveillance Problem with Fuel Constraints | Hritik Bana et.al. | 2404.06423 | null |
| 2024-04-09 | The Power in Communication: Power Regularization of Communication for Autonomy in Cooperative Multi-Agent Reinforcement Learning | Nancirose Piazza et.al. | 2404.06387 | null |
| 2024-04-09 | Policy-Guided Diffusion | Matthew Thomas Jackson et.al. | 2404.06356 | link |
| 2024-04-09 | Generative Pre-Trained Transformer for Symbolic Regression Base In-Context Reinforcement Learning | Yanjie Li et.al. | 2404.06330 | null |
| 2024-04-09 | Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning | Xudong Yu et.al. | 2404.06188 | null |
| 2024-04-09 | A quantum information theoretic analysis of reinforcement learning-assisted quantum architecture search | Abhishek Sadhu et.al. | 2404.06174 | null |
| 2024-04-09 | Adaptable Recovery Behaviors in Robotics: A Behavior Trees and Motion Generators(BTMG) Approach for Failure Management | Faseeh Ahmad et.al. | 2404.06129 | null |
| 2024-04-09 | Automatic Configuration Tuning on Cloud Database: A Survey | Limeng Zhang et.al. | 2404.06043 | null |
| 2024-04-09 | Commute with Community: Enhancing Shared Travel through Social Networks | Tian Siyuan et.al. | 2404.05987 | null |
| 2024-04-08 | Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer | Xinyang Gu et.al. | 2404.05695 | null |
| 2024-04-08 | YaART: Yet Another ART Rendering Technology | Sergey Kastryulin et.al. | 2404.05666 | null |
| 2024-04-08 | Dynamic Backtracking in GFlowNet: Enhancing Decision Steps with Reward-Dependent Adjustment Mechanisms | Shuai Guo et.al. | 2404.05576 | null |
| 2024-04-08 | Optimal Flow Admission Control in Edge Computing via Safe Reinforcement Learning | A. Fox et.al. | 2404.05564 | null |
| 2024-04-08 | Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data | Tim Baumgärtner et.al. | 2404.05530 | null |
| 2024-04-08 | CNN-based Game State Detection for a Foosball Table | David Hagens et.al. | 2404.05357 | null |
| 2024-04-08 | Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models | Yutao Ouyang et.al. | 2404.05291 | null |
| 2024-04-08 | SAFE-GIL: SAFEty Guided Imitation Learning | Yusuf Umut Ciftci et.al. | 2404.05249 | null |
| 2024-04-08 | MeSA-DRL: Memory-Enhanced Deep Reinforcement Learning for Advanced Socially Aware Robot Navigation in Crowded Environments | Mannan Saeed Muhammad et.al. | 2404.05203 | null |
| 2024-04-08 | Decision Transformer for Wireless Communications: A New Paradigm of Resource Management | Jie Zhang et.al. | 2404.05199 | null |
| 2024-04-05 | Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution | Tim Seyde et.al. | 2404.04253 | null |
| 2024-04-05 | Continual Policy Distillation of Reinforcement Learning-based Controllers for Soft Robotic In-Hand Manipulation | Lanpei Li et.al. | 2404.04219 | null |
| 2024-04-05 | Enhancing IoT Intelligence: A Transformer-based Reinforcement Learning Methodology | Gaith Rjoub et.al. | 2404.04205 | null |
| 2024-04-05 | Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report | Jerrod Wigmore et.al. | 2404.04106 | null |
| 2024-04-05 | Dynamic Prompt Optimizing for Text-to-Image Generation | Wenyi Mo et.al. | 2404.04095 | link |
| 2024-04-05 | Demonstration Guided Multi-Objective Reinforcement Learning | Junlin Lu et.al. | 2404.03997 | null |
| 2024-04-05 | A proximal policy optimization based intelligent home solar management | Kode Creer et.al. | 2404.03888 | null |
| 2024-04-05 | Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration | Xudong Guo et.al. | 2404.03869 | null |
| 2024-04-04 | Exploration is Harder than Prediction: Cryptographically Separating Reinforcement Learning from Supervised Learning | Noah Golowich et.al. | 2404.03774 | null |
| 2024-04-04 | A Reinforcement Learning based Reset Policy for CDCL SAT Solvers | Chunxiao Li et.al. | 2404.03753 | null |
| 2024-04-04 | AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent | Hanyu Lai et.al. | 2404.03648 | link |
| 2024-04-04 | Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention | Ziru Liu et.al. | 2404.03637 | link |
| 2024-04-04 | Laser Learning Environment: A new environment for coordination-critical multi-agent tasks | Yannick Molinghen et.al. | 2404.03596 | link |
| 2024-04-04 | Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm | Miao Lu et.al. | 2404.03578 | null |
| 2024-04-04 | Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity | Jake Varley et.al. | 2404.03570 | null |
| 2024-04-04 | AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale | Adam Pardyl et.al. | 2404.03482 | link |
| 2024-04-04 | Integrating Hyperparameter Search into GramML | Hernán Ceferino Vázquez et.al. | 2404.03419 | link |
| 2024-04-04 | Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought | Jooyoung Lee et.al. | 2404.03414 | null |
| 2024-04-04 | SENSOR: Imitate Third-Person Expert’s Behaviors via Active Sensoring | Kaichen Huang et.al. | 2404.03386 | null |
| 2024-04-04 | DIDA: Denoised Imitation Learning based on Domain Adaptation | Kaichen Huang et.al. | 2404.03382 | null |
| 2024-04-03 | Learning Quadrupedal Locomotion via Differentiable Simulation | Clemens Schwarke et.al. | 2404.02887 | null |
| 2024-04-03 | Unsupervised Learning of Effective Actions in Robotics | Marko Zaric et.al. | 2404.02728 | link |
| 2024-04-03 | Reinforcement Learning in Categorical Cybernetics | Jules Hedges et.al. | 2404.02688 | null |
| 2024-04-03 | Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering | Abhijeet Pendyala et.al. | 2404.02577 | null |
| 2024-04-03 | SliceIt! – A Dual Simulator Framework for Learning Robot Food Slicing | Cristian C. Beltran-Hernandez et.al. | 2404.02569 | link |
| 2024-04-03 | Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning | Yi Shen et.al. | 2404.02545 | link |
| 2024-04-03 | Versatile Scene-Consistent Traffic Scenario Generation as Optimization with Diffusion | Zhiyu Huang et.al. | 2404.02524 | null |
| 2024-04-03 | Joint Optimization on Uplink OFDMA and MU-MIMO for IEEE 802.11ax: Deep Hierarchical Reinforcement Learning Approach | Hyeonho Noh et.al. | 2404.02486 | null |
| 2024-04-03 | Deep Reinforcement Learning for Traveling Purchaser Problems | Haofeng Yuan et.al. | 2404.02476 | null |
| 2024-04-03 | Electric Vehicle Routing Problem for Emergency Power Supply: Towards Telecom Base Station Relief | Daisuke Kikuta et.al. | 2404.02448 | link |
| 2024-04-02 | Tuning for the Unknown: Revisiting Evaluation Strategies for Lifelong RL | Golnaz Mesbahi et.al. | 2404.02113 | null |
| 2024-04-02 | Emergence of Chemotactic Strategies with Multi-Agent Reinforcement Learning | Samuel Tovey et.al. | 2404.01999 | null |
| 2024-04-02 | VLRM: Vision-Language Models act as Reward Models for Image Captioning | Maksim Dzabraev et.al. | 2404.01911 | null |
| 2024-04-02 | Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation | Carlos Plou et.al. | 2404.01867 | null |
| 2024-04-02 | Keeping Behavioral Programs Alive: Specifying and Executing Liveness Requirements | Tom Yaacov et.al. | 2404.01858 | null |
| 2024-04-02 | EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking | Stavros Orfanoudakis et.al. | 2404.01849 | null |
| 2024-04-02 | Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy | Kyungbok Lee et.al. | 2404.01830 | null |
| 2024-04-02 | Imitation Game: A Model-based and Imitation Learning Deep Reinforcement Learning Hybrid | Eric MSP Veith et.al. | 2404.01794 | null |
| 2024-04-02 | Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems | Dapeng Zhi et.al. | 2404.01769 | null |
| 2024-04-02 | Asymptotics of Language Model Alignment | Joy Qiping Yang et.al. | 2404.01730 | null |
| 2024-03-29 | Learning Visual Quadrupedal Loco-Manipulation from Demonstrations | Zhengmao He et.al. | 2403.20328 | null |
| 2024-03-29 | Active flow control of a turbulent separation bubble through deep reinforcement learning | Bernat Font et.al. | 2403.20295 | null |
| 2024-03-29 | Functional Bilevel Optimization for Machine Learning | Ieva Petrulionyte et.al. | 2403.20233 | null |
| 2024-03-29 | Decentralized Multimedia Data Sharing in IoV: A Learning-based Equilibrium of Supply and Demand | Jiani Fan et.al. | 2403.20218 | null |
| 2024-03-29 | Biologically-Plausible Topology Improved Spiking Actor Network for Efficient Deep Reinforcement Learning | Duzhen Zhang et.al. | 2403.20163 | null |
| 2024-03-29 | CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening | Hei Yi Mak et.al. | 2403.20156 | null |
| 2024-03-29 | A Learning-based Incentive Mechanism for Mobile AIGC Service in Decentralized Internet of Vehicles | Jiani Fan et.al. | 2403.20151 | null |
| 2024-03-29 | Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation | Jinyeong Park et.al. | 2403.20109 | link |
| 2024-03-29 | Reinforcement learning for graph theory, II. Small Ramsey numbers | Mohammad Ghebleh et.al. | 2403.20055 | null |
| 2024-03-29 | Nonparametric Bellman Mappings for Reinforcement Learning: Application to Robust Adaptive Filtering | Yuki Akiyama et.al. | 2403.20020 | null |
| 2024-03-28 | Human-compatible driving partners through data-regularized self-play reinforcement learning | Daphne Cornelisse et.al. | 2403.19648 | link |
| 2024-03-28 | Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics | Norman Di Palo et.al. | 2403.19578 | null |
| 2024-03-28 | Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment | Alireza Ganjdanesh et.al. | 2403.19490 | null |
| 2024-03-28 | Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization | Teodor V. Marinov et.al. | 2403.19462 | null |
| 2024-03-28 | RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation | Chongkai Gao et.al. | 2403.19460 | null |
| 2024-03-28 | EDA-Driven Preprocessing for SAT Solving | Zhengyuan Shi et.al. | 2403.19446 | null |
| 2024-03-28 | Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model | Qi Gou et.al. | 2403.19443 | null |
| 2024-03-28 | Fine-Tuning Language Models with Reward Learning on Policy | Hao Lang et.al. | 2403.19279 | link |
| 2024-03-28 | Removing the need for ground truth UWB data collection: self-supervised ranging error correction using deep reinforcement learning | Dieter Coppens et.al. | 2403.19262 | null |
| 2024-03-28 | Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement Learning | Wei Duan et.al. | 2403.19253 | null |
| 2024-03-27 | Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment | Li Siyao et.al. | 2403.18811 | null |
| 2024-03-27 | CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning | Elliot Chane-Sane et.al. | 2403.18765 | null |
| 2024-03-27 | Probabilistic Model Checking of Stochastic Reinforcement Learning Policies | Dennis Gross et.al. | 2403.18725 | null |
| 2024-03-27 | Fpga-Based Neural Thrust Controller for UAVs | Sharif Azem et.al. | 2403.18703 | null |
| 2024-03-27 | Safe and Robust Reinforcement-Learning: Principles and Practice | Taku Yamagata et.al. | 2403.18539 | null |
| 2024-03-27 | Bridging the Gap: Regularized Reinforcement Learning for Improved Classical Motion Planning with Safety Modules | Elias Goldsztejn et.al. | 2403.18524 | null |
| 2024-03-27 | VersaT2I: Improving Text-to-Image Models with Versatile Reward | Jianshu Guo et.al. | 2403.18493 | null |
| 2024-03-27 | Scaling Vision-and-Language Navigation With Offline RL | Valay Bundele et.al. | 2403.18454 | null |
| 2024-03-27 | FRESCO: Federated Reinforcement Energy System for Cooperative Optimization | Nicolas Mauricio Cuadrado et.al. | 2403.18444 | null |
| 2024-03-27 | Reinforcement learning for graph theory, I. Reimplementation of Wagner’s approach | Salem Al-Yakoob et.al. | 2403.18429 | null |
| 2024-03-26 | TractOracle: towards an anatomically-informed reward function for RL-based tractography | Antoine Théberge et.al. | 2403.17845 | null |
| 2024-03-26 | Learning the Optimal Power Flow: Environment Design Matters | Thomas Wolgast et.al. | 2403.17831 | link |
| 2024-03-26 | Depending on yourself when you should: Mentoring LLM with RL agents to become the master in cybersecurity games | Yikuan Yan et.al. | 2403.17674 | null |
| 2024-03-26 | Learning Goal-Directed Object Pushing in Cluttered Scenes with Location-Based Attention | Nils Dengler et.al. | 2403.17667 | null |
| 2024-03-26 | Uncertainty-aware Distributional Offline Reinforcement Learning | Xiaocong Chen et.al. | 2403.17646 | null |
| 2024-03-26 | PeersimGym: An Environment for Solving the Task Offloading Problem with Reinforcement Learning | Frederico Metelo et.al. | 2403.17637 | null |
| 2024-03-26 | Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems | Siyu Wang et.al. | 2403.17634 | null |
| 2024-03-26 | LASIL: Learner-Aware Supervised Imitation Learning For Long-term Microscopic Traffic Simulation | Ke Guo et.al. | 2403.17601 | link |
| 2024-03-26 | Towards a Zero-Data, Controllable, Adaptive Dialog System | Dirk Väth et.al. | 2403.17582 | null |
| 2024-03-26 | VDSC: Enhancing Exploration Timing with Value Discrepancy and State Counts | Marius Captari et.al. | 2403.17542 | null |
| 2024-03-25 | An LLM-Based Digital Twin for Optimizing Human-in-the Loop Systems | Hanqing Yang et.al. | 2403.16809 | null |
| 2024-03-25 | Enhancing Software Effort Estimation through Reinforcement Learning-based Project Management-Oriented Feature Selection | Haoyang Chen et.al. | 2403.16749 | null |
| 2024-03-25 | Deep Reinforcement Learning and Mean-Variance Strategies for Responsible Portfolio Optimization | Fernando Acero et.al. | 2403.16667 | null |
| 2024-03-25 | Skill Q-Network: Learning Adaptive Skill Ensemble for Mapless Navigation in Unknown Environments | Hyunki Seong et.al. | 2403.16664 | null |
| 2024-03-25 | Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL | Osama Ahmad et.al. | 2403.16652 | null |
| 2024-03-25 | CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment | Feiteng Fang et.al. | 2403.16649 | link |
| 2024-03-25 | Counter-example guided Imitation Learning of Feedback Controllers from Temporal Logic Specifications | Thao Dang et.al. | 2403.16593 | null |
| 2024-03-25 | Arm-Constrained Curriculum Learning for Loco-Manipulation of the Wheel-Legged Robot | Zifan Wang et.al. | 2403.16535 | link |
| 2024-03-25 | Towards Cooperative Maneuver Planning in Mixed Traffic at Urban Intersections | Marvin Klimke et.al. | 2403.16478 | null |
| 2024-03-25 | If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions | Reza Esfandiarpoor et.al. | 2403.16442 | link |
| 2024-03-25 | Physics-informed RL for Maximal Safety Probability Estimation | Hikaru Hoshino et.al. | 2403.16391 | null |
| 2024-03-25 | Learning Action-based Representations Using Invariance | Max Rudolph et.al. | 2403.16369 | null |
| 2024-03-22 | Can large language models explore in-context? | Akshay Krishnamurthy et.al. | 2403.15371 | null |
| 2024-03-22 | Planning with a Learned Policy Basis to Optimally Solve Complex Tasks | Guillermo Infante et.al. | 2403.15301 | null |
| 2024-03-22 | Blockchain-based Pseudonym Management for Vehicle Twin Migrations in Vehicular Edge Metaverse | Jiawen Kang et.al. | 2403.15285 | null |
| 2024-03-22 | Parametric PDE Control with Deep Reinforcement Learning and Differentiable L0-Sparse Polynomial Policies | Nicolò Botteghi et.al. | 2403.15267 | null |
| 2024-03-22 | Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement | Jonathan Pirnay et.al. | 2403.15180 | null |
| 2024-03-22 | Subequivariant Reinforcement Learning Framework for Coordinated Motion Control | Haoyu Wang et.al. | 2403.15100 | null |
| 2024-03-22 | Improved Long Short-Term Memory-based Wastewater Treatment Simulators for Deep Reinforcement Learning | Esmaeel Mohammadi et.al. | 2403.15091 | null |
| 2024-03-22 | Automated Feature Selection for Inverse Reinforcement Learning | Daulet Baimukashev et.al. | 2403.15079 | null |
| 2024-03-22 | Testing for Fault Diversity in Reinforcement Learning | Quentin Mazouni et.al. | 2403.15065 | null |
| 2024-03-22 | Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation | Zhenrui Yue et.al. | 2403.14952 | null |
| 2024-03-21 | Rethinking Adversarial Inverse Reinforcement Learning: From the Angles of Policy Imitation and Transferable Reward Recovery | Yangchun Zhang et.al. | 2403.14593 | null |
| 2024-03-21 | A Mathematical Introduction to Deep Reinforcement Learning for 5G/6G Applications | Farhad Rezazadeh et.al. | 2403.14516 | null |
| 2024-03-21 | Constrained Reinforcement Learning with Smoothed Log Barrier Function | Baohe Zhang et.al. | 2403.14508 | null |
| 2024-03-21 | On the continuity and smoothness of the value function in reinforcement learning and optimal control | Hans Harder et.al. | 2403.14432 | null |
| 2024-03-21 | Emergent communication and learning pressures in language models: a language evolution perspective | Lukas Galke et.al. | 2403.14427 | null |
| 2024-03-21 | Task-optimal data-driven surrogate models for eNMPC via differentiable simulation and optimization | Daniel Mayfrank et.al. | 2403.14425 | null |
| 2024-03-21 | A reinforcement learning guided hybrid evolutionary algorithm for the latency location routing problem | Yuji Zou et.al. | 2403.14405 | link |
| 2024-03-21 | Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression | Fernando Acero et.al. | 2403.14328 | null |
| 2024-03-21 | Bayesian Optimization for Sample-Efficient Policy Improvement in Robotic Manipulation | Adrian Röfer et.al. | 2403.14305 | null |
| 2024-03-21 | Reactor Optimization Benchmark by Reinforcement Learning | Deborah Schwarcz et.al. | 2403.14273 | link |
| 2024-03-20 | Information-Theoretic Distillation for Reference-less Summarization | Jaehun Jung et.al. | 2403.13780 | null |
| 2024-03-20 | Towards Principled Representation Learning from Videos for Reinforcement Learning | Dipendra Misra et.al. | 2403.13765 | null |
| 2024-03-20 | Reinforcement Learning for Online Testing of Autonomous Driving Systems: a Replication and Extension Study | Luca Giamattei et.al. | 2403.13729 | null |
| 2024-03-20 | Reward-Driven Automated Curriculum Learning for Interaction-Aware Self-Driving at Unsignalized Intersections | Zengqi Peng et.al. | 2403.13674 | null |
| 2024-03-20 | Multi-agent Reinforcement Traffic Signal Control based on Interpretable Influence Mechanism and Biased ReLU Approximation | Zhiyue Luo et.al. | 2403.13639 | null |
| 2024-03-20 | Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation | Do June Min et.al. | 2403.13578 | link |
| 2024-03-20 | GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot | Wenxuan Song et.al. | 2403.13358 | null |
| 2024-03-20 | Waypoint-Based Reinforcement Learning for Robot Manipulation Tasks | Shaunak A. Mehta et.al. | 2403.13281 | null |
| 2024-03-20 | Federated reinforcement learning for robot motion planning with zero-shot generalization | Zhenyuan Yuan et.al. | 2403.13245 | null |
| 2024-03-20 | Graph Attention Network-based Block Propagation with Optimal AoI and Reputation in Web 3.0 | Jiana Liao et.al. | 2403.13237 | null |
| 2024-03-19 | Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes | He Wang et.al. | 2403.12946 | null |
| 2024-03-19 | Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers | Vidhi Jain et.al. | 2403.12943 | null |
| 2024-03-19 | Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types | Rui Liu et.al. | 2403.12891 | null |
| 2024-03-19 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning | Fucai Ke et.al. | 2403.12884 | null |
| 2024-03-19 | Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning | Mirco Theile et.al. | 2403.12856 | null |
| 2024-03-19 | Policy Bifurcation in Safe Reinforcement Learning | Wenjun Zou et.al. | 2403.12847 | link |
| 2024-03-19 | AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents | Jieming Cui et.al. | 2403.12835 | null |
| 2024-03-19 | Oriented and Non-oriented Cubical Surfaces in The Penteract | Manuel Estevez et.al. | 2403.12825 | null |
| 2024-03-19 | Dynamic Manipulation of Deformable Objects using Imitation Learning with Adaptation to Hardware Constraints | Eric Hannus et.al. | 2403.12685 | null |
| 2024-03-19 | Automated Contrastive Learning Strategy Search for Time Series | Baoyu Jing et.al. | 2403.12641 | null |
| 2024-03-18 | The Value of Reward Lookahead in Reinforcement Learning | Nadav Merlis et.al. | 2403.11637 | null |
| 2024-03-18 | Offline Multitask Representation Learning for Reinforcement Learning | Haque Ishfaq et.al. | 2403.11574 | null |
| 2024-03-18 | Reinforcement Learning with Token-level Feedback for Controllable Text Generation | Wendi Li et.al. | 2403.11558 | null |
| 2024-03-18 | TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling | Weiran Chen et.al. | 2403.11550 | null |
| 2024-03-18 | State-Separated SARSA: A Practical Sequential Decision-Making Algorithm with Recovering Rewards | Yuto Tanimoto et.al. | 2403.11520 | link |
| 2024-03-18 | Demystifying Deep Reinforcement Learning-Based Autonomous Vehicle Decision-Making | Hanxi Wan et.al. | 2403.11432 | null |
| 2024-03-18 | Variational Sampling of Temporal Trajectories | Jurijs Nazarovs et.al. | 2403.11418 | null |
| 2024-03-17 | Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective | Muhammad Aneeq uz Zaman et.al. | 2403.11345 | null |
| 2024-03-17 | Causality from Bottom to Top: A Survey | Abraham Itzhak Weinberg et.al. | 2403.11219 | null |
| 2024-03-17 | Continuous Jumping of a Parallel Wire-Driven Monopedal Robot RAMIEL Using Reinforcement Learning | Kento Kawaharazuka et.al. | 2403.11205 | null |
| 2024-03-14 | Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning | Zhishuai Liu et.al. | 2403.09621 | null |
| 2024-03-14 | ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models | Runyu Ma et.al. | 2403.09583 | null |
| 2024-03-14 | A Reinforcement Learning Approach to Dairy Farm Battery Management using Q Learning | Nawazish Ali et.al. | 2403.09499 | null |
| 2024-03-14 | Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision | Zhiqing Sun et.al. | 2403.09472 | link |
| 2024-03-14 | A Deep Reinforcement Learning Approach for Autonomous Reconfigurable Intelligent Surfaces | Hyuckjin Choi et.al. | 2403.09270 | null |
| 2024-03-14 | Leveraging Constraint Programming in a Deep Learning Approach for Dynamically Solving the Flexible Job-Shop Scheduling Problem | Imanol Echeverria et.al. | 2403.09249 | null |
| 2024-03-14 | Rumor Mitigation in Social Media Platforms with Deep Reinforcement Learning | Hongyuan Su et.al. | 2403.09217 | null |
| 2024-03-14 | MetroGNN: Metro Network Expansion with Reinforcement Learning | Hongyuan Su et.al. | 2403.09197 | null |
| 2024-03-14 | SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning | Nicholas Zolman et.al. | 2403.09110 | link |
| 2024-03-14 | CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences | Martin Weyssow et.al. | 2403.09032 | link |
| 2024-03-13 | TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning | Shangding Gu et.al. | 2403.08694 | null |
| 2024-03-13 | Digital Twin-assisted Reinforcement Learning for Resource-aware Microservice Offloading in Edge Computing | Xiangchun Chen et.al. | 2403.08687 | null |
| 2024-03-13 | Meta Reinforcement Learning for Resource Allocation in Aerial Active-RIS-assisted Networks with Rate-Splitting Multiple Access | Sajad Faramarzi et.al. | 2403.08648 | null |
| 2024-03-13 | Human Alignment of Large Language Models through Online Preference Optimisation | Daniele Calandriello et.al. | 2403.08635 | null |
| 2024-03-13 | Specification Overfitting in Artificial Intelligence | Benjamin Roth et.al. | 2403.08425 | null |
| 2024-03-13 | Optimizing Risk-averse Human-AI Hybrid Teams | Andrew Fuchs et.al. | 2403.08386 | null |
| 2024-03-13 | Learning to Describe for Predicting Zero-shot Drug-Drug Interactions | Fangqi Zhu et.al. | 2403.08377 | link |
| 2024-03-13 | LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments | Maonan Wang et.al. | 2403.08337 | link |
| 2024-03-14 | HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback | Ang Li et.al. | 2403.08309 | null |
| 2024-03-13 | SpaceOctopus: An Octopus-inspired Motion Planning Framework for Multi-arm Space Robot | Wenbo Zhao et.al. | 2403.08219 | null |
| 2024-03-12 | TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation | Shivin Dass et.al. | 2403.07869 | null |
| 2024-03-12 | Exploring Safety Generalization Challenges of Large Language Models via Code | Qibing Ren et.al. | 2403.07865 | null |
| 2024-03-12 | DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation | Chen Wang et.al. | 2403.07788 | null |
| 2024-03-12 | Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards | Wei Shen et.al. | 2403.07708 | null |
| 2024-03-12 | Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning | Motoki Omura et.al. | 2403.07704 | null |
| 2024-03-12 | Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation | Michael Ogezi et.al. | 2403.07605 | null |
| 2024-03-12 | An Improved Strategy for Blood Glucose Control Using Multi-Step Deep Reinforcement Learning | Weiwei Gu et.al. | 2403.07566 | null |
| 2024-03-12 | Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding | Huijie Tang et.al. | 2403.07559 | link |
| 2024-03-12 | Constrained Optimal Fuel Consumption of HEV: A Constrained Reinforcement Learning Approach | Shuchang Yan et.al. | 2403.07503 | null |
| 2024-03-12 | Optimization of Pressure Management Strategies for Geological CO2 Sequestration Using Surrogate Model-based Reinforcement Learning | Jungang Chen et.al. | 2403.07360 | null |
| 2024-03-11 | Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts | Onur Celik et.al. | 2403.06966 | null |
| 2024-03-11 | Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning | Junseok Park et.al. | 2403.06880 | null |
| 2024-03-11 | Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification | Joar Skalse et.al. | 2403.06854 | null |
| 2024-03-11 | In-context Exploration-Exploitation for Reinforcement Learning | Zhenwen Dai et.al. | 2403.06826 | null |
| 2024-03-11 | ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment | Hao-Lun Hsu et.al. | 2403.06814 | null |
| 2024-03-11 | From Factor Models to Deep Learning: Machine Learning in Reshaping Empirical Asset Pricing | Junyi Ye et.al. | 2403.06779 | null |
| 2024-03-11 | ALaRM: Align Language Models via Hierarchical Rewards Modeling | Yuhang Lai et.al. | 2403.06754 | null |
| 2024-03-11 | Generalising Multi-Agent Cooperation through Task-Agnostic Communication | Dulhan Jayalath et.al. | 2403.06750 | link |
| 2024-03-11 | Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | Adarsh N L et.al. | 2403.06735 | null |
| 2024-03-11 | Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning | Zijian Zhou et.al. | 2403.06728 | null |
| 2024-03-08 | Will GPT-4 Run DOOM? | Adrian de Wynter et.al. | 2403.05468 | null |
| 2024-03-08 | Switching the Loss Reduces the Cost in Batch Reinforcement Learning | Alex Ayoub et.al. | 2403.05385 | null |
| 2024-03-08 | Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation | Xiaoying Zhang et.al. | 2403.05171 | null |
| 2024-03-08 | Inverse Design of Photonic Crystal Surface Emitting Lasers is a Sequence Modeling Problem | Ceyao Zhang et.al. | 2403.05149 | null |
| 2024-03-08 | ChatUIE: Exploring Chat-based Unified Information Extraction using Large Language Models | Jun Xu et.al. | 2403.05132 | null |
| 2024-03-08 | RLPeri: Accelerating Visual Perimetry Test with Reinforcement Learning and Convolutional Feature Extraction | Tanvi Verma et.al. | 2403.05112 | null |
| 2024-03-08 | Efficient Data Collection for Robotic Manipulation via Compositional Generalization | Jensen Gao et.al. | 2403.05110 | null |
| 2024-03-08 | Simulating Battery-Powered TinyML Systems Optimised using Reinforcement Learning in Image-Based Anomaly Detection | Jared M. Ping et.al. | 2403.05106 | null |
| 2024-03-08 | Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning | Hongjoon Ahn et.al. | 2403.05066 | null |
| 2024-03-08 | Aligning Large Language Models for Controllable Recommendations | Wensheng Lu et.al. | 2403.05063 | null |
| 2024-03-07 | Teaching Large Language Models to Reason with Reinforcement Learning | Alex Havrilla et.al. | 2403.04642 | null |
| 2024-03-07 | Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace | Léopold Maytié et.al. | 2403.04588 | null |
| 2024-03-07 | Learning Agility Adaptation for Flight in Clutter | Guangyu Zhao et.al. | 2403.04586 | null |
| 2024-03-07 | Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition | Long-Fei Li et.al. | 2403.04568 | null |
| 2024-03-07 | Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation | Fabian Otto et.al. | 2403.04453 | null |
| 2024-03-07 | Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation | Tairan He et.al. | 2403.04436 | null |
| 2024-03-07 | iTRPL: An Intelligent and Trusted RPL Protocol based on Multi-Agent Reinforcement Learning | Debasmita Dey et.al. | 2403.04416 | null |
| 2024-03-07 | Model-free $H_{\infty}$ control of Itô stochastic system via off-policy reinforcement learning | Jing Guo et.al. | 2403.04412 | null |
| 2024-03-07 | Model-Free Load Frequency Control of Nonlinear Power Systems Based on Deep Reinforcement Learning | Xiaodi Chen et.al. | 2403.04374 | null |
| 2024-03-07 | Symmetry Considerations for Learning Task Symmetric Robot Policies | Mayank Mittal et.al. | 2403.04359 | null |
| 2024-03-06 | 3D Diffusion Policy | Yanjie Ze et.al. | 2403.03954 | link |
| 2024-03-06 | Stop Regressing: Training Value Functions via Classification for Scalable Deep RL | Jesse Farebrother et.al. | 2403.03950 | null |
| 2024-03-06 | Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation | Marcel Torne et.al. | 2403.03949 | null |
| 2024-03-06 | Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning | Zifan Xu et.al. | 2403.03848 | null |
| 2024-03-06 | A Survey on Applications of Reinforcement Learning in Spatial Resource Allocation | Di Zhang et.al. | 2403.03643 | null |
| 2024-03-06 | Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Yuhong Sun et.al. | 2403.03558 | link |
| 2024-03-06 | Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning | Zida Wu et.al. | 2403.03552 | null |
| 2024-03-05 | RACE-SM: Reinforcement Learning Based Autonomous Control for Social On-Ramp Merging | Jordan Poots et.al. | 2403.03359 | null |
| 2024-03-05 | Bi-KVIL: Keypoints-based Visual Imitation Learning of Bimanual Manipulation Tasks | Jianfeng Gao et.al. | 2403.03270 | null |
| 2024-03-05 | Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination | Liangzhou Wang et.al. | 2403.03172 | null |
| 2024-03-05 | Leveraging Federated Learning and Edge Computing for Recommendation Systems within Cloud Computing Networks | Yaqian Qi et.al. | 2403.03165 | null |
| 2024-03-05 | Language Guided Exploration for RL Agents in Text Environments | Hitesh Golchha et.al. | 2403.03141 | null |
| 2024-03-05 | SplAgger: Split Aggregation for Meta-Reinforcement Learning | Jacob Beck et.al. | 2403.03020 | null |
| 2024-03-05 | Autonomous vehicle decision and control through reinforcement learning with traffic flow randomization | Yuan Lin et.al. | 2403.02882 | null |
| 2024-03-05 | SpaceHopper: A Small-Scale Legged Robot for Exploring Low-Gravity Celestial Bodies | Alexander Spiridonov et.al. | 2403.02831 | null |
| 2024-03-05 | A Zero-Shot Reinforcement Learning Strategy for Autonomous Guidewire Navigation | Valentina Scarponi et.al. | 2403.02777 | null |
| 2024-03-05 | RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches | Priya Sundaresan et.al. | 2403.02709 | null |
| 2024-03-05 | Fighting Game Adaptive Background Music for Improved Gameplay | Ibrahim Khan et.al. | 2403.02701 | null |
| 2024-03-05 | PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning | Ke Zhang et.al. | 2403.02635 | null |
| 2024-03-02 | Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Alexander Scarlatos et.al. | 2403.01304 | link |
| 2024-03-02 | Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey | Hamza Kheddar et.al. | 2403.01255 | null |
| 2024-03-02 | Balancing Exploration and Exploitation in LLM using Soft RLLF for Enhanced Negation Understanding | Ha-Thanh Nguyen et.al. | 2403.01185 | null |
| 2024-03-02 | Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning | Hyungho Na et.al. | 2403.01112 | null |
| 2024-03-02 | Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL) | Noah Ford et.al. | 2403.01059 | null |
| 2024-03-01 | A Holistic Power Optimization Approach for Microgrid Control Based on Deep Reinforcement Learning | Fulong Yao et.al. | 2403.01013 | null |
| 2024-03-01 | Policy Optimization for PDE Control with a Warm Start | Xiangyuan Zhang et.al. | 2403.01005 | null |
| 2024-03-01 | On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games | Awni Altabaa et.al. | 2403.00993 | null |
| 2024-03-01 | SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation | Noriaki Hirose et.al. | 2403.00991 | null |
| 2024-03-01 | Scale-free Adversarial Reinforcement Learning | Mingyu Chen et.al. | 2403.00930 | null |
| 2024-02-29 | Curiosity-driven Red-teaming for Large Language Models | Zhang-Wei Hong et.al. | 2402.19464 | link |
| 2024-02-29 | ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL | Yifei Zhou et.al. | 2402.19446 | link |
| 2024-02-29 | Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation | Jonathan Yang et.al. | 2402.19432 | null |
| 2024-02-29 | Understanding Iterative Combinatorial Auction Designs via Multi-Agent Reinforcement Learning | Greg d’Eon et.al. | 2402.19420 | null |
| 2024-02-29 | RL-GPT: Integrating Reinforcement Learning and Code-as-policy | Shaoteng Liu et.al. | 2402.19299 | null |
| 2024-02-29 | StiefelGen: A Simple, Model Agnostic Approach for Time Series Data Augmentation over Riemannian Manifolds | Prasad Cheema et.al. | 2402.19287 | null |
| 2024-02-29 | Adaptive Testing Environment Generation for Connected and Automated Vehicles with Dense Reinforcement Learning | Jingxuan Yang et.al. | 2402.19275 | null |
| 2024-02-29 | Deep Reinforcement Learning: A Convex Optimization Approach | Ather Gattami et.al. | 2402.19212 | null |
| 2024-02-29 | ARMCHAIR: integrated inverse reinforcement learning and model predictive control for human-robot collaboration | Angelo Caregnato-Neto et.al. | 2402.19128 | null |
| 2024-02-29 | Temporal-Aware Deep Reinforcement Learning for Energy Storage Bidding in Energy and Contingency Reserve Markets | Jinhao Li et.al. | 2402.19110 | null |
| 2024-02-28 | Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards | Haoxiang Wang et.al. | 2402.18571 | link |
| 2024-02-28 | Unifying F1TENTH Autonomous Racing: Survey, Methods and Benchmarks | Benjamin David Evans et.al. | 2402.18558 | null |
| 2024-02-28 | Human-Centric Aware UAV Trajectory Planning in Search and Rescue Missions Employing Multi-Objective Reinforcement Learning with AHP and Similarity-Based Experience Replay | Mahya Ramezani et.al. | 2402.18487 | null |
| 2024-02-28 | FinAgent: A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist | Wentao Zhang et.al. | 2402.18485 | null |
| 2024-02-28 | Implementing Online Reinforcement Learning with Clustering Neural Networks | James E. Smith et.al. | 2402.18472 | null |
| 2024-02-28 | Why Do Animals Need Shaping? A Theory of Task Composition and Curriculum Learning | Jin Hwa Lee et.al. | 2402.18361 | null |
| 2024-02-28 | Solving Multi-Entity Robotic Problems Using Permutation Invariant Neural Networks | Tianxu An et.al. | 2402.18345 | null |
| 2024-02-28 | Whole-body Humanoid Robot Locomotion with Human Reference | Qiang Zhang et.al. | 2402.18294 | null |
| 2024-02-28 | Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of Pre-trained Language Models with Proximal Policy Optimization | Shuo Yang et.al. | 2402.18284 | null |
| 2024-02-28 | Reinforcement Learning and Graph Neural Networks for Probabilistic Risk Assessment | Joachim Grimstad et.al. | 2402.18246 | null |
(<a href=../README.md>back to main</a>)