Reinforcement Learning - 2026-03

Publish Date Title Authors PDF Translate Read Code
2026-03-31 HapCompass: A Rotational Haptic Device for Contact-Rich Robotic Teleoperation Xiangshan Tan et.al. 2603.30042 translate read null
2026-03-31 Hybrid Framework for Robotic Manipulation: Integrating Reinforcement Learning and Large Language Models Md Saad et.al. 2603.30022 translate read null
2026-03-31 Phyelds: A Pythonic Framework for Aggregate Computing Gianluca Aguzzi et.al. 2603.29999 translate read null
2026-03-31 GreenFLag: A Green Agentic Approach for Energy-Efficient Federated Learning Theodora Panagea et.al. 2603.29933 translate read null
2026-03-31 ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training Rui Ai et.al. 2603.29871 translate read null
2026-03-31 An Output Feedback Q-learning Algorithm for Optimal Control of Nonlinear Systems with Koopman Linear Embedding Victor G. Lopez et.al. 2603.29858 translate read null
2026-03-31 Friends, Foes, and First Authors: A Game Theory Model of How Power Plays Rewrite Academic Co-Authorship Networks Amit Bengal et.al. 2603.29834 translate read null
2026-03-31 Reinforced Reasoning for End-to-End Retrosynthetic Planning Chenyang Zuo et.al. 2603.29723 translate read null
2026-03-31 6GAgentGym: Tool Use, Data Synthesis, and Agentic Learning for Network Management Jiao Chen et.al. 2603.29656 translate read null
2026-03-31 ASI-Evolve: AI Accelerates AI Weixian Xu et.al. 2603.29640 translate read null
2026-03-31 Learning Diagnostic Reasoning for Decision Support in Toxicology Nico Oberländer et.al. 2603.29608 translate read null
2026-03-31 GraSP-STL: A Graph-Based Framework for Zero-Shot Signal Temporal Logic Planning via Offline Goal-Conditioned Reinforcement Learning Ancheng Hou et.al. 2603.29533 translate read null
2026-03-31 Target-Aligned Reinforcement Learning Leonard S. Pleiss et.al. 2603.29501 translate read null
2026-03-31 Learning to Generate Formally Verifiable Step-by-Step Logic Reasoning via Structured Formal Intermediaries Luoxin Chen et.al. 2603.29500 translate read null
2026-03-31 MemFactory: Unified Inference & Training Framework for Agent Memory Ziliang Guo et.al. 2603.29493 translate read null
2026-03-31 Calibrated Confidence Expression for Radiology Report Generation David Bani-Harouni et.al. 2603.29492 translate read null
2026-03-31 Multi-AUV Cooperative Target Tracking Based on Supervised Diffusion-Aided Multi-Agent Reinforcement Learning Jiaao Ma et.al. 2603.29426 translate read null
2026-03-31 AP-DRL: A Synergistic Algorithm-Hardware Framework for Automatic Task Partitioning of Deep Reinforcement Learning on Versal ACAP Enlai Li et.al. 2603.29369 translate read null
2026-03-31 Scaling Whole-Body Human Musculoskeletal Behavior Emulation for Specificity and Diversity Yunyue Wei et.al. 2603.29332 translate read null
2026-03-31 Downsides of Smartness Across Edge-Cloud Continuum in Modern Industry Akhil Gupta Chigullapally et.al. 2603.29289 translate read null
2026-03-31 MemRerank: Preference Memory for Personalized Product Reranking Zhiyuan Peng et.al. 2603.29247 translate read null
2026-03-30 Gen-Searcher: Reinforcing Agentic Search for Image Generation Kaituo Feng et.al. 2603.28767 translate read null
2026-03-30 SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning Philip Schroeder et.al. 2603.28730 translate read null
2026-03-30 Stepwise Credit Assignment for GRPO on Flow-Matching Models Yash Savani et.al. 2603.28718 translate read null
2026-03-30 Dynamic Dual-Granularity Skill Bank for Agentic RL Songjun Tu et.al. 2603.28716 translate read null
2026-03-30 DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing Kailai Feng et.al. 2603.28713 translate read null
2026-03-30 Dynamic Lookahead Distance via Reinforcement Learning-Based Pure Pursuit for Autonomous Racing Mohamed Elgouhary et.al. 2603.28625 translate read null
2026-03-30 Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning Ziqi Miao et.al. 2603.28618 translate read null
2026-03-30 Learning Partial Action Replacement in Offline MARL Yue Jin et.al. 2603.28573 translate read null
2026-03-30 GraphWalker: Agentic Knowledge Graph Question Answering via Synthetic Trajectory Curriculum Shuwen Xu et.al. 2603.28533 translate read null
2026-03-30 Intelligent Radio Resource Slicing for 6G In-Body Subnetworks Samira Abdelrahman et.al. 2603.28529 translate read null
2026-03-30 Tac2Real: Reliable and GPU Visuotactile Simulation for Online Reinforcement Learning and Zero-Shot Real-World Deployment Ningyu Yan et.al. 2603.28475 translate read null
2026-03-30 CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains Wenhan Wang et.al. 2603.28474 translate read null
2026-03-30 $R_{dm}$: Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation Linqian Fan et.al. 2603.28460 translate read null
2026-03-30 Active Stereo-Camera Outperforms Multi-Sensor Setup in ACT Imitation Learning for Humanoid Manipulation Robin Kühn et.al. 2603.28422 translate read null
2026-03-30 Learning unified control of internal spin squeezing in atomic qudits for magnetometry C. Z. Cao et.al. 2603.28421 translate read null
2026-03-30 Evolutionary Discovery of Reinforcement Learning Algorithms via Large Language Models Alkis Sygkounas et.al. 2603.28416 translate read null
2026-03-30 Critic-Free Deep Reinforcement Learning for Maritime Coverage Path Planning on Irregular Hexagonal Grids Carlos S. Sepúlveda et.al. 2603.28385 translate read null
2026-03-30 Rethinking Structure Preservation in Text-Guided Image Editing with Visual Autoregressive Models Tao Xia et.al. 2603.28367 translate read null
2026-03-30 Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization He Du et.al. 2603.28342 translate read null
2026-03-30 Competitor-aware Race Management for Electric Endurance Racing Wytze de Vries et.al. 2603.28286 translate read null
2026-03-30 Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback Andi Nika et.al. 2603.28281 translate read null
2026-03-30 Cost-Matching Model Predictive Control for Efficient Reinforcement Learning in Humanoid Locomotion Wenqi Cai et.al. 2603.28243 translate read null
2026-03-30 ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models Song Yu et.al. 2603.28204 translate read null
2026-03-30 A Deep Reinforcement Learning Framework for Closed-loop Guidance of Fish Schools via Virtual Agents Takato Shibayama et.al. 2603.28200 translate read null
2026-03-30 MedLoc-R1: Performance-Aware Curriculum Reward Scheduling for GRPO-Based Medical Visual Grounding Guangjing Yang et.al. 2603.28120 translate read null
2026-03-30 $AutoDrive\text{-}P^3$: Unified Chain of Perception-Prediction-Planning Thought via Reinforcement Fine-Tuning Yuqi Ye et.al. 2603.28116 translate read null
2026-03-30 Heddle: A Distributed Orchestration System for Agentic RL Rollout Zili Zhang et.al. 2603.28101 translate read null
2026-03-30 Koopman-based surrogate modeling for reinforcement-learning-control of Rayleigh-Benard convection Tim Plotzki et.al. 2603.28074 translate read null
2026-03-30 Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL Udita Ghosh et.al. 2603.28053 translate read null
2026-03-30 CARLA-Air: Fly Drones Inside a CARLA World – A Unified Infrastructure for Air-Ground Embodied Intelligence Tianle Zeng et.al. 2603.28032 translate read null
2026-03-30 Energy-Aware Imitation Learning for Steering Prediction Using Events and Frames Hu Cao et.al. 2603.28008 translate read null
2026-03-30 SARL: Label-Free Reinforcement Learning by Rewarding Reasoning Topology Yifan Wang et.al. 2603.27977 translate read null
2026-03-30 Principal Prototype Analysis on Manifold for Interpretable Reinforcement Learning Bodla Krishna Vamshi et.al. 2603.27971 translate read null
2026-03-30 Flip Stunts on Bicycle Robots using Iterative Motion Imitation Jeonghwan Kim et.al. 2603.27944 translate read null
2026-03-25 DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving Pengxuan Yang et.al. 2603.24587 translate read null
2026-03-25 MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination Zhuo Li et.al. 2603.24579 translate read null
2026-03-25 VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models Qijia He et.al. 2603.24575 translate read null
2026-03-25 Completeness of Unbounded Best-First Minimax and Descent Minimax Quentin Cohen-Solal et.al. 2603.24572 translate read null
2026-03-25 Composer 2 Technical Report Cursor Research et.al. 2603.24477 translate read null
2026-03-25 Improving Lean4 Autoformalization via Cycle Consistency Fine-tuning Arsen Shebzukhov et.al. 2603.24372 translate read null
2026-03-25 CoordLight: Learning Decentralized Coordination for Network-Wide Traffic Signal Control Yifeng Zhang et.al. 2603.24366 translate read null
2026-03-25 LATS: Large Language Model Assisted Teacher-Student Framework for Multi-Agent Reinforcement Learning in Traffic Signal Control Yifeng Zhang et.al. 2603.24361 translate read null
2026-03-25 Large Language Model Guided Incentive Aware Reward Design for Cooperative Multi-Agent Reinforcement Learning Dogan Urgun et.al. 2603.24324 translate read null
2026-03-25 Heuristic Self-Paced Learning for Domain Adaptive Semantic Segmentation under Adverse Conditions Shiqin Wang et.al. 2603.24322 translate read null
2026-03-25 C-STEP: Continuous Space-Time Empowerment for Physics-informed Safe Reinforcement Learning of Mobile Agents Guihlerme Daubt et.al. 2603.24241 translate read null
2026-03-25 Decentralized End-to-End Multi-AAV Pursuit Using Predictive Spatio-Temporal Observation via Deep Reinforcement Learning Yude Li et.al. 2603.24238 translate read null
2026-03-25 SumRank: Aligning Summarization Models for Long-Document Listwise Reranking Jincheng Feng et.al. 2603.24204 translate read null
2026-03-25 A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula Cansu Sancaktar et.al. 2603.24202 translate read null
2026-03-25 Optimized control protocols for stable skyrmion creation using deep reinforcement learning Ji Seok Song et.al. 2603.24177 translate read null
2026-03-25 A Longitudinal Analysis of the CEC Single-Objective Competitions (2010-2024) and Implications for Variational Quantum Optimization Vojtěch Novák et.al. 2603.24140 translate read null
2026-03-25 Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection Zhanhe Lei et.al. 2603.24139 translate read null
2026-03-25 Likelihood hacking in probabilistic program synthesis Jacek Karwowski et.al. 2603.24126 translate read null
2026-03-25 Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization Fei Bai et.al. 2603.24093 translate read null
2026-03-25 Knowledge-Guided Manipulation Using Multi-Task Reinforcement Learning Aditya Narendra et.al. 2603.24083 translate read null
2026-03-25 PCHC: Enabling Preference Conditioned Humanoid Control via Multi-Objective Reinforcement Learning Huanyu Li et.al. 2603.24047 translate read null
2026-03-25 Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage Rishikesh Sahay et.al. 2603.23966 translate read null
2026-03-25 From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments Lijing Luo et.al. 2603.23964 translate read null
2026-03-25 PointRFT: Explicit Reinforcement Fine-tuning for Point Cloud Few-shot Learning Yankai Wang et.al. 2603.23957 translate read null
2026-03-25 Optimal Variance-Dependent Regret Bounds for Infinite-Horizon MDPs Guy Zamir et.al. 2603.23926 translate read null
2026-03-25 Off-Policy Safe Reinforcement Learning with Constrained Optimistic Exploration Guopeng Li et.al. 2603.23889 translate read null
2026-03-25 ProcureGym: A Multi-Agent Markov Game Framework for Modeling National Volume-based Drug Procurement Jia Wang et.al. 2603.23880 translate read null
2026-03-25 The DeepXube Software Package for Solving Pathfinding Problems with Learned Heuristic Functions and Search Forest Agostinelli et.al. 2603.23873 translate read null
2026-03-25 HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation Ken Ding et.al. 2603.23871 translate read null
2026-03-25 Joint Source-Channel-Check Coding with HARQ for Reliable Semantic Communications Boyuan Li et.al. 2603.23869 translate read null
2026-03-25 Learning-guided Prioritized Planning for Lifelong Multi-Agent Path Finding in Warehouse Automation Han Zheng et.al. 2603.23838 translate read null
2026-03-25 Human, AI, and Hybrid Ensembles for Detection of Adaptive, RL-based Social Bots Valerio La Gatta et.al. 2603.23796 translate read null
2026-03-24 Self Paced Gaussian Contextual Reinforcement Learning Mohsen Sahraei Ardakani et.al. 2603.23755 translate read null
2026-03-24 BXRL: Behavior-Explainable Reinforcement Learning Ram Rachum et.al. 2603.23738 translate read null
2026-03-24 Dual-Gated Epistemic Time-Dilation: Autonomous Compute Modulation in Asynchronous MARL Igor Jankowski et.al. 2603.23722 translate read null
2026-03-24 Utilizing Adversarial Training for Robust Voltage Control: An Adaptive Deep Reinforcement Learning Method Sungjoo Chung et.al. 2603.23648 translate read null
2026-03-24 Safe Reinforcement Learning with Preference-based Constraint Inference Chenglin Li et.al. 2603.23565 translate read null
2026-03-21 Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction Haoyu Wang et.al. 2603.23550 translate read null
2026-03-24 UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation Jie Liu et.al. 2603.23500 translate read null
2026-03-24 WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Zhen Li et.al. 2603.23497 translate read null
2026-03-24 End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions Zakaria Mhammedi et.al. 2603.23461 translate read null
2026-03-24 SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling Yiqi Zhang et.al. 2603.23414 translate read null
2026-03-24 A Joint Reinforcement Learning Scheduling and Compression Framework for Teleoperated Driving Giacomo Avanzi et.al. 2603.23387 translate read null
2026-03-24 Off-Policy Value-Based Reinforcement Learning for Large Language Models Peng-Yuan Wang et.al. 2603.23355 translate read null
2026-03-24 Learning Multi-Agent Local Collision-Avoidance for Collaborative Carrying tasks with Coupled Quadrupedal Robots Francesca Bray et.al. 2603.23278 translate read null
2026-03-24 A Learning Method with Gap-Aware Generation for Heterogeneous DAG Scheduling Ruisong Zhou et.al. 2603.23249 translate read null
2026-03-24 Neural ODE and SDE Models for Adaptation and Planning in Model-Based Reinforcement Learning Chao Han et.al. 2603.23245 translate read null
2026-03-24 GEM: Guided Expectation-Maximization for Behavior-Normalized Candidate Action Selection in Offline RL Haoyu Wang et.al. 2603.23232 translate read null
2026-03-24 ImplicitRM: Unbiased Reward Modeling from Implicit Preference Data for LLM alignment Hao Wang et.al. 2603.23184 translate read null
2026-03-24 Path Planning and Reinforcement Learning-Driven Control of On-Orbit Free-Flying Multi-Arm Robots Álvaro Belmonte-Baeza et.al. 2603.23182 translate read null
2026-03-24 Fault-Tolerant Design and Multi-Objective Model Checking for Real-Time Deep Reinforcement Learning Systems Guoxin Su et.al. 2603.23113 translate read null
2026-03-24 SpecXMaster Technical Report Yutang Ge et.al. 2603.23101 translate read null
2026-03-24 Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards Orhun Buğra Baran et.al. 2603.23086 translate read null
2026-03-24 MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models Jianxin Lin et.al. 2603.23085 translate read null
2026-03-24 Minimizing Material Waste in Additive Manufacturing through Online Reel Assignment Ilayda Celenk et.al. 2603.23042 translate read null
2026-03-24 From Morality Installation in LLMs to LLMs in Morality-as-a-System Gunter Bombaerts et.al. 2603.22944 translate read null
2026-03-24 Quality Over Clicks: Intrinsic Quality-Driven Iterative Reinforcement Learning for Cold-Start E-Commerce Query Suggestion Qi Sun et.al. 2603.22922 translate read null
2026-03-24 EVA: Efficient Reinforcement Learning for End-to-End Video Agent Yaolun Zhang et.al. 2603.22918 translate read null
2026-03-24 VLGOR: Visual-Language Knowledge Guided Offline Reinforcement Learning for Generalizable Agents Pengsen Liu et.al. 2603.22892 translate read null
2026-03-24 Portfolio Optimization under Recursive Utility via Reinforcement Learning Minkey Chang et.al. 2603.22880 translate read null
2026-03-24 Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models Ruixing Jin et.al. 2603.22876 translate read null
2026-03-24 DecompGrind: A Decomposition Framework for Robotic Grinding via Cutting-Surface Planning and Contact-Force Adaptation Shunsuke Araki et.al. 2603.22859 translate read null
2026-03-24 Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought Yunheng Li et.al. 2603.22847 translate read null
2026-03-24 CoMaTrack: Competitive Multi-Agent Game-Theoretic Tracking with Vision-Language-Action Models Youzhi Liu et.al. 2603.22846 translate read null
2026-03-24 Improving Safety Alignment via Balanced Direct Preference Optimization Shiji Zhao et.al. 2603.22829 translate read null
2026-03-24 SG-VLA: Learning Spatially-Grounded Vision-Language-Action Models for Mobile Manipulation Ruisen Tu et.al. 2603.22760 translate read null
2026-03-24 Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints Tian Xu et.al. 2603.22713 translate read null
2026-03-23 Q-Tacit: Image Quality Assessment via Latent Visual Reasoning Yuxuan Jiang et.al. 2603.22641 translate read null
2026-03-23 Privacy-Preserving Reinforcement Learning from Human Feedback via Decoupled Reward Modeling Young Hyun Cho et.al. 2603.22563 translate read null
2026-03-23 Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion Honglin He et.al. 2603.22527 translate read null
2026-03-23 Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs Haoming Meng et.al. 2603.22446 translate read null
2026-03-23 CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation Max Fu et.al. 2603.22435 translate read null
2026-03-23 Model Predictive Control with Differentiable World Models for Offline Reinforcement Learning Rohan Deb et.al. 2603.22430 translate read null
2026-03-23 Learning When to Act: Interval-Aware Reinforcement Learning with Predictive Temporal Structure Davide Di Gioia et.al. 2603.22384 translate read null
2026-03-22 WIST: Web-Grounded Iterative Self-Play Tree for Domain-Targeted Reasoning Improvement Fangyuan Li et.al. 2603.22352 translate read null
2026-03-19 The Efficiency Attenuation Phenomenon: A Computational Challenge to the Language of Thought Hypothesis Di Zhang et.al. 2603.22312 translate read null
2026-03-23 Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration Zakaria Mhammedi et.al. 2603.22273 translate read null
2026-03-23 TiCo: Time-Controllable Training for Spoken Dialogue Models Kai-Wei Chang et.al. 2603.22267 translate read null
2026-03-23 DexDrummer: In-Hand, Contact-Rich, and Long-Horizon Dexterous Robot Drumming Hung-Chieh Fang et.al. 2603.22263 translate read null
2026-03-23 SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation Sashuai Zhou et.al. 2603.22228 translate read null
2026-03-23 Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control Qingrui Zhao et.al. 2603.22201 translate read null
2026-03-23 Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement Junrong Guo et.al. 2603.22187 translate read null
2026-03-23 Cross-Modal Reinforcement Learning for Navigation with Degraded Depth Measurements Omkar Sawant et.al. 2603.22182 translate read null
2026-03-23 Closed-Loop Verbal Reinforcement Learning for Task-Level Robotic Planning Dmitrii Plotnikov et.al. 2603.22169 translate read null
2026-03-23 On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation Kexin Huang et.al. 2603.22117 translate read null
2026-03-23 A Context Engineering Framework for Improving Enterprise AI Agents based on Digital-Twin MDP Xi Yang et.al. 2603.22083 translate read null
2026-03-23 MEVIUS2: Practical Open-Source Quadruped Robot with Sheet Metal Welding and Multimodal Perception Kento Kawaharazuka et.al. 2603.22031 translate read null
2026-03-23 TREX: Trajectory Explanations for Multi-Objective Reinforcement Learning Dilina Rajapakse et.al. 2603.21988 translate read null
2026-03-23 Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe Xixi Wu et.al. 2603.21972 translate read null
2026-03-23 Deep Reinforcement Learning and The Tale of Two Temporal Difference Errors Juan Sebastian Rojas et.al. 2603.21921 translate read null
2026-03-23 P^2O: Joint Policy and Prompt Optimization Xinyu Lu et.al. 2603.21877 translate read null
2026-03-23 Manifold-Aware Exploration for Reinforcement Learning in Video Generation Mingzhe Zheng et.al. 2603.21872 translate read null
2026-03-23 Agentic Personas for Adaptive Scientific Explanations with Knowledge Graphs Susana Nunes et.al. 2603.21846 translate read null
2026-03-23 Partial Attention in Deep Reinforcement Learning for Safe Multi-Agent Control Turki Bin Mohaya et.al. 2603.21810 translate read null
2026-03-23 Image-Conditioned Adaptive Parameter Tuning for Visual Odometry Frontends Simone Nascivera et.al. 2603.21785 translate read null
2026-03-23 CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement Learning Dongxia Wu et.al. 2603.21743 translate read null
2026-03-23 EvoIdeator: Evolving Scientific Ideas through Checklist-Grounded Reinforcement Learning Andreas Sauter et.al. 2603.21728 translate read null
2026-03-23 PPGL-Swarm: Integrated Multimodal Risk Stratification and Hereditary Syndrome Detection in Pheochromocytoma and Paraganglioma Zelin Liu et.al. 2603.21700 translate read null
2026-03-23 TAMTRL: Teacher-Aligned Reward Reshaping for Multi-Turn Reinforcement Learning in Long-Context Compression Li Wang et.al. 2603.21663 translate read null
2026-03-23 Proximal Policy Optimization in Path Space: A Schrödinger Bridge Perspective Yuehu Gong et.al. 2603.21621 translate read null
2026-03-23 Spatio-Temporal Attention Enhanced Multi-Agent DRL for UAV-Assisted Wireless Networks with Limited Communications Che Chen et.al. 2603.21594 translate read null
2026-03-23 Adaptive Robust Estimator for Multi-Agent Reinforcement Learning Zhongyi Li et.al. 2603.21574 translate read null
2026-03-23 Counterfactual Credit Policy Optimization for Multi-Agent Collaboration Zhongyi Li et.al. 2603.21563 translate read null
2026-03-23 What Do World Models Learn in RL? Probing Latent Representations in Learned Environment Simulators Xinyu Zhang et.al. 2603.21546 translate read null
2026-03-23 VIGIL: Part-Grounded Structured Reasoning for Generalizable Deepfake Detection Xinghan Li et.al. 2603.21526 translate read null
2026-03-23 Learning Can Converge Stably to the Wrong Belief under Latent Reliability Zhipeng Zhang et.al. 2603.21491 translate read null
2026-03-23 DRTriton: Large-Scale Synthetic Data Reinforcement Learning for Triton Kernel Generation Siqi Guo et.al. 2603.21465 translate read null
2026-03-22 KG-Hopper: Empowering Compact Open LLMs with Knowledge Graph Reasoning via Reinforcement Learning Shuai Wang et.al. 2603.21440 translate read null
2026-03-22 Dynasto: Validity-Aware Dynamic-Static Parameter Optimization for Autonomous Driving Testing Dmytro Humeniuk et.al. 2603.21427 translate read null
2026-03-22 PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost Junkeun Yi et.al. 2603.21383 translate read null
2026-03-22 A transformer architecture alteration to incentivise externalised reasoning Elizabeth Pavlova et.al. 2603.21376 translate read null
2026-03-22 RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models Dongyoung Kim et.al. 2603.21341 translate read null
2026-03-22 FinRL-X: An AI-Native Modular Infrastructure for Quantitative Trading Hongyang Yang et.al. 2603.21330 translate read null
2026-03-22 DeepXplain: XAI-Guided Autonomous Defense Against Multi-Stage APT Campaigns Trung V. Phan et.al. 2603.21296 translate read null
2026-03-22 Prompt replay: speeding up GRPO with on-policy reuse of high-signal prompts Andrei Baroian et.al. 2603.21177 translate read null
2026-03-22 Reward Sharpness-Aware Fine-Tuning for Diffusion Models Kwanyoung Kim et.al. 2603.21175 translate read null
2026-03-22 Rethinking Plasticity in Deep Reinforcement Learning Zhiqiang He et.al. 2603.21173 translate read null
2026-03-22 Revisiting Tree Search for LLMs: Gumbel and Sequential Halving for Budget-Scalable Reasoning Leonid Ugadiarov et.al. 2603.21162 translate read null
2026-03-22 Incentivizing Generative Zero-Shot Learning via Outcome-Reward Reinforcement Learning with Visual Cues Wenjin Hou et.al. 2603.21138 translate read null
2026-03-22 Anatomical Prior-Driven Framework for Autonomous Robotic Cardiac Ultrasound Standard View Acquisition Zhiyan Cao et.al. 2603.21134 translate read null
2026-03-22 VisFly-Lab: Unified Differentiable Framework for First-Order Reinforcement Learning of Quadrotor Control Fanxing Li et.al. 2603.21123 translate read null
2026-03-22 Learning to Optimize Joint Source and RIS-assisted Channel Encoding for Multi-User Semantic Communication Systems Haidong Wang et.al. 2603.21097 translate read null
2026-03-22 DRL-driven Online Optimization for Joint Traffic Reshaping and Channel Reconfiguration in RIS-assisted Semantic NOMA Communications Songhan Zhao et.al. 2603.21093 translate read null
2026-03-22 LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Jianing Wang et.al. 2603.21065 translate read null
2026-03-22 OrbitStream: Training-Free Adaptive 360-degree Video Streaming via Semantic Potential Fields Aizierjiang Aiersilan et.al. 2603.20999 translate read null
2026-03-22 The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes Benedikt Hornig et.al. 2603.20994 translate read null
2026-03-21 Cyber Deception for Mission Surveillance via Hypergame-Theoretic Deep Reinforcement Learning Zelin Wan et.al. 2603.20981 translate read null
2026-03-21 Deep Adaptive Rate Allocation in Volatile Heterogeneous Wireless Networks Gregorio Maglione et.al. 2603.20926 translate read null
2026-03-21 EruDiff: Refactoring Knowledge in Diffusion Models for Advanced Text-to-Image Synthesis Xiefan Guo et.al. 2603.20828 translate read null
2026-03-21 RLVR Training of LLMs Does Not Improve Thinking Ability for General QA: Evaluation Method and a Simple Solution Kaiyuan Li et.al. 2603.20799 translate read null
2026-03-21 Enhanced Direction-Sensing Methods and Performance Analysis in Low-Altitude Wireless Network via a Rotation Antenna Array Jinbing Jiang et.al. 2603.20784 translate read null
2026-03-21 Decoupling Numerical and Structural Parameters: An Empirical Study on Adaptive Genetic Algorithms via Deep Reinforcement Learning for the Large-Scale TSP Hongyu Wang et.al. 2603.20702 translate read null
2026-03-21 Clinical Cognition Alignment for Gastrointestinal Diagnosis with Multimodal LLMs Huan Zheng et.al. 2603.20698 translate read null
2026-03-21 AI-Driven Multi-Agent Simulation of Stratified Polyamory Systems: A Computational Framework for Optimizing Social Reproductive Efficiency Yicai Xing et.al. 2603.20678 translate read null
2026-03-21 Speedup Patch: Learning a Plug-and-Play Policy to Accelerate Embodied Manipulation Zhichao Wu et.al. 2603.20658 translate read null
2026-03-21 Hierarchical Reinforcement Learning for Next Generation of Multi-AP Coordinated Spatial Reuse Ziru Chen et.al. 2603.20647 translate read null
2026-03-21 Reinforcement Learning-Based Secure Near-field Directional Modulation Enhanced by Rotatable RIS Yongqiang Li et.al. 2603.20608 translate read null
2026-03-21 Towards Practical World Model-based Reinforcement Learning for Vision-Language-Action Models Zhilong Zhang et.al. 2603.20607 translate read null
2026-03-21 Current state of the multi-agent multi-view experimental and digital twin rendezvous (MMEDR-Autonomous) framework Logan Banker et.al. 2603.20575 translate read null
2026-03-20 Delightful Distributed Policy Gradient Ian Osband et.al. 2603.20521 translate read null
2026-03-20 Grounded Chess Reasoning in Language Models via Master Distillation Zhenwei Tang et.al. 2603.20510 translate read null
2026-03-20 Fluid Antenna Networks Beyond Beamforming: An AI-Native Control Paradigm for 6G Ian F. Akyildiz et.al. 2603.20484 translate read null
2026-03-20 Reinforcement Learning from Multi-Source Imperfect Preferences: Best-of-Both-Regimes Regret Ming Shi et.al. 2603.20453 translate read null
2026-03-20 SymCircuit: Bayesian Structure Inference for Tractable Probabilistic Circuits via Entropy-Regularized Reinforcement Learning Y. Sungtaek Ju et.al. 2603.20392 translate read null
2026-03-20 CAMA: Exploring Collusive Adversarial Attacks in c-MARL Men Niu et.al. 2603.20390 translate read null
2026-03-20 Leum-VL Technical Report Yuxuan He et.al. 2603.20354 translate read null
2026-03-20 Bounded Coupled AI Learning Dynamics in Tri-Hierarchical Drone Swarms Oleksii Bychkov et.al. 2603.20333 translate read null
2026-03-19 MARLIN: Multi-Agent Reinforcement Learning for Incremental DAG Discovery Dong Li et.al. 2603.20295 translate read null
2026-03-17 Learning Communication Between Heterogeneous Agents in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence Alex Popa et.al. 2603.20279 translate read null
2026-03-20 AGILE: A Comprehensive Workflow for Humanoid Loco-Manipulation Learning Huihua Zhao et.al. 2603.20147 translate read null
2026-03-20 Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning Jiajie Li et.al. 2603.20116 translate read null
2026-03-20 Fine-tuning Timeseries Predictors Using Reinforcement Learning Hugo Cazaux et.al. 2603.20063 translate read null
2026-03-20 Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs Wenjian Zhang et.al. 2603.20046 translate read null
2026-03-20 ReViSQL: Achieving Human-Level Text-to-SQL Yuxuan Zhu et.al. 2603.20004 translate read null
2026-03-20 Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States Yurun Yuan et.al. 2603.19987 translate read null
2026-03-20 Interpreting Reinforcement Learning Model Behavior via Koopman with Control William T. Redman et.al. 2603.19968 translate read null
2026-03-20 GustPilot: A Hierarchical DRL-INDI Framework for Wind-Resilient Quadrotor Navigation Amir Atef Habel et.al. 2603.19966 translate read null
2026-03-20 SAGE: Sustainable Agent-Guided Expert-tuning for Culturally Attuned Translation in Low-Resource Southeast Asia Zhixiang Lu et.al. 2603.19931 translate read null
2026-03-20 Robust Beam Codebooks for mmWave/THz Systems: Toward a Stochastic RL Approach Anouar Nechi et.al. 2603.19930 translate read null
2026-03-20 Learning Adaptive Parameter Policies for Nonlinear Bayesian Filtering Ondrej Straka et.al. 2603.19910 translate read null
2026-03-20 What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time Dong Yan et.al. 2603.19880 translate read null
2026-03-20 NASimJax: GPU-Accelerated Policy Learning Framework for Penetration Testing Raphael Simon et.al. 2603.19864 translate read null
2026-03-20 FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Chiyu Ma et.al. 2603.19835 translate read null
2026-03-20 Generalized Task-Driven Design of Soft Robots via Reduced-Order FEM-based Surrogate Modeling Yao Yao et.al. 2603.19794 translate read null
2026-03-20 FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment Kewen Zhu et.al. 2603.19741 translate read null
2026-03-20 LoopRPT: Reinforcement Pre-Training for Looped Language Models Guo Tang et.al. 2603.19714 translate read null
2026-03-20 A Subgoal-driven Framework for Improving Long-Horizon LLM Agents Taiyi Wang et.al. 2603.19685 translate read null
2026-03-20 Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis Siddharth Chandak et.al. 2603.19648 translate read null
2026-03-20 ContractionPPO: Certified Reinforcement Learning via Differentiable Contraction Layers Vrushabh Zinage et.al. 2603.19632 translate read null
2026-03-20 DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management Yaqi Xie et.al. 2603.19621 translate read null
2026-03-20 SaFRO: Satisfaction-Aware Fusion via Dual-Relative Policy Optimization for Short-Video Search Renzhe Zhou et.al. 2603.19585 translate read null
2026-03-20 PA2D-MORL: Pareto Ascent Directional Decomposition based Multi-Objective Reinforcement Learning Tianmeng Hu et.al. 2603.19579 translate read null
2026-03-20 Learning to Bet for Horizon-Aware Anytime-Valid Testing Ege Onur Taga et.al. 2603.19551 translate read null
2026-03-20 EvidenceRL: Reinforcing Evidence Consistency for Trustworthy Language Models J. Ben Tamo et.al. 2603.19532 translate read null
2026-03-19 Stochastic Sequential Decision Making over Expanding Networks with Graph Filtering Zhan Gao et.al. 2603.19501 translate read null
2026-03-19 Teaching an Agent to Sketch One Part at a Time Xiaodan Du et.al. 2603.19500 translate read null
2026-03-19 Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids Lucas Ferraz et.al. 2603.19473 translate read null
2026-03-19 ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models Thomas De Min et.al. 2603.19466 translate read null
2026-03-19 Deep Hilbert–Galerkin Methods for Infinite-Dimensional PDEs and Optimal Control Samuel N. Cohen et.al. 2603.19463 translate read null
2026-03-19 Cooperation and Exploitation in LLM Policy Synthesis for Sequential Social Dilemmas Víctor Gallego et.al. 2603.19453 translate read null
2026-03-19 Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning Xueqiao Peng et.al. 2603.19397 translate read null
2026-03-18 Goedel-Code-Prover: Hierarchical Proof Search for Open State-of-the-Art Code Verification Zenan Li et.al. 2603.19329 translate read null
2026-03-19 OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards Zehao Li et.al. 2603.19191 translate read null
2026-03-19 Markov Potential Game and Multi-Agent Reinforcement Learning for Autonomous Driving Huiwen Yan et.al. 2603.19188 translate read null
2026-03-19 Box Maze: A Process-Control Architecture for Reliable LLM Reasoning Zou Qiang et.al. 2603.19182 translate read null
2026-03-19 VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models Chonghan Liu et.al. 2603.19152 translate read null
2026-03-19 Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control Mohammad Al Ridhawi et.al. 2603.19136 translate read null
2026-03-19 Variational and Annealing-Based Approaches to Quantum Combinatorial Optimization Hala Hawashin et.al. 2603.19117 translate read null
2026-03-19 Articulated-Body Dynamics Network: Dynamics-Grounded Prior for Robot Learning Sangwoo Shin et.al. 2603.19078 translate read null
2026-03-19 MoRI: Learning Motivation-Grounded Reasoning for Scientific Ideation in Large Language Models Chenyang Gu et.al. 2603.19044 translate read null
2026-03-19 CRAFT: Aligning Diffusion Models with Fine-Tuning Is Easier Than You Think Zening Sun et.al. 2603.18991 translate read null
2026-03-19 Maximum-Entropy Exploration with Future State-Action Visitation Measures Adrien Bolland et.al. 2603.18965 translate read null
2026-03-19 Context Bootstrapped Reinforcement Learning Saaket Agashe et.al. 2603.18953 translate read null
2026-03-19 Safety-Guaranteed Imitation Learning from Nonlinear Model Predictive Control for Spacecraft Close Proximity Operations Alexander Meinert et.al. 2603.18910 translate read null
2026-03-19 MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model Youngwan Lee et.al. 2603.18892 translate read null
2026-03-19 Bridging Network Fragmentation: A Semantic-Augmented DRL Framework for UAV-aided VANETs Gaoxiang Cao et.al. 2603.18871 translate read null
2026-03-19 RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models Xiao Feng et.al. 2603.18859 translate read null
2026-03-19 Learn for Variation: Variationally Guided AAV Trajectory Learning in Differentiable Environments Xiucheng Wang et.al. 2603.18853 translate read null
2026-03-19 ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents Hao Zhang et.al. 2603.18815 translate read null
2026-03-19 V-Dreamer: Automating Robotic Simulation and Trajectory Synthesis via Video Generation Priors Songjia He et.al. 2603.18811 translate read null
2026-03-19 Mi:dm K 2.5 Pro KT Tech innovation Group et.al. 2603.18788 translate read null
2026-03-19 ViTac-Tracing: Visual-Tactile Imitation Learning of Deformable Object Tracing Yongqiang Zhao et.al. 2603.18784 translate read null
2026-03-19 Automatic Configuration of LLM Post-Training Pipelines Channe Chwa et.al. 2603.18773 translate read null
2026-03-19 Memento-Skills: Let Agents Design Agents Huichi Zhou et.al. 2603.18743 translate read null
2026-03-19 CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks Hao Wang et.al. 2603.18736 translate read null
2026-03-19 HISR: Hindsight Information Modulated Segmental Process Rewards For Multi-turn Agentic Reinforcement Learning Zhicong Lu et.al. 2603.18683 translate read null
2026-03-19 Thinking with Constructions: A Benchmark and Policy Optimization for Visual-Text Interleaved Geometric Reasoning Haokun Zhao et.al. 2603.18662 translate read null
2026-03-19 Balanced Thinking: Improving Chain of Thought Training in Vision Language Models Shaked Perek et.al. 2603.18656 translate read null
2026-03-19 Learning to Self-Evolve Xiaoyin Chen et.al. 2603.18620 translate read null
2026-03-19 iSatCR: Graph-Empowered Joint Onboard Computing and Routing for LEO Data Delivery Jiangtao Luo et.al. 2603.18539 translate read null
2026-03-19 Balancing the Reasoning Load: Difficulty-Differentiated Policy Optimization with Length Redistribution for Efficient and Robust Reinforcement Learning Yinan Xia et.al. 2603.18533 translate read null
2026-03-19 Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds Andrew Choi et.al. 2603.18532 translate read null
2026-03-19 AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models Chengxuan Lu et.al. 2603.18464 translate read null
2026-03-19 Discounted Beta–Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards Haechan Kim et.al. 2603.18444 translate read null
2026-03-19 Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation Asmita Bhardwaj et.al. 2603.18428 translate read null
2026-03-19 Efficient and Versatile Quadrupedal Skating: Optimal Co-design via Reinforcement Learning and Bayesian Optimization Hanwen Wang et.al. 2603.18408 translate read null
2026-03-19 RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach Yifan Zhang et.al. 2603.18396 translate read null
2026-03-19 Mathematical Foundations of Deep Learning Xiaojing Ye et.al. 2603.18387 translate read null
2026-03-19 PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching Ruishuo Chen et.al. 2603.18363 translate read null
2026-03-18 Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration Amirhossein Roknilamouki et.al. 2603.18326 translate read null
2026-03-18 Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum Nived Rajaraman et.al. 2603.18325 translate read null
2026-03-18 DriveVLM-RL: Neuroscience-Inspired Reinforcement Learning with Vision-Language Models for Safe and Deployable Autonomous Driving Zilin Huang et.al. 2603.18315 translate read null
2026-03-18 Approximate Subgraph Matching with Neural Graph Representations and Reinforcement Learning Kaiyang Li et.al. 2603.18314 translate read null
2026-03-18 Discovering What You Can Control: Interventional Boundary Discovery for Reinforcement Learning Jiaxin Liu et.al. 2603.18257 translate read null
2026-03-18 MolRGen: A Training and Evaluation Setting for De Novo Molecular Generation with Reasonning Models Philippe Formont et.al. 2603.18256 translate read null
2026-03-18 How Psychological Learning Paradigms Shaped and Constrained Artificial Intelligence Alex Anvi Eponon et.al. 2603.18203 translate read null
2026-03-18 R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation Naoki Morihira et.al. 2603.18202 translate read null
2026-03-18 Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models Yuhao Dong et.al. 2603.18118 translate read null
2026-03-18 BoundAD: Boundary-Aware Negative Generation for Time Series Anomaly Detection Xiancheng Wang et.al. 2603.18111 translate read null
2026-03-18 Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner Hao Ma et.al. 2603.18088 translate read null
2026-03-18 Uncovering Latent Phase Structures and Branching Logic in Locomotion Policies: A Case Study on HalfCheetah Daisuke Yasui et.al. 2603.18084 translate read null
2026-03-18 SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training Prince Zizhuang Wang et.al. 2603.18079 translate read null
2026-03-18 Lightweight Adaptation for LLM-based Technical Service Agent: Latent Logic Augmentation and Robust Noise Reduction Yi Yu et.al. 2603.18074 translate read null
2026-03-18 Reinforcement Learning for Fast and Robust Longitudinal Qubit Readout Yiming Yu et.al. 2603.18060 translate read null
2026-03-18 Unified Policy Value Decomposition for Rapid Adaptation Cristiano Capone et.al. 2603.17947 translate read null
2026-03-18 Training Diffusion Language Models for Black-Box Optimization Zipeng Sun et.al. 2603.17919 translate read null
2026-03-18 Operator-Theoretic Foundations and Policy Gradient Methods for General MDPs with Unbounded Costs Abhishek Gupta et.al. 2603.17875 translate read null
2026-03-18 Procedural Generation of Algorithm Discovery Tasks in Machine Learning Alexander D. Goldie et.al. 2603.17863 translate read null
2026-03-18 Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control Zunzhe Zhang et.al. 2603.17834 translate read null
2026-03-18 CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents Lintang Sutawika et.al. 2603.17829 translate read null
2026-03-18 Federated Distributional Reinforcement Learning with Distributional Critic Regularization David Millard et.al. 2603.17820 translate read null
2026-03-18 EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards Ruixiang Wang et.al. 2603.17808 translate read null
2026-03-18 CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution Teng Pan et.al. 2603.17775 translate read null
2026-03-18 Fast stabilizer state preparation via AI-optimized graph decimation Michael Doherty et.al. 2603.17743 translate read null
2026-03-18 VolumeDP: Modeling Volumetric Representation for Manipulation Policy Learning Tianxing Zhou et.al. 2603.17720 translate read null
2026-03-18 Machine Learning for Network Attacks Classification and Statistical Evaluation of Machine Learning for Network Attacks Classification and Adversarial Learning Methodologies for Synthetic Data Generation Iakovos-Christos Zarkadis et.al. 2603.17717 translate read null
2026-03-18 Flow Matching Policy with Entropy Regularization Ting Gao et.al. 2603.17685 translate read null
2026-03-18 Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards Philipp Normann et.al. 2603.17673 translate read null
2026-03-18 Benchmarking Reinforcement Learning via Stochastic Converse Optimality: Generating Systems with Known Optimal Policies Sinan Ibrahim et.al. 2603.17631 translate read null
2026-03-18 Complementary Reinforcement Learning Dilxat Muhtar et.al. 2603.17621 translate read null
2026-03-18 From Isolated Scoring to Collaborative Ranking: A Comparison-Native Framework for LLM-Based Paper Evaluation Pujun Zheng et.al. 2603.17588 translate read null
2026-03-18 Interpreting Context-Aware Human Preferences for Multi-Objective Robot Navigation Tharun Sethuraman et.al. 2603.17510 translate read null
2026-03-18 Efficient Soft Actor-Critic with LLM-Based Action-Level Guidance for Continuous Control Hao Ma et.al. 2603.17468 translate read null
2026-03-18 AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization Dailan He et.al. 2603.17461 translate read null
2026-03-18 CRE-T1 Preview Technical Report: Beyond Contrastive Learning for Reasoning-Intensive Retrieval Guangzhi Wang et.al. 2603.17387 translate read null
2026-03-18 Efficient Exploration at Scale Seyed Mohammad Asghari et.al. 2603.17378 translate read null
2026-03-18 EvoGuard: An Extensible Agentic RL-based Framework for Practical and Evolving AI-Generated Image Detection Chenyang Zhu et.al. 2603.17343 translate read null
2026-03-18 A Progressive Visual-Logic-Aligned Framework for Ride-Hailing Adjudication Weiming Wu et.al. 2603.17328 translate read null
2026-03-18 ShuttleEnv: An Interactive Data-Driven RL Environment for Badminton Strategy Modeling Ang Li et.al. 2603.17324 translate read null
2026-03-18 Physics-informed offline reinforcement learning eliminates catastrophic fuel waste in maritime routing Aniruddha Bora et.al. 2603.17319 translate read null
2026-03-18 Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress Yuelin Zhang et.al. 2603.17312 translate read null
2026-03-18 Ruyi2.5 Technical Report Huan Song et.al. 2603.17311 translate read null
2026-03-18 InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning Chengwei Wei et.al. 2603.17310 translate read null
2026-03-18 ReLMXEL: Adaptive RL-Based Memory Controller with Explainable Energy and Latency Optimization Panuganti Chirag Sai et.al. 2603.17309 translate read null
2026-03-18 Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations Haozheng Luo et.al. 2603.17305 translate read null
2026-03-18 WINFlowNets: Warm-up Integrated Networks Training of Generative Flow Networks for Robotics and Machine Fault Adaptation Zahin Sufiyan et.al. 2603.17301 translate read null
2026-03-18 Network and Device Level Cyber Deception for Contested Environments Using RL and LLMs Abhijeet Sahu et.al. 2603.17272 translate read null
2026-03-18 Adaptive Anchor Policies for Efficient 4D Gaussian Streaming Ashim Dahal et.al. 2603.17227 translate read null
2026-03-17 MetaClaw: Just Talk – An Agent That Meta-Learns and Evolves in the Wild Peng Xia et.al. 2603.17187 translate read null
2026-03-17 Shielded Reinforcement Learning Under Dynamic Temporal Logic Constraints Sadık Bera Yüksel et.al. 2603.17152 translate read null
2026-03-17 REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge Yasi Zhang et.al. 2603.17145 translate read null
2026-03-17 SLowRL: Safe Low-Rank Adaptation Reinforcement Learning for Locomotion Elham Daneshmand et.al. 2603.17092 translate read null
2026-03-17 CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning Weikun K. Zhang et.al. 2603.17075 translate read null
2026-03-17 PaAgent: Portrait-Aware Image Restoration Agent via Subjective-Objective Reinforcement Learning Yijian Wang et.al. 2603.17055 translate read null
2026-03-17 Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models Songchun Zhang et.al. 2603.17051 translate read null
2026-03-17 HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Shenzhi Wang et.al. 2603.17024 translate read null
2026-03-17 Efficient and Reliable Teleoperation through Real-to-Sim-to-Real Shared Autonomy Shuo Sha et.al. 2603.17016 translate read null
2026-03-17 Rewarding DINO: Predicting Dense Rewards with Vision Foundation Models Pierre Krack et.al. 2603.16978 translate read null
2026-03-17 DeepStage: Learning Autonomous Defense Policies Against Multi-Stage APT Campaigns Trung V. Phan et.al. 2603.16969 translate read null
2026-03-17 Efficient Reasoning on the Edge Yelysei Bondarenko et.al. 2603.16867 translate read null
2026-03-17 DreamPlan: Efficient Reinforcement Fine-Tuning of Vision-Language Planners via Video World Models Emily Yue-Ting Jia et.al. 2603.16860 translate read null
2026-03-17 Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning Jello Zhou et.al. 2603.16842 translate read null
2026-03-17 Learning to Present: Inverse Specification Rewards for Agentic Slide Generation Karthik Ragunath Ananda Kumar et.al. 2603.16839 translate read null
2026-03-17 Deep Reinforcement Learning-driven Edge Offloading for Latency-constrained XR pipelines Sourya Saha et.al. 2603.16823 translate read null
2026-03-17 Anticipatory Planning for Multimodal AI Agents Yongyuan Liang et.al. 2603.16777 translate read null
2026-03-16 GDPO-SR: Group Direct Preference Optimization for One-Step Generative Image Super-Resolution Qiaosi Yi et.al. 2603.16769 translate read null
2026-03-17 Learning Whole-Body Control for a Salamander Robot Mengze Tian et.al. 2603.16683 translate read null
2026-03-17 When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making Jun Liu et.al. 2603.16673 translate read null
2026-03-17 What if Pinocchio Were a Reinforcement Learning Agent: A Normative End-to-End Pipeline Benoît Alcaraz et.al. 2603.16651 translate read null
2026-03-17 Rationale Matters: Learning Transferable Rubrics via Proxy-Guided Critique for VLM Reward Models Weijie Qiu et.al. 2603.16600 translate read null
2026-03-17 When and Why Does Unsupervised RL Succeed in Mathematical Reasoning? A Manifold Envelopment Perspective Zelin Zhang et.al. 2603.16578 translate read null
2026-03-17 EmoLLM: Appraisal-Grounded Cognitive-Emotional Co-Reasoning in Large Language Models Yifei Zhang et.al. 2603.16553 translate read null
2026-03-17 Kamino: GPU-based Massively Parallel Simulation of Multi-Body Systems with Challenging Topologies Vassilios Tsounis et.al. 2603.16536 translate read null
2026-03-17 From the Inside Out: Progressive Distribution Refinement for Confidence Calibration Xizhong Yang et.al. 2603.16500 translate read null
2026-03-17 Multi-Agent Reinforcement Learning Counteracts Delayed CSI in Multi-Satellite Systems Marios Aristodemou et.al. 2603.16470 translate read null
2026-03-17 Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition Yu Liu et.al. 2603.16463 translate read null
2026-03-17 Agentic AI for SAGIN Resource Management: Semantic Awareness, Orchestration, and Optimization Linghao Zhang et.al. 2603.16458 translate read null
2026-03-17 TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas Ai Jian et.al. 2603.16448 translate read null
2026-03-17 Via Negativa for AI Alignment: Why Negative Constraints Are Structurally Superior to Positive Preferences Quan Cheng et.al. 2603.16417 translate read null
2026-03-17 Onboard MuJoCo-based Model Predictive Control for Shipboard Crane with Double-Pendulum Sway Suppression Oscar Pang et.al. 2603.16407 translate read null
2026-03-17 Deep Reinforcement Learning-Assisted Automated Operator Portfolio for Constrained Multi-objective Optimization Shuai Shao et.al. 2603.16401 translate read null
2026-03-17 Controlling Fish Schools via Reinforcement Learning of Virtual Fish Movement Yusuke Nishii et.al. 2603.16384 translate read null
2026-03-17 Agile Interception of a Flying Target using Competitive Reinforcement Learning Timothée Gavin et.al. 2603.16279 translate read null
2026-03-17 VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment Tengjiao Yin et.al. 2603.16271 translate read null
2026-03-17 Dual Consensus: Escaping from Spurious Majority in Unsupervised RLVR via Two-Stage Vote Mechanism Kaixuan Du et.al. 2603.16223 translate read null
2026-03-17 Offline Exploration-Aware Fine-Tuning for Long-Chain Mathematical Reasoning Yongyu Mu et.al. 2603.16206 translate read null
2026-03-17 Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning Haomin Wang et.al. 2603.16189 translate read null
2026-03-17 ECHO: Edge-Cloud Humanoid Orchestration for Language-to-Motion Control Haozhe Jia et.al. 2603.16188 translate read null
2026-03-17 Task-Specified Compliance Bounds for Humanoids via Lipschitz-Constrained Policies Zewen He et.al. 2603.16180 translate read null
2026-03-17 SQL-ASTRA: Alleviating Sparse Feedback in Agentic SQL via Column-Set Matching and Trajectory Aggregation Long Li et.al. 2603.16161 translate read null
2026-03-17 Execution-Grounded Credit Assignment for GRPO in Code Generation Abhijit Kumar et.al. 2603.16158 translate read null
2026-03-17 DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay Long Li et.al. 2603.16157 translate read null
2026-03-17 HIPO: Instruction Hierarchy via Constrained Reinforcement Learning Keru Chen et.al. 2603.16152 translate read null
2026-03-17 Communication-Aware Multi-Agent Reinforcement Learning for Decentralized Cooperative UAV Deployment Enguang Fan et.al. 2603.16141 translate read null
2026-03-17 Noisy Data is Destructive to Reinforcement Learning with Verifiable Rewards Yuxuan Zhu et.al. 2603.16140 translate read null
2026-03-17 SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding Songcheng Cai et.al. 2603.16124 translate read null
2026-03-17 Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models Yanru Wu et.al. 2603.16065 translate read null
2026-03-17 ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning Yu Li et.al. 2603.16060 translate read null
2026-03-17 Collaborative Temporal Feature Generation via Critic-Free Reinforcement Learning for Cross-User Sensor-Based Activity Recognition Xiaozhou Ye et.al. 2603.16043 translate read null
2026-03-16 Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning Jingxiang Chen et.al. 2603.15981 translate read null
2026-03-16 ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors Zifan Xu et.al. 2603.15956 translate read null
2026-03-16 Game-Theory-Assisted Reinforcement Learning for Border Defense: Early Termination based on Analytical Solutions Goutam Das et.al. 2603.15907 translate read null
2026-03-16 Counteractive RL: Rethinking Core Principles for Efficient and Scalable Deep Reinforcement Learning Ezgi Korkmaz et.al. 2603.15871 translate read null
2026-03-16 Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning Patrick Yin et.al. 2603.15789 translate read null
2026-03-16 CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving Yihong Guo et.al. 2603.15771 translate read null
2026-03-16 Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation Jacob Levy et.al. 2603.15759 translate read null
2026-03-16 Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models Lit Sin Tan et.al. 2603.15724 translate read null
2026-03-16 BadLLM-TG: A Backdoor Defender powered by LLM Trigger Generator Ruyi Zhang et.al. 2603.15692 translate read null
2026-03-16 GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering Xincheng Shuai et.al. 2603.15616 translate read null
2026-03-16 HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions Yukang Cao et.al. 2603.15612 translate read null
2026-03-16 Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning Aozhe Wang et.al. 2603.15611 translate read null
2026-03-16 From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation Yibin Liu et.al. 2603.15600 translate read null
2026-03-16 Unbiased and Biased Variance-Reduced Forward-Reflected-Backward Splitting Methods for Stochastic Composite Inclusions Quoc Tran-Dinh et.al. 2603.15576 translate read null
2026-03-16 Deep Reinforcement Learning for Fano Hypersurfaces Marc Truter et.al. 2603.15437 translate read null
2026-03-16 Listening to the Echo: User-Reaction Aware Policy Optimization via Scalar-Verbal Hybrid Reinforcement Learning Jing Ye et.al. 2603.15434 translate read null
2026-03-16 Gym-V: A Unified Vision Environment System for Agentic Vision Research Fanqing Meng et.al. 2603.15432 translate read null
2026-03-16 MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings Shahil Shaik et.al. 2603.15418 translate read null
2026-03-16 Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities Vanshaj Khattar et.al. 2603.15417 translate read null
2026-03-16 Fusian: Multi-LoRA Fusion for Fine-Grained Continuous MBTI Personality Control in Large Language Models Zehao Chen et.al. 2603.15405 translate read null
2026-03-16 Trajectory-Diversity-Driven Robust Vision-and-Language Navigation Jiangyang Li et.al. 2603.15370 translate read null
2026-03-16 NavThinker: Action-Conditioned World Models for Coupled Prediction and Planning in Social Navigation Tianshuai Hu et.al. 2603.15359 translate read null
2026-03-16 Evaluating the Robustness of Reinforcement Learning based Adaptive Traffic Signal Control Dickens Kwesiga et.al. 2603.15283 translate read null
2026-03-16 MoE-ACT: Scaling Multi-Task Bimanual Manipulation with Sparse Language-Conditioned Mixture-of-Experts Transformers Kangjun Guo et.al. 2603.15265 translate read null
2026-03-16 Probe-then-Plan: Environment-Aware Planning for Industrial E-commerce Search Mengxiang Chen et.al. 2603.15262 translate read null
2026-03-16 SAGE: Multi-Agent Self-Evolution for LLM Reasoning Yulin Peng et.al. 2603.15255 translate read null
2026-03-16 Towards Foundation Models for Consensus Rank Aggregation Yijun Jin et.al. 2603.15218 translate read null
2026-03-16 What Matters for Scalable and Robust Learning in End-to-End Driving Planners? David Holtz et.al. 2603.15185 translate read null
2026-03-16 Iterative Learning Control-Informed Reinforcement Learning for Batch Process Control Runze Lin et.al. 2603.15180 translate read null
2026-03-16 KiRAS: Keyframe Guided Self-Imitation for Robust and Adaptive Skill Learning in Quadruped Robots Xiaoyi Wei et.al. 2603.15179 translate read null
2026-03-16 Multi-Scale Control of Large Agent Populations: From Density Dynamics to Individual Actuation Mario di Bernardo et.al. 2603.15160 translate read null
2026-03-16 Master Micro Residual Correction with Adaptive Tactile Fusion and Force-Mixed Control for Contact-Rich Manipulation Xingting Li et.al. 2603.15152 translate read null
2026-03-16 Safe Flow Q-Learning: Offline Safe Reinforcement Learning with Reachability-Based Flow Policies Mumuksh Tayal et.al. 2603.15136 translate read null
2026-03-16 MMKU-Bench: A Multimodal Update Benchmark for Diverse Visual Knowledge Baochen Fu et.al. 2603.15117 translate read null
2026-03-16 Sampling-guided exploration of active feature selection policies Gabriel Bernardino et.al. 2603.15110 translate read null
2026-03-16 HALO: Closing Sim-to-Real Gap for Heavy-loaded Humanoid Agile Motion Skills via Differentiable Simulation Xingyi Wang et.al. 2603.15084 translate read null
2026-03-16 Writer-R1: Enhancing Generative Writing in LLMs via Memory-augmented Replay Policy Optimization Jihao Zhao et.al. 2603.15061 translate read null
2026-03-16 Interference-Aware K-Step Reachable Communication in Multi-Agent Reinforcement Learning Ziyu Cheng et.al. 2603.15054 translate read null
2026-03-16 CycleRL: Sim-to-Real Deep Reinforcement Learning for Robust Autonomous Bicycle Control Gelu Liu et.al. 2603.15013 translate read null
2026-03-16 Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing Jiahe Song et.al. 2603.15011 translate read null
2026-03-16 CyCLeGen: Cycle-Consistent Layout Prediction and Image Generation in Vision Foundation Models Xiaojun Shan et.al. 2603.14957 translate read null
2026-03-16 EditHF-1M: A Million-Scale Rich Human Preference Feedback for Image Editing Zitong Xu et.al. 2603.14916 translate read null
2026-03-16 PerlAD: Towards Enhanced Closed-loop End-to-end Autonomous Driving with Pseudo-simulation-based Reinforcement Learning Yinfeng Gao et.al. 2603.14908 translate read null
2026-03-16 ViSA: Visited-State Augmentation for Generalized Goal-Space Contrastive Reinforcement Learning Issa Nakamura et.al. 2603.14887 translate read null
2026-03-16 Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning Mikoto Kudo et.al. 2603.14867 translate read null
2026-03-16 Shopping Companion: A Memory-Augmented LLM Agent for Real-World E-Commerce Tasks Zijian Yu et.al. 2603.14864 translate read null
2026-03-16 Ego to World: Collaborative Spatial Reasoning in Embodied Systems via Reinforcement Learning Heng Zhou et.al. 2603.14811 translate read null
2026-03-16 DeFRiS: Silo-Cooperative IoT Applications Scheduling via Decentralized Federated Reinforcement Learning Zhiyu Wang et.al. 2603.14729 translate read null
2026-03-15 VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting Daeun Lee et.al. 2603.14659 translate read null
2026-03-15 EcoFair-CH-MARL: Scalable Constrained Hierarchical Multi-Agent RL with Real-Time Emission Budgets and Fairness Guarantees Saad Alqithami et.al. 2603.14625 translate read null
2026-03-15 A Loss Landscape Visualization Framework for Interpreting Reinforcement Learning: An ADHDP Case Study Jingyi Liu et.al. 2603.14600 translate read null
2026-03-15 Adapting Critic Match Loss Landscape Visualization to Off-policy Reinforcement Learning Jingyi Liu et.al. 2603.14589 translate read null
2026-03-15 Machine Learning-Driven Intelligent Memory System Design: From On-Chip Caches to Storage Rahul Bera et.al. 2603.14583 translate read null
2026-03-15 MorFiC: Fixing Value Miscalibration for Zero-Shot Quadruped Transfer Prakhar Mishra et.al. 2603.14554 translate read null
2026-03-15 Visualizing Critic Match Loss Landscapes for Interpretation of Online Reinforcement Learning Control Algorithms Jingyi Liu et.al. 2603.14535 translate read null
2026-03-15 VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning Chaoyang Wang et.al. 2603.14523 translate read null
2026-03-15 AI Can Learn Scientific Taste Jingqi Tong et.al. 2603.14473 translate read link
2026-03-15 Physics-Informed Policy Optimization via Analytic Dynamics Regularization Namai Chandra et.al. 2603.14469 translate read null
2026-03-15 eNavi: Event-based Imitation Policies for Low-Light Indoor Mobile Robot Navigation Prithvi Jai Ramesh et.al. 2603.14397 translate read null
2026-03-15 From $\boldsymbol{\log\pi}$ to $\boldsymbol{\pi}$: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight Xiaoliang Fu et.al. 2603.14389 translate read null
2026-03-15 SPARQ: Spiking Early-Exit Neural Networks for Energy-Efficient Edge AI Parth Patne et.al. 2603.14380 translate read null
2026-03-15 Exposing Long-Tail Safety Failures in Large Language Models through Efficient Diverse Response Sampling Suvadeep Hajra et.al. 2603.14355 translate read null
2026-03-15 VIP-Loco: A Visually Guided Infinite Horizon Planning Framework for Legged Locomotion Aditya Shirwatkar et.al. 2603.14345 translate read null
2026-03-15 AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models Jiarui Zhang et.al. 2603.14342 translate read null
2026-03-15 Data-Driven Physics Embedded Dynamics with Predictive Control and Reinforcement Learning for Quadrupeds Prakrut Kotecha et.al. 2603.14333 translate read null
2026-03-15 Load-Aware Locomotion Control for Humanoid Robots in Industrial Transportation Tasks Lequn Fu et.al. 2603.14308 translate read null
2026-03-15 RL-ScanIQA: Reinforcement-Learned Scanpaths for Blind 360° Image Quality Assessment Yujia Wang et.al. 2603.14297 translate read null
2026-03-15 MistExit: Learning to Exit for Early Mistake Detection in Procedural Videos Sagnik Majumder et.al. 2603.14252 translate read null
2026-03-15 GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies He Zhang et.al. 2603.14245 translate read null
2026-03-15 Understanding Strategic Platform Entry and Seller Exploration: A Stackelberg Model Garrett Seo et.al. 2603.14206 translate read null
2026-03-12 HumDex: Humanoid Dexterous Manipulation Made Easy Liang Heng et.al. 2603.12260 translate read null
2026-03-12 DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning Yujie Wei et.al. 2603.12257 translate read null
2026-03-12 Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing Baifeng Shi et.al. 2603.12254 translate read null
2026-03-12 Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation Xiangyu Zhao et.al. 2603.12247 translate read null
2026-03-12 Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training Yixin Liu et.al. 2603.12246 translate read null
2026-03-12 Separable neural architectures as a primitive for unified predictive and generative intelligence Reza T. Batley et.al. 2603.12244 translate read null
2026-03-12 HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies Amber Xie et.al. 2603.12243 translate read null
2026-03-12 Integrated Online Monitoring and Adaption of Process Model Predictive Controllers Samuel Mallick et.al. 2603.12187 translate read null
2026-03-12 LatentGeo: Learnable Auxiliary Constructions in Latent Space for Multimodal Geometric Reasoning Haiying Xu et.al. 2603.12166 translate read null
2026-03-12 IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL Zhoujun Cheng et.al. 2603.12151 translate read null
2026-03-12 Linking Perception, Confidence and Accuracy in MLLMs Yuetian Du et.al. 2603.12149 translate read null
2026-03-12 EgoIntent: An Egocentric Step-level Benchmark for Understanding What, Why, and Next Ye Pan et.al. 2603.12147 translate read null
2026-03-12 Automatic Generation of High-Performance RL Environments Seth Karten et.al. 2603.12145 translate read null
2026-03-12 Increasing intelligence in AI agents can worsen collective outcomes Neil F. Johnson et.al. 2603.12129 translate read null
2026-03-12 Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives Taeho Lee et.al. 2603.12110 translate read null
2026-03-12 On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents Deyu Zou et.al. 2603.12109 translate read null
2026-03-12 A Robust and Efficient Multi-Agent Reinforcement Learning Framework for Traffic Signal Control Sheng-You Huang et.al. 2603.12096 translate read null
2026-03-12 Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics Ming-Hong Chen et.al. 2603.12087 translate read null
2026-03-12 AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling Hamed Hamzeh et.al. 2603.12031 translate read null
2026-03-12 Sim-to-reality adaptation for Deep Reinforcement Learning applied to an underwater docking application Alaaeddine Chaarani et.al. 2603.12020 translate read null
2026-03-12 Learning Visuomotor Policy for Multi-Robot Laser Tag Game Kai Li et.al. 2603.11980 translate read null
2026-03-12 FlexRec: Adapting LLM-based Recommenders for Flexible Needs via Reinforcement Learning Yijun Pan et.al. 2603.11901 translate read null
2026-03-12 The price of decentralization in managing engineering systems through multi-agent reinforcement learning Prateek Bhustali et.al. 2603.11884 translate read null
2026-03-12 Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language Remigiusz Kinas et.al. 2603.11881 translate read null
2026-03-12 Hybrid Human-Agent Social Dilemmas in Energy Markets Isuri Perera et.al. 2603.11834 translate read null
2026-03-12 Towards High-Fidelity CAD Generation via LLM-Driven Program Generation and Text-Based B-Rep Primitive Grounding Jiahao Li et.al. 2603.11831 translate read null
2026-03-12 RADAR: Closed-Loop Robotic Data Generation via Semantic Planning and Autonomous Causal Environment Reset Yongzhong Wang et.al. 2603.11811 translate read null
2026-03-12 Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach Erfan Mirzaei et.al. 2603.11757 translate read null
2026-03-12 STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure Transformer for Offline Multi-task Multi-agent Reinforcement Learning Jiwon Jeon et.al. 2603.11691 translate read null
2026-03-12 Entropy-Preserving Reinforcement Learning Aleksei Petrenko et.al. 2603.11682 translate read null
2026-03-12 Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge Junjie Wu et.al. 2603.11665 translate read null
2026-03-12 Resonate: Reinforcing Text-to-Audio Generation via Online Feedback from Large Audio Language Models Xiquan Li et.al. 2603.11661 translate read null
2026-03-12 Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning Jiaheng Hu et.al. 2603.11653 translate read null
2026-03-12 Diversity You Can Actually Measure: A Fast, Model-Free Diversity Metric for Robotics Datasets Sreevardhan Sirigiri et.al. 2603.11634 translate read null
2026-03-12 Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization Qijun Liao et.al. 2603.11600 translate read null
2026-03-12 WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing Hui Zhang et.al. 2603.11593 translate read null
2026-03-12 Multi-Agent Reinforcement Learning for UAV-Based Chemical Plume Source Localization Zhirun Li et.al. 2603.11582 translate read null
2026-03-12 SVLL: Staged Vision-Language Learning for Physically Grounded Embodied Task Planning Yuyuan Yang et.al. 2603.11563 translate read null
2026-03-12 NFPO: Stabilized Policy Optimization of Normalizing Flow for Robotic Policy Learning Diyuan Shi et.al. 2603.11470 translate read null
2026-03-12 Adversarial Reinforcement Learning for Detecting False Data Injection Attacks in Vehicular Routing Taha Eghtesad et.al. 2603.11433 translate read null
2026-03-12 ARROW: Augmented Replay for RObust World models Abdulaziz Alyahya et.al. 2603.11395 translate read null
2026-03-12 SliceFed: Federated Constrained Multi-Agent DRL for Dynamic Spectrum Slicing in 6G Hossein Mohammadi et.al. 2603.11390 translate read null
2026-03-11 Ensuring Safety in Automated Mechanical Ventilation through Offline Reinforcement Learning and Digital Twin Verification Hang Yu et.al. 2603.11372 translate read null
2026-03-11 abx_amr_simulator: A simulation environment for antibiotic prescribing policy optimization under antimicrobial resistance Joyce Lee et.al. 2603.11369 translate read null
2026-03-11 Novelty Adaptation Through Hybrid Large Language Model (LLM)-Symbolic Planning and LLM-guided Reinforcement Learning Hong Lu et.al. 2603.11351 translate read null
2026-03-11 Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning Yuto Shibata et.al. 2603.11346 translate read null
2026-03-11 Meta-Reinforcement Learning with Self-Reflection for Agentic Search Teng Xiao et.al. 2603.11327 translate read null
2026-03-11 Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings Yuning Wu et.al. 2603.11321 translate read null
2026-03-11 ExecVerify: White-Box RL with Verifiable Stepwise Rewards for Code Execution Reasoning Lingxiao Tang et.al. 2603.11226 translate read null
2026-03-11 Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning Yuehao Song et.al. 2603.11219 translate read null
2026-03-11 DeReason: A Difficulty-Aware Curriculum Improves Decoupled SFT-then-RL Training for General Reasoning Hanxu Hu et.al. 2603.11193 translate read null
2026-03-11 Learning to Unscramble: Simplifying Symbolic Expressions via Self-Supervised Oracle Trajectories David Shih et.al. 2603.11164 translate read null
2026-03-11 Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion Yuanhong Wu et.al. 2603.11126 translate read null
2026-03-11 Learning Tree-Based Models with Gradient Descent Sascha Marton et.al. 2603.11117 translate read null
2026-03-11 ResWM: Residual-Action World Model for Visual RL Jseen Zhang et.al. 2603.11110 translate read null
2026-03-11 RC-NF: Robot-Conditioned Normalizing Flow for Real-Time Anomaly Detection in Robotic Manipulation Shijie Zhou et.al. 2603.11106 translate read null
2026-03-11 Learning Adaptive Force Control for Contact-Rich Sample Scraping with Heterogeneous Materials Cenk Cetin et.al. 2603.10979 translate read null
2026-03-11 Contact Coverage-Guided Exploration for General-Purpose Dexterous Manipulation Zixuan Liu et.al. 2603.10971 translate read null
2026-03-11 Safe RLHF Beyond Expectation: Stochastic Dominance for Universal Spectral Risk Control Yaswanth Chittepu et.al. 2603.10938 translate read null
2026-03-11 Lifelong Imitation Learning with Multimodal Latent Replay and Incremental Adjustment Fanqi Yu et.al. 2603.10929 translate read null
2026-03-11 Ergodicity in reinforcement learning Dominik Baumann et.al. 2603.10895 translate read null
2026-03-11 Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models Yixiu Mao et.al. 2603.10887 translate read null
2026-03-11 RL-Augmented MPC for Non-Gaited Legged and Hybrid Locomotion Andrea Patrizi et.al. 2603.10878 translate read null
2026-03-11 $V_{0.5}$: Generalist Value Model as a Prior for Sparse RL Rollouts Yi-Kai Zhang et.al. 2603.10848 translate read null
2026-03-11 Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis Yujie Zheng et.al. 2603.10846 translate read null
2026-03-11 ReTabSyn: Realistic Tabular Data Synthesis via Reinforcement Learning Xiaofeng Lin et.al. 2603.10823 translate read null
2026-03-11 Multilingual Reasoning Gym: Multilingual Scaling of Procedural Reasoning Environments Konstantin Dobler et.al. 2603.10793 translate read null
2026-03-11 mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR Konstantin Dobler et.al. 2603.10767 translate read null
2026-03-11 ASTER: Attitude-aware Suspended-payload Quadrotor Traversal via Efficient Reinforcement Learning Dongcheng Cao et.al. 2603.10715 translate read null
2026-03-11 MAVEN: A Meta-Reinforcement Learning Framework for Varying-Dynamics Expertise in Agile Quadrotor Maneuvers Jin Zhou et.al. 2603.10714 translate read null
2026-03-11 Splat2Real: Novel-view Scaling for Physical AI with 3D Gaussian Splatting Hansol Lim et.al. 2603.10638 translate read null
2026-03-11 Reinforcement Learning with Conditional Expectation Reward Changyi Xiao et.al. 2603.10624 translate read null
2026-03-11 AdaClearGrasp: Learning Adaptive Clearing for Zero-Shot Robust Dexterous Grasping in Densely Cluttered Environments Zixuan Chen et.al. 2603.10616 translate read null
2026-03-11 Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning Zhaowei Zhang et.al. 2603.10588 translate read null
2026-03-11 Safety-critical Control Under Partial Observability: Reach-Avoid POMDP meets Belief Space Control Matti Vahs et.al. 2603.10572 translate read null
2026-03-11 Adaptive RAN Slicing Control via Reward-Free Self-Finetuning Agents Yuanhao Li et.al. 2603.10564 translate read null
2026-03-11 Learning to Score: Tuning Cluster Schedulers through Reinforcement Learning Martin Asenov et.al. 2603.10545 translate read null
2026-03-11 Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning Zichao Li et.al. 2603.10535 translate read null
2026-03-11 UAV-MARL: Multi-Agent Reinforcement Learning for Time-Critical and Dynamic Medical Supply Delivery Islam Guven et.al. 2603.10528 translate read null
2026-03-11 IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs Chuan Guo et.al. 2603.10521 translate read null
2026-03-11 Muscle Synergy Priors Enhance Biomechanical Fidelity in Predictive Musculoskeletal Locomotion Simulation Ilseung Park et.al. 2603.10474 translate read null
2026-03-11 COHORT: Hybrid RL for Collaborative Large DNN Inference on Multi-Robot Systems Under Real-Time Constraints Mohammad Saeid Anwar et.al. 2603.10436 translate read null
2026-03-11 Adaptive Active Learning for Regression via Reinforcement Learning Simon D. Nguyen et.al. 2603.10435 translate read null
2026-03-11 Graph-GRPO: Training Graph Flow Models with Reinforcement Learning Baoheng Zhu et.al. 2603.10395 translate read null
2026-03-11 ScanDP: Generalizable 3D Scanning with Diffusion Policy Itsuki Hirako et.al. 2603.10390 translate read null
2026-03-11 SteadyTray: Learning Object Balancing Tasks in Humanoid Tray Transport via Residual Reinforcement Learning Anlun Huang et.al. 2603.10306 translate read null
2026-03-11 From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification Ke Zhang et.al. 2603.10300 translate read null
2026-03-11 Quantum entanglement provides a competitive advantage in adversarial games Peiyong Wang et.al. 2603.10289 translate read null
2026-03-10 From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning Zhanyi Sun et.al. 2603.10263 translate read null
2026-03-10 SiMPO: Measure Matching for Online Diffusion Reinforcement Learning Haitong Ma et.al. 2603.10250 translate read null
2026-03-10 Actor-Accelerated Policy Dual Averaging for Reinforcement Learning in Continuous Action Spaces Ji Gao et.al. 2603.10199 translate read null
2026-03-10 Learning to Decode Quantum LDPC Codes Via Belief Propagation Mohsen Moradi et.al. 2603.10192 translate read null
2026-03-10 Calibration-Reasoning Framework for Descriptive Speech Quality Assessment Elizaveta Kostenok et.al. 2603.10175 translate read null
2026-03-10 ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning Ruizhong Qiu et.al. 2603.10160 translate read null
2026-03-10 CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR Sijia Cui et.al. 2603.10101 translate read null
2026-03-10 Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models Daniel Hennes et.al. 2603.10098 translate read null
2026-03-10 Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models Ali Raza et.al. 2603.10080 translate read null
2026-03-10 Improving Search Agent with One Line of Code Jian Li et.al. 2603.10069 translate read null
2026-03-09 Cluster-Aware Attention-Based Deep Reinforcement Learning for Pickup and Delivery Problems Wentao Wang et.al. 2603.10053 translate read null
2026-03-10 Kinodynamic Motion Retargeting for Humanoid Locomotion via Multi-Contact Whole-Body Trajectory Optimization Xiaoyu Zhang et.al. 2603.09956 translate read null
2026-03-10 When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic Alberto Fernández-Hernández et.al. 2603.09950 translate read null
2026-03-10 Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts Hongbo Bo et.al. 2603.09890 translate read null
2026-03-10 Emerging Extrinsic Dexterity in Cluttered Scenes via Dynamics-aware Policy Learning Yixin Zheng et.al. 2603.09882 translate read null
2026-03-10 RecThinker: An Agentic Framework for Tool-Augmented Reasoning in Recommendation Haobo Zhang et.al. 2603.09843 translate read null
2026-03-10 Good Reasoning Makes Good Demonstrations: Implicit Reasoning Quality Supervision via In-Context Reinforcement Learning Tiehua Mei et.al. 2603.09803 translate read null
2026-03-10 Long-Run Conditional Value-at-Risk Reinforcement Learning Qixin Wang et.al. 2603.09734 translate read null
2026-03-10 GSStream: 3D Gaussian Splatting based Volumetric Scene Streaming System Zhiye Tang et.al. 2603.09718 translate read null
2026-03-10 ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning Davit Melikidze et.al. 2603.09692 translate read null
2026-03-10 ReTac-ACT: A State-Gated Vision-Tactile Fusion Transformer for Precision Assembly Minchi Ruan et.al. 2603.09565 translate read null
2026-03-10 GeoSolver: Scaling Test-Time Reasoning in Remote Sensing with Fine-Grained Process Supervision Lang Sun et.al. 2603.09551 translate read null
2026-03-10 NS-VLA: Towards Neuro-Symbolic Vision-Language-Action Models Ziyue Zhu et.al. 2603.09542 translate read null
2026-03-10 Towards Unified Multimodal Interleaved Generation via Group Relative Policy Optimization Ming Nie et.al. 2603.09538 translate read null
2026-03-10 MORE-R1: Guiding LVLM for Multimodal Object-Entity Relation Extraction via Stepwise Reasoning with Reinforcement Learning Xiang Yuan et.al. 2603.09478 translate read null
2026-03-10 SEA-Nav: Efficient Policy Learning for Safe and Agile Quadruped Navigation in Cluttered Environments Shiyi Chen et.al. 2603.09460 translate read null
2026-03-10 Impact of Markov Decision Process Design on Sim-to-Real Reinforcement Learning Tatjana Krau et.al. 2603.09427 translate read null
2026-03-10 SPAARS: Safer RL Policy Alignment through Abstract Exploration and Refined Exploitation of Action Space Swaminathan S K et.al. 2603.09378 translate read null
2026-03-10 Robust Regularized Policy Iteration under Transition Uncertainty Hongqiang Lin et.al. 2603.09344 translate read null
2026-03-10 Reward-Zero: Language Embedding Driven Implicit Reward Mechanisms for Reinforcement Learning Heng Zhang et.al. 2603.09331 translate read null
2026-03-10 OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models Tengjin Weng et.al. 2603.09326 translate read null
2026-03-10 Social-R1: Towards Human-like Social Reasoning in LLMs Jincenzi Wu et.al. 2603.09249 translate read null
2026-03-10 MO-Playground: Massively Parallelized Multi-Objective Reinforcement Learning for Robotics Neil Janwani et.al. 2603.09237 translate read null
2026-03-10 Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control Peihao Wang et.al. 2603.09221 translate read null
2026-03-10 Embodied Human Simulation for Quantitative Design and Analysis of Interactive Robotics Chenhui Zuo et.al. 2603.09218 translate read null
2026-03-10 Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation Jake Gonzales et.al. 2603.09208 translate read null
2026-03-10 Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents Jiangming Shu et.al. 2603.09203 translate read null
2026-03-10 RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning Tzu-Heng Huang et.al. 2603.09160 translate read null
2026-03-10 Critical States Preparation With Deep Reinforcement Learning Jia-Wen Yu et.al. 2603.09135 translate read null
2026-03-10 Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards Zhengzhao Ma et.al. 2603.09117 translate read null
2026-03-10 Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms Renos Zabounidis et.al. 2603.09090 translate read null
2026-03-10 Learning Adaptive LLM Decoding Chloe H. Su et.al. 2603.09065 translate read null
2026-03-10 Synergistic Directed Execution and LLM-Driven Analysis for Zero-Day AI-Generated Malware Detection George Edwards et.al. 2603.09044 translate read null
2026-03-09 PlayWorld: Learning Robot World Models from Autonomous Play Tenny Yin et.al. 2603.09030 translate read null
2026-03-09 MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment Kailong Fan et.al. 2603.08987 translate read null
2026-03-09 FAME: Force-Adaptive RL for Expanding the Manipulation Envelope of a Full-Scale Humanoid Niraj Pudasaini et.al. 2603.08961 translate read null
2026-03-09 A Survey of Reinforcement Learning For Economics Pranjal Rawat et.al. 2603.08956 translate read null
2026-03-09 Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance Joshua Castillo et.al. 2603.08933 translate read null
2026-03-09 Optimizing Reinforcement Learning Training over Digital Twin Enabled Multi-fidelity Networks Hanzhi Yu et.al. 2603.08931 translate read null
2026-03-09 APPLV: Adaptive Planner Parameter Learning from Vision-Language-Action Model Yuanjie Lu et.al. 2603.08862 translate read null
2026-03-09 VisionCreator-R1: A Reflection-Enhanced Native Visual-Generation Agentic Model Jinxiang Lai et.al. 2603.08812 translate read null
2026-03-09 Multi-level meta-reinforcement learning with skill-based curriculum Sichen Yang et.al. 2603.08773 translate read null
2026-03-09 SPREAD: Subspace Representation Distillation for Lifelong Imitation Learning Kaushik Roy et.al. 2603.08763 translate read null
2026-03-09 Agentic Critical Training Weize Liu et.al. 2603.08706 translate read null
2026-03-09 How Far Can Unsupervised RLVR Scale LLM Training? Bingxiang He et.al. 2603.08660 translate read null
2026-03-09 Embedding Classical Balance Control Principles in Reinforcement Learning for Humanoid Recovery Nehar Poddar et.al. 2603.08619 translate read null
2026-03-09 Diff-Muscle: Efficient Learning for Musculoskeletal Robotic Table Tennis Wentao Zhao et.al. 2603.08617 translate read null
2026-03-09 Towards Batch-to-Streaming Deep Reinforcement Learning for Continuous Control Riccardo De Monte et.al. 2603.08588 translate read null
2026-03-09 MetaWorld-X: Hierarchical World Modeling via VLM-Orchestrated Experts for Humanoid Loco-Manipulation Yutong Shen et.al. 2603.08572 translate read null
2026-03-09 RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback Xiaoying Zhang et.al. 2603.08561 translate read null
2026-03-09 Impact of Connectivity on Laplacian Representations in Reinforcement Learning Tommaso Giorgi et.al. 2603.08558 translate read null
2026-03-09 EquiBim: Learning Symmetry-Equivariant Policy for Bimanual Manipulation Zhiyuan Zhang et.al. 2603.08541 translate read null
2026-03-09 Breaking the Bias Barrier in Concave Multi-Objective Reinforcement Learning Swetha Ganesh et.al. 2603.08518 translate read null
2026-03-09 Oracle-Guided Soft Shielding for Safe Move Prediction in Chess Prajit T Rajendran et.al. 2603.08506 translate read null
2026-03-09 LAR-MoE: Latent-Aligned Routing for Mixture of Experts in Robotic Imitation Learning Ariel Rodriguez et.al. 2603.08476 translate read null
2026-03-09 Integrating Lagrangian Neural Networks into the Dyna Framework for Reinforcement Learning Shreya Das et.al. 2603.08468 translate read null
2026-03-09 Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck Fabio Valerio Massoli et.al. 2603.08462 translate read null
2026-03-09 Meta-RL with Shared Representations Enables Fast Adaptation in Energy Systems Théo Zangato et.al. 2603.08418 translate read null
2026-03-09 Aligning to Illusions: Choice Blindness in Human and AI Feedback Wenbin Wu et.al. 2603.08412 translate read null
2026-03-09 A Recipe for Stable Offline Multi-agent Reinforcement Learning Dongsu Lee et.al. 2603.08399 translate read null
2026-03-09 Revealing Behavioral Plasticity in Large Language Models: A Token-Conditional Perspective Liyuan Mao et.al. 2603.08398 translate read null
2026-03-09 SlowBA: An efficiency backdoor attack towards VLM-based GUI agents Junxian Li et.al. 2603.08316 translate read null
2026-03-09 Posterior Sampling Reinforcement Learning with Gaussian Processes for Continuous Control: Sublinear Regret Bounds for Unbounded State Spaces Hamish Flynn et.al. 2603.08287 translate read null
2026-03-09 SAIL: Test-Time Scaling for In-Context Imitation Learning with VLM Makoto Sato et.al. 2603.08269 translate read null
2026-03-09 Adaptive shape control for microswimmer navigation in turbulence Jingran Qiu et.al. 2603.08201 translate read null
2026-03-09 RexDrug: Reliable Multi-Drug Combination Extraction through Reasoning-Enhanced LLMs Zhijun Wang et.al. 2603.08166 translate read null
2026-03-09 Towards Human-Like Manipulation through RL-Augmented Teleoperation and Mixture-of-Dexterous-Experts VLA Tutian Tang et.al. 2603.08122 translate read null
2026-03-09 Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting Zhongjian Qiao et.al. 2603.08118 translate read null
2026-03-09 DeReCo: Decoupling Representation and Coordination Learning for Object-Adaptive Decentralized Multi-Robot Cooperative Transport Kazuki Shibata et.al. 2603.08111 translate read null
2026-03-09 Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization Hongli Zhou et.al. 2603.08091 translate read null
2026-03-09 In-Context Reinforcement Learning for Tool Use in Large Language Models Yaoqi Ye et.al. 2603.08068 translate read null
2026-03-09 ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning Yiran Zhao et.al. 2603.08059 translate read null
2026-03-09 MJ1: Multimodal Judgment via Grounded Verification Bhavesh Kumar et.al. 2603.07990 translate read null
2026-03-09 On the Feasibility and Opportunity of Autoregressive 3D Object Detection Zanming Huang et.al. 2603.07985 translate read null
2026-03-09 VORL-EXPLORE: A Hybrid Learning Planning Approach to Multi-Robot Exploration in Dynamic Environments Ning Liu et.al. 2603.07973 translate read null
2026-03-09 Model-Free DRL Control for Power Inverters: From Policy Learning to Real-Time Implementation via Knowledge Distillation Yang Yang et.al. 2603.07964 translate read null
2026-03-09 SGG-R$^{3}$: From Next-Token Prediction to End-to-End Unbiased Scene Graph Generation Jiaye Feng et.al. 2603.07961 translate read null
2026-03-09 RL unknotter, hard unknots and unknotting number Anne Dranowski et.al. 2603.07955 translate read null
2026-03-09 SMGI: A Structural Theory of General Artificial Intelligence Aomar Osmani et.al. 2603.07896 translate read null
2026-03-09 SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans Hansi Zeng et.al. 2603.07853 translate read null
2026-03-08 Relating Reinforcement Learning to Dynamic Programming-Based Planning Filip V. Georgiev et.al. 2603.07844 translate read null
2026-03-08 Preference-Conditioned Reinforcement Learning for Space-Time Efficient Online 3D Bin Packing Nikita Sarawgi et.al. 2603.07800 translate read null
2026-03-08 Toward Global Intent Inference for Human Motion by Inverse Reinforcement Learning Sarmad Mehrdad et.al. 2603.07797 translate read null
2026-03-08 ProgAgent: A Continual RL Agent with Progress-Aware Rewards Jinzhou Tan et.al. 2603.07784 translate read null
2026-03-08 Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems Zongqian Li et.al. 2603.07779 translate read null
2026-03-08 Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models Zongqian Li et.al. 2603.07777 translate read null
2026-03-08 Residual Control for Fast Recovery from Dynamics Shifts Nethmi Jayasinghe et.al. 2603.07775 translate read null
2026-03-08 TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward Yihong Luo et.al. 2603.07700 translate read null
2026-03-08 Global Convergence of Average Reward Constrained MDPs with Neural Critic and General Policy Parameterization Anirudh Satheesh et.al. 2603.07698 translate read null
2026-03-08 Mitigating the Memory Bottleneck with Machine Learning-Driven and Data-Aware Microarchitectural Techniques Rahul Bera et.al. 2603.07683 translate read null
2026-03-08 Numerical Approach for On-the-Fly Active Flow Control via Flow Map Learning Method Xinyu Liu et.al. 2603.07678 translate read null
2026-03-08 DAISS: Phase-Aware Imitation Learning for Dual-Arm Robotic Ultrasound-Guided Interventions Feng Li et.al. 2603.07663 translate read null
2026-03-08 Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving Chang Su et.al. 2603.07642 translate read null
2026-03-08 Exoskeleton Control through Learning to Reduce Biological Joint Moments in Simulations Zihang You et.al. 2603.07629 translate read null
2026-03-08 GeoLoco: Leveraging 3D Geometric Priors from Visual Foundation Model for Robust RGB-Only Humanoid Locomotion Yufei Liu et.al. 2603.07624 translate read null
2026-03-08 Approximate Imitation Learning for Event-based Quadrotor Flight in Cluttered Environments Nico Messikommer et.al. 2603.07578 translate read null
2026-03-08 Constraints Matrix Diffusion based Generative Neural Solver for Vehicle Routing Problems Zhenwei Wang et.al. 2603.07568 translate read null
2026-03-08 COOL-MC: Verifying and Explaining RL Policies for Multi-bridge Network Maintenance Dennis Gross et.al. 2603.07546 translate read null
2026-03-08 ICLR: In-Context Imitation Learning with Visual Reasoning Toan Nguyen et.al. 2603.07530 translate read null
2026-03-08 TableMind++: An Uncertainty-Aware Programmatic Agent for Tool-Augmented Table Reasoning Mingyue Cheng et.al. 2603.07528 translate read null
2026-03-08 Reinforcement learning-based dynamic cleaning scheduling framework for solar energy system Heungjo An et.al. 2603.07518 translate read null
2026-03-08 InterReal: A Unified Physics-Based Imitation Framework for Learning Human-Object Interaction Skills Dayang Liang et.al. 2603.07516 translate read null
2026-03-08 EvolveReason: Self-Evolving Reasoning Paradigm for Explainable Deepfake Facial Image Identification Binjia Zhou et.al. 2603.07515 translate read null
2026-03-08 Med-Evo: Test-time Self-evolution for Medical Multimodal Large Language Models Dunyuan Xu et.al. 2603.07443 translate read null
2026-03-08 Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part II Yi Tian et.al. 2603.07437 translate read null
2026-03-08 Generalization in Online Reinforcement Learning for Mobile Agents Li Gu et.al. 2603.07432 translate read null
2026-03-08 Dynamic Vehicle Routing Problem with Prompt Confirmation of Advance Requests Amutheezan Sivagnanam et.al. 2603.07422 translate read null
2026-03-08 Underwater Embodied Intelligence for Autonomous Robots: A Constraint-Coupled Perspective on Planning, Control, and Deployment Jingzehua Xu et.al. 2603.07393 translate read null
2026-03-07 Learning to Reflect: Hierarchical Multi-Agent Reinforcement Learning for CSI-Free mmWave Beam-Focusing Hieu Le et.al. 2603.07370 translate read null
2026-03-07 Neural Control and Learning of Simulated Hand Movements With an EMG-Based Closed-Loop Interface Balint K. Hodossy et.al. 2603.07364 translate read null
2026-03-07 Adversarial Latent-State Training for Robust Policies in Partially Observable Domains Angad Singh Ahuja et.al. 2603.07313 translate read null
2026-03-07 AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery Nilesh Jain et.al. 2603.07300 translate read null
2026-03-07 Adaptive Double-Booking Strategy for Outpatient Scheduling Using Multi-Objective Reinforcement Learning Ninda Nurseha Amalina et.al. 2603.07270 translate read null
2026-03-07 Kinematics-Aware Latent World Models for Data-Efficient Autonomous Driving Jiazhuo Li et.al. 2603.07264 translate read null
2026-03-07 Learning When to Cooperate Under Heterogeneous Goals Max Taylor-Davies et.al. 2603.07253 translate read null
2026-03-07 Reinforcement Learning for Vehicle-to-Grid Voltage Regulation: Single-Hub to Multi-Hub Coordination with Battery-Aware Constraints Jingbo Wang et.al. 2603.07237 translate read null
2026-03-07 $\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving Pinzheng Wang et.al. 2603.07197 translate read null
2026-03-07 RoTri-Diff: A Spatial Robot-Object Triadic Interaction-Guided Diffusion Model for Bimanual Manipulation Zixuan Chen et.al. 2603.07165 translate read null
2026-03-07 Learning From Failures: Efficient Reinforcement Learning Control with Episodic Memory Chenyang Miao et.al. 2603.07110 translate read null
2026-03-07 Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction Xu Chen et.al. 2603.07093 translate read null
2026-03-07 Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR Muhammad Khalifa et.al. 2603.07084 translate read null
2026-03-07 Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction Michael Hauri et.al. 2603.07083 translate read null
2026-03-07 SSP: Safety-guaranteed Surgical Policy via Joint Optimization of Behavioral and Spatial Constraints Jianshu Hu et.al. 2603.07032 translate read null
2026-03-07 RESCHED: Rethinking Flexible Job Shop Scheduling from a Transformer-based Architecture with Simplified States Xiangjie Xiao et.al. 2603.07020 translate read null
2026-03-07 AutoChecklist: Composable Pipelines for Checklist Generation and Scoring with LLM-as-a-Judge Karen Zhou et.al. 2603.07019 translate read null
2026-03-07 AdaGen: Learning Adaptive Policy for Image Synthesis Zanlin Ni et.al. 2603.06993 translate read null
2026-03-07 Diffusion Controller: Framework, Algorithms and Parameterization Tong Yang et.al. 2603.06981 translate read null
2026-03-07 NePPO: Near-Potential Policy Optimization for General-Sum Multi-Agent Reinforcement Learning Addison Kalanther et.al. 2603.06977 translate read null
2026-03-07 Topology-Aware Reinforcement Learning over Graphs for Resilient Power Distribution Networks Roshni Anna Jacob et.al. 2603.06964 translate read null
2026-03-07 Learning Quadruped Walking from Seconds of Demonstration Ruipeng Zhang et.al. 2603.06961 translate read null
2026-03-07 Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards Xin Zhang et.al. 2603.06958 translate read null
2026-03-06 Joint MDPs and Reinforcement Learning in Coupled-Dynamics Environments Ege C. Kaya et.al. 2603.06946 translate read null
2026-03-06 Collaborative Planning with Concurrent Synchronization for Operationally Constrained UAV-UGV Teams Zihao Deng et.al. 2603.06898 translate read null
2026-03-06 Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration Yanjun Chen et.al. 2603.06859 translate read null
2026-03-06 Reinforcing the World’s Edge: A Continual Learning Problem in the Multi-Agent-World Boundary Dane Malenfant et.al. 2603.06813 translate read null
2026-03-06 Multi-Agent Reinforcement Learning with Submodular Reward Wenjing Chen et.al. 2603.06810 translate read null
2026-03-06 Optimistic Policy Regularization Mai Pham et.al. 2603.06793 translate read null
2026-03-06 HGT-Scheduler: Deep Reinforcement Learning for the Job Shop Scheduling Problem via Heterogeneous Graph Transformers Bulent Soykan et.al. 2603.06777 translate read null
2026-03-06 HybridMimic: Hybrid RL-Centroidal Control for Humanoid Motion Mimicking Ludwig Chee-Ying Tay et.al. 2603.06775 translate read null
2026-03-06 Stabilizing Reinforcement Learning for Diffusion Language Models Jianyuan Zhong et.al. 2603.06743 translate read null
2026-03-06 Don’t Freeze, Don’t Crash: Extending the Safe Operating Range of Neural Navigation in Dense Crowds Jiefu Zhang et.al. 2603.06729 translate read null
2026-03-06 Boosting deep Reinforcement Learning using pretraining with Logical Options Zihan Ye et.al. 2603.06565 translate read null
2026-03-06 EgoReasoner: Learning Egocentric 4D Reasoning via Task-Adaptive Structured Thinking Fangrui Zhu et.al. 2603.06561 translate read null
2026-03-06 On a PDE model for Learning in Stochastic Market Entry Games Esther Bou Dagher et.al. 2603.06514 translate read null
2026-03-06 A Reference Architecture of Reinforcement Learning Frameworks Xiaoran Liu et.al. 2603.06413 translate read null
2026-03-06 Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion Pengcheng Jiang et.al. 2603.06397 translate read null
2026-03-06 OralGPT-Plus: Learning to Use Visual Tools via Reinforcement Learning for Panoramic X-ray Analysis Yuxuan Fan et.al. 2603.06366 translate read null
2026-03-06 From Entropy to Calibrated Uncertainty: Training Language Models to Reason About Uncertainty Azza Jenane et.al. 2603.06317 translate read null
2026-03-06 Artificial Intelligence for Climate Adaptation: Reinforcement Learning for Climate Change-Resilient Transport Miguel Costa et.al. 2603.06278 translate read null
2026-03-06 Synthetic Monitoring Environments for Reinforcement Learning Leonard Pleiss et.al. 2603.06252 translate read null
2026-03-06 MAPO: Mixed Advantage Policy Optimization for Long-Horizon Multi-Turn Dialogue Naifan Zhang et.al. 2603.06194 translate read null
2026-03-06 Optimizing 3D Diffusion Models for Medical Imaging via Multi-Scale Reward Learning Yueying Tian et.al. 2603.06173 translate read null
2026-03-06 Dual-Agent Multiple-Model Reinforcement Learning for Event-Triggered Human-Robot Co-Adaptation in Decoupled Task Spaces Yaqi Li et.al. 2603.06163 translate read null
2026-03-06 Partial Policy Gradients for RL in LLMs Puneet Mathur et.al. 2603.06138 translate read null
2026-03-06 ChatShopBuddy: Towards Reliable Conversational Shopping Agents via Reinforcement Learning Yiruo Cheng et.al. 2603.06065 translate read null
2026-03-06 Devil is in Narrow Policy: Unleashing Exploration in Driving VLA Models Canyu Chen et.al. 2603.06049 translate read null
2026-03-06 Reinforcement Learning for Secrecy Optimization in Underwater Energy Harvesting Relay Network Shalini Tripathi et.al. 2603.06046 translate read null
2026-03-06 Learning to Generate via Understanding: Understanding-Driven Intrinsic Rewarding for Unified Multimodal Models Jiadong Pan et.al. 2603.06043 translate read null
2026-03-06 ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning Xingjian Tao et.al. 2603.06024 translate read null
2026-03-06 TADPO: Reinforcement Learning Goes Off-road Zhouchonghao Wu et.al. 2603.05995 translate read null
2026-03-06 LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Generative Real-World Super-Resolution Song Fei et.al. 2603.05947 translate read null
2026-03-06 How to Model Your Crazyflie Brushless Alexander Gräfe et.al. 2603.05944 translate read null
2026-03-06 Swooper: Learning High-Speed Aerial Grasping With a Simple Gripper Ziken Huang et.al. 2603.05935 translate read null
2026-03-06 CORE-Seg: Reasoning-Driven Segmentation for Complex Lesions via Reinforcement Learning Yuxin Xie et.al. 2603.05911 translate read null
2026-03-06 Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning Xuan Li et.al. 2603.05900 translate read null
2026-03-06 Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation Changcheng Li et.al. 2603.05881 translate read null
2026-03-06 PatchCue: Enhancing Vision-Language Model Reasoning with Patch-Based Visual Cues Yukun Qi et.al. 2603.05869 translate read null
2026-03-06 ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning Juyong Jiang et.al. 2603.05863 translate read null
2026-03-06 Expert Knowledge-driven Reinforcement Learning for Autonomous Racing via Trajectory Guidance and Dynamics Constraints Bo Leng et.al. 2603.05842 translate read null
2026-03-06 OpenHEART: Opening Heterogeneous Articulated Objects with a Legged Manipulator Seonghyeon Lim et.al. 2603.05830 translate read null
2026-03-06 CDF-Glove: A Cable-Driven Force Feedback Glove for Dexterous Teleoperation Huayue Liang et.al. 2603.05804 translate read null
2026-03-06 Task-Level Decisions to Gait Level Control: A Hierarchical Policy Approach for Quadruped Navigation Sijia Li et.al. 2603.05783 translate read null
2026-03-05 MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation Rifny Rachman et.al. 2603.05760 translate read null
2026-03-05 Reinforcement Learning for Power-Flow Network Analysis Alperen Ergur et.al. 2603.05673 translate read null
2026-03-05 TransMASK: Masked State Representation through Learned Transformation Sagar Parekh et.al. 2603.05670 translate read null
2026-03-05 When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On Wisdom Ikezogwo et.al. 2603.05659 translate read null
2026-03-05 Thinking with Spatial Code for Physical-World Video Reasoning Jieneng Chen et.al. 2603.05591 translate read null
2026-03-05 A Novel Hybrid Heuristic-Reinforcement Learning Optimization Approach for a Class of Railcar Shunting Problems Ruonan Zhao et.al. 2603.05579 translate read null
2026-03-05 Task Parameter Extrapolation via Learning Inverse Tasks from Forward Demonstrations Serdar Bahar et.al. 2603.05576 translate read null
2026-03-05 PRISM: Personalized Refinement of Imitation Skills for Manipulation via Human Instructions Arnau Boix-Granell et.al. 2603.05574 translate read null
2026-03-05 Autocorrelation effects in a stochastic-process model for decision making via time series Tomoki Yamagami et.al. 2603.05559 translate read null
2026-03-05 RoboPocket: Improve Robot Policies Instantly with Your Phone Junjie Fang et.al. 2603.05504 translate read null
2026-03-05 Latent Wasserstein Adversarial Imitation Learning Siqi Yang et.al. 2603.05440 translate read null
2026-03-05 SpiderCat: Optimal Fault-Tolerant Cat State Preparation Andrey Boris Khesin et.al. 2603.05391 translate read null
2026-03-05 DiSCTT: Consensus-Guided Self-Curriculum for Efficient Test-Time Adaptation in Reasoning Mohammad Mahdi Moradi et.al. 2603.05357 translate read null
2026-03-05 Latent Policy Steering through One-Step Flow Policies Hokyun Im et.al. 2603.05296 translate read null
2026-03-05 Knowledge Divergence and the Value of Debate for Scalable Oversight Robin Young et.al. 2603.05293 translate read null
2026-03-05 Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts Samandar Samandarov et.al. 2603.05276 translate read null
2026-03-05 SarcasmMiner: A Dual-Track Post-Training Framework for Robust Audio-Visual Sarcasm Reasoning Zhu Li et.al. 2603.05275 translate read null
2026-03-05 Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum Shan Ning et.al. 2603.05256 translate read null
2026-03-05 Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards Linghan Fang et.al. 2603.05231 translate read null
2026-03-05 KARL: Knowledge Agents via Reinforcement Learning Jonathan D. Chang et.al. 2603.05218 translate read null
2026-03-05 LBM: Hierarchical Large Auto-Bidding Model via Reasoning and Acting Yewen Li et.al. 2603.05134 translate read null
2026-03-05 SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation Youqiang Gui et.al. 2603.05117 translate read null
2026-03-05 Decoupling Task and Behavior: A Two-Stage Reward Curriculum in Reinforcement Learning for Robotics Kilian Freitag et.al. 2603.05113 translate read null
2026-03-05 Reward-Conditioned Reinforcement Learning Michal Nauman et.al. 2603.05066 translate read null
2026-03-05 WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents Sicheng Fan et.al. 2603.05044 translate read null
2026-03-05 Formal Entropy-Regularized Control of Stochastic Systems Menno van Zutphen et.al. 2603.05021 translate read null
2026-03-05 BioLLMAgent: A Hybrid Framework with Enhanced Structural Interpretability for Simulating Human Decision-Making in Computational Psychiatry Zuo Fei et.al. 2603.05016 translate read null
2026-03-05 Competitive Multi-Operator Reinforcement Learning for Joint Pricing and Fleet Rebalancing in AMoD Systems Emil Kragh Toft et.al. 2603.05000 translate read null
2026-03-05 3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding Xiongkun Linghu et.al. 2603.04976 translate read null
2026-03-05 $\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space Peihao Wang et.al. 2603.04948 translate read null
2026-03-05 Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition Mengze Hong et.al. 2603.04945 translate read null
2026-03-05 BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning Yuan Li et.al. 2603.04918 translate read null
2026-03-05 VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory Yuheng Lei et.al. 2603.04910 translate read null
2026-03-05 Task-Relevant and Irrelevant Region-Aware Augmentation for Generalizable Vision-Based Imitation Learning in Agricultural Manipulation Shun Hattori et.al. 2603.04845 translate read null
2026-03-05 SCoUT: Scalable Communication via Utility-Guided Temporal Grouping in Multi-Agent Reinforcement Learning Manav Vora et.al. 2603.04833 translate read null
2026-03-05 VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment Jiawei Chen et.al. 2603.04822 translate read null
2026-03-05 Diffusion Policy through Conditional Proximal Policy Optimization Ben Liu et.al. 2603.04790 translate read null
2026-03-05 Adaptive Personalized Federated Reinforcement Learning for RIS-Assisted Aerial Relays in SAGINs with Fluid Antennas Yuxuan Yang et.al. 2603.04788 translate read null
2026-03-05 Data-Driven Control of a Magnetically Actuated Fish-Like Robot Akiyuki Koyama et.al. 2603.04787 translate read null
2026-03-05 Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction Xingwu Chen et.al. 2603.04783 translate read null
2026-03-05 Selfish Cooperation Towards Low-Altitude Economy: Integrated Multi-Service Deployment with Resilient Federated Reinforcement Learning Yuxuan Yang et.al. 2603.04779 translate read null
2026-03-05 Distributional Reinforcement Learning with Information Bottleneck for Uncertainty-Aware DRAM Equalization Muhammad Usama et.al. 2603.04768 translate read null
2026-03-05 LLM-Guided Decentralized Exploration with Self-Organizing Robot Teams Hiroaki Kawashima et.al. 2603.04762 translate read null
2026-03-05 SeekRBP: Leveraging Sequence-Structure Integration with Reinforcement Learning for Receptor-Binding Protein Identification Xiling Luo et.al. 2603.04748 translate read null
2026-03-04 Optimizing Language Models for Crosslingual Knowledge Consistency Tianyu Liu et.al. 2603.04678 translate read null
2026-03-04 When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift Kevin Vogt-Lowell et.al. 2603.04648 translate read null
2026-03-04 Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Lei Huang et.al. 2603.04597 translate read null
2026-03-04 ELLIPSE: Evidential Learning for Robust Waypoints and Uncertainties Zihao Dong et.al. 2603.04585 translate read null
2026-03-04 Risk-Aware Reinforcement Learning for Mobile Manipulation Michael Groom et.al. 2603.04579 translate read null
2026-03-04 Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling Tal Daniel et.al. 2603.04553 translate read null
2026-03-04 Transformer-Based Multipath Congestion Control: A Decoupled Approach for Wireless Uplinks Zongyuan Zhang et.al. 2603.04550 translate read null
2026-03-04 PTLD: Sim-to-real Privileged Tactile Latent Distillation for Dexterous Manipulation Rosy Chen et.al. 2603.04531 translate read null
2026-03-04 TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning Maximilian von Klinski et.al. 2603.04380 translate read null
2026-03-04 Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks Haoyu Liu et.al. 2603.04364 translate read null
2026-03-04 A Constrained RL Approach for Cost-Efficient Delivery of Latency-Sensitive Applications Ozan Aygün et.al. 2603.04353 translate read null
2026-03-04 Tendon Force Modeling for Sim2Real Transfer of Reinforcement Learning Policies for Tendon-Driven Robots Valentin Yuryev et.al. 2603.04351 translate read null
2026-03-04 What Does Flow Matching Bring To TD Learning? Bhavya Agrawalla et.al. 2603.04333 translate read null
2026-03-04 IPD: Boosting Sequential Policy with Imaginary Planning Distillation in Offline Reinforcement Learning Yihao Qin et.al. 2603.04289 translate read null
2026-03-04 Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory Zhenting Wang et.al. 2603.04257 translate read null
2026-03-04 OptiQKD: A Machine Learning-Optimized Framework for Real-Time Parameter Tuning in Quantum Key Distribution Noureldin Mohamed et.al. 2603.04192 translate read null
2026-03-04 Learning Hip Exoskeleton Control Policy via Predictive Neuromusculoskeletal Simulation Ilseung Park et.al. 2603.04166 translate read null
2026-03-04 BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning Tarjei Paule Hage et.al. 2603.04124 translate read null
2026-03-04 Real Eyes Realize Faster: Gaze Stability and Pupil Novelty for Efficient Egocentric Learning Ajan Subramanian et.al. 2603.04098 translate read null
2026-03-04 Swimming Under Constraints: A Safe Reinforcement Learning Framework for Quadrupedal Bio-Inspired Propulsion Xinyu Cui et.al. 2603.04073 translate read null
2026-03-04 SaFeR: Safety-Critical Scenario Generation for Autonomous Driving Test via Feasibility-Constrained Token Resampling Jinlong Cui et.al. 2603.04071 translate read null
2026-03-04 Force-Aware Residual DAgger via Trajectory Editing for Precision Insertion with Impedance Control Yiou Huang et.al. 2603.04038 translate read null
2026-03-04 Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback Fabian Domberg et.al. 2603.04029 translate read null
2026-03-04 Rethinking the Efficiency and Effectiveness of Reinforcement Learning for Radiology Report Generation Zilin Lu et.al. 2603.04022 translate read null
2026-03-04 Discriminative Perception via Anchored Description for Reasoning Segmentation Tao Yang et.al. 2603.04002 translate read null
2026-03-04 Structural Action Transformer for 3D Dexterous Manipulation Xiaohan Lei et.al. 2603.03960 translate read null
2026-03-04 GIPO: Gaussian Importance Sampling Policy Optimization Chengxuan Lu et.al. 2603.03955 translate read null
2026-03-04 RVN-Bench: A Benchmark for Reactive Visual Navigation Jaewon Lee et.al. 2603.03953 translate read null
2026-03-04 Selecting Offline Reinforcement Learning Algorithms for Stochastic Network Control Nicolas Helson et.al. 2603.03932 translate read null
2026-03-04 IROSA: Interactive Robot Skill Adaptation using Natural Language Markus Knauer et.al. 2603.03897 translate read null
2026-03-04 Dual-Interaction-Aware Cooperative Control Strategy for Alleviating Mixed Traffic Congestion Zhengxuan Liu et.al. 2603.03848 translate read null
2026-03-04 Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation Yun Lu et.al. 2603.03820 translate read null
2026-03-04 Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling Emile Anand et.al. 2603.03759 translate read null
2026-03-04 Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning Chuang Zhang et.al. 2603.03752 translate read null
2026-03-04 Interaction-Aware Whole-Body Control for Compliant Object Transport Hao Zhang et.al. 2603.03751 translate read null
2026-03-04 HALyPO: Heterogeneous-Agent Lyapunov Policy Optimization for Human-Robot Collaboration Hao Zhang et.al. 2603.03741 translate read null
2026-03-04 UrbanHuRo: A Two-Layer Human-Robot Collaboration Framework for the Joint Optimization of Heterogeneous Urban Services Tonmoy Dey et.al. 2603.03701 translate read null
2026-03-04 MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation Lu Yang et.al. 2603.03680 translate read null
2026-03-04 MIND: Unified Inquiry and Diagnosis RL with Criteria Grounded Clinical Supports for Psychiatric Consultation Guoyi Li et.al. 2603.03677 translate read null
2026-03-04 Principled Learning-to-Communicate with Quasi-Classical Information Structures Xiangyu Liu et.al. 2603.03664 translate read null
2026-03-04 Freezing of Gait Prediction using Proactive Agent that Learns from Selected Experience and DDQN Algorithm Septian Enggar Sukmana et.al. 2603.03651 translate read null
2026-03-04 Hybrid Belief Reinforcement Learning for Efficient Coordinated Spatial Exploration Danish Rizvi et.al. 2603.03595 translate read null
2026-03-03 Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence Shengbo Wang et.al. 2603.03523 translate read null
2026-03-03 PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation Shang Wu et.al. 2603.03505 translate read null
2026-03-03 Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion Haoran Lu et.al. 2603.03485 translate read null
2026-03-03 Optimal trajectory-guided stochastic co-optimization for e-fuel system design and real-time operation Jeongdong Kim et.al. 2603.03484 translate read null
2026-03-03 Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning Harin Lee et.al. 2603.03480 translate read null
2026-03-03 [Re] FairDICE: A Gap Between Theory And Practice Peter Adema et.al. 2603.03454 translate read null
2026-03-03 Beyond Accuracy: Evaluating Visual Grounding In Multimodal Medical Reasoning Anas Zafar et.al. 2603.03437 translate read null
2026-03-03 Multi-Agent-Based Simulation of Archaeological Mobility in Uneven Landscapes Chairi Kiourt et.al. 2603.03390 translate read null
2026-03-03 How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference Toru Lin et.al. 2603.03280 translate read null
2026-03-03 ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation Xialin He et.al. 2603.03279 translate read null
2026-03-03 Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use Aradhye Agarwal et.al. 2603.03205 translate read null
2026-03-03 Specificity-aware reinforcement learning for fine-grained open-world classification Samuele Angheben et.al. 2603.03197 translate read null
2026-03-03 Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing Jiyuan Wang et.al. 2603.03143 translate read null
2026-03-03 RL-Based Coverage Path Planning for Deformable Objects on 3D Surfaces Yuhang Zhang et.al. 2603.03137 translate read null
2026-03-03 Deep Q-Learning-Based Gain Scheduling for Nonlinear Quadcopter Dynamics Hossein Rastgoftar et.al. 2603.03127 translate read null
2026-03-03 Proactive Guiding Strategy for Item-side Fairness in Interactive Recommendation Chongjun Xia et.al. 2603.03094 translate read null
2026-03-03 RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization Siwei Zhang et.al. 2603.03078 translate read null
2026-03-03 TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning Christian Greisinger et.al. 2603.03072 translate read null
2026-03-03 Reinforcement Learning with Symbolic Reward Machines Thomas Krug et.al. 2603.03068 translate read null
2026-03-03 CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots Shihao Ma et.al. 2603.03067 translate read null
2026-03-03 PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems Sudip Bhujel et.al. 2603.03054 translate read null
2026-03-03 QFlowNet: Fast, Diverse, and Efficient Unitary Synthesis with Generative Flow Networks Inhoe Koo et.al. 2603.03045 translate read null
2026-03-03 Why Does RLAIF Work At All? Robin Young et.al. 2603.03000 translate read null
2026-03-03 Contextualized Privacy Defense for LLM Agents Yule Wen et.al. 2603.02983 translate read null
2026-03-03 DreamFlow: Local Navigation Beyond Observation via Conditional Flow Matching in the Latent Space Jiwon Park et.al. 2603.02976 translate read null
2026-03-03 CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning Zhenquan Yao et.al. 2603.02951 translate read null
2026-03-03 Beyond One-Size-Fits-All: Adaptive Subgraph Denoising for Zero-Shot Graph Learning with Large Language Models Fengzhi Li et.al. 2603.02938 translate read null
2026-03-03 Contextual Latent World Models for Offline Meta Reinforcement Learning Mohammadreza Nakheai et.al. 2603.02935 translate read null
2026-03-03 On the Structural Limitations of Weight-Based Neural Adaptation and the Role of Reversible Behavioral Learning Pardhu Sri Rushi Varma Konduru et.al. 2603.02934 translate read null
2026-03-03 Does Fine-tuning by Reinforcement Learning Improve Generalization in Binary Speech Deepfake Detection? Xin Wang et.al. 2603.02914 translate read null
2026-03-03 SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training Qi Zhang et.al. 2603.02908 translate read null
2026-03-03 Learning in Markov Decision Processes with Exogenous Dynamics Davide Maran et.al. 2603.02862 translate read null
2026-03-03 Rhythm: Learning Interactive Whole-Body Control for Dual Humanoids Hongjin Chen et.al. 2603.02856 translate read null
2026-03-03 Learning Memory-Enhanced Improvement Heuristics for Flexible Job Shop Scheduling Jiaqi Wang et.al. 2603.02846 translate read null
2026-03-03 VSearcher: Long-Horizon Multimodal Search Agent via Reinforcement Learning Ruiyang Zhang et.al. 2603.02795 translate read null
2026-03-03 Generative adversarial imitation learning for robot swarms: Learning from human demonstrations and trained policies Mattes Kraus et.al. 2603.02783 translate read null
2026-03-03 Next Embedding Prediction Makes World Models Stronger George Bredis et.al. 2603.02765 translate read null
2026-03-03 Enhancing User Throughput in Multi-panel mmWave Radio Access Networks for Beam-based MU-MIMO Using a DRL Method Ramin Hashemi et.al. 2603.02745 translate read null
2026-03-03 From “What” to “How”: Constrained Reasoning for Autoregressive Image Generation Ruxue Yan et.al. 2603.02712 translate read null
2026-03-03 Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization Yueyang Cang et.al. 2603.02701 translate read null
2026-03-03 VisionCreator: A Native Visual-Generation Agentic Model with Understanding, Thinking, Planning and Creation Jinxiang Lai et.al. 2603.02681 translate read null
2026-03-03 Watch Your Step: Learning Semantically-Guided Locomotion in Cluttered Environment Denan Liang et.al. 2603.02657 translate read null
2026-03-03 Generalized Per-Agent Advantage Estimation for Multi-Agent Policy Optimization Seongmin Kim et.al. 2603.02654 translate read null
2026-03-03 Improving Diffusion Planners by Self-Supervised Action Gating with Energies Yuan Lu et.al. 2603.02650 translate read null
2026-03-02 Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training Valentin Lacombe et.al. 2603.02208 translate read null
2026-03-02 Tool Verification for Test-Time Reinforcement Learning Ruotong Liao et.al. 2603.02203 translate read null
2026-03-02 Near-Optimal Regret for KL-Regularized Multi-Armed Bandits Kaixuan Ji et.al. 2603.02155 translate read null
2026-03-02 LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards Guanzheng Chen et.al. 2603.02146 translate read null
2026-03-02 Rethinking Camera Choice: An Empirical Study on Fisheye Camera Properties in Robotic Manipulation Han Xue et.al. 2603.02139 translate read null
2026-03-02 Pencil Puzzle Bench: A Benchmark for Multi-Step Verifiable Reasoning Justin Waugh et.al. 2603.02119 translate read null
2026-03-02 ACDC: Adaptive Curriculum Planning with Dynamic Contrastive Control for Goal-Conditioned Reinforcement Learning in Robotic Manipulation Xuerui Wang et.al. 2603.02104 translate read null
2026-03-02 Learning from Synthetic Data Improves Multi-hop Reasoning Anmol Kabra et.al. 2603.02091 translate read null
2026-03-02 Reinforcement Learning-Based Filters for Convection-Dominated Flows: Reference-Free and Reference-Guided Training Anna Ivagnes et.al. 2603.02086 translate read null
2026-03-02 $\pi$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs Siting Wang et.al. 2603.02083 translate read null
2026-03-02 Accelerating PDE Surrogates via RL-Guided Mesh Optimization Yang Meng et.al. 2603.02066 translate read null
2026-03-02 Expanding LLM Agent Boundaries with Strategy-Guided Exploration Andrew Szot et.al. 2603.02045 translate read null
2026-03-02 Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards Faisal Mohamed et.al. 2603.02008 translate read null
2026-03-02 Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection Yuchen Zhang et.al. 2603.01993 translate read null
2026-03-02 CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production Yixin Nie et.al. 2603.01973 translate read null
2026-03-02 CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification Jinpeng Chen et.al. 2603.01940 translate read null
2026-03-02 LaST-VLA: Thinking in Latent Spatio-Temporal Space for Vision-Language-Action in Autonomous Driving Yuechen Luo et.al. 2603.01928 translate read null
2026-03-02 Efficient RLVR Training via Weighted Mutual Information Data Selection Xinyu Zhou et.al. 2603.01907 translate read null
2026-03-02 Visual Bias in Simulated Users: The Impact of Luminance and Contrast on Reinforcement Learning-based Interaction Hannah Selder et.al. 2603.01901 translate read null
2026-03-02 Generative Visual Chain-of-Thought for Image Editing Zijin Yin et.al. 2603.01893 translate read null
2026-03-02 SEAR: Sample Efficient Action Chunking Reinforcement Learning C. F. Maximilian Nagy et.al. 2603.01891 translate read null
2026-03-02 FireRed-OCR Technical Report Hao Wu et.al. 2603.01840 translate read link
2026-03-02 Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport Harry Amad et.al. 2603.01771 translate read null
2026-03-02 Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning Naoki Shitanda et.al. 2603.01741 translate read null
2026-03-02 TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training Jinluan Yang et.al. 2603.01714 translate read null
2026-03-02 Cross-modal Identity Mapping: Minimizing Information Loss in Modality Conversion via Reinforcement Learning Haonan Jia et.al. 2603.01696 translate read null
2026-03-02 MVR: Multi-view Video Reward Shaping for Reinforcement Learning Lirui Luo et.al. 2603.01694 translate read null
2026-03-02 Chain-of-Context Learning: Dynamic Constraint Understanding for Multi-Task VRPs Shuangchun Gui et.al. 2603.01667 translate read null
2026-03-02 Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning Jiebin Zhang et.al. 2603.01639 translate read null
2026-03-02 Learning Thermal-Aware Locomotion Policies for an Electrically-Actuated Quadruped Robot Letian Qian et.al. 2603.01631 translate read null
2026-03-02 ToolRLA: Fine-Grained Reward Decomposition for Tool-Integrated Reinforcement Learning Alignment in Domain-Specific Agents Pengbo Liu et.al. 2603.01620 translate read null
2026-03-02 CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework Yuexi Du et.al. 2603.01607 translate read null
2026-03-02 Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models Qiyuan Zhang et.al. 2603.01571 translate read null
2026-03-02 Investigating Group Relative Policy Optimization for Diffusion Transformer based Text-to-Audio Generation Yi Gu et.al. 2603.01565 translate read null
2026-03-02 LFPO: Likelihood-Free Policy Optimization for Masked Diffusion Models Chenxing Wei et.al. 2603.01563 translate read null
2026-03-02 State-Action Inpainting Diffuser for Continuous Control with Delay Dongqi Han et.al. 2603.01553 translate read null
2026-03-02 GAC: Stabilizing Asynchronous RL Training for LLMs via Gradient Alignment Control Haofeng Xu et.al. 2603.01501 translate read null
2026-03-02 LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning Chang Yao et.al. 2603.01488 translate read null
2026-03-02 Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents Haojin Yang et.al. 2603.01481 translate read null
2026-03-02 Towards Robot Skill Learning and Adaptation with Gaussian Processes A K M Nadimul Haque et.al. 2603.01480 translate read null
2026-03-02 ProtRLSearch: A Multi-Round Multimodal Protein Search Agent with Large Language Models Trained via Reinforcement Learning Congying Liu et.al. 2603.01464 translate read null
2026-03-02 Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning Shaohuai Liu et.al. 2603.01452 translate read null
2026-03-02 Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents Zhixiang Wang et.al. 2603.01416 translate read null
2026-03-02 MIST-RL: Mutation-based Incremental Suite Testing via Reinforcement Learning Sicheng Zhu et.al. 2603.01409 translate read null
2026-03-02 SubstratumGraphEnv: Reinforcement Learning Environment (RLE) for Modeling System Attack Paths Bahirah Adewunmi et.al. 2603.01340 translate read null
2026-03-02 Energy Efficient Traffic Scheduling For Optical LEO Satellite Downlinks Ethan Fettes et.al. 2603.01334 translate read null
2026-03-01 PAC Guarantees for Reinforcement Learning: Sample Complexity, Coverage, and Structure Joshua Steier et.al. 2603.01309 translate read null
2026-03-01 Hybrid TD3: Overestimation Bias Analysis and Stable Policy Optimization for Hybrid Action Space Thanh-Tuan Tran et.al. 2603.01302 translate read null
2026-03-01 When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains Ahmadreza Jeddi et.al. 2603.01301 translate read link
2026-03-01 Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models Adel Javanmard et.al. 2603.01293 translate read null
2026-03-01 Integrating LTL Constraints into PPO for Safe Reinforcement Learning Maifang Zhang et.al. 2603.01292 translate read null
2026-03-01 Beyond Reward: A Bounded Measure of Agent Environment Coupling Wael Hafez et.al. 2603.01283 translate read null
2026-03-01 MOSAIC: A Unified Platform for Cross-Paradigm Comparison and Evaluation of Homogeneous and Heterogeneous Multi-Agent RL, LLM, VLM, and Human Decision-Makers Abdulhamid M. Mousa et.al. 2603.01260 translate read null
2026-03-01 Towards Policy-Adaptive Image Guardrail: Benchmark and Method Caiyong Piao et.al. 2603.01228 translate read null
2026-03-01 Can Thinking Models Think to Detect Hateful Memes? Mohamed Bayan Kmainasi et.al. 2603.01225 translate read null
2026-03-01 Learn Hard Problems During RL with Reference Guided Fine-tuning Yangzhen Wu et.al. 2603.01223 translate read null
2026-03-01 Epistemic Gain, Aleatoric Cost: Uncertainty Decomposition in Multi-Agent Debate for Math Reasoning Dan Qiao et.al. 2603.01221 translate read null
2026-03-01 Reasoning Boosts Opinion Alignment in LLMs Frédéric Berdoz et.al. 2603.01214 translate read null
2026-03-01 PARWiS: Winner determination under shoestring budgets using active pairwise comparisons Shailendra Bhandari et.al. 2603.01171 translate read null
2026-03-01 BeautyGRPO: Aesthetic Alignment for Face Retouching via Dynamic Path Guidance and Fine-Grained Preference Modeling Jiachen Yang et.al. 2603.01163 translate read null
2026-03-01 DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent Tongzhou Wu et.al. 2603.01152 translate read null
2026-03-01 Compact Task-Aligned Imitation Learning for Laboratory Automation Kanata Suzuki et.al. 2603.01110 translate read null
2026-03-01 DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage Haowen Gao et.al. 2603.01106 translate read null
2026-03-01 Feasible Pairings for Decentralized Integral Controllability of Non-Square Systems Yuhao Tong et.al. 2603.01076 translate read null
2026-03-01 How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning Xiangxiang Zhang et.al. 2603.01070 translate read null
2026-03-01 Unleashing VLA Potentials in Autonomous Driving via Explicit Learning from Failures Yuechen Luo et.al. 2603.01063 translate read null
2026-03-01 MM-DeepResearch: A Simple and Effective Multimodal Agentic Search Baseline Huanjin Yao et.al. 2603.01050 translate read null
2026-03-01 HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents Hongbo Jin et.al. 2603.00977 translate read null
2026-03-01 Intent-Context Synergy Reinforcement Learning for Autonomous UAV Decision-Making in Air Combat Jiahao Fu et.al. 2603.00974 translate read null
2026-03-01 Stabilizing Policy Optimization via Logits Convexity Hongzhan Chen et.al. 2603.00963 translate read null
2026-03-01 HierKick: Hierarchical Reinforcement Learning for Vision-Guided Soccer Robot Control Yizhi Chen et.al. 2603.00948 translate read null
2026-03-01 Non-Rectangular Average-Reward Robust MDPs: Optimal Policies and Their Transient Values Shengbo Wang et.al. 2603.00945 translate read null
2026-03-01 Minimalist Compliance Control Haochen Shi et.al. 2603.00913 translate read null
2026-03-01 Principled Fast and Meta Knowledge Learners for Continual Reinforcement Learning Ke Sun et.al. 2603.00903 translate read null
2026-03-01 CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning Xinyu Zhu et.al. 2603.00889 translate read null
