Reinforcement Learning - 2026-03
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2026-03-31 | HapCompass: A Rotational Haptic Device for Contact-Rich Robotic Teleoperation | Xiangshan Tan et.al. | 2603.30042 | translate | read | null |
| 2026-03-31 | Hybrid Framework for Robotic Manipulation: Integrating Reinforcement Learning and Large Language Models | Md Saad et.al. | 2603.30022 | translate | read | null |
| 2026-03-31 | Phyelds: A Pythonic Framework for Aggregate Computing | Gianluca Aguzzi et.al. | 2603.29999 | translate | read | null |
| 2026-03-31 | GreenFLag: A Green Agentic Approach for Energy-Efficient Federated Learning | Theodora Panagea et.al. | 2603.29933 | translate | read | null |
| 2026-03-31 | ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training | Rui Ai et.al. | 2603.29871 | translate | read | null |
| 2026-03-31 | An Output Feedback Q-learning Algorithm for Optimal Control of Nonlinear Systems with Koopman Linear Embedding | Victor G. Lopez et.al. | 2603.29858 | translate | read | null |
| 2026-03-31 | Friends, Foes, and First Authors: A Game Theory Model of How Power Plays Rewrite Academic Co-Authorship Networks | Amit Bengal et.al. | 2603.29834 | translate | read | null |
| 2026-03-31 | Reinforced Reasoning for End-to-End Retrosynthetic Planning | Chenyang Zuo et.al. | 2603.29723 | translate | read | null |
| 2026-03-31 | 6GAgentGym: Tool Use, Data Synthesis, and Agentic Learning for Network Management | Jiao Chen et.al. | 2603.29656 | translate | read | null |
| 2026-03-31 | ASI-Evolve: AI Accelerates AI | Weixian Xu et.al. | 2603.29640 | translate | read | null |
| 2026-03-31 | Learning Diagnostic Reasoning for Decision Support in Toxicology | Nico Oberländer et.al. | 2603.29608 | translate | read | null |
| 2026-03-31 | GraSP-STL: A Graph-Based Framework for Zero-Shot Signal Temporal Logic Planning via Offline Goal-Conditioned Reinforcement Learning | Ancheng Hou et.al. | 2603.29533 | translate | read | null |
| 2026-03-31 | Target-Aligned Reinforcement Learning | Leonard S. Pleiss et.al. | 2603.29501 | translate | read | null |
| 2026-03-31 | Learning to Generate Formally Verifiable Step-by-Step Logic Reasoning via Structured Formal Intermediaries | Luoxin Chen et.al. | 2603.29500 | translate | read | null |
| 2026-03-31 | MemFactory: Unified Inference & Training Framework for Agent Memory | Ziliang Guo et.al. | 2603.29493 | translate | read | null |
| 2026-03-31 | Calibrated Confidence Expression for Radiology Report Generation | David Bani-Harouni et.al. | 2603.29492 | translate | read | null |
| 2026-03-31 | Multi-AUV Cooperative Target Tracking Based on Supervised Diffusion-Aided Multi-Agent Reinforcement Learning | Jiaao Ma et.al. | 2603.29426 | translate | read | null |
| 2026-03-31 | AP-DRL: A Synergistic Algorithm-Hardware Framework for Automatic Task Partitioning of Deep Reinforcement Learning on Versal ACAP | Enlai Li et.al. | 2603.29369 | translate | read | null |
| 2026-03-31 | Scaling Whole-Body Human Musculoskeletal Behavior Emulation for Specificity and Diversity | Yunyue Wei et.al. | 2603.29332 | translate | read | null |
| 2026-03-31 | Downsides of Smartness Across Edge-Cloud Continuum in Modern Industry | Akhil Gupta Chigullapally et.al. | 2603.29289 | translate | read | null |
| 2026-03-31 | MemRerank: Preference Memory for Personalized Product Reranking | Zhiyuan Peng et.al. | 2603.29247 | translate | read | null |
| 2026-03-30 | Gen-Searcher: Reinforcing Agentic Search for Image Generation | Kaituo Feng et.al. | 2603.28767 | translate | read | null |
| 2026-03-30 | SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning | Philip Schroeder et.al. | 2603.28730 | translate | read | null |
| 2026-03-30 | Stepwise Credit Assignment for GRPO on Flow-Matching Models | Yash Savani et.al. | 2603.28718 | translate | read | null |
| 2026-03-30 | Dynamic Dual-Granularity Skill Bank for Agentic RL | Songjun Tu et.al. | 2603.28716 | translate | read | null |
| 2026-03-30 | DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing | Kailai Feng et.al. | 2603.28713 | translate | read | null |
| 2026-03-30 | Dynamic Lookahead Distance via Reinforcement Learning-Based Pure Pursuit for Autonomous Racing | Mohamed Elgouhary et.al. | 2603.28625 | translate | read | null |
| 2026-03-30 | Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning | Ziqi Miao et.al. | 2603.28618 | translate | read | null |
| 2026-03-30 | Learning Partial Action Replacement in Offline MARL | Yue Jin et.al. | 2603.28573 | translate | read | null |
| 2026-03-30 | GraphWalker: Agentic Knowledge Graph Question Answering via Synthetic Trajectory Curriculum | Shuwen Xu et.al. | 2603.28533 | translate | read | null |
| 2026-03-30 | Intelligent Radio Resource Slicing for 6G In-Body Subnetworks | Samira Abdelrahman et.al. | 2603.28529 | translate | read | null |
| 2026-03-30 | Tac2Real: Reliable and GPU Visuotactile Simulation for Online Reinforcement Learning and Zero-Shot Real-World Deployment | Ningyu Yan et.al. | 2603.28475 | translate | read | null |
| 2026-03-30 | CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains | Wenhan Wang et.al. | 2603.28474 | translate | read | null |
| 2026-03-30 | $R_{dm}$: Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation | Linqian Fan et.al. | 2603.28460 | translate | read | null |
| 2026-03-30 | Active Stereo-Camera Outperforms Multi-Sensor Setup in ACT Imitation Learning for Humanoid Manipulation | Robin Kühn et.al. | 2603.28422 | translate | read | null |
| 2026-03-30 | Learning unified control of internal spin squeezing in atomic qudits for magnetometry | C. Z. Cao et.al. | 2603.28421 | translate | read | null |
| 2026-03-30 | Evolutionary Discovery of Reinforcement Learning Algorithms via Large Language Models | Alkis Sygkounas et.al. | 2603.28416 | translate | read | null |
| 2026-03-30 | Critic-Free Deep Reinforcement Learning for Maritime Coverage Path Planning on Irregular Hexagonal Grids | Carlos S. Sepúlveda et.al. | 2603.28385 | translate | read | null |
| 2026-03-30 | Rethinking Structure Preservation in Text-Guided Image Editing with Visual Autoregressive Models | Tao Xia et.al. | 2603.28367 | translate | read | null |
| 2026-03-30 | Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization | He Du et.al. | 2603.28342 | translate | read | null |
| 2026-03-30 | Competitor-aware Race Management for Electric Endurance Racing | Wytze de Vries et.al. | 2603.28286 | translate | read | null |
| 2026-03-30 | Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback | Andi Nika et.al. | 2603.28281 | translate | read | null |
| 2026-03-30 | Cost-Matching Model Predictive Control for Efficient Reinforcement Learning in Humanoid Locomotion | Wenqi Cai et.al. | 2603.28243 | translate | read | null |
| 2026-03-30 | ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models | Song Yu et.al. | 2603.28204 | translate | read | null |
| 2026-03-30 | A Deep Reinforcement Learning Framework for Closed-loop Guidance of Fish Schools via Virtual Agents | Takato Shibayama et.al. | 2603.28200 | translate | read | null |
| 2026-03-30 | MedLoc-R1: Performance-Aware Curriculum Reward Scheduling for GRPO-Based Medical Visual Grounding | Guangjing Yang et.al. | 2603.28120 | translate | read | null |
| 2026-03-30 | $AutoDrive\text{-}P^3$: Unified Chain of Perception-Prediction-Planning Thought via Reinforcement Fine-Tuning | Yuqi Ye et.al. | 2603.28116 | translate | read | null |
| 2026-03-30 | Heddle: A Distributed Orchestration System for Agentic RL Rollout | Zili Zhang et.al. | 2603.28101 | translate | read | null |
| 2026-03-30 | Koopman-based surrogate modeling for reinforcement-learning-control of Rayleigh-Benard convection | Tim Plotzki et.al. | 2603.28074 | translate | read | null |
| 2026-03-30 | Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL | Udita Ghosh et.al. | 2603.28053 | translate | read | null |
| 2026-03-30 | CARLA-Air: Fly Drones Inside a CARLA World – A Unified Infrastructure for Air-Ground Embodied Intelligence | Tianle Zeng et.al. | 2603.28032 | translate | read | null |
| 2026-03-30 | Energy-Aware Imitation Learning for Steering Prediction Using Events and Frames | Hu Cao et.al. | 2603.28008 | translate | read | null |
| 2026-03-30 | SARL: Label-Free Reinforcement Learning by Rewarding Reasoning Topology | Yifan Wang et.al. | 2603.27977 | translate | read | null |
| 2026-03-30 | Principal Prototype Analysis on Manifold for Interpretable Reinforcement Learning | Bodla Krishna Vamshi et.al. | 2603.27971 | translate | read | null |
| 2026-03-30 | Flip Stunts on Bicycle Robots using Iterative Motion Imitation | Jeonghwan Kim et.al. | 2603.27944 | translate | read | null |
| 2026-03-25 | DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving | Pengxuan Yang et.al. | 2603.24587 | translate | read | null |
| 2026-03-25 | MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination | Zhuo Li et.al. | 2603.24579 | translate | read | null |
| 2026-03-25 | VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models | Qijia He et.al. | 2603.24575 | translate | read | null |
| 2026-03-25 | Completeness of Unbounded Best-First Minimax and Descent Minimax | Quentin Cohen-Solal et.al. | 2603.24572 | translate | read | null |
| 2026-03-25 | Composer 2 Technical Report | Cursor Research et.al. | 2603.24477 | translate | read | null |
| 2026-03-25 | Improving Lean4 Autoformalization via Cycle Consistency Fine-tuning | Arsen Shebzukhov et.al. | 2603.24372 | translate | read | null |
| 2026-03-25 | CoordLight: Learning Decentralized Coordination for Network-Wide Traffic Signal Control | Yifeng Zhang et.al. | 2603.24366 | translate | read | null |
| 2026-03-25 | LATS: Large Language Model Assisted Teacher-Student Framework for Multi-Agent Reinforcement Learning in Traffic Signal Control | Yifeng Zhang et.al. | 2603.24361 | translate | read | null |
| 2026-03-25 | Large Language Model Guided Incentive Aware Reward Design for Cooperative Multi-Agent Reinforcement Learning | Dogan Urgun et.al. | 2603.24324 | translate | read | null |
| 2026-03-25 | Heuristic Self-Paced Learning for Domain Adaptive Semantic Segmentation under Adverse Conditions | Shiqin Wang et.al. | 2603.24322 | translate | read | null |
| 2026-03-25 | C-STEP: Continuous Space-Time Empowerment for Physics-informed Safe Reinforcement Learning of Mobile Agents | Guihlerme Daubt et.al. | 2603.24241 | translate | read | null |
| 2026-03-25 | Decentralized End-to-End Multi-AAV Pursuit Using Predictive Spatio-Temporal Observation via Deep Reinforcement Learning | Yude Li et.al. | 2603.24238 | translate | read | null |
| 2026-03-25 | SumRank: Aligning Summarization Models for Long-Document Listwise Reranking | Jincheng Feng et.al. | 2603.24204 | translate | read | null |
| 2026-03-25 | A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula | Cansu Sancaktar et.al. | 2603.24202 | translate | read | null |
| 2026-03-25 | Optimized control protocols for stable skyrmion creation using deep reinforcement learning | Ji Seok Song et.al. | 2603.24177 | translate | read | null |
| 2026-03-25 | A Longitudinal Analysis of the CEC Single-Objective Competitions (2010-2024) and Implications for Variational Quantum Optimization | Vojtěch Novák et.al. | 2603.24140 | translate | read | null |
| 2026-03-25 | Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection | Zhanhe Lei et.al. | 2603.24139 | translate | read | null |
| 2026-03-25 | Likelihood hacking in probabilistic program synthesis | Jacek Karwowski et.al. | 2603.24126 | translate | read | null |
| 2026-03-25 | Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization | Fei Bai et.al. | 2603.24093 | translate | read | null |
| 2026-03-25 | Knowledge-Guided Manipulation Using Multi-Task Reinforcement Learning | Aditya Narendra et.al. | 2603.24083 | translate | read | null |
| 2026-03-25 | PCHC: Enabling Preference Conditioned Humanoid Control via Multi-Objective Reinforcement Learning | Huanyu Li et.al. | 2603.24047 | translate | read | null |
| 2026-03-25 | Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage | Rishikesh Sahay et.al. | 2603.23966 | translate | read | null |
| 2026-03-25 | From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments | Lijing Luo et.al. | 2603.23964 | translate | read | null |
| 2026-03-25 | PointRFT: Explicit Reinforcement Fine-tuning for Point Cloud Few-shot Learning | Yankai Wang et.al. | 2603.23957 | translate | read | null |
| 2026-03-25 | Optimal Variance-Dependent Regret Bounds for Infinite-Horizon MDPs | Guy Zamir et.al. | 2603.23926 | translate | read | null |
| 2026-03-25 | Off-Policy Safe Reinforcement Learning with Constrained Optimistic Exploration | Guopeng Li et.al. | 2603.23889 | translate | read | null |
| 2026-03-25 | ProcureGym: A Multi-Agent Markov Game Framework for Modeling National Volume-based Drug Procurement | Jia Wang et.al. | 2603.23880 | translate | read | null |
| 2026-03-25 | The DeepXube Software Package for Solving Pathfinding Problems with Learned Heuristic Functions and Search | Forest Agostinelli et.al. | 2603.23873 | translate | read | null |
| 2026-03-25 | HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation | Ken Ding et.al. | 2603.23871 | translate | read | null |
| 2026-03-25 | Joint Source-Channel-Check Coding with HARQ for Reliable Semantic Communications | Boyuan Li et.al. | 2603.23869 | translate | read | null |
| 2026-03-25 | Learning-guided Prioritized Planning for Lifelong Multi-Agent Path Finding in Warehouse Automation | Han Zheng et.al. | 2603.23838 | translate | read | null |
| 2026-03-25 | Human, AI, and Hybrid Ensembles for Detection of Adaptive, RL-based Social Bots | Valerio La Gatta et.al. | 2603.23796 | translate | read | null |
| 2026-03-24 | Self Paced Gaussian Contextual Reinforcement Learning | Mohsen Sahraei Ardakani et.al. | 2603.23755 | translate | read | null |
| 2026-03-24 | BXRL: Behavior-Explainable Reinforcement Learning | Ram Rachum et.al. | 2603.23738 | translate | read | null |
| 2026-03-24 | Dual-Gated Epistemic Time-Dilation: Autonomous Compute Modulation in Asynchronous MARL | Igor Jankowski et.al. | 2603.23722 | translate | read | null |
| 2026-03-24 | Utilizing Adversarial Training for Robust Voltage Control: An Adaptive Deep Reinforcement Learning Method | Sungjoo Chung et.al. | 2603.23648 | translate | read | null |
| 2026-03-24 | Safe Reinforcement Learning with Preference-based Constraint Inference | Chenglin Li et.al. | 2603.23565 | translate | read | null |
| 2026-03-21 | Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction | Haoyu Wang et.al. | 2603.23550 | translate | read | null |
| 2026-03-24 | UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation | Jie Liu et.al. | 2603.23500 | translate | read | null |
| 2026-03-24 | WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG | Zhen Li et.al. | 2603.23497 | translate | read | null |
| 2026-03-24 | End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions | Zakaria Mhammedi et.al. | 2603.23461 | translate | read | null |
| 2026-03-24 | SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling | Yiqi Zhang et.al. | 2603.23414 | translate | read | null |
| 2026-03-24 | A Joint Reinforcement Learning Scheduling and Compression Framework for Teleoperated Driving | Giacomo Avanzi et.al. | 2603.23387 | translate | read | null |
| 2026-03-24 | Off-Policy Value-Based Reinforcement Learning for Large Language Models | Peng-Yuan Wang et.al. | 2603.23355 | translate | read | null |
| 2026-03-24 | Learning Multi-Agent Local Collision-Avoidance for Collaborative Carrying tasks with Coupled Quadrupedal Robots | Francesca Bray et.al. | 2603.23278 | translate | read | null |
| 2026-03-24 | A Learning Method with Gap-Aware Generation for Heterogeneous DAG Scheduling | Ruisong Zhou et.al. | 2603.23249 | translate | read | null |
| 2026-03-24 | Neural ODE and SDE Models for Adaptation and Planning in Model-Based Reinforcement Learning | Chao Han et.al. | 2603.23245 | translate | read | null |
| 2026-03-24 | GEM: Guided Expectation-Maximization for Behavior-Normalized Candidate Action Selection in Offline RL | Haoyu Wang et.al. | 2603.23232 | translate | read | null |
| 2026-03-24 | ImplicitRM: Unbiased Reward Modeling from Implicit Preference Data for LLM alignment | Hao Wang et.al. | 2603.23184 | translate | read | null |
| 2026-03-24 | Path Planning and Reinforcement Learning-Driven Control of On-Orbit Free-Flying Multi-Arm Robots | Álvaro Belmonte-Baeza et.al. | 2603.23182 | translate | read | null |
| 2026-03-24 | Fault-Tolerant Design and Multi-Objective Model Checking for Real-Time Deep Reinforcement Learning Systems | Guoxin Su et.al. | 2603.23113 | translate | read | null |
| 2026-03-24 | SpecXMaster Technical Report | Yutang Ge et.al. | 2603.23101 | translate | read | null |
| 2026-03-24 | Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards | Orhun Buğra Baran et.al. | 2603.23086 | translate | read | null |
| 2026-03-24 | MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models | Jianxin Lin et.al. | 2603.23085 | translate | read | null |
| 2026-03-24 | Minimizing Material Waste in Additive Manufacturing through Online Reel Assignment | Ilayda Celenk et.al. | 2603.23042 | translate | read | null |
| 2026-03-24 | From Morality Installation in LLMs to LLMs in Morality-as-a-System | Gunter Bombaerts et.al. | 2603.22944 | translate | read | null |
| 2026-03-24 | Quality Over Clicks: Intrinsic Quality-Driven Iterative Reinforcement Learning for Cold-Start E-Commerce Query Suggestion | Qi Sun et.al. | 2603.22922 | translate | read | null |
| 2026-03-24 | EVA: Efficient Reinforcement Learning for End-to-End Video Agent | Yaolun Zhang et.al. | 2603.22918 | translate | read | null |
| 2026-03-24 | VLGOR: Visual-Language Knowledge Guided Offline Reinforcement Learning for Generalizable Agents | Pengsen Liu et.al. | 2603.22892 | translate | read | null |
| 2026-03-24 | Portfolio Optimization under Recursive Utility via Reinforcement Learning | Minkey Chang et.al. | 2603.22880 | translate | read | null |
| 2026-03-24 | Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models | Ruixing Jin et.al. | 2603.22876 | translate | read | null |
| 2026-03-24 | DecompGrind: A Decomposition Framework for Robotic Grinding via Cutting-Surface Planning and Contact-Force Adaptation | Shunsuke Araki et.al. | 2603.22859 | translate | read | null |
| 2026-03-24 | Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought | Yunheng Li et.al. | 2603.22847 | translate | read | null |
| 2026-03-24 | CoMaTrack: Competitive Multi-Agent Game-Theoretic Tracking with Vision-Language-Action Models | Youzhi Liu et.al. | 2603.22846 | translate | read | null |
| 2026-03-24 | Improving Safety Alignment via Balanced Direct Preference Optimization | Shiji Zhao et.al. | 2603.22829 | translate | read | null |
| 2026-03-24 | SG-VLA: Learning Spatially-Grounded Vision-Language-Action Models for Mobile Manipulation | Ruisen Tu et.al. | 2603.22760 | translate | read | null |
| 2026-03-24 | Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints | Tian Xu et.al. | 2603.22713 | translate | read | null |
| 2026-03-23 | Q-Tacit: Image Quality Assessment via Latent Visual Reasoning | Yuxuan Jiang et.al. | 2603.22641 | translate | read | null |
| 2026-03-23 | Privacy-Preserving Reinforcement Learning from Human Feedback via Decoupled Reward Modeling | Young Hyun Cho et.al. | 2603.22563 | translate | read | null |
| 2026-03-23 | Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion | Honglin He et.al. | 2603.22527 | translate | read | null |
| 2026-03-23 | Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs | Haoming Meng et.al. | 2603.22446 | translate | read | null |
| 2026-03-23 | CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation | Max Fu et.al. | 2603.22435 | translate | read | null |
| 2026-03-23 | Model Predictive Control with Differentiable World Models for Offline Reinforcement Learning | Rohan Deb et.al. | 2603.22430 | translate | read | null |
| 2026-03-23 | Learning When to Act: Interval-Aware Reinforcement Learning with Predictive Temporal Structure | Davide Di Gioia et.al. | 2603.22384 | translate | read | null |
| 2026-03-22 | WIST: Web-Grounded Iterative Self-Play Tree for Domain-Targeted Reasoning Improvement | Fangyuan Li et.al. | 2603.22352 | translate | read | null |
| 2026-03-19 | The Efficiency Attenuation Phenomenon: A Computational Challenge to the Language of Thought Hypothesis | Di Zhang et.al. | 2603.22312 | translate | read | null |
| 2026-03-23 | Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration | Zakaria Mhammedi et.al. | 2603.22273 | translate | read | null |
| 2026-03-23 | TiCo: Time-Controllable Training for Spoken Dialogue Models | Kai-Wei Chang et.al. | 2603.22267 | translate | read | null |
| 2026-03-23 | DexDrummer: In-Hand, Contact-Rich, and Long-Horizon Dexterous Robot Drumming | Hung-Chieh Fang et.al. | 2603.22263 | translate | read | null |
| 2026-03-23 | SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation | Sashuai Zhou et.al. | 2603.22228 | translate | read | null |
| 2026-03-23 | Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control | Qingrui Zhao et.al. | 2603.22201 | translate | read | null |
| 2026-03-23 | Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement | Junrong Guo et.al. | 2603.22187 | translate | read | null |
| 2026-03-23 | Cross-Modal Reinforcement Learning for Navigation with Degraded Depth Measurements | Omkar Sawant et.al. | 2603.22182 | translate | read | null |
| 2026-03-23 | Closed-Loop Verbal Reinforcement Learning for Task-Level Robotic Planning | Dmitrii Plotnikov et.al. | 2603.22169 | translate | read | null |
| 2026-03-23 | On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation | Kexin Huang et.al. | 2603.22117 | translate | read | null |
| 2026-03-23 | A Context Engineering Framework for Improving Enterprise AI Agents based on Digital-Twin MDP | Xi Yang et.al. | 2603.22083 | translate | read | null |
| 2026-03-23 | MEVIUS2: Practical Open-Source Quadruped Robot with Sheet Metal Welding and Multimodal Perception | Kento Kawaharazuka et.al. | 2603.22031 | translate | read | null |
| 2026-03-23 | TREX: Trajectory Explanations for Multi-Objective Reinforcement Learning | Dilina Rajapakse et.al. | 2603.21988 | translate | read | null |
| 2026-03-23 | Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe | Xixi Wu et.al. | 2603.21972 | translate | read | null |
| 2026-03-23 | Deep Reinforcement Learning and The Tale of Two Temporal Difference Errors | Juan Sebastian Rojas et.al. | 2603.21921 | translate | read | null |
| 2026-03-23 | P^2O: Joint Policy and Prompt Optimization | Xinyu Lu et.al. | 2603.21877 | translate | read | null |
| 2026-03-23 | Manifold-Aware Exploration for Reinforcement Learning in Video Generation | Mingzhe Zheng et.al. | 2603.21872 | translate | read | null |
| 2026-03-23 | Agentic Personas for Adaptive Scientific Explanations with Knowledge Graphs | Susana Nunes et.al. | 2603.21846 | translate | read | null |
| 2026-03-23 | Partial Attention in Deep Reinforcement Learning for Safe Multi-Agent Control | Turki Bin Mohaya et.al. | 2603.21810 | translate | read | null |
| 2026-03-23 | Image-Conditioned Adaptive Parameter Tuning for Visual Odometry Frontends | Simone Nascivera et.al. | 2603.21785 | translate | read | null |
| 2026-03-23 | CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement Learning | Dongxia Wu et.al. | 2603.21743 | translate | read | null |
| 2026-03-23 | EvoIdeator: Evolving Scientific Ideas through Checklist-Grounded Reinforcement Learning | Andreas Sauter et.al. | 2603.21728 | translate | read | null |
| 2026-03-23 | PPGL-Swarm: Integrated Multimodal Risk Stratification and Hereditary Syndrome Detection in Pheochromocytoma and Paraganglioma | Zelin Liu et.al. | 2603.21700 | translate | read | null |
| 2026-03-23 | TAMTRL: Teacher-Aligned Reward Reshaping for Multi-Turn Reinforcement Learning in Long-Context Compression | Li Wang et.al. | 2603.21663 | translate | read | null |
| 2026-03-23 | Proximal Policy Optimization in Path Space: A Schrödinger Bridge Perspective | Yuehu Gong et.al. | 2603.21621 | translate | read | null |
| 2026-03-23 | Spatio-Temporal Attention Enhanced Multi-Agent DRL for UAV-Assisted Wireless Networks with Limited Communications | Che Chen et.al. | 2603.21594 | translate | read | null |
| 2026-03-23 | Adaptive Robust Estimator for Multi-Agent Reinforcement Learning | Zhongyi Li et.al. | 2603.21574 | translate | read | null |
| 2026-03-23 | Counterfactual Credit Policy Optimization for Multi-Agent Collaboration | Zhongyi Li et.al. | 2603.21563 | translate | read | null |
| 2026-03-23 | What Do World Models Learn in RL? Probing Latent Representations in Learned Environment Simulators | Xinyu Zhang et.al. | 2603.21546 | translate | read | null |
| 2026-03-23 | VIGIL: Part-Grounded Structured Reasoning for Generalizable Deepfake Detection | Xinghan Li et.al. | 2603.21526 | translate | read | null |
| 2026-03-23 | Learning Can Converge Stably to the Wrong Belief under Latent Reliability | Zhipeng Zhang et.al. | 2603.21491 | translate | read | null |
| 2026-03-23 | DRTriton: Large-Scale Synthetic Data Reinforcement Learning for Triton Kernel Generation | Siqi Guo et.al. | 2603.21465 | translate | read | null |
| 2026-03-22 | KG-Hopper: Empowering Compact Open LLMs with Knowledge Graph Reasoning via Reinforcement Learning | Shuai Wang et.al. | 2603.21440 | translate | read | null |
| 2026-03-22 | Dynasto: Validity-Aware Dynamic-Static Parameter Optimization for Autonomous Driving Testing | Dmytro Humeniuk et.al. | 2603.21427 | translate | read | null |
| 2026-03-22 | PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost | Junkeun Yi et.al. | 2603.21383 | translate | read | null |
| 2026-03-22 | A transformer architecture alteration to incentivise externalised reasoning | Elizabeth Pavlova et.al. | 2603.21376 | translate | read | null |
| 2026-03-22 | RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models | Dongyoung Kim et.al. | 2603.21341 | translate | read | null |
| 2026-03-22 | FinRL-X: An AI-Native Modular Infrastructure for Quantitative Trading | Hongyang Yang et.al. | 2603.21330 | translate | read | null |
| 2026-03-22 | DeepXplain: XAI-Guided Autonomous Defense Against Multi-Stage APT Campaigns | Trung V. Phan et.al. | 2603.21296 | translate | read | null |
| 2026-03-22 | Prompt replay: speeding up GRPO with on-policy reuse of high-signal prompts | Andrei Baroian et.al. | 2603.21177 | translate | read | null |
| 2026-03-22 | Reward Sharpness-Aware Fine-Tuning for Diffusion Models | Kwanyoung Kim et.al. | 2603.21175 | translate | read | null |
| 2026-03-22 | Rethinking Plasticity in Deep Reinforcement Learning | Zhiqiang He et.al. | 2603.21173 | translate | read | null |
| 2026-03-22 | Revisiting Tree Search for LLMs: Gumbel and Sequential Halving for Budget-Scalable Reasoning | Leonid Ugadiarov et.al. | 2603.21162 | translate | read | null |
| 2026-03-22 | Incentivizing Generative Zero-Shot Learning via Outcome-Reward Reinforcement Learning with Visual Cues | Wenjin Hou et.al. | 2603.21138 | translate | read | null |
| 2026-03-22 | Anatomical Prior-Driven Framework for Autonomous Robotic Cardiac Ultrasound Standard View Acquisition | Zhiyan Cao et.al. | 2603.21134 | translate | read | null |
| 2026-03-22 | VisFly-Lab: Unified Differentiable Framework for First-Order Reinforcement Learning of Quadrotor Control | Fanxing Li et.al. | 2603.21123 | translate | read | null |
| 2026-03-22 | Learning to Optimize Joint Source and RIS-assisted Channel Encoding for Multi-User Semantic Communication Systems | Haidong Wang et.al. | 2603.21097 | translate | read | null |
| 2026-03-22 | DRL-driven Online Optimization for Joint Traffic Reshaping and Channel Reconfiguration in RIS-assisted Semantic NOMA Communications | Songhan Zhao et.al. | 2603.21093 | translate | read | null |
| 2026-03-22 | LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning | Jianing Wang et.al. | 2603.21065 | translate | read | null |
| 2026-03-22 | OrbitStream: Training-Free Adaptive 360-degree Video Streaming via Semantic Potential Fields | Aizierjiang Aiersilan et.al. | 2603.20999 | translate | read | null |
| 2026-03-22 | The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes | Benedikt Hornig et.al. | 2603.20994 | translate | read | null |
| 2026-03-21 | Cyber Deception for Mission Surveillance via Hypergame-Theoretic Deep Reinforcement Learning | Zelin Wan et.al. | 2603.20981 | translate | read | null |
| 2026-03-21 | Deep Adaptive Rate Allocation in Volatile Heterogeneous Wireless Networks | Gregorio Maglione et.al. | 2603.20926 | translate | read | null |
| 2026-03-21 | EruDiff: Refactoring Knowledge in Diffusion Models for Advanced Text-to-Image Synthesis | Xiefan Guo et.al. | 2603.20828 | translate | read | null |
| 2026-03-21 | RLVR Training of LLMs Does Not Improve Thinking Ability for General QA: Evaluation Method and a Simple Solution | Kaiyuan Li et.al. | 2603.20799 | translate | read | null |
| 2026-03-21 | Enhanced Direction-Sensing Methods and Performance Analysis in Low-Altitude Wireless Network via a Rotation Antenna Array | Jinbing Jiang et.al. | 2603.20784 | translate | read | null |
| 2026-03-21 | Decoupling Numerical and Structural Parameters: An Empirical Study on Adaptive Genetic Algorithms via Deep Reinforcement Learning for the Large-Scale TSP | Hongyu Wang et.al. | 2603.20702 | translate | read | null |
| 2026-03-21 | Clinical Cognition Alignment for Gastrointestinal Diagnosis with Multimodal LLMs | Huan Zheng et.al. | 2603.20698 | translate | read | null |
| 2026-03-21 | AI-Driven Multi-Agent Simulation of Stratified Polyamory Systems: A Computational Framework for Optimizing Social Reproductive Efficiency | Yicai Xing et.al. | 2603.20678 | translate | read | null |
| 2026-03-21 | Speedup Patch: Learning a Plug-and-Play Policy to Accelerate Embodied Manipulation | Zhichao Wu et.al. | 2603.20658 | translate | read | null |
| 2026-03-21 | Hierarchical Reinforcement Learning for Next Generation of Multi-AP Coordinated Spatial Reuse | Ziru Chen et.al. | 2603.20647 | translate | read | null |
| 2026-03-21 | Reinforcement Learning-Based Secure Near-field Directional Modulation Enhanced by Rotatable RIS | Yongqiang Li et.al. | 2603.20608 | translate | read | null |
| 2026-03-21 | Towards Practical World Model-based Reinforcement Learning for Vision-Language-Action Models | Zhilong Zhang et.al. | 2603.20607 | translate | read | null |
| 2026-03-21 | Current state of the multi-agent multi-view experimental and digital twin rendezvous (MMEDR-Autonomous) framework | Logan Banker et.al. | 2603.20575 | translate | read | null |
| 2026-03-20 | Delightful Distributed Policy Gradient | Ian Osband et.al. | 2603.20521 | translate | read | null |
| 2026-03-20 | Grounded Chess Reasoning in Language Models via Master Distillation | Zhenwei Tang et.al. | 2603.20510 | translate | read | null |
| 2026-03-20 | Fluid Antenna Networks Beyond Beamforming: An AI-Native Control Paradigm for 6G | Ian F. Akyildiz et.al. | 2603.20484 | translate | read | null |
| 2026-03-20 | Reinforcement Learning from Multi-Source Imperfect Preferences: Best-of-Both-Regimes Regret | Ming Shi et.al. | 2603.20453 | translate | read | null |
| 2026-03-20 | SymCircuit: Bayesian Structure Inference for Tractable Probabilistic Circuits via Entropy-Regularized Reinforcement Learning | Y. Sungtaek Ju et.al. | 2603.20392 | translate | read | null |
| 2026-03-20 | CAMA: Exploring Collusive Adversarial Attacks in c-MARL | Men Niu et.al. | 2603.20390 | translate | read | null |
| 2026-03-20 | Leum-VL Technical Report | Yuxuan He et.al. | 2603.20354 | translate | read | null |
| 2026-03-20 | Bounded Coupled AI Learning Dynamics in Tri-Hierarchical Drone Swarms | Oleksii Bychkov et.al. | 2603.20333 | translate | read | null |
| 2026-03-19 | MARLIN: Multi-Agent Reinforcement Learning for Incremental DAG Discovery | Dong Li et.al. | 2603.20295 | translate | read | null |
| 2026-03-17 | Learning Communication Between Heterogeneous Agents in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence | Alex Popa et.al. | 2603.20279 | translate | read | null |
| 2026-03-20 | AGILE: A Comprehensive Workflow for Humanoid Loco-Manipulation Learning | Huihua Zhao et.al. | 2603.20147 | translate | read | null |
| 2026-03-20 | Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning | Jiajie Li et.al. | 2603.20116 | translate | read | null |
| 2026-03-20 | Fine-tuning Timeseries Predictors Using Reinforcement Learning | Hugo Cazaux et.al. | 2603.20063 | translate | read | null |
| 2026-03-20 | Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs | Wenjian Zhang et.al. | 2603.20046 | translate | read | null |
| 2026-03-20 | ReViSQL: Achieving Human-Level Text-to-SQL | Yuxuan Zhu et.al. | 2603.20004 | translate | read | null |
| 2026-03-20 | Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States | Yurun Yuan et.al. | 2603.19987 | translate | read | null |
| 2026-03-20 | Interpreting Reinforcement Learning Model Behavior via Koopman with Control | William T. Redman et.al. | 2603.19968 | translate | read | null |
| 2026-03-20 | GustPilot: A Hierarchical DRL-INDI Framework for Wind-Resilient Quadrotor Navigation | Amir Atef Habel et.al. | 2603.19966 | translate | read | null |
| 2026-03-20 | SAGE: Sustainable Agent-Guided Expert-tuning for Culturally Attuned Translation in Low-Resource Southeast Asia | Zhixiang Lu et.al. | 2603.19931 | translate | read | null |
| 2026-03-20 | Robust Beam Codebooks for mmWave/THz Systems: Toward a Stochastic RL Approach | Anouar Nechi et.al. | 2603.19930 | translate | read | null |
| 2026-03-20 | Learning Adaptive Parameter Policies for Nonlinear Bayesian Filtering | Ondrej Straka et.al. | 2603.19910 | translate | read | null |
| 2026-03-20 | What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time | Dong Yan et.al. | 2603.19880 | translate | read | null |
| 2026-03-20 | NASimJax: GPU-Accelerated Policy Learning Framework for Penetration Testing | Raphael Simon et.al. | 2603.19864 | translate | read | null |
| 2026-03-20 | FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization | Chiyu Ma et.al. | 2603.19835 | translate | read | null |
| 2026-03-20 | Generalized Task-Driven Design of Soft Robots via Reduced-Order FEM-based Surrogate Modeling | Yao Yao et.al. | 2603.19794 | translate | read | null |
| 2026-03-20 | FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment | Kewen Zhu et.al. | 2603.19741 | translate | read | null |
| 2026-03-20 | LoopRPT: Reinforcement Pre-Training for Looped Language Models | Guo Tang et.al. | 2603.19714 | translate | read | null |
| 2026-03-20 | A Subgoal-driven Framework for Improving Long-Horizon LLM Agents | Taiyi Wang et.al. | 2603.19685 | translate | read | null |
| 2026-03-20 | Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis | Siddharth Chandak et.al. | 2603.19648 | translate | read | null |
| 2026-03-20 | ContractionPPO: Certified Reinforcement Learning via Differentiable Contraction Layers | Vrushabh Zinage et.al. | 2603.19632 | translate | read | null |
| 2026-03-20 | DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management | Yaqi Xie et.al. | 2603.19621 | translate | read | null |
| 2026-03-20 | SaFRO: Satisfaction-Aware Fusion via Dual-Relative Policy Optimization for Short-Video Search | Renzhe Zhou et.al. | 2603.19585 | translate | read | null |
| 2026-03-20 | PA2D-MORL: Pareto Ascent Directional Decomposition based Multi-Objective Reinforcement Learning | Tianmeng Hu et.al. | 2603.19579 | translate | read | null |
| 2026-03-20 | Learning to Bet for Horizon-Aware Anytime-Valid Testing | Ege Onur Taga et.al. | 2603.19551 | translate | read | null |
| 2026-03-20 | EvidenceRL: Reinforcing Evidence Consistency for Trustworthy Language Models | J. Ben Tamo et.al. | 2603.19532 | translate | read | null |
| 2026-03-19 | Stochastic Sequential Decision Making over Expanding Networks with Graph Filtering | Zhan Gao et.al. | 2603.19501 | translate | read | null |
| 2026-03-19 | Teaching an Agent to Sketch One Part at a Time | Xiaodan Du et.al. | 2603.19500 | translate | read | null |
| 2026-03-19 | Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids | Lucas Ferraz et.al. | 2603.19473 | translate | read | null |
| 2026-03-19 | ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models | Thomas De Min et.al. | 2603.19466 | translate | read | null |
| 2026-03-19 | Deep Hilbert–Galerkin Methods for Infinite-Dimensional PDEs and Optimal Control | Samuel N. Cohen et.al. | 2603.19463 | translate | read | null |
| 2026-03-19 | Cooperation and Exploitation in LLM Policy Synthesis for Sequential Social Dilemmas | Víctor Gallego et.al. | 2603.19453 | translate | read | null |
| 2026-03-19 | Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning | Xueqiao Peng et.al. | 2603.19397 | translate | read | null |
| 2026-03-18 | Goedel-Code-Prover: Hierarchical Proof Search for Open State-of-the-Art Code Verification | Zenan Li et.al. | 2603.19329 | translate | read | null |
| 2026-03-19 | OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards | Zehao Li et.al. | 2603.19191 | translate | read | null |
| 2026-03-19 | Markov Potential Game and Multi-Agent Reinforcement Learning for Autonomous Driving | Huiwen Yan et.al. | 2603.19188 | translate | read | null |
| 2026-03-19 | Box Maze: A Process-Control Architecture for Reliable LLM Reasoning | Zou Qiang et.al. | 2603.19182 | translate | read | null |
| 2026-03-19 | VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models | Chonghan Liu et.al. | 2603.19152 | translate | read | null |
| 2026-03-19 | Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control | Mohammad Al Ridhawi et.al. | 2603.19136 | translate | read | null |
| 2026-03-19 | Variational and Annealing-Based Approaches to Quantum Combinatorial Optimization | Hala Hawashin et.al. | 2603.19117 | translate | read | null |
| 2026-03-19 | Articulated-Body Dynamics Network: Dynamics-Grounded Prior for Robot Learning | Sangwoo Shin et.al. | 2603.19078 | translate | read | null |
| 2026-03-19 | MoRI: Learning Motivation-Grounded Reasoning for Scientific Ideation in Large Language Models | Chenyang Gu et.al. | 2603.19044 | translate | read | null |
| 2026-03-19 | CRAFT: Aligning Diffusion Models with Fine-Tuning Is Easier Than You Think | Zening Sun et.al. | 2603.18991 | translate | read | null |
| 2026-03-19 | Maximum-Entropy Exploration with Future State-Action Visitation Measures | Adrien Bolland et.al. | 2603.18965 | translate | read | null |
| 2026-03-19 | Context Bootstrapped Reinforcement Learning | Saaket Agashe et.al. | 2603.18953 | translate | read | null |
| 2026-03-19 | Safety-Guaranteed Imitation Learning from Nonlinear Model Predictive Control for Spacecraft Close Proximity Operations | Alexander Meinert et.al. | 2603.18910 | translate | read | null |
| 2026-03-19 | MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model | Youngwan Lee et.al. | 2603.18892 | translate | read | null |
| 2026-03-19 | Bridging Network Fragmentation: A Semantic-Augmented DRL Framework for UAV-aided VANETs | Gaoxiang Cao et.al. | 2603.18871 | translate | read | null |
| 2026-03-19 | RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models | Xiao Feng et.al. | 2603.18859 | translate | read | null |
| 2026-03-19 | Learn for Variation: Variationally Guided AAV Trajectory Learning in Differentiable Environments | Xiucheng Wang et.al. | 2603.18853 | translate | read | null |
| 2026-03-19 | ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents | Hao Zhang et.al. | 2603.18815 | translate | read | null |
| 2026-03-19 | V-Dreamer: Automating Robotic Simulation and Trajectory Synthesis via Video Generation Priors | Songjia He et.al. | 2603.18811 | translate | read | null |
| 2026-03-19 | Mi:dm K 2.5 Pro | KT Tech innovation Group et.al. | 2603.18788 | translate | read | null |
| 2026-03-19 | ViTac-Tracing: Visual-Tactile Imitation Learning of Deformable Object Tracing | Yongqiang Zhao et.al. | 2603.18784 | translate | read | null |
| 2026-03-19 | Automatic Configuration of LLM Post-Training Pipelines | Channe Chwa et.al. | 2603.18773 | translate | read | null |
| 2026-03-19 | Memento-Skills: Let Agents Design Agents | Huichi Zhou et.al. | 2603.18743 | translate | read | null |
| 2026-03-19 | CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks | Hao Wang et.al. | 2603.18736 | translate | read | null |
| 2026-03-19 | HISR: Hindsight Information Modulated Segmental Process Rewards For Multi-turn Agentic Reinforcement Learning | Zhicong Lu et.al. | 2603.18683 | translate | read | null |
| 2026-03-19 | Thinking with Constructions: A Benchmark and Policy Optimization for Visual-Text Interleaved Geometric Reasoning | Haokun Zhao et.al. | 2603.18662 | translate | read | null |
| 2026-03-19 | Balanced Thinking: Improving Chain of Thought Training in Vision Language Models | Shaked Perek et.al. | 2603.18656 | translate | read | null |
| 2026-03-19 | Learning to Self-Evolve | Xiaoyin Chen et.al. | 2603.18620 | translate | read | null |
| 2026-03-19 | iSatCR: Graph-Empowered Joint Onboard Computing and Routing for LEO Data Delivery | Jiangtao Luo et.al. | 2603.18539 | translate | read | null |
| 2026-03-19 | Balancing the Reasoning Load: Difficulty-Differentiated Policy Optimization with Length Redistribution for Efficient and Robust Reinforcement Learning | Yinan Xia et.al. | 2603.18533 | translate | read | null |
| 2026-03-19 | Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds | Andrew Choi et.al. | 2603.18532 | translate | read | null |
| 2026-03-19 | AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models | Chengxuan Lu et.al. | 2603.18464 | translate | read | null |
| 2026-03-19 | Discounted Beta–Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards | Haechan Kim et.al. | 2603.18444 | translate | read | null |
| 2026-03-19 | Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation | Asmita Bhardwaj et.al. | 2603.18428 | translate | read | null |
| 2026-03-19 | Efficient and Versatile Quadrupedal Skating: Optimal Co-design via Reinforcement Learning and Bayesian Optimization | Hanwen Wang et.al. | 2603.18408 | translate | read | null |
| 2026-03-19 | RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach | Yifan Zhang et.al. | 2603.18396 | translate | read | null |
| 2026-03-19 | Mathematical Foundations of Deep Learning | Xiaojing Ye et.al. | 2603.18387 | translate | read | null |
| 2026-03-19 | PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching | Ruishuo Chen et.al. | 2603.18363 | translate | read | null |
| 2026-03-18 | Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration | Amirhossein Roknilamouki et.al. | 2603.18326 | translate | read | null |
| 2026-03-18 | Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum | Nived Rajaraman et.al. | 2603.18325 | translate | read | null |
| 2026-03-18 | DriveVLM-RL: Neuroscience-Inspired Reinforcement Learning with Vision-Language Models for Safe and Deployable Autonomous Driving | Zilin Huang et.al. | 2603.18315 | translate | read | null |
| 2026-03-18 | Approximate Subgraph Matching with Neural Graph Representations and Reinforcement Learning | Kaiyang Li et.al. | 2603.18314 | translate | read | null |
| 2026-03-18 | Discovering What You Can Control: Interventional Boundary Discovery for Reinforcement Learning | Jiaxin Liu et.al. | 2603.18257 | translate | read | null |
| 2026-03-18 | MolRGen: A Training and Evaluation Setting for De Novo Molecular Generation with Reasonning Models | Philippe Formont et.al. | 2603.18256 | translate | read | null |
| 2026-03-18 | How Psychological Learning Paradigms Shaped and Constrained Artificial Intelligence | Alex Anvi Eponon et.al. | 2603.18203 | translate | read | null |
| 2026-03-18 | R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation | Naoki Morihira et.al. | 2603.18202 | translate | read | null |
| 2026-03-18 | Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models | Yuhao Dong et.al. | 2603.18118 | translate | read | null |
| 2026-03-18 | BoundAD: Boundary-Aware Negative Generation for Time Series Anomaly Detection | Xiancheng Wang et.al. | 2603.18111 | translate | read | null |
| 2026-03-18 | Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner | Hao Ma et.al. | 2603.18088 | translate | read | null |
| 2026-03-18 | Uncovering Latent Phase Structures and Branching Logic in Locomotion Policies: A Case Study on HalfCheetah | Daisuke Yasui et.al. | 2603.18084 | translate | read | null |
| 2026-03-18 | SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training | Prince Zizhuang Wang et.al. | 2603.18079 | translate | read | null |
| 2026-03-18 | Lightweight Adaptation for LLM-based Technical Service Agent: Latent Logic Augmentation and Robust Noise Reduction | Yi Yu et.al. | 2603.18074 | translate | read | null |
| 2026-03-18 | Reinforcement Learning for Fast and Robust Longitudinal Qubit Readout | Yiming Yu et.al. | 2603.18060 | translate | read | null |
| 2026-03-18 | Unified Policy Value Decomposition for Rapid Adaptation | Cristiano Capone et.al. | 2603.17947 | translate | read | null |
| 2026-03-18 | Training Diffusion Language Models for Black-Box Optimization | Zipeng Sun et.al. | 2603.17919 | translate | read | null |
| 2026-03-18 | Operator-Theoretic Foundations and Policy Gradient Methods for General MDPs with Unbounded Costs | Abhishek Gupta et.al. | 2603.17875 | translate | read | null |
| 2026-03-18 | Procedural Generation of Algorithm Discovery Tasks in Machine Learning | Alexander D. Goldie et.al. | 2603.17863 | translate | read | null |
| 2026-03-18 | Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control | Zunzhe Zhang et.al. | 2603.17834 | translate | read | null |
| 2026-03-18 | CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents | Lintang Sutawika et.al. | 2603.17829 | translate | read | null |
| 2026-03-18 | Federated Distributional Reinforcement Learning with Distributional Critic Regularization | David Millard et.al. | 2603.17820 | translate | read | null |
| 2026-03-18 | EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards | Ruixiang Wang et.al. | 2603.17808 | translate | read | null |
| 2026-03-18 | CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution | Teng Pan et.al. | 2603.17775 | translate | read | null |
| 2026-03-18 | Fast stabilizer state preparation via AI-optimized graph decimation | Michael Doherty et.al. | 2603.17743 | translate | read | null |
| 2026-03-18 | VolumeDP: Modeling Volumetric Representation for Manipulation Policy Learning | Tianxing Zhou et.al. | 2603.17720 | translate | read | null |
| 2026-03-18 | Machine Learning for Network Attacks Classification and Statistical Evaluation of Machine Learning for Network Attacks Classification and Adversarial Learning Methodologies for Synthetic Data Generation | Iakovos-Christos Zarkadis et.al. | 2603.17717 | translate | read | null |
| 2026-03-18 | Flow Matching Policy with Entropy Regularization | Ting Gao et.al. | 2603.17685 | translate | read | null |
| 2026-03-18 | Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards | Philipp Normann et.al. | 2603.17673 | translate | read | null |
| 2026-03-18 | Benchmarking Reinforcement Learning via Stochastic Converse Optimality: Generating Systems with Known Optimal Policies | Sinan Ibrahim et.al. | 2603.17631 | translate | read | null |
| 2026-03-18 | Complementary Reinforcement Learning | Dilxat Muhtar et.al. | 2603.17621 | translate | read | null |
| 2026-03-18 | From Isolated Scoring to Collaborative Ranking: A Comparison-Native Framework for LLM-Based Paper Evaluation | Pujun Zheng et.al. | 2603.17588 | translate | read | null |
| 2026-03-18 | Interpreting Context-Aware Human Preferences for Multi-Objective Robot Navigation | Tharun Sethuraman et.al. | 2603.17510 | translate | read | null |
| 2026-03-18 | Efficient Soft Actor-Critic with LLM-Based Action-Level Guidance for Continuous Control | Hao Ma et.al. | 2603.17468 | translate | read | null |
| 2026-03-18 | AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization | Dailan He et.al. | 2603.17461 | translate | read | null |
| 2026-03-18 | CRE-T1 Preview Technical Report: Beyond Contrastive Learning for Reasoning-Intensive Retrieval | Guangzhi Wang et.al. | 2603.17387 | translate | read | null |
| 2026-03-18 | Efficient Exploration at Scale | Seyed Mohammad Asghari et.al. | 2603.17378 | translate | read | null |
| 2026-03-18 | EvoGuard: An Extensible Agentic RL-based Framework for Practical and Evolving AI-Generated Image Detection | Chenyang Zhu et.al. | 2603.17343 | translate | read | null |
| 2026-03-18 | A Progressive Visual-Logic-Aligned Framework for Ride-Hailing Adjudication | Weiming Wu et.al. | 2603.17328 | translate | read | null |
| 2026-03-18 | ShuttleEnv: An Interactive Data-Driven RL Environment for Badminton Strategy Modeling | Ang Li et.al. | 2603.17324 | translate | read | null |
| 2026-03-18 | Physics-informed offline reinforcement learning eliminates catastrophic fuel waste in maritime routing | Aniruddha Bora et.al. | 2603.17319 | translate | read | null |
| 2026-03-18 | Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress | Yuelin Zhang et.al. | 2603.17312 | translate | read | null |
| 2026-03-18 | Ruyi2.5 Technical Report | Huan Song et.al. | 2603.17311 | translate | read | null |
| 2026-03-18 | InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning | Chengwei Wei et.al. | 2603.17310 | translate | read | null |
| 2026-03-18 | ReLMXEL: Adaptive RL-Based Memory Controller with Explainable Energy and Latency Optimization | Panuganti Chirag Sai et.al. | 2603.17309 | translate | read | null |
| 2026-03-18 | Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations | Haozheng Luo et.al. | 2603.17305 | translate | read | null |
| 2026-03-18 | WINFlowNets: Warm-up Integrated Networks Training of Generative Flow Networks for Robotics and Machine Fault Adaptation | Zahin Sufiyan et.al. | 2603.17301 | translate | read | null |
| 2026-03-18 | Network and Device Level Cyber Deception for Contested Environments Using RL and LLMs | Abhijeet Sahu et.al. | 2603.17272 | translate | read | null |
| 2026-03-18 | Adaptive Anchor Policies for Efficient 4D Gaussian Streaming | Ashim Dahal et.al. | 2603.17227 | translate | read | null |
| 2026-03-17 | MetaClaw: Just Talk – An Agent That Meta-Learns and Evolves in the Wild | Peng Xia et.al. | 2603.17187 | translate | read | null |
| 2026-03-17 | Shielded Reinforcement Learning Under Dynamic Temporal Logic Constraints | Sadık Bera Yüksel et.al. | 2603.17152 | translate | read | null |
| 2026-03-17 | REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge | Yasi Zhang et.al. | 2603.17145 | translate | read | null |
| 2026-03-17 | SLowRL: Safe Low-Rank Adaptation Reinforcement Learning for Locomotion | Elham Daneshmand et.al. | 2603.17092 | translate | read | null |
| 2026-03-17 | CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning | Weikun K. Zhang et.al. | 2603.17075 | translate | read | null |
| 2026-03-17 | PaAgent: Portrait-Aware Image Restoration Agent via Subjective-Objective Reinforcement Learning | Yijian Wang et.al. | 2603.17055 | translate | read | null |
| 2026-03-17 | Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models | Songchun Zhang et.al. | 2603.17051 | translate | read | null |
| 2026-03-17 | HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning | Shenzhi Wang et.al. | 2603.17024 | translate | read | null |
| 2026-03-17 | Efficient and Reliable Teleoperation through Real-to-Sim-to-Real Shared Autonomy | Shuo Sha et.al. | 2603.17016 | translate | read | null |
| 2026-03-17 | Rewarding DINO: Predicting Dense Rewards with Vision Foundation Models | Pierre Krack et.al. | 2603.16978 | translate | read | null |
| 2026-03-17 | DeepStage: Learning Autonomous Defense Policies Against Multi-Stage APT Campaigns | Trung V. Phan et.al. | 2603.16969 | translate | read | null |
| 2026-03-17 | Efficient Reasoning on the Edge | Yelysei Bondarenko et.al. | 2603.16867 | translate | read | null |
| 2026-03-17 | DreamPlan: Efficient Reinforcement Fine-Tuning of Vision-Language Planners via Video World Models | Emily Yue-Ting Jia et.al. | 2603.16860 | translate | read | null |
| 2026-03-17 | Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning | Jello Zhou et.al. | 2603.16842 | translate | read | null |
| 2026-03-17 | Learning to Present: Inverse Specification Rewards for Agentic Slide Generation | Karthik Ragunath Ananda Kumar et.al. | 2603.16839 | translate | read | null |
| 2026-03-17 | Deep Reinforcement Learning-driven Edge Offloading for Latency-constrained XR pipelines | Sourya Saha et.al. | 2603.16823 | translate | read | null |
| 2026-03-17 | Anticipatory Planning for Multimodal AI Agents | Yongyuan Liang et.al. | 2603.16777 | translate | read | null |
| 2026-03-16 | GDPO-SR: Group Direct Preference Optimization for One-Step Generative Image Super-Resolution | Qiaosi Yi et.al. | 2603.16769 | translate | read | null |
| 2026-03-17 | Learning Whole-Body Control for a Salamander Robot | Mengze Tian et.al. | 2603.16683 | translate | read | null |
| 2026-03-17 | When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making | Jun Liu et.al. | 2603.16673 | translate | read | null |
| 2026-03-17 | What if Pinocchio Were a Reinforcement Learning Agent: A Normative End-to-End Pipeline | Benoît Alcaraz et.al. | 2603.16651 | translate | read | null |
| 2026-03-17 | Rationale Matters: Learning Transferable Rubrics via Proxy-Guided Critique for VLM Reward Models | Weijie Qiu et.al. | 2603.16600 | translate | read | null |
| 2026-03-17 | When and Why Does Unsupervised RL Succeed in Mathematical Reasoning? A Manifold Envelopment Perspective | Zelin Zhang et.al. | 2603.16578 | translate | read | null |
| 2026-03-17 | EmoLLM: Appraisal-Grounded Cognitive-Emotional Co-Reasoning in Large Language Models | Yifei Zhang et.al. | 2603.16553 | translate | read | null |
| 2026-03-17 | Kamino: GPU-based Massively Parallel Simulation of Multi-Body Systems with Challenging Topologies | Vassilios Tsounis et.al. | 2603.16536 | translate | read | null |
| 2026-03-17 | From the Inside Out: Progressive Distribution Refinement for Confidence Calibration | Xizhong Yang et.al. | 2603.16500 | translate | read | null |
| 2026-03-17 | Multi-Agent Reinforcement Learning Counteracts Delayed CSI in Multi-Satellite Systems | Marios Aristodemou et.al. | 2603.16470 | translate | read | null |
| 2026-03-17 | Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition | Yu Liu et.al. | 2603.16463 | translate | read | null |
| 2026-03-17 | Agentic AI for SAGIN Resource Management: Semantic Awareness, Orchestration, and Optimization | Linghao Zhang et.al. | 2603.16458 | translate | read | null |
| 2026-03-17 | TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas | Ai Jian et.al. | 2603.16448 | translate | read | null |
| 2026-03-17 | Via Negativa for AI Alignment: Why Negative Constraints Are Structurally Superior to Positive Preferences | Quan Cheng et.al. | 2603.16417 | translate | read | null |
| 2026-03-17 | Onboard MuJoCo-based Model Predictive Control for Shipboard Crane with Double-Pendulum Sway Suppression | Oscar Pang et.al. | 2603.16407 | translate | read | null |
| 2026-03-17 | Deep Reinforcement Learning-Assisted Automated Operator Portfolio for Constrained Multi-objective Optimization | Shuai Shao et.al. | 2603.16401 | translate | read | null |
| 2026-03-17 | Controlling Fish Schools via Reinforcement Learning of Virtual Fish Movement | Yusuke Nishii et.al. | 2603.16384 | translate | read | null |
| 2026-03-17 | Agile Interception of a Flying Target using Competitive Reinforcement Learning | Timothée Gavin et.al. | 2603.16279 | translate | read | null |
| 2026-03-17 | VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment | Tengjiao Yin et.al. | 2603.16271 | translate | read | null |
| 2026-03-17 | Dual Consensus: Escaping from Spurious Majority in Unsupervised RLVR via Two-Stage Vote Mechanism | Kaixuan Du et.al. | 2603.16223 | translate | read | null |
| 2026-03-17 | Offline Exploration-Aware Fine-Tuning for Long-Chain Mathematical Reasoning | Yongyu Mu et.al. | 2603.16206 | translate | read | null |
| 2026-03-17 | Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning | Haomin Wang et.al. | 2603.16189 | translate | read | null |
| 2026-03-17 | ECHO: Edge-Cloud Humanoid Orchestration for Language-to-Motion Control | Haozhe Jia et.al. | 2603.16188 | translate | read | null |
| 2026-03-17 | Task-Specified Compliance Bounds for Humanoids via Lipschitz-Constrained Policies | Zewen He et.al. | 2603.16180 | translate | read | null |
| 2026-03-17 | SQL-ASTRA: Alleviating Sparse Feedback in Agentic SQL via Column-Set Matching and Trajectory Aggregation | Long Li et.al. | 2603.16161 | translate | read | null |
| 2026-03-17 | Execution-Grounded Credit Assignment for GRPO in Code Generation | Abhijit Kumar et.al. | 2603.16158 | translate | read | null |
| 2026-03-17 | DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay | Long Li et.al. | 2603.16157 | translate | read | null |
| 2026-03-17 | HIPO: Instruction Hierarchy via Constrained Reinforcement Learning | Keru Chen et.al. | 2603.16152 | translate | read | null |
| 2026-03-17 | Communication-Aware Multi-Agent Reinforcement Learning for Decentralized Cooperative UAV Deployment | Enguang Fan et.al. | 2603.16141 | translate | read | null |
| 2026-03-17 | Noisy Data is Destructive to Reinforcement Learning with Verifiable Rewards | Yuxuan Zhu et.al. | 2603.16140 | translate | read | null |
| 2026-03-17 | SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding | Songcheng Cai et.al. | 2603.16124 | translate | read | null |
| 2026-03-17 | Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models | Yanru Wu et.al. | 2603.16065 | translate | read | null |
| 2026-03-17 | ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning | Yu Li et.al. | 2603.16060 | translate | read | null |
| 2026-03-17 | Collaborative Temporal Feature Generation via Critic-Free Reinforcement Learning for Cross-User Sensor-Based Activity Recognition | Xiaozhou Ye et.al. | 2603.16043 | translate | read | null |
| 2026-03-16 | Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning | Jingxiang Chen et.al. | 2603.15981 | translate | read | null |
| 2026-03-16 | ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors | Zifan Xu et.al. | 2603.15956 | translate | read | null |
| 2026-03-16 | Game-Theory-Assisted Reinforcement Learning for Border Defense: Early Termination based on Analytical Solutions | Goutam Das et.al. | 2603.15907 | translate | read | null |
| 2026-03-16 | Counteractive RL: Rethinking Core Principles for Efficient and Scalable Deep Reinforcement Learning | Ezgi Korkmaz et.al. | 2603.15871 | translate | read | null |
| 2026-03-16 | Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning | Patrick Yin et.al. | 2603.15789 | translate | read | null |
| 2026-03-16 | CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving | Yihong Guo et.al. | 2603.15771 | translate | read | null |
| 2026-03-16 | Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation | Jacob Levy et.al. | 2603.15759 | translate | read | null |
| 2026-03-16 | Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models | Lit Sin Tan et.al. | 2603.15724 | translate | read | null |
| 2026-03-16 | BadLLM-TG: A Backdoor Defender powered by LLM Trigger Generator | Ruyi Zhang et.al. | 2603.15692 | translate | read | null |
| 2026-03-16 | GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering | Xincheng Shuai et.al. | 2603.15616 | translate | read | null |
| 2026-03-16 | HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions | Yukang Cao et.al. | 2603.15612 | translate | read | null |
| 2026-03-16 | Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning | Aozhe Wang et.al. | 2603.15611 | translate | read | null |
| 2026-03-16 | From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation | Yibin Liu et.al. | 2603.15600 | translate | read | null |
| 2026-03-16 | Unbiased and Biased Variance-Reduced Forward-Reflected-Backward Splitting Methods for Stochastic Composite Inclusions | Quoc Tran-Dinh et.al. | 2603.15576 | translate | read | null |
| 2026-03-16 | Deep Reinforcement Learning for Fano Hypersurfaces | Marc Truter et.al. | 2603.15437 | translate | read | null |
| 2026-03-16 | Listening to the Echo: User-Reaction Aware Policy Optimization via Scalar-Verbal Hybrid Reinforcement Learning | Jing Ye et.al. | 2603.15434 | translate | read | null |
| 2026-03-16 | Gym-V: A Unified Vision Environment System for Agentic Vision Research | Fanqing Meng et.al. | 2603.15432 | translate | read | null |
| 2026-03-16 | MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings | Shahil Shaik et.al. | 2603.15418 | translate | read | null |
| 2026-03-16 | Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities | Vanshaj Khattar et.al. | 2603.15417 | translate | read | null |
| 2026-03-16 | Fusian: Multi-LoRA Fusion for Fine-Grained Continuous MBTI Personality Control in Large Language Models | Zehao Chen et.al. | 2603.15405 | translate | read | null |
| 2026-03-16 | Trajectory-Diversity-Driven Robust Vision-and-Language Navigation | Jiangyang Li et.al. | 2603.15370 | translate | read | null |
| 2026-03-16 | NavThinker: Action-Conditioned World Models for Coupled Prediction and Planning in Social Navigation | Tianshuai Hu et.al. | 2603.15359 | translate | read | null |
| 2026-03-16 | Evaluating the Robustness of Reinforcement Learning based Adaptive Traffic Signal Control | Dickens Kwesiga et.al. | 2603.15283 | translate | read | null |
| 2026-03-16 | MoE-ACT: Scaling Multi-Task Bimanual Manipulation with Sparse Language-Conditioned Mixture-of-Experts Transformers | Kangjun Guo et.al. | 2603.15265 | translate | read | null |
| 2026-03-16 | Probe-then-Plan: Environment-Aware Planning for Industrial E-commerce Search | Mengxiang Chen et.al. | 2603.15262 | translate | read | null |
| 2026-03-16 | SAGE: Multi-Agent Self-Evolution for LLM Reasoning | Yulin Peng et.al. | 2603.15255 | translate | read | null |
| 2026-03-16 | Towards Foundation Models for Consensus Rank Aggregation | Yijun Jin et.al. | 2603.15218 | translate | read | null |
| 2026-03-16 | What Matters for Scalable and Robust Learning in End-to-End Driving Planners? | David Holtz et.al. | 2603.15185 | translate | read | null |
| 2026-03-16 | Iterative Learning Control-Informed Reinforcement Learning for Batch Process Control | Runze Lin et.al. | 2603.15180 | translate | read | null |
| 2026-03-16 | KiRAS: Keyframe Guided Self-Imitation for Robust and Adaptive Skill Learning in Quadruped Robots | Xiaoyi Wei et.al. | 2603.15179 | translate | read | null |
| 2026-03-16 | Multi-Scale Control of Large Agent Populations: From Density Dynamics to Individual Actuation | Mario di Bernardo et.al. | 2603.15160 | translate | read | null |
| 2026-03-16 | Master Micro Residual Correction with Adaptive Tactile Fusion and Force-Mixed Control for Contact-Rich Manipulation | Xingting Li et.al. | 2603.15152 | translate | read | null |
| 2026-03-16 | Safe Flow Q-Learning: Offline Safe Reinforcement Learning with Reachability-Based Flow Policies | Mumuksh Tayal et.al. | 2603.15136 | translate | read | null |
| 2026-03-16 | MMKU-Bench: A Multimodal Update Benchmark for Diverse Visual Knowledge | Baochen Fu et.al. | 2603.15117 | translate | read | null |
| 2026-03-16 | Sampling-guided exploration of active feature selection policies | Gabriel Bernardino et.al. | 2603.15110 | translate | read | null |
| 2026-03-16 | HALO:Closing Sim-to-Real Gap for Heavy-loaded Humanoid Agile Motion Skills via Differentiable Simulation | Xingyi Wang et.al. | 2603.15084 | translate | read | null |
| 2026-03-16 | Writer-R1: Enhancing Generative Writing in LLMs via Memory-augmented Replay Policy Optimization | Jihao Zhao et.al. | 2603.15061 | translate | read | null |
| 2026-03-16 | Interference-Aware K-Step Reachable Communication in Multi-Agent Reinforcement Learning | Ziyu Cheng et.al. | 2603.15054 | translate | read | null |
| 2026-03-16 | CycleRL: Sim-to-Real Deep Reinforcement Learning for Robust Autonomous Bicycle Control | Gelu Liu et.al. | 2603.15013 | translate | read | null |
| 2026-03-16 | Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing | Jiahe Song et.al. | 2603.15011 | translate | read | null |
| 2026-03-16 | CyCLeGen: Cycle-Consistent Layout Prediction and Image Generation in Vision Foundation Models | Xiaojun Shan et.al. | 2603.14957 | translate | read | null |
| 2026-03-16 | EditHF-1M: A Million-Scale Rich Human Preference Feedback for Image Editing | Zitong Xu et.al. | 2603.14916 | translate | read | null |
| 2026-03-16 | PerlAD: Towards Enhanced Closed-loop End-to-end Autonomous Driving with Pseudo-simulation-based Reinforcement Learning | Yinfeng Gao et.al. | 2603.14908 | translate | read | null |
| 2026-03-16 | ViSA: Visited-State Augmentation for Generalized Goal-Space Contrastive Reinforcement Learning | Issa Nakamura et.al. | 2603.14887 | translate | read | null |
| 2026-03-16 | Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning | Mikoto Kudo et.al. | 2603.14867 | translate | read | null |
| 2026-03-16 | Shopping Companion: A Memory-Augmented LLM Agent for Real-World E-Commerce Tasks | Zijian Yu et.al. | 2603.14864 | translate | read | null |
| 2026-03-16 | Ego to World: Collaborative Spatial Reasoning in Embodied Systems via Reinforcement Learning | Heng Zhou et.al. | 2603.14811 | translate | read | null |
| 2026-03-16 | DeFRiS: Silo-Cooperative IoT Applications Scheduling via Decentralized Federated Reinforcement Learning | Zhiyu Wang et.al. | 2603.14729 | translate | read | null |
| 2026-03-15 | VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting | Daeun Lee et.al. | 2603.14659 | translate | read | null |
| 2026-03-15 | EcoFair-CH-MARL: Scalable Constrained Hierarchical Multi-Agent RL with Real-Time Emission Budgets and Fairness Guarantees | Saad Alqithami et.al. | 2603.14625 | translate | read | null |
| 2026-03-15 | A Loss Landscape Visualization Framework for Interpreting Reinforcement Learning: An ADHDP Case Study | Jingyi Liu et.al. | 2603.14600 | translate | read | null |
| 2026-03-15 | Adapting Critic Match Loss Landscape Visualization to Off-policy Reinforcement Learning | Jingyi Liu et.al. | 2603.14589 | translate | read | null |
| 2026-03-15 | Machine Learning-Driven Intelligent Memory System Design: From On-Chip Caches to Storage | Rahul Bera et.al. | 2603.14583 | translate | read | null |
| 2026-03-15 | MorFiC: Fixing Value Miscalibration for Zero-Shot Quadruped Transfer | Prakhar Mishra et.al. | 2603.14554 | translate | read | null |
| 2026-03-15 | Visualizing Critic Match Loss Landscapes for Interpretation of Online Reinforcement Learning Control Algorithms | Jingyi Liu et.al. | 2603.14535 | translate | read | null |
| 2026-03-15 | VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning | Chaoyang Wang et.al. | 2603.14523 | translate | read | null |
| 2026-03-15 | AI Can Learn Scientific Taste | Jingqi Tong et.al. | 2603.14473 | translate | read | link |
| 2026-03-15 | Physics-Informed Policy Optimization via Analytic Dynamics Regularization | Namai Chandra et.al. | 2603.14469 | translate | read | null |
| 2026-03-15 | eNavi: Event-based Imitation Policies for Low-Light Indoor Mobile Robot Navigation | Prithvi Jai Ramesh et.al. | 2603.14397 | translate | read | null |
| 2026-03-15 | From $\boldsymbol{\log\pi}$ to $\boldsymbol{\pi}$: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight | Xiaoliang Fu et.al. | 2603.14389 | translate | read | null |
| 2026-03-15 | SPARQ: Spiking Early-Exit Neural Networks for Energy-Efficient Edge AI | Parth Patne et.al. | 2603.14380 | translate | read | null |
| 2026-03-15 | Exposing Long-Tail Safety Failures in Large Language Models through Efficient Diverse Response Sampling | Suvadeep Hajra et.al. | 2603.14355 | translate | read | null |
| 2026-03-15 | VIP-Loco: A Visually Guided Infinite Horizon Planning Framework for Legged Locomotion | Aditya Shirwatkar et.al. | 2603.14345 | translate | read | null |
| 2026-03-15 | AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models | Jiarui Zhang et.al. | 2603.14342 | translate | read | null |
| 2026-03-15 | Data-Driven Physics Embedded Dynamics with Predictive Control and Reinforcement Learning for Quadrupeds | Prakrut Kotecha et.al. | 2603.14333 | translate | read | null |
| 2026-03-15 | Load-Aware Locomotion Control for Humanoid Robots in Industrial Transportation Tasks | Lequn Fu et.al. | 2603.14308 | translate | read | null |
| 2026-03-15 | RL-ScanIQA: Reinforcement-Learned Scanpaths for Blind 360° Image Quality Assessment | Yujia Wang et.al. | 2603.14297 | translate | read | null |
| 2026-03-15 | MistExit: Learning to Exit for Early Mistake Detection in Procedural Videos | Sagnik Majumder et.al. | 2603.14252 | translate | read | null |
| 2026-03-15 | GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies | He Zhang et.al. | 2603.14245 | translate | read | null |
| 2026-03-15 | Understanding Strategic Platform Entry and Seller Exploration: A Stackelberg Model | Garrett Seo et.al. | 2603.14206 | translate | read | null |
| 2026-03-12 | HumDex:Humanoid Dexterous Manipulation Made Easy | Liang Heng et.al. | 2603.12260 | translate | read | null |
| 2026-03-12 | DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning | Yujie Wei et.al. | 2603.12257 | translate | read | null |
| 2026-03-12 | Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing | Baifeng Shi et.al. | 2603.12254 | translate | read | null |
| 2026-03-12 | Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation | Xiangyu Zhao et.al. | 2603.12247 | translate | read | null |
| 2026-03-12 | Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training | Yixin Liu et.al. | 2603.12246 | translate | read | null |
| 2026-03-12 | Separable neural architectures as a primitive for unified predictive and generative intelligence | Reza T. Batley et.al. | 2603.12244 | translate | read | null |
| 2026-03-12 | HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies | Amber Xie et.al. | 2603.12243 | translate | read | null |
| 2026-03-12 | Integrated Online Monitoring and Adaption of Process Model Predictive Controllers | Samuel Mallick et.al. | 2603.12187 | translate | read | null |
| 2026-03-12 | LatentGeo: Learnable Auxiliary Constructions in Latent Space for Multimodal Geometric Reasoning | Haiying Xu et.al. | 2603.12166 | translate | read | null |
| 2026-03-12 | IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL | Zhoujun Cheng et.al. | 2603.12151 | translate | read | null |
| 2026-03-12 | Linking Perception, Confidence and Accuracy in MLLMs | Yuetian Du et.al. | 2603.12149 | translate | read | null |
| 2026-03-12 | EgoIntent: An Egocentric Step-level Benchmark for Understanding What, Why, and Next | Ye Pan et.al. | 2603.12147 | translate | read | null |
| 2026-03-12 | Automatic Generation of High-Performance RL Environments | Seth Karten et.al. | 2603.12145 | translate | read | null |
| 2026-03-12 | Increasing intelligence in AI agents can worsen collective outcomes | Neil F. Johnson et.al. | 2603.12129 | translate | read | null |
| 2026-03-12 | Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives | Taeho Lee et.al. | 2603.12110 | translate | read | null |
| 2026-03-12 | On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents | Deyu Zou et.al. | 2603.12109 | translate | read | null |
| 2026-03-12 | A Robust and Efficient Multi-Agent Reinforcement Learning Framework for Traffic Signal Control | Sheng-You Huang et.al. | 2603.12096 | translate | read | null |
| 2026-03-12 | Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics | Ming-Hong Chen et.al. | 2603.12087 | translate | read | null |
| 2026-03-12 | AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling | Hamed Hamzeh et.al. | 2603.12031 | translate | read | null |
| 2026-03-12 | Sim-to-reality adaptation for Deep Reinforcement Learning applied to an underwater docking application | Alaaeddine Chaarani et.al. | 2603.12020 | translate | read | null |
| 2026-03-12 | Learning Visuomotor Policy for Multi-Robot Laser Tag Game | Kai Li et.al. | 2603.11980 | translate | read | null |
| 2026-03-12 | FlexRec: Adapting LLM-based Recommenders for Flexible Needs via Reinforcement Learning | Yijun Pan et.al. | 2603.11901 | translate | read | null |
| 2026-03-12 | The price of decentralization in managing engineering systems through multi-agent reinforcement learning | Prateek Bhustali et.al. | 2603.11884 | translate | read | null |
| 2026-03-12 | Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language | Remigiusz Kinas et.al. | 2603.11881 | translate | read | null |
| 2026-03-12 | Hybrid Human-Agent Social Dilemmas in Energy Markets | Isuri Perera et.al. | 2603.11834 | translate | read | null |
| 2026-03-12 | Towards High-Fidelity CAD Generation via LLM-Driven Program Generation and Text-Based B-Rep Primitive Grounding | Jiahao Li et.al. | 2603.11831 | translate | read | null |
| 2026-03-12 | RADAR: Closed-Loop Robotic Data Generation via Semantic Planning and Autonomous Causal Environment Reset | Yongzhong Wang et.al. | 2603.11811 | translate | read | null |
| 2026-03-12 | Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach | Erfan Mirzaei et.al. | 2603.11757 | translate | read | null |
| 2026-03-12 | STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure Transformer for Offline Multi-task Multi-agent Reinforcement Learning | Jiwon Jeon et.al. | 2603.11691 | translate | read | null |
| 2026-03-12 | Entropy-Preserving Reinforcement Learning | Aleksei Petrenko et.al. | 2603.11682 | translate | read | null |
| 2026-03-12 | Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge | Junjie Wu et.al. | 2603.11665 | translate | read | null |
| 2026-03-12 | Resonate: Reinforcing Text-to-Audio Generation via Online Feedback from Large Audio Language Models | Xiquan Li et.al. | 2603.11661 | translate | read | null |
| 2026-03-12 | Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning | Jiaheng Hu et.al. | 2603.11653 | translate | read | null |
| 2026-03-12 | Diversity You Can Actually Measure: A Fast, Model-Free Diversity Metric for Robotics Datasets | Sreevardhan Sirigiri et.al. | 2603.11634 | translate | read | null |
| 2026-03-12 | Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization | Qijun Liao et.al. | 2603.11600 | translate | read | null |
| 2026-03-12 | WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing | Hui Zhang et.al. | 2603.11593 | translate | read | null |
| 2026-03-12 | Multi-Agent Reinforcement Learning for UAV-Based Chemical Plume Source Localization | Zhirun Li et.al. | 2603.11582 | translate | read | null |
| 2026-03-12 | SVLL: Staged Vision-Language Learning for Physically Grounded Embodied Task Planning | Yuyuan Yang et.al. | 2603.11563 | translate | read | null |
| 2026-03-12 | NFPO: Stabilized Policy Optimization of Normalizing Flow for Robotic Policy Learning | Diyuan Shi et.al. | 2603.11470 | translate | read | null |
| 2026-03-12 | Adversarial Reinforcement Learning for Detecting False Data Injection Attacks in Vehicular Routing | Taha Eghtesad et.al. | 2603.11433 | translate | read | null |
| 2026-03-12 | ARROW: Augmented Replay for RObust World models | Abdulaziz Alyahya et.al. | 2603.11395 | translate | read | null |
| 2026-03-12 | SliceFed: Federated Constrained Multi-Agent DRL for Dynamic Spectrum Slicing in 6G | Hossein Mohammadi et.al. | 2603.11390 | translate | read | null |
| 2026-03-11 | Ensuring Safety in Automated Mechanical Ventilation through Offline Reinforcement Learning and Digital Twin Verification | Hang Yu et.al. | 2603.11372 | translate | read | null |
| 2026-03-11 | abx_amr_simulator: A simulation environment for antibiotic prescribing policy optimization under antimicrobial resistance | Joyce Lee et.al. | 2603.11369 | translate | read | null |
| 2026-03-11 | Novelty Adaptation Through Hybrid Large Language Model (LLM)-Symbolic Planning and LLM-guided Reinforcement Learning | Hong Lu et.al. | 2603.11351 | translate | read | null |
| 2026-03-11 | Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning | Yuto Shibata et.al. | 2603.11346 | translate | read | null |
| 2026-03-11 | Meta-Reinforcement Learning with Self-Reflection for Agentic Search | Teng Xiao et.al. | 2603.11327 | translate | read | null |
| 2026-03-11 | Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings | Yuning Wu et.al. | 2603.11321 | translate | read | null |
| 2026-03-11 | ExecVerify: White-Box RL with Verifiable Stepwise Rewards for Code Execution Reasoning | Lingxiao Tang et.al. | 2603.11226 | translate | read | null |
| 2026-03-11 | Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning | Yuehao Song et.al. | 2603.11219 | translate | read | null |
| 2026-03-11 | DeReason: A Difficulty-Aware Curriculum Improves Decoupled SFT-then-RL Training for General Reasoning | Hanxu Hu et.al. | 2603.11193 | translate | read | null |
| 2026-03-11 | Learning to Unscramble: Simplifying Symbolic Expressions via Self-Supervised Oracle Trajectories | David Shih et.al. | 2603.11164 | translate | read | null |
| 2026-03-11 | Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion | Yuanhong Wu et.al. | 2603.11126 | translate | read | null |
| 2026-03-11 | Learning Tree-Based Models with Gradient Descent | Sascha Marton et.al. | 2603.11117 | translate | read | null |
| 2026-03-11 | ResWM: Residual-Action World Model for Visual RL | Jseen Zhang et.al. | 2603.11110 | translate | read | null |
| 2026-03-11 | RC-NF: Robot-Conditioned Normalizing Flow for Real-Time Anomaly Detection in Robotic Manipulation | Shijie Zhou et.al. | 2603.11106 | translate | read | null |
| 2026-03-11 | Learning Adaptive Force Control for Contact-Rich Sample Scraping with Heterogeneous Materials | Cenk Cetin et.al. | 2603.10979 | translate | read | null |
| 2026-03-11 | Contact Coverage-Guided Exploration for General-Purpose Dexterous Manipulation | Zixuan Liu et.al. | 2603.10971 | translate | read | null |
| 2026-03-11 | Safe RLHF Beyond Expectation: Stochastic Dominance for Universal Spectral Risk Control | Yaswanth Chittepu et.al. | 2603.10938 | translate | read | null |
| 2026-03-11 | Lifelong Imitation Learning with Multimodal Latent Replay and Incremental Adjustment | Fanqi Yu et.al. | 2603.10929 | translate | read | null |
| 2026-03-11 | Ergodicity in reinforcement learning | Dominik Baumann et.al. | 2603.10895 | translate | read | null |
| 2026-03-11 | Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models | Yixiu Mao et.al. | 2603.10887 | translate | read | null |
| 2026-03-11 | RL-Augmented MPC for Non-Gaited Legged and Hybrid Locomotion | Andrea Patrizi et.al. | 2603.10878 | translate | read | null |
| 2026-03-11 | $V_{0.5}$: Generalist Value Model as a Prior for Sparse RL Rollouts | Yi-Kai Zhang et.al. | 2603.10848 | translate | read | null |
| 2026-03-11 | Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis | Yujie Zheng et.al. | 2603.10846 | translate | read | null |
| 2026-03-11 | ReTabSyn: Realistic Tabular Data Synthesis via Reinforcement Learning | Xiaofeng Lin et.al. | 2603.10823 | translate | read | null |
| 2026-03-11 | Multilingual Reasoning Gym: Multilingual Scaling of Procedural Reasoning Environments | Konstantin Dobler et.al. | 2603.10793 | translate | read | null |
| 2026-03-11 | mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR | Konstantin Dobler et.al. | 2603.10767 | translate | read | null |
| 2026-03-11 | ASTER: Attitude-aware Suspended-payload Quadrotor Traversal via Efficient Reinforcement Learning | Dongcheng Cao et.al. | 2603.10715 | translate | read | null |
| 2026-03-11 | MAVEN: A Meta-Reinforcement Learning Framework for Varying-Dynamics Expertise in Agile Quadrotor Maneuvers | Jin Zhou et.al. | 2603.10714 | translate | read | null |
| 2026-03-11 | Splat2Real: Novel-view Scaling for Physical AI with 3D Gaussian Splatting | Hansol Lim et.al. | 2603.10638 | translate | read | null |
| 2026-03-11 | Reinforcement Learning with Conditional Expectation Reward | Changyi Xiao et.al. | 2603.10624 | translate | read | null |
| 2026-03-11 | AdaClearGrasp: Learning Adaptive Clearing for Zero-Shot Robust Dexterous Grasping in Densely Cluttered Environments | Zixuan Chen et.al. | 2603.10616 | translate | read | null |
| 2026-03-11 | Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning | Zhaowei Zhang et.al. | 2603.10588 | translate | read | null |
| 2026-03-11 | Safety-critical Control Under Partial Observability: Reach-Avoid POMDP meets Belief Space Control | Matti Vahs et.al. | 2603.10572 | translate | read | null |
| 2026-03-11 | Adaptive RAN Slicing Control via Reward-Free Self-Finetuning Agents | Yuanhao Li et.al. | 2603.10564 | translate | read | null |
| 2026-03-11 | Learning to Score: Tuning Cluster Schedulers through Reinforcement Learning | Martin Asenov et.al. | 2603.10545 | translate | read | null |
| 2026-03-11 | Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning | Zichao Li et.al. | 2603.10535 | translate | read | null |
| 2026-03-11 | UAV-MARL: Multi-Agent Reinforcement Learning for Time-Critical and Dynamic Medical Supply Delivery | Islam Guven et.al. | 2603.10528 | translate | read | null |
| 2026-03-11 | IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs | Chuan Guo et.al. | 2603.10521 | translate | read | null |
| 2026-03-11 | Muscle Synergy Priors Enhance Biomechanical Fidelity in Predictive Musculoskeletal Locomotion Simulation | Ilseung Park et.al. | 2603.10474 | translate | read | null |
| 2026-03-11 | COHORT: Hybrid RL for Collaborative Large DNN Inference on Multi-Robot Systems Under Real-Time Constraints | Mohammad Saeid Anwar et.al. | 2603.10436 | translate | read | null |
| 2026-03-11 | Adaptive Active Learning for Regression via Reinforcement Learning | Simon D. Nguyen et.al. | 2603.10435 | translate | read | null |
| 2026-03-11 | Graph-GRPO: Training Graph Flow Models with Reinforcement Learning | Baoheng Zhu et.al. | 2603.10395 | translate | read | null |
| 2026-03-11 | ScanDP: Generalizable 3D Scanning with Diffusion Policy | Itsuki Hirako et.al. | 2603.10390 | translate | read | null |
| 2026-03-11 | SteadyTray: Learning Object Balancing Tasks in Humanoid Tray Transport via Residual Reinforcement Learning | Anlun Huang et.al. | 2603.10306 | translate | read | null |
| 2026-03-11 | From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification | Ke Zhang et.al. | 2603.10300 | translate | read | null |
| 2026-03-11 | Quantum entanglement provides a competitive advantage in adversarial games | Peiyong Wang et.al. | 2603.10289 | translate | read | null |
| 2026-03-10 | From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning | Zhanyi Sun et.al. | 2603.10263 | translate | read | null |
| 2026-03-10 | SiMPO: Measure Matching for Online Diffusion Reinforcement Learning | Haitong Ma et.al. | 2603.10250 | translate | read | null |
| 2026-03-10 | Actor-Accelerated Policy Dual Averaging for Reinforcement Learning in Continuous Action Spaces | Ji Gao et.al. | 2603.10199 | translate | read | null |
| 2026-03-10 | Learning to Decode Quantum LDPC Codes Via Belief Propagation | Mohsen Moradi et.al. | 2603.10192 | translate | read | null |
| 2026-03-10 | Calibration-Reasoning Framework for Descriptive Speech Quality Assessment | Elizaveta Kostenok et.al. | 2603.10175 | translate | read | null |
| 2026-03-10 | ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning | Ruizhong Qiu et.al. | 2603.10160 | translate | read | null |
| 2026-03-10 | CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR | Sijia Cui et.al. | 2603.10101 | translate | read | null |
| 2026-03-10 | Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models | Daniel Hennes et.al. | 2603.10098 | translate | read | null |
| 2026-03-10 | Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models | Ali Raza et.al. | 2603.10080 | translate | read | null |
| 2026-03-10 | Improving Search Agent with One Line of Code | Jian Li et.al. | 2603.10069 | translate | read | null |
| 2026-03-09 | Cluster-Aware Attention-Based Deep Reinforcement Learning for Pickup and Delivery Problems | Wentao Wang et.al. | 2603.10053 | translate | read | null |
| 2026-03-10 | Kinodynamic Motion Retargeting for Humanoid Locomotion via Multi-Contact Whole-Body Trajectory Optimization | Xiaoyu Zhang et.al. | 2603.09956 | translate | read | null |
| 2026-03-10 | When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic | Alberto Fernández-Hernández et.al. | 2603.09950 | translate | read | null |
| 2026-03-10 | Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts | Hongbo Bo et.al. | 2603.09890 | translate | read | null |
| 2026-03-10 | Emerging Extrinsic Dexterity in Cluttered Scenes via Dynamics-aware Policy Learning | Yixin Zheng et.al. | 2603.09882 | translate | read | null |
| 2026-03-10 | RecThinker: An Agentic Framework for Tool-Augmented Reasoning in Recommendation | Haobo Zhang et.al. | 2603.09843 | translate | read | null |
| 2026-03-10 | Good Reasoning Makes Good Demonstrations: Implicit Reasoning Quality Supervision via In-Context Reinforcement Learning | Tiehua Mei et.al. | 2603.09803 | translate | read | null |
| 2026-03-10 | Long-Run Conditional Value-at-Risk Reinforcement Learning | Qixin Wang et.al. | 2603.09734 | translate | read | null |
| 2026-03-10 | GSStream: 3D Gaussian Splatting based Volumetric Scene Streaming System | Zhiye Tang et.al. | 2603.09718 | translate | read | null |
| 2026-03-10 | ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning | Davit Melikidze et.al. | 2603.09692 | translate | read | null |
| 2026-03-10 | ReTac-ACT: A State-Gated Vision-Tactile Fusion Transformer for Precision Assembly | Minchi Ruan et.al. | 2603.09565 | translate | read | null |
| 2026-03-10 | GeoSolver: Scaling Test-Time Reasoning in Remote Sensing with Fine-Grained Process Supervision | Lang Sun et.al. | 2603.09551 | translate | read | null |
| 2026-03-10 | NS-VLA: Towards Neuro-Symbolic Vision-Language-Action Models | Ziyue Zhu et.al. | 2603.09542 | translate | read | null |
| 2026-03-10 | Towards Unified Multimodal Interleaved Generation via Group Relative Policy Optimization | Ming Nie et.al. | 2603.09538 | translate | read | null |
| 2026-03-10 | MORE-R1: Guiding LVLM for Multimodal Object-Entity Relation Extraction via Stepwise Reasoning with Reinforcement Learning | Xiang Yuan et.al. | 2603.09478 | translate | read | null |
| 2026-03-10 | SEA-Nav: Efficient Policy Learning for Safe and Agile Quadruped Navigation in Cluttered Environments | Shiyi Chen et.al. | 2603.09460 | translate | read | null |
| 2026-03-10 | Impact of Markov Decision Process Design on Sim-to-Real Reinforcement Learning | Tatjana Krau et.al. | 2603.09427 | translate | read | null |
| 2026-03-10 | SPAARS: Safer RL Policy Alignment through Abstract Exploration and Refined Exploitation of Action Space | Swaminathan S K et.al. | 2603.09378 | translate | read | null |
| 2026-03-10 | Robust Regularized Policy Iteration under Transition Uncertainty | Hongqiang Lin et.al. | 2603.09344 | translate | read | null |
| 2026-03-10 | Reward-Zero: Language Embedding Driven Implicit Reward Mechanisms for Reinforcement Learning | Heng Zhang et.al. | 2603.09331 | translate | read | null |
| 2026-03-10 | OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models | Tengjin Weng et.al. | 2603.09326 | translate | read | null |
| 2026-03-10 | Social-R1: Towards Human-like Social Reasoning in LLMs | Jincenzi Wu et.al. | 2603.09249 | translate | read | null |
| 2026-03-10 | MO-Playground: Massively Parallelized Multi-Objective Reinforcement Learning for Robotics | Neil Janwani et.al. | 2603.09237 | translate | read | null |
| 2026-03-10 | Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control | Peihao Wang et.al. | 2603.09221 | translate | read | null |
| 2026-03-10 | Embodied Human Simulation for Quantitative Design and Analysis of Interactive Robotics | Chenhui Zuo et.al. | 2603.09218 | translate | read | null |
| 2026-03-10 | Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation | Jake Gonzales et.al. | 2603.09208 | translate | read | null |
| 2026-03-10 | Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents | Jiangming Shu et.al. | 2603.09203 | translate | read | null |
| 2026-03-10 | RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning | Tzu-Heng Huang et.al. | 2603.09160 | translate | read | null |
| 2026-03-10 | Critical States Preparation With Deep Reinforcement Learning | Jia-Wen Yu et.al. | 2603.09135 | translate | read | null |
| 2026-03-10 | Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards | Zhengzhao Ma et.al. | 2603.09117 | translate | read | null |
| 2026-03-10 | Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms | Renos Zabounidis et.al. | 2603.09090 | translate | read | null |
| 2026-03-10 | Learning Adaptive LLM Decoding | Chloe H. Su et.al. | 2603.09065 | translate | read | null |
| 2026-03-10 | Synergistic Directed Execution and LLM-Driven Analysis for Zero-Day AI-Generated Malware Detection | George Edwards et.al. | 2603.09044 | translate | read | null |
| 2026-03-09 | PlayWorld: Learning Robot World Models from Autonomous Play | Tenny Yin et.al. | 2603.09030 | translate | read | null |
| 2026-03-09 | MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment | Kailong Fan et.al. | 2603.08987 | translate | read | null |
| 2026-03-09 | FAME: Force-Adaptive RL for Expanding the Manipulation Envelope of a Full-Scale Humanoid | Niraj Pudasaini et.al. | 2603.08961 | translate | read | null |
| 2026-03-09 | A Survey of Reinforcement Learning For Economics | Pranjal Rawat et.al. | 2603.08956 | translate | read | null |
| 2026-03-09 | Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance | Joshua Castillo et.al. | 2603.08933 | translate | read | null |
| 2026-03-09 | Optimizing Reinforcement Learning Training over Digital Twin Enabled Multi-fidelity Networks | Hanzhi Yu et.al. | 2603.08931 | translate | read | null |
| 2026-03-09 | APPLV: Adaptive Planner Parameter Learning from Vision-Language-Action Model | Yuanjie Lu et.al. | 2603.08862 | translate | read | null |
| 2026-03-09 | VisionCreator-R1: A Reflection-Enhanced Native Visual-Generation Agentic Model | Jinxiang Lai et.al. | 2603.08812 | translate | read | null |
| 2026-03-09 | Multi-level meta-reinforcement learning with skill-based curriculum | Sichen Yang et.al. | 2603.08773 | translate | read | null |
| 2026-03-09 | SPREAD: Subspace Representation Distillation for Lifelong Imitation Learning | Kaushik Roy et.al. | 2603.08763 | translate | read | null |
| 2026-03-09 | Agentic Critical Training | Weize Liu et.al. | 2603.08706 | translate | read | null |
| 2026-03-09 | How Far Can Unsupervised RLVR Scale LLM Training? | Bingxiang He et.al. | 2603.08660 | translate | read | null |
| 2026-03-09 | Embedding Classical Balance Control Principles in Reinforcement Learning for Humanoid Recovery | Nehar Poddar et.al. | 2603.08619 | translate | read | null |
| 2026-03-09 | Diff-Muscle: Efficient Learning for Musculoskeletal Robotic Table Tennis | Wentao Zhao et.al. | 2603.08617 | translate | read | null |
| 2026-03-09 | Towards Batch-to-Streaming Deep Reinforcement Learning for Continuous Control | Riccardo De Monte et.al. | 2603.08588 | translate | read | null |
| 2026-03-09 | MetaWorld-X: Hierarchical World Modeling via VLM-Orchestrated Experts for Humanoid Loco-Manipulation | Yutong Shen et.al. | 2603.08572 | translate | read | null |
| 2026-03-09 | RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback | Xiaoying Zhang et.al. | 2603.08561 | translate | read | null |
| 2026-03-09 | Impact of Connectivity on Laplacian Representations in Reinforcement Learning | Tommaso Giorgi et.al. | 2603.08558 | translate | read | null |
| 2026-03-09 | EquiBim: Learning Symmetry-Equivariant Policy for Bimanual Manipulation | Zhiyuan Zhang et.al. | 2603.08541 | translate | read | null |
| 2026-03-09 | Breaking the Bias Barrier in Concave Multi-Objective Reinforcement Learning | Swetha Ganesh et.al. | 2603.08518 | translate | read | null |
| 2026-03-09 | Oracle-Guided Soft Shielding for Safe Move Prediction in Chess | Prajit T Rajendran et.al. | 2603.08506 | translate | read | null |
| 2026-03-09 | LAR-MoE: Latent-Aligned Routing for Mixture of Experts in Robotic Imitation Learning | Ariel Rodriguez et.al. | 2603.08476 | translate | read | null |
| 2026-03-09 | Integrating Lagrangian Neural Networks into the Dyna Framework for Reinforcement Learning | Shreya Das et.al. | 2603.08468 | translate | read | null |
| 2026-03-09 | Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck | Fabio Valerio Massoli et.al. | 2603.08462 | translate | read | null |
| 2026-03-09 | Meta-RL with Shared Representations Enables Fast Adaptation in Energy Systems | Théo Zangato et.al. | 2603.08418 | translate | read | null |
| 2026-03-09 | Aligning to Illusions: Choice Blindness in Human and AI Feedback | Wenbin Wu et.al. | 2603.08412 | translate | read | null |
| 2026-03-09 | A Recipe for Stable Offline Multi-agent Reinforcement Learning | Dongsu Lee et.al. | 2603.08399 | translate | read | null |
| 2026-03-09 | Revealing Behavioral Plasticity in Large Language Models: A Token-Conditional Perspective | Liyuan Mao et.al. | 2603.08398 | translate | read | null |
| 2026-03-09 | SlowBA: An efficiency backdoor attack towards VLM-based GUI agents | Junxian Li et.al. | 2603.08316 | translate | read | null |
| 2026-03-09 | Posterior Sampling Reinforcement Learning with Gaussian Processes for Continuous Control: Sublinear Regret Bounds for Unbounded State Spaces | Hamish Flynn et.al. | 2603.08287 | translate | read | null |
| 2026-03-09 | SAIL: Test-Time Scaling for In-Context Imitation Learning with VLM | Makoto Sato et.al. | 2603.08269 | translate | read | null |
| 2026-03-09 | Adaptive shape control for microswimmer navigation in turbulence | Jingran Qiu et.al. | 2603.08201 | translate | read | null |
| 2026-03-09 | RexDrug: Reliable Multi-Drug Combination Extraction through Reasoning-Enhanced LLMs | Zhijun Wang et.al. | 2603.08166 | translate | read | null |
| 2026-03-09 | Towards Human-Like Manipulation through RL-Augmented Teleoperation and Mixture-of-Dexterous-Experts VLA | Tutian Tang et.al. | 2603.08122 | translate | read | null |
| 2026-03-09 | Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting | Zhongjian Qiao et.al. | 2603.08118 | translate | read | null |
| 2026-03-09 | DeReCo: Decoupling Representation and Coordination Learning for Object-Adaptive Decentralized Multi-Robot Cooperative Transport | Kazuki Shibata et.al. | 2603.08111 | translate | read | null |
| 2026-03-09 | Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization | Hongli Zhou et.al. | 2603.08091 | translate | read | null |
| 2026-03-09 | In-Context Reinforcement Learning for Tool Use in Large Language Models | Yaoqi Ye et.al. | 2603.08068 | translate | read | null |
| 2026-03-09 | ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning | Yiran Zhao et.al. | 2603.08059 | translate | read | null |
| 2026-03-09 | MJ1: Multimodal Judgment via Grounded Verification | Bhavesh Kumar et.al. | 2603.07990 | translate | read | null |
| 2026-03-09 | On the Feasibility and Opportunity of Autoregressive 3D Object Detection | Zanming Huang et.al. | 2603.07985 | translate | read | null |
| 2026-03-09 | VORL-EXPLORE: A Hybrid Learning Planning Approach to Multi-Robot Exploration in Dynamic Environments | Ning Liu et.al. | 2603.07973 | translate | read | null |
| 2026-03-09 | Model-Free DRL Control for Power Inverters: From Policy Learning to Real-Time Implementation via Knowledge Distillation | Yang Yang et.al. | 2603.07964 | translate | read | null |
| 2026-03-09 | SGG-R³: From Next-Token Prediction to End-to-End Unbiased Scene Graph Generation | Jiaye Feng et.al. | 2603.07961 | translate | read | null |
| 2026-03-09 | RL unknotter, hard unknots and unknotting number | Anne Dranowski et.al. | 2603.07955 | translate | read | null |
| 2026-03-09 | SMGI: A Structural Theory of General Artificial Intelligence | Aomar Osmani et.al. | 2603.07896 | translate | read | null |
| 2026-03-09 | SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans | Hansi Zeng et.al. | 2603.07853 | translate | read | null |
| 2026-03-08 | Relating Reinforcement Learning to Dynamic Programming-Based Planning | Filip V. Georgiev et.al. | 2603.07844 | translate | read | null |
| 2026-03-08 | Preference-Conditioned Reinforcement Learning for Space-Time Efficient Online 3D Bin Packing | Nikita Sarawgi et.al. | 2603.07800 | translate | read | null |
| 2026-03-08 | Toward Global Intent Inference for Human Motion by Inverse Reinforcement Learning | Sarmad Mehrdad et.al. | 2603.07797 | translate | read | null |
| 2026-03-08 | ProgAgent: A Continual RL Agent with Progress-Aware Rewards | Jinzhou Tan et.al. | 2603.07784 | translate | read | null |
| 2026-03-08 | Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems | Zongqian Li et.al. | 2603.07779 | translate | read | null |
| 2026-03-08 | Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models | Zongqian Li et.al. | 2603.07777 | translate | read | null |
| 2026-03-08 | Residual Control for Fast Recovery from Dynamics Shifts | Nethmi Jayasinghe et.al. | 2603.07775 | translate | read | null |
| 2026-03-08 | TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward | Yihong Luo et.al. | 2603.07700 | translate | read | null |
| 2026-03-08 | Global Convergence of Average Reward Constrained MDPs with Neural Critic and General Policy Parameterization | Anirudh Satheesh et.al. | 2603.07698 | translate | read | null |
| 2026-03-08 | Mitigating the Memory Bottleneck with Machine Learning-Driven and Data-Aware Microarchitectural Techniques | Rahul Bera et.al. | 2603.07683 | translate | read | null |
| 2026-03-08 | Numerical Approach for On-the-Fly Active Flow Control via Flow Map Learning Method | Xinyu Liu et.al. | 2603.07678 | translate | read | null |
| 2026-03-08 | DAISS: Phase-Aware Imitation Learning for Dual-Arm Robotic Ultrasound-Guided Interventions | Feng Li et.al. | 2603.07663 | translate | read | null |
| 2026-03-08 | Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving | Chang Su et.al. | 2603.07642 | translate | read | null |
| 2026-03-08 | Exoskeleton Control through Learning to Reduce Biological Joint Moments in Simulations | Zihang You et.al. | 2603.07629 | translate | read | null |
| 2026-03-08 | GeoLoco: Leveraging 3D Geometric Priors from Visual Foundation Model for Robust RGB-Only Humanoid Locomotion | Yufei Liu et.al. | 2603.07624 | translate | read | null |
| 2026-03-08 | Approximate Imitation Learning for Event-based Quadrotor Flight in Cluttered Environments | Nico Messikommer et.al. | 2603.07578 | translate | read | null |
| 2026-03-08 | Constraints Matrix Diffusion based Generative Neural Solver for Vehicle Routing Problems | Zhenwei Wang et.al. | 2603.07568 | translate | read | null |
| 2026-03-08 | COOL-MC: Verifying and Explaining RL Policies for Multi-bridge Network Maintenance | Dennis Gross et.al. | 2603.07546 | translate | read | null |
| 2026-03-08 | ICLR: In-Context Imitation Learning with Visual Reasoning | Toan Nguyen et.al. | 2603.07530 | translate | read | null |
| 2026-03-08 | TableMind++: An Uncertainty-Aware Programmatic Agent for Tool-Augmented Table Reasoning | Mingyue Cheng et.al. | 2603.07528 | translate | read | null |
| 2026-03-08 | Reinforcement learning-based dynamic cleaning scheduling framework for solar energy system | Heungjo An et.al. | 2603.07518 | translate | read | null |
| 2026-03-08 | InterReal: A Unified Physics-Based Imitation Framework for Learning Human-Object Interaction Skills | Dayang Liang et.al. | 2603.07516 | translate | read | null |
| 2026-03-08 | EvolveReason: Self-Evolving Reasoning Paradigm for Explainable Deepfake Facial Image Identification | Binjia Zhou et.al. | 2603.07515 | translate | read | null |
| 2026-03-08 | Med-Evo: Test-time Self-evolution for Medical Multimodal Large Language Models | Dunyuan Xu et.al. | 2603.07443 | translate | read | null |
| 2026-03-08 | Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part II | Yi Tian et.al. | 2603.07437 | translate | read | null |
| 2026-03-08 | Generalization in Online Reinforcement Learning for Mobile Agents | Li Gu et.al. | 2603.07432 | translate | read | null |
| 2026-03-08 | Dynamic Vehicle Routing Problem with Prompt Confirmation of Advance Requests | Amutheezan Sivagnanam et.al. | 2603.07422 | translate | read | null |
| 2026-03-08 | Underwater Embodied Intelligence for Autonomous Robots: A Constraint-Coupled Perspective on Planning, Control, and Deployment | Jingzehua Xu et.al. | 2603.07393 | translate | read | null |
| 2026-03-07 | Learning to Reflect: Hierarchical Multi-Agent Reinforcement Learning for CSI-Free mmWave Beam-Focusing | Hieu Le et.al. | 2603.07370 | translate | read | null |
| 2026-03-07 | Neural Control and Learning of Simulated Hand Movements With an EMG-Based Closed-Loop Interface | Balint K. Hodossy et.al. | 2603.07364 | translate | read | null |
| 2026-03-07 | Adversarial Latent-State Training for Robust Policies in Partially Observable Domains | Angad Singh Ahuja et.al. | 2603.07313 | translate | read | null |
| 2026-03-07 | AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery | Nilesh Jain et.al. | 2603.07300 | translate | read | null |
| 2026-03-07 | Adaptive Double-Booking Strategy for Outpatient Scheduling Using Multi-Objective Reinforcement Learning | Ninda Nurseha Amalina et.al. | 2603.07270 | translate | read | null |
| 2026-03-07 | Kinematics-Aware Latent World Models for Data-Efficient Autonomous Driving | Jiazhuo Li et.al. | 2603.07264 | translate | read | null |
| 2026-03-07 | Learning When to Cooperate Under Heterogeneous Goals | Max Taylor-Davies et.al. | 2603.07253 | translate | read | null |
| 2026-03-07 | Reinforcement Learning for Vehicle-to-Grid Voltage Regulation: Single-Hub to Multi-Hub Coordination with Battery-Aware Constraints | Jingbo Wang et.al. | 2603.07237 | translate | read | null |
| 2026-03-07 | Re²: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving | Pinzheng Wang et.al. | 2603.07197 | translate | read | null |
| 2026-03-07 | RoTri-Diff: A Spatial Robot-Object Triadic Interaction-Guided Diffusion Model for Bimanual Manipulation | Zixuan Chen et.al. | 2603.07165 | translate | read | null |
| 2026-03-07 | Learning From Failures: Efficient Reinforcement Learning Control with Episodic Memory | Chenyang Miao et.al. | 2603.07110 | translate | read | null |
| 2026-03-07 | Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction | Xu Chen et.al. | 2603.07093 | translate | read | null |
| 2026-03-07 | Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR | Muhammad Khalifa et.al. | 2603.07084 | translate | read | null |
| 2026-03-07 | Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction | Michael Hauri et.al. | 2603.07083 | translate | read | null |
| 2026-03-07 | SSP: Safety-guaranteed Surgical Policy via Joint Optimization of Behavioral and Spatial Constraints | Jianshu Hu et.al. | 2603.07032 | translate | read | null |
| 2026-03-07 | RESCHED: Rethinking Flexible Job Shop Scheduling from a Transformer-based Architecture with Simplified States | Xiangjie Xiao et.al. | 2603.07020 | translate | read | null |
| 2026-03-07 | AutoChecklist: Composable Pipelines for Checklist Generation and Scoring with LLM-as-a-Judge | Karen Zhou et.al. | 2603.07019 | translate | read | null |
| 2026-03-07 | AdaGen: Learning Adaptive Policy for Image Synthesis | Zanlin Ni et.al. | 2603.06993 | translate | read | null |
| 2026-03-07 | Diffusion Controller: Framework, Algorithms and Parameterization | Tong Yang et.al. | 2603.06981 | translate | read | null |
| 2026-03-07 | NePPO: Near-Potential Policy Optimization for General-Sum Multi-Agent Reinforcement Learning | Addison Kalanther et.al. | 2603.06977 | translate | read | null |
| 2026-03-07 | Topology-Aware Reinforcement Learning over Graphs for Resilient Power Distribution Networks | Roshni Anna Jacob et.al. | 2603.06964 | translate | read | null |
| 2026-03-07 | Learning Quadruped Walking from Seconds of Demonstration | Ruipeng Zhang et.al. | 2603.06961 | translate | read | null |
| 2026-03-07 | Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards | Xin Zhang et.al. | 2603.06958 | translate | read | null |
| 2026-03-06 | Joint MDPs and Reinforcement Learning in Coupled-Dynamics Environments | Ege C. Kaya et.al. | 2603.06946 | translate | read | null |
| 2026-03-06 | Collaborative Planning with Concurrent Synchronization for Operationally Constrained UAV-UGV Teams | Zihao Deng et.al. | 2603.06898 | translate | read | null |
| 2026-03-06 | Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration | Yanjun Chen et.al. | 2603.06859 | translate | read | null |
| 2026-03-06 | Reinforcing the World’s Edge: A Continual Learning Problem in the Multi-Agent-World Boundary | Dane Malenfant et.al. | 2603.06813 | translate | read | null |
| 2026-03-06 | Multi-Agent Reinforcement Learning with Submodular Reward | Wenjing Chen et.al. | 2603.06810 | translate | read | null |
| 2026-03-06 | Optimistic Policy Regularization | Mai Pham et.al. | 2603.06793 | translate | read | null |
| 2026-03-06 | HGT-Scheduler: Deep Reinforcement Learning for the Job Shop Scheduling Problem via Heterogeneous Graph Transformers | Bulent Soykan et.al. | 2603.06777 | translate | read | null |
| 2026-03-06 | HybridMimic: Hybrid RL-Centroidal Control for Humanoid Motion Mimicking | Ludwig Chee-Ying Tay et.al. | 2603.06775 | translate | read | null |
| 2026-03-06 | Stabilizing Reinforcement Learning for Diffusion Language Models | Jianyuan Zhong et.al. | 2603.06743 | translate | read | null |
| 2026-03-06 | Don’t Freeze, Don’t Crash: Extending the Safe Operating Range of Neural Navigation in Dense Crowds | Jiefu Zhang et.al. | 2603.06729 | translate | read | null |
| 2026-03-06 | Boosting deep Reinforcement Learning using pretraining with Logical Options | Zihan Ye et.al. | 2603.06565 | translate | read | null |
| 2026-03-06 | EgoReasoner: Learning Egocentric 4D Reasoning via Task-Adaptive Structured Thinking | Fangrui Zhu et.al. | 2603.06561 | translate | read | null |
| 2026-03-06 | On a PDE model for Learning in Stochastic Market Entry Games | Esther Bou Dagher et.al. | 2603.06514 | translate | read | null |
| 2026-03-06 | A Reference Architecture of Reinforcement Learning Frameworks | Xiaoran Liu et.al. | 2603.06413 | translate | read | null |
| 2026-03-06 | Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion | Pengcheng Jiang et.al. | 2603.06397 | translate | read | null |
| 2026-03-06 | OralGPT-Plus: Learning to Use Visual Tools via Reinforcement Learning for Panoramic X-ray Analysis | Yuxuan Fan et.al. | 2603.06366 | translate | read | null |
| 2026-03-06 | From Entropy to Calibrated Uncertainty: Training Language Models to Reason About Uncertainty | Azza Jenane et.al. | 2603.06317 | translate | read | null |
| 2026-03-06 | Artificial Intelligence for Climate Adaptation: Reinforcement Learning for Climate Change-Resilient Transport | Miguel Costa et.al. | 2603.06278 | translate | read | null |
| 2026-03-06 | Synthetic Monitoring Environments for Reinforcement Learning | Leonard Pleiss et.al. | 2603.06252 | translate | read | null |
| 2026-03-06 | MAPO: Mixed Advantage Policy Optimization for Long-Horizon Multi-Turn Dialogue | Naifan Zhang et.al. | 2603.06194 | translate | read | null |
| 2026-03-06 | Optimizing 3D Diffusion Models for Medical Imaging via Multi-Scale Reward Learning | Yueying Tian et.al. | 2603.06173 | translate | read | null |
| 2026-03-06 | Dual-Agent Multiple-Model Reinforcement Learning for Event-Triggered Human-Robot Co-Adaptation in Decoupled Task Spaces | Yaqi Li et.al. | 2603.06163 | translate | read | null |
| 2026-03-06 | Partial Policy Gradients for RL in LLMs | Puneet Mathur et.al. | 2603.06138 | translate | read | null |
| 2026-03-06 | ChatShopBuddy: Towards Reliable Conversational Shopping Agents via Reinforcement Learning | Yiruo Cheng et.al. | 2603.06065 | translate | read | null |
| 2026-03-06 | Devil is in Narrow Policy: Unleashing Exploration in Driving VLA Models | Canyu Chen et.al. | 2603.06049 | translate | read | null |
| 2026-03-06 | Reinforcement Learning for Secrecy Optimization in Underwater Energy Harvesting Relay Network | Shalini Tripathi et.al. | 2603.06046 | translate | read | null |
| 2026-03-06 | Learning to Generate via Understanding: Understanding-Driven Intrinsic Rewarding for Unified Multimodal Models | Jiadong Pan et.al. | 2603.06043 | translate | read | null |
| 2026-03-06 | ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning | Xingjian Tao et.al. | 2603.06024 | translate | read | null |
| 2026-03-06 | TADPO: Reinforcement Learning Goes Off-road | Zhouchonghao Wu et.al. | 2603.05995 | translate | read | null |
| 2026-03-06 | LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Generative Real-World Super-Resolution | Song Fei et.al. | 2603.05947 | translate | read | null |
| 2026-03-06 | How to Model Your Crazyflie Brushless | Alexander Gräfe et.al. | 2603.05944 | translate | read | null |
| 2026-03-06 | Swooper: Learning High-Speed Aerial Grasping With a Simple Gripper | Ziken Huang et.al. | 2603.05935 | translate | read | null |
| 2026-03-06 | CORE-Seg: Reasoning-Driven Segmentation for Complex Lesions via Reinforcement Learning | Yuxin Xie et.al. | 2603.05911 | translate | read | null |
| 2026-03-06 | Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning | Xuan Li et.al. | 2603.05900 | translate | read | null |
| 2026-03-06 | Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation | Changcheng Li et.al. | 2603.05881 | translate | read | null |
| 2026-03-06 | PatchCue: Enhancing Vision-Language Model Reasoning with Patch-Based Visual Cues | Yukun Qi et.al. | 2603.05869 | translate | read | null |
| 2026-03-06 | ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning | Juyong Jiang et.al. | 2603.05863 | translate | read | null |
| 2026-03-06 | Expert Knowledge-driven Reinforcement Learning for Autonomous Racing via Trajectory Guidance and Dynamics Constraints | Bo Leng et.al. | 2603.05842 | translate | read | null |
| 2026-03-06 | OpenHEART: Opening Heterogeneous Articulated Objects with a Legged Manipulator | Seonghyeon Lim et.al. | 2603.05830 | translate | read | null |
| 2026-03-06 | CDF-Glove: A Cable-Driven Force Feedback Glove for Dexterous Teleoperation | Huayue Liang et.al. | 2603.05804 | translate | read | null |
| 2026-03-06 | Task-Level Decisions to Gait Level Control: A Hierarchical Policy Approach for Quadruped Navigation | Sijia Li et.al. | 2603.05783 | translate | read | null |
| 2026-03-05 | MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation | Rifny Rachman et.al. | 2603.05760 | translate | read | null |
| 2026-03-05 | Reinforcement Learning for Power-Flow Network Analysis | Alperen Ergur et.al. | 2603.05673 | translate | read | null |
| 2026-03-05 | TransMASK: Masked State Representation through Learned Transformation | Sagar Parekh et.al. | 2603.05670 | translate | read | null |
| 2026-03-05 | When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On | Wisdom Ikezogwo et.al. | 2603.05659 | translate | read | null |
| 2026-03-05 | Thinking with Spatial Code for Physical-World Video Reasoning | Jieneng Chen et.al. | 2603.05591 | translate | read | null |
| 2026-03-05 | A Novel Hybrid Heuristic-Reinforcement Learning Optimization Approach for a Class of Railcar Shunting Problems | Ruonan Zhao et.al. | 2603.05579 | translate | read | null |
| 2026-03-05 | Task Parameter Extrapolation via Learning Inverse Tasks from Forward Demonstrations | Serdar Bahar et.al. | 2603.05576 | translate | read | null |
| 2026-03-05 | PRISM: Personalized Refinement of Imitation Skills for Manipulation via Human Instructions | Arnau Boix-Granell et.al. | 2603.05574 | translate | read | null |
| 2026-03-05 | Autocorrelation effects in a stochastic-process model for decision making via time series | Tomoki Yamagami et.al. | 2603.05559 | translate | read | null |
| 2026-03-05 | RoboPocket: Improve Robot Policies Instantly with Your Phone | Junjie Fang et.al. | 2603.05504 | translate | read | null |
| 2026-03-05 | Latent Wasserstein Adversarial Imitation Learning | Siqi Yang et.al. | 2603.05440 | translate | read | null |
| 2026-03-05 | SpiderCat: Optimal Fault-Tolerant Cat State Preparation | Andrey Boris Khesin et.al. | 2603.05391 | translate | read | null |
| 2026-03-05 | DiSCTT: Consensus-Guided Self-Curriculum for Efficient Test-Time Adaptation in Reasoning | Mohammad Mahdi Moradi et.al. | 2603.05357 | translate | read | null |
| 2026-03-05 | Latent Policy Steering through One-Step Flow Policies | Hokyun Im et.al. | 2603.05296 | translate | read | null |
| 2026-03-05 | Knowledge Divergence and the Value of Debate for Scalable Oversight | Robin Young et.al. | 2603.05293 | translate | read | null |
| 2026-03-05 | Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts | Samandar Samandarov et.al. | 2603.05276 | translate | read | null |
| 2026-03-05 | SarcasmMiner: A Dual-Track Post-Training Framework for Robust Audio-Visual Sarcasm Reasoning | Zhu Li et.al. | 2603.05275 | translate | read | null |
| 2026-03-05 | Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum | Shan Ning et.al. | 2603.05256 | translate | read | null |
| 2026-03-05 | Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards | Linghan Fang et.al. | 2603.05231 | translate | read | null |
| 2026-03-05 | KARL: Knowledge Agents via Reinforcement Learning | Jonathan D. Chang et.al. | 2603.05218 | translate | read | null |
| 2026-03-05 | LBM: Hierarchical Large Auto-Bidding Model via Reasoning and Acting | Yewen Li et.al. | 2603.05134 | translate | read | null |
| 2026-03-05 | SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation | Youqiang Gui et.al. | 2603.05117 | translate | read | null |
| 2026-03-05 | Decoupling Task and Behavior: A Two-Stage Reward Curriculum in Reinforcement Learning for Robotics | Kilian Freitag et.al. | 2603.05113 | translate | read | null |
| 2026-03-05 | Reward-Conditioned Reinforcement Learning | Michal Nauman et.al. | 2603.05066 | translate | read | null |
| 2026-03-05 | WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents | Sicheng Fan et.al. | 2603.05044 | translate | read | null |
| 2026-03-05 | Formal Entropy-Regularized Control of Stochastic Systems | Menno van Zutphen et.al. | 2603.05021 | translate | read | null |
| 2026-03-05 | BioLLMAgent: A Hybrid Framework with Enhanced Structural Interpretability for Simulating Human Decision-Making in Computational Psychiatry | Zuo Fei et.al. | 2603.05016 | translate | read | null |
| 2026-03-05 | Competitive Multi-Operator Reinforcement Learning for Joint Pricing and Fleet Rebalancing in AMoD Systems | Emil Kragh Toft et.al. | 2603.05000 | translate | read | null |
| 2026-03-05 | 3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding | Xiongkun Linghu et.al. | 2603.04976 | translate | read | null |
| 2026-03-05 | ∇-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space | Peihao Wang et.al. | 2603.04948 | translate | read | null |
| 2026-03-05 | Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition | Mengze Hong et.al. | 2603.04945 | translate | read | null |
| 2026-03-05 | BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning | Yuan Li et.al. | 2603.04918 | translate | read | null |
| 2026-03-05 | VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory | Yuheng Lei et.al. | 2603.04910 | translate | read | null |
| 2026-03-05 | Task-Relevant and Irrelevant Region-Aware Augmentation for Generalizable Vision-Based Imitation Learning in Agricultural Manipulation | Shun Hattori et.al. | 2603.04845 | translate | read | null |
| 2026-03-05 | SCoUT: Scalable Communication via Utility-Guided Temporal Grouping in Multi-Agent Reinforcement Learning | Manav Vora et.al. | 2603.04833 | translate | read | null |
| 2026-03-05 | VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment | Jiawei Chen et.al. | 2603.04822 | translate | read | null |
| 2026-03-05 | Diffusion Policy through Conditional Proximal Policy Optimization | Ben Liu et.al. | 2603.04790 | translate | read | null |
| 2026-03-05 | Adaptive Personalized Federated Reinforcement Learning for RIS-Assisted Aerial Relays in SAGINs with Fluid Antennas | Yuxuan Yang et.al. | 2603.04788 | translate | read | null |
| 2026-03-05 | Data-Driven Control of a Magnetically Actuated Fish-Like Robot | Akiyuki Koyama et.al. | 2603.04787 | translate | read | null |
| 2026-03-05 | Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction | Xingwu Chen et.al. | 2603.04783 | translate | read | null |
| 2026-03-05 | Selfish Cooperation Towards Low-Altitude Economy: Integrated Multi-Service Deployment with Resilient Federated Reinforcement Learning | Yuxuan Yang et.al. | 2603.04779 | translate | read | null |
| 2026-03-05 | Distributional Reinforcement Learning with Information Bottleneck for Uncertainty-Aware DRAM Equalization | Muhammad Usama et.al. | 2603.04768 | translate | read | null |
| 2026-03-05 | LLM-Guided Decentralized Exploration with Self-Organizing Robot Teams | Hiroaki Kawashima et.al. | 2603.04762 | translate | read | null |
| 2026-03-05 | SeekRBP: Leveraging Sequence-Structure Integration with Reinforcement Learning for Receptor-Binding Protein Identification | Xiling Luo et.al. | 2603.04748 | translate | read | null |
| 2026-03-04 | Optimizing Language Models for Crosslingual Knowledge Consistency | Tianyu Liu et.al. | 2603.04678 | translate | read | null |
| 2026-03-04 | When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift | Kevin Vogt-Lowell et.al. | 2603.04648 | translate | read | null |
| 2026-03-04 | Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning | Lei Huang et.al. | 2603.04597 | translate | read | null |
| 2026-03-04 | ELLIPSE: Evidential Learning for Robust Waypoints and Uncertainties | Zihao Dong et.al. | 2603.04585 | translate | read | null |
| 2026-03-04 | Risk-Aware Reinforcement Learning for Mobile Manipulation | Michael Groom et.al. | 2603.04579 | translate | read | null |
| 2026-03-04 | Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling | Tal Daniel et.al. | 2603.04553 | translate | read | null |
| 2026-03-04 | Transformer-Based Multipath Congestion Control: A Decoupled Approach for Wireless Uplinks | Zongyuan Zhang et.al. | 2603.04550 | translate | read | null |
| 2026-03-04 | PTLD: Sim-to-real Privileged Tactile Latent Distillation for Dexterous Manipulation | Rosy Chen et.al. | 2603.04531 | translate | read | null |
| 2026-03-04 | TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning | Maximilian von Klinski et.al. | 2603.04380 | translate | read | null |
| 2026-03-04 | Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks | Haoyu Liu et.al. | 2603.04364 | translate | read | null |
| 2026-03-04 | A Constrained RL Approach for Cost-Efficient Delivery of Latency-Sensitive Applications | Ozan Aygün et.al. | 2603.04353 | translate | read | null |
| 2026-03-04 | Tendon Force Modeling for Sim2Real Transfer of Reinforcement Learning Policies for Tendon-Driven Robots | Valentin Yuryev et.al. | 2603.04351 | translate | read | null |
| 2026-03-04 | What Does Flow Matching Bring To TD Learning? | Bhavya Agrawalla et.al. | 2603.04333 | translate | read | null |
| 2026-03-04 | IPD: Boosting Sequential Policy with Imaginary Planning Distillation in Offline Reinforcement Learning | Yihao Qin et.al. | 2603.04289 | translate | read | null |
| 2026-03-04 | Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory | Zhenting Wang et.al. | 2603.04257 | translate | read | null |
| 2026-03-04 | OptiQKD: A Machine Learning-Optimized Framework for Real-Time Parameter Tuning in Quantum Key Distribution | Noureldin Mohamed et.al. | 2603.04192 | translate | read | null |
| 2026-03-04 | Learning Hip Exoskeleton Control Policy via Predictive Neuromusculoskeletal Simulation | Ilseung Park et.al. | 2603.04166 | translate | read | null |
| 2026-03-04 | BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning | Tarjei Paule Hage et.al. | 2603.04124 | translate | read | null |
| 2026-03-04 | Real Eyes Realize Faster: Gaze Stability and Pupil Novelty for Efficient Egocentric Learning | Ajan Subramanian et.al. | 2603.04098 | translate | read | null |
| 2026-03-04 | Swimming Under Constraints: A Safe Reinforcement Learning Framework for Quadrupedal Bio-Inspired Propulsion | Xinyu Cui et.al. | 2603.04073 | translate | read | null |
| 2026-03-04 | SaFeR: Safety-Critical Scenario Generation for Autonomous Driving Test via Feasibility-Constrained Token Resampling | Jinlong Cui et.al. | 2603.04071 | translate | read | null |
| 2026-03-04 | Force-Aware Residual DAgger via Trajectory Editing for Precision Insertion with Impedance Control | Yiou Huang et.al. | 2603.04038 | translate | read | null |
| 2026-03-04 | Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback | Fabian Domberg et.al. | 2603.04029 | translate | read | null |
| 2026-03-04 | Rethinking the Efficiency and Effectiveness of Reinforcement Learning for Radiology Report Generation | Zilin Lu et.al. | 2603.04022 | translate | read | null |
| 2026-03-04 | Discriminative Perception via Anchored Description for Reasoning Segmentation | Tao Yang et.al. | 2603.04002 | translate | read | null |
| 2026-03-04 | Structural Action Transformer for 3D Dexterous Manipulation | Xiaohan Lei et.al. | 2603.03960 | translate | read | null |
| 2026-03-04 | GIPO: Gaussian Importance Sampling Policy Optimization | Chengxuan Lu et.al. | 2603.03955 | translate | read | null |
| 2026-03-04 | RVN-Bench: A Benchmark for Reactive Visual Navigation | Jaewon Lee et.al. | 2603.03953 | translate | read | null |
| 2026-03-04 | Selecting Offline Reinforcement Learning Algorithms for Stochastic Network Control | Nicolas Helson et.al. | 2603.03932 | translate | read | null |
| 2026-03-04 | IROSA: Interactive Robot Skill Adaptation using Natural Language | Markus Knauer et.al. | 2603.03897 | translate | read | null |
| 2026-03-04 | Dual-Interaction-Aware Cooperative Control Strategy for Alleviating Mixed Traffic Congestion | Zhengxuan Liu et.al. | 2603.03848 | translate | read | null |
| 2026-03-04 | Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation | Yun Lu et.al. | 2603.03820 | translate | read | null |
| 2026-03-04 | Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling | Emile Anand et.al. | 2603.03759 | translate | read | null |
| 2026-03-04 | Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning | Chuang Zhang et.al. | 2603.03752 | translate | read | null |
| 2026-03-04 | Interaction-Aware Whole-Body Control for Compliant Object Transport | Hao Zhang et.al. | 2603.03751 | translate | read | null |
| 2026-03-04 | HALyPO: Heterogeneous-Agent Lyapunov Policy Optimization for Human-Robot Collaboration | Hao Zhang et.al. | 2603.03741 | translate | read | null |
| 2026-03-04 | UrbanHuRo: A Two-Layer Human-Robot Collaboration Framework for the Joint Optimization of Heterogeneous Urban Services | Tonmoy Dey et.al. | 2603.03701 | translate | read | null |
| 2026-03-04 | MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation | Lu Yang et.al. | 2603.03680 | translate | read | null |
| 2026-03-04 | MIND: Unified Inquiry and Diagnosis RL with Criteria Grounded Clinical Supports for Psychiatric Consultation | Guoyi Li et.al. | 2603.03677 | translate | read | null |
| 2026-03-04 | Principled Learning-to-Communicate with Quasi-Classical Information Structures | Xiangyu Liu et.al. | 2603.03664 | translate | read | null |
| 2026-03-04 | Freezing of Gait Prediction using Proactive Agent that Learns from Selected Experience and DDQN Algorithm | Septian Enggar Sukmana et.al. | 2603.03651 | translate | read | null |
| 2026-03-04 | Hybrid Belief Reinforcement Learning for Efficient Coordinated Spatial Exploration | Danish Rizvi et.al. | 2603.03595 | translate | read | null |
| 2026-03-03 | Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence | Shengbo Wang et.al. | 2603.03523 | translate | read | null |
| 2026-03-03 | PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation | Shang Wu et.al. | 2603.03505 | translate | read | null |
| 2026-03-03 | Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion | Haoran Lu et.al. | 2603.03485 | translate | read | null |
| 2026-03-03 | Optimal trajectory-guided stochastic co-optimization for e-fuel system design and real-time operation | Jeongdong Kim et.al. | 2603.03484 | translate | read | null |
| 2026-03-03 | Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning | Harin Lee et.al. | 2603.03480 | translate | read | null |
| 2026-03-03 | [Re] FairDICE: A Gap Between Theory And Practice | Peter Adema et.al. | 2603.03454 | translate | read | null |
| 2026-03-03 | Beyond Accuracy: Evaluating Visual Grounding In Multimodal Medical Reasoning | Anas Zafar et.al. | 2603.03437 | translate | read | null |
| 2026-03-03 | Multi-Agent-Based Simulation of Archaeological Mobility in Uneven Landscapes | Chairi Kiourt et.al. | 2603.03390 | translate | read | null |
| 2026-03-03 | How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference | Toru Lin et.al. | 2603.03280 | translate | read | null |
| 2026-03-03 | ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation | Xialin He et.al. | 2603.03279 | translate | read | null |
| 2026-03-03 | Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use | Aradhye Agarwal et.al. | 2603.03205 | translate | read | null |
| 2026-03-03 | Specificity-aware reinforcement learning for fine-grained open-world classification | Samuele Angheben et.al. | 2603.03197 | translate | read | null |
| 2026-03-03 | Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing | Jiyuan Wang et.al. | 2603.03143 | translate | read | null |
| 2026-03-03 | RL-Based Coverage Path Planning for Deformable Objects on 3D Surfaces | Yuhang Zhang et.al. | 2603.03137 | translate | read | null |
| 2026-03-03 | Deep Q-Learning-Based Gain Scheduling for Nonlinear Quadcopter Dynamics | Hossein Rastgoftar et.al. | 2603.03127 | translate | read | null |
| 2026-03-03 | Proactive Guiding Strategy for Item-side Fairness in Interactive Recommendation | Chongjun Xia et.al. | 2603.03094 | translate | read | null |
| 2026-03-03 | RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization | Siwei Zhang et.al. | 2603.03078 | translate | read | null |
| 2026-03-03 | TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning | Christian Greisinger et.al. | 2603.03072 | translate | read | null |
| 2026-03-03 | Reinforcement Learning with Symbolic Reward Machines | Thomas Krug et.al. | 2603.03068 | translate | read | null |
| 2026-03-03 | CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots | Shihao Ma et.al. | 2603.03067 | translate | read | null |
| 2026-03-03 | PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems | Sudip Bhujel et.al. | 2603.03054 | translate | read | null |
| 2026-03-03 | QFlowNet: Fast, Diverse, and Efficient Unitary Synthesis with Generative Flow Networks | Inhoe Koo et.al. | 2603.03045 | translate | read | null |
| 2026-03-03 | Why Does RLAIF Work At All? | Robin Young et.al. | 2603.03000 | translate | read | null |
| 2026-03-03 | Contextualized Privacy Defense for LLM Agents | Yule Wen et.al. | 2603.02983 | translate | read | null |
| 2026-03-03 | DreamFlow: Local Navigation Beyond Observation via Conditional Flow Matching in the Latent Space | Jiwon Park et.al. | 2603.02976 | translate | read | null |
| 2026-03-03 | CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning | Zhenquan Yao et.al. | 2603.02951 | translate | read | null |
| 2026-03-03 | Beyond One-Size-Fits-All: Adaptive Subgraph Denoising for Zero-Shot Graph Learning with Large Language Models | Fengzhi Li et.al. | 2603.02938 | translate | read | null |
| 2026-03-03 | Contextual Latent World Models for Offline Meta Reinforcement Learning | Mohammadreza Nakheai et.al. | 2603.02935 | translate | read | null |
| 2026-03-03 | On the Structural Limitations of Weight-Based Neural Adaptation and the Role of Reversible Behavioral Learning | Pardhu Sri Rushi Varma Konduru et.al. | 2603.02934 | translate | read | null |
| 2026-03-03 | Does Fine-tuning by Reinforcement Learning Improve Generalization in Binary Speech Deepfake Detection? | Xin Wang et.al. | 2603.02914 | translate | read | null |
| 2026-03-03 | SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training | Qi Zhang et.al. | 2603.02908 | translate | read | null |
| 2026-03-03 | Learning in Markov Decision Processes with Exogenous Dynamics | Davide Maran et.al. | 2603.02862 | translate | read | null |
| 2026-03-03 | Rhythm: Learning Interactive Whole-Body Control for Dual Humanoids | Hongjin Chen et.al. | 2603.02856 | translate | read | null |
| 2026-03-03 | Learning Memory-Enhanced Improvement Heuristics for Flexible Job Shop Scheduling | Jiaqi Wang et.al. | 2603.02846 | translate | read | null |
| 2026-03-03 | VSearcher: Long-Horizon Multimodal Search Agent via Reinforcement Learning | Ruiyang Zhang et.al. | 2603.02795 | translate | read | null |
| 2026-03-03 | Generative adversarial imitation learning for robot swarms: Learning from human demonstrations and trained policies | Mattes Kraus et.al. | 2603.02783 | translate | read | null |
| 2026-03-03 | Next Embedding Prediction Makes World Models Stronger | George Bredis et.al. | 2603.02765 | translate | read | null |
| 2026-03-03 | Enhancing User Throughput in Multi-panel mmWave Radio Access Networks for Beam-based MU-MIMO Using a DRL Method | Ramin Hashemi et.al. | 2603.02745 | translate | read | null |
| 2026-03-03 | From “What” to “How”: Constrained Reasoning for Autoregressive Image Generation | Ruxue Yan et.al. | 2603.02712 | translate | read | null |
| 2026-03-03 | Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization | Yueyang Cang et.al. | 2603.02701 | translate | read | null |
| 2026-03-03 | VisionCreator: A Native Visual-Generation Agentic Model with Understanding, Thinking, Planning and Creation | Jinxiang Lai et.al. | 2603.02681 | translate | read | null |
| 2026-03-03 | Watch Your Step: Learning Semantically-Guided Locomotion in Cluttered Environment | Denan Liang et.al. | 2603.02657 | translate | read | null |
| 2026-03-03 | Generalized Per-Agent Advantage Estimation for Multi-Agent Policy Optimization | Seongmin Kim et.al. | 2603.02654 | translate | read | null |
| 2026-03-03 | Improving Diffusion Planners by Self-Supervised Action Gating with Energies | Yuan Lu et.al. | 2603.02650 | translate | read | null |
| 2026-03-02 | Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training | Valentin Lacombe et.al. | 2603.02208 | translate | read | null |
| 2026-03-02 | Tool Verification for Test-Time Reinforcement Learning | Ruotong Liao et.al. | 2603.02203 | translate | read | null |
| 2026-03-02 | Near-Optimal Regret for KL-Regularized Multi-Armed Bandits | Kaixuan Ji et.al. | 2603.02155 | translate | read | null |
| 2026-03-02 | LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards | Guanzheng Chen et.al. | 2603.02146 | translate | read | null |
| 2026-03-02 | Rethinking Camera Choice: An Empirical Study on Fisheye Camera Properties in Robotic Manipulation | Han Xue et.al. | 2603.02139 | translate | read | null |
| 2026-03-02 | Pencil Puzzle Bench: A Benchmark for Multi-Step Verifiable Reasoning | Justin Waugh et.al. | 2603.02119 | translate | read | null |
| 2026-03-02 | ACDC: Adaptive Curriculum Planning with Dynamic Contrastive Control for Goal-Conditioned Reinforcement Learning in Robotic Manipulation | Xuerui Wang et.al. | 2603.02104 | translate | read | null |
| 2026-03-02 | Learning from Synthetic Data Improves Multi-hop Reasoning | Anmol Kabra et.al. | 2603.02091 | translate | read | null |
| 2026-03-02 | Reinforcement Learning-Based Filters for Convection-Dominated Flows: Reference-Free and Reference-Guided Training | Anna Ivagnes et.al. | 2603.02086 | translate | read | null |
| 2026-03-02 | π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs | Siting Wang et.al. | 2603.02083 | translate | read | null |
| 2026-03-02 | Accelerating PDE Surrogates via RL-Guided Mesh Optimization | Yang Meng et.al. | 2603.02066 | translate | read | null |
| 2026-03-02 | Expanding LLM Agent Boundaries with Strategy-Guided Exploration | Andrew Szot et.al. | 2603.02045 | translate | read | null |
| 2026-03-02 | Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards | Faisal Mohamed et.al. | 2603.02008 | translate | read | null |
| 2026-03-02 | Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection | Yuchen Zhang et.al. | 2603.01993 | translate | read | null |
| 2026-03-02 | CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production | Yixin Nie et.al. | 2603.01973 | translate | read | null |
| 2026-03-02 | CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification | Jinpeng Chen et.al. | 2603.01940 | translate | read | null |
| 2026-03-02 | LaST-VLA: Thinking in Latent Spatio-Temporal Space for Vision-Language-Action in Autonomous Driving | Yuechen Luo et.al. | 2603.01928 | translate | read | null |
| 2026-03-02 | Efficient RLVR Training via Weighted Mutual Information Data Selection | Xinyu Zhou et.al. | 2603.01907 | translate | read | null |
| 2026-03-02 | Visual Bias in Simulated Users: The Impact of Luminance and Contrast on Reinforcement Learning-based Interaction | Hannah Selder et.al. | 2603.01901 | translate | read | null |
| 2026-03-02 | Generative Visual Chain-of-Thought for Image Editing | Zijin Yin et.al. | 2603.01893 | translate | read | null |
| 2026-03-02 | SEAR: Sample Efficient Action Chunking Reinforcement Learning | C. F. Maximilian Nagy et.al. | 2603.01891 | translate | read | null |
| 2026-03-02 | FireRed-OCR Technical Report | Hao Wu et.al. | 2603.01840 | translate | read | link |
| 2026-03-02 | Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport | Harry Amad et.al. | 2603.01771 | translate | read | null |
| 2026-03-02 | Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning | Naoki Shitanda et.al. | 2603.01741 | translate | read | null |
| 2026-03-02 | TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training | Jinluan Yang et.al. | 2603.01714 | translate | read | null |
| 2026-03-02 | Cross-modal Identity Mapping: Minimizing Information Loss in Modality Conversion via Reinforcement Learning | Haonan Jia et.al. | 2603.01696 | translate | read | null |
| 2026-03-02 | MVR: Multi-view Video Reward Shaping for Reinforcement Learning | Lirui Luo et.al. | 2603.01694 | translate | read | null |
| 2026-03-02 | Chain-of-Context Learning: Dynamic Constraint Understanding for Multi-Task VRPs | Shuangchun Gui et.al. | 2603.01667 | translate | read | null |
| 2026-03-02 | Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning | Jiebin Zhang et.al. | 2603.01639 | translate | read | null |
| 2026-03-02 | Learning Thermal-Aware Locomotion Policies for an Electrically-Actuated Quadruped Robot | Letian Qian et.al. | 2603.01631 | translate | read | null |
| 2026-03-02 | ToolRLA: Fine-Grained Reward Decomposition for Tool-Integrated Reinforcement Learning Alignment in Domain-Specific Agents | Pengbo Liu et.al. | 2603.01620 | translate | read | null |
| 2026-03-02 | CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework | Yuexi Du et.al. | 2603.01607 | translate | read | null |
| 2026-03-02 | Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models | Qiyuan Zhang et.al. | 2603.01571 | translate | read | null |
| 2026-03-02 | Investigating Group Relative Policy Optimization for Diffusion Transformer based Text-to-Audio Generation | Yi Gu et.al. | 2603.01565 | translate | read | null |
| 2026-03-02 | LFPO: Likelihood-Free Policy Optimization for Masked Diffusion Models | Chenxing Wei et.al. | 2603.01563 | translate | read | null |
| 2026-03-02 | State-Action Inpainting Diffuser for Continuous Control with Delay | Dongqi Han et.al. | 2603.01553 | translate | read | null |
| 2026-03-02 | GAC: Stabilizing Asynchronous RL Training for LLMs via Gradient Alignment Control | Haofeng Xu et.al. | 2603.01501 | translate | read | null |
| 2026-03-02 | LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning | Chang Yao et.al. | 2603.01488 | translate | read | null |
| 2026-03-02 | Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents | Haojin Yang et.al. | 2603.01481 | translate | read | null |
| 2026-03-02 | Towards Robot Skill Learning and Adaptation with Gaussian Processes | A K M Nadimul Haque et.al. | 2603.01480 | translate | read | null |
| 2026-03-02 | ProtRLSearch: A Multi-Round Multimodal Protein Search Agent with Large Language Models Trained via Reinforcement Learning | Congying Liu et.al. | 2603.01464 | translate | read | null |
| 2026-03-02 | Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning | Shaohuai Liu et.al. | 2603.01452 | translate | read | null |
| 2026-03-02 | Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents | Zhixiang Wang et.al. | 2603.01416 | translate | read | null |
| 2026-03-02 | MIST-RL: Mutation-based Incremental Suite Testing via Reinforcement Learning | Sicheng Zhu et.al. | 2603.01409 | translate | read | null |
| 2026-03-02 | SubstratumGraphEnv: Reinforcement Learning Environment (RLE) for Modeling System Attack Paths | Bahirah Adewunmi et.al. | 2603.01340 | translate | read | null |
| 2026-03-02 | Energy Efficient Traffic Scheduling For Optical LEO Satellite Downlinks | Ethan Fettes et.al. | 2603.01334 | translate | read | null |
| 2026-03-01 | PAC Guarantees for Reinforcement Learning: Sample Complexity, Coverage, and Structure | Joshua Steier et.al. | 2603.01309 | translate | read | null |
| 2026-03-01 | Hybrid TD3: Overestimation Bias Analysis and Stable Policy Optimization for Hybrid Action Space | Thanh-Tuan Tran et.al. | 2603.01302 | translate | read | null |
| 2026-03-01 | When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains | Ahmadreza Jeddi et.al. | 2603.01301 | translate | read | link |
| 2026-03-01 | Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models | Adel Javanmard et.al. | 2603.01293 | translate | read | null |
| 2026-03-01 | Integrating LTL Constraints into PPO for Safe Reinforcement Learning | Maifang Zhang et.al. | 2603.01292 | translate | read | null |
| 2026-03-01 | Beyond Reward: A Bounded Measure of Agent Environment Coupling | Wael Hafez et.al. | 2603.01283 | translate | read | null |
| 2026-03-01 | MOSAIC: A Unified Platform for Cross-Paradigm Comparison and Evaluation of Homogeneous and Heterogeneous Multi-Agent RL, LLM, VLM, and Human Decision-Makers | Abdulhamid M. Mousa et.al. | 2603.01260 | translate | read | null |
| 2026-03-01 | Towards Policy-Adaptive Image Guardrail: Benchmark and Method | Caiyong Piao et.al. | 2603.01228 | translate | read | null |
| 2026-03-01 | Can Thinking Models Think to Detect Hateful Memes? | Mohamed Bayan Kmainasi et.al. | 2603.01225 | translate | read | null |
| 2026-03-01 | Learn Hard Problems During RL with Reference Guided Fine-tuning | Yangzhen Wu et.al. | 2603.01223 | translate | read | null |
| 2026-03-01 | Epistemic Gain, Aleatoric Cost: Uncertainty Decomposition in Multi-Agent Debate for Math Reasoning | Dan Qiao et.al. | 2603.01221 | translate | read | null |
| 2026-03-01 | Reasoning Boosts Opinion Alignment in LLMs | Frédéric Berdoz et.al. | 2603.01214 | translate | read | null |
| 2026-03-01 | PARWiS: Winner determination under shoestring budgets using active pairwise comparisons | Shailendra Bhandari et.al. | 2603.01171 | translate | read | null |
| 2026-03-01 | BeautyGRPO: Aesthetic Alignment for Face Retouching via Dynamic Path Guidance and Fine-Grained Preference Modeling | Jiachen Yang et.al. | 2603.01163 | translate | read | null |
| 2026-03-01 | DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent | Tongzhou Wu et.al. | 2603.01152 | translate | read | null |
| 2026-03-01 | Compact Task-Aligned Imitation Learning for Laboratory Automation | Kanata Suzuki et.al. | 2603.01110 | translate | read | null |
| 2026-03-01 | DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage | Haowen Gao et.al. | 2603.01106 | translate | read | null |
| 2026-03-01 | Feasible Pairings for Decentralized Integral Controllability of Non-Square Systems | Yuhao Tong et.al. | 2603.01076 | translate | read | null |
| 2026-03-01 | How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning | Xiangxiang Zhang et.al. | 2603.01070 | translate | read | null |
| 2026-03-01 | Unleashing VLA Potentials in Autonomous Driving via Explicit Learning from Failures | Yuechen Luo et.al. | 2603.01063 | translate | read | null |
| 2026-03-01 | MM-DeepResearch: A Simple and Effective Multimodal Agentic Search Baseline | Huanjin Yao et.al. | 2603.01050 | translate | read | null |
| 2026-03-01 | HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents | Hongbo Jin et.al. | 2603.00977 | translate | read | null |
| 2026-03-01 | Intent-Context Synergy Reinforcement Learning for Autonomous UAV Decision-Making in Air Combat | Jiahao Fu et.al. | 2603.00974 | translate | read | null |
| 2026-03-01 | Stabilizing Policy Optimization via Logits Convexity | Hongzhan Chen et.al. | 2603.00963 | translate | read | null |
| 2026-03-01 | HierKick: Hierarchical Reinforcement Learning for Vision-Guided Soccer Robot Control | Yizhi Chen et.al. | 2603.00948 | translate | read | null |
| 2026-03-01 | Non-Rectangular Average-Reward Robust MDPs: Optimal Policies and Their Transient Values | Shengbo Wang et.al. | 2603.00945 | translate | read | null |
| 2026-03-01 | Minimalist Compliance Control | Haochen Shi et.al. | 2603.00913 | translate | read | null |
| 2026-03-01 | Principled Fast and Meta Knowledge Learners for Continual Reinforcement Learning | Ke Sun et.al. | 2603.00903 | translate | read | null |
| 2026-03-01 | CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning | Xinyu Zhu et.al. | 2603.00889 | translate | read | null |
(<a href=../Reinforcement_Learning.md>back to Reinforcement Learning</a>)