Reinforcement Learning - 2026-03

Publish Date	Title	Authors	PDF	Translate	Read	Code
2026-03-31	HapCompass: A Rotational Haptic Device for Contact-Rich Robotic Teleoperation	Xiangshan Tan et.al.	2603.30042	translate	read	null
2026-03-31	Hybrid Framework for Robotic Manipulation: Integrating Reinforcement Learning and Large Language Models	Md Saad et.al.	2603.30022	translate	read	null
2026-03-31	Phyelds: A Pythonic Framework for Aggregate Computing	Gianluca Aguzzi et.al.	2603.29999	translate	read	null
2026-03-31	GreenFLag: A Green Agentic Approach for Energy-Efficient Federated Learning	Theodora Panagea et.al.	2603.29933	translate	read	null
2026-03-31	ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training	Rui Ai et.al.	2603.29871	translate	read	null
2026-03-31	An Output Feedback Q-learning Algorithm for Optimal Control of Nonlinear Systems with Koopman Linear Embedding	Victor G. Lopez et.al.	2603.29858	translate	read	null
2026-03-31	Friends, Foes, and First Authors: A Game Theory Model of How Power Plays Rewrite Academic Co-Authorship Networks	Amit Bengal et.al.	2603.29834	translate	read	null
2026-03-31	Reinforced Reasoning for End-to-End Retrosynthetic Planning	Chenyang Zuo et.al.	2603.29723	translate	read	null
2026-03-31	6GAgentGym: Tool Use, Data Synthesis, and Agentic Learning for Network Management	Jiao Chen et.al.	2603.29656	translate	read	null
2026-03-31	ASI-Evolve: AI Accelerates AI	Weixian Xu et.al.	2603.29640	translate	read	null
2026-03-31	Learning Diagnostic Reasoning for Decision Support in Toxicology	Nico Oberländer et.al.	2603.29608	translate	read	null
2026-03-31	GraSP-STL: A Graph-Based Framework for Zero-Shot Signal Temporal Logic Planning via Offline Goal-Conditioned Reinforcement Learning	Ancheng Hou et.al.	2603.29533	translate	read	null
2026-03-31	Target-Aligned Reinforcement Learning	Leonard S. Pleiss et.al.	2603.29501	translate	read	null
2026-03-31	Learning to Generate Formally Verifiable Step-by-Step Logic Reasoning via Structured Formal Intermediaries	Luoxin Chen et.al.	2603.29500	translate	read	null
2026-03-31	MemFactory: Unified Inference & Training Framework for Agent Memory	Ziliang Guo et.al.	2603.29493	translate	read	null
2026-03-31	Calibrated Confidence Expression for Radiology Report Generation	David Bani-Harouni et.al.	2603.29492	translate	read	null
2026-03-31	Multi-AUV Cooperative Target Tracking Based on Supervised Diffusion-Aided Multi-Agent Reinforcement Learning	Jiaao Ma et.al.	2603.29426	translate	read	null
2026-03-31	AP-DRL: A Synergistic Algorithm-Hardware Framework for Automatic Task Partitioning of Deep Reinforcement Learning on Versal ACAP	Enlai Li et.al.	2603.29369	translate	read	null
2026-03-31	Scaling Whole-Body Human Musculoskeletal Behavior Emulation for Specificity and Diversity	Yunyue Wei et.al.	2603.29332	translate	read	null
2026-03-31	Downsides of Smartness Across Edge-Cloud Continuum in Modern Industry	Akhil Gupta Chigullapally et.al.	2603.29289	translate	read	null
2026-03-31	MemRerank: Preference Memory for Personalized Product Reranking	Zhiyuan Peng et.al.	2603.29247	translate	read	null
2026-03-30	Gen-Searcher: Reinforcing Agentic Search for Image Generation	Kaituo Feng et.al.	2603.28767	translate	read	null
2026-03-30	SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning	Philip Schroeder et.al.	2603.28730	translate	read	null
2026-03-30	Stepwise Credit Assignment for GRPO on Flow-Matching Models	Yash Savani et.al.	2603.28718	translate	read	null
2026-03-30	Dynamic Dual-Granularity Skill Bank for Agentic RL	Songjun Tu et.al.	2603.28716	translate	read	null
2026-03-30	DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing	Kailai Feng et.al.	2603.28713	translate	read	null
2026-03-30	Dynamic Lookahead Distance via Reinforcement Learning-Based Pure Pursuit for Autonomous Racing	Mohamed Elgouhary et.al.	2603.28625	translate	read	null
2026-03-30	Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning	Ziqi Miao et.al.	2603.28618	translate	read	null
2026-03-30	Learning Partial Action Replacement in Offline MARL	Yue Jin et.al.	2603.28573	translate	read	null
2026-03-30	GraphWalker: Agentic Knowledge Graph Question Answering via Synthetic Trajectory Curriculum	Shuwen Xu et.al.	2603.28533	translate	read	null
2026-03-30	Intelligent Radio Resource Slicing for 6G In-Body Subnetworks	Samira Abdelrahman et.al.	2603.28529	translate	read	null
2026-03-30	Tac2Real: Reliable and GPU Visuotactile Simulation for Online Reinforcement Learning and Zero-Shot Real-World Deployment	Ningyu Yan et.al.	2603.28475	translate	read	null
2026-03-30	CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains	Wenhan Wang et.al.	2603.28474	translate	read	null
2026-03-30	$R_{dm}$: Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation	Linqian Fan et.al.	2603.28460	translate	read	null
2026-03-30	Active Stereo-Camera Outperforms Multi-Sensor Setup in ACT Imitation Learning for Humanoid Manipulation	Robin Kühn et.al.	2603.28422	translate	read	null
2026-03-30	Learning unified control of internal spin squeezing in atomic qudits for magnetometry	C. Z. Cao et.al.	2603.28421	translate	read	null
2026-03-30	Evolutionary Discovery of Reinforcement Learning Algorithms via Large Language Models	Alkis Sygkounas et.al.	2603.28416	translate	read	null
2026-03-30	Critic-Free Deep Reinforcement Learning for Maritime Coverage Path Planning on Irregular Hexagonal Grids	Carlos S. Sepúlveda et.al.	2603.28385	translate	read	null
2026-03-30	Rethinking Structure Preservation in Text-Guided Image Editing with Visual Autoregressive Models	Tao Xia et.al.	2603.28367	translate	read	null
2026-03-30	Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization	He Du et.al.	2603.28342	translate	read	null
2026-03-30	Competitor-aware Race Management for Electric Endurance Racing	Wytze de Vries et.al.	2603.28286	translate	read	null
2026-03-30	Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback	Andi Nika et.al.	2603.28281	translate	read	null
2026-03-30	Cost-Matching Model Predictive Control for Efficient Reinforcement Learning in Humanoid Locomotion	Wenqi Cai et.al.	2603.28243	translate	read	null
2026-03-30	ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models	Song Yu et.al.	2603.28204	translate	read	null
2026-03-30	A Deep Reinforcement Learning Framework for Closed-loop Guidance of Fish Schools via Virtual Agents	Takato Shibayama et.al.	2603.28200	translate	read	null
2026-03-30	MedLoc-R1: Performance-Aware Curriculum Reward Scheduling for GRPO-Based Medical Visual Grounding	Guangjing Yang et.al.	2603.28120	translate	read	null
2026-03-30	$AutoDrive\text{-}P^3$: Unified Chain of Perception-Prediction-Planning Thought via Reinforcement Fine-Tuning	Yuqi Ye et.al.	2603.28116	translate	read	null
2026-03-30	Heddle: A Distributed Orchestration System for Agentic RL Rollout	Zili Zhang et.al.	2603.28101	translate	read	null
2026-03-30	Koopman-based surrogate modeling for reinforcement-learning-control of Rayleigh-Benard convection	Tim Plotzki et.al.	2603.28074	translate	read	null
2026-03-30	Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL	Udita Ghosh et.al.	2603.28053	translate	read	null
2026-03-30	CARLA-Air: Fly Drones Inside a CARLA World – A Unified Infrastructure for Air-Ground Embodied Intelligence	Tianle Zeng et.al.	2603.28032	translate	read	null
2026-03-30	Energy-Aware Imitation Learning for Steering Prediction Using Events and Frames	Hu Cao et.al.	2603.28008	translate	read	null
2026-03-30	SARL: Label-Free Reinforcement Learning by Rewarding Reasoning Topology	Yifan Wang et.al.	2603.27977	translate	read	null
2026-03-30	Principal Prototype Analysis on Manifold for Interpretable Reinforcement Learning	Bodla Krishna Vamshi et.al.	2603.27971	translate	read	null
2026-03-30	Flip Stunts on Bicycle Robots using Iterative Motion Imitation	Jeonghwan Kim et.al.	2603.27944	translate	read	null
2026-03-25	DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving	Pengxuan Yang et.al.	2603.24587	translate	read	null
2026-03-25	MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination	Zhuo Li et.al.	2603.24579	translate	read	null
2026-03-25	VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models	Qijia He et.al.	2603.24575	translate	read	null
2026-03-25	Completeness of Unbounded Best-First Minimax and Descent Minimax	Quentin Cohen-Solal et.al.	2603.24572	translate	read	null
2026-03-25	Composer 2 Technical Report	Cursor Research et.al.	2603.24477	translate	read	null
2026-03-25	Improving Lean4 Autoformalization via Cycle Consistency Fine-tuning	Arsen Shebzukhov et.al.	2603.24372	translate	read	null
2026-03-25	CoordLight: Learning Decentralized Coordination for Network-Wide Traffic Signal Control	Yifeng Zhang et.al.	2603.24366	translate	read	null
2026-03-25	LATS: Large Language Model Assisted Teacher-Student Framework for Multi-Agent Reinforcement Learning in Traffic Signal Control	Yifeng Zhang et.al.	2603.24361	translate	read	null
2026-03-25	Large Language Model Guided Incentive Aware Reward Design for Cooperative Multi-Agent Reinforcement Learning	Dogan Urgun et.al.	2603.24324	translate	read	null
2026-03-25	Heuristic Self-Paced Learning for Domain Adaptive Semantic Segmentation under Adverse Conditions	Shiqin Wang et.al.	2603.24322	translate	read	null
2026-03-25	C-STEP: Continuous Space-Time Empowerment for Physics-informed Safe Reinforcement Learning of Mobile Agents	Guihlerme Daubt et.al.	2603.24241	translate	read	null
2026-03-25	Decentralized End-to-End Multi-AAV Pursuit Using Predictive Spatio-Temporal Observation via Deep Reinforcement Learning	Yude Li et.al.	2603.24238	translate	read	null
2026-03-25	SumRank: Aligning Summarization Models for Long-Document Listwise Reranking	Jincheng Feng et.al.	2603.24204	translate	read	null
2026-03-25	A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula	Cansu Sancaktar et.al.	2603.24202	translate	read	null
2026-03-25	Optimized control protocols for stable skyrmion creation using deep reinforcement learning	Ji Seok Song et.al.	2603.24177	translate	read	null
2026-03-25	A Longitudinal Analysis of the CEC Single-Objective Competitions (2010-2024) and Implications for Variational Quantum Optimization	Vojtěch Novák et.al.	2603.24140	translate	read	null
2026-03-25	Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection	Zhanhe Lei et.al.	2603.24139	translate	read	null
2026-03-25	Likelihood hacking in probabilistic program synthesis	Jacek Karwowski et.al.	2603.24126	translate	read	null
2026-03-25	Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization	Fei Bai et.al.	2603.24093	translate	read	null
2026-03-25	Knowledge-Guided Manipulation Using Multi-Task Reinforcement Learning	Aditya Narendra et.al.	2603.24083	translate	read	null
2026-03-25	PCHC: Enabling Preference Conditioned Humanoid Control via Multi-Objective Reinforcement Learning	Huanyu Li et.al.	2603.24047	translate	read	null
2026-03-25	Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage	Rishikesh Sahay et.al.	2603.23966	translate	read	null
2026-03-25	From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments	Lijing Luo et.al.	2603.23964	translate	read	null
2026-03-25	PointRFT: Explicit Reinforcement Fine-tuning for Point Cloud Few-shot Learning	Yankai Wang et.al.	2603.23957	translate	read	null
2026-03-25	Optimal Variance-Dependent Regret Bounds for Infinite-Horizon MDPs	Guy Zamir et.al.	2603.23926	translate	read	null
2026-03-25	Off-Policy Safe Reinforcement Learning with Constrained Optimistic Exploration	Guopeng Li et.al.	2603.23889	translate	read	null
2026-03-25	ProcureGym: A Multi-Agent Markov Game Framework for Modeling National Volume-based Drug Procurement	Jia Wang et.al.	2603.23880	translate	read	null
2026-03-25	The DeepXube Software Package for Solving Pathfinding Problems with Learned Heuristic Functions and Search	Forest Agostinelli et.al.	2603.23873	translate	read	null
2026-03-25	HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation	Ken Ding et.al.	2603.23871	translate	read	null
2026-03-25	Joint Source-Channel-Check Coding with HARQ for Reliable Semantic Communications	Boyuan Li et.al.	2603.23869	translate	read	null
2026-03-25	Learning-guided Prioritized Planning for Lifelong Multi-Agent Path Finding in Warehouse Automation	Han Zheng et.al.	2603.23838	translate	read	null
2026-03-25	Human, AI, and Hybrid Ensembles for Detection of Adaptive, RL-based Social Bots	Valerio La Gatta et.al.	2603.23796	translate	read	null
2026-03-24	Self Paced Gaussian Contextual Reinforcement Learning	Mohsen Sahraei Ardakani et.al.	2603.23755	translate	read	null
2026-03-24	BXRL: Behavior-Explainable Reinforcement Learning	Ram Rachum et.al.	2603.23738	translate	read	null
2026-03-24	Dual-Gated Epistemic Time-Dilation: Autonomous Compute Modulation in Asynchronous MARL	Igor Jankowski et.al.	2603.23722	translate	read	null
2026-03-24	Utilizing Adversarial Training for Robust Voltage Control: An Adaptive Deep Reinforcement Learning Method	Sungjoo Chung et.al.	2603.23648	translate	read	null
2026-03-24	Safe Reinforcement Learning with Preference-based Constraint Inference	Chenglin Li et.al.	2603.23565	translate	read	null
2026-03-21	Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction	Haoyu Wang et.al.	2603.23550	translate	read	null
2026-03-24	UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation	Jie Liu et.al.	2603.23500	translate	read	null
2026-03-24	WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG	Zhen Li et.al.	2603.23497	translate	read	null
2026-03-24	End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions	Zakaria Mhammedi et.al.	2603.23461	translate	read	null
2026-03-24	SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling	Yiqi Zhang et.al.	2603.23414	translate	read	null
2026-03-24	A Joint Reinforcement Learning Scheduling and Compression Framework for Teleoperated Driving	Giacomo Avanzi et.al.	2603.23387	translate	read	null
2026-03-24	Off-Policy Value-Based Reinforcement Learning for Large Language Models	Peng-Yuan Wang et.al.	2603.23355	translate	read	null
2026-03-24	Learning Multi-Agent Local Collision-Avoidance for Collaborative Carrying tasks with Coupled Quadrupedal Robots	Francesca Bray et.al.	2603.23278	translate	read	null
2026-03-24	A Learning Method with Gap-Aware Generation for Heterogeneous DAG Scheduling	Ruisong Zhou et.al.	2603.23249	translate	read	null
2026-03-24	Neural ODE and SDE Models for Adaptation and Planning in Model-Based Reinforcement Learning	Chao Han et.al.	2603.23245	translate	read	null
2026-03-24	GEM: Guided Expectation-Maximization for Behavior-Normalized Candidate Action Selection in Offline RL	Haoyu Wang et.al.	2603.23232	translate	read	null
2026-03-24	ImplicitRM: Unbiased Reward Modeling from Implicit Preference Data for LLM alignment	Hao Wang et.al.	2603.23184	translate	read	null
2026-03-24	Path Planning and Reinforcement Learning-Driven Control of On-Orbit Free-Flying Multi-Arm Robots	Álvaro Belmonte-Baeza et.al.	2603.23182	translate	read	null
2026-03-24	Fault-Tolerant Design and Multi-Objective Model Checking for Real-Time Deep Reinforcement Learning Systems	Guoxin Su et.al.	2603.23113	translate	read	null
2026-03-24	SpecXMaster Technical Report	Yutang Ge et.al.	2603.23101	translate	read	null
2026-03-24	Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards	Orhun Buğra Baran et.al.	2603.23086	translate	read	null
2026-03-24	MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models	Jianxin Lin et.al.	2603.23085	translate	read	null
2026-03-24	Minimizing Material Waste in Additive Manufacturing through Online Reel Assignment	Ilayda Celenk et.al.	2603.23042	translate	read	null
2026-03-24	From Morality Installation in LLMs to LLMs in Morality-as-a-System	Gunter Bombaerts et.al.	2603.22944	translate	read	null
2026-03-24	Quality Over Clicks: Intrinsic Quality-Driven Iterative Reinforcement Learning for Cold-Start E-Commerce Query Suggestion	Qi Sun et.al.	2603.22922	translate	read	null
2026-03-24	EVA: Efficient Reinforcement Learning for End-to-End Video Agent	Yaolun Zhang et.al.	2603.22918	translate	read	null
2026-03-24	VLGOR: Visual-Language Knowledge Guided Offline Reinforcement Learning for Generalizable Agents	Pengsen Liu et.al.	2603.22892	translate	read	null
2026-03-24	Portfolio Optimization under Recursive Utility via Reinforcement Learning	Minkey Chang et.al.	2603.22880	translate	read	null
2026-03-24	Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models	Ruixing Jin et.al.	2603.22876	translate	read	null
2026-03-24	DecompGrind: A Decomposition Framework for Robotic Grinding via Cutting-Surface Planning and Contact-Force Adaptation	Shunsuke Araki et.al.	2603.22859	translate	read	null
2026-03-24	Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought	Yunheng Li et.al.	2603.22847	translate	read	null
2026-03-24	CoMaTrack: Competitive Multi-Agent Game-Theoretic Tracking with Vision-Language-Action Models	Youzhi Liu et.al.	2603.22846	translate	read	null
2026-03-24	Improving Safety Alignment via Balanced Direct Preference Optimization	Shiji Zhao et.al.	2603.22829	translate	read	null
2026-03-24	SG-VLA: Learning Spatially-Grounded Vision-Language-Action Models for Mobile Manipulation	Ruisen Tu et.al.	2603.22760	translate	read	null
2026-03-24	Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints	Tian Xu et.al.	2603.22713	translate	read	null
2026-03-23	Q-Tacit: Image Quality Assessment via Latent Visual Reasoning	Yuxuan Jiang et.al.	2603.22641	translate	read	null
2026-03-23	Privacy-Preserving Reinforcement Learning from Human Feedback via Decoupled Reward Modeling	Young Hyun Cho et.al.	2603.22563	translate	read	null
2026-03-23	Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion	Honglin He et.al.	2603.22527	translate	read	null
2026-03-23	Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs	Haoming Meng et.al.	2603.22446	translate	read	null
2026-03-23	CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation	Max Fu et.al.	2603.22435	translate	read	null
2026-03-23	Model Predictive Control with Differentiable World Models for Offline Reinforcement Learning	Rohan Deb et.al.	2603.22430	translate	read	null
2026-03-23	Learning When to Act: Interval-Aware Reinforcement Learning with Predictive Temporal Structure	Davide Di Gioia et.al.	2603.22384	translate	read	null
2026-03-22	WIST: Web-Grounded Iterative Self-Play Tree for Domain-Targeted Reasoning Improvement	Fangyuan Li et.al.	2603.22352	translate	read	null
2026-03-19	The Efficiency Attenuation Phenomenon: A Computational Challenge to the Language of Thought Hypothesis	Di Zhang et.al.	2603.22312	translate	read	null
2026-03-23	Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration	Zakaria Mhammedi et.al.	2603.22273	translate	read	null
2026-03-23	TiCo: Time-Controllable Training for Spoken Dialogue Models	Kai-Wei Chang et.al.	2603.22267	translate	read	null
2026-03-23	DexDrummer: In-Hand, Contact-Rich, and Long-Horizon Dexterous Robot Drumming	Hung-Chieh Fang et.al.	2603.22263	translate	read	null
2026-03-23	SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation	Sashuai Zhou et.al.	2603.22228	translate	read	null
2026-03-23	Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control	Qingrui Zhao et.al.	2603.22201	translate	read	null
2026-03-23	Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement	Junrong Guo et.al.	2603.22187	translate	read	null
2026-03-23	Cross-Modal Reinforcement Learning for Navigation with Degraded Depth Measurements	Omkar Sawant et.al.	2603.22182	translate	read	null
2026-03-23	Closed-Loop Verbal Reinforcement Learning for Task-Level Robotic Planning	Dmitrii Plotnikov et.al.	2603.22169	translate	read	null
2026-03-23	On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation	Kexin Huang et.al.	2603.22117	translate	read	null
2026-03-23	A Context Engineering Framework for Improving Enterprise AI Agents based on Digital-Twin MDP	Xi Yang et.al.	2603.22083	translate	read	null
2026-03-23	MEVIUS2: Practical Open-Source Quadruped Robot with Sheet Metal Welding and Multimodal Perception	Kento Kawaharazuka et.al.	2603.22031	translate	read	null
2026-03-23	TREX: Trajectory Explanations for Multi-Objective Reinforcement Learning	Dilina Rajapakse et.al.	2603.21988	translate	read	null
2026-03-23	Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe	Xixi Wu et.al.	2603.21972	translate	read	null
2026-03-23	Deep Reinforcement Learning and The Tale of Two Temporal Difference Errors	Juan Sebastian Rojas et.al.	2603.21921	translate	read	null
2026-03-23	P^2O: Joint Policy and Prompt Optimization	Xinyu Lu et.al.	2603.21877	translate	read	null
2026-03-23	Manifold-Aware Exploration for Reinforcement Learning in Video Generation	Mingzhe Zheng et.al.	2603.21872	translate	read	null
2026-03-23	Agentic Personas for Adaptive Scientific Explanations with Knowledge Graphs	Susana Nunes et.al.	2603.21846	translate	read	null
2026-03-23	Partial Attention in Deep Reinforcement Learning for Safe Multi-Agent Control	Turki Bin Mohaya et.al.	2603.21810	translate	read	null
2026-03-23	Image-Conditioned Adaptive Parameter Tuning for Visual Odometry Frontends	Simone Nascivera et.al.	2603.21785	translate	read	null
2026-03-23	CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement Learning	Dongxia Wu et.al.	2603.21743	translate	read	null
2026-03-23	EvoIdeator: Evolving Scientific Ideas through Checklist-Grounded Reinforcement Learning	Andreas Sauter et.al.	2603.21728	translate	read	null
2026-03-23	PPGL-Swarm: Integrated Multimodal Risk Stratification and Hereditary Syndrome Detection in Pheochromocytoma and Paraganglioma	Zelin Liu et.al.	2603.21700	translate	read	null
2026-03-23	TAMTRL: Teacher-Aligned Reward Reshaping for Multi-Turn Reinforcement Learning in Long-Context Compression	Li Wang et.al.	2603.21663	translate	read	null
2026-03-23	Proximal Policy Optimization in Path Space: A Schrödinger Bridge Perspective	Yuehu Gong et.al.	2603.21621	translate	read	null
2026-03-23	Spatio-Temporal Attention Enhanced Multi-Agent DRL for UAV-Assisted Wireless Networks with Limited Communications	Che Chen et.al.	2603.21594	translate	read	null
2026-03-23	Adaptive Robust Estimator for Multi-Agent Reinforcement Learning	Zhongyi Li et.al.	2603.21574	translate	read	null
2026-03-23	Counterfactual Credit Policy Optimization for Multi-Agent Collaboration	Zhongyi Li et.al.	2603.21563	translate	read	null
2026-03-23	What Do World Models Learn in RL? Probing Latent Representations in Learned Environment Simulators	Xinyu Zhang et.al.	2603.21546	translate	read	null
2026-03-23	VIGIL: Part-Grounded Structured Reasoning for Generalizable Deepfake Detection	Xinghan Li et.al.	2603.21526	translate	read	null
2026-03-23	Learning Can Converge Stably to the Wrong Belief under Latent Reliability	Zhipeng Zhang et.al.	2603.21491	translate	read	null
2026-03-23	DRTriton: Large-Scale Synthetic Data Reinforcement Learning for Triton Kernel Generation	Siqi Guo et.al.	2603.21465	translate	read	null
2026-03-22	KG-Hopper: Empowering Compact Open LLMs with Knowledge Graph Reasoning via Reinforcement Learning	Shuai Wang et.al.	2603.21440	translate	read	null
2026-03-22	Dynasto: Validity-Aware Dynamic-Static Parameter Optimization for Autonomous Driving Testing	Dmytro Humeniuk et.al.	2603.21427	translate	read	null
2026-03-22	PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost	Junkeun Yi et.al.	2603.21383	translate	read	null
2026-03-22	A transformer architecture alteration to incentivise externalised reasoning	Elizabeth Pavlova et.al.	2603.21376	translate	read	null
2026-03-22	RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models	Dongyoung Kim et.al.	2603.21341	translate	read	null
2026-03-22	FinRL-X: An AI-Native Modular Infrastructure for Quantitative Trading	Hongyang Yang et.al.	2603.21330	translate	read	null
2026-03-22	DeepXplain: XAI-Guided Autonomous Defense Against Multi-Stage APT Campaigns	Trung V. Phan et.al.	2603.21296	translate	read	null
2026-03-22	Prompt replay: speeding up GRPO with on-policy reuse of high-signal prompts	Andrei Baroian et.al.	2603.21177	translate	read	null
2026-03-22	Reward Sharpness-Aware Fine-Tuning for Diffusion Models	Kwanyoung Kim et.al.	2603.21175	translate	read	null
2026-03-22	Rethinking Plasticity in Deep Reinforcement Learning	Zhiqiang He et.al.	2603.21173	translate	read	null
2026-03-22	Revisiting Tree Search for LLMs: Gumbel and Sequential Halving for Budget-Scalable Reasoning	Leonid Ugadiarov et.al.	2603.21162	translate	read	null
2026-03-22	Incentivizing Generative Zero-Shot Learning via Outcome-Reward Reinforcement Learning with Visual Cues	Wenjin Hou et.al.	2603.21138	translate	read	null
2026-03-22	Anatomical Prior-Driven Framework for Autonomous Robotic Cardiac Ultrasound Standard View Acquisition	Zhiyan Cao et.al.	2603.21134	translate	read	null
2026-03-22	VisFly-Lab: Unified Differentiable Framework for First-Order Reinforcement Learning of Quadrotor Control	Fanxing Li et.al.	2603.21123	translate	read	null
2026-03-22	Learning to Optimize Joint Source and RIS-assisted Channel Encoding for Multi-User Semantic Communication Systems	Haidong Wang et.al.	2603.21097	translate	read	null
2026-03-22	DRL-driven Online Optimization for Joint Traffic Reshaping and Channel Reconfiguration in RIS-assisted Semantic NOMA Communications	Songhan Zhao et.al.	2603.21093	translate	read	null
2026-03-22	LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning	Jianing Wang et.al.	2603.21065	translate	read	null
2026-03-22	OrbitStream: Training-Free Adaptive 360-degree Video Streaming via Semantic Potential Fields	Aizierjiang Aiersilan et.al.	2603.20999	translate	read	null
2026-03-22	The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes	Benedikt Hornig et.al.	2603.20994	translate	read	null
2026-03-21	Cyber Deception for Mission Surveillance via Hypergame-Theoretic Deep Reinforcement Learning	Zelin Wan et.al.	2603.20981	translate	read	null
2026-03-21	Deep Adaptive Rate Allocation in Volatile Heterogeneous Wireless Networks	Gregorio Maglione et.al.	2603.20926	translate	read	null
2026-03-21	EruDiff: Refactoring Knowledge in Diffusion Models for Advanced Text-to-Image Synthesis	Xiefan Guo et.al.	2603.20828	translate	read	null
2026-03-21	RLVR Training of LLMs Does Not Improve Thinking Ability for General QA: Evaluation Method and a Simple Solution	Kaiyuan Li et.al.	2603.20799	translate	read	null
2026-03-21	Enhanced Direction-Sensing Methods and Performance Analysis in Low-Altitude Wireless Network via a Rotation Antenna Array	Jinbing Jiang et.al.	2603.20784	translate	read	null
2026-03-21	Decoupling Numerical and Structural Parameters: An Empirical Study on Adaptive Genetic Algorithms via Deep Reinforcement Learning for the Large-Scale TSP	Hongyu Wang et.al.	2603.20702	translate	read	null
2026-03-21	Clinical Cognition Alignment for Gastrointestinal Diagnosis with Multimodal LLMs	Huan Zheng et.al.	2603.20698	translate	read	null
2026-03-21	AI-Driven Multi-Agent Simulation of Stratified Polyamory Systems: A Computational Framework for Optimizing Social Reproductive Efficiency	Yicai Xing et.al.	2603.20678	translate	read	null
2026-03-21	Speedup Patch: Learning a Plug-and-Play Policy to Accelerate Embodied Manipulation	Zhichao Wu et.al.	2603.20658	translate	read	null
2026-03-21	Hierarchical Reinforcement Learning for Next Generation of Multi-AP Coordinated Spatial Reuse	Ziru Chen et.al.	2603.20647	translate	read	null
2026-03-21	Reinforcement Learning-Based Secure Near-field Directional Modulation Enhanced by Rotatable RIS	Yongqiang Li et.al.	2603.20608	translate	read	null
2026-03-21	Towards Practical World Model-based Reinforcement Learning for Vision-Language-Action Models	Zhilong Zhang et.al.	2603.20607	translate	read	null
2026-03-21	Current state of the multi-agent multi-view experimental and digital twin rendezvous (MMEDR-Autonomous) framework	Logan Banker et.al.	2603.20575	translate	read	null
2026-03-20	Delightful Distributed Policy Gradient	Ian Osband et.al.	2603.20521	translate	read	null
2026-03-20	Grounded Chess Reasoning in Language Models via Master Distillation	Zhenwei Tang et.al.	2603.20510	translate	read	null
2026-03-20	Fluid Antenna Networks Beyond Beamforming: An AI-Native Control Paradigm for 6G	Ian F. Akyildiz et.al.	2603.20484	translate	read	null
2026-03-20	Reinforcement Learning from Multi-Source Imperfect Preferences: Best-of-Both-Regimes Regret	Ming Shi et.al.	2603.20453	translate	read	null
2026-03-20	SymCircuit: Bayesian Structure Inference for Tractable Probabilistic Circuits via Entropy-Regularized Reinforcement Learning	Y. Sungtaek Ju et.al.	2603.20392	translate	read	null
2026-03-20	CAMA: Exploring Collusive Adversarial Attacks in c-MARL	Men Niu et.al.	2603.20390	translate	read	null
2026-03-20	Leum-VL Technical Report	Yuxuan He et.al.	2603.20354	translate	read	null
2026-03-20	Bounded Coupled AI Learning Dynamics in Tri-Hierarchical Drone Swarms	Oleksii Bychkov et.al.	2603.20333	translate	read	null
2026-03-19	MARLIN: Multi-Agent Reinforcement Learning for Incremental DAG Discovery	Dong Li et.al.	2603.20295	translate	read	null
2026-03-17	Learning Communication Between Heterogeneous Agents in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence	Alex Popa et.al.	2603.20279	translate	read	null
2026-03-20	AGILE: A Comprehensive Workflow for Humanoid Loco-Manipulation Learning	Huihua Zhao et.al.	2603.20147	translate	read	null
2026-03-20	Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning	Jiajie Li et.al.	2603.20116	translate	read	null
2026-03-20	Fine-tuning Timeseries Predictors Using Reinforcement Learning	Hugo Cazaux et.al.	2603.20063	translate	read	null
2026-03-20	Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs	Wenjian Zhang et.al.	2603.20046	translate	read	null
2026-03-20	ReViSQL: Achieving Human-Level Text-to-SQL	Yuxuan Zhu et.al.	2603.20004	translate	read	null
2026-03-20	Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States	Yurun Yuan et.al.	2603.19987	translate	read	null
2026-03-20	Interpreting Reinforcement Learning Model Behavior via Koopman with Control	William T. Redman et.al.	2603.19968	translate	read	null
2026-03-20	GustPilot: A Hierarchical DRL-INDI Framework for Wind-Resilient Quadrotor Navigation	Amir Atef Habel et.al.	2603.19966	translate	read	null
2026-03-20	SAGE: Sustainable Agent-Guided Expert-tuning for Culturally Attuned Translation in Low-Resource Southeast Asia	Zhixiang Lu et.al.	2603.19931	translate	read	null
2026-03-20	Robust Beam Codebooks for mmWave/THz Systems: Toward a Stochastic RL Approach	Anouar Nechi et.al.	2603.19930	translate	read	null
2026-03-20	Learning Adaptive Parameter Policies for Nonlinear Bayesian Filtering	Ondrej Straka et.al.	2603.19910	translate	read	null
2026-03-20	What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time	Dong Yan et.al.	2603.19880	translate	read	null
2026-03-20	NASimJax: GPU-Accelerated Policy Learning Framework for Penetration Testing	Raphael Simon et.al.	2603.19864	translate	read	null
2026-03-20	FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization	Chiyu Ma et.al.	2603.19835	translate	read	null
2026-03-20	Generalized Task-Driven Design of Soft Robots via Reduced-Order FEM-based Surrogate Modeling	Yao Yao et.al.	2603.19794	translate	read	null
2026-03-20	FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment	Kewen Zhu et.al.	2603.19741	translate	read	null
2026-03-20	LoopRPT: Reinforcement Pre-Training for Looped Language Models	Guo Tang et.al.	2603.19714	translate	read	null
2026-03-20	A Subgoal-driven Framework for Improving Long-Horizon LLM Agents	Taiyi Wang et.al.	2603.19685	translate	read	null
2026-03-20	Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis	Siddharth Chandak et.al.	2603.19648	translate	read	null
2026-03-20	ContractionPPO: Certified Reinforcement Learning via Differentiable Contraction Layers	Vrushabh Zinage et.al.	2603.19632	translate	read	null
2026-03-20	DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management	Yaqi Xie et.al.	2603.19621	translate	read	null
2026-03-20	SaFRO: Satisfaction-Aware Fusion via Dual-Relative Policy Optimization for Short-Video Search	Renzhe Zhou et.al.	2603.19585	translate	read	null
2026-03-20	PA2D-MORL: Pareto Ascent Directional Decomposition based Multi-Objective Reinforcement Learning	Tianmeng Hu et.al.	2603.19579	translate	read	null
2026-03-20	Learning to Bet for Horizon-Aware Anytime-Valid Testing	Ege Onur Taga et.al.	2603.19551	translate	read	null
2026-03-20	EvidenceRL: Reinforcing Evidence Consistency for Trustworthy Language Models	J. Ben Tamo et.al.	2603.19532	translate	read	null
2026-03-19	Stochastic Sequential Decision Making over Expanding Networks with Graph Filtering	Zhan Gao et.al.	2603.19501	translate	read	null
2026-03-19	Teaching an Agent to Sketch One Part at a Time	Xiaodan Du et.al.	2603.19500	translate	read	null
2026-03-19	Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids	Lucas Ferraz et.al.	2603.19473	translate	read	null
2026-03-19	ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models	Thomas De Min et.al.	2603.19466	translate	read	null
2026-03-19	Deep Hilbert–Galerkin Methods for Infinite-Dimensional PDEs and Optimal Control	Samuel N. Cohen et.al.	2603.19463	translate	read	null
2026-03-19	Cooperation and Exploitation in LLM Policy Synthesis for Sequential Social Dilemmas	Víctor Gallego et.al.	2603.19453	translate	read	null
2026-03-19	Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning	Xueqiao Peng et.al.	2603.19397	translate	read	null
2026-03-18	Goedel-Code-Prover: Hierarchical Proof Search for Open State-of-the-Art Code Verification	Zenan Li et.al.	2603.19329	translate	read	null
2026-03-19	OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards	Zehao Li et.al.	2603.19191	translate	read	null
2026-03-19	Markov Potential Game and Multi-Agent Reinforcement Learning for Autonomous Driving	Huiwen Yan et.al.	2603.19188	translate	read	null
2026-03-19	Box Maze: A Process-Control Architecture for Reliable LLM Reasoning	Zou Qiang et.al.	2603.19182	translate	read	null
2026-03-19	VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models	Chonghan Liu et.al.	2603.19152	translate	read	null
2026-03-19	Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control	Mohammad Al Ridhawi et.al.	2603.19136	translate	read	null
2026-03-19	Variational and Annealing-Based Approaches to Quantum Combinatorial Optimization	Hala Hawashin et.al.	2603.19117	translate	read	null
2026-03-19	Articulated-Body Dynamics Network: Dynamics-Grounded Prior for Robot Learning	Sangwoo Shin et.al.	2603.19078	translate	read	null
2026-03-19	MoRI: Learning Motivation-Grounded Reasoning for Scientific Ideation in Large Language Models	Chenyang Gu et.al.	2603.19044	translate	read	null
2026-03-19	CRAFT: Aligning Diffusion Models with Fine-Tuning Is Easier Than You Think	Zening Sun et.al.	2603.18991	translate	read	null
2026-03-19	Maximum-Entropy Exploration with Future State-Action Visitation Measures	Adrien Bolland et.al.	2603.18965	translate	read	null
2026-03-19	Context Bootstrapped Reinforcement Learning	Saaket Agashe et.al.	2603.18953	translate	read	null
2026-03-19	Safety-Guaranteed Imitation Learning from Nonlinear Model Predictive Control for Spacecraft Close Proximity Operations	Alexander Meinert et.al.	2603.18910	translate	read	null
2026-03-19	MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model	Youngwan Lee et.al.	2603.18892	translate	read	null
2026-03-19	Bridging Network Fragmentation: A Semantic-Augmented DRL Framework for UAV-aided VANETs	Gaoxiang Cao et.al.	2603.18871	translate	read	null
2026-03-19	RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models	Xiao Feng et.al.	2603.18859	translate	read	null
2026-03-19	Learn for Variation: Variationally Guided AAV Trajectory Learning in Differentiable Environments	Xiucheng Wang et.al.	2603.18853	translate	read	null
2026-03-19	ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents	Hao Zhang et.al.	2603.18815	translate	read	null
2026-03-19	V-Dreamer: Automating Robotic Simulation and Trajectory Synthesis via Video Generation Priors	Songjia He et.al.	2603.18811	translate	read	null
2026-03-19	Mi:dm K 2.5 Pro	KT Tech innovation Group et.al.	2603.18788	translate	read	null
2026-03-19	ViTac-Tracing: Visual-Tactile Imitation Learning of Deformable Object Tracing	Yongqiang Zhao et.al.	2603.18784	translate	read	null
2026-03-19	Automatic Configuration of LLM Post-Training Pipelines	Channe Chwa et.al.	2603.18773	translate	read	null
2026-03-19	Memento-Skills: Let Agents Design Agents	Huichi Zhou et.al.	2603.18743	translate	read	null
2026-03-19	CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks	Hao Wang et.al.	2603.18736	translate	read	null
2026-03-19	HISR: Hindsight Information Modulated Segmental Process Rewards For Multi-turn Agentic Reinforcement Learning	Zhicong Lu et.al.	2603.18683	translate	read	null
2026-03-19	Thinking with Constructions: A Benchmark and Policy Optimization for Visual-Text Interleaved Geometric Reasoning	Haokun Zhao et.al.	2603.18662	translate	read	null
2026-03-19	Balanced Thinking: Improving Chain of Thought Training in Vision Language Models	Shaked Perek et.al.	2603.18656	translate	read	null
2026-03-19	Learning to Self-Evolve	Xiaoyin Chen et.al.	2603.18620	translate	read	null
2026-03-19	iSatCR: Graph-Empowered Joint Onboard Computing and Routing for LEO Data Delivery	Jiangtao Luo et.al.	2603.18539	translate	read	null
2026-03-19	Balancing the Reasoning Load: Difficulty-Differentiated Policy Optimization with Length Redistribution for Efficient and Robust Reinforcement Learning	Yinan Xia et.al.	2603.18533	translate	read	null
2026-03-19	Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds	Andrew Choi et.al.	2603.18532	translate	read	null
2026-03-19	AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models	Chengxuan Lu et.al.	2603.18464	translate	read	null
2026-03-19	Discounted Beta–Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards	Haechan Kim et.al.	2603.18444	translate	read	null
2026-03-19	Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation	Asmita Bhardwaj et.al.	2603.18428	translate	read	null
2026-03-19	Efficient and Versatile Quadrupedal Skating: Optimal Co-design via Reinforcement Learning and Bayesian Optimization	Hanwen Wang et.al.	2603.18408	translate	read	null
2026-03-19	RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach	Yifan Zhang et.al.	2603.18396	translate	read	null
2026-03-19	Mathematical Foundations of Deep Learning	Xiaojing Ye et.al.	2603.18387	translate	read	null
2026-03-19	PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching	Ruishuo Chen et.al.	2603.18363	translate	read	null
2026-03-18	Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration	Amirhossein Roknilamouki et.al.	2603.18326	translate	read	null
2026-03-18	Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum	Nived Rajaraman et.al.	2603.18325	translate	read	null
2026-03-18	DriveVLM-RL: Neuroscience-Inspired Reinforcement Learning with Vision-Language Models for Safe and Deployable Autonomous Driving	Zilin Huang et.al.	2603.18315	translate	read	null
2026-03-18	Approximate Subgraph Matching with Neural Graph Representations and Reinforcement Learning	Kaiyang Li et.al.	2603.18314	translate	read	null
2026-03-18	Discovering What You Can Control: Interventional Boundary Discovery for Reinforcement Learning	Jiaxin Liu et.al.	2603.18257	translate	read	null
2026-03-18	MolRGen: A Training and Evaluation Setting for De Novo Molecular Generation with Reasoning Models	Philippe Formont et.al.	2603.18256	translate	read	null
2026-03-18	How Psychological Learning Paradigms Shaped and Constrained Artificial Intelligence	Alex Anvi Eponon et.al.	2603.18203	translate	read	null
2026-03-18	R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation	Naoki Morihira et.al.	2603.18202	translate	read	null
2026-03-18	Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models	Yuhao Dong et.al.	2603.18118	translate	read	null
2026-03-18	BoundAD: Boundary-Aware Negative Generation for Time Series Anomaly Detection	Xiancheng Wang et.al.	2603.18111	translate	read	null
2026-03-18	Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner	Hao Ma et.al.	2603.18088	translate	read	null
2026-03-18	Uncovering Latent Phase Structures and Branching Logic in Locomotion Policies: A Case Study on HalfCheetah	Daisuke Yasui et.al.	2603.18084	translate	read	null
2026-03-18	SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training	Prince Zizhuang Wang et.al.	2603.18079	translate	read	null
2026-03-18	Lightweight Adaptation for LLM-based Technical Service Agent: Latent Logic Augmentation and Robust Noise Reduction	Yi Yu et.al.	2603.18074	translate	read	null
2026-03-18	Reinforcement Learning for Fast and Robust Longitudinal Qubit Readout	Yiming Yu et.al.	2603.18060	translate	read	null
2026-03-18	Unified Policy Value Decomposition for Rapid Adaptation	Cristiano Capone et.al.	2603.17947	translate	read	null
2026-03-18	Training Diffusion Language Models for Black-Box Optimization	Zipeng Sun et.al.	2603.17919	translate	read	null
2026-03-18	Operator-Theoretic Foundations and Policy Gradient Methods for General MDPs with Unbounded Costs	Abhishek Gupta et.al.	2603.17875	translate	read	null
2026-03-18	Procedural Generation of Algorithm Discovery Tasks in Machine Learning	Alexander D. Goldie et.al.	2603.17863	translate	read	null
2026-03-18	Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control	Zunzhe Zhang et.al.	2603.17834	translate	read	null
2026-03-18	CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents	Lintang Sutawika et.al.	2603.17829	translate	read	null
2026-03-18	Federated Distributional Reinforcement Learning with Distributional Critic Regularization	David Millard et.al.	2603.17820	translate	read	null
2026-03-18	EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards	Ruixiang Wang et.al.	2603.17808	translate	read	null
2026-03-18	CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution	Teng Pan et.al.	2603.17775	translate	read	null
2026-03-18	Fast stabilizer state preparation via AI-optimized graph decimation	Michael Doherty et.al.	2603.17743	translate	read	null
2026-03-18	VolumeDP: Modeling Volumetric Representation for Manipulation Policy Learning	Tianxing Zhou et.al.	2603.17720	translate	read	null
2026-03-18	Machine Learning for Network Attacks Classification and Statistical Evaluation of Machine Learning for Network Attacks Classification and Adversarial Learning Methodologies for Synthetic Data Generation	Iakovos-Christos Zarkadis et.al.	2603.17717	translate	read	null
2026-03-18	Flow Matching Policy with Entropy Regularization	Ting Gao et.al.	2603.17685	translate	read	null
2026-03-18	Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards	Philipp Normann et.al.	2603.17673	translate	read	null
2026-03-18	Benchmarking Reinforcement Learning via Stochastic Converse Optimality: Generating Systems with Known Optimal Policies	Sinan Ibrahim et.al.	2603.17631	translate	read	null
2026-03-18	Complementary Reinforcement Learning	Dilxat Muhtar et.al.	2603.17621	translate	read	null
2026-03-18	From Isolated Scoring to Collaborative Ranking: A Comparison-Native Framework for LLM-Based Paper Evaluation	Pujun Zheng et.al.	2603.17588	translate	read	null
2026-03-18	Interpreting Context-Aware Human Preferences for Multi-Objective Robot Navigation	Tharun Sethuraman et.al.	2603.17510	translate	read	null
2026-03-18	Efficient Soft Actor-Critic with LLM-Based Action-Level Guidance for Continuous Control	Hao Ma et.al.	2603.17468	translate	read	null
2026-03-18	AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization	Dailan He et.al.	2603.17461	translate	read	null
2026-03-18	CRE-T1 Preview Technical Report: Beyond Contrastive Learning for Reasoning-Intensive Retrieval	Guangzhi Wang et.al.	2603.17387	translate	read	null
2026-03-18	Efficient Exploration at Scale	Seyed Mohammad Asghari et.al.	2603.17378	translate	read	null
2026-03-18	EvoGuard: An Extensible Agentic RL-based Framework for Practical and Evolving AI-Generated Image Detection	Chenyang Zhu et.al.	2603.17343	translate	read	null
2026-03-18	A Progressive Visual-Logic-Aligned Framework for Ride-Hailing Adjudication	Weiming Wu et.al.	2603.17328	translate	read	null
2026-03-18	ShuttleEnv: An Interactive Data-Driven RL Environment for Badminton Strategy Modeling	Ang Li et.al.	2603.17324	translate	read	null
2026-03-18	Physics-informed offline reinforcement learning eliminates catastrophic fuel waste in maritime routing	Aniruddha Bora et.al.	2603.17319	translate	read	null
2026-03-18	Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress	Yuelin Zhang et.al.	2603.17312	translate	read	null
2026-03-18	Ruyi2.5 Technical Report	Huan Song et.al.	2603.17311	translate	read	null
2026-03-18	InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning	Chengwei Wei et.al.	2603.17310	translate	read	null
2026-03-18	ReLMXEL: Adaptive RL-Based Memory Controller with Explainable Energy and Latency Optimization	Panuganti Chirag Sai et.al.	2603.17309	translate	read	null
2026-03-18	Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations	Haozheng Luo et.al.	2603.17305	translate	read	null
2026-03-18	WINFlowNets: Warm-up Integrated Networks Training of Generative Flow Networks for Robotics and Machine Fault Adaptation	Zahin Sufiyan et.al.	2603.17301	translate	read	null
2026-03-18	Network and Device Level Cyber Deception for Contested Environments Using RL and LLMs	Abhijeet Sahu et.al.	2603.17272	translate	read	null
2026-03-18	Adaptive Anchor Policies for Efficient 4D Gaussian Streaming	Ashim Dahal et.al.	2603.17227	translate	read	null
2026-03-17	MetaClaw: Just Talk – An Agent That Meta-Learns and Evolves in the Wild	Peng Xia et.al.	2603.17187	translate	read	null
2026-03-17	Shielded Reinforcement Learning Under Dynamic Temporal Logic Constraints	Sadık Bera Yüksel et.al.	2603.17152	translate	read	null
2026-03-17	REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge	Yasi Zhang et.al.	2603.17145	translate	read	null
2026-03-17	SLowRL: Safe Low-Rank Adaptation Reinforcement Learning for Locomotion	Elham Daneshmand et.al.	2603.17092	translate	read	null
2026-03-17	CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning	Weikun K. Zhang et.al.	2603.17075	translate	read	null
2026-03-17	PaAgent: Portrait-Aware Image Restoration Agent via Subjective-Objective Reinforcement Learning	Yijian Wang et.al.	2603.17055	translate	read	null
2026-03-17	Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models	Songchun Zhang et.al.	2603.17051	translate	read	null
2026-03-17	HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning	Shenzhi Wang et.al.	2603.17024	translate	read	null
2026-03-17	Efficient and Reliable Teleoperation through Real-to-Sim-to-Real Shared Autonomy	Shuo Sha et.al.	2603.17016	translate	read	null
2026-03-17	Rewarding DINO: Predicting Dense Rewards with Vision Foundation Models	Pierre Krack et.al.	2603.16978	translate	read	null
2026-03-17	DeepStage: Learning Autonomous Defense Policies Against Multi-Stage APT Campaigns	Trung V. Phan et.al.	2603.16969	translate	read	null
2026-03-17	Efficient Reasoning on the Edge	Yelysei Bondarenko et.al.	2603.16867	translate	read	null
2026-03-17	DreamPlan: Efficient Reinforcement Fine-Tuning of Vision-Language Planners via Video World Models	Emily Yue-Ting Jia et.al.	2603.16860	translate	read	null
2026-03-17	Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning	Jello Zhou et.al.	2603.16842	translate	read	null
2026-03-17	Learning to Present: Inverse Specification Rewards for Agentic Slide Generation	Karthik Ragunath Ananda Kumar et.al.	2603.16839	translate	read	null
2026-03-17	Deep Reinforcement Learning-driven Edge Offloading for Latency-constrained XR pipelines	Sourya Saha et.al.	2603.16823	translate	read	null
2026-03-17	Anticipatory Planning for Multimodal AI Agents	Yongyuan Liang et.al.	2603.16777	translate	read	null
2026-03-16	GDPO-SR: Group Direct Preference Optimization for One-Step Generative Image Super-Resolution	Qiaosi Yi et.al.	2603.16769	translate	read	null
2026-03-17	Learning Whole-Body Control for a Salamander Robot	Mengze Tian et.al.	2603.16683	translate	read	null
2026-03-17	When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making	Jun Liu et.al.	2603.16673	translate	read	null
2026-03-17	What if Pinocchio Were a Reinforcement Learning Agent: A Normative End-to-End Pipeline	Benoît Alcaraz et.al.	2603.16651	translate	read	null
2026-03-17	Rationale Matters: Learning Transferable Rubrics via Proxy-Guided Critique for VLM Reward Models	Weijie Qiu et.al.	2603.16600	translate	read	null
2026-03-17	When and Why Does Unsupervised RL Succeed in Mathematical Reasoning? A Manifold Envelopment Perspective	Zelin Zhang et.al.	2603.16578	translate	read	null
2026-03-17	EmoLLM: Appraisal-Grounded Cognitive-Emotional Co-Reasoning in Large Language Models	Yifei Zhang et.al.	2603.16553	translate	read	null
2026-03-17	Kamino: GPU-based Massively Parallel Simulation of Multi-Body Systems with Challenging Topologies	Vassilios Tsounis et.al.	2603.16536	translate	read	null
2026-03-17	From the Inside Out: Progressive Distribution Refinement for Confidence Calibration	Xizhong Yang et.al.	2603.16500	translate	read	null
2026-03-17	Multi-Agent Reinforcement Learning Counteracts Delayed CSI in Multi-Satellite Systems	Marios Aristodemou et.al.	2603.16470	translate	read	null
2026-03-17	Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition	Yu Liu et.al.	2603.16463	translate	read	null
2026-03-17	Agentic AI for SAGIN Resource Management: Semantic Awareness, Orchestration, and Optimization	Linghao Zhang et.al.	2603.16458	translate	read	null
2026-03-17	TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas	Ai Jian et.al.	2603.16448	translate	read	null
2026-03-17	Via Negativa for AI Alignment: Why Negative Constraints Are Structurally Superior to Positive Preferences	Quan Cheng et.al.	2603.16417	translate	read	null
2026-03-17	Onboard MuJoCo-based Model Predictive Control for Shipboard Crane with Double-Pendulum Sway Suppression	Oscar Pang et.al.	2603.16407	translate	read	null
2026-03-17	Deep Reinforcement Learning-Assisted Automated Operator Portfolio for Constrained Multi-objective Optimization	Shuai Shao et.al.	2603.16401	translate	read	null
2026-03-17	Controlling Fish Schools via Reinforcement Learning of Virtual Fish Movement	Yusuke Nishii et.al.	2603.16384	translate	read	null
2026-03-17	Agile Interception of a Flying Target using Competitive Reinforcement Learning	Timothée Gavin et.al.	2603.16279	translate	read	null
2026-03-17	VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment	Tengjiao Yin et.al.	2603.16271	translate	read	null
2026-03-17	Dual Consensus: Escaping from Spurious Majority in Unsupervised RLVR via Two-Stage Vote Mechanism	Kaixuan Du et.al.	2603.16223	translate	read	null
2026-03-17	Offline Exploration-Aware Fine-Tuning for Long-Chain Mathematical Reasoning	Yongyu Mu et.al.	2603.16206	translate	read	null
2026-03-17	Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning	Haomin Wang et.al.	2603.16189	translate	read	null
2026-03-17	ECHO: Edge-Cloud Humanoid Orchestration for Language-to-Motion Control	Haozhe Jia et.al.	2603.16188	translate	read	null
2026-03-17	Task-Specified Compliance Bounds for Humanoids via Lipschitz-Constrained Policies	Zewen He et.al.	2603.16180	translate	read	null
2026-03-17	SQL-ASTRA: Alleviating Sparse Feedback in Agentic SQL via Column-Set Matching and Trajectory Aggregation	Long Li et.al.	2603.16161	translate	read	null
2026-03-17	Execution-Grounded Credit Assignment for GRPO in Code Generation	Abhijit Kumar et.al.	2603.16158	translate	read	null
2026-03-17	DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay	Long Li et.al.	2603.16157	translate	read	null
2026-03-17	HIPO: Instruction Hierarchy via Constrained Reinforcement Learning	Keru Chen et.al.	2603.16152	translate	read	null
2026-03-17	Communication-Aware Multi-Agent Reinforcement Learning for Decentralized Cooperative UAV Deployment	Enguang Fan et.al.	2603.16141	translate	read	null
2026-03-17	Noisy Data is Destructive to Reinforcement Learning with Verifiable Rewards	Yuxuan Zhu et.al.	2603.16140	translate	read	null
2026-03-17	SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding	Songcheng Cai et.al.	2603.16124	translate	read	null
2026-03-17	Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models	Yanru Wu et.al.	2603.16065	translate	read	null
2026-03-17	ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning	Yu Li et.al.	2603.16060	translate	read	null
2026-03-17	Collaborative Temporal Feature Generation via Critic-Free Reinforcement Learning for Cross-User Sensor-Based Activity Recognition	Xiaozhou Ye et.al.	2603.16043	translate	read	null
2026-03-16	Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning	Jingxiang Chen et.al.	2603.15981	translate	read	null
2026-03-16	ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors	Zifan Xu et.al.	2603.15956	translate	read	null
2026-03-16	Game-Theory-Assisted Reinforcement Learning for Border Defense: Early Termination based on Analytical Solutions	Goutam Das et.al.	2603.15907	translate	read	null
2026-03-16	Counteractive RL: Rethinking Core Principles for Efficient and Scalable Deep Reinforcement Learning	Ezgi Korkmaz et.al.	2603.15871	translate	read	null
2026-03-16	Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning	Patrick Yin et.al.	2603.15789	translate	read	null
2026-03-16	CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving	Yihong Guo et.al.	2603.15771	translate	read	null
2026-03-16	Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation	Jacob Levy et.al.	2603.15759	translate	read	null
2026-03-16	Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models	Lit Sin Tan et.al.	2603.15724	translate	read	null
2026-03-16	BadLLM-TG: A Backdoor Defender powered by LLM Trigger Generator	Ruyi Zhang et.al.	2603.15692	translate	read	null
2026-03-16	GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering	Xincheng Shuai et.al.	2603.15616	translate	read	null
2026-03-16	HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions	Yukang Cao et.al.	2603.15612	translate	read	null
2026-03-16	Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning	Aozhe Wang et.al.	2603.15611	translate	read	null
2026-03-16	From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation	Yibin Liu et.al.	2603.15600	translate	read	null
2026-03-16	Unbiased and Biased Variance-Reduced Forward-Reflected-Backward Splitting Methods for Stochastic Composite Inclusions	Quoc Tran-Dinh et.al.	2603.15576	translate	read	null
2026-03-16	Deep Reinforcement Learning for Fano Hypersurfaces	Marc Truter et.al.	2603.15437	translate	read	null
2026-03-16	Listening to the Echo: User-Reaction Aware Policy Optimization via Scalar-Verbal Hybrid Reinforcement Learning	Jing Ye et.al.	2603.15434	translate	read	null
2026-03-16	Gym-V: A Unified Vision Environment System for Agentic Vision Research	Fanqing Meng et.al.	2603.15432	translate	read	null
2026-03-16	MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings	Shahil Shaik et.al.	2603.15418	translate	read	null
2026-03-16	Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities	Vanshaj Khattar et.al.	2603.15417	translate	read	null
2026-03-16	Fusian: Multi-LoRA Fusion for Fine-Grained Continuous MBTI Personality Control in Large Language Models	Zehao Chen et.al.	2603.15405	translate	read	null
2026-03-16	Trajectory-Diversity-Driven Robust Vision-and-Language Navigation	Jiangyang Li et.al.	2603.15370	translate	read	null
2026-03-16	NavThinker: Action-Conditioned World Models for Coupled Prediction and Planning in Social Navigation	Tianshuai Hu et.al.	2603.15359	translate	read	null
2026-03-16	Evaluating the Robustness of Reinforcement Learning based Adaptive Traffic Signal Control	Dickens Kwesiga et.al.	2603.15283	translate	read	null
2026-03-16	MoE-ACT: Scaling Multi-Task Bimanual Manipulation with Sparse Language-Conditioned Mixture-of-Experts Transformers	Kangjun Guo et.al.	2603.15265	translate	read	null
2026-03-16	Probe-then-Plan: Environment-Aware Planning for Industrial E-commerce Search	Mengxiang Chen et.al.	2603.15262	translate	read	null
2026-03-16	SAGE: Multi-Agent Self-Evolution for LLM Reasoning	Yulin Peng et.al.	2603.15255	translate	read	null
2026-03-16	Towards Foundation Models for Consensus Rank Aggregation	Yijun Jin et.al.	2603.15218	translate	read	null
2026-03-16	What Matters for Scalable and Robust Learning in End-to-End Driving Planners?	David Holtz et.al.	2603.15185	translate	read	null
2026-03-16	Iterative Learning Control-Informed Reinforcement Learning for Batch Process Control	Runze Lin et.al.	2603.15180	translate	read	null
2026-03-16	KiRAS: Keyframe Guided Self-Imitation for Robust and Adaptive Skill Learning in Quadruped Robots	Xiaoyi Wei et.al.	2603.15179	translate	read	null
2026-03-16	Multi-Scale Control of Large Agent Populations: From Density Dynamics to Individual Actuation	Mario di Bernardo et.al.	2603.15160	translate	read	null
2026-03-16	Master Micro Residual Correction with Adaptive Tactile Fusion and Force-Mixed Control for Contact-Rich Manipulation	Xingting Li et.al.	2603.15152	translate	read	null
2026-03-16	Safe Flow Q-Learning: Offline Safe Reinforcement Learning with Reachability-Based Flow Policies	Mumuksh Tayal et.al.	2603.15136	translate	read	null
2026-03-16	MMKU-Bench: A Multimodal Update Benchmark for Diverse Visual Knowledge	Baochen Fu et.al.	2603.15117	translate	read	null
2026-03-16	Sampling-guided exploration of active feature selection policies	Gabriel Bernardino et.al.	2603.15110	translate	read	null
2026-03-16	HALO: Closing Sim-to-Real Gap for Heavy-loaded Humanoid Agile Motion Skills via Differentiable Simulation	Xingyi Wang et.al.	2603.15084	translate	read	null
2026-03-16	Writer-R1: Enhancing Generative Writing in LLMs via Memory-augmented Replay Policy Optimization	Jihao Zhao et.al.	2603.15061	translate	read	null
2026-03-16	Interference-Aware K-Step Reachable Communication in Multi-Agent Reinforcement Learning	Ziyu Cheng et.al.	2603.15054	translate	read	null
2026-03-16	CycleRL: Sim-to-Real Deep Reinforcement Learning for Robust Autonomous Bicycle Control	Gelu Liu et.al.	2603.15013	translate	read	null
2026-03-16	Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing	Jiahe Song et.al.	2603.15011	translate	read	null
2026-03-16	CyCLeGen: Cycle-Consistent Layout Prediction and Image Generation in Vision Foundation Models	Xiaojun Shan et.al.	2603.14957	translate	read	null
2026-03-16	EditHF-1M: A Million-Scale Rich Human Preference Feedback for Image Editing	Zitong Xu et.al.	2603.14916	translate	read	null
2026-03-16	PerlAD: Towards Enhanced Closed-loop End-to-end Autonomous Driving with Pseudo-simulation-based Reinforcement Learning	Yinfeng Gao et.al.	2603.14908	translate	read	null
2026-03-16	ViSA: Visited-State Augmentation for Generalized Goal-Space Contrastive Reinforcement Learning	Issa Nakamura et.al.	2603.14887	translate	read	null
2026-03-16	Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning	Mikoto Kudo et.al.	2603.14867	translate	read	null
2026-03-16	Shopping Companion: A Memory-Augmented LLM Agent for Real-World E-Commerce Tasks	Zijian Yu et.al.	2603.14864	translate	read	null
2026-03-16	Ego to World: Collaborative Spatial Reasoning in Embodied Systems via Reinforcement Learning	Heng Zhou et.al.	2603.14811	translate	read	null
2026-03-16	DeFRiS: Silo-Cooperative IoT Applications Scheduling via Decentralized Federated Reinforcement Learning	Zhiyu Wang et.al.	2603.14729	translate	read	null
2026-03-15	VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting	Daeun Lee et.al.	2603.14659	translate	read	null
2026-03-15	EcoFair-CH-MARL: Scalable Constrained Hierarchical Multi-Agent RL with Real-Time Emission Budgets and Fairness Guarantees	Saad Alqithami et.al.	2603.14625	translate	read	null
2026-03-15	A Loss Landscape Visualization Framework for Interpreting Reinforcement Learning: An ADHDP Case Study	Jingyi Liu et.al.	2603.14600	translate	read	null
2026-03-15	Adapting Critic Match Loss Landscape Visualization to Off-policy Reinforcement Learning	Jingyi Liu et.al.	2603.14589	translate	read	null
2026-03-15	Machine Learning-Driven Intelligent Memory System Design: From On-Chip Caches to Storage	Rahul Bera et.al.	2603.14583	translate	read	null
2026-03-15	MorFiC: Fixing Value Miscalibration for Zero-Shot Quadruped Transfer	Prakhar Mishra et.al.	2603.14554	translate	read	null
2026-03-15	Visualizing Critic Match Loss Landscapes for Interpretation of Online Reinforcement Learning Control Algorithms	Jingyi Liu et.al.	2603.14535	translate	read	null
2026-03-15	VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning	Chaoyang Wang et.al.	2603.14523	translate	read	null
2026-03-15	AI Can Learn Scientific Taste	Jingqi Tong et.al.	2603.14473	translate	read	link
2026-03-15	Physics-Informed Policy Optimization via Analytic Dynamics Regularization	Namai Chandra et.al.	2603.14469	translate	read	null
2026-03-15	eNavi: Event-based Imitation Policies for Low-Light Indoor Mobile Robot Navigation	Prithvi Jai Ramesh et.al.	2603.14397	translate	read	null
2026-03-15	From $\boldsymbol{\log\pi}$ to $\boldsymbol{\pi}$: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight	Xiaoliang Fu et.al.	2603.14389	translate	read	null
2026-03-15	SPARQ: Spiking Early-Exit Neural Networks for Energy-Efficient Edge AI	Parth Patne et.al.	2603.14380	translate	read	null
2026-03-15	Exposing Long-Tail Safety Failures in Large Language Models through Efficient Diverse Response Sampling	Suvadeep Hajra et.al.	2603.14355	translate	read	null
2026-03-15	VIP-Loco: A Visually Guided Infinite Horizon Planning Framework for Legged Locomotion	Aditya Shirwatkar et.al.	2603.14345	translate	read	null
2026-03-15	AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models	Jiarui Zhang et.al.	2603.14342	translate	read	null
2026-03-15	Data-Driven Physics Embedded Dynamics with Predictive Control and Reinforcement Learning for Quadrupeds	Prakrut Kotecha et.al.	2603.14333	translate	read	null
2026-03-15	Load-Aware Locomotion Control for Humanoid Robots in Industrial Transportation Tasks	Lequn Fu et.al.	2603.14308	translate	read	null
2026-03-15	RL-ScanIQA: Reinforcement-Learned Scanpaths for Blind 360° Image Quality Assessment	Yujia Wang et.al.	2603.14297	translate	read	null
2026-03-15	MistExit: Learning to Exit for Early Mistake Detection in Procedural Videos	Sagnik Majumder et.al.	2603.14252	translate	read	null
2026-03-15	GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies	He Zhang et.al.	2603.14245	translate	read	null
2026-03-15	Understanding Strategic Platform Entry and Seller Exploration: A Stackelberg Model	Garrett Seo et.al.	2603.14206	translate	read	null
2026-03-12	HumDex: Humanoid Dexterous Manipulation Made Easy	Liang Heng et.al.	2603.12260	translate	read	null
2026-03-12	DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning	Yujie Wei et.al.	2603.12257	translate	read	null
2026-03-12	Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing	Baifeng Shi et.al.	2603.12254	translate	read	null
2026-03-12	Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation	Xiangyu Zhao et.al.	2603.12247	translate	read	null
2026-03-12	Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training	Yixin Liu et.al.	2603.12246	translate	read	null
2026-03-12	Separable neural architectures as a primitive for unified predictive and generative intelligence	Reza T. Batley et.al.	2603.12244	translate	read	null
2026-03-12	HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies	Amber Xie et.al.	2603.12243	translate	read	null
2026-03-12	Integrated Online Monitoring and Adaption of Process Model Predictive Controllers	Samuel Mallick et.al.	2603.12187	translate	read	null
2026-03-12	LatentGeo: Learnable Auxiliary Constructions in Latent Space for Multimodal Geometric Reasoning	Haiying Xu et.al.	2603.12166	translate	read	null
2026-03-12	IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL	Zhoujun Cheng et.al.	2603.12151	translate	read	null
2026-03-12	Linking Perception, Confidence and Accuracy in MLLMs	Yuetian Du et.al.	2603.12149	translate	read	null
2026-03-12	EgoIntent: An Egocentric Step-level Benchmark for Understanding What, Why, and Next	Ye Pan et.al.	2603.12147	translate	read	null
2026-03-12	Automatic Generation of High-Performance RL Environments	Seth Karten et.al.	2603.12145	translate	read	null
2026-03-12	Increasing intelligence in AI agents can worsen collective outcomes	Neil F. Johnson et.al.	2603.12129	translate	read	null
2026-03-12	Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives	Taeho Lee et.al.	2603.12110	translate	read	null
2026-03-12	On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents	Deyu Zou et.al.	2603.12109	translate	read	null
2026-03-12	A Robust and Efficient Multi-Agent Reinforcement Learning Framework for Traffic Signal Control	Sheng-You Huang et.al.	2603.12096	translate	read	null
2026-03-12	Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics	Ming-Hong Chen et.al.	2603.12087	translate	read	null
2026-03-12	AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling	Hamed Hamzeh et.al.	2603.12031	translate	read	null
2026-03-12	Sim-to-reality adaptation for Deep Reinforcement Learning applied to an underwater docking application	Alaaeddine Chaarani et.al.	2603.12020	translate	read	null
2026-03-12	Learning Visuomotor Policy for Multi-Robot Laser Tag Game	Kai Li et.al.	2603.11980	translate	read	null
2026-03-12	FlexRec: Adapting LLM-based Recommenders for Flexible Needs via Reinforcement Learning	Yijun Pan et.al.	2603.11901	translate	read	null
2026-03-12	The price of decentralization in managing engineering systems through multi-agent reinforcement learning	Prateek Bhustali et.al.	2603.11884	translate	read	null
2026-03-12	Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language	Remigiusz Kinas et.al.	2603.11881	translate	read	null
2026-03-12	Hybrid Human-Agent Social Dilemmas in Energy Markets	Isuri Perera et.al.	2603.11834	translate	read	null
2026-03-12	Towards High-Fidelity CAD Generation via LLM-Driven Program Generation and Text-Based B-Rep Primitive Grounding	Jiahao Li et.al.	2603.11831	translate	read	null
2026-03-12	RADAR: Closed-Loop Robotic Data Generation via Semantic Planning and Autonomous Causal Environment Reset	Yongzhong Wang et.al.	2603.11811	translate	read	null
2026-03-12	Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach	Erfan Mirzaei et.al.	2603.11757	translate	read	null
2026-03-12	STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure Transformer for Offline Multi-task Multi-agent Reinforcement Learning	Jiwon Jeon et.al.	2603.11691	translate	read	null
2026-03-12	Entropy-Preserving Reinforcement Learning	Aleksei Petrenko et.al.	2603.11682	translate	read	null
2026-03-12	Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge	Junjie Wu et.al.	2603.11665	translate	read	null
2026-03-12	Resonate: Reinforcing Text-to-Audio Generation via Online Feedback from Large Audio Language Models	Xiquan Li et.al.	2603.11661	translate	read	null
2026-03-12	Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning	Jiaheng Hu et.al.	2603.11653	translate	read	null
2026-03-12	Diversity You Can Actually Measure: A Fast, Model-Free Diversity Metric for Robotics Datasets	Sreevardhan Sirigiri et.al.	2603.11634	translate	read	null
2026-03-12	Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization	Qijun Liao et.al.	2603.11600	translate	read	null
2026-03-12	WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing	Hui Zhang et.al.	2603.11593	translate	read	null
2026-03-12	Multi-Agent Reinforcement Learning for UAV-Based Chemical Plume Source Localization	Zhirun Li et.al.	2603.11582	translate	read	null
2026-03-12	SVLL: Staged Vision-Language Learning for Physically Grounded Embodied Task Planning	Yuyuan Yang et.al.	2603.11563	translate	read	null
2026-03-12	NFPO: Stabilized Policy Optimization of Normalizing Flow for Robotic Policy Learning	Diyuan Shi et.al.	2603.11470	translate	read	null
2026-03-12	Adversarial Reinforcement Learning for Detecting False Data Injection Attacks in Vehicular Routing	Taha Eghtesad et.al.	2603.11433	translate	read	null
2026-03-12	ARROW: Augmented Replay for RObust World models	Abdulaziz Alyahya et.al.	2603.11395	translate	read	null
2026-03-12	SliceFed: Federated Constrained Multi-Agent DRL for Dynamic Spectrum Slicing in 6G	Hossein Mohammadi et.al.	2603.11390	translate	read	null
2026-03-11	Ensuring Safety in Automated Mechanical Ventilation through Offline Reinforcement Learning and Digital Twin Verification	Hang Yu et.al.	2603.11372	translate	read	null
2026-03-11	abx_amr_simulator: A simulation environment for antibiotic prescribing policy optimization under antimicrobial resistance	Joyce Lee et.al.	2603.11369	translate	read	null
2026-03-11	Novelty Adaptation Through Hybrid Large Language Model (LLM)-Symbolic Planning and LLM-guided Reinforcement Learning	Hong Lu et.al.	2603.11351	translate	read	null
2026-03-11	Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning	Yuto Shibata et.al.	2603.11346	translate	read	null
2026-03-11	Meta-Reinforcement Learning with Self-Reflection for Agentic Search	Teng Xiao et.al.	2603.11327	translate	read	null
2026-03-11	Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings	Yuning Wu et.al.	2603.11321	translate	read	null
2026-03-11	ExecVerify: White-Box RL with Verifiable Stepwise Rewards for Code Execution Reasoning	Lingxiao Tang et.al.	2603.11226	translate	read	null
2026-03-11	Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning	Yuehao Song et.al.	2603.11219	translate	read	null
2026-03-11	DeReason: A Difficulty-Aware Curriculum Improves Decoupled SFT-then-RL Training for General Reasoning	Hanxu Hu et.al.	2603.11193	translate	read	null
2026-03-11	Learning to Unscramble: Simplifying Symbolic Expressions via Self-Supervised Oracle Trajectories	David Shih et.al.	2603.11164	translate	read	null
2026-03-11	Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion	Yuanhong Wu et.al.	2603.11126	translate	read	null
2026-03-11	Learning Tree-Based Models with Gradient Descent	Sascha Marton et.al.	2603.11117	translate	read	null
2026-03-11	ResWM: Residual-Action World Model for Visual RL	Jseen Zhang et.al.	2603.11110	translate	read	null
2026-03-11	RC-NF: Robot-Conditioned Normalizing Flow for Real-Time Anomaly Detection in Robotic Manipulation	Shijie Zhou et.al.	2603.11106	translate	read	null
2026-03-11	Learning Adaptive Force Control for Contact-Rich Sample Scraping with Heterogeneous Materials	Cenk Cetin et.al.	2603.10979	translate	read	null
2026-03-11	Contact Coverage-Guided Exploration for General-Purpose Dexterous Manipulation	Zixuan Liu et.al.	2603.10971	translate	read	null
2026-03-11	Safe RLHF Beyond Expectation: Stochastic Dominance for Universal Spectral Risk Control	Yaswanth Chittepu et.al.	2603.10938	translate	read	null
2026-03-11	Lifelong Imitation Learning with Multimodal Latent Replay and Incremental Adjustment	Fanqi Yu et.al.	2603.10929	translate	read	null
2026-03-11	Ergodicity in reinforcement learning	Dominik Baumann et.al.	2603.10895	translate	read	null
2026-03-11	Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models	Yixiu Mao et.al.	2603.10887	translate	read	null
2026-03-11	RL-Augmented MPC for Non-Gaited Legged and Hybrid Locomotion	Andrea Patrizi et.al.	2603.10878	translate	read	null
2026-03-11	$V_{0.5}$: Generalist Value Model as a Prior for Sparse RL Rollouts	Yi-Kai Zhang et.al.	2603.10848	translate	read	null
2026-03-11	Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis	Yujie Zheng et.al.	2603.10846	translate	read	null
2026-03-11	ReTabSyn: Realistic Tabular Data Synthesis via Reinforcement Learning	Xiaofeng Lin et.al.	2603.10823	translate	read	null
2026-03-11	Multilingual Reasoning Gym: Multilingual Scaling of Procedural Reasoning Environments	Konstantin Dobler et.al.	2603.10793	translate	read	null
2026-03-11	mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR	Konstantin Dobler et.al.	2603.10767	translate	read	null
2026-03-11	ASTER: Attitude-aware Suspended-payload Quadrotor Traversal via Efficient Reinforcement Learning	Dongcheng Cao et.al.	2603.10715	translate	read	null
2026-03-11	MAVEN: A Meta-Reinforcement Learning Framework for Varying-Dynamics Expertise in Agile Quadrotor Maneuvers	Jin Zhou et.al.	2603.10714	translate	read	null
2026-03-11	Splat2Real: Novel-view Scaling for Physical AI with 3D Gaussian Splatting	Hansol Lim et.al.	2603.10638	translate	read	null
2026-03-11	Reinforcement Learning with Conditional Expectation Reward	Changyi Xiao et.al.	2603.10624	translate	read	null
2026-03-11	AdaClearGrasp: Learning Adaptive Clearing for Zero-Shot Robust Dexterous Grasping in Densely Cluttered Environments	Zixuan Chen et.al.	2603.10616	translate	read	null
2026-03-11	Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning	Zhaowei Zhang et.al.	2603.10588	translate	read	null
2026-03-11	Safety-critical Control Under Partial Observability: Reach-Avoid POMDP meets Belief Space Control	Matti Vahs et.al.	2603.10572	translate	read	null
2026-03-11	Adaptive RAN Slicing Control via Reward-Free Self-Finetuning Agents	Yuanhao Li et.al.	2603.10564	translate	read	null
2026-03-11	Learning to Score: Tuning Cluster Schedulers through Reinforcement Learning	Martin Asenov et.al.	2603.10545	translate	read	null
2026-03-11	Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning	Zichao Li et.al.	2603.10535	translate	read	null
2026-03-11	UAV-MARL: Multi-Agent Reinforcement Learning for Time-Critical and Dynamic Medical Supply Delivery	Islam Guven et.al.	2603.10528	translate	read	null
2026-03-11	IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs	Chuan Guo et.al.	2603.10521	translate	read	null
2026-03-11	Muscle Synergy Priors Enhance Biomechanical Fidelity in Predictive Musculoskeletal Locomotion Simulation	Ilseung Park et.al.	2603.10474	translate	read	null
2026-03-11	COHORT: Hybrid RL for Collaborative Large DNN Inference on Multi-Robot Systems Under Real-Time Constraints	Mohammad Saeid Anwar et.al.	2603.10436	translate	read	null
2026-03-11	Adaptive Active Learning for Regression via Reinforcement Learning	Simon D. Nguyen et.al.	2603.10435	translate	read	null
2026-03-11	Graph-GRPO: Training Graph Flow Models with Reinforcement Learning	Baoheng Zhu et.al.	2603.10395	translate	read	null
2026-03-11	ScanDP: Generalizable 3D Scanning with Diffusion Policy	Itsuki Hirako et.al.	2603.10390	translate	read	null
2026-03-11	SteadyTray: Learning Object Balancing Tasks in Humanoid Tray Transport via Residual Reinforcement Learning	Anlun Huang et.al.	2603.10306	translate	read	null
2026-03-11	From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification	Ke Zhang et.al.	2603.10300	translate	read	null
2026-03-11	Quantum entanglement provides a competitive advantage in adversarial games	Peiyong Wang et.al.	2603.10289	translate	read	null
2026-03-10	From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning	Zhanyi Sun et.al.	2603.10263	translate	read	null
2026-03-10	SiMPO: Measure Matching for Online Diffusion Reinforcement Learning	Haitong Ma et.al.	2603.10250	translate	read	null
2026-03-10	Actor-Accelerated Policy Dual Averaging for Reinforcement Learning in Continuous Action Spaces	Ji Gao et.al.	2603.10199	translate	read	null
2026-03-10	Learning to Decode Quantum LDPC Codes Via Belief Propagation	Mohsen Moradi et.al.	2603.10192	translate	read	null
2026-03-10	Calibration-Reasoning Framework for Descriptive Speech Quality Assessment	Elizaveta Kostenok et.al.	2603.10175	translate	read	null
2026-03-10	ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning	Ruizhong Qiu et.al.	2603.10160	translate	read	null
2026-03-10	CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR	Sijia Cui et.al.	2603.10101	translate	read	null
2026-03-10	Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models	Daniel Hennes et.al.	2603.10098	translate	read	null
2026-03-10	Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models	Ali Raza et.al.	2603.10080	translate	read	null
2026-03-10	Improving Search Agent with One Line of Code	Jian Li et.al.	2603.10069	translate	read	null
2026-03-09	Cluster-Aware Attention-Based Deep Reinforcement Learning for Pickup and Delivery Problems	Wentao Wang et.al.	2603.10053	translate	read	null
2026-03-10	Kinodynamic Motion Retargeting for Humanoid Locomotion via Multi-Contact Whole-Body Trajectory Optimization	Xiaoyu Zhang et.al.	2603.09956	translate	read	null
2026-03-10	When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic	Alberto Fernández-Hernández et.al.	2603.09950	translate	read	null
2026-03-10	Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts	Hongbo Bo et.al.	2603.09890	translate	read	null
2026-03-10	Emerging Extrinsic Dexterity in Cluttered Scenes via Dynamics-aware Policy Learning	Yixin Zheng et.al.	2603.09882	translate	read	null
2026-03-10	RecThinker: An Agentic Framework for Tool-Augmented Reasoning in Recommendation	Haobo Zhang et.al.	2603.09843	translate	read	null
2026-03-10	Good Reasoning Makes Good Demonstrations: Implicit Reasoning Quality Supervision via In-Context Reinforcement Learning	Tiehua Mei et.al.	2603.09803	translate	read	null
2026-03-10	Long-Run Conditional Value-at-Risk Reinforcement Learning	Qixin Wang et.al.	2603.09734	translate	read	null
2026-03-10	GSStream: 3D Gaussian Splatting based Volumetric Scene Streaming System	Zhiye Tang et.al.	2603.09718	translate	read	null
2026-03-10	ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning	Davit Melikidze et.al.	2603.09692	translate	read	null
2026-03-10	ReTac-ACT: A State-Gated Vision-Tactile Fusion Transformer for Precision Assembly	Minchi Ruan et.al.	2603.09565	translate	read	null
2026-03-10	GeoSolver: Scaling Test-Time Reasoning in Remote Sensing with Fine-Grained Process Supervision	Lang Sun et.al.	2603.09551	translate	read	null
2026-03-10	NS-VLA: Towards Neuro-Symbolic Vision-Language-Action Models	Ziyue Zhu et.al.	2603.09542	translate	read	null
2026-03-10	Towards Unified Multimodal Interleaved Generation via Group Relative Policy Optimization	Ming Nie et.al.	2603.09538	translate	read	null
2026-03-10	MORE-R1: Guiding LVLM for Multimodal Object-Entity Relation Extraction via Stepwise Reasoning with Reinforcement Learning	Xiang Yuan et.al.	2603.09478	translate	read	null
2026-03-10	SEA-Nav: Efficient Policy Learning for Safe and Agile Quadruped Navigation in Cluttered Environments	Shiyi Chen et.al.	2603.09460	translate	read	null
2026-03-10	Impact of Markov Decision Process Design on Sim-to-Real Reinforcement Learning	Tatjana Krau et.al.	2603.09427	translate	read	null
2026-03-10	SPAARS: Safer RL Policy Alignment through Abstract Exploration and Refined Exploitation of Action Space	Swaminathan S K et.al.	2603.09378	translate	read	null
2026-03-10	Robust Regularized Policy Iteration under Transition Uncertainty	Hongqiang Lin et.al.	2603.09344	translate	read	null
2026-03-10	Reward-Zero: Language Embedding Driven Implicit Reward Mechanisms for Reinforcement Learning	Heng Zhang et.al.	2603.09331	translate	read	null
2026-03-10	OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models	Tengjin Weng et.al.	2603.09326	translate	read	null
2026-03-10	Social-R1: Towards Human-like Social Reasoning in LLMs	Jincenzi Wu et.al.	2603.09249	translate	read	null
2026-03-10	MO-Playground: Massively Parallelized Multi-Objective Reinforcement Learning for Robotics	Neil Janwani et.al.	2603.09237	translate	read	null
2026-03-10	Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control	Peihao Wang et.al.	2603.09221	translate	read	null
2026-03-10	Embodied Human Simulation for Quantitative Design and Analysis of Interactive Robotics	Chenhui Zuo et.al.	2603.09218	translate	read	null
2026-03-10	Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation	Jake Gonzales et.al.	2603.09208	translate	read	null
2026-03-10	Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents	Jiangming Shu et.al.	2603.09203	translate	read	null
2026-03-10	RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning	Tzu-Heng Huang et.al.	2603.09160	translate	read	null
2026-03-10	Critical States Preparation With Deep Reinforcement Learning	Jia-Wen Yu et.al.	2603.09135	translate	read	null
2026-03-10	Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards	Zhengzhao Ma et.al.	2603.09117	translate	read	null
2026-03-10	Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms	Renos Zabounidis et.al.	2603.09090	translate	read	null
2026-03-10	Learning Adaptive LLM Decoding	Chloe H. Su et.al.	2603.09065	translate	read	null
2026-03-10	Synergistic Directed Execution and LLM-Driven Analysis for Zero-Day AI-Generated Malware Detection	George Edwards et.al.	2603.09044	translate	read	null
2026-03-09	PlayWorld: Learning Robot World Models from Autonomous Play	Tenny Yin et.al.	2603.09030	translate	read	null
2026-03-09	MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment	Kailong Fan et.al.	2603.08987	translate	read	null
2026-03-09	FAME: Force-Adaptive RL for Expanding the Manipulation Envelope of a Full-Scale Humanoid	Niraj Pudasaini et.al.	2603.08961	translate	read	null
2026-03-09	A Survey of Reinforcement Learning For Economics	Pranjal Rawat et.al.	2603.08956	translate	read	null
2026-03-09	Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance	Joshua Castillo et.al.	2603.08933	translate	read	null
2026-03-09	Optimizing Reinforcement Learning Training over Digital Twin Enabled Multi-fidelity Networks	Hanzhi Yu et.al.	2603.08931	translate	read	null
2026-03-09	APPLV: Adaptive Planner Parameter Learning from Vision-Language-Action Model	Yuanjie Lu et.al.	2603.08862	translate	read	null
2026-03-09	VisionCreator-R1: A Reflection-Enhanced Native Visual-Generation Agentic Model	Jinxiang Lai et.al.	2603.08812	translate	read	null
2026-03-09	Multi-level meta-reinforcement learning with skill-based curriculum	Sichen Yang et.al.	2603.08773	translate	read	null
2026-03-09	SPREAD: Subspace Representation Distillation for Lifelong Imitation Learning	Kaushik Roy et.al.	2603.08763	translate	read	null
2026-03-09	Agentic Critical Training	Weize Liu et.al.	2603.08706	translate	read	null
2026-03-09	How Far Can Unsupervised RLVR Scale LLM Training?	Bingxiang He et.al.	2603.08660	translate	read	null
2026-03-09	Embedding Classical Balance Control Principles in Reinforcement Learning for Humanoid Recovery	Nehar Poddar et.al.	2603.08619	translate	read	null
2026-03-09	Diff-Muscle: Efficient Learning for Musculoskeletal Robotic Table Tennis	Wentao Zhao et.al.	2603.08617	translate	read	null
2026-03-09	Towards Batch-to-Streaming Deep Reinforcement Learning for Continuous Control	Riccardo De Monte et.al.	2603.08588	translate	read	null
2026-03-09	MetaWorld-X: Hierarchical World Modeling via VLM-Orchestrated Experts for Humanoid Loco-Manipulation	Yutong Shen et.al.	2603.08572	translate	read	null
2026-03-09	RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback	Xiaoying Zhang et.al.	2603.08561	translate	read	null
2026-03-09	Impact of Connectivity on Laplacian Representations in Reinforcement Learning	Tommaso Giorgi et.al.	2603.08558	translate	read	null
2026-03-09	EquiBim: Learning Symmetry-Equivariant Policy for Bimanual Manipulation	Zhiyuan Zhang et.al.	2603.08541	translate	read	null
2026-03-09	Breaking the Bias Barrier in Concave Multi-Objective Reinforcement Learning	Swetha Ganesh et.al.	2603.08518	translate	read	null
2026-03-09	Oracle-Guided Soft Shielding for Safe Move Prediction in Chess	Prajit T Rajendran et.al.	2603.08506	translate	read	null
2026-03-09	LAR-MoE: Latent-Aligned Routing for Mixture of Experts in Robotic Imitation Learning	Ariel Rodriguez et.al.	2603.08476	translate	read	null
2026-03-09	Integrating Lagrangian Neural Networks into the Dyna Framework for Reinforcement Learning	Shreya Das et.al.	2603.08468	translate	read	null
2026-03-09	Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck	Fabio Valerio Massoli et.al.	2603.08462	translate	read	null
2026-03-09	Meta-RL with Shared Representations Enables Fast Adaptation in Energy Systems	Théo Zangato et.al.	2603.08418	translate	read	null
2026-03-09	Aligning to Illusions: Choice Blindness in Human and AI Feedback	Wenbin Wu et.al.	2603.08412	translate	read	null
2026-03-09	A Recipe for Stable Offline Multi-agent Reinforcement Learning	Dongsu Lee et.al.	2603.08399	translate	read	null
2026-03-09	Revealing Behavioral Plasticity in Large Language Models: A Token-Conditional Perspective	Liyuan Mao et.al.	2603.08398	translate	read	null
2026-03-09	SlowBA: An efficiency backdoor attack towards VLM-based GUI agents	Junxian Li et.al.	2603.08316	translate	read	null
2026-03-09	Posterior Sampling Reinforcement Learning with Gaussian Processes for Continuous Control: Sublinear Regret Bounds for Unbounded State Spaces	Hamish Flynn et.al.	2603.08287	translate	read	null
2026-03-09	SAIL: Test-Time Scaling for In-Context Imitation Learning with VLM	Makoto Sato et.al.	2603.08269	translate	read	null
2026-03-09	Adaptive shape control for microswimmer navigation in turbulence	Jingran Qiu et.al.	2603.08201	translate	read	null
2026-03-09	RexDrug: Reliable Multi-Drug Combination Extraction through Reasoning-Enhanced LLMs	Zhijun Wang et.al.	2603.08166	translate	read	null
2026-03-09	Towards Human-Like Manipulation through RL-Augmented Teleoperation and Mixture-of-Dexterous-Experts VLA	Tutian Tang et.al.	2603.08122	translate	read	null
2026-03-09	Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting	Zhongjian Qiao et.al.	2603.08118	translate	read	null
2026-03-09	DeReCo: Decoupling Representation and Coordination Learning for Object-Adaptive Decentralized Multi-Robot Cooperative Transport	Kazuki Shibata et.al.	2603.08111	translate	read	null
2026-03-09	Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization	Hongli Zhou et.al.	2603.08091	translate	read	null
2026-03-09	In-Context Reinforcement Learning for Tool Use in Large Language Models	Yaoqi Ye et.al.	2603.08068	translate	read	null
2026-03-09	ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning	Yiran Zhao et.al.	2603.08059	translate	read	null
2026-03-09	MJ1: Multimodal Judgment via Grounded Verification	Bhavesh Kumar et.al.	2603.07990	translate	read	null
2026-03-09	On the Feasibility and Opportunity of Autoregressive 3D Object Detection	Zanming Huang et.al.	2603.07985	translate	read	null
2026-03-09	VORL-EXPLORE: A Hybrid Learning Planning Approach to Multi-Robot Exploration in Dynamic Environments	Ning Liu et.al.	2603.07973	translate	read	null
2026-03-09	Model-Free DRL Control for Power Inverters: From Policy Learning to Real-Time Implementation via Knowledge Distillation	Yang Yang et.al.	2603.07964	translate	read	null
2026-03-09	SGG-R$^{\rm 3}$: From Next-Token Prediction to End-to-End Unbiased Scene Graph Generation	Jiaye Feng et.al.	2603.07961	translate	read	null
2026-03-09	RL unknotter, hard unknots and unknotting number	Anne Dranowski et.al.	2603.07955	translate	read	null
2026-03-09	SMGI: A Structural Theory of General Artificial Intelligence	Aomar Osmani et.al.	2603.07896	translate	read	null
2026-03-09	SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans	Hansi Zeng et.al.	2603.07853	translate	read	null
2026-03-08	Relating Reinforcement Learning to Dynamic Programming-Based Planning	Filip V. Georgiev et.al.	2603.07844	translate	read	null
2026-03-08	Preference-Conditioned Reinforcement Learning for Space-Time Efficient Online 3D Bin Packing	Nikita Sarawgi et.al.	2603.07800	translate	read	null
2026-03-08	Toward Global Intent Inference for Human Motion by Inverse Reinforcement Learning	Sarmad Mehrdad et.al.	2603.07797	translate	read	null
2026-03-08	ProgAgent: A Continual RL Agent with Progress-Aware Rewards	Jinzhou Tan et.al.	2603.07784	translate	read	null
2026-03-08	Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems	Zongqian Li et.al.	2603.07779	translate	read	null
2026-03-08	Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models	Zongqian Li et.al.	2603.07777	translate	read	null
2026-03-08	Residual Control for Fast Recovery from Dynamics Shifts	Nethmi Jayasinghe et.al.	2603.07775	translate	read	null
2026-03-08	TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward	Yihong Luo et.al.	2603.07700	translate	read	null
2026-03-08	Global Convergence of Average Reward Constrained MDPs with Neural Critic and General Policy Parameterization	Anirudh Satheesh et.al.	2603.07698	translate	read	null
2026-03-08	Mitigating the Memory Bottleneck with Machine Learning-Driven and Data-Aware Microarchitectural Techniques	Rahul Bera et.al.	2603.07683	translate	read	null
2026-03-08	Numerical Approach for On-the-Fly Active Flow Control via Flow Map Learning Method	Xinyu Liu et.al.	2603.07678	translate	read	null
2026-03-08	DAISS: Phase-Aware Imitation Learning for Dual-Arm Robotic Ultrasound-Guided Interventions	Feng Li et.al.	2603.07663	translate	read	null
2026-03-08	Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving	Chang Su et.al.	2603.07642	translate	read	null
2026-03-08	Exoskeleton Control through Learning to Reduce Biological Joint Moments in Simulations	Zihang You et.al.	2603.07629	translate	read	null
2026-03-08	GeoLoco: Leveraging 3D Geometric Priors from Visual Foundation Model for Robust RGB-Only Humanoid Locomotion	Yufei Liu et.al.	2603.07624	translate	read	null
2026-03-08	Approximate Imitation Learning for Event-based Quadrotor Flight in Cluttered Environments	Nico Messikommer et.al.	2603.07578	translate	read	null
2026-03-08	Constraints Matrix Diffusion based Generative Neural Solver for Vehicle Routing Problems	Zhenwei Wang et.al.	2603.07568	translate	read	null
2026-03-08	COOL-MC: Verifying and Explaining RL Policies for Multi-bridge Network Maintenance	Dennis Gross et.al.	2603.07546	translate	read	null
2026-03-08	ICLR: In-Context Imitation Learning with Visual Reasoning	Toan Nguyen et.al.	2603.07530	translate	read	null
2026-03-08	TableMind++: An Uncertainty-Aware Programmatic Agent for Tool-Augmented Table Reasoning	Mingyue Cheng et.al.	2603.07528	translate	read	null
2026-03-08	Reinforcement learning-based dynamic cleaning scheduling framework for solar energy system	Heungjo An et.al.	2603.07518	translate	read	null
2026-03-08	InterReal: A Unified Physics-Based Imitation Framework for Learning Human-Object Interaction Skills	Dayang Liang et.al.	2603.07516	translate	read	null
2026-03-08	EvolveReason: Self-Evolving Reasoning Paradigm for Explainable Deepfake Facial Image Identification	Binjia Zhou et.al.	2603.07515	translate	read	null
2026-03-08	Med-Evo: Test-time Self-evolution for Medical Multimodal Large Language Models	Dunyuan Xu et.al.	2603.07443	translate	read	null
2026-03-08	Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part II	Yi Tian et.al.	2603.07437	translate	read	null
2026-03-08	Generalization in Online Reinforcement Learning for Mobile Agents	Li Gu et.al.	2603.07432	translate	read	null
2026-03-08	Dynamic Vehicle Routing Problem with Prompt Confirmation of Advance Requests	Amutheezan Sivagnanam et.al.	2603.07422	translate	read	null
2026-03-08	Underwater Embodied Intelligence for Autonomous Robots: A Constraint-Coupled Perspective on Planning, Control, and Deployment	Jingzehua Xu et.al.	2603.07393	translate	read	null
2026-03-07	Learning to Reflect: Hierarchical Multi-Agent Reinforcement Learning for CSI-Free mmWave Beam-Focusing	Hieu Le et.al.	2603.07370	translate	read	null
2026-03-07	Neural Control and Learning of Simulated Hand Movements With an EMG-Based Closed-Loop Interface	Balint K. Hodossy et.al.	2603.07364	translate	read	null
2026-03-07	Adversarial Latent-State Training for Robust Policies in Partially Observable Domains	Angad Singh Ahuja et.al.	2603.07313	translate	read	null
2026-03-07	AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery	Nilesh Jain et.al.	2603.07300	translate	read	null
2026-03-07	Adaptive Double-Booking Strategy for Outpatient Scheduling Using Multi-Objective Reinforcement Learning	Ninda Nurseha Amalina et.al.	2603.07270	translate	read	null
2026-03-07	Kinematics-Aware Latent World Models for Data-Efficient Autonomous Driving	Jiazhuo Li et.al.	2603.07264	translate	read	null
2026-03-07	Learning When to Cooperate Under Heterogeneous Goals	Max Taylor-Davies et.al.	2603.07253	translate	read	null
2026-03-07	Reinforcement Learning for Vehicle-to-Grid Voltage Regulation: Single-Hub to Multi-Hub Coordination with Battery-Aware Constraints	Jingbo Wang et.al.	2603.07237	translate	read	null
2026-03-07	$\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving	Pinzheng Wang et.al.	2603.07197	translate	read	null
2026-03-07	RoTri-Diff: A Spatial Robot-Object Triadic Interaction-Guided Diffusion Model for Bimanual Manipulation	Zixuan Chen et.al.	2603.07165	translate	read	null
2026-03-07	Learning From Failures: Efficient Reinforcement Learning Control with Episodic Memory	Chenyang Miao et.al.	2603.07110	translate	read	null
2026-03-07	Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction	Xu Chen et.al.	2603.07093	translate	read	null
2026-03-07	Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR	Muhammad Khalifa et.al.	2603.07084	translate	read	null
2026-03-07	Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction	Michael Hauri et.al.	2603.07083	translate	read	null
2026-03-07	SSP: Safety-guaranteed Surgical Policy via Joint Optimization of Behavioral and Spatial Constraints	Jianshu Hu et.al.	2603.07032	translate	read	null
2026-03-07	RESCHED: Rethinking Flexible Job Shop Scheduling from a Transformer-based Architecture with Simplified States	Xiangjie Xiao et.al.	2603.07020	translate	read	null
2026-03-07	AutoChecklist: Composable Pipelines for Checklist Generation and Scoring with LLM-as-a-Judge	Karen Zhou et.al.	2603.07019	translate	read	null
2026-03-07	AdaGen: Learning Adaptive Policy for Image Synthesis	Zanlin Ni et.al.	2603.06993	translate	read	null
2026-03-07	Diffusion Controller: Framework, Algorithms and Parameterization	Tong Yang et.al.	2603.06981	translate	read	null
2026-03-07	NePPO: Near-Potential Policy Optimization for General-Sum Multi-Agent Reinforcement Learning	Addison Kalanther et.al.	2603.06977	translate	read	null
2026-03-07	Topology-Aware Reinforcement Learning over Graphs for Resilient Power Distribution Networks	Roshni Anna Jacob et.al.	2603.06964	translate	read	null
2026-03-07	Learning Quadruped Walking from Seconds of Demonstration	Ruipeng Zhang et.al.	2603.06961	translate	read	null
2026-03-07	Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards	Xin Zhang et.al.	2603.06958	translate	read	null
2026-03-06	Joint MDPs and Reinforcement Learning in Coupled-Dynamics Environments	Ege C. Kaya et.al.	2603.06946	translate	read	null
2026-03-06	Collaborative Planning with Concurrent Synchronization for Operationally Constrained UAV-UGV Teams	Zihao Deng et.al.	2603.06898	translate	read	null
2026-03-06	Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration	Yanjun Chen et.al.	2603.06859	translate	read	null
2026-03-06	Reinforcing the World’s Edge: A Continual Learning Problem in the Multi-Agent-World Boundary	Dane Malenfant et.al.	2603.06813	translate	read	null
2026-03-06	Multi-Agent Reinforcement Learning with Submodular Reward	Wenjing Chen et.al.	2603.06810	translate	read	null
2026-03-06	Optimistic Policy Regularization	Mai Pham et.al.	2603.06793	translate	read	null
2026-03-06	HGT-Scheduler: Deep Reinforcement Learning for the Job Shop Scheduling Problem via Heterogeneous Graph Transformers	Bulent Soykan et.al.	2603.06777	translate	read	null
2026-03-06	HybridMimic: Hybrid RL-Centroidal Control for Humanoid Motion Mimicking	Ludwig Chee-Ying Tay et.al.	2603.06775	translate	read	null
2026-03-06	Stabilizing Reinforcement Learning for Diffusion Language Models	Jianyuan Zhong et.al.	2603.06743	translate	read	null
2026-03-06	Don’t Freeze, Don’t Crash: Extending the Safe Operating Range of Neural Navigation in Dense Crowds	Jiefu Zhang et.al.	2603.06729	translate	read	null
2026-03-06	Boosting deep Reinforcement Learning using pretraining with Logical Options	Zihan Ye et.al.	2603.06565	translate	read	null
2026-03-06	EgoReasoner: Learning Egocentric 4D Reasoning via Task-Adaptive Structured Thinking	Fangrui Zhu et.al.	2603.06561	translate	read	null
2026-03-06	On a PDE model for Learning in Stochastic Market Entry Games	Esther Bou Dagher et.al.	2603.06514	translate	read	null
2026-03-06	A Reference Architecture of Reinforcement Learning Frameworks	Xiaoran Liu et.al.	2603.06413	translate	read	null
2026-03-06	Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion	Pengcheng Jiang et.al.	2603.06397	translate	read	null
2026-03-06	OralGPT-Plus: Learning to Use Visual Tools via Reinforcement Learning for Panoramic X-ray Analysis	Yuxuan Fan et.al.	2603.06366	translate	read	null
2026-03-06	From Entropy to Calibrated Uncertainty: Training Language Models to Reason About Uncertainty	Azza Jenane et.al.	2603.06317	translate	read	null
2026-03-06	Artificial Intelligence for Climate Adaptation: Reinforcement Learning for Climate Change-Resilient Transport	Miguel Costa et.al.	2603.06278	translate	read	null
2026-03-06	Synthetic Monitoring Environments for Reinforcement Learning	Leonard Pleiss et.al.	2603.06252	translate	read	null
2026-03-06	MAPO: Mixed Advantage Policy Optimization for Long-Horizon Multi-Turn Dialogue	Naifan Zhang et.al.	2603.06194	translate	read	null
2026-03-06	Optimizing 3D Diffusion Models for Medical Imaging via Multi-Scale Reward Learning	Yueying Tian et.al.	2603.06173	translate	read	null
2026-03-06	Dual-Agent Multiple-Model Reinforcement Learning for Event-Triggered Human-Robot Co-Adaptation in Decoupled Task Spaces	Yaqi Li et.al.	2603.06163	translate	read	null
2026-03-06	Partial Policy Gradients for RL in LLMs	Puneet Mathur et.al.	2603.06138	translate	read	null
2026-03-06	ChatShopBuddy: Towards Reliable Conversational Shopping Agents via Reinforcement Learning	Yiruo Cheng et.al.	2603.06065	translate	read	null
2026-03-06	Devil is in Narrow Policy: Unleashing Exploration in Driving VLA Models	Canyu Chen et.al.	2603.06049	translate	read	null
2026-03-06	Reinforcement Learning for Secrecy Optimization in Underwater Energy Harvesting Relay Network	Shalini Tripathi et.al.	2603.06046	translate	read	null
2026-03-06	Learning to Generate via Understanding: Understanding-Driven Intrinsic Rewarding for Unified Multimodal Models	Jiadong Pan et.al.	2603.06043	translate	read	null
2026-03-06	ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning	Xingjian Tao et.al.	2603.06024	translate	read	null
2026-03-06	TADPO: Reinforcement Learning Goes Off-road	Zhouchonghao Wu et.al.	2603.05995	translate	read	null
2026-03-06	LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Generative Real-World Super-Resolution	Song Fei et.al.	2603.05947	translate	read	null
2026-03-06	How to Model Your Crazyflie Brushless	Alexander Gräfe et.al.	2603.05944	translate	read	null
2026-03-06	Swooper: Learning High-Speed Aerial Grasping With a Simple Gripper	Ziken Huang et.al.	2603.05935	translate	read	null
2026-03-06	CORE-Seg: Reasoning-Driven Segmentation for Complex Lesions via Reinforcement Learning	Yuxin Xie et.al.	2603.05911	translate	read	null
2026-03-06	Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning	Xuan Li et.al.	2603.05900	translate	read	null
2026-03-06	Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation	Changcheng Li et.al.	2603.05881	translate	read	null
2026-03-06	PatchCue: Enhancing Vision-Language Model Reasoning with Patch-Based Visual Cues	Yukun Qi et.al.	2603.05869	translate	read	null
2026-03-06	ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning	Juyong Jiang et.al.	2603.05863	translate	read	null
2026-03-06	Expert Knowledge-driven Reinforcement Learning for Autonomous Racing via Trajectory Guidance and Dynamics Constraints	Bo Leng et.al.	2603.05842	translate	read	null
2026-03-06	OpenHEART: Opening Heterogeneous Articulated Objects with a Legged Manipulator	Seonghyeon Lim et.al.	2603.05830	translate	read	null
2026-03-06	CDF-Glove: A Cable-Driven Force Feedback Glove for Dexterous Teleoperation	Huayue Liang et.al.	2603.05804	translate	read	null
2026-03-06	Task-Level Decisions to Gait Level Control: A Hierarchical Policy Approach for Quadruped Navigation	Sijia Li et.al.	2603.05783	translate	read	null
2026-03-05	MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation	Rifny Rachman et.al.	2603.05760	translate	read	null
2026-03-05	Reinforcement Learning for Power-Flow Network Analysis	Alperen Ergur et.al.	2603.05673	translate	read	null
2026-03-05	TransMASK: Masked State Representation through Learned Transformation	Sagar Parekh et.al.	2603.05670	translate	read	null
2026-03-05	When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On	Wisdom Ikezogwo et.al.	2603.05659	translate	read	null
2026-03-05	Thinking with Spatial Code for Physical-World Video Reasoning	Jieneng Chen et.al.	2603.05591	translate	read	null
2026-03-05	A Novel Hybrid Heuristic-Reinforcement Learning Optimization Approach for a Class of Railcar Shunting Problems	Ruonan Zhao et.al.	2603.05579	translate	read	null
2026-03-05	Task Parameter Extrapolation via Learning Inverse Tasks from Forward Demonstrations	Serdar Bahar et.al.	2603.05576	translate	read	null
2026-03-05	PRISM: Personalized Refinement of Imitation Skills for Manipulation via Human Instructions	Arnau Boix-Granell et.al.	2603.05574	translate	read	null
2026-03-05	Autocorrelation effects in a stochastic-process model for decision making via time series	Tomoki Yamagami et.al.	2603.05559	translate	read	null
2026-03-05	RoboPocket: Improve Robot Policies Instantly with Your Phone	Junjie Fang et.al.	2603.05504	translate	read	null
2026-03-05	Latent Wasserstein Adversarial Imitation Learning	Siqi Yang et.al.	2603.05440	translate	read	null
2026-03-05	SpiderCat: Optimal Fault-Tolerant Cat State Preparation	Andrey Boris Khesin et.al.	2603.05391	translate	read	null
2026-03-05	DiSCTT: Consensus-Guided Self-Curriculum for Efficient Test-Time Adaptation in Reasoning	Mohammad Mahdi Moradi et.al.	2603.05357	translate	read	null
2026-03-05	Latent Policy Steering through One-Step Flow Policies	Hokyun Im et.al.	2603.05296	translate	read	null
2026-03-05	Knowledge Divergence and the Value of Debate for Scalable Oversight	Robin Young et.al.	2603.05293	translate	read	null
2026-03-05	Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts	Samandar Samandarov et.al.	2603.05276	translate	read	null
2026-03-05	SarcasmMiner: A Dual-Track Post-Training Framework for Robust Audio-Visual Sarcasm Reasoning	Zhu Li et.al.	2603.05275	translate	read	null
2026-03-05	Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum	Shan Ning et.al.	2603.05256	translate	read	null
2026-03-05	Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards	Linghan Fang et.al.	2603.05231	translate	read	null
2026-03-05	KARL: Knowledge Agents via Reinforcement Learning	Jonathan D. Chang et.al.	2603.05218	translate	read	null
2026-03-05	LBM: Hierarchical Large Auto-Bidding Model via Reasoning and Acting	Yewen Li et.al.	2603.05134	translate	read	null
2026-03-05	SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation	Youqiang Gui et.al.	2603.05117	translate	read	null
2026-03-05	Decoupling Task and Behavior: A Two-Stage Reward Curriculum in Reinforcement Learning for Robotics	Kilian Freitag et.al.	2603.05113	translate	read	null
2026-03-05	Reward-Conditioned Reinforcement Learning	Michal Nauman et.al.	2603.05066	translate	read	null
2026-03-05	WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents	Sicheng Fan et.al.	2603.05044	translate	read	null
2026-03-05	Formal Entropy-Regularized Control of Stochastic Systems	Menno van Zutphen et.al.	2603.05021	translate	read	null
2026-03-05	BioLLMAgent: A Hybrid Framework with Enhanced Structural Interpretability for Simulating Human Decision-Making in Computational Psychiatry	Zuo Fei et.al.	2603.05016	translate	read	null
2026-03-05	Competitive Multi-Operator Reinforcement Learning for Joint Pricing and Fleet Rebalancing in AMoD Systems	Emil Kragh Toft et.al.	2603.05000	translate	read	null
2026-03-05	3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding	Xiongkun Linghu et.al.	2603.04976	translate	read	null
2026-03-05	$\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space	Peihao Wang et.al.	2603.04948	translate	read	null
2026-03-05	Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition	Mengze Hong et.al.	2603.04945	translate	read	null
2026-03-05	BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning	Yuan Li et.al.	2603.04918	translate	read	null
2026-03-05	VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory	Yuheng Lei et.al.	2603.04910	translate	read	null
2026-03-05	Task-Relevant and Irrelevant Region-Aware Augmentation for Generalizable Vision-Based Imitation Learning in Agricultural Manipulation	Shun Hattori et.al.	2603.04845	translate	read	null
2026-03-05	SCoUT: Scalable Communication via Utility-Guided Temporal Grouping in Multi-Agent Reinforcement Learning	Manav Vora et.al.	2603.04833	translate	read	null
2026-03-05	VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment	Jiawei Chen et.al.	2603.04822	translate	read	null
2026-03-05	Diffusion Policy through Conditional Proximal Policy Optimization	Ben Liu et.al.	2603.04790	translate	read	null
2026-03-05	Adaptive Personalized Federated Reinforcement Learning for RIS-Assisted Aerial Relays in SAGINs with Fluid Antennas	Yuxuan Yang et.al.	2603.04788	translate	read	null
2026-03-05	Data-Driven Control of a Magnetically Actuated Fish-Like Robot	Akiyuki Koyama et.al.	2603.04787	translate	read	null
2026-03-05	Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction	Xingwu Chen et.al.	2603.04783	translate	read	null
2026-03-05	Selfish Cooperation Towards Low-Altitude Economy: Integrated Multi-Service Deployment with Resilient Federated Reinforcement Learning	Yuxuan Yang et.al.	2603.04779	translate	read	null
2026-03-05	Distributional Reinforcement Learning with Information Bottleneck for Uncertainty-Aware DRAM Equalization	Muhammad Usama et.al.	2603.04768	translate	read	null
2026-03-05	LLM-Guided Decentralized Exploration with Self-Organizing Robot Teams	Hiroaki Kawashima et.al.	2603.04762	translate	read	null
2026-03-05	SeekRBP: Leveraging Sequence-Structure Integration with Reinforcement Learning for Receptor-Binding Protein Identification	Xiling Luo et.al.	2603.04748	translate	read	null
2026-03-04	Optimizing Language Models for Crosslingual Knowledge Consistency	Tianyu Liu et.al.	2603.04678	translate	read	null
2026-03-04	When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift	Kevin Vogt-Lowell et.al.	2603.04648	translate	read	null
2026-03-04	Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning	Lei Huang et.al.	2603.04597	translate	read	null
2026-03-04	ELLIPSE: Evidential Learning for Robust Waypoints and Uncertainties	Zihao Dong et.al.	2603.04585	translate	read	null
2026-03-04	Risk-Aware Reinforcement Learning for Mobile Manipulation	Michael Groom et.al.	2603.04579	translate	read	null
2026-03-04	Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling	Tal Daniel et.al.	2603.04553	translate	read	null
2026-03-04	Transformer-Based Multipath Congestion Control: A Decoupled Approach for Wireless Uplinks	Zongyuan Zhang et.al.	2603.04550	translate	read	null
2026-03-04	PTLD: Sim-to-real Privileged Tactile Latent Distillation for Dexterous Manipulation	Rosy Chen et.al.	2603.04531	translate	read	null
2026-03-04	TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning	Maximilian von Klinski et.al.	2603.04380	translate	read	null
2026-03-04	Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks	Haoyu Liu et.al.	2603.04364	translate	read	null
2026-03-04	A Constrained RL Approach for Cost-Efficient Delivery of Latency-Sensitive Applications	Ozan Aygün et.al.	2603.04353	translate	read	null
2026-03-04	Tendon Force Modeling for Sim2Real Transfer of Reinforcement Learning Policies for Tendon-Driven Robots	Valentin Yuryev et.al.	2603.04351	translate	read	null
2026-03-04	What Does Flow Matching Bring To TD Learning?	Bhavya Agrawalla et.al.	2603.04333	translate	read	null
2026-03-04	IPD: Boosting Sequential Policy with Imaginary Planning Distillation in Offline Reinforcement Learning	Yihao Qin et.al.	2603.04289	translate	read	null
2026-03-04	Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory	Zhenting Wang et.al.	2603.04257	translate	read	null
2026-03-04	OptiQKD: A Machine Learning-Optimized Framework for Real-Time Parameter Tuning in Quantum Key Distribution	Noureldin Mohamed et.al.	2603.04192	translate	read	null
2026-03-04	Learning Hip Exoskeleton Control Policy via Predictive Neuromusculoskeletal Simulation	Ilseung Park et.al.	2603.04166	translate	read	null
2026-03-04	BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning	Tarjei Paule Hage et.al.	2603.04124	translate	read	null
2026-03-04	Real Eyes Realize Faster: Gaze Stability and Pupil Novelty for Efficient Egocentric Learning	Ajan Subramanian et.al.	2603.04098	translate	read	null
2026-03-04	Swimming Under Constraints: A Safe Reinforcement Learning Framework for Quadrupedal Bio-Inspired Propulsion	Xinyu Cui et.al.	2603.04073	translate	read	null
2026-03-04	SaFeR: Safety-Critical Scenario Generation for Autonomous Driving Test via Feasibility-Constrained Token Resampling	Jinlong Cui et.al.	2603.04071	translate	read	null
2026-03-04	Force-Aware Residual DAgger via Trajectory Editing for Precision Insertion with Impedance Control	Yiou Huang et.al.	2603.04038	translate	read	null
2026-03-04	Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback	Fabian Domberg et.al.	2603.04029	translate	read	null
2026-03-04	Rethinking the Efficiency and Effectiveness of Reinforcement Learning for Radiology Report Generation	Zilin Lu et.al.	2603.04022	translate	read	null
2026-03-04	Discriminative Perception via Anchored Description for Reasoning Segmentation	Tao Yang et.al.	2603.04002	translate	read	null
2026-03-04	Structural Action Transformer for 3D Dexterous Manipulation	Xiaohan Lei et.al.	2603.03960	translate	read	null
2026-03-04	GIPO: Gaussian Importance Sampling Policy Optimization	Chengxuan Lu et.al.	2603.03955	translate	read	null
2026-03-04	RVN-Bench: A Benchmark for Reactive Visual Navigation	Jaewon Lee et.al.	2603.03953	translate	read	null
2026-03-04	Selecting Offline Reinforcement Learning Algorithms for Stochastic Network Control	Nicolas Helson et.al.	2603.03932	translate	read	null
2026-03-04	IROSA: Interactive Robot Skill Adaptation using Natural Language	Markus Knauer et.al.	2603.03897	translate	read	null
2026-03-04	Dual-Interaction-Aware Cooperative Control Strategy for Alleviating Mixed Traffic Congestion	Zhengxuan Liu et.al.	2603.03848	translate	read	null
2026-03-04	Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation	Yun Lu et.al.	2603.03820	translate	read	null
2026-03-04	Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling	Emile Anand et.al.	2603.03759	translate	read	null
2026-03-04	Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning	Chuang Zhang et.al.	2603.03752	translate	read	null
2026-03-04	Interaction-Aware Whole-Body Control for Compliant Object Transport	Hao Zhang et.al.	2603.03751	translate	read	null
2026-03-04	HALyPO: Heterogeneous-Agent Lyapunov Policy Optimization for Human-Robot Collaboration	Hao Zhang et.al.	2603.03741	translate	read	null
2026-03-04	UrbanHuRo: A Two-Layer Human-Robot Collaboration Framework for the Joint Optimization of Heterogeneous Urban Services	Tonmoy Dey et.al.	2603.03701	translate	read	null
2026-03-04	MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation	Lu Yang et.al.	2603.03680	translate	read	null
2026-03-04	MIND: Unified Inquiry and Diagnosis RL with Criteria Grounded Clinical Supports for Psychiatric Consultation	Guoyi Li et.al.	2603.03677	translate	read	null
2026-03-04	Principled Learning-to-Communicate with Quasi-Classical Information Structures	Xiangyu Liu et.al.	2603.03664	translate	read	null
2026-03-04	Freezing of Gait Prediction using Proactive Agent that Learns from Selected Experience and DDQN Algorithm	Septian Enggar Sukmana et.al.	2603.03651	translate	read	null
2026-03-04	Hybrid Belief Reinforcement Learning for Efficient Coordinated Spatial Exploration	Danish Rizvi et.al.	2603.03595	translate	read	null
2026-03-03	Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence	Shengbo Wang et.al.	2603.03523	translate	read	null
2026-03-03	PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation	Shang Wu et.al.	2603.03505	translate	read	null
2026-03-03	Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion	Haoran Lu et.al.	2603.03485	translate	read	null
2026-03-03	Optimal trajectory-guided stochastic co-optimization for e-fuel system design and real-time operation	Jeongdong Kim et.al.	2603.03484	translate	read	null
2026-03-03	Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning	Harin Lee et.al.	2603.03480	translate	read	null
2026-03-03	[Re] FairDICE: A Gap Between Theory And Practice	Peter Adema et.al.	2603.03454	translate	read	null
2026-03-03	Beyond Accuracy: Evaluating Visual Grounding In Multimodal Medical Reasoning	Anas Zafar et.al.	2603.03437	translate	read	null
2026-03-03	Multi-Agent-Based Simulation of Archaeological Mobility in Uneven Landscapes	Chairi Kiourt et.al.	2603.03390	translate	read	null
2026-03-03	How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference	Toru Lin et.al.	2603.03280	translate	read	null
2026-03-03	ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation	Xialin He et.al.	2603.03279	translate	read	null
2026-03-03	Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use	Aradhye Agarwal et.al.	2603.03205	translate	read	null
2026-03-03	Specificity-aware reinforcement learning for fine-grained open-world classification	Samuele Angheben et.al.	2603.03197	translate	read	null
2026-03-03	Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing	Jiyuan Wang et.al.	2603.03143	translate	read	null
2026-03-03	RL-Based Coverage Path Planning for Deformable Objects on 3D Surfaces	Yuhang Zhang et.al.	2603.03137	translate	read	null
2026-03-03	Deep Q-Learning-Based Gain Scheduling for Nonlinear Quadcopter Dynamics	Hossein Rastgoftar et.al.	2603.03127	translate	read	null
2026-03-03	Proactive Guiding Strategy for Item-side Fairness in Interactive Recommendation	Chongjun Xia et.al.	2603.03094	translate	read	null
2026-03-03	RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization	Siwei Zhang et.al.	2603.03078	translate	read	null
2026-03-03	TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning	Christian Greisinger et.al.	2603.03072	translate	read	null
2026-03-03	Reinforcement Learning with Symbolic Reward Machines	Thomas Krug et.al.	2603.03068	translate	read	null
2026-03-03	CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots	Shihao Ma et.al.	2603.03067	translate	read	null
2026-03-03	PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems	Sudip Bhujel et.al.	2603.03054	translate	read	null
2026-03-03	QFlowNet: Fast, Diverse, and Efficient Unitary Synthesis with Generative Flow Networks	Inhoe Koo et.al.	2603.03045	translate	read	null
2026-03-03	Why Does RLAIF Work At All?	Robin Young et.al.	2603.03000	translate	read	null
2026-03-03	Contextualized Privacy Defense for LLM Agents	Yule Wen et.al.	2603.02983	translate	read	null
2026-03-03	DreamFlow: Local Navigation Beyond Observation via Conditional Flow Matching in the Latent Space	Jiwon Park et.al.	2603.02976	translate	read	null
2026-03-03	CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning	Zhenquan Yao et.al.	2603.02951	translate	read	null
2026-03-03	Beyond One-Size-Fits-All: Adaptive Subgraph Denoising for Zero-Shot Graph Learning with Large Language Models	Fengzhi Li et.al.	2603.02938	translate	read	null
2026-03-03	Contextual Latent World Models for Offline Meta Reinforcement Learning	Mohammadreza Nakheai et.al.	2603.02935	translate	read	null
2026-03-03	On the Structural Limitations of Weight-Based Neural Adaptation and the Role of Reversible Behavioral Learning	Pardhu Sri Rushi Varma Konduru et.al.	2603.02934	translate	read	null
2026-03-03	Does Fine-tuning by Reinforcement Learning Improve Generalization in Binary Speech Deepfake Detection?	Xin Wang et.al.	2603.02914	translate	read	null
2026-03-03	SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training	Qi Zhang et.al.	2603.02908	translate	read	null
2026-03-03	Learning in Markov Decision Processes with Exogenous Dynamics	Davide Maran et.al.	2603.02862	translate	read	null
2026-03-03	Rhythm: Learning Interactive Whole-Body Control for Dual Humanoids	Hongjin Chen et.al.	2603.02856	translate	read	null
2026-03-03	Learning Memory-Enhanced Improvement Heuristics for Flexible Job Shop Scheduling	Jiaqi Wang et.al.	2603.02846	translate	read	null
2026-03-03	VSearcher: Long-Horizon Multimodal Search Agent via Reinforcement Learning	Ruiyang Zhang et.al.	2603.02795	translate	read	null
2026-03-03	Generative adversarial imitation learning for robot swarms: Learning from human demonstrations and trained policies	Mattes Kraus et.al.	2603.02783	translate	read	null
2026-03-03	Next Embedding Prediction Makes World Models Stronger	George Bredis et.al.	2603.02765	translate	read	null
2026-03-03	Enhancing User Throughput in Multi-panel mmWave Radio Access Networks for Beam-based MU-MIMO Using a DRL Method	Ramin Hashemi et.al.	2603.02745	translate	read	null
2026-03-03	From “What” to “How”: Constrained Reasoning for Autoregressive Image Generation	Ruxue Yan et.al.	2603.02712	translate	read	null
2026-03-03	Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization	Yueyang Cang et.al.	2603.02701	translate	read	null
2026-03-03	VisionCreator: A Native Visual-Generation Agentic Model with Understanding, Thinking, Planning and Creation	Jinxiang Lai et.al.	2603.02681	translate	read	null
2026-03-03	Watch Your Step: Learning Semantically-Guided Locomotion in Cluttered Environment	Denan Liang et.al.	2603.02657	translate	read	null
2026-03-03	Generalized Per-Agent Advantage Estimation for Multi-Agent Policy Optimization	Seongmin Kim et.al.	2603.02654	translate	read	null
2026-03-03	Improving Diffusion Planners by Self-Supervised Action Gating with Energies	Yuan Lu et.al.	2603.02650	translate	read	null
2026-03-02	Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training	Valentin Lacombe et.al.	2603.02208	translate	read	null
2026-03-02	Tool Verification for Test-Time Reinforcement Learning	Ruotong Liao et.al.	2603.02203	translate	read	null
2026-03-02	Near-Optimal Regret for KL-Regularized Multi-Armed Bandits	Kaixuan Ji et.al.	2603.02155	translate	read	null
2026-03-02	LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards	Guanzheng Chen et.al.	2603.02146	translate	read	null
2026-03-02	Rethinking Camera Choice: An Empirical Study on Fisheye Camera Properties in Robotic Manipulation	Han Xue et.al.	2603.02139	translate	read	null
2026-03-02	Pencil Puzzle Bench: A Benchmark for Multi-Step Verifiable Reasoning	Justin Waugh et.al.	2603.02119	translate	read	null
2026-03-02	ACDC: Adaptive Curriculum Planning with Dynamic Contrastive Control for Goal-Conditioned Reinforcement Learning in Robotic Manipulation	Xuerui Wang et.al.	2603.02104	translate	read	null
2026-03-02	Learning from Synthetic Data Improves Multi-hop Reasoning	Anmol Kabra et.al.	2603.02091	translate	read	null
2026-03-02	Reinforcement Learning-Based Filters for Convection-Dominated Flows: Reference-Free and Reference-Guided Training	Anna Ivagnes et.al.	2603.02086	translate	read	null
2026-03-02	$\pi$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs	Siting Wang et.al.	2603.02083	translate	read	null
2026-03-02	Accelerating PDE Surrogates via RL-Guided Mesh Optimization	Yang Meng et.al.	2603.02066	translate	read	null
2026-03-02	Expanding LLM Agent Boundaries with Strategy-Guided Exploration	Andrew Szot et.al.	2603.02045	translate	read	null
2026-03-02	Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards	Faisal Mohamed et.al.	2603.02008	translate	read	null
2026-03-02	Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection	Yuchen Zhang et.al.	2603.01993	translate	read	null
2026-03-02	CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production	Yixin Nie et.al.	2603.01973	translate	read	null
2026-03-02	CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification	Jinpeng Chen et.al.	2603.01940	translate	read	null
2026-03-02	LaST-VLA: Thinking in Latent Spatio-Temporal Space for Vision-Language-Action in Autonomous Driving	Yuechen Luo et.al.	2603.01928	translate	read	null
2026-03-02	Efficient RLVR Training via Weighted Mutual Information Data Selection	Xinyu Zhou et.al.	2603.01907	translate	read	null
2026-03-02	Visual Bias in Simulated Users: The Impact of Luminance and Contrast on Reinforcement Learning-based Interaction	Hannah Selder et.al.	2603.01901	translate	read	null
2026-03-02	Generative Visual Chain-of-Thought for Image Editing	Zijin Yin et.al.	2603.01893	translate	read	null
2026-03-02	SEAR: Sample Efficient Action Chunking Reinforcement Learning	C. F. Maximilian Nagy et.al.	2603.01891	translate	read	null
2026-03-02	FireRed-OCR Technical Report	Hao Wu et.al.	2603.01840	translate	read	link
2026-03-02	Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport	Harry Amad et.al.	2603.01771	translate	read	null
2026-03-02	Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning	Naoki Shitanda et.al.	2603.01741	translate	read	null
2026-03-02	TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training	Jinluan Yang et.al.	2603.01714	translate	read	null
2026-03-02	Cross-modal Identity Mapping: Minimizing Information Loss in Modality Conversion via Reinforcement Learning	Haonan Jia et.al.	2603.01696	translate	read	null
2026-03-02	MVR: Multi-view Video Reward Shaping for Reinforcement Learning	Lirui Luo et.al.	2603.01694	translate	read	null
2026-03-02	Chain-of-Context Learning: Dynamic Constraint Understanding for Multi-Task VRPs	Shuangchun Gui et.al.	2603.01667	translate	read	null
2026-03-02	Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning	Jiebin Zhang et.al.	2603.01639	translate	read	null
2026-03-02	Learning Thermal-Aware Locomotion Policies for an Electrically-Actuated Quadruped Robot	Letian Qian et.al.	2603.01631	translate	read	null
2026-03-02	ToolRLA: Fine-Grained Reward Decomposition for Tool-Integrated Reinforcement Learning Alignment in Domain-Specific Agents	Pengbo Liu et.al.	2603.01620	translate	read	null
2026-03-02	CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework	Yuexi Du et.al.	2603.01607	translate	read	null
2026-03-02	Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models	Qiyuan Zhang et.al.	2603.01571	translate	read	null
2026-03-02	Investigating Group Relative Policy Optimization for Diffusion Transformer based Text-to-Audio Generation	Yi Gu et.al.	2603.01565	translate	read	null
2026-03-02	LFPO: Likelihood-Free Policy Optimization for Masked Diffusion Models	Chenxing Wei et.al.	2603.01563	translate	read	null
2026-03-02	State-Action Inpainting Diffuser for Continuous Control with Delay	Dongqi Han et.al.	2603.01553	translate	read	null
2026-03-02	GAC: Stabilizing Asynchronous RL Training for LLMs via Gradient Alignment Control	Haofeng Xu et.al.	2603.01501	translate	read	null
2026-03-02	LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning	Chang Yao et.al.	2603.01488	translate	read	null
2026-03-02	Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents	Haojin Yang et.al.	2603.01481	translate	read	null
2026-03-02	Towards Robot Skill Learning and Adaptation with Gaussian Processes	A K M Nadimul Haque et.al.	2603.01480	translate	read	null
2026-03-02	ProtRLSearch: A Multi-Round Multimodal Protein Search Agent with Large Language Models Trained via Reinforcement Learning	Congying Liu et.al.	2603.01464	translate	read	null
2026-03-02	Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning	Shaohuai Liu et.al.	2603.01452	translate	read	null
2026-03-02	Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents	Zhixiang Wang et.al.	2603.01416	translate	read	null
2026-03-02	MIST-RL: Mutation-based Incremental Suite Testing via Reinforcement Learning	Sicheng Zhu et.al.	2603.01409	translate	read	null
2026-03-02	SubstratumGraphEnv: Reinforcement Learning Environment (RLE) for Modeling System Attack Paths	Bahirah Adewunmi et.al.	2603.01340	translate	read	null
2026-03-02	Energy Efficient Traffic Scheduling For Optical LEO Satellite Downlinks	Ethan Fettes et.al.	2603.01334	translate	read	null
2026-03-01	PAC Guarantees for Reinforcement Learning: Sample Complexity, Coverage, and Structure	Joshua Steier et.al.	2603.01309	translate	read	null
2026-03-01	Hybrid TD3: Overestimation Bias Analysis and Stable Policy Optimization for Hybrid Action Space	Thanh-Tuan Tran et.al.	2603.01302	translate	read	null
2026-03-01	When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains	Ahmadreza Jeddi et.al.	2603.01301	translate	read	link
2026-03-01	Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models	Adel Javanmard et.al.	2603.01293	translate	read	null
2026-03-01	Integrating LTL Constraints into PPO for Safe Reinforcement Learning	Maifang Zhang et.al.	2603.01292	translate	read	null
2026-03-01	Beyond Reward: A Bounded Measure of Agent Environment Coupling	Wael Hafez et.al.	2603.01283	translate	read	null
2026-03-01	MOSAIC: A Unified Platform for Cross-Paradigm Comparison and Evaluation of Homogeneous and Heterogeneous Multi-Agent RL, LLM, VLM, and Human Decision-Makers	Abdulhamid M. Mousa et.al.	2603.01260	translate	read	null
2026-03-01	Towards Policy-Adaptive Image Guardrail: Benchmark and Method	Caiyong Piao et.al.	2603.01228	translate	read	null
2026-03-01	Can Thinking Models Think to Detect Hateful Memes?	Mohamed Bayan Kmainasi et.al.	2603.01225	translate	read	null
2026-03-01	Learn Hard Problems During RL with Reference Guided Fine-tuning	Yangzhen Wu et.al.	2603.01223	translate	read	null
2026-03-01	Epistemic Gain, Aleatoric Cost: Uncertainty Decomposition in Multi-Agent Debate for Math Reasoning	Dan Qiao et.al.	2603.01221	translate	read	null
2026-03-01	Reasoning Boosts Opinion Alignment in LLMs	Frédéric Berdoz et.al.	2603.01214	translate	read	null
2026-03-01	PARWiS: Winner determination under shoestring budgets using active pairwise comparisons	Shailendra Bhandari et.al.	2603.01171	translate	read	null
2026-03-01	BeautyGRPO: Aesthetic Alignment for Face Retouching via Dynamic Path Guidance and Fine-Grained Preference Modeling	Jiachen Yang et.al.	2603.01163	translate	read	null
2026-03-01	DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent	Tongzhou Wu et.al.	2603.01152	translate	read	null
2026-03-01	Compact Task-Aligned Imitation Learning for Laboratory Automation	Kanata Suzuki et.al.	2603.01110	translate	read	null
2026-03-01	DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage	Haowen Gao et.al.	2603.01106	translate	read	null
2026-03-01	Feasible Pairings for Decentralized Integral Controllability of Non-Square Systems	Yuhao Tong et.al.	2603.01076	translate	read	null
2026-03-01	How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning	Xiangxiang Zhang et.al.	2603.01070	translate	read	null
2026-03-01	Unleashing VLA Potentials in Autonomous Driving via Explicit Learning from Failures	Yuechen Luo et.al.	2603.01063	translate	read	null
2026-03-01	MM-DeepResearch: A Simple and Effective Multimodal Agentic Search Baseline	Huanjin Yao et.al.	2603.01050	translate	read	null
2026-03-01	HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents	Hongbo Jin et.al.	2603.00977	translate	read	null
2026-03-01	Intent-Context Synergy Reinforcement Learning for Autonomous UAV Decision-Making in Air Combat	Jiahao Fu et.al.	2603.00974	translate	read	null
2026-03-01	Stabilizing Policy Optimization via Logits Convexity	Hongzhan Chen et.al.	2603.00963	translate	read	null
2026-03-01	HierKick: Hierarchical Reinforcement Learning for Vision-Guided Soccer Robot Control	Yizhi Chen et.al.	2603.00948	translate	read	null
2026-03-01	Non-Rectangular Average-Reward Robust MDPs: Optimal Policies and Their Transient Values	Shengbo Wang et.al.	2603.00945	translate	read	null
2026-03-01	Minimalist Compliance Control	Haochen Shi et.al.	2603.00913	translate	read	null
2026-03-01	Principled Fast and Meta Knowledge Learners for Continual Reinforcement Learning	Ke Sun et.al.	2603.00903	translate	read	null
2026-03-01	CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning	Xinyu Zhu et.al.	2603.00889	translate	read	null

(<a href=../Reinforcement_Learning.md>back to Reinforcement Learning</a>)