Reinforcement Learning - 2026-01

Publish Date Title Authors PDF (arXiv ID) Translate Read Code (link or null)
2026-01-30 IRL-DAL: Safe and Adaptive Trajectory Planning for Autonomous Driving via Energy-Guided Diffusion Models Seyed Ahmad Hosseini Miangoleh et.al. 2601.23266 translate read null
2026-01-30 Agile Reinforcement Learning through Separable Neural Architecture Rajib Mostakim et.al. 2601.23225 translate read null
2026-01-30 Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning Xiangyu Zeng et.al. 2601.23224 translate read null
2026-01-30 Med-Scout: Curing MLLMs’ Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training Anglin Liu et.al. 2601.23220 translate read null
2026-01-30 Unsupervised Hierarchical Skill Discovery Damion Harvey et.al. 2601.23156 translate read null
2026-01-30 On Safer Reinforcement Learning Policies for Sedation and Analgesia in Intensive Care Joel Romero-Hernandez et.al. 2601.23154 translate read null
2026-01-30 THINKSAFE: Self-Generated Safety Alignment for Reasoning Models Seanie Lee et.al. 2601.23143 translate read link
2026-01-30 Why GRPO Needs Normalization: A Local-Curvature Perspective on Adaptive Gradients Cheng Ge et.al. 2601.23135 translate read null
2026-01-30 Temporally Coherent Imitation Learning via Latent Action Flow Matching for Robotic Manipulation Wu Songwei et.al. 2601.23087 translate read null
2026-01-30 RN-D: Discretized Categorical Actors with Regularized Networks for On-Policy Reinforcement Learning Yuexin Bian et.al. 2601.23075 translate read null
2026-01-30 From Absolute to Relative: Rethinking Reward Shaping in Group-Based Reinforcement Learning Wenzhe Niu et.al. 2601.23058 translate read null
2026-01-30 Guided by Trajectories: Repairing and Rewarding Tool-Use Trajectories for Tool-Integrated Reasoning Siyu Gong et.al. 2601.23032 translate read null
2026-01-30 Mem-T: Densifying Rewards for Long-Horizon Memory Agents Yanwei Yue et.al. 2601.23014 translate read null
2026-01-30 Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning Xinchen Han et.al. 2601.23010 translate read null
2026-01-30 Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Ximing Lu et.al. 2601.22975 translate read null
2026-01-30 Self-Imitated Diffusion Policy for Efficient and Robust Visual Navigation Runhua Zhang et.al. 2601.22965 translate read null
2026-01-30 SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding Boyin Tan et.al. 2601.22956 translate read null
2026-01-30 MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving Xidong Li et.al. 2601.22930 translate read null
2026-01-30 MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop Xuancheng Li et.al. 2601.22900 translate read null
2026-01-30 PlatoLTL: Learning to Generalize Across Symbols in LTL Instructions for Multi-Task RL Jacques Cloete et.al. 2601.22891 translate read null
2026-01-30 Reinforcement Learning-Based Co-Design and Operation of Chiller and Thermal Energy Storage for Cost-Optimal HVAC Systems Tanay Raghunandan Srinivasa et.al. 2601.22880 translate read null
2026-01-30 Degradation-Aware Frequency Regulation of a Heterogeneous Battery Fleet via Reinforcement Learning Tanay Raghunandan Srinivasa et.al. 2601.22865 translate read null
2026-01-30 The two-nest ants process on triangle-series-parallel graphs Cécile Mailler et.al. 2601.22855 translate read null
2026-01-30 Robust Rigid Body Assembly via Contact-Implicit Optimal Control with Exact Second-Order Derivatives Christian Dietz et.al. 2601.22849 translate read null
2026-01-30 Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment Mathieu Petitbois et.al. 2601.22823 translate read null
2026-01-30 CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning Ji Shi et.al. 2601.22803 translate read null
2026-01-30 Clipping-Free Policy Optimization for Large Language Models Ömer Veysel Çağatan et.al. 2601.22801 translate read null
2026-01-30 TSPO: Breaking the Double Homogenization Dilemma in Multi-turn Search Policy Optimization Shichao Ma et.al. 2601.22776 translate read null
2026-01-30 A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization Shiye Lei et.al. 2601.22718 translate read null
2026-01-30 Real-Time Aligned Reward Model beyond Semantics Zixuan Huang et.al. 2601.22664 translate read null
2026-01-30 Evaluating and Rewarding LALMs for Expressive Role-Play TTS via Mean Continuation Log-Probability Yong Ren et.al. 2601.22661 translate read null
2026-01-30 COBRA++: Enhanced COBRA Optimizer with Augmented Surrogate Pool and Reinforced Surrogate Selection Zepei Yu et.al. 2601.22624 translate read null
2026-01-30 From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents Jiaxuan Gao et.al. 2601.22607 translate read null
2026-01-30 Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR Hao Yi et.al. 2601.22595 translate read null
2026-01-30 MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning Youngeun Kim et.al. 2601.22582 translate read null
2026-01-30 Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation Geonho Leem et.al. 2601.22550 translate read null
2026-01-30 PersonaAct: Simulating Short-Video Users with Personalized Agents for Counterfactual Filter Bubble Auditing Shilong Zhao et.al. 2601.22547 translate read null
2026-01-30 Adapting Reinforcement Learning for Path Planning in Constrained Parking Scenarios Feng Tao et.al. 2601.22545 translate read null
2026-01-30 Detect and Act: Automated Dynamic Optimizer through Meta-Black-Box Optimization Zijian Gao et.al. 2601.22542 translate read null
2026-01-30 One Ring to Rule Them All: Unifying Group-Based RL via Dynamic Power-Mean Geometry Weisong Zhao et.al. 2601.22521 translate read null
2026-01-30 RoboStriker: Hierarchical Decision-Making for Autonomous Humanoid Boxing Kangning Yin et.al. 2601.22517 translate read null
2026-01-30 Mock Worlds, Real Skills: Building Small Agentic Language Models with Synthetic Tasks, Simulated Environments, and Rubric-Based Rewards Yuan-Jay Lü et.al. 2601.22511 translate read null
2026-01-30 DreamVAR: Taming Reinforced Visual Autoregressive Model for High-Fidelity Subject-Driven Image Generation Xin Jiang et.al. 2601.22507 translate read null
2026-01-30 Action-Sufficient Goal Representations Jinu Hyeon et.al. 2601.22496 translate read null
2026-01-30 SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization Jinyang Wu et.al. 2601.22491 translate read null
2026-01-30 RulePlanner: All-in-One Reinforcement Learner for Unifying Design Rules in 3D Floorplanning Ruizhe Zhong et.al. 2601.22476 translate read null
2026-01-30 Continual Policy Distillation from Distributed Reinforcement Learning Teachers Yuxuan Li et.al. 2601.22475 translate read null
2026-01-30 Unrewarded Exploration in Large Language Models Reveals Latent Learning from Psychology Jian Xiong et.al. 2601.22474 translate read null
2026-01-30 HeaPA: Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning Weiqi Wang et.al. 2601.22448 translate read null
2026-01-29 SAIR: Cost-Efficient Multi-Stage ML Pipeline Autoscaling via In-Context Reinforcement Learning Jianchang Su et.al. 2601.22397 translate read null
2026-01-29 Quantum-Inspired Reinforcement Learning for Secure and Sustainable AIoT-Driven Supply Chain Systems Muhammad Bilal Akram Dastagir et.al. 2601.22339 translate read null
2026-01-29 Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning Qi Cao et.al. 2601.22323 translate read null
2026-01-29 Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning Chenxi Liu et.al. 2601.22297 translate read null
2026-01-29 Learning Reward Functions for Cooperative Resilience in Multi-Agent Systems Manuela Chacon-Chamorro et.al. 2601.22292 translate read null
2026-01-29 Aligning Microscopic Vehicle and Macroscopic Traffic Statistics: Reconstructing Driving Behavior from Partial Data Zhihao Zhang et.al. 2601.22242 translate read null
2026-01-29 Smart Walkers in Discrete Space Gianluca Peri et.al. 2601.22235 translate read null
2026-01-29 Latent Spherical Flow Policy for Reinforcement Learning with Combinatorial Actions Lingkai Kong et.al. 2601.22211 translate read null
2026-01-29 Causal Imitation Learning Under Measurement Error and Distribution Shift Shi Bo et.al. 2601.22206 translate read null
2026-01-28 ShellForge: Adversarial Co-Evolution of Webshell Generation and Multi-View Detection for Robust Webshell Defense Yizhong Ding et.al. 2601.22182 translate read null
2026-01-29 Exploring Reasoning Reward Model for Agents Kaixuan Fan et.al. 2601.22154 translate read link
2026-01-29 DynaWeb: Model-Based Reinforcement Learning of Web Agents Hang Ding et.al. 2601.22149 translate read null
2026-01-29 Learning to Dial-a-Ride: A Deep Graph Reinforcement Learning Approach to the Electric Dial-a-Ride Problem Sten Elling Tingstad Jacobsen et.al. 2601.22052 translate read null
2026-01-29 SIA: Symbolic Interpretability for Anticipatory Deep Reinforcement Learning in Network Control MohammadErfan Jabbari et.al. 2601.22044 translate read null
2026-01-29 SymbXRL: Symbolic Explainable Deep Reinforcement Learning for Mobile Networks Abhishek Duttagupta et.al. 2601.22024 translate read null
2026-01-29 Geometry of Drifting MDPs with Path-Integral Stability Certificates Zuyuan Zhang et.al. 2601.21991 translate read null
2026-01-29 Elign: Equivariant Diffusion Model Alignment from Foundational Machine Learning Force Fields Yunyang Li et.al. 2601.21985 translate read null
2026-01-29 Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic Shuo Liu et.al. 2601.21972 translate read null
2026-01-29 MoE-ACT: Improving Surgical Imitation Learning Policies through Supervised Mixture-of-Experts Lorenzo Mazza et.al. 2601.21971 translate read null
2026-01-29 Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding Yifan Zhu et.al. 2601.21969 translate read null
2026-01-29 OVD: On-policy Verbal Distillation Jing Xiong et.al. 2601.21968 translate read null
2026-01-29 Optimistic Transfer under Task Shift via Bellman Alignment Jinhang Chai et.al. 2601.21924 translate read null
2026-01-29 Self-Compression of Chain-of-Thought via Multi-Agent Reinforcement Learning Yiqun Chen et.al. 2601.21919 translate read null
2026-01-29 ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation Zhao Wang et.al. 2601.21912 translate read null
2026-01-29 From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning Shaojie Wang et.al. 2601.21909 translate read null
2026-01-29 Acquiring Human-Like Mechanics Intuition from Scarce Observations via Deep Reinforcement Learning Jingruo Peng et.al. 2601.21881 translate read null
2026-01-29 WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents Yao Zhang et.al. 2601.21872 translate read null
2026-01-29 Spatiotemporal Continual Learning for Mobile Edge UAV Networks: Mitigating Catastrophic Forgetting Chuan-Chi Lai et.al. 2601.21861 translate read null
2026-01-29 Self-Adaptive Probabilistic Skyline Query Processing in Distributed Edge Computing via Deep Reinforcement Learning Chuan-Chi Lai et.al. 2601.21855 translate read null
2026-01-29 READY: Reward Discovery for Meta-Black-Box Optimization Zechuan Huang et.al. 2601.21847 translate read null
2026-01-29 Constrained Meta Reinforcement Learning with Provable Test-Time Safety Tingting Ni et.al. 2601.21845 translate read null
2026-01-29 Distribution-Aware Reward Estimation for Test-Time Reinforcement Learning Bodong Du et.al. 2601.21804 translate read null
2026-01-29 Error Amplification Limits ANN-to-SNN Conversion in Continuous Control Zijie Xu et.al. 2601.21778 translate read null
2026-01-29 OneMall: One Model, More Scenarios – End-to-End Generative Recommender Family at Kuaishou E-Commerce Kun Zhang et.al. 2601.21770 translate read null
2026-01-29 Influence Guided Sampling for Domain Adaptation of Text Retrievers Meet Doshi et.al. 2601.21759 translate read null
2026-01-29 Language-based Trial and Error Falls Behind in the Era of Experience Haoyu Wang et.al. 2601.21754 translate read link
2026-01-29 Epistemic Context Learning: Building Trust the Right Way in LLM-Based Multi-Agent Systems Ruiwen Zhou et.al. 2601.21742 translate read null
2026-01-29 Mixed-Precision Training and Compilation for RRAM-based Computing-in-Memory Accelerators Rebecca Pelke et.al. 2601.21737 translate read null
2026-01-29 When does predictive inverse dynamics outperform behavior cloning? Lukas Schäfer et.al. 2601.21718 translate read null
2026-01-29 Disentangling perception and reasoning for improving data efficiency in learning cloth manipulation without demonstrations Donatien Delehelle et.al. 2601.21713 translate read null
2026-01-29 TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning Huiyuan Lai et.al. 2601.21711 translate read null
2026-01-29 Can David Beat Goliath? On Multi-Hop Reasoning with Resource-Constrained Agents Hojae Han et.al. 2601.21699 translate read null
2026-01-29 BAP-SRL: Bayesian Adaptive Priority Safe Reinforcement Learning for Vehicle Motion Planning at Mixed Traffic Intersections Yuansheng Lian et.al. 2601.21679 translate read null
2026-01-29 Expected Return Causes Outcome-Level Mode Collapse in Reinforcement Learning and How to Fix It with Inverse Probability Scaling Abhijeet Sinha et.al. 2601.21669 translate read null
2026-01-29 Reinforcement Learning for Adaptive Composition of Quantum Circuit Optimisation Passes Daniel Mills et.al. 2601.21629 translate read null
2026-01-29 PathReasoner-R1: Instilling Structured Reasoning into Pathology Vision-Language Model via Knowledge-Guided Policy Optimization Songhan Jiang et.al. 2601.21617 translate read null
2026-01-29 RecNet: Self-Evolving Preference Propagation for Agentic Recommender Systems Bingqian Li et.al. 2601.21609 translate read null
2026-01-29 Beyond Imitation: Reinforcement Learning for Active Latent Planning Zhi Zheng et.al. 2601.21598 translate read link
2026-01-29 Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening Xiaotong Ji et.al. 2601.21590 translate read null
2026-01-29 Signal-Adaptive Trust Regions for Gradient-Free Optimization of Recurrent Spiking Neural Networks Jinhao Li et.al. 2601.21572 translate read null
2026-01-29 ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas Xiaoyu Tian et.al. 2601.21558 translate read link
2026-01-29 Training slow silicon neurons to control extremely fast robots with spiking reinforcement learning Irene Ambrosini et.al. 2601.21548 translate read null
2026-01-29 Explicit Credit Assignment through Local Rewards and Dependence Graphs in Multi-Agent Reinforcement Learning Bang Giang Le et.al. 2601.21523 translate read null
2026-01-29 ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment Xiuyu Li et.al. 2601.21484 translate read null
2026-01-29 Mean-Field Control on Sparse Graphs: From Local Limits to GNNs via Neighborhood Distributions Tobias Schmidt et.al. 2601.21477 translate read null
2026-01-29 SOUP: Token-level Single-sample Mix-policy Reinforcement Learning for Large Language Models Lei Yang et.al. 2601.21476 translate read null
2026-01-29 MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning Yaorui Shi et.al. 2601.21468 translate read null
2026-01-29 HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing Chengyu Du et.al. 2601.21459 translate read null
2026-01-29 Mitigating Overthinking in Large Reasoning Models via Difficulty-aware Reinforcement Learning Qian Wan et.al. 2601.21418 translate read null
2026-01-29 Towards Space-Based Environmentally-Adaptive Grasping Leonidas Askianakis et.al. 2601.21394 translate read null
2026-01-29 Shaping the learning signal in a combined Q-learning rule to improve structured cooperation Chunpeng Du et.al. 2601.21392 translate read null
2026-01-29 Intrinsic Reward Policy Optimization for Sparse-Reward Environments Minjae Cho et.al. 2601.21391 translate read null
2026-01-29 Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control Weidong Huang et.al. 2601.21363 translate read null
2026-01-29 Factored Causal Representation Learning for Robust Reward Modeling in RLHF Yupei Yang et.al. 2601.21350 translate read null
2026-01-29 Self-Improving Pretraining: using post-trained models to pretrain better models Ellen Xiaoqing Tan et.al. 2601.21343 translate read null
2026-01-29 Heterogeneous Vertiport Selection Optimization for On-Demand Air Taxi Services: A Deep Reinforcement Learning Approach Aoyu Pang et.al. 2601.21316 translate read null
2026-01-29 Few-Shot Learning for Dynamic Operations of Automated Electric Taxi Fleets under Evolving Charging Infrastructure: A Meta-Deep Reinforcement Learning Approach Xiaozhuang Li et.al. 2601.21312 translate read null
2026-01-29 The Surprising Difficulty of Search in Model-Based Reinforcement Learning Wei-Di Chang et.al. 2601.21306 translate read null
2026-01-29 EGAM: Extended Graph Attention Model for Solving Routing Problems Licheng Wang et.al. 2601.21281 translate read null
2026-01-29 Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels Micah Rentschler et.al. 2601.21268 translate read null
2026-01-29 Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification Yiju Guo et.al. 2601.21244 translate read null
2026-01-29 Intelli-Planner: Towards Customized Urban Planning via Large Language Model Empowered Reinforcement Learning Xixian Yong et.al. 2601.21212 translate read null
2026-01-29 When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning Wei Wen et.al. 2601.21208 translate read null
2026-01-29 Do Reasoning Models Enhance Embedding Models? Wun Yu Chan et.al. 2601.21192 translate read null
2026-01-28 Safety Generalization Under Distribution Shift in Safe Reinforcement Learning: A Diabetes Testbed Minjae Kwon et.al. 2601.21094 translate read link
2026-01-28 Deep Reinforcement Learning for Fault-Adaptive Routing in Eisenstein-Jacobi Interconnection Topologies Mohammad Walid Charrwi et.al. 2601.21090 translate read null
2026-01-28 OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence Jarrod Barnes et.al. 2601.21083 translate read link
2026-01-28 Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report Zhuoran Yang et.al. 2601.21051 translate read null
2026-01-28 Log2Motion: Biomechanical Motion Synthesis from Touch Logs Michał Patryk Miazga et.al. 2601.21043 translate read null
2026-01-28 SIGMA-PPG: Statistical-prior Informed Generative Masking Architecture for PPG Foundation Model Zongheng Guo et.al. 2601.21031 translate read link
2026-01-28 Distributional Active Inference Abdullah Akgül et.al. 2601.20985 translate read null
2026-01-28 End-to-end example-based sim-to-real RL policy transfer based on neural stylisation with application to robotic cutting Jamie Hathaway et.al. 2601.20846 translate read null
2026-01-28 Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning Minwu Kim et.al. 2601.20829 translate read link
2026-01-28 Reinforcement Learning via Self-Distillation Jonas Hübotter et.al. 2601.20802 translate read link
2026-01-28 SERA: Soft-Verified Efficient Repository Agents Ethan Shen et.al. 2601.20789 translate read link
2026-01-28 Less is More: Clustered Cross-Covariance Control for Offline RL Nan Qiao et.al. 2601.20765 translate read null
2026-01-28 GraphAllocBench: A Flexible Benchmark for Preference-Conditioned Multi-Objective Policy Learning Zhiheng Jiang et.al. 2601.20753 translate read null
2026-01-28 Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions Raul de la Rosa et.al. 2601.20714 translate read null
2026-01-28 One Step Is Enough: Dispersive MeanFlow Policy Optimization Guowei Zou et.al. 2601.20701 translate read null
2026-01-28 Grover’s Search-Inspired Quantum Reinforcement Learning for Massive MIMO User Scheduling Ruining Fan et.al. 2601.20688 translate read null
2026-01-28 Positive-Unlabeled Reinforcement Learning Distillation for On-Premise Small Models Zhiqiang Kou et.al. 2601.20687 translate read null
2026-01-28 GPO: Growing Policy Optimization for Legged Robot Locomotion and Whole-Body Control Shuhao Liao et.al. 2601.20668 translate read null
2026-01-28 Deep Learning based Three-stage Solution for ISAC Beamforming Optimization Qian Gao et.al. 2601.20667 translate read null
2026-01-28 Integrated Sensing and Communication for Segmented Waveguide-Enabled Pinching Antenna Systems Qian Gao et.al. 2601.20658 translate read null
2026-01-28 RL based Beamforming Optimization for 3D Pinching Antenna assisted ISAC Systems Qian Gao et.al. 2601.20654 translate read null
2026-01-28 P2S: Probabilistic Process Supervision for General-Domain Reasoning Question Answering Wenlin Zhong et.al. 2601.20649 translate read null
2026-01-28 Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Yanqi Dai et.al. 2601.20614 translate read null
2026-01-28 Ranking-aware Reinforcement Learning for Ordinal Ranking Aiming Hao et.al. 2601.20585 translate read null
2026-01-28 Inequality in Congestion Games with Learning Agents Dimitris Michailidis et.al. 2601.20578 translate read null
2026-01-28 Fair Recourse for All: Ensuring Individual and Group Fairness in Counterfactual Explanations Fatima Ezzeddine et.al. 2601.20449 translate read null
2026-01-28 PEARL: Plan Exploration and Adaptive Reinforcement Learning for Multihop Tool Use Qihao Wang et.al. 2601.20439 translate read null
2026-01-28 MARE: Multimodal Alignment and Reinforcement for Explainable Deepfake Detection via Vision-Language Models Wenbo Xu et.al. 2601.20433 translate read null
2026-01-28 Reinforcement Learning for Dividend Optimization in Partially Observed Regime-Switching Diffusion Model Zhongqin Gao et.al. 2601.20387 translate read null
2026-01-28 PsychePass: Calibrating LLM Therapeutic Competence via Trajectory-Anchored Tournaments Zhuang Chen et.al. 2601.20330 translate read null
2026-01-28 CE-RM: A Pointwise Generative Reward Model Optimized via Two-Stage Rollout and Unified Criteria Xinyu Hu et.al. 2601.20327 translate read null
2026-01-28 Endogenous Reprompting: Self-Evolving Cognitive Alignment for Unified Multimodal Models Zhenchen Tang et.al. 2601.20305 translate read null
2026-01-28 Proactive SFC Provisioning with Forecast-Driven DRL in Data Centers Parisa Fard Moshiri et.al. 2601.20229 translate read null
2026-01-28 Scaling Medical Reasoning Verification via Tool-Integrated Reinforcement Learning Hang Zhang et.al. 2601.20221 translate read null
2026-01-28 Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning Jinyang Wu et.al. 2601.20209 translate read null
2026-01-28 Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery Zhipeng Zhang et.al. 2601.20193 translate read null
2026-01-27 Rewarding Intellectual Humility Learning When Not To Answer In Large Language Models Abha Jha et.al. 2601.20126 translate read null
2026-01-27 A Reinforcement Learning Based Universal Sequence Design for Polar Codes David Kin Wai Ho et.al. 2601.20118 translate read null
2026-01-27 In-Context Reinforcement Learning From Suboptimal Historical Data Juncheng Dong et.al. 2601.20116 translate read null
2026-01-27 Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis Darshan Deshpande et.al. 2601.20103 translate read null
2026-01-27 Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery Meng Xin et.al. 2601.20088 translate read null
2026-01-27 Techno-economic optimization of a heat-pipe microreactor, part II: multi-objective optimization analysis Paul Seurin et.al. 2601.20079 translate read null
2026-01-27 Distributional value gradients for stochastic environments Baptiste Debes et.al. 2601.20071 translate read null
2026-01-27 Exploring the holographic entropy cone via reinforcement learning Temple He et.al. 2601.19979 translate read null
2026-01-27 E2HiL: Entropy-Guided Sample Selection for Efficient Real-World Human-in-the-Loop Reinforcement Learning Haoyuan Deng et.al. 2601.19969 translate read null
2026-01-27 Self-Distillation Enables Continual Learning Idan Shenfeld et.al. 2601.19897 translate read null
2026-01-27 A Latent Space Framework for Modeling Transient Engine Emissions Using Joint Embedding Predictive Architectures Ganesh Sundaram et.al. 2601.19822 translate read null
2026-01-27 Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals Octavio Pappalardo et.al. 2601.19810 translate read null
2026-01-27 Reimagining Peer Review Process Through Multi-Agent Mechanism Design Ahmad Farooq et.al. 2601.19778 translate read null
2026-01-27 Reimagining Social Robots as Recommender Systems: Foundations, Framework, and Applications Jin Huang et.al. 2601.19761 translate read null
2026-01-27 Improving Policy Exploitation in Online Reinforcement Learning with Instant Retrospect Action Gong Gao et.al. 2601.19720 translate read null
2026-01-27 Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow Yunyue Wei et.al. 2601.19707 translate read null
2026-01-27 AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion Tianyue Jiang et.al. 2601.19697 translate read null
2026-01-27 Video-KTR: Reinforcing Video Reasoning via Key Token Attribution Ziyue Wang et.al. 2601.19686 translate read null
2026-01-27 Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning Tongxi Wang et.al. 2601.19624 translate read null
2026-01-27 R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning Zhizheng Jiang et.al. 2601.19620 translate read null
2026-01-27 Safe Exploration via Policy Priors Manuel Wendl et.al. 2601.19612 translate read null
2026-01-27 LLM-Enhanced Reinforcement Learning for Long-Term User Satisfaction in Interactive Recommendation Chongjun Xia et.al. 2601.19585 translate read null
2026-01-27 Bridging Information Asymmetry: A Hierarchical Framework for Deterministic Blind Face Restoration Zhengjian Yao et.al. 2601.19506 translate read null
2026-01-27 Reinforcement Learning Goal-Reaching Control with Guaranteed Lyapunov-Like Stabilizer for Mobile Robots Mehdi Heydari Shahna et.al. 2601.19499 translate read null
2026-01-27 APC-RL: Exceeding Data-Driven Behavior Priors with Adaptive Policy Composition Finn Rietz et.al. 2601.19452 translate read null
2026-01-27 OSIRIS: Bridging Analog Circuit Design and Machine Learning with Scalable Dataset Generation Giuseppe Chiari et.al. 2601.19439 translate read null
2026-01-27 Task-Centric Policy Optimization from Misaligned Motion Priors Ziang Zheng et.al. 2601.19411 translate read null
2026-01-27 CHEHAB RL: Learning to Optimize Fully Homomorphic Encryption Computations Bilel Sefsaf et.al. 2601.19367 translate read null
2026-01-27 From Observations to Events: Event-Aware World Model for Reinforcement Learning Zhao-Han Peng et.al. 2601.19336 translate read null
2026-01-27 Innovator-VL: A Multimodal Large Language Model for Scientific Discovery Zichen Wen et.al. 2601.19325 translate read null
2026-01-27 Reinforced Rate Control for Neural Video Compression via Inter-Frame Rate-Distortion Awareness Wuyang Cong et.al. 2601.19293 translate read null
2026-01-27 Model-Free Output Feedback Stabilization via Policy Gradient Methods Ankang Zhang et.al. 2601.19284 translate read null
2026-01-27 Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning Kishan Panaganti et.al. 2601.19280 translate read null
2026-01-27 Reinforcement Learning for Enhanced Advanced QEC Architecture Decoding Yidong Zhou et.al. 2601.19279 translate read null
2026-01-27 iFAN Ecosystem: A Unified AI, Digital Twin, Cyber-Physical Security, and Robotics Environment for Advanced Nuclear Simulation and Operations Youndo Do et.al. 2601.19234 translate read null
2026-01-27 Structure-based RNA Design by Step-wise Optimization of Latent Diffusion Model Qi Si et.al. 2601.19232 translate read null
2026-01-27 Towards Pixel-Level VLM Perception via Simple Points Prediction Tianhui Song et.al. 2601.19228 translate read null
2026-01-27 Exploring Weaknesses in Function Call Models via Reinforcement Learning: An Adversarial Data Augmentation Approach Weiran Guo et.al. 2601.19122 translate read null
2026-01-27 Glance and Focus Reinforcement for Pan-cancer Screening Linshan Wu et.al. 2601.19103 translate read null
2026-01-27 Reward Engineering for Reinforcement Learning in Software Tasks Md Rayhanul Masud et.al. 2601.19100 translate read null
2026-01-27 m2sv: A Scalable Benchmark for Map-to-Street-View Spatial Reasoning Yosub Shin et.al. 2601.19099 translate read null
2026-01-27 Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback Siddhant Arora et.al. 2601.19063 translate read null
2026-01-26 A Unifying View of Coverage in Linear Off-Policy Evaluation Philip Amortila et.al. 2601.19030 translate read null
2026-01-26 Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning Haolin Liu et.al. 2601.18984 translate read null
2026-01-26 Reinforcement Learning for Quantum Technology Marin Bukov et.al. 2601.18953 translate read null
2026-01-26 Vector-Valued Distributional Reinforcement Learning Policy Evaluation: A Hilbert Space Embedding Approach Mehrdad Mohammadi et.al. 2601.18952 translate read null
2026-01-26 Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration Hwanwoo Kim et.al. 2601.18907 translate read null
2026-01-26 Analysis of Control Bellman Residual Minimization for Markov Decision Problem Donghwan Lee et.al. 2601.18840 translate read null
2026-01-26 Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes Amrith Setlur et.al. 2601.18795 translate read null
2026-01-26 Multi-Objective Reinforcement Learning for Efficient Tactical Decision Making for Trucks in Highway Traffic Deepthi Pathare et.al. 2601.18783 translate read null
2026-01-26 POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration Yuxiao Qu et.al. 2601.18779 translate read null
2026-01-26 Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability Shobhita Sundaram et.al. 2601.18778 translate read null
2026-01-26 Dep-Search: Learning Dependency-Aware Reasoning Traces with Persistent Memory Yanming Liu et.al. 2601.18771 translate read null
2026-01-26 Trust, Don’t Trust, or Flip: Robust Preference-Based Reinforcement Learning with Multi-Expert Feedback Seyed Amir Hosseini et.al. 2601.18751 translate read null
2026-01-26 Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models Siyan Zhao et.al. 2601.18734 translate read null
2026-01-26 Reflect: Transparent Principle-Guided Reasoning for Constitutional Alignment at Scale Henry Bell et.al. 2601.18730 translate read null
2026-01-26 Trustworthy Evaluation of Robotic Manipulation: A New Benchmark and AutoEval Methods Mengyuan Liu et.al. 2601.18723 translate read null
2026-01-26 Health-SCORE: Towards Scalable Rubrics for Improving Health-LLMs Zhichao Yang et.al. 2601.18706 translate read null
2026-01-26 ART for Diffusion Sampling: A Reinforcement Learning Approach to Timestep Schedule Yilie Huang et.al. 2601.18681 translate read null
2026-01-26 AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning Mingyang Song et.al. 2601.18631 translate read null
2026-01-26 Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning Yingxiao Huo et.al. 2601.18626 translate read null
2026-01-26 Learning long term climate-resilient transport adaptation pathways under direct and indirect flood impacts using reinforcement learning Miguel Costa et.al. 2601.18586 translate read null
2026-01-26 From Classification to Ranking: Enhancing LLM Reasoning Capabilities for MBTI Personality Detection Yuan Cao et.al. 2601.18582 translate read null
2026-01-26 K-Myriad: Jump-starting reinforcement learning with unsupervised parallel agents Vincenzo De Paola et.al. 2601.18580 translate read null
2026-01-26 GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning Kaixun Jiang et.al. 2601.18543 translate read null
2026-01-26 From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation Yuxin Jiang et.al. 2601.18533 translate read null
2026-01-26 Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates Yibo Li et.al. 2601.18510 translate read null
2026-01-26 Enhancing Control Policy Smoothness by Aligning Actions with Predictions from Preceding States Kyoleen Kwak et.al. 2601.18479 translate read null
2026-01-26 OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents Yuhang Zhou et.al. 2601.18467 translate read null
2026-01-26 Deep Reinforcement Learning for Hybrid RIS Assisted MIMO Communications Phuong Nam Tran et.al. 2601.18453 translate read null
2026-01-26 Emergent Cooperation in Quantum Multi-Agent Reinforcement Learning Using Communication Michael Kölle et.al. 2601.18419 translate read null
2026-01-26 daVinci-Dev: Agent-native Mid-training for Software Engineering Ji Zeng et.al. 2601.18418 translate read null
2026-01-26 AI Agent for Reverse-Engineering Legacy Finite-Difference Code and Translating to Devito Yinghan Hou et.al. 2601.18381 translate read null
2026-01-26 Temp-R1: A Unified Autonomous Agent for Complex Temporal KGQA via Reverse Curriculum Reinforcement Learning Zhaoyan Gong et.al. 2601.18296 translate read null
2026-01-26 Reinforcement Learning with Distributed MPC for Fuel-Efficient Platoon Control with Discrete Gear Transitions Samuel Mallick et.al. 2601.18294 translate read null
2026-01-26 TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment Zhewen Tan et.al. 2601.18292 translate read null
2026-01-26 VissimRL: A Multi-Agent Reinforcement Learning Framework for Traffic Signal Control Based on Vissim Hsiao-Chuan Chang et.al. 2601.18284 translate read null
2026-01-26 Reflecting Twice before Speaking with Empathy: Self-Reflective Alternating Inference for Empathy-Aware End-to-End Spoken Dialogue Yuhang Jia et.al. 2601.18281 translate read null
2026-01-26 ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants Pei Wang et.al. 2601.18225 translate read null
2026-01-26 Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents Zhihan Liu et.al. 2601.18217 translate read null
2026-01-26 PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR James Burgess et.al. 2601.18207 translate read null
2026-01-26 QualiRAG: Retrieval-Augmented Generation for Visual Quality Understanding Linhan Cao et.al. 2601.18195 translate read null
2026-01-26 FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning Zhaopeng Qiu et.al. 2601.18150 translate read null
2026-01-26 Exact Minimum-Volume Confidence Set Intersection for Multinomial Outcomes Heguang Lin et.al. 2601.18145 translate read null
2026-01-26 Enhance the Safety in Reinforcement Learning by ADRC Lagrangian Methods Mingxu Zhang et.al. 2601.18142 translate read null
2026-01-26 Beyond Static Datasets: Robust Offline Policy Optimization via Vetted Synthetic Transitions Pedram Agand et.al. 2601.18107 translate read null
2026-01-26 Diffusion Model-based Reinforcement Learning for Version Age of Information Scheduling: Average and Tail-Risk-Sensitive Control Haoyuan Pan et.al. 2601.18069 translate read null
2026-01-23 Autonomous Optical Alignment of Satellite-Based Entanglement Sources using Reinforcement Learning Andrzej Gajewski et.al. 2601.16968 translate read null
2026-01-23 The Trajectory Alignment Coefficient in Two Acts: From Reward Tuning to Reward Learning Calarina Muslimani et.al. 2601.16906 translate read null
2026-01-23 Boosting Deep Reinforcement Learning with Semantic Knowledge for Robotic Manipulators Lucía Güitta-López et.al. 2601.16866 translate read null
2026-01-23 Reasoning Promotes Robustness in Theory of Mind Tasks Ian B. de Haan et.al. 2601.16853 translate read null
2026-01-23 LongCat-Flash-Thinking-2601 Technical Report Meituan LongCat Team et.al. 2601.16725 translate read null
2026-01-23 Adaptive Reinforcement and Model Predictive Control Switching for Safe Human-Robot Cooperative Navigation Ning Liu et.al. 2601.16686 translate read null
2026-01-23 Sim-to-Real Transfer via a Style-Identified Cycle Consistent Generative Adversarial Network: Zero-Shot Deployment on Robotic Manipulators through Visual Domain Adaptation Lucía Güitta-López et.al. 2601.16677 translate read null
2026-01-23 A Cognitive Framework for Autonomous Agents: Toward Human-Inspired Design Francesco Guidi et.al. 2601.16648 translate read null
2026-01-23 Zero-Shot MARL Benchmark in the Cyber-Physical Mobility Lab Julius Beerwerth et.al. 2601.16578 translate read null
2026-01-23 Spiking Neural Networks for Communication Systems: Encoding Schemes, Learning Algorithms, and Equalization Techniques Eike-Manuel Edelmann et.al. 2601.16550 translate read null
2026-01-23 UAV-Assisted Joint Data Collection and Wireless Power Transfer for Batteryless Sensor Networks Wen Zhang et.al. 2601.16533 translate read null
2026-01-23 Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic Yichuan Ma et.al. 2601.16486 translate read null
2026-01-23 FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning Haoxu Wang et.al. 2601.16483 translate read null
2026-01-23 Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go Yichuan Ma et.al. 2601.16447 translate read null
2026-01-23 Endless Terminals: Scaling RL Environments for Terminal Agents Kanishk Gandhi et.al. 2601.16443 translate read link
2026-01-23 Reinforcement Learning-Based Energy-Aware Coverage Path Planning for Precision Agriculture Beining Wu et.al. 2601.16405 translate read null
2026-01-23 Towards a Theoretical Understanding to the Generalization of RLHF Zhaochun Li et.al. 2601.16403 translate read null
2026-01-23 Clarify or Answer: Reinforcement Learning for Agentic VQA with Context Under-specification Zongwan Cao et.al. 2601.16400 translate read null
2026-01-23 A Regularized Actor-Critic Algorithm for Bi-Level Reinforcement Learning Sihan Zeng et.al. 2601.16399 translate read null
2026-01-22 LLM-in-Sandbox Elicits General Agentic Intelligence Daixuan Cheng et.al. 2601.16206 translate read link
2026-01-22 Learning to Discover at Test Time Mert Yuksekgonul et.al. 2601.16175 translate read link
2026-01-22 Structured Hints for Sample-Efficient Lean Theorem Proving Zachary Burton et.al. 2601.16172 translate read null
2026-01-22 Efficiently Learning Robust Torque-based Locomotion Through Reinforcement with Model-Based Supervision Yashuai Yan et.al. 2601.16109 translate read null
2026-01-22 SAMTok: Representing Any Mask with Two Words Yikang Zhou et.al. 2601.16093 translate read link
2026-01-22 Dynamic Tactile Sensing System and Soft Actor Critic Reinforcement Learning for Inclusion Characterization John Bannan et.al. 2601.16061 translate read null
2026-01-22 Keyframe-Based Feed-Forward Visual Odometry Weichen Dai et.al. 2601.16020 translate read null
2026-01-22 PUMA: Perception-driven Unified Foothold Prior for Mobility Augmented Quadruped Parkour Liang Wang et.al. 2601.15995 translate read null
2026-01-22 Decoupling Return-to-Go for Efficient Decision Transformer Yongyi Wang et.al. 2601.15953 translate read null
2026-01-22 Off-Policy Actor-Critic with Sigmoid-Bounded Entropy for Real-World Robot Learning Xiefeng Wu et.al. 2601.15761 translate read null
2026-01-22 PhysProver: Advancing Automatic Theorem Proving for Physics Hanning Zhang et.al. 2601.15737 translate read null
2026-01-22 Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind Zhitao He et.al. 2601.15715 translate read null
2026-01-22 D-Optimality-Guided Reinforcement Learning for Efficient Open-Loop Calibration of a 3-DOF Ankle Rehabilitation Robot Qifan Hu et.al. 2601.15707 translate read null
2026-01-22 From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models Jiaxin Zhang et.al. 2601.15690 translate read null
2026-01-22 Performance-guided Reinforced Active Learning for Object Detection Zhixuan Liang et.al. 2601.15688 translate read null
2026-01-22 EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning Dingdong Wang et.al. 2601.15668 translate read null
2026-01-22 Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors Zhiwei Zhang et.al. 2601.15625 translate read null
2026-01-22 Explainable Deepfake Detection with RL Enhanced Self-Blended Images Ning Jiang et.al. 2601.15624 translate read null
2026-01-22 AION: Aerial Indoor Object-Goal Navigation Using Dual-Policy Reinforcement Learning Zichen Yan et.al. 2601.15614 translate read null
2026-01-22 When Sharpening Becomes Collapse: Sampling Bias and Semantic Coupling in RL with Verifiable Rewards Mingyuan Fan et.al. 2601.15609 translate read null
2026-01-22 A Mobile Magnetic Manipulation Platform for Gastrointestinal Navigation with Deep Reinforcement Learning Control Zhifan Yan et.al. 2601.15545 translate read null
2026-01-21 Non-Stationary Functional Bilevel Optimization Jason Bohne et.al. 2601.15363 translate read null
2026-01-21 Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing Xiang Li et.al. 2601.15356 translate read null
2026-01-21 Statistical Reinforcement Learning in the Real World: A Survey of Challenges and Future Directions Asim H. Gazi et.al. 2601.15353 translate read null
2026-01-20 ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation Zhebo Wang et.al. 2601.15330 translate read null
2026-01-21 The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models Zanlin Ni et.al. 2601.15165 translate read link
2026-01-21 Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning Yuval Kansal et.al. 2601.15160 translate read null
2026-01-21 Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data Yuval Ran-Milo et.al. 2601.15158 translate read null
2026-01-21 CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning Tianshi Xu et.al. 2601.15141 translate read null
2026-01-21 Vehicle Routing with Finite Time Horizon using Deep Reinforcement Learning with Improved Network Embedding Ayan Maity et.al. 2601.15131 translate read null
2026-01-21 Memory Retention Is Not Enough to Master Memory Tasks in Reinforcement Learning Oleg Shchendrigin et.al. 2601.15086 translate read null
2026-01-21 A Curriculum-Based Deep Reinforcement Learning Framework for the Electric Vehicle Routing Problem Mertcan Daysalilar et.al. 2601.15038 translate read null
2026-01-21 Plug-and-Play Benchmarking of Reinforcement Learning Algorithms for Large-Scale Flow Control Jannis Becktepe et.al. 2601.15015 translate read null
2026-01-21 Improving Regret Approximation for Unsupervised Dynamic Environment Generation Harry Mead et.al. 2601.14957 translate read null
2026-01-21 Language-Coupled Reinforcement Learning for Multilingual Retrieval-Augmented Generation Rui Qi et.al. 2601.14896 translate read null
2026-01-21 What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study Keyu Lv et.al. 2601.14888 translate read null
2026-01-21 CI4A: Semantic Component Interfaces for Agents Empowering Web Automation Zhi Qiu et.al. 2601.14790 translate read null
2026-01-21 ReinPath: A Multimodal Reinforcement Learning Approach for Pathology Kangcheng Zhou et.al. 2601.14757 translate read null
2026-01-21 PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning Yao Lu et.al. 2601.14716 translate read null
2026-01-21 DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs Mingxuan Song et.al. 2601.14711 translate read null
2026-01-21 Case-Guided Sequential Assay Planning in Drug Discovery Tianchi Chen et.al. 2601.14710 translate read null
2026-01-21 Proximal Policy Optimization with Evolutionary Mutations Casimir Czworkowski et.al. 2601.14705 translate read null
2026-01-21 DARL: Encouraging Diverse Answers for General Reasoning without Verifiers Chongxuan Huang et.al. 2601.14700 translate read null
2026-01-21 CoScale-RL: Efficient Post-Training by Co-Scaling Data and Computation Yutong Chen et.al. 2601.14695 translate read null
2026-01-21 Beyond Error-Based Optimization: Experience-Driven Symbolic Regression with Goal-Conditioned Reinforcement Learning Jianwen Sun et.al. 2601.14693 translate read null
2026-01-21 FARE: Fast-Slow Agentic Robotic Exploration Shuhao Liao et.al. 2601.14681 translate read null
2026-01-21 MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks Zixuan Ke et.al. 2601.14652 translate read null
2026-01-21 SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation Xichen Zhang et.al. 2601.14615 translate read null
2026-01-21 Learning Consistent Taxonomic Classification through Hierarchical Reasoning Zhenghong Li et.al. 2601.14610 translate read null
2026-01-21 Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education Unggi Lee et.al. 2601.14560 translate read null
2026-01-20 Report for NSF Workshop on AI for Electronic Design Automation Deming Chen et.al. 2601.14541 translate read null
2026-01-20 Towards Execution-Grounded Automated AI Research Chenglei Si et.al. 2601.14525 translate read link
2026-01-20 Large Language Model-Powered Evolutionary Code Optimization on a Phylogenetic Tree Leyi Zhao et.al. 2601.14523 translate read null
2026-01-20 Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow Haocheng Xi et.al. 2601.14243 translate read null
2026-01-20 Spatiotemporal Wildfire Prediction and Reinforcement Learning for Helitack Suppression Shaurya Mathur et.al. 2601.14238 translate read null
2026-01-20 Q-learning with Adjoint Matching Qiyang Li et.al. 2601.14234 translate read link
2026-01-20 KAGE-Bench: Fast Known-Axis Visual Generalization Evaluation for Reinforcement Learning Egor Cherepanov et.al. 2601.14232 translate read link
2026-01-20 Attention-Based Offline Reinforcement Learning and Clustering for Interpretable Sepsis Treatment Punit Kumar et.al. 2601.14228 translate read null
2026-01-20 InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning Matthew Y. R. Yang et.al. 2601.14209 translate read null
2026-01-20 Differentiated Pickup Point Offering for Emission Reduction in Last-Mile Delivery Albina Galiullina et.al. 2601.14196 translate read null
2026-01-20 Toward Efficient Agents: Memory, Tool learning, and Planning Xiaofang Yang et.al. 2601.14192 translate read link
2026-01-20 CREATE: Cross-Layer Resilience Characterization and Optimization for Efficient yet Reliable Embodied AI Systems Tong Xie et.al. 2601.14140 translate read null
2026-01-20 Diffusion-Guided Backdoor Attacks in Real-World Reinforcement Learning Tairan Huang et.al. 2601.14104 translate read null
2026-01-20 Optimizing Energy and Data Collection in UAV-aided IoT Networks using Attention-based Multi-Objective Reinforcement Learning Babacar Toure et.al. 2601.14092 translate read null
2026-01-20 RM-Distiller: Exploiting Generative LLM for Reward Model Distillation Hongli Zhou et.al. 2601.14032 translate read null
2026-01-20 RL-BioAug: Label-Efficient Reinforcement Learning for Self-Supervised EEG Representation Learning Cheol-Hui Lee et.al. 2601.13964 translate read null
2026-01-20 Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning Hongbo Bai et.al. 2601.13942 translate read null
2026-01-20 Deep Reinforcement Learning-Based Dynamic Resource Allocation in Cell-Free Massive MIMO Phuong Nam Tran et.al. 2601.13934 translate read null
2026-01-20 HyperWalker: Dynamic Hypergraph-Based Deep Diagnosis for Multi-Hop Clinical Modeling across EHR and X-Ray in Medical VLMs Yuezhe Yang et.al. 2601.13919 translate read null
2026-01-20 TractRLFusion: A GPT-Based Multi-Critic Policy Fusion Framework for Fiber Tractography Ankita Joshi et.al. 2601.13897 translate read null
2026-01-20 Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering Chak Tou Leong et.al. 2601.13752 translate read null
2026-01-20 Dr. Assistant: Enhancing Clinical Diagnostic Inquiry via Structured Diagnostic Reasoning Data and Reinforcement Learning Yue Guo et.al. 2601.13690 translate read null
2026-01-20 Reinforcement Learning for Opportunistic Routing in Software-Defined LEO-Terrestrial Systems Sivaram Krishnan et.al. 2601.13662 translate read null
2026-01-20 Communication-Free Collective Navigation for a Swarm of UAVs via LiDAR-Based Deep Reinforcement Learning Myong-Yol Choi et.al. 2601.13657 translate read null
2026-01-20 Sample Complexity of Average-Reward Q-Learning: From Single-agent to Federated Reinforcement Learning Yuchen Jiao et.al. 2601.13642 translate read null
2026-01-20 A Kubernetes custom scheduler based on reinforcement learning for compute-intensive pods Hanlin Zhou et.al. 2601.13579 translate read null
2026-01-20 Behavior Knowledge Merge in Reinforced Agentic Models Xiangchi Yuan et.al. 2601.13572 translate read link
2026-01-20 Reasoning While Recommending: Entropy-Guided Latent Reasoning in Generative Re-ranking Models Changshuo Zhang et.al. 2601.13533 translate read null
2026-01-20 Group Relative Policy Optimization for Robust Blind Interference Alignment with Fluid Antennas Jianqiu Peng et.al. 2601.13506 translate read null
2026-01-19 RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models Bo Ren et.al. 2601.13409 translate read null
2026-01-19 Balancing Classification and Calibration Performance in Decision-Making LLMs via Calibration Aware Reinforcement Learning Duygu Nur Yaldiz et.al. 2601.13284 translate read null
2026-01-19 CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning Eric Onyame et.al. 2601.13262 translate read link
2026-01-19 Autonomous Navigation at the Nano-Scale: Algorithms, Architectures, and Constraints Mahmud S. Zango et.al. 2601.13252 translate read null
2026-01-19 Training instability in deep learning follows low-dimensional dynamical principles Zhipeng Zhang et.al. 2601.13160 translate read null
2026-01-19 Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning Fengran Mo et.al. 2601.13115 translate read null
2026-01-19 Static Is Not Enough: A Comparative Study of VR and SpaceMouse in Static and Dynamic Teleoperation Tasks Yijun Zhou et.al. 2601.13042 translate read null
2026-01-19 Feedforward-Feedback Integration in Flight Control: Reinforcement Learning with Sliding Mode Control Imran Sayyed et.al. 2601.13037 translate read null
2026-01-19 Think3D: Thinking with Space for Spatial Reasoning Zaibin Zhang et.al. 2601.13029 translate read link
2026-01-19 Graph Reasoning Paradigm: Structured and Symbolic Reasoning with Topology-Aware Reinforcement Learning for Large Language Models Runxuan Liu et.al. 2601.12995 translate read null
2026-01-19 PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient Zijian Wang et.al. 2601.12988 translate read null
2026-01-19 Imitation learning-based spacecraft rendezvous and docking method with Expert Demonstration Shibo Shao et.al. 2601.12952 translate read null
2026-01-19 Communication Methods in Multi-Agent Reinforcement Learning Christoph Wittner et.al. 2601.12886 translate read null
2026-01-19 FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions Peng Li et.al. 2601.12799 translate read link
2026-01-19 Unleashing Efficient Asynchronous RL Post-Training via Staleness-Constrained Rollout Coordination Haoyang Li et.al. 2601.12784 translate read null
2026-01-19 SDN-Blockchain Based Security Routing for UAV Communication via Reinforcement Learning Yulu Han et.al. 2601.12774 translate read null
2026-01-19 Teaching LLMs to Learn Tool Trialing and Execution through Environment Interaction Xingjie Gao et.al. 2601.12762 translate read link
2026-01-19 Distribution-Centric Policy Optimization Dominates Exploration-Exploitation Trade-off Zhaochun Li et.al. 2601.12730 translate read link
2026-01-19 Teaching Large Reasoning Models Effective Reflection Hanbin Wang et.al. 2601.12720 translate read null
2026-01-19 Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization Junyi Liao et.al. 2601.12707 translate read null
2026-01-19 Resource-Conscious RL Algorithms for Deep Brain Stimulation Arkaprava Gupta et.al. 2601.12699 translate read null
2026-01-19 Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks Xingran Chen et.al. 2601.12662 translate read null
2026-01-19 Two-Layer Reinforcement Learning-Assisted Joint Beamforming and Trajectory Optimization for Multi-UAV Downlink Communications Ruiqi Wang et.al. 2601.12659 translate read null
2026-01-19 Multiagent Reinforcement Learning in Enhancing Resilience of Microgrids under Extreme Weather Events Yin Wu et.al. 2601.12657 translate read null
2026-01-19 STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models Xiangyu Shi et.al. 2601.12641 translate read null
2026-01-16 Do explanations generalize across large reasoning models? Koyena Pal et.al. 2601.11517 translate read null
2026-01-16 Generative Scenario Rollouts for End-to-End Autonomous Driving Rajeev Yasarla et.al. 2601.11475 translate read null
2026-01-16 The Great March 100: 100 Detail-oriented Tasks for Evaluating Embodied AI Agents Ziyu Wang et.al. 2601.11421 translate read null
2026-01-16 Factored Value Functions for Graph-Based Multi-Agent Reinforcement Learning Ahmed Rashwan et.al. 2601.11401 translate read null
2026-01-16 The Mini Wheelbot Dataset: High-Fidelity Data for Robot Learning Henrik Hose et.al. 2601.11394 translate read null
2026-01-16 Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency Akhilesh Raj et.al. 2601.11352 translate read null
2026-01-16 Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation Pingzhi Tang et.al. 2601.11258 translate read null
2026-01-16 Model-free policy gradient for discrete-time mean-field control Matthieu Meunier et.al. 2601.11217 translate read null
2026-01-16 Policy-Based Deep Reinforcement Learning Hyperheuristics for Job-Shop Scheduling Problems Sofiene Lassoued et.al. 2601.11189 translate read null
2026-01-16 TANDEM: Temporal-Aware Neural Detection for Multimodal Hate Speech Girish A. Koushik et.al. 2601.11178 translate read null
2026-01-16 Deep GraphRAG: A Balanced Approach to Hierarchical Retrieval and Adaptive Integration Yuejie Li et.al. 2601.11144 translate read null
2026-01-16 Learning Quadrupedal Locomotion for a Heavy Hydraulic Robot Using an Actuator Model Minho Lee et.al. 2601.11143 translate read null
2026-01-16 PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models Qiyuan Zhang et.al. 2601.11087 translate read null
2026-01-16 Visual Marker Search for Autonomous Drone Landing in Diverse Urban Environments Jiaohong Yao et.al. 2601.11078 translate read null
2026-01-16 Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs Lecheng Yan et.al. 2601.11061 translate read null
2026-01-16 BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search Shiyu Liu et.al. 2601.11037 translate read link
2026-01-16 Toward Adaptive Grid Resilience: A Gradient-Free Meta-RL Framework for Critical Load Restoration Zain ul Abdeen et.al. 2601.10973 translate read null
2026-01-16 MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement Meidan Ding et.al. 2601.10949 translate read null
2026-01-16 Where to Touch, How to Contact: Hierarchical RL-MPC Framework for Geometry-Aware Long-Horizon Dexterous Manipulation Zhixian Xie et.al. 2601.10930 translate read null
2026-01-15 Realistic Curriculum Reinforcement Learning for Autonomous and Sustainable Marine Vessel Navigation Zhang Xiaocai et.al. 2601.10911 translate read null
2026-01-15 Action Shapley: A Training Data Selection Metric for World Model in Reinforcement Learning Rajat Ghosh et.al. 2601.10905 translate read null
2026-01-15 Reasoning Models Generate Societies of Thought Junsol Kim et.al. 2601.10825 translate read null
2026-01-11 Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration Sen Wang et.al. 2601.10744 translate read null
2026-01-15 MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching Changle Qu et.al. 2601.10712 translate read null
2026-01-15 Institutional AI: A Governance Framework for Distributional AGI Safety Federico Pierucci et.al. 2601.10599 translate read null
2026-01-15 Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay Hao Wang et.al. 2601.10589 translate read null
2026-01-15 Combinatorial Optimization Augmented Machine Learning Maximilian Schiffer et.al. 2601.10583 translate read null
2026-01-15 PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models Chengbing Wang et.al. 2601.10532 translate read null
2026-01-15 Projected Microbatch Accumulation yields reference-free proximal policy updates for reinforcement learning Nilin Abrahamsen et.al. 2601.10498 translate read null
2026-01-15 Urban Socio-Semantic Segmentation with Vision-Language Reasoning Yu Wang et.al. 2601.10477 translate read null
2026-01-15 Reinforcement Learning with Multi-Step Lookahead Information Via Adaptive Batching Nadav Merlis et.al. 2601.10418 translate read null
2026-01-15 CS-GBA: A Critical Sample-based Gradient-guided Backdoor Attack for Offline Reinforcement Learning Yuanjie Zhao et.al. 2601.10407 translate read null
2026-01-15 Advanced Manufacturing with Renewable and Bio-based Materials: AI/ML workflows and Process Optimization Rigoberto Advincula et.al. 2601.10382 translate read null
2026-01-15 FastStair: Learning to Run Up Stairs with Humanoid Robots Yan Liu et.al. 2601.10365 translate read null
2026-01-15 SuS: Strategy-aware Surprise for Intrinsic Exploration Mark Kashirskiy et.al. 2601.10349 translate read null
2026-01-15 Boundary-Aware NL2SQL: Integrating Reliability through Hybrid Reward and Data Synthesis Songsong Tian et.al. 2601.10318 translate read null
2026-01-15 Evidence-Augmented Policy Optimization with Reward Co-Evolution for Long-Context Reasoning Xin Guan et.al. 2601.10306 translate read null
2026-01-15 The impact of tactile sensor configurations on grasp learning efficiency – a comparative evaluation in simulation Eszter Birtalan et.al. 2601.10268 translate read null
2026-01-15 PRL: Process Reward Learning Improves LLMs’ Reasoning Ability and Broadens the Reasoning Boundary Jiarui Yao et.al. 2601.10201 translate read null
2026-01-15 HOMURA: Taming the Sand-Glass for Time-Constrained LLM Translation via Reinforcement Learning Ziang Cui et.al. 2601.10187 translate read null
2026-01-15 Reinforcement Learning to Discover a NorthEast Monsoon Index for Monthly Rainfall Prediction in Thailand Kiattikun Chobtham et.al. 2601.10181 translate read null
2026-01-15 Service Provisioning and Path Planning with Obstacle Avoidance for Low-Altitude Wireless Networks Senning Wan et.al. 2601.10179 translate read null
2026-01-15 ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback Yutao Mou et.al. 2601.10156 translate read null
2026-01-15 DecisionLLM: Large Language Models for Long Sequence Decision Exploration Xiaowei Lv et.al. 2601.10148 translate read null
2026-01-15 History Is Not Enough: An Adaptive Dataflow System for Financial Time-Series Synthesis Haochong Xia et.al. 2601.10143 translate read null
2026-01-15 Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts Sijia Luo et.al. 2601.10079 translate read null
2026-01-15 Event-Driven Deep RL Dispatcher for Post-Storm Distribution System Restoration Farshad Amani et.al. 2601.10044 translate read null
2026-01-15 PaperScout: An Autonomous Agent for Academic Paper Search with Process-Aware Sequence-Level Policy Optimization Tingyue Pan et.al. 2601.10029 translate read null
2026-01-15 Towards Native Intelligence: 6G-LLM Trained with Reinforcement Learning from NDT Feedback Zhuoran Xiao et.al. 2601.09992 translate read null
2026-01-14 OUTLINEFORGE: Hierarchical Reinforcement Learning with Explicit States for Scientific Writing Yilin Bao et.al. 2601.09858 translate read null
2026-01-14 Eluder dimension: localise it! Alireza Bakhtiari et.al. 2601.09825 translate read null
2026-01-14 GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents Chen Chen et.al. 2601.09770 translate read null
2026-01-14 STEP3-VL-10B Technical Report Ailin Huang et.al. 2601.09668 translate read null
2026-01-14 Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Zhiyuan Hu et.al. 2601.09667 translate read null
2026-01-14 DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing Qian Cao et.al. 2601.09609 translate read null
2026-01-14 Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets Jeremiah Coholich et.al. 2601.09605 translate read null
2026-01-14 Dialogue Telemetry: Turn-Level Instrumentation for Autonomous Information Gathering Dimitris Panagopoulos et.al. 2601.09570 translate read null
2026-01-14 Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations Wei-Jin Huang et.al. 2601.09518 translate read null
2026-01-14 Data Scaling for Navigation in Unknown Environments Lauri Suomela et.al. 2601.09444 translate read null
2026-01-14 Draw it like Euclid: Teaching transformer models to generate CAD profiles using ruler and compass construction steps Siyi Li et.al. 2601.09428 translate read null
2026-01-14 Semi-Contention-Free Access in IoT NOMA Networks: A Reinforcement Learning Framework Abhishek Kumar et.al. 2601.09422 translate read null
2026-01-14 GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR Jiaying Zhang et.al. 2601.09361 translate read null
2026-01-14 Monte-Carlo Tree Search with Neural Network Guidance for Lane-Free Autonomous Driving Ioannis Peridis et.al. 2601.09353 translate read null
2026-01-14 Policy-Based Reinforcement Learning with Action Masking for Dynamic Job Shop Scheduling under Uncertainty: Handling Random Arrivals and Machine Failures Sofiene Lassoued et.al. 2601.09293 translate read null
2026-01-14 Enhancing Spatial Reasoning in Large Language Models for Metal-Organic Frameworks Structure Prediction Mianzhi Pan et.al. 2601.09285 translate read null
2026-01-14 RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering Wencheng Ye et.al. 2601.09269 translate read link
2026-01-14 Learning to Trust Experience: A Monitor-Trust-Regulator Framework for Learning under Unobservable Feedback Reliability Zhipeng Zhang et.al. 2601.09261 translate read null
2026-01-14 Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models Yan Liu et.al. 2601.09260 translate read null
2026-01-14 Reward Learning through Ranking Mean Squared Error Chaitanya Kharyal et.al. 2601.09236 translate read null
2026-01-14 GIFT: Unlocking Global Optimality in Post-Training via Finite-Temperature Gibbs Initialization Zhengyang Zhao et.al. 2601.09233 translate read null
2026-01-14 UserLM-R1: Modeling Human Reasoning in User Language Models with Multi-Reward Reinforcement Learning Feng Zhang et.al. 2601.09215 translate read null
2026-01-14 SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL Lijun Liu et.al. 2601.09136 translate read null
2026-01-14 SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache Chi-Chih Chang et.al. 2601.09083 translate read null
2026-01-13 TranslateGemma Technical Report Mara Finkelstein et.al. 2601.09012 translate read null
2026-01-13 Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge Yao Tang et.al. 2601.08808 translate read null
2026-01-13 Identifying Latent Intentions via Inverse Reinforcement Learning in Repeated Linear Public Good Games Carina I. Hausladen et.al. 2601.08803 translate read null
2026-01-13 Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Zhiyuan Hu et.al. 2601.08763 translate read null
2026-01-13 TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback Prithwish Jana et.al. 2601.08734 translate read null
2026-01-13 Learning from Demonstrations via Capability-Aware Goal Sampling Yuanlin Duan et.al. 2601.08731 translate read null
2026-01-13 Model-Agnostic Solutions for Deep Reinforcement Learning in Non-Ergodic Contexts Bert Verbruggen et.al. 2601.08726 translate read null
2026-01-13 QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models Zhaolu Kang et.al. 2601.08689 translate read null
2026-01-13 PersonaDual: Balancing Personalization and Objectivity via Adaptive Reasoning Xiaoyou Liu et.al. 2601.08679 translate read null
2026-01-13 VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory Shaoan Wang et.al. 2601.08665 translate read null
2026-01-13 From Classical to Quantum Reinforcement Learning and Its Applications in Quantum Control: A Beginner’s Tutorial Abhijit Sen et.al. 2601.08662 translate read null
2026-01-13 Provably Safe Reinforcement Learning for Stochastic Reach-Avoid Problems with Entropy Regularization Abhijit Mazumdar et.al. 2601.08646 translate read null
2026-01-13 Your Group-Relative Advantage Is Biased Fengkai Yang et.al. 2601.08521 translate read null
2026-01-13 AUV Trajectory Learning for Underwater Acoustic Energy Transfer and Age Minimization Mohamed Afouene Melki et.al. 2601.08491 translate read null
2026-01-13 AME-2: Agile and Generalized Legged Locomotion via Attention-Based Neural Map Encoding Chong Zhang et.al. 2601.08485 translate read null
2026-01-13 Baiting AI: Deceptive Adversary Against AI-Protected Industrial Infrastructures Aryan Pasikhani et.al. 2601.08481 translate read null
2026-01-13 JudgeRLVR: Judge First, Generate Second for Efficient Reasoning Jiangshan Duo et.al. 2601.08468 translate read null
2026-01-13 Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis Yi Qin et.al. 2601.08440 translate read null
2026-01-13 Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management Weitao Ma et.al. 2601.08435 translate read null
2026-01-13 Large Multimodal Models for Embodied Intelligent Driving: The Next Frontier in Self-Driving? Long Zhang et.al. 2601.08434 translate read null
2026-01-13 RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation Sunzhu Li et.al. 2601.08430 translate read null
2026-01-13 Silence the Judge: Reinforcement Learning with Self-Verifier via Latent Geometric Clustering Nonghai Zhang et.al. 2601.08427 translate read null
2026-01-13 Owen-Shapley Policy Optimization (OSPO): A Principled RL Algorithm for Generative Search LLMs Abhijnan Nath et.al. 2601.08403 translate read null
2026-01-13 Safe Heterogeneous Multi-Agent RL with Communication Regularization for Coordinated Target Acquisition Gabriele Calzolari et.al. 2601.08327 translate read null
2026-01-13 AtomMem: Learnable Dynamic Agentic Memory with Atomic Memory Operation Yupeng Huo et.al. 2601.08323 translate read null
2026-01-13 ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning Kun Liang et.al. 2601.08310 translate read null
2026-01-13 D$^2$Plan: Dual-Agent Dynamic Global Planning for Complex Retrieval-Augmented Reasoning Kangcheng Luo et.al. 2601.08282 translate read null
2026-01-13 Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees Kun Li et.al. 2601.08274 translate read null
2026-01-13 Unleashing Tool Engineering and Intelligence for Agentic AI in Next-Generation Communication Networks Yinqiu Liu et.al. 2601.08259 translate read null
2026-01-13 Large Artificial Intelligence Model Guided Deep Reinforcement Learning for Resource Allocation in Non Terrestrial Networks Abdikarim Mohamed Ibrahim et.al. 2601.08254 translate read null
2026-01-13 Incorporating Cognitive Biases into Reinforcement Learning for Financial Decision-Making Liu He et.al. 2601.08247 translate read null
2026-01-13 The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination Haoran Su et.al. 2601.08237 translate read null
2026-01-13 Scalable Multiagent Reinforcement Learning with Collective Influence Estimation Zhenglong Luo et.al. 2601.08210 translate read null
2026-01-13 ZeroDVFS: Zero-Shot LLM-Guided Core and Frequency Allocation for Embedded Platforms Mohammad Pivezhandi et.al. 2601.08166 translate read null
2026-01-13 Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies Zeyang Li et.al. 2601.08136 translate read null
2026-01-13 Structure Detection for Contextual Reinforcement Learning Tianyue Zhou et.al. 2601.08120 translate read null
2026-01-13 STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order Chengyang Gu et.al. 2601.08107 translate read null
2026-01-12 DRL-based Power Allocation in LiDAL-Assisted RLNC-NOMA OWC Systems Ahmed A. Hassan et.al. 2601.08060 translate read null
2026-01-12 Forecast Aware Deep Reinforcement Learning for Efficient Electricity Load Scheduling in Dairy Farms Nawazish Alia et.al. 2601.08052 translate read null
2026-01-12 Formalizing the Relationship between Hamilton-Jacobi Reachability and Reinforcement Learning Prashant Solanki et.al. 2601.08050 translate read null
2026-01-12 FigEx2: Visual-Conditioned Panel Detection and Captioning for Scientific Compound Figures Jifeng Song et.al. 2601.08026 translate read null
2026-01-12 Learning Better Error Correction Codes with Hybrid Quantum-Assisted Machine Learning Yariv Yanay et.al. 2601.08014 translate read null
2026-01-12 Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety Can Jin et.al. 2601.08000 translate read null
2026-01-12 Reinforcement Learning Methods for Neighborhood Selection in Local Search Yannick Molinghen et.al. 2601.07948 translate read null
2026-01-12 Video Generation Models in Robotics – Applications, Research Challenges, Future Directions Zhiting Mei et.al. 2601.07823 translate read null
2026-01-12 Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation Huanyu Li et.al. 2601.07821 translate read null
2026-01-12 Data-driven control of hydraulic impact hammers under strict operational and control constraints Francisco Leiva et.al. 2601.07813 translate read null
2026-01-12 Beyond Single-Shot: Multi-step Tool Retrieval via Query Planning Wei Fang et.al. 2601.07782 translate read null
2026-01-12 Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding Yanxiang Huang et.al. 2601.07761 translate read null
2026-01-12 Hiking in the Wild: A Scalable Perceptive Parkour Framework for Humanoids Shaoting Zhu et.al. 2601.07718 translate read null
2026-01-12 Smooth Operator: Smooth Verifiable Reward Activates Spatial Reasoning Ability of Vision-Language Model Siwen Jiao et.al. 2601.07695 translate read null
2026-01-12 Reinforcement Learning for Micro-Level Claims Reserving Benjamin Avanzi et.al. 2601.07637 translate read null
2026-01-12 Clipped Affine Policy: Low-Complexity Near-Optimal Online Power Control for Energy Harvesting Communications over Fading Channels Hao Wu et.al. 2601.07622 translate read null
2026-01-12 GRPO with State Mutations: Improving LLM-Based Hardware Test Plan Generation Dimple Vijay Kochar et.al. 2601.07593 translate read null
2026-01-12 Large Language Models for Physics Instrument Design Sara Zoccheddu et.al. 2601.07580 translate read null
2026-01-12 Stagewise Reinforcement Learning and the Geometry of the Regret Landscape Chris Elliott et.al. 2601.07524 translate read null
2026-01-12 Controlling Multimodal Conversational Agents with Coverage-Enhanced Latent Actions Yongqi Li et.al. 2601.07516 translate read null
2026-01-12 Graph Inference Towards ICD Coding Xiaoxiao Deng et.al. 2601.07496 translate read null
2026-01-12 Online Markov Decision Processes with Terminal Law Constraints Bianca Marin Moreno et.al. 2601.07492 translate read null
2026-01-12 Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning Sijia li et.al. 2601.07463 translate read null
2026-01-12 LOONG: Online Time-Optimal Autonomous Flight for MAVs in Cluttered Environments Xin Guan et.al. 2601.07434 translate read null
2026-01-12 Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning Ziheng Li et.al. 2601.07408 translate read null
2026-01-12 On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training Xueyan Niu et.al. 2601.07389 translate read null
2026-01-12 OpenTinker: Separating Concerns in Agentic Reinforcement Learning Siqi Zhu et.al. 2601.07376 translate read link
2026-01-12 Reward Modeling from Natural Language Human Feedback Zongqi Wang et.al. 2601.07349 translate read null
2026-01-12 Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training Xue Gong et.al. 2601.07320 translate read null
2026-01-12 Low-Altitude Satellite-AAV Collaborative Joint Mobile Edge Computing and Data Collection via Diffusion-based Deep Reinforcement Learning Boxiong Wang et.al. 2601.07307 translate read null
2026-01-12 Heterogeneous Multi-Expert Reinforcement Learning for Long-Horizon Multi-Goal Tasks in Autonomous Forklifts Yun Chen et.al. 2601.07304 translate read null
2026-01-12 Mimic Human Cognition, Master Multi-Image Reasoning: A Meta-Action Framework for Enhanced Visual Understanding Jianghao Yin et.al. 2601.07298 translate read null
2026-01-12 LRAS: Advanced Legal Reasoning with Agentic Search Yujin Zhou et.al. 2601.07296 translate read null
2026-01-12 ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios Changzai Pan et.al. 2601.07280 translate read null
2026-01-12 The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents Weihao Xuan et.al. 2601.07264 translate read null
2026-01-12 Group Pattern Selection Optimization: Let LRMs Pick the Right Pattern for Reasoning Hanbin Wang et.al. 2601.07238 translate read null
2026-01-12 Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration Yang Zhao et.al. 2601.07224 translate read null
2026-01-12 Structured Reasoning for Large Language Models Jinyi Han et.al. 2601.07180 translate read null
2026-01-12 Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization Min Wang et.al. 2601.07164 translate read null
2026-01-12 AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units Xinzi Cao et.al. 2601.07160 translate read null
2026-01-12 Agents of Diffusion: Enhancing Diffusion Language Models with Multi-Agent Reinforcement Learning for Structured Data Generation (Extended Version) Aja Khanal et.al. 2601.07152 translate read null
2026-01-12 Rewarding Creativity: A Human-Aligned Generative Reward Model for Reinforcement Learning in Storytelling Zhaoyan Li et.al. 2601.07149 translate read null
2026-01-12 Generating readily synthesizable small molecule fluorophore scaffolds with reinforcement learning Ruhi Sayana et.al. 2601.07145 translate read null
2026-01-12 Dynamics of Multi-Agent Actor-Critic Learning in Stochastic Games: from Multistability and Chaos to Stable Cooperation Yuxin Geng et.al. 2601.07142 translate read null
2026-01-12 ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System Sungguk Cha et.al. 2601.07125 translate read null
2026-01-12 ENTRA: Entropy-Based Redundancy Avoidance in Large Language Model Reasoning Ruichu Cai et.al. 2601.07123 translate read null
2026-01-12 Enhancing Cloud Network Resilience via a Robust LLM-Empowered Multi-Agent Reinforcement Learning Framework Yixiao Peng et.al. 2601.07122 translate read null
2026-01-12 Reward-Preserving Attacks For Robust Reinforcement Learning Lucas Schott et.al. 2601.07118 translate read null
2026-01-12 MEDVISTAGYM: A Scalable Training Environment for Thinking with Medical Images via Tool-Integrated Reinforcement Learning Meng Lu et.al. 2601.07107 translate read null
2026-01-11 X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests Jie Wu et.al. 2601.06953 translate read link
2026-01-11 TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG Tianhua Zhang et.al. 2601.06922 translate read null
2026-01-11 Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models Shaoning Sun et.al. 2601.06911 translate read null
2026-01-11 Personality-Aware Reinforcement Learning for Persuasive Dialogue with LLM-Driven Simulation Donghuo Zeng et.al. 2601.06877 translate read null
2026-01-11 A Brain-like Synergistic Core in LLMs Drives Behaviour and Learning Pedro Urbina-Rodriguez et.al. 2601.06851 translate read null
2026-01-11 Code Evolution for Control: Synthesizing Policies via LLM-Driven Evolutionary Search Ping Guo et.al. 2601.06845 translate read null
2026-01-11 Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy Shujian Gao et.al. 2601.06801 translate read null
2026-01-11 Artificial Intelligence Driven Channel Coding and Resource Optimization for Wireless Networks Yasir Ali et.al. 2601.06796 translate read null
2026-01-11 GDEPO: Group Dual-dynamic and Equal-right-advantage Policy Optimization with Enhanced Training Data Utilization for Sample-Constrained Reinforcement Learning Zhengqing Yan et.al. 2601.06795 translate read null
2026-01-11 No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning Zhicong Li et.al. 2601.06794 translate read null
2026-01-11 ImmuniFraug: A Metacognitive Intervention Anti-Fraud Approach to Enhance Undergraduate Students’ Cyber Fraud Awareness Xiangzhe Yuan et.al. 2601.06774 translate read null
2026-01-11 GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO Shubhashis Roy Dipta et.al. 2601.06767 translate read null
2026-01-11 On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning Changyu Liu et.al. 2601.06748 translate read null
2026-01-10 Characterising Toxicity in Generative Large Language Models Zhiyao Zhang et.al. 2601.06700 translate read null
2026-01-10 Plasticity vs. Rigidity: The Impact of Low-Rank Adapters on Reasoning on a Micro-Budget Zohaib Khan et.al. 2601.06677 translate read null
2026-01-10 Reinforcement Learning-Guided Dynamic Multi-Graph Fusion for Evacuation Traffic Prediction Md Nafees Fuad Rafi et.al. 2601.06664 translate read null
2026-01-10 KASER: Knowledge-Aligned Student Error Simulator for Open-Ended Coding Tasks Zhangqi Duan et.al. 2601.06633 translate read null
2026-01-10 Object-Centric World Models Meet Monte Carlo Tree Search Rodion Vakhitov et.al. 2601.06604 translate read null
2026-01-10 ArrowGEV: Grounding Events in Video via Learning the Arrow of Time Fangxu Yu et.al. 2601.06559 translate read null
2026-01-10 Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODASER) for Safe Reinforcement Learning in Optimal Control Roya Khalili Amirabadi et.al. 2601.06540 translate read null
2026-01-10 Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection Minghui Jia et.al. 2601.06498 translate read link
2026-01-10 ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Qiang Zhang et.al. 2601.06487 translate read link
2026-01-10 Coupling Smoothed Particle Hydrodynamics with Multi-Agent Deep Reinforcement Learning for Cooperative Control of Point Absorbers Yi Zhan et.al. 2601.06485 translate read null
2026-01-10 Deep Reinforcement Learning based Control Design for Aircraft Recovery from Loss-of-Control Scenario Imran Sayyed et.al. 2601.06439 translate read null
2026-01-10 LSRIF: Logic-Structured Reinforcement Learning for Instruction Following Qingyu Ren et.al. 2601.06431 translate read null
2026-01-10 Lightweight Yet Secure: Secure Scripting Language Generation via Lightweight LLMs Keyang Zhang et.al. 2601.06419 translate read null
2026-01-10 Dynamic Incentivized Cooperation under Changing Rewards Philipp Altmann et.al. 2601.06382 translate read null
2026-01-09 Future-as-Label: Scalable Supervision from Real-World Outcomes Benjamin Turtel et.al. 2601.06336 translate read null
2026-01-09 The pros and cons of using deep reinforcement learning or genetic algorithms to design control schemes for quantum state transfer on qubit chains Sofía Perón Santana et.al. 2601.06303 translate read null
2026-01-09 How well can off-the-shelf LLMs elucidate molecular structures from mass spectra using chain-of-thought reasoning? Yufeng Wang et.al. 2601.06289 translate read null
2026-01-09 Walk the PLANC: Physics-Guided RL for Agile Humanoid Locomotion on Constrained Footholds Min Dai et.al. 2601.06286 translate read null
2026-01-09 Ground What You See: Hallucination-Resistant MLLMs via Caption Feedback, Diversity-Aware Sampling, and Conflict Regularization Miao Pan et.al. 2601.06224 translate read null
2026-01-09 Toward Safe and Responsible AI Agents: A Three-Pillar Model for Transparency, Accountability, and Trustworthiness Edward C. Cheng et.al. 2601.06223 translate read null
2026-01-08 TimeGNN-Augmented Hybrid-Action MARL for Fine-Grained Task Partitioning and Energy-Aware Offloading in MEC Wei Ai et.al. 2601.06191 translate read null
2026-01-07 TIR-Flow: Active Video Search and Reasoning with Frozen VLMs Hongbo Jin et.al. 2601.06176 translate read null
2026-01-06 HiMeS: Hippocampus-inspired Memory System for Personalized AI Assistants Hailong Li et.al. 2601.06152 translate read null
2026-01-05 A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control Wonhyeok Choi et.al. 2601.06133 translate read null
2026-01-09 Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Jiajie Zhang et.al. 2601.06021 translate read link
2026-01-09 TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents Dawei Wang et.al. 2601.05899 translate read link
2026-01-09 StackPlanner: A Centralized Hierarchical Multi-Agent System with Task-Experience Memory Management Ruizhe Zhang et.al. 2601.05890 translate read null
2026-01-09 IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck Huilin Deng et.al. 2601.05870 translate read null
2026-01-09 Sequential Bayesian Optimal Experimental Design in Infinite Dimensions via Policy Gradient Reinforcement Learning Kaichen Shen et.al. 2601.05868 translate read null
2026-01-09 Intelligent Singularity Avoidance in UR10 Robotic Arm Path Planning Using Hybrid Fuzzy Logic and Reinforcement Learning Sheng-Kai Chen et.al. 2601.05836 translate read null
2026-01-09 EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis Xiaoshuai Song et.al. 2601.05808 translate read link
2026-01-09 From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation Zezhou Wang et.al. 2601.05787 translate read link
2026-01-09 SketchVL: Policy Optimization via Fine-Grained Credit Assignment for Chart Understanding and More Muye Huang et.al. 2601.05688 translate read null
2026-01-09 CHDP: Cooperative Hybrid Diffusion Policies for Reinforcement Learning in Parameterized Action Space Bingyi Liu et.al. 2601.05675 translate read null
2026-01-09 EvoQRE: Modeling Bounded Rationality in Safety-Critical Traffic Simulation via Evolutionary Quantal Response Equilibrium Phu-Hoa Pham et.al. 2601.05653 translate read null
2026-01-09 GIFT: Games as Informal Training for Generalizable LLMs Nuoyan Lyu et.al. 2601.05633 translate read null
2026-01-09 Dual-Phase LLM Reasoning: Self-Evolved Mathematical Frameworks ShaoZhen Liu et.al. 2601.05616 translate read null
2026-01-09 Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR Zijun Min et.al. 2601.05607 translate read null
2026-01-09 PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning Jingcheng Hu et.al. 2601.05593 translate read link
2026-01-09 Reinforcement Learning of Large Language Models for Interpretable Credit Card Fraud Detection Cooper Lin et.al. 2601.05578 translate read null
2026-01-09 Autonomous Discovery of the Ising Model’s Critical Parameters with Reinforcement Learning Hai Man et.al. 2601.05577 translate read null
2026-01-09 WildSci: Advancing Scientific Reasoning from In-the-Wild Literature Tengxiao Liu et.al. 2601.05567 translate read null
2026-01-09 Closing the Modality Reasoning Gap for Speech Large Language Models Chaoren Wang et.al. 2601.05543 translate read null
2026-01-09 LEAPS: An LLM-Empowered Adaptive Plugin for Taobao AI Search Lei Wang et.al. 2601.05513 translate read null
2026-01-09 How Exploration Breaks Cooperation in Shared-Policy Multi-Agent Reinforcement Learning Yi-Ning Weng et.al. 2601.05509 translate read null
2026-01-09 MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards Zhiyu Shen et.al. 2601.05488 translate read null
2026-01-09 MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization Jiefu Ou et.al. 2601.05475 translate read null
2026-01-09 Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning Zhaoqi Wang et.al. 2601.05466 translate read null
2026-01-09 PRISMA: Reinforcement Learning Guided Two-Stage Policy Optimization in Multi-Agent Architecture for Open-Domain Multi-Hop Question Answering Yu Liu et.al. 2601.05465 translate read null
2026-01-09 Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction Hongjin Kim et.al. 2601.05459 translate read null
2026-01-08 Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization Yuxiang Ji et.al. 2601.05432 translate read link
2026-01-08 Interactive Distillation for Cooperative Multi-Agent Reinforcement Learning Minwoo Cho et.al. 2601.05407 translate read null
2026-01-08 Imitation Learning for Combinatorial Optimisation under Uncertainty Prakash Gawas et.al. 2601.05383 translate read null
2026-01-05 On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis Hector Zenil et.al. 2601.05280 translate read null
2026-01-08 RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes Yuan-Kang Lee et.al. 2601.05249 translate read link
2026-01-08 GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Shih-Yang Liu et.al. 2601.05242 translate read link
2026-01-08 EARL: Energy-Aware Optimization of Liquid State Machines for Pervasive AI Zain Iqbal et.al. 2601.05205 translate read null
2026-01-08 SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning Yanchang Liang et.al. 2601.05187 translate read null
2026-01-08 Inside Out: Evolving User-Centric Core Memory Trees for Long-Term Personalized Dialogue Systems Jihao Zhao et.al. 2601.05171 translate read null
2026-01-08 Safe Continual Reinforcement Learning Methods for Nonstationary Environments. Towards a Survey of the State of the Art Timofey Tomashevskiy et.al. 2601.05152 translate read null
2026-01-08 Unitary fault-tolerant encoding of Pauli states in surface codes Luis Colmenarez et.al. 2601.05113 translate read null
2026-01-08 Reinforced Efficient Reasoning via Semantically Diverse Exploration Ziqi Zhao et.al. 2601.05053 translate read link
2026-01-08 Hán Dān Xué Bù (Mimicry) or Qīng Chū Yú Lán (Mastery)? A Cognitive Perspective on Reasoning Distillation in Large Language Models Yueqing Hu et.al. 2601.05019 translate read null
2026-01-08 On the Hidden Objective Biases of Group-based Reinforcement Learning Aleksandar Fontana et.al. 2601.05002 translate read null
2026-01-08 AlgBench: To What Extent Do Large Reasoning Models Understand Algorithms? Henan Sun et.al. 2601.04996 translate read null
2026-01-08 A DQN-based model for intelligent network selection in heterogeneous wireless systems Fayssal Bendaoud et.al. 2601.04978 translate read null
2026-01-08 ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning Minda Hu et.al. 2601.04973 translate read null
2026-01-08 Text as a Universal Interface for Transferable Personalization Yuting Liu et.al. 2601.04963 translate read null
2026-01-08 Safe Reinforcement Learning Beyond Baseline Control: A Hierarchical Framework for Space Triangle Tethered Formation System Xinyi Tao et.al. 2601.04957 translate read null
2026-01-08 Precision over Diversity: High-Precision Reward Generalizes to Robust Instruction Following Yirong Zeng et.al. 2601.04954 translate read null
2026-01-08 SKATER: Synthesized Kinematics for Advanced Traversing Efficiency on a Humanoid Robot via Roller Skate Swizzles Junchi Gu et.al. 2601.04948 translate read null
2026-01-08 Deep Reinforcement Learning for Optimum Order Execution: Mitigating Risk and Maximizing Returns Khabbab Zakaria et.al. 2601.04896 translate read null
2026-01-08 Flexible Manufacturing Systems Intralogistics: Dynamic Optimization of AGVs and Tool Sharing Using Coloured-Timed Petri Nets and Actor-Critic RL with Actions Masking Sofiene Lassoued et.al. 2601.04887 translate read null
2026-01-08 RAAR: Retrieval Augmented Agentic Reasoning for Cross-Domain Misinformation Detection Zhiwei Liu et.al. 2601.04853 translate read null
2026-01-08 Intelligent resource allocation in wireless networks via deep reinforcement learning Marie Diane Iradukunda et.al. 2601.04842 translate read null
2026-01-08 SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning Caijun Xu et.al. 2601.04809 translate read link
2026-01-08 Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning Siyuan Gan et.al. 2601.04805 translate read null
2026-01-08 AgentOCR: Reimagining Agent History via Optical Self-Compression Lang Feng et.al. 2601.04786 translate read null
2026-01-08 AT$^2$PO: Agentic Turn-based Policy Optimization via Tree Search Zefang Zong et.al. 2601.04767 translate read link
2026-01-08 AM$^3$Safety: Towards Data Efficient Alignment of Multi-modal Multi-turn Safety for MLLMs Han Zhu et.al. 2601.04736 translate read null
2026-01-08 ThinkDrive: Chain-of-Thought Guided Progressive Reinforcement Learning Fine-Tuning for Autonomous Driving Chang Zhao et.al. 2601.04714 translate read null
2026-01-08 TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning Yinuo Wang et.al. 2601.04698 translate read null
2026-01-08 A Method for Constructing a Digital Transformation Driving Mechanism Based on Semantic Understanding of Large Models Huayi Liu et.al. 2601.04696 translate read null
2026-01-08 Tape: A Cellular Automata Benchmark for Evaluating Rule-Shift Generalization in Reinforcement Learning Enze Pan et.al. 2601.04695 translate read null
2026-01-08 ResMAS: Resilience Optimization in LLM-based Multi-agent Systems Zhilun Zhou et.al. 2601.04694 translate read null
2026-01-08 Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead Oluwatosin Oseni et.al. 2601.04686 translate read null
2026-01-08 Agri-R1: Empowering Generalizable Agricultural Reasoning in Vision-Language Models with Reinforcement Learning Wentao Zhang et.al. 2601.04672 translate read null
2026-01-08 Learning Dynamics in RL Post-Training for Language Models Akiyoshi Tomihari et.al. 2601.04670 translate read null
2026-01-08 Optimizing Path Planning using Deep Reinforcement Learning for UGVs in Precision Agriculture Laukik Patade et.al. 2601.04668 translate read null
2026-01-08 Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization Mizanur Rahman et.al. 2601.04582 translate read link
2026-01-08 Reasoning Over Space: Enabling Geographic Reasoning for LLM-Based Generative Next POI Recommendation Dongyi Lv et.al. 2601.04562 translate read null
2026-01-08 Not All Steps are Informative: On the Linearity of LLMs’ RLVR Training Tianle Wang et.al. 2601.04537 translate read null
2026-01-08 GRACE: Reinforcement Learning for Grounded Response and Abstention under Contextual Evidence Yibo Zhao et.al. 2601.04525 translate read null
2026-01-08 TSSR: Two-Stage Swap-Reward-Driven Reinforcement Learning for Character-Level SMILES Generation Jacob Ede Levine et.al. 2601.04521 translate read null
2026-01-08 Multiagent Reinforcement Learning with Neighbor Action Estimation Zhenglong Luo et.al. 2601.04511 translate read null
2026-01-07 Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization Xingjian Diao et.al. 2601.04442 translate read null
2026-01-07 Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization Matthew Landers et.al. 2601.04441 translate read null
2026-01-07 Rate or Fate? RLV$^\varepsilon$R: Reinforcement Learning with Verifiable Noisy Rewards Ali Rad et.al. 2601.04411 translate read null
2026-01-07 Transformer-based Multi-agent Reinforcement Learning for Separation Assurance in Structured and Unstructured Airspaces Arsyi Aziz et.al. 2601.04401 translate read null
2026-01-07 Enhanced-FQL($λ$), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay Mohsen Jalaeian-Farimani et.al. 2601.04392 translate read null
2026-01-07 Survival Dynamics of Neural and Programmatic Policies in Evolutionary Reinforcement Learning Anton Roupassov-Ruiz et.al. 2601.04365 translate read null
2026-01-07 Online Action-Stacking Improves Reinforcement Learning Performance for Air Traffic Control Ben Carvell et.al. 2601.04287 translate read null
2026-01-07 A Future Capabilities Agent for Tactical Air Traffic Control Paul Kent et.al. 2601.04285 translate read null
2026-01-07 Making Tunable Parameters State-Dependent in Weather and Climate Models with Reinforcement Learning Pritthijit Nath et.al. 2601.04268 translate read null
2026-01-06 Cross-Language Speaker Attribute Prediction Using MIL and RL Sunny Shu et.al. 2601.04257 translate read null
2026-01-07 Hierarchical GNN-Based Multi-Agent Learning for Dynamic Queue-Jump Lane and Emergency Vehicle Corridor Formation Haoran Su et.al. 2601.04177 translate read null
2026-01-07 Agentic Rubrics as Contextual Verifiers for SWE Agents Mohit Raghavendra et.al. 2601.04171 translate read null
2026-01-07 InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training Ziyun Zhang et.al. 2601.04126 translate read null
2026-01-07 GeoReason: Aligning Thinking And Answering In Remote Sensing Vision-Language Models Via Logical Consistency Reinforcement Learning Wenshuai Li et.al. 2601.04118 translate read null
2026-01-07 Cells on Autopilot: Adaptive Cell (Re)Selection via Reinforcement Learning Marvin Illian et.al. 2601.04083 translate read null
2026-01-07 Thinking with Frames: Generative Video Distortion Evaluation via Frame Reward Model Yuan Wang et.al. 2601.04033 translate read null
2026-01-07 On-Device Deep Reinforcement Learning for Decentralized Task Offloading Performance trade-offs in the training process Gorka Nieto et.al. 2601.03976 translate read null
2026-01-07 Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models Wei Wu et.al. 2601.03969 translate read null
2026-01-07 CoINS: Counterfactual Interactive Navigation via Skill-Aware VLM Kangjie Zhou et.al. 2601.03956 translate read null
2026-01-07 Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification Rui Sun et.al. 2601.03948 translate read null
2026-01-07 Adaptive-Boundary-Clipping GRPO: Ensuring Bounded Ratios for Stable and Generalizable Training Chi Liu et.al. 2601.03895 translate read null
2026-01-07 IndexTTS 2.5 Technical Report Yunpei Li et.al. 2601.03888 translate read null
2026-01-07 Staged Voxel-Level Deep Reinforcement Learning for 3D Medical Image Segmentation with Noisy Annotations Yuyang Fu et.al. 2601.03875 translate read null
2026-01-07 Step Potential Advantage Estimation: Harnessing Intermediate Confidence and Correctness for Efficient Mathematical Reasoning Fei Wu et.al. 2601.03823 translate read null
2026-01-07 ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition Muyang Zhao et.al. 2601.03822 translate read null
2026-01-07 From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs Usha Shrestha et.al. 2601.03808 translate read null
2026-01-07 NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning Zhongtao Miao et.al. 2601.03790 translate read null
2026-01-07 MVP: Enhancing Video Large Language Models via Self-supervised Masked Video Prediction Xiaokun Sun et.al. 2601.03781 translate read null
2026-01-07 O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL Yi Yao et.al. 2601.03743 translate read null
2026-01-07 EDCO: Dynamic Curriculum Orchestration for Domain-specific Large Language Model Fine-tuning Jing-Cheng Pang et.al. 2601.03725 translate read null
2026-01-07 ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization Shijie Zhang et.al. 2601.03723 translate read null
2026-01-07 R$^3$L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification Weijie Shi et.al. 2601.03715 translate read link
2026-01-07 TreeAdv: Tree-Structured Advantage Redistribution for Group-Based RL Lang Cao et.al. 2601.03703 translate read null
2026-01-07 Dual-Attention Heterogeneous GNN for Multi-robot Collaborative Area Search via Deep Reinforcement Learning Lina Zhu et.al. 2601.03686 translate read null
2026-01-07 Accounting for Optimal Control in the Sizing of Isolated Hybrid Renewable Energy Systems Using Imitation Learning Simon Halvdansson et.al. 2601.03679 translate read null
2026-01-07 Sandwich Reasoning: An Answer-Reasoning-Answer Approach for Low-Latency Query Correction Chen Zhang et.al. 2601.03672 translate read null
2026-01-07 AMIR-GRPO: Inducing Implicit Preference Signals into GRPO Amir Hossein Yari et.al. 2601.03661 translate read null
2026-01-07 ReLA: Representation Learning and Aggregation for Job Scheduling with Reinforcement Learning Zhengyi Kwan et.al. 2601.03646 translate read null
2026-01-07 Locomotion Beyond Feet Tae Hoon Yang et.al. 2601.03607 translate read null
2026-01-07 Interleaved Tool-Call Reasoning for Protein Function Understanding Chuanliu Fan et.al. 2601.03604 translate read null
2026-01-07 From Score to Sound: An End-to-End MIDI-to-Motion Pipeline for Robotic Cello Performance Samantha Sudhoff et.al. 2601.03562 translate read null
2026-01-07 SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models Yuxuan Jiang et.al. 2601.03555 translate read null
2026-01-07 VeRPO: Verifiable Dense Reward Policy Optimization for Code Generation Longwen Wang et.al. 2601.03525 translate read null
2026-01-07 A Reinforcement Learning-Based Model for Mapping and Goal-Directed Navigation Using Multiscale Place Fields Bekarys Dukenbaev et.al. 2601.03520 translate read null
2026-01-07 Semantic Belief-State World Model for 3D Human Motion Prediction Sarim Chaudhry et.al. 2601.03517 translate read null
2026-01-07 Adaptive Model-Based Reinforcement Learning for Orbit Feedback Control in NSLS-II Storage Ring Zeyu Dong et.al. 2601.03486 translate read null
2026-01-06 Understanding Reward Hacking in Text-to-Image Reinforcement Learning Yunqi Hong et.al. 2601.03468 translate read null
2026-01-06 ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing Hengjia Li et.al. 2601.03467 translate read null
2026-01-06 FIRE-VLM: A Vision-Language-Driven Reinforcement Learning Framework for UAV Wildfire Tracking in a Physics-Grounded Fire Digital Twin Chris Webb et.al. 2601.03449 translate read null
2026-01-06 Foundation Model-Aided Hierarchical Control for Robust RIS-Assisted Near-Field Communications Mohammad Ghassemi et.al. 2601.03427 translate read null
2026-01-06 Sensor to Pixels: Decentralized Swarm Gathering via Image-Based Reinforcement Learning Yigal Koifman et.al. 2601.03413 translate read null
2026-01-06 Exploration Through Introspection: A Self-Aware Reward Model Michael Petrowski et.al. 2601.03389 translate read null
2026-01-06 Aligning Findings with Diagnosis: A Self-Consistent Reinforcement Learning Framework for Trustworthy Radiology Reporting Kun Zhao et.al. 2601.03321 translate read null
2026-01-06 Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning Yu Luo et.al. 2601.03320 translate read null
2026-01-06 Mastering the Game of Go with Self-play Experience Replay Jingbin Liu et.al. 2601.03306 translate read null
2026-01-06 Autonomous Threat Detection and Response in Cloud Security: A Comprehensive Survey of AI-Driven Strategies Gaurav Sarraf et.al. 2601.03303 translate read null
2026-01-06 PC2P: Multi-Agent Path Finding via Personalized-Enhanced Communication and Crowd Perception Guotao Li et.al. 2601.03301 translate read null
2026-01-06 STReasoner: Empowering LLMs for Spatio-Temporal Reasoning in Time Series via Spatial-Aware Reinforcement Learning Juntong Ni et.al. 2601.03248 translate read null
2026-01-06 Critic-Guided Reinforcement Unlearning in Text-to-Image Diffusion Mykola Vysotskyi et.al. 2601.03213 translate read null
2026-01-06 UltraLogic: Enhancing LLM Reasoning through Large-Scale Data Synthesis and Bipolar Float Reward Yile Liu et.al. 2601.03205 translate read null
2026-01-06 MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory Shengtao Zhang et.al. 2601.03192 translate read null
2026-01-06 WebAnchor: Anchoring Agent Planning to Stabilize Long-Horizon Web Reasoning Xinmiao Yu et.al. 2601.03164 translate read null
2026-01-06 Unified Thinker: A General Reasoning Modular Core for Image Generation Sashuai Zhou et.al. 2601.03127 translate read null
2026-01-06 One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling Yiyuan Li et.al. 2601.03111 translate read null
2026-01-06 Post-Decision State-Based Online Learning for Delay-Energy-Aware Flow Allocation in Wireless Systems Mahesh Ganesh Bhat et.al. 2601.03108 translate read null
2026-01-06 IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation Yankai Jiang et.al. 2601.03054 translate read null
2026-01-06 SOP: A Scalable Online Post-Training System for Vision-Language-Action Models Mingjie Pan et.al. 2601.03044 translate read null
2026-01-06 Dementia-R1: Reinforced Pretraining and Reasoning from Unstructured Clinical Notes for Real-World Dementia Prognosis Choonghan Kim et.al. 2601.03018 translate read null
2026-01-06 In-Context Reinforcement Learning through Bayesian Fusion of Context and Value Prior Anaïs Berkes et.al. 2601.03015 translate read null
2026-01-06 Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning Yuankun Xie et.al. 2601.02983 translate read null
2026-01-06 Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning Nathanaël Carraz Rakotonirina et.al. 2601.02972 translate read null
2026-01-06 The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models Yuhuan You et.al. 2601.02954 translate read null
2026-01-06 Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning Guoqiang Liang et.al. 2601.02918 translate read null
2026-01-06 ChemBART: A Pre-trained BART Model Assisting Organic Chemistry Analysis Kenan Li et.al. 2601.02915 translate read null
2026-01-06 SimRPD: Optimizing Recruitment Proactive Dialogue Agents through Simulator-Based Data Evaluation and Selection Zhiyong Cao et.al. 2601.02871 translate read null
2026-01-06 Sample-Efficient Neurosymbolic Deep Reinforcement Learning Celeste Veronese et.al. 2601.02850 translate read null
2026-01-06 SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models Ruiyang Zhang et.al. 2601.02825 translate read null
2026-01-06 Reinforcement Learning for Follow-the-Leader Robotic Endoscopic Navigation via Synthetic Data Sicong Gao et.al. 2601.02798 translate read null
2026-01-06 MiMo-V2-Flash Technical Report Xiaomi LLM-Core Team et.al. 2601.02780 translate read null
2026-01-06 Closing the Reality Gap: Zero-Shot Sim-to-Real Deployment for Dexterous Force-Based Grasping and Manipulation Zhe Zhao et.al. 2601.02778 translate read null
2026-01-06 Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal Policies Mingming Zhang et.al. 2601.02754 translate read null
2026-01-06 Time-Scaling Is What Agents Need Now Zhi Liu et.al. 2601.02714 translate read null
2026-01-06 Inferring Causal Graph Temporal Logic Formulas to Expedite Reinforcement Learning in Temporally Extended Tasks Hadi Partovi Aria et.al. 2601.02666 translate read null
2026-01-06 Effective Online 3D Bin Packing with Lookahead Parcels Using Monte Carlo Tree Search Jiangyi Fang et.al. 2601.02649 translate read null
2026-01-05 SWaRL: Safeguard Code Watermarking via Reinforcement Learning Neusha Javidnia et.al. 2601.02602 translate read null
2026-01-05 Textual Explanations and Their Evaluations for Reinforcement Learning Policy Ahmad Terra et.al. 2601.02514 translate read null
2026-01-05 LLM-Enhanced Reinforcement Learning for Time Series Anomaly Detection Bahareh Golchin et.al. 2601.02511 translate read null
2026-01-05 WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks Hao Bai et.al. 2601.02439 translate read null
2026-01-05 Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes Jing Tan et.al. 2601.02356 translate read null
2026-01-05 VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation Shikun Sun et.al. 2601.02256 translate read null
2026-01-05 Enabling Deep Reinforcement Learning Research for Energy Saving in Open RAN Matteo Bordin et.al. 2601.02240 translate read null
2026-01-05 NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Huichao Zhang et.al. 2601.02204 translate read null
2026-01-05 CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents Keyu Wang et.al. 2601.02201 translate read null
2026-01-05 ACDZero: MCTS Agent for Mastering Automated Cyber Defense Yu Li et.al. 2601.02196 translate read null
2026-01-05 Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Muxi Diao et.al. 2601.02151 translate read null
2026-01-05 MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics Zhuofan Shi et.al. 2601.02075 translate read null
2026-01-05 Reinforcement Learning Based Computationally Efficient Conditional Choice Simulation Estimation of Dynamic Discrete Choice Models Ahmed Khwaja et.al. 2601.02069 translate read null
2026-01-05 Higher-Order Action Regularization in Deep Reinforcement Learning: From Continuous Control to Building Energy Management Faizan Ahmed et.al. 2601.02061 translate read null
2026-01-05 GDRO: Group-level Reward Post-training Suitable for Diffusion Models Yiyang Wang et.al. 2601.02036 translate read null
2026-01-05 AgentVNE: LLM-Augmented Graph Reinforcement Learning for Affinity-Aware Multi-Agent Placement in Edge Agentic AI Runze Zheng et.al. 2601.02021 translate read null
2026-01-05 Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation Weijian Ma et.al. 2601.01984 translate read null
2026-01-05 Distorted Distributional Policy Evaluation for Offline Reinforcement Learning Ryo Iwaki et.al. 2601.01917 translate read null
2026-01-05 Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning Yuxuan Li et.al. 2601.01904 translate read null
2026-01-05 Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents Yi Yu et.al. 2601.01885 translate read null
2026-01-05 DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs Jinghan Ru et.al. 2601.01868 translate read null
2026-01-05 Moments Matter: Stabilizing Policy Optimization using Return Distributions Dennis Jabs et.al. 2601.01803 translate read null
2026-01-05 PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor Qianjun Pan et.al. 2601.01802 translate read null
2026-01-05 Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving Qi Wei et.al. 2601.01800 translate read null
2026-01-05 SRAS: A Lightweight Reinforcement Learning-based Document Selector for Edge-Native RAG Pipelines Rajiv Chaitanya Muttur et.al. 2601.01785 translate read null
2026-01-05 Reinforcement Learning for Option Hedging: Static Implied-Volatility Fit versus Shortfall-Aware Performance Ziheng Chen et.al. 2601.01709 translate read null
2026-01-04 All-Optical Deep Learning with Quantum Nonlinearity Qingyi Zhou et.al. 2601.01690 translate read null
2026-01-04 Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives Wei Liu et.al. 2601.01665 translate read null
2026-01-04 DemoBot: Efficient Learning of Bimanual Manipulation with Dexterous Hands From Third-Person Human Videos Yucheng Xu et.al. 2601.01651 translate read null
2026-01-04 Action-Sketcher: From Reasoning to Action via Visual Sketches for Long-Horizon Robotic Manipulation Huajie Tan et.al. 2601.01618 translate read null
2026-01-04 HanoiWorld: A Joint Embedding Predictive Architecture Based World Model for Autonomous Vehicle Controller Tran Tien Dat et.al. 2601.01577 translate read null
2026-01-04 Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement Mingyu Xu et.al. 2601.01562 translate read null
2026-01-04 Unified Generation and Self-Verification for Vision-Language Models via Advantage Decoupled Preference Optimization Xinyu Qiu et.al. 2601.01483 translate read null
2026-01-04 Programmable ultra-broadband photonic chaos platform enabled by microwave-chaos-driven electro-optic frequency combs Shiyu Shi et.al. 2601.01440 translate read null
2026-01-04 Context-Aware Information Transfer via Digital Semantic Communication in UAV-Based Networks Poorvi Joshi et.al. 2601.01430 translate read null
2026-01-04 SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving Chaofan Tao et.al. 2601.01426 translate read null
2026-01-04 DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer Xu Guo et.al. 2601.01425 translate read null
2026-01-04 SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning Peidong Wang et.al. 2601.01392 translate read null
2026-01-03 dataRLsec: Safety, Security, and Reliability With Robust Offline Reinforcement Learning for DPAs Shriram KS Pandian et.al. 2601.01289 translate read null
2026-01-03 PyBatchRender: A Python Library for Batched 3D Rendering at Up to One Million FPS Evgenii Rudakov et.al. 2601.01288 translate read null
2026-01-03 Harnessing Environmental Memory with Reinforcement Learning in Open Quantum Systems Safae Gaidi et.al. 2601.01252 translate read null
2026-01-03 OrchestrRL: Dynamic Compute and Network Orchestration for Disaggregated RL Xin Tan et.al. 2601.01209 translate read null
2026-01-03 Reinforcement Learning Enhanced Multi-hop Reasoning for Temporal Knowledge Question Answering Wuzhenghong Wen et.al. 2601.01195 translate read null
2026-01-03 SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards Suryansh Singh Sijwali et.al. 2601.01184 translate read null
2026-01-03 Reinforcement Learning Based Whittle Index Policy for Scheduling Wireless Sensors Sokipriala Jonah et.al. 2601.01179 translate read null
2026-01-03 ORION: Option-Regularized Deep Reinforcement Learning for Cooperative Multi-Agent Online Navigation Zhang Shizhe et.al. 2601.01155 translate read null
2026-01-03 Latent Space Reinforcement Learning for Multi-Robot Exploration Sriram Rajasekar et.al. 2601.01139 translate read null
2026-01-03 Performance and Security Aware Distributed Service Placement in Fog Computing Mohammad Goudarzi et.al. 2601.01125 translate read null
2026-01-02 DVGBench: Implicit-to-Explicit Visual Grounding Benchmark in UAV Imagery with Large Vision-Language Models Yue Zhou et.al. 2601.00998 translate read null
2026-01-02 Materials Informatics: Emergence To Autonomous Discovery In The Age Of AI Turab Lookman et.al. 2601.00742 translate read null
2026-01-02 Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty Uğurcan Özalp et.al. 2601.00737 translate read null
2026-01-02 Precision Autotuning for Linear Solvers via Reinforcement Learning Erin Carson et.al. 2601.00728 translate read null
2026-01-02 ARISE: Adaptive Reinforcement Integrated with Swarm Exploration Rajiv Chaitanya M et.al. 2601.00693 translate read null
2026-01-02 IRPO: Scaling the Bradley-Terry Model via Reinforcement Learning Haonan Song et.al. 2601.00677 translate read null
2026-01-02 RoboReward: General-Purpose Vision-Language Reward Models for Robotics Tony Lee et.al. 2601.00675 translate read null
2026-01-02 Integrating Multi-Armed Bandit, Active Learning, and Distributed Computing for Scalable Optimization Foo Hui-Mean et.al. 2601.00615 translate read null
2026-01-02 Vision-based Goal-Reaching Control for Mobile Robots Using a Hierarchical Learning Framework Mehdi Heydari Shahna et.al. 2601.00610 translate read null
2026-01-02 Traffic-Aware Optimal Taxi Placement Using Graph Neural Network-Based Reinforcement Learning Sonia Khetarpaul et.al. 2601.00607 translate read null
2026-01-02 Parametrized Sharing for Multi-Agent Hybrid DRL for Multiple Multi-Functional RISs-Aided Downlink NOMA Networks Chi-Te Kuo et.al. 2601.00538 translate read null
2026-01-01 CPPO: Contrastive Perception for Vision Language Policy Optimization Ahmad Rezaei et.al. 2601.00501 translate read null
2026-01-01 Safe Adaptive Feedback Control via Barrier States Trivikram Satharasi et.al. 2601.00476 translate read null
2026-01-01 Imitation from Observations with Trajectory-Level Generative Embeddings Yongtao Qu et.al. 2601.00452 translate read null
2026-01-01 Modelling cultural evolution Fredrik Jansson et.al. 2601.00433 translate read null
2026-01-01 E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models Shengjun Zhang et.al. 2601.00423 translate read null
2026-01-01 Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach Biao Wu et.al. 2601.00388 translate read null
2026-01-01 Multiagent Reinforcement Learning for Liquidity Games Alicia Vidler et.al. 2601.00324 translate read null
2026-01-01 Offline Multi-Agent Reinforcement Learning for 6G Communications: Fundamentals, Applications and Future Directions Eslam Eldeeb et.al. 2601.00321 translate read null
2026-01-01 Can Optimal Transport Improve Federated Inverse Reinforcement Learning? David Millard et.al. 2601.00309 translate read null
2026-01-01 Next Generation Intelligent Low-Altitude Economy Deployments: The O-RAN Perspective Aly Sabri Abdalla et.al. 2601.00257 translate read null
2026-01-01 Modern Neuromorphic AI: From Intra-Token to Inter-Token Processing Osvaldo Simeone et.al. 2601.00245 translate read null
2026-01-01 From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning Omar Sharif et.al. 2601.00215 translate read null
2026-01-01 Reinforcement-Learned Unequal Error Protection for Quantized Semantic Embeddings Moirangthem Tiken Singh et.al. 2601.00186 translate read null
2026-01-01 Online Finetuning Decision Transformers with Pure RL Gradients Junkai Luo et.al. 2601.00167 translate read null
2026-01-01 Reinforcement Learning with Function Approximation for Non-Markov Processes Ali Devran Kara et.al. 2601.00151 translate read null
