Reinforcement Learning - 2026-01
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2026-01-30 | IRL-DAL: Safe and Adaptive Trajectory Planning for Autonomous Driving via Energy-Guided Diffusion Models | Seyed Ahmad Hosseini Miangoleh et.al. | 2601.23266 | translate | read | null |
| 2026-01-30 | Agile Reinforcement Learning through Separable Neural Architecture | Rajib Mostakim et.al. | 2601.23225 | translate | read | null |
| 2026-01-30 | Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning | Xiangyu Zeng et.al. | 2601.23224 | translate | read | null |
| 2026-01-30 | Med-Scout: Curing MLLMs’ Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training | Anglin Liu et.al. | 2601.23220 | translate | read | null |
| 2026-01-30 | Unsupervised Hierarchical Skill Discovery | Damion Harvey et.al. | 2601.23156 | translate | read | null |
| 2026-01-30 | On Safer Reinforcement Learning Policies for Sedation and Analgesia in Intensive Care | Joel Romero-Hernandez et.al. | 2601.23154 | translate | read | null |
| 2026-01-30 | THINKSAFE: Self-Generated Safety Alignment for Reasoning Models | Seanie Lee et.al. | 2601.23143 | translate | read | link |
| 2026-01-30 | Why GRPO Needs Normalization: A Local-Curvature Perspective on Adaptive Gradients | Cheng Ge et.al. | 2601.23135 | translate | read | null |
| 2026-01-30 | Temporally Coherent Imitation Learning via Latent Action Flow Matching for Robotic Manipulation | Wu Songwei et.al. | 2601.23087 | translate | read | null |
| 2026-01-30 | RN-D: Discretized Categorical Actors with Regularized Networks for On-Policy Reinforcement Learning | Yuexin Bian et.al. | 2601.23075 | translate | read | null |
| 2026-01-30 | From Absolute to Relative: Rethinking Reward Shaping in Group-Based Reinforcement Learning | Wenzhe Niu et.al. | 2601.23058 | translate | read | null |
| 2026-01-30 | Guided by Trajectories: Repairing and Rewarding Tool-Use Trajectories for Tool-Integrated Reasoning | Siyu Gong et.al. | 2601.23032 | translate | read | null |
| 2026-01-30 | Mem-T: Densifying Rewards for Long-Horizon Memory Agents | Yanwei Yue et.al. | 2601.23014 | translate | read | null |
| 2026-01-30 | Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning | Xinchen Han et.al. | 2601.23010 | translate | read | null |
| 2026-01-30 | Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text | Ximing Lu et.al. | 2601.22975 | translate | read | null |
| 2026-01-30 | Self-Imitated Diffusion Policy for Efficient and Robust Visual Navigation | Runhua Zhang et.al. | 2601.22965 | translate | read | null |
| 2026-01-30 | SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding | Boyin Tan et.al. | 2601.22956 | translate | read | null |
| 2026-01-30 | MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving | Xidong Li et.al. | 2601.22930 | translate | read | null |
| 2026-01-30 | MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop | Xuancheng Li et.al. | 2601.22900 | translate | read | null |
| 2026-01-30 | PlatoLTL: Learning to Generalize Across Symbols in LTL Instructions for Multi-Task RL | Jacques Cloete et.al. | 2601.22891 | translate | read | null |
| 2026-01-30 | Reinforcement Learning-Based Co-Design and Operation of Chiller and Thermal Energy Storage for Cost-Optimal HVAC Systems | Tanay Raghunandan Srinivasa et.al. | 2601.22880 | translate | read | null |
| 2026-01-30 | Degradation-Aware Frequency Regulation of a Heterogeneous Battery Fleet via Reinforcement Learning | Tanay Raghunandan Srinivasa et.al. | 2601.22865 | translate | read | null |
| 2026-01-30 | The two-nest ants process on triangle-series-parallel graphs | Cécile Mailler et.al. | 2601.22855 | translate | read | null |
| 2026-01-30 | Robust Rigid Body Assembly via Contact-Implicit Optimal Control with Exact Second-Order Derivatives | Christian Dietz et.al. | 2601.22849 | translate | read | null |
| 2026-01-30 | Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment | Mathieu Petitbois et.al. | 2601.22823 | translate | read | null |
| 2026-01-30 | CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning | Ji Shi et.al. | 2601.22803 | translate | read | null |
| 2026-01-30 | Clipping-Free Policy Optimization for Large Language Models | Ömer Veysel Çağatan et.al. | 2601.22801 | translate | read | null |
| 2026-01-30 | TSPO: Breaking the Double Homogenization Dilemma in Multi-turn Search Policy Optimization | Shichao Ma et.al. | 2601.22776 | translate | read | null |
| 2026-01-30 | A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization | Shiye Lei et.al. | 2601.22718 | translate | read | null |
| 2026-01-30 | Real-Time Aligned Reward Model beyond Semantics | Zixuan Huang et.al. | 2601.22664 | translate | read | null |
| 2026-01-30 | Evaluating and Rewarding LALMs for Expressive Role-Play TTS via Mean Continuation Log-Probability | Yong Ren et.al. | 2601.22661 | translate | read | null |
| 2026-01-30 | COBRA++: Enhanced COBRA Optimizer with Augmented Surrogate Pool and Reinforced Surrogate Selection | Zepei Yu et.al. | 2601.22624 | translate | read | null |
| 2026-01-30 | From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents | Jiaxuan Gao et.al. | 2601.22607 | translate | read | null |
| 2026-01-30 | Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR | Hao Yi et.al. | 2601.22595 | translate | read | null |
| 2026-01-30 | MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning | Youngeun Kim et.al. | 2601.22582 | translate | read | null |
| 2026-01-30 | Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation | Geonho Leem et.al. | 2601.22550 | translate | read | null |
| 2026-01-30 | PersonaAct: Simulating Short-Video Users with Personalized Agents for Counterfactual Filter Bubble Auditing | Shilong Zhao et.al. | 2601.22547 | translate | read | null |
| 2026-01-30 | Adapting Reinforcement Learning for Path Planning in Constrained Parking Scenarios | Feng Tao et.al. | 2601.22545 | translate | read | null |
| 2026-01-30 | Detect and Act: Automated Dynamic Optimizer through Meta-Black-Box Optimization | Zijian Gao et.al. | 2601.22542 | translate | read | null |
| 2026-01-30 | One Ring to Rule Them All: Unifying Group-Based RL via Dynamic Power-Mean Geometry | Weisong Zhao et.al. | 2601.22521 | translate | read | null |
| 2026-01-30 | RoboStriker: Hierarchical Decision-Making for Autonomous Humanoid Boxing | Kangning Yin et.al. | 2601.22517 | translate | read | null |
| 2026-01-30 | Mock Worlds, Real Skills: Building Small Agentic Language Models with Synthetic Tasks, Simulated Environments, and Rubric-Based Rewards | Yuan-Jay Lü et.al. | 2601.22511 | translate | read | null |
| 2026-01-30 | DreamVAR: Taming Reinforced Visual Autoregressive Model for High-Fidelity Subject-Driven Image Generation | Xin Jiang et.al. | 2601.22507 | translate | read | null |
| 2026-01-30 | Action-Sufficient Goal Representations | Jinu Hyeon et.al. | 2601.22496 | translate | read | null |
| 2026-01-30 | SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization | Jinyang Wu et.al. | 2601.22491 | translate | read | null |
| 2026-01-30 | RulePlanner: All-in-One Reinforcement Learner for Unifying Design Rules in 3D Floorplanning | Ruizhe Zhong et.al. | 2601.22476 | translate | read | null |
| 2026-01-30 | Continual Policy Distillation from Distributed Reinforcement Learning Teachers | Yuxuan Li et.al. | 2601.22475 | translate | read | null |
| 2026-01-30 | Unrewarded Exploration in Large Language Models Reveals Latent Learning from Psychology | Jian Xiong et.al. | 2601.22474 | translate | read | null |
| 2026-01-30 | HeaPA: Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning | Weiqi Wang et.al. | 2601.22448 | translate | read | null |
| 2026-01-29 | SAIR: Cost-Efficient Multi-Stage ML Pipeline Autoscaling via In-Context Reinforcement Learning | Jianchang Su et.al. | 2601.22397 | translate | read | null |
| 2026-01-29 | Quantum-Inspired Reinforcement Learning for Secure and Sustainable AIoT-Driven Supply Chain Systems | Muhammad Bilal Akram Dastagir et.al. | 2601.22339 | translate | read | null |
| 2026-01-29 | Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning | Qi Cao et.al. | 2601.22323 | translate | read | null |
| 2026-01-29 | Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning | Chenxi Liu et.al. | 2601.22297 | translate | read | null |
| 2026-01-29 | Learning Reward Functions for Cooperative Resilience in Multi-Agent Systems | Manuela Chacon-Chamorro et.al. | 2601.22292 | translate | read | null |
| 2026-01-29 | Aligning Microscopic Vehicle and Macroscopic Traffic Statistics: Reconstructing Driving Behavior from Partial Data | Zhihao Zhang et.al. | 2601.22242 | translate | read | null |
| 2026-01-29 | Smart Walkers in Discrete Space | Gianluca Peri et.al. | 2601.22235 | translate | read | null |
| 2026-01-29 | Latent Spherical Flow Policy for Reinforcement Learning with Combinatorial Actions | Lingkai Kong et.al. | 2601.22211 | translate | read | null |
| 2026-01-29 | Causal Imitation Learning Under Measurement Error and Distribution Shift | Shi Bo et.al. | 2601.22206 | translate | read | null |
| 2026-01-28 | ShellForge: Adversarial Co-Evolution of Webshell Generation and Multi-View Detection for Robust Webshell Defense | Yizhong Ding et.al. | 2601.22182 | translate | read | null |
| 2026-01-29 | Exploring Reasoning Reward Model for Agents | Kaixuan Fan et.al. | 2601.22154 | translate | read | link |
| 2026-01-29 | DynaWeb: Model-Based Reinforcement Learning of Web Agents | Hang Ding et.al. | 2601.22149 | translate | read | null |
| 2026-01-29 | Learning to Dial-a-Ride: A Deep Graph Reinforcement Learning Approach to the Electric Dial-a-Ride Problem | Sten Elling Tingstad Jacobsen et.al. | 2601.22052 | translate | read | null |
| 2026-01-29 | SIA: Symbolic Interpretability for Anticipatory Deep Reinforcement Learning in Network Control | MohammadErfan Jabbari et.al. | 2601.22044 | translate | read | null |
| 2026-01-29 | SymbXRL: Symbolic Explainable Deep Reinforcement Learning for Mobile Networks | Abhishek Duttagupta et.al. | 2601.22024 | translate | read | null |
| 2026-01-29 | Geometry of Drifting MDPs with Path-Integral Stability Certificates | Zuyuan Zhang et.al. | 2601.21991 | translate | read | null |
| 2026-01-29 | Elign: Equivariant Diffusion Model Alignment from Foundational Machine Learning Force Fields | Yunyang Li et.al. | 2601.21985 | translate | read | null |
| 2026-01-29 | Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic | Shuo Liu et.al. | 2601.21972 | translate | read | null |
| 2026-01-29 | MoE-ACT: Improving Surgical Imitation Learning Policies through Supervised Mixture-of-Experts | Lorenzo Mazza et.al. | 2601.21971 | translate | read | null |
| 2026-01-29 | Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding | Yifan Zhu et.al. | 2601.21969 | translate | read | null |
| 2026-01-29 | OVD: On-policy Verbal Distillation | Jing Xiong et.al. | 2601.21968 | translate | read | null |
| 2026-01-29 | Optimistic Transfer under Task Shift via Bellman Alignment | Jinhang Chai et.al. | 2601.21924 | translate | read | null |
| 2026-01-29 | Self-Compression of Chain-of-Thought via Multi-Agent Reinforcement Learning | Yiqun Chen et.al. | 2601.21919 | translate | read | null |
| 2026-01-29 | ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation | Zhao Wang et.al. | 2601.21912 | translate | read | null |
| 2026-01-29 | From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning | Shaojie Wang et.al. | 2601.21909 | translate | read | null |
| 2026-01-29 | Acquiring Human-Like Mechanics Intuition from Scarce Observations via Deep Reinforcement Learning | Jingruo Peng et.al. | 2601.21881 | translate | read | null |
| 2026-01-29 | WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents | Yao Zhang et.al. | 2601.21872 | translate | read | null |
| 2026-01-29 | Spatiotemporal Continual Learning for Mobile Edge UAV Networks: Mitigating Catastrophic Forgetting | Chuan-Chi Lai et.al. | 2601.21861 | translate | read | null |
| 2026-01-29 | Self-Adaptive Probabilistic Skyline Query Processing in Distributed Edge Computing via Deep Reinforcement Learning | Chuan-Chi Lai et.al. | 2601.21855 | translate | read | null |
| 2026-01-29 | READY: Reward Discovery for Meta-Black-Box Optimization | Zechuan Huang et.al. | 2601.21847 | translate | read | null |
| 2026-01-29 | Constrained Meta Reinforcement Learning with Provable Test-Time Safety | Tingting Ni et.al. | 2601.21845 | translate | read | null |
| 2026-01-29 | Distribution-Aware Reward Estimation for Test-Time Reinforcement Learning | Bodong Du et.al. | 2601.21804 | translate | read | null |
| 2026-01-29 | Error Amplification Limits ANN-to-SNN Conversion in Continuous Control | Zijie Xu et.al. | 2601.21778 | translate | read | null |
| 2026-01-29 | OneMall: One Model, More Scenarios – End-to-End Generative Recommender Family at Kuaishou E-Commerce | Kun Zhang et.al. | 2601.21770 | translate | read | null |
| 2026-01-29 | Influence Guided Sampling for Domain Adaptation of Text Retrievers | Meet Doshi et.al. | 2601.21759 | translate | read | null |
| 2026-01-29 | Language-based Trial and Error Falls Behind in the Era of Experience | Haoyu Wang et.al. | 2601.21754 | translate | read | link |
| 2026-01-29 | Epistemic Context Learning: Building Trust the Right Way in LLM-Based Multi-Agent Systems | Ruiwen Zhou et.al. | 2601.21742 | translate | read | null |
| 2026-01-29 | Mixed-Precision Training and Compilation for RRAM-based Computing-in-Memory Accelerators | Rebecca Pelke et.al. | 2601.21737 | translate | read | null |
| 2026-01-29 | When does predictive inverse dynamics outperform behavior cloning? | Lukas Schäfer et.al. | 2601.21718 | translate | read | null |
| 2026-01-29 | Disentangling perception and reasoning for improving data efficiency in learning cloth manipulation without demonstrations | Donatien Delehelle et.al. | 2601.21713 | translate | read | null |
| 2026-01-29 | TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning | Huiyuan Lai et.al. | 2601.21711 | translate | read | null |
| 2026-01-29 | Can David Beat Goliath? On Multi-Hop Reasoning with Resource-Constrained Agents | Hojae Han et.al. | 2601.21699 | translate | read | null |
| 2026-01-29 | BAP-SRL: Bayesian Adaptive Priority Safe Reinforcement Learning for Vehicle Motion Planning at Mixed Traffic Intersections | Yuansheng Lian et.al. | 2601.21679 | translate | read | null |
| 2026-01-29 | Expected Return Causes Outcome-Level Mode Collapse in Reinforcement Learning and How to Fix It with Inverse Probability Scaling | Abhijeet Sinha et.al. | 2601.21669 | translate | read | null |
| 2026-01-29 | Reinforcement Learning for Adaptive Composition of Quantum Circuit Optimisation Passes | Daniel Mills et.al. | 2601.21629 | translate | read | null |
| 2026-01-29 | PathReasoner-R1: Instilling Structured Reasoning into Pathology Vision-Language Model via Knowledge-Guided Policy Optimization | Songhan Jiang et.al. | 2601.21617 | translate | read | null |
| 2026-01-29 | RecNet: Self-Evolving Preference Propagation for Agentic Recommender Systems | Bingqian Li et.al. | 2601.21609 | translate | read | null |
| 2026-01-29 | Beyond Imitation: Reinforcement Learning for Active Latent Planning | Zhi Zheng et.al. | 2601.21598 | translate | read | link |
| 2026-01-29 | Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening | Xiaotong Ji et.al. | 2601.21590 | translate | read | null |
| 2026-01-29 | Signal-Adaptive Trust Regions for Gradient-Free Optimization of Recurrent Spiking Neural Networks | Jinhao Li et.al. | 2601.21572 | translate | read | null |
| 2026-01-29 | ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas | Xiaoyu Tian et.al. | 2601.21558 | translate | read | link |
| 2026-01-29 | Training slow silicon neurons to control extremely fast robots with spiking reinforcement learning | Irene Ambrosini et.al. | 2601.21548 | translate | read | null |
| 2026-01-29 | Explicit Credit Assignment through Local Rewards and Dependence Graphs in Multi-Agent Reinforcement Learning | Bang Giang Le et.al. | 2601.21523 | translate | read | null |
| 2026-01-29 | ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment | Xiuyu Li et.al. | 2601.21484 | translate | read | null |
| 2026-01-29 | Mean-Field Control on Sparse Graphs: From Local Limits to GNNs via Neighborhood Distributions | Tobias Schmidt et.al. | 2601.21477 | translate | read | null |
| 2026-01-29 | SOUP: Token-level Single-sample Mix-policy Reinforcement Learning for Large Language Models | Lei Yang et.al. | 2601.21476 | translate | read | null |
| 2026-01-29 | MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning | Yaorui Shi et.al. | 2601.21468 | translate | read | null |
| 2026-01-29 | HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing | Chengyu Du et.al. | 2601.21459 | translate | read | null |
| 2026-01-29 | Mitigating Overthinking in Large Reasoning Models via Difficulty-aware Reinforcement Learning | Qian Wan et.al. | 2601.21418 | translate | read | null |
| 2026-01-29 | Towards Space-Based Environmentally-Adaptive Grasping | Leonidas Askianakis et.al. | 2601.21394 | translate | read | null |
| 2026-01-29 | Shaping the learning signal in a combined Q-learning rule to improve structured cooperation | Chunpeng Du et.al. | 2601.21392 | translate | read | null |
| 2026-01-29 | Intrinsic Reward Policy Optimization for Sparse-Reward Environments | Minjae Cho et.al. | 2601.21391 | translate | read | null |
| 2026-01-29 | Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control | Weidong Huang et.al. | 2601.21363 | translate | read | null |
| 2026-01-29 | Factored Causal Representation Learning for Robust Reward Modeling in RLHF | Yupei Yang et.al. | 2601.21350 | translate | read | null |
| 2026-01-29 | Self-Improving Pretraining: using post-trained models to pretrain better models | Ellen Xiaoqing Tan et.al. | 2601.21343 | translate | read | null |
| 2026-01-29 | Heterogeneous Vertiport Selection Optimization for On-Demand Air Taxi Services: A Deep Reinforcement Learning Approach | Aoyu Pang et.al. | 2601.21316 | translate | read | null |
| 2026-01-29 | Few-Shot Learning for Dynamic Operations of Automated Electric Taxi Fleets under Evolving Charging Infrastructure: A Meta-Deep Reinforcement Learning Approach | Xiaozhuang Li et.al. | 2601.21312 | translate | read | null |
| 2026-01-29 | The Surprising Difficulty of Search in Model-Based Reinforcement Learning | Wei-Di Chang et.al. | 2601.21306 | translate | read | null |
| 2026-01-29 | EGAM: Extended Graph Attention Model for Solving Routing Problems | Licheng Wang et.al. | 2601.21281 | translate | read | null |
| 2026-01-29 | Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels | Micah Rentschler et.al. | 2601.21268 | translate | read | null |
| 2026-01-29 | Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification | Yiju Guo et.al. | 2601.21244 | translate | read | null |
| 2026-01-29 | Intelli-Planner: Towards Customized Urban Planning via Large Language Model Empowered Reinforcement Learning | Xixian Yong et.al. | 2601.21212 | translate | read | null |
| 2026-01-29 | When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning | Wei Wen et.al. | 2601.21208 | translate | read | null |
| 2026-01-29 | Do Reasoning Models Enhance Embedding Models? | Wun Yu Chan et.al. | 2601.21192 | translate | read | null |
| 2026-01-28 | Safety Generalization Under Distribution Shift in Safe Reinforcement Learning: A Diabetes Testbed | Minjae Kwon et.al. | 2601.21094 | translate | read | link |
| 2026-01-28 | Deep Reinforcement Learning for Fault-Adaptive Routing in Eisenstein-Jacobi Interconnection Topologies | Mohammad Walid Charrwi et.al. | 2601.21090 | translate | read | null |
| 2026-01-28 | OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence | Jarrod Barnes et.al. | 2601.21083 | translate | read | link |
| 2026-01-28 | Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report | Zhuoran Yang et.al. | 2601.21051 | translate | read | null |
| 2026-01-28 | Log2Motion: Biomechanical Motion Synthesis from Touch Logs | Michał Patryk Miazga et.al. | 2601.21043 | translate | read | null |
| 2026-01-28 | SIGMA-PPG: Statistical-prior Informed Generative Masking Architecture for PPG Foundation Model | Zongheng Guo et.al. | 2601.21031 | translate | read | link |
| 2026-01-28 | Distributional Active Inference | Abdullah Akgül et.al. | 2601.20985 | translate | read | null |
| 2026-01-28 | End-to-end example-based sim-to-real RL policy transfer based on neural stylisation with application to robotic cutting | Jamie Hathaway et.al. | 2601.20846 | translate | read | null |
| 2026-01-28 | Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning | Minwu Kim et.al. | 2601.20829 | translate | read | link |
| 2026-01-28 | Reinforcement Learning via Self-Distillation | Jonas Hübotter et.al. | 2601.20802 | translate | read | link |
| 2026-01-28 | SERA: Soft-Verified Efficient Repository Agents | Ethan Shen et.al. | 2601.20789 | translate | read | link |
| 2026-01-28 | Less is More: Clustered Cross-Covariance Control for Offline RL | Nan Qiao et.al. | 2601.20765 | translate | read | null |
| 2026-01-28 | GraphAllocBench: A Flexible Benchmark for Preference-Conditioned Multi-Objective Policy Learning | Zhiheng Jiang et.al. | 2601.20753 | translate | read | null |
| 2026-01-28 | Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions | Raul de la Rosa et.al. | 2601.20714 | translate | read | null |
| 2026-01-28 | One Step Is Enough: Dispersive MeanFlow Policy Optimization | Guowei Zou et.al. | 2601.20701 | translate | read | null |
| 2026-01-28 | Grover’s Search-Inspired Quantum Reinforcement Learning for Massive MIMO User Scheduling | Ruining Fan et.al. | 2601.20688 | translate | read | null |
| 2026-01-28 | Positive-Unlabeled Reinforcement Learning Distillation for On-Premise Small Models | Zhiqiang Kou et.al. | 2601.20687 | translate | read | null |
| 2026-01-28 | GPO: Growing Policy Optimization for Legged Robot Locomotion and Whole-Body Control | Shuhao Liao et.al. | 2601.20668 | translate | read | null |
| 2026-01-28 | Deep Learning based Three-stage Solution for ISAC Beamforming Optimization | Qian Gao et.al. | 2601.20667 | translate | read | null |
| 2026-01-28 | Integrated Sensing and Communication for Segmented Waveguide-Enabled Pinching Antenna Systems | Qian Gao et.al. | 2601.20658 | translate | read | null |
| 2026-01-28 | RL based Beamforming Optimization for 3D Pinching Antenna assisted ISAC Systems | Qian Gao et.al. | 2601.20654 | translate | read | null |
| 2026-01-28 | P2S: Probabilistic Process Supervision for General-Domain Reasoning Question Answering | Wenlin Zhong et.al. | 2601.20649 | translate | read | null |
| 2026-01-28 | Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation | Yanqi Dai et.al. | 2601.20614 | translate | read | null |
| 2026-01-28 | Ranking-aware Reinforcement Learning for Ordinal Ranking | Aiming Hao et.al. | 2601.20585 | translate | read | null |
| 2026-01-28 | Inequality in Congestion Games with Learning Agents | Dimitris Michailidis et.al. | 2601.20578 | translate | read | null |
| 2026-01-28 | Fair Recourse for All: Ensuring Individual and Group Fairness in Counterfactual Explanations | Fatima Ezzeddine et.al. | 2601.20449 | translate | read | null |
| 2026-01-28 | PEARL: Plan Exploration and Adaptive Reinforcement Learning for Multihop Tool Use | Qihao Wang et.al. | 2601.20439 | translate | read | null |
| 2026-01-28 | MARE: Multimodal Alignment and Reinforcement for Explainable Deepfake Detection via Vision-Language Models | Wenbo Xu et.al. | 2601.20433 | translate | read | null |
| 2026-01-28 | Reinforcement Learning for Dividend Optimization in Partially Observed Regime-Switching Diffusion Model | Zhongqin Gao et.al. | 2601.20387 | translate | read | null |
| 2026-01-28 | PsychePass: Calibrating LLM Therapeutic Competence via Trajectory-Anchored Tournaments | Zhuang Chen et.al. | 2601.20330 | translate | read | null |
| 2026-01-28 | CE-RM: A Pointwise Generative Reward Model Optimized via Two-Stage Rollout and Unified Criteria | Xinyu Hu et.al. | 2601.20327 | translate | read | null |
| 2026-01-28 | Endogenous Reprompting: Self-Evolving Cognitive Alignment for Unified Multimodal Models | Zhenchen Tang et.al. | 2601.20305 | translate | read | null |
| 2026-01-28 | Proactive SFC Provisioning with Forecast-Driven DRL in Data Centers | Parisa Fard Moshiri et.al. | 2601.20229 | translate | read | null |
| 2026-01-28 | Scaling Medical Reasoning Verification via Tool-Integrated Reinforcement Learning | Hang Zhang et.al. | 2601.20221 | translate | read | null |
| 2026-01-28 | Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning | Jinyang Wu et.al. | 2601.20209 | translate | read | null |
| 2026-01-28 | Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery | Zhipeng Zhang et.al. | 2601.20193 | translate | read | null |
| 2026-01-27 | Rewarding Intellectual Humility Learning When Not To Answer In Large Language Models | Abha Jha et.al. | 2601.20126 | translate | read | null |
| 2026-01-27 | A Reinforcement Learning Based Universal Sequence Design for Polar Codes | David Kin Wai Ho et.al. | 2601.20118 | translate | read | null |
| 2026-01-27 | In-Context Reinforcement Learning From Suboptimal Historical Data | Juncheng Dong et.al. | 2601.20116 | translate | read | null |
| 2026-01-27 | Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis | Darshan Deshpande et.al. | 2601.20103 | translate | read | null |
| 2026-01-27 | Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery | Meng Xin et.al. | 2601.20088 | translate | read | null |
| 2026-01-27 | Techno-economic optimization of a heat-pipe microreactor, part II: multi-objective optimization analysis | Paul Seurin et.al. | 2601.20079 | translate | read | null |
| 2026-01-27 | Distributional value gradients for stochastic environments | Baptiste Debes et.al. | 2601.20071 | translate | read | null |
| 2026-01-27 | Exploring the holographic entropy cone via reinforcement learning | Temple He et.al. | 2601.19979 | translate | read | null |
| 2026-01-27 | E2HiL: Entropy-Guided Sample Selection for Efficient Real-World Human-in-the-Loop Reinforcement Learning | Haoyuan Deng et.al. | 2601.19969 | translate | read | null |
| 2026-01-27 | Self-Distillation Enables Continual Learning | Idan Shenfeld et.al. | 2601.19897 | translate | read | null |
| 2026-01-27 | A Latent Space Framework for Modeling Transient Engine Emissions Using Joint Embedding Predictive Architectures | Ganesh Sundaram et.al. | 2601.19822 | translate | read | null |
| 2026-01-27 | Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals | Octavio Pappalardo et.al. | 2601.19810 | translate | read | null |
| 2026-01-27 | Reimagining Peer Review Process Through Multi-Agent Mechanism Design | Ahmad Farooq et.al. | 2601.19778 | translate | read | null |
| 2026-01-27 | Reimagining Social Robots as Recommender Systems: Foundations, Framework, and Applications | Jin Huang et.al. | 2601.19761 | translate | read | null |
| 2026-01-27 | Improving Policy Exploitation in Online Reinforcement Learning with Instant Retrospect Action | Gong Gao et.al. | 2601.19720 | translate | read | null |
| 2026-01-27 | Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow | Yunyue Wei et.al. | 2601.19707 | translate | read | null |
| 2026-01-27 | AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion | Tianyue Jiang et.al. | 2601.19697 | translate | read | null |
| 2026-01-27 | Video-KTR: Reinforcing Video Reasoning via Key Token Attribution | Ziyue Wang et.al. | 2601.19686 | translate | read | null |
| 2026-01-27 | Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning | Tongxi Wang et.al. | 2601.19624 | translate | read | null |
| 2026-01-27 | R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning | Zhizheng Jiang et.al. | 2601.19620 | translate | read | null |
| 2026-01-27 | Safe Exploration via Policy Priors | Manuel Wendl et.al. | 2601.19612 | translate | read | null |
| 2026-01-27 | LLM-Enhanced Reinforcement Learning for Long-Term User Satisfaction in Interactive Recommendation | Chongjun Xia et.al. | 2601.19585 | translate | read | null |
| 2026-01-27 | Bridging Information Asymmetry: A Hierarchical Framework for Deterministic Blind Face Restoration | Zhengjian Yao et.al. | 2601.19506 | translate | read | null |
| 2026-01-27 | Reinforcement Learning Goal-Reaching Control with Guaranteed Lyapunov-Like Stabilizer for Mobile Robots | Mehdi Heydari Shahna et.al. | 2601.19499 | translate | read | null |
| 2026-01-27 | APC-RL: Exceeding Data-Driven Behavior Priors with Adaptive Policy Composition | Finn Rietz et.al. | 2601.19452 | translate | read | null |
| 2026-01-27 | OSIRIS: Bridging Analog Circuit Design and Machine Learning with Scalable Dataset Generation | Giuseppe Chiari et.al. | 2601.19439 | translate | read | null |
| 2026-01-27 | Task-Centric Policy Optimization from Misaligned Motion Priors | Ziang Zheng et.al. | 2601.19411 | translate | read | null |
| 2026-01-27 | CHEHAB RL: Learning to Optimize Fully Homomorphic Encryption Computations | Bilel Sefsaf et.al. | 2601.19367 | translate | read | null |
| 2026-01-27 | From Observations to Events: Event-Aware World Model for Reinforcement Learning | Zhao-Han Peng et.al. | 2601.19336 | translate | read | null |
| 2026-01-27 | Innovator-VL: A Multimodal Large Language Model for Scientific Discovery | Zichen Wen et.al. | 2601.19325 | translate | read | null |
| 2026-01-27 | Reinforced Rate Control for Neural Video Compression via Inter-Frame Rate-Distortion Awareness | Wuyang Cong et.al. | 2601.19293 | translate | read | null |
| 2026-01-27 | Model-Free Output Feedback Stabilization via Policy Gradient Methods | Ankang Zhang et.al. | 2601.19284 | translate | read | null |
| 2026-01-27 | Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning | Kishan Panaganti et.al. | 2601.19280 | translate | read | null |
| 2026-01-27 | Reinforcement Learning for Enhanced Advanced QEC Architecture Decoding | Yidong Zhou et.al. | 2601.19279 | translate | read | null |
| 2026-01-27 | iFAN Ecosystem: A Unified AI, Digital Twin, Cyber-Physical Security, and Robotics Environment for Advanced Nuclear Simulation and Operations | Youndo Do et.al. | 2601.19234 | translate | read | null |
| 2026-01-27 | Structure-based RNA Design by Step-wise Optimization of Latent Diffusion Model | Qi Si et.al. | 2601.19232 | translate | read | null |
| 2026-01-27 | Towards Pixel-Level VLM Perception via Simple Points Prediction | Tianhui Song et.al. | 2601.19228 | translate | read | null |
| 2026-01-27 | Exploring Weaknesses in Function Call Models via Reinforcement Learning: An Adversarial Data Augmentation Approach | Weiran Guo et.al. | 2601.19122 | translate | read | null |
| 2026-01-27 | Glance and Focus Reinforcement for Pan-cancer Screening | Linshan Wu et.al. | 2601.19103 | translate | read | null |
| 2026-01-27 | Reward Engineering for Reinforcement Learning in Software Tasks | Md Rayhanul Masud et.al. | 2601.19100 | translate | read | null |
| 2026-01-27 | m2sv: A Scalable Benchmark for Map-to-Street-View Spatial Reasoning | Yosub Shin et.al. | 2601.19099 | translate | read | null |
| 2026-01-27 | Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback | Siddhant Arora et.al. | 2601.19063 | translate | read | null |
| 2026-01-26 | A Unifying View of Coverage in Linear Off-Policy Evaluation | Philip Amortila et.al. | 2601.19030 | translate | read | null |
| 2026-01-26 | Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning | Haolin Liu et.al. | 2601.18984 | translate | read | null |
| 2026-01-26 | Reinforcement Learning for Quantum Technology | Marin Bukov et.al. | 2601.18953 | translate | read | null |
| 2026-01-26 | Vector-Valued Distributional Reinforcement Learning Policy Evaluation: A Hilbert Space Embedding Approach | Mehrdad Mohammadi et.al. | 2601.18952 | translate | read | null |
| 2026-01-26 | Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration | Hwanwoo Kim et.al. | 2601.18907 | translate | read | null |
| 2026-01-26 | Analysis of Control Bellman Residual Minimization for Markov Decision Problem | Donghwan Lee et.al. | 2601.18840 | translate | read | null |
| 2026-01-26 | Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes | Amrith Setlur et.al. | 2601.18795 | translate | read | null |
| 2026-01-26 | Multi-Objective Reinforcement Learning for Efficient Tactical Decision Making for Trucks in Highway Traffic | Deepthi Pathare et.al. | 2601.18783 | translate | read | null |
| 2026-01-26 | POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration | Yuxiao Qu et.al. | 2601.18779 | translate | read | null |
| 2026-01-26 | Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability | Shobhita Sundaram et.al. | 2601.18778 | translate | read | null |
| 2026-01-26 | Dep-Search: Learning Dependency-Aware Reasoning Traces with Persistent Memory | Yanming Liu et.al. | 2601.18771 | translate | read | null |
| 2026-01-26 | Trust, Don’t Trust, or Flip: Robust Preference-Based Reinforcement Learning with Multi-Expert Feedback | Seyed Amir Hosseini et.al. | 2601.18751 | translate | read | null |
| 2026-01-26 | Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models | Siyan Zhao et.al. | 2601.18734 | translate | read | null |
| 2026-01-26 | Reflect: Transparent Principle-Guided Reasoning for Constitutional Alignment at Scale | Henry Bell et.al. | 2601.18730 | translate | read | null |
| 2026-01-26 | Trustworthy Evaluation of Robotic Manipulation: A New Benchmark and AutoEval Methods | Mengyuan Liu et.al. | 2601.18723 | translate | read | null |
| 2026-01-26 | Health-SCORE: Towards Scalable Rubrics for Improving Health-LLMs | Zhichao Yang et.al. | 2601.18706 | translate | read | null |
| 2026-01-26 | ART for Diffusion Sampling: A Reinforcement Learning Approach to Timestep Schedule | Yilie Huang et.al. | 2601.18681 | translate | read | null |
| 2026-01-26 | AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning | Mingyang Song et.al. | 2601.18631 | translate | read | null |
| 2026-01-26 | Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning | Yingxiao Huo et.al. | 2601.18626 | translate | read | null |
| 2026-01-26 | Learning long term climate-resilient transport adaptation pathways under direct and indirect flood impacts using reinforcement learning | Miguel Costa et.al. | 2601.18586 | translate | read | null |
| 2026-01-26 | From Classification to Ranking: Enhancing LLM Reasoning Capabilities for MBTI Personality Detection | Yuan Cao et.al. | 2601.18582 | translate | read | null |
| 2026-01-26 | K-Myriad: Jump-starting reinforcement learning with unsupervised parallel agents | Vincenzo De Paola et.al. | 2601.18580 | translate | read | null |
| 2026-01-26 | GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning | Kaixun Jiang et.al. | 2601.18543 | translate | read | null |
| 2026-01-26 | From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation | Yuxin Jiang et.al. | 2601.18533 | translate | read | null |
| 2026-01-26 | Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates | Yibo Li et.al. | 2601.18510 | translate | read | null |
| 2026-01-26 | Enhancing Control Policy Smoothness by Aligning Actions with Predictions from Preceding States | Kyoleen Kwak et.al. | 2601.18479 | translate | read | null |
| 2026-01-26 | OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents | Yuhang Zhou et.al. | 2601.18467 | translate | read | null |
| 2026-01-26 | Deep Reinforcement Learning for Hybrid RIS Assisted MIMO Communications | Phuong Nam Tran et.al. | 2601.18453 | translate | read | null |
| 2026-01-26 | Emergent Cooperation in Quantum Multi-Agent Reinforcement Learning Using Communication | Michael Kölle et.al. | 2601.18419 | translate | read | null |
| 2026-01-26 | daVinci-Dev: Agent-native Mid-training for Software Engineering | Ji Zeng et.al. | 2601.18418 | translate | read | null |
| 2026-01-26 | AI Agent for Reverse-Engineering Legacy Finite-Difference Code and Translating to Devito | Yinghan Hou et.al. | 2601.18381 | translate | read | null |
| 2026-01-26 | Temp-R1: A Unified Autonomous Agent for Complex Temporal KGQA via Reverse Curriculum Reinforcement Learning | Zhaoyan Gong et.al. | 2601.18296 | translate | read | null |
| 2026-01-26 | Reinforcement Learning with Distributed MPC for Fuel-Efficient Platoon Control with Discrete Gear Transitions | Samuel Mallick et.al. | 2601.18294 | translate | read | null |
| 2026-01-26 | TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment | Zhewen Tan et.al. | 2601.18292 | translate | read | null |
| 2026-01-26 | VissimRL: A Multi-Agent Reinforcement Learning Framework for Traffic Signal Control Based on Vissim | Hsiao-Chuan Chang et.al. | 2601.18284 | translate | read | null |
| 2026-01-26 | Reflecting Twice before Speaking with Empathy: Self-Reflective Alternating Inference for Empathy-Aware End-to-End Spoken Dialogue | Yuhang Jia et.al. | 2601.18281 | translate | read | null |
| 2026-01-26 | ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants | Pei Wang et.al. | 2601.18225 | translate | read | null |
| 2026-01-26 | Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents | Zhihan Liu et.al. | 2601.18217 | translate | read | null |
| 2026-01-26 | PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR | James Burgess et.al. | 2601.18207 | translate | read | null |
| 2026-01-26 | QualiRAG: Retrieval-Augmented Generation for Visual Quality Understanding | Linhan Cao et.al. | 2601.18195 | translate | read | null |
| 2026-01-26 | FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning | Zhaopeng Qiu et.al. | 2601.18150 | translate | read | null |
| 2026-01-26 | Exact Minimum-Volume Confidence Set Intersection for Multinomial Outcomes | Heguang Lin et.al. | 2601.18145 | translate | read | null |
| 2026-01-26 | Enhance the Safety in Reinforcement Learning by ADRC Lagrangian Methods | Mingxu Zhang et.al. | 2601.18142 | translate | read | null |
| 2026-01-26 | Beyond Static Datasets: Robust Offline Policy Optimization via Vetted Synthetic Transitions | Pedram Agand et.al. | 2601.18107 | translate | read | null |
| 2026-01-26 | Diffusion Model-based Reinforcement Learning for Version Age of Information Scheduling: Average and Tail-Risk-Sensitive Control | Haoyuan Pan et.al. | 2601.18069 | translate | read | null |
| 2026-01-23 | Autonomous Optical Alignment of Satellite-Based Entanglement Sources using Reinforcement Learning | Andrzej Gajewski et.al. | 2601.16968 | translate | read | null |
| 2026-01-23 | The Trajectory Alignment Coefficient in Two Acts: From Reward Tuning to Reward Learning | Calarina Muslimani et.al. | 2601.16906 | translate | read | null |
| 2026-01-23 | Boosting Deep Reinforcement Learning with Semantic Knowledge for Robotic Manipulators | Lucía Güitta-López et.al. | 2601.16866 | translate | read | null |
| 2026-01-23 | Reasoning Promotes Robustness in Theory of Mind Tasks | Ian B. de Haan et.al. | 2601.16853 | translate | read | null |
| 2026-01-23 | LongCat-Flash-Thinking-2601 Technical Report | Meituan LongCat Team et.al. | 2601.16725 | translate | read | null |
| 2026-01-23 | Adaptive Reinforcement and Model Predictive Control Switching for Safe Human-Robot Cooperative Navigation | Ning Liu et.al. | 2601.16686 | translate | read | null |
| 2026-01-23 | Sim-to-Real Transfer via a Style-Identified Cycle Consistent Generative Adversarial Network: Zero-Shot Deployment on Robotic Manipulators through Visual Domain Adaptation | Lucía Güitta-López et.al. | 2601.16677 | translate | read | null |
| 2026-01-23 | A Cognitive Framework for Autonomous Agents: Toward Human-Inspired Design | Francesco Guidi et.al. | 2601.16648 | translate | read | null |
| 2026-01-23 | Zero-Shot MARL Benchmark in the Cyber-Physical Mobility Lab | Julius Beerwerth et.al. | 2601.16578 | translate | read | null |
| 2026-01-23 | Spiking Neural Networks for Communication Systems: Encoding Schemes, Learning Algorithms, and Equalization Techniques | Eike-Manuel Edelmann et.al. | 2601.16550 | translate | read | null |
| 2026-01-23 | UAV-Assisted Joint Data Collection and Wireless Power Transfer for Batteryless Sensor Networks | Wen Zhang et.al. | 2601.16533 | translate | read | null |
| 2026-01-23 | Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic | Yichuan Ma et.al. | 2601.16486 | translate | read | null |
| 2026-01-23 | FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning | Haoxu Wang et.al. | 2601.16483 | translate | read | null |
| 2026-01-23 | Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go | Yichuan Ma et.al. | 2601.16447 | translate | read | null |
| 2026-01-23 | Endless Terminals: Scaling RL Environments for Terminal Agents | Kanishk Gandhi et.al. | 2601.16443 | translate | read | link |
| 2026-01-23 | Reinforcement Learning-Based Energy-Aware Coverage Path Planning for Precision Agriculture | Beining Wu et.al. | 2601.16405 | translate | read | null |
| 2026-01-23 | Towards a Theoretical Understanding to the Generalization of RLHF | Zhaochun Li et.al. | 2601.16403 | translate | read | null |
| 2026-01-23 | Clarify or Answer: Reinforcement Learning for Agentic VQA with Context Under-specification | Zongwan Cao et.al. | 2601.16400 | translate | read | null |
| 2026-01-23 | A Regularized Actor-Critic Algorithm for Bi-Level Reinforcement Learning | Sihan Zeng et.al. | 2601.16399 | translate | read | null |
| 2026-01-22 | LLM-in-Sandbox Elicits General Agentic Intelligence | Daixuan Cheng et.al. | 2601.16206 | translate | read | link |
| 2026-01-22 | Learning to Discover at Test Time | Mert Yuksekgonul et.al. | 2601.16175 | translate | read | link |
| 2026-01-22 | Structured Hints for Sample-Efficient Lean Theorem Proving | Zachary Burton et.al. | 2601.16172 | translate | read | null |
| 2026-01-22 | Efficiently Learning Robust Torque-based Locomotion Through Reinforcement with Model-Based Supervision | Yashuai Yan et.al. | 2601.16109 | translate | read | null |
| 2026-01-22 | SAMTok: Representing Any Mask with Two Words | Yikang Zhou et.al. | 2601.16093 | translate | read | link |
| 2026-01-22 | Dynamic Tactile Sensing System and Soft Actor Critic Reinforcement Learning for Inclusion Characterization | John Bannan et.al. | 2601.16061 | translate | read | null |
| 2026-01-22 | Keyframe-Based Feed-Forward Visual Odometry | Weichen Dai et.al. | 2601.16020 | translate | read | null |
| 2026-01-22 | PUMA: Perception-driven Unified Foothold Prior for Mobility Augmented Quadruped Parkour | Liang Wang et.al. | 2601.15995 | translate | read | null |
| 2026-01-22 | Decoupling Return-to-Go for Efficient Decision Transformer | Yongyi Wang et.al. | 2601.15953 | translate | read | null |
| 2026-01-22 | Off-Policy Actor-Critic with Sigmoid-Bounded Entropy for Real-World Robot Learning | Xiefeng Wu et.al. | 2601.15761 | translate | read | null |
| 2026-01-22 | PhysProver: Advancing Automatic Theorem Proving for Physics | Hanning Zhang et.al. | 2601.15737 | translate | read | null |
| 2026-01-22 | Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind | Zhitao He et.al. | 2601.15715 | translate | read | null |
| 2026-01-22 | D-Optimality-Guided Reinforcement Learning for Efficient Open-Loop Calibration of a 3-DOF Ankle Rehabilitation Robot | Qifan Hu et.al. | 2601.15707 | translate | read | null |
| 2026-01-22 | From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models | Jiaxin Zhang et.al. | 2601.15690 | translate | read | null |
| 2026-01-22 | Performance-guided Reinforced Active Learning for Object Detection | Zhixuan Liang et.al. | 2601.15688 | translate | read | null |
| 2026-01-22 | EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning | Dingdong Wang et.al. | 2601.15668 | translate | read | null |
| 2026-01-22 | Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors | Zhiwei Zhang et.al. | 2601.15625 | translate | read | null |
| 2026-01-22 | Explainable Deepfake Detection with RL Enhanced Self-Blended Images | Ning Jiang et.al. | 2601.15624 | translate | read | null |
| 2026-01-22 | AION: Aerial Indoor Object-Goal Navigation Using Dual-Policy Reinforcement Learning | Zichen Yan et.al. | 2601.15614 | translate | read | null |
| 2026-01-22 | When Sharpening Becomes Collapse: Sampling Bias and Semantic Coupling in RL with Verifiable Rewards | Mingyuan Fan et.al. | 2601.15609 | translate | read | null |
| 2026-01-22 | A Mobile Magnetic Manipulation Platform for Gastrointestinal Navigation with Deep Reinforcement Learning Control | Zhifan Yan et.al. | 2601.15545 | translate | read | null |
| 2026-01-21 | Non-Stationary Functional Bilevel Optimization | Jason Bohne et.al. | 2601.15363 | translate | read | null |
| 2026-01-21 | Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing | Xiang Li et.al. | 2601.15356 | translate | read | null |
| 2026-01-21 | Statistical Reinforcement Learning in the Real World: A Survey of Challenges and Future Directions | Asim H. Gazi et.al. | 2601.15353 | translate | read | null |
| 2026-01-20 | ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation | Zhebo Wang et.al. | 2601.15330 | translate | read | null |
| 2026-01-21 | The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models | Zanlin Ni et.al. | 2601.15165 | translate | read | link |
| 2026-01-21 | Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning | Yuval Kansal et.al. | 2601.15160 | translate | read | null |
| 2026-01-21 | Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data | Yuval Ran-Milo et.al. | 2601.15158 | translate | read | null |
| 2026-01-21 | CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning | Tianshi Xu et.al. | 2601.15141 | translate | read | null |
| 2026-01-21 | Vehicle Routing with Finite Time Horizon using Deep Reinforcement Learning with Improved Network Embedding | Ayan Maity et.al. | 2601.15131 | translate | read | null |
| 2026-01-21 | Memory Retention Is Not Enough to Master Memory Tasks in Reinforcement Learning | Oleg Shchendrigin et.al. | 2601.15086 | translate | read | null |
| 2026-01-21 | A Curriculum-Based Deep Reinforcement Learning Framework for the Electric Vehicle Routing Problem | Mertcan Daysalilar et.al. | 2601.15038 | translate | read | null |
| 2026-01-21 | Plug-and-Play Benchmarking of Reinforcement Learning Algorithms for Large-Scale Flow Control | Jannis Becktepe et.al. | 2601.15015 | translate | read | null |
| 2026-01-21 | Improving Regret Approximation for Unsupervised Dynamic Environment Generation | Harry Mead et.al. | 2601.14957 | translate | read | null |
| 2026-01-21 | Language-Coupled Reinforcement Learning for Multilingual Retrieval-Augmented Generation | Rui Qi et.al. | 2601.14896 | translate | read | null |
| 2026-01-21 | What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study | Keyu Lv et.al. | 2601.14888 | translate | read | null |
| 2026-01-21 | CI4A: Semantic Component Interfaces for Agents Empowering Web Automation | Zhi Qiu et.al. | 2601.14790 | translate | read | null |
| 2026-01-21 | ReinPath: A Multimodal Reinforcement Learning Approach for Pathology | Kangcheng Zhou et.al. | 2601.14757 | translate | read | null |
| 2026-01-21 | PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning | Yao Lu et.al. | 2601.14716 | translate | read | null |
| 2026-01-21 | DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs | Mingxuan Song et.al. | 2601.14711 | translate | read | null |
| 2026-01-21 | Case-Guided Sequential Assay Planning in Drug Discovery | Tianchi Chen et.al. | 2601.14710 | translate | read | null |
| 2026-01-21 | Proximal Policy Optimization with Evolutionary Mutations | Casimir Czworkowski et.al. | 2601.14705 | translate | read | null |
| 2026-01-21 | DARL: Encouraging Diverse Answers for General Reasoning without Verifiers | Chongxuan Huang et.al. | 2601.14700 | translate | read | null |
| 2026-01-21 | CoScale-RL: Efficient Post-Training by Co-Scaling Data and Computation | Yutong Chen et.al. | 2601.14695 | translate | read | null |
| 2026-01-21 | Beyond Error-Based Optimization: Experience-Driven Symbolic Regression with Goal-Conditioned Reinforcement Learning | Jianwen Sun et.al. | 2601.14693 | translate | read | null |
| 2026-01-21 | FARE: Fast-Slow Agentic Robotic Exploration | Shuhao Liao et.al. | 2601.14681 | translate | read | null |
| 2026-01-21 | MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks | Zixuan Ke et.al. | 2601.14652 | translate | read | null |
| 2026-01-21 | SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation | Xichen Zhang et.al. | 2601.14615 | translate | read | null |
| 2026-01-21 | Learning Consistent Taxonomic Classification through Hierarchical Reasoning | Zhenghong Li et.al. | 2601.14610 | translate | read | null |
| 2026-01-21 | Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education | Unggi Lee et.al. | 2601.14560 | translate | read | null |
| 2026-01-20 | Report for NSF Workshop on AI for Electronic Design Automation | Deming Chen et.al. | 2601.14541 | translate | read | null |
| 2026-01-20 | Towards Execution-Grounded Automated AI Research | Chenglei Si et.al. | 2601.14525 | translate | read | link |
| 2026-01-20 | Large Language Model-Powered Evolutionary Code Optimization on a Phylogenetic Tree | Leyi Zhao et.al. | 2601.14523 | translate | read | null |
| 2026-01-20 | Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow | Haocheng Xi et.al. | 2601.14243 | translate | read | null |
| 2026-01-20 | Spatiotemporal Wildfire Prediction and Reinforcement Learning for Helitack Suppression | Shaurya Mathur et.al. | 2601.14238 | translate | read | null |
| 2026-01-20 | Q-learning with Adjoint Matching | Qiyang Li et.al. | 2601.14234 | translate | read | link |
| 2026-01-20 | KAGE-Bench: Fast Known-Axis Visual Generalization Evaluation for Reinforcement Learning | Egor Cherepanov et.al. | 2601.14232 | translate | read | link |
| 2026-01-20 | Attention-Based Offline Reinforcement Learning and Clustering for Interpretable Sepsis Treatment | Punit Kumar et.al. | 2601.14228 | translate | read | null |
| 2026-01-20 | InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning | Matthew Y. R. Yang et.al. | 2601.14209 | translate | read | null |
| 2026-01-20 | Differentiated Pickup Point Offering for Emission Reduction in Last-Mile Delivery | Albina Galiullina et.al. | 2601.14196 | translate | read | null |
| 2026-01-20 | Toward Efficient Agents: Memory, Tool learning, and Planning | Xiaofang Yang et.al. | 2601.14192 | translate | read | link |
| 2026-01-20 | CREATE: Cross-Layer Resilience Characterization and Optimization for Efficient yet Reliable Embodied AI Systems | Tong Xie et.al. | 2601.14140 | translate | read | null |
| 2026-01-20 | Diffusion-Guided Backdoor Attacks in Real-World Reinforcement Learning | Tairan Huang et.al. | 2601.14104 | translate | read | null |
| 2026-01-20 | Optimizing Energy and Data Collection in UAV-aided IoT Networks using Attention-based Multi-Objective Reinforcement Learning | Babacar Toure et.al. | 2601.14092 | translate | read | null |
| 2026-01-20 | RM-Distiller: Exploiting Generative LLM for Reward Model Distillation | Hongli Zhou et.al. | 2601.14032 | translate | read | null |
| 2026-01-20 | RL-BioAug: Label-Efficient Reinforcement Learning for Self-Supervised EEG Representation Learning | Cheol-Hui Lee et.al. | 2601.13964 | translate | read | null |
| 2026-01-20 | Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning | Hongbo Bai et.al. | 2601.13942 | translate | read | null |
| 2026-01-20 | Deep Reinforcement Learning-Based Dynamic Resource Allocation in Cell-Free Massive MIMO | Phuong Nam Tran et.al. | 2601.13934 | translate | read | null |
| 2026-01-20 | HyperWalker: Dynamic Hypergraph-Based Deep Diagnosis for Multi-Hop Clinical Modeling across EHR and X-Ray in Medical VLMs | Yuezhe Yang et.al. | 2601.13919 | translate | read | null |
| 2026-01-20 | TractRLFusion: A GPT-Based Multi-Critic Policy Fusion Framework for Fiber Tractography | Ankita Joshi et.al. | 2601.13897 | translate | read | null |
| 2026-01-20 | Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering | Chak Tou Leong et.al. | 2601.13752 | translate | read | null |
| 2026-01-20 | Dr. Assistant: Enhancing Clinical Diagnostic Inquiry via Structured Diagnostic Reasoning Data and Reinforcement Learning | Yue Guo et.al. | 2601.13690 | translate | read | null |
| 2026-01-20 | Reinforcement Learning for Opportunistic Routing in Software-Defined LEO-Terrestrial Systems | Sivaram Krishnan et.al. | 2601.13662 | translate | read | null |
| 2026-01-20 | Communication-Free Collective Navigation for a Swarm of UAVs via LiDAR-Based Deep Reinforcement Learning | Myong-Yol Choi et.al. | 2601.13657 | translate | read | null |
| 2026-01-20 | Sample Complexity of Average-Reward Q-Learning: From Single-agent to Federated Reinforcement Learning | Yuchen Jiao et.al. | 2601.13642 | translate | read | null |
| 2026-01-20 | A Kubernetes custom scheduler based on reinforcement learning for compute-intensive pods | Hanlin Zhou et.al. | 2601.13579 | translate | read | null |
| 2026-01-20 | Behavior Knowledge Merge in Reinforced Agentic Models | Xiangchi Yuan et.al. | 2601.13572 | translate | read | link |
| 2026-01-20 | Reasoning While Recommending: Entropy-Guided Latent Reasoning in Generative Re-ranking Models | Changshuo Zhang et.al. | 2601.13533 | translate | read | null |
| 2026-01-20 | Group Relative Policy Optimization for Robust Blind Interference Alignment with Fluid Antennas | Jianqiu Peng et.al. | 2601.13506 | translate | read | null |
| 2026-01-19 | RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models | Bo Ren et.al. | 2601.13409 | translate | read | null |
| 2026-01-19 | Balancing Classification and Calibration Performance in Decision-Making LLMs via Calibration Aware Reinforcement Learning | Duygu Nur Yaldiz et.al. | 2601.13284 | translate | read | null |
| 2026-01-19 | CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning | Eric Onyame et.al. | 2601.13262 | translate | read | link |
| 2026-01-19 | Autonomous Navigation at the Nano-Scale: Algorithms, Architectures, and Constraints | Mahmud S. Zango et.al. | 2601.13252 | translate | read | null |
| 2026-01-19 | Training instability in deep learning follows low-dimensional dynamical principles | Zhipeng Zhang et.al. | 2601.13160 | translate | read | null |
| 2026-01-19 | Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning | Fengran Mo et.al. | 2601.13115 | translate | read | null |
| 2026-01-19 | Static Is Not Enough: A Comparative Study of VR and SpaceMouse in Static and Dynamic Teleoperation Tasks | Yijun Zhou et.al. | 2601.13042 | translate | read | null |
| 2026-01-19 | Feedforward-Feedback Integration in Flight Control: Reinforcement Learning with Sliding Mode Control | Imran Sayyed et.al. | 2601.13037 | translate | read | null |
| 2026-01-19 | Think3D: Thinking with Space for Spatial Reasoning | Zaibin Zhang et.al. | 2601.13029 | translate | read | link |
| 2026-01-19 | Graph Reasoning Paradigm: Structured and Symbolic Reasoning with Topology-Aware Reinforcement Learning for Large Language Models | Runxuan Liu et.al. | 2601.12995 | translate | read | null |
| 2026-01-19 | PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient | Zijian Wang et.al. | 2601.12988 | translate | read | null |
| 2026-01-19 | Imitation learning-based spacecraft rendezvous and docking method with Expert Demonstration | Shibo Shao et.al. | 2601.12952 | translate | read | null |
| 2026-01-19 | Communication Methods in Multi-Agent Reinforcement Learning | Christoph Wittner et.al. | 2601.12886 | translate | read | null |
| 2026-01-19 | FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions | Peng Li et.al. | 2601.12799 | translate | read | link |
| 2026-01-19 | Unleashing Efficient Asynchronous RL Post-Training via Staleness-Constrained Rollout Coordination | Haoyang Li et.al. | 2601.12784 | translate | read | null |
| 2026-01-19 | SDN-Blockchain Based Security Routing for UAV Communication via Reinforcement Learning | Yulu Han et.al. | 2601.12774 | translate | read | null |
| 2026-01-19 | Teaching LLMs to Learn Tool Trialing and Execution through Environment Interaction | Xingjie Gao et.al. | 2601.12762 | translate | read | link |
| 2026-01-19 | Distribution-Centric Policy Optimization Dominates Exploration-Exploitation Trade-off | Zhaochun Li et.al. | 2601.12730 | translate | read | link |
| 2026-01-19 | Teaching Large Reasoning Models Effective Reflection | Hanbin Wang et.al. | 2601.12720 | translate | read | null |
| 2026-01-19 | Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization | Junyi Liao et.al. | 2601.12707 | translate | read | null |
| 2026-01-19 | Resource-Conscious RL Algorithms for Deep Brain Stimulation | Arkaprava Gupta et.al. | 2601.12699 | translate | read | null |
| 2026-01-19 | Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks | Xingran Chen et.al. | 2601.12662 | translate | read | null |
| 2026-01-19 | Two-Layer Reinforcement Learning-Assisted Joint Beamforming and Trajectory Optimization for Multi-UAV Downlink Communications | Ruiqi Wang et.al. | 2601.12659 | translate | read | null |
| 2026-01-19 | Multiagent Reinforcement Learning in Enhancing Resilience of Microgrids under Extreme Weather Events | Yin Wu et.al. | 2601.12657 | translate | read | null |
| 2026-01-19 | STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models | Xiangyu Shi et.al. | 2601.12641 | translate | read | null |
| 2026-01-16 | Do explanations generalize across large reasoning models? | Koyena Pal et.al. | 2601.11517 | translate | read | null |
| 2026-01-16 | Generative Scenario Rollouts for End-to-End Autonomous Driving | Rajeev Yasarla et.al. | 2601.11475 | translate | read | null |
| 2026-01-16 | The Great March 100: 100 Detail-oriented Tasks for Evaluating Embodied AI Agents | Ziyu Wang et.al. | 2601.11421 | translate | read | null |
| 2026-01-16 | Factored Value Functions for Graph-Based Multi-Agent Reinforcement Learning | Ahmed Rashwan et.al. | 2601.11401 | translate | read | null |
| 2026-01-16 | The Mini Wheelbot Dataset: High-Fidelity Data for Robot Learning | Henrik Hose et.al. | 2601.11394 | translate | read | null |
| 2026-01-16 | Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency | Akhilesh Raj et.al. | 2601.11352 | translate | read | null |
| 2026-01-16 | Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation | Pingzhi Tang et.al. | 2601.11258 | translate | read | null |
| 2026-01-16 | Model-free policy gradient for discrete-time mean-field control | Matthieu Meunier et.al. | 2601.11217 | translate | read | null |
| 2026-01-16 | Policy-Based Deep Reinforcement Learning Hyperheuristics for Job-Shop Scheduling Problems | Sofiene Lassoued et.al. | 2601.11189 | translate | read | null |
| 2026-01-16 | TANDEM: Temporal-Aware Neural Detection for Multimodal Hate Speech | Girish A. Koushik et.al. | 2601.11178 | translate | read | null |
| 2026-01-16 | Deep GraphRAG: A Balanced Approach to Hierarchical Retrieval and Adaptive Integration | Yuejie Li et.al. | 2601.11144 | translate | read | null |
| 2026-01-16 | Learning Quadrupedal Locomotion for a Heavy Hydraulic Robot Using an Actuator Model | Minho Lee et.al. | 2601.11143 | translate | read | null |
| 2026-01-16 | PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models | Qiyuan Zhang et.al. | 2601.11087 | translate | read | null |
| 2026-01-16 | Visual Marker Search for Autonomous Drone Landing in Diverse Urban Environments | Jiaohong Yao et.al. | 2601.11078 | translate | read | null |
| 2026-01-16 | Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs | Lecheng Yan et.al. | 2601.11061 | translate | read | null |
| 2026-01-16 | BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search | Shiyu Liu et.al. | 2601.11037 | translate | read | link |
| 2026-01-16 | Toward Adaptive Grid Resilience: A Gradient-Free Meta-RL Framework for Critical Load Restoration | Zain ul Abdeen et.al. | 2601.10973 | translate | read | null |
| 2026-01-16 | MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement | Meidan Ding et.al. | 2601.10949 | translate | read | null |
| 2026-01-16 | Where to Touch, How to Contact: Hierarchical RL-MPC Framework for Geometry-Aware Long-Horizon Dexterous Manipulation | Zhixian Xie et.al. | 2601.10930 | translate | read | null |
| 2026-01-15 | Realistic Curriculum Reinforcement Learning for Autonomous and Sustainable Marine Vessel Navigation | Zhang Xiaocai et.al. | 2601.10911 | translate | read | null |
| 2026-01-15 | Action Shapley: A Training Data Selection Metric for World Model in Reinforcement Learning | Rajat Ghosh et.al. | 2601.10905 | translate | read | null |
| 2026-01-15 | Reasoning Models Generate Societies of Thought | Junsol Kim et.al. | 2601.10825 | translate | read | null |
| 2026-01-11 | Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration | Sen Wang et.al. | 2601.10744 | translate | read | null |
| 2026-01-15 | MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching | Changle Qu et.al. | 2601.10712 | translate | read | null |
| 2026-01-15 | Institutional AI: A Governance Framework for Distributional AGI Safety | Federico Pierucci et.al. | 2601.10599 | translate | read | null |
| 2026-01-15 | Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay | Hao Wang et.al. | 2601.10589 | translate | read | null |
| 2026-01-15 | Combinatorial Optimization Augmented Machine Learning | Maximilian Schiffer et.al. | 2601.10583 | translate | read | null |
| 2026-01-15 | PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models | Chengbing Wang et.al. | 2601.10532 | translate | read | null |
| 2026-01-15 | Projected Microbatch Accumulation yields reference-free proximal policy updates for reinforcement learning | Nilin Abrahamsen et.al. | 2601.10498 | translate | read | null |
| 2026-01-15 | Urban Socio-Semantic Segmentation with Vision-Language Reasoning | Yu Wang et.al. | 2601.10477 | translate | read | null |
| 2026-01-15 | Reinforcement Learning with Multi-Step Lookahead Information Via Adaptive Batching | Nadav Merlis et.al. | 2601.10418 | translate | read | null |
| 2026-01-15 | CS-GBA: A Critical Sample-based Gradient-guided Backdoor Attack for Offline Reinforcement Learning | Yuanjie Zhao et.al. | 2601.10407 | translate | read | null |
| 2026-01-15 | Advanced Manufacturing with Renewable and Bio-based Materials: AI/ML workflows and Process Optimization | Rigoberto Advincula et.al. | 2601.10382 | translate | read | null |
| 2026-01-15 | FastStair: Learning to Run Up Stairs with Humanoid Robots | Yan Liu et.al. | 2601.10365 | translate | read | null |
| 2026-01-15 | SuS: Strategy-aware Surprise for Intrinsic Exploration | Mark Kashirskiy et.al. | 2601.10349 | translate | read | null |
| 2026-01-15 | Boundary-Aware NL2SQL: Integrating Reliability through Hybrid Reward and Data Synthesis | Songsong Tian et.al. | 2601.10318 | translate | read | null |
| 2026-01-15 | Evidence-Augmented Policy Optimization with Reward Co-Evolution for Long-Context Reasoning | Xin Guan et.al. | 2601.10306 | translate | read | null |
| 2026-01-15 | The impact of tactile sensor configurations on grasp learning efficiency – a comparative evaluation in simulation | Eszter Birtalan et.al. | 2601.10268 | translate | read | null |
| 2026-01-15 | PRL: Process Reward Learning Improves LLMs’ Reasoning Ability and Broadens the Reasoning Boundary | Jiarui Yao et.al. | 2601.10201 | translate | read | null |
| 2026-01-15 | HOMURA: Taming the Sand-Glass for Time-Constrained LLM Translation via Reinforcement Learning | Ziang Cui et.al. | 2601.10187 | translate | read | null |
| 2026-01-15 | Reinforcement Learning to Discover a NorthEast Monsoon Index for Monthly Rainfall Prediction in Thailand | Kiattikun Chobtham et.al. | 2601.10181 | translate | read | null |
| 2026-01-15 | Service Provisioning and Path Planning with Obstacle Avoidance for Low-Altitude Wireless Networks | Senning Wan et.al. | 2601.10179 | translate | read | null |
| 2026-01-15 | ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback | Yutao Mou et.al. | 2601.10156 | translate | read | null |
| 2026-01-15 | DecisionLLM: Large Language Models for Long Sequence Decision Exploration | Xiaowei Lv et.al. | 2601.10148 | translate | read | null |
| 2026-01-15 | History Is Not Enough: An Adaptive Dataflow System for Financial Time-Series Synthesis | Haochong Xia et.al. | 2601.10143 | translate | read | null |
| 2026-01-15 | Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts | Sijia Luo et.al. | 2601.10079 | translate | read | null |
| 2026-01-15 | Event-Driven Deep RL Dispatcher for Post-Storm Distribution System Restoration | Farshad Amani et.al. | 2601.10044 | translate | read | null |
| 2026-01-15 | PaperScout: An Autonomous Agent for Academic Paper Search with Process-Aware Sequence-Level Policy Optimization | Tingyue Pan et.al. | 2601.10029 | translate | read | null |
| 2026-01-15 | Towards Native Intelligence: 6G-LLM Trained with Reinforcement Learning from NDT Feedback | Zhuoran Xiao et.al. | 2601.09992 | translate | read | null |
| 2026-01-14 | OUTLINEFORGE: Hierarchical Reinforcement Learning with Explicit States for Scientific Writing | Yilin Bao et.al. | 2601.09858 | translate | read | null |
| 2026-01-14 | Eluder dimension: localise it! | Alireza Bakhtiari et.al. | 2601.09825 | translate | read | null |
| 2026-01-14 | GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents | Chen Chen et.al. | 2601.09770 | translate | read | null |
| 2026-01-14 | STEP3-VL-10B Technical Report | Ailin Huang et.al. | 2601.09668 | translate | read | null |
| 2026-01-14 | Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning | Zhiyuan Hu et.al. | 2601.09667 | translate | read | null |
| 2026-01-14 | DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing | Qian Cao et.al. | 2601.09609 | translate | read | null |
| 2026-01-14 | Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets | Jeremiah Coholich et.al. | 2601.09605 | translate | read | null |
| 2026-01-14 | Dialogue Telemetry: Turn-Level Instrumentation for Autonomous Information Gathering | Dimitris Panagopoulos et.al. | 2601.09570 | translate | read | null |
| 2026-01-14 | Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations | Wei-Jin Huang et.al. | 2601.09518 | translate | read | null |
| 2026-01-14 | Data Scaling for Navigation in Unknown Environments | Lauri Suomela et.al. | 2601.09444 | translate | read | null |
| 2026-01-14 | Draw it like Euclid: Teaching transformer models to generate CAD profiles using ruler and compass construction steps | Siyi Li et.al. | 2601.09428 | translate | read | null |
| 2026-01-14 | Semi-Contention-Free Access in IoT NOMA Networks: A Reinforcement Learning Framework | Abhishek Kumar et.al. | 2601.09422 | translate | read | null |
| 2026-01-14 | GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR | Jiaying Zhang et.al. | 2601.09361 | translate | read | null |
| 2026-01-14 | Monte-Carlo Tree Search with Neural Network Guidance for Lane-Free Autonomous Driving | Ioannis Peridis et.al. | 2601.09353 | translate | read | null |
| 2026-01-14 | Policy-Based Reinforcement Learning with Action Masking for Dynamic Job Shop Scheduling under Uncertainty: Handling Random Arrivals and Machine Failures | Sofiene Lassoued et.al. | 2601.09293 | translate | read | null |
| 2026-01-14 | Enhancing Spatial Reasoning in Large Language Models for Metal-Organic Frameworks Structure Prediction | Mianzhi Pan et.al. | 2601.09285 | translate | read | null |
| 2026-01-14 | RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering | Wencheng Ye et.al. | 2601.09269 | translate | read | link |
| 2026-01-14 | Learning to Trust Experience: A Monitor-Trust-Regulator Framework for Learning under Unobservable Feedback Reliability | Zhipeng Zhang et.al. | 2601.09261 | translate | read | null |
| 2026-01-14 | Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models | Yan Liu et.al. | 2601.09260 | translate | read | null |
| 2026-01-14 | Reward Learning through Ranking Mean Squared Error | Chaitanya Kharyal et.al. | 2601.09236 | translate | read | null |
| 2026-01-14 | GIFT: Unlocking Global Optimality in Post-Training via Finite-Temperature Gibbs Initialization | Zhengyang Zhao et.al. | 2601.09233 | translate | read | null |
| 2026-01-14 | UserLM-R1: Modeling Human Reasoning in User Language Models with Multi-Reward Reinforcement Learning | Feng Zhang et.al. | 2601.09215 | translate | read | null |
| 2026-01-14 | SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL | Lijun Liu et.al. | 2601.09136 | translate | read | null |
| 2026-01-14 | SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache | Chi-Chih Chang et.al. | 2601.09083 | translate | read | null |
| 2026-01-13 | TranslateGemma Technical Report | Mara Finkelstein et.al. | 2601.09012 | translate | read | null |
| 2026-01-13 | Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge | Yao Tang et.al. | 2601.08808 | translate | read | null |
| 2026-01-13 | Identifying Latent Intentions via Inverse Reinforcement Learning in Repeated Linear Public Good Games | Carina I. Hausladen et.al. | 2601.08803 | translate | read | null |
| 2026-01-13 | Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs | Zhiyuan Hu et.al. | 2601.08763 | translate | read | null |
| 2026-01-13 | TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback | Prithwish Jana et.al. | 2601.08734 | translate | read | null |
| 2026-01-13 | Learning from Demonstrations via Capability-Aware Goal Sampling | Yuanlin Duan et.al. | 2601.08731 | translate | read | null |
| 2026-01-13 | Model-Agnostic Solutions for Deep Reinforcement Learning in Non-Ergodic Contexts | Bert Verbruggen et.al. | 2601.08726 | translate | read | null |
| 2026-01-13 | QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models | Zhaolu Kang et.al. | 2601.08689 | translate | read | null |
| 2026-01-13 | PersonaDual: Balancing Personalization and Objectivity via Adaptive Reasoning | Xiaoyou Liu et.al. | 2601.08679 | translate | read | null |
| 2026-01-13 | VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory | Shaoan Wang et.al. | 2601.08665 | translate | read | null |
| 2026-01-13 | From Classical to Quantum Reinforcement Learning and Its Applications in Quantum Control: A Beginner’s Tutorial | Abhijit Sen et.al. | 2601.08662 | translate | read | null |
| 2026-01-13 | Provably Safe Reinforcement Learning for Stochastic Reach-Avoid Problems with Entropy Regularization | Abhijit Mazumdar et.al. | 2601.08646 | translate | read | null |
| 2026-01-13 | Your Group-Relative Advantage Is Biased | Fengkai Yang et.al. | 2601.08521 | translate | read | null |
| 2026-01-13 | AUV Trajectory Learning for Underwater Acoustic Energy Transfer and Age Minimization | Mohamed Afouene Melki et.al. | 2601.08491 | translate | read | null |
| 2026-01-13 | AME-2: Agile and Generalized Legged Locomotion via Attention-Based Neural Map Encoding | Chong Zhang et.al. | 2601.08485 | translate | read | null |
| 2026-01-13 | Baiting AI: Deceptive Adversary Against AI-Protected Industrial Infrastructures | Aryan Pasikhani et.al. | 2601.08481 | translate | read | null |
| 2026-01-13 | JudgeRLVR: Judge First, Generate Second for Efficient Reasoning | Jiangshan Duo et.al. | 2601.08468 | translate | read | null |
| 2026-01-13 | Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis | Yi Qin et.al. | 2601.08440 | translate | read | null |
| 2026-01-13 | Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management | Weitao Ma et.al. | 2601.08435 | translate | read | null |
| 2026-01-13 | Large Multimodal Models for Embodied Intelligent Driving: The Next Frontier in Self-Driving? | Long Zhang et.al. | 2601.08434 | translate | read | null |
| 2026-01-13 | RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation | Sunzhu Li et.al. | 2601.08430 | translate | read | null |
| 2026-01-13 | Silence the Judge: Reinforcement Learning with Self-Verifier via Latent Geometric Clustering | Nonghai Zhang et.al. | 2601.08427 | translate | read | null |
| 2026-01-13 | Owen-Shapley Policy Optimization (OSPO): A Principled RL Algorithm for Generative Search LLMs | Abhijnan Nath et.al. | 2601.08403 | translate | read | null |
| 2026-01-13 | Safe Heterogeneous Multi-Agent RL with Communication Regularization for Coordinated Target Acquisition | Gabriele Calzolari et.al. | 2601.08327 | translate | read | null |
| 2026-01-13 | AtomMem : Learnable Dynamic Agentic Memory with Atomic Memory Operation | Yupeng Huo et.al. | 2601.08323 | translate | read | null |
| 2026-01-13 | ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning | Kun Liang et.al. | 2601.08310 | translate | read | null |
| 2026-01-13 | D$^2$Plan: Dual-Agent Dynamic Global Planning for Complex Retrieval-Augmented Reasoning | Kangcheng Luo et.al. | 2601.08282 | translate | read | null |
| 2026-01-13 | Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees | Kun Li et.al. | 2601.08274 | translate | read | null |
| 2026-01-13 | Unleashing Tool Engineering and Intelligence for Agentic AI in Next-Generation Communication Networks | Yinqiu Liu et.al. | 2601.08259 | translate | read | null |
| 2026-01-13 | Large Artificial Intelligence Model Guided Deep Reinforcement Learning for Resource Allocation in Non-Terrestrial Networks | Abdikarim Mohamed Ibrahim et.al. | 2601.08254 | translate | read | null |
| 2026-01-13 | Incorporating Cognitive Biases into Reinforcement Learning for Financial Decision-Making | Liu He et.al. | 2601.08247 | translate | read | null |
| 2026-01-13 | The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination | Haoran Su et.al. | 2601.08237 | translate | read | null |
| 2026-01-13 | Scalable Multiagent Reinforcement Learning with Collective Influence Estimation | Zhenglong Luo et.al. | 2601.08210 | translate | read | null |
| 2026-01-13 | ZeroDVFS: Zero-Shot LLM-Guided Core and Frequency Allocation for Embedded Platforms | Mohammad Pivezhandi et.al. | 2601.08166 | translate | read | null |
| 2026-01-13 | Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies | Zeyang Li et.al. | 2601.08136 | translate | read | null |
| 2026-01-13 | Structure Detection for Contextual Reinforcement Learning | Tianyue Zhou et.al. | 2601.08120 | translate | read | null |
| 2026-01-13 | STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order | Chengyang Gu et.al. | 2601.08107 | translate | read | null |
| 2026-01-12 | DRL-based Power Allocation in LiDAL-Assisted RLNC-NOMA OWC Systems | Ahmed A. Hassan et.al. | 2601.08060 | translate | read | null |
| 2026-01-12 | Forecast Aware Deep Reinforcement Learning for Efficient Electricity Load Scheduling in Dairy Farms | Nawazish Alia et.al. | 2601.08052 | translate | read | null |
| 2026-01-12 | Formalizing the Relationship between Hamilton-Jacobi Reachability and Reinforcement Learning | Prashant Solanki et.al. | 2601.08050 | translate | read | null |
| 2026-01-12 | FigEx2: Visual-Conditioned Panel Detection and Captioning for Scientific Compound Figures | Jifeng Song et.al. | 2601.08026 | translate | read | null |
| 2026-01-12 | Learning Better Error Correction Codes with Hybrid Quantum-Assisted Machine Learning | Yariv Yanay et.al. | 2601.08014 | translate | read | null |
| 2026-01-12 | Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety | Can Jin et.al. | 2601.08000 | translate | read | null |
| 2026-01-12 | Reinforcement Learning Methods for Neighborhood Selection in Local Search | Yannick Molinghen et.al. | 2601.07948 | translate | read | null |
| 2026-01-12 | Video Generation Models in Robotics – Applications, Research Challenges, Future Directions | Zhiting Mei et.al. | 2601.07823 | translate | read | null |
| 2026-01-12 | Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation | Huanyu Li et.al. | 2601.07821 | translate | read | null |
| 2026-01-12 | Data-driven control of hydraulic impact hammers under strict operational and control constraints | Francisco Leiva et.al. | 2601.07813 | translate | read | null |
| 2026-01-12 | Beyond Single-Shot: Multi-step Tool Retrieval via Query Planning | Wei Fang et.al. | 2601.07782 | translate | read | null |
| 2026-01-12 | Video Evidence to Reasoning: Efficient Video Understanding via Explicit Evidence Grounding | Yanxiang Huang et.al. | 2601.07761 | translate | read | null |
| 2026-01-12 | Hiking in the Wild: A Scalable Perceptive Parkour Framework for Humanoids | Shaoting Zhu et.al. | 2601.07718 | translate | read | null |
| 2026-01-12 | Smooth Operator: Smooth Verifiable Reward Activates Spatial Reasoning Ability of Vision-Language Model | Siwen Jiao et.al. | 2601.07695 | translate | read | null |
| 2026-01-12 | Reinforcement Learning for Micro-Level Claims Reserving | Benjamin Avanzi et.al. | 2601.07637 | translate | read | null |
| 2026-01-12 | Clipped Affine Policy: Low-Complexity Near-Optimal Online Power Control for Energy Harvesting Communications over Fading Channels | Hao Wu et.al. | 2601.07622 | translate | read | null |
| 2026-01-12 | GRPO with State Mutations: Improving LLM-Based Hardware Test Plan Generation | Dimple Vijay Kochar et.al. | 2601.07593 | translate | read | null |
| 2026-01-12 | Large Language Models for Physics Instrument Design | Sara Zoccheddu et.al. | 2601.07580 | translate | read | null |
| 2026-01-12 | Stagewise Reinforcement Learning and the Geometry of the Regret Landscape | Chris Elliott et.al. | 2601.07524 | translate | read | null |
| 2026-01-12 | Controlling Multimodal Conversational Agents with Coverage-Enhanced Latent Actions | Yongqi Li et.al. | 2601.07516 | translate | read | null |
| 2026-01-12 | Graph Inference Towards ICD Coding | Xiaoxiao Deng et.al. | 2601.07496 | translate | read | null |
| 2026-01-12 | Online Markov Decision Processes with Terminal Law Constraints | Bianca Marin Moreno et.al. | 2601.07492 | translate | read | null |
| 2026-01-12 | Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning | Sijia Li et.al. | 2601.07463 | translate | read | null |
| 2026-01-12 | LOONG: Online Time-Optimal Autonomous Flight for MAVs in Cluttered Environments | Xin Guan et.al. | 2601.07434 | translate | read | null |
| 2026-01-12 | Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning | Ziheng Li et.al. | 2601.07408 | translate | read | null |
| 2026-01-12 | On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training | Xueyan Niu et.al. | 2601.07389 | translate | read | null |
| 2026-01-12 | OpenTinker: Separating Concerns in Agentic Reinforcement Learning | Siqi Zhu et.al. | 2601.07376 | translate | read | link |
| 2026-01-12 | Reward Modeling from Natural Language Human Feedback | Zongqi Wang et.al. | 2601.07349 | translate | read | null |
| 2026-01-12 | Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training | Xue Gong et.al. | 2601.07320 | translate | read | null |
| 2026-01-12 | Low-Altitude Satellite-AAV Collaborative Joint Mobile Edge Computing and Data Collection via Diffusion-based Deep Reinforcement Learning | Boxiong Wang et.al. | 2601.07307 | translate | read | null |
| 2026-01-12 | Heterogeneous Multi-Expert Reinforcement Learning for Long-Horizon Multi-Goal Tasks in Autonomous Forklifts | Yun Chen et.al. | 2601.07304 | translate | read | null |
| 2026-01-12 | Mimic Human Cognition, Master Multi-Image Reasoning: A Meta-Action Framework for Enhanced Visual Understanding | Jianghao Yin et.al. | 2601.07298 | translate | read | null |
| 2026-01-12 | LRAS: Advanced Legal Reasoning with Agentic Search | Yujin Zhou et.al. | 2601.07296 | translate | read | null |
| 2026-01-12 | ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios | Changzai Pan et.al. | 2601.07280 | translate | read | null |
| 2026-01-12 | The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents | Weihao Xuan et.al. | 2601.07264 | translate | read | null |
| 2026-01-12 | Group Pattern Selection Optimization: Let LRMs Pick the Right Pattern for Reasoning | Hanbin Wang et.al. | 2601.07238 | translate | read | null |
| 2026-01-12 | Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration | Yang Zhao et.al. | 2601.07224 | translate | read | null |
| 2026-01-12 | Structured Reasoning for Large Language Models | Jinyi Han et.al. | 2601.07180 | translate | read | null |
| 2026-01-12 | Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization | Min Wang et.al. | 2601.07164 | translate | read | null |
| 2026-01-12 | AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units | Xinzi Cao et.al. | 2601.07160 | translate | read | null |
| 2026-01-12 | Agents of Diffusion: Enhancing Diffusion Language Models with Multi-Agent Reinforcement Learning for Structured Data Generation (Extended Version) | Aja Khanal et.al. | 2601.07152 | translate | read | null |
| 2026-01-12 | Rewarding Creativity: A Human-Aligned Generative Reward Model for Reinforcement Learning in Storytelling | Zhaoyan Li et.al. | 2601.07149 | translate | read | null |
| 2026-01-12 | Generating readily synthesizable small molecule fluorophore scaffolds with reinforcement learning | Ruhi Sayana et.al. | 2601.07145 | translate | read | null |
| 2026-01-12 | Dynamics of Multi-Agent Actor-Critic Learning in Stochastic Games: from Multistability and Chaos to Stable Cooperation | Yuxin Geng et.al. | 2601.07142 | translate | read | null |
| 2026-01-12 | ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System | Sungguk Cha et.al. | 2601.07125 | translate | read | null |
| 2026-01-12 | ENTRA: Entropy-Based Redundancy Avoidance in Large Language Model Reasoning | Ruichu Cai et.al. | 2601.07123 | translate | read | null |
| 2026-01-12 | Enhancing Cloud Network Resilience via a Robust LLM-Empowered Multi-Agent Reinforcement Learning Framework | Yixiao Peng et.al. | 2601.07122 | translate | read | null |
| 2026-01-12 | Reward-Preserving Attacks For Robust Reinforcement Learning | Lucas Schott et.al. | 2601.07118 | translate | read | null |
| 2026-01-12 | MEDVISTAGYM: A Scalable Training Environment for Thinking with Medical Images via Tool-Integrated Reinforcement Learning | Meng Lu et.al. | 2601.07107 | translate | read | null |
| 2026-01-11 | X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests | Jie Wu et.al. | 2601.06953 | translate | read | link |
| 2026-01-11 | TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG | Tianhua Zhang et.al. | 2601.06922 | translate | read | null |
| 2026-01-11 | Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models | Shaoning Sun et.al. | 2601.06911 | translate | read | null |
| 2026-01-11 | Personality-Aware Reinforcement Learning for Persuasive Dialogue with LLM-Driven Simulation | Donghuo Zeng et.al. | 2601.06877 | translate | read | null |
| 2026-01-11 | A Brain-like Synergistic Core in LLMs Drives Behaviour and Learning | Pedro Urbina-Rodriguez et.al. | 2601.06851 | translate | read | null |
| 2026-01-11 | Code Evolution for Control: Synthesizing Policies via LLM-Driven Evolutionary Search | Ping Guo et.al. | 2601.06845 | translate | read | null |
| 2026-01-11 | Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy | Shujian Gao et.al. | 2601.06801 | translate | read | null |
| 2026-01-11 | Artificial Intelligence Driven Channel Coding and Resource Optimization for Wireless Networks | Yasir Ali et.al. | 2601.06796 | translate | read | null |
| 2026-01-11 | GDEPO: Group Dual-dynamic and Equal-right-advantage Policy Optimization with Enhanced Training Data Utilization for Sample-Constrained Reinforcement Learning | Zhengqing Yan et.al. | 2601.06795 | translate | read | null |
| 2026-01-11 | No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning | Zhicong Li et.al. | 2601.06794 | translate | read | null |
| 2026-01-11 | ImmuniFraug: A Metacognitive Intervention Anti-Fraud Approach to Enhance Undergraduate Students’ Cyber Fraud Awareness | Xiangzhe Yuan et.al. | 2601.06774 | translate | read | null |
| 2026-01-11 | GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO | Shubhashis Roy Dipta et.al. | 2601.06767 | translate | read | null |
| 2026-01-11 | On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning | Changyu Liu et.al. | 2601.06748 | translate | read | null |
| 2026-01-10 | Characterising Toxicity in Generative Large Language Models | Zhiyao Zhang et.al. | 2601.06700 | translate | read | null |
| 2026-01-10 | Plasticity vs. Rigidity: The Impact of Low-Rank Adapters on Reasoning on a Micro-Budget | Zohaib Khan et.al. | 2601.06677 | translate | read | null |
| 2026-01-10 | Reinforcement Learning-Guided Dynamic Multi-Graph Fusion for Evacuation Traffic Prediction | Md Nafees Fuad Rafi et.al. | 2601.06664 | translate | read | null |
| 2026-01-10 | KASER: Knowledge-Aligned Student Error Simulator for Open-Ended Coding Tasks | Zhangqi Duan et.al. | 2601.06633 | translate | read | null |
| 2026-01-10 | Object-Centric World Models Meet Monte Carlo Tree Search | Rodion Vakhitov et.al. | 2601.06604 | translate | read | null |
| 2026-01-10 | ArrowGEV: Grounding Events in Video via Learning the Arrow of Time | Fangxu Yu et.al. | 2601.06559 | translate | read | null |
| 2026-01-10 | Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODASER) for Safe Reinforcement Learning in Optimal Control | Roya Khalili Amirabadi et.al. | 2601.06540 | translate | read | null |
| 2026-01-10 | Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection | Minghui Jia et.al. | 2601.06498 | translate | read | link |
| 2026-01-10 | ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking | Qiang Zhang et.al. | 2601.06487 | translate | read | link |
| 2026-01-10 | Coupling Smoothed Particle Hydrodynamics with Multi-Agent Deep Reinforcement Learning for Cooperative Control of Point Absorbers | Yi Zhan et.al. | 2601.06485 | translate | read | null |
| 2026-01-10 | Deep Reinforcement Learning based Control Design for Aircraft Recovery from Loss-of-Control Scenario | Imran Sayyed et.al. | 2601.06439 | translate | read | null |
| 2026-01-10 | LSRIF: Logic-Structured Reinforcement Learning for Instruction Following | Qingyu Ren et.al. | 2601.06431 | translate | read | null |
| 2026-01-10 | Lightweight Yet Secure: Secure Scripting Language Generation via Lightweight LLMs | Keyang Zhang et.al. | 2601.06419 | translate | read | null |
| 2026-01-10 | Dynamic Incentivized Cooperation under Changing Rewards | Philipp Altmann et.al. | 2601.06382 | translate | read | null |
| 2026-01-09 | Future-as-Label: Scalable Supervision from Real-World Outcomes | Benjamin Turtel et.al. | 2601.06336 | translate | read | null |
| 2026-01-09 | The pros and cons of using deep reinforcement learning or genetic algorithms to design control schemes for quantum state transfer on qubit chains | Sofía Perón Santana et.al. | 2601.06303 | translate | read | null |
| 2026-01-09 | How well can off-the-shelf LLMs elucidate molecular structures from mass spectra using chain-of-thought reasoning? | Yufeng Wang et.al. | 2601.06289 | translate | read | null |
| 2026-01-09 | Walk the PLANC: Physics-Guided RL for Agile Humanoid Locomotion on Constrained Footholds | Min Dai et.al. | 2601.06286 | translate | read | null |
| 2026-01-09 | Ground What You See: Hallucination-Resistant MLLMs via Caption Feedback, Diversity-Aware Sampling, and Conflict Regularization | Miao Pan et.al. | 2601.06224 | translate | read | null |
| 2026-01-09 | Toward Safe and Responsible AI Agents: A Three-Pillar Model for Transparency, Accountability, and Trustworthiness | Edward C. Cheng et.al. | 2601.06223 | translate | read | null |
| 2026-01-08 | TimeGNN-Augmented Hybrid-Action MARL for Fine-Grained Task Partitioning and Energy-Aware Offloading in MEC | Wei Ai et.al. | 2601.06191 | translate | read | null |
| 2026-01-07 | TIR-Flow: Active Video Search and Reasoning with Frozen VLMs | Hongbo Jin et.al. | 2601.06176 | translate | read | null |
| 2026-01-06 | HiMeS: Hippocampus-inspired Memory System for Personalized AI Assistants | Hailong Li et.al. | 2601.06152 | translate | read | null |
| 2026-01-05 | A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control | Wonhyeok Choi et.al. | 2601.06133 | translate | read | null |
| 2026-01-09 | Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards | Jiajie Zhang et.al. | 2601.06021 | translate | read | link |
| 2026-01-09 | TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents | Dawei Wang et.al. | 2601.05899 | translate | read | link |
| 2026-01-09 | StackPlanner: A Centralized Hierarchical Multi-Agent System with Task-Experience Memory Management | Ruizhe Zhang et.al. | 2601.05890 | translate | read | null |
| 2026-01-09 | IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck | Huilin Deng et.al. | 2601.05870 | translate | read | null |
| 2026-01-09 | Sequential Bayesian Optimal Experimental Design in Infinite Dimensions via Policy Gradient Reinforcement Learning | Kaichen Shen et.al. | 2601.05868 | translate | read | null |
| 2026-01-09 | Intelligent Singularity Avoidance in UR10 Robotic Arm Path Planning Using Hybrid Fuzzy Logic and Reinforcement Learning | Sheng-Kai Chen et.al. | 2601.05836 | translate | read | null |
| 2026-01-09 | EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis | Xiaoshuai Song et.al. | 2601.05808 | translate | read | link |
| 2026-01-09 | From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation | Zezhou Wang et.al. | 2601.05787 | translate | read | link |
| 2026-01-09 | SketchVL: Policy Optimization via Fine-Grained Credit Assignment for Chart Understanding and More | Muye Huang et.al. | 2601.05688 | translate | read | null |
| 2026-01-09 | CHDP: Cooperative Hybrid Diffusion Policies for Reinforcement Learning in Parameterized Action Space | Bingyi Liu et.al. | 2601.05675 | translate | read | null |
| 2026-01-09 | EvoQRE: Modeling Bounded Rationality in Safety-Critical Traffic Simulation via Evolutionary Quantal Response Equilibrium | Phu-Hoa Pham et.al. | 2601.05653 | translate | read | null |
| 2026-01-09 | GIFT: Games as Informal Training for Generalizable LLMs | Nuoyan Lyu et.al. | 2601.05633 | translate | read | null |
| 2026-01-09 | Dual-Phase LLM Reasoning: Self-Evolved Mathematical Frameworks | ShaoZhen Liu et.al. | 2601.05616 | translate | read | null |
| 2026-01-09 | Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR | Zijun Min et.al. | 2601.05607 | translate | read | null |
| 2026-01-09 | PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning | Jingcheng Hu et.al. | 2601.05593 | translate | read | link |
| 2026-01-09 | Reinforcement Learning of Large Language Models for Interpretable Credit Card Fraud Detection | Cooper Lin et.al. | 2601.05578 | translate | read | null |
| 2026-01-09 | Autonomous Discovery of the Ising Model’s Critical Parameters with Reinforcement Learning | Hai Man et.al. | 2601.05577 | translate | read | null |
| 2026-01-09 | WildSci: Advancing Scientific Reasoning from In-the-Wild Literature | Tengxiao Liu et.al. | 2601.05567 | translate | read | null |
| 2026-01-09 | Closing the Modality Reasoning Gap for Speech Large Language Models | Chaoren Wang et.al. | 2601.05543 | translate | read | null |
| 2026-01-09 | LEAPS: An LLM-Empowered Adaptive Plugin for Taobao AI Search | Lei Wang et.al. | 2601.05513 | translate | read | null |
| 2026-01-09 | How Exploration Breaks Cooperation in Shared-Policy Multi-Agent Reinforcement Learning | Yi-Ning Weng et.al. | 2601.05509 | translate | read | null |
| 2026-01-09 | MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards | Zhiyu Shen et.al. | 2601.05488 | translate | read | null |
| 2026-01-09 | MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization | Jiefu Ou et.al. | 2601.05475 | translate | read | null |
| 2026-01-09 | Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning | Zhaoqi Wang et.al. | 2601.05466 | translate | read | null |
| 2026-01-09 | PRISMA: Reinforcement Learning Guided Two-Stage Policy Optimization in Multi-Agent Architecture for Open-Domain Multi-Hop Question Answering | Yu Liu et.al. | 2601.05465 | translate | read | null |
| 2026-01-09 | Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction | Hongjin Kim et.al. | 2601.05459 | translate | read | null |
| 2026-01-08 | Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization | Yuxiang Ji et.al. | 2601.05432 | translate | read | link |
| 2026-01-08 | Interactive Distillation for Cooperative Multi-Agent Reinforcement Learning | Minwoo Cho et.al. | 2601.05407 | translate | read | null |
| 2026-01-08 | Imitation Learning for Combinatorial Optimisation under Uncertainty | Prakash Gawas et.al. | 2601.05383 | translate | read | null |
| 2026-01-05 | On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis | Hector Zenil et.al. | 2601.05280 | translate | read | null |
| 2026-01-08 | RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes | Yuan-Kang Lee et.al. | 2601.05249 | translate | read | link |
| 2026-01-08 | GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization | Shih-Yang Liu et.al. | 2601.05242 | translate | read | link |
| 2026-01-08 | EARL: Energy-Aware Optimization of Liquid State Machines for Pervasive AI | Zain Iqbal et.al. | 2601.05205 | translate | read | null |
| 2026-01-08 | SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning | Yanchang Liang et.al. | 2601.05187 | translate | read | null |
| 2026-01-08 | Inside Out: Evolving User-Centric Core Memory Trees for Long-Term Personalized Dialogue Systems | Jihao Zhao et.al. | 2601.05171 | translate | read | null |
| 2026-01-08 | Safe Continual Reinforcement Learning Methods for Nonstationary Environments: Towards a Survey of the State of the Art | Timofey Tomashevskiy et.al. | 2601.05152 | translate | read | null |
| 2026-01-08 | Unitary fault-tolerant encoding of Pauli states in surface codes | Luis Colmenarez et.al. | 2601.05113 | translate | read | null |
| 2026-01-08 | Reinforced Efficient Reasoning via Semantically Diverse Exploration | Ziqi Zhao et.al. | 2601.05053 | translate | read | link |
| 2026-01-08 | Hán Dān Xué Bù (Mimicry) or Qīng Chū Yú Lán (Mastery)? A Cognitive Perspective on Reasoning Distillation in Large Language Models | Yueqing Hu et.al. | 2601.05019 | translate | read | null |
| 2026-01-08 | On the Hidden Objective Biases of Group-based Reinforcement Learning | Aleksandar Fontana et.al. | 2601.05002 | translate | read | null |
| 2026-01-08 | AlgBench: To What Extent Do Large Reasoning Models Understand Algorithms? | Henan Sun et.al. | 2601.04996 | translate | read | null |
| 2026-01-08 | A DQN-based model for intelligent network selection in heterogeneous wireless systems | Fayssal Bendaoud et.al. | 2601.04978 | translate | read | null |
| 2026-01-08 | ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning | Minda Hu et.al. | 2601.04973 | translate | read | null |
| 2026-01-08 | Text as a Universal Interface for Transferable Personalization | Yuting Liu et.al. | 2601.04963 | translate | read | null |
| 2026-01-08 | Safe Reinforcement Learning Beyond Baseline Control: A Hierarchical Framework for Space Triangle Tethered Formation System | Xinyi Tao et.al. | 2601.04957 | translate | read | null |
| 2026-01-08 | Precision over Diversity: High-Precision Reward Generalizes to Robust Instruction Following | Yirong Zeng et.al. | 2601.04954 | translate | read | null |
| 2026-01-08 | SKATER: Synthesized Kinematics for Advanced Traversing Efficiency on a Humanoid Robot via Roller Skate Swizzles | Junchi Gu et.al. | 2601.04948 | translate | read | null |
| 2026-01-08 | Deep Reinforcement Learning for Optimum Order Execution: Mitigating Risk and Maximizing Returns | Khabbab Zakaria et.al. | 2601.04896 | translate | read | null |
| 2026-01-08 | Flexible Manufacturing Systems Intralogistics: Dynamic Optimization of AGVs and Tool Sharing Using Coloured-Timed Petri Nets and Actor-Critic RL with Actions Masking | Sofiene Lassoued et.al. | 2601.04887 | translate | read | null |
| 2026-01-08 | RAAR: Retrieval Augmented Agentic Reasoning for Cross-Domain Misinformation Detection | Zhiwei Liu et.al. | 2601.04853 | translate | read | null |
| 2026-01-08 | Intelligent resource allocation in wireless networks via deep reinforcement learning | Marie Diane Iradukunda et.al. | 2601.04842 | translate | read | null |
| 2026-01-08 | SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning | Caijun Xu et.al. | 2601.04809 | translate | read | link |
| 2026-01-08 | Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning | Siyuan Gan et.al. | 2601.04805 | translate | read | null |
| 2026-01-08 | AgentOCR: Reimagining Agent History via Optical Self-Compression | Lang Feng et.al. | 2601.04786 | translate | read | null |
| 2026-01-08 | AT $^2$ PO: Agentic Turn-based Policy Optimization via Tree Search | Zefang Zong et.al. | 2601.04767 | translate | read | link |
| 2026-01-08 | AM $^3$ Safety: Towards Data Efficient Alignment of Multi-modal Multi-turn Safety for MLLMs | Han Zhu et.al. | 2601.04736 | translate | read | null |
| 2026-01-08 | ThinkDrive: Chain-of-Thought Guided Progressive Reinforcement Learning Fine-Tuning for Autonomous Driving | Chang Zhao et.al. | 2601.04714 | translate | read | null |
| 2026-01-08 | TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning | Yinuo Wang et.al. | 2601.04698 | translate | read | null |
| 2026-01-08 | A Method for Constructing a Digital Transformation Driving Mechanism Based on Semantic Understanding of Large Models | Huayi Liu et.al. | 2601.04696 | translate | read | null |
| 2026-01-08 | Tape: A Cellular Automata Benchmark for Evaluating Rule-Shift Generalization in Reinforcement Learning | Enze Pan et.al. | 2601.04695 | translate | read | null |
| 2026-01-08 | ResMAS: Resilience Optimization in LLM-based Multi-agent Systems | Zhilun Zhou et.al. | 2601.04694 | translate | read | null |
| 2026-01-08 | Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead | Oluwatosin Oseni et.al. | 2601.04686 | translate | read | null |
| 2026-01-08 | Agri-R1: Empowering Generalizable Agricultural Reasoning in Vision-Language Models with Reinforcement Learning | Wentao Zhang et.al. | 2601.04672 | translate | read | null |
| 2026-01-08 | Learning Dynamics in RL Post-Training for Language Models | Akiyoshi Tomihari et.al. | 2601.04670 | translate | read | null |
| 2026-01-08 | Optimizing Path Planning using Deep Reinforcement Learning for UGVs in Precision Agriculture | Laukik Patade et.al. | 2601.04668 | translate | read | null |
| 2026-01-08 | Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization | Mizanur Rahman et.al. | 2601.04582 | translate | read | link |
| 2026-01-08 | Reasoning Over Space: Enabling Geographic Reasoning for LLM-Based Generative Next POI Recommendation | Dongyi Lv et.al. | 2601.04562 | translate | read | null |
| 2026-01-08 | Not All Steps are Informative: On the Linearity of LLMs’ RLVR Training | Tianle Wang et.al. | 2601.04537 | translate | read | null |
| 2026-01-08 | GRACE: Reinforcement Learning for Grounded Response and Abstention under Contextual Evidence | Yibo Zhao et.al. | 2601.04525 | translate | read | null |
| 2026-01-08 | TSSR: Two-Stage Swap-Reward-Driven Reinforcement Learning for Character-Level SMILES Generation | Jacob Ede Levine et.al. | 2601.04521 | translate | read | null |
| 2026-01-08 | Multiagent Reinforcement Learning with Neighbor Action Estimation | Zhenglong Luo et.al. | 2601.04511 | translate | read | null |
| 2026-01-07 | Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization | Xingjian Diao et.al. | 2601.04442 | translate | read | null |
| 2026-01-07 | Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization | Matthew Landers et.al. | 2601.04441 | translate | read | null |
| 2026-01-07 | Rate or Fate? RLV $^\varepsilon$ R: Reinforcement Learning with Verifiable Noisy Rewards | Ali Rad et.al. | 2601.04411 | translate | read | null |
| 2026-01-07 | Transformer-based Multi-agent Reinforcement Learning for Separation Assurance in Structured and Unstructured Airspaces | Arsyi Aziz et.al. | 2601.04401 | translate | read | null |
| 2026-01-07 | Enhanced-FQL( $λ$ ), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay | Mohsen Jalaeian-Farimani et.al. | 2601.04392 | translate | read | null |
| 2026-01-07 | Survival Dynamics of Neural and Programmatic Policies in Evolutionary Reinforcement Learning | Anton Roupassov-Ruiz et.al. | 2601.04365 | translate | read | null |
| 2026-01-07 | Online Action-Stacking Improves Reinforcement Learning Performance for Air Traffic Control | Ben Carvell et.al. | 2601.04287 | translate | read | null |
| 2026-01-07 | A Future Capabilities Agent for Tactical Air Traffic Control | Paul Kent et.al. | 2601.04285 | translate | read | null |
| 2026-01-07 | Making Tunable Parameters State-Dependent in Weather and Climate Models with Reinforcement Learning | Pritthijit Nath et.al. | 2601.04268 | translate | read | null |
| 2026-01-06 | Cross-Language Speaker Attribute Prediction Using MIL and RL | Sunny Shu et.al. | 2601.04257 | translate | read | null |
| 2026-01-07 | Hierarchical GNN-Based Multi-Agent Learning for Dynamic Queue-Jump Lane and Emergency Vehicle Corridor Formation | Haoran Su et.al. | 2601.04177 | translate | read | null |
| 2026-01-07 | Agentic Rubrics as Contextual Verifiers for SWE Agents | Mohit Raghavendra et.al. | 2601.04171 | translate | read | null |
| 2026-01-07 | InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training | Ziyun Zhang et.al. | 2601.04126 | translate | read | null |
| 2026-01-07 | GeoReason: Aligning Thinking And Answering In Remote Sensing Vision-Language Models Via Logical Consistency Reinforcement Learning | Wenshuai Li et.al. | 2601.04118 | translate | read | null |
| 2026-01-07 | Cells on Autopilot: Adaptive Cell (Re)Selection via Reinforcement Learning | Marvin Illian et.al. | 2601.04083 | translate | read | null |
| 2026-01-07 | Thinking with Frames: Generative Video Distortion Evaluation via Frame Reward Model | Yuan Wang et.al. | 2601.04033 | translate | read | null |
| 2026-01-07 | On-Device Deep Reinforcement Learning for Decentralized Task Offloading: Performance Trade-offs in the Training Process | Gorka Nieto et.al. | 2601.03976 | translate | read | null |
| 2026-01-07 | Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models | Wei Wu et.al. | 2601.03969 | translate | read | null |
| 2026-01-07 | CoINS: Counterfactual Interactive Navigation via Skill-Aware VLM | Kangjie Zhou et.al. | 2601.03956 | translate | read | null |
| 2026-01-07 | Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification | Rui Sun et.al. | 2601.03948 | translate | read | null |
| 2026-01-07 | Adaptive-Boundary-Clipping GRPO: Ensuring Bounded Ratios for Stable and Generalizable Training | Chi Liu et.al. | 2601.03895 | translate | read | null |
| 2026-01-07 | IndexTTS 2.5 Technical Report | Yunpei Li et.al. | 2601.03888 | translate | read | null |
| 2026-01-07 | Staged Voxel-Level Deep Reinforcement Learning for 3D Medical Image Segmentation with Noisy Annotations | Yuyang Fu et.al. | 2601.03875 | translate | read | null |
| 2026-01-07 | Step Potential Advantage Estimation: Harnessing Intermediate Confidence and Correctness for Efficient Mathematical Reasoning | Fei Wu et.al. | 2601.03823 | translate | read | null |
| 2026-01-07 | ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition | Muyang Zhao et.al. | 2601.03822 | translate | read | null |
| 2026-01-07 | From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs | Usha Shrestha et.al. | 2601.03808 | translate | read | null |
| 2026-01-07 | NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning | Zhongtao Miao et.al. | 2601.03790 | translate | read | null |
| 2026-01-07 | MVP: Enhancing Video Large Language Models via Self-supervised Masked Video Prediction | Xiaokun Sun et.al. | 2601.03781 | translate | read | null |
| 2026-01-07 | O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL | Yi Yao et.al. | 2601.03743 | translate | read | null |
| 2026-01-07 | EDCO: Dynamic Curriculum Orchestration for Domain-specific Large Language Model Fine-tuning | Jing-Cheng Pang et.al. | 2601.03725 | translate | read | null |
| 2026-01-07 | ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization | Shijie Zhang et.al. | 2601.03723 | translate | read | null |
| 2026-01-07 | R $^3$ L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification | Weijie Shi et.al. | 2601.03715 | translate | read | link |
| 2026-01-07 | TreeAdv: Tree-Structured Advantage Redistribution for Group-Based RL | Lang Cao et.al. | 2601.03703 | translate | read | null |
| 2026-01-07 | Dual-Attention Heterogeneous GNN for Multi-robot Collaborative Area Search via Deep Reinforcement Learning | Lina Zhu et.al. | 2601.03686 | translate | read | null |
| 2026-01-07 | Accounting for Optimal Control in the Sizing of Isolated Hybrid Renewable Energy Systems Using Imitation Learning | Simon Halvdansson et.al. | 2601.03679 | translate | read | null |
| 2026-01-07 | Sandwich Reasoning: An Answer-Reasoning-Answer Approach for Low-Latency Query Correction | Chen Zhang et.al. | 2601.03672 | translate | read | null |
| 2026-01-07 | AMIR-GRPO: Inducing Implicit Preference Signals into GRPO | Amir Hossein Yari et.al. | 2601.03661 | translate | read | null |
| 2026-01-07 | ReLA: Representation Learning and Aggregation for Job Scheduling with Reinforcement Learning | Zhengyi Kwan et.al. | 2601.03646 | translate | read | null |
| 2026-01-07 | Locomotion Beyond Feet | Tae Hoon Yang et.al. | 2601.03607 | translate | read | null |
| 2026-01-07 | Interleaved Tool-Call Reasoning for Protein Function Understanding | Chuanliu Fan et.al. | 2601.03604 | translate | read | null |
| 2026-01-07 | From Score to Sound: An End-to-End MIDI-to-Motion Pipeline for Robotic Cello Performance | Samantha Sudhoff et.al. | 2601.03562 | translate | read | null |
| 2026-01-07 | SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models | Yuxuan Jiang et.al. | 2601.03555 | translate | read | null |
| 2026-01-07 | VeRPO: Verifiable Dense Reward Policy Optimization for Code Generation | Longwen Wang et.al. | 2601.03525 | translate | read | null |
| 2026-01-07 | A Reinforcement Learning-Based Model for Mapping and Goal-Directed Navigation Using Multiscale Place Fields | Bekarys Dukenbaev et.al. | 2601.03520 | translate | read | null |
| 2026-01-07 | Semantic Belief-State World Model for 3D Human Motion Prediction | Sarim Chaudhry et.al. | 2601.03517 | translate | read | null |
| 2026-01-07 | Adaptive Model-Based Reinforcement Learning for Orbit Feedback Control in NSLS-II Storage Ring | Zeyu Dong et.al. | 2601.03486 | translate | read | null |
| 2026-01-06 | Understanding Reward Hacking in Text-to-Image Reinforcement Learning | Yunqi Hong et.al. | 2601.03468 | translate | read | null |
| 2026-01-06 | ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing | Hengjia Li et.al. | 2601.03467 | translate | read | null |
| 2026-01-06 | FIRE-VLM: A Vision-Language-Driven Reinforcement Learning Framework for UAV Wildfire Tracking in a Physics-Grounded Fire Digital Twin | Chris Webb et.al. | 2601.03449 | translate | read | null |
| 2026-01-06 | Foundation Model-Aided Hierarchical Control for Robust RIS-Assisted Near-Field Communications | Mohammad Ghassemi et.al. | 2601.03427 | translate | read | null |
| 2026-01-06 | Sensor to Pixels: Decentralized Swarm Gathering via Image-Based Reinforcement Learning | Yigal Koifman et.al. | 2601.03413 | translate | read | null |
| 2026-01-06 | Exploration Through Introspection: A Self-Aware Reward Model | Michael Petrowski et.al. | 2601.03389 | translate | read | null |
| 2026-01-06 | Aligning Findings with Diagnosis: A Self-Consistent Reinforcement Learning Framework for Trustworthy Radiology Reporting | Kun Zhao et.al. | 2601.03321 | translate | read | null |
| 2026-01-06 | Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning | Yu Luo et.al. | 2601.03320 | translate | read | null |
| 2026-01-06 | Mastering the Game of Go with Self-play Experience Replay | Jingbin Liu et.al. | 2601.03306 | translate | read | null |
| 2026-01-06 | Autonomous Threat Detection and Response in Cloud Security: A Comprehensive Survey of AI-Driven Strategies | Gaurav Sarraf et.al. | 2601.03303 | translate | read | null |
| 2026-01-06 | PC2P: Multi-Agent Path Finding via Personalized-Enhanced Communication and Crowd Perception | Guotao Li et.al. | 2601.03301 | translate | read | null |
| 2026-01-06 | STReasoner: Empowering LLMs for Spatio-Temporal Reasoning in Time Series via Spatial-Aware Reinforcement Learning | Juntong Ni et.al. | 2601.03248 | translate | read | null |
| 2026-01-06 | Critic-Guided Reinforcement Unlearning in Text-to-Image Diffusion | Mykola Vysotskyi et.al. | 2601.03213 | translate | read | null |
| 2026-01-06 | UltraLogic: Enhancing LLM Reasoning through Large-Scale Data Synthesis and Bipolar Float Reward | Yile Liu et.al. | 2601.03205 | translate | read | null |
| 2026-01-06 | MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory | Shengtao Zhang et.al. | 2601.03192 | translate | read | null |
| 2026-01-06 | WebAnchor: Anchoring Agent Planning to Stabilize Long-Horizon Web Reasoning | Xinmiao Yu et.al. | 2601.03164 | translate | read | null |
| 2026-01-06 | Unified Thinker: A General Reasoning Modular Core for Image Generation | Sashuai Zhou et.al. | 2601.03127 | translate | read | null |
| 2026-01-06 | One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling | Yiyuan Li et.al. | 2601.03111 | translate | read | null |
| 2026-01-06 | Post-Decision State-Based Online Learning for Delay-Energy-Aware Flow Allocation in Wireless Systems | Mahesh Ganesh Bhat et.al. | 2601.03108 | translate | read | null |
| 2026-01-06 | IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation | Yankai Jiang et.al. | 2601.03054 | translate | read | null |
| 2026-01-06 | SOP: A Scalable Online Post-Training System for Vision-Language-Action Models | Mingjie Pan et.al. | 2601.03044 | translate | read | null |
| 2026-01-06 | Dementia-R1: Reinforced Pretraining and Reasoning from Unstructured Clinical Notes for Real-World Dementia Prognosis | Choonghan Kim et.al. | 2601.03018 | translate | read | null |
| 2026-01-06 | In-Context Reinforcement Learning through Bayesian Fusion of Context and Value Prior | Anaïs Berkes et.al. | 2601.03015 | translate | read | null |
| 2026-01-06 | Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning | Yuankun Xie et.al. | 2601.02983 | translate | read | null |
| 2026-01-06 | Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning | Nathanaël Carraz Rakotonirina et.al. | 2601.02972 | translate | read | null |
| 2026-01-06 | The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models | Yuhuan You et.al. | 2601.02954 | translate | read | null |
| 2026-01-06 | Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning | Guoqiang Liang et.al. | 2601.02918 | translate | read | null |
| 2026-01-06 | ChemBART: A Pre-trained BART Model Assisting Organic Chemistry Analysis | Kenan Li et.al. | 2601.02915 | translate | read | null |
| 2026-01-06 | SimRPD: Optimizing Recruitment Proactive Dialogue Agents through Simulator-Based Data Evaluation and Selection | Zhiyong Cao et.al. | 2601.02871 | translate | read | null |
| 2026-01-06 | Sample-Efficient Neurosymbolic Deep Reinforcement Learning | Celeste Veronese et.al. | 2601.02850 | translate | read | null |
| 2026-01-06 | SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models | Ruiyang Zhang et.al. | 2601.02825 | translate | read | null |
| 2026-01-06 | Reinforcement Learning for Follow-the-Leader Robotic Endoscopic Navigation via Synthetic Data | Sicong Gao et.al. | 2601.02798 | translate | read | null |
| 2026-01-06 | MiMo-V2-Flash Technical Report | Xiaomi LLM-Core Team et.al. | 2601.02780 | translate | read | null |
| 2026-01-06 | Closing the Reality Gap: Zero-Shot Sim-to-Real Deployment for Dexterous Force-Based Grasping and Manipulation | Zhe Zhao et.al. | 2601.02778 | translate | read | null |
| 2026-01-06 | Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal Policies | Mingming Zhang et.al. | 2601.02754 | translate | read | null |
| 2026-01-06 | Time-Scaling Is What Agents Need Now | Zhi Liu et.al. | 2601.02714 | translate | read | null |
| 2026-01-06 | Inferring Causal Graph Temporal Logic Formulas to Expedite Reinforcement Learning in Temporally Extended Tasks | Hadi Partovi Aria et.al. | 2601.02666 | translate | read | null |
| 2026-01-06 | Effective Online 3D Bin Packing with Lookahead Parcels Using Monte Carlo Tree Search | Jiangyi Fang et.al. | 2601.02649 | translate | read | null |
| 2026-01-05 | SWaRL: Safeguard Code Watermarking via Reinforcement Learning | Neusha Javidnia et.al. | 2601.02602 | translate | read | null |
| 2026-01-05 | Textual Explanations and Their Evaluations for Reinforcement Learning Policy | Ahmad Terra et.al. | 2601.02514 | translate | read | null |
| 2026-01-05 | LLM-Enhanced Reinforcement Learning for Time Series Anomaly Detection | Bahareh Golchin et.al. | 2601.02511 | translate | read | null |
| 2026-01-05 | WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks | Hao Bai et.al. | 2601.02439 | translate | read | null |
| 2026-01-05 | Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes | Jing Tan et.al. | 2601.02356 | translate | read | null |
| 2026-01-05 | VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation | Shikun Sun et.al. | 2601.02256 | translate | read | null |
| 2026-01-05 | Enabling Deep Reinforcement Learning Research for Energy Saving in Open RAN | Matteo Bordin et.al. | 2601.02240 | translate | read | null |
| 2026-01-05 | NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation | Huichao Zhang et.al. | 2601.02204 | translate | read | null |
| 2026-01-05 | CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents | Keyu Wang et.al. | 2601.02201 | translate | read | null |
| 2026-01-05 | ACDZero: MCTS Agent for Mastering Automated Cyber Defense | Yu Li et.al. | 2601.02196 | translate | read | null |
| 2026-01-05 | Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting | Muxi Diao et.al. | 2601.02151 | translate | read | null |
| 2026-01-05 | MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics | Zhuofan Shi et.al. | 2601.02075 | translate | read | null |
| 2026-01-05 | Reinforcement Learning Based Computationally Efficient Conditional Choice Simulation Estimation of Dynamic Discrete Choice Models | Ahmed Khwaja et.al. | 2601.02069 | translate | read | null |
| 2026-01-05 | Higher-Order Action Regularization in Deep Reinforcement Learning: From Continuous Control to Building Energy Management | Faizan Ahmed et.al. | 2601.02061 | translate | read | null |
| 2026-01-05 | GDRO: Group-level Reward Post-training Suitable for Diffusion Models | Yiyang Wang et.al. | 2601.02036 | translate | read | null |
| 2026-01-05 | AgentVNE: LLM-Augmented Graph Reinforcement Learning for Affinity-Aware Multi-Agent Placement in Edge Agentic AI | Runze Zheng et.al. | 2601.02021 | translate | read | null |
| 2026-01-05 | Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation | Weijian Ma et.al. | 2601.01984 | translate | read | null |
| 2026-01-05 | Distorted Distributional Policy Evaluation for Offline Reinforcement Learning | Ryo Iwaki et.al. | 2601.01917 | translate | read | null |
| 2026-01-05 | Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning | Yuxuan Li et.al. | 2601.01904 | translate | read | null |
| 2026-01-05 | Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents | Yi Yu et.al. | 2601.01885 | translate | read | null |
| 2026-01-05 | DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs | Jinghan Ru et.al. | 2601.01868 | translate | read | null |
| 2026-01-05 | Moments Matter: Stabilizing Policy Optimization using Return Distributions | Dennis Jabs et.al. | 2601.01803 | translate | read | null |
| 2026-01-05 | PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor | Qianjun Pan et.al. | 2601.01802 | translate | read | null |
| 2026-01-05 | Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving | Qi Wei et.al. | 2601.01800 | translate | read | null |
| 2026-01-05 | SRAS: A Lightweight Reinforcement Learning-based Document Selector for Edge-Native RAG Pipelines | Rajiv Chaitanya Muttur et.al. | 2601.01785 | translate | read | null |
| 2026-01-05 | Reinforcement Learning for Option Hedging: Static Implied-Volatility Fit versus Shortfall-Aware Performance | Ziheng Chen et.al. | 2601.01709 | translate | read | null |
| 2026-01-04 | All-Optical Deep Learning with Quantum Nonlinearity | Qingyi Zhou et.al. | 2601.01690 | translate | read | null |
| 2026-01-04 | Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives | Wei Liu et.al. | 2601.01665 | translate | read | null |
| 2026-01-04 | DemoBot: Efficient Learning of Bimanual Manipulation with Dexterous Hands From Third-Person Human Videos | Yucheng Xu et.al. | 2601.01651 | translate | read | null |
| 2026-01-04 | Action-Sketcher: From Reasoning to Action via Visual Sketches for Long-Horizon Robotic Manipulation | Huajie Tan et.al. | 2601.01618 | translate | read | null |
| 2026-01-04 | HanoiWorld: A Joint Embedding Predictive Architecture Based World Model for Autonomous Vehicle Controller | Tran Tien Dat et.al. | 2601.01577 | translate | read | null |
| 2026-01-04 | Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement | Mingyu Xu et.al. | 2601.01562 | translate | read | null |
| 2026-01-04 | Unified Generation and Self-Verification for Vision-Language Models via Advantage Decoupled Preference Optimization | Xinyu Qiu et.al. | 2601.01483 | translate | read | null |
| 2026-01-04 | Programmable ultra-broadband photonic chaos platform enabled by microwave-chaos-driven electro-optic frequency combs | Shiyu Shi et.al. | 2601.01440 | translate | read | null |
| 2026-01-04 | Context-Aware Information Transfer via Digital Semantic Communication in UAV-Based Networks | Poorvi Joshi et.al. | 2601.01430 | translate | read | null |
| 2026-01-04 | SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving | Chaofan Tao et.al. | 2601.01426 | translate | read | null |
| 2026-01-04 | DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer | Xu Guo et.al. | 2601.01425 | translate | read | null |
| 2026-01-04 | SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning | Peidong Wang et.al. | 2601.01392 | translate | read | null |
| 2026-01-03 | dataRLsec: Safety, Security, and Reliability With Robust Offline Reinforcement Learning for DPAs | Shriram KS Pandian et.al. | 2601.01289 | translate | read | null |
| 2026-01-03 | PyBatchRender: A Python Library for Batched 3D Rendering at Up to One Million FPS | Evgenii Rudakov et.al. | 2601.01288 | translate | read | null |
| 2026-01-03 | Harnessing Environmental Memory with Reinforcement Learning in Open Quantum Systems | Safae Gaidi et.al. | 2601.01252 | translate | read | null |
| 2026-01-03 | OrchestrRL: Dynamic Compute and Network Orchestration for Disaggregated RL | Xin Tan et.al. | 2601.01209 | translate | read | null |
| 2026-01-03 | Reinforcement Learning Enhanced Multi-hop Reasoning for Temporal Knowledge Question Answering | Wuzhenghong Wen et.al. | 2601.01195 | translate | read | null |
| 2026-01-03 | SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards | Suryansh Singh Sijwali et.al. | 2601.01184 | translate | read | null |
| 2026-01-03 | Reinforcement Learning Based Whittle Index Policy for Scheduling Wireless Sensors | Sokipriala Jonah et.al. | 2601.01179 | translate | read | null |
| 2026-01-03 | ORION: Option-Regularized Deep Reinforcement Learning for Cooperative Multi-Agent Online Navigation | Zhang Shizhe et.al. | 2601.01155 | translate | read | null |
| 2026-01-03 | Latent Space Reinforcement Learning for Multi-Robot Exploration | Sriram Rajasekar et.al. | 2601.01139 | translate | read | null |
| 2026-01-03 | Performance and Security Aware Distributed Service Placement in Fog Computing | Mohammad Goudarzi et.al. | 2601.01125 | translate | read | null |
| 2026-01-02 | DVGBench: Implicit-to-Explicit Visual Grounding Benchmark in UAV Imagery with Large Vision-Language Models | Yue Zhou et.al. | 2601.00998 | translate | read | null |
| 2026-01-02 | Materials Informatics: Emergence To Autonomous Discovery In The Age Of AI | Turab Lookman et.al. | 2601.00742 | translate | read | null |
| 2026-01-02 | Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty | Uğurcan Özalp et.al. | 2601.00737 | translate | read | null |
| 2026-01-02 | Precision Autotuning for Linear Solvers via Reinforcement Learning | Erin Carson et.al. | 2601.00728 | translate | read | null |
| 2026-01-02 | ARISE: Adaptive Reinforcement Integrated with Swarm Exploration | Rajiv Chaitanya M et.al. | 2601.00693 | translate | read | null |
| 2026-01-02 | IRPO: Scaling the Bradley-Terry Model via Reinforcement Learning | Haonan Song et.al. | 2601.00677 | translate | read | null |
| 2026-01-02 | RoboReward: General-Purpose Vision-Language Reward Models for Robotics | Tony Lee et.al. | 2601.00675 | translate | read | null |
| 2026-01-02 | Integrating Multi-Armed Bandit, Active Learning, and Distributed Computing for Scalable Optimization | Foo Hui-Mean et.al. | 2601.00615 | translate | read | null |
| 2026-01-02 | Vision-based Goal-Reaching Control for Mobile Robots Using a Hierarchical Learning Framework | Mehdi Heydari Shahna et.al. | 2601.00610 | translate | read | null |
| 2026-01-02 | Traffic-Aware Optimal Taxi Placement Using Graph Neural Network-Based Reinforcement Learning | Sonia Khetarpaul et.al. | 2601.00607 | translate | read | null |
| 2026-01-02 | Parametrized Sharing for Multi-Agent Hybrid DRL for Multiple Multi-Functional RISs-Aided Downlink NOMA Networks | Chi-Te Kuo et.al. | 2601.00538 | translate | read | null |
| 2026-01-01 | CPPO: Contrastive Perception for Vision Language Policy Optimization | Ahmad Rezaei et.al. | 2601.00501 | translate | read | null |
| 2026-01-01 | Safe Adaptive Feedback Control via Barrier States | Trivikram Satharasi et.al. | 2601.00476 | translate | read | null |
| 2026-01-01 | Imitation from Observations with Trajectory-Level Generative Embeddings | Yongtao Qu et.al. | 2601.00452 | translate | read | null |
| 2026-01-01 | Modelling cultural evolution | Fredrik Jansson et.al. | 2601.00433 | translate | read | null |
| 2026-01-01 | E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models | Shengjun Zhang et.al. | 2601.00423 | translate | read | null |
| 2026-01-01 | Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach | Biao Wu et.al. | 2601.00388 | translate | read | null |
| 2026-01-01 | Multiagent Reinforcement Learning for Liquidity Games | Alicia Vidler et.al. | 2601.00324 | translate | read | null |
| 2026-01-01 | Offline Multi-Agent Reinforcement Learning for 6G Communications: Fundamentals, Applications and Future Directions | Eslam Eldeeb et.al. | 2601.00321 | translate | read | null |
| 2026-01-01 | Can Optimal Transport Improve Federated Inverse Reinforcement Learning? | David Millard et.al. | 2601.00309 | translate | read | null |
| 2026-01-01 | Next Generation Intelligent Low-Altitude Economy Deployments: The O-RAN Perspective | Aly Sabri Abdalla et.al. | 2601.00257 | translate | read | null |
| 2026-01-01 | Modern Neuromorphic AI: From Intra-Token to Inter-Token Processing | Osvaldo Simeone et.al. | 2601.00245 | translate | read | null |
| 2026-01-01 | From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning | Omar Sharif et.al. | 2601.00215 | translate | read | null |
| 2026-01-01 | Reinforcement-Learned Unequal Error Protection for Quantized Semantic Embeddings | Moirangthem Tiken Singh et.al. | 2601.00186 | translate | read | null |
| 2026-01-01 | Online Finetuning Decision Transformers with Pure RL Gradients | Junkai Luo et.al. | 2601.00167 | translate | read | null |
| 2026-01-01 | Reinforcement Learning with Function Approximation for Non-Markov Processes | Ali Devran Kara et.al. | 2601.00151 | translate | read | null |
(<a href="../Reinforcement_Learning.md">back to Reinforcement Learning</a>)