Reinforcement Learning - 2025-12

Publish Date Title Authors PDF Translate Read Code
2025-12-31 Dichotomous Diffusion Policy Optimization Ruiming Liang et.al. 2601.00898 translate read null
2025-12-31 VideoCuRL: Video Curriculum Reinforcement Learning with Orthogonal Difficulty Decomposition Hongbo Jin et.al. 2601.00887 translate read null
2025-12-30 SmartFlow Reinforcement Learning and Agentic AI for Bike-Sharing Optimisation Aditya Sreevatsa K et.al. 2601.00868 translate read null
2025-12-25 Horizon Reduction as Information Loss in Offline Reinforcement Learning Uday Kumar Nidadala et.al. 2601.00831 translate read null
2025-12-31 GRL-SNAM: Geometric Reinforcement Learning with Path Differential Hamiltonians for Simultaneous Navigation and Mapping in Unknown Environments Aditya Sai Ellendula et.al. 2601.00116 translate read null
2025-12-31 Adaptive Pinching Antenna Optimization via Meta-Learning for Physical-Layer Security in Dynamic Wireless Networks Khalid T. Musri et.al. 2601.00115 translate read null
2025-12-31 Universal Adaptive Constraint Propagation: Scaling Structured Inference for Large Language Models via Meta-Reinforcement Learning Ibne Farabi Shihab et.al. 2601.00095 translate read null
2025-12-31 Reinforcement learning with timed constraints for robotics motion planning Zhaoan Wang et.al. 2601.00087 translate read null
2025-12-31 Coordinated Humanoid Manipulation with Choice Policies Haozhi Qi et.al. 2512.25072 translate read null
2025-12-31 Scaling Open-Ended Reasoning to Predict the Future Nikhil Chandak et.al. 2512.25070 translate read null
2025-12-31 Many Minds from One Model: Bayesian Transformers for Population Intelligence Diji Yang et.al. 2512.25063 translate read null
2025-12-31 ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning Timo Kaufmann et.al. 2512.25023 translate read null
2025-12-31 MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control Yongwei Zhang et.al. 2512.24955 translate read null
2025-12-31 Iterative Deployment Improves Planning Skills in LLMs Augusto B. Corrêa et.al. 2512.24940 translate read null
2025-12-31 Throughput Optimization in UAV-Mounted RIS under Jittering and Imperfect CSI via DRL Anas K. Saeed et.al. 2512.24773 translate read null
2025-12-31 Sparse Offline Reinforcement Learning with Corruption Robustness Nam Phuong Tran et.al. 2512.24768 translate read null
2025-12-31 Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow Karthik Dharmarajan et.al. 2512.24766 translate read null
2025-12-31 Control of Microrobots with Reinforcement Learning under On-Device Compute Constraints Yichen Liu et.al. 2512.24740 translate read null
2025-12-31 Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting Kai Ye et.al. 2512.24702 translate read null
2025-12-31 Dynamic Policy Learning for Legged Robot with Simplified Model Pretraining and Model Homotopy Transfer Dongyun Kang et.al. 2512.24698 translate read null
2025-12-31 Hierarchical Online Optimization Approach for IRS-enabled Low-altitude MEC in Vehicular Networks Yixian Wang et.al. 2512.24659 translate read null
2025-12-31 RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence Chengkai Hou et.al. 2512.24653 translate read null
2025-12-31 Hybrid Motion Planning with Deep Reinforcement Learning for Mobile Robot Navigation Yury Kolomeytsev et.al. 2512.24651 translate read null
2025-12-31 Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Yuchen Shi et.al. 2512.24615 translate read null
2025-12-31 Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization Dong Qiu et.al. 2512.24609 translate read null
2025-12-31 Robust Bayesian Dynamic Programming for On-policy Risk-sensitive Reinforcement Learning Shanyu Han et.al. 2512.24580 translate read null
2025-12-31 From Perception to Punchline: Empowering VLM with the Art of In-the-wild Meme Xueyan Li et.al. 2512.24555 translate read null
2025-12-31 From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning Amir Tahmasbi et.al. 2512.24532 translate read null
2025-12-30 Networked Markets, Fragmented Data: Adaptive Graph Learning for Customer Risk Analytics and Policy Design Lecheng Zheng et.al. 2512.24487 translate read null
2025-12-30 Adaptive Learning Guided by Bias-Noise-Alignment Diagnostics Akash Samanta et.al. 2512.24445 translate read null
2025-12-30 Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models Lars van der Laan et.al. 2512.24407 translate read null
2025-12-30 SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning Yong Xien Chng et.al. 2512.24330 translate read null
2025-12-30 MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems Wan Jiang et.al. 2512.24325 translate read null
2025-12-30 Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking Meiqi Chen et.al. 2512.24297 translate read null
2025-12-30 Real-world Reinforcement Learning from Suboptimal Interventions Yinuo Zhao et.al. 2512.24288 translate read null
2025-12-30 DRL-TH: Jointly Utilizing Temporal Graph Attention and Hierarchical Fusion for UGV Navigation in Crowded Environments Ruitong Li et.al. 2512.24284 translate read null
2025-12-30 Deep Reinforcement Learning for Solving the Fleet Size and Mix Vehicle Routing Problem Pengfu Wan et.al. 2512.24251 translate read null
2025-12-30 Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning Chubin Chen et.al. 2512.24146 translate read null
2025-12-30 GARDO: Reinforcing Diffusion Models without Reward Hacking Haoran He et.al. 2512.24138 translate read null
2025-12-30 HY-MT1.5 Technical Report Mao Zheng et.al. 2512.24092 translate read null
2025-12-30 How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns Haoyue Bai et.al. 2512.24063 translate read null
2025-12-30 Policy Mirror Descent with Temporal Difference Learning: Sample Complexity under Online Markov Data Wenye Li et.al. 2512.24056 translate read null
2025-12-30 ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment Natchaya Temyingyong et.al. 2512.24040 translate read null
2025-12-30 Reinforced Diffusion: Learning to Push the Limits of Anisotropic Diffusion for Image Denoising Xinran Qin et.al. 2512.24035 translate read null
2025-12-30 RSAgent: Learning to Reason and Act for Text-Guided Segmentation via Multi-Turn Tool Invocations Xingqi He et.al. 2512.24023 translate read null
2025-12-30 CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards Zhiming Lin et.al. 2512.23971 translate read null
2025-12-30 Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration Lars van der Laan et.al. 2512.23927 translate read null
2025-12-30 Constraint Breeds Generalization: Temporal Dynamics as an Inductive Bias Xia Chen et.al. 2512.23916 translate read null
2025-12-29 Beamforming for Massive MIMO Aerial Communications: A Robust and Scalable DRL Approach Hesam Khoshkbari et.al. 2512.23902 translate read null
2025-12-29 Distributed Beamforming in Massive MIMO Communication for a Constellation of Airborne Platform Stations Hesam Khoshkbari et.al. 2512.23900 translate read null
2025-12-29 Max-Entropy Reinforcement Learning with Flow Matching and A Case Study on LQR Yuyang Zhang et.al. 2512.23870 translate read null
2025-12-29 Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting Lars van der Laan et.al. 2512.23805 translate read null
2025-12-29 Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark Manu et.al. 2512.23779 translate read null
2025-12-29 FineFT: Efficient and Risk-Aware Ensemble Reinforcement Learning for Futures Trading Molei Qin et.al. 2512.23773 translate read null
2025-12-29 Safety-Biased Policy Optimisation: Towards Hard-Constrained Reinforcement Learning via Trust Regions Ankit Kanwar et.al. 2512.23770 translate read null
2025-12-28 Audited Skill-Graph Self-Improvement for Agentic LLMs via Verifiable Rewards, Experience Synthesis, and Continual Memory Ken Huang et.al. 2512.23760 translate read null
2025-12-29 Training AI Co-Scientists Using Rubric Rewards Shashwat Goel et.al. 2512.23707 translate read null
2025-12-29 Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation Huajie Tan et.al. 2512.23703 translate read null
2025-12-29 Bellman Calibration for V-Learning in Offline Reinforcement Learning Lars van der Laan et.al. 2512.23694 translate read null
2025-12-29 Le Cam Distortion: A Decision-Theoretic Framework for Robust Transfer Learning Deniz Akdemir et.al. 2512.23617 translate read null
2025-12-29 ProGuard: Towards Proactive Multimodal Safeguard Shaohan Yu et.al. 2512.23573 translate read null
2025-12-29 ThinkGen: Generalized Thinking for Visual Generation Siyu Jiao et.al. 2512.23568 translate read null
2025-12-29 A NEAT Approach to Evolving Neural-Network-based Optimization of Chiral Photonic Metasurfaces: Application of a Neuro-Evolution Pipeline Davide Filippozzi et.al. 2512.23558 translate read null
2025-12-29 PathFound: An Agentic Multimodal Model Activating Evidence-seeking Pathological Diagnosis Shengyi Hua et.al. 2512.23545 translate read null
2025-12-29 Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning Zuoyou Jiang et.al. 2512.23515 translate read null
2025-12-29 Hierarchical Decision Mamba Meets Agentic AI: A Novel Approach for RAN Slicing in 6G Md Arafat Habib et.al. 2512.23502 translate read null
2025-12-29 Agentic AI for Autonomous Defense in Software Supply Chain Security: Beyond Provenance to Vulnerability Mitigation Toqeer Ali Syed et.al. 2512.23480 translate read null
2025-12-29 HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation Yuxin Wen et.al. 2512.23464 translate read null
2025-12-29 Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance Zhuo Li et.al. 2512.23461 translate read null
2025-12-29 Replay Failures as Successes: Sample-Efficient Reinforcement Learning for Instruction Following Kongcheng Zhang et.al. 2512.23457 translate read null
2025-12-29 The World Is Bigger! A Computationally-Embedded Perspective on the Big World Hypothesis Alex Lewandowski et.al. 2512.23419 translate read null
2025-12-29 AGRO-SQL: Agentic Group-Relative Optimization with High-Fidelity Data Synthesis Cehua Yang et.al. 2512.23366 translate read null
2025-12-29 CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation Ke Niu et.al. 2512.23333 translate read null
2025-12-29 Splitwise: Collaborative Edge-Cloud Inference for LLMs via Lyapunov-Assisted DRL Abolfazl Younesi et.al. 2512.23310 translate read null
2025-12-29 Agentic AI-Enhanced Semantic Communications: Foundations, Architecture, and Applications Haixiao Gao et.al. 2512.23294 translate read null
2025-12-29 Interpretable Safety Alignment via SAE-Constructed Low-Rank Subspace Adaptation Dianyun Wang et.al. 2512.23260 translate read null
2025-12-29 ViLaCD-R1: A Vision-Language Framework for Semantic Change Detection in Remote Sensing Xingwei Ma et.al. 2512.23244 translate read null
2025-12-29 A Human-Oriented Cooperative Driving Approach: Integrating Driving Intention, State, and Conflict Qin Wang et.al. 2512.23220 translate read null
2025-12-29 Evaluating Parameter Efficient Methods for RLVR Qingyu Yin et.al. 2512.23165 translate read null
2025-12-29 SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling Yufan He et.al. 2512.23162 translate read null
2025-12-28 A Note on Hybrid Online Reinforcement and Imitation Learning for LLMs: Formulations and Algorithms Yingru Li et.al. 2512.23097 translate read null
2025-12-28 Benchmark Success, Clinical Failure: When Reinforcement Learning Optimizes for Benchmarks, Not Patients Armin Berger et.al. 2512.23090 translate read null
2025-12-28 Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning Yingru Li et.al. 2512.23087 translate read null
2025-12-28 Trust Region Masking for Long-Horizon LLM Reinforcement Learning Yingru Li et.al. 2512.23075 translate read null
2025-12-28 Diversity or Precision? A Deep Dive into Next Token Prediction Haoyuan Wu et.al. 2512.22955 translate read null
2025-12-28 APO: Alpha-Divergence Preference Optimization Wang Zixian et.al. 2512.22953 translate read null
2025-12-28 Heterogeneity in Multi-Agent Reinforcement Learning Tianyi Hu et.al. 2512.22941 translate read null
2025-12-28 Sat-EnQ: Satisficing Ensembles of Weak Q-Learners for Reliable and Compute-Efficient Reinforcement Learning Ünver Çiftçi et.al. 2512.22910 translate read null
2025-12-28 SAMP-HDRL: Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management via Hierarchical Deep Reinforcement Learning Xiaotian Ren et.al. 2512.22895 translate read null
2025-12-28 Reinforcement Networks: novel framework for collaborative Multi-Agent Reinforcement Learning tasks Maksim Kryzhanovskiy et.al. 2512.22876 translate read null
2025-12-28 Adaptive Trust Consensus for Blockchain IoT: Comparing RL, DRL, and MARL Against Naive, Collusive, Adaptive, Byzantine, and Sleeper Attacks Soham Padia et.al. 2512.22860 translate read null
2025-12-28 AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning Shihao Cai et.al. 2512.22857 translate read null
2025-12-28 ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum Learning Bangya Liu et.al. 2512.22854 translate read null
2025-12-28 MARPO: A Reflective Policy Optimization for Multi Agent Reinforcement Learning Cuiling Wu et.al. 2512.22832 translate read null
2025-12-28 TEACH: Temporal Variance-Driven Curriculum for Reinforcement Learning Gaurav Chaudhary et.al. 2512.22824 translate read null
2025-12-28 ReDiF: Reinforced Distillation for Few Step Diffusion Amirhossein Tighkhorshid et.al. 2512.22802 translate read null
2025-12-28 Parallel Diffusion Solver via Residual Dirichlet Policy Optimization Ruoyu Wang et.al. 2512.22796 translate read null
2025-12-28 FoldAct: Efficient and Stable Context Folding for Long-Horizon Search Agents Jiaqi Shao et.al. 2512.22733 translate read null
2025-12-27 Cyber Resilience in Next-Generation Networks: Threat Landscape, Theoretical Foundations, and Design Paradigms Junaid Farooq et.al. 2512.22721 translate read null
2025-12-27 Memento 2: Learning by Stateful Reflective Memory Jun Wang et.al. 2512.22716 translate read null
2025-12-27 Optimal Regulation of Nonlinear Input-Affine Systems via an Integral Reinforcement Learning-Based State-Dependent Riccati Equation Approach Arya Rashidinejad Meibodi et.al. 2512.22668 translate read null
2025-12-27 FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution Yidi Liu et.al. 2512.22647 translate read null
2025-12-27 RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure Wei Gao et.al. 2512.22560 translate read null
2025-12-27 AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing Jiacheng Li et.al. 2512.22455 translate read null
2025-12-26 PHANTOM: Physics-Aware Adversarial Attacks against Federated Learning-Coordinated EV Charging Management System Mohammad Zakaria Haider et.al. 2512.22381 translate read null
2025-12-26 Reinforcement Learning for Optimal Stopping in POMDPs with Application to Quickest Change Detection Austin Cooper et.al. 2512.22347 translate read null
2025-12-26 SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents Shaofei Cai et.al. 2512.22322 translate read null
2025-12-26 VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning Yang Ding et.al. 2512.22315 translate read null
2025-12-24 Agentic Software Issue Resolution with Large Language Models: A Survey Zhonghao Jiang et.al. 2512.22256 translate read null
2025-12-23 Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Byung-Kwan Lee et.al. 2512.22238 translate read null
2025-12-23 DiRL: An Efficient Post-Training Framework for Diffusion Language Models Ying Zhu et.al. 2512.22234 translate read link
2025-12-26 Hybrid Deep Reinforcement Learning for Joint Resource Allocation in Multi-Active RIS-Aided Uplink Communications Mohamed Shalma et.al. 2512.22107 translate read null
2025-12-26 Meta-Learning-Based Handover Management in NextG O-RAN Michail Kalntis et.al. 2512.22022 translate read null
2025-12-26 Latency-Optimal Cache-aided Multicast Streaming via Forward-Backward Reinforcement Learning Mohsen Amidzadeh et.al. 2512.21954 translate read null
2025-12-26 SWE-RM: Execution-free Feedback For Software Engineering Agents KaShun Shum et.al. 2512.21919 translate read null
2025-12-26 A Comedy of Estimators: On KL Regularization in RL Training of LLMs Vedant Shah et.al. 2512.21852 translate read null
2025-12-26 Contextual Biasing for LLM-Based ASR with Hotword Retrieval and Reinforcement Learning YuXiang Kong et.al. 2512.21828 translate read null
2025-12-26 Q-A3C2: Quantum Reinforcement Learning with Time-Series Dynamic Clustering for Adaptive ETF Stock Selection Yen-Ku Liu et.al. 2512.21819 translate read null
2025-12-25 Multiconnectivity for SAGIN: Current Trends, Challenges, AI-driven Solutions, and Opportunities Abd Ullah Khan et.al. 2512.21717 translate read null
2025-12-25 Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search Maximilian Weichart et.al. 2512.21648 translate read null
2025-12-25 Jointly Optimal Policies for Remote Estimation of Autoregressive Markov Processes over Time-Correlated Fading Channel Manali Dutta et.al. 2512.21630 translate read null
2025-12-25 Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards Xinyu Tang et.al. 2512.21625 translate read null
2025-12-25 Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations Xin Liu et.al. 2512.21586 translate read null
2025-12-25 Towards Learning-Based Formula 1 Race Strategies Giona Fieni et.al. 2512.21570 translate read null
2025-12-25 Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model Yanhao Li et.al. 2512.21540 translate read null
2025-12-25 Generative Actor Critic Aoyang Qin et.al. 2512.21527 translate read null
2025-12-25 DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO Henglin Liu et.al. 2512.21514 translate read null
2025-12-24 dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning Shirui Chen et.al. 2512.21446 translate read null
2025-12-24 A Survey of Freshness-Aware Wireless Networking with Reinforcement Learning Alimu Alibotaiken et.al. 2512.21412 translate read null
2025-12-24 A Reinforcement Learning Approach to Synthetic Data Generation Natalia Espinosa-Dice et.al. 2512.21395 translate read null
2025-12-24 RoboCade: Gamifying Robot Data Collection Suvir Mirchandani et.al. 2512.21235 translate read null
2025-12-24 MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models Andres M Bran et.al. 2512.21231 translate read null
2025-12-24 Global End-Effector Pose Control of an Underactuated Aerial Manipulator via Reinforcement Learning Shlok Deshmukh et.al. 2512.21085 translate read null
2025-12-24 Dyna-Style Reinforcement Learning Modeling and Control of Non-linear Dynamics Karim Abdelsalam et.al. 2512.21081 translate read null
2025-12-24 LSTM-Based Modeling and Reinforcement Learning Control of a Magnetically Actuated Catheter Arya Rashidinejad Meibodi et.al. 2512.21063 translate read null
2025-12-24 Policy-Conditioned Policies for Multi-Agent Task Solving Yue Lin et.al. 2512.21024 translate read null
2025-12-24 LLM-Empowered Agentic AI for QoE-Aware Network Slicing Management in Industrial IoT Xudong Wang et.al. 2512.20997 translate read null
2025-12-24 Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions Jingyang You et.al. 2512.20974 translate read null
2025-12-24 ReACT-Drug: Reaction-Template Guided Reinforcement Learning for de novo Drug Design R Yadunandan et.al. 2512.20958 translate read null
2025-12-24 One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents Zhaoxi Zhang et.al. 2512.20957 translate read null
2025-12-24 Model-free stochastic linear quadratic control for discrete-time systems with multiplicative and additive noises via semidefinite programming Jing Guo et.al. 2512.20911 translate read null
2025-12-24 Embodied AI-Enhanced IoMT Edge Computing: UAV Trajectory Optimization and Task Offloading with Mobility Prediction Siqi Mu et.al. 2512.20902 translate read null
2025-12-24 The Silent Scholar Problem: A Probabilistic Framework for Breaking Epistemic Asymmetry in LLM Agents Zan-Kai Chong et.al. 2512.20884 translate read null
2025-12-24 Proprioception Enhances Vision Language Model in Generating Captions and Subtask Segmentations for Robot Task Kanata Suzuki et.al. 2512.20876 translate read null
2025-12-24 NVIDIA Nemotron 3: Efficient and Open Intelligence NVIDIA et.al. 2512.20856 translate read null
2025-12-23 QoS- and Physics-Aware Routing in Optical LEO Satellite Networks via Deep Reinforcement Learning Mohammad Taghi Dabiri et.al. 2512.20835 translate read null
2025-12-23 Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions Rashmeet Kaur Nayyar et.al. 2512.20831 translate read null
2025-12-23 Safety Alignment of LMs via Non-cooperative Games Anselm Paulus et.al. 2512.20806 translate read link
2025-12-23 Generalization of RLVR Using Causal Reasoning as a Testbed Brian Lu et.al. 2512.20760 translate read null
2025-12-23 AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent Haipeng Luo et.al. 2512.20745 translate read null
2025-12-23 AI-Driven Green Cognitive Radio Networks for Sustainable 6G Communication Anshul Sharma et.al. 2512.20739 translate read null
2025-12-23 Learning-Enabled Elastic Network Topology for Distributed ISAC Service Provisioning Jie Chen et.al. 2512.20722 translate read null
2025-12-22 Mechanism-Based Intelligence (MBI): Differentiable Incentives for Rational Coordination and Guaranteed Alignment in Multi-Agent Systems Stefano Grassi et.al. 2512.20688 translate read null
2025-12-23 LongVideoAgent: Multi-Agent Reasoning with Long Videos Runtao Liu et.al. 2512.20618 translate read link
2025-12-23 Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning Seijin Kobayashi et.al. 2512.20605 translate read null
2025-12-23 Leveraging High-Fidelity Digital Models and Reinforcement Learning for Mission Engineering: A Case Study of Aerial Firefighting Under Perfect Information İbrahim Oğuz Çetinkaya et.al. 2512.20589 translate read null
2025-12-23 Performative Policy Gradient: Optimality in Performative Reinforcement Learning Debabrota Basu et.al. 2512.20576 translate read null
2025-12-23 LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving Long Nguyen et.al. 2512.20563 translate read link
2025-12-23 Recurrent Off-Policy Deep Reinforcement Learning Doesn’t Have to be Slow Tyler Clark et.al. 2512.20513 translate read null
2025-12-23 Resilient Packet Forwarding: A Reinforcement Learning Approach to Routing in Gaussian Interconnected Networks with Clustered Faults Mohammad Walid Charrwi et.al. 2512.20394 translate read null
2025-12-23 Identifying Appropriately-Sized Services with Deep Reinforcement Learning Syeda Tasnim Fabiha et.al. 2512.20381 translate read null
2025-12-23 TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning Saisai Yang et.al. 2512.20312 translate read null
2025-12-23 Graph-Symbolic Policy Enforcement and Control (G-SPEC): A Neuro-Symbolic Framework for Safe Agentic AI in 5G Autonomous Networks Divya Vijay et.al. 2512.20275 translate read null
2025-12-23 Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning Kausthubh Manda et.al. 2512.20220 translate read null
2025-12-23 Joint Design of Embedded Index Coding and Beamforming for MIMO-based Distributed Computing via Multi-Agent Reinforcement Learning Heekang Song et.al. 2512.20201 translate read null
2025-12-23 Edge-Served Congestion Control for Wireless Multipath Transmission with a Transformer Agent Liang Wang et.al. 2512.20186 translate read null
2025-12-23 FaithLens: Detecting and Explaining Faithfulness Hallucination Shuzheng Si et.al. 2512.20182 translate read link
2025-12-23 RESPOND: Risk-Enhanced Structured Pattern for LLM-driven Online Node-level Decision-making Dan Chen et.al. 2512.20179 translate read null
2025-12-23 Offline Safe Policy Optimization From Heterogeneous Feedback Ze Gong et.al. 2512.20173 translate read null
2025-12-23 Multi-hop Reasoning via Early Knowledge Alignment Yuxin Wang et.al. 2512.20144 translate read link
2025-12-23 MolAct: An Agentic RL Framework for Molecular Editing and Property Optimization Zhuo Yang et.al. 2512.20135 translate read null
2025-12-23 Sample-Efficient Policy Constraint Offline Deep Reinforcement Learning based on Sample Filtering Yuanhao Chen et.al. 2512.20115 translate read null
2025-12-23 ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language Aly Lidayan et.al. 2512.20111 translate read null
2025-12-23 Information-directed sampling for bandits: a primer Annika Hirling et.al. 2512.20096 translate read null
2025-12-23 Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents Yiming Du et.al. 2512.20092 translate read link
2025-12-23 Adaptive Financial Sentiment Analysis for NIFTY 50 via Instruction-Tuned LLMs, RAG and Reinforcement Learning Approaches Chaithra et.al. 2512.20082 translate read null
2025-12-23 Scaling Reinforcement Learning for Content Moderation with Large Language Models Hamed Firooz et.al. 2512.20061 translate read null
2025-12-23 An Optimal Policy for Learning Controllable Dynamics by Exploration Peter N. Loxley et.al. 2512.20053 translate read null
2025-12-23 From Optimization to Learning: Dual-Approach Resource Allocation for Over-the-Air Edge Computing Under Execution Uncertainty Tuo Wu et.al. 2512.20008 translate read null
2025-12-22 Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning Jiayun Wu et.al. 2512.19920 translate read null
2025-12-21 Learning to Design City-scale Transit Routes Bibek Poudel et.al. 2512.19767 translate read null
2025-12-22 Scalably Enhancing the Clinical Validity of a Task Benchmark with Physician Oversight Junze Ye et.al. 2512.19691 translate read null
2025-12-22 Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Yuqiao Tan et.al. 2512.19673 translate read link
2025-12-22 Learning Generalizable Hand-Object Tracking from Synthetic Demonstrations Yinhuai Wang et.al. 2512.19583 translate read null
2025-12-22 LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller Kirill Djebko et.al. 2512.19576 translate read null
2025-12-22 Variational Autoregressive Networks Applied to $φ^4$ Field Theory Systems Moxian Qian et.al. 2512.19575 translate read null
2025-12-22 CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal Yongxin Wang et.al. 2512.19554 translate read null
2025-12-22 LacaDM: A Latent Causal Diffusion Model for Multiobjective Reinforcement Learning Xueming Yan et.al. 2512.19516 translate read null
2025-12-22 A Gauss-Newton-Induced Structure-Exploiting Algorithm for Differentiable Optimal Control Yuankun Chen et.al. 2512.19447 translate read null
2025-12-22 CodeSimpleQA: Scaling Factuality in Code Large Language Models Jian Yang et.al. 2512.19424 translate read null
2025-12-22 Learning General Policies with Policy Gradient Methods Simon Ståhlberg et.al. 2512.19366 translate read null
2025-12-22 Interpretable Hybrid Deep Q-Learning Framework for IoT-Based Food Spoilage Prediction with Synthetic Data Generation and Hardware Validation Isshaan Singh et.al. 2512.19361 translate read null
2025-12-22 First-Order Representation Languages for Goal-Conditioned RL Simon Ståhlberg et.al. 2512.19355 translate read null
2025-12-22 Enhancing PLS of Indoor IRS-VLC Systems for Colluding and Non-Colluding Eavesdroppers Rashid Iqbal et.al. 2512.19339 translate read null
2025-12-22 Learning-Assisted Multi-Operator Variable Neighborhood Search for Urban Cable Routing Wei Liu et.al. 2512.19321 translate read null
2025-12-22 SafeMed-R1: Adversarial Reinforcement Learning for Generalizable and Robust Medical Reasoning in Vision-Language Models A. A. Gde Yogi Pramana et.al. 2512.19317 translate read null
2025-12-22 Bridging Semantics and Geometry: A Decoupled LVLM-SAM Framework for Reasoning Segmentation in Remote Sensing Xu Zhang et.al. 2512.19302 translate read null
2025-12-22 RMLer: Synthesizing Novel Objects across Diverse Categories via Reinforcement Mixing Learning Jun Li et.al. 2512.19300 translate read null
2025-12-22 Are All Data Necessary? Efficient Data Pruning for Large-scale Autonomous Driving Dataset via Trajectory Entropy Maximization Zhaoyang Liu et.al. 2512.19270 translate read null
2025-12-22 WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving Pengxuan Yang et.al. 2512.19133 translate read link
2025-12-22 AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards Zihan Lin et.al. 2512.19126 translate read null
2025-12-22 Explicit and Non-asymptotic Query Complexities of Rank-Based Zeroth-order Algorithm on Stochastic Smooth Functions Haishan Ye et.al. 2512.19104 translate read null
2025-12-22 Tool-Augmented Hybrid Ensemble Reasoning with Distillation for Bilingual Mathematical Problem Solving Peiqing Lu et.al. 2512.19093 translate read null
2025-12-22 CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models Pengyu Chen et.al. 2512.19083 translate read null
2025-12-22 ORPR: An OR-Guided Pretrain-then-Reinforce Learning Model for Inventory Management Lingjie Zhao et.al. 2512.19001 translate read null
2025-12-22 DTCCL: Disengagement-Triggered Contrastive Continual Learning for Autonomous Bus Planners Yanding Yang et.al. 2512.18988 translate read null
2025-12-22 Scaling Online Distributionally Robust Reinforcement Learning: Sample-Efficient Guarantees with General Function Approximation Debamita Ghosh et.al. 2512.18957 translate read null
2025-12-22 Training Multimodal Large Reasoning Models Needs Better Thoughts: A Three-Stage Framework for Long Chain-of-Thought Synthesis and Selection Yizhi Wang et.al. 2512.18956 translate read null
2025-12-22 A Framework for Deploying Learning-based Quadruped Loco-Manipulation Yadong Liu et.al. 2512.18938 translate read null
2025-12-21 QoS-Aware Load Balancing in the Computing Continuum via Multi-Player Bandits Ivan Čilić et.al. 2512.18915 translate read null
2025-12-21 Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations Shaomu Tan et.al. 2512.18906 translate read null
2025-12-21 Structural Reinforcement Learning for Heterogeneous Agent Macroeconomics Yucheng Yang et.al. 2512.18892 translate read null
2025-12-21 CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning Zijun Gao et.al. 2512.18857 translate read null
2025-12-21 InDRiVE: Reward-Free World-Model Pretraining for Autonomous Driving via Latent Disagreement Feeza Khan Khanzada et.al. 2512.18850 translate read null
2025-12-21 From Word to World: Can Large Language Models be Implicit Text-based World Models? Yixia Li et.al. 2512.18832 translate read null
2025-12-21 MaskFocus: Focusing Policy Optimization on Critical Steps for Masked Image Generation Guohui Zhang et.al. 2512.18766 translate read null
2025-12-21 Gaussian-Mixture-Model Q-Functions for Policy Iteration in Reinforcement Learning Minh Vu et.al. 2512.18763 translate read null
2025-12-21 InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search Kaican Li et.al. 2512.18745 translate read null
2025-12-21 A Theoretical Lens for RL-Tuned Language Models via Energy-Based Models Zhiquan Tan et.al. 2512.18730 translate read null
2025-12-21 Demonstration-Guided Continual Reinforcement Learning in Dynamic Environments Xue Yang et.al. 2512.18670 translate read null
2025-12-21 Offline Reinforcement Learning for End-to-End Autonomous Driving Chihiro Noguchi et.al. 2512.18662 translate read null
2025-12-21 LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction Jensen Zhang et.al. 2512.18623 translate read null
2025-12-21 A Multi-agent Text2SQL Framework using Small Language Models and Execution Feedback Thanh Dat Hoang et.al. 2512.18622 translate read null
2025-12-21 Trajectory Planning for UAV-Based Smart Farming Using Imitation-Based Triple Deep Q-Learning Wencan Mao et.al. 2512.18604 translate read null
2025-12-21 SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models Pengcheng Li et.al. 2512.18583 translate read null
2025-12-21 ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning Weijie Zhou et.al. 2512.18571 translate read null
2025-12-21 Vox Deorum: A Hybrid LLM Architecture for 4X / Grand Strategy Game AI – Lessons from Civilization V John Chen et.al. 2512.18564 translate read null
2025-12-21 Distributionally Robust Multi-Agent Reinforcement Learning for Intelligent Traffic Control Shuwei Pei et.al. 2512.18558 translate read null
2025-12-21 Toward Training Superintelligent Software Agents through Self-Play SWE-RL Yuxiang Wei et.al. 2512.18552 translate read null
2025-12-20 Scaling up Stability: Reinforcement Learning for Distributed Control of Networked Systems in the Space of Stabilizing Policies John Cao et.al. 2512.18540 translate read null
2025-12-20 When Robots Say No: The Empathic Ethical Disobedience Benchmark Dmytro Kuzmenko et.al. 2512.18474 translate read null
2025-12-20 On the Universality of Transformer Architectures; How Much Attention Is Enough? Amirreza Abbasi et.al. 2512.18445 translate read null
2025-12-20 Learning Semantic Atomic Skills for Multi-Task Robotic Manipulation Yihang Zhu et.al. 2512.18368 translate read null
2025-12-20 Dynamic Entropy Tuning in Reinforcement Learning Low-Level Quadcopter Control: Stochasticity vs Determinism Youssef Mahran et.al. 2512.18336 translate read null
2025-12-20 Reinforcement Learning Position Control of a Quadrotor Using Soft Actor-Critic (SAC) Youssef Mahran et.al. 2512.18333 translate read null
2025-12-20 Trustworthy and Explainable Deep Reinforcement Learning for Safe and Energy-Efficient Process Control: A Use Case in Industrial Compressed Air Systems Vincent Bezold et.al. 2512.18317 translate read null
2025-12-20 Monitoring Monitorability Melody Y. Guan et.al. 2512.18311 translate read null
2025-12-20 Embedded Safety-Aligned Intelligence via Differentiable Internal Alignment Embeddings Harsh Rathva et.al. 2512.18309 translate read null
2025-12-20 Stable and Efficient Single-Rollout RL for Multimodal Reasoning Rui Liu et.al. 2512.18215 translate read null
2025-12-20 Sophia: A Persistent Agent Framework of Artificial Life Mingyang Sun et.al. 2512.18202 translate read null
2025-12-20 NL2CA: Auto-formalizing Cognitive Decision-Making from Natural Language Using an Unsupervised CriticNL2LTL Framework Zihao Deng et.al. 2512.18189 translate read null
2025-12-20 On Swarm Leader Identification using Probing Policies Stergios E. Bachoumas et.al. 2512.18146 translate read null
2025-12-19 Unifying Causal Reinforcement Learning: Survey, Taxonomy, Algorithms and Applications Cristiano da Costa Cunha et.al. 2512.18135 translate read null
2025-12-19 Towards Autonomous Navigation in Endovascular Interventions Tudor Jianu et.al. 2512.18081 translate read null
2025-12-19 SurgiPose: Estimating Surgical Tool Kinematics from Monocular Video for Surgical Robot Learning Juo-Tung Chen et.al. 2512.18068 translate read null
2025-12-19 ReGal: A First Look at PPO-based Legal AI for Judgment Prediction and Summarization in India Shubham Kumar Nigam et.al. 2512.18014 translate read null
2025-12-19 Adaptive Agents in Spatial Double-Auction Markets: Modeling the Emergence of Industrial Symbiosis Matthieu Mastio et.al. 2512.17979 translate read null
2025-12-19 Distributionally Robust Imitation Learning: Layered Control Architecture for Certifiable Autonomy Aditya Gahlawat et.al. 2512.17899 translate read null
2025-12-19 AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning Ran Gong et.al. 2512.17853 translate read null
2025-12-19 Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes Carlos Vélez García et.al. 2512.17846 translate read null
2025-12-19 NeuRehab: A Reinforcement Learning and Spiking Neural Network-Based Rehab Automation Framework Phani Pavan Kambhampati et.al. 2512.17841 translate read null
2025-12-19 About Time: Model-free Reinforcement Learning with Timed Reward Machines Anirban Majumdar et.al. 2512.17637 translate read null
2025-12-19 Trust-Region Adaptive Policy Optimization Mingyu Su et.al. 2512.17636 translate read null
2025-12-19 SCOPE: Sequential Causal Optimization of Process Interventions Jakob De Moor et.al. 2512.17629 translate read null
2025-12-19 Learning Safe Autonomous Driving Policies Using Predictive Safety Representations Mahesh Keswani et.al. 2512.17586 translate read null
2025-12-19 Kinematics-Aware Diffusion Policy with Consistent 3D Observation and Action Space for Whole-Arm Robotic Manipulation Kangchen Lv et.al. 2512.17568 translate read null
2025-12-19 HydroGym: A Reinforcement Learning Platform for Fluid Dynamics Christian Lagemann et.al. 2512.17534 translate read null
2025-12-19 Assessing Long-Term Electricity Market Design for Ambitious Decarbonization Targets using Multi-Agent Reinforcement Learning Javier Gonzalez-Ruiz et.al. 2512.17444 translate read null
2025-12-19 Xiaomi MiMo-VL-Miloco Technical Report Jiaze Li et.al. 2512.17436 translate read null
2025-12-19 TakeAD: Preference-based Post-optimization for End-to-end Autonomous Driving with Expert Takeover Data Deqing Liu et.al. 2512.17370 translate read null
2025-12-19 Neuro-Symbolic Control with Large Language Models for Language-Guided Spatial Tasks Momina Liaqat Ali et.al. 2512.17321 translate read null
2025-12-19 Large Language Models as Pokémon Battle Agents: Strategic Play and Content Generation Daksh Jain et.al. 2512.17308 translate read null
2025-12-19 Understanding Generalization in Role-Playing Models via Information Theory Yongqi Li et.al. 2512.17270 translate read null
2025-12-19 A Theoretical Analysis of State Similarity Between Markov Decision Processes Zhenyu Tao et.al. 2512.17265 translate read null
2025-12-19 Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience Jiangjie Chen et.al. 2512.17260 translate read null
2025-12-19 Cooperative Energy Scheduling of Multi-Microgrids Based on Risk-Sensitive Reinforcement Learning Rongxiang Zhang et.al. 2512.17246 translate read null
2025-12-19 Learning When to Look: A Disentangled Curriculum for Strategic Perception in Multimodal Reasoning Siqi Yang et.al. 2512.17227 translate read null
2025-12-19 CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency Xiao Liang et.al. 2512.17213 translate read null
2025-12-19 Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs Rujiao Long et.al. 2512.17206 translate read null
2025-12-19 MMRAG-RFT: Two-stage Reinforcement Fine-tuning for Explainable Multi-modal Retrieval-augmented Generation Shengwei Zhao et.al. 2512.17194 translate read null
2025-12-19 MAPPO-LCR: Multi-Agent Proximal Policy Optimization with Local Cooperation Reward in Spatial Public Goods Games Zhaoqilin Yang et.al. 2512.17187 translate read null
2025-12-19 Semantic Co-Speech Gesture Synthesis and Real-Time Control for Humanoid Robots Gang Zhang et.al. 2512.17183 translate read null
2025-12-19 Conservative Bias in Multi-Teacher Learning: Why Agents Prefer Low-Reward Advisors Maher Mesto et.al. 2512.17180 translate read null
2025-12-19 Enhancing AIGC Service Efficiency with Adaptive Multi-Edge Collaboration in A Distributed System Changfu Xu et.al. 2512.17158 translate read null
2025-12-19 Towards Senior-Robot Interaction: Reactive Robot Dog Gestures Chunyang Meng et.al. 2512.17136 translate read null
2025-12-19 Deep Reinforcement Learning-Aided Strategies for Big Data Offloading in Vehicular Networks Talha Akyildiz et.al. 2512.17133 translate read null
2025-12-18 Reinforcement Learning for Self-Improving Agent with Skill Library Jiongxiao Wang et.al. 2512.17102 translate read null
2025-12-18 Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making Toshiaki Hori et.al. 2512.17091 translate read null
2025-12-18 Value Under Ignorance in Universal Artificial Intelligence Cole Wyeth et.al. 2512.17086 translate read null
2025-12-18 UniRel-R1: RL-tuned LLM Reasoning for Knowledge Graph Relational Question Answering Yinxu Tang et.al. 2512.17043 translate read null
2025-12-18 GB-DQN: Gradient Boosted DQN Models for Non-stationary Reinforcement Learning Chang-Hwan Lee et.al. 2512.17034 translate read null
2025-12-18 Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification Qihao Liu et.al. 2512.16921 translate read null
2025-12-18 AdaTooler-V: Adaptive Tool-Use for Images and Videos Chaoyang Wang et.al. 2512.16918 translate read null
2025-12-18 Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning Qihao Liu et.al. 2512.16917 translate read null
2025-12-18 Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward Peter Chen et.al. 2512.16912 translate read null
2025-12-18 Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning Andrew Wagenmaker et.al. 2512.16911 translate read null
2025-12-18 MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning Yuanchen Ju et.al. 2512.16909 translate read null
2025-12-18 AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning Tzu-Han Lin et.al. 2512.16883 translate read null
2025-12-18 A survey of the orienteering problem: model evolution, algorithmic advances, and future directions Songhao Shen et.al. 2512.16865 translate read null
2025-12-18 RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing Tianyuan Qu et.al. 2512.16864 translate read null
2025-12-18 ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning Zihan Zhou et.al. 2512.16861 translate read null
2025-12-18 Meta-RL Induces Exploration in Language Agents Yulun Jiang et.al. 2512.16848 translate read null
2025-12-18 Coordinated Anti-Jamming Resilience in Swarm Networks via Multi-Agent Reinforcement Learning Bahman Abolhassani et.al. 2512.16813 translate read null
2025-12-18 Olaf: Bringing an Animated Character to Life in the Physical World David Müller et.al. 2512.16705 translate read null
2025-12-18 JustRL: Scaling a 1.5B LLM with a Simple RL Recipe Bingxiang He et.al. 2512.16649 translate read null
2025-12-18 Implementing a Sharia Chatbot as a Consultation Medium for Questions About Islam Wisnu Uriawan et.al. 2512.16644 translate read null
2025-12-18 Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game Barna Pásztor et.al. 2512.16626 translate read null
2025-12-18 Non-Asymptotic Global Convergence of PPO-Clip Yin Liu et.al. 2512.16565 translate read null
2025-12-18 ParamExplorer: A framework for exploring parameters in generative art Julien Gachadoat et.al. 2512.16529 translate read null
2025-12-18 Guiding Perception-Reasoning Closer to Human in Blind Image Quality Assessment Yuan Li et.al. 2512.16484 translate read null
2025-12-18 E-SDS: Environment-aware See it, Do it, Sorted - Automated Environment-Aware Reinforcement Learning for Humanoid Locomotion Enis Yalcin et.al. 2512.16446 translate read null
2025-12-18 StarCraft+: Benchmarking Multi-agent Algorithms in Adversary Paradigm Yadong Li et.al. 2512.16444 translate read null
2025-12-18 NDRL: Cotton Irrigation and Nitrogen Application with Nested Dual-Agent Reinforcement Learning Ruifeng Xu et.al. 2512.16408 translate read null
2025-12-18 Hypernetworks That Evolve Themselves Joachim Winther Pedersen et.al. 2512.16406 translate read null
2025-12-18 Machine Learning-based Optimal Control for Colloidal Self-Assembly Andres Lizano-Villalobos et.al. 2512.16402 translate read null
2025-12-18 ManiLong-Shot: Interaction-Aware One-Shot Imitation Learning for Long-Horizon Manipulation Zixuan Chen et.al. 2512.16302 translate read null
2025-12-18 Simultaneous Secrecy and Covert Communications (SSACC) in Mobility-Aware RIS-Aided Networks Yanyu Cheng et.al. 2512.16224 translate read null
2025-12-18 Visual Alignment of Medical Vision-Language Models for Grounded Radiology Report Generation Sarosij Bose et.al. 2512.16201 translate read null
2025-12-18 MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation Pengyu Wang et.al. 2512.16145 translate read null
2025-12-18 INTELLECT-3: Technical Report Prime Intellect Team et.al. 2512.16144 translate read null
2025-12-17 Techno-economic optimization of a heat-pipe microreactor, part I: theory and cost optimization Paul Seurin et.al. 2512.16032 translate read null
2025-12-17 Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models Caner Erden et.al. 2512.15973 translate read null
2025-12-17 Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning Polaris Jhandi et.al. 2512.15943 translate read null
2025-12-17 DSO: Direct Steering Optimization for Bias Mitigation Lucas Monteiro Paes et.al. 2512.15926 translate read null
2025-12-15 Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT) Akhil Sharma et.al. 2512.15790 translate read null
2025-12-17 Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Zhenwen Liang et.al. 2512.15687 translate read null
2025-12-17 Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning Jiaqi Xu et.al. 2512.15662 translate read null
2025-12-17 Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction Mathieu Blondel et.al. 2512.15605 translate read null
2025-12-17 Deep Reinforcement Learning for EH-Enabled Cognitive-IoT Under Jamming Attacks Nadia Abdolkhani et.al. 2512.15558 translate read null
2025-12-17 Autonomous Pressure Control in MuVacAS via Deep Reinforcement Learning and Deep Learning Surrogate Models Guillermo Rodriguez-Llorente et.al. 2512.15521 translate read null
2025-12-17 Double Horizon Model-Based Policy Optimization Akihiro Kubo et.al. 2512.15439 translate read null
2025-12-17 FM-EAC: Feature Model-based Enhanced Actor-Critic for Multi-Task Control in Dynamic Environments Quanxi Zhou et.al. 2512.15430 translate read null
2025-12-17 Can AI Generate more Comprehensive Test Scenarios? Review on Automated Driving Systems Test Scenario Generation Methods Ji Zhou et.al. 2512.15422 translate read null
2025-12-17 EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning Jianfei Ma et.al. 2512.15405 translate read null
2025-12-17 Graph Contextual Reinforcement Learning for Efficient Directed Controller Synthesis Toshihide Ubukata et.al. 2512.15295 translate read null
2025-12-17 Learning-Based Phase Shift Optimization of Liquid Crystal RIS in Dynamic mmWave Networks Le Hao et.al. 2512.15279 translate read null
2025-12-17 Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning Yiliu Sun et.al. 2512.15274 translate read null
2025-12-17 EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence Jiaxu Wan et.al. 2512.15160 translate read null
2025-12-17 Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning Weiqin Wang et.al. 2512.15146 translate read null
2025-12-17 Automatic Reward Shaping from Multi-Objective Human Heuristics Yuqing Xie et.al. 2512.15120 translate read null
2025-12-17 QoS-Aware Hierarchical Reinforcement Learning for Joint Link Selection and Trajectory Optimization in SAGIN-Supported UAV Mobility Management Jiayang Wan et.al. 2512.15119 translate read null
2025-12-17 Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models Jinwu Hu et.al. 2512.15089 translate read null
2025-12-17 Deep Reinforcement Learning for Joint Time and Power Management in SWIPT-EH CIoT Nadia Abdolkhani et.al. 2512.15062 translate read null
2025-12-17 Spectral Representation-based Reinforcement Learning Chenxiao Gao et.al. 2512.15036 translate read null
2025-12-17 ISS Policy: Scalable Diffusion Policy with Implicit Scene Supervision Wenlong Xia et.al. 2512.15020 translate read null
2025-12-17 Multi-Objective Bayesian Optimization of Deep Reinforcement Learning for Environmental, Social, and Governance (ESG) Financial Portfolio Management E. C. Garrido-Merchán et.al. 2512.14992 translate read null
2025-12-17 Adaptive Partitioning and Learning for Stochastic Control of Diffusion Processes Hanqing Jin et.al. 2512.14991 translate read null
2025-12-16 Puzzle Curriculum GRPO for Vision-Centric Reasoning Ahmadreza Jeddi et.al. 2512.14944 translate read null
2025-12-16 Imitation Learning for Multi-turn LM Agents via On-policy Expert Corrections Niklas Lauffer et.al. 2512.14895 translate read null
2025-12-16 Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse Jingwei Chen et.al. 2512.14879 translate read null
2025-12-16 TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs Jun Zhang et.al. 2512.14698 translate read link
2025-12-16 CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives Zihan Wang et.al. 2512.14696 translate read link
2025-12-16 Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes Alessandro Trapasso et.al. 2512.14617 translate read null
2025-12-16 RecGPT-V2 Technical Report Chao Yi et.al. 2512.14503 translate read null
2025-12-16 Hybrid Cognitive IoT with Cooperative Caching and SWIPT-EH: A Hierarchical Reinforcement Learning Framework Nadia Abdolkhani et.al. 2512.14488 translate read null
2025-12-16 Context-Picker: Dynamic context selection using multi-stage reinforcement learning Siyuan Zhu et.al. 2512.14465 translate read null
2025-12-16 A data-physics hybrid generative model for patient-specific post-stroke motor rehabilitation using wearable sensor data Yanning Dai et.al. 2512.14329 translate read null
2025-12-16 Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations Xudong Han et.al. 2512.14321 translate read null
2025-12-16 A Threshold-Triggered Deep Q-Network-Based Framework for Self-Healing in Autonomic Software-Defined IIoT-Edge Networks Agrippina Mwangi et.al. 2512.14297 translate read null
2025-12-16 GLM-TTS Technical Report Jiayan Cui et.al. 2512.14291 translate read link
2025-12-16 Understanding and Improving Hyperbolic Deep Reinforcement Learning Timo Klein et.al. 2512.14202 translate read link
2025-12-16 Incentivizing Tool-augmented Thinking with Images for Medical Image Analysis Yankai Jiang et.al. 2512.14157 translate read null
2025-12-16 A First-Order Logic-Based Alternative to Reward Models in RLHF Chunjin Jian et.al. 2512.14100 translate read null
2025-12-16 RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees Junjie Ma et.al. 2512.14069 translate read null
2025-12-16 Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning Amir M. Soufi Enayati et.al. 2512.14057 translate read null
2025-12-16 OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving Zhenguo Zhang et.al. 2512.14044 translate read null
2025-12-16 Sample-Efficient Robot Skill Learning for Construction Tasks: Benchmarking Hierarchical Reinforcement Learning and Vision-Language-Action VLA Model Zhaofeng Hu et.al. 2512.14031 translate read null
2025-12-16 Cooperative Caching Towards Efficient Spectrum Utilization in Cognitive-IoT Networks Nadia Abdolkhani et.al. 2512.14029 translate read null
2025-12-16 Hierarchical Deep Reinforcement Learning for Robust Access in Cognitive IoT Networks under Smart Jamming Attacks Nadia Abdolkhani et.al. 2512.14013 translate read null
2025-12-15 Adaptive digital twins for predictive decision-making: Online Bayesian learning of transition dynamics Eugenio Varetti et.al. 2512.13919 translate read null
2025-12-15 Group-Theoretic Reinforcement Learning of Dynamical Decoupling Sequences Charles Marrder et.al. 2512.13890 translate read null
2025-12-15 SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning Jitesh Jain et.al. 2512.13874 translate read link
2025-12-15 Explainable reinforcement learning from human feedback to improve alignment Shicheng Liu et.al. 2512.13837 translate read null
2025-12-13 RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing Yuhan Tang et.al. 2512.13727 translate read null
2025-12-13 Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce Sayak Chakrabarty et.al. 2512.13726 translate read null
2025-12-15 AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection Junwen Miao et.al. 2512.13671 translate read null
2025-12-15 A Scientific Reasoning Model for Organic Synthesis Procedure Generation Guoqing Liu et.al. 2512.13668 translate read null
2025-12-15 Advancing Machine Learning Optimization of Chiral Photonic Metasurface: Comparative Study of Neural Network and Genetic Algorithm Approaches Davide Filippozzi et.al. 2512.13656 translate read null
2025-12-15 MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning Haoyu Fu et.al. 2512.13636 translate read null
2025-12-15 SCR2-ST: Combine Single Cell with Spatial Transcriptomics for Efficient Active Sampling via Reinforcement Learning Junchao Zhu et.al. 2512.13635 translate read null
2025-12-15 Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models Boxin Wang et.al. 2512.13607 translate read null
2025-12-15 Image Diffusion Preview with Consistency Solver Fu-Yun Wang et.al. 2512.13592 translate read link
2025-12-15 MMhops-R1: Multimodal Multi-hop Reasoning Tao Zhang et.al. 2512.13573 translate read null
2025-12-15 Memory in the Age of AI Agents Yuyang Hu et.al. 2512.13564 translate read link
2025-12-15 How Low Can You Go? The Data-Light SE Challenge Kishan Kumar Ganguly et.al. 2512.13524 translate read null
2025-12-15 Reinforcement Learning based 6-DoF Maneuvers for Microgravity Intravehicular Docking: A Simulation Study with Int-Ball2 in ISS-JEM Aman Arora et.al. 2512.13514 translate read null
2025-12-15 MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph Linjie Mu et.al. 2512.13510 translate read null
2025-12-15 Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model Heyi Chen et.al. 2512.13507 translate read null
2025-12-15 Differentiable Evolutionary Reinforcement Learning Sitao Cheng et.al. 2512.13399 translate read null
2025-12-15 QoS-Aware State-Augmented Learnable Framework for 5G NR-U/Wi-Fi Coexistence: Impact of Parameter Selection and Enhanced Collision Resolution Mohammad Reza Fasihi et.al. 2512.13393 translate read null
2025-12-15 Universal Dexterous Functional Grasping via Demonstration-Editing Reinforcement Learning Chuan Mao et.al. 2512.13380 translate read null
2025-12-15 Fast Policy Learning for 6-DOF Position Control of Underwater Vehicles Sümer Tunçay et.al. 2512.13359 translate read null
2025-12-15 Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3) Zeyad Gamal et.al. 2512.13356 translate read null
2025-12-15 Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration Hao Fu et.al. 2512.13293 translate read null
2025-12-15 AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning Jiaru Zou et.al. 2512.13278 translate read null
2025-12-15 SPARS: A Reinforcement Learning-Enabled Simulator for Power Management in HPC Job Scheduling Muhammad Alfian Amrizal et.al. 2512.13268 translate read null
2025-12-15 Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving Hyunki Seong et.al. 2512.13262 translate read null
2025-12-15 Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection Zihui Zhao et.al. 2512.13240 translate read null
2025-12-15 SACn: Soft Actor-Critic with n-step Returns Jakub Łyskawa et.al. 2512.13165 translate read null
2025-12-15 SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning Emre Can Acikgoz et.al. 2512.13159 translate read null
2025-12-15 TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning Shenzhi Yang et.al. 2512.13106 translate read null
2025-12-15 Toward Self-Healing Networks-on-Chip: RL-Driven Routing in 2D Torus Architectures Mohammad Walid Charrwi et.al. 2512.13096 translate read null
2025-12-15 ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning Feng Zhang et.al. 2512.13095 translate read null
2025-12-15 Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation Xiang Li et.al. 2512.13094 translate read null
2025-12-15 PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations Mingqi Yuan et.al. 2512.13093 translate read null
2025-12-15 M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization Bizhe Bai et.al. 2512.13070 translate read null
2025-12-15 Deep Q-Learning-Based Intelligent Scheduling for ETL Optimization in Heterogeneous Data Environments Kangning Gao et.al. 2512.13060 translate read null
2025-12-15 GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training Tong Wei et.al. 2512.13043 translate read null
2025-12-15 What Happens Next? Next Scene Prediction with a Unified Video Model Xinjie Li et.al. 2512.13015 translate read null
2025-12-15 Learning Terrain Aware Bipedal Locomotion via Reduced Dimensional Perceptual Representations Guillermo A. Castillo et.al. 2512.12993 translate read null
2025-12-15 Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning Amin Jalal Aghdasian et.al. 2512.12987 translate read null
2025-12-15 QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management Weizhou Shen et.al. 2512.12967 translate read link
2025-12-15 Interpretable Hypothesis-Driven Trading: A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals Gagan Deep et.al. 2512.12924 translate read null
2025-12-15 LLM-based Personalized Portfolio Recommender: Integrating Large Language Models and Reinforcement Learning for Intelligent Investment Strategy Optimization Bangyu Li et.al. 2512.12922 translate read null
2025-12-15 Meta-GPT: Decoding the Metasurface Genome with Generative Artificial Intelligence David Dang et.al. 2512.12888 translate read null
2025-12-14 Information-Consistent Language Model Recommendations through Group Relative Policy Optimization Sonal Prabhune et.al. 2512.12858 translate read null
2025-12-14 MPC-Guided Safe Reinforcement Learning and Lipschitz-Based Filtering for Structured Nonlinear Systems Patrick Kostelac et.al. 2512.12855 translate read null
2025-12-14 Distributed Reinforcement Learning using Local Smart Meter Data for Voltage Regulation in Distribution Networks Dong Liu et.al. 2512.12803 translate read null
2025-12-14 CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning Xuanzhang Liu et.al. 2512.12716 translate read null
2025-12-14 Self-Motivated Growing Neural Network for Adaptive Architecture via Local Structural Plasticity Yiyang Jia et.al. 2512.12713 translate read null
2025-12-14 Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning Enhong Mu et.al. 2512.12706 translate read null
2025-12-14 Reassessing the Role of Supervised Fine-Tuning: An Empirical Study in VLM Reasoning Yongcan Yu et.al. 2512.12690 translate read null
2025-12-14 CogDoc: Towards Unified thinking in Documents Qixin Xu et.al. 2512.12658 translate read null
2025-12-14 Coupled Variational Reinforcement Learning for Language Model General Reasoning Xueru Wen et.al. 2512.12576 translate read null
2025-12-14 World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents Yesid Fonseca et.al. 2512.12548 translate read null
2025-12-13 Adaptive Detector-Verifier Framework for Zero-Shot Polyp Detection in Open-World Settings Shengkai Xu et.al. 2512.12492 translate read null
2025-12-13 More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models Hoang Anh Just et.al. 2512.12487 translate read null
2025-12-13 HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments Yongjun He et.al. 2512.12476 translate read null
2025-12-13 Sim2Real Reinforcement Learning for Soccer skills Jonathan Spraggett et.al. 2512.12437 translate read link
2025-12-13 Deep Hedging with Reinforcement Learning: A Practical Framework for Option Risk Management Travon Lucius et.al. 2512.12420 translate read null
2025-12-13 ElasticVR: Elastic Task Computing in Multi-User Multi-Connectivity Wireless Virtual Reality (VR) Systems Babak Badnava et.al. 2512.12366 translate read null
2025-12-13 The Role of AI in Modern Penetration Testing J. Alexander Curtis et.al. 2512.12326 translate read null
2025-12-13 A Conflict-Aware Resource Management Framework for the Computing Continuum Vlad Popescu-Vifor et.al. 2512.12299 translate read null
2025-12-13 Moment and Highlight Detection via MLLM Frame Segmentation I Putu Andika Bagas Jiwanta et.al. 2512.12246 translate read null
2025-12-13 Learning to Get Up Across Morphologies: Zero-Shot Recovery with a Unified Humanoid Policy Jonathan Spraggett et.al. 2512.12230 translate read link
2025-12-12 Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning Vittorio Giammarino et.al. 2512.12046 translate read null
2025-12-12 Policy Gradient Algorithms for Age-of-Information Cost Minimization José-Ramón Vidal et.al. 2512.11990 translate read null
2025-12-12 Learning to Extract Context for Context-Aware LLM Inference Minseon Kim et.al. 2512.11986 translate read null
2025-12-12 A Review of Learning-Based Motion Planning: Toward a Data-Driven Optimal Control Approach Jia Hu et.al. 2512.11944 translate read null
2025-12-12 Evolutionary Reinforcement Learning based AI tutor for Socratic Interdisciplinary Instruction Mei Jiang et.al. 2512.11930 translate read null
2025-12-12 AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis Junjie Ye et.al. 2512.11797 translate read null
2025-12-12 Agile Flight Emerges from Multi-Agent Competitive Racing Vineet Pasumarti et.al. 2512.11781 translate read null
2025-12-12 SUMFORU: An LLM-Based Review Summarization Framework for Personalized Purchase Decision Support Yuming Feng et.al. 2512.11755 translate read null
2025-12-12 UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations Tingyu Yuan et.al. 2512.11609 translate read null
2025-12-12 DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry Zhenyang Cai et.al. 2512.11558 translate read null
2025-12-12 Rethinking Expert Trajectory Utilization in LLM Post-training Bowen Ding et.al. 2512.11470 translate read link
2025-12-12 Three methods, one problem: Classical and AI approaches to no-three-in-line Pranav Ramanathan et.al. 2512.11469 translate read null
2025-12-12 Towards Trustworthy Multi-Turn LLM Agents via Behavioral Guidance Gonca Gürsun et.al. 2512.11421 translate read null
2025-12-12 Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization Yifan Niu et.al. 2512.11391 translate read null
2025-12-12 Symmetry-Aware Steering of Equivariant Diffusion Policies: Benefits and Limits Minwoo Park et.al. 2512.11345 translate read null
2025-12-12 DAPO: Design Structure-Aware Pass Ordering in High-Level Synthesis with Graph Contrastive and Reinforcement Learning Jinming Ge et.al. 2512.11342 translate read null
2025-12-12 RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training Tianyuan Wu et.al. 2512.11306 translate read null
2025-12-12 When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents Mrinal Rawat et.al. 2512.11277 translate read null
2025-12-12 A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation Hong Je-Gal et.al. 2512.11270 translate read null
2025-12-12 Multi-Objective Reinforcement Learning for Large-Scale Mixed Traffic Control Iftekharul Islam et.al. 2512.11247 translate read null
2025-12-11 Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning Wei Duan et.al. 2512.11179 translate read null
2025-12-11 Learning Category-level Last-meter Navigation from RGB Demonstrations of a Single-instance Tzu-Hsien Lee et.al. 2512.11173 translate read null
2025-12-11 CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound Akhil S Anand et.al. 2512.11169 translate read null
2025-12-11 Benchmarking RL-Enhanced Spatial Indices Against Traditional, Advanced, and Learned Counterparts Guanli Liu et.al. 2512.11161 translate read null
2025-12-11 In-Context Multi-Objective Optimization Xinyu Zhang et.al. 2512.11114 translate read null
2025-12-11 Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation Yiwen Tang et.al. 2512.10949 translate read link
2025-12-11 Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit Zamirddine Mari et.al. 2512.10934 translate read null
2025-12-11 Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation Zamirddine Mari et.al. 2512.10925 translate read null
2025-12-11 Reinforcement Learning in Financial Decision Making: A Systematic Review of Performance, Challenges, and Implementation Strategies Mohammad Rezoanul Hoque et.al. 2512.10913 translate read null
2025-12-11 Iterative Compositional Data Generation for Robot Control Anh-Quan Pham et.al. 2512.10891 translate read null
2025-12-11 Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments Atahan Cilan et.al. 2512.10835 translate read null
2025-12-11 OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification Zijian Wu et.al. 2512.10756 translate read null
2025-12-11 Learning to Split: A Reinforcement-Learning-Guided Splitting Heuristic for Neural Network Verification Maya Swisa et.al. 2512.10747 translate read null
2025-12-11 Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Songyang Gao et.al. 2512.10739 translate read null
2025-12-11 How to Brake? Ethical Emergency Braking with Deep Reinforcement Learning Jianbo Wang et.al. 2512.10698 translate read null
2025-12-11 Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning Benjamin Gundersen et.al. 2512.10691 translate read null
2025-12-11 AgriGPT-Omni: A Unified Speech-Vision-Text Framework for Multilingual Agricultural Intelligence Bo Yang et.al. 2512.10624 translate read null
2025-12-11 Multi-Objective Reward and Preference Optimization: Theory and Algorithms Akhil Agnihotri et.al. 2512.10601 translate read null
2025-12-11 Grounding Everything in Tokens for Multimodal Large Language Models Xiangxuan Ren et.al. 2512.10554 translate read null
2025-12-11 Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning Haiteng Zhao et.al. 2512.10534 translate read null
2025-12-11 Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning Chihyeon Song et.al. 2512.10510 translate read null
2025-12-11 UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning Jiaxi Wu et.al. 2512.10492 translate read null
2025-12-11 Shot and Architecture Adaptive Subspace Variational Quantum Eigensolver for Microwave Simulation Zhixiu Han et.al. 2512.10458 translate read null
2025-12-11 HypeR Adaptivity: Joint $hr$-Adaptive Meshing via Hypergraph Multi-Agent Deep Reinforcement Learning Niccolò Grillo et.al. 2512.10439 translate read null
2025-12-11 Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention Yang Yu et.al. 2512.10414 translate read null
2025-12-11 A Privacy-Preserving Cloud Architecture for Distributed Machine Learning at Scale Vinoth Punniyamoorthy et.al. 2512.10341 translate read null
2025-12-11 Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters Shruti Dongare et.al. 2512.10271 translate read null
2025-12-11 Multi-dimensional Preference Alignment by Conditioning Reward Itself Jiho Jang et.al. 2512.10237 translate read null
2025-12-11 Task-Oriented Grasping Using Reinforcement Learning with a Contextual Reward Machine Hui Li et.al. 2512.10235 translate read null
2025-12-11 Latent Chain-of-Thought World Modeling for End-to-End Driving Shuhan Tan et.al. 2512.10226 translate read null
2025-12-11 An exploration for higher efficiency in multi objective optimisation with reinforcement learning Mehmet Emin Aydin et.al. 2512.10208 translate read null
2025-12-10 Explicit Control Barrier Function-based Safety Filters and their Resource-Aware Computation Pol Mestres et.al. 2512.10118 translate read null
2025-12-10 Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation Steven Caro et.al. 2512.10099 translate read null
2025-12-10 SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation Jongmin Lee et.al. 2512.10042 translate read null
2025-12-10 Diffusion Is Your Friend in Show, Suggest and Tell Jia Cheng Hu et.al. 2512.10038 translate read null
2025-12-10 Latent Action World Models for Control with Unlabeled Trajectories Marvin Alles et.al. 2512.10016 translate read null
2025-12-10 TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0 Jinyu Chen et.al. 2512.09961 translate read null
2025-12-10 STACHE: Local Black-Box Explanations for Reinforcement Learning Policies Andrew Elashkin et.al. 2512.09909 translate read null
2025-12-10 FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning Khurram Khalil et.al. 2512.09872 translate read null
2025-12-10 Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation Yuyang Li et.al. 2512.09851 translate read link
2025-12-10 ChronusOmni: Improving Time Awareness of Omni Large Language Models Yijing Chen et.al. 2512.09841 translate read null
2025-12-10 RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning Khurram Khalil et.al. 2512.09829 translate read null
2025-12-10 Prefrontal scaling of reward prediction error readout gates reinforcement-derived adaptive behavior in primates Tian Sang et.al. 2512.09761 translate read null
2025-12-10 MOA: Multi-Objective Alignment for Role-Playing Agents Chonghua Liao et.al. 2512.09756 translate read null
2025-12-10 Flexible Reconfigurable Intelligent Surface-Aided Covert Communications in UAV Networks Chong Huang et.al. 2512.09714 translate read null
2025-12-10 Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning Kaichen He et.al. 2512.09706 translate read null
2025-12-10 Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies Mika Persson et.al. 2512.09682 translate read null
2025-12-10 d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models Leyi Pan et.al. 2512.09675 translate read null
2025-12-10 SynthPix: A lightspeed PIV images generator Antonio Terpin et.al. 2512.09664 translate read null
2025-12-10 Mastering Diverse, Unknown, and Cluttered Tracks for Robust Vision-Based Drone Racing Feng Yu et.al. 2512.09571 translate read null
2025-12-10 Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search Junkai Ji et.al. 2512.09566 translate read null
2025-12-10 REASAN: Learning Reactive Safe Navigation for Legged Robots Qihao Yuan et.al. 2512.09537 translate read null
2025-12-10 RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning Yucan Guo et.al. 2512.09487 translate read null
2025-12-10 Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation Jialin Ying et.al. 2512.09410 translate read null
2025-12-10 CFLight: Enhancing Safety with Traffic Signal Control through Counterfactual Learning Mingyuan Li et.al. 2512.09368 translate read null
2025-12-10 COVLM-RL: Critical Object-Oriented Reasoning for Autonomous Driving Using VLM-Guided Reinforcement Learning Lin Li et.al. 2512.09349 translate read null
2025-12-10 Tyche: A Hybrid Computation Framework of Illumination Pattern for Satellite Beam Hopping Ziheng Yang et.al. 2512.09312 translate read null
2025-12-10 One-Shot Real-World Demonstration Synthesis for Scalable Bimanual Manipulation Huayi Zhou et.al. 2512.09297 translate read null
2025-12-10 Electric Arc Furnaces Scheduling under Electricity Price Volatility with Reinforcement Learning Ruonan Pi et.al. 2512.09293 translate read null
2025-12-10 Exploratory Mean-Variance with Jumps: An Equilibrium Approach Yuling Max Chen et.al. 2512.09224 translate read null
2025-12-09 Learning Unmasking Policies for Diffusion Language Models Metod Jazbec et.al. 2512.09106 translate read null
2025-12-09 Masked Generative Policy for Robotic Control Lipeng Zhuang et.al. 2512.09101 translate read null
2025-12-09 No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers Damiano Marsili et.al. 2512.08889 translate read null
2025-12-09 IPPO Learns the Game, Not the Team: A Study on Generalization in Heterogeneous Agent Teams Ryan LeRoy et.al. 2512.08877 translate read null
2025-12-09 Reinforcement Learning From State and Temporal Differences Lex Weaver et.al. 2512.08855 translate read null
2025-12-09 Optimal navigation in two-dimensional regular and turbulent flows Vladimir Parfenyev et.al. 2512.08766 translate read null
2025-12-09 Learning and Editing Universal Graph Prompt Tuning via Reinforcement Learning Jinfeng Xu et.al. 2512.08763 translate read null
2025-12-09 Direct transfer of optimized controllers to similar systems using dimensionless MPC Josip Kir Hromatko et.al. 2512.08667 translate read null
2025-12-09 Sim2Swim: Zero-Shot Velocity Control for Agile AUV Maneuvering in 3 Minutes Lauritz Rismark Fosso et.al. 2512.08656 translate read null
2025-12-09 Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis Orit Davidovich et.al. 2512.08601 translate read null
2025-12-09 Mind to Hand: Purposeful Robotic Control via Embodied Reasoning Peijun Tang et.al. 2512.08580 translate read null
2025-12-09 Thinking with Images via Self-Calling Agent Wenxi Yang et.al. 2512.08511 translate read link
2025-12-09 Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning Junnan Qiu et.al. 2512.08485 translate read null
2025-12-09 Using reinforcement learning to probe the role of feedback in skill acquisition Antonio Terpin et.al. 2512.08463 translate read null
2025-12-09 From Accuracy to Impact: The Impact-Driven AI Framework (IDAIF) for Aligning Engineering Architecture with Theory of Change Yong-Woon Kim et.al. 2512.08449 translate read null
2025-12-09 Turning Threat into Opportunity: DRL-Powered Anti-Jamming via Energy Harvesting in UAV-Disrupted Channels Ngoc-Tan Nguyen et.al. 2512.08351 translate read null
2025-12-09 Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks Thai Duong Nguyen et.al. 2512.08341 translate read null
2025-12-09 Collaborative Intelligence for UAV-Satellite Network Slicing: Towards a Joint QoS-Energy-Fairness MADRL Optimization Thanh-Dao Nguyen et.al. 2512.08322 translate read null
2025-12-09 rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection Sijia Chen et.al. 2512.08300 translate read null
2025-12-09 Empowerment Gain and Causal Model Construction: Children and adults are sensitive to controllability and variability in their causal interventions Eunice Yiu et.al. 2512.08230 translate read null
2025-12-09 Primal-dual policy learning for mean-field stochastic LQR problem Xiushan Jiang et.al. 2512.08205 translate read null
2025-12-09 TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models Zheng Ding et.al. 2512.08153 translate read null
2025-12-09 Robust Agents in Open-Ended Worlds Mikayel Samvelyan et.al. 2512.08139 translate read null
2025-12-09 Universal Adversarial Suffixes for Language Models Using Reinforcement Learning with Calibrated Reward Sampriti Soor et.al. 2512.08131 translate read null
2025-12-08 Scalable Offline Model-Based RL with Action Chunks Kwanyoung Park et.al. 2512.08108 translate read null
2025-12-08 Training LLMs for Honesty via Confessions Manas Joglekar et.al. 2512.08093 translate read null
2025-12-08 An Introduction to Deep Reinforcement and Imitation Learning Pedro Santana et.al. 2512.08052 translate read null
2025-12-08 F2: Offline Reinforcement Learning for Hamiltonian Simulation via Free-Fermionic Subroutine Compilation Ethan Decker et.al. 2512.08023 translate read null
2025-12-08 Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care Aryaman Bansal et.al. 2512.08012 translate read null
2025-12-08 VLD: Visual Language Goal Distance for Reinforcement Learning Navigation Lazar Milikic et.al. 2512.07976 translate read null
2025-12-08 Agentic Artificial Intelligence for Ethical Cybersecurity in Uganda: A Reinforcement Learning Framework for Threat Detection in Resource-Constrained Environments Ibrahim Adabara et.al. 2512.07909 translate read null
2025-12-08 An Adaptive Multi-Layered Honeynet Architecture for Threat Behavior Analysis via Deep Learning Lukas Johannes Möller et.al. 2512.07827 translate read null
2025-12-08 On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Charlie Zhang et.al. 2512.07783 translate read null
2025-12-08 RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models Xiqiao Xiong et.al. 2512.07761 translate read null
2025-12-08 DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving Jialv Zou et.al. 2512.07745 translate read null
2025-12-08 SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery Meng Cao et.al. 2512.07733 translate read null
2025-12-08 Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE Anxiang Zeng et.al. 2512.07710 translate read null
2025-12-08 Delay-Aware Diffusion Policy: Bridging the Observation-Execution Gap in Dynamic Tasks Aileen Liao et.al. 2512.07697 translate read null
2025-12-08 The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds Shahar Lutati et.al. 2512.07631 translate read null
2025-12-08 Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement Yongsheng Lian et.al. 2512.07611 translate read null
2025-12-08 Understanding Individual Decision-Making in Multi-Agent Reinforcement Learning: A Dynamical Systems Approach James Rudd-Jones et.al. 2512.07588 translate read null
2025-12-08 ReLaX: Reasoning with Latent Exploration for Large Reasoning Models Shimin Zhang et.al. 2512.07558 translate read null
2025-12-08 Model-Based Reinforcement Learning Under Confounding Nishanth Venkatesh et.al. 2512.07528 translate read null
2025-12-08 How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations JV Roig et.al. 2512.07497 translate read null
2025-12-08 Enhancing Agentic RL with Progressive Reward Shaping and Value-based Sampling Policy Optimization Zhuoran Zhuang et.al. 2512.07478 translate read null
2025-12-08 Gait-Adaptive Perceptive Humanoid Locomotion with Real-Time Under-Base Terrain Reconstruction Haolin Song et.al. 2512.07464 translate read null
2025-12-08 Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning Tong Wu et.al. 2512.07461 translate read null
2025-12-08 From Show Programmes to Data: Designing a Workflow to Make Performing Arts Ephemera Accessible Through Language Models Clarisse Bardiot et.al. 2512.07452 translate read null
2025-12-08 KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models Chenwei Shi et.al. 2512.07437 translate read null
2025-12-08 Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models Haidong Kang et.al. 2512.07419 translate read null
2025-12-08 Adaptive Tuning of Parameterized Traffic Controllers via Multi-Agent Reinforcement Learning Giray Önür et.al. 2512.07417 translate read null
2025-12-08 Training Language Models to Use Prolog as a Tool Niklas Mellgren et.al. 2512.07407 translate read null
2025-12-08 Control and Reinforcement Learning through the Lens of Optimization: An Algorithmic Perspective Tolga Ok et.al. 2512.07377 translate read null
2025-12-08 ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning Byungju Kim et.al. 2512.07371 translate read null
2025-12-08 Multi-Rigid-Body Approximation of Human Hands with Application to Digital Twin Bin Zhao et.al. 2512.07359 translate read null
2025-12-08 PrivORL: Differentially Private Synthetic Dataset for Offline Reinforcement Learning Chen Gong et.al. 2512.07342 translate read null
2025-12-08 RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language Translation Zhi Rao et.al. 2512.07273 translate read null
2025-12-08 SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks Florian Tretter et.al. 2512.07266 translate read null
2025-12-08 Benchmarking Humanoid Imitation Learning with Motion Difficulty Zhaorui Meng et.al. 2512.07248 translate read null
2025-12-08 Towards Robust Protective Perturbation against DeepFake Face Swapping Hengyang Yao et.al. 2512.07228 translate read null
2025-12-08 Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation Zhaoyang Liu et.al. 2512.07212 translate read null
2025-12-08 MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning Xuhui Zheng et.al. 2512.07203 translate read null
2025-12-08 Less is More: Non-uniform Road Segments are Efficient for Bus Arrival Prediction Zhen Huang et.al. 2512.07200 translate read null
2025-12-08 Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models Fenghua Weng et.al. 2512.07141 translate read null
2025-12-08 TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning Zebin Xing et.al. 2512.07135 translate read null
2025-12-08 Surrogate compliance modeling enables reinforcement learned locomotion gaits for soft robots Jue Wang et.al. 2512.07114 translate read null
2025-12-07 A Hetero-Associative Sequential Memory Model Utilizing Neuromorphic Signals: Validated on a Mobile Manipulator Runcong Wang et.al. 2512.07032 translate read null
2025-12-07 Utilizing Multi-Agent Reinforcement Learning with Encoder-Decoder Architecture Agents to Identify Optimal Resection Location in Glioblastoma Multiforme Patients Krishna Arun et.al. 2512.06990 translate read null
2025-12-07 LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding Yu Yu et.al. 2512.06982 translate read null
2025-12-07 Neuro-Vesicles: Neuromodulation Should Be a Dynamical System, Not a Tensor Decoration Zilin Li et.al. 2512.06966 translate read null
2025-12-07 Statistical analysis of Inverse Entropy-regularized Reinforcement Learning Denis Belomestny et.al. 2512.06956 translate read null
2025-12-07 Deep Reinforcement Learning for Phishing Detection with Transformer-Based Semantic Features Aseer Al Faisal et.al. 2512.06925 translate read null
2025-12-07 Parent-Guided Semantic Reward Model (PGSRM): Embedding-Based Reward Functions for Reinforcement Learning of Transformer Language Models Alexandr Plashchinsky et.al. 2512.06920 translate read null
2025-12-07 Know your Trajectory – Trustworthy Reinforcement Learning deployment through Importance-Based Trajectory Analysis Clifford F et.al. 2512.06917 translate read null
2025-12-07 Khalasi: Energy-Efficient Navigation for Surface Vehicles in Vortical Flow Fields Rushiraj Gadhvi et.al. 2512.06912 translate read null
2025-12-07 An Analysis of Large Language Models for Simulating User Responses in Surveys Ziyun Yu et.al. 2512.06874 translate read null
2025-12-07 JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models Ce Chi et.al. 2512.06859 translate read null
2025-12-07 Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning Tingyu Li et.al. 2512.06835 translate read null
2025-12-07 MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning Yueqian Wang et.al. 2512.06810 translate read null
2025-12-07 PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance Jifar Wakuma Ayana et.al. 2512.06747 translate read null
2025-12-07 The Role of Entropy in Visual Grounding: Analysis and Optimization Shuo Li et.al. 2512.06726 translate read null
2025-12-07 RunawayEvil: Jailbreaking the Image-to-Video Generative Models Songping Wang et.al. 2512.06674 translate read null
2025-12-07 LightSearcher: Efficient DeepSearch via Experiential Memory Hengzhi Lan et.al. 2512.06653 translate read null
2025-12-07 Analyzing Collision Rates in Large-Scale Mixed Traffic Control via Multi-Agent Reinforcement Learning Muyang Fan et.al. 2512.06645 translate read null
2025-12-07 Learning to Hedge Swaptions Zaniar Ahmadi et.al. 2512.06639 translate read null
2025-12-07 MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment Ruicheng Zhang et.al. 2512.06628 translate read null
2025-12-07 A New Trajectory-Oriented Approach to Enhancing Comprehensive Crowd Navigation Performance Xinyu Zhou et.al. 2512.06608 translate read null
2025-12-06 MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding Yuhao Su et.al. 2512.06581 translate read null
2025-12-06 Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input Zifan Xu et.al. 2512.06571 translate read null
2025-12-06 A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation Xiaocan Li et.al. 2512.06547 translate read null
2025-12-06 Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning Ming Chen et.al. 2512.06533 translate read null
2025-12-06 Entropy-Controlled Intrinsic Motivation Reinforcement Learning for Quadruped Robot Locomotion in Complex Terrains Wanru Gong et.al. 2512.06486 translate read null
2025-12-06 Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control Nathan P. Lawrence et.al. 2512.06471 translate read null
2025-12-06 RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs Runlong Zhou et.al. 2512.06392 translate read null
2025-12-06 VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning Yuji Wang et.al. 2512.06373 translate read null
2025-12-06 LLM-Upgraded Graph Reinforcement Learning for Carbon-Aware Job Scheduling in Smart Manufacturing Zhiying Yang et.al. 2512.06351 translate read null
2025-12-06 ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models Jiahao Li et.al. 2512.06328 translate read null
2025-12-06 A Hybrid Physics-Based and Reinforcement Learning Framework for Electric Vehicle Charging Time Prediction Praharshitha Aryasomayajula et.al. 2512.06287 translate read null
2025-12-06 Networked Restless Multi-Arm Bandits with Reinforcement Learning Hanmo Zhang et.al. 2512.06274 translate read null
2025-12-06 Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models Chen Yang et.al. 2512.06266 translate read null
2025-12-06 Learning Without Time-Based Embodiment Resets in Soft-Actor Critic Homayoon Farrahi et.al. 2512.06252 translate read null
2025-12-06 Learning When to Switch: Adaptive Policy Selection via Reinforcement Learning Chris Tava et.al. 2512.06250 translate read null
2025-12-06 Auto-exploration for online reinforcement learning Caleb Ju et.al. 2512.06244 translate read null
2025-12-06 AI Application in Anti-Money Laundering for Sustainable and Transparent Financial Systems Chuanhao Nie et.al. 2512.06240 translate read null
2025-12-05 Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration Huizhen Yu et.al. 2512.06218 translate read null
2025-12-05 Quantifying Memory Use in Reinforcement Learning with Temporal Range Rodney Lafuente-Mercado et.al. 2512.06204 translate read null
2025-12-05 JaxWildfire: A GPU-Accelerated Wildfire Simulator for Reinforcement Learning Ufuk Çakır et.al. 2512.06102 translate read null
2025-12-05 Empathy by Design: Aligning Large Language Models for Healthcare Dialogue Emre Umucu et.al. 2512.06097 translate read null
2025-12-05 Comparative Analysis of Autonomous and Systematic Control Strategies for Hole-Doped Hubbard Clusters: Reinforcement Learning versus Physics-Guided Design Shivanshu Dwivedi et.al. 2512.06095 translate read null
2025-12-05 Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring Mohanakrishnan Hariharan et.al. 2512.06060 translate read null
2025-12-05 EditThinker: Unlocking Iterative Reasoning for Any Image Editor Hongyu Li et.al. 2512.05965 translate read null
2025-12-05 Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity Germán Kruszewski et.al. 2512.05962 translate read null
2025-12-05 Correspondence-Oriented Imitation Learning: Flexible Visuomotor Control with 3D Conditioning Yunhao Cao et.al. 2512.05953 translate read null
2025-12-05 Variational Quantum Rainbow Deep Q-Network for Optimizing Resource Allocation Problem Truong Thanh Hung Nguyen et.al. 2512.05946 translate read null
2025-12-05 Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation Fabian Konstantinidis et.al. 2512.05812 translate read null
2025-12-05 Real-time Remote Tracking and Autonomous Planning for Whale Rendezvous using Robots Sushmita Bhattacharya et.al. 2512.05808 translate read null
2025-12-05 A Fast Anti-Jamming Cognitive Radar Deployment Algorithm Based on Reinforcement Learning Wencheng Cai et.al. 2512.05753 translate read null
2025-12-05 A High-Order Immersed Boundary Method for Fluid-Structure Interaction Problems Yingjie Xia et.al. 2512.05733 translate read null
2025-12-05 Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning Ali Krayani et.al. 2512.05711 translate read null
2025-12-05 LA-RL: Language Action-guided Reinforcement Learning with Safety Guarantees for Autonomous Highway Driving Yiming Shu et.al. 2512.05686 translate read null
2025-12-05 MedTutor-R1: Socratic Personalized Medical Teaching with Multi-Agent Simulation Zhitao He et.al. 2512.05671 translate read null
2025-12-05 Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning Zhenpeng Su et.al. 2512.05591 translate read null
2025-12-05 Distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning Pengcheng Dai et.al. 2512.05447 translate read null
2025-12-05 ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction Jiangtong Tan et.al. 2512.05422 translate read null
2025-12-05 State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning Yuxiang Liu et.al. 2512.05335 translate read null
2025-12-04 Enhancing Deep Deterministic Policy Gradients on Continuous Control Tasks with Decoupled Prioritized Experience Replay Mehmet Efe Lorasdagi et.al. 2512.05320 translate read null
2025-12-04 Bridging Interpretability and Optimization: Provably Attribution-Weighted Actor-Critic in Reproducing Kernel Hilbert Spaces Na Li et.al. 2512.05291 translate read null
2025-12-04 Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem Ali Al Housseini et.al. 2512.05207 translate read null
2025-12-04 ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Shengyuan Ding et.al. 2512.05111 translate read null
2025-12-04 STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models Feng Xu et.al. 2512.05107 translate read null
2025-12-04 Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning Purbesh Mitra et.al. 2512.05105 translate read link
