Reinforcement Learning - 2025-12
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-12-31 | Dichotomous Diffusion Policy Optimization | Ruiming Liang et.al. | 2601.00898 | translate | read | null |
| 2025-12-31 | VideoCuRL: Video Curriculum Reinforcement Learning with Orthogonal Difficulty Decomposition | Hongbo Jin et.al. | 2601.00887 | translate | read | null |
| 2025-12-30 | SmartFlow Reinforcement Learning and Agentic AI for Bike-Sharing Optimisation | Aditya Sreevatsa K et.al. | 2601.00868 | translate | read | null |
| 2025-12-25 | Horizon Reduction as Information Loss in Offline Reinforcement Learning | Uday Kumar Nidadala et.al. | 2601.00831 | translate | read | null |
| 2025-12-31 | GRL-SNAM: Geometric Reinforcement Learning with Path Differential Hamiltonians for Simultaneous Navigation and Mapping in Unknown Environments | Aditya Sai Ellendula et.al. | 2601.00116 | translate | read | null |
| 2025-12-31 | Adaptive Pinching Antenna Optimization via Meta-Learning for Physical-Layer Security in Dynamic Wireless Networks | Khalid T. Musri et.al. | 2601.00115 | translate | read | null |
| 2025-12-31 | Universal Adaptive Constraint Propagation: Scaling Structured Inference for Large Language Models via Meta-Reinforcement Learning | Ibne Farabi Shihab et.al. | 2601.00095 | translate | read | null |
| 2025-12-31 | Reinforcement learning with timed constraints for robotics motion planning | Zhaoan Wang et.al. | 2601.00087 | translate | read | null |
| 2025-12-31 | Coordinated Humanoid Manipulation with Choice Policies | Haozhi Qi et.al. | 2512.25072 | translate | read | null |
| 2025-12-31 | Scaling Open-Ended Reasoning to Predict the Future | Nikhil Chandak et.al. | 2512.25070 | translate | read | null |
| 2025-12-31 | Many Minds from One Model: Bayesian Transformers for Population Intelligence | Diji Yang et.al. | 2512.25063 | translate | read | null |
| 2025-12-31 | ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning | Timo Kaufmann et.al. | 2512.25023 | translate | read | null |
| 2025-12-31 | MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control | Yongwei Zhang et.al. | 2512.24955 | translate | read | null |
| 2025-12-31 | Iterative Deployment Improves Planning Skills in LLMs | Augusto B. Corrêa et.al. | 2512.24940 | translate | read | null |
| 2025-12-31 | Throughput Optimization in UAV-Mounted RIS under Jittering and Imperfect CSI via DRL | Anas K. Saeed et.al. | 2512.24773 | translate | read | null |
| 2025-12-31 | Sparse Offline Reinforcement Learning with Corruption Robustness | Nam Phuong Tran et.al. | 2512.24768 | translate | read | null |
| 2025-12-31 | Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow | Karthik Dharmarajan et.al. | 2512.24766 | translate | read | null |
| 2025-12-31 | Control of Microrobots with Reinforcement Learning under On-Device Compute Constraints | Yichen Liu et.al. | 2512.24740 | translate | read | null |
| 2025-12-31 | Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting | Kai Ye et.al. | 2512.24702 | translate | read | null |
| 2025-12-31 | Dynamic Policy Learning for Legged Robot with Simplified Model Pretraining and Model Homotopy Transfer | Dongyun Kang et.al. | 2512.24698 | translate | read | null |
| 2025-12-31 | Hierarchical Online Optimization Approach for IRS-enabled Low-altitude MEC in Vehicular Networks | Yixian Wang et.al. | 2512.24659 | translate | read | null |
| 2025-12-31 | RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence | Chengkai Hou et.al. | 2512.24653 | translate | read | null |
| 2025-12-31 | Hybrid Motion Planning with Deep Reinforcement Learning for Mobile Robot Navigation | Yury Kolomeytsev et.al. | 2512.24651 | translate | read | null |
| 2025-12-31 | Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization | Yuchen Shi et.al. | 2512.24615 | translate | read | null |
| 2025-12-31 | Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization | Dong Qiu et.al. | 2512.24609 | translate | read | null |
| 2025-12-31 | Robust Bayesian Dynamic Programming for On-policy Risk-sensitive Reinforcement Learning | Shanyu Han et.al. | 2512.24580 | translate | read | null |
| 2025-12-31 | From Perception to Punchline: Empowering VLM with the Art of In-the-wild Meme | Xueyan Li et.al. | 2512.24555 | translate | read | null |
| 2025-12-31 | From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning | Amir Tahmasbi et.al. | 2512.24532 | translate | read | null |
| 2025-12-30 | Networked Markets, Fragmented Data: Adaptive Graph Learning for Customer Risk Analytics and Policy Design | Lecheng Zheng et.al. | 2512.24487 | translate | read | null |
| 2025-12-30 | Adaptive Learning Guided by Bias-Noise-Alignment Diagnostics | Akash Samanta et.al. | 2512.24445 | translate | read | null |
| 2025-12-30 | Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models | Lars van der Laan et.al. | 2512.24407 | translate | read | null |
| 2025-12-30 | SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning | Yong Xien Chng et.al. | 2512.24330 | translate | read | null |
| 2025-12-30 | MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems | Wan Jiang et.al. | 2512.24325 | translate | read | null |
| 2025-12-30 | Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking | Meiqi Chen et.al. | 2512.24297 | translate | read | null |
| 2025-12-30 | Real-world Reinforcement Learning from Suboptimal Interventions | Yinuo Zhao et.al. | 2512.24288 | translate | read | null |
| 2025-12-30 | DRL-TH: Jointly Utilizing Temporal Graph Attention and Hierarchical Fusion for UGV Navigation in Crowded Environments | Ruitong Li et.al. | 2512.24284 | translate | read | null |
| 2025-12-30 | Deep Reinforcement Learning for Solving the Fleet Size and Mix Vehicle Routing Problem | Pengfu Wan et.al. | 2512.24251 | translate | read | null |
| 2025-12-30 | Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning | Chubin Chen et.al. | 2512.24146 | translate | read | null |
| 2025-12-30 | GARDO: Reinforcing Diffusion Models without Reward Hacking | Haoran He et.al. | 2512.24138 | translate | read | null |
| 2025-12-30 | HY-MT1.5 Technical Report | Mao Zheng et.al. | 2512.24092 | translate | read | null |
| 2025-12-30 | How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns | Haoyue Bai et.al. | 2512.24063 | translate | read | null |
| 2025-12-30 | Policy Mirror Descent with Temporal Difference Learning: Sample Complexity under Online Markov Data | Wenye Li et.al. | 2512.24056 | translate | read | null |
| 2025-12-30 | ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment | Natchaya Temyingyong et.al. | 2512.24040 | translate | read | null |
| 2025-12-30 | Reinforced Diffusion: Learning to Push the Limits of Anisotropic Diffusion for Image Denoising | Xinran Qin et.al. | 2512.24035 | translate | read | null |
| 2025-12-30 | RSAgent: Learning to Reason and Act for Text-Guided Segmentation via Multi-Turn Tool Invocations | Xingqi He et.al. | 2512.24023 | translate | read | null |
| 2025-12-30 | CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards | Zhiming Lin et.al. | 2512.23971 | translate | read | null |
| 2025-12-30 | Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration | Lars van der Laan et.al. | 2512.23927 | translate | read | null |
| 2025-12-30 | Constraint Breeds Generalization: Temporal Dynamics as an Inductive Bias | Xia Chen et.al. | 2512.23916 | translate | read | null |
| 2025-12-29 | Beamforming for Massive MIMO Aerial Communications: A Robust and Scalable DRL Approach | Hesam Khoshkbari et.al. | 2512.23902 | translate | read | null |
| 2025-12-29 | Distributed Beamforming in Massive MIMO Communication for a Constellation of Airborne Platform Stations | Hesam Khoshkbari et.al. | 2512.23900 | translate | read | null |
| 2025-12-29 | Max-Entropy Reinforcement Learning with Flow Matching and A Case Study on LQR | Yuyang Zhang et.al. | 2512.23870 | translate | read | null |
| 2025-12-29 | Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting | Lars van der Laan et.al. | 2512.23805 | translate | read | null |
| 2025-12-29 | Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark | Manu et.al. | 2512.23779 | translate | read | null |
| 2025-12-29 | FineFT: Efficient and Risk-Aware Ensemble Reinforcement Learning for Futures Trading | Molei Qin et.al. | 2512.23773 | translate | read | null |
| 2025-12-29 | Safety-Biased Policy Optimisation: Towards Hard-Constrained Reinforcement Learning via Trust Regions | Ankit Kanwar et.al. | 2512.23770 | translate | read | null |
| 2025-12-28 | Audited Skill-Graph Self-Improvement for Agentic LLMs via Verifiable Rewards, Experience Synthesis, and Continual Memory | Ken Huang et.al. | 2512.23760 | translate | read | null |
| 2025-12-29 | Training AI Co-Scientists Using Rubric Rewards | Shashwat Goel et.al. | 2512.23707 | translate | read | null |
| 2025-12-29 | Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation | Huajie Tan et.al. | 2512.23703 | translate | read | null |
| 2025-12-29 | Bellman Calibration for V-Learning in Offline Reinforcement Learning | Lars van der Laan et.al. | 2512.23694 | translate | read | null |
| 2025-12-29 | Le Cam Distortion: A Decision-Theoretic Framework for Robust Transfer Learning | Deniz Akdemir et.al. | 2512.23617 | translate | read | null |
| 2025-12-29 | ProGuard: Towards Proactive Multimodal Safeguard | Shaohan Yu et.al. | 2512.23573 | translate | read | null |
| 2025-12-29 | ThinkGen: Generalized Thinking for Visual Generation | Siyu Jiao et.al. | 2512.23568 | translate | read | null |
| 2025-12-29 | A NEAT Approach to Evolving Neural-Network-based Optimization of Chiral Photonic Metasurfaces: Application of a Neuro-Evolution Pipeline | Davide Filippozzi et.al. | 2512.23558 | translate | read | null |
| 2025-12-29 | PathFound: An Agentic Multimodal Model Activating Evidence-seeking Pathological Diagnosis | Shengyi Hua et.al. | 2512.23545 | translate | read | null |
| 2025-12-29 | Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning | Zuoyou Jiang et.al. | 2512.23515 | translate | read | null |
| 2025-12-29 | Hierarchical Decision Mamba Meets Agentic AI: A Novel Approach for RAN Slicing in 6G | Md Arafat Habib et.al. | 2512.23502 | translate | read | null |
| 2025-12-29 | Agentic AI for Autonomous Defense in Software Supply Chain Security: Beyond Provenance to Vulnerability Mitigation | Toqeer Ali Syed et.al. | 2512.23480 | translate | read | null |
| 2025-12-29 | HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation | Yuxin Wen et.al. | 2512.23464 | translate | read | null |
| 2025-12-29 | Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance | Zhuo Li et.al. | 2512.23461 | translate | read | null |
| 2025-12-29 | Replay Failures as Successes: Sample-Efficient Reinforcement Learning for Instruction Following | Kongcheng Zhang et.al. | 2512.23457 | translate | read | null |
| 2025-12-29 | The World Is Bigger! A Computationally-Embedded Perspective on the Big World Hypothesis | Alex Lewandowski et.al. | 2512.23419 | translate | read | null |
| 2025-12-29 | AGRO-SQL: Agentic Group-Relative Optimization with High-Fidelity Data Synthesis | Cehua Yang et.al. | 2512.23366 | translate | read | null |
| 2025-12-29 | CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation | Ke Niu et.al. | 2512.23333 | translate | read | null |
| 2025-12-29 | Splitwise: Collaborative Edge-Cloud Inference for LLMs via Lyapunov-Assisted DRL | Abolfazl Younesi et.al. | 2512.23310 | translate | read | null |
| 2025-12-29 | Agentic AI-Enhanced Semantic Communications: Foundations, Architecture, and Applications | Haixiao Gao et.al. | 2512.23294 | translate | read | null |
| 2025-12-29 | Interpretable Safety Alignment via SAE-Constructed Low-Rank Subspace Adaptation | Dianyun Wang et.al. | 2512.23260 | translate | read | null |
| 2025-12-29 | ViLaCD-R1: A Vision-Language Framework for Semantic Change Detection in Remote Sensing | Xingwei Ma et.al. | 2512.23244 | translate | read | null |
| 2025-12-29 | A Human-Oriented Cooperative Driving Approach: Integrating Driving Intention, State, and Conflict | Qin Wang et.al. | 2512.23220 | translate | read | null |
| 2025-12-29 | Evaluating Parameter Efficient Methods for RLVR | Qingyu Yin et.al. | 2512.23165 | translate | read | null |
| 2025-12-29 | SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling | Yufan He et.al. | 2512.23162 | translate | read | null |
| 2025-12-28 | A Note on Hybrid Online Reinforcement and Imitation Learning for LLMs: Formulations and Algorithms | Yingru Li et.al. | 2512.23097 | translate | read | null |
| 2025-12-28 | Benchmark Success, Clinical Failure: When Reinforcement Learning Optimizes for Benchmarks, Not Patients | Armin Berger et.al. | 2512.23090 | translate | read | null |
| 2025-12-28 | Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning | Yingru Li et.al. | 2512.23087 | translate | read | null |
| 2025-12-28 | Trust Region Masking for Long-Horizon LLM Reinforcement Learning | Yingru Li et.al. | 2512.23075 | translate | read | null |
| 2025-12-28 | Diversity or Precision? A Deep Dive into Next Token Prediction | Haoyuan Wu et.al. | 2512.22955 | translate | read | null |
| 2025-12-28 | APO: Alpha-Divergence Preference Optimization | Wang Zixian et.al. | 2512.22953 | translate | read | null |
| 2025-12-28 | Heterogeneity in Multi-Agent Reinforcement Learning | Tianyi Hu et.al. | 2512.22941 | translate | read | null |
| 2025-12-28 | Sat-EnQ: Satisficing Ensembles of Weak Q-Learners for Reliable and Compute-Efficient Reinforcement Learning | Ünver Çiftçi et.al. | 2512.22910 | translate | read | null |
| 2025-12-28 | SAMP-HDRL: Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management via Hierarchical Deep Reinforcement Learning | Xiaotian Ren et.al. | 2512.22895 | translate | read | null |
| 2025-12-28 | Reinforcement Networks: novel framework for collaborative Multi-Agent Reinforcement Learning tasks | Maksim Kryzhanovskiy et.al. | 2512.22876 | translate | read | null |
| 2025-12-28 | Adaptive Trust Consensus for Blockchain IoT: Comparing RL, DRL, and MARL Against Naive, Collusive, Adaptive, Byzantine, and Sleeper Attacks | Soham Padia et.al. | 2512.22860 | translate | read | null |
| 2025-12-28 | AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning | Shihao Cai et.al. | 2512.22857 | translate | read | null |
| 2025-12-28 | ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum Learning | Bangya Liu et.al. | 2512.22854 | translate | read | null |
| 2025-12-28 | MARPO: A Reflective Policy Optimization for Multi Agent Reinforcement Learning | Cuiling Wu et.al. | 2512.22832 | translate | read | null |
| 2025-12-28 | TEACH: Temporal Variance-Driven Curriculum for Reinforcement Learning | Gaurav Chaudhary et.al. | 2512.22824 | translate | read | null |
| 2025-12-28 | ReDiF: Reinforced Distillation for Few Step Diffusion | Amirhossein Tighkhorshid et.al. | 2512.22802 | translate | read | null |
| 2025-12-28 | Parallel Diffusion Solver via Residual Dirichlet Policy Optimization | Ruoyu Wang et.al. | 2512.22796 | translate | read | null |
| 2025-12-28 | FoldAct: Efficient and Stable Context Folding for Long-Horizon Search Agents | Jiaqi Shao et.al. | 2512.22733 | translate | read | null |
| 2025-12-27 | Cyber Resilience in Next-Generation Networks: Threat Landscape, Theoretical Foundations, and Design Paradigms | Junaid Farooq et.al. | 2512.22721 | translate | read | null |
| 2025-12-27 | Memento 2: Learning by Stateful Reflective Memory | Jun Wang et.al. | 2512.22716 | translate | read | null |
| 2025-12-27 | Optimal Regulation of Nonlinear Input-Affine Systems via an Integral Reinforcement Learning-Based State-Dependent Riccati Equation Approach | Arya Rashidinejad Meibodi et.al. | 2512.22668 | translate | read | null |
| 2025-12-27 | FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution | Yidi Liu et.al. | 2512.22647 | translate | read | null |
| 2025-12-27 | RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure | Wei Gao et.al. | 2512.22560 | translate | read | null |
| 2025-12-27 | AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing | Jiacheng Li et.al. | 2512.22455 | translate | read | null |
| 2025-12-26 | PHANTOM: Physics-Aware Adversarial Attacks against Federated Learning-Coordinated EV Charging Management System | Mohammad Zakaria Haider et.al. | 2512.22381 | translate | read | null |
| 2025-12-26 | Reinforcement Learning for Optimal Stopping in POMDPs with Application to Quickest Change Detection | Austin Cooper et.al. | 2512.22347 | translate | read | null |
| 2025-12-26 | SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents | Shaofei Cai et.al. | 2512.22322 | translate | read | null |
| 2025-12-26 | VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning | Yang Ding et.al. | 2512.22315 | translate | read | null |
| 2025-12-24 | Agentic Software Issue Resolution with Large Language Models: A Survey | Zhonghao Jiang et.al. | 2512.22256 | translate | read | null |
| 2025-12-23 | Masking Teacher and Reinforcing Student for Distilling Vision-Language Models | Byung-Kwan Lee et.al. | 2512.22238 | translate | read | null |
| 2025-12-23 | DiRL: An Efficient Post-Training Framework for Diffusion Language Models | Ying Zhu et.al. | 2512.22234 | translate | read | link |
| 2025-12-26 | Hybrid Deep Reinforcement Learning for Joint Resource Allocation in Multi-Active RIS-Aided Uplink Communications | Mohamed Shalma et.al. | 2512.22107 | translate | read | null |
| 2025-12-26 | Meta-Learning-Based Handover Management in NextG O-RAN | Michail Kalntis et.al. | 2512.22022 | translate | read | null |
| 2025-12-26 | Latency-Optimal Cache-aided Multicast Streaming via Forward-Backward Reinforcement Learning | Mohsen Amidzadeh et.al. | 2512.21954 | translate | read | null |
| 2025-12-26 | SWE-RM: Execution-free Feedback For Software Engineering Agents | KaShun Shum et.al. | 2512.21919 | translate | read | null |
| 2025-12-26 | A Comedy of Estimators: On KL Regularization in RL Training of LLMs | Vedant Shah et.al. | 2512.21852 | translate | read | null |
| 2025-12-26 | Contextual Biasing for LLM-Based ASR with Hotword Retrieval and Reinforcement Learning | YuXiang Kong et.al. | 2512.21828 | translate | read | null |
| 2025-12-26 | Q-A3C2: Quantum Reinforcement Learning with Time-Series Dynamic Clustering for Adaptive ETF Stock Selection | Yen-Ku Liu et.al. | 2512.21819 | translate | read | null |
| 2025-12-25 | Multiconnectivity for SAGIN: Current Trends, Challenges, AI-driven Solutions, and Opportunities | Abd Ullah Khan et.al. | 2512.21717 | translate | read | null |
| 2025-12-25 | Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search | Maximilian Weichart et.al. | 2512.21648 | translate | read | null |
| 2025-12-25 | Jointly Optimal Policies for Remote Estimation of Autoregressive Markov Processes over Time-Correlated Fading Channel | Manali Dutta et.al. | 2512.21630 | translate | read | null |
| 2025-12-25 | Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards | Xinyu Tang et.al. | 2512.21625 | translate | read | null |
| 2025-12-25 | Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations | Xin Liu et.al. | 2512.21586 | translate | read | null |
| 2025-12-25 | Towards Learning-Based Formula 1 Race Strategies | Giona Fieni et.al. | 2512.21570 | translate | read | null |
| 2025-12-25 | Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model | Yanhao Li et.al. | 2512.21540 | translate | read | null |
| 2025-12-25 | Generative Actor Critic | Aoyang Qin et.al. | 2512.21527 | translate | read | null |
| 2025-12-25 | DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO | Henglin Liu et.al. | 2512.21514 | translate | read | null |
| 2025-12-24 | dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning | Shirui Chen et.al. | 2512.21446 | translate | read | null |
| 2025-12-24 | A Survey of Freshness-Aware Wireless Networking with Reinforcement Learning | Alimu Alibotaiken et.al. | 2512.21412 | translate | read | null |
| 2025-12-24 | A Reinforcement Learning Approach to Synthetic Data Generation | Natalia Espinosa-Dice et.al. | 2512.21395 | translate | read | null |
| 2025-12-24 | RoboCade: Gamifying Robot Data Collection | Suvir Mirchandani et.al. | 2512.21235 | translate | read | null |
| 2025-12-24 | MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models | Andres M Bran et.al. | 2512.21231 | translate | read | null |
| 2025-12-24 | Global End-Effector Pose Control of an Underactuated Aerial Manipulator via Reinforcement Learning | Shlok Deshmukh et.al. | 2512.21085 | translate | read | null |
| 2025-12-24 | Dyna-Style Reinforcement Learning Modeling and Control of Non-linear Dynamics | Karim Abdelsalam et.al. | 2512.21081 | translate | read | null |
| 2025-12-24 | LSTM-Based Modeling and Reinforcement Learning Control of a Magnetically Actuated Catheter | Arya Rashidinejad Meibodi et.al. | 2512.21063 | translate | read | null |
| 2025-12-24 | Policy-Conditioned Policies for Multi-Agent Task Solving | Yue Lin et.al. | 2512.21024 | translate | read | null |
| 2025-12-24 | LLM-Empowered Agentic AI for QoE-Aware Network Slicing Management in Industrial IoT | Xudong Wang et.al. | 2512.20997 | translate | read | null |
| 2025-12-24 | Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions | Jingyang You et.al. | 2512.20974 | translate | read | null |
| 2025-12-24 | ReACT-Drug: Reaction-Template Guided Reinforcement Learning for de novo Drug Design | R Yadunandan et.al. | 2512.20958 | translate | read | null |
| 2025-12-24 | One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents | Zhaoxi Zhang et.al. | 2512.20957 | translate | read | null |
| 2025-12-24 | Model-free stochastic linear quadratic control for discrete-time systems with multiplicative and additive noises via semidefinite programming | Jing Guo et.al. | 2512.20911 | translate | read | null |
| 2025-12-24 | Embodied AI-Enhanced IoMT Edge Computing: UAV Trajectory Optimization and Task Offloading with Mobility Prediction | Siqi Mu et.al. | 2512.20902 | translate | read | null |
| 2025-12-24 | The Silent Scholar Problem: A Probabilistic Framework for Breaking Epistemic Asymmetry in LLM Agents | Zan-Kai Chong et.al. | 2512.20884 | translate | read | null |
| 2025-12-24 | Proprioception Enhances Vision Language Model in Generating Captions and Subtask Segmentations for Robot Task | Kanata Suzuki et.al. | 2512.20876 | translate | read | null |
| 2025-12-24 | NVIDIA Nemotron 3: Efficient and Open Intelligence | NVIDIA et.al. | 2512.20856 | translate | read | null |
| 2025-12-23 | QoS- and Physics-Aware Routing in Optical LEO Satellite Networks via Deep Reinforcement Learning | Mohammad Taghi Dabiri et.al. | 2512.20835 | translate | read | null |
| 2025-12-23 | Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions | Rashmeet Kaur Nayyar et.al. | 2512.20831 | translate | read | null |
| 2025-12-23 | Safety Alignment of LMs via Non-cooperative Games | Anselm Paulus et.al. | 2512.20806 | translate | read | link |
| 2025-12-23 | Generalization of RLVR Using Causal Reasoning as a Testbed | Brian Lu et.al. | 2512.20760 | translate | read | null |
| 2025-12-23 | AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent | Haipeng Luo et.al. | 2512.20745 | translate | read | null |
| 2025-12-23 | AI-Driven Green Cognitive Radio Networks for Sustainable 6G Communication | Anshul Sharma et.al. | 2512.20739 | translate | read | null |
| 2025-12-23 | Learning-Enabled Elastic Network Topology for Distributed ISAC Service Provisioning | Jie Chen et.al. | 2512.20722 | translate | read | null |
| 2025-12-22 | Mechanism-Based Intelligence (MBI): Differentiable Incentives for Rational Coordination and Guaranteed Alignment in Multi-Agent Systems | Stefano Grassi et.al. | 2512.20688 | translate | read | null |
| 2025-12-23 | LongVideoAgent: Multi-Agent Reasoning with Long Videos | Runtao Liu et.al. | 2512.20618 | translate | read | link |
| 2025-12-23 | Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning | Seijin Kobayashi et.al. | 2512.20605 | translate | read | null |
| 2025-12-23 | Leveraging High-Fidelity Digital Models and Reinforcement Learning for Mission Engineering: A Case Study of Aerial Firefighting Under Perfect Information | İbrahim Oğuz Çetinkaya et.al. | 2512.20589 | translate | read | null |
| 2025-12-23 | Performative Policy Gradient: Optimality in Performative Reinforcement Learning | Debabrota Basu et.al. | 2512.20576 | translate | read | null |
| 2025-12-23 | LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving | Long Nguyen et.al. | 2512.20563 | translate | read | link |
| 2025-12-23 | Recurrent Off-Policy Deep Reinforcement Learning Doesn’t Have to be Slow | Tyler Clark et.al. | 2512.20513 | translate | read | null |
| 2025-12-23 | Resilient Packet Forwarding: A Reinforcement Learning Approach to Routing in Gaussian Interconnected Networks with Clustered Faults | Mohammad Walid Charrwi et.al. | 2512.20394 | translate | read | null |
| 2025-12-23 | Identifying Appropriately-Sized Services with Deep Reinforcement Learning | Syeda Tasnim Fabiha et.al. | 2512.20381 | translate | read | null |
| 2025-12-23 | TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning | Saisai Yang et.al. | 2512.20312 | translate | read | null |
| 2025-12-23 | Graph-Symbolic Policy Enforcement and Control (G-SPEC): A Neuro-Symbolic Framework for Safe Agentic AI in 5G Autonomous Networks | Divya Vijay et.al. | 2512.20275 | translate | read | null |
| 2025-12-23 | Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning | Kausthubh Manda et.al. | 2512.20220 | translate | read | null |
| 2025-12-23 | Joint Design of Embedded Index Coding and Beamforming for MIMO-based Distributed Computing via Multi-Agent Reinforcement Learning | Heekang Song et.al. | 2512.20201 | translate | read | null |
| 2025-12-23 | Edge-Served Congestion Control for Wireless Multipath Transmission with a Transformer Agent | Liang Wang et.al. | 2512.20186 | translate | read | null |
| 2025-12-23 | FaithLens: Detecting and Explaining Faithfulness Hallucination | Shuzheng Si et.al. | 2512.20182 | translate | read | link |
| 2025-12-23 | RESPOND: Risk-Enhanced Structured Pattern for LLM-driven Online Node-level Decision-making | Dan Chen et.al. | 2512.20179 | translate | read | null |
| 2025-12-23 | Offline Safe Policy Optimization From Heterogeneous Feedback | Ze Gong et.al. | 2512.20173 | translate | read | null |
| 2025-12-23 | Multi-hop Reasoning via Early Knowledge Alignment | Yuxin Wang et.al. | 2512.20144 | translate | read | link |
| 2025-12-23 | MolAct: An Agentic RL Framework for Molecular Editing and Property Optimization | Zhuo Yang et.al. | 2512.20135 | translate | read | null |
| 2025-12-23 | Sample-Efficient Policy Constraint Offline Deep Reinforcement Learning based on Sample Filtering | Yuanhao Chen et.al. | 2512.20115 | translate | read | null |
| 2025-12-23 | ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language | Aly Lidayan et.al. | 2512.20111 | translate | read | null |
| 2025-12-23 | Information-directed sampling for bandits: a primer | Annika Hirling et.al. | 2512.20096 | translate | read | null |
| 2025-12-23 | Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents | Yiming Du et.al. | 2512.20092 | translate | read | link |
| 2025-12-23 | Adaptive Financial Sentiment Analysis for NIFTY 50 via Instruction-Tuned LLMs, RAG and Reinforcement Learning Approaches | Chaithra et.al. | 2512.20082 | translate | read | null |
| 2025-12-23 | Scaling Reinforcement Learning for Content Moderation with Large Language Models | Hamed Firooz et.al. | 2512.20061 | translate | read | null |
| 2025-12-23 | An Optimal Policy for Learning Controllable Dynamics by Exploration | Peter N. Loxley et.al. | 2512.20053 | translate | read | null |
| 2025-12-23 | From Optimization to Learning: Dual-Approach Resource Allocation for Over-the-Air Edge Computing Under Execution Uncertainty | Tuo Wu et.al. | 2512.20008 | translate | read | null |
| 2025-12-22 | Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning | Jiayun Wu et.al. | 2512.19920 | translate | read | null |
| 2025-12-21 | Learning to Design City-scale Transit Routes | Bibek Poudel et.al. | 2512.19767 | translate | read | null |
| 2025-12-22 | Scalably Enhancing the Clinical Validity of a Task Benchmark with Physician Oversight | Junze Ye et.al. | 2512.19691 | translate | read | null |
| 2025-12-22 | Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies | Yuqiao Tan et.al. | 2512.19673 | translate | read | link |
| 2025-12-22 | Learning Generalizable Hand-Object Tracking from Synthetic Demonstrations | Yinhuai Wang et.al. | 2512.19583 | translate | read | null |
| 2025-12-22 | LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller | Kirill Djebko et.al. | 2512.19576 | translate | read | null |
| 2025-12-22 | Variational Autoregressive Networks Applied to $φ^4$ Field Theory Systems | Moxian Qian et.al. | 2512.19575 | translate | read | null |
| 2025-12-22 | CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal | Yongxin Wang et.al. | 2512.19554 | translate | read | null |
| 2025-12-22 | LacaDM: A Latent Causal Diffusion Model for Multiobjective Reinforcement Learning | Xueming Yan et.al. | 2512.19516 | translate | read | null |
| 2025-12-22 | A Gauss-Newton-Induced Structure-Exploiting Algorithm for Differentiable Optimal Control | Yuankun Chen et.al. | 2512.19447 | translate | read | null |
| 2025-12-22 | CodeSimpleQA: Scaling Factuality in Code Large Language Models | Jian Yang et.al. | 2512.19424 | translate | read | null |
| 2025-12-22 | Learning General Policies with Policy Gradient Methods | Simon Ståhlberg et.al. | 2512.19366 | translate | read | null |
| 2025-12-22 | Interpretable Hybrid Deep Q-Learning Framework for IoT-Based Food Spoilage Prediction with Synthetic Data Generation and Hardware Validation | Isshaan Singh et.al. | 2512.19361 | translate | read | null |
| 2025-12-22 | First-Order Representation Languages for Goal-Conditioned RL | Simon Ståhlberg et.al. | 2512.19355 | translate | read | null |
| 2025-12-22 | Enhancing PLS of Indoor IRS-VLC Systems for Colluding and Non-Colluding Eavesdroppers | Rashid Iqbal et.al. | 2512.19339 | translate | read | null |
| 2025-12-22 | Learning-Assisted Multi-Operator Variable Neighborhood Search for Urban Cable Routing | Wei Liu et.al. | 2512.19321 | translate | read | null |
| 2025-12-22 | SafeMed-R1: Adversarial Reinforcement Learning for Generalizable and Robust Medical Reasoning in Vision-Language Models | A. A. Gde Yogi Pramana et.al. | 2512.19317 | translate | read | null |
| 2025-12-22 | Bridging Semantics and Geometry: A Decoupled LVLM-SAM Framework for Reasoning Segmentation in Remote Sensing | Xu Zhang et.al. | 2512.19302 | translate | read | null |
| 2025-12-22 | RMLer: Synthesizing Novel Objects across Diverse Categories via Reinforcement Mixing Learning | Jun Li et.al. | 2512.19300 | translate | read | null |
| 2025-12-22 | Are All Data Necessary? Efficient Data Pruning for Large-scale Autonomous Driving Dataset via Trajectory Entropy Maximization | Zhaoyang Liu et.al. | 2512.19270 | translate | read | null |
| 2025-12-22 | WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving | Pengxuan Yang et.al. | 2512.19133 | translate | read | link |
| 2025-12-22 | AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards | Zihan Lin et.al. | 2512.19126 | translate | read | null |
| 2025-12-22 | Explicit and Non-asymptotic Query Complexities of Rank-Based Zeroth-order Algorithm on Stochastic Smooth Functions | Haishan Ye et.al. | 2512.19104 | translate | read | null |
| 2025-12-22 | Tool-Augmented Hybrid Ensemble Reasoning with Distillation for Bilingual Mathematical Problem Solving | Peiqing Lu et.al. | 2512.19093 | translate | read | null |
| 2025-12-22 | CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models | Pengyu Chen et.al. | 2512.19083 | translate | read | null |
| 2025-12-22 | ORPR: An OR-Guided Pretrain-then-Reinforce Learning Model for Inventory Management | Lingjie Zhao et.al. | 2512.19001 | translate | read | null |
| 2025-12-22 | DTCCL: Disengagement-Triggered Contrastive Continual Learning for Autonomous Bus Planners | Yanding Yang et.al. | 2512.18988 | translate | read | null |
| 2025-12-22 | Scaling Online Distributionally Robust Reinforcement Learning: Sample-Efficient Guarantees with General Function Approximation | Debamita Ghosh et.al. | 2512.18957 | translate | read | null |
| 2025-12-22 | Training Multimodal Large Reasoning Models Needs Better Thoughts: A Three-Stage Framework for Long Chain-of-Thought Synthesis and Selection | Yizhi Wang et.al. | 2512.18956 | translate | read | null |
| 2025-12-22 | A Framework for Deploying Learning-based Quadruped Loco-Manipulation | Yadong Liu et.al. | 2512.18938 | translate | read | null |
| 2025-12-21 | QoS-Aware Load Balancing in the Computing Continuum via Multi-Player Bandits | Ivan Čilić et.al. | 2512.18915 | translate | read | null |
| 2025-12-21 | Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations | Shaomu Tan et.al. | 2512.18906 | translate | read | null |
| 2025-12-21 | Structural Reinforcement Learning for Heterogeneous Agent Macroeconomics | Yucheng Yang et.al. | 2512.18892 | translate | read | null |
| 2025-12-21 | CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning | Zijun Gao et.al. | 2512.18857 | translate | read | null |
| 2025-12-21 | InDRiVE: Reward-Free World-Model Pretraining for Autonomous Driving via Latent Disagreement | Feeza Khan Khanzada et.al. | 2512.18850 | translate | read | null |
| 2025-12-21 | From Word to World: Can Large Language Models be Implicit Text-based World Models? | Yixia Li et.al. | 2512.18832 | translate | read | null |
| 2025-12-21 | MaskFocus: Focusing Policy Optimization on Critical Steps for Masked Image Generation | Guohui Zhang et.al. | 2512.18766 | translate | read | null |
| 2025-12-21 | Gaussian-Mixture-Model Q-Functions for Policy Iteration in Reinforcement Learning | Minh Vu et.al. | 2512.18763 | translate | read | null |
| 2025-12-21 | InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search | Kaican Li et.al. | 2512.18745 | translate | read | null |
| 2025-12-21 | A Theoretical Lens for RL-Tuned Language Models via Energy-Based Models | Zhiquan Tan et.al. | 2512.18730 | translate | read | null |
| 2025-12-21 | Demonstration-Guided Continual Reinforcement Learning in Dynamic Environments | Xue Yang et.al. | 2512.18670 | translate | read | null |
| 2025-12-21 | Offline Reinforcement Learning for End-to-End Autonomous Driving | Chihiro Noguchi et.al. | 2512.18662 | translate | read | null |
| 2025-12-21 | LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction | Jensen Zhang et.al. | 2512.18623 | translate | read | null |
| 2025-12-21 | A Multi-agent Text2SQL Framework using Small Language Models and Execution Feedback | Thanh Dat Hoang et.al. | 2512.18622 | translate | read | null |
| 2025-12-21 | Trajectory Planning for UAV-Based Smart Farming Using Imitation-Based Triple Deep Q-Learning | Wencan Mao et.al. | 2512.18604 | translate | read | null |
| 2025-12-21 | SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models | Pengcheng Li et.al. | 2512.18583 | translate | read | null |
| 2025-12-21 | ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning | Weijie Zhou et.al. | 2512.18571 | translate | read | null |
| 2025-12-21 | Vox Deorum: A Hybrid LLM Architecture for 4X / Grand Strategy Game AI – Lessons from Civilization V | John Chen et.al. | 2512.18564 | translate | read | null |
| 2025-12-21 | Distributionally Robust Multi-Agent Reinforcement Learning for Intelligent Traffic Control | Shuwei Pei et.al. | 2512.18558 | translate | read | null |
| 2025-12-21 | Toward Training Superintelligent Software Agents through Self-Play SWE-RL | Yuxiang Wei et.al. | 2512.18552 | translate | read | null |
| 2025-12-20 | Scaling up Stability: Reinforcement Learning for Distributed Control of Networked Systems in the Space of Stabilizing Policies | John Cao et.al. | 2512.18540 | translate | read | null |
| 2025-12-20 | When Robots Say No: The Empathic Ethical Disobedience Benchmark | Dmytro Kuzmenko et.al. | 2512.18474 | translate | read | null |
| 2025-12-20 | On the Universality of Transformer Architectures; How Much Attention Is Enough? | Amirreza Abbasi et.al. | 2512.18445 | translate | read | null |
| 2025-12-20 | Learning Semantic Atomic Skills for Multi-Task Robotic Manipulation | Yihang Zhu et.al. | 2512.18368 | translate | read | null |
| 2025-12-20 | Dynamic Entropy Tuning in Reinforcement Learning Low-Level Quadcopter Control: Stochasticity vs Determinism | Youssef Mahran et.al. | 2512.18336 | translate | read | null |
| 2025-12-20 | Reinforcement Learning Position Control of a Quadrotor Using Soft Actor-Critic (SAC) | Youssef Mahran et.al. | 2512.18333 | translate | read | null |
| 2025-12-20 | Trustworthy and Explainable Deep Reinforcement Learning for Safe and Energy-Efficient Process Control: A Use Case in Industrial Compressed Air Systems | Vincent Bezold et.al. | 2512.18317 | translate | read | null |
| 2025-12-20 | Monitoring Monitorability | Melody Y. Guan et.al. | 2512.18311 | translate | read | null |
| 2025-12-20 | Embedded Safety-Aligned Intelligence via Differentiable Internal Alignment Embeddings | Harsh Rathva et.al. | 2512.18309 | translate | read | null |
| 2025-12-20 | Stable and Efficient Single-Rollout RL for Multimodal Reasoning | Rui Liu et.al. | 2512.18215 | translate | read | null |
| 2025-12-20 | Sophia: A Persistent Agent Framework of Artificial Life | Mingyang Sun et.al. | 2512.18202 | translate | read | null |
| 2025-12-20 | NL2CA: Auto-formalizing Cognitive Decision-Making from Natural Language Using an Unsupervised CriticNL2LTL Framework | Zihao Deng et.al. | 2512.18189 | translate | read | null |
| 2025-12-20 | On Swarm Leader Identification using Probing Policies | Stergios E. Bachoumas et.al. | 2512.18146 | translate | read | null |
| 2025-12-19 | Unifying Causal Reinforcement Learning: Survey, Taxonomy, Algorithms and Applications | Cristiano da Costa Cunha et.al. | 2512.18135 | translate | read | null |
| 2025-12-19 | Towards Autonomous Navigation in Endovascular Interventions | Tudor Jianu et.al. | 2512.18081 | translate | read | null |
| 2025-12-19 | SurgiPose: Estimating Surgical Tool Kinematics from Monocular Video for Surgical Robot Learning | Juo-Tung Chen et.al. | 2512.18068 | translate | read | null |
| 2025-12-19 | ReGal: A First Look at PPO-based Legal AI for Judgment Prediction and Summarization in India | Shubham Kumar Nigam et.al. | 2512.18014 | translate | read | null |
| 2025-12-19 | Adaptive Agents in Spatial Double-Auction Markets: Modeling the Emergence of Industrial Symbiosis | Matthieu Mastio et.al. | 2512.17979 | translate | read | null |
| 2025-12-19 | Distributionally Robust Imitation Learning: Layered Control Architecture for Certifiable Autonomy | Aditya Gahlawat et.al. | 2512.17899 | translate | read | null |
| 2025-12-19 | AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning | Ran Gong et.al. | 2512.17853 | translate | read | null |
| 2025-12-19 | Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes | Carlos Vélez García et.al. | 2512.17846 | translate | read | null |
| 2025-12-19 | NeuRehab: A Reinforcement Learning and Spiking Neural Network-Based Rehab Automation Framework | Phani Pavan Kambhampati et.al. | 2512.17841 | translate | read | null |
| 2025-12-19 | About Time: Model-free Reinforcement Learning with Timed Reward Machines | Anirban Majumdar et.al. | 2512.17637 | translate | read | null |
| 2025-12-19 | Trust-Region Adaptive Policy Optimization | Mingyu Su et.al. | 2512.17636 | translate | read | null |
| 2025-12-19 | SCOPE: Sequential Causal Optimization of Process Interventions | Jakob De Moor et.al. | 2512.17629 | translate | read | null |
| 2025-12-19 | Learning Safe Autonomous Driving Policies Using Predictive Safety Representations | Mahesh Keswani et.al. | 2512.17586 | translate | read | null |
| 2025-12-19 | Kinematics-Aware Diffusion Policy with Consistent 3D Observation and Action Space for Whole-Arm Robotic Manipulation | Kangchen Lv et.al. | 2512.17568 | translate | read | null |
| 2025-12-19 | HydroGym: A Reinforcement Learning Platform for Fluid Dynamics | Christian Lagemann et.al. | 2512.17534 | translate | read | null |
| 2025-12-19 | Assessing Long-Term Electricity Market Design for Ambitious Decarbonization Targets using Multi-Agent Reinforcement Learning | Javier Gonzalez-Ruiz et.al. | 2512.17444 | translate | read | null |
| 2025-12-19 | Xiaomi MiMo-VL-Miloco Technical Report | Jiaze Li et.al. | 2512.17436 | translate | read | null |
| 2025-12-19 | TakeAD: Preference-based Post-optimization for End-to-end Autonomous Driving with Expert Takeover Data | Deqing Liu et.al. | 2512.17370 | translate | read | null |
| 2025-12-19 | Neuro-Symbolic Control with Large Language Models for Language-Guided Spatial Tasks | Momina Liaqat Ali et.al. | 2512.17321 | translate | read | null |
| 2025-12-19 | Large Language Models as Pokémon Battle Agents: Strategic Play and Content Generation | Daksh Jain et.al. | 2512.17308 | translate | read | null |
| 2025-12-19 | Understanding Generalization in Role-Playing Models via Information Theory | Yongqi Li et.al. | 2512.17270 | translate | read | null |
| 2025-12-19 | A Theoretical Analysis of State Similarity Between Markov Decision Processes | Zhenyu Tao et.al. | 2512.17265 | translate | read | null |
| 2025-12-19 | Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience | Jiangjie Chen et.al. | 2512.17260 | translate | read | null |
| 2025-12-19 | Cooperative Energy Scheduling of Multi-Microgrids Based on Risk-Sensitive Reinforcement Learning | Rongxiang Zhang et.al. | 2512.17246 | translate | read | null |
| 2025-12-19 | Learning When to Look: A Disentangled Curriculum for Strategic Perception in Multimodal Reasoning | Siqi Yang et.al. | 2512.17227 | translate | read | null |
| 2025-12-19 | CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency | Xiao Liang et.al. | 2512.17213 | translate | read | null |
| 2025-12-19 | Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs | Rujiao Long et.al. | 2512.17206 | translate | read | null |
| 2025-12-19 | MMRAG-RFT: Two-stage Reinforcement Fine-tuning for Explainable Multi-modal Retrieval-augmented Generation | Shengwei Zhao et.al. | 2512.17194 | translate | read | null |
| 2025-12-19 | MAPPO-LCR: Multi-Agent Proximal Policy Optimization with Local Cooperation Reward in Spatial Public Goods Games | Zhaoqilin Yang et.al. | 2512.17187 | translate | read | null |
| 2025-12-19 | Semantic Co-Speech Gesture Synthesis and Real-Time Control for Humanoid Robots | Gang Zhang et.al. | 2512.17183 | translate | read | null |
| 2025-12-19 | Conservative Bias in Multi-Teacher Learning: Why Agents Prefer Low-Reward Advisors | Maher Mesto et.al. | 2512.17180 | translate | read | null |
| 2025-12-19 | Enhancing AIGC Service Efficiency with Adaptive Multi-Edge Collaboration in A Distributed System | Changfu Xu et.al. | 2512.17158 | translate | read | null |
| 2025-12-19 | Towards Senior-Robot Interaction: Reactive Robot Dog Gestures | Chunyang Meng et.al. | 2512.17136 | translate | read | null |
| 2025-12-19 | Deep Reinforcement Learning-Aided Strategies for Big Data Offloading in Vehicular Networks | Talha Akyildiz et.al. | 2512.17133 | translate | read | null |
| 2025-12-18 | Reinforcement Learning for Self-Improving Agent with Skill Library | Jiongxiao Wang et.al. | 2512.17102 | translate | read | null |
| 2025-12-18 | Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making | Toshiaki Hori et.al. | 2512.17091 | translate | read | null |
| 2025-12-18 | Value Under Ignorance in Universal Artificial Intelligence | Cole Wyeth et.al. | 2512.17086 | translate | read | null |
| 2025-12-18 | UniRel-R1: RL-tuned LLM Reasoning for Knowledge Graph Relational Question Answering | Yinxu Tang et.al. | 2512.17043 | translate | read | null |
| 2025-12-18 | GB-DQN: Gradient Boosted DQN Models for Non-stationary Reinforcement Learning | Chang-Hwan Lee et.al. | 2512.17034 | translate | read | null |
| 2025-12-18 | Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification | Qihao Liu et.al. | 2512.16921 | translate | read | null |
| 2025-12-18 | AdaTooler-V: Adaptive Tool-Use for Images and Videos | Chaoyang Wang et.al. | 2512.16918 | translate | read | null |
| 2025-12-18 | Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning | Qihao Liu et.al. | 2512.16917 | translate | read | null |
| 2025-12-18 | Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward | Peter Chen et.al. | 2512.16912 | translate | read | null |
| 2025-12-18 | Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning | Andrew Wagenmaker et.al. | 2512.16911 | translate | read | null |
| 2025-12-18 | MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning | Yuanchen Ju et.al. | 2512.16909 | translate | read | null |
| 2025-12-18 | AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning | Tzu-Han Lin et.al. | 2512.16883 | translate | read | null |
| 2025-12-18 | A survey of the orienteering problem: model evolution, algorithmic advances, and future directions | Songhao Shen et.al. | 2512.16865 | translate | read | null |
| 2025-12-18 | RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing | Tianyuan Qu et.al. | 2512.16864 | translate | read | null |
| 2025-12-18 | ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning | Zihan Zhou et.al. | 2512.16861 | translate | read | null |
| 2025-12-18 | Meta-RL Induces Exploration in Language Agents | Yulun Jiang et.al. | 2512.16848 | translate | read | null |
| 2025-12-18 | Coordinated Anti-Jamming Resilience in Swarm Networks via Multi-Agent Reinforcement Learning | Bahman Abolhassani et.al. | 2512.16813 | translate | read | null |
| 2025-12-18 | Olaf: Bringing an Animated Character to Life in the Physical World | David Müller et.al. | 2512.16705 | translate | read | null |
| 2025-12-18 | JustRL: Scaling a 1.5B LLM with a Simple RL Recipe | Bingxiang He et.al. | 2512.16649 | translate | read | null |
| 2025-12-18 | Implementing a Sharia Chatbot as a Consultation Medium for Questions About Islam | Wisnu Uriawan et.al. | 2512.16644 | translate | read | null |
| 2025-12-18 | Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game | Barna Pásztor et.al. | 2512.16626 | translate | read | null |
| 2025-12-18 | Non-Asymptotic Global Convergence of PPO-Clip | Yin Liu et.al. | 2512.16565 | translate | read | null |
| 2025-12-18 | ParamExplorer: A framework for exploring parameters in generative art | Julien Gachadoat et.al. | 2512.16529 | translate | read | null |
| 2025-12-18 | Guiding Perception-Reasoning Closer to Human in Blind Image Quality Assessment | Yuan Li et.al. | 2512.16484 | translate | read | null |
| 2025-12-18 | E-SDS: Environment-aware See it, Do it, Sorted - Automated Environment-Aware Reinforcement Learning for Humanoid Locomotion | Enis Yalcin et.al. | 2512.16446 | translate | read | null |
| 2025-12-18 | StarCraft+: Benchmarking Multi-agent Algorithms in Adversary Paradigm | Yadong Li et.al. | 2512.16444 | translate | read | null |
| 2025-12-18 | NDRL: Cotton Irrigation and Nitrogen Application with Nested Dual-Agent Reinforcement Learning | Ruifeng Xu et.al. | 2512.16408 | translate | read | null |
| 2025-12-18 | Hypernetworks That Evolve Themselves | Joachim Winther Pedersen et.al. | 2512.16406 | translate | read | null |
| 2025-12-18 | Machine Learning-based Optimal Control for Colloidal Self-Assembly | Andres Lizano-Villalobos et.al. | 2512.16402 | translate | read | null |
| 2025-12-18 | ManiLong-Shot: Interaction-Aware One-Shot Imitation Learning for Long-Horizon Manipulation | Zixuan Chen et.al. | 2512.16302 | translate | read | null |
| 2025-12-18 | Simultaneous Secrecy and Covert Communications (SSACC) in Mobility-Aware RIS-Aided Networks | Yanyu Cheng et.al. | 2512.16224 | translate | read | null |
| 2025-12-18 | Visual Alignment of Medical Vision-Language Models for Grounded Radiology Report Generation | Sarosij Bose et.al. | 2512.16201 | translate | read | null |
| 2025-12-18 | MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation | Pengyu Wang et.al. | 2512.16145 | translate | read | null |
| 2025-12-18 | INTELLECT-3: Technical Report | Prime Intellect Team et.al. | 2512.16144 | translate | read | null |
| 2025-12-17 | Techno-economic optimization of a heat-pipe microreactor, part I: theory and cost optimization | Paul Seurin et.al. | 2512.16032 | translate | read | null |
| 2025-12-17 | Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models | Caner Erden et.al. | 2512.15973 | translate | read | null |
| 2025-12-17 | Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning | Polaris Jhandi et.al. | 2512.15943 | translate | read | null |
| 2025-12-17 | DSO: Direct Steering Optimization for Bias Mitigation | Lucas Monteiro Paes et.al. | 2512.15926 | translate | read | null |
| 2025-12-15 | Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT) | Akhil Sharma et.al. | 2512.15790 | translate | read | null |
| 2025-12-17 | Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning | Zhenwen Liang et.al. | 2512.15687 | translate | read | null |
| 2025-12-17 | Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning | Jiaqi Xu et.al. | 2512.15662 | translate | read | null |
| 2025-12-17 | Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction | Mathieu Blondel et.al. | 2512.15605 | translate | read | null |
| 2025-12-17 | Deep Reinforcement Learning for EH-Enabled Cognitive-IoT Under Jamming Attacks | Nadia Abdolkhani et.al. | 2512.15558 | translate | read | null |
| 2025-12-17 | Autonomous Pressure Control in MuVacAS via Deep Reinforcement Learning and Deep Learning Surrogate Models | Guillermo Rodriguez-Llorente et.al. | 2512.15521 | translate | read | null |
| 2025-12-17 | Double Horizon Model-Based Policy Optimization | Akihiro Kubo et.al. | 2512.15439 | translate | read | null |
| 2025-12-17 | FM-EAC: Feature Model-based Enhanced Actor-Critic for Multi-Task Control in Dynamic Environments | Quanxi Zhou et.al. | 2512.15430 | translate | read | null |
| 2025-12-17 | Can AI Generate more Comprehensive Test Scenarios? Review on Automated Driving Systems Test Scenario Generation Methods | Ji Zhou et.al. | 2512.15422 | translate | read | null |
| 2025-12-17 | EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning | Jianfei Ma et.al. | 2512.15405 | translate | read | null |
| 2025-12-17 | Graph Contextual Reinforcement Learning for Efficient Directed Controller Synthesis | Toshihide Ubukata et.al. | 2512.15295 | translate | read | null |
| 2025-12-17 | Learning-Based Phase Shift Optimization of Liquid Crystal RIS in Dynamic mmWave Networks | Le Hao et.al. | 2512.15279 | translate | read | null |
| 2025-12-17 | Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning | Yiliu Sun et.al. | 2512.15274 | translate | read | null |
| 2025-12-17 | EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence | Jiaxu Wan et.al. | 2512.15160 | translate | read | null |
| 2025-12-17 | Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning | Weiqin Wang et.al. | 2512.15146 | translate | read | null |
| 2025-12-17 | Automatic Reward Shaping from Multi-Objective Human Heuristics | Yuqing Xie et.al. | 2512.15120 | translate | read | null |
| 2025-12-17 | QoS-Aware Hierarchical Reinforcement Learning for Joint Link Selection and Trajectory Optimization in SAGIN-Supported UAV Mobility Management | Jiayang Wan et.al. | 2512.15119 | translate | read | null |
| 2025-12-17 | Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models | Jinwu Hu et.al. | 2512.15089 | translate | read | null |
| 2025-12-17 | Deep Reinforcement Learning for Joint Time and Power Management in SWIPT-EH CIoT | Nadia Abdolkhani et.al. | 2512.15062 | translate | read | null |
| 2025-12-17 | Spectral Representation-based Reinforcement Learning | Chenxiao Gao et.al. | 2512.15036 | translate | read | null |
| 2025-12-17 | ISS Policy: Scalable Diffusion Policy with Implicit Scene Supervision | Wenlong Xia et.al. | 2512.15020 | translate | read | null |
| 2025-12-17 | Multi-Objective Bayesian Optimization of Deep Reinforcement Learning for Environmental, Social, and Governance (ESG) Financial Portfolio Management | E. C. Garrido-Merchán et.al. | 2512.14992 | translate | read | null |
| 2025-12-17 | Adaptive Partitioning and Learning for Stochastic Control of Diffusion Processes | Hanqing Jin et.al. | 2512.14991 | translate | read | null |
| 2025-12-16 | Puzzle Curriculum GRPO for Vision-Centric Reasoning | Ahmadreza Jeddi et.al. | 2512.14944 | translate | read | null |
| 2025-12-16 | Imitation Learning for Multi-turn LM Agents via On-policy Expert Corrections | Niklas Lauffer et.al. | 2512.14895 | translate | read | null |
| 2025-12-16 | Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse | Jingwei Chen et.al. | 2512.14879 | translate | read | null |
| 2025-12-16 | TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs | Jun Zhang et.al. | 2512.14698 | translate | read | link |
| 2025-12-16 | CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives | Zihan Wang et.al. | 2512.14696 | translate | read | link |
| 2025-12-16 | Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes | Alessandro Trapasso et.al. | 2512.14617 | translate | read | null |
| 2025-12-16 | RecGPT-V2 Technical Report | Chao Yi et.al. | 2512.14503 | translate | read | null |
| 2025-12-16 | Hybrid Cognitive IoT with Cooperative Caching and SWIPT-EH: A Hierarchical Reinforcement Learning Framework | Nadia Abdolkhani et.al. | 2512.14488 | translate | read | null |
| 2025-12-16 | Context-Picker: Dynamic context selection using multi-stage reinforcement learning | Siyuan Zhu et.al. | 2512.14465 | translate | read | null |
| 2025-12-16 | A data-physics hybrid generative model for patient-specific post-stroke motor rehabilitation using wearable sensor data | Yanning Dai et.al. | 2512.14329 | translate | read | null |
| 2025-12-16 | Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations | Xudong Han et.al. | 2512.14321 | translate | read | null |
| 2025-12-16 | A Threshold-Triggered Deep Q-Network-Based Framework for Self-Healing in Autonomic Software-Defined IIoT-Edge Networks | Agrippina Mwangi et.al. | 2512.14297 | translate | read | null |
| 2025-12-16 | GLM-TTS Technical Report | Jiayan Cui et.al. | 2512.14291 | translate | read | link |
| 2025-12-16 | Understanding and Improving Hyperbolic Deep Reinforcement Learning | Timo Klein et.al. | 2512.14202 | translate | read | link |
| 2025-12-16 | Incentivizing Tool-augmented Thinking with Images for Medical Image Analysis | Yankai Jiang et.al. | 2512.14157 | translate | read | null |
| 2025-12-16 | A First-Order Logic-Based Alternative to Reward Models in RLHF | Chunjin Jian et.al. | 2512.14100 | translate | read | null |
| 2025-12-16 | RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees | Junjie Ma et.al. | 2512.14069 | translate | read | null |
| 2025-12-16 | Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning | Amir M. Soufi Enayati et.al. | 2512.14057 | translate | read | null |
| 2025-12-16 | OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving | Zhenguo Zhang et.al. | 2512.14044 | translate | read | null |
| 2025-12-16 | Sample-Efficient Robot Skill Learning for Construction Tasks: Benchmarking Hierarchical Reinforcement Learning and Vision-Language-Action VLA Model | Zhaofeng Hu et.al. | 2512.14031 | translate | read | null |
| 2025-12-16 | Cooperative Caching Towards Efficient Spectrum Utilization in Cognitive-IoT Networks | Nadia Abdolkhani et.al. | 2512.14029 | translate | read | null |
| 2025-12-16 | Hierarchical Deep Reinforcement Learning for Robust Access in Cognitive IoT Networks under Smart Jamming Attacks | Nadia Abdolkhani et.al. | 2512.14013 | translate | read | null |
| 2025-12-15 | Adaptive digital twins for predictive decision-making: Online Bayesian learning of transition dynamics | Eugenio Varetti et.al. | 2512.13919 | translate | read | null |
| 2025-12-15 | Group-Theoretic Reinforcement Learning of Dynamical Decoupling Sequences | Charles Marrder et.al. | 2512.13890 | translate | read | null |
| 2025-12-15 | SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning | Jitesh Jain et.al. | 2512.13874 | translate | read | link |
| 2025-12-15 | Explainable reinforcement learning from human feedback to improve alignment | Shicheng Liu et.al. | 2512.13837 | translate | read | null |
| 2025-12-13 | RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing | Yuhan Tang et.al. | 2512.13727 | translate | read | null |
| 2025-12-13 | Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce | Sayak Chakrabarty et.al. | 2512.13726 | translate | read | null |
| 2025-12-15 | AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection | Junwen Miao et.al. | 2512.13671 | translate | read | null |
| 2025-12-15 | A Scientific Reasoning Model for Organic Synthesis Procedure Generation | Guoqing Liu et.al. | 2512.13668 | translate | read | null |
| 2025-12-15 | Advancing Machine Learning Optimization of Chiral Photonic Metasurface: Comparative Study of Neural Network and Genetic Algorithm Approaches | Davide Filippozzi et.al. | 2512.13656 | translate | read | null |
| 2025-12-15 | MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning | Haoyu Fu et.al. | 2512.13636 | translate | read | null |
| 2025-12-15 | SCR2-ST: Combine Single Cell with Spatial Transcriptomics for Efficient Active Sampling via Reinforcement Learning | Junchao Zhu et.al. | 2512.13635 | translate | read | null |
| 2025-12-15 | Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models | Boxin Wang et.al. | 2512.13607 | translate | read | null |
| 2025-12-15 | Image Diffusion Preview with Consistency Solver | Fu-Yun Wang et.al. | 2512.13592 | translate | read | link |
| 2025-12-15 | MMhops-R1: Multimodal Multi-hop Reasoning | Tao Zhang et.al. | 2512.13573 | translate | read | null |
| 2025-12-15 | Memory in the Age of AI Agents | Yuyang Hu et.al. | 2512.13564 | translate | read | link |
| 2025-12-15 | How Low Can You Go? The Data-Light SE Challenge | Kishan Kumar Ganguly et.al. | 2512.13524 | translate | read | null |
| 2025-12-15 | Reinforcement Learning based 6-DoF Maneuvers for Microgravity Intravehicular Docking: A Simulation Study with Int-Ball2 in ISS-JEM | Aman Arora et.al. | 2512.13514 | translate | read | null |
| 2025-12-15 | MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph | Linjie Mu et.al. | 2512.13510 | translate | read | null |
| 2025-12-15 | Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model | Heyi Chen et.al. | 2512.13507 | translate | read | null |
| 2025-12-15 | Differentiable Evolutionary Reinforcement Learning | Sitao Cheng et.al. | 2512.13399 | translate | read | null |
| 2025-12-15 | QoS-Aware State-Augmented Learnable Framework for 5G NR-U/Wi-Fi Coexistence: Impact of Parameter Selection and Enhanced Collision Resolution | Mohammad Reza Fasihi et.al. | 2512.13393 | translate | read | null |
| 2025-12-15 | Universal Dexterous Functional Grasping via Demonstration-Editing Reinforcement Learning | Chuan Mao et.al. | 2512.13380 | translate | read | null |
| 2025-12-15 | Fast Policy Learning for 6-DOF Position Control of Underwater Vehicles | Sümer Tunçay et.al. | 2512.13359 | translate | read | null |
| 2025-12-15 | Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3) | Zeyad Gamal et.al. | 2512.13356 | translate | read | null |
| 2025-12-15 | Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration | Hao Fu et.al. | 2512.13293 | translate | read | null |
| 2025-12-15 | AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning | Jiaru Zou et.al. | 2512.13278 | translate | read | null |
| 2025-12-15 | SPARS: A Reinforcement Learning-Enabled Simulator for Power Management in HPC Job Scheduling | Muhammad Alfian Amrizal et.al. | 2512.13268 | translate | read | null |
| 2025-12-15 | Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving | Hyunki Seong et.al. | 2512.13262 | translate | read | null |
| 2025-12-15 | Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection | Zihui Zhao et.al. | 2512.13240 | translate | read | null |
| 2025-12-15 | SACn: Soft Actor-Critic with n-step Returns | Jakub Łyskawa et.al. | 2512.13165 | translate | read | null |
| 2025-12-15 | SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning | Emre Can Acikgoz et.al. | 2512.13159 | translate | read | null |
| 2025-12-15 | TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning | Shenzhi Yang et.al. | 2512.13106 | translate | read | null |
| 2025-12-15 | Toward Self-Healing Networks-on-Chip: RL-Driven Routing in 2D Torus Architectures | Mohammad Walid Charrwi et.al. | 2512.13096 | translate | read | null |
| 2025-12-15 | ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning | Feng Zhang et.al. | 2512.13095 | translate | read | null |
| 2025-12-15 | Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation | Xiang Li et.al. | 2512.13094 | translate | read | null |
| 2025-12-15 | PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations | Mingqi Yuan et.al. | 2512.13093 | translate | read | null |
| 2025-12-15 | M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization | Bizhe Bai et.al. | 2512.13070 | translate | read | null |
| 2025-12-15 | Deep Q-Learning-Based Intelligent Scheduling for ETL Optimization in Heterogeneous Data Environments | Kangning Gao et.al. | 2512.13060 | translate | read | null |
| 2025-12-15 | GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training | Tong Wei et.al. | 2512.13043 | translate | read | null |
| 2025-12-15 | What Happens Next? Next Scene Prediction with a Unified Video Model | Xinjie Li et.al. | 2512.13015 | translate | read | null |
| 2025-12-15 | Learning Terrain Aware Bipedal Locomotion via Reduced Dimensional Perceptual Representations | Guillermo A. Castillo et.al. | 2512.12993 | translate | read | null |
| 2025-12-15 | Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning | Amin Jalal Aghdasian et.al. | 2512.12987 | translate | read | null |
| 2025-12-15 | QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management | Weizhou Shen et.al. | 2512.12967 | translate | read | link |
| 2025-12-15 | Interpretable Hypothesis-Driven Trading: A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals | Gagan Deep et.al. | 2512.12924 | translate | read | null |
| 2025-12-15 | LLM-based Personalized Portfolio Recommender: Integrating Large Language Models and Reinforcement Learning for Intelligent Investment Strategy Optimization | Bangyu Li et.al. | 2512.12922 | translate | read | null |
| 2025-12-15 | Meta-GPT: Decoding the Metasurface Genome with Generative Artificial Intelligence | David Dang et.al. | 2512.12888 | translate | read | null |
| 2025-12-14 | Information-Consistent Language Model Recommendations through Group Relative Policy Optimization | Sonal Prabhune et.al. | 2512.12858 | translate | read | null |
| 2025-12-14 | MPC-Guided Safe Reinforcement Learning and Lipschitz-Based Filtering for Structured Nonlinear Systems | Patrick Kostelac et.al. | 2512.12855 | translate | read | null |
| 2025-12-14 | Distributed Reinforcement Learning using Local Smart Meter Data for Voltage Regulation in Distribution Networks | Dong Liu et.al. | 2512.12803 | translate | read | null |
| 2025-12-14 | CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning | Xuanzhang Liu et.al. | 2512.12716 | translate | read | null |
| 2025-12-14 | Self-Motivated Growing Neural Network for Adaptive Architecture via Local Structural Plasticity | Yiyang Jia et.al. | 2512.12713 | translate | read | null |
| 2025-12-14 | Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning | Enhong Mu et.al. | 2512.12706 | translate | read | null |
| 2025-12-14 | Reassessing the Role of Supervised Fine-Tuning: An Empirical Study in VLM Reasoning | Yongcan Yu et.al. | 2512.12690 | translate | read | null |
| 2025-12-14 | CogDoc: Towards Unified thinking in Documents | Qixin Xu et.al. | 2512.12658 | translate | read | null |
| 2025-12-14 | Coupled Variational Reinforcement Learning for Language Model General Reasoning | Xueru Wen et.al. | 2512.12576 | translate | read | null |
| 2025-12-14 | World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents | Yesid Fonseca et.al. | 2512.12548 | translate | read | null |
| 2025-12-13 | Adaptive Detector-Verifier Framework for Zero-Shot Polyp Detection in Open-World Settings | Shengkai Xu et.al. | 2512.12492 | translate | read | null |
| 2025-12-13 | More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models | Hoang Anh Just et.al. | 2512.12487 | translate | read | null |
| 2025-12-13 | HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments | Yongjun He et.al. | 2512.12476 | translate | read | null |
| 2025-12-13 | Sim2Real Reinforcement Learning for Soccer skills | Jonathan Spraggett et.al. | 2512.12437 | translate | read | link |
| 2025-12-13 | Deep Hedging with Reinforcement Learning: A Practical Framework for Option Risk Management | Travon Lucius et.al. | 2512.12420 | translate | read | null |
| 2025-12-13 | ElasticVR: Elastic Task Computing in Multi-User Multi-Connectivity Wireless Virtual Reality (VR) Systems | Babak Badnava et.al. | 2512.12366 | translate | read | null |
| 2025-12-13 | The Role of AI in Modern Penetration Testing | J. Alexander Curtis et.al. | 2512.12326 | translate | read | null |
| 2025-12-13 | A Conflict-Aware Resource Management Framework for the Computing Continuum | Vlad Popescu-Vifor et.al. | 2512.12299 | translate | read | null |
| 2025-12-13 | Moment and Highlight Detection via MLLM Frame Segmentation | I Putu Andika Bagas Jiwanta et.al. | 2512.12246 | translate | read | null |
| 2025-12-13 | Learning to Get Up Across Morphologies: Zero-Shot Recovery with a Unified Humanoid Policy | Jonathan Spraggett et.al. | 2512.12230 | translate | read | link |
| 2025-12-12 | Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning | Vittorio Giammarino et.al. | 2512.12046 | translate | read | null |
| 2025-12-12 | Policy Gradient Algorithms for Age-of-Information Cost Minimization | José-Ramón Vidal et.al. | 2512.11990 | translate | read | null |
| 2025-12-12 | Learning to Extract Context for Context-Aware LLM Inference | Minseon Kim et.al. | 2512.11986 | translate | read | null |
| 2025-12-12 | A Review of Learning-Based Motion Planning: Toward a Data-Driven Optimal Control Approach | Jia Hu et.al. | 2512.11944 | translate | read | null |
| 2025-12-12 | Evolutionary Reinforcement Learning based AI tutor for Socratic Interdisciplinary Instruction | Mei Jiang et.al. | 2512.11930 | translate | read | null |
| 2025-12-12 | AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis | Junjie Ye et.al. | 2512.11797 | translate | read | null |
| 2025-12-12 | Agile Flight Emerges from Multi-Agent Competitive Racing | Vineet Pasumarti et.al. | 2512.11781 | translate | read | null |
| 2025-12-12 | SUMFORU: An LLM-Based Review Summarization Framework for Personalized Purchase Decision Support | Yuming Feng et.al. | 2512.11755 | translate | read | null |
| 2025-12-12 | UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations | Tingyu Yuan et.al. | 2512.11609 | translate | read | null |
| 2025-12-12 | DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry | Zhenyang Cai et.al. | 2512.11558 | translate | read | null |
| 2025-12-12 | Rethinking Expert Trajectory Utilization in LLM Post-training | Bowen Ding et.al. | 2512.11470 | translate | read | link |
| 2025-12-12 | Three methods, one problem: Classical and AI approaches to no-three-in-line | Pranav Ramanathan et.al. | 2512.11469 | translate | read | null |
| 2025-12-12 | Towards Trustworthy Multi-Turn LLM Agents via Behavioral Guidance | Gonca Gürsun et.al. | 2512.11421 | translate | read | null |
| 2025-12-12 | Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization | Yifan Niu et.al. | 2512.11391 | translate | read | null |
| 2025-12-12 | Symmetry-Aware Steering of Equivariant Diffusion Policies: Benefits and Limits | Minwoo Park et.al. | 2512.11345 | translate | read | null |
| 2025-12-12 | DAPO: Design Structure-Aware Pass Ordering in High-Level Synthesis with Graph Contrastive and Reinforcement Learning | Jinming Ge et.al. | 2512.11342 | translate | read | null |
| 2025-12-12 | RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training | Tianyuan Wu et.al. | 2512.11306 | translate | read | null |
| 2025-12-12 | When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents | Mrinal Rawat et.al. | 2512.11277 | translate | read | null |
| 2025-12-12 | A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation | Hong Je-Gal et.al. | 2512.11270 | translate | read | null |
| 2025-12-12 | Multi-Objective Reinforcement Learning for Large-Scale Mixed Traffic Control | Iftekharul Islam et.al. | 2512.11247 | translate | read | null |
| 2025-12-11 | Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning | Wei Duan et.al. | 2512.11179 | translate | read | null |
| 2025-12-11 | Learning Category-level Last-meter Navigation from RGB Demonstrations of a Single-instance | Tzu-Hsien Lee et.al. | 2512.11173 | translate | read | null |
| 2025-12-11 | CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound | Akhil S Anand et.al. | 2512.11169 | translate | read | null |
| 2025-12-11 | Benchmarking RL-Enhanced Spatial Indices Against Traditional, Advanced, and Learned Counterparts | Guanli Liu et.al. | 2512.11161 | translate | read | null |
| 2025-12-11 | In-Context Multi-Objective Optimization | Xinyu Zhang et.al. | 2512.11114 | translate | read | null |
| 2025-12-11 | Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation | Yiwen Tang et.al. | 2512.10949 | translate | read | link |
| 2025-12-11 | Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit | Zamirddine Mari et.al. | 2512.10934 | translate | read | null |
| 2025-12-11 | Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation | Zamirddine Mari et.al. | 2512.10925 | translate | read | null |
| 2025-12-11 | Reinforcement Learning in Financial Decision Making: A Systematic Review of Performance, Challenges, and Implementation Strategies | Mohammad Rezoanul Hoque et.al. | 2512.10913 | translate | read | null |
| 2025-12-11 | Iterative Compositional Data Generation for Robot Control | Anh-Quan Pham et.al. | 2512.10891 | translate | read | null |
| 2025-12-11 | Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments | Atahan Cilan et.al. | 2512.10835 | translate | read | null |
| 2025-12-11 | OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification | Zijian Wu et.al. | 2512.10756 | translate | read | null |
| 2025-12-11 | Learning to Split: A Reinforcement-Learning-Guided Splitting Heuristic for Neural Network Verification | Maya Swisa et.al. | 2512.10747 | translate | read | null |
| 2025-12-11 | Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving | Songyang Gao et.al. | 2512.10739 | translate | read | null |
| 2025-12-11 | How to Brake? Ethical Emergency Braking with Deep Reinforcement Learning | Jianbo Wang et.al. | 2512.10698 | translate | read | null |
| 2025-12-11 | Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning | Benjamin Gundersen et.al. | 2512.10691 | translate | read | null |
| 2025-12-11 | AgriGPT-Omni: A Unified Speech-Vision-Text Framework for Multilingual Agricultural Intelligence | Bo Yang et.al. | 2512.10624 | translate | read | null |
| 2025-12-11 | Multi-Objective Reward and Preference Optimization: Theory and Algorithms | Akhil Agnihotri et.al. | 2512.10601 | translate | read | null |
| 2025-12-11 | Grounding Everything in Tokens for Multimodal Large Language Models | Xiangxuan Ren et.al. | 2512.10554 | translate | read | null |
| 2025-12-11 | Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning | Haiteng Zhao et.al. | 2512.10534 | translate | read | null |
| 2025-12-11 | Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning | Chihyeon Song et.al. | 2512.10510 | translate | read | null |
| 2025-12-11 | UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning | Jiaxi Wu et.al. | 2512.10492 | translate | read | null |
| 2025-12-11 | Shot and Architecture Adaptive Subspace Variational Quantum Eigensolver for Microwave Simulation | Zhixiu Han et.al. | 2512.10458 | translate | read | null |
| 2025-12-11 | HypeR Adaptivity: Joint $hr$-Adaptive Meshing via Hypergraph Multi-Agent Deep Reinforcement Learning | Niccolò Grillo et.al. | 2512.10439 | translate | read | null |
| 2025-12-11 | Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention | Yang Yu et.al. | 2512.10414 | translate | read | null |
| 2025-12-11 | A Privacy-Preserving Cloud Architecture for Distributed Machine Learning at Scale | Vinoth Punniyamoorthy et.al. | 2512.10341 | translate | read | null |
| 2025-12-11 | Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters | Shruti Dongare et.al. | 2512.10271 | translate | read | null |
| 2025-12-11 | Multi-dimensional Preference Alignment by Conditioning Reward Itself | Jiho Jang et.al. | 2512.10237 | translate | read | null |
| 2025-12-11 | Task-Oriented Grasping Using Reinforcement Learning with a Contextual Reward Machine | Hui Li et.al. | 2512.10235 | translate | read | null |
| 2025-12-11 | Latent Chain-of-Thought World Modeling for End-to-End Driving | Shuhan Tan et.al. | 2512.10226 | translate | read | null |
| 2025-12-11 | An exploration for higher efficiency in multi objective optimisation with reinforcement learning | Mehmet Emin Aydin et.al. | 2512.10208 | translate | read | null |
| 2025-12-10 | Explicit Control Barrier Function-based Safety Filters and their Resource-Aware Computation | Pol Mestres et.al. | 2512.10118 | translate | read | null |
| 2025-12-10 | Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation | Steven Caro et.al. | 2512.10099 | translate | read | null |
| 2025-12-10 | SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation | Jongmin Lee et.al. | 2512.10042 | translate | read | null |
| 2025-12-10 | Diffusion Is Your Friend in Show, Suggest and Tell | Jia Cheng Hu et.al. | 2512.10038 | translate | read | null |
| 2025-12-10 | Latent Action World Models for Control with Unlabeled Trajectories | Marvin Alles et.al. | 2512.10016 | translate | read | null |
| 2025-12-10 | TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0 | Jinyu Chen et.al. | 2512.09961 | translate | read | null |
| 2025-12-10 | STACHE: Local Black-Box Explanations for Reinforcement Learning Policies | Andrew Elashkin et.al. | 2512.09909 | translate | read | null |
| 2025-12-10 | FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning | Khurram Khalil et.al. | 2512.09872 | translate | read | null |
| 2025-12-10 | Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation | Yuyang Li et.al. | 2512.09851 | translate | read | link |
| 2025-12-10 | ChronusOmni: Improving Time Awareness of Omni Large Language Models | Yijing Chen et.al. | 2512.09841 | translate | read | null |
| 2025-12-10 | RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning | Khurram Khalil et.al. | 2512.09829 | translate | read | null |
| 2025-12-10 | Prefrontal scaling of reward prediction error readout gates reinforcement-derived adaptive behavior in primates | Tian Sang et.al. | 2512.09761 | translate | read | null |
| 2025-12-10 | MOA: Multi-Objective Alignment for Role-Playing Agents | Chonghua Liao et.al. | 2512.09756 | translate | read | null |
| 2025-12-10 | Flexible Reconfigurable Intelligent Surface-Aided Covert Communications in UAV Networks | Chong Huang et.al. | 2512.09714 | translate | read | null |
| 2025-12-10 | Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning | Kaichen He et.al. | 2512.09706 | translate | read | null |
| 2025-12-10 | Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies | Mika Persson et.al. | 2512.09682 | translate | read | null |
| 2025-12-10 | d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models | Leyi Pan et.al. | 2512.09675 | translate | read | null |
| 2025-12-10 | SynthPix: A lightspeed PIV images generator | Antonio Terpin et.al. | 2512.09664 | translate | read | null |
| 2025-12-10 | Mastering Diverse, Unknown, and Cluttered Tracks for Robust Vision-Based Drone Racing | Feng Yu et.al. | 2512.09571 | translate | read | null |
| 2025-12-10 | Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search | Junkai Ji et.al. | 2512.09566 | translate | read | null |
| 2025-12-10 | REASAN: Learning Reactive Safe Navigation for Legged Robots | Qihao Yuan et.al. | 2512.09537 | translate | read | null |
| 2025-12-10 | RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning | Yucan Guo et.al. | 2512.09487 | translate | read | null |
| 2025-12-10 | Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation | Jialin Ying et.al. | 2512.09410 | translate | read | null |
| 2025-12-10 | CFLight: Enhancing Safety with Traffic Signal Control through Counterfactual Learning | Mingyuan Li et.al. | 2512.09368 | translate | read | null |
| 2025-12-10 | COVLM-RL: Critical Object-Oriented Reasoning for Autonomous Driving Using VLM-Guided Reinforcement Learning | Lin Li et.al. | 2512.09349 | translate | read | null |
| 2025-12-10 | Tyche: A Hybrid Computation Framework of Illumination Pattern for Satellite Beam Hopping | Ziheng Yang et.al. | 2512.09312 | translate | read | null |
| 2025-12-10 | One-Shot Real-World Demonstration Synthesis for Scalable Bimanual Manipulation | Huayi Zhou et.al. | 2512.09297 | translate | read | null |
| 2025-12-10 | Electric Arc Furnaces Scheduling under Electricity Price Volatility with Reinforcement Learning | Ruonan Pi et.al. | 2512.09293 | translate | read | null |
| 2025-12-10 | Exploratory Mean-Variance with Jumps: An Equilibrium Approach | Yuling Max Chen et.al. | 2512.09224 | translate | read | null |
| 2025-12-09 | Learning Unmasking Policies for Diffusion Language Models | Metod Jazbec et.al. | 2512.09106 | translate | read | null |
| 2025-12-09 | Masked Generative Policy for Robotic Control | Lipeng Zhuang et.al. | 2512.09101 | translate | read | null |
| 2025-12-09 | No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers | Damiano Marsili et.al. | 2512.08889 | translate | read | null |
| 2025-12-09 | IPPO Learns the Game, Not the Team: A Study on Generalization in Heterogeneous Agent Teams | Ryan LeRoy et.al. | 2512.08877 | translate | read | null |
| 2025-12-09 | Reinforcement Learning From State and Temporal Differences | Lex Weaver et.al. | 2512.08855 | translate | read | null |
| 2025-12-09 | Optimal navigation in two-dimensional regular and turbulent flows | Vladimir Parfenyev et.al. | 2512.08766 | translate | read | null |
| 2025-12-09 | Learning and Editing Universal Graph Prompt Tuning via Reinforcement Learning | Jinfeng Xu et.al. | 2512.08763 | translate | read | null |
| 2025-12-09 | Direct transfer of optimized controllers to similar systems using dimensionless MPC | Josip Kir Hromatko et.al. | 2512.08667 | translate | read | null |
| 2025-12-09 | Sim2Swim: Zero-Shot Velocity Control for Agile AUV Maneuvering in 3 Minutes | Lauritz Rismark Fosso et.al. | 2512.08656 | translate | read | null |
| 2025-12-09 | Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis | Orit Davidovich et.al. | 2512.08601 | translate | read | null |
| 2025-12-09 | Mind to Hand: Purposeful Robotic Control via Embodied Reasoning | Peijun Tang et.al. | 2512.08580 | translate | read | null |
| 2025-12-09 | Thinking with Images via Self-Calling Agent | Wenxi Yang et.al. | 2512.08511 | translate | read | link |
| 2025-12-09 | Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning | Junnan Qiu et.al. | 2512.08485 | translate | read | null |
| 2025-12-09 | Using reinforcement learning to probe the role of feedback in skill acquisition | Antonio Terpin et.al. | 2512.08463 | translate | read | null |
| 2025-12-09 | From Accuracy to Impact: The Impact-Driven AI Framework (IDAIF) for Aligning Engineering Architecture with Theory of Change | Yong-Woon Kim et.al. | 2512.08449 | translate | read | null |
| 2025-12-09 | Turning Threat into Opportunity: DRL-Powered Anti-Jamming via Energy Harvesting in UAV-Disrupted Channels | Ngoc-Tan Nguyen et.al. | 2512.08351 | translate | read | null |
| 2025-12-09 | Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks | Thai Duong Nguyen et.al. | 2512.08341 | translate | read | null |
| 2025-12-09 | Collaborative Intelligence for UAV-Satellite Network Slicing: Towards a Joint QoS-Energy-Fairness MADRL Optimization | Thanh-Dao Nguyen et.al. | 2512.08322 | translate | read | null |
| 2025-12-09 | rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection | Sijia Chen et.al. | 2512.08300 | translate | read | null |
| 2025-12-09 | Empowerment Gain and Causal Model Construction: Children and adults are sensitive to controllability and variability in their causal interventions | Eunice Yiu et.al. | 2512.08230 | translate | read | null |
| 2025-12-09 | Primal-dual policy learning for mean-field stochastic LQR problem | Xiushan Jiang et.al. | 2512.08205 | translate | read | null |
| 2025-12-09 | TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models | Zheng Ding et.al. | 2512.08153 | translate | read | null |
| 2025-12-09 | Robust Agents in Open-Ended Worlds | Mikayel Samvelyan et.al. | 2512.08139 | translate | read | null |
| 2025-12-09 | Universal Adversarial Suffixes for Language Models Using Reinforcement Learning with Calibrated Reward | Sampriti Soor et.al. | 2512.08131 | translate | read | null |
| 2025-12-08 | Scalable Offline Model-Based RL with Action Chunks | Kwanyoung Park et.al. | 2512.08108 | translate | read | null |
| 2025-12-08 | Training LLMs for Honesty via Confessions | Manas Joglekar et.al. | 2512.08093 | translate | read | null |
| 2025-12-08 | An Introduction to Deep Reinforcement and Imitation Learning | Pedro Santana et.al. | 2512.08052 | translate | read | null |
| 2025-12-08 | F2: Offline Reinforcement Learning for Hamiltonian Simulation via Free-Fermionic Subroutine Compilation | Ethan Decker et.al. | 2512.08023 | translate | read | null |
| 2025-12-08 | Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care | Aryaman Bansal et.al. | 2512.08012 | translate | read | null |
| 2025-12-08 | VLD: Visual Language Goal Distance for Reinforcement Learning Navigation | Lazar Milikic et.al. | 2512.07976 | translate | read | null |
| 2025-12-08 | Agentic Artificial Intelligence for Ethical Cybersecurity in Uganda: A Reinforcement Learning Framework for Threat Detection in Resource-Constrained Environments | Ibrahim Adabara et.al. | 2512.07909 | translate | read | null |
| 2025-12-08 | An Adaptive Multi-Layered Honeynet Architecture for Threat Behavior Analysis via Deep Learning | Lukas Johannes Möller et.al. | 2512.07827 | translate | read | null |
| 2025-12-08 | On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models | Charlie Zhang et.al. | 2512.07783 | translate | read | null |
| 2025-12-08 | RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models | Xiqiao Xiong et.al. | 2512.07761 | translate | read | null |
| 2025-12-08 | DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving | Jialv Zou et.al. | 2512.07745 | translate | read | null |
| 2025-12-08 | SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery | Meng Cao et.al. | 2512.07733 | translate | read | null |
| 2025-12-08 | Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE | Anxiang Zeng et.al. | 2512.07710 | translate | read | null |
| 2025-12-08 | Delay-Aware Diffusion Policy: Bridging the Observation-Execution Gap in Dynamic Tasks | Aileen Liao et.al. | 2512.07697 | translate | read | null |
| 2025-12-08 | The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds | Shahar Lutati et.al. | 2512.07631 | translate | read | null |
| 2025-12-08 | Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement | Yongsheng Lian et.al. | 2512.07611 | translate | read | null |
| 2025-12-08 | Understanding Individual Decision-Making in Multi-Agent Reinforcement Learning: A Dynamical Systems Approach | James Rudd-Jones et.al. | 2512.07588 | translate | read | null |
| 2025-12-08 | ReLaX: Reasoning with Latent Exploration for Large Reasoning Models | Shimin Zhang et.al. | 2512.07558 | translate | read | null |
| 2025-12-08 | Model-Based Reinforcement Learning Under Confounding | Nishanth Venkatesh et.al. | 2512.07528 | translate | read | null |
| 2025-12-08 | How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations | JV Roig et.al. | 2512.07497 | translate | read | null |
| 2025-12-08 | Enhancing Agentic RL with Progressive Reward Shaping and Value-based Sampling Policy Optimization | Zhuoran Zhuang et.al. | 2512.07478 | translate | read | null |
| 2025-12-08 | Gait-Adaptive Perceptive Humanoid Locomotion with Real-Time Under-Base Terrain Reconstruction | Haolin Song et.al. | 2512.07464 | translate | read | null |
| 2025-12-08 | Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning | Tong Wu et.al. | 2512.07461 | translate | read | null |
| 2025-12-08 | From Show Programmes to Data: Designing a Workflow to Make Performing Arts Ephemera Accessible Through Language Models | Clarisse Bardiot et.al. | 2512.07452 | translate | read | null |
| 2025-12-08 | KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models | Chenwei Shi et.al. | 2512.07437 | translate | read | null |
| 2025-12-08 | Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models | Haidong Kang et.al. | 2512.07419 | translate | read | null |
| 2025-12-08 | Adaptive Tuning of Parameterized Traffic Controllers via Multi-Agent Reinforcement Learning | Giray Önür et.al. | 2512.07417 | translate | read | null |
| 2025-12-08 | Training Language Models to Use Prolog as a Tool | Niklas Mellgren et.al. | 2512.07407 | translate | read | null |
| 2025-12-08 | Control and Reinforcement Learning through the Lens of Optimization: An Algorithmic Perspective | Tolga Ok et.al. | 2512.07377 | translate | read | null |
| 2025-12-08 | ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning | Byungju Kim et.al. | 2512.07371 | translate | read | null |
| 2025-12-08 | Multi-Rigid-Body Approximation of Human Hands with Application to Digital Twin | Bin Zhao et.al. | 2512.07359 | translate | read | null |
| 2025-12-08 | PrivORL: Differentially Private Synthetic Dataset for Offline Reinforcement Learning | Chen Gong et.al. | 2512.07342 | translate | read | null |
| 2025-12-08 | RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language Translation | Zhi Rao et.al. | 2512.07273 | translate | read | null |
| 2025-12-08 | SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks | Florian Tretter et.al. | 2512.07266 | translate | read | null |
| 2025-12-08 | Benchmarking Humanoid Imitation Learning with Motion Difficulty | Zhaorui Meng et.al. | 2512.07248 | translate | read | null |
| 2025-12-08 | Towards Robust Protective Perturbation against DeepFake Face Swapping | Hengyang Yao et.al. | 2512.07228 | translate | read | null |
| 2025-12-08 | Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation | Zhaoyang Liu et.al. | 2512.07212 | translate | read | null |
| 2025-12-08 | MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning | Xuhui Zheng et.al. | 2512.07203 | translate | read | null |
| 2025-12-08 | Less is More: Non-uniform Road Segments are Efficient for Bus Arrival Prediction | Zhen Huang et.al. | 2512.07200 | translate | read | null |
| 2025-12-08 | Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models | Fenghua Weng et.al. | 2512.07141 | translate | read | null |
| 2025-12-08 | TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning | Zebin Xing et.al. | 2512.07135 | translate | read | null |
| 2025-12-08 | Surrogate compliance modeling enables reinforcement learned locomotion gaits for soft robots | Jue Wang et.al. | 2512.07114 | translate | read | null |
| 2025-12-07 | A Hetero-Associative Sequential Memory Model Utilizing Neuromorphic Signals: Validated on a Mobile Manipulator | Runcong Wang et.al. | 2512.07032 | translate | read | null |
| 2025-12-07 | Utilizing Multi-Agent Reinforcement Learning with Encoder-Decoder Architecture Agents to Identify Optimal Resection Location in Glioblastoma Multiforme Patients | Krishna Arun et.al. | 2512.06990 | translate | read | null |
| 2025-12-07 | LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding | Yu Yu et.al. | 2512.06982 | translate | read | null |
| 2025-12-07 | Neuro-Vesicles: Neuromodulation Should Be a Dynamical System, Not a Tensor Decoration | Zilin Li et.al. | 2512.06966 | translate | read | null |
| 2025-12-07 | Statistical analysis of Inverse Entropy-regularized Reinforcement Learning | Denis Belomestny et.al. | 2512.06956 | translate | read | null |
| 2025-12-07 | Deep Reinforcement Learning for Phishing Detection with Transformer-Based Semantic Features | Aseer Al Faisal et.al. | 2512.06925 | translate | read | null |
| 2025-12-07 | Parent-Guided Semantic Reward Model (PGSRM): Embedding-Based Reward Functions for Reinforcement Learning of Transformer Language Models | Alexandr Plashchinsky et.al. | 2512.06920 | translate | read | null |
| 2025-12-07 | Know your Trajectory – Trustworthy Reinforcement Learning deployment through Importance-Based Trajectory Analysis | Clifford F et.al. | 2512.06917 | translate | read | null |
| 2025-12-07 | Khalasi: Energy-Efficient Navigation for Surface Vehicles in Vortical Flow Fields | Rushiraj Gadhvi et.al. | 2512.06912 | translate | read | null |
| 2025-12-07 | An Analysis of Large Language Models for Simulating User Responses in Surveys | Ziyun Yu et.al. | 2512.06874 | translate | read | null |
| 2025-12-07 | JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models | Ce Chi et.al. | 2512.06859 | translate | read | null |
| 2025-12-07 | Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning | Tingyu Li et.al. | 2512.06835 | translate | read | null |
| 2025-12-07 | MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning | Yueqian Wang et.al. | 2512.06810 | translate | read | null |
| 2025-12-07 | PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance | Jifar Wakuma Ayana et.al. | 2512.06747 | translate | read | null |
| 2025-12-07 | The Role of Entropy in Visual Grounding: Analysis and Optimization | Shuo Li et.al. | 2512.06726 | translate | read | null |
| 2025-12-07 | RunawayEvil: Jailbreaking the Image-to-Video Generative Models | Songping Wang et.al. | 2512.06674 | translate | read | null |
| 2025-12-07 | LightSearcher: Efficient DeepSearch via Experiential Memory | Hengzhi Lan et.al. | 2512.06653 | translate | read | null |
| 2025-12-07 | Analyzing Collision Rates in Large-Scale Mixed Traffic Control via Multi-Agent Reinforcement Learning | Muyang Fan et.al. | 2512.06645 | translate | read | null |
| 2025-12-07 | Learning to Hedge Swaptions | Zaniar Ahmadi et.al. | 2512.06639 | translate | read | null |
| 2025-12-07 | MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment | Ruicheng Zhang et.al. | 2512.06628 | translate | read | null |
| 2025-12-07 | A New Trajectory-Oriented Approach to Enhancing Comprehensive Crowd Navigation Performance | Xinyu Zhou et.al. | 2512.06608 | translate | read | null |
| 2025-12-06 | MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding | Yuhao Su et.al. | 2512.06581 | translate | read | null |
| 2025-12-06 | Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input | Zifan Xu et.al. | 2512.06571 | translate | read | null |
| 2025-12-06 | A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation | Xiaocan Li et.al. | 2512.06547 | translate | read | null |
| 2025-12-06 | Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning | Ming Chen et.al. | 2512.06533 | translate | read | null |
| 2025-12-06 | Entropy-Controlled Intrinsic Motivation Reinforcement Learning for Quadruped Robot Locomotion in Complex Terrains | Wanru Gong et.al. | 2512.06486 | translate | read | null |
| 2025-12-06 | Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control | Nathan P. Lawrence et.al. | 2512.06471 | translate | read | null |
| 2025-12-06 | RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs | Runlong Zhou et.al. | 2512.06392 | translate | read | null |
| 2025-12-06 | VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning | Yuji Wang et.al. | 2512.06373 | translate | read | null |
| 2025-12-06 | LLM-Upgraded Graph Reinforcement Learning for Carbon-Aware Job Scheduling in Smart Manufacturing | Zhiying Yang et.al. | 2512.06351 | translate | read | null |
| 2025-12-06 | ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models | Jiahao Li et.al. | 2512.06328 | translate | read | null |
| 2025-12-06 | A Hybrid Physics-Based and Reinforcement Learning Framework for Electric Vehicle Charging Time Prediction | Praharshitha Aryasomayajula et.al. | 2512.06287 | translate | read | null |
| 2025-12-06 | Networked Restless Multi-Arm Bandits with Reinforcement Learning | Hanmo Zhang et.al. | 2512.06274 | translate | read | null |
| 2025-12-06 | Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models | Chen Yang et.al. | 2512.06266 | translate | read | null |
| 2025-12-06 | Learning Without Time-Based Embodiment Resets in Soft-Actor Critic | Homayoon Farrahi et.al. | 2512.06252 | translate | read | null |
| 2025-12-06 | Learning When to Switch: Adaptive Policy Selection via Reinforcement Learning | Chris Tava et.al. | 2512.06250 | translate | read | null |
| 2025-12-06 | Auto-exploration for online reinforcement learning | Caleb Ju et.al. | 2512.06244 | translate | read | null |
| 2025-12-06 | AI Application in Anti-Money Laundering for Sustainable and Transparent Financial Systems | Chuanhao Nie et.al. | 2512.06240 | translate | read | null |
| 2025-12-05 | Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration | Huizhen Yu et.al. | 2512.06218 | translate | read | null |
| 2025-12-05 | Quantifying Memory Use in Reinforcement Learning with Temporal Range | Rodney Lafuente-Mercado et.al. | 2512.06204 | translate | read | null |
| 2025-12-05 | JaxWildfire: A GPU-Accelerated Wildfire Simulator for Reinforcement Learning | Ufuk Çakır et.al. | 2512.06102 | translate | read | null |
| 2025-12-05 | Empathy by Design: Aligning Large Language Models for Healthcare Dialogue | Emre Umucu et.al. | 2512.06097 | translate | read | null |
| 2025-12-05 | Comparative Analysis of Autonomous and Systematic Control Strategies for Hole-Doped Hubbard Clusters: Reinforcement Learning versus Physics-Guided Design | Shivanshu Dwivedi et.al. | 2512.06095 | translate | read | null |
| 2025-12-05 | Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring | Mohanakrishnan Hariharan et.al. | 2512.06060 | translate | read | null |
| 2025-12-05 | EditThinker: Unlocking Iterative Reasoning for Any Image Editor | Hongyu Li et.al. | 2512.05965 | translate | read | null |
| 2025-12-05 | Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity | Germán Kruszewski et.al. | 2512.05962 | translate | read | null |
| 2025-12-05 | Correspondence-Oriented Imitation Learning: Flexible Visuomotor Control with 3D Conditioning | Yunhao Cao et.al. | 2512.05953 | translate | read | null |
| 2025-12-05 | Variational Quantum Rainbow Deep Q-Network for Optimizing Resource Allocation Problem | Truong Thanh Hung Nguyen et.al. | 2512.05946 | translate | read | null |
| 2025-12-05 | Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation | Fabian Konstantinidis et.al. | 2512.05812 | translate | read | null |
| 2025-12-05 | Real-time Remote Tracking and Autonomous Planning for Whale Rendezvous using Robots | Sushmita Bhattacharya et.al. | 2512.05808 | translate | read | null |
| 2025-12-05 | A Fast Anti-Jamming Cognitive Radar Deployment Algorithm Based on Reinforcement Learning | Wencheng Cai et.al. | 2512.05753 | translate | read | null |
| 2025-12-05 | A High-Order Immersed Boundary Method for Fluid-Structure Interaction Problems | Yingjie Xia et.al. | 2512.05733 | translate | read | null |
| 2025-12-05 | Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning | Ali Krayani et.al. | 2512.05711 | translate | read | null |
| 2025-12-05 | LA-RL: Language Action-guided Reinforcement Learning with Safety Guarantees for Autonomous Highway Driving | Yiming Shu et.al. | 2512.05686 | translate | read | null |
| 2025-12-05 | MedTutor-R1: Socratic Personalized Medical Teaching with Multi-Agent Simulation | Zhitao He et.al. | 2512.05671 | translate | read | null |
| 2025-12-05 | Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning | Zhenpeng Su et.al. | 2512.05591 | translate | read | null |
| 2025-12-05 | Distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning | Pengcheng Dai et.al. | 2512.05447 | translate | read | null |
| 2025-12-05 | ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction | Jiangtong Tan et.al. | 2512.05422 | translate | read | null |
| 2025-12-05 | State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning | Yuxiang Liu et.al. | 2512.05335 | translate | read | null |
| 2025-12-04 | Enhancing Deep Deterministic Policy Gradients on Continuous Control Tasks with Decoupled Prioritized Experience Replay | Mehmet Efe Lorasdagi et.al. | 2512.05320 | translate | read | null |
| 2025-12-04 | Bridging Interpretability and Optimization: Provably Attribution-Weighted Actor-Critic in Reproducing Kernel Hilbert Spaces | Na Li et.al. | 2512.05291 | translate | read | null |
| 2025-12-04 | Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem | Ali Al Housseini et.al. | 2512.05207 | translate | read | null |
| 2025-12-04 | ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning | Shengyuan Ding et.al. | 2512.05111 | translate | read | null |
| 2025-12-04 | STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models | Feng Xu et.al. | 2512.05107 | translate | read | null |
| 2025-12-04 | Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning | Purbesh Mitra et.al. | 2512.05105 | translate | read | link |