Reinforcement Learning - 2025-10
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-10-31 | Reinforcement Learning for Resource Allocation in Vehicular Multi-Fog Computing | Mohammad Hadi Akbarzadeh et.al. | 2511.00276 | translate | read | null |
| 2025-10-31 | Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning | Michiel Straat et.al. | 2511.00272 | translate | read | null |
| 2025-10-31 | Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning | Marwa Abdulhai et.al. | 2511.00222 | translate | read | null |
| 2025-10-31 | Iterative Foundation Model Fine-Tuning on Multiple Rewards | Pouya M. Ghari et.al. | 2511.00220 | translate | read | null |
| 2025-10-31 | Deep reinforcement learning for optimal trading with partial information | Andrea Macrì et.al. | 2511.00190 | translate | read | null |
| 2025-10-31 | Study on Supply Chain Finance Decision-Making Model and Enterprise Economic Performance Prediction Based on Deep Reinforcement Learning | Shiman Zhang et.al. | 2511.00166 | translate | read | null |
| 2025-10-31 | EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations | Justin Yu et.al. | 2511.00153 | translate | read | null |
| 2025-10-31 | A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control | Qing Guo et.al. | 2511.00136 | translate | read | null |
| 2025-10-31 | DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads | Antonio Guillen-Perez et.al. | 2511.00117 | translate | read | null |
| 2025-10-31 | LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers | Avisek Naug et.al. | 2511.00116 | translate | read | null |
| 2025-10-31 | End-to-End Framework Integrating Generative AI and Deep Reinforcement Learning for Autonomous Ultrasound Scanning | Hanae Elmekki et.al. | 2511.00114 | translate | read | null |
| 2025-10-30 | Real-DRL: Teach and Learn in Reality | Yanbing Mao et.al. | 2511.00112 | translate | read | null |
| 2025-10-30 | Self-Improving Vision-Language-Action Models with Data Generation via Residual RL | Wenli Xiao et.al. | 2511.00091 | translate | read | null |
| 2025-10-30 | Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail | NVIDIA et.al. | 2511.00088 | translate | read | null |
| 2025-10-29 | Token-Regulated Group Relative Policy Optimization for Stable Reinforcement Learning in Large Language Models | Tue Le et.al. | 2511.00066 | translate | read | null |
| 2025-10-31 | Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems | Alireza Saleh Abadi et.al. | 2510.27659 | translate | read | null |
| 2025-10-31 | Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning | Yuhong Liu et.al. | 2510.27606 | translate | read | link |
| 2025-10-31 | MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval | Qi Luo et.al. | 2510.27569 | translate | read | null |
| 2025-10-31 | Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval | Yulong Hui et.al. | 2510.27566 | translate | read | null |
| 2025-10-31 | VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision | Xuan Gong et.al. | 2510.27462 | translate | read | null |
| 2025-10-31 | Learning Soft Robotic Dynamics with Active Exploration | Hehui Zheng et.al. | 2510.27428 | translate | read | null |
| 2025-10-31 | DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains | Tian Liang et.al. | 2510.27419 | translate | read | null |
| 2025-10-31 | Realistic pedestrian-driver interaction modelling using multi-agent RL with human perceptual-motor constraints | Yueyang Wang et.al. | 2510.27383 | translate | read | null |
| 2025-10-31 | Reasoning Models Sometimes Output Illegible Chains of Thought | Arun Jose et.al. | 2510.27338 | translate | read | null |
| 2025-10-31 | When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making | Ali Raza Jafree et.al. | 2510.27334 | translate | read | null |
| 2025-10-31 | Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines | Kristina Levina et.al. | 2510.27329 | translate | read | null |
| 2025-10-31 | A Digital Twin-based Multi-Agent Reinforcement Learning Framework for Vehicle-to-Grid Coordination | Zhengchang Hua et.al. | 2510.27289 | translate | read | null |
| 2025-10-31 | Inferring trust in recommendation systems from brain, behavioural, and physiological data | Vincent K. M. Cheung et.al. | 2510.27272 | translate | read | null |
| 2025-10-31 | MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models | Kangkun Mao et.al. | 2510.27267 | translate | read | null |
| 2025-10-31 | GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation | Tao Liu et.al. | 2510.27210 | translate | read | null |
| 2025-10-31 | ShapleyPipe: Hierarchical Shapley Search for Data Preparation Pipeline Construction | Jing Chang et.al. | 2510.27168 | translate | read | null |
| 2025-10-31 | Disrupting Networks: Amplifying Social Dissensus via Opinion Perturbation and Large Language Models | Erica Coppolillo et.al. | 2510.27152 | translate | read | null |
| 2025-10-31 | AURA: A Reinforcement Learning Framework for AI-Driven Adaptive Conversational Surveys | Jinwen Tang et.al. | 2510.27126 | translate | read | null |
| 2025-10-31 | Towards Understanding Self-play for LLM Reasoning | Justin Yang Chae et.al. | 2510.27072 | translate | read | null |
| 2025-10-31 | Distributed Precoding for Cell-free Massive MIMO in O-RAN: A Multi-agent Deep Reinforcement Learning Framework | Mohammad Hossein Shokouhi et.al. | 2510.27069 | translate | read | null |
| 2025-10-31 | Adaptive Human-Computer Interaction Strategies Through Reinforcement Learning in Complex | Rui Liu et.al. | 2510.27058 | translate | read | null |
| 2025-10-30 | SpikeATac: A Multimodal Tactile Finger with Taxelized Dynamic Sensing for Dexterous Manipulation | Eric T. Chang et.al. | 2510.27048 | translate | read | null |
| 2025-10-30 | Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning | Md Tanvirul Alam et.al. | 2510.27044 | translate | read | link |
| 2025-10-30 | e1: Learning Adaptive Control of Reasoning Effort | Michael Kleinman et.al. | 2510.27042 | translate | read | null |
| 2025-10-30 | Algorithmic Predation: Equilibrium Analysis in Dynamic Oligopolies with Smooth Market Sharing | Fabian Raoul Pieroth et.al. | 2510.27008 | translate | read | null |
| 2025-10-30 | A Framework for Fair Evaluation of Variance-Aware Bandit Algorithms | Elise Wolf et.al. | 2510.27001 | translate | read | null |
| 2025-10-30 | Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench | Fenfen Lin et.al. | 2510.26865 | translate | read | link |
| 2025-10-30 | Defeating the Training-Inference Mismatch via FP16 | Penghui Qi et.al. | 2510.26788 | translate | read | link |
| 2025-10-30 | A General Incentives-Based Framework for Fairness in Multi-agent Resource Allocation | Ashwin Kumar et.al. | 2510.26740 | translate | read | null |
| 2025-10-30 | Stabilizing Rayleigh-Benard convection with reinforcement learning trained on a reduced-order model | Qiwei Chen et.al. | 2510.26705 | translate | read | null |
| 2025-10-30 | Kimi Linear: An Expressive, Efficient Attention Architecture | Kimi Team et.al. | 2510.26692 | translate | read | link |
| 2025-10-30 | Action-Driven Processes for Continuous-Time Control | Ruimin He et.al. | 2510.26672 | translate | read | null |
| 2025-10-30 | Hybrid Consistency Policy: Decoupling Multi-Modal Diversity and Real-Time Efficiency in Robotic Manipulation | Qianyou Zhao et.al. | 2510.26670 | translate | read | null |
| 2025-10-30 | The Era of Agentic Organization: Learning to Organize with Language Models | Zewen Chi et.al. | 2510.26658 | translate | read | null |
| 2025-10-30 | Hybrid DQN-TD3 Reinforcement Learning for Autonomous Navigation in Dynamic Environments | Xiaoyi He et.al. | 2510.26646 | translate | read | null |
| 2025-10-30 | Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications | Chuang Zhang et.al. | 2510.26628 | translate | read | null |
| 2025-10-30 | A DRL-Empowered Multi-Level Jamming Approach for Secure Semantic Communication | Weixuan Chen et.al. | 2510.26610 | translate | read | null |
| 2025-10-30 | Emu3.5: Native Multimodal Models are World Learners | Yufeng Cui et.al. | 2510.26583 | translate | read | link |
| 2025-10-30 | InfoFlow: Reinforcing Search Agent Via Reward Density Optimization | Kun Luo et.al. | 2510.26575 | translate | read | null |
| 2025-10-30 | Adaptive Inverse Kinematics Framework for Learning Variable-Length Tool Manipulation in Robotics | Prathamesh Kothavale et.al. | 2510.26551 | translate | read | null |
| 2025-10-30 | Think Outside the Policy: In-Context Steered Policy Optimization | Hsiu-Yuan Huang et.al. | 2510.26519 | translate | read | null |
| 2025-10-30 | Data-Efficient RLVR via Off-Policy Influence Guidance | Erle Zhu et.al. | 2510.26491 | translate | read | null |
| 2025-10-30 | ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems | Qiaoling Chen et.al. | 2510.26475 | translate | read | null |
| 2025-10-30 | PolarZero: A Reinforcement Learning Approach for Low-Complexity Polarization Kernel Design | Yi-Ting Hong et.al. | 2510.26452 | translate | read | null |
| 2025-10-30 | An Impulse Control Approach to Market Making in a Hawkes LOB Market | Konark Jain et.al. | 2510.26438 | translate | read | null |
| 2025-10-30 | Human-in-the-loop Online Rejection Sampling for Robotic Manipulation | Guanxing Lu et.al. | 2510.26406 | translate | read | null |
| 2025-10-30 | Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning | Wenchang Duan et.al. | 2510.26389 | translate | read | null |
| 2025-10-30 | Towards Reinforcement Learning Based Log Loading Automation | Ilya Kurinov et.al. | 2510.26363 | translate | read | null |
| 2025-10-30 | Reinforcement Learning for Pollution Detection in a Randomized, Sparse and Nonstationary Environment with an Autonomous Underwater Vehicle | Sebastian Zieglmeier et.al. | 2510.26347 | translate | read | null |
| 2025-10-30 | Offline Clustering of Preference Learning with Active-data Augmentation | Jingyuan Liu et.al. | 2510.26301 | translate | read | null |
| 2025-10-30 | Beyond Imitation: Constraint-Aware Trajectory Generation with Flow Matching For End-to-End Autonomous Driving | Lin Liu et.al. | 2510.26292 | translate | read | null |
| 2025-10-30 | Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search | Guochang Li et.al. | 2510.26287 | translate | read | null |
| 2025-10-30 | Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments | Gangyang Li et.al. | 2510.26280 | translate | read | null |
| 2025-10-30 | Graph-Enhanced Policy Optimization in LLM Agent Training | Jiazhen Yuan et.al. | 2510.26270 | translate | read | null |
| 2025-10-30 | A Game-Theoretic Spatio-Temporal Reinforcement Learning Framework for Collaborative Public Resource Allocation | Songxin Lei et.al. | 2510.26184 | translate | read | null |
| 2025-10-30 | One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning | Renhao Li et.al. | 2510.26167 | translate | read | null |
| 2025-10-30 | Learning to Manage Investment Portfolios beyond Simple Utility Functions | Maarten P. Scholl et.al. | 2510.26165 | translate | read | null |
| 2025-10-30 | Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math | Bo Pang et.al. | 2510.26143 | translate | read | null |
| 2025-10-30 | EgoExo-Con: Exploring View-Invariant Video Temporal Understanding | Minjoon Jung et.al. | 2510.26113 | translate | read | null |
| 2025-10-30 | Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error | Chenming Tang et.al. | 2510.26109 | translate | read | null |
| 2025-10-30 | GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks | Chenrui Shi et.al. | 2510.26098 | translate | read | null |
| 2025-10-30 | Network-Constrained Policy Optimization for Adaptive Multi-agent Vehicle Routing | Fazel Arasteh et.al. | 2510.26089 | translate | read | null |
| 2025-10-30 | Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion | Chi Zhang et.al. | 2510.26067 | translate | read | null |
| 2025-10-30 | Accelerating Real-World Overtaking in F1TENTH Racing Employing Reinforcement Learning Methods | Emily Steiner et.al. | 2510.26040 | translate | read | null |
| 2025-10-29 | Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation | Feichen Gan et.al. | 2510.26026 | translate | read | null |
| 2025-10-29 | PORTool: Tool-Use LLM Training with Rewarded Tree | Feijie Wu et.al. | 2510.26020 | translate | read | null |
| 2025-10-29 | Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning | Yihe Deng et.al. | 2510.25992 | translate | read | null |
| 2025-10-29 | Estimating cognitive biases with attention-aware inverse planning | Sounak Banerjee et.al. | 2510.25951 | translate | read | null |
| 2025-10-29 | InputDSA: Demixing then Comparing Recurrent and Externally Driven Dynamics | Ann Huang et.al. | 2510.25943 | translate | read | null |
| 2025-10-29 | Multi-Agent Reinforcement Learning for Market Making: Competition without Collusion | Ziyi Wang et.al. | 2510.25929 | translate | read | null |
| 2025-10-29 | $π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models | Kang Chen et.al. | 2510.25889 | translate | read | null |
| 2025-10-29 | Approximating Human Preferences Using a Multi-Judge Learned System | Eitán Sprejer et.al. | 2510.25884 | translate | read | null |
| 2025-10-29 | MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs | Xiaoke Huang et.al. | 2510.25867 | translate | read | null |
| 2025-10-29 | Adversarial Pre-Padding: Generating Evasive Network Traffic Against Transformer-Based Classifiers | Quanliang Jing et.al. | 2510.25810 | translate | read | null |
| 2025-10-29 | MetaLore: Learning to Orchestrate Communication and Computation for Metaverse Synchronization | Elif Ebru Ohri et.al. | 2510.25705 | translate | read | null |
| 2025-10-29 | PairUni: Pairwise Training for Unified Multimodal Language Models | Jiani Zheng et.al. | 2510.25682 | translate | read | null |
| 2025-10-29 | Navigation in a Three-Dimensional Urban Flow using Deep Reinforcement Learning | Federica Tonti et.al. | 2510.25679 | translate | read | null |
| 2025-10-29 | ALDEN: Reinforcement Learning for Active Navigation and Evidence Gathering in Long Documents | Tianyu Yang et.al. | 2510.25668 | translate | read | null |
| 2025-10-29 | Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills | Weikang Wan et.al. | 2510.25634 | translate | read | null |
| 2025-10-29 | EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis | Yusheng Liao et.al. | 2510.25628 | translate | read | null |
| 2025-10-29 | On the instability of local learning algorithms: Q-learning can fail in infinite state spaces | Urtzi Ayesta et.al. | 2510.25572 | translate | read | null |
| 2025-10-29 | Deep Reinforcement Learning-Based Cooperative Rate Splitting for Satellite-to-Underground Communication Networks | Kaiqiang Lin et.al. | 2510.25562 | translate | read | null |
| 2025-10-29 | Off-policy Reinforcement Learning with Model-based Exploration Augmentation | Likun Wang et.al. | 2510.25529 | translate | read | null |
| 2025-10-29 | Zero Reinforcement Learning Towards General Domains | Yuyuan Zeng et.al. | 2510.25528 | translate | read | null |
| 2025-10-29 | MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL | Zekun Xu et.al. | 2510.25510 | translate | read | null |
| 2025-10-29 | Dynamic Beamforming and Power Allocation in ISAC via Deep Reinforcement Learning | Duc Nguyen Dao et.al. | 2510.25496 | translate | read | null |
| 2025-10-29 | Reinforcement Learning techniques for the flavor problem in particle physics | A. Giarnetti et.al. | 2510.25495 | translate | read | null |
| 2025-10-29 | Generalized Pseudo-Relevance Feedback | Yiteng Tu et.al. | 2510.25488 | translate | read | null |
| 2025-10-29 | Sim-to-Real Gentle Manipulation of Deformable and Fragile Objects with Stress-Guided Reinforcement Learning | Kei Ikemura et.al. | 2510.25405 | translate | read | null |
| 2025-10-29 | Model-Free Robust Beamforming in Satellite Downlink using Reinforcement Learning | Alea Schröder et.al. | 2510.25393 | translate | read | null |
| 2025-10-29 | Multi-party Agent Relation Sampling for Multi-party Ad Hoc Teamwork | Beiwen Zhang et.al. | 2510.25340 | translate | read | null |
| 2025-10-29 | GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning | Jiaqi Wu et.al. | 2510.25320 | translate | read | null |
| 2025-10-29 | Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning | Sagalpreet Singh et.al. | 2510.25311 | translate | read | null |
| 2025-10-29 | Adaptive Design of mmWave Initial Access Codebooks using Reinforcement Learning | Sabrine Aroua et.al. | 2510.25271 | translate | read | null |
| 2025-10-29 | The influence of the random numbers quality on the results in stochastic simulations and machine learning | Benjamin A. Antunes et.al. | 2510.25269 | translate | read | null |
| 2025-10-29 | SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation | Wang zhi et.al. | 2510.25268 | translate | read | null |
| 2025-10-29 | One-shot Humanoid Whole-body Motion Learning | Hao Huang et.al. | 2510.25241 | translate | read | null |
(<a href=../Reinforcement_Learning.md>back to Reinforcement Learning</a>)