Reinforcement Learning - 2025-10

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-10-31	Reinforcement Learning for Resource Allocation in Vehicular Multi-Fog Computing	Mohammad Hadi Akbarzadeh et.al.	2511.00276	translate	read	null
2025-10-31	Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning	Michiel Straat et.al.	2511.00272	translate	read	null
2025-10-31	Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning	Marwa Abdulhai et.al.	2511.00222	translate	read	null
2025-10-31	Iterative Foundation Model Fine-Tuning on Multiple Rewards	Pouya M. Ghari et.al.	2511.00220	translate	read	null
2025-10-31	Deep reinforcement learning for optimal trading with partial information	Andrea Macrì et.al.	2511.00190	translate	read	null
2025-10-31	Study on Supply Chain Finance Decision-Making Model and Enterprise Economic Performance Prediction Based on Deep Reinforcement Learning	Shiman Zhang et.al.	2511.00166	translate	read	null
2025-10-31	EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations	Justin Yu et.al.	2511.00153	translate	read	null
2025-10-31	A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control	Qing Guo et.al.	2511.00136	translate	read	null
2025-10-31	DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads	Antonio Guillen-Perez et.al.	2511.00117	translate	read	null
2025-10-31	LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers	Avisek Naug et.al.	2511.00116	translate	read	null
2025-10-31	End-to-End Framework Integrating Generative AI and Deep Reinforcement Learning for Autonomous Ultrasound Scanning	Hanae Elmekki et.al.	2511.00114	translate	read	null
2025-10-30	Real-DRL: Teach and Learn in Reality	Yanbing Mao et.al.	2511.00112	translate	read	null
2025-10-30	Self-Improving Vision-Language-Action Models with Data Generation via Residual RL	Wenli Xiao et.al.	2511.00091	translate	read	null
2025-10-30	Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail	NVIDIA et.al.	2511.00088	translate	read	null
2025-10-29	Token-Regulated Group Relative Policy Optimization for Stable Reinforcement Learning in Large Language Models	Tue Le et.al.	2511.00066	translate	read	null
2025-10-31	Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems	Alireza Saleh Abadi et.al.	2510.27659	translate	read	null
2025-10-31	Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning	Yuhong Liu et.al.	2510.27606	translate	read	link
2025-10-31	MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval	Qi Luo et.al.	2510.27569	translate	read	null
2025-10-31	Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval	Yulong Hui et.al.	2510.27566	translate	read	null
2025-10-31	VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision	Xuan Gong et.al.	2510.27462	translate	read	null
2025-10-31	Learning Soft Robotic Dynamics with Active Exploration	Hehui Zheng et.al.	2510.27428	translate	read	null
2025-10-31	DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains	Tian Liang et.al.	2510.27419	translate	read	null
2025-10-31	Realistic pedestrian-driver interaction modelling using multi-agent RL with human perceptual-motor constraints	Yueyang Wang et.al.	2510.27383	translate	read	null
2025-10-31	Reasoning Models Sometimes Output Illegible Chains of Thought	Arun Jose et.al.	2510.27338	translate	read	null
2025-10-31	When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making	Ali Raza Jafree et.al.	2510.27334	translate	read	null
2025-10-31	Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines	Kristina Levina et.al.	2510.27329	translate	read	null
2025-10-31	A Digital Twin-based Multi-Agent Reinforcement Learning Framework for Vehicle-to-Grid Coordination	Zhengchang Hua et.al.	2510.27289	translate	read	null
2025-10-31	Inferring trust in recommendation systems from brain, behavioural, and physiological data	Vincent K. M. Cheung et.al.	2510.27272	translate	read	null
2025-10-31	MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models	Kangkun Mao et.al.	2510.27267	translate	read	null
2025-10-31	GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation	Tao Liu et.al.	2510.27210	translate	read	null
2025-10-31	ShapleyPipe: Hierarchical Shapley Search for Data Preparation Pipeline Construction	Jing Chang et.al.	2510.27168	translate	read	null
2025-10-31	Disrupting Networks: Amplifying Social Dissensus via Opinion Perturbation and Large Language Models	Erica Coppolillo et.al.	2510.27152	translate	read	null
2025-10-31	AURA: A Reinforcement Learning Framework for AI-Driven Adaptive Conversational Surveys	Jinwen Tang et.al.	2510.27126	translate	read	null
2025-10-31	Towards Understanding Self-play for LLM Reasoning	Justin Yang Chae et.al.	2510.27072	translate	read	null
2025-10-31	Distributed Precoding for Cell-free Massive MIMO in O-RAN: A Multi-agent Deep Reinforcement Learning Framework	Mohammad Hossein Shokouhi et.al.	2510.27069	translate	read	null
2025-10-31	Adaptive Human-Computer Interaction Strategies Through Reinforcement Learning in Complex	Rui Liu et.al.	2510.27058	translate	read	null
2025-10-30	SpikeATac: A Multimodal Tactile Finger with Taxelized Dynamic Sensing for Dexterous Manipulation	Eric T. Chang et.al.	2510.27048	translate	read	null
2025-10-30	Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning	Md Tanvirul Alam et.al.	2510.27044	translate	read	link
2025-10-30	e1: Learning Adaptive Control of Reasoning Effort	Michael Kleinman et.al.	2510.27042	translate	read	null
2025-10-30	Algorithmic Predation: Equilibrium Analysis in Dynamic Oligopolies with Smooth Market Sharing	Fabian Raoul Pieroth et.al.	2510.27008	translate	read	null
2025-10-30	A Framework for Fair Evaluation of Variance-Aware Bandit Algorithms	Elise Wolf et.al.	2510.27001	translate	read	null
2025-10-30	Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench	Fenfen Lin et.al.	2510.26865	translate	read	link
2025-10-30	Defeating the Training-Inference Mismatch via FP16	Penghui Qi et.al.	2510.26788	translate	read	link
2025-10-30	A General Incentives-Based Framework for Fairness in Multi-agent Resource Allocation	Ashwin Kumar et.al.	2510.26740	translate	read	null
2025-10-30	Stabilizing Rayleigh-Benard convection with reinforcement learning trained on a reduced-order model	Qiwei Chen et.al.	2510.26705	translate	read	null
2025-10-30	Kimi Linear: An Expressive, Efficient Attention Architecture	Kimi Team et.al.	2510.26692	translate	read	link
2025-10-30	Action-Driven Processes for Continuous-Time Control	Ruimin He et.al.	2510.26672	translate	read	null
2025-10-30	Hybrid Consistency Policy: Decoupling Multi-Modal Diversity and Real-Time Efficiency in Robotic Manipulation	Qianyou Zhao et.al.	2510.26670	translate	read	null
2025-10-30	The Era of Agentic Organization: Learning to Organize with Language Models	Zewen Chi et.al.	2510.26658	translate	read	null
2025-10-30	Hybrid DQN-TD3 Reinforcement Learning for Autonomous Navigation in Dynamic Environments	Xiaoyi He et.al.	2510.26646	translate	read	null
2025-10-30	Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications	Chuang Zhang et.al.	2510.26628	translate	read	null
2025-10-30	A DRL-Empowered Multi-Level Jamming Approach for Secure Semantic Communication	Weixuan Chen et.al.	2510.26610	translate	read	null
2025-10-30	Emu3.5: Native Multimodal Models are World Learners	Yufeng Cui et.al.	2510.26583	translate	read	link
2025-10-30	InfoFlow: Reinforcing Search Agent Via Reward Density Optimization	Kun Luo et.al.	2510.26575	translate	read	null
2025-10-30	Adaptive Inverse Kinematics Framework for Learning Variable-Length Tool Manipulation in Robotics	Prathamesh Kothavale et.al.	2510.26551	translate	read	null
2025-10-30	Think Outside the Policy: In-Context Steered Policy Optimization	Hsiu-Yuan Huang et.al.	2510.26519	translate	read	null
2025-10-30	Data-Efficient RLVR via Off-Policy Influence Guidance	Erle Zhu et.al.	2510.26491	translate	read	null
2025-10-30	ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems	Qiaoling Chen et.al.	2510.26475	translate	read	null
2025-10-30	PolarZero: A Reinforcement Learning Approach for Low-Complexity Polarization Kernel Design	Yi-Ting Hong et.al.	2510.26452	translate	read	null
2025-10-30	An Impulse Control Approach to Market Making in a Hawkes LOB Market	Konark Jain et.al.	2510.26438	translate	read	null
2025-10-30	Human-in-the-loop Online Rejection Sampling for Robotic Manipulation	Guanxing Lu et.al.	2510.26406	translate	read	null
2025-10-30	Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning	Wenchang Duan et.al.	2510.26389	translate	read	null
2025-10-30	Towards Reinforcement Learning Based Log Loading Automation	Ilya Kurinov et.al.	2510.26363	translate	read	null
2025-10-30	Reinforcement Learning for Pollution Detection in a Randomized, Sparse and Nonstationary Environment with an Autonomous Underwater Vehicle	Sebastian Zieglmeier et.al.	2510.26347	translate	read	null
2025-10-30	Offline Clustering of Preference Learning with Active-data Augmentation	Jingyuan Liu et.al.	2510.26301	translate	read	null
2025-10-30	Beyond Imitation: Constraint-Aware Trajectory Generation with Flow Matching For End-to-End Autonomous Driving	Lin Liu et.al.	2510.26292	translate	read	null
2025-10-30	Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search	Guochang Li et.al.	2510.26287	translate	read	null
2025-10-30	Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments	Gangyang Li et.al.	2510.26280	translate	read	null
2025-10-30	Graph-Enhanced Policy Optimization in LLM Agent Training	Jiazhen Yuan et.al.	2510.26270	translate	read	null
2025-10-30	A Game-Theoretic Spatio-Temporal Reinforcement Learning Framework for Collaborative Public Resource Allocation	Songxin Lei et.al.	2510.26184	translate	read	null
2025-10-30	One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning	Renhao Li et.al.	2510.26167	translate	read	null
2025-10-30	Learning to Manage Investment Portfolios beyond Simple Utility Functions	Maarten P. Scholl et.al.	2510.26165	translate	read	null
2025-10-30	Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math	Bo Pang et.al.	2510.26143	translate	read	null
2025-10-30	EgoExo-Con: Exploring View-Invariant Video Temporal Understanding	Minjoon Jung et.al.	2510.26113	translate	read	null
2025-10-30	Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error	Chenming Tang et.al.	2510.26109	translate	read	null
2025-10-30	GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks	Chenrui Shi et.al.	2510.26098	translate	read	null
2025-10-30	Network-Constrained Policy Optimization for Adaptive Multi-agent Vehicle Routing	Fazel Arasteh et.al.	2510.26089	translate	read	null
2025-10-30	Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion	Chi Zhang et.al.	2510.26067	translate	read	null
2025-10-30	Accelerating Real-World Overtaking in F1TENTH Racing Employing Reinforcement Learning Methods	Emily Steiner et.al.	2510.26040	translate	read	null
2025-10-29	Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation	Feichen Gan et.al.	2510.26026	translate	read	null
2025-10-29	PORTool: Tool-Use LLM Training with Rewarded Tree	Feijie Wu et.al.	2510.26020	translate	read	null
2025-10-29	Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning	Yihe Deng et.al.	2510.25992	translate	read	null
2025-10-29	Estimating cognitive biases with attention-aware inverse planning	Sounak Banerjee et.al.	2510.25951	translate	read	null
2025-10-29	InputDSA: Demixing then Comparing Recurrent and Externally Driven Dynamics	Ann Huang et.al.	2510.25943	translate	read	null
2025-10-29	Multi-Agent Reinforcement Learning for Market Making: Competition without Collusion	Ziyi Wang et.al.	2510.25929	translate	read	null
2025-10-29	$π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models	Kang Chen et.al.	2510.25889	translate	read	null
2025-10-29	Approximating Human Preferences Using a Multi-Judge Learned System	Eitán Sprejer et.al.	2510.25884	translate	read	null
2025-10-29	MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs	Xiaoke Huang et.al.	2510.25867	translate	read	null
2025-10-29	Adversarial Pre-Padding: Generating Evasive Network Traffic Against Transformer-Based Classifiers	Quanliang Jing et.al.	2510.25810	translate	read	null
2025-10-29	MetaLore: Learning to Orchestrate Communication and Computation for Metaverse Synchronization	Elif Ebru Ohri et.al.	2510.25705	translate	read	null
2025-10-29	PairUni: Pairwise Training for Unified Multimodal Language Models	Jiani Zheng et.al.	2510.25682	translate	read	null
2025-10-29	Navigation in a Three-Dimensional Urban Flow using Deep Reinforcement Learning	Federica Tonti et.al.	2510.25679	translate	read	null
2025-10-29	ALDEN: Reinforcement Learning for Active Navigation and Evidence Gathering in Long Documents	Tianyu Yang et.al.	2510.25668	translate	read	null
2025-10-29	Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills	Weikang Wan et.al.	2510.25634	translate	read	null
2025-10-29	EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis	Yusheng Liao et.al.	2510.25628	translate	read	null
2025-10-29	On the instability of local learning algorithms: Q-learning can fail in infinite state spaces	Urtzi Ayesta et.al.	2510.25572	translate	read	null
2025-10-29	Deep Reinforcement Learning-Based Cooperative Rate Splitting for Satellite-to-Underground Communication Networks	Kaiqiang Lin et.al.	2510.25562	translate	read	null
2025-10-29	Off-policy Reinforcement Learning with Model-based Exploration Augmentation	Likun Wang et.al.	2510.25529	translate	read	null
2025-10-29	Zero Reinforcement Learning Towards General Domains	Yuyuan Zeng et.al.	2510.25528	translate	read	null
2025-10-29	MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL	Zekun Xu et.al.	2510.25510	translate	read	null
2025-10-29	Dynamic Beamforming and Power Allocation in ISAC via Deep Reinforcement Learning	Duc Nguyen Dao et.al.	2510.25496	translate	read	null
2025-10-29	Reinforcement Learning techniques for the flavor problem in particle physics	A. Giarnetti et.al.	2510.25495	translate	read	null
2025-10-29	Generalized Pseudo-Relevance Feedback	Yiteng Tu et.al.	2510.25488	translate	read	null
2025-10-29	Sim-to-Real Gentle Manipulation of Deformable and Fragile Objects with Stress-Guided Reinforcement Learning	Kei Ikemura et.al.	2510.25405	translate	read	null
2025-10-29	Model-Free Robust Beamforming in Satellite Downlink using Reinforcement Learning	Alea Schröder et.al.	2510.25393	translate	read	null
2025-10-29	Multi-party Agent Relation Sampling for Multi-party Ad Hoc Teamwork	Beiwen Zhang et.al.	2510.25340	translate	read	null
2025-10-29	GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning	Jiaqi Wu et.al.	2510.25320	translate	read	null
2025-10-29	Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning	Sagalpreet Singh et.al.	2510.25311	translate	read	null
2025-10-29	Adaptive Design of mmWave Initial Access Codebooks using Reinforcement Learning	Sabrine Aroua et.al.	2510.25271	translate	read	null
2025-10-29	The influence of the random numbers quality on the results in stochastic simulations and machine learning	Benjamin A. Antunes et.al.	2510.25269	translate	read	null
2025-10-29	SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation	Wang zhi et.al.	2510.25268	translate	read	null
2025-10-29	One-shot Humanoid Whole-body Motion Learning	Hao Huang et.al.	2510.25241	translate	read	null
