Reinforcement Learning - 2025-06
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-06-30 | Scaling Human Judgment in Community Notes with LLMs | Haiwen Li et.al. | 2506.24118 | translate | read | null |
| 2025-06-30 | Constructing Non-Markovian Decision Process via History Aggregator | Yongyi Wang et.al. | 2506.24026 | translate | read | null |
| 2025-06-30 | Provably Efficient and Agile Randomized Q-Learning | He Wang et.al. | 2506.24005 | translate | read | null |
| 2025-06-30 | Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning | Seungjun Yi et.al. | 2506.23998 | translate | read | null |
| 2025-06-30 | ADReFT: Adaptive Decision Repair for Safe Autonomous Driving via Reinforcement Fine-Tuning | Mingfei Cheng et.al. | 2506.23960 | translate | read | null |
| 2025-06-30 | Reinforcement Learning for Synchronised Flow Control in a Dual-Gate Resin Infusion System | Miguel Camacho-Sánchez et.al. | 2506.23923 | translate | read | null |
| 2025-06-30 | The Trilemma of Truth in Large Language Models | Germans Savcisens et.al. | 2506.23921 | translate | read | link |
| 2025-06-30 | Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning | Anton Andreychuk et.al. | 2506.23793 | translate | read | link |
| 2025-06-27 | MiCo: Multi-image Contrast for Reinforcement Visual Reasoning | Xi Chen et.al. | 2506.22434 | translate | read | null |
| 2025-06-27 | ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks | Pritam Dash et.al. | 2506.22423 | translate | read | null |
| 2025-06-27 | HyperCLOVA X THINK Technical Report | NAVER Cloud HyperCLOVA X Team et.al. | 2506.22403 | translate | read | null |
| 2025-06-27 | Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL | Tong Yang et.al. | 2506.22401 | translate | read | null |
| 2025-06-27 | Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation | Tao Li et.al. | 2506.22365 | translate | read | null |
| 2025-06-27 | Education-Oriented Graph Retrieval-Augmented Generation for Learning Path Recommendation | Xinghe Cheng et.al. | 2506.22303 | translate | read | null |
| 2025-06-27 | ReF-LLE: Personalized Low-Light Enhancement via Reference-Guided Deep Reinforcement Learning | Ming Zhao et.al. | 2506.22216 | translate | read | null |
| 2025-06-27 | A Reinforcement Learning Framework for Some Singular Stochastic Control Problems | Zongxia Liang et.al. | 2506.22203 | translate | read | null |
| 2025-06-27 | EFRame: Deeper Reasoning via Exploration-Filtering-Replay Reinforcement Learning Framework | Chen Wang et.al. | 2506.22200 | translate | read | link |
| 2025-06-27 | ASVSim (AirSim for Surface Vehicles): A High-Fidelity Simulation Framework for Autonomous Surface Vehicle Research | Bavo Lesy et.al. | 2506.22174 | translate | read | null |
| 2025-06-26 | Joint Scheduling of DER under Demand Charges: Structure and Approximation | Ruixiao Yang et.al. | 2506.21510 | translate | read | null |
| 2025-06-26 | Bridging Offline and Online Reinforcement Learning for LLMs | Jack Lanchantin et.al. | 2506.21495 | translate | read | null |
| 2025-06-26 | Reinforcement Learning for Optimal Control of Spin Magnetometers | Logan W. Cooke et.al. | 2506.21475 | translate | read | null |
| 2025-06-26 | Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage | Gavin Lee Goodship et.al. | 2506.21465 | translate | read | null |
| 2025-06-26 | Spatial Mental Modeling from Limited Views | Baiqiao Yin et.al. | 2506.21458 | translate | read | null |
| 2025-06-26 | Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | Prajwal Koirala et.al. | 2506.21427 | translate | read | null |
| 2025-06-26 | rQdia: Regularizing Q-Value Distributions With Image Augmentation | Sam Lerman et.al. | 2506.21367 | translate | read | null |
| 2025-06-26 | HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Qize Yang et.al. | 2506.21277 | translate | read | link |
| 2025-06-26 | World-aware Planning Narratives Enhance Large Vision-Language Model Planner | Junhao Shi et.al. | 2506.21230 | translate | read | null |
| 2025-06-26 | Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design | Hampus Gummesson Svensson et.al. | 2506.21158 | translate | read | null |
| 2025-06-25 | MMSearch-R1: Incentivizing LMMs to Search | Jinming Wu et.al. | 2506.20670 | translate | read | link |
| 2025-06-25 | DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy | Sungjae Park et.al. | 2506.20668 | translate | read | null |
| 2025-06-25 | The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind | Andrei Lupu et.al. | 2506.20664 | translate | read | null |
| 2025-06-25 | DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | Shansan Gong et.al. | 2506.20639 | translate | read | link |
| 2025-06-25 | PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models | Soufiane Hayou et.al. | 2506.20629 | translate | read | link |
| 2025-06-25 | Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control | Andrew Mole et.al. | 2506.20554 | translate | read | null |
| 2025-06-25 | Demonstration of effective UCB-based routing in skill-based queues on real-world data | Sanne van Kempen et.al. | 2506.20543 | translate | read | null |
| 2025-06-25 | Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards | Charles Arnal et.al. | 2506.20520 | translate | read | null |
| 2025-06-25 | OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Zengzhi Wang et.al. | 2506.20512 | translate | read | link |
| 2025-06-25 | ReCode: Updating Code API Knowledge with Reinforcement Learning | Haoze Wu et.al. | 2506.20495 | translate | read | link |
| 2025-06-24 | JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning | Ai Han et.al. | 2506.19846 | translate | read | null |
| 2025-06-24 | Temporal-IRL: Modeling Port Congestion and Berth Scheduling with Inverse Reinforcement Learning | Guo Li et.al. | 2506.19843 | translate | read | null |
| 2025-06-24 | Persona Features Control Emergent Misalignment | Miles Wang et.al. | 2506.19823 | translate | read | null |
| 2025-06-24 | KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Baochang Ren et.al. | 2506.19807 | translate | read | null |
| 2025-06-24 | Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning | Menglong Zhang et.al. | 2506.19785 | translate | read | null |
| 2025-06-24 | SAGE: Strategy-Adaptive Generation Engine for Query Rewriting | Teng Wang et.al. | 2506.19783 | translate | read | null |
| 2025-06-24 | Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment | Yuhui Sun et.al. | 2506.19780 | translate | read | null |
| 2025-06-24 | SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning | Yuqian Fu et.al. | 2506.19767 | translate | read | null |
| 2025-06-24 | Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks | Nathan Maurer et.al. | 2506.19703 | translate | read | null |
| 2025-06-24 | From memories to maps: Mechanisms of in context reinforcement learning in transformers | Ching Fang et.al. | 2506.19686 | translate | read | null |
| 2025-06-23 | ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Jiaru Zou et.al. | 2506.18896 | translate | read | null |
| 2025-06-23 | Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning | Anthony Kobanda et.al. | 2506.18847 | translate | read | null |
| 2025-06-23 | LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning | Yuhao Wu et.al. | 2506.18841 | translate | read | null |
| 2025-06-23 | SViP: Sequencing Bimanual Visuomotor Policies with Object-Centric Motion Primitives | Yizhou Chen et.al. | 2506.18825 | translate | read | null |
| 2025-06-23 | MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image Segmentation | Ruicheng Zhang et.al. | 2506.18679 | translate | read | null |
| 2025-06-23 | Harnessing the Power of Reinforcement Learning for Language-Model-Based Information Retriever via Query-Document Co-Augmentation | Jingming Liu et.al. | 2506.18670 | translate | read | null |
| 2025-06-23 | RL-Driven Semantic Compression Model Selection and Resource Allocation in Semantic Communication Systems | Xinyi Lin et.al. | 2506.18660 | translate | read | null |
| 2025-06-23 | Dual-level Behavioral Consistency for Inter-group and Intra-group Coordination in Multi-Agent Systems | Shuocun Yang et.al. | 2506.18651 | translate | read | null |
| 2025-06-23 | Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits | Yannik Mahlau et.al. | 2506.18627 | translate | read | null |
| 2025-06-23 | Policy gradient methods for ordinal policies | Simón Weinberger et.al. | 2506.18614 | translate | read | null |
| 2025-06-20 | No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | Yanzhi Zhang et.al. | 2506.17219 | translate | read | null |
| 2025-06-20 | Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | Zeyuan Yang et.al. | 2506.17218 | translate | read | null |
| 2025-06-20 | BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning | Xuechen Zhang et.al. | 2506.17211 | translate | read | null |
| 2025-06-20 | Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning | Guozheng Ma et.al. | 2506.17204 | translate | read | null |
| 2025-06-20 | Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Samin Yeasar Arnob et.al. | 2506.17155 | translate | read | null |
| 2025-06-20 | When Can Model-Free Reinforcement Learning be Enough for Thinking? | Josiah P. Hanna et.al. | 2506.17124 | translate | read | null |
| 2025-06-20 | TransDreamerV3: Implanting Transformer In DreamerV3 | Shruti Sadanand Dongare et.al. | 2506.17103 | translate | read | null |
| 2025-06-20 | Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs | Ricardo Rei et.al. | 2506.17080 | translate | read | null |
| 2025-06-20 | Scalable and Reliable Multi-agent Reinforcement Learning for Traffic Assignment | Leizhen Wang et.al. | 2506.17029 | translate | read | null |
| 2025-06-20 | Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators | Marco Jiralerspong et.al. | 2506.17007 | translate | read | null |
| 2025-06-18 | Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards | Qingming Liu et.al. | 2506.15684 | translate | read | null |
| 2025-06-18 | CC-LEARN: Cohort-based Consistency Learning | Xiao Ye et.al. | 2506.15662 | translate | read | null |
| 2025-06-18 | CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Ranting Hu et.al. | 2506.15654 | translate | read | null |
| 2025-06-18 | AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning | Tevin Wang et.al. | 2506.15651 | translate | read | null |
| 2025-06-18 | Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement | Weixiang Zhao et.al. | 2506.15647 | translate | read | null |
| 2025-06-18 | Learning to flock in open space by avoiding collisions and staying together | Martino Brambati et.al. | 2506.15587 | translate | read | null |
| 2025-06-18 | Design of an all-facet illuminator for high NA EUV lithography exposure tool based on deep reinforcement learning | Tong Li et.al. | 2506.15558 | translate | read | null |
| 2025-06-18 | Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning | Roger Creus Castanyer et.al. | 2506.15544 | translate | read | link |
| 2025-06-18 | Lessons from Training Grounded LLMs with Verifiable Rewards | Shang Hong Sim et.al. | 2506.15522 | translate | read | null |
| 2025-06-18 | Zero-Shot Reinforcement Learning Under Partial Observability | Scott Jeen et.al. | 2506.15446 | translate | read | null |
| 2025-06-17 | Reasoning with Exploration: An Entropy Perspective | Daixuan Cheng et.al. | 2506.14758 | translate | read | null |
| 2025-06-17 | Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation | Carolina Higuera et.al. | 2506.14754 | translate | read | null |
| 2025-06-17 | Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Ring Team et.al. | 2506.14731 | translate | read | null |
| 2025-06-17 | Adaptive Accompaniment with ReaLchords | Yusong Wu et.al. | 2506.14723 | translate | read | null |
| 2025-06-17 | SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning | Hexian Ni et.al. | 2506.14648 | translate | read | null |
| 2025-06-17 | On Quantum BSDE Solver for High-Dimensional Parabolic PDEs | Howard Su et.al. | 2506.14612 | translate | read | null |
| 2025-06-17 | TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization | Mingkang Zhu et.al. | 2506.14574 | translate | read | null |
| 2025-06-17 | Toward Safety-First Human-Like Decision Making for Autonomous Vehicles in Time-Varying Traffic Flow | Xiao Wang et.al. | 2506.14502 | translate | read | null |
| 2025-06-17 | Zeroth-Order Optimization is Secretly Single-Step Policy Optimization | Junbin Qiu et.al. | 2506.14460 | translate | read | null |
| 2025-06-17 | Toward Rich Video Human-Motion2D Generation | Ruihao Xi et.al. | 2506.14428 | translate | read | null |
| 2025-06-16 | Touch begins where vision ends: Generalizable policies for contact-rich manipulation | Zifan Zhao et.al. | 2506.13762 | translate | read | null |
| 2025-06-16 | MARCO: Hardware-Aware Neural Architecture Search for Edge Devices with Multi-Agent Reinforcement Learning and Conformal Prediction Filtering | Arya Fayyazi et.al. | 2506.13755 | translate | read | null |
| 2025-06-16 | LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction | Haoru Xue et.al. | 2506.13751 | translate | read | null |
| 2025-06-16 | PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning | Brahim Driss et.al. | 2506.13741 | translate | read | null |
| 2025-06-16 | TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning | Junru Zhang et.al. | 2506.13705 | translate | read | link |
| 2025-06-16 | Value-Free Policy Optimization via Reward Partitioning | Bilal Faye et.al. | 2506.13702 | translate | read | null |
| 2025-06-16 | OneRec Technical Report | Guorui Zhou et.al. | 2506.13695 | translate | read | null |
| 2025-06-16 | Meta-learning how to Share Credit among Macro-Actions | Ionel-Alexandru Hosu et.al. | 2506.13690 | translate | read | null |
| 2025-06-16 | The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning | Jiashun Liu et.al. | 2506.13672 | translate | read | null |
| 2025-06-16 | We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems | Junfeng Fang et.al. | 2506.13666 | translate | read | null |
| 2025-06-13 | Schema-R1: A reasoning training approach for schema linking in Text-to-SQL Task | Wuzhenghong Wen et.al. | 2506.11986 | translate | read | null |
| 2025-06-13 | Self-Regulating Cars: Automating Traffic Control in Free Flow Road Networks | Ankit Bhardwaj et.al. | 2506.11973 | translate | read | null |
| 2025-06-13 | Visual Pre-Training on Unlabeled Images using Reinforcement Learning | Dibya Ghosh et.al. | 2506.11967 | translate | read | null |
| 2025-06-13 | Automated Treatment Planning for Interstitial HDR Brachytherapy for Locally Advanced Cervical Cancer using Deep Reinforcement Learning | Mohammadamin Moradi et.al. | 2506.11957 | translate | read | null |
| 2025-06-13 | SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies | Nadun Ranawaka Arachchige et.al. | 2506.11948 | translate | read | null |
| 2025-06-13 | Breaking Habits: On the Role of the Advantage Function in Learning Causal State Representations | Miguel Suau et.al. | 2506.11912 | translate | read | null |
| 2025-06-13 | Palpation Alters Auditory Pain Expressions with Gender-Specific Variations in Robopatients | Chapa Sirithunge et.al. | 2506.11906 | translate | read | null |
| 2025-06-13 | TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Zhenyu Hou et.al. | 2506.11902 | translate | read | link |
| 2025-06-13 | An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing | Haochen Sun et.al. | 2506.11882 | translate | read | null |
| 2025-06-13 | LLM-based Dynamic Differential Testing for Database Connectors with Reinforcement Learning-Guided Prompt Selection | Ce Lyu et.al. | 2506.11870 | translate | read | null |
| 2025-06-12 | Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop | Justin Kerr et.al. | 2506.10968 | translate | read | null |
| 2025-06-12 | Spurious Rewards: Rethinking Training Signals in RLVR | Rulin Shao et.al. | 2506.10947 | translate | read | link |
| 2025-06-12 | Self-Adapting Language Models | Adam Zweiger et.al. | 2506.10943 | translate | read | null |
| 2025-06-12 | Magistral | Mistral-AI et.al. | 2506.10910 | translate | read | null |
| 2025-06-12 | Adaptive Job Scheduling in Quantum Clouds Using Reinforcement Learning | Waylon Luo et.al. | 2506.10889 | translate | read | null |
| 2025-06-12 | Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization | Pierre-François Massiani et.al. | 2506.10871 | translate | read | null |
| 2025-06-13 | Joint Beamforming with Extremely Large Scale RIS: A Sequential Multi-Agent A2C Approach | Zhi Chai et.al. | 2506.10815 | translate | read | null |
| 2025-06-12 | Human-Robot Navigation using Event-based Cameras and Reinforcement Learning | Ignacio Bugueno-Cordova et.al. | 2506.10790 | translate | read | null |
| 2025-06-12 | PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework | SiXiang Chen et.al. | 2506.10741 | translate | read | link |
| 2025-06-12 | Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs | Yucong Luo et.al. | 2506.10630 | translate | read | null |
| 2025-06-11 | Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Junfei Wu et.al. | 2506.09965 | translate | read | link |
| 2025-06-11 | VerIF: Verification Engineering for Reinforcement Learning in Instruction Following | Hao Peng et.al. | 2506.09942 | translate | read | link |
| 2025-06-11 | The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability | Jiachen Hu et.al. | 2506.09940 | translate | read | null |
| 2025-06-11 | From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models | Irving Fang et.al. | 2506.09930 | translate | read | link |
| 2025-06-11 | “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) | Noel Brindise et.al. | 2506.09901 | translate | read | null |
| 2025-06-11 | Hierarchical Learning-Enhanced MPC for Safe Crowd Navigation with Heterogeneous Constraints | Huajian Liu et.al. | 2506.09859 | translate | read | null |
| 2025-06-11 | Foundation Model-Aided Deep Reinforcement Learning for RIS-Assisted Wireless Communication | Mohammad Ghassemi et.al. | 2506.09855 | translate | read | null |
| 2025-06-11 | CoRT: Code-integrated Reasoning within Thinking | Chengpeng Li et.al. | 2506.09820 | translate | read | link |
| 2025-06-11 | Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy | Tonghe Wang et.al. | 2506.09805 | translate | read | null |
| 2025-06-11 | Reinforced Refinement with Self-Aware Expansion for End-to-End Autonomous Driving | Haochen Liu et.al. | 2506.09800 | translate | read | null |
| 2025-06-09 | Play to Generalize: Learning to Reason Through Game Play | Yunfei Xie et.al. | 2506.08011 | translate | read | link |
| 2025-06-09 | Reinforcement Pre-Training | Qingxiu Dong et.al. | 2506.08007 | translate | read | null |
| 2025-06-09 | Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator | Alberto Bazán-Guillén et.al. | 2506.07980 | translate | read | null |
| 2025-06-09 | Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction | Junhong Shen et.al. | 2506.07976 | translate | read | link |
| 2025-06-09 | A Generative Physics-Informed Reinforcement Learning-Based Approach for Construction of Representative Drive Cycle | Amirreza Yasami et.al. | 2506.07929 | translate | read | null |
| 2025-06-09 | LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement | Dimitris Panagopoulos et.al. | 2506.07915 | translate | read | null |
| 2025-06-09 | WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Jie Yang et.al. | 2506.07905 | translate | read | link |
| 2025-06-09 | MiniCPM4: Ultra-Efficient LLMs on End Devices | MiniCPM Team et.al. | 2506.07900 | translate | read | link |
| 2025-06-09 | Diffusion-RL for Scalable Resource Allocation for 6G Networks | Salar Nouri et.al. | 2506.07880 | translate | read | null |
| 2025-06-09 | Versatile Loco-Manipulation through Flexible Interlimb Coordination | Xinghao Zhu et.al. | 2506.07876 | translate | read | null |
| 2025-06-06 | Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens | Jihwan Jeong et.al. | 2506.06261 | translate | read | null |
| 2025-06-06 | How to craft a deep reinforcement learning policy for wind farm flow control | Elie Kadoche et.al. | 2506.06204 | translate | read | null |
| 2025-06-06 | Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization | Jonathan Yang et.al. | 2506.06196 | translate | read | null |
| 2025-06-06 | A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization | Muhammed Ustaomeroglu et.al. | 2506.06179 | translate | read | null |
| 2025-06-06 | Reusing Trajectories in Policy Gradients Enables Fast Convergence | Alessandro Montenegro et.al. | 2506.06178 | translate | read | null |
| 2025-06-06 | Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach | James Ford et.al. | 2506.06175 | translate | read | null |
| 2025-06-06 | Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models | Rihui Jin et.al. | 2506.06137 | translate | read | null |
| 2025-06-06 | Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library | Weixun Wang et.al. | 2506.06122 | translate | read | link |
| 2025-06-06 | On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems | Elim Kwan et.al. | 2506.06094 | translate | read | null |
| 2025-06-06 | Reinforcing Code Generation: Improving Text-to-SQL with Execution-Based Learning | Atharv Kulkarni et.al. | 2506.06093 | translate | read | null |
| 2025-06-05 | ContentV: Efficient Training of Video Generation Models with Limited Compute | Wenfeng Lin et.al. | 2506.05343 | translate | read | null |
| 2025-06-05 | AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs | Lidong Lu et.al. | 2506.05328 | translate | read | link |
| 2025-06-05 | Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay | Yifan Sun et.al. | 2506.05316 | translate | read | null |
| 2025-06-05 | Estimation of Treatment Effects Under Nonstationarity via Truncated Difference-in-Q’s | Ramesh Johari et.al. | 2506.05308 | translate | read | null |
| 2025-06-05 | A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search | Arnav Kumar Jain et.al. | 2506.05294 | translate | read | link |
| 2025-06-06 | Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning | Violet Xiang et.al. | 2506.05256 | translate | read | null |
| 2025-06-05 | Towards Language-Augmented Multi-Agent Deep Reinforcement Learning | Maxime Toquebiau et.al. | 2506.05236 | translate | read | null |
| 2025-06-05 | Optimal-PhiBE: A PDE-based Model-free framework for Continuous-time Reinforcement Learning | Yuhua Zhu et.al. | 2506.05208 | translate | read | null |
| 2025-06-05 | TreeRPO: Tree Relative Policy Optimization | Zhicheng Yang et.al. | 2506.05183 | translate | read | link |
| 2025-06-05 | Fabrica: Dual-Arm Assembly of General Multi-Part Objects via Integrated Planning and Learning | Yunsheng Tian et.al. | 2506.05168 | translate | read | null |
| 2025-06-04 | Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | Shuang Chen et.al. | 2506.04207 | translate | read | link |
| 2025-06-04 | MACS: Multi-Agent Reinforcement Learning for Optimization of Crystal Structures | Elena Zamaraeva et.al. | 2506.04195 | translate | read | null |
| 2025-06-04 | R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning | Qingfei Zhao et.al. | 2506.04185 | translate | read | link |
| 2025-06-04 | Horizon Reduction Makes RL Scalable | Seohong Park et.al. | 2506.04168 | translate | read | null |
| 2025-06-04 | SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL | Jiaheng Hu et.al. | 2506.04147 | translate | read | null |
| 2025-06-04 | Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning | Muling Wu et.al. | 2506.04065 | translate | read | null |
| 2025-06-04 | Crowd-SFT: Crowdsourcing for LLM Alignment | Alex Sotiropoulos et.al. | 2506.04063 | translate | read | null |
| 2025-06-04 | Autonomous Vehicle Lateral Control Using Deep Reinforcement Learning with MPC-PID Demonstration | Chengdong Wu et.al. | 2506.04040 | translate | read | null |
| 2025-06-04 | Interpretability by Design for Efficient Multi-Objective Reinforcement Learning | Qiyue Xia et.al. | 2506.04022 | translate | read | null |
| 2025-06-04 | Boosting Open-Source LLMs for Program Repair via Reasoning Transfer and LLM-Guided Reinforcement Learning | Xunzhu Tang et.al. | 2506.03921 | translate | read | null |
| 2025-06-03 | Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning | Yinjie Wang et.al. | 2506.03136 | translate | read | link |
| 2025-06-03 | AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation | Prashanth Vijayaraghavan et.al. | 2506.03122 | translate | read | null |
| 2025-06-03 | Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback | Xiaoying Zhang et.al. | 2506.03106 | translate | read | link |
| 2025-06-03 | EgoVLM: Policy Optimization for Egocentric Video Understanding | Ashwin Vinod et.al. | 2506.03097 | translate | read | link |
| 2025-06-03 | DPO Learning with LLMs-Judge Signal for Computer Use Agents | Man Luo et.al. | 2506.03095 | translate | read | null |
| 2025-06-03 | Provable Reinforcement Learning from Human Feedback with an Unknown Link Function | Qining Zhang et.al. | 2506.03066 | translate | read | null |
| 2025-06-03 | EDEN: Entorhinal Driven Egocentric Navigation Toward Robotic Deployment | Mikolaj Walczak et.al. | 2506.03046 | translate | read | null |
| 2025-06-03 | Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective | Jintian Shao et.al. | 2506.03038 | translate | read | null |
| 2025-06-03 | MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver | Yuepeng Zheng et.al. | 2506.02935 | translate | read | null |
| 2025-06-03 | Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning | Yin Fang et.al. | 2506.02911 | translate | read | link |
| 2025-06-03 | Reinforcing Video Reasoning with Focused Thinking | Jisheng Dang et.al. | 2505.24718 | translate | read | link |
(<a href="../Reinforcement_Learning.md">back to Reinforcement Learning</a>)