Reinforcement Learning - 2025-05
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-05-30 | ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL | Yu Zhang et.al. | 2505.24875 | translate | read | null |
| 2025-05-30 | ProxyThinker: Test-Time Guidance through Small Visual Reasoners | Zilin Xiao et.al. | 2505.24872 | translate | read | null |
| 2025-05-30 | MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning | Yiqing Liang et.al. | 2505.24871 | translate | read | null |
| 2025-05-30 | ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | Mingjie Liu et.al. | 2505.24864 | translate | read | null |
| 2025-05-30 | MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning | Jingyan Shen et.al. | 2505.24846 | translate | read | null |
| 2025-05-30 | AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models | Conor Heins et.al. | 2505.24784 | translate | read | null |
| 2025-05-30 | Diffusion-Based Symbolic Regression | Zachary Bastiani et.al. | 2505.24776 | translate | read | null |
| 2025-05-30 | REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards | Zafir Stojanovski et.al. | 2505.24760 | translate | read | link |
| 2025-05-30 | Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning | Shelly Bensal et.al. | 2505.24726 | translate | read | null |
| 2025-05-29 | ZeroGUI: Automating Online GUI Learning at Zero Human Cost | Chenyu Yang et.al. | 2505.23762 | translate | read | link |
| 2025-05-29 | DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning | Ziyin Zhang et.al. | 2505.23754 | translate | read | link |
| 2025-05-29 | PixelThink: Towards Efficient Chain-of-Pixel Reasoning | Song Wang et.al. | 2505.23727 | translate | read | null |
| 2025-05-29 | ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering | Zexi Liu et.al. | 2505.23723 | translate | read | link |
| 2025-05-29 | AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning | Lucas N. Alegre et.al. | 2505.23708 | translate | read | null |
| 2025-05-29 | Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability | Ruida Wang et.al. | 2505.23703 | translate | read | null |
| 2025-05-29 | Grounded Reinforcement Learning for Visual Reasoning | Gabriel Sarch et.al. | 2505.23678 | translate | read | null |
| 2025-05-29 | Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models | Lang Cao et.al. | 2505.23667 | translate | read | null |
| 2025-05-29 | AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction | Niklas Freymuth et.al. | 2505.23663 | translate | read | link |
| 2025-05-29 | Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation | Hongxiang Zhang et.al. | 2505.23657 | translate | read | null |
| 2025-05-28 | Maximizing Confidence Alone Improves Reasoning | Mihir Prabhudesai et.al. | 2505.22660 | translate | read | null |
| 2025-05-28 | The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason | Ang Lv et.al. | 2505.22653 | translate | read | null |
| 2025-05-28 | WebDancer: Towards Autonomous Information Seeking Agency | Jialong Wu et.al. | 2505.22648 | translate | read | null |
| 2025-05-28 | FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control | Younggyo Seo et.al. | 2505.22642 | translate | read | null |
| 2025-05-28 | SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning | Yu Zhang et.al. | 2505.22626 | translate | read | null |
| 2025-05-28 | The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models | Ganqu Cui et.al. | 2505.22617 | translate | read | null |
| 2025-05-28 | HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym | Ngoc La et.al. | 2505.22597 | translate | read | null |
| 2025-05-28 | SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning | Jiaqi Huang et.al. | 2505.22596 | translate | read | null |
| 2025-05-28 | Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs | Changhao Song et.al. | 2505.22548 | translate | read | null |
| 2025-05-28 | Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation | Hongyi Zhou et.al. | 2505.22492 | translate | read | null |
| 2025-05-27 | Reinforcing General Reasoning without Verifiers | Xiangxin Zhou et.al. | 2505.21493 | translate | read | null |
| 2025-05-27 | Policy Optimized Text-to-Image Pipeline Design | Uri Gadot et.al. | 2505.21478 | translate | read | null |
| 2025-05-27 | Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO | Muzhi Zhu et.al. | 2505.21457 | translate | read | null |
| 2025-05-27 | Can Large Reasoning Models Self-Train? | Sheikh Shafayat et.al. | 2505.21444 | translate | read | null |
| 2025-05-27 | A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment | Brett Bissey et.al. | 2505.21414 | translate | read | null |
| 2025-05-27 | MRSD: Multi-Resolution Skill Discovery for HRL Agents | Shashank Sharma et.al. | 2505.21410 | translate | read | null |
| 2025-05-27 | Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features | Zixuan Xie et.al. | 2505.21391 | translate | read | null |
| 2025-05-27 | EgoWalk: A Multimodal Dataset for Robot Navigation in the Wild | Timur Akhtyamov et.al. | 2505.21282 | translate | read | null |
| 2025-05-27 | Data-Driven Cellular Mobility Management via Bayesian Optimization and Reinforcement Learning | Mohamed Benzaghta et.al. | 2505.21249 | translate | read | null |
| 2025-05-27 | Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies | Felix Chalumeau et.al. | 2505.21236 | translate | read | null |
| 2025-05-26 | FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities | Jin Wang et.al. | 2505.20147 | translate | read | null |
| 2025-05-26 | MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning | Yuanxin Zhuang et.al. | 2505.20131 | translate | read | null |
| 2025-05-26 | Proxy-Free GFlowNet | Ruishuo Chen et.al. | 2505.20110 | translate | read | null |
| 2025-05-26 | Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning | Ziyi Zhang et.al. | 2505.20107 | translate | read | null |
| 2025-05-26 | Adaptive Deep Reasoning: Triggering Deep Thinking When Needed | Yunhao Wang et.al. | 2505.20101 | translate | read | null |
| 2025-05-26 | SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale | Qi Li et.al. | 2505.20094 | translate | read | null |
| 2025-05-26 | Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback | Mengdi Li et.al. | 2505.20075 | translate | read | null |
| 2025-05-26 | Incentivizing Reasoning from Weak Supervision | Yige Yuan et.al. | 2505.20072 | translate | read | null |
| 2025-05-26 | SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety | Geon-Hyeong Kim et.al. | 2505.20065 | translate | read | null |
| 2025-05-26 | REARANK: Reasoning Re-ranking Agent via Reinforcement Learning | Le Zhang et.al. | 2505.20046 | translate | read | null |
| 2025-05-23 | One RL to See Them All: Visual Triple Unified Reinforcement Learning | Yan Ma et.al. | 2505.18129 | translate | read | null |
| 2025-05-23 | Reward Model Overoptimisation in Iterated RLHF | Lorenz Wolf et.al. | 2505.18126 | translate | read | null |
| 2025-05-23 | ProgRM: Build Better GUI Agents with Progress Rewards | Danyang Zhang et.al. | 2505.18121 | translate | read | null |
| 2025-05-23 | Bridging Supervised Learning and Reinforcement Learning in Math Reasoning | Huayu Chen et.al. | 2505.18116 | translate | read | null |
| 2025-05-23 | Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL | Joey Hong et.al. | 2505.18098 | translate | read | null |
| 2025-05-23 | Stable Reinforcement Learning for Efficient Reasoning | Muzhi Dai et.al. | 2505.18086 | translate | read | null |
| 2025-05-23 | What Do You Need for Diverse Trajectory Stitching in Diffusion Planning? | Quentin Clark et.al. | 2505.18083 | translate | read | null |
| 2025-05-23 | Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals | Jia-Nan Li et.al. | 2505.18071 | translate | read | null |
| 2025-05-23 | Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective | Jintian Shao et.al. | 2505.17997 | translate | read | null |
| 2025-05-23 | Outcome-based Reinforcement Learning to Predict the Future | Benjamin Turtel et.al. | 2505.17989 | translate | read | null |
| 2025-05-22 | GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | Chengqi Duan et.al. | 2505.17022 | translate | read | link |
| 2025-05-22 | SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward | Kaixuan Fan et.al. | 2505.17018 | translate | read | link |
| 2025-05-22 | Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO | Chengzhuo Tong et.al. | 2505.17017 | translate | read | link |
| 2025-05-22 | Interactive Post-Training for Vision-Language-Action Models | Shuhan Tan et.al. | 2505.17016 | translate | read | null |
| 2025-05-22 | R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | Huatong Song et.al. | 2505.17005 | translate | read | link |
| 2025-05-22 | $\text{R}^2\text{ec}$ : Towards Large Recommender Models with Reasoning | Runyang You et.al. | 2505.16994 | translate | read | link |
| 2025-05-22 | SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | Yaxin Du et.al. | 2505.16975 | translate | read | link |
| 2025-05-22 | Risk-Averse Reinforcement Learning with Itakura-Saito Loss | Igor Udovichenko et.al. | 2505.16925 | translate | read | null |
| 2025-05-22 | LARES: Latent Reasoning for Sequential Recommendation | Enze Liu et.al. | 2505.16865 | translate | read | null |
| 2025-05-22 | Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | Wei Xiao et.al. | 2505.16856 | translate | read | null |
| 2025-05-21 | GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents | Yuqi Zhou et.al. | 2505.15810 | translate | read | link |
| 2025-05-21 | MMaDA: Multimodal Large Diffusion Language Models | Ling Yang et.al. | 2505.15809 | translate | read | link |
| 2025-05-21 | STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | Zongzhao Li et.al. | 2505.15804 | translate | read | null |
| 2025-05-21 | VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models | Yuchen Yan et.al. | 2505.15801 | translate | read | null |
| 2025-05-21 | Reverse Engineering Human Preferences with Reinforcement Learning | Lisa Alazraki et.al. | 2505.15795 | translate | read | null |
| 2025-05-21 | HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving | Zhiwen Chen et.al. | 2505.15793 | translate | read | null |
| 2025-05-21 | VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL | Fengyuan Dai et.al. | 2505.15791 | translate | read | null |
| 2025-05-21 | ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning | Changtai Zhu et.al. | 2505.15776 | translate | read | null |
| 2025-05-21 | Improving planning and MBRL with temporally-extended actions | Palash Chatterjee et.al. | 2505.15754 | translate | read | null |
| 2025-05-21 | UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning | Xiangyu Wang et.al. | 2505.15725 | translate | read | null |
| 2025-05-20 | Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning | Haolei Xu et.al. | 2505.14684 | translate | read | link |
| 2025-05-20 | Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning | Jiaer Xia et.al. | 2505.14677 | translate | read | link |
| 2025-05-20 | Reward Reasoning Model | Jiaxin Guo et.al. | 2505.14674 | translate | read | null |
| 2025-05-20 | General-Reasoner: Advancing LLM Reasoning Across All Domains | Xueguang Ma et.al. | 2505.14652 | translate | read | link |
| 2025-05-20 | Think Only When You Need with Large Hybrid-Reasoning Models | Lingjie Jiang et.al. | 2505.14631 | translate | read | null |
| 2025-05-20 | TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | Zhangchen Xu et.al. | 2505.14625 | translate | read | link |
| 2025-05-20 | Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning | Wenbin Hu et.al. | 2505.14585 | translate | read | null |
| 2025-05-20 | Performance Optimization of Energy-Harvesting Underlay Cognitive Radio Networks Using Reinforcement Learning | Deemah H. Tashman et.al. | 2505.14581 | translate | read | null |
| 2025-05-20 | KIPPO: Koopman-Inspired Proximal Policy Optimization | Andrei Cozma et.al. | 2505.14566 | translate | read | null |
| 2025-05-20 | Bellman operator convergence enhancements in reinforcement learning algorithms | David Krame Kadurha et.al. | 2505.14564 | translate | read | null |
| 2025-05-19 | Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards | Xiaoyuan Liu et.al. | 2505.13445 | translate | read | link |
| 2025-05-19 | Optimizing Anytime Reasoning via Budget Relative Policy Optimization | Penghui Qi et.al. | 2505.13438 | translate | read | link |
| 2025-05-19 | KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture | R. James Cotton et.al. | 2505.13436 | translate | read | null |
| 2025-05-19 | G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning | Liang Chen et.al. | 2505.13426 | translate | read | link |
| 2025-05-20 | A Dataless Reinforcement Learning Approach to Rounding Hyperplane Optimization for Max-Cut | Gabriel Malikal et.al. | 2505.13405 | translate | read | null |
| 2025-05-19 | Thinkless: LLM Learns When to Think | Gongfan Fang et.al. | 2505.13379 | translate | read | link |
| 2025-05-19 | Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning | Irene Brugnara et.al. | 2505.13372 | translate | read | null |
| 2025-05-19 | J4R: Learning to Judge with Equivalent Initial State Group Relative Preference Optimization | Austin Xu et.al. | 2505.13346 | translate | read | null |
| 2025-05-19 | Neural-Enhanced Rate Adaptation and Computation Distribution for Emerging mmWave Multi-User 3D Video Streaming Systems | Babak Badnava et.al. | 2505.13337 | translate | read | null |
| 2025-05-19 | CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning | Lei Sheng et.al. | 2505.13271 | translate | read | link |
| 2025-05-16 | SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics | Lizhi Yang et.al. | 2505.11494 | translate | read | null |
| 2025-05-16 | Improving Assembly Code Performance with Large Language Models via Reinforcement Learning | Anjiang Wei et.al. | 2505.11480 | translate | read | null |
| 2025-05-16 | Automatic Reward Shaping from Confounded Offline Data | Mingxuan Li et.al. | 2505.11478 | translate | read | null |
| 2025-05-16 | HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages | Zhilin Wang et.al. | 2505.11475 | translate | read | null |
| 2025-05-16 | Disentangling Reasoning and Knowledge in Medical Large Language Models | Rahul Thapa et.al. | 2505.11462 | translate | read | null |
| 2025-05-16 | Signal attenuation enables scalable decentralized multi-agent reinforcement learning over networks | Wesley A Suttle et.al. | 2505.11461 | translate | read | null |
| 2025-05-16 | Visual Planning: Let’s Think Only with Images | Yi Xu et.al. | 2505.11409 | translate | read | link |
| 2025-05-16 | Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Wenchuan Zhang et.al. | 2505.11404 | translate | read | link |
| 2025-05-16 | Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space | Ali Rabiee et.al. | 2505.11366 | translate | read | null |
| 2025-05-16 | Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics | Ardian Selmonaj et.al. | 2505.11311 | translate | read | null |
| 2025-05-15 | Beyond ‘Aha!’: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Zhiyuan Hu et.al. | 2505.10554 | translate | read | link |
| 2025-05-15 | Knowledge capture, adaptation and composition (KCAC): A framework for cross-task curriculum learning in robotic manipulation | Xinrui Wang et.al. | 2505.10522 | translate | read | null |
| 2025-05-15 | Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning | Andrea Baisero et.al. | 2505.10484 | translate | read | null |
| 2025-05-15 | Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps | Ningyuan Yang et.al. | 2505.10482 | translate | read | null |
| 2025-05-15 | Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | Zemin Huang et.al. | 2505.10446 | translate | read | null |
| 2025-05-15 | IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning | Dechen Gao et.al. | 2505.10442 | translate | read | null |
| 2025-05-15 | Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs | Jingyao Wang et.al. | 2505.10425 | translate | read | null |
| 2025-05-15 | Decomposed Inductive Procedure Learning: Learning Academic Tasks with Human-Like Data Efficiency | Daniel Weitekamp et.al. | 2505.10422 | translate | read | null |
| 2025-05-15 | Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change | Jonathan Clifford Balloch et.al. | 2505.10330 | translate | read | null |
| 2025-05-15 | J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning | Chenxi Whitehouse et.al. | 2505.10320 | translate | read | null |
| 2025-05-14 | DataMIL: Selecting Data for Robot Imitation Learning with Datamodels | Shivin Dass et.al. | 2505.09603 | translate | read | null |
| 2025-05-14 | Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware | Justin Yu et.al. | 2505.09601 | translate | read | link |
| 2025-05-14 | VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation | Chaofan Zhang et.al. | 2505.09577 | translate | read | null |
| 2025-05-14 | Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach | Shannon Lodoen et.al. | 2505.09576 | translate | read | null |
| 2025-05-14 | Learning Long-Context Diffusion Policies via Past-Token Prediction | Marcel Torne et.al. | 2505.09561 | translate | read | null |
| 2025-05-14 | WavReward: Spoken Dialogue Models With Generalist Reward Evaluators | Shengpeng Ji et.al. | 2505.09558 | translate | read | link |
| 2025-05-14 | Distilling Realizable Students from Unrealizable Teachers | Yujin Kim et.al. | 2505.09546 | translate | read | null |
| 2025-05-14 | Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data | Rui Miao et.al. | 2505.09496 | translate | read | null |
| 2025-05-14 | Preserving Plasticity in Continual Learning with Adaptive Linearity Injection | Seyed Roozbeh Razavi Rohani et.al. | 2505.09486 | translate | read | null |
| 2025-05-14 | Quantum state-agnostic work extraction (almost) without dissipation | Josep Lumbreras et.al. | 2505.09456 | translate | read | null |
| 2025-05-13 | Generative Molecular Design with Steerable and Granular Synthesizability Control | Jeff Guo et.al. | 2505.08774 | translate | read | null |
| 2025-05-13 | Preference Optimization for Combinatorial Optimization Problems | Mingjun Pan et.al. | 2505.08735 | translate | read | null |
| 2025-05-13 | A Study of Data-driven Methods for Inventory Optimization | Lee Yeung Ping et.al. | 2505.08673 | translate | read | null |
| 2025-05-13 | Credit Assignment and Efficient Exploration based on Influence Scope in Multi-agent Reinforcement Learning | Shuai Han et.al. | 2505.08630 | translate | read | null |
| 2025-05-13 | Cost Function Estimation Using Inverse Reinforcement Learning with Minimal Observations | Sarmad Mehrdad et.al. | 2505.08619 | translate | read | null |
| 2025-05-13 | OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | Zhaochen Su et.al. | 2505.08617 | translate | read | link |
| 2025-05-13 | Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection | Ayush K. Rai et.al. | 2505.08561 | translate | read | null |
| 2025-05-13 | Strategy-Augmented Planning for Large Language Models via Opponent Exploitation | Shuai Xu et.al. | 2505.08459 | translate | read | null |
| 2025-05-13 | Zero-Shot Sim-to-Real Reinforcement Learning for Fruit Harvesting | Emlyn Williams et.al. | 2505.08458 | translate | read | null |
| 2025-05-13 | Parameter Estimation using Reinforcement Learning Causal Curiosity: Limits and Challenges | Miguel Arana-Catania et.al. | 2505.08453 | translate | read | null |
| 2025-05-12 | DanceGRPO: Unleashing GRPO on Visual Generation | Zeyue Xue et.al. | 2505.07818 | translate | read | link |
| 2025-05-12 | A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values | Daniel Beechey et.al. | 2505.07797 | translate | read | link |
| 2025-05-12 | MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering | Rushi Qiang et.al. | 2505.07782 | translate | read | link |
| 2025-05-12 | Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Xinji Mai et.al. | 2505.07773 | translate | read | link |
| 2025-05-12 | Guiding Data Collection via Factored Scaling Curves | Lihan Zha et.al. | 2505.07728 | translate | read | link |
| 2025-05-12 | S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | Muzhi Dai et.al. | 2505.07686 | translate | read | null |
| 2025-05-12 | A comparative study of Bitcoin and Ripple cryptocurrencies trading using Deep Reinforcement Learning algorithms | Dieu-Donne Fangnon et.al. | 2505.07660 | translate | read | null |
| 2025-05-12 | MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining | Xiaomi LLM-Core Team et.al. | 2505.07608 | translate | read | link |
| 2025-05-12 | Multi-Objective Reinforcement Learning for Energy-Efficient Industrial Control | Georg Schäfer et.al. | 2505.07607 | translate | read | null |
| 2025-05-12 | Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent | Ziyang Huang et.al. | 2505.07596 | translate | read | link |
| 2025-05-09 | VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction | Noah Frahm et.al. | 2505.06219 | translate | read | null |
| 2025-05-09 | Let Humanoids Hike! Integrative Skill Development on Complex Trails | Kwan-Yee Lin et.al. | 2505.06218 | translate | read | null |
| 2025-05-09 | Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach | Tim Schneider et.al. | 2505.06182 | translate | read | null |
| 2025-05-09 | Interaction-Aware Parameter Privacy-Preserving Data Sharing in Coupled Systems via Particle Filter Reinforcement Learning | Haokun Yu et.al. | 2505.06122 | translate | read | null |
| 2025-05-09 | TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations | Shuaiyi Huang et.al. | 2505.06079 | translate | read | null |
| 2025-05-09 | Safe-EF: Error Feedback for Nonsmooth Constrained Optimization | Rustem Islamov et.al. | 2505.06053 | translate | read | null |
| 2025-05-09 | Efficient Information Updates in Compute-First Networking via Reinforcement Learning with Joint AoI and VoI | Jianpeng Qi et.al. | 2505.06025 | translate | read | null |
| 2025-05-09 | Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models | Lennart Stöpler et.al. | 2505.05970 | translate | read | null |
| 2025-05-09 | Offline Multi-agent Reinforcement Learning via Score Decomposition | Dan Qiao et.al. | 2505.05968 | translate | read | null |
| 2025-05-09 | Learning Power Control Protocol for In-Factory 6G Subnetworks | Uyoata E. Uyoata et.al. | 2505.05967 | translate | read | null |
| 2025-05-08 | Flow-GRPO: Training Flow Matching Models via Online RL | Jie Liu et.al. | 2505.05470 | translate | read | link |
| 2025-05-08 | RL-DAUNCE: Reinforcement Learning-Driven Data Assimilation with Uncertainty-Aware Constrained Ensembles | Pouria Behnoudfar et.al. | 2505.05452 | translate | read | null |
| 2025-05-08 | Reasoning Models Don’t Always Say What They Think | Yanda Chen et.al. | 2505.05410 | translate | read | null |
| 2025-05-08 | Repair Crew Routing for Infrastructure Network Restoration under Incomplete Information | Subhojit Biswas et.al. | 2505.05297 | translate | read | null |
| 2025-05-08 | Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation | Zechu Li et.al. | 2505.05287 | translate | read | null |
| 2025-05-08 | Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration | Andreas Kontogiannis et.al. | 2505.05262 | translate | read | null |
| 2025-05-08 | High Altitude Platform-Based Caching and Multicasting for Rural Connectivity | Yongqiang Zhang et.al. | 2505.05251 | translate | read | null |
| 2025-05-08 | Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation | Luca Marzari et.al. | 2505.05235 | translate | read | null |
| 2025-05-08 | Adaptive Biased User Scheduling for Heterogeneous Wireless Federate Learning Network | Changxiang Wu et.al. | 2505.05231 | translate | read | null |
| 2025-05-08 | Multi-Objective Reinforcement Learning for Adaptive Personalized Autonomous Driving | Hendrik Surmann et.al. | 2505.05223 | translate | read | null |
| 2025-05-07 | EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning | Zhenghao Xing et.al. | 2505.04623 | translate | read | link |
| 2025-05-07 | Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation | Abdulaziz Almuzairee et.al. | 2505.04619 | translate | read | null |
| 2025-05-07 | ZeroSearch: Incentivize the Search Capability of LLMs without Searching | Hao Sun et.al. | 2505.04588 | translate | read | link |
| 2025-05-07 | Active Sampling for MRI-based Sequential Decision Making | Yuning Du et.al. | 2505.04586 | translate | read | link |
| 2025-05-07 | Implicitly Aligning Humans and Autonomous Agents through Shared Task Abstractions | Stéphane Aroca-Ouellette et.al. | 2505.04579 | translate | read | null |
| 2025-05-07 | Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization | Wenjun Cao et.al. | 2505.04578 | translate | read | null |
| 2025-05-07 | Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions | Shanyu Han et.al. | 2505.04553 | translate | read | null |
| 2025-05-07 | A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance | Axel Friedrich Wolter et.al. | 2505.04494 | translate | read | null |
| 2025-05-07 | RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation | Jing Hu et.al. | 2505.04424 | translate | read | link |
| 2025-05-07 | A Heuristic-Integrated DRL Approach for Phase Optimization in Large-Scale RISs | Wei Wang et.al. | 2505.04401 | translate | read | null |
| 2025-05-06 | AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control | Jialong Li et.al. | 2505.03738 | translate | read | null |
| 2025-05-06 | Sustainable Smart Farm Networks: Enhancing Resilience and Efficiency with Decision Theory-Guided Deep Reinforcement Learning | Dian Chen et.al. | 2505.03721 | translate | read | null |
| 2025-05-06 | Actor-Critics Can Achieve Optimal Sample Efficiency | Kevin Tan et.al. | 2505.03710 | translate | read | null |
| 2025-05-06 | Policy Gradient Adaptive Control for the LQR: Indirect and Direct Approaches | Feiran Zhao et.al. | 2505.03706 | translate | read | null |
| 2025-05-06 | Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation | Songchen Fu et.al. | 2505.03586 | translate | read | null |
| 2025-05-06 | Ergodic Generative Flows | Leo Maxime Brunswic et.al. | 2505.03561 | translate | read | null |
| 2025-05-06 | Multi-Agent Reinforcement Learning Scheduling to Support Low Latency in Teleoperated Driving | Giacomo Avanzi et.al. | 2505.03558 | translate | read | null |
| 2025-05-06 | Small-Scale-Fading-Aware Resource Allocation in Wireless Federated Learning | Jiacheng Wang et.al. | 2505.03533 | translate | read | null |
| 2025-05-06 | The Steganographic Potentials of Language Models | Artem Karpov et.al. | 2505.03439 | translate | read | null |
| 2025-05-06 | Wasserstein Convergence of Score-based Generative Models under Semiconvexity and Discontinuous Gradients | Stefano Bruno et.al. | 2505.03432 | translate | read | null |
| 2025-05-05 | R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | Yi-Fan Zhang et.al. | 2505.02835 | translate | read | link |
| 2025-05-05 | TWIST: Teleoperated Whole-Body Imitation System | Yanjie Ze et.al. | 2505.02833 | translate | read | null |
| 2025-05-05 | Knowing You Don’t Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing | Diji Yang et.al. | 2505.02811 | translate | read | link |
| 2025-05-05 | Teaching the social media generation: rethinking learning without sacrificing quality | Sepinoud Azimi et.al. | 2505.02770 | translate | read | null |
| 2025-05-05 | The use of Artificial Intelligence for Intervention and Assessment in Individuals with ASD | Aggeliki Sideraki et.al. | 2505.02747 | translate | read | null |
| 2025-05-05 | Enhancing LLMs’ Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry | Junu Kim et.al. | 2505.02722 | translate | read | link |
| 2025-05-05 | Graph Neural Network-Based Reinforcement Learning for Controlling Biological Networks: The GATTACA Framework | Andrzej Mizera et.al. | 2505.02712 | translate | read | null |
| 2025-05-05 | Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models | Xiaobao Wu et.al. | 2505.02686 | translate | read | link |
| 2025-05-05 | Online Phase Estimation of Human Oscillatory Motions using Deep Learning | Antonio Grotta et.al. | 2505.02668 | translate | read | null |
| 2025-05-05 | A Survey on Progress in LLM Alignment from the Perspective of Reward Design | Miaomiao Ji et.al. | 2505.02666 | translate | read | null |
| 2025-05-02 | FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research | Yan Miao et.al. | 2505.01383 | translate | read | null |
| 2025-05-02 | Stabilizing Temporal Difference Learning via Implicit Stochastic Approximation | Hwanwoo Kim et.al. | 2505.01361 | translate | read | null |
| 2025-05-02 | Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story | Vincenzo De Paola et.al. | 2505.01336 | translate | read | null |
| 2025-05-02 | Integration of Multi-Mode Preference into Home Energy Management System Using Deep Reinforcement Learning | Mohammed Sumayli et.al. | 2505.01332 | translate | read | null |
| 2025-05-02 | Exploring Equity of Climate Policies using Multi-Agent Multi-Objective Reinforcement Learning | Palok Biswas et.al. | 2505.01115 | translate | read | null |
| 2025-05-02 | Multi-Objective Reinforcement Learning for Water Management | Zuzanna Osika et.al. | 2505.01094 | translate | read | null |
| 2025-05-02 | Llama-Nemotron: Efficient Reasoning Models | Akhiad Bercovich et.al. | 2505.00949 | translate | read | null |
| 2025-05-01 | Learning Neural Control Barrier Functions from Offline Data with Conservatism | Ihab Tabbara et.al. | 2505.00908 | translate | read | null |
| 2025-05-01 | SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation | Quang P. M. Pham et.al. | 2505.00831 | translate | read | null |
| 2025-05-01 | Constructing an Optimal Behavior Basis for the Option Keyboard | Lucas N. Alegre et.al. | 2505.00787 | translate | read | null |
| 2025-05-01 | T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT | Dongzhi Jiang et.al. | 2505.00703 | translate | read | link |
| 2025-05-01 | Multi-Constraint Safe Reinforcement Learning via Closed-form Solution for Log-Sum-Exp Approximation of Control Barrier Functions | Chenggang Wang et.al. | 2505.00671 | translate | read | null |
| 2025-05-01 | Deep Reinforcement Learning for Urban Air Quality Management: Multi-Objective Optimization of Pollution Mitigation Booth Placement in Metropolitan Environments | Kirtan Rajesh et.al. | 2505.00668 | translate | read | null |
| 2025-05-01 | Wasserstein Policy Optimization | David Pfau et.al. | 2505.00663 | translate | read | null |
| 2025-05-01 | DeepCritic: Deliberate Critique with Large Language Models | Wenkai Yang et.al. | 2505.00662 | translate | read | link |
| 2025-05-02 | 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models | Chong Zhang et.al. | 2505.00551 | translate | read | null |
| 2025-05-01 | Directly Forecasting Belief for Reinforcement Learning with Delays | Qingyuan Wu et.al. | 2505.00546 | translate | read | null |
| 2025-05-01 | Emergence of Roles in Robotic Teams with Model Sharing and Limited Communication | Ian O’Flynn et.al. | 2505.00540 | translate | read | null |
| 2025-05-01 | Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks | Xinyu Wang et.al. | 2505.00530 | translate | read | null |
| 2025-05-01 | DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation | Zixuan Chen et.al. | 2505.00527 | translate | read | null |
(<a href=../Reinforcement_Learning.md>back to Reinforcement Learning</a>)