Reinforcement Learning - 2025-07
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-07-29 | Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics | Leonard Hinckeldey et al. | 2507.21638 | translate | read | null |
| 2025-07-23 | Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains | Anisha Gunjal et al. | 2507.17746 | translate | read | null |
| 2025-07-23 | Megrez2 Technical Report | Boxun Li et al. | 2507.17728 | translate | read | null |
| 2025-07-23 | How Should We Meta-Learn Reinforcement Learning Algorithms? | Alexander David Goldie et al. | 2507.17668 | translate | read | null |
| 2025-07-23 | CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning | Lingxiao Tang et al. | 2507.17548 | translate | read | null |
| 2025-07-23 | Generalized Advantage Estimation for Distributional Policy Gradients | Shahil Shaik et al. | 2507.17530 | translate | read | null |
| 2025-07-23 | Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice | Shanbo Cheng et al. | 2507.17527 | translate | read | null |
| 2025-07-23 | URPO: A Unified Reward & Policy Optimization Framework for Large Language Models | Songshuo Lu et al. | 2507.17515 | translate | read | null |
| 2025-07-23 | Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning | Yu Li et al. | 2507.17512 | translate | read | null |
| 2025-07-23 | ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents | Chang Nie et al. | 2507.17462 | translate | read | null |
| 2025-07-23 | Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning | Situo Zhang et al. | 2507.17448 | translate | read | null |
| 2025-07-22 | Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning | Junhao Shen et al. | 2507.16814 | translate | read | null |
| 2025-07-22 | Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty | Mehul Damani et al. | 2507.16806 | translate | read | null |
| 2025-07-22 | Uncertainty-Aware Knowledge Transformers for Peer-to-Peer Energy Trading with Multi-Agent Reinforcement Learning | Mian Ibad Ali Shah et al. | 2507.16796 | translate | read | null |
| 2025-07-22 | Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning | Ang Li et al. | 2507.16746 | translate | read | link |
| 2025-07-23 | Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints | Zhenyun Yin et al. | 2507.16727 | translate | read | null |
| 2025-07-22 | Adaptive Inventory Strategies using Deep Reinforcement Learning for Dynamic Agri-Food Supply Chains | Amandeep Kaur et al. | 2507.16670 | translate | read | null |
| 2025-07-22 | FOGNITE: Federated Learning-Enhanced Fog-Cloud Architecture | Somayeh Sobati-M et al. | 2507.16668 | translate | read | null |
| 2025-07-22 | Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis | Sara Giordano et al. | 2507.16641 | translate | read | null |
| 2025-07-22 | Novel Multi-Agent Action Masked Deep Reinforcement Learning for General Industrial Assembly Lines Balancing Problems | Ali Mohamed Ali et al. | 2507.16635 | translate | read | null |
| 2025-07-22 | Step-Audio 2 Technical Report | Boyong Wu et al. | 2507.16632 | translate | read | link |
| 2025-07-21 | The Impact of Language Mixing on Bilingual LLM Reasoning | Yihao Li et al. | 2507.15849 | translate | read | null |
| 2025-07-21 | GUI-G²: Gaussian Reward Modeling for GUI Grounding | Fei Tang et al. | 2507.15846 | translate | read | link |
| 2025-07-22 | Hierarchical Budget Policy Optimization for Adaptive Reasoning | Shangke Lyu et al. | 2507.15844 | translate | read | link |
| 2025-07-21 | LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra | Seth Karten et al. | 2507.15815 | translate | read | link |
| 2025-07-21 | Power-Constrained Policy Gradient Methods for LQR | Ashwin Verma et al. | 2507.15806 | translate | read | null |
| 2025-07-21 | Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning | Sneheel Sarangi et al. | 2507.15788 | translate | read | null |
| 2025-07-21 | Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR | Jiakang Wang et al. | 2507.15778 | translate | read | link |
| 2025-07-21 | LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization | Xingyu Wu et al. | 2507.15758 | translate | read | link |
| 2025-07-21 | EMP: Executable Motion Prior for Humanoid Robot Standing Upper-body Motion Imitation | Haocheng Xu et al. | 2507.15649 | translate | read | null |
| 2025-07-21 | Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training | Kailai Yang et al. | 2507.15640 | translate | read | null |
| 2025-07-18 | CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | Xiaoya Li et al. | 2507.14111 | translate | read | link |
| 2025-07-18 | Preference-based Multi-Objective Reinforcement Learning | Ni Mu et al. | 2507.14066 | translate | read | null |
| 2025-07-18 | Reframing attention as a reinforcement learning problem for causal discovery | Turan Orujlu et al. | 2507.13920 | translate | read | null |
| 2025-07-18 | Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments | Kathrin Korte et al. | 2507.13846 | translate | read | null |
| 2025-07-18 | Scalable Submodular Policy Optimization via Pruned Submodularity Graph | Aditi Anand et al. | 2507.13834 | translate | read | null |
| 2025-07-18 | DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training | Zhixin Wang et al. | 2507.13833 | translate | read | null |
| 2025-07-18 | Efficient and Scalable Self-Healing Databases Using Meta-Learning and Dependency-Driven Recovery | Joydeep Chandra et al. | 2507.13757 | translate | read | null |
| 2025-07-18 | LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction | Jing Chang et al. | 2507.13712 | translate | read | null |
| 2025-07-18 | CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation | Jing Chang et al. | 2507.13710 | translate | read | null |
| 2025-07-18 | State Space Models Naturally Produce Traveling Waves, Time Cells, and Scale to Abstract Cognitive Functions | Sen Lu et al. | 2507.13638 | translate | read | null |
| 2025-07-17 | VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning | Senqiao Yang et al. | 2507.13348 | translate | read | link |
| 2025-07-17 | The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner | Zhouqi Hua et al. | 2507.13332 | translate | read | null |
| 2025-07-17 | Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour | Emma M. A. Harrison et al. | 2507.13277 | translate | read | null |
| 2025-07-17 | QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation | Jiazheng Li et al. | 2507.13266 | translate | read | null |
| 2025-07-17 | Signal Temporal Logic Compliant Co-design of Planning and Control | Manas Sashank Juvvi et al. | 2507.13225 | translate | read | null |
| 2025-07-17 | Spectral Bellman Method: Unifying Representation and Exploration in RL | Ofir Nabati et al. | 2507.13181 | translate | read | null |
| 2025-07-17 | Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback | Suzie Kim et al. | 2507.13171 | translate | read | null |
| 2025-07-17 | Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities | Hao Sun et al. | 2507.13158 | translate | read | null |
| 2025-07-17 | From Roots to Rewards: Dynamic Tree Reasoning with RL | Ahmed Bahloul et al. | 2507.13142 | translate | read | null |
| 2025-07-17 | ZipMPC: Compressed Context-Dependent MPC Cost via Imitation Learning | Rahel Rickenbach et al. | 2507.13088 | translate | read | null |
| 2025-07-16 | EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos | Ruihan Yang et al. | 2507.12440 | translate | read | null |
| 2025-07-16 | Improving Reinforcement Learning Sample-Efficiency using Local Approximation | Mohit Prashant et al. | 2507.12383 | translate | read | null |
| 2025-07-16 | Thought Purity: Defense Paradigm For Chain-of-Thought Attack | Zihao Xue et al. | 2507.12314 | translate | read | null |
| 2025-07-16 | Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning | Yuhao Chen et al. | 2507.12215 | translate | read | null |
| 2025-07-16 | BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search | Azhar Ikhtiarudin et al. | 2507.12189 | translate | read | link |
| 2025-07-17 | Efficient Preparation of Fermionic Superfluids in an Optical Dipole Trap through Reinforcement Learning | Yueyang Min et al. | 2507.12152 | translate | read | null |
| 2025-07-16 | Topology Enhanced MARL for Multi-Vehicle Cooperative Decision-Making of CAVs | Ye Han et al. | 2507.12110 | translate | read | null |
| 2025-07-16 | Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics | Muleilan Pei et al. | 2507.12083 | translate | read | null |
| 2025-07-16 | Towards Ultra-Reliable 6G in-X Subnetworks: Dynamic Link Adaptation by Deep Reinforcement Learning | Fateme Salehi et al. | 2507.12031 | translate | read | null |
| 2025-07-16 | QAS-QTNs: Curriculum Reinforcement Learning-Driven Quantum Architecture Search for Quantum Tensor Networks | Siddhant Dutta et al. | 2507.12013 | translate | read | null |
| 2025-07-15 | Robot Drummer: Learning Rhythmic Skills for Humanoid Drumming | Asad Ali Shahid et al. | 2507.11498 | translate | read | null |
| 2025-07-15 | Exploring the robustness of TractOracle methods in RL-based tractography | Jeremi Levesque et al. | 2507.11486 | translate | read | null |
| 2025-07-15 | Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light | Mani Hamidi et al. | 2507.11482 | translate | read | null |
| 2025-07-15 | Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs | Gabriel Bo et al. | 2507.11371 | translate | read | null |
| 2025-07-15 | Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning | Daniel Tanneberg et al. | 2507.11367 | translate | read | null |
| 2025-07-15 | Sensing Accuracy Optimization for Multi-UAV SAR Interferometry with Data Offloading | Mohamed-Amine Lahmeri et al. | 2507.11284 | translate | read | null |
| 2025-07-15 | Ocean Diviner: A Diffusion-Augmented Reinforcement Learning for AUV Robust Control in the Underwater Tasks | Weiyi Liu et al. | 2507.11283 | translate | read | null |
| 2025-07-15 | Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound | Tal Fiskus et al. | 2507.11269 | translate | read | null |
| 2025-07-15 | Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction | Deepak Kumar Panda et al. | 2507.11173 | translate | read | null |
| 2025-07-15 | Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities | Yiting Qu et al. | 2507.11155 | translate | read | null |
| 2025-07-14 | EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | Mingxian Lin et al. | 2507.10548 | translate | read | link |
| 2025-07-14 | Disentangling Neural Disjunctive Normal Form Models | Kexin Gu Baugh et al. | 2507.10546 | translate | read | null |
| 2025-07-14 | Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Mingqi Wu et al. | 2507.10532 | translate | read | link |
| 2025-07-14 | Some remarks on gradient dominance and LQR policy optimization | Eduardo D. Sontag et al. | 2507.10452 | translate | read | null |
| 2025-07-14 | Prompt Informed Reinforcement Learning for Visual Coverage Path Planning | Venkat Margapuri et al. | 2507.10284 | translate | read | null |
| 2025-07-14 | Cross-Timeslot Optimization for Distributed GPU Inference Using Reinforcement Learning | Chengze Du et al. | 2507.10259 | translate | read | null |
| 2025-07-14 | ToMacVF: Temporal Macro-action Value Factorization for Asynchronous Multi-Agent Reinforcement Learning | Wenjing Zhang et al. | 2507.10251 | translate | read | null |
| 2025-07-14 | Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning? | Yumi Omori et al. | 2507.10174 | translate | read | null |
| 2025-07-14 | Robust RL Control for Bipedal Locomotion with Closed Kinematic Chains | Egor Maslennikov et al. | 2507.10164 | translate | read | null |
| 2025-07-14 | Adaptability in Multi-Agent Reinforcement Learning: A Framework and Unified Review | Siyi Hu et al. | 2507.10142 | translate | read | null |
| 2025-07-11 | One Token to Fool LLM-as-a-Judge | Yulai Zhao et al. | 2507.08794 | translate | read | null |
| 2025-07-11 | Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning | James McCarthy et al. | 2507.08793 | translate | read | null |
| 2025-07-11 | Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data | Jeonghye Kim et al. | 2507.08761 | translate | read | null |
| 2025-07-11 | On the Effect of Regularization in Policy Mirror Descent | Jan Felix Kleuker et al. | 2507.08718 | translate | read | null |
| 2025-07-11 | SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations | Peter Crowley et al. | 2507.08707 | translate | read | null |
| 2025-07-11 | elsciRL: Integrating Language Solutions into Reinforcement Learning Problem Settings | Philip Osborne et al. | 2507.08705 | translate | read | null |
| 2025-07-11 | Multi-critic Learning for Whole-body End-effector Twist Tracking | Aravind Elanjimattathil Vijayan et al. | 2507.08656 | translate | read | null |
| 2025-07-11 | Safe Deep Reinforcement Learning for Resource Allocation with Peak Age of Information Violation Guarantees | Berire Gunes Reyhan et al. | 2507.08653 | translate | read | null |
| 2025-07-11 | Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning | Xingguang Ji et al. | 2507.08649 | translate | read | link |
| 2025-07-11 | Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data | Parag Dutta et al. | 2507.08610 | translate | read | null |
| 2025-07-10 | Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology | Haochen Wang et al. | 2507.07999 | translate | read | link |
| 2025-07-10 | Single-pass Adaptive Image Tokenization for Minimum Program Search | Shivam Duggal et al. | 2507.07995 | translate | read | null |
| 2025-07-10 | EXPO: Stable Reinforcement Learning with Expressive Policies | Perry Dong et al. | 2507.07986 | translate | read | null |
| 2025-07-10 | Reinforcement Learning with Action Chunking | Qiyang Li et al. | 2507.07969 | translate | read | null |
| 2025-07-10 | Scaling RL to Long Videos | Yukang Chen et al. | 2507.07966 | translate | read | link |
| 2025-07-10 | Excess Observables Reveal Nonreciprocity in Integrated Covariance | Timur Aslyamov et al. | 2507.07876 | translate | read | null |
| 2025-07-10 | “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents | Giovanni Dispoto et al. | 2507.07848 | translate | read | null |
| 2025-07-10 | Beyond Robustness: Learning Unknown Dynamic Load Adaptation for Quadruped Locomotion on Rough Terrain | Leixin Chang et al. | 2507.07825 | translate | read | null |
| 2025-07-10 | BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning | Ruohong Liu et al. | 2507.07769 | translate | read | null |
| 2025-07-10 | Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization | Chengtao Jian et al. | 2507.07723 | translate | read | null |
| 2025-07-09 | Graph-Based Complexity Metrics for Multi-Agent Curriculum Learning: A Validated Approach to Task Ordering in Cooperative Coordination Environments | Farhaan Ebadulla et al. | 2507.07074 | translate | read | null |
| 2025-07-09 | First Return, Entropy-Eliciting Explore | Tianyu Zheng et al. | 2507.07017 | translate | read | null |
| 2025-07-09 | Federated Learning-based MARL for Strengthening Physical-Layer Security in B5G Networks | Deemah H. Tashman et al. | 2507.06997 | translate | read | null |
| 2025-07-09 | Optimizing Cognitive Networks: Reinforcement Learning Meets Energy Harvesting Over Cascaded Channels | Deemah H. Tashman et al. | 2507.06981 | translate | read | null |
| 2025-07-09 | Bounomodes: the grazing ox algorithm for exploration of clustered anomalies | Samuel Matloob et al. | 2507.06960 | translate | read | null |
| 2025-07-10 | Rethinking Verification for LLM Code Generation: From Generation to Testing | Zihan Ma et al. | 2507.06920 | translate | read | link |
| 2025-07-09 | Designing Adaptive Algorithms Based on Reinforcement Learning for Dynamic Optimization of Sliding Window Size in Multi-Dimensional Data Streams | Abolfazl Zarghani et al. | 2507.06901 | translate | read | null |
| 2025-07-09 | Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model | Jing Liang et al. | 2507.06892 | translate | read | null |
| 2025-07-09 | Episodic Contextual Bandits with Knapsacks under Conversion Models | Zitian Li et al. | 2507.06859 | translate | read | null |
| 2025-07-10 | Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning | Matej Straka et al. | 2507.06825 | translate | read | link |
| 2025-07-08 | EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow | Yixiang Chen et al. | 2507.06224 | translate | read | null |
| 2025-07-08 | CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization | Zhongyuan Peng et al. | 2507.06181 | translate | read | link |
| 2025-07-08 | Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model | Koki Yamane et al. | 2507.06174 | translate | read | null |
| 2025-07-08 | Learning Agile Tensile Perching for Aerial Robots from Demonstrations | Kangle Yuan et al. | 2507.06172 | translate | read | null |
| 2025-07-08 | Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation | Mohamad H. Danesh et al. | 2507.06111 | translate | read | null |
| 2025-07-08 | AI-Based Demand Forecasting and Load Balancing for Optimising Energy use in Healthcare Systems: A real case study | Iman Rahimi et al. | 2507.06077 | translate | read | null |
| 2025-07-09 | FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models | Bo Pang et al. | 2507.06057 | translate | read | null |
| 2025-07-08 | CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation | Kushal Gajjar et al. | 2507.06013 | translate | read | null |
| 2025-07-08 | From General Relation Patterns to Task-Specific Decision-Making in Continual Multi-Agent Coordination | Chang Yao et al. | 2507.06004 | translate | read | null |
| 2025-07-08 | BlueLM-2.5-3B Technical Report | Baojiao Xiong et al. | 2507.05934 | translate | read | null |
| 2025-07-07 | Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Yana Wei et al. | 2507.05255 | translate | read | link |
| 2025-07-07 | Action Space Reduction Strategies for Reinforcement Learning in Autonomous Driving | Elahe Delavari et al. | 2507.05251 | translate | read | null |
| 2025-07-07 | NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving | Qucheng Peng et al. | 2507.05227 | translate | read | null |
| 2025-07-07 | EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling | Boyuan Wang et al. | 2507.05198 | translate | read | null |
| 2025-07-07 | Sequential Attention-based Sampling for Histopathological Analysis | Tarun G et al. | 2507.05077 | translate | read | null |
| 2025-07-07 | Replacing thinking with tool usage enables reasoning in small language models | Corrado Rainone et al. | 2507.05065 | translate | read | null |
| 2025-07-07 | When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning | Maxence Boels et al. | 2507.05011 | translate | read | null |
| 2025-07-07 | Linking Homeostasis to Reinforcement Learning: Internal State Control of Motivated Behavior | Naoto Yoshida et al. | 2507.04998 | translate | read | null |
| 2025-07-07 | Object-centric Denoising Diffusion Models for Physical Reasoning | Moritz Lange et al. | 2507.04920 | translate | read | null |
| 2025-07-07 | Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning | Sanyam Vyas et al. | 2507.04883 | translate | read | null |
| 2025-07-03 | MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs | Purbesh Mitra et al. | 2507.02851 | translate | read | link |
| 2025-07-03 | StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason | Kaiyi Zhang et al. | 2507.02841 | translate | read | null |
| 2025-07-03 | ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning | Ruiyang Zhou et al. | 2507.02834 | translate | read | null |
| 2025-07-03 | Generalizing Verifiable Instruction Following | Valentina Pyatkin et al. | 2507.02833 | translate | read | null |
| 2025-07-03 | Multimodal Mathematical Reasoning with Diverse Solving Perspective | Wenhao Shi et al. | 2507.02804 | translate | read | null |
| 2025-07-03 | A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control | Zilin Kang et al. | 2507.02712 | translate | read | null |
| 2025-07-03 | Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions | Thomas Hazenberg et al. | 2507.02698 | translate | read | null |
| 2025-07-03 | RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes | Jiaxing Wang et al. | 2507.02690 | translate | read | null |
| 2025-07-03 | TUC-PPO: Team Utility-Constrained Proximal Policy Optimization for Spatial Public Goods Games | Zhaoqilin Yang et al. | 2507.02675 | translate | read | null |
| 2025-07-03 | On Efficient Bayesian Exploration in Model-Based Reinforcement Learning | Alberto Caron et al. | 2507.02639 | translate | read | null |
| 2025-07-02 | Kwai Keye-VL Technical Report | Kwai Keye Team et al. | 2507.01949 | translate | read | link |
| 2025-07-02 | NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks | Yang Li et al. | 2507.01921 | translate | read | null |
| 2025-07-02 | Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models | Chengao Li et al. | 2507.01915 | translate | read | null |
| 2025-07-02 | TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types | Yuhao Lin et al. | 2507.01857 | translate | read | null |
| 2025-07-02 | TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents | Dmytro Kuzmenko et al. | 2507.01823 | translate | read | null |
| 2025-07-02 | Quantum reinforcement learning in dynamic environments | Oliver Sefrin et al. | 2507.01691 | translate | read | null |
| 2025-07-02 | AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training | Zhenyu Han et al. | 2507.01663 | translate | read | null |
| 2025-07-02 | Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning | Wu Fei et al. | 2507.01551 | translate | read | null |
| 2025-07-02 | Chargax: A JAX Accelerated EV Charging Simulator | Koen Ponse et al. | 2507.01522 | translate | read | null |
| 2025-07-02 | Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning | Yanfei Zhang et al. | 2507.01489 | translate | read | null |
| 2025-07-01 | SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | Bo Liu et al. | 2506.24119 | translate | read | link |
| 2025-07-01 | Adapt Your Body: Mitigating Proprioception Shifts in Imitation Learning | Fuhang Kuang et al. | 2506.23944 | translate | read | null |