Reinforcement Learning - 2025-02
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-02-28 | LLM Post-Training: A Deep Dive into Reasoning Large Language Models | Komal Kumar et.al. | 2502.21321 | translate | read | null |
| 2025-02-28 | ReaLJam: Real-Time Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers | Alexander Scarlatos et.al. | 2502.21267 | translate | read | null |
| 2025-02-28 | ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs | Hao Ge et.al. | 2502.21231 | translate | read | null |
| 2025-02-28 | A Method of Selective Attention for Reservoir Based Agents | Kevin McKee et.al. | 2502.21229 | translate | read | null |
| 2025-02-28 | Reducing Reward Dependence in RL Through Adaptive Confidence Discounting | Muhammed Yusuf Satici et.al. | 2502.21181 | translate | read | null |
| 2025-02-28 | Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning | Léopold Maytié et.al. | 2502.21142 | translate | read | null |
| 2025-02-28 | Dynamically Local-Enhancement Planner for Large-Scale Autonomous Driving | Nanshan Deng et.al. | 2502.21134 | translate | read | null |
| 2025-02-28 | AuthSim: Towards Authentic and Effective Safety-critical Scenario Generation for Autonomous Driving Tests | Yukuan Yang et.al. | 2502.21100 | translate | read | null |
| 2025-02-28 | Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control | Taeho Lee et.al. | 2502.21057 | translate | read | null |
| 2025-02-28 | Motion ReTouch: Motion Modification Using Four-Channel Bilateral Control | Koki Inami et.al. | 2502.20982 | translate | read | null |
| 2025-02-27 | Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids | Toru Lin et.al. | 2502.20396 | translate | read | null |
| 2025-02-27 | Multi-Turn Code Generation Through Single-Step Rewards | Arnav Kumar Jain et.al. | 2502.20380 | translate | read | null |
| 2025-02-27 | The Role of Tactile Sensing for Learning Reach and Grasp | Boya Zhang et.al. | 2502.20367 | translate | read | null |
| 2025-02-27 | Improving the Efficiency of a Deep Reinforcement Learning-Based Power Management System for HPC Clusters Using Curriculum Learning | Thomas Budiarjo et.al. | 2502.20348 | translate | read | null |
| 2025-02-27 | Safety Representations for Safer Policy Learning | Kaustubh Mani et.al. | 2502.20341 | translate | read | null |
| 2025-02-27 | Deep Reinforcement Learning based Autonomous Decision-Making for Cooperative UAVs: A Search and Rescue Real World Application | Thomas Hickling et.al. | 2502.20326 | translate | read | null |
| 2025-02-27 | On the Importance of Reward Design in Reinforcement Learning-based Dynamic Algorithm Configuration: A Case Study on OneMax with (1+($λ$,$λ$))-GA | Tai Nguyen et.al. | 2502.20265 | translate | read | null |
| 2025-02-27 | Explainable physics-based constraints on reinforcement learning for accelerator controls | Jonathan Colen et.al. | 2502.20247 | translate | read | null |
| 2025-02-27 | MARVEL: Multi-Agent Reinforcement Learning for constrained field-of-View multi-robot Exploration in Large-scale environments | Jimmy Chiun et.al. | 2502.20217 | translate | read | null |
| 2025-02-27 | Highly Parallelized Reinforcement Learning Training with Relaxed Assignment Dependencies | Zhouyu He et.al. | 2502.20190 | translate | read | null |
| 2025-02-26 | Recurrent Auto-Encoders for Enhanced Deep Reinforcement Learning in Wilderness Search and Rescue Planning | Jan-Hendrik Ewers et.al. | 2502.19356 | translate | read | null |
| 2025-02-26 | Hybrid Robot Learning for Automatic Robot Motion Planning in Manufacturing | Siddharth Singh et.al. | 2502.19340 | translate | read | null |
| 2025-02-26 | WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies | William Solow et.al. | 2502.19308 | translate | read | null |
| 2025-02-26 | Combining Planning and Reinforcement Learning for Solving Relational Multiagent Domains | Nikhilesh Prabhakar et.al. | 2502.19297 | translate | read | null |
| 2025-02-26 | Deep Computerized Adaptive Testing | Jiguang Li et.al. | 2502.19275 | translate | read | null |
| 2025-02-26 | Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective | Jiawei Huang et.al. | 2502.19255 | translate | read | null |
| 2025-02-26 | ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration | Minjie Zhu et.al. | 2502.19250 | translate | read | null |
| 2025-02-26 | Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time | Jiazheng Li et.al. | 2502.19230 | translate | read | null |
| 2025-02-26 | When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning | Yijiang River Dong et.al. | 2502.19158 | translate | read | null |
| 2025-02-26 | Policy Testing with MDPFuzz (Replicability Study) | Quentin Mazouni et.al. | 2502.19116 | translate | read | null |
| 2025-02-25 | SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | Yuxiang Wei et.al. | 2502.18449 | translate | read | null |
| 2025-02-25 | MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning | Chanwoo Park et.al. | 2502.18439 | translate | read | null |
| 2025-02-25 | Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand | Fengshuo Bai et.al. | 2502.18423 | translate | read | null |
| 2025-02-25 | Enhancing Reusability of Learned Skills for Robot Manipulation via Gaze and Bottleneck | Ryo Takizawa et.al. | 2502.18121 | translate | read | null |
| 2025-02-25 | Controlling dynamics of stochastic systems with deep reinforcement learning | Ruslan Mukhamadiarov et.al. | 2502.18111 | translate | read | null |
| 2025-02-25 | From planning to policy: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation | Haewon Jung et.al. | 2502.18015 | translate | read | null |
| 2025-02-25 | NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms | Yashan Wang et.al. | 2502.18008 | translate | read | null |
| 2025-02-25 | Provable Performance Bounds for Digital Twin-driven Deep Reinforcement Learning in Wireless Networks: A Novel Digital-Twin Bisimulation Metric | Zhenyu Tao et.al. | 2502.17983 | translate | read | null |
| 2025-02-25 | FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real | Weiheng Liu et.al. | 2502.17894 | translate | read | null |
| 2025-02-25 | Sample-efficient diffusion-based control of complex nonlinear systems | Hongyi Chen et.al. | 2502.17893 | translate | read | null |
| 2025-02-24 | Event-Based Limit Order Book Simulation under a Neural Hawkes Process: Application in Market-Making | Luca Lalor et.al. | 2502.17417 | translate | read | null |
| 2025-02-24 | Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Alon Albalak et.al. | 2502.17387 | translate | read | link |
| 2025-02-24 | Distributed Coordination for Heterogeneous Non-Terrestrial Networks | Jikang Deng et.al. | 2502.17366 | translate | read | null |
| 2025-02-24 | TDMPBC: Self-Imitative Reinforcement Learning for Humanoid Robot Control | Zifeng Zhuang et.al. | 2502.17322 | translate | read | null |
| 2025-02-24 | Survey on Strategic Mining in Blockchain: A Reinforcement Learning Approach | Jichen Li et.al. | 2502.17307 | translate | read | null |
| 2025-02-24 | A Reinforcement Learning Approach to Non-prehensile Manipulation through Sliding | Hamidreza Raei et.al. | 2502.17221 | translate | read | null |
| 2025-02-24 | Humanoid Whole-Body Locomotion on Narrow Terrain via Dynamic Balance and Reinforcement Learning | Weiji Xie et.al. | 2502.17219 | translate | read | null |
| 2025-02-24 | Teleology-Driven Affective Computing: A Causal Framework for Sustained Well-Being | Bin Yin et.al. | 2502.17172 | translate | read | null |
| 2025-02-24 | A Novel Multiple Access Scheme for Heterogeneous Wireless Communications using Symmetry-aware Continual Deep Reinforcement Learning | Hamidreza Mazandarani et.al. | 2502.17167 | translate | read | null |
| 2025-02-24 | MA2RL: Masked Autoencoders for Generalizable Multi-Agent Reinforcement Learning | Jinyuan Feng et.al. | 2502.17046 | translate | read | null |
| 2025-02-21 | BOSS: Benchmark for Observation Space Shift in Long-Horizon Task | Yue Yang et.al. | 2502.15679 | translate | read | null |
| 2025-02-21 | VaViM and VaVAM: Autonomous Driving through Video Generative Modeling | Florent Bartoccioni et.al. | 2502.15672 | translate | read | link |
| 2025-02-21 | Automating Curriculum Learning for Reinforcement Learning using a Skill-Based Bayesian Network | Vincent Hsiao et.al. | 2502.15662 | translate | read | null |
| 2025-02-21 | A Simulation Pipeline to Facilitate Real-World Robotic Reinforcement Learning Applications | Jefferson Silveira et.al. | 2502.15649 | translate | read | null |
| 2025-02-21 | Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach | Xiangtong Yao et.al. | 2502.15613 | translate | read | null |
| 2025-02-21 | SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning | Xuyang Li et.al. | 2502.15512 | translate | read | null |
| 2025-02-21 | Learning Long-Horizon Robot Manipulation Skills via Privileged Action | Xiaofeng Mao et.al. | 2502.15442 | translate | read | null |
| 2025-02-21 | TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning | Giuseppe Paolo et.al. | 2502.15425 | translate | read | null |
| 2025-02-21 | Hyperspherical Normalization for Scalable Deep Reinforcement Learning | Hojoon Lee et.al. | 2502.15280 | translate | read | null |
| 2025-02-21 | CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models | Shunchang Liu et.al. | 2502.15278 | translate | read | null |
| 2025-02-20 | Generating $π$-Functional Molecules Using STGG+ with Active Learning | Alexia Jolicoeur-Martineau et.al. | 2502.14842 | translate | read | link |
| 2025-02-20 | Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models | Vlad Sobal et.al. | 2502.14819 | translate | read | null |
| 2025-02-20 | Making Universal Policies Universal | Niklas Höpner et.al. | 2502.14777 | translate | read | null |
| 2025-02-20 | Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Tian Xie et.al. | 2502.14768 | translate | read | link |
| 2025-02-20 | Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse | Michael Doherty et.al. | 2502.14741 | translate | read | null |
| 2025-02-20 | Length-Controlled Margin-Based Preference Optimization without Reference Model | Gengxu Li et.al. | 2502.14643 | translate | read | null |
| 2025-02-20 | Curiosity Driven Multi-agent Reinforcement Learning for 3D Game Testing | Raihana Ferdous et.al. | 2502.14606 | translate | read | null |
| 2025-02-20 | ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification | Hyunseok Lee et.al. | 2502.14565 | translate | read | link |
| 2025-02-20 | MLGym: A New Framework and Benchmark for Advancing AI Research Agents | Deepak Nathani et.al. | 2502.14499 | translate | read | link |
| 2025-02-20 | Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization | Zhitao He et.al. | 2502.14496 | translate | read | link |
| 2025-02-19 | A Training-Free Framework for Precise Mobile Manipulation of Small Everyday Objects | Arjun Gupta et.al. | 2502.13964 | translate | read | null |
| 2025-02-19 | Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks | Guilherme Palma et.al. | 2502.13918 | translate | read | null |
| 2025-02-19 | Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning | Antoine Moulin et.al. | 2502.13900 | translate | read | null |
| 2025-02-19 | NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants | Yiran Qin et.al. | 2502.13894 | translate | read | null |
| 2025-02-19 | Uncertainty quantification for Markov chains with application to temporal difference learning | Weichen Wu et.al. | 2502.13822 | translate | read | null |
| 2025-02-19 | Learning to explore when mistakes are not allowed | Charly Pecqueux-Guézénec et.al. | 2502.13801 | translate | read | null |
| 2025-02-19 | User Agency and System Automation in Interactive Intelligent Systems | Thomas Langerak et.al. | 2502.13779 | translate | read | null |
| 2025-02-19 | Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values | Hongbo Zhang et.al. | 2502.13723 | translate | read | null |
| 2025-02-19 | Hierarchical RL-MPC for Demand Response Scheduling | Maximilian Bloor et.al. | 2502.13714 | translate | read | null |
| 2025-02-19 | User Association and Coordinated Beamforming in Cognitive Aerial-Terrestrial Networks: A Safe Reinforcement Learning Approach | Zizhen Zhou et.al. | 2502.13663 | translate | read | null |
| 2025-02-18 | Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Shuo Xing et.al. | 2502.13146 | translate | read | link |
| 2025-02-18 | RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning | Hao Gao et.al. | 2502.13144 | translate | read | link |
| 2025-02-18 | Theorem Prover as a Judge for Synthetic Data Generation | Joshua Ong Jun Leang et.al. | 2502.13137 | translate | read | null |
| 2025-02-18 | Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | Mengkang Hu et.al. | 2502.13092 | translate | read | link |
| 2025-02-18 | Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation | Sha Li et.al. | 2502.13019 | translate | read | null |
| 2025-02-18 | HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit | Qingwei Ben et.al. | 2502.13013 | translate | read | link |
| 2025-02-18 | Integrating Reinforcement Learning, Action Model Learning, and Numeric Planning for Tackling Complex Tasks | Yarin Benyamin et.al. | 2502.13006 | translate | read | link |
| 2025-02-18 | Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options | Lakshmi Nair et.al. | 2502.12929 | translate | read | link |
| 2025-02-18 | Continuous Learning Conversational AI: A Personalized Agent Framework via A2C Reinforcement Learning | Nandakishor M et.al. | 2502.12876 | translate | read | null |
| 2025-02-18 | A Survey on DRL based UAV Communications and Networking: DRL Fundamentals, Applications and Implementations | Wei Zhao et.al. | 2502.12875 | translate | read | null |
| 2025-02-17 | Scaling Test-Time Compute Without Verification or RL is Suboptimal | Amrith Setlur et.al. | 2502.12118 | translate | read | null |
| 2025-02-17 | Unhackable Temporal Rewarding for Scalable Video MLLMs | En Yu et.al. | 2502.12081 | translate | read | link |
| 2025-02-17 | How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | Ayan Sengupta et.al. | 2502.12051 | translate | read | null |
| 2025-02-17 | Theoretical Barriers in Bellman-Based Reinforcement Learning | Brieuc Pinon et.al. | 2502.11968 | translate | read | null |
| 2025-02-17 | Massively Scaling Explicit Policy-conditioned Value Functions | Nico Bohlinger et.al. | 2502.11949 | translate | read | null |
| 2025-02-17 | FitLight: Federated Imitation Learning for Plug-and-Play Autonomous Traffic Signal Control | Yutong Ye et.al. | 2502.11937 | translate | read | null |
| 2025-02-17 | VLP: Vision-Language Preference Learning for Embodied Manipulation | Runze Liu et.al. | 2502.11918 | translate | read | null |
| 2025-02-17 | CAMEL: Continuous Action Masking Enabled by Large Language Models for Reinforcement Learning | Yanxiao Zhao et.al. | 2502.11896 | translate | read | null |
| 2025-02-17 | Does Knowledge About Perceptual Uncertainty Help an Agent in Automated Driving? | Natalie Grabowsky et.al. | 2502.11864 | translate | read | null |
| 2025-02-17 | Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces | Eric Eaton et.al. | 2502.11828 | translate | read | null |
| 2025-02-14 | BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds | Huayi Wang et.al. | 2502.10363 | translate | read | null |
| 2025-02-14 | Reinforcement Learning in Strategy-Based and Atari Games: A Review of Google DeepMinds Innovations | Abdelrhman Shaheen et.al. | 2502.10303 | translate | read | null |
| 2025-02-14 | Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding | Laurin Luttmann et.al. | 2502.10233 | translate | read | null |
| 2025-02-14 | Dynamic Reinforcement Learning for Actors | Katsunari Shibata et.al. | 2502.10200 | translate | read | null |
| 2025-02-14 | Reinforcement Learning based Constrained Optimal Control: an Interpretable Reward Design | Jingjie Ni et.al. | 2502.10187 | translate | read | null |
| 2025-02-14 | Combinatorial Reinforcement Learning with Preference Feedback | Joongkyu Lee et.al. | 2502.10158 | translate | read | null |
| 2025-02-14 | MonoForce: Learnable Image-conditioned Physics Engine | Ruslan Agishev et.al. | 2502.10156 | translate | read | null |
| 2025-02-14 | Cooperative Multi-Agent Planning with Adaptive Skill Synthesis | Zhiyuan Li et.al. | 2502.10148 | translate | read | null |
| 2025-02-14 | Provably Efficient RL under Episode-Wise Safety in Linear CMDPs | Toshinori Kitamura et.al. | 2502.10138 | translate | read | null |
| 2025-02-14 | Causal Information Prioritization for Efficient Reinforcement Learning | Hongye Cao et.al. | 2502.10097 | translate | read | null |
| 2025-02-13 | DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References | Xueyi Liu et.al. | 2502.09614 | translate | read | link |
| 2025-02-13 | Coupled Rendezvous and Docking Maneuver control of satellite using Reinforcement learning-based Adaptive Fixed-Time Sliding Mode Controller | Rakesh Kumar Sahoo et.al. | 2502.09517 | translate | read | null |
| 2025-02-13 | Variable Stiffness for Robust Locomotion through Reinforcement Learning | Dario Spoljaric et.al. | 2502.09436 | translate | read | null |
| 2025-02-13 | A Survey of Reinforcement Learning for Optimization in Automation | Ahmad Farooq et.al. | 2502.09417 | translate | read | null |
| 2025-02-13 | Generalizable Reinforcement Learning with Biologically Inspired Hyperdimensional Occupancy Grid Maps for Exploration and Goal-Directed Path Planning | Shay Snyder et.al. | 2502.09393 | translate | read | null |
| 2025-02-13 | Machine learning for modelling unstructured grid data in computational physics: a review | Sibo Cheng et.al. | 2502.09346 | translate | read | null |
| 2025-02-13 | Revisiting Topological Interference Management: A Learning-to-Code on Graphs Perspective | Zhiwei Shan et.al. | 2502.09344 | translate | read | null |
| 2025-02-13 | Convex Is Back: Solving Belief MDPs With Convexity-Informed Deep Reinforcement Learning | Daniel Koutas et.al. | 2502.09298 | translate | read | null |
| 2025-02-13 | Autonomous Task Completion Based on Goal-directed Answer Set Programming | Alexis R. Tudor et.al. | 2502.09208 | translate | read | null |
| 2025-02-13 | Logical Reasoning in Large Language Models: A Survey | Hanmeng Liu et.al. | 2502.09100 | translate | read | link |
| 2025-02-12 | Re$^3$Sim: Generating High-Fidelity Simulation Data via 3D-Photorealistic Real-to-Sim for Robotic Manipulation | Xiaoshen Han et.al. | 2502.08645 | translate | read | link |
| 2025-02-12 | A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards | Shivansh Patel et.al. | 2502.08643 | translate | read | null |
| 2025-02-12 | Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning | Dhruv Rohatgi et.al. | 2502.08632 | translate | read | null |
| 2025-02-12 | Robot Data Curation with Mutual Information Estimators | Joey Hejna et.al. | 2502.08623 | translate | read | null |
| 2025-02-12 | Learning to Group and Grasp Multiple Objects | Takahiro Yonemaru et.al. | 2502.08452 | translate | read | null |
| 2025-02-12 | CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World | Yankai Fu et.al. | 2502.08449 | translate | read | null |
| 2025-02-12 | Acceleration of crystal structure relaxation with Deep Reinforcement Learning | Elena Trukhan et.al. | 2502.08405 | translate | read | null |
| 2025-02-12 | Learning Humanoid Standing-up Control across Diverse Postures | Tao Huang et.al. | 2502.08378 | translate | read | link |
| 2025-02-12 | Towards Principled Multi-Agent Task Agnostic Exploration | Riccardo Zamboni et.al. | 2502.08365 | translate | read | null |
| 2025-02-12 | Deterministic generation of non-classical mechanical states in cavity optomechanics via reinforcement learning | Yu-Hong Liu et.al. | 2502.08350 | translate | read | null |
| 2025-02-11 | Polynomial-Time Approximability of Constrained Reinforcement Learning | Jeremy McMahan et.al. | 2502.07764 | translate | read | null |
| 2025-02-11 | DOGlove: Dexterous Manipulation with a Low-Cost Open-Source Haptic Force Feedback Glove | Han Zhang et.al. | 2502.07730 | translate | read | null |
| 2025-02-11 | Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning | Aya Kayal et.al. | 2502.07715 | translate | read | null |
| 2025-02-11 | A Unifying Framework for Causal Imitation Learning with Hidden Confounders | Daqian Shao et.al. | 2502.07656 | translate | read | null |
| 2025-02-11 | Beyond Behavior Cloning: Robustness through Interactive Imitation and Contrastive Learning | Zhaoting Li et.al. | 2502.07645 | translate | read | null |
| 2025-02-11 | Distributed Value Decomposition Networks with Networked Agents | Guilherme S. Varela et.al. | 2502.07635 | translate | read | null |
| 2025-02-11 | Evolution of cooperation in a bimodal mixture of conditional cooperators | Chenyang Zhao et.al. | 2502.07537 | translate | read | null |
| 2025-02-11 | Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization | Daniel Palenicek et.al. | 2502.07523 | translate | read | null |
| 2025-02-11 | Logarithmic Regret for Online KL-Regularized Reinforcement Learning | Heyang Zhao et.al. | 2502.07460 | translate | read | null |
| 2025-02-11 | Towards a Formal Theory of the Need for Competence via Computational Intrinsic Motivation | Erik M. Lintunen et.al. | 2502.07423 | translate | read | null |
| 2025-02-10 | Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Chengqi Lyu et.al. | 2502.06781 | translate | read | link |
| 2025-02-10 | On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | Guanghao Ye et.al. | 2502.06773 | translate | read | link |
| 2025-02-10 | ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Ling Yang et.al. | 2502.06772 | translate | read | link |
| 2025-02-10 | AgilePilot: DRL-Based Drone Agent for Real-Time Motion Planning in Dynamic Environments by Leveraging Object Detection | Roohan Ahmed Khan et.al. | 2502.06725 | translate | read | null |
| 2025-02-10 | Discovery of skill switching criteria for learning agile quadruped locomotion | Wanming Yu et.al. | 2502.06676 | translate | read | null |
| 2025-02-10 | Deep Reinforcement Learning based Triggering Function for Early Classifiers of Time Series | Aurélien Renault et.al. | 2502.06584 | translate | read | null |
| 2025-02-10 | Predictive Red Teaming: Breaking Policies Without Breaking Robots | Anirudha Majumdar et.al. | 2502.06575 | translate | read | null |
| 2025-02-10 | Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning | Jean Vassoyan et.al. | 2502.06533 | translate | read | link |
| 2025-02-10 | Model-Based Offline Reinforcement Learning with Reliability-Guaranteed Sequence Modeling | Shenghong He et.al. | 2502.06491 | translate | read | null |
| 2025-02-10 | SIGMA: Sheaf-Informed Geometric Multi-Agent Pathfinding | Shuhao Liao et.al. | 2502.06440 | translate | read | null |
| 2025-02-07 | DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails | Yihe Deng et.al. | 2502.05163 | translate | read | link |
| 2025-02-07 | Use of Winsome Robots for Understanding Human Feedback (UWU) | Jessica Eggers et.al. | 2502.05118 | translate | read | null |
| 2025-02-07 | 3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery | Xiuyuan Hu et.al. | 2502.05107 | translate | read | link |
| 2025-02-07 | Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Tushar Pandey et.al. | 2502.05078 | translate | read | link |
| 2025-02-07 | Exploring the Generalizability of Geomagnetic Navigation: A Deep Reinforcement Learning approach with Policy Distillation | Wenqi Bai et.al. | 2502.05069 | translate | read | null |
| 2025-02-07 | Seasonal Station-Keeping of Short Duration High Altitude Balloons using Deep Reinforcement Learning | Tristan K. Schuler et.al. | 2502.05014 | translate | read | null |
| 2025-02-07 | A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning Enhanced Approach | Taiyi Wang et.al. | 2502.05001 | translate | read | null |
| 2025-02-07 | Enhancing Pre-Trained Decision Transformers with Prompt-Tuning Bandits | Finn Rietz et.al. | 2502.04979 | translate | read | null |
| 2025-02-07 | Towards Smarter Sensing: 2D Clutter Mitigation in RL-Driven Cognitive MIMO Radar | Adam Umra et.al. | 2502.04967 | translate | read | null |
| 2025-02-07 | Fast Adaptive Anti-Jamming Channel Access via Deep Q Learning and Coarse-Grained Spectrum Prediction | Jianshu Zhang et.al. | 2502.04963 | translate | read | null |
| 2025-02-06 | DexterityGen: Foundation Controller for Unprecedented Dexterity | Zhao-Heng Yin et.al. | 2502.04307 | translate | read | null |
| 2025-02-06 | PILAF: Optimal Human Preference Sampling for Reward Modeling | Yunzhen Feng et.al. | 2502.04270 | translate | read | null |
| 2025-02-06 | Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning | Wesley A. Suttle et.al. | 2502.04141 | translate | read | null |
| 2025-02-06 | Simulating the Emergence of Differential Case Marking with Communicating Neural-Network Agents | Yuchen Lian et.al. | 2502.04038 | translate | read | null |
| 2025-02-06 | Deep Meta Coordination Graphs for Multi-agent Reinforcement Learning | Nikunj Gupta et.al. | 2502.04028 | translate | read | link |
| 2025-02-06 | Bilevel Multi-Armed Bandit-Based Hierarchical Reinforcement Learning for Interaction-Aware Self-Driving at Unsignalized Intersections | Zengqi Peng et.al. | 2502.03960 | translate | read | null |
| 2025-02-06 | Fairness Aware Reinforcement Learning via Proximal Policy Optimization | Gabriele La Malfa et.al. | 2502.03953 | translate | read | null |
| 2025-02-06 | CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning | Yousef Koka et.al. | 2502.03946 | translate | read | null |
| 2025-02-06 | Mirror Descent Actor Critic via Bounded Advantage Learning | Ryo Iwaki et.al. | 2502.03854 | translate | read | null |
| 2025-02-06 | PAGNet: Pluggable Adaptive Generative Networks for Information Completion in Multi-Agent Communication | Zhuohui Zhang et.al. | 2502.03845 | translate | read | null |
| 2025-02-05 | Deep Reinforcement Learning-Based Optimization of Second-Life Battery Utilization in Electric Vehicles Charging Stations | Rouzbeh Haghighi et.al. | 2502.03412 | translate | read | null |
| 2025-02-05 | Lightweight Authenticated Task Offloading in 6G-Cloud Vehicular Twin Networks | Sarah Al-Shareeda et.al. | 2502.03403 | translate | read | null |
| 2025-02-05 | Energy-Efficient Flying LoRa Gateways: A Multi-Agent Reinforcement Learning Approach | Abdullahi Isa Ahmed et.al. | 2502.03377 | translate | read | null |
| 2025-02-05 | Demystifying Long Chain-of-Thought Reasoning in LLMs | Edward Yeo et.al. | 2502.03373 | translate | read | link |
| 2025-02-05 | Learning from Active Human Involvement through Proxy Value Propagation | Zhenghao Peng et.al. | 2502.03369 | translate | read | null |
| 2025-02-05 | Conditional Prediction by Simulation for Automated Driving | Fabian Konstantinidis et.al. | 2502.03286 | translate | read | null |
| 2025-02-05 | Calibrated Unsupervised Anomaly Detection in Multivariate Time-series using Reinforcement Learning | Saba Sanami et.al. | 2502.03245 | translate | read | null |
| 2025-02-05 | Underwater Soft Fin Flapping Motion with Deep Neural Network Based Surrogate Model | Yuya Hamamatsu et.al. | 2502.03135 | translate | read | null |
| 2025-02-05 | Double Distillation Network for Multi-Agent Reinforcement Learning | Yang Zhou et.al. | 2502.03125 | translate | read | null |
| 2025-02-05 | HiLo: Learning Whole-Body Human-like Locomotion with Motion Tracking Controller | Qiyuan Zhang et.al. | 2502.03122 | translate | read | null |
| 2025-02-04 | Flow Q-Learning | Seohong Park et.al. | 2502.02538 | translate | read | null |
| 2025-02-04 | Brief analysis of DeepSeek R1 and it’s implications for Generative AI | Sarah Mercer et.al. | 2502.02523 | translate | read | null |
| 2025-02-04 | Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | Maohao Shen et.al. | 2502.02508 | translate | read | null |
| 2025-02-04 | Towards Fast Graph Generation via Autoregressive Noisy Filtration Modeling | Markus Krimmel et.al. | 2502.02415 | translate | read | null |
| 2025-02-04 | Achieving Hiding and Smart Anti-Jamming Communication: A Parallel DRL Approach against Moving Reactive Jammer | Yangyang Li et.al. | 2502.02385 | translate | read | null |
| 2025-02-04 | Circular Microalgae-Based Carbon Control for Net Zero | Federico Zocco et.al. | 2502.02382 | translate | read | null |
| 2025-02-04 | Coreset-Based Task Selection for Sample-Efficient Meta-Reinforcement Learning | Donglin Zhan et.al. | 2502.02332 | translate | read | null |
| 2025-02-04 | Policy-Guided Causal State Representation for Offline Reinforcement Learning Recommendation | Siyu Wang et.al. | 2502.02327 | translate | read | null |
| 2025-02-04 | DIME:Diffusion-Based Maximum Entropy Reinforcement Learning | Onur Celik et.al. | 2502.02316 | translate | read | null |
| 2025-02-04 | MAGNNET: Multi-Agent Graph Neural Network-based Efficient Task Allocation for Autonomous Vehicles with Deep Reinforcement Learning | Lavanya Ratnabala et.al. | 2502.02311 | translate | read | null |
| 2025-02-03 | SHARPIE: A Modular Framework for Reinforcement Learning and Human-AI Interaction Experiments | Hüseyin Aydın et.al. | 2501.19245 | translate | read | null |