Reinforcement Learning - 2025-02

Publish Date Title Authors PDF Translate Read Code
2025-02-28 LLM Post-Training: A Deep Dive into Reasoning Large Language Models Komal Kumar et.al. 2502.21321 translate read null
2025-02-28 ReaLJam: Real-Time Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers Alexander Scarlatos et.al. 2502.21267 translate read null
2025-02-28 ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs Hao Ge et.al. 2502.21231 translate read null
2025-02-28 A Method of Selective Attention for Reservoir Based Agents Kevin McKee et.al. 2502.21229 translate read null
2025-02-28 Reducing Reward Dependence in RL Through Adaptive Confidence Discounting Muhammed Yusuf Satici et.al. 2502.21181 translate read null
2025-02-28 Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning Léopold Maytié et.al. 2502.21142 translate read null
2025-02-28 Dynamically Local-Enhancement Planner for Large-Scale Autonomous Driving Nanshan Deng et.al. 2502.21134 translate read null
2025-02-28 AuthSim: Towards Authentic and Effective Safety-critical Scenario Generation for Autonomous Driving Tests Yukuan Yang et.al. 2502.21100 translate read null
2025-02-28 Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control Taeho Lee et.al. 2502.21057 translate read null
2025-02-28 Motion ReTouch: Motion Modification Using Four-Channel Bilateral Control Koki Inami et.al. 2502.20982 translate read null
2025-02-27 Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids Toru Lin et.al. 2502.20396 translate read null
2025-02-27 Multi-Turn Code Generation Through Single-Step Rewards Arnav Kumar Jain et.al. 2502.20380 translate read null
2025-02-27 The Role of Tactile Sensing for Learning Reach and Grasp Boya Zhang et.al. 2502.20367 translate read null
2025-02-27 Improving the Efficiency of a Deep Reinforcement Learning-Based Power Management System for HPC Clusters Using Curriculum Learning Thomas Budiarjo et.al. 2502.20348 translate read null
2025-02-27 Safety Representations for Safer Policy Learning Kaustubh Mani et.al. 2502.20341 translate read null
2025-02-27 Deep Reinforcement Learning based Autonomous Decision-Making for Cooperative UAVs: A Search and Rescue Real World Application Thomas Hickling et.al. 2502.20326 translate read null
2025-02-27 On the Importance of Reward Design in Reinforcement Learning-based Dynamic Algorithm Configuration: A Case Study on OneMax with (1+($λ$,$λ$))-GA Tai Nguyen et.al. 2502.20265 translate read null
2025-02-27 Explainable physics-based constraints on reinforcement learning for accelerator controls Jonathan Colen et.al. 2502.20247 translate read null
2025-02-27 MARVEL: Multi-Agent Reinforcement Learning for constrained field-of-View multi-robot Exploration in Large-scale environments Jimmy Chiun et.al. 2502.20217 translate read null
2025-02-27 Highly Parallelized Reinforcement Learning Training with Relaxed Assignment Dependencies Zhouyu He et.al. 2502.20190 translate read null
2025-02-26 Recurrent Auto-Encoders for Enhanced Deep Reinforcement Learning in Wilderness Search and Rescue Planning Jan-Hendrik Ewers et.al. 2502.19356 translate read null
2025-02-26 Hybrid Robot Learning for Automatic Robot Motion Planning in Manufacturing Siddharth Singh et.al. 2502.19340 translate read null
2025-02-26 WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies William Solow et.al. 2502.19308 translate read null
2025-02-26 Combining Planning and Reinforcement Learning for Solving Relational Multiagent Domains Nikhilesh Prabhakar et.al. 2502.19297 translate read null
2025-02-26 Deep Computerized Adaptive Testing Jiguang Li et.al. 2502.19275 translate read null
2025-02-26 Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective Jiawei Huang et.al. 2502.19255 translate read null
2025-02-26 ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration Minjie Zhu et.al. 2502.19250 translate read null
2025-02-26 Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time Jiazheng Li et.al. 2502.19230 translate read null
2025-02-26 When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning Yijiang River Dong et.al. 2502.19158 translate read null
2025-02-26 Policy Testing with MDPFuzz (Replicability Study) Quentin Mazouni et.al. 2502.19116 translate read null
2025-02-25 SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Yuxiang Wei et.al. 2502.18449 translate read null
2025-02-25 MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning Chanwoo Park et.al. 2502.18439 translate read null
2025-02-25 Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand Fengshuo Bai et.al. 2502.18423 translate read null
2025-02-25 Enhancing Reusability of Learned Skills for Robot Manipulation via Gaze and Bottleneck Ryo Takizawa et.al. 2502.18121 translate read null
2025-02-25 Controlling dynamics of stochastic systems with deep reinforcement learning Ruslan Mukhamadiarov et.al. 2502.18111 translate read null
2025-02-25 From planning to policy: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation Haewon Jung et.al. 2502.18015 translate read null
2025-02-25 NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms Yashan Wang et.al. 2502.18008 translate read null
2025-02-25 Provable Performance Bounds for Digital Twin-driven Deep Reinforcement Learning in Wireless Networks: A Novel Digital-Twin Bisimulation Metric Zhenyu Tao et.al. 2502.17983 translate read null
2025-02-25 FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real Weiheng Liu et.al. 2502.17894 translate read null
2025-02-25 Sample-efficient diffusion-based control of complex nonlinear systems Hongyi Chen et.al. 2502.17893 translate read null
2025-02-24 Event-Based Limit Order Book Simulation under a Neural Hawkes Process: Application in Market-Making Luca Lalor et.al. 2502.17417 translate read null
2025-02-24 Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models Alon Albalak et.al. 2502.17387 translate read link
2025-02-24 Distributed Coordination for Heterogeneous Non-Terrestrial Networks Jikang Deng et.al. 2502.17366 translate read null
2025-02-24 TDMPBC: Self-Imitative Reinforcement Learning for Humanoid Robot Control Zifeng Zhuang et.al. 2502.17322 translate read null
2025-02-24 Survey on Strategic Mining in Blockchain: A Reinforcement Learning Approach Jichen Li et.al. 2502.17307 translate read null
2025-02-24 A Reinforcement Learning Approach to Non-prehensile Manipulation through Sliding Hamidreza Raei et.al. 2502.17221 translate read null
2025-02-24 Humanoid Whole-Body Locomotion on Narrow Terrain via Dynamic Balance and Reinforcement Learning Weiji Xie et.al. 2502.17219 translate read null
2025-02-24 Teleology-Driven Affective Computing: A Causal Framework for Sustained Well-Being Bin Yin et.al. 2502.17172 translate read null
2025-02-24 A Novel Multiple Access Scheme for Heterogeneous Wireless Communications using Symmetry-aware Continual Deep Reinforcement Learning Hamidreza Mazandarani et.al. 2502.17167 translate read null
2025-02-24 MA2RL: Masked Autoencoders for Generalizable Multi-Agent Reinforcement Learning Jinyuan Feng et.al. 2502.17046 translate read null
2025-02-21 BOSS: Benchmark for Observation Space Shift in Long-Horizon Task Yue Yang et.al. 2502.15679 translate read null
2025-02-21 VaViM and VaVAM: Autonomous Driving through Video Generative Modeling Florent Bartoccioni et.al. 2502.15672 translate read link
2025-02-21 Automating Curriculum Learning for Reinforcement Learning using a Skill-Based Bayesian Network Vincent Hsiao et.al. 2502.15662 translate read null
2025-02-21 A Simulation Pipeline to Facilitate Real-World Robotic Reinforcement Learning Applications Jefferson Silveira et.al. 2502.15649 translate read null
2025-02-21 Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach Xiangtong Yao et.al. 2502.15613 translate read null
2025-02-21 SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning Xuyang Li et.al. 2502.15512 translate read null
2025-02-21 Learning Long-Horizon Robot Manipulation Skills via Privileged Action Xiaofeng Mao et.al. 2502.15442 translate read null
2025-02-21 TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning Giuseppe Paolo et.al. 2502.15425 translate read null
2025-02-21 Hyperspherical Normalization for Scalable Deep Reinforcement Learning Hojoon Lee et.al. 2502.15280 translate read null
2025-02-21 CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models Shunchang Liu et.al. 2502.15278 translate read null
2025-02-20 Generating $π$-Functional Molecules Using STGG+ with Active Learning Alexia Jolicoeur-Martineau et.al. 2502.14842 translate read link
2025-02-20 Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models Vlad Sobal et.al. 2502.14819 translate read null
2025-02-20 Making Universal Policies Universal Niklas Höpner et.al. 2502.14777 translate read null
2025-02-20 Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Tian Xie et.al. 2502.14768 translate read link
2025-02-20 Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse Michael Doherty et.al. 2502.14741 translate read null
2025-02-20 Length-Controlled Margin-Based Preference Optimization without Reference Model Gengxu Li et.al. 2502.14643 translate read null
2025-02-20 Curiosity Driven Multi-agent Reinforcement Learning for 3D Game Testing Raihana Ferdous et.al. 2502.14606 translate read null
2025-02-20 ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification Hyunseok Lee et.al. 2502.14565 translate read link
2025-02-20 MLGym: A New Framework and Benchmark for Advancing AI Research Agents Deepak Nathani et.al. 2502.14499 translate read link
2025-02-20 Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization Zhitao He et.al. 2502.14496 translate read link
2025-02-19 A Training-Free Framework for Precise Mobile Manipulation of Small Everyday Objects Arjun Gupta et.al. 2502.13964 translate read null
2025-02-19 Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks Guilherme Palma et.al. 2502.13918 translate read null
2025-02-19 Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning Antoine Moulin et.al. 2502.13900 translate read null
2025-02-19 NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants Yiran Qin et.al. 2502.13894 translate read null
2025-02-19 Uncertainty quantification for Markov chains with application to temporal difference learning Weichen Wu et.al. 2502.13822 translate read null
2025-02-19 Learning to explore when mistakes are not allowed Charly Pecqueux-Guézénec et.al. 2502.13801 translate read null
2025-02-19 User Agency and System Automation in Interactive Intelligent Systems Thomas Langerak et.al. 2502.13779 translate read null
2025-02-19 Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values Hongbo Zhang et.al. 2502.13723 translate read null
2025-02-19 Hierarchical RL-MPC for Demand Response Scheduling Maximilian Bloor et.al. 2502.13714 translate read null
2025-02-19 User Association and Coordinated Beamforming in Cognitive Aerial-Terrestrial Networks: A Safe Reinforcement Learning Approach Zizhen Zhou et.al. 2502.13663 translate read null
2025-02-18 Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization Shuo Xing et.al. 2502.13146 translate read link
2025-02-18 RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning Hao Gao et.al. 2502.13144 translate read link
2025-02-18 Theorem Prover as a Judge for Synthetic Data Generation Joshua Ong Jun Leang et.al. 2502.13137 translate read null
2025-02-18 Text2World: Benchmarking Large Language Models for Symbolic World Model Generation Mengkang Hu et.al. 2502.13092 translate read link
2025-02-18 Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation Sha Li et.al. 2502.13019 translate read null
2025-02-18 HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit Qingwei Ben et.al. 2502.13013 translate read link
2025-02-18 Integrating Reinforcement Learning, Action Model Learning, and Numeric Planning for Tackling Complex Tasks Yarin Benyamin et.al. 2502.13006 translate read link
2025-02-18 Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options Lakshmi Nair et.al. 2502.12929 translate read link
2025-02-18 Continuous Learning Conversational AI: A Personalized Agent Framework via A2C Reinforcement Learning Nandakishor M et.al. 2502.12876 translate read null
2025-02-18 A Survey on DRL based UAV Communications and Networking: DRL Fundamentals, Applications and Implementations Wei Zhao et.al. 2502.12875 translate read null
2025-02-17 Scaling Test-Time Compute Without Verification or RL is Suboptimal Amrith Setlur et.al. 2502.12118 translate read null
2025-02-17 Unhackable Temporal Rewarding for Scalable Video MLLMs En Yu et.al. 2502.12081 translate read link
2025-02-17 How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines Ayan Sengupta et.al. 2502.12051 translate read null
2025-02-17 Theoretical Barriers in Bellman-Based Reinforcement Learning Brieuc Pinon et.al. 2502.11968 translate read null
2025-02-17 Massively Scaling Explicit Policy-conditioned Value Functions Nico Bohlinger et.al. 2502.11949 translate read null
2025-02-17 FitLight: Federated Imitation Learning for Plug-and-Play Autonomous Traffic Signal Control Yutong Ye et.al. 2502.11937 translate read null
2025-02-17 VLP: Vision-Language Preference Learning for Embodied Manipulation Runze Liu et.al. 2502.11918 translate read null
2025-02-17 CAMEL: Continuous Action Masking Enabled by Large Language Models for Reinforcement Learning Yanxiao Zhao et.al. 2502.11896 translate read null
2025-02-17 Does Knowledge About Perceptual Uncertainty Help an Agent in Automated Driving? Natalie Grabowsky et.al. 2502.11864 translate read null
2025-02-17 Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces Eric Eaton et.al. 2502.11828 translate read null
2025-02-14 BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds Huayi Wang et.al. 2502.10363 translate read null
2025-02-14 Reinforcement Learning in Strategy-Based and Atari Games: A Review of Google DeepMinds Innovations Abdelrhman Shaheen et.al. 2502.10303 translate read null
2025-02-14 Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding Laurin Luttmann et.al. 2502.10233 translate read null
2025-02-14 Dynamic Reinforcement Learning for Actors Katsunari Shibata et.al. 2502.10200 translate read null
2025-02-14 Reinforcement Learning based Constrained Optimal Control: an Interpretable Reward Design Jingjie Ni et.al. 2502.10187 translate read null
2025-02-14 Combinatorial Reinforcement Learning with Preference Feedback Joongkyu Lee et.al. 2502.10158 translate read null
2025-02-14 MonoForce: Learnable Image-conditioned Physics Engine Ruslan Agishev et.al. 2502.10156 translate read null
2025-02-14 Cooperative Multi-Agent Planning with Adaptive Skill Synthesis Zhiyuan Li et.al. 2502.10148 translate read null
2025-02-14 Provably Efficient RL under Episode-Wise Safety in Linear CMDPs Toshinori Kitamura et.al. 2502.10138 translate read null
2025-02-14 Causal Information Prioritization for Efficient Reinforcement Learning Hongye Cao et.al. 2502.10097 translate read null
2025-02-13 DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References Xueyi Liu et.al. 2502.09614 translate read link
2025-02-13 Coupled Rendezvous and Docking Maneuver control of satellite using Reinforcement learning-based Adaptive Fixed-Time Sliding Mode Controller Rakesh Kumar Sahoo et.al. 2502.09517 translate read null
2025-02-13 Variable Stiffness for Robust Locomotion through Reinforcement Learning Dario Spoljaric et.al. 2502.09436 translate read null
2025-02-13 A Survey of Reinforcement Learning for Optimization in Automation Ahmad Farooq et.al. 2502.09417 translate read null
2025-02-13 Generalizable Reinforcement Learning with Biologically Inspired Hyperdimensional Occupancy Grid Maps for Exploration and Goal-Directed Path Planning Shay Snyder et.al. 2502.09393 translate read null
2025-02-13 Machine learning for modelling unstructured grid data in computational physics: a review Sibo Cheng et.al. 2502.09346 translate read null
2025-02-13 Revisiting Topological Interference Management: A Learning-to-Code on Graphs Perspective Zhiwei Shan et.al. 2502.09344 translate read null
2025-02-13 Convex Is Back: Solving Belief MDPs With Convexity-Informed Deep Reinforcement Learning Daniel Koutas et.al. 2502.09298 translate read null
2025-02-13 Autonomous Task Completion Based on Goal-directed Answer Set Programming Alexis R. Tudor et.al. 2502.09208 translate read null
2025-02-13 Logical Reasoning in Large Language Models: A Survey Hanmeng Liu et.al. 2502.09100 translate read link
2025-02-12 Re$^3$Sim: Generating High-Fidelity Simulation Data via 3D-Photorealistic Real-to-Sim for Robotic Manipulation Xiaoshen Han et.al. 2502.08645 translate read link
2025-02-12 A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards Shivansh Patel et.al. 2502.08643 translate read null
2025-02-12 Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning Dhruv Rohatgi et.al. 2502.08632 translate read null
2025-02-12 Robot Data Curation with Mutual Information Estimators Joey Hejna et.al. 2502.08623 translate read null
2025-02-12 Learning to Group and Grasp Multiple Objects Takahiro Yonemaru et.al. 2502.08452 translate read null
2025-02-12 CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World Yankai Fu et.al. 2502.08449 translate read null
2025-02-12 Acceleration of crystal structure relaxation with Deep Reinforcement Learning Elena Trukhan et.al. 2502.08405 translate read null
2025-02-12 Learning Humanoid Standing-up Control across Diverse Postures Tao Huang et.al. 2502.08378 translate read link
2025-02-12 Towards Principled Multi-Agent Task Agnostic Exploration Riccardo Zamboni et.al. 2502.08365 translate read null
2025-02-12 Deterministic generation of non-classical mechanical states in cavity optomechanics via reinforcement learning Yu-Hong Liu et.al. 2502.08350 translate read null
2025-02-11 Polynomial-Time Approximability of Constrained Reinforcement Learning Jeremy McMahan et.al. 2502.07764 translate read null
2025-02-11 DOGlove: Dexterous Manipulation with a Low-Cost Open-Source Haptic Force Feedback Glove Han Zhang et.al. 2502.07730 translate read null
2025-02-11 Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning Aya Kayal et.al. 2502.07715 translate read null
2025-02-11 A Unifying Framework for Causal Imitation Learning with Hidden Confounders Daqian Shao et.al. 2502.07656 translate read null
2025-02-11 Beyond Behavior Cloning: Robustness through Interactive Imitation and Contrastive Learning Zhaoting Li et.al. 2502.07645 translate read null
2025-02-11 Distributed Value Decomposition Networks with Networked Agents Guilherme S. Varela et.al. 2502.07635 translate read null
2025-02-11 Evolution of cooperation in a bimodal mixture of conditional cooperators Chenyang Zhao et.al. 2502.07537 translate read null
2025-02-11 Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization Daniel Palenicek et.al. 2502.07523 translate read null
2025-02-11 Logarithmic Regret for Online KL-Regularized Reinforcement Learning Heyang Zhao et.al. 2502.07460 translate read null
2025-02-11 Towards a Formal Theory of the Need for Competence via Computational Intrinsic Motivation Erik M. Lintunen et.al. 2502.07423 translate read null
2025-02-10 Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Chengqi Lyu et.al. 2502.06781 translate read link
2025-02-10 On the Emergence of Thinking in LLMs I: Searching for the Right Intuition Guanghao Ye et.al. 2502.06773 translate read link
2025-02-10 ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Ling Yang et.al. 2502.06772 translate read link
2025-02-10 AgilePilot: DRL-Based Drone Agent for Real-Time Motion Planning in Dynamic Environments by Leveraging Object Detection Roohan Ahmed Khan et.al. 2502.06725 translate read null
2025-02-10 Discovery of skill switching criteria for learning agile quadruped locomotion Wanming Yu et.al. 2502.06676 translate read null
2025-02-10 Deep Reinforcement Learning based Triggering Function for Early Classifiers of Time Series Aurélien Renault et.al. 2502.06584 translate read null
2025-02-10 Predictive Red Teaming: Breaking Policies Without Breaking Robots Anirudha Majumdar et.al. 2502.06575 translate read null
2025-02-10 Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning Jean Vassoyan et.al. 2502.06533 translate read link
2025-02-10 Model-Based Offline Reinforcement Learning with Reliability-Guaranteed Sequence Modeling Shenghong He et.al. 2502.06491 translate read null
2025-02-10 SIGMA: Sheaf-Informed Geometric Multi-Agent Pathfinding Shuhao Liao et.al. 2502.06440 translate read null
2025-02-07 DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails Yihe Deng et.al. 2502.05163 translate read link
2025-02-07 Use of Winsome Robots for Understanding Human Feedback (UWU) Jessica Eggers et.al. 2502.05118 translate read null
2025-02-07 3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery Xiuyuan Hu et.al. 2502.05107 translate read link
2025-02-07 Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures Tushar Pandey et.al. 2502.05078 translate read link
2025-02-07 Exploring the Generalizability of Geomagnetic Navigation: A Deep Reinforcement Learning approach with Policy Distillation Wenqi Bai et.al. 2502.05069 translate read null
2025-02-07 Seasonal Station-Keeping of Short Duration High Altitude Balloons using Deep Reinforcement Learning Tristan K. Schuler et.al. 2502.05014 translate read null
2025-02-07 A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning Enhanced Approach Taiyi Wang et.al. 2502.05001 translate read null
2025-02-07 Enhancing Pre-Trained Decision Transformers with Prompt-Tuning Bandits Finn Rietz et.al. 2502.04979 translate read null
2025-02-07 Towards Smarter Sensing: 2D Clutter Mitigation in RL-Driven Cognitive MIMO Radar Adam Umra et.al. 2502.04967 translate read null
2025-02-07 Fast Adaptive Anti-Jamming Channel Access via Deep Q Learning and Coarse-Grained Spectrum Prediction Jianshu Zhang et.al. 2502.04963 translate read null
2025-02-06 DexterityGen: Foundation Controller for Unprecedented Dexterity Zhao-Heng Yin et.al. 2502.04307 translate read null
2025-02-06 PILAF: Optimal Human Preference Sampling for Reward Modeling Yunzhen Feng et.al. 2502.04270 translate read null
2025-02-06 Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning Wesley A. Suttle et.al. 2502.04141 translate read null
2025-02-06 Simulating the Emergence of Differential Case Marking with Communicating Neural-Network Agents Yuchen Lian et.al. 2502.04038 translate read null
2025-02-06 Deep Meta Coordination Graphs for Multi-agent Reinforcement Learning Nikunj Gupta et.al. 2502.04028 translate read link
2025-02-06 Bilevel Multi-Armed Bandit-Based Hierarchical Reinforcement Learning for Interaction-Aware Self-Driving at Unsignalized Intersections Zengqi Peng et.al. 2502.03960 translate read null
2025-02-06 Fairness Aware Reinforcement Learning via Proximal Policy Optimization Gabriele La Malfa et.al. 2502.03953 translate read null
2025-02-06 CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning Yousef Koka et.al. 2502.03946 translate read null
2025-02-06 Mirror Descent Actor Critic via Bounded Advantage Learning Ryo Iwaki et.al. 2502.03854 translate read null
2025-02-06 PAGNet: Pluggable Adaptive Generative Networks for Information Completion in Multi-Agent Communication Zhuohui Zhang et.al. 2502.03845 translate read null
2025-02-05 Deep Reinforcement Learning-Based Optimization of Second-Life Battery Utilization in Electric Vehicles Charging Stations Rouzbeh Haghighi et.al. 2502.03412 translate read null
2025-02-05 Lightweight Authenticated Task Offloading in 6G-Cloud Vehicular Twin Networks Sarah Al-Shareeda et.al. 2502.03403 translate read null
2025-02-05 Energy-Efficient Flying LoRa Gateways: A Multi-Agent Reinforcement Learning Approach Abdullahi Isa Ahmed et.al. 2502.03377 translate read null
2025-02-05 Demystifying Long Chain-of-Thought Reasoning in LLMs Edward Yeo et.al. 2502.03373 translate read link
2025-02-05 Learning from Active Human Involvement through Proxy Value Propagation Zhenghao Peng et.al. 2502.03369 translate read null
2025-02-05 Conditional Prediction by Simulation for Automated Driving Fabian Konstantinidis et.al. 2502.03286 translate read null
2025-02-05 Calibrated Unsupervised Anomaly Detection in Multivariate Time-series using Reinforcement Learning Saba Sanami et.al. 2502.03245 translate read null
2025-02-05 Underwater Soft Fin Flapping Motion with Deep Neural Network Based Surrogate Model Yuya Hamamatsu et.al. 2502.03135 translate read null
2025-02-05 Double Distillation Network for Multi-Agent Reinforcement Learning Yang Zhou et.al. 2502.03125 translate read null
2025-02-05 HiLo: Learning Whole-Body Human-like Locomotion with Motion Tracking Controller Qiyuan Zhang et.al. 2502.03122 translate read null
2025-02-04 Flow Q-Learning Seohong Park et.al. 2502.02538 translate read null
2025-02-04 Brief analysis of DeepSeek R1 and it’s implications for Generative AI Sarah Mercer et.al. 2502.02523 translate read null
2025-02-04 Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search Maohao Shen et.al. 2502.02508 translate read null
2025-02-04 Towards Fast Graph Generation via Autoregressive Noisy Filtration Modeling Markus Krimmel et.al. 2502.02415 translate read null
2025-02-04 Achieving Hiding and Smart Anti-Jamming Communication: A Parallel DRL Approach against Moving Reactive Jammer Yangyang Li et.al. 2502.02385 translate read null
2025-02-04 Circular Microalgae-Based Carbon Control for Net Zero Federico Zocco et.al. 2502.02382 translate read null
2025-02-04 Coreset-Based Task Selection for Sample-Efficient Meta-Reinforcement Learning Donglin Zhan et.al. 2502.02332 translate read null
2025-02-04 Policy-Guided Causal State Representation for Offline Reinforcement Learning Recommendation Siyu Wang et.al. 2502.02327 translate read null
2025-02-04 DIME: Diffusion-Based Maximum Entropy Reinforcement Learning Onur Celik et.al. 2502.02316 translate read null
2025-02-04 MAGNNET: Multi-Agent Graph Neural Network-based Efficient Task Allocation for Autonomous Vehicles with Deep Reinforcement Learning Lavanya Ratnabala et.al. 2502.02311 translate read null
2025-02-03 SHARPIE: A Modular Framework for Reinforcement Learning and Human-AI Interaction Experiments Hüseyin Aydın et.al. 2501.19245 translate read null

(<a href=../Reinforcement_Learning.md>back to Reinforcement Learning</a>)