Reinforcement Learning - 2025-02

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-02-28	LLM Post-Training: A Deep Dive into Reasoning Large Language Models	Komal Kumar et.al.	2502.21321	translate	read	null
2025-02-28	ReaLJam: Real-Time Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers	Alexander Scarlatos et.al.	2502.21267	translate	read	null
2025-02-28	ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs	Hao Ge et.al.	2502.21231	translate	read	null
2025-02-28	A Method of Selective Attention for Reservoir Based Agents	Kevin McKee et.al.	2502.21229	translate	read	null
2025-02-28	Reducing Reward Dependence in RL Through Adaptive Confidence Discounting	Muhammed Yusuf Satici et.al.	2502.21181	translate	read	null
2025-02-28	Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning	Léopold Maytié et.al.	2502.21142	translate	read	null
2025-02-28	Dynamically Local-Enhancement Planner for Large-Scale Autonomous Driving	Nanshan Deng et.al.	2502.21134	translate	read	null
2025-02-28	AuthSim: Towards Authentic and Effective Safety-critical Scenario Generation for Autonomous Driving Tests	Yukuan Yang et.al.	2502.21100	translate	read	null
2025-02-28	Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control	Taeho Lee et.al.	2502.21057	translate	read	null
2025-02-28	Motion ReTouch: Motion Modification Using Four-Channel Bilateral Control	Koki Inami et.al.	2502.20982	translate	read	null
2025-02-27	Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids	Toru Lin et.al.	2502.20396	translate	read	null
2025-02-27	Multi-Turn Code Generation Through Single-Step Rewards	Arnav Kumar Jain et.al.	2502.20380	translate	read	null
2025-02-27	The Role of Tactile Sensing for Learning Reach and Grasp	Boya Zhang et.al.	2502.20367	translate	read	null
2025-02-27	Improving the Efficiency of a Deep Reinforcement Learning-Based Power Management System for HPC Clusters Using Curriculum Learning	Thomas Budiarjo et.al.	2502.20348	translate	read	null
2025-02-27	Safety Representations for Safer Policy Learning	Kaustubh Mani et.al.	2502.20341	translate	read	null
2025-02-27	Deep Reinforcement Learning based Autonomous Decision-Making for Cooperative UAVs: A Search and Rescue Real World Application	Thomas Hickling et.al.	2502.20326	translate	read	null
2025-02-27	On the Importance of Reward Design in Reinforcement Learning-based Dynamic Algorithm Configuration: A Case Study on OneMax with $(1+(\lambda,\lambda))$-GA	Tai Nguyen et.al.	2502.20265	translate	read	null
2025-02-27	Explainable physics-based constraints on reinforcement learning for accelerator controls	Jonathan Colen et.al.	2502.20247	translate	read	null
2025-02-27	MARVEL: Multi-Agent Reinforcement Learning for constrained field-of-View multi-robot Exploration in Large-scale environments	Jimmy Chiun et.al.	2502.20217	translate	read	null
2025-02-27	Highly Parallelized Reinforcement Learning Training with Relaxed Assignment Dependencies	Zhouyu He et.al.	2502.20190	translate	read	null
2025-02-26	Recurrent Auto-Encoders for Enhanced Deep Reinforcement Learning in Wilderness Search and Rescue Planning	Jan-Hendrik Ewers et.al.	2502.19356	translate	read	null
2025-02-26	Hybrid Robot Learning for Automatic Robot Motion Planning in Manufacturing	Siddharth Singh et.al.	2502.19340	translate	read	null
2025-02-26	WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies	William Solow et.al.	2502.19308	translate	read	null
2025-02-26	Combining Planning and Reinforcement Learning for Solving Relational Multiagent Domains	Nikhilesh Prabhakar et.al.	2502.19297	translate	read	null
2025-02-26	Deep Computerized Adaptive Testing	Jiguang Li et.al.	2502.19275	translate	read	null
2025-02-26	Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective	Jiawei Huang et.al.	2502.19255	translate	read	null
2025-02-26	ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration	Minjie Zhu et.al.	2502.19250	translate	read	null
2025-02-26	Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time	Jiazheng Li et.al.	2502.19230	translate	read	null
2025-02-26	When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning	Yijiang River Dong et.al.	2502.19158	translate	read	null
2025-02-26	Policy Testing with MDPFuzz (Replicability Study)	Quentin Mazouni et.al.	2502.19116	translate	read	null
2025-02-25	SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution	Yuxiang Wei et.al.	2502.18449	translate	read	null
2025-02-25	MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning	Chanwoo Park et.al.	2502.18439	translate	read	null
2025-02-25	Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand	Fengshuo Bai et.al.	2502.18423	translate	read	null
2025-02-25	Enhancing Reusability of Learned Skills for Robot Manipulation via Gaze and Bottleneck	Ryo Takizawa et.al.	2502.18121	translate	read	null
2025-02-25	Controlling dynamics of stochastic systems with deep reinforcement learning	Ruslan Mukhamadiarov et.al.	2502.18111	translate	read	null
2025-02-25	From planning to policy: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation	Haewon Jung et.al.	2502.18015	translate	read	null
2025-02-25	NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms	Yashan Wang et.al.	2502.18008	translate	read	null
2025-02-25	Provable Performance Bounds for Digital Twin-driven Deep Reinforcement Learning in Wireless Networks: A Novel Digital-Twin Bisimulation Metric	Zhenyu Tao et.al.	2502.17983	translate	read	null
2025-02-25	FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real	Weiheng Liu et.al.	2502.17894	translate	read	null
2025-02-25	Sample-efficient diffusion-based control of complex nonlinear systems	Hongyi Chen et.al.	2502.17893	translate	read	null
2025-02-24	Event-Based Limit Order Book Simulation under a Neural Hawkes Process: Application in Market-Making	Luca Lalor et.al.	2502.17417	translate	read	null
2025-02-24	Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models	Alon Albalak et.al.	2502.17387	translate	read	link
2025-02-24	Distributed Coordination for Heterogeneous Non-Terrestrial Networks	Jikang Deng et.al.	2502.17366	translate	read	null
2025-02-24	TDMPBC: Self-Imitative Reinforcement Learning for Humanoid Robot Control	Zifeng Zhuang et.al.	2502.17322	translate	read	null
2025-02-24	Survey on Strategic Mining in Blockchain: A Reinforcement Learning Approach	Jichen Li et.al.	2502.17307	translate	read	null
2025-02-24	A Reinforcement Learning Approach to Non-prehensile Manipulation through Sliding	Hamidreza Raei et.al.	2502.17221	translate	read	null
2025-02-24	Humanoid Whole-Body Locomotion on Narrow Terrain via Dynamic Balance and Reinforcement Learning	Weiji Xie et.al.	2502.17219	translate	read	null
2025-02-24	Teleology-Driven Affective Computing: A Causal Framework for Sustained Well-Being	Bin Yin et.al.	2502.17172	translate	read	null
2025-02-24	A Novel Multiple Access Scheme for Heterogeneous Wireless Communications using Symmetry-aware Continual Deep Reinforcement Learning	Hamidreza Mazandarani et.al.	2502.17167	translate	read	null
2025-02-24	MA2RL: Masked Autoencoders for Generalizable Multi-Agent Reinforcement Learning	Jinyuan Feng et.al.	2502.17046	translate	read	null
2025-02-21	BOSS: Benchmark for Observation Space Shift in Long-Horizon Task	Yue Yang et.al.	2502.15679	translate	read	null
2025-02-21	VaViM and VaVAM: Autonomous Driving through Video Generative Modeling	Florent Bartoccioni et.al.	2502.15672	translate	read	link
2025-02-21	Automating Curriculum Learning for Reinforcement Learning using a Skill-Based Bayesian Network	Vincent Hsiao et.al.	2502.15662	translate	read	null
2025-02-21	A Simulation Pipeline to Facilitate Real-World Robotic Reinforcement Learning Applications	Jefferson Silveira et.al.	2502.15649	translate	read	null
2025-02-21	Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach	Xiangtong Yao et.al.	2502.15613	translate	read	null
2025-02-21	SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning	Xuyang Li et.al.	2502.15512	translate	read	null
2025-02-21	Learning Long-Horizon Robot Manipulation Skills via Privileged Action	Xiaofeng Mao et.al.	2502.15442	translate	read	null
2025-02-21	TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning	Giuseppe Paolo et.al.	2502.15425	translate	read	null
2025-02-21	Hyperspherical Normalization for Scalable Deep Reinforcement Learning	Hojoon Lee et.al.	2502.15280	translate	read	null
2025-02-21	CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models	Shunchang Liu et.al.	2502.15278	translate	read	null
2025-02-20	Generating $\pi$-Functional Molecules Using STGG+ with Active Learning	Alexia Jolicoeur-Martineau et.al.	2502.14842	translate	read	link
2025-02-20	Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models	Vlad Sobal et.al.	2502.14819	translate	read	null
2025-02-20	Making Universal Policies Universal	Niklas Höpner et.al.	2502.14777	translate	read	null
2025-02-20	Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning	Tian Xie et.al.	2502.14768	translate	read	link
2025-02-20	Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse	Michael Doherty et.al.	2502.14741	translate	read	null
2025-02-20	Length-Controlled Margin-Based Preference Optimization without Reference Model	Gengxu Li et.al.	2502.14643	translate	read	null
2025-02-20	Curiosity Driven Multi-agent Reinforcement Learning for 3D Game Testing	Raihana Ferdous et.al.	2502.14606	translate	read	null
2025-02-20	ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification	Hyunseok Lee et.al.	2502.14565	translate	read	link
2025-02-20	MLGym: A New Framework and Benchmark for Advancing AI Research Agents	Deepak Nathani et.al.	2502.14499	translate	read	link
2025-02-20	Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization	Zhitao He et.al.	2502.14496	translate	read	link
2025-02-19	A Training-Free Framework for Precise Mobile Manipulation of Small Everyday Objects	Arjun Gupta et.al.	2502.13964	translate	read	null
2025-02-19	Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks	Guilherme Palma et.al.	2502.13918	translate	read	null
2025-02-19	Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning	Antoine Moulin et.al.	2502.13900	translate	read	null
2025-02-19	NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants	Yiran Qin et.al.	2502.13894	translate	read	null
2025-02-19	Uncertainty quantification for Markov chains with application to temporal difference learning	Weichen Wu et.al.	2502.13822	translate	read	null
2025-02-19	Learning to explore when mistakes are not allowed	Charly Pecqueux-Guézénec et.al.	2502.13801	translate	read	null
2025-02-19	User Agency and System Automation in Interactive Intelligent Systems	Thomas Langerak et.al.	2502.13779	translate	read	null
2025-02-19	Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values	Hongbo Zhang et.al.	2502.13723	translate	read	null
2025-02-19	Hierarchical RL-MPC for Demand Response Scheduling	Maximilian Bloor et.al.	2502.13714	translate	read	null
2025-02-19	User Association and Coordinated Beamforming in Cognitive Aerial-Terrestrial Networks: A Safe Reinforcement Learning Approach	Zizhen Zhou et.al.	2502.13663	translate	read	null
2025-02-18	Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization	Shuo Xing et.al.	2502.13146	translate	read	link
2025-02-18	RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning	Hao Gao et.al.	2502.13144	translate	read	link
2025-02-18	Theorem Prover as a Judge for Synthetic Data Generation	Joshua Ong Jun Leang et.al.	2502.13137	translate	read	null
2025-02-18	Text2World: Benchmarking Large Language Models for Symbolic World Model Generation	Mengkang Hu et.al.	2502.13092	translate	read	link
2025-02-18	Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation	Sha Li et.al.	2502.13019	translate	read	null
2025-02-18	HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit	Qingwei Ben et.al.	2502.13013	translate	read	link
2025-02-18	Integrating Reinforcement Learning, Action Model Learning, and Numeric Planning for Tackling Complex Tasks	Yarin Benyamin et.al.	2502.13006	translate	read	link
2025-02-18	Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options	Lakshmi Nair et.al.	2502.12929	translate	read	link
2025-02-18	Continuous Learning Conversational AI: A Personalized Agent Framework via A2C Reinforcement Learning	Nandakishor M et.al.	2502.12876	translate	read	null
2025-02-18	A Survey on DRL based UAV Communications and Networking: DRL Fundamentals, Applications and Implementations	Wei Zhao et.al.	2502.12875	translate	read	null
2025-02-17	Scaling Test-Time Compute Without Verification or RL is Suboptimal	Amrith Setlur et.al.	2502.12118	translate	read	null
2025-02-17	Unhackable Temporal Rewarding for Scalable Video MLLMs	En Yu et.al.	2502.12081	translate	read	link
2025-02-17	How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines	Ayan Sengupta et.al.	2502.12051	translate	read	null
2025-02-17	Theoretical Barriers in Bellman-Based Reinforcement Learning	Brieuc Pinon et.al.	2502.11968	translate	read	null
2025-02-17	Massively Scaling Explicit Policy-conditioned Value Functions	Nico Bohlinger et.al.	2502.11949	translate	read	null
2025-02-17	FitLight: Federated Imitation Learning for Plug-and-Play Autonomous Traffic Signal Control	Yutong Ye et.al.	2502.11937	translate	read	null
2025-02-17	VLP: Vision-Language Preference Learning for Embodied Manipulation	Runze Liu et.al.	2502.11918	translate	read	null
2025-02-17	CAMEL: Continuous Action Masking Enabled by Large Language Models for Reinforcement Learning	Yanxiao Zhao et.al.	2502.11896	translate	read	null
2025-02-17	Does Knowledge About Perceptual Uncertainty Help an Agent in Automated Driving?	Natalie Grabowsky et.al.	2502.11864	translate	read	null
2025-02-17	Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces	Eric Eaton et.al.	2502.11828	translate	read	null
2025-02-14	BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds	Huayi Wang et.al.	2502.10363	translate	read	null
2025-02-14	Reinforcement Learning in Strategy-Based and Atari Games: A Review of Google DeepMinds Innovations	Abdelrhman Shaheen et.al.	2502.10303	translate	read	null
2025-02-14	Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding	Laurin Luttmann et.al.	2502.10233	translate	read	null
2025-02-14	Dynamic Reinforcement Learning for Actors	Katsunari Shibata et.al.	2502.10200	translate	read	null
2025-02-14	Reinforcement Learning based Constrained Optimal Control: an Interpretable Reward Design	Jingjie Ni et.al.	2502.10187	translate	read	null
2025-02-14	Combinatorial Reinforcement Learning with Preference Feedback	Joongkyu Lee et.al.	2502.10158	translate	read	null
2025-02-14	MonoForce: Learnable Image-conditioned Physics Engine	Ruslan Agishev et.al.	2502.10156	translate	read	null
2025-02-14	Cooperative Multi-Agent Planning with Adaptive Skill Synthesis	Zhiyuan Li et.al.	2502.10148	translate	read	null
2025-02-14	Provably Efficient RL under Episode-Wise Safety in Linear CMDPs	Toshinori Kitamura et.al.	2502.10138	translate	read	null
2025-02-14	Causal Information Prioritization for Efficient Reinforcement Learning	Hongye Cao et.al.	2502.10097	translate	read	null
2025-02-13	DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References	Xueyi Liu et.al.	2502.09614	translate	read	link
2025-02-13	Coupled Rendezvous and Docking Maneuver control of satellite using Reinforcement learning-based Adaptive Fixed-Time Sliding Mode Controller	Rakesh Kumar Sahoo et.al.	2502.09517	translate	read	null
2025-02-13	Variable Stiffness for Robust Locomotion through Reinforcement Learning	Dario Spoljaric et.al.	2502.09436	translate	read	null
2025-02-13	A Survey of Reinforcement Learning for Optimization in Automation	Ahmad Farooq et.al.	2502.09417	translate	read	null
2025-02-13	Generalizable Reinforcement Learning with Biologically Inspired Hyperdimensional Occupancy Grid Maps for Exploration and Goal-Directed Path Planning	Shay Snyder et.al.	2502.09393	translate	read	null
2025-02-13	Machine learning for modelling unstructured grid data in computational physics: a review	Sibo Cheng et.al.	2502.09346	translate	read	null
2025-02-13	Revisiting Topological Interference Management: A Learning-to-Code on Graphs Perspective	Zhiwei Shan et.al.	2502.09344	translate	read	null
2025-02-13	Convex Is Back: Solving Belief MDPs With Convexity-Informed Deep Reinforcement Learning	Daniel Koutas et.al.	2502.09298	translate	read	null
2025-02-13	Autonomous Task Completion Based on Goal-directed Answer Set Programming	Alexis R. Tudor et.al.	2502.09208	translate	read	null
2025-02-13	Logical Reasoning in Large Language Models: A Survey	Hanmeng Liu et.al.	2502.09100	translate	read	link
2025-02-12	Re$^3$Sim: Generating High-Fidelity Simulation Data via 3D-Photorealistic Real-to-Sim for Robotic Manipulation	Xiaoshen Han et.al.	2502.08645	translate	read	link
2025-02-12	A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards	Shivansh Patel et.al.	2502.08643	translate	read	null
2025-02-12	Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning	Dhruv Rohatgi et.al.	2502.08632	translate	read	null
2025-02-12	Robot Data Curation with Mutual Information Estimators	Joey Hejna et.al.	2502.08623	translate	read	null
2025-02-12	Learning to Group and Grasp Multiple Objects	Takahiro Yonemaru et.al.	2502.08452	translate	read	null
2025-02-12	CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World	Yankai Fu et.al.	2502.08449	translate	read	null
2025-02-12	Acceleration of crystal structure relaxation with Deep Reinforcement Learning	Elena Trukhan et.al.	2502.08405	translate	read	null
2025-02-12	Learning Humanoid Standing-up Control across Diverse Postures	Tao Huang et.al.	2502.08378	translate	read	link
2025-02-12	Towards Principled Multi-Agent Task Agnostic Exploration	Riccardo Zamboni et.al.	2502.08365	translate	read	null
2025-02-12	Deterministic generation of non-classical mechanical states in cavity optomechanics via reinforcement learning	Yu-Hong Liu et.al.	2502.08350	translate	read	null
2025-02-11	Polynomial-Time Approximability of Constrained Reinforcement Learning	Jeremy McMahan et.al.	2502.07764	translate	read	null
2025-02-11	DOGlove: Dexterous Manipulation with a Low-Cost Open-Source Haptic Force Feedback Glove	Han Zhang et.al.	2502.07730	translate	read	null
2025-02-11	Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning	Aya Kayal et.al.	2502.07715	translate	read	null
2025-02-11	A Unifying Framework for Causal Imitation Learning with Hidden Confounders	Daqian Shao et.al.	2502.07656	translate	read	null
2025-02-11	Beyond Behavior Cloning: Robustness through Interactive Imitation and Contrastive Learning	Zhaoting Li et.al.	2502.07645	translate	read	null
2025-02-11	Distributed Value Decomposition Networks with Networked Agents	Guilherme S. Varela et.al.	2502.07635	translate	read	null
2025-02-11	Evolution of cooperation in a bimodal mixture of conditional cooperators	Chenyang Zhao et.al.	2502.07537	translate	read	null
2025-02-11	Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization	Daniel Palenicek et.al.	2502.07523	translate	read	null
2025-02-11	Logarithmic Regret for Online KL-Regularized Reinforcement Learning	Heyang Zhao et.al.	2502.07460	translate	read	null
2025-02-11	Towards a Formal Theory of the Need for Competence via Computational Intrinsic Motivation	Erik M. Lintunen et.al.	2502.07423	translate	read	null
2025-02-10	Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning	Chengqi Lyu et.al.	2502.06781	translate	read	link
2025-02-10	On the Emergence of Thinking in LLMs I: Searching for the Right Intuition	Guanghao Ye et.al.	2502.06773	translate	read	link
2025-02-10	ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates	Ling Yang et.al.	2502.06772	translate	read	link
2025-02-10	AgilePilot: DRL-Based Drone Agent for Real-Time Motion Planning in Dynamic Environments by Leveraging Object Detection	Roohan Ahmed Khan et.al.	2502.06725	translate	read	null
2025-02-10	Discovery of skill switching criteria for learning agile quadruped locomotion	Wanming Yu et.al.	2502.06676	translate	read	null
2025-02-10	Deep Reinforcement Learning based Triggering Function for Early Classifiers of Time Series	Aurélien Renault et.al.	2502.06584	translate	read	null
2025-02-10	Predictive Red Teaming: Breaking Policies Without Breaking Robots	Anirudha Majumdar et.al.	2502.06575	translate	read	null
2025-02-10	Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning	Jean Vassoyan et.al.	2502.06533	translate	read	link
2025-02-10	Model-Based Offline Reinforcement Learning with Reliability-Guaranteed Sequence Modeling	Shenghong He et.al.	2502.06491	translate	read	null
2025-02-10	SIGMA: Sheaf-Informed Geometric Multi-Agent Pathfinding	Shuhao Liao et.al.	2502.06440	translate	read	null
2025-02-07	DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails	Yihe Deng et.al.	2502.05163	translate	read	link
2025-02-07	Use of Winsome Robots for Understanding Human Feedback (UWU)	Jessica Eggers et.al.	2502.05118	translate	read	null
2025-02-07	3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery	Xiuyuan Hu et.al.	2502.05107	translate	read	link
2025-02-07	Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures	Tushar Pandey et.al.	2502.05078	translate	read	link
2025-02-07	Exploring the Generalizability of Geomagnetic Navigation: A Deep Reinforcement Learning approach with Policy Distillation	Wenqi Bai et.al.	2502.05069	translate	read	null
2025-02-07	Seasonal Station-Keeping of Short Duration High Altitude Balloons using Deep Reinforcement Learning	Tristan K. Schuler et.al.	2502.05014	translate	read	null
2025-02-07	A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning Enhanced Approach	Taiyi Wang et.al.	2502.05001	translate	read	null
2025-02-07	Enhancing Pre-Trained Decision Transformers with Prompt-Tuning Bandits	Finn Rietz et.al.	2502.04979	translate	read	null
2025-02-07	Towards Smarter Sensing: 2D Clutter Mitigation in RL-Driven Cognitive MIMO Radar	Adam Umra et.al.	2502.04967	translate	read	null
2025-02-07	Fast Adaptive Anti-Jamming Channel Access via Deep Q Learning and Coarse-Grained Spectrum Prediction	Jianshu Zhang et.al.	2502.04963	translate	read	null
2025-02-06	DexterityGen: Foundation Controller for Unprecedented Dexterity	Zhao-Heng Yin et.al.	2502.04307	translate	read	null
2025-02-06	PILAF: Optimal Human Preference Sampling for Reward Modeling	Yunzhen Feng et.al.	2502.04270	translate	read	null
2025-02-06	Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning	Wesley A. Suttle et.al.	2502.04141	translate	read	null
2025-02-06	Simulating the Emergence of Differential Case Marking with Communicating Neural-Network Agents	Yuchen Lian et.al.	2502.04038	translate	read	null
2025-02-06	Deep Meta Coordination Graphs for Multi-agent Reinforcement Learning	Nikunj Gupta et.al.	2502.04028	translate	read	link
2025-02-06	Bilevel Multi-Armed Bandit-Based Hierarchical Reinforcement Learning for Interaction-Aware Self-Driving at Unsignalized Intersections	Zengqi Peng et.al.	2502.03960	translate	read	null
2025-02-06	Fairness Aware Reinforcement Learning via Proximal Policy Optimization	Gabriele La Malfa et.al.	2502.03953	translate	read	null
2025-02-06	CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning	Yousef Koka et.al.	2502.03946	translate	read	null
2025-02-06	Mirror Descent Actor Critic via Bounded Advantage Learning	Ryo Iwaki et.al.	2502.03854	translate	read	null
2025-02-06	PAGNet: Pluggable Adaptive Generative Networks for Information Completion in Multi-Agent Communication	Zhuohui Zhang et.al.	2502.03845	translate	read	null
2025-02-05	Deep Reinforcement Learning-Based Optimization of Second-Life Battery Utilization in Electric Vehicles Charging Stations	Rouzbeh Haghighi et.al.	2502.03412	translate	read	null
2025-02-05	Lightweight Authenticated Task Offloading in 6G-Cloud Vehicular Twin Networks	Sarah Al-Shareeda et.al.	2502.03403	translate	read	null
2025-02-05	Energy-Efficient Flying LoRa Gateways: A Multi-Agent Reinforcement Learning Approach	Abdullahi Isa Ahmed et.al.	2502.03377	translate	read	null
2025-02-05	Demystifying Long Chain-of-Thought Reasoning in LLMs	Edward Yeo et.al.	2502.03373	translate	read	link
2025-02-05	Learning from Active Human Involvement through Proxy Value Propagation	Zhenghao Peng et.al.	2502.03369	translate	read	null
2025-02-05	Conditional Prediction by Simulation for Automated Driving	Fabian Konstantinidis et.al.	2502.03286	translate	read	null
2025-02-05	Calibrated Unsupervised Anomaly Detection in Multivariate Time-series using Reinforcement Learning	Saba Sanami et.al.	2502.03245	translate	read	null
2025-02-05	Underwater Soft Fin Flapping Motion with Deep Neural Network Based Surrogate Model	Yuya Hamamatsu et.al.	2502.03135	translate	read	null
2025-02-05	Double Distillation Network for Multi-Agent Reinforcement Learning	Yang Zhou et.al.	2502.03125	translate	read	null
2025-02-05	HiLo: Learning Whole-Body Human-like Locomotion with Motion Tracking Controller	Qiyuan Zhang et.al.	2502.03122	translate	read	null
2025-02-04	Flow Q-Learning	Seohong Park et.al.	2502.02538	translate	read	null
2025-02-04	Brief analysis of DeepSeek R1 and its implications for Generative AI	Sarah Mercer et.al.	2502.02523	translate	read	null
2025-02-04	Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search	Maohao Shen et.al.	2502.02508	translate	read	null
2025-02-04	Towards Fast Graph Generation via Autoregressive Noisy Filtration Modeling	Markus Krimmel et.al.	2502.02415	translate	read	null
2025-02-04	Achieving Hiding and Smart Anti-Jamming Communication: A Parallel DRL Approach against Moving Reactive Jammer	Yangyang Li et.al.	2502.02385	translate	read	null
2025-02-04	Circular Microalgae-Based Carbon Control for Net Zero	Federico Zocco et.al.	2502.02382	translate	read	null
2025-02-04	Coreset-Based Task Selection for Sample-Efficient Meta-Reinforcement Learning	Donglin Zhan et.al.	2502.02332	translate	read	null
2025-02-04	Policy-Guided Causal State Representation for Offline Reinforcement Learning Recommendation	Siyu Wang et.al.	2502.02327	translate	read	null
2025-02-04	DIME: Diffusion-Based Maximum Entropy Reinforcement Learning	Onur Celik et.al.	2502.02316	translate	read	null
2025-02-04	MAGNNET: Multi-Agent Graph Neural Network-based Efficient Task Allocation for Autonomous Vehicles with Deep Reinforcement Learning	Lavanya Ratnabala et.al.	2502.02311	translate	read	null
2025-02-03	SHARPIE: A Modular Framework for Reinforcement Learning and Human-AI Interaction Experiments	Hüseyin Aydın et.al.	2501.19245	translate	read	null
