Reinforcement Learning - 2025-12

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-12-31	Dichotomous Diffusion Policy Optimization	Ruiming Liang et.al.	2601.00898	translate	read	null
2025-12-31	VideoCuRL: Video Curriculum Reinforcement Learning with Orthogonal Difficulty Decomposition	Hongbo Jin et.al.	2601.00887	translate	read	null
2025-12-30	SmartFlow Reinforcement Learning and Agentic AI for Bike-Sharing Optimisation	Aditya Sreevatsa K et.al.	2601.00868	translate	read	null
2025-12-25	Horizon Reduction as Information Loss in Offline Reinforcement Learning	Uday Kumar Nidadala et.al.	2601.00831	translate	read	null
2025-12-31	GRL-SNAM: Geometric Reinforcement Learning with Path Differential Hamiltonians for Simultaneous Navigation and Mapping in Unknown Environments	Aditya Sai Ellendula et.al.	2601.00116	translate	read	null
2025-12-31	Adaptive Pinching Antenna Optimization via Meta-Learning for Physical-Layer Security in Dynamic Wireless Networks	Khalid T. Musri et.al.	2601.00115	translate	read	null
2025-12-31	Universal Adaptive Constraint Propagation: Scaling Structured Inference for Large Language Models via Meta-Reinforcement Learning	Ibne Farabi Shihab et.al.	2601.00095	translate	read	null
2025-12-31	Reinforcement learning with timed constraints for robotics motion planning	Zhaoan Wang et.al.	2601.00087	translate	read	null
2025-12-31	Coordinated Humanoid Manipulation with Choice Policies	Haozhi Qi et.al.	2512.25072	translate	read	null
2025-12-31	Scaling Open-Ended Reasoning to Predict the Future	Nikhil Chandak et.al.	2512.25070	translate	read	null
2025-12-31	Many Minds from One Model: Bayesian Transformers for Population Intelligence	Diji Yang et.al.	2512.25063	translate	read	null
2025-12-31	ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning	Timo Kaufmann et.al.	2512.25023	translate	read	null
2025-12-31	MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control	Yongwei Zhang et.al.	2512.24955	translate	read	null
2025-12-31	Iterative Deployment Improves Planning Skills in LLMs	Augusto B. Corrêa et.al.	2512.24940	translate	read	null
2025-12-31	Throughput Optimization in UAV-Mounted RIS under Jittering and Imperfect CSI via DRL	Anas K. Saeed et.al.	2512.24773	translate	read	null
2025-12-31	Sparse Offline Reinforcement Learning with Corruption Robustness	Nam Phuong Tran et.al.	2512.24768	translate	read	null
2025-12-31	Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow	Karthik Dharmarajan et.al.	2512.24766	translate	read	null
2025-12-31	Control of Microrobots with Reinforcement Learning under On-Device Compute Constraints	Yichen Liu et.al.	2512.24740	translate	read	null
2025-12-31	Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting	Kai Ye et.al.	2512.24702	translate	read	null
2025-12-31	Dynamic Policy Learning for Legged Robot with Simplified Model Pretraining and Model Homotopy Transfer	Dongyun Kang et.al.	2512.24698	translate	read	null
2025-12-31	Hierarchical Online Optimization Approach for IRS-enabled Low-altitude MEC in Vehicular Networks	Yixian Wang et.al.	2512.24659	translate	read	null
2025-12-31	RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence	Chengkai Hou et.al.	2512.24653	translate	read	null
2025-12-31	Hybrid Motion Planning with Deep Reinforcement Learning for Mobile Robot Navigation	Yury Kolomeytsev et.al.	2512.24651	translate	read	null
2025-12-31	Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization	Yuchen Shi et.al.	2512.24615	translate	read	null
2025-12-31	Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization	Dong Qiu et.al.	2512.24609	translate	read	null
2025-12-31	Robust Bayesian Dynamic Programming for On-policy Risk-sensitive Reinforcement Learning	Shanyu Han et.al.	2512.24580	translate	read	null
2025-12-31	From Perception to Punchline: Empowering VLM with the Art of In-the-wild Meme	Xueyan Li et.al.	2512.24555	translate	read	null
2025-12-31	From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning	Amir Tahmasbi et.al.	2512.24532	translate	read	null
2025-12-30	Networked Markets, Fragmented Data: Adaptive Graph Learning for Customer Risk Analytics and Policy Design	Lecheng Zheng et.al.	2512.24487	translate	read	null
2025-12-30	Adaptive Learning Guided by Bias-Noise-Alignment Diagnostics	Akash Samanta et.al.	2512.24445	translate	read	null
2025-12-30	Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models	Lars van der Laan et.al.	2512.24407	translate	read	null
2025-12-30	SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning	Yong Xien Chng et.al.	2512.24330	translate	read	null
2025-12-30	MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems	Wan Jiang et.al.	2512.24325	translate	read	null
2025-12-30	Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking	Meiqi Chen et.al.	2512.24297	translate	read	null
2025-12-30	Real-world Reinforcement Learning from Suboptimal Interventions	Yinuo Zhao et.al.	2512.24288	translate	read	null
2025-12-30	DRL-TH: Jointly Utilizing Temporal Graph Attention and Hierarchical Fusion for UGV Navigation in Crowded Environments	Ruitong Li et.al.	2512.24284	translate	read	null
2025-12-30	Deep Reinforcement Learning for Solving the Fleet Size and Mix Vehicle Routing Problem	Pengfu Wan et.al.	2512.24251	translate	read	null
2025-12-30	Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning	Chubin Chen et.al.	2512.24146	translate	read	null
2025-12-30	GARDO: Reinforcing Diffusion Models without Reward Hacking	Haoran He et.al.	2512.24138	translate	read	null
2025-12-30	HY-MT1.5 Technical Report	Mao Zheng et.al.	2512.24092	translate	read	null
2025-12-30	How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns	Haoyue Bai et.al.	2512.24063	translate	read	null
2025-12-30	Policy Mirror Descent with Temporal Difference Learning: Sample Complexity under Online Markov Data	Wenye Li et.al.	2512.24056	translate	read	null
2025-12-30	ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment	Natchaya Temyingyong et.al.	2512.24040	translate	read	null
2025-12-30	Reinforced Diffusion: Learning to Push the Limits of Anisotropic Diffusion for Image Denoising	Xinran Qin et.al.	2512.24035	translate	read	null
2025-12-30	RSAgent: Learning to Reason and Act for Text-Guided Segmentation via Multi-Turn Tool Invocations	Xingqi He et.al.	2512.24023	translate	read	null
2025-12-30	CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards	Zhiming Lin et.al.	2512.23971	translate	read	null
2025-12-30	Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration	Lars van der Laan et.al.	2512.23927	translate	read	null
2025-12-30	Constraint Breeds Generalization: Temporal Dynamics as an Inductive Bias	Xia Chen et.al.	2512.23916	translate	read	null
2025-12-29	Beamforming for Massive MIMO Aerial Communications: A Robust and Scalable DRL Approach	Hesam Khoshkbari et.al.	2512.23902	translate	read	null
2025-12-29	Distributed Beamforming in Massive MIMO Communication for a Constellation of Airborne Platform Stations	Hesam Khoshkbari et.al.	2512.23900	translate	read	null
2025-12-29	Max-Entropy Reinforcement Learning with Flow Matching and A Case Study on LQR	Yuyang Zhang et.al.	2512.23870	translate	read	null
2025-12-29	Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting	Lars van der Laan et.al.	2512.23805	translate	read	null
2025-12-29	Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark	Manu et.al.	2512.23779	translate	read	null
2025-12-29	FineFT: Efficient and Risk-Aware Ensemble Reinforcement Learning for Futures Trading	Molei Qin et.al.	2512.23773	translate	read	null
2025-12-29	Safety-Biased Policy Optimisation: Towards Hard-Constrained Reinforcement Learning via Trust Regions	Ankit Kanwar et.al.	2512.23770	translate	read	null
2025-12-28	Audited Skill-Graph Self-Improvement for Agentic LLMs via Verifiable Rewards, Experience Synthesis, and Continual Memory	Ken Huang et.al.	2512.23760	translate	read	null
2025-12-29	Training AI Co-Scientists Using Rubric Rewards	Shashwat Goel et.al.	2512.23707	translate	read	null
2025-12-29	Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation	Huajie Tan et.al.	2512.23703	translate	read	null
2025-12-29	Bellman Calibration for V-Learning in Offline Reinforcement Learning	Lars van der Laan et.al.	2512.23694	translate	read	null
2025-12-29	Le Cam Distortion: A Decision-Theoretic Framework for Robust Transfer Learning	Deniz Akdemir et.al.	2512.23617	translate	read	null
2025-12-29	ProGuard: Towards Proactive Multimodal Safeguard	Shaohan Yu et.al.	2512.23573	translate	read	null
2025-12-29	ThinkGen: Generalized Thinking for Visual Generation	Siyu Jiao et.al.	2512.23568	translate	read	null
2025-12-29	A NEAT Approach to Evolving Neural-Network-based Optimization of Chiral Photonic Metasurfaces: Application of a Neuro-Evolution Pipeline	Davide Filippozzi et.al.	2512.23558	translate	read	null
2025-12-29	PathFound: An Agentic Multimodal Model Activating Evidence-seeking Pathological Diagnosis	Shengyi Hua et.al.	2512.23545	translate	read	null
2025-12-29	Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning	Zuoyou Jiang et.al.	2512.23515	translate	read	null
2025-12-29	Hierarchical Decision Mamba Meets Agentic AI: A Novel Approach for RAN Slicing in 6G	Md Arafat Habib et.al.	2512.23502	translate	read	null
2025-12-29	Agentic AI for Autonomous Defense in Software Supply Chain Security: Beyond Provenance to Vulnerability Mitigation	Toqeer Ali Syed et.al.	2512.23480	translate	read	null
2025-12-29	HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation	Yuxin Wen et.al.	2512.23464	translate	read	null
2025-12-29	Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance	Zhuo Li et.al.	2512.23461	translate	read	null
2025-12-29	Replay Failures as Successes: Sample-Efficient Reinforcement Learning for Instruction Following	Kongcheng Zhang et.al.	2512.23457	translate	read	null
2025-12-29	The World Is Bigger! A Computationally-Embedded Perspective on the Big World Hypothesis	Alex Lewandowski et.al.	2512.23419	translate	read	null
2025-12-29	AGRO-SQL: Agentic Group-Relative Optimization with High-Fidelity Data Synthesis	Cehua Yang et.al.	2512.23366	translate	read	null
2025-12-29	CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation	Ke Niu et.al.	2512.23333	translate	read	null
2025-12-29	Splitwise: Collaborative Edge-Cloud Inference for LLMs via Lyapunov-Assisted DRL	Abolfazl Younesi et.al.	2512.23310	translate	read	null
2025-12-29	Agentic AI-Enhanced Semantic Communications: Foundations, Architecture, and Applications	Haixiao Gao et.al.	2512.23294	translate	read	null
2025-12-29	Interpretable Safety Alignment via SAE-Constructed Low-Rank Subspace Adaptation	Dianyun Wang et.al.	2512.23260	translate	read	null
2025-12-29	ViLaCD-R1: A Vision-Language Framework for Semantic Change Detection in Remote Sensing	Xingwei Ma et.al.	2512.23244	translate	read	null
2025-12-29	A Human-Oriented Cooperative Driving Approach: Integrating Driving Intention, State, and Conflict	Qin Wang et.al.	2512.23220	translate	read	null
2025-12-29	Evaluating Parameter Efficient Methods for RLVR	Qingyu Yin et.al.	2512.23165	translate	read	null
2025-12-29	SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling	Yufan He et.al.	2512.23162	translate	read	null
2025-12-28	A Note on Hybrid Online Reinforcement and Imitation Learning for LLMs: Formulations and Algorithms	Yingru Li et.al.	2512.23097	translate	read	null
2025-12-28	Benchmark Success, Clinical Failure: When Reinforcement Learning Optimizes for Benchmarks, Not Patients	Armin Berger et.al.	2512.23090	translate	read	null
2025-12-28	Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning	Yingru Li et.al.	2512.23087	translate	read	null
2025-12-28	Trust Region Masking for Long-Horizon LLM Reinforcement Learning	Yingru Li et.al.	2512.23075	translate	read	null
2025-12-28	Diversity or Precision? A Deep Dive into Next Token Prediction	Haoyuan Wu et.al.	2512.22955	translate	read	null
2025-12-28	APO: Alpha-Divergence Preference Optimization	Wang Zixian et.al.	2512.22953	translate	read	null
2025-12-28	Heterogeneity in Multi-Agent Reinforcement Learning	Tianyi Hu et.al.	2512.22941	translate	read	null
2025-12-28	Sat-EnQ: Satisficing Ensembles of Weak Q-Learners for Reliable and Compute-Efficient Reinforcement Learning	Ünver Çiftçi et.al.	2512.22910	translate	read	null
2025-12-28	SAMP-HDRL: Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management via Hierarchical Deep Reinforcement Learning	Xiaotian Ren et.al.	2512.22895	translate	read	null
2025-12-28	Reinforcement Networks: novel framework for collaborative Multi-Agent Reinforcement Learning tasks	Maksim Kryzhanovskiy et.al.	2512.22876	translate	read	null
2025-12-28	Adaptive Trust Consensus for Blockchain IoT: Comparing RL, DRL, and MARL Against Naive, Collusive, Adaptive, Byzantine, and Sleeper Attacks	Soham Padia et.al.	2512.22860	translate	read	null
2025-12-28	AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning	Shihao Cai et.al.	2512.22857	translate	read	null
2025-12-28	ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum Learning	Bangya Liu et.al.	2512.22854	translate	read	null
2025-12-28	MARPO: A Reflective Policy Optimization for Multi Agent Reinforcement Learning	Cuiling Wu et.al.	2512.22832	translate	read	null
2025-12-28	TEACH: Temporal Variance-Driven Curriculum for Reinforcement Learning	Gaurav Chaudhary et.al.	2512.22824	translate	read	null
2025-12-28	ReDiF: Reinforced Distillation for Few Step Diffusion	Amirhossein Tighkhorshid et.al.	2512.22802	translate	read	null
2025-12-28	Parallel Diffusion Solver via Residual Dirichlet Policy Optimization	Ruoyu Wang et.al.	2512.22796	translate	read	null
2025-12-28	FoldAct: Efficient and Stable Context Folding for Long-Horizon Search Agents	Jiaqi Shao et.al.	2512.22733	translate	read	null
2025-12-27	Cyber Resilience in Next-Generation Networks: Threat Landscape, Theoretical Foundations, and Design Paradigms	Junaid Farooq et.al.	2512.22721	translate	read	null
2025-12-27	Memento 2: Learning by Stateful Reflective Memory	Jun Wang et.al.	2512.22716	translate	read	null
2025-12-27	Optimal Regulation of Nonlinear Input-Affine Systems via an Integral Reinforcement Learning-Based State-Dependent Riccati Equation Approach	Arya Rashidinejad Meibodi et.al.	2512.22668	translate	read	null
2025-12-27	FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution	Yidi Liu et.al.	2512.22647	translate	read	null
2025-12-27	RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure	Wei Gao et.al.	2512.22560	translate	read	null
2025-12-27	AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing	Jiacheng Li et.al.	2512.22455	translate	read	null
2025-12-26	PHANTOM: Physics-Aware Adversarial Attacks against Federated Learning-Coordinated EV Charging Management System	Mohammad Zakaria Haider et.al.	2512.22381	translate	read	null
2025-12-26	Reinforcement Learning for Optimal Stopping in POMDPs with Application to Quickest Change Detection	Austin Cooper et.al.	2512.22347	translate	read	null
2025-12-26	SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents	Shaofei Cai et.al.	2512.22322	translate	read	null
2025-12-26	VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning	Yang Ding et.al.	2512.22315	translate	read	null
2025-12-24	Agentic Software Issue Resolution with Large Language Models: A Survey	Zhonghao Jiang et.al.	2512.22256	translate	read	null
2025-12-23	Masking Teacher and Reinforcing Student for Distilling Vision-Language Models	Byung-Kwan Lee et.al.	2512.22238	translate	read	null
2025-12-23	DiRL: An Efficient Post-Training Framework for Diffusion Language Models	Ying Zhu et.al.	2512.22234	translate	read	link
2025-12-26	Hybrid Deep Reinforcement Learning for Joint Resource Allocation in Multi-Active RIS-Aided Uplink Communications	Mohamed Shalma et.al.	2512.22107	translate	read	null
2025-12-26	Meta-Learning-Based Handover Management in NextG O-RAN	Michail Kalntis et.al.	2512.22022	translate	read	null
2025-12-26	Latency-Optimal Cache-aided Multicast Streaming via Forward-Backward Reinforcement Learning	Mohsen Amidzadeh et.al.	2512.21954	translate	read	null
2025-12-26	SWE-RM: Execution-free Feedback For Software Engineering Agents	KaShun Shum et.al.	2512.21919	translate	read	null
2025-12-26	A Comedy of Estimators: On KL Regularization in RL Training of LLMs	Vedant Shah et.al.	2512.21852	translate	read	null
2025-12-26	Contextual Biasing for LLM-Based ASR with Hotword Retrieval and Reinforcement Learning	YuXiang Kong et.al.	2512.21828	translate	read	null
2025-12-26	Q-A3C2: Quantum Reinforcement Learning with Time-Series Dynamic Clustering for Adaptive ETF Stock Selection	Yen-Ku Liu et.al.	2512.21819	translate	read	null
2025-12-25	Multiconnectivity for SAGIN: Current Trends, Challenges, AI-driven Solutions, and Opportunities	Abd Ullah Khan et.al.	2512.21717	translate	read	null
2025-12-25	Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search	Maximilian Weichart et.al.	2512.21648	translate	read	null
2025-12-25	Jointly Optimal Policies for Remote Estimation of Autoregressive Markov Processes over Time-Correlated Fading Channel	Manali Dutta et.al.	2512.21630	translate	read	null
2025-12-25	Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards	Xinyu Tang et.al.	2512.21625	translate	read	null
2025-12-25	Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations	Xin Liu et.al.	2512.21586	translate	read	null
2025-12-25	Towards Learning-Based Formula 1 Race Strategies	Giona Fieni et.al.	2512.21570	translate	read	null
2025-12-25	Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model	Yanhao Li et.al.	2512.21540	translate	read	null
2025-12-25	Generative Actor Critic	Aoyang Qin et.al.	2512.21527	translate	read	null
2025-12-25	DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO	Henglin Liu et.al.	2512.21514	translate	read	null
2025-12-24	dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning	Shirui Chen et.al.	2512.21446	translate	read	null
2025-12-24	A Survey of Freshness-Aware Wireless Networking with Reinforcement Learning	Alimu Alibotaiken et.al.	2512.21412	translate	read	null
2025-12-24	A Reinforcement Learning Approach to Synthetic Data Generation	Natalia Espinosa-Dice et.al.	2512.21395	translate	read	null
2025-12-24	RoboCade: Gamifying Robot Data Collection	Suvir Mirchandani et.al.	2512.21235	translate	read	null
2025-12-24	MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models	Andres M Bran et.al.	2512.21231	translate	read	null
2025-12-24	Global End-Effector Pose Control of an Underactuated Aerial Manipulator via Reinforcement Learning	Shlok Deshmukh et.al.	2512.21085	translate	read	null
2025-12-24	Dyna-Style Reinforcement Learning Modeling and Control of Non-linear Dynamics	Karim Abdelsalam et.al.	2512.21081	translate	read	null
2025-12-24	LSTM-Based Modeling and Reinforcement Learning Control of a Magnetically Actuated Catheter	Arya Rashidinejad Meibodi et.al.	2512.21063	translate	read	null
2025-12-24	Policy-Conditioned Policies for Multi-Agent Task Solving	Yue Lin et.al.	2512.21024	translate	read	null
2025-12-24	LLM-Empowered Agentic AI for QoE-Aware Network Slicing Management in Industrial IoT	Xudong Wang et.al.	2512.20997	translate	read	null
2025-12-24	Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions	Jingyang You et.al.	2512.20974	translate	read	null
2025-12-24	ReACT-Drug: Reaction-Template Guided Reinforcement Learning for de novo Drug Design	R Yadunandan et.al.	2512.20958	translate	read	null
2025-12-24	One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents	Zhaoxi Zhang et.al.	2512.20957	translate	read	null
2025-12-24	Model-free stochastic linear quadratic control for discrete-time systems with multiplicative and additive noises via semidefinite programming	Jing Guo et.al.	2512.20911	translate	read	null
2025-12-24	Embodied AI-Enhanced IoMT Edge Computing: UAV Trajectory Optimization and Task Offloading with Mobility Prediction	Siqi Mu et.al.	2512.20902	translate	read	null
2025-12-24	The Silent Scholar Problem: A Probabilistic Framework for Breaking Epistemic Asymmetry in LLM Agents	Zan-Kai Chong et.al.	2512.20884	translate	read	null
2025-12-24	Proprioception Enhances Vision Language Model in Generating Captions and Subtask Segmentations for Robot Task	Kanata Suzuki et.al.	2512.20876	translate	read	null
2025-12-24	NVIDIA Nemotron 3: Efficient and Open Intelligence	NVIDIA et.al.	2512.20856	translate	read	null
2025-12-23	QoS- and Physics-Aware Routing in Optical LEO Satellite Networks via Deep Reinforcement Learning	Mohammad Taghi Dabiri et.al.	2512.20835	translate	read	null
2025-12-23	Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions	Rashmeet Kaur Nayyar et.al.	2512.20831	translate	read	null
2025-12-23	Safety Alignment of LMs via Non-cooperative Games	Anselm Paulus et.al.	2512.20806	translate	read	link
2025-12-23	Generalization of RLVR Using Causal Reasoning as a Testbed	Brian Lu et.al.	2512.20760	translate	read	null
2025-12-23	AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent	Haipeng Luo et.al.	2512.20745	translate	read	null
2025-12-23	AI-Driven Green Cognitive Radio Networks for Sustainable 6G Communication	Anshul Sharma et.al.	2512.20739	translate	read	null
2025-12-23	Learning-Enabled Elastic Network Topology for Distributed ISAC Service Provisioning	Jie Chen et.al.	2512.20722	translate	read	null
2025-12-22	Mechanism-Based Intelligence (MBI): Differentiable Incentives for Rational Coordination and Guaranteed Alignment in Multi-Agent Systems	Stefano Grassi et.al.	2512.20688	translate	read	null
2025-12-23	LongVideoAgent: Multi-Agent Reasoning with Long Videos	Runtao Liu et.al.	2512.20618	translate	read	link
2025-12-23	Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning	Seijin Kobayashi et.al.	2512.20605	translate	read	null
2025-12-23	Leveraging High-Fidelity Digital Models and Reinforcement Learning for Mission Engineering: A Case Study of Aerial Firefighting Under Perfect Information	İbrahim Oğuz Çetinkaya et.al.	2512.20589	translate	read	null
2025-12-23	Performative Policy Gradient: Optimality in Performative Reinforcement Learning	Debabrota Basu et.al.	2512.20576	translate	read	null
2025-12-23	LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving	Long Nguyen et.al.	2512.20563	translate	read	link
2025-12-23	Recurrent Off-Policy Deep Reinforcement Learning Doesn’t Have to be Slow	Tyler Clark et.al.	2512.20513	translate	read	null
2025-12-23	Resilient Packet Forwarding: A Reinforcement Learning Approach to Routing in Gaussian Interconnected Networks with Clustered Faults	Mohammad Walid Charrwi et.al.	2512.20394	translate	read	null
2025-12-23	Identifying Appropriately-Sized Services with Deep Reinforcement Learning	Syeda Tasnim Fabiha et.al.	2512.20381	translate	read	null
2025-12-23	TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning	Saisai Yang et.al.	2512.20312	translate	read	null
2025-12-23	Graph-Symbolic Policy Enforcement and Control (G-SPEC): A Neuro-Symbolic Framework for Safe Agentic AI in 5G Autonomous Networks	Divya Vijay et.al.	2512.20275	translate	read	null
2025-12-23	Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning	Kausthubh Manda et.al.	2512.20220	translate	read	null
2025-12-23	Joint Design of Embedded Index Coding and Beamforming for MIMO-based Distributed Computing via Multi-Agent Reinforcement Learning	Heekang Song et.al.	2512.20201	translate	read	null
2025-12-23	Edge-Served Congestion Control for Wireless Multipath Transmission with a Transformer Agent	Liang Wang et.al.	2512.20186	translate	read	null
2025-12-23	FaithLens: Detecting and Explaining Faithfulness Hallucination	Shuzheng Si et.al.	2512.20182	translate	read	link
2025-12-23	RESPOND: Risk-Enhanced Structured Pattern for LLM-driven Online Node-level Decision-making	Dan Chen et.al.	2512.20179	translate	read	null
2025-12-23	Offline Safe Policy Optimization From Heterogeneous Feedback	Ze Gong et.al.	2512.20173	translate	read	null
2025-12-23	Multi-hop Reasoning via Early Knowledge Alignment	Yuxin Wang et.al.	2512.20144	translate	read	link
2025-12-23	MolAct: An Agentic RL Framework for Molecular Editing and Property Optimization	Zhuo Yang et.al.	2512.20135	translate	read	null
2025-12-23	Sample-Efficient Policy Constraint Offline Deep Reinforcement Learning based on Sample Filtering	Yuanhao Chen et.al.	2512.20115	translate	read	null
2025-12-23	ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language	Aly Lidayan et.al.	2512.20111	translate	read	null
2025-12-23	Information-directed sampling for bandits: a primer	Annika Hirling et.al.	2512.20096	translate	read	null
2025-12-23	Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents	Yiming Du et.al.	2512.20092	translate	read	link
2025-12-23	Adaptive Financial Sentiment Analysis for NIFTY 50 via Instruction-Tuned LLMs, RAG and Reinforcement Learning Approaches	Chaithra et.al.	2512.20082	translate	read	null
2025-12-23	Scaling Reinforcement Learning for Content Moderation with Large Language Models	Hamed Firooz et.al.	2512.20061	translate	read	null
2025-12-23	An Optimal Policy for Learning Controllable Dynamics by Exploration	Peter N. Loxley et.al.	2512.20053	translate	read	null
2025-12-23	From Optimization to Learning: Dual-Approach Resource Allocation for Over-the-Air Edge Computing Under Execution Uncertainty	Tuo Wu et.al.	2512.20008	translate	read	null
2025-12-22	Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning	Jiayun Wu et.al.	2512.19920	translate	read	null
2025-12-21	Learning to Design City-scale Transit Routes	Bibek Poudel et.al.	2512.19767	translate	read	null
2025-12-22	Scalably Enhancing the Clinical Validity of a Task Benchmark with Physician Oversight	Junze Ye et.al.	2512.19691	translate	read	null
2025-12-22	Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies	Yuqiao Tan et.al.	2512.19673	translate	read	link
2025-12-22	Learning Generalizable Hand-Object Tracking from Synthetic Demonstrations	Yinhuai Wang et.al.	2512.19583	translate	read	null
2025-12-22	LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller	Kirill Djebko et.al.	2512.19576	translate	read	null
2025-12-22	Variational Autoregressive Networks Applied to $φ^4$ Field Theory Systems	Moxian Qian et.al.	2512.19575	translate	read	null
2025-12-22	CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal	Yongxin Wang et.al.	2512.19554	translate	read	null
2025-12-22	LacaDM: A Latent Causal Diffusion Model for Multiobjective Reinforcement Learning	Xueming Yan et.al.	2512.19516	translate	read	null
2025-12-22	A Gauss-Newton-Induced Structure-Exploiting Algorithm for Differentiable Optimal Control	Yuankun Chen et.al.	2512.19447	translate	read	null
2025-12-22	CodeSimpleQA: Scaling Factuality in Code Large Language Models	Jian Yang et.al.	2512.19424	translate	read	null
2025-12-22	Learning General Policies with Policy Gradient Methods	Simon Ståhlberg et.al.	2512.19366	translate	read	null
2025-12-22	Interpretable Hybrid Deep Q-Learning Framework for IoT-Based Food Spoilage Prediction with Synthetic Data Generation and Hardware Validation	Isshaan Singh et.al.	2512.19361	translate	read	null
2025-12-22	First-Order Representation Languages for Goal-Conditioned RL	Simon Ståhlberg et.al.	2512.19355	translate	read	null
2025-12-22	Enhancing PLS of Indoor IRS-VLC Systems for Colluding and Non-Colluding Eavesdroppers	Rashid Iqbal et.al.	2512.19339	translate	read	null
2025-12-22	Learning-Assisted Multi-Operator Variable Neighborhood Search for Urban Cable Routing	Wei Liu et.al.	2512.19321	translate	read	null
2025-12-22	SafeMed-R1: Adversarial Reinforcement Learning for Generalizable and Robust Medical Reasoning in Vision-Language Models	A. A. Gde Yogi Pramana et.al.	2512.19317	translate	read	null
2025-12-22	Bridging Semantics and Geometry: A Decoupled LVLM-SAM Framework for Reasoning Segmentation in Remote Sensing	Xu Zhang et.al.	2512.19302	translate	read	null
2025-12-22	RMLer: Synthesizing Novel Objects across Diverse Categories via Reinforcement Mixing Learning	Jun Li et.al.	2512.19300	translate	read	null
2025-12-22	Are All Data Necessary? Efficient Data Pruning for Large-scale Autonomous Driving Dataset via Trajectory Entropy Maximization	Zhaoyang Liu et.al.	2512.19270	translate	read	null
2025-12-22	WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving	Pengxuan Yang et.al.	2512.19133	translate	read	link
2025-12-22	AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards	Zihan Lin et.al.	2512.19126	translate	read	null
2025-12-22	Explicit and Non-asymptotic Query Complexities of Rank-Based Zeroth-order Algorithm on Stochastic Smooth Functions	Haishan Ye et.al.	2512.19104	translate	read	null
2025-12-22	Tool-Augmented Hybrid Ensemble Reasoning with Distillation for Bilingual Mathematical Problem Solving	Peiqing Lu et.al.	2512.19093	translate	read	null
2025-12-22	CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models	Pengyu Chen et.al.	2512.19083	translate	read	null
2025-12-22	ORPR: An OR-Guided Pretrain-then-Reinforce Learning Model for Inventory Management	Lingjie Zhao et.al.	2512.19001	translate	read	null
2025-12-22	DTCCL: Disengagement-Triggered Contrastive Continual Learning for Autonomous Bus Planners	Yanding Yang et.al.	2512.18988	translate	read	null
2025-12-22	Scaling Online Distributionally Robust Reinforcement Learning: Sample-Efficient Guarantees with General Function Approximation	Debamita Ghosh et.al.	2512.18957	translate	read	null
2025-12-22	Training Multimodal Large Reasoning Models Needs Better Thoughts: A Three-Stage Framework for Long Chain-of-Thought Synthesis and Selection	Yizhi Wang et.al.	2512.18956	translate	read	null
2025-12-22	A Framework for Deploying Learning-based Quadruped Loco-Manipulation	Yadong Liu et.al.	2512.18938	translate	read	null
2025-12-21	QoS-Aware Load Balancing in the Computing Continuum via Multi-Player Bandits	Ivan Čilić et.al.	2512.18915	translate	read	null
2025-12-21	Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations	Shaomu Tan et.al.	2512.18906	translate	read	null
2025-12-21	Structural Reinforcement Learning for Heterogeneous Agent Macroeconomics	Yucheng Yang et.al.	2512.18892	translate	read	null
2025-12-21	CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning	Zijun Gao et.al.	2512.18857	translate	read	null
2025-12-21	InDRiVE: Reward-Free World-Model Pretraining for Autonomous Driving via Latent Disagreement	Feeza Khan Khanzada et.al.	2512.18850	translate	read	null
2025-12-21	From Word to World: Can Large Language Models be Implicit Text-based World Models?	Yixia Li et.al.	2512.18832	translate	read	null
2025-12-21	MaskFocus: Focusing Policy Optimization on Critical Steps for Masked Image Generation	Guohui Zhang et.al.	2512.18766	translate	read	null
2025-12-21	Gaussian-Mixture-Model Q-Functions for Policy Iteration in Reinforcement Learning	Minh Vu et.al.	2512.18763	translate	read	null
2025-12-21	InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search	Kaican Li et.al.	2512.18745	translate	read	null
2025-12-21	A Theoretical Lens for RL-Tuned Language Models via Energy-Based Models	Zhiquan Tan et.al.	2512.18730	translate	read	null
2025-12-21	Demonstration-Guided Continual Reinforcement Learning in Dynamic Environments	Xue Yang et.al.	2512.18670	translate	read	null
2025-12-21	Offline Reinforcement Learning for End-to-End Autonomous Driving	Chihiro Noguchi et.al.	2512.18662	translate	read	null
2025-12-21	LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction	Jensen Zhang et.al.	2512.18623	translate	read	null
2025-12-21	A Multi-agent Text2SQL Framework using Small Language Models and Execution Feedback	Thanh Dat Hoang et.al.	2512.18622	translate	read	null
2025-12-21	Trajectory Planning for UAV-Based Smart Farming Using Imitation-Based Triple Deep Q-Learning	Wencan Mao et.al.	2512.18604	translate	read	null
2025-12-21	SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models	Pengcheng Li et.al.	2512.18583	translate	read	null
2025-12-21	ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning	Weijie Zhou et.al.	2512.18571	translate	read	null
2025-12-21	Vox Deorum: A Hybrid LLM Architecture for 4X / Grand Strategy Game AI – Lessons from Civilization V	John Chen et.al.	2512.18564	translate	read	null
2025-12-21	Distributionally Robust Multi-Agent Reinforcement Learning for Intelligent Traffic Control	Shuwei Pei et.al.	2512.18558	translate	read	null
2025-12-21	Toward Training Superintelligent Software Agents through Self-Play SWE-RL	Yuxiang Wei et.al.	2512.18552	translate	read	null
2025-12-20	Scaling up Stability: Reinforcement Learning for Distributed Control of Networked Systems in the Space of Stabilizing Policies	John Cao et.al.	2512.18540	translate	read	null
2025-12-20	When Robots Say No: The Empathic Ethical Disobedience Benchmark	Dmytro Kuzmenko et.al.	2512.18474	translate	read	null
2025-12-20	On the Universality of Transformer Architectures; How Much Attention Is Enough?	Amirreza Abbasi et.al.	2512.18445	translate	read	null
2025-12-20	Learning Semantic Atomic Skills for Multi-Task Robotic Manipulation	Yihang Zhu et.al.	2512.18368	translate	read	null
2025-12-20	Dynamic Entropy Tuning in Reinforcement Learning Low-Level Quadcopter Control: Stochasticity vs Determinism	Youssef Mahran et.al.	2512.18336	translate	read	null
2025-12-20	Reinforcement Learning Position Control of a Quadrotor Using Soft Actor-Critic (SAC)	Youssef Mahran et.al.	2512.18333	translate	read	null
2025-12-20	Trustworthy and Explainable Deep Reinforcement Learning for Safe and Energy-Efficient Process Control: A Use Case in Industrial Compressed Air Systems	Vincent Bezold et.al.	2512.18317	translate	read	null
2025-12-20	Monitoring Monitorability	Melody Y. Guan et.al.	2512.18311	translate	read	null
2025-12-20	Embedded Safety-Aligned Intelligence via Differentiable Internal Alignment Embeddings	Harsh Rathva et.al.	2512.18309	translate	read	null
2025-12-20	Stable and Efficient Single-Rollout RL for Multimodal Reasoning	Rui Liu et.al.	2512.18215	translate	read	null
2025-12-20	Sophia: A Persistent Agent Framework of Artificial Life	Mingyang Sun et.al.	2512.18202	translate	read	null
2025-12-20	NL2CA: Auto-formalizing Cognitive Decision-Making from Natural Language Using an Unsupervised CriticNL2LTL Framework	Zihao Deng et.al.	2512.18189	translate	read	null
2025-12-20	On Swarm Leader Identification using Probing Policies	Stergios E. Bachoumas et.al.	2512.18146	translate	read	null
2025-12-19	Unifying Causal Reinforcement Learning: Survey, Taxonomy, Algorithms and Applications	Cristiano da Costa Cunha et.al.	2512.18135	translate	read	null
2025-12-19	Towards Autonomous Navigation in Endovascular Interventions	Tudor Jianu et.al.	2512.18081	translate	read	null
2025-12-19	SurgiPose: Estimating Surgical Tool Kinematics from Monocular Video for Surgical Robot Learning	Juo-Tung Chen et.al.	2512.18068	translate	read	null
2025-12-19	ReGal: A First Look at PPO-based Legal AI for Judgment Prediction and Summarization in India	Shubham Kumar Nigam et.al.	2512.18014	translate	read	null
2025-12-19	Adaptive Agents in Spatial Double-Auction Markets: Modeling the Emergence of Industrial Symbiosis	Matthieu Mastio et.al.	2512.17979	translate	read	null
2025-12-19	Distributionally Robust Imitation Learning: Layered Control Architecture for Certifiable Autonomy	Aditya Gahlawat et.al.	2512.17899	translate	read	null
2025-12-19	AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning	Ran Gong et.al.	2512.17853	translate	read	null
2025-12-19	Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes	Carlos Vélez García et.al.	2512.17846	translate	read	null
2025-12-19	NeuRehab: A Reinforcement Learning and Spiking Neural Network-Based Rehab Automation Framework	Phani Pavan Kambhampati et.al.	2512.17841	translate	read	null
2025-12-19	About Time: Model-free Reinforcement Learning with Timed Reward Machines	Anirban Majumdar et.al.	2512.17637	translate	read	null
2025-12-19	Trust-Region Adaptive Policy Optimization	Mingyu Su et.al.	2512.17636	translate	read	null
2025-12-19	SCOPE: Sequential Causal Optimization of Process Interventions	Jakob De Moor et.al.	2512.17629	translate	read	null
2025-12-19	Learning Safe Autonomous Driving Policies Using Predictive Safety Representations	Mahesh Keswani et.al.	2512.17586	translate	read	null
2025-12-19	Kinematics-Aware Diffusion Policy with Consistent 3D Observation and Action Space for Whole-Arm Robotic Manipulation	Kangchen Lv et.al.	2512.17568	translate	read	null
2025-12-19	HydroGym: A Reinforcement Learning Platform for Fluid Dynamics	Christian Lagemann et.al.	2512.17534	translate	read	null
2025-12-19	Assessing Long-Term Electricity Market Design for Ambitious Decarbonization Targets using Multi-Agent Reinforcement Learning	Javier Gonzalez-Ruiz et.al.	2512.17444	translate	read	null
2025-12-19	Xiaomi MiMo-VL-Miloco Technical Report	Jiaze Li et.al.	2512.17436	translate	read	null
2025-12-19	TakeAD: Preference-based Post-optimization for End-to-end Autonomous Driving with Expert Takeover Data	Deqing Liu et.al.	2512.17370	translate	read	null
2025-12-19	Neuro-Symbolic Control with Large Language Models for Language-Guided Spatial Tasks	Momina Liaqat Ali et.al.	2512.17321	translate	read	null
2025-12-19	Large Language Models as Pokémon Battle Agents: Strategic Play and Content Generation	Daksh Jain et.al.	2512.17308	translate	read	null
2025-12-19	Understanding Generalization in Role-Playing Models via Information Theory	Yongqi Li et.al.	2512.17270	translate	read	null
2025-12-19	A Theoretical Analysis of State Similarity Between Markov Decision Processes	Zhenyu Tao et.al.	2512.17265	translate	read	null
2025-12-19	Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience	Jiangjie Chen et.al.	2512.17260	translate	read	null
2025-12-19	Cooperative Energy Scheduling of Multi-Microgrids Based on Risk-Sensitive Reinforcement Learning	Rongxiang Zhang et.al.	2512.17246	translate	read	null
2025-12-19	Learning When to Look: A Disentangled Curriculum for Strategic Perception in Multimodal Reasoning	Siqi Yang et.al.	2512.17227	translate	read	null
2025-12-19	CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency	Xiao Liang et.al.	2512.17213	translate	read	null
2025-12-19	Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs	Rujiao Long et.al.	2512.17206	translate	read	null
2025-12-19	MMRAG-RFT: Two-stage Reinforcement Fine-tuning for Explainable Multi-modal Retrieval-augmented Generation	Shengwei Zhao et.al.	2512.17194	translate	read	null
2025-12-19	MAPPO-LCR: Multi-Agent Proximal Policy Optimization with Local Cooperation Reward in Spatial Public Goods Games	Zhaoqilin Yang et.al.	2512.17187	translate	read	null
2025-12-19	Semantic Co-Speech Gesture Synthesis and Real-Time Control for Humanoid Robots	Gang Zhang et.al.	2512.17183	translate	read	null
2025-12-19	Conservative Bias in Multi-Teacher Learning: Why Agents Prefer Low-Reward Advisors	Maher Mesto et.al.	2512.17180	translate	read	null
2025-12-19	Enhancing AIGC Service Efficiency with Adaptive Multi-Edge Collaboration in A Distributed System	Changfu Xu et.al.	2512.17158	translate	read	null
2025-12-19	Towards Senior-Robot Interaction: Reactive Robot Dog Gestures	Chunyang Meng et.al.	2512.17136	translate	read	null
2025-12-19	Deep Reinforcement Learning-Aided Strategies for Big Data Offloading in Vehicular Networks	Talha Akyildiz et.al.	2512.17133	translate	read	null
2025-12-18	Reinforcement Learning for Self-Improving Agent with Skill Library	Jiongxiao Wang et.al.	2512.17102	translate	read	null
2025-12-18	Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making	Toshiaki Hori et.al.	2512.17091	translate	read	null
2025-12-18	Value Under Ignorance in Universal Artificial Intelligence	Cole Wyeth et.al.	2512.17086	translate	read	null
2025-12-18	UniRel-R1: RL-tuned LLM Reasoning for Knowledge Graph Relational Question Answering	Yinxu Tang et.al.	2512.17043	translate	read	null
2025-12-18	GB-DQN: Gradient Boosted DQN Models for Non-stationary Reinforcement Learning	Chang-Hwan Lee et.al.	2512.17034	translate	read	null
2025-12-18	Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification	Qihao Liu et.al.	2512.16921	translate	read	null
2025-12-18	AdaTooler-V: Adaptive Tool-Use for Images and Videos	Chaoyang Wang et.al.	2512.16918	translate	read	null
2025-12-18	Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning	Qihao Liu et.al.	2512.16917	translate	read	null
2025-12-18	Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward	Peter Chen et.al.	2512.16912	translate	read	null
2025-12-18	Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning	Andrew Wagenmaker et.al.	2512.16911	translate	read	null
2025-12-18	MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning	Yuanchen Ju et.al.	2512.16909	translate	read	null
2025-12-18	AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning	Tzu-Han Lin et.al.	2512.16883	translate	read	null
2025-12-18	A survey of the orienteering problem: model evolution, algorithmic advances, and future directions	Songhao Shen et.al.	2512.16865	translate	read	null
2025-12-18	RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing	Tianyuan Qu et.al.	2512.16864	translate	read	null
2025-12-18	ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning	Zihan Zhou et.al.	2512.16861	translate	read	null
2025-12-18	Meta-RL Induces Exploration in Language Agents	Yulun Jiang et.al.	2512.16848	translate	read	null
2025-12-18	Coordinated Anti-Jamming Resilience in Swarm Networks via Multi-Agent Reinforcement Learning	Bahman Abolhassani et.al.	2512.16813	translate	read	null
2025-12-18	Olaf: Bringing an Animated Character to Life in the Physical World	David Müller et.al.	2512.16705	translate	read	null
2025-12-18	JustRL: Scaling a 1.5B LLM with a Simple RL Recipe	Bingxiang He et.al.	2512.16649	translate	read	null
2025-12-18	Implementing a Sharia Chatbot as a Consultation Medium for Questions About Islam	Wisnu Uriawan et.al.	2512.16644	translate	read	null
2025-12-18	Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game	Barna Pásztor et.al.	2512.16626	translate	read	null
2025-12-18	Non-Asymptotic Global Convergence of PPO-Clip	Yin Liu et.al.	2512.16565	translate	read	null
2025-12-18	ParamExplorer: A framework for exploring parameters in generative art	Julien Gachadoat et.al.	2512.16529	translate	read	null
2025-12-18	Guiding Perception-Reasoning Closer to Human in Blind Image Quality Assessment	Yuan Li et.al.	2512.16484	translate	read	null
2025-12-18	E-SDS: Environment-aware See it, Do it, Sorted - Automated Environment-Aware Reinforcement Learning for Humanoid Locomotion	Enis Yalcin et.al.	2512.16446	translate	read	null
2025-12-18	StarCraft+: Benchmarking Multi-agent Algorithms in Adversary Paradigm	Yadong Li et.al.	2512.16444	translate	read	null
2025-12-18	NDRL: Cotton Irrigation and Nitrogen Application with Nested Dual-Agent Reinforcement Learning	Ruifeng Xu et.al.	2512.16408	translate	read	null
2025-12-18	Hypernetworks That Evolve Themselves	Joachim Winther Pedersen et.al.	2512.16406	translate	read	null
2025-12-18	Machine Learning-based Optimal Control for Colloidal Self-Assembly	Andres Lizano-Villalobos et.al.	2512.16402	translate	read	null
2025-12-18	ManiLong-Shot: Interaction-Aware One-Shot Imitation Learning for Long-Horizon Manipulation	Zixuan Chen et.al.	2512.16302	translate	read	null
2025-12-18	Simultaneous Secrecy and Covert Communications (SSACC) in Mobility-Aware RIS-Aided Networks	Yanyu Cheng et.al.	2512.16224	translate	read	null
2025-12-18	Visual Alignment of Medical Vision-Language Models for Grounded Radiology Report Generation	Sarosij Bose et.al.	2512.16201	translate	read	null
2025-12-18	MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation	Pengyu Wang et.al.	2512.16145	translate	read	null
2025-12-18	INTELLECT-3: Technical Report	Prime Intellect Team et.al.	2512.16144	translate	read	null
2025-12-17	Techno-economic optimization of a heat-pipe microreactor, part I: theory and cost optimization	Paul Seurin et.al.	2512.16032	translate	read	null
2025-12-17	Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models	Caner Erden et.al.	2512.15973	translate	read	null
2025-12-17	Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning	Polaris Jhandi et.al.	2512.15943	translate	read	null
2025-12-17	DSO: Direct Steering Optimization for Bias Mitigation	Lucas Monteiro Paes et.al.	2512.15926	translate	read	null
2025-12-15	Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT)	Akhil Sharma et.al.	2512.15790	translate	read	null
2025-12-17	Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning	Zhenwen Liang et.al.	2512.15687	translate	read	null
2025-12-17	Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning	Jiaqi Xu et.al.	2512.15662	translate	read	null
2025-12-17	Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction	Mathieu Blondel et.al.	2512.15605	translate	read	null
2025-12-17	Deep Reinforcement Learning for EH-Enabled Cognitive-IoT Under Jamming Attacks	Nadia Abdolkhani et.al.	2512.15558	translate	read	null
2025-12-17	Autonomous Pressure Control in MuVacAS via Deep Reinforcement Learning and Deep Learning Surrogate Models	Guillermo Rodriguez-Llorente et.al.	2512.15521	translate	read	null
2025-12-17	Double Horizon Model-Based Policy Optimization	Akihiro Kubo et.al.	2512.15439	translate	read	null
2025-12-17	FM-EAC: Feature Model-based Enhanced Actor-Critic for Multi-Task Control in Dynamic Environments	Quanxi Zhou et.al.	2512.15430	translate	read	null
2025-12-17	Can AI Generate more Comprehensive Test Scenarios? Review on Automated Driving Systems Test Scenario Generation Methods	Ji Zhou et.al.	2512.15422	translate	read	null
2025-12-17	EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning	Jianfei Ma et.al.	2512.15405	translate	read	null
2025-12-17	Graph Contextual Reinforcement Learning for Efficient Directed Controller Synthesis	Toshihide Ubukata et.al.	2512.15295	translate	read	null
2025-12-17	Learning-Based Phase Shift Optimization of Liquid Crystal RIS in Dynamic mmWave Networks	Le Hao et.al.	2512.15279	translate	read	null
2025-12-17	Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning	Yiliu Sun et.al.	2512.15274	translate	read	null
2025-12-17	EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence	Jiaxu Wan et.al.	2512.15160	translate	read	null
2025-12-17	Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning	Weiqin Wang et.al.	2512.15146	translate	read	null
2025-12-17	Automatic Reward Shaping from Multi-Objective Human Heuristics	Yuqing Xie et.al.	2512.15120	translate	read	null
2025-12-17	QoS-Aware Hierarchical Reinforcement Learning for Joint Link Selection and Trajectory Optimization in SAGIN-Supported UAV Mobility Management	Jiayang Wan et.al.	2512.15119	translate	read	null
2025-12-17	Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models	Jinwu Hu et.al.	2512.15089	translate	read	null
2025-12-17	Deep Reinforcement Learning for Joint Time and Power Management in SWIPT-EH CIoT	Nadia Abdolkhani et.al.	2512.15062	translate	read	null
2025-12-17	Spectral Representation-based Reinforcement Learning	Chenxiao Gao et.al.	2512.15036	translate	read	null
2025-12-17	ISS Policy: Scalable Diffusion Policy with Implicit Scene Supervision	Wenlong Xia et.al.	2512.15020	translate	read	null
2025-12-17	Multi-Objective Bayesian Optimization of Deep Reinforcement Learning for Environmental, Social, and Governance (ESG) Financial Portfolio Management	E. C. Garrido-Merchán et.al.	2512.14992	translate	read	null
2025-12-17	Adaptive Partitioning and Learning for Stochastic Control of Diffusion Processes	Hanqing Jin et.al.	2512.14991	translate	read	null
2025-12-16	Puzzle Curriculum GRPO for Vision-Centric Reasoning	Ahmadreza Jeddi et.al.	2512.14944	translate	read	null
2025-12-16	Imitation Learning for Multi-turn LM Agents via On-policy Expert Corrections	Niklas Lauffer et.al.	2512.14895	translate	read	null
2025-12-16	Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse	Jingwei Chen et.al.	2512.14879	translate	read	null
2025-12-16	TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs	Jun Zhang et.al.	2512.14698	translate	read	link
2025-12-16	CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives	Zihan Wang et.al.	2512.14696	translate	read	link
2025-12-16	Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes	Alessandro Trapasso et.al.	2512.14617	translate	read	null
2025-12-16	RecGPT-V2 Technical Report	Chao Yi et.al.	2512.14503	translate	read	null
2025-12-16	Hybrid Cognitive IoT with Cooperative Caching and SWIPT-EH: A Hierarchical Reinforcement Learning Framework	Nadia Abdolkhani et.al.	2512.14488	translate	read	null
2025-12-16	Context-Picker: Dynamic context selection using multi-stage reinforcement learning	Siyuan Zhu et.al.	2512.14465	translate	read	null
2025-12-16	A data-physics hybrid generative model for patient-specific post-stroke motor rehabilitation using wearable sensor data	Yanning Dai et.al.	2512.14329	translate	read	null
2025-12-16	Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations	Xudong Han et.al.	2512.14321	translate	read	null
2025-12-16	A Threshold-Triggered Deep Q-Network-Based Framework for Self-Healing in Autonomic Software-Defined IIoT-Edge Networks	Agrippina Mwangi et.al.	2512.14297	translate	read	null
2025-12-16	GLM-TTS Technical Report	Jiayan Cui et.al.	2512.14291	translate	read	link
2025-12-16	Understanding and Improving Hyperbolic Deep Reinforcement Learning	Timo Klein et.al.	2512.14202	translate	read	link
2025-12-16	Incentivizing Tool-augmented Thinking with Images for Medical Image Analysis	Yankai Jiang et.al.	2512.14157	translate	read	null
2025-12-16	A First-Order Logic-Based Alternative to Reward Models in RLHF	Chunjin Jian et.al.	2512.14100	translate	read	null
2025-12-16	RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees	Junjie Ma et.al.	2512.14069	translate	read	null
2025-12-16	Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning	Amir M. Soufi Enayati et.al.	2512.14057	translate	read	null
2025-12-16	OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving	Zhenguo Zhang et.al.	2512.14044	translate	read	null
2025-12-16	Sample-Efficient Robot Skill Learning for Construction Tasks: Benchmarking Hierarchical Reinforcement Learning and Vision-Language-Action VLA Model	Zhaofeng Hu et.al.	2512.14031	translate	read	null
2025-12-16	Cooperative Caching Towards Efficient Spectrum Utilization in Cognitive-IoT Networks	Nadia Abdolkhani et.al.	2512.14029	translate	read	null
2025-12-16	Hierarchical Deep Reinforcement Learning for Robust Access in Cognitive IoT Networks under Smart Jamming Attacks	Nadia Abdolkhani et.al.	2512.14013	translate	read	null
2025-12-15	Adaptive digital twins for predictive decision-making: Online Bayesian learning of transition dynamics	Eugenio Varetti et.al.	2512.13919	translate	read	null
2025-12-15	Group-Theoretic Reinforcement Learning of Dynamical Decoupling Sequences	Charles Marrder et.al.	2512.13890	translate	read	null
2025-12-15	SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning	Jitesh Jain et.al.	2512.13874	translate	read	link
2025-12-15	Explainable reinforcement learning from human feedback to improve alignment	Shicheng Liu et.al.	2512.13837	translate	read	null
2025-12-13	RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing	Yuhan Tang et.al.	2512.13727	translate	read	null
2025-12-13	Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce	Sayak Chakrabarty et.al.	2512.13726	translate	read	null
2025-12-15	AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection	Junwen Miao et.al.	2512.13671	translate	read	null
2025-12-15	A Scientific Reasoning Model for Organic Synthesis Procedure Generation	Guoqing Liu et.al.	2512.13668	translate	read	null
2025-12-15	Advancing Machine Learning Optimization of Chiral Photonic Metasurface: Comparative Study of Neural Network and Genetic Algorithm Approaches	Davide Filippozzi et.al.	2512.13656	translate	read	null
2025-12-15	MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning	Haoyu Fu et.al.	2512.13636	translate	read	null
2025-12-15	SCR2-ST: Combine Single Cell with Spatial Transcriptomics for Efficient Active Sampling via Reinforcement Learning	Junchao Zhu et.al.	2512.13635	translate	read	null
2025-12-15	Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models	Boxin Wang et.al.	2512.13607	translate	read	null
2025-12-15	Image Diffusion Preview with Consistency Solver	Fu-Yun Wang et.al.	2512.13592	translate	read	link
2025-12-15	MMhops-R1: Multimodal Multi-hop Reasoning	Tao Zhang et.al.	2512.13573	translate	read	null
2025-12-15	Memory in the Age of AI Agents	Yuyang Hu et.al.	2512.13564	translate	read	link
2025-12-15	How Low Can You Go? The Data-Light SE Challenge	Kishan Kumar Ganguly et.al.	2512.13524	translate	read	null
2025-12-15	Reinforcement Learning based 6-DoF Maneuvers for Microgravity Intravehicular Docking: A Simulation Study with Int-Ball2 in ISS-JEM	Aman Arora et.al.	2512.13514	translate	read	null
2025-12-15	MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph	Linjie Mu et.al.	2512.13510	translate	read	null
2025-12-15	Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model	Heyi Chen et.al.	2512.13507	translate	read	null
2025-12-15	Differentiable Evolutionary Reinforcement Learning	Sitao Cheng et.al.	2512.13399	translate	read	null
2025-12-15	QoS-Aware State-Augmented Learnable Framework for 5G NR-U/Wi-Fi Coexistence: Impact of Parameter Selection and Enhanced Collision Resolution	Mohammad Reza Fasihi et.al.	2512.13393	translate	read	null
2025-12-15	Universal Dexterous Functional Grasping via Demonstration-Editing Reinforcement Learning	Chuan Mao et.al.	2512.13380	translate	read	null
2025-12-15	Fast Policy Learning for 6-DOF Position Control of Underwater Vehicles	Sümer Tunçay et.al.	2512.13359	translate	read	null
2025-12-15	Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3)	Zeyad Gamal et.al.	2512.13356	translate	read	null
2025-12-15	Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration	Hao Fu et.al.	2512.13293	translate	read	null
2025-12-15	AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning	Jiaru Zou et.al.	2512.13278	translate	read	null
2025-12-15	SPARS: A Reinforcement Learning-Enabled Simulator for Power Management in HPC Job Scheduling	Muhammad Alfian Amrizal et.al.	2512.13268	translate	read	null
2025-12-15	Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving	Hyunki Seong et.al.	2512.13262	translate	read	null
2025-12-15	Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection	Zihui Zhao et.al.	2512.13240	translate	read	null
2025-12-15	SACn: Soft Actor-Critic with n-step Returns	Jakub Łyskawa et.al.	2512.13165	translate	read	null
2025-12-15	SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning	Emre Can Acikgoz et.al.	2512.13159	translate	read	null
2025-12-15	TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning	Shenzhi Yang et.al.	2512.13106	translate	read	null
2025-12-15	Toward Self-Healing Networks-on-Chip: RL-Driven Routing in 2D Torus Architectures	Mohammad Walid Charrwi et.al.	2512.13096	translate	read	null
2025-12-15	ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning	Feng Zhang et.al.	2512.13095	translate	read	null
2025-12-15	Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation	Xiang Li et.al.	2512.13094	translate	read	null
2025-12-15	PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations	Mingqi Yuan et.al.	2512.13093	translate	read	null
2025-12-15	M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization	Bizhe Bai et.al.	2512.13070	translate	read	null
2025-12-15	Deep Q-Learning-Based Intelligent Scheduling for ETL Optimization in Heterogeneous Data Environments	Kangning Gao et.al.	2512.13060	translate	read	null
2025-12-15	GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training	Tong Wei et.al.	2512.13043	translate	read	null
2025-12-15	What Happens Next? Next Scene Prediction with a Unified Video Model	Xinjie Li et.al.	2512.13015	translate	read	null
2025-12-15	Learning Terrain Aware Bipedal Locomotion via Reduced Dimensional Perceptual Representations	Guillermo A. Castillo et.al.	2512.12993	translate	read	null
2025-12-15	Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning	Amin Jalal Aghdasian et.al.	2512.12987	translate	read	null
2025-12-15	QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management	Weizhou Shen et.al.	2512.12967	translate	read	link
2025-12-15	Interpretable Hypothesis-Driven Trading: A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals	Gagan Deep et.al.	2512.12924	translate	read	null
2025-12-15	LLM-based Personalized Portfolio Recommender: Integrating Large Language Models and Reinforcement Learning for Intelligent Investment Strategy Optimization	Bangyu Li et.al.	2512.12922	translate	read	null
2025-12-15	Meta-GPT: Decoding the Metasurface Genome with Generative Artificial Intelligence	David Dang et.al.	2512.12888	translate	read	null
2025-12-14	Information-Consistent Language Model Recommendations through Group Relative Policy Optimization	Sonal Prabhune et.al.	2512.12858	translate	read	null
2025-12-14	MPC-Guided Safe Reinforcement Learning and Lipschitz-Based Filtering for Structured Nonlinear Systems	Patrick Kostelac et.al.	2512.12855	translate	read	null
2025-12-14	Distributed Reinforcement Learning using Local Smart Meter Data for Voltage Regulation in Distribution Networks	Dong Liu et.al.	2512.12803	translate	read	null
2025-12-14	CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning	Xuanzhang Liu et.al.	2512.12716	translate	read	null
2025-12-14	Self-Motivated Growing Neural Network for Adaptive Architecture via Local Structural Plasticity	Yiyang Jia et.al.	2512.12713	translate	read	null
2025-12-14	Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning	Enhong Mu et.al.	2512.12706	translate	read	null
2025-12-14	Reassessing the Role of Supervised Fine-Tuning: An Empirical Study in VLM Reasoning	Yongcan Yu et.al.	2512.12690	translate	read	null
2025-12-14	CogDoc: Towards Unified thinking in Documents	Qixin Xu et.al.	2512.12658	translate	read	null
2025-12-14	Coupled Variational Reinforcement Learning for Language Model General Reasoning	Xueru Wen et.al.	2512.12576	translate	read	null
2025-12-14	World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents	Yesid Fonseca et.al.	2512.12548	translate	read	null
2025-12-13	Adaptive Detector-Verifier Framework for Zero-Shot Polyp Detection in Open-World Settings	Shengkai Xu et.al.	2512.12492	translate	read	null
2025-12-13	More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models	Hoang Anh Just et.al.	2512.12487	translate	read	null
2025-12-13	HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments	Yongjun He et.al.	2512.12476	translate	read	null
2025-12-13	Sim2Real Reinforcement Learning for Soccer skills	Jonathan Spraggett et.al.	2512.12437	translate	read	link
2025-12-13	Deep Hedging with Reinforcement Learning: A Practical Framework for Option Risk Management	Travon Lucius et.al.	2512.12420	translate	read	null
2025-12-13	ElasticVR: Elastic Task Computing in Multi-User Multi-Connectivity Wireless Virtual Reality (VR) Systems	Babak Badnava et.al.	2512.12366	translate	read	null
2025-12-13	The Role of AI in Modern Penetration Testing	J. Alexander Curtis et.al.	2512.12326	translate	read	null
2025-12-13	A Conflict-Aware Resource Management Framework for the Computing Continuum	Vlad Popescu-Vifor et.al.	2512.12299	translate	read	null
2025-12-13	Moment and Highlight Detection via MLLM Frame Segmentation	I Putu Andika Bagas Jiwanta et.al.	2512.12246	translate	read	null
2025-12-13	Learning to Get Up Across Morphologies: Zero-Shot Recovery with a Unified Humanoid Policy	Jonathan Spraggett et.al.	2512.12230	translate	read	link
2025-12-12	Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning	Vittorio Giammarino et.al.	2512.12046	translate	read	null
2025-12-12	Policy Gradient Algorithms for Age-of-Information Cost Minimization	José-Ramón Vidal et.al.	2512.11990	translate	read	null
2025-12-12	Learning to Extract Context for Context-Aware LLM Inference	Minseon Kim et.al.	2512.11986	translate	read	null
2025-12-12	A Review of Learning-Based Motion Planning: Toward a Data-Driven Optimal Control Approach	Jia Hu et.al.	2512.11944	translate	read	null
2025-12-12	Evolutionary Reinforcement Learning based AI tutor for Socratic Interdisciplinary Instruction	Mei Jiang et.al.	2512.11930	translate	read	null
2025-12-12	AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis	Junjie Ye et.al.	2512.11797	translate	read	null
2025-12-12	Agile Flight Emerges from Multi-Agent Competitive Racing	Vineet Pasumarti et.al.	2512.11781	translate	read	null
2025-12-12	SUMFORU: An LLM-Based Review Summarization Framework for Personalized Purchase Decision Support	Yuming Feng et.al.	2512.11755	translate	read	null
2025-12-12	UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations	Tingyu Yuan et.al.	2512.11609	translate	read	null
2025-12-12	DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry	Zhenyang Cai et.al.	2512.11558	translate	read	null
2025-12-12	Rethinking Expert Trajectory Utilization in LLM Post-training	Bowen Ding et.al.	2512.11470	translate	read	link
2025-12-12	Three methods, one problem: Classical and AI approaches to no-three-in-line	Pranav Ramanathan et.al.	2512.11469	translate	read	null
2025-12-12	Towards Trustworthy Multi-Turn LLM Agents via Behavioral Guidance	Gonca Gürsun et.al.	2512.11421	translate	read	null
2025-12-12	Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization	Yifan Niu et.al.	2512.11391	translate	read	null
2025-12-12	Symmetry-Aware Steering of Equivariant Diffusion Policies: Benefits and Limits	Minwoo Park et.al.	2512.11345	translate	read	null
2025-12-12	DAPO: Design Structure-Aware Pass Ordering in High-Level Synthesis with Graph Contrastive and Reinforcement Learning	Jinming Ge et.al.	2512.11342	translate	read	null
2025-12-12	RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training	Tianyuan Wu et.al.	2512.11306	translate	read	null
2025-12-12	When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents	Mrinal Rawat et.al.	2512.11277	translate	read	null
2025-12-12	A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation	Hong Je-Gal et.al.	2512.11270	translate	read	null
2025-12-12	Multi-Objective Reinforcement Learning for Large-Scale Mixed Traffic Control	Iftekharul Islam et.al.	2512.11247	translate	read	null
2025-12-11	Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning	Wei Duan et.al.	2512.11179	translate	read	null
2025-12-11	Learning Category-level Last-meter Navigation from RGB Demonstrations of a Single-instance	Tzu-Hsien Lee et.al.	2512.11173	translate	read	null
2025-12-11	CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound	Akhil S Anand et.al.	2512.11169	translate	read	null
2025-12-11	Benchmarking RL-Enhanced Spatial Indices Against Traditional, Advanced, and Learned Counterparts	Guanli Liu et.al.	2512.11161	translate	read	null
2025-12-11	In-Context Multi-Objective Optimization	Xinyu Zhang et.al.	2512.11114	translate	read	null
2025-12-11	Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation	Yiwen Tang et.al.	2512.10949	translate	read	link
2025-12-11	Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit	Zamirddine Mari et.al.	2512.10934	translate	read	null
2025-12-11	Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation	Zamirddine Mari et.al.	2512.10925	translate	read	null
2025-12-11	Reinforcement Learning in Financial Decision Making: A Systematic Review of Performance, Challenges, and Implementation Strategies	Mohammad Rezoanul Hoque et.al.	2512.10913	translate	read	null
2025-12-11	Iterative Compositional Data Generation for Robot Control	Anh-Quan Pham et.al.	2512.10891	translate	read	null
2025-12-11	Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments	Atahan Cilan et.al.	2512.10835	translate	read	null
2025-12-11	OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification	Zijian Wu et.al.	2512.10756	translate	read	null
2025-12-11	Learning to Split: A Reinforcement-Learning-Guided Splitting Heuristic for Neural Network Verification	Maya Swisa et.al.	2512.10747	translate	read	null
2025-12-11	Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving	Songyang Gao et.al.	2512.10739	translate	read	null
2025-12-11	How to Brake? Ethical Emergency Braking with Deep Reinforcement Learning	Jianbo Wang et.al.	2512.10698	translate	read	null
2025-12-11	Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning	Benjamin Gundersen et.al.	2512.10691	translate	read	null
2025-12-11	AgriGPT-Omni: A Unified Speech-Vision-Text Framework for Multilingual Agricultural Intelligence	Bo Yang et.al.	2512.10624	translate	read	null
2025-12-11	Multi-Objective Reward and Preference Optimization: Theory and Algorithms	Akhil Agnihotri et.al.	2512.10601	translate	read	null
2025-12-11	Grounding Everything in Tokens for Multimodal Large Language Models	Xiangxuan Ren et.al.	2512.10554	translate	read	null
2025-12-11	Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning	Haiteng Zhao et.al.	2512.10534	translate	read	null
2025-12-11	Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning	Chihyeon Song et.al.	2512.10510	translate	read	null
2025-12-11	UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning	Jiaxi Wu et.al.	2512.10492	translate	read	null
2025-12-11	Shot and Architecture Adaptive Subspace Variational Quantum Eigensolver for Microwave Simulation	Zhixiu Han et.al.	2512.10458	translate	read	null
2025-12-11	HypeR Adaptivity: Joint $hr$-Adaptive Meshing via Hypergraph Multi-Agent Deep Reinforcement Learning	Niccolò Grillo et.al.	2512.10439	translate	read	null
2025-12-11	Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention	Yang Yu et.al.	2512.10414	translate	read	null
2025-12-11	A Privacy-Preserving Cloud Architecture for Distributed Machine Learning at Scale	Vinoth Punniyamoorthy et.al.	2512.10341	translate	read	null
2025-12-11	Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters	Shruti Dongare et.al.	2512.10271	translate	read	null
2025-12-11	Multi-dimensional Preference Alignment by Conditioning Reward Itself	Jiho Jang et.al.	2512.10237	translate	read	null
2025-12-11	Task-Oriented Grasping Using Reinforcement Learning with a Contextual Reward Machine	Hui Li et.al.	2512.10235	translate	read	null
2025-12-11	Latent Chain-of-Thought World Modeling for End-to-End Driving	Shuhan Tan et.al.	2512.10226	translate	read	null
2025-12-11	An exploration for higher efficiency in multi objective optimisation with reinforcement learning	Mehmet Emin Aydin et.al.	2512.10208	translate	read	null
2025-12-10	Explicit Control Barrier Function-based Safety Filters and their Resource-Aware Computation	Pol Mestres et.al.	2512.10118	translate	read	null
2025-12-10	Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation	Steven Caro et.al.	2512.10099	translate	read	null
2025-12-10	SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation	Jongmin Lee et.al.	2512.10042	translate	read	null
2025-12-10	Diffusion Is Your Friend in Show, Suggest and Tell	Jia Cheng Hu et.al.	2512.10038	translate	read	null
2025-12-10	Latent Action World Models for Control with Unlabeled Trajectories	Marvin Alles et.al.	2512.10016	translate	read	null
2025-12-10	TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0	Jinyu Chen et.al.	2512.09961	translate	read	null
2025-12-10	STACHE: Local Black-Box Explanations for Reinforcement Learning Policies	Andrew Elashkin et.al.	2512.09909	translate	read	null
2025-12-10	FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning	Khurram Khalil et.al.	2512.09872	translate	read	null
2025-12-10	Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation	Yuyang Li et.al.	2512.09851	translate	read	link
2025-12-10	ChronusOmni: Improving Time Awareness of Omni Large Language Models	Yijing Chen et.al.	2512.09841	translate	read	null
2025-12-10	RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning	Khurram Khalil et.al.	2512.09829	translate	read	null
2025-12-10	Prefrontal scaling of reward prediction error readout gates reinforcement-derived adaptive behavior in primates	Tian Sang et.al.	2512.09761	translate	read	null
2025-12-10	MOA: Multi-Objective Alignment for Role-Playing Agents	Chonghua Liao et.al.	2512.09756	translate	read	null
2025-12-10	Flexible Reconfigurable Intelligent Surface-Aided Covert Communications in UAV Networks	Chong Huang et.al.	2512.09714	translate	read	null
2025-12-10	Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning	Kaichen He et.al.	2512.09706	translate	read	null
2025-12-10	Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies	Mika Persson et.al.	2512.09682	translate	read	null
2025-12-10	d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models	Leyi Pan et.al.	2512.09675	translate	read	null
2025-12-10	SynthPix: A lightspeed PIV images generator	Antonio Terpin et.al.	2512.09664	translate	read	null
2025-12-10	Mastering Diverse, Unknown, and Cluttered Tracks for Robust Vision-Based Drone Racing	Feng Yu et.al.	2512.09571	translate	read	null
2025-12-10	Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search	Junkai Ji et.al.	2512.09566	translate	read	null
2025-12-10	REASAN: Learning Reactive Safe Navigation for Legged Robots	Qihao Yuan et.al.	2512.09537	translate	read	null
2025-12-10	RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning	Yucan Guo et.al.	2512.09487	translate	read	null
2025-12-10	Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation	Jialin Ying et.al.	2512.09410	translate	read	null
2025-12-10	CFLight: Enhancing Safety with Traffic Signal Control through Counterfactual Learning	Mingyuan Li et.al.	2512.09368	translate	read	null
2025-12-10	COVLM-RL: Critical Object-Oriented Reasoning for Autonomous Driving Using VLM-Guided Reinforcement Learning	Lin Li et.al.	2512.09349	translate	read	null
2025-12-10	Tyche: A Hybrid Computation Framework of Illumination Pattern for Satellite Beam Hopping	Ziheng Yang et.al.	2512.09312	translate	read	null
2025-12-10	One-Shot Real-World Demonstration Synthesis for Scalable Bimanual Manipulation	Huayi Zhou et.al.	2512.09297	translate	read	null
2025-12-10	Electric Arc Furnaces Scheduling under Electricity Price Volatility with Reinforcement Learning	Ruonan Pi et.al.	2512.09293	translate	read	null
2025-12-10	Exploratory Mean-Variance with Jumps: An Equilibrium Approach	Yuling Max Chen et.al.	2512.09224	translate	read	null
2025-12-09	Learning Unmasking Policies for Diffusion Language Models	Metod Jazbec et.al.	2512.09106	translate	read	null
2025-12-09	Masked Generative Policy for Robotic Control	Lipeng Zhuang et.al.	2512.09101	translate	read	null
2025-12-09	No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers	Damiano Marsili et.al.	2512.08889	translate	read	null
2025-12-09	IPPO Learns the Game, Not the Team: A Study on Generalization in Heterogeneous Agent Teams	Ryan LeRoy et.al.	2512.08877	translate	read	null
2025-12-09	Reinforcement Learning From State and Temporal Differences	Lex Weaver et.al.	2512.08855	translate	read	null
2025-12-09	Optimal navigation in two-dimensional regular and turbulent flows	Vladimir Parfenyev et.al.	2512.08766	translate	read	null
2025-12-09	Learning and Editing Universal Graph Prompt Tuning via Reinforcement Learning	Jinfeng Xu et.al.	2512.08763	translate	read	null
2025-12-09	Direct transfer of optimized controllers to similar systems using dimensionless MPC	Josip Kir Hromatko et.al.	2512.08667	translate	read	null
2025-12-09	Sim2Swim: Zero-Shot Velocity Control for Agile AUV Maneuvering in 3 Minutes	Lauritz Rismark Fosso et.al.	2512.08656	translate	read	null
2025-12-09	Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis	Orit Davidovich et.al.	2512.08601	translate	read	null
2025-12-09	Mind to Hand: Purposeful Robotic Control via Embodied Reasoning	Peijun Tang et.al.	2512.08580	translate	read	null
2025-12-09	Thinking with Images via Self-Calling Agent	Wenxi Yang et.al.	2512.08511	translate	read	link
2025-12-09	Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning	Junnan Qiu et.al.	2512.08485	translate	read	null
2025-12-09	Using reinforcement learning to probe the role of feedback in skill acquisition	Antonio Terpin et.al.	2512.08463	translate	read	null
2025-12-09	From Accuracy to Impact: The Impact-Driven AI Framework (IDAIF) for Aligning Engineering Architecture with Theory of Change	Yong-Woon Kim et.al.	2512.08449	translate	read	null
2025-12-09	Turning Threat into Opportunity: DRL-Powered Anti-Jamming via Energy Harvesting in UAV-Disrupted Channels	Ngoc-Tan Nguyen et.al.	2512.08351	translate	read	null
2025-12-09	Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks	Thai Duong Nguyen et.al.	2512.08341	translate	read	null
2025-12-09	Collaborative Intelligence for UAV-Satellite Network Slicing: Towards a Joint QoS-Energy-Fairness MADRL Optimization	Thanh-Dao Nguyen et.al.	2512.08322	translate	read	null
2025-12-09	rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection	Sijia Chen et.al.	2512.08300	translate	read	null
2025-12-09	Empowerment Gain and Causal Model Construction: Children and adults are sensitive to controllability and variability in their causal interventions	Eunice Yiu et.al.	2512.08230	translate	read	null
2025-12-09	Primal-dual policy learning for mean-field stochastic LQR problem	Xiushan Jiang et.al.	2512.08205	translate	read	null
2025-12-09	TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models	Zheng Ding et.al.	2512.08153	translate	read	null
2025-12-09	Robust Agents in Open-Ended Worlds	Mikayel Samvelyan et.al.	2512.08139	translate	read	null
2025-12-09	Universal Adversarial Suffixes for Language Models Using Reinforcement Learning with Calibrated Reward	Sampriti Soor et.al.	2512.08131	translate	read	null
2025-12-08	Scalable Offline Model-Based RL with Action Chunks	Kwanyoung Park et.al.	2512.08108	translate	read	null
2025-12-08	Training LLMs for Honesty via Confessions	Manas Joglekar et.al.	2512.08093	translate	read	null
2025-12-08	An Introduction to Deep Reinforcement and Imitation Learning	Pedro Santana et.al.	2512.08052	translate	read	null
2025-12-08	F2: Offline Reinforcement Learning for Hamiltonian Simulation via Free-Fermionic Subroutine Compilation	Ethan Decker et.al.	2512.08023	translate	read	null
2025-12-08	Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care	Aryaman Bansal et.al.	2512.08012	translate	read	null
2025-12-08	VLD: Visual Language Goal Distance for Reinforcement Learning Navigation	Lazar Milikic et.al.	2512.07976	translate	read	null
2025-12-08	Agentic Artificial Intelligence for Ethical Cybersecurity in Uganda: A Reinforcement Learning Framework for Threat Detection in Resource-Constrained Environments	Ibrahim Adabara et.al.	2512.07909	translate	read	null
2025-12-08	An Adaptive Multi-Layered Honeynet Architecture for Threat Behavior Analysis via Deep Learning	Lukas Johannes Möller et.al.	2512.07827	translate	read	null
2025-12-08	On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models	Charlie Zhang et.al.	2512.07783	translate	read	null
2025-12-08	RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models	Xiqiao Xiong et.al.	2512.07761	translate	read	null
2025-12-08	DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving	Jialv Zou et.al.	2512.07745	translate	read	null
2025-12-08	SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery	Meng Cao et.al.	2512.07733	translate	read	null
2025-12-08	Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE	Anxiang Zeng et.al.	2512.07710	translate	read	null
2025-12-08	Delay-Aware Diffusion Policy: Bridging the Observation-Execution Gap in Dynamic Tasks	Aileen Liao et.al.	2512.07697	translate	read	null
2025-12-08	The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds	Shahar Lutati et.al.	2512.07631	translate	read	null
2025-12-08	Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement	Yongsheng Lian et.al.	2512.07611	translate	read	null
2025-12-08	Understanding Individual Decision-Making in Multi-Agent Reinforcement Learning: A Dynamical Systems Approach	James Rudd-Jones et.al.	2512.07588	translate	read	null
2025-12-08	ReLaX: Reasoning with Latent Exploration for Large Reasoning Models	Shimin Zhang et.al.	2512.07558	translate	read	null
2025-12-08	Model-Based Reinforcement Learning Under Confounding	Nishanth Venkatesh et.al.	2512.07528	translate	read	null
2025-12-08	How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations	JV Roig et.al.	2512.07497	translate	read	null
2025-12-08	Enhancing Agentic RL with Progressive Reward Shaping and Value-based Sampling Policy Optimization	Zhuoran Zhuang et.al.	2512.07478	translate	read	null
2025-12-08	Gait-Adaptive Perceptive Humanoid Locomotion with Real-Time Under-Base Terrain Reconstruction	Haolin Song et.al.	2512.07464	translate	read	null
2025-12-08	Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning	Tong Wu et.al.	2512.07461	translate	read	null
2025-12-08	From Show Programmes to Data: Designing a Workflow to Make Performing Arts Ephemera Accessible Through Language Models	Clarisse Bardiot et.al.	2512.07452	translate	read	null
2025-12-08	KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models	Chenwei Shi et.al.	2512.07437	translate	read	null
2025-12-08	Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models	Haidong Kang et.al.	2512.07419	translate	read	null
2025-12-08	Adaptive Tuning of Parameterized Traffic Controllers via Multi-Agent Reinforcement Learning	Giray Önür et.al.	2512.07417	translate	read	null
2025-12-08	Training Language Models to Use Prolog as a Tool	Niklas Mellgren et.al.	2512.07407	translate	read	null
2025-12-08	Control and Reinforcement Learning through the Lens of Optimization: An Algorithmic Perspective	Tolga Ok et.al.	2512.07377	translate	read	null
2025-12-08	ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning	Byungju Kim et.al.	2512.07371	translate	read	null
2025-12-08	Multi-Rigid-Body Approximation of Human Hands with Application to Digital Twin	Bin Zhao et.al.	2512.07359	translate	read	null
2025-12-08	PrivORL: Differentially Private Synthetic Dataset for Offline Reinforcement Learning	Chen Gong et.al.	2512.07342	translate	read	null
2025-12-08	RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language Translation	Zhi Rao et.al.	2512.07273	translate	read	null
2025-12-08	SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks	Florian Tretter et.al.	2512.07266	translate	read	null
2025-12-08	Benchmarking Humanoid Imitation Learning with Motion Difficulty	Zhaorui Meng et.al.	2512.07248	translate	read	null
2025-12-08	Towards Robust Protective Perturbation against DeepFake Face Swapping	Hengyang Yao et.al.	2512.07228	translate	read	null
2025-12-08	Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation	Zhaoyang Liu et.al.	2512.07212	translate	read	null
2025-12-08	MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning	Xuhui Zheng et.al.	2512.07203	translate	read	null
2025-12-08	Less is More: Non-uniform Road Segments are Efficient for Bus Arrival Prediction	Zhen Huang et.al.	2512.07200	translate	read	null
2025-12-08	Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models	Fenghua Weng et.al.	2512.07141	translate	read	null
2025-12-08	TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning	Zebin Xing et.al.	2512.07135	translate	read	null
2025-12-08	Surrogate compliance modeling enables reinforcement learned locomotion gaits for soft robots	Jue Wang et.al.	2512.07114	translate	read	null
2025-12-07	A Hetero-Associative Sequential Memory Model Utilizing Neuromorphic Signals: Validated on a Mobile Manipulator	Runcong Wang et.al.	2512.07032	translate	read	null
2025-12-07	Utilizing Multi-Agent Reinforcement Learning with Encoder-Decoder Architecture Agents to Identify Optimal Resection Location in Glioblastoma Multiforme Patients	Krishna Arun et.al.	2512.06990	translate	read	null
2025-12-07	LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding	Yu Yu et.al.	2512.06982	translate	read	null
2025-12-07	Neuro-Vesicles: Neuromodulation Should Be a Dynamical System, Not a Tensor Decoration	Zilin Li et.al.	2512.06966	translate	read	null
2025-12-07	Statistical analysis of Inverse Entropy-regularized Reinforcement Learning	Denis Belomestny et.al.	2512.06956	translate	read	null
2025-12-07	Deep Reinforcement Learning for Phishing Detection with Transformer-Based Semantic Features	Aseer Al Faisal et.al.	2512.06925	translate	read	null
2025-12-07	Parent-Guided Semantic Reward Model (PGSRM): Embedding-Based Reward Functions for Reinforcement Learning of Transformer Language Models	Alexandr Plashchinsky et.al.	2512.06920	translate	read	null
2025-12-07	Know your Trajectory – Trustworthy Reinforcement Learning deployment through Importance-Based Trajectory Analysis	Clifford F et.al.	2512.06917	translate	read	null
2025-12-07	Khalasi: Energy-Efficient Navigation for Surface Vehicles in Vortical Flow Fields	Rushiraj Gadhvi et.al.	2512.06912	translate	read	null
2025-12-07	An Analysis of Large Language Models for Simulating User Responses in Surveys	Ziyun Yu et.al.	2512.06874	translate	read	null
2025-12-07	JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models	Ce Chi et.al.	2512.06859	translate	read	null
2025-12-07	Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning	Tingyu Li et.al.	2512.06835	translate	read	null
2025-12-07	MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning	Yueqian Wang et.al.	2512.06810	translate	read	null
2025-12-07	PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance	Jifar Wakuma Ayana et.al.	2512.06747	translate	read	null
2025-12-07	The Role of Entropy in Visual Grounding: Analysis and Optimization	Shuo Li et.al.	2512.06726	translate	read	null
2025-12-07	RunawayEvil: Jailbreaking the Image-to-Video Generative Models	Songping Wang et.al.	2512.06674	translate	read	null
2025-12-07	LightSearcher: Efficient DeepSearch via Experiential Memory	Hengzhi Lan et.al.	2512.06653	translate	read	null
2025-12-07	Analyzing Collision Rates in Large-Scale Mixed Traffic Control via Multi-Agent Reinforcement Learning	Muyang Fan et.al.	2512.06645	translate	read	null
2025-12-07	Learning to Hedge Swaptions	Zaniar Ahmadi et.al.	2512.06639	translate	read	null
2025-12-07	MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment	Ruicheng Zhang et.al.	2512.06628	translate	read	null
2025-12-07	A New Trajectory-Oriented Approach to Enhancing Comprehensive Crowd Navigation Performance	Xinyu Zhou et.al.	2512.06608	translate	read	null
2025-12-06	MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding	Yuhao Su et.al.	2512.06581	translate	read	null
2025-12-06	Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input	Zifan Xu et.al.	2512.06571	translate	read	null
2025-12-06	A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation	Xiaocan Li et.al.	2512.06547	translate	read	null
2025-12-06	Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning	Ming Chen et.al.	2512.06533	translate	read	null
2025-12-06	Entropy-Controlled Intrinsic Motivation Reinforcement Learning for Quadruped Robot Locomotion in Complex Terrains	Wanru Gong et.al.	2512.06486	translate	read	null
2025-12-06	Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control	Nathan P. Lawrence et.al.	2512.06471	translate	read	null
2025-12-06	RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs	Runlong Zhou et.al.	2512.06392	translate	read	null
2025-12-06	VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning	Yuji Wang et.al.	2512.06373	translate	read	null
2025-12-06	LLM-Upgraded Graph Reinforcement Learning for Carbon-Aware Job Scheduling in Smart Manufacturing	Zhiying Yang et.al.	2512.06351	translate	read	null
2025-12-06	ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models	Jiahao Li et.al.	2512.06328	translate	read	null
2025-12-06	A Hybrid Physics-Based and Reinforcement Learning Framework for Electric Vehicle Charging Time Prediction	Praharshitha Aryasomayajula et.al.	2512.06287	translate	read	null
2025-12-06	Networked Restless Multi-Arm Bandits with Reinforcement Learning	Hanmo Zhang et.al.	2512.06274	translate	read	null
2025-12-06	Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models	Chen Yang et.al.	2512.06266	translate	read	null
2025-12-06	Learning Without Time-Based Embodiment Resets in Soft-Actor Critic	Homayoon Farrahi et.al.	2512.06252	translate	read	null
2025-12-06	Learning When to Switch: Adaptive Policy Selection via Reinforcement Learning	Chris Tava et.al.	2512.06250	translate	read	null
2025-12-06	Auto-exploration for online reinforcement learning	Caleb Ju et.al.	2512.06244	translate	read	null
2025-12-06	AI Application in Anti-Money Laundering for Sustainable and Transparent Financial Systems	Chuanhao Nie et.al.	2512.06240	translate	read	null
2025-12-05	Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration	Huizhen Yu et.al.	2512.06218	translate	read	null
2025-12-05	Quantifying Memory Use in Reinforcement Learning with Temporal Range	Rodney Lafuente-Mercado et.al.	2512.06204	translate	read	null
2025-12-05	JaxWildfire: A GPU-Accelerated Wildfire Simulator for Reinforcement Learning	Ufuk Çakır et.al.	2512.06102	translate	read	null
2025-12-05	Empathy by Design: Aligning Large Language Models for Healthcare Dialogue	Emre Umucu et.al.	2512.06097	translate	read	null
2025-12-05	Comparative Analysis of Autonomous and Systematic Control Strategies for Hole-Doped Hubbard Clusters: Reinforcement Learning versus Physics-Guided Design	Shivanshu Dwivedi et.al.	2512.06095	translate	read	null
2025-12-05	Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring	Mohanakrishnan Hariharan et.al.	2512.06060	translate	read	null
2025-12-05	EditThinker: Unlocking Iterative Reasoning for Any Image Editor	Hongyu Li et.al.	2512.05965	translate	read	null
2025-12-05	Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity	Germán Kruszewski et.al.	2512.05962	translate	read	null
2025-12-05	Correspondence-Oriented Imitation Learning: Flexible Visuomotor Control with 3D Conditioning	Yunhao Cao et.al.	2512.05953	translate	read	null
2025-12-05	Variational Quantum Rainbow Deep Q-Network for Optimizing Resource Allocation Problem	Truong Thanh Hung Nguyen et.al.	2512.05946	translate	read	null
2025-12-05	Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation	Fabian Konstantinidis et.al.	2512.05812	translate	read	null
2025-12-05	Real-time Remote Tracking and Autonomous Planning for Whale Rendezvous using Robots	Sushmita Bhattacharya et.al.	2512.05808	translate	read	null
2025-12-05	A Fast Anti-Jamming Cognitive Radar Deployment Algorithm Based on Reinforcement Learning	Wencheng Cai et.al.	2512.05753	translate	read	null
2025-12-05	A High-Order Immersed Boundary Method for Fluid-Structure Interaction Problems	Yingjie Xia et.al.	2512.05733	translate	read	null
2025-12-05	Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning	Ali Krayani et.al.	2512.05711	translate	read	null
2025-12-05	LA-RL: Language Action-guided Reinforcement Learning with Safety Guarantees for Autonomous Highway Driving	Yiming Shu et.al.	2512.05686	translate	read	null
2025-12-05	MedTutor-R1: Socratic Personalized Medical Teaching with Multi-Agent Simulation	Zhitao He et.al.	2512.05671	translate	read	null
2025-12-05	Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning	Zhenpeng Su et.al.	2512.05591	translate	read	null
2025-12-05	Distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning	Pengcheng Dai et.al.	2512.05447	translate	read	null
2025-12-05	ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction	Jiangtong Tan et.al.	2512.05422	translate	read	null
2025-12-05	State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning	Yuxiang Liu et.al.	2512.05335	translate	read	null
2025-12-04	Enhancing Deep Deterministic Policy Gradients on Continuous Control Tasks with Decoupled Prioritized Experience Replay	Mehmet Efe Lorasdagi et.al.	2512.05320	translate	read	null
2025-12-04	Bridging Interpretability and Optimization: Provably Attribution-Weighted Actor-Critic in Reproducing Kernel Hilbert Spaces	Na Li et.al.	2512.05291	translate	read	null
2025-12-04	Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem	Ali Al Housseini et.al.	2512.05207	translate	read	null
2025-12-04	ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning	Shengyuan Ding et.al.	2512.05111	translate	read	null
2025-12-04	STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models	Feng Xu et.al.	2512.05107	translate	read	null
2025-12-04	Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning	Purbesh Mitra et.al.	2512.05105	translate	read	link
