Reinforcement Learning - 2026-04

Publish Date	Title	Authors	PDF	Translate	Read	Code
2026-04-01	Embarrassingly Simple Self-Distillation Improves Code Generation	Ruixiang Zhang et al.	2604.01193	translate	read	null
2026-04-01	Deep Reinforcement Learning for Robotic Manipulation under Distribution Shift with Bounded Extremum Seeking	Shaifalee Saxena et al.	2604.01142	translate	read	null
2026-04-01	Multi-Agent LLM Governance for Safe Two-Timescale Reinforcement Learning in SDN-IoT Defense	Saeid Jamshidi et al.	2604.01127	translate	read	null
2026-04-01	BAT: Balancing Agility and Stability via Online Policy Switching for Long-Horizon Whole-Body Humanoid Control	Donghoon Baek et al.	2604.01064	translate	read	null
2026-04-01	Adversarial Attacks in AI-Driven RAN Slicing: SLA Violations and Recovery	Deemah H. Tashman et al.	2604.01049	translate	read	null
2026-04-01	Query-Conditioned Evidential Keyframe Sampling for MLLM-Based Long-Form Video Understanding	Yiheng Wang et al.	2604.01002	translate	read	null
2026-04-01	Focal plane wavefront control with model-based reinforcement learning	Jalo Nousiainen et al.	2604.00993	translate	read	null
2026-04-01	Flow-based Policy With Distributional Reinforcement Learning in Trajectory Optimization	Ruijie Hao et al.	2604.00977	translate	read	null
2026-04-01	Policy Improvement Reinforcement Learning	Huaiyang Wang et al.	2604.00860	translate	read	null
2026-04-01	Disentangling to Re-couple: Resolving the Similarity-Controllability Paradox in Subject-Driven Text-to-Image Generation	Shuang Li et al.	2604.00849	translate	read	null
2026-04-01	Bridging RL and MPC for mixed-integer optimal control with application to Formula 1 race strategies	Joschua Wüthrich et al.	2604.00826	translate	read	null
2026-04-01	RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning	Shaopeng Fu et al.	2604.00790	translate	read	null
2026-04-01	LangMARL: Natural Language Multi-Agent Reinforcement Learning	Huaiyuan Yao et al.	2604.00722	translate	read	null
2026-04-01	Learning to Hint for Reinforcement Learning	Yu Xia et al.	2604.00698	translate	read	null
2026-04-01	TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning	Soumya Shamarao Jahagirdar et al.	2604.00696	translate	read	null
2026-04-01	Full-Gradient Successor Feature Representations	Ritish Shrirao et al.	2604.00686	translate	read	null
2026-04-01	A Survey of On-Policy Distillation for Large Language Models	Mingyang Song et al.	2604.00626	translate	read	null
2026-04-01	A Physical Imitation Learning Pipeline for Energy-Efficient Quadruped Locomotion Assisted by Parallel Elastic Joint	Huyue Ma et al.	2604.00611	translate	read	null
2026-04-01	Toward Efficient Deployment and Synchronization in Digital Twins-Empowered Networks	Hossam Farag et al.	2604.00566	translate	read	null
2026-04-01	Multi-Camera View Scaling for Data-Efficient Robot Imitation Learning	Yichen Xie et al.	2604.00557	translate	read	null
2026-04-01	Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation	Zhiting Fan et al.	2604.00536	translate	read	null
2026-04-01	AceTone: Bridging Words and Colors for Conditional Image Grading	Tianren Ma et al.	2604.00530	translate	read	null
2026-04-01	MOON3.0: Reasoning-aware Multimodal Representation Learning for E-commerce Product Understanding	Junxian Wu et al.	2604.00513	translate	read	null
2026-04-01	A Reasoning-Enabled Vision-Language Foundation Model for Chest X-ray Interpretation	Yabin Zhang et al.	2604.00493	translate	read	null
2026-04-01	All Roads Lead to Rome: Incentivizing Divergent Thinking in Vision-Language Models	Xinyu Tian et al.	2604.00479	translate	read	null
2026-04-01	Execution-Verified Reinforcement Learning for Optimization Modeling	Runda Guan et al.	2604.00442	translate	read	null
2026-04-01	TR-ICRL: Test-Time Rethinking for In-Context Reinforcement Learning	Wenxuan Jiang et al.	2604.00438	translate	read	null
2026-04-01	Internal State-Based Policy Gradient Methods for Partially Observable Markov Potential Games	Wonseok Yang et al.	2604.00433	translate	read	null
2026-04-01	GUIDE: Reinforcement Learning for Behavioral Action Support in Type 1 Diabetes	Saman Khamesian et al.	2604.00385	translate	read	null
2026-04-01	Agent Q-Mix: Selecting the Right Action for LLM Multi-Agent Systems through Reinforcement Learning	Eric Hanchen Jiang et al.	2604.00344	translate	read	null

(<a href=../Reinforcement_Learning.md>back to Reinforcement Learning</a>)