Reinforcement Learning - 2026-01
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2026-01-30 | IRL-DAL: Safe and Adaptive Trajectory Planning for Autonomous Driving via Energy-Guided Diffusion Models | Seyed Ahmad Hosseini Miangoleh et.al. | 2601.23266 | translate | read | null |
| 2026-01-30 | Agile Reinforcement Learning through Separable Neural Architecture | Rajib Mostakim et.al. | 2601.23225 | translate | read | null |
| 2026-01-30 | Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning | Xiangyu Zeng et.al. | 2601.23224 | translate | read | null |
| 2026-01-30 | Med-Scout: Curing MLLMs’ Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training | Anglin Liu et.al. | 2601.23220 | translate | read | null |
| 2026-01-30 | Unsupervised Hierarchical Skill Discovery | Damion Harvey et.al. | 2601.23156 | translate | read | null |
| 2026-01-30 | On Safer Reinforcement Learning Policies for Sedation and Analgesia in Intensive Care | Joel Romero-Hernandez et.al. | 2601.23154 | translate | read | null |
| 2026-01-30 | THINKSAFE: Self-Generated Safety Alignment for Reasoning Models | Seanie Lee et.al. | 2601.23143 | translate | read | link |
| 2026-01-30 | Why GRPO Needs Normalization: A Local-Curvature Perspective on Adaptive Gradients | Cheng Ge et.al. | 2601.23135 | translate | read | null |
| 2026-01-30 | Temporally Coherent Imitation Learning via Latent Action Flow Matching for Robotic Manipulation | Wu Songwei et.al. | 2601.23087 | translate | read | null |
| 2026-01-30 | RN-D: Discretized Categorical Actors with Regularized Networks for On-Policy Reinforcement Learning | Yuexin Bian et.al. | 2601.23075 | translate | read | null |
| 2026-01-30 | From Absolute to Relative: Rethinking Reward Shaping in Group-Based Reinforcement Learning | Wenzhe Niu et.al. | 2601.23058 | translate | read | null |
| 2026-01-30 | Guided by Trajectories: Repairing and Rewarding Tool-Use Trajectories for Tool-Integrated Reasoning | Siyu Gong et.al. | 2601.23032 | translate | read | null |
| 2026-01-30 | Mem-T: Densifying Rewards for Long-Horizon Memory Agents | Yanwei Yue et.al. | 2601.23014 | translate | read | null |
| 2026-01-30 | Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning | Xinchen Han et.al. | 2601.23010 | translate | read | null |
| 2026-01-30 | Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text | Ximing Lu et.al. | 2601.22975 | translate | read | null |
| 2026-01-30 | Self-Imitated Diffusion Policy for Efficient and Robust Visual Navigation | Runhua Zhang et.al. | 2601.22965 | translate | read | null |
| 2026-01-30 | SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding | Boyin Tan et.al. | 2601.22956 | translate | read | null |
| 2026-01-30 | MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving | Xidong Li et.al. | 2601.22930 | translate | read | null |
| 2026-01-30 | MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop | Xuancheng Li et.al. | 2601.22900 | translate | read | null |
| 2026-01-30 | PlatoLTL: Learning to Generalize Across Symbols in LTL Instructions for Multi-Task RL | Jacques Cloete et.al. | 2601.22891 | translate | read | null |
| 2026-01-30 | Reinforcement Learning-Based Co-Design and Operation of Chiller and Thermal Energy Storage for Cost-Optimal HVAC Systems | Tanay Raghunandan Srinivasa et.al. | 2601.22880 | translate | read | null |
| 2026-01-30 | Degradation-Aware Frequency Regulation of a Heterogeneous Battery Fleet via Reinforcement Learning | Tanay Raghunandan Srinivasa et.al. | 2601.22865 | translate | read | null |
| 2026-01-30 | The two-nest ants process on triangle-series-parallel graphs | Cécile Mailler et.al. | 2601.22855 | translate | read | null |
| 2026-01-30 | Robust Rigid Body Assembly via Contact-Implicit Optimal Control with Exact Second-Order Derivatives | Christian Dietz et.al. | 2601.22849 | translate | read | null |
| 2026-01-30 | Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment | Mathieu Petitbois et.al. | 2601.22823 | translate | read | null |
| 2026-01-30 | CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning | Ji Shi et.al. | 2601.22803 | translate | read | null |
| 2026-01-30 | Clipping-Free Policy Optimization for Large Language Models | Ömer Veysel Çağatan et.al. | 2601.22801 | translate | read | null |
| 2026-01-30 | TSPO: Breaking the Double Homogenization Dilemma in Multi-turn Search Policy Optimization | Shichao Ma et.al. | 2601.22776 | translate | read | null |
| 2026-01-30 | A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization | Shiye Lei et.al. | 2601.22718 | translate | read | null |
| 2026-01-30 | Real-Time Aligned Reward Model beyond Semantics | Zixuan Huang et.al. | 2601.22664 | translate | read | null |
| 2026-01-30 | Evaluating and Rewarding LALMs for Expressive Role-Play TTS via Mean Continuation Log-Probability | Yong Ren et.al. | 2601.22661 | translate | read | null |
| 2026-01-30 | COBRA++: Enhanced COBRA Optimizer with Augmented Surrogate Pool and Reinforced Surrogate Selection | Zepei Yu et.al. | 2601.22624 | translate | read | null |
| 2026-01-30 | From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents | Jiaxuan Gao et.al. | 2601.22607 | translate | read | null |
| 2026-01-30 | Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR | Hao Yi et.al. | 2601.22595 | translate | read | null |
| 2026-01-30 | MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning | Youngeun Kim et.al. | 2601.22582 | translate | read | null |
| 2026-01-30 | Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation | Geonho Leem et.al. | 2601.22550 | translate | read | null |
| 2026-01-30 | PersonaAct: Simulating Short-Video Users with Personalized Agents for Counterfactual Filter Bubble Auditing | Shilong Zhao et.al. | 2601.22547 | translate | read | null |
| 2026-01-30 | Adapting Reinforcement Learning for Path Planning in Constrained Parking Scenarios | Feng Tao et.al. | 2601.22545 | translate | read | null |
| 2026-01-30 | Detect and Act: Automated Dynamic Optimizer through Meta-Black-Box Optimization | Zijian Gao et.al. | 2601.22542 | translate | read | null |
| 2026-01-30 | One Ring to Rule Them All: Unifying Group-Based RL via Dynamic Power-Mean Geometry | Weisong Zhao et.al. | 2601.22521 | translate | read | null |
| 2026-01-30 | RoboStriker: Hierarchical Decision-Making for Autonomous Humanoid Boxing | Kangning Yin et.al. | 2601.22517 | translate | read | null |
| 2026-01-30 | Mock Worlds, Real Skills: Building Small Agentic Language Models with Synthetic Tasks, Simulated Environments, and Rubric-Based Rewards | Yuan-Jay Lü et.al. | 2601.22511 | translate | read | null |
| 2026-01-30 | DreamVAR: Taming Reinforced Visual Autoregressive Model for High-Fidelity Subject-Driven Image Generation | Xin Jiang et.al. | 2601.22507 | translate | read | null |
| 2026-01-30 | Action-Sufficient Goal Representations | Jinu Hyeon et.al. | 2601.22496 | translate | read | null |
| 2026-01-30 | SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization | Jinyang Wu et.al. | 2601.22491 | translate | read | null |
| 2026-01-30 | RulePlanner: All-in-One Reinforcement Learner for Unifying Design Rules in 3D Floorplanning | Ruizhe Zhong et.al. | 2601.22476 | translate | read | null |
| 2026-01-30 | Continual Policy Distillation from Distributed Reinforcement Learning Teachers | Yuxuan Li et.al. | 2601.22475 | translate | read | null |
| 2026-01-30 | Unrewarded Exploration in Large Language Models Reveals Latent Learning from Psychology | Jian Xiong et.al. | 2601.22474 | translate | read | null |
| 2026-01-30 | HeaPA: Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning | Weiqi Wang et.al. | 2601.22448 | translate | read | null |
| 2026-01-29 | SAIR: Cost-Efficient Multi-Stage ML Pipeline Autoscaling via In-Context Reinforcement Learning | Jianchang Su et.al. | 2601.22397 | translate | read | null |
| 2026-01-29 | Quantum-Inspired Reinforcement Learning for Secure and Sustainable AIoT-Driven Supply Chain Systems | Muhammad Bilal Akram Dastagir et.al. | 2601.22339 | translate | read | null |
| 2026-01-29 | Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning | Qi Cao et.al. | 2601.22323 | translate | read | null |
| 2026-01-29 | Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning | Chenxi Liu et.al. | 2601.22297 | translate | read | null |
| 2026-01-29 | Learning Reward Functions for Cooperative Resilience in Multi-Agent Systems | Manuela Chacon-Chamorro et.al. | 2601.22292 | translate | read | null |
| 2026-01-29 | Aligning Microscopic Vehicle and Macroscopic Traffic Statistics: Reconstructing Driving Behavior from Partial Data | Zhihao Zhang et.al. | 2601.22242 | translate | read | null |
| 2026-01-29 | Smart Walkers in Discrete Space | Gianluca Peri et.al. | 2601.22235 | translate | read | null |
| 2026-01-29 | Latent Spherical Flow Policy for Reinforcement Learning with Combinatorial Actions | Lingkai Kong et.al. | 2601.22211 | translate | read | null |
| 2026-01-29 | Causal Imitation Learning Under Measurement Error and Distribution Shift | Shi Bo et.al. | 2601.22206 | translate | read | null |
| 2026-01-28 | ShellForge: Adversarial Co-Evolution of Webshell Generation and Multi-View Detection for Robust Webshell Defense | Yizhong Ding et.al. | 2601.22182 | translate | read | null |
| 2026-01-29 | Exploring Reasoning Reward Model for Agents | Kaixuan Fan et.al. | 2601.22154 | translate | read | link |
| 2026-01-29 | DynaWeb: Model-Based Reinforcement Learning of Web Agents | Hang Ding et.al. | 2601.22149 | translate | read | null |
| 2026-01-29 | Learning to Dial-a-Ride: A Deep Graph Reinforcement Learning Approach to the Electric Dial-a-Ride Problem | Sten Elling Tingstad Jacobsen et.al. | 2601.22052 | translate | read | null |
| 2026-01-29 | SIA: Symbolic Interpretability for Anticipatory Deep Reinforcement Learning in Network Control | MohammadErfan Jabbari et.al. | 2601.22044 | translate | read | null |
| 2026-01-29 | SymbXRL: Symbolic Explainable Deep Reinforcement Learning for Mobile Networks | Abhishek Duttagupta et.al. | 2601.22024 | translate | read | null |
| 2026-01-29 | Geometry of Drifting MDPs with Path-Integral Stability Certificates | Zuyuan Zhang et.al. | 2601.21991 | translate | read | null |
| 2026-01-29 | Elign: Equivariant Diffusion Model Alignment from Foundational Machine Learning Force Fields | Yunyang Li et.al. | 2601.21985 | translate | read | null |
| 2026-01-29 | Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic | Shuo Liu et.al. | 2601.21972 | translate | read | null |
| 2026-01-29 | MoE-ACT: Improving Surgical Imitation Learning Policies through Supervised Mixture-of-Experts | Lorenzo Mazza et.al. | 2601.21971 | translate | read | null |
| 2026-01-29 | Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding | Yifan Zhu et.al. | 2601.21969 | translate | read | null |
| 2026-01-29 | OVD: On-policy Verbal Distillation | Jing Xiong et.al. | 2601.21968 | translate | read | null |
| 2026-01-29 | Optimistic Transfer under Task Shift via Bellman Alignment | Jinhang Chai et.al. | 2601.21924 | translate | read | null |
| 2026-01-29 | Self-Compression of Chain-of-Thought via Multi-Agent Reinforcement Learning | Yiqun Chen et.al. | 2601.21919 | translate | read | null |
| 2026-01-29 | ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation | Zhao Wang et.al. | 2601.21912 | translate | read | null |
| 2026-01-29 | From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning | Shaojie Wang et.al. | 2601.21909 | translate | read | null |
| 2026-01-29 | Acquiring Human-Like Mechanics Intuition from Scarce Observations via Deep Reinforcement Learning | Jingruo Peng et.al. | 2601.21881 | translate | read | null |
| 2026-01-29 | WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents | Yao Zhang et.al. | 2601.21872 | translate | read | null |
| 2026-01-29 | Spatiotemporal Continual Learning for Mobile Edge UAV Networks: Mitigating Catastrophic Forgetting | Chuan-Chi Lai et.al. | 2601.21861 | translate | read | null |
| 2026-01-29 | Self-Adaptive Probabilistic Skyline Query Processing in Distributed Edge Computing via Deep Reinforcement Learning | Chuan-Chi Lai et.al. | 2601.21855 | translate | read | null |
| 2026-01-29 | READY: Reward Discovery for Meta-Black-Box Optimization | Zechuan Huang et.al. | 2601.21847 | translate | read | null |
| 2026-01-29 | Constrained Meta Reinforcement Learning with Provable Test-Time Safety | Tingting Ni et.al. | 2601.21845 | translate | read | null |
| 2026-01-29 | Distribution-Aware Reward Estimation for Test-Time Reinforcement Learning | Bodong Du et.al. | 2601.21804 | translate | read | null |
| 2026-01-29 | Error Amplification Limits ANN-to-SNN Conversion in Continuous Control | Zijie Xu et.al. | 2601.21778 | translate | read | null |
| 2026-01-29 | OneMall: One Model, More Scenarios – End-to-End Generative Recommender Family at Kuaishou E-Commerce | Kun Zhang et.al. | 2601.21770 | translate | read | null |
| 2026-01-29 | Influence Guided Sampling for Domain Adaptation of Text Retrievers | Meet Doshi et.al. | 2601.21759 | translate | read | null |
| 2026-01-29 | Language-based Trial and Error Falls Behind in the Era of Experience | Haoyu Wang et.al. | 2601.21754 | translate | read | link |
| 2026-01-29 | Epistemic Context Learning: Building Trust the Right Way in LLM-Based Multi-Agent Systems | Ruiwen Zhou et.al. | 2601.21742 | translate | read | null |
| 2026-01-29 | Mixed-Precision Training and Compilation for RRAM-based Computing-in-Memory Accelerators | Rebecca Pelke et.al. | 2601.21737 | translate | read | null |
| 2026-01-29 | When does predictive inverse dynamics outperform behavior cloning? | Lukas Schäfer et.al. | 2601.21718 | translate | read | null |
| 2026-01-29 | Disentangling perception and reasoning for improving data efficiency in learning cloth manipulation without demonstrations | Donatien Delehelle et.al. | 2601.21713 | translate | read | null |
| 2026-01-29 | TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning | Huiyuan Lai et.al. | 2601.21711 | translate | read | null |
| 2026-01-29 | Can David Beat Goliath? On Multi-Hop Reasoning with Resource-Constrained Agents | Hojae Han et.al. | 2601.21699 | translate | read | null |
| 2026-01-29 | BAP-SRL: Bayesian Adaptive Priority Safe Reinforcement Learning for Vehicle Motion Planning at Mixed Traffic Intersections | Yuansheng Lian et.al. | 2601.21679 | translate | read | null |
| 2026-01-29 | Expected Return Causes Outcome-Level Mode Collapse in Reinforcement Learning and How to Fix It with Inverse Probability Scaling | Abhijeet Sinha et.al. | 2601.21669 | translate | read | null |
| 2026-01-29 | Reinforcement Learning for Adaptive Composition of Quantum Circuit Optimisation Passes | Daniel Mills et.al. | 2601.21629 | translate | read | null |
| 2026-01-29 | PathReasoner-R1: Instilling Structured Reasoning into Pathology Vision-Language Model via Knowledge-Guided Policy Optimization | Songhan Jiang et.al. | 2601.21617 | translate | read | null |
| 2026-01-29 | RecNet: Self-Evolving Preference Propagation for Agentic Recommender Systems | Bingqian Li et.al. | 2601.21609 | translate | read | null |
| 2026-01-29 | Beyond Imitation: Reinforcement Learning for Active Latent Planning | Zhi Zheng et.al. | 2601.21598 | translate | read | link |
| 2026-01-29 | Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening | Xiaotong Ji et.al. | 2601.21590 | translate | read | null |
| 2026-01-29 | Signal-Adaptive Trust Regions for Gradient-Free Optimization of Recurrent Spiking Neural Networks | Jinhao Li et.al. | 2601.21572 | translate | read | null |
| 2026-01-29 | ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas | Xiaoyu Tian et.al. | 2601.21558 | translate | read | link |
| 2026-01-29 | Training slow silicon neurons to control extremely fast robots with spiking reinforcement learning | Irene Ambrosini et.al. | 2601.21548 | translate | read | null |
| 2026-01-29 | Explicit Credit Assignment through Local Rewards and Dependence Graphs in Multi-Agent Reinforcement Learning | Bang Giang Le et.al. | 2601.21523 | translate | read | null |
| 2026-01-29 | ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment | Xiuyu Li et.al. | 2601.21484 | translate | read | null |
| 2026-01-29 | Mean-Field Control on Sparse Graphs: From Local Limits to GNNs via Neighborhood Distributions | Tobias Schmidt et.al. | 2601.21477 | translate | read | null |
| 2026-01-29 | SOUP: Token-level Single-sample Mix-policy Reinforcement Learning for Large Language Models | Lei Yang et.al. | 2601.21476 | translate | read | null |
| 2026-01-29 | MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning | Yaorui Shi et.al. | 2601.21468 | translate | read | null |
| 2026-01-29 | HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing | Chengyu Du et.al. | 2601.21459 | translate | read | null |
| 2026-01-29 | Mitigating Overthinking in Large Reasoning Models via Difficulty-aware Reinforcement Learning | Qian Wan et.al. | 2601.21418 | translate | read | null |
| 2026-01-29 | Towards Space-Based Environmentally-Adaptive Grasping | Leonidas Askianakis et.al. | 2601.21394 | translate | read | null |
| 2026-01-29 | Shaping the learning signal in a combined Q-learning rule to improve structured cooperation | Chunpeng Du et.al. | 2601.21392 | translate | read | null |
| 2026-01-29 | Intrinsic Reward Policy Optimization for Sparse-Reward Environments | Minjae Cho et.al. | 2601.21391 | translate | read | null |
| 2026-01-29 | Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control | Weidong Huang et.al. | 2601.21363 | translate | read | null |
| 2026-01-29 | Factored Causal Representation Learning for Robust Reward Modeling in RLHF | Yupei Yang et.al. | 2601.21350 | translate | read | null |
| 2026-01-29 | Self-Improving Pretraining: using post-trained models to pretrain better models | Ellen Xiaoqing Tan et.al. | 2601.21343 | translate | read | null |
| 2026-01-29 | Heterogeneous Vertiport Selection Optimization for On-Demand Air Taxi Services: A Deep Reinforcement Learning Approach | Aoyu Pang et.al. | 2601.21316 | translate | read | null |
| 2026-01-29 | Few-Shot Learning for Dynamic Operations of Automated Electric Taxi Fleets under Evolving Charging Infrastructure: A Meta-Deep Reinforcement Learning Approach | Xiaozhuang Li et.al. | 2601.21312 | translate | read | null |
| 2026-01-29 | The Surprising Difficulty of Search in Model-Based Reinforcement Learning | Wei-Di Chang et.al. | 2601.21306 | translate | read | null |
| 2026-01-29 | EGAM: Extended Graph Attention Model for Solving Routing Problems | Licheng Wang et.al. | 2601.21281 | translate | read | null |
| 2026-01-29 | Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels | Micah Rentschler et.al. | 2601.21268 | translate | read | null |
| 2026-01-29 | Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification | Yiju Guo et.al. | 2601.21244 | translate | read | null |
| 2026-01-29 | Intelli-Planner: Towards Customized Urban Planning via Large Language Model Empowered Reinforcement Learning | Xixian Yong et.al. | 2601.21212 | translate | read | null |
| 2026-01-29 | When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning | Wei Wen et.al. | 2601.21208 | translate | read | null |
| 2026-01-29 | Do Reasoning Models Enhance Embedding Models? | Wun Yu Chan et.al. | 2601.21192 | translate | read | null |
| 2026-01-28 | Safety Generalization Under Distribution Shift in Safe Reinforcement Learning: A Diabetes Testbed | Minjae Kwon et.al. | 2601.21094 | translate | read | link |
| 2026-01-28 | Deep Reinforcement Learning for Fault-Adaptive Routing in Eisenstein-Jacobi Interconnection Topologies | Mohammad Walid Charrwi et.al. | 2601.21090 | translate | read | null |
| 2026-01-28 | OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence | Jarrod Barnes et.al. | 2601.21083 | translate | read | link |
| 2026-01-28 | Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report | Zhuoran Yang et.al. | 2601.21051 | translate | read | null |
| 2026-01-28 | Log2Motion: Biomechanical Motion Synthesis from Touch Logs | Michał Patryk Miazga et.al. | 2601.21043 | translate | read | null |
| 2026-01-28 | SIGMA-PPG: Statistical-prior Informed Generative Masking Architecture for PPG Foundation Model | Zongheng Guo et.al. | 2601.21031 | translate | read | link |
| 2026-01-28 | Distributional Active Inference | Abdullah Akgül et.al. | 2601.20985 | translate | read | null |
| 2026-01-28 | End-to-end example-based sim-to-real RL policy transfer based on neural stylisation with application to robotic cutting | Jamie Hathaway et.al. | 2601.20846 | translate | read | null |
| 2026-01-28 | Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning | Minwu Kim et.al. | 2601.20829 | translate | read | link |
| 2026-01-28 | Reinforcement Learning via Self-Distillation | Jonas Hübotter et.al. | 2601.20802 | translate | read | link |
| 2026-01-28 | SERA: Soft-Verified Efficient Repository Agents | Ethan Shen et.al. | 2601.20789 | translate | read | link |
| 2026-01-28 | Less is More: Clustered Cross-Covariance Control for Offline RL | Nan Qiao et.al. | 2601.20765 | translate | read | null |
| 2026-01-28 | GraphAllocBench: A Flexible Benchmark for Preference-Conditioned Multi-Objective Policy Learning | Zhiheng Jiang et.al. | 2601.20753 | translate | read | null |
| 2026-01-28 | Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions | Raul de la Rosa et.al. | 2601.20714 | translate | read | null |
| 2026-01-28 | One Step Is Enough: Dispersive MeanFlow Policy Optimization | Guowei Zou et.al. | 2601.20701 | translate | read | null |
| 2026-01-28 | Grover’s Search-Inspired Quantum Reinforcement Learning for Massive MIMO User Scheduling | Ruining Fan et.al. | 2601.20688 | translate | read | null |
| 2026-01-28 | Positive-Unlabeled Reinforcement Learning Distillation for On-Premise Small Models | Zhiqiang Kou et.al. | 2601.20687 | translate | read | null |
| 2026-01-28 | GPO: Growing Policy Optimization for Legged Robot Locomotion and Whole-Body Control | Shuhao Liao et.al. | 2601.20668 | translate | read | null |
| 2026-01-28 | Deep Learning based Three-stage Solution for ISAC Beamforming Optimization | Qian Gao et.al. | 2601.20667 | translate | read | null |
| 2026-01-28 | Integrated Sensing and Communication for Segmented Waveguide-Enabled Pinching Antenna Systems | Qian Gao et.al. | 2601.20658 | translate | read | null |
| 2026-01-28 | RL based Beamforming Optimization for 3D Pinching Antenna assisted ISAC Systems | Qian Gao et.al. | 2601.20654 | translate | read | null |
| 2026-01-28 | P2S: Probabilistic Process Supervision for General-Domain Reasoning Question Answering | Wenlin Zhong et.al. | 2601.20649 | translate | read | null |
| 2026-01-28 | Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation | Yanqi Dai et.al. | 2601.20614 | translate | read | null |
| 2026-01-28 | Ranking-aware Reinforcement Learning for Ordinal Ranking | Aiming Hao et.al. | 2601.20585 | translate | read | null |
| 2026-01-28 | Inequality in Congestion Games with Learning Agents | Dimitris Michailidis et.al. | 2601.20578 | translate | read | null |
| 2026-01-28 | Fair Recourse for All: Ensuring Individual and Group Fairness in Counterfactual Explanations | Fatima Ezzeddine et.al. | 2601.20449 | translate | read | null |
| 2026-01-28 | PEARL: Plan Exploration and Adaptive Reinforcement Learning for Multihop Tool Use | Qihao Wang et.al. | 2601.20439 | translate | read | null |
| 2026-01-28 | MARE: Multimodal Alignment and Reinforcement for Explainable Deepfake Detection via Vision-Language Models | Wenbo Xu et.al. | 2601.20433 | translate | read | null |
| 2026-01-28 | Reinforcement Learning for Dividend Optimization in Partially Observed Regime-Switching Diffusion Model | Zhongqin Gao et.al. | 2601.20387 | translate | read | null |
| 2026-01-28 | PsychePass: Calibrating LLM Therapeutic Competence via Trajectory-Anchored Tournaments | Zhuang Chen et.al. | 2601.20330 | translate | read | null |
| 2026-01-28 | CE-RM: A Pointwise Generative Reward Model Optimized via Two-Stage Rollout and Unified Criteria | Xinyu Hu et.al. | 2601.20327 | translate | read | null |
| 2026-01-28 | Endogenous Reprompting: Self-Evolving Cognitive Alignment for Unified Multimodal Models | Zhenchen Tang et.al. | 2601.20305 | translate | read | null |
| 2026-01-28 | Proactive SFC Provisioning with Forecast-Driven DRL in Data Centers | Parisa Fard Moshiri et.al. | 2601.20229 | translate | read | null |
| 2026-01-28 | Scaling Medical Reasoning Verification via Tool-Integrated Reinforcement Learning | Hang Zhang et.al. | 2601.20221 | translate | read | null |
| 2026-01-28 | Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning | Jinyang Wu et.al. | 2601.20209 | translate | read | null |
| 2026-01-28 | Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery | Zhipeng Zhang et.al. | 2601.20193 | translate | read | null |
| 2026-01-27 | Rewarding Intellectual Humility Learning When Not To Answer In Large Language Models | Abha Jha et.al. | 2601.20126 | translate | read | null |
| 2026-01-27 | A Reinforcement Learning Based Universal Sequence Design for Polar Codes | David Kin Wai Ho et.al. | 2601.20118 | translate | read | null |
| 2026-01-27 | In-Context Reinforcement Learning From Suboptimal Historical Data | Juncheng Dong et.al. | 2601.20116 | translate | read | null |
| 2026-01-27 | Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis | Darshan Deshpande et.al. | 2601.20103 | translate | read | null |
| 2026-01-27 | Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery | Meng Xin et.al. | 2601.20088 | translate | read | null |
| 2026-01-27 | Techno-economic optimization of a heat-pipe microreactor, part II: multi-objective optimization analysis | Paul Seurin et.al. | 2601.20079 | translate | read | null |
| 2026-01-27 | Distributional value gradients for stochastic environments | Baptiste Debes et.al. | 2601.20071 | translate | read | null |
| 2026-01-27 | Exploring the holographic entropy cone via reinforcement learning | Temple He et.al. | 2601.19979 | translate | read | null |
| 2026-01-27 | E2HiL: Entropy-Guided Sample Selection for Efficient Real-World Human-in-the-Loop Reinforcement Learning | Haoyuan Deng et.al. | 2601.19969 | translate | read | null |
| 2026-01-27 | Self-Distillation Enables Continual Learning | Idan Shenfeld et.al. | 2601.19897 | translate | read | null |
| 2026-01-27 | A Latent Space Framework for Modeling Transient Engine Emissions Using Joint Embedding Predictive Architectures | Ganesh Sundaram et.al. | 2601.19822 | translate | read | null |
| 2026-01-27 | Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals | Octavio Pappalardo et.al. | 2601.19810 | translate | read | null |
| 2026-01-27 | Reimagining Peer Review Process Through Multi-Agent Mechanism Design | Ahmad Farooq et.al. | 2601.19778 | translate | read | null |
| 2026-01-27 | Reimagining Social Robots as Recommender Systems: Foundations, Framework, and Applications | Jin Huang et.al. | 2601.19761 | translate | read | null |
| 2026-01-27 | Improving Policy Exploitation in Online Reinforcement Learning with Instant Retrospect Action | Gong Gao et.al. | 2601.19720 | translate | read | null |
| 2026-01-27 | Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow | Yunyue Wei et.al. | 2601.19707 | translate | read | null |
| 2026-01-27 | AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion | Tianyue Jiang et.al. | 2601.19697 | translate | read | null |
| 2026-01-27 | Video-KTR: Reinforcing Video Reasoning via Key Token Attribution | Ziyue Wang et.al. | 2601.19686 | translate | read | null |
| 2026-01-27 | Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning | Tongxi Wang et.al. | 2601.19624 | translate | read | null |
| 2026-01-27 | R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning | Zhizheng Jiang et.al. | 2601.19620 | translate | read | null |
| 2026-01-27 | Safe Exploration via Policy Priors | Manuel Wendl et.al. | 2601.19612 | translate | read | null |
| 2026-01-27 | LLM-Enhanced Reinforcement Learning for Long-Term User Satisfaction in Interactive Recommendation | Chongjun Xia et.al. | 2601.19585 | translate | read | null |
| 2026-01-27 | Bridging Information Asymmetry: A Hierarchical Framework for Deterministic Blind Face Restoration | Zhengjian Yao et.al. | 2601.19506 | translate | read | null |
| 2026-01-27 | Reinforcement Learning Goal-Reaching Control with Guaranteed Lyapunov-Like Stabilizer for Mobile Robots | Mehdi Heydari Shahna et.al. | 2601.19499 | translate | read | null |
| 2026-01-27 | APC-RL: Exceeding Data-Driven Behavior Priors with Adaptive Policy Composition | Finn Rietz et.al. | 2601.19452 | translate | read | null |
| 2026-01-27 | OSIRIS: Bridging Analog Circuit Design and Machine Learning with Scalable Dataset Generation | Giuseppe Chiari et.al. | 2601.19439 | translate | read | null |
| 2026-01-27 | Task-Centric Policy Optimization from Misaligned Motion Priors | Ziang Zheng et.al. | 2601.19411 | translate | read | null |
| 2026-01-27 | CHEHAB RL: Learning to Optimize Fully Homomorphic Encryption Computations | Bilel Sefsaf et.al. | 2601.19367 | translate | read | null |
| 2026-01-27 | From Observations to Events: Event-Aware World Model for Reinforcement Learning | Zhao-Han Peng et.al. | 2601.19336 | translate | read | null |
| 2026-01-27 | Innovator-VL: A Multimodal Large Language Model for Scientific Discovery | Zichen Wen et.al. | 2601.19325 | translate | read | null |
| 2026-01-27 | Reinforced Rate Control for Neural Video Compression via Inter-Frame Rate-Distortion Awareness | Wuyang Cong et.al. | 2601.19293 | translate | read | null |
| 2026-01-27 | Model-Free Output Feedback Stabilization via Policy Gradient Methods | Ankang Zhang et.al. | 2601.19284 | translate | read | null |
| 2026-01-27 | Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning | Kishan Panaganti et.al. | 2601.19280 | translate | read | null |
| 2026-01-27 | Reinforcement Learning for Enhanced Advanced QEC Architecture Decoding | Yidong Zhou et.al. | 2601.19279 | translate | read | null |
| 2026-01-27 | iFAN Ecosystem: A Unified AI, Digital Twin, Cyber-Physical Security, and Robotics Environment for Advanced Nuclear Simulation and Operations | Youndo Do et.al. | 2601.19234 | translate | read | null |
| 2026-01-27 | Structure-based RNA Design by Step-wise Optimization of Latent Diffusion Model | Qi Si et.al. | 2601.19232 | translate | read | null |
| 2026-01-27 | Towards Pixel-Level VLM Perception via Simple Points Prediction | Tianhui Song et.al. | 2601.19228 | translate | read | null |
| 2026-01-27 | Exploring Weaknesses in Function Call Models via Reinforcement Learning: An Adversarial Data Augmentation Approach | Weiran Guo et.al. | 2601.19122 | translate | read | null |
| 2026-01-27 | Glance and Focus Reinforcement for Pan-cancer Screening | Linshan Wu et.al. | 2601.19103 | translate | read | null |
| 2026-01-27 | Reward Engineering for Reinforcement Learning in Software Tasks | Md Rayhanul Masud et.al. | 2601.19100 | translate | read | null |
| 2026-01-27 | m2sv: A Scalable Benchmark for Map-to-Street-View Spatial Reasoning | Yosub Shin et.al. | 2601.19099 | translate | read | null |
| 2026-01-27 | Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback | Siddhant Arora et.al. | 2601.19063 | translate | read | null |
| 2026-01-26 | A Unifying View of Coverage in Linear Off-Policy Evaluation | Philip Amortila et.al. | 2601.19030 | translate | read | null |
| 2026-01-26 | Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning | Haolin Liu et.al. | 2601.18984 | translate | read | null |
| 2026-01-26 | Reinforcement Learning for Quantum Technology | Marin Bukov et.al. | 2601.18953 | translate | read | null |
| 2026-01-26 | Vector-Valued Distributional Reinforcement Learning Policy Evaluation: A Hilbert Space Embedding Approach | Mehrdad Mohammadi et.al. | 2601.18952 | translate | read | null |
| 2026-01-26 | Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration | Hwanwoo Kim et.al. | 2601.18907 | translate | read | null |
| 2026-01-26 | Analysis of Control Bellman Residual Minimization for Markov Decision Problem | Donghwan Lee et.al. | 2601.18840 | translate | read | null |
| 2026-01-26 | Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes | Amrith Setlur et.al. | 2601.18795 | translate | read | null |
| 2026-01-26 | Multi-Objective Reinforcement Learning for Efficient Tactical Decision Making for Trucks in Highway Traffic | Deepthi Pathare et.al. | 2601.18783 | translate | read | null |
| 2026-01-26 | POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration | Yuxiao Qu et.al. | 2601.18779 | translate | read | null |
| 2026-01-26 | Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability | Shobhita Sundaram et.al. | 2601.18778 | translate | read | null |
| 2026-01-26 | Dep-Search: Learning Dependency-Aware Reasoning Traces with Persistent Memory | Yanming Liu et.al. | 2601.18771 | translate | read | null |
| 2026-01-26 | Trust, Don’t Trust, or Flip: Robust Preference-Based Reinforcement Learning with Multi-Expert Feedback | Seyed Amir Hosseini et.al. | 2601.18751 | translate | read | null |
| 2026-01-26 | Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models | Siyan Zhao et.al. | 2601.18734 | translate | read | null |
| 2026-01-26 | Reflect: Transparent Principle-Guided Reasoning for Constitutional Alignment at Scale | Henry Bell et.al. | 2601.18730 | translate | read | null |
| 2026-01-26 | Trustworthy Evaluation of Robotic Manipulation: A New Benchmark and AutoEval Methods | Mengyuan Liu et.al. | 2601.18723 | translate | read | null |
| 2026-01-26 | Health-SCORE: Towards Scalable Rubrics for Improving Health-LLMs | Zhichao Yang et.al. | 2601.18706 | translate | read | null |
| 2026-01-26 | ART for Diffusion Sampling: A Reinforcement Learning Approach to Timestep Schedule | Yilie Huang et.al. | 2601.18681 | translate | read | null |
| 2026-01-26 | AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning | Mingyang Song et.al. | 2601.18631 | translate | read | null |
| 2026-01-26 | Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning | Yingxiao Huo et.al. | 2601.18626 | translate | read | null |
| 2026-01-26 | Learning long term climate-resilient transport adaptation pathways under direct and indirect flood impacts using reinforcement learning | Miguel Costa et.al. | 2601.18586 | translate | read | null |
| 2026-01-26 | From Classification to Ranking: Enhancing LLM Reasoning Capabilities for MBTI Personality Detection | Yuan Cao et.al. | 2601.18582 | translate | read | null |
| 2026-01-26 | K-Myriad: Jump-starting reinforcement learning with unsupervised parallel agents | Vincenzo De Paola et.al. | 2601.18580 | translate | read | null |
| 2026-01-26 | GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning | Kaixun Jiang et.al. | 2601.18543 | translate | read | null |
| 2026-01-26 | From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation | Yuxin Jiang et.al. | 2601.18533 | translate | read | null |
| 2026-01-26 | Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates | Yibo Li et.al. | 2601.18510 | translate | read | null |
| 2026-01-26 | Enhancing Control Policy Smoothness by Aligning Actions with Predictions from Preceding States | Kyoleen Kwak et.al. | 2601.18479 | translate | read | null |
| 2026-01-26 | OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents | Yuhang Zhou et.al. | 2601.18467 | translate | read | null |
| 2026-01-26 | Deep Reinforcement Learning for Hybrid RIS Assisted MIMO Communications | Phuong Nam Tran et.al. | 2601.18453 | translate | read | null |
| 2026-01-26 | Emergent Cooperation in Quantum Multi-Agent Reinforcement Learning Using Communication | Michael Kölle et.al. | 2601.18419 | translate | read | null |
| 2026-01-26 | daVinci-Dev: Agent-native Mid-training for Software Engineering | Ji Zeng et.al. | 2601.18418 | translate | read | null |
| 2026-01-26 | AI Agent for Reverse-Engineering Legacy Finite-Difference Code and Translating to Devito | Yinghan Hou et.al. | 2601.18381 | translate | read | null |
| 2026-01-26 | Temp-R1: A Unified Autonomous Agent for Complex Temporal KGQA via Reverse Curriculum Reinforcement Learning | Zhaoyan Gong et.al. | 2601.18296 | translate | read | null |
| 2026-01-26 | Reinforcement Learning with Distributed MPC for Fuel-Efficient Platoon Control with Discrete Gear Transitions | Samuel Mallick et.al. | 2601.18294 | translate | read | null |
| 2026-01-26 | TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment | Zhewen Tan et.al. | 2601.18292 | translate | read | null |
| 2026-01-26 | VissimRL: A Multi-Agent Reinforcement Learning Framework for Traffic Signal Control Based on Vissim | Hsiao-Chuan Chang et.al. | 2601.18284 | translate | read | null |
| 2026-01-26 | Reflecting Twice before Speaking with Empathy: Self-Reflective Alternating Inference for Empathy-Aware End-to-End Spoken Dialogue | Yuhang Jia et.al. | 2601.18281 | translate | read | null |
| 2026-01-26 | ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants | Pei Wang et.al. | 2601.18225 | translate | read | null |
| 2026-01-26 | Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents | Zhihan Liu et.al. | 2601.18217 | translate | read | null |
| 2026-01-26 | PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR | James Burgess et.al. | 2601.18207 | translate | read | null |
| 2026-01-26 | QualiRAG: Retrieval-Augmented Generation for Visual Quality Understanding | Linhan Cao et.al. | 2601.18195 | translate | read | null |
| 2026-01-26 | FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning | Zhaopeng Qiu et.al. | 2601.18150 | translate | read | null |
| 2026-01-26 | Exact Minimum-Volume Confidence Set Intersection for Multinomial Outcomes | Heguang Lin et.al. | 2601.18145 | translate | read | null |
| 2026-01-26 | Enhance the Safety in Reinforcement Learning by ADRC Lagrangian Methods | Mingxu Zhang et.al. | 2601.18142 | translate | read | null |
| 2026-01-26 | Beyond Static Datasets: Robust Offline Policy Optimization via Vetted Synthetic Transitions | Pedram Agand et.al. | 2601.18107 | translate | read | null |
| 2026-01-26 | Diffusion Model-based Reinforcement Learning for Version Age of Information Scheduling: Average and Tail-Risk-Sensitive Control | Haoyuan Pan et.al. | 2601.18069 | translate | read | null |
| 2026-01-23 | Autonomous Optical Alignment of Satellite-Based Entanglement Sources using Reinforcement Learning | Andrzej Gajewski et.al. | 2601.16968 | translate | read | null |
| 2026-01-23 | The Trajectory Alignment Coefficient in Two Acts: From Reward Tuning to Reward Learning | Calarina Muslimani et.al. | 2601.16906 | translate | read | null |
| 2026-01-23 | Boosting Deep Reinforcement Learning with Semantic Knowledge for Robotic Manipulators | Lucía Güitta-López et.al. | 2601.16866 | translate | read | null |
| 2026-01-23 | Reasoning Promotes Robustness in Theory of Mind Tasks | Ian B. de Haan et.al. | 2601.16853 | translate | read | null |
| 2026-01-23 | LongCat-Flash-Thinking-2601 Technical Report | Meituan LongCat Team et.al. | 2601.16725 | translate | read | null |
| 2026-01-23 | Adaptive Reinforcement and Model Predictive Control Switching for Safe Human-Robot Cooperative Navigation | Ning Liu et.al. | 2601.16686 | translate | read | null |
| 2026-01-23 | Sim-to-Real Transfer via a Style-Identified Cycle Consistent Generative Adversarial Network: Zero-Shot Deployment on Robotic Manipulators through Visual Domain Adaptation | Lucía Güitta-López et.al. | 2601.16677 | translate | read | null |
| 2026-01-23 | A Cognitive Framework for Autonomous Agents: Toward Human-Inspired Design | Francesco Guidi et.al. | 2601.16648 | translate | read | null |
| 2026-01-23 | Zero-Shot MARL Benchmark in the Cyber-Physical Mobility Lab | Julius Beerwerth et.al. | 2601.16578 | translate | read | null |
| 2026-01-23 | Spiking Neural Networks for Communication Systems: Encoding Schemes, Learning Algorithms, and Equalization Techniques | Eike-Manuel Edelmann et.al. | 2601.16550 | translate | read | null |
| 2026-01-23 | UAV-Assisted Joint Data Collection and Wireless Power Transfer for Batteryless Sensor Networks | Wen Zhang et.al. | 2601.16533 | translate | read | null |
| 2026-01-23 | Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic | Yichuan Ma et.al. | 2601.16486 | translate | read | null |
| 2026-01-23 | FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning | Haoxu Wang et.al. | 2601.16483 | translate | read | null |
| 2026-01-23 | Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go | Yichuan Ma et.al. | 2601.16447 | translate | read | null |
| 2026-01-23 | Endless Terminals: Scaling RL Environments for Terminal Agents | Kanishk Gandhi et.al. | 2601.16443 | translate | read | link |
| 2026-01-23 | Reinforcement Learning-Based Energy-Aware Coverage Path Planning for Precision Agriculture | Beining Wu et.al. | 2601.16405 | translate | read | null |
| 2026-01-23 | Towards a Theoretical Understanding to the Generalization of RLHF | Zhaochun Li et.al. | 2601.16403 | translate | read | null |
| 2026-01-23 | Clarify or Answer: Reinforcement Learning for Agentic VQA with Context Under-specification | Zongwan Cao et.al. | 2601.16400 | translate | read | null |
| 2026-01-23 | A Regularized Actor-Critic Algorithm for Bi-Level Reinforcement Learning | Sihan Zeng et.al. | 2601.16399 | translate | read | null |
| 2026-01-22 | LLM-in-Sandbox Elicits General Agentic Intelligence | Daixuan Cheng et.al. | 2601.16206 | translate | read | link |
| 2026-01-22 | Learning to Discover at Test Time | Mert Yuksekgonul et.al. | 2601.16175 | translate | read | link |
| 2026-01-22 | Structured Hints for Sample-Efficient Lean Theorem Proving | Zachary Burton et.al. | 2601.16172 | translate | read | null |
| 2026-01-22 | Efficiently Learning Robust Torque-based Locomotion Through Reinforcement with Model-Based Supervision | Yashuai Yan et.al. | 2601.16109 | translate | read | null |
| 2026-01-22 | SAMTok: Representing Any Mask with Two Words | Yikang Zhou et.al. | 2601.16093 | translate | read | link |
| 2026-01-22 | Dynamic Tactile Sensing System and Soft Actor Critic Reinforcement Learning for Inclusion Characterization | John Bannan et.al. | 2601.16061 | translate | read | null |
| 2026-01-22 | Keyframe-Based Feed-Forward Visual Odometry | Weichen Dai et.al. | 2601.16020 | translate | read | null |
| 2026-01-22 | PUMA: Perception-driven Unified Foothold Prior for Mobility Augmented Quadruped Parkour | Liang Wang et.al. | 2601.15995 | translate | read | null |
| 2026-01-22 | Decoupling Return-to-Go for Efficient Decision Transformer | Yongyi Wang et.al. | 2601.15953 | translate | read | null |
| 2026-01-22 | Off-Policy Actor-Critic with Sigmoid-Bounded Entropy for Real-World Robot Learning | Xiefeng Wu et.al. | 2601.15761 | translate | read | null |
| 2026-01-22 | PhysProver: Advancing Automatic Theorem Proving for Physics | Hanning Zhang et.al. | 2601.15737 | translate | read | null |
| 2026-01-22 | Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind | Zhitao He et.al. | 2601.15715 | translate | read | null |
| 2026-01-22 | D-Optimality-Guided Reinforcement Learning for Efficient Open-Loop Calibration of a 3-DOF Ankle Rehabilitation Robot | Qifan Hu et.al. | 2601.15707 | translate | read | null |
| 2026-01-22 | From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models | Jiaxin Zhang et.al. | 2601.15690 | translate | read | null |
| 2026-01-22 | Performance-guided Reinforced Active Learning for Object Detection | Zhixuan Liang et.al. | 2601.15688 | translate | read | null |
| 2026-01-22 | EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning | Dingdong Wang et.al. | 2601.15668 | translate | read | null |
| 2026-01-22 | Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors | Zhiwei Zhang et.al. | 2601.15625 | translate | read | null |
| 2026-01-22 | Explainable Deepfake Detection with RL Enhanced Self-Blended Images | Ning Jiang et.al. | 2601.15624 | translate | read | null |
| 2026-01-22 | AION: Aerial Indoor Object-Goal Navigation Using Dual-Policy Reinforcement Learning | Zichen Yan et.al. | 2601.15614 | translate | read | null |
| 2026-01-22 | When Sharpening Becomes Collapse: Sampling Bias and Semantic Coupling in RL with Verifiable Rewards | Mingyuan Fan et.al. | 2601.15609 | translate | read | null |
| 2026-01-22 | A Mobile Magnetic Manipulation Platform for Gastrointestinal Navigation with Deep Reinforcement Learning Control | Zhifan Yan et.al. | 2601.15545 | translate | read | null |
| 2026-01-21 | Non-Stationary Functional Bilevel Optimization | Jason Bohne et.al. | 2601.15363 | translate | read | null |
| 2026-01-21 | Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing | Xiang Li et.al. | 2601.15356 | translate | read | null |
| 2026-01-21 | Statistical Reinforcement Learning in the Real World: A Survey of Challenges and Future Directions | Asim H. Gazi et.al. | 2601.15353 | translate | read | null |
| 2026-01-20 | ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation | Zhebo Wang et.al. | 2601.15330 | translate | read | null |
| 2026-01-21 | The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models | Zanlin Ni et.al. | 2601.15165 | translate | read | link |
| 2026-01-21 | Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning | Yuval Kansal et.al. | 2601.15160 | translate | read | null |
| 2026-01-21 | Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data | Yuval Ran-Milo et.al. | 2601.15158 | translate | read | null |
| 2026-01-21 | CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning | Tianshi Xu et.al. | 2601.15141 | translate | read | null |
| 2026-01-21 | Vehicle Routing with Finite Time Horizon using Deep Reinforcement Learning with Improved Network Embedding | Ayan Maity et.al. | 2601.15131 | translate | read | null |
| 2026-01-21 | Memory Retention Is Not Enough to Master Memory Tasks in Reinforcement Learning | Oleg Shchendrigin et.al. | 2601.15086 | translate | read | null |
| 2026-01-21 | A Curriculum-Based Deep Reinforcement Learning Framework for the Electric Vehicle Routing Problem | Mertcan Daysalilar et.al. | 2601.15038 | translate | read | null |
| 2026-01-21 | Plug-and-Play Benchmarking of Reinforcement Learning Algorithms for Large-Scale Flow Control | Jannis Becktepe et.al. | 2601.15015 | translate | read | null |
| 2026-01-21 | Improving Regret Approximation for Unsupervised Dynamic Environment Generation | Harry Mead et.al. | 2601.14957 | translate | read | null |
| 2026-01-21 | Language-Coupled Reinforcement Learning for Multilingual Retrieval-Augmented Generation | Rui Qi et.al. | 2601.14896 | translate | read | null |
| 2026-01-21 | What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study | Keyu Lv et.al. | 2601.14888 | translate | read | null |
| 2026-01-21 | CI4A: Semantic Component Interfaces for Agents Empowering Web Automation | Zhi Qiu et.al. | 2601.14790 | translate | read | null |
| 2026-01-21 | ReinPath: A Multimodal Reinforcement Learning Approach for Pathology | Kangcheng Zhou et.al. | 2601.14757 | translate | read | null |
| 2026-01-21 | PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning | Yao Lu et.al. | 2601.14716 | translate | read | null |
| 2026-01-21 | DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs | Mingxuan Song et.al. | 2601.14711 | translate | read | null |
| 2026-01-21 | Case-Guided Sequential Assay Planning in Drug Discovery | Tianchi Chen et.al. | 2601.14710 | translate | read | null |
| 2026-01-21 | Proximal Policy Optimization with Evolutionary Mutations | Casimir Czworkowski et.al. | 2601.14705 | translate | read | null |
| 2026-01-21 | DARL: Encouraging Diverse Answers for General Reasoning without Verifiers | Chongxuan Huang et.al. | 2601.14700 | translate | read | null |
| 2026-01-21 | CoScale-RL: Efficient Post-Training by Co-Scaling Data and Computation | Yutong Chen et.al. | 2601.14695 | translate | read | null |
| 2026-01-21 | Beyond Error-Based Optimization: Experience-Driven Symbolic Regression with Goal-Conditioned Reinforcement Learning | Jianwen Sun et.al. | 2601.14693 | translate | read | null |
| 2026-01-21 | FARE: Fast-Slow Agentic Robotic Exploration | Shuhao Liao et.al. | 2601.14681 | translate | read | null |
| 2026-01-21 | MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks | Zixuan Ke et.al. | 2601.14652 | translate | read | null |
| 2026-01-21 | SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation | Xichen Zhang et.al. | 2601.14615 | translate | read | null |
| 2026-01-21 | Learning Consistent Taxonomic Classification through Hierarchical Reasoning | Zhenghong Li et.al. | 2601.14610 | translate | read | null |
| 2026-01-21 | Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education | Unggi Lee et.al. | 2601.14560 | translate | read | null |
| 2026-01-20 | Report for NSF Workshop on AI for Electronic Design Automation | Deming Chen et.al. | 2601.14541 | translate | read | null |
| 2026-01-20 | Towards Execution-Grounded Automated AI Research | Chenglei Si et.al. | 2601.14525 | translate | read | link |
| 2026-01-20 | Large Language Model-Powered Evolutionary Code Optimization on a Phylogenetic Tree | Leyi Zhao et.al. | 2601.14523 | translate | read | null |
| 2026-01-20 | Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow | Haocheng Xi et.al. | 2601.14243 | translate | read | null |
| 2026-01-20 | Spatiotemporal Wildfire Prediction and Reinforcement Learning for Helitack Suppression | Shaurya Mathur et.al. | 2601.14238 | translate | read | null |
| 2026-01-20 | Q-learning with Adjoint Matching | Qiyang Li et.al. | 2601.14234 | translate | read | link |
| 2026-01-20 | KAGE-Bench: Fast Known-Axis Visual Generalization Evaluation for Reinforcement Learning | Egor Cherepanov et.al. | 2601.14232 | translate | read | link |
| 2026-01-20 | Attention-Based Offline Reinforcement Learning and Clustering for Interpretable Sepsis Treatment | Punit Kumar et.al. | 2601.14228 | translate | read | null |
| 2026-01-20 | InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning | Matthew Y. R. Yang et.al. | 2601.14209 | translate | read | null |
| 2026-01-20 | Differentiated Pickup Point Offering for Emission Reduction in Last-Mile Delivery | Albina Galiullina et.al. | 2601.14196 | translate | read | null |
| 2026-01-20 | Toward Efficient Agents: Memory, Tool learning, and Planning | Xiaofang Yang et.al. | 2601.14192 | translate | read | link |
| 2026-01-20 | CREATE: Cross-Layer Resilience Characterization and Optimization for Efficient yet Reliable Embodied AI Systems | Tong Xie et.al. | 2601.14140 | translate | read | null |
| 2026-01-20 | Diffusion-Guided Backdoor Attacks in Real-World Reinforcement Learning | Tairan Huang et.al. | 2601.14104 | translate | read | null |
| 2026-01-20 | Optimizing Energy and Data Collection in UAV-aided IoT Networks using Attention-based Multi-Objective Reinforcement Learning | Babacar Toure et.al. | 2601.14092 | translate | read | null |
| 2026-01-20 | RM-Distiller: Exploiting Generative LLM for Reward Model Distillation | Hongli Zhou et.al. | 2601.14032 | translate | read | null |
| 2026-01-20 | RL-BioAug: Label-Efficient Reinforcement Learning for Self-Supervised EEG Representation Learning | Cheol-Hui Lee et.al. | 2601.13964 | translate | read | null |
| 2026-01-20 | Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning | Hongbo Bai et.al. | 2601.13942 | translate | read | null |
| 2026-01-20 | Deep Reinforcement Learning-Based Dynamic Resource Allocation in Cell-Free Massive MIMO | Phuong Nam Tran et.al. | 2601.13934 | translate | read | null |
| 2026-01-20 | HyperWalker: Dynamic Hypergraph-Based Deep Diagnosis for Multi-Hop Clinical Modeling across EHR and X-Ray in Medical VLMs | Yuezhe Yang et.al. | 2601.13919 | translate | read | null |
| 2026-01-20 | TractRLFusion: A GPT-Based Multi-Critic Policy Fusion Framework for Fiber Tractography | Ankita Joshi et.al. | 2601.13897 | translate | read | null |
| 2026-01-20 | Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering | Chak Tou Leong et.al. | 2601.13752 | translate | read | null |
| 2026-01-20 | Dr. Assistant: Enhancing Clinical Diagnostic Inquiry via Structured Diagnostic Reasoning Data and Reinforcement Learning | Yue Guo et.al. | 2601.13690 | translate | read | null |
| 2026-01-20 | Reinforcement Learning for Opportunistic Routing in Software-Defined LEO-Terrestrial Systems | Sivaram Krishnan et.al. | 2601.13662 | translate | read | null |
| 2026-01-20 | Communication-Free Collective Navigation for a Swarm of UAVs via LiDAR-Based Deep Reinforcement Learning | Myong-Yol Choi et.al. | 2601.13657 | translate | read | null |
| 2026-01-20 | Sample Complexity of Average-Reward Q-Learning: From Single-agent to Federated Reinforcement Learning | Yuchen Jiao et.al. | 2601.13642 | translate | read | null |
| 2026-01-20 | A Kubernetes custom scheduler based on reinforcement learning for compute-intensive pods | Hanlin Zhou et.al. | 2601.13579 | translate | read | null |
| 2026-01-20 | Behavior Knowledge Merge in Reinforced Agentic Models | Xiangchi Yuan et.al. | 2601.13572 | translate | read | link |
| 2026-01-20 | Reasoning While Recommending: Entropy-Guided Latent Reasoning in Generative Re-ranking Models | Changshuo Zhang et.al. | 2601.13533 | translate | read | null |
| 2026-01-20 | Group Relative Policy Optimization for Robust Blind Interference Alignment with Fluid Antennas | Jianqiu Peng et.al. | 2601.13506 | translate | read | null |
| 2026-01-19 | RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models | Bo Ren et.al. | 2601.13409 | translate | read | null |
| 2026-01-19 | Balancing Classification and Calibration Performance in Decision-Making LLMs via Calibration Aware Reinforcement Learning | Duygu Nur Yaldiz et.al. | 2601.13284 | translate | read | null |
| 2026-01-19 | CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning | Eric Onyame et.al. | 2601.13262 | translate | read | link |
| 2026-01-19 | Autonomous Navigation at the Nano-Scale: Algorithms, Architectures, and Constraints | Mahmud S. Zango et.al. | 2601.13252 | translate | read | null |
| 2026-01-19 | Training instability in deep learning follows low-dimensional dynamical principles | Zhipeng Zhang et.al. | 2601.13160 | translate | read | null |
| 2026-01-19 | Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning | Fengran Mo et.al. | 2601.13115 | translate | read | null |
| 2026-01-19 | Static Is Not Enough: A Comparative Study of VR and SpaceMouse in Static and Dynamic Teleoperation Tasks | Yijun Zhou et.al. | 2601.13042 | translate | read | null |
| 2026-01-19 | Feedforward-Feedback Integration in Flight Control: Reinforcement Learning with Sliding Mode Control | Imran Sayyed et.al. | 2601.13037 | translate | read | null |
| 2026-01-19 | Think3D: Thinking with Space for Spatial Reasoning | Zaibin Zhang et.al. | 2601.13029 | translate | read | link |
| 2026-01-19 | Graph Reasoning Paradigm: Structured and Symbolic Reasoning with Topology-Aware Reinforcement Learning for Large Language Models | Runxuan Liu et.al. | 2601.12995 | translate | read | null |
| 2026-01-19 | PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient | Zijian Wang et.al. | 2601.12988 | translate | read | null |
| 2026-01-19 | Imitation learning-based spacecraft rendezvous and docking method with Expert Demonstration | Shibo Shao et.al. | 2601.12952 | translate | read | null |
| 2026-01-19 | Communication Methods in Multi-Agent Reinforcement Learning | Christoph Wittner et.al. | 2601.12886 | translate | read | null |
| 2026-01-19 | FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions | Peng Li et.al. | 2601.12799 | translate | read | link |
| 2026-01-19 | Unleashing Efficient Asynchronous RL Post-Training via Staleness-Constrained Rollout Coordination | Haoyang Li et.al. | 2601.12784 | translate | read | null |
| 2026-01-19 | SDN-Blockchain Based Security Routing for UAV Communication via Reinforcement Learning | Yulu Han et.al. | 2601.12774 | translate | read | null |
| 2026-01-19 | Teaching LLMs to Learn Tool Trialing and Execution through Environment Interaction | Xingjie Gao et.al. | 2601.12762 | translate | read | link |
| 2026-01-19 | Distribution-Centric Policy Optimization Dominates Exploration-Exploitation Trade-off | Zhaochun Li et.al. | 2601.12730 | translate | read | link |
| 2026-01-19 | Teaching Large Reasoning Models Effective Reflection | Hanbin Wang et.al. | 2601.12720 | translate | read | null |
| 2026-01-19 | Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization | Junyi Liao et.al. | 2601.12707 | translate | read | null |
| 2026-01-19 | Resource-Conscious RL Algorithms for Deep Brain Stimulation | Arkaprava Gupta et.al. | 2601.12699 | translate | read | null |
| 2026-01-19 | Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks | Xingran Chen et.al. | 2601.12662 | translate | read | null |
| 2026-01-19 | Two-Layer Reinforcement Learning-Assisted Joint Beamforming and Trajectory Optimization for Multi-UAV Downlink Communications | Ruiqi Wang et.al. | 2601.12659 | translate | read | null |
| 2026-01-19 | Multiagent Reinforcement Learning in Enhancing Resilience of Microgrids under Extreme Weather Events | Yin Wu et.al. | 2601.12657 | translate | read | null |
| 2026-01-19 | STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models | Xiangyu Shi et.al. | 2601.12641 | translate | read | null |
| 2026-01-16 | Do explanations generalize across large reasoning models? | Koyena Pal et.al. | 2601.11517 | translate | read | null |
| 2026-01-16 | Generative Scenario Rollouts for End-to-End Autonomous Driving | Rajeev Yasarla et.al. | 2601.11475 | translate | read | null |
| 2026-01-16 | The Great March 100: 100 Detail-oriented Tasks for Evaluating Embodied AI Agents | Ziyu Wang et.al. | 2601.11421 | translate | read | null |
| 2026-01-16 | Factored Value Functions for Graph-Based Multi-Agent Reinforcement Learning | Ahmed Rashwan et.al. | 2601.11401 | translate | read | null |
| 2026-01-16 | The Mini Wheelbot Dataset: High-Fidelity Data for Robot Learning | Henrik Hose et.al. | 2601.11394 | translate | read | null |
| 2026-01-16 | Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency | Akhilesh Raj et.al. | 2601.11352 | translate | read | null |
| 2026-01-16 | Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation | Pingzhi Tang et.al. | 2601.11258 | translate | read | null |
| 2026-01-16 | Model-free policy gradient for discrete-time mean-field control | Matthieu Meunier et.al. | 2601.11217 | translate | read | null |
| 2026-01-16 | Policy-Based Deep Reinforcement Learning Hyperheuristics for Job-Shop Scheduling Problems | Sofiene Lassoued et.al. | 2601.11189 | translate | read | null |
| 2026-01-16 | TANDEM: Temporal-Aware Neural Detection for Multimodal Hate Speech | Girish A. Koushik et.al. | 2601.11178 | translate | read | null |
| 2026-01-16 | Deep GraphRAG: A Balanced Approach to Hierarchical Retrieval and Adaptive Integration | Yuejie Li et.al. | 2601.11144 | translate | read | null |
| 2026-01-16 | Learning Quadrupedal Locomotion for a Heavy Hydraulic Robot Using an Actuator Model | Minho Lee et.al. | 2601.11143 | translate | read | null |
| 2026-01-16 | PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models | Qiyuan Zhang et.al. | 2601.11087 | translate | read | null |
| 2026-01-16 | Visual Marker Search for Autonomous Drone Landing in Diverse Urban Environments | Jiaohong Yao et.al. | 2601.11078 | translate | read | null |
| 2026-01-16 | Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs | Lecheng Yan et.al. | 2601.11061 | translate | read | null |
| 2026-01-16 | BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search | Shiyu Liu et.al. | 2601.11037 | translate | read | link |
| 2026-01-16 | Toward Adaptive Grid Resilience: A Gradient-Free Meta-RL Framework for Critical Load Restoration | Zain ul Abdeen et.al. | 2601.10973 | translate | read | null |
| 2026-01-16 | MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement | Meidan Ding et.al. | 2601.10949 | translate | read | null |
| 2026-01-16 | Where to Touch, How to Contact: Hierarchical RL-MPC Framework for Geometry-Aware Long-Horizon Dexterous Manipulation | Zhixian Xie et.al. | 2601.10930 | translate | read | null |
| 2026-01-15 | Realistic Curriculum Reinforcement Learning for Autonomous and Sustainable Marine Vessel Navigation | Zhang Xiaocai et.al. | 2601.10911 | translate | read | null |
| 2026-01-15 | Action Shapley: A Training Data Selection Metric for World Model in Reinforcement Learning | Rajat Ghosh et.al. | 2601.10905 | translate | read | null |
| 2026-01-15 | Reasoning Models Generate Societies of Thought | Junsol Kim et.al. | 2601.10825 | translate | read | null |
| 2026-01-11 | Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration | Sen Wang et.al. | 2601.10744 | translate | read | null |
| 2026-01-15 | MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching | Changle Qu et.al. | 2601.10712 | translate | read | null |
| 2026-01-15 | Institutional AI: A Governance Framework for Distributional AGI Safety | Federico Pierucci et.al. | 2601.10599 | translate | read | null |
| 2026-01-15 | Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay | Hao Wang et.al. | 2601.10589 | translate | read | null |
| 2026-01-15 | Combinatorial Optimization Augmented Machine Learning | Maximilian Schiffer et.al. | 2601.10583 | translate | read | null |
| 2026-01-15 | PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models | Chengbing Wang et.al. | 2601.10532 | translate | read | null |
| 2026-01-15 | Projected Microbatch Accumulation yields reference-free proximal policy updates for reinforcement learning | Nilin Abrahamsen et.al. | 2601.10498 | translate | read | null |
| 2026-01-15 | Urban Socio-Semantic Segmentation with Vision-Language Reasoning | Yu Wang et.al. | 2601.10477 | translate | read | null |
| 2026-01-15 | Reinforcement Learning with Multi-Step Lookahead Information Via Adaptive Batching | Nadav Merlis et.al. | 2601.10418 | translate | read | null |
| 2026-01-15 | CS-GBA: A Critical Sample-based Gradient-guided Backdoor Attack for Offline Reinforcement Learning | Yuanjie Zhao et.al. | 2601.10407 | translate | read | null |
| 2026-01-15 | Advanced Manufacturing with Renewable and Bio-based Materials: AI/ML workflows and Process Optimization | Rigoberto Advincula et.al. | 2601.10382 | translate | read | null |
| 2026-01-15 | FastStair: Learning to Run Up Stairs with Humanoid Robots | Yan Liu et.al. | 2601.10365 | translate | read | null |
| 2026-01-15 | SuS: Strategy-aware Surprise for Intrinsic Exploration | Mark Kashirskiy et.al. | 2601.10349 | translate | read | null |
| 2026-01-15 | Boundary-Aware NL2SQL: Integrating Reliability through Hybrid Reward and Data Synthesis | Songsong Tian et.al. | 2601.10318 | translate | read | null |
| 2026-01-15 | Evidence-Augmented Policy Optimization with Reward Co-Evolution for Long-Context Reasoning | Xin Guan et.al. | 2601.10306 | translate | read | null |
| 2026-01-15 | The impact of tactile sensor configurations on grasp learning efficiency – a comparative evaluation in simulation | Eszter Birtalan et.al. | 2601.10268 | translate | read | null |
| 2026-01-15 | PRL: Process Reward Learning Improves LLMs’ Reasoning Ability and Broadens the Reasoning Boundary | Jiarui Yao et.al. | 2601.10201 | translate | read | null |
| 2026-01-15 | HOMURA: Taming the Sand-Glass for Time-Constrained LLM Translation via Reinforcement Learning | Ziang Cui et.al. | 2601.10187 | translate | read | null |
| 2026-01-15 | Reinforcement Learning to Discover a NorthEast Monsoon Index for Monthly Rainfall Prediction in Thailand | Kiattikun Chobtham et.al. | 2601.10181 | translate | read | null |
| 2026-01-15 | Service Provisioning and Path Planning with Obstacle Avoidance for Low-Altitude Wireless Networks | Senning Wan et.al. | 2601.10179 | translate | read | null |
| 2026-01-15 | ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback | Yutao Mou et.al. | 2601.10156 | translate | read | null |
| 2026-01-15 | DecisionLLM: Large Language Models for Long Sequence Decision Exploration | Xiaowei Lv et.al. | 2601.10148 | translate | read | null |
| 2026-01-15 | History Is Not Enough: An Adaptive Dataflow System for Financial Time-Series Synthesis | Haochong Xia et.al. | 2601.10143 | translate | read | null |
| 2026-01-15 | Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts | Sijia Luo et.al. | 2601.10079 | translate | read | null |
| 2026-01-15 | Event-Driven Deep RL Dispatcher for Post-Storm Distribution System Restoration | Farshad Amani et.al. | 2601.10044 | translate | read | null |
| 2026-01-15 | PaperScout: An Autonomous Agent for Academic Paper Search with Process-Aware Sequence-Level Policy Optimization | Tingyue Pan et.al. | 2601.10029 | translate | read | null |
| 2026-01-15 | Towards Native Intelligence: 6G-LLM Trained with Reinforcement Learning from NDT Feedback | Zhuoran Xiao et.al. | 2601.09992 | translate | read | null |
| 2026-01-14 | OUTLINEFORGE: Hierarchical Reinforcement Learning with Explicit States for Scientific Writing | Yilin Bao et.al. | 2601.09858 | translate | read | null |
| 2026-01-14 | Eluder dimension: localise it! | Alireza Bakhtiari et.al. | 2601.09825 | translate | read | null |
| 2026-01-14 | GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents | Chen Chen et.al. | 2601.09770 | translate | read | null |
| 2026-01-14 | STEP3-VL-10B Technical Report | Ailin Huang et.al. | 2601.09668 | translate | read | null |
| 2026-01-14 | Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning | Zhiyuan Hu et.al. | 2601.09667 | translate | read | null |
| 2026-01-14 | DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing | Qian Cao et.al. | 2601.09609 | translate | read | null |
| 2026-01-14 | Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets | Jeremiah Coholich et.al. | 2601.09605 | translate | read | null |
| 2026-01-14 | Dialogue Telemetry: Turn-Level Instrumentation for Autonomous Information Gathering | Dimitris Panagopoulos et.al. | 2601.09570 | translate | read | null |
| 2026-01-14 | Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations | Wei-Jin Huang et.al. | 2601.09518 | translate | read | null |
| 2026-01-14 | Data Scaling for Navigation in Unknown Environments | Lauri Suomela et.al. | 2601.09444 | translate | read | null |
| 2026-01-14 | Draw it like Euclid: Teaching transformer models to generate CAD profiles using ruler and compass construction steps | Siyi Li et.al. | 2601.09428 | translate | read | null |
| 2026-01-14 | Semi-Contention-Free Access in IoT NOMA Networks: A Reinforcement Learning Framework | Abhishek Kumar et.al. | 2601.09422 | translate | read | null |
| 2026-01-14 | GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR | Jiaying Zhang et.al. | 2601.09361 | translate | read | null |
| 2026-01-14 | Monte-Carlo Tree Search with Neural Network Guidance for Lane-Free Autonomous Driving | Ioannis Peridis et.al. | 2601.09353 | translate | read | null |
| 2026-01-14 | Policy-Based Reinforcement Learning with Action Masking for Dynamic Job Shop Scheduling under Uncertainty: Handling Random Arrivals and Machine Failures | Sofiene Lassoued et.al. | 2601.09293 | translate | read | null |
| 2026-01-14 | Enhancing Spatial Reasoning in Large Language Models for Metal-Organic Frameworks Structure Prediction | Mianzhi Pan et.al. | 2601.09285 | translate | read | null |
| 2026-01-14 | RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering | Wencheng Ye et.al. | 2601.09269 | translate | read | link |
| 2026-01-14 | Learning to Trust Experience: A Monitor-Trust-Regulator Framework for Learning under Unobservable Feedback Reliability | Zhipeng Zhang et.al. | 2601.09261 | translate | read | null |
| 2026-01-14 | Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models | Yan Liu et.al. | 2601.09260 | translate | read | null |
| 2026-01-14 | Reward Learning through Ranking Mean Squared Error | Chaitanya Kharyal et.al. | 2601.09236 | translate | read | null |
| 2026-01-14 | GIFT: Unlocking Global Optimality in Post-Training via Finite-Temperature Gibbs Initialization | Zhengyang Zhao et.al. | 2601.09233 | translate | read | null |
| 2026-01-14 | UserLM-R1: Modeling Human Reasoning in User Language Models with Multi-Reward Reinforcement Learning | Feng Zhang et.al. | 2601.09215 | translate | read | null |
| 2026-01-14 | SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL | Lijun Liu et.al. | 2601.09136 | translate | read | null |
| 2026-01-14 | SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache | Chi-Chih Chang et.al. | 2601.09083 | translate | read | null |
| 2026-01-13 | TranslateGemma Technical Report | Mara Finkelstein et.al. | 2601.09012 | translate | read | null |
| 2026-01-13 | Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge | Yao Tang et.al. | 2601.08808 | translate | read | null |
| 2026-01-13 | Identifying Latent Intentions via Inverse Reinforcement Learning in Repeated Linear Public Good Games | Carina I. Hausladen et.al. | 2601.08803 | translate | read | null |
| 2026-01-13 | Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs | Zhiyuan Hu et.al. | 2601.08763 | translate | read | null |
| 2026-01-13 | TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback | Prithwish Jana et.al. | 2601.08734 | translate | read | null |
| 2026-01-13 | Learning from Demonstrations via Capability-Aware Goal Sampling | Yuanlin Duan et.al. | 2601.08731 | translate | read | null |
| 2026-01-13 | Model-Agnostic Solutions for Deep Reinforcement Learning in Non-Ergodic Contexts | Bert Verbruggen et.al. | 2601.08726 | translate | read | null |
| 2026-01-13 | QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models | Zhaolu Kang et.al. | 2601.08689 | translate | read | null |
| 2026-01-13 | PersonaDual: Balancing Personalization and Objectivity via Adaptive Reasoning | Xiaoyou Liu et.al. | 2601.08679 | translate | read | null |
| 2026-01-13 | VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory | Shaoan Wang et.al. | 2601.08665 | translate | read | null |
| 2026-01-13 | From Classical to Quantum Reinforcement Learning and Its Applications in Quantum Control: A Beginner’s Tutorial | Abhijit Sen et.al. | 2601.08662 | translate | read | null |
| 2026-01-13 | Provably Safe Reinforcement Learning for Stochastic Reach-Avoid Problems with Entropy Regularization | Abhijit Mazumdar et.al. | 2601.08646 | translate | read | null |
| 2026-01-13 | Your Group-Relative Advantage Is Biased | Fengkai Yang et.al. | 2601.08521 | translate | read | null |
| 2026-01-13 | AUV Trajectory Learning for Underwater Acoustic Energy Transfer and Age Minimization | Mohamed Afouene Melki et.al. | 2601.08491 | translate | read | null |
| 2026-01-13 | AME-2: Agile and Generalized Legged Locomotion via Attention-Based Neural Map Encoding | Chong Zhang et.al. | 2601.08485 | translate | read | null |
| 2026-01-13 | Baiting AI: Deceptive Adversary Against AI-Protected Industrial Infrastructures | Aryan Pasikhani et.al. | 2601.08481 | translate | read | null |
| 2026-01-13 | JudgeRLVR: Judge First, Generate Second for Efficient Reasoning | Jiangshan Duo et.al. | 2601.08468 | translate | read | null |
| 2026-01-13 | Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis | Yi Qin et.al. | 2601.08440 | translate | read | null |
| 2026-01-13 | Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management | Weitao Ma et.al. | 2601.08435 | translate | read | null |
| 2026-01-13 | Large Multimodal Models for Embodied Intelligent Driving: The Next Frontier in Self-Driving? | Long Zhang et.al. | 2601.08434 | translate | read | null |
| 2026-01-13 | RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation | Sunzhu Li et.al. | 2601.08430 | translate | read | null |
| 2026-01-13 | Silence the Judge: Reinforcement Learning with Self-Verifier via Latent Geometric Clustering | Nonghai Zhang et.al. | 2601.08427 | translate | read | null |
| 2026-01-13 | Owen-Shapley Policy Optimization (OSPO): A Principled RL Algorithm for Generative Search LLMs | Abhijnan Nath et.al. | 2601.08403 | translate | read | null |
| 2026-01-13 | Safe Heterogeneous Multi-Agent RL with Communication Regularization for Coordinated Target Acquisition | Gabriele Calzolari et.al. | 2601.08327 | translate | read | null |
| 2026-01-13 | AtomMem : Learnable Dynamic Agentic Memory with Atomic Memory Operation | Yupeng Huo et.al. | 2601.08323 | translate | read | null |
| 2026-01-13 | ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning | Kun Liang et.al. | 2601.08310 | translate | read | null |
| 2026-01-13 | D$^2$Plan: Dual-Agent Dynamic Global Planning for Complex Retrieval-Augmented Reasoning | Kangcheng Luo et.al. | 2601.08282 | translate | read | null |
| 2026-01-13 | Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees | Kun Li et.al. | 2601.08274 | translate | read | null |
| 2026-01-13 | Unleashing Tool Engineering and Intelligence for Agentic AI in Next-Generation Communication Networks | Yinqiu Liu et.al. | 2601.08259 | translate | read | null |
| 2026-01-13 | Large Artificial Intelligence Model Guided Deep Reinforcement Learning for Resource Allocation in Non-Terrestrial Networks | Abdikarim Mohamed Ibrahim et.al. | 2601.08254 | translate | read | null |
| 2026-01-13 | Incorporating Cognitive Biases into Reinforcement Learning for Financial Decision-Making | Liu He et.al. | 2601.08247 | translate | read | null |
| 2026-01-13 | The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination | Haoran Su et.al. | 2601.08237 | translate | read | null |
| 2026-01-13 | Scalable Multiagent Reinforcement Learning with Collective Influence Estimation | Zhenglong Luo et.al. | 2601.08210 | translate | read | null |
| 2026-01-13 | ZeroDVFS: Zero-Shot LLM-Guided Core and Frequency Allocation for Embedded Platforms | Mohammad Pivezhandi et.al. | 2601.08166 | translate | read | null |
| 2026-01-13 | Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies | Zeyang Li et.al. | 2601.08136 | translate | read | null |
| 2026-01-13 | Structure Detection for Contextual Reinforcement Learning | Tianyue Zhou et.al. | 2601.08120 | translate | read | null |
| 2026-01-13 | STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order | Chengyang Gu et.al. | 2601.08107 | translate | read | null |
| 2026-01-12 | DRL-based Power Allocation in LiDAL-Assisted RLNC-NOMA OWC Systems | Ahmed A. Hassan et.al. | 2601.08060 | translate | read | null |
| 2026-01-12 | Forecast Aware Deep Reinforcement Learning for Efficient Electricity Load Scheduling in Dairy Farms | Nawazish Alia et.al. | 2601.08052 | translate | read | null |
| 2026-01-12 | Formalizing the Relationship between Hamilton-Jacobi Reachability and Reinforcement Learning | Prashant Solanki et.al. | 2601.08050 | translate | read | null |
| 2026-01-12 | FigEx2: Visual-Conditioned Panel Detection and Captioning for Scientific Compound Figures | Jifeng Song et.al. | 2601.08026 | translate | read | null |
| 2026-01-12 | Learning Better Error Correction Codes with Hybrid Quantum-Assisted Machine Learning | Yariv Yanay et.al. | 2601.08014 | translate | read | null |
| 2026-01-12 | Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety | Can Jin et.al. | 2601.08000 | translate | read | null |
| 2026-01-12 | Reinforcement Learning Methods for Neighborhood Selection in Local Search | Yannick Molinghen et.al. | 2601.07948 | translate | read | null |
| 2026-01-12 | Video Generation Models in Robotics – Applications, Research Challenges, Future Directions | Zhiting Mei et.al. | 2601.07823 | translate | read | null |
| 2026-01-12 | Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation | Huanyu Li et.al. | 2601.07821 | translate | read | null |
| 2026-01-12 | Data-driven control of hydraulic impact hammers under strict operational and control constraints | Francisco Leiva et.al. | 2601.07813 | translate | read | null |
| 2026-01-12 | Beyond Single-Shot: Multi-step Tool Retrieval via Query Planning | Wei Fang et.al. | 2601.07782 | translate | read | null |
| 2026-01-12 | Video Evidence to Reasoning: Efficient Video Understanding via Explicit Evidence Grounding | Yanxiang Huang et.al. | 2601.07761 | translate | read | null |
| 2026-01-12 | Hiking in the Wild: A Scalable Perceptive Parkour Framework for Humanoids | Shaoting Zhu et.al. | 2601.07718 | translate | read | null |
| 2026-01-12 | Smooth Operator: Smooth Verifiable Reward Activates Spatial Reasoning Ability of Vision-Language Model | Siwen Jiao et.al. | 2601.07695 | translate | read | null |
| 2026-01-12 | Reinforcement Learning for Micro-Level Claims Reserving | Benjamin Avanzi et.al. | 2601.07637 | translate | read | null |
| 2026-01-12 | Clipped Affine Policy: Low-Complexity Near-Optimal Online Power Control for Energy Harvesting Communications over Fading Channels | Hao Wu et.al. | 2601.07622 | translate | read | null |
| 2026-01-12 | GRPO with State Mutations: Improving LLM-Based Hardware Test Plan Generation | Dimple Vijay Kochar et.al. | 2601.07593 | translate | read | null |
| 2026-01-12 | Large Language Models for Physics Instrument Design | Sara Zoccheddu et.al. | 2601.07580 | translate | read | null |
| 2026-01-12 | Stagewise Reinforcement Learning and the Geometry of the Regret Landscape | Chris Elliott et.al. | 2601.07524 | translate | read | null |
| 2026-01-12 | Controlling Multimodal Conversational Agents with Coverage-Enhanced Latent Actions | Yongqi Li et.al. | 2601.07516 | translate | read | null |
| 2026-01-12 | Graph Inference Towards ICD Coding | Xiaoxiao Deng et.al. | 2601.07496 | translate | read | null |
| 2026-01-12 | Online Markov Decision Processes with Terminal Law Constraints | Bianca Marin Moreno et.al. | 2601.07492 | translate | read | null |
| 2026-01-12 | Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning | Sijia Li et.al. | 2601.07463 | translate | read | null |
| 2026-01-12 | LOONG: Online Time-Optimal Autonomous Flight for MAVs in Cluttered Environments | Xin Guan et.al. | 2601.07434 | translate | read | null |
| 2026-01-12 | Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning | Ziheng Li et.al. | 2601.07408 | translate | read | null |
| 2026-01-12 | On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training | Xueyan Niu et.al. | 2601.07389 | translate | read | null |
| 2026-01-12 | OpenTinker: Separating Concerns in Agentic Reinforcement Learning | Siqi Zhu et.al. | 2601.07376 | translate | read | link |
| 2026-01-12 | Reward Modeling from Natural Language Human Feedback | Zongqi Wang et.al. | 2601.07349 | translate | read | null |
| 2026-01-12 | Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training | Xue Gong et.al. | 2601.07320 | translate | read | null |
| 2026-01-12 | Low-Altitude Satellite-AAV Collaborative Joint Mobile Edge Computing and Data Collection via Diffusion-based Deep Reinforcement Learning | Boxiong Wang et.al. | 2601.07307 | translate | read | null |
| 2026-01-12 | Heterogeneous Multi-Expert Reinforcement Learning for Long-Horizon Multi-Goal Tasks in Autonomous Forklifts | Yun Chen et.al. | 2601.07304 | translate | read | null |
| 2026-01-12 | Mimic Human Cognition, Master Multi-Image Reasoning: A Meta-Action Framework for Enhanced Visual Understanding | Jianghao Yin et.al. | 2601.07298 | translate | read | null |
| 2026-01-12 | LRAS: Advanced Legal Reasoning with Agentic Search | Yujin Zhou et.al. | 2601.07296 | translate | read | null |
| 2026-01-12 | ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios | Changzai Pan et.al. | 2601.07280 | translate | read | null |
| 2026-01-12 | The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents | Weihao Xuan et.al. | 2601.07264 | translate | read | null |
| 2026-01-12 | Group Pattern Selection Optimization: Let LRMs Pick the Right Pattern for Reasoning | Hanbin Wang et.al. | 2601.07238 | translate | read | null |
| 2026-01-12 | Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration | Yang Zhao et.al. | 2601.07224 | translate | read | null |
| 2026-01-12 | Structured Reasoning for Large Language Models | Jinyi Han et.al. | 2601.07180 | translate | read | null |
| 2026-01-12 | Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization | Min Wang et.al. | 2601.07164 | translate | read | null |
| 2026-01-12 | AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units | Xinzi Cao et.al. | 2601.07160 | translate | read | null |
| 2026-01-12 | Agents of Diffusion: Enhancing Diffusion Language Models with Multi-Agent Reinforcement Learning for Structured Data Generation (Extended Version) | Aja Khanal et.al. | 2601.07152 | translate | read | null |
| 2026-01-12 | Rewarding Creativity: A Human-Aligned Generative Reward Model for Reinforcement Learning in Storytelling | Zhaoyan Li et.al. | 2601.07149 | translate | read | null |
| 2026-01-12 | Generating readily synthesizable small molecule fluorophore scaffolds with reinforcement learning | Ruhi Sayana et.al. | 2601.07145 | translate | read | null |
| 2026-01-12 | Dynamics of Multi-Agent Actor-Critic Learning in Stochastic Games: from Multistability and Chaos to Stable Cooperation | Yuxin Geng et.al. | 2601.07142 | translate | read | null |
| 2026-01-12 | ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System | Sungguk Cha et.al. | 2601.07125 | translate | read | null |
| 2026-01-12 | ENTRA: Entropy-Based Redundancy Avoidance in Large Language Model Reasoning | Ruichu Cai et.al. | 2601.07123 | translate | read | null |
| 2026-01-12 | Enhancing Cloud Network Resilience via a Robust LLM-Empowered Multi-Agent Reinforcement Learning Framework | Yixiao Peng et.al. | 2601.07122 | translate | read | null |
| 2026-01-12 | Reward-Preserving Attacks For Robust Reinforcement Learning | Lucas Schott et.al. | 2601.07118 | translate | read | null |
| 2026-01-12 | MEDVISTAGYM: A Scalable Training Environment for Thinking with Medical Images via Tool-Integrated Reinforcement Learning | Meng Lu et.al. | 2601.07107 | translate | read | null |
| 2026-01-11 | X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests | Jie Wu et.al. | 2601.06953 | translate | read | link |
| 2026-01-11 | TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG | Tianhua Zhang et.al. | 2601.06922 | translate | read | null |
| 2026-01-11 | Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models | Shaoning Sun et.al. | 2601.06911 | translate | read | null |
| 2026-01-11 | Personality-Aware Reinforcement Learning for Persuasive Dialogue with LLM-Driven Simulation | Donghuo Zeng et.al. | 2601.06877 | translate | read | null |
| 2026-01-11 | A Brain-like Synergistic Core in LLMs Drives Behaviour and Learning | Pedro Urbina-Rodriguez et.al. | 2601.06851 | translate | read | null |
| 2026-01-11 | Code Evolution for Control: Synthesizing Policies via LLM-Driven Evolutionary Search | Ping Guo et.al. | 2601.06845 | translate | read | null |
| 2026-01-11 | Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy | Shujian Gao et.al. | 2601.06801 | translate | read | null |
| 2026-01-11 | Artificial Intelligence Driven Channel Coding and Resource Optimization for Wireless Networks | Yasir Ali et.al. | 2601.06796 | translate | read | null |
| 2026-01-11 | GDEPO: Group Dual-dynamic and Equal-right-advantage Policy Optimization with Enhanced Training Data Utilization for Sample-Constrained Reinforcement Learning | Zhengqing Yan et.al. | 2601.06795 | translate | read | null |
| 2026-01-11 | No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning | Zhicong Li et.al. | 2601.06794 | translate | read | null |
| 2026-01-11 | ImmuniFraug: A Metacognitive Intervention Anti-Fraud Approach to Enhance Undergraduate Students’ Cyber Fraud Awareness | Xiangzhe Yuan et.al. | 2601.06774 | translate | read | null |
| 2026-01-11 | GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO | Shubhashis Roy Dipta et.al. | 2601.06767 | translate | read | null |
| 2026-01-11 | On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning | Changyu Liu et.al. | 2601.06748 | translate | read | null |
| 2026-01-10 | Characterising Toxicity in Generative Large Language Models | Zhiyao Zhang et.al. | 2601.06700 | translate | read | null |
| 2026-01-10 | Plasticity vs. Rigidity: The Impact of Low-Rank Adapters on Reasoning on a Micro-Budget | Zohaib Khan et.al. | 2601.06677 | translate | read | null |
| 2026-01-10 | Reinforcement Learning-Guided Dynamic Multi-Graph Fusion for Evacuation Traffic Prediction | Md Nafees Fuad Rafi et.al. | 2601.06664 | translate | read | null |
| 2026-01-10 | KASER: Knowledge-Aligned Student Error Simulator for Open-Ended Coding Tasks | Zhangqi Duan et.al. | 2601.06633 | translate | read | null |
| 2026-01-10 | Object-Centric World Models Meet Monte Carlo Tree Search | Rodion Vakhitov et.al. | 2601.06604 | translate | read | null |
| 2026-01-10 | ArrowGEV: Grounding Events in Video via Learning the Arrow of Time | Fangxu Yu et.al. | 2601.06559 | translate | read | null |
| 2026-01-10 | Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODASER) for Safe Reinforcement Learning in Optimal Control | Roya Khalili Amirabadi et.al. | 2601.06540 | translate | read | null |
| 2026-01-10 | Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection | Minghui Jia et.al. | 2601.06498 | translate | read | link |
| 2026-01-10 | ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking | Qiang Zhang et.al. | 2601.06487 | translate | read | link |
| 2026-01-10 | Coupling Smoothed Particle Hydrodynamics with Multi-Agent Deep Reinforcement Learning for Cooperative Control of Point Absorbers | Yi Zhan et.al. | 2601.06485 | translate | read | null |
| 2026-01-10 | Deep Reinforcement Learning based Control Design for Aircraft Recovery from Loss-of-Control Scenario | Imran Sayyed et.al. | 2601.06439 | translate | read | null |
| 2026-01-10 | LSRIF: Logic-Structured Reinforcement Learning for Instruction Following | Qingyu Ren et.al. | 2601.06431 | translate | read | null |
| 2026-01-10 | Lightweight Yet Secure: Secure Scripting Language Generation via Lightweight LLMs | Keyang Zhang et.al. | 2601.06419 | translate | read | null |
| 2026-01-10 | Dynamic Incentivized Cooperation under Changing Rewards | Philipp Altmann et.al. | 2601.06382 | translate | read | null |
| 2026-01-09 | Future-as-Label: Scalable Supervision from Real-World Outcomes | Benjamin Turtel et.al. | 2601.06336 | translate | read | null |
| 2026-01-09 | The pros and cons of using deep reinforcement learning or genetic algorithms to design control schemes for quantum state transfer on qubit chains | Sofía Perón Santana et.al. | 2601.06303 | translate | read | null |
| 2026-01-09 | How well can off-the-shelf LLMs elucidate molecular structures from mass spectra using chain-of-thought reasoning? | Yufeng Wang et.al. | 2601.06289 | translate | read | null |
| 2026-01-09 | Walk the PLANC: Physics-Guided RL for Agile Humanoid Locomotion on Constrained Footholds | Min Dai et.al. | 2601.06286 | translate | read | null |
| 2026-01-09 | Ground What You See: Hallucination-Resistant MLLMs via Caption Feedback, Diversity-Aware Sampling, and Conflict Regularization | Miao Pan et.al. | 2601.06224 | translate | read | null |
| 2026-01-09 | Toward Safe and Responsible AI Agents: A Three-Pillar Model for Transparency, Accountability, and Trustworthiness | Edward C. Cheng et.al. | 2601.06223 | translate | read | null |
| 2026-01-08 | TimeGNN-Augmented Hybrid-Action MARL for Fine-Grained Task Partitioning and Energy-Aware Offloading in MEC | Wei Ai et.al. | 2601.06191 | translate | read | null |
| 2026-01-07 | TIR-Flow: Active Video Search and Reasoning with Frozen VLMs | Hongbo Jin et.al. | 2601.06176 | translate | read | null |
| 2026-01-06 | HiMeS: Hippocampus-inspired Memory System for Personalized AI Assistants | Hailong Li et.al. | 2601.06152 | translate | read | null |
| 2026-01-05 | A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control | Wonhyeok Choi et.al. | 2601.06133 | translate | read | null |
| 2026-01-09 | Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards | Jiajie Zhang et.al. | 2601.06021 | translate | read | link |
| 2026-01-09 | TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents | Dawei Wang et.al. | 2601.05899 | translate | read | link |
| 2026-01-09 | StackPlanner: A Centralized Hierarchical Multi-Agent System with Task-Experience Memory Management | Ruizhe Zhang et.al. | 2601.05890 | translate | read | null |
| 2026-01-09 | IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck | Huilin Deng et.al. | 2601.05870 | translate | read | null |
| 2026-01-09 | Sequential Bayesian Optimal Experimental Design in Infinite Dimensions via Policy Gradient Reinforcement Learning | Kaichen Shen et.al. | 2601.05868 | translate | read | null |
| 2026-01-09 | Intelligent Singularity Avoidance in UR10 Robotic Arm Path Planning Using Hybrid Fuzzy Logic and Reinforcement Learning | Sheng-Kai Chen et.al. | 2601.05836 | translate | read | null |
| 2026-01-09 | EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis | Xiaoshuai Song et.al. | 2601.05808 | translate | read | link |
| 2026-01-09 | From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation | Zezhou Wang et.al. | 2601.05787 | translate | read | link |
| 2026-01-09 | SketchVL: Policy Optimization via Fine-Grained Credit Assignment for Chart Understanding and More | Muye Huang et.al. | 2601.05688 | translate | read | null |
| 2026-01-09 | CHDP: Cooperative Hybrid Diffusion Policies for Reinforcement Learning in Parameterized Action Space | Bingyi Liu et.al. | 2601.05675 | translate | read | null |
| 2026-01-09 | EvoQRE: Modeling Bounded Rationality in Safety-Critical Traffic Simulation via Evolutionary Quantal Response Equilibrium | Phu-Hoa Pham et.al. | 2601.05653 | translate | read | null |
| 2026-01-09 | GIFT: Games as Informal Training for Generalizable LLMs | Nuoyan Lyu et.al. | 2601.05633 | translate | read | null |
| 2026-01-09 | Dual-Phase LLM Reasoning: Self-Evolved Mathematical Frameworks | ShaoZhen Liu et.al. | 2601.05616 | translate | read | null |
| 2026-01-09 | Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR | Zijun Min et.al. | 2601.05607 | translate | read | null |
| 2026-01-09 | PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning | Jingcheng Hu et.al. | 2601.05593 | translate | read | link |
| 2026-01-09 | Reinforcement Learning of Large Language Models for Interpretable Credit Card Fraud Detection | Cooper Lin et.al. | 2601.05578 | translate | read | null |
| 2026-01-09 | Autonomous Discovery of the Ising Model’s Critical Parameters with Reinforcement Learning | Hai Man et.al. | 2601.05577 | translate | read | null |
| 2026-01-09 | WildSci: Advancing Scientific Reasoning from In-the-Wild Literature | Tengxiao Liu et.al. | 2601.05567 | translate | read | null |
| 2026-01-09 | Closing the Modality Reasoning Gap for Speech Large Language Models | Chaoren Wang et.al. | 2601.05543 | translate | read | null |
| 2026-01-09 | LEAPS: An LLM-Empowered Adaptive Plugin for Taobao AI Search | Lei Wang et.al. | 2601.05513 | translate | read | null |
| 2026-01-09 | How Exploration Breaks Cooperation in Shared-Policy Multi-Agent Reinforcement Learning | Yi-Ning Weng et.al. | 2601.05509 | translate | read | null |
| 2026-01-09 | MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards | Zhiyu Shen et.al. | 2601.05488 | translate | read | null |
| 2026-01-09 | MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization | Jiefu Ou et.al. | 2601.05475 | translate | read | null |
| 2026-01-09 | Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning | Zhaoqi Wang et.al. | 2601.05466 | translate | read | null |
| 2026-01-09 | PRISMA: Reinforcement Learning Guided Two-Stage Policy Optimization in Multi-Agent Architecture for Open-Domain Multi-Hop Question Answering | Yu Liu et.al. | 2601.05465 | translate | read | null |
| 2026-01-09 | Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction | Hongjin Kim et.al. | 2601.05459 | translate | read | null |
| 2026-01-08 | Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization | Yuxiang Ji et.al. | 2601.05432 | translate | read | link |
| 2026-01-08 | Interactive Distillation for Cooperative Multi-Agent Reinforcement Learning | Minwoo Cho et.al. | 2601.05407 | translate | read | null |
| 2026-01-08 | Imitation Learning for Combinatorial Optimisation under Uncertainty | Prakash Gawas et.al. | 2601.05383 | translate | read | null |
| 2026-01-05 | On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis | Hector Zenil et.al. | 2601.05280 | translate | read | null |
| 2026-01-08 | RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes | Yuan-Kang Lee et.al. | 2601.05249 | translate | read | link |
| 2026-01-08 | GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization | Shih-Yang Liu et.al. | 2601.05242 | translate | read | link |
| 2026-01-08 | EARL: Energy-Aware Optimization of Liquid State Machines for Pervasive AI | Zain Iqbal et.al. | 2601.05205 | translate | read | null |
| 2026-01-08 | SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning | Yanchang Liang et.al. | 2601.05187 | translate | read | null |
| 2026-01-08 | Inside Out: Evolving User-Centric Core Memory Trees for Long-Term Personalized Dialogue Systems | Jihao Zhao et.al. | 2601.05171 | translate | read | null |
| 2026-01-08 | Safe Continual Reinforcement Learning Methods for Nonstationary Environments: Towards a Survey of the State of the Art | Timofey Tomashevskiy et.al. | 2601.05152 | translate | read | null |
| 2026-01-08 | Unitary fault-tolerant encoding of Pauli states in surface codes | Luis Colmenarez et.al. | 2601.05113 | translate | read | null |
| 2026-01-08 | Reinforced Efficient Reasoning via Semantically Diverse Exploration | Ziqi Zhao et.al. | 2601.05053 | translate | read | link |
| 2026-01-08 | Hán Dān Xué Bù (Mimicry) or Qīng Chū Yú Lán (Mastery)? A Cognitive Perspective on Reasoning Distillation in Large Language Models | Yueqing Hu et.al. | 2601.05019 | translate | read | null |
| 2026-01-08 | On the Hidden Objective Biases of Group-based Reinforcement Learning | Aleksandar Fontana et.al. | 2601.05002 | translate | read | null |
| 2026-01-08 | AlgBench: To What Extent Do Large Reasoning Models Understand Algorithms? | Henan Sun et.al. | 2601.04996 | translate | read | null |
| 2026-01-08 | A DQN-based model for intelligent network selection in heterogeneous wireless systems | Fayssal Bendaoud et.al. | 2601.04978 | translate | read | null |
| 2026-01-08 | ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning | Minda Hu et.al. | 2601.04973 | translate | read | null |
| 2026-01-08 | Text as a Universal Interface for Transferable Personalization | Yuting Liu et.al. | 2601.04963 | translate | read | null |
| 2026-01-08 | Safe Reinforcement Learning Beyond Baseline Control: A Hierarchical Framework for Space Triangle Tethered Formation System | Xinyi Tao et.al. | 2601.04957 | translate | read | null |
| 2026-01-08 | Precision over Diversity: High-Precision Reward Generalizes to Robust Instruction Following | Yirong Zeng et.al. | 2601.04954 | translate | read | null |
| 2026-01-08 | SKATER: Synthesized Kinematics for Advanced Traversing Efficiency on a Humanoid Robot via Roller Skate Swizzles | Junchi Gu et.al. | 2601.04948 | translate | read | null |
| 2026-01-08 | Deep Reinforcement Learning for Optimum Order Execution: Mitigating Risk and Maximizing Returns | Khabbab Zakaria et.al. | 2601.04896 | translate | read | null |
| 2026-01-08 | Flexible Manufacturing Systems Intralogistics: Dynamic Optimization of AGVs and Tool Sharing Using Coloured-Timed Petri Nets and Actor-Critic RL with Actions Masking | Sofiene Lassoued et.al. | 2601.04887 | translate | read | null |
| 2026-01-08 | RAAR: Retrieval Augmented Agentic Reasoning for Cross-Domain Misinformation Detection | Zhiwei Liu et.al. | 2601.04853 | translate | read | null |
| 2026-01-08 | Intelligent resource allocation in wireless networks via deep reinforcement learning | Marie Diane Iradukunda et.al. | 2601.04842 | translate | read | null |
| 2026-01-08 | SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning | Caijun Xu et.al. | 2601.04809 | translate | read | link |
| 2026-01-08 | Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning | Siyuan Gan et.al. | 2601.04805 | translate | read | null |
| 2026-01-08 | AgentOCR: Reimagining Agent History via Optical Self-Compression | Lang Feng et.al. | 2601.04786 | translate | read | null |
| 2026-01-08 | AT $^2$ PO: Agentic Turn-based Policy Optimization via Tree Search | Zefang Zong et.al. | 2601.04767 | translate | read | link |
| 2026-01-08 | AM $^3$ Safety: Towards Data Efficient Alignment of Multi-modal Multi-turn Safety for MLLMs | Han Zhu et.al. | 2601.04736 | translate | read | null |
| 2026-01-08 | ThinkDrive: Chain-of-Thought Guided Progressive Reinforcement Learning Fine-Tuning for Autonomous Driving | Chang Zhao et.al. | 2601.04714 | translate | read | null |
| 2026-01-08 | TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning | Yinuo Wang et.al. | 2601.04698 | translate | read | null |
| 2026-01-08 | A Method for Constructing a Digital Transformation Driving Mechanism Based on Semantic Understanding of Large Models | Huayi Liu et.al. | 2601.04696 | translate | read | null |
| 2026-01-08 | Tape: A Cellular Automata Benchmark for Evaluating Rule-Shift Generalization in Reinforcement Learning | Enze Pan et.al. | 2601.04695 | translate | read | null |
| 2026-01-08 | ResMAS: Resilience Optimization in LLM-based Multi-agent Systems | Zhilun Zhou et.al. | 2601.04694 | translate | read | null |
| 2026-01-08 | Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead | Oluwatosin Oseni et.al. | 2601.04686 | translate | read | null |
| 2026-01-08 | Agri-R1: Empowering Generalizable Agricultural Reasoning in Vision-Language Models with Reinforcement Learning | Wentao Zhang et.al. | 2601.04672 | translate | read | null |
| 2026-01-08 | Learning Dynamics in RL Post-Training for Language Models | Akiyoshi Tomihari et.al. | 2601.04670 | translate | read | null |
| 2026-01-08 | Optimizing Path Planning using Deep Reinforcement Learning for UGVs in Precision Agriculture | Laukik Patade et.al. | 2601.04668 | translate | read | null |
| 2026-01-08 | Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization | Mizanur Rahman et.al. | 2601.04582 | translate | read | link |
| 2026-01-08 | Reasoning Over Space: Enabling Geographic Reasoning for LLM-Based Generative Next POI Recommendation | Dongyi Lv et.al. | 2601.04562 | translate | read | null |
| 2026-01-08 | Not All Steps are Informative: On the Linearity of LLMs’ RLVR Training | Tianle Wang et.al. | 2601.04537 | translate | read | null |
| 2026-01-08 | GRACE: Reinforcement Learning for Grounded Response and Abstention under Contextual Evidence | Yibo Zhao et.al. | 2601.04525 | translate | read | null |
| 2026-01-08 | TSSR: Two-Stage Swap-Reward-Driven Reinforcement Learning for Character-Level SMILES Generation | Jacob Ede Levine et.al. | 2601.04521 | translate | read | null |
| 2026-01-08 | Multiagent Reinforcement Learning with Neighbor Action Estimation | Zhenglong Luo et.al. | 2601.04511 | translate | read | null |
| 2026-01-07 | Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization | Xingjian Diao et.al. | 2601.04442 | translate | read | null |
| 2026-01-07 | Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization | Matthew Landers et.al. | 2601.04441 | translate | read | null |
| 2026-01-07 | Rate or Fate? RLV $^\varepsilon$ R: Reinforcement Learning with Verifiable Noisy Rewards | Ali Rad et.al. | 2601.04411 | translate | read | null |
| 2026-01-07 | Transformer-based Multi-agent Reinforcement Learning for Separation Assurance in Structured and Unstructured Airspaces | Arsyi Aziz et.al. | 2601.04401 | translate | read | null |
| 2026-01-07 | Enhanced-FQL( $λ$ ), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay | Mohsen Jalaeian-Farimani et.al. | 2601.04392 | translate | read | null |
| 2026-01-07 | Survival Dynamics of Neural and Programmatic Policies in Evolutionary Reinforcement Learning | Anton Roupassov-Ruiz et.al. | 2601.04365 | translate | read | null |
| 2026-01-07 | Online Action-Stacking Improves Reinforcement Learning Performance for Air Traffic Control | Ben Carvell et.al. | 2601.04287 | translate | read | null |
| 2026-01-07 | A Future Capabilities Agent for Tactical Air Traffic Control | Paul Kent et.al. | 2601.04285 | translate | read | null |
| 2026-01-07 | Making Tunable Parameters State-Dependent in Weather and Climate Models with Reinforcement Learning | Pritthijit Nath et.al. | 2601.04268 | translate | read | null |
| 2026-01-06 | Cross-Language Speaker Attribute Prediction Using MIL and RL | Sunny Shu et.al. | 2601.04257 | translate | read | null |
| 2026-01-07 | Hierarchical GNN-Based Multi-Agent Learning for Dynamic Queue-Jump Lane and Emergency Vehicle Corridor Formation | Haoran Su et.al. | 2601.04177 | translate | read | null |
| 2026-01-07 | Agentic Rubrics as Contextual Verifiers for SWE Agents | Mohit Raghavendra et.al. | 2601.04171 | translate | read | null |
| 2026-01-07 | InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training | Ziyun Zhang et.al. | 2601.04126 | translate | read | null |
| 2026-01-07 | GeoReason: Aligning Thinking And Answering In Remote Sensing Vision-Language Models Via Logical Consistency Reinforcement Learning | Wenshuai Li et.al. | 2601.04118 | translate | read | null |
| 2026-01-07 | Cells on Autopilot: Adaptive Cell (Re)Selection via Reinforcement Learning | Marvin Illian et.al. | 2601.04083 | translate | read | null |
| 2026-01-07 | Thinking with Frames: Generative Video Distortion Evaluation via Frame Reward Model | Yuan Wang et.al. | 2601.04033 | translate | read | null |
| 2026-01-07 | On-Device Deep Reinforcement Learning for Decentralized Task Offloading: Performance Trade-offs in the Training Process | Gorka Nieto et.al. | 2601.03976 | translate | read | null |
| 2026-01-07 | Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models | Wei Wu et.al. | 2601.03969 | translate | read | null |
| 2026-01-07 | CoINS: Counterfactual Interactive Navigation via Skill-Aware VLM | Kangjie Zhou et.al. | 2601.03956 | translate | read | null |
| 2026-01-07 | Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification | Rui Sun et.al. | 2601.03948 | translate | read | null |
| 2026-01-07 | Adaptive-Boundary-Clipping GRPO: Ensuring Bounded Ratios for Stable and Generalizable Training | Chi Liu et.al. | 2601.03895 | translate | read | null |
| 2026-01-07 | IndexTTS 2.5 Technical Report | Yunpei Li et.al. | 2601.03888 | translate | read | null |
| 2026-01-07 | Staged Voxel-Level Deep Reinforcement Learning for 3D Medical Image Segmentation with Noisy Annotations | Yuyang Fu et.al. | 2601.03875 | translate | read | null |
| 2026-01-07 | Step Potential Advantage Estimation: Harnessing Intermediate Confidence and Correctness for Efficient Mathematical Reasoning | Fei Wu et.al. | 2601.03823 | translate | read | null |
| 2026-01-07 | ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition | Muyang Zhao et.al. | 2601.03822 | translate | read | null |
| 2026-01-07 | From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs | Usha Shrestha et.al. | 2601.03808 | translate | read | null |
| 2026-01-07 | NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning | Zhongtao Miao et.al. | 2601.03790 | translate | read | null |
| 2026-01-07 | MVP: Enhancing Video Large Language Models via Self-supervised Masked Video Prediction | Xiaokun Sun et.al. | 2601.03781 | translate | read | null |
| 2026-01-07 | O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL | Yi Yao et.al. | 2601.03743 | translate | read | null |
| 2026-01-07 | EDCO: Dynamic Curriculum Orchestration for Domain-specific Large Language Model Fine-tuning | Jing-Cheng Pang et.al. | 2601.03725 | translate | read | null |
| 2026-01-07 | ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization | Shijie Zhang et.al. | 2601.03723 | translate | read | null |
| 2026-01-07 | R $^3$ L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification | Weijie Shi et.al. | 2601.03715 | translate | read | link |
| 2026-01-07 | TreeAdv: Tree-Structured Advantage Redistribution for Group-Based RL | Lang Cao et.al. | 2601.03703 | translate | read | null |
| 2026-01-07 | Dual-Attention Heterogeneous GNN for Multi-robot Collaborative Area Search via Deep Reinforcement Learning | Lina Zhu et.al. | 2601.03686 | translate | read | null |
| 2026-01-07 | Accounting for Optimal Control in the Sizing of Isolated Hybrid Renewable Energy Systems Using Imitation Learning | Simon Halvdansson et.al. | 2601.03679 | translate | read | null |
| 2026-01-07 | Sandwich Reasoning: An Answer-Reasoning-Answer Approach for Low-Latency Query Correction | Chen Zhang et.al. | 2601.03672 | translate | read | null |
| 2026-01-07 | AMIR-GRPO: Inducing Implicit Preference Signals into GRPO | Amir Hossein Yari et.al. | 2601.03661 | translate | read | null |
| 2026-01-07 | ReLA: Representation Learning and Aggregation for Job Scheduling with Reinforcement Learning | Zhengyi Kwan et.al. | 2601.03646 | translate | read | null |
| 2026-01-07 | Locomotion Beyond Feet | Tae Hoon Yang et.al. | 2601.03607 | translate | read | null |
| 2026-01-07 | Interleaved Tool-Call Reasoning for Protein Function Understanding | Chuanliu Fan et.al. | 2601.03604 | translate | read | null |
| 2026-01-07 | From Score to Sound: An End-to-End MIDI-to-Motion Pipeline for Robotic Cello Performance | Samantha Sudhoff et.al. | 2601.03562 | translate | read | null |
| 2026-01-07 | SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models | Yuxuan Jiang et.al. | 2601.03555 | translate | read | null |
| 2026-01-07 | VeRPO: Verifiable Dense Reward Policy Optimization for Code Generation | Longwen Wang et.al. | 2601.03525 | translate | read | null |
| 2026-01-07 | A Reinforcement Learning-Based Model for Mapping and Goal-Directed Navigation Using Multiscale Place Fields | Bekarys Dukenbaev et.al. | 2601.03520 | translate | read | null |
| 2026-01-07 | Semantic Belief-State World Model for 3D Human Motion Prediction | Sarim Chaudhry et.al. | 2601.03517 | translate | read | null |
| 2026-01-07 | Adaptive Model-Based Reinforcement Learning for Orbit Feedback Control in NSLS-II Storage Ring | Zeyu Dong et.al. | 2601.03486 | translate | read | null |
| 2026-01-06 | Understanding Reward Hacking in Text-to-Image Reinforcement Learning | Yunqi Hong et.al. | 2601.03468 | translate | read | null |
| 2026-01-06 | ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing | Hengjia Li et.al. | 2601.03467 | translate | read | null |
| 2026-01-06 | FIRE-VLM: A Vision-Language-Driven Reinforcement Learning Framework for UAV Wildfire Tracking in a Physics-Grounded Fire Digital Twin | Chris Webb et.al. | 2601.03449 | translate | read | null |
| 2026-01-06 | Foundation Model-Aided Hierarchical Control for Robust RIS-Assisted Near-Field Communications | Mohammad Ghassemi et.al. | 2601.03427 | translate | read | null |
| 2026-01-06 | Sensor to Pixels: Decentralized Swarm Gathering via Image-Based Reinforcement Learning | Yigal Koifman et.al. | 2601.03413 | translate | read | null |
| 2026-01-06 | Exploration Through Introspection: A Self-Aware Reward Model | Michael Petrowski et.al. | 2601.03389 | translate | read | null |
| 2026-01-06 | Aligning Findings with Diagnosis: A Self-Consistent Reinforcement Learning Framework for Trustworthy Radiology Reporting | Kun Zhao et.al. | 2601.03321 | translate | read | null |
| 2026-01-06 | Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning | Yu Luo et.al. | 2601.03320 | translate | read | null |
| 2026-01-06 | Mastering the Game of Go with Self-play Experience Replay | Jingbin Liu et.al. | 2601.03306 | translate | read | null |
| 2026-01-06 | Autonomous Threat Detection and Response in Cloud Security: A Comprehensive Survey of AI-Driven Strategies | Gaurav Sarraf et.al. | 2601.03303 | translate | read | null |
| 2026-01-06 | PC2P: Multi-Agent Path Finding via Personalized-Enhanced Communication and Crowd Perception | Guotao Li et.al. | 2601.03301 | translate | read | null |
| 2026-01-06 | STReasoner: Empowering LLMs for Spatio-Temporal Reasoning in Time Series via Spatial-Aware Reinforcement Learning | Juntong Ni et.al. | 2601.03248 | translate | read | null |
| 2026-01-06 | Critic-Guided Reinforcement Unlearning in Text-to-Image Diffusion | Mykola Vysotskyi et.al. | 2601.03213 | translate | read | null |
| 2026-01-06 | UltraLogic: Enhancing LLM Reasoning through Large-Scale Data Synthesis and Bipolar Float Reward | Yile Liu et.al. | 2601.03205 | translate | read | null |
| 2026-01-06 | MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory | Shengtao Zhang et.al. | 2601.03192 | translate | read | null |
| 2026-01-06 | WebAnchor: Anchoring Agent Planning to Stabilize Long-Horizon Web Reasoning | Xinmiao Yu et.al. | 2601.03164 | translate | read | null |
| 2026-01-06 | Unified Thinker: A General Reasoning Modular Core for Image Generation | Sashuai Zhou et.al. | 2601.03127 | translate | read | null |
| 2026-01-06 | One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling | Yiyuan Li et.al. | 2601.03111 | translate | read | null |
| 2026-01-06 | Post-Decision State-Based Online Learning for Delay-Energy-Aware Flow Allocation in Wireless Systems | Mahesh Ganesh Bhat et.al. | 2601.03108 | translate | read | null |
| 2026-01-06 | IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation | Yankai Jiang et.al. | 2601.03054 | translate | read | null |
| 2026-01-06 | SOP: A Scalable Online Post-Training System for Vision-Language-Action Models | Mingjie Pan et.al. | 2601.03044 | translate | read | null |
| 2026-01-06 | Dementia-R1: Reinforced Pretraining and Reasoning from Unstructured Clinical Notes for Real-World Dementia Prognosis | Choonghan Kim et.al. | 2601.03018 | translate | read | null |
| 2026-01-06 | In-Context Reinforcement Learning through Bayesian Fusion of Context and Value Prior | Anaïs Berkes et.al. | 2601.03015 | translate | read | null |
| 2026-01-06 | Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning | Yuankun Xie et.al. | 2601.02983 | translate | read | null |
| 2026-01-06 | Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning | Nathanaël Carraz Rakotonirina et.al. | 2601.02972 | translate | read | null |
| 2026-01-06 | The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models | Yuhuan You et.al. | 2601.02954 | translate | read | null |
| 2026-01-06 | Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning | Guoqiang Liang et.al. | 2601.02918 | translate | read | null |
| 2026-01-06 | ChemBART: A Pre-trained BART Model Assisting Organic Chemistry Analysis | Kenan Li et.al. | 2601.02915 | translate | read | null |
| 2026-01-06 | SimRPD: Optimizing Recruitment Proactive Dialogue Agents through Simulator-Based Data Evaluation and Selection | Zhiyong Cao et.al. | 2601.02871 | translate | read | null |
| 2026-01-06 | Sample-Efficient Neurosymbolic Deep Reinforcement Learning | Celeste Veronese et.al. | 2601.02850 | translate | read | null |
| 2026-01-06 | SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models | Ruiyang Zhang et.al. | 2601.02825 | translate | read | null |
| 2026-01-06 | Reinforcement Learning for Follow-the-Leader Robotic Endoscopic Navigation via Synthetic Data | Sicong Gao et.al. | 2601.02798 | translate | read | null |
| 2026-01-06 | MiMo-V2-Flash Technical Report | Xiaomi LLM-Core Team et.al. | 2601.02780 | translate | read | null |
| 2026-01-06 | Closing the Reality Gap: Zero-Shot Sim-to-Real Deployment for Dexterous Force-Based Grasping and Manipulation | Zhe Zhao et.al. | 2601.02778 | translate | read | null |
| 2026-01-06 | Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal Policies | Mingming Zhang et.al. | 2601.02754 | translate | read | null |
| 2026-01-06 | Time-Scaling Is What Agents Need Now | Zhi Liu et.al. | 2601.02714 | translate | read | null |
| 2026-01-06 | Inferring Causal Graph Temporal Logic Formulas to Expedite Reinforcement Learning in Temporally Extended Tasks | Hadi Partovi Aria et.al. | 2601.02666 | translate | read | null |
| 2026-01-06 | Effective Online 3D Bin Packing with Lookahead Parcels Using Monte Carlo Tree Search | Jiangyi Fang et.al. | 2601.02649 | translate | read | null |
| 2026-01-05 | SWaRL: Safeguard Code Watermarking via Reinforcement Learning | Neusha Javidnia et.al. | 2601.02602 | translate | read | null |
| 2026-01-05 | Textual Explanations and Their Evaluations for Reinforcement Learning Policy | Ahmad Terra et.al. | 2601.02514 | translate | read | null |
| 2026-01-05 | LLM-Enhanced Reinforcement Learning for Time Series Anomaly Detection | Bahareh Golchin et.al. | 2601.02511 | translate | read | null |
| 2026-01-05 | WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks | Hao Bai et.al. | 2601.02439 | translate | read | null |
| 2026-01-05 | Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes | Jing Tan et.al. | 2601.02356 | translate | read | null |
| 2026-01-05 | VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation | Shikun Sun et.al. | 2601.02256 | translate | read | null |
| 2026-01-05 | Enabling Deep Reinforcement Learning Research for Energy Saving in Open RAN | Matteo Bordin et.al. | 2601.02240 | translate | read | null |
| 2026-01-05 | NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation | Huichao Zhang et.al. | 2601.02204 | translate | read | null |
| 2026-01-05 | CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents | Keyu Wang et.al. | 2601.02201 | translate | read | null |
| 2026-01-05 | ACDZero: MCTS Agent for Mastering Automated Cyber Defense | Yu Li et.al. | 2601.02196 | translate | read | null |
| 2026-01-05 | Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting | Muxi Diao et.al. | 2601.02151 | translate | read | null |
| 2026-01-05 | MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics | Zhuofan Shi et.al. | 2601.02075 | translate | read | null |
| 2026-01-05 | Reinforcement Learning Based Computationally Efficient Conditional Choice Simulation Estimation of Dynamic Discrete Choice Models | Ahmed Khwaja et.al. | 2601.02069 | translate | read | null |
| 2026-01-05 | Higher-Order Action Regularization in Deep Reinforcement Learning: From Continuous Control to Building Energy Management | Faizan Ahmed et.al. | 2601.02061 | translate | read | null |
| 2026-01-05 | GDRO: Group-level Reward Post-training Suitable for Diffusion Models | Yiyang Wang et.al. | 2601.02036 | translate | read | null |
| 2026-01-05 | AgentVNE: LLM-Augmented Graph Reinforcement Learning for Affinity-Aware Multi-Agent Placement in Edge Agentic AI | Runze Zheng et.al. | 2601.02021 | translate | read | null |
| 2026-01-05 | Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation | Weijian Ma et.al. | 2601.01984 | translate | read | null |
| 2026-01-05 | Distorted Distributional Policy Evaluation for Offline Reinforcement Learning | Ryo Iwaki et.al. | 2601.01917 | translate | read | null |
| 2026-01-05 | Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning | Yuxuan Li et.al. | 2601.01904 | translate | read | null |
| 2026-01-05 | Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents | Yi Yu et.al. | 2601.01885 | translate | read | null |
| 2026-01-05 | DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs | Jinghan Ru et.al. | 2601.01868 | translate | read | null |
| 2026-01-05 | Moments Matter: Stabilizing Policy Optimization using Return Distributions | Dennis Jabs et.al. | 2601.01803 | translate | read | null |
| 2026-01-05 | PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor | Qianjun Pan et.al. | 2601.01802 | translate | read | null |
| 2026-01-05 | Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving | Qi Wei et.al. | 2601.01800 | translate | read | null |
| 2026-01-05 | SRAS: A Lightweight Reinforcement Learning-based Document Selector for Edge-Native RAG Pipelines | Rajiv Chaitanya Muttur et.al. | 2601.01785 | translate | read | null |
| 2026-01-05 | Reinforcement Learning for Option Hedging: Static Implied-Volatility Fit versus Shortfall-Aware Performance | Ziheng Chen et.al. | 2601.01709 | translate | read | null |
| 2026-01-04 | All-Optical Deep Learning with Quantum Nonlinearity | Qingyi Zhou et.al. | 2601.01690 | translate | read | null |
| 2026-01-04 | Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives | Wei Liu et.al. | 2601.01665 | translate | read | null |
| 2026-01-04 | DemoBot: Efficient Learning of Bimanual Manipulation with Dexterous Hands From Third-Person Human Videos | Yucheng Xu et.al. | 2601.01651 | translate | read | null |
| 2026-01-04 | Action-Sketcher: From Reasoning to Action via Visual Sketches for Long-Horizon Robotic Manipulation | Huajie Tan et.al. | 2601.01618 | translate | read | null |
| 2026-01-04 | HanoiWorld: A Joint Embedding Predictive Architecture Based World Model for Autonomous Vehicle Controller | Tran Tien Dat et.al. | 2601.01577 | translate | read | null |
| 2026-01-04 | Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement | Mingyu Xu et.al. | 2601.01562 | translate | read | null |
| 2026-01-04 | Unified Generation and Self-Verification for Vision-Language Models via Advantage Decoupled Preference Optimization | Xinyu Qiu et.al. | 2601.01483 | translate | read | null |
| 2026-01-04 | Programmable ultra-broadband photonic chaos platform enabled by microwave-chaos-driven electro-optic frequency combs | Shiyu Shi et.al. | 2601.01440 | translate | read | null |
| 2026-01-04 | Context-Aware Information Transfer via Digital Semantic Communication in UAV-Based Networks | Poorvi Joshi et.al. | 2601.01430 | translate | read | null |
| 2026-01-04 | SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving | Chaofan Tao et.al. | 2601.01426 | translate | read | null |
| 2026-01-04 | DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer | Xu Guo et.al. | 2601.01425 | translate | read | null |
| 2026-01-04 | SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning | Peidong Wang et.al. | 2601.01392 | translate | read | null |
| 2026-01-03 | dataRLsec: Safety, Security, and Reliability With Robust Offline Reinforcement Learning for DPAs | Shriram KS Pandian et.al. | 2601.01289 | translate | read | null |
| 2026-01-03 | PyBatchRender: A Python Library for Batched 3D Rendering at Up to One Million FPS | Evgenii Rudakov et.al. | 2601.01288 | translate | read | null |
| 2026-01-03 | Harnessing Environmental Memory with Reinforcement Learning in Open Quantum Systems | Safae Gaidi et.al. | 2601.01252 | translate | read | null |
| 2026-01-03 | OrchestrRL: Dynamic Compute and Network Orchestration for Disaggregated RL | Xin Tan et.al. | 2601.01209 | translate | read | null |
| 2026-01-03 | Reinforcement Learning Enhanced Multi-hop Reasoning for Temporal Knowledge Question Answering | Wuzhenghong Wen et.al. | 2601.01195 | translate | read | null |
| 2026-01-03 | SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards | Suryansh Singh Sijwali et.al. | 2601.01184 | translate | read | null |
| 2026-01-03 | Reinforcement Learning Based Whittle Index Policy for Scheduling Wireless Sensors | Sokipriala Jonah et.al. | 2601.01179 | translate | read | null |
| 2026-01-03 | ORION: Option-Regularized Deep Reinforcement Learning for Cooperative Multi-Agent Online Navigation | Zhang Shizhe et.al. | 2601.01155 | translate | read | null |
| 2026-01-03 | Latent Space Reinforcement Learning for Multi-Robot Exploration | Sriram Rajasekar et.al. | 2601.01139 | translate | read | null |
| 2026-01-03 | Performance and Security Aware Distributed Service Placement in Fog Computing | Mohammad Goudarzi et.al. | 2601.01125 | translate | read | null |
| 2026-01-02 | DVGBench: Implicit-to-Explicit Visual Grounding Benchmark in UAV Imagery with Large Vision-Language Models | Yue Zhou et.al. | 2601.00998 | translate | read | null |
| 2026-01-02 | Materials Informatics: Emergence To Autonomous Discovery In The Age Of AI | Turab Lookman et.al. | 2601.00742 | translate | read | null |
| 2026-01-02 | Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty | Uğurcan Özalp et.al. | 2601.00737 | translate | read | null |
| 2026-01-02 | Precision Autotuning for Linear Solvers via Reinforcement Learning | Erin Carson et.al. | 2601.00728 | translate | read | null |
| 2026-01-02 | ARISE: Adaptive Reinforcement Integrated with Swarm Exploration | Rajiv Chaitanya M et.al. | 2601.00693 | translate | read | null |
| 2026-01-02 | IRPO: Scaling the Bradley-Terry Model via Reinforcement Learning | Haonan Song et.al. | 2601.00677 | translate | read | null |
| 2026-01-02 | RoboReward: General-Purpose Vision-Language Reward Models for Robotics | Tony Lee et.al. | 2601.00675 | translate | read | null |
| 2026-01-02 | Integrating Multi-Armed Bandit, Active Learning, and Distributed Computing for Scalable Optimization | Foo Hui-Mean et.al. | 2601.00615 | translate | read | null |
| 2026-01-02 | Vision-based Goal-Reaching Control for Mobile Robots Using a Hierarchical Learning Framework | Mehdi Heydari Shahna et.al. | 2601.00610 | translate | read | null |
| 2026-01-02 | Traffic-Aware Optimal Taxi Placement Using Graph Neural Network-Based Reinforcement Learning | Sonia Khetarpaul et.al. | 2601.00607 | translate | read | null |
| 2026-01-02 | Parametrized Sharing for Multi-Agent Hybrid DRL for Multiple Multi-Functional RISs-Aided Downlink NOMA Networks | Chi-Te Kuo et.al. | 2601.00538 | translate | read | null |
| 2026-01-01 | CPPO: Contrastive Perception for Vision Language Policy Optimization | Ahmad Rezaei et.al. | 2601.00501 | translate | read | null |
| 2026-01-01 | Safe Adaptive Feedback Control via Barrier States | Trivikram Satharasi et.al. | 2601.00476 | translate | read | null |
| 2026-01-01 | Imitation from Observations with Trajectory-Level Generative Embeddings | Yongtao Qu et.al. | 2601.00452 | translate | read | null |
| 2026-01-01 | Modelling cultural evolution | Fredrik Jansson et.al. | 2601.00433 | translate | read | null |
| 2026-01-01 | E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models | Shengjun Zhang et.al. | 2601.00423 | translate | read | null |
| 2026-01-01 | Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach | Biao Wu et.al. | 2601.00388 | translate | read | null |
| 2026-01-01 | Multiagent Reinforcement Learning for Liquidity Games | Alicia Vidler et.al. | 2601.00324 | translate | read | null |
| 2026-01-01 | Offline Multi-Agent Reinforcement Learning for 6G Communications: Fundamentals, Applications and Future Directions | Eslam Eldeeb et.al. | 2601.00321 | translate | read | null |
| 2026-01-01 | Can Optimal Transport Improve Federated Inverse Reinforcement Learning? | David Millard et.al. | 2601.00309 | translate | read | null |
| 2026-01-01 | Next Generation Intelligent Low-Altitude Economy Deployments: The O-RAN Perspective | Aly Sabri Abdalla et.al. | 2601.00257 | translate | read | null |
| 2026-01-01 | Modern Neuromorphic AI: From Intra-Token to Inter-Token Processing | Osvaldo Simeone et.al. | 2601.00245 | translate | read | null |
| 2026-01-01 | From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning | Omar Sharif et.al. | 2601.00215 | translate | read | null |
| 2026-01-01 | Reinforcement-Learned Unequal Error Protection for Quantized Semantic Embeddings | Moirangthem Tiken Singh et.al. | 2601.00186 | translate | read | null |
| 2026-01-01 | Online Finetuning Decision Transformers with Pure RL Gradients | Junkai Luo et.al. | 2601.00167 | translate | read | null |
| 2026-01-01 | Reinforcement Learning with Function Approximation for Non-Markov Processes | Ali Devran Kara et.al. | 2601.00151 | translate | read | null |
(<a href="../Reinforcement_Learning.md">back to Reinforcement Learning</a>)