Reinforcement Learning - 2025-11

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-11-06	FoodRL: A Reinforcement Learning Ensembling Framework For In-Kind Food Donation Forecasting	Esha Sharma et.al.	2511.04865	translate	read	null
2025-11-06	Quantum Boltzmann Machines for Sample-Efficient Reinforcement Learning	Thore Gerlach et.al.	2511.04856	translate	read	null
2025-11-06	Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning	NVIDIA et.al.	2511.04831	translate	read	null
2025-11-06	Unified Multimodal Diffusion Forcing for Forceful Manipulation	Zixuan Huang et.al.	2511.04812	translate	read	null
2025-11-06	Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models	Chenxi Liu et.al.	2511.04800	translate	read	null
2025-11-05	SMART-WRITE: Adaptive Learning-based Write Energy Optimization for Phase Change Memory	Mahek Desai et.al.	2511.04713	translate	read	null
2025-11-05	NCSAC: Effective Neural Community Search via Attribute-augmented Conductance	Longlong Lin et.al.	2511.04712	translate	read	null
2025-11-06	GentleHumanoid: Learning Upper-body Compliance for Contact-rich Human and Object Interaction	Qingzhou Lu et.al.	2511.04679	translate	read	null
2025-11-06	Forgetting is Everywhere	Ben Sanati et.al.	2511.04666	translate	read	null
2025-11-06	Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning	Hampus Åström et.al.	2511.04598	translate	read	null
2025-11-06	End-to-End Reinforcement Learning of Koopman Models for eNMPC of an Air Separation Unit	Daniel Mayfrank et.al.	2511.04522	translate	read	null
2025-11-06	V-Thinker: Interactive Thinking with Images	Runqi Qiao et.al.	2511.04460	translate	read	null
2025-11-06	Fitting Reinforcement Learning Model to Behavioral Data under Bandits	Hao Zhu et.al.	2511.04454	translate	read	null
2025-11-06	The Peril of Preference: Why GRPO fails on Ordinal Rewards	Anisha Garg et.al.	2511.04439	translate	read	null
2025-11-06	Temporal Action Selection for Action Chunking	Yueyang Weng et.al.	2511.04421	translate	read	null
2025-11-06	GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies	Maëlic Neau et.al.	2511.04357	translate	read	null
2025-11-06	MacroNav: Multi-Task Context Representation Learning Enables Efficient Navigation in Unknown Environments	Kuankuan Sima et.al.	2511.04320	translate	read	null
2025-11-06	GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents	Jian Mu et.al.	2511.04307	translate	read	null
2025-11-06	Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference	Matteo Cercola et.al.	2511.04286	translate	read	null
2025-11-06	RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization	Zeng Zhiyuan et.al.	2511.04285	translate	read	null
2025-11-06	SSPO: Subsentence-level Policy Optimization	Kun Yang et.al.	2511.04256	translate	read	null
2025-11-06	Can Context Bridge the Reality Gap? Sim-to-Real Transfer of Context-Aware Policies	Marco Iannotta et.al.	2511.04249	translate	read	null
2025-11-06	Shared Spatial Memory Through Predictive Coding	Zhengru Fang et.al.	2511.04235	translate	read	null
2025-11-06	Opus: A Quantitative Framework for Workflow Evaluation	Alan Seroul et.al.	2511.04220	translate	read	null
2025-11-06	Black-Box Guardrail Reverse-engineering Attack	Hongwei Yao et.al.	2511.04215	translate	read	null
2025-11-06	PUL-SLAM: Path-Uncertainty Co-Optimization with Lightweight Stagnation Detection for Efficient Robotic Exploration	Yizhen Yin et.al.	2511.04180	translate	read	null
2025-11-06	Deep reinforcement learning based navigation of a jellyfish-like swimmer in flows with obstacles	Yihao Chen et.al.	2511.04156	translate	read	null
2025-11-06	Exchange Policy Optimization Algorithm for Semi-Infinite Safe Reinforcement Learning	Jiaming Zhang et.al.	2511.04147	translate	read	null
2025-11-06	BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning	Yitang Li et.al.	2511.04131	translate	read	null
2025-11-06	RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning	Xinyuan Li et.al.	2511.04120	translate	read	null
2025-11-06	CBMC-V3: A CNS-inspired Control Framework Towards Manipulation Agility with SNN	Yanbo Pang et.al.	2511.04109	translate	read	null
2025-11-06	Necessary and Sufficient Conditions for the Optimization-Based Concurrent Execution of Learned Robotic Tasks	Sheikh A. Tahmid et.al.	2511.04054	translate	read	null
2025-11-06	Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots	Yushi Wang et.al.	2511.03996	translate	read	null
2025-11-06	Adaptive Temporal Refinement: Continuous Depth Allocation and Distance Regression for Efficient Action Localization	Ibne Farabi Shihab et.al.	2511.03943	translate	read	null
2025-11-06	RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods	Raghav Sharma et.al.	2511.03939	translate	read	null
2025-11-05	Learning to shine: Neuroevolution enables optical control of phase transitions	Sraddha Agrawal et.al.	2511.03895	translate	read	null
2025-11-05	Investigating Robot Control Policy Learning for Autonomous X-ray-guided Spine Procedures	Florence Klitzner et.al.	2511.03882	translate	read	null
2025-11-05	From Static to Dynamic: Enhancing Offline-to-Online Reinforcement Learning via Energy-Guided Diffusion Stratification	Lipeng Zu et.al.	2511.03828	translate	read	null
2025-11-05	Scaling Agent Learning via Experience Synthesis	Zhaorun Chen et.al.	2511.03773	translate	read	link
2025-11-05	Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning	Richard Dewey et.al.	2511.03724	translate	read	null
2025-11-05	Shrinking the Variance: Shrinkage Baselines for Reinforcement Learning with Verifiable Rewards	Guanning Zeng et.al.	2511.03710	translate	read	null
2025-11-05	AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing	Mohsen Ahmadzadeh et.al.	2511.03697	translate	read	null
2025-11-05	Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL	Lipeng Zu et.al.	2511.03695	translate	read	null
2025-11-05	Simulation-Based Validation of an Integrated 4D/5D Digital-Twin Framework for Predictive Construction Control	Atena Khoshkonesh et.al.	2511.03684	translate	read	null
2025-11-05	DQN Performance with Epsilon Greedy Policies and Prioritized Experience Replay	Daniel Perkins et.al.	2511.03670	translate	read	null
2025-11-05	Towards Formalizing Reinforcement Learning Theory	Shangtong Zhang et.al.	2511.03618	translate	read	null
2025-11-05	Going Beyond Expert Performance via Deep Implicit Imitation Reinforcement Learning	Iason Chrysomallis et.al.	2511.03616	translate	read	null
2025-11-05	Tensor-Efficient High-Dimensional Q-learning	Junyi Wu et.al.	2511.03595	translate	read	null
2025-11-05	PerfDojo: Automated ML Library Generation for Heterogeneous Architectures	Andrei Ivanov et.al.	2511.03586	translate	read	null
2025-11-05	Imitation Learning in the Deep Learning Era: A Novel Taxonomy and Recent Advances	Iason Chrysomallis et.al.	2511.03565	translate	read	null
2025-11-05	Learning Without Critics? Revisiting GRPO in Classical Reinforcement Learning Environments	Bryan L. M. de Oliveira et.al.	2511.03527	translate	read	null
2025-11-05	Reinforcement Learning Using known Invariances	Alexandru Cioba et.al.	2511.03473	translate	read	null
2025-11-05	Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG	Longpeng Qiu et.al.	2511.03410	translate	read	null
2025-11-05	Adaptable Hindsight Experience Replay for Search-Based Learning	Alexandros Vazaios et.al.	2511.03405	translate	read	null
2025-11-05	Learning Communication Skills in Multi-task Multi-agent Deep Reinforcement Learning	Changxi Zhu et.al.	2511.03348	translate	read	null
2025-11-05	DRL-Based Robust Multi-Timescale Anti-Jamming Approaches under State Uncertainty	Haoqin Zhao et.al.	2511.03305	translate	read	null
2025-11-05	Multi-Objective Adaptive Rate Limiting in Microservices Using Deep Reinforcement Learning	Ning Lyu et.al.	2511.03279	translate	read	null
2025-11-05	Climate Adaptation with Reinforcement Learning: Economic vs. Quality of Life Adaptation Pathways	Miguel Costa et.al.	2511.03243	translate	read	null
2025-11-05	Incorporating Quality of Life in Climate Adaptation Planning via Reinforcement Learning	Miguel Costa et.al.	2511.03238	translate	read	null
2025-11-05	Collaborative Assembly Policy Learning of a Sightless Robot	Zeqing Zhang et.al.	2511.03189	translate	read	null
2025-11-05	Periodic Skill Discovery	Jonghae Park et.al.	2511.03187	translate	read	null
2025-11-05	Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control	Rewida Ali et.al.	2511.03181	translate	read	null
2025-11-05	Optimizing Earth-Moon Transfer and Cislunar Navigation: Integrating Low-Energy Trajectories, AI Techniques and GNSS-R Technologies	Arsalan Muhammad et.al.	2511.03173	translate	read	null
2025-11-05	Learning Natural and Robust Hexapod Locomotion over Complex Terrains via Motion Priors based on Deep Reinforcement Learning	Xin Liu et.al.	2511.03167	translate	read	null
2025-11-05	Accelerating inverse materials design using generative diffusion models with reinforcement learning	Junwu Chen et.al.	2511.03112	translate	read	null
2025-11-05	Scaling Multi-Agent Environment Co-Design with Diffusion Models	Hao Xiang Li et.al.	2511.03100	translate	read	null
2025-11-04	Leveraging Discrete Function Decomposability for Scientific Design	James C. Bowden et.al.	2511.03032	translate	read	null
2025-11-04	Value of Information-Enhanced Exploration in Bootstrapped DQN	Stergios Plataniotis et.al.	2511.02969	translate	read	null
2025-11-04	Digital Twin-Driven Pavement Health Monitoring and Maintenance Optimization Using Graph Neural Networks	Mohsin Mahmud Topu et.al.	2511.02957	translate	read	null
2025-11-04	Audience Amplified: Virtual Audiences in Asynchronously Performed AR Theater	You-Jin Kim et.al.	2511.02807	translate	read	null
2025-11-04	MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning	Qianhao Yuan et.al.	2511.02805	translate	read	null
2025-11-04	From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos	Xun Wang et.al.	2511.02762	translate	read	null
2025-11-04	Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning	Bowen Jin et.al.	2511.02755	translate	read	null
2025-11-04	VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models	Zhicheng Zhang et.al.	2511.02712	translate	read	null
2025-11-04	Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs	Georgios Tzannetos et.al.	2511.02690	translate	read	null
2025-11-04	RL-Aided Cognitive ISAC: Robust Detection and Sensing-Communication Trade-offs	Adam Umra et.al.	2511.02672	translate	read	null
2025-11-04	Natural-gas storage modelling by deep reinforcement learning	Tiziano Balaconi et.al.	2511.02646	translate	read	null
2025-11-04	Adaptive GR(1) Specification Repair for Liveness-Preserving Shielding in Reinforcement Learning	Tiberiu-Andrei Georgescu et.al.	2511.02605	translate	read	null
2025-11-04	Directional-Clamp PPO	Gilad Karpel et.al.	2511.02577	translate	read	null
2025-11-04	Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning	Yixiu Mao et.al.	2511.02567	translate	read	null
2025-11-04	An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems	Changhao Miao et.al.	2511.02525	translate	read	null
2025-11-04	Dexterous Robotic Piano Playing at Scale	Le Chen et.al.	2511.02504	translate	read	null
2025-11-04	Auditable-choice reframing unlocks RL-based verification for open-ended tasks	Mengyu Zhang et.al.	2511.02463	translate	read	null
2025-11-04	ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension	Duo Xu et.al.	2511.02415	translate	read	null
2025-11-04	Large-scale automatic carbon ion treatment planning for head and neck cancers via parallel multi-agent reinforcement learning	Jueye Zhang et.al.	2511.02314	translate	read	null
2025-11-04	Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning	Beyazit Yalcinkaya et.al.	2511.02304	translate	read	null
2025-11-04	Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation	Zhiwei Zhang et.al.	2511.02303	translate	read	null
2025-11-04	Reinforcement learning based data assimilation for unknown state model	Ziyi Wang et.al.	2511.02286	translate	read	null
2025-11-04	SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning	Fangxun Shu et.al.	2511.02280	translate	read	null
2025-11-04	Structural Plasticity as Active Inference: A Biologically-Inspired Architecture for Homeostatic Control	Brennen A. Hill et.al.	2511.02241	translate	read	null
2025-11-04	Learning Interactive World Model for Object-Centric Reinforcement Learning	Fan Feng et.al.	2511.02225	translate	read	null
2025-11-04	Optimizing Multi-Lane Intersection Performance in Mixed Autonomy Environments	Manonmani Sekar et.al.	2511.02217	translate	read	null
2025-11-04	Adaptive Cooperative Transmission Design for Ultra-Reliable Low-Latency Communications via Deep Reinforcement Learning	Hyemin Yu et.al.	2511.02216	translate	read	null
2025-11-04	Training Proactive and Personalized LLM Agents	Weiwei Sun et.al.	2511.02208	translate	read	null
2025-11-04	A Quantitative Comparison of Centralised and Distributed Reinforcement Learning-Based Control for Soft Robotic Arms	Linxin Hou et.al.	2511.02192	translate	read	null
2025-11-03	JaxMARL-HFT: GPU-Accelerated Large-Scale Multi-Agent Reinforcement Learning for High-Frequency Trading	Valentin Mohl et.al.	2511.02136	translate	read	null
2025-11-03	Second-Order Policy Gradient Methods for the Linear Quadratic Regulator	Amirreza Valaei et.al.	2511.02095	translate	read	null
2025-11-03	Automated Reward Design for Gran Turismo	Michel Ma et.al.	2511.02094	translate	read	null
2025-11-03	Deep Reinforcement Learning for Multi-flow Routing in Heterogeneous Wireless Networks	Brian Kim et.al.	2511.02030	translate	read	null
2025-11-03	ABIDES-MARL: A Multi-Agent Reinforcement Learning Environment for Endogenous Price Formation and Execution in a Limit Order Book	Patrick Cheridito et.al.	2511.02016	translate	read	null
2025-11-02	Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR	Abdelaziz Bounhar et.al.	2511.01937	translate	read	link
2025-11-02	Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch	Yirong Zeng et.al.	2511.01934	translate	read	null
2025-11-03	GenDexHand: Generative Simulation for Dexterous Hands	Feng Chen et.al.	2511.01791	translate	read	null
2025-11-03	MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll	Alexander Schperberg et.al.	2511.01774	translate	read	null
2025-11-03	RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks	Mian Wu et.al.	2511.01758	translate	read	null
2025-11-03	Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding	Jungyeon Koh et.al.	2511.01695	translate	read	null
2025-11-03	Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward	Xiaogang Xu et.al.	2511.01645	translate	read	null
2025-11-03	Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models	Xiaoyu Zhan et.al.	2511.01618	translate	read	null
2025-11-03	L2T-Tune: LLM-Guided Hybrid Database Tuning with LHS and TD3	Xinyue Yang et.al.	2511.01602	translate	read	null
2025-11-03	Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning	Aditya Kapoor et.al.	2511.01554	translate	read	null
2025-11-03	TPS-Bench: Evaluating AI Agents’ Tool Planning & Scheduling Abilities in Compounding Tasks	Hanwen Xu et.al.	2511.01527	translate	read	null
2025-11-03	BARD: budget-aware reasoning distillation	Lujie Niu et.al.	2511.01470	translate	read	null
2025-11-03	Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis	Yuhang Huang et.al.	2511.01425	translate	read	null
2025-11-03	Modulation of temporal decision-making in a deep reinforcement learning agent under the dual-task paradigm	Amrapali Pednekar et.al.	2511.01415	translate	read	null
2025-11-03	AoI-Aware Machine Learning for Constrained Multimodal Sensing-Aided Communications	Abolfazl Zakeri et.al.	2511.01406	translate	read	null
2025-11-03	Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization	Ziqi Wang et.al.	2511.01374	translate	read	null
2025-11-03	Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series	Wenrui Cai et.al.	2511.01354	translate	read	null
2025-11-03	Diffusion-Based Solver for CNF Placement on the Cloud-Continuum	Álvaro Vázquez Rodríguez et.al.	2511.01343	translate	read	null
2025-11-03	RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models	Hongyin Zhang et.al.	2511.01331	translate	read	null
2025-11-03	From Pixels to Cooperation Multi Agent Reinforcement Learning based on Multimodal World Models	Sureyya Akin et.al.	2511.01310	translate	read	null
2025-11-03	Optimizing Electric Vehicle Charging Station Placement Using Reinforcement Learning and Agent-Based Simulations	Minh-Duc Nguyen et.al.	2511.01218	translate	read	null
2025-11-03	Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering	Riddhi Jain et.al.	2511.01213	translate	read	null
2025-11-03	DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection	Guoxin Ma et.al.	2511.01192	translate	read	null
2025-11-03	Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning	Ru Wang et.al.	2511.01191	translate	read	null
2025-11-03	DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models	Ruofan Zhang et.al.	2511.01170	translate	read	null
2025-11-02	SLAP: Shortcut Learning for Abstract Planning	Y. Isabel Liu et.al.	2511.01107	translate	read	null
2025-11-02	HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning	Yujian Liu et.al.	2511.01104	translate	read	null
2025-11-02	Deployable Vision-driven UAV River Navigation via Human-in-the-loop Preference Alignment	Zihan Wang et.al.	2511.01083	translate	read	null
2025-11-02	Predictive Auxiliary Learning for Belief-based Multi-Agent Systems	Qinwei Huang et.al.	2511.01078	translate	read	null
2025-11-02	Quantum Reinforcement Learning for 6G and Beyond Wireless Networks	Dinh-Hieu Tran et.al.	2511.01070	translate	read	null
2025-11-02	Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning	Wenjin Liu et.al.	2511.01016	translate	read	link
2025-11-02	IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation	Bosi Wen et.al.	2511.01014	translate	read	null
2025-11-02	MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL	Haolin Yang et.al.	2511.01008	translate	read	link
2025-11-02	GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies	Ziye Wang et.al.	2511.00998	translate	read	null
2025-11-02	Optimizing Energy and Latency in 6G Smart Cities with Edge CyberTwins	Amine Abouaomar et.al.	2511.00955	translate	read	null
2025-11-02	KFCPO: Kronecker-Factored Approximated Constrained Policy Optimization	Joonyoung Lim et.al.	2511.00880	translate	read	null
2025-11-02	Optimal Undulatory Swimming with Constrained Deformation and Actuation Intervals	Fumiya Tokoro et.al.	2511.00816	translate	read	null
2025-11-02	Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games	Runyu Lu et.al.	2511.00811	translate	read	null
2025-11-02	Do Math Reasoning LLMs Help Predict the Impact of Public Transit Events?	Bowen Fang et.al.	2511.00808	translate	read	null
2025-11-02	Logic-informed reinforcement learning for cross-domain optimization of large-scale cyber-physical systems	Guangxi Wan et.al.	2511.00806	translate	read	null
2025-11-02	GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents	Jie JW Wu et.al.	2511.00802	translate	read	null
2025-11-02	Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration	Yan Sun et.al.	2511.00794	translate	read	null
2025-11-02	Power Control Based on Multi-Agent Deep Q Network for D2D Communication	Shi Gengtian et.al.	2511.00767	translate	read	null
2025-11-01	Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries	Minghe Shen et.al.	2511.00710	translate	read	null
2025-11-01	PreferThinker: Reasoning-based Personalized Image Preference Assessment	Shengqi Xu et.al.	2511.00609	translate	read	null
2025-11-01	OpenSIR: Open-Ended Self-Improving Reasoner	Wai-Chung Kwan et.al.	2511.00602	translate	read	link
2025-11-01	Improving Robustness to Out-of-Distribution States in Imitation Learning via Deep Koopman-Boosted Diffusion Policy	Dianye Huang et.al.	2511.00555	translate	read	null
2025-11-01	Single-agent Reinforcement Learning Model for Regional Adaptive Traffic Signal Control	Qiang Li et.al.	2511.00551	translate	read	null
2025-11-01	Robust Single-Agent Reinforcement Learning for Regional Traffic Signal Control Under Demand Fluctuations	Qiang Li et.al.	2511.00549	translate	read	null
2025-11-01	ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation	Panwang Pan et.al.	2511.00511	translate	read	null
2025-11-01	GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining	Chunyu Wei et.al.	2511.00457	translate	read	null
2025-11-01	Bootstrap Off-policy with World Model	Guojian Zhan et.al.	2511.00423	translate	read	null
2025-11-01	UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings	Zhibin Lan et.al.	2511.00405	translate	read	link
2025-11-01	CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks	Long Li et.al.	2511.00396	translate	read	null
2025-11-01	VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning	Xuanle Zhao et.al.	2511.00391	translate	read	link
2025-11-01	Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond	Fan Zhang et.al.	2511.00389	translate	read	null
2025-11-01	Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict	Chaochen Wu et.al.	2511.00370	translate	read	null
