Reinforcement Learning - 2025-11
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-11-06 | FoodRL: A Reinforcement Learning Ensembling Framework For In-Kind Food Donation Forecasting | Esha Sharma et.al. | 2511.04865 | translate | read | null |
| 2025-11-06 | Quantum Boltzmann Machines for Sample-Efficient Reinforcement Learning | Thore Gerlach et.al. | 2511.04856 | translate | read | null |
| 2025-11-06 | Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning | NVIDIA et.al. | 2511.04831 | translate | read | null |
| 2025-11-06 | Unified Multimodal Diffusion Forcing for Forceful Manipulation | Zixuan Huang et.al. | 2511.04812 | translate | read | null |
| 2025-11-06 | Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models | Chenxi Liu et.al. | 2511.04800 | translate | read | null |
| 2025-11-05 | SMART-WRITE: Adaptive Learning-based Write Energy Optimization for Phase Change Memory | Mahek Desai et.al. | 2511.04713 | translate | read | null |
| 2025-11-05 | NCSAC: Effective Neural Community Search via Attribute-augmented Conductance | Longlong Lin et.al. | 2511.04712 | translate | read | null |
| 2025-11-06 | GentleHumanoid: Learning Upper-body Compliance for Contact-rich Human and Object Interaction | Qingzhou Lu et.al. | 2511.04679 | translate | read | null |
| 2025-11-06 | Forgetting is Everywhere | Ben Sanati et.al. | 2511.04666 | translate | read | null |
| 2025-11-06 | Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning | Hampus Åström et.al. | 2511.04598 | translate | read | null |
| 2025-11-06 | End-to-End Reinforcement Learning of Koopman Models for eNMPC of an Air Separation Unit | Daniel Mayfrank et.al. | 2511.04522 | translate | read | null |
| 2025-11-06 | V-Thinker: Interactive Thinking with Images | Runqi Qiao et.al. | 2511.04460 | translate | read | null |
| 2025-11-06 | Fitting Reinforcement Learning Model to Behavioral Data under Bandits | Hao Zhu et.al. | 2511.04454 | translate | read | null |
| 2025-11-06 | The Peril of Preference: Why GRPO fails on Ordinal Rewards | Anisha Garg et.al. | 2511.04439 | translate | read | null |
| 2025-11-06 | Temporal Action Selection for Action Chunking | Yueyang Weng et.al. | 2511.04421 | translate | read | null |
| 2025-11-06 | GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies | Maëlic Neau et.al. | 2511.04357 | translate | read | null |
| 2025-11-06 | MacroNav: Multi-Task Context Representation Learning Enables Efficient Navigation in Unknown Environments | Kuankuan Sima et.al. | 2511.04320 | translate | read | null |
| 2025-11-06 | GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents | Jian Mu et.al. | 2511.04307 | translate | read | null |
| 2025-11-06 | Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference | Matteo Cercola et.al. | 2511.04286 | translate | read | null |
| 2025-11-06 | RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization | Zeng Zhiyuan et.al. | 2511.04285 | translate | read | null |
| 2025-11-06 | SSPO: Subsentence-level Policy Optimization | Kun Yang et.al. | 2511.04256 | translate | read | null |
| 2025-11-06 | Can Context Bridge the Reality Gap? Sim-to-Real Transfer of Context-Aware Policies | Marco Iannotta et.al. | 2511.04249 | translate | read | null |
| 2025-11-06 | Shared Spatial Memory Through Predictive Coding | Zhengru Fang et.al. | 2511.04235 | translate | read | null |
| 2025-11-06 | Opus: A Quantitative Framework for Workflow Evaluation | Alan Seroul et.al. | 2511.04220 | translate | read | null |
| 2025-11-06 | Black-Box Guardrail Reverse-engineering Attack | Hongwei Yao et.al. | 2511.04215 | translate | read | null |
| 2025-11-06 | PUL-SLAM: Path-Uncertainty Co-Optimization with Lightweight Stagnation Detection for Efficient Robotic Exploration | Yizhen Yin et.al. | 2511.04180 | translate | read | null |
| 2025-11-06 | Deep reinforcement learning based navigation of a jellyfish-like swimmer in flows with obstacles | Yihao Chen et.al. | 2511.04156 | translate | read | null |
| 2025-11-06 | Exchange Policy Optimization Algorithm for Semi-Infinite Safe Reinforcement Learning | Jiaming Zhang et.al. | 2511.04147 | translate | read | null |
| 2025-11-06 | BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning | Yitang Li et.al. | 2511.04131 | translate | read | null |
| 2025-11-06 | RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning | Xinyuan Li et.al. | 2511.04120 | translate | read | null |
| 2025-11-06 | CBMC-V3: A CNS-inspired Control Framework Towards Manipulation Agility with SNN | Yanbo Pang et.al. | 2511.04109 | translate | read | null |
| 2025-11-06 | Necessary and Sufficient Conditions for the Optimization-Based Concurrent Execution of Learned Robotic Tasks | Sheikh A. Tahmid et.al. | 2511.04054 | translate | read | null |
| 2025-11-06 | Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots | Yushi Wang et.al. | 2511.03996 | translate | read | null |
| 2025-11-06 | Adaptive Temporal Refinement: Continuous Depth Allocation and Distance Regression for Efficient Action Localization | Ibne Farabi Shihab et.al. | 2511.03943 | translate | read | null |
| 2025-11-06 | RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods | Raghav Sharma et.al. | 2511.03939 | translate | read | null |
| 2025-11-05 | Learning to shine: Neuroevolution enables optical control of phase transitions | Sraddha Agrawal et.al. | 2511.03895 | translate | read | null |
| 2025-11-05 | Investigating Robot Control Policy Learning for Autonomous X-ray-guided Spine Procedures | Florence Klitzner et.al. | 2511.03882 | translate | read | null |
| 2025-11-05 | From Static to Dynamic: Enhancing Offline-to-Online Reinforcement Learning via Energy-Guided Diffusion Stratification | Lipeng Zu et.al. | 2511.03828 | translate | read | null |
| 2025-11-05 | Scaling Agent Learning via Experience Synthesis | Zhaorun Chen et.al. | 2511.03773 | translate | read | link |
| 2025-11-05 | Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning | Richard Dewey et.al. | 2511.03724 | translate | read | null |
| 2025-11-05 | Shrinking the Variance: Shrinkage Baselines for Reinforcement Learning with Verifiable Rewards | Guanning Zeng et.al. | 2511.03710 | translate | read | null |
| 2025-11-05 | AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing | Mohsen Ahmadzadeh et.al. | 2511.03697 | translate | read | null |
| 2025-11-05 | Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL | Lipeng Zu et.al. | 2511.03695 | translate | read | null |
| 2025-11-05 | Simulation-Based Validation of an Integrated 4D/5D Digital-Twin Framework for Predictive Construction Control | Atena Khoshkonesh et.al. | 2511.03684 | translate | read | null |
| 2025-11-05 | DQN Performance with Epsilon Greedy Policies and Prioritized Experience Replay | Daniel Perkins et.al. | 2511.03670 | translate | read | null |
| 2025-11-05 | Towards Formalizing Reinforcement Learning Theory | Shangtong Zhang et.al. | 2511.03618 | translate | read | null |
| 2025-11-05 | Going Beyond Expert Performance via Deep Implicit Imitation Reinforcement Learning | Iason Chrysomallis et.al. | 2511.03616 | translate | read | null |
| 2025-11-05 | Tensor-Efficient High-Dimensional Q-learning | Junyi Wu et.al. | 2511.03595 | translate | read | null |
| 2025-11-05 | PerfDojo: Automated ML Library Generation for Heterogeneous Architectures | Andrei Ivanov et.al. | 2511.03586 | translate | read | null |
| 2025-11-05 | Imitation Learning in the Deep Learning Era: A Novel Taxonomy and Recent Advances | Iason Chrysomallis et.al. | 2511.03565 | translate | read | null |
| 2025-11-05 | Learning Without Critics? Revisiting GRPO in Classical Reinforcement Learning Environments | Bryan L. M. de Oliveira et.al. | 2511.03527 | translate | read | null |
| 2025-11-05 | Reinforcement Learning Using known Invariances | Alexandru Cioba et.al. | 2511.03473 | translate | read | null |
| 2025-11-05 | Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG | Longpeng Qiu et.al. | 2511.03410 | translate | read | null |
| 2025-11-05 | Adaptable Hindsight Experience Replay for Search-Based Learning | Alexandros Vazaios et.al. | 2511.03405 | translate | read | null |
| 2025-11-05 | Learning Communication Skills in Multi-task Multi-agent Deep Reinforcement Learning | Changxi Zhu et.al. | 2511.03348 | translate | read | null |
| 2025-11-05 | DRL-Based Robust Multi-Timescale Anti-Jamming Approaches under State Uncertainty | Haoqin Zhao et.al. | 2511.03305 | translate | read | null |
| 2025-11-05 | Multi-Objective Adaptive Rate Limiting in Microservices Using Deep Reinforcement Learning | Ning Lyu et.al. | 2511.03279 | translate | read | null |
| 2025-11-05 | Climate Adaptation with Reinforcement Learning: Economic vs. Quality of Life Adaptation Pathways | Miguel Costa et.al. | 2511.03243 | translate | read | null |
| 2025-11-05 | Incorporating Quality of Life in Climate Adaptation Planning via Reinforcement Learning | Miguel Costa et.al. | 2511.03238 | translate | read | null |
| 2025-11-05 | Collaborative Assembly Policy Learning of a Sightless Robot | Zeqing Zhang et.al. | 2511.03189 | translate | read | null |
| 2025-11-05 | Periodic Skill Discovery | Jonghae Park et.al. | 2511.03187 | translate | read | null |
| 2025-11-05 | Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control | Rewida Ali et.al. | 2511.03181 | translate | read | null |
| 2025-11-05 | Optimizing Earth-Moon Transfer and Cislunar Navigation: Integrating Low-Energy Trajectories, AI Techniques and GNSS-R Technologies | Arsalan Muhammad et.al. | 2511.03173 | translate | read | null |
| 2025-11-05 | Learning Natural and Robust Hexapod Locomotion over Complex Terrains via Motion Priors based on Deep Reinforcement Learning | Xin Liu et.al. | 2511.03167 | translate | read | null |
| 2025-11-05 | Accelerating inverse materials design using generative diffusion models with reinforcement learning | Junwu Chen et.al. | 2511.03112 | translate | read | null |
| 2025-11-05 | Scaling Multi-Agent Environment Co-Design with Diffusion Models | Hao Xiang Li et.al. | 2511.03100 | translate | read | null |
| 2025-11-04 | Leveraging Discrete Function Decomposability for Scientific Design | James C. Bowden et.al. | 2511.03032 | translate | read | null |
| 2025-11-04 | Value of Information-Enhanced Exploration in Bootstrapped DQN | Stergios Plataniotis et.al. | 2511.02969 | translate | read | null |
| 2025-11-04 | Digital Twin-Driven Pavement Health Monitoring and Maintenance Optimization Using Graph Neural Networks | Mohsin Mahmud Topu et.al. | 2511.02957 | translate | read | null |
| 2025-11-04 | Audience Amplified: Virtual Audiences in Asynchronously Performed AR Theater | You-Jin Kim et.al. | 2511.02807 | translate | read | null |
| 2025-11-04 | MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning | Qianhao Yuan et.al. | 2511.02805 | translate | read | null |
| 2025-11-04 | From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos | Xun Wang et.al. | 2511.02762 | translate | read | null |
| 2025-11-04 | Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning | Bowen Jin et.al. | 2511.02755 | translate | read | null |
| 2025-11-04 | VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models | Zhicheng Zhang et.al. | 2511.02712 | translate | read | null |
| 2025-11-04 | Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs | Georgios Tzannetos et.al. | 2511.02690 | translate | read | null |
| 2025-11-04 | RL-Aided Cognitive ISAC: Robust Detection and Sensing-Communication Trade-offs | Adam Umra et.al. | 2511.02672 | translate | read | null |
| 2025-11-04 | Natural-gas storage modelling by deep reinforcement learning | Tiziano Balaconi et.al. | 2511.02646 | translate | read | null |
| 2025-11-04 | Adaptive GR(1) Specification Repair for Liveness-Preserving Shielding in Reinforcement Learning | Tiberiu-Andrei Georgescu et.al. | 2511.02605 | translate | read | null |
| 2025-11-04 | Directional-Clamp PPO | Gilad Karpel et.al. | 2511.02577 | translate | read | null |
| 2025-11-04 | Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning | Yixiu Mao et.al. | 2511.02567 | translate | read | null |
| 2025-11-04 | An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems | Changhao Miao et.al. | 2511.02525 | translate | read | null |
| 2025-11-04 | Dexterous Robotic Piano Playing at Scale | Le Chen et.al. | 2511.02504 | translate | read | null |
| 2025-11-04 | Auditable-choice reframing unlocks RL-based verification for open-ended tasks | Mengyu Zhang et.al. | 2511.02463 | translate | read | null |
| 2025-11-04 | ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension | Duo Xu et.al. | 2511.02415 | translate | read | null |
| 2025-11-04 | Large-scale automatic carbon ion treatment planning for head and neck cancers via parallel multi-agent reinforcement learning | Jueye Zhang et.al. | 2511.02314 | translate | read | null |
| 2025-11-04 | Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning | Beyazit Yalcinkaya et.al. | 2511.02304 | translate | read | null |
| 2025-11-04 | Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation | Zhiwei Zhang et.al. | 2511.02303 | translate | read | null |
| 2025-11-04 | Reinforcement learning based data assimilation for unknown state model | Ziyi Wang et.al. | 2511.02286 | translate | read | null |
| 2025-11-04 | SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning | Fangxun Shu et.al. | 2511.02280 | translate | read | null |
| 2025-11-04 | Structural Plasticity as Active Inference: A Biologically-Inspired Architecture for Homeostatic Control | Brennen A. Hill et.al. | 2511.02241 | translate | read | null |
| 2025-11-04 | Learning Interactive World Model for Object-Centric Reinforcement Learning | Fan Feng et.al. | 2511.02225 | translate | read | null |
| 2025-11-04 | Optimizing Multi-Lane Intersection Performance in Mixed Autonomy Environments | Manonmani Sekar et.al. | 2511.02217 | translate | read | null |
| 2025-11-04 | Adaptive Cooperative Transmission Design for Ultra-Reliable Low-Latency Communications via Deep Reinforcement Learning | Hyemin Yu et.al. | 2511.02216 | translate | read | null |
| 2025-11-04 | Training Proactive and Personalized LLM Agents | Weiwei Sun et.al. | 2511.02208 | translate | read | null |
| 2025-11-04 | A Quantitative Comparison of Centralised and Distributed Reinforcement Learning-Based Control for Soft Robotic Arms | Linxin Hou et.al. | 2511.02192 | translate | read | null |
| 2025-11-03 | JaxMARL-HFT: GPU-Accelerated Large-Scale Multi-Agent Reinforcement Learning for High-Frequency Trading | Valentin Mohl et.al. | 2511.02136 | translate | read | null |
| 2025-11-03 | Second-Order Policy Gradient Methods for the Linear Quadratic Regulator | Amirreza Valaei et.al. | 2511.02095 | translate | read | null |
| 2025-11-03 | Automated Reward Design for Gran Turismo | Michel Ma et.al. | 2511.02094 | translate | read | null |
| 2025-11-03 | Deep Reinforcement Learning for Multi-flow Routing in Heterogeneous Wireless Networks | Brian Kim et.al. | 2511.02030 | translate | read | null |
| 2025-11-03 | ABIDES-MARL: A Multi-Agent Reinforcement Learning Environment for Endogenous Price Formation and Execution in a Limit Order Book | Patrick Cheridito et.al. | 2511.02016 | translate | read | null |
| 2025-11-02 | Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR | Abdelaziz Bounhar et.al. | 2511.01937 | translate | read | link |
| 2025-11-02 | Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch | Yirong Zeng et.al. | 2511.01934 | translate | read | null |
| 2025-11-03 | GenDexHand: Generative Simulation for Dexterous Hands | Feng Chen et.al. | 2511.01791 | translate | read | null |
| 2025-11-03 | MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll | Alexander Schperberg et.al. | 2511.01774 | translate | read | null |
| 2025-11-03 | RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks | Mian Wu et.al. | 2511.01758 | translate | read | null |
| 2025-11-03 | Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding | Jungyeon Koh et.al. | 2511.01695 | translate | read | null |
| 2025-11-03 | Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward | Xiaogang Xu et.al. | 2511.01645 | translate | read | null |
| 2025-11-03 | Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models | Xiaoyu Zhan et.al. | 2511.01618 | translate | read | null |
| 2025-11-03 | L2T-Tune:LLM-Guided Hybrid Database Tuning with LHS and TD3 | Xinyue Yang et.al. | 2511.01602 | translate | read | null |
| 2025-11-03 | Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning | Aditya Kapoor et.al. | 2511.01554 | translate | read | null |
| 2025-11-03 | TPS-Bench: Evaluating AI Agents’ Tool Planning & Scheduling Abilities in Compounding Tasks | Hanwen Xu et.al. | 2511.01527 | translate | read | null |
| 2025-11-03 | BARD: budget-aware reasoning distillation | Lujie Niu et.al. | 2511.01470 | translate | read | null |
| 2025-11-03 | Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis | Yuhang Huang et.al. | 2511.01425 | translate | read | null |
| 2025-11-03 | Modulation of temporal decision-making in a deep reinforcement learning agent under the dual-task paradigm | Amrapali Pednekar et.al. | 2511.01415 | translate | read | null |
| 2025-11-03 | AoI-Aware Machine Learning for Constrained Multimodal Sensing-Aided Communications | Abolfazl Zakeri et.al. | 2511.01406 | translate | read | null |
| 2025-11-03 | Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization | Ziqi Wang et.al. | 2511.01374 | translate | read | null |
| 2025-11-03 | Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series | Wenrui Cai et.al. | 2511.01354 | translate | read | null |
| 2025-11-03 | Diffusion-Based Solver for CNF Placement on the Cloud-Continuum | Álvaro Vázquez Rodríguez et.al. | 2511.01343 | translate | read | null |
| 2025-11-03 | RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models | Hongyin Zhang et.al. | 2511.01331 | translate | read | null |
| 2025-11-03 | From Pixels to Cooperation Multi Agent Reinforcement Learning based on Multimodal World Models | Sureyya Akin et.al. | 2511.01310 | translate | read | null |
| 2025-11-03 | Optimizing Electric Vehicle Charging Station Placement Using Reinforcement Learning and Agent-Based Simulations | Minh-Duc Nguyen et.al. | 2511.01218 | translate | read | null |
| 2025-11-03 | Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering | Riddhi Jain et.al. | 2511.01213 | translate | read | null |
| 2025-11-03 | DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection | Guoxin Ma et.al. | 2511.01192 | translate | read | null |
| 2025-11-03 | Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning | Ru Wang et.al. | 2511.01191 | translate | read | null |
| 2025-11-03 | DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models | Ruofan Zhang et.al. | 2511.01170 | translate | read | null |
| 2025-11-02 | SLAP: Shortcut Learning for Abstract Planning | Y. Isabel Liu et.al. | 2511.01107 | translate | read | null |
| 2025-11-02 | HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning | Yujian Liu et.al. | 2511.01104 | translate | read | null |
| 2025-11-02 | Deployable Vision-driven UAV River Navigation via Human-in-the-loop Preference Alignment | Zihan Wang et.al. | 2511.01083 | translate | read | null |
| 2025-11-02 | Predictive Auxiliary Learning for Belief-based Multi-Agent Systems | Qinwei Huang et.al. | 2511.01078 | translate | read | null |
| 2025-11-02 | Quantum Reinforcement Learning for 6G and Beyond Wireless Networks | Dinh-Hieu Tran et.al. | 2511.01070 | translate | read | null |
| 2025-11-02 | Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning | Wenjin Liu et.al. | 2511.01016 | translate | read | link |
| 2025-11-02 | IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation | Bosi Wen et.al. | 2511.01014 | translate | read | null |
| 2025-11-02 | MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL | Haolin Yang et.al. | 2511.01008 | translate | read | link |
| 2025-11-02 | GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies | Ziye Wang et.al. | 2511.00998 | translate | read | null |
| 2025-11-02 | Optimizing Energy and Latency in 6G Smart Cities with Edge CyberTwins | Amine Abouaomar et.al. | 2511.00955 | translate | read | null |
| 2025-11-02 | KFCPO: Kronecker-Factored Approximated Constrained Policy Optimization | Joonyoung Lim et.al. | 2511.00880 | translate | read | null |
| 2025-11-02 | Optimal Undulatory Swimming with Constrained Deformation and Actuation Intervals | Fumiya Tokoro et.al. | 2511.00816 | translate | read | null |
| 2025-11-02 | Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games | Runyu Lu et.al. | 2511.00811 | translate | read | null |
| 2025-11-02 | Do Math Reasoning LLMs Help Predict the Impact of Public Transit Events? | Bowen Fang et.al. | 2511.00808 | translate | read | null |
| 2025-11-02 | Logic-informed reinforcement learning for cross-domain optimization of large-scale cyber-physical systems | Guangxi Wan et.al. | 2511.00806 | translate | read | null |
| 2025-11-02 | GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents | Jie JW Wu et.al. | 2511.00802 | translate | read | null |
| 2025-11-02 | Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration | Yan Sun et.al. | 2511.00794 | translate | read | null |
| 2025-11-02 | Power Control Based on Multi-Agent Deep Q Network for D2D Communication | Shi Gengtian et.al. | 2511.00767 | translate | read | null |
| 2025-11-01 | Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries | Minghe Shen et.al. | 2511.00710 | translate | read | null |
| 2025-11-01 | PreferThinker: Reasoning-based Personalized Image Preference Assessment | Shengqi Xu et.al. | 2511.00609 | translate | read | null |
| 2025-11-01 | OpenSIR: Open-Ended Self-Improving Reasoner | Wai-Chung Kwan et.al. | 2511.00602 | translate | read | link |
| 2025-11-01 | Improving Robustness to Out-of-Distribution States in Imitation Learning via Deep Koopman-Boosted Diffusion Policy | Dianye Huang et.al. | 2511.00555 | translate | read | null |
| 2025-11-01 | Single-agent Reinforcement Learning Model for Regional Adaptive Traffic Signal Control | Qiang Li et.al. | 2511.00551 | translate | read | null |
| 2025-11-01 | Robust Single-Agent Reinforcement Learning for Regional Traffic Signal Control Under Demand Fluctuations | Qiang Li et.al. | 2511.00549 | translate | read | null |
| 2025-11-01 | ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation | Panwang Pan et.al. | 2511.00511 | translate | read | null |
| 2025-11-01 | GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining | Chunyu Wei et.al. | 2511.00457 | translate | read | null |
| 2025-11-01 | Bootstrap Off-policy with World Model | Guojian Zhan et.al. | 2511.00423 | translate | read | null |
| 2025-11-01 | UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings | Zhibin Lan et.al. | 2511.00405 | translate | read | link |
| 2025-11-01 | CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks | Long Li et.al. | 2511.00396 | translate | read | null |
| 2025-11-01 | VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning | Xuanle Zhao et.al. | 2511.00391 | translate | read | link |
| 2025-11-01 | Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond | Fan Zhang et.al. | 2511.00389 | translate | read | null |
| 2025-11-01 | Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict | Chaochen Wu et.al. | 2511.00370 | translate | read | null |