Reinforcement Learning - 2025-04
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-04-30 | DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | Z. Z. Ren et.al. | 2504.21801 | translate | read | link |
| 2025-04-30 | Reconciling Discrete-Time Mixed Policies and Continuous-Time Relaxed Controls in Reinforcement Learning and Stochastic Control | Rene Carmona et.al. | 2504.21793 | translate | read | null |
| 2025-04-30 | MAGNET: an open-source library for mesh agglomeration by Graph Neural Networks | Paola F. Antonietti et.al. | 2504.21780 | translate | read | null |
| 2025-04-30 | LLM-based Interactive Imitation Learning for Robotic Manipulation | Jonas Werner et.al. | 2504.21769 | translate | read | null |
| 2025-04-30 | LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning | Yiyang Shao et.al. | 2504.21738 | translate | read | null |
| 2025-04-30 | Adaptive 3D UI Placement in Mixed Reality Using Deep Reinforcement Learning | Feiyu Lu et.al. | 2504.21731 | translate | read | null |
| 2025-04-30 | MovementVR: An open-source tool for the study of motor control and learning in virtual reality | Cristina Rossi et.al. | 2504.21696 | translate | read | null |
| 2025-04-30 | Designing Control Barrier Function via Probabilistic Enumeration for Safe Reinforcement Learning Navigation | Luca Marzari et.al. | 2504.21643 | translate | read | null |
| 2025-04-30 | Multi-Goal Dexterous Hand Manipulation using Probabilistic Model-based Reinforcement Learning | Yingzhuo Jiang et.al. | 2504.21585 | translate | read | null |
| 2025-04-30 | SimPRIVE: a Simulation framework for Physical Robot Interaction with Virtual Environments | Federico Nesti et.al. | 2504.21454 | translate | read | null |
| 2025-04-29 | Toward Efficient Exploration by Large Language Model Agents | Dilip Arumugam et.al. | 2504.20997 | translate | read | null |
| 2025-04-29 | XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search | Yiting Zhang et.al. | 2504.20969 | translate | read | null |
| 2025-04-29 | Improvements of Dark Experience Replay and Reservoir Sampling towards Better Balance between Consolidation and Plasticity | Taisuke Kobayashi et.al. | 2504.20932 | translate | read | null |
| 2025-04-29 | ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Ziqing Fan et.al. | 2504.20930 | translate | read | link |
| 2025-04-29 | Exploiting inter-agent coupling information for efficient reinforcement learning of cooperative LQR | Shahbaz P Qadri Syed et.al. | 2504.20927 | translate | read | null |
| 2025-04-29 | A Domain-Agnostic Scalable AI Safety Ensuring Framework | Beomjun Kim et.al. | 2504.20924 | translate | read | null |
| 2025-04-29 | Reinforcement Learning for LLM Reasoning Under Memory Constraints | Alan Lee et.al. | 2504.20834 | translate | read | null |
| 2025-04-29 | A Teacher-Student MPC-PPO Coupled Reinforcement Learning Framework for Winter Temperature Control of Solar Greenhouses in Northern China | Jingxin Yu et.al. | 2504.20815 | translate | read | null |
| 2025-04-29 | SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings | Florian Vahl et.al. | 2504.20808 | translate | read | null |
| 2025-04-29 | Q-Fusion: Diffusing Quantum Circuits | Collin Beaudoin et.al. | 2504.20794 | translate | read | null |
| 2025-04-28 | SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | Wufei Ma et.al. | 2504.20024 | translate | read | null |
| 2025-04-28 | Socially-Aware Autonomous Driving: Inferring Yielding Intentions for Safer Interactions | Jing Wang et.al. | 2504.20004 | translate | read | null |
| 2025-04-28 | Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Adam Younsi et.al. | 2504.19981 | translate | read | null |
| 2025-04-28 | Mesh-Learner: Texturing Mesh with Spherical Harmonics | Yunfei Wan et.al. | 2504.19938 | translate | read | null |
| 2025-04-28 | Automated decision-making for dynamic task assignment at scale | Riccardo Lo Bianco et.al. | 2504.19933 | translate | read | null |
| 2025-04-28 | GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets | Mingqian He et.al. | 2504.19898 | translate | read | null |
| 2025-04-28 | Optimizing the Charging of Open Quantum Batteries using Long Short-Term Memory-Driven Reinforcement Learning | Shadab Zakavati et.al. | 2504.19840 | translate | read | null |
| 2025-04-28 | LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects | Guangyi Liu et.al. | 2504.19838 | translate | read | link |
| 2025-04-28 | Reinforcement Learning-Based Heterogeneous Multi-Task Optimization in Semantic Broadcast Communications | Zhilin Lu et.al. | 2504.19806 | translate | read | null |
| 2025-04-28 | Model-based controller assisted domain randomization in deep reinforcement learning: application to nonlinear powertrain control | Heisei Yonezawa et.al. | 2504.19715 | translate | read | null |
| 2025-04-25 | Generalization Capability for Imitation Learning | Yixiao Wang et.al. | 2504.18538 | translate | read | null |
| 2025-04-25 | Intelligent Attacks and Defense Methods in Federated Learning-enabled Energy-Efficient Wireless Networks | Han Zhang et.al. | 2504.18519 | translate | read | null |
| 2025-04-25 | Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation | Peiyuan Jing et.al. | 2504.18453 | translate | read | null |
| 2025-04-25 | Pushing the boundary on Natural Language Inference | Pablo Miralles-González et.al. | 2504.18376 | translate | read | null |
| 2025-04-25 | Explainable AI for UAV Mobility Management: A Deep Q-Network Approach for Handover Minimization | Irshad A. Meer et.al. | 2504.18371 | translate | read | null |
| 2025-04-25 | Deep Reinforcement Learning Based Navigation with Macro Actions and Topological Maps | Simon Hakenes et.al. | 2504.18300 | translate | read | null |
| 2025-04-25 | Depth-Constrained ASV Navigation with Deep RL and Limited Sensing | Amirhossein Zhalehmehrabi et.al. | 2504.18253 | translate | read | null |
| 2025-04-25 | Aligning Language Models for Icelandic Legal Text Summarization | Þórir Hrafn Harðarson et.al. | 2504.18180 | translate | read | null |
| 2025-04-25 | Offline Learning of Controllable Diverse Behaviors | Mathieu Petitbois et.al. | 2504.18160 | translate | read | null |
| 2025-04-25 | Learning from Less: SINDy Surrogates in RL | Aniket Dixit et.al. | 2504.18113 | translate | read | null |
| 2025-04-24 | Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control | Haochen Wang et.al. | 2504.17771 | translate | read | null |
| 2025-04-24 | Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence | Edward Collins et.al. | 2504.17703 | translate | read | null |
| 2025-04-24 | Applied Sheaf Theory For Multi-agent Artificial Intelligence (Reinforcement Learning) Systems: A Prospectus | Eric Schmid et.al. | 2504.17700 | translate | read | null |
| 2025-04-24 | SAPO-RL: Sequential Actuator Placement Optimization for Fuselage Assembly via Reinforcement Learning | Peng Ye et.al. | 2504.17603 | translate | read | null |
| 2025-04-24 | Mitigating xApp conflicts for efficient network slicing in 6G O-RAN: a graph convolutional-based attention network approach | Sihem Bakri et.al. | 2504.17590 | translate | read | null |
| 2025-04-24 | Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization | Hongshu Guo et.al. | 2504.17578 | translate | read | null |
| 2025-04-24 | Cooperative Task Offloading through Asynchronous Deep Reinforcement Learning in Mobile Edge Computing for Future Networks | Yuelin Liu et.al. | 2504.17526 | translate | read | null |
| 2025-04-24 | Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning | Mingqi Yuan et.al. | 2504.17490 | translate | read | null |
| 2025-04-24 | Comprehend, Divide, and Conquer: Feature Subspace Exploration via Multi-Agent Hierarchical Reinforcement Learning | Weiliang Zhang et.al. | 2504.17356 | translate | read | null |
| 2025-04-24 | Collaborative Multi-Agent Reinforcement Learning for Automated Feature Transformation with Graph-Driven Path Optimization | Xiaohan Huang et.al. | 2504.17355 | translate | read | null |
| 2025-04-23 | Latent Diffusion Planning for Imitation Learning | Amber Xie et.al. | 2504.16925 | translate | read | null |
| 2025-04-23 | Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms | Hsin-Jung Yang et.al. | 2504.16916 | translate | read | null |
| 2025-04-23 | Hybrid Reinforcement Learning and Model Predictive Control for Adaptive Control of Hydrogen-Diesel Dual-Fuel Combustion | Julian Bedei et.al. | 2504.16875 | translate | read | null |
| 2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et.al. | 2504.16855 | translate | read | null |
| 2025-04-23 | SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward | Nicolas Jonason et.al. | 2504.16839 | translate | read | null |
| 2025-04-23 | MEC Task Offloading in AIoT: A User-Centric DRL Model Splitting Inference Scheme | Weixi Li et.al. | 2504.16729 | translate | read | null |
| 2025-04-23 | PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation | Wenxuan Li et.al. | 2504.16693 | translate | read | null |
| 2025-04-23 | Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator | Chenhao Li et.al. | 2504.16680 | translate | read | null |
| 2025-04-23 | Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning | Chris et.al. | 2504.16656 | translate | read | link |
| 2025-04-23 | Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models | Fredy Pokou et.al. | 2504.16635 | translate | read | null |
| 2025-04-22 | TTRL: Test-Time Reinforcement Learning | Yuxin Zuo et.al. | 2504.16084 | translate | read | link |
| 2025-04-22 | LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities | Thomas Schmied et.al. | 2504.16078 | translate | read | null |
| 2025-04-22 | Reinforcement Learning and Metaheuristics for Feynman Integral Reduction | Mao Zeng et.al. | 2504.16045 | translate | read | null |
| 2025-04-22 | The Formation of Production Networks: How Supply Chains Arise from Simple Learning with Minimal Information | Tuong Manh Vu et.al. | 2504.16010 | translate | read | null |
| 2025-04-22 | Making Neural Networks More Suitable for Approximate Clifford+T Circuit Synthesis | Mathias Weiden et.al. | 2504.15990 | translate | read | null |
| 2025-04-22 | Neuroadaptive Haptics: Comparing Reinforcement Learning from Explicit Ratings and Neural Signals for Adaptive XR Systems | Lukas Gehrke et.al. | 2504.15984 | translate | read | null |
| 2025-04-22 | Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning | Wang Lin et.al. | 2504.15932 | translate | read | null |
| 2025-04-22 | StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation | Yinmin Zhong et.al. | 2504.15930 | translate | read | null |
| 2025-04-22 | New Recipe for Semi-supervised Community Detection: Clique Annealing under Crystallization Kinetics | Ling Cheng et.al. | 2504.15927 | translate | read | null |
| 2025-04-22 | GraphEdge: Dynamic Graph Partition and Task Scheduling for GNNs Computing in Edge Network | Wenjing Xiao et.al. | 2504.15905 | translate | read | null |
| 2025-04-21 | VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | Weiye Xu et.al. | 2504.15279 | translate | read | null |
| 2025-04-21 | Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Jie Cheng et.al. | 2504.15275 | translate | read | link |
| 2025-04-21 | FlowReasoner: Reinforcing Query-Level Meta-Agents | Hongcheng Gao et.al. | 2504.15257 | translate | read | link |
| 2025-04-21 | DRAGON: Distributional Rewards Optimize Diffusion Generative Models | Yatong Bai et.al. | 2504.15217 | translate | read | null |
| 2025-04-21 | Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs | Marina Sakharova et.al. | 2504.15210 | translate | read | null |
| 2025-04-21 | Beyond Binary Opinions: A Deep Reinforcement Learning-Based Approach to Uncertainty-Aware Competitive Influence Maximization | Qi Zhang et.al. | 2504.15131 | translate | read | null |
| 2025-04-21 | A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment | Kangyao Huang et.al. | 2504.15129 | translate | read | null |
| 2025-04-21 | Fast-Slow Co-advancing Optimizer: Toward Harmonious Adversarial Training of GAN | Lin Wang et.al. | 2504.15099 | translate | read | null |
| 2025-04-21 | Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL | Simone Papicchio et.al. | 2504.15077 | translate | read | null |
| 2025-04-21 | Energy-Efficient UAV-Mounted RIS for IoT: A Hybrid Energy Harvesting and DRL Approach | Mahmoud M. Salim et.al. | 2504.15043 | translate | read | null |
| 2025-04-18 | Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | Yang Yue et.al. | 2504.13837 | translate | read | link |
| 2025-04-18 | Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Yixuan Even Xu et.al. | 2504.13818 | translate | read | null |
| 2025-04-18 | DiffOG: Differentiable Policy Trajectory Optimization with Generalizability | Zhengtong Xu et.al. | 2504.13807 | translate | read | null |
| 2025-04-18 | Imitation Learning with Precisely Labeled Human Demonstrations | Yilong Song et.al. | 2504.13803 | translate | read | null |
| 2025-04-18 | Bake Two Cakes with One Oven: RL for Defusing Popularity Bias and Cold-start in Third-Party Library Recommendations | Minh Hoang Vuong et.al. | 2504.13772 | translate | read | null |
| 2025-04-18 | A Reinforcement Learning Method to Factual and Counterfactual Explanations for Session-based Recommendation | Han Zhou et.al. | 2504.13632 | translate | read | null |
| 2025-04-18 | Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning | Rohan P. Singh et.al. | 2504.13619 | translate | read | null |
| 2025-04-18 | On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting | Niklas Funk et.al. | 2504.13618 | translate | read | null |
| 2025-04-18 | Compile Scene Graphs with Reinforcement Learning | Zuyao Chen et.al. | 2504.13617 | translate | read | null |
| 2025-04-18 | Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling | Zihao Feng et.al. | 2504.13592 | translate | read | null |
| 2025-04-17 | Energy-Based Reward Models for Robust Language Model Alignment | Anamika Lochab et.al. | 2504.13134 | translate | read | null |
| 2025-04-17 | LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard | Varun Rao et.al. | 2504.13125 | translate | read | null |
| 2025-04-17 | SkyReels-V2: Infinite-length Film Generative Model | Guibin Chen et.al. | 2504.13074 | translate | read | link |
| 2025-04-17 | NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation | Xiangyan Liu et.al. | 2504.13055 | translate | read | link |
| 2025-04-17 | InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning | Zheng Wang et.al. | 2504.13032 | translate | read | null |
| 2025-04-17 | QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning? | Zhouyang Jiang et.al. | 2504.12961 | translate | read | null |
| 2025-04-17 | RL-PINNs: Reinforcement Learning-Driven Adaptive Sampling for Efficient Training of PINNs | Zhenao Song et.al. | 2504.12949 | translate | read | null |
| 2025-04-17 | Image-Editing Specialists: An RLAIF Approach for Diffusion Models | Elior Benarous et.al. | 2504.12833 | translate | read | link |
| 2025-04-17 | Multi-Agent Reinforcement Learning Simulation for Environmental Policy Synthesis | James Rudd-Jones et.al. | 2504.12777 | translate | read | null |
| 2025-04-17 | GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks | Hao Xu et.al. | 2504.12764 | translate | read | link |
| 2025-04-16 | Adapting a World Model for Trajectory Following in a 3D Game | Marko Tot et.al. | 2504.12299 | translate | read | null |
| 2025-04-16 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | Siyan Zhao et.al. | 2504.12216 | translate | read | link |
| 2025-04-16 | Reasoning-Based AI for Startup Evaluation (R.A.I.S.E.): A Memory-Augmented, Multi-Step Decision Framework | Jack Preuveneers et.al. | 2504.12090 | translate | read | null |
| 2025-04-16 | pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild | Jonas Myhre Schiøtt et.al. | 2504.12045 | translate | read | null |
| 2025-04-16 | Evolutionary Reinforcement Learning for Interpretable Decision-Making in Supply Chain Management | Stefano Genetti et.al. | 2504.12023 | translate | read | null |
| 2025-04-16 | Control of Rayleigh-Bénard Convection: Effectiveness of Reinforcement Learning in the Turbulent Regime | Thorben Markmann et.al. | 2504.12000 | translate | read | null |
| 2025-04-16 | A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs | Kihyuk Hong et.al. | 2504.11997 | translate | read | null |
| 2025-04-16 | Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions | Yifei Dong et.al. | 2504.11967 | translate | read | null |
| 2025-04-16 | R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors | Haoyang Wang et.al. | 2504.11946 | translate | read | null |
| 2025-04-16 | VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning | Xuyang Chen et.al. | 2504.11944 | translate | read | null |
| 2025-04-15 | DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | Zhiwei He et.al. | 2504.11456 | translate | read | link |
| 2025-04-15 | A Clean Slate for Offline Reinforcement Learning | Matthew Thomas Jackson et.al. | 2504.11453 | translate | read | null |
| 2025-04-15 | Embodied World Models Emerge from Navigational Task in Open-Ended Environments | Li Jin et.al. | 2504.11419 | translate | read | null |
| 2025-04-15 | Measures of Variability for Risk-averse Policy Gradient | Yudong Luo et.al. | 2504.11412 | translate | read | null |
| 2025-04-15 | Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning | Haiming Wang et.al. | 2504.11354 | translate | read | null |
| 2025-04-15 | A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | Wei Xiong et.al. | 2504.11343 | translate | read | link |
| 2025-04-15 | Multi-Agent Reinforcement Learning for Greenhouse Gas Offset Credit Markets | Liam Welsh et.al. | 2504.11258 | translate | read | null |
| 2025-04-15 | A Rollout-Based Algorithm and Reward Function for Efficient Resource Allocation in Business Processes | Jeroen Middelhuis et.al. | 2504.11250 | translate | read | null |
| 2025-04-15 | Next-Future: Sample-Efficient Policy Learning for Robotic-Arm Tasks | Fikrican Özgür et.al. | 2504.11247 | translate | read | null |
| 2025-04-15 | Revealing Covert Attention by Analyzing Human and Reinforcement Learning Agent Gameplay | Henrik Krauss et.al. | 2504.11118 | translate | read | null |
| 2025-04-14 | Weight Ensembling Improves Reasoning in Language Models | Xingyu Dang et.al. | 2504.10478 | translate | read | null |
| 2025-04-14 | Co-optimizing Physical Reconfiguration Parameters and Controllers for an Origami-inspired Reconfigurable Manipulator | Zhe Chen et.al. | 2504.10474 | translate | read | null |
| 2025-04-14 | GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | Xiaobo Xia et.al. | 2504.10458 | translate | read | link |
| 2025-04-14 | The Communication and Computation Trade-off in Wireless Semantic Communications | Xuyang Chen et.al. | 2504.10357 | translate | read | null |
| 2025-04-14 | Heimdall: test-time scaling on the generative verification | Wenlei Shi et.al. | 2504.10337 | translate | read | null |
| 2025-04-14 | Flying Hand: End-Effector-Centric Framework for Versatile Aerial Manipulation Teleoperation and Policy Learning | Guanqi He et.al. | 2504.10334 | translate | read | null |
| 2025-04-14 | InstructEngine: Instruction-driven Text-to-Image Alignment | Xingyu Lu et.al. | 2504.10329 | translate | read | null |
| 2025-04-14 | Vision based driving agent for race car simulation environments | Gergely Bári et.al. | 2504.10266 | translate | read | null |
| 2025-04-14 | Adaptive Sensor Steering Strategy Using Deep Reinforcement Learning for Dynamic Data Acquisition in Digital Twins | Collins O. Ogbodo et.al. | 2504.10248 | translate | read | null |
| 2025-04-14 | Deep Reasoning Translation via Reinforcement Learning | Jiaan Wang et.al. | 2504.10187 | translate | read | null |
| 2025-04-11 | Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing | Vinal Asodia et.al. | 2504.08704 | translate | read | null |
| 2025-04-11 | Pobogot – An Open-Hardware Open-Source Low Cost Robot for Swarm Robotics | Alessia Loi et.al. | 2504.08686 | translate | read | null |
| 2025-04-11 | Reinforcement Learning-Driven Plant-Wide Refinery Planning Using Model Decomposition | Zhouchang Li et.al. | 2504.08642 | translate | read | null |
| 2025-04-11 | Neural Fidelity Calibration for Informative Sim-to-Real Adaptation | Youwei Yu et.al. | 2504.08604 | translate | read | null |
| 2025-04-11 | SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning | Peixian Ma et.al. | 2504.08600 | translate | read | link |
| 2025-04-11 | Playpen: An Environment for Exploring Learning Through Conversational Interaction | Nicola Horst et.al. | 2504.08590 | translate | read | link |
| 2025-04-11 | Slicing the Gaussian Mixture Wasserstein Distance | Moritz Piening et.al. | 2504.08544 | translate | read | null |
| 2025-04-11 | Diffusion Models for Robotic Manipulation: A Survey | Rosa Wolf et.al. | 2504.08438 | translate | read | null |
| 2025-04-11 | Belief States for Cooperative Multi-Agent Reinforcement Learning under Partial Observability | Paul J. Pritz et.al. | 2504.08417 | translate | read | null |
| 2025-04-11 | Scalable Conflict-free Decision Making with Photons | Kohei Konaka et.al. | 2504.08331 | translate | read | null |
| 2025-04-10 | Perception-R1: Pioneering Perception Policy with Reinforcement Learning | En Yu et.al. | 2504.07954 | translate | read | link |
| 2025-04-10 | Echo: An Open-Source, Low-Cost Teleoperation System with Force Feedback for Dataset Collection in Robot Learning | Artem Bazhenov et.al. | 2504.07939 | translate | read | null |
| 2025-04-10 | Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining | Rosie Zhao et.al. | 2504.07912 | translate | read | link |
| 2025-04-10 | Fast Adaptation with Behavioral Foundation Models | Harshit Sikchi et.al. | 2504.07896 | translate | read | null |
| 2025-04-10 | 2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization | Mengyang Li et.al. | 2504.07856 | translate | read | null |
| 2025-04-10 | Genetic Programming with Reinforcement Learning Trained Transformer for Real-World Dynamic Scheduling Problems | Xian Chen et.al. | 2504.07779 | translate | read | null |
| 2025-04-10 | Harnessing Equivariance: Modeling Turbulence with Graph Neural Networks | Marius Kurz et.al. | 2504.07741 | translate | read | null |
| 2025-04-10 | Relaxing the Markov Requirements on Reinforcement Learning Under Weak Partial Ignorability | MaryLena Bleile et.al. | 2504.07722 | translate | read | null |
| 2025-04-10 | Sim-to-Real Transfer in Reinforcement Learning for Maneuver Control of a Variable-Pitch MAV | Zhikun Wang et.al. | 2504.07694 | translate | read | null |
| 2025-04-10 | VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Haozhan Shen et.al. | 2504.07615 | translate | read | link |
| 2025-04-09 | Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning | Chenjie Hao et.al. | 2504.07095 | translate | read | link |
| 2025-04-09 | AssistanceZero: Scalably Solving Assistance Games | Cassidy Laidlaw et.al. | 2504.07091 | translate | read | link |
| 2025-04-09 | A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility | Andreas Hochlehnert et.al. | 2504.07086 | translate | read | link |
| 2025-04-09 | To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning | Tian Qin et.al. | 2504.07052 | translate | read | null |
| 2025-04-09 | Free Random Projection for In-Context Reinforcement Learning | Tomohiro Hayase et.al. | 2504.06983 | translate | read | null |
| 2025-04-09 | VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | Xinhao Li et.al. | 2504.06958 | translate | read | link |
| 2025-04-09 | Regret Bounds for Robust Online Decision Making | Alexander Appel et.al. | 2504.06820 | translate | read | null |
| 2025-04-09 | Interactive Expressive Motion Generation Using Dynamic Movement Primitives | Till Hielscher et.al. | 2504.06735 | translate | read | null |
| 2025-04-09 | Learning global control of underactuated systems with Model-Based Reinforcement Learning | Niccolò Turcato et.al. | 2504.06721 | translate | read | null |
| 2025-04-09 | SDHN: Skewness-Driven Hypergraph Networks for Enhanced Localized Multi-Robot Coordination | Delin Zhao et.al. | 2504.06684 | translate | read | null |
| 2025-04-08 | ViTaMIn: Learning Contact-Rich Tasks Through Robot-Free Visuo-Tactile Manipulation Interface | Fangchen Liu et.al. | 2504.06156 | translate | read | null |
| 2025-04-08 | Adversarial Training of Reward Models | Alexander Bukharin et.al. | 2504.06141 | translate | read | null |
| 2025-04-08 | A Multimedia Analytics Model for the Foundation Model Era | Marcel Worring et.al. | 2504.06138 | translate | read | null |
| 2025-04-08 | Accelerating Vehicle Routing via AI-Initialized Genetic Algorithms | Ido Greenberg et.al. | 2504.06126 | translate | read | null |
| 2025-04-08 | Robo-taxi Fleet Coordination at Scale via Reinforcement Learning | Luigi Tresca et.al. | 2504.06125 | translate | read | link |
| 2025-04-09 | Leanabell-Prover: Posttraining Scaling in Formal Reasoning | Jingyuan Zhang et.al. | 2504.06122 | translate | read | link |
| 2025-04-08 | Trust-Region Twisted Policy Improvement | Joery A. de Vries et.al. | 2504.06048 | translate | read | null |
| 2025-04-08 | Information-Theoretic Reward Decomposition for Generalizable RLHF | Liyuan Mao et.al. | 2504.06020 | translate | read | null |
| 2025-04-08 | Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models | J. S. van Hulst et.al. | 2504.05978 | translate | read | null |
| 2025-04-08 | AEGIS: Human Attention-based Explainable Guidance for Intelligent Vehicle Systems | Zhuoli Zhuang et.al. | 2504.05950 | translate | read | null |
| 2025-04-07 | RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception | Hui Zhang et.al. | 2504.05287 | translate | read | link |
| 2025-04-07 | Concise Reasoning via Reinforcement Learning | Mehdi Fatemi et.al. | 2504.05185 | translate | read | link |
| 2025-04-07 | Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval | Kidist Amde Mekonnen et.al. | 2504.05181 | translate | read | link |
| 2025-04-07 | RLBayes: a Bayesian Network Structure Learning Algorithm via Reinforcement Learning-Based Search Strategy | Mingcan Wang et.al. | 2504.05167 | translate | read | null |
| 2025-04-07 | A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks | Leonardo Kanashiro Felizardo et.al. | 2504.05150 | translate | read | link |
| 2025-04-08 | VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks | Yu Yue et.al. | 2504.05118 | translate | read | null |
| 2025-04-07 | Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning | Anja Surina et.al. | 2504.05108 | translate | read | null |
| 2025-04-08 | Attention-Augmented Inverse Reinforcement Learning with Graph Convolutions for Multi-Agent Task Allocation | Huilin Yin et.al. | 2504.05045 | translate | read | null |
| 2025-04-07 | Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning | Bibek Poudel et.al. | 2504.05018 | translate | read | null |
| 2025-04-07 | Wavelet Policy: Imitation Policy Learning in Frequency Domain with Wavelet Transforms | Changchuan Yang et.al. | 2504.04991 | translate | read | link |
| 2025-04-04 | Align to Structure: Aligning Large Language Models with Structural Information | Zae Myung Kim et.al. | 2504.03622 | translate | read | null |
| 2025-04-04 | Optimization of a Triangular Delaunay Mesh Generator using Reinforcement Learning | Will Thacher et.al. | 2504.03610 | translate | read | null |
| 2025-04-04 | Dexterous Manipulation through Imitation Learning: A Survey | Shan An et.al. | 2504.03515 | translate | read | null |
| 2025-04-04 | Learning Dual-Arm Coordination for Grasping Large Flat Objects | Yongliang Wang et.al. | 2504.03500 | translate | read | null |
| 2025-04-04 | Optimizing Quantum Circuits via ZX Diagrams using Reinforcement Learning and Graph Neural Networks | Alexander Mattick et.al. | 2504.03429 | translate | read | null |
| 2025-04-04 | DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models | Sathish Kumar et.al. | 2504.03423 | translate | read | null |
| 2025-04-04 | Autonomous state-space segmentation for Deep-RL sparse reward scenarios | Gianluca Maselli et.al. | 2504.03420 | translate | read | null |
| 2025-04-04 | Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning | Sanghwan Bae et.al. | 2504.03380 | translate | read | null |
| 2025-04-04 | Verification of Autonomous Neural Car Control with KeYmaera X | Enguerrand Prebet et.al. | 2504.03272 | translate | read | null |
| 2025-04-04 | Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward | Yanming Wan et.al. | 2504.03206 | translate | read | null |
| 2025-04-03 | Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets | Chuning Zhu et.al. | 2504.02792 | translate | read | link |
| 2025-04-03 | A Numerically Efficient Method to Enhance Model Predictive Control Performance with a Reinforcement Learning Policy | Andrea Ghezzi et.al. | 2504.02710 | translate | read | null |
| 2025-04-03 | Handover and SINR-Aware Path Optimization in 5G-UAV mmWave Communication using DRL | Achilles Kiwanuka Machumilane et.al. | 2504.02688 | translate | read | null |
| 2025-04-03 | Integrating Human Knowledge Through Action Masking in Reinforcement Learning for Operations Research | Mirko Stappert et.al. | 2504.02662 | translate | read | null |
| 2025-04-03 | SymDQN: Symbolic Knowledge and Reasoning in Neural Network-based Reinforcement Learning | Ivo Amador et.al. | 2504.02654 | translate | read | null |
| 2025-04-03 | Solving the Paint Shop Problem with Flexible Management of Multi-Lane Buffers Using Reinforcement Learning and Action Masking | Mirko Stappert et.al. | 2504.02644 | translate | read | null |
| 2025-04-03 | Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving | Daoguang Zan et.al. | 2504.02605 | translate | read | link |
| 2025-04-03 | Regulating Spatial Fairness in a Tripartite Micromobility Sharing System via Reinforcement Learning | Matteo Cederle et.al. | 2504.02597 | translate | read | null |
| 2025-04-03 | LexPam: Legal Procedure Awareness-Guided Mathematical Reasoning | Kepu Zhang et.al. | 2504.02590 | translate | read | null |
| 2025-04-04 | Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme | Yan Ma et.al. | 2504.02587 | translate | read | link |
| 2025-04-02 | OpenCodeReasoning: Advancing Data Distillation for Competitive Coding | Wasi Uddin Ahmad et.al. | 2504.01943 | translate | read | null |
| 2025-04-02 | Overcoming Deceptiveness in Fitness Optimization with Unsupervised Quality-Diversity | Lisa Coiffard et.al. | 2504.01915 | translate | read | null |
| 2025-04-02 | GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | Yanzhou Su et.al. | 2504.01886 | translate | read | link |
| 2025-04-02 | Interpreting Emergent Planning in Model-Free Reinforcement Learning | Thomas Bush et.al. | 2504.01871 | translate | read | null |
| 2025-04-02 | Learning with Imperfect Models: When Multi-step Prediction Mitigates Compounding Error | Anne Somalwar et.al. | 2504.01766 | translate | read | null |
| 2025-04-03 | Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning | Ke Jiang et.al. | 2504.01719 | translate | read | null |
| 2025-04-02 | ToM-RL: Reinforcement Learning Unlocks Theory of Mind in Small LLMs | Yi-Long Lu et.al. | 2504.01698 | translate | read | null |
| 2025-04-02 | 8-DoFs Cable Driven Parallel Robots for Bimanual Teleportation | Hung Hon Cheng et.al. | 2504.01554 | translate | read | null |
| 2025-04-02 | A Robust Model-Based Approach for Continuous-Time Policy Evaluation with Unknown Lévy Process Dynamics | Qihao Ye et.al. | 2504.01482 | translate | read | null |
| 2025-04-02 | Probabilistic Curriculum Learning for Goal-Based Reinforcement Learning | Llewyn Salt et.al. | 2504.01459 | translate | read | null |
(<a href=../Reinforcement_Learning.md>back to Reinforcement Learning</a>)