Reinforcement Learning - 2024-03
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-03-29 | Learning Visual Quadrupedal Loco-Manipulation from Demonstrations | Zhengmao He et.al. | 2403.20328 | translate | read | null |
| 2024-03-29 | Active flow control of a turbulent separation bubble through deep reinforcement learning | Bernat Font et.al. | 2403.20295 | translate | read | null |
| 2024-03-29 | Functional Bilevel Optimization for Machine Learning | Ieva Petrulionyte et.al. | 2403.20233 | translate | read | null |
| 2024-03-29 | Decentralized Multimedia Data Sharing in IoV: A Learning-based Equilibrium of Supply and Demand | Jiani Fan et.al. | 2403.20218 | translate | read | null |
| 2024-03-29 | Biologically-Plausible Topology Improved Spiking Actor Network for Efficient Deep Reinforcement Learning | Duzhen Zhang et.al. | 2403.20163 | translate | read | null |
| 2024-03-29 | CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening | Hei Yi Mak et.al. | 2403.20156 | translate | read | null |
| 2024-03-29 | A Learning-based Incentive Mechanism for Mobile AIGC Service in Decentralized Internet of Vehicles | Jiani Fan et.al. | 2403.20151 | translate | read | null |
| 2024-03-29 | Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation | Jinyeong Park et.al. | 2403.20109 | translate | read | link |
| 2024-03-29 | Reinforcement learning for graph theory, II. Small Ramsey numbers | Mohammad Ghebleh et.al. | 2403.20055 | translate | read | null |
| 2024-03-29 | Nonparametric Bellman Mappings for Reinforcement Learning: Application to Robust Adaptive Filtering | Yuki Akiyama et.al. | 2403.20020 | translate | read | null |
| 2024-03-28 | Human-compatible driving partners through data-regularized self-play reinforcement learning | Daphne Cornelisse et.al. | 2403.19648 | translate | read | link |
| 2024-03-28 | Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics | Norman Di Palo et.al. | 2403.19578 | translate | read | null |
| 2024-03-28 | Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment | Alireza Ganjdanesh et.al. | 2403.19490 | translate | read | null |
| 2024-03-28 | Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization | Teodor V. Marinov et.al. | 2403.19462 | translate | read | null |
| 2024-03-28 | RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation | Chongkai Gao et.al. | 2403.19460 | translate | read | null |
| 2024-03-28 | EDA-Driven Preprocessing for SAT Solving | Zhengyuan Shi et.al. | 2403.19446 | translate | read | null |
| 2024-03-28 | Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model | Qi Gou et.al. | 2403.19443 | translate | read | null |
| 2024-03-28 | Fine-Tuning Language Models with Reward Learning on Policy | Hao Lang et.al. | 2403.19279 | translate | read | link |
| 2024-03-28 | Removing the need for ground truth UWB data collection: self-supervised ranging error correction using deep reinforcement learning | Dieter Coppens et.al. | 2403.19262 | translate | read | null |
| 2024-03-28 | Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement Learning | Wei Duan et.al. | 2403.19253 | translate | read | null |
| 2024-03-27 | Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment | Li Siyao et.al. | 2403.18811 | translate | read | null |
| 2024-03-27 | CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning | Elliot Chane-Sane et.al. | 2403.18765 | translate | read | null |
| 2024-03-27 | Probabilistic Model Checking of Stochastic Reinforcement Learning Policies | Dennis Gross et.al. | 2403.18725 | translate | read | null |
| 2024-03-27 | Fpga-Based Neural Thrust Controller for UAVs | Sharif Azem et.al. | 2403.18703 | translate | read | null |
| 2024-03-27 | Safe and Robust Reinforcement-Learning: Principles and Practice | Taku Yamagata et.al. | 2403.18539 | translate | read | null |
| 2024-03-27 | Bridging the Gap: Regularized Reinforcement Learning for Improved Classical Motion Planning with Safety Modules | Elias Goldsztejn et.al. | 2403.18524 | translate | read | null |
| 2024-03-27 | VersaT2I: Improving Text-to-Image Models with Versatile Reward | Jianshu Guo et.al. | 2403.18493 | translate | read | null |
| 2024-03-27 | Scaling Vision-and-Language Navigation With Offline RL | Valay Bundele et.al. | 2403.18454 | translate | read | null |
| 2024-03-27 | FRESCO: Federated Reinforcement Energy System for Cooperative Optimization | Nicolas Mauricio Cuadrado et.al. | 2403.18444 | translate | read | null |
| 2024-03-27 | Reinforcement learning for graph theory, I. Reimplementation of Wagner’s approach | Salem Al-Yakoob et.al. | 2403.18429 | translate | read | null |
| 2024-03-26 | TractOracle: towards an anatomically-informed reward function for RL-based tractography | Antoine Théberge et.al. | 2403.17845 | translate | read | null |
| 2024-03-26 | Learning the Optimal Power Flow: Environment Design Matters | Thomas Wolgast et.al. | 2403.17831 | translate | read | link |
| 2024-03-26 | Depending on yourself when you should: Mentoring LLM with RL agents to become the master in cybersecurity games | Yikuan Yan et.al. | 2403.17674 | translate | read | null |
| 2024-03-26 | Learning Goal-Directed Object Pushing in Cluttered Scenes with Location-Based Attention | Nils Dengler et.al. | 2403.17667 | translate | read | null |
| 2024-03-26 | Uncertainty-aware Distributional Offline Reinforcement Learning | Xiaocong Chen et.al. | 2403.17646 | translate | read | null |
| 2024-03-26 | PeersimGym: An Environment for Solving the Task Offloading Problem with Reinforcement Learning | Frederico Metelo et.al. | 2403.17637 | translate | read | null |
| 2024-03-26 | Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems | Siyu Wang et.al. | 2403.17634 | translate | read | null |
| 2024-03-26 | LASIL: Learner-Aware Supervised Imitation Learning For Long-term Microscopic Traffic Simulation | Ke Guo et.al. | 2403.17601 | translate | read | link |
| 2024-03-26 | Towards a Zero-Data, Controllable, Adaptive Dialog System | Dirk Väth et.al. | 2403.17582 | translate | read | null |
| 2024-03-26 | VDSC: Enhancing Exploration Timing with Value Discrepancy and State Counts | Marius Captari et.al. | 2403.17542 | translate | read | null |
| 2024-03-25 | An LLM-Based Digital Twin for Optimizing Human-in-the Loop Systems | Hanqing Yang et.al. | 2403.16809 | translate | read | null |
| 2024-03-25 | Enhancing Software Effort Estimation through Reinforcement Learning-based Project Management-Oriented Feature Selection | Haoyang Chen et.al. | 2403.16749 | translate | read | null |
| 2024-03-25 | Deep Reinforcement Learning and Mean-Variance Strategies for Responsible Portfolio Optimization | Fernando Acero et.al. | 2403.16667 | translate | read | null |
| 2024-03-25 | Skill Q-Network: Learning Adaptive Skill Ensemble for Mapless Navigation in Unknown Environments | Hyunki Seong et.al. | 2403.16664 | translate | read | null |
| 2024-03-25 | Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL | Osama Ahmad et.al. | 2403.16652 | translate | read | null |
| 2024-03-25 | CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment | Feiteng Fang et.al. | 2403.16649 | translate | read | link |
| 2024-03-25 | Counter-example guided Imitation Learning of Feedback Controllers from Temporal Logic Specifications | Thao Dang et.al. | 2403.16593 | translate | read | null |
| 2024-03-25 | Arm-Constrained Curriculum Learning for Loco-Manipulation of the Wheel-Legged Robot | Zifan Wang et.al. | 2403.16535 | translate | read | link |
| 2024-03-25 | Towards Cooperative Maneuver Planning in Mixed Traffic at Urban Intersections | Marvin Klimke et.al. | 2403.16478 | translate | read | null |
| 2024-03-25 | If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions | Reza Esfandiarpoor et.al. | 2403.16442 | translate | read | link |
| 2024-03-25 | Physics-informed RL for Maximal Safety Probability Estimation | Hikaru Hoshino et.al. | 2403.16391 | translate | read | null |
| 2024-03-25 | Learning Action-based Representations Using Invariance | Max Rudolph et.al. | 2403.16369 | translate | read | null |
| 2024-03-22 | Can large language models explore in-context? | Akshay Krishnamurthy et.al. | 2403.15371 | translate | read | null |
| 2024-03-22 | Planning with a Learned Policy Basis to Optimally Solve Complex Tasks | Guillermo Infante et.al. | 2403.15301 | translate | read | null |
| 2024-03-22 | Blockchain-based Pseudonym Management for Vehicle Twin Migrations in Vehicular Edge Metaverse | Jiawen Kang et.al. | 2403.15285 | translate | read | null |
| 2024-03-22 | Parametric PDE Control with Deep Reinforcement Learning and Differentiable L0-Sparse Polynomial Policies | Nicolò Botteghi et.al. | 2403.15267 | translate | read | null |
| 2024-03-22 | Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement | Jonathan Pirnay et.al. | 2403.15180 | translate | read | null |
| 2024-03-22 | Subequivariant Reinforcement Learning Framework for Coordinated Motion Control | Haoyu Wang et.al. | 2403.15100 | translate | read | null |
| 2024-03-22 | Improved Long Short-Term Memory-based Wastewater Treatment Simulators for Deep Reinforcement Learning | Esmaeel Mohammadi et.al. | 2403.15091 | translate | read | null |
| 2024-03-22 | Automated Feature Selection for Inverse Reinforcement Learning | Daulet Baimukashev et.al. | 2403.15079 | translate | read | null |
| 2024-03-22 | Testing for Fault Diversity in Reinforcement Learning | Quentin Mazouni et.al. | 2403.15065 | translate | read | null |
| 2024-03-22 | Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation | Zhenrui Yue et.al. | 2403.14952 | translate | read | null |
| 2024-03-21 | Rethinking Adversarial Inverse Reinforcement Learning: From the Angles of Policy Imitation and Transferable Reward Recovery | Yangchun Zhang et.al. | 2403.14593 | translate | read | null |
| 2024-03-21 | A Mathematical Introduction to Deep Reinforcement Learning for 5G/6G Applications | Farhad Rezazadeh et.al. | 2403.14516 | translate | read | null |
| 2024-03-21 | Constrained Reinforcement Learning with Smoothed Log Barrier Function | Baohe Zhang et.al. | 2403.14508 | translate | read | null |
| 2024-03-21 | On the continuity and smoothness of the value function in reinforcement learning and optimal control | Hans Harder et.al. | 2403.14432 | translate | read | null |
| 2024-03-21 | Emergent communication and learning pressures in language models: a language evolution perspective | Lukas Galke et.al. | 2403.14427 | translate | read | null |
| 2024-03-21 | Task-optimal data-driven surrogate models for eNMPC via differentiable simulation and optimization | Daniel Mayfrank et.al. | 2403.14425 | translate | read | null |
| 2024-03-21 | A reinforcement learning guided hybrid evolutionary algorithm for the latency location routing problem | Yuji Zou et.al. | 2403.14405 | translate | read | link |
| 2024-03-21 | Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression | Fernando Acero et.al. | 2403.14328 | translate | read | null |
| 2024-03-21 | Bayesian Optimization for Sample-Efficient Policy Improvement in Robotic Manipulation | Adrian Röfer et.al. | 2403.14305 | translate | read | null |
| 2024-03-21 | Reactor Optimization Benchmark by Reinforcement Learning | Deborah Schwarcz et.al. | 2403.14273 | translate | read | link |
| 2024-03-20 | Information-Theoretic Distillation for Reference-less Summarization | Jaehun Jung et.al. | 2403.13780 | translate | read | null |
| 2024-03-20 | Towards Principled Representation Learning from Videos for Reinforcement Learning | Dipendra Misra et.al. | 2403.13765 | translate | read | null |
| 2024-03-20 | Reinforcement Learning for Online Testing of Autonomous Driving Systems: a Replication and Extension Study | Luca Giamattei et.al. | 2403.13729 | translate | read | null |
| 2024-03-20 | Reward-Driven Automated Curriculum Learning for Interaction-Aware Self-Driving at Unsignalized Intersections | Zengqi Peng et.al. | 2403.13674 | translate | read | null |
| 2024-03-20 | Multi-agent Reinforcement Traffic Signal Control based on Interpretable Influence Mechanism and Biased ReLU Approximation | Zhiyue Luo et.al. | 2403.13639 | translate | read | null |
| 2024-03-20 | Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation | Do June Min et.al. | 2403.13578 | translate | read | link |
| 2024-03-20 | GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot | Wenxuan Song et.al. | 2403.13358 | translate | read | null |
| 2024-03-20 | Waypoint-Based Reinforcement Learning for Robot Manipulation Tasks | Shaunak A. Mehta et.al. | 2403.13281 | translate | read | null |
| 2024-03-20 | Federated reinforcement learning for robot motion planning with zero-shot generalization | Zhenyuan Yuan et.al. | 2403.13245 | translate | read | null |
| 2024-03-20 | Graph Attention Network-based Block Propagation with Optimal AoI and Reputation in Web 3.0 | Jiana Liao et.al. | 2403.13237 | translate | read | null |
| 2024-03-19 | Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes | He Wang et.al. | 2403.12946 | translate | read | null |
| 2024-03-19 | Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers | Vidhi Jain et.al. | 2403.12943 | translate | read | null |
| 2024-03-19 | Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types | Rui Liu et.al. | 2403.12891 | translate | read | null |
| 2024-03-19 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning | Fucai Ke et.al. | 2403.12884 | translate | read | null |
| 2024-03-19 | Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning | Mirco Theile et.al. | 2403.12856 | translate | read | null |
| 2024-03-19 | Policy Bifurcation in Safe Reinforcement Learning | Wenjun Zou et.al. | 2403.12847 | translate | read | link |
| 2024-03-19 | AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents | Jieming Cui et.al. | 2403.12835 | translate | read | null |
| 2024-03-19 | Oriented and Non-oriented Cubical Surfaces in The Penteract | Manuel Estevez et.al. | 2403.12825 | translate | read | null |
| 2024-03-19 | Dynamic Manipulation of Deformable Objects using Imitation Learning with Adaptation to Hardware Constraints | Eric Hannus et.al. | 2403.12685 | translate | read | null |
| 2024-03-19 | Automated Contrastive Learning Strategy Search for Time Series | Baoyu Jing et.al. | 2403.12641 | translate | read | null |
| 2024-03-18 | The Value of Reward Lookahead in Reinforcement Learning | Nadav Merlis et.al. | 2403.11637 | translate | read | null |
| 2024-03-18 | Offline Multitask Representation Learning for Reinforcement Learning | Haque Ishfaq et.al. | 2403.11574 | translate | read | null |
| 2024-03-18 | Reinforcement Learning with Token-level Feedback for Controllable Text Generation | Wendi Li et.al. | 2403.11558 | translate | read | null |
| 2024-03-18 | TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling | Weiran Chen et.al. | 2403.11550 | translate | read | null |
| 2024-03-18 | State-Separated SARSA: A Practical Sequential Decision-Making Algorithm with Recovering Rewards | Yuto Tanimoto et.al. | 2403.11520 | translate | read | link |
| 2024-03-18 | Demystifying Deep Reinforcement Learning-Based Autonomous Vehicle Decision-Making | Hanxi Wan et.al. | 2403.11432 | translate | read | null |
| 2024-03-18 | Variational Sampling of Temporal Trajectories | Jurijs Nazarovs et.al. | 2403.11418 | translate | read | null |
| 2024-03-17 | Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective | Muhammad Aneeq uz Zaman et.al. | 2403.11345 | translate | read | null |
| 2024-03-17 | Causality from Bottom to Top: A Survey | Abraham Itzhak Weinberg et.al. | 2403.11219 | translate | read | null |
| 2024-03-17 | Continuous Jumping of a Parallel Wire-Driven Monopedal Robot RAMIEL Using Reinforcement Learning | Kento Kawaharazuka et.al. | 2403.11205 | translate | read | null |
| 2024-03-14 | Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning | Zhishuai Liu et.al. | 2403.09621 | translate | read | null |
| 2024-03-14 | ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models | Runyu Ma et.al. | 2403.09583 | translate | read | null |
| 2024-03-14 | A Reinforcement Learning Approach to Dairy Farm Battery Management using Q Learning | Nawazish Ali et.al. | 2403.09499 | translate | read | null |
| 2024-03-14 | Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision | Zhiqing Sun et.al. | 2403.09472 | translate | read | link |
| 2024-03-14 | A Deep Reinforcement Learning Approach for Autonomous Reconfigurable Intelligent Surfaces | Hyuckjin Choi et.al. | 2403.09270 | translate | read | null |
| 2024-03-14 | Leveraging Constraint Programming in a Deep Learning Approach for Dynamically Solving the Flexible Job-Shop Scheduling Problem | Imanol Echeverria et.al. | 2403.09249 | translate | read | null |
| 2024-03-14 | Rumor Mitigation in Social Media Platforms with Deep Reinforcement Learning | Hongyuan Su et.al. | 2403.09217 | translate | read | null |
| 2024-03-14 | MetroGNN: Metro Network Expansion with Reinforcement Learning | Hongyuan Su et.al. | 2403.09197 | translate | read | null |
| 2024-03-14 | SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning | Nicholas Zolman et.al. | 2403.09110 | translate | read | link |
| 2024-03-14 | CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences | Martin Weyssow et.al. | 2403.09032 | translate | read | link |
| 2024-03-13 | TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning | Shangding Gu et.al. | 2403.08694 | translate | read | null |
| 2024-03-13 | Digital Twin-assisted Reinforcement Learning for Resource-aware Microservice Offloading in Edge Computing | Xiangchun Chen et.al. | 2403.08687 | translate | read | null |
| 2024-03-13 | Meta Reinforcement Learning for Resource Allocation in Aerial Active-RIS-assisted Networks with Rate-Splitting Multiple Access | Sajad Faramarzi et.al. | 2403.08648 | translate | read | null |
| 2024-03-13 | Human Alignment of Large Language Models through Online Preference Optimisation | Daniele Calandriello et.al. | 2403.08635 | translate | read | null |
| 2024-03-13 | Specification Overfitting in Artificial Intelligence | Benjamin Roth et.al. | 2403.08425 | translate | read | null |
| 2024-03-13 | Optimizing Risk-averse Human-AI Hybrid Teams | Andrew Fuchs et.al. | 2403.08386 | translate | read | null |
| 2024-03-13 | Learning to Describe for Predicting Zero-shot Drug-Drug Interactions | Fangqi Zhu et.al. | 2403.08377 | translate | read | link |
| 2024-03-13 | LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments | Maonan Wang et.al. | 2403.08337 | translate | read | link |
| 2024-03-14 | HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback | Ang Li et.al. | 2403.08309 | translate | read | null |
| 2024-03-13 | SpaceOctopus: An Octopus-inspired Motion Planning Framework for Multi-arm Space Robot | Wenbo Zhao et.al. | 2403.08219 | translate | read | null |
| 2024-03-12 | TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation | Shivin Dass et.al. | 2403.07869 | translate | read | null |
| 2024-03-12 | Exploring Safety Generalization Challenges of Large Language Models via Code | Qibing Ren et.al. | 2403.07865 | translate | read | null |
| 2024-03-12 | DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation | Chen Wang et.al. | 2403.07788 | translate | read | null |
| 2024-03-12 | Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards | Wei Shen et.al. | 2403.07708 | translate | read | null |
| 2024-03-12 | Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning | Motoki Omura et.al. | 2403.07704 | translate | read | null |
| 2024-03-12 | Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation | Michael Ogezi et.al. | 2403.07605 | translate | read | null |
| 2024-03-12 | An Improved Strategy for Blood Glucose Control Using Multi-Step Deep Reinforcement Learning | Weiwei Gu et.al. | 2403.07566 | translate | read | null |
| 2024-03-12 | Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding | Huijie Tang et.al. | 2403.07559 | translate | read | link |
| 2024-03-12 | Constrained Optimal Fuel Consumption of HEV: A Constrained Reinforcement Learning Approach | Shuchang Yan et.al. | 2403.07503 | translate | read | null |
| 2024-03-12 | Optimization of Pressure Management Strategies for Geological CO2 Sequestration Using Surrogate Model-based Reinforcement Learning | Jungang Chen et.al. | 2403.07360 | translate | read | null |
| 2024-03-11 | Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts | Onur Celik et.al. | 2403.06966 | translate | read | null |
| 2024-03-11 | Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning | Junseok Park et.al. | 2403.06880 | translate | read | null |
| 2024-03-11 | Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification | Joar Skalse et.al. | 2403.06854 | translate | read | null |
| 2024-03-11 | In-context Exploration-Exploitation for Reinforcement Learning | Zhenwen Dai et.al. | 2403.06826 | translate | read | null |
| 2024-03-11 | ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment | Hao-Lun Hsu et.al. | 2403.06814 | translate | read | null |
| 2024-03-11 | From Factor Models to Deep Learning: Machine Learning in Reshaping Empirical Asset Pricing | Junyi Ye et.al. | 2403.06779 | translate | read | null |
| 2024-03-11 | ALaRM: Align Language Models via Hierarchical Rewards Modeling | Yuhang Lai et.al. | 2403.06754 | translate | read | null |
| 2024-03-11 | Generalising Multi-Agent Cooperation through Task-Agnostic Communication | Dulhan Jayalath et.al. | 2403.06750 | translate | read | link |
| 2024-03-11 | Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | Adarsh N L et.al. | 2403.06735 | translate | read | null |
| 2024-03-11 | Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning | Zijian Zhou et.al. | 2403.06728 | translate | read | null |
| 2024-03-08 | Will GPT-4 Run DOOM? | Adrian de Wynter et.al. | 2403.05468 | translate | read | null |
| 2024-03-08 | Switching the Loss Reduces the Cost in Batch Reinforcement Learning | Alex Ayoub et.al. | 2403.05385 | translate | read | null |
| 2024-03-08 | Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation | Xiaoying Zhang et.al. | 2403.05171 | translate | read | null |
| 2024-03-08 | Inverse Design of Photonic Crystal Surface Emitting Lasers is a Sequence Modeling Problem | Ceyao Zhang et.al. | 2403.05149 | translate | read | null |
| 2024-03-08 | ChatUIE: Exploring Chat-based Unified Information Extraction using Large Language Models | Jun Xu et.al. | 2403.05132 | translate | read | null |
| 2024-03-08 | RLPeri: Accelerating Visual Perimetry Test with Reinforcement Learning and Convolutional Feature Extraction | Tanvi Verma et.al. | 2403.05112 | translate | read | null |
| 2024-03-08 | Efficient Data Collection for Robotic Manipulation via Compositional Generalization | Jensen Gao et.al. | 2403.05110 | translate | read | null |
| 2024-03-08 | Simulating Battery-Powered TinyML Systems Optimised using Reinforcement Learning in Image-Based Anomaly Detection | Jared M. Ping et.al. | 2403.05106 | translate | read | null |
| 2024-03-08 | Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning | Hongjoon Ahn et.al. | 2403.05066 | translate | read | null |
| 2024-03-08 | Aligning Large Language Models for Controllable Recommendations | Wensheng Lu et.al. | 2403.05063 | translate | read | null |
| 2024-03-07 | Teaching Large Language Models to Reason with Reinforcement Learning | Alex Havrilla et.al. | 2403.04642 | translate | read | null |
| 2024-03-07 | Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace | Léopold Maytié et.al. | 2403.04588 | translate | read | null |
| 2024-03-07 | Learning Agility Adaptation for Flight in Clutter | Guangyu Zhao et.al. | 2403.04586 | translate | read | null |
| 2024-03-07 | Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition | Long-Fei Li et.al. | 2403.04568 | translate | read | null |
| 2024-03-07 | Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation | Fabian Otto et.al. | 2403.04453 | translate | read | null |
| 2024-03-07 | Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation | Tairan He et.al. | 2403.04436 | translate | read | null |
| 2024-03-07 | iTRPL: An Intelligent and Trusted RPL Protocol based on Multi-Agent Reinforcement Learning | Debasmita Dey et.al. | 2403.04416 | translate | read | null |
| 2024-03-07 | Model-free $H_{\infty}$ control of Itô stochastic system via off-policy reinforcement learning | Jing Guo et.al. | 2403.04412 | translate | read | null |
| 2024-03-07 | Model-Free Load Frequency Control of Nonlinear Power Systems Based on Deep Reinforcement Learning | Xiaodi Chen et.al. | 2403.04374 | translate | read | null |
| 2024-03-07 | Symmetry Considerations for Learning Task Symmetric Robot Policies | Mayank Mittal et.al. | 2403.04359 | translate | read | null |
| 2024-03-06 | 3D Diffusion Policy | Yanjie Ze et.al. | 2403.03954 | translate | read | link |
| 2024-03-06 | Stop Regressing: Training Value Functions via Classification for Scalable Deep RL | Jesse Farebrother et.al. | 2403.03950 | translate | read | null |
| 2024-03-06 | Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation | Marcel Torne et.al. | 2403.03949 | translate | read | null |
| 2024-03-06 | Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning | Zifan Xu et.al. | 2403.03848 | translate | read | null |
| 2024-03-06 | A Survey on Applications of Reinforcement Learning in Spatial Resource Allocation | Di Zhang et.al. | 2403.03643 | translate | read | null |
| 2024-03-06 | Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Yuhong Sun et.al. | 2403.03558 | translate | read | link |
| 2024-03-06 | Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning | Zida Wu et.al. | 2403.03552 | translate | read | null |
| 2024-03-05 | RACE-SM: Reinforcement Learning Based Autonomous Control for Social On-Ramp Merging | Jordan Poots et.al. | 2403.03359 | translate | read | null |
| 2024-03-05 | Bi-KVIL: Keypoints-based Visual Imitation Learning of Bimanual Manipulation Tasks | Jianfeng Gao et.al. | 2403.03270 | translate | read | null |
| 2024-03-05 | Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination | Liangzhou Wang et.al. | 2403.03172 | translate | read | null |
| 2024-03-05 | Leveraging Federated Learning and Edge Computing for Recommendation Systems within Cloud Computing Networks | Yaqian Qi et.al. | 2403.03165 | translate | read | null |
| 2024-03-05 | Language Guided Exploration for RL Agents in Text Environments | Hitesh Golchha et.al. | 2403.03141 | translate | read | null |
| 2024-03-05 | SplAgger: Split Aggregation for Meta-Reinforcement Learning | Jacob Beck et.al. | 2403.03020 | translate | read | null |
| 2024-03-05 | Autonomous vehicle decision and control through reinforcement learning with traffic flow randomization | Yuan Lin et.al. | 2403.02882 | translate | read | null |
| 2024-03-05 | SpaceHopper: A Small-Scale Legged Robot for Exploring Low-Gravity Celestial Bodies | Alexander Spiridonov et.al. | 2403.02831 | translate | read | null |
| 2024-03-05 | A Zero-Shot Reinforcement Learning Strategy for Autonomous Guidewire Navigation | Valentina Scarponi et.al. | 2403.02777 | translate | read | null |
| 2024-03-05 | RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches | Priya Sundaresan et.al. | 2403.02709 | translate | read | null |
| 2024-03-05 | Fighting Game Adaptive Background Music for Improved Gameplay | Ibrahim Khan et.al. | 2403.02701 | translate | read | null |
| 2024-03-05 | PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning | Ke Zhang et.al. | 2403.02635 | translate | read | null |
| 2024-03-02 | Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Alexander Scarlatos et.al. | 2403.01304 | translate | read | link |
| 2024-03-02 | Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey | Hamza Kheddar et.al. | 2403.01255 | translate | read | null |
| 2024-03-02 | Balancing Exploration and Exploitation in LLM using Soft RLLF for Enhanced Negation Understanding | Ha-Thanh Nguyen et.al. | 2403.01185 | translate | read | null |
| 2024-03-02 | Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning | Hyungho Na et.al. | 2403.01112 | translate | read | null |
| 2024-03-02 | Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL) | Noah Ford et.al. | 2403.01059 | translate | read | null |
| 2024-03-01 | A Holistic Power Optimization Approach for Microgrid Control Based on Deep Reinforcement Learning | Fulong Yao et.al. | 2403.01013 | translate | read | null |
| 2024-03-01 | Policy Optimization for PDE Control with a Warm Start | Xiangyuan Zhang et.al. | 2403.01005 | translate | read | null |
| 2024-03-01 | On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games | Awni Altabaa et.al. | 2403.00993 | translate | read | null |
| 2024-03-01 | SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation | Noriaki Hirose et.al. | 2403.00991 | translate | read | null |
| 2024-03-01 | Scale-free Adversarial Reinforcement Learning | Mingyu Chen et.al. | 2403.00930 | translate | read | null |
(<a href=../Reinforcement_Learning.md>back to Reinforcement Learning</a>)