Reinforcement Learning - 2024-04
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-04-30 | Collaborative Control Method of Transit Signal Priority Based on Cooperative Game and Reinforcement Learning | Hao Qin et.al. | 2404.19683 | translate | read | null |
| 2024-04-30 | Towards Generalist Robot Learning from Internet Video: A Survey | Robert McCarthy et.al. | 2404.19664 | translate | read | null |
| 2024-04-30 | Short term vs. long term: optimization of microswimmer navigation on different time horizons | Navid Mousavi et.al. | 2404.19561 | translate | read | null |
| 2024-04-30 | Continual Model-based Reinforcement Learning for Data Efficient Wireless Network Optimisation | Cengis Hasan et.al. | 2404.19462 | translate | read | null |
| 2024-04-30 | Imitation Learning: A Survey of Learning Methods, Environments and Metrics | Nathan Gavenski et.al. | 2404.19456 | translate | read | null |
| 2024-04-30 | Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning | Mathieu Rita et.al. | 2404.19409 | translate | read | link |
| 2024-04-30 | Numeric Reward Machines | Kristina Levina et.al. | 2404.19370 | translate | read | null |
| 2024-04-30 | Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning | Chenjia Bai et.al. | 2404.19346 | translate | read | link |
| 2024-04-30 | Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning | Qiaosheng Zhang et.al. | 2404.19292 | translate | read | null |
| 2024-04-30 | DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets | Xiaoyu Huang et.al. | 2404.19264 | translate | read | null |
| 2024-04-29 | DPO Meets PPO: Reinforced Token Optimization for RLHF | Han Zhong et.al. | 2404.18922 | translate | read | null |
| 2024-04-29 | Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty | Laixi Shi et.al. | 2404.18909 | translate | read | null |
| 2024-04-29 | Overcoming Knowledge Barriers: Online Imitation Learning from Observation with Pretrained World Models | Xingyuan Zhang et.al. | 2404.18896 | translate | read | null |
| 2024-04-29 | More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness | Aaron J. Li et.al. | 2404.18870 | translate | read | link |
| 2024-04-29 | Performance-Aligned LLMs for Generating Fast Code | Daniel Nichols et.al. | 2404.18864 | translate | read | null |
| 2024-04-29 | PlanNetX: Learning an Efficient Neural Network Planner from MPC for Longitudinal Control | Jasper Hoffmann et.al. | 2404.18863 | translate | read | null |
| 2024-04-30 | Winning the Social Media Influence Battle: Uncertainty-Aware Opinions to Understand and Spread True Information via Competitive Influence Maximization | Qi Zhang et.al. | 2404.18826 | translate | read | null |
| 2024-04-29 | Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies | Seyed Soroush Karimi Madahi et.al. | 2404.18821 | translate | read | null |
| 2024-04-29 | Multi-Agent Synchronization Tasks | Rolando Fernandez et.al. | 2404.18798 | translate | read | null |
| 2024-04-29 | Resource-rational reinforcement learning and sensorimotor causal states | Sarah Marzen et.al. | 2404.18775 | translate | read | null |
| 2024-04-26 | Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo | Stephen Zhao et.al. | 2404.17546 | translate | read | null |
| 2024-04-26 | Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations | Puhao Li et.al. | 2404.17521 | translate | read | link |
| 2024-04-26 | Quantum Multi-Agent Reinforcement Learning for Aerial Ad-hoc Networks | Theodora-Augustina Drăgan et.al. | 2404.17499 | translate | read | null |
| 2024-04-26 | Q-Learning to navigate turbulence without a map | Marco Rando et.al. | 2404.17495 | translate | read | null |
| 2024-04-26 | Adaptive speed planning for Unmanned Vehicle Based on Deep Reinforcement Learning | Hao Liu et.al. | 2404.17379 | translate | read | null |
| 2024-04-26 | When to Trust LLMs: Aligning Confidence with Response Quality | Shuchang Tao et.al. | 2404.17287 | translate | read | null |
| 2024-04-26 | Enhancing Privacy and Security of Autonomous UAV Navigation | Vatsal Aggarwal et.al. | 2404.17225 | translate | read | null |
| 2024-04-26 | Beyond Imitation: A Life-long Policy Learning Framework for Path Tracking Control of Autonomous Driving | C. Gong et.al. | 2404.17198 | translate | read | null |
| 2024-04-26 | An Explainable Deep Reinforcement Learning Model for Warfarin Maintenance Dosing Using Policy Distillation and Action Forging | Sadjad Anzabi Zadeh et.al. | 2404.17187 | translate | read | null |
| 2024-04-25 | Compiler for Distributed Quantum Computing: a Reinforcement Learning Approach | Panagiotis Promponas et.al. | 2404.17077 | translate | read | null |
| 2024-04-25 | REBEL: Reinforcement Learning via Regressing Relative Rewards | Zhaolin Gao et.al. | 2404.16767 | translate | read | null |
| 2024-04-25 | Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods | Min Kyu Shin et.al. | 2404.16721 | translate | read | null |
| 2024-04-25 | RUMOR: Reinforcement learning for Understanding a Model of the Real World for Navigation in Dynamic Environments | Diego Martinez-Baselga et.al. | 2404.16672 | translate | read | null |
| 2024-04-25 | Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare | Emre Can Acikgoz et.al. | 2404.16621 | translate | read | null |
| 2024-04-25 | Exploring the Dynamics of Data Transmission in 5G Networks: A Conceptual Analysis | Nikita Smirnov et.al. | 2404.16508 | translate | read | null |
| 2024-04-25 | Leveraging Pretrained Latent Representations for Few-Shot Imitation Learning on a Dexterous Robotic Hand | Davide Liconti et.al. | 2404.16483 | translate | read | null |
| 2024-04-25 | A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints | Bram De Cooman et.al. | 2404.16468 | translate | read | null |
| 2024-04-25 | Offline Reinforcement Learning with Behavioral Supervisor Tuning | Padmanaba Srinivasan et.al. | 2404.16399 | translate | read | null |
| 2024-04-25 | SwarmRL: Building the Future of Smart Active Systems | Samuel Tovey et.al. | 2404.16388 | translate | read | link |
| 2024-04-25 | Reinforcement Learning with Generative Models for Compact Support Sets | Nico Schiavone et.al. | 2404.16300 | translate | read | link |
| 2024-04-24 | DPO: Differential reinforcement learning with application to optimal configuration search | Chandrajit Bajaj et.al. | 2404.15617 | translate | read | null |
| 2024-04-24 | GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL | Lang Qin et.al. | 2404.15597 | translate | read | null |
| 2024-04-24 | Multi-Agent Reinforcement Learning for Energy Networks: Computational Challenges, Progress and Open Problems | Sarah Keren et.al. | 2404.15583 | translate | read | null |
| 2024-04-23 | An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models | Yangchen Pan et.al. | 2404.15518 | translate | read | null |
| 2024-04-23 | The Power of Resets in Online Reinforcement Learning | Zakaria Mhammedi et.al. | 2404.15417 | translate | read | null |
| 2024-04-23 | Planning the path with Reinforcement Learning: Optimal Robot Motion Planning in RoboCup Small Size League Environments | Mateus G. Machado et.al. | 2404.15410 | translate | read | link |
| 2024-04-23 | Reinforcement Learning with Adaptive Control Regularization for Safe Control of Critical Systems | Haozhe Tian et.al. | 2404.15199 | translate | read | null |
| 2024-04-23 | Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation | Xun Wu et.al. | 2404.15100 | translate | read | null |
| 2024-04-23 | Impedance Matching: Enabling an RL-Based Running Jump in a Quadruped Robot | Neil Guan et.al. | 2404.15096 | translate | read | null |
| 2024-04-23 | Using deep reinforcement learning to promote sustainable human behaviour on a common pool resource problem | Raphael Koster et.al. | 2404.15059 | translate | read | null |
| 2024-04-23 | Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems | Xiaoshuang Chen et.al. | 2404.14961 | translate | read | null |
| 2024-04-23 | Multi-Objective Deep Reinforcement Learning for 5G Base Station Placement to Support Localisation for Future Sustainable Traffic | Ahmed Al-Tahmeesschi et.al. | 2404.14954 | translate | read | null |
| 2024-04-23 | MultiSTOP: Solving Functional Equations with Reinforcement Learning | Alessandro Trenta et.al. | 2404.14909 | translate | read | null |
| 2024-04-23 | Unitary Synthesis of Clifford+T Circuits with Reinforcement Learning | Sebastian Rietsch et.al. | 2404.14865 | translate | read | null |
| 2024-04-23 | Evolutionary Reinforcement Learning via Cooperative Coevolution | Chengpeng Hu et.al. | 2404.14763 | translate | read | null |
| 2024-04-23 | Rank2Reward: Learning Shaped Reward Functions from Passive Video | Daniel Yang et.al. | 2404.14735 | translate | read | null |
| 2024-04-22 | Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data | Fahim Tajwar et.al. | 2404.14367 | translate | read | link |
| 2024-04-22 | PLUTO: Pushing the Limit of Imitation Learning-based Planning for Autonomous Driving | Jie Cheng et.al. | 2404.14327 | translate | read | null |
| 2024-04-22 | Multi-Agent Hybrid SAC for Joint SS-DSA in CRNs | David R. Nickel et.al. | 2404.14319 | translate | read | null |
| 2024-04-22 | LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots | Dongge Han et.al. | 2404.14285 | translate | read | null |
| 2024-04-22 | Beyond the Edge: An Advanced Exploration of Reinforcement Learning for Mobile Edge Computing, its Applications, and Future Research Trajectories | Ning Yang et.al. | 2404.14238 | translate | read | null |
| 2024-04-22 | Multi-agent Reinforcement Learning-based Joint Precoding and Phase Shift Optimization for RIS-aided Cell-Free Massive MIMO Systems | Yiyang Zhu et.al. | 2404.14092 | translate | read | null |
| 2024-04-22 | Mechanistic Interpretability for AI Safety – A Review | Leonard Bereska et.al. | 2404.14082 | translate | read | null |
| 2024-04-22 | Research on Robot Path Planning Based on Reinforcement Learning | Wang Ruiqi et.al. | 2404.14077 | translate | read | link |
| 2024-04-22 | Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras | Mhairi Dunion et.al. | 2404.14064 | translate | read | link |
| 2024-04-22 | A survey of air combat behavior modeling using machine learning | Patrick Ribu Gorton et.al. | 2404.13954 | translate | read | null |
| 2024-04-19 | Mapping Social Choice Theory to RLHF | Jessica Dai et.al. | 2404.13038 | translate | read | null |
| 2024-04-19 | Deep Reinforcement Learning-Based Active Flow Control of an Elliptical Cylinder: Transitioning from an Elliptical Cylinder to a Circular Cylinder and a Flat Plate | Wang Jia et.al. | 2404.13003 | translate | read | null |
| 2024-04-19 | Goal Exploration via Adaptive Skill Distribution for Goal-Conditioned Reinforcement Learning | Lisheng Wu et.al. | 2404.12999 | translate | read | null |
| 2024-04-19 | MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering | Avinash Anand et.al. | 2404.12926 | translate | read | null |
| 2024-04-19 | Zero-Shot Stitching in Reinforcement Learning using Relative Representations | Antonio Pio Ricciardi et.al. | 2404.12917 | translate | read | null |
| 2024-04-19 | MAexp: A Generic Platform for RL-based Multi-Agent Exploration | Shaohao Zhu et.al. | 2404.12824 | translate | read | link |
| 2024-04-19 | Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation | Qiang He et.al. | 2404.12754 | translate | read | link |
| 2024-04-19 | Demonstration of quantum projective simulation on a single-photon-based quantum computer | Giacomo Franceschetto et.al. | 2404.12729 | translate | read | null |
| 2024-04-19 | Energy Conserved Failure Detection for NS-IoT Systems | Guojin Liu et.al. | 2404.12713 | translate | read | null |
| 2024-04-19 | Single-Task Continual Offline Reinforcement Learning | Sibo Gai et.al. | 2404.12639 | translate | read | null |
| 2024-04-18 | From $r$ to $Q^*$ : Your Language Model is Secretly a Q-Function | Rafael Rafailov et.al. | 2404.12358 | translate | read | null |
| 2024-04-18 | Improving the interpretability of GNN predictions through conformal-based graph sparsification | Pablo Sanchez-Martin et.al. | 2404.12356 | translate | read | link |
| 2024-04-18 | Practical Considerations for Discrete-Time Implementations of Continuous-Time Control Barrier Function-Based Safety Filters | Lukas Brunke et.al. | 2404.12329 | translate | read | null |
| 2024-04-18 | ASID: Active Exploration for System Identification in Robotic Manipulation | Marius Memmel et.al. | 2404.12308 | translate | read | null |
| 2024-04-18 | RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective | Chenxi Wang et.al. | 2404.12281 | translate | read | null |
| 2024-04-18 | Privacy-Preserving UCB Decision Process Verification via zk-SNARKs | Xikun Jiang et.al. | 2404.12186 | translate | read | null |
| 2024-04-18 | Aligning language models with human preferences | Tomasz Korbak et.al. | 2404.12150 | translate | read | link |
| 2024-04-19 | Robust and Adaptive Deep Reinforcement Learning for Enhancing Flow Control around a Square Cylinder with Varying Reynolds Numbers | Wang Jia et.al. | 2404.12123 | translate | read | null |
| 2024-04-18 | X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner | Haoyuan Jiang et.al. | 2404.12090 | translate | read | link |
| 2024-04-18 | Trajectory Planning for Autonomous Vehicle Using Iterative Reward Prediction in Reinforcement Learning | Hyunwoo Park et.al. | 2404.12079 | translate | read | null |
| 2024-04-17 | Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding | Zezhong Fan et.al. | 2404.11589 | translate | read | null |
| 2024-04-17 | Deep Policy Optimization with Temporal Logic Constraints | Ameesh Shah et.al. | 2404.11578 | translate | read | null |
| 2024-04-17 | Spatio-Temporal Motion Retargeting for Quadruped Robots | Taerim Yoon et.al. | 2404.11557 | translate | read | null |
| 2024-04-17 | VC Theory for Inventory Policies | Yaqi Xie et.al. | 2404.11509 | translate | read | null |
| 2024-04-17 | Learn to Tour: Operator Design For Solution Feasibility Mapping in Pickup-and-delivery Traveling Salesman Problem | Bowen Fang et.al. | 2404.11458 | translate | read | null |
| 2024-04-17 | What-if Analysis Framework for Digital Twins in 6G Wireless Network Management | Elif Ak et.al. | 2404.11394 | translate | read | null |
| 2024-04-17 | Convergence of Policy Gradient for Stochastic Linear-Quadratic Control Problem in Infinite Horizon | Xinpei Zhang et.al. | 2404.11382 | translate | read | null |
| 2024-04-17 | Following the Human Thread in Social Navigation | Luca Scofano et.al. | 2404.11327 | translate | read | link |
| 2024-04-17 | On Learning Parities with Dependent Noise | Noah Golowich et.al. | 2404.11325 | translate | read | null |
| 2024-04-17 | Physics-informed Actor-Critic for Coordination of Virtual Inertia from Power Distribution Systems | Simon Stock et.al. | 2404.11149 | translate | read | null |
| 2024-04-16 | Settling Constant Regrets in Linear Markov Decision Processes | Weitong Zhang et.al. | 2404.10745 | translate | read | null |
| 2024-04-16 | N-Agent Ad Hoc Teamwork | Caroline Wang et.al. | 2404.10740 | translate | read | null |
| 2024-04-16 | Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration | Benjamin A Newman et.al. | 2404.10733 | translate | read | null |
| 2024-04-16 | Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning | Hao-Lun Hsu et.al. | 2404.10728 | translate | read | null |
| 2024-04-16 | Automatic re-calibration of quantum devices by reinforcement learning | T. Crosta et.al. | 2404.10726 | translate | read | null |
| 2024-04-16 | Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study | Shusheng Xu et.al. | 2404.10719 | translate | read | null |
| 2024-04-16 | Simplex Decomposition for Portfolio Allocation Constraints in Reinforcement Learning | David Winkel et.al. | 2404.10683 | translate | read | null |
| 2024-04-16 | SCALE: Self-Correcting Visual Navigation for Mobile Robots via Anti-Novelty Estimation | Chang Chen et.al. | 2404.10675 | translate | read | null |
| 2024-04-16 | Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay | Jinmei Liu et.al. | 2404.10662 | translate | read | link |
| 2024-04-16 | Trajectory Planning using Reinforcement Learning for Interactive Overtaking Maneuvers in Autonomous Racing Scenarios | Levent Ögretmen et.al. | 2404.10658 | translate | read | null |
| 2024-04-15 | Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model | Hyunsoo Cho et.al. | 2404.09717 | translate | read | null |
| 2024-04-15 | Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning | Linjie Xu et.al. | 2404.09715 | translate | read | null |
| 2024-04-15 | Learn Your Reference Model for Real Good Alignment | Alexey Gorbatovski et.al. | 2404.09656 | translate | read | null |
| 2024-04-15 | Reliability Estimation of News Media Sources: Birds of a Feather Flock Together | Sergio Burdisso et.al. | 2404.09565 | translate | read | null |
| 2024-04-15 | Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning | Tidiane Camaret Ndir et.al. | 2404.09521 | translate | read | link |
| 2024-04-14 | Correlated Mean Field Imitation Learning | Zhiyu Zhao et.al. | 2404.09324 | translate | read | null |
| 2024-04-14 | Egret: Reinforcement Mechanism for Sequential Computation Offloading in Edge Computing | Haosong Peng et.al. | 2404.09285 | translate | read | null |
| 2024-04-14 | A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs | Elliot Kolker-Hicks et.al. | 2404.09264 | translate | read | null |
| 2024-04-14 | Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts | Jing-Cheng Pang et.al. | 2404.09248 | translate | read | null |
| 2024-04-14 | Advanced Intelligent Optimization Algorithms for Multi-Objective Optimal Power Flow in Future Power Systems: A Review | Yuyan Li et.al. | 2404.09203 | translate | read | null |
| 2024-04-12 | Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation | Hanlin Tian et.al. | 2404.08570 | translate | read | null |
| 2024-04-12 | RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs | Shreyas Chaudhari et.al. | 2404.08555 | translate | read | null |
| 2024-04-12 | Advancing Forest Fire Prevention: Deep Reinforcement Learning for Effective Firebreak Placement | Lucas Murray et.al. | 2404.08523 | translate | read | null |
| 2024-04-12 | Adversarial Imitation Learning via Boosting | Jonathan D. Chang et.al. | 2404.08513 | translate | read | null |
| 2024-04-12 | Prescribing Optimal Health-Aware Operation for Urban Air Mobility with Deep Reinforcement Learning | Mina Montazeri et.al. | 2404.08497 | translate | read | null |
| 2024-04-12 | Dataset Reset Policy Optimization for RLHF | Jonathan D. Chang et.al. | 2404.08495 | translate | read | link |
| 2024-04-12 | Anti-Byzantine Attacks Enabled Vehicle Selection for Asynchronous Federated Learning in Vehicular Edge Computing | Cui Zhang et.al. | 2404.08444 | translate | read | null |
| 2024-04-12 | SIR-RL: Reinforcement Learning for Optimized Policy Control during Epidemiological Outbreaks in Emerging Market and Developing Economies | Maeghal Jain et.al. | 2404.08423 | translate | read | null |
| 2024-04-12 | TDANet: Target-Directed Attention Network For Object-Goal Visual Navigation With Zero-Shot Ability | Shiwei Lian et.al. | 2404.08353 | translate | read | null |
| 2024-04-12 | Agile and versatile bipedal robot tracking control through reinforcement learning | Jiayi Li et.al. | 2404.08246 | translate | read | null |
| 2024-04-11 | High-Dimension Human Value Representation in Large Language Models | Samuel Cahyawijaya et.al. | 2404.07900 | translate | read | null |
| 2024-04-11 | Data-Driven System Identification of Quadrotors Subject to Motor Delays | Jonas Eschmann et.al. | 2404.07837 | translate | read | null |
| 2024-04-11 | On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning | Giuseppe Canonaco et.al. | 2404.07826 | translate | read | null |
| 2024-04-11 | An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization | Minshuo Chen et.al. | 2404.07771 | translate | read | null |
| 2024-04-11 | Differentially Private Reinforcement Learning with Self-Play | Dan Qiao et.al. | 2404.07559 | translate | read | null |
| 2024-04-11 | Enhancing Policy Gradient with the Polyak Step-Size Adaption | Yunxiang Li et.al. | 2404.07525 | translate | read | null |
| 2024-04-11 | Generative Probabilistic Planning for Optimizing Supply Chain Networks | Hyung-il Ahn et.al. | 2404.07511 | translate | read | null |
| 2024-04-11 | Neural Fault Injection: Generating Software Faults from Natural Language | Domenico Cotroneo et.al. | 2404.07491 | translate | read | null |
| 2024-04-11 | Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains | Soichiro Nishimori et.al. | 2404.07465 | translate | read | null |
| 2024-04-11 | UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning | Saichao Liu et.al. | 2404.07453 | translate | read | null |
| 2024-04-10 | Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery | Zohre Karimi et.al. | 2404.07185 | translate | read | null |
| 2024-04-10 | Adaptive behavior with stable synapses | Cristiano Capone et.al. | 2404.07150 | translate | read | null |
| 2024-04-10 | How Consistent are Clinicians? Evaluating the Predictability of Sepsis Disease Progression with Dynamics Models | Unnseo Park et.al. | 2404.07148 | translate | read | null |
| 2024-04-10 | Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection | Linas Nasvytis et.al. | 2404.07099 | translate | read | link |
| 2024-04-10 | Improving Language Model Reasoning with Self-motivated Learning | Yunlong Feng et.al. | 2404.07017 | translate | read | null |
| 2024-04-10 | Agent-driven Generative Semantic Communication for Remote Surveillance | Wanting Yang et.al. | 2404.06997 | translate | read | null |
| 2024-04-10 | Deep Reinforcement Learning for Mobile Robot Path Planning | Hao Liu et.al. | 2404.06974 | translate | read | null |
| 2024-04-10 | UAV-Assisted Enhanced Coverage and Capacity in Dynamic MU-mMIMO IoT Systems: A Deep Reinforcement Learning Approach | MohammadMahdi Ghadaksaz et.al. | 2404.06726 | translate | read | null |
| 2024-04-10 | Dual Ensemble Kalman Filter for Stochastic Optimal Control | Anant A. Joshi et.al. | 2404.06696 | translate | read | null |
| 2024-04-09 | Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective | Victor-Alexandru Darvariu et.al. | 2404.06492 | translate | read | null |
| 2024-04-09 | Deep Reinforcement Learning-Based Approach for a Single Vehicle Persistent Surveillance Problem with Fuel Constraints | Hritik Bana et.al. | 2404.06423 | translate | read | null |
| 2024-04-09 | The Power in Communication: Power Regularization of Communication for Autonomy in Cooperative Multi-Agent Reinforcement Learning | Nancirose Piazza et.al. | 2404.06387 | translate | read | null |
| 2024-04-09 | Policy-Guided Diffusion | Matthew Thomas Jackson et.al. | 2404.06356 | translate | read | link |
| 2024-04-09 | Generative Pre-Trained Transformer for Symbolic Regression Base In-Context Reinforcement Learning | Yanjie Li et.al. | 2404.06330 | translate | read | null |
| 2024-04-09 | Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning | Xudong Yu et.al. | 2404.06188 | translate | read | null |
| 2024-04-09 | A quantum information theoretic analysis of reinforcement learning-assisted quantum architecture search | Abhishek Sadhu et.al. | 2404.06174 | translate | read | null |
| 2024-04-09 | Adaptable Recovery Behaviors in Robotics: A Behavior Trees and Motion Generators(BTMG) Approach for Failure Management | Faseeh Ahmad et.al. | 2404.06129 | translate | read | null |
| 2024-04-09 | Automatic Configuration Tuning on Cloud Database: A Survey | Limeng Zhang et.al. | 2404.06043 | translate | read | null |
| 2024-04-09 | Commute with Community: Enhancing Shared Travel through Social Networks | Tian Siyuan et.al. | 2404.05987 | translate | read | null |
| 2024-04-08 | Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer | Xinyang Gu et.al. | 2404.05695 | translate | read | null |
| 2024-04-08 | YaART: Yet Another ART Rendering Technology | Sergey Kastryulin et.al. | 2404.05666 | translate | read | null |
| 2024-04-08 | Dynamic Backtracking in GFlowNet: Enhancing Decision Steps with Reward-Dependent Adjustment Mechanisms | Shuai Guo et.al. | 2404.05576 | translate | read | null |
| 2024-04-08 | Optimal Flow Admission Control in Edge Computing via Safe Reinforcement Learning | A. Fox et.al. | 2404.05564 | translate | read | null |
| 2024-04-08 | Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data | Tim Baumgärtner et.al. | 2404.05530 | translate | read | null |
| 2024-04-08 | CNN-based Game State Detection for a Foosball Table | David Hagens et.al. | 2404.05357 | translate | read | null |
| 2024-04-08 | Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models | Yutao Ouyang et.al. | 2404.05291 | translate | read | null |
| 2024-04-08 | SAFE-GIL: SAFEty Guided Imitation Learning | Yusuf Umut Ciftci et.al. | 2404.05249 | translate | read | null |
| 2024-04-08 | MeSA-DRL: Memory-Enhanced Deep Reinforcement Learning for Advanced Socially Aware Robot Navigation in Crowded Environments | Mannan Saeed Muhammad et.al. | 2404.05203 | translate | read | null |
| 2024-04-08 | Decision Transformer for Wireless Communications: A New Paradigm of Resource Management | Jie Zhang et.al. | 2404.05199 | translate | read | null |
| 2024-04-05 | Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution | Tim Seyde et.al. | 2404.04253 | translate | read | null |
| 2024-04-05 | Continual Policy Distillation of Reinforcement Learning-based Controllers for Soft Robotic In-Hand Manipulation | Lanpei Li et.al. | 2404.04219 | translate | read | null |
| 2024-04-05 | Enhancing IoT Intelligence: A Transformer-based Reinforcement Learning Methodology | Gaith Rjoub et.al. | 2404.04205 | translate | read | null |
| 2024-04-05 | Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report | Jerrod Wigmore et.al. | 2404.04106 | translate | read | null |
| 2024-04-05 | Dynamic Prompt Optimizing for Text-to-Image Generation | Wenyi Mo et.al. | 2404.04095 | translate | read | link |
| 2024-04-05 | Demonstration Guided Multi-Objective Reinforcement Learning | Junlin Lu et.al. | 2404.03997 | translate | read | null |
| 2024-04-05 | A proximal policy optimization based intelligent home solar management | Kode Creer et.al. | 2404.03888 | translate | read | null |
| 2024-04-05 | Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration | Xudong Guo et.al. | 2404.03869 | translate | read | null |
| 2024-04-04 | Exploration is Harder than Prediction: Cryptographically Separating Reinforcement Learning from Supervised Learning | Noah Golowich et.al. | 2404.03774 | translate | read | null |
| 2024-04-04 | A Reinforcement Learning based Reset Policy for CDCL SAT Solvers | Chunxiao Li et.al. | 2404.03753 | translate | read | null |
| 2024-04-04 | AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent | Hanyu Lai et.al. | 2404.03648 | translate | read | link |
| 2024-04-04 | Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention | Ziru Liu et.al. | 2404.03637 | translate | read | link |
| 2024-04-04 | Laser Learning Environment: A new environment for coordination-critical multi-agent tasks | Yannick Molinghen et.al. | 2404.03596 | translate | read | link |
| 2024-04-04 | Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm | Miao Lu et.al. | 2404.03578 | translate | read | null |
| 2024-04-04 | Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity | Jake Varley et.al. | 2404.03570 | translate | read | null |
| 2024-04-04 | AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale | Adam Pardyl et.al. | 2404.03482 | translate | read | link |
| 2024-04-04 | Integrating Hyperparameter Search into GramML | Hernán Ceferino Vázquez et.al. | 2404.03419 | translate | read | link |
| 2024-04-04 | Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought | Jooyoung Lee et.al. | 2404.03414 | translate | read | null |
| 2024-04-04 | SENSOR: Imitate Third-Person Expert’s Behaviors via Active Sensoring | Kaichen Huang et.al. | 2404.03386 | translate | read | null |
| 2024-04-04 | DIDA: Denoised Imitation Learning based on Domain Adaptation | Kaichen Huang et.al. | 2404.03382 | translate | read | null |
| 2024-04-03 | Learning Quadrupedal Locomotion via Differentiable Simulation | Clemens Schwarke et.al. | 2404.02887 | translate | read | null |
| 2024-04-03 | Unsupervised Learning of Effective Actions in Robotics | Marko Zaric et.al. | 2404.02728 | translate | read | link |
| 2024-04-03 | Reinforcement Learning in Categorical Cybernetics | Jules Hedges et.al. | 2404.02688 | translate | read | null |
| 2024-04-03 | Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering | Abhijeet Pendyala et.al. | 2404.02577 | translate | read | null |
| 2024-04-03 | SliceIt! – A Dual Simulator Framework for Learning Robot Food Slicing | Cristian C. Beltran-Hernandez et.al. | 2404.02569 | translate | read | link |
| 2024-04-03 | Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning | Yi Shen et.al. | 2404.02545 | translate | read | link |
| 2024-04-03 | Versatile Scene-Consistent Traffic Scenario Generation as Optimization with Diffusion | Zhiyu Huang et.al. | 2404.02524 | translate | read | null |
| 2024-04-03 | Joint Optimization on Uplink OFDMA and MU-MIMO for IEEE 802.11ax: Deep Hierarchical Reinforcement Learning Approach | Hyeonho Noh et.al. | 2404.02486 | translate | read | null |
| 2024-04-03 | Deep Reinforcement Learning for Traveling Purchaser Problems | Haofeng Yuan et.al. | 2404.02476 | translate | read | null |
| 2024-04-03 | Electric Vehicle Routing Problem for Emergency Power Supply: Towards Telecom Base Station Relief | Daisuke Kikuta et.al. | 2404.02448 | translate | read | link |
| 2024-04-02 | Tuning for the Unknown: Revisiting Evaluation Strategies for Lifelong RL | Golnaz Mesbahi et.al. | 2404.02113 | translate | read | null |
| 2024-04-02 | Emergence of Chemotactic Strategies with Multi-Agent Reinforcement Learning | Samuel Tovey et.al. | 2404.01999 | translate | read | null |
| 2024-04-02 | VLRM: Vision-Language Models act as Reward Models for Image Captioning | Maksim Dzabraev et.al. | 2404.01911 | translate | read | null |
| 2024-04-02 | Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation | Carlos Plou et.al. | 2404.01867 | translate | read | null |
| 2024-04-02 | Keeping Behavioral Programs Alive: Specifying and Executing Liveness Requirements | Tom Yaacov et.al. | 2404.01858 | translate | read | null |
| 2024-04-02 | EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking | Stavros Orfanoudakis et.al. | 2404.01849 | translate | read | null |
| 2024-04-02 | Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy | Kyungbok Lee et.al. | 2404.01830 | translate | read | null |
| 2024-04-02 | Imitation Game: A Model-based and Imitation Learning Deep Reinforcement Learning Hybrid | Eric MSP Veith et.al. | 2404.01794 | translate | read | null |
| 2024-04-02 | Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems | Dapeng Zhi et.al. | 2404.01769 | translate | read | null |
| 2024-04-02 | Asymptotics of Language Model Alignment | Joy Qiping Yang et.al. | 2404.01730 | translate | read | null |