Reinforcement Learning - 2024-05
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-05-31 | Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF | Tengyang Xie et.al. | 2405.21046 | translate | read | null |
| 2024-05-31 | Direct Alignment of Language Models via Quality-Aware Self-Refinement | Runsheng Yu et.al. | 2405.21040 | translate | read | null |
| 2024-05-31 | Generating Triangulations and Fibrations with Reinforcement Learning | Per Berglund et.al. | 2405.21017 | translate | read | null |
| 2024-05-31 | Bayesian Design Principles for Offline-to-Online Reinforcement Learning | Hao Hu et.al. | 2405.20984 | translate | read | null |
| 2024-05-31 | Goal-Oriented Sensor Reporting Scheduling for Non-linear Dynamic System Monitoring | Prasoon Raghuwanshi et.al. | 2405.20983 | translate | read | null |
| 2024-05-31 | SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Tianyang Xu et.al. | 2405.20974 | translate | read | link |
| 2024-05-31 | Amortizing intractable inference in diffusion models for vision, language, and control | Siddarth Venkatraman et.al. | 2405.20971 | translate | read | link |
| 2024-05-31 | Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation | Shangding Gu et.al. | 2405.20860 | translate | read | null |
| 2024-05-31 | Improving Reward Models with Synthetic Critiques | Zihuiwen Ye et.al. | 2405.20850 | translate | read | null |
| 2024-05-30 | Group Robust Preference Optimization in Reward-free RLHF | Shyam Sundhar Ramesh et.al. | 2405.20304 | translate | read | link |
| 2024-05-30 | Evaluating Large Language Model Biases in Persona-Steered Generation | Andy Liu et.al. | 2405.20253 | translate | read | link |
| 2024-05-30 | InstructionCP: A fast approach to transfer Large Language Models into target language | Kuang-Ming Chen et.al. | 2405.20175 | translate | read | null |
| 2024-05-30 | Enhancing Battlefield Awareness: An Aerial RIS-assisted ISAC System with Deep Reinforcement Learning | Hyunsang Cho et.al. | 2405.20168 | translate | read | null |
| 2024-05-30 | Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation | Wooseong Cho et.al. | 2405.20165 | translate | read | null |
| 2024-05-30 | NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models | Kai Wu et.al. | 2405.20081 | translate | read | null |
| 2024-05-30 | Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads | Avelina Asada Hadji-Kyriacou et.al. | 2405.20053 | translate | read | link |
| 2024-05-30 | Deep Reinforcement Learning for Intrusion Detection in IoT: A Survey | Afrah Gueriani et.al. | 2405.20038 | translate | read | null |
| 2024-05-30 | Safe Multi-agent Reinforcement Learning with Natural Language Constraints | Ziyan Wang et.al. | 2405.20018 | translate | read | null |
| 2024-05-30 | LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning | Hyungho Na et.al. | 2405.19998 | translate | read | null |
| 2024-05-29 | Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | Shenao Zhang et.al. | 2405.19332 | translate | read | link |
| 2024-05-29 | Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | Shicong Cen et.al. | 2405.19320 | translate | read | null |
| 2024-05-29 | Robust Preference Optimization through Reward Model Distillation | Adam Fisch et.al. | 2405.19316 | translate | read | null |
| 2024-05-29 | Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels | Abhay Deshpande et.al. | 2405.19307 | translate | read | null |
| 2024-05-29 | Act Natural! Projecting Autonomous System Trajectories Into Naturalistic Behavior Sets | Hamzah I. Khan et.al. | 2405.19292 | translate | read | null |
| 2024-05-29 | Rich-Observation Reinforcement Learning with Continuous Latent Dynamics | Yuda Song et.al. | 2405.19269 | translate | read | null |
| 2024-05-29 | Exploring the impact of traffic signal control and connected and automated vehicles on intersections safety: A deep reinforcement learning approach | Amir Hossein Karbasi et.al. | 2405.19236 | translate | read | null |
| 2024-05-29 | Diffusion-based Dynamics Models for Long-Horizon Rollout in Offline Reinforcement Learning | Hanye Zhao et.al. | 2405.19189 | translate | read | null |
| 2024-05-29 | Conditional Latent ODEs for Motion Prediction in Autonomous Driving | Khang Truong Giang et.al. | 2405.19183 | translate | read | null |
| 2024-05-29 | A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning | Arthur Juliani et.al. | 2405.19153 | translate | read | null |
| 2024-05-28 | Hierarchical World Models as Visual Whole-Body Humanoid Controllers | Nicklas Hansen et.al. | 2405.18418 | translate | read | null |
| 2024-05-28 | Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study | Shreyas Bhat et.al. | 2405.18324 | translate | read | null |
| 2024-05-28 | Highway Reinforcement Learning | Yuhui Wang et.al. | 2405.18289 | translate | read | null |
| 2024-05-28 | Extreme Value Monte Carlo Tree Search | Masataro Asai et.al. | 2405.18248 | translate | read | null |
| 2024-05-28 | Recurrent Natural Policy Gradient for POMDPs | Semih Cayci et.al. | 2405.18221 | translate | read | null |
| 2024-05-28 | Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving | Zhi Zheng et.al. | 2405.18209 | translate | read | link |
| 2024-05-28 | Mutation-Bias Learning in Games | Johann Bauer et.al. | 2405.18190 | translate | read | null |
| 2024-05-28 | Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding | Daniel Bethell et.al. | 2405.18180 | translate | read | link |
| 2024-05-28 | Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing | Wei Zhao et.al. | 2405.18166 | translate | read | link |
| 2024-05-28 | PyTAG: Tabletop Games for Multi-Agent Reinforcement Learning | Martin Balla et.al. | 2405.18123 | translate | read | link |
| 2024-05-27 | A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning | Abdulaziz Almuzairee et.al. | 2405.17416 | translate | read | null |
| 2024-05-27 | Rethinking Transformers in Solving POMDPs | Chenhao Lu et.al. | 2405.17358 | translate | read | link |
| 2024-05-27 | Opinion-Guided Reinforcement Learning | Kyanna Dagenais et.al. | 2405.17287 | translate | read | null |
| 2024-05-27 | DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems | Zhi Zheng et.al. | 2405.17272 | translate | read | link |
| 2024-05-27 | Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning | Adriana Hugessen et.al. | 2405.17243 | translate | read | null |
| 2024-05-27 | InsigHTable: Insight-driven Hierarchical Table Visualization with Reinforcement Learning | Guozheng Li et.al. | 2405.17229 | translate | read | null |
| 2024-05-27 | Learning Generic and Dynamic Locomotion of Humanoids Across Discrete Terrains | Shangqun Yu et.al. | 2405.17227 | translate | read | null |
| 2024-05-27 | Flow control of three-dimensional cylinders transitioning to turbulence via multi-agent reinforcement learning | P. Suárez et.al. | 2405.17210 | translate | read | null |
| 2024-05-27 | CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control | Jingqing Ruan et.al. | 2405.17152 | translate | read | link |
| 2024-05-27 | Q-value Regularized Transformer for Offline Reinforcement Learning | Shengchao Hu et.al. | 2405.17098 | translate | read | null |
| 2024-05-24 | Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment | Hao Sun et.al. | 2405.15624 | translate | read | null |
| 2024-05-24 | Neuromorphic dreaming: A pathway to efficient learning in artificial agents | Ingo Blakowski et.al. | 2405.15616 | translate | read | null |
| 2024-05-24 | OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code | Maxence Faldor et.al. | 2405.15568 | translate | read | link |
| 2024-05-24 | Learning Generalizable Human Motion Generator with Reinforcement Learning | Yunyao Mao et.al. | 2405.15541 | translate | read | null |
| 2024-05-24 | Randomized algorithms and PAC bounds for inverse reinforcement learning in continuous spaces | Angeliki Kamoutsi et.al. | 2405.15509 | translate | read | null |
| 2024-05-24 | Human-in-the-loop Reinforcement Learning for Data Quality Monitoring in Particle Physics Experiments | Olivia Jullian Parra et.al. | 2405.15508 | translate | read | null |
| 2024-05-24 | TD3 Based Collision Free Motion Planning for Robot Navigation | Hao Liu et.al. | 2405.15460 | translate | read | null |
| 2024-05-24 | Counterexample-Guided Repair of Reinforcement Learning Systems Using Safety Critics | David Boetius et.al. | 2405.15430 | translate | read | null |
| 2024-05-24 | Model-free reinforcement learning with noisy actions for automated experimental control in optics | Lea Richtmann et.al. | 2405.15421 | translate | read | null |
| 2024-05-24 | Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate | Fan-Ming Luo et.al. | 2405.15384 | translate | read | null |
| 2024-05-23 | Privileged Sensing Scaffolds Reinforcement Learning | Edward S. Hu et.al. | 2405.14853 | translate | read | null |
| 2024-05-23 | Axioms for AI Alignment from Human Feedback | Luise Ge et.al. | 2405.14758 | translate | read | null |
| 2024-05-23 | AGILE: A Novel Framework of LLM Agents | Peiyuan Feng et.al. | 2405.14751 | translate | read | link |
| 2024-05-23 | Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence | Minheng Xiao et.al. | 2405.14749 | translate | read | null |
| 2024-05-23 | SimPO: Simple Preference Optimization with a Reference-Free Reward | Yu Meng et.al. | 2405.14734 | translate | read | link |
| 2024-05-23 | Multi-turn Reinforcement Learning from Preference Human Feedback | Lior Shani et.al. | 2405.14655 | translate | read | null |
| 2024-05-23 | Reinforcement Learning for Fine-tuning Text-to-speech Diffusion Models | Jingyi Chen et.al. | 2405.14632 | translate | read | null |
| 2024-05-23 | Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences | Takuya Hiraoka et.al. | 2405.14629 | translate | read | null |
| 2024-05-23 | Closed-form Symbolic Solutions: A New Perspective on Solving Partial Differential Equations | Shu Wei et.al. | 2405.14620 | translate | read | null |
| 2024-05-23 | Discretization of continuous input spaces in the hippocampal autoencoder | Adrian F. Amil et.al. | 2405.14600 | translate | read | null |
| 2024-05-21 | Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale | Shriram Chennakesavalu et.al. | 2405.12961 | translate | read | null |
| 2024-05-21 | Effect of Synthetic Jets Actuator Parameters on Deep Reinforcement Learning-Based Flow Control Performance in a Square Cylinder | Wang Jia et.al. | 2405.12834 | translate | read | null |
| 2024-05-21 | Deep Reinforcement Learning for Time-Critical Wilderness Search And Rescue Using Drones | Jan-Hendrik Ewers et.al. | 2405.12800 | translate | read | null |
| 2024-05-21 | Generative AI and Large Language Models for Cyber Security: All Insights You Need | Mohamed Amine Ferrag et.al. | 2405.12750 | translate | read | null |
| 2024-05-21 | Reinforcement Learning Enabled Peer-to-Peer Energy Trading for Dairy Farms | Mian Ibad Ali Shah et.al. | 2405.12716 | translate | read | null |
| 2024-05-21 | A Multimodal Learning-based Approach for Autonomous Landing of UAV | Francisco Neves et.al. | 2405.12681 | translate | read | null |
| 2024-05-21 | Learning Causal Dynamics Models in Object-Oriented Environments | Zhongwei Yu et.al. | 2405.12615 | translate | read | null |
| 2024-05-21 | PhiBE: A PDE-based Bellman Equation for Continuous Time Policy Evaluation | Yuhua Zhu et.al. | 2405.12535 | translate | read | null |
| 2024-05-21 | GASE: Graph Attention Sampling with Edges Fusion for Solving Vehicle Routing Problems | Zhenwei Wang et.al. | 2405.12475 | translate | read | null |
| 2024-05-21 | Physics-based Scene Layout Generation from Human Motion | Jianan Li et.al. | 2405.12460 | translate | read | null |
| 2024-05-20 | Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? | Yang Dai et.al. | 2405.12094 | translate | read | null |
| 2024-05-20 | PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation | Zhuobin Huang et.al. | 2405.12079 | translate | read | null |
| 2024-05-20 | Scrutinize What We Ignore: Reining Task Representation Shift In Context-Based Offline Meta Reinforcement Learning | Hai Zhang et.al. | 2405.12001 | translate | read | null |
| 2024-05-20 | Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space | Qianmei Liu et.al. | 2405.11982 | translate | read | null |
| 2024-05-20 | A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers | Tom Roth et.al. | 2405.11904 | translate | read | null |
| 2024-05-20 | Intuitive Fine-Tuning: Towards Unifying SFT and RLHF into a Single Process | Ermo Hua et.al. | 2405.11870 | translate | read | link |
| 2024-05-20 | Reward-Punishment Reinforcement Learning with Maximum Entropy | Jiexin Wang et.al. | 2405.11784 | translate | read | null |
| 2024-05-20 | Efficient Multi-agent Reinforcement Learning by Planning | Qihan Liu et.al. | 2405.11778 | translate | read | link |
| 2024-05-20 | Learning Future Representation with Synthetic Observations for Sample-efficient Reinforcement Learning | Xin Liu et.al. | 2405.11740 | translate | read | null |
| 2024-05-20 | Highway Graph to Accelerate Reinforcement Learning | Zidu Yin et.al. | 2405.11727 | translate | read | link |
| 2024-05-17 | Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review | Hongyi Yang et.al. | 2405.10883 | translate | read | null |
| 2024-05-17 | Automated Radiology Report Generation: A Review of Recent Advances | Phillip Sloan et.al. | 2405.10842 | translate | read | null |
| 2024-05-17 | Combining Teacher-Student with Representation Learning: A Concurrent Teacher-Student Reinforcement Learning Paradigm for Legged Locomotion | Hongxi Wang et.al. | 2405.10830 | translate | read | null |
| 2024-05-17 | Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities | Hao Zhou et.al. | 2405.10825 | translate | read | null |
| 2024-05-17 | A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization | Andrzej Ruszczyński et.al. | 2405.10815 | translate | read | null |
| 2024-05-17 | SignLLM: Sign Languages Production Large Language Models | Sen Fang et.al. | 2405.10718 | translate | read | null |
| 2024-05-17 | Sample-Efficient Constrained Reinforcement Learning with General Parameterization | Washim Uddin Mondal et.al. | 2405.10624 | translate | read | null |
| 2024-05-17 | An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems | Jiyue Tao et.al. | 2405.10576 | translate | read | null |
| 2024-05-17 | Time-Varying Constraint-Aware Reinforcement Learning for Energy Storage Control | Jaeik Jeong et.al. | 2405.10536 | translate | read | null |
| 2024-05-17 | Towards Better Question Generation in QA-Based Event Extraction | Zijin Hong et.al. | 2405.10517 | translate | read | null |
| 2024-05-16 | Stochastic Q-learning for Large Discrete Action Spaces | Fares Fourati et.al. | 2405.10310 | translate | read | null |
| 2024-05-16 | Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | Yuexiang Zhai et.al. | 2405.10292 | translate | read | null |
| 2024-05-16 | Keep It Private: Unsupervised Privatization of Online Text | Calvin Bao et.al. | 2405.10260 | translate | read | link |
| 2024-05-16 | A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy | Zhaoxing Li et.al. | 2405.10214 | translate | read | null |
| 2024-05-16 | Continuous Transfer Learning for UAV Communication-aware Trajectory Design | Chenrui Sun et.al. | 2405.10087 | translate | read | null |
| 2024-05-16 | Optimizing Search and Rescue UAV Connectivity in Challenging Terrain through Multi Q-Learning | Mohammed M. H. Qazzaz et.al. | 2405.10042 | translate | read | null |
| 2024-05-16 | Reward Centering | Abhishek Naik et.al. | 2405.09999 | translate | read | null |
| 2024-05-16 | Combining RL and IL using a dynamic, performance-based modulation over learning signals and its application to local planning | Francisco Leiva et.al. | 2405.09760 | translate | read | null |
| 2024-05-16 | NIFTY Financial News Headlines Dataset | Raeid Saqur et.al. | 2405.09747 | translate | read | null |
| 2024-05-15 | Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning | Sihan Zeng et.al. | 2405.09660 | translate | read | null |
| 2024-05-15 | Reinforcement Learning-Based Framework for the Intelligent Adaptation of User Interfaces | Daniel Gaspar-Figueiredo et.al. | 2405.09255 | translate | read | null |
| 2024-05-15 | DVS-RG: Differential Variable Speed Limits Control using Deep Reinforcement Learning with Graph State Representation | Jingwen Yang et.al. | 2405.09163 | translate | read | null |
| 2024-05-15 | CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving | Dechen Gao et.al. | 2405.09111 | translate | read | null |
| 2024-05-15 | Chaos-based reinforcement learning with TD3 | Toshitaka Matsuki et.al. | 2405.09086 | translate | read | null |
| 2024-05-15 | Deep Learning in Earthquake Engineering: A Comprehensive Review | Yazhou Xie et.al. | 2405.09021 | translate | read | null |
| 2024-05-14 | Large Language Models for Human-Machine Collaborative Particle Accelerator Tuning through Natural Language | Jan Kaiser et.al. | 2405.08888 | translate | read | null |
| 2024-05-14 | Stable Inverse Reinforcement Learning: Policies from Control Lyapunov Landscapes | Samuel Tesfazgi et.al. | 2405.08756 | translate | read | null |
| 2024-05-14 | Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach | Urvij Saroliya et.al. | 2405.08754 | translate | read | null |
| 2024-05-14 | Reinformer: Max-Return Sequence Modeling for offline RL | Zifeng Zhuang et.al. | 2405.08740 | translate | read | null |
| 2024-05-14 | I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning | Yashuai Yan et.al. | 2405.08726 | translate | read | null |
| 2024-05-15 | Enhancing Reinforcement Learning in Sensor Fusion: A Comparative Analysis of Cubature and Sampling-based Integration Methods for Rover Search Planning | Jan-Hendrik Ewers et.al. | 2405.08691 | translate | read | null |
| 2024-05-14 | A Distributed Approach to Autonomous Intersection Management via Multi-Agent Reinforcement Learning | Matteo Cederle et.al. | 2405.08655 | translate | read | link |
| 2024-05-14 | vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement | Yiwen Zhu et.al. | 2405.08638 | translate | read | null |
| 2024-05-14 | Optimizing Deep Reinforcement Learning for American Put Option Hedging | Reilly Pickard et.al. | 2405.08602 | translate | read | null |
| 2024-05-14 | Python-Based Reinforcement Learning on Simulink Models | Georg Schäfer et.al. | 2405.08567 | translate | read | null |
| 2024-05-14 | Growing Artificial Neural Networks for Control: the Role of Neuronal Diversity | Eleni Nisioti et.al. | 2405.08510 | translate | read | null |
| 2024-05-13 | Hierarchical Decision Mamba | André Correia et.al. | 2405.07943 | translate | read | link |
| 2024-05-13 | RLHF Workflow: From Reward Modeling to Online RLHF | Hanze Dong et.al. | 2405.07863 | translate | read | link |
| 2024-05-13 | Adaptive Exploration for Data-Efficient General Value Function Evaluations | Arushi Jain et.al. | 2405.07838 | translate | read | null |
| 2024-05-13 | Fixed Point Theory Analysis of a Lambda Policy Iteration with Randomization for the Ćirić Contraction Operator | Abdelkader Belhenniche et.al. | 2405.07824 | translate | read | null |
| 2024-05-13 | Hamiltonian-based Quantum Reinforcement Learning for Neural Combinatorial Optimization | Georg Kruse et.al. | 2405.07790 | translate | read | null |
| 2024-05-13 | Hype or Heuristic? Quantum Reinforcement Learning for Join Order Optimisation | Maja Franz et.al. | 2405.07770 | translate | read | null |
| 2024-05-13 | CAGES: Cost-Aware Gradient Entropy Search for Efficient Local Multi-Fidelity Bayesian Optimization | Wei-Ting Tang et.al. | 2405.07760 | translate | read | null |
| 2024-05-13 | MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction | Haopeng Wang et.al. | 2405.07759 | translate | read | null |
| 2024-05-13 | Neural Network Compression for Reinforcement Learning Tasks | Dmitry A. Ivanov et.al. | 2405.07748 | translate | read | null |
| 2024-05-13 | Backdoor Removal for Generative Large Language Models | Haoran Li et.al. | 2405.07667 | translate | read | null |
| 2024-05-10 | Value Augmented Sampling for Language Model Alignment and Personalization | Seungwook Han et.al. | 2405.06639 | translate | read | link |
| 2024-05-10 | EcoEdgeTwin: Enhanced 6G Network via Mobile Edge Computing and Digital Twin Integration | Synthia Hossain Karobi et.al. | 2405.06507 | translate | read | null |
| 2024-05-10 | Advantageous and disadvantageous inequality aversion can be taught through vicarious learning of others’ preferences | Shen Zhang et.al. | 2405.06500 | translate | read | null |
| 2024-05-10 | Contextual Affordances for Safe Exploration in Robotic Scenarios | William Z. Ye et.al. | 2405.06422 | translate | read | null |
| 2024-05-10 | Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs | Davide Maran et.al. | 2405.06363 | translate | read | null |
| 2024-05-10 | Learning Latent Dynamic Robust Representations for World Models | Ruixiang Sun et.al. | 2405.06263 | translate | read | link |
| 2024-05-10 | Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning | Xiaoyu Wen et.al. | 2405.06192 | translate | read | link |
| 2024-05-10 | (A Partial Survey of) Decentralized, Cooperative Multi-Agent Reinforcement Learning | Christopher Amato et.al. | 2405.06161 | translate | read | null |
| 2024-05-09 | An RNN-policy gradient approach for quantum architecture search | Gang Wang et.al. | 2405.05892 | translate | read | null |
| 2024-05-09 | Safe Exploration Using Bayesian World Models and Log-Barrier Optimization | Yarden As et.al. | 2405.05890 | translate | read | null |
| 2024-05-09 | ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers | Liangliang Chen et.al. | 2405.05861 | translate | read | null |
| 2024-05-09 | Policy Gradient with Active Importance Sampling | Matteo Papini et.al. | 2405.05630 | translate | read | null |
| 2024-05-09 | An Automatic Prompt Generation System for Tabular Data Tasks | Ashlesha Akella et.al. | 2405.05618 | translate | read | null |
| 2024-05-09 | Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning | Yuchen Shi et.al. | 2405.05542 | translate | read | link |
| 2024-05-08 | Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data | Kishan Panaganti et.al. | 2405.05468 | translate | read | null |
| 2024-05-08 | Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management | Gang Hu et.al. | 2405.05449 | translate | read | null |
| 2024-05-08 | Learning to Play Pursuit-Evasion with Dynamic and Sensor Constraints | Burak M. Gonultas et.al. | 2405.05372 | translate | read | null |
| 2024-05-08 | Offline Model-Based Optimization via Policy-Guided Gradient Search | Yassine Chemingui et.al. | 2405.05349 | translate | read | link |
| 2024-05-08 | Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models | Aylin Gunal et.al. | 2405.05060 | translate | read | null |
| 2024-05-08 | Fault Identification Enhancement with Reinforcement Learning (FIERL) | Valentina Zaccaria et.al. | 2405.04938 | translate | read | link |
| 2024-05-07 | RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes | Kyle Stachowicz et.al. | 2405.04714 | translate | read | null |
| 2024-05-07 | Proximal Policy Optimization with Adaptive Exploration | Andrei Lixandru et.al. | 2405.04664 | translate | read | null |
| 2024-05-07 | ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | Albert Bou et.al. | 2405.04657 | translate | read | link |
| 2024-05-07 | TorchDriveEnv: A Reinforcement Learning Benchmark for Autonomous Driving with Reactive, Realistic, and Diverse Non-Playable Characters | Jonathan Wilder Lavington et.al. | 2405.04491 | translate | read | null |
| 2024-05-07 | Designing, Developing, and Validating Network Intelligence for Scaling in Service-Based Architectures based on Deep Reinforcement Learning | Paola Soto et.al. | 2405.04441 | translate | read | null |
| 2024-05-08 | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | DeepSeek-AI et.al. | 2405.04434 | translate | read | link |
| 2024-05-07 | The Curse of Diversity in Ensemble-Based Exploration | Zhixuan Lin et.al. | 2405.04342 | translate | read | link |
| 2024-05-07 | Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation | Atharvan Dogra et.al. | 2405.04325 | translate | read | null |
| 2024-05-07 | Genetic Drift Regularization: on preventing Actor Injection from breaking Evolution Strategies | Paul Templier et.al. | 2405.04322 | translate | read | null |
| 2024-05-07 | Improving Offline Reinforcement Learning with Inaccurate Simulators | Yiwen Hou et.al. | 2405.04307 | translate | read | null |
| 2024-05-07 | Deep Reinforcement Learning for Multi-User RF Charging with Non-linear Energy Harvesters | Amirhossein Azarbahram et.al. | 2405.04218 | translate | read | null |
| 2024-05-07 | In-context Learning for Automated Driving Scenarios | Ziqi Zhou et.al. | 2405.04135 | translate | read | null |
| 2024-05-07 | Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning | Chunlin Tian et.al. | 2405.04122 | translate | read | null |
| 2024-05-06 | $ε$-Policy Gradient for Online Pricing | Lukasz Szpruch et.al. | 2405.03624 | translate | read | null |
| 2024-05-06 | Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions | Xingyou Song et.al. | 2405.03547 | translate | read | null |
| 2024-05-06 | ReinWiFi: A Reinforcement-Learning-Based Framework for the Application-Layer QoS Optimization of WiFi Networks | Qianren Li et.al. | 2405.03526 | translate | read | null |
| 2024-05-06 | Robotic Constrained Imitation Learning for the Peg Transfer Task in Fundamentals of Laparoscopic Surgery | Kento Kawaharazuka et.al. | 2405.03440 | translate | read | null |
| 2024-05-06 | Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning | Stone Tao et.al. | 2405.03379 | translate | read | null |
| 2024-05-06 | Enhancing Q-Learning with Large Language Model Heuristics | Xiefeng Wu et.al. | 2405.03341 | translate | read | null |
| 2024-05-06 | Artificial Intelligence in the Autonomous Navigation of Endovascular Interventions: A Systematic Review | Harry Robertshaw et.al. | 2405.03305 | translate | read | null |
| 2024-05-06 | End-to-End Reinforcement Learning of Curative Curtailment with Partial Measurement Availability | Hinrikus Wolf et.al. | 2405.03262 | translate | read | null |
| 2024-05-06 | Federated Reinforcement Learning with Constraint Heterogeneity | Hao Jin et.al. | 2405.03236 | translate | read | null |
| 2024-05-06 | Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning | Caleb Chuck et.al. | 2405.03113 | translate | read | null |
| 2024-05-03 | Geometric Fabrics: a Safe Guiding Medium for Policy Learning | Karl Van Wyk et.al. | 2405.02250 | translate | read | null |
| 2024-05-03 | Learning Optimal Deterministic Policies with Stochastic Policy Gradients | Alessandro Montenegro et.al. | 2405.02235 | translate | read | null |
| 2024-05-03 | The Cambridge RoboMaster: An Agile Multi-Robot Research Platform | Jan Blumenkamp et.al. | 2405.02198 | translate | read | null |
| 2024-05-03 | Imitation Learning in Discounted Linear MDPs without exploration assumptions | Luca Viano et.al. | 2405.02181 | translate | read | null |
| 2024-05-03 | Simulating the economic impact of rationality through reinforcement learning and agent-based modelling | Simone Brusatin et.al. | 2405.02161 | translate | read | null |
| 2024-05-03 | Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach | Anton Plaksin et.al. | 2405.02044 | translate | read | null |
| 2024-05-03 | Model-based reinforcement learning for protein backbone design | Frederic Renard et.al. | 2405.01983 | translate | read | null |
| 2024-05-03 | Rescale-Invariant Federated Reinforcement Learning for Resource Allocation in V2X Networks | Kaidi Xu et.al. | 2405.01961 | translate | read | null |
| 2024-05-03 | Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization | Changliang Zhou et.al. | 2405.01906 | translate | read | null |
| 2024-05-03 | Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants | Francesco Maldonato et.al. | 2405.01889 | translate | read | link |
| 2024-05-02 | Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks | Murtaza Dalal et.al. | 2405.01534 | translate | read | null |
| 2024-05-02 | FLAME: Factuality-Aware Alignment for Large Language Models | Sheng-Chieh Lin et.al. | 2405.01525 | translate | read | null |
| 2024-05-02 | NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | Gerald Shen et.al. | 2405.01481 | translate | read | link |
| 2024-05-02 | IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning | Ryan Hoque et.al. | 2405.01472 | translate | read | null |
| 2024-05-02 | Goal-conditioned reinforcement learning for ultrasound navigation guidance | Abdoul Aziz Amadou et.al. | 2405.01409 | translate | read | null |
| 2024-05-02 | Learning Force Control for Legged Manipulation | Tifanny Portela et.al. | 2405.01402 | translate | read | null |
| 2024-05-02 | Constrained Reinforcement Learning Under Model Mismatch | Zhongchang Sun et.al. | 2405.01327 | translate | read | null |
| 2024-05-02 | Non-iterative Optimization of Trajectory and Radio Resource for Aerial Network | Hyeonsu Lyu et.al. | 2405.01314 | translate | read | null |
| 2024-05-02 | Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning | Liu Qiyuan et.al. | 2405.01284 | translate | read | null |
| 2024-05-02 | Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation | Hao Wang et.al. | 2405.01280 | translate | read | null |
| 2024-05-01 | Self-Play Preference Optimization for Language Model Alignment | Yue Wu et.al. | 2405.00675 | translate | read | null |
| 2024-05-01 | No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO | Skander Moalla et.al. | 2405.00662 | translate | read | link |
| 2024-05-01 | HUGO – Highlighting Unseen Grid Options: Combining Deep Reinforcement Learning with a Heuristic Target Topology Approach | Malte Lehna et.al. | 2405.00629 | translate | read | null |
| 2024-05-01 | Koopman-based Deep Learning for Nonlinear System Estimation | Zexin Sun et.al. | 2405.00627 | translate | read | null |
| 2024-05-01 | Queue-based Eco-Driving at Roundabouts with Reinforcement Learning | Anna-Lena Schlamp et.al. | 2405.00625 | translate | read | null |
| 2024-05-01 | The Real, the Better: Aligning Large Language Models with Online Human Behaviors | Guanying Jiang et.al. | 2405.00578 | translate | read | null |
| 2024-05-01 | Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment | Zhili Liu et.al. | 2405.00557 | translate | read | null |
| 2024-05-01 | Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | Lucas-Andreï Thil et.al. | 2405.00516 | translate | read | null |
| 2024-05-01 | MetaRM: Shifted Distributions Alignment via Meta-Learning | Shihan Dou et.al. | 2405.00438 | translate | read | null |
| 2024-05-01 | UCB-driven Utility Function Search for Multi-objective Reinforcement Learning | Yucheng Shi et.al. | 2405.00410 | translate | read | link |