Reinforcement Learning - 2024-05

Publish Date	Title	Authors	PDF	Translate	Read	Code
2024-05-31	Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF	Tengyang Xie et.al.	2405.21046	translate	read	null
2024-05-31	Direct Alignment of Language Models via Quality-Aware Self-Refinement	Runsheng Yu et.al.	2405.21040	translate	read	null
2024-05-31	Generating Triangulations and Fibrations with Reinforcement Learning	Per Berglund et.al.	2405.21017	translate	read	null
2024-05-31	Bayesian Design Principles for Offline-to-Online Reinforcement Learning	Hao Hu et.al.	2405.20984	translate	read	null
2024-05-31	Goal-Oriented Sensor Reporting Scheduling for Non-linear Dynamic System Monitoring	Prasoon Raghuwanshi et.al.	2405.20983	translate	read	null
2024-05-31	SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales	Tianyang Xu et.al.	2405.20974	translate	read	link
2024-05-31	Amortizing intractable inference in diffusion models for vision, language, and control	Siddarth Venkatraman et.al.	2405.20971	translate	read	link
2024-05-31	Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation	Shangding Gu et.al.	2405.20860	translate	read	null
2024-05-31	Improving Reward Models with Synthetic Critiques	Zihuiwen Ye et.al.	2405.20850	translate	read	null
2024-05-30	Group Robust Preference Optimization in Reward-free RLHF	Shyam Sundhar Ramesh et.al.	2405.20304	translate	read	link
2024-05-30	Evaluating Large Language Model Biases in Persona-Steered Generation	Andy Liu et.al.	2405.20253	translate	read	link
2024-05-30	InstructionCP: A fast approach to transfer Large Language Models into target language	Kuang-Ming Chen et.al.	2405.20175	translate	read	null
2024-05-30	Enhancing Battlefield Awareness: An Aerial RIS-assisted ISAC System with Deep Reinforcement Learning	Hyunsang Cho et.al.	2405.20168	translate	read	null
2024-05-30	Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation	Wooseong Cho et.al.	2405.20165	translate	read	null
2024-05-30	NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models	Kai Wu et.al.	2405.20081	translate	read	null
2024-05-30	Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads	Avelina Asada Hadji-Kyriacou et.al.	2405.20053	translate	read	link
2024-05-30	Deep Reinforcement Learning for Intrusion Detection in IoT: A Survey	Afrah Gueriani et.al.	2405.20038	translate	read	null
2024-05-30	Safe Multi-agent Reinforcement Learning with Natural Language Constraints	Ziyan Wang et.al.	2405.20018	translate	read	null
2024-05-30	LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning	Hyungho Na et.al.	2405.19998	translate	read	null
2024-05-29	Self-Exploring Language Models: Active Preference Elicitation for Online Alignment	Shenao Zhang et.al.	2405.19332	translate	read	link
2024-05-29	Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF	Shicong Cen et.al.	2405.19320	translate	read	null
2024-05-29	Robust Preference Optimization through Reward Model Distillation	Adam Fisch et.al.	2405.19316	translate	read	null
2024-05-29	Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels	Abhay Deshpande et.al.	2405.19307	translate	read	null
2024-05-29	Act Natural! Projecting Autonomous System Trajectories Into Naturalistic Behavior Sets	Hamzah I. Khan et.al.	2405.19292	translate	read	null
2024-05-29	Rich-Observation Reinforcement Learning with Continuous Latent Dynamics	Yuda Song et.al.	2405.19269	translate	read	null
2024-05-29	Exploring the impact of traffic signal control and connected and automated vehicles on intersections safety: A deep reinforcement learning approach	Amir Hossein Karbasi et.al.	2405.19236	translate	read	null
2024-05-29	Diffusion-based Dynamics Models for Long-Horizon Rollout in Offline Reinforcement Learning	Hanye Zhao et.al.	2405.19189	translate	read	null
2024-05-29	Conditional Latent ODEs for Motion Prediction in Autonomous Driving	Khang Truong Giang et.al.	2405.19183	translate	read	null
2024-05-29	A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning	Arthur Juliani et.al.	2405.19153	translate	read	null
2024-05-28	Hierarchical World Models as Visual Whole-Body Humanoid Controllers	Nicklas Hansen et.al.	2405.18418	translate	read	null
2024-05-28	Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study	Shreyas Bhat et.al.	2405.18324	translate	read	null
2024-05-28	Highway Reinforcement Learning	Yuhui Wang et.al.	2405.18289	translate	read	null
2024-05-28	Extreme Value Monte Carlo Tree Search	Masataro Asai et.al.	2405.18248	translate	read	null
2024-05-28	Recurrent Natural Policy Gradient for POMDPs	Semih Cayci et.al.	2405.18221	translate	read	null
2024-05-28	Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving	Zhi Zheng et.al.	2405.18209	translate	read	link
2024-05-28	Mutation-Bias Learning in Games	Johann Bauer et.al.	2405.18190	translate	read	null
2024-05-28	Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding	Daniel Bethell et.al.	2405.18180	translate	read	link
2024-05-28	Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing	Wei Zhao et.al.	2405.18166	translate	read	link
2024-05-28	PyTAG: Tabletop Games for Multi-Agent Reinforcement Learning	Martin Balla et.al.	2405.18123	translate	read	link
2024-05-27	A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning	Abdulaziz Almuzairee et.al.	2405.17416	translate	read	null
2024-05-27	Rethinking Transformers in Solving POMDPs	Chenhao Lu et.al.	2405.17358	translate	read	link
2024-05-27	Opinion-Guided Reinforcement Learning	Kyanna Dagenais et.al.	2405.17287	translate	read	null
2024-05-27	DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems	Zhi Zheng et.al.	2405.17272	translate	read	link
2024-05-27	Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning	Adriana Hugessen et.al.	2405.17243	translate	read	null
2024-05-27	InsigHTable: Insight-driven Hierarchical Table Visualization with Reinforcement Learning	Guozheng Li et.al.	2405.17229	translate	read	null
2024-05-27	Learning Generic and Dynamic Locomotion of Humanoids Across Discrete Terrains	Shangqun Yu et.al.	2405.17227	translate	read	null
2024-05-27	Flow control of three-dimensional cylinders transitioning to turbulence via multi-agent reinforcement learning	P. Suárez et.al.	2405.17210	translate	read	null
2024-05-27	CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control	Jingqing Ruan et.al.	2405.17152	translate	read	link
2024-05-27	Q-value Regularized Transformer for Offline Reinforcement Learning	Shengchao Hu et.al.	2405.17098	translate	read	null
2024-05-24	Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment	Hao Sun et.al.	2405.15624	translate	read	null
2024-05-24	Neuromorphic dreaming: A pathway to efficient learning in artificial agents	Ingo Blakowski et.al.	2405.15616	translate	read	null
2024-05-24	OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code	Maxence Faldor et.al.	2405.15568	translate	read	link
2024-05-24	Learning Generalizable Human Motion Generator with Reinforcement Learning	Yunyao Mao et.al.	2405.15541	translate	read	null
2024-05-24	Randomized algorithms and PAC bounds for inverse reinforcement learning in continuous spaces	Angeliki Kamoutsi et.al.	2405.15509	translate	read	null
2024-05-24	Human-in-the-loop Reinforcement Learning for Data Quality Monitoring in Particle Physics Experiments	Olivia Jullian Parra et.al.	2405.15508	translate	read	null
2024-05-24	TD3 Based Collision Free Motion Planning for Robot Navigation	Hao Liu et.al.	2405.15460	translate	read	null
2024-05-24	Counterexample-Guided Repair of Reinforcement Learning Systems Using Safety Critics	David Boetius et.al.	2405.15430	translate	read	null
2024-05-24	Model-free reinforcement learning with noisy actions for automated experimental control in optics	Lea Richtmann et.al.	2405.15421	translate	read	null
2024-05-24	Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate	Fan-Ming Luo et.al.	2405.15384	translate	read	null
2024-05-23	Privileged Sensing Scaffolds Reinforcement Learning	Edward S. Hu et.al.	2405.14853	translate	read	null
2024-05-23	Axioms for AI Alignment from Human Feedback	Luise Ge et.al.	2405.14758	translate	read	null
2024-05-23	AGILE: A Novel Framework of LLM Agents	Peiyuan Feng et.al.	2405.14751	translate	read	link
2024-05-23	Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence	Minheng Xiao et.al.	2405.14749	translate	read	null
2024-05-23	SimPO: Simple Preference Optimization with a Reference-Free Reward	Yu Meng et.al.	2405.14734	translate	read	link
2024-05-23	Multi-turn Reinforcement Learning from Preference Human Feedback	Lior Shani et.al.	2405.14655	translate	read	null
2024-05-23	Reinforcement Learning for Fine-tuning Text-to-speech Diffusion Models	Jingyi Chen et.al.	2405.14632	translate	read	null
2024-05-23	Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences	Takuya Hiraoka et.al.	2405.14629	translate	read	null
2024-05-23	Closed-form Symbolic Solutions: A New Perspective on Solving Partial Differential Equations	Shu Wei et.al.	2405.14620	translate	read	null
2024-05-23	Discretization of continuous input spaces in the hippocampal autoencoder	Adrian F. Amil et.al.	2405.14600	translate	read	null
2024-05-21	Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale	Shriram Chennakesavalu et.al.	2405.12961	translate	read	null
2024-05-21	Effect of Synthetic Jets Actuator Parameters on Deep Reinforcement Learning-Based Flow Control Performance in a Square Cylinder	Wang Jia et.al.	2405.12834	translate	read	null
2024-05-21	Deep Reinforcement Learning for Time-Critical Wilderness Search And Rescue Using Drones	Jan-Hendrik Ewers et.al.	2405.12800	translate	read	null
2024-05-21	Generative AI and Large Language Models for Cyber Security: All Insights You Need	Mohamed Amine Ferrag et.al.	2405.12750	translate	read	null
2024-05-21	Reinforcement Learning Enabled Peer-to-Peer Energy Trading for Dairy Farms	Mian Ibad Ali Shah et.al.	2405.12716	translate	read	null
2024-05-21	A Multimodal Learning-based Approach for Autonomous Landing of UAV	Francisco Neves et.al.	2405.12681	translate	read	null
2024-05-21	Learning Causal Dynamics Models in Object-Oriented Environments	Zhongwei Yu et.al.	2405.12615	translate	read	null
2024-05-21	PhiBE: A PDE-based Bellman Equation for Continuous Time Policy Evaluation	Yuhua Zhu et.al.	2405.12535	translate	read	null
2024-05-21	GASE: Graph Attention Sampling with Edges Fusion for Solving Vehicle Routing Problems	Zhenwei Wang et.al.	2405.12475	translate	read	null
2024-05-21	Physics-based Scene Layout Generation from Human Motion	Jianan Li et.al.	2405.12460	translate	read	null
2024-05-20	Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?	Yang Dai et.al.	2405.12094	translate	read	null
2024-05-20	PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation	Zhuobin Huang et.al.	2405.12079	translate	read	null
2024-05-20	Scrutinize What We Ignore: Reining Task Representation Shift In Context-Based Offline Meta Reinforcement Learning	Hai Zhang et.al.	2405.12001	translate	read	null
2024-05-20	Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space	Qianmei Liu et.al.	2405.11982	translate	read	null
2024-05-20	A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers	Tom Roth et.al.	2405.11904	translate	read	null
2024-05-20	Intuitive Fine-Tuning: Towards Unifying SFT and RLHF into a Single Process	Ermo Hua et.al.	2405.11870	translate	read	link
2024-05-20	Reward-Punishment Reinforcement Learning with Maximum Entropy	Jiexin Wang et.al.	2405.11784	translate	read	null
2024-05-20	Efficient Multi-agent Reinforcement Learning by Planning	Qihan Liu et.al.	2405.11778	translate	read	link
2024-05-20	Learning Future Representation with Synthetic Observations for Sample-efficient Reinforcement Learning	Xin Liu et.al.	2405.11740	translate	read	null
2024-05-20	Highway Graph to Accelerate Reinforcement Learning	Zidu Yin et.al.	2405.11727	translate	read	link
2024-05-17	Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review	Hongyi Yang et.al.	2405.10883	translate	read	null
2024-05-17	Automated Radiology Report Generation: A Review of Recent Advances	Phillip Sloan et.al.	2405.10842	translate	read	null
2024-05-17	Combining Teacher-Student with Representation Learning: A Concurrent Teacher-Student Reinforcement Learning Paradigm for Legged Locomotion	Hongxi Wang et.al.	2405.10830	translate	read	null
2024-05-17	Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities	Hao Zhou et.al.	2405.10825	translate	read	null
2024-05-17	A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization	Andrzej Ruszczyński et.al.	2405.10815	translate	read	null
2024-05-17	SignLLM: Sign Languages Production Large Language Models	Sen Fang et.al.	2405.10718	translate	read	null
2024-05-17	Sample-Efficient Constrained Reinforcement Learning with General Parameterization	Washim Uddin Mondal et.al.	2405.10624	translate	read	null
2024-05-17	An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems	Jiyue Tao et.al.	2405.10576	translate	read	null
2024-05-17	Time-Varying Constraint-Aware Reinforcement Learning for Energy Storage Control	Jaeik Jeong et.al.	2405.10536	translate	read	null
2024-05-17	Towards Better Question Generation in QA-Based Event Extraction	Zijin Hong et.al.	2405.10517	translate	read	null
2024-05-16	Stochastic Q-learning for Large Discrete Action Spaces	Fares Fourati et.al.	2405.10310	translate	read	null
2024-05-16	Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning	Yuexiang Zhai et.al.	2405.10292	translate	read	null
2024-05-16	Keep It Private: Unsupervised Privatization of Online Text	Calvin Bao et.al.	2405.10260	translate	read	link
2024-05-16	A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy	Zhaoxing Li et.al.	2405.10214	translate	read	null
2024-05-16	Continuous Transfer Learning for UAV Communication-aware Trajectory Design	Chenrui Sun et.al.	2405.10087	translate	read	null
2024-05-16	Optimizing Search and Rescue UAV Connectivity in Challenging Terrain through Multi Q-Learning	Mohammed M. H. Qazzaz et.al.	2405.10042	translate	read	null
2024-05-16	Reward Centering	Abhishek Naik et.al.	2405.09999	translate	read	null
2024-05-16	Combining RL and IL using a dynamic, performance-based modulation over learning signals and its application to local planning	Francisco Leiva et.al.	2405.09760	translate	read	null
2024-05-16	NIFTY Financial News Headlines Dataset	Raeid Saqur et.al.	2405.09747	translate	read	null
2024-05-15	Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning	Sihan Zeng et.al.	2405.09660	translate	read	null
2024-05-15	Reinforcement Learning-Based Framework for the Intelligent Adaptation of User Interfaces	Daniel Gaspar-Figueiredo et.al.	2405.09255	translate	read	null
2024-05-15	DVS-RG: Differential Variable Speed Limits Control using Deep Reinforcement Learning with Graph State Representation	Jingwen Yang et.al.	2405.09163	translate	read	null
2024-05-15	CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving	Dechen Gao et.al.	2405.09111	translate	read	null
2024-05-15	Chaos-based reinforcement learning with TD3	Toshitaka Matsuki et.al.	2405.09086	translate	read	null
2024-05-15	Deep Learning in Earthquake Engineering: A Comprehensive Review	Yazhou Xie et.al.	2405.09021	translate	read	null
2024-05-14	Large Language Models for Human-Machine Collaborative Particle Accelerator Tuning through Natural Language	Jan Kaiser et.al.	2405.08888	translate	read	null
2024-05-14	Stable Inverse Reinforcement Learning: Policies from Control Lyapunov Landscapes	Samuel Tesfazgi et.al.	2405.08756	translate	read	null
2024-05-14	Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach	Urvij Saroliya et.al.	2405.08754	translate	read	null
2024-05-14	Reinformer: Max-Return Sequence Modeling for offline RL	Zifeng Zhuang et.al.	2405.08740	translate	read	null
2024-05-14	I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning	Yashuai Yan et.al.	2405.08726	translate	read	null
2024-05-15	Enhancing Reinforcement Learning in Sensor Fusion: A Comparative Analysis of Cubature and Sampling-based Integration Methods for Rover Search Planning	Jan-Hendrik Ewers et.al.	2405.08691	translate	read	null
2024-05-14	A Distributed Approach to Autonomous Intersection Management via Multi-Agent Reinforcement Learning	Matteo Cederle et.al.	2405.08655	translate	read	link
2024-05-14	vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement	Yiwen Zhu et.al.	2405.08638	translate	read	null
2024-05-14	Optimizing Deep Reinforcement Learning for American Put Option Hedging	Reilly Pickard et.al.	2405.08602	translate	read	null
2024-05-14	Python-Based Reinforcement Learning on Simulink Models	Georg Schäfer et.al.	2405.08567	translate	read	null
2024-05-14	Growing Artificial Neural Networks for Control: the Role of Neuronal Diversity	Eleni Nisioti et.al.	2405.08510	translate	read	null
2024-05-13	Hierarchical Decision Mamba	André Correia et.al.	2405.07943	translate	read	link
2024-05-13	RLHF Workflow: From Reward Modeling to Online RLHF	Hanze Dong et.al.	2405.07863	translate	read	link
2024-05-13	Adaptive Exploration for Data-Efficient General Value Function Evaluations	Arushi Jain et.al.	2405.07838	translate	read	null
2024-05-13	Fixed Point Theory Analysis of a Lambda Policy Iteration with Randomization for the Ćirić Contraction Operator	Abdelkader Belhenniche et.al.	2405.07824	translate	read	null
2024-05-13	Hamiltonian-based Quantum Reinforcement Learning for Neural Combinatorial Optimization	Georg Kruse et.al.	2405.07790	translate	read	null
2024-05-13	Hype or Heuristic? Quantum Reinforcement Learning for Join Order Optimisation	Maja Franz et.al.	2405.07770	translate	read	null
2024-05-13	CAGES: Cost-Aware Gradient Entropy Search for Efficient Local Multi-Fidelity Bayesian Optimization	Wei-Ting Tang et.al.	2405.07760	translate	read	null
2024-05-13	MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction	Haopeng Wang et.al.	2405.07759	translate	read	null
2024-05-13	Neural Network Compression for Reinforcement Learning Tasks	Dmitry A. Ivanov et.al.	2405.07748	translate	read	null
2024-05-13	Backdoor Removal for Generative Large Language Models	Haoran Li et.al.	2405.07667	translate	read	null
2024-05-10	Value Augmented Sampling for Language Model Alignment and Personalization	Seungwook Han et.al.	2405.06639	translate	read	link
2024-05-10	EcoEdgeTwin: Enhanced 6G Network via Mobile Edge Computing and Digital Twin Integration	Synthia Hossain Karobi et.al.	2405.06507	translate	read	null
2024-05-10	Advantageous and disadvantageous inequality aversion can be taught through vicarious learning of others’ preferences	Shen Zhang et.al.	2405.06500	translate	read	null
2024-05-10	Contextual Affordances for Safe Exploration in Robotic Scenarios	William Z. Ye et.al.	2405.06422	translate	read	null
2024-05-10	Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs	Davide Maran et.al.	2405.06363	translate	read	null
2024-05-10	Learning Latent Dynamic Robust Representations for World Models	Ruixiang Sun et.al.	2405.06263	translate	read	link
2024-05-10	Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning	Xiaoyu Wen et.al.	2405.06192	translate	read	link
2024-05-10	(A Partial Survey of) Decentralized, Cooperative Multi-Agent Reinforcement Learning	Christopher Amato et.al.	2405.06161	translate	read	null
2024-05-09	An RNN-policy gradient approach for quantum architecture search	Gang Wang et.al.	2405.05892	translate	read	null
2024-05-09	Safe Exploration Using Bayesian World Models and Log-Barrier Optimization	Yarden As et.al.	2405.05890	translate	read	null
2024-05-09	ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers	Liangliang Chen et.al.	2405.05861	translate	read	null
2024-05-09	Policy Gradient with Active Importance Sampling	Matteo Papini et.al.	2405.05630	translate	read	null
2024-05-09	An Automatic Prompt Generation System for Tabular Data Tasks	Ashlesha Akella et.al.	2405.05618	translate	read	null
2024-05-09	Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning	Yuchen Shi et.al.	2405.05542	translate	read	link
2024-05-08	Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data	Kishan Panaganti et.al.	2405.05468	translate	read	null
2024-05-08	Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management	Gang Hu et.al.	2405.05449	translate	read	null
2024-05-08	Learning to Play Pursuit-Evasion with Dynamic and Sensor Constraints	Burak M. Gonultas et.al.	2405.05372	translate	read	null
2024-05-08	Offline Model-Based Optimization via Policy-Guided Gradient Search	Yassine Chemingui et.al.	2405.05349	translate	read	link
2024-05-08	Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models	Aylin Gunal et.al.	2405.05060	translate	read	null
2024-05-08	Fault Identification Enhancement with Reinforcement Learning (FIERL)	Valentina Zaccaria et.al.	2405.04938	translate	read	link
2024-05-07	RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes	Kyle Stachowicz et.al.	2405.04714	translate	read	null
2024-05-07	Proximal Policy Optimization with Adaptive Exploration	Andrei Lixandru et.al.	2405.04664	translate	read	null
2024-05-07	ACEGEN: Reinforcement learning of generative chemical agents for drug discovery	Albert Bou et.al.	2405.04657	translate	read	link
2024-05-07	TorchDriveEnv: A Reinforcement Learning Benchmark for Autonomous Driving with Reactive, Realistic, and Diverse Non-Playable Characters	Jonathan Wilder Lavington et.al.	2405.04491	translate	read	null
2024-05-07	Designing, Developing, and Validating Network Intelligence for Scaling in Service-Based Architectures based on Deep Reinforcement Learning	Paola Soto et.al.	2405.04441	translate	read	null
2024-05-08	DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model	DeepSeek-AI et.al.	2405.04434	translate	read	link
2024-05-07	The Curse of Diversity in Ensemble-Based Exploration	Zhixuan Lin et.al.	2405.04342	translate	read	link
2024-05-07	Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation	Atharvan Dogra et.al.	2405.04325	translate	read	null
2024-05-07	Genetic Drift Regularization: on preventing Actor Injection from breaking Evolution Strategies	Paul Templier et.al.	2405.04322	translate	read	null
2024-05-07	Improving Offline Reinforcement Learning with Inaccurate Simulators	Yiwen Hou et.al.	2405.04307	translate	read	null
2024-05-07	Deep Reinforcement Learning for Multi-User RF Charging with Non-linear Energy Harvesters	Amirhossein Azarbahram et.al.	2405.04218	translate	read	null
2024-05-07	In-context Learning for Automated Driving Scenarios	Ziqi Zhou et.al.	2405.04135	translate	read	null
2024-05-07	Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning	Chunlin Tian et.al.	2405.04122	translate	read	null
2024-05-06	$ε$-Policy Gradient for Online Pricing	Lukasz Szpruch et.al.	2405.03624	translate	read	null
2024-05-06	Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions	Xingyou Song et.al.	2405.03547	translate	read	null
2024-05-06	ReinWiFi: A Reinforcement-Learning-Based Framework for the Application-Layer QoS Optimization of WiFi Networks	Qianren Li et.al.	2405.03526	translate	read	null
2024-05-06	Robotic Constrained Imitation Learning for the Peg Transfer Task in Fundamentals of Laparoscopic Surgery	Kento Kawaharazuka et.al.	2405.03440	translate	read	null
2024-05-06	Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning	Stone Tao et.al.	2405.03379	translate	read	null
2024-05-06	Enhancing Q-Learning with Large Language Model Heuristics	Xiefeng Wu et.al.	2405.03341	translate	read	null
2024-05-06	Artificial Intelligence in the Autonomous Navigation of Endovascular Interventions: A Systematic Review	Harry Robertshaw et.al.	2405.03305	translate	read	null
2024-05-06	End-to-End Reinforcement Learning of Curative Curtailment with Partial Measurement Availability	Hinrikus Wolf et.al.	2405.03262	translate	read	null
2024-05-06	Federated Reinforcement Learning with Constraint Heterogeneity	Hao Jin et.al.	2405.03236	translate	read	null
2024-05-06	Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning	Caleb Chuck et.al.	2405.03113	translate	read	null
2024-05-03	Geometric Fabrics: a Safe Guiding Medium for Policy Learning	Karl Van Wyk et.al.	2405.02250	translate	read	null
2024-05-03	Learning Optimal Deterministic Policies with Stochastic Policy Gradients	Alessandro Montenegro et.al.	2405.02235	translate	read	null
2024-05-03	The Cambridge RoboMaster: An Agile Multi-Robot Research Platform	Jan Blumenkamp et.al.	2405.02198	translate	read	null
2024-05-03	Imitation Learning in Discounted Linear MDPs without exploration assumptions	Luca Viano et.al.	2405.02181	translate	read	null
2024-05-03	Simulating the economic impact of rationality through reinforcement learning and agent-based modelling	Simone Brusatin et.al.	2405.02161	translate	read	null
2024-05-03	Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach	Anton Plaksin et.al.	2405.02044	translate	read	null
2024-05-03	Model-based reinforcement learning for protein backbone design	Frederic Renard et.al.	2405.01983	translate	read	null
2024-05-03	Rescale-Invariant Federated Reinforcement Learning for Resource Allocation in V2X Networks	Kaidi Xu et.al.	2405.01961	translate	read	null
2024-05-03	Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization	Changliang Zhou et.al.	2405.01906	translate	read	null
2024-05-03	Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants	Francesco Maldonato et.al.	2405.01889	translate	read	link
2024-05-02	Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks	Murtaza Dalal et.al.	2405.01534	translate	read	null
2024-05-02	FLAME: Factuality-Aware Alignment for Large Language Models	Sheng-Chieh Lin et.al.	2405.01525	translate	read	null
2024-05-02	NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment	Gerald Shen et.al.	2405.01481	translate	read	link
2024-05-02	IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning	Ryan Hoque et.al.	2405.01472	translate	read	null
2024-05-02	Goal-conditioned reinforcement learning for ultrasound navigation guidance	Abdoul Aziz Amadou et.al.	2405.01409	translate	read	null
2024-05-02	Learning Force Control for Legged Manipulation	Tifanny Portela et.al.	2405.01402	translate	read	null
2024-05-02	Constrained Reinforcement Learning Under Model Mismatch	Zhongchang Sun et.al.	2405.01327	translate	read	null
2024-05-02	Non-iterative Optimization of Trajectory and Radio Resource for Aerial Network	Hyeonsu Lyu et.al.	2405.01314	translate	read	null
2024-05-02	Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning	Liu Qiyuan et.al.	2405.01284	translate	read	null
2024-05-02	Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation	Hao Wang et.al.	2405.01280	translate	read	null
2024-05-01	Self-Play Preference Optimization for Language Model Alignment	Yue Wu et.al.	2405.00675	translate	read	null
2024-05-01	No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO	Skander Moalla et.al.	2405.00662	translate	read	link
2024-05-01	HUGO – Highlighting Unseen Grid Options: Combining Deep Reinforcement Learning with a Heuristic Target Topology Approach	Malte Lehna et.al.	2405.00629	translate	read	null
2024-05-01	Koopman-based Deep Learning for Nonlinear System Estimation	Zexin Sun et.al.	2405.00627	translate	read	null
2024-05-01	Queue-based Eco-Driving at Roundabouts with Reinforcement Learning	Anna-Lena Schlamp et.al.	2405.00625	translate	read	null
2024-05-01	The Real, the Better: Aligning Large Language Models with Online Human Behaviors	Guanying Jiang et.al.	2405.00578	translate	read	null
2024-05-01	Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment	Zhili Liu et.al.	2405.00557	translate	read	null
2024-05-01	Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning	Lucas-Andreï Thil et.al.	2405.00516	translate	read	null
2024-05-01	MetaRM: Shifted Distributions Alignment via Meta-Learning	Shihan Dou et.al.	2405.00438	translate	read	null
2024-05-01	UCB-driven Utility Function Search for Multi-objective Reinforcement Learning	Yucheng Shi et.al.	2405.00410	translate	read	link
