Reinforcement Learning - 2024-06
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-06-28 | PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators | Kuo-Hao Zeng et.al. | 2406.20083 | translate | read | null |
| 2024-06-28 | Applying RLAIF for Code Generation with API-usage in Lightweight LLMs | Sujan Dutta et.al. | 2406.20060 | translate | read | null |
| 2024-06-28 | HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid | Xinyu Xu et.al. | 2406.19972 | translate | read | null |
| 2024-06-28 | Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies | Pingcheng Jian et.al. | 2406.19971 | translate | read | null |
| 2024-06-28 | Operator World Models for Reinforcement Learning | Pietro Novelli et.al. | 2406.19861 | translate | read | null |
| 2024-06-28 | 3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints | Yoonkyu Yoo et.al. | 2406.19848 | translate | read | null |
| 2024-06-28 | Reinforcement Learning for Efficient Design and Control Co-optimisation of Energy Systems | Marine Cauz et.al. | 2406.19825 | translate | read | null |
| 2024-06-28 | Identifying Ordinary Differential Equations for Data-efficient Model-based Reinforcement Learning | Tobias Nagel et.al. | 2406.19817 | translate | read | null |
| 2024-06-28 | Fuzzy Logic Guided Reward Function Variation: An Oracle for Testing Reinforcement Learning Programs | Shiyu Zhang et.al. | 2406.19812 | translate | read | null |
| 2024-06-28 | Decision Transformer for IRS-Assisted Systems with Diffusion-Driven Generative Channels | Jie Zhang et.al. | 2406.19769 | translate | read | null |
| 2024-06-27 | Efficient World Models with Context-Aware Tokenization | Vincent Micheli et.al. | 2406.19320 | translate | read | link |
| 2024-06-27 | Averaging log-likelihoods in direct alignment | Nathan Grinsztajn et.al. | 2406.19188 | translate | read | null |
| 2024-06-27 | Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion | Yannis Flet-Berliac et.al. | 2406.19185 | translate | read | null |
| 2024-06-27 | Learning Pareto Set for Multi-Objective Continuous Robot Control | Tianye Shu et.al. | 2406.18924 | translate | read | link |
| 2024-06-27 | Autonomous Control of a Novel Closed Chain Five Bar Active Suspension via Deep Reinforcement Learning | Nishesh Singh et.al. | 2406.18899 | translate | read | null |
| 2024-06-27 | State and Input Constrained Output-Feedback Adaptive Optimal Control of Affine Nonlinear Systems | Tochukwu Elijah Ogri et.al. | 2406.18804 | translate | read | null |
| 2024-06-26 | Decentralized Semantic Traffic Control in AVs Using RL and DQN for Dynamic Roadblocks | Emanuel Figetakis et.al. | 2406.18741 | translate | read | null |
| 2024-06-26 | Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs | Tian Tian et.al. | 2406.18529 | translate | read | null |
| 2024-06-26 | Mental Modeling of Reinforcement Learning Agents by Language Models | Wenhao Lu et.al. | 2406.18505 | translate | read | null |
| 2024-06-26 | Preference Elicitation for Offline Reinforcement Learning | Alizée Pace et.al. | 2406.18450 | translate | read | null |
| 2024-06-26 | Mixture of Experts in a Mixture of RL settings | Timon Willi et.al. | 2406.18420 | translate | read | null |
| 2024-06-26 | AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors | Hao Shi et.al. | 2406.18394 | translate | read | null |
| 2024-06-26 | Reinforcement Learning with Intrinsically Motivated Feedback Graph for Lost-sales Inventory Control | Zifan Liu et.al. | 2406.18351 | translate | read | null |
| 2024-06-26 | AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations | Adam Dahlgren Lindström et.al. | 2406.18346 | translate | read | null |
| 2024-06-26 | Spatial-temporal Hierarchical Reinforcement Learning for Interpretable Pathology Image Super-Resolution | Wenting Chen et.al. | 2406.18310 | translate | read | link |
| 2024-06-26 | Combining Automated Optimisation of Hyperparameters and Reward Shape | Julian Dierkes et.al. | 2406.18293 | translate | read | link |
| 2024-06-26 | Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems | Italo Luis da Silva et.al. | 2406.18245 | translate | read | link |
| 2024-06-25 | EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data | Jesse Zhang et.al. | 2406.17768 | translate | read | null |
| 2024-06-25 | When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning | Claas Voelcker et.al. | 2406.17718 | translate | read | null |
| 2024-06-25 | Privacy Preserving Reinforcement Learning for Population Processes | Samuel Yang-Zhao et.al. | 2406.17649 | translate | read | null |
| 2024-06-25 | KANQAS: Kolmogorov Arnold Network for Quantum Architecture Search | Akash Kundu et.al. | 2406.17630 | translate | read | link |
| 2024-06-25 | Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations | Cheng Wang et.al. | 2406.17576 | translate | read | null |
| 2024-06-25 | On the consistency of hyper-parameter selection in value-based deep reinforcement learning | Johan Obando-Ceron et.al. | 2406.17523 | translate | read | null |
| 2024-06-25 | BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO | Sebastian Dittert et.al. | 2406.17490 | translate | read | null |
| 2024-06-25 | CuDA2: An approach for Incorporating Traitor Agents into Cooperative Multi-Agent Systems | Zhen Chen et.al. | 2406.17425 | translate | read | null |
| 2024-06-25 | Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning | Tianfu Wang et.al. | 2406.17334 | translate | read | link |
| 2024-06-25 | The State-Action-Reward-State-Action Algorithm in Spatial Prisoner’s Dilemma Game | Lanyu Yang et.al. | 2406.17326 | translate | read | null |
| 2024-06-24 | Confidence Aware Inverse Constrained Reinforcement Learning | Sriram Ganapathi Subramanian et.al. | 2406.16782 | translate | read | null |
| 2024-06-24 | WARP: On the Benefits of Weight Averaged Rewarded Policies | Alexandre Ramé et.al. | 2406.16768 | translate | read | null |
| 2024-06-24 | The MRI Scanner as a Diagnostic: Image-less Active Sampling | Yuning Du et.al. | 2406.16754 | translate | read | null |
| 2024-06-24 | OCALM: Object-Centric Assessment with Language Models | Timo Kaufmann et.al. | 2406.16748 | translate | read | null |
| 2024-06-24 | Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization | Zhengyue Zhao et.al. | 2406.16743 | translate | read | null |
| 2024-06-24 | Probabilistic Subgoal Representations for Hierarchical Reinforcement learning | Vivienne Huiling Wang et.al. | 2406.16707 | translate | read | null |
| 2024-06-24 | Decentralized RL-Based Data Transmission Scheme for Energy Efficient Harvesting | Rafaela Scaciota et.al. | 2406.16624 | translate | read | null |
| 2024-06-24 | Towards Physically Talented Aerial Robots with Tactically Smart Swarm Behavior thereof: An Efficient Co-design Approach | Prajit KrisshnaKumar et.al. | 2406.16612 | translate | read | null |
| 2024-06-24 | $\text{Alpha}^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning | Feng Xu et.al. | 2406.16505 | translate | read | link |
| 2024-06-24 | Towards Comprehensive Preference Data Collection for Reward Modeling | Yulan Hu et.al. | 2406.16486 | translate | read | null |
| 2024-06-21 | MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | Xuan He et.al. | 2406.15252 | translate | read | null |
| 2024-06-21 | Open Problem: Order Optimal Regret Bounds for Kernel-Based Reinforcement Learning | Sattar Vakili et.al. | 2406.15250 | translate | read | null |
| 2024-06-21 | Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting | Jiyong Oh et.al. | 2406.15225 | translate | read | null |
| 2024-06-21 | Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks | Alex Quach et.al. | 2406.15149 | translate | read | null |
| 2024-06-21 | KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty | Philipp Becker et.al. | 2406.15131 | translate | read | null |
| 2024-06-21 | A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning | Gianluca Drappo et.al. | 2406.15124 | translate | read | null |
| 2024-06-21 | Towards General Negotiation Strategies with End-to-End Reinforcement Learning | Bram M. Renting et.al. | 2406.15096 | translate | read | null |
| 2024-06-21 | KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning | Jiahan Chen et.al. | 2406.15073 | translate | read | null |
| 2024-06-21 | Behaviour Distillation | Andrei Lupu et.al. | 2406.15042 | translate | read | link |
| 2024-06-21 | SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning | Matthias Weissenbacher et.al. | 2406.15025 | translate | read | null |
| 2024-06-20 | CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics | Jiawei Gao et.al. | 2406.14558 | translate | read | null |
| 2024-06-20 | MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading | Chuqiao Zong et.al. | 2406.14537 | translate | read | link |
| 2024-06-20 | RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold | Amrith Setlur et.al. | 2406.14532 | translate | read | link |
| 2024-06-20 | Learning telic-controllable state representations | Nadav Amir et.al. | 2406.14476 | translate | read | null |
| 2024-06-20 | Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue | Huifang Du et.al. | 2406.14457 | translate | read | null |
| 2024-06-20 | Revealing the learning process in reinforcement learning agents through attention-oriented metrics | Charlotte Beylier et.al. | 2406.14324 | translate | read | null |
| 2024-06-20 | Resource Optimization for Tail-Based Control in Wireless Networked Control Systems | Rasika Vijithasena et.al. | 2406.14301 | translate | read | null |
| 2024-06-21 | REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability | Shuang Ao et.al. | 2406.14214 | translate | read | link |
| 2024-06-20 | Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning | Amit Sharma et.al. | 2406.14169 | translate | read | null |
| 2024-06-20 | Iterative Sizing Field Prediction for Adaptive Mesh Generation From Expert Demonstrations | Niklas Freymuth et.al. | 2406.14161 | translate | read | link |
| 2024-06-18 | Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Haoxiang Wang et.al. | 2406.12845 | translate | read | link |
| 2024-06-18 | Injection Optimization at Particle Accelerators via Reinforcement Learning: From Simulation to Real-World Application | Awal Awal et.al. | 2406.12735 | translate | read | null |
| 2024-06-18 | A Systematization of the Wagner Framework: Graph Theory Conjectures and Reinforcement Learning | Flora Angileri et.al. | 2406.12667 | translate | read | null |
| 2024-06-18 | Reinforcement-Learning based routing for packet-optical networks with hybrid telemetry | A. L. García Navarro et.al. | 2406.12602 | translate | read | null |
| 2024-06-18 | Discovering Minimal Reinforcement Learning Environments | Jarek Liesen et.al. | 2406.12589 | translate | read | null |
| 2024-06-18 | RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation | Shuting Wang et.al. | 2406.12566 | translate | read | null |
| 2024-06-18 | A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo | Miguel Vasco et.al. | 2406.12563 | translate | read | null |
| 2024-06-18 | Offline Imitation Learning with Model-based Reverse Augmentation | Jie-Jing Shao et.al. | 2406.12550 | translate | read | null |
| 2024-06-18 | Demonstrating Agile Flight from Pixels without State Estimation | Ismail Geles et.al. | 2406.12505 | translate | read | null |
| 2024-06-18 | Autonomous navigation of catheters and guidewires in mechanical thrombectomy using inverse reinforcement learning | Harry Robertshaw et.al. | 2406.12499 | translate | read | null |
| 2024-06-17 | WPO: Enhancing RLHF with Weighted Preference Optimization | Wenxuan Zhou et.al. | 2406.11827 | translate | read | link |
| 2024-06-17 | Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics | Runzhe Wu et.al. | 2406.11810 | translate | read | null |
| 2024-06-17 | Run Time Assured Reinforcement Learning for Six Degree-of-Freedom Spacecraft Inspection | Kyle Dunlap et.al. | 2406.11795 | translate | read | null |
| 2024-06-17 | FetchBench: A Simulation Benchmark for Robot Fetching | Beining Han et.al. | 2406.11793 | translate | read | null |
| 2024-06-17 | Optimal Transport-Assisted Risk-Sensitive Q-Learning | Zahra Shahrooei et.al. | 2406.11774 | translate | read | null |
| 2024-06-17 | Measuring memorization in RLHF for code completion | Aneesh Pappu et.al. | 2406.11715 | translate | read | null |
| 2024-06-17 | The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation | Noah Golowich et.al. | 2406.11686 | translate | read | null |
| 2024-06-17 | Communication-Efficient MARL for Platoon Stability and Energy-efficiency Co-optimization in Cooperative Adaptive Cruise Control of CAVs | Min Hua et.al. | 2406.11653 | translate | read | null |
| 2024-06-17 | Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions | Noah Golowich et.al. | 2406.11640 | translate | read | null |
| 2024-06-17 | Style Transfer with Multi-iteration Preference Optimization | Shuai Liu et.al. | 2406.11581 | translate | read | null |
| 2024-06-14 | Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs | Rui Yang et.al. | 2406.10216 | translate | read | null |
| 2024-06-14 | A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors | Naaman Tan et.al. | 2406.10203 | translate | read | null |
| 2024-06-14 | Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication | Sanjali Yadav et.al. | 2406.10166 | translate | read | null |
| 2024-06-14 | Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models | Carson Denison et.al. | 2406.10162 | translate | read | link |
| 2024-06-14 | BiKC: Keypose-Conditioned Consistency Policy for Bimanual Robotic Manipulation | Dongjie Yu et.al. | 2406.10093 | translate | read | null |
| 2024-06-14 | PRIMER: Perception-Aware Robust Learning-based Multiagent Trajectory Planner | Kota Kondo et.al. | 2406.10060 | translate | read | null |
| 2024-06-14 | Bridging the Communication Gap: Artificial Agents Learning Sign Language through Imitation | Federico Tavella et.al. | 2406.10043 | translate | read | null |
| 2024-06-14 | ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR | Vishwanath Pratap Singh et.al. | 2406.09999 | translate | read | null |
| 2024-06-14 | Robust Model-Based Reinforcement Learning with an Adversarial Auxiliary Model | Siemen Herremans et.al. | 2406.09976 | translate | read | link |
| 2024-06-14 | InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning | Tiancheng Li et.al. | 2406.09973 | translate | read | null |
| 2024-06-13 | Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms | Miaosen Zhang et.al. | 2406.09397 | translate | read | null |
| 2024-06-13 | Is Value Learning Really the Main Bottleneck in Offline RL? | Seohong Park et.al. | 2406.09329 | translate | read | null |
| 2024-06-13 | OpenVLA: An Open-Source Vision-Language-Action Model | Moo Jin Kim et.al. | 2406.09246 | translate | read | null |
| 2024-06-13 | AutomaChef: A Physics-informed Demonstration-guided Learning Framework for Granular Material Manipulation | Minglun Wei et.al. | 2406.09178 | translate | read | null |
| 2024-06-13 | Direct Imitation Learning-based Visual Servoing using the Large Projection Formulation | Sayantan Auddy et.al. | 2406.09120 | translate | read | null |
| 2024-06-13 | Adaptive Actor-Critic Based Optimal Regulation for Drift-Free Uncertain Nonlinear Systems | Ashwin P. Dani et.al. | 2406.09097 | translate | read | null |
| 2024-06-13 | DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning | Xuemin Hu et.al. | 2406.09089 | translate | read | null |
| 2024-06-13 | Data-driven modeling and supervisory control system optimization for plug-in hybrid electric vehicles | Hao Zhang et.al. | 2406.09082 | translate | read | null |
| 2024-06-13 | Latent Assistance Networks: Rediscovering Hyperbolic Tangents in RL | Jacob E. Kooi et.al. | 2406.09079 | translate | read | null |
| 2024-06-13 | Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation | Claude Formanek et.al. | 2406.09068 | translate | read | null |
| 2024-06-12 | RILe: Reinforced Imitation Learning | Mert Albaba et.al. | 2406.08472 | translate | read | null |
| 2024-06-12 | Adaptive Swarm Mesh Refinement using Deep Reinforcement Learning with Local Rewards | Niklas Freymuth et.al. | 2406.08440 | translate | read | null |
| 2024-06-12 | RRLS : Robust Reinforcement Learning Suite | Adil Zouitine et.al. | 2406.08406 | translate | read | link |
| 2024-06-12 | Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning | Yuhui Wang et.al. | 2406.08404 | translate | read | null |
| 2024-06-12 | Time-Constrained Robust MDPs | Adil Zouitine et.al. | 2406.08395 | translate | read | null |
| 2024-06-12 | Residual Learning and Context Encoding for Adaptive Offline-to-Online Reinforcement Learning | Mohammadreza Nakhaei et.al. | 2406.08238 | translate | read | link |
| 2024-06-12 | MaIL: Improving Imitation Learning with Mamba | Xiaogang Jia et.al. | 2406.08234 | translate | read | null |
| 2024-06-12 | Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning | Max Weltevrede et.al. | 2406.08069 | translate | read | null |
| 2024-06-12 | Deep reinforcement learning with positional context for intraday trading | Sven Goluža et.al. | 2406.08013 | translate | read | null |
| 2024-06-12 | Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning | Yizhe Huang et.al. | 2406.08002 | translate | read | null |
| 2024-06-11 | CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning | Zeyuan Liu et.al. | 2406.07541 | translate | read | null |
| 2024-06-11 | BAKU: An Efficient Transformer for Multi-Task Policy Learning | Siddhant Haldar et.al. | 2406.07539 | translate | read | null |
| 2024-06-11 | Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis | Qining Zhang et.al. | 2406.07455 | translate | read | null |
| 2024-06-11 | Enhanced Gene Selection in Single-Cell Genomics: Pre-Filtering Synergy and Reinforced Optimization | Weiliang Zhang et.al. | 2406.07418 | translate | read | null |
| 2024-06-11 | Federated Multi-Agent DRL for Radio Resource Management in Industrial 6G in-X subnetworks | Bjarke Madsen et.al. | 2406.07383 | translate | read | null |
| 2024-06-11 | World Models with Hints of Large Language Models for Goal Achieving | Zeyuan Liu et.al. | 2406.07381 | translate | read | null |
| 2024-06-11 | EdgeTimer: Adaptive Multi-Timescale Scheduling in Mobile Edge Computing with Deep Reinforcement Learning | Yijun Hao et.al. | 2406.07342 | translate | read | null |
| 2024-06-11 | Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling | Constantin Waubert de Puiseau et.al. | 2406.07325 | translate | read | null |
| 2024-06-11 | Multi-objective Reinforcement learning from AI Feedback | Marcus Williams et.al. | 2406.07295 | translate | read | null |
| 2024-06-11 | Hybrid Reinforcement Learning from Offline Observation Alone | Yuda Song et.al. | 2406.07253 | translate | read | null |
| 2024-06-10 | Verification-Guided Shielding for Deep Reinforcement Learning | Davide Corsi et.al. | 2406.06507 | translate | read | null |
| 2024-06-10 | Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation | Mohidul Haque Mridul et.al. | 2406.06500 | translate | read | null |
| 2024-06-10 | Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity | Calarina Muslimani et.al. | 2406.06495 | translate | read | null |
| 2024-06-10 | Towards Real-World Efficiency: Domain Randomization in Reinforcement Learning for Pre-Capture of Free-Floating Moving Targets by Autonomous Robots | Bahador Beigomi et.al. | 2406.06460 | translate | read | link |
| 2024-06-10 | Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? | Denis Tarasov et.al. | 2406.06309 | translate | read | link |
| 2024-06-10 | Learning-based cognitive architecture for enhancing coordination in human groups | Antonio Grotta et.al. | 2406.06297 | translate | read | null |
| 2024-06-10 | Deep Multi-Objective Reinforcement Learning for Utility-Based Infrastructural Maintenance Optimization | Jesse van Remmerden et.al. | 2406.06184 | translate | read | null |
| 2024-06-10 | Mastering truss structure optimization with tree search | Gabriel E. Garayalde et.al. | 2406.06145 | translate | read | null |
| 2024-06-10 | EXPIL: Explanatory Predicate Invention for Learning in Games | Jingyuan Sha et.al. | 2406.06107 | translate | read | null |
| 2024-06-10 | Sim-To-Real Transfer for Visual Reinforcement Learning of Deformable Object Manipulation for Robot-Assisted Surgery | Paul Maria Scheikl et.al. | 2406.06092 | translate | read | null |
| 2024-06-07 | LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration | Tavor Lipman et.al. | 2406.05107 | translate | read | null |
| 2024-06-07 | Massively Multiagent Minigames for Training Generalist Agents | Kyoung Whan Choe et.al. | 2406.05071 | translate | read | link |
| 2024-06-07 | Online Frequency Scheduling by Learning Parallel Actions | Anastasios Giovanidis et.al. | 2406.05041 | translate | read | null |
| 2024-06-07 | Optimizing Automatic Differentiation with Deep Reinforcement Learning | Jamie Lohoff et.al. | 2406.05027 | translate | read | null |
| 2024-06-07 | Designs for Enabling Collaboration in Human-Machine Teaming via Interactive and Explainable Systems | Rohan Paleja et.al. | 2406.05003 | translate | read | null |
| 2024-06-07 | SLOPE: Search with Learned Optimal Pruning-based Expansion | Davor Bokan et.al. | 2406.04935 | translate | read | link |
| 2024-06-07 | Sim-to-real Transfer of Deep Reinforcement Learning Agents for Online Coverage Path Planning | Arvi Jonnarth et.al. | 2406.04920 | translate | read | null |
| 2024-06-07 | Online Adaptation for Enhancing Imitation Learning Policies | Federico Malato et.al. | 2406.04913 | translate | read | link |
| 2024-06-07 | Stabilizing Extreme Q-learning by Maclaurin Expansion | Motoki Omura et.al. | 2406.04896 | translate | read | null |
| 2024-06-07 | Primitive Agentic First-Order Optimization | R. Sala et.al. | 2406.04841 | translate | read | null |
| 2024-06-06 | ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories | Qianlan Yang et.al. | 2406.04323 | translate | read | null |
| 2024-06-06 | Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models | Xiang Ji et.al. | 2406.04274 | translate | read | null |
| 2024-06-06 | Multi-Agent Imitation Learning: Value is Easy, Regret is Hard | Jingwu Tang et.al. | 2406.04219 | translate | read | null |
| 2024-06-06 | Aligning Agents like Large Language Models | Adam Jelley et.al. | 2406.04208 | translate | read | null |
| 2024-06-06 | MARLander: A Local Path Planning for Drone Swarms using Multiagent Deep Reinforcement Learning | Demetros Aschu et.al. | 2406.04159 | translate | read | null |
| 2024-06-06 | Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning | Abdullah Akgül et.al. | 2406.04088 | translate | read | null |
| 2024-06-06 | Bootstrapping Expectiles in Reinforcement Learning | Pierre Clavier et.al. | 2406.04081 | translate | read | null |
| 2024-06-06 | Spatio-temporal Early Prediction based on Multi-objective Reinforcement Learning | Wei Shao et.al. | 2406.04035 | translate | read | link |
| 2024-06-06 | Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents | Yoann Poupart et.al. | 2406.04028 | translate | read | link |
| 2024-06-06 | HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning | Quentin Delfosse et.al. | 2406.03997 | translate | read | link |
| 2024-06-05 | Automating Turkish Educational Quiz Generation Using Large Language Models | Kamyar Zeinalipour et.al. | 2406.03397 | translate | read | null |
| 2024-06-05 | LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback | Timon Ziegenbein et.al. | 2406.03363 | translate | read | link |
| 2024-06-05 | UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning | Yu Zhang et.al. | 2406.03324 | translate | read | null |
| 2024-06-05 | Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning | Mohamed Elsayed et.al. | 2406.03276 | translate | read | null |
| 2024-06-05 | Prompt-based Visual Alignment for Zero-shot Policy Transfer | Haihan Gao et.al. | 2406.03250 | translate | read | null |
| 2024-06-05 | Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning | Inwoo Hwang et.al. | 2406.03234 | translate | read | link |
| 2024-06-05 | CommonPower: Supercharging Machine Learning for Smart Grids | Michael Eichelbeck et.al. | 2406.03231 | translate | read | link |
| 2024-06-05 | Object Manipulation in Marine Environments using Reinforcement Learning | Ahmed Nader et.al. | 2406.03223 | translate | read | null |
| 2024-06-05 | Adaptive Distance Functions via Kelvin Transformation | Rafael I. Cabral Muchacho et.al. | 2406.03200 | translate | read | null |
| 2024-06-05 | DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays | Bo Xia et.al. | 2406.03102 | translate | read | null |
| 2024-06-04 | RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots | Soroush Nasiriany et.al. | 2406.02523 | translate | read | link |
| 2024-06-04 | Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs | Filippo Valdettaro et.al. | 2406.02456 | translate | read | null |
| 2024-06-04 | A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies | Md Mirajul Islam et.al. | 2406.02450 | translate | read | null |
| 2024-06-04 | Algorithmic Collusion in Dynamic Pricing with Deep Reinforcement Learning | Shidi Deng et.al. | 2406.02437 | translate | read | null |
| 2024-06-04 | Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Philip Anastassiou et.al. | 2406.02430 | translate | read | link |
| 2024-06-04 | Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning | Jiaxu Wang et.al. | 2406.02370 | translate | read | null |
| 2024-06-04 | How to Explore with Belief: State Entropy Maximization in POMDPs | Riccardo Zamboni et.al. | 2406.02295 | translate | read | null |
| 2024-06-04 | Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling | Arthur Müller et.al. | 2406.02294 | translate | read | null |
| 2024-06-04 | Test-Time Regret Minimization in Meta Reinforcement Learning | Mirco Mutti et.al. | 2406.02282 | translate | read | null |
| 2024-06-04 | Reinforcement Learning with Lookahead Information | Nadav Merlis et.al. | 2406.02258 | translate | read | null |
| 2024-06-03 | Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles | Jiesong Lian et.al. | 2405.21027 | translate | read | null |