2024-07-15 |
Walking the Values in Bayesian Inverse Reinforcement Learning |
Ondrej Bajgar et.al. |
2407.10971 |
null |
2024-07-15 |
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning |
Haohong Lin et.al. |
2407.10967 |
null |
2024-07-15 |
Hedging Beyond the Mean: A Distributional Reinforcement Learning Perspective for Hedging Portfolios with Structured Products |
Anil Sharma et.al. |
2407.10903 |
null |
2024-07-15 |
Offline Reinforcement Learning with Imputed Rewards |
Carlo Romeo et.al. |
2407.10839 |
null |
2024-07-15 |
Exploration in Knowledge Transfer Utilizing Reinforcement Learning |
Adam Jedlička et.al. |
2407.10835 |
null |
2024-07-15 |
GuideLight: "Industrial Solution" Guidance for More Practical Traffic Signal Control Agents |
Haoyuan Jiang et.al. |
2407.10811 |
null |
2024-07-15 |
Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning |
Alessandro Montenegro et.al. |
2407.10775 |
null |
2024-07-15 |
Balancing the Scales: Reinforcement Learning for Fair Classification |
Leon Eshuijs et.al. |
2407.10629 |
null |
2024-07-15 |
Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena |
Haipeng Luo et.al. |
2407.10627 |
null |
2024-07-15 |
Three Dogmas of Reinforcement Learning |
David Abel et.al. |
2407.10583 |
null |
2024-07-12 |
Learning Coordinated Maneuver in Adversarial Environments |
Zechen Hu et.al. |
2407.09469 |
null |
2024-07-12 |
ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts |
Amelia F. Hardy et.al. |
2407.09447 |
null |
2024-07-12 |
A Benchmark Environment for Offline Reinforcement Learning in Racing Games |
Girolamo Macaluso et.al. |
2407.09415 |
link |
2024-07-12 |
Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments |
Zoya Volovikova et.al. |
2407.09287 |
null |
2024-07-12 |
GNN with Model-based RL for Multi-agent Systems |
Hanxiao Chen et.al. |
2407.09249 |
null |
2024-07-12 |
Constrained Intrinsic Motivation for Reinforcement Learning |
Xiang Zheng et.al. |
2407.09247 |
null |
2024-07-12 |
Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network |
Shun Kotoku et.al. |
2407.09124 |
null |
2024-07-12 |
New Desiderata for Direct Preference Optimization |
Xiangkun Hu et.al. |
2407.09072 |
null |
2024-07-12 |
Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control |
Huayu Chen et.al. |
2407.09024 |
null |
2024-07-12 |
Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control |
Sicong Jiang et.al. |
2407.08964 |
null |
2024-07-11 |
MetaUrban: A Simulation Platform for Embodied AI in Urban Spaces |
Wayne Wu et.al. |
2407.08725 |
null |
2024-07-11 |
RoboMorph: Evolving Robot Morphology using Large Language Models |
Kevin Qiu et.al. |
2407.08626 |
null |
2024-07-11 |
A Review of Nine Physics Engines for Reinforcement Learning Research |
Michael Kaup et.al. |
2407.08590 |
null |
2024-07-11 |
HACMan++: Spatially-Grounded Motion Primitives for Manipulation |
Bowen Jiang et.al. |
2407.08585 |
null |
2024-07-11 |
TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations |
Junik Bae et.al. |
2407.08464 |
null |
2024-07-11 |
Distributed Deep Reinforcement Learning Based Gradient Quantization for Federated Learning Enabled Vehicle Edge Computing |
Cui Zhang et.al. |
2407.08462 |
null |
2024-07-11 |
Joint Optimization of Age of Information and Energy Consumption in NR-V2X System based on Deep Reinforcement Learning |
Shulin Song et.al. |
2407.08458 |
link |
2024-07-11 |
A Cantor-Kantorovich Metric Between Markov Decision Processes with Application to Transfer Learning |
Adrien Banse et.al. |
2407.08324 |
null |
2024-07-11 |
A Deep Reinforcement Learning Framework and Methodology for Reducing the Sim-to-Real Gap in ASV Navigation |
Luis F W Batista et.al. |
2407.08263 |
null |
2024-07-11 |
Gradient Boosting Reinforcement Learning |
Benjamin Fuhrer et.al. |
2407.08250 |
link |
2024-07-10 |
Learning In-Hand Translation Using Tactile Skin With Shear and Normal Force Sensing |
Jessica Yin et.al. |
2407.07885 |
null |
2024-07-10 |
Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation |
Eugene Teoh et.al. |
2407.07868 |
null |
2024-07-10 |
Reinforcement Learning of Adaptive Acquisition Policies for Inverse Problems |
Gianluigi Silvestri et.al. |
2407.07794 |
null |
2024-07-11 |
BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark |
Nikita Chernyadev et.al. |
2407.07788 |
null |
2024-07-10 |
Continuous Control with Coarse-to-fine Reinforcement Learning |
Younggyo Seo et.al. |
2407.07787 |
null |
2024-07-10 |
Towards Human-Like Driving: Active Inference in Autonomous Vehicle Control |
Elahe Delavari et.al. |
2407.07684 |
null |
2024-07-10 |
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning |
Dake Zhang et.al. |
2407.07631 |
null |
2024-07-10 |
Resource Allocation for Twin Maintenance and Computing Task Processing in Digital Twin Vehicular Edge Computing Network |
Yu Xie et.al. |
2407.07575 |
link |
2024-07-10 |
CM-DQN: A Value-Based Deep Reinforcement Learning Model to Simulate Confirmation Bias |
Jiacheng Shen et.al. |
2407.07454 |
link |
2024-07-10 |
Real-time system optimal traffic routing under uncertainties -- Can physics models boost reinforcement learning? |
Zemian Ke et.al. |
2407.07364 |
null |
2024-07-09 |
Safe and Reliable Training of Learning-Based Aerospace Controllers |
Udayan Mandal et.al. |
2407.07088 |
link |
2024-07-09 |
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models |
Logan Cross et.al. |
2407.07086 |
link |
2024-07-09 |
Can Learned Optimization Make Reinforcement Learning Less Difficult? |
Alexander David Goldie et.al. |
2407.07082 |
link |
2024-07-09 |
A Unified Approach to Multi-task Legged Navigation: Temporal Logic Meets Reinforcement Learning |
Jesse Jiang et.al. |
2407.06931 |
null |
2024-07-09 |
Intercepting Unauthorized Aerial Robots in Controlled Airspace Using Reinforcement Learning |
Francisco Giral et.al. |
2407.06909 |
null |
2024-07-09 |
Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective |
Shahana Ibrahim et.al. |
2407.06902 |
null |
2024-07-09 |
Energy Efficient Fair STAR-RIS for Mobile Users |
Ashok S. Kumar et.al. |
2407.06868 |
null |
2024-07-09 |
Frequency and Generalisation of Periodic Activation Functions in Reinforcement Learning |
Augustine N. Mavor-Parker et.al. |
2407.06756 |
null |
2024-07-09 |
Hierarchical Average-Reward Linearly-solvable Markov Decision Processes |
Guillermo Infante et.al. |
2407.06690 |
null |
2024-07-09 |
Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning |
Fanyue Wei et.al. |
2407.06642 |
link |
2024-07-08 |
Periodic agent-state based Q-learning for POMDPs |
Amit Sinha et.al. |
2407.06121 |
null |
2024-07-08 |
QTRL: Toward Practical Quantum Reinforcement Learning via Quantum-Train |
Chen-Yu Liu et.al. |
2407.06103 |
null |
2024-07-08 |
Stranger Danger! Identifying and Avoiding Unpredictable Pedestrians in RL-based Social Robot Navigation |
Sara Pohland et.al. |
2407.06056 |
link |
2024-07-08 |
iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement |
Aoyu Pang et.al. |
2407.06025 |
link |
2024-07-08 |
On Bellman equations for continuous-time policy evaluation I: discretization and approximation |
Wenlong Mou et.al. |
2407.05966 |
null |
2024-07-08 |
Graph Anomaly Detection with Noisy Labels by Reinforcement Learning |
Zhu Wang et.al. |
2407.05934 |
null |
2024-07-08 |
FedMRL: Data Heterogeneity Aware Federated Multi-agent Deep Reinforcement Learning for Medical Imaging |
Pranab Sahoo et.al. |
2407.05800 |
link |
2024-07-08 |
Structural Generalization in Autonomous Cyber Incident Response with Message-Passing Neural Networks and Reinforcement Learning |
Jakob Nyberg et.al. |
2407.05775 |
link |
2024-07-08 |
Multi-agent Reinforcement Learning-based Network Intrusion Detection System |
Amine Tellache et.al. |
2407.05766 |
null |
2024-07-08 |
$\mathrm{E^{2}CFD}$: Towards Effective and Efficient Cost Function Design for Safe Reinforcement Learning via Large Language Model |
Zepeng Wang et.al. |
2407.05580 |
null |
2024-07-05 |
Graph Reinforcement Learning in Power Grids: A Survey |
Mohamed Hassouna et.al. |
2407.04522 |
null |
2024-07-05 |
Using Petri Nets as an Integrated Constraint Mechanism for Reinforcement Learning Tasks |
Timon Sachweh et.al. |
2407.04481 |
null |
2024-07-05 |
Hindsight Preference Learning for Offline Preference-based Reinforcement Learning |
Chen-Xiao Gao et.al. |
2407.04451 |
link |
2024-07-05 |
Enhancing Safety for Autonomous Agents in Partly Concealed Urban Traffic Environments Through Representation-Based Shielding |
Pierre Haritz et.al. |
2407.04343 |
link |
2024-07-05 |
Gradient-based Regularization for Action Smoothness in Robotic Control with Reinforcement Learning |
I Lee et.al. |
2407.04315 |
null |
2024-07-05 |
Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling |
Jiawei Xu et.al. |
2407.04285 |
null |
2024-07-05 |
Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator |
Mehryar Abbasi et.al. |
2407.04258 |
null |
2024-07-05 |
PA-LOCO: Learning Perturbation-Adaptive Locomotion for Quadruped Robots |
Zhiyuan Xiao et.al. |
2407.04224 |
null |
2024-07-05 |
Autoverse: An Evolvable Game Language for Learning Robust Embodied Agents |
Sam Earle et.al. |
2407.04221 |
null |
2024-07-04 |
Orchestrating LLMs with Different Personalizations |
Jin Peng Zhou et.al. |
2407.04181 |
null |
2024-07-03 |
Value-Penalized Auxiliary Control from Examples for Learning without Rewards or Demonstrations |
Trevor Ablett et.al. |
2407.03311 |
link |
2024-07-03 |
A Review of the Applications of Deep Learning-Based Emergent Communication |
Brendon Boldt et.al. |
2407.03302 |
null |
2024-07-03 |
Cooperative Multi-Agent Deep Reinforcement Learning Methods for UAV-aided Mobile Edge Computing Networks |
Mintae Kim et.al. |
2407.03280 |
null |
2024-07-03 |
Policy-guided Monte Carlo on general state spaces: Application to glass-forming mixtures |
Leonardo Galliano et.al. |
2407.03275 |
null |
2024-07-03 |
PPO-based Dynamic Control of Uncertain Floating Platforms in the Zero-G Environment |
Mahya Ramezani et.al. |
2407.03224 |
null |
2024-07-03 |
Combining AI Control Systems and Human Decision Support via Robustness and Criticality |
Walt Woods et.al. |
2407.03210 |
null |
2024-07-03 |
Reinforcement Learning for Sequence Design Leveraging Protein Language Models |
Jithendaraa Subramanian et.al. |
2407.03154 |
null |
2024-07-03 |
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes |
Asaf Cassel et.al. |
2407.03065 |
null |
2024-07-03 |
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment |
Janghwan Lee et.al. |
2407.03051 |
null |
2024-07-03 |
On the Client Preference of LLM Fine-tuning in Federated Learning |
Feijie Wu et.al. |
2407.03038 |
null |
2024-07-03 |
PWM: Policy Learning with Large World Models |
Ignat Georgiev et.al. |
2407.02466 |
null |
2024-07-02 |
Predicting Visual Attention in Graphic Design Documents |
Souradeep Chakraborty et.al. |
2407.02439 |
null |
2024-07-02 |
Reinforcement Learning and Machine ethics: a systematic review |
Ajay Vishwanath et.al. |
2407.02425 |
null |
2024-07-02 |
Talking to Machines: do you read me? |
Lina M. Rojas-Barahona et.al. |
2407.02354 |
null |
2024-07-02 |
DextrAH-G: Pixels-to-Action Dexterous Arm-Hand Grasping with Geometric Fabrics |
Tyler Ga Wei Lum et.al. |
2407.02274 |
null |
2024-07-02 |
Safe CoR: A Dual-Expert Approach to Integrating Imitation Learning and Safe Reinforcement Learning Using Constraint Rewards |
Hyeokjin Kwon et.al. |
2407.02245 |
null |
2024-07-02 |
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization |
Yuchen Hu et.al. |
2407.02243 |
null |
2024-07-02 |
Safety-Driven Deep Reinforcement Learning Framework for Cobots: A Sim2Real Approach |
Ammar N. Abbas et.al. |
2407.02231 |
link |
2024-07-02 |
Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning |
Zakariae El Asri et.al. |
2407.02217 |
null |
2024-07-02 |
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning |
Yifang Chen et.al. |
2407.02119 |
null |
2024-06-28 |
PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators |
Kuo-Hao Zeng et.al. |
2406.20083 |
null |
2024-06-28 |
Applying RLAIF for Code Generation with API-usage in Lightweight LLMs |
Sujan Dutta et.al. |
2406.20060 |
null |
2024-06-28 |
HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid |
Xinyu Xu et.al. |
2406.19972 |
link |
2024-06-28 |
Operator World Models for Reinforcement Learning |
Pietro Novelli et.al. |
2406.19861 |
null |
2024-06-28 |
3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints |
Yoonkyu Yoo et.al. |
2406.19848 |
null |
2024-06-28 |
Reinforcement Learning for Efficient Design and Control Co-optimisation of Energy Systems |
Marine Cauz et.al. |
2406.19825 |
null |
2024-06-28 |
Identifying Ordinary Differential Equations for Data-efficient Model-based Reinforcement Learning |
Tobias Nagel et.al. |
2406.19817 |
null |
2024-06-28 |
Fuzzy Logic Guided Reward Function Variation: An Oracle for Testing Reinforcement Learning Programs |
Shiyu Zhang et.al. |
2406.19812 |
link |
2024-06-28 |
Decision Transformer for IRS-Assisted Systems with Diffusion-Driven Generative Channels |
Jie Zhang et.al. |
2406.19769 |
null |
2024-07-01 |
Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors |
Emma Cramer et.al. |
2406.19768 |
link |
2024-06-27 |
Efficient World Models with Context-Aware Tokenization |
Vincent Micheli et.al. |
2406.19320 |
link |
2024-06-27 |
Averaging log-likelihoods in direct alignment |
Nathan Grinsztajn et.al. |
2406.19188 |
null |
2024-06-27 |
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion |
Yannis Flet-Berliac et.al. |
2406.19185 |
null |
2024-06-27 |
Learning Pareto Set for Multi-Objective Continuous Robot Control |
Tianye Shu et.al. |
2406.18924 |
link |
2024-06-27 |
Autonomous Control of a Novel Closed Chain Five Bar Active Suspension via Deep Reinforcement Learning |
Nishesh Singh et.al. |
2406.18899 |
null |
2024-06-27 |
State and Input Constrained Output-Feedback Adaptive Optimal Control of Affine Nonlinear Systems |
Tochukwu Elijah Ogri et.al. |
2406.18804 |
null |
2024-06-26 |
Decentralized Semantic Traffic Control in AVs Using RL and DQN for Dynamic Roadblocks |
Emanuel Figetakis et.al. |
2406.18741 |
null |
2024-06-26 |
Confident Natural Policy Gradient for Local Planning in $q_π$-realizable Constrained MDPs |
Tian Tian et.al. |
2406.18529 |
null |
2024-06-26 |
Mental Modeling of Reinforcement Learning Agents by Language Models |
Wenhao Lu et.al. |
2406.18505 |
null |
2024-06-26 |
Preference Elicitation for Offline Reinforcement Learning |
Alizée Pace et.al. |
2406.18450 |
null |
2024-06-26 |
Mixture of Experts in a Mixture of RL settings |
Timon Willi et.al. |
2406.18420 |
null |
2024-06-26 |
AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors |
Hao Shi et.al. |
2406.18394 |
null |
2024-06-26 |
Reinforcement Learning with Intrinsically Motivated Feedback Graph for Lost-sales Inventory Control |
Zifan Liu et.al. |
2406.18351 |
null |
2024-06-26 |
AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations |
Adam Dahlgren Lindström et.al. |
2406.18346 |
null |
2024-06-26 |
Spatial-temporal Hierarchical Reinforcement Learning for Interpretable Pathology Image Super-Resolution |
Wenting Chen et.al. |
2406.18310 |
link |
2024-06-26 |
Combining Automated Optimisation of Hyperparameters and Reward Shape |
Julian Dierkes et.al. |
2406.18293 |
link |
2024-06-27 |
Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems |
Italo Luis da Silva et.al. |
2406.18245 |
link |
2024-06-25 |
EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data |
Jesse Zhang et.al. |
2406.17768 |
null |
2024-06-25 |
When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning |
Claas Voelcker et.al. |
2406.17718 |
link |
2024-06-25 |
Privacy Preserving Reinforcement Learning for Population Processes |
Samuel Yang-Zhao et.al. |
2406.17649 |
null |
2024-06-25 |
KANQAS: Kolmogorov Arnold Network for Quantum Architecture Search |
Akash Kundu et.al. |
2406.17630 |
link |
2024-06-25 |
Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations |
Cheng Wang et.al. |
2406.17576 |
null |
2024-06-25 |
On the consistency of hyper-parameter selection in value-based deep reinforcement learning |
Johan Obando-Ceron et.al. |
2406.17523 |
link |
2024-06-25 |
BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO |
Sebastian Dittert et.al. |
2406.17490 |
null |
2024-06-25 |
CuDA2: An approach for Incorporating Traitor Agents into Cooperative Multi-Agent Systems |
Zhen Chen et.al. |
2406.17425 |
null |
2024-06-25 |
Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning |
Tianfu Wang et.al. |
2406.17334 |
link |
2024-06-25 |
The State-Action-Reward-State-Action Algorithm in Spatial Prisoner's Dilemma Game |
Lanyu Yang et.al. |
2406.17326 |
null |
2024-06-24 |
Confidence Aware Inverse Constrained Reinforcement Learning |
Sriram Ganapathi Subramanian et.al. |
2406.16782 |
link |
2024-06-24 |
WARP: On the Benefits of Weight Averaged Rewarded Policies |
Alexandre Ramé et.al. |
2406.16768 |
null |
2024-06-24 |
The MRI Scanner as a Diagnostic: Image-less Active Sampling |
Yuning Du et.al. |
2406.16754 |
null |
2024-06-24 |
OCALM: Object-Centric Assessment with Language Models |
Timo Kaufmann et.al. |
2406.16748 |
null |
2024-06-24 |
Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization |
Zhengyue Zhao et.al. |
2406.16743 |
null |
2024-06-24 |
Probabilistic Subgoal Representations for Hierarchical Reinforcement learning |
Vivienne Huiling Wang et.al. |
2406.16707 |
null |
2024-06-24 |
Decentralized RL-Based Data Transmission Scheme for Energy Efficient Harvesting |
Rafaela Scaciota et.al. |
2406.16624 |
null |
2024-06-24 |
Towards Physically Talented Aerial Robots with Tactically Smart Swarm Behavior thereof: An Efficient Co-design Approach |
Prajit KrisshnaKumar et.al. |
2406.16612 |
null |
2024-06-24 |
$\text{Alpha}^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning |
Feng Xu et.al. |
2406.16505 |
link |
2024-06-24 |
Towards Comprehensive Preference Data Collection for Reward Modeling |
Yulan Hu et.al. |
2406.16486 |
null |
2024-06-21 |
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation |
Xuan He et.al. |
2406.15252 |
null |
2024-06-21 |
Open Problem: Order Optimal Regret Bounds for Kernel-Based Reinforcement Learning |
Sattar Vakili et.al. |
2406.15250 |
null |
2024-06-21 |
Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting |
Jiyong Oh et.al. |
2406.15225 |
null |
2024-06-21 |
KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty |
Philipp Becker et.al. |
2406.15131 |
null |
2024-06-21 |
A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning |
Gianluca Drappo et.al. |
2406.15124 |
null |
2024-06-21 |
Towards General Negotiation Strategies with End-to-End Reinforcement Learning |
Bram M. Renting et.al. |
2406.15096 |
null |
2024-06-21 |
KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning |
Jiahan Chen et.al. |
2406.15073 |
null |
2024-06-21 |
Behaviour Distillation |
Andrei Lupu et.al. |
2406.15042 |
link |
2024-06-21 |
SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning |
Matthias Weissenbacher et.al. |
2406.15025 |
link |
2024-06-21 |
Evolution of Rewards for Food and Motor Action by Simulating Birth and Death |
Yuji Kanagawa et.al. |
2406.15016 |
null |
2024-06-20 |
CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics |
Jiawei Gao et.al. |
2406.14558 |
null |
2024-06-20 |
MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading |
Chuqiao Zong et.al. |
2406.14537 |
link |
2024-06-20 |
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold |
Amrith Setlur et.al. |
2406.14532 |
link |
2024-06-20 |
Learning telic-controllable state representations |
Nadav Amir et.al. |
2406.14476 |
null |
2024-06-20 |
Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue |
Huifang Du et.al. |
2406.14457 |
null |
2024-06-20 |
Revealing the learning process in reinforcement learning agents through attention-oriented metrics |
Charlotte Beylier et.al. |
2406.14324 |
null |
2024-06-20 |
Resource Optimization for Tail-Based Control in Wireless Networked Control Systems |
Rasika Vijithasena et.al. |
2406.14301 |
null |
2024-06-21 |
REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability |
Shuang Ao et.al. |
2406.14214 |
link |
2024-06-20 |
Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning |
Amit Sharma et.al. |
2406.14169 |
null |
2024-06-20 |
Tractable Equilibrium Computation in Markov Games through Risk Aversion |
Eric Mazumdar et.al. |
2406.14156 |
null |
2024-06-18 |
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts |
Haoxiang Wang et.al. |
2406.12845 |
link |
2024-06-18 |
Injection Optimization at Particle Accelerators via Reinforcement Learning: From Simulation to Real-World Application |
Awal Awal et.al. |
2406.12735 |
null |
2024-06-18 |
A Systematization of the Wagner Framework: Graph Theory Conjectures and Reinforcement Learning |
Flora Angileri et.al. |
2406.12667 |
null |
2024-06-18 |
Reinforcement-Learning based routing for packet-optical networks with hybrid telemetry |
A. L. García Navarro et.al. |
2406.12602 |
link |
2024-06-18 |
Discovering Minimal Reinforcement Learning Environments |
Jarek Liesen et.al. |
2406.12589 |
link |
2024-06-18 |
RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation |
Shuting Wang et.al. |
2406.12566 |
null |
2024-06-18 |
A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo |
Miguel Vasco et.al. |
2406.12563 |
null |
2024-06-18 |
Offline Imitation Learning with Model-based Reverse Augmentation |
Jie-Jing Shao et.al. |
2406.12550 |
null |
2024-06-18 |
Demonstrating Agile Flight from Pixels without State Estimation |
Ismail Geles et.al. |
2406.12505 |
null |
2024-06-18 |
Autonomous navigation of catheters and guidewires in mechanical thrombectomy using inverse reinforcement learning |
Harry Robertshaw et.al. |
2406.12499 |
null |
2024-06-17 |
WPO: Enhancing RLHF with Weighted Preference Optimization |
Wenxuan Zhou et.al. |
2406.11827 |
link |
2024-06-17 |
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics |
Runzhe Wu et.al. |
2406.11810 |
null |
2024-06-17 |
Run Time Assured Reinforcement Learning for Six Degree-of-Freedom Spacecraft Inspection |
Kyle Dunlap et.al. |
2406.11795 |
null |
2024-06-17 |
Optimal Transport-Assisted Risk-Sensitive Q-Learning |
Zahra Shahrooei et.al. |
2406.11774 |
null |
2024-06-17 |
Measuring memorization in RLHF for code completion |
Aneesh Pappu et.al. |
2406.11715 |
null |
2024-06-18 |
The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation |
Noah Golowich et.al. |
2406.11686 |
null |
2024-06-17 |
Communication-Efficient MARL for Platoon Stability and Energy-efficiency Co-optimization in Cooperative Adaptive Cruise Control of CAVs |
Min Hua et.al. |
2406.11653 |
null |
2024-06-18 |
Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions |
Noah Golowich et.al. |
2406.11640 |
null |
2024-06-17 |
Style Transfer with Multi-iteration Preference Optimization |
Shuai Liu et.al. |
2406.11581 |
null |
2024-06-17 |
Intersymbolic AI: Interlinking Symbolic AI and Subsymbolic AI |
André Platzer et.al. |
2406.11563 |
null |
2024-06-14 |
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs |
Rui Yang et.al. |
2406.10216 |
null |
2024-06-14 |
A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors |
Naaman Tan et.al. |
2406.10203 |
link |
2024-06-14 |
Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication |
Sanjali Yadav et.al. |
2406.10166 |
null |
2024-06-14 |
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models |
Carson Denison et.al. |
2406.10162 |
link |
2024-06-14 |
Bridging the Communication Gap: Artificial Agents Learning Sign Language through Imitation |
Federico Tavella et.al. |
2406.10043 |
null |
2024-06-14 |
ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR |
Vishwanath Pratap Singh et.al. |
2406.09999 |
null |
2024-06-14 |
Robust Model-Based Reinforcement Learning with an Adversarial Auxiliary Model |
Siemen Herremans et.al. |
2406.09976 |
link |
2024-06-14 |
InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning |
Tiancheng Li et.al. |
2406.09973 |
null |
2024-06-14 |
Finite-Time Analysis of Simultaneous Double Q-learning |
Hyunjun Na et.al. |
2406.09946 |
null |
2024-06-14 |
I Know How: Combining Prior Policies to Solve New Tasks |
Malio Li et.al. |
2406.09835 |
link |
2024-06-13 |
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms |
Miaosen Zhang et.al. |
2406.09397 |
null |
2024-06-13 |
Is Value Learning Really the Main Bottleneck in Offline RL? |
Seohong Park et.al. |
2406.09329 |
null |
2024-06-13 |
AutomaChef: A Physics-informed Demonstration-guided Learning Framework for Granular Material Manipulation |
Minglun Wei et.al. |
2406.09178 |
null |
2024-06-13 |
Adaptive Actor-Critic Based Optimal Regulation for Drift-Free Uncertain Nonlinear Systems |
Ashwin P. Dani et.al. |
2406.09097 |
null |
2024-06-13 |
DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning |
Xuemin Hu et.al. |
2406.09089 |
null |
2024-06-13 |
Data-driven modeling and supervisory control system optimization for plug-in hybrid electric vehicles |
Hao Zhang et.al. |
2406.09082 |
null |
2024-06-13 |
Latent Assistance Networks: Rediscovering Hyperbolic Tangents in RL |
Jacob E. Kooi et.al. |
2406.09079 |
null |
2024-06-13 |
Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation |
Claude Formanek et.al. |
2406.09068 |
null |
2024-06-13 |
CUER: Corrected Uniform Experience Replay for Off-Policy Continuous Deep Reinforcement Learning Algorithms |
Arda Sarp Yenicesu et.al. |
2406.09030 |
null |
2024-06-13 |
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning |
Alexander Nikulin et.al. |
2406.08973 |
null |
2024-06-12 |
RILe: Reinforced Imitation Learning |
Mert Albaba et.al. |
2406.08472 |
null |
2024-06-12 |
Adaptive Swarm Mesh Refinement using Deep Reinforcement Learning with Local Rewards |
Niklas Freymuth et.al. |
2406.08440 |
null |
2024-06-12 |
RRLS: Robust Reinforcement Learning Suite |
Adil Zouitine et.al. |
2406.08406 |
link |
2024-06-12 |
Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning |
Yuhui Wang et.al. |
2406.08404 |
null |
2024-06-12 |
Time-Constrained Robust MDPs |
Adil Zouitine et.al. |
2406.08395 |
null |
2024-06-12 |
Residual Learning and Context Encoding for Adaptive Offline-to-Online Reinforcement Learning |
Mohammadreza Nakhaei et.al. |
2406.08238 |
link |
2024-06-12 |
Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning |
Max Weltevrede et.al. |
2406.08069 |
null |
2024-06-12 |
Deep reinforcement learning with positional context for intraday trading |
Sven Goluža et.al. |
2406.08013 |
null |
2024-06-12 |
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning |
Yizhe Huang et.al. |
2406.08002 |
null |
2024-06-12 |
Semantic-Aware Resource Allocation Based on Deep Reinforcement Learning for 5G-V2X HetNets |
Zhiyu Shao et.al. |
2406.07996 |
link |
2024-06-11 |
CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning |
Zeyuan Liu et.al. |
2406.07541 |
null |
2024-06-11 |
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis |
Qining Zhang et.al. |
2406.07455 |
null |
2024-06-11 |
Enhanced Gene Selection in Single-Cell Genomics: Pre-Filtering Synergy and Reinforced Optimization |
Weiliang Zhang et.al. |
2406.07418 |
null |
2024-06-11 |
Federated Multi-Agent DRL for Radio Resource Management in Industrial 6G in-X subnetworks |
Bjarke Madsen et.al. |
2406.07383 |
null |
2024-06-11 |
World Models with Hints of Large Language Models for Goal Achieving |
Zeyuan Liu et.al. |
2406.07381 |
null |
2024-06-11 |
EdgeTimer: Adaptive Multi-Timescale Scheduling in Mobile Edge Computing with Deep Reinforcement Learning |
Yijun Hao et.al. |
2406.07342 |
null |
2024-06-11 |
Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling |
Constantin Waubert de Puiseau et.al. |
2406.07325 |
null |
2024-06-12 |
Multi-objective Reinforcement learning from AI Feedback |
Marcus Williams et.al. |
2406.07295 |
link |
2024-06-11 |
Hybrid Reinforcement Learning from Offline Observation Alone |
Yuda Song et.al. |
2406.07253 |
null |
2024-06-11 |
A generic and robust quantum agent inspired by deep meta-reinforcement learning |
Zibo Miao et.al. |
2406.07225 |
null |
2024-06-10 |
Verification-Guided Shielding for Deep Reinforcement Learning |
Davide Corsi et.al. |
2406.06507 |
null |
2024-06-10 |
Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation |
Mohidul Haque Mridul et.al. |
2406.06500 |
null |
2024-06-10 |
Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity |
Calarina Muslimani et.al. |
2406.06495 |
null |
2024-06-10 |
Towards Real-World Efficiency: Domain Randomization in Reinforcement Learning for Pre-Capture of Free-Floating Moving Targets by Autonomous Robots |
Bahador Beigomi et.al. |
2406.06460 |
link |
2024-06-10 |
Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? |
Denis Tarasov et.al. |
2406.06309 |
link |
2024-06-10 |
Learning-based cognitive architecture for enhancing coordination in human groups |
Antonio Grotta et.al. |
2406.06297 |
null |
2024-06-10 |
Deep Multi-Objective Reinforcement Learning for Utility-Based Infrastructural Maintenance Optimization |
Jesse van Remmerden et.al. |
2406.06184 |
null |
2024-06-10 |
Mastering truss structure optimization with tree search |
Gabriel E. Garayalde et.al. |
2406.06145 |
null |
2024-06-10 |
EXPIL: Explanatory Predicate Invention for Learning in Games |
Jingyuan Sha et.al. |
2406.06107 |
link |
2024-06-10 |
Sim-To-Real Transfer for Visual Reinforcement Learning of Deformable Object Manipulation for Robot-Assisted Surgery |
Paul Maria Scheikl et.al. |
2406.06092 |
null |
2024-06-07 |
LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration |
Tavor Lipman et.al. |
2406.05107 |
null |
2024-06-07 |
Massively Multiagent Minigames for Training Generalist Agents |
Kyoung Whan Choe et.al. |
2406.05071 |
link |
2024-06-07 |
Online Frequency Scheduling by Learning Parallel Actions |
Anastasios Giovanidis et.al. |
2406.05041 |
null |
2024-06-07 |
Optimizing Automatic Differentiation with Deep Reinforcement Learning |
Jamie Lohoff et.al. |
2406.05027 |
null |
2024-06-07 |
Designs for Enabling Collaboration in Human-Machine Teaming via Interactive and Explainable Systems |
Rohan Paleja et.al. |
2406.05003 |
null |
2024-06-07 |
SLOPE: Search with Learned Optimal Pruning-based Expansion |
Davor Bokan et.al. |
2406.04935 |
link |
2024-06-07 |
Sim-to-real Transfer of Deep Reinforcement Learning Agents for Online Coverage Path Planning |
Arvi Jonnarth et.al. |
2406.04920 |
null |
2024-06-07 |
Stabilizing Extreme Q-learning by Maclaurin Expansion |
Motoki Omura et.al. |
2406.04896 |
null |
2024-06-07 |
Primitive Agentic First-Order Optimization |
R. Sala et.al. |
2406.04841 |
null |
2024-06-07 |
Algorithms for learning value-aligned policies considering admissibility relaxation |
Andrés Holgado-Sánchez et.al. |
2406.04838 |
null |
2024-06-06 |
ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories |
Qianlan Yang et.al. |
2406.04323 |
null |
2024-06-06 |
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models |
Xiang Ji et.al. |
2406.04274 |
null |
2024-06-06 |
MARLander: A Local Path Planning for Drone Swarms using Multiagent Deep Reinforcement Learning |
Demetros Aschu et.al. |
2406.04159 |
null |
2024-06-06 |
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning |
Abdullah Akgül et.al. |
2406.04088 |
null |
2024-06-06 |
Bootstrapping Expectiles in Reinforcement Learning |
Pierre Clavier et.al. |
2406.04081 |
null |
2024-06-06 |
Spatio-temporal Early Prediction based on Multi-objective Reinforcement Learning |
Wei Shao et.al. |
2406.04035 |
link |
2024-06-06 |
Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents |
Yoann Poupart et.al. |
2406.04028 |
link |
2024-06-06 |
HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning |
Quentin Delfosse et.al. |
2406.03997 |
link |
2024-06-06 |
AC4MPC: Actor-Critic Reinforcement Learning for Nonlinear Model Predictive Control |
Rudolf Reiter et.al. |
2406.03995 |
null |
2024-06-06 |
Mini Honor of Kings: A Lightweight Environment for Multi-Agent Reinforcement Learning |
Lin Liu et.al. |
2406.03978 |
link |
2024-06-05 |
Automating Turkish Educational Quiz Generation Using Large Language Models |
Kamyar Zeinalipour et.al. |
2406.03397 |
link |
2024-06-05 |
LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback |
Timon Ziegenbein et.al. |
2406.03363 |
null |
2024-06-05 |
UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning |
Yu Zhang et.al. |
2406.03324 |
null |
2024-06-05 |
Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning |
Mohamed Elsayed et.al. |
2406.03276 |
link |
2024-06-05 |
Prompt-based Visual Alignment for Zero-shot Policy Transfer |
Haihan Gao et.al. |
2406.03250 |
null |
2024-06-05 |
Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning |
Inwoo Hwang et.al. |
2406.03234 |
link |
2024-06-05 |
CommonPower: Supercharging Machine Learning for Smart Grids |
Michael Eichelbeck et.al. |
2406.03231 |
link |
2024-06-05 |
Object Manipulation in Marine Environments using Reinforcement Learning |
Ahmed Nader et.al. |
2406.03223 |
null |
2024-06-05 |
Adaptive Distance Functions via Kelvin Transformation |
Rafael I. Cabral Muchacho et.al. |
2406.03200 |
null |
2024-06-05 |
DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays |
Bo Xia et.al. |
2406.03102 |
null |
2024-06-04 |
Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs |
Filippo Valdettaro et.al. |
2406.02456 |
link |
2024-06-04 |
A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies |
Md Mirajul Islam et.al. |
2406.02450 |
null |
2024-06-04 |
Algorithmic Collusion in Dynamic Pricing with Deep Reinforcement Learning |
Shidi Deng et.al. |
2406.02437 |
null |
2024-06-04 |
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models |
Philip Anastassiou et.al. |
2406.02430 |
link |
2024-06-04 |
Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning |
Jiaxu Wang et.al. |
2406.02370 |
null |
2024-06-04 |
How to Explore with Belief: State Entropy Maximization in POMDPs |
Riccardo Zamboni et.al. |
2406.02295 |
null |
2024-06-04 |
Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling |
Arthur Müller et.al. |
2406.02294 |
null |
2024-06-04 |
Test-Time Regret Minimization in Meta Reinforcement Learning |
Mirco Mutti et.al. |
2406.02282 |
null |
2024-06-04 |
Reinforcement Learning with Lookahead Information |
Nadav Merlis et.al. |
2406.02258 |
null |
2024-06-04 |
Quantum Computing in Wireless Communications and Networking: A Tutorial-cum-Survey |
Wei Zhao et.al. |
2406.02240 |
null |
2024-05-31 |
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF |
Tengyang Xie et.al. |
2405.21046 |
null |
2024-05-31 |
Direct Alignment of Language Models via Quality-Aware Self-Refinement |
Runsheng Yu et.al. |
2405.21040 |
null |
2024-06-03 |
Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles |
Jiesong Lian et.al. |
2405.21027 |
null |
2024-05-31 |
Generating Triangulations and Fibrations with Reinforcement Learning |
Per Berglund et.al. |
2405.21017 |
null |
2024-05-31 |
Bayesian Design Principles for Offline-to-Online Reinforcement Learning |
Hao Hu et.al. |
2405.20984 |
link |
2024-05-31 |
Goal-Oriented Sensor Reporting Scheduling for Non-linear Dynamic System Monitoring |
Prasoon Raghuwanshi et.al. |
2405.20983 |
null |
2024-05-31 |
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales |
Tianyang Xu et.al. |
2405.20974 |
link |
2024-05-31 |
Amortizing intractable inference in diffusion models for vision, language, and control |
Siddarth Venkatraman et.al. |
2405.20971 |
link |
2024-05-31 |
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation |
Shangding Gu et.al. |
2405.20860 |
null |
2024-05-31 |
Improving Reward Models with Synthetic Critiques |
Zihuiwen Ye et.al. |
2405.20850 |
null |
2024-05-30 |
Group Robust Preference Optimization in Reward-free RLHF |
Shyam Sundhar Ramesh et.al. |
2405.20304 |
link |
2024-05-30 |
Evaluating Large Language Model Biases in Persona-Steered Generation |
Andy Liu et.al. |
2405.20253 |
link |
2024-05-30 |
InstructionCP: A fast approach to transfer Large Language Models into target language |
Kuang-Ming Chen et.al. |
2405.20175 |
null |
2024-05-30 |
Enhancing Battlefield Awareness: An Aerial RIS-assisted ISAC System with Deep Reinforcement Learning |
Hyunsang Cho et.al. |
2405.20168 |
null |
2024-05-30 |
Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation |
Wooseong Cho et.al. |
2405.20165 |
null |
2024-05-31 |
NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models |
Kai Wu et.al. |
2405.20081 |
null |
2024-05-30 |
Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads |
Avelina Asada Hadji-Kyriacou et.al. |
2405.20053 |
link |
2024-05-30 |
Deep Reinforcement Learning for Intrusion Detection in IoT: A Survey |
Afrah Gueriani et.al. |
2405.20038 |
null |
2024-05-30 |
Safe Multi-agent Reinforcement Learning with Natural Language Constraints |
Ziyan Wang et.al. |
2405.20018 |
null |
2024-05-30 |
LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning |
Hyungho Na et.al. |
2405.19998 |
link |
2024-05-29 |
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment |
Shenao Zhang et.al. |
2405.19332 |
link |
2024-05-29 |
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF |
Shicong Cen et.al. |
2405.19320 |
null |
2024-05-29 |
Robust Preference Optimization through Reward Model Distillation |
Adam Fisch et.al. |
2405.19316 |
null |
2024-05-29 |
Rich-Observation Reinforcement Learning with Continuous Latent Dynamics |
Yuda Song et.al. |
2405.19269 |
null |
2024-05-29 |
Exploring the impact of traffic signal control and connected and automated vehicles on intersections safety: A deep reinforcement learning approach |
Amir Hossein Karbasi et.al. |
2405.19236 |
null |
2024-05-29 |
Diffusion-based Dynamics Models for Long-Horizon Rollout in Offline Reinforcement Learning |
Hanye Zhao et.al. |
2405.19189 |
link |
2024-05-29 |
A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning |
Arthur Juliani et.al. |
2405.19153 |
null |
2024-05-29 |
Learning Interpretable Scheduling Algorithms for Data Processing Clusters |
Zhibo Hu et.al. |
2405.19131 |
null |
2024-05-29 |
Offline Regularised Reinforcement Learning for Large Language Models Alignment |
Pierre Harvey Richemond et.al. |
2405.19107 |
null |
2024-05-29 |
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts |
Yu Luo et.al. |
2405.19080 |
link |
2024-05-28 |
Hierarchical World Models as Visual Whole-Body Humanoid Controllers |
Nicklas Hansen et.al. |
2405.18418 |
null |
2024-05-28 |
Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study |
Shreyas Bhat et.al. |
2405.18324 |
null |
2024-05-28 |
Highway Reinforcement Learning |
Yuhui Wang et.al. |
2405.18289 |
null |
2024-05-28 |
Extreme Value Monte Carlo Tree Search |
Masataro Asai et.al. |
2405.18248 |
null |
2024-05-28 |
Recurrent Natural Policy Gradient for POMDPs |
Semih Cayci et.al. |
2405.18221 |
null |
2024-05-28 |
Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving |
Zhi Zheng et.al. |
2405.18209 |
link |
2024-05-28 |
Mutation-Bias Learning in Games |
Johann Bauer et.al. |
2405.18190 |
null |
2024-05-28 |
Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding |
Daniel Bethell et.al. |
2405.18180 |
link |
2024-05-28 |
Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing |
Wei Zhao et.al. |
2405.18166 |
link |
2024-05-28 |
PyTAG: Tabletop Games for Multi-Agent Reinforcement Learning |
Martin Balla et.al. |
2405.18123 |
link |
2024-05-27 |
A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning |
Abdulaziz Almuzairee et.al. |
2405.17416 |
link |
2024-05-27 |
Rethinking Transformers in Solving POMDPs |
Chenhao Lu et.al. |
2405.17358 |
link |
2024-05-27 |
Opinion-Guided Reinforcement Learning |
Kyanna Dagenais et.al. |
2405.17287 |
null |
2024-05-27 |
DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems |
Zhi Zheng et.al. |
2405.17272 |
link |
2024-05-27 |
Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning |
Adriana Hugessen et.al. |
2405.17243 |
null |
2024-05-27 |
InsigHTable: Insight-driven Hierarchical Table Visualization with Reinforcement Learning |
Guozheng Li et.al. |
2405.17229 |
null |
2024-05-27 |
Learning Generic and Dynamic Locomotion of Humanoids Across Discrete Terrains |
Shangqun Yu et.al. |
2405.17227 |
null |
2024-05-27 |
Flow control of three-dimensional cylinders transitioning to turbulence via multi-agent reinforcement learning |
P. Suárez et.al. |
2405.17210 |
null |
2024-05-27 |
CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control |
Jingqing Ruan et.al. |
2405.17152 |
link |
2024-05-27 |
Q-value Regularized Transformer for Offline Reinforcement Learning |
Shengchao Hu et.al. |
2405.17098 |
null |
2024-05-24 |
Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment |
Hao Sun et.al. |
2405.15624 |
null |
2024-05-24 |
Neuromorphic dreaming: A pathway to efficient learning in artificial agents |
Ingo Blakowski et.al. |
2405.15616 |
link |
2024-05-24 |
OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code |
Maxence Faldor et.al. |
2405.15568 |
null |
2024-05-24 |
Learning Generalizable Human Motion Generator with Reinforcement Learning |
Yunyao Mao et.al. |
2405.15541 |
null |
2024-05-24 |
Randomized algorithms and PAC bounds for inverse reinforcement learning in continuous spaces |
Angeliki Kamoutsi et.al. |
2405.15509 |
link |
2024-05-24 |
Human-in-the-loop Reinforcement Learning for Data Quality Monitoring in Particle Physics Experiments |
Olivia Jullian Parra et.al. |
2405.15508 |
null |
2024-05-24 |
TD3 Based Collision Free Motion Planning for Robot Navigation |
Hao Liu et.al. |
2405.15460 |
null |
2024-05-24 |
Counterexample-Guided Repair of Reinforcement Learning Systems Using Safety Critics |
David Boetius et.al. |
2405.15430 |
null |
2024-05-24 |
Model-free reinforcement learning with noisy actions for automated experimental control in optics |
Lea Richtmann et.al. |
2405.15421 |
link |
2024-05-24 |
Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate |
Fan-Ming Luo et.al. |
2405.15384 |
link |
2024-05-23 |
Privileged Sensing Scaffolds Reinforcement Learning |
Edward S. Hu et.al. |
2405.14853 |
null |
2024-05-23 |
Axioms for AI Alignment from Human Feedback |
Luise Ge et.al. |
2405.14758 |
null |
2024-05-23 |
AGILE: A Novel Framework of LLM Agents |
Peiyuan Feng et.al. |
2405.14751 |
link |
2024-05-23 |
Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence |
Minheng Xiao et.al. |
2405.14749 |
null |
2024-05-23 |
SimPO: Simple Preference Optimization with a Reference-Free Reward |
Yu Meng et.al. |
2405.14734 |
link |
2024-05-23 |
Multi-turn Reinforcement Learning from Preference Human Feedback |
Lior Shani et.al. |
2405.14655 |
null |
2024-05-23 |
Reinforcement Learning for Fine-tuning Text-to-speech Diffusion Models |
Jingyi Chen et.al. |
2405.14632 |
null |
2024-05-23 |
Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences |
Takuya Hiraoka et.al. |
2405.14629 |
link |
2024-05-23 |
Closed-form Symbolic Solutions: A New Perspective on Solving Partial Differential Equations |
Shu Wei et.al. |
2405.14620 |
null |
2024-05-23 |
Discretization of continuous input spaces in the hippocampal autoencoder |
Adrian F. Amil et.al. |
2405.14600 |
link |
2024-05-21 |
Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale |
Shriram Chennakesavalu et.al. |
2405.12961 |
link |
2024-05-21 |
Effect of Synthetic Jets Actuator Parameters on Deep Reinforcement Learning-Based Flow Control Performance in a Square Cylinder |
Wang Jia et.al. |
2405.12834 |
null |
2024-05-22 |
Deep Reinforcement Learning for Time-Critical Wilderness Search And Rescue Using Drones |
Jan-Hendrik Ewers et.al. |
2405.12800 |
null |
2024-05-21 |
Generative AI and Large Language Models for Cyber Security: All Insights You Need |
Mohamed Amine Ferrag et.al. |
2405.12750 |
null |
2024-05-21 |
Reinforcement Learning Enabled Peer-to-Peer Energy Trading for Dairy Farms |
Mian Ibad Ali Shah et.al. |
2405.12716 |
null |
2024-05-21 |
A Multimodal Learning-based Approach for Autonomous Landing of UAV |
Francisco Neves et.al. |
2405.12681 |
null |
2024-05-21 |
Learning Causal Dynamics Models in Object-Oriented Environments |
Zhongwei Yu et.al. |
2405.12615 |
link |
2024-05-21 |
PhiBE: A PDE-based Bellman Equation for Continuous Time Policy Evaluation |
Yuhua Zhu et.al. |
2405.12535 |
null |
2024-05-21 |
GASE: Graph Attention Sampling with Edges Fusion for Solving Vehicle Routing Problems |
Zhenwei Wang et.al. |
2405.12475 |
null |
2024-05-21 |
Physics-based Scene Layout Generation from Human Motion |
Jianan Li et.al. |
2405.12460 |
null |
2024-05-20 |
Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? |
Yang Dai et.al. |
2405.12094 |
null |
2024-05-20 |
PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation |
Zhuobin Huang et.al. |
2405.12079 |
null |
2024-05-20 |
Scrutinize What We Ignore: Reining Task Representation Shift In Context-Based Offline Meta Reinforcement Learning |
Hai Zhang et.al. |
2405.12001 |
null |
2024-05-20 |
Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space |
Qianmei Liu et.al. |
2405.11982 |
null |
2024-05-20 |
A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers |
Tom Roth et.al. |
2405.11904 |
null |
2024-05-20 |
Intuitive Fine-Tuning: Towards Unifying SFT and RLHF into a Single Process |
Ermo Hua et.al. |
2405.11870 |
link |
2024-05-20 |
Reward-Punishment Reinforcement Learning with Maximum Entropy |
Jiexin Wang et.al. |
2405.11784 |
null |
2024-05-20 |
Efficient Multi-agent Reinforcement Learning by Planning |
Qihan Liu et.al. |
2405.11778 |
link |
2024-05-20 |
Learning Future Representation with Synthetic Observations for Sample-efficient Reinforcement Learning |
Xin Liu et.al. |
2405.11740 |
null |
2024-05-20 |
Highway Graph to Accelerate Reinforcement Learning |
Zidu Yin et.al. |
2405.11727 |
link |
2024-05-17 |
Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review |
Hongyi Yang et.al. |
2405.10883 |
null |
2024-05-17 |
Automated Radiology Report Generation: A Review of Recent Advances |
Phillip Sloan et.al. |
2405.10842 |
null |
2024-05-17 |
Combining Teacher-Student with Representation Learning: A Concurrent Teacher-Student Reinforcement Learning Paradigm for Legged Locomotion |
Hongxi Wang et.al. |
2405.10830 |
null |
2024-05-17 |
Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities |
Hao Zhou et.al. |
2405.10825 |
null |
2024-05-17 |
A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization |
Andrzej Ruszczyński et.al. |
2405.10815 |
null |
2024-05-17 |
SignLLM: Sign Languages Production Large Language Models |
Sen Fang et.al. |
2405.10718 |
null |
2024-05-17 |
Sample-Efficient Constrained Reinforcement Learning with General Parameterization |
Washim Uddin Mondal et.al. |
2405.10624 |
null |
2024-05-17 |
An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems |
Jiyue Tao et.al. |
2405.10576 |
null |
2024-05-17 |
Time-Varying Constraint-Aware Reinforcement Learning for Energy Storage Control |
Jaeik Jeong et.al. |
2405.10536 |
null |
2024-05-17 |
Towards Better Question Generation in QA-Based Event Extraction |
Zijin Hong et.al. |
2405.10517 |
link |
2024-05-16 |
Stochastic Q-learning for Large Discrete Action Spaces |
Fares Fourati et.al. |
2405.10310 |
null |
2024-05-17 |
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning |
Yuexiang Zhai et.al. |
2405.10292 |
null |
2024-05-16 |
Keep It Private: Unsupervised Privatization of Online Text |
Calvin Bao et.al. |
2405.10260 |
link |
2024-05-16 |
A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy |
Zhaoxing Li et.al. |
2405.10214 |
null |
2024-05-16 |
Continuous Transfer Learning for UAV Communication-aware Trajectory Design |
Chenrui Sun et.al. |
2405.10087 |
null |
2024-05-16 |
Optimizing Search and Rescue UAV Connectivity in Challenging Terrain through Multi Q-Learning |
Mohammed M. H. Qazzaz et.al. |
2405.10042 |
null |
2024-05-16 |
Reward Centering |
Abhishek Naik et.al. |
2405.09999 |
null |
2024-05-16 |
Combining RL and IL using a dynamic, performance-based modulation over learning signals and its application to local planning |
Francisco Leiva et.al. |
2405.09760 |
null |
2024-05-16 |
NIFTY Financial News Headlines Dataset |
Raeid Saqur et.al. |
2405.09747 |
null |
2024-05-15 |
Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning |
Sihan Zeng et.al. |
2405.09660 |
null |
2024-05-15 |
Reinforcement Learning-Based Framework for the Intelligent Adaptation of User Interfaces |
Daniel Gaspar-Figueiredo et.al. |
2405.09255 |
null |
2024-05-15 |
DVS-RG: Differential Variable Speed Limits Control using Deep Reinforcement Learning with Graph State Representation |
Jingwen Yang et.al. |
2405.09163 |
null |
2024-05-15 |
CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving |
Dechen Gao et.al. |
2405.09111 |
link |
2024-05-15 |
Chaos-based reinforcement learning with TD3 |
Toshitaka Matsuki et.al. |
2405.09086 |
null |
2024-05-15 |
Deep Learning in Earthquake Engineering: A Comprehensive Review |
Yazhou Xie et.al. |
2405.09021 |
null |
2024-05-14 |
Large Language Models for Human-Machine Collaborative Particle Accelerator Tuning through Natural Language |
Jan Kaiser et.al. |
2405.08888 |
null |
2024-05-14 |
Stable Inverse Reinforcement Learning: Policies from Control Lyapunov Landscapes |
Samuel Tesfazgi et.al. |
2405.08756 |
null |
2024-05-14 |
Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach |
Urvij Saroliya et.al. |
2405.08754 |
null |
2024-05-14 |
Reinformer: Max-Return Sequence Modeling for offline RL |
Zifeng Zhuang et.al. |
2405.08740 |
link |
2024-05-14 |
I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning |
Yashuai Yan et.al. |
2405.08726 |
null |
2024-05-15 |
Enhancing Reinforcement Learning in Sensor Fusion: A Comparative Analysis of Cubature and Sampling-based Integration Methods for Rover Search Planning |
Jan-Hendrik Ewers et.al. |
2405.08691 |
null |
2024-05-14 |
A Distributed Approach to Autonomous Intersection Management via Multi-Agent Reinforcement Learning |
Matteo Cederle et.al. |
2405.08655 |
link |
2024-05-14 |
vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement |
Yiwen Zhu et.al. |
2405.08638 |
null |
2024-05-14 |
Optimizing Deep Reinforcement Learning for American Put Option Hedging |
Reilly Pickard et.al. |
2405.08602 |
null |
2024-05-14 |
Python-Based Reinforcement Learning on Simulink Models |
Georg Schäfer et.al. |
2405.08567 |
null |
2024-05-14 |
Growing Artificial Neural Networks for Control: the Role of Neuronal Diversity |
Eleni Nisioti et.al. |
2405.08510 |
link |
2024-05-13 |
RLHF Workflow: From Reward Modeling to Online RLHF |
Hanze Dong et.al. |
2405.07863 |
link |
2024-05-13 |
Adaptive Exploration for Data-Efficient General Value Function Evaluations |
Arushi Jain et.al. |
2405.07838 |
link |
2024-05-13 |
Fixed Point Theory Analysis of a Lambda Policy Iteration with Randomization for the Ćirić Contraction Operator |
Abdelkader Belhenniche et.al. |
2405.07824 |
null |
2024-05-13 |
Hamiltonian-based Quantum Reinforcement Learning for Neural Combinatorial Optimization |
Georg Kruse et.al. |
2405.07790 |
null |
2024-05-13 |
Hype or Heuristic? Quantum Reinforcement Learning for Join Order Optimisation |
Maja Franz et.al. |
2405.07770 |
link |
2024-05-13 |
CAGES: Cost-Aware Gradient Entropy Search for Efficient Local Multi-Fidelity Bayesian Optimization |
Wei-Ting Tang et.al. |
2405.07760 |
null |
2024-05-13 |
MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction |
Haopeng Wang et.al. |
2405.07759 |
null |
2024-05-13 |
Neural Network Compression for Reinforcement Learning Tasks |
Dmitry A. Ivanov et.al. |
2405.07748 |
null |
2024-05-13 |
Backdoor Removal for Generative Large Language Models |
Haoran Li et.al. |
2405.07667 |
null |
2024-05-14 |
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback |
Asaf Cassel et.al. |
2405.07637 |
null |
2024-05-10 |
Value Augmented Sampling for Language Model Alignment and Personalization |
Seungwook Han et.al. |
2405.06639 |
link |
2024-05-10 |
EcoEdgeTwin: Enhanced 6G Network via Mobile Edge Computing and Digital Twin Integration |
Synthia Hossain Karobi et.al. |
2405.06507 |
null |
2024-05-10 |
Advantageous and disadvantageous inequality aversion can be taught through vicarious learning of others' preferences |
Shen Zhang et.al. |
2405.06500 |
null |
2024-05-10 |
Contextual Affordances for Safe Exploration in Robotic Scenarios |
William Z. Ye et.al. |
2405.06422 |
null |
2024-05-10 |
Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs |
Davide Maran et.al. |
2405.06363 |
null |
2024-05-10 |
Learning Latent Dynamic Robust Representations for World Models |
Ruixiang Sun et.al. |
2405.06263 |
link |
2024-05-10 |
Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning |
Xiaoyu Wen et.al. |
2405.06192 |
link |
2024-05-10 |
(A Partial Survey of) Decentralized, Cooperative Multi-Agent Reinforcement Learning |
Christopher Amato et.al. |
2405.06161 |
null |
2024-05-09 |
An RNN-policy gradient approach for quantum architecture search |
Gang Wang et.al. |
2405.05892 |
null |
2024-05-09 |
Safe Exploration Using Bayesian World Models and Log-Barrier Optimization |
Yarden As et.al. |
2405.05890 |
null |
2024-05-09 |
Policy Gradient with Active Importance Sampling |
Matteo Papini et.al. |
2405.05630 |
null |
2024-05-09 |
An Automatic Prompt Generation System for Tabular Data Tasks |
Ashlesha Akella et.al. |
2405.05618 |
null |
2024-05-09 |
Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning |
Yuchen Shi et.al. |
2405.05542 |
link |
2024-05-08 |
Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data |
Kishan Panaganti et.al. |
2405.05468 |
null |
2024-05-08 |
Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management |
Gang Hu et.al. |
2405.05449 |
null |
2024-05-08 |
Learning to Play Pursuit-Evasion with Dynamic and Sensor Constraints |
Burak M. Gonultas et.al. |
2405.05372 |
null |
2024-05-08 |
Offline Model-Based Optimization via Policy-Guided Gradient Search |
Yassine Chemingui et.al. |
2405.05349 |
link |
2024-05-08 |
Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models |
Aylin Gunal et.al. |
2405.05060 |
null |
2024-05-08 |
Fault Identification Enhancement with Reinforcement Learning (FIERL) |
Valentina Zaccaria et.al. |
2405.04938 |
link |
2024-05-07 |
RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes |
Kyle Stachowicz et.al. |
2405.04714 |
null |
2024-05-07 |
Proximal Policy Optimization with Adaptive Exploration |
Andrei Lixandru et.al. |
2405.04664 |
null |
2024-05-07 |
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery |
Albert Bou et.al. |
2405.04657 |
link |
2024-05-07 |
TorchDriveEnv: A Reinforcement Learning Benchmark for Autonomous Driving with Reactive, Realistic, and Diverse Non-Playable Characters |
Jonathan Wilder Lavington et.al. |
2405.04491 |
null |
2024-05-07 |
Designing, Developing, and Validating Network Intelligence for Scaling in Service-Based Architectures based on Deep Reinforcement Learning |
Paola Soto et.al. |
2405.04441 |
null |
2024-05-08 |
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model |
DeepSeek-AI et.al. |
2405.04434 |
link |
2024-05-07 |
The Curse of Diversity in Ensemble-Based Exploration |
Zhixuan Lin et.al. |
2405.04342 |
link |
2024-05-07 |
Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation |
Atharvan Dogra et.al. |
2405.04325 |
null |
2024-05-07 |
Genetic Drift Regularization: on preventing Actor Injection from breaking Evolution Strategies |
Paul Templier et.al. |
2405.04322 |
null |
2024-05-07 |
Improving Offline Reinforcement Learning with Inaccurate Simulators |
Yiwen Hou et.al. |
2405.04307 |
null |
2024-05-07 |
Deep Reinforcement Learning for Multi-User RF Charging with Non-linear Energy Harvesters |
Amirhossein Azarbahram et.al. |
2405.04218 |
null |
2024-05-07 |
In-context Learning for Automated Driving Scenarios |
Ziqi Zhou et.al. |
2405.04135 |
link |
2024-05-07 |
Logic-Skill Programming: An Optimization-based Approach to Sequential Skill Planning |
Teng Xue et.al. |
2405.04082 |
null |
2024-05-06 |
$ε$-Policy Gradient for Online Pricing |
Lukasz Szpruch et.al. |
2405.03624 |
null |
2024-05-06 |
Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions |
Xingyou Song et.al. |
2405.03547 |
null |
2024-05-06 |
ReinWiFi: A Reinforcement-Learning-Based Framework for the Application-Layer QoS Optimization of WiFi Networks |
Qianren Li et.al. |
2405.03526 |
link |
2024-05-06 |
Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning |
Stone Tao et.al. |
2405.03379 |
link |
2024-05-06 |
Enhancing Q-Learning with Large Language Model Heuristics |
Xiefeng Wu et.al. |
2405.03341 |
null |
2024-05-06 |
Artificial Intelligence in the Autonomous Navigation of Endovascular Interventions: A Systematic Review |
Harry Robertshaw et.al. |
2405.03305 |
null |
2024-05-06 |
End-to-End Reinforcement Learning of Curative Curtailment with Partial Measurement Availability |
Hinrikus Wolf et.al. |
2405.03262 |
null |
2024-05-06 |
Federated Reinforcement Learning with Constraint Heterogeneity |
Hao Jin et.al. |
2405.03236 |
null |
2024-05-06 |
Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning |
Caleb Chuck et.al. |
2405.03113 |
null |
2024-05-05 |
Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning |
Tianchen Zhou et.al. |
2405.03082 |
null |
2024-05-03 |
Geometric Fabrics: a Safe Guiding Medium for Policy Learning |
Karl Van Wyk et.al. |
2405.02250 |
null |
2024-05-03 |
Learning Optimal Deterministic Policies with Stochastic Policy Gradients |
Alessandro Montenegro et.al. |
2405.02235 |
null |
2024-05-03 |
The Cambridge RoboMaster: An Agile Multi-Robot Research Platform |
Jan Blumenkamp et.al. |
2405.02198 |
null |
2024-05-03 |
Simulating the economic impact of rationality through reinforcement learning and agent-based modelling |
Simone Brusatin et.al. |
2405.02161 |
link |
2024-05-03 |
Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach |
Anton Plaksin et.al. |
2405.02044 |
null |
2024-05-03 |
Model-based reinforcement learning for protein backbone design |
Frederic Renard et.al. |
2405.01983 |
null |
2024-05-03 |
Rescale-Invariant Federated Reinforcement Learning for Resource Allocation in V2X Networks |
Kaidi Xu et.al. |
2405.01961 |
null |
2024-05-03 |
Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization |
Changliang Zhou et.al. |
2405.01906 |
null |
2024-05-03 |
Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants |
Francesco Maldonato et.al. |
2405.01889 |
link |
2024-05-03 |
A Model-based Multi-Agent Personalized Short-Video Recommender System |
Peilun Zhou et.al. |
2405.01847 |
null |
2024-05-02 |
Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks |
Murtaza Dalal et.al. |
2405.01534 |
null |
2024-05-02 |
FLAME: Factuality-Aware Alignment for Large Language Models |
Sheng-Chieh Lin et.al. |
2405.01525 |
null |
2024-05-02 |
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment |
Gerald Shen et.al. |
2405.01481 |
link |
2024-05-02 |
Goal-conditioned reinforcement learning for ultrasound navigation guidance |
Abdoul Aziz Amadou et.al. |
2405.01409 |
null |
2024-05-02 |
Learning Force Control for Legged Manipulation |
Tifanny Portela et.al. |
2405.01402 |
null |
2024-05-03 |
Constrained Reinforcement Learning Under Model Mismatch |
Zhongchang Sun et.al. |
2405.01327 |
null |
2024-05-02 |
Non-iterative Optimization of Trajectory and Radio Resource for Aerial Network |
Hyeonsu Lyu et.al. |
2405.01314 |
null |
2024-05-02 |
Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning |
Liu Qiyuan et.al. |
2405.01284 |
null |
2024-05-02 |
Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation |
Hao Wang et.al. |
2405.01280 |
null |
2024-05-02 |
Towards Interpretable Reinforcement Learning with Constrained Normalizing Flow Policies |
Finn Rietz et.al. |
2405.01198 |
null |
2024-05-01 |
Self-Play Preference Optimization for Language Model Alignment |
Yue Wu et.al. |
2405.00675 |
link |
2024-05-01 |
No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO |
Skander Moalla et.al. |
2405.00662 |
link |
2024-05-01 |
HUGO -- Highlighting Unseen Grid Options: Combining Deep Reinforcement Learning with a Heuristic Target Topology Approach |
Malte Lehna et.al. |
2405.00629 |
null |
2024-05-01 |
Koopman-based Deep Learning for Nonlinear System Estimation |
Zexin Sun et.al. |
2405.00627 |
null |
2024-05-01 |
Queue-based Eco-Driving at Roundabouts with Reinforcement Learning |
Anna-Lena Schlamp et.al. |
2405.00625 |
null |
2024-05-01 |
The Real, the Better: Aligning Large Language Models with Online Human Behaviors |
Guanying Jiang et.al. |
2405.00578 |
null |
2024-05-01 |
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment |
Zhili Liu et.al. |
2405.00557 |
null |
2024-05-01 |
Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning |
Lucas-Andreï Thil et.al. |
2405.00516 |
null |
2024-05-01 |
MetaRM: Shifted Distributions Alignment via Meta-Learning |
Shihan Dou et.al. |
2405.00438 |
null |
2024-05-01 |
UCB-driven Utility Function Search for Multi-objective Reinforcement Learning |
Yucheng Shi et.al. |
2405.00410 |
link |
2024-04-30 |
Collaborative Control Method of Transit Signal Priority Based on Cooperative Game and Reinforcement Learning |
Hao Qin et.al. |
2404.19683 |
null |
2024-04-30 |
Towards Generalist Robot Learning from Internet Video: A Survey |
Robert McCarthy et.al. |
2404.19664 |
null |
2024-04-30 |
Short term vs. long term: optimization of microswimmer navigation on different time horizons |
Navid Mousavi et.al. |
2404.19561 |
null |
2024-04-30 |
Continual Model-based Reinforcement Learning for Data Efficient Wireless Network Optimisation |
Cengis Hasan et.al. |
2404.19462 |
null |
2024-04-30 |
Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning |
Mathieu Rita et.al. |
2404.19409 |
link |
2024-04-30 |
Numeric Reward Machines |
Kristina Levina et.al. |
2404.19370 |
null |
2024-04-30 |
Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning |
Chenjia Bai et.al. |
2404.19346 |
link |
2024-04-30 |
Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning |
Qiaosheng Zhang et.al. |
2404.19292 |
null |
2024-04-30 |
DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets |
Xiaoyu Huang et.al. |
2404.19264 |
null |
2024-04-30 |
Bias Mitigation via Compensation: A Reinforcement Learning Perspective |
Nandhini Swaminathan et.al. |
2404.19256 |
null |
2024-04-29 |
DPO Meets PPO: Reinforced Token Optimization for RLHF |
Han Zhong et.al. |
2404.18922 |
null |
2024-04-29 |
Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty |
Laixi Shi et.al. |
2404.18909 |
null |
2024-04-29 |
More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness |
Aaron J. Li et.al. |
2404.18870 |
link |
2024-04-29 |
Performance-Aligned LLMs for Generating Fast Code |
Daniel Nichols et.al. |
2404.18864 |
null |
2024-04-30 |
Winning the Social Media Influence Battle: Uncertainty-Aware Opinions to Understand and Spread True Information via Competitive Influence Maximization |
Qi Zhang et.al. |
2404.18826 |
null |
2024-04-30 |
Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies |
Seyed Soroush Karimi Madahi et.al. |
2404.18821 |
null |
2024-04-29 |
Multi-Agent Synchronization Tasks |
Rolando Fernandez et.al. |
2404.18798 |
null |
2024-04-29 |
Resource-rational reinforcement learning and sensorimotor causal states |
Sarah Marzen et.al. |
2404.18775 |
null |
2024-04-29 |
Self-training superconducting neuromorphic circuits using reinforcement learning rules |
M. L. Schneider et.al. |
2404.18774 |
null |
2024-04-29 |
Adaptive Reinforcement Learning for Robot Control |
Yu Tang Liu et.al. |
2404.18713 |
link |
2024-04-26 |
Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo |
Stephen Zhao et.al. |
2404.17546 |
link |
2024-04-26 |
Quantum Multi-Agent Reinforcement Learning for Aerial Ad-hoc Networks |
Theodora-Augustina Drăgan et.al. |
2404.17499 |
null |
2024-04-26 |
Q-Learning to navigate turbulence without a map |
Marco Rando et.al. |
2404.17495 |
null |
2024-04-26 |
Adaptive speed planning for Unmanned Vehicle Based on Deep Reinforcement Learning |
Hao Liu et.al. |
2404.17379 |
null |
2024-04-26 |
When to Trust LLMs: Aligning Confidence with Response Quality |
Shuchang Tao et.al. |
2404.17287 |
null |
2024-04-26 |
Enhancing Privacy and Security of Autonomous UAV Navigation |
Vatsal Aggarwal et.al. |
2404.17225 |
null |
2024-04-26 |
An Explainable Deep Reinforcement Learning Model for Warfarin Maintenance Dosing Using Policy Distillation and Action Forging |
Sadjad Anzabi Zadeh et.al. |
2404.17187 |
null |
2024-04-25 |
Compiler for Distributed Quantum Computing: a Reinforcement Learning Approach |
Panagiotis Promponas et.al. |
2404.17077 |
link |
2024-04-25 |
Deep Reinforcement Learning for Bipedal Locomotion: A Brief Survey |
Lingfan Bao et.al. |
2404.17070 |
null |
2024-04-25 |
Evaluating Collaborative Autonomy in Opposed Environments using Maritime Capture-the-Flag Competitions |
Jordan Beason et.al. |
2404.17038 |
null |
2024-04-25 |
REBEL: Reinforcement Learning via Regressing Relative Rewards |
Zhaolin Gao et.al. |
2404.16767 |
link |
2024-04-25 |
Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods |
Min Kyu Shin et.al. |
2404.16721 |
null |
2024-04-25 |
RUMOR: Reinforcement learning for Understanding a Model of the Real World for Navigation in Dynamic Environments |
Diego Martinez-Baselga et.al. |
2404.16672 |
null |
2024-04-25 |
Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare |
Emre Can Acikgoz et.al. |
2404.16621 |
link |
2024-04-25 |
Exploring the Dynamics of Data Transmission in 5G Networks: A Conceptual Analysis |
Nikita Smirnov et.al. |
2404.16508 |
null |
2024-04-25 |
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints |
Bram De Cooman et.al. |
2404.16468 |
null |
2024-04-25 |
Offline Reinforcement Learning with Behavioral Supervisor Tuning |
Padmanaba Srinivasan et.al. |
2404.16399 |
null |
2024-04-25 |
SwarmRL: Building the Future of Smart Active Systems |
Samuel Tovey et.al. |
2404.16388 |
link |
2024-04-25 |
Reinforcement Learning with Generative Models for Compact Support Sets |
Nico Schiavone et.al. |
2404.16300 |
link |
2024-04-24 |
ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling |
Arjun Somayazulu et.al. |
2404.16216 |
null |
2024-04-24 |
DPO: Differential reinforcement learning with application to optimal configuration search |
Chandrajit Bajaj et.al. |
2404.15617 |
null |
2024-04-24 |
GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL |
Lang Qin et.al. |
2404.15597 |
null |
2024-04-24 |
Multi-Agent Reinforcement Learning for Energy Networks: Computational Challenges, Progress and Open Problems |
Sarah Keren et.al. |
2404.15583 |
null |
2024-04-23 |
An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models |
Yangchen Pan et.al. |
2404.15518 |
null |
2024-04-23 |
The Power of Resets in Online Reinforcement Learning |
Zakaria Mhammedi et.al. |
2404.15417 |
null |
2024-04-23 |
Planning the path with Reinforcement Learning: Optimal Robot Motion Planning in RoboCup Small Size League Environments |
Mateus G. Machado et.al. |
2404.15410 |
link |
2024-04-23 |
Reinforcement Learning with Adaptive Control Regularization for Safe Control of Critical Systems |
Haozhe Tian et.al. |
2404.15199 |
null |
2024-04-23 |
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation |
Xun Wu et.al. |
2404.15100 |
null |
2024-04-23 |
Impedance Matching: Enabling an RL-Based Running Jump in a Quadruped Robot |
Neil Guan et.al. |
2404.15096 |
null |
2024-04-23 |
Using deep reinforcement learning to promote sustainable human behaviour on a common pool resource problem |
Raphael Koster et.al. |
2404.15059 |
null |
2024-04-23 |
Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems |
Xiaoshuang Chen et.al. |
2404.14961 |
null |
2024-04-23 |
Multi-Objective Deep Reinforcement Learning for 5G Base Station Placement to Support Localisation for Future Sustainable Traffic |
Ahmed Al-Tahmeesschi et.al. |
2404.14954 |
null |
2024-04-23 |
MultiSTOP: Solving Functional Equations with Reinforcement Learning |
Alessandro Trenta et.al. |
2404.14909 |
null |
2024-04-23 |
Unitary Synthesis of Clifford+T Circuits with Reinforcement Learning |
Sebastian Rietsch et.al. |
2404.14865 |
null |
2024-04-23 |
Evolutionary Reinforcement Learning via Cooperative Coevolution |
Chengpeng Hu et.al. |
2404.14763 |
null |
2024-04-23 |
Rank2Reward: Learning Shaped Reward Functions from Passive Video |
Daniel Yang et.al. |
2404.14735 |
null |
2024-04-23 |
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data |
Fahim Tajwar et.al. |
2404.14367 |
link |
2024-04-22 |
Multi-Agent Hybrid SAC for Joint SS-DSA in CRNs |
David R. Nickel et.al. |
2404.14319 |
null |
2024-04-22 |
Beyond the Edge: An Advanced Exploration of Reinforcement Learning for Mobile Edge Computing, its Applications, and Future Research Trajectories |
Ning Yang et.al. |
2404.14238 |
null |
2024-04-22 |
Multi-agent Reinforcement Learning-based Joint Precoding and Phase Shift Optimization for RIS-aided Cell-Free Massive MIMO Systems |
Yiyang Zhu et.al. |
2404.14092 |
null |
2024-04-22 |
Mechanistic Interpretability for AI Safety -- A Review |
Leonard Bereska et.al. |
2404.14082 |
null |
2024-04-22 |
Research on Robot Path Planning Based on Reinforcement Learning |
Wang Ruiqi et.al. |
2404.14077 |
link |
2024-04-22 |
Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras |
Mhairi Dunion et.al. |
2404.14064 |
link |
2024-04-22 |
A survey of air combat behavior modeling using machine learning |
Patrick Ribu Gorton et.al. |
2404.13954 |
null |
2024-04-22 |
Generating Attractive and Authentic Copywriting from Customer Reviews |
Yu-Xiang Lin et.al. |
2404.13906 |
null |
2024-04-22 |
Explicit Lipschitz Value Estimation Enhances Policy Robustness Against Perturbation |
Xulin Chen et.al. |
2404.13879 |
null |
2024-04-19 |
Mapping Social Choice Theory to RLHF |
Jessica Dai et.al. |
2404.13038 |
null |
2024-04-19 |
Deep Reinforcement Learning-Based Active Flow Control of an Elliptical Cylinder: Transitioning from an Elliptical Cylinder to a Circular Cylinder and a Flat Plate |
Wang Jia et.al. |
2404.13003 |
null |
2024-04-19 |
Goal Exploration via Adaptive Skill Distribution for Goal-Conditioned Reinforcement Learning |
Lisheng Wu et.al. |
2404.12999 |
null |
2024-04-19 |
MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering |
Avinash Anand et.al. |
2404.12926 |
null |
2024-04-19 |
Zero-Shot Stitching in Reinforcement Learning using Relative Representations |
Antonio Pio Ricciardi et.al. |
2404.12917 |
null |
2024-04-19 |
MAexp: A Generic Platform for RL-based Multi-Agent Exploration |
Shaohao Zhu et.al. |
2404.12824 |
link |
2024-04-19 |
Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation |
Qiang He et.al. |
2404.12754 |
link |
2024-04-19 |
Demonstration of quantum projective simulation on a single-photon-based quantum computer |
Giacomo Franceschetto et.al. |
2404.12729 |
null |
2024-04-19 |
Energy Conserved Failure Detection for NS-IoT Systems |
Guojin Liu et.al. |
2404.12713 |
null |
2024-04-19 |
Single-Task Continual Offline Reinforcement Learning |
Sibo Gai et.al. |
2404.12639 |
null |
2024-04-18 |
From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function |
Rafael Rafailov et.al. |
2404.12358 |
null |
2024-04-18 |
Improving the interpretability of GNN predictions through conformal-based graph sparsification |
Pablo Sanchez-Martin et.al. |
2404.12356 |
link |
2024-04-18 |
Practical Considerations for Discrete-Time Implementations of Continuous-Time Control Barrier Function-Based Safety Filters |
Lukas Brunke et.al. |
2404.12329 |
null |
2024-04-18 |
ASID: Active Exploration for System Identification in Robotic Manipulation |
Marius Memmel et.al. |
2404.12308 |
null |
2024-04-18 |
Privacy-Preserving UCB Decision Process Verification via zk-SNARKs |
Xikun Jiang et.al. |
2404.12186 |
null |
2024-04-18 |
Aligning language models with human preferences |
Tomasz Korbak et.al. |
2404.12150 |
link |
2024-04-19 |
Robust and Adaptive Deep Reinforcement Learning for Enhancing Flow Control around a Square Cylinder with Varying Reynolds Numbers |
Wang Jia et.al. |
2404.12123 |
null |
2024-04-18 |
X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner |
Haoyuan Jiang et.al. |
2404.12090 |
link |
2024-04-18 |
Trajectory Planning for Autonomous Vehicle Using Iterative Reward Prediction in Reinforcement Learning |
Hyunwoo Park et.al. |
2404.12079 |
null |
2024-04-18 |
Exploring the landscape of large language models: Foundations, techniques, and challenges |
Milad Moradi et.al. |
2404.11973 |
null |
2024-04-17 |
Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding |
Zezhong Fan et.al. |
2404.11589 |
null |
2024-04-17 |
Deep Policy Optimization with Temporal Logic Constraints |
Ameesh Shah et.al. |
2404.11578 |
null |
2024-04-17 |
VC Theory for Inventory Policies |
Yaqi Xie et.al. |
2404.11509 |
null |
2024-04-17 |
Learn to Tour: Operator Design For Solution Feasibility Mapping in Pickup-and-delivery Traveling Salesman Problem |
Bowen Fang et.al. |
2404.11458 |
null |
2024-04-17 |
What-if Analysis Framework for Digital Twins in 6G Wireless Network Management |
Elif Ak et.al. |
2404.11394 |
null |
2024-04-18 |
Convergence of Policy Gradient for Stochastic Linear-Quadratic Control Problem in Infinite Horizon |
Xinpei Zhang et.al. |
2404.11382 |
null |
2024-04-17 |
Following the Human Thread in Social Navigation |
Luca Scofano et.al. |
2404.11327 |
link |
2024-04-17 |
On Learning Parities with Dependent Noise |
Noah Golowich et.al. |
2404.11325 |
null |
2024-04-17 |
Physics-informed Actor-Critic for Coordination of Virtual Inertia from Power Distribution Systems |
Simon Stock et.al. |
2404.11149 |
null |
2024-04-17 |
Towards Multi-agent Reinforcement Learning based Traffic Signal Control through Spatio-temporal Hypergraphs |
Kang Wang et.al. |
2404.11014 |
null |
2024-04-16 |
Settling Constant Regrets in Linear Markov Decision Processes |
Weitong Zhang et.al. |
2404.10745 |
null |
2024-04-16 |
N-Agent Ad Hoc Teamwork |
Caroline Wang et.al. |
2404.10740 |
null |
2024-04-16 |
Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning |
Hao-Lun Hsu et.al. |
2404.10728 |
null |
2024-04-16 |
Automatic re-calibration of quantum devices by reinforcement learning |
T. Crosta et.al. |
2404.10726 |
null |
2024-04-16 |
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study |
Shusheng Xu et.al. |
2404.10719 |
null |
2024-04-16 |
Simplex Decomposition for Portfolio Allocation Constraints in Reinforcement Learning |
David Winkel et.al. |
2404.10683 |
null |
2024-04-16 |
SCALE: Self-Correcting Visual Navigation for Mobile Robots via Anti-Novelty Estimation |
Chang Chen et.al. |
2404.10675 |
null |
2024-04-16 |
Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay |
Jinmei Liu et.al. |
2404.10662 |
link |
2024-04-16 |
Trajectory Planning using Reinforcement Learning for Interactive Overtaking Maneuvers in Autonomous Racing Scenarios |
Levent Ögretmen et.al. |
2404.10658 |
null |
2024-04-16 |
Continuous Control Reinforcement Learning: Distributed Distributional DrQ Algorithms |
Zehao Zhou et.al. |
2404.10645 |
null |
2024-04-15 |
Effective Reinforcement Learning Based on Structural Information Principles |
Xianghua Zeng et.al. |
2404.09760 |
link |
2024-04-15 |
Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning |
Linjie Xu et.al. |
2404.09715 |
null |
2024-04-15 |
Learn Your Reference Model for Real Good Alignment |
Alexey Gorbatovski et.al. |
2404.09656 |
null |
2024-04-15 |
Reliability Estimation of News Media Sources: Birds of a Feather Flock Together |
Sergio Burdisso et.al. |
2404.09565 |
link |
2024-04-15 |
Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning |
Tidiane Camaret Ndir et.al. |
2404.09521 |
link |
2024-04-14 |
Egret: Reinforcement Mechanism for Sequential Computation Offloading in Edge Computing |
Haosong Peng et.al. |
2404.09285 |
null |
2024-04-14 |
A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs |
Elliot Kolker-Hicks et.al. |
2404.09264 |
null |
2024-04-14 |
Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts |
Jing-Cheng Pang et.al. |
2404.09248 |
null |
2024-04-14 |
Advanced Intelligent Optimization Algorithms for Multi-Objective Optimal Power Flow in Future Power Systems: A Review |
Yuyan Li et.al. |
2404.09203 |
null |
2024-04-14 |
On Joint Convergence of Traffic State and Weight Vector in Learning-Based Dynamic Routing with Value Function Approximation |
Yidan Wu et.al. |
2404.09188 |
null |
2024-04-14 |
Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation |
Ruixin Yang et.al. |
2404.09127 |
link |
2024-04-12 |
Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation |
Hanlin Tian et.al. |
2404.08570 |
link |
2024-04-12 |
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs |
Shreyas Chaudhari et.al. |
2404.08555 |
null |
2024-04-12 |
Advancing Forest Fire Prevention: Deep Reinforcement Learning for Effective Firebreak Placement |
Lucas Murray et.al. |
2404.08523 |
null |
2024-04-12 |
Prescribing Optimal Health-Aware Operation for Urban Air Mobility with Deep Reinforcement Learning |
Mina Montazeri et.al. |
2404.08497 |
null |
2024-04-12 |
Dataset Reset Policy Optimization for RLHF |
Jonathan D. Chang et.al. |
2404.08495 |
link |
2024-04-12 |
Anti-Byzantine Attacks Enabled Vehicle Selection for Asynchronous Federated Learning in Vehicular Edge Computing |
Cui Zhang et.al. |
2404.08444 |
null |
2024-04-12 |
SIR-RL: Reinforcement Learning for Optimized Policy Control during Epidemiological Outbreaks in Emerging Market and Developing Economies |
Maeghal Jain et.al. |
2404.08423 |
null |
2024-04-12 |
TDANet: Target-Directed Attention Network For Object-Goal Visual Navigation With Zero-Shot Ability |
Shiwei Lian et.al. |
2404.08353 |
null |
2024-04-12 |
Agile and versatile bipedal robot tracking control through reinforcement learning |
Jiayi Li et.al. |
2404.08246 |
null |
2024-04-12 |
RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning |
Hongqiao Lian et.al. |
2404.08242 |
null |
2024-04-11 |
High-Dimension Human Value Representation in Large Language Models |
Samuel Cahyawijaya et.al. |
2404.07900 |
link |
2024-04-11 |
Data-Driven System Identification of Quadrotors Subject to Motor Delays |
Jonas Eschmann et.al. |
2404.07837 |
null |
2024-04-11 |
On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning |
Giuseppe Canonaco et.al. |
2404.07826 |
null |
2024-04-11 |
An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization |
Minshuo Chen et.al. |
2404.07771 |
null |
2024-04-11 |
Differentially Private Reinforcement Learning with Self-Play |
Dan Qiao et.al. |
2404.07559 |
null |
2024-04-11 |
Enhancing Policy Gradient with the Polyak Step-Size Adaption |
Yunxiang Li et.al. |
2404.07525 |
null |
2024-04-11 |
Generative Probabilistic Planning for Optimizing Supply Chain Networks |
Hyung-il Ahn et.al. |
2404.07511 |
null |
2024-04-11 |
Neural Fault Injection: Generating Software Faults from Natural Language |
Domenico Cotroneo et.al. |
2404.07491 |
null |
2024-04-11 |
Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains |
Soichiro Nishimori et.al. |
2404.07465 |
null |
2024-04-11 |
UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning |
Saichao Liu et.al. |
2404.07453 |
null |
2024-04-10 |
Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery |
Zohre Karimi et.al. |
2404.07185 |
null |
2024-04-10 |
Adaptive behavior with stable synapses |
Cristiano Capone et.al. |
2404.07150 |
link |
2024-04-10 |
How Consistent are Clinicians? Evaluating the Predictability of Sepsis Disease Progression with Dynamics Models |
Unnseo Park et.al. |
2404.07148 |
link |
2024-04-10 |
Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection |
Linas Nasvytis et.al. |
2404.07099 |
link |
2024-04-10 |
Improving Language Model Reasoning with Self-motivated Learning |
Yunlong Feng et.al. |
2404.07017 |
null |
2024-04-10 |
Agent-driven Generative Semantic Communication for Remote Surveillance |
Wanting Yang et.al. |
2404.06997 |
null |
2024-04-10 |
Deep Reinforcement Learning for Mobile Robot Path Planning |
Hao Liu et.al. |
2404.06974 |
null |
2024-04-10 |
UAV-Assisted Enhanced Coverage and Capacity in Dynamic MU-mMIMO IoT Systems: A Deep Reinforcement Learning Approach |
MohammadMahdi Ghadaksaz et.al. |
2404.06726 |
null |
2024-04-10 |
Dual Ensemble Kalman Filter for Stochastic Optimal Control |
Anant A. Joshi et.al. |
2404.06696 |
null |
2024-04-09 |
Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective |
Victor-Alexandru Darvariu et.al. |
2404.06492 |
null |
2024-04-09 |
Deep Reinforcement Learning-Based Approach for a Single Vehicle Persistent Surveillance Problem with Fuel Constraints |
Hritik Bana et.al. |
2404.06423 |
null |
2024-04-09 |
The Power in Communication: Power Regularization of Communication for Autonomy in Cooperative Multi-Agent Reinforcement Learning |
Nancirose Piazza et.al. |
2404.06387 |
null |
2024-04-09 |
Policy-Guided Diffusion |
Matthew Thomas Jackson et.al. |
2404.06356 |
link |
2024-04-09 |
Generative Pre-Trained Transformer for Symbolic Regression Base In-Context Reinforcement Learning |
Yanjie Li et.al. |
2404.06330 |
null |
2024-04-09 |
Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning |
Xudong Yu et.al. |
2404.06188 |
null |
2024-04-09 |
A quantum information theoretic analysis of reinforcement learning-assisted quantum architecture search |
Abhishek Sadhu et.al. |
2404.06174 |
null |
2024-04-09 |
Adaptable Recovery Behaviors in Robotics: A Behavior Trees and Motion Generators (BTMG) Approach for Failure Management |
Faseeh Ahmad et.al. |
2404.06129 |
null |
2024-04-09 |
Automatic Configuration Tuning on Cloud Database: A Survey |
Limeng Zhang et.al. |
2404.06043 |
null |
2024-04-09 |
Commute with Community: Enhancing Shared Travel through Social Networks |
Tian Siyuan et.al. |
2404.05987 |
null |
2024-04-08 |
Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer |
Xinyang Gu et.al. |
2404.05695 |
null |
2024-04-08 |
YaART: Yet Another ART Rendering Technology |
Sergey Kastryulin et.al. |
2404.05666 |
null |
2024-04-08 |
Dynamic Backtracking in GFlowNet: Enhancing Decision Steps with Reward-Dependent Adjustment Mechanisms |
Shuai Guo et.al. |
2404.05576 |
null |
2024-04-08 |
Optimal Flow Admission Control in Edge Computing via Safe Reinforcement Learning |
A. Fox et.al. |
2404.05564 |
null |
2024-04-08 |
Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data |
Tim Baumgärtner et.al. |
2404.05530 |
null |
2024-04-08 |
CNN-based Game State Detection for a Foosball Table |
David Hagens et.al. |
2404.05357 |
null |
2024-04-08 |
Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models |
Yutao Ouyang et.al. |
2404.05291 |
null |
2024-04-08 |
MeSA-DRL: Memory-Enhanced Deep Reinforcement Learning for Advanced Socially Aware Robot Navigation in Crowded Environments |
Mannan Saeed Muhammad et.al. |
2404.05203 |
null |
2024-04-08 |
Decision Transformer for Wireless Communications: A New Paradigm of Resource Management |
Jie Zhang et.al. |
2404.05199 |
null |
2024-04-07 |
On the Uniqueness of Solution for the Bellman Equation of LTL Objectives |
Zetong Xuan et.al. |
2404.05074 |
null |
2024-04-05 |
Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution |
Tim Seyde et.al. |
2404.04253 |
null |
2024-04-05 |
Continual Policy Distillation of Reinforcement Learning-based Controllers for Soft Robotic In-Hand Manipulation |
Lanpei Li et.al. |
2404.04219 |
link |
2024-04-05 |
Enhancing IoT Intelligence: A Transformer-based Reinforcement Learning Methodology |
Gaith Rjoub et.al. |
2404.04205 |
null |
2024-04-05 |
Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report |
Jerrod Wigmore et.al. |
2404.04106 |
null |
2024-04-05 |
Dynamic Prompt Optimizing for Text-to-Image Generation |
Wenyi Mo et.al. |
2404.04095 |
link |
2024-04-05 |
Demonstration Guided Multi-Objective Reinforcement Learning |
Junlin Lu et.al. |
2404.03997 |
null |
2024-04-05 |
A proximal policy optimization based intelligent home solar management |
Kode Creer et.al. |
2404.03888 |
null |
2024-04-05 |
Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration |
Xudong Guo et.al. |
2404.03869 |
null |
2024-04-04 |
Exploration is Harder than Prediction: Cryptographically Separating Reinforcement Learning from Supervised Learning |
Noah Golowich et.al. |
2404.03774 |
null |
2024-04-04 |
A Reinforcement Learning based Reset Policy for CDCL SAT Solvers |
Chunxiao Li et.al. |
2404.03753 |
null |
2024-04-04 |
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent |
Hanyu Lai et.al. |
2404.03648 |
link |
2024-04-04 |
Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention |
Ziru Liu et.al. |
2404.03637 |
link |
2024-04-04 |
Laser Learning Environment: A new environment for coordination-critical multi-agent tasks |
Yannick Molinghen et.al. |
2404.03596 |
link |
2024-04-04 |
Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm |
Miao Lu et.al. |
2404.03578 |
null |
2024-04-04 |
AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale |
Adam Pardyl et.al. |
2404.03482 |
link |
2024-04-04 |
Integrating Hyperparameter Search into GramML |
Hernán Ceferino Vázquez et.al. |
2404.03419 |
link |
2024-04-04 |
Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought |
Jooyoung Lee et.al. |
2404.03414 |
null |
2024-04-04 |
Elementary Analysis of Policy Gradient Methods |
Jiacai Liu et.al. |
2404.03372 |
null |
2024-04-04 |
REACT: Revealing Evolutionary Action Consequence Trajectories for Interpretable Reinforcement Learning |
Philipp Altmann et.al. |
2404.03359 |
null |
2024-04-04 |
Scaling Population-Based Reinforcement Learning with GPU Accelerated Simulation |
Asad Ali Shahid et.al. |
2404.03336 |
null |
2024-04-03 |
Learning Quadrupedal Locomotion via Differentiable Simulation |
Clemens Schwarke et.al. |
2404.02887 |
null |
2024-04-03 |
Unsupervised Learning of Effective Actions in Robotics |
Marko Zaric et.al. |
2404.02728 |
link |
2024-04-03 |
Reinforcement Learning in Categorical Cybernetics |
Jules Hedges et.al. |
2404.02688 |
null |
2024-04-03 |
Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering |
Abhijeet Pendyala et.al. |
2404.02577 |
null |
2024-04-03 |
SliceIt! -- A Dual Simulator Framework for Learning Robot Food Slicing |
Cristian C. Beltran-Hernandez et.al. |
2404.02569 |
link |
2024-04-03 |
Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning |
Yi Shen et.al. |
2404.02545 |
link |
2024-04-03 |
Joint Optimization on Uplink OFDMA and MU-MIMO for IEEE 802.11ax: Deep Hierarchical Reinforcement Learning Approach |
Hyeonho Noh et.al. |
2404.02486 |
null |
2024-04-03 |
Deep Reinforcement Learning for Traveling Purchaser Problems |
Haofeng Yuan et.al. |
2404.02476 |
null |
2024-04-03 |
Electric Vehicle Routing Problem for Emergency Power Supply: Towards Telecom Base Station Relief |
Daisuke Kikuta et.al. |
2404.02448 |
link |
2024-04-03 |
AD4RL: Autonomous Driving Benchmarks for Offline Reinforcement Learning with Value-based Dataset |
Dongsu Lee et.al. |
2404.02429 |
null |
2024-04-02 |
Tuning for the Unknown: Revisiting Evaluation Strategies for Lifelong RL |
Golnaz Mesbahi et.al. |
2404.02113 |
null |
2024-04-02 |
Emergence of Chemotactic Strategies with Multi-Agent Reinforcement Learning |
Samuel Tovey et.al. |
2404.01999 |
null |
2024-04-02 |
VLRM: Vision-Language Models act as Reward Models for Image Captioning |
Maksim Dzabraev et.al. |
2404.01911 |
null |
2024-04-02 |
Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation |
Carlos Plou et.al. |
2404.01867 |
null |
2024-04-02 |
Keeping Behavioral Programs Alive: Specifying and Executing Liveness Requirements |
Tom Yaacov et.al. |
2404.01858 |
null |
2024-04-02 |
EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking |
Stavros Orfanoudakis et.al. |
2404.01849 |
link |
2024-04-02 |
Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy |
Kyungbok Lee et.al. |
2404.01830 |
null |
2024-04-02 |
Imitation Game: A Model-based and Imitation Learning Deep Reinforcement Learning Hybrid |
Eric MSP Veith et.al. |
2404.01794 |
null |
2024-04-02 |
Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems |
Dapeng Zhi et.al. |
2404.01769 |
null |
2024-04-02 |
Asymptotics of Language Model Alignment |
Joy Qiping Yang et.al. |
2404.01730 |
null |
2024-03-29 |
Learning Visual Quadrupedal Loco-Manipulation from Demonstrations |
Zhengmao He et.al. |
2403.20328 |
null |
2024-03-29 |
Active flow control of a turbulent separation bubble through deep reinforcement learning |
Bernat Font et.al. |
2403.20295 |
link |
2024-03-29 |
Functional Bilevel Optimization for Machine Learning |
Ieva Petrulionyte et.al. |
2403.20233 |
null |
2024-03-29 |
Decentralized Multimedia Data Sharing in IoV: A Learning-based Equilibrium of Supply and Demand |
Jiani Fan et.al. |
2403.20218 |
null |
2024-03-29 |
Biologically-Plausible Topology Improved Spiking Actor Network for Efficient Deep Reinforcement Learning |
Duzhen Zhang et.al. |
2403.20163 |
null |
2024-03-29 |
CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening |
Hei Yi Mak et.al. |
2403.20156 |
link |
2024-03-29 |
A Learning-based Incentive Mechanism for Mobile AIGC Service in Decentralized Internet of Vehicles |
Jiani Fan et.al. |
2403.20151 |
null |
2024-03-29 |
Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation |
Jinyeong Park et.al. |
2403.20109 |
link |
2024-03-29 |
Reinforcement learning for graph theory, II. Small Ramsey numbers |
Mohammad Ghebleh et.al. |
2403.20055 |
link |
2024-03-29 |
Nonparametric Bellman Mappings for Reinforcement Learning: Application to Robust Adaptive Filtering |
Yuki Akiyama et.al. |
2403.20020 |
null |
2024-03-28 |
Human-compatible driving partners through data-regularized self-play reinforcement learning |
Daphne Cornelisse et.al. |
2403.19648 |
link |
2024-03-28 |
Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment |
Alireza Ganjdanesh et.al. |
2403.19490 |
null |
2024-03-28 |
Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization |
Teodor V. Marinov et.al. |
2403.19462 |
null |
2024-03-28 |
EDA-Driven Preprocessing for SAT Solving |
Zhengyuan Shi et.al. |
2403.19446 |
null |
2024-03-28 |
Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model |
Qi Gou et.al. |
2403.19443 |
null |
2024-03-28 |
Fine-Tuning Language Models with Reward Learning on Policy |
Hao Lang et.al. |
2403.19279 |
link |
2024-03-28 |
Removing the need for ground truth UWB data collection: self-supervised ranging error correction using deep reinforcement learning |
Dieter Coppens et.al. |
2403.19262 |
null |
2024-03-28 |
Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement Learning |
Wei Duan et.al. |
2403.19253 |
link |
2024-03-28 |
Disentangling Length from Quality in Direct Preference Optimization |
Ryan Park et.al. |
2403.19159 |
null |
2024-03-27 |
GENESIS-RL: GEnerating Natural Edge-cases with Systematic Integration of Safety considerations and Reinforcement Learning |
Hsin-Jung Yang et.al. |
2403.19062 |
null |
2024-03-27 |
Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment |
Li Siyao et.al. |
2403.18811 |
null |
2024-03-27 |
CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning |
Elliot Chane-Sane et.al. |
2403.18765 |
null |
2024-03-27 |
Probabilistic Model Checking of Stochastic Reinforcement Learning Policies |
Dennis Gross et.al. |
2403.18725 |
null |
2024-03-28 |
FPGA-Based Neural Thrust Controller for UAVs |
Sharif Azem et.al. |
2403.18703 |
null |
2024-03-27 |
Safe and Robust Reinforcement-Learning: Principles and Practice |
Taku Yamagata et.al. |
2403.18539 |
null |
2024-03-27 |
Bridging the Gap: Regularized Reinforcement Learning for Improved Classical Motion Planning with Safety Modules |
Elias Goldsztejn et.al. |
2403.18524 |
null |
2024-03-27 |
VersaT2I: Improving Text-to-Image Models with Versatile Reward |
Jianshu Guo et.al. |
2403.18493 |
null |
2024-03-27 |
Scaling Vision-and-Language Navigation With Offline RL |
Valay Bundele et.al. |
2403.18454 |
null |
2024-03-27 |
FRESCO: Federated Reinforcement Energy System for Cooperative Optimization |
Nicolas Mauricio Cuadrado et.al. |
2403.18444 |
null |
2024-03-27 |
Reinforcement learning for graph theory, I. Reimplementation of Wagner's approach |
Salem Al-Yakoob et.al. |
2403.18429 |
link |
2024-03-26 |
TractOracle: towards an anatomically-informed reward function for RL-based tractography |
Antoine Théberge et.al. |
2403.17845 |
link |
2024-03-26 |
Learning the Optimal Power Flow: Environment Design Matters |
Thomas Wolgast et.al. |
2403.17831 |
link |
2024-03-26 |
Depending on yourself when you should: Mentoring LLM with RL agents to become the master in cybersecurity games |
Yikuan Yan et.al. |
2403.17674 |
null |
2024-03-26 |
Learning Goal-Directed Object Pushing in Cluttered Scenes with Location-Based Attention |
Nils Dengler et.al. |
2403.17667 |
null |
2024-03-26 |
Uncertainty-aware Distributional Offline Reinforcement Learning |
Xiaocong Chen et.al. |
2403.17646 |
null |
2024-03-26 |
PeersimGym: An Environment for Solving the Task Offloading Problem with Reinforcement Learning |
Frederico Metelo et.al. |
2403.17637 |
link |
2024-03-26 |
Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems |
Siyu Wang et.al. |
2403.17634 |
null |
2024-03-26 |
Towards a Zero-Data, Controllable, Adaptive Dialog System |
Dirk Väth et.al. |
2403.17582 |
null |
2024-03-26 |
VDSC: Enhancing Exploration Timing with Value Discrepancy and State Counts |
Marius Captari et.al. |
2403.17542 |
null |
2024-03-26 |
BVR Gym: A Reinforcement Learning Environment for Beyond-Visual-Range Air Combat |
Edvards Scukins et.al. |
2403.17533 |
link |
2024-03-25 |
An LLM-Based Digital Twin for Optimizing Human-in-the Loop Systems |
Hanqing Yang et.al. |
2403.16809 |
link |
2024-03-25 |
Enhancing Software Effort Estimation through Reinforcement Learning-based Project Management-Oriented Feature Selection |
Haoyang Chen et.al. |
2403.16749 |
null |
2024-03-25 |
Deep Reinforcement Learning and Mean-Variance Strategies for Responsible Portfolio Optimization |
Fernando Acero et.al. |
2403.16667 |
null |
2024-03-25 |
Skill Q-Network: Learning Adaptive Skill Ensemble for Mapless Navigation in Unknown Environments |
Hyunki Seong et.al. |
2403.16664 |
null |
2024-03-25 |
Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL |
Osama Ahmad et.al. |
2403.16652 |
null |
2024-03-26 |
CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment |
Feiteng Fang et.al. |
2403.16649 |
link |
2024-03-25 |
Arm-Constrained Curriculum Learning for Loco-Manipulation of the Wheel-Legged Robot |
Zifan Wang et.al. |
2403.16535 |
link |
2024-03-25 |
Towards Cooperative Maneuver Planning in Mixed Traffic at Urban Intersections |
Marvin Klimke et.al. |
2403.16478 |
null |
2024-03-25 |
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions |
Reza Esfandiarpoor et.al. |
2403.16442 |
link |
2024-03-25 |
Physics-informed RL for Maximal Safety Probability Estimation |
Hikaru Hoshino et.al. |
2403.16391 |
link |
2024-03-25 |
Learning Action-based Representations Using Invariance |
Max Rudolph et.al. |
2403.16369 |
null |
2024-03-24 |
Q-adaptive: A Multi-Agent Reinforcement Learning Based Routing on Dragonfly Network |
Yao Kang et.al. |
2403.16301 |
null |
2024-03-22 |
Can large language models explore in-context? |
Akshay Krishnamurthy et.al. |
2403.15371 |
null |
2024-03-22 |
Planning with a Learned Policy Basis to Optimally Solve Complex Tasks |
Guillermo Infante et.al. |
2403.15301 |
null |
2024-03-22 |
Blockchain-based Pseudonym Management for Vehicle Twin Migrations in Vehicular Edge Metaverse |
Jiawen Kang et.al. |
2403.15285 |
null |
2024-03-22 |
Parametric PDE Control with Deep Reinforcement Learning and Differentiable L0-Sparse Polynomial Policies |
Nicolò Botteghi et.al. |
2403.15267 |
null |
2024-03-22 |
Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement |
Jonathan Pirnay et.al. |
2403.15180 |
link |
2024-03-22 |
Subequivariant Reinforcement Learning Framework for Coordinated Motion Control |
Haoyu Wang et.al. |
2403.15100 |
null |
2024-03-22 |
Improved Long Short-Term Memory-based Wastewater Treatment Simulators for Deep Reinforcement Learning |
Esmaeel Mohammadi et.al. |
2403.15091 |
null |
2024-03-22 |
Automated Feature Selection for Inverse Reinforcement Learning |
Daulet Baimukashev et.al. |
2403.15079 |
null |
2024-03-22 |
Testing for Fault Diversity in Reinforcement Learning |
Quentin Mazouni et.al. |
2403.15065 |
link |
2024-03-22 |
Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation |
Zhenrui Yue et.al. |
2403.14952 |
null |
2024-03-21 |
Rethinking Adversarial Inverse Reinforcement Learning: From the Angles of Policy Imitation and Transferable Reward Recovery |
Yangchun Zhang et.al. |
2403.14593 |
link |
2024-03-21 |
A Mathematical Introduction to Deep Reinforcement Learning for 5G/6G Applications |
Farhad Rezazadeh et.al. |
2403.14516 |
null |
2024-03-21 |
Constrained Reinforcement Learning with Smoothed Log Barrier Function |
Baohe Zhang et.al. |
2403.14508 |
null |
2024-03-21 |
On the continuity and smoothness of the value function in reinforcement learning and optimal control |
Hans Harder et.al. |
2403.14432 |
null |
2024-03-21 |
Emergent communication and learning pressures in language models: a language evolution perspective |
Lukas Galke et.al. |
2403.14427 |
null |
2024-03-21 |
Task-optimal data-driven surrogate models for eNMPC via differentiable simulation and optimization |
Daniel Mayfrank et.al. |
2403.14425 |
null |
2024-03-21 |
A reinforcement learning guided hybrid evolutionary algorithm for the latency location routing problem |
Yuji Zou et.al. |
2403.14405 |
link |
2024-03-21 |
Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression |
Fernando Acero et.al. |
2403.14328 |
null |
2024-03-21 |
Reactor Optimization Benchmark by Reinforcement Learning |
Deborah Schwarcz et.al. |
2403.14273 |
link |
2024-03-21 |
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection |
Kyungjae Lee et.al. |
2403.14238 |
null |
2024-03-20 |
Towards Principled Representation Learning from Videos for Reinforcement Learning |
Dipendra Misra et.al. |
2403.13765 |
link |
2024-03-20 |
Reinforcement Learning for Online Testing of Autonomous Driving Systems: a Replication and Extension Study |
Luca Giamattei et.al. |
2403.13729 |
null |
2024-03-20 |
Reward-Driven Automated Curriculum Learning for Interaction-Aware Self-Driving at Unsignalized Intersections |
Zengqi Peng et.al. |
2403.13674 |
null |
2024-03-20 |
Multi-agent Reinforcement Traffic Signal Control based on Interpretable Influence Mechanism and Biased ReLU Approximation |
Zhiyue Luo et.al. |
2403.13639 |
null |
2024-03-20 |
Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation |
Do June Min et.al. |
2403.13578 |
link |
2024-03-20 |
GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot |
Wenxuan Song et.al. |
2403.13358 |
null |
2024-03-20 |
Waypoint-Based Reinforcement Learning for Robot Manipulation Tasks |
Shaunak A. Mehta et.al. |
2403.13281 |
link |
2024-03-20 |
Federated reinforcement learning for robot motion planning with zero-shot generalization |
Zhenyuan Yuan et.al. |
2403.13245 |
null |
2024-03-20 |
Graph Attention Network-based Block Propagation with Optimal AoI and Reputation in Web 3.0 |
Jiana Liao et.al. |
2403.13237 |
null |
2024-03-20 |
Safety-Aware Reinforcement Learning for Electric Vehicle Charging Station Management in Distribution Network |
Jiarong Fan et.al. |
2403.13236 |
null |
2024-03-19 |
Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes |
He Wang et.al. |
2403.12946 |
null |
2024-03-19 |
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning |
Fucai Ke et.al. |
2403.12884 |
null |
2024-03-19 |
Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning |
Mirco Theile et.al. |
2403.12856 |
null |
2024-03-20 |
Policy Bifurcation in Safe Reinforcement Learning |
Wenjun Zou et.al. |
2403.12847 |
link |
2024-03-19 |
Oriented and Non-oriented Cubical Surfaces in The Penteract |
Manuel Estevez et.al. |
2403.12825 |
null |
2024-03-19 |
Automated Contrastive Learning Strategy Search for Time Series |
Baoyu Jing et.al. |
2403.12641 |
null |
2024-03-19 |
FootstepNet: an Efficient Actor-Critic Method for Fast On-line Bipedal Footstep Planning and Forecasting |
Clément Gaspard et.al. |
2403.12589 |
null |
2024-03-19 |
INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations |
Lirui Luo et.al. |
2403.12451 |
link |
2024-03-19 |
Bin Packing Optimization via Deep Reinforcement Learning |
Baoying Wang et.al. |
2403.12420 |
null |
2024-03-19 |
Understanding Training-free Diffusion Guidance: Mechanisms and Limitations |
Yifei Shen et.al. |
2403.12404 |
link |
2024-03-18 |
The Value of Reward Lookahead in Reinforcement Learning |
Nadav Merlis et.al. |
2403.11637 |
null |
2024-03-18 |
Offline Multitask Representation Learning for Reinforcement Learning |
Haque Ishfaq et.al. |
2403.11574 |
null |
2024-03-18 |
Reinforcement Learning with Token-level Feedback for Controllable Text Generation |
Wendi Li et.al. |
2403.11558 |
link |
2024-03-18 |
TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling |
Weiran Chen et.al. |
2403.11550 |
null |
2024-03-18 |
State-Separated SARSA: A Practical Sequential Decision-Making Algorithm with Recovering Rewards |
Yuto Tanimoto et.al. |
2403.11520 |
link |
2024-03-18 |
Demystifying Deep Reinforcement Learning-Based Autonomous Vehicle Decision-Making |
Hanxi Wan et.al. |
2403.11432 |
null |
2024-03-18 |
Variational Sampling of Temporal Trajectories |
Jurijs Nazarovs et.al. |
2403.11418 |
null |
2024-03-17 |
Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective |
Muhammad Aneeq uz Zaman et.al. |
2403.11345 |
null |
2024-03-17 |
Causality from Bottom to Top: A Survey |
Abraham Itzhak Weinberg et.al. |
2403.11219 |
null |
2024-03-17 |
Continuous Jumping of a Parallel Wire-Driven Monopedal Robot RAMIEL Using Reinforcement Learning |
Kento Kawaharazuka et.al. |
2403.11205 |
null |
2024-03-15 |
HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation |
Carmelo Sferrazza et.al. |
2403.10506 |
null |
2024-03-15 |
Partially Observable Task and Motion Planning with Uncertainty and Risk Awareness |
Aidan Curtis et.al. |
2403.10454 |
null |
2024-03-15 |
Regret Minimization via Saddle Point Optimization |
Johannes Kirschner et.al. |
2403.10379 |
null |
2024-03-15 |
Cooperative Jamming for Physical Layer Security Enhancement Using Deep Reinforcement Learning |
Sayed Amir Hoseini et.al. |
2403.10342 |
null |
2024-03-15 |
Application of machine learning to experimental design in quantum mechanics |
Federico Belliardo et.al. |
2403.10317 |
null |
2024-03-15 |
Offline Goal-Conditioned Reinforcement Learning for Shape Control of Deformable Linear Objects |
Rita Laezza et.al. |
2403.10290 |
null |
2024-03-15 |
Grasp Anything: Combining Teacher-Augmented Policy Gradient Learning with Instance Segmentation to Grasp Arbitrary Objects |
Malte Mosbach et.al. |
2403.10187 |
null |
2024-03-15 |
Online Policy Learning from Offline Preferences |
Guoxi Zhang et.al. |
2403.10160 |
null |
2024-03-15 |
Belief Aided Navigation using Bayesian Reinforcement Learning for Avoiding Humans in Blind Spots |
Jinyeob Kim et.al. |
2403.10105 |
link |
2024-03-15 |
Intent-conditioned and Non-toxic Counterspeech Generation using Multi-Task Instruction Tuning with RLAIF |
Amey Hengle et.al. |
2403.10088 |
null |
2024-03-14 |
Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning |
Zhishuai Liu et.al. |
2403.09621 |
null |
2024-03-15 |
ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models |
Runyu Ma et.al. |
2403.09583 |
null |
2024-03-14 |
A Reinforcement Learning Approach to Dairy Farm Battery Management using Q Learning |
Nawazish Ali et.al. |
2403.09499 |
null |
2024-03-14 |
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision |
Zhiqing Sun et.al. |
2403.09472 |
link |
2024-03-14 |
A Deep Reinforcement Learning Approach for Autonomous Reconfigurable Intelligent Surfaces |
Hyuckjin Choi et.al. |
2403.09270 |
null |
2024-03-14 |
Leveraging Constraint Programming in a Deep Learning Approach for Dynamically Solving the Flexible Job-Shop Scheduling Problem |
Imanol Echeverria et.al. |
2403.09249 |
null |
2024-03-14 |
Rumor Mitigation in Social Media Platforms with Deep Reinforcement Learning |
Hongyuan Su et.al. |
2403.09217 |
link |
2024-03-14 |
MetroGNN: Metro Network Expansion with Reinforcement Learning |
Hongyuan Su et.al. |
2403.09197 |
link |
2024-03-14 |
SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning |
Nicholas Zolman et.al. |
2403.09110 |
link |
2024-03-14 |
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences |
Martin Weyssow et.al. |
2403.09032 |
link |
2024-03-13 |
TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning |
Shangding Gu et.al. |
2403.08694 |
null |
2024-03-13 |
Digital Twin-assisted Reinforcement Learning for Resource-aware Microservice Offloading in Edge Computing |
Xiangchun Chen et.al. |
2403.08687 |
null |
2024-03-13 |
Meta Reinforcement Learning for Resource Allocation in Aerial Active-RIS-assisted Networks with Rate-Splitting Multiple Access |
Sajad Faramarzi et.al. |
2403.08648 |
null |
2024-03-13 |
Human Alignment of Large Language Models through Online Preference Optimisation |
Daniele Calandriello et.al. |
2403.08635 |
null |
2024-03-13 |
Specification Overfitting in Artificial Intelligence |
Benjamin Roth et.al. |
2403.08425 |
null |
2024-03-13 |
Optimizing Risk-averse Human-AI Hybrid Teams |
Andrew Fuchs et.al. |
2403.08386 |
null |
2024-03-13 |
Learning to Describe for Predicting Zero-shot Drug-Drug Interactions |
Fangqi Zhu et.al. |
2403.08377 |
link |
2024-03-13 |
LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments |
Maonan Wang et.al. |
2403.08337 |
link |
2024-03-14 |
HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback |
Ang Li et.al. |
2403.08309 |
null |
2024-03-13 |
SpaceOctopus: An Octopus-inspired Motion Planning Framework for Multi-arm Space Robot |
Wenbo Zhao et.al. |
2403.08219 |
null |
2024-03-12 |
Exploring Safety Generalization Challenges of Large Language Models via Code |
Qibing Ren et.al. |
2403.07865 |
link |
2024-03-12 |
Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards |
Wei Shen et.al. |
2403.07708 |
null |
2024-03-12 |
Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning |
Motoki Omura et.al. |
2403.07704 |
null |
2024-03-12 |
Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation |
Michael Ogezi et.al. |
2403.07605 |
null |
2024-03-12 |
An Improved Strategy for Blood Glucose Control Using Multi-Step Deep Reinforcement Learning |
Weiwei Gu et.al. |
2403.07566 |
null |
2024-03-12 |
Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding |
Huijie Tang et.al. |
2403.07559 |
link |
2024-03-12 |
Constrained Optimal Fuel Consumption of HEV: A Constrained Reinforcement Learning Approach |
Shuchang Yan et.al. |
2403.07503 |
null |
2024-03-12 |
Optimization of Pressure Management Strategies for Geological CO2 Sequestration Using Surrogate Model-based Reinforcement Learning |
Jungang Chen et.al. |
2403.07360 |
link |
2024-03-12 |
Reinforced Sequential Decision-Making for Sepsis Treatment: The POSNEGDM Framework with Mortality Classifier and Transformer |
Dipesh Tamboli et.al. |
2403.07309 |
link |
2024-03-12 |
Advantage-Aware Policy Optimization for Offline Reinforcement Learning |
Yunpeng Qing et.al. |
2403.07262 |
null |
2024-03-11 |
Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts |
Onur Celik et.al. |
2403.06966 |
null |
2024-03-11 |
Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning |
Junseok Park et.al. |
2403.06880 |
null |
2024-03-11 |
Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification |
Joar Skalse et.al. |
2403.06854 |
null |
2024-03-11 |
In-context Exploration-Exploitation for Reinforcement Learning |
Zhenwen Dai et.al. |
2403.06826 |
null |
2024-03-11 |
ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment |
Hao-Lun Hsu et.al. |
2403.06814 |
null |
2024-03-11 |
From Factor Models to Deep Learning: Machine Learning in Reshaping Empirical Asset Pricing |
Junyi Ye et.al. |
2403.06779 |
null |
2024-03-11 |
ALaRM: Align Language Models via Hierarchical Rewards Modeling |
Yuhang Lai et.al. |
2403.06754 |
link |
2024-03-11 |
Generalising Multi-Agent Cooperation through Task-Agnostic Communication |
Dulhan Jayalath et.al. |
2403.06750 |
link |
2024-03-11 |
Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback |
Adarsh N L et.al. |
2403.06735 |
null |
2024-03-11 |
Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning |
Zijian Zhou et.al. |
2403.06728 |
null |
2024-03-08 |
Will GPT-4 Run DOOM? |
Adrian de Wynter et.al. |
2403.05468 |
null |
2024-03-08 |
Switching the Loss Reduces the Cost in Batch Reinforcement Learning |
Alex Ayoub et.al. |
2403.05385 |
null |
2024-03-08 |
Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation |
Xiaoying Zhang et.al. |
2403.05171 |
null |
2024-03-08 |
Inverse Design of Photonic Crystal Surface Emitting Lasers is a Sequence Modeling Problem |
Ceyao Zhang et.al. |
2403.05149 |
null |
2024-03-08 |
ChatUIE: Exploring Chat-based Unified Information Extraction using Large Language Models |
Jun Xu et.al. |
2403.05132 |
null |
2024-03-08 |
RLPeri: Accelerating Visual Perimetry Test with Reinforcement Learning and Convolutional Feature Extraction |
Tanvi Verma et.al. |
2403.05112 |
null |
2024-03-08 |
Simulating Battery-Powered TinyML Systems Optimised using Reinforcement Learning in Image-Based Anomaly Detection |
Jared M. Ping et.al. |
2403.05106 |
null |
2024-03-08 |
Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning |
Hongjoon Ahn et.al. |
2403.05066 |
null |
2024-03-08 |
Aligning Large Language Models for Controllable Recommendations |
Wensheng Lu et.al. |
2403.05063 |
null |
2024-03-08 |
Provable Multi-Party Reinforcement Learning with Diverse Human Feedback |
Huiying Zhong et.al. |
2403.05006 |
null |
2024-03-07 |
Teaching Large Language Models to Reason with Reinforcement Learning |
Alex Havrilla et.al. |
2403.04642 |
null |
2024-03-07 |
Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace |
Léopold Maytié et.al. |
2403.04588 |
null |
2024-03-07 |
Learning Agility Adaptation for Flight in Clutter |
Guangyu Zhao et.al. |
2403.04586 |
null |
2024-03-07 |
Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition |
Long-Fei Li et.al. |
2403.04568 |
null |
2024-03-07 |
Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation |
Fabian Otto et.al. |
2403.04453 |
null |
2024-03-07 |
Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation |
Tairan He et.al. |
2403.04436 |
null |
2024-03-07 |
iTRPL: An Intelligent and Trusted RPL Protocol based on Multi-Agent Reinforcement Learning |
Debasmita Dey et.al. |
2403.04416 |
null |
2024-03-07 |
Model-free $H_{\infty}$ control of Itô stochastic system via off-policy reinforcement learning |
Jing Guo et.al. |
2403.04412 |
null |
2024-03-07 |
Model-Free Load Frequency Control of Nonlinear Power Systems Based on Deep Reinforcement Learning |
Xiaodi Chen et.al. |
2403.04374 |
null |
2024-03-07 |
Symmetry Considerations for Learning Task Symmetric Robot Policies |
Mayank Mittal et.al. |
2403.04359 |
null |
2024-03-06 |
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL |
Jesse Farebrother et.al. |
2403.03950 |
null |
2024-03-06 |
Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation |
Marcel Torne et.al. |
2403.03949 |
null |
2024-03-06 |
Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning |
Zifan Xu et.al. |
2403.03848 |
null |
2024-03-06 |
A Survey on Applications of Reinforcement Learning in Spatial Resource Allocation |
Di Zhang et.al. |
2403.03643 |
null |
2024-03-06 |
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem |
Yuhong Sun et.al. |
2403.03558 |
link |
2024-03-06 |
Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning |
Zida Wu et.al. |
2403.03552 |
null |
2024-03-05 |
RACE-SM: Reinforcement Learning Based Autonomous Control for Social On-Ramp Merging |
Jordan Poots et.al. |
2403.03359 |
null |
2024-03-05 |
Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination |
Liangzhou Wang et.al. |
2403.03172 |
null |
2024-03-05 |
Leveraging Federated Learning and Edge Computing for Recommendation Systems within Cloud Computing Networks |
Yaqian Qi et.al. |
2403.03165 |
null |
2024-03-05 |
Language Guided Exploration for RL Agents in Text Environments |
Hitesh Golchha et.al. |
2403.03141 |
null |
2024-03-05 |
SplAgger: Split Aggregation for Meta-Reinforcement Learning |
Jacob Beck et.al. |
2403.03020 |
link |
2024-03-05 |
Autonomous vehicle decision and control through reinforcement learning with traffic flow randomization |
Yuan Lin et.al. |
2403.02882 |
null |
2024-03-05 |
SpaceHopper: A Small-Scale Legged Robot for Exploring Low-Gravity Celestial Bodies |
Alexander Spiridonov et.al. |
2403.02831 |
null |
2024-03-05 |
A Zero-Shot Reinforcement Learning Strategy for Autonomous Guidewire Navigation |
Valentina Scarponi et.al. |
2403.02777 |
null |
2024-03-05 |
Fighting Game Adaptive Background Music for Improved Gameplay |
Ibrahim Khan et.al. |
2403.02701 |
null |
2024-03-05 |
PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning |
Ke Zhang et.al. |
2403.02635 |
link |
2024-03-04 |
DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation |
Xueqing Wu et.al. |
2403.02528 |
link |
2024-03-02 |
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning |
Alexander Scarlatos et.al. |
2403.01304 |
link |
2024-03-02 |
Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey |
Hamza Kheddar et.al. |
2403.01255 |
null |
2024-03-02 |
Balancing Exploration and Exploitation in LLM using Soft RLLF for Enhanced Negation Understanding |
Ha-Thanh Nguyen et.al. |
2403.01185 |
null |
2024-03-02 |
Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning |
Hyungho Na et.al. |
2403.01112 |
link |
2024-03-02 |
Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL) |
Noah Ford et.al. |
2403.01059 |
null |
2024-03-01 |
A Holistic Power Optimization Approach for Microgrid Control Based on Deep Reinforcement Learning |
Fulong Yao et.al. |
2403.01013 |
link |
2024-03-01 |
Policy Optimization for PDE Control with a Warm Start |
Xiangyuan Zhang et.al. |
2403.01005 |
null |
2024-03-01 |
On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games |
Awni Altabaa et.al. |
2403.00993 |
null |
2024-03-01 |
SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation |
Noriaki Hirose et.al. |
2403.00991 |
null |
2024-03-01 |
Scale-free Adversarial Reinforcement Learning |
Mingyu Chen et.al. |
2403.00930 |
null |
2024-02-29 |
Curiosity-driven Red-teaming for Large Language Models |
Zhang-Wei Hong et.al. |
2402.19464 |
link |
2024-02-29 |
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL |
Yifei Zhou et.al. |
2402.19446 |
link |
2024-02-29 |
Understanding Iterative Combinatorial Auction Designs via Multi-Agent Reinforcement Learning |
Greg d'Eon et.al. |
2402.19420 |
null |
2024-02-29 |
RL-GPT: Integrating Reinforcement Learning and Code-as-policy |
Shaoteng Liu et.al. |
2402.19299 |
null |
2024-02-29 |
StiefelGen: A Simple, Model Agnostic Approach for Time Series Data Augmentation over Riemannian Manifolds |
Prasad Cheema et.al. |
2402.19287 |
null |
2024-02-29 |
Adaptive Testing Environment Generation for Connected and Automated Vehicles with Dense Reinforcement Learning |
Jingxuan Yang et.al. |
2402.19275 |
null |
2024-02-29 |
Deep Reinforcement Learning: A Convex Optimization Approach |
Ather Gattami et.al. |
2402.19212 |
null |
2024-02-29 |
ARMCHAIR: integrated inverse reinforcement learning and model predictive control for human-robot collaboration |
Angelo Caregnato-Neto et.al. |
2402.19128 |
null |
2024-02-29 |
Temporal-Aware Deep Reinforcement Learning for Energy Storage Bidding in Energy and Contingency Reserve Markets |
Jinhao Li et.al. |
2402.19110 |
null |
2024-02-29 |
How to Train your Antivirus: RL-based Hardening through the Problem-Space |
Jacopo Cortellazzi et.al. |
2402.19027 |
null |
2024-02-28 |
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards |
Haoxiang Wang et.al. |
2402.18571 |
link |
2024-02-28 |
Unifying F1TENTH Autonomous Racing: Survey, Methods and Benchmarks |
Benjamin David Evans et.al. |
2402.18558 |
link |
2024-02-28 |
Human-Centric Aware UAV Trajectory Planning in Search and Rescue Missions Employing Multi-Objective Reinforcement Learning with AHP and Similarity-Based Experience Replay |
Mahya Ramezani et.al. |
2402.18487 |
null |
2024-02-28 |
FinAgent: A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist |
Wentao Zhang et.al. |
2402.18485 |
null |
2024-02-28 |
Implementing Online Reinforcement Learning with Clustering Neural Networks |
James E. Smith et.al. |
2402.18472 |
null |
2024-02-28 |
Why Do Animals Need Shaping? A Theory of Task Composition and Curriculum Learning |
Jin Hwa Lee et.al. |
2402.18361 |
null |
2024-02-28 |
Solving Multi-Entity Robotic Problems Using Permutation Invariant Neural Networks |
Tianxu An et.al. |
2402.18345 |
null |
2024-02-28 |
Whole-body Humanoid Robot Locomotion with Human Reference |
Qiang Zhang et.al. |
2402.18294 |
null |
2024-02-28 |
Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of Pre-trained Language Models with Proximal Policy Optimization |
Shuo Yang et.al. |
2402.18284 |
null |
2024-02-28 |
Reinforcement Learning and Graph Neural Networks for Probabilistic Risk Assessment |
Joachim Grimstad et.al. |
2402.18246 |
null |
2024-02-27 |
Quantum Circuit Discovery for Fault-Tolerant Logical State Preparation with Reinforcement Learning |
Remmy Zen et.al. |
2402.17761 |
link |
2024-02-27 |
Learning to Program Variational Quantum Circuits with Fast Weights |
Samuel Yen-Chi Chen et.al. |
2402.17760 |
null |
2024-02-27 |
When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning |
Leon Lang et.al. |
2402.17747 |
null |
2024-02-27 |
reBandit: Random Effects based Online RL algorithm for Reducing Cannabis Use |
Susobhan Ghosh et.al. |
2402.17739 |
link |
2024-02-27 |
Model Free Deep Deterministic Policy Gradient Controller for Setpoint Tracking of Non-minimum Phase Systems |
Fatemeh Tavakkoli et.al. |
2402.17703 |
null |
2024-02-27 |
Multi-Agent Deep Reinforcement Learning for Distributed Satellite Routing |
Federico Lozano-Cuadra et.al. |
2402.17666 |
null |
2024-02-27 |
Emergency Caching: Coded Caching-based Reliable Map Transmission in Emergency Networks |
Zeyu Tian et.al. |
2402.17550 |
null |
2024-02-27 |
Intensive Care as One Big Sequence Modeling Problem |
Vadim Liventsev et.al. |
2402.17501 |
link |
2024-02-27 |
Reinforced In-Context Black-Box Optimization |
Lei Song et.al. |
2402.17423 |
link |
2024-02-27 |
Beacon, a lightweight deep reinforcement learning benchmark library for flow control |
Jonathan Viquerat et.al. |
2402.17402 |
link |
2024-02-26 |
Q-FOX Learning: Breaking Tradition in Reinforcement Learning |
Mahmood Alqaseer et.al. |
2402.16562 |
null |
2024-02-26 |
Model-based deep reinforcement learning for accelerated learning from flow simulations |
Andre Weiner et.al. |
2402.16543 |
link |
2024-02-26 |
Discovering Artificial Viscosity Models for Discontinuous Galerkin Approximation of Conservation Laws using Physics-Informed Machine Learning |
Matteo Caldana et.al. |
2402.16517 |
null |
2024-02-26 |
AI-enabled STAR-RIS aided MISO ISAC Secure Communications |
Zhengyu Zhu et.al. |
2402.16413 |
null |
2024-02-26 |
Feedback Efficient Online Fine-Tuning of Diffusion Models |
Masatoshi Uehara et.al. |
2402.16359 |
null |
2024-02-26 |
C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory |
Tianjiao Luo et.al. |
2402.16349 |
null |
2024-02-26 |
Achieving $\tilde{O}(1/\epsilon)$ Sample Complexity for Constrained Markov Decision Process |
Jiashuo Jiang et.al. |
2402.16324 |
null |
2024-02-26 |
Graph Diffusion Policy Optimization |
Yijing Liu et.al. |
2402.16302 |
link |
2024-02-25 |
How Can LLM Guide RL? A Value-Based Approach |
Shenao Zhang et.al. |
2402.16181 |
link |
2024-02-25 |
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction |
Xiao Chen et.al. |
2402.16174 |
null |
2024-02-25 |
Citation-Enhanced Generation for LLM-based Chatbot |
Weitao Li et.al. |
2402.16063 |
null |
2024-02-25 |
LLMs with Chain-of-Thought Are Non-Causal Reasoners |
Guangsheng Bao et.al. |
2402.16048 |
link |
2024-02-25 |
Harnessing the Synergy between Pushing, Grasping, and Throwing to Enhance Object Manipulation in Cluttered Scenarios |
Hamidreza Kasaei et.al. |
2402.16045 |
null |
2024-02-23 |
Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization |
Swaroop Nath et.al. |
2402.15473 |
link |
2024-02-23 |
PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning |
Simon Holk et.al. |
2402.15420 |
null |
2024-02-23 |
Distributionally Robust Off-Dynamics Reinforcement Learning: Provable Efficiency with Linear Function Approximation |
Zhishuai Liu et.al. |
2402.15399 |
link |
2024-02-23 |
Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms |
Filippo Lazzati et.al. |
2402.15392 |
null |
2024-02-23 |
Shapley Value Based Multi-Agent Reinforcement Learning: Theory, Method and Its Application to Energy Network |
Jianhong Wang et.al. |
2402.15324 |
null |
2024-02-23 |
When in Doubt, Think Slow: Iterative Reasoning with Latent Imagination |
Martin Benfeghoul et.al. |
2402.15283 |
null |
2024-02-23 |
Safety Optimized Reinforcement Learning via Multi-Objective Policy Optimization |
Homayoun Honari et.al. |
2402.15197 |
null |
2024-02-23 |
EasyRL4Rec: A User-Friendly Code Library for Reinforcement Learning Based Recommender Systems |
Yuanqing Yu et.al. |
2402.15164 |
link |
2024-02-23 |
Spatially-Aware Transformer Memory for Embodied Agents |
Junmo Cho et.al. |
2402.15160 |
link |
2024-02-23 |
Trajectory-wise Iterative Reinforcement Learning Framework for Auto-bidding |
Haoming Li et.al. |
2402.15102 |
null |
2024-02-22 |
Generalizing Reward Modeling for Out-of-Distribution Preference Learning |
Chen Jia et.al. |
2402.14760 |
link |
2024-02-22 |
SHM-Traffic: DRL and Transfer learning based UAV Control for Structural Health Monitoring of Bridges with Traffic |
Divija Swetha Gadiraju et.al. |
2402.14757 |
null |
2024-02-22 |
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs |
Arash Ahmadian et.al. |
2402.14740 |
null |
2024-02-22 |
Transformable Gaussian Reward Function for Socially-Aware Navigation with Deep Reinforcement Learning |
Jinyeob Kim et.al. |
2402.14569 |
link |
2024-02-22 |
MR-ARL: Model Reference Adaptive Reinforcement Learning for Robustly Stable On-Policy Data-Driven LQR |
Marco Borghesi et.al. |
2402.14483 |
null |
2024-02-22 |
Model-Based Reinforcement Learning Control of Reaction-Diffusion Problems |
Christina Schenk et.al. |
2402.14446 |
null |
2024-02-22 |
Quantum Circuit Optimization with AlphaTensor |
Francisco J. R. Ruiz et.al. |
2402.14396 |
null |
2024-02-22 |
Optimal Mechanism in a Dynamic Stochastic Knapsack Environment |
Jihyeok Jung et.al. |
2402.14269 |
null |
2024-02-22 |
MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint |
Xinglin Zhou et.al. |
2402.14244 |
null |
2024-02-22 |
Automated Design and Optimization of Distributed Filtering Circuits via Reinforcement Learning |
Peng Gao et.al. |
2402.14236 |
null |
2024-02-21 |
Generating Realistic Arm Movements in Reinforcement Learning: A Quantitative Comparison of Reward Terms and Task Requirements |
Jhon Charaja et.al. |
2402.13949 |
null |
2024-02-21 |
AttackGNN: Red-Teaming GNNs in Hardware Security Using Reinforcement Learning |
Vasudev Gohil et.al. |
2402.13946 |
null |
2024-02-21 |
Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning |
Antoine Chaffin et.al. |
2402.13936 |
link |
2024-02-21 |
Enhancing Reinforcement Learning Agents with Local Guides |
Paul Daoudi et.al. |
2402.13930 |
link |
2024-02-21 |
Dealing with unbounded gradients in stochastic saddle-point optimization |
Gergely Neu et.al. |
2402.13903 |
null |
2024-02-21 |
Synthesis of Hierarchical Controllers Based on Deep Reinforcement Learning Policies |
Florent Delgrange et.al. |
2402.13785 |
null |
2024-02-21 |
Weakly supervised localisation of prostate cancer using reinforcement learning for bi-parametric MR images |
Martynas Pocius et.al. |
2402.13778 |
null |
2024-02-21 |
Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions |
Jiayu Chen et.al. |
2402.13777 |
link |
2024-02-21 |
Reinforcement learning-assisted quantum architecture search for variational quantum algorithms |
Akash Kundu et.al. |
2402.13754 |
null |
2024-02-21 |
Privacy-Preserving Instructions for Aligning Large Language Models |
Da Yu et.al. |
2402.13659 |
link |
2024-02-20 |
Analyzing Operator States and the Impact of AI-Enhanced Decision Support in Control Rooms: A Human-in-the-Loop Specialized Reinforcement Learning Framework for Intervention Strategies |
Ammar N. Abbas et.al. |
2402.13219 |
link |
2024-02-20 |
Bayesian Reward Models for LLM Alignment |
Adam X. Yang et.al. |
2402.13210 |
null |
2024-02-20 |
SONATA: Self-adaptive Evolutionary Framework for Hardware-aware Neural Architecture Search |
Halima Bouzidi et.al. |
2402.13204 |
null |
2024-02-20 |
Tiny Reinforcement Learning for Quadruped Locomotion using Decision Transformers |
Orhan Eren Akgün et.al. |
2402.13201 |
link |
2024-02-20 |
Align Your Intents: Offline Imitation Learning via Optimal Transport |
Maksim Bobrin et.al. |
2402.13037 |
null |
2024-02-20 |
Multi-Level ML Based Burst-Aware Autoscaling for SLO Assurance and Cost Efficiency |
Chunyang Meng et.al. |
2402.12962 |
link |
2024-02-20 |
Discovering Behavioral Modes in Deep Reinforcement Learning Policies Using Trajectory Clustering in Latent Space |
Sindre Benjamin Remman et.al. |
2402.12939 |
null |
2024-02-20 |
Large Language Model-based Human-Agent Collaboration for Complex Task Solving |
Xueyang Feng et.al. |
2402.12914 |
link |
2024-02-20 |
Skill or Luck? Return Decomposition via Advantage Functions |
Hsiao-Ru Pan et.al. |
2402.12874 |
null |
2024-02-20 |
MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces |
Tianyu Zheng et.al. |
2402.12845 |
link |
2024-02-19 |
A Critical Evaluation of AI Feedback for Aligning Large Language Models |
Archit Sharma et.al. |
2402.12366 |
link |
2024-02-19 |
Refining Minimax Regret for Unsupervised Environment Design |
Michael Beukman et.al. |
2402.12284 |
link |
2024-02-19 |
CovRL: Fuzzing JavaScript Engines with Coverage-Guided Reinforcement Learning for LLM-based Mutation |
Jueon Eom et.al. |
2402.12222 |
null |
2024-02-19 |
Revisiting Data Augmentation in Deep Reinforcement Learning |
Jianshu Hu et.al. |
2402.12181 |
link |
2024-02-19 |
BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence |
Jiajie Jin et.al. |
2402.12174 |
null |
2024-02-19 |
Joint mode switching and resource allocation in wireless-powered RIS-aided multiuser communication systems |
Mingang Yuan et.al. |
2402.12143 |
null |
2024-02-19 |
Interpretable Brain-Inspired Representations Improve RL Performance on Visual Navigation Tasks |
Moritz Lange et.al. |
2402.12067 |
null |
2024-02-19 |
All Language Models Large and Small |
Zhixun Chen et.al. |
2402.12061 |
null |
2024-02-19 |
Reinforcement Learning for Optimal Execution when Liquidity is Time-Varying |
Andrea Macrì et.al. |
2402.12049 |
null |
2024-02-19 |
When Do Off-Policy and On-Policy Policy Gradient Methods Align? |
Davide Mambelli et.al. |
2402.12034 |
null |
2024-02-16 |
RLVF: Learning from Verbal Feedback without Overgeneralization |
Moritz Stephan et.al. |
2402.10893 |
link |
2024-02-16 |
Pedipulate: Enabling Manipulation Skills using a Quadruped Robot's Leg |
Philip Arm et.al. |
2402.10837 |
null |
2024-02-16 |
Goal-Conditioned Offline Reinforcement Learning via Metric Learning |
Alfredo Reichlin et.al. |
2402.10820 |
null |
2024-02-16 |
Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning |
Zihao Li et.al. |
2402.10810 |
null |
2024-02-16 |
Modelling crypto markets by multi-agent reinforcement learning |
Johann Lussange et.al. |
2402.10803 |
link |
2024-02-16 |
Policy Learning for Off-Dynamics RL with Deficient Support |
Linh Le Pham Van et.al. |
2402.10765 |
link |
2024-02-16 |
OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models |
Yuxuan Kuang et.al. |
2402.10670 |
link |
2024-02-16 |
Direct Preference Optimization with an Offset |
Afra Amini et.al. |
2402.10571 |
link |
2024-02-16 |
Discovery of an exchange-only gate sequence for CNOT with record-low gate time using reinforcement learning |
Violeta N. Ivanova-Rohling et.al. |
2402.10559 |
null |
2024-02-16 |
Provably Sample Efficient RLHF via Active Preference Optimization |
Nirjhar Das et.al. |
2402.10500 |
link |
2024-02-15 |
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation |
Huizhuo Yuan et.al. |
2402.10210 |
null |
2024-02-15 |
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment |
Rui Yang et.al. |
2402.10207 |
link |
2024-02-15 |
Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective |
Tianyi Qiu et.al. |
2402.10184 |
null |
2024-02-15 |
Large Scale Constrained Clustering With Reinforcement Learning |
Benedikt Schesch et.al. |
2402.10177 |
null |
2024-02-15 |
GraphCBAL: Class-Balanced Active Learning for Graph Neural Networks via Reinforcement Learning |
Chengcheng Yu et.al. |
2402.10074 |
null |
2024-02-15 |
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models |
Saeed Khaki et.al. |
2402.10038 |
null |
2024-02-15 |
Neural Network Approaches for Parameterized Optimal Control |
Deepanshu Verma et.al. |
2402.10033 |
null |
2024-02-15 |
Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts |
Tobias Enders et.al. |
2402.09992 |
link |
2024-02-15 |
Enhancing Courier Scheduling in Crowdsourced Last-Mile Delivery through Dynamic Shift Extensions: A Deep Reinforcement Learning Approach |
Zead Saleh et.al. |
2402.09961 |
null |
2024-02-15 |
Revisiting Recurrent Reinforcement Learning with Memory Monoids |
Steven Morad et.al. |
2402.09900 |
link |
2024-02-14 |
Reinforcement Learning from Human Feedback with Active Queries |
Kaixuan Ji et.al. |
2402.09401 |
null |
2024-02-14 |
LL-GABR: Energy Efficient Live Video Streaming Using Reinforcement Learning |
Adithya Raman et.al. |
2402.09392 |
null |
2024-02-14 |
Active Disruption Avoidance and Trajectory Design for Tokamak Ramp-downs with Neural Differential Equations and Reinforcement Learning |
Allen M. Wang et.al. |
2402.09387 |
null |
2024-02-14 |
Single-Reset Divide & Conquer Imitation Learning |
Alexandre Chenu et.al. |
2402.09355 |
null |
2024-02-14 |
Mitigating Reward Hacking via Information-Theoretic Reward Modeling |
Yuchun Miao et.al. |
2402.09345 |
null |
2024-02-14 |
Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning |
Michael Lanier et.al. |
2402.09290 |
null |
2024-02-14 |
Uncertainty-Aware Transient Stability-Constrained Preventive Redispatch: A Distributional Reinforcement Learning Approach |
Zhengcheng Wang et.al. |
2402.09263 |
null |
2024-02-14 |
Discovering Command and Control (C2) Channels on Tor and Public Networks Using Reinforcement Learning |
Cheng Wang et.al. |
2402.09200 |
null |
2024-02-14 |
Measuring Exploration in Reinforcement Learning via Optimal Transport in Policy Space |
Reabetswe M. Nkhumise et.al. |
2402.09113 |
null |
2024-02-14 |
Exploiting Estimation Bias in Deep Double Q-Learning for Actor-Critic Methods |
Alberto Sinigaglia et.al. |
2402.09078 |
null |
2024-02-13 |
Mixtures of Experts Unlock Parameter Scaling for Deep RL |
Johan Obando-Ceron et.al. |
2402.08609 |
link |
2024-02-13 |
A Distributional Analogue to the Successor Representation |
Harley Wiltzer et.al. |
2402.08530 |
link |
2024-02-13 |
Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea |
Hanna Krasowski et.al. |
2402.08502 |
null |
2024-02-13 |
Deep Reinforcement Learning for Controlled Traversing of the Attractor Landscape of Boolean Models in the Context of Cellular Reprogramming |
Andrzej Mizera et.al. |
2402.08491 |
null |
2024-02-13 |
Conservative and Risk-Aware Offline Multi-Agent Reinforcement Learning for Digital Twins |
Eslam Eldeeb et.al. |
2402.08421 |
null |
2024-02-13 |
Transition Constrained Bayesian Optimization via Markov Decision Processes |
Jose Pablo Folch et.al. |
2402.08406 |
null |
2024-02-13 |
MAVRL: Learn to Fly in Cluttered Environments with Varying Speed |
Hang Yu et.al. |
2402.08381 |
null |
2024-02-13 |
Reinforcement Learning for Docking Maneuvers with Prescribed Performance |
Simon Gottschalk et.al. |
2402.08306 |
null |
2024-02-13 |
Off-Policy Evaluation in Markov Decision Processes under Weak Distributional Overlap |
Mohammad Mehrabi et.al. |
2402.08201 |
null |
2024-02-13 |
Enabling Multi-Agent Transfer Reinforcement Learning via Scenario Independent Representation |
Ayesha Siddika Nipu et.al. |
2402.08184 |
null |
2024-02-12 |
MAIDCRL: Semi-centralized Multi-Agent Influence Dense-CNN Reinforcement Learning |
Ayesha Siddika Nipu et.al. |
2402.07890 |
null |
2024-02-12 |
Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States |
Noam Razin et.al. |
2402.07875 |
link |
2024-02-12 |
IR-Aware ECO Timing Optimization Using Reinforcement Learning |
Vidya A. Chhabria et.al. |
2402.07781 |
null |
2024-02-12 |
Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model |
Mark Rowland et.al. |
2402.07598 |
null |
2024-02-12 |
Rethinking Scaling Laws for Learning in Strategic Environments |
Tinashe Handina et.al. |
2402.07588 |
null |
2024-02-12 |
A Reinforcement Learning Approach to the Design of Quantum Chains for Optimal Energy Transfer |
S. Sgroi et.al. |
2402.07561 |
null |
2024-02-12 |
Reinforcement learning based demand charge minimization using energy storage |
Lucas Weber et.al. |
2402.07525 |
null |
2024-02-12 |
Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial |
Wenpin Tang et.al. |
2402.07487 |
null |
2024-02-12 |
Auxiliary Reward Generation with Transition Distance Representation Learning |
Siyuan Li et.al. |
2402.07412 |
null |
2024-02-12 |
Measurement Scheduling for ICU Patients with Offline Reinforcement Learning |
Zongliang Ji et.al. |
2402.07344 |
null |
2024-02-09 |
Predictive representations: building blocks of intelligence |
Wilka Carvalho et.al. |
2402.06590 |
null |
2024-02-09 |
Deceptive Path Planning via Reinforcement Learning with Graph Neural Networks |
Michael Y. Fatemi et.al. |
2402.06552 |
link |
2024-02-09 |
ACTER: Diverse and Actionable Counterfactual Sequences for Explaining and Diagnosing RL Policies |
Jasmina Gajcin et.al. |
2402.06503 |
null |
2024-02-09 |
Hierarchical Transformers are Efficient Meta-Reinforcement Learners |
Gresa Shala et.al. |
2402.06402 |
null |
2024-02-09 |
High-Precision Geosteering via Reinforcement Learning and Particle Filters |
Ressi Bonti Muhammad et.al. |
2402.06377 |
null |
2024-02-09 |
Dynamic Q-planning for Online UAV Path Planning in Unknown and Complex Environments |
Lidia Gianne Souza da Rocha et.al. |
2402.06297 |
null |
2024-02-09 |
Value function interference and greedy action selection in value-based multi-objective reinforcement learning |
Peter Vamplew et.al. |
2402.06266 |
null |
2024-02-09 |
Reinforcement Learning for Blind Stair Climbing with Legged and Wheeled-Legged Robots |
Simon Chamorro et.al. |
2402.06143 |
null |
2024-02-08 |
Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning |
Mohak Bhardwaj et.al. |
2402.06102 |
null |
2024-02-08 |
Scaling Artificial Intelligence for Digital Wargaming in Support of Decision-Making |
Scotty Black et.al. |
2402.06075 |
null |
2024-02-08 |
Risk-Sensitive Multi-Agent Reinforcement Learning in Network Aggregative Markov Games |
Hafez Ghaemi et.al. |
2402.05906 |
link |
2024-02-08 |
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices |
Jiin Woo et.al. |
2402.05876 |
null |
2024-02-08 |
Discovering Temporally-Aware Reinforcement Learning Algorithms |
Matthew Thomas Jackson et.al. |
2402.05828 |
link |
2024-02-08 |
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning |
Zhiheng Xi et.al. |
2402.05808 |
link |
2024-02-08 |
Analysing the Sample Complexity of Opponent Shaping |
Kitty Fung et.al. |
2402.05782 |
null |
2024-02-08 |
When is Mean-Field Reinforcement Learning Tractable and Relevant? |
Batuhan Yardim et.al. |
2402.05757 |
null |
2024-02-08 |
Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL |
Jiawei Huang et.al. |
2402.05724 |
link |
2024-02-08 |
Offline Risk-sensitive RL with Partial Observability to Enhance Performance in Human-Robot Teaming |
Giorgio Angelotti et.al. |
2402.05703 |
null |
2024-02-08 |
Improving Token-Based World Models with Parallel Observation Prediction |
Lior Cohen et.al. |
2402.05643 |
link |
2024-02-08 |
Optimizing Delegation in Collaborative Human-AI Hybrid Teams |
Andrew Fuchs et.al. |
2402.05605 |
null |
2024-02-07 |
Language-Based Augmentation to Address Shortcut Learning in Object Goal Navigation |
Dennis Hoftijzer et.al. |
2402.05090 |
link |
2024-02-07 |
Non-Markovian Quantum Control via Model Maximum Likelihood Estimation and Reinforcement Learning |
Tanmay Neema et.al. |
2402.05084 |
null |
2024-02-07 |
Extending the Reach of First-Order Algorithms for Nonconvex Min-Max Problems with Cohypomonotonicity |
Ahmet Alacaoglu et.al. |
2402.05071 |
null |
2024-02-07 |
Exploration Without Maps via Zero-Shot Out-of-Distribution Deep Reinforcement Learning |
Shathushan Sivashangaran et.al. |
2402.05066 |
null |
2024-02-07 |
Towards Generalizability of Multi-Agent Reinforcement Learning in Graphs with Recurrent Message Passing |
Jannis Weil et.al. |
2402.05027 |
link |
2024-02-07 |
Pedagogical Alignment of Large Language Models |
Shashank Sonkar et.al. |
2402.05000 |
null |
2024-02-07 |
A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health |
Biyonka Liang et.al. |
2402.04933 |
link |
2024-02-07 |
Deep Reinforcement Learning with Dynamic Graphs for Adaptive Informative Path Planning |
Apoorva Vashisth et.al. |
2402.04894 |
link |
2024-02-07 |
Leveraging knowledge-as-a-service (KaaS) for QoS-aware resource management in multi-user video transcoding |
Luis Costero et.al. |
2402.04891 |
null |
2024-02-07 |
Learning by Doing: An Online Causal Reinforcement Learning Framework with Causal-Aware Policy |
Ruichu Cai et.al. |
2402.04869 |
null |
2024-02-06 |
MusicRL: Aligning Music Generation to Human Preferences |
Geoffrey Cideron et.al. |
2402.04229 |
null |
2024-02-06 |
Reinforcement Learning with Ensemble Model Predictive Safety Certification |
Sven Gronauer et.al. |
2402.04182 |
null |
2024-02-06 |
Informed Reinforcement Learning for Situation-Aware Traffic Rule Exceptions |
Daniel Bogdoll et.al. |
2402.04168 |
link |
2024-02-06 |
Harnessing the Plug-and-Play Controller by Prompting |
Hao Wang et.al. |
2402.04160 |
null |
2024-02-06 |
Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning |
Ruoqi Zhang et.al. |
2402.04080 |
link |
2024-02-06 |
Collaborative Deep Reinforcement Learning for Resource Optimization in Non-Terrestrial Networks |
Yang Cao et.al. |
2402.04056 |
null |
2024-02-06 |
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR |
Liang-Hsuan Tseng et.al. |
2402.03988 |
link |
2024-02-06 |
Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent Deep Reinforcement Learning |
Maxime Toquebiau et.al. |
2402.03972 |
link |
2024-02-06 |
In-context learning agents are asymmetric belief updaters |
Johannes A. Schubert et.al. |
2402.03969 |
null |
2024-02-06 |
Reinforcement Learning for Collision-free Flight Exploiting Deep Collision Encoding |
Mihir Kulkarni et.al. |
2402.03947 |
null |
2024-02-05 |
Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models |
Anthony Sicilia et.al. |
2402.03284 |
null |
2024-02-05 |
A Framework for Partially Observed Reward-States in RLHF |
Chinmaya Kausik et.al. |
2402.03282 |
null |
2024-02-05 |
MobilityGPT: Enhanced Human Mobility Modeling with a GPT model |
Ammar Haydari et.al. |
2402.03264 |
null |
2024-02-05 |
Multi-agent Reinforcement Learning for Energy Saving in Multi-Cell Massive MIMO Systems |
Tianzhang Cai et.al. |
2402.03204 |
null |
2024-02-05 |
A Multi-step Loss Function for Robust Learning of the Dynamics in Model-based Reinforcement Learning |
Abdelhakim Benechehab et.al. |
2402.03146 |
null |
2024-02-05 |
Boosting Long-Delayed Reinforcement Learning with Auxiliary Short-Delayed Task |
Qingyuan Wu et.al. |
2402.03141 |
link |
2024-02-05 |
Just Cluster It: An Approach for Exploration in High-Dimensions using Clustering and Pre-Trained Representations |
Stefan Sylvius Wagner et.al. |
2402.03138 |
null |
2024-02-05 |
Learning to Abstract Visuomotor Mappings using Meta-Reinforcement Learning |
Carlos A. Velazquez-Vargas et.al. |
2402.03072 |
null |
2024-02-05 |
Probabilistic Actor-Critic: Learning to Explore with PAC-Bayes Uncertainty |
Bahareh Tasdighi et.al. |
2402.03055 |
null |
2024-02-05 |
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning |
Shengyi Huang et.al. |
2402.03046 |
null |
2024-02-02 |
Position Paper: Generalized grammar rules and structure-based generalization beyond classical equivariance for lexical tasks and transduction |
Mircea Petrache et.al. |
2402.01629 |
null |
2024-02-02 |
DRL-Based Dynamic Channel Access and SCLAR Maximization for Networks Under Jamming |
Abdul Basit et.al. |
2402.01574 |
null |
2024-02-02 |
A Hybrid Strategy for Chat Transcript Summarization |
Pratik K. Biswas et.al. |
2402.01510 |
null |
2024-02-02 |
Brain-Like Replay Naturally Emerges in Reinforcement Learning Agents |
Jiyi Wang et.al. |
2402.01467 |
null |
2024-02-02 |
A Reinforcement Learning-Boosted Motion Planning Framework: Comprehensive Generalization Performance in Autonomous Driving |
Rainer Trauth et.al. |
2402.01465 |
link |
2024-02-02 |
Learning the Market: Sentiment-Based Ensemble Trading Agents |
Andrew Ye et.al. |
2402.01441 |
null |
2024-02-02 |
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback |
Shihan Dou et.al. |
2402.01391 |
link |
2024-02-02 |
To the Max: Reinventing Reward in Reinforcement Learning |
Grigorii Veviurko et.al. |
2402.01361 |
null |
2024-02-02 |
Parametric-Task MAP-Elites |
Timothée Anne et.al. |
2402.01275 |
null |
2024-02-02 |
Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems |
Neharika Jali et.al. |
2402.01147 |
null |
2024-02-01 |
Towards Efficient and Exact Optimization of Language Model Alignment |
Haozhe Ji et.al. |
2402.00856 |
link |
2024-02-01 |
SLIM: Skill Learning with Multiple Critics |
David Emukpere et.al. |
2402.00823 |
null |
2024-02-01 |
Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments |
Alexander W. Goodall et.al. |
2402.00816 |
null |
2024-02-01 |
Distilling Conditional Diffusion Models for Offline Reinforcement Learning through Trajectory Stitching |
Shangzhe Li et.al. |
2402.00807 |
null |
2024-02-01 |
Learning and Calibrating Heterogeneous Bounded Rational Market Behaviour with Multi-Agent Reinforcement Learning |
Benjamin Patrick Evans et.al. |
2402.00787 |
null |
2024-02-01 |
Dense Reward for Free in Reinforcement Learning from Human Feedback |
Alex J. Chan et.al. |
2402.00782 |
link |
2024-02-01 |
Control-Theoretic Techniques for Online Adaptation of Deep Neural Networks in Dynamical Systems |
Jacob G. Elkins et.al. |
2402.00761 |
null |
2024-02-01 |
FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov Game |
Guangzheng Hu et.al. |
2402.00738 |
null |
2024-02-01 |
Neural Policy Style Transfer |
Raul Fernandez-Fernandez et.al. |
2402.00677 |
null |
2024-02-01 |
Deep Robot Sketching: An application of Deep Q-Learning Networks for human-like sketching |
Raul Fernandez-Fernandez et.al. |
2402.00676 |
null |
2024-01-31 |
Enhancing End-to-End Multi-Task Dialogue Systems: A Study on Intrinsic Motivation Reinforcement Learning Algorithms for Improved Training and Adaptability |
Navin Kamuni et.al. |
2401.18040 |
null |
2024-01-31 |
Causal Coordinated Concurrent Reinforcement Learning |
Tim Tse et.al. |
2401.18012 |
null |
2024-01-31 |
Circuit Partitioning for Multi-Core Quantum Architectures with Deep Reinforcement Learning |
Arnau Pastor et.al. |
2401.17976 |
null |
2024-01-31 |
Attention Graph for Multi-Robot Social Navigation with Deep Reinforcement Learning |
Erwan Escudie et.al. |
2401.17914 |
null |
2024-01-31 |
On Tractability, Complexity, and Mixed-Integer Convex Programming Representability of Distributionally Favorable Optimization |
Nan Jiang et.al. |
2401.17899 |
null |
2024-01-31 |
Graph Attention-based Reinforcement Learning for Trajectory Design and Resource Assignment in Multi-UAV Assisted Communication |
Zikai Feng et.al. |
2401.17880 |
null |
2024-01-31 |
Safe Reinforcement Learning-Based Eco-Driving Control for Mixed Traffic Flows With Disturbances |
Ke Lu et.al. |
2401.17837 |
null |
2024-01-31 |
A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees |
Toshinori Kitamura et.al. |
2401.17780 |
link |
2024-01-31 |
SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models |
Xiao Shao et.al. |
2401.17749 |
link |
2024-01-31 |
Learning to Stop Cut Generation for Efficient Mixed-Integer Linear Programming |
Haotian Ling et.al. |
2401.17527 |
null |
2024-01-30 |
Improving robustness of quantum feedback control with reinforcement learning |
Manuel Guatto et.al. |
2401.17190 |
link |
2024-01-30 |
Zero-Shot Reinforcement Learning via Function Encoders |
Tyler Ingebrand et.al. |
2401.17173 |
link |
2024-01-30 |
Learning Approximation Sets for Exploratory Queries |
Susan B. Davidson et.al. |
2401.17059 |
null |
2024-01-30 |
M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic Manipulation |
Fotios Lygerakis et.al. |
2401.17032 |
link |
2024-01-30 |
Re3val: Reinforced and Reranked Generative Retrieval |
EuiYul Song et.al. |
2401.16979 |
null |
2024-01-30 |
CORE: Towards Scalable and Efficient Causal Discovery with Reinforcement Learning |
Andreas W. M. Sauter et.al. |
2401.16974 |
link |
2024-01-30 |
Deep Contextual Bandit and Reinforcement Learning for IRS-Assisted MU-MIMO Systems |
Dariel Pereira-Ruisánchez et.al. |
2401.16901 |
null |
2024-01-30 |
Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control |
Zhongyu Li et.al. |
2401.16889 |
null |
2024-01-30 |
Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator |
Ryoma Furuyama et.al. |
2401.16772 |
null |
2024-01-30 |
Gradient-Based Language Model Red Teaming |
Nevan Wichers et.al. |
2401.16656 |
link |
2024-01-29 |
Curriculum-Based Reinforcement Learning for Quadrupedal Jumping: A Reference-free Design |
Vassil Atanassov et.al. |
2401.16337 |
link |
2024-01-29 |
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF |
Banghua Zhu et.al. |
2401.16335 |
null |
2024-01-29 |
Optimal Control of Renewable Energy Communities subject to Network Peak Fees with Model Predictive Control and Reinforcement Learning Algorithms |
Samy Aittahar et.al. |
2401.16321 |
null |
2024-01-29 |
Prepare Non-classical Collective Spin State by Reinforcement Learning |
X. L. Zhao et.al. |
2401.16320 |
null |
2024-01-29 |
Effective Communication with Dynamic Feature Compression |
Pietro Talli et.al. |
2401.16236 |
link |
2024-01-29 |
Scalable Reinforcement Learning for Linear-Quadratic Control of Networks |
Johan Olsson et.al. |
2401.16183 |
null |
2024-01-29 |
Future Impact Decomposition in Request-level Recommendations |
Xiaobei Wang et.al. |
2401.16108 |
link |
2024-01-29 |
Emergence of cooperation under punishment: A reinforcement learning perspective |
Chenyang Zhao et.al. |
2401.16073 |
null |
2024-01-29 |
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning |
Jianlan Luo et.al. |
2401.16013 |
null |
2024-01-29 |
A Deep Q-Network Based on Radial Basis Functions for Multi-Echelon Inventory Management |
Liqiang Cheng et.al. |
2401.15872 |
null |
2024-01-26 |
Fully Independent Communication in Multi-Agent Reinforcement Learning |
Rafael Pina et.al. |
2401.15059 |
link |
2024-01-26 |
Health Text Simplification: An Annotated Corpus for Digestive Cancer Education and Novel Strategies for Reinforcement Learning |
Md Mushfiqur Rahman et.al. |
2401.15043 |
null |
2024-01-26 |
Reinforcement Learning-based Relay Selection for Cooperative WSNs in the Presence of Bursty Impulsive Noise |
Hazem Barka et.al. |
2401.15008 |
null |
2024-01-26 |
Reinforcement Learning Interventions on Boundedly Rational Human Agents in Frictionful Tasks |
Eura Nofshin et.al. |
2401.14923 |
null |
2024-01-26 |
RESPRECT: Speeding-up Multi-fingered Grasping with Residual Reinforcement Learning |
Federico Ceola et.al. |
2401.14858 |
link |
2024-01-26 |
A Deep Reinforcement Learning-based Approach for Adaptive Handover Protocols in Mobile Networks |
Peter J. Gu et.al. |
2401.14823 |
link |
2024-01-26 |
On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks |
Joar Skalse et.al. |
2401.14811 |
null |
2024-01-26 |
Off-Policy Primal-Dual Safe Reinforcement Learning |
Zifan Wu et.al. |
2401.14758 |
link |
2024-01-26 |
FairSample: Training Fair and Accurate Graph Convolutional Neural Networks Efficiently |
Zicun Cong et.al. |
2401.14702 |
null |
2024-01-25 |
GCBF+: A Neural Graph Control Barrier Function Framework for Distributed Safe Multi-Agent Control |
Songyuan Zhang et.al. |
2401.14554 |
null |
2024-01-25 |
Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks |
Shuai Han et.al. |
2401.14226 |
null |
2024-01-25 |
True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning |
Weihao Tan et.al. |
2401.14151 |
link |
2024-01-25 |
Concept: Dynamic Risk Assessment for AI-Controlled Robotic Systems |
Philipp Grimmeisen et.al. |
2401.14147 |
null |
2024-01-25 |
Towards a Systems Theory of Algorithms |
Florian Dörfler et.al. |
2401.14029 |
null |
2024-01-25 |
Leeroo Orchestrator: Elevating LLMs Performance Through Model Integration |
Alireza Mohammadshahi et.al. |
2401.13979 |
link |
2024-01-25 |
Networked Multiagent Reinforcement Learning for Peer-to-Peer Energy Trading |
Chen Feng et.al. |
2401.13947 |
null |
2024-01-25 |
Learning-based sensing and computing decision for data freshness in edge computing-enabled networks |
Sinwoong Yun et.al. |
2401.13936 |
null |
2024-01-25 |
Reinforcement Learning with Hidden Markov Models for Discovering Decision-Making Dynamics |
Xingche Guo et.al. |
2401.13929 |
null |
2024-01-25 |
Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation |
Yixuan Zhang et.al. |
2401.13884 |
null |
2024-01-24 |
Machine learning for industrial sensing and control: A survey and practical perspective |
Nathan P. Lawrence et.al. |
2401.13836 |
null |
2024-01-24 |
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations |
Matthias Lehmann et.al. |
2401.13662 |
link |
2024-01-24 |
Emergence of anti-coordinated patterns in snowdrift game by reinforcement learning |
Zhen-Wei Ding et.al. |
2401.13497 |
null |
2024-01-24 |
Multi-Agent Diagnostics for Robustness via Illuminated Diversity |
Mikayel Samvelyan et.al. |
2401.13460 |
null |
2024-01-24 |
Symbolic Equation Solving via Reinforcement Learning |
Lennart Dabelow et.al. |
2401.13447 |
null |
2024-01-24 |
TraKDis: A Transformer-based Knowledge Distillation Approach for Visual Reinforcement Learning with Application to Cloth Manipulation |
Wei Chen et.al. |
2401.13362 |
null |
2024-01-24 |
SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning |
Guoxin Chen et.al. |
2401.13246 |
link |
2024-01-24 |
DittoGym: Learning to Control Soft Shape-Shifting Robots |
Suning Huang et.al. |
2401.13231 |
link |
2024-01-23 |
NLBAC: A Neural Ordinary Differential Equations-based Framework for Stable and Safe Reinforcement Learning |
Liqun Zhao et.al. |
2401.13148 |
link |
2024-01-23 |
The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts |
Lingfeng Shen et.al. |
2401.13136 |
null |
2024-01-23 |
Generalization of Heterogeneous Multi-Robot Policies via Awareness and Communication of Capabilities |
Pierce Howell et.al. |
2401.13127 |
null |
2024-01-23 |
HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments |
Qinhong Zhou et.al. |
2401.12975 |
link |
2024-01-23 |
Reward-Relevance-Filtered Linear Offline Reinforcement Learning |
Angela Zhou et.al. |
2401.12934 |
null |
2024-01-23 |
Active Inference as a Model of Agency |
Lancelot Da Costa et.al. |
2401.12917 |
null |
2024-01-23 |
Emergent Communication Protocol Learning for Task Offloading in Industrial Internet of Things |
Salwa Mostafa et.al. |
2401.12914 |
null |
2024-01-23 |
Model-Free $δ$-Policy Iteration Based on Damped Newton Method for Nonlinear Continuous-Time H$\infty$ Tracking Control |
Qi Wang et.al. |
2401.12882 |
null |
2024-01-23 |
Learning safety critics via a non-contractive binary bellman operator |
Agustin Castellano et.al. |
2401.12849 |
null |
2024-01-23 |
Digital Twin-Based Network Management for Better QoE in Multicast Short Video Streaming |
Xinyu Huang et.al. |
2401.12826 |
null |
2024-01-23 |
Deep Learning Based Simulators for the Phosphorus Removal Process Control in Wastewater Treatment via Deep Reinforcement Learning Algorithms |
Esmaeel Mohammadi et.al. |
2401.12822 |
null |
2024-01-23 |
Dynamic Layer Tying for Parameter-Efficient Transformers |
Tamir David Hay et.al. |
2401.12819 |
null |
2024-01-23 |
Learning Mean Field Games on Sparse Graphs: A Hybrid Graphex Approach |
Christian Fabian et.al. |
2401.12686 |
null |
2024-01-22 |
Mitigating Covariate Shift in Misspecified Regression with Applications to Reinforcement Learning |
Philip Amortila et.al. |
2401.12216 |
null |
2024-01-22 |
Retrieval-Guided Reinforcement Learning for Boolean Circuit Minimization |
Animesh Basak Chowdhury et.al. |
2401.12205 |
null |
2024-01-22 |
WARM: On the Benefits of Weight Averaged Reward Models |
Alexandre Ramé et.al. |
2401.12187 |
null |
2024-01-22 |
West-of-N: Synthetic Preference Generation for Improved Reward Modeling |
Alizée Pace et.al. |
2401.12086 |
null |
2024-01-22 |
Collaborative Reinforcement Learning Based Unmanned Aerial Vehicle (UAV) Trajectory Design for 3D UAV Tracking |
Yujiao Zhu et.al. |
2401.12079 |
null |
2024-01-22 |
HomeRobot Open Vocabulary Mobile Manipulation Challenge 2023 Participant Report (Team KuzHum) |
Volodymyr Kuzma et.al. |
2401.12048 |
null |
2024-01-22 |
Adaptive Motion Planning for Multi-fingered Functional Grasp via Force Feedback |
Dongying Tian et.al. |
2401.11977 |
null |
2024-01-22 |
Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey |
Pengyi Li et.al. |
2401.11963 |
link |
2024-01-22 |
Self-Labeling the Job Shop Scheduling Problem |
Andrea Corsini et.al. |
2401.11849 |
link |
2024-01-22 |
Safe and Generalized end-to-end Autonomous Driving System with Reinforcement Learning and Demonstrations |
Zuojin Tang et.al. |
2401.11792 |
null |
2024-01-19 |
Reinforcement learning for question answering in programming domain using public community scoring as a human feedback |
Alexey Gorbatovski et.al. |
2401.10882 |
null |
2024-01-19 |
Deep Reinforcement Learning Empowered Activity-Aware Dynamic Health Monitoring Systems |
Ziqiaing Ye et.al. |
2401.10794 |
null |
2024-01-19 |
Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model |
Yinan Zheng et.al. |
2401.10700 |
link |
2024-01-19 |
Quality-Diversity Algorithms Can Provably Be Helpful for Optimization |
Chao Qian et.al. |
2401.10539 |
null |
2024-01-19 |
Episodic Reinforcement Learning with Expanded State-reward Space |
Dayang Liang et.al. |
2401.10516 |
null |
2024-01-18 |
HRL-TSCH: A Hierarchical Reinforcement Learning-based TSCH Scheduler for IIoT |
F. Fernando Jurado-Lasso et.al. |
2401.10368 |
null |
2024-01-18 |
LangProp: A code optimization framework using Language Models applied to driving |
Shu Ishida et.al. |
2401.10314 |
link |
2024-01-18 |
Model-Assisted Learning for Adaptive Cooperative Perception of Connected Autonomous Vehicles |
Kaige Qu et.al. |
2401.10156 |
null |
2024-01-18 |
Multi-Agent Reinforcement Learning for Maritime Operational Technology Cyber Security |
Alec Wilson et.al. |
2401.10149 |
null |
2024-01-18 |
Deep Back-Filling: a Split Window Technique for Deep Online Cluster Job Scheduling |
Lingfei Wang et.al. |
2401.09910 |
null |
2024-01-18 |
Cooperative Edge Caching Based on Elastic Federated and Multi-Agent Deep Reinforcement Learning in Next-Generation Network |
Qiong Wu et.al. |
2401.09886 |
link |
2024-01-18 |
Reconciling Spatial and Temporal Abstractions for Goal Representation |
Mehdi Zadem et.al. |
2401.09870 |
link |
2024-01-18 |
FREED++: Improving RL Agents for Fragment-Based Molecule Generation by Thorough Reproduction |
Alexander Telepov et.al. |
2401.09840 |
link |
2024-01-18 |
Optimizing Visible Light Communication Efficiency Through Reinforcement Learning-Based NOMA-CSK Integration |
Serkan Vela et.al. |
2401.09780 |
null |
2024-01-18 |
Robotic Test Tube Rearrangement Using Combined Reinforcement Learning and Motion Planning |
Hao Chen et.al. |
2401.09772 |
null |
2024-01-18 |
Exploration and Anti-Exploration with Distributional Random Network Distillation |
Kai Yang et.al. |
2401.09750 |
link |
2024-01-18 |
A HPC Co-Scheduler with Reinforcement Learning |
Abel Souza et.al. |
2401.09706 |
null |
2024-01-17 |
Central Limit Theorem for Two-Timescale Stochastic Approximation with Markovian Noise: Theory and Applications |
Jie Hu et.al. |
2401.09339 |
null |
2024-01-17 |
Vision-driven Autonomous Flight of UAV Along River Using Deep Reinforcement Learning with Dynamic Expert Guidance |
Zihan Wang et.al. |
2401.09332 |
link |
2024-01-17 |
Deployable Reinforcement Learning with Variable Control Rate |
Dong Wang et.al. |
2401.09286 |
link |
2024-01-17 |
An Efficient Generalizable Framework for Visuomotor Policies via Control-aware Augmentation and Privilege-guided Distillation |
Yinuo Zhao et.al. |
2401.09258 |
null |
2024-01-17 |
LLMs for Relational Reasoning: How Far are We? |
Zhiming Li et.al. |
2401.09042 |
null |
2024-01-17 |
UOEP: User-Oriented Exploration Policy for Enhancing Long-Term User Experiences in Recommender Systems |
Changshuo Zhang et.al. |
2401.09034 |
link |
2024-01-17 |
Continuous Time Continuous Space Homeostatic Reinforcement Learning (CTCS-HRRL): Towards Biological Self-Autonomous Agent |
Hugo Laurencon et.al. |
2401.08999 |
null |
2024-01-17 |
ReFT: Reasoning with Reinforced Fine-Tuning |
Trung Quoc Luong et.al. |
2401.08967 |
link |
2024-01-17 |
Cascading Reinforcement Learning |
Yihan Du et.al. |
2401.08961 |
null |
2024-01-17 |
Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback |
Teng Xiao et.al. |
2401.08959 |
null |
2024-01-16 |
On Quantum Natural Policy Gradients |
André Sequeira et.al. |
2401.08307 |
link |
2024-01-16 |
Sum Throughput Maximization in Multi-BD Symbiotic Radio NOMA Network Assisted by Active-STAR-RIS |
Rahman Saadat Yeganeh et.al. |
2401.08301 |
null |
2024-01-16 |
PRewrite: Prompt Rewriting with Reinforcement Learning |
Weize Kong et.al. |
2401.08189 |
null |
2024-01-16 |
IoTWarden: A Deep Reinforcement Learning Based Real-time Defense System to Mitigate Trigger-action IoT Attacks |
Md Morshed Alam et.al. |
2401.08141 |
null |
2024-01-16 |
CycLight: learning traffic signal cooperation with a cycle-level strategy |
Gengyue Han et.al. |
2401.08121 |
null |
2024-01-15 |
Survey of Learning Approaches for Robotic In-Hand Manipulation |
Abraham Itzhak Weinberg et.al. |
2401.07915 |
null |
2024-01-15 |
Learned Best-Effort LLM Serving |
Siddharth Jha et.al. |
2401.07886 |
null |
2024-01-15 |
The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise |
Shuze Liu et.al. |
2401.07844 |
null |
2024-01-15 |
Inferring Preferences from Demonstrations in Multi-Objective Residential Energy Management |
Junlin Lu et.al. |
2401.07722 |
null |
2024-01-15 |
Go-Explore for Residential Energy Management |
Junlin Lu et.al. |
2401.07710 |
null |
2024-01-12 |
NetMind: Adaptive RAN Baseband Function Placement by GCN Encoding and Maze-solving DRL |
Haiyuan Li et.al. |
2401.06722 |
link |
2024-01-12 |
Identifying Policy Gradient Subspaces |
Jan Schneider et.al. |
2401.06604 |
null |
2024-01-12 |
Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study |
Shangding Gu et.al. |
2401.06603 |
null |
2024-01-12 |
Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field Games |
Berkay Anahtarci et.al. |
2401.06566 |
null |
2024-01-12 |
Personalized Reinforcement Learning with a Budget of Policies |
Dmitry Ivanov et.al. |
2401.06514 |
link |
2024-01-12 |
AI-enabled Priority and Auction-Based Spectrum Management for 6G |
Mina Khadem et.al. |
2401.06484 |
null |
2024-01-12 |
UNEX-RL: Reinforcing Long-Term Rewards in Multi-Stage Recommender Systems with UNidirectional EXecution |
Gengrui Zhang et.al. |
2401.06470 |
null |
2024-01-12 |
Striking a Balance in Fairness for Dynamic Systems Through Reinforcement Learning |
Yaowei Hu et.al. |
2401.06318 |
link |
2024-01-12 |
A Semantic-Aware Multiple Access Scheme for Distributed, Dynamic 6G-Based Applications |
Hamidreza Mazandarani et.al. |
2401.06308 |
null |
2024-01-11 |
Model-Free Reinforcement Learning for Automated Fluid Administration in Critical Care |
Elham Estiri et.al. |
2401.06299 |
null |
2024-01-11 |
Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint |
Zhipeng Chen et.al. |
2401.06081 |
link |
2024-01-11 |
Secrets of RLHF in Large Language Models Part II: Reward Modeling |
Binghai Wang et.al. |
2401.06080 |
link |
2024-01-11 |
Spatial-Aware Deep Reinforcement Learning for the Traveling Officer Problem |
Niklas Strauß et.al. |
2401.05969 |
null |
2024-01-11 |
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications |
Xijun Li et.al. |
2401.05960 |
null |
2024-01-11 |
Optimistic Model Rollouts for Pessimistic Offline Policy Optimization |
Yuanzhao Zhai et.al. |
2401.05899 |
null |
2024-01-11 |
Safe reinforcement learning in uncertain contexts |
Dominik Baumann et.al. |
2401.05876 |
link |
2024-01-11 |
Confidence-Based Curriculum Learning for Multi-Agent Path Finding |
Thomy Phan et.al. |
2401.05860 |
link |
2024-01-11 |
Interactions between dynamic team composition and coordination: An agent-based modeling approach |
Darío Blanco-Fernández et.al. |
2401.05832 |
null |
2024-01-11 |
Towards Goal-Oriented Agents for Evolving Problems Observed via Conversation |
Michael Free et.al. |
2401.05822 |
null |
2024-01-11 |
Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents |
Quentin Delfosse et.al. |
2401.05821 |
link |
2024-01-10 |
ReACT: Reinforcement Learning for Controller Parametrization using B-Spline Geometries |
Thomas Rudolf et.al. |
2401.05251 |
null |
2024-01-10 |
Taming "data-hungry" reinforcement learning? Stability in continuous state-action spaces |
Yaqi Duan et.al. |
2401.05233 |
null |
2024-01-10 |
Modelling, Positioning, and Deep Reinforcement Learning Path Tracking Control of Scaled Robotic Vehicles: Design and Experimental Validation |
Carmine Caponio et.al. |
2401.05194 |
null |
2024-01-11 |
DRL-based Latency-Aware Network Slicing in O-RAN with Time-Varying SLAs |
Raoul Raftopoulos et.al. |
2401.05042 |
null |
2024-01-10 |
Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk |
Dennis Ulmer et.al. |
2401.05033 |
null |
2024-01-10 |
An Information Theoretic Approach to Interaction-Grounded Learning |
Xiaoyan Hu et.al. |
2401.05015 |
null |
2024-01-10 |
Advancing ECG Diagnosis Using Reinforcement Learning on Global Waveform Variations Related to P Wave and PR Interval |
Rumsha Fatima et.al. |
2401.04938 |
null |
2024-01-10 |
Fully Decentralized Cooperative Multi-Agent Reinforcement Learning: A Survey |
Jiechuan Jiang et.al. |
2401.04934 |
null |
2024-01-09 |
Graph Learning-based Fleet Scheduling for Urban Air Mobility under Operational Constraints, Varying Demand & Uncertainties |
Steve Paul et.al. |
2401.04851 |
null |
2024-01-09 |
Deep Reinforcement Multi-agent Learning framework for Information Gathering with Local Gaussian Processes for Water Monitoring |
Samuel Yanes Luis et.al. |
2401.04631 |
null |
2024-01-09 |
Scalable Policies for the Dynamic Traveling Multi-Maintainer Problem with Alerts |
Peter Verleijsdonk et.al. |
2401.04574 |
null |
2024-01-09 |
i-Rebalance: Personalized Vehicle Repositioning for Supply Demand Balance |
Haoyang Chen et.al. |
2401.04429 |
null |
2024-01-09 |
StarCraftImage: A Dataset For Prototyping Spatial Reasoning Methods For Multi-Agent Environments |
Sean Kulinski et.al. |
2401.04290 |
null |
2024-01-08 |
Curiosity & Entropy Driven Unsupervised RL in Multiple Environments |
Shaurya Dewan et.al. |
2401.04198 |
null |
2024-01-08 |
A Minimaximalist Approach to Reinforcement Learning from Human Feedback |
Gokul Swamy et.al. |
2401.04056 |
null |
2024-01-08 |
Behavioural Cloning in VizDoom |
Ryan Spick et.al. |
2401.03993 |
null |
2024-01-08 |
Guiding drones by information gain |
Alouette van Hove et.al. |
2401.03947 |
null |
2024-01-08 |
Using reinforcement learning to improve drone-based inference of greenhouse gas fluxes |
Alouette van Hove et.al. |
2401.03932 |
link |
2024-01-08 |
A Tensor Network Implementation of Multi Agent Reinforcement Learning |
Sunny Howard et.al. |
2401.03896 |
null |
2024-01-08 |
Inverse Reinforcement Learning with Sub-optimal Experts |
Riccardo Poiani et.al. |
2401.03857 |
null |
2024-01-08 |
Long-term Safe Reinforcement Learning with Binary Feedback |
Akifumi Wachi et.al. |
2401.03786 |
null |
2024-01-07 |
NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open Worlds |
Shivam Goel et.al. |
2401.03546 |
null |
2024-01-07 |
ClusterComm: Discrete Communication in Decentralized MARL using Internal Representation Clustering |
Robert Müller et.al. |
2401.03504 |
null |
2024-01-07 |
Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence |
Philip Jordan et.al. |
2401.03489 |
link |
2024-01-05 |
A unified uncertainty-aware exploration: Combining epistemic and aleatory uncertainty |
Parvin Malekzadeh et.al. |
2401.02914 |
null |
2024-01-05 |
Deep Reinforcement Learning for Local Path Following of an Autonomous Formula SAE Vehicle |
Harvey Merton et.al. |
2401.02903 |
null |
2024-01-05 |
Synergistic Formulaic Alpha Generation for Quantitative Trading based on Reinforcement Learning |
Hong-Gi Shin et.al. |
2401.02710 |
null |
2024-01-05 |
Adaptive Discounting of Training Time Attacks |
Ridhima Bector et.al. |
2401.02652 |
null |
2024-01-05 |
Improving sample efficiency of high dimensional Bayesian optimization with MCMC |
Zeji Yi et.al. |
2401.02650 |
null |
2024-01-05 |
Simple Hierarchical Planning with Diffusion |
Chang Chen et.al. |
2401.02644 |
null |
2024-01-04 |
Structured Matrix Learning under Arbitrary Entrywise Dependence and Estimation of Markov Transition Kernel |
Jinhang Chai et.al. |
2401.02520 |
null |
2024-01-04 |
Towards an Adaptable and Generalizable Optimization Engine in Decision and Control: A Meta Reinforcement Learning Approach |
Sungwook Yang et.al. |
2401.02508 |
null |
2024-01-04 |
A Survey Analyzing Generalization in Deep Reinforcement Learning |
Ezgi Korkmaz et.al. |
2401.02349 |
null |
2024-01-04 |
A Robust Quantile Huber Loss With Interpretable Parameter Adjustment In Distributional Reinforcement Learning |
Parvin Malekzadeh et.al. |
2401.02325 |
link |
2024-01-04 |
Policy-regularized Offline Multi-objective Reinforcement Learning |
Qian Lin et.al. |
2401.02244 |
link |
2024-01-04 |
Trajectory-Oriented Policy Optimization with Sparse Rewards |
Guojian Wang et.al. |
2401.02225 |
null |
2024-01-04 |
OFDM-Based Digital Semantic Communication with Importance Awareness |
Chuanhong Liu et.al. |
2401.02178 |
null |
2024-01-04 |
Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning |
Ke Li et.al. |
2401.02160 |
null |
2024-01-04 |
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers |
Chen Zheng et.al. |
2401.02072 |
null |
2024-01-03 |
NODEC: Neural ODE For Optimal Control of Unknown Dynamical Systems |
Cheng Chi et.al. |
2401.01836 |
link |
2024-01-03 |
Optimizing UAV-UGV Coalition Operations: A Hybrid Clustering and Multi-Agent Reinforcement Learning Approach for Path Planning in Obstructed Environment |
Shamyo Brotee et.al. |
2401.01481 |
null |
2024-01-02 |
Learning-based agricultural management in partially observable environments subject to climate variability |
Zhaoan Wang et.al. |
2401.01273 |
null |
2024-01-02 |
Mirror Descent for Stochastic Control Problems with Measure-valued Controls |
Bekzhan Kerimkulov et.al. |
2401.01198 |
null |
2024-01-02 |
Deep Learning Driven Buffer-Aided Cooperative Networks for B5G/6G: Challenges, Solutions, and Future Opportunities |
Peng Xu et.al. |
2401.01195 |
null |
2024-01-02 |
Reinforcement Learning for SAR View Angle Inversion with Differentiable SAR Renderer |
Yanni Wang et.al. |
2401.01165 |
null |
2024-01-02 |
Enhancing Communication Efficiency of Semantic Transmission via Joint Processing Technique |
Xumin Pu et.al. |
2401.01143 |
null |
2024-01-02 |
Joint Offloading and Resource Allocation for Hybrid Cloud and Edge Computing in SAGINs: A Decision Assisted Hybrid Action Space Deep Reinforcement Learning Approach |
Chong Huang et.al. |
2401.01140 |
null |
2024-01-02 |
Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction |
Jie Feng et.al. |
2401.01084 |
null |
2024-01-01 |
Data Assimilation in Chaotic Systems Using Deep Reinforcement Learning |
Mohamad Abed El Rahman Hammoud et.al. |
2401.00916 |
null |
2024-01-01 |
Polynomial-time Approximation Scheme for Equilibriums of Games |
Hongbo Sun et.al. |
2401.00747 |
link |
2024-01-01 |
Personalized Dynamic Pricing Policy for Electric Vehicles: Reinforcement learning approach |
Sangjun Bae et.al. |
2401.00661 |
null |
2023-12-29 |
Adaptive Control Strategy for Quadruped Robots in Actuator Degradation Scenarios |
Xinyuan Wu et.al. |
2312.17606 |
link |
2023-12-29 |
Exploring Deep Reinforcement Learning for Robust Target Tracking using Micro Aerial Vehicles |
Alberto Dionigi et.al. |
2312.17552 |
link |
2023-12-29 |
Design Space Exploration of Approximate Computing Techniques with a Reinforcement Learning Approach |
Sepide Saeedi et.al. |
2312.17525 |
null |
2023-12-29 |
Actuator-Constrained Reinforcement Learning for High-Speed Quadrupedal Locomotion |
Young-Ha Shin et.al. |
2312.17507 |
null |
2023-12-29 |
HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning |
Hao Wang et.al. |
2312.17503 |
null |
2023-12-29 |
Culturally-Attuned Moral Machines: Implicit Learning of Human Value Systems by AI through Inverse Reinforcement Learning |
Nigini Oliveira et.al. |
2312.17479 |
null |
2023-12-29 |
Once Burned, Twice Shy? The Effect of Stock Market Bubbles on Traders that Learn by Experience |
Haibei Zhu et.al. |
2312.17472 |
null |
2023-12-28 |
Beyond PID Controllers: PPO with Neuralized PID Policy for Proton Beam Intensity Control in Mu2e |
Chenwei Xu et.al. |
2312.17372 |
null |
2023-12-28 |
Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity |
Guhao Feng et.al. |
2312.17248 |
null |
2023-12-28 |
Resilient Constrained Reinforcement Learning |
Dongsheng Ding et.al. |
2312.17194 |
null |
2023-12-28 |
Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? |
Gunshi Gupta et.al. |
2312.17168 |
link |
2023-12-28 |
Generalizable Visual Reinforcement Learning with Segment Anything Model |
Ziyu Wang et.al. |
2312.17116 |
link |
2023-12-28 |
When Metaverses Meet Vehicle Road Cooperation: Multi-Agent DRL-Based Stackelberg Game for Vehicular Twins Migration |
Jiawen Kang et.al. |
2312.17081 |
null |
2023-12-28 |
Model-aware reinforcement learning for high-performance Bayesian experimental design in quantum metrology |
Federico Belliardo et.al. |
2312.16985 |
link |
2023-12-28 |
Reinforcement-based Display-size Selection for Frugal Satellite Image Change Detection |
Hichem Sahbi et.al. |
2312.16965 |
null |
2023-12-28 |
RLPlanner: Reinforcement Learning based Floorplanning for Chiplets with Fast Thermal Analysis |
Yuanyuan Duan et.al. |
2312.16895 |
null |
2023-12-28 |
Tail-Learning: Adaptive Learning Method for Mitigating Tail Latency in Autonomous Edge Systems |
Cheng Zhang et.al. |
2312.16883 |
null |
2023-12-28 |
Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies |
Bing Yuan et.al. |
2312.16815 |
null |
2023-12-26 |
A Bayesian Framework of Deep Reinforcement Learning for Joint O-RAN/MEC Orchestration |
Fahri Wisnu Murti et.al. |
2312.16142 |
null |
2023-12-26 |
Large Language Models as Traffic Signal Control Agents: Capacity and Opportunity |
Siqi Lai et.al. |
2312.16044 |
link |
2023-12-26 |
Aligning Large Language Models with Human Preferences through Representation Engineering |
Wenhao Liu et.al. |
2312.15997 |
link |
2023-12-26 |
Adaptive Kalman-based hybrid car following strategy using TD3 and CACC |
Yuqi Zheng et.al. |
2312.15993 |
null |
2023-12-26 |
Optimistic and Pessimistic Actor in RL: Decoupling Exploration and Utilization |
Jingpu Yang et.al. |
2312.15965 |
link |
2023-12-26 |
Reinforcement Unlearning |
Dayong Ye et.al. |
2312.15910 |
null |
2023-12-26 |
Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations |
Renzhe Zhou et.al. |
2312.15909 |
link |
2023-12-26 |
PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning |
Hangyu Mao et.al. |
2312.15863 |
link |
2023-12-26 |
Learning Online Policies for Person Tracking in Multi-View Environments |
Keivan Nalaie et.al. |
2312.15858 |
null |
2023-12-25 |
A Closed-Loop Multi-perspective Visual Servoing Approach with Reinforcement Learning |
Lei Zhang et.al. |
2312.15809 |
null |
2023-12-22 |
A Survey of Reinforcement Learning from Human Feedback |
Timo Kaufmann et.al. |
2312.14925 |
null |
2023-12-22 |
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning |
Filippos Christianos et.al. |
2312.14878 |
null |
2023-12-22 |
YAYI 2: Multilingual Open-Source Large Language Models |
Yin Luo et.al. |
2312.14862 |
null |
2023-12-22 |
An investigation of belief-free DRL and MCTS for inspection and maintenance planning |
Daniel Koutas et.al. |
2312.14824 |
null |
2023-12-22 |
Hierarchical Multi-Agent Reinforcement Learning for Assessing False-Data Injection Attacks on Transportation Networks |
Taha Eghtesad et.al. |
2312.14625 |
null |
2023-12-22 |
Machine learning for structure-guided materials and process design |
Lukas Morand et.al. |
2312.14552 |
null |
2023-12-22 |
DuaLight: Enhancing Traffic Signal Control by Leveraging Scenario-Specific and Scenario-Shared Knowledge |
Jiaming Lu et.al. |
2312.14532 |
link |
2023-12-22 |
Not All Tasks Are Equally Difficult: Multi-Task Reinforcement Learning with Dynamic Depth Routing |
Jinmin He et.al. |
2312.14472 |
null |
2023-12-22 |
Safe Reinforcement Learning with Instantaneous Constraints: The Role of Aggressive Exploration |
Honghao Wei et.al. |
2312.14470 |
null |
2023-12-22 |
Dynamic Programming-based Approximate Optimal Control for Model-Based Reinforcement Learning |
Prakash Mallick et.al. |
2312.14463 |
null |
2023-12-21 |
Diffusion Reward: Learning Rewards via Conditional Video Diffusion |
Tao Huang et.al. |
2312.14134 |
null |
2023-12-21 |
CVA Hedging by Risk-Averse Stochastic-Horizon Reinforcement Learning |
Roberto Daluiso et.al. |
2312.14044 |
null |
2023-12-21 |
Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian Score Climbing |
Hany Abdulsamad et.al. |
2312.14000 |
link |
2023-12-21 |
Modular Neural Network Policies for Learning In-Flight Object Catching with a Robot Hand-Arm System |
Wenbin Hu et.al. |
2312.13987 |
null |
2023-12-21 |
Multi-Agent Probabilistic Ensembles with Trajectory Sampling for Connected Autonomous Vehicles |
Ruoqi Wen et.al. |
2312.13910 |
null |
2023-12-21 |
Variational Quantum Circuit Design for Quantum Reinforcement Learning on Continuous Environments |
Georg Kruse et.al. |
2312.13798 |
null |
2023-12-21 |
Open-Source Reinforcement Learning Environments Implemented in MuJoCo with Franka Manipulator |
Zichun Xu et.al. |
2312.13788 |
link |
2023-12-21 |
Critic-Guided Decision Transformer for Offline Reinforcement Learning |
Yuanfu Wang et.al. |
2312.13716 |
null |
2023-12-21 |
Automatic Curriculum Learning with Gradient Reward Signals |
Ryan Campbell et.al. |
2312.13565 |
null |
2023-12-20 |
Entropy-Regularized Mean-Variance Portfolio Optimization with Jumps |
Christian Bender et.al. |
2312.13409 |
null |
2023-12-20 |
First-principle-like reinforcement learning of nonlinear numerical schemes for conservation laws |
Hao-Chen Wang et.al. |
2312.13260 |
null |
2023-12-20 |
Learning Best Response Policies in Dynamic Auctions via Deep Reinforcement Learning |
Vinzenz Thoma et.al. |
2312.13232 |
null |
2023-12-20 |
Task-oriented Semantics-aware Communications for Robotic Waypoint Transmission: the Value and Age of Information Approach |
Wenchao Wu et.al. |
2312.13182 |
null |
2023-12-20 |
Collaborative Optimization of the Age of Information under Partial Observability |
Anam Tahir et.al. |
2312.12977 |
null |
2023-12-20 |
Sparse Mean Field Load Balancing in Large Localized Queueing Systems |
Anam Tahir et.al. |
2312.12973 |
null |
2023-12-20 |
PGN: A perturbation generation network against deep reinforcement learning |
Xiangjuan Li et.al. |
2312.12904 |
null |
2023-12-20 |
Parameterized Projected Bellman Operator |
Théo Vincent et.al. |
2312.12869 |
link |
2023-12-20 |
Towards Machines that Trust: AI Agents Learn to Trust in the Trust Game |
Ardavan S. Nobandegani et.al. |
2312.12868 |
null |
2023-12-20 |
Dynamic Fairness-Aware Spectrum Auction for Enhanced Licensed Shared Access in 6G Networks |
Mina Khadem et.al. |
2312.12867 |
null |
2023-12-20 |
Safe Multi-Agent Reinforcement Learning for Formation Control without Individual Reference Targets |
Murad Dawood et.al. |
2312.12861 |
null |
2023-12-19 |
Emergence of In-Context Reinforcement Learning from Noise Distillation |
Ilya Zisman et.al. |
2312.12275 |
link |
2023-12-19 |
TaskFlex Solver for Multi-Agent Pursuit via Automatic Curriculum Learning |
Jiayu Chen et.al. |
2312.12255 |
null |
2023-12-19 |
CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning |
Chenyu Sun et.al. |
2312.12191 |
null |
2023-12-19 |
OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments |
Jinyi Liu et.al. |
2312.12145 |
null |
2023-12-19 |
Cautiously-Optimistic Knowledge Sharing for Cooperative Multi-Agent Reinforcement Learning |
Yanwen Ba et.al. |
2312.12095 |
link |
2023-12-19 |
Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property |
Ioannis Anagnostides et.al. |
2312.12067 |
null |
2023-12-19 |
XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX |
Alexander Nikulin et.al. |
2312.12044 |
link |
2023-12-19 |
LHManip: A Dataset for Long-Horizon Language-Grounded Manipulation Tasks in Cluttered Tabletop Environments |
Federico Ceola et.al. |
2312.12036 |
link |
2023-12-19 |
Parameterized Decision-making with Multi-modal Perception for Autonomous Driving |
Yuyang Xia et.al. |
2312.11935 |
null |
2023-12-19 |
Stable Relay Learning Optimization Approach for Fast Power System Production Cost Minimization Simulation |
Zishan Guo et.al. |
2312.11896 |
null |
2023-12-18 |
Contextual Reinforcement Learning for Offshore Wind Farm Bidding |
David Cole et.al. |
2312.10884 |
null |
2023-12-17 |
Learning to Act without Actions |
Dominik Schmidt et.al. |
2312.10812 |
link |
2023-12-17 |
Deep-Dispatch: A Deep Reinforcement Learning-Based Vehicle Dispatch Algorithm for Advanced Air Mobility |
Elaheh Sabziyan Varnousfaderani et.al. |
2312.10809 |
null |
2023-12-17 |
Language-conditioned Learning for Robotic Manipulation: A Survey |
Hongkuan Zhou et.al. |
2312.10807 |
link |
2023-12-17 |
CACTO-SL: Using Sobolev Learning to improve Continuous Actor-Critic with Trajectory Optimization |
Elisa Alboni et.al. |
2312.10666 |
link |
2023-12-17 |
Episodic Return Decomposition by Difference of Implicitly Assigned Sub-Trajectory Reward |
Haoxin Lin et.al. |
2312.10642 |
link |
2023-12-17 |
Risk-Constrained Reinforcement Learning for Inverter-Dominated Power System Controls |
Kyung-bin Kwon et.al. |
2312.10635 |
null |
2023-12-16 |
Improving Environment Robustness of Deep Reinforcement Learning Approaches for Autonomous Racing Using Bayesian Optimization-based Curriculum Learning |
Rohan Banerjee et.al. |
2312.10557 |
link |
2023-12-16 |
Advancing RAN Slicing with Offline Reinforcement Learning |
Kun Yang et.al. |
2312.10547 |
null |
2023-12-16 |
Spatial Deep Learning for Site-Specific Movement Optimization of Aerial Base Stations |
Jiangbin Lyu et.al. |
2312.10490 |
null |
2023-12-15 |
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent |
Renat Aksitov et.al. |
2312.10003 |
null |
2023-12-15 |
Toward Computationally Efficient Inverse Reinforcement Learning via Reward Shaping |
Lauren H. Cooke et.al. |
2312.09983 |
null |
2023-12-15 |
Deep Reinforcement Learning for Joint Cruise Control and Intelligent Data Acquisition in UAVs-Assisted Sensor Networks |
Yousef Emami et.al. |
2312.09953 |
null |
2023-12-15 |
Peer Learning: Learning Complex Policies in Groups from Scratch via Action Recommendations |
Cedric Derstroff et.al. |
2312.09950 |
link |
2023-12-15 |
Assume-Guarantee Reinforcement Learning |
Milad Kazemi et.al. |
2312.09938 |
null |
2023-12-15 |
LogoStyleFool: Vitiating Video Recognition Systems via Logo Style Transfer |
Yuxin Cao et.al. |
2312.09935 |
link |
2023-12-15 |
Sample-Efficient Learning to Solve a Real-World Labyrinth Game Using Data-Augmented Model-Based Reinforcement Learning |
Thomas Bi et.al. |
2312.09906 |
null |
2023-12-15 |
Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation |
Girolamo Macaluso et.al. |
2312.09844 |
null |
2023-12-15 |
Benchmarking the Full-Order Model Optimization Based Imitation in the Humanoid Robot Reinforcement Learning Walk |
Ekaterina Chaikovskaya et.al. |
2312.09757 |
null |
2023-12-15 |
GraphRARE: Reinforcement Learning Enhanced Graph Neural Network with Relative Entropy |
Tianhao Peng et.al. |
2312.09708 |
null |
2023-12-14 |
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking |
Jacob Eisenstein et.al. |
2312.09244 |
null |
2023-12-14 |
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft |
Hao Li et.al. |
2312.09238 |
null |
2023-12-14 |
Vision-Language Models as a Source of Rewards |
Kate Baumli et.al. |
2312.09187 |
null |
2023-12-14 |
MRL-PoS: A Multi-agent Reinforcement Learning based Proof of Stake Consensus Algorithm for Blockchain |
Tariqul Islam et.al. |
2312.09123 |
null |
2023-12-14 |
Less is more -- the Dispatcher/Executor principle for multi-task Reinforcement Learning |
Martin Riedmiller et.al. |
2312.09120 |
null |
2023-12-14 |
DeepSurveySim: Simulation Software and Benchmark Challenges for Astronomical Observation Scheduling |
Maggie Voetberg et.al. |
2312.09092 |
link |
2023-12-14 |
ReCoRe: Regularized Contrastive Representation Learning of World Model |
Rudra P. K. Poudel et.al. |
2312.09056 |
null |
2023-12-14 |
Using Surprise Index for Competency Assessment in Autonomous Decision-Making |
Akash Ratheesh et.al. |
2312.09033 |
null |
2023-12-14 |
Adaptive parameter sharing for multi-agent reinforcement learning |
Dapeng Li et.al. |
2312.09009 |
null |
2023-12-14 |
LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers |
Taewook Nam et.al. |
2312.08958 |
null |
2023-12-13 |
The Effective Horizon Explains Deep RL Performance in Stochastic Environments |
Cassidy Laidlaw et.al. |
2312.08369 |
link |
2023-12-13 |
An Invitation to Deep Reinforcement Learning |
Bernhard Jaeger et.al. |
2312.08365 |
null |
2023-12-13 |
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF |
Anand Siththaranjan et.al. |
2312.08358 |
link |
2023-12-13 |
Model-Free Verification for Neural Network Controlled Systems |
Han Wang et.al. |
2312.08293 |
null |
2023-12-13 |
Leveraging User Simulation to Develop and Evaluate Conversational Information Access Agents |
Nolwenn Bernard et.al. |
2312.08041 |
null |
2023-12-13 |
Secure Deep Reinforcement Learning for Dynamic Resource Allocation in Wireless MEC Networks |
Xin Hao et.al. |
2312.08016 |
null |
2023-12-14 |
Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies |
Vicki Young et.al. |
2312.07953 |
null |
2023-12-13 |
On Designing Multi-UAV aided Wireless Powered Dynamic Communication via Hierarchical Deep Reinforcement Learning |
Ze Yu Zhao et.al. |
2312.07917 |
null |
2023-12-13 |
Artificial Intelligence Studies in Cartography: A Review and Synthesis of Methods, Applications, and Ethics |
Yuhao Kang et.al. |
2312.07901 |
null |
2023-12-13 |
RAT: Reinforcement-Learning-Driven and Adaptive Testing for Vulnerability Discovery in Web Application Firewalls |
Mohammadhossein Amouei et.al. |
2312.07885 |
link |
2023-12-12 |
On Diverse Preferences for Large Language Model Alignment |
Dun Zeng et.al. |
2312.07401 |
link |
2023-12-12 |
ReRoGCRL: Representation-based Robustness in Goal-Conditioned Reinforcement Learning |
Xiangyu Yin et.al. |
2312.07392 |
link |
2023-12-12 |
Sequential Planning in Large Partially Observable Environments guided by LLMs |
Swarna Kamal Paul et.al. |
2312.07368 |
link |
2023-12-12 |
Intelligible Protocol Learning for Resource Allocation in 6G O-RAN Slicing |
Farhad Rezazadeh et.al. |
2312.07362 |
null |
2023-12-12 |
Learning from Interaction: User Interface Adaptation using Reinforcement Learning |
Daniel Gaspar-Figueiredo et.al. |
2312.07216 |
null |
2023-12-12 |
Beyond Expected Return: Accounting for Policy Reproducibility when Evaluating Reinforcement Learning Algorithms |
Manon Flageat et.al. |
2312.07178 |
null |
2023-12-12 |
Noise Distribution Decomposition based Multi-Agent Distributional Reinforcement Learning |
Wei Geng et.al. |
2312.07025 |
null |
2023-12-11 |
A Novel Differentiable Loss Function for Unsupervised Graph Neural Networks in Graph Partitioning |
Vivek Chaudhary et.al. |
2312.06877 |
null |
2023-12-11 |
Scalable Decentralized Cooperative Platoon using Multi-Agent Deep Reinforcement Learning |
Ahmed Abdelrahman et.al. |
2312.06858 |
null |
2023-12-11 |
Data-Driven Modeling and Verification of Perception-Based Autonomous Systems |
Thomas Waite et.al. |
2312.06848 |
null |
2023-12-11 |
Convergence of Multi-Scale Reinforcement Q-Learning Algorithms for Mean Field Game and Control Problems |
Andrea Angiuli et.al. |
2312.06659 |
null |
2023-12-11 |
Can Reinforcement Learning support policy makers? A preliminary study with Integrated Assessment Models |
Theodore Wolf et.al. |
2312.06527 |
null |
2023-12-11 |
Decoupling Meta-Reinforcement Learning with Gaussian Task Contexts and Skills |
Hongcai He et.al. |
2312.06518 |
link |
2023-12-11 |
Reward Certification for Policy Smoothed Reinforcement Learning |
Ronghui Mu et.al. |
2312.06436 |
null |
2023-12-11 |
Partial End-to-end Reinforcement Learning for Robustness Against Modelling Error in Autonomous Racing |
Andrew Murdoch et.al. |
2312.06406 |
null |
2023-12-11 |
FOSS: A Self-Learned Doctor for Query Optimizer |
Kai Zhong et.al. |
2312.06357 |
null |
2023-12-11 |
DiffAIL: Diffusion Adversarial Imitation Learning |
Bingzheng Wang et.al. |
2312.06348 |
link |
2023-12-11 |
Dropout is all you need: robust two-qubit gate with reinforcement learning |
Tian-Niu Xu et.al. |
2312.06335 |
null |
2023-12-11 |
Mobile Edge Computing and AI Enabled Web3 Metaverse over 6G Wireless Communications: A Deep Reinforcement Learning Approach |
Wenhan Yu et.al. |
2312.06293 |
null |
2023-12-11 |
No Prior Mask: Eliminate Redundant Action for Deep Reinforcement Learning |
Dianyu Zhong et.al. |
2312.06258 |
link |
2023-12-08 |
TaskMet: Task-Driven Metric Learning for Model Learning |
Dishank Bansal et.al. |
2312.05250 |
null |
2023-12-08 |
Modeling Risk in Reinforcement Learning: A Literature Mapping |
Leonardo Villalobos-Arias et.al. |
2312.05231 |
null |
2023-12-08 |
DARLEI: Deep Accelerated Reinforcement Learning with Evolutionary Intelligence |
Saeejith Nair et.al. |
2312.05171 |
null |
2023-12-08 |
Onflow: an online portfolio allocation algorithm |
Gabriel Turinici et.al. |
2312.05169 |
null |
2023-12-08 |
Multi-Agent Reinforcement Learning via Distributed MPC as a Function Approximator |
Samuel Mallick et.al. |
2312.05166 |
link |
2023-12-08 |
A Review of Cooperation in Multi-agent Learning |
Yali Du et.al. |
2312.05162 |
null |
2023-12-08 |
Learning to Fly Omnidirectional Micro Aerial Vehicles with an End-To-End Control Network |
Eugenio Cuniato et.al. |
2312.05125 |
null |
2023-12-08 |
An Autonomous Driving model with BEV-V2X Perception, Trajectory Prediction and Driving Planning in Complex Traffic Intersections |
Fukang Li et.al. |
2312.05104 |
null |
2023-12-08 |
UniTSA: A Universal Reinforcement Learning Framework for V2X Traffic Signal Control |
Maonan Wang et.al. |
2312.05090 |
link |
2023-12-08 |
Robotic Control of the Deformation of Soft Linear Objects Using Deep Reinforcement Learning |
Mélodie Hani Daniel Zakaria et.al. |
2312.05056 |
link |
2023-12-07 |
Data-Driven Robust Reinforcement Learning Control of Uncertain Nonlinear Systems: Towards a Fully-Automated, Insulin-Based Artificial Pancreas |
Alexandros Tanzanakis et.al. |
2312.04503 |
null |
2023-12-07 |
Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation |
Jiayi Huang et.al. |
2312.04464 |
null |
2023-12-07 |
Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization |
Carlos E. Luis et.al. |
2312.04386 |
null |
2023-12-07 |
HARQ-IR Aided Short Packet Communications: BLER Analysis and Throughput Maximization |
Fuchao He et.al. |
2312.04377 |
null |
2023-12-07 |
A Scalable Network-Aware Multi-Agent Reinforcement Learning Framework for Decentralized Inverter-based Voltage Control |
Han Xu et.al. |
2312.04371 |
null |
2023-12-07 |
Learning to sample in Cartesian MRI |
Thomas Sanchez et.al. |
2312.04327 |
null |
2023-12-07 |
iDesigner: A High-Resolution and Complex-Prompt Following Text-to-Image Diffusion Model for Interior Design |
Ruyi Gan et.al. |
2312.04326 |
null |
2023-12-07 |
Multi Actor-Critic DDPG for Robot Action Space Decomposition: A Framework to Control Large 3D Deformation of Soft Linear Objects |
Mélodie Daniel et.al. |
2312.04308 |
link |
2023-12-07 |
Dynamic Data-Driven Digital Twins for Blockchain Systems |
Georgios Diamantopoulos et.al. |
2312.04226 |
null |
2023-12-07 |
CODEX: A Cluster-Based Method for Explainable Reinforcement Learning |
Timothy K. Mathes et.al. |
2312.04216 |
link |
2023-12-06 |
On the Role of the Action Space in Robot Manipulation Learning and Sim-to-Real Transfer |
Elie Aljalbout et.al. |
2312.03673 |
null |
2023-12-06 |
MICRACLE: Inverse Reinforcement and Curriculum Learning Model for Human-inspired Mobile Robot Navigation |
Nihal Gunukula et.al. |
2312.03651 |
null |
2023-12-06 |
MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment |
Ziyan Wang et.al. |
2312.03644 |
null |
2023-12-06 |
MOCHa: Multi-Objective Reinforcement Mitigating Caption Hallucinations |
Assaf Ben-Kish et.al. |
2312.03631 |
link |
2023-12-06 |
Evaluation of Active Feature Acquisition Methods for Static Feature Settings |
Henrik von Kleist et.al. |
2312.03619 |
null |
2023-12-06 |
Physical Symbolic Optimization |
Wassim Tenachi et.al. |
2312.03612 |
link |
2023-12-06 |
Generalized Contrastive Divergence: Joint Training of Energy-Based Model and Diffusion Model through Inverse Reinforcement Learning |
Sangwoong Yoon et.al. |
2312.03397 |
null |
2023-12-06 |
Diffused Task-Agnostic Milestone Planner |
Mineui Hong et.al. |
2312.03395 |
null |
2023-12-06 |
Demand response for residential building heating: Effective Monte Carlo Tree Search control based on physics-informed neural networks |
Fabio Pavirani et.al. |
2312.03365 |
null |
2023-12-06 |
Masking Behaviors in Epidemiological Networks with Cognitively-plausible Reinforcement Learning |
Konstantinos Mitsopoulos et.al. |
2312.03301 |
null |
2023-12-05 |
Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World |
Kiana Ehsani et.al. |
2312.02976 |
null |
2023-12-05 |
Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications |
Rajeeva L. Karandikar et.al. |
2312.02828 |
null |
2023-12-05 |
Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems |
Céline Comte et.al. |
2312.02804 |
null |
2023-12-05 |
LExCI: A Framework for Reinforcement Learning with Embedded Systems |
Kevin Badalian et.al. |
2312.02739 |
link |
2023-12-05 |
Hierarchical Visual Policy Learning for Long-Horizon Robot Manipulation in Densely Cluttered Scenes |
Hecheng Wang et.al. |
2312.02697 |
null |
2023-12-05 |
Contact Energy Based Hindsight Experience Prioritization |
Erdi Sayar et.al. |
2312.02677 |
null |
2023-12-05 |
A Q-learning approach to the continuous control problem of robot inverted pendulum balancing |
Mohammad Safeea et.al. |
2312.02649 |
null |
2023-12-05 |
DanZero+: Dominating the GuanDan Game through Reinforcement Learning |
Youpeng Zhao et.al. |
2312.02561 |
link |
2023-12-05 |
PolyFit: A Peg-in-hole Assembly Framework for Unseen Polygon Shapes via Sim-to-real Adaptation |
Geonhyup Lee et.al. |
2312.02531 |
null |
2023-12-05 |
MASP: Scalable GNN-based Planning for Multi-Agent Navigation |
Xinyi Yang et.al. |
2312.02522 |
null |
2023-12-04 |
Optimizing Camera Configurations for Multi-View Pedestrian Detection |
Yunzhong Hou et.al. |
2312.02144 |
null |
2023-12-04 |
Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models |
Xingyuan Zhang et.al. |
2312.02019 |
link |
2023-12-04 |
CaRL: Cascade Reinforcement Learning with State Space Splitting for O-RAN based Traffic Steering |
Chuanneng Sun et.al. |
2312.01970 |
null |
2023-12-04 |
Foundations for Transfer in Reinforcement Learning: A Taxonomy of Knowledge Modalities |
Markus Wulfmeier et.al. |
2312.01939 |
null |
2023-12-04 |
A Reliable Representation with Bidirectional Transition Model for Visual Reinforcement Learning Generalization |
Xiaobo Hu et.al. |
2312.01915 |
null |
2023-12-04 |
Modular Control Architecture for Safe Marine Navigation: Reinforcement Learning and Predictive Safety Filters |
Aksel Vaaler et.al. |
2312.01855 |
null |
2023-12-04 |
Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing |
Ying Yuan et.al. |
2312.01853 |
null |
2023-12-04 |
Integrated Drill Boom Hole-Seeking Control via Reinforcement Learning |
Haoqi Yan et.al. |
2312.01836 |
null |
2023-12-04 |
Learning Machine Morality through Experience and Interaction |
Elizaveta Tennant et.al. |
2312.01818 |
null |
2023-12-04 |
Class Symbolic Regression: Gotta Fit 'Em All |
Wassim Tenachi et.al. |
2312.01816 |
link |
2023-12-01 |
Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space |
Xiaoyuan Cheng et.al. |
2312.00727 |
null |
2023-12-01 |
Tracking Object Positions in Reinforcement Learning: A Metric for Keypoint Detection (extended version) |
Emma Cramer et.al. |
2312.00592 |
link |
2023-12-01 |
Explainable Fraud Detection with Deep Symbolic Classification |
Samantha Visbeek et.al. |
2312.00586 |
link |
2023-12-01 |
Interior Point Constrained Reinforcement Learning with Global Convergence Guarantees |
Tingting Ni et.al. |
2312.00561 |
null |
2023-12-01 |
GFN-SR: Symbolic Regression with Generative Flow Networks |
Sida Li et.al. |
2312.00396 |
link |
2023-12-01 |
TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning |
Dohyeong Kim et.al. |
2312.00344 |
null |
2023-12-01 |
Efficient Off-Policy Safe Reinforcement Learning Using Trust Region Conditional Value at Risk |
Dohyeong Kim et.al. |
2312.00342 |
null |
2023-12-01 |
UAV-Aided Lifelong Learning for AoI and Energy Optimization in Non-Stationary IoT Networks |
Zhenzhen Gong et.al. |
2312.00334 |
null |
2023-12-01 |
Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement Learning Approach |
Xingqiu He et.al. |
2312.00279 |
link |
2023-12-01 |
Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration |
Viraj Mehta et.al. |
2312.00267 |
null |
2023-11-30 |
Language Model Agents Suffer from Compositional Generalization in Web Automation |
Hiroki Furuta et.al. |
2311.18751 |
link |
2023-11-30 |
Controlgym: Large-Scale Safety-Critical Control Environments for Benchmarking Reinforcement Learning Algorithms |
Xiangyuan Zhang et.al. |
2311.18736 |
link |
2023-11-30 |
Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization |
Daniel Jarne Ornia et.al. |
2311.18703 |
link |
2023-11-30 |
Handling Cost and Constraints with Off-Policy Deep Reinforcement Learning |
Jared Markowitz et.al. |
2311.18684 |
null |
2023-11-30 |
Generalisable Agents for Neural Network Optimisation |
Kale-ab Tessera et.al. |
2311.18598 |
null |
2023-11-30 |
Optimizing ZX-Diagrams with Deep Reinforcement Learning |
Maximilian Nägele et.al. |
2311.18588 |
link |
2023-11-30 |
Data-efficient Deep Reinforcement Learning for Vehicle Trajectory Control |
Bernd Frauenknecht et.al. |
2311.18393 |
null |
2023-11-30 |
URLLC-Awared Resource Allocation for Heterogeneous Vehicular Edge Computing |
Qiong Wu et.al. |
2311.18352 |
null |
2023-11-30 |
Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent |
Bianca Marin Moreno et.al. |
2311.18346 |
null |
2023-11-30 |
Deep Reinforcement Learning Based Optimal Energy Management of Multi-energy Microgrids with Uncertainties |
Yang Cui et.al. |
2311.18327 |
null |
2023-11-29 |
Maximum Entropy Model Correction in Reinforcement Learning |
Amin Rakhsha et.al. |
2311.17855 |
null |
2023-11-29 |
Identifying Dynamic Regulation with Adversarial Surrogates |
Ron Teichner et.al. |
2311.17783 |
null |
2023-11-29 |
Q-learning Based Optimal False Data Injection Attack on Probabilistic Boolean Control Networks |
Xianlun Peng et.al. |
2311.17631 |
null |
2023-11-29 |
LanGWM: Language Grounded World Model |
Rudra P. K. Poudel et.al. |
2311.17593 |
null |
2023-11-29 |
Deep Reinforcement Learning Graphs: Feedback Motion Planning via Neural Lyapunov Verification |
Armin Ghanbarzadeh et.al. |
2311.17587 |
null |
2023-11-29 |
Bias Resilient Multi-Step Off-Policy Goal-Conditioned Reinforcement Learning |
Lisheng Wu et.al. |
2311.17565 |
null |
2023-11-29 |
Reinforcement Learning with thermal fluctuations at the nano-scale |
Francesco Boccardo et.al. |
2311.17519 |
null |
2023-11-29 |
Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning |
Swaroop Nath et.al. |
2311.17514 |
link |
2023-11-29 |
Unveiling the Implicit Toxicity in Large Language Models |
Jiaxin Wen et.al. |
2311.17391 |
link |
2023-11-29 |
Data-driven Bandwidth Adaptation for Radio Access Network Slices |
Panagiotis Nikolaidis et.al. |
2311.17347 |
null |
2023-11-28 |
Mission-driven Exploration for Accelerated Deep Reinforcement Learning with Temporal Logic Task Specifications |
Jun Wang et.al. |
2311.17059 |
null |
2023-11-28 |
An Investigation of Time Reversal Symmetry in Reinforcement Learning |
Brett Barkley et.al. |
2311.17008 |
null |
2023-11-28 |
Goal-conditioned Offline Planning from Curious Exploration |
Marco Bagatella et.al. |
2311.16996 |
null |
2023-11-28 |
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up? |
Hailin Chen et.al. |
2311.16989 |
link |
2023-11-28 |
Bidirectional Reactive Programming for Machine Learning |
Dumitru Potop Butucaru et.al. |
2311.16977 |
null |
2023-11-28 |
End-to-end Reinforcement Learning for Time-Optimal Quadcopter Flight |
Robin Ferede et.al. |
2311.16948 |
null |
2023-11-28 |
Optimization Theory Based Deep Reinforcement Learning for Resource Allocation in Ultra-Reliable Wireless Networked Control Systems |
Hamida Qumber Ali et.al. |
2311.16895 |
null |
2023-11-28 |
Digital Twin-Enhanced Deep Reinforcement Learning for Resource Management in Networks Slicing |
Zhengming Zhang et.al. |
2311.16876 |
null |
2023-11-28 |
Edge AI for Internet of Energy: Challenges and Perspectives |
Yassine Himeur et.al. |
2311.16851 |
null |
2023-11-28 |
Two-step dynamic obstacle avoidance |
Fabian Hart et.al. |
2311.16841 |
null |
2023-11-27 |
Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation |
Jiachen Li et.al. |
2311.16091 |
null |
2023-11-27 |
Evaluating the Impact of Personalized Value Alignment in Human-Robot Interaction: Insights into Trust and Team Performance Outcomes |
Shreyas Bhat et.al. |
2311.16051 |
null |
2023-11-27 |
Value-Based Reinforcement Learning for Digital Twins in Cloud Computing |
Van-Phuc Bui et.al. |
2311.15985 |
null |
2023-11-27 |
Adaptive Agents and Data Quality in Agent-Based Financial Markets |
Colin M. Van Oort et.al. |
2311.15974 |
null |
2023-11-27 |
Addressing Long-Horizon Tasks by Integrating Program Synthesis and State Machines |
Yu-An Lin et.al. |
2311.15960 |
null |
2023-11-27 |
Replay across Experiments: A Natural Extension of Off-Policy RL |
Dhruva Tirumala et.al. |
2311.15951 |
null |
2023-11-27 |
Reinforcement Learning for Wildfire Mitigation in Simulated Disaster Environments |
Alexander Tapley et.al. |
2311.15925 |
link |
2023-11-27 |
A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning |
Jianxiong Li et.al. |
2311.15920 |
null |
2023-11-27 |
Distributed Attacks over Federated Reinforcement Learning-enabled Cell Sleep Control |
Han Zhang et.al. |
2311.15894 |
null |
2023-11-27 |
Multi-Agent Reinforcement Learning for Power Control in Wireless Networks via Adaptive Graphs |
Lorenzo Mario Amorosa et.al. |
2311.15858 |
null |
2023-11-24 |
Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language |
Di Jin et.al. |
2311.14543 |
null |
2023-11-24 |
Digital Twin-Native AI-Driven Service Architecture for Industrial Networks |
Kubra Duran et.al. |
2311.14532 |
null |
2023-11-24 |
How to ensure a safe control strategy? Towards a SRL for urban transit autonomous operation |
Zicong Zhao et.al. |
2311.14457 |
null |
2023-11-24 |
Universal Jailbreak Backdoors from Poisoned Human Feedback |
Javier Rando et.al. |
2311.14455 |
link |
2023-11-24 |
Approximation of Convex Envelope Using Reinforcement Learning |
Vivek S. Borkar et.al. |
2311.14421 |
null |
2023-11-24 |
Directly Attention Loss Adjusted Prioritized Experience Replay |
Zhuoying Chen et.al. |
2311.14390 |
null |
2023-11-24 |
AI-based Attack Graph Generation |
Sangbeom Park et.al. |
2311.14342 |
null |
2023-11-24 |
Offline Skill Generalization via Task and Motion Planning |
Shin Watanabe et.al. |
2311.14328 |
null |
2023-11-24 |
On optimal tracking portfolio in incomplete markets: The classical control and the reinforcement learning approaches |
Lijun Bo et.al. |
2311.14318 |
null |
2023-11-24 |
Multi-modal Instance Refinement for Cross-domain Action Recognition |
Yuan Qing et.al. |
2311.14281 |
null |
2023-11-22 |
Risk-sensitive Markov Decision Process and Learning under General Utility Functions |
Zhengqi Wu et.al. |
2311.13589 |
null |
2023-11-22 |
Guided Flows for Generative Modeling and Decision Making |
Qinqing Zheng et.al. |
2311.13443 |
null |
2023-11-22 |
From Images to Connections: Can DQN with GNNs learn the Strategic Game of Hex? |
Yannik Keller et.al. |
2311.13414 |
link |
2023-11-22 |
Large Language Model is a Good Policy Teacher for Training Reinforcement Learning Agents |
Zihao Zhou et.al. |
2311.13373 |
link |
2023-11-22 |
Probabilistic Inference in Reinforcement Learning Done Right |
Jean Tarbouriech et.al. |
2311.13294 |
null |
2023-11-22 |
Intention and Context Elicitation with Large Language Models in the Legal Aid Intake Process |
Nick Goodson et.al. |
2311.13281 |
null |
2023-11-22 |
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model |
Kai Yang et.al. |
2311.13231 |
link |
2023-11-22 |
AdaptiveFL: Adaptive Heterogeneous Federated Learning for Resource-Constrained AIoT Systems |
Chentao Jia et.al. |
2311.13166 |
null |
2023-11-22 |
Enhancing Logical Reasoning in Large Language Models to Facilitate Legal Applications |
Ha-Thanh Nguyen et.al. |
2311.13095 |
null |
2023-11-22 |
Learning to Fly in Seconds |
Jonas Eschmann et.al. |
2311.13081 |
link |
2023-11-21 |
Decentralised Q-Learning for Multi-Agent Markov Decision Processes with a Satisfiability Criterion |
Keshav P. Keval et.al. |
2311.12613 |
null |
2023-11-21 |
Reinforcement Learning for the Near-Optimal Design of Zero-Delay Codes for Markov Sources |
Liam Cregg et.al. |
2311.12609 |
null |
2023-11-21 |
Scheduling Distributed Flexible Assembly Lines using Safe Reinforcement Learning with Soft Shielding |
Lele Li et.al. |
2311.12572 |
null |
2023-11-21 |
Multi-Session Budget Optimization for Forward Auction-based Federated Learning |
Xiaoli Tang et.al. |
2311.12548 |
null |
2023-11-21 |
Towards Faster Reinforcement Learning of Quantum Circuit Optimization: Exponential Reward Functions |
Ioana Moflic et.al. |
2311.12509 |
null |
2023-11-21 |
Cost Explosion for Efficient Reinforcement Learning Optimisation of Quantum Circuits |
Ioana Moflic et.al. |
2311.12498 |
null |
2023-11-21 |
Multi-Objective Reinforcement Learning based on Decomposition: A taxonomy and framework |
Florian Felten et.al. |
2311.12495 |
link |
2023-11-21 |
Reinforcement Learning for Stochastic LQ Control of Discrete-Time Systems with Multiplicative Noises |
Hongdan Li et.al. |
2311.12322 |
null |
2023-11-21 |
Resilient Control of Networked Microgrids using Vertical Federated Reinforcement Learning: Designs and Real-Time Test-Bed Validations |
Sayak Mukherjee et.al. |
2311.12264 |
null |
2023-11-21 |
Beyond Simulated Drivers: Evaluating the Impact of Real-World Car-Following in Mixed Traffic Control |
Bibek Poudel et.al. |
2311.12261 |
link |
2023-11-20 |
Provably Efficient CVaR RL in Low-rank MDPs |
Yulai Zhao et.al. |
2311.11965 |
null |
2023-11-20 |
Continual Learning: Applications and the Road Forward |
Eli Verwimp et.al. |
2311.11908 |
null |
2023-11-20 |
Few-shot Multispectral Segmentation with Representations Generated by Reinforcement Learning |
Dilith Jayakody et.al. |
2311.11827 |
null |
2023-11-20 |
AIaaS for ORAN-based 6G Networks: Multi-time scale slice resource management with DRL |
Suvidha Mhatre et.al. |
2311.11668 |
null |
2023-11-20 |
Replay-enhanced Continual Reinforcement Learning |
Tiantian Zhang et.al. |
2311.11557 |
link |
2023-11-20 |
ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning |
Yizhao Jin et.al. |
2311.11537 |
null |
2023-11-19 |
Offline Reinforcement Learning for Wireless Network Optimization with Mixture Datasets |
Kun Yang et.al. |
2311.11423 |
null |
2023-11-19 |
Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts |
Ahmed Hendawy et.al. |
2311.11385 |
link |
2023-11-19 |
Dynamic System Stability Verification Using Numerical Simulator |
Jongrae Kim et.al. |
2311.11372 |
null |
2023-11-19 |
Tactile Active Inference Reinforcement Learning for Efficient Robotic Manipulation Skill Acquisition |
Zihao Liu et.al. |
2311.11287 |
null |
2023-11-17 |
EduGym: An Environment Suite for Reinforcement Learning Education |
Thomas M. Moerland et.al. |
2311.10590 |
link |
2023-11-17 |
Learning Agile Locomotion on Risky Terrains |
Chong Zhang et.al. |
2311.10484 |
null |
2023-11-17 |
Decentralized Energy Marketplace via NFTs and AI-based Agents |
Rasoul Nikbakht et.al. |
2311.10406 |
link |
2023-11-17 |
Joint Sensing and Communication Optimization in Target-Mounted STARS-Assisted Vehicular Networks: A MADRL Approach |
Haocheng Zhang et.al. |
2311.10352 |
null |
2023-11-17 |
Imagination-augmented Hierarchical Reinforcement Learning for Safe and Interactive Autonomous Driving in Urban Environments |
Sang-Hyun Lee et.al. |
2311.10309 |
null |
2023-11-17 |
From "Thumbs Up" to "10 out of 10": Reconsidering Scalar Feedback in Interactive Reinforcement Learning |
Hang Yu et.al. |
2311.10284 |
null |
2023-11-16 |
Data-Driven LQR using Reinforcement Learning and Quadratic Neural Networks |
Soroush Asri et.al. |
2311.10235 |
null |
2023-11-17 |
JaxMARL: Multi-Agent RL Environments in JAX |
Alexander Rutherford et.al. |
2311.10090 |
link |
2023-11-16 |
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback |
Yangyi Chen et.al. |
2311.10081 |
null |
2023-11-16 |
Interpretable Reinforcement Learning for Robotics and Continuous Control |
Rohan Paleja et.al. |
2311.10041 |
link |
2023-11-16 |
Guaranteeing Control Requirements via Reward Shaping in Reinforcement Learning |
Francesco De Lellis et.al. |
2311.10026 |
link |
2023-11-16 |
Online Optimization for Network Resource Allocation and Comparison with Reinforcement Learning Techniques |
Ahmed Sid-Ali et.al. |
2311.10023 |
null |
2023-11-16 |
Safety Aware Autonomous Path Planning Using Model Predictive Reinforcement Learning for Inland Waterways |
Astrid Vanneste et.al. |
2311.09878 |
null |
2023-11-16 |
Short vs. Long-term Coordination of Drones: When Distributed Optimization Meets Deep Reinforcement Learning |
Chuhao Qin et.al. |
2311.09852 |
null |
2023-11-16 |
Runtime Verification of Learning Properties for Reinforcement Learning Algorithms |
Tommaso Mannucci et.al. |
2311.09811 |
null |
2023-11-16 |
Prudent Silence or Foolish Babble? Examining Large Language Models' Responses to the Unknown |
Genglin Liu et.al. |
2311.09731 |
link |
2023-11-16 |
Augmenting Unsupervised Reinforcement Learning with Self-Reference |
Andrew Zhao et.al. |
2311.09692 |
null |
2023-11-15 |
Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific Knowledge |
Sang-Hyun Lee et.al. |
2311.09195 |
null |
2023-11-15 |
Grounding or Guesswork? Large Language Models are Presumptive Grounders |
Omar Shaikh et.al. |
2311.09144 |
null |
2023-11-15 |
Aligning Neural Machine Translation Models: Human Feedback in Training and Inference |
Miguel Moura Ramos et.al. |
2311.09132 |
null |
2023-11-15 |
Assessing the Robustness of Intelligence-Driven Reinforcement Learning |
Lorenzo Nodari et.al. |
2311.09027 |
null |
2023-11-15 |
On the Foundation of Distributionally Robust Reinforcement Learning |
Shengbo Wang et.al. |
2311.09018 |
null |
2023-11-15 |
Adversarial Attacks to Reward Machine-based Reinforcement Learning |
Lorenzo Nodari et.al. |
2311.09014 |
null |
2023-11-15 |
Supported Trust Region Optimization for Offline Reinforcement Learning |
Yixiu Mao et.al. |
2311.08935 |
null |
2023-11-15 |
Efficiently Escaping Saddle Points for Non-Convex Policy Optimization |
Sadegh Khorasani et.al. |
2311.08914 |
null |
2023-11-15 |
An MRL-Based Design Solution for RIS-Assisted MU-MIMO Wireless System under Time-Varying Channels |
Meng-Qian Alexander Wu et.al. |
2311.08840 |
null |
2023-11-15 |
A Deep Reinforcement Learning Approach to Efficient Distributed Optimization |
Daokuan Zhu et.al. |
2311.08827 |
null |
2023-11-14 |
MVSA-Net: Multi-View State-Action Recognition for Robust and Deployable Trajectory Generation |
Ehsan Asali et.al. |
2311.08393 |
null |
2023-11-14 |
Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding |
Guangyu Yang et.al. |
2311.08380 |
link |
2023-11-14 |
Workflow-Guided Response Generation for Task-Oriented Dialogue |
Do June Min et.al. |
2311.08300 |
null |
2023-11-14 |
On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling |
Nicholas E. Corrado et.al. |
2311.08290 |
null |
2023-11-14 |
Language and Sketching: An LLM-driven Interactive Multimodal Multitask Robot Navigation Framework |
Weiqin Zu et.al. |
2311.08244 |
null |
2023-11-14 |
When Mining Electric Locomotives Meet Reinforcement Learning |
Ying Li et.al. |
2311.08153 |
null |
2023-11-14 |
Probable Object Location (POLo) Score Estimation for Efficient Object Goal Navigation |
Jiaming Wang et.al. |
2311.07992 |
null |
2023-11-14 |
AutoML for Large Capacity Modeling of Meta Ranking Systems |
Hang Yin et.al. |
2311.07870 |
null |
2023-11-14 |
A Neuro-Inspired Hierarchical Reinforcement Learning for Motor Control |
Pei Zhang et.al. |
2311.07822 |
null |
2023-11-13 |
Reinforcement Learning for Solving Stochastic Vehicle Routing Problem |
Zangir Iklassov et.al. |
2311.07708 |
link |
2023-11-13 |
Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning |
Arjun Bhardwaj et.al. |
2311.07558 |
null |
2023-11-13 |
Investigating Robustness in Cyber-Physical Systems: Specification-Centric Analysis in the face of System Deviations |
Changjian Zhang et.al. |
2311.07462 |
null |
2023-11-13 |
Goal-oriented Estimation of Multiple Markov Sources in Resource-constrained Systems |
Jiping Luo et.al. |
2311.07346 |
null |
2023-11-13 |
An introduction to reinforcement learning for neuroscience |
Kristopher T. Jensen et.al. |
2311.07315 |
null |
2023-11-13 |
C-Procgen: Empowering Procgen with Controllable Contexts |
Zhenxiong Tan et.al. |
2311.07312 |
null |
2023-11-13 |
TIAGo RL: Simulated Reinforcement Learning Environments with Tactile Data for Mobile Robots |
Luca Lach et.al. |
2311.07260 |
null |
2023-11-13 |
Towards Transferring Tactile-based Continuous Force Control Policies from Simulation to Robot |
Luca Lach et.al. |
2311.07245 |
null |
2023-11-13 |
STEER: Unified Style Transfer with Expert Reinforcement |
Skyler Hallinan et.al. |
2311.07167 |
link |
2023-11-13 |
Untargeted Black-box Attacks for Social Recommendations |
Wenqi Fan et.al. |
2311.07127 |
null |
2023-11-12 |
FLASH-RL: Federated Learning Addressing System and Static Heterogeneity using Reinforcement Learning |
Sofiane Bouaziz et.al. |
2311.06917 |
link |
2023-11-10 |
Multi-Agent Reinforcement Learning for the Low-Level Control of a Quadrotor UAV |
Beomyeol Yu et.al. |
2311.06144 |
link |
2023-11-10 |
Intersection-free Robot Manipulation with Soft-Rigid Coupled Incremental Potential Contact |
Wenxin Du et.al. |
2311.05945 |
null |
2023-11-10 |
Learning-Augmented Scheduling for Solar-Powered Electric Vehicle Charging |
Tongxin Li et.al. |
2311.05941 |
null |
2023-11-10 |
Genetic Algorithm enhanced by Deep Reinforcement Learning in parent selection mechanism and mutation : Minimizing makespan in permutation flow shop scheduling problems |
Maissa Irmouli et.al. |
2311.05937 |
null |
2023-11-10 |
Clipped-Objective Policy Gradients for Pessimistic Policy Optimization |
Jared Markowitz et.al. |
2311.05846 |
null |
2023-11-10 |
Let's Reinforce Step by Step |
Sarah Pan et.al. |
2311.05821 |
null |
2023-11-09 |
Real-time Control of Electric Autonomous Mobility-on-Demand Systems via Graph Reinforcement Learning |
Aaryan Singhal et.al. |
2311.05780 |
link |
2023-11-09 |
Advancing Algorithmic Trading: A Multi-Technique Enhancement of Deep Q-Network Models |
Gang Hu et.al. |
2311.05743 |
null |
2023-11-09 |
LLM Augmented Hierarchical Agents |
Bharat Prakash et.al. |
2311.05596 |
null |
2023-11-09 |
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations |
Joey Hong et.al. |
2311.05584 |
null |
2023-11-09 |
Joint SDN Synchronization and Controller Placement in Wireless Networks using Deep Reinforcement Learning |
Akrit Mudvari et.al. |
2311.05582 |
null |
2023-11-09 |
Removing RLHF Protections in GPT-4 via Fine-Tuning |
Qiusi Zhan et.al. |
2311.05553 |
null |
2023-11-09 |
Multi-Agent Quantum Reinforcement Learning using Evolutionary Optimization |
Michael Kölle et.al. |
2311.05546 |
null |
2023-11-09 |
Anytime-Constrained Reinforcement Learning |
Jeremy McMahan et.al. |
2311.05511 |
link |
2023-11-09 |
From "What" to "When" -- a Spiking Neural Network Predicting Rare Events and Time to their Occurrence |
Mikhail Kiselev et.al. |
2311.05210 |
null |
2023-11-09 |
Counter-Empirical Attacking based on Adversarial Reinforcement Learning for Time-Relevant Scoring System |
Xiangguo Sun et.al. |
2311.05144 |
link |
2023-11-09 |
Accelerating Exploration with Unlabeled Prior Data |
Qiyang Li et.al. |
2311.05067 |
link |
2023-11-08 |
Reinforcement Learning Generalization for Nonlinear Systems Through Dual-Scale Homogeneity Transformations |
Abdel Gafoor Haddad et.al. |
2311.05013 |
null |
2023-11-08 |
Real-Time Recurrent Reinforcement Learning |
Julian Lemmel et.al. |
2311.04830 |
null |
2023-11-08 |
Simultaneous Discovery of Quantum Error Correction Codes and Encoders with a Noise-Aware Reinforcement Learning Agent |
Jan Olle et.al. |
2311.04750 |
link |
2023-11-08 |
Enhancing Multi-Agent Coordination through Common Operating Picture Integration |
Peihong Yu et.al. |
2311.04740 |
null |
2023-11-08 |
Social Motion Prediction with Cognitive Hierarchies |
Wentao Zhu et.al. |
2311.04726 |
null |
2023-11-08 |
RDGCN: Reinforced Dependency Graph Convolutional Network for Aspect-based Sentiment Analysis |
Xusheng Zhao et.al. |
2311.04467 |
link |
2023-11-07 |
Force-Constrained Visual Policy: Safe Robot-Assisted Dressing via Multi-Modal Sensing |
Zhanyi Sun et.al. |
2311.04390 |
null |
2023-11-07 |
Adaptive Stochastic Nonlinear Model Predictive Control with Look-ahead Deep Reinforcement Learning for Autonomous Vehicle Motion Control |
Baha Zarrouki et.al. |
2311.04303 |
null |
2023-11-07 |
Compilation of product-formula Hamiltonian simulation via reinforcement learning |
Lea M. Trenkwalder et.al. |
2311.04285 |
link |
2023-11-07 |
Interactive Semantic Map Representation for Skill-based Visual Object Navigation |
Tatiana Zemskova et.al. |
2311.04107 |
null |
2023-11-07 |
Time-Efficient Reinforcement Learning with Stochastic Stateful Policies |
Firas Al-Hafez et.al. |
2311.04082 |
null |
2023-11-07 |
Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment |
Geyang Guo et.al. |
2311.04072 |
link |
2023-11-07 |
Estimator-Coupled Reinforcement Learning for Robust Purely Tactile In-Hand Manipulation |
Lennart Röstel et.al. |
2311.04060 |
null |
2023-11-07 |
Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features |
Diogo Cruz et.al. |
2311.04046 |
link |
2023-11-07 |
A Method to Improve the Performance of Reinforcement Learning Based on the Y Operator for a Class of Stochastic Differential Equation-Based Child-Mother Systems |
Cheng Yin et.al. |
2311.04014 |
null |
2023-11-07 |
Learning-Based Latency-Constrained Fronthaul Compression Optimization in C-RAN |
Axel Grönland et.al. |
2311.03899 |
null |
2023-11-07 |
On Deep Reinforcement Learning for Traffic Steering Intelligent ORAN |
Fatemeh Kavehmadavani et.al. |
2311.03853 |
null |
2023-11-07 |
Learning Decentralized Traffic Signal Controllers with Multi-Agent Graph Reinforcement Learning |
Yao Zhang et.al. |
2311.03756 |
null |
2023-11-07 |
Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning |
Joseph Suárez et.al. |
2311.03736 |
null |
2023-11-06 |
Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization |
Kun Lei et.al. |
2311.03351 |
link |
2023-11-06 |
A Brain-inspired Theory of Collective Mind Model for Efficient Social Cooperation |
Zhuoya Zhao et.al. |
2311.03150 |
null |
2023-11-06 |
Reinforcement Learning for Inverse Linear-quadratic Dynamic Non-cooperative Games |
Emin Martirosyan et.al. |
2311.03044 |
null |
2023-11-06 |
Virtual Action Actor-Critic Framework for Exploration (Student Abstract) |
Bumgeun Park et.al. |
2311.02916 |
null |
2023-11-06 |
Reinforcement Learning for Safety Testing: Lessons from A Mobile Robot Case Study |
Tom P. Huck et.al. |
2311.02907 |
null |
2023-11-06 |
Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs |
Wenke Xia et.al. |
2311.02847 |
link |
2023-11-05 |
ChaTA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs |
Yann Hicke et.al. |
2311.02775 |
null |
2023-11-05 |
Causal Question Answering with Reinforcement Learning |
Lukas Blübaum et.al. |
2311.02760 |
link |
2023-11-05 |
Staged Reinforcement Learning for Complex Tasks through Decomposed Environments |
Rafael Pina et.al. |
2311.02746 |
null |
2023-11-05 |
Learning Independently from Causality in Multi-Agent Environments |
Rafael Pina et.al. |
2311.02741 |
null |
2023-11-03 |
DeliverAI: Reinforcement Learning Based Distributed Path-Sharing Network for Food Deliveries |
Ashman Mehra et.al. |
2311.02017 |
null |
2023-11-03 |
Score Models for Offline Goal-Conditioned Reinforcement Learning |
Harshit Sikchi et.al. |
2311.02013 |
null |
2023-11-03 |
Conditions on Preference Relations that Guarantee the Existence of Optimal Policies |
Jonathan Colaco Carr et.al. |
2311.01990 |
null |
2023-11-03 |
Emergence of odd elasticity in a microswimmer using deep reinforcement learning |
Li-Shing Lin et.al. |
2311.01973 |
null |
2023-11-03 |
Domain Randomization via Entropy Maximization |
Gabriele Tiboni et.al. |
2311.01885 |
null |
2023-11-03 |
RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization |
Siqi Shen et.al. |
2311.01753 |
link |
2023-11-03 |
Epidemic Decision-making System Based Federated Reinforcement Learning |
Yangxi Zhou et.al. |
2311.01749 |
null |
2023-11-03 |
Energy Efficiency Optimization for Subterranean LoRaWAN Using A Reinforcement Learning Approach: A Direct-to-Satellite Scenario |
Kaiqiang Lin et.al. |
2311.01743 |
null |
2023-11-03 |
RDE: A Hybrid Policy Framework for Multi-Agent Path Finding Problem |
Jianqi Gao et.al. |
2311.01728 |
null |
2023-11-03 |
Robust Adversarial Reinforcement Learning via Bounded Rationality Curricula |
Aryaman Reddi et.al. |
2311.01642 |
null |
2023-11-02 |
Conformal Policy Learning for Sensorimotor Control Under Distribution Shifts |
Huang Huang et.al. |
2311.01457 |
null |
2023-11-02 |
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation |
Yufei Wang et.al. |
2311.01455 |
null |
2023-11-02 |
DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing |
Vint Lee et.al. |
2311.01450 |
null |
2023-11-02 |
Analysis of Information Propagation in Ethereum Network Using Combined Graph Attention Network and Reinforcement Learning to Optimize Network Efficiency and Scalability |
Stefan Kambiz Behfar et.al. |
2311.01406 |
null |
2023-11-02 |
Learning Realistic Traffic Agents in Closed-loop |
Chris Zhang et.al. |
2311.01394 |
null |
2023-11-02 |
Formal Methods for Autonomous Systems |
Tichakorn Wongpiromsarn et.al. |
2311.01258 |
null |
2023-11-02 |
EISim: A Platform for Simulating Intelligent Edge Orchestration Solutions |
Henna Kokkonen et.al. |
2311.01224 |
link |
2023-11-02 |
Diffusion Models for Reinforcement Learning: A Survey |
Zhengbang Zhu et.al. |
2311.01223 |
link |
2023-11-02 |
Contrastive Modules with Temporal Attention for Multi-Task Reinforcement Learning |
Siming Lan et.al. |
2311.01075 |
link |
2023-11-02 |
Dynamic Fair Federated Learning Based on Reinforcement Learning |
Weikang Chen et.al. |
2311.00959 |
null |
2023-11-02 |
Emergence of Collective Open-Ended Exploration from Decentralized Meta-Reinforcement Learning |
Richard Bornemann et.al. |
2311.00651 |
null |
2023-11-01 |
Learning impartial policies for sequential counterfactual explanations using Deep Reinforcement Learning |
E. Panagiotou et.al. |
2311.00523 |
null |
2023-11-01 |
Enhanced Generalization through Prioritization and Diversity in Self-Imitation Reinforcement Learning over Procedural Environments with Sparse Rewards |
Alain Andres et.al. |
2311.00426 |
null |
2023-11-01 |
Towards Automatic Sampling of User Behaviors for Sequential Recommender Systems |
Hao Zhang et.al. |
2311.00388 |
null |
2023-11-01 |
QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning |
Rizhong Wang et.al. |
2311.00356 |
null |
2023-11-02 |
A Definition of Open-Ended Learning Problems for Goal-Conditioned Agents |
Olivier Sigaud et.al. |
2311.00344 |
null |
2023-11-01 |
Rethinking Decision Transformer via Hierarchical Reinforcement Learning |
Yi Ma et.al. |
2311.00267 |
null |
2023-11-01 |
Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents |
Yang Deng et.al. |
2311.00262 |
link |
2023-11-01 |
Active Neural Topological Mapping for Multi-Agent Exploration |
Xinyi Yang et.al. |
2311.00252 |
null |
2023-11-01 |
Federated Natural Policy Gradient Methods for Multi-task Reinforcement Learning |
Tong Yang et.al. |
2311.00201 |
null |
2023-10-31 |
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity |
Joey Hong et.al. |
2310.20663 |
null |
2023-10-31 |
"Pick-and-Pass" as a Hat-Trick Class for First-Principle Memory, Generalizability, and Interpretability Benchmarks |
Jason Wang et.al. |
2310.20654 |
null |
2023-10-31 |
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B |
Simon Lermen et.al. |
2310.20624 |
null |
2023-10-31 |
Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback |
Max Balsells et.al. |
2310.20608 |
null |
2023-10-31 |
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning |
Ruizhe Shi et.al. |
2310.20587 |
link |
2023-10-31 |
Amoeba: Circumventing ML-supported Network Censorship via Adversarial Reinforcement Learning |
Haoyu Liu et.al. |
2310.20469 |
link |
2023-11-01 |
Dropout Strategy in Reinforcement Learning: Limiting the Surrogate Objective Variance in Policy Optimization Methods |
Zhengpeng Xie et.al. |
2310.20380 |
null |
2023-10-31 |
Sample-Efficient and Safe Deep Reinforcement Learning via Reset Deep Ensemble Agents |
Woojun Kim et.al. |
2310.20287 |
null |
2023-10-31 |
Beyond Average Return in Markov Decision Processes |
Alexandre Marthe et.al. |
2310.20266 |
null |
2023-10-31 |
Handover Protocol Learning for LEO Satellite Networks: Access Delay and Collision Minimization |
Ju-Hyung Lee et.al. |
2310.20215 |
null |
2023-10-30 |
Optimal Status Updates for Minimizing Age of Correlated Information in IoT Networks with Energy Harvesting Sensors |
Chao Xu et.al. |
2310.19216 |
link |
2023-10-29 |
Real-World Implementation of Reinforcement Learning Based Energy Coordination for a Cluster of Households |
Gargya Gokhale et.al. |
2310.19155 |
null |
2023-10-29 |
MAG-GNN: Reinforcement Learning Boosted Graph Neural Network |
Lecheng Kong et.al. |
2310.19142 |
null |
2023-10-29 |
Automaton Distillation: Neuro-Symbolic Transfer Learning for Deep Reinforcement Learning |
Suraj Singireddy et.al. |
2310.19137 |
null |
2023-10-29 |
Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery |
Katie Z Luo et.al. |
2310.19080 |
null |
2023-10-29 |
Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback |
Jingliang Duan et.al. |
2310.19022 |
null |
2023-10-31 |
Behavior Alignment via Reward Function Optimization |
Dhawal Gupta et.al. |
2310.19007 |
null |
2023-10-29 |
Spacecraft Autonomous Decision-Planning for Collision Avoidance: a Reinforcement Learning Approach |
Nicolas Bourriez et.al. |
2310.18966 |
null |
2023-10-29 |
Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game |
Zelai Xu et.al. |
2310.18940 |
null |
2023-10-29 |
Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation |
Nikki Lijing Kuang et.al. |
2310.18919 |
null |
2023-10-27 |
FP8-LM: Training FP8 Large Language Models |
Houwen Peng et.al. |
2310.18313 |
link |
2023-10-27 |
Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models |
Pushkal Katara et.al. |
2310.18308 |
null |
2023-10-27 |
Learning to Search Feasible and Infeasible Regions of Routing Problems with Flexible Neural k-Opt |
Yining Ma et.al. |
2310.18264 |
link |
2023-10-27 |
Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning |
Nicholas E. Corrado et.al. |
2310.18247 |
null |
2023-10-27 |
DESiRED -- Dynamic, Enhanced, and Smart iRED: A P4-AQM with Deep Reinforcement Learning and In-band Network Telemetry |
Leandro C. de Almeida et.al. |
2310.18159 |
null |
2023-10-27 |
Improving Intrinsic Exploration by Creating Stationary Objectives |
Roger Creus Castanyer et.al. |
2310.18144 |
null |
2023-10-27 |
Ask more, know better: Reinforce-Learned Prompt Questions for Decision Making with Large Language Models |
Xue Yan et.al. |
2310.18127 |
null |
2023-10-27 |
Text2Bundle: Towards Personalized Query-based Bundle Generation |
Shixuan Zhu et.al. |
2310.18004 |
null |
2023-10-27 |
Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning |
Shenzhi Wang et.al. |
2310.17966 |
link |
2023-10-27 |
Chain-of-Choice Hierarchical Policy Learning for Conversational Recommendation |
Wei Fan et.al. |
2310.17922 |
link |
2023-10-26 |
Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion |
Laura Smith et.al. |
2310.17634 |
null |
2023-10-26 |
Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic Forgetting in Curiosity |
Jaedong Hwang et.al. |
2310.17537 |
link |
2023-10-26 |
Learning Regularized Graphon Mean-Field Games with Unknown Graphons |
Fengzhuo Zhang et.al. |
2310.17531 |
null |
2023-10-27 |
Adaptive Resource Management for Edge Network Slicing using Incremental Multi-Agent Deep Reinforcement Learning |
Haiyuan Li et.al. |
2310.17523 |
null |
2023-10-26 |
Orchestration of Emulator Assisted Mobile Edge Tuning for AI Foundation Models: A Multi-Agent Deep Reinforcement Learning Approach |
Wenhan Yu et.al. |
2310.17492 |
null |
2023-10-26 |
FedPEAT: Convergence of Federated Learning, Parameter-Efficient Fine Tuning, and Emulator Assisted Tuning for Artificial Intelligence Foundation Models with Mobile Edge Computing |
Terence Jie Chua et.al. |
2310.17491 |
null |
2023-10-26 |
Fair collaborative vehicle routing: A deep multi-agent reinforcement learning approach |
Stephen Mak et.al. |
2310.17485 |
null |
2023-10-26 |
Coalitional Bargaining via Reinforcement Learning: An Application to Collaborative Vehicle Routing |
Stephen Mak et.al. |
2310.17458 |
null |
2023-10-26 |
Goals are Enough: Inducing AdHoc cooperation among unseen Multi-Agent systems in IMFs |
Kaushik Dey et.al. |
2310.17416 |
null |
2023-10-26 |
CQM: Curriculum Reinforcement Learning with a Quantized World Model |
Seungjae Lee et.al. |
2310.17330 |
null |
2023-10-25 |
TD-MPC2: Scalable, Robust World Models for Continuous Control |
Nicklas Hansen et.al. |
2310.16828 |
null |
2023-10-25 |
AI Agent as Urban Planner: Steering Stakeholder Dynamics in Urban Planning via Consensus-based Multi-Agent Reinforcement Learning |
Kejiang Qian et.al. |
2310.16772 |
null |
2023-10-25 |
SuperHF: Supervised Iterative Learning from Human Feedback |
Gabriel Mukobi et.al. |
2310.16763 |
link |
2023-10-25 |
MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning |
Dong-Ki Kim et.al. |
2310.16730 |
null |
2023-10-25 |
Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies |
Michael Beukman et.al. |
2310.16686 |
link |
2023-10-25 |
BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories? |
Xingmeng Zhao et.al. |
2310.16681 |
link |
2023-10-25 |
UAV Pathfinding in Dynamic Obstacle Avoidance with Multi-agent Reinforcement Learning |
Qizhen Wu et.al. |
2310.16659 |
null |
2023-10-25 |
Towards Control-Centric Representations in Reinforcement Learning from Images |
Chen Liu et.al. |
2310.16655 |
null |
2023-10-25 |
Model predictive control-based value estimation for efficient reinforcement learning |
Qizhen Wu et.al. |
2310.16646 |
link |
2023-10-25 |
Model-enhanced Contrastive Reinforcement Learning for Sequential Recommendation |
Chengpeng Li et.al. |
2310.16566 |
null |
2023-10-24 |
AI Alignment and Social Choice: Fundamental Limitations and Policy Implications |
Abhilash Mishra et.al. |
2310.16048 |
null |
2023-10-25 |
WebWISE: Web Interface Control and Sequential Exploration with Large Language Models |
Heyi Tao et.al. |
2310.16042 |
null |
2023-10-24 |
Finetuning Offline World Models in the Real World |
Yunhai Feng et.al. |
2310.16029 |
null |
2023-10-24 |
Data-driven Traffic Simulation: A Comprehensive Review |
Di Chen et.al. |
2310.15975 |
null |
2023-10-24 |
State Sequences Prediction via Fourier Transform for Representation Learning |
Mingxuan Ye et.al. |
2310.15888 |
link |
2023-10-24 |
Control problems on infinite horizon subject to time-dependent pure state constraints |
Vincenzo Basco et.al. |
2310.15771 |
null |
2023-10-24 |
Recurrent Linear Transformers |
Subhojeet Pramanik et.al. |
2310.15719 |
link |
2023-10-24 |
Solving large flexible job shop scheduling instances by generating a diverse set of scheduling policies with deep reinforcement learning |
Imanol Echeverria et.al. |
2310.15706 |
null |
2023-10-24 |
DACOOP-A: Decentralized Adaptive Cooperative Pursuit via Attention |
Zheng Zhang et.al. |
2310.15699 |
link |
2023-10-25 |
COPF: Continual Learning Human Preference through Optimal Policy Fitting |
Han Zhang et.al. |
2310.15694 |
null |
2023-10-23 |
Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning |
Jingyun Yang et.al. |
2310.15145 |
null |
2023-10-23 |
The primacy bias in Model-based RL |
Zhongjian Qiao et.al. |
2310.15017 |
null |
2023-10-23 |
Reinforcement learning in large, structured action spaces: A simulation study of decision support for spinal cord injury rehabilitation |
Nathan Phelps et.al. |
2310.14976 |
null |
2023-10-23 |
Comparison of path following in ships using modern and traditional controllers |
Sanjeev Kumar Ramkumar Sudha et.al. |
2310.14940 |
null |
2023-10-23 |
AI on the Water: Applying DRL to Autonomous Vessel Navigation |
Md Shadab Alam et.al. |
2310.14938 |
null |
2023-10-23 |
Navigating the Ocean with DRL: Path following for marine vessels |
Joel Jose et.al. |
2310.14932 |
null |
2023-10-23 |
Budgeted Embedding Table For Recommender Systems |
Yunke Qu et.al. |
2310.14884 |
null |
2023-10-23 |
Diverse Priors for Deep Reinforcement Learning |
Chenfan Weng et.al. |
2310.14864 |
null |
2023-10-23 |
Policy Gradient with Kernel Quadrature |
Satoshi Hayakawa et.al. |
2310.14768 |
null |
2023-10-23 |
Multi-Agent Learning in Contextual Games under Unknown Constraints |
Anna M. Maddux et.al. |
2310.14685 |
null |
2023-10-20 |
Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis |
Philip John Gorinski et.al. |
2310.13669 |
link |
2023-10-20 |
EXPLORA: AI/ML EXPLainability for the Open RAN |
Claudio Fiandrino et.al. |
2310.13667 |
link |
2023-10-20 |
Contrastive Preference Learning: Learning from Human Feedback without RL |
Joey Hejna et.al. |
2310.13639 |
link |
2023-10-20 |
Entangled Preferences: The History and Risks of Reinforcement Learning and Human Feedback |
Nathan Lambert et.al. |
2310.13595 |
null |
2023-10-20 |
Simultaneous Machine Translation with Tailored Reference |
Shoutao Guo et.al. |
2310.13588 |
null |
2023-10-20 |
Cooperative Multi-Agent Deep Reinforcement Learning for Adaptive Decentralized Emergency Voltage Control |
Ying Zhang et.al. |
2310.13577 |
null |
2023-10-20 |
Tree Search in DAG Space with Model-based Reinforcement Learning for Causal Discovery |
Victor-Alexandru Darvariu et.al. |
2310.13576 |
null |
2023-10-20 |
Reward Shaping for Happier Autonomous Cyber Security Agents |
Elizabeth Bates et.al. |
2310.13565 |
null |
2023-10-20 |
Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes |
Ruiquan Huang et.al. |
2310.13550 |
null |
2023-10-20 |
Towards Understanding Sycophancy in Language Models |
Mrinank Sharma et.al. |
2310.13548 |
link |
2023-10-19 |
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption |
Rui Yang et.al. |
2310.12955 |
link |
2023-10-19 |
End-to-End Delay Minimization based on Joint Optimization of DNN Partitioning and Resource Allocation for Cooperative Edge Inference |
Xinrui Ye et.al. |
2310.12937 |
null |
2023-10-19 |
Generative Flow Networks as Entropy-Regularized RL |
Daniil Tiapkin et.al. |
2310.12934 |
link |
2023-10-19 |
Eureka: Human-Level Reward Design via Coding Large Language Models |
Yecheng Jason Ma et.al. |
2310.12931 |
link |
2023-10-19 |
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning |
Juan Rocamonde et.al. |
2310.12921 |
link |
2023-10-19 |
Collaborative Adaptation: Learning to Recover from Unforeseen Malfunctions in Multi-Robot Teams |
Yasin Findik et.al. |
2310.12909 |
null |
2023-10-19 |
Safe RLHF: Safe Reinforcement Learning from Human Feedback |
Josef Dai et.al. |
2310.12773 |
link |
2023-10-19 |
Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark |
Jiaming Ji et.al. |
2310.12567 |
null |
2023-10-19 |
Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework |
Imdad Ullah et.al. |
2310.12523 |
null |
2023-10-19 |
SDGym: Low-Code Reinforcement Learning Environments using System Dynamics Models |
Emmanuel Klu et.al. |
2310.12494 |
link |
2023-10-18 |
Quality Diversity through Human Feedback |
Li Ding et.al. |
2310.12103 |
link |
2023-10-18 |
Understanding Reward Ambiguity Through Optimal Transport Theory in Inverse Reinforcement Learning |
Ali Baheri et.al. |
2310.12055 |
null |
2023-10-18 |
A General Theoretical Paradigm to Understand Learning from Human Preferences |
Mohammad Gheshlaghi Azar et.al. |
2310.12036 |
null |
2023-10-19 |
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning |
Rui Zheng et.al. |
2310.11971 |
null |
2023-10-18 |
Accelerated Policy Gradient: On the Nesterov Momentum for Reinforcement Learning |
Yen-Ju Chen et.al. |
2310.11897 |
link |
2023-10-18 |
Accelerate Presolve in Large-Scale Linear Programming via Reinforcement Learning |
Yufei Kuang et.al. |
2310.11845 |
null |
2023-10-18 |
On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning |
Rohan Subramani et.al. |
2310.11840 |
null |
2023-10-18 |
IntentDial: An Intent Graph based Multi-Turn Dialogue System with Reasoning Path Visualization |
Zengguang Hao et.al. |
2310.11818 |
null |
2023-10-18 |
Dynamic Resource Management in Integrated NOMA Terrestrial-Satellite Networks using Multi-Agent Reinforcement Learning |
Ali Nauman et.al. |
2310.11814 |
null |
2023-10-18 |
NeuroCUT: A Neural Approach for Robust Graph Partitioning |
Rishi Shah et.al. |
2310.11787 |
link |
2023-10-17 |
GreenNFV: Energy-Efficient Network Function Virtualization with Service Level Agreement Constraints |
MD S Q Zulkar Nine et.al. |
2310.11406 |
null |
2023-10-17 |
Real-time data assimilation for the thermodynamic modeling of cryogenic storage tanks |
Pedro Afonso Marques et.al. |
2310.11399 |
null |
2023-10-17 |
Non-ergodicity in reinforcement learning: robustness via ergodicity transformations |
Dominik Baumann et.al. |
2310.11335 |
link |
2023-10-17 |
Keep Various Trajectories: Promoting Exploration of Ensemble Policies in Continuous Control |
Chao Li et.al. |
2310.11138 |
null |
2023-10-17 |
Sim-to-Real Transfer of Adaptive Control Parameters for AUV Stabilization under Current Disturbance |
Thomas Chaffre et.al. |
2310.11075 |
null |
2023-10-17 |
Cooperative Dispatch of Microgrids Community Using Risk-Sensitive Reinforcement Learning with Monotonously Improved Performance |
Ziqing Zhu et.al. |
2310.10997 |
null |
2023-10-17 |
Combat Urban Congestion via Collaboration: Heterogeneous GNN-based MARL for Coordinated Platooning and Traffic Signal Control |
Xianyue Peng et.al. |
2310.10948 |
null |
2023-10-18 |
Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning |
Yunlong Song et.al. |
2310.10943 |
null |
2023-10-17 |
Enhanced Transformer Architecture for Natural Language Processing |
Woohyeon Moon et.al. |
2310.10930 |
null |
2023-10-16 |
Eco-Driving Control of Connected and Automated Vehicles using Neural Network based Rollout |
Jacob Paugh et.al. |
2310.10878 |
null |
2023-10-16 |
Generating Summaries with Controllable Readability Levels |
Leonardo F. R. Ribeiro et.al. |
2310.10623 |
link |
2023-10-16 |
Quantifying Assistive Robustness Via the Natural-Adversarial Frontier |
Jerry Zhi-Yang He et.al. |
2310.10610 |
null |
2023-10-16 |
Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks |
Zihao Li et.al. |
2310.10556 |
null |
2023-10-16 |
Applications of Distributed Machine Learning for the Internet-of-Things: A Comprehensive Survey |
Mai Le et.al. |
2310.10549 |
null |
2023-10-16 |
Learning optimal integration of spatial and temporal information in noisy chemotaxis |
Albert Alonso et.al. |
2310.10531 |
link |
2023-10-16 |
Efficient Sim-to-real Transfer of Contact-Rich Manipulation Skills with Online Admittance Residual Learning |
Xiang Zhang et.al. |
2310.10509 |
null |
2023-10-16 |
ReMax: A Simple, Effective, and Efficient Method for Aligning Large Language Models |
Ziniu Li et.al. |
2310.10505 |
link |
2023-10-16 |
Machine learning in physics: a short guide |
Francisco A. Rodrigues et.al. |
2310.10368 |
link |
2023-10-16 |
Unlocking Metasurface Practicality for B5G Networks: AI-assisted RIS Planning |
Guillermo Encinas-Lago et.al. |
2310.10330 |
null |
2023-10-16 |
End-to-end Offline Reinforcement Learning for Glycemia Control |
Tristan Beolet et.al. |
2310.10312 |
null |
2023-10-13 |
Goodhart's Law in Reinforcement Learning |
Jacek Karwowski et.al. |
2310.09144 |
null |
2023-10-13 |
Automatic Music Playlist Generation via Simulation-based Reinforcement Learning |
Federico Tomasi et.al. |
2310.09123 |
null |
2023-10-13 |
Online Relocating and Matching of Ride-Hailing Services: A Model-Based Modular Approach |
Chang Gao et.al. |
2310.09071 |
null |
2023-10-13 |
DATT: Deep Adaptive Trajectory Tracking for Quadrotor Control |
Kevin Huang et.al. |
2310.09053 |
link |
2023-10-13 |
Optimal Scheduling of Electric Vehicle Charging with Deep Reinforcement Learning considering End Users Flexibility |
Christoforos Menos-Aikateriniadis et.al. |
2310.09040 |
null |
2023-10-13 |
μ-DDRL: A QoS-Aware Distributed Deep Reinforcement Learning Technique for Service Offloading in Fog computing Environments |
Mohammad Goudarzi et.al. |
2310.09003 |
null |
2023-10-13 |
Multi-Purpose NLP Chatbot : Design, Methodology & Conclusion |
Shivom Aggarwal et.al. |
2310.08977 |
null |
2023-10-13 |
PAGE: Equilibrate Personalization and Generalization in Federated Learning |
Qian Chen et.al. |
2310.08961 |
null |
2023-10-13 |
LLaMA Rider: Spurring Large Language Models to Explore the Open World |
Yicheng Feng et.al. |
2310.08922 |
null |
2023-10-13 |
Community Membership Hiding as Counterfactual Graph Search via Deep Reinforcement Learning |
Andrea Bernini et.al. |
2310.08909 |
null |
2023-10-12 |
Octopus: Embodied Vision-Language Programmer from Environmental Feedback |
Jingkang Yang et.al. |
2310.08588 |
link |
2023-10-12 |
Discovering Fatigued Movements for Virtual Character Animation |
Noshaba Cheema et.al. |
2310.08583 |
null |
2023-10-12 |
Universal Visual Decomposer: Long-Horizon Manipulation Made Easy |
Zichen Zhang et.al. |
2310.08581 |
null |
2023-10-12 |
A Lightweight Calibrated Simulation Enabling Efficient Offline Learning for Optimal Control of Real Buildings |
Judah Goldfeder et.al. |
2310.08569 |
null |
2023-10-12 |
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining |
Licong Lin et.al. |
2310.08566 |
link |
2023-10-12 |
Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias |
Max Sobol Mark et.al. |
2310.08558 |
link |
2023-10-12 |
Cross-Episodic Curriculum for Transformer Agents |
Lucy Xiaoyang Shi et.al. |
2310.08549 |
null |
2023-10-12 |
MeanAP-Guided Reinforced Active Learning for Object Detection |
Zhixuan Liang et.al. |
2310.08387 |
null |
2023-10-12 |
Improving Factual Consistency for Knowledge-Grounded Dialogue Systems via Knowledge Enhancement and Alignment |
Boyang Xue et.al. |
2310.08372 |
link |
2023-10-12 |
Impact of multi-armed bandit strategies on deep recurrent reinforcement learning |
Valentina Zangirolami et.al. |
2310.08331 |
link |
2023-10-11 |
Reinforcement Learning-based Knowledge Graph Reasoning for Explainable Fact-checking |
Gustav Nikopensius et.al. |
2310.07613 |
null |
2023-10-11 |
Exploiting Causal Graph Priors with Posterior Sampling for Reinforcement Learning |
Mirco Mutti et.al. |
2310.07518 |
null |
2023-10-11 |
Sample-Driven Federated Learning for Energy-Efficient and Real-Time IoT Sensing |
Minh Ngoc Luu et.al. |
2310.07497 |
link |
2023-10-11 |
KwaiYiiMath: Technical Report |
Jiayi Fu et.al. |
2310.07488 |
null |
2023-10-11 |
GMOCAT: A Graph-Enhanced Multi-Objective Method for Computerized Adaptive Testing |
Hangyu Wang et.al. |
2310.07477 |
link |
2023-10-12 |
Imitation Learning from Observation with Automatic Discount Scheduling |
Yuyang Liu et.al. |
2310.07433 |
null |
2023-10-11 |
Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages |
Guozheng Ma et.al. |
2310.07418 |
link |
2023-10-11 |
RANS: Highly-Parallelised Simulator for Reinforcement Learning based Autonomous Navigating Spacecrafts |
Matteo El-Hariry et.al. |
2310.07393 |
link |
2023-10-11 |
Learning a Reward Function for User-Preferred Appliance Scheduling |
Nikolina Čović et.al. |
2310.07389 |
link |
2023-10-12 |
RLaGA: A Reinforcement Learning Augmented Genetic Algorithm For Searching Real and Diverse Marker-Based Landing Violations |
Linfeng Liang et.al. |
2310.07378 |
null |
2023-10-10 |
Scalable Semantic Non-Markovian Simulation Proxy for Reinforcement Learning |
Kaustuv Mukherji et.al. |
2310.06835 |
null |
2023-10-10 |
$f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences |
Siddhant Agarwal et.al. |
2310.06794 |
null |
2023-10-10 |
Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning |
Stefan Stojanovic et.al. |
2310.06793 |
null |
2023-10-10 |
Information Content Exploration |
Jacob Chmura et.al. |
2310.06777 |
null |
2023-10-10 |
EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation |
Baichuan Huang et.al. |
2310.06751 |
null |
2023-10-10 |
Near-Optimality of Finite-Memory Codes and Reinforcement Learning for Zero-Delay Coding of Markov Sources |
Liam Cregg et.al. |
2310.06742 |
null |
2023-10-10 |
Solving Inverse Problems with REINFORCE |
Chen Xu et.al. |
2310.06711 |
null |
2023-10-10 |
Diversity from Human Feedback |
Ren-Jian Wang et.al. |
2310.06648 |
null |
2023-10-10 |
BridgeHand2Vec Bridge Hand Representation |
Anna Sztyber-Betley et.al. |
2310.06624 |
link |
2023-10-10 |
SYNLOCO: Synthesizing Central Pattern Generator and Reinforcement Learning for Quadruped Locomotion |
Xinyu Zhang et.al. |
2310.06606 |
null |
2023-10-09 |
SALMON: Self-Alignment with Principle-Following Reward Models |
Zhiqing Sun et.al. |
2310.05910 |
link |
2023-10-09 |
DSAC-T: Distributional Soft Actor-Critic with Three Refinements |
Jingliang Duan et.al. |
2310.05858 |
link |
2023-10-09 |
A Simple Open-Loop Baseline for Reinforcement Learning Locomotion Tasks |
Antonin Raffin et.al. |
2310.05808 |
null |
2023-10-09 |
Aligning Language Models with Human Preferences via a Bayesian Approach |
Jiashuo Wang et.al. |
2310.05782 |
link |
2023-10-09 |
RateRL: A Framework for Developing RL-Based Rate Adaptation Algorithms in ns-3 |
Ruben Queiros et.al. |
2310.05772 |
null |
2023-10-09 |
Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning |
Trevor McInroe et.al. |
2310.05723 |
null |
2023-10-09 |
DecAP: Decaying Action Priors for Accelerated Learning of Torque-Based Legged Locomotion Policies |
Shivam Sood et.al. |
2310.05714 |
null |
2023-10-09 |
Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments |
Xiong-Hui Chen et.al. |
2310.05712 |
null |
2023-10-09 |
Hierarchical Reinforcement Learning for Temporal Pattern Prediction |
Faith Johnson et.al. |
2310.05695 |
null |
2023-10-09 |
Multi-timestep models for Model-based Reinforcement Learning |
Abdelhakim Benechehab et.al. |
2310.05672 |
null |
2023-10-06 |
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets |
Zhang-Wei Hong et.al. |
2310.04413 |
link |
2023-10-06 |
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models |
Andy Zhou et.al. |
2310.04406 |
link |
2023-10-06 |
Confronting Reward Model Overoptimization with Constrained RLHF |
Ted Moskovitz et.al. |
2310.04373 |
link |
2023-10-06 |
Amortizing intractable inference in large language models |
Edward J. Hu et.al. |
2310.04363 |
link |
2023-10-06 |
Applying Reinforcement Learning to Option Pricing and Hedging |
Zoran Stoiljkovic et.al. |
2310.04336 |
null |
2023-10-06 |
Adjustable Robust Reinforcement Learning for Online 3D Bin Packing |
Yuxin Pan et.al. |
2310.04323 |
null |
2023-10-06 |
Searching for Optimal Runtime Assurance via Reachability and Reinforcement Learning |
Kristina Miller et.al. |
2310.04288 |
null |
2023-10-06 |
DRIFT: Deep Reinforcement Learning for Intelligent Floating Platforms Trajectories |
Matteo El-Hariry et.al. |
2310.04266 |
link |
2023-10-06 |
Comparing Auxiliary Tasks for Learning Representations for Reinforcement Learning |
Moritz Lange et.al. |
2310.04241 |
null |
2023-10-06 |
Lending Interaction Wings to Recommender Systems with Conversational Agents |
Jiarui Jin et.al. |
2310.04230 |
null |
2023-10-05 |
Aligning Text-to-Image Diffusion Models with Reward Backpropagation |
Mihir Prabhudesai et.al. |
2310.03739 |
link |
2023-10-05 |
Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning |
Yihang Yao et.al. |
2310.03718 |
null |
2023-10-05 |
A Long Way to Go: Investigating Length Correlations in RLHF |
Prasann Singhal et.al. |
2310.03716 |
link |
2023-10-05 |
Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization |
Zhanhui Zhou et.al. |
2310.03708 |
link |
2023-10-05 |
Enhancing Exfiltration Path Analysis Using Reinforcement Learning |
Riddam Rishu et.al. |
2310.03667 |
null |
2023-10-05 |
Solving a Class of Non-Convex Minimax Optimization in Federated Learning |
Xidong Wu et.al. |
2310.03613 |
link |
2023-10-05 |
Output Feedback Reinforcement Learning with Parameter Optimisation for Temperature Control in a Material Extrusion Additive Manufacturing system |
Eleni Zavrakli et.al. |
2310.03599 |
link |
2023-10-05 |
Resilient Legged Local Navigation: Learning to Traverse with Compromised Perception End-to-End |
Jin Jin et.al. |
2310.03581 |
null |
2023-10-05 |
Reinforcement learning for traversing chemical structure space: Optimizing transition states and minimum energy paths of molecules |
Rhyan Barrett et.al. |
2310.03511 |
link |
2023-10-05 |
RL-based Stateful Neural Adaptive Sampling and Denoising for Real-Time Path Tracing |
Antoine Scardigli et.al. |
2310.03507 |
link |
2023-10-04 |
Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making |
Jeonghye Kim et.al. |
2310.03022 |
null |
2023-10-04 |
Proximal Policy Optimization-Based Reinforcement Learning Approach for DC-DC Boost Converter Control: A Comparative Evaluation Against Traditional Control Techniques |
Utsab Saha et.al. |
2310.02945 |
null |
2023-10-04 |
Searching for High-Value Molecules Using Reinforcement Learning and Transformers |
Raj Ghugare et.al. |
2310.02902 |
null |
2023-10-04 |
Learning to Scale Logits for Temperature-Conditional GFlowNets |
Minsu Kim et.al. |
2310.02823 |
link |
2023-10-04 |
Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design |
Matthew Thomas Jackson et.al. |
2310.02782 |
link |
2023-10-04 |
Reward Model Ensembles Help Mitigate Overoptimization |
Thomas Coste et.al. |
2310.02743 |
link |
2023-10-04 |
Foundation Reinforcement Learning: towards Embodied Generalist Agents with Foundation Prior Assistance |
Weirui Ye et.al. |
2310.02635 |
null |
2023-10-04 |
RLTrace: Synthesizing High-Quality System Call Traces for OS Fuzz Testing |
Wei Chen et.al. |
2310.02609 |
null |
2023-10-04 |
Multi-Agent Reinforcement Learning for Power Grid Topology Optimization |
Erica van der Sar et.al. |
2310.02605 |
null |
2023-10-04 |
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning |
Weidong Liu et.al. |
2310.02581 |
null |
2023-10-03 |
What do we learn from a large-scale study of pre-trained visual representations in sim and real environments? |
Sneha Silwal et.al. |
2310.02219 |
null |
2023-10-03 |
Towards a Unified Framework for Sequential Decision Making |
Carlos Núñez-Molina et.al. |
2310.02167 |
null |
2023-10-03 |
Navigating Uncertainty in ESG Investing |
Jiayue Zhang et.al. |
2310.02163 |
null |
2023-10-03 |
AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model |
Zibin Dong et.al. |
2310.02054 |
null |
2023-10-03 |
Probabilistic Reach-Avoid for Bayesian Neural Networks |
Matthew Wicker et.al. |
2310.01951 |
link |
2023-10-03 |
Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency |
Francisco Roldan Sanchez et.al. |
2310.01827 |
link |
2023-10-03 |
Mini-BEHAVIOR: A Procedurally Generated Benchmark for Long-horizon Decision-Making in Embodied AI |
Emily Jin et.al. |
2310.01824 |
link |
2023-10-03 |
Differentially Encoded Observation Spaces for Perceptive Reinforcement Learning |
Lev Grossman et.al. |
2310.01767 |
link |
2023-10-04 |
Blending Imitation and Reinforcement Learning for Robust Policy Improvement |
Xuefeng Liu et.al. |
2310.01737 |
null |
2023-10-03 |
On Representation Complexity of Model-based and Model-free Reinforcement Learning |
Hanlin Zhu et.al. |
2310.01706 |
null |