Offline RL Paper List (continued updating)
Offline RL paper list.
Classic offline RL papers
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
- Off-Policy Deep Reinforcement Learning without Exploration
- Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
- Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
- Behavior Regularized Offline Reinforcement Learning
- AlgaeDICE: Policy Gradient from Arbitrary Experience
- An Optimistic Perspective on Offline Reinforcement Learning
- MOPO: Model-based Offline Policy Optimization
- Conservative Q-Learning for Offline Reinforcement Learning
- Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning
- Accelerating Online Reinforcement Learning with Offline Datasets
- Critic Regularized Regression
- Provably Good Batch Reinforcement Learning Without Great Exploration
- Model-Based Offline Planning
- PLAS: Latent Action Space for Offline Reinforcement Learning
Recent papers on offline RL (2020-2021)
Paper Title | Sub-area | Venue | Theory/Reproducibility | One sentence Summary |
---|---|---|---|---|
BRAC+: Going Deeper with Behavior Regularized Offline Reinforcement Learning | Model-Free | ICLR2021(reject) | Improving behavior regularized offline reinforcement learning. | |
Offline Policy Optimization with Variance Regularization | Model-Free | ICLR2021(reject) | Variance regularization based on stationary state-action distribution corrections in offline policy optimization. | |
Uncertainty Weighted Offline Reinforcement Learning | Model-Free | ICLR2021(reject) | A simple and effective uncertainty weighted training mechanism for stabilizing offline reinforcement learning. | |
Addressing Extrapolation Error in Deep Offline Reinforcement Learning | Model-Free | ICLR2021(reject) | We are proposing methods to address extrapolation error in deep offline reinforcement learning. | |
Q-Value Weighted Regression: Reinforcement Learning with Limited Data | Model-Free | ICLR2021(reject) | We analyze the sample-efficiency of actor-critic RL algorithms, and introduce a new algorithm, achieving superior sample-efficiency while maintaining competitive final performance on the MuJoCo task suite and on Atari games. | |
Robust Offline Reinforcement Learning from Low-Quality Data | Model-Free | ICLR2021(reject) | ||
Reducing Conservativeness Oriented Offline Reinforcement Learning | Model-Free | Preprint | ||
Continuous Doubly Constrained Batch Reinforcement Learning | Model-Free | ICML2021(review) | ||
Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation | Model-Free | NIPS2020(accept) | ||
POPO: Pessimistic Offline Policy Optimization | Model-Free | NIPS2020(workshop) | ||
PLAS: Latent Action Space for Offline Reinforcement Learning | Model-Free | CORL2020(accept) | No theory, easy to reproduce(has code) | |
Critic Regularized Regression | Model-Free | NIPS2020(accept) | No theory, easy to reproduce(has code) | |
COMBO: Conservative Offline Model-Based Policy Optimization | Model-Based | ICML2021(review) | Some theory, easy to reproduce(has code) | |
Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation | Model-Based | ICLR2021(accept) | Offline, data-driven optimization using normalized maximum likelihood to produce robust function estimates. | |
Overcoming Model Bias for Robust Offline Deep Reinforcement Learning | Model-Based | AAAI21(reject) | ||
MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning | Model-Based | ICML2021(review) | ||
GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning | Model-Based | ICML2021(review) | ||
PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators | Model-Based | ICML2021(review) | ||
Counterfactual Data Augmentation using Locally Factored Dynamics | Data-Aug | NIPS2020(accept) | ||
S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning | Data-Aug | ICML2021(review) | ||
Representation Balancing Offline Model-based Reinforcement Learning | Data-Aug | Preprint | ||
Representation Matters: Offline Pretraining for Sequential Decision Making | Data-Aug | Preprint | ||
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets | Offline2Online | ICLR2021(reject) | No theory, easy to reproduce(has code) | We study RL pretraining from offline datasets and fine-tuning with online interaction, identifying issues with existing methods and proposing a new RL algorithm, AWAC, that is effective in this setting. |
Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets | Offline2Online | ICLR2021(reject) | We present a simple framework, BRED, that incorporates a balanced replay scheme and an ensemble distillation scheme, for fine-tuning an offline RL agent more efficiently. | |
Fine-Tuning Offline Reinforcement Learning with Model-Based Policy Optimization | Offline2Online | ICLR2021(reject) | We present an offline RL approach that leverages both uncertainty-aware models and behavior-regularized model-free RL to achieve state of the art results on the MuJoCo tasks in the D4RL benchmark. | |
OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning | Offline2Online | ICLR2021(accept) | An effective way to leverage multimodal offline behavioral data is to extract a continuous space of primitives, and use it for downstream task learning. | |
Near Real-World Benchmarks for Offline Reinforcement Learning | Benchmark | Preprint | No theory | |
Offline Reinforcement Learning Hands-On | Benchmark | NIPS2020(workshop) | No theory | |
RL Unplugged: Benchmarks for Offline Reinforcement Learning | Benchmark | NIPS2020(accept) | No theory | |
Offline Adaptive Policy Leaning in Real-World Sequential Recommendation Systems | Application | ICLR2021(reject) | We propose a new paradigm to learn an RL policy from offline data in the real-world sequential recommendation system. | |
Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation | Application | ICLR2021(reject) | ||
Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification | Constraints | ICLR2021(reject) | This paper presents an approach that is robust with respect to constraint satisfaction in the presence of perturbations to the system dynamics. | |
Offline Learning from Demonstrations and Unlabeled Experience | Imitation | NIPS2020(workshop) | ||
Risk-Averse Offline Reinforcement Learning | Robustness | ICLR2021(accept) | We propose the first risk-averse reinforcement learning algorithm in the fully offline setting. | |
Batch Reinforcement Learning with Hyperparameter Gradients | Others | ICML2020(accept) | ||
Overfitting and Optimization in Offline Policy Learning | Others | Preprint |