# Offline RL Paper List (continually updated)

A curated list of offline reinforcement learning papers.
## Classic offline RL papers
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
- Off-Policy Deep Reinforcement Learning without Exploration
- Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
- Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
- Behavior Regularized Offline Reinforcement Learning
- AlgaeDICE: Policy Gradient from Arbitrary Experience
- An Optimistic Perspective on Offline Reinforcement Learning
- MOPO: Model-based Offline Policy Optimization
- Conservative Q-Learning for Offline Reinforcement Learning (a minimal sketch of its conservative penalty follows this list)
- Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning
- Accelerating Online Reinforcement Learning with Offline Datasets
- Critic Regularized Regression
- Provably Good Batch Reinforcement Learning Without Great Exploration
- Model-Based Offline Planning
- PLAS: Latent Action Space for Offline Reinforcement Learning
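Several of these classic methods amount to a small modification of a standard Q-learning loss. As one concrete example, the conservative penalty of CQL (in its discrete-action form) is a logsumexp term added to the usual TD error. The sketch below is a minimal illustration under assumed interfaces; `q_net`, `target_net`, and the batch layout are hypothetical, and this is not the authors' reference implementation:

```python
# Minimal sketch of a CQL(H)-style objective for discrete actions.
# `q_net`, `target_net`, and the batch layout are assumed interfaces,
# not the paper's reference code.
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_net, batch, gamma=0.99, alpha=1.0):
    obs, act, rew, next_obs, done = batch  # tensors sampled from the offline dataset

    q_all = q_net(obs)                                     # [B, num_actions]
    q_data = q_all.gather(1, act.unsqueeze(1)).squeeze(1)  # Q(s, a) on dataset actions

    # Standard TD target computed from the offline transitions.
    with torch.no_grad():
        target = rew + gamma * (1.0 - done) * target_net(next_obs).max(dim=1).values
    td_loss = F.mse_loss(q_data, target)

    # Conservative term: push down Q-values over all actions (logsumexp)
    # while pushing up Q-values on the actions actually seen in the data.
    conservative = (torch.logsumexp(q_all, dim=1) - q_data).mean()

    return td_loss + alpha * conservative
```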
## Recent papers on offline RL (2020-2021)
| Paper Title | Sub-area | Venue | Theory/Reproducibility | One-Sentence Summary |
|---|---|---|---|---|
| BRAC+: Going Deeper with Behavior Regularized Offline Reinforcement Learning | Model-Free | ICLR2021(reject) | | Improves behavior-regularized offline reinforcement learning. |
| Offline Policy Optimization with Variance Regularization | Model-Free | ICLR2021(reject) | | Variance regularization based on stationary state-action distribution corrections in offline policy optimization. |
| Uncertainty Weighted Offline Reinforcement Learning | Model-Free | ICLR2021(reject) | | A simple and effective uncertainty-weighted training mechanism for stabilizing offline reinforcement learning. |
| Addressing Extrapolation Error in Deep Offline Reinforcement Learning | Model-Free | ICLR2021(reject) | | We propose methods to address extrapolation error in deep offline reinforcement learning. |
| Q-Value Weighted Regression: Reinforcement Learning with Limited Data | Model-Free | ICLR2021(reject) | | We analyze the sample efficiency of actor-critic RL algorithms and introduce a new algorithm that achieves superior sample efficiency while maintaining competitive final performance on the MuJoCo task suite and on Atari games. |
| Robust Offline Reinforcement Learning from Low-Quality Data | Model-Free | ICLR2021(reject) | | |
| Reducing Conservativeness Oriented Offline Reinforcement Learning | Model-Free | Preprint | | |
| Continuous Doubly Constrained Batch Reinforcement Learning | Model-Free | ICML2021(review) | | |
| Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation | Model-Free | NeurIPS2020(accept) | | |
| POPO: Pessimistic Offline Policy Optimization | Model-Free | NeurIPS2020(workshop) | | |
| PLAS: Latent Action Space for Offline Reinforcement Learning | Model-Free | CoRL2020(accept) | No theory, easy to reproduce (has code) | |
| Critic Regularized Regression | Model-Free | NeurIPS2020(accept) | No theory, easy to reproduce (has code) | |
| COMBO: Conservative Offline Model-Based Policy Optimization | Model-Based | ICML2021(review) | Some theory, easy to reproduce (has code) | |
| Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation | Model-Based | ICLR2021(accept) | | Offline, data-driven optimization using normalized maximum likelihood to produce robust function estimates. |
| Overcoming Model Bias for Robust Offline Deep Reinforcement Learning | Model-Based | AAAI2021(reject) | | |
| MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning | Model-Based | ICML2021(review) | | |
| GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning | Model-Based | ICML2021(review) | | |
| PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators | Model-Based | ICML2021(review) | | |
| Counterfactual Data Augmentation using Locally Factored Dynamics | Data-Aug | NeurIPS2020(accept) | | |
| S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning | Data-Aug | ICML2021(review) | | |
| Representation Balancing Offline Model-based Reinforcement Learning | Data-Aug | Preprint | | |
| Representation Matters: Offline Pretraining for Sequential Decision Making | Data-Aug | Preprint | | |
| AWAC: Accelerating Online Reinforcement Learning with Offline Datasets | Offline2Online | ICLR2021(reject) | No theory, easy to reproduce (has code) | We study RL pretraining from offline datasets and fine-tuning with online interaction, identifying issues with existing methods and proposing a new RL algorithm, AWAC, that is effective in this setting (a minimal sketch of its advantage-weighted update follows the table). |
| Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets | Offline2Online | ICLR2021(reject) | | We present a simple framework, BRED, that incorporates a balanced replay scheme and an ensemble distillation scheme for fine-tuning an offline RL agent more efficiently. |
| Fine-Tuning Offline Reinforcement Learning with Model-Based Policy Optimization | Offline2Online | ICLR2021(reject) | | We present an offline RL approach that leverages both uncertainty-aware models and behavior-regularized model-free RL to achieve state-of-the-art results on the MuJoCo tasks in the D4RL benchmark. |
| OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning | Offline2Online | ICLR2021(accept) | | An effective way to leverage multimodal offline behavioral data is to extract a continuous space of primitives and use it for downstream task learning. |
| Near Real-World Benchmarks for Offline Reinforcement Learning | Benchmark | Preprint | No theory | |
| Offline Reinforcement Learning Hands-On | Benchmark | NeurIPS2020(workshop) | No theory | |
| RL Unplugged: Benchmarks for Offline Reinforcement Learning | Benchmark | NeurIPS2020(accept) | No theory | |
| Offline Adaptive Policy Learning in Real-World Sequential Recommendation Systems | Application | ICLR2021(reject) | | We propose a new paradigm for learning an RL policy from offline data in a real-world sequential recommendation system. |
| Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation | Application | ICLR2021(reject) | | |
| Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification | Constraints | ICLR2021(reject) | | This paper presents an approach that is robust with respect to constraint satisfaction in the presence of perturbations to the system dynamics. |
| Offline Learning from Demonstrations and Unlabeled Experience | Imitation | NeurIPS2020(workshop) | | |
| Risk-Averse Offline Reinforcement Learning | Robustness | ICLR2021(accept) | | We propose the first risk-averse reinforcement learning algorithm in the fully offline setting. |
| Batch Reinforcement Learning with Hyperparameter Gradients | Others | ICML2020(accept) | | |
| Overfitting and Optimization in Offline Policy Learning | Others | Preprint | | |
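Several of the Offline2Online entries above, AWAC in particular, build on the same advantage-weighted policy update: a maximum-likelihood regression onto dataset actions, weighted by exponentiated advantages so that better-than-average actions are cloned more strongly. Below is a minimal sketch of that shared idea, assuming a policy object with a `log_prob` interface and precomputed advantages (both hypothetical names), not any paper's reference implementation:

```python
# Sketch of the advantage-weighted policy loss used, in variants, by
# AWR/AWAC-style methods. `policy.log_prob` and the inputs are assumed
# interfaces, not reference code from any of the papers above.
import torch

def advantage_weighted_loss(policy, obs, act, advantages, lam=1.0, max_weight=20.0):
    # Exponentiated, temperature-scaled advantages; clamped for numerical stability.
    weights = torch.clamp(torch.exp(advantages / lam), max=max_weight)
    log_probs = policy.log_prob(obs, act)          # log pi(a | s) on dataset actions
    return -(weights.detach() * log_probs).mean()  # advantage-weighted behavioral cloning
```

The temperature `lam` interpolates between plain behavioral cloning (large `lam`) and a greedier, more off-policy update (small `lam`).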
