# Offline RL Paper List (continuously updated)


## Classic offline RL papers

## Recent papers on offline RL (2020-2021)

| Paper Title | Sub-area | Venue | Theory/Reproducibility | One-sentence Summary |
| --- | --- | --- | --- | --- |
| BRAC+: Going Deeper with Behavior Regularized Offline Reinforcement Learning | Model-Free | ICLR 2021 (reject) | | Improves behavior-regularized offline reinforcement learning. |
| Offline Policy Optimization with Variance Regularization | Model-Free | ICLR 2021 (reject) | | Variance regularization based on stationary state-action distribution corrections in offline policy optimization. |
| Uncertainty Weighted Offline Reinforcement Learning | Model-Free | ICLR 2021 (reject) | | A simple and effective uncertainty-weighted training mechanism for stabilizing offline RL. |
| Addressing Extrapolation Error in Deep Offline Reinforcement Learning | Model-Free | ICLR 2021 (reject) | | Proposes methods for addressing extrapolation error in deep offline RL. |
| Q-Value Weighted Regression: Reinforcement Learning with Limited Data | Model-Free | ICLR 2021 (reject) | | Analyzes the sample efficiency of actor-critic algorithms and introduces a new algorithm with superior sample efficiency and competitive final performance on the MuJoCo task suite and on Atari games. |
| Robust Offline Reinforcement Learning from Low-Quality Data | Model-Free | ICLR 2021 (reject) | | |
| Reducing Conservativeness Oriented Offline Reinforcement Learning | Model-Free | Preprint | | |
| Continuous Doubly Constrained Batch Reinforcement Learning | Model-Free | ICML 2021 (under review) | | |
| Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation | Model-Free | NeurIPS 2020 (accept) | | |
| POPO: Pessimistic Offline Policy Optimization | Model-Free | NeurIPS 2020 (workshop) | | |
| PLAS: Latent Action Space for Offline Reinforcement Learning | Model-Free | CoRL 2020 (accept) | No theory; easy to reproduce (code available) | |
| Critic Regularized Regression | Model-Free | NeurIPS 2020 (accept) | No theory; easy to reproduce (code available) | |
| COMBO: Conservative Offline Model-Based Policy Optimization | Model-Based | ICML 2021 (under review) | Some theory; easy to reproduce (code available) | |
| Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation | Model-Based | ICLR 2021 (accept) | | Offline, data-driven optimization using normalized maximum likelihood to produce robust function estimates. |
| Overcoming Model Bias for Robust Offline Deep Reinforcement Learning | Model-Based | AAAI 2021 (reject) | | |
| MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning | Model-Based | ICML 2021 (under review) | | |
| GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning | Model-Based | ICML 2021 (under review) | | |
| PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators | Model-Based | ICML 2021 (under review) | | |
| Counterfactual Data Augmentation using Locally Factored Dynamics | Data-Aug | NeurIPS 2020 (accept) | | |
| S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning | Data-Aug | ICML 2021 (under review) | | |
| Representation Balancing Offline Model-based Reinforcement Learning | Data-Aug | Preprint | | |
| Representation Matters: Offline Pretraining for Sequential Decision Making | Data-Aug | Preprint | | |
| AWAC: Accelerating Online Reinforcement Learning with Offline Datasets | Offline2Online | ICLR 2021 (reject) | No theory; easy to reproduce (code available) | Studies RL pretraining from offline datasets with online fine-tuning, identifies issues with existing methods, and proposes a new algorithm, AWAC, that is effective in this setting. |
| Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets | Offline2Online | ICLR 2021 (reject) | | A simple framework, BRED, combining a balanced replay scheme and an ensemble distillation scheme for more efficient fine-tuning of an offline RL agent. |
| Fine-Tuning Offline Reinforcement Learning with Model-Based Policy Optimization | Offline2Online | ICLR 2021 (reject) | | An offline RL approach leveraging both uncertainty-aware models and behavior-regularized model-free RL to achieve state-of-the-art results on the MuJoCo tasks in the D4RL benchmark. |
| OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning | Offline2Online | ICLR 2021 (accept) | | Leverages multimodal offline behavioral data by extracting a continuous space of primitives and using it for downstream task learning. |
| Near Real-World Benchmarks for Offline Reinforcement Learning | Benchmark | Preprint | No theory | |
| Offline Reinforcement Learning Hands-On | Benchmark | NeurIPS 2020 (workshop) | No theory | |
| RL Unplugged: Benchmarks for Offline Reinforcement Learning | Benchmark | NeurIPS 2020 (accept) | No theory | |
| Offline Adaptive Policy Learning in Real-World Sequential Recommendation Systems | Application | ICLR 2021 (reject) | | A new paradigm for learning an RL policy from offline data in a real-world sequential recommendation system. |
| Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation | Application | ICLR 2021 (reject) | | |
| Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification | Constraints | ICLR 2021 (reject) | | An approach that is robust with respect to constraint satisfaction in the presence of perturbations to the system dynamics. |
| Offline Learning from Demonstrations and Unlabeled Experience | Imitation | NeurIPS 2020 (workshop) | | |
| Risk-Averse Offline Reinforcement Learning | Robustness | ICLR 2021 (accept) | | The first risk-averse reinforcement learning algorithm in the fully offline setting. |
| Batch Reinforcement Learning with Hyperparameter Gradients | Others | ICML 2020 (accept) | | |
| Overfitting and Optimization in Offline Policy Learning | Others | Preprint | | |
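Many of the model-free entries above (BRAC+, POPO, the conservative/pessimistic line of work) share one idea: penalize Q-values on state-action pairs that the offline dataset does not support, so the learned policy stays close to the behavior distribution. The following is a minimal tabular sketch of that idea only; the penalty form (a flat subtraction on unseen pairs) is illustrative and does not reproduce any specific paper's method.

```python
import numpy as np

def conservative_q_update(Q, dataset, behavior_counts, alpha=1.0, gamma=0.99, lr=0.1):
    """One sweep of tabular Q-learning with a simple conservative penalty.

    Q:               (n_states, n_actions) array of Q-values.
    dataset:         list of (s, a, r, s_next) offline transitions.
    behavior_counts: (n_states, n_actions) visit counts under the behavior
                     policy; pairs never seen in the data get penalized.
    """
    Q = Q.copy()
    # Standard TD(0) updates on the logged transitions.
    for s, a, r, s_next in dataset:
        target = r + gamma * Q[s_next].max()
        Q[s, a] += lr * (target - Q[s, a])
    # Conservative penalty: push down Q on out-of-distribution pairs so the
    # greedy policy avoids actions the dataset gives no evidence for.
    Q[behavior_counts == 0] -= alpha
    return Q
```

With `alpha = 0` this reduces to plain offline Q-learning, which is exactly the setting where extrapolation error (see the entries above) appears; the penalty trades off that error against conservativeness.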
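On the Offline2Online side, AWAC's summary mentions pretraining from offline data and fine-tuning online; its policy update fits dataset actions weighted by exponentiated advantages. A minimal sketch of that weighting (the temperature `lam` and the clipping are assumptions for numerical stability, not values from the paper):

```python
import numpy as np

def awac_weights(advantages, lam=1.0):
    """Advantage-weighted regression weights, AWAC-style.

    The policy is regressed onto dataset actions with weight
    exp(A(s, a) / lam), so high-advantage actions dominate the fit
    while the policy stays supported on the data.
    """
    # Clip before exponentiating to avoid overflow on large advantages.
    w = np.exp(np.clip(np.asarray(advantages, dtype=float) / lam, -20.0, 20.0))
    return w / w.sum()  # normalized over the batch
```

Because the weights are always positive, the update never imitates actions outside the dataset, which is what makes this style of update usable both fully offline and during online fine-tuning.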