# Offline RL Paper List (continually updated)

A curated list of offline reinforcement learning papers.
## Classic offline RL papers
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
- Off-Policy Deep Reinforcement Learning without Exploration
- Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
- Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
- Behavior Regularized Offline Reinforcement Learning
- AlgaeDICE: Policy Gradient from Arbitrary Experience
- An Optimistic Perspective on Offline Reinforcement Learning
- MOPO: Model-based Offline Policy Optimization
- Conservative Q-Learning for Offline Reinforcement Learning (a minimal sketch of its conservative penalty follows this list)
- Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning
- Accelerating Online Reinforcement Learning with Offline Datasets
- Critic Regularized Regression
- Provably Good Batch Reinforcement Learning Without Great Exploration
- Model-Based Offline Planning
- PLAS: Latent Action Space for Offline Reinforcement Learning
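Several of these classic methods amount to a small modification of a standard Q-learning loss. As one concrete example, the conservative penalty of CQL (in its discrete-action form) is a logsumexp term added to the usual TD error. The sketch below is a minimal illustration under assumed interfaces; `q_net`, `target_net`, and the batch layout are hypothetical, and this is not the authors' reference implementation:

```python
# Minimal sketch of a CQL(H)-style objective for discrete actions.
# `q_net`, `target_net`, and the batch layout are assumed interfaces,
# not the paper's reference code.
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_net, batch, gamma=0.99, alpha=1.0):
    obs, act, rew, next_obs, done = batch  # tensors sampled from the offline dataset

    q_all = q_net(obs)                                     # [B, num_actions]
    q_data = q_all.gather(1, act.unsqueeze(1)).squeeze(1)  # Q(s, a) on dataset actions

    # Standard TD target computed from the offline transitions.
    with torch.no_grad():
        target = rew + gamma * (1.0 - done) * target_net(next_obs).max(dim=1).values
    td_loss = F.mse_loss(q_data, target)

    # Conservative term: push down Q-values over all actions (logsumexp)
    # while pushing up Q-values on the actions actually seen in the data.
    conservative = (torch.logsumexp(q_all, dim=1) - q_data).mean()

    return td_loss + alpha * conservative
```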
## Recent papers on offline RL (2020-2021)
| Paper Title | Sub-area | Venue | Theory/Reproducibility | One-Sentence Summary |
|---|---|---|---|---|
| BRAC+: Going Deeper with Behavior Regularized Offline Reinforcement Learning | Model-Free | ICLR2021(reject) | | Improves behavior-regularized offline reinforcement learning. |
| Offline Policy Optimization with Variance Regularization | Model-Free | ICLR2021(reject) | | Variance regularization based on stationary state-action distribution corrections in offline policy optimization. |
| Uncertainty Weighted Offline Reinforcement Learning | Model-Free | ICLR2021(reject) | | A simple and effective uncertainty-weighted training mechanism for stabilizing offline reinforcement learning. |
| Addressing Extrapolation Error in Deep Offline Reinforcement Learning | Model-Free | ICLR2021(reject) | | We propose methods to address extrapolation error in deep offline reinforcement learning. |
| Q-Value Weighted Regression: Reinforcement Learning with Limited Data | Model-Free | ICLR2021(reject) | | We analyze the sample efficiency of actor-critic RL algorithms and introduce a new algorithm that achieves superior sample efficiency while maintaining competitive final performance on the MuJoCo task suite and on Atari games. |
| Robust Offline Reinforcement Learning from Low-Quality Data | Model-Free | ICLR2021(reject) | | |
| Reducing Conservativeness Oriented Offline Reinforcement Learning | Model-Free | Preprint | | |
| Continuous Doubly Constrained Batch Reinforcement Learning | Model-Free | ICML2021(review) | | |
| Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation | Model-Free | NeurIPS2020(accept) | | |
| POPO: Pessimistic Offline Policy Optimization | Model-Free | NeurIPS2020(workshop) | | |
| PLAS: Latent Action Space for Offline Reinforcement Learning | Model-Free | CoRL2020(accept) | No theory, easy to reproduce (has code) | |
| Critic Regularized Regression | Model-Free | NeurIPS2020(accept) | No theory, easy to reproduce (has code) | |
| COMBO: Conservative Offline Model-Based Policy Optimization | Model-Based | ICML2021(review) | Some theory, easy to reproduce (has code) | |
| Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation | Model-Based | ICLR2021(accept) | | Offline, data-driven optimization using normalized maximum likelihood to produce robust function estimates. |
| Overcoming Model Bias for Robust Offline Deep Reinforcement Learning | Model-Based | AAAI2021(reject) | | |
| MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning | Model-Based | ICML2021(review) | | |
| GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning | Model-Based | ICML2021(review) | | |
| PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators | Model-Based | ICML2021(review) | | |
| Counterfactual Data Augmentation using Locally Factored Dynamics | Data-Aug | NeurIPS2020(accept) | | |
| S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning | Data-Aug | ICML2021(review) | | |
| Representation Balancing Offline Model-based Reinforcement Learning | Data-Aug | Preprint | | |
| Representation Matters: Offline Pretraining for Sequential Decision Making | Data-Aug | Preprint | | |
| AWAC: Accelerating Online Reinforcement Learning with Offline Datasets | Offline2Online | ICLR2021(reject) | No theory, easy to reproduce (has code) | We study RL pretraining from offline datasets and fine-tuning with online interaction, identifying issues with existing methods and proposing a new RL algorithm, AWAC, that is effective in this setting (a minimal sketch of its advantage-weighted update follows the table). |
| Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets | Offline2Online | ICLR2021(reject) | | We present a simple framework, BRED, that incorporates a balanced replay scheme and an ensemble distillation scheme for fine-tuning an offline RL agent more efficiently. |
| Fine-Tuning Offline Reinforcement Learning with Model-Based Policy Optimization | Offline2Online | ICLR2021(reject) | | We present an offline RL approach that leverages both uncertainty-aware models and behavior-regularized model-free RL to achieve state-of-the-art results on the MuJoCo tasks in the D4RL benchmark. |
| OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning | Offline2Online | ICLR2021(accept) | | An effective way to leverage multimodal offline behavioral data is to extract a continuous space of primitives and use it for downstream task learning. |
| Near Real-World Benchmarks for Offline Reinforcement Learning | Benchmark | Preprint | No theory | |
| Offline Reinforcement Learning Hands-On | Benchmark | NeurIPS2020(workshop) | No theory | |
| RL Unplugged: Benchmarks for Offline Reinforcement Learning | Benchmark | NeurIPS2020(accept) | No theory | |
| Offline Adaptive Policy Learning in Real-World Sequential Recommendation Systems | Application | ICLR2021(reject) | | We propose a new paradigm for learning an RL policy from offline data in a real-world sequential recommendation system. |
| Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation | Application | ICLR2021(reject) | | |
| Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification | Constraints | ICLR2021(reject) | | This paper presents an approach that is robust with respect to constraint satisfaction in the presence of perturbations to the system dynamics. |
| Offline Learning from Demonstrations and Unlabeled Experience | Imitation | NeurIPS2020(workshop) | | |
| Risk-Averse Offline Reinforcement Learning | Robustness | ICLR2021(accept) | | We propose the first risk-averse reinforcement learning algorithm in the fully offline setting. |
| Batch Reinforcement Learning with Hyperparameter Gradients | Others | ICML2020(accept) | | |
| Overfitting and Optimization in Offline Policy Learning | Others | Preprint | | |
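Several of the Offline2Online entries above, AWAC in particular, build on the same advantage-weighted policy update: a maximum-likelihood regression onto dataset actions, weighted by exponentiated advantages so that better-than-average actions are cloned more strongly. Below is a minimal sketch of that shared idea, assuming a policy object with a `log_prob` interface and precomputed advantages (both hypothetical names), not any paper's reference implementation:

```python
# Sketch of the advantage-weighted policy loss used, in variants, by
# AWR/AWAC-style methods. `policy.log_prob` and the inputs are assumed
# interfaces, not reference code from any of the papers above.
import torch

def advantage_weighted_loss(policy, obs, act, advantages, lam=1.0, max_weight=20.0):
    # Exponentiated, temperature-scaled advantages; clamped for numerical stability.
    weights = torch.clamp(torch.exp(advantages / lam), max=max_weight)
    log_probs = policy.log_prob(obs, act)          # log pi(a | s) on dataset actions
    return -(weights.detach() * log_probs).mean()  # advantage-weighted behavioral cloning
```

The temperature `lam` interpolates between plain behavioral cloning (large `lam`) and a greedier, more off-policy update (small `lam`).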
