Model-Based Offline Planning with Trajectory Pruning
Published in the 31st International Joint Conference on Artificial Intelligence (IJCAI-22), 2022
Recommended citation: Zhan, X., Zhu, X. and Xu, H. Model-Based Offline Planning with Trajectory Pruning. In the 31st International Joint Conference on Artificial Intelligence (IJCAI-22), 3695-3701.
Abstract
Offline reinforcement learning (RL) enables learning policies from pre-collected datasets without environment interaction, which provides a promising direction for making RL usable in real-world systems. Although recent offline RL studies have made considerable progress, existing methods still face many practical challenges in real-world control tasks, such as computational restrictions during agent training and the requirement for extra control flexibility. The model-based planning framework provides an attractive solution for such tasks. However, most model-based planning algorithms are not designed for offline settings. Simply combining the ingredients of offline RL with existing methods either yields over-restrictive planning or leads to inferior performance. We propose a new lightweight model-based offline planning framework, namely MOPP, which tackles the dilemma between the restrictions of offline learning and high-performance planning. MOPP encourages more aggressive trajectory rollouts guided by the behavior policy learned from data, and prunes out problematic trajectories to avoid potential out-of-distribution samples. Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches.
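To illustrate the high-level idea described in the abstract (behavior-policy-guided rollouts in a learned dynamics model, followed by pruning of likely out-of-distribution trajectories), here is a minimal sketch. It is not the authors' implementation; the interfaces `dynamics_model.step`, `behavior_policy.sample`, and `uncertainty_fn`, as well as all parameter values, are hypothetical placeholders.

```python
import numpy as np

def plan_with_pruning(state, dynamics_model, behavior_policy, uncertainty_fn,
                      num_trajectories=64, horizon=10, prune_quantile=0.7):
    """Sketch of guided rollout + trajectory pruning for offline planning.

    Samples action sequences from a learned behavior policy, rolls them out in
    a learned dynamics model, discards trajectories with high model
    uncertainty (a proxy for out-of-distribution samples), and returns the
    first action of the highest-return surviving trajectory.
    """
    candidates = []
    for _ in range(num_trajectories):
        s, actions, total_reward, max_uncertainty = state, [], 0.0, 0.0
        for _ in range(horizon):
            a = behavior_policy.sample(s)          # guided, in-distribution action proposal
            s_next, r = dynamics_model.step(s, a)  # learned model prediction of next state, reward
            max_uncertainty = max(max_uncertainty, uncertainty_fn(s, a))
            total_reward += r
            actions.append(a)
            s = s_next
        candidates.append((actions, total_reward, max_uncertainty))

    # Prune trajectories whose worst-case uncertainty exceeds a quantile threshold.
    threshold = np.quantile([c[2] for c in candidates], prune_quantile)
    kept = [c for c in candidates if c[2] <= threshold] or candidates

    best_actions, _, _ = max(kept, key=lambda c: c[1])
    return best_actions[0]
```

For the actual algorithm and design choices, please refer to the paper and the released code linked below.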
Other information
- An earlier version of this paper was accepted at the NeurIPS 2021 Offline RL Workshop.
- arXiv version of the paper
- Code