Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL

Published in the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023), 2023

Recommended citation: Cheng, P.*, Zhan, X.*, Wu, Z., Zhang, W., Song, S., Wang, H., Lin, Y., & Jiang, L. Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL. In the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023).

Abstract

Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets. Real-world data collection is often expensive and uncontrollable, leading to small and narrowly covered datasets and posing significant challenges for practical deployments of offline RL. In this paper, we provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets. Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for OOD samples based on compliance with the T-symmetry. These can be readily used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent space data augmentation procedure. Based on extensive experiments, we find TSRL achieves great performance on small benchmark datasets with as few as 1% of the original samples, which significantly outperforms the recent offline RL algorithms in terms of data efficiency and generalizability.
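The idea of checking a transition for compliance with T-symmetry can be illustrated with a small sketch. This is not the paper's implementation: the function names, the toy linear system, and the exact form of the residual (squared norm of the forward prediction plus the reverse prediction) are assumptions made for illustration, and the hand-written `toy_forward` and `toy_reverse` functions stand in for what would be learned latent dynamics models. The sketch assumes a forward model that predicts the latent time derivative from the current state and action, and a reverse model that predicts the negated derivative from the next state and action, so that the two predictions cancel on transitions that follow the dynamics.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Dynamics = Callable[[Vector, Vector], List[float]]


def t_symmetry_residual(forward: Dynamics, reverse: Dynamics,
                        z: Vector, a: Vector, z_next: Vector) -> float:
    """Squared norm of forward(z, a) + reverse(z_next, a).

    Assumed convention: forward predicts dz/dt, reverse predicts -dz/dt,
    so the sum is zero when the pair of models agrees on the transition.
    """
    dz_forward = forward(z, a)
    dz_reverse = reverse(z_next, a)
    return sum((f + r) ** 2 for f, r in zip(dz_forward, dz_reverse))


# Toy stand-ins (hypothetical) for learned models of the system
# z_next = z + dz with dz = a - 0.5 * z and a unit time step.
def toy_forward(z: Vector, a: Vector) -> List[float]:
    return [ai - 0.5 * zi for zi, ai in zip(z, a)]


def toy_reverse(z_next: Vector, a: Vector) -> List[float]:
    # Solving z_next = 0.5 * z + a for z gives dz = 2 * a - z_next;
    # the reverse model returns its negation.
    return [zn - 2.0 * ai for zn, ai in zip(z_next, a)]


# A transition that follows the toy dynamics: 0.5 * 1.0 + 0.5 = 1.0
consistent = t_symmetry_residual(toy_forward, toy_reverse, [1.0], [0.5], [1.0])

# A transition whose next state does not follow the toy dynamics
inconsistent = t_symmetry_residual(toy_forward, toy_reverse, [1.0], [0.5], [2.0])

print(consistent, inconsistent)
```

In this toy setting the residual is zero for the transition that obeys the dynamics and positive for the one that does not, which is the sense in which such a score could serve as a reliability signal for out-of-distribution samples.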

Other information

arXiv version of the paper
This paper was also accepted at two ICML 2023 workshops: the Workshop on Spurious Correlations, Invariance and Stability, and the Workshop on New Frontiers in Learning, Control, and Dynamical Systems (Frontiers4LCD).


詹仙园|Xianyuan Zhan
