Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization

Published in International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2023), 2023

Recommended citation: Wang, X., and Zhan, X. Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2023) (Extended Abstract).

Abstract

Offline reinforcement learning (RL), which learns policies from offline datasets without environment interaction, has received considerable attention in recent years. Compared with the rich literature on the single-agent case, offline multi-agent RL remains a relatively underexplored area. Most existing methods directly apply offline RL ingredients in the multi-agent setting without fully leveraging the decomposable problem structure, leading to unsatisfactory performance on complex tasks. We present OMAC, a new Offline Multi-Agent RL algorithm with Coupled value factorization. OMAC adopts a coupled value factorization scheme that decomposes the global value function into local and shared components, and also maintains credit-assignment consistency between the state-value and action-value functions. Moreover, OMAC performs in-sample learning on the decomposed local state-value functions, which implicitly conducts the max-Q operation at the local level while avoiding the distributional shift caused by evaluating out-of-distribution actions. Through comprehensive evaluations on offline multi-agent StarCraft II micro-management tasks, we demonstrate that OMAC outperforms state-of-the-art offline multi-agent RL methods.
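
To make the two ideas in the abstract concrete, below is a minimal, illustrative PyTorch sketch, not the authors' released code or exact formulation: (1) a coupled factorization in which one set of state-dependent credit-assignment weights mixes both the local action-values and the local state-values, and (2) in-sample learning of the local state-values from dataset actions only (here via an expectile-style loss, which is an assumption for illustration). All module names, network shapes, and dimensions are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes (assumptions, not from the paper).
N_AGENTS, STATE_DIM, OBS_DIM, N_ACTIONS = 3, 16, 8, 5


class CoupledMixer(nn.Module):
    """Shares one set of non-negative weights w_i(s) between the Q and V mixtures,
    so credit assignment stays consistent between the two factorizations."""

    def __init__(self):
        super().__init__()
        self.weight_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                        nn.Linear(64, N_AGENTS))
        self.bias_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                      nn.Linear(64, 1))

    def forward(self, state, local_values):
        # local_values: (batch, n_agents), either Q_i(o_i, a_i) or V_i(o_i).
        w = F.softplus(self.weight_net(state))   # non-negative credit weights
        b = self.bias_net(state)                 # shared global component
        return (w * local_values).sum(dim=-1, keepdim=True) + b


local_q = nn.ModuleList(nn.Linear(OBS_DIM, N_ACTIONS) for _ in range(N_AGENTS))
local_v = nn.ModuleList(nn.Linear(OBS_DIM, 1) for _ in range(N_AGENTS))
mixer = CoupledMixer()


def expectile_loss(diff, tau=0.7):
    # Asymmetric squared loss: with tau > 0.5 the local V_i approaches an in-sample
    # maximum of Q_i over dataset actions, avoiding out-of-distribution queries.
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()


# One illustrative update on a fake batch of offline transitions.
batch = 32
state = torch.randn(batch, STATE_DIM)
obs = torch.randn(batch, N_AGENTS, OBS_DIM)
actions = torch.randint(0, N_ACTIONS, (batch, N_AGENTS))

q_i = torch.stack([local_q[i](obs[:, i]).gather(1, actions[:, i:i + 1]).squeeze(-1)
                   for i in range(N_AGENTS)], dim=-1)   # (batch, n_agents)
v_i = torch.stack([local_v[i](obs[:, i]).squeeze(-1)
                   for i in range(N_AGENTS)], dim=-1)    # (batch, n_agents)

# Local in-sample value learning: regress V_i toward Q_i evaluated only on dataset actions.
v_loss = expectile_loss(q_i.detach() - v_i)

# The same mixer couples both factorizations into global values.
q_tot, v_tot = mixer(state, q_i), mixer(state, v_i)
print(v_loss.item(), q_tot.shape, v_tot.shape)
```

The key design choice the sketch highlights is that the mixing network is queried twice with the same weights, once on the local Q-values and once on the local state-values, rather than learning two independent mixers; the local expectile regression then stands in for a max over actions without ever evaluating actions outside the dataset.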

Other information