LIBERO-Safety: A Comprehensive Benchmark for Physical and Semantic Safety in Vision-Language-Action Models

Published in the 19th European Conference on Computer Vision (ECCV 2026), 2026

Recommended citation: Cui, R., Zhang, Z., Pang, J., Chi, H., Guo, J., Zhang, S., Xie, S., Jin, X., Mu, Y., Yang, J., Yao, G., Zhan, X., Zhang, Y., Zhao, H. LIBERO-Safety: A Comprehensive Benchmark for Physical and Semantic Safety in Vision-Language-Action Models. In the 19th European Conference on Computer Vision (ECCV 2026).

Abstract

Despite the impressive manipulation capabilities of Vision-Language-Action (VLA) models, their operational safety under strict constraints remains largely unverified. To address this, we introduce a parametric safety benchmark that procedurally generates safety-critical scenarios with comprehensive stochasticity. To overcome the scalability bottlenecks of human teleoperation, we develop a novel keypose-driven data generation pipeline. Leveraging this infrastructure, we curate a large-scale dataset of 19,664 strictly collision-free demonstrations with extensive domain randomization. We then conduct a systematic cross-paradigm evaluation of eight VLA models and two embodied foundation models. Our analysis reveals a critical generalization-safety tension: although high-diversity training fosters safer trajectories, task success remains fundamentally bottlenecked by sub-optimal trajectory synthesis and semantic misalignment. By providing a scalable pipeline, a robust dataset, and in-depth failure-mode insights, this work establishes a crucial foundation for developing safe and reliable VLA models. The entire dataset and code will be released.
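The abstract names a keypose-driven generation pipeline that yields strictly collision-free, domain-randomized demonstrations, but it does not describe how that pipeline works. As a rough illustration only, the sketch below shows one generic way such a loop could be structured. It samples randomized obstacles and jitters a few task keyposes, densifies them into a trajectory by linear interpolation, and keeps only trajectories that clear every obstacle. Every name here (`Sphere`, `densify`, `is_collision_free`, `generate_demos`) and every parameter value is hypothetical and not taken from LIBERO-Safety. A real pipeline would more likely use a motion planner and simulator collision checks than straight lines tested against spheres.

```python
import math
import random
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass(frozen=True)
class Sphere:
    """Hypothetical obstacle model: a sphere the end-effector must avoid."""
    center: Point
    radius: float


def densify(keyposes: list[Point], steps: int) -> list[Point]:
    """Linearly interpolate `steps` segments between consecutive keyposes."""
    path: list[Point] = [keyposes[0]]
    for a, b in zip(keyposes, keyposes[1:]):
        for i in range(1, steps + 1):
            t = i / steps
            path.append(tuple(pa + t * (pb - pa) for pa, pb in zip(a, b)))
    return path


def is_collision_free(path: list[Point], obstacles: list[Sphere],
                      margin: float = 0.02) -> bool:
    """True only if every waypoint stays outside every inflated obstacle."""
    return all(math.dist(p, o.center) > o.radius + margin
               for p in path for o in obstacles)


def randomize_obstacles(rng: random.Random, n: int) -> list[Sphere]:
    """Domain randomization (assumed): uniform obstacle positions and sizes."""
    return [Sphere((rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5),
                    rng.uniform(0.0, 0.5)), rng.uniform(0.03, 0.08))
            for _ in range(n)]


def generate_demos(keyposes: list[Point], n_scenes: int, steps: int = 20,
                   seed: int = 0) -> list[tuple[list[Sphere], list[Point]]]:
    """Sample scenes, jitter keyposes, and keep only collision-free paths."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n_scenes):
        obstacles = randomize_obstacles(rng, n=3)
        # Small Gaussian jitter on each keypose adds trajectory diversity.
        jittered = [tuple(c + rng.gauss(0.0, 0.01) for c in kp)
                    for kp in keyposes]
        path = densify(jittered, steps)
        if is_collision_free(path, obstacles):
            demos.append((obstacles, path))
    return demos


if __name__ == "__main__":
    # Hypothetical pick-and-place keyposes: pre-grasp, grasp, lift, place.
    keyposes = [(0.3, -0.2, 0.3), (0.3, -0.2, 0.05),
                (0.3, -0.2, 0.3), (-0.2, 0.3, 0.3)]
    demos = generate_demos(keyposes, n_scenes=100)
    print(f"kept {len(demos)} of 100 sampled scenes")
```

Rejection sampling keeps the "strictly collision-free" property trivially true for every stored demonstration, at the cost of discarding scenes where the nominal path happens to intersect an obstacle. A planner that routes around obstacles would waste fewer samples.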

Other information