OpenGVL: Benchmarking Visual Temporal Progress for Data Curation
Published in CoRL 2025 workshop, 2025

OpenGVL provides a benchmark and toolkit to evaluate how well vision–language models (VLMs) understand temporal progress in robotic tasks. It enables automatic annotation and curation of large-scale robotics datasets by predicting task completion from video frames, making it practical for data quality assessment and filtering.
We introduce Value-Order Correlation (VOC), the Spearman rank correlation between the model's predicted progress ordering and the video's true temporal order, as a measure of temporal understanding (higher is better: +1 is perfect, 0 is random, −1 is fully reversed). The framework supports few-shot prompting with context episodes and includes contamination control via hidden tasks curated for evaluation.
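To make the metric concrete, here is a minimal sketch of VOC in pure Python: it takes a model's per-frame progress predictions and correlates their ranks (average ranks for ties) with the frames' temporal indices. The function names (`voc`, `average_ranks`) and the plain-Python Spearman implementation are illustrative assumptions, not the OpenGVL codebase; `scipy.stats.spearmanr` would give the same result.

```python
def average_ranks(xs):
    """Assign 1-based ranks to xs, averaging ranks over tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def voc(predicted_progress):
    """Value-Order Correlation: Spearman correlation of predicted
    progress against the true frame order 1..n (hypothetical sketch)."""
    n = len(predicted_progress)
    return pearson(average_ranks(predicted_progress), list(range(1, n + 1)))
```

For example, predictions that increase monotonically over the episode give `voc(...) = 1.0`, while predictions that decrease give `-1.0`; shuffled or noisy predictions fall in between.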
Recommended citation: Budzianowski, P., Wiśnios, E., Góral, G., Tyrolski, M., Kulakov, I., Petrenko, V., & Walas, K. (2025). OpenGVL: Benchmarking Visual Temporal Progress for Data Curation. arXiv:2509.17321.
Download Paper | Download Slides