VGSI

1 paper with code • 1 benchmark • 1 dataset

Given a textual goal and multiple images representing candidate events, a model must choose the one image that constitutes a reasonable step towards the given goal. To do so, a model should correctly recognize not only the specific action illustrated in an image (e.g., “turning on the oven”), but also the intent of the action (“baking fish”).
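The multiple-choice setup described above can be illustrated with a minimal sketch. It assumes the goal text and each candidate image have already been encoded into vectors in a shared embedding space, and it picks the image whose vector is most similar to the goal. The function names (`cosine`, `choose_step_image`) and the toy vectors are hypothetical and are not taken from the wikiHow-VGSI code; similarity ranking is only one possible way to approach the task.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def choose_step_image(goal_vec: Sequence[float],
                      image_vecs: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate image most similar to the goal."""
    scores = [cosine(goal_vec, img) for img in image_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy embeddings (made up for illustration): one goal, four candidate images.
goal = [0.9, 0.1, 0.0]
candidates = [
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.0, 1.0],
    [0.1, 0.9, 0.3],
]
predicted = choose_step_image(goal, candidates)
print(predicted)
```

In a real system the vectors would come from trained text and image encoders; under this framing, a prediction counts as correct when the selected index matches the image that actually depicts a step towards the goal.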

Most implemented papers

Visual Goal-Step Inference using wikiHow

yueyang1996/wikihow-vgsi • EMNLP 2021

Understanding which sequence of steps is needed to complete a goal can help artificial intelligence systems reason about human activities.