VGSI
1 paper with code • 1 benchmark • 1 dataset
Given a textual goal and multiple images representing candidate events, a model must choose one image which constitutes a reasonable step towards the given goal. A model should correctly recognize not only the specific action illustrated in an image (e.g., “turning on the oven”), but also the intent of the action (“baking fish”).
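The selection step described above can be sketched as a scoring problem: embed the goal text and each candidate image in a shared vector space, then pick the highest-scoring image. The sketch below is only an illustration under stated assumptions, not the method of any particular paper. The function names `cosine` and `choose_step_image` are hypothetical, and the toy vectors stand in for embeddings that a real text encoder and image encoder would produce.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Treat a zero vector as having no similarity to anything.
    return dot / norm if norm else 0.0


def choose_step_image(goal_vec: Sequence[float],
                      image_vecs: List[Sequence[float]]) -> int:
    """Return the index of the candidate image most similar to the goal.

    Assumption: goal_vec and image_vecs already live in a shared
    embedding space (hypothetical encoders are not shown here).
    """
    if not image_vecs:
        raise ValueError("at least one candidate image is required")
    scores = [cosine(goal_vec, img) for img in image_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data, invented for illustration: one goal and four candidate images.
goal = [0.9, 0.1, 0.0]           # stands in for the goal "baking fish"
candidates = [
    [0.1, 0.9, 0.0],
    [0.8, 0.2, 0.1],             # stands in for "turning on the oven"
    [0.0, 0.0, 1.0],
    [0.2, 0.7, 0.3],
]
chosen = choose_step_image(goal, candidates)
print(chosen)
```

In this toy setup the second candidate (index 1) points in nearly the same direction as the goal vector, so it is selected. How well such a scorer captures the intent behind an action, rather than only the action depicted, depends entirely on the encoders, which are outside the scope of this sketch.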
Most implemented papers
Visual Goal-Step Inference using wikiHow
Understanding what sequence of steps is needed to complete a goal can help artificial intelligence systems reason about human activities.