Computer Vision

Visual Goal-Step Inference (VGSI)

1 paper with code • 1 benchmark • 1 dataset

Given a textual goal and multiple images representing candidate events, a model must choose the one image that constitutes a reasonable step towards the given goal. A model should correctly recognize not only the specific action illustrated in an image (e.g., “turning on the oven”), but also the intent of the action (“baking fish”).
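The selection step can be illustrated with a minimal sketch. It assumes the goal text and each candidate image have already been encoded into vectors in a shared space; the toy vectors, the cosine-similarity scoring rule, and the names `cosine` and `choose_step` are illustrative assumptions, not the method of any listed paper.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def choose_step(goal_vec: Sequence[float],
                candidate_vecs: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate image vector most similar to the goal vector."""
    if not candidate_vecs:
        raise ValueError("at least one candidate image is required")
    scores = [cosine(goal_vec, c) for c in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy embeddings (hypothetical): one goal and four candidate step images.
goal = [0.9, 0.1, 0.0]           # stands in for the goal "baking fish"
candidates = [
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],             # stands in for "turning on the oven"
    [0.0, 0.0, 1.0],
    [0.1, 0.9, 0.3],
]
chosen = choose_step(goal, candidates)
print(f"chosen candidate index: {chosen}")
```

In a real system the vectors would come from learned text and image encoders; only the final argmax over candidate scores reflects the multiple-choice format described above.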

Benchmarks


These leaderboards are used to track progress in VGSI.

Dataset	Best Model
wikiHow-image	Triplet Network

Datasets

wikiHow-image

Most implemented papers


Visual Goal-Step Inference using wikiHow

yueyang1996/wikihow-vgsi • EMNLP 2021

Understanding what sequence of steps is needed to complete a goal can help artificial intelligence systems reason about human activities.
