Person-centric Visual Grounding

4 papers with code • 1 benchmark • 1 dataset

Person-centric visual grounding is the problem of linking people named in a caption to people pictured in an image. The task was introduced in "Who's Waldo? Linking People Across Text and Images" (Cui et al., ICCV 2021).

Benchmarks

These leaderboards track progress in person-centric visual grounding.

Dataset	Best Model
Who’s Waldo	Who's Waldo

Datasets

Who’s Waldo

Most implemented papers

Who's Waldo? Linking People Across Text and Images

clairecyq/whos-waldo • ICCV 2021

We present a task and benchmark dataset for person-centric visual grounding, the problem of linking between people named in a caption and people pictured in an image.

TubeDETR: Spatio-Temporal Video Grounding with Transformers

antoyang/TubeDETR • CVPR 2022

We consider the problem of localizing a spatio-temporal tube in a video corresponding to a given text query.

To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo

fpsluozi/tofindwaldo • 30 Mar 2022

We find that the original Who's Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image.
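The two heuristics named above can be illustrated with a minimal sketch. This is not code from the paper or its repository: the function names, the `(x_min, y_min, x_max, y_max)` box format, and the use of horizontal box centers to define left-to-right order are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes count as zero."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def first_name_to_largest_box(names: List[str], boxes: List[Box]) -> Dict[str, int]:
    """Hypothetical heuristic 1: link the first caption name to the largest box."""
    if not names or not boxes:
        return {}
    largest = max(range(len(boxes)), key=lambda i: box_area(boxes[i]))
    return {names[0]: largest}


def names_left_to_right(names: List[str], boxes: List[Box]) -> Dict[str, int]:
    """Hypothetical heuristic 2: link names, in caption order, to boxes
    sorted left to right by horizontal center."""
    order = sorted(range(len(boxes)), key=lambda i: (boxes[i][0] + boxes[i][2]) / 2)
    # zip stops at the shorter sequence, so surplus names or boxes stay unlinked.
    return dict(zip(names, order))


# Made-up example: two names in caption order, two detected person boxes.
names = ["Alice", "Bob"]
boxes = [(300.0, 50.0, 380.0, 200.0), (20.0, 40.0, 220.0, 400.0)]

print(first_name_to_largest_box(names, boxes))
print(names_left_to_right(names, boxes))
```

Neither function looks at the image content or the caption beyond name order, which is why samples solvable this way say little about a model's grounding ability.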
