The MIT-States dataset has 245 object classes, 115 attribute classes and ∼53K images. There is a wide range of objects (e.g., fish, persimmon, room) and attributes (e.g., mossy, deflated, dirty). On average, each object instance is modified by one of roughly 9 attributes it affords.
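A minimal sketch of how the compositional (attribute, object) label space can be assembled from per-image annotations; the annotation format and variable names below are assumed for illustration and are not the dataset's official loader.

```python
# Illustrative sketch: building the attribute-object label space of MIT-States
# from an assumed list of per-image (attribute, object) annotations.
from collections import defaultdict

annotations = [
    ("mossy", "rock"),
    ("deflated", "ball"),
    ("dirty", "room"),
    # ... one (attribute, object) pair per image, ~53K entries in total
]

objects = sorted({obj for _, obj in annotations})       # up to 245 object classes
attributes = sorted({attr for attr, _ in annotations})  # up to 115 attribute classes
pairs = sorted(set(annotations))                        # observed compositions

# Attributes each object "affords" (roughly 9 per object on average).
afforded = defaultdict(set)
for attr, obj in annotations:
    afforded[obj].add(attr)

print(len(objects), len(attributes), len(pairs))
```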
65 PAPERS • 4 BENCHMARKS
UT Zappos50K is a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The images are divided into 4 major categories — shoes, sandals, slippers, and boots — followed by functional types and individual brands. The shoes are centered on a white background and pictured in the same orientation for convenient analysis.
30 PAPERS • 1 BENCHMARK
We propose a split built on top of the Stanford GQA dataset, originally proposed for VQA, and name it the Compositional GQA (C-GQA) dataset (see the supplementary for details). C-GQA contains over 9.5k compositional labels, making it the most extensive dataset for CZSL. With cleaner labels and a larger label space, our hope is that this dataset will inspire further research on the topic.
27 PAPERS • NO BENCHMARKS YET
ReaSCAN is a synthetic navigation task that requires models to reason about their surroundings while interpreting syntactically complex commands.
7 PAPERS • NO BENCHMARKS YET
AO-CLEVr is a new synthetic-image dataset containing images of "easy" attribute-object categories, based on CLEVr. AO-CLEVr has attribute-object pairs created from 8 attributes {red, purple, yellow, blue, green, cyan, gray, brown} and 3 object shapes {sphere, cube, cylinder}, yielding 24 attribute-object pairs. Each pair has 7,500 images. Each image contains a single object corresponding to the attribute-object pair. The object is randomly assigned one of two sizes (small/large), one of two materials (rubber/metallic), a random position, and random lighting according to CLEVr defaults.
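Because the label space is a full Cartesian product of the listed attributes and shapes, the pair count and total image count follow directly; a small sketch of the arithmetic (variable names are illustrative):

```python
from itertools import product

# The 8 CLEVr colors used as attributes and the 3 object shapes listed above.
attributes = ["red", "purple", "yellow", "blue", "green", "cyan", "gray", "brown"]
shapes = ["sphere", "cube", "cylinder"]

pairs = list(product(attributes, shapes))    # 8 * 3 = 24 attribute-object pairs
images_per_pair = 7500
total_images = len(pairs) * images_per_pair  # 24 * 7500 = 180,000 images

print(len(pairs), total_images)
```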
5 PAPERS • NO BENCHMARKS YET
UT Zappos50K (UT-Zap50K) is a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The images are divided into 4 major categories — shoes, sandals, slippers, and boots — followed by functional types and individual brands. The shoes are centered on a white background and pictured in the same orientation for convenient analysis. This dataset is created in the context of an online shopping task, where users pay special attention to fine-grained visual differences. For instance, it is more likely that a shopper is deciding between two pairs of similar men's running shoes instead of between a woman's high heel and a man's slipper. GIST and LAB color features are provided. In addition, each image has 8 associated metadata labels (gender, materials, etc.) that are used to filter the shoes on Zappos.com. We introduced this dataset in the context of a pairwise comparison task, where the goal is to predict which of two images more strongly exhibits a given visual attribute.
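A minimal sketch of the pairwise comparison setting described above, using an assumed record layout; the field names and example attribute are illustrative, not the dataset's official schema.

```python
# Illustrative record for the pairwise attribute-comparison task:
# given two catalog images and an attribute, predict which image
# exhibits the attribute more strongly.
from dataclasses import dataclass

@dataclass
class AttributeComparison:
    image_a: str    # path or ID of the first catalog image
    image_b: str    # path or ID of the second catalog image
    attribute: str  # attribute being compared (name is illustrative)
    label: int      # 1 if image_a shows the attribute more strongly, else 0

example = AttributeComparison("shoe_0001.jpg", "shoe_0002.jpg", "sporty", 1)
print(example)
```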
0 PAPERS • NO BENCHMARKS YET