dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.
140 PAPERS • NO BENCHMARKS YET
The smallNORB dataset is a datset for 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by two cameras under 6 lighting conditions, 9 elevations (30 to 70 degrees every 5 degrees), and 18 azimuths (0 to 340 every 20 degrees). The training set is composed of 5 instances of each category (instances 4, 6, 7, 8 and 9), and the test set of the remaining 5 instances (instances 0, 1, 2, 3, and 5).
106 PAPERS • 1 BENCHMARK
The Sprites dataset contains 60 pixel color images of animated characters (sprites). There are 672 sprites, 500 for training, 100 for testing and 72 for validation. Each sprite has 20 animations and 178 images, so the full dataset has 120K images in total. There are many changes in the appearance of the sprites, they differ in their body shape, gender, hair, armor, arm type, greaves, and weapon.
48 PAPERS • 3 BENCHMARKS
3dshapes is a dataset of 3D shapes procedurally generated from 6 ground truth independent latent factors. These factors are floor colour, wall colour, object colour, scale, shape and orientation.
31 PAPERS • NO BENCHMARKS YET
Car CAD models from "3d object detection and viewpoint estimation with a deformable 3d cuboid model" were used to generate the dataset. For each of the 199 car models, the authors generated $64\times64$ color renderings from 24 rotation angles each offset by 15 degrees, as well as from 4 different camera elevations.
21 PAPERS • NO BENCHMARKS YET
Update on 3DIdent, where we introduce six additional object classes (Hare, Dragon, Cow, Armadillo, Horse, and Head), and impose a causal graph over the latent variables. For further details, see Appendix B in the associated paper (https://arxiv.org/abs/2106.04619).
12 PAPERS • 1 BENCHMARK
A data-set which consists of over one million images of physical 3D objects with seven factors of variation, such as object color, shape, size and position.
10 PAPERS • NO BENCHMARKS YET
Novel benchmark which features aspects of natural scenes, e.g. a complex 3D object and different lighting conditions, while still providing access to the continuous ground-truth factors.
9 PAPERS • 1 BENCHMARK
This Dataset consists of 2120 sequences of binary masks of pedestrians. The sequence length varies between 2-710. For details, we refer to our paper. It is based on the original KITTI Segmentation challenge which can be found at https://www.vision.rwth-aachen.de/page/mots
3 PAPERS • 1 BENCHMARK
Largest, first-of-its-kind, in-the-wild, fine-grained workout/exercise posture analysis dataset, covering three different exercises: BackSquat, Barbell Row, and Overhead Press. Seven different types of exercise errors are covered. Unlabeled data is also provided to facilitate self-supervised learning.
2 PAPERS • NO BENCHMARKS YET
This csv consists of (x-position, y-position, area) tuples of three views (left, middle, right) of downscaled binary masks with aspect ratio kept (64 x 128) from the 2019 YouTube-VIS challenge, which can be found at https://competitions.codalab.org/competitions/20127#participate-get-data. Extracting pairs from this csv results in 234,652 transitions in the given statistics. These statistics can be used to augment ground truth factor distributions with natural transitions, which we demonstrate with spriteworld. For details, we refer to our paper, which can be found at https://openreview.net/forum?id=EbIDjBynYJ8.
1 PAPER • 1 BENCHMARK
Synthetic dataset intended for benchmarking disentanglement frameworks.
1 PAPER • NO BENCHMARKS YET