Office-Home is a benchmark dataset for domain adaptation which contains 4 domains where each domain consists of 65 categories. The four domains are: Art – artistic images in the form of sketches, paintings, ornamentation, etc.; Clipart – collection of clipart images; Product – images of objects without a background and Real-World – images of objects captured with a regular camera. It contains 15,500 images, with an average of around 70 images per class and a maximum of 99 images in a class.
935 PAPERS • 11 BENCHMARKS
Market-1501 is a large-scale public benchmark dataset for person re-identification. It contains 1501 identities which are captured by six different cameras, and 32,668 pedestrian image bounding-boxes obtained using the Deformable Part Models pedestrian detector. Each person has 3.6 images on average at each viewpoint. The dataset is split into two parts: 750 identities are utilized for training and the remaining 751 identities are used for testing. In the official testing protocol 3,368 query images are selected as probe set to find the correct match across 19,732 reference gallery images.
812 PAPERS • 9 BENCHMARKS
DomainNet is a dataset of common objects in six different domain. All domains include 345 categories (classes) of objects such as Bracelet, plane, bird and cello. The domains include clipart: collection of clipart images; real: photos and real world images; sketch: sketches of specific objects; infograph: infographic images with specific object; painting artistic depictions of objects in the form of paintings and quickdraw: drawings of the worldwide players of game “Quick Draw!”.
604 PAPERS • 10 BENCHMARKS
The Office dataset contains 31 object categories in three domains: Amazon, DSLR and Webcam. The 31 categories in the dataset consist of objects commonly encountered in office settings, such as keyboards, file cabinets, and laptops. The Amazon domain contains on average 90 images per class and 2817 images in total. As these images were captured from a website of online merchants, they are captured against clean background and at a unified scale. The DSLR domain contains 498 low-noise high resolution images (4288×2848). There are 5 objects per category. Each object was captured from different viewpoints on average 3 times. For Webcam, the 795 images of low resolution (640×480) exhibit significant noise and color as well as white balance artifacts.
592 PAPERS • 7 BENCHMARKS
PACS is an image dataset for domain generalization. It consists of four domains, namely Photo (1,670 images), Art Painting (2,048 images), Cartoon (2,344 images) and Sketch (3,929 images). Each domain contains seven categories.
564 PAPERS • 7 BENCHMARKS
ImageNet-C is an open source data set that consists of algorithmically generated corruptions (blur, noise) applied to the ImageNet test-set.
510 PAPERS • 3 BENCHMARKS
The SYNTHIA dataset is a synthetic dataset that consists of 9400 multi-viewpoint photo-realistic frames rendered from a virtual city and comes with pixel-level semantic annotations for 13 classes. Each frame has resolution of 1280 × 960.
500 PAPERS • 10 BENCHMARKS
The GTA5 dataset contains 24966 synthetic images with pixel level semantic annotation. The images have been rendered using the open-world video game Grand Theft Auto 5 and are all from the car perspective in the streets of American-style virtual cities. There are 19 semantic classes which are compatible with the ones of Cityscapes dataset.
379 PAPERS • 7 BENCHMARKS
ImageNet-R(endition) contains art, cartoons, deviantart, graffiti, embroidery, graphics, origami, paintings, patterns, plastic objects, plush objects, sculptures, sketches, tattoos, toys, and video game renditions of ImageNet classes.
341 PAPERS • 6 BENCHMARKS
The ImageNet-A dataset consists of real-world, unmodified, and naturally occurring examples that are misclassified by ResNet models.
318 PAPERS • 5 BENCHMARKS
Foggy Cityscapes is a synthetic foggy dataset which simulates fog on real scenes. Each foggy image is rendered with a clear image and depth map from Cityscapes. Thus the annotations and data split in Foggy Cityscapes are inherited from Cityscapes.
207 PAPERS • 6 BENCHMARKS
VisDA-2017 is a simulation-to-real dataset for domain adaptation with over 280,000 images across 12 categories in the training, validation and testing domains. The training images are generated from the same object under different circumstances, while the validation images are collected from MSCOCO..
205 PAPERS • 6 BENCHMARKS
This paper introduces the pipeline to scale the largest dataset in egocentric vision EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments, using head-mounted cameras. Compared to its previous version (EPIC-KITCHENS-55), EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotations of fine-grained actions (+128% more action segments). This collection also enables evaluating the "test of time" - i.e. whether models trained on data collected in 2018 can generalise to new footage collected under the same hypotheses albeit "two years on". The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), as well as unsupervised domain adaptation for action recognition.
134 PAPERS • 7 BENCHMARKS
SIM10k is a synthetic dataset containing 10,000 images, which is rendered from the video game Grand Theft Auto V (GTA5).
71 PAPERS • 3 BENCHMARKS
The Oxford RobotCar Dataset contains over 100 repetitions of a consistent route through Oxford, UK, captured over a period of over a year. The dataset captures many different combinations of weather, traffic and pedestrians, along with longer term changes such as construction and roadworks.
38 PAPERS • 3 BENCHMARKS
Jester Gesture Recognition dataset includes 148,092 labeled video clips of humans performing basic, pre-defined hand gestures in front of a laptop camera or webcam. It is designed for training machine learning models to recognize human hand gestures like sliding two fingers down, swiping left or right and drumming fingers.
16 PAPERS • 6 BENCHMARKS
SynLiDAR is a large-scale synthetic LiDAR sequential point cloud dataset with point-wise annotations. 13 sequences of LiDAR point cloud with around 20k scans (over 19 billion points and 32 semantic classes) are collected from virtual urban cities, suburban towns, neighborhood, and harbor.
10 PAPERS • 1 BENCHMARK
Adaptiope is a domain adaptation dataset with 123 classes in the three domains synthetic, product and real life. One of the main goals of Adaptiope is to offer a clean and well curated set of images for domain adaptation. This was necessary as many other common datasets in the area suffer from label noise and low quality images. Additionally, Adaptiope's class set was chosen in a way that minimizes the overlap with the class set of the commonly used ImageNet pretraining, therefore preventing information leakage in a domain adaptation setup.
9 PAPERS • NO BENCHMARKS YET
The Cross-dataset Testbed is a Decaf7 based cross-dataset image classification dataset, which contains 40 categories of images from 3 domains: 3,847 images in Caltech256, 4,000 images in ImageNet, and 2,626 images for SUN. In total there are 10,473 images of 40 categories from these three domains.
7 PAPERS • NO BENCHMARKS YET
MegaAge is a large dataset that consists of 41,941 faces annotated with age posterior distributions.
Modern Office-31 is a refurbished version of the commonly used Office-31 dataset. Modern Office-31 rectifies many of the annotation errors and low quality images in the Amazon domain of the original Office-31 dataset. Additionally, this dataset adds another synthetic domain based on the Adaptiope dataset.
6 PAPERS • NO BENCHMARKS YET
Libri-Adapt aims to support unsupervised domain adaptation research on speech recognition models.
5 PAPERS • 2 BENCHMARKS
The Sims4Action Dataset: a videogame-based dataset for Synthetic→Real domain adaptation for human activity recognition.
5 PAPERS • NO BENCHMARKS YET
The ClonedPerson dataset is a large-scale synthetic person re-identification dataset introduced in the paper "Cloning Outfits from Real-World Images to 3D Characters for Generalizable Person Re-Identification" in CVPR 2022. It is generated by MakeHuman and Unity3D. Characters in this dataset use an automatic approach to directly clone the whole outfits from real-world person images to virtual 3D characters, such that any virtual person thus created will appear very similar to its real-world counterpart. The dataset contains 887,766 synthesized person images of 5,621 identities.
4 PAPERS • 4 BENCHMARKS
This dataset contains 114 individuals including 1824 images captured from two disjoint camera views. For each person, eight images are captured from eight different orientations under one camera view and are normalized to 128x48 pixels. This dataset is also split into two parts randomly. One contains 57 individuals for training, and the other contains 57 individuals for testing.
4 PAPERS • 1 BENCHMARK
The Five-Billion-Pixels dataset contains more than 5 billion labeled pixels of 150 high-resolution Gaofen-2 (4 m) satellite images, annotated in a 24-category system covering artificial-constructed, agricultural, and natural classes. It possesses the advantage of rich categories, large coverage, wide distribution, and high-spatial resolution, which well reflects the distributions of real-world ground objects and can benefit to different land cover related studies.
3 PAPERS • NO BENCHMARKS YET
RoCoG-v2 (Robot Control Gestures) is a dataset intended to support the study of synthetic-to-real and ground-to-air video domain adaptation. It contains over 100K synthetically-generated videos of human avatars performing gestures from seven (7) classes. It also provides videos of real humans performing the same gestures from both ground and air perspectives
3 PAPERS • 1 BENCHMARK
5 domains: synthetic domain, document domain, street view domain, handwritten domain, and car license domain over five million images
2 PAPERS • 2 BENCHMARKS
CFC-DAOD is a domain adaptation extension to the Caltech Fish Counting domain generalization benchmark.
1 PAPER • 1 BENCHMARK
UDA-CH contains 16 objects that cover a variety of artworks which can be found in a museum like sculptures, paintings and books. Specifically, the dataset has been collected inside the cultural site “Galleria Regionale di Palazzo Bellomo” located in Siracusa, Italy.
A cross-city UDA benchmark built upon nuScenes.
1 PAPER • NO BENCHMARKS YET