The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
10,147 PAPERS • 92 BENCHMARKS
The MPII Human Pose Dataset for single person pose estimation is composed of about 25K images of which 15K are training samples, 3K are validation samples and 7K are testing samples (which labels are withheld by the authors). The images are taken from YouTube videos covering 410 different human activities and the poses are manually annotated with up to 16 body joints.
463 PAPERS • 4 BENCHMARKS
The Pascal3D+ multi-view dataset consists of images in the wild, i.e., images of object categories exhibiting high variability, captured under uncontrolled settings, in cluttered scenes and under many different poses. Pascal3D+ contains 12 categories of rigid objects selected from the PASCAL VOC 2012 dataset. These objects are annotated with pose information (azimuth, elevation and distance to camera). Pascal3D+ also adds pose annotated images of these 12 categories from the ImageNet dataset.
231 PAPERS • 1 BENCHMARK
This dataset focuses on heavily occluded human with comprehensive annotations including bounding-box, humans pose and instance mask. This dataset contains 13,360 elaborately annotated human instances within 5081 images. With average 0.573 MaxIoU of each person, OCHuman is the most complex and challenging dataset related to human.
52 PAPERS • 5 BENCHMARKS
KeypointNet is a large-scale and diverse 3D keypoint dataset that contains 83,231 keypoints and 8,329 3D models from 16 object categories, by leveraging numerous human annotations, based on ShapeNet models.
23 PAPERS • NO BENCHMARKS YET
ApolloCar3DT is a dataset that contains 5,277 driving images and over 60K car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model size and semantically labelled keypoints. This dataset is above 20 times larger than PASCAL3D+ and KITTI, the current state-of-the-art.
17 PAPERS • 14 BENCHMARKS
The General Robust Image Task (GRIT) Benchmark is an evaluation-only benchmark for evaluating the performance and robustness of vision systems across multiple image prediction tasks, concepts, and data sources. GRIT hopes to encourage our research community to pursue the following research directions:
13 PAPERS • 8 BENCHMARKS
AwA Pose is a large scale animal keypoint dataset with ground truth annotations for keypoint detection of quadruped animals from images.
4 PAPERS • NO BENCHMARKS YET
A multimodal dataset of radio galaxies and their corresponding infrared hosts.
1 PAPER • NO BENCHMARKS YET
Automating the creation of catalogues for radio galaxies in next-generation deep surveys necessitates the identification of components within extended sources and their respective infrared hosts. We present RadioGalaxyNET, a multimodal dataset, tailored for machine learning tasks to streamline the automated detection and localization of multi-component extended radio galaxies and their associated infrared hosts. The dataset encompasses 4,155 instances of galaxies across 2,800 images, incorporating both radio and infrared channels. Each instance furnishes details about the extended radio galaxy class, a bounding box covering all components, a pixel-level segmentation mask, and the keypoint position of the corresponding infrared host galaxy. RadioGalaxyNET is the first dataset to include images from the highly sensitive Australian Square Kilometre Array Pathfinder (ASKAP) radio telescope, corresponding infrared images, and instance-level annotations for galaxy detection.
1 PAPER • 1 BENCHMARK
TAMPAR is a real-world dataset of parcel photos for tampering detection with annotations in COCO format. For details see the paper and for visual samples the project page. Features are: