The Human3.6M dataset is one of the largest motion capture datasets. It consists of 3.6 million human poses and corresponding images captured by a high-speed motion capture system, with 4 high-resolution progressive-scan cameras acquiring video data at 50 Hz. The dataset contains activities performed by 11 professional actors in 17 scenarios (discussion, smoking, taking photos, talking on the phone, etc.) and provides accurate 3D joint positions and high-resolution videos.
715 PAPERS • 16 BENCHMARKS
PASCAL-Part is a set of additional annotations for PASCAL VOC 2010. It goes beyond the original PASCAL object detection task by providing segmentation masks for each body part of the object. For categories that do not have a consistent set of parts (e.g., boat), it provides silhouette annotations.
55 PAPERS • 4 BENCHMARKS
The Crowd Instance-level Human Parsing (CIHP) dataset contains 38,280 diverse human images. Each image in CIHP is labeled with pixel-wise annotations for 20 categories and with instance-level identification. The dataset can be used for the human part segmentation task.
29 PAPERS • 1 BENCHMARK
The MHP dataset contains multiple persons captured in real-world scenes with pixel-level fine-grained semantic annotations in an instance-aware setting.
14 PAPERS • 3 BENCHMARKS
The Pascal Panoptic Parts dataset consists of annotations for the part-aware panoptic segmentation task on the PASCAL VOC 2010 dataset. It is created by merging scene-level labels from PASCAL-Context with part-level labels from PASCAL-Part.
7 PAPERS • 2 BENCHMARKS
The Cityscapes Panoptic Parts dataset introduces part-aware panoptic segmentation annotations for the Cityscapes dataset. It extends the original panoptic annotations for the Cityscapes dataset with part-level annotations for selected scene-level classes.
6 PAPERS • 1 BENCHMARK
We learn high-fidelity human depths by leveraging a collection of social media dance videos scraped from the TikTok mobile app, which is by far one of the most popular video-sharing applications across generations and features short videos (10-15 seconds) of diverse dance challenges. From TikTok dance-challenge compilations spanning different months and varieties of dance, we manually selected more than 300 videos that capture a single person performing dance moves with moderate movements that do not generate excessive motion blur. For each video, we extract RGB images at 30 frames per second, resulting in more than 100K images. We segmented these images using the Remove.bg application and computed UV coordinates from DensePose.
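As a rough sanity check on the reported figures, extracting frames at 30 fps from ~300 clips of 10-15 seconds each does land above the stated 100K images. A minimal sketch of the frame-sampling arithmetic (the helper name `frame_timestamps` is hypothetical, not part of the dataset's released tooling):

```python
def frame_timestamps(duration_s: float, fps: int = 30) -> list[float]:
    """Return the timestamps (in seconds) at which frames would be
    extracted from a clip of the given duration at the given rate."""
    n_frames = int(duration_s * fps)
    return [i / fps for i in range(n_frames)]

# A typical 12-second TikTok clip at 30 fps yields 360 frames;
# ~300 such clips give on the order of 108,000 images, consistent
# with the "more than 100K images" figure quoted above.
per_clip = len(frame_timestamps(12.0))
total_estimate = 300 * per_clip
```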
2 PAPERS • NO BENCHMARKS YET
The CCIHP dataset is devoted to fine-grained description of people in the wild with localized and characterized semantic attributes. It contains 20 attribute classes and 20 characteristic classes split into 3 categories (size, pattern, and color). The dataset was introduced in: Loesch, A., & Audigier, R. (2021, September). Describe me if you can! Characterized instance-level human parsing. In 2021 IEEE International Conference on Image Processing (ICIP) (pp. 2528-2532). IEEE. The annotations were made with Pixano, an open-source smart annotation tool for computer vision applications: https://pixano.cea.fr/
1 PAPER • NO BENCHMARKS YET