JHMDB is an action recognition dataset that consists of 960 video sequences belonging to 21 actions. It is a subset of the larger HMDB51 dataset collected from digitized movies and YouTube videos. The dataset contains videos and annotations for puppet flow per frame (approximated optical flow on the person), puppet mask per frame, joint positions per frame, an action label per clip, and meta labels per clip (camera motion, visible body parts, camera viewpoint, number of people, video quality).
230 PAPERS • 8 BENCHMARKS
AGORA is a synthetic human dataset with high realism and accurate ground truth. It consists of around 14K training and 3K test images, created by rendering between 5 and 15 people per image using either image-based lighting or rendered 3D environments, with care taken to make the images physically plausible and photorealistic. In total, AGORA contains 173K individual person crops. AGORA provides (1) SMPL/SMPL-X parameters and (2) segmentation masks for each subject in images; a minimal sketch of reconstructing a body mesh from these parameters is shown below.
55 PAPERS • 4 BENCHMARKS
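Since AGORA's ground truth is expressed as SMPL/SMPL-X parameters, a common first step is reconstructing each subject's body mesh with the smplx Python package. The sketch below is illustrative only, not AGORA's official loader: the smplx calls are real API, but the .npz parameter layout and file name are assumptions.

```python
# Minimal sketch: rebuild an SMPL-X body mesh from AGORA-style parameters.
# The smplx package (pip install smplx) is real; the .npz layout below is a
# hypothetical stand-in for however the ground-truth parameters are stored.
import numpy as np
import torch
import smplx

# Path to a directory containing the SMPL-X model files (downloaded separately).
model = smplx.create("models", model_type="smplx", gender="neutral")

params = np.load("agora_person_0001.npz")  # hypothetical per-person parameter file
to_t = lambda key: torch.tensor(params[key], dtype=torch.float32).reshape(1, -1)
output = model(
    betas=to_t("betas"),                  # body shape coefficients
    global_orient=to_t("global_orient"),  # root orientation (axis-angle)
    body_pose=to_t("body_pose"),          # per-joint rotations (axis-angle)
)
vertices = output.vertices.detach().numpy()[0]  # (10475, 3) SMPL-X mesh vertices
joints = output.joints.detach().numpy()[0]      # 3D joint locations
print(vertices.shape, joints.shape)
```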
This dataset focuses on heavily occluded humans, with comprehensive annotations including bounding boxes, human poses, and instance masks. It contains 13,360 elaborately annotated human instances within 5,081 images. With an average MaxIoU of 0.573 per person, OCHuman is the most complex and challenging dataset related to humans; a minimal sketch of the MaxIoU statistic is shown below.
52 PAPERS • 5 BENCHMARKS
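OCHuman's headline statistic, MaxIoU, measures how heavily each person overlaps with others: for every person, take the maximum IoU between that person and any other person in the image, then average over all persons. Below is a minimal box-based sketch of the idea; the function names are illustrative, not OCHuman's API, and the dataset's own computation may be defined on masks rather than boxes.

```python
# Minimal sketch of the MaxIoU statistic over axis-aligned boxes (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def max_iou_per_person(boxes):
    """For each box, the highest IoU with any other box in the same image."""
    return [
        max((iou(box, other) for j, other in enumerate(boxes) if j != i), default=0.0)
        for i, box in enumerate(boxes)
    ]

boxes = [(10, 10, 110, 210), (60, 20, 160, 220), (300, 30, 380, 200)]
print(max_iou_per_person(boxes))  # high values indicate heavy person-person overlap
```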
UAV-Human is a large dataset for human behavior understanding with UAVs. It contains 67,428 multi-modal video sequences and 119 subjects for action recognition, 22,476 frames for pose estimation, 41,290 frames and 1,144 identities for person re-identification, and 22,263 frames for attribute recognition. The dataset was collected by a flying UAV in multiple urban and rural districts in both daytime and nighttime over three months, hence covering extensive diversity w.r.t. subjects, backgrounds, illuminations, weather conditions, occlusions, camera motions, and UAV flying attitudes. This dataset can be used for UAV-based human behavior understanding, including action recognition, pose estimation, re-identification, and attribute recognition.
38 PAPERS • 5 BENCHMARKS
COCO-WholeBody is an extension of the COCO dataset with whole-body annotations. Each person in an image is annotated with 4 types of bounding boxes (person box, face box, left-hand box, and right-hand box) and 133 keypoints (17 for the body, 6 for the feet, 68 for the face, and 42 for the hands); a minimal sketch of splitting the keypoints into these groups is shown below.
22 PAPERS • 6 BENCHMARKS
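Given the ordering implied by the counts above (body, then feet, face, and hands), a flattened 133-keypoint annotation can be split into its groups by index ranges. The sketch below works under that assumption; in the official annotation files the groups are actually stored under separate JSON keys, so treat this as illustrative.

```python
import numpy as np

# Split a (133, 3) whole-body keypoint array (x, y, visibility) into groups,
# assuming the body -> feet -> face -> hands order implied by the counts above.
GROUPS = {"body": 17, "feet": 6, "face": 68, "hands": 42}

def split_wholebody(kpts):
    kpts = np.asarray(kpts, dtype=np.float32).reshape(133, 3)
    parts, start = {}, 0
    for name, count in GROUPS.items():
        parts[name] = kpts[start:start + count]
        start += count
    return parts

parts = split_wholebody(np.zeros(133 * 3))
print({name: p.shape for name, p in parts.items()})
# {'body': (17, 3), 'feet': (6, 3), 'face': (68, 3), 'hands': (42, 3)}
```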
Alibaba Cluster Trace captures detailed statistics for the co-located workloads of long-running and batch jobs over the course of 24 hours. The trace consists of three parts: (1) statistics of the studied homogeneous cluster of 1,313 machines, including each machine's hardware configuration, and the runtime {CPU, Memory, Disk} resource usage for a duration of 12 hours (the second half of the 24-hour period); (2) long-running job workloads, including a trace of all container deployment requests and actions, and a resource usage trace for 12 hours; (3) co-located batch job workloads, including a trace of all batch job requests and actions, and a trace of per-instance resource usage over 24 hours. A minimal loading sketch is shown below.
7 PAPERS • 1 BENCHMARK
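Each part of the trace is distributed as flat tables, so pandas is enough for a first look. In the sketch below, the file name and column names are placeholders rather than the trace's actual schema; substitute the headers documented with the release.

```python
import pandas as pd

# Minimal sketch for inspecting the machine resource-usage part of the trace.
# FILE AND COLUMN NAMES ARE PLACEHOLDERS -- use the schema shipped with the trace.
cols = ["machine_id", "timestamp", "cpu_util", "mem_util", "disk_util"]
usage = pd.read_csv("server_usage.csv", header=None, names=cols)

# Average per-machine utilization over the traced 12-hour window.
per_machine = usage.groupby("machine_id")[["cpu_util", "mem_util", "disk_util"]].mean()
print(per_machine.describe())
```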
Human-Art is a versatile human-centric dataset that bridges the gap between natural and artificial scenes. It contains 50,000 high-quality images with more than 123,000 human figures across 20 scenarios, covering natural and artificial humans in both 2D and 3D representations, with annotations of human bounding boxes, 21 2D human keypoints, human self-contact keypoints, and description text.
6 PAPERS • 1 BENCHMARK
This basketball dataset was acquired under the Walloon region project DeepSport, using the Keemotion system installed in multiple arenas. We would like to thank Keemotion for letting us use their system for raw image acquisition during live productions, and the LNB for the rights to their images.
5 PAPERS • NO BENCHMARKS YET
Throughout the history of art, the pose—as the holistic abstraction of the human body's expression—has proven to be a constant in numerous studies. However, due to the enormous amount of data that so far had to be processed by hand, its crucial role in the formulaic recapitulation of art-historical motifs since antiquity could only be highlighted selectively. This is true even for the now automated estimation of human poses, as domain-specific, sufficiently large data sets required for training computational models are either not publicly available or not indexed at a fine enough granularity. With the Poses of People in Art data set, we introduce the first openly licensed data set for estimating human poses in art and validating human pose estimators. It consists of 2,454 images from 22 art-historical depiction styles, including those that have increasingly turned away from lifelike representations of the body since the 19th century. A total of 10,749 human figures are precisely enclosed in bounding boxes.
3 PAPERS • 1 BENCHMARK
Relative Human (RH) contains multi-person in-the-wild RGB images with rich human annotations.
3 PAPERS • 2 BENCHMARKS
Accurate 3D human pose estimation is essential for sports analytics, coaching, and injury prevention. However, existing datasets for monocular pose estimation do not adequately capture the challenging and dynamic nature of sports movements. In response, we introduce SportsPose, a large-scale 3D human pose dataset consisting of highly dynamic sports movements. With more than 176,000 3D poses from 24 different subjects performing 5 different sports activities, SportsPose provides a diverse and comprehensive set of 3D poses that reflect the complex and dynamic nature of sports movements. Contrary to other markerless datasets, we have quantitatively evaluated the precision of SportsPose by comparing our poses with a commercial marker-based system, achieving a mean error of 34.5 mm across all evaluation sequences (a minimal sketch of such a metric follows below). This is comparable to the error reported on the commonly used 3DPW dataset. We further introduce a new metric, local movement, which describes the movement of the wrist and ankle joints.
3 PAPERS • NO BENCHMARKS YET
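The 34.5 mm figure above is a mean joint position error against marker-based reference poses. The sketch below shows an MPJPE-style computation of such a metric; whether SportsPose applies any alignment (e.g., root-centering) before averaging is not stated here, so none is applied.

```python
import numpy as np

def mean_joint_error_mm(pred, ref):
    """Mean Euclidean distance between predicted and reference joints.

    pred, ref: (frames, joints, 3) arrays in millimeters.
    """
    return float(np.linalg.norm(pred - ref, axis=-1).mean())

rng = np.random.default_rng(0)
ref = rng.normal(size=(100, 17, 3)) * 1000.0         # synthetic reference poses (mm)
pred = ref + rng.normal(scale=20.0, size=ref.shape)  # synthetic predictions
print(f"mean error: {mean_joint_error_mm(pred, ref):.1f} mm")
```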
A dataset for 2D pose estimation of anime/manga images.
2 PAPERS • NO BENCHMARKS YET
The largest, first-of-its-kind, in-the-wild, fine-grained workout/exercise posture analysis dataset, covering three different exercises: BackSquat, Barbell Row, and Overhead Press. Seven different types of exercise errors are represented. Unlabeled data is also provided to facilitate self-supervised learning.
JRDB-Pose is a large-scale dataset and benchmark for multi-person pose estimation and tracking using videos captured from a social navigation robot. The dataset contains challenging scenes with crowded indoor and outdoor locations and a diverse range of scales and occlusion types. It provides human pose annotations with per-keypoint occlusion labels and track IDs consistent across each scene. These annotations include 600,000 human body pose annotations and 600,000 head bounding box annotations.
In recent years, person detection and human pose estimation have made great strides, helped by large-scale labeled datasets. However, these datasets had no guarantees or analysis of human activities, poses, or context diversity. Additionally, privacy, legal, safety, and ethical concerns may limit the ability to collect more human data. An emerging alternative to real-world data that alleviates some of these issues is synthetic data. However, creating synthetic data generators is incredibly challenging and prevents researchers from exploring their usefulness. Therefore, we release a human-centric synthetic data generator, PeopleSansPeople, which contains simulation-ready 3D human assets, a parameterized lighting and camera system, and generates 2D and 3D bounding box, instance and semantic segmentation, and COCO pose labels. Using PeopleSansPeople, we performed benchmark synthetic data training using a Detectron2 Keypoint R-CNN variant [1]. We found that pre-training a network using synthetic data and then fine-tuning on target real-world data improved keypoint estimation performance over using real data alone.
Human keypoint dataset of anime/manga-style character illustrations. Extension of the AnimeDrawingsDataset with additional features.
1 PAPER • NO BENCHMARKS YET
FreeMan is the first large-scale multi-view human motion dataset captured in real-world scenarios. FreeMan was captured by synchronizing 8 smartphones across diverse scenarios. It comprises 11M frames from 8,000 sequences, viewed from different perspectives. These sequences cover 40 subjects across 10 different scenarios, each with varying lighting conditions.
Halpe-FullBody is a full-body keypoints dataset in which each person is annotated with 136 keypoints, including 20 for the body, 6 for the feet, 42 for the hands, and 68 for the face. It is designed for the task of whole-body human pose estimation.
The MPII Human Pose Descriptions dataset extends the widely-used MPII Human Pose Dataset with rich textual annotations. These annotations are generated by various state-of-the-art large language models (LLMs) and include detailed descriptions of the activities being performed, the count of people present, and their specific poses.
This dataset contains 500K highly photorealistic rendered images of 10 real head models with (yaw, pitch, roll) head pose labels. Each head is rendered with a different pose and environmental lighting.
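Since the labels are (yaw, pitch, roll) Euler angles, downstream code usually converts them to a rotation matrix. Euler-angle conventions differ between head-pose datasets, so the composition in the sketch below is one common choice, not necessarily the one this dataset uses.

```python
import numpy as np

def head_pose_to_matrix(yaw, pitch, roll, degrees=True):
    """Compose R = Rz(roll) @ Ry(yaw) @ Rx(pitch); one of several conventions."""
    y, p, r = np.radians([yaw, pitch, roll]) if degrees else (yaw, pitch, roll)
    Ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    Rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    Rz = np.array([[np.cos(r), -np.sin(r), 0], [np.sin(r), np.cos(r), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

R = head_pose_to_matrix(yaw=30.0, pitch=-10.0, roll=5.0)
print(np.round(R, 3))  # orthonormal rotation matrix, det(R) == 1
```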
Synthetic humans generated by the RePoGen method.
InfiniteRep is a synthetic, open-source dataset for fitness and physical therapy (PT) applications. It includes 1k videos of diverse avatars performing multiple repetitions of common exercises, with significant variation in the environment, lighting conditions, avatar demographics, and movement trajectories. From cadence to kinematic trajectory, each rep is done slightly differently -- just like real humans. InfiniteRep videos are accompanied by a rich set of pixel-perfect labels and annotations, including frame-specific repetition counts.
0 PAPERS • NO BENCHMARKS YET