The Annotated Facial Landmarks in the Wild (AFLW) dataset is a large-scale collection of annotated face images gathered from Flickr, exhibiting wide variation in appearance (e.g., pose, expression, ethnicity, age, gender) as well as in general imaging and environmental conditions. In total, about 25K faces are annotated with up to 21 landmarks each (one possible in-memory representation is sketched below).
151 PAPERS • 11 BENCHMARKS
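Because AFLW labels up to 21 landmarks per face rather than a fixed set, a per-landmark validity mask is a natural in-memory representation. The sketch below is a minimal, hypothetical layout; the field names and the NaN-for-missing convention are assumptions, not the official annotation format.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class AFLWFace:
    """One annotated face: up to 21 landmarks, some possibly missing."""
    image_path: str
    # (21, 2) array of (x, y) pixel coordinates; NaN rows mark landmarks
    # that were not annotated (AFLW labels *up to* 21 points per face).
    landmarks: np.ndarray
    # (21,) boolean mask, derived below: True where a landmark is present.
    annotated: np.ndarray = field(init=False)

    def __post_init__(self):
        self.annotated = ~np.isnan(self.landmarks).any(axis=1)


# Usage with made-up coordinates (the landmark index order is hypothetical):
pts = np.full((21, 2), np.nan)
pts[0] = (120.0, 85.0)
face = AFLWFace("flickr_0001.jpg", pts)
print(face.annotated.sum(), "of 21 landmarks annotated")
```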
AFLW2000-3D is a dataset of 2,000 images annotated with image-level 68-point 3D facial landmarks, used to evaluate 3D facial landmark detection models. The head poses are very diverse and often difficult for a CNN-based face detector to find (the usual normalized-error evaluation is sketched below).
112 PAPERS • 8 BENCHMARKS
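Evaluation on AFLW2000-3D is typically reported as normalized mean error (NME) over the 68 landmarks. A minimal sketch, normalizing by bounding-box size; the exact normalizer varies across papers, so treat this as one common protocol rather than the official one.

```python
import numpy as np


def nme(pred, gt, bbox_size):
    """Normalized Mean Error over 68 landmarks.

    pred, gt: (68, 2) arrays of landmark coordinates (pixels).
    bbox_size: normalization term, e.g. sqrt(w * h) of the ground-truth
    bounding box, as used in several AFLW2000-3D evaluation protocols.
    """
    per_point = np.linalg.norm(pred - gt, axis=1)  # Euclidean error per point
    return per_point.mean() / bbox_size


# Toy check: predictions offset by 2 px in x and y from ground truth.
gt = np.random.rand(68, 2) * 100
print(nme(gt + 2.0, gt, bbox_size=100.0))  # ~0.028 (sqrt(8) / 100)
```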
CMU Panoptic is a large-scale dataset providing 3D pose annotations (1.5 million of them) for multiple people engaged in social activities. It contains 65 videos (5.5 hours) with multi-view annotations, but only 17 of them cover multi-person scenarios and come with camera parameters (a projection sketch using those parameters follows below).
112 PAPERS • 4 BENCHMARKS
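With the camera parameters the dataset provides, the 3D pose annotations can be projected into any view using a standard pinhole model. A minimal sketch, assuming intrinsics K, rotation R, and translation t in the x_cam = R·x_world + t convention; Panoptic's calibration files should be checked for the exact convention and any distortion terms, which this sketch ignores.

```python
import numpy as np


def project(joints_3d, K, R, t):
    """Project (N, 3) world-space joints into one camera view (pixels)."""
    cam = joints_3d @ R.T + t      # world -> camera coordinates
    uv = cam @ K.T                 # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]  # perspective divide


# Toy example: identity extrinsics, camera looking down +z.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
joints = np.array([[0.0, 0.0, 2.0],
                   [0.1, -0.2, 2.5]])
print(project(joints, K, np.eye(3), np.zeros(3)))
```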
The Caltech Occluded Faces in the Wild (COFW) dataset is designed to present faces in real-world conditions. Faces show large variations in shape and occlusion due to differences in pose and expression, the use of accessories such as sunglasses and hats, and interactions with objects (e.g., food, hands, microphones). All images were hand-annotated using the same 29 landmarks as in LFPW; both the landmark positions and their occluded/unoccluded states were annotated. The faces are occluded to different degrees, with large variations in the types of occlusion encountered, and the average occlusion rate is over 23% (an occlusion-aware evaluation sketch follows below).
110 PAPERS • 5 BENCHMARKS
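Because COFW flags each landmark as occluded or unoccluded, error is often reported over visible points only. A minimal sketch; the inter-ocular normalization is one common protocol, not necessarily the one a given paper uses.

```python
import numpy as np


def occlusion_rate(occluded):
    """Fraction of occluded landmarks, averaged over all faces.

    occluded: (num_faces, 29) boolean array, one flag per COFW landmark.
    """
    return occluded.mean()


def nme_visible(pred, gt, occluded, iod):
    """NME over unoccluded landmarks only, normalized by the
    inter-ocular distance `iod`."""
    err = np.linalg.norm(pred - gt, axis=1)
    return err[~occluded].mean() / iod


# Toy data: one face, 29 landmarks, ~23% flagged occluded.
rng = np.random.default_rng(0)
occ = rng.random(29) < 0.23
gt = rng.random((29, 2)) * 100
print(occlusion_rate(occ[None]), nme_visible(gt + 1.5, gt, occ, iod=60.0))
```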
The Wider Facial Landmarks in the Wild (WFLW) database contains 10,000 faces (7,500 for training and 2,500 for testing) with 98 annotated landmarks per face. It also features rich attribute annotations for occlusion, head pose, make-up, illumination, blur, and expression (a per-attribute evaluation sketch follows below).
100 PAPERS • 4 BENCHMARKS
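WFLW's attribute annotations allow NME to be reported on attribute-defined subsets of the 2,500 test faces as well as on the full set. A minimal sketch, assuming per-face NMEs and a boolean attribute matrix have already been computed; the column ordering and the official annotation file format are not assumed here.

```python
import numpy as np

# The six attribute annotations WFLW provides (ordering here is arbitrary).
ATTRS = ["pose", "expression", "illumination", "make-up", "occlusion", "blur"]


def per_subset_nme(nme_per_face, attr_flags):
    """Mean NME on the full test set and on each attribute subset.

    nme_per_face: (N,) precomputed NME for each of the N test faces.
    attr_flags:   (N, 6) boolean array, columns ordered as in ATTRS.
    """
    report = {"full": float(nme_per_face.mean())}
    for i, name in enumerate(ATTRS):
        mask = attr_flags[:, i]
        if mask.any():
            report[name] = float(nme_per_face[mask].mean())
    return report


rng = np.random.default_rng(1)
print(per_subset_nme(rng.random(2500) * 0.1, rng.random((2500, 6)) < 0.3))
```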
The Biwi Kinect Head Pose dataset contains over 15K images of 20 people (6 female and 14 male; 4 people were recorded twice). For each frame, a depth image, the corresponding RGB image (both 640×480 pixels), and the annotation are provided. The head pose range covers about ±75° of yaw and ±60° of pitch. Ground truth is provided as the 3D location of the head and its rotation (an Euler-angle conversion sketch follows below).
38 PAPERS • 1 BENCHMARK
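Since the ground truth stores the head's rotation, yaw/pitch/roll angles can be recovered via an Euler decomposition. A minimal sketch assuming a Z-Y-X convention; the dataset's actual axis convention should be checked against its documentation.

```python
import numpy as np


def matrix_to_ypr(R):
    """Z-Y-X Euler decomposition: R = Rz(yaw) @ Ry(pitch) @ Rx(roll).

    Returns angles in degrees. The clip guards arcsin against values
    slightly outside [-1, 1] caused by floating-point noise.
    """
    yaw = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    pitch = np.degrees(np.arcsin(np.clip(-R[2, 0], -1.0, 1.0)))
    roll = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    return yaw, pitch, roll


# Sanity check: a pure 30-degree yaw should round-trip.
c, s = np.cos(np.radians(30)), np.sin(np.radians(30))
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
print(matrix_to_ypr(Rz))  # ~(30.0, 0.0, 0.0)
```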
Biwi Kinect Head Pose is a challenging dataset mainly inspired by the automotive setting. It was acquired with the Microsoft Kinect sensor, a structured-IR-light device, and contains about 15K frames with RGB images (640 × 480) and depth maps (640 × 480). Twenty subjects took part in the recordings; four of them were recorded twice, for a total of 24 sequences. The ground-truth yaw, pitch, and roll angles are reported together with the head center and the calibration matrix (a depth back-projection sketch using that matrix follows below).
6 PAPERS • NO BENCHMARKS YET
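With the per-sequence calibration matrix, each depth map can be back-projected to camera-space 3D points, e.g. to locate the annotated head center in the point cloud. A minimal sketch; the intrinsics values below are illustrative, not taken from the dataset.

```python
import numpy as np


def depth_to_points(depth, K):
    """Back-project an (H, W) depth map to camera-space 3D points.

    depth: per-pixel depth in whatever unit the sensor reports
    (millimetres for the Kinect); K: the 3x3 calibration matrix.
    Returns an (H, W, 3) array of (x, y, z) points.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.astype(np.float64)
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1)


# Illustrative intrinsics for a 640x480 sensor (not the dataset's values).
K = np.array([[575.8, 0.0, 320.0], [0.0, 575.8, 240.0], [0.0, 0.0, 1.0]])
pts = depth_to_points(np.full((480, 640), 1000.0), K)
print(pts.shape, pts[240, 320])  # centre pixel -> (0, 0, 1000)
```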
These images were generated using Blender and IEE-Simulator with different head poses, and each image is labelled with one of nine classes (straight, turned bottom-left, turned left, turned top-left, turned bottom-right, turned right, turned top-right, reclined, looking up). The dataset contains 16,013 training images and 2,825 testing images, in addition to 4,700 images for improvements (a label-mapping sketch follows below).
4 PAPERS • NO BENCHMARKS YET
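With nine discrete classes, the natural evaluation is top-1 classification accuracy. A minimal sketch mapping the class names listed above to integer ids; the id ordering is an assumption, not part of the dataset.

```python
# The nine head-pose classes listed for this synthetic dataset.
CLASSES = [
    "straight", "turned bottom-left", "turned left", "turned top-left",
    "turned bottom-right", "turned right", "turned top-right",
    "reclined", "looking up",
]
LABEL_TO_ID = {name: i for i, name in enumerate(CLASSES)}


def accuracy(pred_ids, true_ids):
    """Top-1 classification accuracy."""
    return sum(p == t for p, t in zip(pred_ids, true_ids)) / len(true_ids)


preds = [LABEL_TO_ID["straight"], LABEL_TO_ID["turned left"]]
truth = [LABEL_TO_ID["straight"], LABEL_TO_ID["reclined"]]
print(accuracy(preds, truth))  # 0.5
```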
The DAD-3DHeads dataset consists of 44,898 images collected from various sources (37,840 in the training set, 4,312 in the validation set, and 2,746 in the test set; the split arithmetic is checked below).
3 PAPERS • NO BENCHMARKS YET
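A quick check that the published split sizes add up to the stated total, along with the implied proportions:

```python
# DAD-3DHeads split sizes as published; the assert verifies the total.
splits = {"train": 37_840, "val": 4_312, "test": 2_746}
assert sum(splits.values()) == 44_898
print({name: f"{size / 44_898:.1%}" for name, size in splits.items()})
# -> roughly 84.3% train / 9.6% val / 6.1% test
```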
ICT-3DHP was collected using the Microsoft Kinect sensor and contains RGB images and depth maps for about 14K frames, divided into 10 sequences. The image resolution is 640 × 480 pixels. A hardware sensor (Polhemus FASTRAK) is used to generate the ground-truth annotation; the device is placed on a white cap worn by each subject and is visible in both the RGB and depth frames (a frame-pairing sketch follows below).
0 PAPERS • NO BENCHMARKS YET
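A minimal sketch for iterating the paired RGB/depth frames of one sequence. The directory layout and file naming used here are hypothetical and should be adapted to the actual release.

```python
from pathlib import Path


def paired_frames(sequence_dir):
    """Yield (rgb_path, depth_path) pairs for one sequence.

    Assumes colour and depth files share a frame index in their names,
    e.g. colour_0001.png / depth_0001.png; adjust to the real layout.
    """
    root = Path(sequence_dir)
    for rgb in sorted(root.glob("colour_*.png")):
        depth = root / rgb.name.replace("colour_", "depth_")
        if depth.exists():
            yield rgb, depth


for rgb, depth in paired_frames("ICT-3DHP/sequence01"):
    print(rgb, depth)
    break
```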