The LFW dataset contains 13,233 images of faces collected from the web. The dataset covers 5,749 identities, 1,680 of whom have two or more images. In the standard LFW evaluation protocol, verification accuracies are reported on 6,000 face pairs.
784 PAPERS • 13 BENCHMARKS
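For the LFW entry above, verification on face pairs comes down to thresholding a similarity score for each pair and counting how many same/different decisions are correct. The sketch below is only an illustration under stated assumptions, not the official evaluation code: the function name `verification_accuracy`, the toy scores, and the fixed threshold are all hypothetical, and a real evaluation would use the 6,000 pairs with a threshold chosen on held-out data.

```python
from typing import Sequence


def verification_accuracy(scores: Sequence[float],
                          same_identity: Sequence[bool],
                          threshold: float) -> float:
    """Fraction of pairs whose thresholded score matches the ground-truth label."""
    if len(scores) != len(same_identity):
        raise ValueError("scores and labels must have the same length")
    if len(scores) == 0:
        raise ValueError("at least one pair is required")
    # A pair is predicted "same identity" when its score reaches the threshold.
    correct = sum((score >= threshold) == label
                  for score, label in zip(scores, same_identity))
    return correct / len(scores)


# Toy data: hypothetical similarity scores, not real LFW results.
toy_scores = [0.91, 0.15, 0.67, 0.40]
toy_labels = [True, False, True, True]
accuracy = verification_accuracy(toy_scores, toy_labels, threshold=0.5)
```

In this toy example three of the four pairs are classified correctly; the last pair is a genuine match that falls below the assumed threshold.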
The CASIA-WebFace dataset is used for face verification and face identification tasks. The dataset contains 494,414 face images of 10,575 real identities collected from the web.
381 PAPERS • 2 BENCHMARKS
The MS-Celeb-1M dataset is a large-scale face recognition dataset consisting of 100K identities, each with about 100 facial images. The original identity labels are obtained automatically from webpages.
244 PAPERS • NO BENCHMARKS YET
The Extended Yale B database contains 2,414 frontal-face images of size 192×168 from 38 subjects, with about 64 images per subject. The images were captured under different lighting conditions and various facial expressions.
181 PAPERS • 1 BENCHMARK
MORPH is a facial age estimation dataset, which contains 55,134 facial images of 13,617 subjects ranging from 16 to 77 years old.
169 PAPERS • 8 BENCHMARKS
The IJB-B dataset is a template-based face dataset that contains 1,845 subjects with 11,754 images, 55,025 frames and 7,011 videos, where a template consists of a varying number of still images and video frames from different sources. These images and videos are collected from the Internet and are totally unconstrained, with large variations in pose, illumination, image quality, etc. In addition, the dataset comes with protocols for 1-to-1 template-based face verification, 1-to-N template-based open-set face identification, and 1-to-N open-set video face identification.
142 PAPERS • 5 BENCHMARKS
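In the IJB-B entry above, a template groups several still images and video frames of one subject. One common simplification for comparing two templates is to average the per-image embeddings of each template and take the cosine similarity of the results. The sketch below assumes the embeddings already exist (here as toy 2-D vectors); `pool_template` and `cosine_similarity` are hypothetical helper names, and this is not the dataset's official protocol code.

```python
import math
from typing import List, Sequence


def pool_template(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average a template's per-image embeddings into a single vector."""
    if len(embeddings) == 0:
        raise ValueError("a template needs at least one embedding")
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy templates: two media items for subject A, one for subject B (made-up values).
template_a = [[1.0, 0.0], [0.8, 0.2]]
template_b = [[0.9, 0.1]]
template_score = cosine_similarity(pool_template(template_a),
                                   pool_template(template_b))
```

Averaging is only one possible pooling choice; quality-weighted or media-aware pooling are alternatives that this sketch does not cover.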
The Adience dataset, published in 2014, contains 26,580 photos across 2,284 subjects with a binary gender label and one label from eight different age groups, partitioned into five splits. The key principle of the data set is to capture the images as close to real world conditions as possible, including all variations in appearance, pose, lighting condition and image quality, to name a few.
114 PAPERS • 6 BENCHMARKS
The VGG Face dataset is a face identity recognition dataset that consists of 2,622 identities. It contains over 2.6 million images.
88 PAPERS • NO BENCHMARKS YET
Designed to validate the racial bias of four commercial APIs and four state-of-the-art (SOTA) algorithms.
60 PAPERS • NO BENCHMARKS YET
Partial REID is a specially designed partial person reidentification dataset that includes 600 images from 60 people, with 5 full-body images and 5 occluded images per person. These images were collected on a university campus by 6 cameras with different viewpoints, backgrounds and types of occlusion. Examples of partial persons in the Partial REID dataset are shown in the figure.
39 PAPERS • NO BENCHMARKS YET
CASIA-FASD is a small face anti-spoofing dataset containing 50 subjects.
37 PAPERS • NO BENCHMARKS YET
The color FERET database is a dataset for face recognition. It contains 11,338 color images of size 512×768 pixels captured in a semi-controlled environment with 13 different poses from 994 subjects.
34 PAPERS • 3 BENCHMARKS
UMDFaces is a face dataset divided into two parts:
30 PAPERS • NO BENCHMARKS YET
CelebA-Spoof is a large-scale face anti-spoofing dataset with the following properties:
26 PAPERS • NO BENCHMARKS YET
TinyFace is a large-scale face recognition benchmark to facilitate the investigation of native low-resolution face recognition (LRFR) at large scales (large gallery population sizes) in deep learning. The TinyFace dataset consists of 5,139 labelled facial identities given by 169,403 native LR face images (average 20×16 pixels) designed for 1:N recognition tests. All the LR faces in TinyFace are collected from public web data across a large variety of imaging scenarios, captured under uncontrolled viewing conditions in pose, illumination, occlusion and background.
23 PAPERS • 1 BENCHMARK
IMDb-Face is a large-scale noise-controlled dataset for face recognition research. The dataset contains about 1.7 million faces of 59k identities, manually cleaned from 2.0 million raw images. All images are obtained from the IMDb website.
21 PAPERS • NO BENCHMARKS YET
Dataset for face anti-spoofing in terms of both subjects and modalities. Specifically, it consists of subjects with videos and each sample has modalities (i.e., RGB, Depth and IR).
20 PAPERS • NO BENCHMARKS YET
The Replay-Mobile Database for face spoofing consists of 1,190 video clips of photo and video attack attempts against 40 clients, under different lighting conditions. These videos were recorded with devices available on the market at the time: an iPad Mini2 (running iOS) and an LG-G4 smartphone (running Android). The database was produced at the Idiap Research Institute (Switzerland) within the framework of a collaboration with the Galician Research and Development Center in Advanced Telecommunications - Gradiant (Spain).
18 PAPERS • NO BENCHMARKS YET
WebFace260M is a million-scale face benchmark, which is constructed for the research community towards closing the data gap behind the industry.
A renovation of Labeled Faces in the Wild (LFW), the de facto standard testbed for unconstrained face verification.
15 PAPERS • 4 BENCHMARKS
An evaluation protocol for face verification focusing on a large intra-pair image quality difference.
15 PAPERS • 1 BENCHMARK
DigiFace-1M is a synthetic dataset for face recognition, obtained by rendering digital faces using a computer graphics pipeline. It contains 1.22M images of 110K unique identities. The dataset consists of two parts. The first part contains 720K images with 10K identities. For each identity, 4 different sets of accessories are sampled and 18 images are rendered for each set. The second part contains 500K images with 100K identities. For each identity, only one set of accessories is sampled and only 5 images are rendered. Following the format of the existing datasets, we provide the aligned crop around the face, resized to 112×112 resolution.
14 PAPERS • NO BENCHMARKS YET
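The image counts in the DigiFace-1M entry above follow directly from the per-identity rendering scheme. The short check below reproduces that arithmetic; it assumes the rounded figures "10K" and "100K" stand for exactly 10,000 and 100,000 identities, and the variable names are purely illustrative.

```python
# Part 1: 10K identities, 4 accessory sets per identity, 18 renders per set.
part1_identities = 10_000   # assumption: "10K" taken as exactly 10,000
part1_images = part1_identities * 4 * 18

# Part 2: 100K identities, 1 accessory set per identity, 5 renders.
part2_identities = 100_000  # assumption: "100K" taken as exactly 100,000
part2_images = part2_identities * 1 * 5

total_identities = part1_identities + part2_identities
total_images = part1_images + part2_images

print(f"part 1: {part1_images:,} images")   # 720,000
print(f"part 2: {part2_images:,} images")   # 500,000
print(f"total:  {total_images:,} images of {total_identities:,} identities")
```

Under these assumptions the totals match the stated 1.22M images and 110K identities.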
During the COVID-19 epidemic, almost everyone wears a facial mask, which poses a huge challenge to face recognition. Traditional face recognition systems may not effectively recognize masked faces, but removing the mask for authentication increases the risk of virus infection. The widespread requirement that people wear protective face masks in public places has driven a need to understand how face recognition technology deals with occluded faces, often with just the periocular area and above visible.
13 PAPERS • 1 BENCHMARK
A new face annotation dataset with balanced distribution between genders and ethnic origins.
10 PAPERS • 2 BENCHMARKS
QMUL-SurvFace is a surveillance face recognition benchmark that contains 463,507 face images of 15,573 distinct identities captured in real-world uncooperative surveillance scenes over wide space and time.
10 PAPERS • 1 BENCHMARK
Real-World Masked Face Dataset (RMFD) is a large dataset for masked face detection.
10 PAPERS • NO BENCHMARKS YET
MeGlass is an eyeglass dataset originally designed for eyeglass face recognition evaluation. All the face images are selected and cleaned from MegaFace. Each identity has at least two face images with eyeglasses and two face images without eyeglasses. It contains 47,817 images from 1,710 different identities.
8 PAPERS • NO BENCHMARKS YET
Contains over 11,000 images of 1,000 identities with different types of disguise accessories. The dataset is collected from the Internet, resulting in unconstrained face images similar to real-world settings.
7 PAPERS • 3 BENCHMARKS
Although deep face recognition has achieved impressive results in recent years, there is increasing controversy regarding racial and gender bias of the models, questioning their trustworthiness and deployment into sensitive scenarios. DemogPairs is a validation set with 10.8K facial images and 58.3M identity verification pairs, distributed in demographically-balanced folds of Asian, Black and White females and males. We also propose a benchmark of experiments using DemogPairs over state-of-the-art deep face recognition models in order to analyze their cross-demographic behavior and potential demographic biases (see figure below).
7 PAPERS • NO BENCHMARKS YET
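The two sizes quoted in the DemogPairs entry above are consistent with forming every unordered pair of images. The check below is a sketch under one explicit assumption: that "10.8K" means exactly 10,800 images. The variable names are illustrative only.

```python
import math

num_images = 10_800  # assumption: "10.8K" taken as exactly 10,800 images

# Number of unordered image pairs: n choose 2 = n * (n - 1) / 2.
num_pairs = math.comb(num_images, 2)

print(f"{num_pairs:,} pairs")                  # 58,314,600
print(f"about {num_pairs / 1e6:.1f}M pairs")   # about 58.3M
```

Under that assumption the all-pairs count rounds to the 58.3M verification pairs given in the description.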
The iCartoonFace dataset is a large-scale dataset that can be used for two different tasks: cartoon face detection and cartoon face recognition.
7 PAPERS • 1 BENCHMARK
Aims to facilitate research in caricature recognition. All the caricatures and face images were collected from the Web. Compared with two existing datasets, this dataset is much more challenging, with a much greater number of available images, artistic styles and larger intra-personal variations.
6 PAPERS • NO BENCHMARKS YET
The iQIYI-VID dataset comprises video clips from iQIYI variety shows, films, and television dramas. The whole dataset contains 500,000 video clips of 5,000 celebrities. The length of each video is 1 to 30 seconds.
The IDiff-Face dataset was proposed in the paper "IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-Conditioned Diffusion Models". This dataset is synthetically generated using the IDiff-Face model.
5 PAPERS • NO BENCHMARKS YET
The Masked LFW (MLFW) database, based on the Cross-Age LFW (CALFW) database, is built using a simple but effective tool that automatically generates masked faces from unmasked faces.
5 PAPERS • 1 BENCHMARK
ROF is a dataset for occluded face recognition that contains faces with both upper face occlusion, due to sunglasses, and lower face occlusion, due to masks.
A multimodal database for eye blink detection and attention level estimation.
The COVID-19 pandemic raises the problem of adapting face recognition systems to the new reality, where people may wear surgical masks to cover their noses and mouths. Traditional data sets (e.g., CelebA, CASIA-WebFace) used for training these systems were released before the pandemic, so they now seem unsuited due to the lack of examples of people wearing masks. We propose a method for enhancing data sets containing faces without masks by creating synthetic masks and overlaying them on faces in the original images. Our method relies on Spark AR Studio, a developer program made by Facebook that is used to create Instagram face filters. In our approach, we use 9 masks of different colors, shapes and fabrics. We employ our method to generate 445,446 (90%) mask samples for the CASIA-WebFace data set.
4 PAPERS • 1 BENCHMARK
The COVID-19 pandemic raises the problem of adapting face recognition systems to the new reality, where people may wear surgical masks to cover their noses and mouths. Traditional data sets (e.g., CelebA, CASIA-WebFace) used for training these systems were released before the pandemic, so they now seem unsuited due to the lack of examples of people wearing masks. We propose a method for enhancing data sets containing faces without masks by creating synthetic masks and overlaying them on faces in the original images. Our method relies on Spark AR Studio, a developer program made by Facebook that is used to create Instagram face filters. In our approach, we use 9 masks of different colors, shapes and fabrics. We employ our method to generate 196,254 (96.8%) mask samples for the CelebA data set.
Paper Abstract
3 PAPERS • 4 BENCHMARKS
We have cleaned the noisy IMDB-WIKI dataset using a constrained clustering method, resulting in this new benchmark for in-the-wild age estimation. The annotations also allow this dataset to be used for other tasks, such as gender classification and face recognition/verification. For more details, please refer to our FPAge paper.
3 PAPERS • 1 BENCHMARK
MCXFace is a heterogeneous face recognition dataset consisting of multi-channel image samples for 51 subjects. For each subject, color (RGB), thermal, near-infrared (850 nm), short-wave infrared (1300 nm), depth, stereo depth, and depth estimated from RGB images are available. Overall, 7,406 images together with landmark annotations and standard protocols are available in this dataset.
3 PAPERS • NO BENCHMARKS YET
CASIA-Face-Africa is a face image database which contains 38,546 images of 1,183 African subjects. Multi-spectral cameras are utilized to capture the face images under various illumination settings. Demographic attributes and facial expressions of the subjects are also carefully recorded. For landmark detection, each face image in the database is manually labeled with 68 facial keypoints. A group of evaluation protocols are constructed according to different applications, tasks, partitions and scenarios. The proposed database along with its face landmark annotations, evaluation protocols and preliminary results form a good benchmark to study the essential aspects of face biometrics for African subjects, especially face image preprocessing, face feature analysis and matching, facial expression recognition, sex/age estimation, ethnic classification, face image generation, etc.
2 PAPERS • NO BENCHMARKS YET
Celeb-HQ Face Gender Recognition Dataset
Celeb-HQ Facial Identity Recognition Dataset
KANFace consists of 40K still images and 44K sequences (14.5M video frames in total) captured in unconstrained, real-world conditions from 1,045 subjects. The dataset is manually annotated in terms of identity, exact age, gender and kinship.
2 PAPERS • 1 BENCHMARK
Comprised of real human and wax figure images and videos that address the problem of face spoofing detection. The dataset consists of more than 1,800 face images and 110 videos of 55 people/waxworks, arranged in training, validation and test sets with a large range of expression, illumination and pose variations.
The Chicago Face Database was developed at the University of Chicago by Debbie S. Ma, Joshua Correll, and Bernd Wittenbrink. The CFD is intended for use in scientific research. It provides high-resolution, standardized photographs of male and female faces of varying ethnicity between the ages of 17 and 65. Extensive norming data are available for each individual model. These data include both physical attributes (e.g., face size) and subjective ratings by independent judges (e.g., attractiveness).
1 PAPER • NO BENCHMARKS YET