FaceForensics++ is a forensics dataset consisting of 1000 original video sequences that have been manipulated with four automated face manipulation methods: Deepfakes, Face2Face, FaceSwap and NeuralTextures. The data has been sourced from 977 youtube videos and all videos contain a trackable mostly frontal face without occlusions which enables automated tampering methods to generate realistic forgeries.
311 PAPERS • 2 BENCHMARKS
Celeb-DF is a large-scale challenging dataset for deepfake forensics. It includes 590 original videos collected from YouTube with subjects of different ages, ethnic groups and genders, and 5639 corresponding DeepFake videos.
142 PAPERS • NO BENCHMARKS YET
The DFDC (Deepfake Detection Challenge) is a dataset for deepface detection consisting of more than 100,000 videos.
104 PAPERS • 1 BENCHMARK
FaceForensics is a video dataset consisting of more than 500,000 frames containing faces from 1004 videos that can be used to study image or video forgeries. All videos are downloaded from Youtube and are cut down to short continuous clips that contain mostly frontal faces. This dataset has two versions:
81 PAPERS • 1 BENCHMARK
FakeAVCeleb is a novel Audio-Video Deepfake dataset that not only contains deepfake videos but respective synthesized cloned audios as well.
34 PAPERS • 2 BENCHMARKS
WildDeepfake is a dataset for real-world deepfakes detection which consists of 7,314 face sequences extracted from 707 deepfake videos that are collected completely from the internet. WildDeepfake is a small dataset that can be used, in addition to existing datasets, to develop more effective detectors against real-world deepfakes.
34 PAPERS • NO BENCHMARKS YET
DeeperForensics-1.0 represents the largest face forgery detection dataset by far, with 60,000 videos constituted by a total of 17.6 million frames, 10 times larger than existing datasets of the same kind. The full dataset includes 48,475 source videos and 11,000 manipulated videos. The source videos are collected on 100 paid and consented actors from 26 countries, and the manipulated videos are generated by a newly proposed many-to-many end-to-end face swapping method, DF-VAE. 7 types of real-world perturbations at 5 intensity levels are employed to ensure a larger scale and higher diversity. Image Source: https://github.com/EndlessSora/DeeperForensics-1.0
21 PAPERS • NO BENCHMARKS YET
The Korean DeepFake Detection Dataset (KoDF) is a large-scale collection of synthesized and real videos focused on Korean subjects, used for the task of deepfake detection.
17 PAPERS • NO BENCHMARKS YET
WaveFake is a dataset for audio deepfake detection. The dataset consists of a large-scale dataset of over 100K generated audio clips.
Localized Audio Visual DeepFake Dataset (LAV-DF).
5 PAPERS • 2 BENCHMARKS
We construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across four tasks: 1) Image Forgery Classification, including two-way (real / fake), three-way (real / fake with identity-replaced forgery approaches / fake with identity-remained forgery approaches), and n-way (real and 15 respective forgery approaches) classification. 2) Spatial Forgery Localization, which segments the manipulated area of fake images compared to their corresponding source real images. 3) Video Forgery Classification, which re-defines the video-level forgery classification with manipulated frames in random positions. This task is important because attackers in real world are free to manipulate any target frame. and 4) Temporal Forgery Localization, to localize the temporal segments which are manipulated. ForgeryNet is by far the largest publicly available deep face forgery dataset in terms of data-scale (2.9 million images, 221,247 video
3 PAPERS • 2 BENCHMARKS
We created a new dataset, named DFDM, with 6,450 Deepfake videos generated by different Autoencoder models. Specifically, five Autoencoder models with variations in encoder, decoder, intermediate layer, and input resolution, respectively, have been selected to generate Deepfakes based on the same input. We have observed the visible but subtle visual differences among different Deepfakes, demonstrating the evidence of model attribution artifacts.
2 PAPERS • NO BENCHMARKS YET
DeePhy is a novel DeepFake Phylogeny dataset consisting of 5040 DeepFake videos generated using three different generation techniques. It is one of the first datasets which incorporates the concept of Deepfake Phylogeny which refers to the idea of generation of DeepFakes using multiple generation techniques in a sequential manner.
DeepFake MNIST+ is a deepfake facial animation dataset. The dataset is generated by a SOTA image animation generator. It includes 10,000 facial animation videos in ten different actions, which can spoof the recent liveness detectors.
The detection and localization of highly realistic deepfake audio-visual content are challenging even for the most advanced state-of-the-art methods. While most of the research efforts in this domain are focused on detecting high-quality deepfake images and videos, only a few works address the problem of the localization of small segments of audio-visual manipulations embedded in real videos. In this research, we emulate the process of such content generation and propose the AV-Deepfake1M dataset. The dataset contains content-driven (i) video manipulations, (ii) audio manipulations, and (iii) audio-visual manipulations for more than 2K subjects resulting in a total of more than 1M videos. The paper provides a thorough description of the proposed data generation pipeline accompanied by a rigorous analysis of the quality of the generated data. The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant
1 PAPER • NO BENCHMARKS YET
DEEP-VOICE: Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion This dataset contains examples of real human speech, and DeepFake versions of those speeches by using Retrieval-based Voice Conversion.
1 PAPER • 1 BENCHMARK
Human face Deepfake dataset sampled from large datasets
We release the dataset for non-commercial research. Submit requests <a href="https://forms.gle/6WPEGNWbYoEe6bte8" target="_blank">here</a>.