CN-Celeb is a large-scale speaker recognition dataset collected "in the wild". It contains more than 130,000 utterances from 1,000 Chinese celebrities and covers 11 different real-world genres.
63 PAPERS • 1 BENCHMARK
Oxford105k is the combination of the Oxford5k dataset and 99,782 negative images crawled from Flickr using the 145 most popular tags. It is used to evaluate object retrieval performance on a large scale, reported as mean average precision (mAP).
44 PAPERS • NO BENCHMARKS YET
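The mAP figure reported for Oxford105k can be illustrated with a minimal sketch. It assumes the plain, non-interpolated definition of average precision over a ranked result list; the official Oxford evaluation protocol may differ in details (for example in how ambiguous images are treated), and the function names here are chosen for illustration only.

```python
def average_precision(ranked_relevance, num_relevant):
    """Non-interpolated AP for one query.

    ranked_relevance: booleans in rank order (True = relevant result).
    num_relevant: total number of relevant items for the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / num_relevant if num_relevant else 0.0


def mean_average_precision(queries):
    """Mean of per-query AP; queries is a list of (ranked_relevance, num_relevant)."""
    aps = [average_precision(ranked, total) for ranked, total in queries]
    return sum(aps) / len(aps)
```

Adding distractor images, as Oxford105k does, tends to push relevant results further down the ranking, which lowers the precision terms summed above.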
STRING is a collection of protein-protein interaction (PPI) networks.
34 PAPERS • NO BENCHMARKS YET
HolStep is a dataset based on higher-order logic (HOL) proofs, for the purpose of developing new machine learning-based theorem-proving strategies.
10 PAPERS • 2 BENCHMARKS
The Oxford-Affine dataset is a small dataset containing 8 scenes with a sequence of 6 images per scene. The images in each sequence are related by homographies.
7 PAPERS • NO BENCHMARKS YET
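To make "related by homographies" concrete for the Oxford-Affine entry: a homography is a 3×3 matrix that maps a pixel in one image to the corresponding pixel in another via homogeneous coordinates. The sketch below is a generic illustration, not code from the dataset; the function name and the nested-list matrix layout are assumptions.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (nested lists, row-major)."""
    x, y = point
    # Multiply H by the homogeneous point (x, y, 1).
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the third coordinate to return to pixel coordinates.
    return (xh / w, yh / w)
```

Because of the final division, a homography is only defined up to scale: multiplying every entry of `H` by the same non-zero constant gives the same mapping.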
The AtariARI (Atari Annotated RAM Interface) is an environment for representation learning. The Atari Arcade Learning Environment (ALE) does not explicitly expose any ground-truth state information. However, ALE does expose the RAM state (128 bytes per timestep), which game programmers use to store important state information such as the location of sprites, the state of the clock, or the current room the agent is in. To extract these variables, the dataset creators consulted commented disassemblies (or source code) of Atari 2600 games made available by Engelhardt and Jentzsch and by CPUWIZ. They were able to find and verify important state variables for a total of 22 games. Combining this information with the ALE interface produced a wrapper that can automatically output a state label for every example frame generated from the game. The dataset creators make this available with an easy-to-use gym wrapper, which returns this infor
6 PAPERS • NO BENCHMARKS YET
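The idea behind AtariARI, reading labeled state variables out of the 128-byte RAM, can be sketched as a simple index lookup. This is not the AtariARI wrapper's API: the variable names and RAM indices below are hypothetical placeholders, not the annotations the dataset creators verified.

```python
# Hypothetical mapping from state-variable name to RAM byte index.
# These indices are made up for illustration only.
HYPOTHETICAL_RAM_MAP = {"player_x": 32, "player_y": 33, "score": 19}


def labels_from_ram(ram, ram_map):
    """Return {variable name: byte value} for one 128-byte RAM snapshot."""
    if len(ram) != 128:
        raise ValueError("expected a 128-byte Atari 2600 RAM snapshot")
    return {name: ram[index] for name, index in ram_map.items()}
```

Calling `labels_from_ram` on the RAM snapshot of each frame would yield one label dictionary per frame, which is the kind of per-frame state label the entry describes.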
The GoodSounds dataset contains around 28 hours of recordings of single notes and scales played by 15 different professional musicians, all of them holding a music degree and having some expertise in teaching. 12 different instruments (flute, cello, clarinet, trumpet, violin, alto sax, tenor sax, baritone sax, soprano sax, oboe, piccolo and bass) were recorded using between one and four different microphones. For every instrument, the whole set of playable semitones is recorded several times with different tonal characteristics. Each note is recorded into a separate monophonic audio file at 48 kHz and 32 bits. Rich annotations of the recordings are available, including details on the recording environment and ratings of the tonal qualities of the sound (“good-sound”, “bad”, “scale-good”, “scale-bad”).
4 PAPERS • NO BENCHMARKS YET
The Deep Fakes Dataset is a collection of "in the wild" portrait videos for deepfake detection. The videos in the dataset are diverse real-world samples in terms of the source generative model, resolution, compression, illumination, aspect ratio, frame rate, motion, pose, cosmetics, occlusion, content, and context. They originate from various sources such as news articles, forums, apps, and research presentations, totalling 142 videos, 32 minutes, and 17 GB. Synthetic videos are matched with their original counterparts when possible.
3 PAPERS • NO BENCHMARKS YET
The Specs on Faces (SoF) dataset is a collection of 42,592 (2,662×16) images of 112 persons (66 males and 46 females) who wear glasses under different illumination conditions. The dataset is free for reasonable academic fair use. It focuses on two challenges: harsh illumination environments and face occlusions, which strongly affect face detection, recognition, and classification. Glasses are the common natural occlusion in all images of the dataset. In addition, two synthetic occlusions (nose and mouth) were added to each image, and three image filters that may evade face detectors and facial recognition systems were applied to each image. All generated images are categorized into three levels of difficulty (easy, medium, and hard). This brings the total to 42,592 images (26,112 male images and 16,480 female images). There is metadata for each image that contains many infor
Repository of a generative art dataset by computer artist Andy Lomas.
2 PAPERS • NO BENCHMARKS YET