The ImageNet1K dataset, also known as ILSVRC 2012, is a subset of the larger ImageNet dataset. It is commonly used for pretraining deep learning models for computer vision tasks. Here are some key details about the ImageNet1K dataset:

1. It spans 1,000 object classes.
2. It contains 1,281,167 training images, 50,000 validation images, and 100,000 test images.
3. The images in the dataset are organized according to the WordNet hierarchy.
4. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a “synonym set” or “synset”. ImageNet aims to provide on average 1,000 images to illustrate each synset.
5. The images of each concept are quality-controlled and human-annotated.
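As a sketch of how the dataset is typically consumed for pretraining, the snippet below iterates ImageNet-1K through torchvision; it assumes the ILSVRC 2012 archives are already on disk under ./imagenet, since torchvision cannot download them automatically.

```python
import torch
from torchvision import datasets, transforms

# Standard training-time preprocessing for 224x224 inputs.
preprocess = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])

# Requires the ILSVRC 2012 archives to be present under ./imagenet.
train_set = datasets.ImageNet(root="./imagenet", split="train", transform=preprocess)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=256, shuffle=True, num_workers=8
)

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)  # torch.Size([256, 3, 224, 224]) torch.Size([256])
```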
775 PAPERS • 1 BENCHMARK
Freesound Dataset 50k (or FSD50K for short) is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra. It consists mainly of sound events produced by physical sound sources and production mechanisms, including human sounds, sounds of things, animals, natural sounds, musical instruments and more.
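The sketch below reads one labeled clip from the development split; the file layout (FSD50K.ground_truth/dev.csv with a comma-separated labels column, audio under FSD50K.dev_audio/) follows the public release but should be verified against your download.

```python
import pandas as pd
import soundfile as sf

# Ground truth: one row per clip; clips are multi-label,
# with labels stored as a comma-separated string.
gt = pd.read_csv("FSD50K.ground_truth/dev.csv")
row = gt.iloc[0]
labels = row["labels"].split(",")

# Clips are WAV files named by their Freesound id.
audio, sr = sf.read(f"FSD50K.dev_audio/{row['fname']}.wav")
print(f"clip {row['fname']}: {len(audio) / sr:.1f}s at {sr} Hz, labels: {labels}")
```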
116 PAPERS • 2 BENCHMARKS
GuitarSet is a dataset of high-quality guitar recordings and rich annotations. It contains 360 excerpts, each 30 seconds in length. The 360 excerpts are the result of the following combinations: 6 players × 2 versions (comping and soloing) × 5 styles × 3 chord progressions × 2 tempi. A quick way to explore an excerpt's annotations is shown below.
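GuitarSet's annotations ship as JAMS files, so they can be inspected with the jams package; the filename below follows the release's player_style-tempo-key_version naming pattern but is illustrative.

```python
import jams

# Load one excerpt's annotation file (illustrative filename).
jam = jams.load("00_BN1-129-Eb_comp.jams")
print(f"duration: {jam.file_metadata.duration:.1f}s")  # ~30 seconds

# Each file carries several annotation namespaces,
# e.g. chord, key_mode, note_midi, pitch_contour.
for ann in jam.annotations:
    print(ann.namespace)
```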
24 PAPERS • NO BENCHMARKS YET
The FIVR-200K dataset has been collected to simulate the problem of Fine-grained Incident Video Retrieval (FIVR). The dataset comprises 225,960 videos associated with 4,687 Wikipedia events and 100 selected video queries.
15 PAPERS • 1 BENCHMARK
The DAD dataset contains normal driving videos together with a set of anomalous actions in its training set. Its test set includes unseen anomalous actions that still need to be distinguished from normal driving.
8 PAPERS • NO BENCHMARKS YET
Clinical diagnosis of the eye is performed over multifarious data modalities including scalar clinical labels, vectorized biomarkers, two-dimensional fundus images, and three-dimensional Optical Coherence Tomography (OCT) scans. While the clinical labels, fundus images and OCT scans are instrumental measurements, the vectorized biomarkers are interpreted attributes from the other measurements. Clinical practitioners use all these data modalities for diagnosing and treating eye diseases like Diabetic Retinopathy (DR) or Diabetic Macular Edema (DME). Enabling usage of machine learning algorithms within the ophthalmic medical domain requires research into the relationships and interactions between these relevant data modalities. Existing datasets are limited in that: (i) they view the problem as disease prediction without assessing biomarkers, and (ii) they do not consider the explicit relationship among all four data modalities over the treatment period. In this paper, we introduce the OLIVES dataset.
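As an illustration of how the four modalities described above might be grouped per patient visit, here is a minimal container; the field names and array shapes are hypothetical, not the dataset's actual schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class EyeVisit:
    clinical_labels: dict      # scalar clinical labels (hypothetical keys)
    biomarkers: np.ndarray     # vectorized biomarkers interpreted from the scans
    fundus: np.ndarray         # 2D fundus photograph, shape (H, W, 3)
    oct_volume: np.ndarray     # 3D OCT scan, shape (slices, H, W)

# Placeholder values standing in for real measurements.
visit = EyeVisit(
    clinical_labels={"visual_acuity": 65.0},
    biomarkers=np.zeros(16),
    fundus=np.zeros((512, 512, 3)),
    oct_volume=np.zeros((49, 496, 512)),
)
print(visit.oct_volume.shape)
```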
4 PAPERS • NO BENCHMARKS YET
The US-4 is a dataset of Ultrasound (US) images. It is a video-based image dataset that contains over 23,000 high-resolution images from four US video sub-datasets, two of which were newly collected by experienced doctors for this dataset.
3 PAPERS • NO BENCHMARKS YET
CommitBART is a benchmark for researching commit-related tasks such as denoising, cross-modal generation, and contrastive learning. The dataset contains over 7 million commits across 7 programming languages.
2 PAPERS • NO BENCHMARKS YET
The Extended Agriculture-Vision dataset comprises two parts.
Millions of people around the world have low or no vision. Assistive software applications have been developed for a variety of day-to-day tasks, including currency recognition. To aid with this task, we present BankNote-Net, an open dataset for assistive currency recognition. The dataset consists of a total of 24,816 embeddings of banknote images captured in a variety of assistive scenarios, spanning 17 currencies and 112 denominations. These compliant embeddings were learned using supervised contrastive learning and a MobileNetV2 architecture, and they can be used to train and test specialized downstream models for any currency, including those not covered by our dataset or for which only a few real images per denomination are available (few-shot learning). We deploy a variation of this model for public use in the latest version of the Seeing AI app developed by Microsoft, which has over 100,000 monthly active users.
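As a sketch of the downstream workflow the description mentions, the snippet below fits a simple classifier on the released embeddings; the file name and column names are assumptions to be checked against the actual release.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("banknote_net.csv")  # assumed file name; one row per embedding
# Assumed label columns; the remaining columns are the embedding dimensions.
y = df["Denomination"]
X = df.drop(columns=["Currency", "Denomination"]).to_numpy()

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.3f}")
```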
1 PAPER • NO BENCHMARKS YET
MatSim is a synthetic dataset and natural-image benchmark for computer-vision-based recognition of similarities and transitions between materials and textures. It focuses on identifying any material under any conditions from one or a few examples (one-shot learning), including material states and subclasses.
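A minimal sketch of the one-shot matching this benchmark targets: score a query image against a single reference embedding per material and pick the closest. The embedding step is abstracted away here; the vectors and material names are placeholders, not part of the MatSim release.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_material(query: np.ndarray, references: dict) -> str:
    """Pick the material whose single reference embedding is closest."""
    return max(references, key=lambda name: cosine(query, references[name]))

# Placeholder embeddings standing in for a real descriptor network.
rng = np.random.default_rng(0)
references = {"rust": rng.random(128), "moss": rng.random(128)}
print(match_material(rng.random(128), references))
```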