The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
10,147 PAPERS • 92 BENCHMARKS
The CheXpert dataset contains 224,316 chest radiographs of 65,240 patients with both frontal and lateral views available. The task is automated chest X-ray interpretation, and the dataset features uncertainty labels and radiologist-labeled reference standard evaluation sets.
508 PAPERS • 1 BENCHMARK
The NUS-WIDE dataset contains 269,648 images with a total of 5,018 tags collected from Flickr. These images are manually annotated with 81 concepts, including objects and scenes.
320 PAPERS • 3 BENCHMARKS
ChestX-ray14 is a medical imaging dataset comprising 112,120 frontal-view X-ray images of 30,805 unique patients, collected from 1992 to 2015, with fourteen common disease labels text-mined from the radiological reports via NLP techniques. It expands on ChestX-ray8 by adding six additional thorax diseases: Consolidation, Edema, Emphysema, Fibrosis, Pleural Thickening and Hernia.
206 PAPERS • 5 BENCHMARKS
MIMIC-CXR from Massachusetts Institute of Technology presents 371,920 chest X-rays associated with 227,943 imaging studies from 65,079 patients. The studies were performed at Beth Israel Deaconess Medical Center in Boston, MA.
165 PAPERS • 2 BENCHMARKS
PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are:
119 PAPERS • 14 BENCHMARKS
ECHR is an English legal judgment prediction dataset of cases from the European Court of Human Rights (ECHR). The dataset contains ~11.5k cases, including the raw text.
35 PAPERS • 1 BENCHMARK
The friedman1 data set is commonly used to test semi-supervised regression methods.
29 PAPERS • NO BENCHMARKS YET
The MRNet dataset consists of 1,370 knee MRI exams performed at Stanford University Medical Center. The dataset contains 1,104 (80.6%) abnormal exams, with 319 (23.3%) ACL tears and 508 (37.1%) meniscal tears; labels were obtained through manual extraction from clinical reports.
23 PAPERS • 1 BENCHMARK
Our goal is to improve upon the status quo for designing image classification models trained in one domain that perform well on images from another domain. Complementing existing work in robustness testing, we introduce the first test dataset for this purpose which comes from an authentic use case where photographers wanted to learn about the content in their images. We built a new test set using 8,900 images taken by people who are blind for which we collected metadata to indicate the presence versus absence of 200 ImageNet object categories. We call this dataset VizWiz-Classification.
21 PAPERS • 3 BENCHMARKS
CAL500 (Computer Audition Lab 500) is a dataset aimed at the evaluation of music information retrieval systems. It consists of 502 songs picked from Western popular music. The audio is represented as a time series of the first 13 Mel-frequency cepstral coefficients (and their first and second derivatives) extracted by sliding a 12 ms half-overlapping short-time window over the waveform of each song. Each song has been annotated by at least 3 people with 135 musically-relevant concepts spanning six semantic categories:
20 PAPERS • NO BENCHMARKS YET
The objective in extreme multi-label classification is to learn feature architectures and classifiers that can automatically tag a data point with the most relevant subset of labels from an extremely large label set. This repository provides resources that can be used for evaluating the performance of extreme multi-label algorithms including datasets, code, and metrics.
18 PAPERS • NO BENCHMARKS YET
LSHTC is a dataset for large-scale text classification. The data used in the LSHTC challenges originates from two popular sources: DBpedia and the ODP (Open Directory Project) directory, also known as DMOZ. DBpedia instances were selected from the English, non-regional Extended Abstracts provided by the DBpedia site. The DMOZ instances consist of either Content vectors, Description vectors or both. A Content vector is obtained by directly indexing the web page using a standard indexing chain (preprocessing, stemming/lemmatization, stop-word removal).
OpenImages V6 is a large-scale dataset consisting of 9 million training images, 41,620 validation samples, and 125,456 test samples. It is a partially annotated dataset with 9,600 trainable classes.
17 PAPERS • 3 BENCHMARKS
MLRSNet is a multi-label high spatial resolution remote sensing dataset for semantic scene understanding. It provides different perspectives of the world captured from satellites; that is, it is composed of high spatial resolution optical satellite images. MLRSNet contains 109,161 remote sensing images that are annotated into 46 categories, and the number of sample images in a category varies from 1,500 to 3,000. The images have a fixed size of 256×256 pixels with various pixel resolutions (~10 m to 0.1 m). Moreover, each image in the dataset is tagged with several of 60 predefined class labels, and the number of labels associated with each image varies from 1 to 13. The dataset can be used for multi-label based image classification, multi-label based image retrieval, and image segmentation.
10 PAPERS • 1 BENCHMARK
The Arxiv ASTRO-PH (Astro Physics) collaboration network is from the e-print arXiv and covers scientific collaborations between authors of papers submitted to the Astro Physics category. If an author i co-authored a paper with author j, the graph contains an undirected edge from i to j. If the paper is co-authored by k authors, this generates a completely connected (sub)graph on k nodes.
10 PAPERS • 2 BENCHMARKS
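The edge rule described for the ASTRO-PH collaboration network (one undirected edge per co-author pair, so a paper with k authors yields a complete subgraph on k nodes) can be sketched as follows. This is a minimal illustration on made-up data, not the procedure used to build the actual network; the function name `coauthorship_edges` and the toy author names are assumptions introduced here.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def coauthorship_edges(papers: Iterable[List[str]]) -> Set[Tuple[str, str]]:
    """Return the set of undirected co-authorship edges.

    Each paper is a list of author identifiers. Every pair of distinct
    authors on the same paper contributes one edge, so a paper with k
    authors contributes a complete subgraph on k nodes.
    """
    edges: Set[Tuple[str, str]] = set()
    for authors in papers:
        # Sorting makes (i, j) and (j, i) collapse to one undirected edge.
        for i, j in combinations(sorted(set(authors)), 2):
            edges.add((i, j))
    return edges


# Hypothetical toy input: three papers with overlapping author lists.
papers = [["ana", "bo", "cy"], ["bo", "dee"], ["ana", "bo"]]
edges = coauthorship_edges(papers)
print(sorted(edges))
```

The first toy paper has three authors and therefore contributes all three possible pairs; the third paper repeats a pair that already exists, so it adds no new edge.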
CHiME-Home is a dataset for sound source recognition in a domestic environment. It uses around 6.8 hours of domestic environment audio recordings. The recordings were obtained from the CHiME projects – computational hearing in multisource environments – where recording equipment was positioned inside an English Victorian semi-detached house. The recordings were selected from 22 sessions totalling 19.5 hours, with each session made between 7:30 in the morning and 20:00 in the evening. In the considered recordings, the equipment was placed in the lounge (sitting room) near the door opening onto a hallway, with the hallway opening onto a kitchen with no door. With the lounge door typically open, prominent sounds thus may originate from sources both in the lounge and kitchen.
4 PAPERS • NO BENCHMARKS YET
For each dataset we provide a short description as well as some characterization metrics. These include the number of instances (m), number of attributes (d), number of labels (q), cardinality (Card), density (Dens), diversity (Div), average Imbalance Ratio per label (avgIR), ratio of unconditionally dependent label pairs by chi-square test (rDep) and complexity, defined as m × q × d as in [Read 2010]. Cardinality measures the average number of labels associated with each instance, and density is defined as cardinality divided by the number of labels. Diversity represents the percentage of labelsets present in the dataset divided by the number of possible labelsets. The avgIR measures the average degree of imbalance of all labels; the greater the avgIR, the greater the imbalance of the dataset. Finally, rDep measures the proportion of pairs of labels that are dependent at 99% confidence. A broader description of all the characterization metrics and the partition methods used is given in
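As a rough illustration of the simpler characterization metrics above, the sketch below computes cardinality, density, the number of distinct labelsets, and complexity (m × q × d) on a toy multi-label dataset. The function name `label_metrics`, the dictionary keys, and the toy data are assumptions made for this example; diversity, avgIR and rDep are left out because their exact formulas are not spelled out here.

```python
from typing import Dict, List, Set


def label_metrics(labelsets: List[Set[str]], q: int, d: int) -> Dict[str, float]:
    """Compute basic multi-label characterization metrics.

    labelsets: one set of labels per instance (m instances in total)
    q: total number of labels in the dataset
    d: number of attributes (features) per instance
    """
    m = len(labelsets)
    # Cardinality: average number of labels associated with each instance.
    cardinality = sum(len(s) for s in labelsets) / m
    # Density: cardinality divided by the number of labels.
    density = cardinality / q
    # Number of distinct labelsets actually observed in the data.
    distinct_labelsets = len({frozenset(s) for s in labelsets})
    # Complexity: m x q x d.
    complexity = m * q * d
    return {
        "m": m,
        "cardinality": cardinality,
        "density": density,
        "distinct_labelsets": distinct_labelsets,
        "complexity": complexity,
    }


# Hypothetical toy dataset: 4 instances, 3 labels, 5 attributes.
toy = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "b"}]
metrics = label_metrics(toy, q=3, d=5)
print(metrics)
```

In this toy case the four instances carry 7 labels in total, so cardinality is 1.75 and density is 1.75 / 3.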
Sewer-ML is a sewer defect dataset. It contains 1.3 million images, from 75,618 videos collected from three Danish water utility companies over nine years. All videos have been annotated by licensed sewer inspectors following the Danish sewer inspection standard, Fotomanualen. This leads to consistent and reliable annotations, and a total of 17 annotated defect classes.
This dataset contains Bangla handwritten numerals, basic characters and compound characters. It was collected from multiple geographical locations within Bangladesh and includes samples collected from a variety of age groups. The dataset can also be used for other classification problems, e.g., gender, age, district.
3 PAPERS • 2 BENCHMARKS
Moviescope is a large-scale dataset of 5,000 movies with corresponding video trailers, posters, plots and metadata. Moviescope is based on the IMDB 5000 dataset consisting of 5,043 movie records. It is augmented by crawling video trailers associated with each movie from YouTube and text plots from Wikipedia.
3 PAPERS • NO BENCHMARKS YET
This dataset aims to help V-NLIs recognize analytic tasks from free-form natural language by training and evaluating cutting-edge multi-label classification models. The dataset contains diverse user queries, each annotated with one or multiple analytic tasks.
2 PAPERS • NO BENCHMARKS YET
CAVES is the first large-scale dataset containing about 10k COVID-19 anti-vaccine tweets labelled into various specific anti-vaccine concerns in a multi-label setting. This is also the first multi-label classification dataset that provides explanations for each of the labels. Additionally, the dataset provides class-wise summaries of all the tweets.
1 PAPER • NO BENCHMARKS YET
This data is for the Mis2-KDD 2021 under review paper: Dataset of Propaganda Techniques of the State-Sponsored Information Operation of the People’s Republic of China
1 PAPER • 1 BENCHMARK
MTC is a financial-domain dataset for the multi-label topic classification task. It aims to identify the topics of the spoken dialogue.
ScienceExamCER is a collection of resources for studying explanation-centered inference, including explanation graphs for 1,680 questions, with 4,950 tablestore rows, and other analyses of the knowledge required to answer elementary and middle-school science questions.
Trailers12k is a movie trailer dataset comprising 12,000 titles associated with ten genres. It distinguishes itself from other datasets by its collection procedure, which is aimed at providing a high-quality publicly available dataset.
The KIT Whole-Body Human Motion Database is a large-scale dataset of whole-body human motion with methods and tools, which allows a unifying representation of captured human motion, and efficient search in the database, as well as the transfer of subject-specific motions to robots with different embodiments. Captured subject-specific motion is normalized regarding the subject’s height and weight by using a reference kinematics and dynamics model of the human body, the master motor map (MMM). In contrast with previous approaches and human motion databases, the motion data in this database consider not only the motions of the human subject but the position and motion of objects with which the subject is interacting as well. In addition to the description of the MMM reference model, see the paper for procedures and techniques used for the systematic recording, labeling, and organization of human motion capture data, object motions as well as the subject–object relations.
0 PAPER • NO BENCHMARKS YET