15 dataset results for Active Learning

CIFAR-10 (Canadian Institute for Advanced Research, 10 classes)

The CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes) is a subset of the Tiny Images dataset and consists of 60,000 32x32 color images. The images are labelled with one of 10 mutually exclusive classes: airplane, automobile (but not truck or pickup truck), bird, cat, deer, dog, frog, horse, ship, and truck (but not pickup truck). There are 6,000 images per class, split into 5,000 training and 1,000 testing images per class.

14,087 PAPERS • 98 BENCHMARKS

MNIST-8M (Infinite MNIST)

MNIST-8M is derived from the MNIST dataset by applying random deformations and translations to the original digit images.

26 PAPERS • NO BENCHMARKS YET
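As an illustration of the translation step mentioned in the MNIST-8M description, here is a minimal sketch in plain Python. It is not the generator used to build MNIST-8M: the function names `translate` and `random_translate`, the `max_shift` parameter, and the zero fill value are all assumptions made for this example, and the deformation step is not shown.

```python
import random


def translate(image, dy, dx, fill=0):
    """Shift a 2-D list-of-lists image by (dy, dx), filling vacated pixels."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            # Pixels shifted outside the frame are dropped.
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out


def random_translate(image, max_shift, rng):
    """Apply a translation drawn uniformly from [-max_shift, max_shift] per axis."""
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    return translate(image, dy, dx)


# Hypothetical 3x3 "digit" with a single lit pixel in the centre.
digit = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
shifted = translate(digit, 0, 1)  # move one pixel to the right
augmented = random_translate(digit, max_shift=1, rng=random.Random(0))
```

Each call to `random_translate` yields a new variant of the same image, which is how a small labelled set can be expanded into a much larger one.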

DeepWeeds

The DeepWeeds dataset consists of 17,509 images capturing eight different weed species native to Australia in situ with neighbouring flora.

19 PAPERS • NO BENCHMARKS YET

FDST (Fudan-ShanghaiTech)

The Fudan-ShanghaiTech dataset (FDST) is a dataset for video crowd counting. It contains 15K frames with about 394K annotated heads captured from 13 different scenes.

17 PAPERS • NO BENCHMARKS YET

DialoGLUE

DialoGLUE is a natural language understanding benchmark for task-oriented dialogue designed to encourage dialogue research in representation-based transfer, domain adaptation, and sample-efficient task learning. It consists of 7 task-oriented dialogue datasets covering 4 distinct natural language understanding tasks.

16 PAPERS • 2 BENCHMARKS

Industrial Benchmark

A benchmark that bridges the gap between freely available, documented, and motivated artificial benchmarks and the properties of real industrial problems. The resulting industrial benchmark (IB) has been made publicly available to the RL community by publishing its Java and Python code, including an OpenAI Gym wrapper, on GitHub.

12 PAPERS • NO BENCHMARKS YET

Groove (Groove MIDI Dataset)

The Groove MIDI Dataset (GMD) is composed of 13.6 hours of aligned MIDI and (synthesized) audio of human-performed, tempo-aligned expressive drumming. The dataset contains 1,150 MIDI files and over 22,000 measures of drumming.

11 PAPERS • NO BENCHMARKS YET

HJDataset

HJDataset is a large dataset of Historical Japanese Documents with Complex Layouts. It contains over 250,000 layout element annotations of seven types. In addition to bounding boxes and masks of the content regions, it also includes the hierarchical structures and reading orders for layout elements. The dataset is constructed using a combination of human and machine efforts.

5 PAPERS • NO BENCHMARKS YET

SYNTHIA-AL

SYNTHIA-AL is a dataset specially designed to evaluate active learning for video object detection in road scenes.

4 PAPERS • NO BENCHMARKS YET

Arxiv GR-QC (General Relativity and Quantum Cosmology collaboration network)

The Arxiv GR-QC (General Relativity and Quantum Cosmology) collaboration network is from the e-print arXiv and covers scientific collaborations between authors of papers submitted to the General Relativity and Quantum Cosmology category. If an author i co-authored a paper with author j, the graph contains an undirected edge between i and j. If a paper is co-authored by k authors, this generates a completely connected (sub)graph on k nodes.

3 PAPERS • 2 BENCHMARKS
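The edge rule in the Arxiv GR-QC description (every pair of co-authors is linked, so a paper with k authors yields a complete subgraph on k nodes) can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the function name `build_collaboration_edges` and the toy author lists are hypothetical.

```python
from itertools import combinations


def build_collaboration_edges(papers):
    """Return the set of undirected edges implied by a list of author lists."""
    edges = set()
    for authors in papers:
        # Deduplicate and sort so each pair has one canonical orientation.
        unique = sorted(set(authors))
        # A paper with k authors contributes k * (k - 1) / 2 edges (a clique).
        for a, b in combinations(unique, 2):
            edges.add((a, b))
    return edges


# Hypothetical toy input: one two-author paper and one three-author paper.
papers = [["i", "j"], ["j", "k", "l"]]
edges = build_collaboration_edges(papers)
```

Storing each edge as a sorted tuple in a set means that repeated collaborations between the same two authors collapse into a single undirected edge.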

COMP6 (COmprehensive Machine-learning Potential)

COMP6 is a benchmark for evaluating the extensibility of machine-learning based molecular potentials. It contains a diverse set of organic molecules.

3 PAPERS • NO BENCHMARKS YET

Goldfinch (GOogLe image-search Dataset)

Goldfinch is a dataset for fine-grained recognition challenges. It contains a list of bird, butterfly, aircraft, and dog categories with relevant Google image search and Flickr search URLs. In addition, it also includes a set of active learning annotations on dog categories.

3 PAPERS • NO BENCHMARKS YET

Photoswitch

A benchmark for molecular machine learning where improvements in model performance can be immediately observed in the throughput of promising molecules synthesized in the lab. Photoswitches are a versatile class of molecule for medical and renewable energy applications where a molecule's efficacy is governed by its electronic transition wavelengths.

3 PAPERS • NO BENCHMARKS YET

Illness-dataset (Illness multi-domain textual dataset)

A dataset for evaluating text classification, domain adaptation, and active learning models. The dataset consists of 22,660 documents (tweets) collected in 2018 and 2019. It spans four domains: Alzheimer's, Parkinson's, Cancer, and Diabetes.

2 PAPERS • NO BENCHMARKS YET

L-Bird (Large-Bird)

The L-Bird (Large-Bird) dataset contains nearly 4.8 million images obtained by searching the Internet for images of 10,982 bird species.

2 PAPERS • NO BENCHMARKS YET
