TyDi QA is a question answering dataset covering 11 typologically diverse languages with 200K question-answer pairs. The languages of TyDi QA are diverse with regard to their typology — the set of linguistic features that each language expresses — such that the authors expect models performing well on this set to generalize across a large number of the languages in the world.
153 PAPERS • 1 BENCHMARK
The Meta-Dataset benchmark is a large few-shot learning benchmark and consists of multiple datasets of different data distributions. It does not restrict few-shot tasks to have fixed ways and shots, thus representing a more realistic scenario. It consists of 10 datasets from diverse domains:
120 PAPERS • 2 BENCHMARKS
The M4 dataset is a collection of 100,000 time series used for the fourth edition of the Makridakis forecasting Competition. The M4 dataset consists of time series of yearly, quarterly, monthly and other (weekly, daily and hourly) data, which are divided into training and test sets. The minimum numbers of observations in the training test are 13 for yearly, 16 for quarterly, 42 for monthly, 80 for weekly, 93 for daily and 700 for hourly series. The participants were asked to produce the following numbers of forecasts beyond the available data that they had been given: six for yearly, eight for quarterly, 18 for monthly series, 13 for weekly series and 14 and 48 forecasts respectively for the daily and hourly ones.
85 PAPERS • NO BENCHMARKS YET
The Omniglot data set is designed for developing more human-like learning algorithms. It contains 1623 different handwritten characters from 50 different alphabets. Each of the 1623 characters was drawn online via Amazon's Mechanical Turk by 20 different people. Each image is paired with stroke data, a sequences of [x,y,t] coordinates with time (t) in milliseconds.
83 PAPERS • 15 BENCHMARKS
BlendedMVS is a novel large-scale dataset, to provide sufficient training ground truth for learning-based MVS. The dataset was created by applying a 3D reconstruction pipeline to recover high-quality textured meshes from images of well-selected scenes. Then, these mesh models were rendered to color images and depth maps.
74 PAPERS • NO BENCHMARKS YET
An open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks.
48 PAPERS • NO BENCHMARKS YET
TyDiQA is the gold passage version of the Typologically Diverse Question Answering (TyDiWA) dataset, a benchmark for information-seeking question answering, which covers nine languages. The gold passage version is a simplified version of the primary task, which uses only the gold passage as context and excludes unanswerable questions. It is thus similar to XQuAD and MLQA, while being more challenging as questions have been written without seeing the answers, leading to 3× and 2× less lexical overlap compared to XQuAD and MLQA respectively.
37 PAPERS • 1 BENCHMARK
The MQ2007 dataset consists of queries, corresponding retrieved documents and labels provided by human experts. The possible relevance labels for each document are “relevant”, “partially relevant”, and “not relevant”.
30 PAPERS • NO BENCHMARKS YET
Encourages machine learning research in this area and to help facilitate further work in understanding and mitigating the effects of climate change.
18 PAPERS • NO BENCHMARKS YET
Kuzushiji-49 is an MNIST-like dataset that has 49 classes (28x28 grayscale, 270,912 images) from 48 Hiragana characters and one Hiragana iteration mark.
10 PAPERS • NO BENCHMARKS YET
Dataset for multi-target classification of five commonly appearing concrete defects.
9 PAPERS • NO BENCHMARKS YET
Europeana Newspapers consists of four datasets with 100 pages each for the languages Dutch, French, German (including Austrian) as part of the Europeana Newspapers project is expected to contribute to the further development and improvement of named entity recognition systems with a focus on historical content.
5 PAPERS • NO BENCHMARKS YET
This prostate MRI segmentation dataset is collected from six different data sources.
3 PAPERS • NO BENCHMARKS YET
The FIGR-8 database is a dataset containing 17,375 classes of 1,548,256 images representing pictograms, ideograms, icons, emoticons or object or conception depictions. Its aim is to set a benchmark for Few-shot Image Generation tasks, albeit not being limited to it. Each image is represented by 192x192 pixels with grayscale value of 0-255. Classes are not balanced (they do not all contain the same number of elements), but they all do contain at the very least 8 images.
2 PAPERS • NO BENCHMARKS YET
The L-Bird (Large-Bird) dataset contains nearly 4.8 million images which are obtained by searching images of a total of 10,982 bird species from the Internet.
A multi-sensor, multi-modal dataset, implemented to benchmark Human Activity Recognition(HAR) and Multi-modal Fusion algorithms. Collection of this dataset was inspired by the need for recognising and evaluating quality of exercise performance to support patients with Musculoskeletal Disorders(MSD).
ToM-in-AMC is a novel NLP benchmark, Short for Theory-of-Mind meta-learning Assessment with Movie Characters. The benchmark consists of 1,000 parsed movie scripts for this purpose, each corresponding to a few-shot character understanding task.
Chinese Literature NER RE is a Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text. It is constructed from hundreds of Chinese literature articles.
1 PAPER • NO BENCHMARKS YET
Meta Omnium is a dataset-of-datasets spanning multiple vision tasks including recognition, keypoint localization, semantic segmentation and regression. Meta Omnium enables meta-learning researchers to evaluate model generalization to a much wider array of tasks than previously possible, and provides a single framework for evaluating meta-learners across a wide suite of vision applications in a consistent manner.