Cityscapes is a large-scale database which focuses on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories (flat surfaces, humans, vehicles, constructions, objects, nature, sky, and void). The dataset consists of around 5000 fine annotated images and 20000 coarse annotated ones. Data was captured in 50 cities during several months, daytimes, and good weather conditions. It was originally recorded as video so the frames were manually selected to have the following features: large number of dynamic objects, varying scene layout, and varying background.
3,323 PAPERS • 54 BENCHMARKS
Speech Commands is an audio dataset of spoken words designed to help train and evaluate keyword spotting systems .
342 PAPERS • 4 BENCHMARKS
A new dataset with abstractive dialogue summaries.
106 PAPERS • 6 BENCHMARKS
Social Media Mining for Health (SMM4H) Shared Task is a massive data source for biomedical and public health applications.
39 PAPERS • NO BENCHMARKS YET
A suite of open-source federated datasets, a rigorous evaluation framework, and a set of reference implementations, all geared towards capturing the obstacles and intricacies of practical federated environments.
18 PAPERS • NO BENCHMARKS YET
CC-19 is a small new dataset related to the latest family of coronavirus i.e. COVID-19. The proposed dataset “CC-19” contains 34,006 CT scan slices (images) belonging to 98 subjects out of which 28,395 CT scan slices belong to positive COVID patients.
3 PAPERS • NO BENCHMARKS YET
Includes 11,771 samples of both human activities and falls performed by 30 subjects of ages ranging from 18 to 60 years. Samples are divided in 17 fine grained classes grouped in two coarse grained classes: one containing samples of 9 types of activities of daily living (ADL) and the other containing samples of 8 types of falls. The dataset has been stored to include all the information useful to select samples according to different criteria, such as the type of ADL, the age, the gender, and so on.
FedTADBench is a federated time series anomaly detection benchmark. It covers 5 time series anomaly detection algorithms, 4 federated learning frameworks, and 3 time series anomaly detection datasets.
1 PAPER • NO BENCHMARKS YET
A real-world image dataset that contains more than 900 images generated from 26 street cameras and 7 object categories annotated with detailed bounding box. The data distribution is non-IID and unbalanced, reflecting the characteristic real-world federated learning scenarios.
Natural Vertical Partitioned CVR Dataset for Vertical Federated Learning
The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies to aid innovation in natural language understanding and conversational models, and for study of modern customer support practices and impact.
0 PAPER • NO BENCHMARKS YET