Speech Commands is an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.
342 PAPERS • 4 BENCHMARKS
The PhysioNet Challenge 2012 dataset is publicly available and contains the de-identified records of 8000 patients in Intensive Care Units (ICU). Each record consists of roughly 48 hours of multivariate time series data with up to 37 features, such as respiratory rate and glucose, recorded at various times during the patient's stay.
19 PAPERS • 5 BENCHMARKS
Caenorhabditis elegans is a roundworm commonly used as a model organism in the study of genetics. The movement of these worms is known to be a useful indicator for understanding behavioural genetics. Brown et al. [1] describe a system for recording the motion of worms on an agar plate and measuring a range of human-defined features [2]. It has been shown that the space of shapes Caenorhabditis elegans adopts on an agar plate can be represented by combinations of six base shapes, or eigenworms. Once the worm outline is extracted, each frame of worm motion can be captured by six scalars representing the amplitudes along each dimension when the shape is projected onto the six eigenworms. Using data collected for the work described in [1], we address the problem of classifying individual worms as wild-type or mutant based on the time series. The data were extracted from the C. elegans behavioural database [3]. We have 259 cases, which we split into 131 training and 128 test cases. We have truncated e
17 PAPERS • 1 BENCHMARK
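The eigenworm description above reduces each frame to six amplitudes by projecting the worm shape onto six base shapes. The following is a minimal sketch of that projection step only. The cosine basis, the 48-point shape length, and the function names are hypothetical stand-ins chosen so the example is self-contained; they are not the actual eigenworms or the dataset's preprocessing.

```python
import math

def cosine_basis(n_points, k=6):
    """Hypothetical stand-in for the six eigenworms: k orthonormal cosine modes."""
    scale = math.sqrt(2.0 / n_points)
    return [
        [scale * math.cos(math.pi * (i + 0.5) * m / n_points) for i in range(n_points)]
        for m in range(1, k + 1)
    ]

def project(shape, basis):
    """Return one amplitude per basis vector (dot product of the shape with each)."""
    return [sum(b * s for b, s in zip(vec, shape)) for vec in basis]

n_points = 48  # assumed number of points describing one worm shape
basis = cosine_basis(n_points)

# Build a synthetic shape from known amplitudes, then recover them by projection.
true_amplitudes = [1.0, -0.5, 0.25, 0.0, 2.0, -1.0]
shape = [
    sum(a * vec[i] for a, vec in zip(true_amplitudes, basis))
    for i in range(n_points)
]
amplitudes = project(shape, basis)
```

Because the basis vectors are orthonormal, the projection recovers the amplitudes used to build the synthetic shape. Applying `project` to every frame would yield a six-dimensional time series per worm.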
Abstract: Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.
15 PAPERS • 4 BENCHMARKS
The time series segmentation benchmark (TSSB) currently contains 75 annotated time series (TS) with 1-9 segments. Each TS is constructed from one of the UEA & UCR time series classification datasets. We group TS by label and concatenate them to create segments with distinctive temporal patterns and statistical properties. We annotate the offsets at which we concatenated the segments as change points (CPs). Additionally, we apply resampling to control the dataset resolution and add approximate, hand-selected window sizes that are able to capture temporal patterns.
8 PAPERS • 1 BENCHMARK
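The TSSB construction described above (concatenate per-label segments and record the join offsets as change points) can be sketched as follows. This is an illustration under assumptions, not the benchmark's own code: the function name and the synthetic constant-valued segments are hypothetical, and the resampling and window-size steps are omitted.

```python
from itertools import accumulate

def concatenate_with_change_points(segments):
    """Join segments into one series and return the offsets where they meet."""
    series = [value for segment in segments for value in segment]
    # Cumulative lengths give the end offset of every segment; the final one
    # is the end of the series, not a change point, so it is dropped.
    end_offsets = list(accumulate(len(segment) for segment in segments))
    return series, end_offsets[:-1]

# Three synthetic segments standing in for TS grouped by class label.
segments = [[0.0] * 100, [5.0] * 50, [-3.0] * 75]
series, change_points = concatenate_with_change_points(segments)
```

With segments of length 100, 50 and 75, the series has 225 values and the change points fall at offsets 100 and 150. A series built from a single segment has no change points.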
Alibaba Cluster Trace captures detailed statistics for the co-located workloads of long-running and batch jobs over a course of 24 hours. The trace consists of three parts: (1) statistics of the studied homogeneous cluster of 1,313 machines, including each machine’s hardware configuration, and the runtime {CPU, Memory, Disk} resource usage for a duration of 12 hours (the 2nd half of the 24-hour period); (2) long-running job workloads, including a trace of all container deployment requests and actions, and a resource usage trace for 12 hours; (3) co-located batch job workloads, including a trace of all batch job requests and actions, and a trace of per-instance resource usage over 24 hours.
7 PAPERS • 1 BENCHMARK
A dataset for time-series anomaly detection.
6 PAPERS • NO BENCHMARKS YET
The eSports Sensors dataset contains sensor data collected from 10 players in 22 matches in League of Legends. The sensor data collected includes:
4 PAPERS • 2 BENCHMARKS
The dataset contains the hotel demand and revenue of 8 major tourist destinations in the US (e.g., Los Angeles, Orlando ...). The dataset contains sales, daily occupancy, demand, and revenue of the upper-middle class hotels.
2 PAPERS • NO BENCHMARKS YET
Hurricane is a new spatio-temporal benchmark dataset suited for forecasting during extreme events and anomalies. The data are provided by the Florida Department of Revenue and give the monthly sales revenue (2003-2020) of the tourism industry for all 67 counties of Florida, which are prone to annual hurricanes. Furthermore, we aligned and joined the raw time series with the history of hurricane categories based on time for each county. More precisely, the hurricane category indicates the maximum sustained wind speed, which can result in catastrophic damage (Oceanic 2022).
2 PAPERS • 1 BENCHMARK
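The alignment step in the Hurricane description (joining monthly revenue with hurricane categories by county and time) might look like the sketch below. All values, field names, and the function name are made up for illustration, and using category 0 for months without a recorded hurricane is an assumption, not something the dataset description states.

```python
def join_hurricane_category(sales_rows, categories, default=0):
    """Attach a hurricane category to each (county, month) sales row."""
    return [
        {**row, "category": categories.get((row["county"], row["month"]), default)}
        for row in sales_rows
    ]

# Synthetic monthly revenue rows (not real figures).
sales = [
    {"county": "A", "month": "2017-08", "revenue": 100.0},
    {"county": "A", "month": "2017-09", "revenue": 40.0},
    {"county": "B", "month": "2017-09", "revenue": 80.0},
]
# Synthetic hurricane history keyed by (county, month).
hurricane_history = {("A", "2017-09"): 4}

joined = join_hurricane_category(sales, hurricane_history)
```

Keying the lookup on the `(county, month)` pair keeps each county's series aligned with its own hurricane history rather than a state-wide one.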
NREL's Solar Power Data for Integration Studies are synthetic solar photovoltaic (PV) power plant data points for the United States representing the year 2006.
2 PAPERS • 2 BENCHMARKS
State-level data for the US economy. The changes in the number of employees based on one million employees active in the US during the COVID-19 pandemic are gathered from Homebase (Bartik et al. 2020). We further enriched the data with the state-level policies as an indication of extreme events (e.g., the state’s business closure order).
This dataset contains data from 125 one-hour simulations of ship motion during various sea states, with the ship performing random maneuvers in 4 degrees of freedom (surge, sway, yaw, roll). The original ship is a patrol ship developed by Perez et al. [1]. We have extended it with a set of two symmetrically placed rudder propellers. Additionally, we simulate wind forces according to Isherwood's wind model [2]. Wind-induced waves are generated with the JONSWAP spectrum [3], and the corresponding wave forces are then computed using wave force response amplitude operators (RAO).
1 PAPER • NO BENCHMARKS YET
FedTADBench is a federated time series anomaly detection benchmark. It covers 5 time series anomaly detection algorithms, 4 federated learning frameworks, and 3 time series anomaly detection datasets.
HASCD (Human Activity Segmentation Challenge Dataset) contains 250 annotated multivariate time series capturing 10.7 h of real-world human motion smartphone sensor data from 15 bachelor computer science students. The recordings capture 6 distinct human motion sequences designed to represent pervasive behaviour in realistic indoor and outdoor settings. The data set serves as a benchmark for evaluating machine learning workflows.
MOSAD (Mobile Sensing Human Activity Data Set) is a multi-modal, annotated time series (TS) data set that contains 14 recordings of 9 triaxial smartphone sensor measurements (126 TS) from 6 human subjects performing (in part) 3 motion sequences in different locations. The aim of the data set is to facilitate the study of human behaviour and the design of TS data mining technology to separate individual activities using low-cost sensors in wearable devices.
A Sentinel-2 based, multi-country time series benchmark dataset tailored for agricultural monitoring applications with machine learning and deep learning. The Sen4AgriNet dataset is annotated from farmer declarations collected via the Land Parcel Identification System (LPIS) for harmonizing country-wide labels. Sen4AgriNet is the only multi-country, multi-year dataset that includes all spectral information. It is constructed to cover the period 2016-2020 for Catalonia and France, and it can be extended to include additional countries. Currently, it contains 42.5 million parcels, which makes it significantly larger than other available archives.
This dataset contains vibration data recorded on a rotating drive train. This drive train consists of an electronically commutated DC motor and a shaft driven by it, which passes through a roller bearing. With the help of a 3D-printed holder, unbalances with different weights and different radii were attached to the shaft. Besides the strength of the unbalances, the rotation speed of the motor was also varied. This dataset can be used to develop and test algorithms for the automatic detection of unbalances on drive trains. Datasets for 4 differently sized unbalances and for the unbalance-free case were recorded. The vibration data was recorded at a sampling rate of 4096 values per second. Datasets for development (ID "D[0-4]") as well as for evaluation (ID "E[0-4]") are available for each unbalance strength. The rotation speed was varied between approx. 630 and 2330 RPM in the development datasets and between approx. 1060 and 1900 RPM in the evaluation datasets. For each measurement of
PM2.5 time series data.