The time series segmentation benchmark (TSSB) currently contains 75 annotated time series (TS) with 1-9 segments. Each TS is constructed from one of the UEA & UCR time series classification datasets. We group TS by label and concatenate them to create segments with distinctive temporal patterns and statistical properties. We annotate the offsets at which we concatenated the segments as change points (CPs). Additionally, we apply resampling to control the dataset resolution and add approximate, hand-selected window sizes that are able to capture temporal patterns.
8 PAPERS • 1 BENCHMARK
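The TSSB construction described above (group by label, concatenate, annotate the offsets) can be sketched as follows. This is a minimal illustration on toy data, not the benchmark's actual build script: the function name `concatenate_segments`, the labels, and the synthetic signals are all assumptions made for the example, and the resampling step is omitted.

```python
import numpy as np


def concatenate_segments(series_by_label, order):
    """Concatenate one segment per label and record the change points.

    Hypothetical helper: `series_by_label` maps a class label to a 1-D
    series, and `order` lists the labels in concatenation order.
    """
    segments = [np.asarray(series_by_label[label], dtype=float) for label in order]
    ts = np.concatenate(segments)
    # A change point is the offset at which one segment ends and the next
    # begins, so the end of the final segment is dropped.
    change_points = np.cumsum([len(seg) for seg in segments])[:-1]
    return ts, change_points


# Toy stand-ins for series grouped by class label (assumed data).
rng = np.random.default_rng(0)
series_by_label = {
    "a": np.sin(np.linspace(0, 8 * np.pi, 200)),
    "b": rng.normal(0.0, 1.0, 150),
    "c": np.linspace(0.0, 1.0, 100),
}

ts, cps = concatenate_segments(series_by_label, ["a", "b", "c"])
print(len(ts), cps.tolist())
```

With three segments of lengths 200, 150 and 100, the annotated change points fall at offsets 200 and 350.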
The Epinions dataset is a trust network dataset. For each user, it contains their profile, ratings, and trust relations. For each rating, it records the product name and category, the rating score, the time the rating was created, and the helpfulness of the rating.
5 PAPERS • 1 BENCHMARK
A dataset specifically designed for the evaluation of change point detection algorithms, consisting of 37 time series from various domains.
4 PAPERS • NO BENCHMARKS YET
SKAB is designed for evaluating algorithms for anomaly detection. The benchmark currently includes 30+ datasets plus Python modules for algorithms’ evaluation. Each dataset represents a multivariate time series collected from the sensors installed on the testbed. All instances are labeled for evaluating the results of solving outlier detection and changepoint detection problems.
3 PAPERS • 2 BENCHMARKS
The original paper presented a model of an industrial chemical process, the Tennessee Eastman Process (TEP), and a model-based TEP simulator for data generation. The most widely used benchmark consists of 22 datasets, 21 of which (Fault 1–21) contain faults and 1 (Fault 0) is fault-free. It is available in a repository. Every dataset has a training part (500 samples) and a testing part (960 samples): the training part contains healthy-state observations, and the testing part begins right after training and contains faults that appear 8 h after the training part ends. Each dataset has 52 features (observation variables), most of them sampled every 3 min.
2 PAPERS • 1 BENCHMARK
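The TEP figures above can be turned into durations and a fault-onset index with a little arithmetic. The sketch below assumes a uniform 3 min sampling interval for every variable (the description says this holds for most, not all), and the constant and function names are illustrative only.

```python
# Figures taken from the dataset description above.
SAMPLING_MIN = 3      # assumed uniform sampling interval, in minutes
TRAIN_SAMPLES = 500
TEST_SAMPLES = 960
FAULT_DELAY_H = 8     # faults appear this long after the training part ends


def samples_to_hours(n_samples, step_min=SAMPLING_MIN):
    """Convert a sample count to a duration in hours."""
    return n_samples * step_min / 60


train_hours = samples_to_hours(TRAIN_SAMPLES)
test_hours = samples_to_hours(TEST_SAMPLES)
# Number of fault-free test samples before the fault is introduced.
fault_onset_sample = FAULT_DELAY_H * 60 // SAMPLING_MIN

print(train_hours, test_hours, fault_onset_sample)
```

Under that assumption, the training part spans 25 h, the testing part spans 48 h, and the fault begins after the first 160 test samples.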
HASCD (Human Activity Segmentation Challenge Dataset) contains 250 annotated multivariate time series capturing 10.7 h of real-world human motion smartphone sensor data from 15 bachelor computer science students. The recordings capture 6 distinct human motion sequences designed to represent pervasive behaviour in realistic indoor and outdoor settings. The data set serves as a benchmark for evaluating machine learning workflows.
1 PAPER • NO BENCHMARKS YET
The whole UCF-Crime dataset consists of real-world 240 × 320 RGB videos with 13 realistic anomaly types, such as explosion, road accident, and burglary, plus normal examples. Change point detection (CPD) specifically requires a change in the data distribution. We suppose that explosions and road accidents correspond to such a scenario, while most other types correspond to point anomalies. For example, the data clearly come from a normal regime before an explosion. After it, we can see fire and smoke, which last for some time. Thus, the first moment when an explosion appears is a change point. Along with a volunteer, the authors carefully labelled the chosen anomaly types, and their opinions were averaged. We provide the resulting markup so that other researchers can use it to validate their CPD algorithms for video.
MOSAD (Mobile Sensing Human Activity Data Set) is a multi-modal, annotated time series (TS) data set that contains 14 recordings of 9 triaxial smartphone sensor measurements (126 TS) from 6 human subjects performing (in part) 3 motion sequences in different locations. The aim of the data set is to facilitate the study of human behaviour and the design of TS data mining technology to separate individual activities using low-cost sensors in wearable devices.