The Electricity Transformer Temperature (ETT) is a crucial indicator for the long-term deployment of electric power. This dataset consists of two years of data from two separate counties in China. To explore granularity in the long sequence time-series forecasting (LSTF) problem, different subsets were created: {ETTh1, ETTh2} at the 1-hour level and ETTm1 at the 15-minute level. Each data point consists of the target value "oil temperature" and 6 power load features. The train/val/test split is 12/4/4 months.
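The 12/4/4-month split can be turned into row boundaries once a month length is fixed. The sketch below assumes 30-day months, a convention common in LSTF code; the function name `ett_split_bounds` is hypothetical, and the boundaries used by any particular paper may differ.

```python
# Sketch: row boundaries for a 12/4/4-month train/val/test split.
# Assumption: one "month" is approximated as 30 days.
DAYS_PER_MONTH = 30


def ett_split_bounds(steps_per_hour: int, months=(12, 4, 4)):
    """Return [(start, end), ...] row indices for train, val and test.

    steps_per_hour: 1 for ETTh1/ETTh2 (hourly), 4 for ETTm1 (15-minute).
    """
    steps_per_month = DAYS_PER_MONTH * 24 * steps_per_hour
    bounds = []
    start = 0
    for m in months:
        end = start + m * steps_per_month
        bounds.append((start, end))
        start = end
    return bounds


hourly = ett_split_bounds(steps_per_hour=1)
quarter_hourly = ett_split_bounds(steps_per_hour=4)
print(hourly)          # [(0, 8640), (8640, 11520), (11520, 14400)]
print(quarter_hourly)  # [(0, 34560), (34560, 46080), (46080, 57600)]
```

Because ETTm1 has four times as many rows per hour, every boundary simply scales by 4.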
165 PAPERS • 1 BENCHMARK
The M4 dataset is a collection of 100,000 time series used for the fourth edition of the Makridakis Forecasting Competition. It consists of yearly, quarterly, monthly and other (weekly, daily and hourly) time series, which are divided into training and test sets. The minimum numbers of observations in the training set are 13 for yearly, 16 for quarterly, 42 for monthly, 80 for weekly, 93 for daily and 700 for hourly series. Participants were asked to produce the following numbers of forecasts beyond the data they had been given: 6 for yearly, 8 for quarterly, 18 for monthly, 13 for weekly, 14 for daily and 48 for hourly series.
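The per-frequency requirements above fit naturally into a small lookup table. This is an illustrative sketch: the `M4_SPEC` dictionary and the `check_series` helper are hypothetical and not part of any official M4 tooling.

```python
# Sketch: M4 minimum training lengths and forecast horizons per frequency,
# transcribed from the description above.
M4_SPEC = {
    # frequency: (minimum training observations, forecast horizon)
    "yearly": (13, 6),
    "quarterly": (16, 8),
    "monthly": (42, 18),
    "weekly": (80, 13),
    "daily": (93, 14),
    "hourly": (700, 48),
}


def check_series(frequency: str, train: list, forecast: list) -> None:
    """Raise ValueError if a series/forecast pair violates the spec."""
    min_obs, horizon = M4_SPEC[frequency]
    if len(train) < min_obs:
        raise ValueError(f"{frequency}: need >= {min_obs} observations, got {len(train)}")
    if len(forecast) != horizon:
        raise ValueError(f"{frequency}: expected {horizon} forecasts, got {len(forecast)}")


# Example: a naive forecast for a quarterly series repeats the last value.
train = [float(x) for x in range(20)]
naive = [train[-1]] * M4_SPEC["quarterly"][1]
check_series("quarterly", train, naive)  # passes silently
```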
85 PAPERS • NO BENCHMARKS YET
This dataset details the energy consumption of appliances in a low-energy building over 4.5 months. Data was collected at 10-minute intervals.
28 PAPERS • NO BENCHMARKS YET
PeMSD7 contains traffic data from District 7 of California: the traffic speeds recorded by 228 sensors on weekdays from May to June 2012, at 5-minute intervals. The dataset is a popular benchmark for traffic forecasting models.
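Forecasting models are usually trained on sliding windows cut from the (time, sensor) speed matrix. The sketch below assumes 12 input steps (1 hour at 5-minute intervals) predicting the next 3 steps (15 minutes); these window sizes are illustrative choices, and the random matrix stands in for real PeMSD7 speeds.

```python
# Sketch: convert a (time, sensors) speed matrix into supervised samples.
import numpy as np

STEPS_PER_HOUR = 60 // 5  # 5-minute interval -> 12 steps per hour


def make_windows(speeds: np.ndarray, n_in: int = 12, n_out: int = 3):
    """Return X (samples, n_in, sensors) and Y (samples, n_out, sensors)."""
    n_samples = speeds.shape[0] - n_in - n_out + 1
    X = np.stack([speeds[i : i + n_in] for i in range(n_samples)])
    Y = np.stack([speeds[i + n_in : i + n_in + n_out] for i in range(n_samples)])
    return X, Y


# One day of synthetic data: 24 h * 12 steps/h = 288 steps, 228 sensors.
rng = np.random.default_rng(0)
speeds = rng.uniform(0, 70, size=(24 * STEPS_PER_HOUR, 228))
X, Y = make_windows(speeds)
print(X.shape, Y.shape)  # (274, 12, 228) (274, 3, 228)
```

Since the data cover weekdays only, one may prefer to build windows per day, as here, so that no window spans a weekend gap.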
20 PAPERS • 2 BENCHMARKS
Satellite images are snapshots of the Earth's surface. We propose to forecast them. We frame Earth surface forecasting as the task of predicting satellite imagery conditioned on future weather. EarthNet2021 is a large dataset suitable for training deep neural networks on this task. It contains Sentinel-2 satellite imagery at 20 m resolution, matching topography and mesoscale (1.28 km) meteorological variables, packaged into 32,000 samples. Additionally, we frame EarthNet2021 as a challenge allowing for model intercomparison. The resulting forecasts will greatly improve (by a factor of more than 50) on the spatial resolution found in numerical models. This allows localized impacts of extreme weather to be predicted, thus supporting downstream applications such as crop yield prediction, forest health assessments or biodiversity monitoring. Find data, code, and how to participate at www.earthnet.tech.
7 PAPERS • 2 BENCHMARKS
Weather is recorded every 10 minutes for the whole of 2020 and comprises 21 meteorological indicators, such as air temperature and humidity. The dataset in CSV format can be downloaded at https://drive.google.com/file/d/1Tc7GeVN7DLEl-RAs-JVwG9yFMf--S8dy/view?usp=share_link.
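A quick sanity check when loading the CSV is to compare its row count with the number of 10-minute slots in 2020. The sketch assumes slots start at midnight on 1 January; real files may have gaps, duplicates or a different first timestamp.

```python
# Sketch: number of 10-minute slots in 2020 (a leap year).
from datetime import datetime, timedelta

start = datetime(2020, 1, 1)
end = datetime(2021, 1, 1)  # exclusive
step = timedelta(minutes=10)

expected_rows = (end - start) // step  # timedelta // timedelta -> int
print(expected_rows)  # 52704 (366 days * 144 slots per day)
```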
7 PAPERS • 5 BENCHMARKS
A multivariate spatio-temporal benchmark dataset for meteorological forecasting based on real-time observation data from ground weather stations.
7 PAPERS • 16 BENCHMARKS
This dataset contains the electricity consumption of 370 points/clients.
6 PAPERS • 4 BENCHMARKS
The data we use include 366 monthly series, 427 quarterly series and 518 yearly series. They were supplied by both tourism bodies (such as Tourism Australia, the Hong Kong Tourism Board and Tourism New Zealand) and various academics, who had used them in previous tourism forecasting studies (please refer to the acknowledgements and details of the data sources and availability).
4 PAPERS • NO BENCHMARKS YET
The AQI dataset was collected from 12 observing stations around Beijing from 2013 to 2017. The data are available from the University of California, Irvine (UCI) Machine Learning Repository.
3 PAPERS • NO BENCHMARKS YET
Three-dimensional positions of external markers placed on the chest and abdomen of healthy individuals, recorded while they breathed for intervals of 73 s to 222 s. The markers move because of respiratory motion, and their positions are sampled at approximately 10 Hz. Markers are metallic objects used during external beam radiotherapy to track and predict the breathing-induced motion of tumors for accurate dose delivery.
3 PAPERS • 1 BENCHMARK
Visuelle 2.0 is a dataset containing real data for 5355 clothing products of the Italian fast-fashion retailer Nuna Lie. Specifically, Visuelle 2.0 provides data from 6 fashion seasons (partitioned into Autumn-Winter and Spring-Summer) from 2017 to 2019, right before the Covid-19 pandemic. Each product is accompanied by an HD image, textual tags and more. The time series data are disaggregated at the shop level and include sales, inventory stock, max-normalized prices (for the sake of confidentiality) and discounts. Exogenous time series data are also provided, in the form of Google Trends based on the textual tags and multivariate weather conditions at the stores' locations. Finally, we also provide purchase data for 667K customers whose identities have been anonymized, to capture personal preferences. With these data, Visuelle 2.0 makes it possible to tackle several problems that characterize the activity of a fast-fashion company: new product demand forecasting, short-observation new pr
3 PAPERS • 2 BENCHMARKS
The dataset contains the hotel demand and revenue of 8 major tourist destinations in the US (e.g., Los Angeles, Orlando ...), covering sales, daily occupancy, demand, and revenue of upper-middle-class hotels.
2 PAPERS • NO BENCHMARKS YET
Hurricane is a new spatio-temporal benchmark dataset suited to forecasting during extreme events and anomalies. The data are provided by the Florida Department of Revenue, which publishes the monthly sales revenue (2003-2020) of the tourism industry for all 67 counties of Florida, which are prone to annual hurricanes. Furthermore, we aligned and joined the raw time series with the history of hurricane categories by time for each county. More precisely, the hurricane category indicates the maximum sustained wind speed, which can result in catastrophic damage (Oceanic 2022).
2 PAPERS • 1 BENCHMARK
The Lorenz dataset contains 100,000 time series of length 24. The data have 5 modes and were obtained from the Lorenz equations with 5 different seed values.
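The description does not say how the seeds map to modes, so the following is only a plausible sketch, not the dataset's actual generator. It assumes the classic parameters sigma=10, rho=28, beta=8/3, a fourth-order Runge-Kutta step with dt=0.01, and that each seed picks a hypothetical base initial condition around which samples are perturbed.

```python
# Sketch: fixed-length Lorenz trajectories grouped into seed-based "modes".
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # assumed classic parameters


def lorenz_rhs(state: np.ndarray) -> np.ndarray:
    """Time derivative of the Lorenz system for a batch of (x, y, z) states."""
    x, y, z = state[..., 0], state[..., 1], state[..., 2]
    return np.stack([SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z], axis=-1)


def rk4_step(state: np.ndarray, dt: float) -> np.ndarray:
    """One classical Runge-Kutta (RK4) integration step."""
    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(state + 0.5 * dt * k1)
    k3 = lorenz_rhs(state + 0.5 * dt * k2)
    k4 = lorenz_rhs(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def generate(n_series, length=24, n_seeds=5, dt=0.01, noise=0.1):
    """Return an array of shape (n_series, length, 3); n_series % n_seeds == 0."""
    assert n_series % n_seeds == 0
    per_seed = n_series // n_seeds
    blocks = []
    for seed in range(n_seeds):
        rng = np.random.default_rng(seed)
        base = rng.uniform(-10, 10, size=3)  # hypothetical mode centre
        state = base + noise * rng.standard_normal((per_seed, 3))
        traj = [state]
        for _ in range(length - 1):
            state = rk4_step(state, dt)
            traj.append(state)
        blocks.append(np.stack(traj, axis=1))  # (per_seed, length, 3)
    return np.concatenate(blocks, axis=0)


data = generate(1000)
print(data.shape)  # (1000, 24, 3)
```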
The original dataset was provided by Orange telecom in France and contains anonymized, aggregated human mobility data. The Multivariate-Mobility-Paris dataset covers 2020-08-24 to 2020-11-04 (72 days during the COVID-19 pandemic), with a time granularity of 30 minutes and a spatial granularity of 6 coarse regions in Paris, France, which makes it a multivariate time series dataset.
This repository contains a financial-domain-focused dataset for financial sentiment/emotion classification and stock market time series prediction. It's based on our paper: StockEmotions: Discover Investor Emotions for Financial Sentiment Analysis and Multivariate Time Series accepted by AAAI 2023 Bridge (AI for Financial Services).
State-level data for the US economy. Changes in employee counts, based on one million employees active in the US during the COVID-19 pandemic, are gathered from Homebase (Bartik et al. 2020). We further enriched the data with state-level policies as an indication of extreme events (e.g., the state's business closure order).
Dataset for A probabilistic forecast methodology for volatile electricity prices in the Australian National Electricity Market
1 PAPER • NO BENCHMARKS YET
The Beijing Traffic Dataset collects traffic speeds at 5-minute granularity for 3126 roadway segments in Beijing between 2022/05/12 and 2022/07/25.
1 PAPER • 1 BENCHMARK
Box-Jenkins gas furnace, a well-known time series forecasting problem
The data contain the following attributes for the Korea Stock Price Index (KOSPI) for January 2000 to December 2016:
1. Date (YYYY.M(M).D(D))
2. Opening price for the date, PX_OPEN
3. Highest price for the date, PX_HIGH
4. Lowest price for the date, PX_LOW
5. Closing price for the date, PX_LAST
6. Total volume traded on the date, PX_VOLUME
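The date format YYYY.M(M).D(D) means month and day may appear without zero padding (e.g. 2000.1.4). The sketch below assumes a comma-separated file with a header naming the columns Date, PX_OPEN, PX_HIGH, PX_LOW, PX_LAST and PX_VOLUME, and an integer volume; the sample row is made up for illustration and is not real KOSPI data.

```python
# Sketch: parsing KOSPI records with the attributes listed above.
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class KospiDay:
    day: date
    px_open: float
    px_high: float
    px_low: float
    px_last: float
    px_volume: int


def parse_date(text: str) -> date:
    """Parse YYYY.M(M).D(D), where month and day may lack zero padding."""
    year, month, day = (int(part) for part in text.strip().split("."))
    return date(year, month, day)


def parse_rows(fh) -> list:
    """Read CSV rows (assumed header names) into KospiDay records."""
    return [
        KospiDay(
            day=parse_date(row["Date"]),
            px_open=float(row["PX_OPEN"]),
            px_high=float(row["PX_HIGH"]),
            px_low=float(row["PX_LOW"]),
            px_last=float(row["PX_LAST"]),
            px_volume=int(row["PX_VOLUME"]),
        )
        for row in csv.DictReader(fh)
    ]


# Hypothetical sample row, for illustration only.
sample = "Date,PX_OPEN,PX_HIGH,PX_LOW,PX_LAST,PX_VOLUME\n2000.1.4,1000.0,1010.5,990.2,1005.3,123456\n"
rows = parse_rows(io.StringIO(sample))
print(rows[0].day, rows[0].px_last)  # 2000-01-04 1005.3
```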
The Mauna Loa Seeing Study was performed by the EOL/Integrated Surface Flux System team, capturing surface meteorology and flux products at the Mauna Loa Observatory in Hawaii.
1 PAPER • 2 BENCHMARKS
DIT4BEARs Internship Project (at UiT-The Arctic University of Norway) Dataset
Graph Neural Networks (GNNs) have gained traction across domains such as transportation, bioinformatics, language processing, and computer vision. However, there is a noticeable absence of research on applying GNNs to supply chain networks. Supply chain networks are inherently graph-like in structure, making them prime candidates for GNN methodologies. This opens up a world of possibilities for optimizing, predicting, and solving even the most complex supply chain problems. A major setback for this approach is the absence of real-world benchmark datasets to facilitate research on, and resolution of, supply chain problems using GNNs. To address this issue, we present a real-world benchmark dataset for temporal tasks, obtained from one of the leading FMCG companies in Bangladesh, focusing on supply chain planning for production purposes. The dataset includes temporal data as node features to enable sales predictions, production planning, and the identification of fact
The USNA long-term scintillation study is a continuing effort to characterize and measure optical turbulence in the near-maritime boundary layer.
PJM Hourly Energy Consumption Data
PJM Interconnection LLC (PJM) is a regional transmission organization (RTO) in the United States. It is part of the Eastern Interconnection grid, operating an electric transmission system serving all or parts of Delaware, Illinois, Indiana, Kentucky, Maryland, Michigan, New Jersey, North Carolina, Ohio, Pennsylvania, Tennessee, Virginia, West Virginia, and the District of Columbia.
0 PAPERS • NO BENCHMARKS YET