RESISC45 is a dataset for Remote Sensing Image Scene Classification (RESISC). It contains 31,500 RGB images of size 256×256 divided into 45 scene classes, each containing 700 images. Among its notable features, the spatial resolution varies from 20 cm to more than 30 m per pixel.
114 PAPERS • 1 BENCHMARK
BigEarthNet consists of 590,326 Sentinel-2 image patches, each of which is a section of i) 120x120 pixels for 10m bands; ii) 60x60 pixels for 20m bands; and iii) 20x20 pixels for 60m bands.
64 PAPERS • 3 BENCHMARKS
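The three BigEarthNet patch sizes listed above are easier to relate once multiplied out. The following is a minimal arithmetic sketch using only the numbers from the description; the names `PATCH_SIDE_PX` and `ground_extent_m` are hypothetical and chosen for illustration.

```python
# Pixels per patch side, keyed by band resolution in metres per pixel
# (values taken from the BigEarthNet description above).
PATCH_SIDE_PX = {10: 120, 20: 60, 60: 20}


def ground_extent_m(resolution_m: int, side_px: int) -> int:
    """Side length of a patch on the ground, in metres."""
    return resolution_m * side_px


extents = {res: ground_extent_m(res, px) for res, px in PATCH_SIDE_PX.items()}
print(extents)  # {10: 1200, 20: 1200, 60: 1200}
```

Under these figures, each patch covers the same 1200 m × 1200 m footprint at every band resolution.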
The Places365 dataset is a scene recognition dataset. It is composed of 10 million images comprising 434 scene classes. There are two versions of the dataset: Places365-Standard, with 1.8 million training and 36,000 validation images from K=365 scene classes, and Places365-Challenge-2016, which adds 6.2 million extra training images, including 69 new scene classes (for a total of 8 million training images from 434 scene classes).
55 PAPERS • 8 BENCHMARKS
The Remote Sensing Image Captioning Dataset (RSICD) is a dataset for the remote sensing image captioning task. It contains more than ten thousand remote sensing images collected from Google Earth, Baidu Map, MapABC and Tianditu. The images are fixed to 224×224 pixels with various resolutions. The total number of remote sensing images is 10,921, with five sentence descriptions per image.
41 PAPERS • 3 BENCHMARKS
DCASE 2016 is a dataset for sound event detection. It consists of 20 short mono sound files for each of 11 sound classes (from office environments, such as clearthroat, drawer, or keyboard), each file containing one sound event instance. Sound files are annotated with event onset and offset times; however, silences between actual physical sounds (as with a phone ringing) are not marked and are hence “included” in the event.
30 PAPERS • NO BENCHMARKS YET
A dataset consisting of 180,662 triplets of dual-pol synthetic aperture radar (SAR) image patches, multi-spectral Sentinel-2 image patches, and MODIS land cover maps.
29 PAPERS • NO BENCHMARKS YET
AID is a new large-scale aerial image dataset built by collecting sample images from Google Earth imagery. Although the Google Earth images are post-processed using RGB renderings of the original optical aerial images, it has been shown that there is no significant difference between the Google Earth images and the real optical aerial images, even in pixel-level land use/cover mapping. Thus, the Google Earth images can also be used as aerial images for evaluating scene classification algorithms.
28 PAPERS • 1 BENCHMARK
The MTG-Jamendo dataset is an open dataset for music auto-tagging. The dataset contains over 55,000 full audio tracks with 195 tag categories (87 genre tags, 40 instrument tags, and 56 mood/theme tags). It is built using music available at Jamendo under Creative Commons licenses and tags provided by content uploaders. All audio is distributed in 320 kbps MP3 format.
28 PAPERS • NO BENCHMARKS YET
Million-AID is a large-scale benchmark dataset containing a million instances for RS scene classification. There are 51 semantic scene categories in Million-AID. The scene categories are customized to match land-use classification standards, which greatly enhances the practicability of the dataset. Unlike existing scene classification datasets, whose categories are organized with parallel or uncertain relationships, the scene categories in Million-AID are organized in a systematic relationship architecture, which makes the dataset easier to manage and extend. Specifically, the scene categories are organized as a hierarchical category network in the form of a three-level tree: 51 leaf nodes fall into 28 parent nodes at the second level, which are grouped into 8 nodes at the first level, representing the 8 underlying scene categories of agriculture land, commercial land, industrial land, public service land, residential land, transportation land, unut
26 PAPERS • NO BENCHMARKS YET
This is a 21 class land use image dataset meant for research purposes.
18 PAPERS • 1 BENCHMARK
The TAU Urban Acoustic Scenes 2019 development dataset consists of 10-second audio segments from 10 acoustic scenes: airport, indoor shopping mall, metro station, pedestrian street, public square, street with medium level of traffic, travelling by tram, travelling by bus, travelling by underground metro, and urban park. Each acoustic scene has 1440 segments (240 minutes of audio). In total, the dataset contains 40 hours of audio.
13 PAPERS • 2 BENCHMARKS
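The duration figures quoted for the TAU Urban Acoustic Scenes 2019 development dataset follow directly from the segment counts. This is a minimal arithmetic sketch, with constant names chosen here for illustration only:

```python
# Figures taken from the TAU Urban Acoustic Scenes 2019 description above.
SEGMENT_SECONDS = 10
SEGMENTS_PER_SCENE = 1440
NUM_SCENES = 10

# 1440 segments of 10 s each, converted to minutes per scene.
minutes_per_scene = SEGMENTS_PER_SCENE * SEGMENT_SECONDS / 60
# Ten scenes, converted to hours for the whole dataset.
total_hours = NUM_SCENES * minutes_per_scene / 60

print(minutes_per_scene)  # 240.0
print(total_hours)        # 40.0
```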
The TUT Acoustic Scenes 2017 dataset is a collection of recordings from various acoustic scenes, all from distinct locations. For each recording location, a 3-5 minute long audio recording is captured and split into 10-second segments, which act as the unit of sample for this task. All audio clips are recorded with a 44.1 kHz sampling rate and 24-bit resolution.
12 PAPERS • 1 BENCHMARK
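The splitting of TUT Acoustic Scenes 2017 recordings into 10-second samples can be sketched as follows. This is an illustrative sketch only, not the dataset authors' tooling: the function `segment_bounds` is hypothetical, and dropping any trailing partial segment is an assumption made here for simplicity.

```python
SAMPLE_RATE_HZ = 44_100  # sampling rate from the dataset description
SEGMENT_SECONDS = 10     # segment length from the dataset description


def segment_bounds(num_samples, sample_rate=SAMPLE_RATE_HZ,
                   segment_seconds=SEGMENT_SECONDS):
    """Return (start, end) sample indices of each full segment (end exclusive).

    Assumption: a trailing partial segment, if any, is dropped.
    """
    segment_len = sample_rate * segment_seconds
    n_full = num_samples // segment_len
    return [(i * segment_len, (i + 1) * segment_len) for i in range(n_full)]


three_minutes = 3 * 60 * SAMPLE_RATE_HZ
five_minutes = 5 * 60 * SAMPLE_RATE_HZ
print(len(segment_bounds(three_minutes)))  # 18
print(len(segment_bounds(five_minutes)))   # 30
```

Under these assumptions, a recording of 3 to 5 minutes yields between 18 and 30 segments of 441,000 samples each.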
DCASE 2013 is a dataset for sound event detection. It consists of audio-only recordings where individual sound events are prominent in an acoustic scene.
11 PAPERS • NO BENCHMARKS YET
MLRSNet is a multi-label high spatial resolution remote sensing dataset for semantic scene understanding. It provides different perspectives of the world captured from satellites; that is, it is composed of high spatial resolution optical satellite images. MLRSNet contains 109,161 remote sensing images that are annotated into 46 categories, and the number of sample images in a category varies from 1,500 to 3,000. The images have a fixed size of 256×256 pixels with various pixel resolutions (~10m to 0.1m). Moreover, each image in the dataset is tagged with several of 60 predefined class labels, and the number of labels associated with each image varies from 1 to 13. The dataset can be used for multi-label image classification, multi-label image retrieval, and image segmentation.
10 PAPERS • 1 BENCHMARK
RICE is a remote sensing image dataset for cloud removal. The dataset consists of two parts: RICE1 contains 500 pairs of images, each pair consisting of a cloudy image and a cloud-free image of size 512×512; RICE2 contains 450 sets of images, each set containing three 512×512 images: the cloud-free reference image, the cloudy image, and the mask of its clouds.
9 PAPERS • 1 BENCHMARK
A dataset for automated aerial scene classification of disaster events from on board a UAV.
6 PAPERS • NO BENCHMARKS YET
The TAU Urban Acoustic Scenes 2019 Mobile development dataset consists of 10-second audio segments from 10 acoustic scenes.
5 PAPERS • 1 BENCHMARK
The LITIS-Rouen dataset is a dataset for audio scenes. It consists of 3026 examples of 19 scene categories. Each class is specific to a location such as a train station or an open market. The audio recordings have a duration of 30 seconds and a sampling rate of 22050 Hz. The dataset has a total duration of about 1500 minutes (3026 × 30 s ≈ 1513 minutes).
3 PAPERS • NO BENCHMARKS YET
CochlScene is a dataset for acoustic scene classification. The dataset consists of 76k samples collected from 831 participants in 13 acoustic scenes.
2 PAPERS • 1 BENCHMARK
A high-resolution multi-sensor remote sensing scene classification dataset, appropriate for training and evaluating image classification models in the remote sensing domain.
1 PAPER • NO BENCHMARKS YET
The dataset is collected from YouTube videos that contain fight instances. Some non-fight sequences from regular surveillance camera videos are also included.
* There are 300 videos in total: 150 fight and 150 non-fight
* Videos are 2 seconds long
* Only the fight-related parts are included in the samples