The Free Music Archive (FMA) is a large-scale dataset for evaluating several tasks in Music Information Retrieval. It consists of 343 days of audio from 106,574 tracks by 16,341 artists, spanning 14,854 albums, arranged in a hierarchical taxonomy of 161 genres. It provides full-length, high-quality audio, pre-computed features, together with track- and user-level metadata, tags, and free-form text such as artist biographies.
95 PAPERS • 2 BENCHMARKS
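As a minimal sketch of getting started with FMA, the metadata archive ships a tracks.csv that pandas can read directly; the two-row column header and the genre_top column follow the FMA repository's own loader, and the path is illustrative.

```python
import pandas as pd

# Sketch: FMA's metadata archive includes tracks.csv with a two-row column
# header (e.g. ('track', 'genre_top')); the path here is illustrative.
tracks = pd.read_csv('fma_metadata/tracks.csv', index_col=0, header=[0, 1])

# Count tracks per top-level genre.
print(tracks[('track', 'genre_top')].value_counts().head())
```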
MedleyDB is a dataset of annotated, royalty-free multitrack recordings, curated primarily to support research on melody extraction. For each song, melody f₀ annotations are provided, as well as instrument activations for evaluating automatic instrument recognition. The original dataset consists of 122 multitrack songs, 108 of which include melody annotations.
41 PAPERS • NO BENCHMARKS YET
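A minimal sketch of reading a MedleyDB melody annotation, assuming the released comma-separated (time, f₀) format where f₀ = 0 marks unvoiced frames; the filename is illustrative.

```python
import numpy as np

# Sketch: melody annotations as comma-separated (time_sec, f0_hz) rows,
# with f0 = 0 marking unvoiced frames; the filename is illustrative.
ann = np.loadtxt('AimeeNorwich_Child_MELODY1.csv', delimiter=',')
times, f0 = ann[:, 0], ann[:, 1]
voiced = f0 > 0
print(f'{voiced.mean():.1%} of frames are voiced; '
      f'median f0 = {np.median(f0[voiced]):.1f} Hz')
```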
The RWC (Real World Computing) Music Database is a copyright-cleared music database available to researchers as a common foundation for research. It contains around 100 complete songs with manually labeled section boundaries. For 50 instruments, individual sounds at half-tone intervals were captured with several variations of playing style, dynamics, instrument manufacturer, and musician.
URMP (University of Rochester Multi-Modal Musical Performance) is a dataset for facilitating audio-visual analysis of musical performances. It comprises 44 simple multi-instrument musical pieces assembled from coordinated but separately recorded performances of the individual parts. For each piece, the dataset provides the musical score in MIDI format, high-quality individual-instrument audio recordings, and a video of the assembled piece.
30 PAPERS • NO BENCHMARKS YET
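Because each URMP piece is assembled from separately recorded stems, an ensemble mix can be re-created by summing them. A minimal sketch, assuming the stems share one sample rate; the filenames are illustrative.

```python
import numpy as np
import soundfile as sf

# Sketch: rebuild an ensemble mix by summing URMP's separately recorded
# per-instrument stems (filenames illustrative; real stems follow an
# AuSep_* naming scheme and share a common sample rate).
stem_paths = ['AuSep_1_vn_01_Jupiter.wav', 'AuSep_2_vc_01_Jupiter.wav']
stems, sr = [], None
for path in stem_paths:
    audio, sr = sf.read(path)
    stems.append(audio)

n = min(len(s) for s in stems)                 # align lengths defensively
mix = np.sum([s[:n] for s in stems], axis=0)
mix /= max(1.0, float(np.abs(mix).max()))      # normalize to avoid clipping
sf.write('mix.wav', mix, sr)
```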
The MTG-Jamendo dataset is an open dataset for music auto-tagging. It contains over 55,000 full audio tracks annotated with 195 tag categories (95 genre tags, 41 instrument tags, and 59 mood/theme tags). It is built from music available on Jamendo under Creative Commons licenses, with tags provided by the content uploaders. All audio is distributed in 320 kbps MP3 format.
28 PAPERS • NO BENCHMARKS YET
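MTG-Jamendo's tags are distributed as tab-separated metadata files; a minimal sketch of tallying them, assuming the repository's layout where each row lists track/artist/album ids, path, duration, and then one or more tags such as 'genre---rock' or 'mood/theme---happy'.

```python
from collections import Counter

# Sketch: count tag frequencies in MTG-Jamendo's tab-separated tag file
# (layout assumed as described above; filename illustrative).
counts = Counter()
with open('autotagging.tsv') as f:
    next(f)                                  # skip header row
    for line in f:
        fields = line.rstrip('\n').split('\t')
        counts.update(fields[5:])            # everything after duration is a tag
print(counts.most_common(10))
```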
GuitarSet is a dataset of high-quality guitar recordings and rich annotations. It contains 360 excerpts, each 30 seconds in length, resulting from the combination of 6 players × 2 versions (comping and soloing) × 5 styles × 3 chord progressions × 2 tempi.
24 PAPERS • NO BENCHMARKS YET
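GuitarSet's annotations are distributed as JAMS files; a minimal sketch using the jams library to pull per-string note annotations (the filename is illustrative, and the 'note_midi' namespace follows the GuitarSet release).

```python
import jams

# Sketch: load a GuitarSet JAMS file and inspect its per-string
# 'note_midi' annotations (filename illustrative).
jam = jams.load('00_BN1-129-Eb_comp.jams')
note_anns = jam.search(namespace='note_midi')
print(f'{len(note_anns)} note annotations (one per string)')
for ann in note_anns:
    df = ann.to_dataframe()   # columns: time, duration, value (MIDI pitch), confidence
    print(len(df), 'notes')
```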
The iKala dataset is a singing voice separation dataset comprising 252 30-second excerpts sampled from 206 iKala songs (plus 100 hidden excerpts reserved for the MIREX data mining contest). The music accompaniment and the singing voice are recorded in the left and right channels, respectively. Human-labeled pitch contours and timestamped lyrics are also provided.
20 PAPERS • 1 BENCHMARK
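Since iKala stores accompaniment and vocals in separate stereo channels, the ground-truth sources fall out of a simple channel split; a minimal sketch (filename illustrative):

```python
import soundfile as sf

# Sketch: iKala keeps the accompaniment in the left channel and the
# singing voice in the right, so the ground truth is a channel split.
stereo, sr = sf.read('10161_chorus.wav')     # shape: (num_samples, 2)
accompaniment, vocals = stereo[:, 0], stereo[:, 1]
sf.write('accompaniment.wav', accompaniment, sr)
sf.write('vocals.wav', vocals, sr)
```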
The EMOPIA (pronounced ‘yee-mò-pi-uh’) dataset is a shared multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music, built to facilitate research on various music emotion tasks. It contains 1,087 music clips from 387 songs, with clip-level emotion labels annotated by four dedicated annotators.
14 PAPERS • NO BENCHMARKS YET
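A minimal sketch of grouping EMOPIA clips by emotion label, assuming the public release's convention of encoding the Russell-quadrant label as a leading 'Q1'..'Q4' filename token; the directory layout is illustrative.

```python
from collections import Counter
from pathlib import Path

import pretty_midi

# Sketch: tally clips per quadrant label, assuming filenames like
# 'Q1_<song>_<n>.mid' (directory name illustrative).
midi_dir = Path('EMOPIA/midis')
counts = Counter(p.name.split('_')[0] for p in midi_dir.glob('*.mid'))
print(counts)

# Inspect one clip's note content with pretty_midi.
clip = next(midi_dir.glob('Q1_*.mid'), None)
if clip is not None:
    pm = pretty_midi.PrettyMIDI(str(clip))
    print(clip.name, sum(len(inst.notes) for inst in pm.instruments), 'notes')
```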
The Lakh Pianoroll Dataset (LPD) is a collection of 174,154 multitrack pianorolls derived from the Lakh MIDI Dataset (LMD).
9 PAPERS • NO BENCHMARKS YET
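LPD pianorolls are stored as sparse .npz files that the authors' pypianoroll library loads directly; a minimal sketch (filename illustrative):

```python
import pypianoroll

# Sketch: load one LPD pianoroll bundle and inspect its tracks.
multitrack = pypianoroll.load('lpd_track.npz')
print('resolution:', multitrack.resolution, 'time steps per beat')
for track in multitrack.tracks:
    print(track.name, 'program', track.program, 'shape', track.pianoroll.shape)
```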
The YouTube-100M dataset consists of 100 million YouTube videos: 70M training, 10M evaluation, and 20M validation videos. Videos average 4.6 minutes each, for a total of 5.4M training hours. Each video is labeled with one or more topic identifiers from a set of 30,871 labels, with an average of around five labels per video. Labels are assigned automatically based on a combination of metadata (title, description, comments, etc.), context, and image content for each video. They apply to the entire video and range from very generic (e.g. “Song”) to very specific (e.g. “Cormorant”). Being machine-generated, the labels are not 100% accurate, and of the 30,871 labels some are clearly acoustically relevant (“Trumpet”) while others are less so (“Web Page”). Videos often bear annotations at multiple degrees of specificity; for example, videos labeled “Trumpet” are often also labeled “Entertainment”, although no hierarchy is enforced.
8 PAPERS • NO BENCHMARKS YET
OpenMIC-2018 is an instrument recognition dataset containing 20,000 examples of Creative Commons-licensed music available on the Free Music Archive. Each example is a 10-second excerpt which has been partially labeled for the presence or absence of 20 instrument classes by annotators on a crowd-sourcing platform.
7 PAPERS • 1 BENCHMARK
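A minimal sketch of loading OpenMIC-2018, assuming the release's single .npz bundle with VGGish features X, aggregated label estimates Y_true, and a mask Y_mask recording which of the 20 instrument labels were actually annotated for each clip.

```python
import numpy as np

# Sketch: load the bundled arrays (layout assumed as described above).
data = np.load('openmic-2018.npz', allow_pickle=True)
X, Y_true, Y_mask = data['X'], data['Y_true'], data['Y_mask']
print(X.shape)        # (num_clips, num_frames, feature_dim)
print(Y_mask.mean())  # fraction of (clip, instrument) pairs with labels
```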
GiantMIDI-Piano contains 10,854 unique piano solo pieces by 2,786 composers, comprising 34,504,873 transcribed notes, along with metadata for each piece.
6 PAPERS • NO BENCHMARKS YET
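Each GiantMIDI-Piano piece is an ordinary solo-piano MIDI file, so standard MIDI tooling applies; a minimal sketch with pretty_midi (filename illustrative):

```python
import pretty_midi

# Sketch: open one transcribed piece and summarize its note content.
pm = pretty_midi.PrettyMIDI('some_piece.mid')
notes = [n for inst in pm.instruments for n in inst.notes]
print(len(notes), 'notes;',
      f'pitch range {min(n.pitch for n in notes)}-{max(n.pitch for n in notes)}')
```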
The Spotify Music Streaming Sessions Dataset (MSSD) consists of 160 million streaming sessions with associated user interactions, audio features and metadata describing the tracks streamed during the sessions, and snapshots of the playlists listened to during the sessions.
5 PAPERS • 1 BENCHMARK
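A minimal sketch of working with the MSSD session logs; the column names (session_id, skip_2) follow the published log schema as best recalled and should be checked against the actual files, and the filename is illustrative.

```python
import pandas as pd

# Sketch: per-session skip rate from one log shard (column names assumed).
log = pd.read_csv('log_mini.csv')
skip_rate = log.groupby('session_id')['skip_2'].mean()
print(skip_rate.describe())
```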
The GoodSounds dataset contains around 28 hours of recordings of single notes and scales played by 15 professional musicians, all of whom hold a music degree and have some teaching experience. Twelve instruments (flute, cello, clarinet, trumpet, violin, alto sax, tenor sax, baritone sax, soprano sax, oboe, piccolo, and bass) were recorded using up to four different microphones. For each instrument, the whole set of playable semitones was recorded several times with different tonal characteristics. Each note is stored as a separate monophonic audio file at 48 kHz and 32 bits. Rich annotations are available, including details of the recording environment and ratings of the tonal quality of the sound (“good-sound”, “bad”, “scale-good”, “scale-bad”).
4 PAPERS • NO BENCHMARKS YET
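Since every GoodSounds note is a separate monophonic 48 kHz file, per-recording sanity checks are straightforward; a minimal sketch (path illustrative):

```python
import soundfile as sf

# Sketch: verify one recording matches the stated format.
audio, sr = sf.read('good-sounds/sound_files/flute/0001.wav')
assert sr == 48000 and audio.ndim == 1
print(f'{len(audio) / sr:.2f} s note')
```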
MuMu is a dataset of more than 31k albums classified into 250 genre classes.
jazznet is a dataset of piano patterns for music audio machine learning research. It comprises chords, arpeggios, scales, and chord progressions in all keys of an 88-key piano and in all inversions, for a total of 162,520 labeled piano patterns, amounting to 95 GB of data and more than 26k hours of audio. The data is accompanied by Python scripts for easily generating new piano patterns beyond those in the dataset. The data is broken down into small, medium, and large subsets of 21,516, 30,328, and 52,360 patterns, respectively (all chords, arpeggios, and scales are present in every subset).
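An illustrative analogue of what jazznet's generation scripts produce, not the dataset's own code: rendering a C major triad and its first inversion as one-bar MIDI patterns with pretty_midi.

```python
import pretty_midi

def triad(root_midi, inversion=0):
    """Return the MIDI pitches of a major triad, optionally inverted."""
    pitches = [root_midi, root_midi + 4, root_midi + 7]
    for _ in range(inversion):          # move the lowest note up an octave
        pitches.append(pitches.pop(0) + 12)
    return pitches

pm = pretty_midi.PrettyMIDI()
piano = pretty_midi.Instrument(program=0)   # acoustic grand piano
for bar, pitches in enumerate([triad(60), triad(60, inversion=1)]):
    for p in pitches:
        piano.notes.append(pretty_midi.Note(velocity=90, pitch=p,
                                            start=float(bar), end=float(bar + 1)))
pm.instruments.append(piano)
pm.write('c_major_triads.mid')
```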
COSIAN is an annotation collection for Japanese popular (J-POP) songs, focusing on the singing styles and expression of famous solo singers.
2 PAPERS • NO BENCHMARKS YET
This dataset includes all of the music sources, background noises, impulse response (IR) samples, and conversational speech used in the ICASSP 2021 work "Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning" (https://arxiv.org/abs/2010.11910).
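A minimal sketch of the degradation pipeline this data supports (not the paper's code): reverberate a music clip with an impulse response and mix in background noise at a target SNR; all signals here are random placeholders.

```python
import numpy as np
from scipy.signal import fftconvolve

def degrade(clip, ir, noise, snr_db=10.0):
    """Convolve with an IR, then add noise scaled to the target SNR."""
    wet = fftconvolve(clip, ir)[:len(clip)]          # reverberate
    noise = noise[:len(wet)]
    gain = np.sqrt((wet ** 2).mean() /
                   ((noise ** 2).mean() * 10 ** (snr_db / 10)))
    return wet + gain * noise

# Placeholder signals stand in for real music, IR, and noise samples.
rng = np.random.default_rng(0)
clip = rng.standard_normal(8000)
ir = rng.standard_normal(256)
noise = rng.standard_normal(8000)
print(degrade(clip, ir, noise).shape)
```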
The CAL10K dataset (introduced as Swat10k) contains 10,870 songs that are weakly labelled using a tag vocabulary of 475 acoustic tags and 153 genre tags. The tags were harvested from Pandora’s website and come from song annotations performed by expert musicologists involved with the Music Genome Project.
1 PAPER • NO BENCHMARKS YET
The Haydn Annotation Dataset consists of note onset annotations from 24 experiment participants with varying musical experience. The annotation experiments use recordings from the ARME Virtuoso Strings Dataset.
The Niko Chord Progression Dataset is used in AccoMontage2. It contains 5k+ chord progression pieces labeled with styles. There are four styles in total: Pop Standard, Pop Complex, Dark, and R&B; some progressions have an 'Unknown' style.
Nlakh is a dataset for Musical Instrument Retrieval. It is a combination of the NSynth dataset, which provides a large number of instruments, and the Lakh dataset, which provides multi-track MIDI data.
Virtuoso Strings is a dataset for soft onset detection in string instruments. It consists of over 144 recordings of professional performances of an excerpt from Haydn's String Quartet Op. 74 No. 1 finale, each with corresponding per-instrument onset annotations.