The MovieLens datasets, first released in 1998, describe people’s expressed preferences for movies. These preferences take the form of tuples, each the result of a person expressing a preference (a 0-5 star rating) for a movie at a particular time. These preferences were entered by way of the MovieLens website, a recommender system that asks its users to give movie ratings in order to receive personalized movie recommendations.
1,092 PAPERS • 16 BENCHMARKS
Netflix Prize consists of about 100,000,000 ratings for 17,770 movies given by 480,189 users. Each rating in the training dataset consists of four entries: user, movie, date of grade, grade. Users and movies are represented with integer IDs, while ratings range from 1 to 5.
345 PAPERS • 1 BENCHMARK
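The four-field rating record described for Netflix Prize (user, movie, date of grade, grade) can be sketched as a small typed structure. This is an illustration only, not the dataset's actual file layout: the class name `Rating`, the field names, the comma-separated line format, and the ISO date format are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rating:
    """One rating record; field names are hypothetical."""
    user_id: int
    movie_id: int
    rated_on: date
    grade: int

    def __post_init__(self) -> None:
        # The description states that ratings range from 1 to 5.
        if not 1 <= self.grade <= 5:
            raise ValueError(f"grade must be in 1..5, got {self.grade}")


def parse_rating(line: str) -> Rating:
    """Parse an assumed 'user,movie,YYYY-MM-DD,grade' line into a Rating."""
    user, movie, day, grade = line.strip().split(",")
    return Rating(int(user), int(movie), date.fromisoformat(day), int(grade))


# Made-up example line, not taken from the dataset.
example = parse_rating("42,7,2005-09-06,3")
```

Validating the grade at construction time keeps out-of-range values from silently entering later aggregation steps.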
Gowalla is a location-based social networking website where users share their locations by checking-in. The friendship network is undirected and was collected using their public API, and consists of 196,591 nodes and 950,327 edges. We have collected a total of 6,442,890 check-ins of these users over the period of Feb. 2009 - Oct. 2010.
158 PAPERS • 4 BENCHMARKS
MIcrosoft News Dataset (MIND) is a large-scale dataset for news recommendation research. It was collected from anonymized behavior logs of the Microsoft News website. The mission of MIND is to serve as a benchmark dataset for news recommendation and to facilitate research in news recommendation and recommender systems.
127 PAPERS • 1 BENCHMARK
The Yelp2018 dataset is adopted from the 2018 edition of the Yelp challenge, in which local businesses such as restaurants and bars are viewed as items. We use the same 10-core setting in order to ensure data quality.
93 PAPERS • 2 BENCHMARKS
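The Yelp2018 description mentions a 10-core setting without defining it. A common reading, assumed here, is that only users and items with at least ten interactions are retained, with the filter repeated until the set is stable. The sketch below follows that reading; the function name `k_core_filter` and the `(user, item)` pair representation are hypothetical.

```python
from collections import Counter


def k_core_filter(interactions, k=10):
    """Iteratively drop users and items with fewer than k interactions.

    `interactions` is an iterable of (user, item) pairs. Removing one side
    can push the other side below k, so the filter loops until nothing changes.
    """
    pairs = set(interactions)
    while True:
        user_counts = Counter(user for user, _ in pairs)
        item_counts = Counter(item for _, item in pairs)
        kept = {
            (user, item)
            for user, item in pairs
            if user_counts[user] >= k and item_counts[item] >= k
        }
        if len(kept) == len(pairs):
            return kept
        pairs = kept


# Toy usage with k=2: user "u3" has a single interaction and is dropped.
toy = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a")]
filtered = k_core_filter(toy, k=2)
```

Some pipelines apply the threshold only once rather than to a fixed point, so the exact result may differ from a given paper's preprocessing.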
ReDial (Recommendation Dialogues) is an annotated dataset of dialogues, where users recommend movies to each other. The dataset consists of over 10,000 conversations centered around the theme of providing movie recommendations.
89 PAPERS • 2 BENCHMARKS
We release the Douban Conversation Corpus, comprising a training set, a development set and a test set for retrieval-based chatbots. The statistics of the Douban Conversation Corpus are shown in the following table.
77 PAPERS • 4 BENCHMARKS
The Yelp Dataset is a valuable resource for academic research, teaching, and learning. It provides a rich collection of real-world data related to businesses, reviews, and user interactions. Key details: 6,990,280 reviews from users; information on 150,346 businesses; 200,100 pictures; data from 11 metropolitan areas; over 908,915 tips provided by 1,987,897 users; business attributes such as hours, parking availability, and ambiance for more than 1.2 million businesses; and aggregated historical check-in data for each of 131,930 businesses.
68 PAPERS • 21 BENCHMARKS
This dataset contains 21,889 outfits from polyvore.com, in which 17,316 are for training, 1,497 for validation and 3,076 for testing.
55 PAPERS • 3 BENCHMARKS
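The split sizes quoted for the Polyvore outfits can be checked against the stated total, and the resulting proportions computed, with a few lines of arithmetic. The variable names below are illustrative only.

```python
# Split sizes as quoted in the Polyvore description.
splits = {"train": 17316, "validation": 1497, "test": 3076}

total = sum(splits.values())  # should match the stated 21,889 outfits
fractions = {name: count / total for name, count in splits.items()}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```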
The Epinions dataset is built from a who-trusts-whom online social network of the general consumer review site Epinions.com. Members of the site can decide whether to "trust" each other. All the trust relationships interact and form the Web of Trust, which is then combined with review ratings to determine which reviews are shown to the user. It contains 75,879 nodes and 508,837 edges.
52 PAPERS • 2 BENCHMARKS
The Foursquare dataset consists of check-in data for different cities. One subset contains check-ins in NYC and Tokyo collected for about 10 months (from 12 April 2012 to 16 February 2013). It contains 227,428 check-ins in New York City and 573,703 check-ins in Tokyo. Each check-in is associated with its time stamp, its GPS coordinates and its semantic meaning (represented by fine-grained venue categories). Another subset contains long-term (about 18 months, from April 2012 to September 2013) global-scale check-in data collected from Foursquare. It contains 33,278,683 check-ins by 266,909 users on 3,680,126 venues (in 415 cities in 77 countries). Those 415 cities are the most checked-in cities by Foursquare users in the world, each of which contains at least 10K check-ins.
39 PAPERS • NO BENCHMARKS YET
The Memetracker corpus contains articles from mainstream media and blogs from August 1 to October 31, 2008 with about 1 million documents per day. It has 10,967 hyperlink cascades among 600 media sites.
37 PAPERS • NO BENCHMARKS YET
The General Video Game AI (GVGAI) framework is widely used in research and features a corpus of over 100 single-player games and 60 two-player games. These are fairly small games, each focusing on specific mechanics or skills the players should be able to demonstrate, including clones of classic arcade games such as Space Invaders, puzzle games like Sokoban, adventure games like Zelda or game-theory problems such as the Iterated Prisoner's Dilemma. All games are real-time and require players to make decisions in only 40ms at every game tick, although not all games explicitly reward or require fast reactions; in fact, some of the best game-playing approaches add up the time in the beginning of the game to run Breadth-First Search in puzzle games in order to find an accurate solution. However, given the large variety of games (many of which are stochastic and difficult to predict accurately), scoring systems and termination conditions, all unknown to the players, highly-adaptive genera
34 PAPERS • NO BENCHMARKS YET
This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014.
33 PAPERS • 6 BENCHMARKS
The Ciao dataset contains ratings given by users to items, and also contains item category information. The data comes from the Epinions dataset.
32 PAPERS • 1 BENCHMARK
The Pinterest dataset contains more than 1 million images associated with the Pinterest users who have “pinned” them.
29 PAPERS • 1 BENCHMARK
A human-to-human Chinese dialog dataset (about 10k dialogs, 156k utterances), which contains multiple sequential dialogs for every pair of a recommendation seeker (user) and a recommender (bot).
26 PAPERS • NO BENCHMARKS YET
KuaiRec is a real-world dataset collected from the recommendation logs of the video-sharing mobile app Kuaishou. For now, it is the first dataset that contains a fully observed user-item interaction matrix. By “fully observed”, we mean there are almost no missing values in the user-item matrix, i.e., each user has viewed each video and then left feedback.
The Yahoo! Learning to Rank Challenge dataset consists of 709,877 documents encoded in 700 features and sampled from query logs of the Yahoo! search engine, spanning 29,921 queries.
24 PAPERS • NO BENCHMARKS YET
Amazon Review is a dataset to tackle the task of identifying whether the sentiment of a product review is positive or negative. This dataset includes reviews from four different merchandise categories: Books (B) (2834 samples), DVDs (D) (1199 samples), Electronics (E) (1883 samples), and Kitchen and housewares (K) (1755 samples).
23 PAPERS • 5 BENCHMARKS
TG-ReDial is a topic-guided conversational recommendation dataset for research on conversational/interactive recommender systems.
22 PAPERS • NO BENCHMARKS YET
KuaiRand is an unbiased sequential recommendation dataset collected from the recommendation logs of the video-sharing mobile app, Kuaishou (快手). It is the first recommendation dataset with millions of intervened interactions of randomly exposed items inserted in the standard recommendation feeds!
20 PAPERS • NO BENCHMARKS YET
The MMD (MultiModal Dialogs) dataset is a dataset for multimodal domain-aware conversations. It consists of over 150K conversation sessions between shoppers and sales agents, annotated by a group of in-house annotators using a semi-automated manually intense iterative process.
18 PAPERS • NO BENCHMARKS YET
A dataset containing 404,683 shop photos collected from 25 different online retailers and 20,357 street photos, providing a total of 39,479 clothing item matches between street and shop photos.
15 PAPERS • 1 BENCHMARK
BeerAdvocate is a dataset that consists of beer reviews from BeerAdvocate. The data span a period of more than 10 years, including all ~1.5 million reviews up to November 2011. Each review includes ratings in terms of five "aspects": appearance, aroma, palate, taste, and overall impression. Reviews include product and user information, followed by each of these five ratings, and a plaintext review.
14 PAPERS • 1 BENCHMARK
This dataset is a subset of the Amazon reviews dataset that contains men's products.
13 PAPERS • 2 BENCHMARKS
N/A
11 PAPERS • 1 BENCHMARK
Amazon-Sports is a sub-category of the Amazon dataset, which contains a series of product reviews crawled from Amazon.com.
9 PAPERS • 1 BENCHMARK
The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.
9 PAPERS • 2 BENCHMARKS
The WeChat dataset for fake news detection contains more than 20k news articles labelled as fake news or not.
7 PAPERS • 1 BENCHMARK
This dataset is a subset of the Amazon reviews dataset that contains fashion products.
6 PAPERS • 1 BENCHMARK
CITE is a crowd-sourced resource for multimodal discourse: this resource characterises inferences in image-text contexts in the domain of cooking recipes in the form of coherence relations.
Coached Conversational Preference Elicitation is a dataset consisting of 502 English dialogs with 12,000 annotated utterances between a user and an assistant discussing movie preferences in natural language. It was collected using a Wizard-of-Oz methodology between two paid crowd-workers, where one worker plays the role of an 'assistant', while the other plays the role of a 'user'.
5 PAPERS • NO BENCHMARKS YET
The Epinions dataset is a trust network dataset. For each user, it contains their profile, ratings and trust relations. For each rating, it has the product name and its category, the rating score, the time point when the rating was created, and the helpfulness of the rating.
5 PAPERS • 1 BENCHMARK
KG20C is a knowledge graph about high-quality papers from 20 top computer science conferences. It can serve as a standard benchmark dataset in scholarly data analysis for several tasks, including knowledge graph embedding, link prediction, recommendation systems, and question answering.
4 PAPERS • 1 BENCHMARK
Amazon Fine Foods is a dataset that consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plaintext review.
3 PAPERS • NO BENCHMARKS YET
The largest publicly available dataset in the hotel domain (50M versus 0.9M) and, additionally, the largest recommendation dataset in a single domain with textual reviews (50M versus 22M).
A set of approximately 100K podcast episodes comprised of raw audio files along with accompanying ASR transcripts. This represents over 47,000 hours of transcribed audio, and is an order of magnitude larger than previous speech-to-text corpora.
We ran 21 recommender systems on three datasets (BeerAdvocate, LibraryThing and MovieLens 1M). The output of these recommenders was evaluated using the rec_eval tool. We also measured statistically significant improvements using a permutation test. The output of both tools can be found in data.
Dataset of restaurant reviews from TripAdvisor that includes images and texts uploaded in reviews by users. Reviews in six different cities are included: Gijón (Spain), Barcelona (Spain), Madrid (Spain), New York City (USA), Paris (France) and London (United Kingdom). In the original publication, the following task is proposed: Can we explain, using the existing image or text from a different user, why a given restaurant was recommended to a certain user?
3 PAPERS • 6 BENCHMARKS
Delicious: This dataset contains tagged web pages retrieved from the website delicious.com.
2 PAPERS • 1 BENCHMARK
A dataset consisting of 46 recipient users and 26,180 tweets. The dataset includes the news feed of the users and 13 features that may influence the relevance of the tweets.
2 PAPERS • NO BENCHMARKS YET
Uses a platform with 77 candies and sweets to rank. Over 2000 users submitted over 44000 grades resulting in a matrix with 28% coverage.
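The coverage figure quoted for the candy-ranking data follows from dividing the number of grades by the size of the full user-item matrix. The sketch below uses the rounded counts from the description; since both are stated as "over", the result is only approximate.

```python
n_items = 77       # candies and sweets
n_users = 2000     # "over 2000" users, so a rounded lower bound
n_grades = 44000   # "over 44000" grades, also a rounded lower bound

# Coverage is the share of user-item cells that hold a grade.
coverage = n_grades / (n_users * n_items)
print(f"{coverage:.1%}")  # prints 28.6%
```

With these rounded inputs the ratio comes out near 28.6%, which is consistent with the stated coverage of about 28%.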
Wikidata-14M is a recommender system dataset for recommending items to Wikidata editors. It consists of 220,000 editors responsible for 14 million interactions with 4 million items.
Wyze Rule Recommendation Dataset. It is a big dataset with 300,000 users. Please cite [1] if you used the dataset and cite [2] if you referenced the algorithm.
The CAL10K dataset (introduced as Swat10k) contains 10,870 songs that are weakly-labelled using a tag vocabulary of 475 acoustic tags and 153 genre tags. The tags have all been harvested from Pandora’s website and result from song annotations performed by expert musicologists involved with the Music Genome Project.
1 PAPER • NO BENCHMARKS YET
E-ReDial is a conversational recommender system dataset with high-quality explanations. It consists of 756 dialogues with 12,003 utterances, each with 15.9 turns on average. 2,058 high-quality explanations are included, each with 79.2 tokens on average.
This dataset contains review information from Google Maps (ratings, text, images, etc.), business metadata (address, geographical info, descriptions, category information, price, open hours, and miscellaneous info), and links (related businesses) up to Sep 2021 in the United States.