The MovieLens datasets, first released in 1998, describe people’s expressed preferences for movies. These preferences take the form of tuples, each the result of a person expressing a preference (a 0-5 star rating) for a movie at a particular time. These preferences were entered by way of the MovieLens web site1 — a recommender system that asks its users to give movie ratings in order to receive personalized movie recommendations.
1,092 PAPERS • 16 BENCHMARKS
Children’s Book Test (CBT) is designed to measure directly how well language models can exploit wider linguistic context. The CBT is built from books that are freely available thanks to Project Gutenberg.
90 PAPERS • 2 BENCHMARKS
Criteo contains 7 days of click-through data, which is widely used for CTR prediction benchmarking. There are 26 anonymous categorical fields and 13 continuous fields in Criteo dataset.
40 PAPERS • 1 BENCHMARK
The iPinYou Global RTB(Real-Time Bidding) Bidding Algorithm Competition is organized by iPinYou from April 1st, 2013 to December 31st, 2013.The competition has been divided into three seasons. For each season, a training dataset is released to the competition participants, the testing dataset is reserved by iPinYou. The complete testing dataset is randomly divided into two parts: one part is the leaderboard testing dataset to score and rank the participating teams on the leaderboard, and the other part is reserved for the final offline evaluation. The participant's last offline submission is evaluated by the reserved testing dataset to get a team's offline final score. This dataset contains all three seasons training datasets and leaderboard testing datasets.The reserved testing datasets are withheld by iPinYou. The training dataset includes a set of processed iPinYou DSP bidding, impression, click, and conversion logs.
18 PAPERS • 1 BENCHMARK
TripClick is a large-scale dataset of click logs in the health domain, obtained from user interactions of the Trip Database health web search engine.
15 PAPERS • NO BENCHMARKS YET
A clickthrough prediction dataset, for more information please see the Kaggle page
4 PAPERS • 1 BENCHMARK
A large scale, C2C marketplace e-commerce dataset.
1 PAPER • NO BENCHMARKS YET