The objective in extreme multi-label classification is to learn feature architectures and classifiers that can automatically tag a data point with the most relevant subset of labels from an extremely large label set. This repository provides resources that can be used for evaluating the performance of extreme multi-label algorithms including datasets, code, and metrics.
18 PAPERS • NO BENCHMARKS YET
A dataset containing 404,683 shop photos collected from 25 different online retailers and 20,357 street photos, providing a total of 39,479 clothing item matches between street and shop photos.
15 PAPERS • 1 BENCHMARK
Logo-2K+:A Large-Scale Logo Dataset for Scalable Logo Classification The Logo-2K+ dataset contains a diverse range of logo classes from real-world logo images. It contains 167,140 images with 10 root categories and 2,341 leaf categories. The 10 different root categories are: Food, Clothes, Institution, Accessories, Transportation, Electronic, Necessities, Cosmetic, Leisure and Medical.
11 PAPERS • NO BENCHMARKS YET
The dataset contains:
7 PAPERS • NO BENCHMARKS YET
The 2021 SIGIR workshop on eCommerce is hosting the Coveo Data Challenge for "In-session prediction for purchase intent and recommendations". The challenge addresses the growing need for reliable predictions within the boundaries of a shopping session, as customer intentions can be different depending on the occasion. The need for efficient procedures for personalization is even clearer if we consider the e-commerce landscape more broadly: outside of giant digital retailers, the constraints of the problem are stricter, due to smaller user bases and the realization that most users are not frequently returning customers. We release a new session-based dataset including more than 30M fine-grained browsing events (product detail, add, purchase), enriched by linguistic behavior (queries made by shoppers, with items clicked and items not clicked after the query) and catalog meta-data (images, text, pricing information). On this dataset, we ask participants to showcase innovative solutions fo
3 PAPERS • 1 BENCHMARK
Visuelle 2.0 is a dataset containing real data for 5355 clothing products of the retail fast-fashion Italian company, Nuna Lie. Specifically, Visuelle 2.0 provides data from 6 fashion seasons (partitioned in Autumn-Winter and Spring-Summer) from 2017-2019, right before the Covid-19 pandemic. Each product is accompanied by an HD image, textual tags and more. The time series data are disaggregated at the shop level, and include the sales, inventory stock, max-normalized prices (for the sake of confidentiality} and discounts. Exogenous time series data is also provided, in the form of Google Trends based on the textual tags and multivariate weather conditions of the stores’ locations. Finally, we also provide purchase data for 667K customers whose identity has been anonymized, to capture personal preferences. With these data, Visuelle 2.0 allows to cope with several problems which characterize the activity of a fast fashion company: new product demand forecasting, short-observation new pr
3 PAPERS • 2 BENCHMARKS
The LSEC (Live Stream E-Commerce) dataset has two subsets: LSEC-Small and LSEC-Large. It is a dataset for studying E-commerce transactions in the context of live streams, where the streames are talking about products while interacting with their audience. The dataset consists of interaction information among streamers, users, and products.
1 PAPER • NO BENCHMARKS YET
The OTTO session dataset is a large-scale dataset intended for multi-objective recommendation research. We collected the data from anonymized behavior logs of the OTTO webshop and the app. The mission of this dataset is to serve as a benchmark for session-based recommendations and foster research in the multi-objective and session-based recommender systems area. We also launched a Kaggle competition with the goal to predict clicks, cart additions, and orders based on previous events in a user session.