The ImageNet dataset contains 14,197,122 images annotated according to the WordNet hierarchy. Since 2010, the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark for image classification and object detection. The publicly released dataset contains a set of manually annotated training images; a set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation, a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers”; and (2) object-level annotation, a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with a width of 50 pixels and a height of 30 pixels”. The ImageNet project does not own the copyright to the images, so only thumbnails and URLs of images are provided.
13,433 PAPERS • 40 BENCHMARKS
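As a hedged illustration of the object-level annotations described above, here is a minimal sketch of parsing one annotation file, assuming the PASCAL VOC-style XML layout used by the ILSVRC releases; the file path is a hypothetical example.

```python
# Minimal sketch: parse one ILSVRC object-level annotation file.
# Assumes the PASCAL VOC-style XML layout used by the ILSVRC releases.
import xml.etree.ElementTree as ET

def parse_annotation(xml_path):
    """Return (class_id, xmin, ymin, xmax, ymax) tuples for one image."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text  # WordNet synset id, e.g. "n02084071"
        bb = obj.find("bndbox")
        boxes.append((
            name,
            int(bb.find("xmin").text),
            int(bb.find("ymin").text),
            int(bb.find("xmax").text),
            int(bb.find("ymax").text),
        ))
    return boxes

# boxes = parse_annotation("ILSVRC2012_val_00000001.xml")  # hypothetical path
```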
The Food-101 dataset consists of 101 food categories with 750 training and 250 test images per category, making a total of 101k images. The labels for the test images have been manually cleaned, while the training set contains some noise.
589 PAPERS • 10 BENCHMARKS
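A minimal sketch of loading the Food-101 splits with torchvision's built-in wrapper; the "./data" root is an example path.

```python
# Minimal sketch: load the Food-101 train/test splits with torchvision.
from torchvision import datasets, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

train_set = datasets.Food101(root="./data", split="train", transform=tfm, download=True)
test_set = datasets.Food101(root="./data", split="test", transform=tfm, download=True)

print(len(train_set), len(test_set))  # 75750 train (101 * 750), 25250 test (101 * 250)
```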
ImageNet-R(endition) contains art, cartoons, deviantart, graffiti, embroidery, graphics, origami, paintings, patterns, plastic objects, plush objects, sculptures, sketches, tattoos, toys, and video game renditions of ImageNet classes.
341 PAPERS • 6 BENCHMARKS
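ImageNet-R covers 200 of the 1,000 ImageNet classes, so a standard 1000-way classifier is usually evaluated by masking its logits down to that subset. A minimal sketch; `R_INDICES` is a placeholder, since the real class-index mapping ships with the dataset.

```python
# Minimal sketch: evaluate a 1000-way ImageNet classifier on ImageNet-R
# by masking its logits down to the 200 classes the dataset covers.
import torch

R_INDICES = list(range(200))  # placeholder; use the dataset's real index mapping

@torch.no_grad()
def masked_accuracy(model, images, labels):
    """labels are in the 200-class ImageNet-R label space."""
    logits = model(images)                     # shape: (batch, 1000)
    preds = logits[:, R_INDICES].argmax(dim=1)
    return (preds == labels).float().mean().item()
```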
The ImageNet-A dataset consists of real-world, unmodified, and naturally occurring examples that are misclassified by ResNet models.
318 PAPERS • 5 BENCHMARKS
The ImageNet-Sketch dataset consists of 50,889 images, approximately 50 for each of the 1,000 ImageNet classes. The dataset is constructed with Google Image queries of the form "sketch of <class name>", where <class name> is the standard class name, restricted to the "black and white" color scheme. 100 images are initially queried for every class, and the retrieved images are cleaned by deleting irrelevant images and images of similar but different classes. For classes with fewer than 50 images after manual cleaning, the dataset is augmented by flipping and rotating the images.
205 PAPERS • 3 BENCHMARKS
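A minimal sketch of the flip-and-rotate augmentation step described above, using Pillow. The rotation angles and the white background fill are illustrative assumptions; the source does not state the exact parameters used.

```python
# Minimal sketch of the flip-and-rotate augmentation described above.
# Angles and background fill are illustrative assumptions.
from PIL import Image

def augment(img: Image.Image, angles=(-15, 15)):
    """Yield flipped and rotated variants of a sketch image."""
    yield img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
    for a in angles:
        # fill with white so rotated corners match the sketch background
        yield img.rotate(a, expand=True, fillcolor="white")

# variants = list(augment(Image.open("sketch_of_zebra.png")))  # hypothetical file
```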
ObjectNet is a test set of images collected directly using crowd-sourcing. ObjectNet is unique in that objects are captured at unusual poses in cluttered, natural scenes, which can severely degrade recognition performance. The test set contains 50,000 images and controls for rotation, background, and viewpoint. There are 313 object classes, 113 of which overlap with ImageNet.
130 PAPERS • 4 BENCHMARKS
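A minimal sketch of scoring a model only on the 113 ObjectNet classes that overlap with ImageNet. The mapping file name and format below are assumptions; the actual release provides its own ObjectNet-to-ImageNet class mapping.

```python
# Minimal sketch: restrict evaluation to the 113 ObjectNet classes that
# overlap with ImageNet. File name and format are hypothetical.
import json

def load_overlap(path="objectnet_to_imagenet.json"):  # hypothetical file name
    """Load {objectnet_class: imagenet_class} for the 113 shared classes."""
    with open(path) as f:
        return json.load(f)

def keep_overlapping(samples, overlap):
    """Filter (image_path, objectnet_class) pairs to classes that map to ImageNet."""
    return [(path, overlap[cls]) for path, cls in samples if cls in overlap]
```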
Powered by the ImageNet dataset, unsupervised learning on large-scale data has made significant advances for classification tasks. There are two major challenges to extending such an attractive learning modality to segmentation tasks: i) a large-scale benchmark for assessing algorithms is missing; ii) unsupervised shape representation learning is difficult. We propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to track research progress. Based on the ImageNet dataset, we propose the ImageNet-S dataset with 1.2 million training images and 50k high-quality semantic segmentation annotations for evaluation. Our benchmark has high data diversity and a clear task objective. We also present a simple yet effective baseline method that works surprisingly well for LUSS. In addition, we benchmark related un/weakly/fully supervised methods accordingly, identifying the challenges and possible directions of LUSS.
31 PAPERS • 6 BENCHMARKS
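The abstract above does not spell out ImageNet-S's exact scoring protocol; as an illustrative sketch, here is mean IoU, the standard semantic-segmentation metric, computed over predicted versus ground-truth class-id masks.

```python
# Minimal sketch: mean IoU over predicted vs. ground-truth segmentation masks.
# Shown for illustration; this is the standard semantic-segmentation metric.
import numpy as np

def mean_iou(pred, gt, num_classes):
    """pred, gt: integer class-id arrays of the same shape."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))
```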
When glancing at a magazine or browsing the Internet, we are continuously exposed to photographs. Despite this overflow of visual information, humans are extremely good at remembering thousands of pictures along with some of their visual details. But not all images are equal in memory. Some stick in our minds, and others are forgotten. In this paper we focus on the problem of predicting how memorable an image will be. We show that memorability is a stable property of an image that is shared across different viewers. We introduce a database for which we have measured the probability that each picture will be remembered after a single view. We analyze image features and labels that contribute to making an image memorable, and we train a predictor based on global image descriptors. We find that predicting image memorability is a task that can be addressed with current computer vision techniques. Whereas making memorable images is a challenging task in visualization and photography, this work is a first attempt to quantify this useful quality of images.
27 PAPERS • 4 BENCHMARKS
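A minimal sketch of the "predictor based on global image descriptors" mentioned above, using support vector regression and evaluating with Spearman rank correlation. The feature matrix and scores here are random placeholders; the paper's actual global descriptors and measured memorability scores would go in X and y.

```python
# Minimal sketch: regress memorability scores from global image descriptors.
# X and y are random placeholders for real descriptors and measured scores.
import numpy as np
from sklearn.svm import SVR
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 128))      # placeholder global descriptors
y = rng.uniform(0, 1, size=600)      # placeholder memorability scores

split = 500
model = SVR().fit(X[:split], y[:split])
rho, _ = spearmanr(model.predict(X[split:]), y[split:])
print(f"Spearman rank correlation: {rho:.3f}")
```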