PASCAL-S is a dataset for salient object detection consisting of 850 images from the PASCAL VOC 2010 validation set with multiple salient objects in the scenes.
253 PAPERS • 3 BENCHMARKS
DUTS is a saliency detection dataset containing 10,553 training images and 5,019 test images. All training images are collected from the ImageNet DET training/val sets, while test images are collected from the ImageNet DET test set and the SUN dataset. Both the training and test sets contain very challenging scenarios for saliency detection. Accurate pixel-level ground truths are manually annotated by 50 subjects; a minimal loading sketch follows the entry.
250 PAPERS • 5 BENCHMARKS
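Since DUTS ships images and masks as separate files, a minimal pairing sketch is given below. The folder names (DUTS-TR-Image / DUTS-TR-Mask) and the .png mask extension are assumptions about the common release layout, not part of the dataset description above.

    # Minimal sketch: pair DUTS images with their pixel-level masks.
    # Folder names and the .png mask extension are assumptions about the
    # common release layout; adjust them to your local copy.
    import os
    from PIL import Image

    def load_duts_pairs(root, image_dir="DUTS-TR-Image", mask_dir="DUTS-TR-Mask"):
        pairs = []
        for name in sorted(os.listdir(os.path.join(root, image_dir))):
            stem, _ = os.path.splitext(name)
            img = Image.open(os.path.join(root, image_dir, name)).convert("RGB")
            mask = Image.open(os.path.join(root, mask_dir, stem + ".png")).convert("L")
            pairs.append((img, mask))
        return pairs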
HKU-IS is a visual saliency prediction dataset which contains 4,447 challenging images, most of which have either low contrast or multiple salient objects.
205 PAPERS • 3 BENCHMARKS
The DUT-OMRON dataset is used for evaluating salient object detection and contains 5,168 high-quality images. The images have one or more salient objects and relatively cluttered backgrounds; a sketch of two standard evaluation metrics follows the entry.
193 PAPERS • 4 BENCHMARKS
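Because DUT-OMRON is primarily an evaluation benchmark, below is a minimal sketch of two metrics commonly reported on it: mean absolute error (MAE) and F-measure with the conventional beta^2 = 0.3. The fixed binarization threshold is an assumption; adaptive thresholds are also widely used.

    # Minimal SOD metric sketch, assuming saliency map S and ground truth G
    # are float arrays in [0, 1]. beta2 = 0.3 is the conventional weight;
    # the fixed threshold is a free choice made here for simplicity.
    import numpy as np

    def mae(S, G):
        return np.abs(S - G).mean()

    def f_measure(S, G, thresh=0.5, beta2=0.3):
        pred = S >= thresh
        gt = G >= 0.5
        tp = np.logical_and(pred, gt).sum()
        precision = tp / max(pred.sum(), 1)
        recall = tp / max(gt.sum(), 1)
        if precision + recall == 0:
            return 0.0
        return (1 + beta2) * precision * recall / (beta2 * precision + recall)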
The Image Shadow Triplets dataset (ISTD) is a dataset for shadow understanding that contains 1,870 image triplets (shadow image, shadow mask, shadow-free image); a sketch for iterating the triplets follows the entry.
75 PAPERS • 2 BENCHMARKS
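A minimal sketch for iterating the triplets, assuming the common release layout in which subfolders train_A, train_B, and train_C hold the shadow images, shadow masks, and shadow-free images respectively; adjust the names to your local copy.

    # Minimal sketch: iterate ISTD triplets. The subfolder names
    # (train_A = shadow, train_B = mask, train_C = shadow-free) follow
    # the common release layout and are an assumption here.
    import os
    from PIL import Image

    def istd_triplets(root, split="train"):
        a = os.path.join(root, split + "_A")   # shadow images
        b = os.path.join(root, split + "_B")   # shadow masks
        c = os.path.join(root, split + "_C")   # shadow-free images
        for name in sorted(os.listdir(a)):
            yield (Image.open(os.path.join(a, name)),
                   Image.open(os.path.join(b, name)),
                   Image.open(os.path.join(c, name)))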
SBU-Kinect-Interaction dataset version 2.0 comprises RGB-D video sequences of humans performing interaction activities, recorded using the Microsoft Kinect sensor. The dataset was originally recorded for a class project and must be used only for research purposes. If you use this dataset in your work, please cite the following paper: Kiwon Yun, Jean Honorio, Debaleena Chattopadhyay, Tamara L. Berg, and Dimitris Samaras, "Two-person Interaction Detection Using Body-Pose Features and Multiple Instance Learning," The 2nd International Workshop on Human Activity Understanding from 3D Data (HAU3D-CVPRW), CVPR 2012.
71 PAPERS • 4 BENCHMARKS
The Extended Complex Scene Saliency Dataset (ECSSD) comprises complex scenes presenting textures and structures common to real-world images. ECSSD contains 1,000 intricate images and respective ground-truth saliency maps, created as the average of the labelings of five human participants; a fusion sketch follows the entry.
29 PAPERS • 5 BENCHMARKS
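An illustrative sketch of the fusion described above: five binary annotations are averaged into one consensus map. Thresholding the average at 0.5 (majority vote) is an assumption made for illustration, not necessarily ECSSD's exact procedure.

    # Illustrative sketch: fuse five binary annotations by averaging, as
    # ECSSD's ground truth is described. The 0.5 threshold (majority vote)
    # is an assumption for illustration.
    import numpy as np

    def fuse_annotations(masks):          # masks: list of 5 binary arrays
        avg = np.mean([m.astype(np.float32) for m in masks], axis=0)
        return avg, (avg >= 0.5)          # soft consensus and majority mask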
SOC (Salient Objects in Clutter) is a dataset for Salient Object Detection (SOD). It includes images with salient and non-salient objects from daily object categories. Beyond object category annotations, each salient image is accompanied by attributes that reflect common challenges in real-world scenes.
29 PAPERS • 1 BENCHMARK
VT5000 includes 5,000 spatially aligned RGB-T image pairs with ground-truth annotations. It covers 11 challenges collected in different scenes and environments for exploring the robustness of algorithms.
22 PAPERS • NO BENCHMARKS YET
There exist several datasets for saliency detection, but none of them is specifically designed for high-resolution salient object detection. The High-Resolution Salient Object Detection (HRSOD) dataset contains 1,610 training images and 400 test images. All 2,010 images are collected from Flickr under Creative Commons licenses. Pixel-level ground truths are manually annotated by 40 subjects, and the shortest edge of each image in HRSOD exceeds 1,200 pixels.
19 PAPERS • 1 BENCHMARK
To enrich the diversity, we also collect 92 images suitable for saliency detection from DAVIS [27], a densely annotated high-resolution video segmentation dataset. Images in this dataset are precisely annotated and have very high resolutions (i.e., 1920×1080). We ignore the categories of the objects and generate saliency ground-truth masks for this dataset (a conversion sketch follows the entry). For convenience, the collected dataset is denoted as DAVIS-S.
9 PAPERS • 1 BENCHMARK
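A sketch of the category-agnostic conversion described above, assuming a DAVIS-style integer label map with 0 as background: every non-zero label becomes salient foreground.

    # Sketch of the category-agnostic conversion: any non-zero label in a
    # DAVIS-style annotation becomes foreground. Assumes the annotation is
    # an integer label map with 0 = background.
    import numpy as np

    def to_saliency_mask(label_map):
        return (np.asarray(label_map) > 0).astype(np.uint8) * 255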
Aims at detecting small obstacles, similar to the Lost and Found dataset.
7 PAPERS • 2 BENCHMARKS
HS-SOD is a hyperspectral salient object detection dataset comprising 60 hyperspectral images with their respective ground-truth binary images and representative rendered colour images (sRGB); an illustrative band-collapsing sketch follows the entry.
5 PAPERS • NO BENCHMARKS YET
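For orientation only, the sketch below collapses a hyperspectral cube (H × W × bands) into a rough three-channel preview by averaging thirds of the spectrum. The dataset's own sRGB renderings come from a proper colorimetric pipeline; this crude band grouping is purely an assumption for quick visualization.

    # Illustration only: collapse a hyperspectral cube (H x W x bands) to a
    # rough 3-channel preview by averaging thirds of the spectrum. The
    # dataset's sRGB renderings use a proper colorimetric pipeline; this
    # band grouping is an assumption for visualization only.
    import numpy as np

    def pseudo_rgb(cube):
        b = cube.shape[2]
        thirds = [cube[:, :, i * b // 3:(i + 1) * b // 3].mean(axis=2)
                  for i in range(3)]
        img = np.stack(thirds[::-1], axis=2)      # long wavelengths -> R
        span = img.max() - img.min() + 1e-8
        return (255 * (img - img.min()) / span).astype(np.uint8)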
Recent salient object detection (SOD) methods based on deep neural networks have achieved remarkable performance. However, most existing SOD models designed for low-resolution input perform poorly on high-resolution images due to the contradiction between sampling depth and receptive field size. Aiming at resolving this contradiction, we propose a novel one-stage framework called Pyramid Grafting Network (PGNet), which uses transformer and CNN backbones to extract features from different-resolution images independently and then grafts the features from the transformer branch onto the CNN branch. An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically, guided by different source features during the decoding process (a simplified cross-attention sketch follows the entry). Moreover, we design an Attention Guided Loss (AGL) to explicitly supervise the attention matrix generated by CMGM to help the network better interact with the attention from different models. We contribute a new Ultra-High-Resolution Saliency Detection dataset (UHRSD), containing 5,920 images at 4K-8K resolutions.
5 PAPERS • 1 BENCHMARK
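Below is a much-simplified sketch of the cross-attention idea behind CMGM, not the paper's exact module: CNN-branch features supply queries, transformer-branch features supply keys and values, and the attention matrix is returned so that an AGL-style loss could supervise it. The class name, single-head design, and token-sequence shapes are all assumptions for illustration.

    # Simplified cross-attention grafting sketch (not the paper's exact
    # CMGM): queries come from the CNN branch, keys/values from the
    # transformer branch. Single-head design and names are assumptions.
    import torch
    import torch.nn as nn

    class CrossGraft(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.q = nn.Linear(dim, dim)   # projects CNN-branch tokens
            self.k = nn.Linear(dim, dim)   # projects transformer-branch tokens
            self.v = nn.Linear(dim, dim)
            self.scale = dim ** -0.5

        def forward(self, cnn_feat, trans_feat):   # both (B, N, dim)
            attn = torch.softmax(
                self.q(cnn_feat) @ self.k(trans_feat).transpose(1, 2) * self.scale,
                dim=-1)
            grafted = attn @ self.v(trans_feat)    # transformer info grafted in
            return cnn_feat + grafted, attn        # attn could feed an AGL-style loss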
A salient object subitizing dataset of about 14K everyday images, annotated using an online crowdsourcing marketplace.
2 PAPERS • NO BENCHMARKS YET
360-SOD contains 500 high-resolution equirectangular images.
1 PAPER • NO BENCHMARKS YET
A small-scale training set, which contains only 4K images.