🔔 Share your dataset with the ML community!

Filter by Modality

Filter by Task (clear)

Filter by Language

15 dataset results for Scene Recognition

ScanNet is an instance-level indoor RGB-D dataset that includes both 2D and 3D data. It is a collection of labeled voxels rather than points or objects. Up to now, ScanNet v2, the newest version of ScanNet, has collected 1513 annotated scans with an approximate 90% surface coverage. In the semantic segmentation task, this dataset is marked in 20 classes of annotated 3D voxelized objects.

1,240 PAPERS • 19 BENCHMARKS

ADE20K

The ADE20K semantic segmentation dataset contains more than 20K scene-centric images exhaustively annotated with pixel-level objects and object parts labels. There are totally 150 semantic categories, which include stuffs like sky, road, grass, and discrete objects like person, car, bed.

995 PAPERS • 25 BENCHMARKS

Places205

The Places205 dataset is a large-scale scene-centric dataset with 205 common scene categories. The training dataset contains around 2,500,000 images from these categories. In the training set, each scene category has the minimum 5,000 and maximum 15,000 images. The validation set contains 100 images per category (a total of 20,500 images), and the testing set includes 200 images per category (a total of 41,000 images).

508 PAPERS • 2 BENCHMARKS

SUN RGB-D

The SUN RGBD dataset contains 10335 real RGB-D images of room scenes. Each RGB image has a corresponding depth and segmentation map. As many as 700 object categories are labeled. The training and testing sets contain 5285 and 5050 images, respectively.

422 PAPERS • 13 BENCHMARKS

Places365

The Places365 dataset is a scene recognition dataset. It is composed of 10 million images comprising 434 scene classes. There are two versions of the dataset: Places365-Standard with 1.8 million train and 36000 validation images from K=365 scene classes, and Places365-Challenge-2016, in which the size of the training set is increased up to 6.2 million extra images, including 69 new scene classes (leading to a total of 8 million train images from 434 scene classes).

55 PAPERS • 8 BENCHMARKS

SUN397

The Scene UNderstanding (SUN) database contains 899 categories and 130,519 images. There are 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition.

34 PAPERS • 5 BENCHMARKS

AID (Aerial Image Dataset)

AID is a new large-scale aerial image dataset, by collecting sample images from Google Earth imagery. Note that although the Google Earth images are post-processed using RGB renderings from the original optical aerial images, it has proven that there is no significant difference between the Google Earth images with the real optical aerial images even in the pixel-level land use/cover mapping. Thus, the Google Earth images can also be used as aerial images for evaluating scene classification algorithms.

28 PAPERS • 1 BENCHMARK

YUP++

YUP++ (YUP++ Dynamic Scenes dataset)

A new and challenging video database of dynamic scenes that more than doubles the size of those previously available. This dataset is explicitly split into two subsets of equal size that contain videos with and without camera motion to allow for systematic study of how this variable interacts with the defining dynamics of the scene per se.

12 PAPERS • 1 BENCHMARK

MIT Indoor Scenes

Context This is the Original data provided by MIT .

8 PAPERS • 1 BENCHMARK

HSD

HSD (Honda Scenes Dataset)

An annotated dataset is released to enable dynamic scene classification that includes 80 hours of diverse high quality driving video data clips collected in the San Francisco Bay area. The dataset includes temporal annotations for road places, road types, weather, and road surface conditions.

4 PAPERS • NO BENCHMARKS YET

ADVANCE (AuDio Visual Aerial sceNe reCognition datasEt)

The AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE) is a brand-new multimodal learning dataset, which aims to explore the contribution of both audio and conventional visual messages to scene recognition. This dataset in summary contains 5075 pairs of geotagged aerial images and sounds, classified into 13 scene classes, i.e., airport, sports land, beach, bridge, farmland, forest, grassland, harbor, lake, orchard, residential area, shrub land, and train station.

3 PAPERS • NO BENCHMARKS YET

HOWS (HOWS-CL-25)

HOWS-CL-25 (Household Objects Within Simulation dataset for Continual Learning) is a synthetic dataset especially designed for object classification on mobile robots operating in a changing environment (like a household), where it is important to learn new, never seen objects on the fly. This dataset can also be used for other learning use-cases, like instance segmentation or depth estimation. Or where household objects or continual learning are of interest.

1 PAPER • 2 BENCHMARKS

MAI (Multi-scene Aerial Image)

MAI is a dataset for multi-scene recognition in single aerial images. It consists of 3,923 labelled large-scale images from Google Earth imagery that covers the United States, Germany, and France. The size of each image is 512 ×512, and spatial resolutions vary from 0.3 m/pixel to 0.6 m/pixel. After capturing aerial images, multiple scene-level labels were manually assigned to each image from in total 24 scene categories, including apron, baseball, beach, commercial, farmland, woodland, parking lot, port, residential, river, storage tanks, sea, bridge, lake, park, roundabout, soccer field, stadium, train station, works, golf course, runway, sparse shrub, and tennis court

1 PAPER • NO BENCHMARKS YET

Indian Food Image Dataset (datacluster.ai)

This dataset is an extremely challenging set of over 5000+ original India food images captured and crowdsourced from over 800+ urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at ****DC Labs.

0 PAPER • NO BENCHMARKS YET

Stairs Image Dataset | Parts of House | Indoor (datacluster.ai)

This dataset is an extremely challenging set of over 3000+ originally Stair images captured and crowdsourced from over 500+ urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.

0 PAPER • NO BENCHMARKS YET

Datasets

15 dataset results for Scene Recognition