The NYU-Depth V2 dataset comprises video sequences from a variety of indoor scenes, recorded by both the RGB and depth cameras of the Microsoft Kinect. It features:

- 1,449 densely labeled pairs of aligned RGB and depth images
- 464 new scenes taken from 3 cities
- 407,024 new unlabeled frames
Each object is labeled with a class and an instance number.
The dataset has several components:
- Labeled: a subset of the video data accompanied by dense multi-class labels. This data has also been preprocessed to fill in missing depth labels.
- Raw: the raw RGB, depth, and accelerometer data as provided by the Kinect.
- Toolbox: useful functions for manipulating the data and labels.
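Since the labeled subset is described as preprocessed to fill in missing depth, a minimal sketch of that idea is shown below. The official toolbox uses a colorization-based in-painting method; this is only a simplified stand-in (nearest-valid-pixel fill, with zeros assumed to mark missing depth) to illustrate what "filling in missing depth" means:

```python
import numpy as np

def fill_depth_nearest(depth):
    """Fill missing (zero-valued) depth pixels with the nearest valid depth.

    Simplified stand-in for the dense depth in-painting applied to the
    labeled subset; the official toolbox uses a more sophisticated
    colorization-based method. Assumes 0 marks missing depth.
    """
    filled = depth.astype(float).copy()
    missing = filled == 0
    if not missing.any():
        return filled
    valid_idx = np.argwhere(~missing)        # (N, 2) coordinates of valid pixels
    valid_vals = filled[~missing]            # corresponding depth values
    for y, x in np.argwhere(missing):
        # Squared distance from this hole to every valid pixel
        d2 = ((valid_idx - (y, x)) ** 2).sum(axis=1)
        filled[y, x] = valid_vals[d2.argmin()]
    return filled

# Tiny example: zeros are holes to be filled from their nearest neighbors
depth = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 3.0],
                  [4.0, 4.0, 0.0]])
print(fill_depth_nearest(depth))
```

For real frames the brute-force nearest-neighbor search is slow; a distance transform (e.g. `scipy.ndimage.distance_transform_edt` with `return_indices=True`) does the same fill in one vectorized pass.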