The YCB-Video dataset is a large-scale video dataset for 6D object pose estimation. It provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames.
146 PAPERS • 6 BENCHMARKS
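Methods on YCB-Video are typically evaluated with the average distance (ADD) metric, which measures the mean distance between object model points transformed by the estimated and ground-truth poses. A minimal sketch, assuming poses are given as 3x3 rotation matrices and 3-vector translations:

```python
import numpy as np

def add_metric(R_est, t_est, R_gt, t_gt, model_points):
    """Average distance (ADD) between model points under the
    estimated and ground-truth 6D poses.

    R_est, R_gt: (3, 3) rotation matrices
    t_est, t_gt: (3,) translation vectors
    model_points: (N, 3) 3D points sampled from the object model
    """
    pts_est = model_points @ R_est.T + t_est
    pts_gt = model_points @ R_gt.T + t_gt
    return np.linalg.norm(pts_est - pts_gt, axis=1).mean()
```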
T-LESS is a dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from simple scenes with several isolated objects to very challenging ones with multiple instances of several objects and a high amount of clutter and occlusion.
78 PAPERS • 2 BENCHMARKS
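Because many T-LESS objects are symmetric, a point-to-point metric like ADD is ambiguous for them; the BOP benchmark evaluates T-LESS mainly with the visible surface discrepancy (VSD), but a simpler and widely used symmetry-aware alternative is ADD-S, which matches each transformed model point to its nearest neighbor under the ground-truth pose. A sketch along the same lines as the ADD snippet above:

```python
import numpy as np
from scipy.spatial import cKDTree

def add_s_metric(R_est, t_est, R_gt, t_gt, model_points):
    """Symmetry-aware average distance (ADD-S): each point under the
    estimated pose is matched to the closest point under the
    ground-truth pose, so symmetric flips are not penalized."""
    pts_est = model_points @ R_est.T + t_est
    pts_gt = model_points @ R_gt.T + t_gt
    dists, _ = cKDTree(pts_gt).query(pts_est, k=1)
    return dists.mean()
```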
REAL275 is a benchmark for category-level pose estimation. It contains 4,300 training frames, 950 validation frames, and 2,750 test frames across 18 different real scenes.
53 PAPERS • 1 BENCHMARK
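Category-level results on REAL275 are commonly reported as mAP at rotation/translation thresholds such as 5° and 5 cm. A minimal sketch of the underlying per-pose errors, assuming poses are given as rotation matrices and translations in metres:

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Rotation error in degrees and translation error in the units
    of t (e.g. metres) between two poses."""
    cos = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0
    rot_err = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    trans_err = np.linalg.norm(t_est - t_gt)
    return rot_err, trans_err

# A prediction counts as correct at the 5°/5 cm threshold if both hold:
# rot_err, trans_err = pose_errors(R_est, t_est, R_gt, t_gt)
# correct = rot_err <= 5.0 and trans_err <= 0.05
```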
The LM (Linemod) dataset was introduced by Stefan Hinterstoisser and colleagues in their work on model-based training, detection, and pose estimation of texture-less 3D objects in heavily cluttered scenes. It provides 15 texture-less objects, each with a 3D model and a test sequence of roughly 1,200 RGB-D frames annotated with ground-truth 6D poses.
33 PAPERS • 5 BENCHMARKS
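On LM, a pose is conventionally accepted if its ADD error (see the sketch under YCB-Video above) is below 10% of the object's diameter. A sketch of that decision rule, with the diameter computed from the model points:

```python
import numpy as np
from scipy.spatial.distance import pdist

def object_diameter(model_points):
    """Largest pairwise distance between model points; for dense
    models, subsample first to keep the O(N^2) cost tractable."""
    return pdist(model_points).max()

def pose_is_correct(add_error, model_points, threshold=0.1):
    """LM convention: a pose is correct if ADD < 10% of the diameter."""
    return add_error < threshold * object_diameter(model_points)
```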
The dataset consists of synthetic data generated by rendering the 3D meshes of LM objects and several household objects in Blender, for training 6D pose estimation algorithms. It covers 18 objects (13 from LM and 5 household objects), with 20,000 samples per object. Each sample includes an RGB image in .png format, a depth image in .exr format, a mask label in .png format, and a ground-truth pose label saved in a .json file. Apart from the training data, the 3D meshes of the objects and pre-trained models of the 6D pose estimation algorithm are also included. The whole dataset takes approximately 1 TB of storage.
1 PAPER • NO BENCHMARKS YET
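A minimal loading sketch for one sample. The file names (000000.png, 000000.exr, 000000_mask.png, 000000.json) are assumptions for illustration, not the dataset's documented layout; note that OpenCV only reads .exr files if the OPENCV_IO_ENABLE_OPENEXR flag is set before import:

```python
import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # must be set before cv2 is imported

import json
import cv2

def load_sample(root, idx):
    """Load one synthetic sample: RGB (.png), depth (.exr),
    mask (.png), and ground-truth pose (.json).
    File names here are hypothetical, not the dataset's actual scheme."""
    stem = f"{idx:06d}"
    rgb = cv2.imread(os.path.join(root, f"{stem}.png"), cv2.IMREAD_COLOR)
    depth = cv2.imread(os.path.join(root, f"{stem}.exr"),
                       cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)
    mask = cv2.imread(os.path.join(root, f"{stem}_mask.png"), cv2.IMREAD_GRAYSCALE)
    with open(os.path.join(root, f"{stem}.json")) as f:
        pose = json.load(f)
    return rgb, depth, mask, pose
```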
Estimating camera motion in deformable scenes is a complex and open research challenge. Most existing non-rigid structure-from-motion techniques assume that static scene parts are observed alongside the deforming ones, in order to establish an anchoring reference. However, this assumption does not hold in relevant applications such as endoscopy. To tackle this issue with a common benchmark, we introduce the Drunkard's Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. It is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformation over time. Simulation in realistic 3D buildings yields a vast amount of data and ground-truth labels, including camera poses, RGB images, depth, optical flow, and normal maps at high resolution and quality.
1 PAPER • 1 BENCHMARK
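Since the dataset provides ground-truth camera poses, trajectory accuracy can be summarized with the absolute trajectory error (ATE). A minimal sketch that computes the translation RMSE after a rigid (Umeyama-style, scale-free) alignment of the estimated trajectory to the ground truth:

```python
import numpy as np

def ate_rmse(traj_est, traj_gt):
    """Absolute trajectory error: RMSE of camera positions after a
    rigid (rotation + translation) alignment of the two trajectories.

    traj_est, traj_gt: (N, 3) camera positions, one row per frame,
    assumed to be time-synchronized.
    """
    mu_e, mu_g = traj_est.mean(0), traj_gt.mean(0)
    # Umeyama-style alignment via SVD of the cross-covariance matrix.
    H = (traj_est - mu_e).T @ (traj_gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    aligned = (traj_est - mu_e) @ R.T + mu_g
    return np.sqrt(((aligned - traj_gt) ** 2).sum(1).mean())
```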