The NCLT dataset is a large-scale, long-term autonomy dataset for robotics research collected on the University of Michigan’s North Campus. It consists of omnidirectional imagery, 3D lidar, planar lidar, GPS, and proprioceptive sensors for odometry, collected using a Segway robot. The dataset was collected to facilitate research on long-term autonomous operation in changing environments. It comprises 27 sessions spaced approximately biweekly over the course of 15 months. The sessions repeatedly explore the campus, both indoors and outdoors, on varying trajectories and at different times of day across all four seasons. This allows the dataset to capture many challenging elements, including moving obstacles (e.g., pedestrians, bicyclists, and cars), changing lighting, varying viewpoints, seasonal and weather changes (e.g., falling leaves and snow), and long-term structural changes caused by construction projects.
96 PAPERS • NO BENCHMARKS YET
Cambridge Landmarks is a large-scale outdoor visual relocalisation dataset captured around Cambridge University. It contains the original video, with extracted image frames labelled with their 6-DOF camera pose, and a visual reconstruction of the scene (a minimal sketch of parsing the provided pose lists follows this entry). If you use this data, please cite: Alex Kendall, Matthew Grimes and Roberto Cipolla, "PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization", Proceedings of the International Conference on Computer Vision (ICCV), 2015.
92 PAPERS • NO BENCHMARKS YET
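The scenes are distributed with plain-text pose lists (commonly dataset_train.txt and dataset_test.txt per scene). The snippet below is a minimal parsing sketch, assuming the usual PoseNet-style layout of three header lines followed by an image path and seven pose values per line (camera position x y z and orientation quaternion w p q r); the file names and layout are assumptions to verify against the actual download.

```python
import numpy as np

def load_cambridge_poses(pose_file):
    """Parse a Cambridge Landmarks pose list, assuming the PoseNet-style layout:
    three header lines, then one image path followed by seven values per line
    (camera position x y z and orientation quaternion w p q r)."""
    poses = {}
    with open(pose_file, "r") as f:
        lines = f.read().splitlines()[3:]  # skip the assumed three-line header
    for line in lines:
        if not line.strip():
            continue
        tokens = line.split()
        values = np.array(tokens[1:8], dtype=np.float64)
        poses[tokens[0]] = {
            "position": values[:3],     # camera position in scene coordinates
            "quaternion": values[3:],   # orientation as (w, p, q, r)
        }
    return poses

# Illustrative usage (the path is hypothetical):
# train_poses = load_cambridge_poses("KingsCollege/dataset_train.txt")
```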
Aachen Day-Night is a dataset designed for benchmarking 6-DOF outdoor visual localization under changing conditions. It focuses on localizing high-quality night-time images against a day-time 3D model. The dataset contains 14,607 images captured under varying weather, seasonal, and day-night conditions.
81 PAPERS • 1 BENCHMARK
InLoc is a dataset with reference 6DoF poses for large-scale indoor localization. Query photographs are captured by mobile phones at a different time than the reference 3D map, thus presenting a realistic indoor localization scenario.
58 PAPERS • NO BENCHMARKS YET
This dataset provides Light Detection and Ranging (LiDAR) data and stereo images with various position sensors, targeting a highly complex urban environment. It captures features of urban environments such as metropolitan areas, complex buildings and residential areas. Data from both 2D and 3D LiDAR, the two typical types of LiDAR sensors, are provided. Raw sensor data for vehicle navigation is provided as files, and for convenience, development tools are provided in the Robot Operating System (ROS) environment (a generic bag-reading sketch follows this entry).
18 PAPERS • NO BENCHMARKS YET
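Since the development tools live in the ROS ecosystem, the data is typically consumed as ROS message streams, which are often recorded and replayed as bag files. The sketch below uses the standard ROS 1 rosbag Python API; the bag path and topic names are hypothetical and not part of this dataset's release.

```python
# A generic sketch using the standard ROS 1 rosbag Python API; the bag file
# and topic names below are hypothetical, not part of this dataset's release.
import rosbag

def count_messages(bag_path, topics=("/velodyne_points", "/stereo/left/image_raw")):
    """Count messages per topic in a recorded bag, e.g. one produced by
    replaying the dataset through its ROS tools and recording the output."""
    counts = {topic: 0 for topic in topics}
    with rosbag.Bag(bag_path, "r") as bag:
        for topic, msg, t in bag.read_messages(topics=list(topics)):
            counts[topic] += 1
    return counts
```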
A novel benchmark dataset that provides manual annotations for over 260 million laser scanning points, grouped into approximately 100,000 assets, from the 2015 Dublin LiDAR point cloud [12]. Objects are labelled into 13 classes using hierarchical levels of detail, from coarse (i.e., building, vegetation and ground) to refined (i.e., window, door and tree) elements.
13 PAPERS • NO BENCHMARKS YET
DeepLoc is a large-scale urban outdoor localization dataset. It currently comprises one scene spanning an area of 110 x 130 m, which a robot traverses multiple times with different driving patterns. The dataset creators use a LiDAR-based SLAM system with sub-centimeter and sub-degree accuracy to compute the pose labels that are provided as ground truth. Poses in the dataset are spaced approximately 0.5 m apart, which is twice as dense as other relocalization datasets.
7 PAPERS • NO BENCHMARKS YET
Long-term Visual Localization is a collection of benchmark datasets aimed at evaluating 6-DoF pose estimation accuracy under large appearance variations caused by seasonal (summer, winter, spring, etc.) and illumination (dawn, day, sunset, night) changes. Each dataset consists of a set of reference images with their corresponding ground-truth poses, and a set of query images (a sketch of the standard pose-error computation follows this entry).
3 PAPERS • NO BENCHMARKS YET
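Evaluation on these benchmarks is typically reported as the fraction of query images localized within paired position and orientation thresholds. The following is a minimal sketch of the per-image error computation and the recall aggregation, assuming poses given as 3x3 rotation matrices and camera positions; the threshold values are commonly used defaults rather than the official values of any particular dataset listed here.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Translation error (metres) and rotation error (degrees) between an
    estimated and a ground-truth camera pose, each given as a 3x3 rotation
    matrix and a 3-vector camera position."""
    t_err = np.linalg.norm(t_est - t_gt)
    # Rotation error is the angle of the relative rotation R_est^T @ R_gt.
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    r_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return t_err, r_err

def recall_at_thresholds(errors, thresholds=((0.25, 2.0), (0.5, 5.0), (5.0, 10.0))):
    """Fraction of queries localized within each (metres, degrees) threshold
    pair, in the style of common long-term localization benchmarks."""
    errors = np.asarray(errors)  # shape (N, 2): columns are (t_err, r_err)
    return [float(np.mean((errors[:, 0] <= t) & (errors[:, 1] <= r)))
            for t, r in thresholds]
```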
The PhotoSynth (PS) dataset for patch matching consists of 30 scenes in total, with 25 scenes for training and 5 scenes for validation. Image pairs are captured under different illumination conditions, at different scales and from different viewpoints.
A new cross-season, scale-less monocular depth prediction dataset derived from the CMU Visual Localization dataset via structure from motion.
Aachen Day-Night v1.1 dataset is an extended version of the original Aachen Day-Night dataset. Besides the original query images, the Aachen Day-Night v1.1 dataset contains an additional 93 nighttime queries. In addition, it uses a larger 3D model containing additional images. These additional images were extracted from video sequences captured with different cameras. Please refer to "Reference Pose Generation for Long-term Visual Localization via Learned Features and View Synthesis" for more information.
2 PAPERS • 1 BENCHMARK
The Virtual Gallery dataset is a synthetic dataset that targets challenges such as varying lighting conditions and different occlusion levels, for tasks including depth estimation, instance segmentation and visual localization.
2 PAPERS • NO BENCHMARKS YET
To study data-scarcity mitigation for learning-based visual localization methods via sim-to-real transfer, we curate and present the CrossLoc benchmark datasets: multimodal aerial sim-to-real data for flights above natural and urban terrains. Unlike previous computer vision datasets that focus on localization in a single domain (mostly real RGB images), these benchmarks include various multimodal synthetic cues paired with all real photos. Complementary to the paired real and synthetic data, we offer rich synthetic data that efficiently fills the flight-envelope volume in the vicinity of the real data.
1 PAPER • NO BENCHMARKS YET
Danish Airs and Grounds (DAG) is a large collection of street-level and aerial images targeting localization under drastic viewpoint changes. Its main challenge lies in the extreme viewing-angle difference between query and reference images, with consequent changes in illumination and perspective. The dataset is larger and more diverse than currently available public data, including more than 50 km of road in urban, suburban and rural areas. All images are associated with accurate 6-DoF metadata that allows the benchmarking of visual localization methods.
DeepLocCross is a localization dataset that contains RGB-D stereo images captured at 1280 x 720 pixels at a rate of 20 Hz. The ground-truth pose labels are generated using a LiDAR-based SLAM system. In addition to the 6-DoF localization poses of the robot, the dataset contains tracked detections of the observable dynamic objects; each tracked object is identified by a unique track ID, spatial coordinates, velocity and orientation angle. Furthermore, as the dataset contains multiple pedestrian crossings, labels indicating whether each intersection is safe to cross are provided. The dataset consists of seven training sequences with a total of 2,264 images and three testing sequences with a total of 930 images. The dynamic nature of the surrounding environment renders the tasks of localization and visual odometry estimation extremely challenging due to the varying weather conditions, presence of shadows and motion blur caused by the moving objects.
FAS100K is a large-scale visual localization dataset. It comprises two traverses of 238 km and 130 km respectively, where the latter is a partial repeat of the former. The data was collected using stereo cameras in Australia under sunny daytime conditions, and covers a variety of road and environment types including urban and rural areas. The raw image data from one of the cameras, streaming at 5 Hz, amounts to 63,650 and 34,497 image frames for the two traverses respectively.
A new dataset containing over 550K pairs of RGB and aerial LiDAR depth images, covering a 143 km^2 area.
The NAVER LABS localization datasets are five new indoor datasets for visual localization in challenging real-world environments. They were captured in a large shopping mall and a large metro station in Seoul, South Korea, using a dedicated mapping platform consisting of 10 cameras and 2 laser scanners. To obtain accurate ground-truth camera poses, we used a robust LiDAR SLAM that provides initial poses, which are then refined using a novel structure-from-motion based optimization. The datasets are provided in the kapture format and contain about 130k images as well as 6-DoF camera poses for training and validation. We also provide sparse LiDAR-based depth maps for the training images. The poses of the test set are withheld so as not to bias the benchmark.
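Since the data ships in the kapture format, it can be read with the open-source kapture Python toolbox. The sketch below assumes the package's kapture_from_dir loader and its dict-like records_camera and trajectories containers; the attribute names are assumptions that should be verified against the installed kapture version.

```python
# A minimal loading sketch, assuming the kapture Python package (pip install kapture).
# The loader name and the dict-like data model below follow the kapture
# documentation, but should be verified against the installed version.
from kapture.io.csv import kapture_from_dir

def iter_images_with_poses(kapture_dir):
    """Yield (timestamp, camera_id, image_name, pose) for every image in a
    kapture-formatted dataset, e.g. a NAVER LABS training split."""
    kdata = kapture_from_dir(kapture_dir)
    for timestamp, cam_images in kdata.records_camera.items():
        for camera_id, image_name in cam_images.items():
            pose = None
            if kdata.trajectories is not None and timestamp in kdata.trajectories:
                # Pose is expected as a rotation quaternion plus translation.
                pose = kdata.trajectories[timestamp].get(camera_id)
            yield timestamp, camera_id, image_name, pose
```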
This synthetic event dataset is used in Robust e-NeRF to study the collective effect of camera speed profile, contrast threshold variation and refractory period on the quality of NeRF reconstruction from a moving event camera. It is simulated using an improved version of ESIM with three camera configurations of increasing difficulty (easy, medium and hard) on the seven Realistic Synthetic 360° scenes adopted in the synthetic experiments of NeRF, resulting in a total of 21 sequence recordings. Please refer to the Robust e-NeRF paper for more details.
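For context, the contrast threshold and refractory period mentioned above parameterize the idealized event-generation model that such simulators implement: a pixel fires an event whenever its log intensity has changed by the contrast threshold since the last event, and then ignores changes for the refractory period. The snippet below is a simplified single-pixel sketch of that model (function and parameter names are hypothetical), not the ESIM simulator itself.

```python
import numpy as np

def generate_events(timestamps, log_intensities, contrast_threshold=0.25,
                    refractory_period=1e-3):
    """Simplified single-pixel event generation: an event of polarity +/-1 fires
    whenever the log intensity has changed by at least the contrast threshold
    since the last event, after which the pixel ignores changes for the
    refractory period. At most one event is emitted per input sample, which is
    a simplification relative to full simulators such as ESIM."""
    events = []
    ref_log = log_intensities[0]   # log intensity at the last emitted event
    last_event_time = -np.inf
    for t, log_i in zip(timestamps, log_intensities):
        if t - last_event_time < refractory_period:
            continue               # pixel is still refractory
        delta = log_i - ref_log
        if abs(delta) >= contrast_threshold:
            polarity = 1 if delta > 0 else -1
            events.append((t, polarity))
            # Move the reference by one threshold step in the event's direction.
            ref_log += polarity * contrast_threshold
            last_event_time = t
    return events

# Illustrative usage with a synthetic intensity ramp (values are hypothetical):
# t = np.linspace(0.0, 1.0, 1000)
# events = generate_events(t, np.log1p(t), contrast_threshold=0.25)
```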