The Replica Dataset is a collection of high-quality reconstructions of a variety of indoor spaces. Each reconstruction has clean dense geometry; high-resolution, high-dynamic-range textures; glass and mirror surface information; planar segmentation; and semantic class and instance segmentation.
280 PAPERS • 3 BENCHMARKS
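A minimal loading sketch for a Replica scene, assuming the per-scene layout of the public release (a mesh.ply plus a habitat/info_semantic.json with the semantic metadata; adjust paths to your download). Open3D is used here as a generic mesh reader, not Replica's own SDK:

```python
import json
import open3d as o3d  # generic mesh I/O; Replica also ships its own SDK

scene_dir = "Replica-Dataset/room_0"  # illustrative path

# Dense reconstructed geometry.
mesh = o3d.io.read_triangle_mesh(f"{scene_dir}/mesh.ply")
print(len(mesh.vertices), "vertices,", len(mesh.triangles), "triangles")

# Semantic class/instance metadata (file name as in the public release).
with open(f"{scene_dir}/habitat/info_semantic.json") as f:
    semantics = json.load(f)
print(len(semantics.get("classes", [])), "semantic classes")
```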
ViZDoom is an AI research platform based on the classic first-person shooter Doom. The most popular game mode is Deathmatch, in which several players join a maze and fight each other; after a fixed time the match ends and players are ranked by FRAG score, defined as kills minus suicides. During the game, each player can access various observations, including the first-person screen pixels, the corresponding depth map and segmentation map (pixel-wise object labels), a bird's-eye map of the maze, etc. The valid actions include almost all keyboard and mouse controls available to a human player, covering moving, turning, jumping, shooting, switching weapons, etc. ViZDoom can run a game either synchronously or asynchronously, i.e., the game core either waits until all players' actions are collected or runs at a constant frame rate without waiting.
149 PAPERS • 3 BENCHMARKS
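A minimal sketch of a synchronous ViZDoom loop with the extra observation buffers enabled, using the vizdoom Python package. The config path is illustrative; switching vzd.Mode.PLAYER to vzd.Mode.ASYNC_PLAYER selects the asynchronous mode described above:

```python
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("scenarios/deathmatch.cfg")  # illustrative config path

# Expose the extra observations described above.
game.set_depth_buffer_enabled(True)    # per-pixel depth map
game.set_labels_buffer_enabled(True)   # pixel-wise object labels
game.set_automap_buffer_enabled(True)  # bird's-eye maze map

# PLAYER = synchronous (engine waits for each action);
# ASYNC_PLAYER = asynchronous (engine runs at a constant frame rate).
game.set_mode(vzd.Mode.PLAYER)
game.init()

game.new_episode()
while not game.is_episode_finished():
    state = game.get_state()
    screen = state.screen_buffer   # first-person RGB view
    depth = state.depth_buffer
    labels = state.labels_buffer
    # One entry per button made available by the config,
    # e.g. [MOVE_FORWARD, TURN_LEFT, ATTACK].
    reward = game.make_action([1, 0, 0])
game.close()
```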
The Active Vision Dataset (AVD) focuses on simulating robotic vision tasks in everyday indoor environments using real imagery. The dataset includes 20,000+ RGB-D images and 50,000+ 2D bounding boxes of object instances, densely captured in 9 unique scenes.
29 PAPERS • 1 BENCHMARK
The Collaborative Drawing game (CoDraw) dataset contains ~10K dialogs consisting of ~138K messages exchanged between human players in the CoDraw game. The game involves two players: a Teller and a Drawer. The Teller sees an abstract scene containing multiple clip art pieces in a semantically meaningful configuration, while the Drawer tries to reconstruct the scene on an empty canvas using available clip art pieces. The two players communicate with each other using natural language.
12 PAPERS • NO BENCHMARKS YET
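A sketch of one way to represent a CoDraw exchange in memory; the class and field names are hypothetical assumptions for illustration, not the dataset's actual JSON schema:

```python
from dataclasses import dataclass

@dataclass
class Message:
    speaker: str  # "teller" or "drawer"
    text: str     # natural-language message

@dataclass
class Dialog:
    scene_id: str            # abstract scene the Teller describes
    messages: list[Message]  # alternating Teller/Drawer turns

dialog = Dialog(
    scene_id="scene_0001",  # hypothetical identifier
    messages=[
        Message("teller", "There is a large tree on the left side."),
        Message("drawer", "Placed the tree. What else?"),
    ],
)
```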
The GoogleEarth dataset is collected from Google Earth Studio and includes 400 orbit trajectories over Manhattan and Brooklyn. Each trajectory consists of 60 images, with orbit radii ranging from 125 to 813 meters and altitudes varying from 112 to 884 meters. In addition to the images, Google Earth Studio provides camera intrinsic and extrinsic parameters, making it possible to create automated annotations for semantic and building instance segmentation.
3 PAPERS • 1 BENCHMARK
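Since each frame ships with intrinsics and extrinsics, annotations can be generated by projecting georeferenced 3D geometry into the images. A minimal sketch of the standard pinhole projection this relies on; the matrices and field of view below are toy values, not taken from the dataset:

```python
import numpy as np

def project(points_world, K, R, t):
    """Project Nx3 world points to pixels with a standard pinhole model.

    K: 3x3 intrinsic matrix; R, t: world-to-camera rotation/translation.
    """
    p_cam = points_world @ R.T + t   # world -> camera coordinates
    uv = (K @ p_cam.T).T             # camera -> homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]    # perspective divide

# Toy example: assumed 512x512 image with a 60-degree horizontal FOV.
f = 256 / np.tan(np.deg2rad(60) / 2)
K = np.array([[f, 0, 256], [0, f, 256], [0, 0, 1.0]])
pts = np.array([[0.0, 0.0, 100.0]])             # a point 100 m ahead
print(project(pts, K, np.eye(3), np.zeros(3)))  # -> [[256. 256.]]
```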
The OSM dataset, sourced from OpenStreetMap, is composed of the rasterized semantic maps and height fields of 80 cities worldwide, spanning an area of more than 6,000 km². During the rasterization process, vectorized geometry information is converted into images by translating longitude and latitude into the EPSG:3857 (Web Mercator) coordinate system at zoom level 18, which corresponds to approximately 0.597 meters per pixel at the equator.
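The 0.597 m/px figure follows directly from the Web Mercator tile scheme: the equatorial circumference divided by the pixel width of the world map at zoom 18. A short sketch of that arithmetic and of the standard lon/lat-to-pixel conversion (generic slippy-map formulas, not the dataset's own tooling):

```python
import math

R = 6378137.0  # Web Mercator (EPSG:3857) sphere radius in meters
TILE = 256     # pixels per tile

def meters_per_pixel(zoom, lat_deg=0.0):
    # Equatorial circumference over the world map's pixel width,
    # scaled by cos(latitude) for the Mercator distortion.
    return (2 * math.pi * R) / (TILE * 2**zoom) * math.cos(math.radians(lat_deg))

def lonlat_to_pixel(lon_deg, lat_deg, zoom):
    # EPSG:3857 projection, shifted and scaled into global pixel coordinates.
    n = TILE * 2**zoom
    x = (lon_deg + 180.0) / 360.0 * n
    lat = math.radians(lat_deg)
    y = (1.0 - math.log(math.tan(lat) + 1.0 / math.cos(lat)) / math.pi) / 2.0 * n
    return x, y

print(round(meters_per_pixel(18), 3))  # -> 0.597, matching the figure above
```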
InstaOrder can be used to understand geometric ordering relationships between instances in an image. The dataset consists of 2.9M annotations of geometric orderings for class-labeled instances in 101K natural scenes. The scenes were annotated by 3,659 crowd-workers with (1) occlusion order, which identifies the occluder/occludee in each pair, and (2) depth order, which gives ordinal relations based on relative distance from the camera.
2 PAPERS • NO BENCHMARKS YET
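A sketch of how such pairwise occlusion/depth annotations might be represented in memory; the field names and value sets are illustrative assumptions, not InstaOrder's actual file format:

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class PairOrder:
    a: int          # instance index
    b: int          # instance index
    occlusion: str  # "a_occludes_b", "b_occludes_a", "none", or "bidirectional"
    depth: str      # "a_closer", "b_closer", or "equal"

def all_pairs_annotated(orders, num_instances):
    """Check whether every instance pair in a scene carries an annotation."""
    annotated = {(min(o.a, o.b), max(o.a, o.b)) for o in orders}
    return annotated == set(combinations(range(num_instances), 2))

orders = [PairOrder(0, 1, "a_occludes_b", "a_closer")]
print(all_pairs_annotated(orders, 2))  # -> True
```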
3D FRONT HUMAN is a dataset that extends the large-scale synthetic scene dataset 3D-FRONT by populating its 3D scenes with humans: non-contact humans (sequences of walking motion and standing humans) as well as contact humans (sitting, touching, and lying humans). 3D FRONT HUMAN contains four room types: 1) 5,689 bedrooms, 2) 2,987 living rooms, 3) 2,549 dining rooms, and 4) 679 libraries. It uses 21 object categories for the bedrooms, 24 for the living and dining rooms, and 25 for the libraries.
1 PAPER • NO BENCHMARKS YET