The nuScenes dataset is a large-scale autonomous driving dataset with 3D bounding box annotations for 1,000 scenes collected in Boston and Singapore. Each scene is 20 seconds long and annotated at 2 Hz, yielding 28,130 training samples, 6,019 validation samples and 6,008 testing samples. The dataset provides the full autonomous vehicle sensor suite: a 32-beam LiDAR, 6 cameras and radars with complete 360° coverage. The 3D object detection challenge evaluates performance on 10 classes: cars, trucks, buses, trailers, construction vehicles, pedestrians, motorcycles, bicycles, traffic cones and barriers.
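A minimal loading sketch using the official nuscenes-devkit (assuming `pip install nuscenes-devkit` and the dataset extracted under a local dataroot; the path below is a placeholder):

```python
from nuscenes.nuscenes import NuScenes

# Load the trainval split metadata; '/data/nuscenes' is a placeholder path.
nusc = NuScenes(version='v1.0-trainval', dataroot='/data/nuscenes', verbose=True)

scene = nusc.scene[0]  # one of the 1000 scenes
sample = nusc.get('sample', scene['first_sample_token'])
while True:
    # Each sample is a 2 Hz keyframe; 'anns' lists its 3D box annotations.
    for ann_token in sample['anns']:
        ann = nusc.get('sample_annotation', ann_token)
        print(ann['category_name'], ann['translation'])  # e.g. 'vehicle.car', [x, y, z]
    if not sample['next']:
        break
    sample = nusc.get('sample', sample['next'])
```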
1,549 PAPERS • 20 BENCHMARKS
Argoverse is a motion forecasting benchmark with over 300K scenarios collected in Pittsburgh and Miami. Each scenario is a sequence of frames sampled at 10 Hz and contains one object of interest, called the "agent", whose future locations must be predicted over a 3-second horizon. The sequences are split into training, validation and test sets of 205,942, 39,472 and 78,143 sequences respectively, with no geographical overlap between the splits.
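Concretely, each forecasting sequence covers 5 seconds at 10 Hz: the first 2 seconds (20 frames) are observed and the remaining 3 seconds (30 frames) are predicted. A schematic of that split in plain NumPy (names and toy data are illustrative, not the Argoverse API):

```python
import numpy as np

SAMPLE_HZ = 10                     # sequences are sampled at 10 Hz
OBS_LEN = 2 * SAMPLE_HZ            # 20 observed frames (2 s)
PRED_LEN = 3 * SAMPLE_HZ           # 30 frames to forecast (3 s horizon)

# agent_xy: (50, 2) array of the agent's 2D positions over one 5 s sequence
agent_xy = np.cumsum(np.random.randn(OBS_LEN + PRED_LEN, 2), axis=0)  # toy data

observed = agent_xy[:OBS_LEN]      # model input
ground_truth = agent_xy[OBS_LEN:]  # 30-step target used for evaluation
assert ground_truth.shape == (PRED_LEN, 2)
```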
322 PAPERS • 6 BENCHMARKS
The UCY dataset consists of real pedestrian trajectories with rich multi-human interaction scenarios, captured at 2.5 Hz (Δt = 0.4 s). It is composed of three sequences (Zara01, Zara02, and UCY), taken from a top view in public spaces.
158 PAPERS • 1 BENCHMARK
The highD dataset is a dataset of naturalistic vehicle trajectories recorded on German highways. Using a drone, typical limitations of established traffic data collection methods, such as occlusions, are overcome by the aerial perspective. Traffic was recorded at six different locations and includes more than 110,500 vehicles. Each vehicle's trajectory, including vehicle type, size and manoeuvres, is automatically extracted. Using state-of-the-art computer vision algorithms, the positioning error is typically less than ten centimeters. Although the dataset was created for the safety validation of highly automated vehicles, it is also suitable for many other tasks, such as the analysis of traffic patterns or the parameterization of driver models.
93 PAPERS • NO BENCHMARKS YET
The INTERACTION dataset contains naturalistic motions of various traffic participants in a variety of highly interactive driving scenarios from different countries. The dataset can serve many behavior-related research areas, such as motion prediction, imitation learning, behavior analysis and modeling, and the development and verification of decision-making and planning algorithms.
71 PAPERS • 1 BENCHMARK
ApolloScape is a large dataset consisting of over 140,000 video frames (73 street scene videos) from various locations in China under varying weather conditions. Pixel-wise semantic annotation of the recorded data is provided in 2D, with point-wise semantic annotation in 3D for 28 classes. In addition, the dataset contains lane marking annotations in 2D.
66 PAPERS • 5 BENCHMARKS
As autonomous driving systems mature, motion forecasting has received increasing attention as a critical requirement for planning. Of particular importance are interactive situations such as merges, unprotected turns, etc., where predicting individual object motion is not sufficient. Joint predictions of multiple objects are required for effective route planning. There has been a critical need for high-quality motion data that is rich in both interactions and annotation to develop motion planning models. In this work, we introduce the most diverse interactive motion dataset to our knowledge, and provide specific labels for interacting objects suitable for developing joint prediction models. With over 100,000 scenes, each 20 seconds long at 10 Hz, our new dataset contains more than 570 hours of unique data over 1,750 km of roadways. It was collected by mining for interesting interactions between vehicles, pedestrians, and cyclists across six cities within the United States. We use a high-accuracy 3D auto-labeling system to generate high-quality 3D bounding boxes for each road agent, and provide corresponding high-definition 3D maps for each scene.
65 PAPERS • NO BENCHMARKS YET
ETH is a dataset for pedestrian detection. The testing set contains 1,804 images in three video clips. The dataset was captured from a stereo rig mounted on a car, with a resolution of 640 × 480 (Bayered) and a frame rate of 13–14 fps.
59 PAPERS • 5 BENCHMARKS
The inD dataset is a dataset of naturalistic vehicle trajectories recorded at German intersections. Using a drone, typical limitations of established traffic data collection methods, such as occlusions, are overcome. Traffic was recorded at four different locations. The trajectory of each road user and its type are extracted. Using state-of-the-art computer vision algorithms, the positional error is typically less than 10 centimetres. The dataset is applicable to many tasks, such as road user prediction, driver modeling, scenario-based safety validation of automated driving systems, or data-driven development of HAD system components.
39 PAPERS • NO BENCHMARKS YET
Honda Research Institute Driving Dataset (HDD) is a dataset to enable research on learning driver behavior in real-life environments. The dataset includes 104 hours of real human driving in the San Francisco Bay Area collected using an instrumented vehicle equipped with different sensors.
37 PAPERS • NO BENCHMARKS YET
JAAD is a dataset for studying joint attention in the context of autonomous driving. The focus is on pedestrian and driver behaviors at the point of crossing and the factors that influence them. To this end, the JAAD dataset provides a richly annotated collection of 346 short video clips (5–10 s long) extracted from over 240 hours of driving footage. These videos, filmed in several locations in North America and Eastern Europe, represent scenes typical of everyday urban driving in various weather conditions.
21 PAPERS • 2 BENCHMARKS
The Argoverse 2 Motion Forecasting Dataset is a curated collection of 250,000 scenarios for training and validation. Each scenario is 11 seconds long and contains the 2D bird's-eye-view centroid and heading of each tracked object, sampled at 10 Hz.
14 PAPERS • NO BENCHMARKS YET
The TrajNet Challenge represents a large multi-scenario forecasting benchmark. The challenge consists of predicting 3,161 human trajectories, observing for each trajectory 8 consecutive ground-truth values (3.2 seconds), i.e., t−7, t−6, …, t, in world-plane coordinates (the so-called world-plane Human-Human protocol), and forecasting the following 12 (4.8 seconds), i.e., t+1, …, t+12. The 8-12-value protocol is consistent with most trajectory forecasting approaches, which usually focus on the 5-dataset setup ETH-univ + ETH-hotel + UCY-zara01 + UCY-zara02 + UCY-univ. TrajNet substantially extends the 5-dataset scenario by diversifying the training data, thus stressing the flexibility and generalization an approach has to exhibit when it comes to unseen scenery/situations. In fact, TrajNet is a superset of diverse datasets that requires training on four families of trajectories, namely 1) BIWI Hotel (orthogonal bird's-eye flight view, moving people), 2) Crowds UCY (3 datasets, tilted bird's-eye view, moving people), 3) MOT PETS (multisensor surveillance view, moving people and vehicles), and 4) Stanford Drone Dataset (high orthogonal bird's-eye flight view; moving people, bicyclists and vehicles).
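The 8-in/12-out protocol amounts to a fixed slicing of each 20-point world-plane track; a minimal sketch in plain NumPy (not the official TrajNet tooling):

```python
import numpy as np

OBS_LEN, PRED_LEN = 8, 12  # 3.2 s observed, 4.8 s forecast, at 0.4 s steps

def split_track(track_xy):
    """Split one (20, 2) world-plane trajectory into the observation
    window t-7..t and the prediction window t+1..t+12."""
    assert track_xy.shape == (OBS_LEN + PRED_LEN, 2)
    return track_xy[:OBS_LEN], track_xy[OBS_LEN:]

track = np.cumsum(np.random.randn(OBS_LEN + PRED_LEN, 2) * 0.1, axis=0)  # toy track
obs, future = split_track(track)
```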
12 PAPERS • 2 BENCHMARKS
TITAN consists of 700 labeled video clips (with odometry) captured from a moving vehicle in highly interactive urban traffic scenes in Tokyo. The dataset includes 50 labels covering vehicle states and actions, pedestrian age groups, and targeted pedestrian action attributes, organized hierarchically into atomic, simple/complex-contextual, transportive, and communicative actions.
11 PAPERS • NO BENCHMARKS YET
The rounD dataset introduces a fresh compilation of natural road user trajectory data from German roundabouts, gathered using drone technology to navigate past usual challenges such as occlusions inherent in traditional traffic data collection methods. It includes traffic data from three unique locations, capturing the movement and categorizing each road user by type. Advanced computer vision algorithms are applied to ensure high positional accuracy. This dataset is highly adaptable for a variety of applications, including predicting road user behavior, driver modeling, scenario-based safety evaluations for automated driving systems, and the data-driven creation of Highly Automated Driving (HAD) system components.
The SIND dataset is based on 4K video captured by drones, providing information including traffic participant trajectories, traffic light status, and high-definition maps.
10 PAPERS • NO BENCHMARKS YET
BLVD is a large-scale 5D semantics dataset collected by the Visual Cognitive Computing and Intelligent Vehicles Lab. The dataset contains 654 high-resolution video clips comprising 120k frames captured in Changshu, Jiangsu Province, China, where the Intelligent Vehicle Proving Center of China (IVPCC) is located. The frame rate is 10 fps for both RGB data and 3D point clouds. The dataset contains fully annotated frames, yielding 249,129 3D annotations, 4,902 independent individuals for tracking (214,922 track points in total), 6,004 valid fragments for 5D interactive event recognition, and 4,900 individuals for 5D intention prediction. These tasks cover four kinds of scenarios depending on object density (low and high) and lighting conditions (daytime and nighttime).
9 PAPERS • NO BENCHMARKS YET
The GTA Indoor Motion dataset (GTA-IM) emphasizes human-scene interactions in indoor environments. It consists of HD RGB-D image sequences of 3D human motion from a realistic game engine. The dataset has clean 3D human pose and camera pose annotations, and large diversity in human appearances, indoor environments, camera views, and human activities.
9 PAPERS • 2 BENCHMARKS
Supports a new task: predicting the future locations of people observed in first-person videos.
7 PAPERS • NO BENCHMARKS YET
PIE is a dataset for studying pedestrian behavior in traffic. PIE contains over 6 hours of footage recorded in typical traffic scenes with an on-board camera. It also provides accurate vehicle information from an OBD sensor (vehicle speed, heading direction and GPS coordinates) synchronized with the video footage. Rich spatial and behavioral annotations are available for pedestrians and vehicles that potentially interact with the ego-vehicle, as well as for the relevant elements of infrastructure (traffic lights, signs and zebra crossings). There are over 300K labeled video frames with 1,842 pedestrian samples, making this the largest publicly available dataset for studying pedestrian behavior in traffic.
6 PAPERS • 2 BENCHMARKS
Honda Egocentric View-Intersection Dataset (HEV-I) was introduced to enable research on traffic participant interaction modelling, future object localization, and learning driver actions in challenging driving scenarios. The dataset includes 230 video clips of real human driving at different intersections in the San Francisco Bay Area, collected using an instrumented vehicle equipped with different sensors including cameras, GPS/IMU, and vehicle state signals.
5 PAPERS • 1 BENCHMARK
The UCLA Aerial Event Dataset was captured by a low-cost hex-rotor with a GoPro camera; the platform damps high-frequency camera vibration and can hover autonomously using GPS and a barometer. It flies 20–90 m above the ground and stays airborne for about 5 minutes.
5 PAPERS • NO BENCHMARKS YET
A dataset composed of 12 different 3D scenes and RGB sequences of 20 subjects moving in and interacting with the scenes.
4 PAPERS • 1 BENCHMARK
The exiD dataset introduces a groundbreaking collection of naturalistic road user trajectories at highway entries and exits in Germany, meticulously captured with drones to navigate past the limitations of conventional traffic data collection methods, such as occlusions. This approach not only allows for the precise extraction of each road user’s trajectory and type but also ensures very high positional accuracy, thanks to sophisticated computer vision algorithms. Its innovative data collection technique minimizes errors and maximizes the quality and reliability of the dataset, making it a valuable resource for advanced research and development in the field of automated driving technologies.
4 PAPERS • NO BENCHMARKS YET
An abnormal activity dataset for research use that contains 483,566 annotated frames.
3 PAPERS • NO BENCHMARKS YET
This dataset contains aircraft trajectories in an untowered terminal airspace, collected over 8 months around the Pittsburgh-Butler Regional Airport [ICAO: KBTP], a single-runway GA airport 10 miles north of Pittsburgh, Pennsylvania. The trajectory data is recorded using an on-site setup built around an Automatic Dependent Surveillance-Broadcast (ADS-B) receiver placed within the airport premises, which listens on both the 1090 MHz and 978 MHz frequencies. ADS-B uses satellite navigation to produce accurate locations and timestamps for the targets, which are recorded on-site using our custom setup. The data spans 18 Sept 2020 to 23 Apr 2021 and includes a total of 111 days, discounting downtime, repairs, and bad-weather days with no traffic; collection runs from 1:00 AM to 11:00 PM local time. Weather data during the recording period is also included.
3 PAPERS • 1 BENCHMARK
Our trajectory dataset consists of camera-based images, LiDAR scanned point clouds, and manually annotated trajectories. It is collected under various lighting conditions and traffic densities in Beijing, China. More specifically, it contains highly complicated traffic flows mixed with vehicles, riders, and pedestrians.
2 PAPERS • 1 BENCHMARK
From Schaub, Michael T., et al. "Random walks on simplicial complexes and the normalized hodge 1-laplacian." SIAM Review 62.2 (2020): 353-391.
2 PAPERS • NO BENCHMARKS YET
The SDD dataset contains a variety of indoor and outdoor scenes designed for Image Defocus Deblurring. There are 50 indoor and 65 outdoor scenes in the training set, and 11 indoor and 24 outdoor scenes in the testing set.
The Euro-PVI dataset contains trajectories of pedestrians and bicyclists in dense interaction with the ego-vehicle. The dataset was collected in Brussels and Leuven, Belgium. Its goal is to address the challenge of future trajectory prediction in urban environments with dense pedestrian (bicyclist)-vehicle interactions.
1 PAPER • 1 BENCHMARK
This dataset is the result of a study created to assess driver behavior when following a lead vehicle. The driving-simulator study used a simulated suburban environment for collecting driver behavior data while following a lead vehicle through various unsignalized intersections. The driving environment had two lanes in each direction and a dedicated left-turn lane at the intersection. The experiment was deployed on a miniSim driving simulator. We programmed the lead vehicle to randomly turn left, turn right, or go straight through the intersections. In total, we had 2 (traffic density) × 2 (speed level) × 3 (lead-vehicle maneuver) = 12 scenarios for each participant. We split the data into train, validation and test sets. The task setup is to observe 1 second of trajectories and predict the next 3, 5 and 8 seconds.
1 PAPER • NO BENCHMARKS YET
The NBA SportVU dataset contains player and ball trajectories for 631 games from the 2015-2016 NBA season. The raw tracking data is in the JSON format, and each moment includes information about the identities of the players on the court, the identities of the teams, the period, the game clock, and the shot clock.
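A hedged parsing sketch for one game file; the nested-list moment layout below follows the widely mirrored SportVU JSON dumps and should be verified against the copy in hand:

```python
import json

with open('game.json') as f:          # placeholder filename
    game = json.load(f)

for event in game['events']:
    for moment in event['moments']:
        # Assumed layout: [period, unix_ms, game_clock, shot_clock, None, positions]
        period, _, game_clock, shot_clock, _, positions = moment
        ball = positions[0]           # entries are [team_id, player_id, x, y, z]
        players = positions[1:]       # the ten on-court players
        print(period, game_clock, shot_clock, ball[2:4])
```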
The PREdiction of Clinical Outcomes from Genomic profiles (or PRECOG) encompasses 166 cancer expression data sets, including overall survival data for ~18,000 patients diagnosed with 39 distinct malignancies.
The uniD dataset is an innovative collection of naturalistic road user trajectories, captured within the RWTH Aachen University campus using drone technology to address common challenges such as occlusions found in traditional traffic data collection methods. It meticulously documents the movement and classifies each road user by type. Employing cutting-edge computer vision algorithms, the dataset ensures high positional accuracy. Its utility spans various applications, from predicting road user behavior and modeling driver actions to conducting scenario-based safety checks for automated driving systems and facilitating the data-driven design of Highly Automated Driving (HAD) system components.