CARLA (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation).
1,055 PAPERS • 3 BENCHMARKS
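CARLA exposes these sensors through a Python client API. The following is a minimal sketch, assuming a CARLA 0.9.x server running on localhost:2000 and blueprint names from that release's blueprint library; it is an illustration, not the definitive workflow:

```python
# Minimal sketch: spawn a vehicle and attach an RGB camera in CARLA (assumes
# a 0.9.x server on localhost:2000 and the "carla" Python package installed).
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()
blueprints = world.get_blueprint_library()

# Spawn a vehicle at the first recommended spawn point.
vehicle_bp = blueprints.filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)

# Attach an RGB camera at a customized position relative to the vehicle.
camera_bp = blueprints.find("sensor.camera.rgb")
camera_bp.set_attribute("image_size_x", "800")
camera_bp.set_attribute("image_size_y", "600")
camera_transform = carla.Transform(carla.Location(x=1.5, z=2.4))
camera = world.spawn_actor(camera_bp, camera_transform, attach_to=vehicle)

# Save every received frame; agent measurements (location and orientation)
# are available from the vehicle's transform.
camera.listen(lambda image: image.save_to_disk("out/%06d.png" % image.frame))
print(vehicle.get_transform())
```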
AirSim is a simulator for drones, cars and more, built on Unreal Engine. It is open-source, cross-platform, and supports software-in-the-loop simulation with popular flight controllers such as PX4 and ArduPilot, as well as hardware-in-the-loop with PX4, for physically and visually realistic simulations. It is developed as an Unreal plugin that can simply be dropped into any Unreal environment; an experimental Unity plugin also exists.
240 PAPERS • NO BENCHMARKS YET
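As a sketch of how the plugin is typically driven from Python, assuming the "airsim" pip client and a running Unreal environment in Car simulation mode (this uses the simple API, not PX4 hardware-in-the-loop):

```python
# Minimal sketch: drive a simulated car and grab a front RGB image via the
# AirSim Python client (assumes the plugin's RPC server is running locally).
import airsim

client = airsim.CarClient()
client.confirmConnection()
client.enableApiControl(True)

# Drive forward for a moment.
controls = airsim.CarControls()
controls.throttle = 0.5
client.setCarControls(controls)

# Retrieve vehicle state and a compressed front-camera RGB image.
state = client.getCarState()
print("speed:", state.speed)

responses = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.Scene, False, True)
])
with open("front_rgb.png", "wb") as f:
    f.write(responses[0].image_data_uint8)
```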
The INTERACTION dataset contains naturalistic motions of various traffic participants in a variety of highly interactive driving scenarios from different countries. The dataset can serve many behavior-related research areas.
71 PAPERS • 1 BENCHMARK
The Talk2Car dataset sits at the intersection of several research domains, promoting the development of cross-disciplinary solutions for improving the state of the art in grounding natural language into visual space. The annotations were gathered with the following aspects in mind: free-form, high-quality natural language commands that stimulate the development of solutions able to operate in the wild, and a realistic task setting. Specifically, the authors consider an autonomous driving setting in which a passenger can control the actions of an Autonomous Vehicle by giving commands in natural language. The Talk2Car dataset was built on top of the nuScenes dataset to include an extensive suite of sensor modalities, i.e. semantic maps, GPS, LIDAR, RADAR and 360-degree RGB images annotated with 3D bounding boxes. This variety of input modalities sets the object referral task on the Talk2Car dataset apart from related challenges, where additional sensor modalities are generally missing.
34 PAPERS • 1 BENCHMARK
RadarScenes is a real-world radar point cloud dataset for automotive applications.
23 PAPERS • NO BENCHMARKS YET
ROAD is designed to test an autonomous vehicle's ability to detect road events, defined as triplets composed of an active agent, the action(s) it performs and the corresponding scene locations. ROAD comprises videos originally from the Oxford RobotCar Dataset, annotated with bounding boxes showing the location in the image plane of each road event.
19 PAPERS • NO BENCHMARKS YET
The Drive&Act dataset is a state-of-the-art multimodal benchmark for driver behavior recognition. The dataset includes 3D skeletons in addition to frame-wise hierarchical labels for 9.6 million frames captured from 6 different views and 3 modalities (RGB, IR and depth).
18 PAPERS • 1 BENCHMARK
ApolloCar3D is a dataset that contains 5,277 driving images and over 60K car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model size and semantically labelled keypoints. The dataset is more than 20 times larger than PASCAL3D+ and KITTI, the current state of the art.
17 PAPERS • 14 BENCHMARKS
The Argoverse 2 Motion Forecasting Dataset is a curated collection of 250,000 scenarios for training and validation. Each scenario is 11 seconds long and contains the 2D bird's-eye-view centroid and heading of each tracked object, sampled at 10 Hz.
14 PAPERS • NO BENCHMARKS YET
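The scenario layout described above (11-second scenarios sampled at 10 Hz, with a 2D bird's-eye-view centroid and heading per tracked object) can be pictured with a schematic structure like the one below. This is an illustrative sketch, not the official av2 API; the class and field names are assumptions:

```python
# Schematic track structure implied by the description: 11 s at 10 Hz gives
# 110 timesteps, each with a BEV centroid (x, y) and a heading per object.
from dataclasses import dataclass
from typing import List
import math

SAMPLE_RATE_HZ = 10
SCENARIO_SECONDS = 11
NUM_TIMESTEPS = SAMPLE_RATE_HZ * SCENARIO_SECONDS  # 110

@dataclass
class TrackState:
    x: float        # BEV centroid, metres
    y: float
    heading: float  # radians

@dataclass
class Track:
    track_id: str
    states: List[TrackState]  # up to NUM_TIMESTEPS entries

def average_speed(track: Track) -> float:
    """Finite-difference speed estimate (m/s) from consecutive centroids."""
    dt = 1.0 / SAMPLE_RATE_HZ
    dists = [
        math.hypot(b.x - a.x, b.y - a.y)
        for a, b in zip(track.states, track.states[1:])
    ]
    return sum(dists) / (dt * len(dists)) if dists else 0.0
```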
DADA-2000 is a large-scale benchmark of 2,000 video sequences with detailed annotations of driver attention (fixation, saccade, focusing time), accident objects/intervals, and accident categories; thorough evaluations of state-of-the-art methods are also provided.
13 PAPERS • NO BENCHMARKS YET
This dataset consists of over 50,000 frames and includes high-definition images with full-resolution depth information, semantic segmentation (images), point-wise segmentation (point clouds), and detailed annotations for all vehicles and people.
TITAN consists of 700 labeled video-clips (with odometry) captured from a moving vehicle on highly interactive urban traffic scenes in Tokyo. The dataset includes 50 labels including vehicle states and actions, pedestrian age groups, and targeted pedestrian action attributes that are organized hierarchically corresponding to atomic, simple/complex-contextual, transportive, and communicative actions.
11 PAPERS • NO BENCHMARKS YET
comma2k19 is a dataset of over 33 hours of commuting on California's Highway 280: 2019 segments, each 1 minute long, covering a 20 km section of highway between San Jose and San Francisco. The dataset was collected using comma EONs, which have sensors similar to those of any modern smartphone, including a road-facing camera, phone GPS, thermometers and a 9-axis IMU.
10 PAPERS • NO BENCHMARKS YET
SynthCity is a synthetic, full-colour Mobile Laser Scanning point cloud containing 367.9 million points. Every point is assigned a label from one of nine categories.
8 PAPERS • NO BENCHMARKS YET
A challenging multi-agent seasonal dataset collected by a fleet of Ford autonomous vehicles on different days and at different times during 2017-18.
7 PAPERS • NO BENCHMARKS YET
The EuroCity Persons dataset provides a large number of highly diverse, accurate and detailed annotations of pedestrians, cyclists and other riders in urban traffic scenes. The images for this dataset were collected on-board a moving vehicle in 31 cities of 12 European countries. With over 238,200 person instances manually labeled in over 47,300 images, EuroCity Persons is nearly one order of magnitude larger than person datasets used previously for benchmarking. The dataset furthermore contains a large number of person orientation annotations (over 211,200).
6 PAPERS • NO BENCHMARKS YET
This annotated dataset was released to enable dynamic scene classification and includes 80 hours of diverse, high-quality driving video clips collected in the San Francisco Bay Area. The dataset includes temporal annotations for road places, road types, weather, and road surface conditions.
4 PAPERS • NO BENCHMARKS YET
A self-driving dataset for motion prediction, containing over 1,000 hours of data. It was collected by a fleet of 20 autonomous vehicles along a fixed route in Palo Alto, California, over a four-month period. It consists of 170,000 scenes, each 25 seconds long, capturing the perception output of the self-driving system, which encodes the precise positions and motions of nearby vehicles, cyclists, and pedestrians over time.
This collection consists of two pedestrian trajectory datasets, the CITR dataset and the DUT dataset, so that pedestrian motion models can be further calibrated and verified, especially when vehicle influence on pedestrians plays an important role.
3 PAPERS • NO BENCHMARKS YET
EyeCar is a dataset of driving videos of vehicles involved in rear-end collisions, paired with eye-fixation data captured from human subjects. It contains 21 front-view videos captured in various traffic, weather, and daylight conditions. Each video is 30 seconds long and contains typical driving tasks (e.g., lane keeping, merging in, and braking) ending in rear-end collisions.
2 PAPERS • NO BENCHMARKS YET
The TCG dataset is used to evaluate traffic control gesture recognition for autonomous driving. It uses 3D body skeletons as input to perform traffic control gesture classification at every time step. The dataset consists of 250 sequences from several actors, ranging from 16 to 90 seconds per sequence.
2 PAPERS • 1 BENCHMARK
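A minimal sketch of the per-timestep classification setup the dataset targets is shown below; the joint count, class count, frame rate and GRU model are illustrative assumptions rather than the dataset's reference baseline:

```python
# Framewise gesture classification from 3D skeleton sequences (illustrative
# sketch; joint count, class count and architecture are assumptions).
import torch
import torch.nn as nn

NUM_JOINTS = 17   # assumed skeleton layout
NUM_CLASSES = 4   # assumed number of gesture classes

class FramewiseGestureClassifier(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.gru = nn.GRU(NUM_JOINTS * 3, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, skeletons: torch.Tensor) -> torch.Tensor:
        # skeletons: (batch, time, joints, 3) -> logits: (batch, time, classes)
        b, t, j, c = skeletons.shape
        features, _ = self.gru(skeletons.reshape(b, t, j * c))
        return self.head(features)

# A 90-second sequence at an assumed 30 Hz would give 2700 timesteps.
logits = FramewiseGestureClassifier()(torch.randn(1, 2700, NUM_JOINTS, 3))
print(logits.shape)  # torch.Size([1, 2700, 4])
```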
TuSimple Lane is an extension of the TuSimple dataset with 14,336 lane boundary annotations. Each lane boundary in the dataset is annotated with one of 7 classes, such as “Single Dashed”, “Double Dashed” or “Single White Continuous”.
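A hedged sketch of tallying those boundary classes from a JSON-lines annotation file follows. The base TuSimple format stores per-lane x-coordinates at fixed "h_samples" rows, but the "classes" key used here for the extension's per-boundary labels is an assumption, not a documented field:

```python
# Count boundary-class frequencies in a JSON-lines annotation file
# (the "classes" field name is an assumed, hypothetical key).
import json
from collections import Counter

def count_boundary_classes(path: str) -> Counter:
    counts = Counter()
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            # e.g. record["classes"] == ["Single Dashed", "Single White Continuous", ...]
            counts.update(record.get("classes", []))
    return counts

if __name__ == "__main__":
    print(count_boundary_classes("label_data.json"))
```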
The CITR dataset consists of experimentally designed fundamental VCI (vehicle-crowd interaction) scenarios (front, back, and lateral VCIs) and provides a unique ID for each pedestrian, making it suitable for exploring specific aspects of VCI. The DUT dataset provides two ordinary, natural VCI scenarios on a crowded university campus, which can be used for more general-purpose VCI exploration.
1 PAPER • NO BENCHMARKS YET
The PREdiction of Clinical Outcomes from Genomic profiles (or PRECOG) encompasses 166 cancer expression data sets, including overall survival data for ~18,000 patients diagnosed with 39 distinct malignancies.
Panoramic Video Panoptic Segmentation Dataset is a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving. The dataset has labels for 28 semantic categories and 2,860 temporal sequences that were captured by five cameras mounted on autonomous vehicles driving in three different geographical locations, leading to a total of 100k labeled camera images.
This dataset is a collection of 4,000 images of cars in multiple scenes, ready to use for optimizing the accuracy of computer vision models. All of the content is sourced from PIXTA's stock library of over 100 million Asian-featured images and videos. PIXTA is the largest platform of visual materials in the Asia-Pacific region, offering fully managed services, high-quality content and data, and powerful tools for businesses and organizations to enable their creative and machine learning projects. For more details, please refer to https://www.pixta.ai/ or send inquiries to contact@pixta.ai.
0 PAPER • NO BENCHMARKS YET
IN2LAAMA is a set of lidar-inertial datasets collected with a Velodyne VLP-16 lidar and a Xsens MTi-3 IMU.