The UCY dataset consists of real pedestrian trajectories with rich multi-human interaction scenarios, captured at 2.5 Hz (Δt = 0.4 s). It is composed of three sequences (Zara01, Zara02, and UCY), captured in public spaces from a top view (a minimal loading sketch follows below).
158 PAPERS • 1 BENCHMARK
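A minimal sketch of reading UCY-style annotations, assuming the commonly used preprocessed text layout of one row per observation (frame, pedestrian id, x, y); the file name and column order are assumptions, not guaranteed by the dataset:

```python
from collections import defaultdict

import numpy as np

def load_trajectories(path="zara01.txt"):
    """Group annotation rows into per-pedestrian trajectories, sorted by frame.
    Consecutive annotated frames are 0.4 s apart (2.5 Hz)."""
    tracks = defaultdict(list)
    for frame, ped_id, x, y in np.loadtxt(path):  # assumed column order
        tracks[int(ped_id)].append((frame, x, y))
    return {pid: np.array(sorted(pts)) for pid, pts in tracks.items()}
```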
As autonomous driving systems mature, motion forecasting has received increasing attention as a critical requirement for planning. Of particular importance are interactive situations such as merges, unprotected turns, etc., where predicting individual object motion is not sufficient: joint predictions of multiple objects are required for effective route planning. There has been a critical need for high-quality motion data that is rich in both interactions and annotation to develop motion planning models. In this work, we introduce the most diverse interactive motion dataset to our knowledge, and provide specific labels for interacting objects suitable for developing joint prediction models. With over 100,000 scenes, each 20 seconds long at 10 Hz, our new dataset contains more than 570 hours of unique data over 1750 km of roadways. It was collected by mining for interesting interactions between vehicles, pedestrians, and cyclists across six cities within the United States. We use a high-accuracy 3D auto-labeling system to generate high-quality 3D bounding boxes for each road agent, and provide corresponding high-definition 3D maps for each scene. (A joint-prediction slicing sketch follows below.)
65 PAPERS • NO BENCHMARKS YET
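A minimal sketch of how a jointly annotated scene might be arranged for joint prediction, based only on the figures quoted above (20 s at 10 Hz = 200 steps); the array layout and the history/future split are assumptions for illustration:

```python
import numpy as np

NUM_STEPS = 200     # 20 s * 10 Hz
HISTORY_STEPS = 10  # e.g. 1 s of observed motion (an assumption)

def split_scene(scene: np.ndarray):
    """scene: [num_agents, NUM_STEPS, 2] (x, y) centroids for every agent.
    Returns the observed history and the future to be predicted jointly."""
    history = scene[:, :HISTORY_STEPS]  # what the model observes
    future = scene[:, HISTORY_STEPS:]   # joint prediction target for all agents
    return history, future
```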
The Argoverse 2 Motion Forecasting Dataset is a curated collection of 250,000 scenarios for training and validation. Each scenario is 11 seconds long and contains the 2D, bird's-eye-view centroid and heading of each tracked object sampled at 10 Hz (a per-timestep state sketch follows below).
14 PAPERS • NO BENCHMARKS YET
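A minimal sketch of the per-timestep state described above (2D bird's-eye-view centroid plus heading at 10 Hz, so an 11 s scenario has 110 steps); the field names are illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass

@dataclass
class TrackState:
    timestep: int  # 0..109 for an 11 s scenario sampled at 10 Hz
    x: float       # bird's-eye-view centroid (assumed metres)
    y: float
    heading: float # assumed radians

    @property
    def time_s(self) -> float:
        return self.timestep * 0.1  # 10 Hz sampling
```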
The Extreme Pose Interaction (ExPI) Dataset is a person-interaction dataset of Lindy Hop dancing actions. In Lindy Hop, the two dancers are called the leader and the follower. The authors recorded two couples of dancers in a multi-camera setup also equipped with a motion-capture system. 16 different actions are performed in the ExPI dataset, some by both couples of dancers and some by only one of the couples. Each action was repeated five times to account for variability. More precisely, for each recorded sequence, ExPI provides: (i) multi-view videos at 25 FPS from all the cameras in the recording setup; (ii) mocap data (3D positions of 18 joints for each person) at 25 FPS, synchronized with the videos; (iii) camera calibration information; and (iv) 3D shapes as textured meshes for each frame (a mocap-layout sketch follows below).
14 PAPERS • 2 BENCHMARKS
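A minimal sketch of the mocap layout implied above: two interacting dancers, 18 joints each, 3D positions at 25 FPS. The array layout is an assumption for illustration:

```python
import numpy as np

FPS = 25  # mocap and video rate from the description

def mocap_sequence(num_frames: int) -> np.ndarray:
    """Placeholder sequence with assumed layout [person, frame, joint, xyz]."""
    return np.zeros((2, num_frames, 18, 3))

def joint_velocities(seq: np.ndarray) -> np.ndarray:
    """Finite-difference joint velocities in units/s (frames are 1/25 s apart)."""
    return np.diff(seq, axis=1) * FPS
```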
The SIND dataset is based on 4K video captured by drones, providing information including traffic participant trajectories, traffic light status, and high-definition maps.
10 PAPERS • NO BENCHMARKS YET
The GTA Indoor Motion (GTA-IM) dataset emphasizes human-scene interactions in indoor environments. It consists of HD RGB-D image sequences of 3D human motion from a realistic game engine. The dataset has clean 3D human pose and camera pose annotations, and large diversity in human appearances, indoor environments, camera views, and human activities (a projection sketch combining the pose and camera annotations follows below).
9 PAPERS • 2 BENCHMARKS
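A minimal sketch of combining the dataset's 3D pose and camera annotations: projecting world-space joints into the image with a standard pinhole model. The intrinsics matrix K and the world-to-camera [R|t] convention are assumptions for illustration, not the dataset's documented format:

```python
import numpy as np

def project_joints(joints_world: np.ndarray, R: np.ndarray,
                   t: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project [J, 3] world-space joints to [J, 2] pixel coordinates."""
    cam = joints_world @ R.T + t     # world frame -> camera frame
    uvw = cam @ K.T                  # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide by depth
```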
NTU RGB+D 2D is a curated version of NTU RGB+D often used for skeleton-based action prediction and synthesis. It contains a smaller number of actions.
7 PAPERS • 1 BENCHMARK
A self-driving dataset for motion prediction, containing over 1,000 hours of data. This was collected by a fleet of 20 autonomous vehicles along a fixed route in Palo Alto, California, over a four-month period. It consists of 170,000 scenes, where each scene is 25 seconds long and captures the perception output of the self-driving system, which encodes the precise positions and motions of nearby vehicles, cyclists, and pedestrians over time.
4 PAPERS • NO BENCHMARKS YET
Covers 5 generic driving scenarios with a total of 25 distinct action classes. It contains more than 15K full-HD, 5-second-long videos acquired in various driving conditions, weather conditions, times of day, and environments, complemented with a common and realistic set of sensor measurements. This amounts to more than 2.25M frames, each annotated with an action label, corresponding to 600 samples per action class.
Consists of two pedestrian trajectory datasets, the CITR dataset and the DUT dataset, so that pedestrian motion models can be further calibrated and verified, especially when vehicle influence on pedestrians plays an important role.
3 PAPERS • NO BENCHMARKS YET
Occ-Traj120 is a trajectory dataset that contains occupancy representations of different local maps with associated trajectories. It contains 400 locally structured maps with occupancy representations and roughly 120K trajectories in total (a map/trajectory pairing sketch follows below).
2 PAPERS • NO BENCHMARKS YET
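A minimal sketch pairing an occupancy grid with one of its associated trajectories, as described above; the grid resolution and the 0/1 free/occupied encoding are assumptions for illustration:

```python
import numpy as np

def trajectory_is_collision_free(occ: np.ndarray, traj: np.ndarray,
                                 cell_size: float = 1.0) -> bool:
    """occ: [H, W] grid with 1 = occupied; traj: [T, 2] (x, y) positions."""
    cells = (traj / cell_size).astype(int)         # continuous -> grid cells
    return not occ[cells[:, 1], cells[:, 0]].any() # row = y, col = x
```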
This dataset is the result of a study created to assess driver behavior when following a lead vehicle. The driving simulator study used a simulated suburban environment to collect driver behavior data while following a lead vehicle through various unsignalized intersections. The driving environment had two lanes in each direction and a dedicated left-turn lane at the intersection. The experiment was deployed on a miniSim driving simulator. We programmed the lead vehicle to randomly turn left, turn right, or go straight through the intersections. In total there were 2 (traffic density) × 2 (speed level) × 3 (lead-vehicle maneuver) = 12 scenarios for each participant to be tested on. We split the data into train, validation, and test sets. The setup for the task is to observe 1 second of trajectories and predict the next 3, 5, and 8 seconds (a windowing sketch follows below).
1 PAPER • NO BENCHMARKS YET
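A minimal sketch of the observation/prediction setup quoted above: observe 1 s of trajectory, then predict the next 3, 5, and 8 s. The split comes from the description; the 10 Hz sampling rate is an assumption:

```python
import numpy as np

HZ = 10  # assumed sampling rate of the simulator logs

def make_windows(traj: np.ndarray):
    """traj: [T, 2] positions. Returns the 1 s observation window and the
    3 s / 5 s / 8 s prediction targets, all measured from t = 1 s."""
    obs = traj[: 1 * HZ]
    targets = {h: traj[1 * HZ : (1 + h) * HZ] for h in (3, 5, 8)}
    return obs, targets
```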
The Freiburg Street Crossing dataset consists of data collected from three different street crossings in Freiburg, Germany: two of which were traffic-light-regulated intersections and one a zebra crossing without traffic lights. The data can be used to train agents to cross roads autonomously.