Moments in Time is a large-scale dataset for recognizing and understanding action in videos. It contains a collection of one million labeled 3-second videos, involving people, animals, objects, or natural phenomena, each capturing the gist of a dynamic scene.
88 PAPERS • 2 BENCHMARKS
The UCSD Anomaly Detection Dataset was acquired with a stationary camera mounted at an elevation, overlooking pedestrian walkways. The crowd density in the walkways was variable, ranging from sparse to very crowded. In the normal setting, the video contains only pedestrians. Abnormal events are due to either (1) the circulation of non-pedestrian entities in the walkways or (2) anomalous pedestrian motion patterns. Commonly occurring anomalies include bikers, skaters, small carts, and people walking across a walkway or in the grass that surrounds it. A few instances of people in wheelchairs were also recorded. All abnormalities occurred naturally, i.e. they were not staged for the purposes of assembling the dataset. The data is split into two subsets, each corresponding to a different scene, and the footage recorded from each scene was divided into clips of around 200 frames.
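A minimal sketch of loading one UCSD clip into a frame array, assuming the common distribution layout in which each clip directory (e.g. UCSDped1/Train/Train001) holds one grayscale image per frame; the directory names and image extensions here are assumptions and may differ in your copy.

```python
import os
import numpy as np
from PIL import Image

def load_clip(clip_dir):
    """Stack the frames of a single ~200-frame clip in temporal order."""
    frame_files = sorted(
        f for f in os.listdir(clip_dir)
        if f.lower().endswith((".tif", ".png", ".jpg"))
    )
    frames = [np.array(Image.open(os.path.join(clip_dir, f)).convert("L"))
              for f in frame_files]
    return np.stack(frames)  # shape: (T, H, W), grayscale

clip = load_clip("UCSDped1/Train/Train001")  # hypothetical path
print(clip.shape)
```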
79 PAPERS • 3 BENCHMARKS
The UT-Kinect dataset is a dataset for action recognition from depth sequences. The videos were captured using a single stationary Kinect. There are 10 action types: walk, sit down, stand up, pick up, carry, throw, push, pull, wave hands, and clap hands. Ten subjects each perform every action twice. Three synchronized channels were recorded: RGB, depth, and skeleton joint locations. The frame rate is 30 frames per second.
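A minimal sketch of reading a UT-Kinect skeleton file into per-frame joint arrays. The assumed row format, a frame index followed by x, y, z coordinates for each of the 20 Kinect v1 joints, reflects common releases of the dataset; the file path is hypothetical and the layout should be verified against the actual files.

```python
import numpy as np

def load_skeleton(path, num_joints=20):
    """Parse a whitespace-delimited skeleton file into {frame_id: (num_joints, 3) array}."""
    frames = {}
    with open(path) as fh:
        for line in fh:
            values = line.split()
            if len(values) < 1 + 3 * num_joints:
                continue  # skip malformed or empty rows
            frame_id = int(float(values[0]))
            joints = np.array(values[1:1 + 3 * num_joints], dtype=float)
            frames[frame_id] = joints.reshape(num_joints, 3)  # (x, y, z) per joint
    return frames

skeleton = load_skeleton("joints/joints_s01_e01.txt")  # hypothetical path
```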
70 PAPERS • 2 BENCHMARKS
The UTD-MHAD dataset consists of 27 different actions performed by 8 subjects. Each subject repeated each action 4 times, resulting in 861 action sequences in total after removing corrupted recordings. RGB video, depth, skeleton joint positions, and inertial sensor signals were recorded.
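A minimal sketch of loading one UTD-MHAD sample across modalities with SciPy. The file naming convention (a&lt;action&gt;_s&lt;subject&gt;_t&lt;trial&gt;_&lt;modality&gt;.mat) and the variable names inside the .mat files ("d_skel", "d_iner", "d_depth") follow the dataset's public release, but treat them as assumptions and check them against your copy.

```python
from scipy.io import loadmat

def load_sample(root, action, subject, trial):
    """Load skeleton, inertial, and depth data for one action sequence."""
    stem = f"a{action}_s{subject}_t{trial}"
    skeleton = loadmat(f"{root}/Skeleton/{stem}_skeleton.mat")["d_skel"]  # expected (20, 3, T)
    inertial = loadmat(f"{root}/Inertial/{stem}_inertial.mat")["d_iner"]  # expected (N, 6): acc + gyro
    depth    = loadmat(f"{root}/Depth/{stem}_depth.mat")["d_depth"]       # expected (H, W, T)
    return skeleton, inertial, depth

skel, iner, depth = load_sample("UTD-MHAD", action=1, subject=1, trial=1)  # hypothetical root
```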
54 PAPERS • 2 BENCHMARKS
The DailyActivity3D dataset is a daily activity dataset captured by a Kinect device. There are 16 activity types: drink, eat, read book, call cellphone, write on a paper, use laptop, use vacuum cleaner, cheer up, sit still, toss paper, play game, lay down on sofa, walk, play guitar, stand up, and sit down. Where possible, each subject performs each activity in two different poses, sitting on the sofa and standing, for a total of 320 activity samples. The dataset is designed to cover daily activities in a living room. When a performer stands close to the sofa or sits on it, the 3D joint positions extracted by the skeleton tracker are very noisy; moreover, most of the activities involve human-object interactions, which makes this dataset more challenging.
44 PAPERS • 1 BENCHMARK
MMAct is a large-scale dataset for multi/cross-modal action understanding. It was recorded from 20 distinct subjects with seven different modalities: RGB video, keypoints, acceleration, gyroscope, orientation, Wi-Fi, and pressure signals. The dataset consists of more than 36k video clips covering 37 action classes, spanning a wide range of daily-life activities such as desktop-related and check-in-based actions, in four distinct scenarios.
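Working with this many modalities usually requires aligning sensor streams to video frame times. A minimal sketch of that alignment by linear interpolation follows; the timestamp arrays and sampling rates are purely illustrative, and MMAct's actual file formats and clock conventions must be checked separately.

```python
import numpy as np

def align_to_frames(sensor_t, sensor_values, frame_t):
    """Resample each sensor channel at the video frame timestamps."""
    sensor_values = np.asarray(sensor_values)  # shape (N, C)
    return np.stack(
        [np.interp(frame_t, sensor_t, sensor_values[:, c])
         for c in range(sensor_values.shape[1])],
        axis=1,  # -> shape (len(frame_t), C)
    )

# Illustrative example: a 100 Hz accelerometer aligned to a 30 fps clip.
acc_t = np.arange(0.0, 3.0, 0.01)              # 300 samples over 3 s
acc = np.random.randn(acc_t.size, 3)           # fake x/y/z readings
frame_t = np.arange(0.0, 3.0, 1.0 / 30.0)      # 90 video frames
acc_per_frame = align_to_frames(acc_t, acc, frame_t)  # (90, 3)
```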
18 PAPERS • 1 BENCHMARK
Home Action Genome is a large-scale multi-view video database of indoor daily activities. Every activity is captured by synchronized multi-view cameras, including an egocentric view. The dataset contains 30 hours of video with 70 classes of daily activities and 453 classes of atomic actions.
7 PAPERS • 2 BENCHMARKS