AVA is a project that provides audiovisual annotations of video for improving our understanding of human activity. Each of the video clips has been exhaustively annotated by human annotators, and together they represent a rich variety of scenes, recording conditions, and expressions of human activity. There are annotations for atomic visual actions (AVA Actions and AVA-Kinetics), active speaker detection, and speech activity.
94 PAPERS • 7 BENCHMARKS
A large-scale dataset of daily-living activities performed in a natural manner.
24 PAPERS • 2 BENCHMARKS
ROAD is designed to test an autonomous vehicle's ability to detect road events, defined as triplets composed of an active agent, the action(s) it performs, and the corresponding scene location(s). ROAD comprises videos originally from the Oxford RobotCar Dataset, annotated with bounding boxes showing the location in the image plane of each road event; a rough sketch of this triplet structure is shown after this entry.
19 PAPERS • NO BENCHMARKS YET
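As a rough illustration of the agent/action/location triplets described in the ROAD entry above, a single road event could be represented as below. This is an assumed, hypothetical layout for illustration only, not the official ROAD annotation schema; field names and coordinate conventions are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RoadEvent:
    """Hypothetical container for one ROAD-style road event (illustrative only)."""
    agent: str                  # the active agent, e.g. "Pedestrian" or "Car"
    actions: List[str]          # one or more actions the agent performs, e.g. ["Crossing"]
    locations: List[str]        # scene locations of the event, e.g. ["In vehicle lane"]
    # per-frame bounding boxes in the image plane: (frame_index, x1, y1, x2, y2)
    boxes: List[Tuple[int, float, float, float, float]]

# Example: a pedestrian crossing in front of the ego-vehicle over two frames.
event = RoadEvent(
    agent="Pedestrian",
    actions=["Crossing"],
    locations=["In vehicle lane"],
    boxes=[(120, 0.41, 0.52, 0.47, 0.73), (121, 0.42, 0.52, 0.48, 0.73)],
)
```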
Large-scale dataset for human activity recognition. Existing security datasets either focus on activity counts by aggregating public video disseminated because of its content, which typically excludes same-scene background video, or they achieve persistence by observing public areas and thus cannot control for activity content. This dataset provides over 9,300 hours of untrimmed, continuous video, scripted to include diverse, simultaneous activities along with spontaneous background activity.
12 PAPERS • NO BENCHMARKS YET
Toyota Smarthome Untrimmed (TSU) is a dataset for activity detection in long untrimmed videos. The dataset contains 536 videos with an average duration of 21 minutes. Since this dataset is based on the same footage as the Toyota Smarthome Trimmed version, it shares the same challenges and introduces additional ones. The dataset is annotated with 51 activities.
10 PAPERS • 1 BENCHMARK
Contains densely labeled speech activity in YouTube videos, with the goal of creating a shared, publicly available dataset for this task.
9 PAPERS • 1 BENCHMARK
Home Action Genome is a large-scale multi-view video database of indoor daily activities. Every activity is captured by synchronized multi-view cameras, including an egocentric view. There are 30 hours of videos with 70 classes of daily activities and 453 classes of atomic actions.
7 PAPERS • 2 BENCHMARKS
The MLB-YouTube dataset is a new, large-scale dataset consisting of 20 baseball games from the 2017 MLB post-season, available on YouTube, with over 42 hours of video footage. The dataset consists of two components: segmented videos for activity recognition and continuous videos for activity detection. It is quite challenging, as it is created from TV broadcasts of baseball games in which multiple different activities share the same camera angle. Further, the motion/appearance difference between the various activities is quite small.
5 PAPERS • NO BENCHMARKS YET
An abnormal activity dataset for research use that contains 483,566 annotated frames.
3 PAPERS • NO BENCHMARKS YET
40,764 images (11,659 protest images and hard negatives) with various annotations of visual attributes and sentiments.
2 PAPERS • NO BENCHMARKS YET
The DAHLIA dataset is devoted to human activity recognition, a key problem for adapting smart-home services such as user assistance. DAHLIA was recorded on the Mobile Mii Platform by CEA LIST and was partly supported by the ITEA 3 EmoSpaces project (https://itea3.org/project/emospaces.html).
0 PAPER • NO BENCHMARKS YET
InfiniteRep is a synthetic, open-source dataset for fitness and physical therapy (PT) applications. It includes 1k videos of diverse avatars performing multiple repetitions of common exercises. It includes significant variation in the environment, lighting conditions, avatar demographics, and movement trajectories. From cadence to kinematic trajectory, each rep is performed slightly differently, just as real humans do. InfiniteRep videos are accompanied by a rich set of pixel-perfect labels and annotations, including frame-specific repetition counts.
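To make the frame-specific repetition counts mentioned above concrete, here is a minimal Python sketch of reading such a label for a given frame. The file layout and field names (e.g. `rep_count_per_frame`) are assumptions for illustration, not InfiniteRep's published annotation format.

```python
import json

def reps_completed_at(annotation_path: str, frame_idx: int) -> int:
    """Return the number of repetitions completed by a given frame (illustrative only)."""
    with open(annotation_path) as f:
        ann = json.load(f)
    # Assumed structure: one cumulative rep count per video frame,
    # e.g. {"rep_count_per_frame": [0, 0, 1, 1, 2, ...]}
    counts = ann["rep_count_per_frame"]
    # Clamp to the last frame so out-of-range queries return the final count.
    return counts[min(frame_idx, len(counts) - 1)]

# Usage sketch: how many reps has the avatar completed by frame 300?
# print(reps_completed_at("video_0001_labels.json", 300))
```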