GazeFollow is a large-scale dataset annotated with the locations of where people in images are looking. It draws on several major datasets that contain people as a source of images: 1,548 images from SUN, 33,790 images from MS COCO, 9,135 images from Actions 40, 7,791 images from PASCAL, 508 images from the ImageNet detection challenge, and 198,097 images from the Places dataset. This combination results in a large and challenging image collection of people performing diverse activities in many everyday scenarios.
34 PAPERS • NO BENCHMARKS YET
This dataset includes time-series data generated by accelerometer and gyroscope sensors (attitude, gravity, userAcceleration, and rotationRate). It was collected with an iPhone 6s kept in the participant's front pocket, using SensingKit, which gathers information from the Core Motion framework on iOS devices. All data was recorded at a 50 Hz sample rate. A total of 24 participants spanning a range of genders, ages, weights, and heights performed 6 activities in 15 trials under the same environment and conditions: downstairs, upstairs, walking, jogging, sitting, and standing. A minimal loading sketch is given after this entry.
29 PAPERS • NO BENCHMARKS YET
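For orientation, here is a minimal sketch of how recordings with the channels described above might be organized and cut into fixed-length windows for HAR. The channel names follow Core Motion conventions, but the file layout, window length, and overlap are assumptions for illustration, not the dataset's actual format.

```python
# Minimal sketch (assumed channel layout and windowing parameters,
# not the dataset's actual file format).
import numpy as np
import pandas as pd

FS = 50  # sampling rate in Hz, as stated in the dataset description

# The 12 Core Motion channels described above: attitude (roll/pitch/yaw),
# gravity, userAcceleration, and rotationRate (each triaxial).
CHANNELS = (
    ["attitude.roll", "attitude.pitch", "attitude.yaw"]
    + [f"gravity.{a}" for a in "xyz"]
    + [f"userAcceleration.{a}" for a in "xyz"]
    + [f"rotationRate.{a}" for a in "xyz"]
)

# Stand-in for one 10-second trial: random values in place of real readings.
trial = pd.DataFrame(np.random.randn(10 * FS, len(CHANNELS)), columns=CHANNELS)

def sliding_windows(df: pd.DataFrame, win_s: float = 2.56, overlap: float = 0.5):
    """Cut a trial into fixed-length overlapping windows, a common HAR preprocessing step."""
    win = int(win_s * FS)
    step = int(win * (1.0 - overlap))
    values = df.to_numpy()
    return np.stack([values[i:i + win] for i in range(0, len(values) - win + 1, step)])

windows = sliding_windows(trial)
print(windows.shape)  # e.g. (6, 128, 12) for this 10 s stand-in trial
```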
The Drive&Act dataset is a state-of-the-art multi-modal benchmark for driver behavior recognition. The dataset includes 3D skeletons in addition to frame-wise hierarchical labels for 9.6 million frames captured from 6 different views in 3 modalities (RGB, IR, and depth).
18 PAPERS • 1 BENCHMARK
A database with 2,000 videos captured by surveillance cameras in real-world scenes.
14 PAPERS • 1 BENCHMARK
First-Person Hand Action Benchmark is a collection of RGB-D video sequences comprising more than 100K frames of 45 daily hand action categories, involving 26 different objects in several hand configurations.
13 PAPERS • 2 BENCHMARKS
Large-scale dataset for human activity recognition. Existing security datasets either focus on activity counts by aggregating public video disseminated because of its content, which typically excludes same-scene background video, or they achieve persistence by observing public areas and thus cannot control for activity content. This dataset comprises over 9,300 hours of untrimmed, continuous video, scripted to include diverse, simultaneous activities along with spontaneous background activity.
12 PAPERS • NO BENCHMARKS YET
Contains 60 female and 30 male actors performing a collection of 20 predefined everyday actions and sports movements, and one self-chosen movement.
10 PAPERS • 1 BENCHMARK
Home Action Genome is a large-scale multi-view video database of indoor daily activities. Every activity is captured by synchronized multi-view cameras, including an egocentric view. There are 30 hours of video with 70 classes of daily activities and 453 classes of atomic actions.
7 PAPERS • 2 BENCHMARKS
This is a 3D action recognition dataset, also known as 3D Action Pairs dataset. The actions in this dataset are selected in pairs such that the two actions of each pair are similar in motion (have similar trajectories) and shape (have similar objects); however, the motion-shape relation is different.
7 PAPERS • 1 BENCHMARK
Includes egocentric videos containing hands in the wild.
6 PAPERS • NO BENCHMARKS YET
The MISAW dataset is composed of 27 sequences of micro-surgical anastomosis on artificial blood vessels performed by 3 surgeons and 3 engineering students. The dataset contains video, kinematic data, and procedural descriptions synchronized at 30 Hz. The procedural descriptions cover the phases, steps, and activities performed by the participants.
OPERAnet is a multimodal activity recognition dataset acquired from radio-frequency and vision-based sensors. Approximately 8 hours of annotated measurements are provided, collected across two different rooms from 6 participants performing 6 activities: sitting down on a chair, standing up from sitting, lying down on the ground, standing up from the floor, walking, and body rotating. The dataset was acquired from four synchronized modalities for the purpose of passive Human Activity Recognition (HAR) as well as localization and crowd counting.
Augments the video-description dataset TACoS with short, single-sentence descriptions.
A dataset which provides detailed annotations for activity recognition.
5 PAPERS • 1 BENCHMARK
Simitate is a hybrid benchmarking suite targeting the evaluation of approaches to imitation learning. It consists of 1,938 sequences in which humans perform daily activities in a realistic environment, and is strongly coupled with an integration into a simulator. RGB and depth streams at a resolution of 960×540 at 30 Hz are provided, together with accurate 6-DOF ground-truth poses for the demonstrator's hand and the manipulated object at 120 Hz (see the alignment sketch after this entry). A 3D model of the environment and labelled object images are also provided along with the dataset.
5 PAPERS • NO BENCHMARKS YET
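Because the Simitate pose stream (120 Hz) runs faster than the RGB-D stream (30 Hz), a typical preprocessing step is to attach the nearest pose sample to each frame. Below is a minimal sketch of that alignment; the timestamp arrays and 6-DOF pose layout are assumptions, not the dataset's actual file format.

```python
# Nearest-timestamp alignment sketch (assumed timestamps and pose layout;
# the real Simitate files and field names may differ).
import numpy as np

frame_ts = np.arange(0.0, 10.0, 1 / 30.0)   # 30 Hz RGB-D frame timestamps (s)
pose_ts = np.arange(0.0, 10.0, 1 / 120.0)   # 120 Hz 6-DOF pose timestamps (s)
poses = np.random.randn(len(pose_ts), 6)     # stand-in for (x, y, z, roll, pitch, yaw)

# For each frame, pick the pose sample with the closest timestamp.
idx = np.searchsorted(pose_ts, frame_ts)
idx = np.clip(idx, 1, len(pose_ts) - 1)
prev_closer = (frame_ts - pose_ts[idx - 1]) < (pose_ts[idx] - frame_ts)
idx = idx - prev_closer.astype(int)

poses_per_frame = poses[idx]                 # one 6-DOF pose per RGB-D frame
print(poses_per_frame.shape)                 # (n_frames, 6)
```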
The Sims4Action Dataset: a videogame-based dataset for Synthetic→Real domain adaptation for human activity recognition.
Multimodal Dyadic Behavior (MMDB) dataset is a unique collection of multimodal (video, audio, and physiological) recordings of the social and communicative behavior of toddlers. The MMDB contains 160 sessions of 3-5 minute semi-structured play interactions between a trained adult examiner and a child between the ages of 15 and 30 months. The MMDB dataset supports a novel problem domain for activity recognition: decoding dyadic social interactions between adults and children in a developmental context.
3 PAPERS • NO BENCHMARKS YET
The PETRAW dataset is composed of 150 sequences of peg transfer training sessions. The objective of a peg transfer session is to transfer 6 blocks from the left to the right and back. Each block must be extracted from a peg with one hand, transferred to the other hand, and inserted onto a peg on the other side of the board. All cases were acquired from a non-medical expert at the LTSI Laboratory of the University of Rennes. The dataset is divided into a training set of 90 cases and a test set of 60 cases. Each case is composed of kinematic data, a video, a semantic segmentation of each frame, and a workflow annotation.
3 PAPERS • 6 BENCHMARKS
Includes 11,771 samples of both human activities and falls performed by 30 subjects aged 18 to 60 years. Samples are divided into 17 fine-grained classes grouped into two coarse-grained classes: one containing samples of 9 types of activities of daily living (ADL) and the other containing samples of 8 types of falls. The dataset is stored with all the information needed to select samples according to different criteria, such as the type of ADL, age, and gender.
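A minimal sketch of the fine-to-coarse grouping described above is given below; the class names are placeholders for illustration, not the dataset's actual labels.

```python
# Fine-to-coarse label mapping sketch (placeholder class names; the
# dataset's real labels differ, only the 9 ADL / 8 fall split is assumed).
from typing import Dict

FINE_TO_COARSE: Dict[str, str] = {
    **{f"adl_{i}": "ADL" for i in range(1, 10)},   # 9 ADL classes
    **{f"fall_{i}": "Fall" for i in range(1, 9)},  # 8 fall classes
}

def coarse_label(fine: str) -> str:
    """Map a fine-grained class to its coarse-grained group (ADL or Fall)."""
    return FINE_TO_COARSE[fine]

assert len(FINE_TO_COARSE) == 17   # 17 fine-grained classes in total
print(coarse_label("fall_3"))      # -> "Fall"
```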
Contains annotations of human activities with different sub-actions, e.g., the activity Ping-Pong with four sub-actions: pickup-ball, hit, bounce-ball, and serve.
2 PAPERS • NO BENCHMARKS YET
EgoHOS is a labeled dataset consisting of 11,243 egocentric images with per-pixel segmentation labels of hands and the objects being interacted with during a diverse array of daily activities. The data are collected from multiple sources: 7,458 frames from Ego4D, 2,212 frames from EPIC-KITCHEN, 806 frames from THU-READ, and 350 frames from newly collected egocentric videos of people playing Escape Room. The dataset is designed for tasks including hand state classification, video activity recognition, 3D mesh reconstruction of hand-object interactions, and video inpainting of hand-object foregrounds in egocentric videos.
A multi-sensor, multi-modal dataset, implemented to benchmark Human Activity Recognition (HAR) and multi-modal fusion algorithms. Collection of this dataset was inspired by the need for recognising and evaluating the quality of exercise performance to support patients with Musculoskeletal Disorders (MSD).
CLAD (Complex and Long Activities Dataset) is an activity dataset which exhibits real-life and diverse scenarios of complex, temporally-extended human activities and actions. The dataset consists of a set of videos of actors performing everyday activities in a natural and unscripted manner. It was recorded using a static Kinect 2 sensor, which is commonly used on many robotic platforms. The dataset comprises RGB-D images, point-cloud data, and automatically generated skeleton tracks, in addition to crowdsourced annotations.
1 PAPER • NO BENCHMARKS YET
HASCD (Human Activity Segmentation Challenge Dataset) contains 250 annotated multivariate time series capturing 10.7 h of real-world human motion smartphone sensor data from 15 bachelor computer science students. The recordings capture 6 distinct human motion sequences designed to represent pervasive behaviour in realistic indoor and outdoor settings. The data set serves as a benchmark for evaluating machine learning workflows.
INDRA is a dataset of videos of Indian roads captured from the pedestrian's point of view. INDRA contains 104 videos comprising 26k 1080p frames, each annotated with a binary road-crossing safety label and vehicle bounding boxes.
MOSAD (Mobile Sensing Human Activity Data Set) is a multi-modal, annotated time series (TS) data set that contains 14 recordings of 9 triaxial smartphone sensor measurements (126 TS) from 6 human subjects performing (in part) 3 motion sequences in different locations. The aim of the data set is to facilitate the study of human behaviour and the design of TS data mining technology to separate individual activities using low-cost sensors in wearable devices.
MPHOI-72 is a multi-person human-object interaction dataset that can be used for a wide variety of HOI/activity recognition and pose estimation/object tracking tasks. The dataset is challenging due to the many body occlusions among the humans and objects. It consists of 72 videos captured from 3 different angles at 30 fps, with a total of 26,383 frames and an average length of 12 seconds. It involves 5 humans performing in pairs, 6 object types, 3 activities, and 13 sub-activities. The dataset includes color video, depth video, human skeletons, and human and object bounding boxes.
The DAHLIA dataset is devoted to human activity recognition, which is a major issue for adapting smart-home services such as user assistance. DAHLIA was realized on the Mobile Mii Platform by CEA LIST and was partly supported by the ITEA 3 Emospaces Project (https://itea3.org/project/emospaces.html).
0 PAPER • NO BENCHMARKS YET
InfiniteRep is a synthetic, open-source dataset for fitness and physical therapy (PT) applications. It includes 1k videos of diverse avatars performing multiple repetitions of common exercises. It includes significant variation in the environment, lighting conditions, avatar demographics, and movement trajectories. From cadence to kinematic trajectory, each rep is done slightly differently -- just like real humans. InfiniteRep videos are accompanied by a rich set of pixel-perfect labels and annotations, including frame-specific repetition counts.