Aff-Wild is a dataset for emotion recognition from facial images captured under a variety of head poses, illumination conditions, and occlusions.
111 PAPERS • NO BENCHMARKS YET
This dataset comprises 11 hand gesture categories performed by 29 subjects under 3 illumination conditions.
77 PAPERS • 5 BENCHMARKS
A large dataset of human hand images (dorsal and palmar sides) with detailed ground-truth information for gender recognition and biometric identification.
76 PAPERS • NO BENCHMARKS YET
The Microsoft Research Cambridge-12 Kinect gesture data set consists of sequences of human movements, represented as body-part locations, and the associated gesture to be recognized by the system. The data set includes 594 sequences and 719,359 frames (approximately six hours and 40 minutes) collected from 30 people performing 12 gestures. In total, there are 6,244 gesture instances. The motion files contain tracks of 20 joints estimated using the Kinect Pose Estimation pipeline. The body poses are captured at a sample rate of 30 Hz with an accuracy of about two centimeters in joint positions.
31 PAPERS • 2 BENCHMARKS
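As a rough illustration of the data layout described above, the sketch below represents one MSRC-12 sequence as a (frames, joints, xyz) array sampled at 30 Hz. The file name, loader, and per-row layout are assumptions for illustration, not the dataset's documented format.

    import numpy as np

    FPS = 30          # body poses are captured at 30 Hz
    NUM_JOINTS = 20   # joints tracked by the Kinect pose estimation pipeline

    def load_sequence(path):
        # Hypothetical loader: assumes one frame per row with
        # 20 joints x 3 coordinates, i.e. 60 values per row.
        raw = np.loadtxt(path)
        return raw.reshape(-1, NUM_JOINTS, 3)

    seq = load_sequence("msrc12_sequence_001.txt")  # hypothetical file name
    print(f"{seq.shape[0]} frames, ~{seq.shape[0] / FPS:.1f} s of motion")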
The SHREC dataset contains 14 dynamic gestures performed by 28 participants (all right-handed) and captured by the Intel RealSense short-range depth camera. Each gesture is performed between 1 and 10 times by each participant in two ways: using one finger and using the whole hand. The dataset therefore comprises 2800 captured sequences. For each frame of each sequence, the depth image (640x480 resolution) and the coordinates of 22 joints (both in the 2D depth image space and in the 3D world space) are saved.
28 PAPERS • 8 BENCHMARKS
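To make the per-frame contents concrete, here is a minimal sketch of one SHREC frame as described above: a 640x480 depth image plus 22 joints in both 2D image space and 3D world space. The class and field names are illustrative, not the dataset's own API.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class ShrecFrame:
        depth: np.ndarray      # (480, 640) depth image
        joints_2d: np.ndarray  # (22, 2) joints in 2D depth image space
        joints_3d: np.ndarray  # (22, 3) joints in 3D world space

    # Placeholder values; real frames come from the recorded sequences.
    frame = ShrecFrame(
        depth=np.zeros((480, 640), dtype=np.uint16),
        joints_2d=np.zeros((22, 2), dtype=np.float32),
        joints_3d=np.zeros((22, 3), dtype=np.float32),
    )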
The Jester Gesture Recognition dataset includes 148,092 labeled video clips of humans performing basic, pre-defined hand gestures in front of a laptop camera or webcam. It is designed for training machine learning models to recognize human hand gestures such as sliding two fingers down, swiping left or right, and drumming fingers.
16 PAPERS • 6 BENCHMARKS
A novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis (SALSA).
15 PAPERS • NO BENCHMARKS YET
The IPN Hand dataset is a benchmark video dataset with sufficient size, variation, and real-world elements to train and evaluate deep neural networks for continuous Hand Gesture Recognition (HGR).
5 PAPERS • NO BENCHMARKS YET
This dataset contains static tasks as well as a multitude of more dynamic tasks involving larger hand motion. It has 55 tremor patient recordings together with associated ground-truth accelerometer data from the most affected hand, RGB video data, and aligned depth data.
3 PAPERS • NO BENCHMARKS YET
The TCG dataset is used to evaluate Traffic Control Gesture recognition for autonomous driving. Given 3D body skeleton input, the task is to classify the traffic control gesture at every time step. The dataset consists of 250 sequences from several actors, with sequences ranging from 16 to 90 seconds.
2 PAPERS • 1 BENCHMARK
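Because TCG labels every time step, the input/output shapes look like the sketch below: a sequence of 3D skeletons paired with one class per frame. The joint count, class count, and frame rate here are placeholders, not values taken from the dataset.

    import numpy as np

    T, NUM_JOINTS, NUM_CLASSES = 480, 17, 4  # e.g. a 16 s clip at 30 Hz

    skeletons = np.random.randn(T, NUM_JOINTS, 3)   # (time, joints, xyz)
    labels = np.random.randint(0, NUM_CLASSES, T)   # one class per time step

    # A per-time-step classifier maps (T, joints * 3) features to T labels.
    features = skeletons.reshape(T, -1)
    assert features.shape == (T, NUM_JOINTS * 3) and labels.shape == (T,)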
The Florentine dataset is a dataset of facial gestures containing clips from 160 subjects (both male and female), where gestures were either posed (artificially generated according to a specific request) or induced (genuinely elicited by a shown stimulus). 1032 clips were captured for posed expressions and 1745 clips for induced facial expressions, amounting to a total of 2777 video clips. Genuine facial expressions were induced in subjects using visual stimuli, i.e., videos selected randomly from a bank of YouTube videos to elicit a specific emotion.
1 PAPER • NO BENCHMARKS YET
A dataset explicitly created for Human-Computer Interaction (HCI).
MlGesture is a dataset for hand gesture recognition tasks, recorded in a car with 5 different sensor types at two different viewpoints. The dataset contains over 1300 hand gesture videos from 24 participants and features 9 different hand gesture symbols. One sensor cluster with five different cameras is mounted in front of the driver in the center of the dashboard. A second sensor cluster is mounted on the ceiling looking straight down.
We introduce Slovo, a large-scale video dataset for Russian Sign Language recognition. The dataset is about 16 GB in size and contains 20,400 RGB videos covering 1,000 sign language gestures from 194 signers, with 20 samples per class. It is divided into training and test sets by subject user_id: the training set includes 15,300 videos and the test set 5,100 videos. The total video recording time is ~9.2 hours. About 35% of the videos are recorded in HD and 65% in FullHD resolution. The average length of a gesture video is 50 frames.
1 PAPER • 1 BENCHMARK
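A subject-wise split like the one described for Slovo can be sketched as follows: every video from a given user_id lands in exactly one split, so no signer appears in both training and test sets. The record fields and user ids below are illustrative, not the dataset's actual annotation format.

    # Each record pairs a video with its signer and gesture label.
    records = [
        {"video": "000001.mp4", "user_id": "u017", "label": "hello"},
        {"video": "000002.mp4", "user_id": "u102", "label": "thanks"},
        # ... one record per video
    ]

    test_users = {"u102"}  # hypothetical held-out signers

    # All videos from a held-out signer go to the test set; the rest train.
    train = [r for r in records if r["user_id"] not in test_users]
    test = [r for r in records if r["user_id"] in test_users]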