The Kinetics dataset is a large-scale, high-quality dataset for human action recognition in videos. The dataset consists of around 500,000 video clips covering 600 human action classes with at least 600 video clips for each action class. Each video clip lasts around 10 seconds and is labeled with a single action class. The videos are collected from YouTube.
1,180 PAPERS • 28 BENCHMARKS
BosphorusSign22k is a benchmark dataset for vision-based user-independent isolated Sign Language Recognition (SLR). The dataset is based on the BosphorusSign (Camgoz et al., 2016c) corpus, which was collected with the purpose of helping both the linguistic and computer science communities. It contains isolated videos of Turkish Sign Language glosses from three domains: health, finance, and commonly used everyday signs. Videos in this dataset were performed by six native signers, which makes the dataset valuable for user-independent sign language studies.
6 PAPERS • NO BENCHMARKS YET
Contains video clips shot with modern high-resolution mobile cameras, featuring strong projective distortions and low-light conditions.
5 PAPERS • NO BENCHMARKS YET
3DYoga90 is organized within a three-level label hierarchy. It stands out as one of the most comprehensive open datasets, featuring the largest collection of RGB videos and 3D skeleton sequences among publicly available resources.
2 PAPERS • NO BENCHMARKS YET
Includes challenging sequences and extensive data stratification in terms of camera and object motion, velocity magnitudes, direction, and rotational speeds.
Imbalanced-MiniKinetics200 was proposed in "Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition" to evaluate varying scenarios of video long-tailed recognition. MiniKinetics200 is a 200-category subset of Kinetics400; similar to CIFAR-10/100-LT, Imbalanced-MiniKinetics200 applies an imbalance factor to MiniKinetics200 to construct long-tailed variants. Both the raw frames and features extracted with ResNet-50/101 are provided.
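As a rough illustration of how an imbalance factor shapes a long-tailed variant, the sketch below follows the exponential class-size profile commonly used for CIFAR-10/100-LT. The function name, the maximum per-class count, and the exact decay profile are assumptions for illustration, not the paper's actual construction code.

```python
# Hypothetical sketch: per-class clip counts decaying exponentially so that
# the head class has roughly `imbalance_factor` times more samples than the
# tail class (the CIFAR-LT-style profile). All names/values are illustrative.

def long_tailed_counts(num_classes=200, max_per_class=400, imbalance_factor=100):
    """Return per-class counts where count[0]/count[-1] ~= imbalance_factor."""
    counts = []
    for i in range(num_classes):
        frac = i / (num_classes - 1)          # 0 for head class, 1 for tail class
        counts.append(int(max_per_class * imbalance_factor ** (-frac)))
    return counts

counts = long_tailed_counts()
print(counts[0], counts[-1])  # head-class vs. tail-class sample count
```

Varying `imbalance_factor` yields the "varying scenarios" of long-tailed recognition: a factor of 1 recovers the balanced dataset, while larger factors produce progressively heavier tails.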
1 PAPER • NO BENCHMARKS YET
The first paired win-fail action understanding dataset, with samples from the following domains: “General Stunts,” “Internet Wins-Fails,” “Trick Shots,” and “Party Games.” The task is to identify successful and failed attempts at various activities. Unlike existing action recognition datasets, intra-class variation is high, making the task challenging yet feasible.
1 PAPER • 2 BENCHMARKS
This dataset defines a total of 11 crowd motion patterns and comprises over 6,000 video sequences with an average length of 100 frames per sequence.
0 PAPERS • NO BENCHMARKS YET
L-SVD is an extensive and rigorously curated video dataset aimed at transforming the field of emotion recognition. It features more than 20,000 short video clips, each carefully annotated to represent a range of human emotions. L-SVD stands at the intersection of cognitive science, psychology, computer science, and medical science, providing a unique tool for both research and application in these fields.
Laser powder bed fusion (LPBF) is an additive manufacturing (3D printing) process for metals. RAISE-LPBF is a large dataset on the effect of laser power and laser dot speed in 316L stainless steel bulk material. Both process parameters are independently sampled for each scan line from a continuous distribution, so interactions of different parameter choices can be investigated. Process monitoring comprises on-axis high-speed (20k FPS) video. The data can be used to derive statistical properties of LPBF, as well as to build anomaly detectors.
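The independent per-scan-line sampling described above can be sketched as follows. The parameter ranges, the uniform distribution, and all names here are illustrative assumptions; the dataset's actual sampling scheme and units may differ.

```python
# Hypothetical sketch of per-scan-line process-parameter sampling: laser
# power and dot speed are drawn independently from continuous (here uniform)
# distributions, so their interactions can be studied without confounding.
# Ranges and distribution choice are illustrative assumptions, not from the
# RAISE-LPBF specification.
import random

def sample_scan_lines(n_lines,
                      power_range=(150.0, 250.0),    # assumed range, in W
                      speed_range=(500.0, 1500.0),   # assumed range, in mm/s
                      seed=0):
    rng = random.Random(seed)
    return [
        {"power_W": rng.uniform(*power_range),
         "speed_mm_s": rng.uniform(*speed_range)}
        for _ in range(n_lines)
    ]

lines = sample_scan_lines(1000)
```

Because each line's power and speed are drawn independently, any region of the 2-D parameter space is eventually covered, which is what makes interaction effects and anomaly statistics recoverable from the monitoring video.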