HumanML3D is a 3D human motion-language dataset built from the HumanAct12 and AMASS datasets. It covers a broad range of human actions such as daily activities (e.g., 'walking', 'jumping'), sports (e.g., 'swimming', 'playing golf'), acrobatics (e.g., 'cartwheel') and artistry (e.g., 'dancing'). Overall, the HumanML3D dataset consists of 14,616 motions and 44,970 descriptions drawn from a vocabulary of 5,371 distinct words. The total length of the motions amounts to 28.59 hours. The average motion length is 7.1 seconds, while the average description length is 12 words.
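As a quick sanity check, the quoted statistics are mutually consistent; the arithmetic below uses only the numbers from the description (not the actual dataset files), and the "captions per motion" ratio is derived, not an official figure:

```python
# Figures quoted in the HumanML3D description above.
NUM_MOTIONS = 14_616
NUM_DESCRIPTIONS = 44_970
TOTAL_HOURS = 28.59
AVG_MOTION_SECONDS = 7.1

# Total duration implied by the per-motion average (close to the
# reported 28.59 h; the small gap is rounding in the 7.1 s average).
implied_hours = NUM_MOTIONS * AVG_MOTION_SECONDS / 3600
print(f"implied total: {implied_hours:.2f} h (reported: {TOTAL_HOURS} h)")

# Descriptions per motion, derived from the two counts above.
captions_per_motion = NUM_DESCRIPTIONS / NUM_MOTIONS
print(f"captions per motion: {captions_per_motion:.2f}")
```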
98 PAPERS • 2 BENCHMARKS
The KIT Motion-Language Dataset links human motion and natural language.
32 PAPERS • 2 BENCHMARKS
HumanAct12 is a 3D human motion dataset adopted from the polarization image and 3D pose dataset PHSPD, with proper temporal cropping and action annotation. It contains 1,191 3D motion clips (90,099 poses in total), categorized into 12 action classes and 34 fine-grained sub-classes. The action types include daily actions such as walk, run, sit down, jump up and warm up. Fine-grained action types carry more specific information, e.g., 'warm up by bowing left side' and 'warm up by pressing left leg'.
26 PAPERS • 2 BENCHMARKS
AIST++ is a 3D dance dataset containing 3D motion reconstructed from real dancers, paired with music. It is constructed from the AIST Dance Video DB: from the multi-view videos, an elaborate pipeline estimates the camera parameters, 3D human keypoints and 3D human dance motion sequences.
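The core multi-view step in such a pipeline can be sketched with classic DLT (direct linear transform) triangulation: given calibrated camera projection matrices and a 2D detection of the same joint in two views, recover its 3D position. This is a generic textbook technique, not the exact AIST++ implementation:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: 2D image coordinates of the same point in each view.
    Each observation contributes two rows of the homogeneous system
    A @ X = 0; the least-squares solution is the last right singular
    vector of A.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Tiny synthetic demo: two identity-intrinsics cameras, the second
# shifted one unit along x. A point at (1, 2, 10) projects to
# (0.1, 0.2) and (0.0, 0.2); triangulation recovers it.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
recovered = triangulate(P1, P2, np.array([0.1, 0.2]), np.array([0.0, 0.2]))
print(recovered)  # approximately [1, 2, 10]
```

In practice, pipelines like AIST++'s triangulate from many views, filter outlier detections, and fit a body model to the resulting keypoints.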
15 PAPERS • 2 BENCHMARKS
InterHuman is a multimodal dataset of diverse two-person interactions. It consists of about 107M frames with accurate skeletal motions and 16,756 natural language descriptions.
13 PAPERS • 1 BENCHMARK
ARCTIC is a dataset of free-form interactions of hands and articulated objects. ARCTIC has 1.2M images paired with accurate 3D meshes for both hands and for objects that move and deform over time. The dataset also provides hand-object contact information.
9 PAPERS • NO BENCHMARKS YET
BRACE is a dataset for audio-conditioned dance motion synthesis that challenges common assumptions for this task.
5 PAPERS • 2 BENCHMARKS
The Ubisoft La Forge Animation Dataset ("LAFAN1") and its accompanying code were released with the SIGGRAPH 2020 paper Robust Motion In-betweening.
4 PAPERS • 1 BENCHMARK
CHAIRS is a large-scale motion-captured f-AHOI dataset, consisting of 17.3 hours of versatile interactions between 46 participants and 81 articulated and rigid sittable objects. CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process, as well as realistic and physically plausible full-body interactions.
2 PAPERS • NO BENCHMARKS YET
Trinity Gesture Dataset includes 23 takes, totalling 244 minutes of motion capture and audio of a male native English speaker producing spontaneous speech on different topics. The actor's motion was captured with 20 Vicon cameras at 59.94 frames per second (fps), and the skeleton includes 69 joints.
2 PAPERS • 3 BENCHMARKS
BOTH57M is a body-hand dataset with both body-level and finger-level text prompts.
1 PAPER • NO BENCHMARKS YET
Data used for the paper SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data