3D Absolute Human Pose Estimation
9 papers with code • 3 benchmarks • 6 datasets
This task aims to solve absolute (camera-centric not root-relative) 3D human pose estimation.
( Image credit: RootNet )
Most implemented papers
RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation
Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense classification problems such as semantic segmentation.
Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image
Although significant improvement has been achieved recently in 3D human pose estimation, most of the previous methods only treat a single-person case.
Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach
Then we lift the multi-view 2D poses to the 3D space by an Orientation Regularized Pictorial Structure Model (ORPSM) which jointly minimizes the projection error between the 3D and 2D poses, along with the discrepancy between the 3D pose and IMU orientations.
MeTRAbs: Metric-Scale Truncation-Robust Heatmaps for Absolute 3D Human Pose Estimation
Heatmap representations have formed the basis of human pose estimation systems for many years, and their extension to 3D has been a fruitful line of recent research.
End-to-End Human Pose and Mesh Reconstruction with Transformers
We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image.
Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos
To tackle this problem, we propose a novel framework integrating graph convolutional networks (GCNs) and temporal convolutional networks (TCNs) to robustly estimate camera-centric multi-person 3D poses that do not require camera parameters.
Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation
To this end, we propose Multi-level Attention Encoder-Decoder Network (MAED), including a Spatial-Temporal Encoder (STE) and a Kinematic Topology Decoder (KTD) to model multi-level attentions in a unified framework.
Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation
We consider a new problem of adapting a human mesh reconstruction model to out-of-domain streaming videos, where performance of existing SMPL-based models are significantly affected by the distribution shift represented by different camera parameters, bone lengths, backgrounds, and occlusions.
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model.