Authors:
Elham Iravani 1,2; Frederik Hasecke 2; Lukas Hahn 2 and Tobias Meisen 1
Affiliations:
1 University of Wuppertal, Gaußstraße 20, Wuppertal, Germany; 2 APTIV, Am Technologiepark 1, Wuppertal, Germany
Keyword(s):
Human Pose Estimation, Absolute Pose Estimation, Pose Refinement.
Abstract:
Human Pose Estimation (HPE) is a critical task in computer vision, involving the prediction of human body joint coordinates from images or videos. Traditional 3D HPE methods often predict joint positions relative to a central body part, such as the hip. Transformer-based models like PoseFormer (Zheng et al., 2021), MHFormer (Li et al., 2022b), and PoseFormerV2 (Zhao et al., 2023) have advanced the field by capturing spatial and temporal relationships to improve prediction accuracy. However, these models primarily output relative joint positions, requiring additional steps for absolute pose estimation. In this work, we present a novel post-processing technique that refines the output of other HPE methods from monocular images. By leveraging projection and spatial constraints, our method enhances the accuracy of relative joint predictions and seamlessly transitions them to absolute poses. Validated on the Human3.6M dataset (Ionescu et al., 2013), our approach demonstrates significant improvements over existing methods, achieving state-of-the-art performance in both relative and absolute 3D human pose estimation. Our method achieves a notable error reduction, with a 33.9% improvement compared to PoseFormer and a 27.2% improvement compared to MHFormer estimations.
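To make the relative-to-absolute step concrete, the sketch below shows one common way such a conversion can be posed (this is an illustration of the general idea, not the paper's actual refinement algorithm): given a hip-relative 3D pose, detected 2D keypoints, and assumed camera intrinsics (fx, fy, cx, cy), the root translation that best satisfies the perspective projection constraint is recovered by linear least squares, and adding it to the relative pose yields an absolute pose.

```python
import numpy as np

def estimate_root_translation(rel_pose, kpts_2d, fx, fy, cx, cy):
    """Recover the root translation T = (Tx, Ty, Tz) so that projecting
    (rel_pose + T) with a pinhole camera best matches the 2D keypoints.

    rel_pose : (J, 3) root-relative 3D joints in camera coordinates
    kpts_2d  : (J, 2) 2D joints in pixel coordinates
    fx, fy, cx, cy : assumed camera intrinsics
    """
    X, Y, Z = rel_pose[:, 0], rel_pose[:, 1], rel_pose[:, 2]
    u = kpts_2d[:, 0] - cx
    v = kpts_2d[:, 1] - cy

    J = rel_pose.shape[0]
    A = np.zeros((2 * J, 3))
    b = np.zeros(2 * J)

    # From u = fx*(X+Tx)/(Z+Tz) + cx:  fx*Tx - u*Tz = u*Z - fx*X
    A[0::2, 0] = fx
    A[0::2, 2] = -u
    b[0::2] = u * Z - fx * X

    # From v = fy*(Y+Ty)/(Z+Tz) + cy:  fy*Ty - v*Tz = v*Z - fy*Y
    A[1::2, 1] = fy
    A[1::2, 2] = -v
    b[1::2] = v * Z - fy * Y

    T, *_ = np.linalg.lstsq(A, b, rcond=None)
    return T


# Toy check with synthetic data (all values hypothetical):
rel_pose = np.random.randn(17, 3) * 0.3            # 17 joints, as in Human3.6M
true_T = np.array([0.1, -0.2, 4.0])                 # made-up ground-truth root
abs_pose = rel_pose + true_T
fx = fy = 1145.0
cx = cy = 512.0
kpts_2d = np.stack([fx * abs_pose[:, 0] / abs_pose[:, 2] + cx,
                    fy * abs_pose[:, 1] / abs_pose[:, 2] + cy], axis=1)

T_hat = estimate_root_translation(rel_pose, kpts_2d, fx, fy, cx, cy)
print(T_hat)  # ~ true_T; absolute pose follows as rel_pose + T_hat
```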