Authors:
Diógenes Silva 1; João Lima 1,2; Diego Thomas 3; Hideaki Uchiyama 4 and Veronica Teichrieb 1
Affiliations:
1 Voxar Labs, Centro de Informática, Universidade Federal de Pernambuco, Recife, PE, Brazil
2 Visual Computing Lab, Departamento de Computação, Universidade Federal Rural de Pernambuco, Recife, PE, Brazil
3 Faculty of Information Science and Electrical Engineering, Kyushu University
4 Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
Keyword(s):
3D Human Pose Estimation, Unsupervised Learning, Deep Learning, Reprojection Error.
Abstract:
We present UMVpose++ to address the problem of 3D pose estimation of multiple persons in a multi-view scenario. Unlike the most recent state-of-the-art methods, which are based on supervised techniques, our work does not need labeled data to perform 3D pose estimation; this is a practical advantage, since generating 3D annotations is costly and prone to errors. Our approach uses a plane sweep method to generate the 3D pose estimates. We define one view as the target and the remaining views as references. We estimate the depth of each 2D skeleton in the target view to obtain our 3D poses. Instead of comparing them with ground-truth poses, we project the estimated 3D poses onto the reference views and compare the 2D projections with the 2D poses obtained using an off-the-shelf method. 2D poses of the same pedestrian obtained from the target and reference views must be matched to allow this comparison. By performing a matching process based on ground points, we identify the corresponding 2D poses and compare them with the respective projections. Furthermore, we propose a new reprojection loss based on the smooth L1 norm. We evaluated the proposed method on the publicly available Campus dataset. As a result, we obtained better accuracy than state-of-the-art unsupervised methods, scoring 0.5 percentage points above the best geometric method. We also outperform some state-of-the-art supervised methods, and our results are comparable with the best supervised method, falling only 0.2 percentage points below.
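To make the reprojection objective concrete, the following is a minimal sketch, not the authors' implementation, of how a smooth-L1-based reprojection loss could be written in PyTorch. It assumes calibrated 3x4 projection matrices for the reference views and matched 2D detections from an off-the-shelf pose estimator; the function names, the per-joint confidence weighting, and the averaging scheme are illustrative assumptions.

```python
# Hypothetical sketch of a smooth-L1 reprojection loss over reference views.
# Assumes PyTorch, pinhole 3x4 projection matrices, and pre-matched 2D poses.
import torch
import torch.nn.functional as F

def project(points_3d, proj_matrix):
    """Project (J, 3) joints into a reference view using a (3, 4) projection matrix."""
    ones = torch.ones(points_3d.shape[0], 1, device=points_3d.device)
    homog = torch.cat([points_3d, ones], dim=1)        # (J, 4) homogeneous points
    uvw = homog @ proj_matrix.T                        # (J, 3) homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-8)    # (J, 2) pixel coordinates

def reprojection_loss(pred_3d_joints, proj_matrices, detected_2d_joints, conf=None):
    """Smooth-L1 distance between projected 3D joints and matched 2D detections.

    pred_3d_joints:     (J, 3) joints estimated from the target view.
    proj_matrices:      list of (3, 4) matrices, one per reference view.
    detected_2d_joints: list of (J, 2) 2D poses from an off-the-shelf detector.
    conf:               optional list of (J,) detection confidences (assumption).
    """
    loss = 0.0
    for v, P in enumerate(proj_matrices):
        proj_2d = project(pred_3d_joints, P)
        per_joint = F.smooth_l1_loss(
            proj_2d, detected_2d_joints[v], reduction="none"
        ).sum(dim=1)
        if conf is not None:
            per_joint = per_joint * conf[v]  # down-weight unreliable detections
        loss = loss + per_joint.mean()
    return loss / len(proj_matrices)
```

Because the loss depends only on detected 2D poses and camera geometry, no 3D ground truth enters the objective, which is the sense in which the training signal is unsupervised.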