accumulation. To reduce error accumulation, a multi-task
learning framework can be adopted to handle pose
parameters and shape parameters separately. By
sharing some network layers, the two tasks can
complement each other during learning, improving
overall accuracy. For example, a joint network for
pose estimation and shape reconstruction can be
designed, with each branch having its own independent
loss function. Additionally, the mesh reconstruction
algorithm can be improved to reduce regression
errors. Finally, a Generative Adversarial Network
(GAN) can be used to improve the quality of the
reconstructed meshes.
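The shared-layer, two-loss idea described above can be illustrated with a minimal sketch. It is not the implementation of any method surveyed here: the layer sizes, the parameter counts (72 pose and 10 shape values, following the common SMPL convention), the mean-squared-error losses, and the names `forward` and `multitask_loss` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (assumptions, not taken from the paper).
FEAT_DIM, HIDDEN_DIM, POSE_DIM, SHAPE_DIM = 128, 64, 72, 10

# One shared layer plus one linear head per task.
params = {
    "W_shared": rng.normal(0.0, 0.1, (FEAT_DIM, HIDDEN_DIM)),
    "W_pose": rng.normal(0.0, 0.1, (HIDDEN_DIM, POSE_DIM)),
    "W_shape": rng.normal(0.0, 0.1, (HIDDEN_DIM, SHAPE_DIM)),
}


def forward(features, params):
    """Run the shared layer once, then branch into the two task heads."""
    hidden = np.maximum(features @ params["W_shared"], 0.0)  # shared ReLU layer
    pose_pred = hidden @ params["W_pose"]    # pose branch
    shape_pred = hidden @ params["W_shape"]  # shape branch
    return pose_pred, shape_pred


def multitask_loss(pose_pred, shape_pred, pose_gt, shape_gt,
                   w_pose=1.0, w_shape=1.0):
    """Compute an independent loss per task and their weighted sum."""
    pose_loss = np.mean((pose_pred - pose_gt) ** 2)
    shape_loss = np.mean((shape_pred - shape_gt) ** 2)
    total = w_pose * pose_loss + w_shape * shape_loss
    return total, pose_loss, shape_loss


# Random stand-ins for image features and ground-truth parameters.
features = rng.normal(size=(4, FEAT_DIM))
pose_gt = rng.normal(size=(4, POSE_DIM))
shape_gt = rng.normal(size=(4, SHAPE_DIM))

pose_pred, shape_pred = forward(features, params)
total, pose_loss, shape_loss = multitask_loss(
    pose_pred, shape_pred, pose_gt, shape_gt
)
print(f"pose loss={pose_loss:.4f}  shape loss={shape_loss:.4f}  total={total:.4f}")
```

Because both heads read the same hidden features, gradients from either loss would update the shared layer during training, which is how the two tasks can inform each other. The weights `w_pose` and `w_shape` are hypothetical knobs for balancing the two terms.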
5 CONCLUSIONS
This paper reviews the current mainstream 2D and 3D
methods for human pose estimation. For 2D
estimation, the human pose is estimated from Wi-Fi
signals, with a spatial encoder integrating the CSI
sequence. For self-supervised 3D human pose
estimation, the Pose ResNet convolutional
architecture is divided into 2D and 3D modules that
detect features and key joints. 3D human pose
estimation based on an encoder and a regression
decoder improves accuracy by incorporating both
temporal and spatial information.
At present, 2D motion capture can be divided into
camera-based visual capture, hand-drawn animation,
motion capture software, and sensor capture. These
methods have their own characteristics and use cases.
Future research can optimize the real-time
performance of 2D motion capture and improve the
interactivity of the system, so that users can
operate and edit more conveniently.
At present, mainstream 3D motion capture methods
can be divided into optical camera systems, inertial
measurement units, depth cameras, optical tracking
systems, and laser scanning systems. These methods
have been widely used in different fields. Future
research can combine visual data, inertial data,
depth data, and other sources to improve capture
accuracy.
AUTHORS CONTRIBUTION
All the authors contributed equally, and their names
are listed in alphabetical order.
MLSCM 2024 - International Conference on Modern Logistics and Supply Chain Management