Authors:
Marko Linna (1); Juho Kannala (2) and Esa Rahtu (3)
Affiliations:
(1) University of Oulu, Finland; (2) Aalto University, Finland; (3) Tampere University of Technology, Finland
Keyword(s):
Human Pose Estimation, Person Detection, Convolutional Neural Networks.
Abstract:
In this paper, we present a method for real-time multi-person human pose estimation from video by utilizing
convolutional neural networks. Our method is aimed at use-case-specific applications, where good accuracy
is essential and variation of the background and poses is limited. This enables us to use a generic network
architecture, which is both accurate and fast. We divide the problem into two phases: (1) pre-training and
(2) finetuning. In pre-training, the network is trained on highly diverse input data from publicly available
datasets, while in finetuning we train with application-specific data, which we record with Kinect. Our method
differs from most of the state-of-the-art methods in that we consider the whole system, including person
detector, pose estimator and an automatic way to record application specific training material for finetuning.
Our method is considerably faster than many of the state-of-the-art methods. Our method can be thought of as
a replacement for Kinect in restricted environments. It can be used for tasks such as gesture control, games,
person tracking, action recognition and action tracking. We achieved an accuracy of 96.8% (PCK@0.2) with
application-specific data.
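The reported PCK@0.2 figure is a Percentage of Correct Keypoints score: a predicted keypoint counts as correct when it lies within 0.2 times a per-person reference length of the ground-truth location. The abstract does not say which reference length is used, so the sketch below takes it as an input; the function name `pck`, its parameters, and the sample coordinates are hypothetical and serve only to illustrate how such a score could be computed.

```python
import math


def pck(pred, gt, ref_lengths, alpha=0.2):
    """Fraction of predicted keypoints within alpha * ref_length of ground truth.

    pred, gt:     lists of poses, each pose a list of (x, y) keypoints.
    ref_lengths:  one normalization length per pose (ASSUMPTION: e.g. a torso
                  or bounding-box size; the abstract does not specify this).
    alpha:        tolerance factor, 0.2 for PCK@0.2.
    """
    correct = 0
    total = 0
    for pred_pose, gt_pose, ref in zip(pred, gt, ref_lengths):
        threshold = alpha * ref
        for (px, py), (gx, gy) in zip(pred_pose, gt_pose):
            total += 1
            # Euclidean distance between prediction and ground truth.
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    # Return 0.0 for empty input instead of dividing by zero.
    return correct / total if total else 0.0


# Illustrative data only: one person, four keypoints, reference length 10.
gt_poses = [[(0, 0), (10, 0), (0, 10), (10, 10)]]
pred_poses = [[(1, 0), (10, 1), (0, 13), (10, 10)]]
score = pck(pred_poses, gt_poses, ref_lengths=[10.0])
print(f"PCK@0.2 = {score:.2f}")
```

With a reference length of 10 the threshold is 2 pixels, so the third keypoint (3 pixels off) is counted as incorrect while the other three are accepted.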