Towards Deep People Detection using CNNs Trained on Synthetic Images

Roberto Martín-López, David Fuentes-Jiménez, Sara Luengo-Sánchez, Cristina Losada-Gutiérrez, Marta Marrón-Romera, Carlos Luna

2020

Abstract

In this work, we propose a people detection system that uses only depth information, provided by an RGB-D camera in a frontal position. The proposed solution is based on a Convolutional Neural Network (CNN) with an encoder-decoder architecture built from ResNet residual layers, which have been widely used in detection and classification tasks. The system takes as input a depth map generated by a time-of-flight or structured-light sensor. Its output is a probability map (the same size as the input) in which each detection is represented as a Gaussian function whose mean is the position of the person’s head. Once this probability map is generated, refinement techniques are applied to improve the detection precision. Only synthetic images, generated with the Blender software, were used to train the system, thus avoiding the need to acquire and label large image datasets. The described system has been evaluated using both synthetic images and real images acquired with a Microsoft Kinect II camera. In addition, we have compared the results with those of other state-of-the-art works, showing that they are similar despite no real data having been used during training.
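
The output representation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the Gaussian spread `sigma`, the detection `threshold`, and the local-maximum search are all assumptions chosen for illustration, and the peak search is only a simple stand-in for the refinement techniques the abstract mentions without detailing.

```python
import numpy as np


def gaussian_target_map(shape, heads, sigma=5.0):
    """Build a map the same size as the depth image, with one Gaussian per head.

    shape: (height, width) of the depth map.
    heads: list of (row, col) head positions.
    sigma: hypothetical spread in pixels (not a value taken from the paper).
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    target = np.zeros(shape, dtype=np.float64)
    for r, c in heads:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        # Taking the maximum keeps every peak at 1 even if Gaussians overlap.
        target = np.maximum(target, g)
    return target


def detect_peaks(prob_map, threshold=0.5, window=5):
    """Return (row, col) of local maxima above `threshold`.

    A pixel counts as a detection if it is the largest value inside a
    window x window neighbourhood. Both parameters are assumptions.
    """
    height, width = prob_map.shape
    half = window // 2
    # Pad with -inf so border pixels can still be compared with a full window.
    padded = np.pad(prob_map, half, mode="constant", constant_values=-np.inf)
    peaks = []
    for r in range(height):
        for c in range(width):
            value = prob_map[r, c]
            if value < threshold:
                continue
            patch = padded[r:r + window, c:c + window]
            if value >= patch.max():
                peaks.append((r, c))
    return peaks


# Two hypothetical head positions in a 64 x 96 map.
example_map = gaussian_target_map((64, 96), [(20, 30), (40, 80)])
example_peaks = detect_peaks(example_map)
```

Here `example_map` plays the role of an ideal network output, and `detect_peaks` recovers the head positions as the means of the Gaussians. A real network output would be noisier, which is why some refinement step is needed in practice.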


Paper Citation


in Harvard Style

Martín-López R., Fuentes-Jiménez D., Luengo-Sánchez S., Losada-Gutiérrez C., Marrón-Romera M. and Luna C. (2020). Towards Deep People Detection using CNNs Trained on Synthetic Images. In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020) - Volume 5: VISAPP; ISBN 978-989-758-402-2, SciTePress, pages 225-232. DOI: 10.5220/0008879102250232


in Bibtex Style

@conference{visapp20,
author={Roberto Martín-López and David Fuentes-Jiménez and Sara Luengo-Sánchez and Cristina Losada-Gutiérrez and Marta Marrón-Romera and Carlos Luna},
title={Towards Deep People Detection using CNNs Trained on Synthetic Images},
booktitle={Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020) - Volume 5: VISAPP},
year={2020},
pages={225-232},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008879102250232},
isbn={978-989-758-402-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020) - Volume 5: VISAPP
TI - Towards Deep People Detection using CNNs Trained on Synthetic Images
SN - 978-989-758-402-2
AU - Martín-López R.
AU - Fuentes-Jiménez D.
AU - Luengo-Sánchez S.
AU - Losada-Gutiérrez C.
AU - Marrón-Romera M.
AU - Luna C.
PY - 2020
SP - 225
EP - 232
DO - 10.5220/0008879102250232
PB - SciTePress