Authors:
Olfa Haggui 1; Claude Tadonki 2; Fatma Sayadi 3 and Bouraoui Ouni 4
Affiliations:
1 Centre de Recherche en Informatique (CRI), Mines ParisTech - PSL Research University, 60 boulevard Saint-Michel, 75006 Paris, France, and Networked Objects Control and Communications Systems (NOCCS), Sousse National School of Engineering, BP 264 Sousse Erriadh 4023, Tunisia
2 Centre de Recherche en Informatique (CRI), Mines ParisTech - PSL Research University, 60 boulevard Saint-Michel, 75006 Paris, France
3 Electronics and Microelectronics Laboratory, Faculty of Sciences, University of Monastir, 5000 Monastir, Tunisia
4 Networked Objects Control and Communications Systems (NOCCS), Sousse National School of Engineering, BP 264 Sousse Erriadh 4023, Tunisia
Keyword(s):
Optical Flow, Lucas-Kanade, Multicore, Manycore, GPU, OpenACC.
Related Ontology Subjects/Areas/Topics:
Computer Vision, Visualization and Computer Graphics; Motion, Tracking and Stereo Vision; Optical Flow and Motion Analyses; Tracking and Visual Navigation
Abstract:
Optical flow estimation is an essential component of motion detection and object tracking procedures. It is an image processing algorithm typically composed of a series of convolution masks (approximating the derivatives) followed by the solution of 2 × 2 linear systems for the optical flow vectors. Since each stage of the algorithm is a stencil computation, the overhead from memory accesses is expected to be significant and to create a genuine scalability bottleneck, especially given the complexity of the GPU memory configuration. In this paper, we investigate a GPU deployment of an optimized CPU implementation via OpenACC, a directive-based parallel programming model and framework that eases the process of porting codes to a wide variety of heterogeneous HPC hardware platforms and architectures. We explore each of the major technical features and strive to get the best performance impact. Experimental results on a Quadro P5000 are provided together with the corresponding technical discussions, taking the performance of the multicore version on an Intel Broadwell EP as the baseline.
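To make the structure described in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of a dense Lucas-Kanade step in C with one OpenACC directive. The function name `lk_flow`, the 3-point central-difference masks, the square window of radius `r`, and the determinant threshold are all assumptions chosen for illustration. Each pixel reads a stencil of neighbours to build the derivative products, then solves its own 2 × 2 system.

```c
#include <math.h>

/* Hypothetical helper: dense Lucas-Kanade flow between two h x w grayscale
 * frames stored row-major. Assumed choices (not from the paper): central
 * differences for the spatial derivatives, a frame difference for the
 * temporal derivative, and a (2r+1) x (2r+1) summation window. */
void lk_flow(const float *restrict im1, const float *restrict im2,
             float *restrict u, float *restrict v, int h, int w, int r)
{
    const int n = h * w;

    /* One independent iteration per pixel; the directive is ignored by
     * compilers without OpenACC support, so the code also runs on a CPU. */
    #pragma acc parallel loop collapse(2) \
        copyin(im1[0:n], im2[0:n]) copyout(u[0:n], v[0:n])
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float sxx = 0.0f, sxy = 0.0f, syy = 0.0f;
            float sxt = 0.0f, syt = 0.0f;

            /* Skip pixels whose window (plus derivative mask) leaves the image. */
            if (y >= r + 1 && y < h - r - 1 && x >= r + 1 && x < w - r - 1) {
                for (int dy = -r; dy <= r; ++dy) {
                    for (int dx = -r; dx <= r; ++dx) {
                        const int p = (y + dy) * w + (x + dx);
                        const float ix = 0.5f * (im1[p + 1] - im1[p - 1]);
                        const float iy = 0.5f * (im1[p + w] - im1[p - w]);
                        const float it = im2[p] - im1[p];
                        sxx += ix * ix;  sxy += ix * iy;  syy += iy * iy;
                        sxt += ix * it;  syt += iy * it;
                    }
                }
            }

            /* Solve [sxx sxy; sxy syy] [u; v] = -[sxt; syt] by Cramer's rule. */
            const float det = sxx * syy - sxy * sxy;
            const int idx = y * w + x;
            if (fabsf(det) > 1e-6f) {
                u[idx] = (-syy * sxt + sxy * syt) / det;
                v[idx] = ( sxy * sxt - sxx * syt) / det;
            } else {
                u[idx] = 0.0f;  /* border or ill-conditioned system */
                v[idx] = 0.0f;
            }
        }
    }
}
```

In this sketch the derivatives are recomputed inside every window, which keeps the example short but multiplies the memory traffic; this is the kind of stencil access cost the abstract identifies as the scalability bottleneck. The `copyin`/`copyout` clauses make the host-to-device transfers explicit.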