GPU OPTIMIZATION AND PERFORMANCE ANALYSIS OF A 3D CURVE-SKELETON GENERATION ALGORITHM

J. Jiménez, J. Ruiz de Miras

2012

Abstract

The CUDA programming model lets programmers write algorithms that execute in parallel on NVIDIA GPU devices. However, achieving acceptable speedups with programs that scale to thousands of independent threads is not always easy, especially for algorithms with high data-sharing or data-dependence requirements. Algorithms of this type are very common in fields such as volume modelling and image analysis. In this paper we present a comprehensive collection of optimizations to be considered in any CUDA implementation, and show how we applied them in practice to a complex and not trivially parallelizable case study: a 3D curve-skeleton calculation algorithm. Two different GPU architectures were used to test the implications of each optimization: the NVIDIA GT200 architecture and the new Fermi GF100. As a result, although the first direct CUDA implementation of our algorithm ran even slower than its CPU version, overall speedups of 19x (GT200) and 68x (Fermi GF100) were finally achieved.


Paper Citation


in Harvard Style

Jiménez J. and Ruiz de Miras J. (2012). GPU OPTIMIZATION AND PERFORMANCE ANALYSIS OF A 3D CURVE-SKELETON GENERATION ALGORITHM. In Proceedings of the International Conference on Computer Graphics Theory and Applications and International Conference on Information Visualization Theory and Applications - Volume 1: GRAPP, (VISIGRAPP 2012), ISBN 978-989-8565-02-0, pages 77-86. DOI: 10.5220/0003852600770086


in Bibtex Style

@conference{grapp12,
author={J. Jiménez and J. Ruiz de Miras},
title={GPU OPTIMIZATION AND PERFORMANCE ANALYSIS OF A 3D CURVE-SKELETON GENERATION ALGORITHM},
booktitle={Proceedings of the International Conference on Computer Graphics Theory and Applications and International Conference on Information Visualization Theory and Applications - Volume 1: GRAPP, (VISIGRAPP 2012)},
year={2012},
pages={77-86},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003852600770086},
isbn={978-989-8565-02-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Graphics Theory and Applications and International Conference on Information Visualization Theory and Applications - Volume 1: GRAPP, (VISIGRAPP 2012)
TI - GPU OPTIMIZATION AND PERFORMANCE ANALYSIS OF A 3D CURVE-SKELETON GENERATION ALGORITHM
SN - 978-989-8565-02-0
AU - Jiménez J.
AU - Ruiz de Miras J.
PY - 2012
SP - 77
EP - 86
DO - 10.5220/0003852600770086