Cache Aware Instruction Accurate Simulation of a 3-D Coastal Ocean Model on Low Power Hardware

Dominik Schoenwetter, Alexander Ditter, Vadym Aizinger, Balthasar Reuter, Dietmar Fey

2016

Abstract

High-level hardware simulation and modeling techniques have matured significantly in recent years and have become increasingly important in practice, e.g., in industrial hardware development and the automotive domain. Yet many other challenging application areas could profit greatly from accurate and efficient hardware simulation, such as numerical solvers for environmental or disaster prediction problems, e.g., tsunami and storm surge simulations. Such applications rely on complex mathematical models that are discretized using suitable numerical methods. They require close collaboration between mathematicians and computer scientists to attain the desired computational performance on current microarchitectures, together with code parallelization techniques that produce accurate simulation results as fast as possible. This complex and detailed simulation requires a lot of time during both preparation and execution. Execution on non-standard or new hardware in particular may be challenging and error-prone. In this paper, we focus on a high-level simulation approach for determining accurate runtimes of applications using instruction accurate modeling and simulation. We extend the basic instruction accurate simulation technology from OVP with cache models in conjunction with a statistical cost function, which yields high-precision runtime predictions that are significantly better than those of the pure instruction accurate approach.
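The idea of combining instruction counts with cache statistics in a statistical cost function can be illustrated with a minimal sketch. The abstract does not specify the form of the cost function, so everything below is an assumption made for illustration only: a linear model with one weight per event type (executed instructions, L1 misses, L2 misses), fitted by ordinary least squares. The function names and all numbers are hypothetical, and the sample data are synthetic, not measurements from the paper.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_cost_weights(samples: Sequence[Sequence[float]],
                     runtimes: Sequence[float]) -> List[float]:
    """Least-squares fit of one cost per event type (normal equations)."""
    k = len(samples[0])
    ata = [[sum(s[i] * s[j] for s in samples) for j in range(k)]
           for i in range(k)]
    atb = [sum(s[i] * t for s, t in zip(samples, runtimes))
           for i in range(k)]
    return solve_linear_system(ata, atb)


def predict_runtime(weights: Sequence[float],
                    counts: Sequence[float]) -> float:
    """Assumed linear cost function: sum of weight * event count."""
    return sum(w * c for w, c in zip(weights, counts))


# Synthetic training data (hypothetical, not from the paper):
# (instructions, L1 misses, L2 misses) per run, with runtimes in
# arbitrary time units generated from the weights (1, 10, 100).
samples = [(100, 10, 1), (200, 30, 2), (150, 5, 4), (300, 20, 3)]
runtimes = [300.0, 700.0, 600.0, 800.0]

weights = fit_cost_weights(samples, runtimes)
estimate = predict_runtime(weights, (250, 15, 2))
```

In this sketch, a pure instruction accurate estimate would correspond to using the instruction count alone; the extra terms let cache misses, as reported by a cache model, contribute their own fitted cost to the predicted runtime.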



Paper Citation


in Harvard Style

Schoenwetter D., Ditter A., Aizinger V., Reuter B. and Fey D. (2016). Cache Aware Instruction Accurate Simulation of a 3-D Coastal Ocean Model on Low Power Hardware. In Proceedings of the 6th International Conference on Simulation and Modeling Methodologies, Technologies and Applications - Volume 1: SIMULTECH, ISBN 978-989-758-199-1, pages 129-137. DOI: 10.5220/0006006501290137


in Bibtex Style

@conference{simultech16,
author={Dominik Schoenwetter and Alexander Ditter and Vadym Aizinger and Balthasar Reuter and Dietmar Fey},
title={Cache Aware Instruction Accurate Simulation of a 3-D Coastal Ocean Model on Low Power Hardware},
booktitle={Proceedings of the 6th International Conference on Simulation and Modeling Methodologies, Technologies and Applications - Volume 1: SIMULTECH},
year={2016},
pages={129-137},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006006501290137},
isbn={978-989-758-199-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Simulation and Modeling Methodologies, Technologies and Applications - Volume 1: SIMULTECH
TI - Cache Aware Instruction Accurate Simulation of a 3-D Coastal Ocean Model on Low Power Hardware
SN - 978-989-758-199-1
AU - Schoenwetter D.
AU - Ditter A.
AU - Aizinger V.
AU - Reuter B.
AU - Fey D.
PY - 2016
SP - 129
EP - 137
DO - 10.5220/0006006501290137