Parallel Applications and On-chip Traffic Distributions: Observation, Implication and Modelling

Thomas Canhao Xu, Jonne Pohjankukka, Paavo Nevalainen, Ville Leppänen, Tapio Pahikkala

2015

Abstract

We study the traffic characteristics of parallel and high-performance computing applications. Applications that use multiple cores are increasingly common owing to the emergence of multicore processors. However, the design of single-threaded and multi-threaded applications can differ significantly. Furthermore, the on-chip communication profile of multicore systems should be analysed and modelled for characterization and simulation purposes. We investigate several applications running in a full-system simulation environment, gather and analyse their on-chip communication traces, and study their detailed low-level profiles. The applications are categorized into groups according to their parallel programming paradigms. We find that the trace data follow a power-law model with different parameters, and we estimate these parameters by least-squares linear regression. Based on the analysis results, we propose a generic synthetic traffic model.
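
The abstract does not say which trace quantities are fitted or how the regression is set up. As a minimal sketch, assuming the common approach of fitting y = c * x^(-alpha) by ordinary least squares on log-transformed data, the estimation could look like the following. The function name fit_power_law and the synthetic sample data are hypothetical and are not taken from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Estimate (alpha, c) for y = c * x**(-alpha).

    Assumption: ordinary least squares on (log x, log y), which turns the
    power law into the straight line log y = log c - alpha * log x.
    Points with a non-positive x or y are skipped because their logarithm
    is undefined.
    """
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points with positive x and y")

    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in pts)
    if sxx == 0.0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pts)

    slope = sxy / sxx                      # slope of the log-log line
    intercept = mean_v - slope * mean_u    # equals log c
    return -slope, math.exp(intercept)


if __name__ == "__main__":
    # Hypothetical, noise-free data generated from y = 100 * x**(-1.5).
    xs = [1, 2, 4, 8, 16]
    ys = [100.0 * x ** -1.5 for x in xs]
    alpha, c = fit_power_law(xs, ys)
    print(f"alpha={alpha:.3f}, c={c:.3f}")
```

On noise-free data such as the sample above, the fit recovers alpha close to 1.5 and c close to 100. On real traces the log-log fit is sensitive to how the data are binned and to the sparse tail, so the estimates should be treated with care.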



Paper Citation


in Harvard Style

Xu T., Pohjankukka J., Nevalainen P., Leppänen V. and Pahikkala T. (2015). Parallel Applications and On-chip Traffic Distributions: Observation, Implication and Modelling. In Proceedings of the 10th International Conference on Software Engineering and Applications - Volume 1: ICSOFT-EA, (ICSOFT 2015) ISBN 978-989-758-114-4, pages 443-449. DOI: 10.5220/0005553604430449


in Bibtex Style

@conference{icsoft-ea15,
author={Thomas Canhao Xu and Jonne Pohjankukka and Paavo Nevalainen and Ville Leppänen and Tapio Pahikkala},
title={Parallel Applications and On-chip Traffic Distributions: Observation, Implication and Modelling},
booktitle={Proceedings of the 10th International Conference on Software Engineering and Applications - Volume 1: ICSOFT-EA, (ICSOFT 2015)},
year={2015},
pages={443-449},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005553604430449},
isbn={978-989-758-114-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Software Engineering and Applications - Volume 1: ICSOFT-EA, (ICSOFT 2015)
TI - Parallel Applications and On-chip Traffic Distributions: Observation, Implication and Modelling
SN - 978-989-758-114-4
AU - Xu T.
AU - Pohjankukka J.
AU - Nevalainen P.
AU - Leppänen V.
AU - Pahikkala T.
PY - 2015
SP - 443
EP - 449
DO - 10.5220/0005553604430449