Victor G. Khoroshevsky, Mikhail G. Kurnosov



In most cases, modern distributed computer systems (computer clusters and MPP systems) have a hierarchical organization and non-uniform communication channels between elementary machines (compute nodes, processors, or processor cores). The execution time of a parallel program depends significantly on how it is mapped onto the computer system, that is, on which elementary machines its parallel processes are assigned to and which channels are used for inter-process communication. The general problem of mapping a parallel program onto a distributed computer system is a well-known NP-hard problem, and several heuristics have been proposed to approximate its optimal solution. This paper proposes an algorithm, based on task graph partitioning, for mapping parallel programs onto hierarchical distributed computer systems. A software tool for mapping MPI applications onto multicore computer clusters is described. The quality of the algorithm is evaluated with the NAS Parallel Benchmarks.
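The mapping problem stated in the abstract can be illustrated with a small sketch. This is not the algorithm proposed in the paper, whose details are not given here; it is a simple greedy heuristic under assumed conditions: a two-level hierarchy (cores inside nodes), a task graph given as weighted edges, and hypothetical channel costs `intra_cost` and `inter_cost`. All function names and numbers are illustrative assumptions.

```python
def mapping_cost(edges, placement, intra_cost=1.0, inter_cost=10.0):
    """Total communication cost: each edge weight is scaled by the cost of
    the channel it uses (assumed values: cheap inside a node, expensive
    between nodes)."""
    total = 0.0
    for (u, v), weight in edges.items():
        same_node = placement[u] == placement[v]
        total += weight * (intra_cost if same_node else inter_cost)
    return total


def round_robin(num_procs, num_nodes):
    """Baseline placement: process p goes to node p mod num_nodes."""
    return {p: p % num_nodes for p in range(num_procs)}


def greedy_partition(edges, num_procs, num_nodes, cores_per_node):
    """Illustrative greedy heuristic (an assumption, not the paper's method):
    visit edges from heaviest to lightest and try to put both endpoints on
    the same node while that node still has free cores."""
    placement = {}
    load = [0] * num_nodes

    def place(proc, node):
        placement[proc] = node
        load[node] += 1

    for (u, v), _ in sorted(edges.items(), key=lambda kv: -kv[1]):
        # One endpoint already placed: pull the other onto the same node.
        for a, b in ((u, v), (v, u)):
            if a in placement and b not in placement:
                if load[placement[a]] < cores_per_node:
                    place(b, placement[a])
        # Neither endpoint placed: take the first node with two free cores.
        if u not in placement and v not in placement:
            for node in range(num_nodes):
                if cores_per_node - load[node] >= 2:
                    place(u, node)
                    place(v, node)
                    break

    # Leftover processes go to the least-loaded node.
    for proc in range(num_procs):
        if proc not in placement:
            place(proc, min(range(num_nodes), key=lambda n: load[n]))
    return placement


# Hypothetical task graph: 8 processes forming two heavily communicating
# chains (0-1-2-3 and 4-5-6-7) joined by two light edges.
edges = {
    (0, 1): 100, (1, 2): 100, (2, 3): 100,
    (4, 5): 100, (5, 6): 100, (6, 7): 100,
    (3, 4): 1, (0, 7): 1,
}
baseline = round_robin(num_procs=8, num_nodes=2)
greedy = greedy_partition(edges, num_procs=8, num_nodes=2, cores_per_node=4)

print("round-robin cost:", mapping_cost(edges, baseline))
print("greedy cost:     ", mapping_cost(edges, greedy))
```

On this toy graph the greedy placement keeps each chain on one node, so only the two light edges cross the slower inter-node channel, while the round-robin placement sends every heavy edge across it. Multilevel graph partitioners of the kind used in practice would replace the greedy step.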



Paper Citation

in Harvard Style

Khoroshevsky V. and Kurnosov M. (2009). MAPPING PARALLEL PROGRAMS INTO HIERARCHICAL DISTRIBUTED COMPUTER SYSTEMS. In Proceedings of the 4th International Conference on Software and Data Technologies - Volume 2: ICSOFT, ISBN 978-989-674-010-8, pages 123-128. DOI: 10.5220/0002240601230128
