PERFORMANCE STUDY OF PARALLEL HYBRID MULTIPLE PATTERN MATCHING ALGORITHMS FOR BIOLOGICAL SEQUENCES

Charalampos S. Kouzinopoulos, Panagiotis D. Michailidis, Konstantinos G. Margaritis

2012

Abstract

Multiple pattern matching is widely used in computational biology to locate any number of nucleotide patterns in genome databases. Processing data of this size often requires more computing power than a sequential computer can provide. A cost-effective way to obtain the power required by such computationally intensive applications is to share the computational tasks among the processing nodes of a high performance hybrid distributed and shared memory platform that consists of cluster workstations and multi-core processors. This paper presents experimental results and a theoretical performance model of the hybrid implementations of the Commentz-Walter, Wu-Manber, Set Backward Oracle Matching and the Salmela-Tarhio-Kytöjoki family of multiple pattern matching algorithms when executed in parallel on biological sequence databases.



Paper Citation


in Harvard Style

Kouzinopoulos, C. S., Michailidis, P. D. and Margaritis, K. G. (2012). PERFORMANCE STUDY OF PARALLEL HYBRID MULTIPLE PATTERN MATCHING ALGORITHMS FOR BIOLOGICAL SEQUENCES. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS (BIOSTEC 2012), ISBN 978-989-8425-90-4, pages 182-187. DOI: 10.5220/0003769801820187


in Bibtex Style

@conference{bioinformatics12,
author={Charalampos S. Kouzinopoulos and Panagiotis D. Michailidis and Konstantinos G. Margaritis},
title={PERFORMANCE STUDY OF PARALLEL HYBRID MULTIPLE PATTERN MATCHING ALGORITHMS FOR BIOLOGICAL SEQUENCES},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)},
year={2012},
pages={182-187},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003769801820187},
isbn={978-989-8425-90-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)
TI - PERFORMANCE STUDY OF PARALLEL HYBRID MULTIPLE PATTERN MATCHING ALGORITHMS FOR BIOLOGICAL SEQUENCES
SN - 978-989-8425-90-4
AU - Kouzinopoulos, Charalampos S.
AU - Michailidis, Panagiotis D.
AU - Margaritis, Konstantinos G.
PY - 2012
SP - 182
EP - 187
DO - 10.5220/0003769801820187