GAST, A GENOMIC ALIGNMENT SEARCH TOOL

Kalle Karhu, Juho Mäkinen, Jussi Rautio, Jorma Tarhio, Hugh Salamon

2011

Abstract

Alignment to a genomic sequence is a common task in modern bioinformatics. Improving the methods used can save a significant amount of time and resources. We have developed a new genomic alignment search tool, called GAST, for sequences of at least 160 nt. GAST is many times faster than the commonly used alignment tools BLAT and Mega BLAST, and its advantage grows as the sizes of the query sequences and the database increase. This paper describes the principles of GAST and reports a comparison of GAST with BLAT and Mega BLAST. The effects of query sequence length and number of queries on run times were studied using the full human genome and chromosome 1 of the human genome separately. Additionally, the error tolerance and behaviour of GAST when handling sequences with lower similarity to the database were studied. Lastly, we compared the quality of exon mappings produced by the three tools and the genomic mapping tool GMAP.


Paper Citation


in Harvard Style

Karhu K., Mäkinen J., Rautio J., Tarhio J. and Salamon H. (2011). GAST, A GENOMIC ALIGNMENT SEARCH TOOL. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011), ISBN 978-989-8425-36-2, pages 82-90. DOI: 10.5220/0003181400820090


in Bibtex Style

@conference{bioinformatics11,
author={Kalle Karhu and Juho Mäkinen and Jussi Rautio and Jorma Tarhio and Hugh Salamon},
title={GAST, A GENOMIC ALIGNMENT SEARCH TOOL},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011)},
year={2011},
pages={82-90},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003181400820090},
isbn={978-989-8425-36-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011)
TI - GAST, A GENOMIC ALIGNMENT SEARCH TOOL
SN - 978-989-8425-36-2
AU - Karhu K.
AU - Mäkinen J.
AU - Rautio J.
AU - Tarhio J.
AU - Salamon H.
PY - 2011
SP - 82
EP - 90
DO - 10.5220/0003181400820090