AUTOMATIC ANNOTATION OF BACTERIAL COMMUNITY SEQUENCES AND APPLICATION TO INFECTIONS DIAGNOSTIC

Victor Solovyev, Asaf Salamov, Igor Seledtsov, Denis Vorobyev, Alexander Bachinsky

Abstract

To annotate bacterial sequences from environmental samples, we have developed an automatic annotation pipeline, Fgenesb_annotator, that includes self-training of gene-finding parameters and prediction of CDS, RNA genes, operons, promoters and terminators. The new version of the pipeline adds frameshift correction and a special module that improves the prediction accuracy for ribosomal proteins. To analyze next-generation sequencing data, we have developed the OligoZip assembler and the Transomics pipeline, which address the following tasks: 1) de novo reconstruction of a genomic sequence; 2) reconstruction of a sequence using a reference genome; 3) SNP discovery; 4) mapping RNA-Seq data to a reference genome, assembling the mapped reads into alternative transcripts and quantifying the abundance of these transcripts. Using the OligoZip assembler and the Fgenesb gene annotation pipeline, we have developed a novel computational approach for identifying toxic and non-toxic bacterial serotypes from next-generation sequencing data. It can be used to detect bacterial infections in wounds and bacterial contamination of water or food.
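As a rough illustration of what CDS prediction starts from, the sketch below scans all six reading frames for open reading frames that begin with ATG and end at a stop codon. It is not the Fgenesb algorithm, which the abstract describes as relying on self-trained gene-finding parameters; the function names `find_orfs` and `reverse_complement` and the `min_codons` threshold are assumptions introduced only for this example.

```python
# Illustrative only: a naive six-frame ORF scan, NOT the Fgenesb algorithm.
# All names and the min_codons threshold are hypothetical.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based, end-exclusive, and relative to the strand that
    was scanned. min_codons counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

A real gene finder would go on to score such candidates with coding-potential and start-site models; this scan only enumerates them.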

Paper Citation


in Harvard Style

Solovyev V., Salamov A., Seledtsov I., Vorobyev D. and Bachinsky A. (2011). AUTOMATIC ANNOTATION OF BACTERIAL COMMUNITY SEQUENCES AND APPLICATION TO INFECTIONS DIAGNOSTIC. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: Meta, (BIOSTEC 2011) ISBN 978-989-8425-36-2, pages 346-353. DOI: 10.5220/0003333703460353


in Bibtex Style

@conference{meta11,
author={Victor Solovyev and Asaf Salamov and Igor Seledtsov and Denis Vorobyev and Alexander Bachinsky},
title={AUTOMATIC ANNOTATION OF BACTERIAL COMMUNITY SEQUENCES AND APPLICATION TO INFECTIONS DIAGNOSTIC},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: Meta, (BIOSTEC 2011)},
year={2011},
pages={346-353},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003333703460353},
isbn={978-989-8425-36-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: Meta, (BIOSTEC 2011)
TI - AUTOMATIC ANNOTATION OF BACTERIAL COMMUNITY SEQUENCES AND APPLICATION TO INFECTIONS DIAGNOSTIC
SN - 978-989-8425-36-2
AU - Solovyev V.
AU - Salamov A.
AU - Seledtsov I.
AU - Vorobyev D.
AU - Bachinsky A.
PY - 2011
SP - 346
EP - 353
DO - 10.5220/0003333703460353