Search for Latent Periodicity in Amino Acid Sequences with Insertions and Deletions

Valentina Pugacheva, Alexander Korotkov, Eugene Korotkov

Abstract

The aim of this study was to show that amino acid sequences can contain latent periodicity with insertions and deletions of amino acids at unknown positions in the analyzed sequence. A genetic algorithm, dynamic programming, and random weight matrices were combined to develop a new mathematical algorithm for latent periodicity search. The method directly optimizes the position-weight matrix for multiple sequence alignment without using pairwise alignments. The developed algorithm was applied to the amino acid sequences of a small number of proteins. The analysis revealed latent periodicity with insertions and deletions in the amino acid sequences of proteins for which latent periodicity was not previously known. The origin of latent periodicity with insertions and deletions is discussed.
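
As a rough illustration of the dynamic programming component only, the sketch below scores one sequence against a position-weight matrix that repeats cyclically, allowing insertions and deletions. It is not the authors' algorithm: the function name `periodic_alignment_score`, the linear `gap` penalty, the zero weight for unlisted residues, and the toy matrix are all assumptions made for this example, and the genetic-algorithm optimization of the matrix itself is omitted.

```python
def periodic_alignment_score(seq, pwm, gap=2.0):
    """Best score for aligning seq to a cyclically repeated weight matrix.

    pwm is a list with one dict per period position, mapping a residue to
    its weight (hypothetical layout; unlisted residues score 0.0).
    gap is an assumed linear penalty per inserted residue or deleted column.
    """
    period = len(pwm)
    n = len(seq)
    # Unroll the periodic matrix with spare columns so deletions still fit.
    m = period * (n // period + 2)
    # Row 0: no residues consumed, leading columns counted as deletions.
    prev = [-gap * j for j in range(m + 1)]
    for i in range(1, n + 1):
        # Column 0: leading residues counted as insertions.
        cur = [-gap * i] + [0.0] * m
        for j in range(1, m + 1):
            w = pwm[(j - 1) % period].get(seq[i - 1], 0.0)
            cur[j] = max(
                prev[j - 1] + w,   # residue i matched to column j
                prev[j] - gap,     # residue i treated as an insertion
                cur[j - 1] - gap,  # column j treated as a deletion
            )
        prev = cur
    # The unrolled matrix may stop at any column, so take the best end point.
    return max(prev)


# Toy period-2 matrix: position 0 favors 'A', position 1 favors 'B'.
toy_pwm = [{"A": 1.0, "B": -1.0}, {"A": -1.0, "B": 1.0}]
perfect = periodic_alignment_score("ABABAB", toy_pwm)
with_insertion = periodic_alignment_score("ABXAB", toy_pwm)
```

In a full method of the kind the abstract describes, a score like this would serve as the objective that the search over weight matrices tries to maximize; here the matrix is fixed by hand.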



Paper Citation


in Harvard Style

Pugacheva V., Korotkov A. and Korotkov E. (2016). Search for Latent Periodicity in Amino Acid Sequences with Insertions and Deletions. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016), ISBN 978-989-758-170-0, pages 117-127. DOI: 10.5220/0005630401170127


in Bibtex Style

@conference{bioinformatics16,
author={Valentina Pugacheva and Alexander Korotkov and Eugene Korotkov},
title={Search for Latent Periodicity in Amino Acid Sequences with Insertions and Deletions},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)},
year={2016},
pages={117-127},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005630401170127},
isbn={978-989-758-170-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)
TI - Search for Latent Periodicity in Amino Acid Sequences with Insertions and Deletions
SN - 978-989-758-170-0
AU - Pugacheva V.
AU - Korotkov A.
AU - Korotkov E.
PY - 2016
SP - 117
EP - 127
DO - 10.5220/0005630401170127