Search of Periodicity Regions in the Genome A.thaliana - Periodicity Regions in the A.thaliana Genomes

E. V. Korotkov, F. E. Frenkel, M. A. Korotkova

Abstract

This study develops a mathematical method for detecting tandem repeats in a DNA sequence. The multiple alignment of periods is calculated by direct optimization of a position-weight matrix (PWM), without pairwise alignments or a search for similarity between periods. Random PWMs were used to develop a new algorithm for periodicity search. The algorithm was applied to the DNA sequences of the A.thaliana genome, where it found 13997 regions with a period length of 2 to 50 bases. The average distance between periodic regions is ~9000 nucleotides. A significant portion of the regions have periods of 2 nucleotides, 10-11 nucleotides, or around 30 nucleotides. No more than ~30% of the periods found had been reported previously. The sequences found are collected in a data bank available at: http://victoria.biengi.ac.ru/cgi-in/indelper/index.cgi. The origin of periodicity with insertions and deletions is discussed.
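
The idea of scoring a candidate period through a matrix of bases against positions within the period can be illustrated with a much simpler sketch than the method summarized above. The code below is not the authors' algorithm: it ignores insertions and deletions, does not optimize a PWM, and uses no random matrices. It only fills a count matrix for a given period length and computes the standard 2I (log-likelihood ratio) statistic for the dependence between base and column. The names `period_matrix` and `periodicity_score` are hypothetical and chosen for illustration.

```python
import math

BASES = "ACGT"

def period_matrix(seq, n):
    """Count matrix for period length n: rows are bases, columns are
    positions 0..n-1 within the period (position index modulo n)."""
    counts = [[0] * n for _ in BASES]
    for pos, base in enumerate(seq.upper()):
        row = BASES.find(base)
        if row >= 0:  # skip symbols other than A, C, G, T
            counts[row][pos % n] += 1
    return counts

def periodicity_score(seq, n):
    """2I statistic: 2 * sum of c * ln(c * N / (row_sum * col_sum)).
    It is near zero when bases are independent of the column and grows
    when columns have distinct base compositions."""
    m = period_matrix(seq, n)
    row_sums = [sum(row) for row in m]
    col_sums = [sum(m[b][j] for b in range(len(BASES))) for j in range(n)]
    total = sum(row_sums)
    score = 0.0
    for b in range(len(BASES)):
        for j in range(n):
            c = m[b][j]
            if c > 0:
                score += c * math.log(c * total / (row_sums[b] * col_sums[j]))
    return 2.0 * score

# Toy example (assumed data): a perfect repeat of a 5-base unit.
toy = "ACGTT" * 20
best = max(range(2, 10), key=lambda n: periodicity_score(toy, n))
print(best)
```

On this toy input the scan over period lengths 2 to 9 selects 5. Comparing scores across different period lengths on real data would need a significance estimate, since the number of matrix cells grows with the period length; this sketch omits that step.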


Paper Citation


in Harvard Style

Korotkov E., Frenkel F. and Korotkova M. (2017). Search of Periodicity Regions in the Genome A.thaliana - Periodicity Regions in the A.thaliana Genomes. In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017) ISBN 978-989-758-214-1, pages 125-132. DOI: 10.5220/0006106001250132


in Bibtex Style

@conference{bioinformatics17,
author={E. V. Korotkov and F. E. Frenkel and M. A. Korotkova},
title={Search of Periodicity Regions in the Genome A.thaliana - Periodicity Regions in the A.thaliana Genomes},
booktitle={Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)},
year={2017},
pages={125-132},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006106001250132},
isbn={978-989-758-214-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)
TI - Search of Periodicity Regions in the Genome A.thaliana - Periodicity Regions in the A.thaliana Genomes
SN - 978-989-758-214-1
AU - Korotkov E.
AU - Frenkel F.
AU - Korotkova M.
PY - 2017
SP - 125
EP - 132
DO - 10.5220/0006106001250132