Geometric Encoding, Filtering, and Visualization of Genomic Sequences

Helena Cristina da Gama Leitão, Rafael Felipe Veiga Saracchini, Jorge Stolfi

2015

Abstract

This article describes a three-channel encoding of nucleotide sequences, and proper formulas for filtering and downsampling such encoded sequences for multi-scale signal analysis. With proper interpolation, the encoded sequences can be visualized as curves in three-dimensional space. The filtering uses Gaussian-like smoothing kernels, chosen so that all levels of the multi-scale pyramid (except the original curve) are practically free from aliasing artifacts and have the same degree of smoothing. With these precautions, the overall shape of the space curve is robust under small changes in the DNA sequence, such as single-point mutations, insertions, deletions, and shifts.

References

  1. Anastassiou, D. (2002). Digital signal processing of biomolecular sequences. Technical Report CU/EE/TR2000-20-042, Department of Electrical Engineering, Columbia University.
  2. Cheever, E. A., Searls, D. B., Karunaratne, W., and Overton, G. C. (1989). Using signal processing techniques for DNA sequence comparison. Proceedings of 15th Bioengineering Conference, pages 173-174.
  3. Cristea, P. (2002). Conversion of nucleotides sequences into genomic signals. Journal of Cellular and Molecular Medicine, 6(2):279-303.
  4. Futschik, A., Hotz, T., Munk, A., and Sieling, H. (2014). Multiscale DNA partitioning: Statistical evidence for segments. Bioinformatics, page btu180.
  5. Knijnenburg, T. A., Ramsey, S. A., Berman, B. P., Kennedy, K. A., Smit, A. F. A., Wessels, L. F. A., Laird, P. W., Aderem, A., and Shmulevich, I. (2014). Multiscale representation of genomic signals. Nature Methods.
  6. Machado, J. A. T., Costa, A. C., and Quelhas, M. D. (2011). Wavelet analysis of human DNA. Genomics, 98(3):155-163.
  7. Pessôa, L., Leitão, H. C. G., and Stolfi, J. (2004). Mutual information content of homologous DNA sequences. In Proc. 2004 Workshop on Bioinformatics (WOB), number 6016 in Lecture Notes on Informatics, pages 57-64.
  8. Ravichandran, L., Papandreou-Suppappola, A., Spanias, A., Lacroix, Z., and Legendre, C. (2011). Waveform mapping and time-frequency processing of DNA and protein sequences. IEEE Transactions on Signal Processing, 59(9):4210-4224.
  9. Vincken, K. L., Koster, A. S. E., and Viergever, M. A. (1997). Probabilistic multiscale image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2):109-120.


Paper Citation


in Harvard Style

da Gama Leitão H., Saracchini R. and Stolfi J. (2015). Geometric Encoding, Filtering, and Visualization of Genomic Sequences. In Proceedings of the 6th International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2015) ISBN 978-989-758-088-8, pages 219-224. DOI: 10.5220/0005297102190224


in Bibtex Style

@conference{ivapp15,
author={Helena Cristina da Gama Leitão and Rafael Felipe Veiga Saracchini and Jorge Stolfi},
title={Geometric Encoding, Filtering, and Visualization of Genomic Sequences},
booktitle={Proceedings of the 6th International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2015)},
year={2015},
pages={219-224},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005297102190224},
isbn={978-989-758-088-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2015)
TI - Geometric Encoding, Filtering, and Visualization of Genomic Sequences
SN - 978-989-758-088-8
AU - da Gama Leitão H.
AU - Saracchini R.
AU - Stolfi J.
PY - 2015
SP - 219
EP - 224
DO - 10.5220/0005297102190224