Authors: Diogo Pratas, Armando J. Pinho and Sara P. Garcia
Affiliation: University of Aveiro, Portugal
Keyword(s): Normalized compression distance, finite-context models, human chromosomal similarity.
Related Ontology Subjects/Areas/Topics: Algorithms and Software Tools; Bioinformatics; Biomedical Engineering; Sequence Analysis
Abstract:
A compression-based similarity measure assesses the similarity between two objects by the number of bits needed to describe one of them when a description of the other is available. To be effective, these measures must rely on “normal” compression algorithms, which roughly means that the compressors must be able to build an internal model of the data being compressed. In practice, good “normal” compression methods are often slow, while fast ones often fail to provide acceptable results. In this paper, we propose a method for measuring the similarity of DNA sequences that balances these two requirements, speed and compression performance. The method relies on a mixture of finite-context models and is compared with other methods, including XM, the state-of-the-art DNA compression technique. Moreover, we present a comprehensive study of the inter-chromosomal similarity of the human genome.
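The measure described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several assumptions. The model orders, the additive-smoothing parameters `alpha`, the weight-update rule with forgetting factor `gamma`, and all names (`FiniteContextModel`, `MixtureCoder`, `compressed_bits`, `conditional_bits`, `ncd`, `random_dna`) are hypothetical. The sketch measures ideal code lengths (-log2 of the mixed probability) instead of producing an arithmetic-coded bitstream. It approximates C(x|y) by training the models on y before coding x. For comparison it also uses the standard normalized compression distance, NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}.

```python
import math
import random

ALPHABET = "ACGT"  # sequences must contain only these symbols
# Hypothetical mixture: (context order, additive-smoothing alpha) pairs.
DEFAULT_MODELS = ((1, 1.0), (3, 1.0), (8, 1.0), (12, 1 / 16))


class FiniteContextModel:
    """Order-k finite-context model: estimates P(s | last k symbols) from counts."""

    def __init__(self, order, alpha):
        self.order = order
        self.alpha = alpha
        self.counts = {}  # context string -> [count_A, count_C, count_G, count_T]

    def prob(self, context, symbol):
        c = self.counts.get(context, (0, 0, 0, 0))
        return (c[ALPHABET.index(symbol)] + self.alpha) / (sum(c) + 4 * self.alpha)

    def update(self, context, symbol):
        self.counts.setdefault(context, [0, 0, 0, 0])[ALPHABET.index(symbol)] += 1


class MixtureCoder:
    """Mixes several finite-context models and accumulates ideal code length in bits."""

    def __init__(self, models=DEFAULT_MODELS, gamma=0.9):
        self.models = [FiniteContextModel(k, a) for k, a in models]
        self.weights = [1.0 / len(self.models)] * len(self.models)
        self.gamma = gamma  # assumed forgetting factor for the weight update
        self.max_order = max(m.order for m in self.models)
        self.history = ""  # only the last max_order symbols are kept

    def encode(self, seq):
        """Return the sum of -log2 P(s) over seq; the models keep learning as they go."""
        bits = 0.0
        for s in seq:
            n = len(self.history)
            contexts = [self.history[max(0, n - m.order):] for m in self.models]
            probs = [m.prob(c, s) for m, c in zip(self.models, contexts)]
            bits -= math.log2(sum(w * q for w, q in zip(self.weights, probs)))
            # Favour models that just predicted well; gamma < 1 fades old performance.
            new_w = [w ** self.gamma * q for w, q in zip(self.weights, probs)]
            total = sum(new_w)
            self.weights = [w / total for w in new_w]
            for m, c in zip(self.models, contexts):
                m.update(c, s)
            self.history = (self.history + s)[-self.max_order:] if self.max_order else ""
        return bits


def compressed_bits(*parts, models=DEFAULT_MODELS):
    """C(concatenation of parts): code the parts in order with one fresh coder."""
    coder = MixtureCoder(models)
    return sum(coder.encode(p) for p in parts)


def conditional_bits(x, y, models=DEFAULT_MODELS):
    """Approximate C(x | y): bits needed for x after the models have been trained on y."""
    coder = MixtureCoder(models)
    coder.encode(y)
    return coder.encode(x)


def ncd(x, y, models=DEFAULT_MODELS):
    """Normalized compression distance computed with the mixture coder."""
    cx = compressed_bits(x, models=models)
    cy = compressed_bits(y, models=models)
    cxy = compressed_bits(x, y, models=models)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(n, seed):
    """Hypothetical helper: a reproducible uniformly random DNA string."""
    rng = random.Random(seed)
    return "".join(rng.choice(ALPHABET) for _ in range(n))


if __name__ == "__main__":
    x, z = random_dna(2000, 1), random_dna(2000, 2)
    print(f"NCD(x, x) = {ncd(x, x):.3f}")  # small: the second copy is predictable
    print(f"NCD(x, z) = {ncd(x, z):.3f}")  # near 1: the sequences are unrelated
    print(f"C(x|x) = {conditional_bits(x, x):.0f} bits, C(x|z) = {conditional_bits(x, z):.0f} bits")
```

On two independent random sequences, NCD(x, x) should fall well below NCD(x, z). After the first copy has been seen, the deep, lightly smoothed order-12 model predicts the repeated copy with high probability, and the weight update quickly shifts the mixture toward that model. The design choice is illustrative: low-order models cheaply capture local composition, deep models capture long repeats, and the mixture lets whichever model currently predicts best dominate.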