Measuring the Similarity of Proteomes using Grammar-based Compression via Domain Combinations

Morihiro Hayashida, Hitoshi Koyano, Jose Nacher

Abstract

Revealing the evolution of organisms is an important topic in biological research, and is also useful for understanding the origin of organisms. To this end, genomic sequences have been compared and aligned to find conserved and functional regions. A protein can contain several domains, which are known as structural and functional units. In previous work, a proteome, the entire set of proteins in an organism, was regarded as a set of sequences of protein domains, and a grammar-based compression algorithm was developed for proteomes, in which the production rules of the grammar represent the evolutionary processes of mutation and duplication. In this paper, we propose a similarity measure based on this grammar-based compression and apply it to the hierarchical clustering of seven organisms: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana, and Escherichia coli. The results suggest that our similarity measure could classify the organisms very well.
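The abstract does not state the formula of the similarity measure or the details of the compression algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy Re-Pair-style compressor (repeatedly replacing the most frequent adjacent pair of domains with a new nonterminal) and a distance in the style of the normalized compression distance. The function names `grammar_size` and `compression_distance` and the domain labels are hypothetical.

```python
from collections import Counter


def grammar_size(proteome):
    """Size of a toy grammar for a proteome (assumed Re-Pair-style scheme).

    `proteome` is a list of domain sequences, e.g. [("A", "B"), ("A", "B", "C")].
    The most frequent adjacent pair is replaced by a fresh nonterminal until
    no pair occurs at least twice. Size = remaining symbols + 2 per rule.
    """
    seqs = [list(p) for p in proteome]
    rules = 0
    while True:
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        nonterminal = ("N", rules)  # tuples cannot collide with string domains
        rules += 1
        for i, s in enumerate(seqs):
            out, j = [], 0
            while j < len(s):
                if j + 1 < len(s) and (s[j], s[j + 1]) == pair:
                    out.append(nonterminal)
                    j += 2
                else:
                    out.append(s[j])
                    j += 1
            seqs[i] = out
    return sum(len(s) for s in seqs) + 2 * rules


def compression_distance(x, y):
    """NCD-style distance between two proteomes (an assumption, see lead-in)."""
    cx, cy, cxy = grammar_size(x), grammar_size(y), grammar_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Made-up proteomes: two share domain combinations, the third does not.
organism_a = [("A", "B", "C"), ("A", "B", "D")]
organism_b = [("A", "B", "C"), ("A", "B", "D")]
organism_c = [("E", "F", "G"), ("E", "F", "H")]

print(compression_distance(organism_a, organism_b))
print(compression_distance(organism_a, organism_c))
```

Under these assumptions, proteomes that share domain combinations compress better together and so receive a smaller distance. A pairwise distance matrix built this way could then be passed to any standard hierarchical clustering routine.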

Paper Citation


in Harvard Style

Hayashida M., Koyano H. and Nacher J. (2020). Measuring the Similarity of Proteomes using Grammar-based Compression via Domain Combinations. In Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, ISBN 978-989-758-398-8, pages 117-122. DOI: 10.5220/0008913101170122


in Bibtex Style

@conference{bioinformatics20,
author={Morihiro Hayashida and Hitoshi Koyano and Jose Nacher},
title={Measuring the Similarity of Proteomes using Grammar-based Compression via Domain Combinations},
booktitle={Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS},
year={2020},
pages={117-122},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008913101170122},
isbn={978-989-758-398-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS
TI - Measuring the Similarity of Proteomes using Grammar-based Compression via Domain Combinations
SN - 978-989-758-398-8
AU - Hayashida M.
AU - Koyano H.
AU - Nacher J.
PY - 2020
SP - 117
EP - 122
DO - 10.5220/0008913101170122