Authors: Conrad J. Burden, Paul Leopardi and Sylvain Forêt

Affiliation: Australian National University, Australia

ISBN: 978-989-8565-35-8

Keyword(s): Word Matches, Biological Sequence Comparison.

Related Ontology Subjects/Areas/Topics: Bioinformatics ; Biomedical Engineering ; Biostatistics and Stochastic Models ; Pattern Recognition, Clustering and Classification ; Sequence Analysis

Abstract: The D2 statistic, which counts the number of word matches between two given sequences, has long been proposed as a measure of similarity for biological sequences. Much of the mathematically rigorous work carried out to date on the properties of the D2 statistic has been restricted to the case of ‘Bernoulli’ sequences composed of independent and identically distributed letters. Here the properties of the distribution of this statistic are studied for the biologically more realistic case of Markovian sequences. The approach is novel in that Markovian dependency is defined for sequences with periodic boundary conditions, which enables exact analytic formulae for the mean and variance to be derived. The formulae are confirmed using numerical simulations, and asymptotic approximations to the full distribution are tested.
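
As an illustration of the statistic described in the abstract, the following is a minimal sketch of a word match count, assuming a word is a contiguous substring of fixed length k and that a match is any pair of positions, one in each sequence, at which the same word starts. The function names `word_counts` and `d2`, the parameter `k`, and the `periodic` flag (which wraps each sequence around on itself, in the spirit of the periodic boundary conditions mentioned above) are hypothetical and not taken from the paper; the paper's exact definitions may differ.

```python
from collections import Counter


def word_counts(seq, k, periodic=False):
    """Count every length-k word in seq.

    With periodic=True the sequence is treated as circular, so words may
    wrap from the end back to the start (an assumption for illustration).
    """
    extended = seq + seq[:k - 1] if periodic else seq
    return Counter(extended[i:i + k] for i in range(len(extended) - k + 1))


def d2(seq_a, seq_b, k, periodic=False):
    """Number of matching k-word position pairs between seq_a and seq_b.

    Equivalent to summing, over every word w, the product of the number of
    times w occurs in seq_a and the number of times it occurs in seq_b.
    """
    counts_a = word_counts(seq_a, k, periodic)
    counts_b = word_counts(seq_b, k, periodic)
    return sum(n * counts_b[w] for w, n in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGTAC", "GTACGT", k=2))
    print(d2("ACGTAC", "GTACGT", k=2, periodic=True))
```

Counting words once per sequence and multiplying the counts avoids comparing every pair of positions directly, which keeps the sketch linear in the sequence lengths for a fixed k.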

CC BY-NC-ND 4.0

Paper citation in several formats:
Burden, C. J.; Leopardi, P. and Forêt, S. (2013). The Distribution of Short Word Match Counts between Markovian Sequences. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS (BIOSTEC 2013), ISBN 978-989-8565-35-8, pages 25-33. DOI: 10.5220/0004203700250033

@conference{bioinformatics13,
author={Conrad J. Burden and Paul Leopardi and Sylvain Forêt},
title={The Distribution of Short Word Match Counts between Markovian Sequences},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)},
year={2013},
pages={25-33},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004203700250033},
isbn={978-989-8565-35-8},
}

TY - CONF

JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)
TI - The Distribution of Short Word Match Counts between Markovian Sequences
SN - 978-989-8565-35-8
AU - Burden, C. J.
AU - Leopardi, P.
AU - Forêt, S.
PY - 2013
SP - 25
EP - 33
DO - 10.5220/0004203700250033