Authors:
Markus E. Nebel
and
Anika Scheid
Affiliation:
University of Kaiserslautern, Germany
Keyword(s):
RNA folding, RNA secondary structure, Computational prediction, Probabilistic modeling, Stochastic context-free grammars, Statistical sampling, Inside-outside calculation, Heuristic approximation.
Related
Ontology
Subjects/Areas/Topics:
Algorithms and Software Tools
;
Bioinformatics
;
Biomedical Engineering
;
Structural Bioinformatics
;
Structure Prediction
Abstract:
Several state-of-the-art tools for predicting RNA secondary structures have worst-case time and space requirements
of O(n3) and O(n2) for sequence length n, limiting their applicability for practical purposes. Accordingly,
biologists are interested in getting results faster, where a moderate loss of accuracy would willingly be
tolerated. For this reason, we propose a novel algorithm for structure prediction that reduces the time complexity
by a linear factor to O(n2), while still being able to produce high quality results. Basically, our method
relies on a probabilistic sampling approach based on an appropriate stochastic context-free grammar (SCFG):
using a well-known or a newly introduced sampling strategy it generates a random set of candidate structures
(from the ensemble of all feasible foldings) according to a “noisy” distribution (obtained by heuristically approximating
the inside-outside values) for a given sequence, such that finally a corresponding prediction can
be efficie
ntly derived. Sampling can easily be parallelized. Furthermore, it can be done in-place, i.e. only
the best (most probable) candidate structure generated so far needs to be stored and finally communicated.
Together, this allows to efficiently handle increased sample sizes necessary to achieve competitive prediction
accuracy in connection with the noisy distribution.
(More)