Authors:
G. Arango-Argoty (1); A. F. Giraldo-Forero (1); J. A. Jaramillo-Garzón (2); L. Duque-Muñoz (2) and G. Castellanos-Dominguez (1)
Affiliations:
(1) Universidad Nacional de Colombia, Colombia; (2) Universidad Nacional de Colombia and Instituto Tecnológico Metropolitano, Colombia
Keyword(s):
Amino Acid Properties, Dissimilarity based Classification, Molecular Function, Motifs, Wavelet Transform.
Related Ontology Subjects/Areas/Topics:
Bioinformatics; Biomedical Engineering; Data Mining and Machine Learning; Pattern Recognition, Clustering and Classification
Abstract:
Predicting the molecular functions of proteins is a fundamental challenge in bioinformatics. Commonly used algorithms are based on sequence alignments and fail when the training sequences have low percentages of identity with the query proteins, as is the case for non-model organisms such as land plants. Machine learning-based algorithms offer a good alternative for prediction, but most of them ignore that molecular functions are conditioned by functional domains rather than by global features of the whole sequence. This work presents a novel application of the Wavelet Transform to detect discriminant sub-sequences (motifs) and use them as input for a pattern recognition classifier. The results show that the continuous wavelet transform is a suitable tool for the identification and characterization of motifs. The proposed classification methodology also shows good prediction capabilities for datasets with a low percentage of identity among sequences, outperforming BLAST2GO by about 11.5% and PEPSTATS-SVM by 16.4%. In addition, it offers greater interpretability of the obtained results.