Protein Disorder Prediction using Information Theory Measures on the Distribution of the Dihedral Torsion Angles from Ramachandran Plots

Jonny A. Uribe, Julián D. Arias-Londoño, Alexandre Perera-Lluna

2017

Abstract

This paper addresses the problem of order/disorder prediction in protein sequences using alignment-free methods. The proposed approach is based on a set of 11 information theory measures estimated from the distribution of the dihedral torsion angles in the amino acid chain. The aim is to characterize the energetically allowed regions for amino acids in protein structures, as a way of measuring the rigidity/flexibility of every amino acid in the chain and the effect of such rigidity on disorder propensity. The features are estimated from empirical Ramachandran plots obtained using the Protein Geometry Database. The proposed features are used in conjunction with well-established state-of-the-art features for disorder prediction. Classification is performed using two different strategies: one based on conventional supervised methods and the other based on structural learning. Performance is evaluated in terms of AUC (Area Under the ROC Curve) and three performance metrics suited to unbalanced classification problems. The results show that the proposed scheme using conventional supervised methods achieves results similar to those of well-known alignment-free methods for disorder prediction. Moreover, the scheme based on structural learning outperforms all the methods evaluated, including three alignment-based methods.
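As an illustration of the kind of feature described above, the following minimal sketch computes a single information theory measure, the Shannon entropy, from a binned empirical distribution of (phi, psi) torsion angles. The abstract does not list the 11 measures, so Shannon entropy is an assumption chosen as a representative example; the function name `ramachandran_entropy`, the 10-degree bin width, and the sample angles are hypothetical and are not taken from the paper or from the Protein Geometry Database.

```python
import math
from collections import Counter


def ramachandran_entropy(angles, bin_width=10.0):
    """Shannon entropy (in bits) of (phi, psi) pairs binned on a square grid.

    angles: iterable of (phi, psi) tuples in degrees, each in [-180, 180].
    bin_width: grid cell size in degrees (an assumed value, not from the paper).
    """
    n_bins = int(360 / bin_width)
    counts = Counter()
    for phi, psi in angles:
        # Shift angles to [0, 360] and clamp the upper edge into the last bin.
        i = min(int((phi + 180.0) // bin_width), n_bins - 1)
        j = min(int((psi + 180.0) // bin_width), n_bins - 1)
        counts[(i, j)] += 1
    total = sum(counts.values())
    # H = sum over occupied bins of p * log2(1 / p); empty bins contribute nothing.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical observations: one conformation repeated versus four distinct ones.
rigid = [(-60.0, -45.0)] * 5
flexible = [(-60.0, -45.0), (-120.0, 130.0), (60.0, 45.0), (-75.0, 150.0)]

print(ramachandran_entropy(rigid))
print(ramachandran_entropy(flexible))
```

Under this sketch, a residue whose observed angles concentrate in a single bin has zero entropy, while one spread evenly over many bins has higher entropy, which is one plausible way to quantify the rigidity/flexibility contrast the abstract refers to.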



Paper Citation


in Harvard Style

Uribe J., Arias-Londoño J. and Perera-Lluna A. (2017). Protein Disorder Prediction using Information Theory Measures on the Distribution of the Dihedral Torsion Angles from Ramachandran Plots. In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017) ISBN 978-989-758-214-1, pages 43-51. DOI: 10.5220/0006140500430051


in Bibtex Style

@conference{bioinformatics17,
author={Jonny A. Uribe and Julián D. Arias-Londoño and Alexandre Perera-Lluna},
title={Protein Disorder Prediction using Information Theory Measures on the Distribution of the Dihedral Torsion Angles from Ramachandran Plots},
booktitle={Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)},
year={2017},
pages={43-51},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006140500430051},
isbn={978-989-758-214-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)
TI - Protein Disorder Prediction using Information Theory Measures on the Distribution of the Dihedral Torsion Angles from Ramachandran Plots
SN - 978-989-758-214-1
AU - Uribe J.
AU - Arias-Londoño J.
AU - Perera-Lluna A.
PY - 2017
SP - 43
EP - 51
DO - 10.5220/0006140500430051