Authors: Kam-Heung Sze 1; Zhiqiang Xiong 1; Jinlong Ma 1; Gang Lu 2; Wai-Yee Chan 2 and Hongjian Li 1,2

Affiliations: 1 Bioinformatics Unit, SDIVF R&D Centre, Hong Kong Science Park, Sha Tin, New Territories, Hong Kong; 2 CUHK-SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong

Keyword(s): Molecular Docking, Binding Affinity Prediction, Machine Learning, Feature Engineering, Data Similarity.

Abstract: Recent studies exploring the influence of data similarity on the scoring power of machine-learning scoring functions have drawn inconsistent conclusions, but they were all based on the PDBbind v2007 refined set, whose size is limited to just 1300 protein-ligand complexes. Whether these conclusions generalize to substantially larger and more diverse datasets warrants further examination. Besides, the previous definition of protein structure similarity, which relied on aligning monomers, might not truly reflect the similarity it was meant to capture. Moreover, the impact of binding pocket similarity has not been investigated. Here we have employed the updated refined set v2013, which provides 2959 complexes, and utilized not only protein structure and ligand fingerprint similarity but also a novel measure based on binding pocket topology dissimilarity to systematically control how similar or dissimilar complexes are incorporated for training predictive models. Three empirical scoring functions (X-Score, AutoDock Vina and Cyscore) and their random forest counterparts were evaluated. Results have confirmed that dissimilar training complexes may be valuable if allied with appropriate machine learning algorithms and informative descriptor sets. Machine-learning scoring functions acquire their remarkable scoring power by mining more data to improve performance persistently, whereas classical scoring functions lack such learning ability. The software code and data used in this study and supplementary results are available at https://GitHub.com/HongjianLi/MLSF.
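As a rough illustration of what "controlling how similar or dissimilar complexes are incorporated for training" could look like, the sketch below keeps only the training complexes whose highest ligand fingerprint similarity to any test complex does not exceed a cutoff. It is not the authors' implementation (their code is in the repository linked above). The Tanimoto coefficient as the similarity measure, the function names `tanimoto` and `select_training`, the complex identifiers and the toy fingerprints are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, List

# Assumption: a ligand fingerprint is represented as the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared on-bits divided by on-bits present in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_training(train: Dict[str, Fingerprint],
                    test: Dict[str, Fingerprint],
                    cutoff: float) -> List[str]:
    """Return training complexes whose maximum similarity to any test complex is <= cutoff."""
    selected = []
    for name, fp in train.items():
        max_sim = max((tanimoto(fp, t) for t in test.values()), default=0.0)
        if max_sim <= cutoff:
            selected.append(name)
    return selected


# Hypothetical toy data; real fingerprints would come from a cheminformatics toolkit.
train = {
    "c1": frozenset({1, 2, 3, 4}),  # similarity 0.6 to t1
    "c2": frozenset({1, 2, 9}),     # similarity 0.4 to t1
    "c3": frozenset({7, 8}),        # similarity 0.0 to t1
}
test = {"t1": frozenset({1, 2, 3, 5})}

# Raising the cutoff admits progressively more similar complexes into the training set.
for cutoff in (0.2, 0.5, 1.0):
    print(cutoff, select_training(train, test, cutoff))
```

Training a model on each nested subset and scoring the same test set would then show how performance changes as more similar complexes are admitted; the same filter could be applied with a protein structure or binding pocket similarity in place of `tanimoto`.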

CC BY-NC-ND 4.0


Paper citation in several formats:
Sze, K.; Xiong, Z.; Ma, J.; Lu, G.; Chan, W. and Li, H. (2020). Influence of Data Similarity on the Scoring Power of Machine-learning Scoring Functions for Docking. In Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - BIOINFORMATICS; ISBN 978-989-758-398-8; ISSN 2184-4305, SciTePress, pages 85-92. DOI: 10.5220/0008873800850092

@conference{bioinformatics20,
author={Kam{-}Heung Sze and Zhiqiang Xiong and Jinlong Ma and Gang Lu and Wai{-}Yee Chan and Hongjian Li},
title={Influence of Data Similarity on the Scoring Power of Machine-learning Scoring Functions for Docking},
booktitle={Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - BIOINFORMATICS},
year={2020},
pages={85-92},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008873800850092},
isbn={978-989-758-398-8},
issn={2184-4305},
}

TY - CONF

JO - Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - BIOINFORMATICS
TI - Influence of Data Similarity on the Scoring Power of Machine-learning Scoring Functions for Docking
SN - 978-989-758-398-8
IS - 2184-4305
AU - Sze, K.
AU - Xiong, Z.
AU - Ma, J.
AU - Lu, G.
AU - Chan, W.
AU - Li, H.
PY - 2020
SP - 85
EP - 92
DO - 10.5220/0008873800850092
PB - SciTePress