A KDD APPROACH FOR DESIGNING FILTERING STRATEGIES TO IMPROVE VIRTUAL SCREENING

Leo Ghemtio, Malika Smaïl-Tabbone, Marie-Dominique Devignes, Michel Souchet, Bernard Maigret

2009

Abstract

Virtual screening has become an essential step in the early drug discovery process. Broadly, it uses computational techniques to select compounds from chemical libraries in order to identify drug-like molecules that act on a biological target of therapeutic interest. In the present study we consider virtual screening as a particular form of the KDD (Knowledge Discovery from Databases) approach. The knowledge to be discovered concerns the criteria by which a compound can be considered a consistent ligand for a given target. The data from which this knowledge is to be discovered come from diverse sources, including chemical, structural, and biological data on ligands and their cognate targets. More precisely, we aim to extract filters from chemical libraries and protein-ligand interactions. In this context, the three basic steps of a KDD process have to be implemented. First, a model-driven data integration step is applied to appropriate heterogeneous data found in public databases, which facilitates the subsequent extraction of various datasets for mining. Second, mining algorithms are applied to these datasets. Finally, the most accurate knowledge units are proposed as new filters. We present this KDD approach and the experimental results obtained with a set of ligands of the hormone receptor LXR.
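
The three KDD steps named in the abstract (integration, mining, filter proposal) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Record` type, the function names, the toy compound identifiers, and the use of molecular weight as the only descriptor are hypothetical, and the single-threshold rule stands in for whatever mining algorithms the study actually applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    compound_id: str
    mol_weight: float  # hypothetical descriptor, chosen only for illustration
    active: bool       # hypothetical activity label against the target


def integrate(descriptors, activities):
    """Step 1: join two toy sources on compound id (ids present in both)."""
    shared = sorted(descriptors.keys() & activities.keys())
    return [Record(cid, descriptors[cid], activities[cid]) for cid in shared]


def mine_threshold(records):
    """Step 2: find the cutoff t that maximises the accuracy of the
    rule 'mol_weight >= t implies active' on the integrated records."""
    best_t, best_acc = None, -1.0
    for t in sorted({r.mol_weight for r in records}):
        hits = sum((r.mol_weight >= t) == r.active for r in records)
        acc = hits / len(records)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


def apply_filter(library, threshold):
    """Step 3: keep the library compounds that satisfy the mined rule."""
    return [cid for cid, mw in library.items() if mw >= threshold]


# Toy data, invented for this sketch (not taken from the study).
descriptors = {"c1": 250.0, "c2": 410.0, "c3": 390.0, "c4": 180.0, "c5": 300.0}
activities = {"c1": False, "c2": True, "c3": True, "c4": False, "c6": True}
library = {"x1": 395.0, "x2": 120.0, "x3": 450.0}

records = integrate(descriptors, activities)
threshold, accuracy = mine_threshold(records)
selected = apply_filter(library, threshold)
```

Keeping the three steps as separate functions mirrors the separation described in the abstract: the integrated records can be reused to build several datasets, and any rule produced by the mining step can be applied as a filter without touching the integration code.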


Paper Citation


in Harvard Style

Ghemtio L., Smaïl-Tabbone M., Devignes M., Souchet M. and Maigret B. (2009). A KDD APPROACH FOR DESIGNING FILTERING STRATEGIES TO IMPROVE VIRTUAL SCREENING. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009) ISBN 978-989-674-011-5, pages 146-151. DOI: 10.5220/0002292301460151


in BibTeX Style

@conference{kdir09,
author={Leo Ghemtio and Malika Smaïl-Tabbone and Marie-Dominique Devignes and Michel Souchet and Bernard Maigret},
title={A KDD APPROACH FOR DESIGNING FILTERING STRATEGIES TO IMPROVE VIRTUAL SCREENING},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)},
year={2009},
pages={146-151},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002292301460151},
isbn={978-989-674-011-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)
TI - A KDD APPROACH FOR DESIGNING FILTERING STRATEGIES TO IMPROVE VIRTUAL SCREENING
SN - 978-989-674-011-5
AU - Ghemtio L.
AU - Smaïl-Tabbone M.
AU - Devignes M.
AU - Souchet M.
AU - Maigret B.
PY - 2009
SP - 146
EP - 151
DO - 10.5220/0002292301460151