A HYBRID METHOD FOR DOMAIN ONTOLOGY CONSTRUCTION FROM THE WEB

B. Frikh, A. S. Djaanfar, B. Ouhbi

Abstract

This paper describes a hybrid model for ontology construction that combines statistical information about concepts with the semantic relationships among them. The implementation of the model, called HCHIRSIM (Hybrid Chir-Statistic and Similarity), can be adapted to learn an ontology for any domain from the Web. It combines two sources of evidence: statistical information about concepts, inferred with the CHIR-statistic method, and semantic relationships among concepts on the Web, measured by mutual information. The experiments show that the hybrid approach outperforms both the purely statistical approach and the approach based purely on semantic relationships among concepts. A successful evaluation of the method with different values of the weighting parameter shows that the proposed approach can effectively construct a cancer domain ontology from unstructured text documents.
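
The abstract does not state how the two scores are combined, so the following is only a minimal sketch under stated assumptions. It assumes the weighting parameter acts as a convex weight between a statistical score and a semantic score, and that the semantic score is pointwise mutual information estimated from co-occurrence counts (for example, Web hit counts). The function names `pmi` and `hybrid_score` and all parameter names are hypothetical and do not come from the paper. The CHIR statistic itself is not implemented here; its value is passed in as `stat_score`.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information log2(p(x, y) / (p(x) * p(y))) from raw counts.

    Assumption: counts are document (or hit) frequencies out of `total` documents.
    """
    if min(count_xy, count_x, count_y, total) <= 0:
        # PMI is undefined for zero counts; report it as "no association".
        return float("-inf")
    return math.log2(count_xy * total / (count_x * count_y))


def hybrid_score(stat_score: float, sem_score: float, weight: float) -> float:
    """Convex combination of a statistical and a semantic score.

    Assumption: this linear form is illustrative only; the paper's exact
    combination rule is not given in the abstract.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * stat_score + (1.0 - weight) * sem_score
```

With `weight = 1.0` the sketch reduces to the purely statistical score and with `weight = 0.0` to the purely semantic one, which mirrors the two baselines the abstract compares against. In practice both scores would need to be normalized to a common range before being mixed.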

Paper Citation


in Harvard Style

Frikh B., Djaanfar A. S. and Ouhbi B. (2011). A HYBRID METHOD FOR DOMAIN ONTOLOGY CONSTRUCTION FROM THE WEB. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2011) ISBN 978-989-8425-80-5, pages 285-292. DOI: 10.5220/0003667502850292


in Bibtex Style

@conference{keod11,
author={B. Frikh and A. S. Djaanfar and B. Ouhbi},
title={A HYBRID METHOD FOR DOMAIN ONTOLOGY CONSTRUCTION FROM THE WEB},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2011)},
year={2011},
pages={285-292},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003667502850292},
isbn={978-989-8425-80-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2011)
TI - A HYBRID METHOD FOR DOMAIN ONTOLOGY CONSTRUCTION FROM THE WEB
SN - 978-989-8425-80-5
AU - Frikh B.
AU - Djaanfar A. S.
AU - Ouhbi B.
PY - 2011
SP - 285
EP - 292
DO - 10.5220/0003667502850292