Title-based Approach to Relation Discovery from Wikipedia

Rim Zarrad, Narjes Doggaz, Ezzeddine Zagrouba

2013

Abstract

With the advent of the Web and the explosion of available textual data, domain ontology engineering has become increasingly important. Over the last decade, several successful tools for automatically harvesting knowledge from web data have been developed, but the extraction of taxonomic and non-taxonomic ontological relationships is still far from fully solved. This paper describes a new approach that extracts ontological relations from Wikipedia. Non-taxonomic relations are extracted by analyzing the titles that appear in each document of the studied corpus. The method relies on regular expressions matched against titles, from which we can extract not only the two arguments of each relationship but also the label that describes the relation. The resulting set of labels is then used to retrieve new relations by analyzing the title hierarchy of each document. Further relations can be extracted from titles and subtitles that contain only one term. An enrichment step is also applied: each term that appears as an argument of an extracted relation is examined in order to discover new concepts and new relations. The experiments were performed on French Wikipedia articles related to the medical field. The precision and recall values are encouraging and seem to validate our approach.
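As an illustration only, the first step described above (matching a regular expression against a title to recover two arguments and a relation label) can be sketched as follows. The abstract does not give the actual patterns, so the label list, the pattern shape, the English example titles, and the function names `extract_relation` and `relations_from_titles` are all assumptions made for this sketch, not the authors' implementation.

```python
import re

# Hypothetical linking expressions. The paper's real patterns (for French
# medical titles) are not listed in the abstract; these are placeholders.
LABELS = ["is caused by", "is treated by", "causes", "prevents"]

# Put longer labels first so that multi-word labels are tried before
# shorter ones in the alternation.
_LABEL_ALT = "|".join(
    re.escape(label) for label in sorted(LABELS, key=len, reverse=True)
)

# Assumed title shape: "<argument 1> <label> <argument 2>".
TITLE_PATTERN = re.compile(
    rf"^\s*(?P<arg1>.+?)\s+(?P<label>{_LABEL_ALT})\s+(?P<arg2>.+?)\s*$",
    re.IGNORECASE,
)


def extract_relation(title):
    """Return (arg1, label, arg2) in lowercase, or None if the title
    does not match the assumed pattern."""
    match = TITLE_PATTERN.match(title)
    if match is None:
        return None
    return (
        match.group("arg1").lower(),
        match.group("label").lower(),
        match.group("arg2").lower(),
    )


def relations_from_titles(titles):
    """Collect the relations found in a list of section titles."""
    found = (extract_relation(title) for title in titles)
    return [relation for relation in found if relation is not None]
```

In this sketch a title such as "Asthma is treated by corticosteroids" yields one labelled relation, while a single-term title such as "Symptoms" yields nothing; the abstract indicates that single-term titles are handled by a separate step based on the title hierarchy, which is not modelled here.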



Paper Citation


in Harvard Style

Zarrad R., Doggaz N. and Zagrouba E. (2013). Title-based Approach to Relation Discovery from Wikipedia. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2013) ISBN 978-989-8565-81-5, pages 70-80. DOI: 10.5220/0004547400700080


in Bibtex Style

@conference{keod13,
author={Rim Zarrad and Narjes Doggaz and Ezzeddine Zagrouba},
title={Title-based Approach to Relation Discovery from Wikipedia},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2013)},
year={2013},
pages={70-80},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004547400700080},
isbn={978-989-8565-81-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2013)
TI - Title-based Approach to Relation Discovery from Wikipedia
SN - 978-989-8565-81-5
AU - Zarrad R.
AU - Doggaz N.
AU - Zagrouba E.
PY - 2013
SP - 70
EP - 80
DO - 10.5220/0004547400700080