AUTOMATIC GENERATION OF CONCEPT TAXONOMIES FROM WEB SEARCH DATA USING SUPPORT VECTOR MACHINE

Robertas Damaševičius

2009

Abstract

Ontologies and concept taxonomies are essential parts of the Semantic Web infrastructure. Since manual construction of taxonomies requires considerable effort, automated methods for taxonomy construction should be considered. In this paper, an approach for the automatic derivation of concept taxonomies from web search results is presented. The method is based on generating derivative features from web search data and applying machine learning techniques. A Support Vector Machine (SVM) classifier is trained on known hyponym-hypernym concept pairs, and the resulting classification model is used to predict new hyponymy (is-a) relations. The prediction results are used to generate concept taxonomies in OWL. Results of applying the approach to the construction of a colour taxonomy are presented.
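The abstract describes the pipeline only at a high level: derivative features from web search data, an SVM trained on known hyponym-hypernym pairs, prediction of new is-a relations, and OWL output. The Python sketch below illustrates that flow under loudly stated assumptions and is not the author's implementation. Several choices here are hypothetical. These include the query templates in `PATTERNS`, the log-ratio features, and the hit counts in `HITS`, which are made up for illustration and are not data from the paper. The Pegasos sub-gradient solver is swapped in for a dedicated SVM package so the example needs only the standard library. The Turtle serialization and the `http://example.org/colour#` base IRI are likewise assumptions.

```python
import math
import random

# HYPOTHETICAL query templates; the paper's actual web queries are not given in the abstract.
PATTERNS = ["{y} such as {x}", "{x} is a {y}", "{x} and other {y}"]

# MADE-UP hit counts for illustration only (not data from the paper).
HITS = {
    "blue such as navy blue": 900, "navy blue such as blue": 5,
    "navy blue is a blue": 300, "blue is a navy blue": 2,
    "navy blue and other blue": 40, "blue and other navy blue": 1,
    "red such as crimson": 700, "crimson such as red": 3,
    "crimson is a red": 250, "red is a crimson": 4,
    "crimson and other red": 30, "red and other crimson": 0,
    "blue such as sky blue": 600, "sky blue such as blue": 8,
    "sky blue is a blue": 150, "blue is a sky blue": 3,
    "sky blue and other blue": 20, "blue and other sky blue": 0,
}


def pair_features(x, y, hits):
    """Hypothetical derivative features for the candidate relation 'x is-a y'.

    Each feature is log((1 + hits of pattern(x, y)) / (1 + hits of pattern(y, x))),
    so reversing the pair negates the whole feature vector.
    """
    feats = []
    for p in PATTERNS:
        fwd = hits.get(p.format(x=x, y=y), 0)
        rev = hits.get(p.format(x=y, y=x), 0)
        feats.append(math.log((1 + fwd) / (1 + rev)))
    return feats


def dot(w, v):
    return sum(a * b for a, b in zip(w, v))


def train_pegasos(X, labels, lam=0.1, epochs=50, seed=0):
    """Linear SVM without bias, trained by Pegasos stochastic sub-gradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = labels[i] * dot(w, X[i]) < 1  # hinge loss active for example i
            w = [(1 - eta * lam) * wj for wj in w]  # shrinkage from the L2 regulariser
            if violated:
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, X[i])]
    return w


def to_owl_turtle(pairs, base="http://example.org/colour#"):
    """Serialise (hyponym, hypernym) pairs as OWL classes with rdfs:subClassOf (Turtle)."""
    def name(c):
        return c.replace(" ", "_")
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    ]
    classes = sorted({c for pair in pairs for c in pair})
    lines += [f":{name(c)} a owl:Class ." for c in classes]
    lines += [f":{name(x)} rdfs:subClassOf :{name(y)} ." for x, y in pairs]
    return "\n".join(lines)


# Known pairs are positives; their reversals serve as negatives.
train_pairs = [("navy blue", "blue", 1), ("blue", "navy blue", -1),
               ("crimson", "red", 1), ("red", "crimson", -1)]
X = [pair_features(x, y, HITS) for x, y, _ in train_pairs]
labels = [label for _, _, label in train_pairs]
w = train_pegasos(X, labels)

candidates = [("sky blue", "blue"), ("blue", "sky blue")]
predicted = [(x, y) for x, y in candidates if dot(w, pair_features(x, y, HITS)) > 0]
print(predicted)  # [('sky blue', 'blue')]
print(to_owl_turtle(predicted))
```

Each feature is a log ratio of forward to reversed query counts, so swapping a pair negates its feature vector. The reversed training pairs can therefore serve as negative examples, and the sketch omits a bias term. In a real setting the counts would come from a search engine, whose API is not specified here.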


Paper Citation


in Harvard Style

Damaševičius R. (2009). AUTOMATIC GENERATION OF CONCEPT TAXONOMIES FROM WEB SEARCH DATA USING SUPPORT VECTOR MACHINE. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-81-4, pages 666-673. DOI: 10.5220/0001842206660673


in Bibtex Style

@conference{webist09,
author={Robertas Damaševičius},
title={AUTOMATIC GENERATION OF CONCEPT TAXONOMIES FROM WEB SEARCH DATA USING SUPPORT VECTOR MACHINE},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2009},
pages={666-673},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001842206660673},
isbn={978-989-8111-81-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - AUTOMATIC GENERATION OF CONCEPT TAXONOMIES FROM WEB SEARCH DATA USING SUPPORT VECTOR MACHINE
SN - 978-989-8111-81-4
AU - Damaševičius R.
PY - 2009
SP - 666
EP - 673
DO - 10.5220/0001842206660673