Evaluating the Word Sense Disambiguation Accuracy with Three Different Sense Inventories

Dan Tufiş, Radu Ion

Abstract

Comparing the performance of word sense disambiguation systems is very difficult when they use different sense inventories, and even more difficult when the sense distinctions are not of the same granularity. The paper substantiates this statement by briefly presenting a word sense disambiguation (WSD) system based on parallel corpora. The method relies on word alignment and word clustering, and is supported by a lexical ontology made of aligned wordnets for the languages in the corpora. The wordnets are aligned to the Princeton WordNet, according to the principles established by EuroWordNet. The WSD system was evaluated on the same data using three sense inventories of different granularity.
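
The effect of inventory granularity on measured accuracy can be illustrated with a minimal sketch. It is not the authors' system or evaluation procedure: the sense labels, the two mapping tables, and the `accuracy` helper are hypothetical, and the only assumption is that each fine-grained sense maps to exactly one label in a coarser inventory.

```python
from typing import Dict, List, Optional


def accuracy(gold: List[str], predicted: List[str],
             coarsen: Optional[Dict[str, str]] = None) -> float:
    """Fraction of tokens whose predicted label equals the gold label.

    If `coarsen` is given, both labels are first mapped into the coarser
    inventory, so distinct fine-grained senses may count as a match.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    if coarsen is not None:
        gold = [coarsen[g] for g in gold]
        predicted = [coarsen[p] for p in predicted]
    hits = sum(g == p for g, p in zip(gold, predicted))
    return hits / len(gold)


# Hypothetical mappings from a fine-grained inventory to two coarser ones.
FINE_TO_MID = {
    "bank#1": "institution", "bank#2": "building",
    "bank#3": "landform", "bank#4": "landform",
}
FINE_TO_COARSE = {
    "bank#1": "economy", "bank#2": "economy",
    "bank#3": "geography", "bank#4": "geography",
}

# Hypothetical gold annotation and system output for four occurrences.
gold = ["bank#1", "bank#2", "bank#3", "bank#1"]
pred = ["bank#1", "bank#1", "bank#4", "bank#3"]

fine_acc = accuracy(gold, pred)                    # fine-grained inventory
mid_acc = accuracy(gold, pred, FINE_TO_MID)        # medium granularity
coarse_acc = accuracy(gold, pred, FINE_TO_COARSE)  # coarsest inventory
print(fine_acc, mid_acc, coarse_acc)
```

On this toy data the same predictions score differently under each inventory, which is why accuracy figures reported against inventories of different granularity are not directly comparable.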



Paper Citation


in Harvard Style

Tufiş D. and Ion R. (2005). Evaluating the Word Sense Disambiguation Accuracy with Three Different Sense Inventories. In Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005) ISBN 972-8865-23-6X, pages 118-127. DOI: 10.5220/0002560601180127


in Bibtex Style

@conference{nlucs05,
author={Dan Tufiş and Radu Ion},
title={Evaluating the Word Sense Disambiguation Accuracy with Three Different Sense Inventories},
booktitle={Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005)},
year={2005},
pages={118-127},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002560601180127},
isbn={972-8865-23-6X},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005)
TI - Evaluating the Word Sense Disambiguation Accuracy with Three Different Sense Inventories
SN - 972-8865-23-6X
AU - Tufiş D.
AU - Ion R.
PY - 2005
SP - 118
EP - 127
DO - 10.5220/0002560601180127