Bootstrapping a Semantic Lexicon on Verb Similarities

Shaheen Syed, Marco Spruit, Melania Borit

Abstract

We present a bootstrapping algorithm that creates a semantic lexicon from a list of seed words and a corpus mined from the web. We exploit extraction patterns to bootstrap the lexicon and use collocation statistics to dynamically score new lexicon entries. Extraction patterns are then scored by their conditional probability relative to an unrelated text corpus. We find that highly domain-related verbs achieve the highest accuracy, and that collocation statistics affect accuracy both positively and negatively during the bootstrapping runs.
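
The abstract does not give the scoring formulas, so the following is only a minimal sketch of one plausible reading of the pattern-scoring step, not the authors' implementation. It assumes a pattern is simply the word immediately before a slot ("catch <X>"), and it assumes the conditional probability is estimated as the pattern's frequency in the domain corpus divided by its combined frequency in the domain and unrelated corpora. The function names, the `min_score` threshold, and the toy corpora are all hypothetical, and the collocation-based scoring of new entries is left out.

```python
from collections import Counter


def pattern_frequencies(sentences):
    """Count every word that is followed by another token (a 'word <X>' pattern)."""
    return Counter(prev for tokens in sentences for prev in tokens[:-1])


def pattern_score(pattern, domain_freq, unrelated_freq):
    """Assumed estimate of P(domain | pattern) from raw frequencies in both corpora."""
    d, u = domain_freq[pattern], unrelated_freq[pattern]
    return d / (d + u) if d + u else 0.0


def bootstrap_step(lexicon, domain, unrelated, min_score=0.6):
    """Run one bootstrapping iteration and return the enlarged lexicon."""
    domain_freq = pattern_frequencies(domain)
    unrelated_freq = pattern_frequencies(unrelated)
    # Patterns that currently extract at least one known lexicon entry.
    active = {prev for s in domain for prev, word in zip(s, s[1:]) if word in lexicon}
    # Keep only patterns that look domain-specific (min_score is a hypothetical cutoff).
    trusted = {p for p in active
               if pattern_score(p, domain_freq, unrelated_freq) >= min_score}
    # Every word filling the slot of a trusted pattern becomes a lexicon entry.
    candidates = {word for s in domain for prev, word in zip(s, s[1:]) if prev in trusted}
    return set(lexicon) | candidates


# Toy, made-up corpora: tokenized sentences from a domain and an unrelated source.
domain = [["fishermen", "catch", "cod"],
          ["vessels", "catch", "herring"],
          ["they", "eat", "herring"]]
unrelated = [["they", "eat", "bread"],
             ["we", "eat", "pasta"],
             ["players", "catch", "balls"]]

print(sorted(bootstrap_step({"cod"}, domain, unrelated)))
```

In this toy run the pattern "catch <X>" is kept because it is more frequent in the domain corpus than in the unrelated one, so the seed "cod" pulls in the other word that fills the same slot. A fuller version would also score each candidate, for example with a collocation statistic, before admitting it to the lexicon.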



Paper Citation


in Harvard Style

Syed S., Spruit M. and Borit M. (2016). Bootstrapping a Semantic Lexicon on Verb Similarities. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016) ISBN 978-989-758-203-5, pages 189-196. DOI: 10.5220/0006036901890196


in Bibtex Style

@conference{kdir16,
author={Shaheen Syed and Marco Spruit and Melania Borit},
title={Bootstrapping a Semantic Lexicon on Verb Similarities},
booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)},
year={2016},
pages={189-196},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006036901890196},
isbn={978-989-758-203-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)
TI - Bootstrapping a Semantic Lexicon on Verb Similarities
SN - 978-989-758-203-5
AU - Syed S.
AU - Spruit M.
AU - Borit M.
PY - 2016
SP - 189
EP - 196
DO - 10.5220/0006036901890196