Utrecht University, Netherlands
University of Tromsø, Norway
Semantic Lexicon, Bootstrapping, Extraction Patterns, Web Mining.
Knowledge Discovery and Information Retrieval
Mining Text and Semi-Structured Data
Pre-Processing and Post-Processing for Data Mining
We present a bootstrapping algorithm that builds a semantic lexicon from a list of seed words and a corpus mined from the web. We use extraction patterns to bootstrap the lexicon and collocation statistics to dynamically score new lexicon entries. Extraction patterns are then scored by their conditional probability relative to an unrelated text corpus. We find that highly domain-related verbs achieve the highest accuracy, and that collocation statistics affect accuracy both positively and negatively over the course of the bootstrapping runs.
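The loop described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper, and several details are assumptions made only for illustration: a pattern is reduced to the single token that precedes a known lexicon word, the pattern score is the share of a pattern's occurrences that fall in the domain corpus (one possible reading of the conditional probability against an unrelated corpus), and the collocation score is a simple sentence-level co-occurrence ratio standing in for a real collocation statistic. The function names, thresholds, and toy corpora are all hypothetical.

```python
def pattern_score(pattern, domain, unrelated):
    """Assumed estimate of P(domain | pattern): the share of the
    pattern's occurrences that fall in the domain corpus."""
    in_domain = sum(sent.count(pattern) for sent in domain)
    in_unrelated = sum(sent.count(pattern) for sent in unrelated)
    total = in_domain + in_unrelated
    return in_domain / total if total else 0.0


def collocation_score(candidate, lexicon, domain):
    """Stand-in collocation statistic: fraction of the candidate's
    sentences that also contain some other lexicon word."""
    with_candidate = [sent for sent in domain if candidate in sent]
    if not with_candidate:
        return 0.0
    together = sum(
        1 for sent in with_candidate
        if any(w in lexicon and w != candidate for w in sent)
    )
    return together / len(with_candidate)


def bootstrap(seeds, domain, unrelated, rounds=2,
              min_pattern=0.6, min_colloc=0.5):
    """Grow a lexicon from seed words; thresholds are hypothetical."""
    lexicon = set(seeds)
    for _ in range(rounds):
        # 1. Induce patterns: tokens directly preceding a lexicon word.
        patterns = {
            sent[i - 1]
            for sent in domain
            for i in range(1, len(sent))
            if sent[i] in lexicon
        }
        # 2. Keep patterns that occur mostly in the domain corpus.
        good = {p for p in patterns
                if pattern_score(p, domain, unrelated) >= min_pattern}
        # 3. Extract candidates: tokens directly following a kept pattern.
        candidates = {
            sent[i + 1]
            for sent in domain
            for i in range(len(sent) - 1)
            if sent[i] in good
        } - lexicon
        # 4. Accept candidates whose collocation score is high enough.
        accepted = {c for c in candidates
                    if collocation_score(c, lexicon, domain) >= min_colloc}
        if not accepted:
            break  # nothing new was learned, so stop early
        lexicon |= accepted
    return lexicon


# Toy corpora (invented for this sketch), already tokenized.
domain_corpus = [
    ["simmer", "onions", "with", "garlic"],
    ["simmer", "leeks", "with", "onions"],
    ["add", "garlic", "and", "simmer", "shallots"],
]
unrelated_corpus = [
    ["add", "the", "numbers", "with", "care"],
    ["with", "regard", "to", "the", "report"],
]

print(sorted(bootstrap({"onions", "garlic"}, domain_corpus, unrelated_corpus)))
# → ['garlic', 'leeks', 'onions', 'shallots']
```

In this toy run the verb "simmer" occurs only in the domain corpus and is kept as a pattern, while "with" and "add" occur equally often in the unrelated corpus and are filtered out.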