Authors: F. Jani and A.H. Pilevar

Affiliation: Bu Ali Sina University, Iran, Islamic Republic of

ISBN: 978-989-8565-19-8

Keyword(s): Thesaurus, Word Sense Disambiguation, Persian Language Corpus.

Related Ontology Subjects/Areas/Topics: Context; Context Aggregation and Inference; Context Analysis; Context Design; Context Formalization; Context Identification; Context Representation; Domain-Specific Languages; Models; Paradigm Trends; Software Engineering

Abstract: This paper addresses the disambiguation of Persian homographs, words with the same written form but different senses, using a combination of supervised and unsupervised methods that draws on a thesaurus and a corpus. The method builds on a previously proposed one, with several differences. These include the use of texts collected by either supervised or unsupervised methods. In addition, the words of the input corpus were stemmed, and for words whose different senses play different grammatical roles in the sentence, the role of the word in the input sentence was taken into account during disambiguation. Applied to selected ambiguous words from “Hamshahri”, a standard Persian corpus, the method achieved a satisfactory accuracy of 97 percent and was judged better and more efficient than similar methods.
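The abstract does not give the scoring procedure, so the following is only a minimal sketch of how its three ingredients (stemming, sense-specific word lists, and the target word's grammatical role) can fit together. It is not the authors' algorithm: the overlap count is a simple Lesk-style stand-in, and `stem`, `Sense`, `disambiguate`, the suffix list, and the sense signatures are all hypothetical. The examples use شیر (shir, "milk" or "lion") and کرد (kard, past-tense verb "did", or noun "Kurd").

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy stemmer: strips only the Persian plural suffixes "های" and "ها"
# (written without a zero-width non-joiner). A real system would use a full Persian stemmer.
PLURAL_SUFFIXES = ("های", "ها")


def stem(word: str) -> str:
    for suffix in PLURAL_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


@dataclass
class Sense:
    label: str
    pos: str  # grammatical role of this sense, e.g. "N" (noun) or "V" (verb)
    # Hypothetical: stemmed words associated with this sense, e.g. gathered
    # from a thesaurus entry or from texts collected for the sense.
    signature: set[str] = field(default_factory=set)


def disambiguate(tokens: list[str], target_index: int, senses: list[Sense],
                 target_pos: str | None = None) -> str:
    """Return the label of the sense whose signature overlaps most with the context."""
    # If the target's role in the sentence is known, drop senses with a different role.
    candidates = [s for s in senses if target_pos is None or s.pos == target_pos]
    if not candidates:
        candidates = senses  # fall back when no sense matches the given role
    # Stem every context word except the ambiguous target itself.
    context = {stem(t) for i, t in enumerate(tokens) if i != target_index}
    # max() keeps the first of equally scored senses, so list order breaks ties.
    best = max(candidates, key=lambda s: len(context & s.signature))
    return best.label


if __name__ == "__main__":
    shir = [Sense("milk", "N", {"گاو", "نوشیدن"}), Sense("lion", "N", {"جنگل", "شکار"})]
    # "گاوها" stems to "گاو" (cow), which appears only in the "milk" signature.
    print(disambiguate(["گاوها", "شیر", "دادند"], 1, shir))  # milk
```

Stemming matters here because the plural "گاوها" would otherwise fail to match "گاو" in the signature. Passing `target_pos="V"` for کرد removes the noun sense before any overlap is counted, which mirrors the abstract's use of the word's role in the sentence.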

