Yevgeniy Guseynov


In this paper, we present a formalization of an Index Assignment process applied to documents stored in a text database. The process uses key phrases or terms from a hierarchical thesaurus or ontology. It is based on a new notion of entropy on an ontology, defined for terms and their weights, which extends the Shannon concept of entropy in Information Theory and the Resnik semantic similarity measure for terms in an ontology. The introduced notion provides a measure of closeness, or semantic similarity, for a set of terms in an ontology and their weights, and allows the construction of a clustering algorithm that constructively solves the index assignment task. The algorithm was tested on 30,000 documents randomly extracted from the MEDLINE biomedical database, all of which were manually indexed by professional indexers. The main experimental result is that, after all 30,000 documents were processed, the presented algorithm and the human indexers have the same understanding of the documents in seven topics out of ten.


