
demanding for IR systems and we intend to conduct 
these experiments in the near future. 
7 CONCLUSIONS 
The notion of Entropy on Ontology, introduced
above, involves a topology on entities, which are
treated as points of a topological space. This feature
was realized through a weight extension on the
semantic similarity cover, taken as a connected
component of the ontology. The same pattern can be
used to define entropy for entities from other
topological spaces in order to formalize semantics
such as similarity, closeness, or correlation between
entities. The new notion can be used to measure the
information in a message or collection of entities
when we know the weights of the entities that
compose the message and, in addition, how the
entities “semantically” relate to each other in the
topological space.
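For contrast, the following is a minimal sketch of the classical, topology-free baseline: Shannon entropy computed from entity weights alone. It is not the Entropy on Ontology defined above, since it ignores how entities relate to each other. The function name `shannon_entropy` and the sample weights are illustrative assumptions, not taken from AIAS.

```python
import math


def shannon_entropy(weights):
    """Classical Shannon entropy (in bits) of normalized non-negative weights.

    `weights` maps an entity name to its weight. The relations between
    entities are deliberately ignored here.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    entropy = 0.0
    for w in weights.values():
        if w > 0:
            p = w / total  # normalize the weight to a probability
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical term weights for a short message (illustrative values only).
message_weights = {"Neoplasms": 2.0, "Lung": 1.0, "Smoking": 1.0}
baseline = shannon_entropy(message_weights)
```

Two messages with the same weights but differently related entities receive the same baseline value, which is the limitation a topology-aware entropy is meant to address.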
The quality of the presented algorithm, which
estimates Entropy on Ontology and the State of the
Document, depends entirely on the correctness and
sufficiency of the hierarchical thesaurus on which it
is based. As mentioned earlier, many thesauri exist,
and their maintenance and evolution are vital for the
proper functioning of such algorithms. A great deal
of knowledge has also been accumulated in other
forms, such as dictionaries, and it is important to
convert these into hierarchies so that they can be
used for the proper interpretation of texts on
specialized topics.
The minimum that defines Entropy on Ontology
and the State of the Document may not be unique,
and there may be multiple local minima. For
developing approximations, it is important to find
conditions on the ontology or on the topology of
terms under which the minimum is unique.
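On small instances, non-uniqueness can be detected by exhaustive enumeration. The sketch below is generic and is not part of the algorithm of Section 5: `all_minimizers`, the candidate set, and the toy objective are hypothetical stand-ins for whatever candidate set and functional the minimization ranges over.

```python
def all_minimizers(candidates, objective, tol=1e-9):
    """Return (minimum value, candidates attaining it within tol)."""
    scored = [(objective(c), c) for c in candidates]
    if not scored:
        raise ValueError("no candidates")
    best = min(score for score, _ in scored)
    return best, [c for score, c in scored if score - best <= tol]


# Toy objective with two global minimizers, x = -1 and x = 1.
def toy_objective(x):
    return (x * x - 1) ** 2


value, minimizers = all_minimizers(range(-3, 4), toy_objective)
unique = len(minimizers) == 1  # False for this toy objective
```

When more than one candidate is returned, any approximation scheme needs an explicit tie-breaking rule, or conditions of the kind described above that rule out ties.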
The current release of AIAS uses the MeSH
Descriptors vocabulary and the WordWeb Pro
general-purpose thesaurus in electronic form to select
terms from the ontology using words from a
document. Many of the misunderstandings of
documents by AIAS that were automatically caught
resulted from insufficiencies of these sources when
processing MEDLINE abstracts. The next release will
integrate the whole MeSH thesaurus (Descriptors,
Qualifiers, and Supplementary Concept Records) to
improve the coverage of chemistry in AIAS. Any
additional thesaurus made available electronically
would also be integrated into AIAS.
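The role of the two sources in term selection can be illustrated with a minimal sketch. It does not reproduce AIAS: the vocabulary, the synonym table, and the function `select_terms` are toy assumptions, and the lookup handles only single lowercase words, without stemming or multi-word terms.

```python
import re

# Toy stand-ins (assumptions): a controlled vocabulary and a synonym thesaurus.
VOCABULARY = {"neoplasms", "lung", "smoking"}
SYNONYMS = {"cancer": "neoplasms", "tumor": "neoplasms", "pulmonary": "lung"}


def select_terms(text, vocabulary=VOCABULARY, synonyms=SYNONYMS):
    """Map words of a text to vocabulary terms; also report unmatched words."""
    matched, unmatched = [], []
    for word in re.findall(r"[a-z]+", text.lower()):
        # Direct vocabulary hit first, then fall back to the synonym table.
        term = word if word in vocabulary else synonyms.get(word)
        if term is None:
            unmatched.append(word)
        elif term not in matched:
            matched.append(term)
    return matched, unmatched


terms, missing = select_terms("Smoking and pulmonary cancer")
```

Words that end up in `missing` are the places where a gap in either source becomes visible, which is one way an insufficiency of the sources can be caught automatically.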
The algorithm presented in Section 5 was tested
only on the MEDLINE database and the MeSH
ontology. Its implementation does not depend on a
particular indexing thesaurus or ontology, and it
would be interesting to try it on other existing text
corpora with an appropriate ontology, such as
WordNet (http://wordnet.princeton.edu).
REFERENCES 
Agrawal, R., Chakrabarti, S., Dom, B.E., Raghavan, P., 
2001. Multilevel taxonomy based on features derived 
from training documents classification using Fisher
values as discrimination values. United States Patent
6,233,575.
Aronson, A.R., Mork, J.G., Gay, C.W., Humphrey, S.M., 
Rogers, W.J., 2004. The NLM indexing initiative’s 
Medical Text Indexer, Stud Health Technol Inform 
107 (Pt 1), pp. 268–272. 
Calmet, J., Daemi, A., 2004. From entropy to ontology.
Fourth International Symposium "From Agent Theory
to Agent Implementation", R. Trappl, Ed., vol. 2, pp.
547–551.
Cho, M., Choi, C., Kim, W., Park, J., Kim, P., 2007. 
Comparing Ontologies using Entropy. 2007 
International Conference on Convergence Information 
Technology, Korea, 873-876. 
Grobelnik, M., Brank, J., Fortuna, B., Mozetič, I., 2008. 
Contextualizing Ontologies with OntoLight: A 
Pragmatic Approach. Informatica 32, 79–84. 
Guseynov, Y., 2009. XML Processing. No Parsing.
Proceedings WEBIST 2009 - 5th International
Conference on Web Information Systems and
Technologies, INSTICC, Lisbon, Portugal, pp. 81–84.
Klein, D., Manning, C.D., 2003. Accurate Unlexicalized
Parsing. Proceedings of the 41st Meeting of the
Association for Computational Linguistics, pp. 423–430.
Lee, J.H., Kim, M.H., Lee, Y.J., 1993. Information 
retrieval based on conceptual distance in IS-A 
hierarchies. Journal of Documentation, 49(2):188-207, 
June. 
Lindberg, D.A.B., Humphreys, B.L., McCray, A.T., 1993. 
The Unified Medical Language System. Methods of 
Information in Medicine, 32(4): 281-91. 
Manning, C.D., Schütze, H., 1999. Foundations of 
Statistical Natural Language Processing. The MIT 
Press. 
Manning, C. D., Raghavan, P., Schütze, H., 2008. 
Introduction to Information Retrieval. Cambridge 
University Press.  
Medelyan, O., Witten, I.H., 2006a. Thesaurus Based 
Automatic Keyphrase Indexing. JCDL’06, June 11–
15, Chapel Hill, North Carolina, USA. 
Medelyan, O., Witten, I.H., 2006b. Measuring Inter-
Indexer Consistency Using a Thesaurus. JCDL’06, 
June 11–15, Chapel Hill, North Carolina, USA. 
MEDLINE®, Medical Literature Analysis and Retrieval
System Online.
http://www.nlm.nih.gov/databases/databases_medline.html.