
2 RELATED PRELIMINARY 
WORK 
Although some research has explored ontologies, 
most of it focused on using a specific ontology to 
assist other work rather than on building one. 
Other studies (Trent, 2002, Rowena, 2005, Dave, 
2001, Sin-Jae, 2001, Yan-Hwang, 2005, Alexander, 
2000, Riichiro, 2003, Thanh Tho, 2006, Prieto-Diaz, 
2003, Yuri A., 2003 and Ju-in Youn, 2004) addressed 
ontology building. 
These approaches fall into two categories 
(strictly speaking, some of them only propose a 
schema of object entities). The first classifies 
documents into their domains based on key terms, 
each composed of several words from the documents 
(Florian, 2002, Dave, 2001, Weipeng, 2001, Yin-Fu, 
2007, Thanh Tho, 2006 and Ju-in, 2004). The second 
classifies keywords to construct a taxonomy 
structure based on the documents they belong to, 
thesauri, or a pre-built ontology (Trent, 2002, 
Rowena, 2005, Sin-Jae, 2001, Yan-Hwang, 2005, 
Alexander, 2000, Prieto-Diaz, 2003, Vaclav, 2005 
and Yuri A., 2003). 
Youn et al. (Ju-in Youn, 2004) first constructed 
the ontology using fuzzy functions and relations, 
and then classified documents based on this 
ontology. In fact, the ontology constructed there is 
just a word relation tree similar to that proposed 
in (Yin-Fu Huang, 2007). Two other papers (Florian, 
2002 and Yin-Fu, 2007) also provide schemas of 
documents, and their document classifications share 
the same characteristic, since each cluster of 
documents (or each tree node in the word relation 
tree) implies the same term feature. However, their 
methodologies differ: one concerns how to select 
term features for clustering, and the other how to 
extend the current level to the next one. 
Since building an ontology is such a tremendous 
task, an ontology should be maintained 
incrementally rather than rebuilt from scratch. Some 
learning techniques for refining a built ontology 
have been proposed (P. Buitelaar, 2005, Asunción, 
2003 and Alexander, 2001), and even general 
relationship learning (not focusing on Is-A or 
Part-of relationships) has been discussed (M. 
Kavalec, 2004, David, 2006 and A. Schutz, 2005). In 
our framework, new documents can be imported 
periodically, and the learning process then uses 
them to refine word relationships in the same way. 
2.1  Key Terms for Generating 
Ontology 
A Term-Document Matrix (TDM) records the 
frequency with which each key term appears in each 
document; it is also called a weighted word 
histogram (Weipeng, 2001). Key terms and documents 
are the two dimensions of a TDM. If we take the 
documents as the classification target, key terms 
can be viewed as features (Florian, 2002, Dave, 
2001, Weipeng, 2001 and Teuvo, 2000), and vice 
versa. Usually, an ontology must be built to present 
the overall context structure of web pages. Tijerino 
et al. developed an information-gathering engine, 
TANGO, that exploits tables and filled-in forms to 
generate domain-specific ontologies (Yuri A., 2003). 
In our framework, the TDM is treated as the implicit 
feature for evaluating word correlations. 
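The TDM described above can be illustrated with a minimal sketch. The function names (build_tdm, cosine), the whitespace tokenizer, the toy documents, and the use of cosine similarity between term rows as a correlation measure are all assumptions made for illustration; the text does not specify how word correlations are actually computed from the TDM.

```python
import math
from collections import Counter


def build_tdm(documents, key_terms):
    """Return a term-by-document matrix of raw counts.

    Row i holds the frequency of key_terms[i] in each document.
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    counts = [Counter(doc.lower().split()) for doc in documents]
    return [[c[term] for c in counts] for term in key_terms]


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus and key terms.
docs = [
    "ontology learning builds ontology from text",
    "text clustering groups documents by term",
    "ontology term relations",
]
terms = ["ontology", "text", "term"]

tdm = build_tdm(docs, terms)
# tdm == [[2, 0, 1], [1, 1, 0], [0, 1, 1]]

# Terms as features of documents: transpose the matrix ("vice versa").
doc_vectors = [list(col) for col in zip(*tdm)]

# One possible correlation between two key terms: cosine of their rows.
similarity = cosine(tdm[0], tdm[1])
```

Reading the matrix by rows gives each key term a vector over documents, while reading it by columns gives each document a vector over key terms, which matches the two ways of choosing the classification target mentioned above.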
FOLDOC (http://foldoc.org/) is an online 
computing dictionary in which each keyword and its 
related terms are tagged to show their 
relationships. Apted and Kay followed the original 
relationships between words and transformed all the 
keywords in the dictionary into a clear relation 
graph of keywords (Trent Apted, 2002). Although 
FOLDOC stores about 14,000 computing terms so far, 
many computing terms are still missing from it. 
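As a rough illustration of turning tagged dictionary entries into a keyword relation graph, the sketch below builds an adjacency map from cross-references. It is not the procedure of Apted and Kay; the entry format, the undirected edges, the function name build_relation_graph, and the sample entries are all hypothetical.

```python
from collections import defaultdict


def build_relation_graph(entries):
    """Build an undirected keyword graph from dictionary cross-references.

    entries maps each keyword to the list of related keywords tagged
    in its definition (a hypothetical, simplified entry format).
    """
    graph = defaultdict(set)
    for keyword, related in entries.items():
        graph[keyword]  # make sure keywords without links still become nodes
        for other in related:
            graph[keyword].add(other)
            graph[other].add(keyword)
    return dict(graph)


# Hypothetical entries; "grammar" is referenced but has no entry of its own.
entries = {
    "compiler": ["parser", "linker"],
    "parser": ["grammar"],
    "linker": [],
}

graph = build_relation_graph(entries)

# Keywords that appear only as cross-references, i.e. terms the
# dictionary mentions but does not define.
missing = sorted(set(graph) - set(entries))
```

Under these assumptions, the missing list gives a simple view of the coverage gap noted above: terms that occur in the graph only because other entries point to them.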
2.2  Features of Key Terms 
Besides documents as the input source, 
additional dictionaries are required to build an 
ontology (Sin-Jae, 2001 and Alexander, 2000). The 
features of key terms retrieved from documents and 
dictionaries help to build the ontology. These 
features fall into three kinds: document vectors, 
sememes, and meanings drawn from dictionaries. 
Sememes are defined as the smallest basic 
semantic units in HowNet (K. W. Gan, 2002). Some 
papers (Yi, 2002 and Yan-Hwang, 2005) used sememes 
as features for further processing. However, many 
computing terms are specialized terminology whose 
meanings can differ from those of their constituent 
words. Thus, using the sememes of computing terms 
as features is not feasible here. Finally, since 
FOLDOC does not contain enough computing terms for 
our work, the information in it is inadequate for 
providing further features. Therefore, we choose 
The Free Dictionary instead as the explicit feature 
provider. 
 
A FRAMEWORK AUTOMATING DOMAIN ONTOLOGY CONSTRUCTION