
 a list of ontology features for each database. 
Ontologies encode domain knowledge. They provide a single identifier for each concept or entity in a domain and, beyond that, connect concepts with related meanings; ontologies can therefore drive data annotation and data integration.
Some databases have adopted the ontology concept and provide access to a library of biomedical ontologies and terminologies. Examples include the Gene Ontology database (The Gene Ontology Consortium; Ashburner, 2000), BRENDA (Schomburg, 2004), TAIR (The Arabidopsis Information Resource; Swarbreck, 2008), and the NCBO’s BioPortal (Musen, 2012). The ontology of
databases can be described as the relational schema of their tagged corpora. To classify the databases mined for BioMetaDB, we constructed our own ontology list, in which specification and conceptualization define the purpose of the ontology and provide the vocabulary, relationships, and concepts for its design. Ontological hierarchies with child-parent relationships (PART_OF/IS_A) were established to develop the domain ontology and its sub-ontologies for later use in the implementation. Besides the database content, we also drew on ontologies from other groups, such as the Gene Ontology database, BioPortal, the Open Biological and Biomedical Ontologies, the Proteomics Standards Initiative (Orchard, 2003), and the Consultative Group on International Agricultural Research. The relevance among the databases was calculated from their ontology features, and the databases were then grouped into categories. In BioMetaDB, each species is identified by its NCBI Taxonomy database identifier (taxid). To support search over large, open, and heterogeneous repositories of unstructured biomedical information, we needed to exploit deep levels of conceptualization not only of these databases but also of their corresponding publications and website contents.
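The PART_OF/IS_A hierarchy described above can be illustrated with a minimal Python sketch. The term names and edges below are hypothetical examples, not entries from the BioMetaDB ontology list; the sketch only shows how child-parent edges allow the ancestors of a term to be collected, which is one plausible way a query on a broader concept could also reach databases annotated with narrower ones.

```python
from collections import deque

# Hypothetical terms and edges for illustration; not taken from BioMetaDB's ontology list.
# Each edge is (child, relation, parent), where relation is "IS_A" or "PART_OF".
EDGES = [
    ("transcription factor binding site", "PART_OF", "gene regulation"),
    ("transcription factor", "IS_A", "protein"),
    ("transcription factor", "PART_OF", "gene regulation"),
    ("protein", "IS_A", "biomolecule"),
]

def build_parents(edges):
    """Map each child term to its list of (relation, parent) pairs."""
    parents = {}
    for child, relation, parent in edges:
        parents.setdefault(child, []).append((relation, parent))
    return parents

def ancestors(term, parents):
    """Collect every term reachable from `term` by following IS_A/PART_OF edges upward."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

parents = build_parents(EDGES)
print(sorted(ancestors("transcription factor", parents)))
# ['biomolecule', 'gene regulation', 'protein']
```

The breadth-first walk tracks visited terms, so a term reached through both an IS_A and a PART_OF path is reported only once.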
2.2 Relevance Measurement for Classification of Databases
We adopted hierarchical classification and relevance measurement to categorize the databases.
First, we indexed each database by its features, which were then used to evaluate the relevance between databases. The feature index of each database was also used to classify it. For example, databases A, B, and C can be indexed as {human, transcription factor, sequence}, {yeast, transcription factor}, and {human, transcription factor binding site}, respectively. Databases A and C belong to the “human” category, and database B belongs to the “yeast” category. When a user queries “human”, databases A and C are returned; if the query is “human” plus “transcription factor”, the output is database A. The goal
of the present work is to determine the relevance between each pair of databases, where each database has multiple biological features, for example, the species studied and the biological topic of focus. In the bag-of-indexes model, each database is represented as a vector in an N-dimensional space, where N is the total number of feature indexes. For the relevance calculation, we followed a previous database classification method (Wu, 2005): two databases that share a significant number of feature items are considered relevant to each other. For example,
we extracted the feature items of individual databases, such as A, B, and C. Two databases are represented as follows: D1 = {A, B, C} and D2 = {A, C, D, E}. The similarity S between the items of two databases can be defined as

$$S = \frac{|\mathrm{Item}(D_1) \cap \mathrm{Item}(D_2)|}{|\mathrm{Item}(D_1) \cup \mathrm{Item}(D_2)|}$$
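Reading the similarity as a ratio of set sizes (the Jaccard index), the two example databases D1 and D2 share the items {A, C} out of the combined items {A, B, C, D, E}, giving

$$S = \frac{|\{A, C\}|}{|\{A, B, C, D, E\}|} = \frac{2}{5} = 0.4$$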
Thus the relevance among various biomedical databases can be measured: a higher S value indicates a higher relevance between two databases.
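The indexing, query, and relevance steps of this subsection can be sketched in Python with the example databases A, B, and C. This is an illustrative sketch under stated assumptions, not the BioMetaDB implementation: a query is assumed to return every database whose feature index contains all query terms (exact string matching), the bag-of-indexes vectors are assumed to be binary, and relevance is computed as the set-size ratio S. The helper names `query`, `to_vectors`, and `similarity` are hypothetical.

```python
from itertools import combinations

# Feature indexes of the three example databases from the text.
DATABASES = {
    "A": {"human", "transcription factor", "sequence"},
    "B": {"yeast", "transcription factor"},
    "C": {"human", "transcription factor binding site"},
}

def query(databases, terms):
    """Assumed semantics: return databases whose feature index contains every query term."""
    wanted = set(terms)
    return sorted(name for name, features in databases.items() if wanted <= features)

def to_vectors(databases):
    """Encode each feature index as a 0/1 vector over the N distinct feature indexes."""
    vocabulary = sorted(set().union(*databases.values()))
    vectors = {name: [1 if f in features else 0 for f in vocabulary]
               for name, features in databases.items()}
    return vocabulary, vectors

def similarity(items1, items2):
    """S = |Item(D1) & Item(D2)| / |Item(D1) | Item(D2)| (Jaccard index)."""
    union = items1 | items2
    if not union:
        return 0.0
    return len(items1 & items2) / len(union)

print(query(DATABASES, ["human"]))                          # ['A', 'C']
print(query(DATABASES, ["human", "transcription factor"]))  # ['A']
vocabulary, vectors = to_vectors(DATABASES)
print(len(vocabulary))                                      # N = 5
print(similarity({"A", "B", "C"}, {"A", "C", "D", "E"}))    # 0.4

# Pairwise relevance between the example databases.
for d1, d2 in combinations(sorted(DATABASES), 2):
    print(d1, d2, similarity(DATABASES[d1], DATABASES[d2]))
# A B 0.25
# A C 0.25
# B C 0.0
```

Under these assumptions, databases A and B share only “transcription factor” (S = 0.25), while B and C share no feature (S = 0), so B and C would not be grouped together.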
2.3 Database and Query Implementation
We present an ontology-based classification and extension of multiple databases. BioMetaDB is curated by the authors and regularly updated (Fig. 1); Figure 1 shows the workflow used to build BioMetaDB. Web pages were generated with the PHP server-side scripting language, which retrieves data and maintains sessions between pages. The MySQL relational database management system was used to store the biodatabase information in a structured manner.
BioMetaDB (http://cbs.ym.edu.tw/services/BMdb/) provides versatile search functions, with multi-source, multi-category searching through ontologies and through researchers’ own keywords. Search is possible over the Web and over dedicated collections, and query results can be retrieved. A range of ontologies can be used without assuming that the databases have already been annotated. BioMetaDB also offers a database analysis function through online, query-biased summarization of individual databases and category sets, and the summarization criteria can be changed flexibly.
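The storage side can be sketched as a small relational schema. The sketch below uses Python's built-in `sqlite3` module as a self-contained stand-in for the MySQL back end and PHP front end described above, and its table, column, and function names are hypothetical; it only illustrates how databases, their NCBI taxids, and their ontology features could be linked so that a search by ontology term becomes a join and a search by species becomes a filter.

```python
import sqlite3

# Hypothetical schema for illustration; table and column names are not BioMetaDB's.
SCHEMA = """
CREATE TABLE biodatabase (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    taxid INTEGER  -- NCBI Taxonomy identifier of the species covered
);
CREATE TABLE ontology_term (
    id   INTEGER PRIMARY KEY,
    term TEXT NOT NULL UNIQUE
);
-- Many-to-many link between databases and their ontology features.
CREATE TABLE database_feature (
    database_id INTEGER REFERENCES biodatabase(id),
    term_id     INTEGER REFERENCES ontology_term(id),
    PRIMARY KEY (database_id, term_id)
);
"""

def databases_with_term(conn, term):
    """Return the names of databases annotated with the given ontology term."""
    rows = conn.execute(
        """SELECT b.name
           FROM biodatabase AS b
           JOIN database_feature AS f ON f.database_id = b.id
           JOIN ontology_term AS t ON t.id = f.term_id
           WHERE t.term = ?
           ORDER BY b.name""",
        (term,),
    )
    return [name for (name,) in rows]

def databases_for_taxid(conn, taxid):
    """Return the names of databases covering the species with the given NCBI taxid."""
    rows = conn.execute(
        "SELECT name FROM biodatabase WHERE taxid = ? ORDER BY name", (taxid,)
    )
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Example rows: 9606 is Homo sapiens and 4932 is Saccharomyces cerevisiae in NCBI Taxonomy.
conn.executemany("INSERT INTO biodatabase (id, name, taxid) VALUES (?, ?, ?)",
                 [(1, "A", 9606), (2, "B", 4932), (3, "C", 9606)])
conn.executemany("INSERT INTO ontology_term (id, term) VALUES (?, ?)",
                 [(1, "transcription factor"), (2, "sequence"),
                  (3, "transcription factor binding site")])
conn.executemany("INSERT INTO database_feature (database_id, term_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (3, 3)])

print(databases_with_term(conn, "transcription factor"))  # ['A', 'B']
print(databases_for_taxid(conn, 9606))                    # ['A', 'C']
```

Keeping ontology terms in their own table, linked through `database_feature`, lets one database carry any number of features and lets one term be shared by many databases, which matches the feature-index model of Section 2.2.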
BioMetaDB: Ontology-based Classification and Extension of Biodatabases