
into XML, including their contributions and 
drawbacks. Our DWG2XML method is presented in 
Section 3. Section 4 concludes the paper. 
2  OVERVIEW OF THE EXISTING 
APPROACHES 
2.1 Reverse engineering approach 
Alhajj (2003) presents a reverse engineering 
approach that extracts the extended 
entity-relationship (EER) schema from the 
relational schema. The concepts and mechanisms 
provided support legacy database maintenance, 
re-engineering, or migration to another 
database technology. Based on the analysis of 
the relationships between tables in a legacy 
database, a relational intermediate directed (RID) 
graph consistent with the EER diagram is derived to 
express all possible unary, binary and n-ary 
relationships between the given relations. 
Algorithms are then applied to eliminate any 
symmetry and transitivity in the RID graph. 
The approach also identifies is-a 
links in the RID graph to deliver an optimized RID 
as the final outcome, which can be used to derive the 
XML schema. Such a conversion has been 
implemented by Wang et al. (2004), who 
translate the RID graph into an XML schema in a 
process called forward engineering; a flat XML 
schema is derived automatically from the RID 
graph. Our DWG2XML approach can be seen as an 
extension that completes the translation from 
the RID graph to a nested XML schema, as 
described in Section 3. 
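To make the graph-simplification step more concrete, the following is a minimal sketch of one plausible reading of "eliminating transitivity" and detecting symmetry in a directed graph of tables. It is not the algorithm of Alhajj (2003): the edge direction (referencing table to referenced table), the function names, and the table names are all assumptions made for illustration.

```python
# Illustrative sketch only; names and edge semantics are assumptions.

def build_graph(foreign_keys):
    """Edge (a, b) means table a holds a foreign key referencing table b."""
    graph = {}
    for src, dst in foreign_keys:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    return graph


def symmetric_pairs(graph):
    """Pairs of tables that reference each other in both directions."""
    return sorted(
        (a, b)
        for a in graph
        for b in graph[a]
        if a < b and a in graph.get(b, ())
    )


def reachable(graph, start, goal, skip_edge):
    """True if goal can be reached from start without using skip_edge."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if (node, nxt) == skip_edge:
                continue
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def drop_transitive_edges(graph):
    """Remove an edge a -> c when c is still reachable from a without it."""
    reduced = {node: set(targets) for node, targets in graph.items()}
    for src in sorted(graph):
        for dst in sorted(graph[src]):
            if reachable(reduced, src, dst, (src, dst)):
                reduced[src].discard(dst)
    return reduced


# Hypothetical schema: OrderLine references Order and Customer,
# and Order references Customer.
foreign_keys = [
    ("OrderLine", "Order"),
    ("Order", "Customer"),
    ("OrderLine", "Customer"),
]
simplified = drop_transitive_edges(build_graph(foreign_keys))
print(simplified)
```

In this hypothetical schema the direct edge from OrderLine to Customer is dropped, because Customer remains reachable through Order. On graphs that contain cycles the result depends on the order in which edges are examined, so the sketch should be read as an illustration rather than a canonical reduction.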
2.2 CoT and NeT 
Lee et al. (2001; 2002a; 2002b) proposed an 
approach for creating both flat and nested XML 
structures from the relational database schema. 
The Flat Translation (FT) converts each table 
into a flat element structure. The Nesting-based 
Translation (NeT) derives nested structures from 
a flat relational model by the use of the nest 
operator. The nest operator is applied to a 
single table at a time, and it can create nested 
structures only for tables that are not fully 
normalized. NeT is useful for reducing data 
redundancy in relational databases that are not 
fully normalized, but it works on only one table 
at a time and depends on the actual data stored 
in the database as well as on the relational 
schema. 
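The nest operator can be illustrated with a small sketch. The code below follows the usual nested-relational definition (rows that agree on every other attribute are merged, and the values of the nested attribute are collected) and then renders the result as XML. It is not the NeT implementation of Lee et al.; the function names, the `_list` root suffix, and the enrolment data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def nest(rows, attr):
    """Group rows that agree on all attributes except `attr`
    and collect the `attr` values of each group into a list."""
    groups = {}
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        groups.setdefault(key, []).append(row[attr])
    return [{**dict(key), attr: values} for key, values in groups.items()]


def to_xml(tag, nested_rows, attr):
    """Render nested rows as XML; the nested attribute repeats as elements."""
    root = ET.Element(tag + "_list")  # root name is an arbitrary choice
    for row in nested_rows:
        elem = ET.SubElement(root, tag)
        for name, value in row.items():
            if name == attr:
                for item in value:
                    ET.SubElement(elem, name).text = str(item)
            else:
                ET.SubElement(elem, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical table that is not fully normalized:
# the student name repeats for every course.
rows = [
    {"student": "Ann", "course": "DB"},
    {"student": "Ann", "course": "XML"},
    {"student": "Bob", "course": "DB"},
]
nested = nest(rows, "course")
print(to_xml("enrolment", nested, "course"))
```

The grouping is driven by the stored rows rather than by the schema alone, which mirrors the data dependence noted above: a table whose rows share no repeated values yields no nesting at all.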
Lee et al. then extended the nesting approach to 
multiple tables using the Constraints-based 
Translation (CoT) algorithm, one of the first 
approaches to deal with relationships. The source 
database contains several interconnected tables 
and, based on the cardinality of the binary 
relationships, two types are identified: 
one-to-one (1:1) and one-to-many (1:M). A 
directed Inclusion Dependency (IND) graph of 
tables is created, from which an empirical way to 
nest XML structures is identified. However, a 
table can only have one child. If a parent table 
has more child relations, these relationships are 
simulated by using reference key expressions. 
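The one-child restriction described above can be sketched as a simple planning step over the relationships. This is an illustration of the restriction as stated in this section, not the CoT algorithm itself; the function name, the rule that the first child listed is the one nested, and the table names are assumptions.

```python
def plan_nesting(relationships):
    """relationships: list of (parent, child, cardinality) tuples.

    Returns (nested, referenced). `nested` maps each parent to the single
    child nested beneath it; `referenced` lists the (parent, child) pairs
    that are kept as key references instead.
    """
    nested, referenced = {}, []
    for parent, child, cardinality in relationships:
        if cardinality not in ("1:1", "1:M"):
            raise ValueError("unsupported cardinality: " + cardinality)
        if parent in nested:
            # Assumption: the parent already has its one nested child,
            # so any further child is linked by a key reference.
            referenced.append((parent, child))
        else:
            nested[parent] = child
    return nested, referenced


# Hypothetical schema with two children under Dept.
relationships = [
    ("Dept", "Employee", "1:M"),
    ("Dept", "Project", "1:M"),
    ("Employee", "Badge", "1:1"),
]
nested, referenced = plan_nesting(relationships)
print(nested)
print(referenced)
```

Here Employee is nested under Dept, while Project falls back to a key reference because Dept already has a nested child. Which child is nested depends only on input order in this sketch.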
2.3 ConvRel and Conv2XML 
Conv2XML and ConvRel are two algorithms 
proposed by Duta et al. (2004) for converting a 
relational schema to an XML Schema, focusing on 
preserving the source relationships and their 
structural constraints. 
ConvRel analyzes each type of relationship and 
determines a set of candidate XML structures 
capable of representing it. The possible XML 
structures are classified as Parent-Child and 
Child-Parent nested structures, a flat structure 
using keyref references, and a combination of 
nesting with keyref references. These candidates 
are filtered by criteria such as how nested and 
compact the structure is and the size of the 
resulting XML data file. ConvRel thus maps each 
type of relationship in the database to its 
best-fitting XML structure. However, this 
approach works with only a single relationship 
at a time; it is not applicable to relationships 
involving more than two tables. 
The Conv2XML algorithm extends ConvRel to 
create a nested structure for the entire database. It 
uses a graph representation that combines all 
structures discovered previously in ConvRel. In this 
graph, the vertices are tables and edges represent 
connections between tables as defined by ConvRel. 
Two categories of edges exist in this directed graph: 
1) full edges representing nested structures; and 2) 
dotted edges representing relationships for the 
reference key. The conversion is thereby 
reduced to the problem of discovering trees in 
a directed graph.  
Compared to the NeT and CoT approaches, the 
ConvRel and Conv2XML approach solves the problem 
of unary relationships and can also present 
multiple tables as a tree structure. However, 
different nested tree structures can be derived 
from the same directed graph. The method proposed 
by Duta et al. is a depth-first algorithm, which 
ends up with only one tree structure. DWG2XML, as 
described in this paper, is more comprehensive in 
that it considers all possible tree structures. 
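The difference between a single depth-first tree and the full set of candidate trees can be shown on a small directed graph. The sketch below is neither Conv2XML nor DWG2XML: it is a brute-force illustration that assumes a known root table, represents each tree as a child-to-parent map, and uses hypothetical table names.

```python
from itertools import product


def dfs_tree(graph, root):
    """One nesting tree from a depth-first walk, as a child -> parent map."""
    parent = {}

    def visit(node):
        for child in graph.get(node, []):
            if child != root and child not in parent:
                parent[child] = node
                visit(child)

    visit(root)
    return parent


def reaches_root(parent, node, root):
    """Follow parent links; False if a cycle is met before the root."""
    seen = set()
    while node != root:
        if node in seen or node not in parent:
            return False
        seen.add(node)
        node = parent[node]
    return True


def all_trees(graph, root):
    """Every spanning tree rooted at `root`, as child -> parent maps."""
    nodes = set(graph) | {c for children in graph.values() for c in children}
    others = sorted(nodes - {root})
    # Each non-root table may be nested under any table with an edge to it.
    candidates = [[p for p in sorted(graph) if n in graph[p]] for n in others]
    trees = []
    for choice in product(*candidates):
        parent = dict(zip(others, choice))
        if all(reaches_root(parent, n, root) for n in others):
            trees.append(parent)
    return trees


# Hypothetical graph: Invoice can be nested under Customer or under Order.
graph = {
    "Customer": ["Order", "Invoice"],
    "Order": ["Invoice"],
    "Invoice": [],
}
print(dfs_tree(graph, "Customer"))
print(all_trees(graph, "Customer"))
```

For this graph the depth-first walk returns only the tree that nests Invoice under Order, while the enumeration also finds the tree that nests both Order and Invoice directly under Customer. The enumeration is exponential in the worst case, so it is meant only to make the contrast visible.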
ICEIS 2005 - DATABASES AND INFORMATION SYSTEMS INTEGRATION