
study. 
In this paper, we propose an approach to the
problems of change detection and warehouse
maintenance for an XML web warehouse system. First,
we propose an object-oriented data model for XML
web pages in the web warehouse, together with a
system architecture for change detection and
warehouse maintenance. Second, we propose a change
detection method based on mobile agent technology
that actively detects changes in the data sources of
the web warehouse. Finally, we propose an
incremental and deferred maintenance method to
maintain the XML web pages in the web warehouse. We
have implemented an experimental prototype system
for change detection and maintenance of an XML web
warehouse and have compared our approach
experimentally with a rewriting approach. The
performance evaluation shows that our approach is
efficient in terms of the response time and storage
space of the web warehouse.
The remainder of this paper is organized as
follows. Section 2 describes the data model of the
web warehouse and the system architecture for
change detection and warehouse maintenance.
Section 3 presents the change detection method and
algorithm. Section 4 presents the warehouse
maintenance method and algorithm. Section 5 reports
the experimental results and our observations on
them. Section 6 concludes the paper and gives some
directions for future research.
 
2  DATA MODEL AND SYSTEM ARCHITECTURE
2.1 Data Model 
Our web warehouse stores XML data from the Web.
Hence, we propose a data model, called the XML
Web Warehouse Data Model (XWWDM), for XML
web pages in the web warehouse. Because an XML web
page is hierarchically structured, we follow the
Document Object Model (Apparao, 1998) to
decompose an XML web page into a tree structure.
In addition, the design of the XWWDM model is based
on an OEM-like model (Chawathe, 1999) and takes
into account two characteristics of a web
warehouse. First, a web warehouse is like a data
warehouse in that it can store historical data.
Therefore, the data model includes version
information to keep track of changes to the data.
Second, data in a web warehouse are sourced from
remote web sites. Therefore, the data model
includes source information to identify the source
of the data. The XWWDM model is an object-oriented
model whose class definition is shown in Figure 1.
 
class XML_Page
  {root: XML_Node, version: Version_Info,
   source: Source_Info};
class XML_Node
  {content: Node_Content, version: Version_Info};
class Node_Content
  {label: string, value: string, p-node: XML_Node,
   child#: integer, s-action: char};
class Version_Info
  {version#: integer, update-time: time};
class Source_Info
  {url: string, title: string};
class Update
  {content: Update_Content, source: Source_Info};
class Update_Content
  {label: string, value: string, p-node: XML_Node,
   detect-time: time, action: char};
Figure 1: The class definition of the XWWDM model
 
An XML web page is represented as an object of
the class XML_Page, which has three attributes:
root, version, and source. The attribute root
records the root node of the tree structure of the
web page. The attributes version and source record
the newest version information and the source
information of the web page, respectively. Each
node of a web page is represented as an object of
the class XML_Node, which has two attributes:
content and version. The attribute content records
the content, position, and source action of a node.
The attribute version records the version
information of a node. The class Node_Content has
five attributes: label, value, p-node, child#, and
s-action. The attributes label and value record the
tag label and data content of a node, respectively.
The attributes p-node and child# record the parent
node of a node and its child number under that
parent, respectively. The attribute s-action
records the source action that caused the node to
be created; its value is I (for insertion), D (for
deletion), or M (for modification). The class
Version_Info has two attributes, version# and
update-time, which record the version number and
the time of the last update, respectively. The
class Source_Info has two attributes, url and
title, which record the URL and title of the source
web page, respectively.
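For illustration, the class definition in Figure 1 can be sketched as Python dataclasses. This is only a minimal sketch and not the prototype's implementation. The Python names are our own adaptations, since identifiers such as p-node, child#, and version# are not valid in Python (they become p_node, child_no, and version_no here). Using None as the parent of the root node, giving the root child number 0, and the helper example_page with its URL and node values are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VersionInfo:
    version_no: int        # version# in Figure 1
    update_time: datetime  # update-time in Figure 1


@dataclass
class SourceInfo:
    url: str    # URL of the source web page
    title: str  # title of the source web page


@dataclass
class NodeContent:
    label: str                    # tag label of the node
    value: str                    # data content of the node
    p_node: Optional["XMLNode"]   # parent node; None for the root (assumption)
    child_no: int                 # child# : child number under the parent
    s_action: str                 # source action: 'I', 'D', or 'M'

    def __post_init__(self) -> None:
        # Reject anything other than insertion, deletion, or modification.
        if self.s_action not in ("I", "D", "M"):
            raise ValueError(f"unknown source action: {self.s_action!r}")


@dataclass
class XMLNode:
    content: NodeContent
    version: VersionInfo


@dataclass
class XMLPage:
    root: XMLNode
    version: VersionInfo  # newest version information of the page
    source: SourceInfo


@dataclass
class UpdateContent:
    label: str
    value: str
    p_node: Optional["XMLNode"]
    detect_time: datetime  # detect-time in Figure 1
    action: str            # action in Figure 1


@dataclass
class Update:
    content: UpdateContent
    source: SourceInfo


def example_page() -> XMLPage:
    """Build a hypothetical two-node page in its first version."""
    v1 = VersionInfo(version_no=1, update_time=datetime(2003, 1, 1))
    root = XMLNode(NodeContent("catalog", "", None, 0, "I"), v1)
    # The child refers to its parent through p_node, as in Figure 1.
    XMLNode(NodeContent("book", "XML Basics", root, 1, "I"), v1)
    src = SourceInfo("http://example.com/catalog.xml", "Catalog")
    return XMLPage(root=root, version=v1, source=src)
```

Note that, as in Figure 1, a node only refers to its parent, so the tree is navigated upwards from a node; finding the children of a node requires a lookup over the stored nodes.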
We adopt a change-centric approach to storing
all versions of an XML web page. Only the first
version is stored completely; for subsequent
versions, only the deltas are stored. As shown in
Figure 2, all frames represent the same web page,
and each frame represents a specific version at
time Ti. The first frame represents the first
version, in which all nodes of the web page are
stored. The other frames represent subsequent
versions, in which only the changed nodes are
stored. The number and letter drawn beside a node
are the child number and source