
 
queries based on previous knowledge about the data 
to be queried, considering quality dimensions such 
as completeness, timeliness and accuracy. The 
described approach, however, does not use XML as 
the canonical data model and does not address 
physical algebraic query plan implementation issues. 
8  CONCLUSIONS AND FUTURE WORK
With the ubiquitous growth, availability, and usage 
of data on the web, addressing data quality 
requirements in connection with web queries is 
emerging as a key priority for database research 
(Gertz, M., Ozsu, T., Saake, G., and Sattler, K., 
2003). There are two established approaches for 
addressing data quality issues relating to web data: 
data warehouse-based, where relevant data is 
reconciled, cleansed and warehoused prior to 
querying; and mediator-based, where quality metrics 
and thresholds relating to cooperative web data 
sources are evaluated “on the fly” at query 
processing and execution time. In this paper, we 
illustrate the query processing extensions being 
engineered into the Niagara Internet Query System to 
support mediator-based, quality-aware query 
processing for the completeness data quality 
dimension. We are also addressing the timeliness 
dimension (Sampaio, S. F. M., Dong, C., and 
Sampaio, P. R. F., 2005) and extending SQL with 
data quality constructs to express data quality 
requirements (Dong, C., Sampaio, S. F. M., and 
Sampaio, P. R. F., 2006). The data quality-aware 
query processing extensions encompass metadata 
support, an XML-based data quality measurement 
method, algebraic query processing operators, and 
query plan structures. Together they form a query 
processing framework aimed at helping users to 
identify, assess, and filter out data whose 
completeness is too low for the intended use. In 
future work, we intend to incorporate support for the 
accuracy data quality dimension into the framework 
and to benchmark the quality/cost query optimiser in 
connection with a health care application (Dong, C., 
Sampaio, S. F. M., and Sampaio, P. R. F., 2005). 
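The idea of filtering out data of low completeness at query time can be illustrated with a minimal sketch. It is not the Niagara implementation: the function names, the record layout, and the completeness measure (the fraction of required fields that are present and non-empty) are all assumptions made for illustration only.

```python
from typing import Iterable, Iterator, Mapping, Optional, Sequence, Tuple

# Hypothetical record layout: a flat mapping from field name to value.
Record = Mapping[str, Optional[str]]


def completeness(record: Record, required: Sequence[str]) -> float:
    """Assumed measure: share of required fields that are present and non-empty."""
    if not required:
        return 1.0
    present = sum(1 for field in required if record.get(field) not in (None, ""))
    return present / len(required)


def completeness_filter(
    source: Iterable[Record], required: Sequence[str], threshold: float
) -> Iterator[Tuple[Record, float]]:
    """Iterator-style operator: yield (record, score) pairs that meet the threshold."""
    for record in source:
        score = completeness(record, required)
        if score >= threshold:
            yield record, score


# Illustrative data only; not taken from the paper.
patients = [
    {"id": "p1", "name": "Ann", "dob": "1970-01-01"},
    {"id": "p2", "name": None, "dob": None},
    {"id": "p3", "name": "Bob", "dob": ""},
]
kept = list(completeness_filter(patients, ["id", "name", "dob"], threshold=0.6))
```

Here the threshold plays the role of the user's completeness requirement, and the operator consumes its input one record at a time, so it could in principle be placed anywhere in an iterator-based query plan.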
REFERENCES 
Naughton, J., DeWitt, D., Maier, D., et al., 2001. The 
Niagara Internet Query System. IEEE Data Eng. Bull. 
24(2), 27-33.  
Olson, J., 2003. Data Quality: The Accuracy Dimension, 
Morgan Kaufmann. 1st edition. 
http://www.rcuk.ac.uk/escience. The UK e-Science 
Programme. 
Wiederhold, G., 1992. Mediators in the Architecture of 
Future Information Systems. IEEE Computer 25(3).  
Helfert, M., and von Maur, E., 2001. A Strategy for 
Managing Data Quality in Data Warehouse Systems. 
In Proc. of Information Quality, 62-76.  
Wang, R., and Madnick, S. E., 1989. The Inter-Database 
Instance Identification Problem in Integrating 
Autonomous Systems. Proc. of ICDE, 46-55.  
Wang, R. Y., Reddy, M. P., and Kon, H. B., 1995. Toward 
Quality Data: An Attribute-Based Approach. Decision 
Support Systems, 13(3-4), 349-372.  
Sampaio, S. F. M., Dong, C., and Sampaio, P. R. F., 2005. 
Incorporating the Timeliness Quality Dimension in 
Internet Query Systems. WISE 2005 Workshops, 
LNCS 3807, 53-62.  
Dong, C., Sampaio, S. F. M., and Sampaio, P. R. F., 2006. 
Expressing and Processing Timeliness Quality Aware 
Queries: The DQ2L Approach. International 
Workshop on Quality of Information Systems, ER 
2006 Workshops, LNCS 4231, 382-392.  
Naumann, F., Leser, U., and Freytag, J., 1999. 
Quality-driven Integration of Heterogeneous Information 
Systems. In Proc. of the 25th VLDB, 447-458.  
Mecella, M., Scannapieco, M., et al. The DaQuinCIS 
Broker: Querying Data and Their Quality in 
Cooperative Information Systems. LNCS 2800. 
Dong, C., Sampaio, S. F. M., and Sampaio, P. R. F., 2005. 
Building a Data Quality Aware Internet Query System 
for Health Care Applications. In Proceedings of IRMA 
Conference - Databases Track, San Diego, USA. 
Graefe, G., 1996. Iterators, Schedulers, and 
Distributed-memory Parallelism. Software: Practice and 
Experience, 26(4), 427-452. 
Gertz, M., Ozsu, T., Saake, G., and Sattler, K., 2003. Data 
Quality on the Web. Germany, Dagstuhl Seminar.  
Pipino, L. L., Lee, Y. W., and Wang, R. Y., 2002. Data 
Quality Assessment. Communications of the ACM, 45(4) 
(virtual extension). 
A COMPLETENESS-AWARE DATA QUALITY PROCESSING APPROACH FOR WEB QUERIES