RELATIONAL SAMPLING FOR DATA QUALITY AUDITING AND DECISION SUPPORT

Bruno Cortes, José Nuno Oliveira

2004

Abstract

This paper presents a strategy for applying sampling techniques to relational databases in the context of data quality auditing and decision support processes. Fuzzy cluster sampling is used to survey sets of records for compliance with business rules. Relational algebra estimators are presented as a tool for data quality auditing.



Paper Citation


in Harvard Style

Cortes B. and Nuno Oliveira J. (2004). RELATIONAL SAMPLING FOR DATA QUALITY AUDITING AND DECISION SUPPORT. In Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 972-8865-00-7, pages 376-382. DOI: 10.5220/0002630403760382


in Bibtex Style

@conference{iceis04,
author={Bruno Cortes and José Nuno Oliveira},
title={RELATIONAL SAMPLING FOR DATA QUALITY AUDITING AND DECISION SUPPORT},
booktitle={Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2004},
pages={376-382},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002630403760382},
isbn={972-8865-00-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - RELATIONAL SAMPLING FOR DATA QUALITY AUDITING AND DECISION SUPPORT
SN - 972-8865-00-7
AU - Cortes B.
AU - Nuno Oliveira J.
PY - 2004
SP - 376
EP - 382
DO - 10.5220/0002630403760382