IMPROVING DATA QUALITY IN DATA WAREHOUSING APPLICATIONS

Lin Li; Taoxin Peng; Jessie Kennedy

doi:10.5220/0002903903790382

2010

Abstract

There is a growing awareness that high-quality data is key to business success today, and dirty data that exists within data sources is one of the causes of poor data quality. To ensure high quality, enterprises need processes, methodologies and resources to monitor and analyze the quality of their data, as well as methodologies for preventing dirty data and/or detecting and repairing it. In practice, however, detecting and cleaning all the dirty data in all data sources is expensive and unrealistic, so most enterprises must take the cost of cleaning into account. A conflict therefore arises when an organization intends to clean its data warehouses: how should it select the most important data to clean, based on its business requirements? In this paper, business rules are used to classify dirty data types according to data quality dimensions. The proposed method helps to solve this problem by allowing users to select the appropriate group of dirty data types based on the priority of their business requirements. It also provides guidelines for measuring data quality with respect to different data quality dimensions, and it will be helpful for the development of data cleaning tools.
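The abstract does not spell out how the classification or the measurement is implemented. As a minimal sketch, assuming each business rule is a per-record check tagged with the dirty data type it detects and the data quality dimension that type affects, the Python below groups dirty data types by dimension, selects the groups belonging to a user's highest-priority dimensions, and scores each dimension as the fraction of records that satisfy all of its rules. The names `BusinessRule`, `group_by_dimension`, `select_by_priority` and `dimension_scores`, as well as the sample rules and records, are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass(frozen=True)
class BusinessRule:
    """Hypothetical business rule: a per-record check that detects one dirty data type."""
    name: str
    dirty_type: str                  # e.g. "missing value"
    dimension: str                   # data quality dimension the dirty type affects
    check: Callable[[Record], bool]  # True when the record satisfies the rule


def group_by_dimension(rules: List[BusinessRule]) -> Dict[str, List[str]]:
    """Classify dirty data types by the dimension their rules belong to."""
    groups: Dict[str, List[str]] = {}
    for rule in rules:
        types = groups.setdefault(rule.dimension, [])
        if rule.dirty_type not in types:
            types.append(rule.dirty_type)
    return groups


def select_by_priority(rules: List[BusinessRule], priorities: List[str],
                       budget: int) -> Dict[str, List[str]]:
    """Return the dirty data types of the `budget` highest-priority dimensions that have rules."""
    groups = group_by_dimension(rules)
    ranked = [dim for dim in priorities if dim in groups][:budget]
    return {dim: groups[dim] for dim in ranked}


def dimension_scores(records: List[Record], rules: List[BusinessRule]) -> Dict[str, float]:
    """Score each dimension as the share of records passing all its rules (1.0 = clean)."""
    by_dim: Dict[str, List[BusinessRule]] = {}
    for rule in rules:
        by_dim.setdefault(rule.dimension, []).append(rule)
    scores: Dict[str, float] = {}
    for dim, dim_rules in by_dim.items():
        passed = sum(1 for rec in records if all(rule.check(rec) for rule in dim_rules))
        scores[dim] = passed / len(records) if records else 1.0
    return scores


# Hypothetical example rules for a customer table.
RULES = [
    BusinessRule("customer_id present", "missing value", "completeness",
                 lambda r: r.get("customer_id") not in (None, "")),
    BusinessRule("email present", "missing value", "completeness",
                 lambda r: r.get("email") not in (None, "")),
    BusinessRule("age in range", "out-of-range value", "accuracy",
                 lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120),
    BusinessRule("country code format", "inconsistent format", "consistency",
                 lambda r: isinstance(r.get("country"), str)
                 and len(r["country"]) == 2 and r["country"].isupper()),
]

# Hypothetical sample records, some of them dirty.
SAMPLE_RECORDS: List[Record] = [
    {"customer_id": "C1", "email": "a@x.com", "age": 34, "country": "GB"},
    {"customer_id": "C2", "email": "", "age": 210, "country": "uk"},
    {"customer_id": None, "email": "c@x.com", "age": 51, "country": "FR"},
    {"customer_id": "C4", "email": "d@x.com", "age": 28, "country": "DE"},
]

if __name__ == "__main__":
    print(dimension_scores(SAMPLE_RECORDS, RULES))
    print(select_by_priority(RULES, ["accuracy", "completeness", "consistency"], budget=2))
```

With the sample data, completeness scores 0.5 (two records lack an identifier or e-mail address), while accuracy and consistency each score 0.75. Ranking dimensions by business priority and applying a budget is one simple way to choose which group of dirty data types to clean first; a real deployment would likely also need rules that span several records, such as duplicate detection, which this per-record sketch does not cover.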

Paper Citation

in Harvard Style

Li L., Peng T. and Kennedy J. (2010). IMPROVING DATA QUALITY IN DATA WAREHOUSING APPLICATIONS. In Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8425-04-1, pages 379-382. DOI: 10.5220/0002903903790382

in Bibtex Style

@conference{iceis10,
author={Lin Li and Taoxin Peng and Jessie Kennedy},
title={IMPROVING DATA QUALITY IN DATA WAREHOUSING APPLICATIONS},
booktitle={Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2010},
pages={379-382},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002903903790382},
isbn={978-989-8425-04-1},
}

in EndNote Style

TY  - CONF
JO  - Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI  - IMPROVING DATA QUALITY IN DATA WAREHOUSING APPLICATIONS
SN  - 978-989-8425-04-1
AU  - Li L.
AU  - Peng T.
AU  - Kennedy J.
PY  - 2010
SP  - 379
EP  - 382
DO  - 10.5220/0002903903790382
ER  - 