Authors:
Mujiono Sadikin 1; Adi Trisnojuwono 2; Rudiansyah 2 and Toni Widodo 3
Affiliations:
1 Faculty of Computer Science, Universitas Bhayangkara Jakarta Raya, Kota Bekasi, Indonesia
2 Deputy of Entrepreneurship, Ministry of Cooperation and Small Medium Enterprise, Jakarta, Indonesia
3 Mitreka Solusi Indonesia, Jakarta, Indonesia
Keyword(s):
MSME, Data Quality, Data Cleansing, Attribute Domain Constraint, Relation Integrity Rule.
Abstract:
As mandated by the laws and regulations issued, the Government of Indonesia has decided that Cooperation and MSME empowerment policies must be based on accurate Cooperation and MSME data profiles. The Government, through the Ministry of Cooperation and SMEs, is therefore running a complete data collection program for Cooperation and MSME profiles. The characteristics and constraints of this data collection introduce many risks that must be mitigated. The main risk identified in the program is reduced quality of the Cooperation and MSME data, which several factors can cause. This paper proposes a comprehensive framework to ensure the quality of Cooperation and MSME data, based on the data quality criteria previously defined by Khan. The framework aims to prevent, detect, repair, and recover dirty data so that the required minimum standards of data quality are met. It covers all stages and aspects of the data collection process. In the data cleaning and correction stage, we investigate several techniques, namely rule-based, selection-based, and machine-learning-based. As an initial validation of the framework, the results of several applied data cleansing methods are discussed.