Four key dimensions of data quality were considered for this prototype: accuracy, consistency, timeliness, and completeness. These dimensions are crucial for an automated, event-driven evaluation and can be assessed without additional contextual information (in contrast to, for instance, relevance). As part of the background analysis, we examined various data management solutions and found that most providers pursue static approaches that perform analyses only at fixed time intervals and offer limited scoring capabilities.
Our study describes the development of a data quality scoring model based on a mathematical formulation that enables a balanced assessment of data quality by applying a weighted scoring method across multiple specified criteria (such as accuracy and completeness). This offers a nuanced view of data quality that many existing frameworks lack.
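To make this concrete, the following Python sketch shows one possible realisation of such a weighted scoring and labelling step. It is a minimal sketch under our own assumptions: the weights, thresholds, function names, and A-D label scheme are illustrative and do not reflect the exact values or design used in the prototype.

# Minimal sketch of weighted data quality scoring and labelling.
# Weights, thresholds, and the A-D label scheme are illustrative
# assumptions, not the values used in the prototype described above.

def weighted_quality_score(metrics: dict[str, float],
                           weights: dict[str, float]) -> float:
    """Aggregate per-dimension scores (each in [0, 1]) into one overall score."""
    total_weight = sum(weights.values())
    return sum(metrics[dim] * weights[dim] for dim in weights) / total_weight

def quality_label(score: float) -> str:
    """Map the aggregated score onto a coarse, consumer-style label."""
    if score >= 0.9:
        return "A"
    if score >= 0.75:
        return "B"
    if score >= 0.5:
        return "C"
    return "D"

# Example with the four dimensions considered in the prototype.
metrics = {"accuracy": 0.92, "consistency": 0.88,
           "timeliness": 0.75, "completeness": 0.97}
weights = {"accuracy": 0.4, "consistency": 0.2,
           "timeliness": 0.2, "completeness": 0.2}

score = weighted_quality_score(metrics, weights)
print(f"score={score:.2f}, label={quality_label(score)}")  # score=0.89, label=B

Normalising by the total weight keeps the aggregated score in [0, 1] even if the weights do not sum to one, which simplifies adding further dimensions later.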
The contributions of our research are two-fold.
For practitioners, we provide a detailed description of
an artifact that aims to automate data quality scoring
and the labelling of data sets. Practitioners can use our
descriptions and findings to create custom solutions
in their environments. Moreover, they can use our conceptual approach and evaluation results to raise awareness of data quality and to initiate new projects. Scientifically, we offer a design science
artifact that can inform further research and help
advance the fields of data quality and data
management (Hevner et al., 2004). Our research also addresses calls by several researchers to improve the communication of data quality scores (Geisler et al., 2022; Guggenberger et al., 2024) and can support future development of and research on data ecosystems.
In terms of limitations and future work, there are
multiple areas that could be improved. First, our
solution is currently focused on a limited number of
data quality metrics. By transforming the prototype
into a modular architecture and integrating additional
metrics, we could extend the data quality scoring functionalities and offer a more comprehensive data quality label. Second, our evaluation is currently limited to exemplary data sets. A more empirical and in-depth evaluation is necessary to assess the applicability and usefulness of our solution in real-world contexts. Future work should, therefore, focus on applying the solution to real-world data sets to identify areas for improvement and future development.
REFERENCES
Alavi, M. (1984). An assessment of the prototyping
approach to information systems development.
Communications of the ACM, 27(6), 556–563.
https://doi.org/10.1145/358080.358095
Altendeitering, M., Dübler, S., & Guggenberger, T. M.
(2022). Data Quality in Data Ecosystems: Towards a
Design Theory: Findings from an Action Design
Research Project at Boehringer Ingelheim. AMCIS
2022 Proceedings.
Altendeitering, M., Guggenberger, T. M., & Möller, F.
(2024). A design theory for data quality tools in data
ecosystems: Findings from three industry cases. Data
& Knowledge Engineering, 153, 102333. https://
doi.org/10.1016/j.datak.2024.102333
Altendeitering, M., & Tomczyk, M. (2022). A Functional
Taxonomy of Data Quality Tools: Insights from
Science and Practice. Wirtschaftsinformatik 2022
Proceedings.
Amit, R., & Zott, C. (2001). Value creation in E‐business.
Strategic Management Journal, 22(6-7), 493–520.
https://doi.org/10.1002/smj.187
Borra, S. (2006). Consumer perspectives on food labels.
The American Journal of Clinical Nutrition, 83(5),
1235S. https://doi.org/10.1093/ajcn/83.5.1235S
Chankong, V., & Haimes, Y. Y. (2008). Multiobjective decision making: Theory and methodology. Dover Publications.
Gartner. (2024). Gartner Magic Quadrant for Augmented
Data Quality Solutions. https://www.gartner.com/en/
documents/5257863
Geisler, S., Vidal, M.‑E., Cappiello, C., Lóscio, B. F.,
Gal, A., Jarke, M., Lenzerini, M., Missier, P., Otto, B.,
Paja, E., Pernici, B., & Rehof, J. (2022). Knowledge-
Driven Data Ecosystems Toward Data Transparency.
Journal of Data and Information Quality, 14(1), 1–12.
https://doi.org/10.1145/3467022
Gori, M., Monfardini, G., & Scarselli, F. (2005). A New Model for Learning in Graph Domains. Proceedings of the IEEE International Joint Conference on Neural Networks, 2, 729–734.
Guggenberger, T. M., Altendeitering, M., & Schlueter
Langdon, C. (2024). Design Principles for Quality
Scoring: Coping with Information Asymmetry of Data
Products. HICSS 2024 Proceedings, 4526–4535.
Hallinan, D., Leenes, R., Gutwirth, S., & Hert, P. de (Eds.).
(2020). Computers, Privacy and Data Protection Series: Vol. 12. Data protection and privacy: Data protection
and democracy. Hart. https://ebookcentral.proquest.
com/lib/kxp/detail.action?docID=6160332
Hercberg, S., Touvier, M., & Salas-Salvadó, J. (2022). The Nutri-Score nutrition label. International Journal for Vitamin and Nutrition Research, 92(3-4), 147–157. https://doi.org/10.1024/0300-9831/a000722
Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design
Science in Information Systems Research. MIS
Quarterly, 28(1), 75–105. https://doi.org/10.
2307/25148625
Legner, C., Pentek, T., & Otto, B. (2020). Accumulating
Design Knowledge with Reference Models: Insights
from 12 Years’ Research into Data Management.
Journal of the Association for Information Systems,
21(3), 735–770. https://doi.org/10.17705/1jais.00618