A Data Lake Metadata Enrichment Mechanism via Semantic Blueprints

Michalis Pingos, Andreas Andreou

2022

Abstract

One of the greatest challenges in Smart Big Data Processing today is handling multiple heterogeneous data sources that produce massive amounts of structured, semi-structured and unstructured data through Data Lakes. Data Lakes require a disciplined approach to collecting, storing and retrieving/analysing data in order to enable efficient predictive and prescriptive modelling, as well as the development of other advanced analytics applications on top of them. This paper addresses this highly complex problem and proposes a novel standardization framework that mainly combines the 5Vs Big Data characteristics, blueprint ontologies and a Data Lake architecture with ponds, offering a semantic metadata enrichment mechanism that enables fast storage in, and efficient retrieval from, a Data Lake. The proposed mechanism is compared qualitatively against existing metadata systems using a set of functional characteristics or properties, and the results indicate that it is a promising approach.


Paper Citation


in Harvard Style

Pingos M. and Andreou A. (2022). A Data Lake Metadata Enrichment Mechanism via Semantic Blueprints. In Proceedings of the 17th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE, ISBN 978-989-758-568-5, pages 186-196. DOI: 10.5220/0011080400003176


in Bibtex Style

@conference{enase22,
author={Michalis Pingos and Andreas Andreou},
title={A Data Lake Metadata Enrichment Mechanism via Semantic Blueprints},
booktitle={Proceedings of the 17th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE},
year={2022},
pages={186-196},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011080400003176},
isbn={978-989-758-568-5},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 17th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE
TI - A Data Lake Metadata Enrichment Mechanism via Semantic Blueprints
SN - 978-989-758-568-5
AU - Pingos M.
AU - Andreou A.
PY - 2022
SP - 186
EP - 196
DO - 10.5220/0011080400003176