An Advanced Entity Resolution in Data Lakes: First Steps

Lamisse F. Bouabdelli, Fatma Abdelhedi, Slimane Hammoudi, Allel Hadjali

2025

Abstract

Entity Resolution (ER), the task of identifying different descriptions that refer to the same real-world entity, is a critical challenge for maintaining data quality in data lakes. We address the problem of ER in data lakes, whose schema-less architecture and heterogeneous data sources often lead to entity duplication, inconsistency, and ambiguity, causing serious data quality issues. Although ER has been well studied in both academic research and industry, many state-of-the-art ER solutions have significant drawbacks. Existing solutions typically compare two entities based on attribute similarity, without taking into account that some attributes contribute more than others to distinguishing entities. In addition, traditional validation methods that rely on human experts are often error-prone, time-consuming, and costly. We propose an efficient ER approach that leverages deep learning, knowledge graphs (KGs), and large language models (LLMs) to automate and enhance entity disambiguation. The matching task incorporates attribute weights, thereby improving accuracy. By integrating an LLM for automated validation, the approach significantly reduces the reliance on manual expert verification while maintaining high accuracy.
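
To make the idea of attribute-weighted matching concrete, the following is a minimal sketch and not the authors' method. It assumes records are flat dictionaries, uses Python's standard `difflib.SequenceMatcher` as a stand-in string similarity, and takes hand-picked weights. The function names `attribute_similarity` and `weighted_match_score`, the example records, and the weight values are all hypothetical; the abstract does not say how weights are obtained or which similarity measure is used.

```python
from difflib import SequenceMatcher


def attribute_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in measure, an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_match_score(left: dict, right: dict, weights: dict) -> float:
    """Weighted average of per-attribute similarities.

    Only attributes present in both records and listed in `weights`
    contribute; the result is normalised by the sum of the weights used.
    """
    total = 0.0
    weight_sum = 0.0
    for attr, weight in weights.items():
        if attr in left and attr in right:
            sim = attribute_similarity(str(left[attr]), str(right[attr]))
            total += weight * sim
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Hypothetical records describing possibly the same organisation.
record_a = {"name": "Acme Corp", "city": "Paris"}
record_b = {"name": "Acme Corp", "city": "Lyon"}

# Hypothetical weights: the name is assumed to be more discriminating than the city.
name_heavy = {"name": 0.8, "city": 0.2}
city_heavy = {"name": 0.2, "city": 0.8}

score_name_heavy = weighted_match_score(record_a, record_b, name_heavy)
score_city_heavy = weighted_match_score(record_a, record_b, city_heavy)
```

With these inputs the two records agree on the name and differ on the city, so the name-heavy weighting yields the higher score. This illustrates how the choice of weights changes the match decision for the same pair.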


Paper Citation


in Harvard Style

Bouabdelli L., Abdelhedi F., Hammoudi S. and Hadjali A. (2025). An Advanced Entity Resolution in Data Lakes: First Steps. In Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-758-0, SciTePress, pages 661-668. DOI: 10.5220/0013643200003967


in Bibtex Style

@conference{data25,
author={Lamisse Bouabdelli and Fatma Abdelhedi and Slimane Hammoudi and Allel Hadjali},
title={An Advanced Entity Resolution in Data Lakes: First Steps},
booktitle={Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2025},
pages={661-668},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013643200003967},
isbn={978-989-758-758-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - An Advanced Entity Resolution in Data Lakes: First Steps
SN - 978-989-758-758-0
AU - Bouabdelli L.
AU - Abdelhedi F.
AU - Hammoudi S.
AU - Hadjali A.
PY - 2025
SP - 661
EP - 668
DO - 10.5220/0013643200003967
PB - SciTePress