SEALM: Semantically Enriched Attributes with Language Models for Linkage Recommendation
Leonard Traeger, Andreas Behrend, George Karabatis
2025
Abstract
Matching attributes from different repositories is an important step in schema integration, which consolidates heterogeneous data silos. Recommending linkages between relevant attributes requires a contextually rich representation of each attribute, particularly when more than two database schemas are to be integrated. This paper introduces the SEALM approach, which generates a data catalog of semantically rich attribute descriptions using Generative Language Models, based on a new technique that employs six variations of available metadata information. Instead of using raw attribute metadata, we generate SEALM descriptions, which are used to recommend linkages with an unsupervised matching pipeline that involves a novel multi-source Blocking algorithm. Experiments on multiple schemas yield a 5% to 20% recall improvement over existing techniques in recommending linkages when the SEALM-based attribute descriptions are generated by the smallest Llama 3.1 model (Llama3.1:8B). With SEALM, we only need to process the small fraction of attributes to be integrated rather than exhaustively inspecting all combinations of potential linkages.
Paper Citation
in Harvard Style
Traeger L., Behrend A. and Karabatis G. (2025). SEALM: Semantically Enriched Attributes with Language Models for Linkage Recommendation. In Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-749-8, SciTePress, pages 39-50. DOI: 10.5220/0013217700003929
in Bibtex Style
@conference{iceis25,
author={Leonard Traeger and Andreas Behrend and George Karabatis},
title={SEALM: Semantically Enriched Attributes with Language Models for Linkage Recommendation},
booktitle={Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2025},
pages={39-50},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013217700003929},
isbn={978-989-758-749-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - SEALM: Semantically Enriched Attributes with Language Models for Linkage Recommendation
SN - 978-989-758-749-8
AU - Traeger L.
AU - Behrend A.
AU - Karabatis G.
PY - 2025
SP - 39
EP - 50
DO - 10.5220/0013217700003929
PB - SciTePress