Tallinn University of Technology, Estonia
University of Applied Sciences and Arts, Germany
Ontology-based data integration, Duplicated attributes, Context-based similarity, Market basket analysis, ICD algorithm.
Coupling and Integrating Heterogeneous Data Sources
Databases and Information Systems Integration
Enterprise Information Systems
Semantic heterogeneity is the ambiguous interpretation of terms that describe the meaning of data in heterogeneous data sources such as databases. It is a well-known problem in data integration. A recent solution is to use ontologies, an approach known as ontology-based data integration. However, ontologies can contain duplicated attributes, which can lead to improper integration results. This paper proposes a novel approach that analyzes a workload of queries over an ontology to automatically calculate (semantic) distances between attributes, which are then used for duplicate detection.