A Text Similarity-based Process for Extracting JSON Conceptual Schemas

Fhabiana Machado, Deise Saccol, Eduardo Piveta, Renata Padilha, Ezequiel Ribeiro

Abstract

NoSQL (Not Only SQL) document-oriented databases have gained prominence because of the need for scalability. This storage model promises flexible documents, using files and data sources in JSON (JavaScript Object Notation) format. It also allows documents within the same collection to have different fields. Such differences occur in database integration scenarios. When users need to access different data sources in a unified way, this can be troublesome, as there is no standardization across their structures. In this sense, this work presents a process for conceptual schema extraction from JSON datasets. Our proposal analyzes fields that represent the same information but are written differently. In the context of this work, differences in writing relate to synonyms and to character-level variations. To perform this analysis, we use character-based and knowledge-based similarity functions, as well as stemming. We therefore specify a process to extract the implicit schema present in these data sources by applying different textual equivalence techniques to field names. We applied the process in an experiment in the scientific publications domain, correctly identifying 80% of the equivalent terms. The process outputs a unified conceptual schema and the respective mappings for the equivalent terms, contributing to the schema integration problem.
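The field-equivalence step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions: a normalized Levenshtein distance serves as the character-based similarity function, a naive suffix-stripping stemmer replaces a real stemming algorithm such as Porter's, a tiny hand-written synonym table (`SYNONYMS`) stands in for a knowledge-based resource such as WordNet, and the 0.8 threshold is arbitrary. All function names (`normalize`, `stem`, `key`, `levenshtein`, `similarity`, `equivalent`, `unify_fields`) and the sample documents are hypothetical, and only top-level field names are compared.

```python
import json
import re
from itertools import combinations

# Hypothetical stand-in for a knowledge-based resource such as WordNet.
# Entries are stored in normalized, stemmed form.
SYNONYMS = {frozenset(("author", "writer"))}

# Naive suffix stripping: an assumption, not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "s")


def normalize(name):
    """Split camelCase and unify separators, e.g. 'pubYear' -> 'pub_year'."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return re.sub(r"[\s\-]+", "_", name).lower()


def stem(token):
    """Strip one known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def key(name):
    """Comparison key: normalized field name with each token stemmed."""
    return "_".join(stem(t) for t in normalize(name).split("_") if t)


def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Character-based similarity in [0, 1]: 1 - distance / longer length."""
    return 1 - levenshtein(a, b) / max(len(a), len(b), 1)


def equivalent(a, b, threshold=0.8):
    """Same stemmed key, listed synonyms, or similarity >= threshold (assumed 0.8)."""
    ka, kb = key(a), key(b)
    if ka == kb or frozenset((ka, kb)) in SYNONYMS:
        return True
    return similarity(ka, kb) >= threshold


def unify_fields(docs, threshold=0.8):
    """Group equivalent top-level field names; map a canonical name to its sources."""
    names = sorted({field for doc in docs for field in doc})
    parent = {n: n for n in names}  # union-find forest over field names

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if equivalent(a, b, threshold):
            parent[find(b)] = find(a)

    mapping = {}
    for n in names:
        # The normalized root name acts as the field in the unified schema.
        mapping.setdefault(normalize(find(n)), []).append(n)
    return mapping


documents = [json.loads(s) for s in (
    '{"title": "A", "author": "X", "pubYear": 2021}',
    '{"Title": "B", "writer": "Y", "pub_year": 2020}',
    '{"tittle": "C", "authors": ["Z"], "pages": "264-271"}',
)]
print(json.dumps(unify_fields(documents), indent=2))
```

Each key of the returned dictionary plays the role of a field in the unified schema, and its list holds the source field names mapped to it: `Title` matches `title` after case normalization, `authors` matches `author` after stemming, `writer` matches through the synonym table, and the typo `tittle` matches through character similarity. A real knowledge base would replace the synonym table, and the threshold would need tuning against labeled field pairs.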



Paper Citation


in Harvard Style

Machado F., Saccol D., Piveta E., Padilha R. and Ribeiro E. (2021). A Text Similarity-based Process for Extracting JSON Conceptual Schemas. In Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-509-8, pages 264-271. DOI: 10.5220/0010475102640271


in Bibtex Style

@conference{iceis21,
author={Fhabiana Machado and Deise Saccol and Eduardo Piveta and Renata Padilha and Ezequiel Ribeiro},
title={A Text Similarity-based Process for Extracting JSON Conceptual Schemas},
booktitle={Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2021},
pages={264-271},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010475102640271},
isbn={978-989-758-509-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - A Text Similarity-based Process for Extracting JSON Conceptual Schemas
SN - 978-989-758-509-8
AU - Machado F.
AU - Saccol D.
AU - Piveta E.
AU - Padilha R.
AU - Ribeiro E.
PY - 2021
SP - 264
EP - 271
DO - 10.5220/0010475102640271