Authors: María del Pilar Angeles and Noemi Bailón-Miguel

Affiliation: Facultad de Ingeniería, Universidad Nacional Autónoma de México, Mexico

Keyword(s): Data mining; Data matching; Record linkage; Data cleansing.

Abstract: Many businesses running big data projects suffer from duplicate data, which seriously impedes managers' ability to make well-informed decisions. Spanish is largely pronounced as it is written, yet most phonetic techniques for duplicate detection are based on English and are not oriented to the Spanish language, so they are not suitable for identifying and correcting problems such as spelling errors in low-quality Spanish text. In this paper we have implemented, modified and utilized in SEUCAD (Angeles, 2014) three Spanish phonetic algorithms to detect duplicate text strings in the presence of spelling errors in Spanish. The results were satisfactory: the Phonetic Spanish algorithm performed best most of the time, demonstrating opportunities for improved performance of Spanish encoding during the record linkage process.
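To make the idea of phonetic encoding for duplicate detection concrete, the following is a minimal sketch in Python. It is not one of the three algorithms evaluated in the paper and it is not part of SEUCAD; the function names `spanish_key` and `group_by_key` and the small rule set are assumptions chosen only for illustration. The sketch maps Spanish spellings that sound alike (b/v, s/z, soft c, silent h, ll/y, soft g/j) to the same key, so that strings differing only by such spelling errors fall into the same group.

```python
import unicodedata
from collections import defaultdict

# Hypothetical, simplified substitution rules, applied in order.
# They are illustrative only and do not reproduce the paper's algorithms.
_RULES = [
    ("ch", "C"),   # protect the digraph "ch" before dropping silent "h"
    ("h", ""),     # "h" is silent in Spanish
    ("v", "b"),    # "b" and "v" sound alike
    ("z", "s"),    # "z" and "s" sound alike in Latin American Spanish
    ("ce", "se"),  # soft "c" before "e"
    ("ci", "si"),  # soft "c" before "i"
    ("qu", "k"),   # "que"/"qui" are pronounced "ke"/"ki"
    ("c", "k"),    # any remaining "c" is hard
    ("ge", "je"),  # soft "g" before "e"
    ("gi", "ji"),  # soft "g" before "i"
    ("ll", "y"),   # "ll" and "y" sound alike in most dialects
]


def _strip_accents(text):
    """Remove diacritics but keep the letter 'ñ', which is a distinct sound."""
    out = []
    for ch in text:
        if ch == "ñ":
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def spanish_key(text):
    """Return a rough phonetic key for a Spanish string (illustrative sketch)."""
    key = unicodedata.normalize("NFC", text.lower())
    key = _strip_accents(key)
    key = "".join(ch for ch in key if ch.isalpha())
    for old, new in _RULES:
        key = key.replace(old, new)
    # Collapse runs of the same letter, e.g. a doubled consonant typed by mistake.
    collapsed = []
    for ch in key:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)


def group_by_key(names):
    """Group strings that share a phonetic key; return groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[spanish_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    candidates = ["Vázquez", "Vasques", "Hernández", "Ernandes", "Gómez"]
    for group in group_by_key(candidates):
        print(group)
```

Grouping by key in this way is a blocking step: records that share a key are candidate duplicates and would normally be compared further. The rule set trades precision for recall; for example, collapsing repeated letters also merges "rr" with "r", which are distinct sounds in Spanish.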

CC BY-NC-ND 4.0


Paper citation in several formats:
Angeles, M. and Bailón-Miguel, N. (2016). A Comparative of Spanish Encoding Functions - Efectiveness on Record Linkage. In Proceedings of the Fifth International Conference on Telecommunications and Remote Sensing - ICTRS; ISBN 978-989-758-200-4, SciTePress, pages 105-113. DOI: 10.5220/0006227701050113

@conference{ictrs16,
author={María del Pilar Angeles and Noemi Bailón{-}Miguel},
title={A Comparative of Spanish Encoding Functions - Efectiveness on Record Linkage},
booktitle={Proceedings of the Fifth International Conference on Telecommunications and Remote Sensing - ICTRS},
year={2016},
pages={105--113},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006227701050113},
isbn={978-989-758-200-4},
}

TY - CONF

JO - Proceedings of the Fifth International Conference on Telecommunications and Remote Sensing - ICTRS
TI - A Comparative of Spanish Encoding Functions - Efectiveness on Record Linkage
SN - 978-989-758-200-4
AU - Angeles, M.
AU - Bailón-Miguel, N.
PY - 2016
SP - 105
EP - 113
DO - 10.5220/0006227701050113
PB - SciTePress