Authors:
Keerthi Koneru; Venkata Sai Venkatesh Pulla and Cihan Varol
Affiliation:
Sam Houston State University, United States
Keyword(s):
Caverphone, Dmetaphone, Information Retrieval, Misspelled Words, Metaphone, NYSIIS, Phonetic Matching, Soundex.
Related Ontology Subjects/Areas/Topics:
Data Engineering; Data Management and Quality; Information Quality
Abstract:
Researchers face major problems when searching large, imprecise databases, because entries are often misspelled or not spelled the way the searcher expects. As a result, the word being sought cannot be found. Over the years, matching words by their pronunciation has come to be regarded as one effective way to address this problem. The technique of retrieving words based on how they sound is known as “Phonetic Matching”. Soundex was the first such algorithm proposed; other algorithms such as Metaphone, Caverphone, DMetaphone, and Phonex are also used for information retrieval in different environments. This paper analyzes and evaluates different phonetic matching algorithms on several datasets comprising North Carolina street names and English dictionary words. The analysis shows that there is no clear best technique for generic word lists: Metaphone performs best on English dictionary words, while NYSIIS performs better on datasets of street names. Although Soundex is more accurate than the other algorithms at correcting the exact words, it has lower precision because of more noise in the setting considered. The experimental results paved the way for some suggestions that would help make databases more concrete and achieve higher data quality.
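As a rough illustration of what phonetic matching means in practice, the following is a minimal Python sketch of the commonly described American Soundex rules. It is not the implementation evaluated in the paper; the function name `soundex` and the sample words are chosen here purely for illustration.

```python
# Minimal sketch of American Soundex (illustrative only, not the paper's code).
# Consonant groups that sound alike share one digit.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(word):
    """Return the 4-character Soundex key of `word` ('' if it has no a-z letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    key = first.upper()
    prev = _CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # 'h' and 'w' do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels and 'y' have no code
        if code and code != prev:
            key += code
            if len(key) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return key.ljust(4, "0")  # pad short keys with zeros


if __name__ == "__main__":
    for w in ("Smith", "Smyth", "Robert", "Rupert"):
        print(w, soundex(w))
```

Two spellings are treated as a phonetic match when their keys are equal. In this sketch "Smith" and "Smyth" share a key, which is the intended behaviour for misspellings, but so do "Robert" and "Rupert", which hints at the kind of noise that can lower the precision of such a coarse key.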