Performance Evaluation of Phonetic Matching Algorithms on English Words and Street Names - Comparison and Correlation

Keerthi Koneru, Venkata Sai Venkatesh Pulla, Cihan Varol

Abstract

Researchers face major problems when searching large, imprecise databases, because entries are often misspelled or not spelled the way the searcher expects. As a result, they cannot find the word they are looking for. Over time, matching words by their pronunciation has come to be considered one of the effective ways to solve this problem. The technique of retrieving words based on how they sound is known as "Phonetic Matching". Soundex was the first such algorithm proposed, and other algorithms such as Metaphone, Caverphone, DMetaphone, and Phonex are also used for information retrieval in different environments. This paper analyzes and evaluates different phonetic matching algorithms on several datasets comprising North Carolina street names and English dictionary words. The analysis shows that no single technique is best for generic word lists: Metaphone performs best on English dictionary words, while NYSIIS performs better on datasets of street names. Although Soundex is more accurate than the other algorithms at recovering the exact intended word, its precision is lower because it returns more noise on the datasets considered. The experimental results lead to some suggestions that would help make databases more robust and achieve higher data quality.
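
To make the idea of phonetic matching concrete, the following is a minimal Python sketch of the commonly described American Soundex rules. It is an illustration only and is not taken from the paper: the function name `soundex` and the `CODES` table are assumptions introduced here, and the variant the authors evaluated may differ in detail.

```python
# Minimal sketch of American Soundex (illustrative; not the authors' code).
# Consonants are grouped into six digit classes; vowels and Y carry no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex key, or "" if the word has no letters."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    key = [first]
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return ("".join(key) + "000")[:4]  # pad with zeros, truncate to 4 characters


if __name__ == "__main__":
    # Words that sound alike are expected to share a key.
    for w in ("Robert", "Rupert", "Main", "Mane"):
        print(w, soundex(w))
```

Two words are treated as a phonetic match when their keys are equal. Because the key keeps at most three consonant classes, many unrelated words can collapse to the same key, which is consistent with the lower precision the abstract reports for Soundex.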



Paper Citation


in Harvard Style

Koneru K., Pulla V. and Varol C. (2016). Performance Evaluation of Phonetic Matching Algorithms on English Words and Street Names - Comparison and Correlation. In Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-193-9, pages 57-64. DOI: 10.5220/0005926300570064


in Bibtex Style

@conference{data16,
author={Keerthi Koneru and Venkata Sai Venkatesh Pulla and Cihan Varol},
title={Performance Evaluation of Phonetic Matching Algorithms on English Words and Street Names - Comparison and Correlation},
booktitle={Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2016},
pages={57-64},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005926300570064},
isbn={978-989-758-193-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Performance Evaluation of Phonetic Matching Algorithms on English Words and Street Names - Comparison and Correlation
SN - 978-989-758-193-9
AU - Koneru K.
AU - Pulla V.
AU - Varol C.
PY - 2016
SP - 57
EP - 64
DO - 10.5220/0005926300570064