A Novel Method for Word Segmentation and Spell Correction in e-Commerce Search Engines
Melis Öztürk Umut, Muhammed Bera Kaya, Mustafa Keskin
2025
Abstract
E-commerce search engines commonly receive multi-word queries typed as a single concatenated word, such as "blackshoe" instead of "black shoe." Such queries are harder for search algorithms to match, which leads to a poor user experience and lower conversion rates. Our observations from the historical search data of an e-commerce platform confirm that incorrectly concatenated terms are a significant challenge and that improved detection and correction methods are needed. This study develops a novel method to segment and correct these terms accurately. The approach combines dictionary-based and statistical algorithms, using a custom-built dictionary and edit-distance-based structures to quickly match and correct misspelled or concatenated words. The algorithm's parameters (search frequency thresholds, maximum edit distance, and prefix length) were tested extensively in different combinations to find the optimal settings for both spell correction and word segmentation. Although the method was designed for the dataset of one particular e-commerce application, the approach is intended to generalize to other e-commerce platforms. The paper details the dataset preparation, the proposed methodology, and the performance metrics obtained.
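To make the idea in the abstract concrete, the following is a minimal sketch of dictionary-based spell correction and word segmentation. It is not the authors' implementation: the dictionary contents, the values of `MIN_FREQ` and `MAX_EDIT_DISTANCE`, and the function names `correct`, `segment`, and `normalize` are all assumptions made for illustration. The sketch also scans the vocabulary linearly and therefore does not model the prefix-length parameter mentioned in the abstract.

```python
from functools import lru_cache

# Hypothetical toy dictionary: term -> search frequency (illustrative values only).
FREQ = {"black": 500, "shoe": 400, "shoes": 350, "red": 300, "dress": 250, "back": 50}
MIN_FREQ = 100          # assumed search-frequency threshold
MAX_EDIT_DISTANCE = 1   # assumed maximum edit distance

# Keep only terms searched often enough to be trusted as dictionary entries.
VOCAB = {w: f for w, f in FREQ.items() if f >= MIN_FREQ}


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def correct(word):
    """Return the closest dictionary term within MAX_EDIT_DISTANCE, or None."""
    if word in VOCAB:
        return word
    best = None
    for term, freq in VOCAB.items():
        d = edit_distance(word, term)
        if d <= MAX_EDIT_DISTANCE:
            key = (d, -freq)  # prefer smaller distance, then higher frequency
            if best is None or key < best[0]:
                best = (key, term)
    return best[1] if best else None


def segment(text):
    """Split text into dictionary terms: fewest parts, then highest total frequency."""
    @lru_cache(maxsize=None)
    def best_from(i):
        if i == len(text):
            return (0, 0, ())
        best = None
        for j in range(i + 1, len(text) + 1):
            piece = text[i:j]
            if piece in VOCAB:
                rest = best_from(j)
                if rest is not None:
                    cand = (rest[0] + 1, rest[1] - VOCAB[piece], (piece,) + rest[2])
                    if best is None or cand[:2] < best[:2]:
                        best = cand
        return best

    result = best_from(0)
    return list(result[2]) if result else None


def normalize(token):
    """Try spell correction first, then segmentation; otherwise return the token unchanged."""
    fixed = correct(token)
    if fixed is not None:
        return [fixed]
    return segment(token) or [token]


print(normalize("blackshoe"))  # ['black', 'shoe']
print(normalize("blak"))       # ['black']
```

In this sketch, raising `MIN_FREQ` shrinks the vocabulary and makes both steps more conservative, while raising `MAX_EDIT_DISTANCE` admits more correction candidates at the cost of more false matches.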
Paper Citation
in Harvard Style
Öztürk Umut M., Kaya M. and Keskin M. (2025). A Novel Method for Word Segmentation and Spell Correction in e-Commerce Search Engines. In Proceedings of the 2nd International Conference on Advances in Electrical, Electronics, Energy, and Computer Sciences - Volume 1: ICEEECS; ISBN 978-989-758-783-2, SciTePress, pages 14-19. DOI: 10.5220/0014286500004848
in Bibtex Style
@conference{iceeecs25,
author={Melis Öztürk Umut and Muhammed Bera Kaya and Mustafa Keskin},
title={A Novel Method for Word Segmentation and Spell Correction in e-Commerce Search Engines},
booktitle={Proceedings of the 2nd International Conference on Advances in Electrical, Electronics, Energy, and Computer Sciences - Volume 1: ICEEECS},
year={2025},
pages={14-19},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0014286500004848},
isbn={978-989-758-783-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 2nd International Conference on Advances in Electrical, Electronics, Energy, and Computer Sciences - Volume 1: ICEEECS
TI - A Novel Method for Word Segmentation and Spell Correction in e-Commerce Search Engines
SN - 978-989-758-783-2
AU - Öztürk Umut M.
AU - Kaya M.
AU - Keskin M.
PY - 2025
SP - 14
EP - 19
DO - 10.5220/0014286500004848
PB - SciTePress