Classification of Products in Retail using Partially Abbreviated Product Names Only

Oliver Allweyer, Christian Schorr, Rolf Krieger, Andreas Mohr

Abstract

Managing product data in ERP systems is a major challenge for most retail companies because of the volume and complexity of the data. Some companies maintain millions of product data records, and more than one thousand records are sometimes created in a single day. Because data entry and maintenance involve considerable manual effort, the costs of data management, in both time and money, are high. In many systems, the product name and product category must be specified before the remaining product data can be entered manually. Based on the product category, many default values are proposed to simplify manual data entry. Consequently, correct classification is essential for error-free and efficient data entry. In this paper, we show how to classify products automatically and compare different machine learning approaches for this task. To minimize the manual data entry effort, and because the length of the product name field is severely limited, the classification algorithms rely only on shortened product names. In particular, we analyse the benefits of different pre-processing strategies and compare the quality of the classification models on different levels of the product category hierarchy. Our results show that, even in this special case, machine learning can considerably simplify the data entry process.
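To make the task concrete, the following is a minimal sketch of classifying abbreviated product names into categories. It is not the method evaluated in the paper: it assumes a character trigram representation and a multinomial naive Bayes classifier with add-one smoothing, chosen only because character n-grams tend to tolerate truncated words. The class name `NgramNaiveBayes`, the helper `char_ngrams`, and all product names and categories are hypothetical and invented for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n=3):
    """Lower-case the name, pad it with spaces, and return overlapping character n-grams."""
    text = f" {name.lower().strip()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, names, labels):
        self.class_counts = Counter(labels)       # number of training names per category
        self.gram_counts = defaultdict(Counter)   # n-gram frequencies per category
        self.vocab = set()                        # all n-grams seen during training
        for name, label in zip(names, labels):
            grams = char_ngrams(name)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, name):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each n-gram in the name.
            score = math.log(count / total)
            for gram in char_ngrams(name):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: invented abbreviated names and categories.
train_names = [
    "choc bar milk 100g",
    "choc bar dark 100g",
    "shampoo anti-dand 250ml",
    "shampoo sens 250ml",
]
train_labels = ["confectionery", "confectionery", "hair care", "hair care"]

model = NgramNaiveBayes().fit(train_names, train_labels)
print(model.predict("choc bar wht 100g"))
```

In this toy setting, an unseen abbreviation such as "wht" does not prevent a prediction, because the remaining trigrams still overlap with names from the training data. A hierarchical setup could train one such model per category level, but that is an assumption rather than something the abstract specifies.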
