the customer requires and whether omitting
categories with only a few member products is
acceptable. To investigate the real-world relevance
more closely, we suggest first using the model in a
semi-automated process in which categories are
proposed to the user. Based on the user's decisions,
the model can then be further optimized and the
degree of automation increased. Apart from
classifying new products, our approach can also be
used to reclassify an already classified product
database into a different classification system.
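The semi-automated process described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name propose_categories, the parameters top_k and auto_accept, and the example scores are all hypothetical, and the scores are assumed to come from some already trained classifier that outputs one confidence value per category.

```python
from typing import Dict, List, Tuple


def propose_categories(
    scores: Dict[str, float],
    top_k: int = 3,
    auto_accept: float = 0.9,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Decide between automatic assignment and user review.

    Returns ("auto", [best]) when the best score reaches the
    (hypothetical) auto_accept threshold, otherwise
    ("review", top_k proposals) for the user to choose from.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    # Rank categories by descending confidence.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if ranked[0][1] >= auto_accept:
        return "auto", [ranked[0]]
    return "review", ranked[:top_k]


# Invented classifier output for a single product name.
example_scores = {"Dairy": 0.55, "Beverages": 0.30, "Snacks": 0.10, "Frozen": 0.05}
mode, proposals = propose_categories(example_scores)
```

In such a setup, lowering auto_accept would raise the degree of automation, while the user's choices among the proposals could be collected as additional training data.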
In summary, our results show that the
classification of food products can be carried out
during the initial product data generation step using
only the product name. Standard algorithms achieve
satisfactory results without the need for
hyper-specialized, difficult-to-optimize models.
Our work can be extended to products from other
segments such as clothing or consumer electronics.
Further research is needed to determine whether a
single model covering products from all segments
performs better than a compartmentalized approach
with one separate model for each segment.
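To illustrate what classification from the product name alone can look like, the following sketch trains a multinomial naive Bayes classifier on whitespace tokens. It is an assumption-laden toy example, not the model evaluated in this work: naive Bayes is chosen only for brevity, the class NameClassifier and the function tokenize are hypothetical names, and the four training samples and their categories are invented.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def tokenize(name: str) -> List[str]:
    """Lowercase a product name and split it on whitespace."""
    return name.lower().split()


class NameClassifier:
    """Multinomial naive Bayes over name tokens with Laplace smoothing."""

    def fit(self, samples: List[Tuple[str, str]]) -> "NameClassifier":
        self.class_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocabulary: Set[str] = set()
        for name, category in samples:
            self.class_counts[category] += 1
            for token in tokenize(name):
                self.token_counts[category][token] += 1
                self.vocabulary.add(token)
        return self

    def predict(self, name: str) -> str:
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_log_prob = "", -math.inf
        for category, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            log_prob = math.log(count / total)
            n_tokens = sum(self.token_counts[category].values())
            for token in tokenize(name):
                numerator = self.token_counts[category][token] + 1
                log_prob += math.log(numerator / (n_tokens + vocab_size))
            if log_prob > best_log_prob:
                best_category, best_log_prob = category, log_prob
        return best_category


# Invented toy data: (product name, category).
training = [
    ("whole milk 1l", "Dairy"),
    ("greek yogurt natural", "Dairy"),
    ("orange juice 1l", "Beverages"),
    ("sparkling mineral water", "Beverages"),
]
model = NameClassifier().fit(training)
```

A realistic setup would add the preprocessing and feature weighting discussed earlier in the paper and would be evaluated with cross-validation rather than on a handful of samples.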
ACKNOWLEDGEMENTS  
The authors would like to thank the German Federal 
Ministry  of  Education  and  Research  for  supporting 
their  work  through  the  KMU-innovativ  programme 
under grant number 01—S18018. 