Authors: Tomasz Walkowiak 1 and Piotr Malak 2

Affiliations: 1 Wroclaw University of Science and Technology, Poland ; 2 University of Wroclaw, Poland

ISBN: 978-989-758-275-2

Keyword(s): NLP, Polish, Text Classification, Feature Selection, Weighting Schema, Supervised Machine Learning.

Related Ontology Subjects/Areas/Topics: Applications ; Artificial Intelligence ; Computational Intelligence ; Data Mining ; Databases and Information Systems Integration ; Enterprise Information Systems ; Evolutionary Computing ; Knowledge Discovery and Information Retrieval ; Knowledge Engineering and Ontology Development ; Knowledge-Based Systems ; Machine Learning ; Natural Language Processing ; Pattern Recognition ; Sensor Networks ; Signal Processing ; Soft Computing ; Symbolic Systems

Abstract: The paper presents the preparation, conduct, and results of an evaluation of the efficiency of text classification (TC) methods for Polish. Polish has complex morphology and is an inflectional language, so proper text preprocessing is strongly needed to guarantee reliable TC. Based on the authors' practical experience from earlier TC, IR, and general NLP experiments, a set of preprocessing rules was applied. A feature-document matrix was also designed with respect to the most promising selected features. About 216 experiments were conducted on an exemplar corpus in a subject (topic) classification task, with different preprocessing, weighting, and filtering (for dimensionality reduction) schemes and classifiers. The results show no substantial increase in accuracy from most classical preprocessing steps when the corpus is large (at least 1000 exemplars per class). The highest impact the authors were able to obtain concerned the system costs of TC processes, not TC accuracy.

Paper citation in several formats:
Walkowiak, T. and Malak, P. (2018). Polish Texts Topic Classification Evaluation. In Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-275-2, pages 515-522. DOI: 10.5220/0006601605150522

author={Tomasz Walkowiak and Piotr Malak},
title={Polish Texts Topic Classification Evaluation},
booktitle={Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},


JO - Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Polish Texts Topic Classification Evaluation
SN - 978-989-758-275-2
AU - Walkowiak, T.
AU - Malak, P.
PY - 2018
SP - 515
EP - 522
DO - 10.5220/0006601605150522
