Authors: Tomasz Walkowiak 1 and Piotr Malak 2

Affiliations: 1 Wroclaw University of Science and Technology, Poland ; 2 University of Wroclaw, Poland

Keyword(s): NLP, Polish, Text Classification, Feature Selection, Weighting Schema, Supervised Machine Learning.

Related Ontology Subjects/Areas/Topics: Applications ; Artificial Intelligence ; Computational Intelligence ; Data Mining ; Databases and Information Systems Integration ; Enterprise Information Systems ; Evolutionary Computing ; Knowledge Discovery and Information Retrieval ; Knowledge Engineering and Ontology Development ; Knowledge-Based Systems ; Machine Learning ; Natural Language Processing ; Pattern Recognition ; Sensor Networks ; Signal Processing ; Soft Computing ; Symbolic Systems

Abstract: The paper presents the preparation, conduct, and results of an evaluation of the efficiency of text classification (TC) methods for Polish. Polish is an inflectional language with complex morphology, so proper text preprocessing is strongly needed to guarantee reliable TC. Based on the authors' practical experience from earlier TC, IR, and general NLP experiments, a set of preprocessing rules was applied. A feature-document matrix was also designed with respect to the most promising features selected. About 216 experiments were conducted on an exemplar corpus in a subject (topic) classification task, with different preprocessing, weighting, and filtering (for dimensionality reduction) schemes and different classifiers. The results show no substantial increase in accuracy from most classical preprocessing steps when the corpus is large (at least 1000 exemplars per class). The greatest impact the authors were able to obtain concerned the system costs of TC processes, not TC accuracy.

CC BY-NC-ND 4.0

Paper citation in several formats:
Walkowiak, T. and Malak, P. (2018). Polish Texts Topic Classification Evaluation. In Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART; ISBN 978-989-758-275-2; ISSN 2184-433X, SciTePress, pages 515-522. DOI: 10.5220/0006601605150522

@conference{icaart18,
author={Tomasz Walkowiak and Piotr Malak},
title={Polish Texts Topic Classification Evaluation},
booktitle={Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2018},
pages={515-522},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006601605150522},
isbn={978-989-758-275-2},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Polish Texts Topic Classification Evaluation
SN - 978-989-758-275-2
IS - 2184-433X
AU - Walkowiak, T.
AU - Malak, P.
PY - 2018
SP - 515
EP - 522
DO - 10.5220/0006601605150522
PB - SciTePress