Polish Texts Topic Classification Evaluation

Tomasz Walkowiak; Piotr Malak

Topics: Data Mining; Machine Learning; Natural Language Processing

In Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, 515-522, 2018, Funchal, Madeira, Portugal

Authors: Tomasz Walkowiak ¹ and Piotr Malak ²

Affiliations: ¹ Wroclaw University of Science and Technology, Poland; ² University of Wroclaw, Poland

Keyword(s): NLP, Polish, Text Classification, Feature Selection, Weighting Schema, Supervised Machine Learning.

Related Ontology Subjects/Areas/Topics: Applications ; Artificial Intelligence ; Computational Intelligence ; Data Mining ; Databases and Information Systems Integration ; Enterprise Information Systems ; Evolutionary Computing ; Knowledge Discovery and Information Retrieval ; Knowledge Engineering and Ontology Development ; Knowledge-Based Systems ; Machine Learning ; Natural Language Processing ; Pattern Recognition ; Sensor Networks ; Signal Processing ; Soft Computing ; Symbolic Systems

Abstract: The paper presents the preparation, conduct, and results of an evaluation of the efficiency of text classification (TC) methods for Polish. Polish is an inflectional language with complex morphology, so proper text preprocessing is strongly needed to guarantee reliable TC. Based on the authors' practical experience from earlier TC, IR, and general NLP experiments, a set of preprocessing rules was applied. A feature-document matrix was also designed around the most promising selected features. About 216 experiments were conducted on an example corpus in a subject (topic) classification task, with different preprocessing, weighting, and filtering (for dimensionality reduction) schemes and classifiers. The results show no substantial increase in accuracy from most classical preprocessing steps when the corpus is large (at least 1000 examples per class). The largest impact the authors were able to obtain concerned the system costs of TC processes, not TC accuracy.
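The abstract does not say which weighting or filtering schemes were tested, so the following is only a generic sketch of how a weighted feature-document matrix with a dimensionality-reducing filter can be built. It assumes TF-IDF weighting and a minimum document-frequency filter; the function names, the `min_df` parameter, and the whitespace tokenizer are hypothetical illustrations, not the authors' pipeline.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace only.
    # Real Polish preprocessing (e.g. lemmatization) is not modeled here.
    return text.lower().split()


def build_tfidf(docs, min_df=1):
    """Return (vocab, matrix); matrix[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [tokenize(d) for d in docs]

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Filtering step (assumed): drop terms that occur in fewer than min_df documents.
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    index = {t: j for j, t in enumerate(vocab)}

    n_docs = len(docs)
    matrix = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in index)
        total = sum(counts.values())
        row = [0.0] * len(vocab)
        for term, c in counts.items():
            tf = c / total                      # relative term frequency
            idf = math.log(n_docs / df[term])   # plain, unsmoothed IDF
            row[index[term]] = tf * idf
        matrix.append(row)
    return vocab, matrix
```

Calling `build_tfidf(["kot pies kot", "pies ryba", "kot ryba ptak"], min_df=2)` drops the term `ptak`, which occurs in only one document, and returns a 3 x 3 matrix over the remaining vocabulary. Raising `min_df` shrinks the matrix, which is the kind of trade-off between system cost and accuracy the abstract refers to.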

CC BY-NC-ND 4.0

Paper citation in several formats:

Walkowiak, T. and Malak, P. (2018). Polish Texts Topic Classification Evaluation. In Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART; ISBN 978-989-758-275-2; ISSN 2184-433X, SciTePress, pages 515-522. DOI: 10.5220/0006601605150522

@conference{icaart18,
author={Tomasz Walkowiak and Piotr Malak},
title={Polish Texts Topic Classification Evaluation},
booktitle={Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2018},
pages={515-522},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006601605150522},
isbn={978-989-758-275-2},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Polish Texts Topic Classification Evaluation
SN - 978-989-758-275-2
IS - 2184-433X
AU - Walkowiak, T.
AU - Malak, P.
PY - 2018
SP - 515
EP - 522
DO - 10.5220/0006601605150522
PB - SciTePress