Mining M-Grams by a Granular Computing Approach for Text Classification

Topics: Applications: Image Processing and Artificial Vision, Pattern Recognition, Decision Making, Industrial and Real World Applications, Financial Applications, Neural Prostheses and Medical Applications, Neural Based Data Mining and Complex Information Process; Learning Paradigms and Algorithms; Neural based Implementation, Applications and Solutions

Authors: Antonino Capillo, Enrico de Santis, Fabio Mascioli and Antonello Rizzi

Affiliation: Department of Information Engineering, Electronics and Telecommunications, University of Rome “La Sapienza”, Via Eudossiana 18, 00184 Rome, Italy

ISBN: 978-989-758-475-6

Keyword(s): Text Mining, Text Categorization, Granular Computing, Knowledge Discovery, Explainable AI.

Abstract: Text mining and text classification are gaining increasing importance in AI-related research fields. Researchers are particularly focused on classification systems based on structured data (such as sequences or graphs), facing the challenge of synthesizing interpretable models by exploiting gray-box approaches. In this paper, a novel gray-box text classifier is presented. Documents to be classified are split into their constituent words, or tokens. Frequent groups of m tokens (m-grams) are mined by adopting the Granular Computing framework. Using the fastText algorithm, each token is encoded as a real-valued vector, and a custom dissimilarity measure, grounded on the Edit family, is designed specifically to deal with m-grams. Through a clustering procedure, the most representative m-grams in the corpus of documents are extracted and arranged into a Symbolic Histogram representation. The latter allows documents to be embedded in a real-valued space in which a standard classifier, such as an SVM, can safely operate. Along with the classification procedure, an Evolutionary Algorithm performs feature selection, selecting the most relevant symbols (m-grams) for each class. This study shows how symbols can be fruitfully interpreted, enabling a knowledge discovery procedure in line with the requirements of modern explainable AI systems. The effectiveness of the proposed algorithm is demonstrated through a set of experiments on paper abstract classification and SMS spam detection.
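The embedding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `EMBED` table stands in for fastText vectors, the weighted Levenshtein distance is only one plausible member of the Edit family, the alphabet of symbols is given by hand rather than found by clustering, and the names `mgrams`, `mgram_dissimilarity`, `symbolic_histogram`, `indel` and `tau` are all hypothetical.

```python
from math import sqrt

# Toy 2-D token vectors; an assumption standing in for fastText embeddings.
EMBED = {
    "free": (1.0, 0.0), "win": (0.9, 0.1), "prize": (0.8, 0.3),
    "call": (0.2, 0.9), "me": (0.1, 1.0), "now": (0.5, 0.5),
}

def mgrams(tokens, m):
    """Return every contiguous window of m tokens as a tuple."""
    return [tuple(tokens[i:i + m]) for i in range(len(tokens) - m + 1)]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mgram_dissimilarity(g, h, embed, indel=1.0):
    """Weighted Levenshtein distance between two m-grams.

    Substituting one token for another costs the Euclidean distance
    between their vectors; inserting or deleting a token costs `indel`
    (an assumed constant).
    """
    prev = [j * indel for j in range(len(h) + 1)]
    for i, a in enumerate(g, start=1):
        curr = [i * indel]
        for j, b in enumerate(h, start=1):
            substitute = prev[j - 1] + euclidean(embed[a], embed[b])
            curr.append(min(substitute, prev[j] + indel, curr[j - 1] + indel))
        prev = curr
    return prev[-1]

def symbolic_histogram(tokens, alphabet, embed, tau):
    """Count, for each symbol, the document m-grams within distance tau."""
    return [
        sum(1 for g in mgrams(tokens, len(s))
            if mgram_dissimilarity(g, s, embed) <= tau)
        for s in alphabet
    ]

# Hand-picked symbols; in the paper these would come from clustering.
alphabet = [("free", "prize"), ("call", "me")]
doc = "win prize now call me".split()
print(symbolic_histogram(doc, alphabet, EMBED, tau=0.2))  # prints [1, 1]
```

Each document becomes a fixed-length vector of symbol counts, which is the kind of real-valued representation a standard classifier such as an SVM can consume. Here "win prize" matches the symbol "free prize" because the toy vectors for "win" and "free" are close, which shows why an embedding-aware distance can match m-grams that an exact string comparison would miss.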

CC BY-NC-ND 4.0

Paper citation in several formats:
Capillo, A.; de Santis, E.; Mascioli, F. and Rizzi, A. (2020). Mining M-Grams by a Granular Computing Approach for Text Classification. In Proceedings of the 12th International Joint Conference on Computational Intelligence - Volume 1: NCTA, ISBN 978-989-758-475-6, pages 350-360. DOI: 10.5220/0010109803500360

@conference{ncta20,
author={Antonino Capillo and Enrico de Santis and Fabio Massimo Frattale Mascioli and Antonello Rizzi},
title={Mining M-Grams by a Granular Computing Approach for Text Classification},
booktitle={Proceedings of the 12th International Joint Conference on Computational Intelligence - Volume 1: NCTA},
year={2020},
pages={350-360},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010109803500360},
isbn={978-989-758-475-6},
}

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Computational Intelligence - Volume 1: NCTA
TI - Mining M-Grams by a Granular Computing Approach for Text Classification
SN - 978-989-758-475-6
AU - Capillo, A.
AU - de Santis, E.
AU - Mascioli, F.
AU - Rizzi, A.
PY - 2020
SP - 350
EP - 360
DO - 10.5220/0010109803500360
