Stopwords Identification by Means of Characteristic and Discriminant Analysis

Giuliano Armano; Francesca Fanni; Alessandro Giuliani

Topics: Data Mining; Pattern Recognition

In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, 353-360, 2015, Lisbon, Portugal

Authors: Giuliano Armano; Francesca Fanni; Alessandro Giuliani

Affiliation: University of Cagliari, Italy

Keyword(s): Stopwords, Discriminant Capability, Characteristic Capability, Text Classification.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Biomedical Engineering; Biomedical Signal Processing; Data Manipulation; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Health Engineering and Technology Applications; Human-Computer Interaction; Methodologies and Methods; Neurocomputing; Neurotechnology, Electronics and Informatics; Pattern Recognition; Physiological Computing Systems; Sensor Networks; Signal Processing; Soft Computing

Abstract: Stopwords are non-significant terms that occur frequently in documents and carry little meaning. Like noise, they should be removed. Traditionally, two approaches to building a stoplist have been used: the first takes the most frequent terms of a language (e.g., an English stoplist), while the second takes the most frequently occurring terms in a document collection. In several tasks, e.g., text classification and clustering, documents are typically grouped into categories. We propose a novel approach aimed at automatically identifying specific stopwords for each category. The proposal relies on two unbiased metrics that make it possible to analyze the informative content of each term: one measures its discriminant capability, the other its characteristic capability. For each term, the former is expected to be high when the term distinguishes a category from the others, whereas the latter is expected to be high when the term is frequent and common across all categories. A preliminary study and experiments have been performed, and they support this insight. Results confirm that, for each domain, the metrics readily identify specific stoplists, which include both classical and category-dependent stopwords.
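The abstract names the two metrics but does not give their formulas. As a rough illustration only, the sketch below assumes simple document-frequency stand-ins: for a category, `p_in` is the share of its documents containing a term and `p_out` is the share of the remaining documents containing it. The discriminant score is taken to be `p_in - p_out` and the characteristic score `p_in + p_out - 1`. These definitions, the function names, the thresholds, and the toy data are all assumptions made for this example, not the authors' method.

```python
from typing import Dict, List, Set, Tuple

Doc = Tuple[str, Set[str]]  # (category label, set of terms in the document)


def term_scores(docs: List[Doc], category: str) -> Dict[str, Tuple[float, float]]:
    """Return {term: (discriminant, characteristic)} for one category.

    Assumed stand-in definitions (not taken from the paper):
      discriminant   = p_in - p_out       high when the term marks the category
      characteristic = p_in + p_out - 1   high when the term is common everywhere
    """
    inside = [terms for label, terms in docs if label == category]
    outside = [terms for label, terms in docs if label != category]
    if not inside or not outside:
        raise ValueError("need documents both inside and outside the category")

    vocabulary = set().union(*inside, *outside)
    scores: Dict[str, Tuple[float, float]] = {}
    for term in vocabulary:
        p_in = sum(term in d for d in inside) / len(inside)
        p_out = sum(term in d for d in outside) / len(outside)
        scores[term] = (p_in - p_out, p_in + p_out - 1.0)
    return scores


def candidate_stopwords(
    scores: Dict[str, Tuple[float, float]],
    max_abs_discriminant: float = 0.2,  # hypothetical threshold
    min_characteristic: float = 0.6,    # hypothetical threshold
) -> List[str]:
    """Terms that barely discriminate but occur almost everywhere."""
    return sorted(
        term
        for term, (disc, char) in scores.items()
        if abs(disc) <= max_abs_discriminant and char >= min_characteristic
    )


# Toy collection, invented for illustration.
docs: List[Doc] = [
    ("sport", {"the", "match", "team"}),
    ("sport", {"the", "goal", "team"}),
    ("tech", {"the", "chip", "code"}),
    ("tech", {"the", "code", "cloud"}),
]
scores = term_scores(docs, "sport")
print(candidate_stopwords(scores))
```

On this toy data, "the" appears in every document, so it scores 0 on the discriminant axis and 1 on the characteristic axis and is flagged as a stopword candidate, while "team" appears only in the "sport" documents and scores 1 and 0 respectively. Real use would need tokenization and threshold tuning, which the abstract does not describe.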

CC BY-NC-ND 4.0

Paper citation in several formats:

Armano, G., Fanni, F. and Giuliani, A. (2015). Stopwords Identification by Means of Characteristic and Discriminant Analysis. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART; ISBN 978-989-758-074-1; ISSN 2184-433X, SciTePress, pages 353-360. DOI: 10.5220/0005194303530360

@conference{icaart15,
author={Giuliano Armano and Francesca Fanni and Alessandro Giuliani},
title={Stopwords Identification by Means of Characteristic and Discriminant Analysis},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2015},
pages={353-360},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005194303530360},
isbn={978-989-758-074-1},
issn={2184-433X},
}

TY - CONF
JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Stopwords Identification by Means of Characteristic and Discriminant Analysis
SN - 978-989-758-074-1
IS - 2184-433X
AU - Armano, G.
AU - Fanni, F.
AU - Giuliani, A.
PY - 2015
SP - 353
EP - 360
DO - 10.5220/0005194303530360
PB - SciTePress