Addressing the Problem of Unbalanced Data Sets in Sentiment Analysis

Asmaa Mountassir; Houda Benbrahim; Ilham Berrada

Topics: AI Programming; Applications and Case-Studies; Knowledge Reengineering; Knowledge Representation; Natural Language Processing

In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2012) - KDIR, pages 306-311, 2012, Barcelona, Spain

Authors: Asmaa Mountassir; Houda Benbrahim; Ilham Berrada

Affiliation: ENSIAS and Mohamed 5 University, Morocco

Keyword(s): Sentiment Analysis, Opinion Mining, Unbalanced Data Sets, Machine Learning, Text Classification, Natural Language Processing, Arabic Language.

Related Ontology Subjects/Areas/Topics: AI Programming; Applications; Applications and Case-studies; Artificial Intelligence; Knowledge Engineering and Ontology Development; Knowledge Reengineering; Knowledge Representation; Knowledge-Based Systems; Natural Language Processing; Pattern Recognition; Symbolic Systems

Abstract: Sentiment Analysis is a research area that focuses on processing and analysing the opinions available on the web. This paper deals with the problem of unbalanced data sets in supervised sentiment classification. We propose three methods to under-sample the majority class documents, namely Remove Similar, Remove Farthest and Remove by Clustering. Our goal is to compare the effectiveness of the proposed methods with that of the common random under-sampling. For classification, we use three standard classifiers: Naïve Bayes, Support Vector Machines and k-Nearest Neighbours. The experiments are carried out on two different Arabic data sets that we have built and labelled manually. We show that the results obtained on the first data set, which is slightly skewed, are better than those obtained on the second one, which is highly skewed. The results also show that we can rely on the proposed techniques and that they are typically competitive with random under-sampling.
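
For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of random under-sampling only. It is not the authors' code and does not implement Remove Similar, Remove Farthest or Remove by Clustering, which the abstract names but does not describe. The function name `random_undersample`, the `seed` parameter and the toy documents are hypothetical, chosen for illustration.

```python
import random
from collections import defaultdict


def random_undersample(documents, labels, seed=0):
    """Randomly drop documents from larger classes until every class
    has as many documents as the smallest class (illustrative sketch)."""
    rng = random.Random(seed)

    # Group the documents by their class label.
    by_class = defaultdict(list)
    for doc, label in zip(documents, labels):
        by_class[label].append(doc)

    # The smallest class sets the target size for all classes.
    target = min(len(docs) for docs in by_class.values())

    balanced = []
    for label, docs in by_class.items():
        # Keep the minority class whole; sample the others without replacement.
        kept = docs if len(docs) == target else rng.sample(docs, target)
        balanced.extend((doc, label) for doc in kept)

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 8 negative and 2 positive documents.
docs = [f"review {i}" for i in range(10)]
labels = ["neg"] * 8 + ["pos"] * 2
balanced = random_undersample(docs, labels)
```

On this toy input the result holds two documents per class. Which majority documents are discarded depends only on the seed, which is the property the proposed methods replace with a selection criterion.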

CC BY-NC-ND 4.0

Paper citation in several formats:

Mountassir, A.; Benbrahim, H. and Berrada, I. (2012). Addressing the Problem of Unbalanced Data Sets in Sentiment Analysis. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2012) - KDIR; ISBN 978-989-8565-29-7; ISSN 2184-3228, SciTePress, pages 306-311. DOI: 10.5220/0004142603060311

@conference{kdir12,
author={Asmaa Mountassir and Houda Benbrahim and Ilham Berrada},
title={Addressing the Problem of Unbalanced Data Sets in Sentiment Analysis},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2012) - KDIR},
year={2012},
pages={306-311},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004142603060311},
isbn={978-989-8565-29-7},
issn={2184-3228},
}

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2012) - KDIR
TI - Addressing the Problem of Unbalanced Data Sets in Sentiment Analysis
SN - 978-989-8565-29-7
IS - 2184-3228
AU - Mountassir, A.
AU - Benbrahim, H.
AU - Berrada, I.
PY - 2012
SP - 306
EP - 311
DO - 10.5220/0004142603060311
PB - SciTePress