Text Mining in Hotel Reviews: Impact of Words Restriction in Text Classification

Diogo Campos, Rodrigo Rocha Silva, Jorge Bernardino

2019

Abstract

Text Mining is the process of extracting interesting and non-trivial patterns or knowledge from unstructured text documents. Hotel reviews are used by hotels to verify client satisfaction with their own services and facilities. However, we cannot handle this type of big, unstructured data manually, so we should use OLAP techniques and Text Cube to model and manage text data. This raises a further problem: we must separate the reviews into two classes, positive and negative, and for that we use the Sentiment Analysis technique. Nevertheless, do we really need all the words of a review to make the right classification? In this paper, we study the impact of word restriction on text classification. To do that, we create several word domains (sets of words that belong to the hotel domain). First, we use an algorithm that pre-processes the text, in which our created domains are used in the same way as stop-word lists. In the experimental evaluation, we use four classifiers to classify the text: Naïve Bayes, Decision Tree, Random Forest, and Support Vector Machine.
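To make the idea of word restriction concrete, here is a minimal, standard-library-only Python sketch. It is not the authors' implementation. The domain list `HOTEL_DOMAIN`, the toy reviews in `TRAIN`, and the helper names `preprocess` and `NaiveBayes` are all hypothetical. It also assumes one possible reading of "word restriction": a review is reduced to the words that appear in the domain list before classification. Of the four classifiers named above, only a multinomial Naïve Bayes with add-one smoothing is shown.

```python
import math
import re
from collections import Counter

# Hypothetical hotel-domain vocabulary (illustrative only, not the authors' lists).
HOTEL_DOMAIN = {
    "room", "staff", "bed", "breakfast", "location", "clean", "dirty",
    "friendly", "rude", "noisy", "quiet", "comfortable", "great", "bad",
}

# Hypothetical toy training reviews with sentiment labels.
TRAIN = [
    ("The room was clean and the staff friendly", "positive"),
    ("Great location and comfortable bed", "positive"),
    ("Dirty room and rude staff", "negative"),
    ("Noisy at night, bad breakfast", "negative"),
]


def preprocess(text, domain=None):
    """Lowercase, split into alphabetic tokens, and optionally keep only domain words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if domain is not None:
        tokens = [t for t in tokens if t in domain]
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_priors, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_priors[c] = math.log(len(class_docs) / len(docs))
            counter = Counter(t for d in class_docs for t in d)
            self.counts[c] = counter
            self.totals[c] = sum(counter.values())
            self.vocab.update(counter)
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_priors[c]
            for t in doc:
                if t in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


if __name__ == "__main__":
    docs = [preprocess(text, HOTEL_DOMAIN) for text, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    model = NaiveBayes().fit(docs, labels)
    print(model.predict(preprocess("Friendly staff and a quiet, clean room", HOTEL_DOMAIN)))  # prints "positive"
```

Calling `preprocess(text)` without a domain keeps every token. Training once with and once without `HOTEL_DOMAIN` therefore gives a simple way to compare restricted and unrestricted vocabularies on the same data.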



Paper Citation


in Harvard Style

Campos D., Silva R. and Bernardino J. (2019). Text Mining in Hotel Reviews: Impact of Words Restriction in Text Classification. In Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - Volume 1: KDIR; ISBN 978-989-758-382-7, SciTePress, pages 442-449. DOI: 10.5220/0008346904420449


in Bibtex Style

@conference{kdir19,
author={Diogo Campos and Rodrigo Rocha Silva and Jorge Bernardino},
title={Text Mining in Hotel Reviews: Impact of Words Restriction in Text Classification},
booktitle={Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - Volume 1: KDIR},
year={2019},
pages={442-449},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008346904420449},
isbn={978-989-758-382-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - Volume 1: KDIR
TI - Text Mining in Hotel Reviews: Impact of Words Restriction in Text Classification
SN - 978-989-758-382-7
AU - Campos D.
AU - Silva R.
AU - Bernardino J.
PY - 2019
SP - 442
EP - 449
DO - 10.5220/0008346904420449
PB - SciTePress