Authors:
Diana Lopes-Teixeira (1); Fernando Batista (2) and Ricardo Ribeiro (2)
Affiliations:
(1) Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
(2) Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal; L2F, INESC-ID Lisboa, Lisboa, Portugal
Keyword(s):
Topic Modeling, Topics Evolution, LDA, Preprocessing, Brand Interest.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Business Intelligence Applications; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Pre-Processing and Post-Processing for Data Mining; Symbolic Systems
Abstract:
Topic Modeling is a well-known unsupervised learning technique for text data. It is used to discover latent patterns, called topics, in a collection of documents (a corpus), and provides a convenient way to retrieve information from unclassified and unstructured text. Topic Modeling has been used to track events, topics, and trends in domains such as academia, public health, marketing, and news. In this paper, we propose a framework for extracting topics from a large dataset of short messages, for brand interest tracking purposes. The framework consists of training LDA topic models for each brand using time intervals, and then applying the models to aggregated documents. Additionally, we present a set of preprocessing tasks that helped to improve the topic models and the corresponding outputs. The experiments demonstrate that topic modeling can successfully track people’s discussions on Social Networks, even in massive datasets, and capture those topics spiked by real-life events.
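The aggregation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `(brand, timestamp, text)` message layout, the weekly interval, and the function name `aggregate_by_brand_and_week` are all assumptions made for illustration, since the abstract does not state the interval length or the data format. The sketch only groups short messages into one document per brand and time interval; training an LDA model on the result would be a separate step.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_brand_and_week(messages):
    """Join short messages into one document per (brand, ISO year, ISO week).

    `messages` is an iterable of (brand, timestamp, text) tuples.
    This layout and the weekly interval are illustrative assumptions.
    """
    buckets = defaultdict(list)
    for brand, timestamp, text in messages:
        iso = timestamp.isocalendar()  # (ISO year, ISO week, ISO weekday)
        buckets[(brand, iso[0], iso[1])].append(text)
    # Concatenate the messages of each bucket into a single longer document.
    return {key: " ".join(texts) for key, texts in buckets.items()}


# Hypothetical toy data, not taken from the paper's dataset.
messages = [
    ("brand_a", datetime(2019, 1, 7), "new phone launch"),
    ("brand_a", datetime(2019, 1, 9), "battery life review"),
    ("brand_a", datetime(2019, 1, 15), "store opening"),
    ("brand_b", datetime(2019, 1, 8), "price drop"),
]
docs = aggregate_by_brand_and_week(messages)
for key in sorted(docs):
    print(key, "->", docs[key])
```

Aggregating short messages this way gives each document more word co-occurrence evidence than a single message carries, which is a common motivation for pooling short texts before topic modeling.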