Authors:
Jean-Valère Cossu (1) and Liana Ermakova (2)
Affiliations:
(1) Vodkaster and MyLI - My Local Influence, France; (2) Université Paris-Est Marne-la-Vallée and Université de Lorraine, France
Keyword(s):
Online Reputation Monitoring, Topic Categorization, Contextualization, Query Expansion, Natural Language Processing, Information Retrieval, Tweet, Classification.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Collaborative Computing; Enterprise Information Systems; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Society, e-Business and e-Government; Software Agents and Internet Computing; Symbolic Systems; User Profiling and Recommender Systems; Web 2.0 and Social Networking Controls; Web Information Systems and Technologies
Abstract:
Opinion and trend mining on micro-blogs like Twitter has recently attracted research interest in several fields, including Information Retrieval (IR) and Natural Language Processing (NLP). However, the performance of existing approaches is limited by the quality of available training material. Moreover, this lack of data makes it difficult to explain the suggestions of automatic systems used for decision support. One promising solution to this issue is the enrichment of textual content using large micro-blog archives or external document collections, e.g. Wikipedia. Despite some advances in the Reputation Dimension Classification (RDC) task promoted by RepLab, it remains a research challenge. In this paper we introduce a supervised classification method for RDC based on a threshold intersection graph. We analyzed the impact of various micro-blog extension methods on RDC performance. We demonstrated that simple statistical NLP methods that do not require any external resources can be easily optimized to outperform the state-of-the-art approaches in the RDC task. The experiments also showed that enriching micro-blogs with effective expansion techniques can improve classification quality.