Authors:
Bizhanova Aizhan and Atsushi Fujii
Affiliation:
Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
Keyword(s):
Natural Language Processing, Word Sense Disambiguation, Text Normalization, Social Networking Service, Information Retrieval, Acronym, Emotion.
Related Ontology Subjects/Areas/Topics:
Applications; Artificial Intelligence; Computational Intelligence; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Machine Learning; Natural Language Processing; Pattern Recognition; Soft Computing; Symbolic Systems
Abstract:
Reflecting the rapid growth in the use of Social Networking Services (SNSs), it has recently become popular for users to quickly share their feelings, impressions, and opinions about what they have seen or experienced by means of short text messages (SMS). This trend has led a large number of users, consciously or unconsciously, to use emotion-bearing words and acronyms that reduce the number of characters to type. We have noticed a newly emerging category of language unit, which we call “Emotion-Driven Acronyms (EDAs)”. Because, by definition, each acronym consists of fewer characters than its full form, the acronyms for different full forms are often coincidentally identical. Consequently, the misuse of EDAs substantially decreases the readability of messages. Our long-term research goal is to normalize text written in a corrupt language into the canonical one. In this paper, as a first step towards the exploration of EDAs, we focus only on the normalization of EDAs and propose a method to disambiguate an occurrence of an EDA that corresponds to different full forms depending on the context, such as “smh (so much hate / shaking my head)”. We also demonstrate experimentally which kinds of features are effective in our task and discuss the nature of EDAs from different perspectives.