Authors: Costin Chiru ; Traian Rebedea and Silvia Ciotec

Affiliation: University Politehnica Bucharest, Romania

ISBN: 978-989-758-024-6

Keyword(s): Latent Semantic Analysis - LSA, Latent Dirichlet Allocation - LDA, Lexical Chains, Semantic Relatedness.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Data Mining ; Databases and Information Systems Integration ; Enterprise Information Systems ; Sensor Networks ; Signal Processing ; Soft Computing

Abstract: This paper presents an analysis of three techniques used for similar, mostly semantics-related tasks in Natural Language Processing (NLP): Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA) and lexical chains. These techniques were evaluated and compared on two different corpora in order to highlight the similarities and differences between them from a semantic analysis viewpoint. The first corpus consisted of four Wikipedia articles on different topics, while the second one consisted of 35 online chat conversations between 4-12 participants debating four imposed topics (forum, chat, blog and wikis). The study compares the outcomes of the three methods by computing quantitative factors such as correlations and the degree of coverage of the resulting topics. Using corpora from different types of discourse and task-independent quantitative factors allows us to show that, although LSA and LDA provide similar results, the results of lexical chaining are not strongly correlated with those of either LSA or LDA. Lexical chains might therefore be used as a complement to LSA or LDA when performing semantic analysis for various NLP applications.

Paper citation:
Chiru, C.; Rebedea, T. and Ciotec, S. (2014). Comparison between LSA-LDA-Lexical Chains. In Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-024-6, pages 255-262. DOI: 10.5220/0004798102550262

