Authors: Dina Said 1 and Nayer Wanas 2

Affiliations: 1 University of Calgary, Canada ; 2 Cairo Microsoft Innovation Lab, Egypt

ISBN: 978-989-8425-28-7

Keyword(s): Distance metrics, Clustering, Online forums mining, Post clustering.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Clustering and Classification Methods ; Computational Intelligence ; Evolutionary Computing ; Interactive and Online Data Mining ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Machine Learning ; Mining Text and Semi-Structured Data ; Soft Computing ; Symbolic Systems

Abstract: Online discussion forums are considered a challenging repository for data mining tasks. Forums usually contain hundreds of threads, which in turn may be composed of hundreds, or even thousands, of posts. Clustering these posts could provide better visualization and exploration of online threads. Moreover, clustering can be used for discovering outlier and off-topic posts. In this paper, we propose Leader-based Post Clustering (LPC), a modification of the Leader algorithm for clustering posts in the threads of discussion boards. We also suggest using asymmetric pair-wise distances to measure the dissimilarity between posts. We further investigate the effect of the indirect distance between posts, and how to calibrate it with the direct distance. To evaluate the proposed methods, we conduct experiments using artificial and real threads extracted from the Slashdot and Ciao discussion forums. Experimental results demonstrate the effectiveness of the LPC algorithm when using a linear combination of direct and indirect distances, as well as when using an averaging approach to evaluate a representative indirect distance.
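For readers unfamiliar with the Leader algorithm that LPC builds on, the following is a minimal sketch of the classic single-pass Leader algorithm, not the LPC modification itself, whose details are given only in the full paper. The function name `leader_clustering`, the `threshold` parameter, and the toy distance are assumptions made for illustration. The distance is always called as `distance(leader, item)`, so an asymmetric measure of the kind the abstract mentions could be plugged in.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leader_clustering(
    items: Sequence[T],
    distance: Callable[[T, T], float],
    threshold: float,
) -> Tuple[List[int], List[int]]:
    """Classic Leader clustering in one pass over the items.

    Each item joins the first existing leader within `threshold`;
    otherwise it becomes the leader of a new cluster.
    Returns (cluster label per item, item index of each leader).
    """
    leaders: List[int] = []  # indices into `items`
    labels: List[int] = []
    for index, item in enumerate(items):
        for cluster, leader_index in enumerate(leaders):
            # Argument order matters if `distance` is asymmetric.
            if distance(items[leader_index], item) <= threshold:
                labels.append(cluster)
                break
        else:
            # No leader was close enough: start a new cluster.
            leaders.append(index)
            labels.append(len(leaders) - 1)
    return labels, leaders


if __name__ == "__main__":
    # Toy data: numbers stand in for posts, absolute difference for distance.
    points = [0.0, 0.5, 5.0, 5.2, 0.9]
    labels, leaders = leader_clustering(points, lambda a, b: abs(a - b), 1.0)
    print(labels, leaders)
```

Because the algorithm makes a single pass, the result depends on the order of the items, which for forum posts would most naturally be their posting order within the thread.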

CC BY-NC-ND 4.0

Paper citation in several formats:
Said, D. and Wanas, N. (2010). CLUSTERING OF THREAD POSTS IN ONLINE DISCUSSION FORUMS. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010), ISBN 978-989-8425-28-7, pages 314-319. DOI: 10.5220/0003104303140319

@conference{kdir10,
author={Dina Said and Nayer Wanas},
title={CLUSTERING OF THREAD POSTS IN ONLINE DISCUSSION FORUMS},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={314-319},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003104303140319},
isbn={978-989-8425-28-7},
}

TY - CONF

JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - CLUSTERING OF THREAD POSTS IN ONLINE DISCUSSION FORUMS
SN - 978-989-8425-28-7
AU - Said, D.
AU - Wanas, N.
PY - 2010
SP - 314
EP - 319
DO - 10.5220/0003104303140319
