Authors: André Lourenço 1 ; Liliana Medina 2 ; Ana Fred 3 and Joaquim Filipe 4

Affiliations: 1 Instituto Superior de Engenharia de Lisboa and a, Portugal ; 2 Institute for Systems and Technologies of Information, Control and Communication, Portugal ; 3 Instituto Superior Técnico, Portugal ; 4 Institute for Systems and Technologies of Information, Control and Communication and Polytechnic Institute of Setúbal, Portugal

ISBN: 978-989-8425-79-9

Keyword(s): Unsupervised learning, Clustering, Clustering combination, Clustering ensembles, Text mining, Feature selection, Concept induction, Metaterm.

Abstract: Unsupervised organisation of documents, and in particular research papers, into meaningful groups is a difficult problem. With the typical vector-space-model representation (bag-of-words paradigm), difficulties arise from its intrinsic high dimensionality, the high redundancy of features, and the lack of semantic information. In this work we propose a document representation that relies on a statistical feature reduction step and an enrichment phase based on the introduction of higher-abstraction terms, designated as metaterms, derived from text using paper topics and keywords as prior knowledge. The proposed representation, combined with a clustering ensemble approach, leads to a novel document organisation strategy. We evaluate the proposed approach taking conference papers as the application domain, with topic information extracted from conference topics or areas. Performance evaluation on data sets from NIPS and INSTICC conferences shows that the proposed approach leads to interesting and encouraging results.
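
The abstract names three steps: feature reduction, metaterm enrichment, and clustering combination. The following is a minimal illustrative sketch of how such steps could fit together, not the authors' implementation. The toy corpus, the `METATERMS` dictionary, the document-frequency filter, the 0.5 threshold, and all function names are assumptions made for illustration. The combination step uses a simple co-association matrix over base partitions that are supplied by hand.

```python
from collections import Counter

# Hypothetical toy corpus; not data from the paper.
DOCS = [
    "kernel methods for support vector machines",
    "support vector machines and kernel learning",
    "clustering ensembles for text mining",
    "text mining with clustering combination",
]

# Hypothetical keyword -> metaterm mapping (assumed, not from the paper).
METATERMS = {
    "kernel": "META_SUPERVISED",
    "vector": "META_SUPERVISED",
    "clustering": "META_UNSUPERVISED",
    "mining": "META_UNSUPERVISED",
}


def bag_of_words(doc):
    """Count term occurrences in one document (vector-space model)."""
    return Counter(doc.lower().split())


def reduce_features(bags, min_df=2):
    """Keep only terms that occur in at least `min_df` documents."""
    doc_freq = Counter(term for bag in bags for term in bag)
    keep = {t for t, n in doc_freq.items() if n >= min_df}
    return [Counter({t: c for t, c in bag.items() if t in keep}) for bag in bags]


def add_metaterms(bag, metaterms):
    """Add to a metaterm the counts of every keyword mapped to it."""
    enriched = Counter(bag)
    for term, count in bag.items():
        if term in metaterms:
            enriched[metaterms[term]] += count
    return enriched


def co_association(partitions, n):
    """Fraction of partitions in which each pair of documents shares a cluster."""
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus(matrix, threshold=0.5):
    """Group documents connected by co-association above `threshold`."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    reduced = reduce_features([bag_of_words(d) for d in DOCS])
    enriched = [add_metaterms(bag, METATERMS) for bag in reduced]
    print(enriched[0])
    # Hypothetical base partitions standing in for several clustering runs.
    base = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
    print(consensus(co_association(base, len(DOCS))))
```

In this sketch the base partitions are written by hand. In practice they would come from running one or more clustering algorithms on the enriched vectors, for example with different parameters or initialisations.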

CC BY-NC-ND 4.0

Paper citation in several formats:
Lourenço, A.; Medina, L.; Fred, A. and Filipe, J. (2011). UNSUPERVISED ORGANISATION OF SCIENTIFIC DOCUMENTS. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2011) ISBN 978-989-8425-79-9, pages 549-560. DOI: 10.5220/0003722905570568

@conference{sstm11,
author={André Louren\c{c}o and Liliana Medina and Ana Fred and Joaquim Filipe},
title={UNSUPERVISED ORGANISATION OF SCIENTIFIC DOCUMENTS},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2011)},
year={2011},
pages={549--560},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003722905570568},
isbn={978-989-8425-79-9},
}

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2011)
TI  - UNSUPERVISED ORGANISATION OF SCIENTIFIC DOCUMENTS
SN  - 978-989-8425-79-9
AU  - Lourenço, A.
AU  - Medina, L.
AU  - Fred, A.
AU  - Filipe, J.
PY  - 2011
SP  - 549
EP  - 560
DO  - 10.5220/0003722905570568
ER  - 