Authors: François Role and Mohamed Nadif

Affiliation: Université Paris Descartes, France

ISBN: 978-989-8425-79-9

Keyword(s): Text mining, Word similarity measures, Pointwise mutual information.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Concept Mining ; Information Extraction ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Mining Text and Semi-Structured Data ; Symbolic Systems

Abstract: Statistical measures of word similarity are widely used in many areas of information retrieval and text mining. Among the popular measures based on word co-occurrence is Pointwise Mutual Information (PMI). Although widely used, PMI has a well-known tendency to give excessive scores of relatedness to word pairs that involve low-frequency words. Many variants of it have therefore been proposed, which correct this bias empirically. In contrast to this empirical approach, we propose formulae and indicators that describe the behavior of these variants in a precise way so that researchers and practitioners can make a more informed decision as to which measure to use in different scenarios.
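The low-frequency bias mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))) estimated from raw counts, not the formulae or indicators proposed in the paper. The function name `pmi`, the corpus size `N`, and all counts are hypothetical values chosen for illustration.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Standard PMI in bits, estimated from raw co-occurrence counts.

    p(x, y) = count_xy / total, p(x) = count_x / total, p(y) = count_y / total,
    so PMI = log2(count_xy * total / (count_x * count_y)).
    """
    if min(count_xy, count_x, count_y, total) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(count_xy * total / (count_x * count_y))


# Hypothetical corpus of N co-occurrence windows (assumed value).
N = 1_000_000

# Two words that each occur once, and only together.
rare = pmi(count_xy=1, count_x=1, count_y=1, total=N)

# Two words that each occur 1000 times and co-occur 500 times.
frequent = pmi(count_xy=500, count_x=1000, count_y=1000, total=N)

print(f"rare pair:     {rare:.2f} bits")
print(f"frequent pair: {frequent:.2f} bits")
```

Under these assumed counts the pair seen only once scores log2(N), well above the pair supported by 500 joint observations, which scores log2(500). This is the kind of behavior that the empirically corrected variants are meant to dampen.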

Paper citation in several formats:
Role, F. and Nadif, M. (2011). HANDLING THE IMPACT OF LOW FREQUENCY EVENTS ON CO-OCCURRENCE BASED MEASURES OF WORD SIMILARITY - A Case Study of Pointwise Mutual Information. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011), ISBN 978-989-8425-79-9, pages 218-223. DOI: 10.5220/0003655102260231

@conference{kdir11,
author={Fran\c{c}ois Role and Mohamed Nadif},
title={HANDLING THE IMPACT OF LOW FREQUENCY EVENTS ON CO-OCCURRENCE BASED MEASURES OF WORD SIMILARITY - A Case Study of Pointwise Mutual Information},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={218-223},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003655102260231},
isbn={978-989-8425-79-9},
}

TY - CONF

JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - HANDLING THE IMPACT OF LOW FREQUENCY EVENTS ON CO-OCCURRENCE BASED MEASURES OF WORD SIMILARITY - A Case Study of Pointwise Mutual Information
SN - 978-989-8425-79-9
AU - Role, F.
AU - Nadif, M.
PY - 2011
SP - 218
EP - 223
DO - 10.5220/0003655102260231
