Authors: Giacomo Domeniconi ; Gianluca Moro ; Roberto Pasolini and Claudio Sartori

Affiliation: University of Bologna, Italy

Keyword(s): Term Weighting, Supervised Term Weighting Scheme, Text Categorization, tfidf, Text Representation.

Related Ontology Subjects/Areas/Topics: Applications ; Artificial Intelligence ; Big Data ; Biomedical Engineering ; Business Analytics ; Data Engineering ; Data Management and Quality ; Data Mining ; Databases and Information Systems Integration ; Datamining ; Enterprise Information Systems ; Health Information Systems ; Information Retrieval ; Ontologies and the Semantic Web ; Pattern Recognition ; Semi-Structured and Unstructured Data ; Sensor Networks ; Signal Processing ; Soft Computing ; Software Engineering ; Text Analytics

Abstract: In text categorization and other data mining tasks, suitable term weighting methods can bring a substantial boost in effectiveness. Several term weighting methods have been presented in the literature, based on assumptions commonly derived from observing the distribution of words in documents. For example, the idf assumption states that words appearing in many documents are usually not as important as less frequent ones. In contrast to tf.idf and other weighting methods derived from information retrieval, more recently proposed schemes are supervised, i.e. based on knowledge of the membership of training documents in categories. We propose here a supervised variant of the tf.idf scheme, based on computing the usual idf factor without considering documents of the category to be recognized, so that the importance of terms appearing frequently only within that category is not underestimated. A further proposed variant is additionally based on relevance frequency, considering occurrences of words within the category itself. In extensive experiments on two recurring text collections with several unsupervised and supervised weighting schemes, using two different learning methods, we show that the proposed schemes generally perform better than or comparably to the others in terms of accuracy.
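The core idea in the abstract, computing idf while leaving out the documents of the category to be recognized, can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so the smoothed form used here, log((N_out + 1) / max(1, df_out(t))), is an assumption, as are the function names `idf_excluding_category` and `tf_idfec` and the toy corpus. The relevance-frequency variant is not covered.

```python
import math
from collections import Counter

def idf_excluding_category(docs, labels, category):
    """idf computed only over training documents NOT in `category`.

    Assumed form (not taken from the abstract):
        log((N_out + 1) / max(1, df_out(t)))
    where N_out is the number of documents outside the category and
    df_out(t) is how many of those contain term t.
    """
    outside = [set(tokens) for tokens, label in zip(docs, labels)
               if label != category]
    df_out = Counter(term for doc in outside for term in doc)
    vocab = {term for tokens in docs for term in tokens}
    n_out = len(outside)
    return {t: math.log((n_out + 1) / max(1, df_out[t])) for t in vocab}

def tf_idfec(tokens, idf):
    """Weight each term of one document by raw term frequency times idf."""
    tf = Counter(tokens)
    return {t: tf[t] * idf.get(t, 0.0) for t in tf}

# Toy corpus (hypothetical): two "sport" and two "finance" documents.
docs = [
    ["goal", "match", "team"],
    ["goal", "team", "coach"],
    ["market", "stock", "team"],
    ["stock", "bank", "market"],
]
labels = ["sport", "sport", "finance", "finance"]

idf = idf_excluding_category(docs, labels, "sport")
weights = tf_idfec(["goal", "goal", "team"], idf)
```

In this toy corpus, "goal" and "market" each occur in two of the four documents, so a standard idf would weight them equally. With the "sport" documents excluded, "goal" never occurs in the remaining documents and receives a higher weight than "market", which is the effect the abstract describes.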

CC BY-NC-ND 4.0

Paper citation in several formats:
Domeniconi, G.; Moro, G.; Pasolini, R. and Sartori, C. (2015). A Study on Term Weighting for Text Categorization: A Novel Supervised Variant of tf.idf. In Proceedings of 4th International Conference on Data Management Technologies and Applications - DATA; ISBN 978-989-758-103-8; ISSN 2184-285X, SciTePress, pages 26-37. DOI: 10.5220/0005511900260037

@conference{data15,
author={Giacomo Domeniconi and Gianluca Moro and Roberto Pasolini and Claudio Sartori},
title={A Study on Term Weighting for Text Categorization: A Novel Supervised Variant of tf.idf},
booktitle={Proceedings of 4th International Conference on Data Management Technologies and Applications - DATA},
year={2015},
pages={26-37},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005511900260037},
isbn={978-989-758-103-8},
issn={2184-285X},
}

TY - CONF

JO - Proceedings of 4th International Conference on Data Management Technologies and Applications - DATA
TI - A Study on Term Weighting for Text Categorization: A Novel Supervised Variant of tf.idf
SN - 978-989-758-103-8
IS - 2184-285X
AU - Domeniconi, G.
AU - Moro, G.
AU - Pasolini, R.
AU - Sartori, C.
PY - 2015
SP - 26
EP - 37
DO - 10.5220/0005511900260037
PB - SciTePress