Authors: David Vilar 1 ; Hermann Ney 1 ; Alfons Juan 2 and Enrique Vidal 2

Affiliations: 1 Lehrstuhl für Informatik VI, RWTH Aachen University, Germany ; 2 Institut Tecnològic d’Informàtica, Universitat Politècnica de València, Spain

Keyword(s): Text Classification, Naive Bayes, Multinomial Distribution, Feature Selection, Smoothing, Length Normalization

Abstract: The number of features to be considered in a text classification system is given by the size of the vocabulary, which is normally in the range of tens or hundreds of thousands even for small tasks. This leads to parameter estimation problems for statistical methods, so countermeasures have to be found. One of the most widely used methods consists of reducing the size of the vocabulary according to a well-defined criterion so that the set of parameters can be estimated reliably. The same problem is encountered in the field of language modeling, where several smoothing techniques have been developed. In this paper we show that, for the text classification task, using the full vocabulary together with a suitable choice of smoothing technique obtains better results than the standard feature selection techniques.
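
To make the idea in the abstract concrete, here is a minimal sketch of a multinomial naive Bayes classifier that keeps the full vocabulary and relies on smoothing instead of feature selection. The abstract does not name the smoothing techniques the paper evaluates, so this sketch assumes simple additive (add-one, Laplace) smoothing as a stand-in. The function names `train` and `classify`, the parameter `alpha`, and the toy documents are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def train(docs, alpha=1.0):
    """Estimate a multinomial naive Bayes model over the full vocabulary.

    docs  -- list of (tokens, label) pairs
    alpha -- additive smoothing constant (assumption: add-one smoothing,
             not necessarily a method studied in the paper)
    """
    # Keep every word seen in training: no feature selection.
    vocab = sorted({w for tokens, _ in docs for w in tokens})
    docs_per_class = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in docs_per_class}
    for tokens, label in docs:
        word_counts[label].update(tokens)

    log_prior = {c: math.log(n / len(docs)) for c, n in docs_per_class.items()}
    log_lik = {}
    for c, counts in word_counts.items():
        # Smoothed denominator: class token count plus alpha per vocabulary word.
        total = sum(counts.values()) + alpha * len(vocab)
        # Words unseen in class c still get a nonzero probability.
        log_lik[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_lik


def classify(tokens, log_prior, log_lik):
    """Return the class with the highest posterior log score."""
    scores = {}
    for c in log_prior:
        # Words outside the training vocabulary are skipped.
        scores[c] = log_prior[c] + sum(
            log_lik[c][w] for w in tokens if w in log_lik[c]
        )
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
docs = [
    ("goal match team".split(), "sport"),
    ("team win goal".split(), "sport"),
    ("vote election party".split(), "politics"),
    ("party vote law".split(), "politics"),
]
log_prior, log_lik = train(docs)
predicted = classify("goal vote team".split(), log_prior, log_lik)
print(predicted)
```

Without smoothing, the word "vote" would have zero probability under the "sport" class and would rule that class out on its own. With the additive constant, every vocabulary word contributes a finite log-probability, so the full vocabulary can be kept.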

CC BY-NC-ND 4.0

Paper citation in several formats:
Vilar, D.; Ney, H.; Juan, A. and Vidal, E. (2004). Effect of Feature Smoothing Methods in Text Classification Tasks. In Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems (ICEIS 2004) - PRIS; ISBN 972-8865-01-5, SciTePress, pages 108-117. DOI: 10.5220/0002682001080117

@conference{pris04,
author={David Vilar and Hermann Ney and Alfons Juan and Enrique Vidal},
title={Effect of Feature Smoothing Methods in Text Classification Tasks},
booktitle={Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems (ICEIS 2004) - PRIS},
year={2004},
pages={108-117},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002682001080117},
isbn={972-8865-01-5},
}

TY - CONF

JO - Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems (ICEIS 2004) - PRIS
TI - Effect of Feature Smoothing Methods in Text Classification Tasks
SN - 972-8865-01-5
AU - Vilar, D.
AU - Ney, H.
AU - Juan, A.
AU - Vidal, E.
PY - 2004
SP - 108
EP - 117
DO - 10.5220/0002682001080117
PB - SciTePress