Authors: Fernando Ruiz-Rico ; Jose Luis Vicedo and María-Consuelo Rubio-Sánchez

Affiliation: University of Alicante, Spain

Keyword(s): Text classification, MEDLINE abstracts.

Abstract: Many algorithms have emerged in recent years to tackle automated text categorization. They have been exhaustively studied, leading to several variants and combinations, not only in the particular procedures but also in the treatment of the input data. A widely used approach is to represent documents as a Bag-Of-Words (BOW) and to weight tokens with the TFIDF scheme. Many researchers have pursued improvements in precision and recall, and reductions in classification time, by enriching BOW with stemming, n-grams, feature selection, noun phrases, metadata, weight normalization, etc. We contribute to this field with a novel combination of these techniques. For evaluation purposes, we provide comparisons to previous works that use SVM with the simple BOW. The well-known OHSUMED corpus is used and different sets of categories are selected, as previously done in the literature. The conclusion is that the proposed method can be successfully applied to existing binary classifiers such as SVM, outperforming the combination of BOW and TFIDF.
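For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a BOW representation with TFIDF weighting. It is not the authors' method: the weighting variant (raw term frequency times log(N/df)), the whitespace tokenizer, the function names `tokenize` and `tfidf_vectors`, and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    # Real systems would also handle punctuation, stemming, etc.
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one {token: weight} dict per document, using tf * log(N / df)."""
    # Bag-of-words: token counts per document, ignoring word order.
    bags = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(bags)
    # Document frequency: number of documents that contain each token.
    df = Counter(token for bag in bags for token in bag)
    return [
        {token: count * math.log(n_docs / df[token]) for token, count in bag.items()}
        for bag in bags
    ]


# Toy documents, invented for this example.
docs = [
    "heart disease risk factors",
    "heart surgery outcomes",
    "lung disease treatment",
]
vectors = tfidf_vectors(docs)
```

Under this variant, a token that appears in fewer documents (such as "risk") receives a higher weight than one shared across documents (such as "heart"), and a token present in every document would receive weight zero.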

CC BY-NC-ND 4.0

Paper citation in several formats:
Ruiz-Rico, F.; Luis Vicedo, J. and Rubio-Sánchez, M. (2008). MEDLINE ABSTRACTS CLASSIFICATION - Average-based Discrimination for Noun Phrases Selection and Weighting Applied to Categorization of MEDLINE Abstracts. In Proceedings of the First International Conference on Health Informatics (BIOSTEC 2008) - Volume 1: HEALTHINF; ISBN 978-989-8111-16-6; ISSN 2184-4305, SciTePress, pages 94-101. DOI: 10.5220/0001043500940101

@conference{healthinf08,
author={Fernando Ruiz{-}Rico and Jose {Luis Vicedo} and María{-}Consuelo Rubio{-}Sánchez},
title={MEDLINE ABSTRACTS CLASSIFICATION - Average-based Discrimination for Noun Phrases Selection and Weighting Applied to Categorization of MEDLINE Abstracts},
booktitle={Proceedings of the First International Conference on Health Informatics (BIOSTEC 2008) - Volume 1: HEALTHINF},
year={2008},
pages={94-101},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001043500940101},
isbn={978-989-8111-16-6},
issn={2184-4305},
}

TY - CONF

JO - Proceedings of the First International Conference on Health Informatics (BIOSTEC 2008) - Volume 1: HEALTHINF
TI - MEDLINE ABSTRACTS CLASSIFICATION - Average-based Discrimination for Noun Phrases Selection and Weighting Applied to Categorization of MEDLINE Abstracts
SN - 978-989-8111-16-6
IS - 2184-4305
AU - Ruiz-Rico, F.
AU - Luis Vicedo, J.
AU - Rubio-Sánchez, M.
PY - 2008
SP - 94
EP - 101
DO - 10.5220/0001043500940101
PB - SciTePress