
Authors: Samaneh Chagheri 1 ; Sylvie Calabretto 1 ; Catherine Roussey 2 and Cyril Dumoulin 3

Affiliations: 1 Université de Lyon, France ; 2 Cemagref, France ; 3 , France

ISBN: 978-989-8425-53-9

ISSN: 2184-4992

Keyword(s): Document Classification, Document Structure, Technical Document, Support Vector Machine, Vector Space Model.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Biomedical Engineering ; Data Engineering ; Data Mining ; Databases and Information Systems Integration ; Enterprise Information Systems ; Health Information Systems ; Information Systems Analysis and Specification ; Knowledge Management ; Ontologies and the Semantic Web ; Sensor Networks ; Signal Processing ; Society, e-Business and e-Government ; Soft Computing ; Web Information Systems and Technologies

Abstract: Technical documentation, such as user manuals and manufacturing documents, is now an important part of industrial production. Indeed, given their complexity, products can be neither manufactured nor used without such documents. The increasing volume of such documents stored in electronic format therefore calls for an automatic classification system that categorizes them into pre-defined classes and supports fast information retrieval. These documents are strongly structured and contain elements such as tables and schemas. However, traditional document classification typically considers only the document text and ignores its structural elements. In this paper, we propose a method that uses structural elements to create the document feature vector for classification. A feature in this vector is a combination of a term and a structural element. The document structure is represented by the tags of the XML document. The SVM algorithm is used as the learning and classification algorithm.
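
The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting scheme or the exact feature encoding, so the `tag:term` naming, the raw-count weighting, the tokenizer, and the helper names `structure_term_features` and `to_vector` are all assumptions made for illustration. The SVM training step is left out; the resulting vectors could be passed to any SVM library.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def _walk(element, counts):
    # Collect the text that sits directly inside this element:
    # its leading text plus the tail text following each child.
    chunks = [element.text or ""]
    for child in element:
        _walk(child, counts)
        chunks.append(child.tail or "")
    # Assumed tokenizer: lowercase alphanumeric runs.
    for term in re.findall(r"[a-z0-9]+", " ".join(chunks).lower()):
        # Assumed encoding: one feature per (enclosing tag, term) pair.
        counts[f"{element.tag}:{term}"] += 1


def structure_term_features(xml_text):
    """Return raw counts of tag:term features for one XML document."""
    counts = Counter()
    _walk(ET.fromstring(xml_text), counts)
    return counts


def to_vector(features, vocabulary):
    """Project a feature count mapping onto a fixed, ordered vocabulary."""
    return [features.get(name, 0) for name in vocabulary]


# Hypothetical sample document, not taken from the paper.
doc = (
    "<manual><title>Pump manual</title>"
    "<table><cell>pump</cell><cell>valve</cell></table></manual>"
)
features = structure_term_features(doc)
vocabulary = sorted(features)
vector = to_vector(features, vocabulary)
```

In this sketch the term "pump" yields two distinct features, `title:pump` and `cell:pump`, so a classifier can weight the same word differently depending on where it occurs in the document.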



Paper citation in several formats:
Chagheri, S.; Calabretto, S.; Roussey, C. and Dumoulin, C. (2011). DOCUMENT CLASSIFICATION - Combining Structure and Content. In Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8425-53-9, ISSN 2184-4992, pages 95-100. DOI: 10.5220/0003505100950100

author={Samaneh Chagheri and Sylvie Calabretto and Catherine Roussey and Cyril Dumoulin},
title={DOCUMENT CLASSIFICATION - Combining Structure and Content},
booktitle={Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 2: ICEIS},


JO - Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - DOCUMENT CLASSIFICATION - Combining Structure and Content
SN - 978-989-8425-53-9
AU - Chagheri, S.
AU - Calabretto, S.
AU - Roussey, C.
AU - Dumoulin, C.
PY - 2011
SP - 95
EP - 100
DO - 10.5220/0003505100950100
