Authors:
A. Zelaia; I. Alegria; O. Arregi; A. Arruarte; A. Díaz de Ilarraza; J. A. Elorriaga and B. Sierra
Affiliation:
University of the Basque Country, Spain
Keyword(s):
Document Categorization, Latent Semantic Indexing (LSI), Computer Supported Learning Systems (CSLSs), Domain Module.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence and Decision Support Systems; Computer-Supported Education; e-Learning; Enterprise Information Systems; Information Technologies Supporting Learning; Intelligent Tutoring Systems
Abstract:
In preparing learning material for Computer Supported Learning Systems (CSLSs), one of the first steps is finding documents relevant to the topics and to the students. This requires documents to be categorized according to some criteria. In this paper we analyze the behaviour of classification techniques such as Naïve Bayes, Winnow, SVMs and k-NN, together with lemmatization and noun selection, in the categorization of documents written in Basque. In a second experiment, we study the effect of applying the Singular Value Decomposition (SVD) dimensionality reduction technique before using these classification techniques. The results show that combining SVD with k-NN on a lemmatized corpus gives the best categorization of all, by a remarkable margin. The final aim of this project is to facilitate the semiautomatic construction of the domain module of a CSLS.
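The combination of SVD and k-NN mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lsi_knn_predict`, the cosine similarity measure, the majority vote, the toy term counts, and the values `rank=2` and `k=3` are all assumptions chosen for illustration. The sketch assumes NumPy is available.

```python
import numpy as np

def lsi_knn_predict(train_docs, train_labels, query_doc, rank=2, k=3):
    """Hypothetical helper: k-NN vote with cosine similarity in an SVD-reduced space."""
    # Term-document matrix: rows are terms, columns are training documents.
    A = np.asarray(train_docs, dtype=float).T
    # Thin SVD, A = U @ diag(s) @ Vt; keep the `rank` leading left singular vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_r = U[:, :rank]
    # Project training documents and the query into the reduced space.
    train_reduced = (U_r.T @ A).T                      # shape: (n_docs, rank)
    query_reduced = U_r.T @ np.asarray(query_doc, dtype=float)
    # Cosine similarity between the query and every training document.
    norms = np.linalg.norm(train_reduced, axis=1) * np.linalg.norm(query_reduced)
    sims = (train_reduced @ query_reduced) / np.where(norms == 0, 1.0, norms)
    # Majority vote among the k most similar training documents.
    nearest = np.argsort(-sims)[:k]
    votes = {}
    for i in nearest:
        votes[train_labels[i]] = votes.get(train_labels[i], 0) + 1
    return max(votes, key=votes.get)

# Made-up term counts over a four-word vocabulary: [goal, match, vote, party].
docs = [
    [2, 1, 0, 0], [1, 2, 0, 0], [3, 1, 0, 0],   # labelled "sports"
    [0, 0, 2, 1], [0, 0, 1, 2], [0, 0, 2, 2],   # labelled "politics"
]
labels = ["sports"] * 3 + ["politics"] * 3

print(lsi_knn_predict(docs, labels, [1, 1, 0, 0]))
print(lsi_knn_predict(docs, labels, [0, 0, 1, 2]))
```

In a setting like the one described, the rows of `docs` would presumably be built from the lemmatized corpus rather than raw word forms, and the rank and neighbourhood size would be tuned experimentally.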