
 
conducted based on the student’s performance on a 
preliminary test given at his/her first interaction 
with the system. Then, the k-means algorithm takes 
as input multiple student characteristics, which are 
described below, and serves as a means of 
initializing the new student’s model based on 
recognized similarities between the new student and 
past students who belong to the same stereotype 
category. 
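The initialization idea described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the feature vectors, the stored model values, the choice of k and the function names (`k_means`, `initialise_student_model`) are all assumptions made for illustration. Past students of one stereotype are clustered with plain k-means, the new student is matched to the nearest centroid, and the new model is seeded with the average of the models in that cluster.

```python
import math
import random


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def k_means(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return centroids, labels


def initialise_student_model(new_features, past_features, past_models, k=2):
    """Seed a new student's model from the most similar cluster of past students.

    All arguments describe students of ONE stereotype category (assumption).
    """
    centroids, labels = k_means(past_features, k)
    nearest = min(range(k), key=lambda c: math.dist(new_features, centroids[c]))
    peers = [m for m, lab in zip(past_models, labels) if lab == nearest]
    if not peers:  # fall back to the whole stereotype if the cluster is empty
        peers = past_models
    return mean_vector(peers)


# Hypothetical data: two normalised characteristics per past student and
# two stored model values (e.g. estimated mastery of two concepts).
past_features = [[0.20, 0.10], [0.25, 0.15], [0.30, 0.10],
                 [0.80, 0.90], [0.85, 0.80], [0.90, 0.95]]
past_models = [[0.1, 0.2], [0.2, 0.2], [0.3, 0.2],
               [0.7, 0.8], [0.8, 0.8], [0.9, 0.8]]

initial_model = initialise_student_model([0.28, 0.12], past_features, past_models)
print(initial_model)
```

In this sketch the new student falls into the cluster of the first three past students, so the initial model is the average of their stored values. A real system would need to normalise heterogeneous characteristics before computing distances.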
This paper is organized as follows. First, we 
present the related scientific work. In sections 3 and 
4, we discuss our system’s architecture, namely 
machine learning in student modelling and the k-
means clustering algorithm. Finally, in section 5, we 
discuss the usability of centroid-based clustering for 
user models and present our future plans. 
2 RELATED WORK 
Computer-assisted language teaching is a 
significant field within language learning. User 
modeling has already been applied in a wide variety 
of scientific areas, including educational software 
for language instruction. Machine learning 
techniques have been applied to user modeling 
problems for acquiring models of users. In this 
section, we outline the scientific progress of student 
modeling with respect to Machine Learning and 
CALL (Computer Assisted Language Learning). 
Basile et al (2011) proposed the exploitation of 
machine learning techniques to improve and adapt 
the set of user model stereotypes by making use of 
logged user interactions with the system. To do this, 
a clustering technique is exploited to create a set of 
user model prototypes; then, an induction module is 
run on these aggregated classes in order to improve a 
set of rules aimed at classifying new and unseen 
users. Their approach exploited the knowledge 
extracted from the analysis of interaction log data 
without requiring explicit feedback from the user. 
Nino (2009) presented a snapshot of what has been 
investigated in terms of the relationship between 
machine translation (MT) and foreign language (FL) 
teaching and learning. Moreover, the author outlined 
some of the implications of the use of MT and of 
free online MT for FL learning. Friaz-Martinez et al 
(2007) investigated which human factors are 
responsible for the behavior and the stereotypes of 
digital library users, so that the consideration of 
these human factors for personalization can be 
justified. To achieve this aim, the authors studied 
whether there is a statistically significant 
relationship between the stereotypes created by 
robust clustering and each human factor, including 
cognitive styles, levels of expertise and gender 
differences. Virvou and Chrysafiadi (2006) 
described a web-based educational application for 
individualized instruction on the domain of 
programming and algorithms. Their system 
incorporates a user model, which relies on 
stereotypes, the determination of which is based on 
the knowledge level of the learner. Liccheli et al 
(2004) focused on machine learning approaches for 
inducing student profiles, based on Inductive Logic 
Programming and on methods using numeric 
algorithms, to be exploited in this environment. 
Moreover, the authors carried out an experimental 
session comparing the effectiveness of these 
methods, along with an evaluation of their 
efficiency, in order to decide how best to exploit 
them in the induction of student profiles. Tsiriga and 
Virvou (2004) introduced the ISM framework for 
the initialization of the student model in Web-based 
ITSs, which is a methodology that uses an 
innovative combination of stereotypes and the 
distance weighted k-nearest neighbor algorithm to 
set initial values for all aspects of the student model. 
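As an illustration of the general technique named above, the following Python sketch estimates one initial model value with a distance-weighted k-nearest neighbor rule. It is not the ISM framework's exact formulation: the inverse-squared-distance weighting, the feature vectors and the function name `weighted_knn_estimate` are assumptions chosen for the example.

```python
import math


def weighted_knn_estimate(query, neighbours, k=3):
    """Estimate a value for `query` from its k nearest neighbours.

    `neighbours` is a list of (feature_vector, value) pairs, assumed here to
    be past students of the same stereotype. Closer neighbours count more:
    each of the k nearest is weighted by 1 / distance**2.
    """
    scored = sorted((math.dist(query, features), value)
                    for features, value in neighbours)[:k]
    # An exact match has infinite weight, so return its value directly.
    for dist, value in scored:
        if dist == 0:
            return value
    weights = [1 / dist ** 2 for dist, _ in scored]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, scored))
    return weighted_sum / sum(weights)


# Hypothetical past students: one feature each and one stored model value.
past = [([1.0], 10.0), ([2.0], 20.0), ([10.0], 100.0)]
estimate = weighted_knn_estimate([0.0], past, k=2)
print(estimate)
```

With k=2 only the two closest past students contribute, and the nearer one dominates the estimate because its weight is four times larger.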
 
SignMT was implemented by Ditcharoen et al 
(2010) to translate sentences/phrases from different 
sources in four steps: word transformation, word 
constraint, word addition and word ordering. 
Finally, Virvou and Troussas (2011) 
described a ubiquitous e-learning tutoring system for 
multiple language learning, called CAMELL 
(Computer-Assisted Multilingual E-Language 
Learning). It is a post-desktop model of human-
computer interaction in which students “naturally” 
interact with the system in order to get used to 
electronically supported learning. Their system 
presents advances in user modeling, error proneness 
and user interface design. 
However, after a thorough investigation of the 
related scientific literature, we found no 
implementation of a multilingual educational system 
that combines student modeling and machine 
learning. Hence, we implemented a prototype 
system, which incorporates intelligence in its 
diagnostic component, accounts for students’ error 
proneness, and provides error diagnosis and advice 
based on students’ needs. 
3 MACHINE LEARNING IN USER MODELING 
Student modeling can undoubtedly benefit from 
Centroid-based Clustering for Student Models in Computer-based Multiple Language Tutoring