Finding Regularities in Courses Evaluation with K-means Clustering

R. Campagni, D. Merlini and M. C. Verri

Dipartimento di Statistica, Informatica, Applicazioni, Universit`a di Firenze

Viale Morgagni 65, 50134, Firenze, Italia

Keywords:

Educational Data Mining, K-means Clustering, Courses Evaluation, Assessment.

Abstract:

This paper presents an analysis about the courses evaluation made by university students together with their

results in the corresponding exams. The analysis concerns students and courses of a Computer Science pro-

gram of an Italian University from 2001/2002 to 2007/2008 academic years. Before the end of each course,

students evaluate different aspects of the course, such as the organization and the teaching. Evaluation data

and the results obtained by students in terms of grades and delays with which they take their exams can be

collected and reorganized in an appropriate way. Then we can use clustering techniques to analyze these data

thus show possible correlation between the evaluation of a course and the corresponding average results as

well as regularities among groups of courses over the years. The results of this type of analysis can possibly

suggest improvements in the teaching organization.

1 INTRODUCTION

The evaluation of university education is an important

process whose results can be used in the program-

ming and management of the educational activities

by monitoring resources (ﬁnancial, human, structural

and others), services (orientation for students and ad-

ministrative ofﬁces), students careers, courses and oc-

cupancy rate. In order to evaluate all these aspects, it

is important to analyse the opinion of the users of uni-

versity education, i.e. the students.

The evaluation of the learning process falls in the

context of the Educational Data Mining (EDM), an

emerging and interesting research area that aims to

identify previously unknown regularities in educa-

tional databases, to understand and improve student

performance and the assessment of their learning pro-

cess. As described in (Romero and Ventura, 2010),

EDM uses statistical, machine learning and data min-

ing algorithms on different types of data related to the

ﬁeld of education. It is concerned with developing

methods for exploring these data to better understand

the students and the frameworks in which they learn

thus possibly enhancing some aspects of the qual-

ity of education. Data mining techniques have also

been applied in computer-based and web-based ed-

ucational systems (see, e.g., (Romero et al., 2010;

Romero et al., 2008)). In this paper, we use a data

mining approach based on

K-means

clustering to link

the evaluation of courses taken by students with their

results, in terms of average grade and delay in the cor-

responding exams. We also analyse the evaluation of

courses over the years in order to identify similar be-

haviors or particular trends among courses, by using

an approach similar to time series clustering (see, e.g.,

(Liao, 2005)).

This study deepens the analysis presented in

(Campagni et al., 2013) and is analogous to that used

in (Campagni et al., 2012a; Campagni et al., 2012b;

Campagni et al., 2012c). The analysis refers to a

real case study concerning an Italian University but

it could be applied to different scenarios, except for

a possible reorganization of the involved data. The

data set is not very large but allows us to illustrate a

quite general methodology on a real case study. Our

approach uses standard data mining techniques, but

we think very interesting the concrete possibility of

applying these techniques to ﬁnd and analyse patterns

in the context of university courses evaluation, even

in large universities.

2 DATA FOR ANALYSIS

In this section, we describe how courses are evalu-

ated by students at the Universityof Florence, in Italy,

with the aim of providing a methodology to search

for regularities in data concerning courses evalua-

tion. Therefore, the steps we present can be ap-

Campagni R., Merlini D. and Verri M..

Finding Regularities in Courses Evaluation with K-means Clustering.

DOI: 10.5220/0004796000260033

In Proceedings of the 6th International Conference on Computer Supported Education (CSEDU-2014), pages 26-33

ISBN: 978-989-758-021-5

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

plied also in other academic contexts. In particular,

we refer to a Computer Science degree of the Sci-

ence School, under the Italian Ministerial Decree n.

509/1999. This academic degree was structured over

three years and every academic year was organized in

two semesters; there were several courses in each of

these six semesters and at the end of a semester stu-

dents could take their examinations. Exams could be

taken in different sessions during the same year, after

the end of the corresponding courses, or later.

Table 1 illustrates an example of students data af-

ter a preprocessing phase which allow us to integrate

original attributes, such as the grade and the date of

the exam, with both the semester in which the course

was given, Semester1, and the semester in which the

exam was taken, Semester2. Finally, we can com-

pute the value Delay as the difference between the

semester of the course and the semester in which the

student took the exam. We highlight that the values of

attributes Semester1 and Semester2 are not usually

stored in the databases of the universities, therefore

this preprocessing phase may be onerous.

At the University of Florence, starting from the

academic year 2001/2002, a database stores informa-

tion about evaluation of the courses quality of various

degree programs, among which we ﬁnd the degree

under consideration. The results of this process are

available at the address (SISValDidat), under permis-

sion of the involved teacher, and show for each course

several pieces of information, such as the name of the

teacher who took the course and the average rating

given by students on various topics. Before the end of

each course (at about 2/3 of the course), students com-

pile, anonymously, a module to express their opinion

on the course just taken. This form is divided into the

following ﬁve paragraphs:

• paragraph 1, concerns the organization of the de-

gree program;

• paragraph 2, concerns the organization of the

course;

• paragraph 3, concerns the teacher;

• paragraph 4, concerns classrooms and equipment;

• paragraph 5, concerns the general satisfaction

about the course.

Each paragraph is composed by some questions; stu-

dents can choose among four levels of answers, two

negative and two positive levels (disagree, slightly

disagree, slightly agree, agree). For details the in-

terested reader can see the sample of the module in

(SISValDidat).

For each course of an academic year and for each

paragraph, we can compute the percentage of positive

answers, that is, of type slightly agree and agree by

grouping together all questions belonging to the same

paragraph and their average percentage value.

To relate data of students careers with courses

evaluation, for each course we can compute the aver-

age grade and the average delay attained by students

who took the exam in the same year. An example

of this data organization is illustrated in the ﬁrst four

columns of Table 2. As already observed, the eval-

uation of courses is anonymous and is done only by

students who really take the course, therefore, in this

kind of organization, it may happen to consider in-

formation concerning exams of students who may not

be the same students who evaluated the courses. As

a consequence, we can only compare the results of

courses evaluation in a speciﬁc year with the aggre-

gate results of students who took the corresponding

exams in the same period. However, this data orga-

nization does not change a lot if it was possible to

identify the students involved in the courses evalu-

ation in order to connect properly the results of the

evaluation with those of exams. Obviously, in this

case we should ensure the privacy of results, for ex-

ample by using a differential privacy approach (see,

e.g., (Dwork, 2008)).

After a preprocessing phase, we can organize stu-

dents and evaluation data into two different ways by

taking into account the following ﬁelds:

• Exam, the code which identiﬁes an exam;

• Year, the year of the evaluation;

• AvgGrade, the average grade of the exam;

• AvgDelay, the average delay, in semesters, of stu-

dents exams;

• Park(t), the percentage of positive evaluations of

paragraph k at time t.

In particular, Table 2 illustrates a sample of the dataset

which can be used to compare examination results and

courses evaluation while Table 3 represents a sample

of data that can be used to analyze the evolution over

the years of courses evaluation. As we will illustrate

in Section 3, data organized as in Table 2 will be clus-

tered with

K-means

algorithm by using the Euclidean

distance to separate the multidimensional points rep-

resenting some characteristic of a course in a spe-

ciﬁc year; data organized as in Table 3 will be rep-

resented in the plane as trajectories corresponding to

the evaluation of courses over the years and will be

clustered with the Manhattan distance. Both these ap-

proaches can be used to ﬁnd regularities in courses

evaluations and can highlight criticalities or suggest

improvements in the teaching organization.

FindingRegularitiesinCoursesEvaluationwithK-meansClustering

Table 1: A sample of students data: grades in thirtieths.

Student Exam Date Grade Semester1 Semester2 Delay

100 10 2001-01-14 24 1 1 0

100 20 2002-12-20 27 2 3 1

200 20 2002-06-04 21 2 2 0

300 10 2001-01-29 26 1 3 2

400 10 2002-02-15 26 1 2 1

Table 2: Data organization for comparing examination results and courses evaluation.

Exam Year AvgGrade AvgDelay Par1 ... Par5

10 2001 25 1 51 ... 60

10 2002 26 1 50 ... 61

10 2007 25 1 81 ... 67

20 2001 24 0.5 56 ... 77

20 2002 26 1 62 ... 59

3 K-MEANS CLUSTERING WITH

EUCLIDEAN AND

MANHATTAN DISTANCES

Among the different data mining techniques, cluster-

ing is one of the most widely used methods. The

goal of cluster analysis is to group together objects

that are similar or related and, at the same time, are

different or unrelated to the objects in other clusters.

The greater the similarity (or homogeneity)is within a

group and the greater the differences between groups

are the more distinct the clusters are.

K-means

a very simple and well-known algorithm based on a

partitional approach; it was introduced in (MacQueen,

1967) and a detailed description can be found in (Tan

et al., 2006). In this algorithm, each cluster is associ-

ated with a centroid and each point is assigned to the

cluster with the closest centroid by using a particular

distance function. The centroids are iteratively com-

puted until a ﬁxed point is found. The number K of

clusters must be speciﬁed. In particular, in this paper

we use both the Euclidean and Manhattan distance;

in the ﬁrst case, the centroid of a cluster is computed

as the mean of the points in the cluster while in the

second case the appropriate centroid is the median of

the points (see, e.g., (Tan et al., 2006)).

The evaluation of the clustering model resulting

from the application of a cluster algorithm is not a

well developedor commonly used part of cluster anal-

ysis; nonetheless, cluster evaluation, or cluster vali-

dation, is important to measure the goodness of the

resulting clusters, for example to compare clustering

algorithms or to compare two sets of clusters. In our

analysis we measured cluster validity with correla-

tion, by using the concept of proximity matrix and in-

cidence matrix. Speciﬁcally, after obtaining the clus-

ters by applying

K-means

to a dataset, we computed

the proximity matrix P = (P

i, j

) having one row and

one column for each element of the dataset. In par-

ticular, each element P

i, j

represents the Euclidean,

or Manhattan, distance between elements i and j in

the dataset. Then, we computed the incidence matrix

I = (I

i, j

), where each element I

i, j

is 1 or 0 if the ele-

ments i and j belong to the same cluster or not. We

ﬁnally computed the Pearson’s correlation, as deﬁned

in (Tan et al., 2006, page 77), between the linear rep-

resentation by rows of matrices P and I. Correlation

is always in the range -1 to 1, where a correlation of

1 (-1) means a perfect positive (negative) linear rela-

tionship.

As a ﬁrst example, Table 4 illustrates the ﬁnal

grade and the graduation time, expressed in years,

of a sample of graduated students. By applying the

K-means

algorithm to this dataset, with K = 2, Fi-

nalGrade and Time as clustering attributes and by

using the Euclidean distance, we obtain the follow-

ing two clusters, in terms of the student identiﬁers:

= {100, 400, 600, 700} and C

= {200, 300, 500};

the centroids of the clusters have coordinates C

(107, 3.5) and C

= (96, 5.33), respectively. Tables

5 and 6 show the proximity matrix and the incidence

matrix corresponding to clustersC

and C

of the data

set illustrated in Table 4. The Pearson’s correlation

between the linear representation of these two matri-

ces is −0.59, a medium value of correlation.

CSEDU2014-6thInternationalConferenceonComputerSupportedEducation

Table 3: Data organization for analyzing the trend over the years of courses evaluation.

Exam Par1(2001) ... Par1(2007) ... Par5(2001) .. . Par5(2007)

10 51 ... 81 ... 60 ... 67

20 56 ... 84 ... 77 ... 84

Table 4: A sample data set about students.

Student FinalGrade Time

100 110 3

200 95 5

300 100 5

400 103 4

500 98 6

600 106 4

700 109 3

Table 5: The proximity matrix for data of Table 4.

100 200 300 400 500 600 700

100 0

200

20.12 0

300

10.25 10 0

400

7.07 13.08 3.32 0

500

12.41 8.06 2.24 5.48 0

600

4.12 16.06 6.16 3 8.31 0

700

1 19.13 9.27 6.08 11.45 3.16 0

As another example, Table 7 shows a sample

of data concerning courses evaluation: in particu-

lar, each row contains the exam identiﬁer and the

percentage of positive evaluation of a generic para-

graph at time t

, for i = 1, ··· , 4. We can apply the

K-means

algorithm to the dataset in Table 7, with

K = 2, Par(t

), for i = 1, · ·· , 4, as clustering at-

tributes and by using the Manhattan distance. This

means to represent each element of the data set as

a broken line connecting the points (t

,Par(t

)), for

i = 1, ··· , 4, in the cartesian plane. The Manhattan

distance between two broken lines thus corresponds

to the sum of the vertical distances between the ordi-

nates. By using the

K-means

algorithm, we obtain

the following two clusters in terms of course iden-

Table 6: The incidence matrix for clustering of data of Table

100 200 300 400 500 600 700

100 1

200 0 1

300 0 1 1

400

1 0 0 1

500 0 1 1 0 1

600

1 0 0 1 0 1

700 1 0 0 1 0 1 1

Table 7: A sample data set about courses evaluation.

Exam Par(t

) Par(t

)

100 55 65 67 60

200 85 87 85 92

300 72 68 65 77

400 77 80 70 73

500 80 95 90 91

tiﬁers: C

= {200, 500} and C

= {100, 300, 400};

the centroids of the clusters are represented by

the sequences C

= [(1, 72), (2, 68), (3, 67), (4, 73)]

and C

= [(1, 82.5), (2, 91), (3, 87.5), (4, 91.5)], re-

spectively. Figure 1 illustrates the clustering result by

evidencing the centroids C

and C

Figure 1:

K-means

results with data of Table 7 with K = 2

and Manhattan distance, centroids in evidence.

Also in this case we can compute the Pearson’s

correlation by using the proximity and the incidence

matrices computed by using the Manhattan distance.

3.1 The Case Study

As already observed, the real datasets we analysed

concern courses and exams during the academic years

from 2001/2002 to 2007/2008 at the Computer Sci-

ence program of the University of Florence, in Italy.

In particular, the ﬁrst data set is organized as illus-

trated in Table 2 and refers to the evaluation of 40

courses in seven different years. We explicitly ob-

serve that we did not consider in our analysis those

courses evaluated by a small number of students. For

clustering, we used the

K-means

implementation of

FindingRegularitiesinCoursesEvaluationwithK-meansClustering

Weka

(Witten et al., 2011), an open source software

for data mining analysis. The aim was to ﬁnd if there

is a relation between the valuation of a course and

the results obtained by students in the correspond-

ing exam. We performed several tests with different

values of the parameter K and we selected different

groups of attributes. We point out that the attributes

selection is an important step and should be done ac-

cording to the preference of an expert of the domain,

for example the coordinator of the degree program.

For each choice of attributes, we applied the

K-means

algorithm with the Euclidean distance to identify the

clusters; then, we computed the Pearson’s correlation

by using the proximity and incidence matrices. The

tests we performed pointed out that the exams hav-

ing good results, in terms of average grade and delay,

correspond to courses having also a good evaluation

from students.

In particular, we used AvgGrade, AvgDelay,

Par1, Par2, Par3, Par4 and Par5 as clustering at-

tributes and K = 2, obtaining the clusters illustrated

in Figures 2, 3, 4 and 5; each ﬁgure represents the

projection of the clusters along two dimensions corre-

sponding to the following pairs of attributes AvgDe-

lay and Par3, AvgGrade and Par3, AvgDelay and

Par4 and, ﬁnally, AvgGrade and Par4. The cen-

troids of the resulting clusters are shown in Table 8,

which also contains the average values relative to the

full data set.

Table 8: The centroids of clusters in Figures 2, 3, 4, 5.

Attribute Full Data Cluster0 Cluster1

AvgGrade 25.31 25.85 24.58

AvgDelay 2.61 1.8 3.68

Par1 70.86 77.74 61.67

Par2 72.23 82.19 58.94

Par3 84.51 90.25 76.86

Par4 72.03 74.67 68.5

Par5 76.02 80.83 69.61

The cluster number 0, which correspond to 88

blue stars in the ﬁgures, contains the courses which

students took with small delay and that they evalu-

ated positively. On the other hand, cluster number 1,

corresponding to 66 red stars, contains those courses

which students took with a large delay and that they

evaluated less positively. We observe that the cen-

troids of the two clusters are very close relative to the

attribute Par4 which concerns classrooms and equip-

ment. This is also evidenced from Figures 4 and 5,

where the blue and red stars are less separated than

those in Figures 2 and 3. The Pearson’s correlation

corresponding to these clusters is equal to −0.35. We

obtained an improvement by excluding the attribute

Figure 2: Clusters of Table 8 with AvgDelay and Par3 in

evidence.

Figure 3: Clusters of Table 8 with AvgGrade and Par3 in

evidence.

Figure 4: Clusters of Table 8 with AvgDelay and Par4 in

evidence.

CSEDU2014-6thInternationalConferenceonComputerSupportedEducation

Figure 5: Clusters of Table 8 with AvgGrade and Par4 in

evidence.

Par4 from clustering, in fact in this case we ﬁnd

a correlation equal to −0.51. In general, our tests

evidenced that the paragraphs evaluations which are

more correlated with students results regard attributes

Par2 and Par3, that is, those concerning the course

organization and the teacher. We point out that the

value K = 2 gave the best results in terms of correla-

tion.

Among the courses considered in the previous

data set, we selected those evaluated all seven years,

for a total of sixteen courses, some in Mathematics

and others in Computer Science. This time we are in-

terested in analysing data organized as in Table 3, by

considering the evaluation of a particular paragraph

over the years. The aim was to ﬁnd if there are simi-

lar behaviors among courses, that is, if we can classify

courses according to their evaluations. We performed

several tests, by choosing a paragraph at a time. For

each choice of attributes, we applied the

K-means

al-

gorithm with the Manhattan distance to identify the

clusters; also in this case we computed the Pearson’s

correlation by using the proximity and incidence ma-

trices.

Figure 6 illustrates the result of

K-means

with K =

2, Manahattan distance and Par2(2001), Par2(2002),

·· ·, Par2(2007) as clustering attributes. The points

deﬁning the centroid trajectories of the resulting clus-

ters are shown in Table 9, which also contains the me-

dian values relative to the full data set. The Pearson’s

correlation corresponding to these clusters is equal to

−0.64.

The ﬁgure puts well in evidence that the courses

are divided into two clusters with well distinct cen-

troids. The red cluster contains courses that have been

evaluated better over the years while the blue clus-

ter corresponds to courses that students rated worse.

Table 9: The points deﬁning the centroid trajectories of

clusters in Figure 6.

Attribute Full Data Cluster0 Cluster1

Par2(2001) 73.5 85 57

Par2(2002) 77 87 56

Par2(2003) 73.5 84 52

Par2(2004) 72.5 79 58

Par2(2005) 74.5 79 65

Par2(2006) 78.5 84 71

Par2(2007) 75.5 83 69

Figure 6: Clusters of Table 9 with centroids in evidence:

each line represents the percentage of positive evaluations

about the organization of a course (paragraph 2) over the

years 2001-2007.

What is interesting, though not surprising, is that

all courses in the red cluster are Computer Science

courses while the blue cluster contains many Mathe-

matics courses. We highlight that the centroids show

rather clearly the behavior of the assessment over the

years. In particular, the evaluation of the courses in

the blue cluster has improvedoverthe years while that

of courses in the red cluster has remained more stable.

Also in this case the best results in terms of cor-

relation were found with K = 2; however, with K = 4

we found the courses rated worse distributed into two

clusters, one of which contains only the Mathemat-

ics courses. The corresponding centroid illustrates a

gradual improvement of the assessment for this type

of courses during the years under examination.

FindingRegularitiesinCoursesEvaluationwithK-meansClustering

4 CONCLUSION AND FUTURE

WORK

The results of the previous sections show, in a for-

mal way with data mining techniques, that there is

a relationship between the evaluation of the courses

from students and the results they obtained in the cor-

responding examinations. In particular, the analysis

performed on data related to the Computer Science

degree program under examination illustrates that the

courses which received a positive evaluation corre-

spond to exams in which students obtained a good

average mark and that they took with a small delay.

Conversely, the worst evaluations were given to those

courses which do not match good achievements by

students.

The analysis based on clustering with Manhattan

distance allows us to classify courses according to

the assessment received by students and can highlight

some regularities that emerge over the years or points

out some trend reversals due to changes of teachers.

In the Computer Science degree program just consid-

ered, for example, we observe the trend to give not

so good evaluation to Mathematics courses. Results

of this type point out a critical issue in the involved

courses and can be used to implement improvement

strategies.

We wish to emphasize that our analysis refers to

the courses evaluation that students make before tak-

ing the exams and knowing their grades. In fact, as

already observed, the evaluation module is given to

students before the end of the course. Surely, there

is the risk that their judgment is inﬂuenced by the

inherent difﬁculty of the course or by the comments

made by students of the previous years. To this pur-

pose, it is important that during the module compi-

lation the teacher explains that a serious assessment

of the course can increase the quality level of the in-

volved services. Students represent the end-users as

well as the principal actors of the formative services

offered by the University and the measure of their per-

ceived quality is essential for planning changes. How-

ever, the results of courses evaluation should always

be considered in a critical way and should not have the

goal of simplifying the contents to get best ratings.

In general, many other factors should be consid-

ered for evaluating courses and student success, as

addressed in (Romero and Ventura, 2010). The ap-

proach used in this work could be reﬁned and deep-

ened if it was possible to identify the students in-

volved in the courses evaluation in order to connect

properly the results of the evaluation with those of ex-

ams. Moreover, it would be interesting to connect the

assessment of students with other information such

as the gender of students and teachers or the kind of

high school attended by students. Starting from the

academic year 2011/2012, the University of Florence

began to manage on line the evaluation module de-

scribed in Section 2. Therefore, in a next future, it

might be possible to proceed in this direction, tak-

ing into account appropriate strategies to maintain pri-

vacy.

An interesting additional source of information

could be given by social media sites, such as

Face-

book

Twitter

, used by students to post comments

about courses and teachers. It would be useful to link

this information with the results of students and their

ofﬁcial evaluations about teachings, in order to take

into account more feedbacks. In such a context, it

might be interesting to use text mining techniques to

classify the student comments and enrich the database

for an analysis similar to that illustrated in this work.

REFERENCES

Progetto SISValDidat. https://valmon.disia.uniﬁ.it/

sisvaldidat/uniﬁ/index.php.

Campagni, R., Merlini, D., and Sprugnoli, R. (2012a).

Analyzing paths in a student database. In The 5th

International Conference on Educational Data

Mining, Chania, Greece, pages 208–209.

Campagni, R., Merlini, D., and Sprugnoli, R.

(2012b). Data mining for a student database. In

ICTCS 2012, 13th Italian Conference on Theo-

retical Computer Science, Varese, Italy.

Campagni, R., Merlini, D., and Sprugnoli, R.

(2012c). Sequential patterns analysis in a student

database. In ECML-PKDD Workshop: Mining

and exploiting interpretable local patterns (I-Pat

2012), Bristol.

Campagni, R., Merlini, D., Sprugnoli, R., and Verri,

M. C. (2013). Comparing examination results

and courses evaluation: a data mining approach.

In Didamatica 2013, Pisa, Area della Ricerca

CNR, AICA, pages 893–902.

Dwork, C. (2008). Differential privacy: a survey of

results. In Theory and Applications of Models

of Computation, 5th International Conference,

TAMC 2008, pages 1–19.

Liao, T. W. (2005). Clustering of time series data: a

survey. Pattern Recognition, 38(11):1857–1874.

MacQueen, J. (1967). Some methods for classiﬁca-

tions and analysis of multivariate observations.

In Proc. of the 5th Berkeley Symp. on Mathemat-

ical Statistics and Probability. University of Cal-

ifornia Press., pages 281–297.

CSEDU2014-6thInternationalConferenceonComputerSupportedEducation

Romero, C., Romero, J. R., Luna, J. M., and Ven-

tura, S. (2010). Mining rare association rules

from e-learning data. In The 3rd International

Conference on Educational Data Mining, pages

171–180.

Romero, C. and Ventura, S. (2010). Educational Data

Mining: A Review of the State of the Art. IEEE

Transactions on systems, man and cybernetics,

40(6):601–618.

Romero, C., Ventura, S., and Garc´ıa, E. (2008). Data

mining in course management systems: Moodle

case study and tutorial. Computers & Education,

51(1):368–384.

Tan, P. N., Steinbach, M., and Kumar, V. (2006). In-

troduction to Data Mining. Addison-Wesley.

Witten, I. H., Frank, E., and Hall, M. A. (2011). Data

Mining: Practical Machine Learning Tools and

Techniques, Third Edition. Morgan Kaufmann.

FindingRegularitiesinCoursesEvaluationwithK-meansClustering