Enhance E-Learning through Data Mining for Personalized Intervention

Lingma Lu Acheson and Xia Ning

Department of Computer and Information Science, Indiana University-Purdue University Indianapolis,

723 West Michigan Street, SL280, Indianapolis, IN 46202, U.S.A.

Keywords: Education Data Mining, E-Learning, Analytics in Education, Assessment.

Abstract: E-Learning has become an integral part of college education. Because online courses lack face-to-face interaction, it is difficult to track student involvement and detect performance decline early through the direct communication we typically practice in a classroom setting. Hence there is a critical need to significantly improve the learning outcomes of online courses through advanced, non-traditional approaches. University courses are often conducted through a web learning management system, which captures a large amount of course data, including students’ online footprints such as quiz scores, logged entries and frequency of log-ins. Patterns discerned from this data can greatly help instructors gain insights into students’ learning behaviours. This position paper argues for using Data Mining and Machine Learning techniques to analyse students’ online footprints. Software tools could be created to profile students, identify those with declining performance, and make corrective recommendations to instructors. Such timely and personalized instructor intervention would ultimately improve students’ learning experience and enhance their learning outcomes.

1 RESEARCH PROBLEM

Data Mining (DM) and Machine Learning (ML) tools have been widely used in education to assist instructors in understanding students and improving learning outcomes, particularly in online education. Owing to their flexibility and often student-centred curriculum design, online courses have become an integral part of college education; hence there is a critical need to significantly improve the learning outcomes of online courses through advanced, non-traditional approaches. We believe that automated analysis of student performance and behaviour through advanced DM and ML methods, together with intelligent personalized interventions, has great potential to achieve this goal.

University courses are often conducted through a web learning management system such as Blackboard or Canvas. These systems capture a large amount of course data, including students’ online footprints such as quiz scores, logged entries and frequency of log-ins. Patterns discerned from this data can greatly help instructors gain insights into student learning behaviours. We expect that using data mining and machine learning techniques to analyse students’ online footprints would bring significant benefits. Building on such analysis, software tools could be created to profile students, identify those with declining performance, and make corrective recommendations to instructors.

2 OUTLINE OF OBJECTIVES

The objectives are to apply advanced DM and ML tools in online courses to understand student performance and behaviours, to detect performance decline and risk of dropout, and to design and deliver personalized interventions to students, so as to increase their engagement and improve their learning outcomes. We believe tool development is necessary to answer the following two questions:

Question 1: How to understand and forecast

student learning experience and performance based

on their online footprints?

Acheson, L. and Ning, X.

Enhance E-Learning through Data Mining for Personalized Intervention.

DOI: 10.5220/0006793304610465

In Proceedings of the 10th International Conference on Computer Supported Education (CSEDU 2018), pages 461-465

ISBN: 978-989-758-291-2


Question 2: How to construct and deliver

personalized interventions via peer-to-peer off-line

communications?

Due to the lack of face-to-face interactions in online courses, it is difficult to track student involvement and detect performance decline early via direct communications, as we typically do in a classroom setting. Fortunately, students leave many digital footprints whenever they take the courses, participate in online forum discussions, submit homework, read online slides, etc. Such digital footprints are valuable information from which instructors can understand student behaviours and make meaningful interpretations and predictions. However, given that online courses are normally large classes, manually analysing such footprint data would impose a huge workload on instructors. In addition, the highly heterogeneous student body makes any manual analysis highly nontrivial, because conclusions drawn for one student may not apply to others, for example, those from a different major.

3 STATE OF THE ART

Schools offering fully online, hybrid and web-

enhanced degree programs have seen substantial

growth over the past ten years and all signs show that

growth will continue at this rapid rate (“How

Prevalent is Online Learning”, 2017). In addition,

Massive Open Online Courses offer a wide range of

online educational programs from leading

universities (Combs & Mesko, 2015). One clear

advantage of an online course is that logs can provide

clues about learner experiences in relation to ease of

course navigation and perceived value of content

(Robyn, 2013). On the other hand, the flaws of

MOOCs have been eagerly dissected – high dropout rates,

limited social interaction, heavy reliance on

instructivist teaching, poor results for

underrepresented student populations, and so on

(Bunk et al., 2015). For example, a program

introduced by San Jose State University and Udacity

to run remedial courses in popular subjects ended in

a failure rate of up to 71% (Devlin, 2013).

Despite this, the amount of data generated from

online courses is skyrocketing. Researchers and

developers of online learning systems have begun to

explore analogous techniques for gaining insights

from learners’ activities online (U.S. Department of

Education, 2012).

Educational Data Mining (EDM) has emerged as a distinct research area in recent years (Baker et al., 2010). Several main research focuses have developed in EDM, including student behaviour modelling, student performance modelling, assessment, etc. Bayes’ theorem, Hidden Markov Models, decision trees, etc. are among the most popular methods applied in this research (Pena-Ayala, 2014).

Methods such as Collaborative Filtering (CF) (Ning, Desrosiers, & Karypis, 2015) and Matrix Factorization (MF) (Koren, Bell, & Volinsky, 2009) have attracted increasing attention in EDM applications, due to their strong ability to deal with sparse data, which is particularly common in EDM, for ranking, prediction or classification. For example, Sweeney et al. (2015, 2016) adopted methods including SVD, SVD-kNN and Factorization Machine (FM) to predict next-term performance. Polyzou and Karypis (2016) addressed the future course grade prediction problem with three approaches: course-specific regression, student-specific regression and course-specific matrix factorization. Moreover, neighborhood-based CF is one of the most popular methods in EDM. Many existing approaches (Ray & Sharma, 2011; Bydzovska, 2015; Denley, 2013) predict grades based on student similarities; that is, they first identify similar students and then use their grades to estimate the grades of the students of interest.
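The neighborhood-based idea described above (find similar students, then use their grades) can be illustrated with a minimal sketch. It is not the method of any cited work: the similarity measure (inverse of mean absolute grade difference over shared courses), the function names, and the toy grade data are all assumptions made for illustration.

```python
def similarity(a, b):
    """Similarity of two students from grades in shared courses.

    Assumption: 1 / (1 + mean absolute grade difference); returns 0.0
    when the students share no courses.
    """
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_abs_diff = sum(abs(a[c] - b[c]) for c in shared) / len(shared)
    return 1.0 / (1.0 + mean_abs_diff)


def predict_grade(grades, student, course, k=2):
    """Estimate a grade as the similarity-weighted mean of the grades
    that the k most similar students earned in the same course."""
    target = grades[student]
    candidates = []
    for other, other_grades in grades.items():
        if other == student or course not in other_grades:
            continue
        sim = similarity(target, other_grades)
        if sim > 0:
            candidates.append((sim, other_grades[course]))
    if not candidates:
        return None  # nobody comparable has taken the course
    top = sorted(candidates, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    return sum(sim * grade for sim, grade in top) / total


# Hypothetical grade records on a 4.0 scale.
grades = {
    "s1": {"math": 4.0, "cs": 3.0},
    "s2": {"math": 4.0, "cs": 3.0, "stats": 3.5},
    "s3": {"math": 2.0, "cs": 2.0, "stats": 2.0},
    "s4": {"math": 3.0, "stats": 3.0},
}
estimate = predict_grade(grades, "s1", "stats", k=2)
```

In this toy data, student s2 has identical grades to s1 in the shared courses and therefore dominates the weighted estimate.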

In order to capture changes in student dynamics over time, various dynamic models have been developed in EDM. Sun et al. (2012, 2014) modelled student preference change using a state space model on latent student factors, and estimated student factors over time using noncausal Kalman filters. Similarly, Chua et al. (2013) applied Linear Dynamical Systems (LDS) on Non-negative Matrix Factorization (NMF) to model student dynamics. Zhang et al. (2014) learned an explicit transition matrix over the latent factors for each student, and solved for the student and course latent factors and the transition matrices within a Bayesian framework.

4 METHODOLOGY

To answer Question 1, we argue that applying DM and ML tools to analyse the digital footprints of a carefully chosen online course would be a good pilot. We believe particular focus on the following information is necessary: 1) the time students spend reading slides and watching course videos, 2) how frequently students log into the learning system, 3) how frequently students participate in online forum discussions and the time they spend there, 4) their interactions with other students on the forum through asking and answering questions, and 5) homework scores, quiz scores and student performance on each individual question.
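As a rough illustration of how the footprint items listed above might be turned into per-student features, the following sketch aggregates a flat event log. The `Event` record, the event kinds and the feature names are hypothetical; an actual learning management system export would have its own schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical footprint record (not an actual LMS schema)."""
    student: str
    kind: str             # "login", "slides", "video", "forum_post", "forum_reply"
    minutes: float = 0.0  # time spent, where the platform reports it


def empty_profile():
    return {"logins": 0, "content_minutes": 0.0,
            "forum_posts": 0, "forum_replies": 0, "forum_minutes": 0.0}


def build_profiles(events):
    """Aggregate raw events into one feature dictionary per student."""
    profiles = defaultdict(empty_profile)
    for e in events:
        p = profiles[e.student]
        if e.kind == "login":
            p["logins"] += 1
        elif e.kind in ("slides", "video"):
            p["content_minutes"] += e.minutes
        elif e.kind == "forum_post":
            p["forum_posts"] += 1
            p["forum_minutes"] += e.minutes
        elif e.kind == "forum_reply":
            p["forum_replies"] += 1
            p["forum_minutes"] += e.minutes
    return dict(profiles)


# Made-up events for two students.
events = [
    Event("ann", "login"),
    Event("ann", "slides", 25.0),
    Event("ann", "video", 40.0),
    Event("ann", "forum_reply", 5.0),
    Event("bob", "login"),
    Event("bob", "login"),
    Event("bob", "forum_post", 10.0),
]
profiles = build_profiles(events)
```

Scores on homework, quizzes and individual questions would be joined to these profiles from the gradebook; they are left out here to keep the sketch short.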

To answer Question 2, we would identify students with declining performance and deliver customized learning materials and interventions to them, for example, through homework assignments and email communications. Based on our analysis outcomes, students would be characterized by different traits, for example, insufficient reading or weak concept understanding. We would design and maintain a pool of homework questions, and assign to each student the questions that help address the corresponding weakness. Matching algorithms could be applied to guarantee coverage and fairness in the personalized assignments for all students. Email communications would also take place so that we could get feedback on the learning experience, suggestions, demands, etc., from the identified students. Such feedback would be further integrated into our DM and ML tools to continuously improve the analysis accuracy and sensitivity.
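The paper does not specify which matching algorithm would be used. As a stand-in, the sketch below uses a simple round-robin rotation over a question pool keyed by weakness trait, so that every student with a trait receives the same number of questions and the questions in each pool are used about equally often. The trait labels, question identifiers and function name are hypothetical.

```python
from itertools import cycle


def assign_questions(student_traits, question_pool, per_student=2):
    """Assign questions addressing each student's weakness trait.

    Rotating through each trait's pool spreads usage evenly (coverage),
    and every student with a given trait gets the same number of
    questions (a simple notion of fairness).
    """
    rotations = {trait: cycle(qs) for trait, qs in question_pool.items() if qs}
    assignments = {}
    for student in sorted(student_traits):  # fixed order for reproducibility
        trait = student_traits[student]
        rotation = rotations.get(trait)
        if rotation is None:
            assignments[student] = []  # no questions exist for this trait
            continue
        count = min(per_student, len(question_pool[trait]))
        assignments[student] = [next(rotation) for _ in range(count)]
    return assignments


# Hypothetical pool and traits.
question_pool = {
    "insufficient_reading": ["r1", "r2", "r3"],
    "weak_concepts": ["c1", "c2"],
}
student_traits = {
    "ann": "insufficient_reading",
    "bob": "insufficient_reading",
    "cy": "weak_concepts",
}
assignments = assign_questions(student_traits, question_pool, per_student=2)
```

A production system would more likely formulate this as a constrained matching or optimization problem; the rotation is only meant to make the coverage and fairness goals concrete.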

The proposed approach could be divided into three stages: data collection, model building, and model testing. At the early stage, digital footprints would be collected and properly formatted. Once sufficient data are in place, we could perform some initial data analysis to gain a general understanding of student behaviours and their association with learning outcomes, and then apply DM and ML models to the data. Due to the high heterogeneity of the student body, we expect that we might need to adjust our previous models (e.g., via parameter configurations or additional components) to better adapt them to the current student body. We would then apply the models to the students, and collect feedback both from the students and on the model predictions. Based on such feedback, we would continuously improve and adjust the models for better predictions.
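One simple form the initial data analysis could take is a correlation between a single footprint feature and a learning outcome. The sketch below computes the Pearson correlation coefficient in plain Python; the choice of login time as the feature and the sample numbers are assumptions for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: total login minutes and final grade for four students.
login_minutes = [120, 300, 45, 210]
final_grades = [70, 92, 55, 81]
r = pearson(login_minutes, final_grades)
```

A value of r close to 1 would suggest that login time rises with grades in this sample; it would not by itself establish that one causes the other.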

The course chosen for this study is part of the institution’s general education core, and is thus a course of significant importance. It routinely enrols over 600 students per academic year, approximately 500 of them online, with a diverse student body majoring in science, technology, business, education, liberal arts, philanthropy, communication, etc. Around 50% of the students are first-year and second-year students, who generally face significant change, growth and challenge after leaving high school. They have unique needs and require additional support to nurture their cognitive learning and emotional development. More instructor intervention and individual attention would greatly build their confidence in continuing with their college education. In particular, we would choose an online format because online courses can have large enrolments, and this approach requires a sufficient amount of data. Online courses also need special attention: they require more time-management skills and more self-discipline, and the lack of face-to-face time makes it hard for instructors to interact with students, identify problems and provide timely feedback. Thus an online session would be a good candidate for measuring the effectiveness of the proposed approach.

In the chosen session, all study materials (syllabus, slides, videos, and supplementary documents), instructions and assignments would be published on the learning management tool, namely Canvas, before the semester starts. Deadlines for assignments would be announced at the beginning of the semester so that students could set their own pace. Assignments would include projects, quizzes, reading materials and exams. Weekly emails would be sent to students providing a summary of the previous week(s), detailed guidance for the following week, useful tips or due date reminders.

For this study, we would be mostly interested in discerning learning patterns from students’ online footprints; thus the data extracted from Canvas would focus on time logged onto Canvas, time spent on each quiz and quiz scores, performance on each quiz problem, number and time of assignments submitted, assignment scores, time spent on exams and on each exam problem, performance on each exam problem and overall exam scores, and course grades. Figure 1 is a snapshot of this data set, showing the number of page views, total login time, grades for each assignment, etc., for each student. Submission status for each assignment would be captured as shown in Figure 2, including due time, submission time, whether it is “On Time” or “Late”, the grade for each submission, and so on. Item-by-item analysis of each quiz and exam question, as illustrated in Figure 3, is also available on Canvas. This would provide significant insight into areas of weakness in concept understanding.

Figure 1: Number of times each student accessed course web pages, total login time in minutes, and assignment grades.

Figure 2: Submission status for each assignment.

Figure 3: Quiz item-by-item result analysis.
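To make the item-by-item analysis concrete, the sketch below computes the proportion of correct answers for each question and lists the questions that fall below a cut-off, which may point to weakly understood concepts. The response format, the 0.5 threshold and the function names are assumptions; they do not describe how Canvas reports its item analysis.

```python
def item_correct_rates(responses):
    """Proportion of students answering each question correctly.

    `responses` maps student -> {question: True/False}; questions a
    student did not attempt are simply absent from that student's dict.
    """
    attempts, correct = {}, {}
    for answers in responses.values():
        for question, is_correct in answers.items():
            attempts[question] = attempts.get(question, 0) + 1
            correct[question] = correct.get(question, 0) + (1 if is_correct else 0)
    return {q: correct[q] / attempts[q] for q in attempts}


def weak_items(responses, threshold=0.5):
    """Questions whose correct rate is below the (assumed) threshold."""
    rates = item_correct_rates(responses)
    return sorted(q for q, rate in rates.items() if rate < threshold)


# Made-up quiz responses.
responses = {
    "ann": {"q1": True, "q2": False},
    "bob": {"q1": True, "q2": False},
    "cy": {"q1": False, "q2": True},
}
rates = item_correct_rates(responses)
flagged = weak_items(responses)
```

Here q2 is answered correctly by only one of three students, so it is the item flagged for follow-up.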

5 EXPECTED OUTCOMES

The expected outcomes are a better learning experience and better final grades overall. The students would be more involved in the course through personalized interventions and communications, and would better master the course materials through customized homework assignments. Eventually, the students are expected to achieve better homework grades and final grades.

Course evaluation data would be monitored both before and after implementation of this approach. Beginning- and end-of-semester surveys would be given to students to understand their expectations and experience regarding the DM and ML analysis, and to get feedback on whether they feel more effective and involved in the learning experience, whether they prefer the personalized intervention and communication, and what comments they have. We would also compare the performance of students who received the DM- and ML-based interventions with that of students from previous years who did not. We should make sure the comparison is fair (e.g., only students of the same major or similar background would be compared) so as to reach unbiased conclusions.

We would conduct periodic surveys of students to get their feedback, and adjust our analysis strategies and models accordingly. We would also communicate with students via emails or forum posts to get their individual comments and suggestions, and tailor our model in response to specific comments or requirements.

Direct evidence could be the students’ final grades. We expect that, with DM and ML analysis in place, the students would have better grades by the end of the semester. Further evidence could be their improving performance over the course of the learning experience. With personalized interventions, we expect students to become more and more involved and their performance to improve continuously, which can be measured by their grades on homework assignments. Other indirect evidence could include active participation in the online forum, which can be measured by the time they spend and the number of posts they make, and their communications with instructors, which can be measured by the frequency of email exchanges and question-and-answer interactions.
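One way to quantify whether homework performance is improving or declining over the semester is the slope of a least-squares line fitted to each student’s scores in submission order. The sketch below is an illustration under that assumption; the threshold of -2 points per assignment and the sample scores are made up.

```python
def score_trend(scores):
    """Least-squares slope of scores against assignment index (0, 1, 2, ...).

    A positive slope suggests improvement, a negative slope decline.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(scores))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def flag_declining(homework_scores, threshold=-2.0):
    """Students whose trend is at or below the (assumed) threshold slope."""
    flagged = []
    for student, scores in homework_scores.items():
        if len(scores) >= 2 and score_trend(scores) <= threshold:
            flagged.append(student)
    return sorted(flagged)


# Made-up homework scores in submission order.
homework_scores = {
    "ann": [90, 80, 70],
    "bob": [60, 70, 80],
    "cy": [75],  # too few scores to estimate a trend
}
declining = flag_declining(homework_scores)
```

The same slope could serve both purposes named in the paper: as evidence of improvement after an intervention, and as an early signal of decline before one.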

REFERENCES

How Prevalent is Online Learning at the Collegiate Level. (2017, November 19). Retrieved from http://www.online-psychology-degrees.org/faq/how-prevalent-is-online-learning-at-the-college-level/

Combs, C & Mesko, B. (2015). Disruptive Technologies

Affecting Education and Their Implications for

Curricular Redesign. The Transformation of Academic

Health Centers: Meeting the Challenges of

Healthcare's Changing Landscape. 57-68.

10.1016/B978-0-12-800762-4.00007-4.

Robyn P. (2013). Redesigning Courses for Online Delivery,

Cutting Edge Technologies in Higher Education,

Volume 8

Bunk, C. et al. (2015). MOOCs and Open Education

around the World, Routledge

Devlin, K. (2013). MOOC Mania Meets the Sober Reality

of Education, Huffington Post

U.S. Department of Education, Office of Educational

Technology (2012). Enhancing Teaching and Learning

Through Educational Data Mining and Learning

Analytics: An Issue Brief, Washington, D.C.


Baker, R.S.J.d. et al. (2010). Data mining for education.

International encyclopedia of education, 7:112–118

Pena-Ayala, A. (2014). Educational data mining: A survey

and a data mining-based analysis of recent works.

Expert systems with applications, 41(4):1432–1462

Ning, X., Desrosiers, C. & Karypis, G. (2015). A

comprehensive survey of neighborhood-based

recommendation methods. In Francesco Ricci, Lior

Rokach, and Bracha Shapira, editors, Recommender

Systems Handbook, pages 37–76. Springer

Koren, Y., Bell, R. & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8):30–37

Sweeney, M., Rangwala, H., Lester, J., & Johri, A. (2016). Next-term student performance prediction: A recommender systems approach. preprint arXiv:1604.01840

Sweeney, M., Lester, J., & Rangwala, H. (2015). Next-term student grade prediction. In Big Data (Big Data), 2015 IEEE International Conference on, pages 970–975. IEEE

Polyzou, A. & Karypis, G. (2016). Grade prediction with

models specific to students and courses. International

Journal of Data Science and Analytics, pages 1–13

Ray, S. & Sharma, A. (2011). A collaborative filtering

based approach for Recommending elective courses. In

International Conference on Information Intelligence,

Systems, Technology and Management, pages 330–339.

Springer

Bydzovska, H. (2015). Are collaborative filtering methods

suitable for student performance prediction? In

Portuguese Conference on Artificial Intelligence, pages

425–430. Springer

Denley, T. (2013). Course recommendation system and

method. US Patent App. 13/441,063.

Sun, J., Parthasarathy, D. & Varshney, K. (2014).

Collaborative kalman filtering for dynamic matrix

factorization. IEEE Transactions on Signal Processing,

62(14):3499–3509

Sun, J., Varshney, K., & Subbian, K. (2012). Dynamic

matrix factorization: A state space approach. In 2012

IEEE International Conference on Acoustics, Speech

and Signal Processing (ICASSP), pages 1897–1900.

IEEE

Chua, FCT., Oentaryo, R. & Lim, E. (2013). Modeling

temporal adoptions using dynamic matrix factorization.

In 2013 IEEE 13th International Conference on Data

Mining, pages 91–100. IEEE

Zhang, C., Wang, K., Yu, H., Sun, J., & Lim, E. (2014).

Latent factor transition for dynamic collaborative

filtering. In SDM, pages 452–460. SIAM
