MakeTests: Generate and Correct Individualized Questions with Many Styles
Fernando Teubl, Valério Ramos Batista and Francisco de Assis Zampirolli
Centro de Matemática, Computação e Cognição, Universidade Federal do ABC (UFABC), 09210-580, Santo André, SP, Brazil
ORCIDs: https://orcid.org/0000-0002-2668-5568 (F. Teubl), https://orcid.org/0000-0002-8761-2450 (V. R. Batista), https://orcid.org/0000-0002-7707-1793 (F. A. Zampirolli)
Grant: #2018/23561-1, São Paulo Research Foundation (FAPESP).
Keywords:
Automatic Item Generation, Multiple-choice Questions, Parametrized Quizzes.
Abstract:
Hardcopy exams are one of the most traditional evaluation methods. However, the larger the class, the greater the effort to correct them. Automatic generation and evaluation tools can help, but most of them are restricted to multiple-choice tests. Our paper presents MakeTests, a free, open-source system that automates both generation and correction of exams. MakeTests' main contributions are: (1) highly parametrized questions drawn from a database; (2) inclusion of many question styles besides multiple-choice (true/false, matching, numerical and written response, of which only the last requires manual correction); (3) real-time correction with a webcam upon handing in. This paper shows how to elaborate exams with MakeTests, especially regarding the several question styles with parametrization, and also how to correct them automatically. These procedures are illustrated here by two experiments: the first one focused on feedback immediately upon handing in a test given to 78 students, and the second one managed by a professor without any programming knowledge. Our results indicate that MakeTests allows for quick elaboration of parametric questions and fast correction, even for users who lack technical knowledge.
1 INTRODUCTION
An important and increasingly common activity is the task of evaluating students with computational resources, or Information and Communication Technologies (ICT).
ICT can extend to activities in virtual laboratories. In (Burkett and Smith, 2016) the authors present a systematic review that explores the differences between virtual and hands-on laboratories, both in secondary and in undergraduate education. They conclude that virtual laboratories can complement activities carried out in hands-on laboratories. Another systematic review is given in (Hakami et al., 2016), which highlights some works devoted to the students' anxiety when they are evaluated through a computer, namely Computer-Based Assessment (CBA) or e-assessment. Here the whole evaluation is performed on the computer, including the student's mark. This
suggests carrying out new studies in order to understand the problems related to the acceptance and to
the implementation of CBA.
In (Adkins and Linville, 2017) the authors study whether more exams change the students' final grades in an introductory programming course in Computer Science. The Spring class had a total of five exams, whereas the Fall class had only three. Statistically speaking, their study finds no difference, but the students prefer more exams in order to reduce anxiety and increase both their self-confidence and their motivation to learn. Moreover, the authors cite several works showing that, in general, classes with more evaluations attain a significantly better performance than those with fewer.
According to (Engelhardt et al., 2017),
“[a]ssessments are increasingly carried out by
means of computers enabling the automatic evalua-
tion of responses, and more efficient (i.e., adaptive)
testing”. This work presents two experimental
approaches to create variations of items/questions, or
heterogeneous items. They conducted an empirical
study with 983 individuals who were evaluated by
ICT, more specifically Heterogeneous Computer-
based Assessment Items (HCAI). The participants
were 14 to 16 years old and belonged to 34 schools
in Germany. This work also discusses the complexity of creating heterogeneous items. For instance, an item can become difficult to understand by changing a single letter in a word or the word order in that item. For the evaluations they selected 40 of the 70 items: 10 of hard and 30 of easy difficulty.
A study presented in (Nguyen et al., 2017) was
conducted by comparing CBA with traditional meth-
ods on 74 undergraduate modules and their 72,377
students. The modules belonged to a variety of dis-
ciplines (25% in Science & Technology, 22% in Arts
& Social Sciences, 14% in Business & Law, 9% in
Education & Languages, and 30% in others). The
authors found that the time devoted to evaluation ac-
tivities had a significant relation to the passing rates.
Their work also concluded that the balance between
weekly evaluations and other activities by CBA has a
positive influence on the passing rates.
Therefore, if we have ICT that make teachers' tasks easier when they conduct more evaluations (Adkins and Linville, 2017; Nguyen et al., 2017; Nguyen et al., 2018) with heterogeneous items (Engelhardt et al., 2017) and paper-and-pencil (as verified in (Hakami et al., 2016)), while generating and correcting the tests automatically, then the students will probably achieve a better performance.
In this paper we present the system MakeTests, whose open source code is available on GitHub. Its code can be easily adapted to new question types, which is useful for the generation and correction of printed exams. Section 2 discusses some related works and compares them with MakeTests. In Section 3 we explain how to use MakeTests, later exemplified by the two experiments described in Section 4. Finally, conclusions and future work are drawn in Section 5.
2 RELATED WORKS
The Introduction presented studies that reveal the im-
portance of making frequent assessments. Now we
comment on some works more related to what is pro-
posed in this article, namely ICT that facilitate the
process of creating and correcting heterogeneous (or
parametric) questions of various styles.
In (Smirnov and Bogun, 2011) the authors present an ICT resource of visual modelling to teach science and mathematics that includes solving scientific problems. Implemented with the PHP programming language and MySQL databases, the ICT relies on a uniform relational database of teachers, students, educational projects and educational studies. The authors tested the ICT with over 1,000 students of secondary schools in Russia. As an example, groups of 5-6 students had to solve problems with Newton's Second Law. They filled out tables of values, visualized graphics and tried to solve the problem analytically. However, the authors did not work on actual graphical interfaces but focused on the methods to create activities. Therefore we cannot draw conclusions about the usability of their ICT regarding students' performance in tests.
The formal specification language ADLES was in-
troduced in (de Leon et al., 2018; Allen et al., 2019).
It is open-source and devoted to formal specification
of hands-on exercises about virtual computing, net-
working, and cybersecurity. With ADLES educators
can design, specify, and semi-automatically deploy a
virtual machine (VM) for classes, tutorials or compe-
titions. Students access the VM in order to accom-
plish tasks.
In (Zampirolli et al., 2019) the authors present the MCTest platform (vision.ufabc.edu.br), developed in Django and MySQL, whose open source code is available on GitHub. Hence one can install this platform in several institutions, and the system administrator (SA) will register the departments, courses,
disciplines and professors. To each course the coor-
dinators attribute Topics, Classes, Questions, Exams,
Professors and Students. Any professor can also cre-
ate Classes, Questions and Exams. All these entities are created in web browser windows. Classes and Questions can also be imported from CSV files: in the case of a Class the students' Id, name and email are specified, whereas Questions follow a different CSV format. The purpose of that paper is to describe the process of creating parametric questions of either dissertation (written response) or multiple-choice type, which rely on some Python code in their scope. In
this way, MCTest produces individualized exams, one
for each student, but they are all contained in a sin-
gle PDF file for each class. Moreover, a professor
that lectures a course can generate a unified exam
for all of their classes. The correction is automatic for multiple-choice questions provided the professor digitizes the answer cards into another PDF to be uploaded to the system. For questions that include program code the student can submit the answers to Moodle (moodle.org) via VPL (available at vpl.dis.ulpgc.es) for automatic correction. As we are
going to see in the next sections, MakeTests includes
more different question styles than MCTest.
With the intention of reducing plagiarism, (Manoharan, 2019) reports on how significant it is to create personalized multiple-choice questions. That work describes the insufficiency of just shuffling
questions and the respective order of their alternatives. One must also elaborate efficient distractors, namely the wrong alternatives; otherwise students can simply discard them and guess the right answer.
According to (Manoharan, 2019) there are normally three approaches to personalized evaluations: 1) parametrization, in which some parameters take random values; 2) databank, from which questions are selected at random; 3) macro, which is a program fragment (inside a question) that is replaced by a new phrase whenever executed. Their multiple-choice tests are answered on an optical answer sheet, also called a bubble/Scantron sheet, commercialized by Scantron Corporation®. On this sheet the student must write both the script Id and their own, because together they indicate which exam variation was answered by that student.
The differences between MakeTests and these related works are that with MakeTests the professor develops questions by means of various Python scientific libraries, such as matplotlib for graphics, sympy for algebraic analysis, and many others. Activities
are individualized and generated in hard copy, and
they make use of various styles of questions. Cor-
rections can be carried out automatically through a
mobile device. The professor can also resort to an-
swer keys generated by MakeTests in order to facili-
tate corrections, as we are going to discuss in the next
sections.
MakeTests is ideal for examiners who prefer written response tests because, in the case of a medium difficulty test, the automatic correction will already show the student's right and wrong answers. Hence the examiner can concentrate their attention only on the parts of the student's solutions that led to wrong answers. Roughly speaking, the examiner's manual correction will be just half of the whole work. Of course, in the case of large classes MCTest can even spare the whole manual correction, which is useful when the examiner just wants to give a preparatory exam.
MakeTests' greatest advantage is that it enables configuration and elaboration of exams in a highly flexible way. MakeTests best serves users who are familiar with the Python language, shell commands and the JSON format. Otherwise one can resort to templates that are easy to adapt, since question elaboration is what mostly demands teachers' and professors' effort.
Each question prepared with MakeTests is represented by an abstract class (in OOP) that defines a model, which the user must implement according to their aims. The question consists of a text, which may include parametrization, and the answer format, which defines its type and also its correction procedure. Thus MakeTests enables an unlimited diversity of question and answer types, which rely only on their implementation. The next section explains the details of our methodology.
3 METHOD
Here we describe all the necessary steps to elaborate
and generate exams through MakeTests. The first one
consists of creating a Database of Questions (DQ)
and classifying them under groups based on subject,
difficulty or any other characteristic, as detailed in
Subsection 3.1. An example of a typical question
in MakeTests will be given in Subsection 3.2. In the
second step exams are generated according to a con-
figuration set by the user, as explained in Subsec-
tion 3.3. This configuration encompasses the layout
of the exam (header, fonts, logotype, etc.) and also
the groups of questions that will be drawn with their
respective weights. Here one must give the DQ path,
as well as the path to the class roll. After generating
the tests the user can print them for the examiner. Fi-
nally, Subsection 3.4 presents the third step, namely the automatic correction, for which MakeTests offers three types:
Manual. The user prints an answer key from
MakeTests in order to correct the exams manually;
Scanned. The user scans all the exams into a single PDF that MakeTests will process, rendering a spreadsheet with marks and feedback;
Real-time. The user corrects the exams instantly with
a (web)cam by pointing it to the answer sheet. This
also generates the aforementioned spreadsheet.
The spreadsheet comes in CSV format, and the feedback lists each question number with the respective student's answer and score. In addition, the library SendMail.py was developed to dispatch this information to each student's private email address, so that every student promptly gets an individual summary of their performance.
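To picture this feedback loop, the sketch below writes one score row per student and emails a short summary; it is only an illustration, since the actual column layout produced by MakeTests and the interface of SendMail.py are not reproduced here, and the SMTP settings and field names are assumptions.

    # Illustrative sketch only: not the real CSV layout or SendMail.py interface.
    import csv
    import smtplib
    from email.message import EmailMessage

    def write_feedback_csv(path, rows):
        """Each row: dict with the student's Id, name, email and per-question scores."""
        fieldnames = ["id", "name", "email", "Q1", "Q2", "Q3", "total"]
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)

    def email_summary(row, smtp_host="localhost", sender="professor@example.edu"):
        """Send a short plain-text summary to one student (assumed SMTP settings)."""
        msg = EmailMessage()
        msg["Subject"] = "Exam feedback"
        msg["From"] = sender
        msg["To"] = row["email"]
        msg.set_content("\n".join(f"{k}: {row[k]}" for k in ("Q1", "Q2", "Q3", "total")))
        with smtplib.SMTP(smtp_host) as server:
            server.send_message(msg)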
All these resources are organized in a structure of
files and folders generated by MakeTests.py, a li-
brary that is available on github.com/fernandoteubl/
MakeTests. Figure 1 illustrates the directory
structure, and besides MakeTests.py (only 2,609
lines including examples and comments) the reader
will also find another three files on GitHub:
SendMail.py (only 270 lines), README.md and
convertPdfText2PdfImage.sh. All MakeTests
code was developed in Python 3.8.
The next subsections will explain each of the
aforementioned steps.
Figure 1: File and folder structure of MakeTests. The
master file is MakeTests.py, which can generate template
questions indicated by the dotted rectangle. It can also
produce a student list for an exam (Students.csv), all
the exam issues in a single file (Tests.pdf), answer key
(AnswerKeys.pdf), and so on. These resources and their
employment are all detailed in README.md.
3.1 Elaborating Questions
The DQ is classified according to groups represented
by directories identified with a dotted rectangle in
Figure 1. For a large DQ one can also add subdirec-
tories that indicate subgroups, which make it easier
to organize the whole set of questions. For instance,
questions can be classified by theme in directories, to-
gether with their respective level of difficulty in subdi-
rectories. A question of an exam issue corresponds to a unique Python file, and vice versa. For example, suppose the user runs MakeTests for an exam to have three questions comprising all the difficulty levels in Figure 1. Then for each exam issue MakeTests draws one file from each folder, for instance one of the files in the folder Medium. An error is reported if a folder lacks the minimum number of questions for the required specification.
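This drawing step can be pictured with the simplified sketch below: one question file is sampled per requested group, and an under-populated folder raises an error. The function name and the group dictionary are assumptions for illustration; this is not the code of MakeTests.py itself.

    # Simplified sketch of drawing one question file per difficulty group.
    import os
    import random

    def draw_questions(dq_path, groups, rng=random):
        """groups maps folder name to how many files to draw, e.g. {"Easy": 1, "Medium": 1, "Hard": 1}."""
        drawn = []
        for group, amount in groups.items():
            folder = os.path.join(dq_path, group)
            files = sorted(f for f in os.listdir(folder) if f.endswith(".py"))
            if len(files) < amount:
                raise RuntimeError(f"{folder}: {len(files)} question file(s) found, {amount} required.")
            drawn.extend(os.path.join(folder, f) for f in rng.sample(files, amount))
        return drawn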
Each question is represented by a Python class implemented in such a way that it contains all the necessary information to generate the question object, from which MakeTests will render its final layout. Each question object acquires specific data of a student contained in the class roll Students.csv (depicted in Figure 1), like name, Id and email. Since a question is a Python class, the user can parametrize any of its parts. MakeTests furnishes pre-implemented codes that simplify the elaboration of new questions whenever the user creates new corresponding PY-files. At present MakeTests comprises the following classes of question styles:
Essay. Written response to be corrected manually,
and the score recorded in a spreadsheet of students’
marks;
MultipleChoices. An Array Class, in which each element is a typical multiple-choice question. The user defines the number of questions to be drawn;
TrueOrFalse. An Array Class in which each element is a group of questions, and the answer key of any group is a Boolean sequence. The user defines groups with their respective questions, and afterwards the number of groups to be drawn;
QuestionAnswer (QA). A Class that generates matching questions. The user defines a list of questions and another of answers;
Number. The student chooses digits in a matrix to form a number. The user defines the expected number of digits and the tolerance (zero for an exact answer, weighted for an approximate one), as sketched below.
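As an illustration of the last style, a numeric answer can be scored either exactly or with a weighted tolerance. The linear decay rule below is an assumption made for the example; it is not the precise formula implemented in the Number class.

    # Assumed scoring rule for a Number-style answer, for illustration only.
    def score_number(expected, given, tolerance=0.0):
        """Full credit for an exact answer; with tolerance > 0, credit decays linearly."""
        error = abs(expected - given)
        if tolerance == 0.0:
            return 1.0 if error == 0 else 0.0
        return max(0.0, 1.0 - error / tolerance)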
3.2 Example
Here we detail an example of a parametrized multiple-choice question, named MYchoices.py in Figure 1. In order to see its layout the user can run the following shell command:
./MakeTests.py -e choices >
Questions/Easy/Mychoices.py
In MYchoices.py we implemented a Class named
myQuestionMultipleChoice, whose code is re-
produced in Figure 2. There one sees the method
makeSetup, which creates two floating-point vari-
ables x and y on lines 10-11. These take random
values between 1 and 49 with two decimals. On
line 6 we see random.choice(['+']) taking only
the addition operation. In order to enable all basic
arithmetic operations we can replace this line with
random.choice(['+','-','*','/','**']).
Moreover, lines 12-16 must be encapsulated by the
conditional if op == '+': and adapted for the
other operations. In fact MakeTests will shuffle the
entries of vetAnswers for each exam issue, no matter
what operation was drawn to define this vector. On
lines 12-15 notice that round is superfluous for '+'
and '-', but necessary for the others. Hence we kept it, because MYchoices.py can serve as a template that requires little customization from future users. Indeed, this kind of template reuse is one of the practices that enables good use of MakeTests.
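To make the description above concrete, here is a minimal sketch in the spirit of Figure 2. It keeps the names mentioned in the text (myQuestionMultipleChoice, makeSetup, x, y, vetAnswers), but the class skeleton, the distractor recipe and the attribute names for the statement are assumptions, and in MakeTests the class would inherit from the multiple-choice base class provided by the tool; it is not a verbatim reproduction of MYchoices.py.

    # Standalone sketch, not the actual MYchoices.py.
    import operator
    import random

    OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul,
           '/': operator.truediv, '**': operator.pow}

    class myQuestionMultipleChoice:
        def makeSetup(self):
            # Line 6 of Figure 2 uses random.choice(['+']); enabling every basic
            # operation amounts to drawing from the full list, e.g. random.choice(list(OPS)).
            op = random.choice(['+'])
            # Two floating-point operands between 1 and 49 with two decimals (lines 10-11).
            x = round(random.uniform(1, 49), 2)
            y = round(random.uniform(1, 49), 2)
            # round is superfluous for '+' and '-' but needed for the other operations.
            correct = round(OPS[op](x, y), 2)
            # Correct answer plus distractors; MakeTests shuffles vetAnswers for
            # each exam issue. The distractor recipe below is an assumption.
            self.vetAnswers = [correct, round(correct + 1.5, 2),
                               round(correct - 1.5, 2), round(correct * 2, 2)]
            self.statement = f"Compute {x} {op} {y}."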
With MakeTests one can utilize, modify or even
create question styles, which in their turn will allow
for many question types. As an example, the afore-
mentioned class QA for the matching style enables us
to work with the types one-to-one, one-to-many and
Figure 2: Code of a question implemented in the Class
myQuestionMultipleChoice.
Figure 3: Cutout of an exam that includes the question in
Figure 2. Here we see the header and the answer card for
the automatic correction. A single answer card can be con-
figured to comprise many such questions at once.
many-to-many. Hence, for each style the user can ag-
gregate new types, either by resorting to an already
implemented Class or by creating a new one from
scratch.
Parametrization gives the question in Figure 2 many possible renderings, one of which is depicted in Figure 3 and will be discussed in the next subsection. For instance, in order to create an exam by drawing eight random questions from '+', '-', '*', '/', '**', we can simply adapt the code in Figure 2 with the function random.choices, as mentioned before. Moreover, the new questions must be added to a list, so that Figure 3 will have eight columns, one for each question, and the ninth column will remain with the letters A, B, C and D, as depicted there.
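The adaptation mentioned above can be pictured by the fragment below; it is a hypothetical one-liner for illustration, not an excerpt from MYchoices.py.

    import random

    # Draw the operation of each of the eight questions in one exam issue, with
    # repetition allowed (which is what random.choices provides, unlike random.sample).
    ops = random.choices(['+', '-', '*', '/', '**'], k=8)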
3.3 Configuring the Exam
All the information regarding the generation of an
exam to a specific class must be configured through
a JSON file. MakeTests outputs a standard JSON file
that can be changed by the user. For instance, in order
to get the file config.json in Figure 1 we run the
shell command:
./MakeTests.py -e config > config.json
The JSON format was adopted in MakeTests because of both its simple representation and its easy portability to web systems. As a matter of fact, MakeTests handles an extended JSON syntax that includes comments and line breaks. In order to generate the exam PDF from the JSON, this file is endowed with fundamental fields handled by MakeTests, namely:
IncludeJSON. Allows for merging another JSON file
to modify or complement the original one;
Questions. Contains both the DQ path and the con-
figuration to select questions, as depicted in Figure 4,
which shows the parameters db path and select, re-
spectively. Each question of an exam issue is drawn
from a group, and this field also defines the weight of
the corresponding question, together with a prefix as
exemplified in Figure 4;
Input. Contains the class roll in CSV. For instance,
in order to produce the header in Figure 3 we used the
default file Students.csv generated by MakeTests;
Output. Sets path and file names of both the PDF
that contains all exam issues and the PDF with the
corresponding answer keys. MakeTests suggests cre-
ating the path in case it does not exist (for questions
and input the user just gets an error message in this
case). In Figure 1 they appear as Tests.pdf and
AnswerKeys.pdf, respectively;
Correction. Sets all the criteria for the automatic
correction. In this field we write Python code to compute the total score, choose a name for the CSV
file with the students’ scores, and also the directory to
store this file and the image of the corrected exams.
Tex. Contains all the information about the exam formatting, like the LaTeX preamble, headers and structuring of questions. Notice that MakeTests neither contains nor produces anything in LaTeX, but JSON instead (e.g. config.json in Figure 1). Therefore, we can alter the JSON file to generate the exams in HTML.
We have chosen to configure exams in JSON be-
cause this gives the user total flexibility to design
them. The users can create their own models with
customized images and headers, and so relegate the
JSON file to specific details like date of the exam,
class roll and DQ.
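The fields above can be pictured with the schematic configuration below, written in the extended JSON-with-comments syntax that MakeTests accepts. Only the field names IncludeJSON, Questions, Input, Output, Correction and Tex, and the parameters db_path and select, come from the text and Figure 4; every other key, value and even the exact spelling of the keys is an assumption for illustration, so the real config.json generated by MakeTests may differ.

    {
        // MakeTests accepts this extended JSON syntax with comments.
        "IncludeJSON": "base_config.json",
        "Questions": { "db_path": "Questions",
                       "select": [ {"group": "Easy",   "weight": 2, "prefix": "Q1"},
                                   {"group": "Medium", "weight": 3, "prefix": "Q2"},
                                   {"group": "Medium", "weight": 3, "prefix": "Q3"},
                                   {"group": "Hard",   "weight": 4, "prefix": "Q4"} ] },
        "Input":      { "csv": "Students.csv" },
        "Output":     { "tests": "Tests.pdf", "answer_keys": "AnswerKeys.pdf" },
        "Correction": { "score": "<Python code computing the total score>",
                        "csv": "Scores.csv", "directory": "Corrections" },
        "Tex":        { "preamble": "...", "header": "...", "questions": "..." }
    }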
Figure 4: Cutout of a typical config.json to create an
exam with four questions (one easy, two medium and one
difficult with their respective weights). Each path must be
as depicted in Figure 1.
3.4 Correcting the Exam
There are three means to correct exams with
MakeTests, as explained below.
3.4.1 Manual Correction
The user can print the aforementioned file
AnswerKeys.pdf as a guide to speed up the
manual correction, and also do it in parallel with
Teaching Assistants (TAs) without a computer. They
can take notes on the hardcopy whenever they find
mistakes in the answer keys, which might happen in
case a question statement gives rise to an interpreta-
tion other than that made by the professor. Figure 5
shows part of such a file that begins with the student
“Qxirnrc” from Figure 3.
Figure 5: Cutout of AnswerKeys.pdf generated by
MakeTests.
The two other means of correction resort to Computer
Vision, and they generate automatic reports with the
students’ scores and also feedback. As a matter of
fact, they complement the manual correction in the
case of written response tests. For instance, the auto-
matic correction shows the student’s right and wrong
answers, and therefore the manual correctors may restrict their work to the parts of the student's solutions that led to wrong answers. Roughly speaking,
their whole work will be halved for a medium diffi-
culty test.
Anyway, we shall see that the automatic correc-
tions allow for emailing feedback to each student
even in real time, namely at the moment they hand the
test in. Afterwards the professor can digitize the cor-
rected exams and email them to the students, so that
each one will receive only their corresponding cor-
rected test.
In the case of a mere preparatory exam, given just to evaluate the students' performance, the professor can opt for a traditional multiple-choice test and profit from the MakeTests DQ. In this case each student can get their scores immediately upon handing in the exam.
3.4.2 Digitizing the Solved Exams
This consists of piling up the solved exams and scan-
ning them all at once through a document feeder. The
user gets a single PDF file, or even separate ones at
will, to be processed by MakeTests. For the automatic
correction MakeTests needs to access the DQ, the
class roll and the exam configuration. Each student’s
name, Id and scores are saved as an individual row of
a CSV-file that the professor can open with a spread-
sheet program. For each student MakeTests also cre-
ates a folder containing their scanned exam with the
corresponding answer key, and also the image of each
solved question. With MakeTests the professor can
get these folders separately compressed and sent to
the respective students. Hence, in the case of a distance learning course the professor does not have to schedule a meeting with the whole class to discuss the answer key. Therefore, any student who objects to their mark can argue their case through email using the separate image(s).
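For readers who want to picture this pipeline, the fragment below converts each page of the uploaded PDF into an image ready for the answer-card detection step. It uses the pdf2image library purely as an example; it is not the conversion routine actually employed by MakeTests.py or convertPdfText2PdfImage.sh.

    # Assumed helper, not MakeTests' own code: split a scanned PDF into page images.
    import os
    from pdf2image import convert_from_path  # requires the poppler utilities

    def pdf_pages_to_images(pdf_path, out_dir, dpi=200):
        """Convert each page of the scanned PDF into a PNG for later processing."""
        os.makedirs(out_dir, exist_ok=True)
        paths = []
        for i, page in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
            path = os.path.join(out_dir, f"page_{i:03d}.png")
            page.save(path, "PNG")
            paths.append(path)
        return paths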
Figure 6 shows some items of such a student’s
folder with the five question styles already available
in MakeTests. These images were generated with the
shell command:
./MakeTests.py -p scannerFile.pdf
Notice in Figure 6 that a red cross indicates a wrong choice, whereas the missed right one is shown with a red question mark on a blue background. Right answers are indicated by a checked green circle. An exception is made for Figure 6(e), in which
MakeTests promptly chooses the computed mark.
This one can be changed by the professor in order to
include the correction of written response questions.
The individual corrections are emailed to the stu-
dents by the SendMail tool available in MakeTests.
3.4.3 Real-time Correction
The real-time correction is similar to the one ex-
plained in Sub-subsection 3.4.2. MakeTests uses
Computer Vision to read the answer cards of Figure 6
and correct them. Student information is contained in
the barcodes, as depicted in Figure 3. The student’s
scores are automatically computed and then sent as
feedback to their personal email.
Figure 6: Cutouts generated by MakeTests after submitting
the PDF to the automatic correction: (a) Multiple-Choice;
(b) TrueOrFalse; (c) QA; (d) Number (they got a wrong
answer 26 instead of 25); (e) Essay.
This process is enabled by choosing either the
built-in camera of the professor’s computer, or a USB
webcam attached thereto, namely with the shell com-
mand:
./MakeTests.py -w 0
where 0 stands for the former, to be replaced with 1
for the latter.
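A rough picture of what happens behind this command is sketched below: frames are read from the chosen camera index and any barcode found in a frame identifies the student and exam issue. The OpenCV and pyzbar calls are generic examples of the technique, not MakeTests' actual implementation, and the function name is an assumption.

    # Generic sketch: read frames from camera index 0 (built-in) or 1 (USB webcam)
    # and decode the first barcode visible in a frame.
    import cv2
    from pyzbar.pyzbar import decode

    def read_student_barcode(camera_index=0):
        """Read frames from the chosen camera until a barcode is decoded."""
        cap = cv2.VideoCapture(camera_index)
        try:
            while True:
                ok, frame = cap.read()
                if not ok:
                    return None
                codes = decode(frame)
                if codes:
                    # The barcode carries the student/exam information (cf. Figure 3).
                    return codes[0].data.decode("utf-8")
        finally:
            cap.release()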
With the real-time correction each student gets a
lower-bound estimate of their scores already at the
time they hand in the exam. Of course, manual cor-
rection will later add whatever avails of the handwrit-
ten solutions, and also the scores of the dissertation
questions. Figure 7 exemplifies a webcam image of
an answer card processed by MakeTests.
Although the feedback will be generated again with the second correction method, discussed in Sub-subsection 3.4.2, some students might want the real-time correction in order to organize their time for upcoming main exams. But a conventional webcam may produce images of poor quality, and therefore the second method will fine-tune the feedback sent to the students.
Figure 7: Cutout of an image captured by webcam. Here we
see an exam whose first question consists of six parts (1.1 to
1.6), and each one is a multiple-choice item with four alter-
natives (A to D). Part 1.4 was left blank but MakeTests in-
dicates any missed right answer by a blue circle with a red
question mark.
4 RESULTS AND DISCUSSIONS
Now we present some preliminary results obtained with MakeTests, aimed at assessing its potential use in new evaluation modalities. All the practical ex-
ams involving MakeTests happened at our institution
for different courses of the programme Bachelor in
Science and Technology (BST).
4.1 First Experiment Report
We tested our third correction method in a class
with 78 students of the course Object-Oriented Pro-
gramming (OOP). The main exam was generated by
MakeTests and consisted of three parametric ques-
tions, namely:
Q1) Multiple-choice. The statement shows part of
a parametrized source code for the student to deduce
what it does, and only one alternative describes it cor-
rectly. Four source-code fragments were chosen from a DQ at random;
Q2) TrueOrFalse. Some sentences about OOP con-
cepts are listed, and the student must decide which
ones are true or false. With MakeTests we selected
twelve parametric sentences in a DQ at random;
Q3) QA. The statement includes a two-column ta-
ble, the first containing several software Design Pat-
terns (DP) and the second one the respective descrip-
tions but in a shuffled order. The student must find
which description matches the DP. Five items were
randomly selected from a DQ.
Correction was performed according to Sub-
subsection 3.4.3, and students could promptly check
their scores by email. In that experiment we used a
USB webcam Microsoft HD 720p 30fps connected
to an Apple MacBook Dual Core 512GB SSD with
8GB RAM. As depicted in Figure 7, the examiner
just had to point the camera to the answer cards
with only rough framing and ambient light (no flash). MakeTests also displays the scores on the examiner's computer, so that students could also glance at its screen for a quick check. Each exam was
corrected in less than 20s, hence the queue was al-
ways short, but also because students were previously
warned that questions could not be asked at that mo-
ment. Moreover, about 20% of the students turned
down the automatic correction.
After the exam we proceeded to the method dis-
cussed in Sub-subsection 3.4.2. As already explained,
this second correction enables MakeTests to process answer cards without either distortion or blurring. Moreover, the PDF contains the image of each
student’s complete exam, which can be emailed to
them for accurate checks. Indeed, that was done in
the experiment, as described at the beginning of Sub-
subsection 3.4.2. See Figure 6 for an example of what
the student receives by email. Of course, this feed-
back is optional, and many professors prefer to discuss exam corrections privately with students during office hours.
From this experiment we highlight two important facts:
The professor detected no cheating in the test. Since questions are parametrized and randomly drawn for each student, cheating was significantly hampered;
Far fewer students objected to their marks compared with the standard written response exams previously given by that same professor. Though he has not analysed this reduction statistically, he estimated that the usual number of recorrection requests dropped by more than 50%. This is possibly because that exam did not include any written response question, besides the fact that the students had already received the correction by email.
A difficulty arises in the case of large classes if
the professor wants nominal exams, as shown in Fig-
ure 3 for the student “Qxirnrc”. In the experiment it
took almost 10min for the class to begin the test. The
professor could have reduced this time by instructing the students to find their seats in alphabetical order.
Another strategy is to replace nominal exams with at least five versions of the same test. Hence, any student who sits beside or behind a colleague will have no more than a 20% chance of receiving the same version. Moreover, the professor can print extra copies in case some of the students smudge their answer cards. Such a strategy requires manually associating each student with their corresponding version, but it is useful in the case of late-enrolled students who may turn up to sit the exam.
During that exam circa 2% of the answer cards
were smudged, so the examiner had to use the an-
swer keys to correct them manually (notice that each
issue had three answer cards). Sometimes smudges
can be erased by a correction pen/tape/liquid/etc., but
Computer Vision cannot always decide which alterna-
tive the student really wanted to choose. In this case
MakeTests' default is to give naught to the answer,
and in Figure 6 we would get a column with an extra
red cross.
4.2 Second Experiment Report
MakeTests' present version requires intermediate knowledge of Python programming. However, the professor can draft questions and resort to a technician who will implement them in Python. Here
we comment on a maths professor who does not
have programming knowledge but was willing to use
MakeTests for a test. He gave an extra exam to 36 stu-
dents of the course Foundations of Mathematics. His
draft was implemented by a technician, and consisted
of three questions:
Multiple Steps: similar to the one whose answer card
is shown in Figure 7 but with five steps 1.1 to 1.5 of a
mathematical proof. In one of the versions the state-
ment was “prove that x > y and y > z imply x > z”.
Each step had six alternatives, and only one justified
that step correctly.
Multiple-choice with Justification: similar to a stan-
dard multiple-choice question but requiring a written
response besides choosing an alternative. Naught is
promptly given to any wrong choice but the professor
additionally checks the student’s solution if it led to
the right one.
TrueOrFalse: on a list of mathematical proofs the
student must find out which are wrong or right.
The technician used the professor’s draft to im-
plement the exam with MakeTests. Afterwards the
professor gave him all the solved exams digitized in
a single PDF, so that MakeTests generated the CSV
spreadsheet with each question number and the re-
spective student’s answer and score, as mentioned
in Section 3. The professor just had the task of
validating the second question manually in case of
right choice. He expressed a positive opinion regard-
ing the exam variations and the quick correction by
MakeTests, but the only hindrance was his lack of pro-
gramming knowledge.
5 CONCLUSION AND FUTURE
WORKS
We have just presented MakeTests, a platform that automates both generation and correction of numerous exam issues via random parameters in Python. Printed exams can be corrected in real time and all the feedback emailed to the students individually. Moreover, MakeTests' high flexibility allows for creating various types and styles of questions. These are MakeTests' main contributions, which to the best of our knowledge also characterize it as an original work.
Five styles are already implemented in MakeTests, as explained in Subsection 3.1. For a user to profit from MakeTests without anyone else's help, the only requirement is a good knowledge of Python programming. But this can be circumvented by any institution that counts on programming support.
Future versions of MakeTests will work with an
interface that exempts users from programming in
Python. In this case the user can write an exam
in plain text, Markdown or even in LaTeX, and MakeTests will translate it to get the exams in Python.
As another improvement MakeTests will gener-
ate online exams not only in PDF, but also in XML
compatible with Moodle. Finally, MakeTests' typical DQ is expected to migrate to a Databank of Questions shared by many professors who contribute their lists of exercises, so that users may simply choose questions to configure their exams instead of elaborating new exercises. Not only MakeTests' DQ but also its whole code could migrate to a WebService, so that professors will be able to keep everything synchronized in the cloud, without needing to access their local machines.
REFERENCES
Adkins, J. K. and Linville, D. (2017). Testing frequency in
an introductory computer programming course. Infor-
mation Systems Education Journal, 15(3):22.
Allen, J., de Leon, D. C., Goes, C. E., and Haney, M. A. (2019). ADLES v2.0: Managing rapid reconfiguration of complex virtual machine environments. In The Colloquium for Information System Security Education.
Burkett, V. C. and Smith, C. (2016). Simulated vs. hands-
on laboratory position paper. Electronic Journal of
Science Education, 20(9):8–24.
de Leon, D. C., Goes, C. E., Haney, M. A., and Krings,
A. W. (2018). Adles: Specifying, deploying, and shar-
ing hands-on cyber-exercises. Computers & Security,
74:12–40.
Engelhardt, L., Goldhammer, F., Naumann, J., and Frey, A.
(2017). Experimental validation strategies for hetero-
geneous computer-based assessment items. Comput-
ers in Human Behavior, 76:683–692.
Hakami, Y. A. A., Hussei, B., Ab Razak, C., and Adenuga, K. I. (2016). Preliminary model for computer based assessment acceptance in developing countries. Journal of Theoretical & Applied Information Technology, 85(2).
Manoharan, S. (2019). Cheat-resistant multiple-choice ex-
aminations using personalization. Computers & Edu-
cation, 130:139–151.
Nguyen, Q., Rienties, B., Toetenel, L., Ferguson, R., and
Whitelock, D. (2017). Examining the designs of
computer-based assessment and its impact on student
engagement, satisfaction, and pass rates. Computers
in Human Behavior, 76:703–714.
Nguyen, Q., Thorne, S., and Rienties, B. (2018). How
do students engage with computer-based assessments:
impact of study breaks on intertemporal engagement
and pass rates. Behaviormetrika, 45(2):597–614.
Smirnov, E. and Bogun, V. (2011). Science learning with in-
formation technologies as a tool for “Scientific Think-
ing” in engineering education. Online Submission.
Zampirolli, F. d. A., Teubl, F., and Batista, V. R. (2019). On-
line generator and corrector of parametric questions in
hard copy useful for the elaboration of thousands of
individualized exams. In 11th International Confer-
ence on Computer Supported Education, pages 352–
359.