Children Face Long-term Identification in Classroom: Prototype Proposal

Nikolajs Bumanis, Gatis Vitols, Irina Arhipova and Inga Meirane

Faculty of Information Technologies, Latvia University of Life Sciences and Technologies, 2 Liela str., Jelgava, Latvia

WeAreDots Ltd., Elizabetes str. 75, Riga, Latvia

Keywords: Face Recognition, Deep Learning, Long-term Identification.

Abstract: Automated identification of children’s faces raises additional challenges compared to automated identification of adult faces. Long-term identification is required in environments where a person must be identified over longer time spans, such as months or years. Schools are one example: children spend multiple years there, so an automated face identification solution, if implemented, must be able to recognise facial biometric data over a span of typically up to 9 years. In this proposal, we discuss available children face identification solutions that use deep learning networks, introduce the legal constraints that come with children’s privacy, and propose a prototype for long-term identification of children’s attendance in their classroom. The developed prototype is architecturally separated into three layers, which encapsulate the necessary local and remote hardware, software and interconnectivity solutions between these entities. The prototype is intended for implementation in a school’s class attendance management system and should provide sufficient functionality for identity management, object detection and person identification processes. The prototype’s processing is based on a model that incorporates the principle of multiple correct biometric pattern versions, which makes long-term identification possible. The model uses the Single Shot MultiBox Detector for object detection and a Siamese neural network for person identification.

1 INTRODUCTION

Automated human face recognition has been researched broadly, especially with the improvement in computational power and the development of AI algorithms and implementations. Various solutions have been developed and trained on different data sets, for example, for age estimation (Anda et al., 2019; Huang, Li, Zhu, & Chen, 2017). A large portion of research on automated face recognition comes from smart city development and real-time citizen recognition for security (Wu, Xu, & Li, 2020).

Long-term identification of children’s faces can be part of a more complex solution for the overall management of learning facilities, with an emphasis on security, the emotional climate of a class and support for an existing school management system, for example through automated counting of pupils in the classroom. Short-term identification of a pupil is easier to achieve than long-term identification. Children spend multiple years in school, and during that time various constraints on identification need to be addressed, such as changes in facial features, changes in clothing style and cosmetic variations of the face.

The aim of this research is to propose a prototype for long-term identification of children’s faces in learning facilities.

2 CHILDREN FACE IDENTIFICATION CHALLENGES

Automated identification and re-identification of baby, toddler and children’s faces raises challenges that are not present in the face identification of adults. The main challenges are faster changes in certain facial biomarkers and stricter privacy control policies for implementation.

Bumanis, N., Vitols, G., Arhipova, I. and Meirane, I.

Children Face Long-term Identification in Classroom: Prototype Proposal.

DOI: 10.5220/0009414002870293

In Proceedings of the 22nd International Conference on Enterprise Information Systems (ICEIS 2020) - Volume 2, pages 287-293

ISBN: 978-989-758-423-7

2.1 Deep Learning for Automation of Children Face Identification

Research on face recognition of younger people typically focuses on identifying children in classroom settings in schools or other establishments, and on facial emotion recognition (Jaiswal & Nandi, 2019).

Automated classroom observation solutions have been researched in recent years. There are promising solutions (Ramakrishnan, Ottmar, LoCasale-Crouch, & Whitehill, 2019) for detecting the emotional status of a class (positive or negative), with improved success achieved through ensemble learning over deep learning architectures such as CNN and Bi-LSTM.

In India, the identification of children after kidnapping has been improved with deep learning using a CNN model and the VGG Face descriptor (Chandran et al., 2018). Another article (Siddiqui, Vatsa, & Singh, 2018) proposes a class-based penalty mechanism for CNNs to improve the recognition of toddler images.

By testing on 8 datasets and using a CNN with a reduced number of hyperparameters, researchers achieved 74% accuracy in children’s facial emotion detection (Jaiswal & Nandi, 2019).

Children’s behaviour in the classroom has been analysed using a classroom security camera and the images it captures (Rothoft, Si, Jiang, & Shen, 2017). To detect whether children pay attention to the designated spatial area, the distribution of their focus points is computed in two dimensions, and anomalous points are then identified.
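
The anomaly step can be illustrated with a small sketch. It assumes, as a simplification, that focus points follow a two-dimensional Gaussian and flags points whose Mahalanobis distance from the mean exceeds a hypothetical threshold of 3; the cited work may use a different anomaly detector.

```python
import math


def mahalanobis_outliers(points, threshold=3.0):
    """Flag 2D focus points that lie far from the bulk of the distribution.

    points: list of (x, y) tuples; threshold: hypothetical cut-off.
    Returns one boolean per point (True = anomalous).
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("degenerate covariance (points are collinear)")
    # Closed-form inverse of the 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    flags = []
    for x, y in points:
        dx, dy = x - mx, y - my
        d2 = dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy
        flags.append(math.sqrt(d2) > threshold)
    return flags
```

With only a handful of points, a single outlier inflates the estimated covariance enough to mask itself, so a sketch like this is only meaningful for reasonably large samples of focus points.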

A face recognition system for student attendance has been developed (Prangchumpol, 2019) using an Android face recognition service linked to a database in cloud storage. The attendance information is automatically added to a web-based attendance management system.

Bhattacharya, Nainala, Das, & Routray (2018) propose a solution that prevents fake attendance by using a CNN with a loss function and tools for improving image quality, reaching above 80% accuracy. The authors use a single 15 MP camera.

Another article (Dalal, Dalal, & Dalal, 2019) addressing the same attendance identification task proposes the Viola-Jones algorithm for detecting the face region in an image and an Extreme Learning Machine, a feedforward neural network, for identification.

A typical recording setting for children identification in the classroom uses a single camera and few participants (Bhattacharya et al., 2018; Lin & Li, 2019). Less research addresses classrooms with more people and the possible implementation of multi-camera recording with synchronisation of data between cameras.

Recent research shows improved face and facial expression identification and verification results when CNN-based Siamese networks are applied (Dwi Putranto & Wahyono, 2019; Hayale, Negi, & Mahoor, 2019).

2.2 Privacy Policies

Children’s biometric data is regulated by multiple entities. In Europe, the GDPR affects the way data, including biometric data, is handled. Some articles (Sanchez-Reillo, Ortega-Fernandez, Ponce-Hernandez, & C. Quiros-Sandoval, 2018) address the relationship between biometric data and legal regulation, proposing specific procedures for handling analysed data in accordance with the regulations.

Some countries have enacted special acts dedicated to personal freedom and privacy. For example, in the United Kingdom, the Protection of Freedoms Act 2012 (Parliament of the United Kingdom, 2011) addresses the regulation of biometric data and the regulation of surveillance in its first parts.

For the protection of children’s biometric data, there is separate guidance, such as “Protection of biometric information of children in schools and colleges” (U.K. Department for Education, 2018), which provides guidance for schools and for the possible implementation of local school policies. In addition, each school has its own internal rules and regulations document, typically including a section about safety (Jules Verne Riga French School council, 2014).

3 PROTOTYPE ARCHITECTURE

The prototype is developed for the school attendance management process. A typical management process includes manual observation and recording of each student’s attendance in an electronic system or journal. At the end of each semester, the attendance records are aggregated and used to determine necessary procedures, for example, allowing a student to take an exam, or preventing a student from taking an exam until they complete additional tasks.
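
The semester-end aggregation described above amounts to simple arithmetic over attendance records. The sketch below assumes a hypothetical record format of (student, attended) pairs and a hypothetical admission rule of at least 80% attendance; actual rules are set by each school.

```python
from collections import defaultdict


def attendance_rates(records):
    """Aggregate (student_id, attended) records into per-student attendance rates."""
    counts = defaultdict(lambda: [0, 0])  # student_id -> [attended, held]
    for student_id, attended in records:
        counts[student_id][1] += 1
        if attended:
            counts[student_id][0] += 1
    return {sid: attended / held for sid, (attended, held) in counts.items()}


def exam_admission(rates, min_rate=0.8):
    """Apply a hypothetical policy: admit if the rate reaches min_rate."""
    return {
        sid: "admit" if rate >= min_rate else "additional tasks"
        for sid, rate in rates.items()
    }


# Example: s1 attended 9 of 10 lessons, s2 attended 5 of 10
records = [("s1", True)] * 9 + [("s1", False)] + [("s2", True)] * 5 + [("s2", False)] * 5
rates = attendance_rates(records)
decisions = exam_admission(rates)
```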

The proposed prototype consists of multiple interconnected software and hardware components. The architecture can be described by three layers: the physical layer, the data security layer and the logical layer (see Fig. 1).

Figure 1: Stack diagram of prototype architecture.

3.1 Physical Layer

The physical layer is responsible for connecting hardware components through their input/output interfaces. It includes the following components: video cameras, the Work Station and the Router.

Video cameras are responsible for observing the classroom and capturing the video feed. Cameras can be placed in the classroom in multiple ways. The prototype uses two cameras positioned on the front wall of the room; in the framework of this work, we assume the classroom layout shown in Figure 2.

This placement provides the following benefits: ease of camera installation thanks to convenient cable routing, and coverage of the whole classroom with two partly overlapping viewing angles for more accurate object detection and identification. However, multiple viewing angles, especially overlapping ones, introduce a statistical challenge: how to correctly interpret the results when one object is detected and identified by both cameras simultaneously.
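
One simple way to interpret overlapping results is to count each identified person once per filming session and keep the camera that produced the strongest match. The sketch below is a hypothetical illustration, not the prototype’s implementation; the (camera, person, score) tuple format is assumed, and it does not resolve harder cases such as two cameras assigning different identities to the same child.

```python
def merge_camera_results(identifications):
    """Collapse per-camera identifications so each person is counted once.

    identifications: iterable of (camera_id, person_id, score) tuples, where a
    higher score means a stronger match (assumed format).
    """
    merged = {}
    for camera, person, score in identifications:
        entry = merged.setdefault(
            person, {"score": score, "camera": camera, "seen_by": set()}
        )
        entry["seen_by"].add(camera)
        if score > entry["score"]:
            # Keep the camera with the strongest match as the reference view
            entry["score"], entry["camera"] = score, camera
    return merged
```

Recording `seen_by` keeps the information that both cameras observed the same person, which could later feed the statistical analysis instead of being discarded.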

The video feed from the cameras is processed by the Work Station, specifically by the Processing Unit (see Section 4.2), located at the filming location. The Work Station is prepared by a service provider and has all the necessary software installed, as described in Section 3.3.

Figure 2: Proposed camera placement positions.

The physical layer also includes the Router, which securely connects the camera devices, the Work Station and the Data Centre. Security measures are described in Section 3.2.

3.2 Data Security Layer

The intermediate layer of the proposed architecture is the data security layer, which describes the technologies and protocols used by the Router. This layer is responsible for secure connections between the camera devices, the Work Station and the Data Centre. Connections are established using a Virtual Private Network (VPN) and the Transport Layer Security (TLS) cryptographic protocol.

Sensitive data are stored in the Data Centre, which provides physical security of the data. Access to these data is secured by VPN and TLS and is available only to authorized and authenticated users. These measures are also effective against Man-in-the-Middle (MITM) attacks.
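
On the Work Station side, the TLS part of this setup could look like the following sketch using Python’s standard `ssl` module. It assumes the Data Centre presents a certificate signed by a CA whose file is passed as `ca_file`, and optionally that clients authenticate with their own certificate (mutual TLS); file names are hypothetical, and the VPN tunnel itself is configured outside the application. Certificate and hostname verification are what make an interposed attacker’s certificate fail.

```python
import ssl


def make_client_context(ca_file=None, client_cert=None, client_key=None):
    """Build a TLS client context for Work Station -> Data Centre connections.

    ca_file: CA bundle that signed the Data Centre certificate (hypothetical path).
    client_cert/client_key: optional client certificate for mutual TLS.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions
    # Already the defaults of create_default_context; set explicitly because
    # certificate and hostname checks are what defeat MITM attacks.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    if client_cert:
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx
```

The context would then wrap a socket opened with `socket.create_connection`, via `ctx.wrap_socket(sock, server_hostname="datacentre.example")`, where the host name is a placeholder.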

3.3 Logical Layer

The logical layer describes the software used by the Work Station and the Server and is, in general, responsible for the object detection and object identification processes. The software of the proposed prototype can be divided into server-side applications and a user application. Server-side applications support biometric data processing and manage persons’ biographical data, the event and class attendance registers, and the person identification process.


Server-side applications include a knowledge base, created from persons’ biometric data; a database; a video archive, which contains the video feed captured by the cameras and video cuts of particular detected objects; and an authentication and authorization module.

The following server-side applications are implemented as tools with corresponding methods:
• person management tools: methods for managing a person’s biographical and biometric data and the person’s assignment to a particular group or class;
• image quality tools: methods for image quality assessment;
• observation tools: methods for starting and stopping video recording and creating the video archive;
• classifier management tools: methods for accessing and managing classifiers;
• event management tools: methods for managing events and class attendance results;
• messaging tools: methods implementing the messaging mechanism;
• configuration tools: methods for managing the prototype modules’ settings;
• audit tools: methods for managing audit records.

The user application, installed on the Work Station, provides functionality for creating and modifying a person’s biographical and biometric profile by entering the person’s credentials and uploading face images. The user application performs identity management activities through the connection between the Work Station and the Server, by accessing server-side applications.

Biometric data processing is performed by the logical Processing Unit embedded in the user application. The Processing Unit is responsible for video feed processing (object detection and creation of a crop image of each detected object) and for object identification using the created crop image and biometric data from the knowledge base. The Processing Unit also executes server-side application methods, such as creation of the video archive.

The logical layer also includes the database, which stores the following data: persons’ biographical and biometric data; object detection results, in the form of an event register; automatic person identification results and their expert assessment, in the form of a class attendance register; classifiers for teaching classes, rooms and cameras; audit records; and configuration parameters.
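
To make the stored data concrete, the following sketch expresses it as a relational schema using Python’s built-in `sqlite3` module. Table and column names are hypothetical, SQLite is chosen only so the example runs without a server, and the separate `biometric_pattern` table assumes that each person may have several stored pattern versions, in line with the model described in Section 4.2.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    surname TEXT NOT NULL,
    person_code TEXT UNIQUE NOT NULL,
    birth_date TEXT NOT NULL,
    gender TEXT                          -- optional
);
CREATE TABLE biometric_pattern (         -- several correct versions per person
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    embedding BLOB NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE classifier (                -- rooms, teaching classes, cameras
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('room', 'class', 'camera')),
    name TEXT NOT NULL
);
CREATE TABLE event (                     -- event register (object detection)
    id INTEGER PRIMARY KEY,
    started_at TEXT NOT NULL,
    room_id INTEGER REFERENCES classifier(id),
    person_id INTEGER REFERENCES person(id),  -- NULL if not identified
    similarity REAL
);
CREATE TABLE attendance (                -- class attendance register
    id INTEGER PRIMARY KEY,
    class_id INTEGER NOT NULL REFERENCES classifier(id),
    person_id INTEGER NOT NULL REFERENCES person(id),
    session_date TEXT NOT NULL,
    auto_present INTEGER NOT NULL,
    expert_present INTEGER               -- teacher assessment, NULL until reviewed
);
CREATE TABLE audit_record (
    id INTEGER PRIMARY KEY,
    logged_at TEXT NOT NULL,
    user_name TEXT NOT NULL,
    operation TEXT NOT NULL
);
CREATE TABLE config_parameter (
    name TEXT PRIMARY KEY,
    value TEXT NOT NULL
);
"""


def create_database(path=":memory:"):
    """Open a SQLite database with foreign keys enforced and create the schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```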

4 PROTOTYPE FUNCTIONALITY

The range of available functions depends on the authorization level of the authenticated user. Basic, teacher-level authorization gives access to person management tools, observation tools (with some limitations on starting and ending recording), class attendance register management tools and read-only access to some classifiers, whereas advanced, administrator-level authorization gives access to all existing tools. Administrator-level authorization is also required to manage user accounts and classifiers. In addition, the prototype allows access rights to be modified, resulting in in-between level accounts.
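
The authorization levels can be modelled as a mapping from roles to permitted tool actions. In this hypothetical sketch the tool and action names are illustrative labels, not the prototype’s identifiers; `derive_role` shows how an in-between account could be created by copying a base role and adjusting selected tools.

```python
ALL_TOOLS = (
    "person", "image_quality", "observation", "classifier", "event",
    "attendance", "messaging", "configuration", "audit", "user_accounts",
)

ROLE_PERMISSIONS = {
    # Basic level: limited recording control, read-only classifiers
    "teacher": {
        "person": {"read", "write"},
        "observation": {"start", "stop"},
        "attendance": {"read", "write"},
        "classifier": {"read"},
    },
    # Advanced level: every action on every tool ("*" wildcard)
    "administrator": {tool: {"*"} for tool in ALL_TOOLS},
}


def is_allowed(permissions, tool, action):
    """Check whether a permission map grants `action` on `tool`."""
    granted = permissions.get(tool, set())
    return "*" in granted or action in granted


def derive_role(base, overrides):
    """Create an in-between role: copy a base role, then replace listed tools."""
    perms = {tool: set(actions) for tool, actions in ROLE_PERMISSIONS[base].items()}
    for tool, actions in overrides.items():
        perms[tool] = set(actions)
    return perms
```

Copying the base role rather than mutating it keeps the standard teacher and administrator levels unchanged when custom accounts are created.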

When the user application is started, the user is presented with an authorization window for entering credentials. After successful validation, the user is forwarded to the prototype’s main page, which contains links to all sections available at the authenticated user’s authorization level. The sections correspond either to the user application’s internal functionality or to server-side applications.

4.1 User Interface Functions

The sections can be divided into three main branches (managing a person’s identity, managing the observation process and managing the results of the observation) and one secondary branch (database management).

The first branch is intended for building the knowledge base used by the person identification algorithm. A person’s identity is registered by entering biographical data: name, surname, personal identification number and date of birth. Optionally, the person’s gender can be specified. After the biographical data is defined, one to four photos of the person are uploaded. The photos are taken according to image quality standards (ISO, 2011), and to ensure sufficient image quality the following parameters are validated: face recognition rate, contrast, sharpness and saturation. The final step is selecting the person’s affiliated groups and classes.
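
The contrast, sharpness and saturation checks could be computed roughly as in the sketch below, which works on an image given as rows of (R, G, B) values in the range 0 to 255. The metric choices (RMS contrast, variance of a 4-neighbour Laplacian, mean HSV saturation) and all thresholds are assumptions for illustration, not values taken from the prototype or the ISO standard; the face recognition rate check needs a face detector and is omitted.

```python
import colorsys
from statistics import mean, pvariance

# Hypothetical acceptance thresholds; real values would need calibration.
THRESHOLDS = {"contrast": 0.15, "sharpness": 50.0, "saturation": 0.1}


def to_gray(img):
    """Convert rows of (R, G, B) tuples to luma using ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in img]


def rms_contrast(gray):
    """Standard deviation of intensities, normalised to [0, 1]."""
    return pvariance([v for row in gray for v in row]) ** 0.5 / 255.0


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian; low values suggest blur."""
    h, w = len(gray), len(gray[0])
    lap = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(lap)


def mean_saturation(img):
    """Mean HSV saturation in [0, 1]."""
    return mean(
        colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for row in img for r, g, b in row
    )


def validate_photo(img, thresholds=THRESHOLDS):
    """Return ({metric: passed}, {metric: score}) for an image of at least 3x3 pixels."""
    gray = to_gray(img)
    scores = {
        "contrast": rms_contrast(gray),
        "sharpness": laplacian_variance(gray),
        "saturation": mean_saturation(img),
    }
    return {k: scores[k] >= thresholds[k] for k in scores}, scores
```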

The second branch is intended for managing the filming process. The filming location and the affiliated class or group are selected from pre-registered classifiers, and the filming process is started by the user. The prototype supports a single filming session of up to 10 minutes and stops it automatically after that. Optionally, the user can stop the filming process manually. During filming, the video feed is continuously processed by the Processing Unit.
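
The 10-minute limit and the manual stop can be sketched as a small session object that the frame-processing loop polls. The class and method names are hypothetical; an injectable clock is used so the behaviour can be checked without waiting ten minutes.

```python
import time

MAX_SESSION_SECONDS = 10 * 60  # 10-minute limit stated for the prototype


class FilmingSession:
    """Tracks one filming session for a selected location and class (sketch)."""

    def __init__(self, location, group, clock=time.monotonic):
        self.location, self.group = location, group
        self._clock = clock
        self.started_at = None
        self.stopped_at = None
        self.stop_reason = None

    def start(self):
        self.started_at = self._clock()

    def stop(self, reason="manual"):
        # Only the first stop is recorded
        if self.stopped_at is None:
            self.stopped_at = self._clock()
            self.stop_reason = reason

    def tick(self):
        """Call once per processed frame; returns False once the session has ended."""
        if self.started_at is None:
            raise RuntimeError("session not started")
        if self.stopped_at is not None:
            return False
        if self._clock() - self.started_at >= MAX_SESSION_SECONDS:
            self.stop(reason="time limit")
            return False
        return True
```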


The third branch is intended for managing the results of filming, which include both object detection and person identification results. Every identified person must first be detected, but not every detected object gets identified. The prototype divides these results into two sections: a registered event section for object detection, and a class attendance management section for person identification and attendance of the affiliated class. In the framework of the prototype, an event is a set of frames starting with the frame in which an object was first detected and ending with the last frame in which the object was re-detected within 3 seconds of its previous detection (see Section 4.3). The first section shows the results of object detection.

These results include the event date and time, the event location and a 10-second video fragment with the detected object in the frame, which is played in the embedded video player. If a detected object was successfully identified, the person’s name and surname, together with the corresponding degree of coincidence, are also shown.

The second section contains the person identification results with regard to attendance of the affiliated class. The results include metadata about the event and a list of the intended participants, each with an automatic, identification-based attendance assessment. An expert assessment can be performed for each intended participant; this is commonly done by a teacher-level user. The assessment values are used for statistical analysis that determines the prototype’s performance in object detection and person identification. For detected objects that were not identified, that is, unexpected participants, frames showing these persons are included. The results can be exported as a report.
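
The statistical analysis that compares automatic and expert assessments could, for example, treat the teacher’s assessment as ground truth and compute standard classification metrics. The sketch below assumes both assessments are dictionaries mapping each intended participant to present or absent; the choice of precision, recall and accuracy is an assumption, not a metric set specified by the prototype.

```python
def assess_performance(automatic, expert):
    """Compare automatic attendance against the expert (teacher) assessment.

    Both arguments map participant_id -> True (present) / False (absent).
    Participants missing from `automatic` count as not detected.
    """
    tp = fp = fn = tn = 0
    for pid, truly_present in expert.items():
        predicted = automatic.get(pid, False)
        if predicted and truly_present:
            tp += 1
        elif predicted:
            fp += 1
        elif truly_present:
            fn += 1
        else:
            tn += 1
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(expert) if expert else 0.0,
    }
```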

The secondary branch is intended for data management of user accounts and classifiers. School rooms, teaching classes and camera devices are registered as classifiers and can later be modified and managed by an administrator-level user. User accounts are created by the administrator; at the current stage of development, the prototype does not allow a user account to be created from a work station, for security and legal reasons. The prototype’s user authorization complies with the WS-Trust specification (OASIS, 2012).

4.2 Processing Unit

The Processing Unit is a set of software algorithms responsible for processing the video feed in order to detect objects and establish the identity of each detected object. The Processing Unit implements the previously described model (Arhipova, Vitols, & Meirane, 2019), which incorporates the principle of multiple correct biometric pattern versions and thus makes long-term identification possible.

The prototype uses the Single Shot MultiBox Detector (SSD) algorithm (Liu et al., 2016) for object detection and a Siamese neural network (Bielski, 2019; Koch & Koch, 2015) for person identification.
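
Once the Siamese network has mapped a face crop to a fixed-length embedding, identification against a knowledge base that stores multiple correct pattern versions per person can be sketched as a nearest-neighbour search in which a person matches if any stored version is close enough. Embeddings are plain lists here, Euclidean distance and the 0.6 threshold are assumptions, and the network itself is not shown.

```python
import math


def identify(query, knowledge_base, threshold=0.6):
    """Match a query embedding against multiple stored versions per person.

    knowledge_base: dict person_id -> list of embeddings (pattern versions).
    Returns (person_id, distance), with person_id None if no version is
    within `threshold` (hypothetical value).
    """
    best_id, best_dist = None, math.inf
    for person_id, versions in knowledge_base.items():
        if not versions:
            continue
        # Distance to the closest of the person's stored pattern versions
        d = min(math.dist(query, v) for v in versions)
        if d < best_dist:
            best_id, best_dist = person_id, d
    if best_dist <= threshold:
        return best_id, best_dist
    return None, best_dist
```

Adding a newly verified embedding as an extra version, rather than replacing the old one, is one way such a knowledge base could follow a child’s changing face over the years.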

The proposed model was updated to accommodate the use of two cameras. The update affects the first step of the model: processing of the video feed and creation of the object’s crop image. A crop image is created during the continuous video filming process from the captured frame with the best image quality and is used by the Siamese neural network for person identification. In the case of two cameras, only one crop image is created; which frame is used is decided by threshold and image quality assessment values.
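
The choice of a single crop image from two cameras could be implemented as a threshold filter followed by a maximum, as in this sketch. It assumes each candidate frame carries the SSD detection confidence and an image quality score in [0, 1]; the field names and both thresholds are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class CropCandidate(NamedTuple):
    camera: str
    frame: int
    confidence: float  # SSD detection score
    quality: float     # image quality assessment score in [0, 1]


def select_crop(
    candidates: Sequence[CropCandidate],
    min_confidence: float = 0.6,
    min_quality: float = 0.5,
) -> Optional[CropCandidate]:
    """Pick one crop across both cameras: pass both thresholds, then best quality."""
    eligible = [
        c for c in candidates
        if c.confidence >= min_confidence and c.quality >= min_quality
    ]
    if not eligible:
        return None
    # Highest quality wins; detection confidence breaks ties
    return max(eligible, key=lambda c: (c.quality, c.confidence))
```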

4.3 Object Tracking

Object tracking, which is commonly performed either by a separate object tracking algorithm or by a sub-function of an object detection algorithm, is implemented in the prototype synthetically. The prototype’s object tracking is based on observing the target at identical or very close coordinates in order to determine whether the object in the frame is the “same” object.

These calculations are performed by the SSD algorithm. When an object is detected, the prototype generates an event. During video feed processing, SSD continuously attempts to detect this object again. SSD creates crop images of detected objects and forwards them to the Siamese neural network for person identification.

The registration process continues as long as the object is re-detected within a 3-second timeframe. When the object is not re-detected within 3 seconds, the registration process is stopped and the event instance is recorded in the database. If the object’s identity is established at any point during the registration process, the person’s name and surname are also recorded.
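
The synthetic tracking and event registration can be sketched as follows. “Identical or very close coordinates” is approximated here by intersection over union (IoU) of bounding boxes with a hypothetical 0.5 threshold, and an event is closed once its object has not been re-detected for more than 3 seconds; the class names and detection format are assumptions. For brevity, two detections in the same frame may update the same event.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
GAP_SECONDS = 3.0   # re-detection window from the text
IOU_MATCH = 0.5     # hypothetical "same object" threshold


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Event:
    box: Box
    first_seen: float
    last_seen: float
    person: Optional[str] = None  # set once the Siamese network identifies the object


class EventTracker:
    def __init__(self) -> None:
        self.open: List[Event] = []
        self.closed: List[Event] = []  # events ready to be written to the database

    def update(self, t: float, detections: List[Tuple[Box, Optional[str]]]) -> None:
        """Process one frame at time t; detections are (box, person_or_None)."""
        # Close events whose object was not re-detected within GAP_SECONDS
        still_open = []
        for ev in self.open:
            if t - ev.last_seen > GAP_SECONDS:
                self.closed.append(ev)
            else:
                still_open.append(ev)
        self.open = still_open
        for box, person in detections:
            match = max(self.open, key=lambda ev: iou(ev.box, box), default=None)
            if match is not None and iou(match.box, box) >= IOU_MATCH:
                match.box, match.last_seen = box, t
                if person is not None and match.person is None:
                    match.person = person
            else:
                self.open.append(Event(box, t, t, person))

    def flush(self) -> None:
        """Close all remaining events, e.g. when filming stops."""
        self.closed.extend(self.open)
        self.open = []
```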


5 CONCLUSIONS

Children’s face identification raises two main challenges: faster changes in facial features and stricter privacy control policies.

The majority of face capture research uses few people and a single camera. Our proposal introduces a two-camera implementation in a classroom with synchronisation between the cameras, using a router and a work station, which allows further scaling and expansion.

The proposed prototype is meant to handle basic school attendance management operations with regard to object detection and person identification. However, advanced scenarios, such as the entry of an unintended person, introduce additional usage and processing challenges: the filming process must be stopped beforehand, and the results may be insufficient for final analysis.

The prototype uses the Single Shot MultiBox Detector and a Siamese neural network for the main re-identification process; recent research shows improved face and facial expression identification and verification results when CNN-based Siamese networks are applied.

Technically, the prototype assumes a correct and accurate working regime: up to 10 minutes of issue-free filming and continuous processing. Furthermore, the prototype uses a remote connection to the Data Centre. Potential issues that may occur in production are not covered in this paper; they require in-depth approbation and adaptation.

Further steps include approbation of this prototype in a Latvian high school. The legal permissions to carry out the first experiments have already been obtained.

ACKNOWLEDGEMENTS

The research leading to these results has received

funding from the project "Competence Centre of

Information and Communication Technologies" of

EU Structural funds, contract No. 1.2.1.1/18/A/003

signed between IT Competence Centre and Central

Finance and Contracting Agency, Research No. 2.1

"Person long-period re-identification (Re-ID)

solution to improve the quality of education".

REFERENCES

Anda, F., Lillis, D., Kanta, A., Becker, B. A., Bou-Harb, E., Le-Khac, N.-A., & Scanlon, M. (2019). Improving Borderline Adulthood Facial Age Estimation through Ensemble Learning. In Proceedings of the 14th International Conference on Availability, Reliability and Security - ARES ’19 (pp. 1–8). New York, New York, USA: ACM Press. https://doi.org/10.1145/3339252.3341491

Arhipova, I., Vitols, G., & Meirane, I. (2019). Long Period Re-Identification Approach to Improving the Quality of Education: A Preliminary Study. FICC 2020 (in press).

Bhattacharya, S., Nainala, G. S., Das, P., & Routray, A. (2018). Smart attendance monitoring system (SAMS): A face recognition based attendance system for classroom environment. In Proceedings - IEEE 18th International Conference on Advanced Learning Technologies, ICALT 2018 (pp. 358–360). Institute of Electrical and Electronics Engineers Inc.

Bielski, A. (2019). Siamese and triplet networks with online pair/triplet mining in PyTorch. Retrieved August 23, 2019, from https://github.com/adambielski/siamese-triplet

Chandran, P. S., Byju, N. B., Deepak, R. U., Nishakumari, K. N., Devanand, P., & Sasi, P. M. (2018). Missing Child Identification System Using Deep Learning and Multiclass SVM. In 2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS) (pp. 113–116). IEEE. Retrieved from https://ieeexplore.ieee.org/document/8635054/

Dalal, A., Dalal, P., & Dalal, S. (2019). Automatic Attendance System Using Extreme Learning Machine. International Journal of Engineering and Advanced Technology (IJEAT).

Dwi Putranto, R. A., & Wahyono. (2019). Alignment based siamese network model for face verification. International Journal of Scientific and Technology Research, 8(10), 2577–2581.

Hayale, W., Negi, P., & Mahoor, M. (2019). Facial expression recognition using deep siamese neural networks with a supervised loss function. In Proceedings - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019. Institute of Electrical and Electronics Engineers Inc.

Huang, J., Li, B., Zhu, J., & Chen, J. (2017). Age classification with deep learning face representation. Multimedia Tools and Applications, 76(19), 20231–20247. Retrieved from http://link.springer.com/10.1007/s11042-017-4646-5

ISO. (2011). ISO/IEC 19794-1:2011. Information technology — Biometric data interchange formats — Part 1: Framework. Retrieved from https://www.iso.org/standard/50862.html

Jaiswal, S., & Nandi, G. C. (2019). Robust real-time emotion detection system using CNN architecture. Neural Computing and Applications.

Jules Verne Riga French School council. (2014). Internal Rules of Jules Verne Riga French School. Retrieved November 17, 2019, from http://www.ecolejulesverne.lv/wp-content/uploads/2015/03/Internal-rules-2014-2015.pdf


Koch, G., & Koch, G. (2015). Siamese Thesis. Cs.Toronto.Edu. Retrieved from http://www.cs.toronto.edu/~gkoch/files/msc-thesis.pdf

Lin, Z. H., & Li, Y. Z. (2019). Design and Implementation of Classroom Attendance System Based on Video Face Recognition. In Proceedings - 2019 International Conference on Intelligent Transportation, Big Data and Smart City, ICITBS 2019 (pp. 385–388). Institute of Electrical and Electronics Engineers Inc.

Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). SSD: Single shot multibox detector. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9905 LNCS, 21–37. https://doi.org/10.1007/978-3-319-46448-0_2

OASIS. (2012). WS-Trust 1.4. Retrieved from http://docs.oasis-open.org/ws-sx/ws-trust/v1.4/ws-trust.html

Parliament of the United Kingdom. (2011). Protection of Freedoms Act 2012. Retrieved January 6, 2020, from http://www.legislation.gov.uk/ukpga/2012/9/contents/enacted

Prangchumpol, D. (2019). Face Recognition for Attendance Management System Using Multiple Sensors. In Journal of Physics: Conference Series (Vol. 1335). Institute of Physics Publishing.

Ramakrishnan, A., Ottmar, E., LoCasale-Crouch, J., & Whitehill, J. (2019). Toward Automated Classroom Observation: Predicting Positive and Negative Climate. In 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019) (pp. 1–8). IEEE. Retrieved from https://ieeexplore.ieee.org/document/8756529/

Rothoft, V., Si, J., Jiang, F., & Shen, R. (2017). Monitor Pupils’ Attention by Image Super-Resolution and Anomaly Detection. 2017 International Conference on Computer Systems, Electronics and Control (ICCSEC), 843–847. Retrieved from https://ieeexplore.ieee.org/document/8446759/

Sanchez-Reillo, R., Ortega-Fernandez, I., Ponce-Hernandez, W., & C. Quiros-Sandoval, H. (2018). How to Implement EU Data Protection Regulation for R&D in Biometrics. Computer Standards & Interfaces, 61, 89–96.

Siddiqui, S., Vatsa, M., & Singh, R. (2018). Face Recognition for Newborns, Toddlers, and Pre-School Children: A Deep Learning Approach. 2018 24th International Conference on Pattern Recognition (ICPR), 3156–3161. https://doi.org/10.1109/ICPR.2018.8545742

U.K. Department for Education. (2018). Protection of children’s biometric information in schools. Retrieved December 19, 2019, from https://www.gov.uk/government/publications/protection-of-biometric-information-of-children-in-schools

Wu, H., Xu, H., & Li, P. (2020). Design and Implementation of Cloud Service System Based on Face Recognition. In Advances in Intelligent Systems and Computing (Vol. 993, pp. 629–636). Springer Verlag.
