Application Development for Mask Detection and Social Distancing

Violation Detection using Convolutional Neural Networks

Gokul Sudheesh Kumar and Sujala D. Shetty

Department of Computer Science, Birla Institute of Technology & Science, Pilani, Dubai Campus, Academic City,

Dubai, U.A.E.

Keywords: Machine Learning, YOLO Object Detection, Firebase, TensorFlow.

Abstract: This project aims to detect face masks and social distancing on a video feed using Machine Learning and

Object Detection. TensorFlow and Keras were used to build a CNN model to detect face masks and it was

trained on a dataset of 3800 images. YOLO Object detection was used to detect people in a frame and check

for social distancing by calculating the Euclidean distance between the centroids of the detected boxes.

Developed an Android app named “StaySafe” where the user will be notified and can monitor the violations.

For this purpose, Firebase was used as the backend service. If a violation is detected it will upload the image

to a Firebase Cloud Storage with a notification, and the user will be able to view these images on their Android

app along with the date and time. Firebase Cloud Messaging service was used to send notifications which will

be handled in the android app. The app offers various features like viewing history, saving the image to the

device, deleting the images from the cloud etc. A heat map can also be viewed which highlights crowded

regions which can help officials identify the regions that need to be sanitized more often.

1 INTRODUCTION

The year 2020, has brought us a lot of challenges,

especially in the working sectors. Many

establishments are switching to a work from home-

based environment. There are still some businesses

that opened after lockdown and continuing its normal

operation like restaurants, few schools and

universities, construction sector, and a few offices all

ensuring workplace safety such as wearing masks and

social distancing.

The health authorities are working hard on

ensuring that these businesses follow all the protocols

for keeping the employees safe. They are conducting

regular inspections and even shutting down these

establishments if they keep violating the safety

protocols.

Many of these establishments are working

towards the automation of detecting such violations,

thereby reducing the time and labour spent for the

same. The main aim of this project is to detect

violations such as not wearing a mask or not

following social distancing in a workplace and notify

the officials through an Android app.

Technology has advanced tremendously over the

past century, everything starting from the Internet of

Things (IoT) (Raj, 2021) to machine learning and

deep learning. CNN is used in various fields like

medical (Duran-Lopez, 2019), marine science (Sung,

2017) and many other applications (Hansen, 2017)

and has become a prominent domain of machine

learning.

This project was implemented using Keras and

TensorFlow where the CNN (Convolutional Neural

Network) model was trained for face mask detection

and the YOLO Object Detection was used for social

distancing detection. An android app named

“StaySafe” was developed with the help of Firebase

which is the backend service implemented for

pushing notifications and storing these detected

images, from where the user will be able to view them

on their app.

The rest of the paper is arranged as follows:

Section 1 gives a brief introduction to the project.

Section 2 reviews some work related to this project.

Section 3 explains the methodology and

implementation of the project. Section 4 talks about

Social Distance violation detection. Section 5 talks

about Firebase which is the back-end service used to

store detected images and send notifications to the

android app. Section 6 talks about the Android App.

Section 7 discusses the applications. Section 8 is

760

Kumar, G. and Shetty, S.

Application Development for Mask Detection and Social Distancing Violation Detection using Convolutional Neural Networks.

DOI: 10.5220/0010483107600767

In Proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS 2021) - Volume 1, pages 760-767

ISBN: 978-989-758-509-8

conclusion and future scope of work. Section 9 is

references.

2 RELATED WORK

This section reviews some of the related works that

implements CNN with the help of TensorFlow, Keras

and YOLO Object detection and improvements in

object detection.

Akanksha Soni et al. (Soni, 2020) developed a

model that detects whether a person is wearing a

helmet in real time thereby, detecting any violations.

This project was also implemented with the help of

TensorFlow, Keras and OpenCV. Their proposed

model showed major improvements when compared

to some previous models that gave wrong predictions

whenever a rider wears clothes over their face. They

achieved an overall accuracy of 98% when tested.

S Chen et al. (Chen, 2020) implemented a model

with the help of TensorFlow to identify ID card

numbers. With the help of OpenCV the image of an

ID card is preprocessed and the number on the ID card

is recognized and given as output with the help of a

trained CNN model. When tested it was observed that

training speed is fast and the accuracy is high.

Emily Caveness et al. (Caveness, 2020)

developed TensorFlow Data Validation (TFDV)

which offers a scalable solution for data analysis and

validation for machine learning. It is deployed in

production which is integrated with TensorFlow

Extended (TFX), which is an end-to-end ML

platform. Their system has gained a lot of traction

ever since they open sourced their project. Other

open-source data validation systems such as Apache

Spark were also heavily inspired from their project.

Apache Spark packs with built-in modules for

streaming and has a fast, easy to use system for big

data processing.(Nair, 2018)

Yonghui Lu et al. (Lu, 2020) proposed an efficient

YOLO Architecture, YOLO-compact for a real time

single category detection. As we know in most

practical applications, the number of categories in

object detection is always single and the authors

aimed to make detections faster and more efficient for

these scenarios. By performing a series of

experiments, the authors were able to come up with

an efficient and compact network with the help of

YOLOv3. It was observed that YOLO-compact is

only of 9MB size, about 26 times smaller than

YOLOv3, 6.7 times smaller than tiny-yolov2 and 3.7

times smaller than tiny-yolov3. The average precision

of YOLO-compact is 86.85% which is significantly

higher than other YOLO models.

M. B. Ullah (Ullah, 2020), proposed a CPU-based

YOLO object detection model that is intended to run

on non-GPU computers. In the proposed method, the

author optimized YOLO with OpenCV in a way that

real time object detection can be much faster on CPU

based computers. Their network architecture

comprises 2 Convolutional layers each followed by

pooling layers and 3 fully connected layers. Their

model detects objects from videos in 10.12 to 16.29

FPS with 80-99% confidence in CPU-based

computers.

3 METHODOLGY AND

IMPLEMENTATION

Figure 1 depicts the project architecture.

Figure 1: Project Architecture.

3.1 Face Mask Detection

This project uses TensorFlow and Keras to train a

CNN model for detecting face masks.

3.2 TensorFlow and Keras

TensorFlow is an open-source platform that is used

for Machine Learning, created by the Google Brain

team. It is explicitly used for complex numerical

computation, that packs together a bunch of machine-

learning and deep learning models and algorithms.

It can be used for a variety of applications such as

classifying handwritten digits, object detection,

image recognition, natural language processing

(Natraj, 2019) by training and running deep neural

networks.

Application Development for Mask Detection and Social Distancing Violation Detection using Convolutional Neural Networks

761

Keras which acts as an interface for TensorFlow

is an open-source library that provides an efficient

way of implementing neural networks. It consists of

useful functions such as activation functions, and

optimizers.

3.2.1 How Does TensorFlow Work?

With the help of TensorFlow, developers can create

dataflow graphs which are structures that show how

data passes through the graph, or a series of nodes.

Think of each node as a mathematical operation and

each edge representing a multidimensional data array

or a tensor.

This can be easily implemented in python where

these nodes and tensors act as objects. However, the

mathematical operations are performed in C++

binaries which shows an optimal performance.

Python takes care of directing the traffic and

combines them to work together as a unit.

TensorFlow can be run on multiple platforms such

as in a cloud, a local machine, CPUs or GPUs, iOS,

and Android devices. It can also be run on Google’s

custom TensorFlow Processing Unit (TPUs). The

trained models can be run on any system for

predicting results.

TensorFlow 2.0 which was released in October

2019 made many significant changes from user

feedback. It works more efficiently and is more

convenient with simple Keras API for training models

and better performance. With the help of TensorFlow

Lite, it is possible to train models on a wide variety of

devices.

3.3 Training the Face Mask Detection

Model

In this project a convolutional neural network is used

to detect face masks. The neural network takes in the

input image as a frame from the video, processes it

and classifies it under two categories: mask and no

mask. The model was trained using 3800 images,

1900 images each for “with mask” and “without

mask” categories.

Figure 2: Network Architecture.

The network consists of two convolution layers

each followed by a relu activation function and a max

pooling layer. Figure 2 shows the network

architecture.

Relu function (Rectified Linear Unit) is

introduced for non-linearity in the convolutional

network. It is shown as:

𝑓



𝑥



max0,𝑥

(1)

Pooling layers will reduce the number of

parameters if the images are large. Spatial pooling

also referred to as sub-sampling or down-sampling

helps to reduce the dimensionality of each map

making sure important information is retained. It can

be of different types:

• Sum Pooling • Max Pooling

• Average Pooling

Max pooling will take the largest element from the

rectified feature map.

The data is then flattened by converting it to 1-

dimensional array which is passed as the input to the

final output layer. To help prevent overfitting, the

network ignores a certain percentage of neurons

during training. In this case the network drops out

50% of neurons. These units are not considered

during some forward or backward pass.

The final output layer takes the values and

transforms them into a probability distribution, this is

achieved with the help of softmax function. This

function is helpful when it comes to classification

problems or when dealing with multi-class

classification problems. The final predicted class is

the item in the list whose confidence score is the

highest.

It is represented as:

𝑒





∑

𝑒









(2)

Figure 3: Softmax Activation Function.

The final prediction will be based on the class that

has the highest probability. The model was then

compiled using Adam optimizer and trained for 20

epochs with a learning rate of 0.001 and an accuracy

of 96.35% was observed on the validation set.

ICEIS 2021 - 23rd International Conference on Enterprise Information Systems

762

Figure 4 plots the accuracy and loss, respectively.

(Goodfellow, 2016) (Belciug, 2020)

Figure 4: Accuracy and Loss.

3.4 Detection

OpenCV was used to capture the video and the trained

model was loaded using TensorFlow.

A pre-trained model was used to detect faces in a

video. The weights were initialized from the

configuration file with the help of OpenCV.

After getting the bounding boxes for the faces in

the frame, that region of interest is cropped out from

the main frame, reshaped, and is then passed to the

model.

Figure 5: Detecting Face masks (https://en.wikipedia.org/

wiki/File:Victoria_Pedretti.jpg).

Every time a violation is detected that frame will

be saved locally which will later get uploaded to

Firebase Storage.

4 SOCIAL DISTANCING

DETECTION

This project uses YOLO Object detection for

detecting people in a frame and finds the Euclidean

distance between the centroids of the detected boxes.

4.1 YOLO Object Detection

You Only Look Once (YOLO) is an effective real

time object detection system. It considers object

detection as a regression problem and finds the class

probabilities for each of the bounding boxes. In one

evaluation the neural network predicts bounding

boxes and class probabilities from the image, hence

the name YOLO. The base model detects images at

an astonishing speed of 45 frames per second whereas

a smaller version called Fast YOLO detects at 155

frames per second. It performs better than other

detection methods such as Deformable Part Models

(DPM) and Region-based convolutional neural

networks (R-CNN). There are two types of

algorithms that work towards object detection-

Algorithms based on classification, and Algorithms

based on regression. YOLO falls under the latter

category.

4.1.1 How Does YOLO Work?

A single neural network is used to combine separate

parts of object detection. It takes in features from an

image to predict each bounding box. The detection is

modeled as a regression problem where an image is

divided into SxS grids. In Fig. 7, the authors set the

value of S as 7. This can be changed in the YOLO

configuration file. Each grid predicts bounding boxes

(B), their confidence scores, and their class

probabilities (C). These predictions are encoded as an

S x S x (B*5 + C) tensor.

Each bounding box has 5 predictions: (x, y, w, h)

and confidence. Each grid cell predicts conditional

class probabilities (C) as P (Classi | Object).

The architecture for this Convolutional Neural

Network was inspired from GoogLeNet model that is

used for image classification. Just like GoogLeNet

their model was implemented using 24 convolutional

layers that helps to extract features from the image

followed by 2 fully connected layers to predict the

output probabilities and coordinates. They also came

up with a faster version on YOLO named Fast YOLO

that uses a CNN with less layers (9 instead of 24) and

less filters for those layers. Other than that, the

training and testing parameters were the same

between YOLO and Fast YOLO.

Application Development for Mask Detection and Social Distancing Violation Detection using Convolutional Neural Networks

763

Figure 6: The YOLO Architecture (Redmon, 2016).

Figure 7: The YOLO Model (Redmon, 2016).

4.2 Detection

This project uses YOLO v3 to detect people. OpenCV

python package was used to read the weights and set

up the model using the configuration file.

net=cv2.dnn.readNet("./YOLO/yolov3.weig

hts", "./YOLO/yolov3.cfg")

The classes were read from “coco.names” file

which contains the names of 80 different classes such

as person, bicycle, car etc. Open CV was used to

capture the video as shown below. This project was

tested on some sample videos, on the webcam and on

an IP camera as shown below.

cap=cv2.VideoCapture('./Videos/Example

4.mp4')

cap = cv2.VideoCapture(0)

cap=cv2.VideoCapture("http://192.168.2.

2:8080/video")

In each frame, the people are detected, and boxes

are drawn along with their centroids. The centroids

for each box were calculated and appended to a list.

Red boxes indicate that the person is violating social

distancing as shown in Fig. 8 below.

Figure 8: Detecting Social Distancing Violation.

Using Euclidean distance, the distance between

the centroids can be calculated with the help of the

scipy package.

D = dist.cdist(centroids, centroids,

metric="euclidean")

Figure 9: Calculating the Euclidean Distance between

centroids.

Where D is a 2-D array as shown in the table

below:

Table 1: 2-D array containing the Euclidean distance

between each point.

(x1, y1) (x2, y2) (x3, y3)

(x1, y1) 0 5 3

(x2, y2) 5 0 5

(x3, y3) 3 5 0

Where each element in the array is calculated

according to the formula mentioned below:

𝑑𝑖𝑠𝑡𝑥



,𝑦



,𝑥



,𝑦



 



𝑥



 𝑥







𝑦



𝑦







𝑖,

𝑗

1,..𝑁𝑑𝑒𝑡𝑒𝑐𝑡𝑖𝑜𝑛𝑠

(3)

From the above equation, i and j represents rows

and columns in the 2-D array, respectively.

If this distance is less than a certain threshold

value, the person is violating social distancing.

We can estimate this threshold value by finding

the ratio as follows:

Assuming the pixel density to be 96 dpi, we can

deduce that 1 pixel = 0.26458333 mm or 1 mm =

3.7795 pixels.

Taking the ratio,

𝑅𝑎𝑡𝑖𝑜



Real Height o

the person mm

Pixel Hei

ht o

the person in the frame





(4)

Now to estimate the pixel distance for 1 meter

(1000 mm):

1000

𝑅𝑎𝑡𝑖𝑜

  3.7795

(5)

ICEIS 2021 - 23rd International Conference on Enterprise Information Systems

764

This threshold value may or may not be accurate

and it needs to be increased or decreased accordingly.

Every time a violation is detected that frame will

be saved locally which will later get uploaded to

Firebase Storage.

4.3 Heat Map

Heat map is used to detect if people are crowding in

a specific area and hence violating social distancing

norms. By looking at the heat map on their app, it can

help officials identify the places that need to be

sanitized more often.

To plot the heat map, the first frame is read and

every iteration the boxes are drawn on that frame. Red

boxes indicate densely populated and sparsely

populated regions are shown by blue boxes. When the

program terminates this image is saved which is then

uploaded to Firebase storage.

Figure 10: Heat Map showing densely and sparsely

populated areas.

5 FIREBASE

5.1 Introduction

Firebase is a useful platform for mobile and web

applications. It can be a back end for many services

such as user authentication, data storage, static

hosting, real-time database etc. It also offers a

machine learning kit which has face detection, image

labeling, object detection, text recognition, digital ink

recognition and pose detection. It provides a user-

friendly platform for building mobile and web apps.

This project uses cloud storage and cloud

messaging to store detected images and push

notifications to the app.

5.2 Cloud Storage

It is a simple, powerful, and cost-effective storage

service that is built for Google scale. The Firebase

SDKs for cloud storage adds security to uploads and

downloads for the app regardless of the network

quality. These SDKs can be used for storing audio,

video, images etc. Google Cloud Storage can be used

to access the same files on the server.

The uploaded files can be accessed from both

Firebase and Google cloud and each file is stored in a

Google Cloud Storage Bucket. This allows the

developer to download and upload files from mobile

clients through Firebase SDKs and makes it possible

to do server-side processing using Google Cloud.

This project uses the Pyrebase python package to

initialize the storage reference.

A configuration key is used to link the firebase

project with the python program. This key can be

retrieved from the Firebase service. A storage

reference is created using which files can be uploaded

or downloaded.

The images which were saved earlier by the mask

and social distancing detection are uploaded to the

cloud storage.

The local path and path to be uploaded to the

cloud storage are passed as arguments:

storage.child(path_to_cloud_storage).pu

t(path_to_local_storage)

Figure 11: Implementation Path – Cloud Storage

(https://firebase.google.com/docs).

Firebase was then added to the Android project by

registering the app on Firebase and setting up the

configuration files. A storage reference is again

created, and the image is downloaded from the cloud

storage. The image then gets updated to the image

view. This can be saved locally to the Android device

as well.

5.3 Firebase Cloud Messaging

Firebase Cloud Messaging (FCM) provides an easy

platform for sending messages. With the help of

FCM, we can notify a client app via the Firebase

Admin SDK or the FCM server protocols.

Using the firebase admin SDK, a notification can

be pushed to the app. While setting up, the credentials

Application Development for Mask Detection and Social Distancing Violation Detection using Convolutional Neural Networks

765

Figure 12: Implementation Path – Firebase Cloud

Messaging (https://firebase.google.com/docs).

are verified, and the app is initialized through a

‘. json’ file. A custom notification can be sent with a

title, message, and topic. The topic determines which

user will get the notification. The app will only get

notified depending on this topic which acts as the

username as each user is subscribed to a particular

topic.

In the Android app, a Firebase Message handling

class was added which takes care of handling the

incoming messages and a notification builder for

creating notifications. The app gets notified every

time a violation is detected.

Figure 13: Notifications from StaySafe App.

6 ANDROID APP

The app was developed in Java with the help of

Android Studio which is built on JetBrains' IntelliJ

IDEA software.

The main interface of the app has the recently

updated image along with the date and time. Two

buttons are used to switch to mask detection and

social distancing detection mode. A save button is

used to save the image locally. Refresh button is used

for updating the image view to the recently added

image on the cloud. “View history” button takes the

user to a new activity for checking the history. The

view for history is implemented as a RecyclerView.

RecyclerView makes it easy to display large sets of

data. The data is supplied along with how each item

looks and the RecyclerView library dynamically

creates the elements when needed.

When an item scrolls off the screen, the

RecyclerView recycles the individual elements

instead of destroying the view. By reusing the

previous views that were scrolled off screen, it

reduces power consumption and improves the app’s

performance and responsiveness. (Fatima, 2020)

Figure 14: Main Interface.

7 APPLICATIONS

Hospitals and Clinics: This is an important domain

that needs special attention during a pandemic. To

safeguard the health of doctors and nurses, this

system can be implemented to detect whether a

patient is wearing a mask. A CCTV camera can be

placed to detect mask and social distancing violations

which can in turn help reduce crowding while patients

are waiting in queues for their appointments.

Airports: Many of the destinations around the

world have started easing travel restrictions. This

system can be implemented in airports where people

are waiting in queues at check-in facilities, security

clearance gates, passport control, waiting lobbies etc.

Shops / Workplaces: As many businesses are

opening after lockdown this system can be

implemented to ensure that customers and employees

are following all the COVID 19 safety protocols.

8 CONCLUSION AND FUTURE

SCOPE

This project has been developed to come up with an

efficient way for detecting and notifying officials

ICEIS 2021 - 23rd International Conference on Enterprise Information Systems

766

when a person does not follow the COVID 19 safety

protocols in a workplace, business establishments etc.

In this work, we have trained a model for face mask

detection using TensorFlow and Keras and used

YOLO Object detection for detecting social

distancing. The proposed CNN architecture

comprises two convolutional layers followed by relu

activation function and a max pooling layer.

YOLOv3 was used to detect people in a frame and

find the Euclidean distance between them. With the

help of OpenCV we were able to capture the video

feed from different sources like webcam, video file or

an IP camera. An android app was developed which

will get notified every time a violation is detected,

and the detected images can also be viewed through

the app. This was achieved with the help of Firebase

service. As a future study, we can work on finding a

pattern to detect or predict the time at which it gets

crowded the most and the heat map can be plotted in

a more accurate manner.

REFERENCES

Soni, A., & Singh, A. P., 2020. Automatic Motorcyclist

Helmet Rule Violation Detection using Tensorflow &

Keras in OpenCV. In 2020 IEEE International Students

Conference on Electrical, Electronics and Computer

Science (SCEECS).

Chen, S., Wei, Y., Xu, Z., Sun, P., & Wen, C., 2020. Design

and Implementation of Second-generation ID Card

Number Identification Model based on TensorFlow. In

IEEE International Conference on Information

Technology, Big Data and Artificial Intelligence

(ICIBA).

Caveness, E., C., P. S., Peng, Z., Polyzotis, N., Roy, S., &

Zinkevich, M., 2020. TensorFlow Data Validation:

Data Analysis and Validation in Continuous ML

Pipelines. In Proceedings of the 2020 ACM SIGMOD

International Conference on Management of Data.

Lu, Y., Zhang, L., & Xie, W., 2020. YOLO-compact: An

Efficient YOLO Network for Single Category Real-

time Object Detection. In 2020 Chinese Control and

Decision Conference (CCDC).

Ullah, M. B., 2020. CPU Based YOLO: A Real Time

Object Detection Algorithm. In 2020 IEEE Region 10

Symposium (TENSYMP).

Raj, A., Maji, K., & Shetty, S. D., 2021. Ethereum for

Internet of Things security. In Multimedia Tools and

Applications.

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A., 2016.

You Only Look Once: Unified, Real-Time Object

Detection. In 2016 IEEE Conference on Computer

Vision and Pattern Recognition (CVPR).

Natraj, L., & Shetty, S. D., 2019. A Translation System

That Converts English Text to American Sign

Language Enhanced with Deep Learning Modules. In

International Journal of Innovative Technology and

Exploring Engineering Regular Issue, 8(12), 5378-

5383.

Khawas, C., & Shah, P., 2018. Application of Firebase in

Android App Development-A Study. In International

Journal of Computer Applications, 179(46), 49-53.

Fatima, N. S., Steffy, D., Stella, D., & Devi, S. N., 2020.

Enhanced Performance of Android Application Using

RecyclerView. In Advances in Intelligent Systems and

Computing Advanced Computing and Intelligent

Engineering, 189-199.

Nair, L. R., Shetty, S. D., & Shetty, S. D., 2018. Applying

spark based machine learning model on streaming big

data for health status prediction. In Computers &

Electrical Engineering, 65, 393-399.

Duran-Lopez, L., Dominguez-Morales, J., Amaya-

Rodriguez, I., Luna-Perejon, F., Civit-Masot, J.,

Vicente-Diaz, S., & Linares-Barranco, A., 2019. Breast

Cancer Automatic Diagnosis System using Faster

Regional Convolutional Neural Networks. In

Proceedings of the 11th International Joint Conference

on Computational Intelligence.

Sung, M., Yu, S., & Girdhar, Y., 2017. Vision based real-

time fish detection using convolutional neural network.

In OCEANS 2017 - Aberdeen.

Hansen, D. K., Nasrollahi, K., Rasmusen, C. B., &

Moeslund, T. B. (2017). Real-Time Barcode Detection

and Classification using Deep Learning. In Proceedings

of the 9th International Joint Conference on

Computational Intelligence.

Goodfellow, I., Bengio, Y., & Courville, A., 2016. Deep

learning, Cambridge (EE. UU.). MIT Press.

Belciug, S., 2020. Artificial intelligence in cancer:

Diagnostic to tailored treatment, London, United

Kingdom. Academic Press.

Application Development for Mask Detection and Social Distancing Violation Detection using Convolutional Neural Networks

767