Pothole Detection under Diverse Conditions using Object Detection
Syed Ibrahim Hassan
, Dympna O’Sullivan
and Susan Mckeever
Department of Computer Science, Technological University Dublin, Ireland
Object Detection, Pavement Inspection, Deep Learning, Machine Learning.
One of the most important tasks in road maintenance is the detection of potholes. This process is usually done
through manual visual inspection, where certified engineers assess recorded images of pavements acquired
using cameras or professional road assessment vehicles. Machine learning techniques are now being applied
to this problem, with models trained to automatically identify road conditions. However, approaching this real-
world problem with machine learning techniques presents the classic problem of how to produce generalisable
models. Images and videos may be captured in different illumination conditions, with different camera types,
camera angles and resolutions. In this paper we present our approach to building a generalized learning model
for pothole detection. We apply four datasets that contain a range of image and environment conditions. Using
the Faster RCNN object detection model, we demonstrate the extent to which pothole detection models can
generalise across various conditions. Our work is a contribution to bringing automated road maintenance
techniques from the research lab into the real-world.
The assessment of road surface (termed road pave-
ment) condition is a crucial task to ensure their us-
ability and provide maximum safety for the public.
The costs involved in maintaining pavements are sig-
nificant both to road users (over 60% of Irish peo-
ple have had their chosen mode of transport damaged
as a result of striking a pothole according to recent
research) (ALDWORTH, 2018). The UK, councils
allocate 75% funds for the maintenance of the local
road condition and 25% for construction (Radopoulou
and Brilakis, 2015). The two most common surface
materials for road pavement are concrete and asphalt.
Concrete roads are highly durable when compared to
asphalt roads. Although concrete road surfaces last
longer, repairing them is more complex. Holes or
cracks cannot simply be patched–instead, entire slabs
must be replaced. Asphalt paving is cheaper com-
pared to concrete paving. It also creates a smoother
drive, and provides better safety due to better trac-
tion and skid resistance. Asphalt is ideal for rural
road pavements due to the ease of maintenance and
repair, patching is simpler and faster than replacing
entire slabs of roadways on less heavily trafficked ar-
eas such as country roads. But with only a 10-year
lifespan, asphalt must be re-laid or repaired on a much
more regular basis than concrete.
Pavement defects vary depending on the pavement
surface. Pavement defects include cracking caused by
failure of the surface layer. Surface deformation such
as rutting that results from weakness in one or more
layers of the pavement. Disintegration such as pot-
holes caused by progressive breaking up of pavement
into small loose pieces and surface defects, such as
ravelling caused by errors during construction such as
insufficient adhesion between the asphalt and aggre-
gate particulate materials.
In this paper we focus on the detection and local-
ization of potholes which are a common defect on
both asphalt and concrete pavement. Potholes are
a common cause of accidents and therefore require
frequent inspection and timely repair. Pavement in-
spection usually consists of three main steps: 1) data
collection, 2) defect identification and 3) defect as-
sessment. The first step is largely automatic, carried
out by specially adapted vehicles for surface survey-
ing. However, the other two steps are largely manual.
Images of road pavements are visually inspected by
structural engineers or certified inspectors who assess
Hassan, S., O’Sullivan, D. and Mckeever, S.
Pothole Detection under Diverse Conditions using Object Detection Models.
DOI: 10.5220/0010463701280136
In Proceedings of the International Conference on Image Processing and Vision Engineering (IMPROVE 2021), pages 128-136
ISBN: 978-989-758-511-1
2021 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
the road condition against the Pavement Condition In-
dex (PCI). PCI is widely used in transportation civil
engineering across the world and many authorities use
it to measure the performance of their road infrastruc-
ture. It provides a numerical index between 0 and 100
which is used to specify general condition of pave-
An automated defect detection and localization
system could be a valuable tool for improving the per-
formance and accuracy of the pavement inspection
and assessment process as well as reduce the man-
ual overhead of the current process. Such a system
could be used to evaluate images or videos to assess
pavement condition data. Additionally, an automated
pothole detection system could be integrated into ex-
isting road inspection tools to support the inspection
process by detecting road potholes from images ac-
quired during the pavement inspection process.
This paper proposes a pothole detection method to
detect and localize potholes on road surface. We
describe the development of a Faster RCNN (Ren
et al., 2016) model for pothole detection which is
trained on a public potholes dataset (Kaggle, 2019)
and tested for its generalizability on a number of other
pavement image dataset(s). The contribution of our
work is threefold: (1) We present an object detec-
tion model that can achieve accuracies of between
70% and 90%, for the task of detecting potholes in
images; (2) We measure the impact of various real-
world conditions on the accuracy of pothole detec-
tion models,and;(3) We publish three re-labelled pub-
lic dataset(s) and contribute a labelled pothole image
dataset to the field. The rest of this paper is orga-
nized as follows. In the next section related work on
image processing and machine learning for pothole
detection has been reviewed. In section 3 we describe
experimental work to develop a trained model for pot-
hole detection and the results of testing of this model
on several dataset(s). In Section 4 we present our re-
sults and conclude with a discussion in Section 5.
With recent advances in deep learning, computer vi-
sion and image processing, research work has been
carried out on pothole detection (Dhiman and Klette,
2019). Pothole detection methods are divided into
three broad areas: vision based (Sawalakhe and
Prakash, 2018) (Koch and Brilakis, 2011) (Ryu et al.,
2015), vibration based (Yu and Yu, 2006) and 3D re-
construction methods (Hou et al., 2007) (Cao et al.,
2020). Vision based methods rely on image or
video data. This approach divides into two main ap-
proaches: image processing and machine learning.
For machine learning, traditional algorithms that have
been applied to this task have relied heavily on hand
crafted features (Daniel and Preeja, 2014) (Hoang,
2018). To overcome these challenges, deep learn-
ing models, with their capability to extract visual fea-
tures automatically from images rather than relying
on hand crafted features, have become more popu-
lar. For example, object detection models are trained
which can perform pothole detection by drawing a
bounding box around the potholes. (Bhatia et al.,
2019) investigate the feasibility and accuracy of ther-
mal imaging in pothole detection. The proposed ap-
proach consists of 3 main steps: (1) data acquisition
under various lightning condition (2) data augmen-
tation, to increase the size of dataset (3) training a
convolutional neural network (CNN) model. They
compare the result of a self-built CNN model with
pre-trained CNN models, with the pre-trained model
achieving an overall detection accuracy of 97.8%.
The main objective of this work is to find an efficient
CNN model for pothole detection using thermal imag-
ing. However, the drawback of their system is that
it can only distinguish potholes and non-potholes on
thermal images which are rarely used in road inspec-
(Ping et al., 2020) developed a pothole detection
system by training on a pre-processed pothole dataset
with four different models and compared the accu-
racy across each model. The method achieved 82%
accuracy on YoloV3 (You Look Only Once: Version
3) (Farhadi and Redmon, 2018). However, the ap-
proach has not been tested on real world examples
and their trained YoloV3 model does not accurately
detect pothole on new images. The authors com-
pare results of four different object detection models
and found that SSD (Single Shot Multi-box Detector)
gives higher accuracy but lower speed in comparison
to the YoloV3 model. YoloV3 provides higher speed
but lower detection accuracy and fails to detect small
potholes. (Gupta et al., 2020) propose a pothole de-
tection and localization system which can detect pot-
holes by drawing a bounding box around potholes.
The research utilizes thermal images and modified
ResNet50 model and achieves 91.15% average preci-
sion in detecting potholes. However, their method can
only work with thermal images which are rarely used
by pavement inspection companies. In addition, the
method can only detect potholes in images where the
camera distance is relatively close to the pothole. In
our work, we examine the problem of generalizability
of trained models in the domain of road inspection.
From the road inspection prospective, it is important
to use models that consider the variety of conditions
Pothole Detection under Diverse Conditions using Object Detection Models
such as different road surfaces, pothole sizes, distance
of pothole from imaging device. (Dharneeshkar et al.,
2020) propose a pothole detection system on their
own collected pothole dataset. The authors trained
different versions of YOLO object detection model
on 1500 images collected using a smartphone cam-
era with a resolution of 1024 x 768. The proposed
method can detect and localize potholes at different
angles and on road surface. Similarly, (Ukhwah et al.,
2019) train different versions of YOLO object detec-
tion model on images from a highway survey vehi-
cle in Indonesia. To train the YOLO model, the au-
thors use 448 grey scale potholes images. The method
demonstrates that it can detect potholes from images
that are acquired by using highway survey vehicles.
The overall average precision achieved by the vari-
ous models was 83.43%, 79.33% and 88.93% respec-
From a commercial perspective, many vehicles
are now being adapted to include automatic pothole
detection system into their autonomous driving mod-
ules. For example, Jaguar Land Rover are researching
automatic road pothole detection in which their vehi-
cles not only detect potholes but can also identify the
location as well as the severity of the pothole (Jaguar,
2015). Their proposed system can also send warning
messages to the driver. FORD are also developing a
pothole detection system where the system can warn
drivers about the location of potholes (FORD, 2018).
Another commercial application of automatic pothole
detection was proposed by (Bansal et al., 2020). Their
approach is based on vibration-based pothole detec-
tion which combines GPS, Internet of Things (IoT)
sensors including accelerometers and gyroscope and
machine learning. The IoT dataset was trained us-
ing a Support Vector Machine (SVM), Logistic Re-
gression, Na
ıve Bayes, Random Forest and K-Nearest
Neighbours (KNN). Random forest achieved highest
accuracy in detecting potholes 86.8%. The objectives
of the systems are to reduce injuries and deaths, alert
drivers about potholes before driving over them, share
pothole location data with government and civil au-
thorities to repair pothole in timely manner and to
build a real time map which updates according to lat-
est road conditions.
The task of pothole detection has many differ-
ent variations the variety of potholes, lighting lev-
els, distance of the pothole from camera, shot angle,
imaging device and weather conditions. Therefore, a
useful road maintenance prediction model should be
able to generalise. In our approach we used a pub-
licly available Kaggle dataset (Kaggle, 2019) to train
a prediction model for pothole detection which was
subsequently tested on a number of other dataset(s)
that represent a variety of real-world variations. Our
experimental set up and results are presented in the
next sections.
This paper proposes a method for automatically de-
tecting the presence and location of potholes in an im-
age, whilst also considering the variety of real world
conditions that can occur during the automatic pave-
ment assessment process. Using supervised machine
learning, we train an object detection model using an
image dataset containing potholes. We control the
number of each variation in training dataset and then
do controlled testing of these conditions using test
sets because we wish to check how well a model can
generalize to other datasets so it is necessary to con-
trol the variation of training and testing samples.
This section describes the details of preparing
training and testing data, object detection model ar-
chitecture and training and testing steps. The training
dataset used in this research was the Kaggle pothole
dataset, a publicly available dataset with one positive
class (pothole present). The dataset contains 618 im-
ages. Due to the duplication of images in the dataset,
280 unique images were selected for training. Each
image was labelled by drawing a bounding around the
pothole using the LabelImg tool (LabelImg, 2015).
The model was trained under controlled settings and
tested using 4 other datasets described in detail be-
low. Table 1 shows details for the training dataset, in-
cluding four characteristics of training images which
could challenge model generalizability: acquisition
device, distance, distance from camera and lighting
level. The values or categories for each is provided
(e.g. close distance, medium distance and far distance
which means pothole is either close from camera an-
gle or it is too far from camera angle or it is near
(medium) from camera angle). Figure 1 shows the
example of each camera angle.
Table 1: Breakdown of the Kaggle Pothole Dataset used for
Index Name Description
Training Data Kaggle Pothole
Acquisition Device Smartphone/Digital Camera
Image Size 500 x 500
Close = 1 meter
Medium = 23 meters
Far = 510 meters
Lighting Level Normal: 228, Low: 52
Total Images 280
IMPROVE 2021 - International Conference on Image Processing and Vision Engineering
Figure 1: Example of different camera angles: Top left:
Close, Top right: Medium, Bottom: Far.
3.1 Data Preparation
Training images for our model require the location of
the pothole(s) in each image to be annotated. This
was manually applied to the 280 training images by
creating a bounding box around the area of interest i.e.
a pothole. Below are the steps that were performed for
the training data preparation:
Step 1: Create a dataset using labelImg, gener-
ating XML files that contain information about object
coordinates, image name, image width and height and
object name.
Step 2: Convert these XML files into CSV file
which store each XML file details.
Step 3: Convert CSV file to Tensorflow’s own
binary storage format tf-record i.e. train-record and
test-record. TensorFlow’s object detection API re-
quires the data to be in the ‘tfrecord’ format. The
tfrecord format enables splitting, creating batches,
shuffling data and providing a uniform format across
network architectures and systems.
Figure 2: Implementation pipeline for pothole detection us-
ing Tensorflow object detection.
3.2 Testing Datasets
The focus of this research is to determine how an ob-
ject detection model trained on one dataset (with a
known range of variations) can perform against a va-
riety of conditions via a number of testing dataset(s).
First, we test if the image can distinguish between
positive (contains pothole) and negative (does not
contain a pothole) images. Then we test for a number
of conditions ranging from different image sizes, dif-
ferent image types (stereo images) and different light-
ing conditions.
The details of the 4 test dataset(s) used for test-
ing are described below. Three dataset (s) (Negative
Images (Saxena, 2019), Cranfield (Alzoubi, 2018),
Pothole-600 (Fan et al., 2020)) are publicly available.
The fourth dataset is our own data, collected by ac-
quiring the pothole images from different streets of
Dublin. In each test dataset bounding box labels are
not included, therefore we manually labelled each im-
age by drawing a bounding box around the region of
interest using the LabelImg tool. Table 2 shows the
details of 3 testing dataset(s).
A) Negative Images: This dataset (Saxena, 2019)
provides a number of negative images. Due to the
presence of other objects in images such as vehi-
cles, trees or people, the pavement surface has been
cropped from each image.
B) Cranfield Pothole Dataset (Alzoubi, 2018):
This dataset provides images of potholes on asphalt
pavement. The reason for choosing this dataset for
testing is its small image size i.e. 300 x 300 pixels,
as the model was trained on images of size 500 x 500
C) Pothole-600 Dataset (Fan et al., 2020): This
dataset provides pothole images which were acquired
using a stereo camera. The reason for choosing this
dataset for testing is that the camera source and im-
age type are different from the training set, as all im-
ages in the training set are acquired using smartphone
camera or digital camera.
D) Dublin Road Dataset: This dataset was ac-
quired from pavements in Dublin using a smartphone
camera under daylight conditions. In total we col-
lected 40 images with a resolution of 3648 x 2736.
The reason for collecting this dataset is to test images
with normal and low lighting condition. All images in
this dataset were collected during daylight hours, as
acquiring road images is not an easy or practical task
in darker environments. Therefore, an artificial low
lighting effect have been applied on the same testing
Table 2: Testing dataset details.
Test Dataset Image Size Total Images
Cranfield (Alzoubi, 2018) 300 x 300 50
Cranfield 400 x 400 50
Pothole-600 (Fan et al., 2020) 400 x 400 50
Dublin Roads 3648 x 2736 40
Pothole Detection under Diverse Conditions using Object Detection Models
Figure 3: Example images of each testing dataset: Top left:
Pothole-600, Top right: Cranfield Pothole Dataset, Bottom:
Dublin Road Dataset.
3.3 Pothole Detection with Faster
To address the object detection problem for pothole
detection we have trained a state-of-the-art object de-
tection model Faster RCNN (Ren et al., 2016). Faster
R-CNN has two stages for detection. In the first stage,
images are processed using a feature extractor (e.g.
VGG, MobileNet) called the Region Proposal Net-
work (RPN) and simultaneously, intermediate level
layers (e.g.,” conv5”) are used to predict class bound-
ing box proposals. In the second stage, these box pro-
posals are used to crop features from the same inter-
mediate feature map, which are subsequently input to
the remainder of the feature extractor in order to pre-
dict a class label and its bounding box modification
for each proposal. We also use Inception-V2 architec-
ture as a backbone of Faster RCNN model. Inception
architecture has yielded better results than a conven-
tional CNN architecture. Additionally, Faster R-CNN
model combined with Inception CNN architecture
shows an improvement in detection accuracy. The
choice of this network was motivated by the fact that
it achieves good results on different dataset(s) such as
Microsoft Common Object Context (MS COCO) (Lin
et al., 2014) and PASCAL VOC (Everingham et al.,
2010). Furthermore, it offers a structure that can be
modified according to specific task needs. Table 1
shows the details of the Kaggle dataset which is used
to train the object detection model. After training the
model, we tested the trained model using 4 dataset(s)
to investigate various parameters negative images,
image sizes, images acquired from multiple sources
and lighting levels. In our experiments model training
and testing is done using Python, and the Tensorflow
object detection API. In this work we did not use data
augmentation. For training, a NVIDIA GeForce RTX
2070 GPU was used. All experiments are performed
under Windows 10 on Intel Core i7-9750 with 16GB
of DDR4 RAM. The model was trained with Adam
optimizer and L2 regularization, using an initial learn-
ing rate to 0.00001. The modelling process was done
in approximately 2 hours with 20k iterations. During
the training process we observed that after the 20k
iteration, the training loss did not decrease substan-
tially, so we stopped the training and saved the model
parameters for the testing purpose.
3.4 Evaluation Protocol
The performance of the developed model was evalu-
ated on 4 datasets as described in Section 3.2. For
each testing set, results generated from the Faster
RCNN model were compared with the actual ground
truth. Several researchers have proposed differ-
ent evaluation methods for the object detection task
(Padilla et al., 2020) (Zhao et al., 2019). In this pa-
per, Intersection over Union (IoU) (also known as the
Jaccard index), precision and recall are used to eval-
uate trained models. IoU measures the overlap be-
tween the actual ground truth bounding box and the
predicted bounding box. We defined an IoU threshold
of 0.5 which means if the overlap between an actual
and predicted bounding box is <0.50% the model will
consider it as false positive whereas, if the overlap be-
tween actual and predicted bounding box is >0.50%
the model will consider it as true positive. In this way
precision and recall are calculated at 0.5 IoU thresh-
olds. Increasing IoU threshold results in higher pre-
cision but lower recall. Conversely, decreasing IoU
threshold gives higher recall. For example, if we
IoU threshold set to 0.9 then we get higher preci-
sion which means model can detect potholes if the
overlap between actual and predicted bounding box
is >=0.9% the model will consider it as true posi-
tive. Figure 4 shows an example of an actual and pre-
dicted bounding box used for IoU calculation. A high
IoU threshold is not required as the exact placement
of the pothole relative to predicted area just needs to
be enough to say that a pothole exists in the area.
The task does not require precise pothole perimeter
discovery. Figure 5 shows precision and recall re-
sults from experiment 4 at IoU thresholds from 0.5
- 0.9. While precision does not vary greatly at differ-
ent IoU thresholds, relaxing the IoU threshold results
in higher recall values.
IMPROVE 2021 - International Conference on Image Processing and Vision Engineering
Figure 4: Example of Intersection over Union (IoU).
In this section we describe a series of experiments
conducted on a variety of dataset(s). We implemented
four experiments - within each, we focus on specific
variable conditions for are encountered on the pothole
detection task: distinguishing positive and negative
images, variety of image size, images captured with
different devices and variations in lighting level.
4.1 Experiment 1: (Positive and
Negative Images)
This experiment shows the performance of the trained
model on negative images i.e. no pothole on the pave-
ment surface. 50 images were passed to the trained
model and 45 of those images were predicted cor-
rectly. In order to check the reason of wrong detec-
tion we check all images manually and found that the
errors were due to the appearance of shadows on the
road surface where the model considered these dark
patches as potholes.
Table 3: Results of pothole detection on negative images.
Non-pothole Pothole Accuracy
45 5 90%
4.2 Experiment 2: (Smaller Images)
In this experiment we used the Cranfield dataset and
tested with 50 images of size 300 x 300 pixels and 50
images of 400 x 400 pixels. As our model is trained
on images of size 500 x 500, in this experiment we
are investigating how well our trained model can gen-
eralize to detect potholes on smaller images. Initially
we have tested images of size 500 x 500 in order to
check the model performance on the same image size
as those of the training set. We then resize images to
test smaller size images. Table 4 shows that model
achieve almost same results on both smaller size im-
ages, with just a slight improvement on the (trained)
500 x 500 image size.
Table 4: Results of pothole detection on small image.
Model Backbone Image Size Precision Recall
Faster RCNN Inception V2
500 x 500 79% 94%
400 x 400 80% 92%
300 x 300 79% 92%
4.3 Experiment 3: (Images Captured by
Stereo Camera)
The purpose of this experiment is to test the model on
a different image type: stereo images. However the
images in this dataset were varied in terms of the dis-
tance of the pothole from the camera so we also had to
check for any effect caused by this feature. Therefore,
we conducted 2 experiments. In the first experiment
we randomly selected 50 images from the dataset and
achieved the results in Table 5.
Table 5: Results of pothole detection on stereo images.
Model Backbone Image Size Precision Recall
Faster RCNN Inception V2 400 x 400 77% 55%
Analysing the results in Table 5, the detection per-
formance of the model is low. To understand this fur-
ther, we conducted a second experiment where we di-
vided the 50 images into two test sets based on dis-
tance - 25 images with a close distance between the
camera and pothole and 25 images with a medium dis-
tance between the camera and pothole. We compared
the results to a test set with 50 non-stereo images di-
vided into two testing sets of 25 images with close
and medium distance to the camera.
Table 6: Results of pothole detection on stereo and non-
stereo images with close and medium distance.
Model Backbone
Precision Recall
Medium 25 95% 84%
Close 25 65% 52%
Medium 25 83% 100%
Close 25 63% 48%
Results in table 6 shows that the image type or
source does not affect detection accuracy, rather the
distance from the pothole to the camera affects the
detection performance. In both experiments it can be
seen that images with medium distance have high de-
tection accuracy compared to those at close distance.
The main reason for this is that our training set con-
tains only a small sample of images where the dis-
tance from the camera to the pothole is close.
Pothole Detection under Diverse Conditions using Object Detection Models
4.4 Experiment 4: (Images with
Different Lighting Conditions)
This experiment uses images that were collected
across Dublin city centre in daylight. A total of 40 im-
ages were collected with normal lighting levels. In or-
der to test different lighting conditions, we first tested
the original 40 images collected in daylight and then
applied an artificial low lighting effect to the same 40
images and retested on those. Table 7 shows the re-
sults of experiment 4. Analyzing the results in table
7 the detection performance of the model on normal
images is slightly lower than on low lighting level im-
ages. The reason for low accuracy on normal lighting
images is because the model is unable to detect pot-
holes on two images in the normal lighting dataset.
However, in the low lighting dataset, the model cor-
rectly detected pothole on those two images.
Table 7: Results of pothole detection on Dublin roads
Model Backbone Testing Dataset Precision Recall
Faster RCNN Inception V2
(Normal Light)
78% 68%
(Low Light)
78% 73%
Figure 5: Comparison of Precision and Recall at different
IoU threshold values.
The process of pothole detection for road mainte-
nance is still largely done though manual visual in-
spection of images or videos acquired using cameras
or professional road assessment vehicles. This is a
time consuming and expensive task. The task is also
complicated by variations in images such as different
image types and sizes, camera types, lighting levels
and distance of the pothole from the camera.
From experiment 1 we found that shadows and
manhole covers on the pavement surface may be iden-
tified as false positives (i.e. the model falsely identi-
fies then as potholes) and as such this adversely af-
fects the model’s performance. From the second ex-
periment, we conclude that image size is not a major
factor which could affect the detection performance
as results are similar across all sizes. From the third
experiment we conclude that the camera source is not
a major factor affecting model performance. How-
ever, the distance of a pothole from the camera device
has an impact on the detection rate. From experiment
3 we conclude that the model has difficultly in identi-
fying potholes that are close to the camera. From the
4th experiment we notice that lighting effect is not
having a major impact on the detection rate as long as
the distance of the pothole from camera device is not
We believe that our results could be improved by
more labelled training data and more balanced train-
ing samples, particularly with respect to images taken
at different distances from the image capture device.
In this paper we investigated some common issues
which are likely affect the generalizability of any
model for automated pavement assessment. We deter-
mine the variations of images in terms of image size,
distance of pothole from camera angle and lighting
We trained an object detection model using the
Kaggle pothole dataset. We used Faster RCNN with
Inception V2 as a backbone model for object detec-
tion. To check model generalizability we used a va-
riety of conditions including small image sizes, dif-
ferent image types and lighting effects. We have at-
tempted to identify factors that may impact a gener-
alizable model for pothole detection. These factors
include image size, camera source, lighting levels and
distract from the camera. To investigate these factors,
we conducted four experiments. In each experiment
we explore one condition.
We conclude that distance of a pothole from a
camera device plays an important role when aiming
to create a generalizable model for pothole detection.
A further problem for the pothole detection task is the
presence of other objects in the images (e.g. manhole
covers) that may be falsely identified as potholes.
Our work is currently limited to detecting a sin-
gle pavement defect- potholes. This can be extended
to detect multiple pavement defects such as cracks,
patches, and ruts. The trained model could be de-
ployed in a number of ways - for offline detection on
batches of images or on a smartphone or other hard-
ware such as Nvidia Jetson Nano for real time pothole
detection. Regardless of the mode of implementation,
an automatic pothole detection method may help to
speed up and lower the cost of the pavement inspec-
IMPROVE 2021 - International Conference on Image Processing and Vision Engineering
tion process which currently relies heavily on manual
human expertise.
In future work we will train with larger dataset(s).
In particular, we will train with more samples of im-
ages containing potholes that are close and far from
the camera. We will also use images that includes pot-
holes as well other objects including shadows, man-
hole as well as other common objects in pavement
imagery. In this work we have focused on open-
source images, but there are specified images that
are collected through commercial road inspection ve-
hicles and provide more consistent images of pave-
ments which can help to build more robust model spe-
cially for the task of automatic road inspection. Re-
cently other sources such as drone shots and vehicle
windscreen cameras are being used to collect pave-
ment data. Such images often contain a multitude
of objects such as vehicles, trees, traffic signs and
or/people. Such data would require pre-processing to
extract these objects before training for the pothole
detection task. Training object detection models with
larger dataset(s) will require very high computational
power and need more training time. We will also ex-
periment with tuning the hyper parameters and train-
ing the model with other feature extraction networks.
This work was funded by Science Foundation Ireland
through the SFI Centre for Research Training in Ma-
chine Learning (18/CRT/6183).
ALDWORTH, B. (2018). Road users bearing the cost
of damaged surfaces. https://www.theaa.ie/blog/
Alzoubi, A. (2018). Potdataset. https://figshare.com/
Bansal, K., Mittal, K., Ahuja, G., Singh, A., and Gill, S. S.
(2020). Deepbus: Machine learning based real time
pothole detection system for smart transportation us-
ing iot. Internet Technology Letters, 3(3):e156.
Bhatia, Y., Rai, R., Gupta, V., Aggarwal, N., Akula, A.,
et al. (2019). Convolutional neural networks based
potholes detection using thermal imaging. Journal of
King Saud University-Computer and Information Sci-
Cao, W., Liu, Q., and He, Z. (2020). Review of pavement
defect detection methods. IEEE Access, 8:14531–
Daniel, A. and Preeja, V. (2014). Automatic road distress
detection and analysis. International Journal of Com-
puter Applications, 101(10).
Dharneeshkar, J., Aniruthan, S., Karthika, R.,
Parameswaran, L., et al. (2020). Deep learning
based detection of potholes in indian roads using
yolo. In 2020 International Conference on Inventive
Computation Technologies (ICICT), pages 381–385.
Dhiman, A. and Klette, R. (2019). Pothole detection using
computer vision and learning. IEEE Transactions on
Intelligent Transportation Systems, 21(8):3536–3550.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J.,
and Zisserman, A. (2010). The pascal visual object
classes (voc) challenge. International journal of com-
puter vision, 88(2):303–338.
Fan, R., Wang, H., Bocus, M. J., and Liu, M. (2020). We
learn better road pothole detection: from attention ag-
gregation to adversarial domain adaptation. In Euro-
pean Conference on Computer Vision, pages 285–300.
Farhadi, A. and Redmon, J. (2018). Yolov3: An incremental
improvement. Computer Vision and Pattern Recogni-
tion, cite as.
FORD (2018). Ford pothole detection system.
Gupta, S., Sharma, P., Sharma, D., Gupta, V., and Sambyal,
N. (2020). Detection and localization of potholes in
thermal images using deep neural networks. Multime-
dia Tools and Applications, 79(35):26265–26284.
Hoang, N.-D. (2018). An artificial intelligence method
for asphalt pavement pothole detection using least
squares support vector machine and neural network
with steerable filter-based feature extraction. Ad-
vances in Civil Engineering, 2018.
Hou, Z., Wang, K. C., and Gong, W. (2007). Experimen-
tation of 3d pavement imaging through stereovision.
In International Conference on Transportation Engi-
neering 2007, pages 376–381.
Jaguar (2015). Pothole detection technology by jaguar land
rover. https://www.landrover.com/experiences/news/
Kaggle (2019). Web scrapped road images for pothole
detection. https://www.kaggle.com/sachinpatel21/
Koch, C. and Brilakis, I. (2011). Pothole detection in as-
phalt pavement images. Advanced Engineering Infor-
matics, 25(3):507–515.
LabelImg (2015). Tzutalin.labelimg.gitcode. https://github.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P.,
Ramanan, D., Doll
ar, P., and Zitnick, C. L. (2014).
Microsoft coco: Common objects in context. In Euro-
pean conference on computer vision, pages 740–755.
Pothole Detection under Diverse Conditions using Object Detection Models
Padilla, R., Netto, S. L., and da Silva, E. A. (2020). A sur-
vey on performance metrics for object-detection algo-
rithms. In 2020 International Conference on Systems,
Signals and Image Processing (IWSSIP), pages 237–
242. IEEE.
Ping, P., Yang, X., and Gao, Z. (2020). A deep learning
approach for street pothole detection. In 2020 IEEE
Sixth International Conference on Big Data Comput-
ing Service and Applications (BigDataService), pages
198–204. IEEE.
Radopoulou, S. C. and Brilakis, I. (2015). Patch detection
for pavement assessment. Automation in Construc-
tion, 53:95–104.
Ren, S., He, K., Girshick, R., and Sun, J. (2016). Faster
r-cnn: towards real-time object detection with region
proposal networks. IEEE transactions on pattern
analysis and machine intelligence, 39(6):1137–1149.
Ryu, S.-K., Kim, T., and Kim, Y.-R. (2015). Image-based
pothole detection system for its service and road man-
agement system. Mathematical Problems in Engi-
neering, 2015.
Sawalakhe, H. and Prakash, R. (2018). Development of
roads pothole detection system using image process-
ing. In Intelligent Embedded Systems, pages 187–195.
Saxena, S. (2019). Pothole detection. https://github.com/
Ukhwah, E. N., Yuniarno, E. M., and Suprapto, Y. K.
(2019). Asphalt pavement pothole detection using
deep learning method based on yolo neural network.
In 2019 International Seminar on Intelligent Technol-
ogy and Its Applications (ISITIA), pages 35–40. IEEE.
Yu, B. X. and Yu, X. (2006). Vibration-based system for
pavement condition evaluation. In Applications of Ad-
vanced Technology in Transportation, pages 183–189.
Zhao, Z.-Q., Zheng, P., Xu, S.-t., and Wu, X. (2019). Ob-
ject detection with deep learning: A review. IEEE
transactions on neural networks and learning systems,
IMPROVE 2021 - International Conference on Image Processing and Vision Engineering