Visual Inspection of Storm-Water Pipe Systems using Deep

Convolutional Neural Networks

Ruwan Tennakoon, Reza Hoseinnezhad, Huu Tran and Alireza Bab-Hadias har

School of Engineering, RMIT University, Melbourne, Australia

Keywords:

Storm-Water Pipe Inspection, Automated Infrastructure Inspection, Deep Convolutional Neural Networks.

Abstract:

Condition monitoring of storm-water pipe systems are carried-out regularly using semi-automated proces-

sors. Semi-automated inspection is time consuming, expensive and produces varying and relatively unreliable

results due to operators fatigue and novicity. This paper propose an innovative method to automate the storm-

water pipe i nspection and condition assessment process which employs a computer vision algorithm based on

deep-neural network architecture to classif y the defect types automatically. With the proposed method, the

operator only needs to guide the robot through each pipe and no longer needs to be an expert. The results

obtained on a CCTV video dataset of storm-water pipes shows that the deep neural network architectures

trained with data augmentation and transfer learning is capable of achieving high accuracies in identifying the

defect types.

1 INTRODUCTION

Condition monitoring of storm-water pipe systems

are often carried-out to provide an und erstanding of

the current status of the storm-water sy stem, which

enables the prediction of future deterioration of the

pipes and facilitate investmen t planning. These infor-

mation can also be used in allocating maintenance and

repair reso urces efﬁciently.

An on-site inspection with closed-circuit televi-

sion (CCTV) is currently the mo st common and com -

mercially available me thod for condition assessment

of storm-water pipes. The typical inspection process

can be described as follows. A certiﬁed technic ia n

guides a CCTV camera mounted on a robot that tra-

vels inside a pipe segment. Th e technician must v i-

sually detect the defects in the pipe segment by ob-

serving the video feed. Once a defect is detected the

technician manually rotates and zoom the camera to

gain a better understan ding of th e defect and adds the

informa tion relating to th at defect (i.e. defect type,

defect parameters) to the video together with add iti-

onal information such as pipe dia meter, location, in-

spection date. The recorded video is then used for

further analysis including discrete condition r ating,

deterioration modelling and planning (Tran et al.,

2010).

The above described CCTV inspection is conside-

red semi-automated an d is time consuming, expensive

and produces varying a nd relative ly unreliable results

in som e cases due to operators fatigu e and novicity. In

addition, training a professional technician to be able

to classify all the defect types, estimate defect para-

meters and conduct inspection is costly. Due to the

above limitations of the manual inspection process,

only around ten percent of the storm-water pipe sy-

stem in Melbourn e, Australia can be insp ected given

limited budget. Incr easing the portion of the inspected

pipes would increase the re liability of the network as

well as improve the resour c e allocation and planning

processes.

In this paper, we propose an innovative method

to automate the defect de te c tion and condition asses-

sment within the pipe inspection process. With the

proposed method, the operator only needs to guide

the robot through each pipe and no longer needs to be

an expert in piping. A computer vision algorithm ba-

sed on deep-neura l network architecture is de sig ned

to classify the defect types automatically. The block-

diagram of the overall process is shown in Figure 1.

In the proposed system, the techn ic ia n still needs to

drive the robot through the pipe and record a clear

video of all the internal conditions of the pipe. The

video is then fed to the model and the model will go

through the video frame b y fr ame to detects the un-

derlying defects in each frame. After successfully de-

tecting a defect, the system extracts those frames with

defects and classify the defect type and extract de-

Tennakoon, R., Hoseinnezhad, R., Tran, H. and Bab-Hadiashar, A.

Visual Inspection of Storm-Water Pipe Systems using Deep Convolutional Neural Networks.

DOI: 10.5220/0006851001350140

In Proceedings of the 15th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2018) - Volume 1, pages 135-140

ISBN: 978-989-758-321-6

135

Record

CCTV video

Deep-learning based

Defect type detection

Extract detect

frames

Extract detect

Parameters

Analysis

Report

Figure 1: The overall block-diagram of the automated storm-water pipe inspection process.

fect parameters required for condition assessment and

further analysis. By applyin g a utomated visual in-

spection, the reliab ility of the inspection process can

be improved. In a ddition, this au tomatic system redu-

ces th e cost and time in comparison with the manual

visual inspection process.

The remainder of the pape r is organized as fol-

lows: Section 2 provides a review of existing pipe in-

spection methods and deep neura l networks. Section

3 provides a description of the overall method and

Section 4 show the re sults of of our experim ents. Fi-

nally Section 5 concludes the paper.

2 BACKGROUND

2.1 Automated Inspection of

Storm-water/Sewage Pipes

Numerous attempts have been made to automate the

pipe inspection process using computer vision and

machine learning techniq ues. Xu et al. (Xu et al.,

1998) proposed an automated method for pipe d efor-

mation analysis and crack detection that uses image

processing techniques such as edge detection and bi-

nary image thresholding c ombine with boundary seg-

ment analy sis.

Shehab and Moselhi (Shehab and Moselhi, 2005)

propose a machine learning based metho d for in-

ﬁltration d e te ction in pipes. They ﬁrst extracted

17 features from images of pipes using a sequence

of image processing operatio ns including: dilation,

backgr ound subtraction, thresholding, and segmenta-

tion. These features we re then used in a neu ral net-

work to predict th e presence of inﬁltration which was

trained using back propagation. Yang and Su (Yang

and Su, 2008) also proposed a mach ine learning ba-

sed automatic pipe inspection framework. They ex-

tracted texture based features from the image using

techniques including wavelet transform and compu-

tation of co-occurrence matrices. These features a re

used with, th ree machine learning approaches: back-

propagation neural network (BPN), radial basis net-

work (RBN), and support vector machine (SVM) to

classify pipe defect patterns to following categories:

broken pipe, crack , fracture, and open joint. By ana-

lysing the above mentioned classiﬁers they concluded

that SVM and RBN are better than BPN.

Yang and Su (Yang and Su, 2009) proposed a pip e

defect detection method that utilized both supervised

and un-supervised techniques. In their method images

from CCTV camera were ﬁrst co nverted to a set of fe-

atures using morphology based segmentation techni-

que. The mo st important features w ere then identi-

ﬁed using principle componen t analysis and used in

a Radial basis network (RBN) to cla ssify them into

one of the following defect types: broken pipe, crack,

fracture, and open joint. Su and Yang (Su and Yang,

2014) also proposed a mo rphological segmentation

based method f or detecting def ects in CCTV video

of sewer pipelines. This method was only designed to

identify cracks and open joints in pipelines.

Halfawy and Hengmeechai (Halfawy and Heng-

meechai, 2014) proposed a method that ﬁrst extract

image region of interest using image segmentation

techniques. Next histogram of gradient feature s were

extracted from those regio ns and used in a SVM clas-

siﬁer to predict weather the regio n is defective or not.

None of the above mentioned methods are relia-

ble enough to completely replace the curre nt manual

inspection due to the limitation of d a ta size, data col-

lection techniques, image proce ssing and pattern clas-

siﬁcation approaches (Guo et al., 2009). Also, most

of them only cover few of the defect types.

2.2 Deep Convolutional Neural

Networks

Since winning th e I mageNet competition in 2012

(Russakovsky et al., 2015), deep-learning method has

gained signiﬁcant attentio n in computer vision com-

munity with many applications in image classiﬁcation

and segmentation.

Deep convolutional neural networks (CNN) used

in image classiﬁcation compr ises of multiple layers of

convolutio n operations coupled with non-lin e ar ope-

rations. The output of the convolutional stack is fed

through a classiﬁcation neural network that output th e

probability of the input image belonging to each of the

prediﬁned categories (Krizhevsky et al., 2012). The

parameters of the overall network is learned end-to-

end using back propagation algorithm on labelled trai-

ning data. Many CNN arc hitectures has been propo-

sed so far for image classiﬁcation tasks and, the state-

ICINCO 2018 - 15th International Conference on Informatics in Control, Automation and Robotics

136

of-the art method include (Simonyan and Zisserman ,

2014) , (He et al., 2016) and (Szegedy et al., 2015 ).

Unlike traditional mach ine learning that require

the features to be h a nd-craf ted, CNNs learn the re-

levant features fr om data. However, end-to-end trai-

ning of a deep neura l ne twork requires large amount

of labelled data which might not be available for many

applications. Several approachers have been propo -

sed to solve this problem including: Unsupervised

pre-train ing of feature layers, data-augmenta tion and

transfer learning (use off-the shelf pre-trained models

and ﬁne-tune the ﬁnal classiﬁcation layer).

3 PROPOSED METHOD

In this section we describe our proposed method for

storm water pipe inspection. T he focus of the paper is

the n ovel defect type detection module which is desig-

ned to detect ﬁve main types of defects found in stor m

water pipes i.e. 1) Breakin g - complete separation of

a pipe segment due to a radial crack 2) Cracks - either

radial or longitudinal 3) Deposition - sediment build-

up on the ﬂoor of the pipe 4) Root intrusion - intrusion

of tree roots through a gap in the pipes at a cra ck or at

the place where two pipes segments join and 5) Ho-

les. Examples of each defect type is shown in Figur e

2. Unlike the existing methods tha t use hand-crafte d

features c ombined with a learned classiﬁer, in this pa-

per we intend to use a deep neural n etwork that learns

end-to-end using data alone.

3.1 CNN Architecture and Cost

Functions

Given a set of labeled video frames, X = [x

, y

]

i=1

where x

is a vid e o fram e, y

is th e co rresponding

class and N is the total numb er of training in stances,

the intention here is to learn a p a rametrised function

f (x

;θ) that maps an unseen image to a correspon-

ding class. In this paper we test two network archi-

tectures to model this fun ction. The ﬁrst network was

a shallow network with only six layers, which inclu-

des three convolution layers (Conv), a Global average

pooling layer (GAP) and two fully connected layers

(FC). The above model has on ly a few parameters

(670,981 trainiable parameters) compared to typical

deep network s, and the architecture is shown in Ta-

bel 1. The next model is based o n the well known

ResNet-50 architecture (He et al., 2016). This net-

work consists of 50 residual blocks and it is selected

as it is a deep architecture that pr ovides an appropriate

balance between comp lexity an d accuracy for image

Table 1: Architecture of the shallow network. Relu stands

for Rectiﬁed linear units.

Layer Type Activation Shape Filters

1 Conv Relu 11x11 128

2 Conv Relu 5x5 256

3 Conv Relu 3x3 512

4 GAP

5 FC Relu 128

6 FC Softmax 5

classiﬁcation task. However this architecture has m il-

lions of parameters ( 23,597,957 trainable parameters)

and training of the network needs large amount of

data.

Tra ining of the mod e s require a loss function that

quantiﬁes the errors made by comparing the model

output with the supervision signal (ground truth labels

for each image). Here we used a categorical cross-

entropy as the loss function. The categorical cross-

entropy loss function can be written as:

L =

∑

i=1

∑

j=1

i j

ln ˆy

i j

(1)

where y

i j

is the ground-truth indicating whether

image i belongs to category j, ˆy

i j

is the predicted pro-

bability of image i belong ing to category j, N is the

number of instances in the dataset and C is the number

of detection classes which is 5 in ou r application.

3.2 Transfer Learning

The way the model parameters are initialized would

have a signiﬁcant effect on the ﬁnal result. One way

to initialize the par ameters is to set them to randomly

chosen values. This method of mode l training is cal-

led training from scratch and this does not involve any

prior information. As a neural network contained m il-

lions of parameters training from scratch effectively

requires a large dataset. It is not economically feasi-

ble to generate such a large dataset in ou r application.

Another well known method to train networks

with limited data is to start from a set of parameters

that is tr ained on a different domain and ﬁne tune the

parameter with the limited dataset collected for the

task of interest. As we have only a limited dataset we

adopted this approach and used the para meters of the

ResNet-50 model that was trained on natura l image

classiﬁcation task in ImageNet competition (He et al.,

2016). The ImageNet challenge involved classifying

natural images into 1 000 different classes. Becau se

our application involves only ﬁve classes and does

not map into any class that is in ImageNet competi-

tion, we removed the last classiﬁcation layer of the

network a nd added a new layer which was initialized

to random values.

Visual Inspection of Storm-Water Pipe Systems using Deep Convolutional Neural Networks

137

Figure 2: Examples of defect types.

3.3 Model Training

3.3.1 Dataset Preparation

We obtained 90 videos of storm water pipe inspecti-

ons and most of the video range from 15 to 25 mi-

nutes in duration recorded at 25 frames per second.

The v ideos were ﬁrst divided into training and valida-

tion splits of 80% and 20% respectively. Th e se videos

were then broken down into images. After converting

these videos to images, those images containing de-

fects were taken out and moved to diff erent folde rs

with respect to its type of defects. Each image has a

resolution of 720 X 576, wh ich matches the r e solution

of the v ideo. Since ima ges are generated from video

frame by fram e, it is inevitable to get many duplicated

photos, which would cause over-ﬁtting of the model.

A sof tware called “Dup lica te Photo Fixer” was used

to remove all the duplicated images at 83% simila-

rity. We ach ieved 13 classes of images at the very

beginning, which includes crac king, breaking, hole,

sparling, fracture, intrusion, roo t, steel reinforcement

explosion, deposition, water accumulation, and angu-

lar, longitudinal and radial joint displacement. But

due to image data insufﬁciency for so me of the defect

types, we only kept the breakin g, crack, d e position,

hole and root defect instances.

3.3.2 Class Balancing

The resulting dataset was not balanced as it had large

number of instances from some classes and few in-

stances of some other classes. Tra ining a model with

such class imbalance would result in the model lear-

ning to predict only the dominant classes in the data-

set. To overcome this issue, we balanced the classes

by oversampling the instances in less frequent classes.

3.3.3 Data Augmentation

Due to the limited number of ima ges and variation

of the classes, over-ﬁtting would be a major concern.

Consequently, it is a challenge to achieve high clas-

siﬁcation accuracy by the limited num ber of data we

have. To reduce this issue, we applied imag e s aug-

mentation on the dataset. New images were created

by randomly zooming, shearing and horizonta lly ﬂip-

ping the origina ls, so that a relatively larger dataset

exist to train the model and reduce the likelihood of

over- ﬁttin g.

Once the dataset was prepared, we trained the mo -

del using ADAM o ptimization. The network was trai-

ned on a Nvidia Titan X GPU with 12 GB of RAM for

150 epoch. The hyp er parameters fo r training are: Ba-

tch size: 32, learning rate 0.001, β

= 0.9, β

= 0.999.

ICINCO 2018 - 15th International Conference on Informatics in Control, Automation and Robotics

138

(a) S-net

(b) Resnet-RND (c) Resnet-TL

Figure 3: Area under ROC curve plots for each tested network types.

Table 2: The confusion matrix on the validation set for the method Resnet-TL. Dep stands for the class deposition.

Break Roots Crack Dep Hole

Break 290 33 48 9 20

Roots 7 383 1 3 6

Crack 88 6 300 0 6

Dep 1 0 2 397 0

Hole 33 12 30 0 325

Table 3: Per-Class Precision and Recall of each model.

Class S-net Resnet-RND Resnet-TL

Precision Recall Precision Recall Precision Recall

Breaking 0.45 0.38 0.65 0.70 0.69 0.72

Roots 0.80 0.70 0.89 0.7 4 0.88 0.96

Cracks 0.78 0.42 0.73 0.76 0.79 0.75

Deposition 0.71 0.93 0.95 1.00 0.97 0.99

Hole 0.50 0.72 0.88 0.8 8 0.91 0.81

Average 0.65 0.63 0.82 0. 82 0.85 0.85

4 RESULTS

We tested the trained models on a h eld out validation

set created from 20% o f the original inspection vi-

deos. The validation set consists of 400 images per

each category. The evaluations were done using area

under the receiver operating characteristics (ROC)

curve. We also report the p recision and rec all for each

category.

The results for the shallow network (S-net),

Visual Inspection of Storm-Water Pipe Systems using Deep Convolutional Neural Networks

139

Resnet-50 without random initialized weig hts

(Resnet-RND) and Resnet-50 with transfer learning

(Resnet-TL) are shown in Figur e 3 and Table 3.

The results show that S-net, even with fewer pa-

rameters compar ed to Resnet, has only been able to

achieve an overall ROC value of 0.88 . However both

Resnet with and without transfer learning has bee n

able to obtain h igh ROC values of 0.97 and 0.96 re-

spectively. The Resnet with transfer learning shows

slightly better ROC values in classifying breaking and

roots whereas the ROC values across othe r catego-

ries are similar to that without transfer learning. The

results indicate that data augmentation has enabled

accurate learning of a deep network with limited data

in storm-water pipe inspection.

The confusion matrix on the validation set for the

method Resnet-TL is shown in Table 2. The confu-

sion matrix shows that th ere is some misclassiﬁcation

between the classes cracks and breaking. This behavi-

our is understandable given that the two defect types

mentioned above share similar physical characteris-

tics.

5 CONCLUSIONS

The pape r presents a new method for automated vi-

sual inspection of the storm water pipes. The main

novelty of our method is to use a deep convo lutional

neural network in identifying the defect types. The re -

sults obtained on a held out validation set shows that

proposed deep neural network architectures trained

with data augmentatio n and transfer learning are ca-

pable of achiev ing high accuracies in identifying the

defect typ e s.

In these experiments we have only used ﬁve de-

fect types due to the limited availability of data from

other categories and we intend to increase th is in fu-

ture work. Defect parameters such as the crack w idth

are also important in de cision making and we intend

to extend our work towards automated prediction of

defect parameters.

ACKNOWLEDGEMENTS

The Authors would like to tha nk Mr. Juncheng Li for

his help with the implementation part of this work.

REFERENCES

Guo, W., Soibelman, L., and Garrett, J. (2009). Automa-

ted defect detection for sewer pipeline inspection and

condition assessment. Automation in Construction,

18(5):587 – 596.

Halfawy, M. R. and Hengmeechai, J. (2014). Automated

defect detection in sewer closed circuit television ima-

ges using histograms of oriented gradients and support

vector machine. Automation in Construction, 38:1 –

13.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resi-

dual learning for image recognition. In Proceedings of

the IEEE conference on computer vision and pattern

recognition, pages 770–778.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012).

Imagenet classiﬁcation with deep convolutional neu-

ral networks. In Advances in neural information pro-

cessing systems, pages 1097–1105.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.,

Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bern-

stein, M., Berg, A. C., and Fei-Fei, L. (2015). Image-

net large scale visual recognition challenge. Interna-

tional Journal of Computer Vision, 115(3):211–252.

Shehab, T. and Moselhi, O. ( 2005). Automated detection

and classiﬁcation of inﬁltration in sewer pipes. Jour-

nal of Infrastructure Systems, 11(3):165–171.

Simonyan, K. and Z isserman, A. (2014). Very deep con-

volutional networks for large-scale image recognition.

arXiv preprint arXiv:1409.1556.

Su, T.-C. and Yang, M.-D. (2014). Application of morpho-

logical segmentation to leaking defect detection in se-

wer pipelines. Sensors, 14(5):8686–8704.

Szegedy, C., Liu, W., Jia, Y., S ermanet, P., Reed, S., An-

guelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.,

Rick Chang, J.-H., et al. (2015). Going deeper with

convolutions. In The IEEE Conference on Computer

Vision and Pattern Recognition (CVPR).

Tran, H. D., Perera, B. J. C., and Ng, A . W. M. ( 2010). Mar-

kov and neural network models for prediction of struc-

tural deterioration of storm-water pipe assets. Journal

of Infrastructure Systems, 16(2):167–171.

Xu, K. , Luxmoore, A., and Davies, T. (1998). Sewer pipe

deformation assessment by image analysis of video

surveys. Pattern Recognition, 31(2):169 – 180.

Yang, M.-D. and Su, T.-C. (2008). Automated diagnosis of

sew er pipe defects based on machine learning approa-

ches. Expert Systems with Applications, 35(3):1327 –

1337.

Yang, M.-D. and Su, T.-C. (2009). Segmenting ideal mor-

phologies of sewer pipe defects on cctv images for au-

tomated diagnosis. Expert Systems with Applications,

36(2, Part 2):3562 – 3573.

ICINCO 2018 - 15th International Conference on Informatics in Control, Automation and Robotics

140