Sewer-AI: Sustainable Automated Analysis of Real-World Sewer Videos
Using DNNs
Rajarshi Biswas, Marcel Mutz, Piyush Pimplikar, Noor Ahmed and Dirk Werth
August-Wilhelm Scheer Institut, Uni-Campus D 5 1, 66123 Saarbrücken, Germany
{firstname.lastname}@aws-institut.de
Keywords:
Computer Vision, Deep Learning, Industrial Application.
Abstract:
Automated maintenance of sewer networks using computer vision techniques has gained prominence in the
vision-research community. In this work, we handle sewer inspection videos with severe challenges. These
obstacles hinder direct application of state-of-the-art neural networks in finding a solution. Thus, we perform
an exhaustive study on the performance of highly successful neural architectures on our challenging sewer
video dataset. For a complete understanding, we analyze their performance in different modes. We propose
training strategies for effectively handling the different challenges and obtain balanced accuracy, F1 and F2
scores of more than 90% for 17 out of the 25 defect categories. Furthermore, for developing resource efficient,
sustainable versions of the models we study the trade-off between performance and parameter pruning. We
show that the drop in average performance of the networks is within 1% with more than 90% weight pruning.
We test our models on the state-of-the-art Sewer-ML dataset and obtain a 100% true positive rate for 8 out of
its 18 defect categories.
1 INTRODUCTION
The sewage system is an indispensable part of civic
life all around the world. In Germany the sewer net-
work is approximately 594,321 kilometers long as of
2018 while the US has almost 2.08 million kilometers
in total network length (ASCE, 2017). Various fac-
tors cause progressive aging of the sewer networks.
Thus, systematic inspection is required for maintain-
ing the sewers. Sewer network maintenance in most
countries is carried out by the network operators who
perform sampling inspections with the aid of TV-
camera mounted robotic probes. These probes record
the sewer conditions using a rotating video camera
while being remotely driven by a professional inspec-
tor. This is the state-of-the-art in sewer inspection,
however, the cost incurred in terms of man hours and
finance is exorbitant. The inspectors need to examine the video for extended periods to identify the defects, which is exhausting, time-consuming and error prone. This results in inconsistent labels, i.e., the same image is assigned different labels by different inspectors, as depicted in figure 4, where each image is assigned to multiple categories.
It costs approximately €2,500 to inspect one kilometer of the sewer network in Germany. This is compounded by the acute shortage of labor and domain experts. These factors render the current approach for monitoring sewers both unscalable and unsustainable.
Figure 1: Challenging artefacts in real world sewer data.
In this paper, we perform an in-depth
analysis of the suitability of deep learning based mod-
els for automating the network inspection process and
thereby reducing the cost burden on the maintenance
industry. However, the prospect of direct application
of deep learning models from the computer vision
community is greatly limited due to several difficult
challenges. The primary obstacle is the inconsistency
in data collection, that is, there is no uniform stan-
dard with regards to equipment, format and resolu-
tion in which data is stored. Moreover, the dataset is
characterized by poor quality as most of the recorded videos contain blur, grain and distortion artefacts (refer to figure 1). There is also huge variation in terms of lighting distribution, camera orientation, material characteristics and inconsistent labeling, i.e., high inter-annotator disagreement (refer to figure 4). In addition, the dataset suffers from a highly imbalanced distribution among the labels (as can be seen in figure 2).
Figure 2: Highly imbalanced class distribution in real world sewer data.
In order to systematically handle
all these obstacles we perform three experiments with
increasing degree of complexity for the classification
task. We study the performance of the chosen neu-
ral architectures, HRNet (Wang et al., 2019), ResNet-
152 (He et al., 2016), MobileNet V3 Large (Sandler
et al., 2018), DenseNet-264 (Huang et al., 2017), In-
ception V4 (Szegedy et al., 2016) and EfficientNet
(Tan and Le, 2019) respectively through these exper-
iments. Our motivation behind these experiments is
to determine the comparative difficulty level posed
by the different defect classes to these networks un-
der the non-uniform data collection and labeling stan-
dards of the sewer maintenance industry. In the first
experiment, Single-defect classification (E1), we test
the networks for individual defect category identifica-
tion using a dataset containing only the positive instances from a specific defect category and all the negative in-
stances present in our dataset. Then in the second ex-
periment, One-vs-all classification (E2), we observe
the performance of the networks in distinguishing a
specific defect category against all the other categories, including the non-defect instances, taken together.
this experiment, we test the influence of the low inter-
class distinction among some of the labels and other
inconsistencies in our dataset on the classification per-
formance of the networks for each defect category.
Finally, in the last experiment, Classification under Weight-pruning (E3), we test the drop in performance
of the trained models when performing inference with
heavily pruned models. Our goal here is to study the
suitability of the chosen networks under resource con-
strained environments. To summarize, our contribu-
tions in this paper are the following:
- Comprehensive performance analysis of the most effective neural network architectures for sewer video classification using three different experiments with gradual increase in complexity.
- Performance estimation of the neural networks under heavy weight pruning for developing resource efficient and sustainable solutions.
- Training strategies for handling acute data imbalance in the sewer data.
2 RELATED WORK
The success of deep learning models in solving com-
puter vision problems has prompted their widespread
adoption in various industrial applications. The civil
engineering discipline too is witnessing a steady
adoption of different techniques for conducting visual
inspection of large civic infrastructures. There are ap-
proaches that borrow different classical vision meth-
ods for analysing the condition of pavements, bridges, roads, tunnels, etc. Additionally, there is also emphasis on using 3D modeling for digital reconstruction
and visualization of these large infrastructures for im-
proving the inspection quality. The area of automatic sewer network analysis also attracts attention in the vision research community. This has led to the development of crack detection in sewers
using image processing (Halfawy and Hengmeechai,
2014) and segmentation (Iyer and Sinha, 2006) meth-
ods. Mathematical morphology is used by the au-
thors in (Sinha and Fieguth, 2006) for classifying
cracks, holes and joints in the sewer pipes whereas the
work (Dang et al., 2018) uses morphological operations along with edge detection and binarization for identifying defects by recognizing the text overlaid on the sewer videos. Employment of models with task-specific
features or heuristic decision rules are reported in
(Myrans et al., 2018). However, classical approaches
such as the ones mentioned require a lot of feature en-
gineering, extensive pre-processing routines and suf-
fer more from noisy, low-quality data. Deep learning based approaches resolve this problem. For example, convolutional neural networks or similar
deep neural network derived applications learn di-
rectly from the data obviating feature engineering
with little to almost non-existent pre-processing ef-
forts. The authors in (Cha et al., 2017; Montero
et al., 2015) demonstrate the efficiency of convolu-
tional neural networks over conventional techniques
for tunnel inspection. The introduction of deep learn-
ing has led to improvements in image & video anal-
ysis (Fang et al., 2020; Moradi et al., 2020; Wang
et al., 2021), estimation of water level (Haurum et al.,
2020; Ji et al., 2020), defect identification (Cheng and
Wang, 2018; Kumar et al., 2020; Yin et al., 2020),
segmentation (Pan et al., 2020; Piciarelli et al., 2019;
Wang and Cheng, 2020) along with the handling of
the classification problem in the multi-class setting
(Hassan et al., 2019; Kumar et al., 2018; Li et al.,
2019; Meijer et al., 2019; Xie et al., 2019). The
steady industrial and academic interest in automat-
ing the sewer maintenance process has led to utiliza-
tion of different technologies. For instance, the works
(Duran et al., 2002; Liu and Kleiner, 2013) report
the use of various sensors in this area. More specif-
ically, there is use of acoustic sensors (Iyer et al., 2012), depth sensors (Alejo et al., 2017; Haurum
et al., 2021; Henriksen et al., 2020) and laser scan-
ners (Lepot et al., 2017; Dehghan et al., 2015) for de-
fect detection and reconstruction of the sewer pipes.
However, none of the works mentioned above study
the performance of different highly effective neural
networks for classifying sewer defects in three differ-
ent modes as described earlier. Moreover, through the
pruning experiment we derive performance bounds
for developing lightweight versions of the networks
suitable for operation under resource constrained en-
vironments. To the best of our knowledge our work is
the first in the area of automated sewer video analysis
to perform this study which can help in developing
sustainable and energy efficient solutions. Further-
more, we benchmark our models on the Sewer-ML dataset (Haurum and Moeslund, 2021) to understand their generalization capabilities and obtain results that compare favorably with the reported benchmarks.
3 DATASET
The data in our work is collected by registered sewer
network operators from 221 municipalities in Ger-
many. The operators employ human experts to iden-
tify the defects in the videos and use two software tools, IBAK and PIPEX, for annotating the
video data with the identified defects. The defects
are categorized into standard defect inspection codes
used all across Germany for sewer maintenance. A
total of 10,205 video files are collected from the 221
municipal locations amounting to 2.37 terabytes.
For every video, we extract defect frames from the
time interval marked as featuring a set of defects and
extract the negative frames using an offset of 10 sec-
onds outside this interval in either direction of the
time axis. Following this strategy we collect 156,028
frames, having resolutions of 576×480 and 576×768
respectively, in total amounting to 87.10 GB of frame
data. The extracted frames contain different information in the form of text overlays.
Figure 3: Inpainting based text removal from frames.
Figure 4: Ambiguous instances with multiple labels: (a) BAG, BAH, BCA, BAJ; (b) BAJ, BAB; (c) BBB, BBA; (d) BBB, BAF.
However, such tex-
tual information can wrongly influence the decision
making process of the neural classifiers. So, we re-
move the text from the images using optical character recognition to identify the text, followed by inpainting (Bertalmio et al., 2001). The effect of this
frame processing is seen in figure 3. There are 25
different defect categories, however, the distribution
of instances pertaining to each of these categories is
highly imbalanced as shown in figure 2.
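As a rough sketch of this frame-processing step, the snippet below combines OCR-based text localization with OpenCV's Navier-Stokes inpainting, i.e., the method of (Bertalmio et al., 2001) cited above. The choice of pytesseract and the confidence threshold are illustrative assumptions; the paper does not specify the OCR engine.

```python
# Hypothetical sketch of the text-overlay removal step: locate text with OCR,
# mask it, and inpaint the masked region. pytesseract and the threshold of 50
# are illustrative assumptions, not details from the paper.
import cv2
import numpy as np
import pytesseract

def remove_text_overlay(frame: np.ndarray) -> np.ndarray:
    data = pytesseract.image_to_data(frame, output_type=pytesseract.Output.DICT)
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    for i, text in enumerate(data["text"]):
        if text.strip() and float(data["conf"][i]) > 50:  # confident boxes only
            x, y, w, h = (data[k][i] for k in ("left", "top", "width", "height"))
            cv2.rectangle(mask, (x, y), (x + w, y + h), 255, -1)  # fill the box
    # cv2.INPAINT_NS is OpenCV's Navier-Stokes inpainting (Bertalmio et al., 2001).
    return cv2.inpaint(frame, mask, 3, cv2.INPAINT_NS)
```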
4 METHOD
We perform three different experiments with the cho-
sen neural networks to handle the challenges in our
real world dataset. Our goal is to systematically es-
timate the performance of the different chosen net-
works. So, as our first experiment (E1), we train the
networks to perform single defect category identifi-
cation. This is followed by the second experiment
(E2), where we test the networks ability to perform
one-vs-all classification and finally in our last exper-
iment (E3), we systematically prune up to 98 percent of the network's parameters
and study the drop in performance compared to the
results obtained from experiments E1 and E2 respec-
tively. The motivation behind our experiments is to
handle challenges in our dataset in an increasing de-
gree of complexity. Since our dataset suffers from
various limitations, discussed previously, we first consider the simple setting of single defect category identification (E1). Subsequently, we consider the comparatively complex task of one-vs-all classification with our dataset (E2), followed by the parameter pruning experiment (E3).

Table 1: Average performance of the networks for experiments E1, E2 and E3.

| Neural Net | E1 (BA / F1 / F2) | E3: E1 + Pruning (BA / F1 / F2) | E2 (BA / F1 / F2) | E3: E2 + Pruning (BA / F1 / F2) |
| HRNet | 85.27 / 0.81 / 0.83 | 84.46 / 0.82 / 0.80 | 77.67 / 0.71 / 0.65 | 79.12 / 0.74 / 0.69 |
| ResNet | 82.77 / 0.82 / 0.82 | 81.54 / 0.82 / 0.82 | 78.65 / 0.73 / 0.67 | 78.75 / 0.73 / 0.67 |
| EfficientNet | 75.39 / 0.73 / 0.73 | 74.88 / 0.70 / 0.70 | 73.34 / 0.63 / 0.59 | 72.75 / 0.67 / 0.65 |
| MobileNet | 82.83 / 0.80 / 0.78 | 78.76 / 0.72 / 0.70 | 79.34 / 0.74 / 0.68 | 74.90 / 0.62 / 0.58 |
| Inception | 82.05 / 0.77 / 0.76 | 82.76 / 0.79 / 0.78 | 76.55 / 0.70 / 0.66 | 76.41 / 0.69 / 0.66 |
| DenseNet | 82.25 / 0.81 / 0.75 | 81.25 / 0.78 / 0.76 | 80.55 / 0.76 / 0.72 | 80.94 / 0.76 / 0.70 |
| Best Avg. Perf. | 89.38 / 0.88 / 0.88 | 90.13 / 0.89 / 0.90 | 84.59 / 0.82 / 0.79 | 84.08 / 0.83 / 0.82 |
We divide our dataset into train, validation and test
sets using an 80-10-10 split for each category. How-
ever, it is important to note that our dataset is highly
skewed towards negative images, that is, the number
of frames without any defects is much larger com-
pared to the number of frames in any individual defect
category. For handling this imbalance while training
our models we follow a specific training strategy. In
this approach, we first consider all the positive train-
ing instances of a particular class and an equal num-
ber of random negative instances in our first epoch of
training. In the next epoch, we keep the same positive instances, draw another random set of negative images of the same size, and continue training. In this way, every epoch reuses the same positive images but sees a fresh random set of equally many negative instances drawn from our negative frames. We repeat this
strategy for 100 epochs for training the chosen neu-
ral networks. We have intentionally extracted nega-
tive frames from the videos in much larger proportion
compared to the frames containing the defects as we
wanted to understand the performance of the models under real conditions, where defects on average occur only infrequently.
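A minimal sketch of this resampling strategy, assuming PyTorch; the dataset objects pos_ds and neg_ds and the training helper are illustrative names.

```python
# Sketch of the per-epoch negative-resampling strategy (assumed PyTorch);
# pos_ds / neg_ds / train_one_epoch are illustrative names, not from the paper.
import random
from torch.utils.data import ConcatDataset, DataLoader, Subset

def epoch_loader(pos_ds, neg_ds, batch_size=16):
    """All positives plus a freshly sampled, equally sized negative subset."""
    neg_idx = random.sample(range(len(neg_ds)), k=len(pos_ds))
    return DataLoader(ConcatDataset([pos_ds, Subset(neg_ds, neg_idx)]),
                      batch_size=batch_size, shuffle=True)

# for epoch in range(100):                 # same positives every epoch,
#     train_one_epoch(model, epoch_loader(pos_ds, neg_ds))  # new negatives
```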
For training the models in the one-vs-all classifi-
cation mode, experiment E2, we use the pre-trained
single defect classifiers and fine-tune the models on
the dataset reorganized in a one-vs-all way. We
take the specific defect category as the positive class
and all the other defect categories including the non-
defective instances as the negative class. For ease of
reference we assign the non-defective instances to the
“No Defect” category. In each epoch, we take the
same number of random “No Defect” instances as
the number of positive instances, and in the negative
dataset we further add the same number of instances
from the other defect categories with equal distribu-
tion from each category. Now, our negative dataset
size is twice the size of our positive dataset. This
setup ensures that the model will encounter a higher
proportion of “No Defect” instances compared to spe-
cific defect category instances during training. At the
same time, in addition to the “No Defect” instances,
it will be able to distinguish the specific defect cat-
egory from the other defect categories. For the ex-
periments E1 and E2, we optimize binary cross en-
tropy loss using a batch size of 16 and perform a hyperparameter search over the optimizers Adam and SGD and the learning rates 1e-3 and 1e-4. We
compute balanced accuracy, F1 and F2 scores in all
our experiments for measuring the performance of the
models on the test set.
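The per-epoch negative pool for E2 and the reported metrics could be sketched as follows, assuming scikit-learn for the scores; the data layout and helper names are illustrative.

```python
# Sketch of the one-vs-all (E2) negative pool and the reported metrics.
# no_defect is a list of frames; other_defects maps category -> frames.
import random
from sklearn.metrics import balanced_accuracy_score, f1_score, fbeta_score

def negative_pool(n_pos, no_defect, other_defects):
    """n_pos 'No Defect' frames plus n_pos frames spread evenly over the
    remaining defect categories, i.e. twice the positive count in total."""
    pool = random.sample(no_defect, k=n_pos)
    per_class = n_pos // len(other_defects)
    for frames in other_defects.values():
        pool += random.sample(frames, k=min(per_class, len(frames)))
    return pool

def scores(y_true, y_pred):
    return {"BA": balanced_accuracy_score(y_true, y_pred),
            "F1": f1_score(y_true, y_pred),
            "F2": fbeta_score(y_true, y_pred, beta=2)}  # F2 weights recall higher
```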
With the view of developing energy efficient so-
lutions and reducing the computational burden we
study the influence of parameter pruning on the per-
formance of our models, obtained from E1 and E2 re-
spectively, under experiment E3. We use global prun-
ing for this purpose. In this approach, different layers
of the model are considered as a single global mod-
ule, and the weights closest to zero are pruned until the specified pruning percentage is reached. This type of pruning converts the dense weight matrices into sparse ones, due to which the model size
gets reduced substantially. We try pruning percentages of 30, 50, 70, 90 and 98, increasing the percentage in this succession as long as the balanced accuracy drop on the test dataset stays below 1 percent; once it drops by more than one percent, we stop pruning that network. This way we are
able to do dynamic pruning and come up with the best
pruning threshold or percentage for all the models for
each defect category. Following this strategy, we are
able to prune the models to a tenth of their original size
which significantly helps us in reducing the storage
and computational budget. Through this pruning ex-
ercise we try to find the weights of our models most responsible for making correct predictions
and discard those that are either irrelevant or degrade
the model’s performance. With each pruning threshold, we check the performance of each pruned model on the test set used in our previous experiments and report the results obtained with the pruned models.
Figure 5: F1 and F2 scores obtained for each defect category.
Figure 6: Balanced accuracy for each defect category under experiments E1, E2 and E3.
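The dynamic pruning loop can be sketched with PyTorch's built-in global magnitude pruning, which matches the behavior described above (all layers pooled globally, smallest-magnitude weights removed first); evaluate_ba is a hypothetical helper returning balanced accuracy in percent on the fixed test set, and restricting pruning to convolutional and linear layers is our assumption.

```python
# Sketch of the dynamic global-pruning loop (E3), assuming PyTorch;
# evaluate_ba(model) -> balanced accuracy in percent is a hypothetical helper.
import copy
import torch
import torch.nn.utils.prune as prune

def dynamic_prune(model, evaluate_ba, amounts=(0.30, 0.50, 0.70, 0.90, 0.98)):
    """Return the most heavily pruned copy whose balanced-accuracy drop
    relative to the unpruned model stays below one percentage point."""
    baseline, best = evaluate_ba(model), None
    for amount in amounts:                        # total fraction pruned
        candidate = copy.deepcopy(model)
        params = [(m, "weight") for m in candidate.modules()
                  if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
        # One global pool over all layers; weights nearest zero go first.
        prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                                  amount=amount)
        if baseline - evaluate_ba(candidate) >= 1.0:  # >1% drop: stop here
            return best
        best = (candidate, amount)
    return best
```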
5 RESULTS AND DISCUSSION
We trained six different classifiers, HRNet, ResNet-
152, EfficientNetV2-L, MobileNet V3 Large, Incep-
tion V4 and DenseNet-264, with significantly different numbers of trainable parameters, for identifying
each defect category. We follow the training strat-
egy described in section 4 for handling the critical
imbalance in our dataset. The results obtained with
the chosen networks for classifying each defect cat-
egory under the experiments E1, E2 and E3 are re-
ported in detail in figures 5 and 6. The balanced accu-
racy scores obtained from the chosen neural networks
under all three experiments for every defect category
in our sewer dataset can be seen in figure 6. We re-
port the best F1 and F2 scores obtained for each defect
category from our experiments in figure 5.
Under experiment E1 we trained the six different
neural networks on each of the 25 defect categories.
Out of the 25 defect categories, we obtained more than 90 percent balanced accuracy on 14 categories, more than 85 percent balanced accuracy on 5 categories, while the rest gave around 80 percent on the same metric.

Table 2: Performance of our single defect models on the Sewer-ML test set.

| Sewer-ML Code | Our Code | TPR on Sewer-ML |
| VA | BDD | 2.20% |
| RB | BAB | 100.00% |
| OB | BAF | 2.49% |
| PF | | |
| DE | BAA | 100.00% |
| FS | BAG | 98.83% |
| IS | BAI | 2.15% |
| RO | BBA | 100.00% |
| IN | BBF | 100.00% |
| AF | BBC | 62.09% |
| BE | BBC | 68.26% |
| FO | BBE | 0.0% |
| GR | BCA | 0.0% |
| PH | BAH | 100.00% |
| PB | BAG | 96.47% |
| OS | BAH | 100.00% |
| OP | BAH | 100.00% |
| OK | BAH | 100.00% |

The defect category BAF is the most
frequently occurring defect in our dataset with a total
number of 8629 instances. On this category we ob-
tained a balanced accuracy score of 91.5 percent. On
all the categories containing more than 1000 instances
we achieved more than 90 percent balanced accuracy.
Interestingly, on the remaining 10 categories with less
than 1000 instances we achieved balanced accuracy
scores between 85 percent and 90 percent. In some of the rare categories, despite having very few instances, the visually distinct nature of the defect allowed our models to achieve very high accuracy. For instance, we
achieved 98.70 percent accuracy on the BAN class
which has only 119 instances and 91.91 percent ac-
curacy for the BAP class, which has 138 instances. This shows that if the class is visually distinct, then even with as few as 100 images, our models
are capable of achieving good results. HRNet (19.25
million parameters) was the best performing model
with an average balanced accuracy of 85.27 percent
and an F2 score of 0.83 across all the defect categories
(refer to Table 1). The second best performing model
was MobileNet V3 Large, which is also the smallest
in size with 5.48 million parameters compared to the
other networks, with an average balanced accuracy of
82.83 percent. DenseNet-264 (72.08 million parameters) and ResNet-152 (60.19 million parameters) achieved average balanced accuracy scores of 82.25 percent and 82.77 percent respectively. These results demonstrate that models with a smaller parameter count, such as MobileNet V3 Large, can achieve accuracy scores close to or even better than those of much larger models. Such smaller models might be easier to integrate
on the inspection robots and can be used for quick
reliable inference during sewer inspection. This fact
further motivated us in the direction of pruning (E3)
in order to explore the feasibility of creating smaller
models that require less computational power but can
still achieve high, reliable performance.
In our next experiment, E2, we tested the chosen neural networks' ability to distinguish each defect cat-
egory against all the other defect categories includ-
ing the non-defective instances, i.e., extracted images
with no defects. The results obtained in this exper-
iment establish DenseNet-264 as the best perform-
ing neural network with an average balanced accu-
racy of 80.55 percent, an F1 score of 0.76 and an F2 score of 0.72 across all the defect categories
(refer to Table 1). This is different from the results
obtained from experiment E1 where HRNet was the
best performing among all the chosen networks. In
the results from E2 we witness a drop of 4.72 percent in average balanced accuracy; however, we expected this since there is very low inter-class varia-
tion among many defect categories. For example, the
defect categories BAG, BAH, BAJ and BDE are all
related to errors in pipe connections and are very sim-
ilar in their visual appearance. Therefore, having sim-
ilar images in both the positive and negative classes
under the one-vs-all setting decreases the performance
of the networks. As further examples, the defect cat-
egories BAB, BAC, BAF and BAO are related to pipe damage and contain instances which are very similar, with some being visually indistinguishable. In spite
of this, we were able to obtain more than 90 percent
balanced accuracy for 5 defect categories. Visually
distinct categories like BAN yielded very high bal-
anced accuracy of 99.57 percent while performance
on the shifted connection category BAJ, similar to
other connection related defect categories like BAG,
BAH, BDE, dropped to 86.01 percent from 91.95 per-
cent obtained under experiment E1. Thus, the models
under one-vs-all strategy (E2) certainly helped us in
differentiating one defect category from all the other
categories including the non-defective category, but
due to low inter-class variability in a few of the cate-
gories the performance of the models dropped.
Under experiment E3, we performed pruning on
all the single-defect and one-vs-all classifiers devel-
oped in the experiments E1 and E2 respectively. With
the dynamic pruning strategy, described earlier, we
were able to find the smallest models still capable of achieving performance similar to the original
models. The average pruning achieved on the single-defect classifiers from E1 across all the classes was around 50 percent for all the neural networks, apart from DenseNet-264, where we achieved
75 percent average pruning. For one-vs-all classifiers
from E2, the average pruning ranged from 60 to 85
percent. We were able to prune more than 90 percent
for 44 different single-defect models and 73 differ-
ent one-vs-all models. Pruning at 90 percent or more shrank a model to one tenth of its original size. As a result, a large number of models became much smaller, which helped us reduce the storage budget considerably. Remarkably,
55 single-defect classifiers under pruning gave bet-
ter results than their corresponding non-pruned ver-
sions. Similarly, 62 of our one-vs-all pruned mod-
els performed better compared to their corresponding
non-pruned versions. Among these, 6 models showed significant gains, with balanced accuracy increasing by 6 to 11 percentage points. This shows that pruning
resulted in finding the most important weights of the
model and it discarded the weights which were rel-
atively less important or which reduced the model’s
performance. Thus, pruning gave us compact models that in several cases achieved even higher performance. In the majority of the pruned models, the drop
in balanced accuracy scores stayed within the 1 per-
cent mark of the scores obtained with the correspond-
ing non-pruned versions. Finally, after conducting ex-
periments E1, E2 and E3 we found that the average
best performance over the 25 different defect cate-
gories considering only the best performing networks
is 90.13 percent (refer to Table 1). This shows that if
we use the best performing model for each defect cat-
egory, we are able to achieve 90% average balanced
accuracy across all the 25 classes.
Lastly, we tested the performance of our models
on the largest publicly available standardized dataset
for sewer inspection (Haurum and Moeslund, 2021).
Our goal was to determine whether our models are
able to generalize to other data sources and to observe
the levels of performance they can offer on these data.
We identified corresponding defect categories in our
respective datasets and measured the performance of
our models on the Sewer-ML test set. To be precise,
we computed the True Positive Rate (TPR) or Recall
and found that on 8 out of the 18 defect categories
in the Sewer-ML dataset our models achieve a TPR
of 100 percent and more than 95 percent on the defect categories FS and PB in the Sewer-ML dataset
(refer to Table 2). On two defect categories, our cor-
responding models achieve TPR of more than 60 per-
cent while on the remaining six very low performance
is observed. We believe this can be due to an imperfect mapping between the defect categories of the two datasets.
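For reference, the TPR column of Table 2 is recall computed over the Sewer-ML frames labeled positive for each mapped category; a minimal sketch follows, with an abbreviated code mapping and a hypothetical predict helper.

```python
# Sketch of the Table 2 evaluation: recall (TPR) per mapped category.
# code_map is abbreviated from Table 2; predict() is a hypothetical wrapper
# returning True when the matching E1 model flags the frame as defective.
code_map = {"RB": "BAB", "DE": "BAA", "FS": "BAG", "RO": "BBA"}  # excerpt

def tpr(sewer_ml_code, positive_frames, predict):
    """TPR = TP / (TP + FN), evaluated on Sewer-ML frames labeled positive."""
    our_code = code_map[sewer_ml_code]
    hits = sum(predict(frame, our_code) for frame in positive_frames)
    return 100.0 * hits / len(positive_frames)
```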
6 CONCLUSION
To summarize, in this paper we thoroughly analyzed
the performance of six different neural networks for
identifying the defects present in real world sewer in-
spection videos. We outlined the critical challenges
present in our dataset, proposed strategies for train-
ing neural networks for handling these challenges and
performed three different experiments, E1, E2 and
E3 respectively, to study the effectiveness of the cho-
sen networks in handling such challenging data. We
found that for experiment E1 HRNet was the best
performing neural architecture with an average bal-
anced accuracy of 85.27% across all the defect cate-
gories. For the more complex task in experiment E2
we found that DenseNet-264 performed the best with
an average balanced accuracy of 80.55%. In our re-
sults obtained for experiment E2 we witnessed a drop
of 4.72% on average in balanced accuracy. We believe
this is due to the very low inter-class variation among
the defect categories present in our dataset. Under
the pruning experiment in E3 we were able to signifi-
cantly reduce the model size. With pruning at 90 per-
cent or more, the models were reduced to one-tenth of their original size. In total, we were able to prune 117
different class specific models, from experiments E1
and E2 combined, by more than 90 percent. Even so, the drop in balanced accuracy scores stayed within the
1 percent mark of the scores obtained with the corre-
sponding non-pruned versions. Finally, we tested our
models on the Sewer-ML dataset and obtained a recall of 100% on 8 out of the 18 defect cate-
gories presented in the Sewer-ML dataset.
ACKNOWLEDGEMENTS
This research was funded in part by the German Fed-
eral Ministry of Education and Research (BMBF) un-
der the project KIKI (grant number 02WDG1594A).
It is a joint project of the August-Wilhelm-Scheer
Institut, TU Clausthal, AHT AquaGemini GmbH,
EURAWASSER Betriebsführungsgesellschaft mbH, IBAK Helmut Hunger GmbH & Co. KG and Entsorgungsverband Saar KdöR. August-Wilhelm-
Scheer Institut is mainly entrusted with conducting
research in artificial intelligence for automatically di-
agnosing the condition of the sewers.
REFERENCES
Alejo, D., Caballero, F., and Merino, L. (2017). Rgbd-
based robot localization in sewer networks. In
2017 IEEE/RSJ International Conference on Intelli-
gent Robots and Systems (IROS), pages 4070–4076.
ASCE (2017). American society of civil engineers 2017
infrastructure report card.
Bertalmio, M., Bertozzi, A. L., and Sapiro, G. (2001).
Navier-stokes, fluid dynamics, and image and video
inpainting. In Proceedings of the 2001 IEEE Com-
puter Society Conference on Computer Vision and
Pattern Recognition. CVPR 2001, volume 1, pages I–
I. IEEE.
Cha, Y.-J., Choi, W., and Büyüköztürk, O. (2017). Deep
learning-based crack damage detection using convo-
lutional neural networks. Comput.-Aided Civ. Infras-
truct. Eng., 32(5):361–378.
Cheng, J. C. and Wang, M. (2018). Automated detection
of sewer pipe defects in closed-circuit television im-
ages using deep learning techniques. Automation in
Construction, 95:155–171.
Dang, L. M., Hassan, S., Im, S., Mehmood, I., and Moon,
H. (2018). Utilizing text recognition for the defects
extraction in sewers cctv inspection videos. Comput-
ers in Industry, 99.
Dehghan, A., Mehrandezh, M., and Paranjape, R. (2015).
Defect detection in pipes using a mobile laser-optics
technology and digital geometry. MATEC Web of Con-
ferences, 32:06006.
Duran, O., Althoefer, K., and Seneviratne, L. (2002). State
of the art in sensor technologies for sewer inspection.
IEEE Sensors Journal, 2(2):73–81.
Fang, X., Guo, W., Li, Q., Zhu, J., Chen, Z., Yu, J., Zhou,
B., and Yang, H. (2020). Sewer pipeline fault identi-
fication using anomaly detection algorithms on video
sequences. IEEE Access, 8:39574–39586.
Halfawy, M. and Hengmeechai, J. (2014). Efficient algo-
rithm for crack detection in sewer images from closed-
circuit television inspections. Journal of Infrastruc-
ture Systems, 20:04013014.
Hassan, S., Dang, L. M., Mehmood, I., Im, S., Choi, C.,
Kang, J., Park, Y.-S., and Moon, H. (2019). Under-
ground sewer pipe condition assessment based on con-
volutional neural networks. Automation in Construc-
tion.
Haurum, J. B., Allahham, M. M. J., Lynge, M. S., Hen-
riksen, K. S., Nikolov, I. A., and Moeslund, T. B.
(2021). Sewer defect classification using synthetic
point clouds. In VISIGRAPP.
Haurum, J. B., Bahnsen, C. H., Pedersen, M., and Moes-
lund, T. B. (2020). Water level estimation in sewer
pipes using deep convolutional neural networks. Wa-
ter.
Haurum, J. B. and Moeslund, T. B. (2021). Sewer-ml:
A multi-label sewer defect classification dataset and
benchmark. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition,
pages 13456–13467.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. 2016 IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 770–778.
Henriksen, K. S., Lynge, M. S., Jeppesen, M. D. B., Al-
lahham, M. M. J., Nikolov, I. A., Haurum, J. B.,
and Moeslund, T. B. (2020). Generating synthetic
point clouds of sewer networks: An initial investi-
gation. In Augmented Reality, Virtual Reality, and
Computer Graphics: 7th International Conference,
AVR 2020, Lecce, Italy, September 7–10, 2020, Pro-
ceedings, Part II, page 364–373, Berlin, Heidelberg.
Springer-Verlag.
Huang, G., Liu, Z., and Weinberger, K. Q. (2017). Densely
connected convolutional networks. 2017 IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 2261–2269.
Iyer, S. and Sinha, S. (2006). Segmentation of pipe images
for crack detection in buried sewers. Comp.-Aided
Civil and Infrastruct. Engineering, 21:395–410.
Iyer, S., Sinha, S. K., Pedrick, M., and Tittmann, B. R.
(2012). Evaluation of ultrasonic inspection and imag-
ing systems for concrete pipes. Automation in Con-
struction, 22:149–164.
Ji, H. W., Yoo, S. S., Lee, B.-J., Koo, D. D., and Kang, J.-
H. (2020). Measurement of wastewater discharge in
sewer pipes using image analysis. Water, 12(6).
Kumar, S., Wang, M., Abraham, D., Jahanshahi, M., Iseley,
T., and Cheng, J. (2020). Deep learning–based auto-
mated detection of sewer defects in cctv videos. Jour-
nal of Computing in Civil Engineering, 34:04019047.
Kumar, S. S., Abraham, D. M., Jahanshahi, M. R., Iseley,
T., and Starr, J. (2018). Automated defect classifica-
tion in sewer closed circuit television inspections us-
ing deep convolutional neural networks. Automation
in Construction, 91:273–283.
Lepot, M., Stanić, N., and Clemens, F. H. (2017). A tech-
nology for sewer pipe inspection (part 2): Experimen-
tal assessment of a new laser profiler for sewer defect
detection and quantification. Automation in Construc-
tion, 73:1–11.
Li, D., Cong, A., and Guo, S. (2019). Sewer damage detec-
tion from imbalanced cctv inspection data using deep
convolutional neural networks with hierarchical clas-
sification. Automation in Construction, 101:199–208.
Liu, Z. and Kleiner, Y. (2013). State of the art review of in-
spection technologies for condition assessment of wa-
ter pipes. Measurement, 46:1–15.
Meijer, D., Scholten, L., Clemens, F., and Knobbe, A.
(2019). A defect classification methodology for sewer
image sets with convolutional neural networks. Au-
tomation in Construction, 104:281–298.
Montero, R., Victores, J., Martínez, S., Jardón, A., and
Balaguer, C. (2015). Past, present and future of
robotic tunnel inspection. Automation in Construc-
tion, 59:99–112.
Moradi, S., Zayed, T., Nasiri, F., and Golkhoo, F. (2020).
Automated anomaly detection and localization in
sewer inspection videos using proportional data mod-
eling and deep learning–based text recognition. Jour-
nal of Infrastructure Systems, 26:04020018.
Myrans, J., Everson, R. M., and Kapelan, Z. (2018). Auto-
mated detection of faults in sewers using cctv image
sequences. Automation in Construction.
Pan, G., Zheng, Y., Guo, S., and Lv, Y. (2020). Auto-
matic sewer pipe defect semantic segmentation based
on improved u-net. Automation in Construction,
119:103383.
Piciarelli, C., Avola, D., Pannone, D., and Foresti, G. L.
(2019). A vision-based system for internal pipeline
inspection. IEEE Transactions on Industrial Informat-
ics, 15(6):3289–3299.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and
Chen, L.-C. (2018). Mobilenetv2: Inverted residu-
als and linear bottlenecks. In 2018 IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition,
pages 4510–4520.
Sinha, K. and Fieguth, W. (2006). Morphological segmen-
tation and classification of underground pipe images.
Mach. Vision Appl., 17(1):21–31.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna,
Z. (2016). Rethinking the inception architecture for
computer vision. In 2016 IEEE Conference on Com-
puter Vision and Pattern Recognition (CVPR), pages
2818–2826.
Tan, M. and Le, Q. (2019). EfficientNet: Rethinking model
scaling for convolutional neural networks. In Chaud-
huri, K. and Salakhutdinov, R., editors, Proceedings of
the 36th International Conference on Machine Learn-
ing, volume 97 of Proceedings of Machine Learning
Research, pages 6105–6114. PMLR.
Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao,
Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., and
Xiao, B. (2019). Deep high-resolution representation
learning for visual recognition. TPAMI.
Wang, M. and Cheng, J. C. P. (2020). A unified con-
volutional neural network integrated with conditional
random field for pipe defect segmentation. Comput.-
Aided Civ. Infrastruct. Eng., 35(2):162–177.
Wang, M., Kumar, S. S., and Cheng, J. C. (2021). Auto-
mated sewer pipe defect tracking in cctv videos based
on defect detection and metric learning. Automation
in Construction, 121:103438.
Xie, Q., Li, D., Xu, J., Yu, Z., and Wang, J. (2019). Auto-
matic detection and classification of sewer defects via
hierarchical deep learning. IEEE Transactions on Au-
tomation Science and Engineering, 16(4):1836–1847.
Yin, X., Chen, Y., Bouferguene, A., Zaman, H., Al-Hussein,
M., and Kurach, L. (2020). A deep learning-based
framework for an automated defect detection sys-
tem for sewer pipes. Automation in Construction,
109:102967.