Attentive-YOLO: On-Site Water Pipeline Inspection Using Efficient
Channel Attention and Reduced ELAN-Based YOLOv7
Atanu Shuvam Roy (https://orcid.org/0000-0002-5522-9043) and Priyanka Bagade (https://orcid.org/0000-0003-1045-4148)
Indian Institute of Technology, Kanpur, India
Keywords: YOLOv7, Computer Vision, Water Pipeline Inspection, Pipe Robot.
Abstract:
The effective and dependable distribution of clean water to communities depends on the timely inspection and repair of water pipes. Traditional inspection techniques frequently require expensive physical labour, resulting in inaccurate and delayed defect detection. Current water pipeline inspection methods include radiography testing, eddy current testing, and CCTV inspection, all of which require experts to be present on-site to conduct the tests. Radiographic and CCTV images are usually used for on-site pipeline defect detection, yet real-time automatic detection is lacking. Current approaches, including YOLOv5 models with Retinex-based illumination, achieve acceptable performance but hinder fast inference due to bulky models, which is especially concerning for edge devices. This paper proposes an Attentive-YOLO model based on the state-of-the-art YOLOv7 object detection model with a reduced Efficient Layer Aggregation Network (ELAN). We propose a lightweight attention mechanism in the head and backbone of the YOLOv7 network to improve accuracy while reducing model complexity and size. The paper aims to present an efficient model that can be deployed on edge devices such as the Raspberry Pi for use in Internet of Things (IoT) systems and on-site robotics applications such as pipeline inspection robots. Based on the experiments, the proposed Attentive-YOLO model achieves an mAP score of 0.962, compared to 0.93 for the YOLOv7-tiny model at one-third channel width, with an almost 20% decrease in model parameters.
1 INTRODUCTION
Metal pipes are an indispensable material in daily
life for transporting water from reservoirs to residen-
tial and industrial areas. These underground pipes
are prone to corrosion and degradation with the pas-
sage of time. In recent years, several incidents of
gas pipeline leakage and water pipeline bursts have
caused drastic disasters (Kalita, 2023). Figure 1
shows some of the most common defects inside un-
derground metal pipes requiring manual labour to de-
tect in most cases.
Destructive methods such as cutting, section-
ing, and drilling for pipeline defect detection dis-
rupt the normal services of the pipeline. Hence,
non-destructive methods (NDT) such as Visual Test-
ing (VT), Ultrasonic Testing (UT), Thermography
and Radiographic Testing (RT) are used to test the
pipeline defects. These sensor-based defect detection methods require manual monitoring, e.g. visual inspection via CCTV and sound-sensor detection of leaks, pressure testing, and water quality sampling (Korlapati et al., 2022).
Figure 1: Various types of water pipeline defects. (a) encrustation, (b) ferrule, (c) stone, (d) deformation, (e) sludge formation, (f) roots.
Thus, for automation, pipeline crawler robots with an attached camera (Ab Rashid et al., 2020), radiography-based detection (Silva et al., 2021), and eddy-current-based detection (Sharma, 2021) have been employed.
Although NDT methods and robots for water pipeline inspection work, they are not fully automatic. Hence, various approaches employing traditional image processing, Machine Learning (ML), and Deep Learning (DL) techniques such as Convolutional Neural Networks (Yan and Song, 2020), Recurrent Neural Networks (Shaik et al., 2022), and ResNet (Guo et al., 2022) have been applied to pipeline inspection tasks such as detecting corrosion, joints, holes, encrustations, ferrules, sludge formation, cracks, and penetration of roots. However, drawbacks such as high computational requirements and a lack of interpretability cause problems, since the main intention is to run these models on single-chip boards for IoT and automation.
Transitioning to object detection models like
YOLO (You Only Look Once) and Faster R-CNN ad-
dresses limitations in conventional DL methods for
pipeline inspection. These models improve defect lo-
calization and facilitate targeted maintenance. Ob-
ject detection enhances adaptability across pipelines
and environments by recognizing common defect pat-
terns, reducing the need for domain-specific data. You Only Look Once (YOLO) is a family of object detection models based on Convolutional Neural Networks (CNNs). Among its advantages are fast inference with a good mAP score, i.e. the accuracy required for real-time monitoring and detection on single-chip devices (Redmon et al., 2016). This paper showcases an improved object detection model, based on the YOLOv7-Tiny variant, for a water pipeline defect dataset (solinas xml to txt, 2023).
The primary contribution of this paper is to build
a lightweight YOLOv7-based model for real-time
pipeline defect detection with the following architec-
tural changes:
1. Implemented Efficient Channel Attention Mech-
anism (ECAM) in the head module for enhanced
feature extraction, specifically improving textures
in low-light environments through local cross-
channel interaction.
2. Introduced Reduced-ELAN (Efficient Layer Ag-
gregation Networks) by removing one standard
convolutional layer from each ELAN structure in
the original model, reducing overall model com-
plexity.
3. Improved detection phase efficiency by replacing
the last three convolution layers with RepConv,
demonstrating a favorable accuracy-speed trade-
off compared to standard convolution layers.
In extensive experiments on a Raspberry Pi 3B+ device, the proposed model takes 4% less inference time per frame at full channel width and up to 8% less inference time compared to the original YOLOv7-Tiny model. Our proposed model, Attentive-YOLO, achieves an mAP score of 0.962, compared to 0.93 for the YOLOv7-tiny model at one-third channel width, with an almost 20% decrease in model parameters.
2 RELATED WORK
Pipeline inspection systems have used various methods to analyse pipeline defects (Mangayarkarasi et al., 2019), including image processing, ML and DL models to categorise the type of defect, and sensors (acoustic, radiography, camera) to capture data that is then analysed for defect detection.
Specific robots have been developed in recent years using embedded platforms. Mohd Aliff et al. (Aliff et al., 2022) created a remotely operated underwater vehicle equipped with a Raspberry Pi and a Raspberry Pi camera to take images of underwater pipes and then used Canny edge detection to identify cracks. Shaikat et al. (Shaikat et al., 2021) designed a similar DC-motor-powered robot based on the Raspberry Pi and equipped it with an ultrasound sensor, GPS, and a webcam to detect pipeline defects.
2.1 Image Processing Techniques
Digital image processing techniques identify the de-
fective region inside the pipe. Region growing (Yang
and Su, 2009), Thresholding (Zhong et al., 2018),
and Otsu’s binarization (Saranya et al., 2014) are frequently used for segmenting corrosion and crack regions. Prema et al. (Prema Kirubakaran and Murali Krishna, 2018) used the Kohonen clustering network, Canny edge detection, and a mathematical morphological operator algorithm to develop a system for visually identifying oil pipeline defects. A colour-based technique was created by Venkatasainath et al. (Bondada et al., 2018) to identify and measure corrosion in pipes. The mean saturation value across all pixels, computed from the hue, saturation, and intensity of surface photographs of the pipes, was used to identify the corroded area. A morphological procedure then quantifies the level of corrosion once the corroded area has been identified; this method can only be used for corrosion detection. To find defects in CCTV images of pipes with low image quality and poor lighting, Yang and Su (Yang and Su, 2009) proposed Otsu’s segmentation approach.
2.2 Machine Learning-Based
Techniques
Traditional ML classifiers have been employed in many projects with various feature sets for multiclass pipe defects (Mangayarkarasi et al., 2019), (De Masi et al., 2015). To categorise defects in sewer pipes, Wei Wu et al. (Wu et al., 2015) performed feature extraction and used AdaBoost, random forest, and rotation forest in ensemble learning. To identify defects in CCTV images, Halfawy and Hengmeechai (2014) suggested a segmentation-based classification method, using Otsu’s image segmentation to extract the Region of Interest (ROI) and an SVM to classify the defects. Duran et al. (Duran et al., 2007) created maps and visualised signal information for further analysis; a binary classifier first identifies faulty pipe segments, and the defects are then classified as holes, cracks, and foreign obstacles.
2.3 YOLO-Based Techniques
Various works and comparisons have been made using YOLO models for underwater pipe inspection (Gašparović et al., 2022), (Gašparović et al., 2023).
Bastian et al. (Bastian et al., 2019) built a com-
prehensive pipeline corrosion dataset and achieved
98.8% accuracy in categorizing corrosion levels using
a customized convolutional neural network (CNN).
Another study by Chen et al. (Chen et al., 2022)
proposed an autonomous defect detection system for
petrochemical pipeline systems. The approach uti-
lized an enhanced YOLOv5 and Cycle-GAN to ad-
dress issues of low sample counts and category imbal-
ances. It incorporated a self-attention mechanism and
vision transformer, and achieved an mAP of 93.10%
for detecting faulty areas on pipes. Situ et al. (Situ et al., 2023) presented a real-time detection system based on YOLOv5, integrating object detection networks, migration learning, and channel pruning techniques; the strategy increased both accuracy and speed. Zhang et al. (Zhang et al., 2023) proposed an improved YOLOv4 model using spatial pyramid pooling (SPP) to identify sewage defects. According to their experiments, the accuracy of the improved model is 4.6% higher than that of the base YOLOv4 model. In similar fields, such as vacuum glass tube defect detection (Sheng et al., 2023) and metal pipe surface defect detection (Nabizadeh and Parghi, 2023), the YOLOv7 object detection model has been used with attention. Radiography-image-based defect detection (Wang et al., 2022) and metal pipe welding defect classification using MobileNet (Moshayedi et al., 2022) have also been proposed in the literature.
However, while these projects demonstrate the
potential of using embedded boards like Raspberry
Pi to deploy camera-based defect detection systems,
there are several limitations to the current approaches.
Many of these robots operate in a semi-automatic
manner, where video feeds are transmitted for later
analysis due to the resource-intensive nature of the
models available today. The existing solutions often
require substantial computational power for real-time
defect detection, rendering them impractical for de-
ployment on low-powered IoT devices.
This paper addresses limitations in current de-
fect detection systems by combining the YOLO (You
Only Look Once) object detection framework with
the Efficient Channel Attention Mechanism (ECAM).
This fusion reduces model complexity and size, mak-
ing it suitable for low-powered IoT devices like
Raspberry Pi. YOLO’s real-time capabilities enable
prompt defect identification, while ECAM enhances
feature representation without significant computa-
tional overhead. This optimized synergy allows for
efficient real-time defect detection on underwater
robots, ensuring timely maintenance and streamlined
inspection operations.
3 METHODOLOGY
Utilizing deep learning object detection for identify-
ing pipe defects improves inspection and industrial
automation. However, accurately classifying multiple defects with low inference time poses a challenge, especially for lightweight models compared to larger ones. This work employs an enhanced YOLOv7-tiny version for object detection, chosen based on the application requirements. YOLOv7 is known for its accuracy and real-time detection speed, but there is a trade-off between speed and accuracy (Huang et al., 2017). In our application, minimizing this trade-off is crucial for efficient real-time detection.
3.1 The YOLOv7 Models
Figure 2: Simplified structure of YOLO algorithm.
YOLO models, including the latest YOLOv7, are single-stage object detectors with a pipeline illustrated in Figure 2. YOLOv7 extracts features using ELAN in the backbone; the extracted features are fused and mixed in the neck before detection in the head. Incorporating focal loss in YOLOv7 improves small-object detection by adjusting loss weights, and the model also processes higher-resolution images. YOLOv7-tiny, a lightweight variant, maintains the base model's backbone with ELAN but has fewer convolutional layers, retaining the original architecture without reducing channel width.
3.2 Proposed Model Architecture
This paper suggests a novel pipe defect detection al-
gorithm and uses YOLOv7 Tiny as the base model.
This method makes three novel contributions:
1. Reducing the ELAN structure by having fewer convolution layers before concatenation, lowering complexity and hence inference time.
2. Implementation of ECAM at the end of the backbone with an additional convolution layer to compensate for the removed layers, increasing model accuracy.
3. Addition of ECAM in the head of the network before the concatenation with the P5 and P4 layers of the backbone, which also contributes to the accuracy improvement when the channel width is reduced.
3.2.1 Reduced ELAN
The backbone of Attentive-YOLO employs a reduced ELAN (R-ELAN), in which the number of layers is reduced relative to ELAN, combined with max-pooling layers and the PReLU (parametric rectified linear unit) activation function for faster training and lower inference time. The network structure of the reduced ELAN is shown in Figure 3.
Figure 3: Reduced ELAN module (R-ELAN).
As shown in Figure 3, R-ELAN has three convolutional blocks. The outputs of the preceding layers are passed through a max-pooling layer, concatenated, and then passed through another convolutional block, whose output is forwarded to the next layer, shown here as the output.
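A minimal PyTorch sketch of an R-ELAN block is given below. The channel widths and exact branch wiring are assumptions made for illustration from Figure 3, not a definitive implementation; the max-pooling applied before each R-ELAN in the final model is kept outside the block.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv -> BatchNorm -> PReLU block, as described in Section 3.2.3."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class RELAN(nn.Module):
    """Reduced ELAN: three convolutional blocks whose outputs are
    concatenated and fused by a final block (one fewer convolution
    than the ELAN of the original YOLOv7-tiny)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_mid = c_out // 2                      # hidden width (assumed)
        self.branch1 = CBS(c_in, c_mid, k=1)    # shortcut branch
        self.branch2 = CBS(c_in, c_mid, k=1)    # main branch, stage 1
        self.branch3 = CBS(c_mid, c_mid, k=3)   # main branch, stage 2
        self.fuse = CBS(3 * c_mid, c_out, k=1)  # fusion after concatenation

    def forward(self, x):
        y1 = self.branch1(x)
        y2 = self.branch2(x)
        y3 = self.branch3(y2)
        return self.fuse(torch.cat([y1, y2, y3], dim=1))
```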
3.2.2 Efficient Channel Attention (ECA)
Attention Mechanism
In Efficient Channel Attention, all channels of the input feature map are first passed through a global average pooling operation. Next, channel weights are produced by a 1D convolution with kernel size $K_{ac}$. After the corresponding channel weights are calculated, they are applied to the original input (Wang et al., 2020), (Wang et al., 2022).
Figure 4: ECA Mechanism Structure.
The input features are multiplied by these channel weights and then used as the input for the next layer. Equation 1 shows how this method adaptively determines $K_{ac}$ from the channel dimension $C$:

$C = \phi(K_{ac}) = 2^{(\lambda K_{ac} - b)}$, $\quad K_{ac} = \psi(C) = \left| \frac{\log_2(C)}{\lambda} + \frac{b}{\lambda} \right|_{odd}$ (1)

where $\lambda = 2$, $b = 1$, and $|\cdot|_{odd}$ denotes rounding to the nearest odd value.
Despite being lightweight in design, ECA can
eliminate unnecessary data and concentrate on valu-
able semantic details within feature maps. This is
achieved without reducing the dimensions of the data.
Figure 4 shows the structure of the ECA Mechanism.
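The ECA operation is compact enough to state directly. The following PyTorch sketch, assuming a standard (N, C, H, W) feature layout, computes the kernel size via Equation 1 and applies the channel reweighting:

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention (Wang et al., 2020): global average
    pooling, a 1D convolution across channels, and a sigmoid gate.
    Kernel size K_ac is derived from the channel count C via Equation 1."""
    def __init__(self, channels, lam=2, b=1):
        super().__init__()
        k = int(abs(math.log2(channels) / lam + b / lam))
        k_ac = k if k % 2 == 1 else k + 1      # nearest odd value
        self.conv = nn.Conv1d(1, 1, kernel_size=k_ac,
                              padding=k_ac // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                      # x: (N, C, H, W)
        y = x.mean(dim=(2, 3))                 # global average pool -> (N, C)
        y = self.conv(y.unsqueeze(1))          # 1D conv across channels
        w = self.sigmoid(y).squeeze(1)         # channel weights (N, C)
        return x * w[:, :, None, None]         # reweight the input features
```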
3.2.3 CBS Layers, Loss Function and Activation
Function
Each convolutional block shown in the figures has a
convolutional layer at the beginning and an activation
function at the end, with a batch normalization oper-
ation in the middle. The YOLOv7 loss function comprises three components: the bounding box loss (CIoU loss, Equation 3), the objectness loss, and the classification loss, the latter two being BCEWithLogitsLoss (BCE-Loss, Equation 2) because of its efficient handling of imbalanced classes.
$\mathrm{BCE\text{-}Loss}(x, t) = -t \cdot \log(\sigma(x)) - (1 - t) \cdot \log(1 - \sigma(x))$ (2)

where $x$ represents the input logits or predicted values, $t$ the target values or labels, and $\sigma$ the sigmoid function, which maps the logits to probabilities between 0 and 1.
$\mathrm{CIoU\ Loss} = 1 - \mathrm{IoU} + \frac{d^2(c_1, c_2)}{r^2} + \alpha \cdot v$ (3)

where IoU is the Intersection over Union, $d(c_1, c_2)$ represents the distance between the centres of the predicted and ground-truth bounding boxes, $r^2$ represents the squared diagonal length of the smallest enclosing box, $\alpha$ is a hyperparameter that balances the impact of the enclosing-box term, and $v$ represents an additional penalty term that encourages more accurate bounding box regression.
$\mathrm{PReLU}(x) = \begin{cases} x, & x \geq 0 \\ \alpha x, & \text{otherwise} \end{cases}$ (4)

PReLU, as shown in Equation 4, is a variant of the Leaky ReLU function in which the parameter $\alpha$ is learned during training.
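For reference, the sketch below implements the objectness/classification term with PyTorch's built-in BCEWithLogitsLoss and a CIoU-style box loss following Equation 3; the computation of $v$ and $\alpha$ follows the standard CIoU formulation, which the equations above leave implicit.

```python
import math
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # objectness and classification terms (Eq. 2)

def ciou_loss(pred, target, eps=1e-7):
    """CIoU-style box loss (Eq. 3) for boxes in (x1, y1, x2, y2) format."""
    # IoU term
    ix1 = torch.max(pred[..., 0], target[..., 0])
    iy1 = torch.max(pred[..., 1], target[..., 1])
    ix2 = torch.min(pred[..., 2], target[..., 2])
    iy2 = torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    ap = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    at = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (ap + at - inter + eps)

    # d^2(c1, c2): squared distance between box centres
    dx = (pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) / 2
    dy = (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) / 2
    d2 = dx ** 2 + dy ** 2

    # r^2: squared diagonal of the smallest enclosing box
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    r2 = cw ** 2 + ch ** 2 + eps

    # aspect-ratio penalty v and its weight alpha (standard CIoU definitions)
    wp = (pred[..., 2] - pred[..., 0]).clamp(eps)
    hp = (pred[..., 3] - pred[..., 1]).clamp(eps)
    wt = (target[..., 2] - target[..., 0]).clamp(eps)
    ht = (target[..., 3] - target[..., 1]).clamp(eps)
    v = (4 / math.pi ** 2) * (torch.atan(wt / ht) - torch.atan(wp / hp)) ** 2
    alpha = (v / (1 - iou + v + eps)).detach()

    return (1 - iou + d2 / r2 + alpha * v).mean()
```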
3.3 Final Model
As shown in Figure 5, we add an ECA block (ECAM) at the end of the backbone, with a convolutional block before and after it. After every upsample in the neck of the network, an ECAM is added to capture the features for final detection. Besides this, there are several max-pooling layers before each R-ELAN. The initial part of the YOLOv7-tiny model is a simplified version of the SPPCSPC (CSPNet with Spatial Pyramid Pooling layer) block from the base model. Compared to the original YOLOv7-tiny model, we use RepConv instead of regular convolutional blocks in the head of the network. RepConv combines a 3x3 convolution, a 1x1 convolution, and an identity connection in one convolutional layer (Ding et al., 2021), except that here the identity connection is skipped.
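A sketch of this RepConv block, with the identity branch omitted as described, is shown below; only the training-time form is given, and the standard re-parameterisation that fuses both branches into a single 3x3 convolution at deployment is left out for brevity.

```python
import torch
import torch.nn as nn

class RepConv(nn.Module):
    """RepVGG-style block (Ding et al., 2021) without the identity branch,
    as used in the head here: a 3x3 and a 1x1 convolution are summed at
    training time and can later be fused into one 3x3 convolution."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv3x3 = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False),
            nn.BatchNorm2d(c_out))
        self.conv1x1 = nn.Sequential(
            nn.Conv2d(c_in, c_out, 1, stride, 0, bias=False),
            nn.BatchNorm2d(c_out))
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.conv3x3(x) + self.conv1x1(x))
```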
4 EXPERIMENTATION AND
RESULTS
For training and initial testing, we used a system running Ubuntu 20.04 LTS equipped with two Nvidia RTX 3090 Ti GPUs with 24 GB of memory each. Python 3.9 and PyTorch 2.0.1 were used for the runtime environment and framework, respectively. The training parameters were kept identical to those of the original YOLOv7 model.
4.1 Dataset
This paper uses an open-source dataset (solinas xml to txt, 2023) consisting of 1872 images gathered from CCTV footage taken inside in-service water pipes in a sewage system, with inside diameters varying from 2.94 to 5.94 inches. The dataset contains eight classes, namely root blockage (rb), encrustation (en), ferrule (fr), joint (jt), pipe surface damage (pd), shape deformation (sd), slug accumulation (sa), and stone or obstacles (st), with an 80/10/10 train/test/val split.
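A minimal sketch of how such a split can be reproduced is shown below; the class ordering and directory layout are assumptions for illustration, since the dataset's published structure may differ.

```python
import random
from pathlib import Path

# Class indices as used in YOLO-format label files (names from Section 4.1).
CLASSES = ["rb", "en", "fr", "jt", "pd", "sd", "sa", "st"]

def split_dataset(image_dir, seed=0):
    """Shuffle the images and produce the 80/10/10 train/test/val split
    described above. The *.jpg layout is an assumption, not the
    dataset's published structure."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train, n_test = int(0.8 * n), int(0.1 * n)
    return (images[:n_train],                  # train
            images[n_train:n_train + n_test],  # test
            images[n_train + n_test:])         # val
```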
4.2 Experimental Results
We evaluated several models, including YOLOv5, by training them on the dataset described in Section 4.1. Two approaches were explored for YOLOv7, shown as separate sections of Table 1: first, maintaining full channel width in each layer, and second, scaling the width by a multiplication factor inherent in the YOLOv7 network. For the latter option, the lowest feasible multiple was chosen to minimise model size and computation while preserving accuracy and speed in defect detection.
Table 1: Ablation Experiments Results.

Method | mAP@0.5 | mAP@0.5:0.95 | Size (MB) | Time (ms) | Params (M)
-- Previous Generation of YOLO Models --
YOLOv5-l | 0.98 | 0.70 | 92.9 | 7.0 | 46.1
YOLOv5-m | 0.983 | 0.69 | 42.0 | 4.4 | 20.0
YOLOv5-s | 0.979 | 0.68 | 13.9 | 3.4 | 7.03
YOLOv5-n | 0.98 | 0.68 | 3.9 | 3.5 | 1.7
-- Full Channel Width [YOLOv7] --
YOLOv7 | 0.971 | 0.643 | 74.9 | 5.7 | 37.2
YOLOv7-Tiny | 0.975 | 0.658 | 12.3 | 3.0 | 6.02
YOLOv7-Tiny + CBAM | 0.959 | 0.551 | 10.34 | 4.8 | 5.03
Proposed Method | 0.973 | 0.634 | 10.26 | 2.3 | 4.9
-- 1/3rd Model Depth + 1/4th Channel Width [YOLOv7] --
YOLOv7 | 0.951 | 0.628 | 5.1 | 3.5 | 2.3
YOLOv7-Tiny | 0.93 | 0.58 | 1.0 | 2.8 | 0.38
YOLOv7-Tiny + CBAM | 0.956 | 0.58 | 0.88 | 4.3 | 0.32
Proposed Method | 0.962 | 0.572 | 0.87 | 2.2 | 0.31
As shown in Table 1, the original YOLOv7 performs well but demands powerful hardware due to its larger size; the larger YOLOv7 and YOLOv5 variants do not run efficiently on the Raspberry Pi system. With reduced depth, YOLOv7 achieves 0.951 mAP@0.5 at a 5.1 MB size with an inference time of 3.5 ms. YOLOv7-Tiny gets 0.93 mAP@0.5 at 1.0 MB, but with slower inference than the proposed model. With CBAM, YOLOv7-Tiny reaches 0.956 mAP@0.5 but with slower inference, unsuitable for the Raspberry Pi.
Figure 5: Architecture of the proposed Attentive YOLO.
Our proposed model, Attentive YOLO (YOLOv7-Tiny + ECAM), gets 0.962 mAP@0.5 at a similar size, yet with significantly faster inference than the base models, i.e. a favourable speed-accuracy trade-off.
Table 2 shows per-class accuracies on the dataset. Compared to the original models, our proposed Attentive YOLO performs consistently in both full-channel and reduced-channel modes.
Table 2: Model performance on the water pipeline image dataset (solinas xml to txt, 2023).

Model | rb | en | fr | jt | pd | sd | sa | st
-- Previous Generation of YOLO Models --
YOLOv5-l | 0.995 | 0.929 | 0.966 | 0.995 | 0.995 | 0.995 | 0.995 | 0.977
YOLOv5-m | 0.995 | 0.929 | 0.965 | 0.995 | 0.995 | 0.995 | 0.995 | 0.993
YOLOv5-s | 0.995 | 0.913 | 0.949 | 0.995 | 0.995 | 0.995 | 0.995 | 0.991
YOLOv5-n | 0.995 | 0.928 | 0.965 | 0.995 | 0.995 | 0.995 | 0.995 | 0.99
-- Full Channel Width [YOLOv7] --
YOLOv7 | 0.996 | 0.918 | 0.972 | 0.979 | 0.995 | 0.996 | 0.996 | 0.975
YOLOv7-Tiny | 0.996 | 0.908 | 0.972 | 0.917 | 0.995 | 0.996 | 0.996 | 0.98
YOLOv7-Tiny + CBAM | 0.996 | 0.912 | 0.975 | 0.996 | 0.996 | 0.997 | 0.996 | 0.991
Proposed Method | 0.995 | 0.897 | 0.956 | 0.995 | 0.995 | 0.996 | 0.996 | 0.992
-- Reduced Model Depth and Channel Width [YOLOv7] --
YOLOv7 | 0.995 | 0.892 | 0.971 | 0.786 | 0.995 | 0.995 | 0.995 | 0.981
YOLOv7-Tiny | 0.933 | 0.821 | 0.93 | 0.928 | 0.904 | 0.995 | 0.992 | 0.954
YOLOv7-Tiny + CBAM | 0.995 | 0.876 | 0.902 | 0.995 | 0.902 | 0.995 | 0.995 | 0.985
Proposed Method | 0.995 | 0.875 | 0.927 | 0.976 | 0.956 | 0.995 | 0.995 | 0.975
As the table shows, when the model depth is reduced, the accuracy of the original YOLOv7-Tiny drops significantly. Adding CBAM certainly increases accuracy; however, the runtime performance when deployed on a Raspberry Pi Model 3B+, shown in Table 3, is reduced. With Attentive YOLO, accuracy is nearly consistent with the full-depth models while runtime performance is maintained.
Table 3: Performance of the models on the Raspberry Pi.

Model | Min FPS | Avg FPS | Max FPS | Inference Time (ms)
-- Previous Generation of YOLO Models --
YOLOv5-l | NULL | NULL | NULL | NULL
YOLOv5-m | 0.02 | 0.04 | 0.05 | 20307
YOLOv5-s | 0.12 | 0.129 | 0.134 | 7773
YOLOv5-n | 0.22 | 0.29 | 0.35 | 3481
-- Full Channel Width [YOLOv7] --
YOLOv7 | NULL | NULL | NULL | NULL
YOLOv7-Tiny | 0.16 | 0.23 | 0.30 | 4259
YOLOv7-Tiny + CBAM | 0.19 | 0.28 | 0.33 | 3493
Proposed Method | 0.22 | 0.28 | 0.35 | 3385
-- Reduced Model Depth and Channel Width [YOLOv7] --
YOLOv7 | 0.19 | 0.26 | 0.35 | 3716
YOLOv7-Tiny | 0.53 | 0.87 | 1.36 | 946
YOLOv7-Tiny + CBAM | 0.40 | 0.89 | 1.39 | 902
Proposed Method | 0.53 | 0.95 | 1.18 | 878
As shown in Table 3, the proposed Attentive YOLO performs efficiently with respect to inference speed and FPS, considering that it runs on a single-chip Raspberry Pi computer. The original YOLOv7 model encounters runtime issues, causing device crashes, while the YOLOv7-Tiny model exhibits sluggish performance with inference times nearing 4 seconds per frame. In contrast, our proposed Attentive YOLO with R-ELAN demonstrates substantial improvements tailored to our use case, achieving approximately 3.4 seconds per frame for the full-depth model and, for the reduced model, nearly 1 frame per second (FPS), i.e. roughly 0.9 seconds of inference time on average, surpassing the performance of CBAM with reduced width.
Figure 6A shows the simulated setup of the device with a USB webcam connected to the Raspberry Pi system, and Figure 6B shows the prediction results. For the real-world simulation, video of the pipe was fed to the Raspberry Pi both directly as a file and via the webcam.
Figure 6: Attentive-YOLO Experimental Setup & Results.
The test images were augmented (rotated, cropped, flipped, etc.) in several ways to challenge the model. Inference times for the webcam and the direct video feed are averaged and shown in Table 3.
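A sketch of the kind of timing loop used for these measurements is shown below; the model file name, input resolution, and pre-processing are illustrative assumptions rather than the exact benchmark script.

```python
import time
import cv2
import torch

# Hypothetical exported model; the paper does not publish this artefact.
model = torch.jit.load("attentive_yolo.pt").eval()
cap = cv2.VideoCapture(0)  # USB webcam, or a video file path for the file feed

times = []
while len(times) < 100:
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.resize(frame, (640, 640))        # assumed input resolution
    x = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    t0 = time.perf_counter()
    with torch.no_grad():
        _ = model(x)                           # forward pass only
    times.append(time.perf_counter() - t0)

avg = sum(times) / len(times)
print(f"avg inference: {avg * 1000:.0f} ms, avg FPS: {1 / avg:.2f}")
cap.release()
```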
5 CONCLUSION AND FUTURE
WORK
In this paper, we address the challenge of inspecting and repairing water pipes by leveraging computer vision techniques for non-destructive testing, proposing a modified version of the YOLOv7-Tiny model that incorporates R-ELAN with ECAM into the base architecture. This efficient model is suitable for deployment on small-scale computing devices like the Raspberry Pi in IoT and robotics applications such as pipeline inspection robots. The performance evaluation demonstrates that our proposed model, Attentive YOLO, outperforms the base YOLOv7 and YOLOv7-Tiny models on a single-chip computer in terms of inference speed and frames per second. The base YOLOv7 model fails to run on the device, leading to system crashes, while the YOLOv7-Tiny model exhibits slow inference, taking nearly 4 seconds per frame. In contrast, Attentive YOLO achieves an average inference time of approximately 0.9 seconds per frame on the Raspberry Pi for the reduced model, corresponding to nearly 1 FPS on average, while retaining considerable accuracy compared to state-of-the-art models. Future work can focus on refining the model further, exploring additional optimizations, and evaluating its performance in real-world pipeline inspection scenarios to ensure its practical applicability, scalability, and implementation in a pipe inspection robot.
REFERENCES
Ab Rashid, M. Z., Yakub, M. F. M., bin Shaikh Salim, S.
A. Z., Mamat, N., Putra, S. M. S. M., and Roslan,
S. A. (2020). Modeling of the in-pipe inspection
robot: A comprehensive review. Ocean Engineering,
203:107206.
Aliff, M., Hanisah, N. F., Ashroff, M. S., Hassan, S., Nurr,
S. F., and Sani, N. S. (2022). Development of un-
derwater pipe crack detection system for low-cost un-
derwater vehicle using raspberry pi and canny edge
detection method. International Journal of Advanced
Computer Science and Applications, 13(11).
Bastian, B. T., N, J., Ranjith, S. K., and Jiji, C. (2019). Vi-
sual inspection and characterization of external corro-
sion in pipelines using deep neural network. NDT &
E International, 107:102134.
Bondada, V., Pratihar, D. K., and Kumar, C. S. (2018). De-
tection and quantitative assessment of corrosion on
pipelines through image analysis. Procedia Computer
Science, 133:804–811.
Chen, K., Li, H., Li, C., Zhao, X., Wu, S., Duan, Y., and
Wang, J. (2022). An automatic defect detection sys-
tem for petrochemical pipeline based on cycle-gan and
yolo v5. Sensors, 22(20):7907.
De Masi, G., Gentile, M., Vichi, R., Bruschi, R., and Ga-
betta, G. (2015). Machine learning approach to cor-
rosion assessment in subsea pipelines. In OCEANS
2015-Genova, pages 1–6. IEEE.
Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., and Sun,
J. (2021). Repvgg: Making vgg-style convnets great
again. In Proceedings of the IEEE/CVF conference
on computer vision and pattern recognition, pages
13733–13742.
Duran, O., Althoefer, K., and Seneviratne, L. D. (2007).
Automated pipe defect detection and categorization
using camera/laser-based profiler and artificial neural
network. IEEE Transactions on Automation Science
and Engineering, 4(1):118–126.
Gašparović, B., Lerga, J., Mauša, G., and Ivašić-Kos, M. (2022). Deep learning approach for objects detection in underwater pipeline images. Applied Artificial Intelligence, 36(1):2146853.
Gašparović, B., Mauša, G., Rukavina, J., and Lerga, J. (2023). Evaluating yolov5, yolov6, yolov7, and yolov8 in underwater environment: Is there real improvement? In 2023 8th International Conference on Smart and Sustainable Technologies (SpliTech), pages 1–4. IEEE.
Guo, W., Zhang, X., Zhang, D., Chen, Z., Zhou, B., Huang,
D., and Li, Q. (2022). Detection and classification of
pipe defects based on pipe-extended feature pyramid
network. Automation in Construction, 141:104399.
Halfawy, M. R. and Hengmeechai, J. (2014). Automated de-
fect detection in sewer closed circuit television images
using histograms of oriented gradients and support
vector machine. Automation in Construction, 38:1–
13.
Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A.,
Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadar-
rama, S., et al. (2017). Speed/accuracy trade-offs for
modern convolutional object detectors. In Proceed-
ings of the IEEE conference on computer vision and
pattern recognition, pages 7310–7311.
Kalita, N. (2023). Assam: Water pipeline connecting Khar-
guli reservoir explodes, several injured.
Korlapati, N. V. S., Khan, F., Noor, Q., Mirza, S., and Vad-
diraju, S. (2022). Review and analysis of pipeline leak
detection methods. Journal of Pipeline Science and
Engineering, page 100074.
Mangayarkarasi, N., Raghuraman, G., and Kavitha, S.
(2019). Influence of computer vision and iot for
pipeline inspection-a review. In 2019 International
Conference on Computational Intelligence in Data
Science (ICCIDS), pages 1–6.
Moshayedi, A. J., Khan, A. S., Yang, S., and Zanjani, S. M.
(2022). Personal image classifier based handy pipe de-
fect recognizer (hpd): Design and test. In 2022 7th In-
ternational Conference on Intelligent Computing and
Signal Processing (ICSP), pages 1721–1728. IEEE.
Nabizadeh, E. and Parghi, A. (2023). Automated corro-
sion detection using deep learning and computer vi-
sion. Asian Journal of Civil Engineering, pages 1–13.
Prema Kirubakaran, A. and Murali Krishna, I. (2018).
Pipeline crack detection using mathematical morpho-
logical operator. Knowledge Computing and its Appli-
cations: Knowledge Computing in Specific Domains:
Volume II, pages 29–46.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2016). You only look once: Unified, real-time ob-
ject detection. In 2016 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), pages 779–
788.
Saranya, R., Daniel, J., Abudhahir, A., and Chermakani, N.
(2014). Comparison of segmentation techniques for
detection of defects in non-destructive testing images.
In 2014 International Conference on Electronics and
Communication Systems (ICECS), pages 1–6. IEEE.
Shaik, N. B., Benjapolakul, W., Pedapati, S. R., Bingi,
K., Le, N. T., Asdornwised, W., and Chaitusaney, S.
(2022). Recurrent neural network-based model for es-
timating the life condition of a dry gas pipeline. Pro-
cess Safety and Environmental Protection, 164:639–
650.
Shaikat, A. S., Hussein, M. R., and Tasnim, R. (2021). De-
sign and development of a pipeline inspection robot
for visual inspection and fault detection. In Proceed-
ings of Research and Applications in Artificial Intelli-
gence: RAAI 2020, pages 243–253. Springer.
Sharma, R. R. (2021). Gas leakage detection in pipeline by
svm classifier with automatic eddy current based de-
fect recognition method. Journal of Ubiquitous Com-
puting and Communication Technologies (UCCT),
3(03):196–212.
Sheng, Z., Chen, H., and Qi, Z. (2023). Cbam-based
method in yolov7 for detecting defective vacuum glass
tubes. In Proceedings of the 2023 2nd Asia Confer-
ence on Algorithms, Computing and Machine Learn-
ing, pages 413–418.
Silva, W., Lopes, R., Zscherpel, U., Meinel, D., and Ewert,
U. (2021). X-ray imaging techniques for inspection of
composite pipelines. Micron, 145:103033.
Situ, Z., Teng, S., Liao, X., Chen, G., and Zhou, Q. (2023).
Real-time sewer defect detection based on yolo net-
work, transfer learning, and channel pruning algo-
rithm. Journal of Civil Structural Health Monitoring,
pages 1–17.
solinas xml to txt (2023). kapilproject dataset. https://universe.roboflow.com/solinas-xml-to-txt/kapilproject. Visited on 2023-06-17.
Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., and Hu, Q.
(2020). Eca-net: Efficient channel attention for deep
convolutional neural networks. In Proceedings of the
IEEE/CVF conference on computer vision and pattern
recognition, pages 11534–11542.
Wang, Y., Wang, H., and Xin, Z. (2022). Efficient detection
model of steel strip surface defects based on yolo-v7.
IEEE Access, 10:133936–133944.
Wu, W., Liu, Z., and He, Y. (2015). Classification of de-
fects with ensemble methods in the automated visual
inspection of sewer pipes. Pattern Analysis and Ap-
plications, 18:263–276.
Yan, X. and Song, X. (2020). An image recognition al-
gorithm for defect detection of underground pipelines
based on convolutional neural network. Traitement du
Signal, 37(1).
Yang, M.-D. and Su, T.-C. (2009). Segmenting ideal mor-
phologies of sewer pipe defects on cctv images for au-
tomated diagnosis. Expert Systems with Applications,
36(2):3562–3573.
Zhang, J., Liu, X., Zhang, X., Xi, Z., and Wang, S. (2023).
Automatic detection method of sewer pipe defects
using deep learning techniques. Applied Sciences,
13(7):4589.
Zhong, X., Peng, X., Yan, S., Shen, M., and Zhai, Y. (2018).
Assessment of the feasibility of detecting concrete
cracks in images acquired by unmanned aerial vehi-
cles. Automation in Construction, 89:49–57.