Application of Deep Learning to the Detection of Foreign Object Debris
at Aerodromes’ Movement Area
João Almeida, Gonçalo Cruz, Diogo Silva and Tiago Oliveira
Portuguese Air Force Academy Research Center, Sintra, Portugal
Keywords:
Foreign Object Debris, Computer Vision, Dataset, Image Classification, Object Detection.
Abstract:
This work describes a low-cost and passive system installed on ground vehicles that detects Foreign Object
Debris (FOD) at aerodromes’ movement area, using neural networks. In this work, we created a dataset
of images collected at an airfield to test our proposed solution, using three different electro-optical sensors,
capturing images in different wavelengths: i) visible, ii) near-infrared plus visible and iii) long-wave infrared.
The first sensor captured 9,497 images, the second 5,858, and the third 10,388. Unlike other works in this field,
our dataset is publicly available and was collected according to our envisioned real-world application. We rely on image classification, object detection and image segmentation networks to find objects in the image. For the classifier and the detector, we chose Xception and YOLOv3, respectively. For image segmentation, we tested several approaches based on Unet with different backbone networks. The classification task achieved an accuracy of 77.92%, the detection achieved 37.49% mAP and the segmentation network achieved 26.9% mIoU.
1 INTRODUCTION
In aviation, safety plays a fundamental role and pre-
vention is the preferred method to assure it. Foreign
Object Debris (FOD) are one of the biggest threats to
aircraft’s ground operation (Kraus and Watson, 2001).
In addition, the costs associated with FOD reach over
$5 billion globally every year (McCreary, 2010).
Aerodromes perform regular visual inspections of the movement area every day to assure the safe circulation of aircraft. More recently, radar and electro-optical systems have started to be implemented at some major airports. These are capable of accurately detecting FOD in a wide range of weather conditions; however, their implementation cost is a major downside for aerodromes with fewer movements.
In recent years, the advent of deep learning and
computer vision allowed the implementation of solu-
tions to tasks that previously required visual inspection by humans, bringing lower costs and near-human
accuracy. Works by (Cao et al., 2018) and (Han et al.,
2015a) reveal that the application of computer vision
to this problem is viable.
FOD Characterisation
The characterisation and definition of FOD are broad
since anything that should not be at the movement
area of an aerodrome is foreign to that place. In the
case of the Portuguese Air Force (PoAF), FOD are di-
vided into categories and types, classifying them ac-
cording to their source and material. However, dif-
ferent organisations describe FOD differently from
PoAF (PoAF, 2018).
The materials of the most commonly found ob-
jects are metal (60%) and rubber (19%) while 50% are
dark coloured. Tool pieces, ground equipment, pave-
ment debris and metal from unknown sources are the
objects with greater representation. In terms of size,
FOD can be catalogued in two major groups: clusters of debris with individual sizes below 2 cm and FOD individually larger than 2 cm (90%). Although FOD are more prevalent in apron areas, most strikes occur on runways and taxiways (McCreary, 2010), where engine regimes and speeds amplify the safety risks. The Australian Transport Safety Bureau (ATSB, 2010) found that 11% of FOD occurrences led to wheel, engine and airframe damage. Moreover, (McCreary, 2010) concluded that FOD strikes occur 4.0 times per 10,000 movements, and 79% of those (3.2/10,000) inflicted damage to the aircraft. In terms of repair and replacement, FOD inflict
an average cost of $10,366 per strike. Regarding military aviation, the risks associated with FOD are no different from civil aviation. The aircraft which suffer most from FOD damage are those with turbofan or turbojet engines and low-placed air intakes (Warren et al., 2005).
FOD Detection Systems
Although many airports still rely on the traditional
methods of detecting FOD, larger ones started to
implement radar and electro-optical solutions. Cur-
rently, a number of systems are available on the market (FAA, 2018) and these can be divided into two groups: fixed and mobile. Fixed systems provide continuous surveillance and are installed either on the light fixtures of runways and taxiways or on towers near them. Mobile systems are installed on the back of vehicles solely dedicated to that purpose and must detect FOD while moving at a minimum speed of 30 km/h (FAA, 2009). The
accuracy of these systems is higher than 95% as reg-
ulated by the Federal Aviation Administration (FAA).
The two major downsides of these systems are their
acquisition and maintenance costs as well as the per-
missions required, making them difficult to install at
medium and small airports.
FOD Detection with Computer Vision
According to (Huang, 1996), computer vision aims
to develop computational models that imitate the hu-
man visual sensory system in order to develop au-
tonomous systems. These models have experienced
developments with larger databases of images, more
efficient computation hardware and especially the in-
troduction of more powerful machine learning algo-
rithms (Shapiro, 2020). The most relevant type of ma-
chine learning algorithms for computer vision in the
last decade are Neural Networks (NNs). NNs have en-
abled many new applications in computer vision and
FOD is no exception.
The first application of computer vision to FOD
detection was made with conventional methods such
as Local Binary Patterns (Han et al., 2015b) and His-
togram of Oriented Gradients (HOG), but these meth-
ods were not able to cope with background variation
(Cao et al., 2018). More recently, other authors em-
ployed object detection based on NN. In (Cao et al.,
2018) a framework for FOD detection is introduced
and is composed of two stages. The first is based on
a region proposal network and the second combines a
spatial transformer network with a classification net-
work. (Liu et al., 2018) also use a region proposal network, now associated with Faster R-CNN and Focal Loss to improve the detection of small objects. (Li and Li, 2020), on the other hand, follow a simpler approach, using just a single-stage detection network:
You Only Look Once v3 (YOLOv3). Despite the au-
thors claiming good results, their datasets and trained
models’ weights are not publicly available which lim-
its comparisons and even their applicability.
Paper Contributions and Organization
This paper contributes to the development of low-cost
FOD vision-based detection systems using ground ve-
hicles. In particular, a new image dataset (Almeida
et al., 2022) with FOD objects collected at an aero-
drome is made publicly available. The FOD object
selection for the dataset relied on FAA's Advisory Circular 150/5220-24 (FAA, 2009). The dataset is composed of 25,743 images collected by three different cameras operating in different wavelengths (infrared and visible light spectrum). The
presented low-cost FOD detection system resembles
as much as possible the way a fully operational sys-
tem would work, by using cameras mounted on the
top of a ground vehicle which regularly drives around
the aerodrome. Additionally, a thorough description
of the obtained dataset, the implemented software
(including classification, detection and segmentation
networks) and the obtained performance metrics is
provided, thus constituting a benchmark to be used
by the scientific community.
The remainder of the paper is organized as fol-
lows. Section 2 describes the system’s hardware ar-
chitecture used for the data acquisition and the imple-
mented software pipeline. Then, Section 3 focuses on the process of building our own dataset and its resulting characteristics. Section 4 contains the description of
the experiments that were conducted to evaluate the
proposed methodology. Finally, Section 5 presents
the main conclusions and future work.
2 SYSTEM ARCHITECTURE
Systems which are based on radar and electro-optical
sensors achieve good performance, yet they are ex-
pensive and require several permissions. On the other
hand, a system based on electro-optical sensors em-
bedded in vehicles which move around the aerodrome
is a low-cost, effective and passive solution that sat-
isfies most of the intended goals. Moreover, our vision is to have the system installed in vehicles already moving in the aerodrome area, such as firefighting, towing and service trucks. One objective of our work is to create an FOD dataset to test the viability of implementing the aforementioned embedded system.
Figure 1: Image acquisition details, including the field-of-
view of each sensor. Sensor 01 FOV in blue, 02 in red and
03 in green.
This involves having a mobile platform that simulates as closely as possible the intended deployment of the system, its hardware and connections, and creating a software pipeline.
2.1 Setup Description
The image acquisition platform comprises three el-
ements: vehicle, cameras and embedded computa-
tional boards (see Figure 1). For the vehicle, we opted for a van whose height allows for a greater Field of View (FoV) of the cameras and for an easy installation of the system on its roof. As for the electro-optical sensors, we chose cameras that work on different ranges of the electromagnetic spectrum.
The first camera (sensor 01) has a sensor which
works on the visual spectrum (VIS), the second (sen-
sor 02) on the visual and near-infrared spectra (VIS
+ NIR) and the third (sensor 03) on the long-wave in-
frared spectrum (LWIR). The first and the second sensors are connected to an NVIDIA® Jetson TX2 and a Raspberry Pi v3, respectively. The third sensor is a Gobi-384. The FoV which the cameras provide when installed on the top of the van, at a height of 2.55 m and an inclination of 38° with respect to the horizontal, can
be observed in Figure 1. In the case of sensor 01,
the height of the trapezoid (in blue) is 7.42 m and the
width of the larger base 12.45 m. For sensor 02, the
height of the trapezoid (in red) is 9.21 m and the width
of the larger base 13.08 m. The sensor 03 (in green)
has the smallest FoV with a height of 2.50 m and a
width of the larger base of 2.59 m.
After the image acquisition campaigns, the posi-
tion of the objects in the image was annotated with the open-source image labelling tool Label Studio. The
implementation of the machine learning techniques
was done using Keras, a high-level NN Application
Programming Interface (API) written in Python.
2.2 Neural Networks
Our proposed solutions to detect FOD are based on
a classification network, an object detection network
and an image segmentation network. The classifier
network is Xception (Chollet, 2017), the detector is
YOLOv3 (Redmon and Farhadi, 2018) and the seg-
mentation network is Unet (Ronneberger et al., 2015).
This decision is based on several factors: real-time classification and detection require speed, while the processing capability of the embedded computational boards is limited. Moreover, we aim to establish a bench-
mark for future developments using the dataset.
The main feature of Xception is the use of depthwise separable convolutions. This method reduces the number of learnable parameters and the computational cost, making the model lighter (Lakshmanan et al., 2021).
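To illustrate the parameter savings of depthwise separable convolutions, the following minimal Keras sketch compares a standard 3×3 convolution with its separable counterpart on the same input; the layer sizes are illustrative and do not correspond to the exact Xception configuration used in this work.

```python
from tensorflow import keras

# Illustrative comparison (not the exact Xception configuration): a standard
# 3x3 convolution vs. a depthwise separable convolution on the same input.
inputs = keras.Input(shape=(256, 256, 64))
standard = keras.layers.Conv2D(128, 3, padding="same")(inputs)
separable = keras.layers.SeparableConv2D(128, 3, padding="same")(inputs)

# Standard:  3*3*64*128 + 128      = 73,856 parameters
# Separable: 3*3*64 + 64*128 + 128 =  8,896 parameters
print(keras.Model(inputs, standard).count_params())
print(keras.Model(inputs, separable).count_params())
```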
YOLOv3 is a single-stage fully convolutional ob-
ject detector which uses Darknet-53, trained on Im-
ageNet. Darknet-53 arranges consecutive 1×1 and 3×3 convolutional layers followed by batch normalization and Leaky ReLU, with residual blocks inspired by ResNet (He et al., 2016). YOLOv3 does not have pooling layers; instead, it applies convolutional layers of stride 2 for downsampling. This characteristic further helps de-
tect smaller objects since it preserves low-level fea-
tures. The network generates bounding boxes con-
taining information about the position and size as well
as a confidence score.
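As a rough illustration of the building blocks described above, the sketch below shows a Darknet-53-style convolution block (convolution, batch normalization, Leaky ReLU), a residual block, and downsampling via a stride-2 convolution; it is not the YOLOv3 implementation used in our experiments.

```python
from tensorflow import keras

def conv_bn_leaky(x, filters, kernel_size, strides=1):
    # Convolution + batch normalization + Leaky ReLU, as in Darknet-53.
    x = keras.layers.Conv2D(filters, kernel_size, strides=strides,
                            padding="same", use_bias=False)(x)
    x = keras.layers.BatchNormalization()(x)
    return keras.layers.LeakyReLU(0.1)(x)

def residual_block(x, filters):
    # Consecutive 1x1 and 3x3 convolutions with a ResNet-style skip connection.
    shortcut = x
    x = conv_bn_leaky(x, filters // 2, 1)
    x = conv_bn_leaky(x, filters, 3)
    return keras.layers.Add()([shortcut, x])

inputs = keras.Input(shape=(416, 416, 3))
x = conv_bn_leaky(inputs, 32, 3)
x = conv_bn_leaky(x, 64, 3, strides=2)  # stride-2 convolution instead of pooling
x = residual_block(x, 64)
```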
To perform segmentation, we selected Unet which
is a fully convolutional network. This network comprises two parts, an encoder and a decoder. The encoder consists of a stack of convolutional and max pooling layers that contract the information in the image. The decoder consists mostly of transposed convolution layers, which upsample the feature maps to achieve adequate localization capability. Another important aspect of Unet is that the upsampling part creates a large number of feature channels, thus passing more contextual information to improve localization (Ronneberger et al., 2015). For the backbone network, we tested ResNet-34, ResNet-50 and ResNet-101 (He et al., 2016). The rationale behind this selection was the compromise between feature extraction performance, number of trainable parameters (and thus memory constraints) and number of operations.
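A minimal sketch of such a Unet with a ResNet encoder is shown below; we assume the segmentation_models Keras library here purely for illustration, since the text does not prescribe a specific implementation.

```python
# Minimal sketch of a Unet with a ResNet-50 encoder pre-trained on ImageNet.
# The segmentation_models library is an assumption made for illustration only.
import segmentation_models as sm

sm.set_framework("tf.keras")

model = sm.Unet(
    backbone_name="resnet50",    # also tried in this work: resnet34, resnet101
    encoder_weights="imagenet",  # encoder pre-trained on ImageNet
    classes=1,                   # binary mask: FOD vs. background
    activation="sigmoid",
    input_shape=(832, 832, 3),
)
model.compile(optimizer="adam",
              loss=sm.losses.bce_jaccard_loss,
              metrics=[sm.metrics.iou_score])
```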
3 FOD DATASET
3.1 Existing Datasets
Some authors had already addressed the FOD issue
through machine learning when this project started,
but most datasets are not publicly available. In the
meantime, (Munyer et al., 2021) released a large (compared to the private ones), publicly available dataset, comprising 30,000 images and 31 object categories under three different lighting and two weather conditions. Although that dataset could be useful to our problem, the way the images were captured does not match the way we expect to deploy our system.
We want to implement the cameras at the back of
several vehicles, which limits the point of view of the
camera in relation to the objects (FOD) in terms of
angle and height. Combining these factors with the limitations of the aforementioned datasets, we decided to create our own dataset, detailed in the sequel.
3.2 Objects Selected for the Dataset
In order to guide our selection of FOD, we relied on
FAA's Advisory Circular 150/5220-24 (FAA, 2009).
Consequently, we chose objects that would resemble
as much as possible the descriptions given on the Ad-
visory Circular (AC), and other objects we found rel-
evant based on other reports (ATSB, 2010), papers
(Herricks et al., 2015) (McCreary, 2010) and PoAF’s
prevention plans (PoAF, 2018).
AC 150/5220-24 further describes the perfor-
mance that the FOD detection systems must deliver.
If the system is installed on a mobile platform, it must
be able to detect the FOD at a minimum speed of
30 km/h. The FOD selected followed a thorough se-
lection process which was mainly based on the AC
150/5220-24’s list.
3.3 Data Acquisition and
Characteristics
During this work, we deployed our vehicle with the
cameras in two campaigns at Sintra Air Base, Portu-
gal.
Table 1 provides a general perspective on the num-
ber of frames, labels and objects captured by each
camera, during the first campaign. It also presents the
dimensions of the objects (in pixels). From the table,
we can observe that although the Gobi-384 captured the largest number of frames, it lacks two labels and two objects due to its limited FoV. On the other hand, the remaining two cameras captured fewer images but include all the labels and objects. The average size, size
range and standard deviation of the width and height
of sequences 01 and 02 are similar.
Table 1: Capture sequence and object characteristics of the images that compose the train and validation set (first acquisition campaign).

Sensor ID           01          02          03
Spectrum            Vis.        NIR + Vis.  LWIR
Resolution [px]     1920×1080   1920×1080   384×288
Frames              9,260       5,672       10,388
Types of objects    16          16          14
Number of objects   21          21          19
(Units in pixels)
Average width       40          37          22
Width range         [4;258]     [5;239]     [3;103]
Width std. dev.     38          36          20
Average height      27          24          16
Height range        [5;142]     [5;137]     [3;49]
Height std. dev.    23          22          11
In a real-world application, the models should be able to alert to the presence of an unknown type of object. Therefore, with the goal of performing an evaluation closer to the real-world application and of having a robust test set with previously unseen objects, we captured an additional set of images. This collection
includes new samples on a road-like surface to mimic
the background of previous acquisitions. This time,
we only captured images with sensors 01 and 02 and
some relevant data about this capture is presented in
Table 2.
There is no standardised definition of what a small
object is in machine learning, leading some authors
to take their own approaches. However, (Chen et al., 2017) considers objects with a median relative area between 0.08% and 0.58%, values extracted from the MS COCO and Scene UNderstanding (SUN) (Xiao et al., 2010) datasets, to be small instances. As presented in Tables 1 and 2, the objects in our dataset fit this definition. This aspect, combined with their size variability, makes detection challenging.
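As a quick way to check this criterion on a set of annotations, the snippet below computes the median relative area of a group of bounding boxes; the box sizes shown are hypothetical and only illustrate the computation.

```python
import numpy as np

def median_relative_area(widths, heights, image_w=1920, image_h=1080):
    # Median bounding-box area as a percentage of the full image area.
    areas = np.asarray(widths, dtype=float) * np.asarray(heights, dtype=float)
    return float(np.median(areas) / (image_w * image_h) * 100.0)

# Hypothetical box sizes (in pixels), used only to illustrate the computation.
print(median_relative_area([40, 12, 88], [27, 9, 60]))
```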
In an effort to promote reproducibility and foster research in the area of FOD detection, we made our dataset publicly available at the Harvard Dataverse website (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/XSINZN). The dataset includes data from both campaigns that were previously described.
Table 2: Capture sequence and object characteristics of the images that compose the test set (second acquisition campaign). This set only contains images captured by sensors 01 and 02. The set contains some objects present in the first train and validation set but also some completely new classes of objects.

Sensor ID           01          02
Spectrum            Vis.        NIR + Vis.
Resolution [px]     1920×1080   1920×1080
Frames              237         186
Types of objects    7           7
Number of objects   8           7
(Units in pixels)
Average width       43          70
Width range         [4;227]     [13;320]
Width std. dev.     35          74
Average height      31          51
Height range        [9;278]     [7;276]
Height std. dev.    49          53
4 SYSTEM TRAINING AND
TESTING
4.1 Image Preparation
The cameras' resolution is 1920 × 1080 px. Images of this size result in large computational requirements for the NNs, which limits real-time application, although the increased detail also yields better performance. We address this problem by creating tiles of different sizes. This artificially reduces the image size and the required computational power while ensuring that the objects keep their original features.
We did not use LWIR images for training and test-
ing. This decision was made based on the small num-
ber of images with FOD, the small FoV of the camera
and our difficulty in finding the objects in the images
and labelling them.
Since one of the goals of our work is to determine
which frameworks better suit our problem, we opted
for a fixed tile size for image classification and an-
other for object detection. The median relative area
of the objects in relation to the original image area is
quite small (< 0.18%). By cropping the image into
tiles of 256 × 256 px for classification, that ratio will
increase (< 0.763%). We followed the same princi-
ple for object detection, but with tiles of 416 × 416 px. The range of the median relative area then changes from [0.0189%; 0.180%] to [0.180%; 0.289%]. For both classification and detection tiles, we applied a horizontal and vertical overlap ratio of 0.5; a sketch of this tiling scheme is given below. For the segmentation network, we used a grid search on several crop sizes, including 416 × 416 tiles.
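The sketch below outlines the tiling scheme mentioned above: fixed-size square tiles with a 0.5 horizontal and vertical overlap ratio. It is a simplified illustration and, unlike a production implementation, ignores border remainders instead of padding them.

```python
def make_tiles(image, tile=256, overlap=0.5):
    """Split an image array (H, W, C) into overlapping square tiles.

    Simplified sketch: fixed tile size, 0.5 overlap in both directions,
    border remainders are ignored rather than padded.
    """
    stride = int(tile * (1.0 - overlap))
    height, width = image.shape[:2]
    tiles = []
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            tiles.append(((x, y), image[y:y + tile, x:x + tile]))
    return tiles

# Example: a 1920x1080 frame cropped into 256x256 classification tiles.
# tiles = make_tiles(frame, tile=256, overlap=0.5)
```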
4.2 Experiments and Results
During the experiments with the FOD detection techniques, the images from sensors 01 and 02, obtained during the first acquisition campaign, were used for training and validation. For the test set, we used the images from the second data acquisition, with unseen objects. In the current work, our main concern is not computational performance; however, we measured the inference time to indicate relative computational advantages. This metric was determined on a high-performance desktop.
4.2.1 Experiment A: Classification
We trained the classifier on two types of subsets: bal-
anced and imbalanced, and tested the models only
on balanced test sets. We also created a subset with half of the images from sensors 01 and 02 in order to test whether a balanced combination of images from the two sensors would generate better results. Since more images presumably provide better results, we also tested training with all images from both sensors; this resulted in an imbalanced dataset. The im-
ages were either labelled as ‘fod’ or ‘no fod’. The
train/validation split is in a proportion of 89/11%, re-
spectively. We applied transfer learning to the net-
work with its weights pretrained on Imagenet (Deng
et al., 2009).
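A minimal transfer-learning sketch for this binary 'fod' / 'no fod' classifier is shown below; the exact hyperparameters, fine-tuning schedule and data augmentation used in our experiments are not reproduced.

```python
from tensorflow import keras

# Xception with ImageNet weights as a frozen feature extractor, plus a new
# binary classification head. Hyperparameters are illustrative only.
base = keras.applications.Xception(weights="imagenet", include_top=False,
                                   input_shape=(256, 256, 3), pooling="avg")
base.trainable = False  # first train only the new classification head

outputs = keras.layers.Dense(1, activation="sigmoid")(base.output)
model = keras.Model(base.input, outputs)
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy",
              metrics=["accuracy"])
```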
We conducted training for an unlimited number of epochs, until convergence. We tested different config-
urations of the hyperparameters and data augmenta-
tion. The balanced dataset which contains the images
from both sensors delivers the best results in terms of
validation accuracy (98.76%). The processing time
for each tile is approximately 0.01s.
During the training process, we faced a challenge.
In many of the training runs, the training loss de-
creased monotonically while the validation loss did
not. From our perspective, the problem is caused by overfitting, due to the limited number of images with FOD and the limited data augmentation. This allows the model to perform very well on the training examples – high training accuracy – while losing generalisation capability.
To obtain the test subset, we divided the images into tiles with the same parameters as before. This resulted in 1,134 test tiles that contain FOD and
1,132 that do not, of which 721 and 720, respectively, correspond to sensor 01.
By testing the images containing the unseen ob-
jects on the trained model, we can see a drop in the
performance. This result was expected since most of
the objects presented to the NN are significantly different from the previous ones. However, the obtained result, an accuracy of 77.92%, is relatively high.
Objects that are similar to the ones used in the first acquisition, such as the bolt, the plastic tube and the metals, are correctly classified. It is clear that the model tends to classify novelties as FOD. Cracks on the tarmac, and plants growing in these cracks, are detected in preference to FOD such as a large tree branch. However, despite not being considered FOD, plants and cracks may become a hazard.
4.2.2 Experiment B: Detection
The crop size employed for object detection was
416 × 416 px. This allows YOLOv3 to infer in real-
time while scoring good AP results. Just like we did
for Xception, we opted for transfer learning with pre-
trained weights on MS COCO (Lin et al., 2014). The
train/validation split is in a proportion of 89/11%, re-
spectively. We tested the model on different subsets, just like in the case of classification.
The subset with the most images got the best results, with a score of 93.16% AP at training time. The obtained processing rate was 11.5 fps. Given the 30 km/h minimum speed established by the FAA for mobile platforms, in conjunction with the requirement of capturing each FOD at least twice (approximately 2.25 fps, as worked out below), we can say that the system can work in real time.
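The 2.25 fps figure follows from a simple back-of-the-envelope computation, sketched below for sensor 01, whose FoV depth is 7.42 m (Section 2.1).

```python
# Minimum frame rate so that every ground point is imaged at least twice
# while the vehicle moves at the FAA minimum survey speed of 30 km/h.
speed_ms = 30 / 3.6          # 30 km/h expressed in m/s (about 8.33 m/s)
fov_depth_m = 7.42           # depth of sensor 01's field of view (Section 2.1)
min_fps = 2 * speed_ms / fov_depth_m
print(f"{min_fps:.2f} fps")  # about 2.25 fps, well below the measured 11.5 fps
```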
For the test of the detector, we used a subset of
423 images, where 248 contain FOD and 175 do not,
and 237 belong to sensor 01 and 186 to sensor 02.
Similarly to the behaviour of the classifier, the de-
tector had a performance drop. However, the perfor-
mance decreased significantly more than that of the
classifier, from 93.16% to 37.49%.
As shown in Figure 2, the model tends to classify
novelties as FOD. One of the cases that causes plenty of false positives is plants. This result is not as bad as it would seem at first because plants constitute organic FOD, especially in larger quantities.
4.2.3 Experiment C: Segmentation
We performed a grid search on the following parameters of the model: network backbone, size and variety of the training set, and tile size. All models were trained for up to 200 epochs. The backbones explored were ResNet-34, ResNet-50 and ResNet-101, all pretrained on the Imagenet dataset. The base dataset comprised either images from sensor 01, from sensor 02, or a combination of both. Finally, tile sizes were 416 × 416, 512 × 512, 832 × 832 or the original resolution of 1920 × 1080, with the height cropped to 1056 to fit the convolution windows. Dataset size varies for different tile sizes because of the number of crops. The proportions of the training and validation sets were kept constant at 89/11%. For all combinations, Unet was chosen for segmentation, as mentioned in Section 2.2; a sketch of this grid search is given below.
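In the sketch below, train_and_evaluate is a hypothetical placeholder standing in for the actual training code, which would build, train and evaluate one configuration and return its validation mIoU.

```python
import itertools
import random

def train_and_evaluate(backbone, sensors, tile_size, epochs):
    # Placeholder for the actual training routine: build the Unet with the
    # given backbone, train on the chosen sensor images cropped to tile_size,
    # and return the validation mIoU. Here it only returns a dummy value.
    return random.random()

backbones = ["resnet34", "resnet50", "resnet101"]
sensor_sets = ["01", "02", "01+02"]
tile_sizes = [(416, 416), (512, 512), (832, 832), (1920, 1056)]

results = {}
for backbone, sensors, tile in itertools.product(backbones, sensor_sets, tile_sizes):
    results[(backbone, sensors, tile)] = train_and_evaluate(
        backbone, sensors, tile, epochs=200)

best_config = max(results, key=results.get)  # configuration with the highest validation mIoU
```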
The best model was trained with the combined dataset of both sensors, a ResNet-50 backbone and 832 × 832 tiles, achieving an mIoU of 71.6% on the validation set and 26.9% on the test set. The number of images used in each set was 359/45/346 (train/validation/test). A sample of the results can be observed in Figure 3. Objects like twigs offer a double challenge: firstly, there were no similar samples in the training set; secondly, they do not have a detailed ground truth mask, and they produced the worst results. Additionally, many frames had incorrectly predicted pixels on vegetation, which was never annotated as FOD. Objects with a solid ground truth mask had the best results, even when their colour was similar to the background. Regarding inference speed, the performance varied significantly depending on the backbone network that was used. The networks' inference times allowed processing rates from 6.7 to 58.8 fps.
5 CONCLUSIONS
In this work we address the problem of FOD detec-
tion with two contributions. Firstly, we build a dataset
of images with three different sensors which operate
in different wavelengths. This dataset resembles as
much as possible the way a fully deployable system
would work. This system must be low-cost and non-
intrusive to the normal operation of an aerodrome.
Secondly, we test classification, detection and seg-
mentation techniques to evaluate the pros and cons of
each.
One important outcome of the first part was the
creation of our own dataset. It contains 9,497 images
from the visible sensor, 5,858 images from the visible
plus near infrared sensor and 10,388 images from the
long-wave infrared sensor.
To determine the presence of FOD, we trained
classification, detection and segmentation networks.
We trained these methods on one set of images and evaluated their performance on another set resulting from a different acquisition campaign. The best classification model achieved an accuracy of 77.92% at 90.9 fps; however, there are some indications of overfitting.
Figure 2: Examples of correctly detected FOD. These results were obtained with previously unseen objects. Images in the top
row were captured by sensor 02 and images in the bottom row were captured by sensor 01.
Figure 3: Sample results from the best segmentation model: 832 × 832 tiles, Unet with a ResNet-50 backbone trained on images from both sensors 01 and 02. The first object is a tree twig, while the second is a small object similar in appearance to asphalt.
Secondly, we trained the detection network. The per-
formance in the test set was 37.49% AP at 11.5 fps.
Given the difference in AP between training and test-
ing, we suspect that the detector was overfitting. The
third experiment used the dataset for segmentation, where we conducted a grid search on several parameters, obtaining the best model with images cropped to 832 × 832 tiles, used to train the Unet architecture with a ResNet-50 encoder pre-trained on Imagenet. The best model obtained an mIoU score of 71.6% on the validation set, contrasting significantly with the 26.9% obtained on the test set.
While the results are encouraging, there is also fu-
ture work identified. Firstly, we would like to carry
out more image acquisition campaigns in different
conditions. Additionally, the computational perfor-
mance needs to be studied in more detail; in particular, what compromises are needed to run the NNs on embedded platforms with limited memory. In conclusion, the preliminary results obtained in the present article provide a solid foundation for many paths of further improvement and system development.
REFERENCES
Almeida, J., Cruz, G., and Oliveira, T. (2022). Foreign Ob-
ject Debris at Aerodromes.
ATSB (2010). Ground operations occurrences at australian
airports 1998 to 2008 (safety report no. 42). Technical
Report 42, Australian Transport Safety Bureau.
Cao, X., Wang, P., Meng, C., Bai, X., Gong, G., Liu, M.,
and Qi, J. (2018). Region based cnn for foreign ob-
ject debris detection on airfield pavement. Sensors,
18(3):737.
Chen, L.-C., Papandreou, G., Schroff, F., and Adam,
H. (2017). Rethinking atrous convolution for
semantic image segmentation. arXiv preprint
arXiv:1706.05587.
Chollet, F. (2017). Xception: Deep learning with depthwise
separable convolutions. In Proceedings of the IEEE
conference on computer vision and pattern recogni-
tion, pages 1251–1258.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). Imagenet: A large-scale hierarchical
image database. In 2009 IEEE conference on com-
puter vision and pattern recognition, pages 248–255.
Ieee.
FAA (2009). AC 150/5220-24 - Foreign Object Debris De-
tection Equipment – Document Information.
FAA (2018). Faa reauthorization bill 2018 foreign object
debris (fod) detection technology.
Han, Z., Fang, Y., and Xu, H. (2015a). Fusion of low-level
feature for fod classification. In 2015 10th Interna-
tional Conference on Communications and Network-
ing in China (ChinaCom), pages 465–469. IEEE.
Han, Z., Fang, Y., Xu, H., and Zheng, Y. (2015b). A
novel fod classification system based on visual fea-
tures. In International Conference on Image and
Graphics, pages 288–296. Springer.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Herricks, E. E., Mayer, D., and Majumdar, S. (2015). For-
eign object debris characterization at a large interna-
tional airport. Technical report.
Huang, T. S. (1996). Computer vision: Evolution and
promise. In 19th CERN School of Computing
(CSC’19), pages 25–21. CERN.
Kraus, D. and Watson, J. (2001). Guidelines for the
Prevention and Elimination of Foreign Object Dam-
age/Debris (FOD) in the Aviation Maintenance En-
vironment through Improved Human Performance.
Technical report, Federal Aviation Administration -
Flight Standards Service - Aircraft Maintenance Di-
vision.
Lakshmanan, V., Görner, M., and Gillard, R. (2021). Practical Machine Learning for Computer Vision. O'Reilly Media, Inc.
Li, P. and Li, H. (2020). Research on fod detection for air-
port runway based on yolov3. In 2020 39th Chinese
Control Conference (CCC), pages 7096–7099. IEEE.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer.
Liu, Y., Li, Y., Liu, J., Peng, X., Zhou, Y., and Murphey,
Y. L. (2018). Fod detection using densenet with fo-
cal loss of object samples for airport runway. In
2018 IEEE Symposium Series on Computational In-
telligence (SSCI), pages 547–554. IEEE.
McCreary, I. (2010). Runway safety: Fod, birds and the
case for automated scanning. Insight SRI, pages 146–
157.
Munyer, T., Huang, P.-C., Huang, C., and Zhong, X. (2021).
Fod-a: A dataset for foreign object debris in airports.
arXiv preprint arXiv:2110.03072.
PoAF (2018). Programa de prevenção de danos por objetos estranhos. Technical Report MBA5 330-3 (A), Portuguese Air Force.
Redmon, J. and Farhadi, A. (2018). Yolov3: An incremental
improvement. arXiv preprint arXiv:1804.02767.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net:
Convolutional networks for biomedical image seg-
mentation. In International Conference on Medical
image computing and computer-assisted intervention,
pages 234–241. Springer.
Shapiro, L. G. (2020). Computer vision: the last 50 years.
International Journal of Parallel, Emergent and Dis-
tributed Systems, 35(2):112–117.
Warren, J., Gorton, C., Hoff, S., and Alby, F. (2005). Best
practices for the mitigation and control of foreign ob-
ject damage-induced high cycle fatigue in gas tur-
bine engine compression system airfoils. Annex B Ef-
fects of Sand and Dust on Small Gas Turbine Engines,
NATO RTO Applied Vehicle Technology Panel (AVT)
Task Group-094.
Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., and Torralba,
A. (2010). Sun database: Large-scale scene recogni-
tion from abbey to zoo. In 2010 IEEE computer soci-
ety conference on computer vision and pattern recog-
nition, pages 3485–3492. IEEE.