Explainable Deep Semantic Segmentation for Flood Inundation Mapping
with Class Activation Mapping Techniques
Jacob Sanderson¹ᵃ, Hua Mao¹, Naruephorn Tengtrairat², Raid Rafi Al-Nima³ and Wai Lok Woo¹ᵇ

¹Department of Computer and Information Sciences, Northumbria University, Newcastle Upon Tyne, U.K.
²School of Software Engineering, Payap University, Chiang Mai, Thailand
³Technical Engineering College, Northern Technical University, Mosul, Iraq
ᵃhttps://orcid.org/0009-0002-5724-6637
ᵇhttps://orcid.org/0000-0002-8698-7605
Keywords:
Deep Learning, Semantic Segmentation, Explainable Artificial Intelligence, Flood Inundation Mapping,
Satellite Imagery.
Abstract:
Climate change is causing escalating extreme weather events, resulting in frequent, intense flooding. Flood
inundation mapping is a key tool in combating these flood events by providing insight into flood-prone areas,
allowing for effective resource allocation and preparation. In this study, a novel deep learning architecture for
the generation of flood inundation maps is presented and compared with several state-of-the-art models across
both Sentinel-1 and Sentinel-2 imagery, where it demonstrates consistently superior performance, with an
Intersection Over Union (IOU) of 0.5902 with Sentinel-1, and 0.6984 with Sentinel-2 images. The importance
of this versatility is underscored by visual analysis of images from each satellite under different weather
conditions, demonstrating the differing strengths and limitations of each. Explainable Artificial Intelligence
(XAI) is leveraged to interpret the decision-making of the model, which reveals that the proposed model not
only provides the greatest accuracy but also exhibits an improved ability to confidently identify the most relevant
areas of an image for flood detection.
1 INTRODUCTION
In recent years, climate change has caused rising
sea levels (Rosier et al., 2023) and extreme weather
events, resulting in a rapid increase in the
occurrence and intensity of flooding (Schreider et al.,
2000), with devastating consequences for communi-
ties worldwide. Flood inundation mapping has a piv-
otal role in protecting affected communities by pro-
viding insights into potential flood extents, identify-
ing at-risk areas, and enabling effective resource allo-
cation (Sahana and Patel, 2019).
Given the rise in the frequency of floods, the time-
liness of flood inundation map generation is of grow-
ing importance. Traditional methods often rely on
manual preprocessing of images, making them labour-
intensive and time-consuming to implement (Landuyt
et al., 2018). As computational power has increased,
deep learning has emerged as an effective technique,
with Convolutional Neural Networks (CNNs), prov-
ing most effective (Tavus et al., 2022). CNNs, of-
ten in a fully convolutional network (FCN) architec-
ture, segment images into water and non-water pix-
els, outperforming classical machine learning meth-
ods (Gebrehiwot and Hashemi-Beni, 2020). Increas-
ingly, models are employing encoder-decoder archi-
tectures, with U-Net and DeepLab variants being pop-
ular choices (Katiyar et al., 2021; Ghosh et al., 2022;
Helleis et al., 2022; Muszynski et al., 2022; Li and
Demir, 2023; Paul and Ganju, 2021; Yuan et al., 2021;
Sanderson et al., 2023a; Sanderson et al., 2023b).
The primary drawback of employing deep learn-
ing lies in its lack of transparency, with deep neu-
ral networks often being referred to as ‘black box’
models. While these models can achieve impressive
accuracy in generating flood inundation maps, un-
derstanding the rationale behind their decisions can
be challenging. This raises concerns about the trust-
worthiness of these models, as it becomes difficult to
verify that their decisions are not influenced by bias
or error. To address this issue, explainable artificial
intelligence (XAI) has emerged as a solution. XAI
aims to provide human-understandable explanations
for the decisions made by deep neural networks and
other black box models (Mirzaei et al., 2024). In the
context of flood inundation mapping, XAI could of-
fer valuable insights into the factors influencing flood
extent prediction, enhancing the trust of stakehold-
ers. Despite this, as far as the authors are aware, re-
search in XAI for flood modelling is limited. Exist-
ing studies mainly focus on explaining classical ma-
chine learning algorithms using model-agnostic XAI
techniques like LIME and SHAP (Prasanth Kadiyala
and Woo, 2021; Pradhan et al., 2023). The use of
classical machine learning limits their ability to cap-
ture complex relationships and results in lower accu-
racy. The model-agnostic algorithms used in these
studies are unable to access the inner workings of
the predictive models, limiting the insights they can
provide. To address this, model-specific XAI algo-
rithms exist, which can provide more fine-grained ex-
planations, tailored to a specific model. To explain
CNN decision-making, techniques like class activa-
tion mapping (CAM) (Zhou et al., 2016) and a grow-
ing number of modified CAM techniques are em-
ployed (Chattopadhay et al., 2018; Selvaraju et al.,
2017; Fu et al., 2020; Wang et al., 2020; Muhammad
and Yeasin, 2020; Draelos and Carin, 2020). They
visualize important features without significantly af-
fecting model performance. The majority of these
techniques are well-demonstrated in image classifi-
cation; however, in semantic segmentation, gradient-
weighted class activation mapping (Grad-CAM) (Sel-
varaju et al., 2017) is by far the most widely used. In
image classification, it has been noted that Grad-CAM
has a substantial shortcoming: the gradient averaging
step often highlights regions that the model
did not use, leading to an unfaithful interpretation. To
overcome this, methods such as high-resolution class
activation mapping (HiResCAM) (Draelos and Carin,
2020) have been proposed to provide a more faithful
explanation; however, the development of this method
for semantic segmentation has yet to be investigated,
as far as the authors are aware.
In addition to model selection, the choice of in-
put data type is a crucial consideration in develop-
ing flood inundation maps. Two of the primary sen-
sor types used on board satellites are synthetic aper-
ture radar (SAR), and optical. The European Space
Agency provides free access to images taken by their
Sentinel missions, making them a popular and ac-
cessible choice for researchers and practitioners in
the field. Within the Sentinel missions, Sentinel-1,
which makes use of a SAR instrument, and Sentinel-
2 equipped with a multi-spectral optical instrument
(MSI) are the most commonly used in mapping flood
extent. Each of these instruments has its own dis-
tinct advantages and limitations. SAR can penetrate
through cloud cover and provides its own light source,
meaning that images can be taken in any weather or
light conditions. However, it is susceptible to speckle
noise, which can make detection of the true signal
more challenging. In contrast, MSI delivers imagery
with high spatial and spectral resolution, providing a
greater wealth of information for land cover and wa-
ter classification, but is not able to penetrate clouds as
effectively (Konapala et al., 2021).
This work aims to develop a novel architecture
for flood inundation mapping and demonstrate its ef-
fectiveness through comparative analysis with several
state-of-the-art models. The influence of input data
is also assessed by applying the proposed model to
Sentinel-1 and Sentinel-2, analysing both the overall
quantitative performance, as well as visually assess-
ing the quality of the generated maps in both clear and
cloud-covered conditions. Finally, XAI is employed
to interpret the decision-making of the models, as well
as to provide deeper insight into how the different in-
put data types influence the behaviour of the model.
This study will explore the suitability of HiResCAM
as an explainability method for semantic segmenta-
tion, in comparison with results from Grad-CAM.
2 METHODOLOGY
2.1 Dataset
The Sen1Floods11 dataset (Bonafilia et al., 2020) is
used in this study, which consists of 446 hand-labelled
images from each of the Sentinel-1 and Sentinel-
2 satellites, with a total of 892 images from 11
global flood events in Bolivia, Ghana, India, Cambo-
dia, Nigeria, Pakistan, Paraguay, Somalia, Spain, Sri
Lanka, and the USA. These flood events were sam-
pled as both Sentinel-1 and coincident Sentinel-2 im-
ages were available within a maximum of 2 days of
each other.
The Sentinel-1 images are taken with a C-band
SAR instrument in the interferometric wide swath
(IW) mode, which allows a wide area to be cov-
ered, making it suitable for large-scale flood map-
ping. The ground resolution of these images is 10m.
The Sentinel-1 satellite offers different polarization
modes, where the images in this dataset are taken
in vertical transmit, vertical and horizontal receive
(VV + VH), offering enhanced information for im-
proved water detection, with relatively low computa-
tional cost. The Sentinel-2 images are taken with an
MSI with a 290km field of view, capturing 13 spectral
bands in the visible light, near-infrared and shortwave
infrared spectrum. The ground resolution of these
bands varies, so the images have been re-sampled to
10m for each to ensure consistency in the analysis.

Figure 1: Architecture diagram of the proposed model.
2.2 Proposed Model
In flood inundation mapping, conventional fully con-
volutional neural networks have been widely
used but face challenges such as information loss and
low-resolution predictions. To address these issues,
encoder-decoder architectures are used by most re-
cent works in segmentation, which are able to cap-
ture high-level semantic information while preserving
spatial detail much more effectively.
The proposed model takes inspiration from this
and adopts a dual encoder-decoder architecture, as
shown in Figure 1. One encoder is a powerful
CNN pre-trained on ImageNet, while the other is a
lightweight CNN, ensuring a balance between accu-
racy and computational efficiency. By employing two
base models, the robustness of the final model is im-
proved by providing more diverse input representa-
tions to the final classifier and enabling the compen-
sation for limitations in one model with the strengths
of the other and vice versa.
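For illustration only, and not the exact network used in this work, the dual-branch idea can be sketched in PyTorch as two encoder-decoder branches, passed in as hypothetical modules, whose per-pixel outputs are concatenated and fused by a 1x1 convolutional classifier:

import torch
import torch.nn as nn

class DualBranchSegmenter(nn.Module):
    # Sketch only: both branches are passed in as modules and are assumed
    # to return per-pixel maps of the same spatial size and class count.
    def __init__(self, heavy_branch, light_branch, num_classes=2):
        super().__init__()
        self.heavy = heavy_branch   # e.g. a ResNet-style encoder-decoder
        self.light = light_branch   # e.g. a MobileNet-style encoder-decoder
        # A 1x1 convolution acts as the final classifier over the
        # concatenated branch outputs.
        self.classifier = nn.Conv2d(2 * num_classes, num_classes, 1)

    def forward(self, x):
        # Concatenating the two branches gives the classifier two
        # diverse input representations of the same scene.
        fused = torch.cat([self.heavy(x), self.light(x)], dim=1)
        return self.classifier(fused)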
The powerful encoder employs convolutional and
identity blocks, mitigating issues like vanishing gra-
dients. The lightweight encoder features bottleneck
blocks for computational efficiency. In the decoder,
transposed convolutions perform upsampling to re-
cover spatial information, and performance-enhancing
features, including dense skip pathways and deep su-
pervision, are integrated to improve the model’s accu-
racy and convergence speed.
Atrous convolution is introduced to the
lightweight encoder to address spatial resolution
reduction. It enhances the field of view with-
out increasing computational costs. Additionally,
spatial pyramid pooling and depthwise separable
convolution are employed in its decoder module
to capture multi-scale information while managing
computational complexity.
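Both operations are standard PyTorch constructions; the following sketch uses arbitrary channel sizes purely for illustration:

import torch.nn as nn

# Atrous (dilated) 3x3 convolution: dilation=2 widens the receptive
# field without adding parameters or reducing spatial resolution.
atrous = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

# Depthwise separable convolution: a per-channel spatial filter
# (groups=64) followed by a cheap 1x1 pointwise channel mix.
depthwise_separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                       # pointwise
)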
2.3 Class Activation Mapping Techniques
CAM is a widely used XAI technique in CNN inter-
pretation. In CAM, the architecture of a CNN is mod-
ified by replacing fully connected layers with global
average pooling layers, to provide class-specific fea-
ture maps that show where the CNN localizes each class (Zhou
et al., 2016). This modification requires the retrain-
ing of the model and also limits the variety of CNN
architectures that it will perform well with.
To overcome this, several extended versions of
CAM have been introduced, most notably, Grad-
CAM (Selvaraju et al., 2017). In Grad-CAM, the gra-
dients of a target concept as they flow into a target
layer are used to produce the class activation map, by
first finding the gradient for the target class with re-
spect to the feature map activations, then applying
global average pooling to give the importance weight for each
neuron. Following this, a weighted combination of
the forward activation maps is computed and ReLU is
applied to ensure only the pixels with a positive influ-
ence on the target class are highlighted.
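A minimal Grad-CAM sketch adapted to segmentation is given below; the model, the chosen target layer, and the use of the summed per-pixel logits of the target class as the score are all illustrative assumptions:

import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, target_class):
    activations, gradients = [], []
    # Hooks capture the target layer's forward activations and the
    # gradients flowing back into it.
    fh = target_layer.register_forward_hook(
        lambda mod, args, out: activations.append(out))
    bh = target_layer.register_full_backward_hook(
        lambda mod, gin, gout: gradients.append(gout[0]))

    logits = model(image)                        # (1, C, H, W) pixel logits
    score = logits[:, target_class].sum()        # scalar score for the class
    model.zero_grad()
    score.backward()
    fh.remove()
    bh.remove()

    A, dA = activations[0], gradients[0]         # both (1, K, h, w)
    weights = dA.mean(dim=(2, 3), keepdim=True)  # the gradient averaging step
    return F.relu((weights * A).sum(dim=1))      # weighted combination + ReLU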
Due to the gradient averaging step, it has been
identified that Grad-CAM does not always reflect
the locations of the image that the model actually
used, and so can produce misleading explana-
tions, often resulting in smoother heatmaps that sug-
gest a larger area of the image was considered in the
model’s decision making. To overcome this limita-
tion, HiResCAM (Draelos and Carin, 2020) generates
its explanations through element-wise multiplication
of the gradients with the feature maps, then summing
over the feature dimensions.
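Assuming activations and gradients captured exactly as in the Grad-CAM sketch above, only the weighting step changes:

import torch.nn.functional as F

def hires_cam(A, dA):
    # A: forward activations, dA: gradients of the class score, both of
    # shape (1, K, h, w), captured as in the grad_cam() sketch above.
    # The element-wise product keeps per-location attribution; the
    # spatial gradient averaging of Grad-CAM is skipped entirely.
    cam = (dA * A).sum(dim=1)  # sum over the feature (channel) dimension
    return F.relu(cam)         # ReLU mirrors Grad-CAM's visualization step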
2.4 State-of-the-Art Models
To best demonstrate the superior performance of the
proposed model, a comparative analysis is conducted
with two state-of-the-art models, DeepLabV3+ and
U-Net++, each with both a ResNet50 and MobileNet
V2 backbone. These models have been se-
lected as they remain two of the most sophisticated
and commonly used methods for semantic segmenta-
tion.
DeepLabV3+ incorporates atrous convolution and
spatial pyramid pooling to effectively handle seg-
mentation tasks involving multiple objects at differ-
ing scales. A streamlined decoder is then employed,
offering control of the density of the encoder fea-
tures, as well as restoring the precise object bound-
aries (Chen et al., 2018). U-Net++ involves a CNN
as the encoder for feature extraction, followed by up-
sampling by the decoder to increase the output res-
olution. Dense skip connections are employed to re-
duce the semantic gap between the encoder and de-
coder, as well as deep supervision to facilitate versa-
tile operation modes (Zhou et al., 2018).
ResNet50 consists of residual blocks, each involv-
ing a sequence of convolution, batch normaliza-
tion and ReLU activation. Skip connections are em-
ployed after each block, enabling certain layers to be
bypassed, addressing issues related to model degra-
dation and vanishing gradients (He et al., 2016).
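A minimal sketch of the residual principle follows; ResNet50 itself stacks deeper bottleneck variants of this block:

import torch.nn as nn

class ResidualBlock(nn.Module):
    # The input skips past the convolutional stack and is added back,
    # so gradients can flow around the block during training.
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # skip connection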
MobileNet V2 is a streamlined architecture fea-
turing depthwise separable convolution, which can
reduce computational complexity while maintain-
ing high performance. An inverted residual block
with linear bottleneck is incorporated, which takes a
compressed low-dimensional representation and ex-
pands it into a higher dimension, applying lightweight
depthwise separable convolution, before projecting
the features back to a low-dimensional representation.
This minimizes the memory requirements through the
reduction of parameters (Sandler et al., 2019).
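A sketch of a stride-1 inverted residual block follows; the expansion ratio of 6 matches the MobileNetV2 paper, while the equal input and output widths are a simplifying assumption:

import torch.nn as nn

class InvertedResidual(nn.Module):
    # Expand to a higher dimension, filter with a depthwise convolution,
    # then project back through a linear (activation-free) bottleneck.
    def __init__(self, channels, expand_ratio=6):
        super().__init__()
        hidden = channels * expand_ratio
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1),                          # expand
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden),  # depthwise
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, 1),                          # linear projection
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # residual over the bottleneck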
2.5 Experiments
The PyTorch deep learning framework is used to de-
velop the models used in this study, accelerated with
an NVIDIA A100 graphics processing unit
(GPU) accessed through the cloud. The images and
labels in the dataset are split into training, validation
and testing sets, with 251 training, 89 validation and
90 testing samples for each satellite. The training
images are augmented through random flipping and
cropping to create more variation
in the data, improving the robustness of the model.
The validation and testing images are cropped at a
fixed point to ensure consistency in visualizations. All
images are normalized by the mean and standard de-
viation, ensuring all values are within the same scale
for effective optimization. A batch size of 16 is used
for training the models, with the Cross Entropy Loss
function, which measures the similarity between the
true and predicted values. The Sen1Floods11 dataset
is imbalanced, with a higher number of non-water
pixels. To address this, a weighting of 8:1 is applied
within the loss function, placing 8 times the impor-
tance on the water class to ensure the model places
more importance on correctly identifying these pix-
els.
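A minimal sketch of this weighting is given below; the class order (non-water first, water second) is an assumption about the label encoding:

import torch
import torch.nn as nn

# 8x weight on the water class to counter the class imbalance.
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 8.0]))

logits = torch.randn(16, 2, 256, 256)          # per-pixel class logits
labels = torch.randint(0, 2, (16, 256, 256))   # ground-truth masks
loss = criterion(logits, labels)               # weighted pixel-wise loss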
The optimizer AdamW is used with a learning rate
of 5e-4. AdamW is a stochastic gradient descent-
based optimization method, where the first-order and
second-order moments are adaptively estimated, with
an improved weight decay method through decou-
pling the weight decay from the optimization steps
(Loshchilov and Hutter, 2019). Cosine Annealing
Warm Restarts is used to schedule the learning rate,
where the learning rate is decayed from a high to a
low value following a cosine curve, and then reset to
its initial value at each restart (Loshchilov and Hutter, 2017).
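The optimizer and scheduler pairing can be configured as follows; T_0, the number of epochs before the first restart, is not stated above and is an assumed value, and the network here is a stand-in:

import torch
import torch.nn as nn

model = nn.Conv2d(2, 2, 3, padding=1)  # stand-in network for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10)  # T_0 (epochs before the first restart) is assumed

for epoch in range(250):
    # ... forward pass, loss, and backward pass over the batches go here ...
    optimizer.step()   # placeholder for the per-batch update loop
    scheduler.step()   # cosine decay, restarting at the end of each cycle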
The model was trained over 250 epochs, being
evaluated against the accuracy, loss, F1 score, and In-
tersection over Union (IOU) on both the training and
the validation sets to ensure that the model can gener-
alise to unseen data. The model with the best valida-
tion IOU is saved and evaluated on the testing data,
against the same metrics.
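As an illustration of the headline metric, a minimal IOU computation for the water class is sketched below, assuming class 1 denotes water:

import torch

def iou_score(pred, target, water_class=1):
    # pred and target hold per-pixel class labels, e.g. (N, H, W) tensors.
    pred_mask = pred == water_class
    target_mask = target == water_class
    intersection = (pred_mask & target_mask).sum().float()
    union = (pred_mask | target_mask).sum().float()
    # IOU is undefined when neither mask contains any water pixels.
    return (intersection / union).item() if union > 0 else float('nan')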
3 EXPERIMENTAL RESULTS
3.1 Quantitative Results
The performance of each model against the met-
rics described in Section 2.5 is given in Table 1.
These results highlight the proposed model’s superi-
ority for both image types. In the case of Sentinel-
1 images, U-Net++ and DeepLabV3+ yield similar
IOU values, with the U-Net++ model employing the
ResNet50 backbone and the DeepLabV3+ model with
the MobileNet V2 backbone slightly surpassing the
other two. Furthermore, the DeepLabV3+ model with
the ResNet50 backbone excels in the F1 score.
When analysing Sentinel-2 images, performance im-
proves compared to Sentinel-1. Both U-Net++ mod-
els outperform the DeepLabV3+ models in terms of IOU. It is impor-
tant to note that performance varies between the two
backbones, with ResNet50 excelling in conjunction
with U-Net++ with the Sentinel-1 images, while Mo-
bileNet V2 performs slightly better in all other cases.
3.2 Visualizations
Figures 2 to 6 show visualizations of the flood inunda-
tion maps generated by each model, and the explana-
tions provided by both Grad-CAM and HiResCAM,
with each depicting an example image in clear con-
ditions and a cloud-covered image for both Sentinel-
1 and Sentinel-2. These images illustrate that the
models achieving the best performance across eval-
uation metrics, especially IOU, generate highly de-
tailed segmentation masks, resulting in more precise
maps. Conversely, models with poorer performance
produce coarser masks, featuring an increased num-
ber of false positive pixel classifications. Notably, the
proposed model generates inundation maps that are
more closely aligned with the ground truth than any
other model.

Table 1: Accuracy, loss, F1 score and IOU for each model.

Model                     Accuracy  Loss    F1 Score  IOU

Sentinel-1
U-Net++ ResNet50          0.9413    0.3054  0.6127    0.5642
U-Net++ MobileNet V2      0.9400    0.2907  0.6070    0.5440
DeepLabV3+ ResNet50       0.9415    0.2740  0.6444    0.5464
DeepLabV3+ MobileNet V2   0.9371    0.3099  0.6372    0.5660
Proposed Model            0.9730    0.2390  0.7327    0.5902

Sentinel-2
U-Net++ ResNet50          0.9382    0.1524  0.7081    0.6514
U-Net++ MobileNet V2      0.9468    0.1345  0.7053    0.6583
DeepLabV3+ ResNet50       0.9525    0.1730  0.7162    0.6427
DeepLabV3+ MobileNet V2   0.9412    0.1917  0.7209    0.6456
Proposed Model            0.9691    0.1129  0.7894    0.6984
Sentinel-1 demonstrates overall modest perfor-
mance, particularly struggling in areas with finer de-
tail. In contrast, Sentinel-2, while presenting more
distinct flood maps in clear conditions, faces signifi-
cant challenges in cloud-covered scenarios, where the
maps are more poorly defined. Notably, the image
taken in clear conditions includes some background
areas which appear similar to the water pixels in the
Sentinel-1 image, while in the Sentinel-2 image, there
is a clearer contrast between the two, leading to
the Sentinel-1-trained models either falsely classify-
ing background areas as water or underpredicting the
water-covered areas.
The class activation maps visualise the impor-
tant features for each model’s decision-making and
demonstrate that the proposed model not only gener-
ated the most accurate flood inundation maps, but also
made the most appropriate decisions, placing higher
confidence on the most relevant pixels, and lower
importance on the non-flooded pixels than any other
model. It is also evident from these images that Grad-
CAM does produce a smoothed class activation map,
highlighting a larger area as important, which aligns
with the assertion that its gradient averaging causes
unfaithful interpretation. HiResCAM provides a class
activation map equally understandable to humans as
Grad-CAM but presents a more faithful representa-
tion of the model’s decision-making process.
3.3 Discussion
The model proposed in this work excels in generat-
ing flood maps from both Sentinel-1 and Sentinel-2
imagery, with a higher level of accuracy than any of
the state-of-the-art models. Notably, there are per-
formance disparities observed between the models
when trained with Sentinel-1 and Sentinel-2, where
DeepLabV3+ with MobileNet V2 exhibited the high-
est IOU in Sentinel-1 imagery, while U-Net++ with
MobileNet V2 was superior with Sentinel-2. The pro-
posed model, outperforming all state-of-the-art mod-
els across both image types, stands out as the most
versatile option. The visual analysis of the flood inun-
dation maps demonstrates how crucial this versatility
is, as each satellite image type demonstrates superior
performance in certain conditions, so the ability to
easily change the data type without needing to change
the model is beneficial. With Sentinel-1 images, the
models struggle to detect finer details and frequently
overpredict water-covered areas, resulting in a coarser
inundation map. Conversely, with Sentinel-2 images,
the fine detail is much more easily detected; however,
cloud coverage significantly impacts performance, re-
sulting in a map with insufficient detail.

Figure 2: Visualizations generated by U-Net++ with
ResNet50 backbone for (A) a Sentinel-1 image in clear
conditions, (B) a Sentinel-1 image in cloud-covered con-
ditions, (C) a Sentinel-2 image in clear conditions and (D)
a Sentinel-2 image in cloud-covered conditions.
The class activation maps provide further insights
into the superior performance of the proposed model.
Figure 3: Visualizations generated by U-Net++ with Mo-
bileNet V2 backbone for (A) a Sentinel-1 image in clear
conditions, (B) a Sentinel-1 image in cloud-covered condi-
tions, (C) a Sentinel-2 image in clear conditions and (D) a
Sentinel-2 image in cloud-covered conditions.
Figure 4: Visualizations generated by DeepLabV3+ with
ResNet50 backbone for (A) a Sentinel-1 image in clear
conditions, (B) a Sentinel-1 image in cloud-covered con-
ditions, (C) a Sentinel-2 image in clear conditions and (D)
a Sentinel-2 image in cloud-covered conditions.
Figure 5: Visualizations generated by DeepLabV3+ with
MobileNet V2 backbone for (A) a Sentinel-1 image in clear
conditions, (B) a Sentinel-1 image in cloud-covered condi-
tions, (C) a Sentinel-2 image in clear conditions and (D) a
Sentinel-2 image in cloud-covered conditions.
Figure 6: Visualizations generated by the proposed model
for (A) a Sentinel-1 image in clear conditions, (B)
a Sentinel-1 image in cloud-covered conditions, (C) a
Sentinel-2 image in clear conditions and (D) a Sentinel-2
image in cloud-covered conditions.
While many of the flood inundation map visualiza-
tions, particularly the cloud-covered Sentinel-1 im-
age, look similar across each model, the class acti-
vation maps reveal discrepancies in how the models
made the decisions. For instance, the U-Net++ mod-
els both have a significantly lower magnitude of im-
portance placed on the relevant pixels for this image.
However, they place a higher magnitude of impor-
tance on the irrelevant pixels for the Sentinel-1 im-
age in clear conditions. The proposed model con-
sistently assigns a higher magnitude of importance to
the flooded areas, and a lower magnitude to the non-
flooded areas, demonstrating that it has more effec-
tively learned to delineate floods with greater confi-
dence than any of the state-of-the-art models. This is
a crucial consideration for the employment of a model
in a flood inundation mapping system, as it shows that
the model has more effectively learned how to iden-
tify flooded areas, and so can be better trusted to correctly
identify floods in practice.
It was observed that the class activation maps gen-
erated with the use of Grad-CAM are smoothed due
to the gradient averaging step, bringing attention to a
larger area of the image than the model truly focuses
on. This brings into question the faithfulness of the in-
terpretation of the models. The explanations provided
by Grad-CAM generally present a more favourable
view of the models’ performance than HiResCAM,
potentially leading to misplaced trust in the models’
abilities. This could have potentially catastrophic
implications for flood mapping, as inaccurate flood
maps could be generated, leading to inappropriate
resource allocation and risk assessment.
4 CONCLUSIONS
This study addresses the urgent need for timely and
accurate flood inundation mapping in the face of in-
creasing climate-induced challenges. We introduce
a novel dual encoder-decoder architecture that con-
sistently demonstrates superiority over state-of-the-
art models across both Sentinel-1 and Sentinel-2 im-
ages, as evidenced by comprehensive quantitative and
visual analyses. The versatility of this model is
crucial, highlighted through comparative analyses of
each satellite image under different conditions, re-
vealing strengths and limitations in various scenarios.
XAI is leveraged to better understand the
decision-making process of these models. It is shown
that not only is the proposed model the most accurate,
but it has also learned to detect flooded areas more ef-
fectively with greater confidence, showcasing its im-
proved trustworthiness for practical applications.
Despite the success of the proposed model, further
refinement techniques should be incorporated in the
future to enhance segmentation results, such as con-
ditional or Markov random fields. Attention mech-
anisms have also demonstrated superior results in a
range of computer vision tasks, particularly channel
and spatial attention. The incorporation of these tech-
niques can enhance the interpretability of the model
by highlighting the regions that the model paid atten-
tion to, without the need for additional post-hoc XAI
algorithms.
The use of additional data is likely to improve the
results of the work. Deep learning models continue
to improve as more data is added, allowing them to
learn more complex feature representations more ef-
fectively. The inclusion of different data types, such
as Digital Elevation Models (DEM) and Light Detec-
tion and Ranging (LiDAR), can provide more detailed
information about the topography of an area. This en-
ables the models to generate more precise flood maps,
as well as more detailed explanations through XAI re-
garding how environmental factors impact flood inun-
dation.
REFERENCES
Bonafilia, D., Tellman, B., Anderson, T., and Issenberg,
E. (2020). Sen1floods11: A georeferenced dataset
to train and test deep learning flood algorithms for
sentinel-1. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition
Workshops, pages 210–211.
Chattopadhay, A., Sarkar, A., Howlader, P., and Balasub-
ramanian, V. N. (2018). Grad-cam++: Generalized
gradient-based visual explanations for deep convolu-
tional networks. In 2018 IEEE winter conference on
applications of computer vision (WACV), pages 839–
847. IEEE.
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and
Adam, H. (2018). Encoder-decoder with atrous sepa-
rable convolution for semantic image segmentation. In
Proceedings of the European conference on computer
vision (ECCV), pages 801–818.
Draelos, R. L. and Carin, L. (2020). Use hirescam instead
of grad-cam for faithful explanations of convolutional
neural networks. arXiv preprint arXiv:2011.08891.
Fu, R., Hu, Q., Dong, X., Guo, Y., Gao, Y., and Li, B.
(2020). Axiom-based grad-cam: Towards accurate vi-
sualization and explanation of cnns. arXiv preprint
arXiv:2008.02312.
Gebrehiwot, A. and Hashemi-Beni, L. (2020). Automated
indunation mapping: comparison of methods. In
IGARSS 2020-2020 IEEE International Geoscience
and Remote Sensing Symposium, pages 3265–3268.
IEEE.
Ghosh, B., Garg, S., and Motagh, M. (2022). Automatic
flood detection from sentinel-1 data using deep learn-
ing architectures. ISPRS Annals of the Photogramme-
try, Remote Sensing and Spatial Information Sciences,
3:201–208.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Helleis, M., Wieland, M., Krullikowski, C., Martinis, S.,
and Plank, S. (2022). Sentinel-1-based water and
flood mapping: Benchmarking convolutional neural
networks against an operational rule-based processing
chain. IEEE Journal of Selected Topics in Applied
Earth Observations and Remote Sensing, 15:2023–
2036.
Katiyar, V., Tamkuan, N., and Nagai, M. (2021). Near-
real-time flood mapping using off-the-shelf models
with sar imagery and deep learning. Remote Sensing,
13(12):2334.
Konapala, G., Kumar, S. V., and Ahmad, S. K. (2021). Ex-
ploring sentinel-1 and sentinel-2 diversity for flood in-
undation mapping using deep learning. ISPRS Journal
of Photogrammetry and Remote Sensing, 180:163–
173.
Landuyt, L., Van Wesemael, A., Schumann, G. J.-P.,
Hostache, R., Verhoest, N. E., and Van Coillie, F. M.
(2018). Flood mapping based on synthetic aper-
ture radar: An assessment of established approaches.
IEEE Transactions on Geoscience and Remote Sens-
ing, 57(2):722–739.
Li, Z. and Demir, I. (2023). U-net-based semantic clas-
sification for flood extent extraction using sar im-
agery and gee platform: A case study for 2019 cen-
tral us flooding. Science of The Total Environment,
869:161757.
Loshchilov, I. and Hutter, F. (2017). Sgdr: Stochastic gra-
dient descent with warm restarts. In International
Conference on Learning Representations.
Loshchilov, I. and Hutter, F. (2019). Decoupled weight de-
cay regularization. In International Conference on
Learning Representations.
Mirzaei, S., Mao, H., Al-Nima, R. R. O., and Woo, W. L.
(2024). Explainable ai evaluation: A top-down ap-
proach for selecting optimal explanations for black
box models. Information, 15(1).
Muhammad, M. B. and Yeasin, M. (2020). Eigen-cam:
Class activation map using principal components. In
2020 international joint conference on neural net-
works (IJCNN), pages 1–7. IEEE.
Muszynski, M., Hölzer, T., Weiss, J., Fraccaro, P., Zortea,
M., and Brunschwiler, T. (2022). Flood event detec-
tion from sentinel 1 and sentinel 2 data: Does land
use matter for performance of u-net based flood seg-
menters? In 2022 IEEE International Conference on
Big Data (Big Data), pages 4860–4867. IEEE.
Paul, S. and Ganju, S. (2021). Flood segmentation on
sentinel-1 sar imagery with semi-supervised learning.
arXiv preprint arXiv:2107.08369.
Pradhan, B., Lee, S., Dikshit, A., and Kim, H. (2023). Spa-
tial flood susceptibility mapping using an explainable
artificial intelligence (xai) model. Geoscience Fron-
tiers, 14(6):101625.
Prasanth Kadiyala, S. and Woo, W. L. (2021). Flood pre-
diction and analysis on the relevance of features using
explainable artificial intelligence. In 2021 2nd Arti-
ficial Intelligence and Complex Systems Conference,
pages 1–6.
Rosier, S. H. R., Bull, C. Y. S., Woo, W. L., and Gudmunds-
son, G. H. (2023). Predicting ocean-induced ice-
shelf melt rates using deep learning. The Cryosphere,
17(2):499–518.
Sahana, M. and Patel, P. P. (2019). A comparison of fre-
quency ratio and fuzzy logic models for flood suscep-
tibility assessment of the lower kosi river basin in in-
dia. Environmental Earth Sciences, 78:1–27.
Sanderson, J., Mao, H., Abdullah, M. A. M., Al-Nima, R.
R. O., and Woo, W. L. (2023a). Optimal fusion of mul-
tispectral optical and sar images for flood inundation
mapping through explainable deep learning. Informa-
tion, 14(12).
Sanderson, J., Tengtrairat, N., Woo, W. L., Mao, H., and
Al-Nima, R. R. (2023b). Xfimnet: an explainable
deep learning architecture for versatile flood inunda-
tion mapping with synthetic aperture radar and multi-
spectral optical images. International Journal of Re-
mote Sensing, 44(24):7755–7789.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and
Chen, L.-C. (2019). Mobilenetv2: Inverted residuals
and linear bottlenecks.
Schreider, S. Y., Smith, D., and Jakeman, A. (2000). Cli-
mate change impacts on urban flooding. Climatic
Change, 47:91–115.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., and Batra, D. (2017). Grad-cam: Visual
explanations from deep networks via gradient-based
localization. In Proceedings of the IEEE international
conference on computer vision, pages 618–626.
Tavus, B., Can, R., and Kocaman, S. (2022). A cnn-based
flood mapping approach using sentinel-1 data. ISPRS
Annals of the Photogrammetry, Remote Sensing and
Spatial Information Sciences, 3:549–556.
Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S.,
Mardziel, P., and Hu, X. (2020). Score-cam: Score-
weighted visual explanations for convolutional neu-
ral networks. In Proceedings of the IEEE/CVF con-
ference on computer vision and pattern recognition
workshops, pages 24–25.
Yuan, K., Zhuang, X., Schaefer, G., Feng, J., Guan, L., and
Fang, H. (2021). Deep-learning-based multispectral
satellite image segmentation for water body detection.
IEEE Journal of Selected Topics in Applied Earth Ob-
servations and Remote Sensing, 14:7422–7434.
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Tor-
ralba, A. (2016). Learning deep features for discrim-
inative localization. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition,
pages 2921–2929.
Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N., and
Liang, J. (2018). Unet++: A nested u-net architec-
ture for medical image segmentation. In Deep Learn-
ing in Medical Image Analysis and Multimodal Learn-
ing for Clinical Decision Support: 4th International
Workshop, DLMIA 2018, and 8th International Work-
shop, ML-CDS 2018, Held in Conjunction with MIC-
CAI 2018, Granada, Spain, September 20, 2018, Pro-
ceedings 4, pages 3–11. Springer.