Foreground Extraction in Histo-Pathological Image by Combining Mathematical Morphology Operations and U-Net

Jia Li^{1,3,a}, Junling He, Jingmin Long, Chenxu Wang^{3,b}, Jesper Kers^{2,c} and Fons J. Verbeek^{1,d}

^1 Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands

^2 Leiden University Medical Center, Leiden, The Netherlands

^3 School of Computer Science, Xi’an Jiaotong University, Beilin, Xi’an, China

Keywords:

Tissue Segmentation, Foreground Extraction, U-Net, Whole Slide Image.

Abstract:

In recent years, computational pathology has been developing rapidly. As a result, various artificial intelligence approaches have been proposed and applied to the images common in pathology practice, i.e. Whole Slide Images. These images must be pre-processed before they are passed to a deep learning classifier because they are simply too large to feed into such a network. In order to get useful information from these images, we propose a new background removal method for the Regions Of Interest extracted from them. We combine traditional morphological image operators with a U-Net framework. First, we pre-process the images using Contrast Limited Adaptive Histogram Equalization and thresholding. Then we predict the mask using pre-trained U-Net weights. Finally, we apply morphological opening and propagation operators to refine the predicted mask. Experiments on different types of staining (H&E, PAS, and JONES silver) show the effectiveness of our method compared to three state-of-the-art models.

1 INTRODUCTION

Pathological examination of biopsies is an important method for clinical diagnosis and plays a crucial role in the diagnostic process. Over the past decade, digital pathology has become one of the main directions for development in pathology (Li et al., 2022). Digital slide scanner systems provide high-resolution whole slide images (WSIs) obtained from traditional pathological slides, resulting in image sizes in the order of gigabytes. Digitization of pathological slides contributes considerably to the preservation, sharing, and analysis of pathological information. On the basis of WSIs, automated or computer-assisted diagnosis comes within reach. This requires that dedicated pattern recognition systems be developed and, consequently, that the WSIs be prepared for these pattern recognition procedures. At present, pattern recognition methods are based on so-called deep learning systems. WSIs can thus be used in computational approaches to recognize certain pathologies in these images (Neuner et al., 2021). Pattern recognition can assure efficient and accurate pathological assessment of diseases. Although computer-aided diagnosis of histo-pathological images has gained acclaim for its accuracy, stability and efficiency, the quality of the histo-pathological images still poses various challenges to the proposed techniques. We will discuss some of these limitations in terms of slide preparation and computational preparation. In this paper, we employ histo-pathological images of the kidney, in particular biopsies taken from patients with a kidney transplant.

a https://orcid.org/0000-0001-8842-1042

b https://orcid.org/0000-0002-9539-5046

c https://orcid.org/0000-0002-2418-5279

d https://orcid.org/0000-0003-2445-8158

There are several staining methods for tissues

commonly used in kidney pathology; haematoxylin

and eosin (H&E) is the most frequently used staining

technique. For kidney transplants, also the Periodic

Acid Schiff (PAS) and the JONES silver staining are

used. Examples of these three different staining meth-

ods are depicted in Figure 1. These images are typical

examples for kidney biopsies.

As mentioned, WSIs are large and have a high resolution. This complicates the use of the image as a whole for pattern recognition, because computer memory is still limited. Therefore, it is important to first find where on the slide of the biopsy, and thus in the WSI, the relevant information is. Once the locations of information are found, just the tissue information at those locations needs to be extracted. Only this part of the WSI is considered relevant for further processing.

Li, J., He, J., Long, J., Wang, C., Kers, J. and Verbeek, F.

Foreground Extraction in Histo-Pathological Image by Combining Mathematical Morphology Operations and U-Net.

DOI: 10.5220/0011803500003414

In Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - Volume 2: BIOIMAGING, pages 146-153

ISBN: 978-989-758-631-6; ISSN: 2184-4305

© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

The relevant information on the biopsy WSI is referred to as the foreground, and this part is used to find the patterns. Most often there are a number of biopsy sections mounted on one slide. The first step is to identify the regions where the sections are. The empty background, i.e. the part of the slide where no sections are mounted, is often indicated with one color; this needs to be set to zero first. An example can be seen in Figure 2a.

Existing methods for foreground extraction for

WSIs are mostly focused on H&E staining (Riasa-

tian et al., 2020). In kidney transplant biopsies, how-

ever, also the PAS staining is important for histo-

pathological analysis. Foreground extraction of PAS

stained WSI can be complicated as the staining fades

over time resulting in a low contrast image. In gen-

eral, the stained tissue in a WSI should be evaluated

for fading, artifact, dirt, and low contrast. In Figure 2b

and 2c some typical examples are depicted.

In order to analyse the information in the tissue, all tissue areas need to be identified. The analysis consists of building a classifier for the pathological state of the tissue. It is therefore important to use only relevant, i.e. tissue, information in the classification. Consequently, the WSI needs to be pre-processed so that this information can be submitted to the classification system. We focus on getting this information from the WSI.

Once the tissue regions in the WSI are identiﬁed,

the tissues themselves need to be identiﬁed as fore-

ground. To this end, a segmentation procedure is ap-

plied. Therefore, the next step of histo-pathological

image analysis is tissue segmentation (Khened et al.,

2021).

The classification of the foreground parts is accomplished using a deep neural network. The computational task is facilitated through the use of graphics processing units (GPUs). However, GPU memory is limited. Consequently, the common approach is to divide the image into smaller patches. These patches are then used to train the deep neural network.

So, the preparation of the WSIs for the training of a classifier requires constructing “patches” for the relevant areas with tissue. Once these areas are established and “clean”, patching is the last step of the preparation. This needs to be done in such a manner that the patches contain useful information for the training of a classifier. Our work therefore aims to pre-process the WSI such that, first, the regions where sections are on the slide are established and, subsequently, the tissue area in each of these regions is processed to obtain a binary mask that can be used for the construction of patches containing relevant tissue information.

(a) H&E (b) JONES (c) PAS

Figure 1: Different types of staining.

(a) Empty area (b) Faded staining (c) Dirty staining

Figure 2: Some challenging examples for tissue segmentation.

In our approach, the empty area, cf. Figure 2a, of

the WSI is set to zero through a simple thresholding

operation. Next, the tissue part of the WSI needs to be

assessed. In order to deal with low contrast in the im-

age, image enhancement is used. This is typically the

case for staining that is fading over time. We assess

the contrast distribution in the image and enhance the

contrast through a redistribution of the intensity val-

ues. For our approach we use Contrast Limited Adap-

tive Histogram Equalization (CLAHE) (Pizer et al.,

1990) for low-contrast images. This enhancement

method operates on a local assessment of the intensity

distributions and combines well with the subdivision

of patches later in the process. Artifacts and dirt on

the slides are often seen as tissue by the segmentation

procedure. In order to remove these artifacts we use

mathematical morphology operators. The combina-

tion of these procedures will result in areas that are

suitable for consistent and robust patching.

Further, based on results from the literature, we

propose a new method that combines mathematical

morphology operations with a U-Net deep learning

structure. We ﬁrst assess the contrast distribution,

then we use thresholding to remove the empty parts


from the analysis. Next, we use the MobileNet neural network (Howard et al., 2017) as the U-Net encoder backbone to predict the tissue foreground areas. This neural network was pre-trained on The Cancer Genome Atlas (TCGA) datasets. Next, we post-process the initial masking area through mathematical morphology operations. The resulting mask is then used to create the patches for the classifier.

The main contributions of this paper are: (1) A better solution for the extraction of the relevant foreground from the WSI. (2) Development of a new generic procedure for the pre-processing of kidney biopsy images. (3) Comparison with three state-of-the-art methods (Otsu, MobileNet, and EfficientNet-B3) on 7 typical images with two binary evaluation indexes.

The remainder of this paper is organized as fol-

lows: in section 2, several existing approaches related

to our algorithm are presented. Then, in section 3, we

will introduce our method. Section 4 provides the ex-

periment results. Finally, we present our conclusions

in section 5.

2 RELATED WORK

Two categories of work on the processing of WSIs are related to ours: background removal and whole slide image pre-processing.

2.1 Background Removal

In tissue segmentation tasks, background removal can refer to different approaches. One is to correct for uneven illumination in the background, for which elaborate methods are available (Cai and Verbeek, 2015). For WSIs, background correction refers to the removal of non-object parts in the image. This often uses a segmentation method, and the crux is to find the right threshold value(s). Here we see two categories: one based on traditional machine learning algorithms and the other based on deep learning. The traditional approaches include region growing, watershed-based methods, and Otsu thresholding (Otsu, 1979). The main idea of a threshold-based algorithm is to compute an optimal threshold value. In a deep learning approach, patching over the image is applied to find local optima.
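As a point of reference for the threshold-based category, Otsu's method picks the gray level that maximizes the between-class variance of the histogram. The following is a minimal NumPy sketch under the assumption of an 8-bit grayscale input; the function name `otsu_threshold` is illustrative and not taken from any of the cited tools.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the 8-bit level t that maximizes the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    `gray` is assumed to be an unsigned 8-bit image.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega0 = np.cumsum(prob)            # probability of the lower class
    mu = np.cumsum(prob * levels)       # cumulative mean up to each level
    mu_total = mu[-1]
    omega1 = 1.0 - omega0               # probability of the upper class
    valid = (omega0 > 0) & (omega1 > 0)
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu_total * omega0[valid] - mu[valid]) ** 2 / (
        omega0[valid] * omega1[valid]
    )
    return int(np.argmax(sigma_b))
```

On a bright-field slide the tissue is usually darker than the empty glass, so under that assumption the foreground would be taken as `gray <= t`.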

By dividing the section parts of the WSI, i.e. the regions of interest (ROIs), into small patches, a deep neural network can be used for the segmentation. This works by predicting a label for each of the pixels in a patch. The label denotes whether the pixel is foreground or not. Next, the segmented patches are stitched back into the overall image. There are several neural network models for image segmentation (Sultana et al., 2020), such as FCN (Fully Convolutional Network) (Long et al., 2015), U-Net (Ronneberger et al., 2015), and Mask R-CNN (He et al., 2017). Riasatian et al. compared different U-Net topologies for background removal in histo-pathological images (Riasatian et al., 2020). By training with different backbones in their experiments, they have shown that MobileNet and EfficientNet-B3 (Tan and Le, 2019) perform better than the others.

Pretrained model available at: https://kimialab.uwaterloo.ca/kimia/index.php/data-and-code/

2.2 Whole Slide Image Pre-Processing

Chen et al. proposed a tissue localization pipeline

to process WSIs (Chen and Yang, 2019). They use

thresholding on grayscale images followed by ﬁlling

the holes, which works well on H&E staining. Neuner

et al. developed an open-source library to process

WSIs, which helps the training and evaluation task for

classiﬁcation (Neuner et al., 2021). The general pro-

cedure of their software consists of several steps: ROI

deﬁnition, tile ﬁltering, tile extraction, and tile collec-

tion. After this procedure, we can get a batch of tiles,

which can be directly used for downstream tasks.

They employed several kinds of ﬁlters to segment the

background and foreground. Clustering-constrained

Attention Multiple Instance Learning (CLAM) is an-

other deep-learning-based method that uses attention-

based learning to classify the WSIs (Lu et al., 2021).

In CLAM, thresholding is used on the saturation

channel after blurring the image with a median ﬁlter.

In addition, morphological operators are used to ﬁll

the small holes.

3 METHOD

Our method consists of three main steps: image pre-

processing by contrast assessment and thresholding,

MobileNet mask prediction using pretrained weights,

and mask post-processing using propagation and mor-

phological opening. The workﬂow of our method is

shown in Figure 3.

3.1 Image Pre-Processing

BIOIMAGING 2023 - 10th International Conference on Bioimaging

Figure 3: Flow chart of our method (Original WSI, Extract ROI, Distribution Assessment, Low Contrast? if yes: CLAHE, Thresholding, MobileNet, Propagation, Opening, Binary Mask).

Since there is often some debris on the slide as well as color coding of non-section areas in the WSIs, we use thresholding to remove the very dark and very light areas. First, we convert the image to a grayscale image and then assess the contrast, i.e. low or sufficient.

The grayscale image is calculated using this conversion:

$$Y = 0.299 \cdot R + 0.587 \cdot G + 0.114 \cdot B \tag{1}$$

where $R$, $G$, and $B$ are the intensity values of the red, green, and blue channels. From the resulting intensity image, we calculate the contrast by using:

$$\frac{I_{max} - I_{min}}{I_{full}} \tag{2}$$

where $I_{max}$ and $I_{min}$ are the maximum and minimum intensity values in the image, respectively, and $I_{full}$ represents the dynamic range for the given image type; typically, for an 8-bit image, this is [0, 255]. If the image contrast is lower than a given value, the image is considered a low-contrast image (see Figure 4 for a histogram example from the source image in Figure 2(b)). For this image, the maximum gray value is 241 and the minimum gray value is 80. As we can see, the values above 231 and below 214 account for no more than 1% of the pixels. In this case, $I_{max}$ is 231, $I_{min}$ is 214, and $I_{full}$ is 256. The contrast value is therefore (231 − 214)/256 ≈ 0.066, meaning that only 6.6% of the dynamic range is used. We consider this image to be of low contrast and employ CLAHE as an image enhancement method on it. If the image is not a low-contrast image, we proceed directly to thresholding.
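The contrast assessment of Eq. (1) and Eq. (2) can be sketched as follows. This is an illustration only: the function names are hypothetical, and the 1% clipping of the histogram tails is an assumption read from the worked example above rather than a stated parameter of the method.

```python
import numpy as np

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Grayscale conversion of Eq. (1); `rgb` has shape (H, W, 3)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights

def contrast(gray: np.ndarray, clip_percent: float = 1.0,
             full_range: float = 256.0) -> float:
    """Eq. (2): (I_max - I_min) / I_full after ignoring sparse histogram tails.

    `clip_percent` is an assumption taken from the worked example, in which
    gray values covering no more than 1% of the pixels at either end of the
    histogram are disregarded.
    """
    i_min, i_max = np.percentile(gray, [clip_percent, 100.0 - clip_percent])
    return float(i_max - i_min) / full_range
```

For an image whose bulk of gray values lies between 214 and 231, `contrast` returns 17/256, matching the value of about 0.066 in the example. If the result falls below the chosen cutoff, CLAHE could be applied, for instance with OpenCV's `cv2.createCLAHE(clipLimit=..., tileGridSize=...)`; the clip limit and tile size are not given in the text and would have to be chosen.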

In the thresholding step, we set the pixels whose intensity values are below 70 or above 230 to 255 (values chosen from empirical assessments). By doing this, the pixels with values outside this interval are viewed as background by the neural network. Through image enhancement and thresholding, some of the noise is removed. After thresholding, we obtain an image that keeps almost all of the tissue and contains less background.
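The thresholding step amounts to a simple interval test. A minimal sketch, assuming the test is applied to the grayscale image; the function name `suppress_extremes` is illustrative:

```python
import numpy as np

def suppress_extremes(gray: np.ndarray, low: int = 70, high: int = 230,
                      fill: int = 255) -> np.ndarray:
    """Set pixels darker than `low` or brighter than `high` to `fill` (white)."""
    out = gray.copy()
    out[(gray < low) | (gray > high)] = fill
    return out
```

The same boolean mask could be used to whiten the corresponding pixels of the color image before it is passed to the network; the text does not state which of the two variants is used.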

Figure 4: Histogram of a low-contrast image (Figure 2(b)). (a) RGB histogram of the image. (b) Zoom in to the highest frequency gray values.

3.2 U-Net Architecture

Image segmentation tasks can be accomplished in

many ways. In order to further classify the image con-

tent, we need to create masks over the regions where

the tissues are found. Following recent accomplish-

ments in WSI segmentation, we invoke a deep learn-

ing strategy for the segmentation. As a widely used

convolutional neural network model, U-Net works

well for image segmentation tasks, especially for

biomedical microscopy images (Ronneberger et al.,

2015). There are several different backbones that we

can choose for the tissue segmentation task. Accord-

ing to experiments in previous work (Riasatian et al.,

2020), the MobileNet works better than the others.

The main idea of MobileNet is the depth-wise separa-

ble convolution, with which the number of parameters

can be reduced. Therefore, we choose the MobileNet

as the backbone of U-Net, and we use the default set-

tings (the patch size is 400) to produce the (initial)

mask.
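The parameter reduction of a depth-wise separable convolution can be illustrated with a small count. The sketch below ignores bias terms, and the layer sizes in the usage note are arbitrary example values, not the dimensions of the actual backbone.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a depth-wise k x k convolution followed by a 1 x 1 convolution."""
    depthwise = k * k * c_in    # one k x k filter per input channel
    pointwise = c_in * c_out    # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise
```

For a 3x3 layer with 64 input and 128 output channels, the standard convolution has 73728 weights and the separable version 8768, roughly 8.4 times fewer.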

3.3 Post-Processing

After the initial prediction of U-Net, we get a mask with small holes both in the tissue area and on the border of the tissue area. To fill the holes in the tissue, we process the mask using image propagation, which fills the holes in the overall mask. For the small concavities on the border of the tissue, we use an opening with a rectangular structuring element of size 20x20. In this manner, we obtain a binary mask for each of the ROIs.
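The two post-processing operators can be sketched in plain NumPy. This is not the DIPlib implementation used in the paper: hole filling is written here as a propagation of the background from the image border, and the opening as an erosion followed by a dilation with a k x k rectangle. The function names are illustrative.

```python
from collections import deque
import numpy as np

def fill_holes(mask: np.ndarray) -> np.ndarray:
    """Fill background regions that are not connected to the image border."""
    mask = mask.astype(bool)
    h, w = mask.shape
    reached = np.zeros((h, w), dtype=bool)
    queue = deque()
    # Seed the propagation with every background pixel on the border.
    for r in range(h):
        for c in range(w):
            if (r in (0, h - 1) or c in (0, w - 1)) and not mask[r, c]:
                reached[r, c] = True
                queue.append((r, c))
    # Propagate through 4-connected background pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < h and 0 <= cc < w
                    and not mask[rr, cc] and not reached[rr, cc]):
                reached[rr, cc] = True
                queue.append((rr, cc))
    # Everything the border background cannot reach is tissue or a hole.
    return ~reached

def open_rect(mask: np.ndarray, k: int = 20) -> np.ndarray:
    """Morphological opening (erosion, then dilation) with a k x k rectangle."""
    mask = mask.astype(bool)
    h, w = mask.shape
    lo, hi = k // 2, k - 1 - k // 2   # window extent before/after the pixel
    # Erosion: a pixel survives only if the whole window is foreground.
    padded = np.pad(mask, ((lo, hi), (lo, hi)), constant_values=False)
    eroded = np.ones((h, w), dtype=bool)
    for dr in range(k):
        for dc in range(k):
            eroded &= padded[dr:dr + h, dc:dc + w]
    # Dilation with the reflected window restores the surviving shapes.
    padded = np.pad(eroded, ((hi, lo), (hi, lo)), constant_values=False)
    opened = np.zeros((h, w), dtype=bool)
    for dr in range(k):
        for dc in range(k):
            opened |= padded[dr:dr + h, dc:dc + w]
    return opened
```

With tissue as foreground, an opening removes foreground details smaller than the structuring element; applied to the inverted mask, the same operator fills small concavities of the tissue. The sketch does not assume which polarity is used in the pipeline. For full-size ROIs, library routines such as `scipy.ndimage.binary_fill_holes` or `cv2.morphologyEx` would be the practical choice.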

Given the generated mask, we can keep only the tissue by combining the mask with the original image using a logical AND operation. Subsequently, we generate the image patches for the neural network classifier from the masked area. A simple patch example is shown in Figure 5. The green line represents the overall mask outline. We generate patches of 256*256 pixels. As an extra heuristic, we establish whether there is sufficient information in the patch for training. This prevents the classifier from training on the background. In a patch, the tissue should cover an area of at least (256*256)/2 pixels, i.e. 50%, for the patch to be relevant and kept as input for the classifier (cf. the red square in Figure 5).

Table 1: Similarity measure between the masks from our method and the others.

Image   Binary correlation                  Binary overlap
        EfficientNet  MobileNet  Otsu       EfficientNet  MobileNet  Otsu
I_1     0.9864        0.9817     0.8592     0.9919        0.9890     0.9055
I_2     0.7178        0.7967     0.4940     0.7476        0.8258     0.5733
I_3     0.9060        0.6395     0.8753     0.9184        0.6215     0.8915
I_4     0.9217        0.3483     0.1447     0.9269        0.3363     0.0387
I_5     0.9661        0.9782     0.9108     0.9703        0.9810     0.9199
I_6     0.9881        0.9923     0.0335     0.9900        0.9935     0.0230
I_7     0.9902        0.9860     0.6770     0.9919        0.9885     0.7365

Figure 5: A simple example of the resulting patches.
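The masking and the 50% tissue heuristic can be sketched as below. The non-overlapping grid is an assumption, since the stride of the patches is not stated; the function names are illustrative.

```python
import numpy as np

def apply_mask(rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Logical AND of image and mask: pixels outside the mask become zero."""
    return rgb * mask.astype(rgb.dtype)[..., None]

def select_patches(mask: np.ndarray, patch: int = 256,
                   min_fraction: float = 0.5) -> list:
    """Return top-left (row, col) corners of grid patches with enough tissue."""
    mask = mask.astype(bool)
    h, w = mask.shape
    keep = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            # Fraction of tissue pixels inside the current patch.
            if mask[r:r + patch, c:c + patch].mean() >= min_fraction:
                keep.append((r, c))
    return keep
```

A patch whose tissue fraction is exactly 50% is kept, in line with the "at least" wording above.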

4 EXPERIMENT

4.1 Dataset Preparation

We have selected 7 WSIs from kidney transplants with different types of staining, for which it is difficult for the neural network to predict good masks. The WSIs were scanned with a Philips DP v1.0. The Automated Slide Analysis Platform (ASAP) is used to annotate the best quality ROIs. From the annotation generated by ASAP (XML file), we extract the information on the ROIs. A WSI is acquired at different resolutions, also known as levels, ranging from 0 to 9. On average there is 1 ROI per WSI, which leads in total to 500 useful patches for training. To speed up the prediction of the neural networks while including as much information as possible, we extract the level 5 ROIs, which corresponds to a magnification of 1.25×. The original images are shown in Figure 6 (a). The images from top to bottom are denoted as $I_1$ to $I_7$. One image has H&E staining and one has PAS staining; the others all have JONES staining. Three of the images are from the same kidney but with different types of staining.

4.2 Experimental Settings

We have implemented our algorithms in Python and use the OpenCV and DIPlib (https://diplib.org/) libraries to process the images. The experiments were run on a Windows 11 system with an Intel 3.4GHz processor and 16GB memory. We compare our method with Otsu threshold segmentation in OpenCV, MobileNet, and EfficientNet-B3 (code and pretrained weights available at: https://kimialab.uwaterloo.ca/kimia/index.php/ijcnn-2020-u-net-based-background-removal-in-histopathology/). The mask results are shown in Figure 6.

(a) Original (b) Otsu (c) MobileNet (d) EfficientNet (e) Our method

Figure 6: The results of the different methods.

4.3 Performance Evaluation

To be able to compare the masks generated by our method with other approaches, we employ binary correlation and binary overlap to measure the correlation between two binary images (Verbeek, 1995). In this calculation, the two images should be binary images of the same size. The total number of pixels in the image is denoted as $N_{tot}$. The binary correlation is calculated as:

$$bc(I_1, I_2) = \frac{N_{I_1 \cap I_2} \cdot N_{tot} - N_1 \cdot N_2}{\left((N_1 \cdot N_{tot} - N_1^2) \cdot (N_2 \cdot N_{tot} - N_2^2)\right)^{1/2}} \tag{3}$$

where $I_1$ denotes the mask generated by our method, and $I_2$ denotes the mask generated by another method. $N_1$ is the number of object pixels in $I_1$, $N_2$ is the number of object pixels in $I_2$, and $N_{I_1 \cap I_2}$ denotes the number of object pixels that result from the logical AND operation of $I_1$ and $I_2$. The binary overlap is calculated as:

$$bo(I_1, I_2) = \frac{2 \cdot N_{I_1 \cap I_2}}{N_1 + N_2} \tag{4}$$
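Both measures are straightforward to compute from pixel counts. The sketch below follows Eq. (3) and Eq. (4) term by term; the function names and the choice to return NaN for degenerate masks (all object or all background) are assumptions made for illustration.

```python
import numpy as np

def _counts(a: np.ndarray, b: np.ndarray):
    """Return (N_tot, N_1, N_2, N_intersection) for two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same size")
    return a.size, int(a.sum()), int(b.sum()), int((a & b).sum())

def binary_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Eq. (3): correlation between two binary masks."""
    n_tot, n1, n2, n12 = _counts(a, b)
    denom = ((n1 * n_tot - n1 ** 2) * (n2 * n_tot - n2 ** 2)) ** 0.5
    return (n12 * n_tot - n1 * n2) / denom if denom else float("nan")

def binary_overlap(a: np.ndarray, b: np.ndarray) -> float:
    """Eq. (4): twice the intersection divided by the summed object areas."""
    _, n1, n2, n12 = _counts(a, b)
    return 2.0 * n12 / (n1 + n2) if (n1 + n2) else float("nan")
```

Identical masks give a value of 1 for both measures, and the binary overlap drops to 0 for disjoint masks.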

4.4 Results Analysis

The binary correlation and binary overlap between the masks from our method and the others are shown in Table 1. The measures indicate the discrepancies between our method and the other approaches. The results show that all the methods work well for H&E staining. Our method can, however, remove all the background in the image resulting from a dirty staining. For images resulting from a weak staining, MobileNet predicts less tissue, whereas EfficientNet finds more tissue. Otsu and MobileNet view the empty area as foreground, whereas EfficientNet and our method recognize the empty area and only consider the tissue as foreground. Due to the empty area and the white part inside the empty area, Otsu misses most of the tissue in two of the images. Our method removes the small holes predicted by MobileNet in three of the images.

5 CONCLUSIONS

In this paper, we have proposed a solution for the con-

struction of a tissue mask as a pre-processing step

for tissue classiﬁcation. The masking is based on a

tissue segmentation task, which uses a combination

of mathematical morphology processing on results

from the U-Net architecture. Several experiments of

our method on different types of staining show the

method performs well and leads to better results for

the patching. For the PAS staining, a few parts of the tissue are still missing. Here, further filtering and parameter optimization are needed to identify the tissue parts as completely as possible. Furthermore, we aim to extract the parameters automatically from the images. This will require further analysis of a larger number of images.

ACKNOWLEDGEMENTS

This work is partially supported by the China Scholarship Council (CSC No.202106280008). We would like to thank the LUMC (Leiden University Medical Center) for providing the research data.

REFERENCES

Cai, F. and Verbeek, F. J. (2015). Dam-based rolling ball

with fuzzy-rough constraints, a new background sub-

traction algorithm for image analysis in microscopy.

In 2015 International Conference on Image Process-

ing Theory, Tools and Applications (IPTA), pages

298–303. IEEE.

Chen, P. and Yang, L. (2019). Tissueloc: Whole slide digital

pathology image tissue localization. J. Open Source

Software, 4(33):1148.

He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969.

Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D.,

Wang, W., Weyand, T., Andreetto, M., and Adam,

H. (2017). Mobilenets: Efﬁcient convolutional neu-

ral networks for mobile vision applications. arXiv

preprint arXiv:1704.04861.

Khened, M., Kori, A., Rajkumar, H., Krishnamurthi, G.,

and Srinivasan, B. (2021). A generalized deep learn-

ing framework for whole-slide image segmentation

and analysis. Scientiﬁc reports, 11(1):1–14.

Li, X., Li, C., Rahaman, M. M., Sun, H., Li, X., Wu, J.,

Yao, Y., and Grzegorzek, M. (2022). A comprehensive

review of computer-aided whole-slide image analy-

sis: from datasets to feature extraction, segmentation,

classiﬁcation and detection approaches. Artiﬁcial In-

telligence Review, pages 1–70.

Long, J., Shelhamer, E., and Darrell, T. (2015). Fully con-

volutional networks for semantic segmentation. In

Proceedings of the IEEE conference on computer vi-

sion and pattern recognition, pages 3431–3440.

Lu, M. Y., Williamson, D. F., Chen, T. Y., Chen, R. J., Bar-

bieri, M., and Mahmood, F. (2021). Data-efﬁcient

and weakly supervised computational pathology on

whole-slide images. Nature biomedical engineering,

5(6):555–570.

Neuner, C., Coras, R., Blümcke, I., Popp, A., Schlaffer, S. M., Wirries, A., Buchfelder, M., and Jabari, S. (2021). A whole-slide image managing library based on fastai for deep learning in the context of histopathology: Two use-cases explained. Applied Sciences, 12(1):13.

Otsu, N. (1979). A threshold selection method from gray-

level histograms. IEEE transactions on systems, man,

and cybernetics, 9(1):62–66.

Pizer, S., Johnston, R., Ericksen, J., Yankaskas, B., and

Muller, K. (1990). Contrast-limited adaptive his-

togram equalization: speed and effectiveness. In

[1990] Proceedings of the First Conference on Visu-

alization in Biomedical Computing, pages 337–345.

Riasatian, A., Rasoolijaberi, M., Babaei, M., and Tizhoosh,

H. R. (2020). A comparative study of u-net topologies

for background removal in histopathology images. In

2020 International Joint Conference on Neural Net-

works (IJCNN), pages 1–8. IEEE.

Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, pages 234–241.


Sultana, F., Suﬁan, A., and Dutta, P. (2020). Evolution

of image segmentation using deep convolutional neu-

ral network: a survey. Knowledge-Based Systems,

201:106062.

Tan, M. and Le, Q. (2019). Efﬁcientnet: Rethinking model

scaling for convolutional neural networks. In Interna-

tional conference on machine learning, pages 6105–

6114. PMLR.

Verbeek, F. J. (1995). Three-Dimensional Reconstruction

of Biological Objects from Serial Sections Including

Deformation Correction., chapter 4, pages 83–84.
