Weakly Supervised Gleason Grading of Prostate Cancer Slides using Graph Neural Network
Nan Jiang¹, Yaqing Hou¹, Dongsheng Zhou², Pengfei Wang¹, Jianxin Zhang³ and Qiang Zhang¹
¹School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
²Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian 116622, China
³School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600, China
Yaqing Hou and Qiang Zhang are the corresponding authors of this article.
Keywords: Prostate Cancer, Gleason Grading, Graph Neural Network, Weakly Supervised.
Abstract: Gleason grading of histopathology slides has been the "gold standard" for the diagnosis, treatment and prognosis of prostate cancer. For the heterogeneous Gleason score 7, patients with Gleason score 3+4 and 4+3 show a significant statistical difference in cancer recurrence and survival outcomes. Considering that patients with Gleason score 7 account for up to 40% of all prostate cancers diagnosed, choosing an appropriate treatment and management strategy for these patients is of utmost importance. In this paper, we present a Graph Neural Network (GNN) based weakly supervised framework for the classification of Gleason score 7. First, we construct the slides as graphs to capture both local relations among patches and global topological information of the whole slides. Then GNN based models are trained for the classification of heterogeneous Gleason score 7. Our approach obtains the best performance among existing works, with an accuracy of 79.5% on the TCGA dataset. The experimental results demonstrate the significance of the proposed method for the Gleason grading task.
1 INTRODUCTION
Prostate cancer is one of the most common cancers, affecting around 1 in 9 men worldwide (Moch et al., 2016). The Gleason grading system has been recognized as the most powerful indicator for estimating the aggressiveness of prostate cancer, and is of great significance for guiding risk stratification and determining treatment. Specifically, the Gleason score (GS) is defined as the sum of the primary and secondary patterns present in the tumor area, and ranges from 2 to 10. Each pattern is assigned a score from 1 (G1) to 5 (G5), where higher scores indicate more aggressive cancer and more poorly differentiated glands. In current clinical practice, the lowest GS assigned is GS 6 (G3 + G3) (Epstein, 2018), since assignments of GS 2 to 5 have poor reproducibility and low correlation with radical prostatectomy grade (Zareba et al., 2010) (Epstein et al., 2015).
Conventionally, the assessment of GS is carried out manually by well trained pathologists, which is time-consuming and suffers from very high inter-observer variability. In recent years, there has been growing interest in computer-aided automatic Gleason grading methods based on deep learning techniques, especially the Convolutional Neural Network (CNN). Existing research can be roughly categorized into supervised methods (Arvaniti et al., 2018) (Ren et al., 2018) and weakly supervised methods (del Toro et al., 2017) (Arvaniti et al., 2018) (Xu et al., 2018) (Wang et al., 2019) (Pinckaers et al., 2020). However, most of them have focused on the classification of homogeneous tumor regions with only one single Gleason pattern (i.e., G3, G4 or G5) (Khurd et al., 2010) (Kallen et al., 2016) (Nagpal et al., 2018) (Pinckaers et al., 2020) (Wang et al., 2018), or high grades (i.e., GS >= 8) versus low grades (i.e., GS <= 7) (del Toro et al., 2017) (Ren et al., 2018) (Xu et al., 2018) (Wang et al., 2019), which are of limited help for clinical diagnosis.
In this paper, we mainly focus on the classification of heterogeneous GS 7 (i.e., G3 + G4 and G4 + G3). Studies show that GS 7 should be delineated into different prognostic groups, since patients with G3 + G4 and G4 + G3 show a significant statistical difference in cancer recurrence and survival outcomes (Hochreiter and Schmidhuber, 1997). Compared to G3 + G4, the gland structures in G4 + G3 are more poorly differentiated (Epstein et al., 2016). Considering that patients with
GS 7 account for up to 40% of all prostate cancers diagnosed (Siegel et al., 2017), choosing an appropriate treatment and management strategy for these patients is of utmost importance.
Recently, several studies have addressed the analysis of heterogeneous GS 7. For example, (Zhou et al., 2017) proposed an automatic Gleason grading method for heterogeneous GS 7, with a pipeline consisting of gland region segmentation by K-means clustering, color decomposition, and CNN based classification. (Li et al., 2019) proposed a two-stage attention based Multiple Instance Learning (MIL) model that classifies prostate cancer slides into benign, low grade (i.e., G3 + G3 or G3 + G4) and high grade (i.e., G4 + G3 or higher). Both approaches are not sufficiently context-aware and do not capture the correlations among patches that are predictive of Gleason grading. (Jian et al., 2018) developed a survival analysis model, further exploring the prognosis of prostate cancer patients graded with G3 + G4 and G4 + G3. Specifically, they used a CNN based long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) method to model the spatial relationships of patches extracted from one slide. However, the LSTM model works sequentially and therefore cannot correctly describe one-to-many correlations among patches.
To alleviate these deficiencies, we introduce the Graph Neural Network (GNN), an emerging technology for analyzing graph data, into the Gleason grading task. In particular, by introducing the convolution operator on the basis of GNN, the Graph Convolutional Network (GCN) has a strong ability to model the global information and dependencies among graph nodes. It updates each node embedding by aggregating information from multi-layer neighborhoods, and the updated node representations are then used for downstream tasks (Wu et al., 2019). (Wang et al., 2019) came up with a GCN based automatic Gleason grading method that assigns prostate cancer tissue micro-arrays (TMAs) GS = 6 or GS >= 7. Their model can capture the distribution and spatial relations of cells by modeling TMAs as cell-graphs, learning nuclei features as nodes. However, the cell-graph is not capable of modeling gland structures, which are of great importance in the classification of GS 7. In our work, to capture both gland features and relations among patches, we crop the prostate cancer slides into small patches to model patch-graphs.
In this paper, we present a GNN based weakly supervised Gleason grading method, which models the prostate cancer slides as graphs with patch-level features and introduces two edge construction mechanisms. The patch-level feature extractor is trained on pure slides (GS 3+3 and GS 4+4) to further promote classification accuracy. Our GNN based model has the inherent ability to accurately capture both local relations among patches and global topological information of the tumor area.

Figure 1: Overview of the GNN based Gleason grading workflow. a. Graph reconstruction module. b. GNN module.
The main contributions of this work can be summarized as follows:
- We focus on the classification of heterogeneous GS 7, which very few studies have addressed. We propose a GNN based weakly supervised method that relies on neither patch-level annotations nor non-tumor slides. To the best of our knowledge, we are the first to introduce the GNN mechanism into the heterogeneous GS 7 classification task.
- We conducted experiments on The Cancer Genome Atlas (TCGA), one of the most famous databases for cancer research. Our model achieves an accuracy of 79.5% in differentiating G3 + G4 from G4 + G3, which is superior to the state-of-the-art result.
The rest of this paper is organized as follows. We first review related work on automatic Gleason grading techniques in Sec. 2. Next, in Sec. 3, we describe the pipeline of our proposed GNN based model. Implementation details of the experiments and final results with analysis are given in Sec. 4. Finally, the conclusion is presented in Sec. 5.
2 RELATED WORK
Existing automatic Gleason grading methods can be
roughly divided into supervised methods and weakly
supervised methods.
2.1 Supervised Gleason Grading
At an earlier stage of computer aided Gleason grading, (Khurd et al., 2010) assigned GS to prostate cancer slides by classifying texture, characterized by clustering the filter responses extracted from every pixel. With the rise of Convolutional Neural Networks (CNNs), many researchers have trained CNN based classifiers with sufficient fine-grained labels manually annotated by pathologists. Several prevalent CNN models, such as ResNet (He et al., 2016), VGGNet (Simonyan and Zisserman, 2014), and GoogleNet (Szegedy et al., 2014), were tested in previous works (Arvaniti et al., 2018) (Nagpal et al., 2018) (Zhang et al., 2020). While promising results were reported compared to traditional methods, labeling every patch and delineating all the discrete tumor areas is tedious and error-prone for pathologists.

In order to reduce the dependence on detailed labels, many weakly supervised Gleason grading methods using only slide-level labels have been released recently.
2.2 Weakly Supervised Gleason Grading
Toro et al. detected cancerous patches of prostate cancer slides according to the Blue Ratio image (BR image); the selected patches were then used to train a patch-level classifier of high grade (GS >= 8) vs. low grade (GS <= 7) (del Toro et al., 2017). However, they annotated the patches with their slide label directly, which is inconsistent with the Gleason grading principle and seriously damages classification accuracy. (Zhou et al., 2017) studied the classification of heterogeneous GS 7; in their work, human-engineered features and CNN features are combined to give patch-level predictions. (Xu et al., 2018) used a multi-class Support Vector Machine (SVM) to classify the texture features of all patches, and the results were then integrated to assign prostate biopsies GS 6, GS 7 or GS >= 8. (Li et al., 2019) developed an attention based Multiple Instance Learning (MIL) model, a two-stage model imitating the procedure by which pathologists perform Gleason grading. However, they used benign prostate cancer slides, which are not always available, to train a cancer versus non-cancer MIL classification model, and information embedded in the final GS is not fully incorporated. Moreover, in these methods, the final GS is obtained by integrating independent patch-level results without considering topological information and correlations among patches.
In this work, we develop a GNN based weakly
supervised Gleason grading method, which aims to
capture both global information and relations among
patches.
3 GNN BASED GLEASON GRADING
Considering the clinical significance of the classification of heterogeneous GS 7, we develop a weakly supervised method that can automatically grade GS 7 slides using only slide-level labels. Different from previous studies that rely on patch-level or pixel-level annotations, our model uses only cancerous slides with their slide-level labels.

Specifically, in Sec. 3.1 we reconstruct prostate cancer slides as graphs; GNN based models are then trained to learn graph representations of the slides in Sec. 3.4. Figure 1 shows the overall workflow of our method.
3.1 Reconstruct Prostate Cancer Slides
The graphs used to train the GNN based model are reconstructed from the prostate cancer slides, with patch-level feature vectors as graph nodes and connections among nodes as graph edges. Our reconstruction module consists of node embedding construction and edge generation.
3.2 Node Embedding Construction
We construct node embeddings by extracting a feature vector for each patch using a CNN model. CNNs can accurately learn useful information from images, and the performance of CNN-learned features is superior to texture descriptors (Khurd et al., 2010) and human-engineered features (Zhou et al., 2017) in image analysis tasks. In this paper, we train a CNN model as the feature extractor using prostate cancer slides with pure GS (e.g., G3 + G3 and G4 + G4), whose primary and secondary patterns are equal. Figure 5 shows the training process of the feature extractor. We transfer ImageNet features by initializing the CNN model with pretrained weights, which makes it possible to differentiate Gleason patterns G3 and G4.
3.3 Edge Generation
Edges of graphs represent the connections among nodes in feature space. In this paper, we use the distance between node embeddings to represent these correlations. If the distance between two node vectors is larger than a threshold, they are considered to share little similarity and no edge is generated; otherwise, an edge is added. We employ two distance metrics (i.e., Euclidean distance and Mahalanobis distance) to establish the edges of the graphs and evaluate their performance. The details are as follows.
(1) Euclidean distance. It is the most commonly used definition of distance, representing the straight-line distance between two points in n-dimensional Euclidean space. It is widely accepted as a useful distance metric and is defined as Eq. (1), where $X_i$ and $X_j$ are node feature vectors.

$D_{ij}^{E} = \sqrt{(X_i - X_j)^{T}(X_i - X_j)}$   (1)
(2) Mahalanobis distance. It is another metric suitable for calculating the distance between node embeddings. Different from the Euclidean distance, which only computes the straight-line distance, the Mahalanobis distance takes the correlations between attributes into consideration. It is therefore an effective method to calculate the similarity of two points in high-dimensional space. The Mahalanobis distance between node embeddings $X_i$ and $X_j$ is formulated in Eq. (2), where $\Sigma$ is the covariance matrix that captures the relationships among attributes.

$D_{ij}^{M} = \sqrt{(X_i - X_j)^{T}\Sigma^{-1}(X_i - X_j)}$   (2)
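For concreteness, both metrics can be computed directly from the definitions above; the NumPy sketch below assumes (our assumption, not stated in the paper) that the covariance matrix is estimated from the node embeddings of the slide at hand:

```python
import numpy as np

def euclidean(xi, xj):
    """Eq. (1): straight-line distance between two node embeddings."""
    d = xi - xj
    return np.sqrt(d @ d)

def mahalanobis(xi, xj, cov_inv):
    """Eq. (2): distance weighted by the inverse covariance of attributes."""
    d = xi - xj
    return np.sqrt(d @ cov_inv @ d)

# X: (n_patches, feat_dim) matrix of node embeddings from one slide.
X = np.random.randn(100, 32)
cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pseudo-inverse for stability
dist = mahalanobis(X[0], X[1], cov_inv)
```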
By modeling prostate cancer slides as graphs, the heterogeneous GS 7 classification problem is converted into a graph classification task.
3.4 GCN based Model for Gleason Grading
GCN is a deep learning approach for performing feature extraction and classification on graphs, which introduces the convolution operator on the basis of GNN. It plays the role of message passing and updates each node embedding in a flat way following the "neural message passing" framework (Gilmer et al., 2017), formulated as Eq. (3),

$H^{(l)} = M(A, H^{(l-1)}; \theta^{(l)})$   (3)

where $H^{(l)} \in \mathbb{R}^{n \times D}$ denotes the output of layer l (i.e., the node embeddings) and M indicates the message passing function, which computes the node representations depending on the adjacency matrix A and the trainable parameters $\theta^{(l)}$ of each layer. Specifically, $H^{(0)} = X$, the matrix of patch feature vectors.
In the message passing framework, each node representation is computed by iteratively aggregating features of neighboring nodes, and the final node embeddings are generated after several iterations of Eq. (3). In our work, we take GCN as the message passing method, and the iterative process can be expressed as Eq. (4).
$Z^{(l)} = GCN_{l,embed}(A^{(l)}, X^{(l)}) = ReLU(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l-1)} W^{(l-1)})$   (4)
where A represents the adjacency matrix, X the input node embeddings, and W the trainable weights of the GCN model. Since node embeddings alone are not adequate for our graph classification task, differentiable pooling (DIFFPOOL) (Bulten et al., 2019) is introduced to learn the graph representation hierarchically. Notably, it pools the node embeddings (i.e., the output of a GCN layer) into clusters hierarchically and finally encodes the graph into a feature vector as the graph representation.
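A minimal dense implementation of one such GCN propagation step (Eq. (4)) is sketched below; the self-loop and symmetric normalization follow the standard Kipf-style formulation, which we assume here:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: ReLU(D^-1/2 (A+I) D^-1/2 H W), cf. Eq. (4)."""
    A_tilde = A + np.eye(A.shape[0])           # add self-loops
    d = A_tilde.sum(axis=1)                    # node degrees (>= 1 by construction)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # D^-1/2
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # normalized adjacency
    return np.maximum(A_hat @ H @ W, 0.0)      # ReLU activation

# Example: 5 nodes, 32-dim input features, 16-dim output embeddings.
A = (np.random.rand(5, 5) > 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T                 # symmetric, no self-loops
H = np.random.randn(5, 32)
W = np.random.randn(32, 16)
Z = gcn_layer(A, H, W)                         # shape: (5, 16)
```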
The DIFFPOOL module is realized through an assignment matrix $S^{(l)} \in \mathbb{R}^{n_l \times n_{l+1}}$, as described in Eq. (5), where $n_l$ and $n_{l+1}$ are the numbers of nodes (clusters) in layer l and layer l+1 respectively ($n_l > n_{l+1}$). $S^{(l)}$ is used to coarsen the graph step by step and finally obtain the graph representation vector. Each row of $S^{(l)}$ indicates a node (cluster) in layer l, while each column corresponds to a cluster in layer l+1. A softmax is applied to each row to give the probability of a node (cluster) in layer l being assigned to each cluster in layer l+1.

$S^{(l)} = softmax(GCN_{l,pool}(A^{(l)}, X^{(l)}))$   (5)
With the learned matrix $S^{(l)}$, the new embeddings of the clusters in layer l+1 are computed as Eq. (6), and the adjacency matrix of the coarsened graph in layer l+1 is calculated as Eq. (7).

$X^{(l+1)} = S^{(l)T} Z^{(l)}$   (6)

$A^{(l+1)} = S^{(l)T} A^{(l)} S^{(l)}$   (7)
The DIFFPOOL module can be summarized as Eq. (8).

$(A^{(l+1)}, X^{(l+1)}) = POOL(A^{(l)}, Z^{(l)})$   (8)
4 EXPERIMENTS
The organization of this section follows the flow of our experiments. We first introduce the dataset in Sec. 4.1 and data preprocessing in Sec. 4.2.
The implementation details of our method are described in Sec. 4.3. Finally, in Sec. 4.4, we present the results of our method with detailed analysis.
4.1 Dataset
All hematoxylin and eosin (H&E) stained prostate cancer slides and their clinical GS are obtained from an open database, The Cancer Genome Atlas (TCGA) (Weinstein et al., 2013), comprising histopathology slides uploaded by 32 institutions and acquired at 40x magnification. We train our model and evaluate its performance using 406 high quality slides selected from TCGA. Table 1 shows the number of prostate cancer slides under each GS used in the experiments.

Table 1: The number of prostate cancer slides from TCGA under different GS.

GS   | 6 (3+3) | 7 (3+4) | 7 (4+3) | 8 (4+4) | 9 (4+5, 5+4) | 10 (5+5)
#WSI | 43      | 110     | 84      | 47      | 117          | 5
4.2 Data Preprocessing
Since prostate cancer slides with giga-pixel resolution contain around 50% background regions, we downscale the slides by a factor of 32 and threshold the foreground pixels (i.e., tissue areas) using the OTSU algorithm (Otsu, 2007), which is well suited for tissue area segmentation. Some prostate cancer slides may be contaminated by red, blue or green pen marks, so we filter the R, G, B channels respectively with tens of threshold values to create a mask for the tissue area. Morphological operations such as dilation and erosion are conducted to fill in small blanks and remove outliers. Then, the downscaled images are multiplied by their binary masks to generate the tissue area (Fig. 2). Finally, a set of non-overlapping 256x256 patches is extracted from the tissue area; patches containing less than 70% tissue are discarded from the analysis.

Figure 2: Filtered images. The two images on the top are original slides downscaled by a factor of 32. The filtered images are shown at the bottom.
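A simplified sketch of this preprocessing pipeline is given below (OpenCV; for brevity it works at a single resolution and omits the pen-mark filtering, and the morphological kernel size is an assumed value):

```python
import cv2
import numpy as np

def tissue_mask(slide_rgb):
    """Foreground segmentation of a downscaled slide via OTSU thresholding."""
    gray = cv2.cvtColor(slide_rgb, cv2.COLOR_RGB2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)                       # assumed kernel size
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill small blanks
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove outliers
    return mask

def extract_patches(tissue_rgb, mask, size=256, min_tissue=0.7):
    """Non-overlapping patches; discard those with < 70% tissue."""
    patches = []
    h, w = mask.shape
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            if (mask[y:y + size, x:x + size] > 0).mean() >= min_tissue:
                patches.append(tissue_rgb[y:y + size, x:x + size])
    return patches

slide = np.random.randint(0, 256, (1024, 1024, 3), dtype=np.uint8)  # placeholder
mask = tissue_mask(slide)
tissue = cv2.bitwise_and(slide, slide, mask=mask)  # multiply image by binary mask
patches = extract_patches(tissue, mask)
```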
4.3 Implementation Details
The implementation details of our experiments are described as follows.
(1) Parameter setting for training the CNN feature extractor. During training, the batch size is set to 32 and the SGD optimizer is used with an initial learning rate of 1e-3. All CNN models are trained for 20 epochs with a learning-rate warm-up during the first 2 epochs, which further promotes classification accuracy.
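One plausible implementation of such a warm-up (our reading: linearly ramping the learning rate over the first two epochs, then holding it constant) uses a PyTorch LambdaLR schedule; the placeholder model below stands in for the CNN classifier:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 2)          # placeholder for the CNN classifier
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

warmup_epochs, total_epochs = 2, 20

def lr_lambda(epoch):
    # Linearly ramp the learning rate up to 1e-3 over the first 2 epochs,
    # then hold it constant (one plausible reading of the warm-up step).
    return (epoch + 1) / warmup_epochs if epoch < warmup_epochs else 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(total_epochs):
    # ... one pass over the G3/G4 training patches (batch size 32) ...
    scheduler.step()
```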
(2) Parameter setting for the GNN based models. We stack 3 GCN layers followed by 1 pooling layer with an assignment rate of 20%. All GNN based models are trained for 1000 epochs using 10-fold cross-validation. The batch size and initial learning rate during training are set to 20 and 1e-3 respectively.
4.3.1 Patch Selection
In a histopathology slide, the tumor area takes up only a small fraction of the whole image, so automatic Region of Interest (ROI) selection is crucial for Gleason grading. In histopathology slides, tumor areas exhibit active mitosis and more nuclei, and thus appear more blue, while non-tumor areas appear more pink or red (Chang et al., 2011). The Blue Ratio image (BR image) corresponds well to this property and was used in previous research (del Toro et al., 2017) to select relevant patches. The BR value is calculated as Eq. (9), where R, G, B represent the pixel values of the red, green and blue channels respectively.

$BR = \frac{100 \times B}{1 + R + G} \times \frac{256}{1 + B + R + G}$   (9)
We rank the patches extracted from one slide according to their BR scores, calculated by averaging the BR value over every pixel. Since the tumor area usually accounts for only about 10% of the whole tissue area, the top 10% of patches are regarded as cancerous. To further reduce the computation cost, 1000 patches are randomly selected for subsequent processing; for slides with fewer than 1000 cancerous patches, all of them are accepted. Figure 3 shows an example heat map created from the BR score: the number of nuclei in a patch with a high BR score is significantly higher than in a patch with a lower BR score.

Figure 3: Patch selection. The patch with a high BR score appears more blue due to active mitosis, while that with a low BR score appears more pink.
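The per-patch BR score used for this ranking follows directly from Eq. (9); a NumPy sketch (the random patches here are placeholders for the patches produced by the preprocessing step):

```python
import numpy as np

def blue_ratio_score(patch_rgb):
    """Mean Blue Ratio of a patch, per Eq. (9); higher suggests more nuclei."""
    rgb = patch_rgb.astype(np.float64)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    br = (100.0 * B / (1.0 + R + G)) * (256.0 / (1.0 + B + R + G))
    return br.mean()

# `patches`: list of RGB patch arrays (e.g., from the preprocessing sketch).
patches = [np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
           for _ in range(50)]
scores = np.array([blue_ratio_score(p) for p in patches])
top_k = max(1, int(0.1 * len(patches)))          # keep the top 10% by BR score
cancerous = [patches[i] for i in np.argsort(scores)[::-1][:top_k]]
```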
4.3.2 Color Normalization
Color variation is another factor that could damage the accuracy of GS classification (Vahadane et al., 2016). Differences in tissue preparation, H&E stain reactivity, and scanners from different manufacturers all result in color variations across digital histopathology slides. Therefore, color normalization is performed on the selected patches using the color transfer method (Reinhard et al., 2001), which converts patches to a predetermined color template, to alleviate the effect of color variation. Figure 4 shows a comparison of a patch before (left) and after (right) color normalization.

Figure 4: Color normalization. The image on the left is the original patch and the normalized patch is shown on the right.
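The Reinhard method matches the per-channel mean and standard deviation of each patch to those of the template; the sketch below uses OpenCV's Lab space as a stand-in for the l-alpha-beta space of the original method, which is a simplifying assumption:

```python
import cv2
import numpy as np

def reinhard_normalize(patch_rgb, template_rgb):
    """Match Lab-channel statistics of a patch to those of a template patch."""
    src = cv2.cvtColor(patch_rgb, cv2.COLOR_RGB2LAB).astype(np.float64)
    ref = cv2.cvtColor(template_rgb, cv2.COLOR_RGB2LAB).astype(np.float64)
    for c in range(3):
        s_mean, s_std = src[..., c].mean(), src[..., c].std() + 1e-8
        r_mean, r_std = ref[..., c].mean(), ref[..., c].std()
        src[..., c] = (src[..., c] - s_mean) / s_std * r_std + r_mean
    src = np.clip(src, 0, 255).astype(np.uint8)
    return cv2.cvtColor(src, cv2.COLOR_LAB2RGB)

# Placeholder patch and template; in practice the template is fixed in advance.
patch = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
template = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
normalized = reinhard_normalize(patch, template)
```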
4.3.3 CNN Feature Extractor
For graph reconstruction, we train a CNN classifier to extract patch features (Figure 5). To evaluate the performance of different CNN architectures, VGG19 (Simonyan and Zisserman, 2014), GoogleNet (Szegedy et al., 2014), ResNet18, ResNet34, ResNet50 (He et al., 2016) and DenseNet (Huang et al., 2016) are used as backbones for the classification of G3 patches versus G4 patches. Since our interest lies in the classification of G3 + G4 and G4 + G3, we assume that labels of the patches selected from pure slides (e.g., G3 + G3 and G4 + G4) are consistent with G3 and G4.

Table 2: Classification accuracy of different CNN backbones.

Feature extractor backbone | Accuracy
VGG19     | 77.01%
GoogleNet | 77.04%
ResNet18  | 88.27%
ResNet34  | 89.42%
ResNet50  | 85.46%
DenseNet  | 81.04%
Table 2 compares the performance of VGG19 (Simonyan and Zisserman, 2014), GoogleNet (Szegedy et al., 2014), the ResNets (He et al., 2016) and DenseNet (Huang et al., 2016). From Table 2, we can see that the ResNets have a higher capability to learn useful information for the classification task. As the number of CNN layers grows, the accuracy first increases and then starts to decrease, because more trainable parameters lead to overfitting. Therefore, the best performing ResNet34 was chosen as our patch feature extractor.
Figure 5: The training process of feature extractor. We
train the feature extractor using patches extracted from pure
slides (e.g., GS 6(3+3) and GS 8(4+4)).
4.3.4 Graph Reconstruction
We feed the selected patches into ResNet34 to obtain 512-dimensional feature vectors. PCA is then used to compress the vectors to 32 dimensions to reduce the computation cost. In addition, dense graphs significantly increase the computation cost, while overly sparse graphs cannot accurately model the correlations between patches. In order to construct graphs with an appropriate number of edges, the distance threshold is set to 40% of the average distance over all patch pairs in the edge generation module (Eq. (10)).
$d = 0.4 \times \frac{\sum_{i,j \leq n} Dist(x_i, x_j)}{C_n^2}, \quad i \neq j$   (10)

where d is the distance threshold and n denotes the number of patches selected from one prostate cancer slide.
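Combining Eq. (10) with the distance metrics of Sec. 3.3, the adjacency matrix of one slide graph can be assembled as follows (NumPy/SciPy sketch; the 200x32 embedding matrix is a placeholder for the PCA-compressed patch features):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def build_adjacency(X, ratio=0.4, metric="mahalanobis"):
    """Connect node pairs whose distance falls below 40% of the mean pairwise
    distance (Eq. (10)); X is the (n_patches, feat_dim) embedding matrix."""
    if metric == "mahalanobis":
        VI = np.linalg.pinv(np.cov(X, rowvar=False))
        dists = pdist(X, metric="mahalanobis", VI=VI)
    else:
        dists = pdist(X, metric="euclidean")
    d_threshold = ratio * dists.mean()             # Eq. (10)
    A = (squareform(dists) < d_threshold).astype(float)
    np.fill_diagonal(A, 0.0)                       # no self-loops in the raw graph
    return A

X = np.random.randn(200, 32)                       # PCA-compressed embeddings
A = build_adjacency(X)
```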
4.4 Results
In this study, we focus on the classification of heterogeneous GS 7 and propose a GCN based weakly supervised Gleason grading model. We construct the graph edges using the Euclidean and Mahalanobis distance metrics and train models with GCN and GCN+DIFFPOOL as backbones. Our models are trained for 1000 epochs using 10-fold cross-validation; the best accuracy and F1-score of each fold are averaged to obtain the final results. All results are shown in Table 3 and Table 4.

Table 3: Test performance of different models for Gleason grading.

Models | Accuracy | Dataset | F1 score | Classification task
Nagpal et al. | 70.0% | 112 million patches and 1490 slides | - | 4 Gleason groups
Zhou et al. | 75.0% | TCGA | - | G3+G4 vs. G4+G3
GCN + Euclidean | 75.3% | TCGA | 0.720 | G3+G4 vs. G4+G3
GCN + Euclidean + DIFFPOOL | 76.8% | TCGA | 0.741 | G3+G4 vs. G4+G3
GCN + Mahalanobis | 77.9% | TCGA | 0.774 | G3+G4 vs. G4+G3
GCN + Mahalanobis + DIFFPOOL | 79.5% | TCGA | 0.775 | G3+G4 vs. G4+G3

Table 4: Results of classification of low GS (<= GS 7) vs. high GS (>= GS 8).

Models | Accuracy | Dataset | F1 score | #GCN layers
del Toro et al. | 78.2% | TCGA | - | -
GCN + Mahalanobis + DIFFPOOL | 83.4% | TCGA | 0.820 | 3
Table 3 shows the performance of the different combinations together with the results of existing works. All GCN based methods give better results than (Zhou et al., 2017) and (Wang et al., 2018), likely because GCN can accurately capture the relationships among patches and the global topological information, which are of great significance for the Gleason grading task. GCN + Mahalanobis + DIFFPOOL achieves the best performance, with an accuracy of 79.5%. This confirms that the DIFFPOOL module helps to learn meaningful node clusters by pooling similar nodes together and to obtain an accurate graph representation hierarchically. Table 3 also reveals that the distance metric makes a difference in the classification of GS 7: methods with the Mahalanobis metric achieve better results than those with the Euclidean metric, because the Mahalanobis metric leverages the correlations between feature attributes through the covariance matrix.
To further verify the effectiveness of our method, we apply our model to the classification of high GS (GS >= 8) vs. low GS (GS <= 7). The results are shown in Table 4. We train the feature extractor using patches selected from slides with GS 6 (G3 + G3), GS 8 (G4 + G4) and GS 10 (G5 + G5). Since we have only 5 slides graded as GS 10, data augmentation including mirroring, random cropping, rotation, and local warping is conducted. To preserve more information about G5, we use high dimensional node embeddings in the graph construction process and leave out the PCA step. An accuracy of 83.4% is achieved, which is superior to the 78.2% reported by (del Toro et al., 2017). As described in the related work, this is likely because they annotated patches with slide-level labels directly, which can seriously damage classification accuracy.
5 CONCLUSIONS
In this study, we introduce a GCN based model that is capable of automatically grading heterogeneous prostate cancer slides with GS 7. We construct prostate cancer slides as graphs to model the correlations among patches and capture the topological information of the whole slides. By combining a DIFFPOOL layer with GCN layers, our method achieves a classification accuracy of 79.5%, which is superior to the state-of-the-art result on the TCGA dataset. The reported results demonstrate the effectiveness of the proposed method, consistent with our expectations.
ACKNOWLEDGEMENTS
This work was supported in part by the National Key
Research and Development Program of China un-
der Grant 2018YFC0910500, in part by the National
Natural Science Foundation of China under Grant
61906032, in part by the Liaoning Key R&D Program
under Grant 2019JH2/10100030, in part by the Liaon-
ing United Foundation under Grant U1908214, and in
part by the Fundamental Research Funds for the Cen-
tral Universities under Grant DUT20RC(4)005 and
DUT18RC(3)069.
REFERENCES
Vahadane, A., Peng, T., Sethi, A., Albarqouni, S., Wang, L., et al. (2016). Structure-preserving color normalization and sparse stain separation for histological images. IEEE Transactions on Medical Imaging.
Arvaniti, E., Fricker, K. S., Moret, M., Rupp, N. J., Her-
manns, T., Fankhauser, C. D., Wey, N., Wild, P. J.,
Ruschoff, J. H., and Claassen, M. (2018). Automated
gleason grading of prostate cancer tissue microarrays
via deep learning. Scientific Reports, 8(1):12054–
12054.
Bulten, W., Pinckaers, H., Van Boven, H., Vink, R., De Bel,
T., Van Ginneken, B., Jeroen, V. D. L., De Kaa, H. V.,
and Litjens, G. (2019). Automated gleason grading of
prostate biopsies using deep learning.
Chang, H., Loss, L. A., and Parvin, B. (2011). Nuclear segmentation in H&E sections via multi-reference graph cut (MRGC).
del Toro, O. J., Atzori, M., Otálora, S., Andersson, M., Eurén, K., Hedlund, M., Rönnquist, P., and Müller, H. (2017). Convolutional neural networks for an automatic classification of prostate tissue slides with high-grade Gleason score. In Gurcan, M. N. and Tomaszewski, J. E., editors, Medical Imaging 2017: Digital Pathology, volume 10140, pages 165–173. International Society for Optics and Photonics, SPIE.
Epstein, J. I. (2018). Prostate cancer grading: a decade after the 2005 modified system. Modern Pathology: An Official Journal of the United States & Canadian Academy of Pathology Inc, 31:S47.
Epstein, J. I., Egevad, L., Amin, M. B., Delahunt, B.,
Srigley, J. R., and Humphrey, P. A. (2015). The 2014
international society of urological pathology (isup)
consensus conference on gleason grading of prostatic
carcinoma definition of grading patterns and proposal
for a new grading system. The American Journal of
Surgical Pathology, 40(2):244–252.
Epstein, J. I., Zelefsky, M. J., Sjoberg, D. D., Nelson, J. B., Egevad, L., Magi-Galluzzi, C., Vickers, A. J., Parwani, A. V., Reuter, V. E., Fine, S. W., et al. (2016). A contemporary prostate cancer grading system: A validated alternative to the Gleason score. European Urology, 69(3):428–435.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and
Dahl, G. E. (2017). Neural message passing for quan-
tum chemistry.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural Computation, 9(8):1735–1780.
Huang, G., Liu, Z., van der Maaten, L., and Weinberger, K. Q. (2016). Densely connected convolutional networks.
Jian, R., Karagoz, K., Gatza, M., Foran, D. J., and Qi, X. (2018). Differentiation among prostate cancer patients with Gleason score of 7 using histopathology whole-slide image and genomic data. Proceedings of SPIE, the International Society for Optical Engineering.
Kallen, H., Molin, J., Heyden, A., Lundstrom, C., and As-
trom, K. (2016). Towards grading gleason score us-
ing generically trained deep convolutional neural net-
works. pages 1163–1167.
Khurd, P., Bahlmann, C., Maday, P., Kamen, A., Gibbs-Strauss, S. L., Genega, E. M., and Frangioni, J. V. (2010). Computer-aided Gleason grading of prostate cancer histopathological images using texton forests. pages 636–639.
Li, J., Li, W., Gertych, A., Knudsen, B. S., Speier, W., and Arnold, C. W. (2019). An attention-based multi-resolution model for prostate whole slide image classification and localization. arXiv: Computer Vision and Pattern Recognition.
Moch, H., Cubilla, A. L., Humphrey, P. A., Reuter, V. E.,
and Ulbright, T. M. (2016). The 2016 who classifica-
tion of tumours of the urinary system and male genital
organs—part a: Renal, penile, and testicular tumours.
European Urology, 70(1):93–105.
Nagpal, K., Foote, D., Liu, Y., Chen, P.-H. C., Wulczyn, E., Tan, F., Olson, N., Smith, J. L., Mohtashamian, A., et al. (2018). Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. arXiv: Computer Vision and Pattern Recognition.
Otsu, N. (2007). A threshold selection method from gray-
level histograms. IEEE Transactions on Systems Man
& Cybernetics, 9(1):62–66.
Pinckaers, H., Bulten, W., Jeroen, V. D. L., and Litjens, G.
(2020). Detection of prostate cancer in whole-slide
images through end-to-end training with image-level
labels.
Reinhard, E., Ashikhmin, M., Gooch, B., and Shirley, P. (2001). Color transfer between images. IEEE Computer Graphics and Applications.
Ren, J., Hacihaliloglu, I., Singer, E. A., Foran, D. J., and Qi,
X. (2018). Adversarial domain adaptation for classifi-
cation of prostate histopathology whole-slide images.
11071:201–209.
Siegel, R. L., Miller, K. D., and Jemal, A. (2017). Cancer
statistics, 2017. Ca A Cancer Journal for Clinicians,
67(1).
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. Computer Science.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-
novich, A. (2014). Going deeper with convolutions.
Wang, J., Chen, R. J., Lu, M. Y., Baras, A. S., and Mah-
mood, F. (2019). Weakly supervised prostate tma clas-
sification via graph convolutional networks. arXiv:
Computer Vision and Pattern Recognition.
Wang, P., Xiao, X., Glissen Brown, J. R., Berzin, T. M., Tu, M., Xiong, F., Hu, X., Liu, P., Song, Y., and Zhang, D. (2018). Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nature Biomedical Engineering, 2(10):741–748.
Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K.
R. M., Ozenberger, B., Ellrott, K., Sander, C., Stuart,
J. M., Chang, K., Creighton, C. J., et al. (2013). The
cancer genome atlas pan-cancer analysis project. Na-
ture Genetics, 45(10):1113–1120.
Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu,
P. S. (2019). A comprehensive survey on graph neural
networks. IEEE Transactions on Neural Networks and
Learning Systems.
Xu, H., Park, S., and Hwang, T. H. (2018). Automatic
classification of prostate cancer gleason scores from
digitized whole slide tissue biopsies. bioRxiv, page
315648.
Zareba, P., Zhang, J., Yilmaz, A., and Trpkov, K. (2010).
The impact of the 2005 international society of uro-
logical pathology (isup) consensus on gleason grading
in contemporary practice. Histopathology, 55(4):384–
391.
Zhang, Y. H., Zhang, J., Song, Y., Shen, C., and Yang, G.
(2020). Gleason score prediction using deep learning
in tissue microarray image. arXiv e-prints.
Zhou, N., Fedorov, A., Fennessy, F. M., Kikinis, R., and
Gao, Y. (2017). Large scale digital prostate pathology
image analysis combining feature extraction and deep
neural network. arXiv: Computer Vision and Pattern
Recognition.