Shape-based Features Investigation for Preneoplastic Lesions on Cervical Cancer Diagnosis

Daniela C. Terra (1,4; a), Adriano C. Lisboa (2; b), Mariana T. Rezende (3; c), Claudia M. Carneiro (3; d) and Andrea G. C. Bianchi (4; e)

(1) Department of Computing, Federal Institute of Minas Gerais, Ouro Branco, MG, Brazil

(2) Research Department, GAIA, Belo Horizonte, MG, Brazil

(3) Clinical Analysis Department, Federal University of Ouro Preto, Ouro Preto, MG, Brazil

(4) Department of Computing, Federal University of Ouro Preto, Ouro Preto, MG, Brazil

Keywords:

Cervical Cancer, Image Classiﬁcation, Morphological Features, Features Selection, XGBoost Classiﬁer.

Abstract:

The diagnosis of cervical lesions is an interpretative process carried out by specialists based on cellular information from the nucleus and cytoplasm. Some authors have used cell nucleus detection and segmentation algorithms to support computer-assisted diagnosis. These approaches assume that the nucleus contains the most important information for lesion detection. This work investigates the influence of morphological information from the nucleus, the cytoplasm, and both on cervical cell diagnosis. Experiments analyzed 3,233 real cells, extracting from each one 200 attributes related to size, shape, and edge contours. Results showed that morphological attributes can accurately represent lesions in binary and ternary classifications. However, identifying specific cell anomalies, such as the Bethesda System classes, requires additional attributes such as texture.

1 INTRODUCTION

Cervical cancer is the fourth most common cancer in women after breast, colorectal, and lung cancer. In 2018, about 570,000 women were diagnosed with the disease, and 311,000 of them died from it worldwide (Das, 2021). This occurs even though precursor lesions progress slowly to the cancer stage. Thus, curing malignant cases depends on timely diagnosis or screening for pre-neoplastic lesions. If lesions are detected early, the prognosis can be substantially improved with effective treatment (Williams, 2021).

A Pap smear is a cost-effective technique widely used to prevent cervical cancer. Under the microscope, professionals identify suspicious cell structures following internationally adopted diagnosis protocols such as the Bethesda System (Nayar and Wilbur, 2015). The main disadvantage of such manual analysis is the high rate of false negatives. Screening and diagnosis are subject to misinterpretation caused by visual habituation, and they depend on expertise.

(a) https://orcid.org/0000-0002-2828-8275

(b) https://orcid.org/0000-0001-5773-2200

(c) https://orcid.org/0000-0002-9514-9312

(d) https://orcid.org/0000-0002-6002-857X

(e) https://orcid.org/0000-0001-7949-1188

Computer-aided diagnostics can reduce errors and

increase productivity in cancer screening. Propos-

als for automated cytology include solutions to detect

(Diniz et al., 2021c; Li et al., 2021), segment (Umadi

et al., 2020; Teixeira et al., 2022; Zhao et al., 2022),

and automate the screening of cell lesions.

Automatic cervical lesion classification follows cell detection or segmentation. The solutions often extract features related to cell size and shape, such as area, perimeter, elongation (major/minor axes), circularity, and nucleus-cytoplasm ratios (Jantzen et al., 2005; Marinakis et al., 2009; Chankong et al., 2014; Dong et al., 2020; Yakkundimath et al., 2022). Other works use measures such as the fractal dimension (Bhowmik et al., 2018), the relative position of the nucleus within the cytoplasm (Mariarputham and Stephen, 2015), the roughness index, the standard deviation of the radial distance, and Fourier descriptors (Zhang et al., 2014). Diniz et al. (2021b) use the CRIC base (Rezende et al., 2021) and traditional machine learning techniques to classify pre-neoplastic lesions using cell nuclei shape and texture features. Deep learning approaches enable classification without segmentation (Dong et al., 2020; Rahaman et al., 2021). Diniz et al. (2021a) achieved a high recall rate in detecting cellular lesions using an ensemble of deep neural networks tested with CRIC dataset images.

Terra, D., Lisboa, A., Rezende, M., Carneiro, C. and Bianchi, A.
Shape-based Features Investigation for Preneoplastic Lesions on Cervical Cancer Diagnosis.
DOI: 10.5220/0011900800003417
In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 4: VISAPP, pages 506-513
ISBN: 978-989-758-634-7; ISSN: 2184-4321
© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

This work proposes an investigation of the inﬂu-

ence of morphological cell attributes during the clas-

siﬁcation of cervical cell lesions. The extracted fea-

tures are related to the size, shape, and edge con-

tours calculated for the nucleus, the cytoplasm, and

both components. Our solution is based on tradi-

tional machine learning (ML) techniques to classify

cervical cells with or without (pre)neoplastic lesions.

We evaluated a binary classiﬁer (normal/abnormal), a

ternary classiﬁer (normal cell/low-grade lesions/high-

grade lesions), and an identiﬁer for the 6 (six) classes

of the Bethesda System for cytological diagnosis.

To the best of our knowledge, this is the ﬁrst ap-

proach to investigate the adequacy or limitation of

these attributes in automated diagnosis. The main

contributions of our investigation are:

• Exploring the potential of shape-based features to discriminate cervical cell lesions;

• Verifying the effectiveness of Elliptic Fourier Descriptors (EFD) in this classification process;

• Evaluating the proposed solution on real images of conventional cytology;

• Analyzing the results of the shape-based classification at the level of whole cells (nucleus and cytoplasm together), nuclei only, and cytoplasm only.

The next section discusses our proposal in detail.

Section 3 presents experiments and results. Finally,

Section 4 reviews the proposed solution’s results.

2 METHODOLOGY

This section presents the materials and methods con-

sidered. Section 2.1 presents the database used in ex-

periments. Section 2.2 describes the extracted fea-

tures and the feature selection procedure. Section 2.3

explains the computational model built for the exper-

iments.

2.1 Dataset

In this work, we use the CRIC Cervix-Seg database of conventional cytology (Rezende et al., 2021). The database contains 3,233 cells with segmented nuclei and cytoplasm from 400 real Pap smear images.

Figure 1: CRIC Cervix-Seg example for nuclei and cyto-

plasm segmentation.

Classiﬁcation and segmentation of cells were per-

formed according to Bethesda nomenclature and car-

ried out manually by experienced cytopathologists

from the Center for Recognition and Inspection of

Cells (CRIC) (see Figure 1).

The Cervix-Seg collection includes six (6) classes:

(a) negative for intraepithelial lesion or malignancy

(NILM); (b) atypical squamous cells of undetermined

signiﬁcance, possibly non-neoplastic (ASC-US); (c)

low-grade squamous intraepithelial lesion (LSIL); (d)

atypical squamous cells which cannot exclude high-

grade lesions (ASC-H); (e) high-grade squamous in-

traepithelial lesion (HSIL); and (f) squamous cell car-

cinoma (SCC).

Table 1 presents the classiﬁcation groups consid-

ered for computational experiments. Our model was

built to label cells considering the binary classiﬁca-

tion (normal and abnormal), the ternary classiﬁcation

(normal cells, low-grade lesions, and high-grade le-

sions), and the classiﬁcation based on the Bethesda

nomenclature (6 classes).

Table 1: Three classification categories with the number of class samples.

Binary    Ternary     Bethesda  Nº of samples
Normal    Normal      NILM      862
Abnormal  Low grade   ASC-US    286
Abnormal  Low grade   LSIL      536
Abnormal  High grade  ASC-H     598
Abnormal  High grade  HSIL      874
Abnormal  High grade  SCC       77
Total:                          3,233

For the binary categorization, the abnormal cells

comprise all Bethesda labels except NILM (nor-

mal). Another possible classiﬁcation is used to group

Bethesda categories into 3 (three) classes: normal


cells (NILM), low-grade lesions (ASC-US and LSIL),

and high-grade lesions (ASC-H, HSIL, and SCC).

The low- and high-grade groupings matter because they follow different treatment protocols. For low-grade lesions, follow-up requires a repeat screening. For high-grade lesions, patients should undergo colposcopy and/or biopsy (Sung et al., 2021).

2.2 Shape-based Features

We used 200 features for each cell. The same 98 mea-

surements applied to the nucleus were calculated for

the cytoplasm. The remaining 4 comprise the two cel-

lular components. Features are related to the size,

shape, and edge contour of the nucleus (N) and the

cytoplasm (C), and some ratios between N and C:

• Size: area, bounding box, convex hull, perimeter,

equivalent diameter (circumference), minor and

major axis;

• Shape: circularity, compacity, eccentricity, con-

vexity, solidity, elongation, fractal dimension;

• Contour: roughness index, entropy, kurtosis, and

other statistics of normalized radial distance (from

the centroid to edge points). Also, the ﬁrst 20 co-

efﬁcients of the elliptic Fourier series (Kuhl and

Giardina, 1982);

• N/C relations: nucleus relative position (within

the cell), nucleus to cytoplasm ratios for the area,

perimeter, bounding box, and convex hull.
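As an illustration of the size, shape, and N/C features listed above, the following minimal sketch computes a few of them directly from contour points. It is not the authors' code: the function names, the toy circular nucleus and elliptical cytoplasm, and the circularity definition 4πA/P² are assumptions made for this example.

```python
import numpy as np

def polygon_area(contour):
    """Shoelace formula for the area enclosed by an (N, 2) array of (x, y) points."""
    x, y = contour[:, 0], contour[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def polygon_perimeter(contour):
    """Sum of segment lengths, including the segment that closes the contour."""
    closed = np.vstack([contour, contour[:1]])
    return float(np.sum(np.linalg.norm(np.diff(closed, axis=0), axis=1)))

def shape_features(contour):
    """A few size/shape descriptors (definitions assumed, see lead-in)."""
    area = polygon_area(contour)
    perimeter = polygon_perimeter(contour)
    return {
        "area": area,
        "perimeter": perimeter,
        "equivalent_diameter": np.sqrt(4.0 * area / np.pi),
        "circularity": 4.0 * np.pi * area / perimeter ** 2,  # 1.0 for a circle
    }

def nc_area_ratio(nucleus_contour, cytoplasm_contour):
    """Nucleus-to-cytoplasm area ratio."""
    return polygon_area(nucleus_contour) / polygon_area(cytoplasm_contour)

# Toy contours (hypothetical): circular nucleus inside an elliptical cytoplasm.
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
nucleus = np.column_stack([10 * np.cos(theta), 10 * np.sin(theta)])
cytoplasm = np.column_stack([40 * np.cos(theta), 25 * np.sin(theta)])

nucleus_features = shape_features(nucleus)
ratio = nc_area_ratio(nucleus, cytoplasm)
```

Working on contour points rather than masks keeps the sketch dependency-free; region-based measurements on masks are an equally valid route.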

The Box Counting method was used to calculate

the fractal dimension (FD) for the cell components

(N and C) (Konatar et al., 2020). As known, the more

irregular the regions, the higher the FD value.
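The box-counting idea can be sketched as follows: count the boxes of side s that contain foreground pixels, then fit the slope of log N(s) against log(1/s). This is a generic illustration, not the implementation of Konatar et al. (2020); the box sizes and the choice of counting on the filled binary mask are assumptions.

```python
import numpy as np

def box_count(mask, box_size):
    """Number of box_size x box_size boxes containing at least one foreground pixel."""
    h, w = mask.shape
    # Pad with background so both dimensions are multiples of box_size.
    padded = np.pad(mask, ((0, (-h) % box_size), (0, (-w) % box_size)), mode="constant")
    H, W = padded.shape
    blocks = padded.reshape(H // box_size, box_size, W // box_size, box_size)
    return int(blocks.any(axis=(1, 3)).sum())

def fractal_dimension(mask, box_sizes=(1, 2, 4, 8, 16, 32)):
    """Slope of log N(s) versus log(1/s), where N(s) is the box count at size s."""
    mask = np.asarray(mask, dtype=bool)
    counts = np.array([box_count(mask, s) for s in box_sizes], dtype=float)
    slope, _ = np.polyfit(np.log(1.0 / np.array(box_sizes)), np.log(counts), 1)
    return float(slope)

# Sanity checks on shapes with known dimension (toy data).
filled = np.ones((64, 64), dtype=bool)   # a filled square has dimension 2
line = np.zeros((64, 64), dtype=bool)
line[32, :] = True                        # a straight line has dimension 1

fd_filled = fractal_dimension(filled)
fd_line = fractal_dimension(line)
```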

Elliptic Fourier coefficients also capture edge contour irregularities, expressed in the frequency domain. The EFD method is based on the chain code (8-connectivity) extracted from the contour points of a region (Kuhl and Giardina, 1982). We use the first 20 EFD coefficients for later feature selection.
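For readers unfamiliar with EFD, the sketch below computes the coefficients a_n, b_n, c_n, d_n of Kuhl and Giardina (1982) for a closed contour using only NumPy. The experiments reported here used the pyEFD package instead; the function name and the choice `order=10` (which gives 40 values, matching the count mentioned in Section 3.1) are assumptions of this example.

```python
import numpy as np

def elliptic_fourier_coefficients(contour, order=10):
    """Return an (order, 4) array of [a_n, b_n, c_n, d_n] for a closed contour."""
    contour = np.asarray(contour, dtype=float)
    if not np.allclose(contour[0], contour[-1]):
        contour = np.vstack([contour, contour[:1]])  # close the contour
    dxy = np.diff(contour, axis=0)
    dt = np.hypot(dxy[:, 0], dxy[:, 1])
    keep = dt > 0                                    # drop repeated points
    dxy, dt = dxy[keep], dt[keep]
    t = np.concatenate([[0.0], np.cumsum(dt)])       # arc length at each vertex
    T = t[-1]                                        # total contour length
    phi = 2.0 * np.pi * t / T
    coeffs = np.zeros((order, 4))
    for n in range(1, order + 1):
        const = T / (2.0 * n ** 2 * np.pi ** 2)
        d_cos = np.cos(n * phi[1:]) - np.cos(n * phi[:-1])
        d_sin = np.sin(n * phi[1:]) - np.sin(n * phi[:-1])
        coeffs[n - 1] = [
            const * np.sum(dxy[:, 0] / dt * d_cos),  # a_n (x, cosine term)
            const * np.sum(dxy[:, 0] / dt * d_sin),  # b_n (x, sine term)
            const * np.sum(dxy[:, 1] / dt * d_cos),  # c_n (y, cosine term)
            const * np.sum(dxy[:, 1] / dt * d_sin),  # d_n (y, sine term)
        ]
    return coeffs

# Toy contour: a circle of radius 5 is fully described by its first harmonic.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle = np.column_stack([5 * np.cos(theta), 5 * np.sin(theta)])
efd = elliptic_fourier_coefficients(circle, order=10)
```

For the circle, the first harmonic is close to [5, 0, 0, 5] and all higher harmonics vanish; irregular contours spread energy into higher harmonics.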

Roughness index and standard deviation of radial

distance were used for cervical cells by Zhang et al.

(2014). We calculated these features as described by

Po-Hsiang Tsui et al. (2010), as they are commonly

used in breast tumor detection.
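A minimal sketch of contour statistics based on the normalized radial distance is given below. The roughness index is taken here as the mean absolute difference between consecutive normalized radial distances, which is one common definition; the exact normalization in the cited works may differ, and the centroid is approximated by the mean of the contour points. Both choices are assumptions of this example.

```python
import numpy as np

def radial_distance_features(contour):
    """Statistics of the normalized radial distance from the centroid to edge points."""
    contour = np.asarray(contour, dtype=float)
    centroid = contour.mean(axis=0)          # assumption: mean of contour points
    d = np.linalg.norm(contour - centroid, axis=1)
    d_norm = d / d.max()                     # normalize to (0, 1]
    # Assumed definition: mean |d(i) - d(i+1)| over the closed contour.
    roughness = np.mean(np.abs(d_norm - np.roll(d_norm, -1)))
    return {
        "mean": float(d_norm.mean()),
        "std": float(d_norm.std()),
        "roughness_index": float(roughness),
    }

# Toy contours: a smooth circle versus a spiky star (hypothetical shapes).
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
circle = np.column_stack([10 * np.cos(theta), 10 * np.sin(theta)])
radii = np.where(np.arange(40) % 2 == 0, 10.0, 5.0)
star = np.column_stack([radii * np.cos(theta), radii * np.sin(theta)])

circle_stats = radial_distance_features(circle)
star_stats = radial_distance_features(star)
```

The smooth circle yields a roughness index near zero, while the alternating radii of the star yield 0.5.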

2.2.1 Feature Selection

To improve performance and gain some intuition about the interpretability of the model, we select the most relevant features for each estimator shown in Figure 2. At most 30 attributes were chosen from the 200 using two methods: mutual information (MI) and simultaneous perturbation stochastic approximation for feature selection and ranking (SPFSR) (Akman et al., 2023).

MI is a filter method based on a statistical measure related to the joint entropy of the variables. SPFSR is based on simultaneous perturbation stochastic approximation. As a wrapper-based method, SPFSR can be used with any classifier or regressor to optimize a suitable performance metric. It is a multivariate approach that considers the interactions between features, so that redundant features receive lower scores.
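To make the filter idea concrete, the sketch below ranks features by a simple histogram estimate of the mutual information between each feature and the class label, then keeps the top k. This is a stand-in estimator written for illustration, not the one used in the experiments; the bin count, function names, and synthetic data are assumptions.

```python
import numpy as np

def mutual_information(feature, labels, bins=10):
    """Histogram estimate of I(feature; label) in nats."""
    edges = np.histogram_bin_edges(feature, bins=bins)
    f_bin = np.digitize(feature, edges[1:-1])        # bin index in 0..bins-1
    classes, y = np.unique(labels, return_inverse=True)
    joint = np.zeros((bins, classes.size))
    np.add.at(joint, (f_bin, y), 1.0)                # joint counts
    joint /= joint.sum()                             # joint probabilities
    pf = joint.sum(axis=1, keepdims=True)            # marginal over feature bins
    py = joint.sum(axis=0, keepdims=True)            # marginal over classes
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pf @ py)[nz])))

def select_top_k(X, y, k=30, bins=10):
    """Indices of the k features with the highest MI score, best first."""
    scores = np.array([mutual_information(X[:, j], y, bins) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k], scores

# Synthetic data (hypothetical): only feature 2 depends on the label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 5))
X[:, 2] += 3.0 * y

selected, scores = select_top_k(X, y, k=2)
```

Unlike SPFSR, this univariate ranking ignores interactions between features, so two redundant features can both receive high scores.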

2.3 Computational Model

Figure 2 presents the proposed model. The procedure

starts from the two .csv ﬁles. One ﬁle contains la-

bels for each cell/image along with the set of contour

points (x, y) of the nucleus (Figure 2, f1). The other

ﬁle is analogous to that of the nucleus for the cell’s

cytoplasm (Figure 2, f2). These points are the contour

of the manual segmentation made by cytopathologists

and used here to reconstruct the cells’ masks for the

nucleus and cytoplasm. From masks, all described

features are calculated (Figure 2, fx).
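Reconstructing a binary mask from contour points can be sketched with an even-odd (ray casting) test applied to every pixel. The function name, the (x, y) column order, and the convention that pixels sit at integer coordinates are assumptions of this example; library routines for polygon rasterization are an alternative.

```python
import numpy as np

def polygon_mask(contour, shape):
    """Boolean mask of pixels inside a closed polygon given as (x, y) points."""
    contour = np.asarray(contour, dtype=float)
    ys, xs = np.mgrid[: shape[0], : shape[1]]        # pixel row (y) and column (x)
    inside = np.zeros(shape, dtype=bool)
    nxt = np.roll(contour, -1, axis=0)               # end point of each edge
    for (xa, ya), (xb, yb) in zip(contour, nxt):
        if ya == yb:
            continue                                 # horizontal edges never cross the ray
        # Edge spans the pixel's row and lies to the right of the pixel.
        crosses = ((ya > ys) != (yb > ys)) & (xs < (xb - xa) * (ys - ya) / (yb - ya) + xa)
        inside ^= crosses                            # toggle on every crossing
    return inside

# Toy contour (hypothetical): an axis-aligned square with corners (2, 2) and (7, 7).
square = np.array([[2, 2], [7, 2], [7, 7], [2, 7]])
mask = polygon_mask(square, shape=(10, 10))
```

For the toy square the mask contains 25 pixels, which matches the polygon area of 5 x 5.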

Before classiﬁcation, a selection of the most im-

portant features was considered for each model esti-

mator (Figure 2, fs). The ﬁltered set of features is used

as input to the model.

Cells were classified according to Table 1 in two ways: with independent classifiers for 2, 3, and 6 classes (Figure 2, estimators a1, a2, and a3, respectively) and with a hierarchical classifier (Figure 2, estimators b1, b2, b3.1, and b3.2). Our solution implements hierarchical classification as proposed by Diniz et al. (2021b).

A hierarchical categorization operates in levels.

The ﬁrst level deﬁnes a binary classiﬁer to distinguish

normal and abnormal cells (Figure 2, b1). Cells iden-

tiﬁed as abnormal are reclassiﬁed as low- or high-

grade lesions at the second level (Figure 2, b2). Fi-

nally, two third-level classiﬁers must identify speciﬁc

lesions according to Bethesda nomenclature (Figure

2, b3.1 and b3.2). In b3.1, low-grade lesions will be

differentiated as ASC-US or LSIL. In estimator b3.2,

cells with more severe anomalies will be categorized

into ASC-H, HSIL, or SCC.
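The routing between levels can be sketched as below. Any estimator exposing a `predict` method would fit; here each level is replaced by a hypothetical rule-based stub on a single made-up feature so that the example runs on its own. The label strings ("abnormal", "low", "high") and thresholds are assumptions, not values from the experiments.

```python
import numpy as np

class RuleStub:
    """Hypothetical stand-in for a trained estimator: applies a rule per sample."""
    def __init__(self, rule):
        self.rule = rule

    def predict(self, X):
        return np.array([self.rule(x) for x in X], dtype=object)

def hierarchical_predict(X, b1, b2, b31, b32):
    """Route samples through b1 -> b2 -> (b3.1 | b3.2) and return Bethesda labels."""
    X = np.asarray(X)
    out = np.empty(len(X), dtype=object)
    out[:] = "NILM"                                   # level 1: normal by default
    abnormal = np.flatnonzero(b1.predict(X) == "abnormal")
    if abnormal.size:
        grade = b2.predict(X[abnormal])               # level 2: low or high grade
        low, high = abnormal[grade == "low"], abnormal[grade == "high"]
        if low.size:
            out[low] = b31.predict(X[low])            # level 3: ASC-US or LSIL
        if high.size:
            out[high] = b32.predict(X[high])          # level 3: ASC-H, HSIL, or SCC
    return out

# Stub estimators with arbitrary thresholds on one toy feature.
b1 = RuleStub(lambda x: "abnormal" if x[0] > 0.2 else "normal")
b2 = RuleStub(lambda x: "high" if x[0] > 0.5 else "low")
b31 = RuleStub(lambda x: "LSIL" if x[0] > 0.35 else "ASC-US")
b32 = RuleStub(lambda x: "SCC" if x[0] > 0.9 else ("HSIL" if x[0] > 0.7 else "ASC-H"))

X_toy = np.array([[0.1], [0.3], [0.4], [0.6], [0.8], [0.95]])
pred = hierarchical_predict(X_toy, b1, b2, b31, b32)
```

One consequence of this design is that an error at an upper level cannot be corrected below it: a cell labeled normal by b1 never reaches the lesion-specific estimators.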

The class predictions of both approaches are then assessed with evaluation metrics (Figure 2, ev). We employ metrics commonly used for classification, as mentioned by Jiang et al. (2022): accuracy (Acc), precision (P), recall (R), specificity (Spec), and F1-score (F1).
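For the binary case, these metrics follow directly from the confusion-matrix counts, as in the minimal sketch below. The counts are made up for illustration and are not results from the experiments.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, specificity, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # sensitivity: abnormal cells found
    return {
        "Acc": (tp + tn) / (tp + fp + tn + fn),
        "P": precision,
        "R": recall,
        "Spec": tn / (tn + fp),          # normal cells correctly kept as normal
        "F1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts: 100 abnormal cells (90 detected) and 50 normal cells (45 kept).
metrics = binary_metrics(tp=90, fp=5, tn=45, fn=10)
```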


Figure 2: Shape-based diagnostic solution: CRIC-Seg ﬁles (f1, f2), attribute extraction (fx) and selection (fs), standard

classiﬁcation (estimators a.1, a.2 and a.3) and hierarchical classiﬁcation (b1, b2, b3.1 and b3.2 estimators).

2.3.1 Learning Algorithms

Each classiﬁer component of Figure 2 is a traditional

machine learning (ML) algorithm: Support Vector

Machine (SVM), Random Forest (RF), and eXtreme

Gradient Boosting (XGBoost).

An SVM builds its decision function from a subset of the training points (called support vectors). These points define the maximum-margin boundaries that separate the classes in an n-dimensional space (Geron, 2022).

RF is a bagging method based on decision trees (DT). It introduces randomness by selecting subsamples and feature subsets from the data to build the trees, which reduces the variance of the error (Geron, 2022).

XGBoost is another ensemble model, based on gradient boosting. Predictions are adjusted sequentially after each weak estimator (for example, a shallow DT). The method offers improved performance, control of overfitting, and other flexibilities (Chen and Guestrin, 2016).

2.3.2 Oversampling

While the random over-sampler technique dupli-

cates some of the original samples, other techniques

build ’synthetic’ samples based on original examples

(Chawla et al., 2004). The Synthetic Minority Over-

sampling Technique (SMOTE) and the Borderline-

SMOTE are some of these methods (Chawla et al.,

2002; Han et al., 2005). They operate in the feature

space rather than at the data level (i.e., the image).

In SMOTE, oversampling takes an original sample from the minority class and creates a new sample by interpolating between it and one of its k nearest neighbors. Borderline-SMOTE differs from the original SMOTE in that it restricts which minority samples are used: only those at the borderline between the minority class and the majority classes are selected.
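The interpolation step of SMOTE can be sketched in a few lines of NumPy. This is an illustrative simplification (brute-force neighbor search, no borderline filtering), not the imbalanced-learn implementation used in the experiments; the function name and toy data are assumptions.

```python
import numpy as np

def smote_samples(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by neighbor interpolation."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise distances within the minority class (brute force).
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                    # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]       # k nearest neighbors per sample
    base = rng.integers(0, len(X_min), size=n_new)    # original samples to start from
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))                      # interpolation factor in [0, 1)
    return X_min[base] + lam * (X_min[picked] - X_min[base])

# Toy minority class (hypothetical): 20 samples with 3 features.
X_min = np.random.default_rng(0).normal(size=(20, 3))
synthetic = smote_samples(X_min, n_new=50, k=5, rng=0)
```

Because each synthetic point lies on a segment between two real minority samples, it stays within the region already occupied by the minority class in feature space.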

3 RESULTS AND DISCUSSIONS

Experiments were written in Python (version 3.9.1) using well-known libraries for ML and data manipulation/visualization, such as scikit-learn, SciPy, and scikit-image. Other modules were employed for specific tasks. For the data augmentation techniques (SMOTE and Borderline-SMOTE), we applied the imbalanced-learn module (https://imbalanced-learn.org/stable/install.html). The pyEFD package was used to calculate the Elliptic Fourier coefficients (https://pyefd.readthedocs.io/en/latest/). The implementation of the SPFSR method used is available at https://github.com/akmand/spFSR. Code for the experiments described here is available at https://github.com/danielaterra/shape-based-CervicalCellsClassifier.

In all model executions, we used 10-fold cross-validation with the data augmentation techniques defined in Section 2.3.2. Data augmentation was applied to the training data within each fold so that every class had as many instances as the majority class. SMOTE gave slightly better results than Borderline-SMOTE, so the evaluation metrics presented below are from experiments using SMOTE.
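The key point of this protocol is that oversampling touches only the training part of each fold, while the held-out part keeps its original class distribution. The sketch below illustrates this with a hand-written stratified split and, for brevity, plain random duplication as a stand-in for SMOTE; function names, seeds, and the toy class sizes are assumptions.

```python
import numpy as np

def stratified_folds(y, n_splits=10, seed=0):
    """Assign a fold index to every sample, spreading each class evenly."""
    rng = np.random.default_rng(seed)
    fold = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        fold[idx] = np.arange(len(idx)) % n_splits
    return fold

def oversample_to_majority(X, y, seed=0):
    """Duplicate minority samples at random until all classes match the majority."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    keep = [np.arange(len(y))]
    for c, n in zip(classes, counts):
        if n < counts.max():
            keep.append(rng.choice(np.flatnonzero(y == c), size=counts.max() - n))
    idx = np.concatenate(keep)
    return X[idx], y[idx]

# Toy imbalanced data (hypothetical): 80 samples of class 0 and 20 of class 1.
y = np.array([0] * 80 + [1] * 20)
X = np.random.default_rng(1).normal(size=(100, 4))
fold = stratified_folds(y, n_splits=10)

train_sizes, test_sizes = [], []
for k in range(10):
    test = fold == k
    X_tr, y_tr = oversample_to_majority(X[~test], y[~test], seed=k)
    # A classifier would be fit on (X_tr, y_tr) and scored on (X[test], y[test]).
    train_sizes.append(np.bincount(y_tr))
    test_sizes.append(np.bincount(y[test]))
```

Oversampling before splitting would leak copies (or interpolations) of test samples into the training set and inflate the reported metrics.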

3.1 Features Selection Procedure

Calling the pyEFD method to retrieve the first 20 EFD coefficients yields 40 values: 20 for the x variations and 20 for the y variations. All EFD coefficients, plus the other features described in Section 2.2, were calculated from each mask of the nucleus and cytoplasm (see Figure 2). At most 30 features were selected from the total: 200 (when considering the nucleus and cytoplasm) or 98 for experiments applied to only one of these cellular structures.

As mentioned in Section 2.2.1, two feature selection (FS) techniques were used: MI and SPFSR.

Table 2: Feature selection methods: binary results using MI and SPFSR.

Method  Model  Acc   P      R      Spec  F1
MI      SVM    0.94  0.969  0.954  0.91  0.961
MI      RF     0.94  0.969  0.964  0.91  0.966
MI      XGB    0.95  0.971  0.968  0.91  0.969
SPFSR   SVM    0.94  0.969  0.950  0.91  0.959
SPFSR   RF     0.94  0.968  0.963  0.90  0.965
SPFSR   XGB    0.94  0.967  0.961  0.90  0.964

Figure 3 presents the SPFSR features selected for the normal/abnormal classification. The scores suggest the most discriminatory attributes: 1) the area of the convex hull and the edge entropy of the nuclei; 2) the compacity and area of the cytoplasm (the larger, the less circular or more irregular); and 3) the cells' nucleus-to-cytoplasm ratio (N/C).

Table 2 shows the classification metrics of the binary prediction using both methods. As the values were similar, only the best prediction is shown hereafter.

3.2 Experiments

We performed 3 (three) tests with the proposed solutions, each using a different set of features: 1) shape features selected from the nucleus, the cytoplasm, and both (N/C); 2) nucleus shape features (N); and 3) cytoplasm shape features (C). The first tests perform the classification using the standard solution.

3.2.1 Standard Classiﬁcation

Table 3 presents evaluation metrics for the nor-

mal/abnormal classiﬁcation using attributes for the

cell (N/C), the nucleus (N), and the cytoplasm (C).

Results for detections of normal cells, low-grade or

high-grade lesions are shown in Table 4. Table 5

presents the results for the 6 (six) classes detection.

Table 3: Binary classification: from cells, nuclei, and cytoplasms.

Features  Model  Acc    P      R      Spec   F1
N/C       SVM    0.942  0.971  0.951  0.917  0.960
N/C       RF     0.952  0.970  0.966  0.912  0.967
N/C       XGB    0.954  0.971  0.968  0.917  0.969
N         SVM    0.877  0.952  0.880  0.870  0.912
N         RF     0.874  0.949  0.878  0.863  0.910
N         XGB    0.884  0.937  0.906  0.822  0.919
C         SVM    0.901  0.970  0.894  0.922  0.927
C         RF     0.835  0.939  0.830  0.849  0.870
C         XGB    0.884  0.944  0.897  0.847  0.916

Table 4: Ternary classification: from cells, nuclei, and cytoplasms.

Features  Model  Acc    P      R      Spec   F1
N/C       SVM    0.936  0.939  0.936  0.968  0.935
N/C       RF     0.938  0.941  0.938  0.968  0.936
N/C       XGB    0.941  0.943  0.941  0.970  0.939
N         SVM    0.695  0.708  0.695  0.843  0.689
N         RF     0.713  0.724  0.713  0.852  0.709
N         XGB    0.711  0.720  0.711  0.848  0.707
C         SVM    0.915  0.917  0.915  0.957  0.913
C         RF     0.846  0.849  0.846  0.926  0.844
C         XGB    0.884  0.888  0.884  0.943  0.882

Table 5: Bethesda classification: cells, nuclei, and cytoplasms.

Features  Model  Acc    P      R       Spec   F1
N/C       SVM    0.632  0.693  0.632   0.927  0.640
N/C       RF     0.658  0.700  0.658   0.931  0.663
N/C       XGB    0.682  0.698  0.682   0.935  0.682
N         SVM    0.442  0.482  0.442   0.888  0.437
N         RF     0.475  0.493  0.475   0.892  0.469
N         XGB    0.490  0.489  0.490   0.893  0.479
C         SVM    0.620  0.664  0.6202  0.924  0.623
C         RF     0.557  0.601  0.5577  0.910  0.563
C         XGB    0.619  0.634  0.6190  0.921  0.617

3.2.2 Hierarchical Classiﬁcation

As observed in Table 5, the model proved inadequate for a Bethesda diagnosis. To improve and analyze the results, we implemented the hierarchical solution described in Section 2.3. Tables 6-8 exhibit the results for 2, 3, and 6 classes, respectively. (Small differences between the results of Tables 3 and 6 are due to the stratified 10-fold splits being built from binary and Bethesda labels, respectively.)

Experiments confirmed that the hierarchical solution did not resolve the class confusion, so specific lesions are still not adequately predicted according to the Bethesda System. Figure 4 shows the results of the hierarchical classifier in a confusion matrix. Note that most errors occurred within the low-grade (ASC-US/LSIL) and high-grade (ASC-H/HSIL) lesion categories. Likewise, carcinoma diagnoses were most frequently confused with ASC-H and HSIL.


Figure 3: SPFSR relative scores: features selected to normal/abnormal classiﬁer (Figure 2, a.1).

Table 6: Binary (hierarchical) classification: from cells, nuclei, and cytoplasms.

Features  Model  Acc     P      R      Spec   F1
N/C       SVM    0.944   0.969  0.955  0.916  0.962
N/C       RF     0.961   0.968  0.978  0.913  0.973
N/C       XGB    0.960   0.969  0.977  0.915  0.973
N         SVM    0.879   0.950  0.881  0.872  0.914
N         RF     0.879   0.950  0.881  0.874  0.914
N         XGB    0.887   0.933  0.911  0.822  0.922
C         SVM    0.9122  0.968  0.910  0.917  0.938
C         RF     0.8413  0.933  0.843  0.835  0.886
C         XGB    0.8797  0.930  0.903  0.813  0.916

Table 7: Ternary (hierarchical) classification: from cells, nuclei, and cytoplasms.

Features  Model  Acc    P      R      Spec   F1
N/C       SVM    0.928  0.929  0.928  0.964  0.928
N/C       RF     0.945  0.945  0.945  0.971  0.945
N/C       XGB    0.944  0.944  0.944  0.971  0.944
N         SVM    0.688  0.695  0.688  0.842  0.683
N         RF     0.706  0.712  0.706  0.851  0.702
N         XGB    0.701  0.701  0.701  0.843  0.700
C         SVM    0.897  0.903  0.897  0.949  0.897
C         RF     0.836  0.847  0.836  0.921  0.831
C         XGB    0.885  0.886  0.885  0.944  0.885

Table 8: Bethesda (hierarchical) classification: from cells, nuclei, and cytoplasms.

Features  Model  Acc    P      R      Spec   F1
N/C       SVM    0.631  0.665  0.631  0.926  0.642
N/C       RF     0.685  0.698  0.685  0.935  0.689
N/C       XGB    0.688  0.684  0.688  0.935  0.685
N         SVM    0.425  0.458  0.425  0.885  0.416
N         RF     0.479  0.473  0.479  0.893  0.469
N         XGB    0.490  0.479  0.490  0.893  0.483
C         SVM    0.617  0.647  0.617  0.923  0.623
C         RF     0.593  0.596  0.593  0.915  0.585
C         XGB    0.629  0.622  0.629  0.923  0.625

Figure 4: Confusion matrix of a Bethesda classiﬁcation for

N/C features using Random Forest.

3.2.3 Fourier Coefﬁcients Results

Table 9 shows the results of running the model using only EFD features as cell shape descriptors. We confirmed the low relevance of these descriptors by running the solution again with all other features except the EFD coefficients. The results are shown in Table 10. As shown in Figure 3, the EFD did not appear to explain cervical lesions well, as they received low scores from the feature selectors (e.g., SPFSR).

Table 9: EFD-based classification results: 30 descriptors from cells (binary).

Features          Acc   P     R     Spec  F1
1. N/C (XGBoost)  0.78  0.86  0.84  0.61  0.85

Table 10: Classification without EFD: from cells (binary).

Features  Type     Acc   P     R     Spec  F1
1. N/C    Binary   0.95  0.97  0.96  0.91  0.96
1. N/C    Ternary  0.94  0.94  0.94  0.97  0.94


3.3 Discussions

As Diniz et al. (2021c) pointed out, the recognized correlation of cervical lesions with toxicological changes in the nucleus allows an analysis based only on this component to classify the degree of lesions. However, the results presented for a morphology-based classification suggest that the cytoplasm also influences the diagnosis. Furthermore, we observe that:

• Morphology attributes, as proposed in this

work, can assist a cytopathologist’s ﬁnal diag-

nosis for binary (normal/abnormal) and ternary

(normal/low-/high grade) classiﬁcations. The F1-

score values from cell tests (N/C) are above 92%

(see Tables 3, 4, 6, and 7);

• The confusion matrix in Figure 4 conﬁrms that

most errors fall within subcategories of low/high-

grade lesions. Despite the failure, it is worth re-

membering that the same clinical procedure must

be applied in cases of ASC-US and LSIL (low

grade) and in cases of ASC-H and HSIL (high

grade).

• The nucleus/cytoplasm ratio attributes for the

area, perimeter, and convex hull received high

scores in the feature selection procedures. These

features always contributed to the prediction re-

sults for cells (N/C) (see Tables 3, 4, 6 and 7).

4 CONCLUSION

To the best of our knowledge, this is the first work to validate a classification based only on morphological attributes. A model for classifying cervical cell lesions was evaluated according to the Bethesda System's diagnostic classes. We extracted 200 features related to the size, shape, and edge contour of each cell from a total of 3,233 samples from a real Papanicolaou image dataset (CRIC Cervix-Seg). As can be seen in Tables 9 and 10, the discrimination tests confirm that Elliptic Fourier Descriptors (EFD) performed worse than expected as features.

Table 11 presents our shape-based solution re-

sults compared to other works (Diniz et al., 2021b,a)

for cervix cell diagnosis. Both solutions employed

only cellular nuclei from image patches of the CRIC

dataset, and both performed texture analysis.

Table 11: Comparison with the methods from the literature.

Method  Nº classes  Acc   P     R     Spec  F1
1.      2           0.96  0.97  0.97  0.91  0.97
1.      3           0.94  0.94  0.94  0.97  0.94
1.      6           0.68  0.68  0.68  0.93  0.68
2.      2           0.95  0.95  0.95  0.95  0.95
2.      3           0.96  0.96  0.96  0.97  0.96
2.      6           0.96  0.91  0.90  0.98  0.90
3.      2           0.96  0.96  0.96  0.96  0.96
3.      3           0.96  0.94  0.94  0.97  0.94
3.      6           0.95  0.85  0.85  0.97  0.85

1. Our proposal / XGBoost, RF / cells' morphology
2. Diniz et al. (2021b) / RF / nuclei texture/shape
3. Diniz et al. (2021a) / DNN / cells' image

Our work suggests that other types of attributes, such as the texture of the nucleus and cytoplasm, are needed to discriminate specific degrees of lesions, such as the 6 (six) classes of the Bethesda nomenclature considered here. However, the proposed morphological attributes play an important role in binary (normal/abnormal) and ternary (normal/low grade/high grade) classification. As shown in Table 11, the results of our shape-based proposal for 2 (two) and 3 (three) classes were comparable to existing works using the CRIC dataset.

ACKNOWLEDGEMENTS

Daniela C. Terra acknowledges the support of the Federal Institute of Minas Gerais (IFMG). We also acknowledge Federal University of Ouro Preto (UFOP), FAPEMIG [APQ-00751-19, APQ-01306-22, APQ-01518-21, PPSUS-FAPEMIG/APQ-03740-17]; CNPq [303266/2019-8, 305895/2019-2, 308947/2020-7]; Pró-Reitoria de Pesquisa, Pós-Graduação e Inovação - PROPPI/UFOP [19/2020, 23109.000928/2020-33, and 23109.000929/2020-88]; CAPES, and Ministry of Health [905103/2020].

REFERENCES

Akman, D. V., Malekipirbazari, M., Yenice, Z. D., Yeo, A.,

Adhikari, N., Wong, Y. K., Abbasi, B., and Gumus,

A. T. (2023). k-best feature selection and ranking via

stochastic approximation. Expert Systems with Appli-

cations, 213:118864.

Bhowmik, M. K., Roy, S. D., Nath, N., and Datta, A.

(2018). Nucleus region segmentation towards cervical

cancer screening using AGMC-TU Pap-smear dataset.

In ACM Int. Conf. Proceeding Series, pages 44–53,

New York, New York, USA. ACM Press.

Chankong, T., Theera-Umpon, N., and Auephanwiriyakul,

S. (2014). Automatic cervical cell segmentation and

classiﬁcation in Pap smears. Computer Methods and

Programs in Biomedicine, 113(2):539–556.

Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer,

W. P. (2002). SMOTE: synthetic minority over-

sampling technique. Journ. of Art. Intel. Research,

16:321–357.


Chawla, N. V., Japkowicz, N., and Kotcz, A. (2004). Edi-

torial: Special issue on learning from imbalanced data

sets. SIGKDD Explor. Newsl., 6(1):1–6.

Chen, T. and Guestrin, C. (2016). XGBoost: A Scalable

Tree Boosting System. In Proceedings of the 22nd

ACM SIGKDD Int. Conf. on Knowledge Discovery

and Data Mining, pages 785–794, New York, NY,

USA. ACM.

Das, M. (2021). WHO launches strategy to accelerate

elimination of cervical cancer. The Lancet Oncology,

22(1):20–21.

Diniz, D. N., Rezende, M. T., Bianchi, A. G., Carneiro,

C. M., Luz, E. J., Moreira, G. J., Ushizima, D. M.,

de Medeiros, F. N., and Souza, M. J. (2021a). A deep

learning ensemble method to assist cytopathologists

in pap test image classiﬁcation. Journal of Imaging,

7(7).

Diniz, D. N., Rezende, M. T., Bianchi, A. G. C., Carneiro,

C. M., Ushizima, D. M., de Medeiros, F. N. S., and

Souza, M. J. F. (2021b). A Hierarchical Feature-Based

Methodology to Perform Cervical Cancer Classiﬁca-

tion. Applied Sciences, 11(9):2–19.

Diniz, D. N., Vitor, R. F., Bianchi, A. G. C., Delabrida,

S., Carneiro, C. M., Ushizima, D. M., de Medeiros,

F. N. S., and Souza, M. J. F. (2021c). An ensemble

method for nuclei detection of overlapping cervical

cells. Expert Systems with Applications, 185:115642.

Dong, N., Zhao, L., Wu, C., and Chang, J. (2020). Inception

v3 based cervical cell classiﬁcation combined with ar-

tiﬁcially extracted features. Applied Soft Computing,

93:106311.

Geron, A. (2022). Hands-on machine learning with Scikit-

Learn, Keras, and TensorFlow. O’Reilly Media, Inc.

Han, H., Wang, W.-Y., and Mao, B.-H. (2005). Borderline-

SMOTE: a new over-sampling method in imbalanced

data sets learning. In Int. conf. on intelligent comput-

ing, pages 878–887. Springer.

Jantzen, J., Norup, J., Dounias, G., and Bjerregaard, B.

(2005). Pap-smear Benchmark Data For Pattern Clas-

siﬁcation. Proc. NiSIS 2005, Albufeira, Portugal,

pages 1–9.

Jiang, H., Zhou, Y., Lin, Y., Chan, R. C., Liu, J., and Chen,

H. (2022). Deep learning for computational cytology:

A survey. Medical Image Analysis, page 102691.

Konatar, I., Popovic, T., and Popovic, N. (2020). Box-

Counting Method in Python for Fractal Analysis of

Biomedical Images. 2020 24th Int. Conf. on Informa-

tion Technology, IT 2020, (February).

Kuhl, F. P. and Giardina, C. R. (1982). Elliptic fourier fea-

tures of a closed contour. Computer Graphics and Im-

age Processing, 18(3):236–258.

Li, X., Xu, Z., Shen, X., Zhou, Y., Xiao, B., and Li, T.-

Q. (2021). Detection of Cervical Cancer Cells in

Whole Slide Images Using Deformable and Global

Context Aware Faster RCNN-FPN. Current Oncol-

ogy, 28(5):3585–3601.

Mariarputham, E. J. and Stephen, A. (2015). Nominated

Texture Based Cervical Cancer Classiﬁcation. Com-

putational and Mathematical Methods in Medicine,

2015:1–10.

Marinakis, Y., Dounias, G., and Jantzen, J. (2009). Pap

smear diagnosis using a hybrid intelligent scheme fo-

cusing on genetic algorithm based feature selection

and nearest neighbor classiﬁcation. Computers in Bi-

ology and Medicine, 39(1):69–78.

Nayar, R. and Wilbur, D. C. (2015). The bethesda system for

reporting cervical cytology: Deﬁnitions, criteria, and

explanatory notes. Springer International Publishing.

Po-Hsiang Tsui, Yin-Yin Liao, Chien-Cheng Chang, Wen-

Hung Kuo, King-Jen Chang, and Chih-Kuang Yeh

(2010). Classiﬁcation of Benign and Malignant Breast

Tumors by 2-D Analysis Based on Contour Descrip-

tion and Scatterer Characterization. IEEE Trans. on

Med. Imaging, 29(2):513–522.

Rahaman, M. M., Li, C., Yao, Y., Kulwa, F., Wu, X., Li, X.,

and Wang, Q. (2021). DeepCervix: A deep learning-

based framework for the classiﬁcation of cervical cells

using hybrid deep feature fusion techniques. Comput-

ers in Biology and Medicine, 136:104649.

Rezende, M. T., Silva, R., Bernardo, F. d. O., Tobias,

A. H., Oliveira, P. H., Machado, T. M., Costa, C. S.,

Medeiros, F. N., Ushizima, D. M., Carneiro, C. M.,

and Bianchi, A. G. (2021). Cric searchable image

database as a public platform for conventional pap

smear cytology data. Scientiﬁc Data, 8(1):151.

Sung, H., Ferlay, J., Siegel, R. L., Laversanne, M., Soerjo-

mataram, I., Jemal, A., and Bray, F. (2021). Global

cancer statistics 2020: Globocan estimates of in-

cidence and mortality worldwide for 36 cancers in

185 countries. CA: a cancer journal for clinicians,

71(3):209–249.

Teixeira, J. B. A., Rezende, M. T., Diniz, D. N., Carneiro, C. M., Luz, E. J. d. S., Souza, M. J. F., Ushizima, D. M., de Medeiros, F. N. S., and Bianchi, A. G. C. (2022). Segmentação automática de núcleos cervicais em imagens de Papanicolaou. In Anais do XXII Simp. Bras. de Computação Aplicada à Saúde, pages 346–357. Soc. Bras. de Computação.

Umadi, A., Nagarajan, K., Venkatesha, J. B., Ganesh, A., and George, K. (2020). Automated Segmentation of Overlapping Cells in Cervical Cytology Images Using Deep Learning. In 2020 IEEE 17th India Council Int. Conf., INDICON 2020, pages 1–7. IEEE.

Williams, A. (2021). Cervical cancer: what’s new in

squamous cell neoplasia. Diagnostic Histopathology,

27(12):478–482.

Yakkundimath, R., Jadhav, V., Anami, B., and Malvade,

N. (2022). Co-occurrence histogram based ensem-

ble of classiﬁers for classiﬁcation of cervical cancer

cells. Journal of Electronic Science and Technology,

20(3):100170.

Zhang, L., Kong, H., Ting Chin, C., Liu, S., Fan, X., Wang,

T., and Chen, S. (2014). Automation-assisted cervi-

cal cancer screening in manual liquid-based cytology

with hematoxylin and eosin staining. Cytometry Part

A, 85(3):214–230.

Zhao, Y., Fu, C., Xu, S., Cao, L., and Ma, H.-f.

(2022). LFANet: Lightweight feature attention net-

work for abnormal cell segmentation in cervical cy-

tology images. Computers in Biology and Medicine,

145:105500.
