Technological Model using Machine Learning Tools to Support Decision

Making in the Diagnosis and Treatment of Pediatric Leukemia

Daniel Mendoza-Vasquez, Stephany Salazar-Chavez and Willy Ugarte

Universidad Peruana de Ciencias Aplicadas (UPC), Lima, Peru

Keywords:

Model, Machine Learning, Leukemia, Decision Tree, Medical Assistance.

Abstract:

In recent years, multiple applications of machine learning have been visualized to solve problems in different

contexts, in which the health ﬁeld stands out. That is why, based on what has been previously described, there

is a wide interest in developing models based on machine learning for the creation of solutions that support

medical assistance for disease such as pediatric cancer. Our work deﬁnes the proposal of a technological

model based on machine learning which seeks to analyze the input medical data to obtain a predictive result,

oriented to support the decision making of the specialist physician in relation to the diagnosis and treatment

of pediatric leukemia. For the evaluation of the proposed model, a web validation system was developed that

communicates with a service hosted on a cloud server which performs the predictive analysis of the inputs

entered by the physician. As a result, an accuracy rate of 92.86% was obtained in the diagnosis of pediatric

leukemia using the multiclass boosted decision tree classiﬁcation algorithm.

1 INTRODUCTION

Pediatric cancer is one of the leading causes of death

among children and adolescents worldwide. This dis-

ease affects everyone and can appear in any part of

the body

. In addition, according to the World Health

Organization (WHO) in its report made in 2018, it re-

ports that in underdeveloped countries there is a lower

survival rate of pediatric cancer due to the inability to

obtain an accurate diagnosis, deaths from toxicity due

to side effects caused by treatments and the lack of es-

sential technological solutions

Thus, it is detailed that, in Peru, 70% of pediatric

cancer cases are diagnosed late, which hinders the de-

velopment of a treatment. In addition, it is speciﬁcally

detailed that the mortality rate of pediatric cancer is

higher than 50%

Currently, to diagnose and treat pediatric

leukemia, specialists in pediatric oncology go

through a complex process of integration of medical

aspects to determine the recovery process, which

includes radiotherapies, surgeries, physical examina-

tions, among others. Consequently, there is an impact

https://orcid.org/0000-0002-7510-618X

PAHO “Childhood and Adolescence Cancer” - https:

//bit.ly/3jREaO6

WHO “Childhood cancer” - https://bit.ly/3g0g1nA

Gestion Newspaper - https://bit.ly/2Pa3weL

on the patient’s health, since the procedure has a long

duration (Marti-Bonmati et al., 2020). In addition,

on the technological side, there are challenges such

as the selection of the most appropriate classiﬁcation

algorithm to perform a predictive analysis and pattern

identiﬁcation to develop an alternative solution for

the detailed problem.

Nowadays, there are projects that are developing

models and solutions using technologies such as ma-

chine learning with cloud computing, applying dif-

ferent approaches, for example, the reading of med-

ical images through computer vision for the detec-

tion of tumors, development of platforms that imple-

ment biomarkers to support diagnosis and treatment,

among others (Marti-Bonmati et al., 2020; Fathi et al.,

2020; Verda et al., 2019; Chaber et al., 2021).

Equally important, third level hospitals in Peru

do not have the appropriate technology to optimize

the time for diagnosis and treatment of pediatric

leukemia, resulting in delayed evaluation and care

As a consequence, the patient’s recovery is compli-

cated with little chance of recovery. With new trends

and technologies, learning models make it possible to

analyze this information using algorithms. Therefore,

these models can be leveraged with information for

analysis and prediction in pediatric leukemia.

Health Ministry of Peru - https://bit.ly/2XmHF7M

346

Mendoza-Vasquez, D., Salazar-Chavez, S. and Ugarte, W.

Technological Model using Machine Learning Tools to Support Decision Making in the Diagnosis and Treatment of Pediatric Leukemia.

DOI: 10.5220/0010684600003058

In Proceedings of the 17th International Conference on Web Information Systems and Technologies (WEBIST 2021), pages 346-353

ISBN: 978-989-758-536-4; ISSN: 2184-3252

Machine learning is a process that uses automatic

models that allow learning without any direct instruc-

tion. These learning models perform training that can

last from minutes to hours (Rajagopal et al., 2020).

Learning models can be adapted to different situa-

tions through classiﬁcation and prediction. Thus, this

technology has been used to classify and predict the

different subtypes of leukemia. Thus, based on anal-

ysis and research, the boosted decision tree algorithm

was selected to predict the diagnosis according to the

type of pediatric leukemia. Consequently, this alter-

native solution will support the decision making of

the pediatric oncology specialist, since, when making

consultations, based on the history, signs, symptoms

and laboratory results, in a system based on machine

learning tools, a predictive result can be obtained to

indicate whether the patient has the disease or not.

Our contributions are as follows:

- A tabular dataset has been constructed with input

parameters such as medical history, signs, symptoms

and laboratory results for diagnosis and treatment.

- A machine learning model has been designed and

trained to be able to perform the classiﬁcation accord-

ing to the type of pediatric leukemia.

- A validation system has been developed with the aim

of verifying how our proposal supports the pediatric

oncology specialist in the diagnosis and treatment of

pediatric leukemia.

This paper is has been organized as follows: Sec-

tion 2 presents a brief theoretical framework of the

most important concepts of our approach. Section 3

describes in detail our proposed technological model

by explaining all the concepts involved in its devel-

opment. Subsequently, in Section 4, we present the

experiments carried out for the validation of our re-

search and the results obtained. Section 5 gives de-

tails about the different existing research, and how in

contrast with our proposal to generate a new solution.

Finally, in Section 6, we present the conclusions and

perspectives.

2 CONTEXT

In this section, the main requiered concepts for our

approach will be introduced.

2.1 Pediatric Leukemia

This disease is among the most common cancers with

the highest number of cases. Pediatric leukemia is a

disease that originates in the bone marrow where new

blood cells are formed and it has 4 types:

1. Acute Lymphoblastic Leukemia (ALL) is the

most common cancer among children and the most

frequent cause of death. This type of leukemia

presents an abnormal increase of lymphoblasts in the

patient (O’Brien et al., 2018).

2. Chronic Lymphocytic Leukemia (CLL) is a ma-

lignant neoplasm in which small, mature-appearing

lymphocytes accumulate in the blood, bone marrow

and lymphoid tissues.

3. Acute Myeloid Leukemia (AML) is a form

of cancer characterized by the inﬁltration of bone

marrow, blood and other tissues by proliferative

hematopoietic systems cells (Naymagon et al., 2021).

4. Chronic Myeloid Leukemia (CML) represents

2% leukemias in children and adolescents. This dis-

ease presents with an increase in white blood cells and

large spleen size (Hijiya and Suttorp, 2019).

Figure 1: Microphotographs of the different type of

leukemias (Kaplan, 2019).

Figure 1 shows the leukemic cells according to

their type. In Figure 1A, ALL cells are evidenced

where the large size of the nucleus to the cytoplasm is

observed. In Figure 1B, CLL cells where the enlarge-

ment of the cytoplasm is visualized. In Figure 1C,

AML cells are observed where the enlargement of the

cytoplasm with prominent granules can be seen. In

Figure 1D, CML cells where the number of myeloid

progenitors in different stages can be observed.

2.2 Cloud Computing

Cloud computing is the compilation of integrated and

networked hardware, software and internet infrastruc-

ture (Mathew et al., 2017).

1. API services (application programming inter-

faces) are interfaces that provide the program with the

interaction with another system for data exchange.

2. Azure functions is a cloud service that provides

infrastructure and resources for executings apps.

Technological Model using Machine Learning Tools to Support Decision Making in the Diagnosis and Treatment of Pediatric Leukemia

347

2.3 Machine Learning

This discipline belongs to artiﬁcial intelligence de-

signed to create new systems that learn automati-

cally without human intervention. Machine learn-

ing (ML) is a method of data analysis that auto-

mates the constructions of analytical models where

this model can learn from experience to improve its

performance (Gonz

alez et al., 2020).

A ML model has the following components:

1. Dataset is a collection of data that is made up of

values from a table, which is part of database.

2. A machine learning model is a ﬁle that has been

trained to recognize certain types of patterns.

3. A Decision tree is a classiﬁcation technique is a

tree structured representation where the nodes rep-

resent an attribute, the branches symbolize the test

output and the ﬁnal nodes are the classiﬁcation re-

sult (Gonzalez et al., 2019).

Figure 2: Decision tree classiﬁcation model (Bi et al.,

2019).

Figure 2 shows a case of a classiﬁcation deci-

sion tree to predict a binary outcome for type 2 di-

abetes mellitus. Also, it should be noted that several

works (Rodin et al., 2021; Zhao et al., 2021; Aslam

et al., 2021) using this technique have used with dif-

ferent types of cancer providing a diagnostic result.

2.4 Validation

A the validation system has been built for the pediatric

oncology specialist to provide support on the patient’s

diagnosis. In this way, input variables have been de-

ﬁned that allow the predictive analysis to obtain a pos-

itive or negative result of suffering the disease.

2.4.1 System Development

- Framework: It is a framework that enables software

development. Therefore, Angular has been selected

as the framework for this project.

- Programming Language: It is a set of instructions

or actions designed for the execution of the system.

For this purpose, TypeScript has been used since it is

open source and based on JavaScript.

- Execution Environment: The execution environ-

ment used for the development of the project is

Node.js, which is oriented to the interpretation of the

JavaScript language.

- Versioning: The versioning and hosting of the sys-

tem code has been done in the GitHub platform since

it allows us to have a version control.

2.4.2 Input Parameters

For the development of the system, input parameters

classiﬁed by medical history, signs, symptoms, and

laboratory results have been deﬁned to perform the

predictive analysis with the objective of obtaining a

result as a support suggestion in the diagnosis and

treatment of pediatric leukemia.

- Medical History: The history is focused on the col-

lection of patient information. Among the recorded

histories of pediatric leukemia is DNA lesion, previ-

ous diseases, exposure to X-rays, chemotherapy, ra-

diotherapy, among others.

- Signs: Signs are diagnosed during a physical exami-

nation with the patient. Among the signs that special-

ist identiﬁes in pediatric leukemia are pallor, enlarged

lymph nodes, fever, weight loss, appearance of red

spots on the skin (petechiae), among others.

- Symptoms: The patient’s symptoms are those ail-

ments that the patient suffers from. The identiﬁed

symptoms that the patient commonly suffers with the

leukemia disease are extreme tiredness, night sweats,

dizziness, blurred vision, among others.

- Laboratory Results: Laboratory tests analyze sam-

ples of blood or body tissue. With these results, the

physician analyzes and determines the patient’s cur-

rent condition. For example, among the tests per-

formed on the patient are the number of red and white

blood cells, full blood count and others.

Figure 3: Integrated architecture of proposed solution.

WEBIST 2021 - 17th International Conference on Web Information Systems and Technologies

348

Figure 4: Proposed technological model for decision support in the diagnosis and treatment of pediatric leukemia.

3 TECHNOLOGICAL MODEL

FOR PEDIATRIC LEUKEMIA

Based on the analysis of the current situation in Peru

regarding the diagnosis and treatment of pediatric

leukemia, the following solution proposal has been

developed with the objective of providing support in

decision making to the medical specialist, optimizing

the time in the diagnosis and treatment of this disease.

For this reason, the following paragraphs will detail

the development of the proposal.

3.1 Integrated Architecture

The design of the integrated architecture that allows

visualizing the integration of the development of the

validation system to the technological model has been

carried out. In this way, the pediatric oncology spe-

cialist has been identiﬁed as the main actor. This ac-

tor accesses the validation system with the objective

of obtaining a diagnosis and treatment suggestion for

pediatric leukemia (see Section 2.1).

Figure 3 deﬁnes the integrated architecture where

the specialist enters through a device (computer) to

the validation system via an Internet connection. The

main actor visualizes the diagnosis and treatment

module to obtain a predictive analysis report using the

selected variables: history, signs, symptoms, and lab-

oratory results (see Section 2.3) according to the type

of leukemia (see Section 2.1) that the patient possibly

suffers from. Also, there are two environments in our

research proposal. The ﬁrst environment is the local

infrastructure where it is conformed with the front-

end of the validation system. Thus, this system has

the diagnosis and treatment and reporting modules. It

should be noted that in the reports the percentage of

accuracy obtained with the machine learning model

is displayed. The second environment is the cloud in-

frastructure, where it is conformed by the Azure ser-

vices used for the development of the model.

It should be noted that the information is kept se-

cure through services offered by Azure platform such

as data encryption, access management and imple-

mentation of audits to ensure compliance with the se-

curity policies established by the client.

3.2 Technological Model

The following technological model (see Figure 4) has

been developed to improve the accuracy in the diag-

nosis and treatment of pediatric leukemia. The objec-

tive of this model is to optimize and improve medical

care for the patient. For this reason, the following

modules have been determined.

1. Input: This phase consists of three modules:

- The data repository is focused on the patient’s med-

ical data and important documents.

- The medical consultation is focused on the medical

evaluation of the patient.

- The medical examinations that the patient must un-

dergo to determine his or her diagnosis.

2. Transformation: This phase is focused on data

transformation. Therefore, it consists of two phases:

The ﬁrst module is the data loading where the prepa-

ration of the information in the previous phase is per-

formed. The second module is the data transforma-

tion, which adapts the data conversion for its analysis.

3. Analysis: This phase is oriented to the analysis

of data collected in the previous phase. Therefore,

two modules have been deﬁned in this stage: The data

analysis module is focused on the veriﬁcation of these

obtained from the patient. The second module is the

validation of the data, where the correctness of the

data is conﬁrmed.

4. Prediction: This phase is focused on performing

pediatric leukemia diagnosis and treatment prediction

using Azure Machine Learning tool using supervised

classiﬁcation algorithms.

5. Results: This phase is focused on visualizing the

reports with the results obtained through the predic-

Technological Model using Machine Learning Tools to Support Decision Making in the Diagnosis and Treatment of Pediatric Leukemia

349

tive analysis model. Thus, it has been segmented into

the following modules. First, there is the web plat-

form, which is the medium where the specialist in-

teracts to identify and visualize the results obtained.

Secondly, it is focused on the reports where the accu-

racy obtained for the diagnosis of pediatric leukemia

is detailed. Finally, there is the deﬁnitive diagnosis

where it is indicated whether the patient may suffer

from this disease based on the predictive analysis per-

formed.

6. Treatment. This last phase consists of providing

treatment suggestions based on the results of the diag-

nosis provided by the predictive analysis tool. Thus,

the specialist analyzes the suggestions and determines

the ideal treatment for the patient’s recovery.

3.3 Development of the Predictive

Analytics Model

Based on the design of the proposed technological

model, its development was carried out at a functional

level through the Azure Machine Learning cloud plat-

form, which offers tools as a service could allow ob-

taining a predictive result in relation to the diagnosis

of pediatric leukemia. Thus, the following lines will

detail the process of creating the predictive model:

Firstly, a resource group was created, which refers

to a container of services provided by the Azure

Cloud Platform, which contains virtual networks, vir-

tual machines, coding services, machine learning and

other. In this way, we started by creating a dataset in

a spreadsheet, with the values of each input variable.

Also, it is highlighted that the input medical parame-

ters represented by general analysis, signs, symptoms,

and laboratory results were extracted from research

and reports of formal medical institutions.

Secondly, after having the dataset, we proceeded

to the creation of the environment where the predic-

tive analysis is going to be performed, for which rea-

son we used the service called process, for the gener-

ation of a resource represented by a virtual machine

which will allow the training of the model.

Thirdly, we proceed to the design of the predic-

tive model, where through the components provided

by Azure such as machine learning algorithms, data

analysis, statistical functions, training model and oth-

ers, it is possible to establish the ﬂow that will allow to

obtain a result. Therefore, the ﬁrst component is the

parameterized dataset, then there is the data normal-

ization component to allow the algorithm to model the

data correctly. Then, the training and test data were

separated (with the split data component). After, the

MultiClass Boosted Decision Tree algorithm was se-

lected to perform the prediction, since there are four

types of leukemia to be classiﬁed. Next, the train-

ing phase is performed, which allows learning from

patterns identiﬁed based on historical data sets (with

the train model component). Finally, a score and an

evaluation of the training (with the components score

model and evaluate model) were obtained, which will

be detailed in Section 4.

3.4 Validation System

To verify the performance of the model, a web sys-

tem was developed under a controlled environment

using the dataset with test cases established by the re-

searchers detailed in the previous section. As part of

our validation system, the pediatric oncology special-

ist selects the patient along with the type of leukemia

he/she may possibly have. In this way, the various

primers are displayed with the variables that the pa-

tient may present. As a result, the diagnostic re-

port is displayed indicating the patient’s status (pos-

itive or negative) with the percentage of accuracy or

false positives based on the analysis of the predictive

model developed in Microsoft Azure Machine Learn-

ing. Also, the validation system displays the “Treat-

ment” option when the result is positive.

4 EXPERIMENTS

In this section, the experiments carried out in this pa-

per will be explained, starting from the experimental

protocol and the results, as well as its discussion.

4.1 Experimental Protocol

For the development of our research project, several

activities have been carried out to obtain an expected

result. Figure 5 details the steps that have been carried

out for the implementation of our approach.

4.1.1 Dataset Creation

A dataset has been created to obtain the percentage of

accuracy in the diagnosis of the four leukemia sub-

types (see Section 2.1). Then, we have placed the

input values and generated the casuistry to train the

model and thus obtain a predictive result with a per-

centage of accuracy of having or not having the pe-

diatric leukemia disease. It should also be noted that

the input parameters were collected from medical re-

search and validated by a pediatric specialist.

WEBIST 2021 - 17th International Conference on Web Information Systems and Technologies

350

Figure 5: Proposed technological model for decision sup-

port in the diagnosis and treatment of pediatric leukemia.

4.1.2 Design and Training of the Predictive

Model

A model has been developed using the tools of the

Azure Machine Learning cloud platform, from which

training has been performed with the input values

with certain controlled casuistries to obtain the per-

centage of accuracy. Therefore, to train the model,

the parameterized dataset with the medical values de-

veloped in the previous phase was used. It should also

be noted that the following Azure services have been

used to train the model:

1. Process: This service allows to use the resource

as a destination to perform the training of the model

with the following speciﬁcations:

- Process name: ML1-Process

- Process type: Machine Learning compute

- Operating system: Linux 3.10 x86 64

- Processor: Intel Xeon E5-2673 @2.29GHz 4 cores

2. Pipelines: This Azure Service allows to organize

step by step the machine learning ﬂow in parallel with

the data processing with the following results:

- Training duration: 1 hour, 8 minutes.

- Process destination: ML1-Process.

- Accuracy: 92.86%

3. Validation System: This validation system has

been elaborated under the approach of a controlled

environment with the processing of simulated data to

obtain a predictive result provided by the Azure tool

with the following technologies:

- Framework: Angular

- Programming language: TypeScript

- Execution Environment: Node.js

- GitHub: https://bit.ly/2UhUqzm

In addition, to collect and process the data, an API

was developed using Azure Functions to enable com-

munication between the web app and the model.

4.2 Results

4.2.1 Treatment and Caring Time

In addition, in Lima, the delay from the ﬁrst symp-

toms of cancer to access to a hospital specialized in

pediatric oncology care was 81 days. Likewise, in the

regions of origin, the delay time from the appearance

of the ﬁrst symptoms was 63 days (142 days in ES-

SALUD). On the other hand, in Lima, the delay time

is 61 days (88 days in MINSA)

. It is evident that cur-

rently, the specialist has a delay time in the diagnosis

of pediatric leukemia of 1464 hours (61 days). There-

fore, based on the test performed together with ex-

perts, it can be pointed out that, the specialist will take

a maximum time of 10 additional minutes to obtain an

accurate suggestion to support decision making in the

diagnosis and treatment of pediatric leukemia.

4.2.2 Validation with Experts

For our work, validation has been carried out with ex-

perts in the ﬁeld of pediatric medicine and develop-

ment of health care systems. In this way, meetings

were held with experts where the research proposal

was detailed. For further details, please see the fol-

lowing link https://bit.ly/3zikrhq.

Likewise, the validation system has been pre-

sented to the experts indicating its functionalities to

demonstrate the predictive result in support of the di-

agnosis and treatment of pediatric leukemia. As a

result, the specialist has performed the correspond-

ing tests on the validation system in a time inter-

val of 8 to 10 minutes based on the developed test

plan that can be visualized in the following link https:

//bit.ly/3pFGiuI (in Spanish).

4.2.3 Algorithm Results

As part of the experiment of this research, different

supervision algorithms have been used to compare the

results obtained in relation to the percentage of accu-

racy. Table 1 shows the percentage obtained accord-

ing to the type of classiﬁcation algorithm.

Table 1: Accuracy score w.r.t the classiﬁcation technique.

Classiﬁcation algorithm Percentage of accuracy

Multiclass Decision Forest 23.61%

Multiclass Logistic Regression 33.33%

Multiclass Neural Network 85.71%

Multiclass Boosted Decision Tree 92.86%

Peruvian ombudsman - https://bit.ly/363qPf5

Technological Model using Machine Learning Tools to Support Decision Making in the Diagnosis and Treatment of Pediatric Leukemia

351

Thus, based on the experiment carried out, it was

validated that the result obtained by the algorithm

called “Multiclass Boosted Decision Tree” is the most

suitable for predictive analysis based on the input

medical parameters. Therefore, this algorithm was

selected to provide a suggestion result with a high de-

gree of accuracy, which supports the diagnosis and

treatment of pediatric leukemia.

4.2.4 Diagnostic Error Rate

On the other hand, the misdiagnosis rate has identi-

ﬁed opportunities to reduce the recovery rate of pedi-

atric cancer. Therefore, the key factor of the research

factor of the project is to focus on the lack of diagno-

sis or misdiagnosis since approximately 20% of chil-

dren are affected with this diagnosis

. Table 2 shows

the percentage of error represented by false positives

according to the classiﬁcation algorithm provided by

Microsoft Azure Machine Learning. As a result, it is

observed that the neural network algorithms and the

boosted decision tree algorithm have a lower percent-

age of error compared to the percentage of error in the

diagnosis of pediatric cancer.

Table 2: % of false positives w.r.t. classiﬁcation technique.

Classiﬁcation Algorithm Percentage of accuracy

Multiclass Decision Forest 76.39%

Multiclass Logistic Regression 66.67%

Multiclass Neural Network 14.29%

Multiclass Boosted Decision Tree 7.14%

Thus, it is demonstrated that with the neural net-

work and boosted decision tree algorithms there is a

lower percentage of false positives compared to the

diagnostic error rate of 20% detailed in the PAHO re-

port. Thus, based on the above, we have a viable solu-

tion to support specialists in making a better decision

in the deﬁnitive diagnosis of pediatric leukemia.

4.3 Discussion

At the end of the result stage, the following discus-

sions have become evident in relation to the devel-

opment of this research. Based on the development

of the technological model validation system, it was

possible to verify that the process of obtaining a pre-

dictive result in relation to the diagnosis and treat-

ment of pediatric leukemia has a time interval of be-

tween 8 and 10 minutes. In addition, to have the abil-

ity to determine the most appropriate pediatric onco-

Health Ministry of Peru “Early Diagnostic Guide Can-

cer in Children and Teens in Peru” - http://bvs.minsa.gob.

pe/local/MINSA/5044.pdf

logic treatment line for the patient’s recovery. The

ideal classiﬁcation algorithm for the development of

this project is the boosted decision tree algorithm base

on the comparison of the results of other algorithms

(see Table 1), since an accuracy percentage of 92.86%

was obtained. It should be noted that this percent-

age was obtained based on the use of the Microsoft

Azure Machine Learning platform, so this research

seeks those other studies use different platforms and

design multiple machine learning models to obtain

better results. The present research proposal aims to

support the process of diagnosis and treatment of pe-

diatric leukemia by reducing the percentage of mis-

diagnosis to increase the recovery rate of the disease.

For this reason, the results of false positives have been

analyzed and detailed to solve the problem of misdi-

agnosis (see Table 2).

5 RELATED WORKS

In (Fathi et al., 2020), the authors propose an expert

system based on neural networks with the aim of per-

forming the prognosis and classiﬁcation of the type

of leukemia in children according to complete blood

count test, ANFIS (Artiﬁcial Neural Network Fuzzy

Inference System), GMDH and metaheuristic algo-

rithms. Therefore, this proposal is conformed in the

collection of data and samples, training and veriﬁca-

tion of answers, data classiﬁcation and data division.

As a part of the study results, it has been demon-

strated that there are limitations in the separation of

cancer types due to the high percentage of error (Fathi

et al., 2020). In relation to our proposal, our model

is focused on the patient’s symptomatology as input

values to make the diagnosis of pediatric leukemia

according to its type allowing to visualize treatment

suggestions for the early recovery of the patient.

In (Verda et al., 2019), the authors detail the devel-

opment of Logic Learning Machine (LLM) in order

to perform gene expression data analysis for pediatric

cancer diagnosis. In this research, a comparison has

been made with existing supervised analysis meth-

ods such as decision tree (DT), artiﬁcial neural net-

work (ANN) and k-means classiﬁer algorithms. For

this purpose, the authors have used a dataset of eight

databases including cancer cell types. In contrast to

our research proposal, our project is focused on the

four subtypes of pediatric leukemia using the multi-

class classiﬁcation algorithm. To make the diagnosis

of this disease, we have analyzed the patient’s history,

signs, symptoms, and laboratory results without fo-

cusing on the patient’s genetics. In (Chaber et al.,

2021), the authors mention the use of the Fourier

WEBIST 2021 - 17th International Conference on Web Information Systems and Technologies

352

transform infrared spectroscopy (FTIR) tool as it is

a fast and cost-effective method, which allows early

detection of cancer-speciﬁc chemical changes in tis-

sues, cells and bioﬂuids. In this research, it is pro-

posed to use this tool for diagnosis in pediatric acute

lymphoblastic leukemia using blood samples. As part

of the experiment, an evaluation has been performed

with 10 patients with this disease. Given this, the

authors developed a predictive model based on Ad-

aBoost with a percentage of 85% accuracy. In con-

trast to our approach, a technological model has been

developed focused on the diagnosis and treatment of

the four subtypes of pediatric leukemia through labo-

ratory results, symptoms, signs, and general medical

aspects of the patient. From which, an accuracy of

92.86% was obtained.

6 CONCLUSIONS

A machine learning model trained with a dataset in

a tabular manner with medical history, symptoms,

signs, and laboratory results has been developed to

be able to identify whether the patient has high prob-

ability in suffering from pediatric leukemia disease.

It has been shown that the multiclass boosted deci-

sion tree algorithm has a high percentage of accuracy

(92.86%) for obtaining a predictive result of sugges-

tion to support the diagnosis and treatment of pedi-

atric leukemia. In addition, there is evidence of an

opportunity to reduce the misdiagnosis results from

the solution, since a lower percentage of false posi-

tives (7.14%) was obtained.

An interesting future work can be the analysis

of information such as medical history (with ma-

chine learning), medical images (with computer vi-

sion) or adding new modules to complement and

increase knowledge and support the recovery rate

of pediatric cancer disease or its protection with

blockchain (Arroyo-Mari

nos et al., 2021).

REFERENCES

Arroyo-Mari

nos, J. C., Mejia-Valle, K. M., and Ugarte, W.

(2021). Technological model for the protection of ge-

netic information using blockchain technology in the

private health sector. In ICT4AWE.

Aslam, M., Xue, C., Chen, Y., Zhang, A., Liu, M., Wang,

K., and Cui, D. (2021). Breath analysis based early

gastric cancer classiﬁcation from deep stacked sparse

autoencoder neural network. Scientiﬁc Reports, 11.

Bi, Q., Goodman, K., Kaminsky, J., and Lessler, J. (2019).

What is machine learning: a primer for the epidemiol-

ogist. American journal of epidemiology, 188.

Chaber, R., Kowal, A., Jakubczyk, P., Arthur, C., Łach,

K., Wojnarowska-Nowak, R., Kusz, K., Zawlik, I.,

Paszek, S., and Cebulski, J. (2021). A preliminary

study of ftir spectroscopy as a potential non-invasive

screening tool for pediatric precursor b lymphoblastic

leukemia. Molecules, 26.

Fathi, E., Jahangoshai Rezaee, M., Tavakkoli-Moghaddam,

R., Alizadeh, A., and Montazer, A. (2020). Design

of an integrated model for diagnosis and classiﬁcation

of pediatric acute leukemia using machine learning.

Journal of Engineering in Medicine, 234.

Gonz

alez, E. E. M., Reyes-Ort

ız, J. A., and Gonz

alez-

Beltr

an, B. A. (2020). Machine learning models

for cancer type classiﬁcation with unstructured data.

Computaci

on y Sistemas, 24(2).

Gonzalez, G., Lu, Z., Leaman, R., Weissenbacher, D.,

Boland, M. R., Chen, Y., Du, J., Fluck, J., Greene,

C. S., Holmes, J. H., Kashyap, A., Nielsen, R. L.,

Ouyang, Z., Schaaf, S., Taroni, J. N., Tao, C., Zhang,

Y., and Liu, H. (2019). Text mining and machine

learning for precision medicine. In PSB.

Hijiya, N. and Suttorp, M. (2019). How i treat chronic

myeloid leukemia in children and adolescents. Blood,

133.

Kaplan, J. (2019). Leukemia in children. Pediatrics in Re-

view, 40.

Marti-Bonmati, L., Alberich-Bayarri, A., Ladenstein, R.,

Blanquer, I., Segrelles, D., Cerd

a-Alberich, L., Gkon-

tra, P., Hero, B., Garcia Aznar, J. M., Keim,

D., Jentner, W., Seymour, K., Jim

enez-Pastor, A.,

Gonz

alez Valverde, I., Heras, B., Essiaf, S., Walker,

D., Rochette, M., Bubak, M., and Neri, E. (2020).

Primage project: predictive in silico multiscale ana-

lytics to support childhood cancer personalised evalu-

ation empowered by imaging biomarkers. European

Radiology Experimental, 4.

Mathew, S., Gulia, S., Singh, V., and Dev, V. (2017). A

review paper on cloud computing and its security con-

cerns. In RICE, volume 10.

Naymagon, L., Najfeld, V., Teruya-Feldstein, J., El Ja-

mal, S., and Mascarenhas, J. (2021). Acute Myeloid

Leukemia.

O’Brien, M., Seif, A., and Hunger, S. (2018). Acute lym-

phoblastic leukemia in children. Wintrobe’s Clinical

Hematology: Fourteenth Edition.

Rajagopal, S., S., H., and Kundapur, P. (2020). Performance

analysis of binary and multiclass models using azure

machine learning. International Journal of Electrical

and Computer Engineering (IJECE), 10.

Rodin, A., Gogoshin, G., Hilliard, S., Wang, L., Egelston,

C., Rockne, R., Chao, J., and Lee, P. (2021). Dis-

secting response to cancer immunotherapy by apply-

ing bayesian network analysis to ﬂow cytometry data.

International Journal of Molecular Sciences, 22.

Verda, D., Parodi, S., Ferrari, E., and Muselli, M. (2019).

Analyzing gene expression data for pediatric and adult

cancer diagnosis using logic learning machine and

standard supervised methods. BMC Bioinform., 20(9).

Zhao, Y., Chang, C., Hannum, M., Lee, J., and Shen, R.

(2021). Bayesian network-driven clustering analy-

sis with feature selection for high-dimensional multi-

modal molecular data. Scientiﬁc Reports, 11.

Technological Model using Machine Learning Tools to Support Decision Making in the Diagnosis and Treatment of Pediatric Leukemia

353