Impact of Machine Learning Assistance on the Quality of Life Prediction for Breast Cancer Patients

Mikko Nuutinen [1,2,a], Sonja Korhonen, Anna-Maria Hiltunen, Ira Haavisto [1,3], Paula Poikonen-Saksela, Johanna Mattson, Haridimos Kondylakis, Ketti Mazzocco [6,7], Ruth Pat-Horenczyk, Berta Sousa and Riikka-Leena Leskelä

Nordic Healthcare Group, Helsinki, Finland

Haartman Institute, University of Helsinki, Helsinki, Finland

Laurea University of Applied Sciences, Sustainable and Versatile Social and Health Care, Vantaa, Finland

Helsinki University Hospital Comprehensive Cancer Center and Helsinki University, Finland

FORTH-ICS, Heraklion, Greece

Department of Oncology and Hemato-oncology, University of Milan, Milan, Italy

Applied Research Division for Cognitive and Psychological Science, European Institute of Oncology IRCCS, Milan, Italy

Paul Baerwald School of Social Work and Social Welfare, The Hebrew University of Jerusalem, Jerusalem, Israel

Champalimaud Clinical Centre, Breast Unit, Champalimaud Centre for the Unknown, Champalimaud Foundation, Lisboa, Portugal

Keywords:

Clinical Decision Support System, Breast Cancer, Resilience, Machine Learning.

Abstract:

Proper and well-timed interventions may improve breast cancer patients’ adaptation, resilience and quality of life (QoL) during the treatment process and the time after the disease. The challenge is to identify those patients who would benefit most from a particular intervention. The aim of this study was to measure whether a machine learning prediction incorporated in a clinical decision support system (CDSS) improves clinicians’ performance in predicting patients’ QoL during the treatment process. We conducted an experiment in which six clinicians used the CDSS to predict QoL for 60 breast cancer patients. Each patient was evaluated both with and without the aid of the machine learning prediction. The clinicians were also interviewed with open-ended questions to investigate the usage and perceived benefits of a CDSS with the machine learning prediction aid. Clinicians’ performance in evaluating the patients’ QoL was higher with the aid of machine learning predictions than without it. The AUROC of the clinicians was .777 (95% CI .691-.857) with the aid and .755 (95% CI .664-.840) without the aid. When the machine learning model’s prediction was correct, the average accuracy (ACC) of the clinicians was .788 (95% CI .739-.838) with the aid and .717 (95% CI .636-.798) without the aid.

1 INTRODUCTION

Breast cancer is a major socio-economic challenge due to its high prevalence. In 2018, more than 2 million new breast cancer patients were diagnosed worldwide (Bray et al., 2018). In Europe, 28% of all cancers were breast cancers. The concept of resilience refers to a person’s ability to adapt to and bounce back from a challenging event (Deshields et al., 2016; Rutter, 2006). How a breast cancer patient adapts to the treatment process and the time after the disease greatly affects the patient’s quality of life (QoL). Proper and well-timed interventions may be important in improving patient adaptation and resilience. The challenge is to identify, in advance and in a timely manner, those patients who would benefit most from a particular intervention.

https://orcid.org/0000-0002-7429-3710

Advanced machine learning algorithms integrated into a clinical decision support system (CDSS) (Sutton et al., 2020) can help a clinician to identify target patients and to determine appropriate interventions. As far as we know, no previous studies have investigated the aid of machine learning prediction integrated into a CDSS for identifying patients who may need attention and intervention to support resilience during the breast cancer treatment process and survival. In this study, we investigated the use of machine learning prediction integrated into a CDSS to identify breast cancer patients who may need help. We conducted a user experiment in which the clinicians’ task was to predict patients’ quality of life 6 months after the diagnosis of breast cancer. The independent variable of the user experiment was the aid of the machine learning algorithm. The aim was to measure whether the machine learning prediction improves clinicians’ performance in predicting the QoL of patients. In addition, we conducted an open-ended interview with each clinician. The aim was to determine how this kind of decision support tool could be used, and who would benefit from it and how.

Nuutinen, M., Korhonen, S., Hiltunen, A., Haavisto, I., Poikonen-Saksela, P., Mattson, J., Kondylakis, H., Mazzocco, K., Pat-Horenczyk, R., Sousa, B. and Leskelä, R.
Impact of Machine Learning Assistance on the Quality of Life Prediction for Breast Cancer Patients.
DOI: 10.5220/0010786900003123
In Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - Volume 5: HEALTHINF, pages 344-352
ISBN: 978-989-758-552-4; ISSN: 2184-4305

2 METHODS

2.1 Dataset

The patient data used in this study was collected

from four clinical sites: (1) Helsinki University Hos-

pital Comprehensive Cancer Center (HUS), (2) He-

brew University in Jerusalem, Israel, (3) Champal-

imaud Breast Unit (CHAMP) and (4) European In-

stitute of Oncology (IEO). The study was approved

by the European Institute of Oncology, Applied Re-

search Division for Cognitive and Psychological Sci-

ence (Approval No R868/18 – IEO 916) and the clin-

ical ethical committees of each hospital.

The retrospective data set contains sociodemographic and lifestyle, medical and treatment, and psychosocial assessment values for 608 breast cancer patients (HUS: 185 patients, Hebrew University: 138 patients, CHAMP: 108 patients and IEO: 177 patients). For the user experiment, we selected 60 HUS patients (test set). The remaining 548 patients (train set) were used for training the machine learning algorithm. The target variable for the machine learning algorithm and the user experiment was the patient’s self-assessed quality of life (QoL) value evaluated six months (Month 6) after the baseline (Month 0). Each patient’s baseline was 3-4 weeks after breast cancer was diagnosed. The QoL value was measured using the EORTC QLQ-Global QoL scale (Aaronson et al., 1993).

Table 1 presents a descriptive analysis of the sociodemographic and lifestyle, medical and treatment, and psychosocial assessment values for the patients of the test set. These variables were presented to the clinicians in the user interface of the user experiment (Fig 1). In Table 1, the patients are divided into low and high QoL groups. The grouping threshold is a QoL value of 75: patients whose self-assessed QoL value was higher than 75 six months after the baseline were placed in the high QoL group. The same grouping was used for training the machine learning classifier (Section 2.2). Table 1 shows that, compared to the low QoL group, the high QoL patients were significantly older and had a lower BMI, better baseline values for overall health and quality of life, and a lower distress level.

2.2 Machine Learning Model

The train data set (n=548) was used for training the machine learning model (a random forest classifier). The task of the model was to classify a patient into either the low QoL or the high QoL group six months after the baseline. The performance of the trained model in classifying high and low QoL patients was evaluated on the test data set (n=60) by calculating standard performance metrics, such as the area under the receiver operating characteristic curve (AUROC), recall and precision.
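As an illustration of the AUROC metric named above, the following minimal Python sketch computes it from class labels and predicted probabilities using the pairwise (Mann-Whitney) definition. The function name and the toy data are hypothetical and are not taken from the study; the paper does not state which implementation was used.

```python
def auroc(labels, scores):
    """Return the AUROC: the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one case of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = high QoL, 0 = low QoL.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.90, 0.80, 0.55, 0.65, 0.70, 0.40, 0.30, 0.20]
toy_auroc = auroc(toy_labels, toy_scores)  # 14 of 16 pairs are ordered correctly
```

Because AUROC depends only on how the cases are ranked, it can be computed without choosing a classification threshold, unlike recall and precision.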

2.3 User Experiment

The standard performance measurements of machine

learning algorithms are not sufﬁcient to show that

CDSS is effective also in a real clinical environment.

The human decision-making process is complex and

biased. It cannot be assumed that clinicians will al-

ways closely follow the recommendations of machine

learning model (Vasey et al., 2021; Ginestra et al.,

2019). In this study we conducted a user experiment

for measuring the performance of decision making by

simulating the use of CDSS with or without the aid

of the machine learning prediction. The independent

variable was the aid of machine learning prediction.

The dependent variable was the predicted QoL value

for patients. The QoL values were given by using the

continuous scale from 0 (low QoL) to 100 (high QoL).

2.3.1 User Interface

Fig 1 presents the user interface of the user experiment in the condition where the machine learning prediction (probability of high QoL) was presented to the participants. In the condition without the aid of the prediction, only the patient background information and patient questionnaire data were presented. The participants entered their QoL predictions with the slider at the bottom of the user interface (inside the red square).

The patient background information presented on the user interface was intended to match the information that clinicians use at a normal patient admission. It is important to note that the results of the experiment are comparable to normal patient examinations only when all of the patient’s background information relevant to the task is presented on the user interface. The patient background information presented in this study was based on consultation with two medical oncologists with long experience of breast cancer treatment (HUS: Paula Poikonen-Saksela and Leena Vehmanen, 5.10.2020). The selected patient background variables were related to the patient’s age, BMI, education, working life, physical activity, family relationships, chemotherapy treatment and mental health background. Previous research (Bonanno et al., 2007; Molina et al., 2014) also supports the view that variables such as age, socioeconomic and marital status, and social support are important factors of resilience.

Table 1: Sociodemographic and lifestyle, medical and treatment and psychosocial assessment values for the test set patient cohort. The patients are divided into the low and high QoL (quality of life) groups according to the QoL value measured using EORTC global QLQ scale (Aaronson et al., 1993). The threshold of grouping is the QoL value of 75. Patients whose self-assessed QoL value was higher than 75 after 6 months from the baseline were grouped in the high QoL group. P values were calculated by Fisher’s exact or Mann-Whitney U test. EORTC = European organization for research and treatment of cancer, Avg. = Average, SD = Standard deviation, BMI = Body mass index.

Variable group                   Variable                                                          Low QoL (0-75)   High QoL (75-100)   P value
                                 Number of patients                                                38               22
Sociodemographic and lifestyle   Age, Avg. (±SD)                                                   54.6 (7.61)      62.2 (6.38)         < .001
                                 BMI, Avg. (±SD)                                                   27.3 (4.62)      24.2 (3.76)         0.006
                                 Higher education (a), n (%)                                       38 (100.0)       21 (95.5)           0.367
                                 Part time or unemployment, n (%)                                  1 (2.6)          2 (9.1)             0.548
                                 Low income (b), n (%)                                             6 (15.8)         1 (4.5)             0.246
                                 No exercise, n (%)                                                3 (7.9)          1 (4.5)             1
                                 Living alone, n (%)                                               15 (39.5)        10 (45.5)           0.787
                                 Number of children, Avg. (±SD)                                    1.4 (1.15)       1.8 (1.1)           0.064
Medical and treatment            Chemotherapy treatment, n (%)                                     26 (68.4)        12 (54.5)           0.405
                                 Preexisting mental illness, n (%)                                 15 (39.5)        5 (22.7)            0.258
                                 Chronic depression, n (%)                                         5 (13.2)         0 (0.0)             0.148
Psychosocial assessment          Baseline self-assessment: Overall quality of life (c), Avg. (±SD) 5.3 (1.12)       6.2 (1.18)          < .001
                                 Baseline self-assessment: Overall health (d), Avg. (±SD)          5.1 (1.04)       6.3 (1.08)          < .001
                                 Baseline self-assessment: Distress level (e), Avg. (±SD)          4.7 (2.68)       1.6 (1.65)          < .001
                                 Month 6, self-assessment: Global QLQ (f), Avg. (±SD)              55.7 (14.77)     92.0 (7.03)         < .001

(a) Bachelor, high school, postgraduate school or vocational non academic diploma
(b) Net monthly income 0-1500 €
(c) QLQ30-30 (Aaronson et al., 1993), How would you rate your overall quality of life during the past week? 1 (very poor) - 7 (excellent)
(d) QLQ30-29 (Aaronson et al., 1993), How would you rate your overall health during the past week? 1 (very poor) - 7 (excellent)
(e) NCCN distress thermometer (Goebel and Mehdorn, 2011), Please circle the number (0-10) that best describes how much distress you have been experiencing in the past week, including today: 0 (No distress) - 10 (Extreme distress)
(f) QLQ30 functional scale Global (Aaronson et al., 1993) [0-100], higher is better

Furthermore, the user interface presented the pa-

tients’ answers for three psychosocial questions. The

questions were related to patient’s health, quality of

life and distress level at the baseline. The questions

were selected according to the variable importance

values of the trained machine learning model (Table

3).

2.3.2 Participants and Samples

To compare the performance of clinicians with and without the aid of the prediction, six clinicians evaluated three sets of 20 patients twice, in two separate sessions, according to the crossover design detailed in Fig 2. The participants were oncologists with a median of 7.5 (4-18) years of experience of treating breast cancer patients. During each session, clinicians evaluated half of the patients with the machine learning prediction value and half without. After a washout period, the clinicians evaluated the same set of 20 patients with the aid status reversed: the patients that were reviewed with the aid of predictions in the first session were reviewed without the aid during the second session, and vice versa. That is, the 60 patients (test set) were randomly divided into three groups of 20 patients, and each clinician evaluated all the patients in one group both with and without the aid. Thus, each patient group was evaluated by two different clinicians.
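The crossover allocation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the fixed random seed, the alternation of aided and unaided blocks of 5 patients, and the rule that the two clinicians sharing a patient group receive opposite orders are all hypothetical simplifications. The study itself assigned clinicians to an order at random.

```python
import random


def crossover_schedule(n_patients=60, n_groups=3, block_size=5, seed=0):
    """Build a two-session crossover schedule.

    Returns {clinician_id: [session_1, session_2]}, where each session is a
    list of (patient_id, aided) pairs.
    """
    rng = random.Random(seed)
    patients = list(range(n_patients))
    rng.shuffle(patients)  # random division of patients into groups
    size = n_patients // n_groups
    groups = [patients[i * size:(i + 1) * size] for i in range(n_groups)]

    schedule = {}
    clinician = 1
    for group in groups:
        # Assumption: two clinicians per group, one per test order.
        for order in (1, 2):
            starts_with_aid = (order == 1)
            sessions = []
            for session in (1, 2):
                rows = []
                for position, patient in enumerate(group):
                    even_block = (position // block_size) % 2 == 0
                    aided = (even_block == starts_with_aid)
                    if session == 2:
                        aided = not aided  # aid status reversed after washout
                    rows.append((patient, aided))
                sessions.append(rows)
            schedule[clinician] = sessions
            clinician += 1
    return schedule


schedule = crossover_schedule()
```

In this sketch every clinician sees each of their 20 patients once with and once without the aid, and every patient is rated by exactly two clinicians, which matches the counts given in the text.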

To establish familiarity with the CDSS and the machine learning predictions, each session began with an introduction and 4 training patients (2 with and 2 without the aid) that were not part of the test patients. The study administrator also answered any questions about the functionality and the variables of the user experiment.

HEALTHINF 2022 - 15th International Conference on Health Informatics

Figure 1: User interface of the user experiment. After participants have analyzed the patient background information and selected patient questionnaires, they rate the quality of life value for the patient with or without the aid of the machine learning prediction. The quality of life values were given on a continuous scale from 0 (low QoL) to 100 (high QoL) (inside the red square). In this user interface example, the aid of the machine learning prediction is presented.

The washout period between the two sessions of

the crossover design was 2-4 weeks. According to

the recommendation (Pantanowitz et al., 2013) the

washout period should be at least 2 weeks. On the

other hand, with a long washout period, the partic-

ipant’s diagnostic criteria could have changed over

time. For example, participants could have gained

more experience or changed their attitude toward di-

agnostic criteria (Nielsen et al., 2010).

An overly long experiment causes fatigue, which lowers the quality of the input values. With a pilot study we confirmed that a single session with 20 patients took no more than 30 minutes. According to the standard (ITU-R, 2012), the duration of an experiment should be less than 60 minutes.

Figure 2: Experimental design. [Diagram labels: Order 1, Order 2; Training; With aid; Without aid; blocks of 5 patients; washout period of 2 weeks.] Each of the 6 clinicians was randomly assigned to either test order 1 or 2. Each test began with a brief practice block of 4 patient cases (2 with the aid and 2 without the aid), followed by 4 experiment blocks of 5 patients, with order 1 beginning with the aid of machine learning predictions and order 2 beginning without the aid.

2.3.3 Open-ended Interview

After the second session of the user experiment, an open-ended interview was conducted with the participants. The interview data was analyzed using thematic analysis, following the approach described by Clarke and Braun (2014). The interview included the following questions:

• Could you make use of this kind of decision sup-

port tool when taking care of a patient and how?

• How would you envision it to be used in your or-

ganisation / department?

• Who (what role/s) in your organisation would use

such a tool?

• Who (what role/s) in your organisation would

make use of the information?

• How might the predicted score affect the patient

care processes from your perspective / in your or-

ganisation?

• Do you think the patients could beneﬁt from this

kind of prediction? Under which conditions?

• What aspects should be taken into consideration when further developing the decision support tool?

2.3.4 Statistical Analyses

The performance of the clinicians with and without the aid was evaluated by calculating the area under the receiver operating characteristic curve (AUROC), recall, precision and balanced accuracy (ACC). Furthermore, we measured the participants’ review time when decisions were made with or without the aid of predictions. We used bootstrapping (Seabold and Perktold, 2010) to compute 95% confidence intervals (CI) and p-values for the performance metrics.
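To make the bootstrapping step concrete, the sketch below computes a percentile bootstrap confidence interval for balanced accuracy using only the Python standard library. It is an assumption-laden illustration, not the authors’ procedure: the paper does not state which bootstrap variant, number of resamples or software settings were used, and the function names are hypothetical. The example vectors are built from the counts of the model’s confusion matrix at threshold .60 (Table 2).

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of recall (sensitivity) and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement and take
    the alpha/2 and 1 - alpha/2 quantiles of the resampled metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        if 0 < sum(t) < n:  # skip resamples that contain only one class
            p = [y_pred[i] for i in idx]
            stats.append(metric(t, p))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# 38 low QoL patients (32 predicted low, 6 predicted high) and
# 22 high QoL patients (6 predicted low, 16 predicted high).
y_true = [0] * 38 + [1] * 22
y_pred = [0] * 32 + [1] * 6 + [0] * 6 + [1] * 16

point = balanced_accuracy(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, balanced_accuracy)
```

The interval produced by this sketch depends on the seed and the number of resamples, so it should not be expected to reproduce the published intervals exactly.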

3 RESULTS

3.1 Machine Learning Model

The AUROC value of the trained machine learning

model (random forest) for the test data set was .832

(95% CI .757-.900). Recall and precision values were

.727 (95% CI .583-.857) and .727 (95% CI .589-.854)

when the threshold value of the model was .60. Table

2 presents the confusion matrix of the trained machine

learning model for the test data set when the threshold

value of the model was .60 or .70. With the thresh-

old of .60, the model classiﬁed 6/38 low QoL patients

in the group of high QoL (false positives). With the

threshold of .70, 1/38 low QoL patients were classi-

ﬁed in the group of high QoL.
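The recall and precision values can be checked directly against the confusion matrix counts in Table 2. The short Python sketch below does this arithmetic; the function name is hypothetical, and high QoL is treated as the positive class, as in the text.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Recall, precision and balanced accuracy from confusion matrix counts."""
    recall = tp / (tp + fn)          # share of high QoL patients found
    precision = tp / (tp + fp)       # share of high QoL predictions that are right
    specificity = tn / (tn + fp)     # share of low QoL patients found
    return {
        "recall": recall,
        "precision": precision,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Counts from Table 2 (rows: true class, columns: predicted class).
th60 = confusion_metrics(tn=32, fp=6, fn=6, tp=16)
th70 = confusion_metrics(tn=37, fp=1, fn=15, tp=7)
```

At the threshold of .60 this gives recall and precision of 16/22 = .727 and a balanced accuracy of .785, in line with the reported values. At .70 the model makes only one false positive (precision 7/8 = .875) but finds only 7 of the 22 high QoL patients (recall .318).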

Table 3 lists the 10 most important variables of

the trained machine learning model according to the

random forest feature importance values. The vari-

ables of Global QLQ, mental health (HADS) and dis-

tress level at the baseline (Month 0) were important

psychosocial factors. Age, BMI and monthly income

were important sociodemographic and lifestyle fac-

tors.

3.1.1 User Experiment

Table 4 presents the performance values for the machine learning model and for the clinicians overall, with and without the aid of the predictions. The overall receiver operating characteristic (ROC) curves are shown in Fig 3. The AUROC of the clinicians was .755 (95% CI .664–.840) without the aid and .777 (95% CI .691–.857) with the aid. The AUROC of the machine learning model was .832 (95% CI .757-.900), which is not statistically significantly higher than the AUROC of the clinicians with or without the aid (p = .53 and p = .135).

Table 2: Confusion matrix for the trained machine learning model when the classification threshold (th) was .60 or .70. QoL = Quality of life, Pred = predicted.

th = .60    Pred low QoL   Pred high QoL
Low QoL     32             6
High QoL    6              16

th = .70    Pred low QoL   Pred high QoL
Low QoL     37             1
High QoL    15             7

Table 3: Variable importance values of the trained machine learning model.

Variable                                  Value
Global QLQ (a)                            .106
Mental health, HADS (b)                   .079
Age                                       .072
Distress level (c)                        .071
Overall quality of life, QLQ30-30 (a)     .057
Overall health, QLQ30-29 (a)              .048
Upset, PANAS 5 (d)                        .044
Monthly income                            .043
Coping with cancer, CBI (e)               .041
BMI                                       .040

The psychosocial variables were from the following questionnaires:
(a) EORTC quality of life questionnaire (QLQ-C30)
(b) Hospital Anxiety and Depression Scale (HADS)
(c) NCCN distress thermometer
(d) Positive and Negative affectivity - short form (PANAS)
(e) Cancer Behavior Inventory (self-efficacy in coping with cancer) (CBI-B)

Figure 3: The overall receiver operating characteristic

(ROC) curves for machine learning (ML) model and clin-

icians with/without the aid of machine learning prediction.

AUROC = Area under the receiver operating characteristic

curve.

The AUROC values of the individual clinicians

with or without the aid for the evaluated patient

groups (n = 20) are presented in Fig 4. The AUROC

values of the machine learning model are shown with

the dashed lines. Two clinicians (#1 and #5) with

the aid had higher AUROC than the machine learning

model had for the same 20 patient group. Four clini-

cians (#1, #3, #4, #5) with the aid had higher AUROC

than without the aid. Two clinicians (#2, #6) without

the aid had higher AUROC than with the aid.

Figure 4: Area under the receiver operating characteristic

curve (AUROC) values for the 6 clinicians with and without

the aid of machine learning prediction. The performance of

machine learning algorithm is shown with the dashed lines.

As can be seen from the results, all average performance values (AUROC, recall, precision, ACC) were better with the aid than without it. The performance of the machine learning model was also higher than that of the clinicians on every measure except recall. However, recall and precision can be traded off by changing the classification threshold of the classifier: a lower probability threshold gives higher recall and lower precision, and vice versa.
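The threshold trade-off can be illustrated with a small Python sketch. The labels and probabilities below are hypothetical toy values chosen for illustration, not study data, and the rule that a probability at or above the threshold counts as a high QoL prediction is an assumption.

```python
def recall_precision_at(labels, probs, threshold):
    """Recall and precision when probabilities >= threshold are
    classified as positive (high QoL)."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return recall, precision


# Hypothetical toy data: 1 = high QoL, 0 = low QoL.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.92, 0.81, 0.66, 0.58, 0.65, 0.55, 0.45, 0.30, 0.22, 0.10]

# Recall falls and precision rises as the threshold increases.
sweep = {th: recall_precision_at(labels, probs, th) for th in (0.5, 0.6, 0.7)}
```

In this toy example recall drops from 1.0 to 0.75 to 0.5 as the threshold rises from .5 to .7, while precision rises from about .67 to .75 to 1.0. On real data precision does not always change monotonically with the threshold, so the trade-off is a tendency rather than a guarantee.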

Table 5 presents review time for clinicians with

and without the aid. The average review time was

34.01 s (95% CI 31.49 s - 36.53 s) without the aid

and 38.63 s (95% CI 36.58 s - 40.67 s) with the aid.

The difference is statistically signiﬁcant (p < .001).

Only one clinician (#6) was faster to give the predic-

tion with the aid than without the aid.

Table 6 presents accuracy values (ACC) of the

clinicians, when machine learning model predicted

the QoL classes correctly or incorrectly. The aver-

age accuracy of the clinicians was .720 (95% CI .644-

.797) without the aid and .793 (95% CI .754-.832)

with the aid (p = .040) when the machine learning

model predicted the QoL classes correctly. The aver-

age accuracy of the clinicians was .532 (95% CI .264-

.799) without the aid and .476 (95% CI .208-.745)

with the aid (p = .363) when the machine learning

model predicted the QoL classes incorrectly. That is, the aid improved the average accuracy of the clinicians when the machine learning model’s prediction was correct, but not when it was incorrect.
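The reported averages are consistent with an unweighted mean over the six clinicians’ accuracy values listed in Table 6. The Python sketch below repeats that arithmetic. The dictionary keys are hypothetical names, and treating the average as unweighted is an inference from the numbers rather than a statement in the paper.

```python
from statistics import mean

# Per-clinician accuracy values from Table 6 (clinicians 1 to 6).
table6 = {
    "with_aid_ml_correct": [.765, .846, .722, .833, .824, .769],
    "with_aid_ml_incorrect": [.667, .429, .000, 1.000, .333, .429],
    "without_aid_ml_correct": [.647, .846, .667, .722, .824, .615],
    "without_aid_ml_incorrect": [.667, .429, .000, 1.000, .667, .429],
}

# Unweighted mean over clinicians, rounded to three decimals.
averages = {name: round(mean(values), 3) for name, values in table6.items()}
```

The wide spread of the individual values when the model was incorrect (from .000 to 1.000) reflects how few incorrectly predicted patients each clinician saw, which is also why the corresponding confidence intervals are wide.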

3.1.2 Open-ended Interview

All clinicians found the CDSS to be useful if incor-

porated into the care of breast cancer patients. Ac-

cording to clinicians, the information provided by the

CDSS would not likely affect the actual breast cancer

treatment of patients or the choice of therapies, but

rather inﬂuence the psychosocial support and other

possible interventions offered to patients. However,

there was a consensus that for the resilience predic-

tion to be valuable it must lead to an actual interven-

tion for the patient. The usefulness of the tool is there-

fore affected by the availability of interventions to im-

prove resilience. Furthermore, one clinician thought

that the prediction would be most useful and informa-

tive in cases where the predicted resilience is lower

than the clinician’s intuitive prediction. Several clin-

icians viewed that the CDSS would be most useful

if it could identify the patients with weak resilience

12 months after the end of treatment, at which point

in time a portion of patients are generally less vigor-

ous than the majority. The resilience prediction could

then be used to target speciﬁc individually planned in-

terventions and a higher level of support for this group

of patients. The optimal timing for the use of the

CDSS is thought to differ between patients, varying

from the time of planning adjuvant treatment to the

post-treatment period.

Most clinicians thought that both doctors and

nurses may be possible users of the CDSS and could

make use of resilience prediction information. How-

ever, the suitable user depends on which interventions

would follow from the prediction, as offering certain

interventions may require a referral from a doctor.

However, the likelihood and motivation of clinicians

to use the CDSS is generally believed to be signif-

icantly affected by the ease of use and convenience

of the tool. With regards to breast cancer patients,

there were conﬂicting views on whether the informa-

tion provided by the tool would be useful to be shared

with patients. While some clinicians viewed that pa-

tients learning their resilience prediction may moti-

vate and encourage them through their treatment and

rehabilitation process, some clinicians worried that a

poor predicted resilience may cause discouragement

and increase stress. Therefore, if the resilience prediction is shared with patients, careful attention must be paid to how the information is communicated.

In further development of the CDSS, one clinician highlighted the importance of incorporating more parameters concerning breast cancer treatment and possible comorbidities into the CDSS, while another hoped for more detailed information on the mental health and possible medications of the patient. Furthermore, one clinician hoped that the patient perspective, in terms of patients’ feelings towards learning their predicted resilience, would be explored further.

Table 4: The performance measurements of the area under the receiver operating characteristic curve (AUROC), recall, precision and balanced accuracy (ACC) for machine learning (ML) model and over all participants with and without the aid of machine learning prediction. Recall, precision and ACC were calculated for the clinicians by using the threshold QoL value of 75 and for machine learning model by using the threshold probability value of .60. Avg. = Average, CI = Confidence interval.

Set                          AUROC              Recall             Precision          ACC
ML, Avg. (95% CI)            .832 (.757-.900)   .727 (.587-.854)   .727 (.589-.854)   .785 (.704-.863)
With aid, Avg. (95% CI)      .777 (.686-.858)   .818 (.696-.930)   .590 (.465-.707)   .745 (.663-.820)
Without aid, Avg. (95% CI)   .755 (.656-.840)   .773 (.636-.887)   .540 (.412-.662)   .696 (.607-.778)

Table 5: Review time of the clinicians with and without the aid of the machine learning predictions. s = seconds, Avg. = Average.

Clinician   Review time with aid (s)   Review time without aid (s)
1           38.99                      29.05
2           40.78                      32.96
3           35.02                      29.32
4           38.60                      35.55
5           32.99                      27.81
6           45.38                      49.40
Avg.        38.63                      34.02

4 DISCUSSION

The aim of this study was to measure whether the ma-

chine learning prediction integrated into the CDSS af-

fects clinicians’ ability to predict the quality of life of

breast cancer patients during the treatment process.

Based on the results, the aid of machine learning pre-

diction improved the ability of clinicians to predict

patients’ quality of life. Clinicians’ performance im-

proved at a statistically signiﬁcant level in patients for

whom the machine learning model was able to predict

the correct outcome. The same result has also been

observed in a previous study (Kiani et al., 2020).

Traditional performance measurements, such as

AUROC, accuracy, and sensitivity, measure numerical

accuracy values. A deeper understanding of advan-

tages and disadvantages of CDSS requires different

measures and methods. Previous studies (Lee et al.,

2020; Jang et al., 2020) have measured, for example,

clinician’s conﬁdence in his or her own assessment

when the prediction of a machine learning model was

visible. In this study, in addition to traditional per-

formance measures, we conducted an open-ended in-

terview for the participants. The interview gathered

information for the development and use of decision

support tool. Based on the results, this kind of deci-

sion support tool was found to be useful. However, it

requires that the use of the tool would lead to real in-

terventions, which in turn requires that interventions

are available and possible to apply. This ﬁnding lim-

its the usefulness of the tool to hospitals which have

ready interventions for support of resilience in place

or the possibility to add such interventions into the

breast cancer care process.

The research setup of this experimental study simulated the use of a decision support tool. This kind of setup should be conducted after the performance validation of the machine learning algorithm but before a field study. The goal of a field study is to validate the tool in a real operating environment. In other words, the results of this study determined whether the tool needs to be further developed and what improvements are needed before a field study. Based on the results, several improvements are needed before the field study phase. First, based on the open-ended interview, the patient’s medication, treatment and other conditions should be presented at a more detailed level. More specific information may improve clinicians’ confidence in both their own assessments and the predictions provided by the CDSS. Second, clinicians’ performance improved only slightly when the prediction of the machine learning model was available. When the prediction of the machine learning model was correct, the performance of the clinicians improved at a statistically significant level. Based on this, the performance of the machine learning model should be improved for the field study phase: the number of false predictions should be minimized so that the tool is more useful in actual use. Third, the machine learning model outputs only a single prediction value, for the time point six months after the baseline. The CDSS could be more useful if more endpoints (e.g., 6, 9 and 12 months) were predicted and/or timeline-type QoL trajectories were available. Furthermore, from the point of view of the interviewed clinicians, the resilience prediction would be most useful for the time point 12 months after the end of treatment, as more variance in patients’ resilience is often observed at this point in time.

Table 6: Accuracy of individual participants when machine learning algorithm predicted correctly or incorrectly QoL (quality of life) class for the patients. Avg. = Average, CI = Confidence interval.

                With aid                                    Without aid
Clinician       Correct prediction   Incorrect prediction   Correct prediction   Incorrect prediction
1               .765                 .667                   .647                 .667
2               .846                 .429                   .846                 .429
3               .722                 .000                   .667                 .000
4               .833                 1.000                  .722                 1.000
5               .824                 .333                   .824                 .667
6               .769                 .429                   .615                 .429
Avg. (95% CI)   .793 (.754-.832)     .476 (.208-.745)       .720 (.644-.797)     .532 (.264-.799)

As a follow-up study, the effect of the clinician’s experience and the test environment on performance should also be investigated. Previous research (Cai et al., 2019) has shown that the aid of machine learning benefits inexperienced clinicians more. All participants in this study were experienced clinicians, so the benefits of the machine learning predictions could be higher with inexperienced clinicians. The test environment of this study did not correspond to a real clinical environment: there were no unrelated distractions or other examinations requiring the attention of clinicians. In a noisy real clinical environment, the benefit of the machine learning predictions may be higher, which should be studied.

5 CONCLUSIONS

Based on this study, the machine learning model integrated into the CDSS improved clinicians' performance in predicting patients' quality of life six months after the baseline. Performance improved especially in the cases where the machine learning model correctly predicted the patient's QoL value. It should be noted, however, that based on the open-ended interviews, this kind of tool is considered useful only when resilience-strengthening interventions can be implemented for the patients identified as having low predicted resilience.

REFERENCES

Aaronson, N. K., Ahmedzai, S., Bergman, B., Bullinger, M., Cull, A., Duez, N. J., Filiberti, A., Flechtner, H., Fleishman, S. B., and de Haes, J. C. (1993). The European Organization for Research and Treatment of Cancer QLQ-C30: a quality-of-life instrument for use in international clinical trials in oncology. J Natl Cancer Inst, 85(5):365–376.

Bonanno, G. A., Galea, S., Bucciarelli, A., and Vlahov, D. (2007). What predicts psychological resilience after disaster? The role of demographics, resources, and life stress. J Consult Clin Psychol, 75(5):671–682.

Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R. L., Torre, L. A., and Jemal, A. (2018). Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin, 68(6):394–424.

Cai, S. L., Li, B., Tan, W. M., Niu, X. J., Yu, H. H., Yao, L. Q., Zhou, P. H., Yan, B., and Zhong, Y. S. (2019). Using a deep learning system in endoscopy for screening of early esophageal squamous cell carcinoma (with video). Gastrointest Endosc, 90(5):745–753.

Clarke, V. and Braun, V. (2014). Thematic Analysis, volume 3, pages 1947–1952.

Deshields, T. L., Heiland, M. F., Kracen, A. C., and Dua, P. (2016). Resilience in adults with cancer: development of a conceptual model. Psychooncology, 25(1):11–18.

Ginestra, J. C., Giannini, H. M., Schweickert, W. D., Meadows, L., Lynch, M. J., Pavan, K., Chivers, C. J., Draugelis, M., Donnelly, P. J., Fuchs, B. D., and Umscheid, C. A. (2019). Clinician Perception of a Machine Learning-Based Early Warning System Designed to Predict Severe Sepsis and Septic Shock. Crit Care Med, 47(11):1477–1484.

Goebel, S. and Mehdorn, H. M. (2011). Measurement of psychological distress in patients with intracranial tumours: the NCCN distress thermometer. J Neurooncol, 104(1):357–364.

ITU-R (2012). ITU-R Rec. BT.500-13, Methodology for the subjective assessment of the quality of television pictures. Report A 70000, ITU Radiocommunication Sector.

Jang, S., Song, H., Shin, Y. J., Kim, J., Kim, J., Lee, K. W., Lee, S. S., Lee, W., Lee, S., and Lee, K. H. (2020). Deep Learning-based Automatic Detection Algorithm for Reducing Overlooked Lung Cancers on Chest Radiographs. Radiology, 296(3):652–661.

Kiani, A., Uyumazturk, B., Rajpurkar, P., Wang, A., Gao, R., Jones, E., Yu, Y., Langlotz, C. P., Ball, R. L., Montine, T. J., Martin, B. A., Berry, G. J., Ozawa, M. G., Hazard, F. K., Brown, R. A., Chen, S. B., Wood, M., Allard, L. S., Ylagan, L., Ng, A. Y., and Shen, J. (2020). Impact of a deep learning assistant on the histopathologic classification of liver cancer. NPJ Digit Med, 3:23.


Lee, J. H., Ha, E. J., Kim, D., Jung, Y. J., Heo, S., Jang, Y. H., An, S. H., and Lee, K. (2020). Application of deep learning to the diagnosis of cervical lymph node metastasis from thyroid cancer with CT: external validation and clinical utility for resident training. Eur Radiol, 30(6):3066–3072.

Molina, Y., Yi, J. C., Martinez-Gutierrez, J., Reding, K. W., Yi-Frazier, J. P., and Rosenberg, A. R. (2014). Resilience among patients across the cancer continuum: diverse perspectives. Clin J Oncol Nurs, 18(1):93–101.

Nielsen, P. S., Lindebjerg, J., Rasmussen, J., Starklint, H., Waldstrøm, M., and Nielsen, B. (2010). Virtual microscopy: an evaluation of its validity and diagnostic performance in routine histologic diagnosis of skin tumors. Hum Pathol, 41(12):1770–1776.

Pantanowitz, L., Sinard, J. H., Henricks, W. H., Fatheree, L. A., Carter, A. B., Contis, L., Beckwith, B. A., Evans, A. J., Lal, A., and Parwani, A. V. (2013). Validating whole slide imaging for diagnostic purposes in pathology: guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch Pathol Lab Med, 137(12):1710–1722.

Rutter, M. (2006). Implications of resilience concepts for scientific understanding. Ann N Y Acad Sci, 1094:1–12.

Seabold, S. and Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. Proceedings of the 9th Python in Science Conference, 2010.

Sutton, R. T., Pincock, D., Baumgart, D. C., Sadowski, D. C., Fedorak, R. N., and Kroeker, K. I. (2020). An overview of clinical decision support systems: benefits, risks, and strategies for success. NPJ Digit Med, 3:17.

Vasey, B., Clifton, D. A., Collins, G. S., Denniston, A. K., Faes, L., Geerts, B. F., Liu, X., Morgan, L., Watkinson, P., and McCulloch, P. (2021). DECIDE-AI: new reporting guidelines to bridge the development-to-implementation gap in clinical artificial intelligence. Nat Med, 27(2):186–187.

HEALTHINF 2022 - 15th International Conference on Health Informatics
