Classification Model for Cerebral Aneurysm Rupture Prediction using Medical and Blood-flow-simulation Data

Masaaki Suzuki, Toshiyuki Haruhara, Hiroyuki Takao, Takashi Suzuki, Soichiro Fujimura, Toshihiro Ishibashi, Makoto Yamamoto, Yuichi Murayama and Hayato Ohwada

Department of Industrial Administration, Tokyo University of Science, Chiba, Japan

Department of Neurosurgery, Jikei University School of Medicine, Tokyo, Japan

Department of Innovation for Medical Information Technology, Jikei University School of Medicine, Tokyo, Japan

Department of Mechanical Engineering, Tokyo University of Science, Tokyo, Japan

s.fujimura5016@gmail.com, t-ishibashi@jikei.ac.jp, yamamoto@rs.kagu.tus.ac.jp, ymurayama@jikei.ac.jp,

ohwada@rs.tus.ac.jp

Keywords: Artificial Intelligence, Machine Learning, Medical Data, Simulation Data, Computational Fluid Dynamics, Stroke, Subarachnoid Hemorrhage, Cerebral Aneurysm.

Abstract:

Stroke is a serious cerebrovascular condition in which brain cells die because of an abrupt blockage of the arteries supplying blood and oxygen, or because of bleeding into the brain tissue when a blood vessel ruptures. Because stroke occurs suddenly in most people, prevention is often difficult. In Japan, stroke is one of the major causes of death and is associated with high medical costs, especially in the aging population. Stroke prediction and treatment are therefore important. Stroke can be avoided by preventive treatment based on the risk of onset. However, because judgment of the onset risk depends largely on the individual experience and skill of the doctor, this study focuses on a highly accurate prediction method that is independent of that experience and skill. The prediction target is subarachnoid hemorrhage, which is a type of stroke. Logistic regression and a support vector machine were trained on combined medical data and cerebral blood-flow-simulation data to predict cerebral aneurysm rupture, using 338 cerebral aneurysm samples (35 ruptured, 303 unruptured). The SMOTE algorithm was used to address the class imbalance, and the SelectKBest algorithm was used to extract important features from the 70 features obtained from both data sources. Of the 27 important features extracted, 40% came from the medical data and the remaining 60% from the blood-flow-simulation data. With logistic regression as the classification model, we obtained a sensitivity of 0.64 and a specificity of 0.85. The results indicate that cerebral aneurysm rupture can potentially be predicted with high accuracy by machine learning using engineering information obtained from mechanical simulation.

1 INTRODUCTION

Stroke, a generic term for cerebral infarction, cerebral hemorrhage, and subarachnoid hemorrhage, is a serious cerebrovascular condition in which brain cells die because of an abrupt blockage of the arteries that supply blood and oxygen to the brain, or because of bleeding into the brain tissue when a blood vessel bursts. For many people, stroke occurs suddenly and without warning, so it can be difficult to prevent. In Japan, stroke is one of the leading causes of death. In 2017, stroke became the country’s third leading cause of death due to illness and the number one cause of being bedridden. Prediction and cure of the condition are important issues. Reducing stroke incidence requires preventive treatment that addresses the risk of onset; at present, however, risk judgment depends largely on the individual experience and skill of the doctor. Therefore, a method for predicting the onset of stroke that is highly accurate and independent of the doctor’s experience and skill is required.

Existing stroke-prediction models (Manolio et al., 1996; Lumley et al., 2002) adopted features that were clinically verified or manually selected by medical experts. Wang et al. (2003), Hitman et al. (2007), and Letham et al. (2015) used medical history data as input features, while Amini et al. (2013) applied K-nearest neighbor and C4.5 decision tree methods to medical history data for stroke prediction. Moreover, some studies have started employing vascular imaging for disease prediction; for example, Nogueira et al. (2016) employed vascular imaging to predict clinical outcomes and investigated the risk of symptomatic intracerebral hemorrhage in patients who underwent intravenous thrombolytic treatment. Bentley et al. (2014) used computerized tomography brain images as inputs to a support vector machine (SVM) algorithm to predict stroke.

Suzuki, M., Haruhara, T., Takao, H., Suzuki, T., Fujimura, S., Ishibashi, T., Yamamoto, M., Murayama, Y. and Ohwada, H.
Classification Model for Cerebral Aneurysm Rupture Prediction using Medical and Blood-flow-simulation Data.
DOI: 10.5220/0007691708950899
In Proceedings of the 11th International Conference on Agents and Artificial Intelligence (ICAART 2019), pages 895-899
ISBN: 978-989-758-350-6

Several other reports indicate that the state of cerebral blood flow, in addition to medical information, is closely involved in stroke onset (Chung and Cebral, 2015). Morino et al. (2010) used particle image velocimetry (PIV) and laser Doppler velocimetry (LDV) to measure intra-aneurysmal velocity profiles in ruptured and unruptured aneurysms. Xiang et al. (2014) examined how the inlet waveform affects the predicted hemodynamics in patient-specific aneurysm geometries. Shojima et al. (2004), Qian et al. (2011), and Takao et al. (2012) reported the importance of wall shear stress (WSS), energy loss (EL), and pressure loss coefficient (PLC), respectively, in predicting cerebral aneurysm rupture.

Among these studies, very few have combined data from different technological sources to predict stroke onset. This study therefore combines medical data with blood-flow data obtained by computational fluid dynamics (CFD) simulations in a single classification model for enhanced prediction. The aim is to develop a highly precise stroke-onset prediction method by machine learning that integrates engineering information obtained by mechanical simulation with medical information. Specifically, a classifier that predicts whether a cerebral aneurysm, which can cause subarachnoid hemorrhage, will rupture was constructed via machine learning using medical data and CFD simulation data of cerebral blood flow as inputs. Factors that govern cerebral aneurysm rupture were also extracted.

The rest of the paper is organized as follows. In Section 2, we describe the data required to build the proposed classification model along with the process of training the classifier. Section 3 presents the results of model building and discusses the numerical experiments. We conclude the paper in Section 4.

2 METHODS

2.1 Dataset

From the total of 6,470 cases registered in the Jikei University database, we first extracted cases for each aneurysm location. For unruptured cases, we extracted those that were under observation and had not been treated in the past; for ruptured cases, we extracted those that ruptured during follow-up. In addition, we used morphology to restrict the cases to those in which the length, width, and neck of the bulge were each less than 10 mm and at least one of these measurements was greater than 3 mm. Furthermore, we restricted the unruptured cases to those with a follow-up period of over 2 years, and analyzed all consecutive cases that could be analyzed. In the end, 338 cases fell within the scope of this research.

The medical data and blood-flow-simulation data were collected from these 338 cases, of which 303 were unruptured and 35 were ruptured aneurysm samples.

2.1.1 Medical Data

Two categories of patient medical data were used in the study. The first category is clinical information: age; gender; aneurysm location; history of subarachnoid hemorrhage (SAH); smoking; diabetes mellitus (DM); hypertension (HT); hyperlipidemia; alcohol consumption (Alcohol); polycystic kidneys (PK); cerebral hemorrhage (CH); hormone replacement (HR); and family history of SAH (FH SAH), of unruptured aneurysm (FH Unruptured Aneurysm), and of PK (FH PK). The second category is morphological information on the cerebral aneurysm: maximum aneurysm height, maximum neck diameter, neck area, volume, aspect ratio, side-wall or bifurcation type, and presence or absence of a bleb. A total of 17 features were collected from the patients’ medical data.

2.1.2 Blood-flow-simulation Data

Hemodynamic data were obtained through CFD simulation of the cerebral blood flow. CFD is a branch of fluid mechanics that employs numerical analysis to solve problems involving fluid flow. The simulation yielded physical blood-flow characteristics such as PLC, EL, energy loss per unit volume (ELV), maximum WSS, average WSS, minimum WSS, and oscillatory shear index (OSI); the maximum, minimum, amplitude, and average of these quantities were used in the study. Among these characteristics, PLC, EL, and WSS have been reported as helpful in predicting whether a cerebral aneurysm will rupture (Shojima et al., 2004; Qian et al., 2011; Takao et al., 2012). A total of 53 features were collected from the blood-flow-simulation data.

Footnote to Section 2.1: The follow-up period is defined as the time between the initial consultation and the final consultation.

The calculation conditions are summarized as follows. A prototype CFD solver (Siemens Healthcare GmbH, Forchheim, Germany; “Not to be used for Diagnosis and/or Therapy”), which uses the lattice Boltzmann method, was employed. For the physical properties of blood, fixed density and viscosity values were set, and non-Newtonian behavior was disregarded. The flow field was assumed to be laminar; two pulses were calculated under pulsatile conditions, and only the results from the second pulse were used. The outlet boundary condition was set to an average static pressure of 0 Pa, and the calculations were performed on a structured computational grid with a maximum size of 0.1 mm. For further details, see previous works (Qian et al., 2011; Takao et al., 2012).
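Feature names such as "Temporal max. spatial min. WSS" in Section 3 suggest that each hemodynamic quantity is first reduced over the aneurysm (spatially) and then over the pulse (temporally). The following minimal sketch illustrates one way such summary features could be derived from a simulated field. The array layout, the function name, and the definition of amplitude as maximum minus minimum are assumptions made for illustration, not details given in the paper.

```python
def temporal_spatial_features(field):
    """Summarise a simulated quantity.

    field[t][i] is the value at time step t and wall node i
    (hypothetical layout; the paper does not specify one).
    """
    # Spatial reduction at every time step of the pulse.
    spatial_min = [min(step) for step in field]
    spatial_ave = [sum(step) / len(step) for step in field]

    features = {}
    for name, series in (("spatial min.", spatial_min),
                         ("spatial ave.", spatial_ave)):
        # Temporal reduction over the pulse.
        features[f"Temporal max. {name}"] = max(series)
        features[f"Temporal min. {name}"] = min(series)
        features[f"Temporal ave. {name}"] = sum(series) / len(series)
        # Assumption: amplitude is taken as max minus min.
        features[f"Temporal amplitude {name}"] = max(series) - min(series)
    return features


# Toy values for three time steps and three nodes (not data from the study).
wss = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.5, 2.5, 3.5]]
feats = temporal_spatial_features(wss)
```

With the toy input, the spatial minimum series is 1.0, 2.0, 1.5, so its temporal maximum is 2.0 and its temporal amplitude is 1.0.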

2.2 Classification Model for Cerebral Aneurysm Rupture Prediction

2.2.1 Oversampling of Minority Sample

The number of patients with ruptured aneurysms is far smaller than the number with unruptured aneurysms; that is, a classical class-imbalance problem exists. The synthetic minority oversampling technique (SMOTE) (Chawla et al., 2002) was therefore used to generate synthetic instances for the classification model. SMOTE, a widely used approach in various fields, oversamples the minority class by taking each minority instance and introducing synthetic instances along the line segments joining it to any or all of its K nearest neighbors in the minority class. A synthetic instance for the instance under consideration (the base instance) is generated by taking the difference between the feature vector of one of these nearest neighbors and that of the base instance, multiplying this difference by a random number between 0 and 1, and adding the product to the feature vector of the base instance. SMOTE was applied in this research to increase the number of ruptured-aneurysm samples.
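The interpolation step described above can be sketched in a few lines of plain Python. This is an illustrative reimplementation, not the code used in the study; the function name, the parameters `n_per_instance`, `k`, and `seed`, and the toy data are assumptions.

```python
import math
import random


def smote(minority, n_per_instance=1, k=5, seed=0):
    """Create synthetic minority samples by linear interpolation.

    minority: list of equal-length feature vectors (lists of floats).
    Returns n_per_instance synthetic vectors for every base instance.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, base in enumerate(minority):
        # K nearest minority-class neighbours of the base instance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        for _ in range(n_per_instance):
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # random number in [0, 1)
            # base + gap * (neighbour - base): a point on the line segment
            # joining the base instance and the chosen neighbour.
            synthetic.append(
                [b + gap * (n - b) for b, n in zip(base, neighbour)]
            )
    return synthetic


# Toy minority class with two features (not data from the study).
ruptured = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(ruptured, n_per_instance=2, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it always lies within the range already spanned by the minority class.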

2.2.2 Feature Selection

This study employed the SelectKBest algorithm to select useful features from the 70 combined medical and blood-flow-simulation features to serve as input for the classification model. SelectKBest uses a scoring function (in this case f_classif, although others could be used) to score the features and then removes all but the K highest-scoring ones.
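In scikit-learn, `f_classif` scores each feature with the one-way ANOVA F-statistic between classes. The sketch below reimplements that scoring and the top-K selection in plain Python to show the idea; it is not the library code, and the helper names and toy data are assumptions.

```python
def anova_f(values, labels):
    """One-way ANOVA F-statistic of one feature across the classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n = len(values)
    grand_mean = sum(values) / n
    # Variation of the class means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the samples around their own class mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    df_between = len(groups) - 1
    df_within = n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def select_k_best(X, y, k):
    """Return the indices of the k highest-scoring columns and all scores."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Toy data: 6 samples, 3 features, 2 classes (not data from the study).
X = [[1.0, 5.0, 2.0], [1.2, 3.0, 2.5], [0.9, 4.0, 1.5],
     [3.0, 4.5, 2.6], [3.1, 3.5, 3.0], [2.9, 4.2, 2.2]]
y = [0, 0, 0, 1, 1, 1]
selected, scores = select_k_best(X, y, k=2)
```

In the toy data, the first feature separates the two classes clearly and the second barely at all, so the first and third columns are kept for K = 2.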

2.2.3 Building a Classifier

The data samples were divided into training, verification, and test data while maintaining the ratio between the numbers of ruptured and unruptured samples. The ratio of training, verification, and test data was set to 4:3:3. The training data were used to optimize the hyperparameters of the classification model using a grid search with stratified five-fold cross-validation. The verification data were used to determine the optimum number and set of features: for K = 1, ..., 70 (= 17 + 53), features were selected by the SelectKBest algorithm, their performance on the verification data was evaluated, and the optimum number and set of features were determined. The test data were used to evaluate the final classification performance. Logistic regression and SVM were used as classifiers, and the performance of the two methods was subsequently compared.
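A stratified 4:3:3 split of the kind described above can be sketched as follows. The function name, the rounding rule, and the random seed are assumptions; the paper does not state how the split was implemented, and the grid search itself is not shown here.

```python
import random


def stratified_split(labels, ratios=(4, 3, 3), seed=0):
    """Split sample indices into parts with the given ratios, class by class,
    so that every part keeps roughly the original class proportions."""
    rng = random.Random(seed)
    parts = [[] for _ in ratios]
    total = sum(ratios)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        start, cumulative = 0, 0
        for part, ratio in zip(parts, ratios):
            cumulative += ratio
            # Cut point for this part (assumed rounding rule).
            end = round(len(idx) * cumulative / total)
            part.extend(idx[start:end])
            start = end
    return parts


# Class counts from the paper: 35 ruptured (1) and 303 unruptured (0).
labels = [1] * 35 + [0] * 303
train, verification, test = stratified_split(labels)
```

With this rounding rule the test part contains 102 samples, 11 of them ruptured, which agrees with N=102 in the confusion matrices of Section 3.2; other rounding rules would give slightly different part sizes.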

3 RESULTS AND DISCUSSION

3.1 Feature Selection

The results of the feature selection are shown in Tables 1 and 2. Approximately 40% of the features extracted as important were medical data that doctors have so far used as a basis for diagnosis, such as age and the size of the cerebral aneurysm. The remaining 60% were cerebral blood-flow CFD simulation data such as WSS and PLC.

Table 1: Features selected: logistic regression.

Data Type                    Description
Clinical information         Age, Aneurysm location, Multi/Single aneurysm
Morphological information    Max. height, Volume, Aspect ratio, Bleb
Hemodynamic information      Temporal max. spatial min. WSS, Temporal ave. spatial min. WSS, Temporal max. LSA, LSI, SCI, Temporal max. & ave. SCI, Temporal min. & ave. PLC
Total number of features


Table 2: Features selected: SVM.

Data Type                    Description
Clinical information         Age, Aneurysm location, Multi/Single aneurysm, HT, HL
Morphological information    Side/Bifurcation, Max. height, Volume, Aspect ratio, Bleb
Hemodynamic information      Temporal max. spatial ave. WSS, Temporal max. spatial min. WSS, Temporal min. spatial ave. WSS, Temporal min. spatial min. WSS, Temporal max. LSA, LSI, SCI, Temporal ave. spatial ave. WSS, Temporal ave. spatial min. WSS, Temporal ave. spatial min. WSS, Temporal ave. PLC, SCI, Temporal min. LSI, PLC
Total number of features

3.2 Cerebral Aneurysm Rupture Prediction

The measures used to evaluate the classification model were sensitivity, specificity, and F-measure.

Sensitivity, computed by Eq. (1), is the fraction of ruptured samples that were correctly predicted as ruptured.

\[
\mathrm{Sensitivity} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalseNegative}} \tag{1}
\]

Specificity, computed by Eq. (2), is the fraction of unruptured samples that were correctly predicted as unruptured.

\[
\mathrm{Specificity} = \frac{\mathrm{TrueNegative}}{\mathrm{TrueNegative} + \mathrm{FalsePositive}} \tag{2}
\]

F-measure, computed by Eq. (3), is the harmonic mean of precision and sensitivity.

\[
\text{F-measure} = \frac{2\,\mathrm{TruePositive}}{2\,\mathrm{TruePositive} + \mathrm{FalsePositive} + \mathrm{FalseNegative}} \tag{3}
\]

Tables 3 and 4 show the confusion matrices obtained by logistic regression and SVM, respectively. Table 5 summarizes the performance measures resulting from the test data classification by the two models.

Comparing the two classifiers, logistic regression gave a slightly lower specificity than SVM (0.846 vs. 0.912) but a considerably higher sensitivity (0.636 vs. 0.455). In other words, logistic regression gave the more balanced classification.

Table 3: Confusion matrix: logistic regression (N=102).

                             Actual class
                             Rupture    Unrupture
Predicted class  Rupture     7          14
                 Unrupture   4          77

Table 4: Confusion matrix: SVM (N=102).

                             Actual class
                             Rupture    Unrupture
Predicted class  Rupture     5          8
                 Unrupture   6          83

Table 5: Performance measures resulting from test data classification by the two models.

              Logistic regression    SVM
Sensitivity   0.636                  0.455
Specificity   0.846                  0.912
F-measure     0.437                  0.417
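As a worked check, the measures in Table 5 can be recomputed from the confusion matrices in Tables 3 and 4 using Eqs. (1) to (3). The function and variable names below are chosen for illustration only.

```python
def metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and F-measure as in Eqs. (1)-(3)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f_measure = 2 * tp / (2 * tp + fp + fn)
    return sensitivity, specificity, f_measure


# Counts read from Table 3 (logistic regression) and Table 4 (SVM),
# with "rupture" as the positive class.
logreg = metrics(tp=7, fp=14, fn=4, tn=77)
svm = metrics(tp=5, fp=8, fn=6, tn=83)

for name, (sens, spec, f1) in (("logistic regression", logreg), ("SVM", svm)):
    print(f"{name}: sensitivity={sens:.4f} specificity={spec:.4f} F={f1:.4f}")
```

For logistic regression this gives 7/11 for sensitivity, 77/91 for specificity, and 14/32 = 0.4375 for the F-measure, which Table 5 reports as 0.437.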

4 CONCLUSIONS

A classifier constructed by machine learning using combined medical and cerebral blood-flow-simulation data was used to predict cerebral aneurysm rupture in a total of 338 cerebral aneurysm samples (35 ruptured, 303 unruptured). The SMOTE algorithm was used to address the class imbalance, and the SelectKBest algorithm was applied to the 70 features, resulting in the extraction of 27 important features. Among the features extracted, 40% came from the medical data and 60% from the blood-flow-simulation data. With logistic regression as the classification model, we obtained a sensitivity of 0.64 and a specificity of 0.85. The results show the possibility of highly accurate prediction of cerebral aneurysm rupture by machine learning using engineering information obtained from simulations.

Thus, this study developed a classification model for stroke-onset prediction with data from three different sources. Although the number of cases used in the analysis was limited, the model could serve as a reference for future research and for doctors, who could issue objective diagnoses by considering various data sources to help patients receive preventive treatment. Stroke detection and prevention could help more people and save medical resources, especially in an aging society like Japan.


ACKNOWLEDGEMENTS

This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO). CFD calculations were performed in collaboration with Siemens Healthcare under a collaboration agreement with the Jikei University.

REFERENCES

Amini, L., Azarpazhouh, R., Farzadfar, M., Mousavi, S., Jazaieri, F., Khorvash, F., Norouzi, R., and Toghianfar, N. (2013). Prediction and control of stroke by data mining. Int J Prev Med., 4(Suppl 2):S245–249.

Bentley, P., Ganesalingam, J., Jones, A., Mahady, K., Epton, S., Rinne, P., Sharma, P., Halse, O., Mehta, A., and Rueckert, D. (2014). Prediction of stroke thrombolysis outcome using CT brain machine learning. Neuroimage Clin., 30(4):635–640.

Chawla, N., Bowyer, K., Hall, L., and Kegelmeyer, W. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357.

Chung, B. and Cebral, J. (2015). CFD for evaluation and treatment planning of aneurysms: review of proposed clinical uses and their challenges. Ann Biomed Eng., 43(1):122–138.

Hitman, G., Colhoun, H., Newman, C., Szarek, M., Betteridge, D., Durrington, P., Fuller, J., Livingstone, S., Neil, H., and CARDS Investigators (2007). Stroke prediction and stroke prevention with atorvastatin in the Collaborative Atorvastatin Diabetes Study (CARDS). Diabet Med., 24(12):1313–1321.

Letham, B., Rudin, C., McCormick, T., and Madigan, D. (2015). Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Annals of Applied Statistics, 9(3):1350–1371.

Lumley, T., Kronmal, R., Cushman, M., Manolio, T., and Goldstein, S. (2002). A stroke prediction score in the elderly: validation and web-based application. J Clin Epidemiol, 55(2):129–136.

Manolio, T., Kronmal, R., Burke, G., O’Leary, D., and Price, T. (1996). Short-term predictors of incident stroke in older adults. Stroke, 27(9):1479–1486.

Morino, T., Tanoue, T., Tateshima, S., Vinuela, F., and Tanishita, K. (2010). Intra-aneurysmal blood flow based on patient-specific CT angiogram. Experiments in Fluids, 49(2):485–496.

Nogueira, R., Bor-Seng-Shu, E., Saeed, N., Teixeira, M., Panerai, R., and Robinson, T. (2016). Meta-analysis of vascular imaging features to predict outcome following intravenous rtPA for acute ischemic stroke. Frontiers in Neurology, 7(77):1–8.

Qian, Y., Takao, H., Umezu, M., and Murayama, Y. (2011). Risk analysis of unruptured aneurysms using computational fluid dynamics technology: preliminary results. AJNR Am J Neuroradiol., 32(10):1948–1955.

Shojima, M., Oshima, M., Takagi, K., Torii, R., Hayakawa, M., Katada, K., Morita, A., and Kirino, T. (2004). Magnitude and role of wall shear stress on cerebral aneurysm: computational fluid dynamic study of 20 middle cerebral artery aneurysms. Stroke, 35(11):2500–2505.

Takao, H., Murayama, Y., Abe, T., Ishibashi, T., Yuki, I., Otsuka, S., Suzuki, T., Masuda, S., Mohamed, A., Sen, I., Yamamoto, M., and Abe, T. (2012). CFD reveals hemodynamic differences between unruptured and ruptured intracranial aneurysms during observation. Stroke, 43(2):A2731.

Wang, T., Massaro, J., Levy, D., Vasan, R., Wolf, P., D’Agostino, R., Larson, M., Kannel, W., and Benjamin, E. (2003). A risk score for predicting stroke or death in individuals with new-onset atrial fibrillation in the community: the Framingham Heart Study. JAMA, 290(8):1049–1056.

Xiang, J., Siddiqui, A., and Meng, H. (2014). The effect of inlet waveforms on computational hemodynamics of patient-specific intracranial aneurysms. J Biomech., 47(16):3882–3890.
