Classification Model for Cerebral Aneurysm Rupture Prediction using
Medical and Blood-flow-simulation Data
Masaaki Suzuki¹, Toshiyuki Haruhara¹, Hiroyuki Takao²,³,⁴, Takashi Suzuki³, Soichiro Fujimura³,⁴,
Toshihiro Ishibashi², Makoto Yamamoto⁴, Yuichi Murayama² and Hayato Ohwada¹
¹ Department of Industrial Administration, Tokyo University of Science, Chiba, Japan
² Department of Neurosurgery, Jikei University School of Medicine, Tokyo, Japan
³ Department of Innovation for Medical Information Technology, Jikei University School of Medicine, Tokyo, Japan
⁴ Department of Mechanical Engineering, Tokyo University of Science, Tokyo, Japan
s.fujimura5016@gmail.com, t-ishibashi@jikei.ac.jp, yamamoto@rs.kagu.tus.ac.jp, ymurayama@jikei.ac.jp,
ohwada@rs.tus.ac.jp
Keywords:
Artificial Intelligence, Machine Learning, Medical Data, Simulation Data, Computational Fluid Dynamics,
Stroke, Subarachnoid Hemorrhage, Cerebral Aneurysm.
Abstract:
Stroke is a serious cerebrovascular condition in which brain cells die because of an abrupt blockage of the arteries supplying blood and oxygen, or because of bleeding into the brain tissue when a blood vessel bursts or ruptures. Because stroke occurs suddenly in most people, prevention is often difficult. In Japan, stroke is one of the major causes of death and is associated with high medical costs, especially in the country's aging population. Stroke prediction and treatment are therefore important. Stroke can be prevented by preventive treatment based on the risk of onset. However, since judgment of the onset risk largely depends on the individual experience and skill of the doctor, this study focuses on a highly accurate prediction method that is independent of the doctor's experience and skill. The prediction target of this research is subarachnoid hemorrhage, a type of stroke. Logistic regression and support vector machine (SVM) classifiers that predict cerebral aneurysm rupture from combined medical data and cerebral blood-flow-simulation data were applied to 338 cerebral aneurysm samples (35 ruptured, 303 unruptured). The SMOTE algorithm was used to address the class imbalance, and the SelectKBest algorithm was used to extract important features from the 70 features obtained from both data sources. Of the 27 important features extracted, 40% belonged to the medical data and the remaining 60% to the blood-flow-simulation data. Using logistic regression as the classification model, we obtained a sensitivity of 0.64 and a specificity of 0.85. These results suggest that cerebral aneurysm rupture can potentially be predicted with high accuracy by machine learning using engineering information obtained from mechanical simulation.
1 INTRODUCTION
Stroke (a generic term for cerebral infarction, cerebral hemorrhage, and subarachnoid hemorrhage) is a serious cerebrovascular condition in which brain cells die because of an abrupt blockage of the arteries that supply blood and oxygen to the brain, or because of bleeding into the brain tissue when a blood vessel bursts. For many people, stroke occurs suddenly and without warning, which makes it difficult to prevent. In Japan, stroke is one of the leading causes of death: in 2017, it was the country's third leading cause of death due to illness and the leading cause of patients becoming bedridden. The prediction and treatment of stroke are therefore important issues. Reducing stroke incidence requires preventive treatment based on the risk of onset; at present, however, risk judgment largely depends on the individual experience and skill of the doctor. A highly accurate method for predicting stroke onset that is independent of the doctor's experience and skill is therefore required.
Existing stroke-prediction models (Manolio et al., 1996), (Lumley et al., 2002) adopted features that were clinically verified or manually selected by medical experts. (Wang et al., 2003), (Hitman et al., 2007), and (Letham et al., 2015) used medical history data as input features, while (Amini et al., 2013) applied the K-nearest neighbor and C4.5 decision tree methods to medical history data for stroke prediction. Some studies have also begun to employ vascular imaging for disease prediction. For example, (Nogueira et al., 2016) employed vascular imaging to predict clinical outcomes and investigated the risk of symptomatic intracerebral hemorrhage in patients who underwent intravenous thrombolytic treatment, whereas (Bentley et al., 2014) fed computed tomography (CT) brain images into a support vector machine (SVM) to predict stroke thrombolysis outcome.
Suzuki, M., Haruhara, T., Takao, H., Suzuki, T., Fujimura, S., Ishibashi, T., Yamamoto, M., Murayama, Y. and Ohwada, H.
Classification Model for Cerebral Aneurysm Rupture Prediction using Medical and Blood-flow-simulation Data.
DOI: 10.5220/0007691708950899
In Proceedings of the 11th International Conference on Agents and Artificial Intelligence (ICAART 2019), pages 895-899
ISBN: 978-989-758-350-6
Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
Several other studies have reported that, in addition to medical information, the state of cerebral blood flow is closely involved in stroke onset (Chung and Cebral, 2015). (Morino et al., 2010) used particle image velocimetry (PIV) and laser Doppler velocimetry (LDV) to measure the velocity profiles of intra-aneurysmal hemodynamics in ruptured and unruptured aneurysms. (Xiang et al., 2014) examined how the inlet waveform affects the predicted hemodynamics in patient-specific aneurysm geometries. (Shojima et al., 2004), (Qian et al., 2011), and (Takao et al., 2012) reported the importance of wall shear stress (WSS), energy loss (EL), and pressure loss coefficient (PLC), respectively, in predicting cerebral aneurysm rupture.
However, very few of these studies combined data from different technological sources to predict stroke onset. This study therefore combines medical data with blood-flow data obtained from computational fluid dynamics (CFD) simulations in a single classification model, with the aim of developing a highly accurate machine-learning method for stroke-onset prediction that integrates engineering information obtained by mechanical simulation with medical information. Specifically, we constructed a machine-learning classifier that predicts whether a cerebral aneurysm will rupture and cause subarachnoid hemorrhage, using medical data and CFD simulation data of cerebral blood flow as inputs. We also extracted the factors that govern cerebral aneurysm rupture.
The rest of the paper is organized as follows. Section 2 describes the data used to build the proposed classification model and the procedure for training the classifier. Section 3 presents the results of model building and discusses the numerical experiments. Section 4 concludes the paper.
2 METHODS
2.1 Dataset
A total of 6,470 cases were registered in the Jikei University database. We first extracted cases for each aneurysm location. Among unruptured cases, we then kept those that were under observation and had not been treated in the past; among ruptured cases, we kept those that ruptured during follow-up visits. In addition, we used morphology to restrict the cases to those in which the length, width, and neck of the aneurysm were each less than 10 mm and at least one of these measurements was greater than 3 mm. Furthermore, we restricted the unruptured cases to those with a follow-up period¹ of more than 2 years, and analyzed all consecutive cases that met these criteria. In the end, 338 cases were within the scope of this research.
Medical data and blood-flow-simulation data were collected from these 338 cases, of which 303 were unruptured and 35 were ruptured aneurysm samples.
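The morphological and follow-up criteria above can be written as a simple filter. The sketch below is a minimal illustration only: the `Case` record and its field names are hypothetical (the paper does not describe the database schema), the location-based extraction step is omitted, and the thresholds are taken directly from the text (each dimension below 10 mm, at least one above 3 mm, follow-up longer than 2 years for unruptured cases).

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical record layout; field names are illustrative assumptions.
    ruptured: bool
    under_observation: bool          # relevant for unruptured cases
    treated_before: bool             # relevant for unruptured cases
    ruptured_during_follow_up: bool  # relevant for ruptured cases
    length_mm: float
    width_mm: float
    neck_mm: float
    follow_up_years: float


def meets_size_criterion(case: Case) -> bool:
    """Length, width and neck are each < 10 mm, and at least one is > 3 mm."""
    dims = (case.length_mm, case.width_mm, case.neck_mm)
    return all(d < 10.0 for d in dims) and any(d > 3.0 for d in dims)


def is_included(case: Case) -> bool:
    """Apply the inclusion rules of Section 2.1 (location step omitted)."""
    if not meets_size_criterion(case):
        return False
    if case.ruptured:
        # Ruptured cases: keep only ruptures that occurred during follow-up.
        return case.ruptured_during_follow_up
    # Unruptured cases: under observation, untreated, followed for > 2 years.
    return (case.under_observation and not case.treated_before
            and case.follow_up_years > 2.0)


if __name__ == "__main__":
    example = Case(ruptured=False, under_observation=True, treated_before=False,
                   ruptured_during_follow_up=False, length_mm=5.2, width_mm=4.1,
                   neck_mm=3.0, follow_up_years=2.5)
    print(is_included(example))
```

Keeping the size rule in its own function makes it easy to check separately, since it applies to both ruptured and unruptured cases.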
2.1.1 Medical Data
Two categories of patients' medical data were used in this study. The first is clinical information: age; gender; aneurysm location; history of subarachnoid hemorrhage (SAH); smoking; diabetes mellitus (DM); hypertension (HT); hyperlipidemia (HL); alcohol consumption (Alcohol); polycystic kidneys (PK); cerebral hemorrhage (CH); hormone replacement (HR); and family history of SAH (FH SAH), of unruptured aneurysm (FH Unruptured Aneurysm), and of PK (FH PK). The second is morphological information on the cerebral aneurysm: maximum aneurysm height, maximum neck diameter, neck area, volume, aspect ratio, side-wall or bifurcation type, and presence or absence of a bleb. A total of 17 features were collected from the patients' medical data.
2.1.2 Blood-flow-simulation Data
Hemodynamic data were obtained through CFD simulation of the cerebral blood flow. CFD is a branch of fluid mechanics that uses numerical analysis to solve problems involving fluid flow. The simulation identified physical blood-flow characteristics such as PLC, EL, energy loss per unit volume (ELV), maximum WSS, average WSS, minimum WSS, and oscillatory shear index (OSI), and the maximum, minimum, amplitude, and average of these quantities were used in the study. Among these characteristics, PLC, EL, and WSS have been reported as helpful in predicting whether a cerebral aneurysm will rupture (Shojima et al., 2004), (Qian et al., 2011), (Takao et al., 2012). A total of 53 features were collected from the blood-flow-simulation data.

¹ The follow-up period is defined as the time between the initial consultation and the final consultation.
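The paper does not spell out how the time-varying fields are reduced to scalar features, but feature names such as "Temporal max. spatial min. WSS" in Tables 1 and 2 suggest a two-stage reduction: first over the aneurysm surface at each time step, then over the cardiac cycle. The sketch below illustrates that reading under stated assumptions: the input is a list of time steps, each a list of nodal values; the spatial average is an unweighted mean (a real CFD post-processor would typically weight by surface area); "amplitude" is taken as temporal maximum minus temporal minimum; and the function name `summarize_field` is hypothetical.

```python
from statistics import fmean


def summarize_field(field):
    """Reduce a time-resolved surface field to scalar features.

    field[t][i] is the value (e.g. WSS in Pa) at time step t and surface node i.
    Returns a dict keyed by (temporal_stat, spatial_stat); for example,
    ("max", "min") is the temporal maximum of the spatial minimum.
    """
    # Stage 1: spatial statistics over the aneurysm surface at each time step.
    spatial = {
        "max": [max(step) for step in field],
        "min": [min(step) for step in field],
        "ave": [fmean(step) for step in field],  # unweighted mean (assumption)
    }
    # Stage 2: temporal statistics of each spatial series over the cycle.
    features = {}
    for s_name, series in spatial.items():
        features[("max", s_name)] = max(series)
        features[("min", s_name)] = min(series)
        features[("ave", s_name)] = fmean(series)
        # Assumption: amplitude = temporal max - temporal min.
        features[("amp", s_name)] = max(series) - min(series)
    return features


if __name__ == "__main__":
    wss = [[1.0, 3.0], [2.0, 6.0]]  # toy data: two time steps, two nodes
    feats = summarize_field(wss)
    print(feats[("max", "min")])
```

Scalar quantities such as PLC or EL have no spatial dimension, so only the temporal stage would apply to them.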
The calculation conditions are summarized as follows. A prototype CFD solver based on the lattice Boltzmann method (Siemens Healthcare GmbH, Forchheim, Germany; "Not to be used for Diagnosis and/or Therapy") was used. With regard to the physical properties of blood, fixed density and viscosity values were set, and non-Newtonian effects were neglected. Assuming a laminar flow field, two pulses were calculated under pulsatile flow conditions, and only the results from the second pulse were used. The outlet boundary condition was set to an average static pressure of 0 Pa, and the calculations were performed on a structured computational grid with a maximum cell size of 0.1 mm. For further details, see previous works (Qian et al., 2011), (Takao et al., 2012).
2.2 Classification Model for Cerebral Aneurysm Rupture Prediction
2.2.1 Oversampling of Minority Sample
The number of patients with ruptured aneurysms is far smaller than the number with unruptured aneurysms, i.e., a classic class-imbalance problem exists. We therefore used the synthetic minority oversampling technique (SMOTE) (Chawla et al., 2002) to generate synthetic instances for the classification model. SMOTE, which has been applied effectively in various fields, oversamples the minority class by taking each minority instance and introducing synthetic instances along the line segments joining it to any or all of its k nearest neighbors in the minority class. A synthetic instance of the instance under consideration (called the base instance) is generated by taking the difference between the feature vector of one of its nearest neighbors and that of the base instance, multiplying this difference by a random number between 0 and 1, and adding the product to the feature vector of the base instance, that is, $x_{\text{new}} = x + \lambda (x_{\text{nn}} - x)$ with $\lambda \in [0, 1]$. In this research, SMOTE was applied to increase the number of ruptured-aneurysm samples.
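The interpolation step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration of the SMOTE idea, not the authors' implementation: the function name, the default k = 5, the cycling over base instances, and the Euclidean distance are assumptions. In practice, the imbalanced-learn package provides this as `imblearn.over_sampling.SMOTE(k_neighbors=...)` with a `fit_resample(X, y)` method.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic samples by interpolating between minority samples.

    minority: list of feature vectors (lists of floats) from the minority class.
    Each new sample is base + gap * (neighbour - base), with gap in [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # k nearest minority neighbours (Euclidean) of every minority sample.
    neighbours = []
    for i, a in enumerate(minority):
        ranked = sorted((math.dist(a, b), j) for j, b in enumerate(minority) if j != i)
        neighbours.append([j for _, j in ranked[:k]])

    synthetic = []
    for n in range(n_synthetic):
        i = n % len(minority)                 # cycle through base instances
        base = minority[i]
        neighbour = minority[rng.choice(neighbours[i])]
        gap = rng.random()                    # random number in [0, 1)
        synthetic.append([b + gap * (m - b) for b, m in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    ruptured = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy minority samples
    new = smote(ruptured, n_synthetic=4, k=2)
    print(len(new))
```

To balance the classes fully, `n_synthetic` would be the majority count minus the minority count; the paper does not state the target ratio it used.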
2.2.2 Feature Selection
This study employed the SelectKBest algorithm to select useful features from the 70 combined medical and blood-flow-simulation features as inputs to the classification model. SelectKBest scores each feature with a scoring function (here f_classif, which computes the ANOVA F-value between each feature and the class label; other scoring functions can be used) and removes all but the K highest-scoring features.
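The scoring that `f_classif` performs (a one-way ANOVA F-value per feature) and the top-K selection can be sketched without scikit-learn as below. The names `anova_f` and `select_k_best` are hypothetical; with scikit-learn the equivalent is `SelectKBest(score_func=f_classif, k=K).fit(X, y)`, whose `get_support(indices=True)` returns the selected column indices. Constant and perfectly separated features are handled more simply here than in scikit-learn.

```python
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F-value of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    grand_mean = fmean(values)
    ss_between = sum(len(vs) * (fmean(vs) - grand_mean) ** 2 for vs in groups.values())
    ss_within = sum((v - fmean(vs)) ** 2 for vs in groups.values() for v in vs)
    if ss_within == 0.0:
        # Perfectly separated or constant feature (simplified handling).
        return float("inf") if ss_between > 0.0 else 0.0
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def select_k_best(X, y, k):
    """Return (sorted indices of the k highest-scoring columns, all scores)."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


if __name__ == "__main__":
    X = [[1.0, 5.0, 0.1], [1.2, 5.1, 0.3], [3.0, 4.9, 0.2], [3.1, 5.0, 0.4]]
    y = [0, 0, 1, 1]  # toy labels: 1 = ruptured
    selected, scores = select_k_best(X, y, k=2)
    print(selected)
```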
2.2.3 Building a Classifier
The dataset was divided into training, validation, and test data while preserving the ratio of ruptured to unruptured samples; the ratio of training, validation, and test data was set to 4:3:3. The training data were used to optimize the hyperparameters of the classification model by grid search with stratified five-fold cross-validation. The validation data were used to determine the optimal number and set of features: for K = 1, ..., 70 (= 17 + 53), features were selected by the SelectKBest algorithm, their performance on the validation data was evaluated, and the optimal number of features and the corresponding features were determined. Finally, the test data were used to evaluate the final classification performance. Logistic regression and SVM were used as classifiers, and the performance of the two methods was compared.
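The stratified 4:3:3 split and the search over K can be sketched as follows. This is an illustrative sketch only: the rounding rule for part sizes, the shuffling seed, and the `validation_score` callback (standing in for "select K features, train, and score on the validation data") are assumptions, not details given in the paper. For the hyperparameter search on the training part, scikit-learn users would typically reach for `GridSearchCV(estimator, param_grid, cv=StratifiedKFold(n_splits=5))`.

```python
import random


def stratified_split(labels, ratios=(4, 3, 3), seed=0):
    """Split sample indices into parts with the given ratios, class by class.

    Each class is shuffled and cut separately, so every part keeps roughly the
    same ruptured/unruptured ratio as the full dataset.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    total = sum(ratios)
    parts = [[] for _ in ratios]
    for indices in by_class.values():
        rng.shuffle(indices)
        n, start, cumulative = len(indices), 0, 0
        for p, r in enumerate(ratios):
            cumulative += r
            end = round(n * cumulative / total)  # cut points from cumulative ratios
            parts[p].extend(indices[start:end])
            start = end
    return parts


def choose_k(validation_score, max_k=70):
    """Return the K in 1..max_k with the highest validation score (ties -> smaller K)."""
    return max(range(1, max_k + 1), key=lambda k: (validation_score(k), -k))


if __name__ == "__main__":
    labels = [1] * 35 + [0] * 303  # 1 = ruptured, 0 = unruptured
    train, valid, test = stratified_split(labels)
    print(len(train), len(valid), len(test))
```

With 35 ruptured and 303 unruptured labels, this rounding rule yields a 102-sample test set containing 11 ruptured cases, which is consistent with the class totals in Tables 3 and 4.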
3 RESULTS AND DISCUSSION
3.1 Feature Selection
The results of feature selection are summarized in Tables 1 and 2. Approximately 40% of the features selected as important were medical data that physicians have traditionally used for diagnosis, such as age and aneurysm size. In contrast, the remaining 60% were cerebral blood-flow CFD simulation data such as WSS and PLC.
Table 1: Features selected: logistic regression.
Data type | Description
Clinical information | Age, Aneurysm location, Multi/Single aneurysm
Morphological information | Max. height, Volume, Aspect ratio, Bleb
Hemodynamic information | Temporal max. spatial min. WSS, Temporal ave. spatial min. WSS, Temporal max. LSA, LSI, SCI, Temporal max. & ave. SCI, Temporal min. & ave. PLC
Total number of features | 16
Table 2: Features selected: SVM.
Data type | Description
Clinical information | Age, Aneurysm location, Multi/Single aneurysm, HT, HL
Morphological information | Side/Bifurcation, Max. height, Volume, Aspect ratio, Bleb
Hemodynamic information | Temporal max. spatial ave. WSS, Temporal max. spatial min. WSS, Temporal min. spatial ave. WSS, Temporal min. spatial min. WSS, Temporal max. LSA, LSI, SCI, Temporal ave. spatial ave. WSS, Temporal ave. spatial min. WSS, Temporal ave. spatial min. WSS, Temporal ave. PLC, SCI, Temporal min. LSI, PLC
Total number of features | 27
3.2 Cerebral Aneurysm Rupture Prediction
The classification model was evaluated by sensitivity, specificity, and F-measure, where TP, FN, TN, and FP denote the numbers of true positives, false negatives, true negatives, and false positives, with ruptured aneurysms as the positive class.
Sensitivity, computed by Eq. (1), is the fraction of actual ruptured samples that were correctly predicted as ruptured.

$$\mathrm{Sensitivity} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \qquad (1)$$

Specificity, computed by Eq. (2), is the fraction of actual unruptured samples that were correctly predicted as unruptured.

$$\mathrm{Specificity} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \qquad (2)$$

The F-measure, computed by Eq. (3), is the harmonic mean of precision, TP/(TP + FP), and sensitivity.

$$\text{F-measure} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}} \qquad (3)$$
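Equations (1) to (3) can be checked directly against the confusion matrices. The small helper below (a hypothetical name) computes the three measures from the four counts; plugging in the logistic-regression counts of Table 3 (TP = 7, FP = 14, FN = 4, TN = 77) and the SVM counts of Table 4 (TP = 5, FP = 8, FN = 6, TN = 83) reproduces the values in Table 5.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, F-measure) as in Eqs. (1)-(3)."""
    sensitivity = tp / (tp + fn)             # Eq. (1)
    specificity = tn / (tn + fp)             # Eq. (2)
    f_measure = 2 * tp / (2 * tp + fp + fn)  # Eq. (3)
    return sensitivity, specificity, f_measure


if __name__ == "__main__":
    counts_by_model = {"logistic regression": (7, 14, 4, 77), "SVM": (5, 8, 6, 83)}
    for name, counts in counts_by_model.items():
        sens, spec, f1 = classification_metrics(*counts)
        print(f"{name}: sensitivity={sens:.3f} specificity={spec:.3f} F={f1:.4f}")
```

For logistic regression the F-measure is exactly 14/32 = 0.4375.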
Tables 3 and 4 show the confusion matrices obtained by logistic regression and SVM, respectively. Table 5 summarizes the performance measures obtained by the two models on the test data.
Comparing the two classifiers, logistic regression gave a slightly lower specificity than SVM (0.846 vs. 0.912) but a much higher sensitivity (0.636 vs. 0.455). In other words, logistic regression produced a more balanced classification of ruptured and unruptured aneurysms.
Table 3: Confusion matrix: logistic regression (N = 102).
Predicted class \ Actual class | Rupture | Unrupture
Rupture | 7 | 14
Unrupture | 4 | 77
Table 4: Confusion matrix: SVM (N = 102).
Predicted class \ Actual class | Rupture | Unrupture
Rupture | 5 | 8
Unrupture | 6 | 83
Table 5: Performance measures resulting from test data classification by the two models.
Measure | Logistic regression | SVM
Sensitivity | 0.636 | 0.455
Specificity | 0.846 | 0.912
F-measure | 0.437 | 0.417
4 CONCLUSIONS
A classifier constructed by machine learning using combined medical and cerebral blood-flow-simulation data was used to predict cerebral aneurysm rupture in a total of 338 cerebral aneurysm samples (35 ruptured, 303 unruptured). The SMOTE algorithm was used to address the class imbalance, and the SelectKBest algorithm was applied to the 70 features, extracting 27 important features. Of the extracted features, 40% belonged to the medical data and 60% to the blood-flow-simulation data. Using logistic regression as the classification model, we obtained a sensitivity of 0.64 and a specificity of 0.85. These results indicate the potential for highly accurate prediction of cerebral aneurysm rupture by machine learning using engineering information obtained from simulations.
Thus, this study developed a classification model for stroke-onset prediction using three types of data: clinical, morphological, and hemodynamic information. Although the number of cases used in the analysis was limited, the model could serve as a reference for future research and could help doctors make objective diagnoses by considering various data sources, so that patients receive preventive treatment. Stroke detection and prevention could help more people and save medical resources, especially in an aging society such as Japan.
ACKNOWLEDGEMENTS
This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO). CFD calculations were performed in collaboration with Siemens Healthcare under a collaboration agreement with the Jikei University.
REFERENCES
Amini, L., Azarpazhouh, R., Farzadfar, M., Mousavi, S.,
Jazaieri, F., Khorvash, F., Norouzi, R., and Toghianfar,
N. (2013). Prediction and control of stroke by data
mining. Int J Prev Med., 4(Suppl 2):S245–249.
Bentley, P., Ganesalingam, J., Jones, A., Mahady, K., Epton, S., Rinne, P., Sharma, P., Halse, O., Mehta, A., and Rueckert, D. (2014). Prediction of stroke thrombolysis outcome using CT brain machine learning. NeuroImage Clin., 30(4):635–640.
Chawla, N., Bowyer, K., Hall, L., and Kegelmeyer, W. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357.
Chung, B. and Cebral, J. (2015). CFD for evaluation and treatment planning of aneurysms: review of proposed clinical uses and their challenges. Ann Biomed Eng., 43(1):122–138.
Hitman, G., Colhoun, H., Newman, C., Szarek, M., Betteridge, D., Durrington, P., Fuller, J., Livingstone, S., Neil, H., and CARDS Investigators (2007). Stroke prediction and stroke prevention with atorvastatin in the Collaborative Atorvastatin Diabetes Study (CARDS). Diabet Med., 24(12):1313–1321.
Letham, B., Rudin, C., McCormick, T., and Madigan, D. (2015). Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Annals of Applied Statistics, 9(3):1350–1371.
Lumley, T., Kronmal, R., Cushman, M., Manolio, T., and
Goldstein, S. (2002). A stroke prediction score in the
elderly: validation and web-based application. J Clin
Epidemiol, 55(2):129–136.
Manolio, T., Kronmal, R., Burke, G., O’Leary, D., and
Price, T. (1996). Short-term predictors of incident
stroke in older adults. Stroke, 27(9):1479–1486.
Morino, T., Tanoue, T., Tateshima, S., Vinuela, F., and Tanishita, K. (2010). Intra-aneurysmal blood flow based on patient-specific CT angiogram. Experiments in Fluids, 49(2):485–496.
Nogueira, R., Bor-Seng-Shu, E., Saeed, N., Teixeira, M., Panerai, R., and Robinson, T. (2016). Meta-analysis of vascular imaging features to predict outcome following intravenous rtPA for acute ischemic stroke. Frontiers in Neurology, 7(77):1–8.
Qian, Y., Takao, H., Umezu, M., and Murayama, Y. (2011). Risk analysis of unruptured aneurysms using computational fluid dynamics technology: preliminary results. AJNR Am J Neuroradiol., 32(10):1948–1955.
Shojima, M., Oshima, M., Takagi, K., Torii, R., Hayakawa, M., Katada, K., Morita, A., and Kirino, T. (2004). Magnitude and role of wall shear stress on cerebral aneurysm: computational fluid dynamic study of 20 middle cerebral artery aneurysms. Stroke, 35(11):2500–2505.
Takao, H., Murayama, Y., Abe, T., Ishibashi, T., Yuki, I., Otsuka, S., Suzuki, T., Masuda, S., Mohamed, A., Sen, I., Yamamoto, M., and Abe, T. (2012). CFD reveals hemodynamic differences between unruptured and ruptured intracranial aneurysms during observation. Stroke, 43(2):A2731.
Wang, T., Massaro, J., Levy, D., Vasan, R., Wolf, P., D'Agostino, R., Larson, M., Kannel, W., and Benjamin, E. (2003). A risk score for predicting stroke or death in individuals with new-onset atrial fibrillation in the community: the Framingham Heart Study. JAMA, 290(8):1049–1056.
Xiang, J., Siddiqui, A., and Meng, H. (2014). The effect of
inlet waveforms on computational hemodynamics of
patient-specific intracranial aneurysms. J Biomech.,
47(16):3882–3890.