
cleaning and preprocessing steps resulted in 802,356
records and 12 features in the ICU pneumonia dataset
as described in Table 1, and 12,250 records and 11
features in the Hospitals Compliance dataset as de-
scribed in Table 2.
2.3 Merging the Two Datasets
Investigating the potential impact of ventilator bundle
compliance rates on predicting ventilator-associated
pneumonia (VAP) among ICU patients requires
comparing the prediction results of ML models
trained only on the essential information in the “ICU
Pneumonia dataset” with those of ML models trained
on a dataset combining the “ICU Pneumonia dataset”
with the associated records from the “Hospitals
Compliance dataset”, which complements the primary
dataset through shared key features on compliance
with the ventilator bundle. The two datasets were
joined with an inner merge on their shared key
features: region, hospital, unit, year, and
month. This process resulted
in a merged dataset comprising 77,577 records and 18
features.
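The inner merge described above can be sketched with pandas; the column names and values below are illustrative assumptions, not the actual dataset schema:

```python
import pandas as pd

# Hypothetical fragments of the two datasets; column names and
# values are assumptions for illustration only.
icu = pd.DataFrame({
    "region": ["Central", "Central"],
    "hospital": ["H1", "H2"],
    "unit": ["ICU-A", "ICU-B"],
    "year": [2021, 2021],
    "month": [3, 4],
    "vap": [1, 0],
})
compliance = pd.DataFrame({
    "region": ["Central"],
    "hospital": ["H1"],
    "unit": ["ICU-A"],
    "year": [2021],
    "month": [3],
    "bundle_compliance_rate": [0.92],
})

# An inner merge keeps only ICU records that have a matching
# compliance record on all five shared key features.
merged = icu.merge(
    compliance,
    how="inner",
    on=["region", "hospital", "unit", "year", "month"],
)
```

Records without a compliance counterpart are dropped by the inner join, which is consistent with the merged dataset (77,577 records) being much smaller than the primary one.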
2.4 Machine Learning Models
Five ML models were built and evaluated in our
study: random forest (RF), support vector machine
(SVM), logistic regression (LR), extreme gradient
boosting machine (XGBoost), and adaptive boost-
ing (AdaBoost). These models have been shown
to achieve strong performance on similar research
problems in the literature, especially binary classifi-
cation problems with severe class imbalance (Khushi
et al., 2021).
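A minimal sketch of instantiating the five classifiers follows; the paper does not report hyperparameters, so the default settings shown here are an assumption (XGBoost comes from the separate `xgboost` package, the rest from scikit-learn):

```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

# Default-configured instances; actual study hyperparameters unknown.
models = {
    "RF": RandomForestClassifier(random_state=42),
    "SVM": SVC(random_state=42),
    "LR": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

try:
    # XGBoost is distributed separately from scikit-learn.
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(eval_metric="logloss")
except ImportError:
    pass  # xgboost not installed in this environment
```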
2.5 Experimental Settings
To investigate the impact of hospital ventilator
bundle compliance data on predicting VAP
among ICU patients, we built and evaluated two mod-
els for each of the selected ML algorithms: one on
the primary “ICU Pneumonia dataset” and one on the
merged dataset. This resulted in the development of
ten ML models for our experiment.
To build and evaluate the models, both datasets
were split into an 80% training set and a 20% testing
set. The Synthetic Minority Oversampling Technique
(SMOTE) (Blagus and Lusa, 2013) was applied to the
training set to handle the class imbalance issue before
building the ML models. A fivefold cross-validation
technique was employed to build the ML models,
allowing for a comprehensive comparison of their
performances, informing model selection, and
enhancing their robustness and generalizability.
Finally, the testing set was used to evaluate the
prediction performance of the models. We reported
our results using several performance evaluation mea-
sures: accuracy, sensitivity, precision, specificity,
and F1-score.
The experiment was conducted on an HP computer
with a Windows 64-bit operating system, a 2
GHz processor, and 8 GB of RAM. Tools used in-
clude Anaconda Navigator, Jupyter Notebooks, and
the pandas, NumPy, os, and scikit-learn Python libraries.
3 RESULTS
Table 3 shows the obtained results of evaluating the
prediction performance of the selected ML algorithms
on the primary “ICU Pneumonia dataset” and the
merged dataset in terms of accuracy, precision, sen-
sitivity, specificity, and F1-score. Evaluation results
showed that SVM emerged as the top-performing
model in accuracy, achieving 89.48% on the
primary dataset. Additionally, SVM
demonstrated the highest sensitivity at 90.73%, show-
casing its proficiency in correctly identifying posi-
tive instances. Precision, a critical metric for assess-
ing the correctness of positive predictions, was no-
tably high for both RF and XGBoost, reaching 97%
on the merged dataset. Moreover, when consider-
ing the F1-score, XGBoost outperformed other mod-
els with a score of 92% on the merged dataset. It
is noteworthy that random forest (RF) and XGBoost
consistently excelled in recall (i.e., sensitivity),
achieving the highest value of 89% on the merged
dataset. Logistic regression (LR) demonstrated its
strength in specificity, achieving 74.62% on the
merged dataset, signifying its proficiency in
correctly identifying negative instances.
4 DISCUSSION
SVM, LR, RF, XGBoost, and AdaBoost are popular
ML models proven to achieve high performance in
many domains, including healthcare. We built and
evaluated these five ML models on two datasets.
We found that LR yielded the worst overall pre-
diction performance compared to the other models.
More specifically, LR performed the worst in
accuracy and sensitivity when the merged dataset
was used, and the worst in precision and F1-score
with the primary dataset. We noticed that using the