Predicting Hospital Length of Stay of Patients Leaving the Emergency
Department
Alexander Winter¹,³ (https://orcid.org/0000-0002-2866-0073), Mattis Hartwig²,³ (https://orcid.org/0000-0002-1507-7647) and Toralf Kirsten¹ (https://orcid.org/0000-0001-7117-4268)
¹Department of Medical Data Science, Leipzig University, Germany
²German Research Center for Artificial Intelligence, 23562 Lübeck, Germany
³singularIT GmbH, 04109 Leipzig, Germany
Keywords:
Length of Stay Prediction, Emergency Department, MIMIC-IV, CatBoost Architecture.
Abstract:
In this paper, we aim to predict the patient’s length of stay (LOS) after they are dismissed from the emergency
department and transferred to the next hospital unit. An accurate prediction has positive effects for patients,
doctors and hospital administrators. We extract a dataset of 181,797 patients from the United States and
perform a set of feature engineering steps. For the prediction we use a CatBoost regression architecture with
a specifically implemented loss function. The results are compared with baseline models and results from
related work on other use cases. With an average absolute error of 2.36 days in the newly defined use case
of post-ED LOS prediction, we outperform baseline models and achieve comparable results to use cases from
intensive care unit LOS prediction. The approach can be used as a new baseline for further improvements of
the prediction.
1 INTRODUCTION
Accurately predicting the patient’s length of stay
(LOS) is an important capability for hospital admin-
istrators. An accurate forecast can be used for effec-
tive planning and management of hospital resources,
which has positive effects for patients, doctors and
hospitals (Stone et al., 2022). Patients will experience
more seamless treatments and have a reduced risk of
running into capacity bottlenecks resulting in negative
effects on their recovery. Doctors will experience less
stress induced by capacity issues and do not need to
focus on ad-hoc capacity planning (Rocheteau et al.,
2021). Hospitals can achieve a better utilization of re-
sources and capacities, which will increase their effi-
ciency and enable more sustainable budgeting. Since
many patients enter the hospital through the emer-
gency department (ED), the transition from ED to the
follow-up unit is an interesting point in time for pre-
dicting the remaining LOS (Christ et al., 2010).
In this paper, we use the MIMIC-IV dataset as a
basis to learn a regression model for LOS prediction
at the moment when the patient is released from the
ED. Version 4 of the MIMIC dataset has been published recently and is the first version that contains specific ED data. Older versions of the MIMIC dataset have already been used for LOS prediction, which makes our results comparable to other research (Gentimis et al., 2017; Rocheteau et al., 2021). Variables that influence the hospital LOS are plentiful. They include mostly medical information but can also depend on organizational problems, like the unavailability of beds, or personnel issues, for example a doctor making a misdiagnosis (Buttigieg et al., 2018). With over 250,000 patients and a large number of features, the prediction task is very suitable for machine learning methods. Since the dataset contains many high-dimensional categorical features, we use the state-of-the-art CatBoost model (Dorogush et al., 2018) together with feature engineering, hyperparameter tuning, and a specifically implemented loss function for the regression task. We also use naive prediction models, which predict the mean and median for regression or the most common class for the classification task, as benchmarks. We achieve an average
absolute error of 2.36 days which is significantly bet-
ter than the baseline models and comparable to the
work on other prediction tasks based on the MIMIC
dataset.
The remainder of the paper is structured as fol-
lows. Section 2 gives an overview of the related work
when it comes to LOS prediction. Section 3 intro-
duces the dataset and Section 4 describes the meth-
ods used in this paper together with the experimental
setup. The results are discussed in Section 5. The
paper concludes in Section 6.
2 RELATED WORK
LOS has been researched from various perspectives.
Business process research is one perspective that
works more as a motivation for our work than as re-
lated in terms of methodology. Sadler et al. (Sadler
et al., 2011) have identified LOS as a relevant busi-
ness factor, De Jong et al. (De Jong et al., 2006) have
looked into the effect of LOS distributions in hos-
pitals on decisions made by doctors, and Buttigieg et
al. (Buttigieg et al., 2018) have investigated differ-
ent structural effects that increase the overall average
LOS for hospitals.
The directly related work has built models for predicting LOS in different situations. Several papers have performed LOS prediction in other scenarios on older versions of the MIMIC dataset. Gentimis et al. (Gentimis et al., 2017) have
set up a binary classifier that differentiates between short (≤ 5 days) and long (> 5 days) stays after a patient leaves the intensive care unit (ICU) using a neural network. Zebin et al. (Zebin et al., 2019) have used a similar approach with slightly different classes (≤ 7 days and > 7 days). Rocheteau et al. (Ro-
cheteau et al., 2021) have used a temporal pointwise
convolutional model to predict the remaining days of
patients in intensive care.
There are also studies focusing on specific
datasets or cohorts. Here we only name a few that
have a direct link to ED patients. Launay et al. (Lau-
nay et al., 2015) have classified prolonged LOS us-
ing a neural network and Chang et al. (Chang et al.,
2022) have further focused on classifying the pro-
longed LOS on severe subgroups in the data and have
achieved best results using a CatBoost model. Zolbanin et al. (Zolbanin et al., 2020) have focused on predicting LOS for patients with chronic diseases on a specialized dataset. Stone et al. (Stone et al., 2019) have focused on using admission data to predict the ED LOS.
For an extensive overview of studies connected
with LOS prediction Stone et al. (Stone et al., 2022)
and Bacchi et al. (Bacchi et al., 2022) have written two review papers. Both review papers differentiate between solving a classification task (i.e. long vs. short stay) and a regression task (i.e. predicting the
LOS on a continuous time-scale). Overall the related
work shows that the LOS prediction is a frequently
researched task. Several works focus on using infor-
mation from a previous unit to predict LOS of the next
unit. Despite the importance of the patients that have
come through ED admission, to our knowledge pre-
dicting LOS of patients from information available
at the point in time of leaving the ED unit has not
been researched before. An explanation is that the
MIMIC-IV dataset, and with it the ED module, has
only been released rather recently. Additionally, the
overall availability of large datasets that cover multiple process steps in hospitals is quite limited.
3 DATASET
The chosen database for our work is MIMIC-IV, a centralized medical information mart, which holds health records of more than 250,000 patients admitted to the Beth Israel Deaconess Medical Center in Boston between 2008 and 2019 (Johnson et al., 2021). All patient data has been extracted from the hospital databases, prepared and reorganized to facilitate data analysis for researchers, and anonymized to protect each patient's personal information.
The MIMIC-IV database is structured into the
modules core, hosp and icu, which store a compre-
hensive view of each patient stay from demographic
information to laboratory results. The newly added
ed module further includes data originating from the
emergency department.
Our cohort has been selected to only include adult
patients (age > 18) who had at least one stay in the
emergency department. We further excluded very
long stays (LOS > 50 days) to remove extreme out-
liers, which resulted in dropping 537 stays. Patients
with missing data, which is only present in the triage
table, have been dropped from the final dataset. The
selection resulted in a total of 181,797 patients ex-
tracted from MIMIC-IV.
As Figure 1 shows, ages are in the range of 18 to
91, with all patients older than 89 grouped into the
age of 91. The largest share of patients falls into the range of 50 to 70 years of age. The distribution of women and men is roughly equal in the dataset, with around 52% of stays by female patients.
Figure 2 shows the LOS distribution for patients
in the MIMIC IV database. The graph displays the
typical positive skew of LOS data, with the mean at
3.9 days and a median value of 2.4 days.
4 METHODS
In this section, we give a brief overview of the tech-
nical methodology used in the experiments. The
methodology is structured into feature engineering,
the CatBoost architecture, the chosen loss functions
and the hyperparameter tuning.
4.1 Features
Features have been selected based on the research of
Buttigieg et al. (Buttigieg et al., 2018) and are catego-
rized into the thematic groups: demographics, medi-
cal and triage.
Demographic features are determined by the patient directly and by their living circumstances. They consist of age, gender, insurance and ethnicity. All values are retrieved directly from the patients and admissions tables, as they are included in the electronic health record (EHR).
Medical features refer to attributes that depend on
the specific hospital stay. This includes the admis-
sion location and the diagnosis given to patients at the
end of their emergency department stay in form of an
ICD-Code.
Additional features have been engineered from the existing data to take advantage of further information available in MIMIC-IV. The variables los and
ed los are based on the admission and discharge times
from the hospital and emergency department. Both
values can be calculated directly from the admission
and the ed stays table, where admission and discharge
times are available and represent the fractional days a
patient has spent in the hospital and the emergency
department respectively.
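The following sketch shows one way to compute these two duration features with pandas. It assumes the admissions and edstays tables have been exported with their standard MIMIC-IV column names (admittime/dischtime, intime/outtime, hadm_id); the variable names and the CSV paths are ours.

```python
import pandas as pd

# Assumed export of the MIMIC-IV admissions and edstays tables to CSV.
admissions = pd.read_csv("admissions.csv", parse_dates=["admittime", "dischtime"])
edstays = pd.read_csv("edstays.csv", parse_dates=["intime", "outtime"])

# los: fractional days between hospital admission and discharge (target).
admissions["los"] = (
    admissions["dischtime"] - admissions["admittime"]
).dt.total_seconds() / 86400

# ed_los: fractional days spent in the emergency department (feature).
edstays["ed_los"] = (
    edstays["outtime"] - edstays["intime"]
).dt.total_seconds() / 86400

# Attach the target to each ED stay via the hospital admission id.
df = edstays.merge(admissions[["hadm_id", "los"]], on="hadm_id", how="inner")
```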
The variable diagnosis count is calculated by
summing up each individual diagnosis given to a pa-
tient during their stay, which is noted in the diagnosis
table. The variable medicine count follows the same procedure, but is calculated from the medrecon table, which tracks the medicine a patient is currently taking. Both values are created to add further information about the complexity of the patient's condition.

Figure 1: Age distribution of the created dataset used for LOS prediction.

Figure 2: LOS distribution as a histogram for all patients from MIMIC-IV. Values larger than 50 days are omitted for visibility.
The variable previous stays is calculated by counting all hospital admissions a patient has had in the past, i.e. the number of distinct hospital admissions for a single patient prior to the current admission date.
The variable previous stays average length is cre-
ated by adding the LOS value of the stays found and
dividing by the number of previous stays.
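A minimal sketch of how these count- and history-based features could be derived with pandas, building on the merged dataframe from the previous sketch; diagnosis and medrecon stand for the loaded ED diagnosis and medrecon tables, and the column names (stay_id, subject_id, admittime) follow the MIMIC-IV schema, while the exact implementation is ours.

```python
# diagnosis_count / medicine_count: one row per diagnosis or medication,
# counted per ED stay and joined back onto the main dataframe.
diagnosis_count = diagnosis.groupby("stay_id").size().rename("diagnosis_count")
medicine_count = medrecon.groupby("stay_id").size().rename("medicine_count")
df = df.merge(diagnosis_count, on="stay_id", how="left")
df = df.merge(medicine_count, on="stay_id", how="left")

# previous_stays: admissions of the same patient before the current one.
df = df.sort_values(["subject_id", "admittime"])
grouped = df.groupby("subject_id")
df["previous_stays"] = grouped.cumcount()

# previous_stays_average_length: mean LOS of those earlier stays
# (expanding mean shifted by one, so the current stay is excluded).
df["previous_stays_average_length"] = grouped["los"].transform(
    lambda s: s.expanding().mean().shift(1)
)
```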
Triage data is collected specifically while patients are in the emergency department by a care provider asking questions to assess the patient's current health status. Afterwards the patient's vital signs are measured. Based on the measurements the level of acuity is decided, which serves as the basis when deciding if the patient has to be put into critical care. Features resulting from vital signs are resprate (the respiratory rate in breaths per minute), temperature, o2sat, sbp and dbp, pain and acuity.
Table 1 gives an overview of all the features extracted from MIMIC-IV, including each feature's type and the table it is extracted from. The engineered features and how they have been created are explained above.
4.2 CatBoost Architecture
CatBoost is an open-source library for gradient boost-
ing. The name stands for categorical boosting, be-
cause the CatBoost architecture is able to handle
categorical data directly, without the need of man-
ual conversion to a numerical representation (Doro-
gush et al., 2018). The algorithm is designed to
calculate target statistics for each categorical value,
which transforms the categorical into a numeric value,
while keeping the information the feature holds intact.
The conversion avoids adding an unfeasible number of columns to the dataset, which is a known problem with one-hot encoding (Cerda and Varoquaux, 2022). With over 13,000 possible ICD codes in the database, one-hot encoding is well past the limits of its usefulness.
Comparing CatBoost to other popular boosting frameworks like XGBoost or LightGBM shows that CatBoost achieves state-of-the-art performance, both
on quality and speed. It outperformed both frame-
works on multiple tasks (Dorogush et al., 2018). In
the realm of boosting frameworks, CatBoost has in-
creased in popularity compared to the other libraries.
To give an example, in healthcare CatBoost has been
used for predicting ICU mortality (Safaei et al., 2022)
and to predict if a patient will need mechanical venti-
lation during the hospital stay (Yu et al., 2021).
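In practice, this means the raw categorical columns can be passed to CatBoost directly. A minimal sketch, with the feature matrix X_train/y_train and the feature names chosen for illustration:

```python
from catboost import CatBoostRegressor, Pool

# Categorical columns are declared, not encoded; the ~13,000-value
# ICD code column is handled internally via target statistics.
cat_features = ["gender", "ethnicity", "insurance", "admission_location", "icd_code"]
train_pool = Pool(X_train, y_train, cat_features=cat_features)

model = CatBoostRegressor(loss_function="RMSE", verbose=100)
model.fit(train_pool)  # no one-hot encoding of the ICD codes needed
```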
4.3 Loss Function & Evaluation Metrics
We use the CatBoost Model in two configurations.
First, we fit a model on the root mean squared er-
ror (RMSE) loss function provided by the CatBoost
library. RMSE is a commonly used metric in ma-
chine learning tasks, which penalizes larger errors
more heavily than smaller ones.
Table 1: Features extracted from MIMIC-IV, with type and source table.

Group        Feature           Type         Source Table
Demographic  Gender            Binary       Patients
             Age               Discrete     Patients
             Ethnicity         Categorical  Admissions
             Insurance         Categorical  Admissions
Medical      ICD Code          Categorical  Diagnosis
             Adm. Location     Categorical  Admissions
             Diagnosis Count   Discrete     Engineered
             Medicine Count    Discrete     Engineered
             Previous Stays    Discrete     Engineered
             Prev. Stays Avg.  Continuous   Engineered
             ED LoS            Continuous   Engineered
             LoS               Continuous   Engineered
Triage       Resprate          Discrete     Triage
             Temperature       Continuous   Triage
             O2sat             Discrete     Triage
             sbp               Discrete     Triage
             dbp               Discrete     Triage
             Pain              Discrete     Triage
             Acuity            Discrete     Triage

Because of the high positive skew of LOS data, it is important to consider a loss function that is more robust against outliers and able to mitigate the skewness of the data (Rocheteau et al., 2021). In accordance with the findings of Rocheteau et al. (Rocheteau et al., 2021), we used the root mean squared logarithmic error (RMSLE) as a second loss function, which penalizes proportional errors and is less affected by outliers. Since CatBoost does not provide RMSLE as an optimization objective, we have implemented it ourselves using the custom objective interface.
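CatBoost's custom objective interface expects per-sample first and second derivatives via a calc_ders_range method; since CatBoost maximizes the objective, the derivatives of the negative squared logarithmic error are returned. The following is a minimal sketch of such an implementation, not the exact code used in our experiments; the non-negativity clamp is an assumption to keep log1p defined.

```python
import math
from catboost import CatBoostRegressor

class RmsleObjective:
    """Squared-log-error objective in CatBoost's calc_ders_range format."""

    def calc_ders_range(self, approxes, targets, weights):
        result = []
        for i in range(len(targets)):
            a = max(approxes[i], 0.0)  # clamp: log1p requires a > -1
            d = math.log1p(a) - math.log1p(targets[i])
            # CatBoost maximizes, so return derivatives of -(d ** 2).
            der1 = -2.0 * d / (1.0 + a)
            der2 = -2.0 * (1.0 - d) / (1.0 + a) ** 2
            w = 1.0 if weights is None else weights[i]
            result.append((w * der1, w * der2))
        return result

model = CatBoostRegressor(loss_function=RmsleObjective(), eval_metric="MSLE")
```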
Since we want to compare our results to the works of Rocheteau et al. (Rocheteau et al., 2021) and Gentimis et al. (Gentimis et al., 2017), our LOS prediction is conducted in a similar way to the aforementioned works and uses the same metrics for evaluation. Metrics used are mean squared error (MSE), mean absolute percentage error (MAPE), mean absolute error (MAE), mean squared logarithmic error (MSLE) and the coefficient of determination (R2). For the case of using predictions to optimize clinical processes and capacity, the MAE and MAPE errors are the most important. Additionally, in order to compare our results to the results of Gentimis et al. (Gentimis et al., 2017), who predicted short (ŷ ≤ 5) vs. long (ŷ > 5) stays, the results of the regressor and the target values are converted into a categorical representation of short vs. long stay with the same threshold of 5 days. After the conversion we calculate the accuracy of the CatBoost model for the classification task.
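A short sketch of this conversion, assuming a fitted model and held-out test data:

```python
import numpy as np
from sklearn.metrics import accuracy_score

THRESHOLD = 5  # days, as in Gentimis et al.
y_pred = model.predict(X_test)

# Binarize predictions and targets into short (0) vs. long (1) stays.
pred_class = (np.asarray(y_pred) > THRESHOLD).astype(int)
true_class = (np.asarray(y_test) > THRESHOLD).astype(int)
print(accuracy_score(true_class, pred_class))
```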
4.4 Hyperparameter Tuning
Since CatBoost is a library for gradient boosted trees, hyperparameters fall into the domain of tree-specific parameters. CatBoost provides an order of importance in its documentation (https://catboost.ai/en/docs/concepts/parameter-tuning), going from the conventionally most influential parameters to the more case-specific ones. First, we used the CatBoost regression model with default values, to check for initial overfitting and to get reasonable default values for each parameter.
Afterwards we performed a grid search, with the hyperparameter space being a combination of the most influential parameters, which are the learning rate, the tree depth and the L2 regularization. The values for the grid search are predefined, with the default values and recommendations from the CatBoost documentation serving as the basis for the selection. We selected the parameters of the run with the best evaluation metric as the parameters for the final model. Table 2 presents the selected hyperparameters.

Table 2: Hyperparameter selection of the final CatBoost model, after the grid search has been performed.

Hyperparameter          Value      Default
Learning rate           0.1        no
Tree depth              6          no
L2 regularization       50         no
Random strength         1          yes
Bagging temperature     1          yes
Border count            128        yes
Internal dataset order  False      yes
Tree growing policy     Symmetric  yes
Performing the grid search has shown that adjusting the tree depth contributed the most to the emergence of under- or overfitting. Larger trees performed better on the training dataset, but lost performance when making predictions on the evaluation data, which is a sign that the model lost the ability to generalize to new data.
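A sketch of such a search using CatBoost's built-in grid_search method; the candidate values are illustrative, derived from the documentation's recommendations rather than the exact grid used here.

```python
from catboost import CatBoostRegressor

# Candidate values for the three most influential parameters.
grid = {
    "learning_rate": [0.03, 0.1, 0.3],
    "depth": [4, 6, 8, 10],
    "l2_leaf_reg": [1, 10, 50, 100],
}

model = CatBoostRegressor(loss_function="RMSE", verbose=False)
# Cross-validated search; returns the best parameter combination found.
search_result = model.grid_search(grid, X=X_train, y=y_train, cv=3)
print(search_result["params"])
```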
4.5 Generation of Final Results
The LOS prediction is performed with the model setup described above. We split our dataset into train, validation and test data with proportions of 60%, 20% and 20%, respectively. Training and testing are conducted in 10 runs, where each run trains and predicts on a newly, randomly sampled split, so that the model is not influenced by a biased selection of the dataset.
To provide an unbiased evaluation of the model
performance during training and hyperparameter tun-
ing, the validation data is used to calculate the met-
rics during training. Finally, the model is tested on
the new, unseen test data, where the evaluation met-
rics described in Section 4.3 are calculated from the
model results.
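The split and evaluation loop can be sketched as follows; the 0.25 in the second split yields the 60/20/20 proportions overall, and using the run index as the random seed is our assumption, as is the cat_features list from Section 4.2.

```python
from catboost import CatBoostRegressor
from sklearn.model_selection import train_test_split

for run in range(10):
    # 80/20 split, then 75/25 of the remainder: 60/20/20 overall.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, random_state=run
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, random_state=run
    )
    model = CatBoostRegressor(
        loss_function="RMSE", cat_features=cat_features, verbose=False
    )
    # Validation data is passed for metric monitoring during training.
    model.fit(X_train, y_train, eval_set=(X_val, y_val))
    y_pred = model.predict(X_test)  # Section 4.3 metrics computed on this
```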
To understand the impact of the diagnosis a patient
received at the end of the emergency department stay,
we have created two separate training datasets with
varying levels of detail of the ICD-Code.
3 Digit ICD-Code: The first dataset has the ICD-
Codes truncated to 3 digit codes to reduce the cardi-
nality, while also reducing the amount of information
the ICD-Code holds.
Full ICD-Code: The second dataset uses full ICD-Codes, where each ICD-Code encodes the most information about the patient's condition.
The separation has been performed to take advan-
tage of CatBoosts ability to handle inputs with high
cardinality. We calculate the selected evaluation met-
rics (see section 4.3) based on the results of each run
and calculate 95%-confidence for every metric. The
same procedure is repeated for the baseline models.
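Two small sketches relating to this setup: truncating the ICD codes for the low-cardinality variant, and computing the 95% confidence interval over the ten run-level metric values (using the t-distribution for the small sample is our assumption).

```python
import numpy as np
from scipy import stats

# 3-digit variant: keep only the ICD code's category prefix.
df["icd_code_3digit"] = df["icd_code"].astype(str).str[:3]

def confidence_interval(run_metrics, level=0.95):
    """Mean and half-width of the CI over per-run metric values."""
    values = np.asarray(run_metrics, dtype=float)
    half_width = stats.t.ppf((1 + level) / 2, len(values) - 1) * stats.sem(values)
    return values.mean(), half_width
```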
4.6 Baselines
We included additional baseline models in our work to evaluate the CatBoost model. We used mean and median predictors, which calculate the mean and median of the training dataset and use these values for every prediction. In our case the values are 3.9 for the mean and 2.4 for the median regressor. Such a so-called dummy regressor is the simplest possible model that still beats random guessing, even though it is independent from the actual input when making a prediction. The dummy regressors are used to set performance expectations for the task on our specific dataset.
Additionally, we used a linear regression model to
predict the LOS as a further baseline. Linear regres-
sion has been used in LOS prediction before and is
usually a popular choice, because it is widely appli-
cable and the results can be easily interpreted (Austin
et al., 2002).
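A sketch of the three baselines with scikit-learn; the dummy regressors reproduce the constant mean (3.9) and median (2.4) predictors, while the linear regression requires numerically encoded inputs (an assumption on the preprocessing, since it cannot consume raw categorical columns).

```python
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression

baselines = {
    "mean": DummyRegressor(strategy="mean"),      # predicts ~3.9 everywhere
    "median": DummyRegressor(strategy="median"),  # predicts ~2.4 everywhere
    "linear": LinearRegression(),
}

for name, estimator in baselines.items():
    estimator.fit(X_train_encoded, y_train)
    y_pred = estimator.predict(X_test_encoded)
```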
5 RESULTS & DISCUSSION
In this section, we present the results of the trained
regression models and compare our results and accu-
racy metrics to related works (Gentimis et al., 2017;
Rocheteau et al., 2021).
5.1 Prediction Results
Table 3 shows the chosen metrics for seven differ-
ent regressor models. The first three models are our
three baseline models. Due to the skew of the LOS distribution, the median model, which predicts slightly lower LOS times, has slightly worse performance on the MSE. Looking at our main metrics for practical usability, however, the median model achieves the better MAE and MAPE values. The linear regression does not add any value and in fact performs worse, which shows that more complex models are needed to solve the use case.
The following four models are different versions of the CatBoost model (trained on two different datasets and using two different loss functions). All four CatBoost models are better than the baseline models. The best results in terms of MAE and MAPE are achieved by the CatBoost (RMSLE, 3-digit ICD code) model, where we get an MAE of 2.36 and a MAPE of 136. Compared with the baseline models, the improvement is significant but still leaves room for further gains. Especially the change to the RMSLE loss function that we implemented for the CatBoost architecture achieved a significant gain compared to the RMSE loss function.

Table 3: Regression results of the CatBoost model compared to the defined baselines. Two separate datasets (3-digit and full ICD codes) are used during the experiments and the metrics are calculated for each. Results are displayed as 95% confidence intervals; the intervals are not calculated for the dummy predictors, because they are deterministic. The CatBoost model is trained with both the RMSE and the RMSLE loss function. For the first four metrics lower values are better. The R2 score is optimal at a value of one.

Model                                MSE         MAE        MAPE   MSLE       R2
Mean                                 25.03       3.15       372    0.66       0
Median                               27.6        2.88       229    0.57       -0.09
Linear Regression                    27.3±0.0    3.34±0.00  379±0  0.73±0.00  -0.09±0.00
CatBoost (RMSE, 3-digit ICD code)    20.23±0.01  2.61±0.00  209±0  0.42±0.00  0.18±0.00
CatBoost (RMSLE, 3-digit ICD code)   21.59±0.00  2.36±0.00  136±0  0.36±0.00  0.13±0.00
CatBoost (RMSE, full ICD code)       19.82±0.00  2.58±0.00  206±0  0.41±0.00  0.18±0.00
CatBoost (RMSLE, full ICD code)      21.70±0.00  2.42±0.00  129±0  0.36±0.00  0.11±0.00

The
performance increase is in line with other research
in use cases that have very skewed distributions in
the prediction variable (Rocheteau et al., 2021; Feng
et al., 2014; Rengasamy et al., 2020). As can be seen
in Figure 3, the model trained with the RMSLE loss
function managed to further centralize the loss around
zero, with around 43 per cent of errors being below
one day. As negative values signify that predictions
are lower than the target, the overall shift to the right
shows that the model with the RMSLE loss function
is more likely to overpredict. The model predictions
do not vary greatly over the ten runs, as the 95% confidence intervals in Table 3 show.
Figure 4 shows the feature importance of the
model provided by the CatBoost library. The figure
shows that the top features are all related directly to
the patient condition, with the most important feature
being the actual diagnosis. Furthermore, the graph
shows that engineered features have made an overall
impact on the prediction, since 4 out of the top 10 fea-
tures to the model have been created. On the other hand, the high influence of ed los can be seen as a limitation of the model, since the ed los can be influenced by more than the medical condition of the patient. Operational factors, like holding patients in the ED because of hospital unit overcrowding, would prolong the ED stay as well. Therefore, the exact composition of the ed los and its actual influence on the hospital LOS should be further investigated.

Figure 3: Comparison of prediction errors for the RMSE (blue) and RMSLE (light-blue) loss functions. RMSLE had a lower variance, further centering the errors around zero. Prediction errors greater than 20 days are hidden to improve readability.

Figure 4: Top 10 most important features to the CatBoost model.
Lastly, the graph shows that mostly medical fea-
tures, related to the patient condition directly, are of
importance to the model. The ICD-Code had the
largest impact over all the features used during train-
ing, significantly impacting the final prediction. Com-
paring the results on the two datasets from Table 3
shows an increase in performance when using the full
ICD-Code, which further confirms the importance of
accounting for categorical data.
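The importances behind Figure 4 can be read from the trained model via the CatBoost API, e.g.:

```python
# prettified=True returns a DataFrame sorted by importance.
importances = model.get_feature_importance(prettified=True)
print(importances.head(10))  # the ten features shown in Figure 4
```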
5.2 Comparison with Related Work
As described above we compare our results to the re-
sults from Rocheteau et al. (Rocheteau et al., 2021)
and Gentimis et al. (Gentimis et al., 2017). It is im-
portant to stress that both works have solved prediction tasks different from ours. Gentimis et al. (Gentimis et al., 2017) predict the LOS of the patient after they leave the ICU. Rocheteau et al. (Rocheteau et al., 2021) predict the time the patient is staying
in the ICU. They used different data compared to our ED use case. Nevertheless, we include a comparison to see whether the performance metrics of the predictions are in a similar range.

Table 4: Performance of the regressor model compared to the works of Rocheteau and Gentimis (Gentimis et al., 2017; Rocheteau et al., 2021). The same metrics are used for comparison.

Model             MSE    MAE   MAPE   MSLE  R2    Short vs. Long
CatBoost (RMSE)   20.23  2.61  209    0.42  0.18  74%
CatBoost (RMSLE)  21.59  2.36  136    0.36  0.13  78%
TPC (MSE)         21.6   2.21  154.3  1.80  0.27  -
TPC (MSLE)        21.7   1.78  63.5   0.70  0.27  -
Gentimis NN       -      -     -      -     -     79%
Our reported metrics match the ones from Ro-
cheteau et al. (Rocheteau et al., 2021). Gentimis
et al. (Gentimis et al., 2017) have chosen a classi-
fication between long stays and short stays, where a
long stay is predicted, when the LOS is greater than 5
days. Consequently, the prediction results of the Cat-
Boost model must be transformed to be comparable.
The transformation has been performed by retroactively classifying the prediction outputs and the target variable depending on whether the value is lower or greater than 5. Afterwards, the accuracy is calculated
greater than 5. Afterwards, the accuracy is calculated
by comparing both values, which results in the same
metric used by Gentimis et al.
Table 4 displays the results of all metrics, the last
column being the accuracy on classifying short vs.
long stays, which Gentimis et al. have done. The
CatBoost model produced similar but slightly worse
results compared to the Temporal Pointwise Convo-
lution Network created by Rocheteau et al. when it
comes to MAE and MAPE and relatively compara-
ble results when it comes to MSE. The distribution of
ICU LOS is significantly narrower compared to regular ward LOS after ED dismissal, which might be part of the explanation.
ter performance when switching from RMSE/MSE
to RMSLE/MSLE was also observed by Rocheteau
et al. Our transformed classification metric shows
almost identical accuracy performance (78% for the
CatBoost RMSLE, 3-Digit Groups) as the results of
Gentimis et al. (79%).
6 CONCLUSION
In this paper, we have used the ED data of the MIMIC-IV dataset, released in 2020, to predict the clinical LOS of patients after their ED stay. We have trained a CatBoost model on the LOS prediction task and implemented the MSLE loss function as a transfer from
other models to the CatBoost architecture. The per-
formed feature engineering had a positive effect on
the prediction quality, as 4 out of the top 10 important
features are engineered, which further reiterates the
importance of taking advantage of domain knowledge
to extract additional information. Our prediction per-
formance was better than the implemented baseline
models and comparable to similar use cases of predic-
tions using the MIMIC dataset. The average absolute
error of 2.36 days is a significant improvement and
might be used for better planning in hospitals but still
has room for improvement. A further reduction of
the prediction error based on our presented approach
will be the target for future research. Potential ideas could be to refine the feature engineering process with more domain knowledge, e.g. by further grouping high-dimensional categorical features, or to benchmark further model architectures, e.g. Generalized Linear Models (GLMs), which have been shown to be effective in dealing with skewed data.
REFERENCES
Austin, P. C., Rothwell, D. M., and Tu, J. V. (2002). A com-
parison of statistical modeling strategies for analyzing
length of stay after cabg surgery. Health Services and
Outcomes Research Methodology, 3(2):107–133.
Bacchi, S., Tan, Y., Oakden-Rayner, L., Jannes, J., Kleinig,
T., and Koblar, S. (2022). Machine learning in the
prediction of medical inpatient length of stay. Internal
medicine journal, 52(2):176–185.
Buttigieg, S. C., Abela, L., and Pace, A. (2018). Vari-
ables affecting hospital length of stay: A scoping re-
view. Journal of Health Organization and Manage-
ment, 32(3):463–493.
Cerda, P. and Varoquaux, G. (2022). Encoding high-
cardinality string categorical variables. IEEE
Transactions on Knowledge and Data Engineering,
34(3):1164–1176.
Chang, Y.-H., Shih, H.-M., Wu, J.-E., Huang, F.-W., Chen,
W.-K., Chen, D.-M., Chung, Y.-T., and Wang, C. C.
(2022). Machine learning–based triage to identify
low-severity patients with a short discharge length
of stay in emergency department. BMC Emergency
Medicine, 22(1):1–10.
Christ, M., Grossmann, F., Winter, D., Bingisser, R., and
Platz, E. (2010). Modern triage in the emergency
department. Deutsches Ärzteblatt International, pages 892–898.
De Jong, J. D., Westert, G. P., Lagoe, R., and Groenewe-
gen, P. P. (2006). Variation in hospital length of stay:
do physicians adapt their length of stay decisions to
what is usual in the hospital where they work? Health
Services Research, 41(2):374–394.
Dorogush, A. V., Ershov, V., and Gulin, A. (2018). Cat-
boost: gradient boosting with categorical features sup-
port. ArXiv, abs/1810.11363.
Feng, C., Hongyue, W., Lu, N., Chen, T., He, H., Lu, Y.,
and Tu, X. (2014). Log-transformation and its impli-
cations for data analysis. Shanghai archives of psychi-
atry, 26:105–9.
Gentimis, T., Alnaser, A. J., Durante, A., Cook,
K., and Steele, R. (2017). Predicting hospital
length of stay using neural networks on mimic
iii data. 2017 IEEE 15th Intl Conf on De-
pendable, Autonomic and Secure Computing, 15th
Intl Conf on Pervasive Intelligence and Comput-
ing, 3rd Intl Conf on Big Data Intelligence and
Computing and Cyber Science and Technology
Congress(DASC/PiCom/DataCom/CyberSciTech).
Johnson, A., Bulgarelli, L., Horng, S., Celi, L. A., and Mark, R. (2021). MIMIC-IV (version 1.0). https://doi.org/10.13026/s6n6-xd98.
Launay, C., Rivière, H., Kabeshova, A., and Beauchet, O.
(2015). Predicting prolonged length of hospital stay
in older emergency department users: use of a novel
analysis method, the artificial neural network. Euro-
pean journal of internal medicine, 26(7):478–482.
Rengasamy, D., Rothwell, B., and Figueredo, G. P. (2020).
Asymmetric loss functions for deep learning early
predictions of remaining useful life in aerospace gas
turbine engines. 2020 International Joint Conference
on Neural Networks (IJCNN), page 1–7.
Rocheteau, E., Liò, P., and Hyland, S. (2021). Temporal
pointwise convolutional networks for length of stay
prediction in the intensive care unit. Proceedings of
the Conference on Health, Inference, and Learning.
Sadler, B. L., Berry, L. L., Guenther, R., Hamilton, D. K.,
Hessler, F. A., Merritt, C., and Parker, D. (2011). Fa-
ble hospital 2.0: the business case for building bet-
ter health care facilities. Hastings Center Report,
41(1):13–23.
Safaei, N., Safaei, B., Seyedekrami, S., Talafidaryani, M.,
Masoud, A., Wang, S., Li, Q., and Moqri, M. (2022).
E-catboost: An efficient machine learning framework
for predicting icu mortality using the eicu collabora-
tive research database. PLOS ONE, 17(5).
Stone, K., Zwiggelaar, R., Jones, P., and Mac Parthaláin,
N. (2022). A systematic review of the prediction of
hospital length of stay: Towards a unified framework.
PLOS Digital Health, 1(4):e0000017.
Stone, K., Zwiggelaar, R., Jones, P., and Parthaláin, N. M.
(2019). Predicting hospital length of stay for accident
and emergency admissions. In UK Workshop on Com-
putational Intelligence, pages 283–295. Springer.
Yu, L., Halalau, A., Dalal, B., Abbas, A. E., Ivascu, F.,
Amin, M., and Nair, G. B. (2021). Machine learning
methods to predict mechanical ventilation and mortal-
ity in patients with covid-19. PLOS ONE, 16(4).
Zebin, T., Rezvy, S., and Chaussalet, T. J. (2019). A deep
learning approach for length of stay prediction in clin-
ical settings from medical records. In 2019 IEEE
Conference on Computational Intelligence in Bioin-
formatics and Computational Biology (CIBCB), pages
1–5. IEEE.
Zolbanin, H. M., Davazdahemami, B., Delen, D., and
Zadeh, A. H. (2020). Data analytics for the sustainable
use of resources in hospitals: predicting the length of
stay for patients with chronic diseases. Information &
Management, page 103282.