PT-MESS: A Problem-transformation Approach for Multi-event

Survival Analysis

Michela Venturini

1,2 a

, Felipe Kenji Nakano

1,2 b

and Celine Vens

1,2 c

KU Leuven, Campus KULAK, Department of Public Health and Primary Care,

Etienne Sabbelaan 53, 8500 Kortrijk, Belgium

Itec, imec research group at KU Leuven, Etienne Sabbelaan 53, 8500 Kortrijk, Belgium

Keywords:

Data Scarcity, Right-censoring, Survival Analysis, Multi-target Regression.

Abstract:

Multi-event survival analysis is an under-explored ﬁeld in literature, typically addressed by modeling each

event independently or implying speciﬁc event settings. In this context, problem transformations approaches

offer a promising alternative to rephrase the setting into standard multi-target regression. Nevertheless, they

also suffer from the intrinsic presence of partial information in time-to-event data, since their application often

requires the exclusion of censored observations, thus potentially discarding valuable information. In this work,

we propose a novel Problem Transformation Approach for Multi-event Survival analySis (PT-MESS), which

is capable of exploiting partial information, by encoding the survival outcome in a risk score based on the time-

to-event distribution estimation. This approach allows the use of any multi-target machine learning model to

address the original survival task. Using random forest as the underlying model, we conducted experiments

using real-data multiple benchmarks from the medical domain and synthetic datasets. Our results revealed

that PT-MESS provides superior or competitive results compared to competitors from the literature, especially

when the events considered had a similar survival distribution.

1 INTRODUCTION

Survival analysis (SA) refers to a ﬁeld of statistics

that deals with time-to-event data, which often con-

cerns medical applications where the outcome of in-

terest is the time until occurrence of one (or more)

adverse outcomes (e.g., death or cancer recurrence).

SA is characterized by the presence of partial infor-

mation, referred to as censoring, mainly due to indi-

viduals that are either lost to follow-up or do not ex-

perience the event during the follow-up, thus leaving

the true time-to-event unknown. This phenomenon

can be considered as a type of data scarcity as it leads

to lack of labels and possibly poor prediction perfor-

mance in case of high censoring rate. SA has been

tackled either with algorithm-adaptation or problem-

transformation machine learning approaches. While

the ﬁrst scenario implies adapting existing algorithms

to handle survival data (Ping Wang, 2019), the latter

one focuses on transforming time-to-event data to a

well-studied problem, such as classiﬁcation and re-

https://orcid.org/0000-0002-9947-0218

https://orcid.org/0000-0002-4884-9420

https://orcid.org/0000-0003-0983-256X

gression, enabling off-the-shelf models to be straight-

forwardly applied (Vock et al., 2016). For instance,

Vock et.al. (Vock et al., 2016) introduced an ap-

proach that employs weights to address SA as a bi-

nary classiﬁcation task, however it excludes censored

observations, potentially overlooking useful informa-

tion. Despite the existence of numerous machine

learning approaches for single-event SA, the literature

presents few studies on the multi-event setting (Tjan-

dra et al., 2021; Ishwaran et al., 2014). Furthermore,

these studies have proposed algorithm-adaptation ap-

proaches which are solely tailored for competing and

semi-competing risk analysis. That is, events are nec-

essarily mutually exclusive (e.g., death from heart

attack or breast cancer) or chronologically ordered

(e.g., Alzheimer’s disease onset and death), thus be-

ing inadequate for more general settings. Although

multiple outputs have been extensively studied in the

machine learning literature (Xu et al., 2020), to date

no problem-transformation approach exists to cast

multi-event survival analysis to a multi-output predic-

tion problem. In this work, we propose a problem-

transformation approach, namely Problem Transfor-

mation Multi-Event Survival AnalySis (PT-MESS),

Venturini, M., Nakano, F. and Vens, C.

PT-MESS: A Problem-transformation Approach for Multi-event Survival Analysis.

DOI: 10.5220/0011531600003523

In Proceedings of the 1st Workshop on Scarce Data in Artiﬁcial Intelligence for Healthcare (SDAIH 2022), pages 29-34

ISBN: 978-989-758-629-3

 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

which can be employed in any multi-event survival

setting. More speciﬁcally, our approach relies on the

survival distribution relative to each event to encode

outcomes, allowing the inclusion of censored data in

the model. Speciﬁcally, PT-MESS encodes time and

censoring information into a score, enabling us to

treat any multi-event SA task as a multi-target regres-

sion problem. By performing experiments on pub-

licly available medical datasets, employing a standard

multi-target regression model, we showcase that PT-

MESS leads to superior or competitive results against

methods from the literature in the majority of the

cases.

2 PROPOSED METHOD

Survival outcome typically consists of two elements,

namely time to event (or censoring) and binary sta-

tus, indicating whether the individual experiences the

event or is censored at the given time point. We pro-

pose to encode multi-event SA as a multi-target re-

gression task, by transforming the survival outcome,

for each event, into a single score. Speciﬁcally, the

encoding process is based on the Kaplan-Meier esti-

mate (Kaplan and Meier, 1958) of the survival curve

relative to each event. Kaplan-Meier curves are used

to describe the event distribution in the population,

taking into account time and censoring information.

Typically, multi-event survival problems are encoded

as follows. Given N individuals and K events, each

individual i = 1, 2, ..., N is associated with three vari-

ables: the covariates vector x

(i)

, the censoring statuses

for each event c

(i)

and the time information o

(i)

. In

this context, the Kaplan-Meier curve estimate of the

survival curve for each event k,

(t), is deﬁned as

follows:

(t) =

∏

j:t

≤t

(1 −

( j)

) (1)

with t

a time when at least one event k happened,

( j)

the number of events k that happened at time t

and r

( j)

the individuals still at risk for event k at time

, estimated from c

and o

. PT-MESS incorporates

time and censoring information into the new outcome

(i)

as follows:

(i)

= 1 −

(i)

t=0

ˆs

inf

t=0

ˆs

(2)

where

(i)

t=0

ˆs

is the restricted mean survival time, up

to o

(i)

, the time at which patient i experiences the

event or is censored, and

inf

t=0

ˆs

is the expected sur-

vival time of the population, for event k. Thus, Equa-

tion 2 can be seen as an indication of how early pa-

tient i is expected to experience event k w.r.t. the con-

sidered population. The higher the score, the lower

the risk of experiencing the event. The new outcome

for each individual i thus becomes m

(i)

, a vector con-

taining a risk score for each event, that can be the

outcome to any multi-target regression model. Given

their efﬁciency and ﬂexibility in handling high dimen-

sional datasets, we employed multi-target random for-

est (Kocev et al., 2013) as the underlying regression

model in our approach. Such model intrinsically ex-

ploits correlation among events in the splitting rule of

the individual trees, by computing the average impu-

rity reduction across them.

3 EXPERIMENTAL SETUP

3.1 Datasets

We employed 5 publicly available datasets, 3 of

which are real world datasets from the medical do-

main (ADNI

, MIMIC (Johnson et al., 2016) and

CIBMTR

) and 2 were synthetic (scrData

and Syn-

thetic). All of the real ones contain semi-competing

risks, while the synthetic datasets contain both semi-

competing risk and multi-event (events are not mu-

tually exclusive and can happen in any order). Pre-

processing of ADNI and MIMIC, and creation of Syn-

thetic dataset were performed according to (Tjandra

et al., 2021). Further details are reported at Table 1.

It can be seen that these datasets present different

characteristics. Namely, MIMIC presents a consider-

able high number of features in comparison to ADNI.

Similarly, CIBMTR contains a rather limited number

of instances which are described by a considerably

higher number of features. It can also be noticed that

most of the datasets present scarce data where the cen-

soring rate is frequently above 80%.

Data used in the preparation of this article were ob-

tained from the Alzheimer’s Disease Neuroimaging Initia-

tive (ADNI) database (adni.loni.usc.edu). The ADNI was

launched in 2003 as a public-private partnership, led by

Principal Investigator Michael W. Weiner, MD. The pri-

mary goal of ADNI has been to test whether serial mag-

netic resonance imaging (MRI), positron emission tomog-

raphy (PET), other biological markers, and clinical and neu-

ropsychological assessment can be combined to measure

the progression of mild cognitive impairment (MCI) and

early Alzheimer’s disease (AD).

https://cran.r-project.org/web/packages/

SemiCompRisks/index.html

SDAIH 2022 - Scarce Data in Artiﬁcial Intelligence for Healthcare

Table 1: Characteristics of the datasets employed in this work where semi-competing datasets are represented using (S) and

independent as (I). The terminal event in each dataset highlighted in bold.

Dataset (Setting) Features Instances Events Event (Censoring rate)

MIMIC (S) 13801 3822 3 1 (0.87), 2 (0.91) and 3 (0.92)

ADNI (S) 1917 1024 2 1 (0.82) and 2 (0.99)

CIBMTR (S) 9651 30 2 1 (0.82) and 2 (0.62)

ScrData (S) 2000 4 2 1 (0.34) and 2 (0.50)

Synthetic (I) 5000 15 2 1 (0.70) and 2 (0.71)

3.2 Comparison Methods

For a fair comparison, we focused on competitive

methods that also employ ensembles of decision

trees. To the best of our knowledge, the literature

only presents a single study on random forests for

multi-event SA (Ishwaran et al., 2014), nonetheless

it is speciﬁcally tailored for competing risks, assum-

ing that each patient can only experience one event.

Hence, applying this method would require adap-

tations which are not straightforward. Further, the

method proposed by Tjandra et.al. was excluded

(Tjandra et al., 2021). Although prominent, the au-

thors have proposed a deep learning approach which

requires time-consuming parameters tuning and lacks

in interpretability. Thus, we compared the following

methods:

• Random survival forest (RSFi): this approach

learns a separate RSF (Ishwaran et al., 2008) per

event and assumes that the time-to-events among

all the events are independent.

• Inverse probability of censoring weighting

(IPCWi): this problem transformation approach

translates the survival task into a binary classiﬁ-

cation (Dong et al., 2020). Similarly to RSFi, this

approach learns a separate model (random forest

classiﬁer) for each event.

• PT-MESS independent (PT-MESSi): A variant

of our problem transformation approach, which

learns a separate single-target random forest re-

gressor for each event;

• (PT-MESS): A variant of our approach that builds

a multi-target random forest regressor that consid-

ers all events at once;

Each ensemble model was trained with 200 trees

and all other parameters were left to their default

values

. RSF is probably the most prominent ap-

proach using ensemble of trees, thus its comparison is

mandatory. Similarly, IPCW is employed as compar-

ison because it is a problem-transformation approach

https://scikit-learn.org/stable

https://scikit-survival.readthedocs.io/en/stable

comparable to ours, nonetheless it is worth mention-

ing that this method discards censored data. More-

over, both RSF and IPCW are originally designed for

single-event applications. Thus, to make them com-

parable, it is necessary to build an independent model

per event.

3.3 Evaluation

To estimate predictive performances of the models,

we employed Harrell’s concordance index(Harrell

et al., 1982) (C-index), one of the most used metrics

in survival analysis. C-index is given by

C =

∑

i,l

I(T

> T

) × I(r

> r

) × ∆

∑

i,l

I(T

> T

) × ∆

(3)

where i and l refer to pairs of observations in the sam-

ple i, l = 1, ..., N with i ̸= l, r

∈ R is the outcome risk

score and T

is the observed time-to event. ∆

discards

pairs of observations that are not comparable because

the smaller survival time is censored. The C-index

estimates how well a predicted risk score ranks obser-

vations according to their true time-to-event, taking

into account censored data. Moreover, it is easy to in-

terpret: C = 0.5 indicates a non-informative model

prediction, while C = 1 indicates that the model is

perfectly capable of separating patients with differ-

ent outcomes. We evaluated our results based on

per-event C-index as well as averaged C-index across

events.

4 RESULTS

We present our results in Figure 1. All experiments

were repeated using 5-fold cross-validation, strati-

ﬁed according to the rarest event(considering the cen-

soring distribution) in the dataset. Average C-index

(per single event, and overall) is reported, together

with standard deviation. As can be seen, our pro-

posed method often has the upper-hand. More specif-

ically, PT-MESS achieves the overall superior perfor-

mance in 3 out of 5 datasets (CMBTR, scrData and

Synthetic). Furthermore, when analyzing the perfor-

PT-MESS: A Problem-transformation Approach for Multi-event Survival Analysis

Figure 1: Results obtained on each dataset. We report the average performance obtained using the C-index and its standard

deviation obtained using 5-fold cross validation.

mance per single event, our proposed method sur-

passed the competitors in 7 out of 11 cases. As

opposed to that, PT-MESSi managed, at its best, to

be competitive with RSF and IPCW. We interpret

this as an indication that PT-MESS is preferable over

its counterpart and the competitors, considering both

its performance and its computational complexity, as

only one single model is required.

Additionally, we observed that PT-MESS out-

performed its problem-transformation approach com-

petitor, IPCWi, in several cases, as seen in events 1

and 2 of MIMIC, in event 1 of Synthetic and in the

entire scrData dataset. As reported in Table 1, these

datasets present a very high censoring rate. Hence,

we may assume that PT-MESS is capable of correctly

encoding the information of data which did not expe-

rience the event, whereas overlooking this data might

lead to sub-optimal results.

Surprisingly, RSF and IPCWi yielded consider-

ably better results in speciﬁc cases with very high cen-

SDAIH 2022 - Scarce Data in Artiﬁcial Intelligence for Healthcare

Figure 2: Kaplan-Meier estimate of the survival distribution for each event in the datasets: MIMIC, CIBMTR, ADNI and

scrData, Synthetic.

soring rate, as in MIMIC-event 3, and in the ADNI

dataset. We believe that this is related to the low cor-

relation of the events. As shown in Figure 2, their

Kaplan Meier curves reveal that events in ADNI have

very different survival curves (or time-to-event dis-

tributions). A similar behavior is observed in the

MIMIC dataset where events 1 and 2 behave iden-

tically, whereas the curve associated to event 3 fol-

lows a signiﬁcantly different tendency. This ﬁnding is

further reinforced by the curve of scrData where the

events appear to be substantially correlated, leading

to superior results by our method.

5 CONCLUSIONS

We introduced a problem transformation approach

to address multi-event survival analysis, where data

scarcity is present as right-censored outcomes. Pre-

cisely, our approach encodes the typical survival out-

come in a single score per event, based on Kaplan-

Meier estimate of the survival curve and allows to in-

clude the censored observations in the model. As the

underlying predictive model, we chose multi-target

random survival forest. Our results revealed that PT-

MESS is capable of providing superior results, indi-

cating that predicting all events concurrently is bene-

ﬁcial over addressing them separately in the majority

of the cases. In the challenging datasets, we could

identify that our method, in its current form, struggles

to predict events which are not correlated.

Hence, future work should aim to amend such de-

ﬁciency. In this direction, we instigate the investi-

gation of hybrid approaches, where global and local

methods, may be used in cooperation to achieve su-

perior results. That is, we will extend our method

PT-MESS: A Problem-transformation Approach for Multi-event Survival Analysis

to automatically detect and address correlated events

during its building time, similarly to the concept of

predictive bi-clustering trees used in multi-label clas-

siﬁcation (Zamith et al., 2020). Finally, we would

also like to further validate our method by performing

more experiments, specially regarding datasets with

competing risks, as seen in (Ishwaran et al., 2014).

AUTHORS CONTRIBUTIONS

Conceptualization, M.V.; methodology, M.V. and

F.K.N.; software, M.V.; writing—original draft prepa-

ration, M.V. and F.K.N.; writing—review and editing,

C.V.; supervision, funding acquisition, C.V. All au-

thors have read and agreed to the published version of

the manuscript

FUNDING

This research was funded by the Research Fund

Flanders (through research project G080118N and

G0A2120N). The authors also acknowledge the

Flemish Government (AI Research Program).

REFERENCES

Dong, G., Mao, L., Huang, B., Gamalo-Siebers, M.,

Wang, J., Yu, G., and Hoaglin, D. C. (2020). The

inverse-probability-of-censoring weighting (ipcw) ad-

justed win ratio statistic: an unbiased estimator in the

presence of independent censoring. Journal of bio-

pharmaceutical statistics, 30(5):882–899.

Harrell, F., Califf, R., Pryor, D., Lee, K., and Rosati, R.

(1982). Evaluating the yield of medical tests. JAMA,

247(18):2543—2546.

Ishwaran, H., Gerds, T. A., Kogalur, U. B., Moore, R. D.,

Gange, S. J., and Lau, B. M. (2014). Random survival

forests for competing risks. Biostatistics, 15(4):757–

773.

Ishwaran, H., Kogalur, U. B., Blackstone, E. H., and Lauer,

M. S. (2008). Random survival forests. The annals of

applied statistics, 2(3):841–860.

Johnson, A. E., Pollard, T. J., Shen, L., Lehman, L.-w. H.,

Feng, M., Ghassemi, M., Moody, B., Szolovits, P.,

Anthony Celi, L., and Mark, R. G. (2016). Mimic-

iii, a freely accessible critical care database. Scientiﬁc

Data, 3(1):160035.

Kaplan, E. L. and Meier, P. (1958). Nonparametric esti-

mation from incomplete observations. Journal of the

American Statistical Association, 53(282):457–481.

Kocev, D., Vens, C., Struyf, J., and D

zeroski, S. (2013).

Tree ensembles for predicting structured outputs. Pat-

tern Recognition, 46(3):817–833.

Ping Wang, Yan Li, C. K. R. (2019). Machine Learning for

Survival Analysis: A Survey. ACM Comput. Surv.,

51(6).

Tjandra, D., He, Y., and Wiens, J. (2021). A hierarchical

approach to multi-event survival analysis. Proceed-

ings of the AAAI Conference on Artiﬁcial Intelligence,

35(1):591–599.

Vock, D. M., Wolfson, J., Bandyopadhyay, S., Adomavi-

cius, G., Johnson, P. E., Vazquez-Benitez, G., and

O’Connor, P. J. (2016). Adapting machine learn-

ing techniques to censored time-to-event health record

data: A general-purpose approach using inverse prob-

ability of censoring weighting. J Biomed Inform,

61:119–131.

Xu, D., Shi, Y., Tsang, I. W., Ong, Y.-S., Gong, C., and

Shen, X. (2020). Survey on multi-output learning.

IEEE Transactions on Neural Networks and Learning

Systems, 31(7):2409–2429.

Zamith, B., Nakano, F. K., Cerri, R., and Vens, C. (2020).

Predictive bi-clustering trees for hierarchical multi-

label classiﬁcation. ECML PKDD 2020.

SDAIH 2022 - Scarce Data in Artiﬁcial Intelligence for Healthcare