Patient Similarity Networks Integration for Partial Multimodal Datasets

Jessica Gliozzo (1,2,a), Alex Patak (2,b), Antonio Puertas-Gallardo (2,c), Elena Casiraghi (1,d) and Giorgio Valentini (1,3,e)

(1) AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy

(2) European Commission, Joint Research Centre (JRC), Ispra, Italy

(3) ELLIS - European Laboratory for Learning and Intelligent Systems, Milan Unit, Via Celoria 18, 20135, Milan, Italy

Keywords: Patient Similarity Networks, Similarity Network Fusion, Data Integration, Multi-Omics Data Integration, Missing Data, Partial Samples.

Abstract: Integration of partial samples in Patient Similarity Networks, i.e. the combination of multiple data sources when some of them are completely missing for some samples, is a largely overlooked problem in the multi-omics data integration literature for Precision Medicine. Nevertheless, in clinical practice one or more types of data are often missing for a subset of patients. We present an algorithm that combines multiple sources of data in Patient Similarity Networks when the data of one or more sources are completely missing for a subset of patients. The proposed approach relies on a message-passing learning strategy to recover and combine completely missing data, leveraging the Similarity Network Fusion algorithm. Preliminary results on TCGA breast cancer data show the effectiveness of the proposed approach.

1 INTRODUCTION

In the last decade, Patient Similarity Networks (PSN) emerged as a convenient model to integrate multiple data views and perform clustering/classification tasks to support Precision Medicine (PM) (Pai and Bader, 2018; Gliozzo et al., 2022). The aim of PM is to tailor the diagnosis, prognosis and treatment of each patient, making decisions based on her/his genomic, environmental and lifestyle data. This approach represents a paradigm shift from the broad disease categories typical of a “one size fits all” method (Akhoon, 2021). PM requires by definition the use of big heterogeneous data acquired from each patient at different levels (e.g. clinical, omics, images, etc.), which is currently possible thanks to the advent of high-throughput technologies (Lightbody et al., 2019).

PSN are a simple yet powerful way to exploit such a diverse array of data sources. A PSN is a graph where nodes are patients and edges represent the pairwise similarity between individuals computed using their clinical and/or biomolecular profiles. The underlying assumption is that patients described by similar profiles should show a similar clinical outcome (Gliozzo et al., 2022). PSN are suitable for heterogeneous data since they can be computed from every data type (Pai and Bader, 2018), providing an overview of the relationships among patients across the different data views.

a https://orcid.org/0000-0001-7629-8112

b https://orcid.org/0000-0002-9282-188X

c https://orcid.org/0000-0003-3457-4777

d https://orcid.org/0000-0003-2024-7572

e https://orcid.org/0000-0002-5694-3919

The integration of the computed PSN is not trivial, and many methods have been proposed in the literature to tackle this issue; they are collectively classified as “PSN-fusion methods” (Gliozzo et al., 2022). Surprisingly, the vast majority of these methods require “complete datasets”, in which all data sources are available for every considered sample. However, it is quite common to have completely missing data sources in multi-omics datasets (Rappoport and Shamir, 2019; Xu et al., 2021) due to the limited availability of samples, the cost of the assays and the experimental design (Conesa and Beck, 2019). A naive strategy to integrate multimodal data having partial samples, i.e. samples with one or more data sources not available, is to remove them from the dataset. While this approach is simple, it can significantly reduce the number of samples available for the integration and for further analysis. More sophisticated approaches attempt to impute missing data by exploiting information coming from other data sources (e.g. KNN imputation on the concatenated data matrices (Rappoport and Shamir, 2019)). Unfortunately, state-of-the-art approaches to recover missing data usually work when only some parts of the data are missing (e.g. when only a relatively small subset of the gene expression data for a given patient is missing), but are not able to recover a completely missing source of information for a given patient (Xu et al., 2021).

Gliozzo, J., Patak, A., Puertas-Gallardo, A., Casiraghi, E. and Valentini, G.

Patient Similarity Networks Integration for Partial Multimodal Datasets.

DOI: 10.5220/0011725500003414

In Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - Volume 3: BIOINFORMATICS, pages 228-234

ISBN: 978-989-758-631-6; ISSN: 2184-4305

© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

A related open problem is the development of methods to integrate “partial datasets”, i.e. datasets having individuals with completely missing data sources. Hence, performing good-quality imputation and integrating partial samples in PSNs remain open problems, largely overlooked in the literature (Rappoport and Shamir, 2019; Xu et al., 2021).

In this work, we propose miss-SNF: a novel algorithm that can integrate partial datasets through the cross-diffusion of information among PSN computed from different data sources. Unlike other approaches able to handle partial datasets, miss-SNF can partially reconstruct missing data by using information from different sources during the cross-diffusion process.

2 METHODS

Miss-SNF is a novel PSN-fusion method able to combine multiple biological data sources having partial samples. In particular, miss-SNF leverages the Similarity Network Fusion (SNF) algorithm (Wang et al., 2014) to integrate two or more data sources and manages partial samples using a message-passing learning strategy to recover and combine completely missing data.

2.1 SNF

SNF is a method that exploits a cross-diffusion process to pass information among PSN built from different data sources, until convergence to an integrated network. The first step of SNF is the computation, from each data modality, of a PSN $W$ expressing the pairwise similarity between the biomolecular profiles of individuals $x_i$ and $x_j$. A scaled exponential similarity kernel is exploited to compute similarity for continuous data:

$$W(i,j) = \exp\left(-\frac{\rho(x_i, x_j)}{\mu\,\varepsilon_{i,j}}\right) \qquad (1)$$

where $\rho(x_i, x_j)$ is the Euclidean distance between patients, $\mu$ is a hyperparameter related to the variance of the local model and $\varepsilon_{i,j}$ is a scaling factor taking into account the neighbourhoods $N_i$ and $N_j$ of the considered patients:

$$\varepsilon_{i,j} = \frac{\mathrm{mean}(\rho(x_i, N_i)) + \mathrm{mean}(\rho(x_j, N_j)) + \rho(x_i, x_j)}{3} \qquad (2)$$
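The kernel of Equations 1 and 2 can be illustrated with a minimal sketch. It follows the two equations as written in this section; the function names, the toy data, and the choice of averaging distances over the k nearest other patients (self excluded) are assumptions made for illustration, not details stated in the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def scaled_exp_similarity(X, k=2, mu=0.5):
    """Scaled exponential kernel of equations (1)-(2) for a list of feature vectors."""
    n = len(X)
    rho = [[euclidean(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Assumption: mean(rho(x_i, N_i)) averages over the k nearest *other* patients.
    mean_nn = []
    for i in range(n):
        nearest = sorted(rho[i][j] for j in range(n) if j != i)[:k]
        mean_nn.append(sum(nearest) / len(nearest))
    W = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            eps = (mean_nn[i] + mean_nn[j] + rho[i][j]) / 3.0  # equation (2)
            if eps > 0:  # guard against identical points (similarity stays 1)
                W[i][j] = math.exp(-rho[i][j] / (mu * eps))  # equation (1)
    return W

# Hypothetical toy data: two tight pairs of patients, far from each other.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
W = scaled_exp_similarity(X, k=2, mu=0.5)
```

In this toy example patients 0 and 1 end up much more similar to each other than to patients 2 and 3, and every entry of W is strictly positive, which is the property that makes zero-filled feature vectors unsuitable as a marker of missing data.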

From the initial PSN $W(i,j)$, two other matrices $P$ and $S$ are computed for each data modality. The “global” similarity matrix $P$, which is essential to capture the overall relationships between patients, is computed through the following normalization:

$$P(i,j) = \begin{cases} \dfrac{W(i,j)}{2\sum_{k \neq i} W(i,k)}, & \text{if } j \neq i \\ 1/2, & \text{if } j = i \end{cases} \qquad (3)$$

where for equation 3 the property $\sum_{j} P(i,j) = 1$ holds. Then a “local” similarity matrix is obtained as follows:

$$S(i,j) = \begin{cases} \dfrac{W(i,j)}{\sum_{k \in N_i} W(i,k)}, & \text{if } j \in N_i \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$

where $N_i = \{x_k \mid x_k \in kNN(x_i) \cup \{x_i\}\}$. $S$ captures the local structure of the network because it considers only local similarities in the neighbourhood of each individual, setting all the others to zero.
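As a rough illustration of Equations 3 and 4, the sketch below derives the “global” and “local” matrices from a small similarity matrix. The function names and the toy matrix are hypothetical, and neighbours are taken to be the k most similar other patients, which is one plausible reading of kNN in this setting.

```python
def global_kernel(W):
    """Equation (3): diagonal fixed at 1/2, off-diagonal entries rescaled to sum to 1/2."""
    n = len(W)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        off_diagonal = sum(W[i][k] for k in range(n) if k != i)
        for j in range(n):
            P[i][j] = 0.5 if j == i else W[i][j] / (2.0 * off_diagonal)
    return P

def local_kernel(W, k):
    """Equation (4): keep x_i and its k nearest neighbours, row-normalise, zero the rest."""
    n = len(W)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Assumption: neighbours are the k most similar other patients.
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: W[i][j], reverse=True)
        neighbourhood = [i] + ranked[:k]  # N_i also contains x_i itself
        denom = sum(W[i][j] for j in neighbourhood)
        for j in neighbourhood:
            S[i][j] = W[i][j] / denom
    return S

# Hypothetical symmetric similarity matrix for four patients.
W = [[1.0, 0.8, 0.1, 0.2],
     [0.8, 1.0, 0.3, 0.1],
     [0.1, 0.3, 1.0, 0.7],
     [0.2, 0.1, 0.7, 1.0]]
P = global_kernel(W)
S = local_kernel(W, k=1)
```

Both matrices are row-stochastic by construction, while S is sparse: with k = 1 each row keeps only the patient itself and its single closest neighbour.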

Given $m$ data modalities, $m$ different $W$, $S$ and $P$ matrices are constructed and an iterative process is applied where similarities are diffused through the $P$ matrices until convergence, that is, until all the matrices $P$ become similar. In the simplest case, when $m = 2$, $P_t^{(v)}$ denotes the $P$ matrix for data source $v \in \{1, 2\}$ at time $t$. In this case, the following recursive updating formulas describe the diffusion process:

$$P_{t+1}^{(1)} = S^{(1)} \times P_t^{(2)} \times S^{(1)T}$$

$$P_{t+1}^{(2)} = S^{(2)} \times P_t^{(1)} \times S^{(2)T} \qquad (5)$$

In other words, $P^{(1)}$ is updated by using $S^{(1)}$ from the same data source but $P^{(2)}$ from a different view, and vice versa.

SNF can be easily extended to $m > 2$ data sources:

$$P^{(s)} = S^{(s)} \times \left(\frac{\sum_{k \neq s} P^{(k)}}{m-1}\right) \times S^{(s)T} \qquad (6)$$

and the final “consensus” matrix $P^{(c)}$ is:

$$P^{(c)} = \frac{1}{m} \sum_{k=1}^{m} P^{(k)} \qquad (7)$$
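The cross-diffusion of Equations 5 to 7 can be sketched as follows. This is only an illustration under stated assumptions: a fixed number of iterations replaces a convergence check, all names and toy matrices are hypothetical, and any per-iteration normalization that reference implementations may apply is omitted because it is not part of the equations above.

```python
def matmul(A, B):
    """Plain matrix product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def snf_fuse(P_list, S_list, iterations=20):
    """Iterate equation (6) a fixed number of times, then average as in equation (7)."""
    m, n = len(P_list), len(P_list[0])
    P = [[row[:] for row in Pv] for Pv in P_list]
    for _ in range(iterations):
        updated = []
        for s in range(m):
            # Average of the "global" matrices of all the *other* sources.
            avg = [[sum(P[v][i][j] for v in range(m) if v != s) / (m - 1)
                    for j in range(n)] for i in range(n)]
            updated.append(matmul(matmul(S_list[s], avg), transpose(S_list[s])))
        P = updated  # every source is updated from the matrices at time t
    return [[sum(P[v][i][j] for v in range(m)) / m for j in range(n)]
            for i in range(n)]

# Hypothetical toy input: two sources, three patients.
P1 = [[0.5, 0.3, 0.2], [0.3, 0.5, 0.2], [0.2, 0.2, 0.6]]
P2 = [[0.5, 0.1, 0.4], [0.1, 0.5, 0.4], [0.4, 0.4, 0.2]]
S1 = [[0.6, 0.4, 0.0], [0.4, 0.6, 0.0], [0.0, 0.3, 0.7]]
S2 = [[0.7, 0.0, 0.3], [0.0, 0.7, 0.3], [0.2, 0.2, 0.6]]
fused = snf_fuse([P1, P2], [S1, S2], iterations=20)

# Sanity check: with identity S matrices the sources simply swap at each step,
# so the consensus is the plain average of the inputs.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fused_identity = snf_fuse([P1, P2], [I3, I3], iterations=3)
```

Because each update has the form S P S^T, symmetric inputs yield a symmetric fused matrix, which is what one expects from an undirected patient network.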


2.2 Miss-SNF

As mentioned above, a common issue in biological multi-modal datasets is the presence of partial samples, i.e. samples that have one or more completely missing data sources. The original SNF algorithm (described in section 2.1) requires the presence of all data modalities for each sample to perform integration. A naive solution to this problem cannot consist in setting the feature vector of the missing data source to $\mathbf{0}$ (a vector of zeros). Indeed, it is easy to see that a similarity measure applied to such a vector still yields weights different from zero (the kernel in equation 1, for instance, is strictly positive), so the missing patient would be linked to the other patients as if it had actually been measured.

To tackle this problem, we propose two different extensions of SNF:

1. Reconstruction of missing data by propagation of the information from other available sources. This approach is able to partially reconstruct missing data by using information from different sources during the cross-diffusion process performed by SNF. This can be accomplished by appropriately setting the initial values of the $W^{(s)}$ and $S^{(s)}$ matrices for the patients having no data for the source $s$ and then running the SNF algorithm.

2. Managing missing data by ignoring them. This second solution simply ignores the missing data by setting to zero all the entries of the patient $x_i$ in the matrices $W^{(s)}$, $P^{(s)}$ and $S^{(s)}$ and then running the vanilla SNF.

2.2.1 Miss-SNF with Partial Reconstruction of Missing Data (miss-SNF ONE)

The first solution, which handles missing data by partially reconstructing them during the diffusion process of SNF, is performed by changing the similarity matrices $W$, $P$ and $S$. If for a patient $x_i$ we have a completely missing source $s$, the similarity matrices are modified as follows:

• set $W^{(s)}(i,j) = 0 \;\; \forall j \neq i$ and $W^{(s)}(i,i) = 1$

• set $P^{(s)}(i,j) = 0 \;\; \forall j \neq i$ and $P^{(s)}(i,i) = 1$

• set $S^{(s)}(i,j) = 0 \;\; \forall j \neq i$ and $S^{(s)}(i,i) = 1$

Having a second data source $s' \neq s$, this implies that the update equation for node $x_i$ will be:

1. when $k \neq i$ (all terms vanish because $S^{(s)}(i,k) = 0$):

$$P_{t+1}^{(s)}(i,j) = \sum_{k \in N_i} \sum_{l \in N_j} S^{(s)}(i,k)\, S^{(s)}(j,l)\, P_t^{(s' \neq s)}(k,l) = 0$$

2. when $k = i$:

$$P_{t+1}^{(s)}(i,j) = \sum_{l \in N_j} S^{(s)}(i,i)\, S^{(s)}(j,l)\, P_t^{(s' \neq s)}(i,l) > 0, \quad \text{if } \exists\, l \text{ s.t. } S^{(s)}(j,l) > 0 \text{ and } P_t^{(s' \neq s)}(i,l) > 0$$

The result of the second equation can be different from 0 when there is a common neighbour $x_l$ between $x_i$ and $x_j$ such that $S^{(s)}(j,l) > 0$ and $P_t^{(s' \neq s)}(i,l) > 0$. In other words, we have a contribution to the missing $P_{t+1}^{(s)}(i,j)$ when there exists a common neighbour between $i$ and $j$ in, respectively, the “global” network $P$ for a different source $s' \neq s$ and the “local” network $S$ for the missing source $s$.

This implies that we can populate $P^{(s)}(i,j)$ even when data are missing for source $s$.

This procedure can be easily extended to manage missing data with $m$ different sources. Indeed, the update equation can be written as:

$$P_{t+1}^{(s)}(i,j) = \sum_{k \in N_i} \sum_{l \in N_j} S^{(s)}(i,k)\, \frac{\sum_{v \neq s} P_t^{(v)}(k,l)}{m-1}\, S^{(s)}(j,l)$$

Referring to $\frac{\sum_{v \neq s} P_t^{(v)}(k,l)}{m-1}$ as $P^{v \neq s}(k,l)$, and having set $S^{(s)}(i,k) = 0 \;\; \forall k \neq i$ and $S^{(s)}(i,i) = 1$, then:

$$P_{t+1}^{(s)}(i,j) = \sum_{k \in N_i} \sum_{l \in N_j} S^{(s)}(i,k)\, P^{v \neq s}(k,l)\, S^{(s)}(j,l) = \sum_{l \in N_j} S^{(s)}(i,i)\, P^{v \neq s}(i,l)\, S^{(s)}(j,l) = \sum_{l \in N_j} P^{v \neq s}(i,l)\, S^{(s)}(j,l)$$

According to the above equations, if $x_j$ is not missing, then $P_{t+1}^{(s)}(i,j)$ is “imputed” if $x_i$ and $x_j$ share common neighbours, respectively in the “global” network $P^{v \neq s}$ and in the “local” network $S^{(s)}$.
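The effect of the miss-SNF ONE initialisation can be checked numerically on a toy case. The sketch below is illustrative only: the helper names and matrices are hypothetical, only the row of the missing patient is overwritten (as the bullet list in this section is written), and the column of the missing patient is assumed to be zero already because no similarity towards it can be computed.

```python
def mark_missing_one(matrix, missing):
    """miss-SNF ONE: give each patient in `missing` a row equal to the unit vector e_i."""
    n = len(matrix)
    out = [row[:] for row in matrix]
    for i in missing:
        out[i] = [1.0 if j == i else 0.0 for j in range(n)]
    return out

def diffuse_step(S, P_other):
    """One update P_{t+1}^{(s)} = S^{(s)} x P^{(v != s)} x S^{(s)T}, written entrywise."""
    n = len(S)
    return [[sum(S[i][k] * P_other[k][l] * S[j][l]
                 for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

# Hypothetical toy example: patient 0 has no data in source s.
S_s = mark_missing_one([[0.0, 0.0, 0.0],
                        [0.0, 0.6, 0.4],
                        [0.0, 0.4, 0.6]], missing=[0])
# "Global" matrix of the other source, where patient 0 was measured.
P_other = [[0.5, 0.3, 0.2],
           [0.3, 0.5, 0.2],
           [0.2, 0.2, 0.6]]
P_next = diffuse_step(S_s, P_other)

# Row 0 collapses to sum_l P_other[0][l] * S_s[j][l], as in the derivation above.
expected_row0 = [sum(P_other[0][l] * S_s[j][l] for l in range(3)) for j in range(3)]
```

After a single step the row of the missing patient is no longer empty: its similarity to patient 1 becomes 0.3 × 0.6 + 0.2 × 0.4 = 0.26, borrowed from the other source through the neighbours of patient 1.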

2.2.2 Miss-SNF Ignoring Partial Samples (miss-SNF ZERO)

The second solution, which handles missing data by ignoring partial samples in the diffusion process, is again performed by changing the similarity matrices $W$, $P$ and $S$. If for a patient $x_i$ we have no data for source $s$, the similarity matrices are modified as follows:

• set $W^{(s)}(i,j) = 0 \;\; \forall j$

• set $P^{(s)}(i,j) = 0 \;\; \forall j$

• set $S^{(s)}(i,j) = 0 \;\; \forall j$

BIOINFORMATICS 2023 - 14th International Conference on Bioinformatics Models, Methods and Algorithms


In this way, the following update rule is obtained:

$$\forall j: \;\; P_{t+1}^{(s)}(i,j) = \sum_{k \in N_i} \sum_{l \in N_j} S^{(s)}(i,k)\, S^{(s)}(j,l)\, P_t^{(s' \neq s)}(k,l) = 0$$

Hence, there will be no contribution to $P^{(s)}(i,j)$, which remains equal to 0 for every $j$ throughout the diffusion process.

The final integrated “consensus” $P^{(c)}$ is computed as:

$$P^{(c)} = \left(\sum_{k=1}^{m} P^{(k)}\right) \odot M$$

where $\odot$ is a pointwise multiplication, and $M$ is a matrix of the same dimension as $P$ where $M(i,j)$ is the reciprocal of the number of sources $s$ available for the edge $(i,j)$. In other words, $1/M(i,j)$ counts how many sources are available for the edge $(i,j)$. Note that $M = M^T$.

In this way, we obtain a “consensus” $P^{(c)}$ that averages edge weights with respect to the data sources actually available for each patient.
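A small sketch may help to see how the mask $M$ turns the sum of the per-source matrices into an average over the available sources only. The names and toy matrices are hypothetical; two choices are assumptions rather than statements from the paper: a source counts as available for edge (i, j) when both patients have data in it, and an edge with no available source is given weight 0.

```python
def availability_mask(n, missing_per_source):
    """M(i,j) = 1 / (number of sources in which both i and j have data)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            count = sum(1 for missing in missing_per_source
                        if i not in missing and j not in missing)
            # Assumption: an edge with no available source keeps weight 0.
            M[i][j] = 1.0 / count if count else 0.0
    return M

def masked_consensus(P_list, missing_per_source):
    """Pointwise product of the summed P matrices with the mask M."""
    n = len(P_list[0])
    M = availability_mask(n, missing_per_source)
    return [[sum(P[i][j] for P in P_list) * M[i][j] for j in range(n)]
            for i in range(n)]

# Hypothetical toy case: patient 0 is missing in source 0, patient 2 in source 1.
missing = [{0}, {2}]
P1 = [[0.0, 0.0, 0.0], [0.0, 0.6, 0.4], [0.0, 0.4, 0.6]]
P2 = [[0.7, 0.3, 0.0], [0.3, 0.7, 0.0], [0.0, 0.0, 0.0]]
M = availability_mask(3, missing)
consensus = masked_consensus([P1, P2], missing)
```

Here the edge between patients 0 and 1 is supported by one source only, so its weight is taken as is (0.3), whereas the self-similarity of patient 1 is averaged over both sources ((0.6 + 0.7) / 2 = 0.65).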

3 RESULTS

We present some preliminary results to show the effectiveness of the proposed approach. Using multi-omics data from The Cancer Genome Atlas (TCGA) (Hutter and Zenklusen, 2018), we compared SNF applied to the complete dataset with miss-SNF applied to amputed data. We compared both the predictions on early/late-stage cancer patients and the resulting integrated adjacency matrices, in order to estimate the recovery capabilities of miss-SNF when partial samples are present in the available data.

3.1 Dataset

To evaluate the performance of miss-SNF, a breast cancer multi-omics dataset from TCGA (Tomczak et al., 2015) was downloaded through the R package “curatedTCGAData” (Ramos et al., 2020). Only primary solid tumors were considered and no technical replicates were present in any data view. The following data sources were considered:

• miRNA gene-level expression values (log2 RPM) from RNA-sequencing

• mRNA gene expression values (TPM) from RNA-sequencing

• normalized protein expression values from Reverse Phase Protein Array

Moreover, cancer stages were downloaded and the samples were dichotomized into early-stage cancer (stage I and stage II) and late-stage cancer (stage III and stage IV) (Dianatinasab et al., 2018). We considered a common set of 628 patients across data sources and removed features having missing values in protein data. In this way, a complete multi-omic dataset is obtained, where each data view has the same set of samples. The dataset was further filtered to retain only the most interesting features for each data source. A feature is removed if it has zero variance, or if it has very few unique values relative to the number of samples and the ratio of the frequency of the most common value to the frequency of the second most common value is large (Kuhn, 2021). After min-max normalization, features were filtered again based on Pearson correlation. In particular, pairs of features with an absolute correlation higher than 0.75 are selected and, for each pair, the feature with the largest mean absolute correlation with respect to the other features is removed (Kuhn, 2021). At the end of this filtering stage, we have 494 miRNA, 13220 mRNA and 202 proteins. Moreover, the complete dataset is randomly amputed by removing 10%, 20% and 30% of the samples from each data source. The only constraint is to avoid removing a patient from the third data source if it was already removed from the other two views. In this way, we avoid having a sample with no data in any source.

At the end, we obtained four different datasets:

1. Complete dataset: no partial samples are present

2. Amputed 10%: each view has 10% of samples completely missing

3. Amputed 20%: each view has 20% of samples completely missing

4. Amputed 30%: each view has 30% of samples completely missing
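The amputation scheme can be sketched as below. This is a hypothetical reconstruction, not the authors' script: the function name, the seed handling and the rounding of the number of removed samples are assumptions, and the only property taken from the text is that a patient already removed from all the other views cannot be removed from the last one.

```python
import random

def ampute(patients, n_sources=3, fraction=0.1, seed=0):
    """Pick, for each source, the set of patients whose data are removed entirely."""
    rng = random.Random(seed)
    n_remove = round(fraction * len(patients))  # rounding rule is an assumption
    removed = []
    for s in range(n_sources):
        candidates = list(patients)
        if s == n_sources - 1 and removed:
            # Constraint: never remove from the last source a patient that is
            # already missing in every other source.
            in_all_others = set(removed[0]).intersection(*removed[1:])
            candidates = [p for p in patients if p not in in_all_others]
        removed.append(set(rng.sample(candidates, n_remove)))
    return removed

# 628 patients as in the dataset described above, 30% removed per view.
removed = ampute(list(range(628)), n_sources=3, fraction=0.3, seed=1)
```

With this constraint every patient keeps at least one data source, so no node of the integrated network is left without any information.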

3.2 Experimental Setup

We aim to compare the performance of SNF (Wang et al., 2014) with that of our proposed approach miss-SNF. Since SNF can integrate only data sources without partial samples, we used SNF to integrate the complete breast cancer dataset and miss-SNF to integrate the amputed datasets. In particular, we ran both algorithms setting 20 iterations for the diffusion process and k = 20 for the k-Nearest Neighbours used to build the “local” similarity matrix S (see equation 4), as suggested by SNF’s authors (Wang et al., 2021). In both cases, the scaled exponential kernel on the Euclidean distance is used to compute the unimodal similarity matrices (k = 20, µ = 0.5). The integrated matrices obtained with SNF and miss-SNF are used to perform the prediction of late vs early-stage samples using a multiple hold-out procedure with 10 repetitions.

The label propagation algorithm (Zhu et al., 2003) is exploited to rank the patients as predicted late vs early-stage breast cancer. A threshold of 0.5 was used to dichotomize the computed normalized scores (scores are normalized between 0 and 1 by min-max normalization) and the following metrics are computed to evaluate the generalization performance on the test set: precision, recall, specificity, F-measure, accuracy, AUC, AUPRC. Moreover, the RMSE (Root Mean Squared Error) between the integrated matrix obtained by SNF ($P^{(c)}_{comp}$) and the integrated matrices obtained by miss-SNF on the amputed datasets ($P^{(c)}_{amp}$) is computed, in order to evaluate whether the proposed algorithm is able to recover the integrated dataset when partial samples are present:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(P^{(c)}_{comp}(i) - P^{(c)}_{amp}(i)\right)^2}$$

where the sum runs over the $n$ entries of the two matrices.
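For concreteness, the RMSE between two integrated matrices can be computed as in the sketch below. It assumes the error is averaged over all matrix entries, which is the usual definition but is not spelled out in the text; the function name and the toy matrices are hypothetical.

```python
import math

def rmse(A, B):
    """Root mean squared error between two equally sized matrices (lists of lists)."""
    n_entries = len(A) * len(A[0])
    squared_error = sum((a - b) ** 2
                        for row_a, row_b in zip(A, B)
                        for a, b in zip(row_a, row_b))
    return math.sqrt(squared_error / n_entries)

# Hypothetical 2x2 example: a single entry differs by 2, so RMSE = sqrt(4 / 4) = 1.
P_comp = [[1.0, 2.0], [3.0, 4.0]]
P_amp = [[1.0, 2.0], [3.0, 6.0]]
error = rmse(P_comp, P_amp)
```

A lower value means that the network fused from amputed data is closer, entry by entry, to the one fused from the complete data.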



3.3 Experimental Results

Figure 1 shows the performance of SNF and miss-SNF ONE on the different amputed datasets, while Figure 2 shows the performance of miss-SNF ZERO. In Figure 1, we can see that miss-SNF ONE achieves competitive results with respect to SNF in terms of precision, AUC and AUPRC for all the percentages of dataset amputation and, surprisingly, it even improves recall and F-measure. On the other hand, we can spot a drop in specificity and accuracy for 10% and 20% of amputation, while the results seem comparable with SNF when the dataset has 30% of missing samples. A similar behaviour is shown by miss-SNF ZERO, even if its performance is generally slightly worse (except for recall and F-measure) with respect to miss-SNF ONE, probably due to the missing data reconstruction performed by the latter. Indeed, Table 1 shows a slightly lower RMSE for miss-SNF ONE with respect to miss-SNF ZERO, and the difference increases with the percentage of missing data (0.0002 at 10%, 0.0004 at 20% and 0.0006 at 30%), showing that, as expected, miss-SNF ONE is able to partially recover the missing samples.

Figure 1: Performance of SNF and miss-SNF ONE on the different amputed datasets (10%, 20%, 30%). Results are averaged across multiple holdouts and error bars show standard error. “Prec” is precision, “Rec” is recall, “Spec” is specificity, “fmeas” is F-measure, “acc” is accuracy, “auc” is the Area under the ROC Curve and “auprc” is the Area under the Precision-Recall Curve.

Figure 2: Performance of SNF and miss-SNF ZERO on the different amputed datasets (10%, 20%, 30%). Results are averaged across multiple holdouts and error bars show standard error. “Prec” is precision, “Rec” is recall, “Spec” is specificity, “fmeas” is F-measure, “acc” is accuracy, “auc” is the Area under the ROC Curve and “auprc” is the Area under the Precision-Recall Curve.

Table 1: RMSE of miss-SNF ONE and ZERO for the different amputed (Amp.) datasets with respect to SNF applied on the complete dataset.

                 Amp. 10%   Amp. 20%   Amp. 30%
miss-SNF ONE     0.0020     0.0021     0.0022
miss-SNF ZERO    0.0022     0.0025     0.0028

4 DISCUSSION AND CONCLUSIONS

In this work we presented miss-SNF, a novel algorithm able to integrate PSN built from different biological data sources having partial samples. The proposed method leverages the non-linear integration approach based on message-passing theory presented in SNF (Wang et al., 2014) to fuse multi-modal data and to partially reconstruct the data that are missing because of partial samples, i.e. samples having one or more completely missing data sources. Many approaches able to integrate PSN computed from different sources stem from the SNF algorithm (see (Ma and Zhang, 2017; Liu and Shang, 2018; Jiang et al., 2019; Ruan et al., 2019; Rappoport and Shamir, 2019; Liu et al., 2021; Li et al., 2022; Wu et al., 2021)), but only NEMO (Rappoport and Shamir, 2019) modified the original method to take into account the presence of partial samples, which is a largely overlooked problem in the literature. Of note, NEMO requires that each pair of patients have at least one common data source to be integrated. This assumption is absent in miss-SNF. To the best of our knowledge, miss-SNF is the first “SNF-based approach” (Gliozzo et al., 2022) able to handle partial samples without such a constraint. We showed on a breast cancer multi-omic dataset that miss-SNF can achieve comparable or even better performance with respect to SNF for different percentages of partial samples in the dataset. Moreover, we showed that miss-SNF is able to partially reconstruct missing data. In future work, we plan to extensively test miss-SNF on other multi-omics cancer datasets of different sample sizes and on non-cancer datasets. Moreover, we will compare miss-SNF with state-of-the-art methods able to integrate multiple data sources and handle the presence of completely missing samples in the dataset (Rappoport and Shamir, 2019; Xu et al., 2021).

REFERENCES

Akhoon, N. (2021). Precision medicine: a new paradigm in therapeutics. International Journal of Preventive Medicine, 12.

Conesa, A. and Beck, S. (2019). Making multi-omics data accessible to researchers. Scientific data, 6(1):1–4.

Dianatinasab, M., Mohammadianpanah, M., Daneshi, N., Zare-Bandamiri, M., Rezaeianzadeh, A., and Fararouei, M. (2018). Socioeconomic factors, health behavior, and late-stage diagnosis of breast cancer: considering the impact of delay in diagnosis. Clinical breast cancer, 18(3):239–245.

Gliozzo, J., Mesiti, M., Notaro, M., Petrini, A., Patak, A., Puertas-Gallardo, A., Paccanaro, A., Valentini, G., and Casiraghi, E. (2022). Heterogeneous data integration methods for patient similarity networks. Briefings in Bioinformatics, 23(4).

Hutter, C. and Zenklusen, J. (2018). The cancer genome atlas: Creating lasting value beyond its data. Cell, 173(2):283–285.

Jiang, L., Xiao, Y., Ding, Y., Tang, J., and Guo, F. (2019). Discovering cancer subtypes via an accurate fusion strategy on multiple profile data. Frontiers in genetics, 10:20.

Kuhn, M. (2021). caret: Classification and Regression Training. R package version 6.0-90.

Li, L., Wei, Y., Shi, G., Yang, H., Li, Z., Fang, R., Cao, H., and Cui, Y. (2022). Multi-omics data integration for subtype identification of chinese lower-grade gliomas: A joint similarity network fusion approach. Computational and Structural Biotechnology Journal, 20:3482–3492.

Lightbody, G., Haberland, V., Browne, F., Taggart, L., Zheng, H., Parkes, E., and Blayney, J. K. (2019). Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application. Briefings in bioinformatics, 20(5):1795–1811.

Liu, J., Liu, W., Cheng, Y., Ge, S., and Wang, X. (2021). Similarity network fusion based on random walk and relative entropy for cancer subtype prediction of multigenomic data. Scientific Programming, 2021.

Liu, S. and Shang, X. (2018). Hierarchical similarity network fusion for discovering cancer subtypes. In International Symposium on Bioinformatics Research and Applications, pages 125–136. Springer.

Ma, T. and Zhang, A. (2017). Integrate multi-omic data using affinity network fusion (anf) for cancer patient clustering. In 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 398–403. IEEE.

Pai, S. and Bader, G. D. (2018). Patient similarity networks for precision medicine. Journal of molecular biology, 430(18):2924–2938.

Ramos, M., Geistlinger, L., Oh, S., Schiffer, L., Azhar, R., Kodali, H., de Bruijn, I., Gao, J., Carey, V. J., Morgan, M., and Waldron, L. (2020). Multiomic integration of public oncology databases in bioconductor. JCO Clinical Cancer Informatics, (4):958–971. PMID: 33119407.

Rappoport, N. and Shamir, R. (2019). Nemo: cancer subtyping by integration of partial multi-omic data. Bioinformatics, 35(18):3348–3356.

Ruan, P., Wang, Y., Shen, R., and Wang, S. (2019). Using association signal annotations to boost similarity network fusion. Bioinformatics, 35(19):3718–3726.

Tomczak, K., Czerwińska, P., and Wiznerowicz, M. (2015). Review the cancer genome atlas (tcga): an immeasurable source of knowledge. Contemporary Oncology/Współczesna Onkologia, 2015(1):68–77.

Wang, B., Mezlini, A., Demir, F., Fiume, M., Tu, Z., Brudno, M., Haibe-Kains, B., and Goldenberg, A. (2021). SNFtool: Similarity Network Fusion. R package version 2.3.1.

Wang, B., Mezlini, A. M., Demir, F., Fiume, M., Tu, Z., Brudno, M., Haibe-Kains, B., and Goldenberg, A. (2014). Similarity network fusion for aggregating data types on a genomic scale. Nature methods, 11(3):333–337.

Wu, Y., Wang, H., Li, Z., Cheng, J., Fang, R., Cao, H., and Cui, Y. (2021). Subtypes identification on heart failure with preserved ejection fraction via network enhancement fusion using multi-omics data. Computational and Structural Biotechnology Journal, 19:1567–1578.

Xu, H., Gao, L., Huang, M., and Duan, R. (2021). A network embedding based method for partial multi-omics integration in cancer subtyping. Methods, 192:67–76.

Zhu, X., Ghahramani, Z., and Lafferty, J. D. (2003). Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International conference on Machine learning (ICML-03), pages 912–919.
