Consensus Clustering for Cancer Gene Expression Data
Isidora Šašić, Sanja Brdar, Tatjana Lončar-Turukalo, Helena Aidos, Ana Fred
2017
Abstract
Clustering algorithms are extensively used on patient tissue samples to group and visualize microarray data. High dimensionality and probe-specific noise make selecting an appropriate clustering algorithm a difficult task. This study presents a large-scale analysis of three clustering algorithms, k-means, hierarchical clustering (HC) and evidence accumulation clustering (EAC), on thirty-five cancer gene expression data sets selected to benchmark clustering performance. Separate performance analyses were carried out on data sets from the Affymetrix and cDNA chip platforms to examine the possible influence of the microarray technology. The study revealed that no consistent ranking of the algorithms can be inferred, though in general EAC offered the best compromise between adjusted Rand index (ARI) and variance. However, the results indicated that the variance of ARI under repeated k-means initializations offers useful information on whether more complex clustering techniques are needed. If repeated k-means converges to the same partition, and HC confirms that partition, there is no need to run EAC. Under moderately or highly variable ARI in repeated k-means, however, EAC should be used to reduce the uncertainty of the clustering and unveil the data structure.
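The diagnostic described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that ARI is scored against known reference labels, as is usual for benchmark data sets, and that the partitions from repeated k-means runs are already available as label lists. The function names `adjusted_rand_index`, `ari_variance` and `co_association` are hypothetical, and the co-association matrix shows only the evidence-gathering step of EAC, not the final consensus partition.

```python
from collections import Counter
from math import comb
from statistics import pvariance


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same samples")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:  # fewer than two samples: nothing to compare
        return 1.0
    # Pair counts from the contingency table and from its row/column sums.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def ari_variance(partitions, reference):
    """Population variance of ARI over repeated runs (assumed diagnostic)."""
    scores = [adjusted_rand_index(reference, p) for p in partitions]
    return pvariance(scores), scores


def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


# Toy data: six samples, two reference classes, three hypothetical k-means runs.
reference = [0, 0, 0, 1, 1, 1]
stable_runs = [[1, 1, 1, 0, 0, 0]] * 3
unstable_runs = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [0, 1, 0, 1, 0, 1]]

stable_var, _ = ari_variance(stable_runs, reference)
unstable_var, _ = ari_variance(unstable_runs, reference)
print("stable variance:", stable_var)
print("unstable variance:", unstable_var)
```

In this toy setting the stable runs give zero ARI variance, which corresponds to the case where EAC is unnecessary, while the unstable runs give a positive variance. How large the variance must be before switching to EAC is not stated in the abstract, so no threshold is assumed here.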
Paper Citation
in Harvard Style
Šašić I., Brdar S., Lončar-Turukalo T., Aidos H. and Fred A. (2017). Consensus Clustering for Cancer Gene Expression Data. In - BIOINFORMATICS, (BIOSTEC 2017) ISBN , pages 0-0. DOI: 10.5220/0006174500001488
in Bibtex Style
@conference{bioinformatics17,
author={Isidora Šašić and Sanja Brdar and Tatjana Lončar-Turukalo and Helena Aidos and Ana Fred},
title={Consensus Clustering for Cancer Gene Expression Data},
booktitle={ - BIOINFORMATICS, (BIOSTEC 2017)},
year={2017},
pages={},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006174500001488},
isbn={},
}
in EndNote Style
TY - CONF
JO - - BIOINFORMATICS, (BIOSTEC 2017)
TI - Consensus Clustering for Cancer Gene Expression Data
SN -
AU - Šašić I.
AU - Brdar S.
AU - Lončar-Turukalo T.
AU - Aidos H.
AU - Fred A.
PY - 2017
SP - 0
EP - 0
DO - 10.5220/0006174500001488