Authors:
Marcel Ochocki 1; Michal Marczyk 1,2 and Joanna Zyla 1
Affiliations:
1 Department of Data Science and Engineering, Silesian University of Technology, Akademicka 16, Gliwice, Poland
2 Breast Medical Oncology, Yale Cancer Center, Yale School of Medicine, New Haven, CT, U.S.A.
Keyword(s):
Unsupervised Learning, Data Normalization, Dimensionality Reduction, Single-Cell Sequencing.
Abstract:
Over the decades, improvements in high-throughput molecular biology techniques have made it possible to sequence transcripts from single cells (scRNA-Seq) instead of bulk material. Implementing these new techniques requires innovative analytical methods and knowledge about their performance. Data normalization is a crucial step in the bioinformatic pipeline applied in scRNA-Seq analysis. We evaluated the impact of six commonly used normalization methods on two dimensionality reduction methods, namely tSNE and UMAP, using three real scRNA-Seq datasets. We tested dispersion and clustering efficiency using three clustering algorithms after dimensionality reduction. Our results demonstrated that simple normalization methods, such as log2 or Freeman-Tukey, as well as scran normalization, consistently outperformed other scRNA-Seq-dedicated techniques, yielding superior dimensionality reduction and clustering efficiency for small and medium-sized datasets. Although no dimensionality reduction method or clustering technique yielded a statistically significant improvement in results, the Louvain clustering method consistently showed lower performance. We conclude that the choice of normalization technique should be carefully tailored to the dataset's size and characteristics, since it may affect the final results of the processing pipeline.
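
For illustration, the two simple transforms named in the abstract can be sketched as follows. The abstract does not give the exact formulas used, so this sketch assumes the common textbook forms: log2(x + 1) with a pseudocount of 1, and the Freeman-Tukey transform sqrt(x) + sqrt(x + 1). The function names and the example counts are hypothetical, and any library-size scaling the authors may have applied is omitted.

```python
import math

def log2_normalize(counts, pseudocount=1.0):
    """Apply log2(x + pseudocount) to each raw count (pseudocount of 1 is an assumption)."""
    return [math.log2(x + pseudocount) for x in counts]

def freeman_tukey(counts):
    """Apply the Freeman-Tukey transform sqrt(x) + sqrt(x + 1) to each raw count."""
    return [math.sqrt(x) + math.sqrt(x + 1) for x in counts]

# Made-up raw counts for a single cell, for illustration only.
raw_counts = [0, 1, 3, 15]

log2_values = log2_normalize(raw_counts)
ft_values = freeman_tukey(raw_counts)

print(log2_values)
print(ft_values)
```

Both transforms compress large counts and are defined at zero, which matters for sparse single-cell count data.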