
Paper

Authors: Karolina Widzisz 1 ; Mateusz Kania 2 ; Joanna Zyla 3 and Andrzej Polański 1

Affiliations: 1 Department of Computer Graphics, Vision and Digital Systems, Silesian University of Technology, Akademicka 16, Gliwice, Poland ; 2 Department of Applied Informatics, Silesian University of Technology, Akademicka 16, Gliwice, Poland ; 3 Department of Data Science and Engineering, Silesian University of Technology, Akademicka 16, Gliwice, Poland

Keyword(s): scRNA-seq, Clustering Performance, Binary Data, Data Information Reduction.

Abstract: The primary objective of this study was to test the hypothesis that binary information on the presence or absence of gene expression is sufficient to capture the heterogeneity inherent in single-cell RNA sequencing (scRNA-seq) data. The hypothesis posits that valuable insights about cellular diversity can be obtained even without detailed expression levels. Such an approach can be particularly advantageous when analyzing large datasets, a common scenario in scRNA-seq. In this paper, we evaluate the clustering performance and cluster separability of a variety of model-based algorithms and distance-based methods on both expression-level data and threshold-encoded binarized data. We examined the Bernoulli-mixture model and the Gaussian-mixture model and compared them against traditional clustering techniques (hierarchical clustering, K-means, and the Louvain algorithm) on a range of scRNA-seq datasets. Our findings reveal that mixture models depend less on the specific dataset than distance-based methods do. Mixture models in particular are more effective at accurately estimating the number of clusters present in the data. Among the analyzed algorithms, the Bernoulli-mixture model stands out, outperforming distance-based approaches in several key aspects. Binary data (the presence or absence of gene expression) therefore seem adequate to capture the heterogeneity of scRNA-seq data when clustered with methods specifically designed for binary datasets. This finding is significant because it opens up new possibilities for simplifying data analysis in scRNA-seq studies without compromising the accuracy of the results.
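
To make the idea of threshold-encoded binarization followed by Bernoulli-mixture clustering concrete, the following is a minimal sketch and not the authors' implementation. It assumes a threshold of zero counts, a fixed number of components, a plain EM loop with random initialization, and synthetic Poisson data; the names `binarize`, `fit_bernoulli_mixture`, and `make_toy_counts` are hypothetical and chosen only for illustration.

```python
import numpy as np


def binarize(counts, threshold=0):
    """Encode expression as presence (1) or absence (0).

    The threshold of 0 is an assumption made for this sketch.
    """
    return (np.asarray(counts) > threshold).astype(np.uint8)


def fit_bernoulli_mixture(X, n_components, n_iter=100, seed=0, eps=1e-6):
    """Fit a mixture of independent Bernoulli distributions with EM.

    Returns hard cluster labels, mixing weights, and the per-component
    probability that each gene is expressed.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_cells, n_genes = X.shape
    weights = np.full(n_components, 1.0 / n_components)
    # Random starting probabilities, kept away from 0 and 1.
    probs = rng.uniform(0.25, 0.75, size=(n_components, n_genes))
    for _ in range(n_iter):
        # E-step: log p(x | component) + log weight, for every cell.
        log_post = (X @ np.log(probs).T
                    + (1.0 - X) @ np.log(1.0 - probs).T
                    + np.log(weights))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights and gene-wise "on" probabilities.
        nk = resp.sum(axis=0)
        weights = np.clip(nk / n_cells, eps, None)
        weights /= weights.sum()
        probs = np.clip((resp.T @ X) / np.maximum(nk, eps)[:, None],
                        eps, 1.0 - eps)
    return resp.argmax(axis=1), weights, probs


def make_toy_counts(n_per_group=100, n_genes_per_group=20, seed=42):
    """Synthetic counts: two cell groups, each expressing its own gene block."""
    rng = np.random.default_rng(seed)
    truth = np.repeat([0, 1], n_per_group)
    rates = np.full((2 * n_per_group, 2 * n_genes_per_group), 0.05)
    rates[:n_per_group, :n_genes_per_group] = 3.0
    rates[n_per_group:, n_genes_per_group:] = 3.0
    return rng.poisson(rates), truth


if __name__ == "__main__":
    counts, truth = make_toy_counts()
    labels, weights, probs = fit_bernoulli_mixture(binarize(counts), 2)
    # Cluster labels are arbitrary, so score agreement up to a label swap.
    agreement = max(np.mean(labels == truth), np.mean(labels != truth))
    print(f"agreement with simulated groups: {agreement:.2f}")
```

The sketch fixes the number of components in advance. Estimating the number of clusters, which the abstract reports as a strength of mixture models, would require an additional model-selection step that is not shown here.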

CC BY-NC-ND 4.0


Paper citation in several formats:
Widzisz, K., Kania, M., Zyla, J. and Polański, A. (2025). Clustering Single-Cell RNA-seq Data: Impact of Data Binarization on Algorithmic Performance. In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - BIOINFORMATICS; ISBN 978-989-758-731-3; ISSN 2184-4305, SciTePress, pages 594-602. DOI: 10.5220/0013178800003911

@conference{bioinformatics25,
author={Karolina Widzisz and Mateusz Kania and Joanna Zyla and Andrzej Polański},
title={Clustering Single-Cell RNA-seq Data: Impact of Data Binarization on Algorithmic Performance},
booktitle={Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - BIOINFORMATICS},
year={2025},
pages={594-602},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013178800003911},
isbn={978-989-758-731-3},
issn={2184-4305},
}

TY - CONF
JO - Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - BIOINFORMATICS
TI - Clustering Single-Cell RNA-seq Data: Impact of Data Binarization on Algorithmic Performance
SN - 978-989-758-731-3
IS - 2184-4305
AU - Widzisz, K.
AU - Kania, M.
AU - Zyla, J.
AU - Polański, A.
PY - 2025
SP - 594
EP - 602
DO - 10.5220/0013178800003911
PB - SciTePress