Clustering Single-Cell RNA-seq Data: Impact of Data Binarization on Algorithmic Performance

Karolina Widzisz; Mateusz Kania; Joanna Zyla; Andrzej Polański

Topics: Pattern Recognition, Clustering and Classification; Single-cell Sequencing and Analysis; Transcriptomics

In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS, 594-602, 2025, Porto, Portugal

Authors: Karolina Widzisz ¹; Mateusz Kania ²; Joanna Zyla ³ and Andrzej Polański ¹

Affiliations: ¹ Department of Computer Graphics, Vision and Digital Systems, Silesian University of Technology, Akademicka 16, Gliwice, Poland; ² Department of Applied Informatics, Silesian University of Technology, Akademicka 16, Gliwice, Poland; ³ Department of Data Science and Engineering, Silesian University of Technology, Akademicka 16, Gliwice, Poland

Keyword(s): scRNA-seq, Clustering Performance, Binary Data, Data Information Reduction.

Abstract: The primary objective of this study was to test the hypothesis that binary information on the presence or absence of gene expression can sufficiently capture the inherent heterogeneity within single-cell RNA sequencing (scRNA-seq) data. This hypothesis posits that even without detailed expression levels, valuable insights about cellular diversity can be obtained. Such a binary representation can be particularly advantageous when analyzing large datasets, a common scenario in the field of scRNA-seq. In this paper, we evaluate the clustering performance and cluster separability of a variety of model-based algorithms and distance-based methods on both expression-level data and threshold-encoded binarized data. We examined the performance of the Bernoulli-mixture model and the Gaussian-mixture model. These were compared against traditional clustering techniques such as hierarchical clustering, K-means, and the Louvain algorithm on a range of scRNA-seq datasets. Our findings reveal that mixture models exhibit a lower dependence on the specific dataset than distance-based methods. Mixture models, in particular, demonstrate greater efficacy in accurately estimating the number of clusters present within the data. Among the analyzed algorithms, the Bernoulli-mixture model stands out, outperforming distance-based approaches in several key aspects. Binary data (presence/absence of gene expression) indeed seem adequate to capture the heterogeneity of scRNA-seq data when clustering with methods specifically designed for binary datasets. The implications of this finding are significant, as it opens up new possibilities for simplifying data analysis in scRNA-seq studies without compromising the accuracy of the results.
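The idea of threshold-encoded binarization followed by Bernoulli-mixture clustering can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `binarize` and `fit_bernoulli_mixture`, the threshold of 0, the random soft initialization, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np

def binarize(counts, threshold=0.0):
    """Encode expression as presence (1.0) / absence (0.0).
    The threshold value is an assumption, not taken from the paper."""
    return (np.asarray(counts) > threshold).astype(float)

def fit_bernoulli_mixture(X, n_components, n_iter=100, seed=0, eps=1e-6):
    """Fit a Bernoulli mixture to a binary cells-by-genes matrix with plain EM.
    Returns (mixing weights, per-cluster gene probabilities, hard labels)."""
    rng = np.random.default_rng(seed)
    n_cells = X.shape[0]
    # Start from random soft cluster memberships (one row per cell).
    resp = rng.dirichlet(np.ones(n_components), size=n_cells)
    for _ in range(n_iter):
        # M-step: update mixing weights and per-cluster expression probabilities.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        probs = np.clip((resp.T @ X) / nk[:, None], eps, 1 - eps)
        # E-step: posterior membership from the Bernoulli log-likelihood.
        log_post = (X @ np.log(probs).T
                    + (1 - X) @ np.log(1 - probs).T
                    + np.log(weights))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, probs, resp.argmax(axis=1)

# Synthetic example (hypothetical data): two cell groups, each expressing
# a different half of 50 genes with high probability.
rng = np.random.default_rng(42)
detect_prob = np.full((2, 50), 0.1)
detect_prob[0, :25] = 0.9
detect_prob[1, 25:] = 0.9
group = np.repeat([0, 1], 100)
expressed = rng.random((200, 50)) < detect_prob[group]
counts = expressed * (1 + rng.poisson(2.0, size=(200, 50)))

weights, probs, labels = fit_bernoulli_mixture(binarize(counts), n_components=2)
print("cluster sizes:", np.bincount(labels, minlength=2))
```

The sketch fixes the number of clusters in advance; estimating it, as the abstract discusses, would require an additional model-selection step that is not shown here.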

CC BY-NC-ND 4.0

Paper citation in several formats:

Widzisz, K., Kania, M., Zyla, J. and Polański, A. (2025). Clustering Single-Cell RNA-seq Data: Impact of Data Binarization on Algorithmic Performance. In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - BIOINFORMATICS; ISBN 978-989-758-731-3; ISSN 2184-4305, SciTePress, pages 594-602. DOI: 10.5220/0013178800003911

@conference{bioinformatics25,
author={Karolina Widzisz and Mateusz Kania and Joanna Zyla and Andrzej Polański},
title={Clustering Single-Cell RNA-seq Data: Impact of Data Binarization on Algorithmic Performance},
booktitle={Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - BIOINFORMATICS},
year={2025},
pages={594-602},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013178800003911},
isbn={978-989-758-731-3},
issn={2184-4305},
}

TY - CONF
JO - Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - BIOINFORMATICS
TI - Clustering Single-Cell RNA-seq Data: Impact of Data Binarization on Algorithmic Performance
SN - 978-989-758-731-3
IS - 2184-4305
AU - Widzisz, K.
AU - Kania, M.
AU - Zyla, J.
AU - Polański, A.
PY - 2025
SP - 594
EP - 602
DO - 10.5220/0013178800003911
PB - SciTePress