USTAR2: Fast and Succinct Representation of k-mer Sets Using De Bruijn Graphs

Enrico Rossignolo, Matteo Comin

2024

Abstract

A fundamental operation in computational genomics is the reduction of input sequences to their constituent k-mers. Space-efficient methods for representing a set of k-mers are therefore important for the scalability of bioinformatics analyses. One prevalent strategy transforms the set of k-mers into a de Bruijn graph and then derives a compact representation of this graph by finding the smallest path cover. In this article, we introduce USTAR2, a novel algorithm for the compression of k-mers. USTAR2 exploits node connectivity in the de Bruijn graph to select paths more efficiently when constructing the path cover. We performed a series of tests on the compression of real read datasets and compared USTAR2 with several other tools. USTAR2 achieved the best compression, required less memory, and was considerably faster (up to 96x). The code of USTAR2 is available at the repository https://github.com/CominLab/USTAR2.
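The path-cover strategy described in the abstract can be illustrated with a minimal sketch. This is not the USTAR2 algorithm: it is a plain greedy cover that ignores reverse complements, k-mer counts, and the connectivity-based path selection the abstract mentions. The function name `greedy_path_cover` is hypothetical, and the sketch assumes all k-mers share the same length k >= 2 over the alphabet ACGT.

```python
def greedy_path_cover(kmers):
    """Cover a k-mer set with vertex-disjoint paths of its de Bruijn graph.

    Nodes are k-mers; an edge joins two k-mers that overlap by k-1 symbols.
    Each path is returned as the string it spells. Illustrative only.
    """
    unvisited = set(kmers)
    strings = []
    for seed in sorted(kmers):  # sorted only to make the output deterministic
        if seed not in unvisited:
            continue
        unvisited.remove(seed)
        k = len(seed)
        path = seed
        # Extend to the right while an unvisited successor exists.
        while True:
            suffix = path[-(k - 1):]
            nxt = next((suffix + c for c in "ACGT" if suffix + c in unvisited), None)
            if nxt is None:
                break
            unvisited.remove(nxt)
            path += nxt[-1]
        # Extend to the left while an unvisited predecessor exists.
        while True:
            prefix = path[:k - 1]
            prev = next((c + prefix for c in "ACGT" if c + prefix in unvisited), None)
            if prev is None:
                break
            unvisited.remove(prev)
            path = prev[0] + path
        strings.append(path)
    return strings


paths = greedy_path_cover({"ACG", "CGT", "GTA", "TAC", "TTT"})
print(paths)
```

Each k-mer appears exactly once across the returned strings, so a path covering n k-mers costs n + k - 1 symbols instead of n * k. Fewer, longer paths therefore give a smaller representation, which is why the choice of paths matters.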


Paper Citation


in Harvard Style

Rossignolo E. and Comin M. (2024). USTAR2: Fast and Succinct Representation of k-mer Sets Using De Bruijn Graphs. In Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS; ISBN 978-989-758-688-0, SciTePress, pages 368-378. DOI: 10.5220/0012423100003657


in Bibtex Style

@conference{bioinformatics24,
author={Enrico Rossignolo and Matteo Comin},
title={USTAR2: Fast and Succinct Representation of k-mer Sets Using De Bruijn Graphs},
booktitle={Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS},
year={2024},
pages={368-378},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012423100003657},
isbn={978-989-758-688-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS
TI - USTAR2: Fast and Succinct Representation of k-mer Sets Using De Bruijn Graphs
SN - 978-989-758-688-0
AU - Rossignolo E.
AU - Comin M.
PY - 2024
SP - 368
EP - 378
DO - 10.5220/0012423100003657
PB - SciTePress