SMT: A High-Performance Approach for Counting Kmers

Jader Garbelini, Danilo Sanches, André Kashiwabara, Aurora Pozo

2024

Abstract

Motivation: Finding conserved motifs in DNA sequences is a key problem in bioinformatics. The growing availability of large-scale genomic data poses significant challenges for computational biology, particularly the efficiency of analysis, the identification of kmers, and the presence of noise. Detecting conserved motifs and patterns in DNA sequences is essential for understanding gene function and regulation. It is therefore necessary to develop novel approaches and methods that can handle these large volumes of data and provide fast, accurate results.

Results: We present SMT, a tool designed to store and count kmers efficiently, optimizing memory usage and computation time. SMT has also proven effective in discovering motifs in ChIP-seq data, allowing the identification of conserved regions in sequences. Furthermore, SMT supports exact searches in time proportional to k and retrieves the most abundant kmers through a frequency table. This approach facilitates large-scale data analysis and provides insight into the conserved properties of biological sequences. The application of SMT to motif discovery demonstrates its potential to drive research in bioinformatics and genomics.

Availability and implementation: Supplementary data and results are available in support of the conclusions. SMT and its source code are available at https://github.com/jadermcg/smt.
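
The abstract names three operations: storing and counting kmers, exact lookup in time proportional to k, and retrieval of the most abundant kmers. The sketch below illustrates those operations with a plain nested-dictionary trie. It is not the authors' implementation, and the abstract does not describe SMT's internal layout; the class name `KmerTrie` and its methods are hypothetical, chosen only for this example. The repository linked above remains the authoritative source.

```python
import heapq


class KmerTrie:
    """Hypothetical trie of k-mers (an illustration, not SMT itself).

    Inner nodes are dicts keyed by base; the last base of each k-mer
    maps to an integer occurrence count.
    """

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.root = {}

    def add_sequence(self, seq):
        """Count every k-mer of seq, skipping windows with non-ACGT symbols."""
        seq = seq.upper()
        for i in range(len(seq) - self.k + 1):
            kmer = seq[i:i + self.k]
            if set(kmer) - set("ACGT"):
                continue  # e.g. windows containing N
            node = self.root
            for base in kmer[:-1]:
                node = node.setdefault(base, {})
            node[kmer[-1]] = node.get(kmer[-1], 0) + 1

    def count(self, kmer):
        """Exact lookup: walks at most k nodes, whatever the data size."""
        kmer = kmer.upper()
        if len(kmer) != self.k:
            return 0
        node = self.root
        for base in kmer[:-1]:
            node = node.get(base)
            if node is None:
                return 0
        return node.get(kmer[-1], 0)

    def items(self):
        """Yield (kmer, count) pairs via an iterative depth-first walk."""
        stack = [("", self.root)]
        while stack:
            prefix, node = stack.pop()
            for base, child in node.items():
                if isinstance(child, dict):
                    stack.append((prefix + base, child))
                else:
                    yield prefix + base, child

    def most_abundant(self, n):
        """Return the n most frequent k-mers as (kmer, count) pairs."""
        return heapq.nlargest(n, self.items(), key=lambda kv: kv[1])


trie = KmerTrie(3)
trie.add_sequence("ACGTACGT")
top_two = trie.most_abundant(2)
```

In this sketch a lookup follows one dictionary edge per base, so its cost grows with k rather than with the number of sequences inserted. Selecting the most abundant kmers scans every stored kmer; a dedicated frequency table, as the abstract mentions, would avoid that scan.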

Paper Citation


in Harvard Style

Garbelini J., Sanches D., Kashiwabara A. and Pozo A. (2024). SMT: A High-Performance Approach for Counting Kmers. In Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS; ISBN 978-989-758-688-0, SciTePress, pages 545-552. DOI: 10.5220/0012546500003657


in Bibtex Style

@conference{bioinformatics24,
author={Jader Garbelini and Danilo Sanches and André Kashiwabara and Aurora Pozo},
title={SMT: A High-Performance Approach for Counting Kmers},
booktitle={Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS},
year={2024},
pages={545-552},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012546500003657},
isbn={978-989-758-688-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS
TI - SMT: A High-Performance Approach for Counting Kmers
SN - 978-989-758-688-0
AU - Garbelini J.
AU - Sanches D.
AU - Kashiwabara A.
AU - Pozo A.
PY - 2024
SP - 545
EP - 552
DO - 10.5220/0012546500003657
PB - SciTePress