Efficient k-mer Indexing with Application to Mapping-free SNP Genotyping

Mattia Marcolin, Francesco Andreace, Matteo Comin

2022

Abstract

Advances in sequencing technologies and computational methods have enabled rapid and accurate identification of genetic variants. Accurate genotype calls and allele frequency estimations are crucial for population genomics analyses. One of the most demanding steps in the genotyping pipeline is mapping reads to the human reference genome. Recently, mapping-free methods such as Lava and VarGeno have been proposed for the genotyping problem. They are reported to perform 30 times faster than a standard alignment-based genotyping pipeline while achieving comparable accuracy. Moreover, these methods can include known genomic variants in the reference, making read mapping and genotyping variant-aware. However, to run they require a large k-mer database, of about 60 GB, to be loaded in memory. In this paper we study the problem of genotyping using new efficient data structures based on k-mer set compression, and we present a fast mapping-free genotyping tool named GenoLight. GenoLight reports accuracy similar to the standard pipeline, but it is up to 8 times faster. GenoLight also uses 5 to 10 times less memory than the other mapping-free tools, and it can be run on a laptop. Availability: https://github.com/CominLab/GenoLight.
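
To make the idea of mapping-free genotyping concrete, the following is a minimal Python sketch, not GenoLight's implementation. It assumes a toy reference, a single known SNP, exact k-mer matching on the forward strand only, and a naive call based on whether k-mers of each allele are observed. The function names (`build_snp_index`, `genotype`) and the plain dictionary used as the index are illustrative assumptions; the tools discussed above rely on far more compact k-mer structures and statistical genotype models.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_snp_index(reference, snps, k):
    """Map each k-mer that overlaps a known SNP to (position, allele).

    snps maps a 0-based position to (ref_base, alt_base).
    A plain dict stands in for the compressed index (assumption).
    """
    index = {}
    for pos, (ref_base, alt_base) in snps.items():
        lo = max(0, pos - k + 1)
        hi = min(len(reference), pos + k)
        for allele in (ref_base, alt_base):
            # Window of up to 2k-1 bases with the chosen allele at the SNP site.
            window = reference[lo:pos] + allele + reference[pos + 1:hi]
            for kmer in kmers(window, k):
                index[kmer] = (pos, allele)
    return index


def genotype(reads, index, snps, k):
    """Count allele-specific k-mer hits in the reads and make a naive call."""
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            hit = index.get(kmer)
            if hit is not None:
                counts[hit] += 1
    calls = {}
    for pos, (ref_base, alt_base) in snps.items():
        ref_hits = counts[(pos, ref_base)]
        alt_hits = counts[(pos, alt_base)]
        if ref_hits == 0 and alt_hits == 0:
            calls[pos] = "./."  # no evidence for either allele
        elif alt_hits == 0:
            calls[pos] = "0/0"  # only reference k-mers seen
        elif ref_hits == 0:
            calls[pos] = "1/1"  # only alternate k-mers seen
        else:
            calls[pos] = "0/1"  # both alleles seen
    return calls


# Toy data (hypothetical): one SNP G>T at position 9 of a 20-base reference.
reference = "ACGTTGCAAGGCTTACCGTA"
snps = {9: ("G", "T")}
k = 5
index = build_snp_index(reference, snps, k)
reads = ["TGCAAGGCTTA", "TGCAATGCTTA"]  # one read per allele
print(genotype(reads, index, snps, k))  # {9: '0/1'}
```

No read is aligned to the reference here: each read is only broken into k-mers that are looked up in the index. The size of that index is what dominates memory use in this family of methods.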

Paper Citation


in Harvard Style

Marcolin M., Andreace F. and Comin M. (2022). Efficient k-mer Indexing with Application to Mapping-free SNP Genotyping. In Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-552-4, SciTePress, pages 62-70. DOI: 10.5220/0010985700003123


in Bibtex Style

@conference{bioinformatics22,
author={Mattia Marcolin and Francesco Andreace and Matteo Comin},
title={Efficient k-mer Indexing with Application to Mapping-free SNP Genotyping},
booktitle={Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - Volume 3: BIOINFORMATICS},
year={2022},
pages={62-70},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010985700003123},
isbn={978-989-758-552-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - Volume 3: BIOINFORMATICS
TI - Efficient k-mer Indexing with Application to Mapping-free SNP Genotyping
SN - 978-989-758-552-4
AU - Marcolin M.
AU - Andreace F.
AU - Comin M.
PY - 2022
SP - 62
EP - 70
DO - 10.5220/0010985700003123
PB - SciTePress