SKraken: Fast and Sensitive Classification of Short Metagenomic Reads based on Filtering Uninformative k-mers

Davide Marchiori, Matteo Comin

2017

Abstract

The study of microbial communities is an emerging field that is revolutionizing many disciplines, from ecology to medicine. The major problem when analyzing a metagenomic sample is to taxonomically annotate its reads in order to identify the species in the sample and their relative abundance. Many tools have been developed in recent years; however, their precision and speed are not always adequate for these very large datasets. In this work we present SKraken, an efficient approach to accurately classify metagenomic reads against a set of reference genomes, e.g. the NCBI/RefSeq database. SKraken is based on k-mer statistics combined with the taxonomic tree. Given a set of target genomes, SKraken detects the most representative k-mers for each species and filters out uninformative k-mers. The classification performance on several synthetic and real metagenomic datasets shows that in most cases SKraken outperforms Kraken in terms of precision and recall. In particular, at species-level classification, the estimation of the abundance ratios improves by 6% and the precision by 8%. This behavior is also confirmed on a real stool metagenomic sample, where SKraken is able to detect species with high precision. Because of the efficient filtering of uninformative k-mers, SKraken requires less RAM and is faster than Kraken, one of the fastest tools.

Availability: https://bitbucket.org/marchiori_dev/skraken

Corresponding Author: comin@dei.unipd.it
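The general idea of classifying reads with filtered k-mers can be illustrated with a minimal sketch. It is not SKraken's implementation: the abstract does not specify the filtering criterion or the use of the taxonomic tree, so the sketch assumes, purely for illustration, that a k-mer is uninformative when it occurs in more than one species, and it classifies a read by majority vote over the remaining k-mers. The function names, toy genomes, and the value k=4 are all hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_filtered_index(genomes, k):
    """Map each k-mer to a species, keeping only k-mers seen in one species.

    Assumption for illustration: a k-mer shared by several species is
    treated as uninformative and dropped from the index.
    """
    owners = defaultdict(set)
    for species, seq in genomes.items():
        for km in kmers(seq, k):
            owners[km].add(species)
    return {km: next(iter(sp)) for km, sp in owners.items() if len(sp) == 1}


def classify(read, index, k):
    """Return the species with the most k-mer hits, or None if nothing hits."""
    votes = Counter(index[km] for km in kmers(read, k) if km in index)
    return votes.most_common(1)[0][0] if votes else None


# Toy reference genomes (hypothetical); both start with the same prefix.
genomes = {
    "species_A": "ACGTACGTTTGA",
    "species_B": "ACGTACGGGCCA",
}
index = build_filtered_index(genomes, k=4)
print(classify("CGTTTGA", index, k=4))  # read made of species_A-only k-mers
print(classify("ACGTAC", index, k=4))   # read made only of shared k-mers
```

In this toy setup, dropping shared k-mers shrinks the index, which is the intuition behind the lower memory use the abstract reports; a read composed only of shared k-mers is left unclassified rather than assigned arbitrarily.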


Paper Citation


in Harvard Style

Marchiori D. and Comin M. (2017). SKraken: Fast and Sensitive Classification of Short Metagenomic Reads based on Filtering Uninformative k-mers. In - BIOINFORMATICS, (BIOSTEC 2017) ISBN , pages 0-0. DOI: 10.5220/0006150500001488


in Bibtex Style

@conference{bioinformatics17,
author={Davide Marchiori and Matteo Comin},
title={SKraken: Fast and Sensitive Classification of Short Metagenomic Reads based on Filtering Uninformative k-mers},
booktitle={ - BIOINFORMATICS, (BIOSTEC 2017)},
year={2017},
pages={},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006150500001488},
isbn={},
}


in EndNote Style

TY - CONF

JO - - BIOINFORMATICS, (BIOSTEC 2017)
TI - SKraken: Fast and Sensitive Classification of Short Metagenomic Reads based on Filtering Uninformative k-mers
SN -
AU - Marchiori D.
AU - Comin M.
PY - 2017
SP - 0
EP - 0
DO - 10.5220/0006150500001488