Robust K-Mer Partitioning for Parallel Counting

Kemal Efe

2018

Abstract

Due to the sheer size of the input data, k-mer counting is a memory-intensive task. Existing methods for parallelizing k-mer counting cannot guarantee equal block sizes. Consequently, when the largest block is too large for a processor’s local memory, the entire computation fails. This paper shows how to partition the input into approximately equal-sized blocks, each of which can be processed independently. First, we consider how to map k-mers into a number of independent blocks such that block sizes follow a truncated normal distribution. Then, we show how to modify the mapping function to obtain an approximately uniform distribution. To prove the claimed statistical properties of block sizes, we rely on the central limit theorem, along with certain properties of Pascal’s quadrinomial triangle. This analysis yields a tight upper bound on block sizes, which can be controlled by changing certain parameters of the mapping function. Since the running time of the resulting algorithm is O(1) per k-mer, partitioning can be performed efficiently while reading the input data from the storage medium.
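The abstract does not spell out the mapping function, so the following Python sketch is only an illustration of how quadrinomial coefficients and a bell-shaped block-size distribution can arise. It assumes a hypothetical mapping in which each nucleotide is encoded as a digit 0 to 3 and a k-mer is assigned to the block given by the sum of its digits. Under that assumption, the number of distinct k-mers in block s is the coefficient of x^s in (1 + x + x^2 + x^3)^k, which is row k of Pascal's quadrinomial triangle, and the block id can be updated in O(1) per k-mer as the window slides. The names CODE, quadrinomial_row, block_ids, and block_sizes_by_enumeration are invented for this example and are not taken from the paper.

```python
from itertools import product

# Hypothetical nucleotide encoding; the abstract does not specify one.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def quadrinomial_row(k):
    """Return the coefficients of (1 + x + x^2 + x^3)^k as a list.

    Entry s is the number of length-k digit strings over {0,1,2,3}
    whose digits sum to s (row k of Pascal's quadrinomial triangle).
    """
    row = [1]
    for _ in range(k):
        nxt = [0] * (len(row) + 3)
        for i, c in enumerate(row):
            for d in range(4):  # multiply by (1 + x + x^2 + x^3)
                nxt[i + d] += c
        row = nxt
    return row


def block_ids(seq, k):
    """Yield the digit-sum block id of every k-mer in seq.

    After the first window, each id is obtained with one addition and
    one subtraction, i.e. O(1) work per k-mer.
    """
    if len(seq) < k:
        return
    s = sum(CODE[c] for c in seq[:k])
    yield s
    for i in range(k, len(seq)):
        s += CODE[seq[i]] - CODE[seq[i - k]]  # add incoming, drop outgoing
        yield s


def block_sizes_by_enumeration(k):
    """Count distinct k-mers per digit-sum block by brute force (small k only)."""
    sizes = [0] * (3 * k + 1)
    for kmer in product("ACGT", repeat=k):
        sizes[sum(CODE[c] for c in kmer)] += 1
    return sizes


if __name__ == "__main__":
    print(quadrinomial_row(2))            # [1, 2, 3, 4, 3, 2, 1]
    print(list(block_ids("ACGTA", 3)))    # [3, 6, 5]
    print(block_sizes_by_enumeration(3) == quadrinomial_row(3))  # True
```

In this sketch the block sizes peak at the middle sum and fall off symmetrically, which is the shape the central limit theorem predicts for a sum of k independent digits. The counts are over distinct k-mers, not over k-mer occurrences in real sequencing data, and the sketch does not attempt the modified mapping that the paper uses to obtain an approximately uniform distribution.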

Paper Citation


in Harvard Style

Efe K. (2018). Robust K-Mer Partitioning for Parallel Counting. In Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-280-6, SciTePress, pages 146-153. DOI: 10.5220/0006638801460153


in Bibtex Style

@conference{bioinformatics18,
author={Kemal Efe},
title={Robust K-Mer Partitioning for Parallel Counting},
booktitle={Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018) - Volume 3: BIOINFORMATICS},
year={2018},
pages={146-153},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006638801460153},
isbn={978-989-758-280-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018) - Volume 3: BIOINFORMATICS
TI - Robust K-Mer Partitioning for Parallel Counting
SN - 978-989-758-280-6
AU - Efe K.
PY - 2018
SP - 146
EP - 153
DO - 10.5220/0006638801460153
PB - SciTePress