# Robust K-Mer Partitioning for Parallel Counting

### Kemal Efe

#### Abstract

Due to the sheer size of the input data, k-mer counting is a memory-intensive task. Existing methods to parallelize k-mer counting cannot guarantee equal block sizes. Consequently, when the largest block is too large for a processor’s local memory, the entire computation fails. This paper shows how to partition the input into approximately equal-sized blocks each of which can be processed independently. Initially, we consider how to map k-mers into a number of independent blocks such that block sizes follow a truncated normal distribution. Then, we show how to modify the mapping function to obtain an approximately uniform distribution. To prove the claimed statistical properties of block sizes, we refer to the central limit theorem, along with certain properties of Pascal’s quadrinomial triangle. This analysis yields a tight upper bound on block sizes, which can be controlled by changing certain parameters of the mapping function. Since the running time of the resulting algorithm is O(1) per k-mer, partitioning can be performed efficiently while reading the input data from the storage medium.
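The abstract does not spell out the mapping function, so the following is only a minimal sketch under an assumption: each base is given a value in {0, 1, 2, 3} and a k-mer is mapped to the block indexed by the sum of its base values. With that assumed mapping, the number of distinct k-mers per block equals row k of the quadrinomial triangle (the coefficients of (1 + x + x^2 + x^3)^k), which is bell-shaped for large k, and the block index can be updated in O(1) as the window slides along a read. The names `BASE_VALUE`, `block_of`, `rolling_blocks`, `block_sizes`, and `quadrinomial_row` are hypothetical and are not taken from the paper. The sketch covers only the first, normally distributed stage, not the modification that yields approximately uniform blocks.

```python
from collections import Counter
from itertools import product

# Assumed base encoding (hypothetical; the abstract does not specify one).
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def block_of(kmer):
    """Map a k-mer to a block index by summing its base values (assumed mapping)."""
    return sum(BASE_VALUE[b] for b in kmer)


def rolling_blocks(seq, k):
    """Yield the block index of every k-mer in seq, with an O(1) update per k-mer."""
    if len(seq) < k:
        return
    s = block_of(seq[:k])
    yield s
    for i in range(k, len(seq)):
        # Slide the window: add the incoming base, drop the outgoing one.
        s += BASE_VALUE[seq[i]] - BASE_VALUE[seq[i - k]]
        yield s


def block_sizes(k):
    """Count how many of the 4**k distinct k-mers fall into each block 0..3k."""
    counts = Counter(block_of(p) for p in product("ACGT", repeat=k))
    return [counts[s] for s in range(3 * k + 1)]


def quadrinomial_row(k):
    """Row k of the quadrinomial triangle: coefficients of (1 + x + x^2 + x^3)**k."""
    row = [1]
    for _ in range(k):
        nxt = [0] * (len(row) + 3)
        for i, c in enumerate(row):
            for j in range(4):
                nxt[i + j] += c
        row = nxt
    return row


if __name__ == "__main__":
    print(list(rolling_blocks("ACGTA", 3)))
    print(block_sizes(3))
    print(quadrinomial_row(3))
```

For k = 3 the block sizes computed by enumeration match the quadrinomial row, and the largest block holds 12 of the 64 possible k-mers. This count is over distinct k-mers; actual block sizes on real reads depend on how often each k-mer occurs in the input.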

#### Paper Citation

#### in Harvard Style

Efe K. (2018). **Robust K-Mer Partitioning for Parallel Counting**. In *Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 4: BIOINFORMATICS*, ISBN 978-989-758-280-6, pages 146-153. DOI: 10.5220/0006638801460153

#### in Bibtex Style

@conference{bioinformatics18,

author={Kemal Efe},

title={Robust K-Mer Partitioning for Parallel Counting},

booktitle={Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 4: BIOINFORMATICS},

year={2018},

pages={146-153},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0006638801460153},

isbn={978-989-758-280-6},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 4: BIOINFORMATICS

TI - Robust K-Mer Partitioning for Parallel Counting

SN - 978-989-758-280-6

AU - Efe K.

PY - 2018

SP - 146

EP - 153

DO - 10.5220/0006638801460153