Authors: Mete Akgün and Mahmut Şamil Sağıroğlu

Affiliation: Tübitak BİLGEM, Turkey

Keyword(s): FASTQ, Quality Scores, Prediction by Partial Matching (PPM), Compression.

Related Ontology Subjects/Areas/Topics: Bioinformatics ; Biomedical Engineering ; Databases and Data Management

Abstract: Next Generation Sequencing (NGS) platforms generate header data and quality information for each nucleotide sequence. These platforms may produce gigabyte-scale datasets, and storing these datasets is one of the major bottlenecks of NGS technology. Information produced by NGS is stored in FASTQ format. In this paper, we propose an algorithm to compress the quality score information stored in a FASTQ file. We search for a model that gives the lowest entropy on quality score data, and we combine this statistical model with arithmetic coding to compress the quality score data as compactly as possible. We compare its performance to general-purpose text compression utilities such as bzip2, gzip and ppmd, and to existing compression algorithms for quality scores. We show that our compression algorithm outperforms both groups.
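The idea of looking for a model that gives the lowest entropy on quality scores can be illustrated with a minimal sketch. The code below is not the authors' PPM model. It assumes a plain fixed-order context model, in which each quality symbol is predicted from the previous `order` symbols of the same read, and it reports the empirical entropy in bits per symbol. The function name `conditional_entropy` and the sample reads are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(quality_lines, order):
    """Empirical entropy (bits per symbol) of quality symbols given the
    previous `order` symbols of the same read.

    Illustrative fixed-order context model only; this is an assumption,
    not the model proposed in the paper.
    """
    context_counts = defaultdict(Counter)
    total = 0
    for line in quality_lines:
        for i, symbol in enumerate(line):
            # Contexts are shorter than `order` at the start of a read.
            context = line[max(0, i - order):i]
            context_counts[context][symbol] += 1
            total += 1
    if total == 0:
        return 0.0
    bits = 0.0
    for counts in context_counts.values():
        n = sum(counts.values())
        for c in counts.values():
            bits -= c * math.log2(c / n)
    return bits / total


# Hypothetical quality strings (Phred+33 characters), not data from the paper.
reads = ["IIIIHHHGGF", "IIIHHHGGFF", "IIIIIHHGGF"]
h0 = conditional_entropy(reads, 0)  # no context
h2 = conditional_entropy(reads, 2)  # two preceding symbols as context
print(f"order 0: {h0:.3f} bits/symbol, order 2: {h2:.3f} bits/symbol")
```

A lower value means an ideal arithmetic coder driven by that model would need fewer bits per quality score. This figure is computed from counts over the whole input, so it ignores the cost an adaptive coder pays while learning the statistics; it is best read as a rough way to compare candidate models.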

CC BY-NC-ND 4.0

Paper citation in several formats:
Akgün, M. and Şamil Sağıroğlu, M. (2013). Alternative PPM Model for Quality Score Compression. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOSTEC 2013) - BIOINFORMATICS; ISBN 978-989-8565-35-8; ISSN 2184-4305, SciTePress, pages 122-126. DOI: 10.5220/0004221601220126

@conference{bioinformatics13,
author={Mete Akgün and Mahmut {Şamil Sağıroğlu}},
title={Alternative PPM Model for Quality Score Compression},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOSTEC 2013) - BIOINFORMATICS},
year={2013},
pages={122-126},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004221601220126},
isbn={978-989-8565-35-8},
issn={2184-4305},
}

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOSTEC 2013) - BIOINFORMATICS
TI - Alternative PPM Model for Quality Score Compression
SN - 978-989-8565-35-8
IS - 2184-4305
AU - Akgün, M.
AU - Şamil Sağıroğlu, M.
PY - 2013
SP - 122
EP - 126
DO - 10.5220/0004221601220126
PB - SciTePress