Authors:
Tam Truong ¹; Roland Faure ¹,² and Rumen Andonov ¹
Affiliations:
¹ Univ. Rennes, INRIA RBA, CNRS UMR 6074, Rennes, France; ² Service Evolution Biologique et Ecologie, Université Libre de Bruxelles (ULB), 1050 Brussels, Belgium
Keyword(s):
Metagenomes, Strain Separation, Integer Linear Programming.
Abstract:
Metagenomic assembly is essential for understanding microbial communities but faces challenges in distinguishing conspecific bacterial strains. This is especially true when dealing with low-accuracy sequencing reads such as PacBio CLR and Oxford Nanopore. While these technologies provide unequaled throughput and read length, their high error rate makes it difficult to distinguish close bacterial strains. Consequently, current de novo metagenome assembly methods excel at assembling dominant species but struggle to reconstruct low-abundance strains. In this study, we approach strain separation as an Integer Linear Programming (ILP) problem. We introduce a strain-separation module, strainMiner, and integrate it into an established pipeline to create strain-separated assemblies from sequencing data. Across simulated and real experiments encompassing a wide range of error rates (5-12%), our tool consistently compared favorably to the state of the art in terms of assembly quality and strain reconstruction. Moreover, strainMiner substantially cuts down the computational burden of strain-level assembly compared to published software by leveraging the powerful Gurobi solver. We think the new methodological ideas presented in this paper will help democratize strain-separated assembly.