CNV-LDC: An Optimized CNV Detection Method for Low Depth of Coverage Data

Ayyoub Salmi, Sara El Jadid, Ismail Jamail, Taoufik Bensellak, Romain Philippe, Veronique Blanquet, Ahmed Moussa

Abstract

Recent technological improvements have revealed far more variation in the human genome than previously thought. Part of this variation is due to submicroscopic chromosomal deletions and duplications called Copy Number Variations (CNVs). Some of these CNVs have been clearly shown to play an important role in disease susceptibility, for both complex and Mendelian diseases. The latest advances in next-generation sequencing have accelerated CNV analysis, notably because they promise greater detection sensitivity. This has led to the development of several new bioinformatics approaches and algorithms for detecting CNVs from such data, based on four common strategies: Assembly Based, Split Read, Read-Pair mapping, and Read Depth (RD). Here we focus on the RD method, which, unlike the other methods, can estimate the exact copy number of a CNV region. We propose CNV-LDC (Copy Number Variation-Low Depth of Coverage), an alternative method for detecting CNVs from short sequencing reads that complements the existing method CNV-TV (Copy Number Variation-Total Variation). We optimize the signal modeling and thresholding steps to improve performance at low depth of coverage. The results of this new approach were compared with those of several recent methods on different simulated datasets containing both small and large CNVs.
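The general read-depth (RD) idea can be illustrated with a minimal Python sketch: read starts are counted in fixed-size bins, each bin count is turned into a log2 ratio against the median bin (assumed here to be the two-copy baseline), the ratio profile is smoothed with one-dimensional total-variation denoising (the penalty behind the CNV-TV name), and runs of bins beyond a gain or loss threshold are reported. This is not the CNV-LDC or CNV-TV implementation; the function names, the dual projected-gradient solver, the pseudocount, the bin size, λ and the ±0.4 thresholds are all hypothetical choices made for illustration.

```python
import math
import statistics
from typing import List, Tuple

Call = Tuple[int, int, str]  # (start, end_exclusive, "del" or "dup")


def bin_read_depth(read_starts: List[int], genome_length: int, bin_size: int) -> List[int]:
    """Count reads whose start position falls in each non-overlapping bin."""
    counts = [0] * math.ceil(genome_length / bin_size)
    for pos in read_starts:
        if 0 <= pos < genome_length:
            counts[pos // bin_size] += 1
    return counts


def log2_ratios(counts: List[int], pseudo: float = 0.5) -> List[float]:
    """Log2 ratio of each bin to the median bin (assumed two-copy baseline)."""
    baseline = statistics.median(counts)
    return [math.log2((c + pseudo) / (baseline + pseudo)) for c in counts]


def _primal(y: List[float], z: List[float]) -> List[float]:
    """Recover x = y - D^T z, where (D^T z)_j = z_{j-1} - z_j and z_{-1} = z_{n-1} = 0."""
    n = len(y)
    return [y[j] - ((z[j - 1] if j > 0 else 0.0) - (z[j] if j < n - 1 else 0.0))
            for j in range(n)]


def tv_denoise(y: List[float], lam: float, iters: int = 2000) -> List[float]:
    """Minimise 0.5*sum((y_i - x_i)^2) + lam*sum(|x_{i+1} - x_i|).

    Projected gradient on the dual variables z (one per adjacent pair of bins,
    constrained to |z_i| <= lam). Simple, but slow on long profiles.
    """
    if len(y) < 2:
        return list(y)
    z = [0.0] * (len(y) - 1)
    step = 0.25  # <= 1/||D D^T||, since ||D D^T|| < 4 for first differences
    for _ in range(iters):
        x = _primal(y, z)
        # Dual gradient step z += step * D x, then clip back into [-lam, lam].
        z = [min(lam, max(-lam, z[i] + step * (x[i + 1] - x[i])))
             for i in range(len(z))]
    return _primal(y, z)


def call_segments(ratios: List[float], gain: float, loss: float) -> List[Call]:
    """Merge consecutive bins at or beyond a threshold into (start_bin, end_bin, type)."""
    calls: List[Call] = []
    current = None  # [start_bin, end_bin_exclusive, type] of the open run
    for i, r in enumerate(ratios):
        label = "dup" if r >= gain else "del" if r <= loss else None
        if current is not None and label == current[2]:
            current[1] = i + 1  # extend the open run
            continue
        if current is not None:
            calls.append((current[0], current[1], current[2]))
        current = [i, i + 1, label] if label else None
    if current is not None:
        calls.append((current[0], current[1], current[2]))
    return calls


def call_cnvs(read_starts: List[int], genome_length: int, bin_size: int = 100,
              lam: float = 0.1, gain: float = 0.4, loss: float = -0.4) -> List[Call]:
    """Read-depth CNV calls in base-pair coordinates (end exclusive)."""
    counts = bin_read_depth(read_starts, genome_length, bin_size)
    smoothed = tv_denoise(log2_ratios(counts), lam)
    return [(s * bin_size, min(e * bin_size, genome_length), kind)
            for s, e, kind in call_segments(smoothed, gain, loss)]


# Synthetic example: c reads per 100 bp bin, with a half-depth dip
# (bins 4-5) and a 1.5x bump (bins 8-9).
depths = [10, 10, 10, 10, 5, 5, 10, 10, 15, 15, 10, 10]
reads = [b * 100 + k for b, c in enumerate(depths) for k in range(c)]
print(call_cnvs(reads, genome_length=1200))
```

On this noise-free toy profile the sketch reports a deletion at 400-600 and a duplication at 800-1000. At low depth of coverage the per-bin counts become noisy, so the smoothing strength and the thresholds have a large effect on the calls; a trailing bin that is only partly covered by the genome also looks like a loss and would need special handling in practice.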



Paper Citation


in Harvard Style

Salmi A., El Jadid S., Jamail I., Bensellak T., Philippe R., Blanquet V. and Moussa A. (2017). CNV-LDC: An Optimized CNV Detection Method for Low Depth of Coverage Data. In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017), ISBN 978-989-758-214-1, pages 37-42. DOI: 10.5220/0006111600370042


in BibTeX Style

@conference{bioinformatics17,
author={Ayyoub Salmi and Sara El Jadid and Ismail Jamail and Taoufik Bensellak and Romain Philippe and Veronique Blanquet and Ahmed Moussa},
title={CNV-LDC: An Optimized CNV Detection Method for Low Depth of Coverage Data},
booktitle={Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)},
year={2017},
pages={37-42},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006111600370042},
isbn={978-989-758-214-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)
TI - CNV-LDC: An Optimized CNV Detection Method for Low Depth of Coverage Data
SN - 978-989-758-214-1
AU - Salmi A.
AU - El Jadid S.
AU - Jamail I.
AU - Bensellak T.
AU - Philippe R.
AU - Blanquet V.
AU - Moussa A.
PY - 2017
SP - 37
EP - 42
DO - 10.5220/0006111600370042