Collaborative Reproducible Reporting - Git Submodules as a Data Security Solution

Peter E. Dewitt, Tellen D. Bennett

Abstract

Sensitive data and collaborative projects pose challenges for reproducible computational research. We present a workflow based on literate programming and distributed version control to produce well-documented and dynamic documents collaboratively authored by a team composed of members with varying data access privileges. Data are stored on secure institutional network drives and incorporated into projects using a feature of the Git version control system: submodules. Code to analyze data and write text is managed on public collaborative development environments. This workflow supports collaborative authorship while simultaneously protecting sensitive data. The workflow is designed to be inexpensive and is implemented primarily with a variety of free and open-source software. Work products can be abstracts, manuscripts, posters, slide decks, grant applications, or other documents. This approach is adaptable to teams of varying size in other collaborative situations.
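The arrangement described in the abstract, with code in a shared repository and data in a separate repository on a secure drive attached as a Git submodule, might be sketched as follows. This is a minimal illustration, not the authors' implementation: the path /secure/drive/project-data.git, the mount directory "data", and all function names are hypothetical. The functions only build the git command lines and check whether the data directory is populated; they do not run git.

```python
from pathlib import Path
from typing import List


def submodule_add_command(data_repo: str, mount_path: str = "data") -> List[str]:
    """Build the git command that registers a data repository as a submodule.

    data_repo and mount_path are hypothetical; the data repository is assumed
    to live on a secure drive that only authorized team members can read.
    """
    return ["git", "submodule", "add", data_repo, mount_path]


def submodule_init_command(mount_path: str = "data") -> List[str]:
    """Build the command an authorized collaborator runs after cloning."""
    return ["git", "submodule", "update", "--init", mount_path]


def data_available(project_root: Path, mount_path: str = "data") -> bool:
    """Return True if the submodule directory exists and has been populated.

    In a clone made without initializing submodules, the directory is
    present but empty, so analysis code can skip data-dependent steps.
    """
    target = project_root / mount_path
    return target.is_dir() and any(target.iterdir())


if __name__ == "__main__":
    # Print the commands rather than executing them.
    print(" ".join(submodule_add_command("/secure/drive/project-data.git")))
    print(" ".join(submodule_init_command()))
    print("data available:", data_available(Path(".")))
```

Under these assumptions, a collaborator without access to the secure drive can still clone the code repository and edit text, because the public repository records only the submodule's location and commit, not the data itself.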


Paper Citation


in Harvard Style

Dewitt P. and Bennett T. (2017). Collaborative Reproducible Reporting - Git Submodules as a Data Security Solution. In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2017) ISBN 978-989-758-213-4, pages 230-235. DOI: 10.5220/0006109302300235


in Bibtex Style

@conference{healthinf17,
author={Peter E. Dewitt and Tellen D. Bennett},
title={Collaborative Reproducible Reporting - Git Submodules as a Data Security Solution},
booktitle={Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2017)},
year={2017},
pages={230-235},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006109302300235},
isbn={978-989-758-213-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2017)
TI - Collaborative Reproducible Reporting - Git Submodules as a Data Security Solution
SN - 978-989-758-213-4
AU - Dewitt P.
AU - Bennett T.
PY - 2017
SP - 230
EP - 235
DO - 10.5220/0006109302300235