4th Annual Summer Institute in Statistics for Big Data (SISBID)

Module 2: Reproducible Research for Biomedical Big Data

Mon, July 16 to Wed, July 18

Module dates/times: Monday, July 16, 8:30 a.m.-5 p.m.; Tuesday, July 17, 8:30 a.m.-5 p.m.; and Wednesday, July 18, 8:30 a.m.-Noon

The validity of conclusions from scientific investigations is typically strengthened by the replication of results by independent researchers. Full replication of a study’s results using independent methods, data, equipment, and protocols has long been, and will continue to be, the standard by which scientific claims are evaluated. However, in many fields of study, there are scientific investigations that cannot be fully replicated, often because of a lack of time or resources.

In such situations, there is a need for a minimum standard which can serve as an intermediate step between full replication and nothing. This minimum standard is reproducible research, which requires that datasets and computer code be made available to others for verifying published results and conducting alternate analyses. This standard is especially important in the context of biomedical research, where the results may determine patient care.

Unfortunately, reviews of the current literature suggest that this “standard” is anything but. Examples of non-reproducible research resulting in improper treatment of patients have driven journals, funding agencies, and regulatory agencies to press for a greater standard of reproducibility.

In this module, we will provide examples of systemic breakdowns demonstrating the need for reproducible research, and an introduction to tools for conducting reproducible research. Topics covered will include the types of breakdowns most commonly seen, current regulatory requests, literate statistical programming techniques, reproducible statistical computation, and techniques for making large-scale data analyses reproducible.

We will focus on the R statistical computing language and will also discuss other tools for producing reproducible documents. The module assumes some familiarity with R.

Recommended Reading: Gandrud (2015) Reproducible Research with R and RStudio (2e).

Course materials from previous years: https://github.com/SISBID