Reproducible Analysis

What is reproducible research?

Reproducible research is the practice of distributing all data, software source code and tools required to reproduce the results discussed in a research publication.

This differs from replication, in which the results and conclusions of one study are confirmed independently by another study.

Reproducibility Spectrum

Consider reproducibility as a spectrum of evidence that spans the area between publication only and full replication.

Figure: The reproducibility spectrum, from 'publication only' (not reproducible) on the left to 'full replication' (gold standard) on the right. In between, from left to right and under the label 'publication plus', are 'code', 'code and data' and 'linked and executable code and data'.

From R. D. Peng, “Reproducible Research in Computational Science”, Science 334 (6060), 2011, pp. 1226-1227.

Depending on the resources available, different research products can be released to increase reproducibility.

Tools for reproducible research:

  • Version control – Keeping track of changes in your documents is essential. Some software packages have version control built in. Version control systems like Git can track changes to any file. Additionally, they can be combined with hosting services such as GitHub or Bitbucket, which allow you to back up your documents on the web.
  • Automation – Many computational activities done in research are repetitive. These tasks can be automated using scripts. For those who are not familiar with writing code, graphical software packages can export scripts detailing what was done in the graphical interfaces. Look for software that allows you to keep a record of what you did, so you can repeat your analysis and share it with others.
  • Literate programming – Even experts in a particular programming language can have trouble understanding someone else’s code. Literate programming intersperses code snippets with human-readable language, so the analysis is both reproducible and understandable.
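
As an illustration of the automation point above, here is a minimal sketch of a scripted analysis in Python. It is only one possible approach: the function name `run_analysis`, the column name `value` and the sample data are all hypothetical and chosen for this example. The script computes a few summary statistics and also records a checksum of the input, so that anyone rerunning it can confirm they started from the same data.

```python
import csv
import hashlib
import io
import json
import statistics


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of the raw input as a hex string."""
    return hashlib.sha256(data).hexdigest()


def run_analysis(raw_csv: bytes) -> dict:
    """Summarise the hypothetical 'value' column and record the input used."""
    rows = list(csv.DictReader(io.StringIO(raw_csv.decode("utf-8"))))
    values = [float(row["value"]) for row in rows]
    return {
        "input_sha256": sha256_of(raw_csv),  # identifies the exact data analysed
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


if __name__ == "__main__":
    # Made-up sample data; a real script would read a data file instead.
    sample = b"id,value\n1,2.0\n2,4.0\n3,9.0\n"
    print(json.dumps(run_analysis(sample), indent=2))
```

Because every step lives in the script, the same file can be placed under version control and shared alongside the data, and the docstrings and comments give it a little of the readability that literate programming aims for.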