Reproducible research: Motivation

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • Should research software and data be reproducible?
  • Are they?
Objectives
  • Discuss factors affecting reproducibility in research

A scary anecdote

  • A group of researchers obtain great results and submit their work to a high-profile journal.
  • Reviewers ask for new figures and additional analysis.
  • The researchers start working on revisions and generate modified figures, but find inconsistencies with old figures.
  • The researchers can’t find some of the data they used to generate the original results, and can’t figure out which parameters they used when running their analyses.
  • The manuscript is still languishing in the drawer …

What is reproducible research?

“reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator. That is, a second researcher might use the same raw data to build the same analysis files and implement the same statistical analysis in an attempt to yield the same results. Reproducibility is a minimum necessary condition for a finding to be believable and informative.”

U.S. National Science Foundation (NSF) subcommittee on replicability in science

  • For any research project, an independent researcher should be able to replicate an experiment:
    • the same results should be obtained under the same conditions
    • it should be possible to recreate the same conditions!
  • “Experiment” is interpreted broadly and also covers computational work

Why all the talk about reproducible research?

A recent survey in Nature revealed that irreproducible experiments are a problem across all domains of science.


Factors behind irreproducible research

  • Not enough documentation on how the experiment was conducted and how the data were generated
  • Data used to generate original results unavailable
  • Software used to generate original results unavailable
  • Difficult to recreate software environment (libraries, versions) used to generate original results
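Several of these factors (undocumented parameters, unknown library versions) can be reduced by saving a small record next to every result. The following is a minimal sketch in Python, not a complete provenance solution. The function name `describe_run`, the parameter names, and the package list are hypothetical and chosen only for illustration; the sketch assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library.

```python
import json
import platform
from importlib import metadata


def describe_run(parameters, packages):
    """Collect the parameters and software versions behind one analysis run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "parameters": parameters,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


record = describe_run(
    parameters={"threshold": 0.05, "seed": 42},  # hypothetical analysis parameters
    packages=["pip", "surely-not-an-installed-package"],  # hypothetical package list
)

# Print the record as JSON; in a real project it could be written to a file
# stored alongside the figures and tables it describes.
print(json.dumps(record, indent=2, sort_keys=True))
```

A record like this does not recreate the environment by itself, but it documents which parameters and versions produced a given result, which is the information the researchers in the anecdote were missing.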

Levels of reproducibility

Ensuring that one’s research is fully reproducible can be a challenging task, but multiple tools exist to make it easier.

Discussion

Discuss with your neighbors or among all participants

Computer programs are expected to produce the same output for the same inputs. Is that true for research software?

Can you give some examples? What can we do about it?