
Reproducible research: Recording environments

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • How can we capture the software environment of a computational experiment?

Containers

  • Containers can be built to bundle all the necessary ingredients (data, code, environment).
  • A container provides operating-system-level virtualization, sharing the host system’s kernel with other containers.
  • Popular container implementations are Docker and Singularity.
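You can observe the kernel sharing directly. The minimal Python sketch below prints the operating system name and the kernel release that the running process sees. Assume a Linux host with Docker installed. If you run the sketch on the host and then inside a container (for example one started from the official `python` image), it should report the same kernel release. The container has its own files and libraries but no kernel of its own. On macOS and Windows, Docker runs containers inside a Linux virtual machine, so inside a container you would see that VM's kernel instead. The function name `kernel_info` is illustrative only.

```python
import platform


def kernel_info() -> dict:
    """Return the OS name and kernel release reported by the running system."""
    return {
        "system": platform.system(),    # e.g. "Linux"
        "release": platform.release(),  # kernel release, as printed by `uname -r` on Linux
    }


if __name__ == "__main__":
    info = kernel_info()
    print(f"{info['system']} kernel {info['release']}")
```

To run it inside a container, save it as a file (the name `kernel_info.py` is hypothetical) and use `docker run --rm -v "$PWD":/work -w /work python:3.12-slim python kernel_info.py`.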

Docker

  • Available for most common operating systems.
  • A mechanism to “send the computer to the data” when the data are too large or too sensitive to travel over the network.
  • Docker Hub is a registry for sharing Docker images, which are stored in repositories (similar to Git repositories).
  • Many public Docker images are available on Docker Hub.

Use only official and trusted images!

Not all images can be trusted! There have been cases of contaminated images, so investigate an image before using it rather than pulling it blindly. Apply the same caution as when installing software packages from untrusted package repositories.
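Choosing a trusted publisher is one precaution. Another common one is to refer to an image by its content digest rather than only by a tag. A tag such as `ubuntu:22.04` can be moved to point at a different image, whereas a digest (`name@sha256:...`) identifies one exact image. The Python sketch below checks only the textual form of a reference. The function name is hypothetical. The function does not contact any registry and says nothing about whether the image itself is safe.

```python
import re

# A digest-pinned reference ends in "@sha256:" followed by 64 lowercase hex characters.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if the image reference names an immutable content digest."""
    return bool(_DIGEST_RE.search(image_ref))


if __name__ == "__main__":
    for ref in ["ubuntu", "ubuntu:22.04", "ubuntu@sha256:" + "0" * 64]:
        status = "pinned" if is_digest_pinned(ref) else "NOT pinned (tag can move)"
        print(f"{ref}: {status}")
```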

Singularity

  • Singularity is aimed at the scientific community and at running scientific workflows on HPC resources.
  • Docker images can be converted into Singularity images.

Container vs. image vs. recipe (Dockerfile)

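  • An image is like a blueprint. It is immutable.
  • A container is an instance of an image. Many containers can be started from the same image.
  • A Dockerfile is a recipe for building an image: it starts from an existing base image and applies changes (such as installing packages or copying files) on top of it.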
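To make the distinction concrete, the hypothetical helper below renders the text of a small Dockerfile for a Python analysis. It is an illustrative sketch, not a recommended template. The base image `python:3.12-slim`, the pinned package and the script name `analysis.py` are all assumptions chosen for illustration. Each instruction in the recipe changes the starting image a little, and building the recipe produces a new image.

```python
def render_dockerfile(base_image: str, pinned_packages: list[str], script: str) -> str:
    """Return the text of a simple Dockerfile (the recipe) for a Python script."""
    lines = [
        f"FROM {base_image}",  # start from an existing, immutable base image
        "WORKDIR /work",       # working directory inside the image
    ]
    if pinned_packages:
        # Each RUN instruction adds a new layer on top of the base image.
        lines.append("RUN pip install --no-cache-dir " + " ".join(pinned_packages))
    lines += [
        f"COPY {script} /work/{script}",      # copy the analysis code into the image
        f'CMD ["python", "/work/{script}"]',  # default command when a container starts
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile("python:3.12-slim", ["numpy==1.26.4"], "analysis.py"), end="")
```

To use it, write the returned text to a file named `Dockerfile` next to `analysis.py`. Then `docker build -t my-analysis .` would build an image from the recipe (the tag `my-analysis` is arbitrary). Each `docker run my-analysis` would start a new container from that same image.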

Pros and cons of containers

Containers are popular for a reason - they solve a number of important problems:

  • Allow workflows to be moved seamlessly across different platforms.
  • Are much more lightweight than virtual machines.
  • Eliminate the “works on my machine” situation.
  • For software with many dependencies, which in turn have their own dependencies, containers are possibly the only practical way to preserve the computational experiment for future reproducibility.

However, containers may also have some drawbacks:

  • Containers can have security vulnerabilities which can be exploited.
  • Can be used to hide away software installation problems and thereby discourage good software development practices.
  • It may not be clear whether to record the environment in the image or in the recipe.
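One option is to record exact package versions in the recipe, so that rebuilding the image installs the same packages. The sketch below assumes a Python project whose dependencies are installed with pip. It uses the standard library's `importlib.metadata` to list every installed distribution as a `name==version` pin, which is similar in spirit to `pip freeze` but not identical to its output. Pinning Python packages does not pin system libraries or the base image, so the result is only a partial record of the environment. The function name is illustrative.

```python
from importlib import metadata


def pinned_requirements() -> list[str]:
    """Return sorted 'name==version' lines for every distribution installed here."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs whose metadata lacks a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Redirect this output to a file (e.g. requirements.txt) and install it from the recipe.
    print("\n".join(pinned_requirements()))
```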

Discussion: reproducibility aspects of container images

  • Do you think containers contribute to reproducible research?
  • Do you see a use case for your own work?