
Reproducible research: Sharing code and data

Overview

Teaching: 10 min
Exercises: 15 min
Questions
  • How can I share research code and data?
Objectives
  • Discuss the pros and cons of open science
  • Learn how to mint a DOI for your project

Sharing research data

The Open Science movement encourages researchers to share research outputs beyond the contents of a published academic article (and any supplementary information).

Arguments in favor (from Wikipedia):

  • Open access publication of research reports and data allows for rigorous peer review
  • Science is publicly funded, so all results of the research should be publicly available
  • Open Science will make science more reproducible and transparent
  • Open Science has more impact
  • Open Science will help answer uniquely complex questions

Arguments against (from Wikipedia):

  • Too much unsorted information overwhelms scientists
  • Potential misuse
  • The public will misunderstand science data
  • Increasing the scale of science will make verification of any discovery more difficult
  • Low-quality science

FAIR principles

“FAIR” is the current buzzword for data management. You may be asked about it, for example, when writing data management plans for grant applications:

  • Findable
    • Will anyone else know that your data exists?
    • Solutions: put it in a standard repository, or at least deposit a description of the data there.
  • Accessible
    • Once someone knows that the data exists, can they get it?
    • Usually solved by depositing it in a repository; non-open data may require additional access procedures.
  • Interoperable
    • Is your data in a format that others can use, such as CSV rather than PDF?
    • Better still, go beyond plain CSV where possible; see, for example, the 5-star open data scheme.
  • Reusable
    • Is there a license that allows others to reuse it?
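As a small illustration of the Interoperable and Reusable points, the Python sketch below saves a table as CSV and writes a JSON “sidecar” file that describes it, including a license. The function name `save_fair_dataset`, the file names and the metadata keys are hypothetical choices for this example; they do not follow any particular metadata standard, and a real repository will usually collect metadata through its own forms.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_fair_dataset(rows, fieldnames, metadata, out_dir):
    """Write `rows` (a list of dicts) as CSV plus a JSON metadata sidecar.

    Hypothetical helper: the file names and metadata keys are illustrative
    only and do not follow any particular metadata standard.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Interoperable: plain CSV can be read by almost any tool, unlike a table in a PDF.
    data_path = out / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Findable/Reusable: a machine-readable description that names the license.
    # A new dict is built so the caller's `metadata` is not modified.
    meta = dict(metadata, files=[data_path.name])
    meta_path = out / "metadata.json"
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return data_path, meta_path


if __name__ == "__main__":
    # Placeholder values for illustration only.
    rows = [
        {"sample": "A", "temperature_C": 21.5},
        {"sample": "B", "temperature_C": 22.1},
    ]
    metadata = {
        "title": "Example temperature measurements",
        "description": "Toy dataset used only for illustration.",
        "license": "CC-BY-4.0",
        "creators": ["Jane Doe"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        data_path, meta_path = save_fair_dataset(
            rows, ["sample", "temperature_C"], metadata, tmp
        )
        print(data_path.read_text(encoding="utf-8"))
        print(meta_path.read_text(encoding="utf-8"))
```

The license value above is an SPDX identifier used as a placeholder; choose whichever license actually fits your data.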

Although this is often discussed under the heading of “open data”, FAIR is about considering your data carefully and making good decisions, even when the data cannot be open.

FAIR principles are usually discussed in the context of data, but they also apply to research software.

Note that FAIR principles do not require data/software to be open.

Exercise: Discuss open science

  • Do you share any other research outputs besides published articles and possibly source code?
  • Discuss pros and cons of sharing research data.

International services for sharing and collaborating on research data

  • Zenodo: A general-purpose open access repository created by OpenAIRE and CERN. It integrates with GitHub and allows researchers to upload files of up to 50 GB.
  • Figshare: Online digital repository where researchers can preserve and share their research outputs (figures, datasets, images and videos). Users can make all of their research outputs available in a citable, shareable and discoverable manner.
  • EUDAT: European platform for researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment.
  • Dryad: A general-purpose home for a wide diversity of datatypes, governed by a nonprofit membership organization. A curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable.
  • The Open Science Framework: Offers free accounts for collaborating on files and other research artifacts. Each account can hold up to 5 GB of files, and projects remain private until you make them public.
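The size limits quoted above are easy to check before you start an upload. This Python sketch totals the size of a data directory and lists which services it would fit into. The limits are copied from the list above and may have changed since this lesson was written, so check each service's own documentation; the sketch also assumes decimal gigabytes (1 GB = 10^9 bytes). The names `LIMITS_GB`, `dataset_size_bytes` and `services_that_fit` are hypothetical.

```python
from pathlib import Path

# Limits as quoted in this lesson (assumption: they may be out of date;
# check each service's current documentation before relying on them).
LIMITS_GB = {
    "Zenodo": 50,
    "OSF (per account)": 5,
}


def dataset_size_bytes(root):
    """Total size in bytes of all regular files under `root`, recursively."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def services_that_fit(size_bytes, limits_gb=LIMITS_GB):
    """Names of services whose quoted limit is at least `size_bytes`.

    Assumes decimal gigabytes (1 GB = 10**9 bytes); a service may
    define its quota differently.
    """
    return [name for name, gb in limits_gb.items() if size_bytes <= gb * 10**9]


if __name__ == "__main__":
    # "." is just an example path; point it at your own data directory.
    size = dataset_size_bytes(".")
    print(f"{size / 10**9:.2f} GB ->", services_that_fit(size))
```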

To find a research data repository for your data, you can search on the Registry of Research Data Repositories (re3data) platform and filter by country, content type, discipline, etc.

Exercise: Get a DOI by connecting your repository to Zenodo

Digital object identifiers (DOIs) are the backbone of the academic referencing and metrics system. In this exercise we will see how to make a GitHub repository citable by archiving it with the Zenodo archiving service:

  1. Sign in to Zenodo using your GitHub account. For this exercise, use the sandbox service: https://sandbox.zenodo.org/login/. This is a test version of the real Zenodo platform.
  2. Go to https://sandbox.zenodo.org/account/settings/github/.
  3. Find the repository you wish to publish, and flip the switch to ON.
  4. Go to GitHub and create a release: open the Releases tab and click Create a new release. (A release is based on a Git tag, but is a higher-level GitHub feature.)
  5. Creating a new release triggers Zenodo to archive your repository, and after a minute or two a DOI badge will be displayed next to your repository on the Zenodo GitHub page. You can include the badge in your GitHub README file: click it and copy the snippet in the format you need (Markdown, RST or HTML).
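Once Zenodo has assigned a DOI, it can be turned into a resolver link and a README badge. The Python sketch below does this for a Markdown badge. It assumes the badge image URL pattern `https://zenodo.org/badge/DOI/<doi>.svg` used by the DOI badges on Zenodo record pages; the GitHub integration may offer a differently shaped badge, so prefer the exact snippet Zenodo shows you. The DOI syntax check is a deliberate simplification, not the full DOI specification, and the function names and example DOI are hypothetical.

```python
import re

# Simplified DOI check: "10." + numeric registrant code + "/" + non-empty suffix.
# Real DOI syntax is more permissive; this only catches obvious mistakes.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_url(doi):
    """Return the https://doi.org resolver URL for a DOI string."""
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a valid-looking DOI: {doi!r}")
    return f"https://doi.org/{doi}"


def zenodo_badge_markdown(doi, host="zenodo.org"):
    """Markdown for a DOI badge linking to the DOI resolver.

    Assumption: the badge image lives at https://<host>/badge/DOI/<doi>.svg.
    """
    return f"[![DOI](https://{host}/badge/DOI/{doi}.svg)]({doi_url(doi)})"


if __name__ == "__main__":
    # Hypothetical DOI; replace it with the one Zenodo assigns to your release.
    print(zenodo_badge_markdown("10.5281/zenodo.1234567"))
```

Paste the printed line near the top of your README.md so that visitors to the repository can see how to cite it.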

Further reading on reproducibility and open science

Key points

  • Consider sharing research outputs other than articles