
Author Carpentry: Docker for reproducible research

This course uses the Author Carpentry template but is not an Author Carpentry lesson yet! Find out more on the progress of this project at https://github.com/AuthorCarpentry/planning/issues/3.

Reproducibility of computational results is crucial in modern algorithm-based research. In this lesson, we introduce Docker as a tool to (a) document your computational environment, and (b) make that environment transferable across machines and thus archivable. The course aims to showcase Docker as a useful tool for scientists, even if they are not regular users of the command line, on which this course is entirely based.

Content Contributors: Daniel Nüst

Lesson Maintainers: Daniel Nüst

Lesson status: In Development

Learning Objectives:

Topics:

  1. Getting started with Docker
  2. RStudio in a Docker container
  3. Jupyter in a Docker container
  4. Transfer and archive of containers
  5. Create an image from a Dockerfile

Optional

Scope of this lesson

This lesson provides a rather “raw and manual” approach to creating reproducible packages of data, code, and the required runtime environment. Making this potentially tedious process more comfortable, and ideally automatic, for users is an active field of research; see for example the Executable Research Compendium by the o2r project, and tools such as ReproZip. Naturally, understanding in depth how reproducibility can be achieved provides a clear advantage over simply using such tools as a “white box” (the existing tools are all open, so none of them is a black box). Therefore this lesson’s contents on the concepts of containerization/virtualization and on the leading open source tool are worth knowing, even when using supporting tools and services. This also makes the topic suitable for a generic audience interested in Author Carpentry.

Requirements

Author Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. These lessons assume no prior knowledge of the skills or tools, but working through this lesson requires working installations of the software described below. To use these materials most effectively, please make sure to install everything before working through this lesson.

Optional requirements

In addition to the software, please bring a piece of your own research in one of the following formats (preferred formats first) if you have one at hand. This could be your digital notebook, or a section of an analysis script you used for a published paper, for example. Make sure you can share these files with other course participants: bring the required data, remove any privacy-sensitive information, and, if you excerpt a longer script, make sure the excerpt still runs.