Author Carpentry

Jupyter in Docker container

10 Minutes


Learning Objectives


This section assumes you have completed the “RStudio” section and have downloaded the file articles.csv.


Start container

Many different Docker images exist for Python. In this lesson, we’ll try out a Jupyter image.

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more. https://jupyter.org/

The Jupyter images are published as a family of "opinionated", ready-to-run stacks. Here we use jupyter/scipy-notebook, which bundles the scientific Python libraries.

docker run -it --rm -p 8888:8888 jupyter/scipy-notebook:eb70bcf1a292
# stop the container with Ctrl+C

Here we use a new CLI option, --rm, which removes the container as soon as it stops. Some housekeeping on container instances never hurts! We also pin a specific image tag (eb70bcf1a292) so that every run uses the same software environment, which keeps the setup reproducible.

Open your browser at the URL printed in the console, e.g. http://localhost:8888/?token=fce55...<token string>. Create a new notebook and start writing Python code, or copy and paste an example from the NumPy tutorial or the Matplotlib gallery.
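A useful first cell is a quick check of which parts of the scientific Python stack the image actually provides. The sketch below uses only the standard library (`importlib.util.find_spec` looks a package up without importing it). The package list is our own assumption, not taken from the image documentation.

```python
import importlib.util
import sys

# Packages we expect in a SciPy-flavoured notebook image (our assumption;
# see the Jupyter Docker Stacks documentation for the authoritative list).
EXPECTED = ["numpy", "pandas", "matplotlib", "scipy"]


def check_packages(names):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python", sys.version.split()[0])
for name, available in check_packages(EXPECTED).items():
    print(f"{name:12s} {'available' if available else 'MISSING'}")
```

If a package shows up as MISSING, a different stack or a custom image built on top of this one is probably a better fit than installing it by hand in every new container.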

Mount a directory from the host

To use files from your host computer and make data available in a container, you can mount volumes. This creates a link between a directory on the host and a directory in the container. Because the files live on the host, your work is preserved after a container is deleted, and it can be shared between containers.

This part of the lesson is based on the Data Carpentry course Python for ecologists: starting with data.

# download and unpack the data file
mkdir -p /tmp/authorcarpentry-docker
cd /tmp/authorcarpentry-docker
wget -O lc-articles.zip https://ndownloader.figshare.com/files/5330251
unzip lc-articles.zip

# mount a volume when starting the container
docker run -it --rm -p 8888:8888 -v /tmp/authorcarpentry-docker:/home/jovyan/work:rw jupyter/scipy-notebook:eb70bcf1a292

The :rw suffix allows reading and writing from within the container, as opposed to :ro (read-only). Read-write is Docker's default for bind mounts, so :rw mainly makes the intent explicit.
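To confirm from inside the notebook how the directory was mounted, you can ask the operating system whether the current user may write to it. This is a minimal sketch, assuming the mount point /home/jovyan/work from the command above; `os.access` should report a directory on a :ro mount as not writable.

```python
import os
from pathlib import Path

# Mount point used in the docker run command above.
WORK_DIR = Path("/home/jovyan/work")


def mount_status(path):
    """Describe whether a directory exists and is readable/writable for this process."""
    path = Path(path)
    if not path.is_dir():
        return "missing"
    if os.access(path, os.W_OK):
        return "read-write"
    if os.access(path, os.R_OK):
        return "read-only"
    return "no access"


print(WORK_DIR, "->", mount_status(WORK_DIR))
```

A result of "missing" usually means the -v option was left out or points at a different container path.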

Learn more about volume mounts in the Docker docs.

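We can now load the mounted data in the notebook. The host directory appears as /home/jovyan/work inside the container, so using the full path works regardless of where the notebook was created:

import pandas as pd

# /tmp/authorcarpentry-docker on the host is mounted at /home/jovyan/work
articles = pd.read_csv("/home/jovyan/work/articles.csv")
articles.head()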
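The mount also works in the other direction: anything written under /home/jovyan/work lands in /tmp/authorcarpentry-docker on the host and survives after --rm removes the container. The sketch below writes a small summary file next to the data. It uses the standard csv module so it runs without pandas (with pandas you would typically call `DataFrame.to_csv`); the output name articles-summary.csv is hypothetical, and the input is assumed to be UTF-8 with a header row.

```python
import csv
from pathlib import Path

WORK_DIR = Path("/home/jovyan/work")  # mount point from the docker run command


def summarise_csv(src, dest):
    """Count data rows and columns of a CSV file and write them to a summary CSV."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes the first row holds the column names
        n_rows = sum(1 for _ in reader)
    with open(dest, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "rows", "columns"])
        writer.writerow([Path(src).name, n_rows, len(header)])
    return n_rows, len(header)


# Hypothetical output file; it appears on the host in /tmp/authorcarpentry-docker.
source = WORK_DIR / "articles.csv"
if source.exists():
    print(summarise_csv(source, WORK_DIR / "articles-summary.csv"))
```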

How is this useful for reproducible research?

On its own, it is not: the data now lives outside the container, so the container is no longer self-contained. Mounting host directories is still useful for data input and output during development, and for large datasets that cannot easily be copied several times on the same machine.
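Since the container no longer carries the data, one modest safeguard is to record a fingerprint of the mounted input alongside your results, so that a later run can check it is working with the same file. This is a sketch of that idea, not part of the original lesson; it computes a SHA-256 digest in chunks so large files do not have to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


data_file = Path("/home/jovyan/work/articles.csv")  # mounted from the host
if data_file.exists():
    print(data_file.name, sha256_of(data_file))
```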

Next steps

If you do not use Jupyter notebooks but work with Python scripts, you can of course also wrap a Python script and its dependencies in a container. A number of Python images on Docker Hub can get you started. You can even make your script the container's default command and pass arguments to it when you run the container.
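As an illustration, here is a hypothetical script, count_rows.py, that takes its input file as a command-line argument. In a Dockerfile, an exec-form line such as `ENTRYPOINT ["python", "count_rows.py"]` would make it the container's default command, and arguments given after the image name in `docker run` are appended to it, e.g. `docker run --rm -v /tmp/authorcarpentry-docker:/data my-counter /data/articles.csv` (the image name my-counter is made up). With no file argument, the sketch prints its usage instead of failing.

```python
import argparse
import csv


def count_rows(path, has_header=True):
    """Return the number of data rows in a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = sum(1 for _ in csv.reader(f))
    if has_header and rows > 0:
        rows -= 1  # do not count the header row
    return rows


def main(argv=None):
    parser = argparse.ArgumentParser(description="Count data rows in a CSV file.")
    parser.add_argument("csv_file", nargs="?", help="path to the input CSV file")
    parser.add_argument("--no-header", action="store_true",
                        help="treat the first line as data, not column names")
    args = parser.parse_args(argv)
    if args.csv_file is None:
        parser.print_help()  # no input given: show usage instead of failing
        return None
    rows = count_rows(args.csv_file, has_header=not args.no_header)
    print(rows)
    return rows


if __name__ == "__main__":
    main()
```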