7 Containerization With Docker
What you’ll have learned by the end of the chapter: build self-contained, truly reproducible analytical pipelines thanks to Docker.
7.1 Introduction
Up until now, we’ve been using Nix as a powerful tool for creating reproducible development environments directly on our machines. Nix gives us fine-grained control over every package and dependency in our project, ensuring bit-for-bit reproducibility. However, when it comes to distributing a data product, another technology, Docker, is incredibly popular.
While Nix manages dependencies for an application that runs on a host operating system, Docker takes a different approach: it packages an application along with a lightweight operating system and all its dependencies into a single, portable unit called a container. This container can then run on any machine that has Docker installed, regardless of its underlying OS.
The idea is not just to deliver the source code of our data products, but to ship it inside a complete package that contains R and the required libraries, as well as the necessary components of the operating system itself (which will usually be a flavor of Linux, like Ubuntu). This approach solves the “it works on my machine” problem in a very direct way.
To rebuild a data product, a single command suffices: it pulls the Docker image from a registry, starts a container, builds the data product, and then stops.
If you’ve never heard of Docker before, this chapter will provide the basic knowledge required to get started.
In a sense, Docker can be seen as a lightweight virtual machine running a Linux distribution (usually Ubuntu) that you can interact with using the command line. This also means that familiarity with Linux distributions will make using Docker easier. Thankfully, there is a very large community of Docker users who also use R. This community is organized as the Rocker Project and provides a very large collection of Dockerfiles to get started easily. Dockerfiles are simple text files that define a Docker image, from which you can start containers.
While Nix and Docker are often seen as competing tools for environment management, they can be used together effectively by leveraging their respective strengths. A powerful pattern is to use Nix inside a Docker container. In this setup, you start with a minimal base Docker image that has Nix installed. Then, you use Nix to declaratively build the precise, bit-for-bit reproducible development environment within the image. Docker’s role then shifts from environment provisioning to simply being a portable, universal runtime for this Nix-managed environment, making it excellent for deployment.
This approach contrasts with using Docker alone for reproducibility. While many attempt this, it’s not Docker’s core strength. Achieving a reproducible docker build often requires “abusing” Docker’s features—pinning base image hashes, freezing system package versions, and using specific package manager snapshots—because Docker was designed for creating portable runtime containers, not for guaranteeing reproducible builds. Its true reproducibility promise is that a specific, pre-built image will always launch an identical container, not that building the same Dockerfile twice will yield an identical image.
7.2 Docker essentials
7.2.1 Installing Docker
The first step is to install Docker. You’ll find the instructions for Ubuntu here, for Windows here (read the system requirements section as well!) and for macOS here (make sure to choose the right version for the architecture of your Mac; if you have an Apple silicon Mac, such as an M1 or later, choose the Apple silicon version).
After installation, it might be a good idea to restart your computer, if the installation wizard does not invite you to do so. To check whether Docker was installed successfully, run the following command in a terminal (or on the desktop app on Windows):
docker run --rm hello-world
This should print the following message:
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
If you see this message, congratulations, you are ready to run Docker. If you see an error message about permissions, this means that something went wrong. If you’re running Linux, make sure that your user is in the Docker group by running:
groups $USER
you should see your username and a list of groups that your user belongs to. If a group called docker is not listed, then you should add yourself to the group by following these steps.
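For reference, on most Linux distributions the steps boil down to the following; this is a sketch, and the authoritative procedure is in Docker’s post-installation documentation:

```bash
# Add your user to the docker group (created during installation)
sudo usermod -aG docker "$USER"
# Log out and back in, or activate the new group in the current shell
newgrp docker
# Verify that Docker now works without sudo
docker run --rm hello-world
```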
7.2.2 The Rocker Project and image registries
When running a command like:
docker run --rm hello-world
what happens is that an image, in this case hello-world, gets pulled from a so-called registry. A registry is a storage and distribution system for Docker images. Think of it as a GitHub for Docker images, where you can push and pull images, much like you would with code repositories. The default public registry that Docker uses is called Docker Hub, but companies can also host their own private registries to store proprietary images. When you execute a command like docker run, the Docker daemon first checks if the image is present on your local machine. If not, it connects to the configured registry, downloads the required image layers, and then assembles them to run the container.
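For instance, the short name hello-world is an alias; spelled out in full, the same pull from Docker Hub looks like this, which makes the registry/namespace/repository:tag naming convention visible:

```bash
# docker.io is the registry, library the namespace, hello-world the repository
docker pull docker.io/library/hello-world:latest
```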
Many open source projects build and distribute Docker images through Docker Hub, for example the Rocker Project.
The Rocker Project is instrumental for R users who want to use Docker. The project provides a large list of images that are ready to run with a single command. As an illustration, open a terminal and paste the following line:
docker run --rm -e PASSWORD=yourpassword -p 8787:8787 rocker/rstudio
Once the image has been pulled and the container is running, go to http://localhost:8787/ and enter rstudio as the username and yourpassword as the password. You should be logged into an RStudio instance: this is the web interface of RStudio that allows you to work with R from a server. In this case, the server is the Docker container running the image. Yes, you’ve just pulled a Docker image containing Ubuntu with a fully working installation of RStudio web!
(If you cannot connect to http://localhost:8787, try with the following command:
docker run --rm -ti -d -e PASSWORD=yourpassword -p 8787:8787 --network="host" rocker/rstudio
)
Let’s open a new script and run the following lines:
data(mtcars)
summary(mtcars)
You can now stop the container (by pressing CTRL-C in the terminal). Let’s now rerun the container (with the same command as before)… you should realize that your script is gone! This is the first lesson: whatever you do inside a container will disappear once the container is stopped. This also means that if you install the R packages that you need while the container is running, you will need to reinstall them every time. Thankfully, the Rocker Project provides a list of images with many packages already available. For example, to run R with the {tidyverse} collection of packages already pre-installed, run the following command:
docker run --rm -ti -e PASSWORD=yourpassword -p 8787:8787 rocker/tidyverse
If you compare it to the previous command, you see that we have replaced rstudio with tidyverse. This is because rocker/tidyverse references an image, hosted on Docker Hub, that provides the latest version of R, RStudio server and the packages from the {tidyverse}. You can find the image hosted on Docker Hub here. There are many different images, and we will be using the versioned images made specifically for reproducibility. For now, however, let’s stick with the tidyverse image, and let’s learn a bit more about some specifics.
7.2.3 Basic Docker workflow
You already know about running containers using docker run. With the commands we ran before, your terminal needs to stay open, or else the container stops. Starting now, we will run Docker commands in the background. For this, we will use the -d flag (d as in detach), so let’s stop the container one last time with CTRL-C and rerun it using:
docker run --rm -d -e PASSWORD=yourpassword -p 8787:8787 rocker/tidyverse
(notice -d just after run). You can run several containers in the background simultaneously. You can list running containers with docker ps:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c956fbeeebcb rocker/tidyverse "/init" 3 minutes ago Up 3 minutes 0.0.0.0:8787->8787/tcp, :::8787->8787/tcp elastic_morse
The running container has the ID c956fbeeebcb. The very last column shows the name of the running container; this is a label that you can change. For now, take note of the ID, because we are going to stop the container:
docker stop c956fbeeebcb
After Docker is done stopping the running container, you can check the running containers using docker ps again, but this time no containers should get listed. Let’s also discuss the other flags: --rm, -e and -p. --rm removes the container once it’s stopped. Without this flag, we could restart the container and all the data and preferences we saved would be restored. However, this is dangerous, because if the container gets removed, then everything is lost, forever. We are going to learn how to deal with that later. -e allows you to provide environment variables to the container, in this case the $PASSWORD variable. -p maps a port on your host machine to a port inside the container (in host:container order); this is where your app gets served. Let’s now rerun the container, but this time giving it a name:
docker run -d --name my_r --rm -e PASSWORD=yourpassword -p 8787:8787 rocker/tidyverse
Notice the --name flag followed by the name we want to use, my_r. We can now interact with this container using its name instead of its ID. For example, let’s open an interactive bash session. Run the following command:
docker exec -ti my_r bash
You are now inside a terminal session, inside the running container! This can be useful for debugging purposes. It’s also possible to start R in the terminal: simply replace bash with R in the command above.
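Another handy command when debugging is docker logs, which prints everything the container has written to standard output:

```bash
docker logs my_r
```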
Finally, let’s solve the issue of our scripts disappearing. For this, create a folder somewhere on your computer (host). Then, rerun the container, but this time with this command:
docker run -d --name my_r --rm -e PASSWORD=yourpassword -p 8787:8787 -v /path/to/your/local/folder:/home/rstudio/scripts:rw rocker/tidyverse
where /path/to/your/local/folder should be replaced with the path to the folder you created. You should now be able to save your scripts inside the scripts/ folder from RStudio, and they will appear in the folder you created.
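To convince yourself that the two folders really are the same, create a file on the host and list the folder from inside the container (this assumes the container from the command above is still running under the name my_r):

```bash
# On the host: drop a script into the shared folder
echo 'print("hello")' > /path/to/your/local/folder/hello.R
# Inside the container: the file is immediately visible
docker exec my_r ls /home/rstudio/scripts
```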
7.2.4 Making our own images
To create your own images, you can start from an image provided by an open source project like Rocker, or you can start from the base Ubuntu or Alpine Linux images. These images are barebones compared to the ones from Rocker, but as a consequence they are very lightweight, which in some cases can be important. For the remainder of the course, we are going to start from a base Ubuntu image, and use Nix to add our software stack.
The snippet below is a minimal Dockerfile that shows exactly this:
FROM ubuntu:latest
RUN apt update -y
RUN apt install curl -y
# We don't have R nor {rix} in this image, so we can bootstrap it by downloading
# the default.nix file that comes with {rix}. You can also download it beforehand
# and then copy it to the Docker image
RUN curl -O https://raw.githubusercontent.com/ropensci/rix/main/inst/extdata/default.nix
# The next 4 lines install Nix inside Docker. See the Determinate Systems installer's documentation
RUN curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install linux \
--extra-conf "sandbox = false" \
--init none \
--no-confirm
# Adds Nix to the path, as described by the Determinate Systems installer's documentation
ENV PATH="${PATH}:/nix/var/nix/profiles/default/bin"
ENV user=root
# Set up rstats-on-nix cache
# Thanks to the rstats-on-nix cache, precompiled binary packages will
# be downloaded instead of being compiled from source
RUN mkdir -p /root/.config/nix && \
echo "substituters = https://cache.nixos.org https://rstats-on-nix.cachix.org" > /root/.config/nix/nix.conf && \
echo "trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= rstats-on-nix.cachix.org-1:vdiiVgocg6WeJrODIqdprZRUrhi1JzhBnXv7aWI6+F0=" >> /root/.config/nix/nix.conf
# Copy a script to generate the environment of interest using {rix}
COPY generate_env.R .
# This will overwrite the default.nix we downloaded previously with a new
# expression generated from running `generate_env.R`
RUN nix-shell --run "Rscript generate_env.R"
# We now build the environment
RUN nix-build
# Finally, we run `nix-shell`. This will get executed when running
# containers from this image. You can of course put anything in here
CMD nix-shell
This can seem quite complicated, but if you take the time to read the comments, you’ll see that it’s actually quite simple.
Every Dockerfile starts with a FROM statement, which declares the image used as a starting point; here, that is ubuntu:latest. You might read online that this is not good practice, and that instead one should use a stable tag, for example ubuntu:24.04, which will always use version 24.04 of Ubuntu. This is true IF you don’t use Nix. But since we are using Nix to set up the reproducible development environment, we can use ubuntu:latest: our development environment will always be exactly the same, thanks to Nix.
Then, every command we wish to run starts with a RUN statement. We install and configure Nix, copy an R script to generate the environment (we could also copy an already generated default.nix instead) and then build the environment. Finally, we finish by running nix-shell when executing a container, which is the command prepended with CMD.
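Assuming this Dockerfile and a generate_env.R script sit in the current directory, you would build and try the image like this (nix-in-docker is an arbitrary tag I chose):

```bash
# Build the image; the first build takes a while
docker build -t nix-in-docker .
# Start a container; the CMD nix-shell drops you into the environment
docker run -ti --rm nix-in-docker
```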
This image actually does two things:
- a first step which consists in setting up Nix inside Docker;
- a second step which consists in setting up our project-specific Nix development environment.
Because the first step is generic, we will split this up into two stages.
First, create a new Dockerfile in a separate directory, with a new Git repo so that you can commit and push it (later in the book we will set up continuous integration to build and publish this image automatically):
# Stage 1 — Base with Nix and rstats-on-nix cache
FROM ubuntu:latest AS nix-base
RUN apt update -y && apt install -y curl
# Install Nix via Determinate Systems installer
RUN curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install linux \
--extra-conf "sandbox = false" \
--init none \
--no-confirm
ENV PATH="/nix/var/nix/profiles/default/bin:${PATH}"
ENV user=root
# Configure Nix binary cache
RUN mkdir -p /root/.config/nix && \
echo "substituters = https://cache.nixos.org https://rstats-on-nix.cachix.org" > /root/.config/nix/nix.conf && \
echo "trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= rstats-on-nix.cachix.org-1:vdiiVgocg6WeJrODIqdprZRUrhi1JzhBnXv7aWI6+F0=" >> /root/.config/nix/nix.conf
Commit and push. Then, we need to build this image once, and tag it:
docker build -t nix-base:latest .
This image is now available on our machines under the tag nix-base:latest, and we can refer to it for any of our projects. For a new project, simply reuse it like so:
FROM nix-base:latest
COPY generate_env.R .
RUN curl -O https://raw.githubusercontent.com/ropensci/rix/main/inst/extdata/default.nix
RUN nix-shell --run "Rscript generate_env.R"
RUN nix-build
CMD ["nix-shell"]
The issue with this approach is that you have now created a dependency between the two Dockerfiles, which you need to manage. I would recommend this two-stage approach only if you can push the first image, with the Nix base, to a registry (either a public one, or a private one from your company). Later in this chapter we will publish that first image.
In the same folder as the second Dockerfile, add the required generate_env.R script:
library(rix)

rix(
  date = "2025-08-04",
  r_pkgs = c("dplyr", "ggplot2"),
  py_conf = list(
    py_version = "3.13",
    py_pkgs = c("polars", "great-tables")
  ),
  ide = "none",
  project_path = ".",
  overwrite = TRUE
)
This will set up an environment for our project. Let’s stop here, and build the image:
docker build -t my-project .
and now run a container:
docker run -it --rm --name my-project-container my-project
This should drop you in an interactive Nix shell running inside Docker! As Docker is more popular than Nix, in particular in enterprise settings, this makes sharing development environments easier.
Remember, anything you do in this container will be lost after you stop it. So if you want to use it to work interactively on files, you should mount a volume:
docker run -it --rm --name my-project-container -v /path/to/your/local/project-folder/workspace:/workspace:rw -w /workspace my-project
This mounts the workspace folder of your project on the host to a folder called /workspace inside the running container (the -it flag keeps the Nix shell interactive). The -w /workspace flag sets the working directory inside the container to /workspace, so any commands you run will execute from there. This acts as a kind of tunnel between the two; any file put there will be available and editable on the other side.
While this is good to know, I don’t recommend using Docker to work interactively. Use Nix for this instead, and use Docker to then deploy whatever product you’ve been working on once you’re done.
Before moving on to actually build projects using Docker, let’s first publish the base Nix image on Docker Hub to easily re-use it across projects.
7.2.5 Publishing images on Docker Hub
If you want to share Docker images through Docker Hub, you first need to create a free account. A free account gives you unlimited public repositories. If you want to make your images private, you need a paid account. For our purposes though, a free account is more than enough. In the next section, we will discuss how you can build new images upon other images without using Docker Hub.
We will be uploading the image nix-base to Docker Hub.
Now is the right moment to talk about the docker images command, which lists all the images available on your computer. You should see something like this:
REPOSITORY TAG IMAGE ID CREATED SIZE
nix-base latest d3764d067534 2 days ago 1.61GB
dev_env_r latest 92fcf973ba42 2 days ago 1.42GB
raps_ubuntu_r latest 7dabadf3c7ee 4 days ago 1.04GB
rocker/tidyverse 4.2.2 545e4538a28a 3 weeks ago 2.19GB
rocker/r-ver 4.2.2 08942f81ec9c 3 weeks ago 824MB
Take note of the image id of the nix-base image (second line); we will use it to push our image to Docker Hub. Also, don’t be alarmed by the size of the images, because this is a bit misleading: different images that use the same base (so here, Ubuntu) reuse “layers”, such that they don’t actually take up the full size printed by docker images. So if images A and B both use the same version of Ubuntu as a base, but image A has RStudio installed and B has RStudio and Python as well, most of the space that A and B take up will be shared. The only difference will be that B needs a little bit more space for Python.
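You can see these layers for yourself with docker history, which lists each layer of an image along with its size:

```bash
docker history nix-base:latest
```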
You can also list the running containers with docker container ls (or docker ps). If a container is running, you should see something like this:
CONTAINER ID IMAGE COMMAND CREATED
545e4538a28a rocker/tidyverse "/init" 3 minutes ago
STATUS PORTS NAMES
Up 3 minutes 0.0.0.0:8787->8787/tcp, :::8787->8787/tcp elastic_morse
You can stop the container by running docker stop CONTAINER_ID. So, list the images again using docker images and take note of the image id of the image you want to push to Docker Hub.
Now, log in to Docker Hub using docker login (yes, from your terminal). You will be asked for your credentials, and if the login is successful, you will see the message Login Succeeded in your terminal (of course, you first need an account on Docker Hub).
Now, you need to tag the image (this gives it a version number). You would write something like:
docker tag IMAGE_ID your_username_on_docker_hub/your_image:version1
so in my case, with the nix-base image id from the listing above, it would be:
docker tag d3764d067534 brodriguesco/nix-base:latest
Next, I need to push it using docker push:
docker push brodriguesco/nix-base:latest
You can go check your profile and your repositories, you should see your image there.
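From now on, anyone (including you, on another machine) can pull it:

```bash
docker pull brodriguesco/nix-base:latest
```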
This image can now be used as a stable base for developing our pipelines. Here’s how I can now use this base image for our project:
FROM brodriguesco/nix-base:latest
RUN mkdir ...
Now I’m re-using the image that defines the development environment, and I can do so for as many projects as necessary. I would recommend putting a link to the base image as a comment just before the first FROM.
If you want to test this, you could delete all images and containers from your system. This way, when you build the image using the above Dockerfile, it will have to pull from Docker Hub. To delete all containers, start by using docker system prune. You can then delete all images using docker rmi $(docker images -a -q). This should remove everything.
If you work for a company that has its own private registry, the process is essentially the same; Docker will simply have been configured to pull from and push to the private registry instead.
In the next section, I’ll explain to you how you can re-use base images like we just did, but without using Docker Hub, in case you cannot, or do not want, to rely on it.
7.2.7 What if you don’t use Nix?
Using Nix inside of Docker makes it very easy to set up an environment, but what if you can’t use Nix for some reason? In that case, you would need to use other tools to install the right R or Python packages to build your Docker image, and it is likely going to be more difficult. The main issue you will face is missing development libraries that prevent R or Python packages from installing successfully, so you first need to install the right development library: for example, to install and use the R {stringr} package, you first need to install libicu-dev. Below is an example of how this may end up looking:
FROM rocker/r-ver:4.5.1
RUN apt-get update && apt-get install -y \
    libicu-dev \
    libglpk-dev \
    libxml2-dev \
    libcairo2-dev \
    libgit2-dev \
    default-libmysqlclient-dev \
    libpq-dev \
    libsasl2-dev \
    libsqlite3-dev \
    libssh2-1-dev \
    libxtst6 \
    libcurl4-openssl-dev \
    libharfbuzz-dev \
    libfribidi-dev \
    libfreetype6-dev \
    libpng-dev \
    libtiff5-dev \
    libjpeg-dev \
    unixodbc-dev wget
One way to avoid having to hunt down these development libraries yourself is to configure a tool that resolves system requirements for you; {pak}, for example, can install the system libraries an R package needs automatically on supported Linux distributions.
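As a minimal sketch of what this could look like, assuming {pak}’s automatic system requirements installation (check its documentation before relying on this):

```dockerfile
FROM rocker/r-ver:4.5.1
# {pak} detects the system libraries an R package needs and, when running
# as root on supported distributions, installs them before the R package
RUN R -e 'install.packages("pak"); pak::pkg_install("stringr")'
```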
Another issue you will face is that building the image is not a reproducible process; only running containers is. To mitigate this issue you can use tagged images (like in the example above) or, better yet, use a digest, which you can find on Docker Hub:
FROM rocker/r-ver@sha256:1dbe7a6718b7bd8630addc45a32731624fb7b7ffa08c0b5b91959b0dbf7ba88e
This will always pull exactly the same layers. However, this does not completely solve everything. At some point, the version of Ubuntu that you are using will be outdated, and it won’t be able to download anything from the repositories anymore. At that point, if you still need that image, you either need to store and keep it, or you will need to start using a newer image, and potentially have to update your code as well. Using Nix, you can stay on ubuntu:latest.
To summarise: if you can’t use Nix inside of Docker, you will have to deal with the same issues you face when trying to set up environments directly on your computer.
7.3 Dockerizing a {rixpress} Pipeline
In the previous chapter, we learned how to build a fully reproducible, polyglot pipeline using Nix and {rixpress}. This workflow is perfect for development, ensuring that every run is bit-for-bit identical. However, what if you need to share your final data product with a collaborator or deploy it to a server where installing Nix is not an option?
This is the ideal use case for Docker. We can package our entire {rixpress} project (the Nix environment definition, the pipeline logic, and all source files) into a single Docker image. This image can then be run by anyone with Docker installed, regardless of their host operating system or whether they have Nix. The user doesn’t build the pipeline; they run the container, and the pre-built results are extracted. If they wish, they can “log in” to an interactive session to access the environment within a running Docker container and “play around” with the data and code.
Let’s take the {rixpress} project we created in the last chapter and dockerize it. Assume your project directory has the following structure:
.
├── data/
│ └── mtcars.csv
├── gen-env.R
├── gen-pipeline.R
├── functions.R
├── functions.py
├── default.nix
└── pipeline.nix
7.3.1 Step 1: The Dockerfile
Create a new file named Dockerfile in your project’s root directory. We will use the nix-base image we built earlier as our foundation.
# Use the base image with Nix pre-installed.
# If you built the image locally, you can use its local tag:
FROM nix-base:latest
# If you pushed it to Docker Hub, you (or a collaborator) can use that.
# This is the more portable and recommended approach for sharing.
# FROM your-username/nix-base:latest
# Optionally, set a working directory inside the container
WORKDIR /app
# Copy all your project files into the image's working directory
COPY . .
# Build the pipeline
# This single command leverages the 'pipeline.nix' file.
# Nix will first build the environment defined in 'default.nix',
# then it will execute the pipeline defined in 'pipeline.nix'.
# The results are stored immutably in the /nix/store inside the image.
# Instead, you can also only copy the R files, and regenerate the
# pipeline.nix file during the build process
RUN nix-build pipeline.nix
# The CMD will define what happens when the container is run.
# We will create a small R script to export the final artifact
# from the Nix store to a mounted volume.
COPY export-results.R .
CMD ["Rscript", "export-results.R"]
This Dockerfile is elegant and concise because it delegates the heavy lifting of environment and pipeline management to Nix, which is its specialty.
7.3.2 Step 2: The Export Script
The RUN nix-build command has already executed our entire pipeline during the image build process. This is ideal when, as in our case, the data can be bundled with the image: the final artifacts, like our mtcars_head data frame, are stored in the Nix store within the image, so the user running the container doesn’t need to re-run the pipeline; they just need the output. (Note: if your data is too large or too sensitive to be included in the image, an alternative approach is to remove the RUN nix-build step from the Dockerfile and instead execute the pipeline at runtime, using a mounted volume to provide the input data.)
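A minimal sketch of that runtime alternative could look like this; this is an assumption on my part rather than a pattern from this chapter, and you would adapt the mount points and the final command to your project:

```dockerfile
FROM nix-base:latest
WORKDIR /app
# Copy the project files; use a .dockerignore file to exclude the data folder
COPY . .
# No RUN nix-build here: the pipeline is built when the container starts,
# reading its inputs from a volume mounted at runtime
CMD nix-build pipeline.nix && nix-shell --run "Rscript export-results.R"
```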
Create a small R script named export-results.R to extract these results:
# export-results.R
# This script runs inside the container after the pipeline has been built.

# Ensure {rixpress} is available to find the results.
# The environment is activated via the Nix build process.
library(rixpress)
library(jsonlite)

# Define the directory where Docker will mount the output volume.
# This path MUST match the target path in the `docker run -v` command.
output_dir <- "/output"
dir.create(output_dir, showWarnings = FALSE)

# Read the final artifact from the completed pipeline
message("Reading target 'mtcars_head'...")
final_data <- rxp_read("mtcars_head")

# Save the final data to the mounted directory in a universal format.
# You could also save the data in a `.csv` file.
output_path <- file.path(output_dir, "mtcars_analysis_result.json")
write_json(final_data, output_path, pretty = TRUE)

message(paste("Successfully exported result to", output_path))
7.3.3 Step 3: Build and Run
With the Dockerfile and export-results.R in place, you can now build your self-contained data product.
Build the Docker image: open a terminal in your project directory and run:
docker build -t my-reproducible-pipeline .
Run the container to get the results: now anyone can get the result of your analysis with a single command. We will create a local output folder and mount it into the container. (In case you couldn’t bundle the raw data into the image, this is also how you would provide the data at run time; the pipeline would only be executed then.)
# Create a directory on your host machine to receive the output
mkdir -p ./output
# Run the container
docker run --rm --name my_pipeline_run \
  -v "$(pwd)/output":/output \
  my-reproducible-pipeline
After the container runs and exits, check your local output directory. You will find the mtcars_analysis_result.json file, containing the exact, reproducible result of your pipeline. The docker run command automatically executed the CMD ["Rscript", "export-results.R"] we defined in our Dockerfile, which extracted the pre-built artifact.
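You can inspect the result directly from your terminal:

```bash
cat ./output/mtcars_analysis_result.json
```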
You have successfully packaged a complex, polyglot pipeline into a simple, portable Docker image. This workflow combines the best of both worlds: Nix’s unparalleled power for creating reproducible builds and Docker’s universal standard for distributing and running applications.
7.4 Further reading
- https://www.statworx.com/content-hub/blog/wie-du-ein-r-skript-in-docker-ausfuehrst/ (in German, English translation: https://www.r-bloggers.com/2019/02/running-your-r-script-in-docker/)
- https://colinfay.me/docker-r-reproducibility/
- https://jsta.github.io/r-docker-tutorial/
- http://haines-lab.com/post/2022-01-23-automating-computational-reproducibility-with-r-using-renv-docker-and-github-actions/
7.5 Hands-on Exercises
7.5.1 Level 1: Foundational Skills
These exercises focus on mastering the basic Docker commands and concepts.
Exercise 1: The Ephemeral Container and the Persistent Volume
The goal of this exercise is to solidify your understanding of how containers are ephemeral and how volumes provide persistence.
- Run the rocker/tidyverse image in detached mode (-d), giving it a name (e.g., my-rstudio). Publish the port 8787 and set a password.
- Connect to the RStudio instance in your browser. Create a new R script named test_script.R and save it in the default home directory (/home/rstudio). Inside the script, write a simple R command like message("Hello from inside the container!").
- Stop and remove the container using docker stop and docker rm (or just use --rm from the start).
- Run a new container with the same command. Connect to RStudio again. Is your test_script.R still there? (It shouldn’t be.)
- Now, create a folder on your local machine called r_projects.
- Run the rocker/tidyverse container again, but this time, mount your r_projects folder as a volume to /home/rstudio/projects inside the container. The flag should look something like -v "$(pwd)/r_projects":/home/rstudio/projects.
- Connect to RStudio, navigate to the projects folder, and create your test_script.R there.
- Stop and remove the container. Check your local r_projects folder. Is the script there? (It should be.) This demonstrates how volumes link your host machine to the container’s filesystem.
Exercise 2: The Container Inspector
This exercise is designed to get you comfortable with interacting with a running container from your terminal.
- Run a basic ubuntu:latest container in detached mode (-d) and give it a name like my-ubuntu-box. Use the command sleep 3600 to keep it running for an hour.
  - Hint: The full command would be docker run -d --name my-ubuntu-box ubuntu:latest sleep 3600.
- Use docker ps to verify that your container is running.
- Use docker exec to get an interactive bash shell inside the my-ubuntu-box container.
- Once inside, run the following Linux commands: ls -la, pwd, whoami, and cat /etc/os-release. What do you observe?
- Still inside the container, use apt-get update && apt-get install -y fortunes to install a fun package. Run the fortune command.
- Exit the container’s shell (type exit). Is the container still running? (It should be.)
- Stop the container using docker stop my-ubuntu-box.
7.5.2 Level 2: Building and Distributing Images
These exercises focus on creating your own Dockerfile and sharing your work.
Exercise 3: Your First Custom R Image
Create a Dockerfile that builds a simple, non-Nix R image.
- Create a new project folder. Inside, create a Dockerfile.
- In the Dockerfile, start from rocker/r-ver:4.5.1 (or any other versioned tag).
- Add a RUN command to install the R package {cowsay} from CRAN.
- Create an R script named run.R in the same folder. The script should contain:
  library(cowsay)
  say("Moo-ving to Docker!", by = "cow")
- In your Dockerfile, add a COPY command to copy run.R into the image (e.g., to /home/run.R).
- Set the CMD to execute your script using Rscript.
- Build the image with the tag my-cowsay-app.
- Run a container from your new image. You should see the cow’s message printed to your terminal.
Exercise 4: Publish Your Nix Base
Take the nix-base image you created in the chapter and practice the full distribution workflow.
- If you haven’t already, build the nix-base image locally.
- Create a free account on Docker Hub.
- Log in to Docker Hub from your terminal using docker login.
- Tag your nix-base:latest image with your Docker Hub username, e.g., docker tag nix-base:latest your-username/nix-base:1.0.
- Push the image to Docker Hub using docker push.
- To test that it works, remove your local copies of the image: docker rmi your-username/nix-base:1.0 and docker rmi nix-base:latest.
- Create a simple Dockerfile that starts with FROM your-username/nix-base:1.0. When you build this Dockerfile, Docker should pull the image you just pushed from Docker Hub.
7.5.3 Level 3: The Capstone Project
This exercise integrates all the concepts from the chapter into a complete, reproducible data product.
Exercise 5: Package Your Own {rixpress} Pipeline
Take a {rixpress} pipeline (you can use the one from the previous chapter’s exercises or create a new one) and package it into a distributable Docker image.
- Your project should contain all the necessary files: gen-env.R, gen-pipeline.R, your default.nix and pipeline.nix, and any data/function scripts.
- Create a Dockerfile that uses your published nix-base image from Exercise 4 as its FROM source.
- The Dockerfile should:
  - Set a working directory (e.g., /app).
  - COPY all your project files into the image.
  - RUN the nix-build pipeline.nix command to execute the pipeline during the build process.
  - Include an export-results.R script (you’ll need to write this) that saves one or more of your final pipeline artifacts to an /output directory.
  - Set the CMD to run your export-results.R script.
- Build the final image with a descriptive tag (e.g., my-final-analysis:latest).
- Run the image, mounting a local output folder to the container’s /output folder.
- Verify that the final results (e.g., a plot, a CSV, or a JSON file) appear in your local output folder. Congratulations, you’ve created a fully portable and reproducible data product!
Extra Challenge: Modify your solution from Exercise 5 for a “big data” scenario where the input data cannot be included in the image.
- Your Dockerfile should not copy the data and should not run nix-build at build time.
- Instead, the CMD should execute the entire pipeline at runtime. You will need to figure out how to pass the paths for input data and output results to your script.
- The docker run command will now need to mount two volumes: one for the input data and one for the output results.