7 Containerization With Docker
What you’ll have learned by the end of the chapter: build self-contained, truly reproducible analytical pipelines thanks to Docker.
7.1 Introduction
Up until now, we’ve been using Nix as a powerful tool for creating reproducible development environments directly on our machines. Nix gives us fine-grained control over every package and dependency in our project, ensuring bit-for-bit reproducibility. However, when it comes to distributing a data product, another technology, Docker, is incredibly popular.
While Nix manages dependencies for an application that runs on a host operating system, Docker takes a different approach: it packages an application along with a lightweight operating system and all its dependencies into a single, portable unit called a container. This container can then run on any machine that has Docker installed, regardless of its underlying OS.
The idea is not just to deliver the source code of our data products, but to ship it inside a complete package that contains R and the required libraries, as well as the necessary components of the operating system itself (which will usually be a flavor of Linux, like Ubuntu). This approach solves the “it works on my machine” problem in a very direct way.
To rebuild a data product, a single command suffices: it pulls the Docker image from a registry, starts a container, builds the data product, and then stops.
If you’ve never heard of Docker before, this chapter will provide the basic knowledge required to get started.
In a sense, Docker can be seen as a lightweight virtual machine running a Linux distribution (usually Ubuntu) that you can interact with using the command line. This also means that familiarity with Linux distributions will make using Docker easier. Thankfully, there is a very large community of Docker users who also use R. This community is organized as the Rocker Project and provides a very large collection of Dockerfiles to get started easily. Dockerfiles are simple text files that define a Docker image, from which you can start containers.
While Nix and Docker are often seen as competing tools for environment management, they can be used together effectively by leveraging their respective strengths. A powerful pattern is to use Nix inside a Docker container. In this setup, you start with a minimal base Docker image that has Nix installed. Then, you use Nix to declaratively build the precise, bit-for-bit reproducible development environment within the image. Docker’s role then shifts from environment provisioning to simply being a portable, universal runtime for this Nix-managed environment, making it excellent for deployment.
This approach contrasts with using Docker alone for reproducibility. While many attempt this, it’s not Docker’s core strength. Achieving a reproducible docker build often requires “abusing” Docker’s features—pinning base image hashes, freezing system package versions, and using specific package manager snapshots—because Docker was designed for creating portable runtime containers, not for guaranteeing reproducible builds. Its true reproducibility promise is that a specific, pre-built image will always launch an identical container, not that building the same Dockerfile twice will yield an identical image.
7.2 Docker essentials
7.2.1 Installing Docker
The first step is to install Docker. You’ll find the instructions for Ubuntu here, for Windows here (read the system requirements section as well!) and for macOS here (make sure to choose the right version for the architecture of your Mac; if you have an Apple silicon Mac, such as an M1 or later, choose the Apple silicon version).
After installation, it might be a good idea to restart your computer, if the installation wizard does not invite you to do so. To check whether Docker was installed successfully, run the following command in a terminal (or on the desktop app on Windows):
docker run --rm hello-world
This should print the following message:
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
If you see this message, congratulations, you are ready to run Docker. If you see an error message about permissions, this means that something went wrong. If you’re running Linux, make sure that your user is in the Docker group by running:
groups $USER
you should see your username and a list of groups that your user belongs to. If a group called docker is not listed, then you should add yourself to the group by following these steps.
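For reference, on most Linux distributions the steps boil down to the following; this is a sketch, and the authoritative procedure is in Docker’s post-installation documentation:

```bash
# Add your user to the docker group (created during installation)
sudo usermod -aG docker "$USER"
# Log out and back in, or activate the new group in the current shell
newgrp docker
# Verify that Docker now works without sudo
docker run --rm hello-world
```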
7.2.2 The Rocker Project and image registries
When running a command like:
docker run --rm hello-world
what happens is that an image, in this case hello-world, gets pulled from a so-called registry. A registry is a storage and distribution system for Docker images. Think of it as a GitHub for Docker images, where you can push and pull images, much like you would with code repositories. The default public registry that Docker uses is called Docker Hub, but companies can also host their own private registries to store proprietary images. When you execute a command like docker run, the Docker daemon first checks if the image is present on your local machine. If not, it connects to the configured registry, downloads the required image layers, and then assembles them to run the container.
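For instance, the short name hello-world is an alias; spelled out in full, the same pull from Docker Hub looks like this, which makes the registry/namespace/repository:tag naming convention visible:

```bash
# docker.io is the registry, library the namespace, hello-world the repository
docker pull docker.io/library/hello-world:latest
```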
Many open source projects build and distribute Docker images through Docker Hub, for example the Rocker Project.
The Rocker Project is instrumental for R users who want to use Docker. The project provides a large list of images that are ready to run with a single command. As an illustration, open a terminal and paste the following line:
docker run --rm -e PASSWORD=yourpassword -p 8787:8787 rocker/rstudio
Once the image has been pulled and the container is running, go to http://localhost:8787/ and enter rstudio as the username and yourpassword as the password. You should be logged into an RStudio instance: this is the web interface of RStudio that allows you to work with R from a server. In this case, the server is the Docker container running the image. Yes, you’ve just pulled a Docker image containing Ubuntu with a fully working installation of RStudio web!
(If you cannot connect to http://localhost:8787, try with the following command:
docker run --rm -ti -d -e PASSWORD=yourpassword -p 8787:8787 --network="host" rocker/rstudio
)
Let’s open a new script and run the following lines:
data(mtcars)
summary(mtcars)
You can now stop the container (by pressing CTRL-C in the terminal). Let’s now rerun the container (with the same command as before)… you should realize that your script is gone! This is the first lesson: whatever you do inside a container will disappear once the container is stopped. This also means that if you install the R packages that you need while the container is running, you will need to reinstall them every time. Thankfully, the Rocker Project provides a list of images with many packages already available. For example, to run R with the {tidyverse} collection of packages already pre-installed, run the following command:
docker run --rm -ti -e PASSWORD=yourpassword -p 8787:8787 rocker/tidyverse
If you compare it to the previous command, you see that we have replaced rstudio with tidyverse. This is because rocker/tidyverse references an image, hosted on Docker Hub, that provides the latest version of R, RStudio server and the packages from the {tidyverse}. You can find the image hosted on Docker Hub here. There are many different images, and we will be using the versioned images made specifically for reproducibility. For now, however, let’s stick with the tidyverse image, and let’s learn a bit more about some specifics.
7.2.3 Basic Docker workflow
You already know about running containers using docker run. With the commands we ran before, your terminal needs to stay open, or else the container stops. Starting now, we will run Docker commands in the background. For this, we will use the -d flag (d as in detach), so let’s stop the container one last time with CTRL-C and rerun it using:
docker run --rm -d -e PASSWORD=yourpassword -p 8787:8787 rocker/tidyverse
(notice -d just after run). You can run several containers in the background simultaneously. You can list running containers with docker ps:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c956fbeeebcb rocker/tidyverse "/init" 3 minutes ago Up 3 minutes 0.0.0.0:8787->8787/tcp, :::8787->8787/tcp elastic_morse
The running container has the ID c956fbeeebcb. The very last column shows the name of the running container; this is a label that you can change. For now, take note of the ID, because we are going to stop the container:
docker stop c956fbeeebcb
After Docker is done stopping the running container, you can check the running containers using docker ps again, but this time no containers should get listed. Let’s also discuss the other flags: --rm, -e and -p. --rm removes the container once it’s stopped. Without this flag, we could restart the container and all the data and preferences we saved would be restored. However, this is dangerous, because if the container gets removed, then everything is lost, forever. We are going to learn how to deal with that later. -e allows you to provide environment variables to the container, in this case the $PASSWORD variable. -p maps a port on your host machine to a port inside the container (in host:container order); this is where your app gets served. Let’s now rerun the container, but this time giving it a name:
docker run -d --name my_r --rm -e PASSWORD=yourpassword -p 8787:8787 rocker/tidyverse
Notice the --name flag followed by the name we want to use, my_r. We can now interact with this container using its name instead of its ID. For example, let’s open an interactive bash session. Run the following command:
docker exec -ti my_r bash
You are now inside a terminal session, inside the running container! This can be useful for debugging purposes. It’s also possible to start R in the terminal: simply replace bash with R in the command above.
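Another handy command when debugging is docker logs, which prints everything the container has written to standard output:

```bash
docker logs my_r
```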
Finally, let’s solve the issue of our scripts disappearing. For this, create a folder somewhere on your computer (host). Then, rerun the container, but this time with this command:
docker run -d --name my_r --rm -e PASSWORD=yourpassword -p 8787:8787 -v /path/to/your/local/folder:/home/rstudio/scripts:rw rocker/tidyverse
where /path/to/your/local/folder should be replaced with the path to the folder you created. You should now be able to save your scripts inside the scripts/ folder from RStudio, and they will appear in the folder you created.
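To convince yourself that the two folders really are the same, create a file on the host and list the folder from inside the container (this assumes the container from the command above is still running under the name my_r):

```bash
# On the host: drop a script into the shared folder
echo 'print("hello")' > /path/to/your/local/folder/hello.R
# Inside the container: the file is immediately visible
docker exec my_r ls /home/rstudio/scripts
```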
7.2.4 Making our own images
To create your own images, you can start from an image provided by an open source project like Rocker, or you can start from the base Ubuntu or Alpine Linux images. These images are barebones compared to the ones from Rocker, but as a consequence they are very lightweight, which in some cases can be important. For the remainder of the course, we are going to start from a base Ubuntu image, and use Nix to add our software stack.
The snippet below is a minimal Dockerfile that shows exactly this:
FROM ubuntu:latest
RUN apt update -y
RUN apt install curl -y
# We don't have R nor {rix} in this image, so we can bootstrap it by downloading
# the default.nix file that comes with {rix}. You can also download it beforehand
# and then copy it to the Docker image
RUN curl -O https://raw.githubusercontent.com/ropensci/rix/main/inst/extdata/default.nix
# The next 4 lines install Nix inside Docker. See the Determinate Systems installer's documentation
RUN curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install linux \
--extra-conf "sandbox = false" \
--init none \
--no-confirm
# Adds Nix to the path, as described by the Determinate Systems installer's documentation
ENV PATH="${PATH}:/nix/var/nix/profiles/default/bin"
ENV user=root
# Set up rstats-on-nix cache
# Thanks to the rstats-on-nix cache, precompiled binary packages will
# be downloaded instead of being compiled from source
RUN mkdir -p /root/.config/nix && \
echo "substituters = https://cache.nixos.org https://rstats-on-nix.cachix.org" > /root/.config/nix/nix.conf && \
echo "trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= rstats-on-nix.cachix.org-1:vdiiVgocg6WeJrODIqdprZRUrhi1JzhBnXv7aWI6+F0=" >> /root/.config/nix/nix.conf
# Copy a script to generate the environment of interest using {rix}
COPY generate_env.R .
# This will overwrite the default.nix we downloaded previously with a new
# expression generated from running `generate_env.R`
RUN nix-shell --run "Rscript generate_env.R"
# We now build the environment
RUN nix-build
# Finally, we run `nix-shell`. This will get executed when running
# containers from this image. You can of course put anything in here
CMD nix-shell
This can seem quite complicated, but if you take the time to read the comments, you’ll see that it’s actually quite simple.
Every Dockerfile starts with a FROM statement, which declares the image used as a starting point; here, that is ubuntu:latest. You might read online that this is not good practice, and that instead one should use a stable tag, for example ubuntu:24.04, which will always use version 24.04 of Ubuntu. This is true IF you don’t use Nix. But since we are using Nix to set up the reproducible development environment, we can use ubuntu:latest: our development environment will always be exactly the same, thanks to Nix.
Then, every command we wish to run starts with a RUN statement. We install and configure Nix, copy an R script to generate the environment (we could also copy an already generated default.nix instead) and then build the environment. Finally, we finish by running nix-shell when executing a container, which is the command prepended with CMD.
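Assuming this Dockerfile and a generate_env.R script sit in the current directory, you would build and try the image like this (nix-in-docker is an arbitrary tag I chose):

```bash
# Build the image; the first build takes a while
docker build -t nix-in-docker .
# Start a container; the CMD nix-shell drops you into the environment
docker run -ti --rm nix-in-docker
```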
This image actually does two things:
- a first step which consists in setting up Nix inside Docker;
- a second step which consists in setting up our project-specific Nix development environment.
Because the first step is generic, we will split this up into two stages.
First, create a new Dockerfile in a separate directory, with a new Git repo so that you can commit and push it (later in the book we will set up continuous integration to build and publish this image automatically):
# Stage 1 — Base with Nix and rstats-on-nix cache
FROM ubuntu:latest AS nix-base
RUN apt update -y && apt install -y curl
# Install Nix via Determinate Systems installer
RUN curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install linux \
--extra-conf "sandbox = false" \
--init none \
--no-confirm
ENV PATH="/nix/var/nix/profiles/default/bin:${PATH}"
ENV user=root
# Configure Nix binary cache
RUN mkdir -p /root/.config/nix && \
echo "substituters = https://cache.nixos.org https://rstats-on-nix.cachix.org" > /root/.config/nix/nix.conf && \
echo "trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= rstats-on-nix.cachix.org-1:vdiiVgocg6WeJrODIqdprZRUrhi1JzhBnXv7aWI6+F0=" >> /root/.config/nix/nix.conf
Commit and push. Then, we need to build this image once, and tag it:
docker build -t nix-base:latest .
This image is now available on our machines under the tag nix-base:latest, and we can refer to it for any of our projects. For a new project, simply reuse it like so:
FROM nix-base:latest
COPY generate_env.R .
RUN curl -O https://raw.githubusercontent.com/ropensci/rix/main/inst/extdata/default.nix
RUN nix-shell --run "Rscript generate_env.R"
RUN nix-build
CMD ["nix-shell"]
The issue with this approach is that you have now created a dependency between the two Dockerfiles, which you need to manage. I would recommend this two-stage approach only if you can push the first image, with the Nix base, to a registry (either a public one, or a private one from your company). Later in this chapter we will publish that first image.
In the same folder as the second Dockerfile, add the required generate_env.R script:
library(rix)

rix(
  date = "2025-08-04",
  r_pkgs = c("dplyr", "ggplot2"),
  py_conf = list(
    py_version = "3.13",
    py_pkgs = c("polars", "great-tables")
  ),
  ide = "none",
  project_path = ".",
  overwrite = TRUE
)
This will set up an environment for our project. Let’s stop here, and build the image:
docker build -t my-project .
and now run a container:
docker run -it --rm --name my-project-container my-project
This should drop you in an interactive Nix shell running inside Docker! As Docker is more popular than Nix, in particular in enterprise settings, this makes sharing development environments easier.
Remember, anything you do in this container will be lost after you stop it. So if you want to use it to work interactively on files, you should mount a volume:
docker run -it --rm --name my-project-container -v /path/to/your/local/project-folder/workspace:/workspace:rw -w /workspace my-project
This mounts the workspace folder of your project on the host to a folder called /workspace inside the running container (the -it flag keeps the Nix shell interactive). The -w /workspace flag sets the working directory inside the container to /workspace, so any commands you run will execute from there. This acts as a kind of tunnel between the two; any file put there will be available and editable on the other side.
While this is good to know, I don’t recommend using Docker to work interactively. Use Nix for this instead, and use Docker to then deploy whatever product you’ve been working on once you’re done.
Before moving on to actually build projects using Docker, let’s first publish the base Nix image on Docker Hub to easily re-use it across projects.
7.2.5 Publishing images on Docker Hub
If you want to share Docker images through Docker Hub, you first need to create a free account. A free account gives you unlimited public repositories. If you want to make your images private, you need a paid account. For our purposes though, a free account is more than enough. In the next section, we will discuss how you can build new images upon other images without using Docker Hub.
We will be uploading the image nix-base to Docker Hub.
Now is the right moment to talk about the docker images command, which lists all the images available on your computer. You should see something like this:
REPOSITORY TAG IMAGE ID CREATED SIZE
nix-base latest d3764d067534 2 days ago 1.61GB
dev_env_r latest 92fcf973ba42 2 days ago 1.42GB
raps_ubuntu_r latest 7dabadf3c7ee 4 days ago 1.04GB
rocker/tidyverse 4.2.2 545e4538a28a 3 weeks ago 2.19GB
rocker/r-ver 4.2.2 08942f81ec9c 3 weeks ago 824MB
Take note of the image id of the nix-base image (second line); we will use it to push our image to Docker Hub. Also, don’t be alarmed by the size of the images, because this is a bit misleading: different images that use the same base (so here, Ubuntu) reuse “layers”, such that they don’t actually take up the full size printed by docker images. So if images A and B both use the same version of Ubuntu as a base, but image A has RStudio installed and B has RStudio and Python as well, most of the space that A and B take up will be shared. The only difference will be that B needs a little bit more space for Python.
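You can see these layers for yourself with docker history, which lists each layer of an image along with its size:

```bash
docker history nix-base:latest
```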
You can also list the running containers with docker container ls (or docker ps). If a container is running, you should see something like this:
CONTAINER ID IMAGE COMMAND CREATED
545e4538a28a rocker/tidyverse "/init" 3 minutes ago
STATUS PORTS NAMES
Up 3 minutes 0.0.0.0:8787->8787/tcp, :::8787->8787/tcp elastic_morse
You can stop the container by running docker stop CONTAINER_ID. So, list the images again using docker images and take note of the image id of the image you want to push to Docker Hub.
Now, log in to Docker Hub using docker login (yes, from your terminal). You will be asked for your credentials, and if the login is successful, you will see the message Login Succeeded in your terminal (of course, you first need an account on Docker Hub).
Now, you need to tag the image (this gives it a version number). You would write something like:
docker tag IMAGE_ID your_username_on_docker_hub/your_image:version1
so in my case, with the nix-base image id from the listing above, it would be:
docker tag d3764d067534 brodriguesco/nix-base:latest
Next, I need to push it using docker push:
docker push brodriguesco/nix-base:latest
You can go check your profile and your repositories, you should see your image there.
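From now on, anyone (including you, on another machine) can pull it:

```bash
docker pull brodriguesco/nix-base:latest
```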
This image can now be used as a stable base for developing our pipelines. Here’s how I can now use this base image for our project:
FROM brodriguesco/nix-base:latest
RUN mkdir ...
Now I’m re-using the image that defines the development environment, and I can do so for as many projects as necessary. I would recommend putting a link to the base image as a comment just before the first FROM.
If you want to test this, you could delete all images and containers from your system. This way, when you build the image using the above Dockerfile, it will have to pull from Docker Hub. To delete all containers, start by using docker system prune. You can then delete all images using docker rmi $(docker images -a -q). This should remove everything.
If you work for a company that has its own private registry, the process is essentially the same; Docker will simply have been configured to pull from and push to the private registry instead.
In the next section, I’ll explain to you how you can re-use base images like we just did, but without using Docker Hub, in case you cannot, or do not want, to rely on it.
7.2.7 What if you don’t use Nix?
Using Nix inside of Docker makes it very easy to set up an environment, but what if you can’t use Nix for some reason? In that case, you would need to use other tools to install the right R or Python packages to build your Docker image, and it is likely going to be more difficult. The main issue you will face is missing development libraries that prevent R or Python packages from installing successfully, so you first need to install the right development library: for example, to install and use the R {stringr} package, you first need to install libicu-dev. Below is an example of how this may end up looking:
FROM rocker/r-ver:4.5.1
RUN apt-get update && apt-get install -y \
    libicu-dev \
    libglpk-dev \
    libxml2-dev \
    libcairo2-dev \
    libgit2-dev \
    default-libmysqlclient-dev \
    libpq-dev \
    libsasl2-dev \
    libsqlite3-dev \
    libssh2-1-dev \
    libxtst6 \
    libcurl4-openssl-dev \
    libharfbuzz-dev \
    libfribidi-dev \
    libfreetype6-dev \
    libpng-dev \
    libtiff5-dev \
    libjpeg-dev \
    unixodbc-dev wget
One way to avoid having to hunt down these development libraries yourself is to configure a tool that resolves system requirements for you; {pak}, for example, can install the system libraries an R package needs automatically on supported Linux distributions.
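As a minimal sketch of what this could look like, assuming {pak}’s automatic system requirements installation (check its documentation before relying on this):

```dockerfile
FROM rocker/r-ver:4.5.1
# {pak} detects the system libraries an R package needs and, when running
# as root on supported distributions, installs them before the R package
RUN R -e 'install.packages("pak"); pak::pkg_install("stringr")'
```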
Another issue you will face is that building the image is not a reproducible process; only running containers is. To mitigate this issue you can use tagged images (like in the example above) or, better yet, use a digest, which you can find on Docker Hub:
FROM rocker/r-ver@sha256:1dbe7a6718b7bd8630addc45a32731624fb7b7ffa08c0b5b91959b0dbf7ba88e
This will always pull exactly the same layers. However, this does not completely solve everything. At some point, the version of Ubuntu that you are using will be outdated, and it won’t be able to download anything from the repositories anymore. At that point, if you still need that image, you either need to store and keep it, or you will need to start using a newer image, and potentially have to update your code as well. Using Nix, you can stay on ubuntu:latest.
To summarise: if you can’t use Nix inside of Docker, you will have to deal with the same issues you face when trying to set up environments directly on your computer.
7.3 Dockerizing a {rixpress} Pipeline
In the previous chapter, we learned how to build a fully reproducible, polyglot pipeline using Nix and {rixpress}. This workflow is perfect for development, ensuring that every run is bit-for-bit identical. However, what if you need to share your final data product with a collaborator or deploy it to a server where installing Nix is not an option?
This is the ideal use case for Docker. We can package our entire {rixpress} project (the Nix environment definition, the pipeline logic, and all source files) into a single Docker image. This image can then be run by anyone with Docker installed, regardless of their host operating system or whether they have Nix. The user doesn’t build the pipeline; they run the container, and the pre-built results are extracted. If they wish, they can “log in” to an interactive session to access the environment within a running Docker container and “play around” with the data and code.
Let’s take the {rixpress} project we created in the last chapter and dockerize it. Assume your project directory has the following structure:
.
├── data/
│ └── mtcars.csv
├── gen-env.R
├── gen-pipeline.R
├── functions.R
├── functions.py
├── default.nix
└── pipeline.nix
7.3.1 Step 1: The Dockerfile
Create a new file named Dockerfile in your project’s root directory. We will use the nix-base image we built earlier as our foundation.
# Use the base image with Nix pre-installed.
# If you built the image locally, you can use its local tag:
FROM nix-base:latest
# If you pushed it to Docker Hub, you (or a collaborator) can use that.
# This is the more portable and recommended approach for sharing.
# FROM your-username/nix-base:latest
# Optionally, set a working directory inside the container
WORKDIR /app
# Copy all your project files into the image's working directory
COPY . .
# Build the pipeline
# This single command leverages the 'pipeline.nix' file.
# Nix will first build the environment defined in 'default.nix',
# then it will execute the pipeline defined in 'pipeline.nix'.
# The results are stored immutably in the /nix/store inside the image.
# Instead, you can also only copy the R files, and regenerate the
# pipeline.nix file during the build process
RUN nix-build pipeline.nix
# The CMD will define what happens when the container is run.
# We will create a small R script to export the final artifact
# from the Nix store to a mounted volume.
COPY export-results.R .
CMD ["Rscript", "export-results.R"]
This Dockerfile is elegant and concise because it delegates the heavy lifting of environment and pipeline management to Nix, which is its specialty.
7.3.2 Step 2: The Export Script
The RUN nix-build command has already executed our entire pipeline during the image build process. This is ideal when, as in our case, the data can be bundled with the image: the final artifacts, like our mtcars_head data frame, are stored in the Nix store within the image, so the user running the container doesn’t need to re-run the pipeline; they just need the output. (Note: if your data is too large or too sensitive to be included in the image, an alternative approach is to remove the RUN nix-build step from the Dockerfile and instead execute the pipeline at runtime, using a mounted volume to provide the input data.)
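A minimal sketch of that runtime alternative could look like this; this is an assumption on my part rather than a pattern from this chapter, and you would adapt the mount points and the final command to your project:

```dockerfile
FROM nix-base:latest
WORKDIR /app
# Copy the project files; use a .dockerignore file to exclude the data folder
COPY . .
# No RUN nix-build here: the pipeline is built when the container starts,
# reading its inputs from a volume mounted at runtime
CMD nix-build pipeline.nix && nix-shell --run "Rscript export-results.R"
```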
Create a small R script named export-results.R to extract these results:
# export-results.R
# This script runs inside the container after the pipeline has been built.

# Ensure {rixpress} is available to find the results.
# The environment is activated via the Nix build process.
library(rixpress)
library(jsonlite)

# Define the directory where Docker will mount the output volume.
# This path MUST match the target path in the `docker run -v` command.
output_dir <- "/output"
dir.create(output_dir, showWarnings = FALSE)

# Read the final artifact from the completed pipeline
message("Reading target 'mtcars_head'...")
final_data <- rxp_read("mtcars_head")

# Save the final data to the mounted directory in a universal format.
# You could also save the data in a `.csv` file.
output_path <- file.path(output_dir, "mtcars_analysis_result.json")
write_json(final_data, output_path, pretty = TRUE)

message(paste("Successfully exported result to", output_path))
7.3.3 Step 3: Build and Run
With the Dockerfile and export-results.R in place, you can now build your self-contained data product.
Build the Docker image: open a terminal in your project directory and run:
docker build -t my-reproducible-pipeline .
Run the container to get the results: now anyone can get the result of your analysis with a single command. We will create a local output folder and mount it into the container. (In case you couldn’t bundle the raw data into the image, this is also how you would provide the data at run time; the pipeline would only be executed then.)
# Create a directory on your host machine to receive the output
mkdir -p ./output
# Run the container
docker run --rm --name my_pipeline_run \
  -v "$(pwd)/output":/output \
  my-reproducible-pipeline
After the container runs and exits, check your local output directory. You will find the mtcars_analysis_result.json file, containing the exact, reproducible result of your pipeline. The docker run command automatically executed the CMD ["Rscript", "export-results.R"] we defined in our Dockerfile, which extracted the pre-built artifact.
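You can inspect the result directly from your terminal:

```bash
cat ./output/mtcars_analysis_result.json
```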
You have successfully packaged a complex, polyglot pipeline into a simple, portable Docker image. This workflow combines the best of both worlds: Nix’s unparalleled power for creating reproducible builds and Docker’s universal standard for distributing and running applications.
7.4 Further reading
- https://www.statworx.com/content-hub/blog/wie-du-ein-r-skript-in-docker-ausfuehrst/ (in German, English translation: https://www.r-bloggers.com/2019/02/running-your-r-script-in-docker/)
- https://colinfay.me/docker-r-reproducibility/
- https://jsta.github.io/r-docker-tutorial/
- http://haines-lab.com/post/2022-01-23-automating-computational-reproducibility-with-r-using-renv-docker-and-github-actions/
7.5 Hands-on Exercises
7.5.1 Level 1: Foundational Skills
These exercises focus on mastering the basic Docker commands and concepts.
Exercise 1: The Ephemeral Container and the Persistent Volume
The goal of this exercise is to solidify your understanding of how containers are ephemeral and how volumes provide persistence.
- Run the rocker/tidyverse image in detached mode (-d), giving it a name (e.g., my-rstudio). Publish the port 8787 and set a password.
- Connect to the RStudio instance in your browser. Create a new R script named test_script.R and save it in the default home directory (/home/rstudio). Inside the script, write a simple R command like message("Hello from inside the container!").
- Stop and remove the container using docker stop and docker rm (or just use --rm from the start).
- Run a new container with the same command. Connect to RStudio again. Is your test_script.R still there? (It shouldn’t be.)
- Now, create a folder on your local machine called r_projects.
- Run the rocker/tidyverse container again, but this time, mount your r_projects folder as a volume to /home/rstudio/projects inside the container. The flag should look something like -v "$(pwd)/r_projects":/home/rstudio/projects.
- Connect to RStudio, navigate to the projects folder, and create your test_script.R there.
- Stop and remove the container. Check your local r_projects folder. Is the script there? (It should be.) This demonstrates how volumes link your host machine to the container’s filesystem.
Exercise 2: The Container Inspector
This exercise is designed to get you comfortable with interacting with a running container from your terminal.
- Run a basic ubuntu:latest container in detached mode (-d) and give it a name like my-ubuntu-box. Use the command sleep 3600 to keep it running for an hour.
  - Hint: The full command would be docker run -d --name my-ubuntu-box ubuntu:latest sleep 3600.
- Use docker ps to verify that your container is running.
- Use docker exec to get an interactive bash shell inside the my-ubuntu-box container.
- Once inside, run the following Linux commands: ls -la, pwd, whoami, and cat /etc/os-release. What do you observe?
- Still inside the container, use apt-get update && apt-get install -y fortunes to install a fun package. Run the fortune command.
- Exit the container’s shell (type exit). Is the container still running? (It should be.)
- Stop the container using docker stop my-ubuntu-box.
7.5.2 Level 2: Building and Distributing Images
These exercises focus on creating your own Dockerfile and sharing your work.
Exercise 3: Your First Custom R Image
Create a Dockerfile that builds a simple, non-Nix R image.
- Create a new project folder. Inside, create a Dockerfile.
- In the Dockerfile, start from rocker/r-ver:4.5.1 (or any other versioned tag).
- Add a RUN command to install the R package {cowsay} from CRAN.
- Create an R script named run.R in the same folder. The script should contain:
  library(cowsay)
  say("Moo-ving to Docker!", by = "cow")
- In your Dockerfile, add a COPY command to copy run.R into the image (e.g., to /home/run.R).
- Set the CMD to execute your script using Rscript.
- Build the image with the tag my-cowsay-app.
- Run a container from your new image. You should see the cow’s message printed to your terminal.
Exercise 4: Publish Your Nix Base
Take the nix-base image you created in the chapter and practice the full distribution workflow.
- If you haven’t already, build the nix-base image locally.
- Create a free account on Docker Hub.
- Log in to Docker Hub from your terminal using docker login.
- Tag your nix-base:latest image with your Docker Hub username, e.g., docker tag nix-base:latest your-username/nix-base:1.0.
- Push the image to Docker Hub using docker push.
- To test that it works, remove your local copies of the image: docker rmi your-username/nix-base:1.0 and docker rmi nix-base:latest.
- Create a simple Dockerfile that starts with FROM your-username/nix-base:1.0. When you build this Dockerfile, Docker should pull the image you just pushed from Docker Hub.
7.5.3 Level 3: The Capstone Project
This exercise integrates all the concepts from the chapter into a complete, reproducible data product.
Exercise 5: Package Your Own {rixpress} Pipeline
Take a {rixpress} pipeline (you can use the one from the previous chapter’s exercises or create a new one) and package it into a distributable Docker image.
- Your project should contain all the necessary files: gen-env.R, gen-pipeline.R, your default.nix and pipeline.nix, and any data/function scripts.
- Create a Dockerfile that uses your published nix-base image from Exercise 4 as its FROM source.
- The Dockerfile should:
  - Set a working directory (e.g., /app).
  - COPY all your project files into the image.
  - RUN the nix-build pipeline.nix command to execute the pipeline during the build process.
  - Include an export-results.R script (you’ll need to write this) that saves one or more of your final pipeline artifacts to an /output directory.
  - Set the CMD to run your export-results.R script.
- Build the final image with a descriptive tag (e.g., my-final-analysis:latest).
- Run the image, mounting a local output folder to the container’s /output folder.
- Verify that the final results (e.g., a plot, a CSV, or a JSON file) appear in your local output folder. Congratulations, you’ve created a fully portable and reproducible data product!
Extra Challenge: Modify your solution from Exercise 5 for a “big data” scenario where the input data cannot be included in the image.
- Your Dockerfile should not copy the data and should not run nix-build at build time.
- Instead, the CMD should execute the entire pipeline at runtime. You will need to figure out how to pass the paths for input data and output results to your script.
- The docker run command will now need to mount two volumes: one for the input data and one for the output results.