Nicolas

Recently, I wanted to write a blog post about simple mapping in R. It required some data (a shapefile), some spatial libraries and the R packages that use those libraries.

Unfortunately, the blogdown image I was using at the time didn’t provide any of that. It was the generic one from the blogdown guide.

My first reaction was to provide an already rendered HTML file. But the build was still cancelled and the HTML file was not deployed.

Then I started installing those libraries in the runner. The build was already slow, since it installs R and its dependencies before doing anything else. So I decided to create my own image, (almost) from scratch.

Following recommendations from Sébastien Rochette (ThinkR) on how to set up a fresh Ubuntu 18.04 + R 3.5, I wrote a Dockerfile, then built a Docker image to upload.

Here are the steps to follow:

  • Create the Dockerfile
  • Build the Docker image
  • Push the Docker image to the Docker repository
  • Call the Docker image in .gitlab-ci.yml

Create Dockerfile

What is the Dockerfile

The Dockerfile (no file extension, capital D) is the recipe Docker follows to build your images.

It is the template of the image. Let’s go through some Docker terminology for those who are not used to it.

In Docker you have 2 main things: images and containers. Images are something like ready-to-use mini virtual machines: a small operating system and some applications (and sometimes data). You can’t use an image directly, but you can start containers from it. A container is a running instance of an image. Each image exists only once per name/version, but you can create as many containers from it as you want (or as your computer can handle).

A nice thing to know is that images are built like onions: you start from a root and then you add layers around it. The root can also contain several layers. Each time you change a layer, that layer and all the following ones are rebuilt. Earlier layers are not.

Why a custom Docker image

The Rocker project provides several Docker images customized for R computing. They come with a lot of things; there is a geospatial one, but without blogdown. Plus, it is quite heavy, as it has the whole tidyverse and a lot of geospatial libraries that I have no use for. So I went for a custom one. Big plus: I get to practice Docker that way.

So, what is in the Dockerfile?

Dockerfile I use for my custom geospatial blogdown

FROM ubuntu:18.04
MAINTAINER bakaniko
## Based on https://rtask.thinkr.fr/blog/installation-of-r-3-5-on-ubuntu-18-04-lts-and-tips-for-spatial-packages/

# CONFIGURE TIMEZONE
ENV TZ=Europe/Paris
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Utilities
RUN apt-get update && apt-get install -y gnupg ca-certificates pandoc

# Enable UBUNTU GIS repository
RUN echo  'deb http://ppa.launchpad.net/ubuntugis/ubuntugis-unstable/ubuntu bionic main' >> /etc/apt/sources.list.d/ubuntugis.sources.list
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 314DF160

# Add R 3.5 repository
RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/' >> /etc/apt/sources.list.d/cran35.sources.list
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9

RUN apt-get update
# -y answers the confirmation prompt, since the build has no interactive terminal
RUN apt-get -y upgrade

# Install GEOSPATIAL UBUNTU PACKAGES
RUN apt-get -y install libgdal-dev libproj-dev libgeos-dev libudunits2-dev libv8-dev libcairo2-dev libnetcdf-dev

# INSTALL R
RUN apt-get -y install r-base r-base-core r-recommended

# Install R packages
## Blogdown
RUN R -e "install.packages('blogdown', repos='https://cran.rstudio.com/')"
RUN R -e "blogdown::install_hugo()"

## Rspatial packages
RUN R -e "install.packages(c('sf','rgeos', 'ggplot2', 'spData', 'cartography'), repos='https://cran.rstudio.com/')"

So the Dockerfile is quite simple:

  • It is based on an existing image of Ubuntu 18.04
  • It is maintained by me
  • It adds the Ubuntu GIS and CRAN repositories
  • It updates and upgrades everything
  • It installs the geospatial libraries for Ubuntu
  • It installs R and the recommended packages
  • It installs blogdown and Hugo to render the site within the image
  • It installs the geospatial packages for R

What are the FROM, ENV, RUN instructions?

These are instructions for the build tool. They are executed sequentially and each instruction creates a layer (think of a Docker image as an onion where each layer wraps the previous ones).

If I change the RUN instruction for the Ubuntu geospatial packages (by adding a new package, for example), then all the following instructions will be re-executed (so the whole R installation too). If I only change the Rspatial line, then only that last layer will be rebuilt, and the builder will start from the cached layers with blogdown and Hugo installed.
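
To see this layering on an image that has already been built, `docker history` lists one line per layer along with the instruction that created it. This is a minimal sketch, assuming the image exists locally under the tag bakaniko/geoblogr:latest (the tag used in the build step further down); show_layers is a helper name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the size and the creating instruction of each layer.
show_layers() {
  docker history --format "{{.Size}}: {{.CreatedBy}}" "$1"
}

IMAGE="bakaniko/geoblogr:latest"  # assumption: the image was built under this tag

# Only query Docker when the CLI is installed and the image exists locally.
if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
  show_layers "$IMAGE"
else
  echo "Image $IMAGE not found locally; build it first."
fi
```

The most recent layers come first in the output, so the Rspatial line should appear at the top and the Ubuntu base layers at the bottom.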

FROM

This instruction tells Docker which base image to use. It has to be available through a registry (in my case, Docker Hub). For this image I use the classic Ubuntu image in its 18.04 version.

MAINTAINER

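It is the name of the image’s maintainer. Recent Docker versions mark MAINTAINER as deprecated and recommend LABEL maintainer="bakaniko" instead, but the instruction still works.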

ENV

When you want to set environment variables within the image, you can use the ENV instruction. In this example, I set the timezone to Paris. The KEY=value syntax looks like a variable assignment in bash, and the variable can then be used in later instructions (as $TZ in the RUN line that follows it).
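
In the Dockerfile above, TZ is consumed by the RUN line right after it. What that line does can be tried outside Docker. This is a minimal sketch that replays it in a throwaway directory instead of the real /etc; the scratch directory and the root variable are assumptions made for the illustration.

```shell
#!/bin/sh
# Same value as the ENV instruction in the Dockerfile.
TZ="Europe/Paris"

# Scratch directory standing in for the image's root filesystem (assumption).
root="$(mktemp -d)"
mkdir -p "$root/etc"

# -s: create a symbolic link; -f: overwrite an existing destination;
# -n: if the destination is already a symlink, replace it rather than follow it.
# The link target does not need to exist yet.
ln -snf "/usr/share/zoneinfo/$TZ" "$root/etc/localtime"
echo "$TZ" > "$root/etc/timezone"

readlink "$root/etc/localtime"   # prints /usr/share/zoneinfo/Europe/Paris
cat "$root/etc/timezone"         # prints Europe/Paris
```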

RUN

This is the most used instruction. It tells Docker to perform some action, like updating package sources, installing software or executing a script. The RUN instruction takes shell commands, run by default with /bin/sh.

You can use any command that would work in your base image. Here I use an Ubuntu image, so I can install packages from the universe or multiverse Ubuntu repositories or from a PPA (Personal Package Archive).

There are other instructions that I didn’t use; please read the Docker documentation for more information.

Build the image

I use Ubuntu MATE as my daily driver, so here I’ll provide commands for the bash console. If you are using Microsoft products, please check the Docker website.

I move to the folder containing the Dockerfile and run the following command:

docker build -f Dockerfile -t bakaniko/geoblogr:0.2 -t bakaniko/geoblogr:latest .

So I tell Docker to build an image following the recipe in the file called Dockerfile (the -f flag), using the current folder (the final dot) as the build context, and to name it geoblogr. The name is prefixed with the maintainer’s account name (bakaniko here).

As you can see, I use the -t flag twice. The image is built once but gets 2 tags:

  • 0.2, the version number, which you can increment
  • latest, which I move to each new build so people can ask for latest and get the most recent one

Easy peasy.

Since it downloads and installs everything, this step can take some time. The image is built locally.
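
When the version number changes often, both tags can be derived from a single variable. This is a minimal sketch, assuming the image name from this post; IMAGE, VERSION and build_cmd are names chosen for the example, and the command is only printed, not executed.

```shell
#!/bin/sh
IMAGE="bakaniko/geoblogr"
VERSION="0.2"   # increment for each new build

# The trailing "." is the build context: the folder holding the Dockerfile.
build_cmd="docker build -f Dockerfile -t ${IMAGE}:${VERSION} -t ${IMAGE}:latest ."

# Print the command for review before running it by hand.
echo "$build_cmd"
```

Printing first makes it easy to check the tags before starting a long build.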

See the image

At the end of the build, you may want to check it.

In docker repo

docker image ls 

This command lists all the images you have locally in your Docker setup. You should see the image listed twice, once per tag: 0.2 and latest.

Try the image locally

You can also open a bash prompt in a container started from the image to test it:

docker run -it --entrypoint /bin/bash bakaniko/geoblogr:latest 
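
The same check can be done without an interactive session by asking R, inside a throwaway container, to load the packages the blog needs. This is a minimal sketch, assuming Docker is installed and the image is available locally; check_image is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: start a temporary container (--rm removes it afterwards)
# and try to load the key R packages; a missing package makes Rscript fail.
check_image() {
  docker run --rm "$1" Rscript -e 'library(blogdown); library(sf)'
}

IMAGE="bakaniko/geoblogr:latest"

# Only run the check when the Docker CLI is installed and the image exists locally.
if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
  if check_image "$IMAGE"; then
    echo "packages load correctly"
  else
    echo "a package failed to load"
  fi
else
  echo "Image $IMAGE not found locally; build it first."
fi
```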

If everything is fine, you can push it to Docker Hub, where Docker images are publicly stored. If you want to use this image with Travis on GitHub or in a runner for GitLab CI like me, you need to make it available for download from Docker Hub.

The continuous integration tool will look there for the image and download it. It is also possible to point to a private registry, but that’s not the goal of this blog post.

Push the image to docker hub

First you have to log in to your Docker account locally in order to be able to push images. Then you push your image. It may take some time depending on the upload speed of your internet connection. Mine is bad: it took me an hour and a half.

You push one tag at a time. Push the numbered version first: since latest points to the same image as 0.2, the second upload will be really quick.

docker login # needs an account
docker push bakaniko/geoblogr:0.2 
docker push bakaniko/geoblogr:latest

Adapt .gitlab-ci.yml

Before

image: debian:buster-slim

before_script:
  - apt-get update && apt-get -y install pandoc r-base
  - R -e "install.packages('blogdown',repos='http://cran.rstudio.com')"
  - R -e "blogdown::install_hugo()"

pages:
  script:
    - R -e "blogdown::build_site()"
  artifacts:
    paths:
      - public
  only:
    - master

With the new image

image: bakaniko/geoblogr:latest

pages:
  script:
  - R -e "blogdown::build_site()"
  artifacts:
    paths:
    - public
  only:
  - master

As you can see, I changed the image line to my custom one and removed all the installation lines. Now it just builds and serves the site.

It is faster because the runner just downloads the image instead of installing everything on each build. It took me some time to create the image, but rendering the website is now much faster (from 6 to 7 minutes down to less than 30 seconds).

Reflection on security

You may argue that since I’m the sole maintainer of this image, it might become outdated quickly and lack the latest security updates from Ubuntu. That is absolutely true. But since it is a static site without databases, JS, PHP or things like that, it is harder to crack and use for something else, or to deface. And there is absolutely no user or visitor data to steal.

So I can keep using that image for the next 2 or 3 years without problems and rebuild it when needed.

What’s next

I’ll probably need to add Tidyverse packages at some point. I think I’ll do it one by one to keep the image as light as possible.

In this case, as it is a blog and past posts have to be rendered again, I can only add things to this image, never remove them.

As Nate Day suggested to me, rocker/geospatial is a good starting point for geospatial analysis with R and Docker. But it lacks blogdown and it is twice as big as my current image.