Nicolas

8 minute read

This post is based on a notebook I started about R spatial analysis for the project OSGeoLive

It aims to provide a quick introduction to R spatial analysis and cartography and will be extended.

R is a language dedicated to statistics and data analysis. It has also a lot of strong packages for spatial analysis. Recent packages like {sf} allows easy Simple Features manipulation.

This document aims to complete the R Overview and R Quickstart. If you don’t have read them yet, please consider doing it if you are new to R.

Please note that is post needs several spatial libraries (GEOS, GDAL), their dependencies (libuunits, etc) and some R wrapper. So I build a custom Docker image for the Gitlab-CI runner used to render this site. It is a mix from the Blogdown recommended .gitlab-ci.yml and Sébastien Rochette recommandations for installing R 3.5 on ubuntu 18.04.

Works pretty well. So let’s do some mapping.

Simple mapping

I was inspired by a QGIS example from its OSGeoLive quickstart.

I was also inspired by Sébastien Rochette (again !) blogpost on mapping with {sf}.

We want to represent the sudden infant death syndrome (SIDS) in North Carolina (USA) data from the {spData} package.

More information about the dataset here: nowosad.github.io/spData/reference/nc.sids.html.

Needed libaries

R comes with a lot of packages to provides functionnalities. Those functions are stocked in libraries and in order to call them, you need to load them beforehand. To do so, use the function library(< package name >).

This post use latest version (at the time of writing) of those packages:

  • {sf}: 0.6-3
  • {ggplot2}: 3.0.0
  • {spData}: 0.2.9.3

That is because ggplot2 version 3.0 compatibility with sf have been greatly improved and I like when things makes my life simple.

# install.packages(c('sf', 'ggplot2', 'spData'))
library('sf') #SimpleFeature Library to handle shapefiles
## Linking to GEOS 3.6.2, GDAL 2.2.3, proj.4 4.9.3
library('ggplot2') # Plotting library to create the maps

Please note that {sf} needs GEOS, GDAL and proj.4 installation on your system. If your are not familiar with those packages, they are probably not installed in your computer. So please consider {sf} requirements here.

Loading the data

We need to load the sids data that came from the sids.shp file. To do so, we’ll use the function sf::st_read().

In this example, the projection is not specified (shapefiles have an attached .proj file who contains the projection definition of the dataset, this file is not provided with this dataset). So I force it to the CRS (Coordinate Reference System) code 4326, which is corresponding to the WGS84 World Geodetic System use in GPS for example. It is commonly use when you see long/lat coordinates.

sids <- st_read(dsn = system.file("shapes/sids.shp", package = "spData"), crs = 4326 )
## Reading layer `sids' from data source `/usr/local/lib/R/site-library/spData/shapes/sids.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 22 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

Please note, that I read directly the shapefile, I didn’t load the {spData} package, because the sids dataset in there does not have geometry.

Following Edzer Pebesma tip, I use system.file("shapes/sids.shp", package = "spData") to get the data with a relative path. It is relative to a loaded package.

Let’s have quick show of the data 6 first rows:

head(sids)
## Simple feature collection with 6 features and 22 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##   CNTY_ID  AREA PERIMETER CNTY_        NAME  FIPS FIPSNO CRESS_ID BIR74
## 1    1825 0.114     1.442  1825        Ashe 37009  37009        5  1091
## 2    1827 0.061     1.231  1827   Alleghany 37005  37005        3   487
## 3    1828 0.143     1.630  1828       Surry 37171  37171       86  3188
## 4    1831 0.070     2.968  1831   Currituck 37053  37053       27   508
## 5    1832 0.153     2.206  1832 Northampton 37131  37131       66  1421
## 6    1833 0.097     1.670  1833    Hertford 37091  37091       46  1452
##   SID74 NWBIR74 BIR79 SID79 NWBIR79 east north      x       y       lon
## 1     1      10  1364     0      19  164   176 -81.67 4052.29 -81.48594
## 2     0      10   542     3      12  183   182 -50.06 4059.70 -81.14061
## 3     5     208  3616     6     260  204   174 -16.14 4043.76 -80.75312
## 4     1     123   830     2     145  461   182 406.01 4035.10 -76.04892
## 5     9    1066  1606     3    1197  385   176 281.10 4029.75 -77.44057
## 6     7     954  1838     5    1237  411   176 323.77 4028.10 -76.96474
##        lat L_id M_id                       geometry
## 1 36.43940    1    2 MULTIPOLYGON (((-81.47276 3...
## 2 36.52443    1    2 MULTIPOLYGON (((-81.23989 3...
## 3 36.40033    1    2 MULTIPOLYGON (((-80.45634 3...
## 4 36.45655    1    4 MULTIPOLYGON (((-76.00897 3...
## 5 36.38799    1    4 MULTIPOLYGON (((-77.21767 3...
## 6 36.38189    1    4 MULTIPOLYGON (((-76.74506 3...

You’ll note that there is a column called geometry. It is where the feature’s (spatial entities are called features, but for statisticians it is a record - but with a geometry) store its geometry. Each feature, in the example each county of North Carolina, has a geometry, here it is Polygons.

Please note that the Simple Features implementation is made of Points known in coordinates. LineString are made of Points at each summit who are connected by vertex. In the same way, Polygons summits are Points and the perimeter can be view as a LineString. Note that for the polygons, the first summit and the last summit needs to have exactly the same coordinates. That’s the way for the computer to know that it is a closed polygon and not an open LineString.

That is the basics of Simple Features.

Mapping

A basic Map

Let’s see what it looks :

ggplot(sids)+
  geom_sf()

Well, let’s change the projection for a more fitted one of North Carolina: NAD83 / North Carolina (ftUS) - ESPG code: 2264

sids <- sids %>% st_transform(2264)
ggplot(sids)+
  geom_sf() 

If we want to represent the rate of sids for the 1000 birth in the 1974 and 1978 period, we will use the data from the BIR74 and SID74 columns. In the quickstart, they represent counts with colors, as we don’t want to offence geographers, let’s use a ratio instead. So we want to create a new column with the ratio.

sids['sids_rate74'] <- (sids['SID74'] * 1000)/ sids['BIR74']
## Warning in `[<-.data.frame`(`*tmp*`, "sids_rate74", value =
## structure(list(: provided 2 variables to replace 1 variables

Let’s see if our ratio is present:

head(sids[,c(1,5,24)])
## Simple feature collection with 6 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 1193233 ymin: 860658.5 xmax: 2951619 ymax: 1044116
## epsg (SRID):    2264
## proj4string:    +proj=lcc +lat_1=36.16666666666666 +lat_2=34.33333333333334 +lat_0=33.75 +lon_0=-79 +x_0=609601.2192024384 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
##   CNTY_ID        NAME sids_rate74                       geometry
## 1    1825        Ashe   0.9165903 MULTIPOLYGON (((1270761 913...
## 2    1827   Alleghany   0.0000000 MULTIPOLYGON (((1340498 959...
## 3    1828       Surry   1.5683814 MULTIPOLYGON (((1570525 910...
## 4    1831   Currituck   1.9685039 MULTIPOLYGON (((2881106 948...
## 5    1832 Northampton   6.3335679 MULTIPOLYGON (((2525610 911...
## 6    1833    Hertford   4.8209366 MULTIPOLYGON (((2665018 911...

How does it look like ? Lets add that to our map.

Here we will use several functions and parameters:

  • ggplot(sids) -> we want to make plot of the SIDS dataset
  • geom_sf(aes(fill = sids_rate74)) -> we want to apply aesthetics to the filling of the geometry using the data from sids_rate74 column
  • scale_fill_viridis_c() -> with the viridis color scale dedicated to filling for continuous data
ggplot(sids)+
  geom_sf(aes(fill = sids_rate74))+
  scale_fill_viridis_c()

Not bad. Now we need some refinements like a title, some labels. Those functions are provided by ggplot2.

Making a better map

First we should store the map in an object so we won’t have to write all the code each time

Creating map object

map <- ggplot(sids)+
  geom_sf(aes(fill = sids_rate74))+
  scale_fill_viridis_c()

map

Adding a title and a subtitle

map <- map + ggtitle(
  "Sudden Infant Death Syndrom in North Carolina (USA)", # title
  "1974 -1978" # subtitle
  )
map

Change the legend title and place it below the map

map <- map + scale_fill_viridis_c(name = "SIDS cases \nfor 1000 births") + 
    theme(legend.position = "bottom")
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
map

Remove the labels for x and y axis

map <- map + 
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(), 
    axis.ticks.x = element_blank()
  ) +
  theme(
    axis.title.y = element_blank(),
    axis.text.y = element_blank(), 
    axis.ticks.y = element_blank())

map

What’s next ?

One thing you can do is use an OpenStreetMap basemap using Leaflet and make it more interactive for example. Or display labels on the map.

Try to reproduce with the data from 1979 to 1984 (hint: use SID79 and BIR79).

I plan to redo and improve that mapping using the {cartography} since it has been recently updated to use {sf}.

Homer Simpson Wouhou !

There is a lot of documentation regarding R spatial but you might want to take a look at those ressources:

R session

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.1 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
## [1] C
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.0.0 sf_0.6-3     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.18      knitr_1.20        magrittr_1.5     
##  [4] units_0.6-0       munsell_0.5.0     viridisLite_0.3.0
##  [7] colorspace_1.3-2  rlang_0.2.2       plyr_1.8.4       
## [10] stringr_1.3.1     tools_3.5.1       grid_3.5.1       
## [13] gtable_0.2.0      xfun_0.3          e1071_1.7-0      
## [16] DBI_1.0.0         withr_2.1.2       htmltools_0.3.6  
## [19] class_7.3-14      lazyeval_0.2.1    yaml_2.2.0       
## [22] rprojroot_1.3-2   digest_0.6.16     tibble_1.4.2     
## [25] crayon_1.3.4      bookdown_0.7      spData_0.2.9.3   
## [28] evaluate_0.11     rmarkdown_1.10    blogdown_0.8     
## [31] labeling_0.3      stringi_1.2.4     pillar_1.3.0     
## [34] compiler_3.5.1    scales_1.0.0      backports_1.1.2  
## [37] classInt_0.2-3