This post is based on a notebook I started about R spatial analysis for the project OSGeoLive
It aims to provide a quick introduction to R spatial analysis and cartography and will be extended.
R is a language dedicated to statistics and data analysis. It has also a lot of strong packages for spatial analysis. Recent packages like {sf} allows easy Simple Features manipulation.
This document aims to complete the R Overview and R Quickstart. If you don’t have read them yet, please consider doing it if you are new to R.
Please note that is post needs several spatial libraries (GEOS, GDAL), their dependencies (libuunits, etc) and some R wrapper. So I build a custom Docker image for the Gitlab-CI runner used to render this site. It is a mix from the Blogdown recommended .gitlab-ci.yml and Sébastien Rochette recommandations for installing R 3.5 on ubuntu 18.04.
Works pretty well. So let’s do some mapping.
Simple mapping
I was inspired by a QGIS example from its OSGeoLive quickstart.
I was also inspired by Sébastien Rochette (again !) blogpost on mapping with {sf}.
We want to represent the sudden infant death syndrome (SIDS) in North Carolina (USA) data from the {spData} package.
More information about the dataset here: nowosad.github.io/spData/reference/nc.sids.html.
Needed libaries
R comes with a lot of packages to provides functionnalities. Those functions are stocked in libraries and in order to call them, you need to load them beforehand. To do so, use the function library(< package name >).
This post use latest version (at the time of writing) of those packages:
- {sf}: 0.6-3
- {ggplot2}: 3.0.0
- {spData}: 0.2.9.3
That is because ggplot2 version 3.0 compatibility with sf have been greatly improved and I like when things makes my life simple.
# install.packages(c('sf', 'ggplot2', 'spData'))
library('sf') #SimpleFeature Library to handle shapefiles
## Linking to GEOS 3.6.2, GDAL 2.2.3, proj.4 4.9.3
library('ggplot2') # Plotting library to create the maps
Please note that {sf} needs GEOS, GDAL and proj.4 installation on your system. If your are not familiar with those packages, they are probably not installed in your computer. So please consider {sf} requirements here.
Loading the data
We need to load the sids data that came from the sids.shp file. To do so, we’ll use the function sf::st_read().
In this example, the projection is not specified (shapefiles have an attached .proj file who contains the projection definition of the dataset, this file is not provided with this dataset). So I force it to the CRS (Coordinate Reference System) code 4326, which is corresponding to the WGS84 World Geodetic System use in GPS for example. It is commonly use when you see long/lat coordinates.
sids <- st_read(dsn = system.file("shapes/sids.shp", package = "spData"), crs = 4326 )
## Reading layer `sids' from data source `/usr/local/lib/R/site-library/spData/shapes/sids.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 22 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
Please note, that I read directly the shapefile, I didn’t load the {spData} package, because the sids dataset in there does not have geometry.
Following Edzer Pebesma tip, I use system.file("shapes/sids.shp", package = "spData")
to get the data with a relative path. It is relative to a loaded package.
Let’s have quick show of the data 6 first rows:
head(sids)
## Simple feature collection with 6 features and 22 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## CNTY_ID AREA PERIMETER CNTY_ NAME FIPS FIPSNO CRESS_ID BIR74
## 1 1825 0.114 1.442 1825 Ashe 37009 37009 5 1091
## 2 1827 0.061 1.231 1827 Alleghany 37005 37005 3 487
## 3 1828 0.143 1.630 1828 Surry 37171 37171 86 3188
## 4 1831 0.070 2.968 1831 Currituck 37053 37053 27 508
## 5 1832 0.153 2.206 1832 Northampton 37131 37131 66 1421
## 6 1833 0.097 1.670 1833 Hertford 37091 37091 46 1452
## SID74 NWBIR74 BIR79 SID79 NWBIR79 east north x y lon
## 1 1 10 1364 0 19 164 176 -81.67 4052.29 -81.48594
## 2 0 10 542 3 12 183 182 -50.06 4059.70 -81.14061
## 3 5 208 3616 6 260 204 174 -16.14 4043.76 -80.75312
## 4 1 123 830 2 145 461 182 406.01 4035.10 -76.04892
## 5 9 1066 1606 3 1197 385 176 281.10 4029.75 -77.44057
## 6 7 954 1838 5 1237 411 176 323.77 4028.10 -76.96474
## lat L_id M_id geometry
## 1 36.43940 1 2 MULTIPOLYGON (((-81.47276 3...
## 2 36.52443 1 2 MULTIPOLYGON (((-81.23989 3...
## 3 36.40033 1 2 MULTIPOLYGON (((-80.45634 3...
## 4 36.45655 1 4 MULTIPOLYGON (((-76.00897 3...
## 5 36.38799 1 4 MULTIPOLYGON (((-77.21767 3...
## 6 36.38189 1 4 MULTIPOLYGON (((-76.74506 3...
You’ll note that there is a column called geometry. It is where the feature’s (spatial entities are called features, but for statisticians it is a record - but with a geometry) store its geometry. Each feature, in the example each county of North Carolina, has a geometry, here it is Polygons.
Please note that the Simple Features implementation is made of Points known in coordinates. LineString are made of Points at each summit who are connected by vertex. In the same way, Polygons summits are Points and the perimeter can be view as a LineString. Note that for the polygons, the first summit and the last summit needs to have exactly the same coordinates. That’s the way for the computer to know that it is a closed polygon and not an open LineString.
That is the basics of Simple Features.
Mapping
A basic Map
Let’s see what it looks :
ggplot(sids)+
geom_sf()
Well, let’s change the projection for a more fitted one of North Carolina: NAD83 / North Carolina (ftUS) - ESPG code: 2264
sids <- sids %>% st_transform(2264)
ggplot(sids)+
geom_sf()
If we want to represent the rate of sids for the 1000 birth in the 1974 and 1978 period, we will use the data from the BIR74 and SID74 columns. In the quickstart, they represent counts with colors, as we don’t want to offence geographers, let’s use a ratio instead. So we want to create a new column with the ratio.
sids['sids_rate74'] <- (sids['SID74'] * 1000)/ sids['BIR74']
## Warning in `[<-.data.frame`(`*tmp*`, "sids_rate74", value =
## structure(list(: provided 2 variables to replace 1 variables
Let’s see if our ratio is present:
head(sids[,c(1,5,24)])
## Simple feature collection with 6 features and 3 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 1193233 ymin: 860658.5 xmax: 2951619 ymax: 1044116
## epsg (SRID): 2264
## proj4string: +proj=lcc +lat_1=36.16666666666666 +lat_2=34.33333333333334 +lat_0=33.75 +lon_0=-79 +x_0=609601.2192024384 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
## CNTY_ID NAME sids_rate74 geometry
## 1 1825 Ashe 0.9165903 MULTIPOLYGON (((1270761 913...
## 2 1827 Alleghany 0.0000000 MULTIPOLYGON (((1340498 959...
## 3 1828 Surry 1.5683814 MULTIPOLYGON (((1570525 910...
## 4 1831 Currituck 1.9685039 MULTIPOLYGON (((2881106 948...
## 5 1832 Northampton 6.3335679 MULTIPOLYGON (((2525610 911...
## 6 1833 Hertford 4.8209366 MULTIPOLYGON (((2665018 911...
How does it look like ? Lets add that to our map.
Here we will use several functions and parameters:
- ggplot(sids) -> we want to make plot of the SIDS dataset
- geom_sf(aes(fill = sids_rate74)) -> we want to apply aesthetics to the filling of the geometry using the data from sids_rate74 column
- scale_fill_viridis_c() -> with the viridis color scale dedicated to filling for continuous data
ggplot(sids)+
geom_sf(aes(fill = sids_rate74))+
scale_fill_viridis_c()
Not bad. Now we need some refinements like a title, some labels. Those functions are provided by ggplot2.
Making a better map
First we should store the map in an object so we won’t have to write all the code each time
Creating map object
map <- ggplot(sids)+
geom_sf(aes(fill = sids_rate74))+
scale_fill_viridis_c()
map
Adding a title and a subtitle
map <- map + ggtitle(
"Sudden Infant Death Syndrom in North Carolina (USA)", # title
"1974 -1978" # subtitle
)
map
Change the legend title and place it below the map
map <- map + scale_fill_viridis_c(name = "SIDS cases \nfor 1000 births") +
theme(legend.position = "bottom")
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
map
Remove the labels for x and y axis
map <- map +
theme(
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
) +
theme(
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank())
map
What’s next ?
One thing you can do is use an OpenStreetMap basemap using Leaflet and make it more interactive for example. Or display labels on the map.
Try to reproduce with the data from 1979 to 1984 (hint: use SID79 and BIR79).
I plan to redo and improve that mapping using the {cartography} since it has been recently updated to use {sf}.
There is a lot of documentation regarding R spatial but you might want to take a look at those ressources:
- Geocomputation with R by Robin Lovelace, Jakub Nowosad, Jannes Muenchow
- R Spatial by Edzer Pebesma
- Tidy spatial data analysis (video) by Edzer Pebesma at rstudio::conf 2018
- Introduction to mapping with {sf} & Co. on spatial analysis with R by Sebastien Rochette
R session
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.0.0 sf_0.6-3
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.18 knitr_1.20 magrittr_1.5
## [4] units_0.6-0 munsell_0.5.0 viridisLite_0.3.0
## [7] colorspace_1.3-2 rlang_0.2.2 plyr_1.8.4
## [10] stringr_1.3.1 tools_3.5.1 grid_3.5.1
## [13] gtable_0.2.0 xfun_0.3 e1071_1.7-0
## [16] DBI_1.0.0 withr_2.1.2 htmltools_0.3.6
## [19] class_7.3-14 lazyeval_0.2.1 yaml_2.2.0
## [22] rprojroot_1.3-2 digest_0.6.16 tibble_1.4.2
## [25] crayon_1.3.4 bookdown_0.7 spData_0.2.9.3
## [28] evaluate_0.11 rmarkdown_1.10 blogdown_0.8
## [31] labeling_0.3 stringi_1.2.4 pillar_1.3.0
## [34] compiler_3.5.1 scales_1.0.0 backports_1.1.2
## [37] classInt_0.2-3
Share this post
Twitter
Google+
Reddit
LinkedIn
Email