# Part 3 Plotting with ggplot2

As the animation below shows, making a plot with `ggplot2` is like adding layers to a cake. This presentation is inspired by Gina Reynolds and Garrick Aden-Buie.

```r
library('ggplot2')
```

## 3.1 Adding layers to create a plot

For this example, we will use the `iris` dataset, which ships with base R (it is part of the `datasets` package).
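If you have not used `iris` before, a quick look at its structure helps before plotting it: it holds 150 flowers, four numeric measurements and a `Species` factor. This sketch only uses base R functions.

```r
# iris ships with base R, so no package needs to be loaded
str(iris)            # column names and types
head(iris, 3)        # first three rows
levels(iris$Species)
# [1] "setosa"     "versicolor" "virginica"
```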

### 3.1.1 Create a ggplot object and add data

```r
ggplot(iris)
```

### 3.1.2 Define the X and Y axes

```r
ggplot(iris) +
  aes(Sepal.Length, Sepal.Width)
```

Note: `aes()` is usually placed inside the `ggplot()` call, where you can also name its arguments explicitly.
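Naming the arguments does not change the mapping: `aes()` treats its first two unnamed arguments as `x` and `y`. A small sketch to check this, comparing the names of both mappings:

```r
library(ggplot2)

# Positional and named forms produce a mapping with the same aesthetics
positional <- aes(Sepal.Length, Sepal.Width)
named <- aes(x = Sepal.Length, y = Sepal.Width)
names(positional)
# [1] "x" "y"
```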

```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))
```

After defining the data and the axes, we can add a geometry. We want a scatterplot, so we will use the `geom_point()` function.

```r
ggplot(iris) +
  aes(Sepal.Length, Sepal.Width) +
  geom_point()
```

There are several `geom_*` functions to create plots. We will see a couple more later.
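Because each `+` returns a new plot object, you can store the base plot in a variable and try different geometries on it. A minimal sketch, using `geom_rug()` as an example of a second layer stacked on the points:

```r
library(ggplot2)

# Store the data and mapping once
base <- ggplot(iris, aes(Sepal.Length, Sepal.Width))

# Add one geometry, or stack several
points_only <- base + geom_point()
points_and_rug <- base + geom_point() + geom_rug()  # rug marks along both axes
points_and_rug
```

The base object itself is never modified, so `points_only` and `points_and_rug` are two independent plots.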

To change labels, you can use the `labs()` function.
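The steps below add one label per `labs()` call to show each layer separately. `labs()` also accepts several labels at once, so a compact sketch of the same final result is:

```r
library(ggplot2)

# One labs() call can set the axis labels, the title and the subtitle together
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  labs(
    x = "Sepal Length",
    y = "Sepal Width",
    title = "The famous iris data",
    subtitle = "Data collected by Anderson, Edgar (1935)"
  )
p
```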

#### 3.1.4.1 X axis label

```r
ggplot(iris) +
  aes(Sepal.Length, Sepal.Width) +
  geom_point() +
  labs(x = "Sepal Length")
```

#### 3.1.4.2 Y axis label

```r
ggplot(iris) +
  aes(Sepal.Length, Sepal.Width) +
  geom_point() +
  labs(x = "Sepal Length") +
  labs(y = "Sepal Width")
```

#### 3.1.4.3 Plot title

```r
ggplot(iris) +
  aes(Sepal.Length, Sepal.Width) +
  geom_point() +
  labs(x = "Sepal Length") +
  labs(y = "Sepal Width") +
  labs(title = "The famous iris data")
```

#### 3.1.4.4 Plot subtitle

```r
ggplot(iris) +
  aes(Sepal.Length, Sepal.Width) +
  geom_point() +
  labs(x = "Sepal Length") +
  labs(y = "Sepal Width") +
  labs(title = "The famous iris data") +
  labs(subtitle = "Data collected by Anderson, Edgar (1935)")
```

### 3.1.5 More options

#### 3.1.5.1 Define a color by species

```r
ggplot(iris) +
  aes(Sepal.Length, Sepal.Width) +
  geom_point() +
  labs(x = "Sepal Length") +
  labs(y = "Sepal Width") +
  labs(title = "The famous iris data") +
  labs(subtitle = "Data collected by Anderson, Edgar (1935)") +
  aes(color = Species)
```

#### 3.1.5.2 Change theme

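`ggplot2` comes with several built-in themes, such as black and white (`theme_bw()`), dark (`theme_dark()`), grey (`theme_grey()`, the default) and minimal (`theme_minimal()`). You can also create your own theme to fit your corporate identity and style guide.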
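A custom theme is usually just a function that starts from a built-in theme and overrides some elements with `theme()`. The sketch below assumes a hypothetical `theme_corporate()`; its name, colour and settings are placeholders, not an official style.

```r
library(ggplot2)

# Hypothetical house style: start from theme_minimal() and override a few elements.
# The name and the colour are placeholders for your own style guide.
theme_corporate <- function(base_size = 14) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold", colour = "#1f3b73"),
      legend.position = "bottom",
      panel.grid.minor = element_blank()
    )
}

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  labs(title = "The famous iris data") +
  theme_corporate()
p
```

Calling `theme_set(theme_corporate())` would make it the default for every following plot in the session.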

```r
ggplot(iris) +
  aes(Sepal.Length, Sepal.Width) +
  geom_point() +
  labs(x = "Sepal Length") +
  labs(y = "Sepal Width") +
  labs(title = "The famous iris data") +
  labs(subtitle = "Data collected by Anderson, Edgar (1935)") +
  aes(color = Species) +
  theme_bw(base_size = 16)
```

This is an example of how to build a plot with `ggplot2` by adding layers. Let's look at some more examples.

## 3.2 More examples with the `mpg` dataset

We will use the `mpg` dataset provided by the `ggplot2` package.

### 3.2.1 The `mpg` dataset

This is a subset of fuel economy data.

```r
head(mpg)
```

```
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl trans  drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr>  <chr> <int> <int> <chr> <chr>
## 1 audi         a4      1.8  1999     4 auto(… f        18    29 p     comp…
## 2 audi         a4      1.8  1999     4 manua… f        21    29 p     comp…
## 3 audi         a4      2    2008     4 manua… f        20    31 p     comp…
## 4 audi         a4      2    2008     4 auto(… f        21    30 p     comp…
## 5 audi         a4      2.8  1999     6 auto(… f        16    26 p     comp…
## 6 audi         a4      2.8  1999     6 manua… f        18    26 p     comp…
```

According to the documentation:

> This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car. A data frame with 234 rows and 11 variables:
>
> - manufacturer: manufacturer name
> - model: model name
> - displ: engine displacement, in litres
> - year: year of manufacture
> - cyl: number of cylinders
> - trans: type of transmission
> - drv: f = front-wheel drive, r = rear wheel drive, 4 = 4wd
> - cty: city miles per gallon
> - hwy: highway miles per gallon
> - fl: fuel type
> - class: “type” of car
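You can check the documented size of the dataset and the coding of `drv` directly in R. A minimal sketch:

```r
library(ggplot2)

dim(mpg)               # number of rows and columns
# [1] 234  11
sort(unique(mpg$drv))  # drive train codes used in the data
```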

#### 3.2.1.1 Scatterplot

Let’s visualise engine displacement (`displ`), highway fuel economy (`hwy`) and vehicle `class` as a scatterplot.

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(
    x = displ,
    y = hwy,
    color = class))
```

#### 3.2.1.2 Smooth curve

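```r
ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy)
  ) +
  geom_smooth(
    mapping = aes(x = displ, y = hwy)
  )
```

```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

Mappings defined in `ggplot()` are shared by all layers, so `x` and `y` only need to be set once. Here we also color the points and the curves by `class`:

```r
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(mapping = aes(color = class))
```

```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

The line above is a message rather than a warning. The second example also produces a lot of warnings, because some classes have too few points for a stable loess fit. Both can be hidden by adding options to the code chunk header: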

````markdown
```{r, message=FALSE, warning=FALSE}
# some code with warnings
```
````
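The `geom_smooth()` message only reports the defaults it picked. A sketch of another way to silence it, by passing `method` and `formula` explicitly (this removes the message but not the loess warnings from the per-class example, which still need the chunk options):

```r
library(ggplot2)

# With method and formula given explicitly, geom_smooth() has no default to report
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)
p
```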

#### 3.2.1.3 As a boxplot

Boxplots are great for visualising basic statistics (median, quartiles, etc.) and the distribution of the data.
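To see the numbers a box summarises, you can compute the quartiles of `hwy` for each class with base R. A minimal sketch; note that in `geom_boxplot()` the whiskers stop at the most extreme value within 1.5 times the interquartile range, so they can differ from the `0%` and `100%` values below.

```r
library(ggplot2)

# Split highway mileage by class, then compute the quartiles of each group
hwy_quartiles <- lapply(split(mpg$hwy, mpg$class), quantile)
hwy_quartiles[["compact"]]
```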

```r
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()
```

#### 3.2.1.4 Histogram

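We want to see how the cars are distributed across classes. Strictly speaking, this is a bar chart rather than a histogram: `geom_bar()` counts the observations of a categorical variable, whereas `geom_histogram()` bins a continuous one.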
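`geom_bar()` simply counts the rows in each class. A quick sketch to check the bar heights with base R's `table()`:

```r
library(ggplot2)

# Number of cars in each class: the heights of the bars below
class_counts <- table(mpg$class)
class_counts
```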

```r
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class))
```

```r
ggplot(data = mpg) +
  geom_bar(mapping = aes(
    x = class,
    fill = class # fill each bar with a color by class
  ))
```

The possibilities are abundant, but not every kind of plot suits every dataset. To help you choose, you can use the `esquisse` package, which lets you build `ggplot2` charts interactively. We now know how to manipulate data and make plots from it. Let’s see how to do this with geospatial data.