We will leave theory for some later posts and concentrate on a practical aspect for now. Our goal is to be able to create maps. To do that, we need to know the boundaries of areas i.e. countries, provinces or districts. That kind of data is usually stored in a special type of file, shapefile. This is clearly and oversimplification, but, as it is useful, lets stick to it for a moment.

There are a number of R packages to handle shapefiles. Most important are rgdal, sp; however we will focus on a relatively new addition - sf package that follows tidy data philosophy. Please look into the tutorial linked to get more info on how sf package works and learn a bit more about spatial data.

In our example we will use data on districts in Wroclaw. As mentioned before, data is in .shp file. Let’s read it into R and check what kind of object is it.

districts_wroclaw <- sf::read_sf('../../static/data/districts_wroclaw/districts_wroclaw.shp', 
                                 options = "ENCODING=UTF-8")
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"
## Simple feature collection with 6 features and 9 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 16.89252 ymin: 51.04515 xmax: 17.10723 ymax: 51.16599
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 6 x 10
##   OBJECTI NROSIED NAZWAOS  DATA   SHAPE_A SHAPE_L old_dst dstrct_  inhbtnt
##   <chr>     <int> <chr>    <chr>    <dbl>   <dbl> <chr>   <chr>      <dbl>
## 1 340          21 KrzykiP… 2016/… 5254965    9995 Krzyki  Krzyki-…  14.0  
## 2 341          24 GądówPo… 2016/… 3134569    7589 Fabryc… Gądów-P…  26.2  
## 3 348          42 Sołtyso… 2016/… 4547041    9004 Psie P… Sołtyso…   2.90 
## 4 349          18 Bieńkow… 2016/… 1433161    4760 Krzyki  Bieńkow…   0.500
## 5 351          32 Żerniki  2016/… 3908726    9133 Fabryc… Żerniki    3.40 
## 6 352          29 Muchobó… 2016/… 6879545   13228 Fabryc… Muchobó…   8.30 
## # ... with 1 more variable: geometry <POLYGON [°]>

Interesting! Apart from being special sf object, it is also a regular data.frame. Thanks to that, one can use packages like dplyr and tidyr directly with spatial data. This might not sounds that exciting, but it is a huge improvement compared to what was available in R year ago.

It is always nice, to be able to use old toolbox with new data. Luckily for us, plotting can be done using ggplot. Just make sure you use the new version from Github, and you can enjoy new geom_sf!


Then everything becomes super easy.

ggplot() + 
  geom_sf(data = districts_wroclaw, aes(fill = inhbtnt), color='white') + 
  scale_fill_gradient(low = "white", high = "red")

Fill of districts represent number of inhabitants.

Some useful functions from sf

How to show distribution of spatial data? The following is based on a nice web tutorial. let’s say that our goal is to visualize the difference between where younger and older people live. We could calculate median and create choropleth, but that way we loose a significant amount of information.

We will use data from polish office for cartography.

wroclaw_demo <- sf::read_sf('../../static/data/dem-rejurb-rejstat-shp/REJURB_20171231.shp')

We don’t know where exactly each person lives and drawing 600K points makes no sense. Therefore, for each region, we shall draw randomly a number of points proportional to the number of inhabitants.

First, we preprocesss data to get number of inhabitants younger and older 44 (which is as close to median as we can get)

wroclaw_demo_splited <- wroclaw_demo %>%
  mutate(below_44 = W0_2 +W3_6 +W7_12+ W13_15+ W16_18+ W19_24+ W25_34+ W35_44,
         above_44 = SUMA - below_44) %>%
  gather(group, value, below_44, above_44) %>%

To generate points we need one more packages. Original code used purr imap, but it can be substituted by lapply. Result is somewhat more cumbersome, but in case you don’t know purrr, you can limit number of new things for the day.

generate_samples <- function(data) 
  suppressMessages(st_sample(data, size = round(data$value / 100)))

points <- lapply(wroclaw_demo_splited, generate_samples)
points <- lapply(names(points), function(name){
  st_sf(data_frame(age = rep(name, length(points[[name]]))),
                      geometry = points[[name]])
points <- do.call(rbind, points)

Once points are generated everything becomes easy.

More posts on how to create maps, including interactive, are coming! You can visit my workshop materials as well.