Choropleth Map of the UK with R - Part I

7 minute read

Lately, on one of my project, I had to work on the visualisation of data accross the whole UK. At my utmost surprise, I discovered any easy way to do it for the UK, in the contrary of others countries, like US or France. I speak about doing it in R, here.

I present in the first part a modified piece of code which fix this impair. The second one is an exploration of the open source data to make this plot relevant.

The main site I used to get the polygons of the UK areas is Thanks to Chris Bell for this amasing website.

Extract the data

As the data are under KML format, it is possible to parse the files with xmlParse and use xpath expression to tighten the polygon describing the area under a data frame format.

# libraries: library(XML) library(stringr) library(plyr) library(data.table) 
library(leaflet) library(geosphere) library(ggmap) library(grid)  # list of all regions: 
region.ct <- c("AB", "AL", "B", "BA", "BB", "BD", "BH", "BL", "BN", "BR", "BS", "BT", "CA", "CB", "CF", "CH", "CM", "CO", "CR", "CT", "CV", "CW", "DA", "DD", "DE", "DG", "DH", "DL", "DN", "DT", "DY", "E", "EC", "EH", "EN", "EX", "FK", "FY", "G", "GL", "GU", "HA", "HD", "HG", "HP", "HR", "HS", "HU", "HX", "IG", "IP", "IV", "KA", "KT", "KW", "KY", "L", "LA", "LD", "LE", "LL", "LN", "LS", "LU", "M", "ME", "MK", "ML", "N", "NE", "NG", "NN", "NP", "NR", "NW", "OL", "OX", "PA", "PE", "PH", "PL", "PO", "PR", "RG", "RH", "RM", "S", "SA", "SE", "SG", "SK", "SL", "SM", "SN", "SO", "SP", "SR", "SS", "ST", "SW", "SY", "TA", "TD", "TF", "TN", "TQ", "TR", "TS", "TW", "UB", "W", "WA", "WC", "WD", "WF", "WN", "WR", "WS", "WV", "YO", "ZE") 
 # namespace: xmlns <- ""  
# function to create a data frame from the file: 
read.files.kml <- function(z) { # z <- "IV"    
  pc2 <- xmlParse(file = paste0("", z, ".kml"))  
  # all the name of a Placemark 
  name.pc0 <- sapply(getNodeSet(pc2, "//xmlns:Placemark/xmlns:name" 
                               , c(xmlns = xmlns)), xmlValue)  
  # as some area have multiple polygons, we have to count the number of polygons(coordinates) per name and duplicate these names: <- llply(1:length(name.pc0), function(x) getNodeSet(pc2 
  , paste0("count(//xmlns:Placemark[", x, "]//xmlns:coordinates)"), c(xmlns = xmlns))) 
     # order the index and apply to name.pc: 
  name.pc <- name.pc0[rep(1:length(, times =]    # load the polygons: 
  polygon.pc0 <- sapply(getNodeSet(pc2, "//xmlns:Placemark//xmlns:coordinates" 
                                    , c(xmlns = xlmns)), xmlValue)    
  # Function which, for each area, tighten the coordinates of the poygons: 
  # In addition, a NA row is added to be able to plot the leaflet app: 
  recup.coordinate <- function(x) { # x <- 1: length(name.pc) ;  x <- 1  
    polygon.pc1 <- unlist(str_split(polygon.pc0[[x]], " ")) 
    df <- data.frame( name = name.pc[x]       , name.file = z 
      , lat = as.numeric(unlist(lapply(polygon.pc1, function(y) unlist(str_split(y, ",") )[2] ) )) 
      , lng = as.numeric(unlist(lapply(polygon.pc1, function(y) unlist(str_split(y, ",") )[1] ) )) 
      , stringsAsFactors = F)  
    # calculate the area of each polygon with the adequate formulae(in square meter): 
    df$area.plg <- areaPolygon(as.matrix(df[, c("lng", "lat")]))        
    # adding a NA row for the leaflet app:     df2 <- rbind(df, NA)   }    
 # Final results for the file:  fin.res <- ldply(1:length(name.pc), recup.coordinate)    }  
# run the code: # Depending of a good internet connection, 1-3 min ptm <- proc.time() 
contrib.pc.fin <- ldply(region.ct, read.files.kml) proc.time() - ptm  # convert to data.table: 
contrib.pc.fin.dt <- data.table(contrib.pc.fin)

Map the results

I used to be a fan of the googleVis package to do my visualisations. This was before the package ggmap. This is mainly due to the flexibility of ggmap and the availability of all the ggplot function. Here, the way to proceed is loading a google map layer and adding with geom_polygon the polygons mapping areas.

Leaflet is great as it allows to access to the map dynamically.

Both ways are nice and only need a few rows of code.

With gmap

map <- get_map(location = c(lon = -2.5, lat =  54.2), zoom = 6, maptype = "satellite")

A big plot, as there is a lot of details to look at.

library(grid) # ggmap function: ggmap(map)  + 
  geom_polygon(data = contrib.pc.fin.dt, aes(x = lng, y = lat 
                            , fill = name), alpha = 0.8, color = "grey20", size = 0.6) + 
  theme(legend.position = "none"         , axis.ticks = element_blank() 
        , axis.text = element_blank()         , axis.title = element_blank() 
        , panel.margin = unit(c(0, 0, 0, 0), "mm"))