Overlaying Density Heatmaps on Geographic Maps in R

# Overlaying Density Heatmaps on Geographic Maps in R

|

December 15, 2017

The other night I attended an R-Ladies meetup where Kaylene McClanahan presented a brief into to R Markdown and set-up a mini-hackathon-style coding session with NYC noise complaint data she prepared. The group I was in was working on mapping, although we didn't have enough time to do everything we wanted, so below I demonstrate how density data can be calculated and used in combination with `ggmap()` to produce interesting and useful visualizations.

## GitHub Source

Note: Code can be found on GitHub for the project (includuing the .Rhtml file used to make this article), and the specific density weighting code can be found here.

## Setup

First, let's load the data and calculate the response time from the `created date ` and ` closed date ` variables. Additionally, let's also filter out points with negative response time, response time over 50 hours (unrealistic), and anything with missing coordinates. I will be using base R for all of the (limited) data cleaning required here.

```library(ggplot2)
library(ggmap)
#this library produces color palettes that are easier to discern
library(cetcolor)
source('weighted_map_raster.r')
duration < 3000 &
!is.na(longitude),])
```

Above I loaded some code I wrote to calculate density values for a map of new york. Below, I will use them as a rasterized overlay of a map using `geom_raster()`, which is much, much faster (~100 times) than using `geom_tile()` if it is possible to use it. Note that you must use `coord_cartesian()` after `ggmap()`, since rasterized graphics don't work with non-cartesian projections.

```default_density = createWeightedGrid(party_ready_realistic)
```
```## [1] 0.7580717
## [1] 0.003790359
## [1] 0.005
```
```weighted_density = createWeightedGrid(party_ready_realistic, variable='duration')
```
```## [1] 0.7580717
## [1] 0.003790359
## [1] 0.005
```
```weighted_density\$normalized = weighted_density\$weight/(1e-10 + default_density\$weight)
weighted_density\$default_density = default_density\$weight
```

The resulting data frames look like this:

```cat(kable(head(default_density)))
```
lat_index lon_index longitude latitude weight
1 1 -74.25507 40.49382 0.0499639
2 1 -74.25507 40.49882 0.3094801
3 1 -74.25507 40.50382 3.9012009
4 1 -74.25507 40.50882 3.9115544
5 1 -74.25507 40.51382 3.8654261
6 1 -74.25507 40.51882 3.8625678
```cat(kable(head(weighted_density)))
```
lat_index lon_index longitude latitude weight normalized default_density
1 1 -74.25507 40.49382 9.243324 185.0000 0.0499639
2 1 -74.25507 40.49882 53.871410 174.0707 0.3094801
3 1 -74.25507 40.50382 401.538563 102.9269 3.9012009
4 1 -74.25507 40.50882 402.550277 102.9131 3.9115544
5 1 -74.25507 40.51382 393.665244 101.8427 3.8654261
6 1 -74.25507 40.51882 393.339337 101.8336 3.8625678

We will also load a map of New York City, along with defining a labeler for log-scaled legends and a constant.

```nyc <- get_map('new york city, new york', maptype='toner', source='stamen', zoom=11)

antilog_formatter = function(x){
round(exp(x))
}
r5 = sqrt(5)
```

Below is a graph of the density of noise complaints within New York City.

```ggmap(nyc) + coord_cartesian() +
geom_raster(data=weighted_density[weighted_density\$normalized > exp(0.101),],
aes(x=longitude, y=latitude,
fill=log( default_density),
alpha=log(default_density))) +
scale_alpha_continuous() +
colors=cet_pal(5), label=antilog_formatter,
breaks = log(c(1/r5, 1,r5, 4, 4*r5, 20, 20*r5, 100, 100*r5 ))) +
ggtitle('Density of Noise Complaints in NYC (per year)') +
guides(alpha='none')
```
```## Warning: Removed 1243 rows containing missing values (geom_raster).
```

Below is an (accurate) estimate of the average wait time for complaints. It seems that the northernmost region has an unusually high wait time.

```ggmap(nyc) + coord_cartesian() +
geom_raster(data=weighted_density[weighted_density\$normalized > exp(0.101),],
aes(x=longitude, y=latitude,
fill=log( normalized),
alpha=log(normalized))) +
scale_alpha_continuous() +
colors=cet_pal(5), label=antilog_formatter) +
ggtitle('Average Response Time to Noise Complaints in NYC') +
guides(alpha='none')
```
```## Warning: Removed 1243 rows containing missing values (geom_raster).
```

This is the code that I used to generate the density maps. It is about 2.5 times slower than the ` kde2d() ` from the ` MASS ` package, but its behavior is more explicit, and it can (most importantly) be used to calculate weighted averages. There is also a function ` kde2d.weighted` from the ` ggtern ` package, but it has a small bug in it and I haven't figured out how to get the results to interpret properly. The code below is about 3-4 times slower, but it works well.

```getMatrixElementsByLinearIndex <- function(mat, row_idx, col_idx){
nr = nrow(mat)
return(mat[row_idx + nr*(col_idx-1)])
}

## Function creates a kernel density grid, either based on density of points (default)
##          or a weighted density
# variable - character of a variable that is used to weight the density
# kernel_lat_sd - the standard deviation/bandwidth of the kernel in latitude units; default
#                 is approximately 1 mile
# tile_latitude_size - the length of each tile's latitude dimension; longitude will scale
#                      with this to be roughly square
# max_influence_sd - the maximum number of standard deviations in either axis that a point
#                    can create influence
# padding_lat - the amount of latitude added in each direction where the density will still
#               be calculated; padding for longitude is calculated from this
# normalize_weights - if TRUE, then the kernel density will be normalized over each affected
#                     region for each data point
createWeightedGrid <- function(data, variable=NULL, kernel_lat_sd=1/69, tile_latitude_size=0.005,
data = data[(!is.na(data\$longitude)) & !is.na(data\$latitude),]
if (!is.null(variable)){
data = data[!is.na(data[,variable]),]
}
longitude = data\$longitude
latitude = data\$latitude
med_lon = mean(range(longitude))
med_lat = mean(range(latitude))
lon_scale_factor = cos(med_lat * pi/180)
range_lon = range(longitude, na.rm=TRUE) + c(-1, 1) * padding_lon
range_lat = range(latitude, na.rm=TRUE) + c(-1, 1) * padding_lat

tile_longitude_size = tile_latitude_size * lon_scale_factor

lat_tile_values = seq(range_lat[1], range_lat[2], tile_latitude_size)
lon_tile_values = seq(range_lon[1], range_lon[2], tile_longitude_size)

n_lat_tiles = length(lat_tile_values)
n_lon_tiles = length(lon_tile_values)

weight_matrix = matrix(0, ncol=n_lon_tiles, nrow=n_lat_tiles)

lat_to_index <- function(latitude){
round(n_lat_tiles * (latitude - range_lat[1])/diff(range_lat)) + 1
}

lon_to_index <- function(longitude){
round(n_lon_tiles * (longitude - range_lon[1])/diff(range_lon))
}

lat_index_radius = round(n_lat_tiles * max_influence_sd * tile_latitude_size/diff(range_lat))
lon_index_radius = round(n_lon_tiles * max_influence_sd * tile_longitude_size/diff(range_lon))

calculate_latitude_bounds <- function(latitude){
lat_index = lat_to_index(latitude)
bounds = c(pmax(lat_index - lat_index_radius, 1),
return(bounds[1]:bounds[2])
}

calculate_longitude_bounds <- function(longitude){
lon_index = lon_to_index(longitude)
bounds = c(pmax(lon_index - lon_index_radius, 1),
return(bounds[1]:bounds[2])
}

if (!is.null(variable)){
x = data[,variable]
}
else{
x = rep(1, nrow(data))
}

calculate_distance <- function(latitude, longitude, grid){
sqrt(((grid\$longitude-longitude)*lon_scale_factor)^2 +
(grid\$latitude-latitude)^2)
}
for (i in 1:nrow(data)){
x_latitude = latitude[i]
x_longitude = longitude[i]
valid_lat_indexes = calculate_latitude_bounds(x_latitude)
valid_lon_indexes = calculate_longitude_bounds(x_longitude)
latlon_grid = expand.grid(lat_index = valid_lat_indexes,
lon_index = valid_lon_indexes)
latlon_grid\$latitude = lat_tile_values[latlon_grid\$lat_index]
latlon_grid\$longitude = lon_tile_values[latlon_grid\$lon_index]

new_weights = 1/(pi*kernel_lat_sd) * exp(-((calculate_distance(x_latitude,
x_longitude,
latlon_grid)))^2/
(2*kernel_lat_sd))
if (normalize_weights){
new_weights = new_weights/(sum(new_weights) + 1e-8)
}
weight_matrix[latlon_grid\$lat_index +
n_lat_tiles * (latlon_grid\$lon_index-1)] =
weight_matrix[latlon_grid\$lat_index +
n_lat_tiles * (latlon_grid\$lon_index-1)] +
new_weights * x[i]
}
final_grid = expand.grid(lat_index = 1:n_lat_tiles, lon_index=1:n_lon_tiles)
final_grid\$longitude = lon_tile_values[final_grid\$lon_index]
final_grid\$latitude = lat_tile_values[final_grid\$lat_index]
final_grid\$weight = getMatrixElementsByLinearIndex(weight_matrix,
final_grid\$lat_index,
final_grid\$lon_index)
return(final_grid)
}
```

Tags: