"Error Bars" on Tiled Heatmaps

By Max Candocia


February 24, 2021

When visualizing data, you might sometimes want to represent points as tiles on a heatmap, where the color indicates the value. Take, for example, the following toy data:


library(dplyr)  # provides %>% and mutate()

set.seed(2021 * 02 * 24)

mydata = expand.grid(
  x=1:6, y=1:6
) %>%
  mutate(
    val = rnorm(36),
    err_width = rexp(36) * 2/3,
    lower = val-err_width,
    upper = val+err_width
  )

It is a 6-by-6 grid with normally distributed values representing some sort of estimate, and exponentially distributed error widths (mean 2/3) that are subtracted from and added to each value to form the interval. The first 7 rows:

x  y  val     err_width  lower   upper
1  1  -0.418  0.173      -0.592  -0.245
2  1   0.963  0.424       0.538   1.387
3  1  -0.531  0.060      -0.590  -0.471
4  1   0.286  0.785      -0.499   1.071
5  1   0.081  0.103      -0.022   0.183
6  1  -0.550  0.224      -0.774  -0.325
1  2  -0.160  1.186      -1.345   1.026

Ideally, an “error bar” will share the same type of scale as the value of the data, while being less prominent. Below is one way of doing this: placing two small, colored dots vertically within each cell. The top dot represents the upper limit of a confidence interval (or any other kind of interval, such as a prediction interval or a quantile range), and the bottom dot represents the lower limit.

library(ggplot2)

# symmetric color range
color_range = c(-1,1) * max(abs(c(min(mydata$lower), max(mydata$upper))))

ggplot(mydata) +
  geom_tile(aes(x=x,y=y, fill=val)) +
  # dots drawn with geom_point; default point size is assumed here
  geom_point(
    aes(x=x,y=y-0.2, color=lower)
  ) +
  geom_point(
    aes(x=x,y=y+0.2, color=upper)
  ) +
  # limits should be the same, using divergent palette for ease of seeing when 
  # interval contains 0
  scale_fill_gradientn(colors=cetcolor::cet_pal(7, 'd1a'), limits=color_range) +
  scale_color_gradientn(colors=cetcolor::cet_pal(7, 'd1a'), limits=color_range) +
  scale_x_continuous(breaks=1:6) + scale_y_continuous(breaks=1:6) +
  # hide the color legend; it duplicates the fill legend
  guides(color='none') +
  theme_bw() +
  ggtitle('Sample Heatmap Tile with Dot "Error Bars"',
          subtitle='Lower dot = lower estimate | Upper dot = upper estimate')
[Figure: Sample Heatmap Tile with Dot "Error Bars"]

When the dots are barely visible, their colors nearly match the tile's, so the interval is very narrow. When they are different colors (red vs. blue), the interval contains the value 0, which is often the reference value used in statistical significance tests. That is, if the confidence interval contains 0, the null hypothesis that the value is 0 is not rejected.

More Tiles

This can get a bit messier for larger data, but it is still manageable. Take, for example, a 25-by-25 grid:

[Figure: heatmap with dot "error bars" on a 25-by-25 grid]
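The code behind the larger plot is not shown, so here is a minimal sketch of how it could be produced, assuming it follows the same recipe as the 6-by-6 example. The helper `make_grid_data`, the seed, and the point size are assumptions for illustration, and ggplot2's built-in `scale_*_gradient2` diverging scale stands in for the cetcolor palette to keep the block self-contained.

```r
library(ggplot2)

# Hypothetical helper: an n-by-n grid of estimates with intervals,
# built the same way as the 6-by-6 toy data.
make_grid_data = function(n) {
  grid = expand.grid(x = 1:n, y = 1:n)
  grid$val = rnorm(n^2)
  grid$err_width = rexp(n^2) * 2/3
  grid$lower = grid$val - grid$err_width
  grid$upper = grid$val + grid$err_width
  grid
}

set.seed(25)  # arbitrary seed, not the one behind the figure
big_data = make_grid_data(25)

# symmetric color range, so that 0 sits at the middle of the palette
big_range = c(-1, 1) * max(abs(c(big_data$lower, big_data$upper)))

big_plot = ggplot(big_data) +
  geom_tile(aes(x = x, y = y, fill = val)) +
  # tiles are still 1 data unit tall, so the 0.2 offset carries over;
  # point size is in absolute units, so it is reduced to fit smaller tiles
  geom_point(aes(x = x, y = y - 0.2, color = lower), size = 0.6) +
  geom_point(aes(x = x, y = y + 0.2, color = upper), size = 0.6) +
  # diverging scales with shared limits (swapped in for cetcolor)
  scale_fill_gradient2(limits = big_range) +
  scale_color_gradient2(limits = big_range) +
  guides(color = 'none') +
  theme_bw()
```

Printing `big_plot` draws the figure. The point size will likely need tuning to the output dimensions, since the tiles scale with the plot while the dots do not.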

While it's slightly trickier to read, the graph is not much harder to interpret than one without the error dots. I do not expect this approach to work with a very large number of tiles, though, at least on regular computer screens or mobile devices.

In those cases, it might suffice to show only the least extreme estimate: the interval limit closest to 0 if the interval does not contain 0, or 0 itself if it does. Alternatively, one could show a binary (0 for "zero" or 1 for "nonzero") or ternary (-1, 0, or 1, matching the sign) indicator of that least extreme value.
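These summaries can be sketched in a few lines of base R. The helper name `least_extreme` and the variable names are hypothetical, and the inputs are the interval limits from the first four rows of the toy data table.

```r
# Hypothetical helper: the interval limit closest to 0, or 0 itself
# when the interval contains 0.
least_extreme = function(lower, upper) {
  ifelse(lower > 0, lower, ifelse(upper < 0, upper, 0))
}

# interval limits from the first four rows of the toy data
lower = c(-0.592, 0.538, -0.590, -0.499)
upper = c(-0.245, 1.387, -0.471, 1.071)

summary_val = least_extreme(lower, upper)  # -0.245, 0.538, -0.471, 0
is_nonzero = as.integer(summary_val != 0)  # binary: 1, 1, 1, 0
sign_flag = sign(summary_val)              # ternary: -1, 1, -1, 0
```

Any of the three vectors could then be mapped to the tile fill in place of the raw estimate, which leaves a single value per tile and no dots to draw.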

All of this is dependent upon the application, of course. At some point, though, there can be so much information that it is not reasonable to neatly fit it all in a static visualization.

