"Error Bars" on Tiled Heatmaps

"Error Bars" on Tiled Heatmaps

By Max Candocia


February 24, 2021

When visualizing data, you might sometimes want to represent points as tiles on a heatmap, where the color of each tile indicates its value. Take, for example, the following toy data:


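library(dplyr)

set.seed(2021 * 02 * 24)

# 6-by-6 grid of estimates, each with a symmetric interval around it
mydata = expand.grid(
  x=1:6, y=1:6
) %>%
  mutate(
    val = rnorm(36),               # the estimate
    err_width = rexp(36) * 2/3,    # half-width of the interval
    lower = val-err_width,
    upper = val+err_width
  )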

It is a 6-by-6 grid of normally distributed values representing some sort of estimate, each with an exponentially distributed error width. The first 7 rows:

x  y     val  err_width   lower   upper
1  1  -0.418      0.173  -0.592  -0.245
2  1   0.963      0.424   0.538   1.387
3  1  -0.531      0.060  -0.590  -0.471
4  1   0.286      0.785  -0.499   1.071
5  1   0.081      0.103  -0.022   0.183
6  1  -0.550      0.224  -0.774  -0.325
1  2  -0.160      1.186  -1.345   1.026

Ideally, an “error bar” should use the same kind of scale as the data values while being less prominent. One way of doing this is to place two small colored dots, one above the other, within each cell. The top dot represents the upper limit of a confidence interval (or any other kind of interval, such as a prediction interval or a quantile-based interval), and the bottom dot represents the lower limit.

library(ggplot2)

# symmetric color range, so that 0 sits at the palette's midpoint
color_range = c(-1,1) * max(abs(c(min(mydata$lower), max(mydata$upper))))

ggplot(
  mydata
) +
  geom_tile(aes(x=x,y=y, fill=val)) +
  # lower limit of the interval, drawn below the center of each tile
  geom_point(aes(
      x=x,y=y-0.2, color=lower
  )) +
  # upper limit of the interval, drawn above the center of each tile
  geom_point(aes(
      x=x,y=y+0.2, color=upper
  )) +
  # limits should be the same, using divergent palette for ease of seeing when
  # interval contains 0
  scale_fill_gradientn(colors=cetcolor::cet_pal(7, 'd1a'), limits=color_range) +
  scale_color_gradientn(colors=cetcolor::cet_pal(7, 'd1a'), limits=color_range) +
  scale_x_continuous(breaks=1:6) + scale_y_continuous(breaks=1:6) +
  # the dots share the fill scale, so a separate color legend is redundant
  guides(color='none') +
  theme_bw() +
  ggtitle('Sample Heatmap Tile with Dot "Error Bars"',
          subtitle='Lower dot = lower estimate | Upper dot = upper estimate')
[Figure: Sample Heatmap Tile with Dot "Error Bars"; lower dot = lower estimate, upper dot = upper estimate]

When the dots are barely visible, the interval is very narrow, since both limits have nearly the same color as the estimate itself. When the two dots have different colors (red vs. blue), the interval contains the value 0, which is often the reference value used in statistical significance tests. That is, if the confidence interval contains 0, the null hypothesis that the true value is 0 is not rejected.
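The red-vs-blue check can also be done numerically. Below is a minimal base-R sketch (an illustration, not code from the original post) that rebuilds the toy data with the same seed and the same draw order, which should reproduce the values above, and flags each tile whose interval contains 0. The `contains_zero` column is a hypothetical addition.

```r
# Rebuild the toy data in base R: same seed, and rnorm() is called before
# rexp(), matching the order used inside mutate() above
set.seed(2021 * 02 * 24)
mydata = expand.grid(x = 1:6, y = 1:6)
mydata$val = rnorm(36)
mydata$err_width = rexp(36) * 2/3
mydata$lower = mydata$val - mydata$err_width
mydata$upper = mydata$val + mydata$err_width

# Hypothetical column: the interval contains 0 when the lower limit is at or
# below 0 and the upper limit is at or above 0, i.e., the two dots fall on
# opposite sides of the divergent palette's midpoint
mydata$contains_zero = mydata$lower <= 0 & mydata$upper >= 0

# Tiles whose interval excludes 0 (a null value of 0 would be rejected)
subset(mydata, !contains_zero, select = c(x, y, val, lower, upper))
```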

More Tiles

This gets a bit messier with more data, but it is still manageable. Take, for example, a 25-by-25 grid:

[Figure: 25-by-25 heatmap with dot "error bars"]

While it is slightly trickier to read, it is not much harder to interpret than the same graph without the error dots. I do not think this approach works for a much larger number of tiles, though, at least on regular computer screens or mobile devices.

In those cases, it might suffice to show either the least extreme estimate (the interval limit closest to 0 if the interval does not contain 0, or 0 itself if it does) or an indicator of the least extreme value: binary (0 for "zero" or 1 for "nonzero") or ternary (-1, 0, or 1 for negative, zero, or positive values, respectively).
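A minimal sketch of these summaries in base R, assuming each interval is given as vectors of lower and upper limits. The function names `least_extreme`, `ternary_indicator` and `binary_indicator` are hypothetical, chosen here for illustration; the example limits are taken from rows 1, 2 and 4 of the table above.

```r
# Hypothetical helper: the interval limit closest to 0, or 0 itself when the
# interval contains 0 (assumes lower <= upper)
least_extreme = function(lower, upper) {
  ifelse(lower > 0, lower,          # entirely positive: lower limit is closest to 0
         ifelse(upper < 0, upper,   # entirely negative: upper limit is closest to 0
                0))                 # interval contains 0
}

# Hypothetical ternary indicator: sign of the least extreme value (-1, 0, or 1)
ternary_indicator = function(lower, upper) sign(least_extreme(lower, upper))

# Hypothetical binary indicator: 1 for "nonzero", 0 for "zero"
binary_indicator = function(lower, upper) as.integer(least_extreme(lower, upper) != 0)

lower = c(-0.592, 0.538, -0.499)
upper = c(-0.245, 1.387, 1.071)
least_extreme(lower, upper)      # values: -0.245, 0.538, 0
ternary_indicator(lower, upper)  # values: -1, 1, 0
binary_indicator(lower, upper)   # values: 1, 1, 0
```

Either indicator needs only one small mark per tile instead of two dots, which is what makes it attractive once the grid gets large.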

All of this depends on the application, of course. At some point, though, there is simply too much information to fit neatly into a static visualization.
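The code behind the 25-by-25 figure is not shown in the post. As a starting point for experimenting with larger grids, here is a minimal base-R sketch, assuming the same generation process as the 6-by-6 toy data. The helper name `make_toy_grid` and the seed are hypothetical choices, so this will not reproduce that figure exactly.

```r
# Hypothetical helper (not from the original post): toy data for an n-by-n
# grid, generated the same way as the 6-by-6 example
make_toy_grid = function(n) {
  out = expand.grid(x = 1:n, y = 1:n)
  k = n * n
  out$val = rnorm(k)                # estimates
  out$err_width = rexp(k) * 2/3     # exponentially distributed half-widths
  out$lower = out$val - out$err_width
  out$upper = out$val + out$err_width
  out
}

set.seed(2021)  # arbitrary seed; will not match the figure above
big = make_toy_grid(25)

# symmetric color range, computed the same way as for the 6-by-6 grid
color_range = c(-1, 1) * max(abs(c(min(big$lower), max(big$upper))))
```

`big` and `color_range` can then be dropped into the plotting code in place of `mydata`, with the axis breaks changed from `1:6` to something sparser, such as `seq(5, 25, by = 5)`.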

