December 29, 2021
How to effectively hide messages and other data inside images.
March 28, 2021
Which subreddits are most likely to generate awards for their users?
February 24, 2021
Heatmaps are a useful way of plotting 2-dimensional data, such as cross-tabulations. Adding "error bars" can seem unintuitive, but you can express them in your visualization with a small trick.
December 19, 2020
Recently, famous YouTuber Dream had his Minecraft speedrun records removed as a result of cheating. If he had cheated differently, would he have been able to evade detection?
December 03, 2020
If you have ever downloaded survey data, or any other kind of data, in which a single field is itself comma-separated, you may have found it annoying or difficult to reshape the data into a more useful form.
November 26, 2020
Pooled testing is a method of increasing the efficiency of medical tests, such as COVID-19 detection tests. How efficient is it, though? That largely depends on what percentage of the population is infected.
November 13, 2020
What candies would work best in a bundle? Using rankings and correlations, popular candies can be grouped together for optimal combinations.
November 12, 2020
If you need more data for a survey, you can use Reddit as a source of responses. In this article, we look at a few factors that affect the success of a survey posted to Reddit.
October 28, 2020
What would it look like if people across the US voted for a candy? Explore different results using different voting methods and different types of representation, such as a national vote versus the Electoral College.
October 02, 2020
Do you have trouble memorizing long strings, but want to keep things easy to remember? Look no further than the new keyToEnglish package in R, now available on CRAN.
September 13, 2020
When working with path-like data, such as a run recorded by GPS, you may want to group near-identical routes together. Using a small data set, I demonstrate how similarities can be calculated to find duplicate runs, as well as to make general comparisons between runs.
August 30, 2020
When plotting data, you may want to use a log scale for most of your data, but zeros, near-zero values, and negative values make this impossible. With piecewise linear and logarithmic functions, however, a log-like scale can still be achieved.
May 17, 2020
A relatively straightforward method of visualizing the direction of a running path using R and ggmap. It also works for path data in general.
April 08, 2020
How Likely Are You to be Banned From Reddit? I got a bot for that.
February 26, 2020
How do you identify an "outlier" in a triathlon?
February 23, 2020
Images are one of the most common types of data that people view on the internet, but could they be hiding more than the eye can see?
January 20, 2020
A visualization of my runs in 2019 using R, with buttons for comparing them against my runs in 2018.
September 22, 2019
The recent Hong Kong protests have garnered much media attention. Reddit, one of the largest social media communities, has its own communities with different takes on the protests. Here, we take a look at four notable ones.
May 04, 2019
Insights and animated charts of the paces and splits of Boston Marathon runners.
April 01, 2019
After 508 individuals each rated some puns, I sorted them by average score. How far down the list can you go?