Candy Combinations for Bundling

By Max Candocia

November 13, 2020

If you would like to respond to the survey for a future update of this project, and other candy-related ones on this site, click here to open a survey form.

If you are trying to bundle candies together, how do you make sure that the recipient will enjoy most of the candies in the bundle? First, you might want to choose more popular candies. Based on a survey I administered in October 2020, below are the candies sorted by average likes minus average dislikes. See this page for methodology and sample description.

[Figure: candies sorted by average likes minus average dislikes]

Any of the candies with yellow bars are safe choices, as are most of the orange. The ones with darker bars are more disliked and are not as recommended in general.
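As a rough illustration of the "average likes minus average dislikes" score, here is a minimal Python sketch. It is not the code from the repository: the candy names, the three response labels ("like", "neutral", "dislike"), and the toy responses are all assumptions made for the example, and survey weights are ignored.

```python
from collections import defaultdict

def net_likeability(responses):
    """Rank candies by (share of likes) - (share of dislikes).

    responses: iterable of (candy, rating) pairs. The rating labels
    "like", "neutral", and "dislike" are assumed for this sketch.
    """
    likes = defaultdict(int)
    dislikes = defaultdict(int)
    totals = defaultdict(int)
    for candy, rating in responses:
        totals[candy] += 1
        if rating == "like":
            likes[candy] += 1
        elif rating == "dislike":
            dislikes[candy] += 1
    scores = {c: (likes[c] - dislikes[c]) / totals[c] for c in totals}
    # Most liked first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data, invented for illustration only
responses = [
    ("candy_a", "like"), ("candy_a", "like"), ("candy_a", "dislike"),
    ("candy_b", "dislike"), ("candy_b", "neutral"), ("candy_b", "dislike"),
]
ranked = net_likeability(responses)
for candy, score in ranked:
    print(f"{candy}: {score:+.2f}")
```

A score near +1 means almost everyone likes the candy, a score near -1 means almost everyone dislikes it, and a score near 0 can mean either indifference or a polarizing candy.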

Bundling Candies

While the above graph shows what candies are most liked, it doesn't show which ones would go best together. Though you aren't likely to go wrong with the most popular candies, the more middle-of-the-road and less well-liked ones are more polarizing. To solve this problem, I look at the numeric correlations between individual rankings of the candies from each survey respondent.
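To make the idea of correlating ratings across respondents concrete, here is a small Python sketch. It uses a plain, unweighted Pearson correlation as a simplified stand-in for the weighted polychoric correlation described under Technical Details. The numeric coding (1 = like, 0 = neutral, -1 = dislike), the candy names, and the ratings are all assumptions for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def correlation_matrix(ratings):
    """ratings: dict of candy -> list of scores, one per respondent,
    with respondents in the same order for every candy."""
    names = list(ratings)
    return {a: {b: pearson(ratings[a], ratings[b]) for b in names} for a in names}

# Toy ratings from four hypothetical respondents (1 = like, 0 = neutral, -1 = dislike)
ratings = {
    "candy_a": [1, 0, -1, 1],
    "candy_b": [1, 0, -1, 0],
    "candy_c": [-1, 0, 1, -1],
}
corr = correlation_matrix(ratings)
print(round(corr["candy_a"]["candy_b"], 3))  # people who like A tend to like B
print(round(corr["candy_a"]["candy_c"], 3))  # people who like A tend to dislike C
```

A high positive value suggests two candies appeal to the same people, which is what makes them candidates for the same bundle.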

Dendrogram View

A simple way of looking at the above data is by using a dendrogram (a tree-like diagram). Each of the branches of the tree groups the most similar candies together. Contiguous colors indicate a cluster (total of 12).

[Figure: dendrogram of candies, colored by cluster]

A few examples of candies that work well together:

Bear in mind how well-liked candies are before choosing them. For example, while candy corn and circus peanuts have a high correlation with each other, I would only bundle them together if I were intent on selling the circus peanuts, the most disliked candy. In any other case, candy corn is probably best sold on its own.

Correlation Matrix

A more complex way of looking at the data is by looking at the correlation matrix itself. See the description after the image for tips on how to interpret it.

[Figure: correlation matrix of candy likeability]

This graph is quite a bit to swallow, so here are a few guidelines:

  1. Darker gold color means the likeabilities of the two candies are highly correlated (e.g., 3 Musketeers and Milky Way) and could be paired together
  2. Bluer colors mean they are inversely related, and probably shouldn't be paired together (e.g., Reese's Peanut Butter Cups and bubblegum)
  3. The red boxes indicate a tight cluster, where many of the candies within it have strong correlations to each other
  4. The purple boxes are looser clusters, where many candies have positive correlations to each other

As a matter of practicality, you would likely do the following when building a bundle:

  1. Determine which candies are acceptable based on how well-liked they are
  2. Use the dendrogram to come up with groups of candies that you think would go well together
  3. Use the correlation matrix to verify that the correlations are high enough among the candies you've chosen for a particular bundle
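The three steps above can be sketched in Python roughly as follows. This is only one way to mechanize the procedure, not the method used in the repository. The function name `pick_bundles`, the thresholds `min_like` and `min_corr`, and all of the data are hypothetical.

```python
from itertools import combinations

def pick_bundles(likeability, clusters, corr, min_like=0.0, min_corr=0.2):
    """Return candidate bundles, one per cluster at most.

    likeability: dict of candy -> net likeability score
    clusters:    list of candy lists (e.g. read off the dendrogram)
    corr:        dict of frozenset({a, b}) -> correlation
    min_like, min_corr: hypothetical cutoffs, tune to taste
    """
    bundles = []
    for members in clusters:
        # Step 1: drop candies that are not liked well enough
        keep = [c for c in members if likeability[c] >= min_like]
        if len(keep) < 2:
            continue
        # Step 3: every remaining pair must be correlated strongly enough
        if all(corr[frozenset(pair)] >= min_corr for pair in combinations(keep, 2)):
            bundles.append(keep)
    return bundles

# Toy inputs, invented for illustration
likeability = {"a": 0.5, "b": 0.4, "c": -0.3, "d": 0.2, "e": 0.1}
clusters = [["a", "b", "c"], ["d", "e"]]  # Step 2: groups taken from the dendrogram
corr = {
    frozenset(("a", "b")): 0.6,
    frozenset(("a", "c")): 0.5,
    frozenset(("b", "c")): 0.4,
    frozenset(("d", "e")): 0.1,
}
bundles = pick_bundles(likeability, clusters, corr)
print(bundles)
```

In this toy run, "c" is filtered out for being disliked, and the "d"/"e" pair is rejected because its correlation falls below the cutoff.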

Technical Details

Weighting

For the weighting of the models, I used raking across age group, gender, and US state. See https://maxcandocia.com/article/2018/Jun/24/survey-raking/ for more details on raking.
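For readers unfamiliar with raking, here is a minimal Python sketch of the general idea (iterative proportional fitting). It is not the implementation used for this project. It uses only two variables instead of the three named above, and the respondents and population shares are made up.

```python
def rake(rows, targets, iterations=50):
    """Compute raking weights.

    rows:    list of dicts describing each respondent's categories
    targets: dict of variable -> {category: target population share}
    Every category in the sample is assumed to appear in targets.
    """
    weights = [1.0] * len(rows)
    for _ in range(iterations):
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total per category of this variable
            current = {}
            for row, w in zip(rows, weights):
                current[row[var]] = current.get(row[var], 0.0) + w
            # Rescale so this variable's weighted shares match the targets
            weights = [
                w * shares[row[var]] * total / current[row[var]]
                for row, w in zip(rows, weights)
            ]
    return weights

def weighted_share(rows, weights, var, category):
    """Weighted share of respondents with row[var] == category."""
    hit = sum(w for row, w in zip(rows, weights) if row[var] == category)
    return hit / sum(weights)

# Toy sample that over-represents young women (hypothetical data)
rows = [
    {"age": "young", "gender": "f"},
    {"age": "young", "gender": "f"},
    {"age": "young", "gender": "f"},
    {"age": "young", "gender": "m"},
    {"age": "old", "gender": "f"},
    {"age": "old", "gender": "m"},
]
targets = {"age": {"young": 0.5, "old": 0.5}, "gender": {"f": 0.5, "m": 0.5}}
weights = rake(rows, targets)
print([round(w, 3) for w in weights])
```

Each pass adjusts the weights to match one variable's margins, which slightly disturbs the others. Repeating the passes lets all margins settle close to their targets.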

Correlations

For the correlations, I used a weighted polychoric correlation with the following values for likeability based on responses:

Clustering

I used hierarchical clustering with the 'ward.D2' method, with 1 - correlation as the distance metric.
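As an illustration of what this clustering step does, here is a small pure-Python sketch of agglomerative clustering with the Ward.D2 update (the Lance-Williams formula applied to squared distances), using 1 - correlation as the distance. It is a teaching sketch under assumed toy correlations, not the R code from the repository, and it is far slower than a library routine.

```python
from math import sqrt

def ward_d2(names, dist):
    """Return merges as (members_a, members_b, height), in merge order.

    dist[a][b] is the distance between items a and b.
    """
    clusters = {i: (name,) for i, name in enumerate(names)}
    # Ward.D2 squares the distances before applying the update formula
    d2 = {(i, j): dist[names[i]][names[j]] ** 2
          for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d2, key=d2.get)  # closest pair of clusters
        dij = d2.pop((i, j))
        ni, nj = len(clusters[i]), len(clusters[j])
        new = {}
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            dik = d2.pop((min(i, k), max(i, k)))
            djk = d2.pop((min(j, k), max(j, k)))
            # Lance-Williams update for Ward's criterion
            new[(k, next_id)] = (
                (ni + nk) * dik + (nj + nk) * djk - nk * dij
            ) / (ni + nj + nk)
        merges.append((clusters[i], clusters[j], sqrt(max(dij, 0.0))))
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        d2.update(new)
        next_id += 1
    return merges

def cut(names, merges, k):
    """Replay merges until k groups remain."""
    groups = {(n,) for n in names}
    for a, b, _ in merges[:len(names) - k]:
        groups -= {a, b}
        groups.add(a + b)
    return groups

# Toy correlations, invented for illustration
names = ["candy_a", "candy_b", "candy_c", "candy_d"]
pairs = {("candy_a", "candy_b"): 0.8, ("candy_c", "candy_d"): 0.7}

def corr(x, y):
    if x == y:
        return 1.0
    return pairs.get((x, y), pairs.get((y, x), 0.1))

dist = {x: {y: 1.0 - corr(x, y) for y in names} for x in names}
merges = ward_d2(names, dist)
groups = cut(names, merges, 2)
print(groups)
```

Cutting the tree at a chosen number of groups (12 in the dendrogram above, 2 in this toy example) is what turns the merge history into clusters.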

GitHub Code

The current version of source code can be found in https://github.com/mcandocia/CandyRanking. See basket_analysis.r for code specific to this project. The data will be updated once I have more thoroughly analyzed it.

