November 13, 2020
If you would like to respond to the survey for a future update of this project, and other candy-related ones on this site, click here to open a survey form.
If you are trying to bundle candies together, how do you make sure that the recipient will enjoy most of the candies in the bundle? First, you might want to choose more popular candies. Based on a survey I administered in October 2020, below are the candies sorted by average likes minus average dislikes. See the methodology notes at the end of this page for details on the methodology and a description of the sample.
Any of the candies with yellow bars are safe choices, as are most of the orange ones. The candies with darker bars are more disliked and are generally not recommended.
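The ranking above reduces each candy to a single number. Here is a minimal Python sketch of that score, assuming each answer is coded as "like", "neutral", or "dislike", that every respondent rated every candy, and that each respondent carries a survey weight. The answer labels, candy names, and the `net_likeability` function are hypothetical and are not taken from the survey's actual coding.

```python
from typing import Dict, List, Sequence, Tuple


def net_likeability(responses: Dict[str, Sequence[str]],
                    weights: Sequence[float]) -> List[Tuple[str, float]]:
    """Weighted share of 'like' answers minus weighted share of 'dislike' answers.

    Assumes responses[candy][i] is respondent i's answer and weights[i] is
    that respondent's survey weight (hypothetical coding).
    """
    total = float(sum(weights))
    scores = {}
    for candy, answers in responses.items():
        likes = sum(w for a, w in zip(answers, weights) if a == "like")
        dislikes = sum(w for a, w in zip(answers, weights) if a == "dislike")
        scores[candy] = (likes - dislikes) / total
    # Highest net score first, i.e. the most liked candies at the top.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ratings = {
    "candy A": ["like", "like", "dislike"],
    "candy B": ["dislike", "neutral", "like"],
}
print(net_likeability(ratings, [2.0, 1.0, 1.0]))
# [('candy A', 0.5), ('candy B', -0.25)]
```

Because the score subtracts dislikes, a candy that many people love and many people hate can land in the middle of the chart, which is exactly the polarization that the correlations below try to untangle.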
While the above graph shows which candies are most liked, it doesn't show which ones go best together. You aren't likely to go wrong with the most popular candies, but the middle-of-the-road and less well-liked ones are more polarizing. To address this, I look at the correlations between each survey respondent's individual rankings of the candies.
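The correlations used here are weighted polychoric correlations (see the methodology notes at the end). A polychoric correlation treats each ordered answer as a coarse reading of an underlying, normally distributed preference and estimates the correlation of those hidden preferences. Below is a minimal sketch of the common two-step estimator in Python with SciPy, assuming answers are already coded as integers 0..k-1 from least to most liked. The function names and the ±10 stand-in for infinity are my own illustrative choices, not the project's R code.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import multivariate_normal, norm

BIG = 10.0  # stand-in for +/- infinity on the standard-normal scale


def _thresholds(codes, weights, n_levels):
    """Weighted marginal cutpoints on the latent standard-normal scale."""
    counts = np.bincount(codes, weights=weights, minlength=n_levels)
    cum = np.cumsum(counts)[:-1] / counts.sum()
    cum = np.clip(cum, 1e-6, 1 - 1e-6)  # avoid infinite quantiles
    return np.concatenate(([-BIG], norm.ppf(cum), [BIG]))


def weighted_polychoric(x, y, weights=None):
    """Two-step polychoric correlation between two ordinal variables.

    x, y: integer codes 0..k-1 (hypothetical coding, least to most liked).
    weights: per-respondent survey weights, e.g. from raking.
    """
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    w = np.ones(len(x)) if weights is None else np.asarray(weights, dtype=float)
    kx, ky = x.max() + 1, y.max() + 1

    # Step 1: thresholds come from each variable's weighted marginal distribution.
    tx = _thresholds(x, w, kx)
    ty = _thresholds(y, w, ky)

    # Weighted contingency table of the two answers.
    table = np.zeros((kx, ky))
    np.add.at(table, (x, y), w)

    # All threshold pairs, row-major: a varies slowest, b fastest.
    pts = np.array([[a, b] for a in tx for b in ty])

    def neg_log_lik(rho):
        cov = [[1.0, rho], [rho, 1.0]]
        # Bivariate normal CDF at every threshold pair, arranged as a grid.
        grid = multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(pts)
        grid = grid.reshape(len(tx), len(ty))
        # Probability mass of each rectangular cell of the latent plane.
        probs = grid[1:, 1:] - grid[:-1, 1:] - grid[1:, :-1] + grid[:-1, :-1]
        return -np.sum(table * np.log(np.clip(probs, 1e-12, None)))

    # Step 2: maximize the likelihood over rho with the thresholds held fixed.
    result = minimize_scalar(neg_log_lik, bounds=(-0.995, 0.995), method="bounded")
    return result.x
```

Running this for every pair of candies fills in the correlation matrix that the dendrogram and the matrix plot are built from. For real work, an established implementation (such as R's `polycor` package) is a safer choice than this sketch.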
A simple way of viewing the above data is with a dendrogram (a tree-like diagram). Each branch of the tree groups the most similar candies together, and contiguous colors indicate a cluster (12 clusters in total).
A few examples of candies that work well together:
Bear in mind how well liked candies are before choosing them. For example, while candy corn and circus peanuts are highly correlated with each other, I would only bundle them together if I were intent on selling the circus peanuts, the most disliked candy. In any other case, candy corn is probably best sold on its own.
A more detailed way to examine the data is the correlation matrix itself. See the description after the image for tips on how to interpret it.
This graph is quite a bit to swallow, so here are a few guidelines:
As a matter of practicality, you would likely do the following when building a bundle:
For the weighting of the models, I used raking across age group, gender, and US state. See https://maxcandocia.com/article/2018/Jun/24/survey-raking/ for more details on raking.
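Raking (iterative proportional fitting) repeatedly rescales respondent weights so that the weighted sample matches known population shares for one variable at a time, cycling through the variables until every margin matches. A minimal Python sketch, assuming the population shares for each level of each variable are known, sum to 1, and that every level appears at least once in the sample. The `rake` function and its arguments are hypothetical and not taken from the linked article or the project's code.

```python
from typing import Mapping

import numpy as np


def rake(categories: Mapping[str, np.ndarray],
         targets: Mapping[str, Mapping[str, float]],
         max_iter: int = 100, tol: float = 1e-8) -> np.ndarray:
    """Iterative proportional fitting of respondent weights.

    categories: for each variable (e.g. "gender"), each respondent's level.
    targets: for each variable, the population share of each level.
    """
    n = len(next(iter(categories.values())))
    w = np.ones(n)
    for _ in range(max_iter):
        biggest_change = 0.0
        for var, levels in categories.items():
            total = w.sum()  # fixed while this variable's levels are adjusted
            for level, share in targets[var].items():
                mask = levels == level
                # Scale this level so its weighted share equals the target share.
                factor = share * total / w[mask].sum()
                w[mask] *= factor
                biggest_change = max(biggest_change, abs(factor - 1.0))
        if biggest_change < tol:  # every margin already matches its target
            break
    return w / w.mean()  # normalize so the average weight is 1
```

Usage would look like `rake({"age": age, "gender": gender, "state": state}, targets)`, where each array holds one entry per respondent and `targets` holds hypothetical census shares for each variable.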
For the correlations, I used a weighted polychoric correlation with the following values for likeability based on responses:
I used hierarchical clustering with the 'Ward.D2' method, with 1 - correlation as the distance metric.
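The same clustering step can be sketched in Python with SciPy, assuming a candy-by-candy correlation matrix is already available. SciPy's `method="ward"` is generally regarded as the counterpart of R's `ward.D2`, but this is an illustrative translation, not the project's R code, and the `cluster_candies` function is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_candies(corr: np.ndarray, n_clusters: int = 12):
    """Ward clustering of candies using 1 - correlation as the distance."""
    dist = 1.0 - np.asarray(corr, dtype=float)
    dist = (dist + dist.T) / 2.0  # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)   # each candy is at distance 0 from itself
    condensed = squareform(dist, checks=False)  # upper triangle as a vector
    tree = linkage(condensed, method="ward")
    # Cut the tree into at most n_clusters groups (labels start at 1).
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels
```

Passing `tree` to `scipy.cluster.hierarchy.dendrogram(tree, labels=candy_names)` draws a tree like the one above. Highly correlated candies get small distances and merge early, while negatively correlated ones sit up to a distance of 2 apart.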
The current version of the source code can be found at https://github.com/mcandocia/CandyRanking. See basket_analysis.r for the code specific to this project. The data will be updated once I have analyzed it more thoroughly.