Analyzing and Clustering Christmas Foods, Drinks, and Desserts

By Max Candocia

December 22, 2017

This December, I surveyed 312 users from Reddit, Facebook, and, to a lesser extent, other social media sources on how they celebrated (or didn't celebrate) Christmas. You can find an active version of the survey here. I automatically generate a good portion of this article, so I may update it in the future.

This is the fourth article in the series, where I analyze the foods, drinks, and desserts people consume on Christmas, both as a whole and by region of the US (or outside the US).

The first part will look at proportions of food, drink, and dessert overall and by region. The second part will cluster the regions and the various food, beverage, and dessert items (this is actually pretty cool).

Christmas Foods

Among Christmas foods, bread rolls and mashed potatoes were the most common in the set, even across different regions of the US. Mac 'n cheese is more common in the Southeast and, to some degree, the Southwest, and prime rib seems really common only in the West.

plot of chunk foods-food-overall
plot of chunk foods-food-region

Christmas Beverages

Among the beverages listed in the survey, apple cider seems to have the most regional variation, with a majority of responses from the West indicating they drank it on Christmas.

plot of chunk foods-beverage-overall
plot of chunk foods-beverage-region

Christmas Desserts

As far as desserts go, it seems cookies and pie are the most common in the US. Interestingly, gingerbread isn't quite as common, except among non-US responses, where 50% had gingerbread on Christmas.

plot of chunk foods-dessert-overall
plot of chunk foods-dessert-region

Clustering Regions

One interesting thing that can be done with these survey responses is clustering. In this case, clustering involves measuring the similarity between different things, such as regions of the US or food/drink items, and then connecting those things based on that similarity, starting with the most similar pair. When two objects are connected, they form a group, and you can measure the similarity between groups in a similar way.
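To make the merging procedure described above concrete, here is a minimal sketch in Python. It is not the code used for this article: the labels and similarity scores are made up, and averaging the pairwise similarities between two groups (average linkage) is an assumption, since other rules for comparing groups exist.

```python
from itertools import combinations

def agglomerate(items, similarity):
    """Greedy agglomerative clustering with average linkage.

    items: list of hashable labels.
    similarity: function (a, b) -> float, where higher means more alike.
    Returns the merges in the order they happened.
    """
    clusters = [frozenset([item]) for item in items]
    merges = []

    def group_similarity(g1, g2):
        # Average linkage (an assumption): mean similarity over all
        # pairs with one member in each group.
        pairs = [(a, b) for a in g1 for b in g2]
        return sum(similarity(a, b) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Connect the two most similar groups first.
        g1, g2 = max(combinations(clusters, 2),
                     key=lambda pair: group_similarity(*pair))
        merges.append((set(g1), set(g2), group_similarity(g1, g2)))
        clusters = [c for c in clusters if c not in (g1, g2)] + [g1 | g2]
    return merges

# Made-up similarity scores, for illustration only.
SCORES = {
    frozenset(["A", "B"]): 0.9,
    frozenset(["A", "C"]): 0.4,
    frozenset(["B", "C"]): 0.5,
    frozenset(["A", "D"]): 0.1,
    frozenset(["B", "D"]): 0.2,
    frozenset(["C", "D"]): 0.3,
}

def toy_similarity(a, b):
    return SCORES[frozenset([a, b])]

merges = agglomerate(["A", "B", "C", "D"], toy_similarity)
for left, right, score in merges:
    print(sorted(left), "+", sorted(right), "at similarity", round(score, 3))
```

With these toy scores, A and B are connected first, C then joins that pair, and D joins last. The order of the merges is what a dendrogram draws.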

This first graph is a visualization of a "similarity matrix" between regions. I measured the similarity using the percentage of people from each region who consumed each food/drink/dessert. From this graph, it seems like the biggest difference is between the US regions and respondents outside the US (Non-US in the graph).

plot of chunk foods-region-simmat
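As a rough illustration of how a region-by-region similarity matrix can be built from consumption percentages, here is a small Python sketch. The region names and percentages are invented, and cosine similarity is an assumption on my part; the exact measure behind the matrix above may differ.

```python
import math

# Hypothetical share of respondents per region who had each item.
# These numbers are invented for illustration, not survey results.
ITEMS = ["bread rolls", "mashed potatoes", "mac 'n cheese", "gingerbread"]
PROFILES = {
    "Region X": [0.60, 0.55, 0.20, 0.15],
    "Region Y": [0.58, 0.50, 0.25, 0.10],
    "Region Z": [0.20, 0.25, 0.05, 0.50],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(profiles):
    """Map every (region, region) pair to the similarity of their profiles."""
    names = list(profiles)
    return {
        (r1, r2): cosine_similarity(profiles[r1], profiles[r2])
        for r1 in names
        for r2 in names
    }

sim = similarity_matrix(PROFILES)
for (r1, r2), value in sim.items():
    print(f"{r1} vs {r2}: {value:.3f}")
```

In this toy data, Region X and Region Y have nearly the same profile and score close to 1, while Region Z, with its high gingerbread share, scores noticeably lower against both.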

Below is a dendrogram. Objects that are connected closest to each other are the most similar, and objects that connect to groups at a lower level (further to the right) are closer to that group than they are to other groups.

In this case, the Midwest and Northeast are the closest to each other in terms of Christmas cuisine. Then, the type of food and drink becomes more distinct when you look at the Southeast, West, Southwest, and finally, outside the US.

plot of chunk foods-region-phylo

Clustering Foods, Drinks, and Desserts by Region Similarity

In addition to clustering regions by cuisine, you can cluster cuisine by region. This shows which foods are more strongly correlated across regions. The lighter square regions of this similarity matrix represent groups of food and drinks that are commonly found in specific regions.

plot of chunk foods-item-simmat1
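One way to picture "correlated across regions" is to treat each item as a list of per-region percentages and compare two items with a correlation coefficient. The sketch below uses Pearson correlation, which is an assumption, and the region names and numbers are invented for illustration.

```python
import math

REGIONS = ["Region X", "Region Y", "Region Z", "Region W"]

# Hypothetical share of respondents in each region (same order as REGIONS)
# who had each item. Invented numbers, not survey results.
ITEM_BY_REGION = {
    "pecan pie":     [0.10, 0.15, 0.40, 0.35],
    "mac 'n cheese": [0.12, 0.18, 0.45, 0.30],
    "gingerbread":   [0.30, 0.25, 0.10, 0.15],
}

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

names = list(ITEM_BY_REGION)
correlations = {
    (a, b): pearson(ITEM_BY_REGION[a], ITEM_BY_REGION[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
for (a, b), r in correlations.items():
    print(f"{a} vs {b}: {r:+.2f}")
```

In this made-up data, the first two items rise and fall together across regions, so they correlate strongly and would be grouped early, while the third moves in the opposite direction.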

Below is a dendrogram of the above matrix, with many different clusters formed. Items that are next to each other are likely to be found in the same region together. Conversely, if one is not popular in a region, the other is also less likely to be as popular within that region.

A good example here is pecan pie and macaroni and cheese being grouped together. Both are more strongly associated with the Southern US, which makes their close grouping unsurprising.

plot of chunk foods-item-phylo1

Clustering Foods, Drinks, and Desserts by Individual Consumption

In addition to grouping cuisine by region, you can also use individual responses for clustering. In this case, I use the Jaccard index to calculate which foods share the most individuals with other foods.

For example, if Alice and Bob like candy canes, and Bob and Clark like gingerbread, then gingerbread is somewhat similar to candy canes because Bob likes them both, but not too similar, since Alice and Clark like candy canes and gingerbread, respectively, but not both.
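The Jaccard index is the number of people who had both items divided by the number of people who had at least one of them. Here is a small Python sketch of the example above; the function name and the choice to return 0 for two empty sets are my own conventions for this illustration.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

# Who had each item, following the example in the text.
consumers = {
    "candy canes": {"Alice", "Bob"},
    "gingerbread": {"Bob", "Clark"},
}

score = jaccard(consumers["candy canes"], consumers["gingerbread"])
print(round(score, 3))  # 1 shared person out of 3 people in total
```

Only Bob is shared between the two items, out of three people in total, so the similarity is 1/3.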

plot of chunk foods-item-simmat2

What's interesting here is how different the clustering is from the region-based clustering. There are some similarities that remain, such as the turkey + stuffing/dressing combination, but it looks like foods that are more similar to each other tend to be grouped closer together than before. This makes sense, since individuals who like foods that are similar to each other would likely eat both at a large feast.

It also appears that individuals who didn't have any foods, beverages, or desserts listed often had that issue across the three categories. It may have been a result of skipping over the question (I made it optional), or it could be the US-focused nature of the food & drink list I provided.

plot of chunk foods-item-phylo2

Source Code

I have the source code for my analysis on GitHub here. All the responses (after removing timestamp/order info) will be released once I finish my article series.

