December 19, 2017

This December, I surveyed 312 users from Reddit, Facebook, and, to a lesser extent, other social media sources on how they celebrated (or didn't celebrate) Christmas. You can find an active version of the survey here. I automatically generate a good portion of this article, so I may update it in the future.

This is the first of several articles using this data that I will be posting over the next few days. The primary goal of this article is to visualize the overall responses to some of the main questions, divide them into groups, and provide some commentary on them. The following articles will dive deeper into different aspects, including when kids learn that Santa isn't real, what words people use to describe Christmas, and various foods, desserts, and drinks that people consume.

- Second article in series (when do kids stop believing in Santa?)
- Third article in series (describing Christmas in word clouds)
- Fourth article in series (analyzing and clustering Christmas food & drink)
- Fifth article in series (unpleasant statistics about Santa)

Note that, with the exception of the logistic model at the very bottom, these graphs describe the survey respondents rather than the general population.

In the survey, I asked about gender, religion, and region of the US (or outside the US). I did not include race, as I will be making the survey responses public, and that may provide too much identifying information for some people's comfort. The sample differs most from the US as a whole in three ways: the Midwest is overrepresented, a large number of respondents are atheist or agnostic, and a notable percentage of respondents live outside the US.

I also looked at who celebrates Christmas. Unsurprisingly, most of those who don't are not Christian.

#who celebrates christmas?

Another question I asked was whether respondents had a larger celebration on the 24th or the 25th of December (or whether the two were about the same). One oversight in this question: many Eastern Orthodox churches celebrate Christmas on January 7th, which was not an option.

I also looked at some of the activities that respondents did on Christmas.

#by region

Some of the correlations in the graphs above may be artifacts of oversampling from certain regions. For example, there is a high number of Evangelical Christian responses from the Midwest, so an apparent regional difference may really be a religious one, or vice versa. A statistical technique known as logistic regression can be used to estimate how each factor relates to a yes/no outcome while accounting for the others. In this case, I am testing whether region, religion, gender, age group, and/or celebration of Christmas affect whether individuals hang out with friends on Christmas, since that is one of the categories that seems to have interesting correlations in the above graphs.

Below is a snippet of code I used to test this out.

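    # Start from the full model and let step() add or drop terms.
    # k = log(n) swaps step()'s default AIC penalty for the BIC penalty.
    friends_model <- step(
      glm(activities_Spend.time.with.friends ~ religion + region +
            celebrates_christmas + gender + age_group,
          family = binomial, data = survey_categories),
      k = log(nrow(survey)), direction = 'both', trace = 0
    )
    summary(friends_model)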

    ## 
    ## Call:
    ## glm(formula = activities_Spend.time.with.friends ~ celebrates_christmas, 
    ##     family = binomial, data = survey_categories)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.9405  -0.9405  -0.9405   1.4345   1.9728  
    ## 
    ## Coefficients:
    ##                        Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)             -0.5867     0.1254  -4.679 2.88e-06 ***
    ## celebrates_christmasNo  -1.2051     0.4991  -2.415   0.0157 *  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 397.18  on 311  degrees of freedom
    ## Residual deviance: 389.87  on 310  degrees of freedom
    ## AIC: 393.87
    ## 
    ## Number of Fisher Scoring iterations: 4

It turns out that the stepwise selection kept only one predictor: whether or not the respondent celebrates Christmas to begin with. The negative coefficient for celebrates_christmasNo (p = 0.0157) means that those who don't celebrate are less likely to spend time with friends on Christmas. I am guessing that they either don't know many people who aren't celebrating that day or are otherwise apathetic.
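To put the coefficients on a more familiar scale, here is a minimal sketch that converts them from log-odds to probabilities. The numbers are copied by hand from the summary output above, the variable names are made up for this illustration, and it assumes the baseline level of celebrates_christmas is respondents who do celebrate.

```r
# Coefficients copied from the summary(friends_model) output above
intercept <- -0.5867  # log-odds for the baseline group (assumed: celebrates Christmas)
coef_no   <- -1.2051  # shift in log-odds for celebrates_christmasNo

# plogis() is the inverse logit: it maps log-odds to a probability
p_celebrates <- plogis(intercept)
p_does_not   <- plogis(intercept + coef_no)

# Exponentiating a logistic regression coefficient gives an odds ratio
odds_ratio <- exp(coef_no)

print(round(c(celebrates = p_celebrates,
              does_not   = p_does_not,
              odds_ratio = odds_ratio), 2))
```

Under those assumptions, the model puts the chance of spending time with friends on Christmas at about 36% for those who celebrate and about 14% for those who don't, an odds ratio of roughly 0.30.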

I have the source code for my analysis on GitHub here. All the responses (after removing timestamp/order info) will be released once I finish my article series.
