Which Subreddits Swear the Most?

By Max Candocia


January 31, 2017


Above, you can see which of the (mostly default) Subreddits swear the most. The Subreddits are ordered from bottom to top by overall swear usage, and the swears are ordered from left to right by overall usage. /r/announcements, a Subreddit for official announcements, appears to contain the most foul language.

Looking at the frequencies of specific swears in Subreddits, we can see which swear words are most popular with different Subreddits:

Additionally, we can look at the ratio of a swear's usage in a Subreddit compared to the average of that word's usage in all Subreddits in order to see which words are most strongly associated with which default Subreddits:
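For reference, the ratio described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the plots: it assumes each swear's usage is stored as the fraction of a Subreddit's comments that contain it, and that the "average" is an unweighted mean across Subreddits. The function name `usage_ratios` and the sample numbers are hypothetical.

```python
def usage_ratios(freq):
    """Divide each word's usage in a Subreddit by its mean usage across all Subreddits.

    freq maps subreddit -> {word: fraction of comments containing that word}.
    Assumption: an unweighted mean over Subreddits; a weighted mean is also plausible.
    """
    words = {word for per_sub in freq.values() for word in per_sub}
    ratios = {sub: {} for sub in freq}
    for word in words:
        # A word missing from a Subreddit counts as zero usage there.
        mean = sum(per_sub.get(word, 0.0) for per_sub in freq.values()) / len(freq)
        for sub, per_sub in freq.items():
            ratios[sub][word] = per_sub.get(word, 0.0) / mean if mean > 0 else float("nan")
    return ratios


# Made-up frequencies, purely for illustration.
sample_freq = {
    "r/example_a": {"crap": 0.02},
    "r/example_b": {"crap": 0.06},
}
sample_ratios = usage_ratios(sample_freq)
```

A ratio above 1 means the word is used more often in that Subreddit than in the average Subreddit, and a ratio below 1 means it is used less often.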

Background and Comparison

The other day a friend sent me a link to this post, which claimed to analyze which Subreddits swear the most. However, there were two major flaws:

  1. The list of words counted as swears included words like "sex" or "poop", which are not generally considered swears or bad words
  2. Swears were not grouped together by their root. For example, "crap" and "crapping" are very close, and should be grouped together

In my analysis, I fixed those two issues. Also, the post above used the raw count of swear words. I decided to use the count of comments with swears in them in order to avoid letting comments with numerous, repeated swears act as outliers.
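The two changes described above (grouping swears by root, and counting comments rather than individual words) can be sketched as follows. This is only an illustration under stated assumptions: the roots in `SWEAR_ROOTS` are placeholders, the prefix-matching regex is one simple way to group variants, and none of the names come from the actual analysis code.

```python
import re
from collections import Counter

# Placeholder roots for illustration; the real word list is not reproduced here.
SWEAR_ROOTS = ["crap", "damn"]

# One pattern per root: the root at a word boundary, followed by any suffix,
# so "crap", "crappy" and "crapping" all map to the root "crap".
ROOT_PATTERNS = {
    root: re.compile(r"\b" + re.escape(root) + r"\w*", re.IGNORECASE)
    for root in SWEAR_ROOTS
}


def roots_in_comment(text):
    """Return the set of roots that appear at least once in a comment."""
    return {root for root, pattern in ROOT_PATTERNS.items() if pattern.search(text)}


def count_comments_by_root(comments):
    """Count how many comments contain each root; repeats within a comment count once."""
    counts = Counter()
    for text in comments:
        counts.update(roots_in_comment(text))
    return counts


example_comments = [
    "Crap crap crap, I was crapping myself",
    "Well, damn.",
    "Nothing rude here.",
]
comment_counts = count_comments_by_root(example_comments)
```

Because each comment contributes at most one count per root, a single comment that repeats a swear many times cannot skew a Subreddit's total.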

Using this list, which is the same one used in the original post, I compared the results from the two word lists in the bar plot below:

Sampling Methodology

I used my scraper, Tree Grab for Reddit, to scrape the top "hot" posts from the default Subreddits, plus /r/The_Donald and /r/gonewild (as the previous analysis did). Instead of scraping the comment feed of a Subreddit, I scraped the comments of various threads, which can go back further in time, but can be a bit slower and less comprehensive over any short period of time. The code for analysis can be found here.

Below you can see the comment counts from each Subreddit after I filtered out two bots, AutoModerator and WritingPromptsRobot.
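The bot filter can be pictured with a short sketch. It assumes each scraped comment is a dictionary with `author` and `subreddit` keys, which is a hypothetical layout and not necessarily the scraper's output format; only the two bot names come from the text above.

```python
from collections import Counter

# The two bot accounts named in the text.
BOT_AUTHORS = {"AutoModerator", "WritingPromptsRobot"}


def comment_counts_by_subreddit(comments, excluded_authors=BOT_AUTHORS):
    """Count comments per Subreddit, skipping comments written by excluded authors."""
    return Counter(
        comment["subreddit"]
        for comment in comments
        if comment["author"] not in excluded_authors
    )


# Made-up records, purely for illustration.
sample_comments = [
    {"subreddit": "WritingPrompts", "author": "WritingPromptsRobot"},
    {"subreddit": "WritingPrompts", "author": "some_user"},
    {"subreddit": "announcements", "author": "AutoModerator"},
]
filtered_counts = comment_counts_by_subreddit(sample_comments)
```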

Extra Visualizations

I also made boxplots of swear frequencies and swear ratios across Subreddits. These allow you to see how usage of swear words is distributed.

