Hong Kong on Reddit

By Max Candocia


September 22, 2019

Last Updated 09/25/2019

In the spring of this year, protests began in Hong Kong over the Fugitive Offenders and Mutual Legal Assistance in Criminal Matters Legislation (Amendment) Bill, which would allow fugitives to be extradited from Hong Kong to China.

One large online community that has gotten involved with the protests is Reddit, whose /r/HongKong community is largely in favor of the protestors' cause. /r/China is another subreddit with some (but significantly less) involvement in Hong Kong's affairs, and it also leans towards the protestors' cause.

On the other side of the issue, /r/Hong_Kong is overwhelmingly against the protestors' cause, as is /r/Sino, the more nationalist counterpart to /r/China.

I collected a sample of 32,491 users who have commented and/or posted in one of those subreddits, and looked at their comment and submission histories since March 31st of this year. While one could simply look at which subreddits appear the most in these users' submission and comment histories, a lot of the big, default subreddits would take up most of the top spots. By scaling these values to the subreddits' respective sizes, we can get a better idea of what makes these communities unique.
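One way to picture the scaling, as a minimal sketch: divide the number of sampled users seen in each subreddit by that subreddit's overall size, so that large default subreddits no longer dominate the ranking. The subreddit names, counts, and the `scale_by_size` and `rank` helpers below are illustrative assumptions, not the figures or code used for this analysis.

```python
def scale_by_size(sample_counts, subreddit_sizes):
    """Return sampled-user activity per subscriber for each subreddit.

    sample_counts:   subreddit -> number of sampled users active there
    subreddit_sizes: subreddit -> total subscriber count (assumed known)
    """
    scaled = {}
    for name, count in sample_counts.items():
        size = subreddit_sizes.get(name)
        if size:  # skip subreddits with unknown or zero size
            scaled[name] = count / size
    return scaled


def rank(values):
    """Sort subreddit names from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical numbers chosen only to show the effect of scaling.
sample_counts = {"default_a": 9000, "default_b": 7000, "regional_c": 600}
subreddit_sizes = {"default_a": 24_000_000, "default_b": 21_000_000, "regional_c": 40_000}

unscaled_order = rank(sample_counts)
scaled_order = rank(scale_by_size(sample_counts, subreddit_sizes))
print(unscaled_order)
print(scaled_order)
```

In this toy example the small regional subreddit moves from last place to first once its size is taken into account, which is the effect the scaled rankings are meant to capture.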

Words and Phrases (what makes these distinct from each other?)

I processed different words and phrases (including up to 3-character Chinese character sequences) in each of the subreddits and looked at which ones were more frequent (in terms of proportion) in one subreddit vs. any of the others. "laowai" is Chinese for "foreigner", which makes sense for /r/China, considering that other common words appear to be related to teaching English there.
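The proportion comparison described above can be sketched roughly as follows. This is an assumed reconstruction, not the code from the analysis: the `distinctive_terms` helper, the `floor` smoothing value, and the toy token lists are all hypothetical, and tokenization of words and Chinese character sequences is left out.

```python
from collections import Counter


def term_proportions(tokens):
    """Map each term to its share of all tokens in one subreddit."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def distinctive_terms(tokens_by_subreddit, target, top_n=3, floor=1e-6):
    """Rank terms by their proportion in `target` relative to the highest
    proportion in any other subreddit. `floor` avoids division by zero for
    terms that never appear elsewhere (an arbitrary smoothing choice)."""
    props = {name: term_proportions(toks) for name, toks in tokens_by_subreddit.items()}
    ratios = {}
    for term, p in props[target].items():
        max_other = max(
            (props[other].get(term, 0.0) for other in props if other != target),
            default=0.0,
        )
        ratios[term] = p / max(max_other, floor)
    return sorted(ratios, key=ratios.get, reverse=True)[:top_n]


# Toy token lists, not real comment data.
tokens_by_subreddit = {
    "sub_a": ["teacher", "visa", "teacher", "the", "the"],
    "sub_b": ["protest", "police", "protest", "the", "the"],
}
top_a = distinctive_terms(tokens_by_subreddit, "sub_a", top_n=2)
print(top_a)
```

Common words such as "the" get a ratio near 1 and drop out of the ranking, while terms that are rare elsewhere rise to the top.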

"lennon" for /r/HongKong refers to the "Lennon Wall", which was a mosaic made of encouraging post-it notes during the 2014 Umbrella Protests in Hong Kong. "tvb" refers to a television broadcasting company in Hong Kong. Most of the words and phrases representing /r/HongKong here are related to the recent protests.

Other than the reference to the 1989 Tiananmen Square Massacre, I am not sure why the below phrases are appearing for /r/Hong_Kong.

For /r/Sino, there appears to be a heavy focus on race specifically.

plot of chunk unnamed-chunk-1

Risk Score of Being Banned

I am currently testing out a "Creddit" risk score, which can also be accessed through my /u/CredditReportingBot. The score scales from 0 (highest risk) to 1,000 (lowest risk), with the relative risk doubling for every 100-point drop in score. Below is a set of box plots comparing the risk scores of the four subreddits' userbases:

Note: I consider scores of under 200 to be "high risk", under 300 "moderately high risk", and under 400 "slightly elevated risk".
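To make the scale concrete, here is a small sketch of how two scores compare under the doubling rule, along with the thresholds from the note above. The function names are hypothetical, the "not flagged" label for scores of 400 and up is an assumption (no band is named for that range), and the scores passed in are arbitrary examples rather than values from the data.

```python
def relative_risk(score, reference_score):
    """Risk of `score` relative to `reference_score`, assuming risk doubles
    for every 100 points the score drops."""
    return 2 ** ((reference_score - score) / 100)


def risk_band(score):
    """Label a score using the thresholds in the note above."""
    if score < 200:
        return "high risk"
    if score < 300:
        return "moderately high risk"
    if score < 400:
        return "slightly elevated risk"
    return "not flagged"  # assumed label; no band is named for 400+


print(relative_risk(500, 600))  # 2.0: 100 points lower means twice the risk
print(relative_risk(700, 500))  # 0.25: 200 points higher means a quarter of the risk
print(risk_band(250))
```

Read this way, a group with half the risk of another sits roughly 100 points higher on the scale, and a group with a quarter of the risk sits roughly 200 points higher.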

plot of chunk creddit_risk

/r/China and /r/Hong_Kong have fairly similar risk profiles. /r/HongKong's risk is noticeably lower: about half that of the other subreddits among commenters, and about one quarter among those who post. /r/Sino members are at the highest risk of being banned.


Unscaled (most popular subreddits)

plot of chunk coms1

Scaled to Overall Populations (what sets this apart from the rest of Reddit)

plot of chunk comments2

Scaled to Sample Populations (what sets this apart from the other 3 subreddits)

plot of chunk comments3

Looking at the above, there are a few things to note:

  1. It appears that /r/Hong_Kong has a lot of similarities with /r/HongKong. This suggests that a non-trivial portion of the comments from that subreddit came from "brigaders" from /r/HongKong, who do not normally participate in /r/Hong_Kong. This is especially noticeable because the most common subreddits are associated with both of them, but /r/HongKong vastly outnumbers /r/Hong_Kong in subscriber count.
  2. /r/Sino is distinctively left-leaning under any of the 3 ways of scaling the population.
  3. /r/Hong_Kong is only particularly notable in its participation in /r/Sino, /r/HongKong, and /r/China compared to the rest of Reddit, while those other 3 subreddits have more distinctive features. This is not too surprising given its relatively smaller size and recent events.

Mutual Users Tables

To illustrate the overlap between the subreddits, here is a table showing the shared commenters between the subreddits since March 31st, 2019:
            China  HongKong  Hong_Kong  Sino
China        6386      2539        111   760
HongKong     2539     25175        431  1164
Hong_Kong     111       431        539   132
Sino          760      1164        132  2917

431 out of 539 /r/Hong_Kong commenters (about 80%) also commented in /r/HongKong. Although this table alone doesn't tell us if it was /r/Hong_Kong members that were going to /r/HongKong or the other way around, the fact that the commenting profile of /r/Hong_Kong is so similar to /r/HongKong implies that the larger subreddit has a significant presence in the smaller one. This does not exclude the possibility of /r/Hong_Kong members going over to /r/HongKong, but their impact would be significantly less.
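A table like this can be built from one set of commenter usernames per subreddit. The sketch below is an illustration under that assumption; `overlap_table`, the subreddit keys, and the usernames are hypothetical and do not come from the collected data.

```python
def overlap_table(commenters):
    """Return {(a, b): number of users who commented in both a and b}.
    The diagonal (a == b) is each subreddit's total commenter count."""
    names = list(commenters)
    return {(a, b): len(commenters[a] & commenters[b]) for a in names for b in names}


# Hypothetical usernames, for illustration only.
commenters = {
    "sub_a": {"u1", "u2", "u3", "u4"},
    "sub_b": {"u3", "u4", "u5"},
}
table = overlap_table(commenters)

# Share of sub_b's commenters who also commented in sub_a.
share = table[("sub_b", "sub_a")] / table[("sub_b", "sub_b")]
print(table[("sub_a", "sub_b")], round(share, 2))
```

Dividing an off-diagonal cell by the diagonal cell of the smaller subreddit gives the kind of share quoted above (431 / 539).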


Looking at submission-based statistics, the results are similar to the comments-based ones, but because moderators scrutinize submissions more closely than comments, these are often more representative of the general view of the community, even in spite of brigaders. The relationship of /r/Hong_Kong to /r/Sino becomes a bit clearer, and the submissions of /r/China submitters seem to be slanted more towards regional subreddits (e.g., Africa, Guangzhou) instead of only China-related ones.

Unscaled (most popular subreddits)

plot of chunk threads1

Scaled to Overall Populations (what sets this apart from the rest of Reddit)

plot of chunk threads2

Scaled to Sample Populations (what sets this apart from the other 3 subreddits)

plot of chunk threads3

Moderator Network

One last type of graphic I like to use to visualize subreddits is a network graph of the moderators, the individual accounts that create and enforce the rules of their respective subreddits. Below is a graph that demonstrates how /r/Sino and /r/Hong_Kong are part of a tightly knit group that is run mostly by the same moderators. Many of the subreddits besides /r/Sino and /r/Hong_Kong in that clique are not very active, as indicated by the node size, which is proportional to the number of subscribers.

/r/China and /r/HongKong are completely separate as far as these subreddits go, but the network only looks at the subreddits moderated by the individuals of the four subreddits analyzed above, so there is likely an actual connection further down the line.
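The idea of subreddits being tied together through shared moderators can be sketched as a search for connected groups. This is only an illustration: `subreddit_clusters` and the moderator lists below are hypothetical, and the graph in this article was not necessarily produced this way.

```python
from collections import defaultdict, deque


def subreddit_clusters(moderators):
    """Group subreddits that are linked through shared moderators.

    moderators: subreddit -> set of moderator usernames
    Returns a list of sets, one per connected group of subreddits.
    """
    # Invert the mapping so each moderator points to their subreddits.
    by_mod = defaultdict(set)
    for sub, mods in moderators.items():
        for mod in mods:
            by_mod[mod].add(sub)

    seen, clusters = set(), []
    for start in moderators:
        if start in seen:
            continue
        seen.add(start)
        cluster, queue = set(), deque([start])
        while queue:  # breadth-first search over shared moderators
            sub = queue.popleft()
            cluster.add(sub)
            for mod in moderators[sub]:
                for neighbor in by_mod[mod] - seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(cluster)
    return clusters


# Hypothetical moderator lists, for illustration only.
moderators = {
    "sub_a": {"mod1", "mod2"},
    "sub_b": {"mod2"},
    "sub_c": {"mod3"},
}
clusters = subreddit_clusters(moderators)
print(clusters)
```

Two subreddits end up in the same group if a chain of shared moderators connects them, which is why extending the network beyond the four starting subreddits could reveal links that are not visible here.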


The code used for the main analysis (but not data collection) is available on GitHub: https://github.com/mcandocia/hk_reddit

