What Time Should You Post to Reddit?

By Max Candocia | July 29, 2017


Last Updated 10/24/2017

UPDATE: A more recent and thorough analysis can be found here.

When you post anything on social media, whether a news article, a picture of yourself, a funny image, or some combination of these, you usually want to reach the largest possible audience. On Reddit, I have noticed that the success of a post depends largely on the time of day and day of week it is submitted. A few other factors matter as well, such as whether the post is an image, an article, or a text-only submission.

I used a Python scraper I built to collect data on submissions from the particular subreddits I wanted to analyze. Among the data collected, I have...

Using this information, I can fit a model that describes which attributes affect the score. Specifically, I am looking for the percent change in score associated with values such as time of day, day of week, whether a post is an image post, and so on. In my case, this can be approximated with this formula:

sign(score) * log(abs(score) + 1) = time_of_day_and_day_of_week + is_image_post + is_text_post + length_of_submission_title

I log-transform the score on the left side so that the terms on the right side have a multiplicative effect on the score rather than an additive one. The right side treats the combined time of day and day of week, whether the post is an image post, and the other attributes as independent factors that each scale the score by some value; that is, the estimated effect of each factor controls for the others.
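To make the multiplicative interpretation concrete, here is a minimal sketch in R. The coefficient values and variable names are made up for illustration and are not estimates from the model; only the conversion exp(coefficient) - 1 mirrors how the code at the end of this post turns coefficients into percent changes.

```r
# Hypothetical coefficients on the log-score scale (illustrative values only)
coef_time_slot  <- log(1.74)  # a time slot that scales the score by 1.74
coef_image_post <- log(0.90)  # an attribute that scales the score by 0.90

# Convert a coefficient to a percent change relative to the reference level
pct_change <- function(beta) exp(beta) - 1

pct_change(coef_time_slot)   # about 0.74, i.e. 74% higher than the reference
pct_change(coef_image_post)  # about -0.10, i.e. 10% lower

# Effects add on the log scale, so they multiply on the score scale
combined <- exp(coef_time_slot + coef_image_post)
all.equal(combined, 1.74 * 0.90)
```

Strictly speaking, the scaling applies to the score plus one, because of the transformation on the left side of the formula.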

Below is a graph that estimates the effect of time of day and day of week on score, using six different subreddits that I sampled and modeled together. I use Monday from 8 to 10 am as the reference, so each percentage is the percent increase in score you can expect if you post at the given time rather than on Monday from 8 to 10 am US Central Time.

Monday morning is a relatively good time to post in these subreddits, especially from 6 to 8 am. Sunday is even better during that time frame, with an expected score that is 74% higher than our reference, Monday from 8 to 10 am. Saturday, by contrast, is fairly strong for most of the day.

Because the above image is based on a relatively small amount of data, it helps to compare it with a different data set. Below, I sampled default subreddits as well as thread commenters' comment histories, so this model generalizes better to Reddit as a whole.

This tells a similar story, except that the tiles change much more smoothly. You could repeat the process with other samples, but the general takeaway is that the best time to post on Reddit is on Sunday, Monday, or Saturday from 6 to 8 am US Central Time. The next best times are within 2 hours of that time range on those same days, or during that same time range on other days.
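If you are not in US Central Time, the recommended window needs to be converted to your own clock. The following is a small sketch using base R date-time handling; the date is an arbitrary Sunday chosen for illustration, and the target time zones are only examples.

```r
# 6 am on a Sunday in US Central Time (the date is an arbitrary example)
post_time <- as.POSIXct('2017-07-30 06:00:00', tz = 'America/Chicago')

# Show the same instant in other time zones
format(post_time, tz = 'UTC', usetz = TRUE)            # "2017-07-30 11:00:00 UTC"
format(post_time, tz = 'Europe/London', usetz = TRUE)
```

Because of daylight saving time, the offset from UTC differs between summer and winter, so it is safer to convert a specific date than to apply a fixed offset.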

Additional Notes

Technically, the transformation adds 1 to the score before the percent change is calculated. Negative scores are treated as having points equal to 1/(1+abs(score)), a fractional value that keeps decreasing as the score becomes more negative.
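As a quick check of that description, the sketch below applies the same signed log transform used in the code further down and then exponentiates it to recover the "points" scale. The function names signed_log and points_scale are hypothetical helpers introduced here for illustration.

```r
# Signed log transform used for the response variable
signed_log <- function(score) sign(score) * log(1 + abs(score))

# Exponentiating the transformed value gives score + 1 for non-negative
# scores and 1 / (1 + abs(score)) for negative scores
points_scale <- function(score) exp(signed_log(score))

scores <- c(-9, -1, 0, 1, 9)
signed_log(scores)
points_scale(scores)  # approximately 0.1, 0.5, 1, 2, 10
```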

Code and Data

Below is the R code I used to generate the images. You can download the data file here: constrasts_threadmode.csv.

library(plyr)
library(dplyr)
library(htmlTable)
library(ggplot2)
library(scales)

setwd('/mydirectory/reddit_posting')

#replace spaces and slashes with hyphens so the string is a valid filename
subslash <- function(x){
  x = gsub(' ','-',x)
  return(gsub('/','-',x))
}

create_threads_plot <- function(threads, tname='none', subtitle_size=18){
  #group times to increase significance of data
  threads$hour_ = cut(threads$hour, seq(0,24,2), include.lowest=TRUE, right=FALSE)
  source_hour_ = levels(threads$hour_)
  target_hour_ = c('12-2 am','2-4 am', '4-6 am', '6-8 am','8-10 am', '10 am - 12 pm','12-2 pm', '2-4 pm', '4-6 pm','6-8 pm',
                   '8-10 pm','10 pm - 12 am')
  threads$hour_ = mapvalues(threads$hour_, from = source_hour_, to=target_hour_)
  threads$titlelen = nchar(as.character(threads$title))/100
  threads$logscore = sign(threads$score) * log(1 + abs(threads$score))
  threads$is_self = with(threads, ifelse(is_self=='t','Self Post','Link Post'))
  
  daysofweek = c('Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday')
  threads$dow = factor(daysofweek[threads$dow+1], levels=daysofweek)
  
  weekday_hour_grid = expand.grid(target_hour_, daysofweek)
  #make sure order is right
  weekday_hour_levels = paste(weekday_hour_grid[,2], weekday_hour_grid[,1])
  #for a better reference, ref=Monday 8-10 am
  weekday_hour_levels_ = c(weekday_hour_levels[17], weekday_hour_levels[-17])
  threads$weekday_hour = factor(paste(threads$dow,threads$hour_), levels =weekday_hour_levels_ )
  
  #domain vars
  threads$image_submission = factor(ifelse(threads$domain %in% c('imgur.com','i.imgur.com','i.reddit.com'),
                                           'Image Submission', 'Non-Image Submission'))
  threads$image_submission = relevel(threads$image_submission,ref='Non-Image Submission')
  
  #remove moderator posts, which will most likely be very high
  threads = threads %>% filter(is_distinguished=='f', is_stickied=='f')
  n_data_points = nrow(threads)
  #run linear model and extract coefficients
  model = lm(logscore ~ weekday_hour + titlelen + is_self + image_submission + subreddit, data=threads)
  model_summary = summary(model)
  coefs = model_summary$coefficients
  
  #round sig figs
  for (i in 1:4)
    coefs[,i] = signif(coefs[,i], 4)
  
  #used to produce HTML output of the model summary for display on web
  sink(subslash(paste0('reddit_thread_summary_table_', tname , '.html')))
  print(htmlTable(coefs))
  sink()
  
  #now format matrix to show results
  #build the data frame directly so that Estimate stays numeric
  coefmat = data.frame(varname = rownames(coefs), Estimate = coefs[,'Estimate'], stringsAsFactors=FALSE)
  coefmat = coefmat %>% filter(grepl('^weekday_hour',varname))
  #add the reference level, whose coefficient is 0 by definition
  coefmat = rbind(data.frame(varname='weekday_hourMonday 8-10 am',Estimate=0, stringsAsFactors=FALSE), coefmat)
  coefmat$dow = factor(gsub( '.*hour','', gsub(' .*','',coefmat$varname) ), levels=daysofweek)
  coefmat$hour = factor(gsub('^[^0-9-]*? ','', coefmat$varname), levels=rev(target_hour_) )
  #convert log-scale coefficients to a percent change relative to the reference
  coefmat$`Percent Change` = exp(coefmat$Estimate) - 1
  
  #save plot to png
  png(subslash(paste0('expected_reddit_score_',tname, '.png')), height=720, width=920)
  print(
  ggplot(coefmat, aes(x=hour, y=dow, fill=`Percent Change`)) + 
    geom_tile() + xlab('') + ylab('') + #axes are self-explanatory with title
    ggtitle('Percent Change in Expected Reddit Submission Score Based on Time Posted',
            subtitle=paste('compared to Monday from 8 - 10 am & using',comma(n_data_points), tname,'submissions')) +
    theme_bw() + theme(plot.title = element_text(hjust=0.5, size=24), plot.subtitle=element_text(hjust=0.5, size=subtitle_size),
                       axis.text.x=element_text(size=18,angle=0, vjust=0.8), axis.text.y = element_text(size=18)) +
    scale_fill_gradient2(labels=scales::percent) + geom_text(aes(label=scales::percent(`Percent Change`)),size=6) +
    coord_flip()
  )
  dev.off()
}

#load file and create a plot + html table for each
threads = read.csv('/mydirectory/contrasts_threadmode.csv')
create_threads_plot(threads, 'nintendo/boardgames/rap/classicalmusic/democrats/conservative', subtitle_size=12)

