Pacing at the Boston Marathon

By Max Candocia


May 04, 2019

Last Updated 05/05/2019


In a marathon (26.2 miles/42.2 km), pacing is very important. If you start off too fast, you will suffer later in the race. If you start off too slowly, you might not be able to reach your target time. Ideally, you want to keep your pace as even as possible. Of course, elevation changes during the race will affect your ideal pace, but the even-pace rule otherwise holds true for the most part.

The Boston Marathon is considered one of the most prestigious marathons to run, as it has a relatively competitive qualifying time based on sex and age group, as well as a very expensive charity requirement of $5,000 for those who do not qualify.

Split data from the Boston Marathon from 2015 through 2017 is available on Kaggle. By analyzing the data, one can see how pacing plays a key role in race strategy. The results are not directly applicable to other marathons, as this course starts on a relatively steep downhill, dropping 120 feet in the first half mile, and continues downhill until mile 4.

Pace Distribution

The first thing we should look at is how the runners' paces are distributed over the course of the race. Pace is measured in minutes per mile. To calculate miles per hour, divide 60 by the number of minutes per mile (e.g., 10 minutes/mile = 6 mph).
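The conversion above is simple enough to sketch in a few lines of Python. The function names here are my own for illustration and are not taken from the code behind this article.

```python
def parse_pace(text: str) -> float:
    """Turn an 'M:SS' pace string such as '7:30' into decimal minutes per mile."""
    minutes, seconds = text.split(":")
    return int(minutes) + int(seconds) / 60.0


def pace_to_mph(minutes_per_mile: float) -> float:
    """Convert a pace in minutes per mile to a speed in miles per hour."""
    if minutes_per_mile <= 0:
        raise ValueError("pace must be positive")
    return 60.0 / minutes_per_mile


print(pace_to_mph(10.0))                # 6.0
print(pace_to_mph(parse_pace("7:30")))  # 8.0
```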

Notice how the distributions start off more peaked and become flatter throughout the race, mainly from a slow drift to the right, as runners slow down. This is especially visible in the men, who center around a 3-hour marathon target (about 6:52/mile), and then creep past that threshold.
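As a quick check on the 3-hour figure, the even pace needed for a goal time can be computed as below. This is a minimal sketch: it assumes the 26.2-mile distance used in this article and rounds to the nearest second, and `target_pace` is a hypothetical helper name.

```python
MARATHON_MILES = 26.2  # rounded marathon distance used in this article


def target_pace(goal_hours: float, miles: float = MARATHON_MILES) -> str:
    """Return the even pace (M:SS per mile) for a goal time, to the nearest second."""
    seconds_per_mile = round(goal_hours * 3600 / miles)
    minutes, seconds = divmod(seconds_per_mile, 60)
    return f"{minutes}:{seconds:02d}"


print(target_pace(3.0))  # 6:52
```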

Plotting the pace at the beginning of the race against the pace at the end gives a heatmap like this. The yellow tiles have about 10 times as many people as orange, which have about 10 times as many people as magenta, which have about 10 times as many people as dark purple.

It is obvious from the above that the paces for faster runners who are running under 5:30 a mile(!) are more stable over time, as they tend to know how fast they should be going.

Demographic Breakdown

I made a little widget so you can look at how pace varies over different demographics and years. In general, women and older individuals are better at keeping a consistent pace than men and younger individuals.


How do elite runners pace?

Out of curiosity, I took the top 6 runners for each sex for each year, and then plotted their paces against each other. The trend is a bit more interesting, as they all stick tightly together for most of the race, and then diverge from each other. 2016 was a bit of an anomaly, though, with the women's winner making a comeback around mile 20.

Despite the seemingly crazy shape, these bands are mostly 10-15 seconds per mile, which, while quite significant at the speeds the racers are running, is much less than the variation in pace among other runners. Also, unlike the majority of the racers, the goal of these runners isn't necessarily the best time, but to run faster than the other runners.

Predicting Performance of Runners by Splits

One last topic I will discuss (and visit in the near future) is predicting the final time/pace of a runner based on their previous splits.

A simple estimate would be to take the pace at a split and use it directly as the predicted final pace. Better still, as a simple linear regression model does, multiply that pace by a coefficient and possibly add a small correction constant. For example, the predicted pace at the end of the race is about 10% slower than the pace for the first 5K.
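A minimal sketch of this single-split estimate follows. The 1.10 multiplier is the rough "10% slower" figure quoted above, not a fitted coefficient, and the zero intercept is my assumption, since the article does not give the correction constant.

```python
def predict_final_pace(split_pace: float,
                       slope: float = 1.10,
                       intercept: float = 0.0) -> float:
    """Linear estimate of the final pace (min/mile) from the pace at one split.

    slope=1.10 reflects the rough 10% slowdown relative to the first 5K;
    intercept=0.0 is an assumed placeholder for the small correction constant.
    """
    return slope * split_pace + intercept


# A runner averaging 8:00/mile through the first 5K
print(round(predict_final_pace(8.0), 2))  # 8.8
```

With these assumed values, an 8:00/mile opening 5K maps to a predicted final pace of about 8:48/mile.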

Going further, one could look at how runners slow down from split to split, and use that information in a general linear model to predict the future speed. For example, at the 10K split, one can estimate the runner's final pace by multiplying the 10K pace by 1.126 (the single-split approach above), or one can use both splits with the formula $$pace_{final} = 2.226 \cdot pace_{10K} - 1.118 \cdot pace_{5K}$$

One can rewrite this approximately as $$pace_{final} \approx 1.11 \cdot pace_{10K} + 1.11 \cdot (pace_{10K} - pace_{5K})$$

This means that in addition to multiplying the 10K pace by 1.11, you also add 1.11 times the difference in average pace from 5K to 10K. If you slowed down from 5K to 10K, you are more likely to continue slowing down. Likewise, if you sped up, you are likely to continue speeding up.
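To make the comparison concrete, here is a small sketch of the three estimates using the coefficients quoted above. The function names are hypothetical, and the inputs are assumed to be the 5K and 10K paces in minutes per mile.

```python
def predict_from_10k_only(pace_10k: float) -> float:
    """Single-split estimate: scale the 10K pace by 1.126."""
    return 1.126 * pace_10k


def predict_from_two_splits(pace_5k: float, pace_10k: float) -> float:
    """Two-split estimate with the coefficients quoted in the text."""
    return 2.226 * pace_10k - 1.118 * pace_5k


def predict_rewritten(pace_5k: float, pace_10k: float) -> float:
    """Approximate rewrite: scaled 10K pace plus scaled 5K-to-10K change."""
    return 1.11 * pace_10k + 1.11 * (pace_10k - pace_5k)


# One runner holds 8.0 min/mile; another drifts from 8.0 to 8.2 by 10K.
for pace_5k, pace_10k in [(8.0, 8.0), (8.0, 8.2)]:
    print(pace_5k, pace_10k,
          round(predict_from_10k_only(pace_10k), 3),
          round(predict_from_two_splits(pace_5k, pace_10k), 3),
          round(predict_rewritten(pace_5k, pace_10k), 3))
```

The two-split forms agree to within a couple of hundredths of a minute per mile on these inputs, and both penalize the runner who is already slowing down more heavily than the single-split estimate does.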

Of course, the obvious explanation for the above is that you are just assuming that a runner will keep a steady pace for the remainder of the race. With more splits, though, more effects can be seen in how splits earlier in the race can be used to show a slowdown.

The last model that can be used also takes demographics and the race year into account for all of these. As you may have seen with the widget and animated graphs above, men and women of different ages tend to pace differently, which can affect a predictive model. I ran a few models at various splits, and these are the performances I came up with:

Essentially, taking past split history into account is a good way to predict the results of a race, even as the race comes to a finish. Using demographics is even better, but the improvement isn't much after halfway through the race.

Pace Ratio Table Appendix

Below are a couple of tables with the ratios of the average pace at the finish of the race to the average pace at the 10K, 20K, and 30K markers for various years and demographics. Women are slightly more consistent than men, with lower average ratios that correspond to roughly 8-15 seconds per mile relative to their split paces. This effect was significantly less pronounced in 2015.

Gender Year final:10K final:20K final:30K
F 2015 1.055 1.045 1.024
F 2016 1.086 1.062 1.031
F 2017 1.091 1.067 1.033
M 2015 1.058 1.051 1.032
M 2016 1.106 1.086 1.050
M 2017 1.110 1.089 1.051
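As a rough illustration of how the table above could be used, the sketch below multiplies a runner's split pace by the matching ratio. Note the assumption: these are ratios of group averages, so applying them to an individual runner gives only a ballpark projection. The dictionary layout and function name are my own.

```python
# Ratios of average finish pace to average split pace, copied from the table above.
FINISH_RATIOS = {
    ("F", 2015): {"10K": 1.055, "20K": 1.045, "30K": 1.024},
    ("F", 2016): {"10K": 1.086, "20K": 1.062, "30K": 1.031},
    ("F", 2017): {"10K": 1.091, "20K": 1.067, "30K": 1.033},
    ("M", 2015): {"10K": 1.058, "20K": 1.051, "30K": 1.032},
    ("M", 2016): {"10K": 1.106, "20K": 1.086, "30K": 1.050},
    ("M", 2017): {"10K": 1.110, "20K": 1.089, "30K": 1.051},
}


def project_final_pace(gender: str, year: int, split: str, split_pace: float) -> float:
    """Scale a split pace (min/mile) by the group-average finish ratio."""
    return FINISH_RATIOS[(gender, year)][split] * split_pace


# A woman in 2017 averaging 8.0 min/mile at the 20K marker
print(round(project_final_pace("F", 2017, "20K", 8.0), 3))  # 8.536
```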

Gender Year Age final:10K final:20K final:30K
F 2015 18-24 1.058 1.049 1.026
F 2015 25-29 1.050 1.043 1.024
F 2015 30-34 1.053 1.044 1.023
F 2015 35-44 1.050 1.042 1.023
F 2015 45-54 1.058 1.046 1.025
F 2015 55+ 1.078 1.057 1.029
F 2016 18-24 1.093 1.066 1.033
F 2016 25-29 1.085 1.062 1.031
F 2016 30-34 1.084 1.061 1.031
F 2016 35-44 1.080 1.058 1.030
F 2016 45-54 1.087 1.062 1.032
F 2016 55+ 1.101 1.069 1.033
F 2017 18-24 1.098 1.074 1.037
F 2017 25-29 1.087 1.065 1.033
F 2017 30-34 1.088 1.065 1.032
F 2017 35-44 1.087 1.064 1.032
F 2017 45-54 1.092 1.067 1.032
F 2017 55+ 1.110 1.074 1.033
M 2015 18-24 1.055 1.054 1.036
M 2015 25-29 1.052 1.049 1.032
M 2015 30-34 1.048 1.044 1.029
M 2015 35-44 1.051 1.046 1.029
M 2015 45-54 1.057 1.051 1.032
M 2015 55+ 1.075 1.062 1.035
M 2016 18-24 1.126 1.105 1.061
M 2016 25-29 1.106 1.091 1.053
M 2016 30-34 1.102 1.086 1.050
M 2016 35-44 1.103 1.084 1.049
M 2016 45-54 1.107 1.085 1.049
M 2016 55+ 1.109 1.084 1.048
M 2017 18-24 1.124 1.100 1.059
M 2017 25-29 1.111 1.092 1.055
M 2017 30-34 1.111 1.090 1.053
M 2017 35-44 1.104 1.083 1.049
M 2017 45-54 1.109 1.088 1.052
M 2017 55+ 1.118 1.093 1.050

GitHub Code

The code I used to generate this graph can be found here:

