A Look at Whitewater Accidents

By Max Candocia


August 12, 2017

The other day, a friend of mine suggested looking at whitewater rafting/kayaking/canoeing accident data, prepared and hosted by American Whitewater. The data is fairly well-structured but has some missing values, so I will focus on only three things:

  1. Experience of Victim vs. Difficulty of River
  2. Number of Accidents per Year
  3. Cause of Injury/Death vs. Contributing Factors

An additional note: I do not separate injuries from deaths in these figures, as they are not well-differentiated in the stories provided on the website. For example, someone who is injured on the river and dies in the hospital is considered an injury in the database.

Experience of Victim vs. Difficulty of River

For the most part, class IV (advanced) rapids are the most common type among victims in whitewater accidents. The exception is the most experienced kayakers/rafters/canoeists, who mostly fall victim in class V (expert) rivers. The presence of a large number of inexperienced victims in class IV rivers suggests that there is a sizable population that overestimates its own skill and/or underestimates river difficulty.

The mere presence of class VI (very dangerous) river deaths in the inexperienced/some experience categories is worrisome. Among experts, the low number of injuries/deaths in class VI rapids, compared with the greater number of accidents in class V rivers, suggests that class VI rapids are mostly avoided. See here for an example of a class VI rapid.

It also seems that the N/A experience category is similar to the inexperienced one in terms of the distribution of victims over different river difficulties. This strongly suggests that most of the N/A values are inexperienced.
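As a rough sketch of how the similarity between the N/A and inexperienced groups could be quantified, the snippet below normalizes the river-class counts within each experience group and compares groups using total variation distance (0 means identical distributions, 1 means no overlap). The records are toy values made up for illustration, not the American Whitewater data, and the field names and the choice of distance measure are my own assumptions rather than anything taken from the original analysis.

```python
from collections import Counter, defaultdict

# Toy records, made up for illustration (NOT the American Whitewater data).
# Each tuple is (experience, river_class); both field names are hypothetical.
records = [
    ("Inexperienced", "IV"), ("Inexperienced", "IV"),
    ("Inexperienced", "III"), ("Inexperienced", "V"),
    ("Expert", "V"), ("Expert", "V"), ("Expert", "IV"), ("Expert", "VI"),
    ("N/A", "IV"), ("N/A", "IV"), ("N/A", "III"), ("N/A", "V"),
]


def class_distribution(rows):
    """Return {experience: {river_class: share}}, with shares summing to 1 per group."""
    counts = defaultdict(Counter)
    for experience, river_class in rows:
        counts[experience][river_class] += 1
    return {
        exp: {cls: n / sum(c.values()) for cls, n in c.items()}
        for exp, c in counts.items()
    }


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


dist = class_distribution(records)
tv_na_vs_inexperienced = total_variation(dist["N/A"], dist["Inexperienced"])
tv_na_vs_expert = total_variation(dist["N/A"], dist["Expert"])
print("N/A vs inexperienced:", tv_na_vs_inexperienced)
print("N/A vs expert:", tv_na_vs_expert)
```

A small distance between N/A and inexperienced, together with a larger distance between N/A and the other groups, would be consistent with the interpretation above, although it cannot confirm who the N/A victims actually were.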

Accidents Recorded Per Year

The above graph shows how many incidents have been recorded in this data set over the years. It appears as if kayaking/rafting/canoeing is becoming more popular, but I have some concerns about the data quality, especially given the number of N/A (missing) values for the experience of victims in recent years. This may be due to the method of data collection, which likely relied more on the Internet starting around the year 2000. Based on previous years, though, I would guess that most of the N/A values are inexperienced, which is also corroborated by the first graph. Below is a graph of the same data, except looking at the proportion by experience per year, as opposed to the raw count.

Note that some of the earlier years are a bit misleading due to the low count of recorded incidents.
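To illustrate the difference between raw counts and per-year proportions, here is a minimal sketch that converts counts into shares by experience level and flags years with too few incidents to read much into. The records are toy values made up for illustration, and the `MIN_COUNT` threshold is a hypothetical choice of mine, not something taken from the original analysis.

```python
from collections import Counter, defaultdict

# Toy records, made up for illustration: (year, experience).
records = [
    (1985, "Expert"),
    (2005, "Inexperienced"), (2005, "Inexperienced"),
    (2005, "N/A"), (2005, "Expert"),
    (2015, "N/A"), (2015, "N/A"), (2015, "N/A"), (2015, "Inexperienced"),
]

MIN_COUNT = 3  # hypothetical threshold; years below it are flagged as low count


def yearly_proportions(rows, min_count=MIN_COUNT):
    """Return {year: (total, {experience: share}, is_low_count)}."""
    by_year = defaultdict(Counter)
    for year, experience in rows:
        by_year[year][experience] += 1
    result = {}
    for year in sorted(by_year):
        counts = by_year[year]
        total = sum(counts.values())
        shares = {exp: n / total for exp, n in counts.items()}
        result[year] = (total, shares, total < min_count)
    return result


summary = yearly_proportions(records)
for year, (total, shares, low) in summary.items():
    flag = " (low count)" if low else ""
    print(year, total, shares, flag)
```

In this toy example, the single 1985 incident produces a 100% share for one experience level, which is the kind of misleading proportion that low-count years can generate.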

Causes of Accident and Contributing Factors

Note: PFD = personal flotation device; examples of other terms provided in links below

Many of the recorded accidents do not have a special contributing factor, although high water and cold water are the two most common factors specifically listed.

Being pinned against a strainer, an object that has water rushing around/below it, is the most common cause of injury/death, followed by flush drowning, where the whitewater overwhelms the victim, who is unable to swim and keep water out of their face. Flush drowning appears to happen much more frequently under conditions of high and cold water.
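One way to check a claim like "flush drowning happens more often in high water" is to compare the share of accidents with that cause among records that list the factor against the share among records that do not. The sketch below does this on toy records made up for illustration. The record layout (a cause plus a set of contributing factors) is an assumption about how the data could be arranged, not a description of the actual database.

```python
# Toy records, made up for illustration: (cause, set of contributing factors).
# Allowing several factors per accident is an assumption about the data layout.
records = [
    ("Flush Drowning", {"High Water", "Cold Water"}),
    ("Flush Drowning", {"High Water"}),
    ("Strainer", {"High Water"}),
    ("Strainer", set()),
    ("Strainer", set()),
    ("Flush Drowning", set()),
    ("Low Head Dam", set()),
]


def cause_share_by_factor(rows, cause, factor):
    """Return (share of `cause` when `factor` is listed, share when it is not).

    A share is None when the corresponding group has no records.
    """
    with_factor = [c for c, factors in rows if factor in factors]
    without_factor = [c for c, factors in rows if factor not in factors]

    def share(causes):
        return causes.count(cause) / len(causes) if causes else None

    return share(with_factor), share(without_factor)


share_high, share_other = cause_share_by_factor(
    records, "Flush Drowning", "High Water"
)
print("Flush drowning share with high water:", share_high)
print("Flush drowning share without high water:", share_other)
```

Comparing shares rather than raw counts keeps a frequently listed factor from looking important simply because it appears on many records.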

Unsurprisingly, inexperienced victims tend not to have worn their PFD (personal flotation device), and they are also more likely to have been caught in a low-head dam.

Here are some examples of hazards shown on the bottom of the above graph:

Visualization Code

I have the code used for processing/visualizing the data here.

