A Tool for Censoring Geographic Data

By Max Candocia


October 03, 2017

Imagine you have a bunch of GPS data recorded from bike rides, runs, etc. You might want to share it with someone, but you are worried that they might find out too much information about you based on where your activities begin and end. Ideally, you should be able to censor segments that are too close to your living spaces, work spaces, etc.

This feature exists on some activity social media sites, such as strava.com, but if you want to process your own data, I wrote a script that can do that for you. Here is the GitHub repository for the code: https://github.com/mcandocia/examples/tree/master/censor_geo_data.


Below is a visualization made with two different configurations of the code. You can find the sample data that I used here.


The script, censor_and_package.py, is configured by editing constants in the file. It searches the directories you provide for CSV and GPX files and censors the fields you request wherever a point falls within a certain radius of one of the (longitude, latitude) coordinates you supply, either in the program file or in a CSV (recommended). Fields can be silently dropped by setting 'timestamp' (for CSV files) and 'time' (for GPX files) in the CENSOR_PARAMS object.
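To make the radius check concrete, here is a minimal sketch of the general idea, not the code from the repository. It uses only the standard library (the actual script relies on numpy), and the names haversine_m and censor_rows, the row layout, and the example coordinates are all hypothetical choices made for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def censor_rows(rows, centers, fields=("longitude", "latitude")):
    """Return copies of rows with `fields` blanked when a row is near a center.

    rows:    list of dicts with 'longitude' and 'latitude' keys (assumed layout)
    centers: list of (longitude, latitude, radius_m) tuples
    """
    out = []
    for row in rows:
        inside = any(
            haversine_m(row["longitude"], row["latitude"], lon, lat) <= radius
            for lon, lat, radius in centers
        )
        new_row = dict(row)  # leave the input untouched
        if inside:
            for field in fields:
                new_row[field] = None
        out.append(new_row)
    return out


# Made-up example: one censoring center with a 500 m radius.
centers = [(-87.6298, 41.8781, 500.0)]
rows = [
    {"longitude": -87.6298, "latitude": 41.8790, "heart_rate": 120},  # about 100 m away
    {"longitude": -87.6298, "latitude": 41.9000, "heart_rate": 130},  # about 2.4 km away
]
censored = censor_rows(rows, centers)
print(censored)
```

In this sketch the first row loses its coordinates while the second is kept intact, and fields that were not requested (heart_rate here) pass through unchanged.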

The output is written to a new directory you specify, with an optional zip archive of that directory, so your original data is not overwritten unless you set the output directory to be the same as the root directory.

The program requires BeautifulSoup 4 (for GPX processing) and numpy. To use it, you only need to change the constants, unless you want to extract additional information via the program (even though it is meant to remove information). For additional information, see the README.md or view the comments in the code.


Regardless of how much you censor, if there is any pattern to your data, there are still ways for others to abuse it. Removing points close to your home or work makes your personal life harder to infer, but it is no guarantee. I would recommend specifying several points in your general area, not exactly centered on your points of interest, so that it is harder to guess the reasoning behind the censoring criteria.
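One way to pick off-center censoring points is to offset them by a random distance and direction from the place you want to hide. The sketch below is only an illustration of that idea: offset_centers, its parameters, and the example coordinates are hypothetical, and it uses a flat-earth approximation (about 111,320 m per degree of latitude) that is reasonable only for offsets of a few hundred meters.

```python
import math
import random

METERS_PER_DEGREE = 111320.0  # approximate length of one degree of latitude


def offset_centers(lon, lat, n=3, max_offset_m=300.0, radius_m=500.0, seed=None):
    """Return n (longitude, latitude, radius_m) tuples offset from (lon, lat).

    Each center is moved 30% to 100% of max_offset_m in a random direction.
    Requiring max_offset_m < radius_m keeps the real point inside every circle.
    """
    if max_offset_m >= radius_m:
        raise ValueError("max_offset_m must be smaller than radius_m")
    rng = random.Random(seed)
    centers = []
    for _ in range(n):
        bearing = rng.uniform(0.0, 2.0 * math.pi)
        dist = rng.uniform(0.3, 1.0) * max_offset_m
        # Convert the metric offset into degrees of latitude and longitude.
        dlat = dist * math.cos(bearing) / METERS_PER_DEGREE
        dlon = dist * math.sin(bearing) / (
            METERS_PER_DEGREE * math.cos(math.radians(lat))
        )
        centers.append((lon + dlon, lat + dlat, radius_m))
    return centers


# Made-up point of interest; a fixed seed makes the result repeatable.
poi_lon, poi_lat = -87.6298, 41.8781
decoys = offset_centers(poi_lon, poi_lat, n=3, seed=42)
for center in decoys:
    print(center)
```

If you use something like this, generate the centers once and reuse them for everything you share. Fresh random offsets on every export could be averaged by someone comparing files, which would point back toward the location you are trying to hide.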

