A Tool for Censoring Geographic Data

By Max Candocia

October 03, 2017

Imagine you have a bunch of GPS data recorded from bike rides, runs, etc. You might want to share it with someone, but you are worried that they might find out too much information about you based on where your activities begin and end. Ideally, you should be able to censor segments that are too close to your living spaces, work spaces, etc.

This feature exists on some activity social media sites, such as strava.com, but if you want to process your own data, I wrote a script that can do that for you. Here is the GitHub repository for the code: https://github.com/mcandocia/examples/tree/master/censor_geo_data.

Example

Below is a visualization made with two different configurations of the code. You can find the sample data that I used here.

Specifics

The script, censor_and_package.py, is configured by editing constants in the file. It searches the directories you provide for CSV and GPX files and censors the fields you request wherever a point falls within a given radius of any (longitude, latitude) coordinate you supply, either in the program file or in a CSV (recommended). Fields can be silently dropped by setting 'timestamp' and 'time' in the CENSOR_PARAMS object; these settings affect CSV and GPX files, respectively.
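To make the radius test concrete, here is a minimal sketch of how such a check could work. It is not the code from the repository: the function names, the "longitude"/"latitude" column names, and the (longitude, latitude, radius in meters) layout of each censor point are assumptions made for illustration, and it uses only the standard library instead of numpy.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_censored(lon, lat, centers):
    """True if the point lies inside any censor circle.

    centers is a list of (longitude, latitude, radius_m) tuples
    (a hypothetical layout chosen for this sketch).
    """
    return any(
        haversine_m(lon, lat, c_lon, c_lat) <= radius_m
        for c_lon, c_lat, radius_m in centers
    )


def censor_rows(rows, centers, fields):
    """Return copies of rows with the requested fields blanked
    wherever the row's position falls inside a censor circle."""
    censored = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if is_censored(float(row["longitude"]), float(row["latitude"]), centers):
            for field in fields:
                if field in row:
                    row[field] = ""
        censored.append(row)
    return censored
```

Rows read with csv.DictReader could be passed straight to censor_rows, for example censor_rows(rows, [(-87.65, 41.90, 500.0)], ["longitude", "latitude"]). Fields you do not list are left alone, even on rows inside a circle.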

The output is written to a new directory you specify, with an optional zip archive of that directory, so your original data is not overwritten unless you set the output directory to be the same as the root directory.
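The output step can be pictured with the sketch below. It is an assumption-laden illustration, not the script's actual logic: package_output is a hypothetical name, the copy call stands in for the real censoring step, and unlike the behavior described above it refuses to run when the output directory equals the root directory.

```python
import shutil
from pathlib import Path


def package_output(root_dir, output_dir, make_zip=True):
    """Mirror CSV/GPX files from root_dir into output_dir, then optionally zip it.

    Returns the path to the zip archive if make_zip is True,
    otherwise the path to the output directory.
    """
    root = Path(root_dir).resolve()
    out = Path(output_dir).resolve()
    if out == root:
        # Design choice for this sketch: never overwrite the originals.
        raise ValueError("output directory must differ from the root directory")
    out.mkdir(parents=True, exist_ok=True)

    for src in root.rglob("*"):
        if not src.is_file() or src.suffix.lower() not in {".csv", ".gpx"}:
            continue
        if out in src.parents:
            continue  # skip files already written to a nested output directory
        dest = out / src.relative_to(root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # placeholder for the real censoring step

    if make_zip:
        # Creates "<output_dir>.zip" next to the output directory.
        return Path(shutil.make_archive(str(out), "zip", root_dir=str(out)))
    return out
```

Raising an error for identical directories is a deliberately cautious choice here; the actual script leaves that decision to you.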

The program requires BeautifulSoup 4 (for GPX processing) and numpy. To use it, you only need to change the constants, unless you want to extract additional information with the program (even though it is meant to remove information). For more details, see the README.md or the comments in the code.

Caveats

Regardless of how much you censor, if there is any pattern to your data, there are still ways for others to abuse it. Removing points close to your home or work will make details of your personal life more ambiguous, but it is no guarantee. I would recommend specifying several points in your general area, not exactly centered on your points of interest, so that it is harder to guess the reasoning behind the censoring criteria.
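One way to pick an off-center point is sketched below. This is only an illustration of the recommendation above, not part of the script: offset_center and its parameters are hypothetical, and it uses a flat-earth approximation that is reasonable for offsets of a few hundred meters away from the poles.

```python
import math
import random

METERS_PER_DEGREE = 111_195.0  # approximate length of one degree of latitude


def offset_center(lon, lat, radius_m, max_fraction=0.5, rng=random):
    """Return a (lon, lat) censor center displaced from the true point.

    The displacement is at most max_fraction * radius_m, so the true point
    stays inside a circle of radius_m drawn around the returned center.
    """
    distance = rng.uniform(0.0, max_fraction) * radius_m
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # Convert the metric offset to degrees (longitude shrinks with latitude).
    dlat = distance * math.cos(bearing) / METERS_PER_DEGREE
    dlon = distance * math.sin(bearing) / (
        METERS_PER_DEGREE * math.cos(math.radians(lat))
    )
    return lon + dlon, lat + dlat
```

Calling offset_center(-87.65, 41.90, 500.0) returns a center up to 250 meters away from the real location. Generate the offset once and reuse it: if a fresh random center is drawn for every export, averaging the censored regions across exports could narrow down the true point again.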

