Exploring Data with Generative Clustering

By Max Candocia | January 09, 2018

Clustering is a common machine learning technique applied to data for a variety of reasons, including dimensionality reduction, finding similar objects, and discovering important features. Here I demonstrate how generative clustering can be used with the Iris data set.


Visualizing My Runs in 2017

By Max Candocia | January 01, 2018

A visualization of my running from 2017 using ggplot2.


Unpleasant Statistics: Santa Causes Kids to Turn Away From Christianity

By Max Candocia | December 23, 2017

What are the holidays without abuse of statistical techniques? According to a recent survey, kids who are told that they receive gifts from Santa are up to 150% more likely to turn away from Christianity as adults.


Analyzing and Clustering Christmas Foods, Drinks, and Desserts

By Max Candocia | December 22, 2017

When celebrating Christmas, most people think of a large feast as part of the celebration. In the fourth part of my Christmas article series, I look at what foods are common across different regions of the US. Clustering also shows interesting relationships between different foods and drinks.


Word Clouds of Christmas

By Max Candocia | December 21, 2017

Visualizations of how people describe Christmas using word clouds.


When do Kids Stop Believing in Santa?

By Max Candocia | December 20, 2017

When do kids stop believing in Santa? Are kids from some backgrounds more likely to stop believing sooner? I explore these questions using a Christmas survey in part 2 of my series of Christmas articles.


Christmas Survey Results Part 1

By Max Candocia | December 19, 2017

How do people celebrate Christmas? This is the first article in a series that looks at trends across different groups of people. Here I cover the more general aspects of the questions asked.


Overlaying Density Heatmaps on Geographic Maps in R

By Max Candocia | December 15, 2017

In this example, I use noise complaint data from New York City to demonstrate how you can plot densities of events on a map, as well as how extreme the averages are.


How to use IPython Notebooks in Blog Posts

By Max Candocia | November 08, 2017

You can convert IPython notebooks to HTML from the command line and insert them into your blog or website, using a stylesheet provided here for proper formatting and syntax highlighting.


Reinforcement Learning Demo with Keras

By Max Candocia | November 05, 2017

Keras, a TensorFlow-based neural network library in Python, can be used to solve reinforcement learning tasks. Here I demonstrate one method of solving a game to improve the odds of winning.


Suggestions for Making Your R Code Spookier

By Max Candocia | October 25, 2017

The air is getting colder, and sometimes you just want to add some pizzazz to your R code. Here are some tips for making it spookier this frightful season.


When Leverage Overshadows Regularization

By Max Candocia | October 18, 2017

Regularization is an effective way of preventing overfitting in a model. Even when regularization is applied, however, highly influential points in a data set can still cause a large amount of variation in estimates.


What Time Should You Post to Reddit? (Part 2)

By Max Candocia | October 12, 2017

When is the ideal time to post to Reddit? Using a large sample of Reddit posts from Google BigQuery and elastic-net regression techniques, I take a closer look at post scoring patterns across Reddit.


A Tool for Censoring Geographic Data

By Max Candocia | October 03, 2017

Want to share some personal GPS data but worried about privacy? Here is a Python script that can be used to censor data located near points you specify in both CSV and GPX files.


Tips for Effectively Using Color in Visualizations

By Max Candocia | September 26, 2017

Color is an important aspect of any visualization. Often, readability is sacrificed for simplicity or aesthetics, even though that trade-off is not necessary. Here I show some examples of how to improve visualizations through the choice of colors and color palettes.


Converting Garmin FIT Files to CSV

By Max Candocia | September 22, 2017

Garmin GPS data can be converted to CSV files containing important workout information, including location, heart rate, and elevation. I provide sources and a script for processing the data using Python 3.


Reverse-Engineering Strava's Calorie Estimator with Difficulty

By Max Candocia | September 19, 2017

Biking calorie estimators can be mysterious, as many factors go into how much power it takes to bike in various scenarios. Here I try reverse-engineering the calculations Strava uses for its own estimate.


A Look at Whitewater Accidents

By Max Candocia | August 12, 2017

Whitewater rafting is a fun but potentially dangerous activity. In this article, I look at the experience level of victims, the difficulty of the rivers they were on, and some of the causes and contributing factors behind these accidents.


Using AI to Determine Strategy in Machi Koro

By Max Candocia | July 30, 2017

Neural network-driven AI can be used to simulate games of Machi Koro, and visualizations of the results can help determine an optimal strategy.


What Time Should You Post to Reddit?

By Max Candocia | July 29, 2017

When posting anything on social media, you usually want to reach the largest audience. Here, I show how I estimated the best times to post on Reddit.