Hiding Data in Images

By Max Candocia | February 23, 2020

Have you ever wondered if you could hide a secret note in an image? Or a cat video? Or anything you can find on your computer?

Of course you can! This is just one example of steganography, or the practice of hiding information in something that would not seem like it contains hidden information. One way to think about it is as "invisible ink", which can hide an important message on an otherwise unimportant sheet of paper.

At the same time, I was also wondering, "What do the sequences of bytes (values from 0 to 255) look like in different kinds of files?" Using an image to represent these files is one way of "zooming out" to see patterns in a file.
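The sketch below illustrates this "zooming out". It reads a file as a sequence of byte values and splits that sequence into rows of a fixed width. The result is essentially a grayscale image with one pixel per byte. The function name `bytes_to_rows` and the sample input are illustrative assumptions, not code from the author's scripts.

```python
def bytes_to_rows(data: bytes, width: int) -> list[list[int]]:
    """Split raw bytes into rows of `width` values, each from 0 to 255.

    Each value can be read as the brightness of one grayscale pixel
    (0 = black, 255 = white). The last row may be shorter than `width`.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return [list(data[i:i + width]) for i in range(0, len(data), width)]


# Any file works, e.g. data = open("a_run.csv", "rb").read()
sample = b"timestamp,position_lat\n"
for row in bytes_to_rows(sample, width=8):
    print(row)
```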

Combining the above with encryption, one could hide password-protected files in images, and it would not be immediately obvious to most people looking at the file. Encryption is the process of transforming data in a manner that makes it difficult for anyone to retrieve the original data without knowing both the process for transforming the data and any secret passwords or other information needed to decode the file.
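Password-based encryption usually starts by stretching the password into a fixed-length key with a deliberately slow key-derivation function. The encryption example later in this post uses Argon2 with AES-256, and both require third-party libraries. The sketch below therefore uses Python's built-in `hashlib.scrypt` only as a stand-in for the key-derivation step. The salt size and cost parameters are assumptions, not the author's settings, and the actual AES encryption step is not shown.

```python
import hashlib
import os
from typing import Optional, Tuple


def derive_key(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Stretch a password into a 32-byte (256-bit) key with scrypt.

    Returns (key, salt). The salt is random but not secret: it has to be
    stored next to the encrypted data so the same key can be derived again.
    """
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14,   # CPU/memory cost; an assumed, commonly used value
        r=8,       # block size
        p=1,       # parallelism
        dklen=32,  # 32 bytes = 256 bits, the key size AES-256 expects
    )
    return key, salt


key, salt = derive_key("testpass")
# Same password + same salt -> same key; a different password -> different key.
print(derive_key("testpass", salt)[0] == key)   # True
print(derive_key("wrongpass", salt)[0] == key)  # False
```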

There are countless ways of doing this, but here are the ones I have implemented:

  1. Converting a file to a plain grayscale (white, black, and various shades of gray) image
  2. Converting a file to an RGB image (what you normally see in an image, with different colors)
  3. Converting a file to an RGBA image (same as RGB, but with an "alpha" channel that indicates transparency)
  4. Converting 3 files to an RGB image (each file contributes to the red, green, or blue channel)
  5. Converting 4 files to an RGBA image (each file contributes to the red, green, blue, or alpha channel)
  6. Using a normal RGB image and converting 1 file to the alpha transparency channel (so you end up with an RGBA image)
  7. Hiding data in different channels in RGB/RGBA images
  8. All of the above, with the option of encrypting the data with a password
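These modes differ mainly in how many bytes of data a single pixel can carry. A grayscale pixel holds one byte, an RGB pixel holds three, and an RGBA pixel holds four. The hypothetical helper below shows how the image height follows from the file size and a chosen width. It assumes one extra byte is reserved for the end-of-file marker described in the spreadsheet example. The mode names follow Pillow's convention ("L", "RGB", "RGBA").

```python
import math

# Bytes of file data carried by one pixel in each color mode.
BYTES_PER_PIXEL = {"L": 1, "RGB": 3, "RGBA": 4}


def image_height(n_bytes: int, width: int, mode: str = "L") -> int:
    """Rows needed to hold n_bytes of data plus a 1-byte end marker."""
    per_row = width * BYTES_PER_PIXEL[mode]
    return math.ceil((n_bytes + 1) / per_row)


# A 1,000,000-byte file at a width of 720 pixels:
for mode in ("L", "RGB", "RGBA"):
    print(mode, image_height(1_000_000, 720, mode))  # L 1389, RGB 463, RGBA 348
```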

As an added bonus, when the data is not encrypted, all of the above tend to produce files smaller than the originals, especially for plain-text formats like CSV, since PNG images are losslessly compressed. The savings are usually not nearly as large as with more standard compression techniques, though.
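PNG compresses its pixel data with DEFLATE, the same algorithm the standard `zlib` module implements. That is presumably where these savings come from. The sketch below compares a made-up, repetitive CSV-like text with random bytes, which stand in for well-encrypted data. Repeating one identical row exaggerates the effect compared to a real file.

```python
import os
import zlib

# A made-up row loosely modeled on the running data below, repeated.
row = b"2018-10-07 07:31:04-05:00,41.8806,-87.6206,0.0,183.6,110,80,America/Chicago\n"
csv_like = row * 1000

# Random bytes behave like encrypted data: there is no pattern to exploit.
random_like = os.urandom(len(csv_like))

for name, data in (("csv-like", csv_like), ("random", random_like)):
    packed = zlib.compress(data, 9)
    print(f"{name}: {len(data)} -> {len(packed)} bytes")
```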

Visualizations

Basic Spreadsheet

As a simple example, here are the first few rows of a CSV of a race I ran in 2018:

timestamp | position_lat | position_long | distance | enhanced_altitude | altitude | enhanced_speed | speed | heart_rate | cadence | fractional_cadence | temperature | timestamp_utc | timezone
2018-10-07 07:31:04-05:00 | 41.880614887923 | -87.6206464972347 | 0.0 | 183.60000000000002 | 183.60000000000002 | 11.959200000000001 | 11.959200000000001 | 110 | 80 | 0.0 | 18 | 2018-10-07 12:31:04 | America/Chicago
2018-10-07 07:31:05-05:00 | 41.88064372166991 | -87.62064431793988 | 0.0032 | 184.20000000000005 | 184.20000000000005 | 11.9232 | 11.9232 | 110 | 80 | 0.0 | 18 | 2018-10-07 12:31:05 | America/Chicago
2018-10-07 07:31:06-05:00 | 41.88067582435906 | -87.62064239010215 | 0.00677 | 185.0 | 185.0 | 11.9232 | 11.9232 | 112 | 80 | 0.0 | 18 | 2018-10-07 12:31:06 | America/Chicago
2018-10-07 07:31:07-05:00 | 41.88070650212467 | -87.62064448557794 | 0.01019 | 185.79999999999995 | 185.79999999999995 | 11.8908 | 11.8908 | 112 | 80 | 0.0 | 18 | 2018-10-07 12:31:07 | America/Chicago
2018-10-07 07:31:08-05:00 | 41.88073986209929 | -87.62064532376826 | 0.013890000000000001 | 186.20000000000005 | 186.20000000000005 | 11.8908 | 11.8908 | 112 | 80 | 0.0 | 18 | 2018-10-07 12:31:08 | America/Chicago
2018-10-07 07:31:09-05:00 | 41.88080842606723 | -87.62065085582435 | 0.02152 | 187.0 | 187.0 | 11.9232 | 11.9232 | 116 | 83 | 0.0 | 18 | 2018-10-07 12:31:09 | America/Chicago

In the file itself, commas separate the fields and each row is on its own line, but otherwise it looks exactly like that. Now, if all these characters are converted to their corresponding 0-255 values and placed into an image, the result is produced by this command:

python bti.py samples/a_run.csv modified_samples/encoded_run.png --width=1500px

You'll notice a bunch of white dashes surrounded by gray in the image. The white dashes represent "America/Chicago", and the gray around them is mostly numbers. If you squint, you will also see a bunch of random noise at the end of the last row. This is a 0 byte (pure black) followed by random values from 1-255, which together let the decoder know where the content of the actual file ends.
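One way such a marker can work is sketched below. This is my own guess at the reasoning, not code from the repository. The padding never contains a 0, so the decoder can search backward for the last 0 byte. Everything before that byte is the original file, even if the file contains 0 bytes of its own. The function names are hypothetical.

```python
import random


def pad_with_terminator(data: bytes, capacity: int) -> bytes:
    """Append a 0 terminator, then random 1..255 padding up to capacity."""
    if len(data) + 1 > capacity:
        raise ValueError("data does not fit in the image")
    n_pad = capacity - len(data) - 1
    # Non-cryptographic randomness is fine for purely visual padding.
    padding = bytes(random.randint(1, 255) for _ in range(n_pad))
    return data + b"\x00" + padding


def strip_terminator(pixels: bytes) -> bytes:
    """Recover the original data: everything before the LAST 0 byte.

    The padding never contains 0, so the last 0 must be the terminator,
    even if the original data contains 0 bytes itself.
    """
    end = pixels.rfind(b"\x00")
    if end == -1:
        raise ValueError("no terminator found")
    return pixels[:end]


original = b"a,b\n1,\x002\n"  # deliberately contains a 0 byte
encoded = pad_with_terminator(original, capacity=32)
print(strip_terminator(encoded) == original)  # True
```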

Images

Images can also be encoded this way, but the result looks a bit more chaotic. Take this picture of a sleeping cat, for example, and see if you can make sense of its encoded version, scaled down for easier viewing:

# black-and-white
python3 ../bti.py article_samples/jpg/IMG_20191209_145826.jpg article_samples_processed/article_samples/jpg/IMG_20191209_145826.jpg.bw.png --width=720

If you look closely, you'll occasionally see some streaks in the data that are not random, but for the most part it looks like noise, and its meaning isn't as intuitive as the spreadsheet's above. This is expected, since JPEG files are already compressed, and compressed data tends to look random.
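One way to put a number on "looks like noise" is Shannon entropy per byte. A value of 8 bits means every byte value is equally likely, as in well-compressed or encrypted data. Plain text usually scores well below that. The function name below is illustrative.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


print(byte_entropy(b"America/Chicago" * 100))  # text: low entropy
print(byte_entropy(os.urandom(100_000)))       # random: close to 8.0
```

Running it on a JPEG file's bytes should give a value much closer to 8 than a CSV file does.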

Encrypted Data

If you encrypt data, ideally there won't be any noticeable pattern in the file, with the possible exception of some standard text unrelated to its contents. For example, if the above image of the cat is encrypted with the password "testpass", using Argon2 with strong parameters to derive the key and AES-256 to encrypt, it looks like one of these:

# Black-and-white (default)
python3 hide_in_image.py samples/sleepy_kitty.jpg modified_samples/encrypted_sleepy_kitty_bw.png --padding random --password=testpass --mode=encrypt --color-mode=L

# RGB kitty
python3 hide_in_image.py samples/sleepy_kitty.jpg modified_samples/encrypted_sleepy_kitty_rgb.png --padding random --password=testpass --mode=encrypt --color-mode=RGB

You'll notice that the second one is in color and has roughly a third as many pixels, since each RGB pixel holds three bytes instead of one. That makes better use of screen space, although the effect on file size is negligible.

Hiding data inside a regular image

You can also hide data, encrypted or not, inside a regular image. This can be done by replacing the red, green, blue, and/or alpha channel of a regular image with the data from some other file. If you replace one of the RGB channels, the image will most likely look pretty weird. But if you choose the alpha channel for a relatively small data file and pad the rest with a value of 255 (fully opaque), the effect is barely noticeable, since only the upper left-hand corner is affected. Below is an example of another cat picture hidden in a picture of a shrine (note that the swastikas in this image are a religious symbol and unrelated to Nazis):

echo "HIDE DATA IN ALPHA CHANNEL OF IMAGE"

echo "ENCRYPT"
python3 hide_in_image.py samples/shrine.jpg samples/sleepy_kitty.jpg modified_samples/kitty_hidden_elsewhere_in_shrine2.png --hide-data-as-alpha --padding=255 --mode=encrypt --password=testpass

echo "DECRYPT"
python3 hide_in_image.py samples/shrine.jpg modified_samples/unred_sleepy_kitty_from_shrine2.png modified_samples/kitty_hidden_elsewhere_in_shrine2.png --hide-data-as-alpha --padding=mask --mode=decrypt --password=testpass

Since the hidden cat image is pretty big, it takes up a good chunk of the shrine image, but a smaller file (or a larger "mask" image) would be better hidden.

You can also hide several images in one as long as you remember which channels go with which image.
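Replacing one channel of an existing image mostly comes down to bookkeeping. You walk the pixels in order, overwrite one channel with the next data byte, and pad the rest with a fixed value. In the sketch below, the pixels are a flat list of (R, G, B, A) tuples standing in for what an imaging library would return. `hide_in_channel` and `read_channel` are hypothetical names. For brevity, the sketch leaves out any end-of-data marker.

```python
Pixel = tuple[int, int, int, int]  # (R, G, B, A), each 0..255
CHANNELS = {"R": 0, "G": 1, "B": 2, "A": 3}


def hide_in_channel(pixels: list[Pixel], data: bytes, channel: str = "A",
                    pad: int = 255) -> list[Pixel]:
    """Overwrite one channel of each pixel with data bytes, then `pad`."""
    if len(data) > len(pixels):
        raise ValueError("not enough pixels for the data")
    idx = CHANNELS[channel]
    out = []
    for i, pixel in enumerate(pixels):
        values = list(pixel)
        values[idx] = data[i] if i < len(data) else pad
        out.append(tuple(values))
    return out


def read_channel(pixels: list[Pixel], channel: str = "A") -> bytes:
    """Read one channel back out as raw bytes (padding included)."""
    idx = CHANNELS[channel]
    return bytes(p[idx] for p in pixels)


# A tiny 4-pixel, fully opaque "image" in a solid color.
image = [(200, 30, 30, 255)] * 4
stego = hide_in_channel(image, b"hi", channel="A")
print(read_channel(stego, "A")[:2])  # b'hi'
```

Padding the alpha channel with 255 leaves every pixel after the data fully opaque. To hide several files in one image, call `hide_in_channel` once per file with a different channel each time. You then have to remember which channel holds which file.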

Obamacare

I also wondered what the Affordable Care Act (aka "Obamacare") looked like when I ran it through my script with a width of 720. It's kind of big, but it's interesting to see some patterns show up, especially near the bottom.

Final Comments

Although the above examples don't exactly pass for "regular" images, when combined with encryption, the data is truly "hidden": without the password, it can't be read. Either way, this could be a fun way of passing secret messages. In fact, here is one that might have a prize for the first person who solves it (I used the --heavy flag when encrypting the data):

GitHub Source Code

The code and documentation used to generate the above images can be found in the three scripts at https://github.com/mcandocia/canyouseeme.

The tests folder gives examples of how the scripts are run. enc.py and hide_in_image.py take similarly formatted arguments, but bti.py uses the order python3 bti.py [input] [output], which may be slightly confusing.

Any feedback/suggestions are always appreciated. See my about page for my contact info.
