February 23, 2020
Have you ever wondered if you could hide a secret note in an image? Or a cat video? Or anything you can find on your computer?
Of course you can! This is just one example of steganography, or the practice of hiding information in something that would not seem like it contains hidden information. One way to think about it is as "invisible ink", which can hide an important message on an otherwise unimportant sheet of paper.
At the same time, I was also wondering, "What do the sequences of bytes (values from 0 to 255) look like in different kinds of files?" Using an image to represent these files is one way of "zooming out" to see patterns in a file.
Combining the above with encryption, one could hide password-protected files in images, and it would not be immediately obvious to most people looking at the file. Encryption is the process of transforming data in a manner that makes it difficult for anyone to retrieve the original data without knowing both the process for transforming the data and any secret passwords or other information needed to decode the file.
There are countless ways of doing this, but here are some ways that I have done this:
As an added bonus, when not encrypting data, all of the above tend to compress file sizes, especially for plain-text file formats like CSV. This is usually not nearly as strong as more standard compression techniques, though.
As a simple example, here are the first few rows of a CSV of a race I ran in 2018:
timestamp | position_lat | position_long | distance | enhanced_altitude | altitude | enhanced_speed | speed | heart_rate | cadence | fractional_cadence | temperature | timestamp_utc | timezone |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2018-10-07 07:31:04-05:00 | 41.880614887923 | -87.6206464972347 | 0.0 | 183.60000000000002 | 183.60000000000002 | 11.959200000000001 | 11.959200000000001 | 110 | 80 | 0.0 | 18 | 2018-10-07 12:31:04 | America/Chicago |
2018-10-07 07:31:05-05:00 | 41.88064372166991 | -87.62064431793988 | 0.0032 | 184.20000000000005 | 184.20000000000005 | 11.9232 | 11.9232 | 110 | 80 | 0.0 | 18 | 2018-10-07 12:31:05 | America/Chicago |
2018-10-07 07:31:06-05:00 | 41.88067582435906 | -87.62064239010215 | 0.00677 | 185.0 | 185.0 | 11.9232 | 11.9232 | 112 | 80 | 0.0 | 18 | 2018-10-07 12:31:06 | America/Chicago |
2018-10-07 07:31:07-05:00 | 41.88070650212467 | -87.62064448557794 | 0.01019 | 185.79999999999995 | 185.79999999999995 | 11.8908 | 11.8908 | 112 | 80 | 0.0 | 18 | 2018-10-07 12:31:07 | America/Chicago |
2018-10-07 07:31:08-05:00 | 41.88073986209929 | -87.62064532376826 | 0.013890000000000001 | 186.20000000000005 | 186.20000000000005 | 11.8908 | 11.8908 | 112 | 80 | 0.0 | 18 | 2018-10-07 12:31:08 | America/Chicago |
2018-10-07 07:31:09-05:00 | 41.88080842606723 | -87.62065085582435 | 0.02152 | 187.0 | 187.0 | 11.9232 | 11.9232 | 116 | 83 | 0.0 | 18 | 2018-10-07 12:31:09 | America/Chicago |
In the file itself, there are commas between each of the fields, and new lines between each row, but it otherwise looks exactly like that. Now, if all these values were converted to their corresponding 0-255 values and put into an image, this is what the result looks like, with the command used to make it above it:
python bti.py samples/a_run.csv modified_samples/encoded_run.png --width=1500px
You'll notice a bunch of white dashes surrounded by gray in the image. The white dashes represent "America/Chicago", and the gray around it is mostly numbers. If you squint your eyes, you will also see at the end of the last row there is a bunch of random noise. This is a 0-byte (pure black) followed by random values from 1-255 that lets the decoder know when the content of the actual file ends.
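The padding scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the actual code in bti.py: the function names `bytes_to_pixels` and `pixels_to_bytes` are made up for this example, and it assumes one byte per grayscale pixel with the marker and padding laid out exactly as described (a 0 byte, then random values from 1 to 255).

```python
import math
import secrets


def bytes_to_pixels(data: bytes, width: int) -> tuple[bytes, int]:
    """Pad data so it fills whole rows of `width` grayscale pixels.

    Layout (assumed): data, then one 0 byte, then random bytes in 1..255.
    Returns the pixel buffer and the number of rows.
    """
    height = math.ceil((len(data) + 1) / width)  # +1 for the 0 marker
    pad_len = width * height - len(data) - 1
    # randbelow(255) yields 0..254, so adding 1 keeps the padding free of zeros
    padding = bytes(secrets.randbelow(255) + 1 for _ in range(pad_len))
    return data + b"\x00" + padding, height


def pixels_to_bytes(pixels: bytes) -> bytes:
    """Recover the data: the last 0 byte marks where the content ends."""
    return pixels[: pixels.rindex(0)]
```

Because the padding never contains a 0, the last 0 in the buffer is always the marker, even if the file itself contains 0 bytes. If Pillow is installed, a buffer like this can be written out with `Image.frombytes("L", (width, height), pixels).save("out.png")`.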
Image files can also be encoded this way and put into another image. However, the result looks a bit more chaotic. Take this picture of a sleeping cat, for example, and try to see if you can make sense of its encoded version, scaled down for ease of viewing:
# black-and-white
python3 ../bti.py article_samples/jpg/IMG_20191209_145826.jpg article_samples_processed/article_samples/jpg/IMG_20191209_145826.jpg.bw.png --width=720
If you look closely, you'll occasionally see some streaks in the data that are not random, but it mostly looks like noise. Its meaning isn't as intuitive as the spreadsheet's from above.
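One rough way to put a number on "looks like noise" is the Shannon entropy of the byte values. The helper below is not part of the scripts from this post; `byte_entropy` is a hypothetical name for a small sketch of the idea.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value (0-255) occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Plain text such as a CSV uses only a small set of byte values, so it tends to score well below 8, while compressed formats like JPEG and encrypted data tend to sit close to 8.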
If you encrypt data, ideally there won't be any noticeable pattern in the file, with the possible exception of some standard text not related to its contents at all. For example, when the above image of the cat is encrypted with the password "testpass" using Argon2/AES-256 with strong parameters, it looks like one of these:
# Black-and-white (default)
python3 hide_in_image.py samples/sleepy_kitty.jpg modified_samples/encrypted_sleepy_kitty_bw.png --padding random --password=testpass --mode=encrypt --color-mode=L
# RGB kitty
python3 hide_in_image.py samples/sleepy_kitty.jpg modified_samples/encrypted_sleepy_kitty_rgb.png --padding random --password=testpass --mode=encrypt --color-mode=RGB
You'll notice that the second one has color and is smaller, which is a better way of using screen space, although the effect on file size is negligible.
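The size difference is presumably because an RGB pixel can carry three bytes where a grayscale pixel carries one. Here is a small sketch of that arithmetic, assuming that packing; `image_height` is a hypothetical helper, and only the mode names "L" and "RGB" come from the commands above.

```python
import math

# Bytes stored per pixel in each color mode (assumed packing)
CHANNELS = {"L": 1, "RGB": 3}


def image_height(n_bytes: int, width: int, color_mode: str = "L") -> int:
    """Number of pixel rows needed to hold n_bytes at the given width."""
    bytes_per_row = width * CHANNELS[color_mode]
    return math.ceil(n_bytes / bytes_per_row)
```

At the same width, the RGB image needs about a third as many rows as the black-and-white one, while the number of bytes stored stays the same.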
You can also hide data/encrypted data inside a regular image. This can be done by replacing the red, blue, green, and/or alpha channels of a regular image with the data from some other file. If you replace one of the RGB channels, the image will most likely look pretty weird, but if you choose the alpha channel for a relatively small data file and pad with a value of 255, then the effect is barely noticeable, since only the upper left-hand corner is affected. Below is an example of another cat picture hidden in a picture of a shrine (note that the swastikas in this image are a religious symbol and unrelated to Nazis):
echo "HIDE DATA IN ALPHA CHANNEL OF IMAGE"
echo "ENCRYPT"
python3 hide_in_image.py samples/shrine.jpg samples/sleepy_kitty.jpg modified_samples/kitty_hidden_elsewhere_in_shrine2.png --hide-data-as-alpha --padding=255 --mode=encrypt --password=testpass
echo "DECRYPT"
python3 hide_in_image.py samples/shrine.jpg modified_samples/unred_sleepy_kitty_from_shrine2.png modified_samples/kitty_hidden_elsewhere_in_shrine2.png --hide-data-as-alpha --padding=mask --mode=decrypt --password=testpass
Since the hidden cat picture is a pretty big file, its data takes up a good chunk of the shrine image, but a smaller file (or a larger "mask" image) would be better hidden.
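The alpha-channel trick can be sketched on raw pixel bytes like this. It is a simplified illustration, not what hide_in_image.py actually does: `embed_in_alpha` and `extract_from_alpha` are hypothetical names, there is no encryption, and the receiver is assumed to already know the payload length.

```python
def embed_in_alpha(rgb: bytes, payload: bytes, pad: int = 255) -> bytes:
    """Return RGBA pixel bytes whose alpha channel carries `payload`.

    `rgb` is a flat R,G,B,R,G,B,... buffer. Alpha values past the payload
    are set to `pad` (255 = fully opaque, so those pixels look unchanged).
    """
    if len(rgb) % 3:
        raise ValueError("rgb buffer length must be a multiple of 3")
    n_pixels = len(rgb) // 3
    if len(payload) > n_pixels:
        raise ValueError("payload needs more pixels than the cover image has")
    alpha = payload + bytes([pad]) * (n_pixels - len(payload))
    rgba = bytearray(n_pixels * 4)
    rgba[0::4] = rgb[0::3]  # red
    rgba[1::4] = rgb[1::3]  # green
    rgba[2::4] = rgb[2::3]  # blue
    rgba[3::4] = alpha      # hidden data, then padding
    return bytes(rgba)


def extract_from_alpha(rgba: bytes, length: int) -> bytes:
    """Read back the first `length` alpha values (length assumed known)."""
    return rgba[3::4][:length]
```

The payload fills pixels in row order starting from the top-left, which is why only that part of the picture changes when the file is small. The same slicing works for any other channel, so several files could each go into their own channel.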
You can also hide several images in one as long as you remember which channels go with which image.
I also wondered what the Affordable Care Act (aka "Obamacare") looked like when I ran it through my script with a width of 720. It's kind of big, but it's interesting to see some patterns show up, especially near the bottom.
Although the above examples don't exactly pass for "regular" images, when combined with encryption, the data is truly "hidden". Regardless, this could be a fun way of passing secret messages. In fact, here is one that might have a prize for the first person who solves it (I used the --heavy flag when encrypting the data):
The code and documentation used to generate the above images can be found in the three scripts in https://github.com/mcandocia/canyouseeme.
The tests folder gives examples of how the scripts are run, and enc.py and hide_in_image.py are similarly formatted when run, but bti.py has the order of python3 bti.py [input] [output], which may be slightly confusing.
Any feedback/suggestions are always appreciated. See my about page for my contact info.