Effectively Hiding Data in Images

By Max Candocia

December 29, 2021

Question: Can you hide a video in a picture of a cat, or a game in a picture of a dog?

Answer: Yes, provided the picture is large enough and/or the video or game is small enough.

Question: How can I do this?

Answer: A Python program I wrote does just this (GitHub link here). You can password-protect the file, too, and as long as the password is hard to guess, cracking it is not practically feasible.

Question: What if the file I want to hide is large?

Answer: You can make a zipped folder of many images and hide the file in multiple images.

Question: How does it work?

Answer:

The process of hiding the data involves 2 main steps: encrypting the data so that it needs a password, and hiding the data in the pixels of an image.

Encrypting Data

Encrypting the data makes it unreadable without a special key. The key is derived by putting the password into a complex formula that will virtually never produce the same key for two different passwords, although a collision is theoretically possible.
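The text above does not name the formula the program uses. As a minimal sketch of the general idea, the example below assumes PBKDF2-HMAC-SHA256 from Python's standard library as a stand-in; the function name `derive_key`, the salt size, and the iteration count are illustrative choices, not the program's actual settings.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Stretch a password into a 32-byte key (assumed scheme: PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random but not secret; it has to be stored so the same key
# can be re-derived from the password later.
salt = os.urandom(16)
key = derive_key("a long, hard-to-guess password", salt)
same_key = derive_key("a long, hard-to-guess password", salt)
other_key = derive_key("a different password", salt)
```

The same password and salt always give the same key, while a different password gives an unrelated key. The many iterations make each guess slow, which is why a hard-to-guess password matters so much.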

Hiding the Data

Encrypted data is noticeable, as it will be just a bunch of random bits (1s and 0s). Hiding it in an image in a sort-of-random way makes it less conspicuous. I do this in a few ways:

  1. The password is used (in a slightly different way) to create random sequences of pixels to hide the data in, with 1 byte per pixel.
  2. Each pixel can theoretically store 3 bytes of data in its 3 channels (the red, green, and blue channels in each pixel can store 1 byte each). However, that would completely overwrite the pixel. 1 byte is a lot less conspicuous. I use the 2 or 3 least-significant bits of each of those channels to store the data, so 3 bits, 3 bits, and 2 bits of the channels are used, with the 2-bit channel being randomly selected in the same manner as the pixels are chosen.
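The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the name `embedding_plan`, the `pixel-order:` label, and the use of Python's `random` module are all hypothetical, and `random` is not cryptographically secure, so a real tool would want a stronger generator.

```python
import hashlib
import random


def embedding_plan(password: str, width: int, height: int, n_bytes: int):
    """Return n_bytes (pixel_index, two_bit_channel) pairs in a password-dependent order."""
    n_pixels = width * height
    if n_bytes > n_pixels:
        raise ValueError("payload needs more pixels than the image has")
    # Hash a labelled copy of the password so this seed differs from the
    # encryption key (assumption: any separate derivation would do).
    digest = hashlib.sha256(b"pixel-order:" + password.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))  # NOT cryptographically secure
    pixels = rng.sample(range(n_pixels), n_bytes)  # distinct pixels, one byte each
    # For each pixel, pick which channel (0=R, 1=G, 2=B) carries only 2 bits.
    return [(pixel, rng.randrange(3)) for pixel in pixels]


plan = embedding_plan("meow", width=10, height=10, n_bytes=20)
```

Anyone with the password can rebuild the same plan and so knows where to look; without it, the chosen pixels are scattered across the image with no obvious pattern.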

Each byte has a value from 0-255, whether in pixel form or just raw data, so imagine that you want to store the value of 100 in an RGB pixel, with 3 bits being in red, 3 bits in green, and 2 in blue.

In binary, 100 is 01100100. Taking the bits from left to right, that splits into 011, 001, and 00, which are values of 3, 1, and 0 for red, green, and blue, respectively. For the red and green channels, the original image's value is rounded down to the nearest multiple of 8, and for blue, to the nearest multiple of 4. Then the values of 3, 1, and 0 are added to the rounded channel values. This adds a small bit of noise, but nothing that a human could notice, and possibly nothing a computer algorithm could notice either, if the image was noisy enough already.
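A minimal sketch of this bit packing is shown below. The function names and the bit order (most significant bits go to red first) are assumptions made for illustration; the program may order the bits differently.

```python
def _widths(two_bit_channel: int):
    """Bits stored per channel: 3 everywhere except the chosen 2-bit channel."""
    widths = [3, 3, 3]
    widths[two_bit_channel] = 2
    return widths


def embed_byte(pixel, value: int, two_bit_channel: int = 2):
    """Hide one byte (0-255) in the low bits of an (R, G, B) pixel."""
    out = []
    shift = 8
    for channel_value, width in zip(pixel, _widths(two_bit_channel)):
        shift -= width
        mask = (1 << width) - 1
        chunk = (value >> shift) & mask        # next group of payload bits
        rounded = channel_value & ~mask        # round down to a multiple of 8 or 4
        out.append(rounded | chunk)
    return tuple(out)


def extract_byte(pixel, two_bit_channel: int = 2) -> int:
    """Recover the hidden byte from the low bits of an (R, G, B) pixel."""
    value = 0
    for channel_value, width in zip(pixel, _widths(two_bit_channel)):
        value = (value << width) | (channel_value & ((1 << width) - 1))
    return value


original = (200, 123, 77)
stego = embed_byte(original, 100)   # (203, 121, 76)
recovered = extract_byte(stego)     # 100
```

With this layout no channel moves by more than 7 (or 3 for the 2-bit channel), which is why the change is so hard to see.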

Unpacking the hidden data

All one needs to unpack the data is the password used to hide the data. There are additional settings one might need to know, but the default is sufficient for security purposes.

Best Practices

  1. Since you can store 1 byte per pixel, you need at least as many pixels as bytes, and preferably many more. The code is not currently optimized for images that are nearly full, and will slow down significantly as they fill up: it takes 2x as long per pixel when 1/2 of the pixels remain, 3x when 1/3 remain, etc.
  2. Both the reading and writing take similar amounts of memory. The --use-disk option is highly recommended to save memory that is used to keep track of which pixels have already been written. This slows down the process a little bit if you otherwise have sufficient memory.
  3. If your goal is to truly make an archive inconspicuous, noisy images are preferred, and blurred images are not. In some tests, the distribution of colors was obviously changed for low-quality, JPEGy pictures, as well as for some computer-generated images and heavily-edited photos. For the most part, the effect wasn't that the new colors didn't appear in the original, but that they were much less common there.
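The slowdown figures in the first item are what you would expect if a random pixel is redrawn until an unused one turns up; that is my reading of the numbers, not something confirmed from the code. Under that assumption, a quick way to check whether an image is comfortably large enough is sketched below, with hypothetical helper names.

```python
def expected_draws(remaining_fraction: float) -> float:
    """Average random draws needed to hit an unused pixel (assumes redraw-until-free)."""
    if not 0 < remaining_fraction <= 1:
        raise ValueError("remaining_fraction must be in (0, 1]")
    return 1 / remaining_fraction


def capacity_report(width: int, height: int, payload_bytes: int) -> dict:
    """Summarise how full an image gets at 1 byte per pixel."""
    n_pixels = width * height
    return {
        "pixels": n_pixels,
        "fits": payload_bytes <= n_pixels,
        "fill_fraction": payload_bytes / n_pixels,
    }


report = capacity_report(1920, 1080, 1_000_000)
half_left = expected_draws(1 / 2)    # 2.0 draws per pixel
third_left = expected_draws(1 / 3)   # about 3 draws per pixel
```

A 1 MB payload fills a bit under half of a 1920x1080 image, so the last bytes written would take roughly twice as long as the first ones.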

Note that regardless of what you do, the encrypted message is safe. What the practices above prevent is someone learning the relative size of the data, or noticing that there is anything unusual about the images that contain it.

Example

Below is an example of two images, the left one being the original, and the right one being the one with a hidden message. The password to it is the answer to this riddle:

How do cats greet each other on December 25th? (answer is all lowercase, no punctuation, except possibly spaces and/or apostrophes ')

Why Did I Make This?

I find the topic interesting, and I wanted to create puzzle boxes that were sufficiently secure, but would still be interesting for people to have if they couldn't open them (hence the pictures).

Previous Work

I previously hid data, in both unencrypted and encrypted form, in images in a simpler article on my website. The data density was a lot higher, but the result was a lot more conspicuous, and it didn't work with archives.

