Embedded files

From Encyclopedia Dramatica
As explained by this confusing collection of boxes

An embedded file is a file that is stored or hidden inside another file, particularly inside an image which may then be posted to the *chans. For example, concatenating a JPEG file with a RAR file produces an embedded archive which can be read either as a JPEG or a RAR, depending on how it's opened.

File concatenation

An embedded MP3 file, commonly seen on /a/.
Because of this image, all other images on eBaumsworld.com contain child pornography.

One of the most common ways of embedding files into images is simple concatenation. That is, the new file contains the data from the first file followed by the data from the second. Which file you see depends on the program you open it with.

This only works for certain combinations of file types. Many types of files will work for the first part, but it should be a GIF, JPEG, or PNG file if you want to post it to 4chan. The second file should be one of the following types:

In addition:

  • Broken web pages occasionally append HTML to the end of the images they serve. In most cases, the contents are unremarkable. But several images from the diaper fetish website wetherbed.com contain the login credentials. These images are often reposted in diaper fetish threads on /b/ with the posters unaware of what's in them. You can find this information by opening the files in a text editor such as Wordpad, and searching for "password".
  • Many animated GIFs have been turned into concatenated sequences of JPEG images by a broken thumbnail maker on tumblr. [2]

Examples

In Windows:

copy /B foo.jpg + bar.rar foobar.jpg

In *nix:

cat foo.jpg bar.rar > foobar.jpg

Both of these examples create a file named foobar.jpg which looks identical to foo.jpg in an image viewer, but which yields the contents of bar.rar when opened with unrar.

Why does it work?

In GIF, JPEG, and PNG files, as well as many other file types, there is information in the file that tells the program reading it how long the file is and/or where to stop. So if you put additional data after the end of the original data, no one will give a shit about it.
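To make the "where to stop" part concrete, here is a minimal Python sketch for PNG. It walks the chunk list until it hits the IEND chunk, and treats everything after that as appended data. The function names (png_end_offset, tiny_png) are made up for this example, and the tiny 1x1 image is built in code only so the sketch runs without an input file. It skips CRC checking and other sanity checks that a real decoder would do.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_end_offset(data: bytes) -> int:
    """Return the offset just past the IEND chunk of a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4
        if chunk_type == b"IEND":
            return pos
    raise ValueError("no IEND chunk found")


def _chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Serialize one PNG chunk (hypothetical helper for the demo image)."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """Build a 1x1 grayscale PNG so the example needs no input file."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # one filter byte plus one pixel
    return PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")


if __name__ == "__main__":
    combined = tiny_png() + b"appended data"
    end = png_end_offset(combined)
    print(combined[end:])  # b'appended data'
```

An image viewer does the same walk and simply stops at IEND, which is why the trailing bytes never show up on screen.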

Many types of compressed archives (7Z, RAR, ZIP) can be distributed as self-extracting files, which are composed of an executable file concatenated with the archive. So these file types are designed to be readable even if they've been appended to another file. For 7Z and RAR, the extractor searches for the "magic number" that indicates the start of the archive data. ZIP files, on the other hand, are read starting from the end of the file, where the central directory that indexes the archive is stored.
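The two lookup strategies can be sketched in a few lines of Python. This is an illustration, not how any particular extractor is implemented: find_embedded_archive and make_demo_file are names invented here, and the "image" is just a few placeholder bytes. RAR and 7Z are found by searching forward for their signatures, while ZIP is found by searching backward for the end-of-central-directory record. Python's own zipfile module is then used to show that a ZIP appended to other data still opens.

```python
import io
import zipfile

RAR_MAGIC = b"Rar!\x1a\x07"
SEVENZ_MAGIC = b"7z\xbc\xaf\x27\x1c"
ZIP_EOCD_MAGIC = b"PK\x05\x06"  # end-of-central-directory record


def find_embedded_archive(data: bytes):
    """Return (kind, offset) of an archive signature in data, or None.

    For RAR and 7Z the offset is where the archive starts. For ZIP it is
    the offset of the end-of-central-directory record near the end.
    """
    for kind, magic in (("rar", RAR_MAGIC), ("7z", SEVENZ_MAGIC)):
        offset = data.find(magic)       # forward search from the start
        if offset != -1:
            return kind, offset
    offset = data.rfind(ZIP_EOCD_MAGIC)  # backward search from the end
    if offset != -1:
        return "zip", offset
    return None


def make_demo_file() -> bytes:
    """Placeholder 'image' bytes followed by a small in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hi")
    fake_image = b"\xff\xd8 pretend image data \xff\xd9"
    return fake_image + buf.getvalue()


if __name__ == "__main__":
    combined = make_demo_file()
    print(find_embedded_archive(combined)[0])  # zip
    with zipfile.ZipFile(io.BytesIO(combined)) as zf:
        print(zf.read("hello.txt"))  # b'hi'
```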

File binders

A file binder is a program that appends files and their names to images in its own particular format, and extracts the files other people add to images. They often apply simple transformations to the data to circumvent filters.

  • pFBind (Praetox File Binder) was created to get around 4chan's block on embedded RARs and save Lithursday, but it was eventually blocked from 4chan itself.
  • 4chan sounds plays OGG sound files which have been concatenated to images in the following way (On Windows, replace "cat" with "type"):
 echo [SoundName]>>image.jpg
 cat soundfile.ogg>>image.jpg

These are now blocked on 4chan, so if you want to participate in sound threads, you need to use a player which can play sounds that have been embedded in other formats.

  • ChanGrouper (v1:[3] v2:[4]) is yet another file binder, written in Java. It has not yet been blocked from 4chan. The ChanGrouper websites may be down; you can alternately download ChanGrouper here (v1:[5] v2:[6]). The original source code of the program is included in the JAR file; you can examine it by downloading the file and either renaming it to .zip or opening it in your favorite archiver. This was the format used by Coupon Guy to distribute his guides on making fake coupons before he was vanned.

Blocked on 4chan

The old technique for posting embedded archives on 4chan. No longer works.

Embedded 7z, RAR, and ZIP archives are currently blocked on 4chan, giving posters the message "Image file contains embedded archive." At first moot's jpg-rar and sounds filters were particularly easy to circumvent, since he didn't scan the whole file, only the first 64 KB (later updated to 256 KB) and last 64 KB. In those days, all you needed to do to get around it was add padding after the image (for example by using several copies of the image) to push the beginning of the RAR file past the 256 KB threshold.

The filter was updated in November 2012 to include Ogg sound files. Moot's statements on the block are in this /q/ thread. The 4chan sounds script was quickly rewritten to play files in which "OggS" had been replaced with various strings; moot responded by adding "libVorbis" to the filter. This resulted in the use of yet more ways to circumvent the filter as well as new versions of 4chan sounds to play the obfuscated files. (You can use this to post sounds.)

In December 2012, the filter was updated to scan the entire file, killing the padding method. Since many of the strings moot scans for are only 4 bytes long, this led to numerous files failing to upload because they randomly contained a string that moot interprets as the magic number of an "embedded archive."
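A rough back-of-the-envelope estimate shows why short signatures misfire. The sketch below assumes the bytes of a compressed image behave like uniformly random bytes, which is only an approximation, and the number of patterns the filter looked for is a guess made up for the example (the real list isn't given here). Under those assumptions, the chance that at least one 4-byte pattern shows up somewhere in a file follows from the expected number of hits.

```python
import math


def false_positive_probability(file_size: int, num_patterns: int,
                               pattern_len: int = 4) -> float:
    """Estimate the chance that random data contains any of the patterns.

    Assumes uniformly random bytes and treats hits as independent
    (a Poisson approximation), so this is a ballpark figure only.
    """
    positions = max(file_size - pattern_len + 1, 0)
    expected_hits = num_patterns * positions / 256 ** pattern_len
    return 1.0 - math.exp(-expected_hits)


if __name__ == "__main__":
    # Hypothetical inputs: a 3 MiB image and 5 four-byte patterns.
    p = false_positive_probability(3 * 1024 * 1024, 5)
    print(f"{p:.2%}")
```

With those made-up inputs the estimate comes out to a few tenths of a percent per file. That's tiny for any single upload, but it adds up fast on a site that takes a large volume of images.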

Things escalated for one day in January 2013 when a mad developer tried to kill 4chan by placing batshit arbitrary limits on file sizes. If the size of your image in kilobytes (1024 bytes) was larger than

  • 2/3 of its width+height (for JPEG)
  • 10/13 of its width+height (for PNG)

then 4chan would give you the helpful error message "Error: Your image contains an embedded file." Since the developer who came up with these formulas had no clue what a reasonable filesize for an image should be, if you posted a large image on 4chan, you stood a good chance of triggering this message. It did nothing to stop embedded archives; they just needed to be attached either to GIFs or to images with huge resolutions and small filesizes (pictures with large blocks of solid color will do the trick).
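The rule as described above is easy to write down. The following Python sketch is a reconstruction from the description, not 4chan's actual code, and the function name exceeds_size_rule is invented for the example. It treats any format other than JPEG and PNG as unrestricted, since no formula is given for them.

```python
# Factors taken from the description above; everything else is assumed.
SIZE_FACTORS = {"jpeg": 2 / 3, "png": 10 / 13}


def exceeds_size_rule(fmt: str, width: int, height: int, size_bytes: int) -> bool:
    """True if the file size in KB exceeds factor * (width + height)."""
    factor = SIZE_FACTORS.get(fmt.lower())
    if factor is None:
        return False  # no formula is given for other formats such as GIF
    size_kb = size_bytes / 1024
    return size_kb > factor * (width + height)


if __name__ == "__main__":
    # A 1920x1080 JPEG gets a limit of about 2000 KB under this rule.
    print(exceeds_size_rule("jpeg", 1920, 1080, 2560 * 1024))  # True
    print(exceeds_size_rule("jpeg", 1920, 1080, 1500 * 1024))  # False
```

Note that the limit grows with width plus height rather than with the pixel count, which is why it had so little to do with what a reasonable filesize actually looks like.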

In April 2013, in yet another attempt to block sound threads, the developer decided to write a filter that blocked images larger than 12 bytes per pixel unless the image was a GIF. But because he's too lazy to test changes out on a test board, when he inevitably fucked up it wreaked havoc all over 4chan. He fucked up the filter twice, the first time blocking all GIF images, and the second time allowing all GIF images but blocking non-GIF images smaller than 12 bytes per pixel. The third time he got it right. After all this, you can still post embedded files such as sound images, but you have to either use an image that's not absurdly small compared to the filesize (if an image is blocked, you can just enlarge it), or use a GIF.
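For comparison, the final working version of the bytes-per-pixel rule can be sketched the same way. Again this is a guess at the logic based only on the description above, with an invented function name, and it shows why enlarging a blocked image gets it through.

```python
def exceeds_density_rule(fmt: str, width: int, height: int,
                         size_bytes: int, limit: float = 12.0) -> bool:
    """True if a non-GIF image has more than `limit` bytes per pixel."""
    if fmt.lower() == "gif":
        return False  # GIFs are exempt according to the description
    pixels = width * height
    if pixels <= 0:
        raise ValueError("image must have at least one pixel")
    return size_bytes / pixels > limit


if __name__ == "__main__":
    payload = 200 * 1024  # hypothetical total file size in bytes
    print(exceeds_density_rule("png", 100, 100, payload))  # True
    print(exceeds_density_rule("png", 200, 200, payload))  # False
    print(exceeds_density_rule("gif", 100, 100, payload))  # False
```

The second call reuses the same file size for simplicity. In practice enlarging the image changes the file size a little, but the pixel count grows much faster.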

None of this has stopped the posting of embedded files, which continue to be posted using methods such as:

  • Alter the magic number in the RAR file, for example by replacing "Rar!" with "Bar!". Use a hex editor to do this so you don't make other unintentional changes to the file.
  • Apply any number of simple transformations to the embedded data. For example, the scripts on this page will scramble or unscramble any data appended after the image.
  • Concatenate the image and file without compressing the file. If the file isn't an archive or an Ogg sound file, it most likely won't be blocked. But if the file isn't one of the types listed above, you'll need to use a hex editor to extract it. If the image is a JPEG file, search for FF D9 (the marker that ends the image data) and delete everything up to and including it; what remains is the embedded file. Alternatively, those of you not versed in Computer Science III may want to try this Greasemonkey script, which can detect the added data in images on 4chan and split the image back up into its original pieces. Also useful for telling fake jpeg-rar books from real ones. Do not use this technique to upload source code or HTML files as this may trigger the anti-4chan.js filter and get you banned.
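If you'd rather not do the FF D9 hunt by hand, the same split can be sketched in Python. A plain search for the first FF D9 can stop too early, because a thumbnail stored in the metadata carries its own end marker, so this sketch walks the JPEG segments instead and skips over them by length. The function names are made up for the example, and the code assumes a reasonably well-formed file; it is not the Greasemonkey script mentioned above.

```python
import struct


def jpeg_end_offset(data: bytes) -> int:
    """Return the offset just past the JPEG's own EOI marker (FF D9)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:              # fill byte before a marker
            pos += 1
            continue
        if marker == 0xD9:              # EOI: end of image
            return pos + 2
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                    # markers without a length field
            continue
        if pos + 4 > len(data):
            break
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + seg_len              # skip the segment, thumbnails included
        if marker == 0xDA:              # SOS: entropy-coded data follows
            while pos + 1 < len(data):
                nxt = data[pos + 1]
                # FF 00 is a stuffed data byte and FF D0..D7 are restart
                # markers; anything else after FF is a real marker.
                if data[pos] == 0xFF and nxt != 0x00 and not (0xD0 <= nxt <= 0xD7):
                    break
                pos += 1
    raise ValueError("no EOI marker found")


def split_jpeg(data: bytes):
    """Split data into (image bytes, bytes appended after the image)."""
    end = jpeg_end_offset(data)
    return data[:end], data[end:]
```

Usage would look something like image, extra = split_jpeg(open("foobar.jpg", "rb").read()), after which extra can be written out to its own file.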

Or you can also try one of the many other methods of embedding files in images...

Metadata blocks

Files can also be embedded in the metadata blocks of images. This technique has not seen as much use since it takes more work than concatenation, and isn't significantly harder to block. But it does have the advantage of working on sites which strip off appended data.

Image Data

Cornelia format

A Cornelia-style archive containing tools for making more.

These archives are embedded into the image data of a 24-bit Windows bitmap, which is then converted to a PNG so you can post it on 4chan. This was the format used by Cornelia to post the dox of infected users. Moot never figured out how to filter out Cornelia's posts efficiently as he had done with previous incarnations of 4chan.js, and instead gave up and added CAPTCHA to 4chan. So it's clear he doesn't have an effective way of blocking it. And if he does figure out how to block it, he may consider removing CAPTCHA, so it's a win-win.

The main advantage of Cornelia over other formats that can be posted on 4chan at this time is that you don't need a special program to read them. All you need to do is:

  1. Convert the image to a 24-bit BMP file. You can do this by opening it in an image editor, and saving it as the correct type:
    • In MSPaint: Make sure the save type is set to "24-bit Bitmap". You may have to make and undo an edit to force deletion of the alpha channel.
    • In Mac OS X's Preview: Before saving the image, flip it vertically. Choose the format "Microsoft BMP". Make sure the "Alpha" box, if present, is unchecked, and that the "Rotate without modifying contents" box is checked.
    • In The GIMP: Change the extension to ".bmp". In the next dialog, make sure "24 bits: R8 G8 B8" is selected under "Advanced Options".
  2. Open the .bmp file with 7-Zip or WinRAR.

You can also do this manually on *nix:

convert inputimage.png +matte tmp1.bmp
7z x tmp1.bmp

The big drawback is that the data isn't hidden at all; it's visible as a big ugly gray band in the image.

There are now userscripts which support posting archives in Cornelia format as well as extracting the files and viewing them in your browser.

To create one manually, start with an image with enough blank space at the bottom to hold the archive data. The number of pixels needed is 1/3 of the archive's length in bytes, since each pixel of a 24-bit bitmap holds 3 bytes. It's also important that the image width is a multiple of 4, so that no padding bytes are added to the end of each row. Then on Linux / OS X you can do:

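# Take the first frame, force 24-bit color and drop the alpha channel
convert 'inputimage.gif[0]' -type truecolor -depth 8 +matte tmp1.bmp
# Keep the first 138 bytes (taken here to be the bitmap's header), then append the archive
head -c 138 tmp1.bmp > tmp2
cat inputarchive.7z >> tmp2
# Overwrite the start of the bitmap in place without truncating the rest
dd if=tmp2 of=tmp1.bmp conv=notrunc
# Convert the result to PNG for posting
convert tmp1.bmp outputimage.png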
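To work out how much blank space the image needs before you start, the arithmetic above can be sketched in Python. The function name blank_rows_needed is invented for this example. It assumes a 24-bit bitmap (3 bytes per pixel) and a width that is a multiple of 4, as the recipe requires, so that each row is exactly 3 times the width in bytes.

```python
BYTES_PER_PIXEL = 3  # 24-bit bitmap


def blank_rows_needed(archive_size: int, image_width: int) -> int:
    """Number of full pixel rows the archive data will overwrite."""
    if image_width <= 0 or image_width % 4 != 0:
        raise ValueError("width must be a positive multiple of 4")
    # Ceiling division without floating point: -(-a // b)
    pixels = -(-archive_size // BYTES_PER_PIXEL)
    return -(-pixels // image_width)


if __name__ == "__main__":
    # A 1 MiB archive in an 800-pixel-wide image
    print(blank_rows_needed(1024 * 1024, 800))  # 437
```

So an 800-pixel-wide image needs a band of 437 rows to carry a 1 MiB archive, which gives an idea of how large the gray band gets.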

Other tools for creating them can be found bundled in the image to the right.

Snowcrash format

A guide on making and extracting snowcrash images.

Archives and other types of files embedded in Photoshop RAW files and converted to PNG are also sometimes posted on 4chan. These have become known as "snowcrashes."

Steganography

Steganography is the art of hiding messages. Many of the things we've discussed above could be called steganography, but usually the term is used for schemes that are a bit harder to detect.

One of the simplest and most common forms of digital steganography is to embed files in the image data, but only to use the least significant bit (LSB) of each byte. For example, if the original image contained the bytes 00011000 01000001 01100001 01010010, you could embed a message (example: 1010) by changing the bits in the ones position: 00011001 01000000 01100001 01010010. Additional data can be stored by using multiple bits. If the number of bits used is small, the change will generally not be detectable by eye.
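The example above can be reproduced with a few lines of Python. This is a bare-bones sketch of single-bit LSB embedding on a raw byte string, with function names made up for the example. A real tool would operate on decoded pixel data and would also need to store the message length somewhere.

```python
def embed_lsb(cover: bytes, bits) -> bytes:
    """Write each message bit into the lowest bit of successive cover bytes."""
    bits = list(bits)
    if len(bits) > len(cover):
        raise ValueError("message is longer than the cover data")
    out = bytearray(cover)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | (bit & 1)  # clear the low bit, then set it
    return bytes(out)


def extract_lsb(stego: bytes, count: int) -> list:
    """Read back the lowest bit of the first `count` bytes."""
    return [b & 1 for b in stego[:count]]


if __name__ == "__main__":
    cover = bytes([0b00011000, 0b01000001, 0b01100001, 0b01010010])
    stego = embed_lsb(cover, [1, 0, 1, 0])
    print(" ".join(f"{b:08b}" for b in stego))
    # 00011001 01000000 01100001 01010010
    print(extract_lsb(stego, 4))  # [1, 0, 1, 0]
```

Only the first two bytes actually change, because the last two already had the right low bit.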

The use of LSB steganography on 4chan has seen an increase recently, starting in late 2012 when moot blocked the upload of 4chan Sound images, and in response dnsev wrote a sound player for sounds encoded in the lowest few bits (typically around 4) of each byte of PNG image data. Another LSB steganography program designed for sharing files on 4chan is the 4chan Gold File Embedder (Java source inside JAR archive).

Google will find you all sorts of programs which claim to be steganography utilities. Some of them actually are; others are really just file binders or embedded archive makers as described above. And many of the programs that actually are steganography have serious flaws. See [7] for some details.

You will often hear that steganography is undetectable, but this is a myth. In particular, no algorithm can hide an arbitrary 3 MB file inside an image without making the size of the image at least 3 MB. Generally, the larger the file you're trying to hide, the easier it is to detect. Data hidden with programs designed to share megabyte-size files on 4chan, such as the LSB-based programs above, may hide files from moot's filter or your little sister, but they won't deter the Party Van. The hidden files can be detected by such methods as looking for high-frequency noise or analyzing the color histograms.
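The size argument is easy to check for the LSB case. The sketch below computes how many bytes fit in an uncompressed image when a fixed number of low bits per color byte is used. The function name and the default of 3 channels (RGB, no alpha) are assumptions for the example.

```python
def lsb_capacity_bytes(width: int, height: int, channels: int = 3,
                       bits_per_byte: int = 1) -> int:
    """Payload capacity in bytes when using the low bits of each color byte."""
    if not 1 <= bits_per_byte <= 8:
        raise ValueError("bits_per_byte must be between 1 and 8")
    return width * height * channels * bits_per_byte // 8


if __name__ == "__main__":
    print(lsb_capacity_bytes(1920, 1080, bits_per_byte=1))  # 777600
    print(lsb_capacity_bytes(1920, 1080, bits_per_byte=4))  # 3110400
```

Even at 4 bits per byte, a 1920x1080 RGB image tops out just short of 3 MiB (3145728 bytes), and at that point half of every color value is payload rather than picture.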

If you want to hide short text messages in a JPEG, you're in somewhat better shape. But even for the best steganography algorithms out there, experts are constantly searching for and finding ways of detecting the files they hide. Some steganography methods that have performed well in tests are Modified Matrix Embedding and Perturbed Quantization. A C implementation of the original, weaker version of PQ is available here. Some other tools for hiding short messages in JPEGs include F5, OutGuess, and steghide. OutGuess was notably used in the Cicada 3301 puzzle. Be aware that in order for the messages to be hard to detect, you should embed them in photographs. If you use computer graphics, especially those containing large chunks of solid color, embedding files steganographically will create noticeable artifacts. See the images below for examples of these artifacts.

