A while back I heard some samples of the difference between lossless files and their compressed forms (MP3) and thought it seemed pretty cool to hear what data the compression algorithms are throwing away. I just got around to trying this for myself. Here's what I did:

  • First things first, you'll need sox with the appropriate format libraries installed. Optionally, install octave and vorbis-tools (or a compressor of your choice). On Debian: apt-get install octave sox libsox-fmt-all vorbis-tools
  • Start with a lossless file, e.g. a FLAC or a WAV ripped from a CD.

$ sox ~/Music/Dalriada/Jegbonto/05\ -\ Téli\ ének.flac original.wav

  • Convert it to a lossy format. I'd recommend using Vorbis, since I had trouble with MP3 (see below). For example:

$ oggenc -q 5 -o lossy.ogg original.wav

  • Convert the compressed file and the original to raw (headerless) format. Check that they're the same size, since the next step subtracts them sample by sample.

$ sox original.wav original.raw
$ sox lossy.ogg lossy.raw
$ ls -l *raw
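If the sizes do differ, it helps to know by how many frames rather than by how many bytes. Here's a minimal sketch, assuming 16-bit stereo audio (so 4 bytes per frame, matching the sox flags used further down). The name `frame_diff` is just something I made up for illustration:

```shell
#!/bin/sh
# Assumption: 16-bit samples and 2 channels, so one frame is 2 * 2 = 4 bytes.
BYTES_PER_FRAME=4

# frame_diff A_BYTES B_BYTES
# Prints how many more frames the first size holds than the second
# (negative if it holds fewer). Integer division truncates toward zero.
frame_diff() {
    echo $(( ($1 - $2) / BYTES_PER_FRAME ))
}

# Only look at the files if both exist, so the script is harmless elsewhere.
if [ -f original.raw ] && [ -f lossy.raw ]; then
    a=$(wc -c < original.raw)
    b=$(wc -c < lossy.raw)
    echo "original.raw: $a bytes, lossy.raw: $b bytes"
    echo "difference: $(frame_diff "$a" "$b") frames"
fi
```

A difference of 0 frames means the two files can be subtracted directly.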

  • Start octave or matlab and run the following commands to subtract the files as 16-bit signed ints (alternative follows):

> original=fread(fopen('original.raw'),Inf,'int16');
> lossy=fread(fopen('lossy.raw'),Inf,'int16');
> fwrite(fopen('out.raw','w'),original-lossy,'int16');
> fclose('all');  % flush out.raw to disk and release the file handles

  • Alternatively, if you don't have octave and don't want to install it, Techmeology provides some C code to achieve the same thing.

$ gcc -o compare compare.c
$ ./compare original.raw lossy.raw > out.raw
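A third option, which I haven't compared against the two above, is to let sox do the subtraction itself: mix the two files with `-m` and give the lossy one a volume of -1 so it is inverted. This is only a sketch. The wrapper `null_test` and the output name `diff.wav` are made up for illustration, and it assumes both inputs have the same sample rate and channel count. Setting `-v` on both inputs matters, because otherwise sox scales each input down when mixing.

```shell
#!/bin/sh
# null_test ORIGINAL LOSSY OUTPUT
# Mixes ORIGINAL at volume 1 with LOSSY at volume -1 (inverted),
# which leaves only the difference between the two in OUTPUT.
null_test() {
    sox -m -v 1 "$1" -v -1 "$2" "$3"
}

# Only run if sox is installed and both input files are present.
if command -v sox >/dev/null 2>&1 && [ -f original.wav ] && [ -f lossy.ogg ]; then
    null_test original.wav lossy.ogg diff.wav
fi
```

Since sox reads the WAV and Ogg files directly, this route skips the raw conversion, and the result is already a WAV.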

  • Now convert the result to a usable format (e.g. WAV). Optionally, you can amplify it to bring it up to an audible volume level. Here I've applied 10 dB of gain:

$ sox -r 44100 -t raw -e signed-integer -c 2 -b 16 out.raw out.wav gain +10

And there you have it. I was left with a sort of haunting, ghostly screaming sound. I would share it but, you know, copyright and all that. If anyone can suggest some Creative Commons music to publish samples of, though, I'd be happy to do that.

I have noticed a problem though: by converting to MP3 and back you seem to end up with fewer samples. I've tried padding the shorter file out with zeros before the comparison, but then the difference file becomes just a choppier-sounding version of the original. If anyone knows how to resolve this, please do let me know.