Encoding an image to sound
Submitted by esalazar on Mon, 01/07/2008 - 8:36pm.
The purpose of this project is to encode an image into a sound so that the image can be viewed with a spectrogram. For some time I have known that musical artists have encoded pictures into their music. The most notable of these artists is Aphex Twin. Luckily, I had a copy of Windowlicker and a great visualization program, Sonic Visualiser. After looking at the images, I decided it would be cool to try encoding my own images. I saw a few programs available, but decided it would be a better challenge to write my own program from scratch in Perl.
Spectrograms
A spectrogram is a graph that shows the intensity of each frequency over time. Normally the frequencies are along the Y axis and time is on the X axis. The intensity of each frequency is represented by the brightness of the color. Both the frequency axis and the color can use either a linear or a logarithmic scale. Below is a spectrogram of a few piano chords. The audio file used can be found on Wikipedia here.
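A spectrogram is built by slicing the audio into short frames and taking the magnitude of each frame's Fourier transform; each frame becomes one column of the picture, and each frequency bin one row. The sketch below computes a single such column with a naive discrete Fourier transform. It is O(N²) and only meant to illustrate the idea (real tools use an FFT); the sub name `frame_magnitudes` and the 256-sample frame are my own choices.

```perl
use strict;
use warnings;
use Math::Trig;    # provides pi

# Magnitude spectrum of one frame via a naive DFT.
# Bin k corresponds to frequency k * $rate / N.
sub frame_magnitudes {
    my ($frame) = @_;
    my $n = scalar @$frame;
    my @mag;
    for my $k (0 .. int($n / 2)) {
        my ($re, $im) = (0, 0);
        for my $t (0 .. $n - 1) {
            my $angle = 2 * pi * $k * $t / $n;
            $re += $frame->[$t] * cos($angle);
            $im -= $frame->[$t] * sin($angle);
        }
        push @mag, sqrt($re * $re + $im * $im);
    }
    return \@mag;
}

# Example: one 256-sample frame of a 440 Hz tone sampled at 8000 Hz.
my $rate  = 8000;
my $n     = 256;
my @frame = map { sin(2 * pi * 440 * $_ / $rate) } 0 .. $n - 1;
my $mag   = frame_magnitudes(\@frame);

# Find the loudest bin and convert it back to a frequency.
my $peak = 0;
for my $k (1 .. $#$mag) {
    $peak = $k if $mag->[$k] > $mag->[$peak];
}
printf "peak bin %d = %.1f Hz\n", $peak, $peak * $rate / $n;
```

Each bin here is 8000 / 256 = 31.25 Hz wide, so the tone shows up in bin 14 (437.5 Hz), the bin closest to 440 Hz. Shorter frames give coarser frequency bins but sharper timing, which is the usual spectrogram trade-off.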
Image encoding
My idea for encoding the image was simple: for each pixel, create a sine wave whose frequency corresponds to the pixel's position on the Y axis, whose timing corresponds to its position on the X axis, and whose amplitude corresponds to the pixel's color intensity.
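The mapping can be written as one small function per pixel. This is a minimal sketch under my own assumptions, not the author's code: a linear frequency scale between hypothetical `$min_freq` and `$max_freq` values, a fixed hypothetical duration per image column, and row 0 at the top of the image, so it gets the highest frequency (matching a spectrogram, which draws high frequencies at the top).

```perl
use strict;
use warnings;

# Hypothetical encoding parameters (not taken from the original program).
my $min_freq = 200;     # Hz for the bottom row of the image
my $max_freq = 8000;    # Hz for the top row of the image
my $col_secs = 0.05;    # seconds of audio per image column
my $max_amp  = 1000;    # peak amplitude on the 16-bit scale

# Map pixel (x, y), with intensity 0..1, to (start time, frequency, amplitude).
sub pixel_to_tone {
    my ($x, $y, $height, $intensity) = @_;
    my $start = $x * $col_secs;
    # Linear scale: row 0 (top) -> max_freq, last row (bottom) -> min_freq
    my $frac = $height > 1 ? ($height - 1 - $y) / ($height - 1) : 0;
    my $freq = $min_freq + $frac * ($max_freq - $min_freq);
    my $amp  = $intensity * $max_amp;
    return ($start, $freq, $amp);
}

my ($start, $freq, $amp) = pixel_to_tone(3, 0, 100, 0.5);
printf "t=%.2fs f=%.0fHz a=%.0f\n", $start, $freq, $amp;
```

All pixels in the same column share a start time, so a column turns into a chord of simultaneous sine waves, one per row.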
Creating Sound
The first step in encoding an image was to learn how audio formats work. At first I tried writing a script that plays a frequency through '/dev/dsp' (the sound card device on Linux). When writing straight to /dev/dsp without configuring it, you are limited to its defaults: a sample rate of 8000 Hz and a sample size of 8 bits. Below is a simple Perl script that plays concert A (440 Hz). To execute it, run './sin.pl > /dev/dsp'.
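#!/usr/bin/perl
# sin.pl: play concert A (440 Hz) as unsigned 8-bit samples at 8000 Hz
# Usage: ./sin.pl > /dev/dsp
use strict;
use warnings;
use Math::Trig;          # provides pi
use POSIX qw(floor);     # provides floor()
my $sample = 8000;       # samples per second (the /dev/dsp default)
my $frequency = 440;     # concert A
my $cycles = 6;          # whole cycles generated per block
# Number of samples that hold (about) $cycles periods of the tone
my $period = POSIX::floor($sample / $frequency * $cycles);
binmode STDOUT;          # raw bytes, no newline translation
while (1) {
for(my $i=1;$i<=$period;$i++)
{
# Centre the wave on 128 (silence for unsigned 8-bit audio);
# a swing of 127 keeps every value inside 0..255
my $x = 128 + sin($cycles * 2 * pi * $i / $period) * 127;
$x = POSIX::floor($x);
print pack("C", $x);
}
}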
The DSP defaults do not offer much fidelity; I needed at least the fidelity of an audio CD, which is 16 bits at 44.1 kHz. I searched CPAN for a library that would let me write wave files, but most of the audio libraries had too much overhead for what I wanted to do. Instead, I looked up the '.wav' file format and coded my own library. This library can only produce 16-bit, 44.1 kHz mono wave files.
#!/usr/bin/perl
#Author Evan Salazar
#--------------------------------------------
#
#Generate a .wav file for 16 bit mono PCM
#
#-------------------------------------------
use strict;
package SimpleWave;
sub genWave {
#Get the reference to the data array
my ($audioData) = @_;
#This is the default sample rate
my $samplerate = 44100;
my $bits = 16;
my $samples = $#{$audioData} + 1;
my $channels = 1;
#Do Calculations for data wave headers
my $byterate = $samplerate * $channels * $bits / 8;
my $blockalign = $channels * $bits / 8;
my $filesize = $samples * ($bits/8) * $channels + 36;
#RIFF Chunk;
my $riff = pack('a4Va4','RIFF',$filesize,'WAVE');
#Format Chunk
my $format = pack('a4VvvVVvv',
'fmt ',
16,1,
$channels,
$samplerate,
$byterate,
$blockalign,
$bits);
#Data Chunk
my $dataChunk = pack('a4V','data',$blockalign * $samples);
#Pack the audioData array as 16-bit little-endian samples
my $data;
for(my $i=0;$i<$samples;$i++) {
$data .= pack('v',$audioData->[$i]);
}
#Return a byte string of the wave
return $riff . $format . $dataChunk. $data;
}
1;
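One way to check what `genWave` writes is to read the 44-byte header back with `unpack`, using the same templates. The helper `parse_wav_header` below is hypothetical (it is not part of SimpleWave); the demo builds a header with the same `pack` calls as `genWave` for one second of audio (44,100 samples) and unpacks it again.

```perl
use strict;
use warnings;

# Unpack the canonical 44-byte PCM WAV header into a hash.
sub parse_wav_header {
    my ($bytes) = @_;
    die "too short for a WAV header\n" if length($bytes) < 44;
    my %h;
    # Whitespace in pack/unpack templates is ignored; it only groups the chunks.
    @h{qw(riff riff_size wave fmt fmt_size format channels
          rate byterate blockalign bits data data_size)}
        = unpack('a4Va4 a4VvvVVvv a4V', $bytes);
    die "not a RIFF/WAVE file\n" unless $h{riff} eq 'RIFF' && $h{wave} eq 'WAVE';
    return \%h;
}

# Build a header the same way SimpleWave::genWave does, for 44100 samples.
my ($rate, $bits, $channels, $samples) = (44100, 16, 1, 44100);
my $blockalign = $channels * $bits / 8;
my $header = pack('a4Va4', 'RIFF', $samples * $blockalign + 36, 'WAVE')
           . pack('a4VvvVVvv', 'fmt ', 16, 1, $channels, $rate,
                  $rate * $blockalign, $blockalign, $bits)
           . pack('a4V', 'data', $samples * $blockalign);

my $h = parse_wav_header($header);
printf "%d Hz, %d-bit, %d ch, %d data bytes, RIFF size %d\n",
    @$h{qw(rate bits channels data_size riff_size)};
```

For one second of 16-bit mono audio this gives 88,200 data bytes (44,100 samples × 2 bytes) and a RIFF size of 88,236: the data plus the 36 header bytes that follow the RIFF size field.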
Reading a Bitmap
Luckily, I found a simple bitmap reader on CPAN called Image::BMP. This is a nice lightweight library that does not depend on any external libraries or compiled code. Using this library, I was able to easily load and read the bitmap data.
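I am not reproducing Image::BMP's own interface here. As a stand-in, this sketch shows what a minimal reader has to do in the simplest case, an uncompressed 24-bit BMP: read the pixel-data offset and dimensions from the headers, then walk the rows, which are stored bottom-up in B, G, R byte order and padded to a multiple of 4 bytes. The sub `read_bmp24` is hypothetical. A real file would be slurped in binary mode (`open` plus `binmode`) and passed in as a byte string; the demo builds a tiny 2×2 image in memory instead.

```perl
use strict;
use warnings;

# Parse an uncompressed 24-bit BMP held in a byte string.
# Returns (width, height, pixels) with pixels->[y][x] = [R, G, B], y = 0 at the top.
sub read_bmp24 {
    my ($bytes) = @_;
    my ($magic, $offset) = unpack('a2 x8 V', $bytes);    # pixel data offset at byte 10
    die "not a BMP\n" unless $magic eq 'BM';
    my ($width, $height, $planes, $bpp, $compression)
        = unpack('x18 V l< v v V', $bytes);              # BITMAPINFOHEADER fields
    die "only uncompressed 24-bit BMPs are handled\n"
        unless $bpp == 24 && $compression == 0;
    my $bottom_up = $height > 0;                         # negative height = top-down
    $height = abs $height;
    my $row_size = int(($width * 3 + 3) / 4) * 4;        # rows padded to 4 bytes
    my @pixels;
    for my $row (0 .. $height - 1) {
        my $y     = $bottom_up ? $height - 1 - $row : $row;
        my $start = $offset + $row * $row_size;
        for my $x (0 .. $width - 1) {
            my ($blue, $green, $red) = unpack('C3', substr($bytes, $start + 3 * $x, 3));
            $pixels[$y][$x] = [$red, $green, $blue];
        }
    }
    return ($width, $height, \@pixels);
}

# A 2x2 test image: top row red, green; bottom row blue, white (stored bottom-up).
my $row_bottom = pack('C6 x2', 255, 0, 0, 255, 255, 255);    # B,G,R: blue, white
my $row_top    = pack('C6 x2', 0, 0, 255, 0, 255, 0);        # B,G,R: red, green
my $data = $row_bottom . $row_top;
my $bmp = pack('a2 V v v V', 'BM', 54 + length $data, 0, 0, 54)
        . pack('V l< l< v v V V l< l< V V',
               40, 2, 2, 1, 24, 0, length $data, 2835, 2835, 0, 0)
        . $data;

my ($w, $h, $px) = read_bmp24($bmp);
print "top-left RGB = @{ $px->[0][0] }\n";
```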
Encoding the Image
The first pass of my program disregarded the color data and only produced a tone at a pixel's Y-axis frequency if its color intensity was less than half the maximum sum of all color channels (that is, only for dark pixels). Below is an example. Note: I converted the WAV to an MP3 to conserve bandwidth; at 320 kbps, little data is lost.
Audio File: ohmpie.mp3
I was really shocked when I first saw the image! The only tweaking I needed to do was to use a linear scale for the frequency. Also, if I chose too high an amplitude for the sine wave, clipping occurred in areas with too much black. For the image above, I used an amplitude of about 1000 on a scale of 0 to 32767.
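To see why the amplitude had to be small: every dark pixel in a column adds its own sine wave, and where their peaks line up the column's peak can approach the number of tones times the amplitude. For a fully black column 100 pixels tall at amplitude 1000, that bound is 100 × 1000 = 100,000, about three times the 16-bit limit of 32,767. The sketch below implements the first pass for one column and then clamps the result to 16 bits, counting the clipped samples. The frequency range, column length, 768 / 2 darkness threshold, and the sub names `is_dark`, `encode_column` and `clamp16` are my assumptions, not the author's code; only the amplitude of about 1000 comes from the post.

```perl
use strict;
use warnings;
use Math::Trig;    # provides pi

# Hypothetical parameters; only the amplitude of about 1000 comes from the post.
my $rate      = 44100;    # samples per second
my $col_len   = 2205;     # samples per image column (0.05 s)
my $min_freq  = 200;      # Hz for the bottom row
my $max_freq  = 8000;     # Hz for the top row
my $amplitude = 1000;     # fixed amplitude for every dark pixel

# First pass: a pixel sounds only if R + G + B is under 768 / 2 (assumed threshold).
sub is_dark {
    my ($red, $green, $blue) = @_;
    return ($red + $green + $blue) < 768 / 2;
}

# Synthesize column $x; $column->[$y] = [R, G, B] with y = 0 at the top.
sub encode_column {
    my ($column, $x) = @_;
    my $height  = scalar @$column;
    my $t0      = $x * $col_len;    # global sample offset keeps phase continuous
    my @samples = (0) x $col_len;
    for my $y (0 .. $height - 1) {
        next unless is_dark(@{ $column->[$y] });
        # Linear frequency scale: top row -> max_freq, bottom row -> min_freq
        my $frac = $height > 1 ? ($height - 1 - $y) / ($height - 1) : 0;
        my $freq = $min_freq + $frac * ($max_freq - $min_freq);
        for my $t (0 .. $col_len - 1) {
            $samples[$t] += $amplitude * sin(2 * pi * $freq * ($t0 + $t) / $rate);
        }
    }
    return \@samples;    # unclamped floating-point sums
}

# Clamp samples to the signed 16-bit range and count how many were clipped.
sub clamp16 {
    my ($samples) = @_;
    my ($clipped, @out) = (0);
    for my $s (@$samples) {
        if    ($s >  32767) { push @out,  32767; $clipped++ }
        elsif ($s < -32768) { push @out, -32768; $clipped++ }
        else                { push @out, int $s }
    }
    return (\@out, $clipped);
}

# Worst case: a fully black column 100 pixels tall.
my $black = [ map { [0, 0, 0] } 1 .. 100 ];
my ($pcm, $clipped) = clamp16(encode_column($black, 0));
printf "%d of %d samples clipped\n", $clipped, scalar @$pcm;
```

Each column's clamped samples would then be concatenated, left to right, into the array handed to `SimpleWave::genWave`.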
The next step was to scale the amplitude to match the color intensity. For this, I summed all the color channels of a given pixel and scaled the result to the maximum amplitude: '(R + G + B) / 768 * max_amplitude'. Below is a picture of me after applying the scaling.

Audio File: evan.mp3
By selecting a color scheme that goes from black to white and using a linear scale for the volume, I get a very good black-and-white image. To prevent clipping on very dark images, I added an inverse option that inverts the colors, producing a negative image.
Audio File: evanInv.mp3
You can reverse the spectrogram's color scheme to go from white to black to see the regular image.
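The amplitude scaling '(R + G + B) / 768 * max_amplitude' and the inverse option can be sketched as one function. The name `pixel_amplitude` and the `$invert` flag are hypothetical, and I have assumed the inverse option replaces each 8-bit channel with 255 minus its value. With 8-bit channels, R + G + B is at most 765, so dividing by 768 leaves a pure white pixel just under `max_amplitude` (765 / 768 × 1000 ≈ 996.1).

```perl
use strict;
use warnings;

# Amplitude for one pixel: (R + G + B) / 768 * max_amplitude.
# With $invert set, each channel becomes 255 - value first (a negative image).
sub pixel_amplitude {
    my ($red, $green, $blue, $max_amplitude, $invert) = @_;
    ($red, $green, $blue) = map { 255 - $_ } ($red, $green, $blue) if $invert;
    return ($red + $green + $blue) / 768 * $max_amplitude;
}

printf "white:           %.1f\n", pixel_amplitude(255, 255, 255, 1000);
printf "black:           %.1f\n", pixel_amplitude(0, 0, 0, 1000);
printf "black, inverted: %.1f\n", pixel_amplitude(0, 0, 0, 1000, 1);
```

Inverting turns a mostly dark image, where almost every pixel would otherwise be loud or quiet together, into its negative before the amplitudes are computed; the spectrogram's color scheme can then be flipped to display the original picture.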
Full Program
Below you can view and/or download the full code for this program. Performance is not optimized yet, so please don't write to tell me it's slow; I have a few ideas to speed it up. For best results, use a small image around 100 px tall.
Download: imageEncode-0.7.tar.gz



First, I had a question.
Are you producing multiple frequencies for each x position when you have multiple pixels in the column? Or are you producing a single frequency from left to right for each pixel and then going down row by row?
now the fun part
ok so now do the reverse.
Convert dolphin speak to an image so we can see what they are saying if they are speaking in pictograms!
Seriously, write a backwards version of the code and let's put some online dolphin and whale recordings through it and see what pictures they make.
I have always suspected that dolphin speak might be audio encoded pictograms.
www.vivzizi.com
Your comments don't show?
I am producing one frequency for each pixel: the pixel's position on the X axis sets its time offset, its position on the Y axis determines the frequency, and its intensity sets the amplitude.
The reverse is done with the spectrogram.
Once again, you blow me away. The depth of this is awesome, and I absolutely love the idea - I'm going to personally use this on some of my songs.
Something even better seen on KVR: http://www.kvraudio.com/forum/viewtopic.php?t=202374
"Perl programmer Evan Salazar has created a cool little app that encodes an image in an audio file. Essentially,..."
I like your site.
Very cool program. I'm rendering some .wav files for myself and noticed that at each horizontal pixel there is a small audio slice where the whole spectrum seems lit up. This, in turn, produces a click at every pixel. Do you know what causes this?
If you look at the 'Resolution issues' section of this page, http://en.wikipedia.org/wiki/Short-time_Fourier_transform,
it explains how different STFT time resolutions blur the area between samples.
Evan
Well, I wouldn't say better, but it does a similar thing.
An article about that led me to this program, which also does a similar thing.
http://www.hitsquad.com/smm/programs/Coagula_win32/
cheers,
Geo
http://www.vivzizi.com
I followed some links and found
two free programs that also do a similar thing.
AudioPaint
http://www.softpedia.com/get/Multimedia/Audio/Other-AUDIO-Tools/AudioPai...
and one called Coagula that does a similar thing
http://www.hitsquad.com/smm/programs/Coagula_win32/
Yours is best though because it is open source and we can play with it!
Thanks!
cheers,
Geo
http://www.vivzizi.com
Evan,
Very nice job!
I have worked on two similar projects which you may find of interest:
* Opensonify: a very incomplete open source clone of the webcam-to-sound software called The vOICe. You can download the code from CVS. It is written in C.
* Graphics on an Oscilloscope using Audio CDs: a first post and a follow-up post. This essentially converts graphics into stereo-sound which is rendered on a scope in XY mode.
From working with these projects I learned a few things that might help you out:
* You said "The DSP defaults do not offer much fidelity I needed at least the fidelity of an audio CD, which is 16bits at 44.1khz..."
The easy way around this is simply to set the /dev/dsp device to a different bits-per-sample and sample rate. This can be done with ioctl() calls, at least in C. I learned how to do this from this page.
If you then want to make wav files, I found the easiest thing to do was not to use a wav library but rather to simply write the data that would otherwise go to /dev/dsp into a file, like "sound.out". Then I opened this file in Audacity as "Raw Audio Data", making sure to select the correct parameters. Then I could save in any format Audacity supports.
I don't know about you, but I ran into lots of problems writing to /dev/dsp because other applications were using the sound card. The OSS emulation (/dev/dsp) is really a legacy feature provided by ALSA, so it would make more sense to just work with ALSA in the first place. This article is where I (successfully) learned the basics of ALSA programming.
* You say that there is not much loss due to compression in your mp3s. I found the same was more-or-less true with my graphics->sound->scope system, except when I had multiple shapes on the screen with a large displacement from one another (orbiting circles). The compression made the scope traces quite fuzzy.
* Definitely check out "The vOICe" software. This technology uses the same kind of background engine you do for turning images into sound, but they use it to help the blind see. I was so interested in this software (and frustrated that it didn't run on Linux) that I began an open source clone of it called OpenSonify. I got the core engine working but never took the time to add features to it. Feel free to check it out on Sourceforge (linked above).
One thing that never occurred to me while working on it was to use a spectral analyzer in real time! I will have to get out the old webcam and give this a try!
Thanks very much for your code, I will reply again if/when I get your perl code working on my machine.
Yours,
Chris Merck
http://hotwigati.blogspot.com/
ok, yeah that made me hard. now i have to write one in QB or VB. dammit, i had plans for this weekend too. buttmunch. sweet program. pure 1337.
What size was the picture of your face, what parameters did you use, and how many hours (days) did it take to run ?
I have a fast system but the perl script sure is slow.
"The_vOICe" produces excellent results almost immediately.