Encoding an image to sound

The purpose of this project is to encode an image to a sound that can be viewed with a spectrogram. For some time I have known that musical artists have encoded pictures into their music. Most notable of these is artists is Aphex Twin. Luckily I had a copy of Windolicker and a great visualization program Sonic Visualiser. After looking at the images I decided it would be cool to try and encode my own images. I saw a few programs available, but decided it would be a better challenge to write my own program from scratch using Perl.


A spectrogram is a graph representing the intensity or a frequency with relation to time. Normally the frequencies are along the Y axis, with the time on the X axis. The intensity of the frequency is represented by the brightness of the color. The frequency and color can use either a linear scale or a logarithmic scale. Below is an spectrogram of a few piano chords. The audio file used can be found on Wikipedia here.

Image encoding

The idea I had to encode the image was to simply create a sine wave at a corresponding frequency to represent the Y axis, a corresponding time to represent the X axis and a corresponding amplitude to represent the pixel color intensity.

Creating Sound

The first step to encoding an image was to learn how audio formats work. At first I tried writing a script that plays a frequency to the ‘/dev/dsp’ (Which is the sound card on Linux). When writing straight to /dev/dsp you are limited by a sample rate of 8000hz and a sample size of 8bits. Below simple Perl script that plays a concert A 440hz. To execute run ‘./sin.pl > /dev/dsp’.

use Math::Trig;
use strict;
use POSIX;

my $sample = 8000;
my $frequency = 440;
my $cycles = 6;
my $period = POSIX::floor($sample / $frequency * $cycles);

while (1) {
for(my $i=1;$i<=$period;$i++)
my $x = 128 + sin($cycles * 2 * pi * $i / $period) * 128;
$x = POSIX::floor($x);
my $char = pack(C,$x);
print $char color=”#ff00ff”>”

The DSP defaults do not offer much fidelity I needed at least the fidelity of an audio CD, which is 16bits at 44.1khz. I did some of searching on CPAN to find a library that allowed me write wave files. Most of the audio libraries had a too much overhead for what I wanted to do. Instead I looked up the file format for a ‘.wav’ and coded my own library. This library is limited to only producing a 16bit 44.1khz mono wave.

#Author Evan Salazar
#Generate a .wav file for 16 bit mono PCM
use strict;
package SimpleWave;

sub genWave {

#Get the reference to the data array

my ($audioData) = @_;

#This is the default sample rate
my $samplerate = 44100;
my $bits = 16;
my $samples = $#{$audioData} + 1;

my $channels = 1;

#Do Calculations for data wave headers
my $byterate = $samplerate * $channels * $bits / 8;
my $blockalign = $channels * $bits / 8;
my $filesize = $samples * ($bits/8) * $channels + 36;

#RIFF Chunk;
my $riff = pack(a4Va4,RIFF,$filesize,WAVE);

#Format Chunk
my $format = pack(a4VvvVVvv,
fmt ,

#Data Chunk
my $dataChunk = pack(a4V,data,$blockalign * $samples);

#Read audoData array
my $data;
for(my $i=0;$i<$samples;$i++) {

$data .= pack(v,$audioData->[$i]);

#Return a byte string of the wave
return $riff . $format . $dataChunk. $data;

Reading a Bitmap

Luckily I found a simple bitmap reader on CPAN called Image::BMP. This is a nice lightweight library that dose not depend on any external libraries or compiled code. Using this library I was able to easily load and read the bitmap data.

Encoding the Image

The first pass of my program disregarded the color data and only produced a frequency for the Y axis if the color intensity was less that half the sum of all colors. Below is an example. Note: I converted the WAV to an MP3 to conserve bandwidth, at 320kbps not much data is lost.

Audio File: ohmpie.mp3

I was really shocked to fist see the image! The only tweaking I needed to do was to use a linear scale for the frequency. Also if I selected too high an amplitude for the sin wave, clipping occurred in areas with too much black. For image above I used an amplitude of about 1000 on a scale of 0 to 32768.

The next step was to add amplitude scaling to match the color intensity. For this I summed all the color channels for a given pixel and scaled it to represent the max amplitude ‘(R + G + B) / 768 * max_amplitude’. Below is a picture of me after using the scaling.

Audio File: evan.mp3

By selecting a color scheme that goes from black to white and using a linear scale for the volume I get a very good black and white image. To prevent clipping on very dark images I added an inverse option that will invert the color producing a negative image.

Audo File: evanInv.mp3

You can reverse the color scheme to go from white to black to produce the regular image

Full Program

Below you can view and/or download the full code to this program. Currently performance is not optimized. So don’t write me telling me its slow. I currently have a few idea to speed it up. Also for best results use a small image around 100px tall.

Download: imageEncode-0.7.tar.gz

Leave a Reply

Your email address will not be published. Required fields are marked *