Hack Jam Log Book is a log of progress made in and around a weekly hack session. Topics include natural language processing, high energy electronics, linguistics, interface design, &c. Enjoy.

31.10.08

 

Livescribe Pulse Hacking I: The Frequency Domain

One of the frustrating things about the exciting livescribe pulse is its inability to communicate with linux. Even through virtualization of a windows machine, you'll be lucky to get a working system.

Another is the fact that, while programs built with the livescribe pre-release SDK can save files to the pen, they cannot send those files to the desktop machine --- the only way to interact with program data is through the pen.

Well, I say pooh to that: "Pooh!"

Since I was first able to program my livescribe pulse (in a rude rendition of 'hello, world'), I've been scheming on how to communicate with the pen through hackish means. While reverse engineering the livescribe communications protocol would have been the correct way to do it, I feared it was, and is, beyond my means and motivation. I concentrated instead on the other two ways the pen has of communicating: the screen and the speaker.

My original plan had been to use light sensors, an AVR, and a program that would flash the pen's OLED display in order to send data back to the desktop computer. While this is plausible, the narrow bandwidth of the light produced by the OLED display makes it difficult to sense --- and the communication would always be one way. Besides, special hardware would be necessary, and that would make the potential user base tiny: linux-using livescribe owners who have the knowledge and hardware to program microcontrollers.

No, there must be a better way; so we turn to the speaker. Most computer owners have a microphone on hand; and communicating via tones is a tried and true method. My path is clear: create a livescribe modem!

In order to receive information, the computer must be able not just to record the sounds coming from the pen, but to analyze them. To write software that can do that, we must have a firm understanding of the frequency domain. (Prerequisites note: If you don't have a firm understanding of the concepts frequency, wavelength, and sinusoidal graphs, it's time for a trip to wikipedia.)

The central nugget of the matter goes like this: any reasonably well-behaved periodic function can be represented as an infinite sum of sines and cosines.

Give that a minute to sink in; the function doesn't have to be differentiable, it can even jump a finite number of times per period, and the period can be as long as you like. Square waves, sawtooths, Bart Simpson's hair; any such waveform can be represented by a simple sum of sines and cosines.

Wow.

You may be worried about that word 'infinite' up there. Luckily, it turns out that we can approximate our function with some finite number k of sines and cosines; and that each time we increase k, the approximation becomes more accurate.
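To make that a little more concrete, here is a minimal sketch that approximates a square wave with the first k terms of its Fourier series and measures how far off the approximation is. The function names (square_wave, partial_sum, rms_error) and the choice of a square wave with period 2*pi are mine, picked for illustration only; the error measure is a root-mean-square average over one period, which is one reasonable reading of "more accurate".

```python
import math

def square_wave(t):
    """Ideal square wave with period 2*pi: +1 on the first half, -1 on the second."""
    return 1.0 if (t % (2 * math.pi)) < math.pi else -1.0

def partial_sum(t, k):
    """Sum of the first k non-zero terms of the square wave's Fourier series.

    Only odd harmonics appear: (4/pi) * [sin(t) + sin(3t)/3 + sin(5t)/5 + ...]
    """
    return (4 / math.pi) * sum(
        math.sin((2 * n + 1) * t) / (2 * n + 1) for n in range(k)
    )

def rms_error(k, samples=2000):
    """Root-mean-square gap between the square wave and its k-term approximation."""
    total = 0.0
    for i in range(samples):
        # Sample at midpoints so we never land exactly on a jump.
        t = 2 * math.pi * (i + 0.5) / samples
        total += (square_wave(t) - partial_sum(t, k)) ** 2
    return math.sqrt(total / samples)

# The error should shrink as k grows.
for k in (1, 3, 10, 50):
    print(k, round(rms_error(k), 4))
```

The error never quite reaches zero for a finite k, and right next to the jumps the approximation keeps overshooting a little, but averaged over the whole period it keeps improving.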

Now let's take a moment to remember what the time domain looks like. Imagine a signal coming from your microphone. It is a long string of numbers, and each number indicates a magnitude of sound at a single point in time (which is to say, the difference in pressure on the front and rear surfaces of the microphone pickup). If we were to graph the sound of someone whistling by using the time as the independent variable, and magnitude as the dependent, we'd see a lovely sine wave --- the archetypal image of sound, in the time domain.
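If it helps to see that "long string of numbers", here is a small sketch that fakes a recording of a pure whistle instead of reading a real microphone. The sample rate of 8000 samples per second and the name record_fake_whistle are assumptions of mine; 772 Hz is the whistle frequency that turns up later in this post.

```python
import math

SAMPLE_RATE = 8000   # samples per second (an assumed value, not from any real recording)
WHISTLE_HZ = 772     # frequency of the whistle measured later in the post

def record_fake_whistle(duration_s, freq_hz=WHISTLE_HZ, rate=SAMPLE_RATE):
    """Return a list of magnitudes, one per sample, for an idealized whistle."""
    n = round(duration_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

samples = record_fake_whistle(0.01)   # ten milliseconds of sound

# Time is the independent variable, magnitude the dependent one.
for i, value in enumerate(samples[:8]):
    print(f"t = {i / SAMPLE_RATE * 1000:.3f} ms   magnitude = {value:+.3f}")
```

A real recording would have the same shape (one number per tick of the sample clock), just with all the background noise mixed in.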

But there is another way of plotting the same sound. We said that any curve --- no matter how complex our time domain curve might be --- can be represented by a sum of sines and cosines, right?

And, hold on to your hats, we only need two numbers to define a sine wave: the frequency of the wave, and the amplitude. (Strictly there is a third, the phase, but pairing each sine with a cosine of the same frequency takes care of it.) So we have lots of data points enumerated by two real numbers? That sounds like a graph. And indeed it is. If we graph frequency as the independent variable and amplitude as the dependent, we have arrived at a graph of our sound in the frequency domain.
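Here is a rough sketch of the idea going in the easy direction: start from a list of (frequency, amplitude) pairs and build the time-domain samples from it. The pairs below are invented for illustration (a loud whistle plus a couple of faint hums), the names to_time_domain and loudest are mine, and every sinusoid is assumed to start at phase zero to keep things simple.

```python
import math

# A frequency-domain description: one (frequency in Hz, amplitude) pair per sinusoid.
# These numbers are made up for the example, not measured from anything.
spectrum = [(772.0, 1.0), (60.0, 0.05), (120.0, 0.02)]

def to_time_domain(spectrum, rate=8000, n_samples=400):
    """Add up the sinusoids described by `spectrum`, one value per sample tick."""
    samples = []
    for i in range(n_samples):
        t = i / rate
        samples.append(sum(a * math.sin(2 * math.pi * f * t) for f, a in spectrum))
    return samples

def loudest(spectrum):
    """Return the (frequency, amplitude) pair with the largest amplitude."""
    return max(spectrum, key=lambda pair: pair[1])

signal = to_time_domain(spectrum)
print("loudest component:", loudest(spectrum))
print("first samples:", [round(s, 3) for s in signal[:5]])
```

Three pairs of numbers describe all 400 samples. Going the other way, from the samples back to the pairs, is the hard part.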

For those of you still having a little trouble visualizing, here's what the sound of me whistling looks like in the time domain:




And here's what it looks like in the frequency domain:



See that big sharp spike there around 800Hz? That is the primary frequency of the whistle (772Hz: a G5 that's a little flat). All the other stuff is the shower, the refrigerator, the computer fan, the traffic outside, the wind, and so on; but remember we're measuring in decibels, so they are whole orders of magnitude less in amplitude than the whistle.
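For the curious, the arithmetic behind "a G5 that's a little flat" and "orders of magnitude" can be sketched as follows. This assumes standard concert pitch (A4 = 440 Hz) and twelve-tone equal temperament; the helper names nearest_note and db_to_amplitude_ratio are mine, and the -40 dB figure in the usage lines is a hypothetical noise level, not a reading from the plot.

```python
import math

A4_HZ = 440.0  # standard concert pitch (an assumption; other tunings exist)
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz):
    """Return (note name, note frequency in Hz, offset in cents) for freq_hz."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / A4_HZ))
    note_hz = A4_HZ * 2 ** (semitones_from_a4 / 12)
    midi = 69 + semitones_from_a4            # MIDI numbering: A4 is note 69
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    cents = 1200 * math.log2(freq_hz / note_hz)   # negative means flat
    return name, note_hz, cents

def db_to_amplitude_ratio(db):
    """Convert a level difference in decibels to a ratio of amplitudes."""
    return 10 ** (db / 20)

name, note_hz, cents = nearest_note(772)
print(f"772 Hz is nearest to {name} ({note_hz:.2f} Hz), off by {cents:+.1f} cents")

# A hypothetical background noise 40 dB below the whistle:
print("amplitude ratio at -40 dB:", db_to_amplitude_ratio(-40))
```

Every 20 dB is a factor of ten in amplitude, which is why a noise floor that looks busy on a decibel plot can still be tiny next to the spike.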

Most of you are probably yelling at the screen by now, asking me how on earth we're supposed to find these numbers. Where do we get this sum of sinusoids if all we're given is the time domain information? The empty-words answer is "The Fourier transform." To understand that answer, we're going to have to go on a journey --- one that I'll leave to my next post.
