Wednesday, May 28, 2008

The Vocoder

In Between Planets by Heinlein, the intelligent dragons who lived on Venus used a “voder” to speak. “Voder” looks like a contraction of “vocoder,” which is itself a contraction of “voice coder,” although the name actually comes from a real Bell Labs speech synthesizer, the Voice Operating Demonstrator. In The Moon is a Harsh Mistress, Mike, the intelligent computer, used a vocoder.

The vocoder was a real device and a pretty cool gizmo. It took a sound sample and fed it through a bank of band-pass filters, each passing only a very narrow range of frequencies, and measured the amplitude in each band, making it essentially a device for producing a coarse power spectrum. That’s the encoding part. The decoder essentially reversed the process, rebuilding sound from those band amplitudes. If you put enough bands into it, you can get more-or-less recognizable speech out of the decoder, at a small fraction of the bandwidth of full speech.
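For the curious, here is a rough sketch of that encode/decode idea in Python. It is not how the original analog hardware worked: the band edges, the function names, and the use of a naive DFT in place of a filter bank are all assumptions made for illustration, and the decoder simply plays one sine wave per band rather than filtering a buzz or hiss source.

```python
import cmath
import math


def band_amplitudes(frame, sample_rate, band_edges):
    """Encode one frame as a single amplitude per band (phase is thrown away)."""
    n = len(frame)
    # Naive DFT, positive frequencies only. Slow, but needs no libraries.
    spectrum = [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]
    amps = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Add up the power of every DFT bin whose frequency falls in [lo, hi).
        power = sum(
            abs(spectrum[k]) ** 2
            for k in range(len(spectrum))
            if lo <= k * sample_rate / n < hi
        )
        # Scale so a pure sine of amplitude A inside the band reports A.
        amps.append(math.sqrt(power) * 2 / n)
    return amps


def resynthesize(amps, band_edges, sample_rate, n):
    """Decode: one sine per band at the band center (a crude stand-in)."""
    centers = [(lo + hi) / 2 for lo, hi in zip(band_edges[:-1], band_edges[1:])]
    return [
        sum(a * math.sin(2 * math.pi * c * t / sample_rate)
            for a, c in zip(amps, centers))
        for t in range(n)
    ]


# Hypothetical example: a 440 Hz tone, 200 samples at 8 kHz, five bands.
EDGES = [0, 300, 600, 1200, 2400, 4000]
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 8000) for t in range(200)]
coded = band_amplitudes(tone, 8000, EDGES)
decoded = resynthesize(coded, EDGES, 8000, 200)
```

The bandwidth saving shows up in the sizes: 200 samples go in and five numbers come out. What comes back from `resynthesize` is a 450 Hz tone, not the original 440 Hz one, which gives a taste of what the coarse bands cost you.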

The trick is that you are tossing out phase information, the timing relationship between the frequency components, so you never get the sound of real speech out of a vocoder, no matter how many bands you segment the sound into. What you get is one of those “robot voices” that you’ve heard in movies and TV since the 50s. You can also twiddle with the playback by changing the nature of the original frequency set, or even by imposing a voice envelope onto other sounds. That’s how Disney and Bell Labs TV specials got all those “talking instruments” way back when. I’m not sure about Gerald McBoing-Boing.
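To make the point about phase concrete, here is a small Python sketch, with made-up tone frequencies and helper names, showing two signals whose amplitude at every frequency is identical but whose waveforms are plainly different. An amplitude-only encoder cannot tell them apart. The sketch only shows that the waveform is not recoverable; it says nothing about how audible the difference is.

```python
import cmath
import math


def magnitude(signal, k):
    """Amplitude of DFT bin k, with the phase discarded."""
    n = len(signal)
    total = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
    return abs(total) * 2 / n


def two_tone(n, phase):
    """Two sine waves (3 and 5 cycles per frame); the second is shifted by `phase`."""
    return [
        math.sin(2 * math.pi * 3 * t / n) + math.sin(2 * math.pi * 5 * t / n + phase)
        for t in range(n)
    ]


a = two_tone(64, 0.0)
b = two_tone(64, math.pi / 2)

# Same amplitude in every bin...
same_magnitudes = all(
    abs(magnitude(a, k) - magnitude(b, k)) < 1e-9 for k in range(33)
)
# ...but the sample-by-sample waveforms are far apart.
max_waveform_gap = max(abs(x - y) for x, y in zip(a, b))
```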

The unnaturalness of the vocoder output sent sound researchers back to an older vision: vocal tract modeling. I’m told that before the phonograph, there was a lot of interest in “talking machines,” literally, machines that talked like people do, by expelling air through a vocal tract. Vocal tract modeling attempted to do the same thing, only digitally, and it met with about the same success: not much. It sounded okay if restricted to some very amenable phrases (“We were away a year ago”), but more frequently, it was just unintelligible.

Eventually, cheaper hardware, especially memory, came to the rescue. Current speech generators simply look up words in a dictionary, and spit out the correct phonemes, linked together with some special rules. They can sound fairly realistic, provided your idea of realistic speaks with a Swedish accent. Stephen Hawking uses one of these types of speech synthesizers, by the sounds of it, but he has it set to sound more like the old vocoder style of robotic intonation, perhaps to emphasize that it is a robotic voice he’s using, or maybe because Hawking is a bit of a card.
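The dictionary-lookup approach is easy to caricature in a few lines of Python. Everything here is a toy: the six-word lexicon, the ARPAbet-style phoneme symbols, the pause marker, and the spell-it-out fallback are assumptions for illustration, not the rules any real synthesizer uses.

```python
# Hypothetical six-word lexicon with ARPAbet-style phoneme symbols.
TOY_LEXICON = {
    "we": ["W", "IY"],
    "were": ["W", "ER"],
    "away": ["AH", "W", "EY"],
    "a": ["AH"],
    "year": ["Y", "IH", "R"],
    "ago": ["AH", "G", "OW"],
}


def to_phonemes(text, lexicon=TOY_LEXICON, pause="_"):
    """Look up each word and join the results with a pause marker."""
    out = []
    for raw in text.lower().split():
        word = raw.strip(".,!?\"'")
        if not word:
            continue
        phones = lexicon.get(word)
        if phones is None:
            # Fallback rule for unknown words: spell them out letter by letter.
            phones = ["<" + ch + ">" for ch in word]
        if out:
            out.append(pause)  # linking rule: a short pause between words
        out.extend(phones)
    return out


phrase = to_phonemes("We were away a year ago.")
```

A real system spends most of its effort on the parts skipped here: stress, intonation, and smoothing the joins between phonemes.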

2 comments:

Anonymous said...

There's a banner ad somewhere on the internets that features an animated female head and invites the viewer to type a text message for vocal playback. For some reason I feel compelled to offer the young cartoon lady words that would embarrass all of her ancestors if she ever spoke them aloud. She invariably reads them back in a slightly off kilter mechanical twang. I always laugh, I think because I admire her courage.

James Killus said...

Including her cartoon ancestors? I'm imagining a Betty Boop voice here.