A Dilemma about Sound



Channelizing and compressing sound, as hearing corrections do, runs into a dilemma. A single instrument ascending the chromatic scale delivers its full power to each frequency band in turn as it passes through. But an orchestra produces a blizzard of power in all of them at once.

If you have ever watched a spectrum display while editing your music, you may recognize what I’m about to tell you. As you increase the frequency resolution of the display by raising the number of points in your FFTs, the noise floor appears to drop further and further.

Wow! If we just used a gazillion-point FFT, there wouldn’t be any noise in the display!

That would actually be true… as far as the display goes. What the FFT display shows you is not a single-frequency estimate of the spectrum at each displayed point, but an average over a narrow bandwidth. As that bandwidth shrinks with increasing FFT size, the cumulative power in each band shrinks too, but only when that power comes from noise or an orchestral storm. A single pure tone would continue to display as brightly as ever.
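
A minimal Python sketch of the effect, under assumptions of our own choosing (an 8 kHz sample rate, a 1 kHz unit-amplitude sine, and white Gaussian noise with RMS 0.1; the helper names are hypothetical). It computes only the DFT bins it needs, the tone’s bin plus a random sample of noise-only bins, so it runs on the standard library alone.

```python
import cmath
import math
import random


def dft_bin_power(x, k):
    """Power in DFT bin k, scaled by 1/N so an on-bin unit-amplitude sine reads 0.25."""
    n = len(x)
    acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
    return abs(acc / n) ** 2


def tone_and_floor(n, fs=8000.0, f_tone=1000.0, noise_rms=0.1, n_probe=32, seed=1):
    """Return (tone-bin power, mean noise-only bin power) for an n-point DFT.

    Assumed test signal: a unit-amplitude sine plus white Gaussian noise.
    """
    rng = random.Random(seed)
    x = [math.sin(2 * math.pi * f_tone * i / fs) + rng.gauss(0.0, noise_rms)
         for i in range(n)]
    k_tone = round(f_tone * n / fs)  # exact bin when n is a multiple of fs / f_tone
    tone = dft_bin_power(x, k_tone)
    # Estimate the floor from a random sample of bins well away from the tone.
    others = [k for k in range(1, n // 2) if abs(k - k_tone) > 2]
    probes = rng.sample(others, n_probe)
    floor = sum(dft_bin_power(x, k) for k in probes) / n_probe
    return tone, floor


for n in (256, 1024, 4096):
    tone, floor = tone_and_floor(n)
    print(f"N={n:5d}  tone {10 * math.log10(tone):6.1f} dB  "
          f"floor {10 * math.log10(floor):6.1f} dB")
```

As N grows fourfold, the tone’s bin should hold steady near -6 dB (a power of 0.25), while the averaged noise floor should fall by roughly 6 dB, because each bin’s share of white noise scales as 1/N.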

So here’s the dilemma. If I make those frequency channels too narrow, there won’t be enough power left in them for any sensible hearing-correction calculation, unless the channel contains only a pure tone.

Yet no matter how finely you divvy up the spectrum, we still hear all of that noise. Whether a channel contains only a pure tone or a bunch of random, noise-like signals, we need to be able to provide the same kind of hearing correction for both.

Years of research by hearing investigators before me found that our hearing naturally produces its own self-organizing frequency channels around loud tones. They aren’t hardwired frequency channels like the ones we are accustomed to building with filters. Rather, they form adaptively around whichever frequencies carry those loud sounds. They tend to have an equivalent bandwidth of about 1 Bark, which is also a measure of distance along the basilar membrane.

But unlike the fixed-width frequency bins of an FFT analyzer, these critical bandwidths grow wider as you move to higher frequencies. So I suspect that biology tripped over the same dilemma aeons ago in our early ancestors, because the power spectrum of natural sounds also diminishes with increasing frequency. If you kept the analysis channels at constant bandwidth, there soon wouldn’t be enough power in the high-frequency bands to make sense of anything.
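
To put numbers on that widening, here is a sketch using the Zwicker and Terhardt (1980) closed-form approximation for critical bandwidth. That formula is a published fit, not necessarily the one Crescendo uses, and the function name is our own.

```python
def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth in Hz at center frequency f_hz.

    Assumption: Zwicker & Terhardt (1980) fit, not necessarily Crescendo's formula.
    """
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


for f in (100, 500, 1000, 2000, 4000, 8000, 16000):
    cb = critical_bandwidth_hz(f)
    print(f"{f:6d} Hz: critical band about {cb:6.0f} Hz wide ({100 * cb / f:5.1f}% of f)")
```

Below about 500 Hz the bands stay close to 100 Hz wide; above that they grow roughly in proportion to frequency, which is what lets a high band gather enough power to work with.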

That’s why Crescendo adopted Bark channels. When we want to know how loud the sound is at some frequency, we measure the power across an entire Bark bandwidth. We use 100 Bark channels, even though there is only room for 24 Bark bandwidths across our audible spectrum from 20 Hz to 20 kHz, because we place our Bark bands at quarter-Bark spacing across the audible range. They are hardwired frequency channels, not self-organizing the way our hearing is, but they sit on a fine grid.
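
A sketch of laying out such a grid, assuming the Zwicker and Terhardt Bark formula and band edges at plus and minus half a Bark around each center. Crescendo’s actual formula and grid endpoints aren’t given here, so `hz_to_bark`, `bark_to_hz`, and `quarter_bark_grid` are hypothetical helpers.

```python
import math


def hz_to_bark(f_hz):
    """Assumption: Zwicker & Terhardt (1980) approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_to_hz(z, lo=0.0, hi=30000.0, tol=1e-6):
    """Invert hz_to_bark by bisection; valid because it increases monotonically.

    Values of z below 0 Bark clamp to about 0 Hz.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quarter_bark_grid(f_lo=20.0, f_hi=20000.0, step=0.25):
    """Return (lower edge, center, upper edge) in Hz for 1-Bark bands spaced `step` Bark apart."""
    z_lo, z_hi = hz_to_bark(f_lo), hz_to_bark(f_hi)
    count = int((z_hi - z_lo) / step) + 1
    bands = []
    for i in range(count):
        zc = z_lo + i * step
        bands.append((bark_to_hz(zc - 0.5), bark_to_hz(zc), bark_to_hz(zc + 0.5)))
    return bands


bands = quarter_bark_grid()
print(f"{len(bands)} bands")
for lo, fc, hi in bands[::24]:
    print(f"center {fc:8.1f} Hz, 1-Bark band {lo:8.1f} to {hi:8.1f} Hz")
```

Between 20 Hz and 20 kHz this formula yields just under 100 quarter-Bark centers, in line with the roughly 24 Bark span. Because each band is 1 Bark wide and the spacing is a quarter Bark, any given frequency falls inside about four overlapping bands. The lowest band’s lower edge clamps to 0 Hz and the top band’s upper edge reaches a little past 20 kHz.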

We chose this fixed grid because another requirement of Crescendo processing is that it run in real time with minimal throughput delay. (Once again with the approximations!) If we had all the time in the world, we could more closely mimic the self-organizing Bark bands. But our biology is far more powerful than even our largest computers, and on a simple computer we have to compromise.
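
One reason narrow channels cost delay: an N-point FFT at sample rate fs has bins spaced fs/N apart, yet it cannot run until N samples (N/fs seconds) have arrived. A sketch assuming a 48 kHz sample rate and ignoring window overlap and compute time; `fft_tradeoff` is a hypothetical helper.

```python
def fft_tradeoff(n_points, fs_hz=48000.0):
    """Return (bin spacing in Hz, frame length in ms) for an n_points FFT.

    Assumes a 48 kHz sample rate by default; ignores overlap and compute time.
    """
    bin_spacing_hz = fs_hz / n_points
    frame_ms = 1000.0 * n_points / fs_hz
    return bin_spacing_hz, frame_ms


for n in (64, 256, 1024, 4096):
    df, t_ms = fft_tradeoff(n)
    print(f"N={n:5d}: bins {df:7.2f} Hz apart, frame {t_ms:6.2f} ms")
```

The product of bin spacing and frame length is always 1, so every halving of the bin spacing doubles the minimum frame delay.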

[ You never know, when making approximations, whether their results will be good enough. You just have to try and see. After numerous trials and listening tests, 100 Bark bands at quarter-Bark spacing seems to be a pretty good approximation.

You would be surprised how crudely you can approximate in some areas, but not in others. Pitch perception is quite demanding, but loudness correction seems much less so. ]

Keep this audio power dilemma in mind. Whenever you see a hearing aid bragging about having 4, 15, or 30 bands, don’t be too quick to be impressed. Modern hearing aids use WOLA (weighted overlap-add) filter banks, which are just a clever way of computing running FFTs. Their bands, however many there are, all have constant bandwidth, and they aren’t matched to how we hear. It’s simply the best that can be done with very low-power DSP processing today at an acceptable throughput delay.
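
To see why constant-bandwidth bands don’t match hearing, this sketch measures, in Bark, each band of a hypothetical 16-band filter bank that splits 0 to 8 kHz into equal 500 Hz slices. That layout is an assumption for illustration, not any particular hearing aid, and the Bark conversion is the Zwicker and Terhardt approximation.

```python
import math


def hz_to_bark(f_hz):
    """Assumption: Zwicker & Terhardt (1980) approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_widths_of_uniform_bands(n_bands=16, f_max=8000.0):
    """Width in Bark of each of n_bands equal-width bands spanning 0..f_max Hz (hypothetical layout)."""
    width = f_max / n_bands
    return [hz_to_bark((i + 1) * width) - hz_to_bark(i * width) for i in range(n_bands)]


widths = bark_widths_of_uniform_bands()
print(f"lowest band: {widths[0]:.2f} Bark, highest band: {widths[-1]:.2f} Bark")
```

The lowest 500 Hz band lumps together more than four critical bands, while the highest covers well under half of one.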

  • DM

[ Heh! This is where “marketing” seems to win out over physical sense. You can’t simultaneously brag about huge numbers of channels while also claiming to match human hearing on low-power devices.

You really could use an FFT with lots of channels. We do. But then you have to aggregate them into Bark-like bands, which lowers your bragging rights. And aggregated processing also demands more horsepower than seems possible on today’s low-power DSPs.
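
Aggregating FFT bins into Bark-like bands can be sketched as summing the bin powers that fall within half a Bark of each band center. The helper `bark_band_powers` is hypothetical and uses the Zwicker and Terhardt approximation; it is not Crescendo’s code.

```python
import math


def hz_to_bark(f_hz):
    """Assumption: Zwicker & Terhardt (1980) approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_band_powers(bin_powers, fs_hz, centers_bark, half_width=0.5):
    """Sum FFT bin powers lying within +/- half_width Bark of each center.

    bin_powers holds the N/2 + 1 one-sided power values of an N-point FFT.
    """
    n_fft = 2 * (len(bin_powers) - 1)
    bin_bark = [hz_to_bark(k * fs_hz / n_fft) for k in range(len(bin_powers))]
    result = []
    for zc in centers_bark:
        lo, hi = zc - half_width, zc + half_width
        result.append(sum(p for p, z in zip(bin_powers, bin_bark) if lo <= z < hi))
    return result


fs = 48000.0
n_fft = 1024
flat = [1.0] * (n_fft // 2 + 1)  # white spectrum: equal power in every bin
print(bark_band_powers(flat, fs, [1.0, 10.0, 20.0]))
```

Fed a flat (white) spectrum from a 1024-point FFT at 48 kHz, the band centered at 20 Bark sums more than ten times as many bins as the one at 1 Bark, which is the power-per-channel argument from earlier in action.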

There are only 24 Bark bandwidths across the entire audible range, and only 10-year-old kids can really use the top band. But there are signal-processing reasons to use at least twice as many bands, or more, over the same audible range. Is 100 Bark bands overkill? Perhaps. We did a version with 50 Bark bands. I think 100 bands sounds a tad better. ]

Author: dbmcclain

Astrophysicist, spook, musician, Lisp aficionado, deaf guy
