Crescendo Re-Engineered

As a result of the flurry of experiments to defeat the background sonic artifacts, I have completely re-engineered the Crescendo algorithm. It no longer uses 100 1/4-Bark bands across the audible spectrum, but rather 12 bands (11 interior bands plus 2 endpoint half-bands). Our loudness perception simply isn’t frequency selective enough to need semitone intervals.

The filters are now redesigned to be squared-Cosine shapes, overlapping by 50%, rather than stacked rectangular bandpasses.

And the hearing model has been revamped to reflect the results of actual experiments conducted in the Refined Audiometrics Laboratory, rather than relying on the gaggle of “experts” in the field. The field is so fraught with internal disputes and mathematical fitting nonsense that I finally threw my hands up in disgust.

Absolute Threshold of Hearing

This is related to the shape and location of the equal loudness contours of our hearing. When you examine the graphs for Fletcher-Munson, Robinson-Dadson, and the 2003 revised ISO 226 standard, none of them show any error bars. But I tend toward agreement with ISO 226.

The original Fletcher-Munson curves exhibit a flattening response at louder levels, while the ATH (Absolute Threshold of Hearing) curve shows pronounced sensitivity around 3 kHz. But let’s back up a minute there…

What produces the 3 kHz sensitivity at threshold? We can consider two possibilities:

(A) the sensory nerves and brain are somehow different around the 3 kHz region, or

(B) the shape of the ATH is due to mechanical factors and that the sensory system is basically the same at all frequencies.

Occam’s razor would point us toward answer (B). The simplest possible explanation is most likely.

We can easily account for the 3 kHz peak by considering that our ear canals act as 1/4-wave resonators, closed at one end. The wavelength of 3 kHz sound in air is about 4 inches. So 1/4 wavelength is about 1 inch, which is approximately an average length of a human ear canal.
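As a rough check on that arithmetic, here is a minimal Python sketch. It assumes a speed of sound of 343 m/s (dry air at about 20 °C) and treats the ear canal as an ideal straight tube closed at one end; the function names are only illustrative.

```python
# Quarter-wave resonance of an idealized tube closed at one end.
SPEED_OF_SOUND_M_S = 343.0   # assumption: dry air at about 20 degrees C
METERS_PER_INCH = 0.0254

def wavelength_inches(freq_hz):
    """Wavelength in air, in inches, of a tone at freq_hz."""
    return SPEED_OF_SOUND_M_S / freq_hz / METERS_PER_INCH

def quarter_wave_resonance_hz(tube_length_inches):
    """Lowest resonance of a tube closed at one end: f = c / (4 L)."""
    return SPEED_OF_SOUND_M_S / (4.0 * tube_length_inches * METERS_PER_INCH)

if __name__ == "__main__":
    lam = wavelength_inches(3000.0)
    print(f"wavelength at 3 kHz: {lam:.2f} in, quarter wave: {lam / 4:.2f} in")
    print(f"lowest resonance of a 1 in tube: {quarter_wave_resonance_hz(1.0):.0f} Hz")
```

With these assumptions a 1 inch tube resonates a little above 3 kHz, which is in the neighborhood of the sensitivity peak. A real ear canal is neither straight nor rigidly terminated, so this is only an order-of-magnitude argument.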

But if that really is the cause for extra ATH sensitivity around 3 kHz, then it should also be present at all other loudness levels. A tube is a linear device for the passage of sound at typical levels. It cannot change its behavior as sound levels increase.

The same reasoning applies to the broader shape of the ATH with frequency, though that shape is not caused by the ear canal. There are many other mechanical details: the behavior of the 3 little bones between the ear drum and the oval window of the cochlea, the coupling of the basilar fluid to the hair cells, even the shape of the cochlea as a transmission channel. But all are mechanical details, and so the shape of the loudness contours has no reason to change, given the small levels of vibration involved. We aren’t anywhere near nonlinear mechanical effects at normal loudness levels.

There is wide disagreement about the overall shape of the ATH curve. Some researchers indicate another region of local sensitivity around 12 kHz. But nobody really knows. It is so difficult to measure hearing above 8 kHz that every measurement produces surprising variations. The quarter-wavelength there is a fraction of an inch and minor variations in headphone cup placement, the shape of the pinnae, etc., all serve to make measurements impossible. Your measurements will vary drastically from mine. And we could both be correct – for our own purposes.

[If anything, there ought to be a comb filtering in the frequency domain, where odd multiples of the 1/4-wave resonance recur, and where even multiples become attenuated. That argues for a diminished sensitivity near 6.6 kHz, and an enhancement again at 9.9 kHz, then another diminish at 13.2 kHz, and so on.]
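The pattern in that aside can be tabulated with a short Python sketch. It assumes the 1/4-wave fundamental sits at 3.3 kHz (the center used in the ATH formula) and simply labels odd and even multiples; it does not model how deep the peaks and dips would actually be.

```python
QUARTER_WAVE_HZ = 3300.0  # assumption: fundamental of the 1/4-wave resonance

def comb_pattern(fundamental_hz, max_hz=20000.0):
    """Return (frequency, label) pairs for multiples of the fundamental.

    Odd multiples are labeled "enhanced" and even multiples "attenuated",
    following the reasoning in the text.
    """
    pattern = []
    n = 1
    while n * fundamental_hz <= max_hz:
        label = "enhanced" if n % 2 == 1 else "attenuated"
        pattern.append((n * fundamental_hz, label))
        n += 1
    return pattern

if __name__ == "__main__":
    for freq_hz, label in comb_pattern(QUARTER_WAVE_HZ):
        print(f"{freq_hz / 1000.0:5.1f} kHz  {label}")
```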

So I use an ATH model that is broadly correct:

\mathrm{ATH}(f_{\mathrm{kHz}}) \; [\mathrm{dBSPL}] = 3.64 \, f^{-0.8} - 6.5 \, e^{-0.6 \, (f - 3.3)^2} + 0.001 \, f^4 - 3.37


The ATH curve (in red) is normalized to show 0 dBSPL at 1 kHz. All contours mimic the shape of the ATH curve.
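For anyone who wants to reproduce the curve, here is a minimal Python sketch of the ATH formula. The constants come straight from the equation; the function name `ath_db_spl` and the choice to take the frequency in Hz are illustrative assumptions.

```python
import math

def ath_db_spl(freq_hz):
    """Absolute threshold of hearing in dBSPL, normalized to about 0 dB at 1 kHz."""
    f = freq_hz / 1000.0  # the formula expects frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 0.001 * f ** 4
            - 3.37)

if __name__ == "__main__":
    for freq_hz in (100.0, 500.0, 1000.0, 3300.0, 8000.0, 12000.0):
        print(f"{freq_hz:8.0f} Hz  {ath_db_spl(freq_hz):7.2f} dBSPL")
```

The trailing constant of 3.37 is what pins the curve to roughly 0 dBSPL at 1 kHz. The Gaussian term produces the dip in threshold (extra sensitivity) centered at 3.3 kHz.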

dB’s Don’t Shrink…

And furthermore, the spacing between contours for 10 dB increments in presentation level also remains 10 dB. There is no physical mechanism that I can imagine that should cause these levels to squeeze toward narrower spacing, at any frequency. The scale on the ruler remains unchanged. Only its zero point changes due to mechanical attenuation or enhancement.

[At both ends of the audible range it only seems like the contours are squeezing together. They really aren’t, at least not along the frequency axis. There is a 10 dB distance between all contours, at all frequencies.]

I will assume that the effect of a sound on a cochlear hair cell depends on the mechanical pre-filtering and the coupling efficiency between the fluid and the hair cell. But after the pre-filter, and for whatever that coupling is, the hair cell will be affected the same way regardless of its position along the basilar membrane, for the same level of excitation at the hair cell.

Hence, I shall consider that units of dBHL merely represent the elevation of the signal intensity above the ATH at that frequency. And the EarSpring and Conductor equations apply equally well at all frequencies, as long as the driving signal is measured in dBHL.

dBHL = dBSPL - ATH
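A minimal Python sketch of that conversion, under the assumption that the ATH model is the formula given earlier in this post (repeated here so the snippet runs on its own). The helper names are illustrative.

```python
import math

def ath_db_spl(freq_hz):
    """ATH model from this post, in dBSPL, with frequency given in Hz."""
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 0.001 * f ** 4 - 3.37)

def spl_to_hl(level_db_spl, freq_hz):
    """dBHL = dBSPL - ATH: elevation of the signal above threshold."""
    return level_db_spl - ath_db_spl(freq_hz)

def hl_to_spl(level_db_hl, freq_hz):
    """Inverse mapping: the dBSPL needed to reach a given dBHL."""
    return level_db_hl + ath_db_spl(freq_hz)

if __name__ == "__main__":
    # Contours are the ATH shifted up by a constant, so the spacing between
    # the 40 and 50 dBHL contours is 10 dB at every frequency.
    for freq_hz in (100.0, 1000.0, 3300.0, 8000.0):
        spacing = hl_to_spl(50.0, freq_hz) - hl_to_spl(40.0, freq_hz)
        print(f"{freq_hz:7.0f} Hz  contour spacing {spacing:.1f} dB")
```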

The equations show what happens in loudness sensing, and of course, our hearing above 40 dBHL goes pretty much into cube root compression. So the scaling of our loudness sensation is 1/3 of the presentation scaling.

But we can’t put a measuring device on our loudness sensation. We can’t measure the actual dB of sensation. For that we would have to sense the sensation itself. What we measure instead is the equivalence of that sensation to some comparison sound being delivered at 1 kHz. And we give that test sound the same dBHL as however much sound had to be delivered at 1 kHz to produce an equivalent sensation.

So the dB scale retains its increment size, not divided by 3. The measurement takes place in the same space. If you could place a meter on the hair cell, you would see it moving by 1/3 as much, in dB measure, as the dB difference of two sound levels. When the two sounds differ by 10 dB, that’s a factor of 10 in power, and the hair cell would show about 3.3 dB of change in movement, or roughly twice as much movement for the louder sound.
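The cube-root arithmetic can be checked with a few lines of Python. This is only a sketch of the compressive region (above roughly 40 dBHL); the exponent of 1/3 is taken from the text, and the function names are illustrative.

```python
def sensation_change_db(presentation_change_db, exponent=1.0 / 3.0):
    """dB change at the hair cell for a given dB change in presentation level."""
    return presentation_change_db * exponent

def db_to_power_ratio(level_db):
    """Convert a dB difference to a power ratio."""
    return 10.0 ** (level_db / 10.0)

if __name__ == "__main__":
    step_db = 10.0
    inner_db = sensation_change_db(step_db)
    print(f"{step_db:.0f} dB presented -> {inner_db:.2f} dB at the hair cell")
    print(f"ratio presented: {db_to_power_ratio(step_db):.1f}x, "
          f"ratio at the hair cell: {db_to_power_ratio(inner_db):.2f}x")
```

A 10 dB step is a factor of 10 in power, and the cube root of 10 is a little over 2, which is where the rough doubling for every 10 dB comes from.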

People will report two sounds as being equally loud at different frequencies, or one appearing twice as loud as the other, whatever that actually means. We model that sensation with Sones units, and the equations show us how Sones develop with respect to presentation level, i.e. after mechanical filtering.

Filter Shapes

In my previous post I looked at using overlapping triangular bands in Bark frequency space. An improvement on that shape can be had by using Cosine-squared filter profiles, which also allows adjacent bands to sum to unity as a signal traverses one band and into the next.

An idealized view of our new filter bank, based on 2-Bark bandwidth Cosine-squared filters in the Bark frequency domain, but viewed from linear frequency space.
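To illustrate the sum-to-unity property, here is a minimal Python sketch of a Cosine-squared bank in Bark space. The layout is an assumption on my reading of the description: centers every 2 Bark from 0 to 24 Bark (13 filters, of which the first and last are half-bands at the endpoints), each filter reaching zero at its neighbors' centers so that its width at half height is 2 Bark. The actual Crescendo band edges may differ.

```python
import math

BAND_SPACING_BARK = 2.0  # assumption: centers every 2 Bark, 50% overlap
NUM_FILTERS = 13         # 11 interior filters + 2 endpoint half-bands (0 and 24 Bark)

def band_weight(z_bark, band_index):
    """Cosine-squared weight of one band at position z_bark (in Bark)."""
    center = band_index * BAND_SPACING_BARK
    distance = abs(z_bark - center)
    if distance >= BAND_SPACING_BARK:
        return 0.0  # outside the support of this band
    return math.cos(0.5 * math.pi * distance / BAND_SPACING_BARK) ** 2

def bank_weights(z_bark):
    """Weights of all bands at position z_bark."""
    return [band_weight(z_bark, k) for k in range(NUM_FILTERS)]

if __name__ == "__main__":
    for z in (0.0, 0.5, 1.0, 7.3, 24.0):
        weights = bank_weights(z)
        active = [(k, round(w, 3)) for k, w in enumerate(weights) if w > 0.0]
        print(f"z = {z:4.1f} Bark  active bands {active}  sum = {sum(weights):.6f}")
```

Between two adjacent centers the two active weights are cos² and sin² of the same angle, so they always add to 1. The endpoint filters only have one usable side, which is why they count as half-bands.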

The improvement should provide lowered stop-band response. But when using FFT sizes of 256 samples at 48 kHz sample rate, the window of measurement in time is too short to show much improvement overall. Had we used longer time frames, and consequently longer throughput latency, we could do a more refined representation of the filters in the FFT domain.

A general rule of thumb that I have used in the past is that your time frame duration should be at least 3 times the period of the frequency of interest. I would now revise that to say it should be 5-6 periods before you see acceptable behavior from the synthesized filter.

And the results are borne out by experiment. At 256 samples and a 48 kHz sample rate, the frame lasts 5.33 ms. One fifth of that is around 1.07 ms, which corresponds to a frequency of about 1 kHz. And the filter bank I produced only starts behaving decently above 1 kHz. But that’s where we need the behavior anyway, so we really aren’t penalized in this application.
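The rule of thumb is easy to turn into a small calculator. This Python sketch uses the 256-sample frame and 48 kHz sample rate from the text; the function names are illustrative.

```python
def frame_duration_ms(fft_size, sample_rate_hz):
    """Duration of one FFT frame in milliseconds."""
    return 1000.0 * fft_size / sample_rate_hz

def lowest_well_behaved_hz(fft_size, sample_rate_hz, periods_needed):
    """Lowest frequency that fits periods_needed full periods into one frame."""
    return periods_needed * sample_rate_hz / fft_size

if __name__ == "__main__":
    fft_size, sample_rate_hz = 256, 48000.0
    print(f"frame duration: {frame_duration_ms(fft_size, sample_rate_hz):.2f} ms")
    for periods in (3, 5, 6):
        limit_hz = lowest_well_behaved_hz(fft_size, sample_rate_hz, periods)
        print(f"{periods} periods per frame -> filters behave above about {limit_hz:.0f} Hz")
```

Doubling the FFT size halves each of these limits, at the cost of doubling the frame latency.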

Triangular passbands would be equally effective for all practical purposes, with such short FFT lengths. But I’ll stick with the Cosine-squared shapes.

Final Results

With the new hearing model, and the new filter bank installed, Crescendo now sounds superb. It was already quite good, but it is now even better.

The only detectable artifacts that I can hear are caused by things that have nothing to do with Crescendo. I have a pretty severe case of hyper-recruitment in a narrow region around 1700 Hz. Whenever that gets driven too hard, it causes a crunchy sound. And without other corrective measures, that artifact becomes strongly distracting. I can turn off Crescendo and the troublesome sounds still remain just as bothersome. So this has nothing to do with Crescendo.

But we already know how to fix that. Just ahead of Crescendo, I place a dipping EQ at 1700 Hz, width about 1/3 octave, and depth about 6 dB. And because of the magic of Crescendo, that dip pretty much takes care of all the crunchies at all levels. And now I can really enjoy the music.
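One way to realize such a dip is a single peaking biquad. The Python sketch below uses the peaking-EQ formulas from the widely used RBJ Audio EQ Cookbook. That choice of filter is an assumption made for illustration, as is the 48 kHz sample rate; the post does not say which EQ implementation sits ahead of Crescendo.

```python
import cmath
import math

def peaking_eq_coeffs(center_hz, bandwidth_octaves, gain_db, sample_rate_hz):
    """Biquad (b, a) coefficients for a peaking EQ (RBJ cookbook form)."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) * math.sinh(
        0.5 * math.log(2.0) * bandwidth_octaves * w0 / math.sin(w0))
    b = (1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp)
    a = (1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp)
    return b, a

def gain_db_at(freq_hz, b, a, sample_rate_hz):
    """Magnitude response of the biquad, in dB, at freq_hz."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(num / den))

if __name__ == "__main__":
    fs = 48000.0  # assumption: same sample rate as the FFT discussion
    b, a = peaking_eq_coeffs(1700.0, 1.0 / 3.0, -6.0, fs)
    for freq_hz in (425.0, 1200.0, 1700.0, 2400.0, 6800.0):
        print(f"{freq_hz:6.0f} Hz  {gain_db_at(freq_hz, b, a, fs):6.2f} dB")
```

To run the filter on audio, divide all six coefficients by `a[0]` and feed them to any standard biquad routine.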

It turns out that, beneath those crunchies, the 1700 Hz region was being driven by harmonics of a sound down around 425 Hz. By dipping the crunchy zone at 1700 Hz, I can now hear that driving sound. I never realized it was there before.

  • DM