As I write this blog, I’m re-examining all the things that have been put to rest over the past decade. This morning I was looking at the problem slightly differently than before. Specifically, I was re-examining my approximations to the EarSpring / Crescendo equations.
These two equations are both nonlinear, and require iterative techniques to solve them for specific values. You guess at an initial value, examine the error, and try again with a slightly different initial value. You repeat this over and over till you are “close enough”.
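As a rough sketch of that guess-and-refine loop, here it is in C++ with a toy residual function standing in for the real EarSpring evaluation (which is not reproduced here):

```cpp
#include <cmath>

// Toy residual for illustration only: the real EarSpring equation is
// nonlinear and is not reproduced here. This placeholder just has a root
// for the search loop to hunt for.
static double earspring_error(double trial)
{
    return std::tanh(trial) - 0.5;   // placeholder, NOT the real model
}

// The guess-and-refine search described above: bisect a bracketing
// interval until the residual is "close enough".
static double solve_iteratively(double lo, double hi, double tol = 1e-6)
{
    while (hi - lo > tol) {
        double mid = 0.5 * (lo + hi);
        // Keep the half of the interval where the residual changes sign.
        if (earspring_error(lo) * earspring_error(mid) <= 0.0)
            hi = mid;
        else
            lo = mid;
    }
    return 0.5 * (lo + hi);
}
```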
That’s all well and good for a dissertation discussing the solutions to these equations, but it doesn’t work very well for an audio plugin that has to process incoming audio in real time.
So I have always resorted to approximate solutions that can be computed very rapidly, and without the need for iterative searching. Remember, I have to compute these gains across 100 Bark bands, in stereo, every few milliseconds. That’s nearly 100,000 gain calculations every second. So they need to be efficiently computable.
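To spell out that arithmetic (reading "every few milliseconds" as roughly one update every 2 ms, which is an assumption on my part):

$$100\ \text{Bark bands} \times 2\ \text{channels} \times 500\ \text{updates/s} \approx 100{,}000\ \text{gain calculations per second.}$$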
In the beginning, I had no idea how precise I really needed to be. And those recruitment curves get mighty steep down near their threshold levels. So I thought, if I were going to boost the signal enough to overcome their distance from the raised thresholds, and then a little bit more to get the perception to the right level, then I’d better be pretty accurate in my estimates.
With a near-threshold slope of around 10, that meant that if I wanted my perception to be within a dB, my nudges had better be accurate to 0.1 dB or better.
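Spelled out, the required gain accuracy is the desired perceptual accuracy divided by the near-threshold slope:

$$\Delta G \;\approx\; \frac{\Delta(\text{perceived level})}{\text{slope}} \;=\; \frac{1\ \text{dB}}{10} \;=\; 0.1\ \text{dB}.$$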
And that is largely true. But over the years of listening, I have learned that our loudness perception is not precise. Not like our pitch perception. And there are many fleeting things in music that make precision very difficult – for instance, tremolo.
At any rate, I developed very precise approximations all across the different levels of incoming loudness, and across all the different levels of hearing impairment. I created a bunch of precision lookup tables that can be interpolated for the in-between values. And those tables are computed on the fly at the nearest parameter values to the situation at hand, in real time.
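The tables themselves aren’t shown here, but the run-time idea is just a grid of precomputed gains plus linear interpolation; a minimal sketch, with the table layout assumed:

```cpp
#include <algorithm>
#include <vector>

// Minimal sketch of table lookup with linear interpolation. The table is
// assumed to hold precomputed gains (dB) on a uniform grid of input levels;
// the actual Crescendo tables and their parameterization are not shown in
// the post. Assumes at least two grid points.
struct GainTable {
    double lo_dB;                   // input level at the first grid point
    double step_dB;                 // spacing between grid points
    std::vector<double> gains_dB;   // precomputed gain at each grid point

    double lookup(double level_dB) const {
        double x = (level_dB - lo_dB) / step_dB;
        x = std::clamp(x, 0.0, double(gains_dB.size() - 1));   // stay inside the table
        size_t i = std::min(size_t(x), gains_dB.size() - 2);
        double frac = x - double(i);
        // Linear interpolation between the two neighboring grid values.
        return gains_dB[i] + frac * (gains_dB[i + 1] - gains_dB[i]);
    }
};
```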
The equations of approximation were fairly sophisticated rational polynomial approximations, computed with the Remez Exchange algorithm to achieve minimax errors – which means that the magnitude of the maximum error in the approximations was minimized across the entire domain of the approximations, and all such error extrema become roughly equal in magnitude. But more than that, I weighted the approximations so that they would be more accurate in the regions of steep slope. (Got that?)
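The fitted coefficients aren’t given in the post, but the attraction of a rational polynomial at run time is that evaluating one is just a handful of multiply-adds and a single divide; a sketch with placeholder coefficients:

```cpp
// Sketch of evaluating a rational polynomial (numerator / denominator) by
// Horner's rule. The coefficients here are placeholders, not the actual
// Remez-fitted Crescendo coefficients, which are not given in the post.
double eval_rational(double x)
{
    static const double num[3] = { 0.0, 0.0, 0.0 };  // a0, a1, a2 (placeholders)
    static const double den[3] = { 1.0, 0.0, 0.0 };  // b0, b1, b2 (placeholders)

    double p = 0.0, q = 0.0;
    for (int i = 2; i >= 0; --i) p = p * x + num[i];
    for (int i = 2; i >= 0; --i) q = q * x + den[i];
    return p / q;
}
```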
And that works very well. But could there be anything simpler that would also work? Well, there are a number of cruder approximations that could be tried. I already mentioned one of them: a NYC Compressor with a specific threshold, compression ratio, and makeup gain, all of which depend only on your threshold elevation at each frequency.
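For reference, the static gain curve of a plain downward compressor with makeup gain looks like the sketch below; a NYC-style (parallel) setup additionally mixes the compressed path back with the dry signal, which isn’t modeled here. How the three parameters would be derived from the threshold elevation isn’t spelled out, so they are simply left as inputs:

```cpp
#include <algorithm>

// Static compressor gain curve: above the threshold, the overage is reduced
// by a factor of (1 - 1/ratio); a fixed makeup gain is applied throughout.
// Threshold, ratio, and makeup are left as inputs; the post says they would
// depend only on the threshold elevation at each frequency, but does not
// give the mapping.
double compressor_gain_dB(double level_dB,
                          double threshold_dB,
                          double ratio,      // e.g. 3.0 for 3:1
                          double makeup_dB)
{
    double over = std::max(level_dB - threshold_dB, 0.0);
    return makeup_dB - over * (1.0 - 1.0 / ratio);
}
```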
Here’s another approximation that I discovered this morning:
That’s approximately the gain in dB that you need to apply when your hearing threshold is elevated to a given level and the sound is presented at a given level, both expressed in dB.
The error in this approximation is not minimax, by any means. But it seems to remain within 0.5 dB for the thresholds and presentation levels of most interest to us.
Here is a graph of the exponential portion of that approximation.
When the presentation level is very far from your elevated threshold level, there is almost no gain needed at the loud end, and at the faint end you need mostly that linear term to bring the presentation level up into your hearing range. Right at your threshold elevation, you seem to need about 9.2 dB of gain to make it sound right.
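As a sketch of a gain curve with exactly those properties (only the constants 9.2 dB and 34 come from the approximation itself; the way the linear and exponential pieces are clipped and combined below is my assumption, not necessarily the formula as written):

```cpp
#include <algorithm>
#include <cmath>

// Gain curve with the described behavior: about 9.2 dB of gain right at the
// elevated threshold, decaying exponentially (decay constant 34 dB) toward
// zero at loud levels, and dominated by a linear lift at faint levels.
// The combination of terms here is an assumption, not the exact formula.
double approx_gain_dB(double presentation_dB, double threshold_dB)
{
    double d = presentation_dB - threshold_dB;           // distance above threshold
    double linear = std::max(-d, 0.0);                   // lift faint sounds up to the hearing range
    double expo   = 9.2 * std::exp(-std::max(d, 0.0) / 34.0);  // ~9.2 dB at threshold, fading at loud levels
    return linear + expo;
}
```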
Remember, I said it was a near-fact, mostly true over the range of sound levels in our daily experience, that the correction gain needed depends only on the difference between presentation level and threshold elevation? Here’s a comparison graph showing the agglomeration of the real gains needed, for all possible hearing thresholds relative to presentation level, using the accurate corrections derived from EarSpring/Conductor:
The red curves are the approximation we just gave, overlaid atop the actual correction gains for all the different threshold elevations.
I just arrived at the numbers 9.2 dB and 34, in the approximation, by fiddling with them and doing an eyeball estimation of errors. But do those numbers have any physical meaning?
I’m not too sure about the 9.2 dB number. Surely it must be related to the physics in some way, but I don’t have a good explanation for it yet.
But the other number, 34, is very closely related to the cube-root nature of our hearing. And regardless of any hearing impairment, everyone’s hearing obeys that same cube-root compressive behavior at sufficiently loud sound levels. It is innate to our mechanical construction and the feedback system provided by the brain and 8th nerve.
The EarSpring equation shows that it isn’t exactly cube-root, and it declines ever so slightly to greater compression ratios at the very loudest levels. But a good approximation over everyday sound levels is cube-root compression.
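For the record, cube-root compression just means perceived loudness grows as the cube root of intensity, which in dB terms is a slope of one-third:

$$L \;\propto\; I^{1/3} \quad\Longrightarrow\quad 10\log_{10} L \;=\; \tfrac{1}{3}\,\bigl(10\log_{10} I\bigr) + \text{const.}$$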
So, okay… we have a new approximation. How does it sound?
I put the new approximation into the Crescendo plugin DSP code, in lieu of those elaborate minimax Remez rational polynomials with interpolation. Offhand, it sounds very nearly the same. Only a careful A/B comparison over many hours will reveal the true differences. I expect the greatest differences to show up at the faint levels where recruitment curves exhibit steep slopes.
If it sounds worse, or better, it will have to do with those faint scintillating high strings and female chorus music. But for now, it really does sound about the same…
And if it sounds okay, is there any reason to use it? Or should we keep the existing precision gain calculations? Well, that depends on how efficiently we can do all of these calculations. The good-enough, most efficient algorithm wins.
- DM