Crescendo splits the incoming audio stream into separate Bark-bandwidth channels for power estimation. The incoming power relates directly to how loud things will sound coming out of your speakers or headphones. As sound grows louder in any channel, the dynamic compression eases, so that very loud sounds get almost no help at all – you can hear those just fine. But as sounds grow fainter, the compression increases, applying additional gain to compensate for loudness recruitment in each channel.
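To make the shape of that behavior concrete, here is a minimal sketch of one per-channel gain rule with the same character: full correction for faint sounds, tapering to no gain at loud levels. It is not Crescendo's actual compression curve; the names `recruitment_gain_db`, `threshold_elevation_db` and `full_hearing_level_db`, and the straight-line-in-dB taper, are hypothetical choices made only for illustration.

```python
def recruitment_gain_db(level_db, threshold_elevation_db, full_hearing_level_db=100.0):
    """Hypothetical per-channel gain (dB) for a channel with recruited hearing.

    level_db: estimated channel power in dB (assumed: 0 dB = normal threshold).
    threshold_elevation_db: hearing loss measured for this channel.
    full_hearing_level_db: level above which no help is needed (assumption).
    """
    # Fraction of the full correction still needed at this level, clamped to [0, 1].
    fraction = (full_hearing_level_db - level_db) / full_hearing_level_db
    fraction = min(max(fraction, 0.0), 1.0)
    return threshold_elevation_db * fraction


if __name__ == "__main__":
    for level in (0, 25, 50, 75, 100):
        print(level, recruitment_gain_db(level, threshold_elevation_db=40.0))
```

In this toy rule, with 40 dB of threshold elevation, input levels from 0 dB to 100 dB map to output levels from 40 dB to 100 dB, which is a compression ratio of about 1.67:1. Faint sounds get the full 40 dB of help and loud sounds get none.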
Splitting audio into separate Bark bands is tricky. We have to live by the “Uncertainty Principle”, which, in this case, states that the quality of filtering (the ability to cleanly separate audio into each band) must be traded off against the competing demand for short throughput latency.
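For a block-based filter bank, such as an FFT over $N$ samples at sample rate $f_s$, the tradeoff can be written out directly. This is a simplified statement in terms of window length $T$ and bin spacing $\Delta f$, rather than the formal Gabor bound:
$$T = \frac{N}{f_s}, \qquad \Delta f = \frac{f_s}{N}, \qquad T\,\Delta f = 1$$
Halving the analysis window $T$ to cut latency doubles the bin spacing $\Delta f$, coarsening the frequency resolution by the same factor.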
For real-time use of Crescendo in live performance settings, we want to keep the throughput delay below 10 ms. Crescendo actually tries to keep its own latency below about 5 ms, a delay that is just barely perceptible as a flam. Of course, any general-purpose computer adds its own buffering latency on top of that, which makes things worse, but that is beyond the control of Crescendo. Getting tighter than a computer will allow requires dedicated computing hardware, and that is what we use for live sessions.
But the short processing latency in Crescendo also implies that we would have trouble cleanly dissecting the audio into channels below about 600 Hz. There are 7 Bark bands below 600 Hz, yet all that power falls into only 3 raw processing bands of the FFT. This would be a problem, except for the fact that sensorineural impaired hearing becomes essentially normal below 600 Hz. Any threshold elevation measured at 250 Hz and 500 Hz is going to be so slight that the Crescendo corrections would be imperceptible even if they were made.
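The bin arithmetic behind the 600 Hz limit can be checked in a few lines. This sketch assumes the 256-point FFT at 48 kHz described later in this post (the helper name `low_bins` is illustrative) and lists the FFT bin centers that fall below 600 Hz, excluding the DC bin.

```python
import numpy as np


def low_bins(fs=48_000.0, n_fft=256, cutoff_hz=600.0):
    """Return (bin spacing in Hz, non-DC bin centers below cutoff_hz)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)  # bin centers from 0 Hz up to fs/2
    spacing = freqs[1] - freqs[0]
    below = freqs[(freqs > 0.0) & (freqs < cutoff_hz)]
    return spacing, below


if __name__ == "__main__":
    spacing, below = low_bins()
    print(f"bin spacing: {spacing:.1f} Hz")
    print("bins below 600 Hz:", below)
```

With 187.5 Hz spacing, only the bins at 187.5 Hz, 375 Hz and 562.5 Hz lie below 600 Hz, so three usable bins have to cover seven Bark bands.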
But there is another problem with fast filtering… Fast filters have wings (sidelobes) that respond to frequencies outside their band. A strong signal in one band bleeds slightly into adjacent bands, making those channels report some small amount of power even when there really isn’t any power in their bands.
This is exacerbated by the fact that most music is very bass heavy, with power falling off at about 6 dB/octave above 1 kHz. So even a Crescendo filter listening for 1 kHz signals would pick up bleed from the stronger bass.
So what to do… Up until now, we just lived with these facts, and it sounds pretty good anyway. But detailed analysis and measurement reveal subtle artifacts. Can we do any better? What about that uncertainty principle?
As it happens, we can make the filters behave better. Suppose you feed precisely band-limited white noise into Crescendo, one Bark band at a time, and record the power response in every Crescendo Bark channel. You end up with 24 separate records, one for each of the 24 noise placements, and each record shows the power responses of all 24 channels across the filter bank.
This image is a composite made from each record of the measurement. Each row of this image corresponds to a different placement of band-limited noise. Left to right shows the response from each Bark band, with bass frequencies to the left and treble to the right.
This image is scaled to show dB power in each Bark band. Strong signals are white, fading to nothing as black (-140 dB). You can see our difficulty with the lowest Bark bands in the upper-left region, where the channel responses show some funny business. That’s okay, we are going to ignore those channels anyway.
But starting at about row 8, Bark band 7, we want to respond to power between 639 Hz and 915 Hz, which covers the 750 Hz audiology test band. That is about the lowest band that can show sensorineural impairment at measurable levels. And from Bark band 7 on toward higher frequency bands, our image looks relatively clean – except for the responses to out-of-band power…
Take any row, see where the brightest spot is – that is the channel to which precisely band-limited noise was fed. All the other blocks on the same row show the response of other Bark bands to that noise. Ideally, there should be no response except in the band where the noise was presented.
Similarly, take any column, and that shows the effective filter shape, where the boxes toward the top of the image correspond to sensitivity to bass frequencies, and boxes toward the bottom show the filter response to higher frequencies.
Ignoring the mess in the upper left corner for a moment, all the rest of the image shows a diagonally dominant character. You could imagine that this image resulted from stacking a series of measurements, each of which can be described by a measurement matrix, $M$, applied to all the incoming channels of audio, vector $\mathbf{x}$. The output from that matrix multiplication produces one row of the image:
$$\mathbf{y} = M\,\mathbf{x}$$
where $\mathbf{y}$ is our measurement vector. (PS: that image matrix IS the measurement matrix, $M$, with each measurement vector laid out as a row, so strictly it shows $M^{\mathsf{T}}$ in this column-vector notation.)
In our measurement case, we actually know that the result should light up just one ideal Bark channel and leave all the others untouched, because our test noise signal was confined to one channel. We would wish that our filters formed a measurement matrix that was purely diagonal. What we ideally want to see is the input vector, $\mathbf{x}$, not our measured vector, $\mathbf{y}$.
So how about $\mathbf{x} = M^{-1}\mathbf{y}$? This says that if you measure $\mathbf{y}$, then you can recover the ideal spectrum by multiplying the measurement vector by the inverse of the measurement matrix, $M$.
Suppose we idealize a diagonally dominant matrix, $M$, as composed of a diagonal matrix, $D$, plus all the off-diagonal elements in another matrix, $E$:
$$M = D + E$$
We can expand the inverse to first order in $E$ as:
$$M^{-1} = (D + E)^{-1} = \left(I + D^{-1}E\right)^{-1} D^{-1} \approx D^{-1} - D^{-1} E\, D^{-1}$$
Since the original matrix was diagonally dominant, we can expect the neglected terms in $E^2$ and higher to be much smaller.
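A minimal NumPy sketch of that first-order correction follows. The 24-band leakage matrix is a made-up toy (each channel leaks 5% into its neighbors and 0.5% two bands away) standing in for the measured matrix; it is not Crescendo's data, and the helper names are hypothetical. The check feeds noise into a single band and compares the uncorrected measurement, the first-order correction, and the exact inverse.

```python
import numpy as np


def first_order_inverse(M):
    """Approximate inv(M) as D^-1 - D^-1 E D^-1, where M = D + E.

    D holds the diagonal of M and E holds everything off the diagonal.
    Only a good approximation when M is strongly diagonally dominant.
    """
    M = np.asarray(M, dtype=float)
    d_inv = 1.0 / np.diag(M)        # diagonal of D^-1
    E = M - np.diag(np.diag(M))     # off-diagonal part of M
    # (D^-1 E D^-1)[i, j] = d_inv[i] * E[i, j] * d_inv[j]
    return np.diag(d_inv) - d_inv[:, None] * E * d_inv[None, :]


def toy_leakage_matrix(n=24, leaks=(0.05, 0.005)):
    """Hypothetical symmetric leakage: leaks[k-1] bleeds into bands k away."""
    M = np.eye(n)
    for k, leak in enumerate(leaks, start=1):
        M += leak * (np.eye(n, k=k) + np.eye(n, k=-k))
    return M


if __name__ == "__main__":
    M = toy_leakage_matrix()
    x_true = np.zeros(24)
    x_true[12] = 1.0                 # noise fed into one band only
    y = M @ x_true                   # what the leaky filter bank reports
    x_hat = first_order_inverse(M) @ y
    print("uncorrected max error:", np.max(np.abs(y - x_true)))
    print("first-order max error:", np.max(np.abs(x_hat - x_true)))
    print("exact inverse matches:", np.allclose(np.linalg.solve(M, y), x_true))
```

In this toy case the leftover error is second order in $E$ (about 0.5% versus 5% uncorrected), and the corrected off-band values come out slightly negative, which is the same effect that produces the black regions in the corrected image.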
So, let’s try that out. When we use our first order approximation against the test measurement vectors, and stack the corrected measurements into another image, we get:
This new image has exactly the same scaling applied as the first image. Only the rows that were diagonally dominant (the measurements for bands 7 and higher) were affected by the correction. All the black regions are where the decibel scaling was applied to zero or negative values. Since no measurement can have negative power, we arbitrarily set those values to -140 dB, and likewise for zero power.
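That display rule amounts to a clamped power-to-dB conversion. A minimal sketch, assuming linear power values where 1.0 maps to 0 dB (the name `power_to_db` is illustrative):

```python
import numpy as np

FLOOR_DB = -140.0  # black level used in the images


def power_to_db(power, floor_db=FLOOR_DB):
    """Convert linear power to dB, sending zero, negative and tiny values to floor_db."""
    power = np.asarray(power, dtype=float)
    db = np.full(power.shape, floor_db)
    positive = power > 0.0
    # Take log10 only where it is defined; anything quieter than the floor is clamped.
    db[positive] = np.maximum(10.0 * np.log10(power[positive]), floor_db)
    return db


if __name__ == "__main__":
    print(power_to_db([1.0, 0.01, 0.0, -1e-3, 1e-20]))
```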
But this image shows what we want for an ideal measurement. That crude first order approximation to the inverse measurement matrix worked great. (Ignore the dropped region in the upper left – we won’t use those bands anyway.)
And, indeed, when I build this correction into the Crescendo algorithm, the results sound fabulous. Bench tests with Plugin Doctor now show no further evidence of subtle artifacts caused by filter bleed.
This kind of restoration is a form of deconvolution; because we measured the smearing directly, it is non-blind deconvolution. Every measurement corresponds to the convolution of the object being measured with the smearing produced by the measurement. In this case our measurement apparatus consisted of a bank of approximate Bark band filters. By applying the correction, we end up very close to ideal Bark filters.
So what do these filters look like anyway, and what causes the filter bleed?
Here is an example filter for the 2 kHz Bark band #13, showing what it looks like as an FIR filter, and its frequency response:
In general, FIR filters won’t perform well unless there are at least 3 cycles of the center frequency waveform across the FIR window. But in this case, for Bark Channel #13, we easily satisfy that condition, and the 2 kHz filter performs quite well.
All filters in Crescendo have the same length, 256 taps. This means that all filters cover a duration of (provide an average power estimate over) 5.3 ms, with the impulse response delayed by half that amount, or 2.7 ms, at a sample rate of 48 kHz. At 44.1 kHz, these numbers are about 9% higher (5.8 ms and 2.9 ms).
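The timing numbers above, and the 3-cycle rule of thumb from the previous paragraph, reduce to simple arithmetic. This sketch just evaluates them for a 256-tap window; the helper names are illustrative.

```python
def window_timing_ms(n_taps=256, fs=48_000.0):
    """Window duration and center delay (both in ms) for an n_taps FIR."""
    duration_ms = 1000.0 * n_taps / fs
    return duration_ms, duration_ms / 2.0


def cycles_in_window(freq_hz, n_taps=256, fs=48_000.0):
    """How many cycles of freq_hz fit across the FIR window."""
    return freq_hz * n_taps / fs


if __name__ == "__main__":
    for fs in (48_000.0, 44_100.0):
        duration, delay = window_timing_ms(fs=fs)
        print(f"{fs:.0f} Hz: window {duration:.2f} ms, delay {delay:.2f} ms")
    print("cycles at 2 kHz:", cycles_in_window(2000.0))
    # Lowest frequency that still gets 3 full cycles at 48 kHz:
    print("3-cycle limit (Hz):", 3 * 48_000.0 / 256)
```

At 48 kHz the 2 kHz filter sees about 10.7 cycles, comfortably above the rule of thumb, while the 3-cycle limit lands at 562.5 Hz, consistent with the trouble below about 600 Hz noted earlier.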
For performance reasons, we don’t compute 24 simultaneous 256-tap FIR filters during power estimation for each Bark band. That would be far too costly, causing Crescendo to perform roughly 1,000 times more work than it does already. Rather, we take advantage of the fact that an FFT is equivalent to simultaneous FIR filtering across 128 uniformly spaced frequency bands. Then, to get our non-uniform-bandwidth Bark channels, we re-map the FFT output into 24 Bark channels.
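A minimal sketch of that re-mapping step: window one 256-sample frame, take its FFT, and sum the bin powers into Bark channels. The band edges are the standard Zwicker critical-band table, which differs slightly from Crescendo's own edges (its band 7 spans 639 Hz to 915 Hz). The Hann window and the name `bark_powers` are assumptions; this illustrates the technique and is not Crescendo's code.

```python
import numpy as np

# Assumption: standard Zwicker critical-band edges in Hz (24 bands).
# Crescendo's own band edges differ slightly.
ZWICKER_EDGES_HZ = np.array(
    [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
     2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500],
    dtype=float,
)


def bark_powers(frame, fs=48_000.0, edges=ZWICKER_EDGES_HZ):
    """Sum windowed FFT bin powers of one frame into critical bands."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    spectrum = np.fft.rfft(frame * np.hanning(n))  # uniform bins, fs/n apart
    bin_power = np.abs(spectrum) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Index of the band whose [lower, upper) range holds each bin center.
    band = np.searchsorted(edges, freqs, side="right") - 1
    n_bands = len(edges) - 1
    valid = (band >= 0) & (band < n_bands)  # drop bins above the top edge
    powers = np.zeros(n_bands)
    np.add.at(powers, band[valid], bin_power[valid])
    return powers


if __name__ == "__main__":
    fs = 48_000.0
    t = np.arange(256) / fs
    tone = np.sin(2 * np.pi * 3000.0 * t)  # 3 kHz lies in band 15 (2700 to 3150 Hz)
    powers = bark_powers(tone, fs)
    print("loudest band:", int(np.argmax(powers)))
```

With 187.5 Hz bins, several of the narrow low bands receive one bin or none at all (nothing lands between 200 Hz and 300 Hz, for example), which is the low-frequency problem described earlier.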
All those ripples in the region away from the passband are caused by the “windowing” on the FIR filter – its broad envelope opening from zero at one end, then widening in the middle, and then dropping back to zero at the other end of the FIR filter.
That windowing causes the center passband to become slightly wider than planned, and the shape of the window envelope can control the level of stopband ripple to some degree. But the ripples exist because of the finite width of the filter. The only way to get no stopband rippling is to have a filter of infinite length.
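To see how the window shape trades mainlobe width against stopband ripple, this sketch measures the highest sidelobe of two common windows over 256 taps. The rectangular and Hann windows are standard examples chosen for illustration; the post does not say which window Crescendo uses.

```python
import numpy as np


def peak_sidelobe_db(window, pad=64):
    """Highest sidelobe level (dB relative to the mainlobe peak) of a window."""
    n = len(window)
    spec = np.abs(np.fft.rfft(window, n * pad))  # finely sampled response
    spec_db = 20.0 * np.log10(spec / spec.max() + 1e-300)
    # Walk down the mainlobe until the response stops falling (first null).
    k = 0
    while spec[k + 1] < spec[k]:
        k += 1
    return spec_db[k:].max()


if __name__ == "__main__":
    n = 256
    print("rectangular:", round(peak_sidelobe_db(np.ones(n)), 1), "dB")
    print("Hann:       ", round(peak_sidelobe_db(np.hanning(n)), 1), "dB")
```

The rectangular window's first sidelobe sits near -13 dB, while the Hann window pushes it down to about -31 dB. The price is a mainlobe roughly twice as wide, which is the passband widening described above; neither window makes the ripple vanish.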
At the higher frequencies, a Bark band is roughly 1/4 octave wide, just slightly narrower than the filters in a 30-band 1/3-octave graphic equalizer.
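The 1/4-octave figure can be checked against the standard Zwicker critical-band edges (an assumption here, since Crescendo's own edges differ slightly): the width of a band in octaves is the base-2 log of its upper-to-lower edge ratio.

```python
import math

# Standard Zwicker critical-band edges in Hz, from 2 kHz upward (assumption).
EDGES_HZ = [2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500]

widths = [math.log2(hi / lo) for lo, hi in zip(EDGES_HZ, EDGES_HZ[1:])]
for lo, hi, w in zip(EDGES_HZ, EDGES_HZ[1:], widths):
    print(f"{lo:>5}-{hi:<5} Hz: {w:.2f} octave")
```

Between 2 kHz and 9.5 kHz these come out between about 0.21 and 0.30 octave, close to 1/4 octave and below the 1/3 octave of a graphic equalizer band.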
But remember… All of this work is only a crude approximation of reality. Nature does whatever it does, and the job of physics is to understand it by modeling the character we observe. Bark bands approximately describe our hearing, but our hearing doesn’t strictly channelize sounds into discrete Bark bands. Rather, the loudest sounds form self-organizing critical bands around themselves, roughly one Bark wide at that frequency. Our Crescendo “model” is a crude attempt to match nature.
Hey! Why aren’t you using GammaTone or GammaChirp filters here?
That is an intelligent question, and I gave it a lot of thought. I consciously chose NOT to use them because I am not building a model of the cochlea here. Our cochleae are still present and continue to do their own work while we listen to music.
GammaTone and GammaChirp filters both have fat tails toward the bass frequencies, giving rise to upward spread of masking. I don’t want to apply that masking all over again when your own cochlea already does it for you. Fat tails picking up bass contributions would suppress the compressive gain applied to higher-frequency Bark bands, and since most music is bass heavy, the effect would be significant.
All I am trying to do is elevate, in a safe and intelligent manner, those sounds we have trouble detecting at faint levels, giving them a fighting chance to be heard again. I am treating the whole hearing system, not just the cochlea.
- PPS: Well, after a week of listening through these upgraded Bark filters, I’m finding that certain kinds of music, especially vocals with almost no accompaniment, show peaky behavior on some higher pitches. That gives rise to a sharp attack chirp artifact.
What seems to be happening is that adjacent channels are now robbed of the bleed power that used to raise their power estimates, so they operate at much higher compression gain, allowing an overloud component through as the tonal pitch varies nearby in frequency.
So, while compression artifacts are substantially reduced with the narrowed Bark filters, worse tonal artifacts are allowed through. Some “good ideas” are worse… time to revert.