Author Topic: FFT65536  (Read 241 times)

Hanuman

  • Posts: 107
FFT65536
« on: 7 Mar '23 - 15:55 »
I have this pitch-detection algorithm that relies on FFT. It works relatively well but is still a bit funky at times. The only thing that would improve its accuracy is to use FFT65536 instead of FFT32768. Any chance that this will be supported?

Ian @ un4seen

  • Administrator
  • Posts: 25067
Re: FFT65536
« Reply #1 on: 7 Mar '23 - 18:04 »
Are you already interpolating to get a more precise frequency? If not, you could try that. For example, something like this:

Code: [Select]
// assumes 0 < peaki < (number of bins - 1), so that both neighbours exist
float y0 = fft[peaki - 1], // magnitude of peak-1 bin
      y1 = fft[peaki],     // magnitude of peak bin
      y2 = fft[peaki + 1]; // magnitude of peak+1 bin
float peakf = peaki + 0.5f * (y2 - y0) / (2 * y1 - y0 - y2); // interpolate with neighbouring bins to finetune the peak location
freq = peakf * samplerate / fftsize; // translate it to Hz
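The same idea can be wrapped in a small function with guards for the edge cases. This is only a sketch: `interpolate_peak` and `bin_to_hz` are hypothetical helper names (not part of BASS), and `nbins` is assumed to be the number of magnitude values in the `fft` array.

```c
#include <stddef.h>

/* Hypothetical helper: refine a peak bin index using the parabola that
   passes through the peak bin and its two neighbours. */
float interpolate_peak(const float *fft, size_t nbins, size_t peaki)
{
    /* No neighbour on one side: fall back to the plain bin index. */
    if (peaki == 0 || peaki + 1 >= nbins)
        return (float)peaki;

    float y0 = fft[peaki - 1]; /* magnitude of peak-1 bin */
    float y1 = fft[peaki];     /* magnitude of peak bin */
    float y2 = fft[peaki + 1]; /* magnitude of peak+1 bin */
    float denom = 2.0f * y1 - y0 - y2;

    /* denom == 0 means a flat top, denom < 0 means peaki is not a local
       maximum; in both cases there is no usable vertex. */
    if (denom <= 0.0f)
        return (float)peaki;

    return (float)peaki + 0.5f * (y2 - y0) / denom;
}

/* Hypothetical helper: translate a (possibly fractional) bin index to Hz. */
float bin_to_hz(float bin, float samplerate, float fftsize)
{
    return bin * samplerate / fftsize;
}
```

When the three magnitudes lie exactly on a parabola, the result is the exact vertex; with real FFT data it is only an estimate.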

Hanuman

  • Posts: 107
Re: FFT65536
« Reply #2 on: 9 Mar '23 - 09:33 »
I don't understand what this is doing, could you explain a bit more?

Someone also reported getting better results with a lower sampling rate. If I reduce the sampling rate, it has the same effect as increasing the FFT size, correct? More FFT bands per Hz.
« Last Edit: 9 Mar '23 - 09:38 by Hanuman »

Ian @ un4seen

  • Administrator
  • Posts: 25067
Re: FFT65536
« Reply #3 on: 9 Mar '23 - 15:40 »
Yes, reducing the sample rate will increase frequency resolution, like increasing the FFT size does. Note that when the frequency doesn't fall exactly in the centre of an FFT bin then its energy will be split between neighbouring bins, with the closer one having the greater share. Interpolation takes account of that to get a more accurate value (although possibly still not exact).
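To put numbers on the resolution point, the spacing between FFT bin centres is the sample rate divided by the FFT size. A minimal sketch (`bin_width_hz` is a hypothetical helper name, not a BASS function):

```c
/* Width of one FFT bin in Hz, i.e. the spacing between adjacent bin centres. */
double bin_width_hz(double samplerate, double fftsize)
{
    return samplerate / fftsize;
}
```

With these definitions, a 32768-point FFT at 44100 Hz gives bins about 1.346 Hz apart. Doubling the FFT size to 65536 or halving the sample rate to 22050 Hz both give about 0.673 Hz.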

Hanuman

  • Posts: 107
Re: FFT65536
« Reply #4 on: 10 Mar '23 - 21:53 »
I did some tests. Interpolation does not help. Likewise, rounding the FftIndex to the closest index (up or down) gives worse results than always rounding down.

I'm not exactly sure how much downsampling helps, or what the optimal rate is... 44100, 32000 and 24000 all have their merits and occasional "weird" values.

One thing that CAN definitely help, however, is to calculate all 3 and take the middle value.
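Taking the middle of three values is a median of three. A minimal sketch in C, where `median3` is a hypothetical helper name and the three arguments would be the pitches detected at the three sample rates:

```c
/* Return the middle value of three by sorting them with three swaps. */
float median3(float a, float b, float c)
{
    float t;
    if (a > b) { t = a; a = b; b = t; }
    if (b > c) { t = b; b = c; c = t; }
    if (a > b) { t = a; a = b; b = t; }
    return b;
}
```

The median discards a single outlier on either side, which is why one occasional "weird" value out of three no longer shows up in the result.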

Hanuman

  • Posts: 107
Re: FFT65536
« Reply #5 on: 11 Mar '23 - 02:32 »
I might as well ask something just to be sure.

Does the sample rate affect the tone range of the audio? I was assuming that 48000 had a wider range and more bass, while 24000 is more limited to the range used for voice communication.

But when I'm calculating all the frequency tones over 5 octaves, it doesn't seem to change anything at all here? Algorithm seems to be working fine unless I'm missing something.

Code: [Select]
/// <summary>
/// Returns a cached array of 62 tones, keys 20 to 81 (5 octaves + a tone before and after).
/// </summary>
protected static float[] ToneFreq
{
    get
    {
        if (_toneFreq == null)
        {
            _toneFreq = new float[62];
            for (var i = 0; i < 62; i++)
            {
                _toneFreq[i] = (float)Math.Pow(2, (20 + i - 49) / 12.0) * 440;
            }
        }
        return _toneFreq;
    }
}
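For reference, the same table sketched in C. The names `key_to_hz`, `fill_tone_table`, `FIRST_KEY` and `TONE_COUNT` are made up for illustration. The sketch assumes the same convention as the code above (key 49 = 440 Hz, 12 equal steps per octave) and multiplies by a constant semitone ratio instead of calling `pow`, so it needs no math library.

```c
#define FIRST_KEY  20  /* first key in the table */
#define TONE_COUNT 62  /* 5 octaves (60 keys) + one key before and after */

/* Frequency of a key, stepping up or down from key 49 (440 Hz)
   one equal-tempered semitone at a time. */
double key_to_hz(int key)
{
    const double semitone = 1.0594630943592953; /* 2^(1/12) */
    double hz = 440.0;
    int k;

    for (k = 49; k < key; k++)
        hz *= semitone;
    for (k = 49; k > key; k--)
        hz /= semitone;
    return hz;
}

/* Fill the table with keys FIRST_KEY .. FIRST_KEY + TONE_COUNT - 1. */
void fill_tone_table(double table[TONE_COUNT])
{
    int i;
    for (i = 0; i < TONE_COUNT; i++)
        table[i] = key_to_hz(FIRST_KEY + i);
}
```

Under those assumptions the table runs from about 82.4 Hz (key 20) to about 2794 Hz (key 81).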

The code also works out which tuning most closely matches the FFT peaks: it shifts the above array in increments of 0.1 Hz and, for each shift, sums all the matching FFT values. The highest sum wins.

Then I can calculate at various sample rates and take the most confident value (highest sum).
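Roughly, the scoring step looks like the sketch below. This is a simplified C illustration, not my actual code: `tone_score` is a made-up name, each tone is mapped to a single bin by rounding down, and `nbins` is assumed to be the number of FFT magnitudes available.

```c
#include <stddef.h>

/* Sum the FFT magnitudes at the bins that the given tone frequencies fall into.
   A higher sum means the tone set lines up better with the spectrum. */
float tone_score(const float *fft, size_t nbins,
                 const float *tone_hz, size_t ntones,
                 float samplerate, float fftsize)
{
    float sum = 0.0f;
    size_t i;

    for (i = 0; i < ntones; i++) {
        /* Fractional bin position of this tone. */
        float pos = tone_hz[i] * fftsize / samplerate;
        size_t bin;

        if (pos < 0.0f)
            continue;
        bin = (size_t)pos; /* round down to the bin index */
        if (bin < nbins)   /* ignore tones beyond the available bins */
            sum += fft[bin];
    }
    return sum;
}
```

Calling this once per 0.1 Hz shift of the tone array and keeping the shift with the highest return value gives the "highest sum wins" behaviour.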

One weird thing though, at lower sample rates, the sums are lower, which could be explained by narrower bands. But to make up for it and be able to compare the sums side-by-side, I need to multiply the sums by this factor. I had to go by trial-and-error testing to get this .2577 constant, and I don't understand it. Got any explanation here?

Code: [Select]
var adjust = (44100f / sampleRate - 1) * .2577f + 1;

Ian @ un4seen

  • Administrator
  • Posts: 25067
Re: FFT65536
« Reply #6 on: 13 Mar '23 - 14:37 »
Quote:
Does the sample rate affect anything as to the tone range of the audio? I was assuming that 48000 had wider range and more bass, while 24000 is more limited to the range used for voice communication.

Higher sample rates can capture higher frequency sounds, up to half the sample rate (see Nyquist). So 48000 will have a wider range than 24000 but only at the high end.
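As a quick check, here is a sketch of the Nyquist limit in C (the helper names `nyquist_hz` and `tone_is_representable` are hypothetical):

```c
/* Highest frequency that a given sample rate can represent. */
double nyquist_hz(double samplerate)
{
    return samplerate / 2.0;
}

/* Returns 1 if a tone lies below the Nyquist limit, 0 otherwise. */
int tone_is_representable(double tone_hz, double samplerate)
{
    return tone_hz < nyquist_hz(samplerate);
}
```

At 24000 the limit is 12000 Hz. Going by the ToneFreq formula posted earlier, the highest tone in that array (key 81) would be about 2794 Hz, so all of the tones fit under the limit at any of the sample rates mentioned.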

Quote:
But when I'm calculating all the frequency tones over 5 octaves, it doesn't seem to change anything at all here? Algorithm seems to be working fine unless I'm missing something.
...

Are you trying to detect the fundamental frequency of an instrument? If so, this old thread may help:

   www.un4seen.com/forum/?topic=13623

Hanuman

  • Posts: 107
Re: FFT65536
« Reply #7 on: 13 Mar '23 - 17:56 »
That's another pitch-detection algorithm that serves the exact same purpose? How good or precise is it?

I made an improvement to my algorithm that reduces flakiness and makes it pretty solid: I run 3 passes, at 42000 Hz, 34000 Hz and 27000 Hz, and take the most confident value. I run them in separate threads so it doesn't take more time to run. One further change I could make is to run the first pass on the first 30 seconds, the 2nd pass on the next 30 seconds, and the 3rd on the 30 seconds after that; or on the first 30, middle 30 and last 30 seconds.
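Picking the most confident pass is just a maximum search over the per-pass results. A simplified C sketch, where `PitchEstimate` and `most_confident` are made-up names and the confidence values are assumed to have already been scaled so they are comparable across sample rates:

```c
#include <stddef.h>

/* Result of one detection pass. */
typedef struct {
    float pitch_hz;   /* detected pitch */
    float confidence; /* score of the winning tone set (higher is better) */
} PitchEstimate;

/* Return the index of the estimate with the highest confidence,
   or -1 if there are no estimates. Ties keep the earliest entry. */
int most_confident(const PitchEstimate *est, size_t count)
{
    size_t i, best = 0;

    if (count == 0)
        return -1;
    for (i = 1; i < count; i++) {
        if (est[i].confidence > est[best].confidence)
            best = i;
    }
    return (int)best;
}
```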

Ian @ un4seen

  • Administrator
  • Posts: 25067
Re: FFT65536
« Reply #8 on: 14 Mar '23 - 16:28 »
Quote:
That's another pitch-detection algorithm that serves the exact same purpose? How good or precise is it?

With an instrument that has harmonics, it'll probably give you a more accurate value (and faster) than your current method. It won't work when harmonics aren't present (e.g. a pure sine wave), but it could be tweaked to just use the single peak in that case.

Hanuman

  • Posts: 107
Re: FFT65536
« Reply #9 on: 18 Mar '23 - 06:33 »
Does it work with rock music with a singer and a bunch of instruments? Real music gets kind of noisy, and it's hard to separate individual waves or instruments.

Ian @ un4seen

  • Administrator
  • Posts: 25067
Re: FFT65536
« Reply #10 on: 20 Mar '23 - 14:24 »
Are you trying to detect the key of music? I've never looked into key detection myself, so unfortunately I can't advise on it, but I believe it is a bit more complicated than just detecting the peak frequency. Perhaps there are ready-made libraries that you can use for it? For example, you could then get PCM data (instead of FFT) from BASS_ChannelGetData, pass it to the library, and get back the key.