Performance bottleneck with BASS_MIDI_StreamEvent(s) under Linux

Started by bree

bree

Hi Ian,

I've encountered an unusual issue that's causing sluggish playback with Black MIDIs on Linux.

While adding multiplatform support to OmniMIDI (based on the OMv2 codebase), I had several Linux users — including myself — test it with large MIDI files. We all noticed that BASSMIDI on Linux doesn't seem to handle as many events as the Win32 version does.

I ran a series of tests to pinpoint the cause, including timing how long each function takes to return while processing the events buffer. The results consistently point to BASS_MIDI_StreamEvent(s) taking significantly longer to execute on Linux, which leads to playback slowdowns. Interestingly, this happens without any audio dropouts — the streams are simply starved of MIDI data, so it just sounds like the tempo is slowing down.
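For anyone wanting to reproduce this kind of measurement, here is a minimal C++ sketch of per-call timing. CallStats and ScopedTimer are hypothetical names used for illustration, not OmniMIDI's actual code. The idea is simply to wrap each call in a scope and accumulate how long it took to return.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Running statistics for one function being profiled (hypothetical helper).
struct CallStats {
    using ns = std::chrono::nanoseconds;
    std::uint64_t calls = 0;
    ns total{0};
    ns worst{0};

    // Adds one measured call duration to the running totals.
    void record(ns elapsed) {
        assert(elapsed.count() >= 0);
        ++calls;
        total += elapsed;
        if (elapsed > worst) worst = elapsed;
    }

    // Mean time per call; zero if nothing has been recorded yet.
    ns average() const {
        return calls ? total / static_cast<ns::rep>(calls) : ns{0};
    }
};

// Measures the lifetime of the enclosing scope with a monotonic clock
// and records it into the given CallStats on destruction.
class ScopedTimer {
public:
    explicit ScopedTimer(CallStats& stats)
        : stats_(stats), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        stats_.record(std::chrono::duration_cast<CallStats::ns>(end - start_));
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    CallStats& stats_;
    std::chrono::steady_clock::time_point start_;
};
```

Placing a ScopedTimer in a block around the BASS_MIDI_StreamEvent(s) call and comparing average() and worst on each OS would give the sort of numbers described above. steady_clock is used so that system clock adjustments don't skew the results.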

At first, I suspected it might be due to improper initialization of the libraries, but since the codebase is shared across platforms and both Win32 and Linux are initialized identically, that seems unlikely.

It seems that BASS_MIDI_StreamEvent(s) just performs better on Win32 for some reason. Would it be possible to look into this further?

Thanks in advance!

Ian @ un4seen

That's strange, as the BASS_MIDI_StreamEvent(s) code is the same on all platforms. To confirm that the performance difference is entirely platform-related, are you doing the Linux vs Windows comparisons on the same hardware? Are BASS_MIDI_StreamEvent and BASS_MIDI_StreamEvents equally affected, or one more than the other? Please also confirm whether you're using async events, and whether changing that makes any difference to your performance results.

bree

Quote from: Ian @ un4seen
That's strange, as the BASS_MIDI_StreamEvent(s) code is the same on all platforms. To confirm that the performance difference is entirely platform-related, are you doing the Linux vs Windows comparisons on the same hardware? Are BASS_MIDI_StreamEvent and BASS_MIDI_StreamEvents equally affected, or one more than the other? Please also confirm whether you're using async events, and whether changing that makes any difference to your performance results.
Hi Ian.

Yes, I use the same exact hardware configuration on both OSes (Windows 11 24H2 vs. Fedora Linux 41).

BASS_MIDI_StreamEvents is slightly slower than BASS_MIDI_StreamEvent, but that's probably because it has to internally convert the raw MIDI data into a BASSMIDI event. Both are slower under Linux than under Windows.

I am using the BASS_MIDI_ASYNC flag on both. Disabling it makes performance worse on both platforms, with Linux lagging behind Windows.

Ian @ un4seen

OK. Perhaps it's related to synchronization primitives (eg. used for event buffer access) then, as that stuff is platform-specific. I'll look into that.

Is this the code that you're currently using?

    https://github.com/KeppySoftware/OmniMIDIv2/blob/9b2b9e3a37d9d0fba79b9667194a560bafdfc05c/OmniMIDI/src/BASSSynth.cpp

If so, I would suggest removing the BASS_ChannelUpdate calls, at least when using async events. Async event timing granularity isn't affected by update rate, so there's no need for lots of very small updates and perhaps they're delaying the BASS_MIDI_StreamEvent(s) calls. With playback buffering disabled (BASS_ATTRIB_BUFFER=0), the BASS_CONFIG_DEV_PERIOD setting determines the update rate.

If you play a troublesome MIDI file all the way through and then play it again (in the same stream), does it seem better the 2nd time? Please also check the BASS_ATTRIB_MIDI_QUEUE_ASYNC / BASS_ATTRIB_MIDI_QUEUE_BYTE / BASS_ATTRIB_MIDI_QUEUE_TICK values at the end on Windows and Linux.

bree

Quote from: Ian @ un4seen
OK. Perhaps it's related to synchronization primitives (eg. used for event buffer access) then, as that stuff is platform-specific. I'll look into that.

Is this the code that you're currently using?

    https://github.com/KeppySoftware/OmniMIDIv2/blob/9b2b9e3a37d9d0fba79b9667194a560bafdfc05c/OmniMIDI/src/BASSSynth.cpp

If so, I would suggest removing the BASS_ChannelUpdate calls, at least when using async events. Async event timing granularity isn't affected by update rate, so there's no need for lots of very small updates and perhaps they're delaying the BASS_MIDI_StreamEvent(s) calls. With playback buffering disabled (BASS_ATTRIB_BUFFER=0), the BASS_CONFIG_DEV_PERIOD setting determines the update rate.

If you play a troublesome MIDI file all the way through and then play it again (in the same stream), does it seem better the 2nd time? Please also check the BASS_ATTRIB_MIDI_QUEUE_ASYNC / BASS_ATTRIB_MIDI_QUEUE_BYTE / BASS_ATTRIB_MIDI_QUEUE_TICK values at the end on Windows and Linux.
Hi,

Yes, that's the code I'm currently using.

The reason I'm creating separate audio threads using BASS_ChannelUpdate is that I implement some multithreading techniques to handle 25–30k active voices in real time. I assign each channel to its own thread, and with the ->ExpMTKeyboardDiv option, I can further subdivide each channel into smaller chunks—for example, setting it to 4 would divide a channel into 4 key chunks, resulting in 16 × 4 = 64 threads.
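To make the thread arithmetic concrete, here is a small C++ sketch of how a note could be routed to one of those worker threads. It assumes the 128 MIDI keys are split into equal contiguous ranges, which is a guess at how a keyboard-division setting might work, not a description of OmniMIDI's actual implementation. threadCount and workerIndex are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMidiChannels = 16;
constexpr std::size_t kMidiKeys = 128;

// Total worker threads: one per (channel, key chunk) pair.
constexpr std::size_t threadCount(std::size_t keyboardDiv) {
    return kMidiChannels * keyboardDiv;
}

// Maps a note to a worker index, assuming each channel's keys are split
// into keyboardDiv equal contiguous ranges (an assumption of this sketch).
inline std::size_t workerIndex(std::size_t channel, std::size_t key,
                               std::size_t keyboardDiv) {
    assert(channel < kMidiChannels && key < kMidiKeys);
    assert(keyboardDiv > 0 && keyboardDiv <= kMidiKeys);
    const std::size_t chunk = key * keyboardDiv / kMidiKeys; // 0 .. keyboardDiv-1
    return channel * keyboardDiv + chunk;
}
```

With a division of 4, threadCount(4) is 64, and the highest key on the last channel maps to worker 63.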

If I don't run these threads, playback tends to stall. It seems like this is due to the single (possibly built-in?) BASS thread, which updates based on BASS_CONFIG_DEV_PERIOD. That, in turn, leads to audio dropouts.

I'll take a look at the queue values and follow up with another message shortly.

Ian @ un4seen

It looks like those threads are optional? If you don't enable them, is the performance then the same on Windows and Linux?

I notice the threads have different Utils.MicroSleep calls (and implementation) on Windows and Linux, which seem like they could cause timing differences. To check if the problem is related to that, can you try using the same Utils.MicroSleep stuff on Windows as on Linux and see if the problem happens on Windows too then?

Btw, from some initial tests, the synchronization primitives I mentioned appear to be just as fast on Linux as Windows, so it seems unlikely they're causing the problem (but more testing may be needed for edge cases).

bree

Quote from: Ian @ un4seen
It looks like those threads are optional? If you don't enable them, is the performance then the same on Windows and Linux?

I notice the threads have different Utils.MicroSleep calls (and implementation) on Windows and Linux, which seem like they could cause timing differences. To check if the problem is related to that, can you try using the same Utils.MicroSleep stuff on Windows as on Linux and see if the problem happens on Windows too then?

Btw, from some initial tests, the synchronization primitives I mentioned appear to be just as fast on Linux as Windows, so it seems unlikely they're causing the problem (but more testing may be needed for edge cases).
Hi,

Those threads are not optional. Disabling them would actually worsen performance.

I did try disabling them as you requested, and the slowdowns still occurred even without them.

Also, prior to the last few commits, the MicroSleep behavior was identical on both Windows and Linux, yet Linux still experienced worse slowdowns compared to Windows.

Quote from: Ian @ un4seen
If you play a troublesome MIDI file all the way through and then play it again (in the same stream), does it seem better the 2nd time? Please also check the BASS_ATTRIB_MIDI_QUEUE_ASYNC / BASS_ATTRIB_MIDI_QUEUE_BYTE / BASS_ATTRIB_MIDI_QUEUE_TICK values at the end on Windows and Linux.
I tested this, and replaying it a second time does not improve performance. It still seems to struggle whenever BASS_MIDI_StreamEvent(s) is called too frequently.
What exactly should I be looking for in the queue queries?

Ian @ un4seen

Quote from: bree
Those threads are not optional. Disabling them would actually worsen performance.

I did try disabling them as you requested, and the slowdowns still occurred even without them.

Is the performance the same on Windows and Linux then, ie. is there only a difference between them when the threads are enabled? If so, that would seem to suggest that the thread code is where to look for platform-specific differences, such as the MicroSleep stuff. Please try making that stuff the same on Windows and Linux, and see if performance is then the same on both.

Please also confirm that you're using the exact same settings on both platforms, eg. same BASS_SetConfig / BASS_Init / BASS_MIDI_StreamCreate parameters. Some differences there could cause performance differences.

Quote from: bree
I tested this, and replaying it a second time does not have improved performance. It still seems to struggle whenever BASS_MIDI_StreamEvent(s) is called too frequently.
What exactly should I be looking for in the queue queries?

What BASS_ATTRIB_MIDI_QUEUE_ASYNC / BASS_ATTRIB_MIDI_QUEUE_BYTE / BASS_ATTRIB_MIDI_QUEUE_TICK values are you seeing (from BASS_ChannelGetAttribute) on each platform? The purpose of checking this is to see if more events are being queued on one platform than the other for some reason (most likely different update rates). When checking these values, please make sure you don't set them yourself first (let them start at the default 0) - BASSMIDI will automatically increase them as needed.

bree

Quote from: Ian @ un4seen
Quote from: bree
Those threads are not optional. Disabling them would actually worsen performance.

I did try disabling them as you requested, and the slowdowns still occurred even without them.

Is the performance the same on Windows and Linux then, ie. is there only a difference between them when the threads are enabled? If so, that would seem to suggest that the thread code is where to look for platform-specific differences, such as the MicroSleep stuff. Please try making that stuff the same on Windows and Linux, and see if performance is then the same on both.

Please also confirm that you're using the exact same settings on both platforms, eg. same BASS_SetConfig / BASS_Init / BASS_MIDI_StreamCreate parameters. Some differences there could cause performance differences.

Quote from: bree
I tested this, and replaying it a second time does not have improved performance. It still seems to struggle whenever BASS_MIDI_StreamEvent(s) is called too frequently.
What exactly should I be looking for in the queue queries?

What BASS_ATTRIB_MIDI_QUEUE_ASYNC / BASS_ATTRIB_MIDI_QUEUE_BYTE / BASS_ATTRIB_MIDI_QUEUE_TICK values are you seeing (from BASS_ChannelGetAttribute) on each platform? The purpose of checking this is to see if more events are being queued on one platform than the other for some reason (most likely different update rates). When checking these values, please make sure you don't set them yourself first (let them start at the default 0) - BASSMIDI will automatically increase them as needed.
Hello.

Sorry for taking so long to get back to you. I did various tests, and the value for BASS_ATTRIB_MIDI_QUEUE_ASYNC does not seem to change. There's nothing for BASS_ATTRIB_MIDI_QUEUE_BYTE because I do not make use of that.

I think the problem stems from me creating *lots* of BASS threads at once. It seems to struggle when I create more than 32 at once. At around 256, playback becomes so sluggish that even small Black MIDIs seem to slow down, even though the CPU usage per thread is pretty low.

Ian @ un4seen

Are such numbers of threads helping performance on Windows? What about if you limit it to the number of CPU threads?

bree

Quote from: Ian @ un4seen
Are such numbers of threads helping performance on Windows? What about if you limit it to the number of CPU threads?
They do help achieve a higher voice count (since I split the load between all the threads, by spawning multiple threads with a low voice limit), but BASSMIDI doesn't seem to like handling that many streams at once.
If I limit the CPU threads, I start having issues when one of the streams goes above 1536-2048 voices at once, since I have to compensate for the lack of additional threads with a higher voice limit.
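The trade-off between stream count and per-stream voice limit is simple arithmetic, sketched below in C++. It assumes the voices are split evenly across streams, which real MIDI files will not do, so treat the results as a lower bound on what each stream needs. perStreamVoiceLimit is a hypothetical helper, and the stream counts in the usage note are example values only.

```cpp
#include <cassert>
#include <cstddef>

// Smallest per-stream voice limit that still covers totalVoices when the
// load is spread evenly over `streams` streams (rounded up).
inline std::size_t perStreamVoiceLimit(std::size_t totalVoices,
                                       std::size_t streams) {
    assert(streams > 0);
    return (totalVoices + streams - 1) / streams;
}
```

Taking the 25-30k total mentioned earlier in the thread, 16 streams would need roughly 1563 to 1875 voices each, while 64 streams would need only about 469 each at the 30k end.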

Ian @ un4seen

Have you tried passing the MIDI events directly to the BASSMIDI stream(s) without first buffering them? If not, you could give that a try. When using async mode (BASS_MIDI_ASYNC), BASSMIDI buffers the events itself, so your own event buffering seems unnecessary and perhaps it's reducing performance somewhat? And also adding latency.

bree

Quote from: Ian @ un4seen
Have you tried passing the MIDI events directly to the BASSMIDI stream(s) without first buffering them? If not, you could give that a try. When using async mode (BASS_MIDI_ASYNC), BASSMIDI buffers the events itself, so your own event buffering seems unnecessary and perhaps it's reducing performance somewhat? And also adding latency.
I benchmarked my buffer system on its own, and it is more than fast enough to feed BASSMIDI with enough data. It handles XSynth (another open source synth) just fine, but with BASS it seems to struggle because of how it has to wait for BASS_MIDI_StreamEvent(s) to return, so it is definitely not an issue on my end. Raising the threads count makes the problem worse without apparently increasing CPU usage on my code, so it seems to point more and more to an issue in BASSMIDI itself.

Ian @ un4seen

I guess your event buffering system predates BASSMIDI's async mode. The async mode includes its own event buffering, so yours shouldn't be necessary now when using that. The extra buffering probably isn't affecting performance much, but any saving is good :) ... I think the greater benefit of removing it would actually be improved event timing and latency. Your buffering will be negating the async mode's event timing enhancements because BASSMIDI doesn't know the real time of an event then (it only knows when the event left your buffer). You can still keep your multiple streams and BASS_ChannelUpdate threads.

I've added a new BASS_ATTRIB_MIDI_QUEUED_ASYNC option to check how many events are currently buffered, which could be used to prevent things getting out of control, ie. drop events when it gets too high (you can use your old event buffer size here). You should probably only drop note-on events, and not note-off events to avoid hanging notes.

    www.un4seen.com/stuff/bassmidi.zip
    www.un4seen.com/stuff/bassmidi-linux.zip

This update also includes some performance tweaks to the async event buffering, so you might notice an improvement even without removing your buffering. Let me know if you have any trouble with it.

bree

Quote from: Ian @ un4seen
I guess your event buffering system predates BASSMIDI's async mode. The async mode includes its own event buffering, so yours shouldn't be necessary now when using that. The extra buffering probably isn't affecting performance much, but any saving is good :) ... I think the greater benefit of removing it would actually be improved event timing and latency. Your buffering will be negating the async mode's event timing enhancements because BASSMIDI doesn't know the real time of an event then (it only knows when the event left your buffer). You can still keep your multiple streams and BASS_ChannelUpdate threads.

I've added a new BASS_ATTRIB_MIDI_QUEUED_ASYNC option to check how many events are currently buffered, which could be used to prevent things getting out of control, ie. drop events when it gets too high (you can use your old event buffer size here). You should probably only drop note-on events, and not note-off events to avoid hanging notes.

    www.un4seen.com/stuff/bassmidi.zip
    www.un4seen.com/stuff/bassmidi-linux.zip

This update also includes some performance tweaks to the async event buffering, so you might notice an improvement even without removing your buffering. Let me know if you have any trouble with it.
Hi.

Your update completely fixed the problem. BASSMIDI no longer causes the playback (or the MIDI app, if both UI/piano roll rendering and audio playback are on the same thread) to stall.

It does still randomly lock up from time to time if I send the events directly to BASSMIDI, even when using its own internal async buffer. That is why I decided to keep my own event buffer in between as well, so as to decouple the MIDI event processing (BASS_MIDI_StreamEvent(s)) from the app's MIDI thread (the one that calls midiOutShortMsg/SendDirectData). I will, though, offer an option to send directly to BASSMIDI instead, if people want lower latencies.

Thank you for the update!

Ian @ un4seen

Great to hear the update helped. When you say it sometimes "locks up" when sending events directly, do you mean a deadlock or just a delay? If a deadlock, please get a dump file of that to have a look at.

Were you monitoring the new BASS_ATTRIB_MIDI_QUEUED_ASYNC value to limit the number of buffered events when sending them directly? If not, you could try that, and also preallocate the event buffer via the BASS_ATTRIB_MIDI_QUEUE_ASYNC option to avoid possible delays from having to expand the buffer mid-playback. You should probably have 2 BASS_ATTRIB_MIDI_QUEUED_ASYNC limits: one for note-on events and a higher one for other events, with BASS_ATTRIB_MIDI_QUEUE_ASYNC set to the latter number.
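As a rough illustration of the two-limit idea, here is a C++ sketch of the drop decision on its own. The queued count is passed in as a plain number; in practice it would come from reading BASS_ATTRIB_MIDI_QUEUED_ASYNC on the stream. EventKind, QueueLimits, classify and shouldSend are hypothetical names, and always forwarding note-offs (rather than applying the higher limit to them) is an assumption of this sketch, based on the earlier advice about hanging notes.

```cpp
#include <cassert>
#include <cstdint>

enum class EventKind { NoteOn, NoteOff, Other };

// Thresholds on the number of currently queued async events.
struct QueueLimits {
    std::uint32_t noteOnLimit; // lower limit: note-ons are dropped first
    std::uint32_t otherLimit;  // higher limit: would also be the preallocated size
};

// Classifies a MIDI channel message from its status byte and second data byte.
inline EventKind classify(std::uint8_t status, std::uint8_t velocity) {
    const std::uint8_t type = status & 0xF0;
    if (type == 0x90 && velocity > 0) return EventKind::NoteOn;
    if (type == 0x80 || type == 0x90) return EventKind::NoteOff; // 0x90 with velocity 0 acts as note-off
    return EventKind::Other;
}

// Decides whether an event should be forwarded given the current queue depth.
inline bool shouldSend(EventKind kind, std::uint32_t queued,
                       const QueueLimits& limits) {
    switch (kind) {
        case EventKind::NoteOn:  return queued < limits.noteOnLimit;
        case EventKind::NoteOff: return true; // never dropped, to avoid hanging notes
        case EventKind::Other:   return queued < limits.otherLimit;
    }
    return true;
}
```

Because note-offs are never dropped here, the queue can still exceed otherLimit in a worst case, at which point BASSMIDI would presumably have to grow the buffer as it does by default.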

bree

Sorry for taking so long to get back to you. I was away on vacation. :(

Quote from: Ian @ un4seen
Great to hear the update helped. When you say it sometimes "locks up" when sending events directly, do you mean a deadlock or just a delay? If a deadlock, please get a dump file of that to have a look at.

Were you monitoring the new BASS_ATTRIB_MIDI_QUEUED_ASYNC value to limit the number of buffered events when sending them directly? If not, you could try that, and also preallocate the event buffer via the BASS_ATTRIB_MIDI_QUEUE_ASYNC option to avoid possible delays from having to expand the buffer mid-playback. You should probably have 2 BASS_ATTRIB_MIDI_QUEUED_ASYNC limits: one for note-on events and a higher one for other events, with BASS_ATTRIB_MIDI_QUEUE_ASYNC set to the latter number.

What I meant is that it would stall the playback and resume after a short delay, yeah. It wasn't deadlocking.

In the end we just kinda scrapped the idea of having 64+ BASSMIDI streams outputting to the default audio device (kinda too much, oops), and moved to a different approach: still lots of streams, but this time with the BASS_STREAM_DECODE flag applied to them. Now we gather the data from all the streams using BASS_ChannelGetData, put it in a temp buffer, and then mix it all together before sending it to the final output using PortAudio. :)
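The mixing step on its own can be sketched in C++ as below. This assumes every stream has already been decoded into a float buffer of the same length (for example by BASS_ChannelGetData on a float decoding stream), and it clamps the sum to [-1, 1]. Both the clamping and the mixBuffers name are assumptions of this sketch, not a description of what OmniMIDI actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the first sampleCount samples of each source buffer into a single
// output buffer, then clamps every sample to the [-1, 1] range.
inline std::vector<float> mixBuffers(
        const std::vector<std::vector<float>>& sources,
        std::size_t sampleCount) {
    std::vector<float> out(sampleCount, 0.0f);
    for (const auto& src : sources) {
        assert(src.size() >= sampleCount);
        for (std::size_t i = 0; i < sampleCount; ++i) {
            out[i] += src[i];
        }
    }
    for (float& s : out) {
        s = std::max(-1.0f, std::min(1.0f, s));
    }
    return out;
}
```

The returned buffer is what would then be handed to the output callback. With many loud streams a hard clamp will distort, so a real mixer might scale or limit instead.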

This change got rid of the problem altogether, and made it so Linux now actually outperforms Windows in terms of note throughput. ;D

Thank you for the help though, I will open another topic for another issue we're facing right now.