Low latency best practices

With all this talk about latency and buffer sizes I thought I'd remind y'all that the most important thing for the plugin itself is "how much time is in the buffer" - because that's what it has to work with to determine what note is played and how.

While the plugin obviously uses advanced overtone detection methods etc., at some point there just won't be any good data for a good prediction and it will have to wait for the next buffer to make one anyway, increasing the latency again. So for reference, a 100 Hz wave takes 10 ms just to go through one cycle.
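To put that in concrete numbers, here is a minimal sketch (the sample rate and buffer size are just illustrative assumptions, not anything known about MG internals) of how much of a 100 Hz cycle one buffer actually covers:

```python
# Rough illustration: how much of a 100 Hz cycle fits into one buffer.
# Sample rate and buffer size are assumptions for the example, not MG's internals.
sample_rate = 44100          # Hz
buffer_size = 256            # samples
fundamental = 100.0          # Hz

cycle_samples = sample_rate / fundamental       # 441 samples per cycle
cycle_ms = 1000.0 / fundamental                 # 10 ms per cycle
buffer_ms = 1000.0 * buffer_size / sample_rate  # ~5.8 ms of audio per buffer

print(f"One 100 Hz cycle: {cycle_samples:.0f} samples ({cycle_ms:.1f} ms)")
print(f"One {buffer_size}-sample buffer: {buffer_ms:.1f} ms -> less than one full cycle")
```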

That is actually not how spectral analysis works. The buffer size defines how big the consecutive chunks of audio passed to FFT analysis are, but it does not say anything meaningful about how fast we can get reliable readings, other than defining a lower limit for how fast we can get the first reading. Although it is true that the fundamental of a 100 Hz note takes 10 ms to complete a cycle, this is not true of the upper partials. We know from hints from the creator that MG uses machine learning / neural networks for detection. We can't know the specifics of how this works, but my guess is that the MG neural network is based on spectral descriptors like mel bands or something similar. These descriptors would, at least in theory, make it possible to detect a 100 Hz spectrally rich signal before one cycle of the fundamental frequency has completed. So, spectral/FFT analysis is a complicated issue and the buffer size is just one of many factors here. We tend to overemphasize the importance of the buffer size - maybe because it is the only parameter we have control over.
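To make the idea concrete, here is a minimal sketch of the kind of mel-band descriptor I am guessing at - purely illustrative, and the window size, band count and sample rate are my assumptions, not anything we know about MG:

```python
import numpy as np

def mel_band_energies(frame, sample_rate=44100, n_bands=40):
    """Very rough mel-band descriptor for one windowed audio frame.
    Illustrative only - MG's actual features are unknown."""
    n = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n))) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    # Band edges spaced evenly on the mel scale, then mapped back to Hz
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv_mel(np.linspace(mel(20.0), mel(sample_rate / 2), n_bands + 1))

    # Sum the spectral energy falling inside each band
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

# Example: one 256-sample frame of a spectrally rich 100 Hz signal
t = np.arange(256) / 44100.0
frame = sum(np.sin(2 * np.pi * 100 * k * t) / k for k in range(1, 8))
print(mel_band_energies(frame).round(2))
```

The point of the sketch: even though the frame is shorter than one cycle of the fundamental, the upper partials still put measurable energy into the bands, which is what would make early detection possible in theory.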

You seem to think I stated things I didn't state. As I said, it's using overtones for recognition, so, as I also said, the time for a 100 Hz wave was a reference. What it actually uses is hard to say, as it's a neural network.

A simple fact remains: the detection cannot be faster than the buffer size, so YES, the buffer size does say how fast we can get information. My whole point was that at a certain point a lower buffer size won't give you a shorter detection time, as you just confirmed as well.

I was commenting on your first statement:

With all this talk about latency and buffer sizes I thought I'd remind y'all that the most important thing for the plugin itself is "how much time is in the buffer" - because that's what it has to work with to determine what note is played and how.

I would argue that no, that is not the most important thing. See my tests above, which show that tracking speed is virtually the same across different buffer sizes. Also, there is no 'time in the buffer'. FFT analysis works in the frequency domain, not the time domain, so the buffer gets filled with a 'snapshot' of the spectral content of the audio, not the audio itself. One important aspect of this analysis is how much the windows (i.e. consecutive FFT blocks) overlap, and this information we don't have access to. But even with this information we couldn't really say anything meaningful about what it would mean in practical terms.
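To illustrate what such a 'snapshot' means in practice, here is a small sketch of how window size and overlap would relate to frequency resolution and analysis rate - the window size, hop size, and sample rate are assumptions, since we don't know MG's actual settings:

```python
# Illustrative FFT framing numbers; none of these values are known MG internals.
sample_rate = 44100   # Hz
window_size = 256     # samples per FFT window (assumption)
hop_size = 64         # samples between consecutive windows (assumption)

bin_width_hz = sample_rate / window_size             # ~172 Hz per FFT bin
frame_interval_ms = 1000.0 * hop_size / sample_rate  # ~1.5 ms between snapshots
overlap_pct = 100.0 * (window_size - hop_size) / window_size

print(f"Each snapshot resolves the spectrum in {bin_width_hz:.0f} Hz wide bins")
print(f"A new snapshot every {frame_interval_ms:.1f} ms at {overlap_pct:.0f}% overlap")
```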

The time spent discussing latency increases it dramatically in each of your projects. :face_with_hand_over_mouth:

Oh my. Whatever. The FFT bins are directly dependent on the sampled time, but believe whatever you want. I guess this is exactly why companies refrain from sharing technical details.

On the other hand, I think this is precisely why companies should share technical data. My stance on this is based on one assumption and some tests. The assumption, which I think is pretty safe, is that the buffer size in the MG interface relates directly to the window size of the FFT analysis. If this is correct, we can deduce how fast the first output of the first FFT analysis is ready (the buffer size converted to ms), but without knowing the window overlap / hop size we cannot know when the 2nd reading will be available. We also don't know if MG internally upsamples the signal, which could affect this. In any case, we are nowhere near getting outputs from MG based on a single analyzed FFT window, which is not so surprising: to get good pitch confidence and onset readings you usually need more information than one buffer of analyzed FFT data. So now the question is whether we get better/faster readings with a smaller buffer size - my tests show that tracking latency is largely independent of buffer size, but if you can produce tests with MG2 which show better/faster results with smaller buffer sizes I would be very interested in seeing them. From doing FFT analysis in other software (mainly SuperCollider, using various UGens) I am not so surprised that smaller buffer sizes do not automatically produce better/faster results.
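As a rough illustration of that first point, here is what the 'first reading' time would look like if the selectable buffer size really does equal the FFT window size - the three sizes and the sample rate are assumptions based on what has been mentioned in this thread:

```python
# If MG's buffer size equals its FFT window size, the earliest possible first
# reading is just the window length converted to ms. Values are assumptions.
sample_rate = 44100
for window in (64, 128, 256):   # assumed selectable buffer sizes
    first_ms = 1000.0 * window / sample_rate
    print(f"{window:4d} samples -> first FFT output possible after {first_ms:.1f} ms")

# ~1.5 ms, ~2.9 ms and ~5.8 ms respectively - but a usable pitch/onset decision
# will likely need several consecutive windows, which shrinks the practical gap.
```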

And btw, I would love it if I were wrong and you were right. I have been working with the smallest buffer size for a long time under the assumption that a smaller buffer size yielded better results. But after doing more thorough tests it just does not seem to be the case.

You keep talking about your own talking points and don't seem to read at all what other people are saying. I repeat for the third time: at some point a lower buffer size won't give you a shorter detection latency - that is my whole point!

Still, the buffer size defines the base latency of the audio interface. This doesn't have anything to do with MIDIGuitar or its algorithm; it's a simple fact of digital audio interfaces. It's simply impossible to get a lower overall latency (as in audio in to audio out) than the audio interface's buffer size plus its hardware latency. So we can still expect some latency improvements when lower buffer sizes become available for MG3.

But this whole topic is often misunderstood anyway; people seem to think the audio magically appears in the CPU memory. I made a video about buffer size and the DSP meter a while ago:


Ok, this is an incredible convo about latency and how to run this plugin. I'm glad to have so many great contributions to the thread. We are looking for clear "best practices" for sample rate, buffer size, and anything else that can contribute.

Let me see if I am summarizing correctly …

  1. The latency of the audio interface is very important.
    1b. Audio interfaces will have lower latency when they run at a faster sample rate - i.e. 96k will give less latency than 44.1k.

  2. Buffer size can reduce latency, because it is possible that this plugin can detect the note value before the buffer has finished filling.
    2b. If the buffer size is smaller, there may be issues with the plugin's tracking quality.
    2c. We have some disagreement over the optimum buffer size, but it is clear that MG2 works well at multiple buffer sizes and determining the best buffer size is not obvious.

Ok, so MG3 is locked at 44.1k/256… based on our opinions, that seems like it would not be a best-practice value for playability.

Wouldn't 96k/128 be a better fit?

As long as everyone agrees that the plugin is capable of tracking at a 128 buffer, would this setting create less latency and thus more playability?

I have a $200 modern interface that is capable of running 96/128.

Could this be the best practice for how to set up the audio interface and the DAW to run the plugin?

What do you all think?

In general, higher sample rates are the best way to get better performance, but whether sample rates above 44.1 will produce better results inside MG is yet to be seen. It could also depend on whether the signal is already upsampled inside MG. For instance, if a 44.1 kHz signal is already being upsampled by a factor of 2 or 4, then there is a chance that higher sample rates will not be a significant improvement (other than the less-than-one-ms gained in input latency from e.g. going from 48 to 96 kHz).

When I tested different buffer sizes in MG2 (and I suspect this will be the same for MG3) there were no real benefits from using a smaller buffer size.

I asked the creator about this and some other things; he did not answer the questions regarding a possible increase in performance at higher sample rates or whether MG internally upsamples the signal. I guess we will have to wait and see.

The buffer size is the main parameter for the audio interface latency. The higher sample rate only contributes because more samples per unit of time (96k instead of 44.1k per second) means that one buffer holds a shorter amount of time.
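A quick calculation makes that relationship concrete (the buffer sizes below are just the ones discussed in this thread):

```python
# How much audio time one interface buffer holds at different settings.
for sample_rate, buffer_size in [(44100, 256), (96000, 256), (96000, 128)]:
    ms = 1000.0 * buffer_size / sample_rate
    print(f"{buffer_size} samples @ {sample_rate / 1000:.1f} kHz = {ms:.1f} ms per buffer")

# 256 @ 44.1 kHz ~ 5.8 ms, 256 @ 96 kHz ~ 2.7 ms, 128 @ 96 kHz ~ 1.3 ms
```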

For the recognition I doubt that the higher sample rate will do any good as this type of network is typically trained for one sampling rate and internally up- or downsamples to 44.1k anyway.

If you were talking about the hardware input buffer size all along (which we have no control over in MG), we are in agreement. But then I do not understand your statement:

With all this talk about latency and buffer sizes I thought I'd remind y'all that the most important thing for the plugin itself is "how much time is in the buffer" - because that's what it has to work with to determine what note is played and how.

MG does not interact with the hardware buffer size, so which buffer size are you referring to in the above statement? And what do you mean by “how much time is in the buffer”? MG receives a continuous audio signal from the interface. How the signal was buffered by the interface is irrelevant to the operations inside MG and other audio apps but relevant to us humans as the (small) latency caused by the interface is added to any latency produced by MG.

Sure you can set the interface buffer size in MG3 - if you use it in standalone mode.

And yes, the buffer size is relevant (for the plugin as well), because pretty much all DAWs except Reason and Bitwig use the audio interface buffer size internally to send chunks of audio to the plugins. So no matter whether you use the plugin or the standalone version, MG3 will get the audio data in exactly that chunk size and will deal with it internally. This also - again - means that it defines the minimum latency for the plugin. Simple as that.

Now, it seems to me (and this is speculation) that the (VST/AU) plugin internally always uses a minimum buffer size of 256. So even when you have your DAW set to 64 samples, you will only hear output from MG3 after 4 calls of the plugin's process() method. The fact that the standalone version doesn't even allow for a buffer size other than 256 hints at that.
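If that speculation is right, the behaviour would look roughly like the internal accumulation below - a minimal sketch, not MG3's actual code, with the 256-sample analysis block purely assumed:

```python
import numpy as np

class AccumulatingAnalyzer:
    """Sketch of a plugin collecting host blocks into a fixed-size internal
    analysis buffer (256 samples assumed) before producing any output."""

    def __init__(self, internal_size=256):
        self.internal_size = internal_size
        self.fifo = np.zeros(0)

    def process(self, host_block):
        # Append the host's (possibly smaller) block to the internal FIFO.
        self.fifo = np.concatenate([self.fifo, host_block])
        results = []
        # Analysis can only run once a full internal block has accumulated.
        while len(self.fifo) >= self.internal_size:
            block = self.fifo[:self.internal_size]
            self.fifo = self.fifo[self.internal_size:]
            results.append(block.mean())  # placeholder for the real analysis
        return results

analyzer = AccumulatingAnalyzer()
for call in range(1, 5):                  # host running at 64 samples per block
    out = analyzer.process(np.zeros(64))
    print(f"process() call {call}: analysis ran {len(out)} time(s)")
# Only the 4th call yields a result - matching the "4 calls" guess above.
```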

And now back to my initial post - IMO, the fact that we currently have a fixed buffer size implies that we can expect slightly better latency when we get smaller buffer sizes, but we shouldn't expect that latency gain to be linear, because at a certain buffer size there is just not enough audio time to make any prediction from. Agreed? :slightly_smiling_face:

Well, I hope you are right but I am not sure. It is impossible to know how different MG3 is from MG2 in this regard. When I compared all three buffer sizes in MG2 the tracking was virtually the same (within about 1 ms) and sometimes (maybe on average 1 out of 4 notes) 256 was faster than 64, but again just by one ms or less.

Now, it seems to me (and this is speculation) that the (VST/AU) plugin internally always uses a minimum buffer size of 256. So even when you have your DAW set to 64 samples, you will only hear output from MG3 after 4 calls of the plugin's process() method. The fact that the standalone version doesn't even allow for a buffer size other than 256 hints at that.

That might be true; I don't really have any experience writing VST or AU plugins. What do you base this on? Still, if we could actually get to 256 samples of latency that would be a game changer!

Are you sure about that? If that is true, then changing the buffer size in MG would change the latency of monitoring your signal through a 3rd-party software. When I run MG2 in tandem with SuperCollider, I monitor my signal through SC and I haven't noticed any difference when switching buffer sizes in MG2. It could be that the difference between 1.3 ms and 5.3 ms is so small that I don't notice it. Oddly, the MG interface states 6 ms for a 256-sample buffer size at 48 kHz instead of rounding down to 5. Or could it be that it depends on the interface, and that the Apollo Solo, which I am using, does not allow a client to change buffer sizes?

macOS's driver model allows for multi-client operation: when one application requests a 64-sample buffer size and another requests 256, the interface will run at 64, but the second app (i.e. MG3) will receive 256-sample buffer chunks anyway, i.e. 4 hardware buffers in one.

The times MG3 shows are the time equivalent of one buffer. The actual in-to-out latency is at least twice that (one round to get the buffer to analyze and one round to send one buffer of sound out) - plus any hardware latency. Watch my video, it makes this a lot clearer.
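Put into numbers, it looks roughly like this (the buffer size and the hardware latency figure are assumptions for illustration; real hardware latency varies per interface):

```python
# Minimum in-to-out estimate: one buffer in, one buffer out, plus hardware latency.
sample_rate = 44100
buffer_size = 256
hardware_ms = 1.5   # placeholder for converter/driver latency (assumption)

buffer_ms = 1000.0 * buffer_size / sample_rate
round_trip_ms = 2 * buffer_ms + hardware_ms
print(f"Displayed per-buffer time: {buffer_ms:.1f} ms")
print(f"Estimated minimum in-to-out: {round_trip_ms:.1f} ms")
```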
