Optimization Notes
Notes on ProtoFlux optimization. Most of these only become relevant in extreme cases (such as writing parsers or extremely large-scale projects), but they're nonetheless good to write down.
Some of these notes (most notably Bit Manipulation and Strings As Lookup Tables) most likely only matter because of how old the .NET version Resonite currently runs on is, that being .NET Framework 4.6.2. Such an old runtime doesn't optimize nearly as well as modern ones, making ancient techniques that today's compilers usually handle for you relevant again. They might become much less relevant once the .NET 8 update happens, but we can only wait and see.
Dynamic Impulses
Dynamic impulses on slots containing a lot of children are slow, at least when a lot of them fire in one update. I think it's just the cost of searching for the intended receivers, though I'm not sure. In any case, if you only need one receiver for an impulse, you can directly reference the receiver node's slot as the slot to trigger the dynamic impulse on. This gave a considerable speedup in my tests on slots with ~400 children and thousands of dynamic impulses per update.
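As a rough illustration, here's a hypothetical Java analogy (the names and structure are mine, not Resonite's actual implementation) of why broadcasting to a slot scales with its subtree while a direct receiver reference doesn't:

import java.util.ArrayList;
import java.util.List;

class Slot {
    final List<Slot> children = new ArrayList<>();
    String receiverTag; // non-null if a receiver node lives on this slot

    // Broadcast: walks the whole subtree looking for matching receivers,
    // so the search cost is paid on every single impulse.
    void trigger(String tag) {
        if (tag.equals(receiverTag)) onImpulse();
        for (Slot child : children) child.trigger(tag);
    }

    // Direct reference to the receiver's slot: no search at all.
    void triggerDirect() { onImpulse(); }

    void onImpulse() { /* receiver logic */ }
}

With ~400 children and thousands of impulses per update, skipping that search adds up fast.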
Impulse Multiplexers Are Fast
Impulse multiplexers are O(1), since they just select the output impulse based on the index. It's how you obtain the index that can be the real bottleneck.
One interesting thing about impulse multiplexers is that any index outside the bounds of the multiplexer causes the impulse to drop, which is useful for parsers. You can combine this with the Strings As Lookup Tables technique below to filter out specific characters (such as whitespace): map the characters you want to skip to 1 and everything else to 0, then keep looping on output 1 until you reach a character that maps to 0.
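In plain code terms, the multiplexer pattern looks something like this (a hypothetical Java stand-in, not actual ProtoFlux internals):

// The index selects the output in O(1); an out-of-range index
// simply drops the impulse, just like the multiplexer node does.
static void multiplex(int index, Runnable[] outputs) {
    if (index < 0 || index >= outputs.length) return; // impulse dropped
    outputs[index].run();
}

In the whitespace-filtering setup above, output 1 would skip the character and loop back, while output 0 would continue with normal parsing.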
Reducing Branches
Branches are the main obstacle to avoid when trying to optimize code. Taking a branch means either waiting for the pipeline to resolve the condition, or predicting the branch and risking a misprediction. Compilers nowadays are really good at compiling what one would assume to be branches into faster branchless code, but due to the old Mono runtime Resonite uses, ProtoFlux's architecture, or both, branches aren't optimized out that well, which can lead to really slow code. Here are a few ways to reduce the number of branches in ProtoFlux.
Bit Manipulation
Bit manipulation is achieving a desired result by leveraging the individual bits of a number, usually through a bunch of shifts and bitwise operators like OR, AND, and XOR.
For a large list of bit manipulations and other optimization goodies along those lines that you can use in ProtoFlux, take a look at Bit Twiddling Hacks.
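As a small taste, here's one classic in the spirit of that page, sketched in Java (the method name is mine): a branchless select that picks between two values without an if:

// cond must be 0 or 1. Negating turns 1 into an all-ones mask
// (0xFFFFFFFF) and leaves 0 as an all-zeros mask, so the bitwise
// blend picks a or b with no branch at all.
static int select(int cond, int a, int b) {
    int mask = -cond;
    return (a & mask) | (b & ~mask);
}

select(1, a, b) returns a and select(0, a, b) returns b, which is exactly the kind of thing you'd otherwise reach for an if (and a branch) to do.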
Strings As Lookup Tables
This one is a pretty neat trick that allowed me to reduce my parsing of a 50k-character JSON string from ~260ms (using a two-input Index Of First Match) to ~160ms. If you're making a parser and can be sure that the "relevant" characters (e.g. for escaping or determining a type) lie in pure ASCII, it doesn't take much memory at all to make a lookup table. Here's how:
For your string, signify any "irrelevant" characters as the null character (0x00) and any "relevant" characters as something other than the null character. What the relevant values are depends on your use case, but they can go up to 65535, given that a string is just an array of chars. For my JSON string parser, I made the relevant values decimal 1 and 2 for backslash and quote; these values directly change the parser state and plug into an impulse multiplexer that determines what to do next. You can either make this string in-game, which involves either manual character entry (pain) or a ton of string multiply -> concat string char (also pain), or you can create it externally with a hex editor and import it as a .txt file. One thing to be careful about when using a hex editor is to not use any bytes above 0x7F, since importing a file with bytes above 0x7F will corrupt the data. Or, if you're smart, you can make the file valid UTF-8 from the get-go so it works... but that's also pain. I've personally never needed any values above 0x7F myself.
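For reference, here's the same idea sketched in Java (the names are mine; in-game this is just a string plus an indexing node):

// Build a 128-entry ASCII lookup table: '\\' maps to 1, '"' maps
// to 2, and everything else stays the null character (0, irrelevant).
static String buildTable() {
    char[] table = new char[128];
    table['\\'] = 1;
    table['"'] = 2;
    return new String(table);
}

// Classify a character in O(1) by indexing the table with its code.
static int classify(String table, char c) {
    return c < table.length() ? table.charAt(c) : 0;
}

The classification result then feeds straight into the impulse multiplexer, so there's no chain of per-character comparisons.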
Reducing Nodes
It's not just branches that can be reduced. Reducing nodes in general can provide a non-insignificant speedup. I highly recommend testing whether reducing nodes actually matters for your use case before committing to it, but here are a few examples of what I have experienced.
Index Of First Match<char> VS Index Of String
An Index Of First Match<char> node and an Index Of String node should theoretically be roughly the same thing with the same speed: both have to iterate over the characters of the input string, comparing each one to the desired character. However, from my testing, Index Of String scales *much* better than Index Of First Match. This was mainly relevant for the "replacing characters with other characters" part of my JSON string parsing, but the results were quite surprising, with Index Of String being considerably faster on large strings with ~8 needed replacements. I don't have concrete numbers, but I will probably add them later.
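For intuition, both nodes conceptually reduce to the same linear scan, sketched here in hypothetical Java. Why one scales so much better in practice is unclear; a plausible guess is that the string search maps to a more optimized built-in routine:

// What Index Of First Match<char> conceptually does: compare every
// character against the target, one at a time.
static int indexOfFirstMatch(String s, char target) {
    for (int i = 0; i < s.length(); i++)
        if (s.charAt(i) == target) return i;
    return -1;
}

// Index Of String with a one-character needle is the same search on
// paper, but delegates to the runtime's tuned implementation.
static int indexOfString(String s, String needle) {
    return s.indexOf(needle);
}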
Reducing String Writes
Strings are more expensive than you might think. There's no real concept of slices in ProtoFlux (or the C# strings underneath it), so any string operation creates a completely new string rather than modifying the old one in place. Consider the following pseudocode:
String input        // user-provided input
String output = ""  // starts empty

// define a state enum where the values are Normal and Escape
enum State { Normal, Escape }
State state = Normal

for (i from 0 to input.length - 1):
    switch (state):  // no fallthrough
        case Normal:
            if (input[i] == '\\'):  // a backslash starts an escape sequence
                state = Escape
            else:
                output = output + input[i]
        case Escape:
            // we want \n to represent a newline
            if (input[i] == 'n'):
                output = output + newline
            else:
                output = output + input[i]
            state = Normal  // go back to the normal state
Intuitively, one would think that output is a fixed memory location (reallocated when needed), and "adding" to it simply appends another character to the end of that same thing. Translating this code to ProtoFlux, that assumption breaks: each write is a completely new string at a new memory location, generated from the concatenation. This gets really slow as strings get longer and longer.
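Here's the same write pattern as runnable Java, just to make the cost visible (a deliberately naive sketch, not anything from the actual parser):

// Each + allocates a brand-new string and copies all of output into
// it, so the i-th iteration copies about i characters.
static String copyOneCharAtATime(String input) {
    String output = "";
    for (int i = 0; i < input.length(); i++) {
        output = output + input.charAt(i);
    }
    // Total characters copied: 1 + 2 + ... + n = n(n+1)/2, i.e. O(n^2).
    return output;
}

For a 50k-character string that's over a billion character copies, which is where the seconds go.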
The solution is to buffer these writes. If most of the string is expected to be unchanged, then writes can be significantly reduced by only writing when you really need to. Here's how you'd rewrite the code to buffer these writes:
String input
String output = ""

enum State { NormalInit, Normal, Escape }
State state = NormalInit
int buffer_start

for (i from 0 to input.length - 1):
    switch (state):  // no fallthrough except where marked
        case NormalInit:
            buffer_start = i  // the write buffer starts at the current index
            state = Normal
            // FALLTHROUGH TO NEXT CASE
        case Normal:
            if (input[i] == '\\'):
                // perform the actual write: everything from buffer_start
                // up to (but not including) i matched the input exactly
                output = output + input.substring(buffer_start, i - buffer_start)
                state = Escape
        case Escape:
            if (input[i] == 'n'):
                output = output + newline
            else:
                output = output + input[i]
            state = NormalInit  // makes the next iteration reset buffer_start

// append the remaining unchanged tail to the output; if the string ended
// during or right after an escape, the buffer was already flushed and
// there is no tail to append
if (state == Normal):
    output = output + input.substring(buffer_start)
This makes the number of writes proportional to the number of escape sequences rather than the number of characters, so when most of the string is "normal" and we only escape when we need a newline, the total work is roughly linear in the length of the string and the result is a significant speedup. For the JSON string parser I mentioned earlier, this took handling a 50k-character string from a full second or two down to ~260ms (this was before the string lookup table mentioned above).