BlueTrin wrote: Mon Sep 02, 2024 7:21 am
@BenSeldel I only read the initial post. Did you post a link to what you work on? It looks interesting outside of Factorio applications.
Unfortunately I didn't, and now I can't as the project's been shut down.
If you're interested, it was an experimental ERP system with a laundry list of features. The main one was that all of the system's functionality was defined in a database and was editable while the system was running. Because those definitions could be changed at any time, the entire application structure had to be fully dynamic so that all behaviour and state stayed consistent. This included any outstanding transactions, as we were guaranteeing sequential consistency (there is an ordering of atomic transactions that will produce the observed state). E.g., if someone created a new field and marked it as required, all running applications had to put that field on the screen and enforce that the value was there (as well as it being enforced by the database server).
This had to happen on both the client and the server, as we never wanted anything to be restarted for minor logic changes.
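To make that concrete, here is a minimal sketch in Java of what metadata-driven validation looks like when the field definitions can change under a running system. The names (FieldDef, DynamicValidator) are entirely hypothetical; the real system's types were far richer.

[code]
import java.util.List;
import java.util.Map;

// Hypothetical sketch: field definitions live in the database and can be
// redefined at any time; every save validates against the *current* set.
record FieldDef(String name, boolean required) {}

class DynamicValidator {
    private volatile List<FieldDef> fields; // swapped atomically on redefinition

    DynamicValidator(List<FieldDef> fields) { this.fields = fields; }

    // Called when someone edits the schema at runtime - no restart needed.
    void redefine(List<FieldDef> newFields) { this.fields = newFields; }

    // A newly added required field is enforced on the very next validation.
    List<String> validate(Map<String, Object> values) {
        return fields.stream()
                .filter(f -> f.required() && values.get(f.name()) == null)
                .map(f -> "Missing required field: " + f.name())
                .toList();
    }
}
[/code]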
The system had a couple of evolutions in its internal data-propagation system: from callbacks, to observables, to a monadic implementation of the observables (map & bind), and finally to "Signals" - an implementation of monadic observables that guaranteed glitchless propagation of updates in O(N) time, where N is the number of observables that needed to be updated. At the time it was a big breakthrough, as there was only one glitchless implementation we were aware of, and it had O(U) update time (the Elm programming language). Since then there have been a couple of other glitchless implementations, but this was 15+ years ago now and I've not stayed up to date.
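For flavour, a toy version of the monadic-observables stage could look like the sketch below (Java; illustrative names only, not the original API). The caveat at the bottom is the reason the next evolution existed.

[code]
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class Observable<T> {
    private T value;
    private final List<Consumer<T>> subscribers = new ArrayList<>();

    Observable(T initial) { value = initial; }

    T get() { return value; }

    void set(T v) {
        value = v;
        for (Consumer<T> s : subscribers) s.accept(v);
    }

    void subscribe(Consumer<T> s) { subscribers.add(s); }

    // map: derive a new value; the dependency graph's shape stays fixed.
    <R> Observable<R> map(Function<T, R> f) {
        Observable<R> out = new Observable<>(f.apply(value));
        subscribe(v -> out.set(f.apply(v)));
        return out;
    }

    // bind: the current value chooses which inner observable to follow, so
    // the graph's shape can change at runtime - which is what a fully
    // dynamic application structure needs.
    <R> Observable<R> bind(Function<T, Observable<R>> f) {
        Observable<R> out = new Observable<>(f.apply(value).get());
        f.apply(value).subscribe(out::set);
        subscribe(v -> {
            Observable<R> inner = f.apply(v);
            inner.subscribe(out::set); // toy version: never unsubscribes the old inner
            out.set(inner.get());
        });
        return out;
    }
}
// Caveat: naive push propagation like this "glitches" on diamond-shaped
// graphs - a node fed by two paths from one source fires twice, once with
// mixed old/new inputs. Eliminating that is exactly what Signals did.
[/code]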
The data-propagation system was called "Signals" because it was based on the ideas in the VHDL programming language and the wires (interconnects) inside a chip. The rules were simple: each wire carries a signal, and it must carry a signal at all times (even if it's random, such as at startup). There is a system in place to prevent invalid or incomplete signals from propagating to other components before they become stable/valid, and a component is only executed once all of its inputs are stable (stable in this context means fully executed).
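One way to satisfy those rules within the stated O(N) bound is a two-phase pass: first walk downstream from the changed source counting how many dirty inputs each component has, then fire a component only when that count drains to zero. This is my reconstruction of the general technique, not the original engine's code, and it assumes an acyclic dependency graph.

[code]
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

abstract class Node {
    final List<Node> outputs = new ArrayList<>();
    int pendingDirtyInputs = 0;

    abstract void recompute(); // read stable inputs, produce this node's signal

    static void propagate(Node changedSource) {
        // Phase 1: mark everything downstream, counting dirty in-edges.
        ArrayDeque<Node> walk = new ArrayDeque<>(List.of(changedSource));
        while (!walk.isEmpty()) {
            for (Node out : walk.poll().outputs) {
                if (out.pendingDirtyInputs++ == 0) walk.add(out); // first visit
            }
        }
        // Phase 2: fire a node only once all of its dirty inputs are stable.
        ArrayDeque<Node> ready = new ArrayDeque<>(List.of(changedSource));
        while (!ready.isEmpty()) {
            Node n = ready.poll();
            if (n != changedSource) n.recompute(); // runs exactly once: no glitches
            for (Node out : n.outputs) {
                if (--out.pendingDirtyInputs == 0) ready.add(out);
            }
        }
    }
}
[/code]

Each affected node is touched a constant number of times per in-edge, which is where the "only the observables that need updating" cost comes from.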
While the Signals propagation system sat as the main execution engine of the ERP, its major benefit wasn't its multi-threaded capabilities but its code understandability. It massively simplified the entire code base because it removed all the interconnection logic. For example, instead of creating a Textbox and then hooking up the observables, you would pass in a Signal<Integer> for its width, a Signal<Text> for its text content, etc., to a constructing function that would create the control, and the connection logic would be entirely self-contained within the Signals system.
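In code, the difference is that a call site only names the signals. Here is a sketch of such a constructing function, using Swing purely as a stand-in toolkit, a guessed-at Signal interface, and String in place of the Text type, since I don't have the original API:

[code]
import java.util.function.Consumer;
import javax.swing.JTextField;

interface Signal<T> {
    T now();                             // current stable value
    void onStable(Consumer<T> listener); // fires after each glitch-free update
}

final class Controls {
    static JTextField textBox(Signal<Integer> width, Signal<String> text) {
        JTextField box = new JTextField(text.now());
        box.setColumns(width.now());
        // All interconnection logic lives here, once - never at call sites.
        width.onStable(box::setColumns);
        text.onStable(box::setText);
        return box;
    }
}
[/code]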
Aside - I remember logging the maximum number of outstanding updates, even though we only ran one thread at the time, and it was about 10,000 for a simple form. So we (in theory) had 10,000 code blocks whose inputs were ready, waiting to be run. This was the test that showed me that we (as computer engineers) don't have any understanding of the level of concurrency that potentially exists in even the most simple of applications.
It wasn't until later that we tried rewriting the execution engine to use multi-threading. We did a bunch of stuff, such as having each thread write values sequentially to a thread-only buffer and, at the end of each clock, discarding the previous clock's buffer(s) that were being read from (we called these the "input slate" and the "output slate"). We also marked nodes as "deterministic" and simply ran them 2, 3, or more times, on each thread that needed them, instead of latching the value, as all their inputs were latched anyway. We also merged nodes together so their intermediate values were never latched, but stored in the running context and discarded when that context completed. This node-merging technique was part of my original post, where I said you could "move" the inserter from being attached to the box to being attached to the belt.
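The slate idea is easiest to see as double buffering: reads always come from the frozen previous clock, writes go to a separate buffer, and the swap happens at the clock edge, so threads never race on the values they read. A stripped-down sketch - the real version used per-thread write buffers, collapsed here into one array for brevity:

[code]
final class Slates {
    private long[] inputSlate;  // frozen: everything reads this during a clock
    private long[] outputSlate; // write-only during a clock (one writer per slot)

    Slates(int nodeCount) {
        inputSlate = new long[nodeCount];
        outputSlate = new long[nodeCount];
    }

    long read(int node)          { return inputSlate[node]; }
    void write(int node, long v) { outputSlate[node] = v; }

    // Called once, at the clock boundary, after all threads have finished.
    void endOfClock() {
        long[] dead = inputSlate;   // the previous clock's inputs are now dead
        inputSlate = outputSlate;   // this clock's outputs become next clock's inputs
        outputSlate = dead;         // reuse the dead buffer rather than discard it
    }
}
[/code]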
But most of that was just for fun (read: we didn't get any significant time to write any of these implementations, so there were always bugs). We did eventually replace the single-threaded execution system with a multi-threaded one, but that just used a basic work-stealing algorithm, as it was really simple to implement. Unfortunately, even that was hard to get in, as the single-threaded version was "good enough" for our requirements. Not to mention there was a lot of code that had been hand-optimised to run well on a single core that slowly had to be removed.
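For anyone wanting the same effect today without writing a scheduler: the JVM's ForkJoinPool is itself a work-stealing scheduler, so a basic version can lean on it. The task shape below is my guess at the minimum, not what we wrote; completion tracking and per-clock resets are omitted.

[code]
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

// Each node becomes a task. When it finishes, it decrements its dependents'
// pending-input counts and forks any that became ready. Forked tasks land on
// the current worker's deque and idle workers steal them - the pool provides
// the entire work-stealing algorithm.
final class NodeTask extends RecursiveAction {
    final Runnable recompute;
    final List<NodeTask> dependents;
    final AtomicInteger pendingInputs;

    NodeTask(Runnable recompute, List<NodeTask> dependents, int inputCount) {
        this.recompute = recompute;
        this.dependents = dependents;
        this.pendingInputs = new AtomicInteger(inputCount);
    }

    @Override
    protected void compute() {
        recompute.run();
        for (NodeTask d : dependents) {
            if (d.pendingInputs.decrementAndGet() == 0) {
                d.fork(); // now stealable by any idle worker
            }
        }
    }
}

// Kicking off a clock from a source node, e.g.:
// ForkJoinPool.commonPool().invoke(sourceTask);
[/code]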
In the end we could run some extremely large computations across many cores and the system would "just work". I did have a graph of the performance characteristics back in the day (I don't know if it's in this thread), but IIRC, after about 4-6 cores the system would be faster than running on one core, and performance continued to rise up to the maximum we could test (64 cores on a 4-socket NUMA server). *This doesn't sound right, as I don't recall each socket having 16 cores in it, but I'm 99% certain we tested it up to 64 cores.* This graph was the only reason the work-stealing multi-threaded version got in.
While the application was an ERP and not a game, the lessons learnt showed me that multi-threading is possible and that most of the rhetoric out there is absolutely wrong. Hence the almost-sarcastic listing of the different multi-threaded "techniques" in the original post.