First, why read the post(s)?
I have spent the last 6 years researching and writing a dynamic application whose structure is not known at compile time. The current implementation is capable of executing on a significant number of cores, limited more by memory transfer rates than by raw CPU clock speed. I can't go into the exact details of the application, but I can discuss the lessons I have learnt along the way.
So, multithreading is easy. Just ask any functional programming guru. Haskell supports as many cores as you can throw at it. There are even runtimes that will distribute your application over clusters of computers. For those that are wondering, Haskell is Turing complete. If Factorio were written in Haskell then it would be using all the cores on everyone's machines.

Unfortunately, functional programming is terrible, just ask any procedural programmer. (Note that procedural programming is the binary opposite of functional programming, not a "step-down" from or alternative to object-orientated programming.) Now, don't get me wrong, procedural programming is just as bad as functional programming. They both have their strengths and they both have their weaknesses. Inserting an item into an array is trivial in a procedural language but extremely expensive in a functional one. Multithreading comes naturally to a functional language but is extremely difficult in a procedural one. We have two extremely different, incompatible techniques at our disposal that solve different issues, yet we are unable to use them together. So what do we do? We suffer.
Reactive programming to the rescue!
Yes, that's right. Reactive programming, as embodied in RX.net and the LINQ programming techniques put out by Microsoft over the last five or so years. With its roots in functional reactive programming, introduced in a 1997 paper by Conal Elliott and Paul Hudak, reactive programming is supposed to merge the functional world with the procedural world. Unfortunately it doesn't. It's crap. Don't touch it with a 10-foot pole. You may think that this is just my opinion, and you can think that, but you will eventually realise that it does not work when you try to use it. Reactive programming leads you to believe that you get all the benefits of functional programming: multithreading in a thread-safe manner while keeping the mutability of your data. Unfortunately that's not the case. Issues with event propagation order and mutability of data start cropping up in all but the most trivial examples. Why? Because you cannot have thread-safe processes WITH mutable data.
Oops... what am I saying? Of course you can. That's what semaphores are for, right? Well, not really. Current techniques for controlling access to memory boil down to either ensuring that only one thread can write to that memory at a time while no others read from it, or ensuring that running a process twice gives the same result. If only one thread at a time is accessing your data, then it's not really multi-threaded. Blocking operations, while not an issue on a single-core machine, stop scaling once you put them on multi-core machines. There are non-blocking algorithms for many tasks, but you usually find them hidden away somewhere inside these higher-level memory access control structures, or they are specific to one use-case.
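To make that concrete, here's a minimal sketch of the two techniques in C++ (the chest counter is invented purely for illustration): a blocking, mutex-guarded update, and a non-blocking compare-and-swap loop that only works for this one narrow use-case.

```cpp
#include <atomic>
#include <mutex>

// Invented example data: a shared item count for a chest.

// Technique 1: mutual exclusion. Only one thread may touch the data at a
// time; every other thread (readers included) has to wait.
std::mutex chest_lock;
int chest_items = 0;

void add_items_blocking(int n) {
    std::lock_guard<std::mutex> guard(chest_lock); // blocks everyone else
    chest_items += n;
}

// Technique 2: a non-blocking compare-and-swap loop. No thread ever waits
// on a lock, but the algorithm is specific to this one use-case
// (updating a single counter).
std::atomic<int> chest_items_atomic{0};

void add_items_nonblocking(int n) {
    int expected = chest_items_atomic.load();
    // If another thread changed the value since we read it, 'expected' is
    // refreshed with the current value and we simply try again.
    while (!chest_items_atomic.compare_exchange_weak(expected, expected + n)) {
    }
}
```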
So what do we do? Do I have some new technique that no one knows about? Hell no, it's been around for decades. If you want to program asynchronously, all you have to do is ask the experts: CPU designers. Every component on the chip executes its function independently of the others. What do you think the pipeline is? So how do they keep this all in check? How do they stop their mutable state from destroying the concurrent execution of the various parts of the CPU? They use a clock. The clock synchronises everything executing within the CPU so that information can be moved from one area of the chip to another. At no other time does information move across these "clock boundaries". In this way the chip can hold state in memory between executions, yet still allow every component to execute concurrently.
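If you want the hardware picture in software terms, a single clocked register boils down to something like this toy sketch (purely illustrative, not real hardware modelling):

```cpp
// A software caricature of one clocked hardware register. Combinational
// logic may overwrite 'd' as often as it likes during a cycle; the visible
// output 'q' only ever changes at the clock edge.
struct ClockedRegister {
    int d = 0; // input line: written freely during the cycle
    int q = 0; // output line: read freely, stable until the next edge

    // The ONLY point at which state crosses the clock boundary.
    void clock_edge() { q = d; }
};
```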
Programming with a clock:
Yes, you need a clock. However, a clock here is not a literal clock or time-piece. A clock is simply a point at which a piece of running code says "I am done, here is my result", with the requirement that everything running says it at the same time. Clocks give separate, asynchronous pieces of code a point at which to synchronise: it's where each thread publishes the results of its previous task and picks up the input it needs for its next task. Clocks are the key to high-level asynchronous programming.
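As a minimal sketch of what a software clock can look like (this one assumes C++20's std::barrier; an older codebase would roll its own with a condition variable), the barrier below is the clock: each worker computes its result privately, then waits for everyone before the next cycle starts.

```cpp
#include <barrier>
#include <thread>
#include <vector>

int main() {
    constexpr int worker_count = 4;
    constexpr int tick_count = 100;

    // The barrier IS the clock: a phase completes only when every worker
    // has arrived, and only then does the next cycle begin.
    std::barrier<> clock_edge(worker_count);

    std::vector<std::thread> workers;
    for (int id = 0; id < worker_count; ++id) {
        workers.emplace_back([&] {
            for (int tick = 0; tick < tick_count; ++tick) {
                // ... do this worker's share of the update, reading only
                //     last tick's published state, writing private results ...
                clock_edge.arrive_and_wait(); // "I am done, here is my result"
            }
        });
    }
    for (auto& t : workers) t.join();
}
```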
How does this apply to Factorio, and to games in general? Well, they have a natural clock: the frame update cycle (or, in Factorio's case, the game update cycle). This is a natural place to put the clock. At the end of each update, every process can say "I have finished with my input, and here is my output". If we look at what happens in a chip, we would see the output bits being copied across the clock boundary onto the output lines of each component. Since those output lines are wired to the input lines of other components, the values land in the input memory of every dependent component. We don't want to copy all of our memory every tick, so instead we can use a tick-tock memory structure: rather than copying, we redefine which half of memory is currently the "output lines", the same way double-buffered video memory flips buffers.
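Here is a minimal sketch of that tick-tock structure (Belt and GameState are invented names, nothing to do with Factorio's actual internals):

```cpp
#include <cstddef>
#include <vector>

// Invented component state, purely for illustration.
struct Belt {
    int items = 0;
};

// Tick-tock (double-buffered) storage. 'front' is last tick's published
// state and is read-only for everyone during the tick; the other buffer is
// this tick's work-in-progress. Flipping the index at the clock boundary
// replaces copying all the memory, exactly like flipping double-buffered
// video memory.
struct GameState {
    std::vector<Belt> buffers[2];
    std::size_t front = 0;

    const std::vector<Belt>& read() const { return buffers[front]; }
    std::vector<Belt>&       write()      { return buffers[front ^ 1]; }

    // Called exactly once per tick, after ALL threads have reached the
    // clock boundary.
    void flip() { front ^= 1; }
};
```

During a tick every thread reads through read() and writes only its own entries in write(); flip() is called once, at the clock boundary, and nothing else ever crosses it.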
So what is left?
To define an update algorithm that takes the current "output lines" memory and generates the next state of the game, which then becomes the next cycle's "output lines".
That is it. Using a clock you can develop applications that can use thousands or millions of cores, in the same way a CPU has millions of components. Once these components are defined, you put them in a list, divide the list up over your threads and iterate. It may seem too good to be true, and for the most part it is. It's about as good as object-orientated programming: sure, it took you some time to get your head around the ideas behind it, and sure, after years of it being the industry standard you are beginning to realise that it's not a one-stop solution for all your issues, but by hell does it solve a specific subset of the issues you are facing. Clock-based programming is exactly the same. It does not remove the need for semaphores and locks; they are there to solve an entirely different issue. It only helps with getting your program to run over many threads, at a reasonably high level in your code.
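To make "put them in a list, divide the list up over your threads and iterate" concrete, here is one possible shape of that loop. It is only a sketch: Component and its placeholder update rule are my own inventions, not Factorio's code or the only way to slice the work.

```cpp
#include <barrier>
#include <cstddef>
#include <thread>
#include <vector>

// Invented stand-in for a game component. During a tick it may read any
// component's published (read) half, but it writes ONLY its own write half,
// so no locks are needed while the tick runs.
struct Component {
    int state[2] = {0, 0};

    void update(std::size_t read, std::size_t write,
                const std::vector<Component>& world) {
        (void)world;                    // a real rule would read neighbours
        state[write] = state[read] + 1; // placeholder update rule
    }
};

int main() {
    constexpr int thread_count = 4;
    constexpr int tick_count = 60;

    std::vector<Component> world(10000);
    std::size_t read_index = 0; // which half of every component is published

    // The completion function runs exactly once per phase, after every
    // worker has arrived and before any of them continues: this is the
    // clock edge, where the buffers flip.
    std::barrier clock_edge(thread_count, [&]() noexcept { read_index ^= 1; });

    std::vector<std::thread> workers;
    for (int id = 0; id < thread_count; ++id) {
        workers.emplace_back([&, id] {
            for (int tick = 0; tick < tick_count; ++tick) {
                // Each thread owns a fixed slice of the component list.
                for (std::size_t i = id; i < world.size(); i += thread_count)
                    world[i].update(read_index, read_index ^ 1, world);
                clock_edge.arrive_and_wait(); // wait for everyone, then flip
            }
        });
    }
    for (auto& t : workers) t.join();
}
```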
The following section seems to be a major sticking point in the rest of the thread.
For those that don't know what "example" means: the dictionary defines it as "a thing characteristic of its kind or illustrating a general rule". As stated in the first line of the following, it is an example, not a statement of how it should be, will be, or must be implemented, nor of the best or only viable way to implement it.
The chest is given the previous states of the two inserter arms. It checks whether the input arms are ready to drop an item. If they are, the chest calculates how much each inserter arm can drop and where the items will go. If there is a tie, some predefined, deterministic tiebreaker is used to pick which inserter arm drops the item. The chest then writes its new contents to its "back buffer", leaving its previous state, the state that all the other processes can see, unchanged. In a separate execution, possibly on another thread, the inserter arm is updated. It is given the previous state of the chest and the previous states of all inserting inserters. It performs the same calculation to determine whether it is able to drop any items in the chest. If it is, it calculates its new state, including the amount it is still holding (it may not have dropped everything), and writes it to its "back buffer".
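A rough sketch of that update might look like the following. Every type, field and the lowest-id tiebreaker are invented for illustration; the only point is that both sides read last tick's state, run the same deterministic calculation, and write only their own back buffers.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Invented, heavily simplified entities for illustration only.
struct InserterState {
    bool ready_to_drop = false;    // arm is over the chest holding items
    int  held_items    = 0;
    std::size_t target_chest = 0;
    std::size_t id           = 0;  // used as the deterministic tiebreaker
};

struct ChestState {
    int items    = 0;
    int capacity = 48;
};

// Chest update: look at last tick's inserters, decide (deterministically)
// how many items it will accept this tick, return the new contents, which
// the caller writes to the chest's back buffer.
ChestState update_chest(std::size_t chest_id,
                        const ChestState& prev_chest,
                        const std::vector<InserterState>& prev_inserters) {
    ChestState next = prev_chest;
    int space = prev_chest.capacity - prev_chest.items;

    // Deterministic order: lowest inserter id wins ties.
    std::vector<const InserterState*> droppers;
    for (const auto& arm : prev_inserters)
        if (arm.ready_to_drop && arm.target_chest == chest_id)
            droppers.push_back(&arm);
    std::sort(droppers.begin(), droppers.end(),
              [](const InserterState* a, const InserterState* b) { return a->id < b->id; });

    for (const auto* arm : droppers) {
        int dropped = std::min(arm->held_items, space);
        next.items += dropped;
        space      -= dropped;
    }
    return next;
}

// Inserter update: run the SAME deterministic calculation from its own point
// of view, so it agrees with the chest about how much it dropped without any
// communication during the tick.
InserterState update_inserter(const InserterState& prev_arm,
                              const ChestState& prev_chest,
                              const std::vector<InserterState>& prev_inserters) {
    InserterState next = prev_arm;
    if (!prev_arm.ready_to_drop) return next;

    int space = prev_chest.capacity - prev_chest.items;
    // Account for every ready inserter with a lower id (the same tiebreaker).
    for (const auto& other : prev_inserters)
        if (other.ready_to_drop && other.target_chest == prev_arm.target_chest &&
            other.id < prev_arm.id)
            space -= std::min(other.held_items, space);

    int dropped = std::min(prev_arm.held_items, space);
    next.held_items -= dropped; // it may not have dropped them all
    if (next.held_items == 0)
        next.ready_to_drop = false; // a real game would also swing the arm back
    return next;
}
```

Note that both functions repeat the same tiebreaker calculation instead of communicating during the tick; that redundancy is exactly what buys the freedom to run them on different threads.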
This is an extremely basic example of how you would code a clock-based Factorio. It's not optimised at all and does not take into account low-level synchronisation tricks that could be used to increase performance. For example, it may be possible to merge the inserter component and the chest component into a more complex component that updates both the inserter arm and the chest at the same time. It's also possible to use low-level cross-thread synchronisation to attach and detach the inserter from the amalgamated component as the arm swings back and forth between the chest and its pickup location, or to use a CAS on some part of memory to cache the drop count for the other component's execution. It may also be possible to eliminate the double-buffering of a component's state if it can be guaranteed that that information is not required in another update process, and so on.
But anyway, clocks (and clock boundaries) are an extremely powerful technique that I have never seen applied in software. If any programmers out there spend some time thinking about it, they should be able to see how it would allow an application to compartmentalise its functionality so it can be executed across as many threads as needed. If you think that it can't be done or that software is just "too complex", then I implore you to learn VHDL and see what chip designers have been doing since the 1980s.
There is one issue with this style of programming though: the industry does not support it. By this I mean that CPU designers are focused more on micro-parallelisation: out-of-order instruction processing, vector extension operations, branch prediction and all the other nifty things that make your sequential program run faster. CPUs with large numbers of cores have been "coming" for over 15 years now, but we still really only see four. Not until mainstream games require 16+ cores will you see Intel making them standard. They follow where software developers lead.
If you have read all that then congratulations.