Performance - Multiple Questions

Post your ideas and suggestions how to improve the game.

Moderator: ickputzdirwech

ravenbs
Long Handed Inserter
Long Handed Inserter
Posts: 61
Joined: Wed Jun 13, 2018 11:07 am
Contact:

Performance - Multiple Questions

Post by ravenbs »

Hi,

We have strong Gaming PC's.
But our Base is down to 40 UPS, and the game only uses 24% of the (4 Cores, 8 Threads) CPU.
Because Factorio do not scale with more CPU Cores, it is not even possible to buy a better CPU with 8, 12, 16, 32 Cores to speed it up.

Any tipps for me, what I can do to speed up our base?



So my Questions and requests are:

Multithreading:
I know converting a Game loop from Single to Multithreading include a massive change for locks for the main data structures.
This will even lead to overhead for those locks - even using user space spinnlocks because it is not to be expected to have much locks in lock state.
But this can be avoided.
The assumption is, that a giga base is spread across a huge area or multiple surfaces.
Futher assumption is, that every entity can only interact with other entitys which are close (100 Tiles?) to other ones. For example I think an assumption/Assertion can be made, that no inserter pick from more than distance x (coose a big one - 100 Tiles?).

So the Base can be splittet in sections, and all of them updated individually, even without the need for locks.
One of the sections is the "grid" between the tiles, which need to be updated single threaded afterwards, because there could be overlaps.
That also should not break deterministic multiplayer... Because it shouldn't update the order in which you update the tiles..


Example:
So Multithread tiles:
100<x<100.000 100<y<100.000
-100<x<-100.000 100<y<100.000
100<x<100.000 -100<y<-100.000
-100<x<-100.000 -100<y<-100.000
And then serial the remaining grid...
-100<x<100 -<y<+
-<x< -100<y<100+
Can be done for 8, 12, 16 Cores as well...


Or:
Transport lines take currently about 50% of the load.
Our complete Base is belt based and no robots.
Maybe it would be easier to give each connected belt one job for a thread pool?
Dont know your data structures, but I can imagine a connected belt will be some sort of merged to one update entity...

Or:
Entity updates.
I cant image the progress update of one factory affects another? Only if there are goods finished there is interaction to import/export the items.
So the updates without creating or consuming stuff can be fully parallel?


LuaJIT:
For huge Mod, this would be a really great improvement. Currently our mods take ~8ms.
Benchmarks for LuaJIT show 25 times faster as LUA interpreter (in perfect condition).

Multithread mod:
Allow Multiple threads for heavy load mods.
Maybe as an option in the mods info.json.
It "Use Multithreading" is set to true, just put a fassade with a guard in eacht interface call in between.
This can be generated automatically and do not need to be hand written by you.
I think your c++ bindings to LUA are generated anyway?

Benchmark for Factorio
I think currently there is no Database, which CPU/RAM/... will have how many UPS.
That would allow me to compare and buy a new CPU for Factorio.
Can a benchmark be done to get such a Database?
Think some sort of really massive Gigabase would be enouh, then post UPS including used Hardware as the CPU.
I would spent a big amount for a new CPU if this would help, but I dont think a 12, 16 ... core CPU is significant faster compared our current 4 Core one..


I think it would be a huge improvement to Factorio to allow even bigger bases.
And to scale (at least a bit) with CPU count, because then players like me can just buy a bigger CPU to continue the base instead of starting a new one.
Would be a shame to use only 2% of the performance of a 64 Core Threadripper 3990X CPU...

Thank you for your time and this great game!
ravenbs
Long Handed Inserter
Long Handed Inserter
Posts: 61
Joined: Wed Jun 13, 2018 11:07 am
Contact:

Re: Performance - Multiple Questions

Post by ravenbs »

I have an offer for you:

I am a very skilled c++ programmer – even if saying this is not really modest.
I have written 3 world wide sold (Windows + Linux) PC games (you can have the names as PN/Mail, dont want to advertise here) and worked for a PC game company for 10 Years.
I have written a lot of code for over 20 Million OEM Car navigation Systems out there.

So my offer to you is:
I can develop a Multithread patch for factorio without any risc for you.
If I not succeed or you do not like my patch, Programming, Result,… – yust drop it.
If you love what I do and merge it into the game, you pay me the time I have worked for it or an all-inclusive price we agreed on.

So:
No risc for you but a high chance to get Factorio multithreaded for a very cheap amount.
If you are interested in this application, please contact me and I give you all my detailes.

Im absolutely convinced this can be done, and I have the skill and time to do it.
If you are concerned about the savety of your code, we also can do some sort of a NDA, short-time employment or something like that.
User avatar
valneq
Smart Inserter
Smart Inserter
Posts: 1272
Joined: Fri Jul 12, 2019 7:43 am
Contact:

Re: Performance - Multiple Questions

Post by valneq »

Factorio is already heavily optimized.
ravenbs wrote: Tue Jun 16, 2020 8:02 am Would be a shame to use only 2% of the performance of a 64 Core Threadripper 3990X CPU...
You seem to assume that every computer program can be easily parallelized.
But the (im)possibility of multithreading in Factorio has already been discussed extensively.
See:
viewtopic.php?f=5&t=39893

Are you sure you want to repeat those discussions?
ravenbs
Long Handed Inserter
Long Handed Inserter
Posts: 61
Joined: Wed Jun 13, 2018 11:07 am
Contact:

Re: Performance - Multiple Questions

Post by ravenbs »

In fact there are in deed very vew algorithms that can not be parallelized. But those are absolutely reare and related to special mathematically equations.

I will read the topic you linked, but for Factory - which is object based - there is no mathematically impossibility that this can not be done.
And if you can do most of the stuff parallel (not all) you also have a huge performance gain.

PS: Im not talking about "easily".
I said: It can be done - i can not see how many effort it take without knowing the data structures.
ravenbs
Long Handed Inserter
Long Handed Inserter
Posts: 61
Joined: Wed Jun 13, 2018 11:07 am
Contact:

Re: Performance - Multiple Questions

Post by ravenbs »

Read a bit in this named Thread and also in this post:
https://www.factorio.com/blog/post/fff-151

Was a great plan explained in this FFF.
Yes, some parts are suitable for Multithreading, others are not.
Good plan go this way.


And the discussion thread is mostly about bashing who said what, and who is skilled by which level.
Sorry of not reading 18 sites of this ^^.
Without the knowledge of the data structures, there could be no tecnical discussion in this forum - so no discussion needed.

I made an concrete offer. You can take the chance without risc or leave it.
Thats what I intend - no discussion about skills, blaming, history,...
Bilka
Factorio Staff
Factorio Staff
Posts: 3671
Joined: Sat Aug 13, 2016 9:20 am
Contact:

Re: Performance - Multiple Questions

Post by Bilka »

Just for fun as a lunch break distraction:
ravenbs wrote: Tue Jun 16, 2020 8:02 am The assumption is, that a giga base is spread across a huge area or multiple surfaces.
Futher assumption is, that every entity can only interact with other entitys which are close (100 Tiles?) to other ones. For example I think an assumption/Assertion can be made, that no inserter pick from more than distance x (coose a big one - 100 Tiles?).
Regardless of whether this assumption is true for entities (it's not), it will never be true for mods. Mods can take any entity on any surface and use its data to affect any other entity on any other surface.
Maybe it would be easier to give each connected belt one job for a thread pool?
Dont know your data structures, but I can imagine a connected belt will be some sort of merged to one update entity...
Transport lines already merge up to 200 belt entities into one object to be updated. The transport lines are affected by other transport lines, so you cannot change the order in which they are updated -> you cannot just willynilly give them all to different threads. (https://factorio.com/blog/post/fff-176)
Entity updates.
I cant image the progress update of one factory affects another? Only if there are goods finished there is interaction to import/export the items.
So the updates without creating or consuming stuff can be fully parallel?
It does affect one another, e.g. power production affects all machines, belts affect inserters which affect assemblers etc. Also, mods are a thing here too, making anything affect anything else.
Note that energy source, heat buffer and fluid boxes are already updated in parallel to each other, though sequentially to the factory (https://factorio.com/blog/post/fff-324).
For huge Mod, this would be a really great improvement. Currently our mods take ~8ms.
Benchmarks for LuaJIT show 25 times faster as LUA interpreter (in perfect condition).
LuaJIT is not deterministic.
I think currently there is no Database, which CPU/RAM/... will have how many UPS.
https://factoriobox.1au.us/ (fan-made, anyone can run a benchmark on their system).
I'm an admin over at https://wiki.factorio.com. Feel free to contact me if there's anything wrong (or right) with it.
ravenbs
Long Handed Inserter
Long Handed Inserter
Posts: 61
Joined: Wed Jun 13, 2018 11:07 am
Contact:

Re: Performance - Multiple Questions

Post by ravenbs »

Thank you for the benchmark link - will have a look

Yes mods are a big unknown and the hardest to Multithread.
May be the last one or never be done...
Transport lines already merge up to 200 belt entities into one object to be updated. The transport lines are affected by other transport lines, so you cannot change the order in which they are updated -> you cannot just willynilly give them all to different threads. (https://factorio.com/blog/post/fff-176)
Not willynilly, but you can seperate them to clusters - for example - affecting or not affecting each other.
Implementation detail will depend on the data structure, but if there are completely disjunct belts, there is nothing again doing them in wrong order and still be dederminating.
Alwasy the difference: dederminating vs: deterministic
What you need for Multiplayer is only dederminating per tick - right?

LuaJIT
Maybe a hard one and therefore last optimization (LUA and Mods). But also no reason why it shouldnt if you change the same you did in normal LUA like random generator....
Random generator can also be thread-save dederminating regardless of the order you call it...
Yust imagine a seed related to the corresponding object instead of a global seed...
User avatar
Klonan
Factorio Staff
Factorio Staff
Posts: 5423
Joined: Sun Jan 11, 2015 2:09 pm
Contact:

Re: Performance - Multiple Questions

Post by Klonan »

ravenbs wrote: Tue Jun 16, 2020 9:22 am Implementation detail will depend on the data structure, but if there are completely disjunct belts, there is nothing again doing them in wrong order and still be dederminating.
Well, specifically with belts, when the transport lines update, they can 'wake up' inserters and assembling machines, if this is done in a willynilly order, then it will lead to issues

But yes, theoretically we can update separate transport lines in parallel. It is something we can do, if we solve issues with wake up lists and things.
However at this time most benchmarks i've seen don't show the transport line update as a significant area for improvement.

Right now the biggest time sink is in the Inserter updates, and they would be much harder to update in parallel,
For instance, an inserter can pick an item, which is read by the circuit network to disable another inserter, or turn on a whole section of factory,
Or you have two inserters trying to take from the same container and on different clients a different one takes the item.
ravenbs
Long Handed Inserter
Long Handed Inserter
Posts: 61
Joined: Wed Jun 13, 2018 11:07 am
Contact:

Re: Performance - Multiple Questions

Post by ravenbs »

All true what you say and not easy.
Dont want to drift to this tecnical discussion, because I have no idea of your code and can only guess - mostly being wrong of course, because on what base should I discuss?

Its hard and complicated, but I sill belive It can be done. It is not impossible.
Offer is made - i can do it for you. Fulltime. On my own risc.
Sorting out one problem after another...

If I fail - i fail and lost a few month of my time.
If I succeed - you can get this as cheap as no other way.
Can you please forwand this offer to the responsible staff?

In 10 Years we have 256 Core CPU's.... Should Factorio only using 0,5% of CPU power then?
User avatar
ssilk
Global Moderator
Global Moderator
Posts: 12889
Joined: Tue Apr 16, 2013 10:35 pm
Contact:

Re: Performance - Multiple Questions

Post by ssilk »

Ravenbs: This is interesting and I hope they will give you some access to the code. (Even knowing that this knowledge can be used to create a concurrent game - from my sight as player this is fine)

What I want to add is, I have in my head that the most limiting factor currently is the memory bandwidth. Not the CPU. I think Kovarex wrote about it in a FFF some years ago.
Didn’t find that, but is mentioned here https://factorio.com/blog/post/fff-204

It’s worth reading the blog about tries they already made to use multithreads - For example what Klonan said about inserters: they already tried to update inserters in chunks. https://factorio.com/blog/post/fff-151
But failed. Must be some blogs later.

And personally I find it much more interesting to use the graphics card as physics simulation for the pipe networking or - long dream - the electric network. ( viewtopic.php?f=80&t=15546 )
Or “simple” things like saving in background on Windows. :)

Or just a real 3D world... :) but that’s Factorio 2... :D
Cool suggestion: Eatable MOUSE-pointers.
Have you used the Advanced Search today?
Need help, question? FAQ - Wiki - Forum help
I still like small signatures...
ravenbs
Long Handed Inserter
Long Handed Inserter
Posts: 61
Joined: Wed Jun 13, 2018 11:07 am
Contact:

Re: Performance - Multiple Questions

Post by ravenbs »

I want to post you an example Algorithm how to parallelize Inserters.

This is based only on public information from Forum and FFF. I don’t know if all my assumptions from this are correct, but this should illustrate an approach. Maybe it can be adapted to your real World. I picked Inserters, because Klonan said this is the biggest time sink. Algorithm is only very short, but I want to explain it with images, so you can point me to my mistake I did.

My assumptions:
- Inserter update order doesn’t really matter. But it needs to be the same all time to be determinative for Multiplayer and to solve conflicts – later more.
- Update is in Stages. So first you do for example all belt movements, then the inserters. There is no belt movement during inserter update…
- The Result after a tick needs to be exactly identical on all Multiplayer PCs. It is not a requirement to have it the same state during the tick update.
- You “wake up” Inserters, when the associated belt has a change or a machine finishes his job. Therefore I assume Belt updates and Machine updates are upfront. They can also be in parallel, but for the moment being assume this is done without change like currently.

First I try a basic example.
All four Inserters can be updated in parallel threads. It is no difference what the real order of update is, the result after the tick is always the same.
Image

Algorithm Preparation:
My assumption is, currently you fill a list of inserters during updating other entity’s like belts or machines. Lets call it wakeup list. Most likely it is a map, because you do not want to have inserters duplicated there.
Instead of 1 wakeup list – you need to have n.
Where n = (number of threads this CPU can * 10) + 1
So for a 4 Core HT enabled CPU – you need 41 lists.
Magic number 10 to be adjusted….

The special list 40 (n-1) is for all non-parallel updates. This is executed first, before the parallel workload starts.
Lists 0 to 39 (n-2) are executed from worker threads later on. Number of worker threads equals number of CPU Threads. There are more lists as threads, because they will not be equal length and this is for distributing the load equal to all threads.

Now the Problems:
1. Circuit Network:
Basically you can do anything there. Up to cutt off/on power if an inserter has something in its hand. So it is a graphic programming language like a mod. For the moment being – let us not parallelize all Items connected to a logistics network or mod lua updates. Everything connected to a circuit network goes directly to list 40 (n-1).


2. Simple problem
Image
In this example the order of update really matters. Which one will pick the ore?

So when the belt updates and wake up both inserters, we need to make sure they go to the same job list. This can be done by using a unique integer number, the triggering element (the belt?) has. I don’t know if you have an ID, but even the coordinates x+y or only x would do the trick. Lets pick (x+y) for the moment.
listToInsert = (beltX + beltY) % (n-2)
Ensuring on all clients that those inserters are in the same list and therefore executed in deterministic order will always lead to the same result.

Now skip the other low hanging fruits and go right to the “hell” example where and why we need to propagate those IDs.
Image

In this example we load the same resource from multiple belts to one entity. We do not only have the problem “who pick the ore?”. We also have the problem when there is only 1 needed in the furnace – is it the right or the left who fills this one up? And this example is valid for all machines and also longer “chains” of “connected problems”.
For that reason, when you fill the wake up list, you save the “listToInsert” number in both entitys (Belt + Furnace). You don’t need to save this in the savegame, it is a temp value in the object. You always use the same number for all inserters you wake up with this (e.g. belt) entity.
- If you find such a number, use this instead of calculating a list number.
- If you find Multiple, and different such numbers, because the left furnace has a 5 and the right furnace has already a 7 – use the lower number 5. And in addition you need to move the complete content of the higher list 7 to the lower list 5. Here we are merging update domains, because both need to be in a sequence. Propagate the used number (5) back to the entity with the higher number. So we need to solve this conflict only for 1 Tick – the next tick it will directly use the correct list… So it is possible that the first game tick is slower then all following.

This is the reason for the need of 10 times the Job list over the thread count. Because the lists with lower number will get statistically more entry’s as the ones with higher numbers. This 10 can also be a 100 or 1000 – a real world benchmark would show up the optimal number. My assumption is, the bigger the base, the bigger this number must be to have a meaningful distribution. But the overhead of multiple lists should not such big, even if you use a high number of them.

Further optimization options:
- This algorithm is only needed for the Weakup of inserts. Once it has an item and rotate to its destination – there could no conflict happen until the inserter reaches his endpoint. So this can be just updated in parallel equal distributed to workers without this algorithm.

- Once you have this “cluster parallel job lists based on the ID in the objects thing” – you can even implement smarter cluster finding as the algorithm above. For example already find those ID’s when building a new inserter, by copy it from its neighbour.

- Or you can initial find the smallest possible clusters by a graph traversal, what inserters are “connected” over entity’s or belts, chests,… This will lead to the best possible balanced job lists. For example use a “Foreach ( inserterer) … if has no threadlistID … traverse “connected” elements and give the whole cluster one”. Think this could be even simpler as the algorithm above, but is a short delay when loading the map.
The definition of “Connected” is the key. Its at least Belts, Chests, and all types of Machines. But it would be good if this is only a single belt grid - and not the merged 100 long belt to have small clusters…


Conclusion:
This is just a stimulation what can be done – completely based on my fiction how your code might look like. Hope you see this not as an attack, I just want to point out a way can be found….

Have I missed a point regarding the inserters where it breaks this “small clusters of sequencial inserters” thing?
You build the greatest game already.
I think it could be even more fun with bigger bases, and most gamers (99,65% according to steam) have multi core CPU’s…
User avatar
Impatient
Filter Inserter
Filter Inserter
Posts: 884
Joined: Sun Mar 20, 2016 2:51 am
Contact:

Re: Performance - Multiple Questions

Post by Impatient »

Hehe, this thread could also be read as a very smart job application. ;-)

Clustering sounds intriguing. Though I don't know anything about anything. But it sounds like a game with independent outposts or a city block style game could profit massively by clustering. Maybe even the prevalent base layout with a main bus, where a lot of separated production lines are only connected by the bus and the power network.

On the other hand, game mechanics that often connect a lot of things, especially in the later game and thus in large bases are the circuit and the logistic network.

Also, once put into the same cluster, entities would remain there and thus, with every change made to the factory, there is a chance that clusters will be joined and remain joined. Extrapolated over an unlimited game time, everything would end up in the same cluster sooner or later. UNLESS every cluster is reevaluated, if it can be broken apart, periodically. That reevaluation could run in parallel, almost independent from the core game loop and over several ticks. I see it not being time critical. Also there are conditions for when a reevaluation makes predictably the most sense, namely and foremost when entities are removed.

Sigh, discussing things like these and developing ideas really is fun. :)
Koub
Global Moderator
Global Moderator
Posts: 8046
Joined: Fri May 30, 2014 8:54 am
Contact:

Re: Performance - Multiple Questions

Post by Koub »

To anyone deeply convinced the parallelization has not been explored enough by the devs, and that a waving a magic wand will make Factorio exploit fully a multithreaded 32 core, I urge to read entirely this thread valneq has already linked : viewtopic.php?f=5&t=39893. And especially read the devs' posts.

I'm not saying nobody can find a good idea the devs didn't have yet, but chances are very low.
Koub - Please consider English is not my native language.
User avatar
Impatient
Filter Inserter
Filter Inserter
Posts: 884
Joined: Sun Mar 20, 2016 2:51 am
Contact:

Re: Performance - Multiple Questions

Post by Impatient »

Koub wrote: Wed Jun 17, 2020 4:01 pm To anyone deeply convinced the parallelization has not been explored enough by the devs, and that a waving a magic wand will make Factorio exploit fully a multithreaded 32 core, ...
Just in case my post was one of the or THE reason for yours: I did not write anything like that. One of the first things I wrote was that I don't know anything about anything. That to make my post easily dismissable as fun time throwing around ideas and speculations, if one does not want to follow.

Anyways, thanks for stressing the link, I see dev answers in the thread and I am going to read them, to get more knowledge about the topic.
Post Reply

Return to “Ideas and Suggestions”