Friday Facts #215 - Multithreading issues

ske · Post by **ske** » Fri Nov 03, 2017 9:26 pm

I have an idea for the data locality. You say that much of the data is usually accessed read-only but some data of each object is often written to (thereby invalidating the whole page). Would it be an option to put the changing data into separate arrays?

For example from:

Code: Select all

volatile_electrical_info {...some bytes that change often...};
struct entity {
  volatile_electrical_info e_data;
  volatile_electrical_info& get_e_data() {return e_data;}
  ...
};

to:

Code: Select all

volatile_electrical_info {...some bytes that change often...};
std::vector<volatile_electrical_info> array_of_electrical_info;
struct entity {
  int e_data_index;
  volatile_electrical_info& get_e_data() {return array_of_electrical_info[e_data_index];}
  ...
};

Maybe have the localized data contain enough information to get the possibility to perform some update actions sequentally on the array instead the entities.

Another improvement would be to separate the arrays into chunks.

I don't think this method is a good idea for early..mid development because it introduces indirection and centralizes data storage which makes changes more complicated. More performance is nice but it's not a deciding factor right now (my opinion). If you want to have the resources to continue development you could do something like this after 1.0 is released.

ratchetfreak · Post by **ratchetfreak** » Fri Nov 03, 2017 9:29 pm

I don't agree that false sharing is an issue at all.

Threads mutually stepping on each other's L3 (and maybe shared L2) cache entry and purging stuff a thread still needs is a much more likely issue.

Also std containers can take custom statefull allocator, so can allocate_shared (you'll have to make your own allocate_unique). So you can make each entity take a pointer to the chunk's allocator and that way you can group every byte of memory in each single chunk. You can even split things up per type that way.

basementjack · Post by **basementjack** » Fri Nov 03, 2017 10:15 pm

shadetheartist wrote:the devs should really consider open sourcing the code.

Heck No.

If you're interested in working on the code, reach out to the devs and ask for a Job, You can sign an NDA and I'm sure the'd love to have you, but open sourcing the code is business suicide.

Factorio has an amazing following already (>1 million copies) but there are other untapped opportunities like an ipad version, console versions, etc...

It's possible that someone like EA would pay millions to partner with Wube to bring this to another platform. Once the code is open source, there's not much motivation to do that - and instead some big company would create a clone product instead, avoiding any royalties.

They've spent 5 years building something they own, and while the brand name definitely plays a part, a very large part is also the code base they've developed.

After all, none of us paid our $20 to buy the logo, we wanted the code.

golfmiketango · Post by **golfmiketango** » Fri Nov 03, 2017 10:33 pm

basementjack wrote:
shadetheartist wrote:the devs should really consider open sourcing the code.
Heck No.

If you're interested in working on the code, reach out to the devs and ask for a Job, You can sign an NDA and I'm sure the'd love to have you, but open sourcing the code is business suicide.

Although I'd never try to tell anyone to open-source anything they don't want to, I think this statement is odd. Open source requires a very different business model to turn a profit but you can't just say "suicide" as there is a near-endless list of counterexamples.

Here we have an optimization puzzle that entails difficult multidisciplinary and highly platform-variant-dependent factors which Wube developers are probably having to learn about as they go; well, perhaps if Wube were to factor out the relevant bits of architecture into a highly modular, isolated code base, and release that chunk of functionality as an open-source data-processing framework, then, conceivably, somebody who is more expert on such questions might be able to make some contribution. It's hard to imagine how this would be business suicide. Again, I'm not saying that's what Wube actually should do -- after all, whether or not anything would be gained as a result of doing all that refactoring work is not at all clear -- just reacting to what I see as a wierd (or maybe just weirdly stated) perspective on the economics of open source.

dinodod · Post by **dinodod** » Fri Nov 03, 2017 10:45 pm

As a basic programmer, it sorta sounds like how you have functions with private variables and also global variables. Would it be possible to do that with threads? the output from 1 thread goes into some sort of a global variable that all threads can access (read only)??

dinodod · Post by **dinodod** » Fri Nov 03, 2017 11:17 pm

Is there any way to optimize how biters affect gameplay in MP? For me and several of my friends, we get lag hits when players go biter hunting. We also have problems from time to time where players go and explore the entire map as much as possible and thus this kills the save file size / MP performace.

Normally, this is late game issue but I still don't understand why the lag exists.

I would like to know if there might be a way to create static chunks / unload chunks / filter the entities on a per player basis. For me, I normally don't venture out so I don't need to see all the bases on the map or receive updates from some base far away when players attack it. Sure, invalidate the base but update it when I go into the area. Otherwise, I opt for simply blacking out the region until I go there and explore, thus I would reveal the area for myself. Other players might want to have different options. Maybe make personal radars instead of a global one and make the global one have options to blackout updates??? maybe allow the global radars to have a filtered view per player? I'm just trying to figure out a way to keep performance in late game.

I was trying to look into the debug screen and figure out what entities are causing the issue. Any advice on how to troubleshoot performance lag? I read so many possible issues, like concrete, lots of bots, belts, inserters. I read that I needed to look at the update section. So it tells me the speed of the updates but not exactly what is in that list.

Would direct teleportation into / out from a factory be a thing for late game to help with optimization? This is a space game and I sure do love Star Trek. Teleportation was real.

malventano · Post by **malventano** » Fri Nov 03, 2017 11:33 pm

ske wrote:Interesting cache invalidation effects! Have you (or somebody else) been comparing the performance of factorio on different CPU architectures like Intel i9 7900x, AMD Ryzen 1800X, AMD Threadripper 1950X (with/without NUMA)? I'm asking for a friend who can't decide which CPU to buy.

Intel generally wins by a decent margin in single threaded, and that widens in situations where caches are being shared (especially across core complexes, which takes far longer than a typical cache hit).

math4origami · Post by **math4origami** » Fri Nov 03, 2017 11:41 pm

In other words, something like Entity Component System.

Regarding chunks, you say that changing position requires reallocation. Why is that? Are you currently bundling whole entities into chunks, and moving them around as they travel between chunks? But at the same time, you have multi-chunk or global updates such as electricity and trains?

I feel like the proper application of ECS should not encounter this issue. All the data necessary to do "electricity update" should be managed by the "electricity system", and the entity just has an id pointing to that data to get info when needed. Then, when the "electricity system" does the update, all the data it owns is local and contiguous. You can move the entity between chunks as necessary, but it won't need to reallocate electricity data because the "electricity system" owns it and the entity only has an id.

ske · Post by **ske** » Fri Nov 03, 2017 11:47 pm

malventano wrote:
ske wrote:Interesting cache invalidation effects! Have you (or somebody else) been comparing the performance of factorio on different CPU architectures like Intel i9 7900x, AMD Ryzen 1800X, AMD Threadripper 1950X (with/without NUMA)? I'm asking for a friend who can't decide which CPU to buy.
Intel generally wins by a decent margin in single threaded, and that widens in situations where caches are being shared (especially across core complexes, which takes far longer than a typical cache hit).

Yes, this is what I would expect, too. The big question is: Is it really so and how high is the impact?

luc · Post by **luc** » Fri Nov 03, 2017 11:50 pm

basementjack wrote:open sourcing the code is business suicide.

Tell that to Red Hat Enterprise Linux, Overleaf, VersionOne, Wire.com, and countless other such companies and products.

basementjack wrote:They've spent 5 years building something they own, and while the brand name definitely plays a part, a very large part is also the code base they've developed.

Definitely. But it doesn't have to be only black or white either: they could release the source code that's 5 years old. So around now, we'd get the source code of the earliest prototypes; and in 5 years, we'll get the source code of the game as it is now. Anyone trying to clone the game will have 5 years of innovation to catch up with. And if Wube stops with the development, five years after that there will still be lots of players, many of whom can code, and will enjoy improving the game further. See OpenRCT2, which still requires you to have the original CD for the graphics, hence still making Chris Sawyer (the creator) money while being open source.

There are a lot of options here, many of which make both business sense and are nice for the community at the same time. It's silly to say that closed source is the one true path.

dinodod · Post by **dinodod** » Fri Nov 03, 2017 11:54 pm

ske wrote:
malventano wrote:
ske wrote:Interesting cache invalidation effects! Have you (or somebody else) been comparing the performance of factorio on different CPU architectures like Intel i9 7900x, AMD Ryzen 1800X, AMD Threadripper 1950X (with/without NUMA)? I'm asking for a friend who can't decide which CPU to buy.
Intel generally wins by a decent margin in single threaded, and that widens in situations where caches are being shared (especially across core complexes, which takes far longer than a typical cache hit).
Yes, this is what I would expect, too. The big question is: Is it really so and how high is the impact?

Are you the kind of person that micro manages every FPS in every game? Then it doesn't matter, you will get 60 FPS and that's not going to stress the CPU at all. I have an older XEON CPU that doesn't get above 15% CPU for this game. and what works well for one CPU wont work the same on the other so you are just wasting your time trying to micro analyze CPU performance. I say wait and see what falls out. The race between AMD & NVIDIA has lead to some bad apples being made. Same goes with Intel / AMD. And no one needs an I-9 unless they are over the top seriously multithreading.

DanGio · Post by **DanGio** » Fri Nov 03, 2017 11:57 pm

Don't change the subject here...... this does not look like concrete.

/me leaves the room

zebediah49 · Post by **zebediah49** » Sat Nov 04, 2017 12:23 am

dinodod wrote:I would like to know if there might be a way to create static chunks / unload chunks / filter the entities on a per player basis. For me, I normally don't venture out so I don't need to see all the bases on the map or receive updates from some base far away when players attack it. Sure, invalidate the base but update it when I go into the area. Otherwise, I opt for simply blacking out the region until I go there and explore, thus I would reveal the area for myself. Other players might want to have different options. Maybe make personal radars instead of a global one and make the global one have options to blackout updates??? maybe allow the global radars to have a filtered view per player? I'm just trying to figure out a way to keep performance in late game.

Not really; at a minimum Factorio multiplayer would need to move away from its "synchronous lockstep" model into a true client/server model, with some large challenges.

To elaborate, at the moment, all players simulate the entire world for every tick. It's not that you receive updates keeping up with the other players fighting biters elsewhere -- you actually have to simulate the entire battle. The only thing you get sent is the information about what the players themselves are done (movement, shooting, etc.). Everything else is completely deterministic, and all players calculate it separately. This is because there would be an enormous amount of data movement to do it any other way.

In order to let you not simulate that, the server would need to keep track of the region of space you have loaded, and push updates to you about anything that enters or leaves that region. If the regions were automatically generated based on content it is technically possible to do this, but still very very challenging. The trivial case of base sections isolated from outputs by trains is moderately workable, but most of your ups cost is going to be from the main base.

Hence, the devs have more-or-less decided that it'd work out better to just make everything faster and more efficient than to switch the multiplayer netcode.

Post by **Loewchen** » Sat Nov 04, 2017 12:33 am

ske wrote:
malventano wrote:
ske wrote:Interesting cache invalidation effects! Have you (or somebody else) been comparing the performance of factorio on different CPU architectures like Intel i9 7900x, AMD Ryzen 1800X, AMD Threadripper 1950X (with/without NUMA)? I'm asking for a friend who can't decide which CPU to buy.
Intel generally wins by a decent margin in single threaded, and that widens in situations where caches are being shared (especially across core complexes, which takes far longer than a typical cache hit).
Yes, this is what I would expect, too. The big question is: Is it really so and how high is the impact?

There were several comparison posts made after the Ryzen release (Link).

On one specific save file:

That save runs at ~35 ups with my 7700k.

On Ryzen 7 1800X OC 4.1GHZ + DDR4 3200GHz I get 22.5 UPS

Jap2.0 · Post by **Jap2.0** » Sat Nov 04, 2017 12:56 am

basementjack wrote:
shadetheartist wrote:I feel like once the game is released, (and i know this has been mentioned before, but tentatively) the devs should really consider open sourcing the code. Because there are some very enthusiastic and talented people out here that might actually be able to take the time to go through and figure things out. And i'm not saying the devs couldn't do it too, just that it seems like 1.0 launches they are going to back off hard, which is perfectly reasonable.

I for one just want to marvel at the optimized code. I mean, you think your belt maneuvering is clever, you should see what they must have under the hood.
Heck No.

If you're interested in working on the code, reach out to the devs and ask for a Job, You can sign an NDA and I'm sure the'd love to have you, but open sourcing the code is business suicide.

Factorio has an amazing following already (>1 million copies) but there are other untapped opportunities like an ipad version, console versions, etc...

It's possible that someone like EA would pay millions to partner with Wube to bring this to another platform. Once the code is open source, there's not much motivation to do that - and instead some big company would create a clone product instead, avoiding any royalties.

They've spent 5 years building something they own, and while the brand name definitely plays a part, a very large part is also the code base they've developed.

After all, none of us paid our $20 to buy the logo, we wanted the code.

First: After 1.0, I'm personally hoping they follow a path similar to what Minecraft and some other games have done, just done better, obviously, because they are the Factorio devs

Second: I'm fairly sure there are no plans to bring this to console or mobile, due to the amount of hotkeys, relatively low power, and difficulty in porting.

Third: I'm pretty sure there are some people who might want to murder you for suggesting that Wube do business with EA. I'm not saying I would, but a lot of people love Wube, and a lot of people hate EA.

Fourth: That's not quite how cloning works. Plenty of people have cloned Minecraft, which isn't open source, and the legal stuff is a bit more compliucated if you just take some open-source stuff, port it, and sell it.

AssaultRaven · Post by **AssaultRaven** » Sat Nov 04, 2017 1:25 am

shadetheartist wrote:I feel like once the game is released, (and i know this has been mentioned before, but tentatively) the devs should really consider open sourcing the code. Because there are some very enthusiastic and talented people out here that might actually be able to take the time to go through and figure things out. And i'm not saying the devs couldn't do it too, just that it seems like 1.0 launches they are going to back off hard, which is perfectly reasonable.

I for one just want to marvel at the optimized code. I mean, you think your belt maneuvering is clever, you should see what they must have under the hood.

Certainly I think a case should be made that crowdfunded projects should be open sourced as a matter of course. You both want to get paid in advance, and yet also be able to own the product for your own profit if things go well? Factorio may have worked out well in the end, but the concept seems like an inherent scam.

Even for other titles, it's a huge cultural and technical loss that so much gaming code remains closed source indefinitely. Even moreso for crowdfunded projects that get canceled: they should at least post what they managed to do, in case anyone else wants to run with it, but source isn't something people are used to expecting, unfortunately.

bk5115545 · Post by **bk5115545** » Sat Nov 04, 2017 2:31 am

I thought I ran into the same type of problem (cache-line invalidation) a couple of years ago but it actually turned out to be cache-misses on a handful of triple dereferenced entities and some &** making the branch predictor and speculative execution almost completely irrelevant.

Modern processors (think after ~2005) all have multi-way cache association with aggressive snooping via a concept of broadcasting (mostly only for SIMD instructions). The only time a total page invalidation should occur ('should' instead of 'could' because overclocking cache frequency really messes with this) is if 2 processors both modified the same cache line in their own caches and a flush was triggered within a few cycles of each other AND the first line to flush modified the location of the target of the second flush. The page is then synced from L3 and written to memory asynchronously so it's not the end of the world if it happens every now and then.

Depending on which compiler you're using, you can hint L3 temporal locality for pointers. This approach can somewhat help with having a long reference chain if you keep pointers in the cache instead of objects as pointers are much less likely to get evicted; this might also be the most effective solution if most of the pointers don't change every frame (if they do change frequently then you should reconsider that design decision because it completely removes the processor's ability to cache A->D in the A->B->C->D chain). See GCC __builtin_prefetch and I'm sure Clang has an equivalent.

As always feel free to ask questions to the guys at the Intel Developer forums. If you truly believe this to be a cache problem then it applies to all modern processors (because all the current processors use ALMOST the same cache layout) and I'm sure those devs can help. They've helped me quite a bit in the past.

rbtcollins · Post by **rbtcollins** » Sat Nov 04, 2017 8:10 am

Notwithstanding the interesting discussion about whether your analysis is right, I have a suggestion for achieving parallel thread update isolation at whatever granularity you need without moving data as entities move.

Given N threads that will be updating in parallel some object type T, create N slab allocators for T - one per thread - and place entities via the slab allocator chosen by some partition function based on the id. E.g. you could take the id & 0x0f to mask across 16 Threads. Or you could do a crc32 and then take the bottom N bits. Or whatever

.

Then in your update loop, in each thread, only consider entities whose id hashes to the thread's allocator.

It doesn't matter what granularity you need here, because each allocator will be allocating pages (or more) independently, so you won't collide at that level. You may still have cache line issues, but see the other comments about that.

If you need to allocate objects and then assign an id, its a little more work - you need to move the object when the id changes - but thats all that should be required.

sketch:

Code: Select all

struct X {
  void* operator new(std::size_t sz, int id) {
       int thread = choose_thread(id);
       return threads.get(thread).allocate(sz);
    }
}

void thread_worker(int thread_id) {
   for (X in all_X) {
       if choose_thread(X.id) != thread_id {
           continue;
       }
      ....
   }
}

henke37 · Post by **henke37** » Sat Nov 04, 2017 9:29 am

rbtcollins wrote:Notwithstanding the interesting discussion about whether your analysis is right, I have a suggestion for achieving parallel thread update isolation at whatever granularity you need without moving data as entities move.

Given N threads that will be updating in parallel some object type T, create N slab allocators for T - one per thread - and place entities via the slab allocator chosen by some partition function based on the id. E.g. you could take the id & 0x0f to mask across 16 Threads. Or you could do a crc32 and then take the bottom N bits. Or whatever .

Then in your update loop, in each thread, only consider entities whose id hashes to the thread's allocator.

It doesn't matter what granularity you need here, because each allocator will be allocating pages (or more) independently, so you won't collide at that level. You may still have cache line issues, but see the other comments about that.

If you need to allocate objects and then assign an id, its a little more work - you need to move the object when the id changes - but thats all that should be required.

sketch:
Code: Select all
struct X {
  void* operator new(std::size_t sz, int id) {
       int thread = choose_thread(id);
       return threads.get(thread).allocate(sz);
    }
}

void thread_worker(int thread_id) {
   for (X in all_X) {
       if choose_thread(X.id) != thread_id {
           continue;
       }
      ....
   }
}

Bad idea, that loop has each thread touching each X. Keep separate per thread prefiltered lists instead.

Oktokolo · Post by **Oktokolo** » Sat Nov 04, 2017 9:42 am

DanGio wrote:Don't change the subject here...... this does not look like concrete.

I disagree. This friday facts look like providing pretty concrete insights into the state of multiplayer start area generation. They also describe a concrete problem encountered while trying to get more megabase UPS by picking a seemingly low-hanging fruit of parallelization.

Factorio Forums

Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues

Re: Friday Facts #215 - Multithreading issues