Friday Facts #204 - Another day, another optimisation

Regular reports on Factorio development.
Zaflis
Filter Inserter
Posts: 493
Joined: Sun Apr 24, 2016 12:51 am
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by Zaflis »

pleegwat wrote:Doesn't work if you're actually destroying entities outside their own update call though, because you may end up swapping an entity which acted this cycle with one which has not.
Not if you iterate the vector from end to beginning. Some pseudocode, if you wanted to make a particle system:

Code:

for (int p = PCount - 1; p >= 0; p--) {
  particle[p].DealWithIt();
  particle[p].life--;
  if (particle[p].life <= 0) {
    PCount--;
    particle[p] = particle[PCount]; // move last element into the freed slot
    particle.deleteLast();          // drop the now-duplicated last element
  }
}
What you get is that every single particle is dealt with exactly once per loop, and deleting one costs just a single swap with the last element. The downside is that the list won't keep its original order, but that is only a problem if you need sorting. On a free list of entities that doesn't sound like the case, and for particles the order doesn't matter either.
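A runnable C++ sketch of this swap-with-last ("swap and pop") pattern, with an illustrative `Particle` type (the names are made up for the example):

```cpp
#include <vector>

// Illustrative particle: dies when life reaches zero.
struct Particle { int life; };

// Iterate from the end so that an element swapped in from the back has
// already been processed this tick; removal is an O(1) swap-and-pop.
void updateParticles(std::vector<Particle>& particles) {
    for (int p = static_cast<int>(particles.size()) - 1; p >= 0; p--) {
        particles[p].life--;                  // the per-tick work
        if (particles[p].life <= 0) {
            particles[p] = particles.back();  // move last element into the hole
            particles.pop_back();             // shrink the vector by one
        }
    }
}
```

Each element is visited exactly once per call, and element order is sacrificed for O(1) removal, which matches the trade-off described above.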

ske
Filter Inserter
Posts: 412
Joined: Sat Oct 17, 2015 8:00 am
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by ske »

I wonder what all this optimization should accomplish. Not to talk it down. It's amazing. I'm just wondering about the allocation of developer resources. Did you set a specific goal like "get this map from 30 UPS up to 60 UPS"? It seems more like you are drawn into optimizing things by natural force. I mean, that's what Factorio is strong at. It's easy to build a factory that does everything okayish. But then you start fiddling with the belts here and there and rearrange some assemblers, and before you know it, it's dawn again. So it seems natural that you start optimizing things and exploring new ways to make everything work better. Still, there are those cards. Haunting. "Build a Spidertron." uhhhhhh a creepy voice says "they expect nothing less than perfection" and "make it amazing". All the while nobody really knows what that Spidertron actually does. Or should do. Or could do. Most importantly, what it couldn't and shouldn't do. Sometimes it does anyway. It would interact with oh-so-many things. Do we make it remote-controlled? Like one click on the radar map and it will run there and enable item manipulation as if the player was there. For example in radioactive areas. Wait, radioactivity doesn't hurt players? There have been people walking home with plutonium in their jacket after work. A few days later they fell sick. All dead now. Yet, there it is, highly active uranium, sitting in our pocket next to some trains, fuel tanks and a whole rocket launching facility. Put in one more piece of uranium and it explodes. Or it doesn't. But it could. People would learn the hard way. That's what nobody expects now except those who know. Therefore it should.

Zool
Fast Inserter
Posts: 106
Joined: Fri Jul 10, 2015 6:55 pm
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by Zool »

After so many comments from people claiming to know best which programming constructs are superior, I have doubts that the devs will ever post source code again *sniff*

Peter34
Smart Inserter
Posts: 1100
Joined: Mon Nov 10, 2014 12:44 pm
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by Peter34 »

darkfrei wrote:
By changing the zoom rate from 1.1, to the 7th root of 2 (1.104089...), the zoom now increments perfectly from 1.0 to 2.0 in 7 steps.
It's music!
http://pages.mtu.edu/~suits/notefreqs.html
It's modern music. Old skool music didn't do it that way.

Peter34
Smart Inserter
Posts: 1100
Joined: Mon Nov 10, 2014 12:44 pm
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by Peter34 »

safan wrote:i completely didn't understand anything in this fff
Not even the part about zoom scaling increments?

daniel34
Global Moderator
Posts: 2761
Joined: Thu Dec 25, 2014 7:30 am
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by daniel34 »

It turned out that our zooming system never showed an exact zoom of 2.0, which would be the 'pixel perfect' zoom level for the HR entities. By changing the zoom rate from 1.1, to the 7th root of 2 (1.104089...), the zoom now increments perfectly from 1.0 to 2.0 in 7 steps.
Currently the F9 key (Controls --> Debug --> Reset zoom level) sets the zoom to exactly 1.0 and is very useful when taking screenshots or videos or just enjoying pixel-perfect graphics. Would it be possible to add an extra key binding (e.g. CTRL+F9) that sets the zoom to exactly 2.0 to also have pixel perfect views for HR, instead of having to zoom in 7 times?

bobucles
Smart Inserter
Posts: 1708
Joined: Wed Jun 10, 2015 10:37 pm
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by bobucles »

When multiplying and dividing extremely specific floating-point decimals, there is a very good chance you'll accumulate "drift" from tiny rounding errors (reason number 907.0000000000000156 to hate floats). Why not use a simple list of zoom values? Or at least snap the zoom level when it should hit an integer.
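The drift and the snapping fix can be sketched like this (the `1e-9` tolerance and the function names are made up for the example; this is not Factorio's actual code):

```cpp
#include <cmath>

// Multiply the zoom by the 7th root of 2, n times.
double zoomAfterSteps(int n) {
    const double rate = std::pow(2.0, 1.0 / 7.0); // ~1.104089...
    double zoom = 1.0;
    for (int i = 0; i < n; i++)
        zoom *= rate; // each multiply can add a rounding error of ~1 ulp
    return zoom;
}

// Snap to the nearest integer whenever the accumulated value is within a
// small tolerance of it, restoring exact levels like 1.0 and 2.0.
double snapZoom(double zoom) {
    double nearest = std::round(zoom);
    return (std::fabs(zoom - nearest) < 1e-9) ? nearest : zoom;
}
```

After seven multiplies the raw value may be off from 2.0 by a few ulps, but the snap makes the pixel-perfect level exact.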

impetus maximus
Smart Inserter
Posts: 1299
Joined: Sat Aug 20, 2016 10:07 pm
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by impetus maximus »

+1 for AMD testing. i see an AMD Ryzen™ workstation in my future.

i've been zooming for screenshots (poorly) by eye. thanks for the F9 tip. :)
Gergely wrote:
cpy wrote:First! Hurray! :D
Some forums have rules against that kind of "spam" you know... (Sadly, not this one.)
click their name, add foe. you won't see their spam in the future.

DaveMcW
Smart Inserter
Posts: 3713
Joined: Tue May 13, 2014 11:06 am
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by DaveMcW »

bobucles wrote:When multiplying and dividing extremely specific floating point decimals there is a very good chance you'll accumulate a "drift" from tiny rounding errors (reason number 907.0000000000000156 to hate floats). Why not use a simple list of zoom values? Or at least snap the zoom level when it should hit an integer.
I think a 99.9999999999% pixel perfect image is good enough. ;)

mathturtle
Inserter
Posts: 45
Joined: Sat Jan 21, 2017 8:19 pm
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by mathturtle »

AlastarFrost wrote: ...

It looks like everything is connected tightly with everything else so that you can not break it apart, but that's actually not true.
If you want to come to a reasonable simulation, not a perfect one, everything that has a large enough buffer creates a propagation delay of every action that is big enough to extrapolate the outcome without a big error margin. That's why I mentioned trains: if a mining facility has to fill a train before the outcome is propagated (the train starts moving), you could simulate it until the train starts right away if no one is looking, and not touch it until the train leaves. Then you could simulate in large chunks until the buffer chests run full, until the next train arrives.

...
You assume no one is looking though. Map view breaks that assumption. Everyone I know leaves a radar at every outpost now so they can look at it if problems arise. Having the ability to view the outpost (and the amount of ore in the chests at the station) in real time is great. Also, you don't know if the biters are going to eat your power plant... or come eat the outpost. Or maybe you get a little careless with your deconstruction planner and delete the one power pole that connects the power plant to everything else. Maybe you could get away with this optimization if you took away map view, turned off biters and decoupled power somehow (local power for the outpost?). But I don't think that is the direction most factories will go.

ili
Long Handed Inserter
Posts: 88
Joined: Thu Apr 14, 2016 6:19 pm
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by ili »

the measurements for this chart were done on a different CPU (i7-6700K vs i7-4790K previously)
The downside is, that you are entering the realm of architecture-specific micro-optimization. If you aren't careful, it can even be bad for performance.
You are "entering the realm of architecture-specific micro-optimization" and you are testing only on Intel CPUs.
Please don't leave behind your AMD users.
Buy, or ask AMD for, a Ryzen system and test your optimizations on it too.
Ryzen has a very different caching architecture, and your changes may hurt or improve performance on it differently.

MINIMAN10000
Burner Inserter
Posts: 16
Joined: Tue Mar 21, 2017 7:02 pm
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by MINIMAN10000 »

pleegwat wrote:A linked list of multiply-inheriting classes? I'd have expected an array of structs.
When it comes to performance with a large number of entries (a few thousand or so) it's better to use SoA https://software.intel.com/en-us/articl ... formations — to quote Intel:
Although it may seem more logical to organize data in Array of Structures (AoS), this organization makes it extremely difficult to access the memory for reads (gather) and writes (scatter). This prevents many optimizations by the compiler, most notably it often prevents efficient vectorization. Algorithms with this characteristic are often very poor performers on both Intel® Xeon Architecture and Intel® MIC Architecture.

If possible, it is desirable to reorganize data into a Structure of Arrays (SoA) organization. Unfortunately, for real world applications this may prove to be a non-trivial amount of effort. However, the benefits of doing such a transformation may be quite significant.
Basically, in modern languages AoS vs SoA is a non-trivial change, so it's often not used.
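A minimal illustration of the two layouts (the particle fields are invented for the example):

```cpp
#include <cstddef>
#include <vector>

// Array of Structures: the fields of each particle are interleaved in memory.
struct ParticleAoS { float x, vx; };

// Structure of Arrays: each field lives in its own contiguous array,
// which is the layout wide vector loads/stores want.
struct ParticlesSoA {
    std::vector<float> x, vx;
};

void stepAoS(std::vector<ParticleAoS>& ps, float dt) {
    for (auto& p : ps)
        p.x += p.vx * dt; // interleaved access: x and vx arrive together
}

void stepSoA(ParticlesSoA& ps, float dt) {
    // Unit-stride loop over two plain float arrays; trivially vectorizable.
    for (std::size_t i = 0; i < ps.x.size(); i++)
        ps.x[i] += ps.vx[i] * dt;
}
```

Both functions compute the same result; the difference is purely in memory layout, which is why the transformation is invasive to retrofit but cheap at runtime.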

Cine
Burner Inserter
Posts: 7
Joined: Sat Aug 19, 2017 7:53 am
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by Cine »

If memory bandwidth is a problem, then it is probably not necessary to use a whole double as the frameReference. A single byte ought to carry enough precision to prevent visual artifacts.
For most calculations in a game it is probably unnecessary to use more than a float.
Another trick to cut down memory usage is to use a memory pool for each kind of entity (or just same-sized objects) and then use a short/24-bit int/int, depending on the max number of entities, as the index into the pool. This is especially useful on a 64-bit architecture, where a pointer is otherwise a whole 8 bytes.
On the x64 architecture only 48 bits are actually used for addressing, meaning that it is unnecessary to store those last 2 bytes in high-density structures.
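A sketch of the index-instead-of-pointer idea (the `Inserter` pool and all names are hypothetical, not Factorio's code):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical entity type; each kind of entity gets its own pool.
struct Inserter { int heldItems = 0; };

// All inserters live in one contiguous pool...
static std::vector<Inserter> inserterPool;

// ...so a reference is a 4-byte index instead of an 8-byte pointer.
using InserterRef = std::uint32_t;

InserterRef allocInserter() {
    inserterPool.emplace_back();
    return static_cast<InserterRef>(inserterPool.size() - 1);
}

// The extra lookup: fetch the pool's base pointer, then index into it.
Inserter& deref(InserterRef ref) {
    return inserterPool[ref];
}
```

As a side benefit of this sketch, indices stay valid when the pool reallocates, which raw pointers would not.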

kovarex
Factorio Staff
Posts: 8193
Joined: Wed Feb 06, 2013 12:00 am
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by kovarex »

Cine wrote:Another trick to cut down memory usage is to use a memory pool for each kind of entity (or just same sized objects) and then use a short/24 bit int/int depending on max number of entities as the index into the pool. This is especially useful on 64 bit architecture where a pointer is otherwise a whole 8 bytes.
But where do you find the pointer to the beginning of the array? It usually means that you have one more memory indirection: you have to retrieve the pointer to the beginning of the array, and then the data itself.

Zavian
Smart Inserter
Posts: 1646
Joined: Thu Mar 02, 2017 2:57 am
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by Zavian »

If that pointer is being accessed often, then it is probably in cache. That means the extra indirection itself is relatively cheap, especially if doing things that way results in code that generates more cache hits and fewer overall cache misses.

pleegwat
Filter Inserter
Posts: 278
Joined: Fri May 19, 2017 7:31 pm
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by pleegwat »

Zavian wrote:If that pointer is being accessed often, then it is probably in cache. That means the extra indirection itself is relatively cheap, especially if doing things that way results in code that generates more cache hits and fewer overall cache misses.
But it's still one extra memory region, taking up valuable L1 cache space and a precious TLB slot.

AlastarFrost
Burner Inserter
Posts: 13
Joined: Sun Jun 12, 2016 3:12 pm
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by AlastarFrost »

mathturtle wrote:
AlastarFrost wrote: ...

It looks like everything is connected tightly with everything else so that you can not break it apart, but that's actually not true.
If you want to come to a reasonable simulation, not a perfect one, everything that has a large enough buffer creates a propagation delay of every action that is big enough to extrapolate the outcome without a big error margin. That's why I mentioned trains: if a mining facility has to fill a train before the outcome is propagated (the train starts moving), you could simulate it until the train starts right away if no one is looking, and not touch it until the train leaves. Then you could simulate in large chunks until the buffer chests run full, until the next train arrives.

...
You assume no one is looking though. Map view breaks that assumption. Everyone I know leaves a radar at every outpost now so they can look at it if problems arise. Having the ability to view the outpost (and the amount of ore in the chests at the station) in real time is great. Also, you don't know if the biters are going to eat your power plant... or come eat the outpost. Or maybe you get a little careless with your deconstruction planner and delete the one power pole that connects the power plant to everything else. Maybe you could get away with this optimization if you took away map view, turned off biters and decoupled power somehow (local power for the outpost?). But I don't think that is the direction most factories will go.
Map view is low resolution compared to the actual game window. I am talking about lowering the time resolution at loosely coupled transport systems. In map view you can get away with updating production every 500 ms, for example, but when you actually look at the moving belts you can't get away with it. And I'm not even talking about lowering the resolution of the actual simulation in the tightly coupled areas, just finding a way to simulate multiple ticks in a row to keep the caches hot. So the question is: how much could I fast-forward a mining outpost or small remote factory that fills a train, so that you don't notice when you are not actually there to look at it?

And if that gives you a significant performance boost, you could even do double work on the things players look at: discard the fast-forwarded result if someone looks and switch to real-time simulation (given that it is a small area and that you save time for all the things you are not looking at).

And as I said in the first post, it has to be evaluated and tested by the team that actually knows the code. It is just an interesting idea.

Cine
Burner Inserter
Posts: 7
Joined: Sat Aug 19, 2017 7:53 am
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by Cine »

Those you would keep in a static field side by side, so that they take up one or two cache lines. It is very unlikely they would ever be evicted from the L1 cache. You trade direct memory access to any location in memory for a single extra static memory load plus some extra instructions. Plus, of course, your memory management becomes harder... however, having all entities of the same size allocated in the same memory pool could also be very beneficial to locality and lower the cost of allocation.
kovarex wrote:
Cine wrote:Another trick to cut down memory usage is to use a memory pool for each kind of entity (or just same sized objects) and then use a short/24 bit int/int depending on max number of entities as the index into the pool. This is especially useful on 64 bit architecture where a pointer is otherwise a whole 8 bytes.
But where do you find the pointer to the beginning of the array? It usually means, that you have one more memory indirection where you have to retrieve the pointer to the beginning of the array, and then the data itself.

mensinda
Burner Inserter
Posts: 5
Joined: Wed Jun 29, 2016 7:14 pm
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by mensinda »

kovarex wrote:
Rseding91 wrote:
pleegwat wrote:I don't know your code of course, and I'm hampered because my experience is C not C++. Depending on the list and the type of removals, I'd wonder if you really need to delete them or can just mark them inactive. If a new entity gets placed on the map (or one gets removed), then yes the list needs to be updated. If it's just an assembler or inserter going into an indefinite wait (or coming out of it), just marking it as inactive is probably enough.

I don't have numbers to back this up, but if you have an array of 100 entities and 80 of them are inactive, iterating past the inactive entities without dereferencing the entity is probably still just as fast as iterating over a 20-element array.
That approach would require dynamic memory allocation for each entity in the array to store the "is active" value. Otherwise you have to store it on the entity, which means you still need to dereference the entity as you iterate.

Either way, it's still slower than what we have now, and it doesn't handle the case of an entity being destroyed: that is O(N) + not able to happen during an update loop, as it would invalidate the array while iterating it.
Actually, one possibility could be to have a vector of ActiveEntity pointers and remove from it by switching with the last one. The entity would know the index of its position in the vector. This way you still have O(1) removal; instead of 2 pointers you have 1 index, and you only have to touch one entity when removing, not the 2 neighbours.
Have you considered using PLF Colony (http://plflib.org/colony.htm)? PLF Colony is an unordered container that basically has most benefits of std::vector (memory locality) and std::list (O(1) insert and remove).

I recommend the website and this video: https://www.youtube.com/watch?v=wBER1R8YyGY for more information.

From the website:
A colony is the highest-performance C++ template-based data container for high-modification scenarios with unordered data. Specifically it provides better performance than other std:: library containers when:

Insertion order is unimportant,
Insertions and erasures to the container are occurring frequently in realtime ie. in performance-critical code, and/or
Pointers which point to non-erased container elements must not be invalidated by insertion or erasure.
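The stored-index scheme kovarex describes above can be sketched like this (the `Entity`/`ActiveList` names are illustrative, not Factorio's actual classes):

```cpp
#include <cstddef>
#include <vector>

// Each active entity remembers its own position in the active vector,
// so removal is a single O(1) swap-with-last touching only one other entity.
struct Entity {
    std::size_t activeIndex = 0; // where this entity sits in the active vector
};

struct ActiveList {
    std::vector<Entity*> entities;

    void add(Entity* e) {
        e->activeIndex = entities.size();
        entities.push_back(e);
    }

    void remove(Entity* e) {
        Entity* last = entities.back();
        entities[e->activeIndex] = last;    // move the last entity into the hole
        last->activeIndex = e->activeIndex; // fix up its stored index
        entities.pop_back();
    }
}; 
```

Compared with a doubly linked list, removal updates one index on one entity instead of relinking two neighbours, and iteration stays over a contiguous vector.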

Cine
Burner Inserter
Posts: 7
Joined: Sat Aug 19, 2017 7:53 am
Contact:

Re: Friday Facts #204 - Another day, another optimisation

Post by Cine »

The ordering of entities is very likely to be important in code like this. Furthermore, changes in the linked list are likely to be rare, but insertions and deletions more common.

A better solution than a linked list in these scenarios is an array linked list, where the links are between arrays of elements rather than individual elements. Every once in a while you then merge arrays to skip a jump from one array to another. E.g. you drop a blueprint and, wham, you end up with one array with perfect locality for all entities in the structure.
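The "array linked list" idea could be sketched as a chunked list like the following (chunk size and names are arbitrary; real implementations such as plf::colony are considerably more sophisticated):

```cpp
#include <array>
#include <cstddef>
#include <memory>

// A minimal chunked list: fixed-size arrays of elements linked together.
// Iteration is contiguous within each chunk; only one pointer jump per chunk.
template <typename T, std::size_t ChunkSize = 64>
class ChunkedList {
    struct Chunk {
        std::array<T, ChunkSize> items;
        std::size_t count = 0;
        std::unique_ptr<Chunk> next;
    };
    std::unique_ptr<Chunk> head;
    Chunk* tail = nullptr;

public:
    void push_back(const T& value) {
        if (!head) {                 // lazily create the first chunk
            head = std::make_unique<Chunk>();
            tail = head.get();
        }
        if (tail->count == ChunkSize) { // tail full: link a fresh chunk
            tail->next = std::make_unique<Chunk>();
            tail = tail->next.get();
        }
        tail->items[tail->count++] = value;
    }

    template <typename F>
    void for_each(F f) {
        for (Chunk* c = head.get(); c; c = c->next.get())
            for (std::size_t i = 0; i < c->count; i++)
                f(c->items[i]);
    }
};
```

A merge step (not shown) would periodically copy two adjacent part-full chunks into one, recovering the "single array with perfect locality" case for a freshly placed blueprint.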

mensinda wrote:
kovarex wrote:
Rseding91 wrote:
pleegwat wrote:I don't know your code of course, and I'm hampered because my experience is C not C++. Depending on the list and the type of removals, I'd wonder if you really need to delete them or can just mark them inactive. If a new entity gets placed on the map (or one gets removed), then yes the list needs to be updated. If it's just an assembler or inserter going into an indefinite wait (or coming out of it), just marking it as inactive is probably enough.

I don't have numbers to back this up, but if you have an array of 100 entities and 80 of them are inactive, iterating past the inactive entities without dereferencing the entity is probably still just as fast as iterating over a 20-element array.
That approach would require dynamic memory allocation for each entity in the array to store the "is active" value. Otherwise you have to store it on the entity, which means you still need to dereference the entity as you iterate.

Either way, it's still slower than what we have now, and it doesn't handle the case of an entity being destroyed: that is O(N) + not able to happen during an update loop, as it would invalidate the array while iterating it.
Actually, one possibility could be to have a vector of ActiveEntity pointers and remove from it by switching with the last one. The entity would know the index of its position in the vector. This way you still have O(1) removal; instead of 2 pointers you have 1 index, and you only have to touch one entity when removing, not the 2 neighbours.
Have you considered using PLF Colony (http://plflib.org/colony.htm)? PLF Colony is an unordered container that basically has most benefits of std::vector (memory locality) and std::list (O(1) insert and remove).

I recommend the website and this video: https://www.youtube.com/watch?v=wBER1R8YyGY for more information.

From the website:
A colony is the highest-performance C++ template-based data container for high-modification scenarios with unordered data. Specifically it provides better performance than other std:: library containers when:

Insertion order is unimportant,
Insertions and erasures to the container are occurring frequently in realtime ie. in performance-critical code, and/or
Pointers which point to non-erased container elements must not be invalidated by insertion or erasure.
