Friday Facts #176 - Belts optimization for 0.15

Regular reports on Factorio development.
ratchetfreak
Filter Inserter
Filter Inserter
Posts: 952
Joined: Sat May 23, 2015 12:10 pm
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by ratchetfreak »

TheYeast wrote:
ratchetfreak wrote:It sounds like you could add a few "detection regions" [...] This would let circuit detection not influence the segment sizes.
I don't know whether others do but I rarely use circuit detection. If my factory uses 10 at a time it is a lot. But even a few hundred extra belt segments due to circuit detection would hardly be noticeable in performance as far as I can tell... I wouldn't go there unless profiling proves this to be a problem.
This is still cheaper than splitting up the segment into 3 (pre, post and the 9 belt segment which contains the connection).

Active connections that could stop the belt will require splitting the segment.
sillyfly
Smart Inserter
Smart Inserter
Posts: 1101
Joined: Sun May 04, 2014 11:29 am
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by sillyfly »

bobingabout wrote:Forgive me if I'm not up to speed with all this, but from what I read about the IPv6 there... that is total bullshit. Not what you're saying, but the fact that 0xFFFF == 0x0000.

You can't write a spec that says "0xFFFF hasn't been checked, and 0x0000 is invalid and should be dropped" without saying "if a genuine checksum results in 0xFFFF or 0x0000 you need to do this"


UDP checksum calculations use ones-complement arithmetic, and in it (assuming you use 16 bit numbers) 0x0000 does equal 0xFFFF (meaning +0 = -0). This is precisely why the UDP specification (in RFC 768) states that if the calculated value of the checksum field is 0x0000 it should be replaced with 0xFFFF. Presumably, the bug is that some routers ignore this specification, and pass 0x0000 as 0x0000.
Daid
Fast Inserter
Fast Inserter
Posts: 200
Joined: Sun Jul 03, 2016 7:42 am
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by Daid »

sillyfly wrote:
bobingabout wrote:Forgive me if I'm not up to speed with all this, but from what I read about the IPv6 there... that is total bullshit. Not what you're saying, but the fact that 0xFFFF == 0x0000.

You can't write a spec that says "0xFFFF hasn't been checked, and 0x0000 is invalid and should be dropped" without saying "if a genuine checksum results in 0xFFFF or 0x0000 you need to do this"


UDP checksum calculations use ones-complement arithmetic, and in it (assuming you use 16 bit numbers) 0x0000 does equal 0xFFFF (meaning +0 = -0). This is precisely why the UDP specification (in RFC 768) states that if the calculated value of the checksum field is 0x0000 it should be replaced with 0xFFFF. Presumably, the bug is that some routers ignore this specification, and pass 0x0000 as 0x0000.
To add to the fun, some network cards have build in CRC calculations so the software doesn't have to do this (not that it is CPU intensive these days). Then sniffers can also report "wrong" values, as they never see the correct value of send packets. Maybe some of these offloaded calculators have bugs as well :-)
Harkonnen
Fast Inserter
Fast Inserter
Posts: 207
Joined: Fri Sep 02, 2016 9:23 am
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by Harkonnen »

Moving the belt under items is essentially same as adjusting initial/final gap sizes - thing with those yellow lines.
For circuits - when it's just reading counts - yes, it can operate on part of the belt. It's just that you have to iterate all previous items to get to that start offset that makes you wanting to keep transport sequence under circuit reader short (not 1 segment, but probably 3-9 segments). As for stopped belts due to circuit connection or marking for deconstruction - these must be detached from the rest for sure just because they have different speed (zero speed).
User avatar
Gergely
Filter Inserter
Filter Inserter
Posts: 616
Joined: Sun Apr 10, 2016 8:31 pm
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by Gergely »

Cyberboss_JHCB wrote:Will underground belts still be any faster than regular belts, even if it's just miniscule?
Let's hope not.
ratchetfreak
Filter Inserter
Filter Inserter
Posts: 952
Joined: Sat May 23, 2015 12:10 pm
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by ratchetfreak »

Harkonnen wrote:Moving the belt under items is essentially same as adjusting initial/final gap sizes - thing with those yellow lines.
For circuits - when it's just reading counts - yes, it can operate on part of the belt. It's just that you have to iterate all previous items to get to that start offset that makes you wanting to keep transport sequence under circuit reader short (not 1 segment, but probably 3-9 segments). As for stopped belts due to circuit connection or marking for deconstruction - these must be detached from the rest for sure just because they have different speed (zero speed).
But once you have that start and end offset you can maintain and update that offset frame to frame and the code I posted to do that has build-in pulse detection.

Creating the segments seems to have a major slowdown potential when bots are constructing a bunch of belts and segments are getting created and appended. Or when a large tangle of belts get marked for deconstruction.
Harkonnen
Fast Inserter
Fast Inserter
Posts: 207
Joined: Fri Sep 02, 2016 9:23 am
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by Harkonnen »

On that offset yes - I have logics like this for inserters already, so it may be applied to circuit networks as well. As for slowdowns - I don't think this will ever be an issue. Whenever something slow happens like once in 5 or 10 ticks, it's hardly noticeable. Also automatic merging will not be analyzed every tick for ever transport line, it will be kinda spread out, so within a minute of gameplay all active belts decide if they want to merge/unmerge on their distinct ticks.
ratchetfreak
Filter Inserter
Filter Inserter
Posts: 952
Joined: Sat May 23, 2015 12:10 pm
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by ratchetfreak »

Harkonnen wrote:On that offset yes - I have logics like this for inserters already, so it may be applied to circuit networks as well. As for slowdowns - I don't think this will ever be an issue. Whenever something slow happens like once in 5 or 10 ticks, it's hardly noticeable. Also automatic merging will not be analyzed every tick for ever transport line, it will be kinda spread out, so within a minute of gameplay all active belts decide if they want to merge/unmerge on their distinct ticks.
Will different peers be able to have different segment distribution or must they be synced as well?

It should be possible to maintain the same determinism with different boundaries (if that bug Zeblote found is fixed)
arbarbonif
Fast Inserter
Fast Inserter
Posts: 110
Joined: Fri Jul 01, 2016 2:46 am
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by arbarbonif »

GuiltyBystander wrote:I'm excited for the new belt optimization changes, however, I have one question.
With 0.15 you will never ever have to build underground belts for the sake of performance :) Factorio's heart is beating nicely now and will not distract from blueprinting the truly important things.
Currently, if you want max compression coming out of any high output line of machines, you need to have your inserters outputting onto a underground belt because inserters can't put items into the tiny gaps that sometimes exist between items. Will this new code allow inserters to create fully compressed line without the undeground belt tricks?
Hmm, if I'm understanding the new way correctly, this should be easy to fix now. Basically if an inserter is poised to insert and sees a gap, of any size, it places and then cascades up the belt, removing/shortening gaps as it goes until the full space for the item is freed. Since items are all the same size and gaps have a minimum size, there would be a fixed number of checks that would need to be made. So the inserter would compress the belt for you. If there was somehow not enough space, it would remove all the gaps and then abort the placement (leaving just the less than full item gap). So the inserter (or act of inserting) would end up handling the belt compression.
Harkonnen
Fast Inserter
Fast Inserter
Posts: 207
Joined: Fri Sep 02, 2016 9:23 am
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by Harkonnen »

ratchetfreak wrote:
Harkonnen wrote:On that offset yes - I have logics like this for inserters already, so it may be applied to circuit networks as well. As for slowdowns - I don't think this will ever be an issue. Whenever something slow happens like once in 5 or 10 ticks, it's hardly noticeable. Also automatic merging will not be analyzed every tick for ever transport line, it will be kinda spread out, so within a minute of gameplay all active belts decide if they want to merge/unmerge on their distinct ticks.
Will different peers be able to have different segment distribution or must they be synced as well?

It should be possible to maintain the same determinism with different boundaries (if that bug Zeblote found is fixed)
While I would like to have transport lines not affecting determinism, and item movement is coded that way for as much as possible (that bug Zeblote found is really a bug, it's not intended side-effect of new implementation), still there are limitations like number of recursion steps during updating streaks of transport lines, so that may break determinism anyway if lines are merging at random. So every savegame will have that distribution saved and rules regarding merging/unmerging sequences as the game proceeds will be deterministic.
eidolonFIRE_XI
Inserter
Inserter
Posts: 21
Joined: Wed Mar 04, 2015 9:38 pm
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by eidolonFIRE_XI »

About the UDP ISSUE.


https://ask.wireshark.org/questions/139 ... m-be-zero0

UDP has a special case where 0x0000 is reserved for "no checksum computed". Thus for UDP, 0x0000 is illegal and when calculated following the standard algorithm, replaced with 0xffff.
User avatar
TheYeast
Inserter
Inserter
Posts: 35
Joined: Sun Feb 05, 2017 3:57 pm
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by TheYeast »

Let me clarify the belt algorithm I posted earlier. Since no one commented I may have failed to explain it clearly.

We would like to retain the ability to quickly check a specific region of a belt segment for items. If we can store the position of an item, we can apply a binary search on the item buffer for those positions.

The items on a belt can be separated into two ranges: the freely moving items (range0), and the blocked items at the end of the belt, which move slower or not at all (range1). When freely moving items collide with the blocked items, they migrate from range0 to range1.

Keeping track of these ranges is easy. We let an index into the item buffer point to the last item in range0. When items collide into range1, this index is decremented to reflect the change.

For the purpose of storing item positions, note that the positions of items in either range, relative to each other, never change. Or put differently: when the absolute positions change, they all change by exactly the same amount. Therefore, for each item we can split the absolute position into a constant part xRel and a variable part shift. Each range has its own shift, but the shift is shared between all items in the range.

Item absolute positions can be converted to relative and visa versa using:

Code: Select all

For range0:

item[i].xRel = xAbs         - belt.shift0;
xAbs         = item[i].xRel + belt.shift0;

For range1:

item[i].xRel = xAbs         - belt.shift1;
xAbs         = item[i].xRel + belt.shift1;
When an item migrates from range0 to range1, its xRel is recalculated using

Code: Select all

item[i].xRel = item[i].xRel + belt.shift0 - belt.shift1;
Every tick belt.shift0 is incremented, while belt.shift1 is only incremented when an item can leave the belt.

When belt.shift1 cannot be incremented, we need to check for item collision. The collision check is cheap, but can be further optimized by caching the gap size between range0 and range1 into a belt.gap variable. Only when this reaches 0 migration needs to be performed. Migration is also cheap. The index to the last item in range0 is decreased, and item[migrating].xRel is adjusted by (belt.shift0 - belt.shift1)

When the belt needs to be checked for items on position xAbs, range0 can be binary searched for a value xRel equal to (xAbs - belt.shift0) and range1 can be binary searched for a value xRel equal to (xAbs - belt.shift1). This is the advantage over storing distances between items. Two binary searches can be used to find an item instead of having to iterate and sum distances. This should allow for much larger belt segments.

Also the code should be easier to read, understand, maintain and reason about.

But I suppose I might be overlooking something, otherwise the Factorio devs would probably have taken this approach. If anyone can tell me what I'm missing I would be grateful.
Harkonnen
Fast Inserter
Fast Inserter
Posts: 207
Joined: Fri Sep 02, 2016 9:23 am
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by Harkonnen »

TheYeast
A very good algo indeed! :) I was thinking into this thing with two ranges, but I was thinking of it in terms of range0 movable and range1 stationary which gave a problem that each time blocked belt starts moving due to last item leaving to next belt, I would have to iterate every single item of stationary range1 to migrate them to movable range0. Allowing both range0 and range1 to move with distinct shifts solves that problem - that didn't come to my head. However there is still a case where my algorithm beats yours :) That's the case when inserter picks up an item from range1, so the gap appears. With my algo that gap can be updated at O(1), with your algo I will have to iterate all the remainder of range1 to migrate those items to range0. Yes - with my algo I will still have to iterate items to get that item locked by inserter, but it happens only once when inserter reaches the belt, afterwards that item is kept tracked via which part of items array has moved last tick and for what delta disstance which allows to update both locked item index and absolute position. Another thing to notice - due to all that cache stuff binary search would start giving benefit at like 1000 items which is like 250 belt segments which is rare case, especially under inserters. I actually was surprised that when I merged ALL belts in giga-factories and inserter item locking was not yet implemented (so inserter scanned entire line with every tick) - their time in profiler upped from like 9ms to like 11ms - a lot less than I expected. Forward scans of arrays are very fast today.
ratchetfreak
Filter Inserter
Filter Inserter
Posts: 952
Joined: Sat May 23, 2015 12:10 pm
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by ratchetfreak »

Are the segments implemented as a circular buffer?

Because if not then I can see the constant shifts of items along the vector being a problem when items get pushed off/get inserted at the end for saturated express belts.

A segment can only hold so many items so that's immediately an upperbound for the circular buffer. And most of the logic remains mostly the same except for wrapping the index.
User avatar
TheYeast
Inserter
Inserter
Posts: 35
Joined: Sun Feb 05, 2017 3:57 pm
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by TheYeast »

Thank you for the reply Harkonnen! Yes, I figured we both had to do exactly one iteration for removing an item. You to find it, me to migrate items back to range0. I don't see why my iteration is more expensive than yours but I'll take your word for it :)

But then again, I assumed that removing an item from the buffer would mean all item data before (or after) the removed index would have to be shifted into the now invalidated buffer entry, so iteration would happen anyway.

With this shifting of buffer entries in mind my first instinct would be not to merge belts under inserters at all, unless proven a bottleneck that can be remedied by merging.

Or does the item buffer allow zero-width 'space' entries to replace the removed item, as to avoid having to shift other entries into the gap?
ratchetfreak
Filter Inserter
Filter Inserter
Posts: 952
Joined: Sat May 23, 2015 12:10 pm
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by ratchetfreak »

TheYeast wrote:Thank you for the reply Harkonnen! Yes, I figured we both had to do exactly one iteration for removing an item. You to find it, me to migrate items back to range0. I don't see why my iteration is more expensive than yours but I'll take your word for it :)

But then again, I assumed that removing an item from the buffer would mean all item data before (or after) the removed index would have to be shifted into the now invalidated buffer entry, so iteration would happen anyway.

With this shifting of buffer entries in mind my first instinct would be not to merge belts under inserters at all, unless proven a bottleneck that can be remedied by merging.

Or does the item buffer allow zero-width 'space' entries to replace the removed item, as to avoid having to shift other entries into the gap?
is probably more overhead in moving the item from belt to belt every few frames than there is to iterate over a few dozen items to shift them over once every second.
User avatar
TheYeast
Inserter
Inserter
Posts: 35
Joined: Sun Feb 05, 2017 3:57 pm
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by TheYeast »

ratchetfreak wrote:Are the segments implemented as a circular buffer?
Yup, he said so earlier on this thread :)
Harkonnen wrote:The new implementation has linear ring-buffer.
viewtopic.php?f=38&t=40790&start=40#p241268
Harkonnen
Fast Inserter
Fast Inserter
Posts: 207
Joined: Fri Sep 02, 2016 9:23 am
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by Harkonnen »

Yes, it's a circular buffer which dynamically grows as needed. std::deque splits everything in packs of like 16 bytes to maintain iterator persistence, so it was not an option (and we don't need iterator persistence here anyway). boost circular_buffer was avoided due to its fixed capacity which would be needed to deal with and control and make reallocations not so predictable. also it overwrites items through assignment operator which was not suitable for us due to technical reasons, so I made my own simple one which loops as classic fifo and grows x2 when tail reaches head and shuffles everything via memcpy/memmove to avoid extra overheads of c++ move semantics.

As for iterating after removal by inserter - yes, such problem exists. Actually inserting virtual item instead of gap is an interesting idea. It will complicate things though when dropping items, but probably not much. I think I'll try that.

As for "not merging belts under inserters at all" - we thought about such option, but actually rethrowing item from one line to another is much bigger performance hog than scanning several belts for inserter to find an item, all due to sequential ram access.
orzelek
Smart Inserter
Smart Inserter
Posts: 3923
Joined: Fri Apr 03, 2015 10:20 am
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by orzelek »

Will it get more interesting when stack inserter rolls over the belt and collects 3-4 items from one line and another 3 from second line? ;)
Engimage
Smart Inserter
Smart Inserter
Posts: 1069
Joined: Wed Jun 29, 2016 10:02 am
Contact:

Re: Friday Facts #176 - Belts optimization for 0.15

Post by Engimage »

By the way. I was thinking of belt optimizations and multithreading and came up with an idea and I want to share it with you.

I was thinking about segmentation of belts, rails etc. And I read about your division of the map into chunks and tries to process them separately with some sync procedures.
But what if the whole factory would be divided into segments based on item adjacency to each other? Most of factories consist of multiple bases connected to each other by trains/electricity which gives an opportunity to process these bases in paralel. The only issues here are electricity, bots, circuit networks.

Electricity is not a problem as it is already mostly a global thing reflecting a total number of accumulators and energy generators.
So when any object is placed on the map you can check if it has some adjacent ones to group it up with them and assign a segment number to it. Then every segment can be processed in a separate thread.

So while there might be some big central base, removing all outposts and defences into separate threads will most likely remove the bottleneck anyways.

I know this is not a complete scheme but just and idea for you to think about.
Post Reply

Return to “News”