Friday Facts #176 - Belts optimization for 0.15

Post by **Harkonnen** » Wed Feb 08, 2017 10:40 pm

PacifyerGrey wrote:By the way. I was thinking of belt optimizations and multithreading and came up with an idea and I want to share it with you.

I was thinking about segmentation of belts, rails etc. And I read about your division of the map into chunks and tries to process them separately with some sync procedures.
But what if the whole factory would be divided into segments based on item adjacency to each other? Most of factories consist of multiple bases connected to each other by trains/electricity which gives an opportunity to process these bases in paralel. The only issues here are electricity, bots, circuit networks.

Electricity is not a problem as it is already mostly a global thing reflecting a total number of accumulators and energy generators.
So when any object is placed on the map you can check if it has some adjacent ones to group it up with them and assign a segment number to it. Then every segment can be processed in a separate thread.

So while there might be some big central base, removing all outposts and defences into separate threads will most likely remove the bottleneck anyways.

I know this is not a complete scheme but just and idea for you to think about.

Such method would have drawback in granularity. When you have 5000 chunks to process by 8 threads - almost all the time all threads are busy. If instead you have 20-30 pieces of different size with unpredictable update time - in the end of the update only one thread will be working with others having no more jobs for them.

ratchetfreak · Post by **ratchetfreak** » Thu Feb 09, 2017 12:07 am

And segmenting a large base is not an easy task.

In fact it's the min k cut problem which is NP-complete for an optimal solution

TheYeast · Post by **TheYeast** » Fri Feb 10, 2017 10:00 am

Harkonnen wrote:Actually inserting virtual item instead of gap is an interesting idea. It will complicate things though when dropping items, but probably not much. I think I'll try that.

Cool, I may have contributed to Factorio development. I would love to hear if this made a difference!

Thinking about belt optimization made me wonder whether belt segments get partially/best-effort sorted such that downstream segments get processed before upstream segments, to make room for the incoming items. Otherwise i'd expect items on a long compressed belt to move really slow if segments are processed in the wrong (worst case: opposite) order. Physically sorting them in memory might also give you even more sequential ram access.

Such a sorting would also identify disconnected belt layouts eligible for parallel processing. But indeed the main base might very well be much larger than all outposts combined. Dividing it and synchronizing the edges sounds like a difficult job, but better if it can be done.

Post by **Harkonnen** » Fri Feb 10, 2017 2:19 pm

TheYeast wrote:Cool, I may have contributed to Factorio development. I would love to hear if this made a difference!

Yeah, I know that feeling

community helps a lot. There are many random things going on on forums but every 100th post has something in it that we can use. So keep it up

TheYeast wrote:Thinking about belt optimization made me wonder whether belt segments get partially/best-effort sorted such that downstream segments get processed before upstream segments, to make room for the incoming items

That's exactly how it works

Before belt segment is updated, belt piece that it leads to is updated first recursively if it was not already updated this tick, with splitters both endpoints get update first. Limitation of that recursion is 1000 (that limit appeared after someone got stack overflow with that long belt sequence full of items). And that is actually the thing that will be broken if belts get per-chunk-threading because that recursion will have to stop at chunk boundary (32 tiles) making those sporadic decompressions frequent enough to be noticed.

TheYeast · Post by **TheYeast** » Fri Feb 10, 2017 5:59 pm

Harkonnen wrote:There are many random things going on on forums but every 100th post has something in it that we can use. So keep it up

If you think it is worth sifting through the other 99 posts, then I will

Harkonnen wrote:That's exactly how it works Before belt segment is updated, belt piece that it leads to is updated first recursively if it was not already updated this tick, with splitters both endpoints get update first. Limitation of that recursion is 1000 (that limit appeared after someone got stack overflow with that long belt sequence full of items).

Hmm but if I read this correctly you seem to say the recursion is performed at every tick, and segments are visited multiple times only to find they were already updated.

But it seems to me the recursion always yields the same order. If the recursion could be performed only once (until a player edits the factory) and the belt segments are stored in the order they should be processed
then you can simply iterate over the segment list once every tick. Less stack usage during a tick might save some memory bandwidth. And if you can actually copy the segment state into a struct array (instead of a list of pointers) you get that sweet sequential ram access as well. I can't judge if it is worth the effort though..

Anyway, I was contemplating the possibility to get natural belt behaviour without any sorting or recursion. It made my head hurt, but I wonder if this could work:

Process all segments in an arbitrary order. All segments know the order in which they are processed through an order index.
If a belt segment is processed before the segment it leads into, and there is no space to deliver anything, it is allowed to do so anyway and put the receiving segment into a pressurised state. When that segment gets processed later in the tick it either de-pressurises if it can get rid of stuff, or stays pressurised until it can. Once pressurised a segment cannot be pressurised more.

What would happen? If segments are processed in order from origin to destination (normally worst case), instead of a gap travelling backwards one segment/tick, a pressure wave travels forwards at full speed and dissolves naturally if an item can be removed.

If a large stretch of belt is blocked, a pressure will build up in out-of-order segments, putting some more pixels worth of items on the belt than normal. When the belt unblocks it de-pressurises first before items at the origin start moving again.

Alternatively all segments could be allowed to pressurise (so also when their processing order is normal). This would make visual artefacts in a blocked line consistent, but the rate at which the artefacts accumulate somewhat inconsistent.

In my mental simulation it works pretty sweet. I'm not saying this is better than recursion/sorting, but I wanted to share the thought anyway

Post by **Harkonnen** » Sat Feb 11, 2017 12:36 am

TheYeast
All transport lines have only one exit. It can fork only with splitter. Some lines though may have several inputs (due to side-loading). So almost always it's a linear case, but yeah - there is first a recursion wave to reach the last active transport line, and then it all unwinds to do the actual movement. Part of that belt optimization is to keep transport-line-manager, so it goes away from active entities (currently I hacked it by attaching whole line to leading belt piece, but it was just to get working concept sooner). With that transport-line manager - yeah, it will control the order, recursion will be replaced with a loop, and - YES, I am thinking every night about reogranizing belt arrays to resemble that order of processing transport line updates so that ram access is linear when sequential transport lines get updated.

chrisgbk · Post by **chrisgbk** » Sat Feb 11, 2017 11:18 am

Commander Gizmo wrote:I'm sure this has already been covered somewhere, but you got me interested. A quick google search had me reading some interesting papers. Forgive me if my interpretation is wrong, as it's been a lot of years since I renewed my CCNA.

According to the IETF RFC 6935 and RFC 2460, IPv6 requires that a zero value UDP checksum is invalid and the packet must be dropped. Therefore, to send such a packet, you must use a checksum of 0xFFFF, which indicates "checksum not calculated", or zero. Thus, some equipment apparently erroneously views 0xFFFF = 0x0000 and drops the packet, such as an example here, and one here. Basically, it appears to be a somewhat widespread problem in developers not considering the edge case where a checksum of 0xFFFF will need special handling to be distinguished from 0x0000 when doing binary comparisons.

In IPv4, setting the UDP checksum field to 0x0000 indicates that the checksum is not calculated, causing nodes to pass on data that only the end receiver verifies. IPv6 mandates that all UDP packets must have checksums calculated, so 0x0000 is never valid. (Except "Applicability Statement for the Use of IPv6 UDP Datagrams with Zero Checksums" [RFC6936]. This is used for ie: tunneling, where inspection of every UDP packet on all the hops causes higher load than if the validation is left to only the receiver to unbox the tunneled data.)

The stored value in the header,( ie: checksum result) is the bitwise compliment of the 16 bit 1's compliment sum of certain fields. Because the sum of 1's compliment numbers can only be 0x0000 (ie: +0) if every number is 0, and a UDP packet can never be all 0, the result 0xFFFF(ie: ~(+0)) is not possible; only 0x0000 (ie: ~(-0)) is possible, because by adding you can only get 0xFFFF (ie: -0).

Where the checksum result comes out to 0x0000 naturally, it is transformed to 0xFFFF because of that conflict with the meaning of 0x0000; 0xFFFF does NOT mean that the checksum is not calculated. This transformation is the same in IPv4 as it is in IPv6. Some vendors, however, decided to implement 0xFFFF as a code for diagnostic packets only, perhaps due to lax adherence to standards. Some others, I'm sure, forgot the transformation step, and when calculating the checksum don't transform the result 0x0000 to 0xFFFF on send, or from 0xFFFF to 0x000 on receive(see below for RFC 1142/1624). This implies that roughly 0.0015% of valid random packets get dropped as invalid by only certain devices on the network.

RFC 768: UDP IPv4 wrote: If the computed checksum is zero, it is transmitted as all ones (the
equivalent in one's complement arithmetic). An all zero transmitted
checksum value means that the transmitter generated no checksum (for
debugging or for higher level protocols that don't care).

RFC 2460: UDP IPv6 wrote: Unlike IPv4, when UDP packets are originated by an IPv6 node,
the UDP checksum is not optional. That is, whenever
originating a UDP packet, an IPv6 node must compute a UDP
checksum over the packet and the pseudo-header, and, if that
computation yields a result of zero, it must be changed to hex
FFFF for placement in the UDP header.

It's important to note that UDP packets deviate from other IP packets in this regard; in other IP packets, 0xFFFF is not usually considered valid (RFC 1624), and 0x0000 has no special meaning so is a valid checksum. Most of your other links are to discussions on IP in general or TCP/IP. This is one of the reasons why RFC 1624 has a discussion on this topic of positive/negative zeroes, and the avoiding of comparing checksums directly. As well, an earlier RFC, 1141, included a technical issue regarding positive/negative zeroes due to the nature of 1's compliment arithmetic.

RFC 1624 wrote: Although this equation appears to work, there are boundary conditions
under which it produces a result which differs from the one obtained
by checksum computation from scratch. This is due to the way zero is
handled in one's complement arithmetic.

In one's complement, there are two representations of zero: the all
zero and the all one bit values, often referred to as +0 and -0.
One's complement addition of non-zero inputs can produce -0 as a
result, but never +0. Since there is guaranteed to be at least one
non-zero field in the IP header, and the checksum field in the
protocol header is the complement of the sum, the checksum field can
never contain ~(+0), which is -0 (0xFFFF). It can, however, contain
~(-0), which is +0 (0x0000).

RFC 1141 yields an updated header checksum of -0 when it should be
+0. This is because it assumed that one's complement has a
distributive property, which does not hold when the result is 0 (see
derivation of [Eqn. 2]).

...

If an end system verifies the checksum by including the checksum
field itself in the one's complement sum and then comparing the
result against -0, as recommended by RFC 1071, it does not matter if
an intermediate system generated a -0 instead of +0 due to the RFC
1141 property described here. In the example above:

0xCD7A + 0x3285 + 0xFFFF = 0xFFFF
0xCD7A + 0x3285 + 0x0000 = 0xFFFF

However, implementations exist which verify the checksum by computing
it and comparing against the header checksum field.

It is recommended that intermediate systems compute incremental
checksum using the method described in this document, and end systems
verify checksum as per the method described in RFC 1071.

Knowing what we know now, the standard -should- have been designed so that 0xFFFF was never a valid checksum, the error in RFC 1141 should not have been made, and UDP should have been designed so that 0xFFFF signalled that the checksum should not be computed by intermediate nodes instead of 0x0000, but hindsight is 20/20. The creators obviously envisaged an implementation like that presented in RFC 1071 where 0x0000 and 0xFFFF are identical from a checksumming standpoint, and didn't expect a comparison of checksums directly.

Expressed as pseudo-code:

Code: Select all

Correct non-RFC 1071 UDP checksum

checksumA = packet.checksum
if checksumA == 0x0000 then process(packet)
if checksumA == 0xFFFF then checksumA = 0x0000
packet.checksum = 0x0000
checksumB = calculate_checksum(packet)
if checksumA == checksumB then process(packet) else drop(packet)


Correct RFC 1071 UDP checksum

checksum = packet.checksum
if checksum == 0x0000 then process(packet)
checksum = calculate_checksum(packet)
if checksum == 0xFFFF then process(packet) else drop(packet)


What actually happens on some non-RFC 1071 systems

checksumA = packet.checksum
if checksumA == 0x0000 then process(packet)
packet.checksum = 0x0000
checksumB = calculate_checksum(packet)
if checksumA == checksumB then process(packet) else drop(packet)

The last one fails in two cases: one, UDP packets which have a final checksum result of 0x0000(ie: ~(-0)) that was transformed to 0xFFFF, and two, packets which are generated with the strange checksum result of 0xFFFF(ie: ~(+0)) from RFC 1141()includes non-UDP packets). This unfortunately exists in production code/devices.

tl;dr:

Systems that do not adhere to the RFC 1071 method of checksum verification run the risk of compatibility issues with RFC 1141 systems which haven't switched to RFC 1624, and systems that do not properly follow RFC 768/2460 for UDP IPv4 and IPv6 respectively will drop certain UDP packets in error. Systems which DO follow RFC 1071 will be unaffected, because that method of checksum verification is equivalent in the case of 0x0000 AND 0xFFFF.

Zyrconia · Post by **Zyrconia** » Wed Feb 15, 2017 2:05 pm

Any chance of getting this ported to 0.14?

It will still take some time until 0.15 get's released and months after that until all the mods get updated (especially the one that touch science like bobs and the uranium power ones), so some of us will be playing 0.14 for quite some time after 0.15 release.

Post by **Rseding91** » Wed Feb 15, 2017 2:16 pm

Zyrconia wrote:Any chance of getting this ported to 0.14?

It will still take some time until 0.15 get's released and months after that until all the mods get updated (especially the one that touch science like bobs and the uranium power ones), so some of us will be playing 0.14 for quite some time after 0.15 release.

No.

Zyrconia · Post by **Zyrconia** » Thu Feb 16, 2017 11:12 am

Rseding91 wrote:
Zyrconia wrote:Any chance of getting this ported to 0.14?

It will still take some time until 0.15 get's released and months after that until all the mods get updated (especially the one that touch science like bobs and the uranium power ones), so some of us will be playing 0.14 for quite some time after 0.15 release.
No.

OK, thanks for the answer.

Recon777 · Post by **Recon777** » Thu Feb 23, 2017 8:23 am

What about pipes? Do underground pipes have a better performance than a chain of above-ground pipes? I heard that the game considers a long underground pipe as a single entity and the liquid throughput is dependent on how many pipe entities there are.

Post by **Rseding91** » Thu Feb 23, 2017 8:39 am

Recon777 wrote:What about pipes? Do underground pipes have a better performance than a chain of above-ground pipes? I heard that the game considers a long underground pipe as a single entity and the liquid throughput is dependent on how many pipe entities there are.

Yes.

bobingabout · Post by **bobingabout** » Thu Feb 23, 2017 10:42 am

From what I understand via modding... Each "underground pipe" entity (each end piece, therefore 2 pipes for an underground pipe segment) has one normal pipe connection, and 1 special "underground" pipe connection. The underground pipe therefore basically functions as if you had 2 pipes next to each other, except there can be up to a 9 tile gap between the two, even more if you mod them for greater distances.

Moonheart08 · Post by **Moonheart08** » Thu Feb 23, 2017 3:59 pm

One way they could possibly make pipes faster is handle them how Space Station 13 handles ITS pipes, it first combines straight lines of pipes and treats them as one to save processing time, then, it keeps track of and calculates all the lines together using a network of lines (all the lines that connect to eachother join into a pipenet)

bobingabout · Post by **bobingabout** » Fri Feb 24, 2017 9:13 am

One of the problems with treating a straight line as a single piece of pipe though, is that then makes it impossible to simulate a flow over that length.

Post by **Rseding91** » Fri Feb 24, 2017 9:22 am

bobingabout wrote:One of the problems with treating a straight line as a single piece of pipe though, is that then makes it impossible to simulate a flow over that length.

That's easy. You just take the total volume of the pipe and the amount in the pipe and with a little math that's your flow rate.

Recon777 · Post by **Recon777** » Mon Feb 27, 2017 12:47 am

bobingabout wrote:From what I understand via modding... Each "underground pipe" entity (each end piece, therefore 2 pipes for an underground pipe segment) has one normal pipe connection, and 1 special "underground" pipe connection. The underground pipe therefore basically functions as if you had 2 pipes next to each other, except there can be up to a 9 tile gap between the two, even more if you mod them for greater distances.

Yeah, Tungsten pipes are epic! Many thanks.

fregate84 · Post by **fregate84** » Mon May 29, 2017 8:29 am

Any news about belt optimisation ?

Because my (big) vanilla factory run at about ~25 UPS. So i'm very interesting by this upgrade. Any chance to have it on minor release of 15.x ? Do you still work on it ?

Post by **Harkonnen** » Mon May 29, 2017 8:54 am

We plan to include it in 0.16 which shouldn't be far from 0.15. Optimization works themselves are finished, we just didn't want to get many ambiguous desyncs if there are any because 0.15 was a big update on its own. As an experiment I may try your save with that branch to see how much it improves things.

mrvn · Post by **mrvn** » Mon May 29, 2017 10:04 am

posila wrote:
Tankh wrote:
whenever a belt compresses - it will stay that way forever. It means that once two items are stuck close together - away from inserters they will stay stuck close forever. This property allows us to cache the index of the last positive gap location, and update it on the fly because that index can never increase, only decrease.
I'm guessing you also tested this for items going from a slow belt to fast belt. It's probably treated like a separate belt section.
That is correct, only belts of the same type are merged to the same section.

But do they have to be split? I think now you measure the distance of items in tiles, e.g. 0.1 tiles separation. Why not measure the distance in time, as in 0.1s from the first item passing to the second item for a stationary observer.

This has 2 implications:

1) inserters know how long to sleep till the next item passes irespective of the type of belt.
2) the time distance doesn't change between items when the belt changes.

One problem though: Different belt types have different minimum time separation. Switching to a slower belt the items might have to be spaced out more. You can't cram a full red belt onto a yellow belt without a traffic jam. But even if you don't handle the change in minimum time separation special in the middle of a segment you could still join faster belt segments after slower belt segments.

The other think that comes to mind is that segments could be harder to render. At least for the case of a segment only being half visible. But factorio is dynamic. Could belt segments be split and joined fast enough to adjust them to the visible area as you move around or zoom? It seems like a cheap operation. Splitting and joining segments doesn't change the distance between items, only the item(s) at the split itself are affected at all. Unless every item has a pointer to the segment it is on. Then it is a O(n) operation for sure.

Factorio Forums

Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15

Re: Friday Facts #176 - Belts optimization for 0.15