It's mostly a question of when the limit is reached. If 99% of players never have to think about designing for performance, then it's fine. For the rest... well, if you make Factorio 10x faster, they will build a 10x bigger factory or load 10x as demanding mods.Nican wrote:Unfortunately, I feel like this game is always going to be CPU-bound, and there will always be strategies for players to reduce CPU.Hope not. I don't want performance issues to be governing how factories look.
Friday Facts #176 - Belts optimization for 0.15
-
- Long Handed Inserter
- Posts: 58
- Joined: Sun Jun 08, 2014 4:00 am
- Contact:
Re: Friday Facts #176 - Belts optimization for 0.15
Re: Friday Facts #176 - Belts optimization for 0.15
The game will always be CPU/RAM bound - that's how programs work. If you ever manage to find a program that isn't hardware bound .. I uhh... I'm not sure - I guess you've won the world?Nican wrote:Unfortunately, I feel like this game is always going to be CPU-bound, and there will always be strategies for players to reduce CPU.Hope not. I don't want performance issues to be governing how factories look.
If you want to get ahold of me I'm almost always on Discord.
Re: Friday Facts #176 - Belts optimization for 0.15
Lol. But thats not what he meant.Rseding91 wrote:The game will always be CPU/RAM bound - that's how programs work. If you ever manage to find a program that isn't hardware bound .. I uhh... I'm not sure - I guess you've won the world?Nican wrote:Unfortunately, I feel like this game is always going to be CPU-bound, and there will always be strategies for players to reduce CPU.Hope not. I don't want performance issues to be governing how factories look.
Re: Friday Facts #176 - Belts optimization for 0.15
There's plenty, e.g. programs bound by the UI, say, minesweeper, where there is negative marginal benefit for the user beyond certain program state sizes well before reaching hardware bounds. I expect the world to be shipped to me asap.Rseding91 wrote:The game will always be CPU/RAM bound - that's how programs work. If you ever manage to find a program that isn't hardware bound .. I uhh... I'm not sure - I guess you've won the world?Nican wrote:Unfortunately, I feel like this game is always going to be CPU-bound, and there will always be strategies for players to reduce CPU.Hope not. I don't want performance issues to be governing how factories look.
Edit: Unless someone adds automation aspect for the factory blueprinting (e.g. real von-neumann factories), there's obviously such limit for Factorio as well. There's only so fast humans can click, only finite time they can spend playing Factorio, only finite number of humans that can play Factorio The limit just happens to be well beyond current consumer hardware capabilities. In fact, MOST of the programs actually are bound by user inputs at one level or another, the "infinitely scalable problems" are not that useful for real world applications.
Re: Friday Facts #176 - Belts optimization for 0.15
roy7 wrote:nou_spiro on Reddit pointed this out:
Hmm actually 0xFFFF seems invalid accordig to RFC1624Since there is guaranteed to be at least one non-zero field in the IP header, and the checksum field in the protocol header is the complement of the sum, the checksum field can never contain ~(+0), which is -0 (0xFFFF).
Except in UDP the value of 0x0000 is reserved to mean the checksum was not calculated, so RFC 768 states that in UDP a sum of 0xFFFF, which is certainly possible, and theoretically should result in a checksum value of 0x0000, should result in the checksum value of 0xFFFF.
Re: Friday Facts #176 - Belts optimization for 0.15
Nican
On ring buffer - in previous "slow" belts it was just an array due to at most 4 items per transport line. Updating 4 integers is faster or at least same-speed than doing all that ring-buffer logics for just 4 items. The new implementation has linear ring-buffer. As for turning that buffer to a list - there is no need to do it, just keep it a ring buffer. Many sequences are full of items and standing still, then this entire sequence starts moving when belts ahead got some free space, so converting between ring-buffer and list will happen very frequently with no benefit.
On multithreading - we have that implementation already at per-chunk level. I described it here viewtopic.php?f=5&t=39893&start=60#p238247
On ring buffer - in previous "slow" belts it was just an array due to at most 4 items per transport line. Updating 4 integers is faster or at least same-speed than doing all that ring-buffer logics for just 4 items. The new implementation has linear ring-buffer. As for turning that buffer to a list - there is no need to do it, just keep it a ring buffer. Many sequences are full of items and standing still, then this entire sequence starts moving when belts ahead got some free space, so converting between ring-buffer and list will happen very frequently with no benefit.
On multithreading - we have that implementation already at per-chunk level. I described it here viewtopic.php?f=5&t=39893&start=60#p238247
Re: Friday Facts #176 - Belts optimization for 0.15
I had some time thinking about how this could happen. My theory is that the copper plate to the right of that your arrow points at was inserted onto the belt at sometime after the last and third to last copper plate was already there. We just dont see this on the gif.Zeblote wrote:There's a bug in your first gif
Then there is a chance of computing the distances wrong.
Like lets say there is a distance of "5" between two copper plates. Then if you insert a third plate between them, lets say it is a distance of 3 behind the next plate... then the distance to the plate behind it is *not* 2 because you have to also count the width of the plate you inserted. If you do this wrong then I think it would be possible that the next plate only stops when it is one plate width too close to the plate ahead of it.
Just a theory. Please let me know if I was on the right
edit: That said, it looks like it has gone more like one quarter of a width too far so I could be completely wrong. Depends on how the visual width of the plates is relative to units used when calculating their width. maybe.
Re: Friday Facts #176 - Belts optimization for 0.15
Thanks for the asnwer.Harkonnen wrote:Nican
On ring buffer - in previous "slow" belts it was just an array due to at most 4 items per transport line. Updating 4 integers is faster or at least same-speed than doing all that ring-buffer logics for just 4 items. The new implementation has linear ring-buffer. As for turning that buffer to a list - there is no need to do it, just keep it a ring buffer. Many sequences are full of items and standing still, then this entire sequence starts moving when belts ahead got some free space, so converting between ring-buffer and list will happen very frequently with no benefit.
On multithreading - we have that implementation already at per-chunk level. I described it here viewtopic.php?f=5&t=39893&start=60#p238247
Re: Friday Facts #176 - Belts optimization for 0.15
This to me is the best way to support multithreading, hell with this model you could actually simulate factories on multiple systems independently and the only synchronization you would need is at the boundaries of the areas and things that cross them. By limiting the items that can cross boundaries to players and trains you drastically decrease the intersystem communication and handoff making it possible. If a user creates a belt that connects two separate areas they merge into a single area. It just make so much sense and would only come into play for absolutely gigantic factories. Think about the case where you have 30 16 core system all simulating bases, how gloriously large could you get. There would still be an upper limit, and would only be used by a VERY small segment of the player base but could be a ton of fun.Nican wrote:Interesting that you guys posted about the belts. I was thinking about how you guys implemented that yesterday. xD
And for multi-threading, I had a different idea in mind. I had thought of a system where the game would separate the different "sections" of a base on various threads. Meaning, the main base would run on one thread, each mining base would run on a separate thread, and each smelting base on another thread, and so on. With this, also encourage mega-base builders to modularize their bases with trains.
I do not think this is something that realistically would ever make it into factorio vanilla for the 1.0 release. Damn, would it be glorious and all of the current fine tuning is still needed as each separate area still needs to take as few CPU cycles as possible.
Re: Friday Facts #176 - Belts optimization for 0.15
Have you ever considered that, instead of moving anything along the conveyor, you move the start of the conveyor away from them?Klonan wrote:another dose of FFF goodness:...
Since you store the conveyor as an array, for example, moving the element used as the start of the conveyor back one moves all the elements on it along one. OK so you'd have to implement code to make arrays effectively circular, and there'd have to be some nonsense to deal with pile ups at the end of a conveyor, which I'd probably do by having a separate "static stuff on the conveyor" array which I'd move stuff to when there's a pile up or something, but I'm sure you can work something better out.
I am sure you could implement a class which returned the nth position in the conveyor as if it was a normal array, so that using conveyor(12) would give the item 12 places from the start.
You could also store the position of the next unoccupied position in the conveyor before and after an occupied space, so that they formed a linked list when you want to cycle through all the items on a conveyor and miss out the unoccupied bits. That way, when you wanted to access the next one or previous one they were easily accessible. You could do this either as pointers to the items or indexes or the number of spaces back/forwards to move relative to the current item. Personally I'd just use the absolute index of the next/previous item in the array.
Re: Friday Facts #176 - Belts optimization for 0.15
I believe he was referring to the alternative as being GPU bound. Y'know, like most games are when they're not trying to keep track of a gajillion belts at once With GPU-bound, you always just drop graphics settings to keep things moving, but you're generally up a creek if the CPU/RAM is your bottleneck.Rseding91 wrote:The game will always be CPU/RAM bound - that's how programs work. If you ever manage to find a program that isn't hardware bound .. I uhh... I'm not sure - I guess you've won the world?
On a related note, I know there's been some discussion about how memory performance can have a surprising effect on this game's performance. Do you still expect that to be the case in the future? I've got a new computer build coming up, and my factories are big enough to worry about that kind of thing...
Re: Friday Facts #176 - Belts optimization for 0.15
I'm not sure if I get the problem. If 'last item' refers to the item that is nearest to the end of the belt, then the time when it reaches the end of the belt can be computed by:Harkonnen wrote:That reminds me of what kovarex initially proposed and we had ton of arguments about it thereafter. The biggest problem - you also have to predict the time when that last item reaches end of the belt and the belt gets into blocked state. While happening relatively not that often, this is still an issue.Lilly wrote:Instead of storing the distance between items, you can store for each item the time/tick when it will reach the end of the segment (if it won't block or be removed).
Code: Select all
tick when item is at end of segment = distance to end of segment / belt speed,
So, am I correct that 'last item' refers to the item that is furthest away from the end of the belt? And that 'the belt gets into blocked state' refers to all items on the belt are blocked (i.e. no longer moving)? Then why do you need to predict the time when this happens? I.e. what changes or needs to be updated?
The belt only needs to receive updates when items are inserted or removed. This does not change when all items on the belt are blocked. Furthermore, the position of items also don't need to be updated. Those are already fully described by the 'tick when item is at end of segment' and the following formula (where the item nearest to the end of the belt receives index 0):
Code: Select all
distance to end of segment = max( index in array * item width, (tick when item is at end of segment - current tick) * belt speed ).
Re: Friday Facts #176 - Belts optimization for 0.15
Like Lilly, I don't see the need to lose the position of items. Consider the following algorithm:
Every belt segment gets a belt.shift variable, which is incremented at every tick. Items entering the belt will be assigned the static relative position
At a later time, its position can be calculated by
Now the front of the belt may be moving slower, or not at all, when it is blocked. A second variable belt.shiftSlow will govern the impeded items. It is only incremented when an item can leave the belt.
A third variable belt.gap will track the distance between the freely moving and the blocked items. When belt.shift increases but belt.shiftSlow does not, the gap decreases. When it reaches zero, items are colliding.
Items are stored in a ring buffer with a buf.head and buf.tail index. A third index buf.gap somewhere between head and tail points to the item right before the gap ( between the blocked and the moving items ). For each colliding item the static position is recalculated with
( because its position from now on is obtained by x = item.x0 + belt.shiftSlow ) and the buf.gap index is decreased by 1. Finally belt.gap is updated to the new gap width.
For most segments, a tick only increments belt.shift and belt.shiftSlow.
Both the [buf.tail : bug.gap] and [buf.gap : buf.head] range remain suitable for binary search on the item.x0 value.
It should be comparable in performance to storing distances between items, but you retain easy access to item position. Losing that is something you may come to regret later on.
But ofc I may be missing something
Every belt segment gets a belt.shift variable, which is incremented at every tick. Items entering the belt will be assigned the static relative position
Code: Select all
item.x0 = - belt.shift;
Code: Select all
x = item.x0 + belt.shift;
A third variable belt.gap will track the distance between the freely moving and the blocked items. When belt.shift increases but belt.shiftSlow does not, the gap decreases. When it reaches zero, items are colliding.
Items are stored in a ring buffer with a buf.head and buf.tail index. A third index buf.gap somewhere between head and tail points to the item right before the gap ( between the blocked and the moving items ). For each colliding item the static position is recalculated with
Code: Select all
item.x0 = item.x0 + belt.shift - belt.shiftSlow;
For most segments, a tick only increments belt.shift and belt.shiftSlow.
Both the [buf.tail : bug.gap] and [buf.gap : buf.head] range remain suitable for binary search on the item.x0 value.
It should be comparable in performance to storing distances between items, but you retain easy access to item position. Losing that is something you may come to regret later on.
But ofc I may be missing something
Last edited by TheYeast on Mon Feb 06, 2017 9:40 am, edited 1 time in total.
-
- Burner Inserter
- Posts: 11
- Joined: Tue Dec 06, 2016 9:45 am
- Contact:
Re: Friday Facts #176 - Belts optimization for 0.15
I'm sure this has already been covered somewhere, but you got me interested. A quick google search had me reading some interesting papers. Forgive me if my interpretation is wrong, as it's been a lot of years since I renewed my CCNA.
According to the IETF RFC 6935 and RFC 2460, IPv6 requires that a zero value UDP checksum is invalid and the packet must be dropped. Therefore, to send such a packet, you must use a checksum of 0xFFFF, which indicates "checksum not calculated", or zero. Thus, some equipment apparently erroneously views 0xFFFF = 0x0000 and drops the packet, such as an example here, and one here. Basically, it appears to be a somewhat widespread problem in developers not considering the edge case where a checksum of 0xFFFF will need special handling to be distinguished from 0x0000 when doing binary comparisons.
According to the IETF RFC 6935 and RFC 2460, IPv6 requires that a zero value UDP checksum is invalid and the packet must be dropped. Therefore, to send such a packet, you must use a checksum of 0xFFFF, which indicates "checksum not calculated", or zero. Thus, some equipment apparently erroneously views 0xFFFF = 0x0000 and drops the packet, such as an example here, and one here. Basically, it appears to be a somewhat widespread problem in developers not considering the edge case where a checksum of 0xFFFF will need special handling to be distinguished from 0x0000 when doing binary comparisons.
Re: Friday Facts #176 - Belts optimization for 0.15
We don't directly control the checksum. We simply send data that we want to send and what ever the checksum comes out as is what it comes out as.Commander Gizmo wrote:I'm sure this has already been covered somewhere, but you got me interested. A quick google search had me reading some interesting papers. Forgive me if my interpretation is wrong, as it's been a lot of years since I renewed my CCNA.
According to the IETF RFC 6935 and RFC 2460, IPv6 requires that a zero value UDP checksum is invalid and the packet must be dropped. Therefore, to send such a packet, you must use a checksum of 0xFFFF, which indicates "checksum not calculated", or zero. Thus, some equipment apparently erroneously views 0xFFFF = 0x0000 and drops the packet, such as an example here, and one here. Basically, it appears to be a somewhat widespread problem in developers not considering the edge case where a checksum of 0xFFFF will need special handling to be distinguished from 0x0000 when doing binary comparisons.
If you want to get ahold of me I'm almost always on Discord.
-
- Burner Inserter
- Posts: 11
- Joined: Tue Dec 06, 2016 9:45 am
- Contact:
Re: Friday Facts #176 - Belts optimization for 0.15
Rseding91 wrote:We don't directly control the checksum. We simply send data that we want to send and what ever the checksum comes out as is what it comes out as.Commander Gizmo wrote:I'm sure this has already been covered somewhere ... not considering the edge case where a checksum of 0xFFFF will need special handling to be distinguished from 0x0000 when doing binary comparisons.
Right, naturally, the checksum is based on the data in the packet. I was just sharing the evidence I happened to find that seemed to confirm what you guys suspected in your FFF. It seems that the IPv6 specifications don't mention this specific case clearly enough, and the result is that the specs aren't always implemented consistently, such as in this case at RedHat. The issue comes down to this fact:
- 0xCD7A + 0x3285 + 0xFFFF = 0xFFFF
0xCD7A + 0x3285 + 0x0000 = 0xFFFF
Code: Select all
if (0x0000 == 0xFFFF)
- bobingabout
- Smart Inserter
- Posts: 7352
- Joined: Fri May 09, 2014 1:01 pm
- Contact:
Re: Friday Facts #176 - Belts optimization for 0.15
Forgive me if I'm not up to speed with all this, but from what I read about the IPv6 there... that is total bullshit. Not what you're saying, but the fact that 0xFFFF == 0x0000.
You can't write a spec that says "0xFFFF hasn't been checked, and 0x0000 is invalid and should be dropped" without saying "if a genuine checksum results in 0xFFFF or 0x0000 you need to do this"
I think that's why Wireless N was so unstable. the structure wasn't 100% defined, so manufacturers filled in the blanks, which basically meant if you weren't using hardware of all the same brand AND VERSION, that it would be unstable. Mine used to lock up constantly, and require a reset on average of twice a day.
You can't write a spec that says "0xFFFF hasn't been checked, and 0x0000 is invalid and should be dropped" without saying "if a genuine checksum results in 0xFFFF or 0x0000 you need to do this"
I think that's why Wireless N was so unstable. the structure wasn't 100% defined, so manufacturers filled in the blanks, which basically meant if you weren't using hardware of all the same brand AND VERSION, that it would be unstable. Mine used to lock up constantly, and require a reset on average of twice a day.
Re: Friday Facts #176 - Belts optimization for 0.15
Yes, the steps to create the checksum are completely unambiguous, there is no grey area here. If which ever hardware developer screwed this up would have been following it then 0x0000 simply can not be a result and 0xFFFF is a viable result like every other as well, the last step of the creation rules does not allow for anything else. There is no additional step you need to perform to work with IPv6 either, the only difference is, that IPv4 would theoretically allow packets that never had a checksum to be forwarded, which is beyond the issue since nobody creates such packages.bobingabout wrote:Forgive me if I'm not up to speed with all this, but from what I read about the IPv6 there... that is total bullshit. Not what you're saying, but the fact that 0xFFFF == 0x0000.
You can't write a spec that says "0xFFFF hasn't been checked, and 0x0000 is invalid and should be dropped" without saying "if a genuine checksum results in 0xFFFF or 0x0000 you need to do this"
-
- Filter Inserter
- Posts: 952
- Joined: Sat May 23, 2015 12:10 pm
- Contact:
Re: Friday Facts #176 - Belts optimization for 0.15
It sounds like you could add a few "detection regions" where the items on a particular belt are detected these could be updated in O(n) where n is the number of detection regions. This would let circuit detection not influence the segment sizes.
each detection region has a start index, a start offset, end index and an end offset. These describe in which item gap start and end are and which offset it has relative to that item. Counting the
Then after you decrement the first gap you iterate over the regions and do (assuming item[0] is the item the closest to the end of the belt)
There is a bit more finagling regarding the bounds but that won't cause too much slowdown.
each detection region has a start index, a start offset, end index and an end offset. These describe in which item gap start and end are and which offset it has relative to that item. Counting the
Then after you decrement the first gap you iterate over the regions and do (assuming item[0] is the item the closest to the end of the belt)
Code: Select all
if(cache_pointer < start_index) start_offset -= speed;
while(start_offset < 0) {
start_index++;
item_gap += items[start_index].gap;
if(circuit_output = PULSE)
add_circuit_output(item.item, 1);
}
if(cache_pointer < end_index) end_offset -= speed;
while(end_offset < 0) {
end_index++;
end_gap += items[end_index].gap;
}
Re: Friday Facts #176 - Belts optimization for 0.15
I don't know whether others do but I rarely use circuit detection. If my factory uses 10 at a time it is a lot. But even a few hundred extra belt segments due to circuit detection would hardly be noticeable in performance as far as I can tell... I wouldn't go there unless profiling proves this to be a problem.ratchetfreak wrote:It sounds like you could add a few "detection regions" [...] This would let circuit detection not influence the segment sizes.