- Update order and orientation independent. In fact it is deterministic under any arbitrary update order.
- Can be calculated with a *single* pass over a dense array. (plus some index lookups)
- Support arbitrary partitions (of any size, e.g. 1 to 1M+) allowing multithreaded calculation, data locality tuning etc.
- Possibly extendable to allow concurrent entity update and fluid update on different threads.
Intro
To expose the lede, the main thrust of the idea is to (a) liberate the fluid network from the bounds of the entity/chunk system, and to (b) perform a two-phase update (the interesting part imo). The rest of this post explains what that means, what other design decisions they require, and what the practical and programmatic implications are.Freedom
This first part is pretty simple. Entity ticking is a very general system that must support many different types of interactions and it's great for simulations like factorio. But because it's general, it's not very fast, hence the gradual offloading of certain common patterns into their own simulation system. (See belt optimizations for a recent example.) So the first step of our new fluid system is to separate the fluid calculation process from the general entity ticking system. Entities like pipes and machines no longer contain their own fluid directly, but a static reference to a specific box in the 'fluid grid' that it's attached to. To read or update their current fluid contents, they do that through this reference. This means that all fluid-only entities are never even touched by entity ticking anymore.Ok, so what's this 'fluid grid' thing? A fluid grid is just an array of fluid boxes that contain just enough information to calculate the flow of the fluids through the grid. E.g. each box will contain its capacity, current level, fluid type, and references to any of its neighbor boxes (which may be anywhere in the array). There can be any number of grids on a map (from one for the whole map to one per fluid entity), a single grid can have boxes that never interact (i.e. separated fluid networks), and grids can connect to each other. You want big grids because steady state array iteration is fast, but not so big that neighbor data is really far away making loading it slow, so some tuning is necessary. Notably, all fluid grid calculations are completed before/after entity ticking occurs.
Phasers set to 2
This two-phase update is what gives us the nice properties of orientation and update order independence. Here's how it works. There are three integer fields on each fluid box that are used during the calculation: Current which is immutable and doesn't change throuout the calculation, Next which is the value that each box calculates for itself, and Delta which is where neighbor boxes put changes. Fluids are pushed from high to low, so lets make that a rule: if a box has more fluid than a neighbor it can push (add) fluid to their Delta field, but there is no "pulling" or "sucking" fluid. Second rule: when calculating how much fluid to push into a neighbor you can only use their Current field (and capacity and type etc, this doesn't change the percentage filled fluid system), it cannot use Next or Delta (see [1]). Third, once a fluid box is done calculating its transfer to its neighbors (if any), it puts the remainder in its own Next field.All that was phase one. The above rules make fluid calculation orientation and update order independent because each box sees the exact same state of the world via Current and only makes local decisions. Finally, once everything is done calculating, phase two is just: Current is updated to be the sum of Next and Delta both of which are then set back to zero. (If that sounds like a second pass to you, you're not wrong, but that could be elided by calculating it the first time it's encountered on the next round.)
But Why?
I've mentioned update order and orientation independence a couple times, why are these properties useful or interesting? Orientation independence is a nice perk, we like more determinism of course, but it's more of a side effect of update order independence which is the real prize here. This gives us the ability to arbitrarily restructure the fluid network to be as fast to calcuate as possible, including calculating it in as many pieces and over as many threads as we want, since the order doesn't matter. Iterating through an array is fast: the CPU specifically looks for that special case to speed it up. By itself, that doesn't help us much, since we're looking up neighbors which could be anywhere in the array. However, if we partition the grids and sort them right we can get very good data locality which could claw back the neighbor lookup time. Basically, update order independence reduces fluid simulation problems into graph problems (partitioning, walking, etc.) for which we have many tools.This is also why I'm separating the fluid grid from the entities system: the data structures for these are necessarily big to handle all the different types of entities and all their states. By separating fluid into it's own type with fixed fields and fixed size, you can greatly reduce the size of the data needed and can fit more at once throuout the memory heirarchy. Assuming grids of no more than 65k fluid boxes, you could probably fit each fluid box into 16 bytes: 2 bytes for indexes of up to 4 neighbors, 2 bytes each for Current, Next, and Delta, 1 byte for fluid Type, 1 byte for Capacity (as a power of 2 or as a lookup table with 256 sizes) (maybe 20 bytes if that's a little cramped, but still extremely compact compared to most entities). Pack that all into an array and you have a data structure with a very small footprint, great data locality despite index lookups, and very fast iteration time.
FAQ
So there are a bunch of little things left to explain, so I'll explain that in the form of an FAQ, even though these are my own questions, and they only have a frequency of 1 (so far).How do you add/remove more fluid boxes from a grid?: The entities in the chunk are still the canonical location for connection information. If you add a pipe, it knows what other entities it can connect to, and they know where to find their respective fluid boxes, so just append a new empty box to the end and wire up all the neighbors. Same thing for deleting a pipe: clear its neighbors, then either move a box from the end of the array down or just leave the location blank, a fluid box with no fluid or neighbors will just be passed over normally since it can't do anything.
How are two touching grids calculated?: Carefully. They should be pretty rare. One solution is keep a set of 'sigil' neighbor ids that triggers you to look up the box from another grid and save any added Delta to a list until after all the threads are done. Another is to keep a list of boxes that touch other grids and update all of them first. Another is to add a special fourth ThreadedDelta that you only add to with atomic ops.
How are grids partitioned and ordered?: Grids start small, so just grow and merge them naturally. A greedy graph walk from any point will probably make a reasonable sort ordering. If you have a huge grid that needs splitting up, you have a plethora of graph partitioning schemes you could use. The nice thing is you can do this in the background and you only have to do it very rarely after a grid has changed significantly.
Won't the delay lead to sloshing?: Yes, but that's natural. Imagine three pipes with a capacity of 100 connected in a line with the middle one empty and the outer ones full (using my calculation below): First it looks like [100][0][100], then the outer ones push half of their fluid to the center Delta making it look like [50][100][50], then the center one pushes back to make it settle at [67][66][67]. This is great! It looks and feels very fluid-like.
Wait, can't you get higher than 100% capacity in some cases?: Yes, but it should sort itself out very quickly. Just think of it like a high pressure wave. Same example as above but the outside ones are full at a capacity of 1000 instead of 100 (like a tank): [1000][0][1000], the outer ones each think they can push up to 100 into the inner one so the inner one gets overpressurized to [900][200][900], now the middle is at 200% and the outsides are at just 90% so, again, it pushes back to [952][96][952]. Yes I agree this is less than ideal, but if this does similar things to fluids as the belt optimiztion did to belts, I think it's worth it. Just think of it like a high pressure wave. This should be pretty rare.
How do machines and pumps work in this?: They have an internal "pipe" that acts normally, except before the fluid grid updates, it add/subtracts from the Current field whatever it can consume/produce.
Can this fix the 1 wrong fluid stuck in the middle of a pipe problem?: Yes! Glad you asked! Make a special case in the neighbor distribution calculation that makes a box try to push all of its fluid to other neighbors if one neighbor is found with a different fluid type and higher level at an extra rate of the difference between the levels of the other fluid and this box.
Conclusion
Whew. I've been mulling that over for the past month, glad to get it out. What do you think? Did I miss something obvious? Any criticisms/critiques? Any clarifications you want me to make? (Mods: Is this the right subforum for this?)Thanks for reading!
[1]: How to calculate how fluid is distributed to neighbors is an interesting question in itself and can significantly change how the fluid behaves, but is a bit out of scope. Something simple like: for each neighbor with less fluid than itself, highest first, transfer half of the difference between their levels to that neighbor (rounding down). But I could imagine many kinds of calculations, e.g. a 'viscous' flow that only transfers up to a quarter of the difference or something. You only need the property that given no changes it should even out over time.