Friday Facts #322 - New Particle system

Regular reports on Factorio development.
User avatar
Klonan
Factorio Staff
Factorio Staff
Posts: 3964
Joined: Sun Jan 11, 2015 2:09 pm
Contact:

Re: Friday Facts #322 - New Particle system

Post by Klonan » Sun Nov 24, 2019 2:14 pm

Ultros wrote:
Sat Nov 23, 2019 11:33 pm
I guess it's up to the devs whether such an optimization is worth it. Considering the volumes of biters on multiplayer games, I think this would be an excellent improvement for multiplayer speed optimization.
Well, we just did this whole optimization, which should already majorly resolve the problem

We will see how community testing goes once 0.18 is out, there is always more room for improvement,
I think it is probably good enough now to last till 1.0 and beyond

Ambaire
Fast Inserter
Fast Inserter
Posts: 104
Joined: Fri Jun 26, 2015 11:37 pm
Contact:

Re: Friday Facts #322 - New Particle system

Post by Ambaire » Mon Nov 25, 2019 9:01 am

Very nice. Should help reduce the lag when nuking large numbers of biters.

burninghey
Inserter
Inserter
Posts: 33
Joined: Fri Sep 14, 2018 2:06 am
Contact:

Re: Friday Facts #322 - New Particle system

Post by burninghey » Mon Nov 25, 2019 8:42 pm

Let's swim in Blood and Gore and let them stay forever for a day or two

movax20h
Long Handed Inserter
Long Handed Inserter
Posts: 53
Joined: Fri Mar 08, 2019 7:07 pm
Contact:

Re: Friday Facts #322 - New Particle system

Post by movax20h » Tue Nov 26, 2019 4:31 pm

In the last picture, why there is still a lot of things that look like particles, but are not handled by new system?

User avatar
Klonan
Factorio Staff
Factorio Staff
Posts: 3964
Joined: Sun Jan 11, 2015 2:09 pm
Contact:

Re: Friday Facts #322 - New Particle system

Post by Klonan » Tue Nov 26, 2019 6:04 pm

movax20h wrote:
Tue Nov 26, 2019 4:31 pm
In the last picture, why there is still a lot of things that look like particles, but are not handled by new system?
There are things like explosion and particle-fountains, which are not performance concerns at the moment

User avatar
SHiRKiT
Filter Inserter
Filter Inserter
Posts: 691
Joined: Mon Jul 14, 2014 11:52 pm
Contact:

Re: Friday Facts #322 - New Particle system

Post by SHiRKiT » Wed Nov 27, 2019 9:19 pm

IF there's a thing that impreses me, is that 20 members of the team later, and 5 years, and you gus still improve the performance by huge margins. Your dedication to this game is unparallel

ratchetfreak
Filter Inserter
Filter Inserter
Posts: 943
Joined: Sat May 23, 2015 12:10 pm
Contact:

Re: Friday Facts #322 - New Particle system

Post by ratchetfreak » Thu Nov 28, 2019 1:44 pm

SHiRKiT wrote:
Wed Nov 27, 2019 9:19 pm
IF there's a thing that impreses me, is that 20 members of the team later, and 5 years, and you gus still improve the performance by huge margins. Your dedication to this game is unparallel
this one was quite low hanging fruit though

cwalter5
Manual Inserter
Manual Inserter
Posts: 2
Joined: Mon Jul 25, 2016 8:38 am
Contact:

Re: Friday Facts #322 - New Particle system

Post by cwalter5 » Thu Nov 28, 2019 7:43 pm

Tynach wrote:
Sat Nov 23, 2019 12:27 am
vorax wrote:
Sat Nov 23, 2019 12:05 am
Here are some thoughts for possible future improvements.

Take advantage of the commonality between particles produced by a common source.

Say you have a particle source object, which contains the majority of those 64bytes you mentioned. Then a particle would be a tiny structure which would be appended to the source object. Say you used three numbers for the position relative to the source of each particle (16 bits each) and three more for the velocity (8? bits each) a texture offset (8bit) and a rotation (8bit). That's 14 bytes per particle. I believe that would be sufficient to animate all sorts of particle effects. I'm assuming three coordinates because for example the "blood" streams out like a fountain, I think it would be most efficient to model particle behavior in 3D then translate the 3 coordinates to 2D.

Many of these tiny particles could be saved as a contiguous array in a random order. Particle deletion could then be performed in order. Particle update would usually require iterating the array, but this could depend on the specific particle animation.

This idea may not work so well if the source of the particles was itself in motion. I don't know how often that happens in factorio. in these cases it maybe would be necessary to create new sources periodically instead of allowing the sources to move.
Positions and rotations would be stored in floats or doubles, each being 32 or 64 bits each (respectively). I imagine the 64 bytes are:

1. Position (x and y): 8 bytes to 16 bytes.
2. Velocity (x and y): 8 bytes to 16 bytes.
3. Rotation: 4 bytes to 8 bytes.
4. Time (how long the particle has existed, for calculating where in its path it is and when it should fade out): 4 to 8 bytes.
5. Pointer to texture to draw: 8 bytes.
6. Pointer to particle behavior specification: 8 bytes.

Or something along those lines. If doubles are used instead of floats, that'd be all 64 bytes right there.
It’s things like this that make me wish factorio could be open source.

My assumptions would be 32 bit floats would be sufficient for most positioning and velocity. Anymore than a 16 bit float for rotation seems like over kill as we have pixel art. If sprites aren’t rotated but selected from sprite directions (ala the biters) I’d question if we need more than 4 bits (32 directions).

Time only needs to be measured per tick (so 60 frames per second). 8 bit gives us a max lifetime of just over 4 seconds which probably isn’t quite enough time for a lot of scenarios. At 16 bit we have a max lifetime of 18 minutes. That should be more than enough for almost every use case.

There is no need to use a proper pointer for each texture. Simply using a basic int id per texture could get us under 2 bytes, maybe even to one byte, especially if we are using texture sets. Ie textures of a particle will typically be confined to one set with the timing of said particle able to indicate the actual frame of the animation set to use. Likewise rotation could provide another simplification of the texture sets to use.

I again doubt we have more than 2 bytes worth of particle behavior specifications. Considering that particle specifications are probably dictated based on particle type and thus texture set, I’d doubt we need more than one byte to indicate the specification.

This reduces us down to approximately 23 bytes per particular best case scenario. Finally all of this could possibly compressed by storing an index of the spawning particle. This would lead to 2 arrays, root particle with extra information and child particular with a simplified set. If this was done child particular could probably get away with 16 bit floats for positioning and velocity. Textures and behavior would be determined by the parent and reduced lookups could be used reducing object size even more to maybe 12. And typically child particles are more common, meaning an even smaller size of the total array. And if the arrays are small enough not wreck cache hits (though it may still cause other issues).

With all that said I obviously don’t know what extra information we need to store or extra information that needs bundles.

Rseding91
Factorio Staff
Factorio Staff
Posts: 10019
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: Friday Facts #322 - New Particle system

Post by Rseding91 » Thu Nov 28, 2019 8:44 pm

cwalter5 wrote:
Thu Nov 28, 2019 7:43 pm
It’s things like this that make me wish factorio could be open source.

My assumptions would be 32 bit floats would be sufficient for most positioning and velocity. Anymore than a 16 bit float for rotation seems like over kill as we have pixel art. If sprites aren’t rotated but selected from sprite directions (ala the biters) I’d question if we need more than 4 bits (32 directions).

Time only needs to be measured per tick (so 60 frames per second). 8 bit gives us a max lifetime of just over 4 seconds which probably isn’t quite enough time for a lot of scenarios. At 16 bit we have a max lifetime of 18 minutes. That should be more than enough for almost every use case.

There is no need to use a proper pointer for each texture. Simply using a basic int id per texture could get us under 2 bytes, maybe even to one byte, especially if we are using texture sets. Ie textures of a particle will typically be confined to one set with the timing of said particle able to indicate the actual frame of the animation set to use. Likewise rotation could provide another simplification of the texture sets to use.

I again doubt we have more than 2 bytes worth of particle behavior specifications. Considering that particle specifications are probably dictated based on particle type and thus texture set, I’d doubt we need more than one byte to indicate the specification.

This reduces us down to approximately 23 bytes per particular best case scenario. Finally all of this could possibly compressed by storing an index of the spawning particle. This would lead to 2 arrays, root particle with extra information and child particular with a simplified set. If this was done child particular could probably get away with 16 bit floats for positioning and velocity. Textures and behavior would be determined by the parent and reduced lookups could be used reducing object size even more to maybe 12. And typically child particles are more common, meaning an even smaller size of the total array. And if the arrays are small enough not wreck cache hits (though it may still cause other issues).

With all that said I obviously don’t know what extra information we need to store or extra information that needs bundles.
We use fixed point positions (see https://www.factorio.com/blog/post/fff-64) for map positions. Additionally, things like textures/sounds/non-mutable-properties end up getting stored on what we call a Prototype which gets a runtime ID assigned. Most prototypes use unsigned 16 bit IDs which means the game supports up to 2^16 - 1 prototypes of a given type. The ID can be used to get the prototype pointer which gets you access to all of the non-mutable data associated with that thing.
If you want to get ahold of me I'm almost always on Discord.

User avatar
Oktokolo
Filter Inserter
Filter Inserter
Posts: 705
Joined: Wed Jul 12, 2017 5:45 pm
Contact:

Re: Friday Facts #322 - New Particle system

Post by Oktokolo » Fri Nov 29, 2019 5:08 am

Rseding91 wrote:
Thu Nov 28, 2019 8:44 pm
We use fixed point positions (see https://www.factorio.com/blog/post/fff-64) for map positions. Additionally, things like textures/sounds/non-mutable-properties end up getting stored on what we call a Prototype which gets a runtime ID assigned. Most prototypes use unsigned 16 bit IDs which means the game supports up to 2^16 - 1 prototypes of a given type. The ID can be used to get the prototype pointer which gets you access to all of the non-mutable data associated with that thing.
For particles as other games use them, you would have a known emiter of wich the particles would "emerge" and therefore would only need to store the position offset from that emitter for each particle.
I doubt that there would be distances greater than 1k pixels away from the emitter. So 10 bit ought to be enough for everyone™ (20 bit for the whole 2D offset).
If you need to have moving emitters, group particles together so they can share one common emitter position and spawn them with their position offset initially set to the difference of the group position and the emitter position.
If you need to store a rotation, 10 bit should be enough for that too.
The animation step counter could probably get away with being 10 bits too. But if there are long-lived particles, a fixed animation step multiplier on the particle prototype might fix that.
The remaining 26 bits of a 64-bit register can be filled with other information (maybe make the animation step counter larger) or used to pack the particles even tighter...
As Factorio is memory-throughput-constrained, wasting some cycles on extra unpacking and multiplications/additions may be okay. For faster multiplications constrain the animation step multipliers to factors of two so you can use the ultra-fast bitshift instructions.

Probably not worth the dev time though.

Rseding91
Factorio Staff
Factorio Staff
Posts: 10019
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: Friday Facts #322 - New Particle system

Post by Rseding91 » Fri Nov 29, 2019 1:21 pm

The particles are already 64 bytes so there's no point in packing them into smaller structures unless they can be reduced to 32 bytes which isn't likely to happen without removing functionality that the game currently uses.
If you want to get ahold of me I'm almost always on Discord.

cwalter5
Manual Inserter
Manual Inserter
Posts: 2
Joined: Mon Jul 25, 2016 8:38 am
Contact:

Re: Friday Facts #322 - New Particle system

Post by cwalter5 » Sat Nov 30, 2019 4:08 am

Rseding91 wrote:
Fri Nov 29, 2019 1:21 pm
The particles are already 64 bytes so there's no point in packing them into smaller structures unless they can be reduced to 32 bytes which isn't likely to happen without removing functionality that the game currently uses.
Well what oktokolo is suggesting would take you down to 64 bits so 8 bytes. Of course that’s only if our assumptions are good.

DaleStan
Filter Inserter
Filter Inserter
Posts: 277
Joined: Mon Jul 09, 2018 2:40 am
Contact:

Re: Friday Facts #322 - New Particle system

Post by DaleStan » Sat Nov 30, 2019 5:46 am

So your assumptions lead you to believe that the factorio devs missed a factor-of-eight size optimization? And you have thought about this and determined that your assumptions are probably correct?


If I were trying to design a bit-packed particle system, I'd also include:
* Emitter ID (minimum 20 bits)
* Particle spriteset (various smoke, steam, fire, blood, etc. animations: 16 bits)
* Particle X/Y velocity (8 bits each)
* Particle Z position and velocity (10+8 bits: particles don't bounce, but they do fountain)
* Lifetime (12 bits: 68 seconds, tight but likely OK. I can use the animation counter to measure the age of the particle, but I also need to know how many ticks the particle should live.)

With Oktokolo's 40 bits, I'm already up to 16 bytes per particle. In addition, you now have to pull the X and Y coordinates of all emitters into memory before you know whether or not you need to draw that particle. Unless emitters are a lot smaller than particles, that's going to be a cache line per emitter. This probably saves bandwidth if I have a small number of emitters. But if I have a small number of emitters, I also have a small number of particles, so the memory throughput for that case isn't particularly interesting. On the other hand, if I'm out on a killing spree, and I just killed 100 biters, that's a fifth of my L1d, just to hold the emitter definitions. If I'm not very careful about how I allot slots in the particle array (Easy for blood. Hard for steam.) I'm going to discover that the emitter definition I need has been evicted in favor of particle definitions, and now I need to go fetch it from memory again.

Further, even with 1,048,576 emitters, there's going to have to be reuse of the emitter IDs, which adds a bunch of unpleasant code to assign and recycle emitter IDs. I'd also have to write code to sweep the particle array if all million emitters have live particles when a new emitter wants to join the party, which is going to be terrible for memory throughput.

NoHomeLikeLocalHost
Manual Inserter
Manual Inserter
Posts: 1
Joined: Sun Dec 01, 2019 8:16 pm
Contact:

Re: Friday Facts #322 - New Particle system

Post by NoHomeLikeLocalHost » Sun Dec 01, 2019 9:02 pm

DaleStan wrote:
Sat Nov 30, 2019 5:46 am
So your assumptions lead you to believe that the factorio devs missed a factor-of-eight size optimization? And you have thought about this and determined that your assumptions are probably correct?


If I were trying to design a bit-packed particle system, I'd also include:
* Emitter ID (minimum 20 bits)
* Particle spriteset (various smoke, steam, fire, blood, etc. animations: 16 bits)
* Particle X/Y velocity (8 bits each)
* Particle Z position and velocity (10+8 bits: particles don't bounce, but they do fountain)
* Lifetime (12 bits: 68 seconds, tight but likely OK. I can use the animation counter to measure the age of the particle, but I also need to know how many ticks the particle should live.)

*snip*
I have a few problems with this:

Entity ID and sprite set can be cut from the particle's data structure. Whatever created the particles manages their lifetime, and whatever renders the particles gives them their appearance, and both access a huge array of them through a single indirection. The only thing that is needed to differentiate a particle in the same system is position (of which, X and Y are missing from this data structure), velocity, and life time.

I don't agree with packing things in a bitfield because then the shader that renders the particles has to do manual bit twiddling on the GPU. HLSL doesn't allow integer arithmetic, so it's not possible without using a floating point mod and a 23-bit mantissa in a float (which means about a third of all data sent to the GPU is padding). Unpacking the particles before sending them to the GPU is just not an option because it means we have to have an uncompressed buffer full of them anyway, and we save nothing on memory bandwidth between the CPU and GPU. There will just be mild improvements to memory usage on the CPU if each particle system is small and there are many particle systems.

What is most important when optimizing particles is keeping them in pools so the new frame's particles can be uploaded straight to the GPU without modifications and then rendered in as few draw calls as possible.
  • When the memory is contiguous, then the memory controller can go wild moving hundreds of megabytes of memory to the GPU instead of 64 bits at a time per particle.
  • If all particles in a pool share the same material but differ in something like position, then instancing can be used, which draws all particles using a single API call. Reducing API calls means the program and driver spend significantly less time processing each particle on the CPU.

User avatar
Oktokolo
Filter Inserter
Filter Inserter
Posts: 705
Joined: Wed Jul 12, 2017 5:45 pm
Contact:

Re: Friday Facts #322 - New Particle system

Post by Oktokolo » Mon Dec 02, 2019 12:19 am

DaleStan wrote:
Sat Nov 30, 2019 5:46 am
So your assumptions lead you to believe that the factorio devs missed a factor-of-eight size optimization? And you have thought about this and determined that your assumptions are probably correct?
Bit-level memory optimization seldom is the first, second or even third thing you think about when thinking about something that has to happen every frame. So yes, i assumed they either discarded it early on or completely "missed" that. It wastes CPU cycles as hell and doing it surely is a sign of insanity - except, maybe (i never intended my idea to sound like it would be guaranteed to not make the game perform worse), when being memory throughput limited.
Yes, it is totally common to "miss" a factor-of-eight (or even more) size optimization when storing data in memory. Using a 64-bit word for a boolean happens a lot. After all, what matters most of thetime is developer time - not code performance.
No, i did not concluded that my assumptions are probably correct - only, that there is enough probability for them being correct, that i could mention them without having to feel stupid for doing that. When doing micro-optimizations like that, it is the norm that there are enormous error margins - that is why the most important measure is thouroughly mesuring the bottlenecks before optimizing and the effects of tried optimizations. Hardware is complex and op code execution order is nondeterministic from the viewpoint of the coder since CPUs do all sorts of optimizations and predictions.
I still think, that the idea was non-just-the-same-they-heared-from-everyone-they-asked enough to be worth mentioning.
DaleStan wrote:
Sat Nov 30, 2019 5:46 am
* Emitter ID (minimum 20 bits)
Redundant for particles stored in a circular buffer pointed at in the emitter's data structure.
DaleStan wrote:
Sat Nov 30, 2019 5:46 am
* Particle spriteset (various smoke, steam, fire, blood, etc. animations: 16 bits)
Redundant for particles of emitters that have their particles share the same sprite set pointed at from inside the emitter's data structure. Want more sprite sets? Use more emitters.
DaleStan wrote:
Sat Nov 30, 2019 5:46 am
* Particle X/Y velocity (8 bits each)
* Particle Z position and velocity (10+8 bits: particles don't bounce, but they do fountain)
Abstracted away by animation step. The animation step could be an index into a list of precalculated matrices that would be applied to the particle's position vector. That list could be pointed at from inside the emitter's data structure.
DaleStan wrote:
Sat Nov 30, 2019 5:46 am
* Lifetime (12 bits: 68 seconds, tight but likely OK. I can use the animation counter to measure the age of the particle, but I also need to know how many ticks the particle should live.)
Particle lifetime is stored in the emitter's data structure. Particles age is the animation step.

I assumed a ratio of particles to emitters > 100. So it would make sense to have as much data stored in the emitter data structure as possible. Processing is done by looping over emitters and then loop over their ring buffer containing the particles. As particles die, they are deleted by advancing the ring buffer start index. Adding particles is done by storing them at start index + current particle count (wich then is increase by one). Obviously, the indices into the ring buffer are all modulo the ring buffer length.

Also keep in mind, that i had 26 unused bits left in the 64 bit particle structure. So if coordinate offsets, rotation, or animation step need to have a slightly higer resolution - there is some room for that. One possible extension would be four bits to be used to differentiate up to 16 different particle types per emitter wich might use completely different base offsets, animations, spriteset, whatever - all stored on the emitter instead on the hundreds of particles that one emitter might have visible each frame. But you could just have more emitters instead so code complexity can be kept (relatively) low.

P.S.: I am aware of this discussion having become purely academic because of the GPU needing to be fed with floating point numbers. I did not know that GPUs miss the commands to do the unpacking.

Post Reply

Return to “News”

Who is online

Users browsing this forum: No registered users