DaleStan wrote: ↑Sat Nov 30, 2019 5:46 am
So your assumptions lead you to believe that the factorio devs missed a factor-of-eight size optimization? And you have thought about this and determined that your assumptions are probably correct?
Bit-level memory optimization is seldom the first, second or even third thing you think about when designing something that has to happen every frame. So yes, I assumed they either discarded it early on or completely "missed" it. Packing and unpacking bits costs CPU cycles on every access, and doing it is usually a sign of insanity - except, maybe, when you are limited by memory throughput. (I never intended my idea to sound like it was guaranteed not to make the game perform worse.)
Yes, it is totally common to "miss" a factor-of-eight (or even larger) size optimization when storing data in memory. Using a 64-bit word for a boolean happens a lot. After all, what matters most of the time is developer time - not code performance.
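To make the size difference concrete, here is a minimal C++ sketch. The names `PackedFlags` and `WordPerFlag` are made up for this post and have nothing to do with Factorio's actual code. It only shows the memory footprint, not whether the packed version is faster.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example: 64 boolean flags packed into one 64-bit word.
struct PackedFlags {
    std::uint64_t bits = 0;

    void set(unsigned i, bool value) {
        assert(i < 64);
        const std::uint64_t mask = std::uint64_t{1} << i;
        bits = value ? (bits | mask) : (bits & ~mask);
    }

    bool get(unsigned i) const {
        assert(i < 64);
        return ((bits >> i) & 1u) != 0;
    }
};

// The "lazy" layout: one full 64-bit word per flag.
struct WordPerFlag {
    std::uint64_t flags[64] = {};
};

static_assert(sizeof(PackedFlags) == 8, "64 flags fit into 8 bytes");
static_assert(sizeof(WordPerFlag) == 512, "64 flags take 512 bytes");
```

The packed version pays for its size with a shift and a mask on every access, which is exactly the trade-off in question.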
No, I did not conclude that my assumptions are probably correct - only that there is enough probability of them being correct that I could mention them without having to feel stupid for doing so. With micro-optimizations like that, enormous error margins are the norm - that is why the most important step is thoroughly measuring the bottlenecks before optimizing, and then measuring the effect of each optimization you try. Hardware is complex, and opcode execution order is nondeterministic from the coder's point of view, since CPUs do all sorts of optimizations and predictions.
I still think the idea was different enough from what they have already heard from everyone they asked to be worth mentioning.
DaleStan wrote: ↑Sat Nov 30, 2019 5:46 am
* Emitter ID (minimum 20 bits)
Redundant for particles stored in a circular buffer pointed at in the emitter's data structure.
DaleStan wrote: ↑Sat Nov 30, 2019 5:46 am
* Particle spriteset (various smoke, steam, fire, blood, etc. animations: 16 bits)
Redundant for particles of emitters that have their particles share the same sprite set pointed at from inside the emitter's data structure. Want more sprite sets? Use more emitters.
DaleStan wrote: ↑Sat Nov 30, 2019 5:46 am
* Particle X/Y velocity (8 bits each)
* Particle Z position and velocity (10+8 bits: particles don't bounce, but they do fountain)
Abstracted away by animation step. The animation step could be an index into a list of precalculated matrices that would be applied to the particle's position vector. That list could be pointed at from inside the emitter's data structure.
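A rough C++ sketch of what I mean by a list of precalculated matrices. Everything here (`Vec2`, `Affine2`, `EmitterMotion`, `makeLinearDrift`) is a hypothetical name for illustration, and the table only encodes a constant drift; a fountain arc would just be a different table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { float x; float y; };

// 2x3 affine matrix:  [a b tx]
//                     [c d ty]
struct Affine2 { float a, b, tx; float c, d, ty; };

inline Vec2 apply(const Affine2& m, Vec2 p) {
    return Vec2{m.a * p.x + m.b * p.y + m.tx,
                m.c * p.x + m.d * p.y + m.ty};
}

// Precalculate one matrix per animation step. This example table is a pure
// translation by step * velocity (an assumption made for the sketch).
inline std::vector<Affine2> makeLinearDrift(Vec2 velocityPerStep, std::size_t steps) {
    std::vector<Affine2> table;
    table.reserve(steps);
    for (std::size_t k = 0; k < steps; ++k) {
        const float t = static_cast<float>(k);
        table.push_back(Affine2{1.0f, 0.0f, t * velocityPerStep.x,
                                0.0f, 1.0f, t * velocityPerStep.y});
    }
    return table;
}

// Lives in the emitter: shared by all of its particles.
struct EmitterMotion {
    std::vector<Affine2> stepTransforms;

    // The particle stores only its spawn position and its animation step.
    Vec2 positionAt(Vec2 spawn, std::size_t step) const {
        assert(step < stepTransforms.size());
        return apply(stepTransforms[step], spawn);
    }
};
```

The particle itself carries no velocity at all; its current position falls out of the spawn position plus the table lookup.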
DaleStan wrote: ↑Sat Nov 30, 2019 5:46 am
* Lifetime (12 bits: 68 seconds, tight but likely OK. I can use the animation counter to measure the age of the particle, but I also need to know how many ticks the particle should live.)
Particle lifetime is stored in the emitter's data structure. A particle's age is its animation step.
I assumed a ratio of particles to emitters > 100. So it would make sense to have as much data stored in the emitter data structure as possible. Processing is done by looping over the emitters and then looping over each emitter's ring buffer containing its particles. As particles die, they are deleted by advancing the ring buffer start index. Adding particles is done by storing them at start index + current particle count (which is then increased by one). Obviously, the indices into the ring buffer are all modulo the ring buffer length.
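Here is a minimal C++ sketch of the ring buffer handling described above. `Emitter`, `Particle`, and all member names are hypothetical, the particle is reduced to its animation step, and the sketch assumes that all particles of one emitter share the same lifetime, so the oldest one always sits at the start index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reduced particle: only the animation step, which doubles as its age.
struct Particle {
    std::uint16_t step = 0;
};

class Emitter {
public:
    Emitter(std::size_t capacity, std::uint16_t lifetimeSteps)
        : ring_(capacity), lifetime_(lifetimeSteps) {}

    // Store the new particle at (start + count) modulo the buffer length.
    bool spawn(Particle p = Particle{}) {
        if (count_ == ring_.size()) return false;  // buffer full
        ring_[(start_ + count_) % ring_.size()] = p;
        ++count_;
        return true;
    }

    // Age every live particle, then drop dead ones from the front by
    // advancing the start index. Nothing is moved or freed.
    void tick() {
        for (std::size_t i = 0; i < count_; ++i) {
            ++ring_[(start_ + i) % ring_.size()].step;
        }
        while (count_ > 0 && ring_[start_].step >= lifetime_) {
            start_ = (start_ + 1) % ring_.size();
            --count_;
        }
    }

    std::size_t size() const { return count_; }
    std::size_t startIndex() const { return start_; }

private:
    std::vector<Particle> ring_;  // fixed-size storage, never reallocated
    std::uint16_t lifetime_;      // shared by all particles of this emitter
    std::size_t start_ = 0;       // index of the oldest live particle
    std::size_t count_ = 0;       // number of live particles
};
```

The outer loop would simply call `tick()` on every emitter once per frame. Deleting a particle costs one index increment, which is the whole point of the layout.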
Also keep in mind that I had 26 unused bits left in the 64-bit particle structure. So if coordinate offsets, rotation, or animation step need a slightly higher resolution - there is some room for that. One possible extension would be to use four bits to differentiate up to 16 different particle types per emitter, which might use completely different base offsets, animations, sprite sets, whatever - all stored on the emitter instead of on the hundreds of particles that one emitter might have visible each frame. But you could just have more emitters instead, so code complexity can be kept (relatively) low.
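A C++ sketch of how such a packed 64-bit particle could be read and written. The individual field widths below are placeholders I picked only so that they sum to the 38 used bits; they are not a claim about the exact layout, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder field widths (assumption): 12 + 12 + 6 + 8 = 38 used bits.
constexpr unsigned kXBits = 12;    // x offset from the emitter
constexpr unsigned kYBits = 12;    // y offset from the emitter
constexpr unsigned kRotBits = 6;   // rotation
constexpr unsigned kStepBits = 8;  // animation step (also the age)
constexpr unsigned kTypeBits = 4;  // proposed extension: 16 types per emitter

constexpr unsigned kXShift = 0;
constexpr unsigned kYShift = kXShift + kXBits;
constexpr unsigned kRotShift = kYShift + kYBits;
constexpr unsigned kStepShift = kRotShift + kRotBits;
constexpr unsigned kTypeShift = kStepShift + kStepBits;

static_assert(kTypeShift == 38, "base layout uses 38 bits");
static_assert(kTypeShift + kTypeBits == 42, "22 bits remain with the type field");

// Mask with the lowest `bits` bits set (valid for bits < 64).
constexpr std::uint64_t fieldMask(unsigned bits) {
    return (std::uint64_t{1} << bits) - 1;
}

constexpr std::uint64_t getField(std::uint64_t word, unsigned shift, unsigned bits) {
    return (word >> shift) & fieldMask(bits);
}

// Returns `word` with the field replaced; excess bits of `value` are dropped.
constexpr std::uint64_t setField(std::uint64_t word, unsigned shift, unsigned bits,
                                 std::uint64_t value) {
    return (word & ~(fieldMask(bits) << shift)) |
           ((value & fieldMask(bits)) << shift);
}
```

The type field would then just be an index into a small per-emitter table holding the base offsets, animation, and sprite set for that type.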
P.S.: I am aware that this discussion has become purely academic because the GPU needs to be fed floating-point numbers. I did not know that GPUs lack the instructions to do the unpacking.