jcranmer wrote (Fri Nov 22, 2019 3:23 pm):
Each pointer is 8 bytes. As you get to medium-sized factories, that means your lists are each going to be in the 100K-1MB range. You might then say "I've got 8+GB of RAM, who cares?", but the reality is that it's not the RAM that's precious, it's your caches. If your lists don't fit in the cache, you're now paying a miss to main memory to get the address of the location you need to retrieve from main memory, all the while twiddling your thumbs while you're waiting for main memory. When you have enormous memory requirements, performance optimization tends to become all about optimizing cache locality. The classic example here is the random insertion in a list: at what N does it become faster to use a linked list instead of an array list for randomly inserting elements into the middle?

Well said. I'm quite fond of 'data oriented design' for this reason: it encourages using contiguous arrays for most things, with the indices (or a pointer) acting as identifiers. So if you have an array of generic entities, and then an array listing which of those entities are a certain type of entity, each element of that second array can be a struct holding a reference to the generic entity it belongs to, plus all the properties that are specific to that type of entity.
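A minimal sketch of that first layout, assuming Python's `array` module as a stand-in for contiguous arrays (in C or C++ these would be plain arrays). The `Inserter` type and the `x`, `y` and `rotation` properties are hypothetical, and the type-specific record stores an index back to its generic entity rather than a raw pointer:

```python
from array import array
from dataclasses import dataclass

# Shared data for all entities: one contiguous typed array per property
# (hypothetical properties x and y). Entity i lives at index i.
xs = array("d")
ys = array("d")


def add_entity(x: float, y: float) -> int:
    """Append a generic entity and return its index, which acts as its ID."""
    xs.append(x)
    ys.append(y)
    return len(xs) - 1


@dataclass
class Inserter:
    """Hypothetical entity type: a reference back to its generic entity
    plus the properties only this type has."""
    entity: int       # index into xs / ys
    rotation: int     # type-specific property


inserters: list[Inserter] = []

add_entity(1.0, 2.0)                      # entity 0: not an inserter
inserters.append(Inserter(entity=add_entity(5.0, 5.0), rotation=90))

for ins in inserters:
    # Follow the stored index back to the shared data.
    print(xs[ins.entity], ys[ins.entity], ins.rotation)
```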
The result is that if you only want to iterate over one type of entity, you never have to touch the other types, but if you need to iterate over *all* entities, you still can. An entity can also be made to act like more than one type (if desired) by giving it an entry in two different 'entity type arrays': iterating over each type to process its behavior then processes that entity twice, once for each behavior.
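To illustrate the iteration side, here is a small sketch with two hypothetical entity types, `Burner` and `Belt`, each in its own array and each pointing back to a shared `names` list by index. Entity 1 has an entry in both arrays, so one update pass handles it once per behavior:

```python
from dataclasses import dataclass


@dataclass
class Burner:        # hypothetical type: consumes fuel
    entity: int      # index into the shared entity data
    fuel: int


@dataclass
class Belt:          # hypothetical type: moves items
    entity: int
    speed: float


# Shared data for every entity, indexed by entity ID.
names = ["belt-a", "furnace-belt", "furnace-b"]

# Entity 1 appears in both type arrays, so it gets both behaviors.
burners = [Burner(entity=1, fuel=3), Burner(entity=2, fuel=5)]
belts = [Belt(entity=0, speed=1.0), Belt(entity=1, speed=0.5)]

processed: list[tuple[str, str]] = []


def tick() -> None:
    # Each loop touches only its own type's array; other types are skipped
    # entirely rather than being iterated over and filtered out.
    for b in burners:
        b.fuel -= 1
        processed.append(("burn", names[b.entity]))
    for b in belts:
        processed.append(("move", names[b.entity]))


tick()
# Iterating over *all* entities is still just a loop over `names`.
```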
One of the reasons behind data oriented design is that when you're performing operations and comparisons across the data of a large set of entities, you're usually comparing the same kind of data over and over. For example, processing collisions involves a lot of bounds checking of rectangles. It therefore makes sense to keep all of the rectangles in a contiguous array, separate from the rest of the data: iterating through them for bounds checking becomes very fast, because with no unrelated fields interleaved, each cache line (and the cache as a whole) holds more rectangles at a time.
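A sketch of the collision case, assuming axis-aligned rectangles stored as four parallel typed arrays and kept apart from all other entity data. The function names are hypothetical. In Python the interpreter overhead hides most of the cache benefit, so this shows the layout rather than the speed; in C or C++ the same loop would stream through memory sequentially:

```python
from array import array

# Rectangle bounds only; no other entity fields are stored here, so a
# collision pass reads nothing it doesn't need.
min_x = array("d")
min_y = array("d")
max_x = array("d")
max_y = array("d")


def add_rect(x0: float, y0: float, x1: float, y1: float) -> int:
    """Store a rectangle and return its index (its ID)."""
    for arr, v in ((min_x, x0), (min_y, y0), (max_x, x1), (max_y, y1)):
        arr.append(v)
    return len(min_x) - 1


def overlapping(qx0: float, qy0: float, qx1: float, qy1: float) -> list[int]:
    """Return indices of rectangles overlapping the query rectangle.
    Strict comparisons: rectangles that only touch at an edge don't count."""
    hits = []
    for i in range(len(min_x)):
        if (min_x[i] < qx1 and qx0 < max_x[i]
                and min_y[i] < qy1 and qy0 < max_y[i]):
            hits.append(i)
    return hits


add_rect(0, 0, 2, 2)
add_rect(5, 5, 6, 6)
add_rect(1, 1, 3, 3)
print(overlapping(1.5, 1.5, 2.5, 2.5))
```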
This is why sometimes you just use the indices as identifiers. All entities share some basic properties, so the idea is to have a structure holding several arrays, one per property (often called a 'structure of arrays'). It acts like a table: a given index refers to the same entity in every array in the structure.
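A minimal structure-of-arrays sketch, again using Python's `array` module as a stand-in for contiguous storage. The `Entities` class and its `x`, `y` and `health` columns are hypothetical; the point is that row `i` in every column is entity `i`, so the index is the entity's ID:

```python
from array import array
from dataclasses import dataclass, field


@dataclass
class Entities:
    """Structure of arrays: one column per shared property.
    Row i in every column describes entity i."""
    x: array = field(default_factory=lambda: array("d"))
    y: array = field(default_factory=lambda: array("d"))
    health: array = field(default_factory=lambda: array("i"))

    def add(self, x: float, y: float, health: int) -> int:
        # Append to every column together so the rows stay aligned.
        self.x.append(x)
        self.y.append(y)
        self.health.append(health)
        return len(self.x) - 1

    def __len__(self) -> int:
        return len(self.x)


ents = Entities()
a = ents.add(0.0, 0.0, 100)
b = ents.add(3.0, 4.0, 50)

# A pass that only needs health reads only the health column.
damaged = [i for i in range(len(ents)) if ents.health[i] < 100]
```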
Edit: Just wanted to conclude with a 'final' example, a more robust version of the first. You can have one structure of arrays for the properties that all entities have, and further structures of arrays, each for the properties that only a specific type of entity has. In this case, the 'base' entity is no longer a single object but a row spread across several arrays, so a single pointer isn't enough to refer to it. Instead, each entity sub-type structure would have an array holding the index of the matching row in the base entity structure of arrays.
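A sketch of that final layout, assuming one hypothetical sub-type, `Assemblers`. Both the base properties and the sub-type properties are structures of arrays, and the sub-type carries a `base_index` column pointing at the matching base row:

```python
from array import array
from dataclasses import dataclass, field


@dataclass
class BaseEntities:
    """Properties every entity has (hypothetical: position)."""
    x: array = field(default_factory=lambda: array("d"))
    y: array = field(default_factory=lambda: array("d"))

    def add(self, x: float, y: float) -> int:
        self.x.append(x)
        self.y.append(y)
        return len(self.x) - 1


@dataclass
class Assemblers:
    """Properties only assemblers have (hypothetical type), plus a
    base_index column saying which base row each assembler belongs to."""
    base_index: array = field(default_factory=lambda: array("i"))
    progress: array = field(default_factory=lambda: array("d"))

    def add(self, base: int) -> int:
        self.base_index.append(base)
        self.progress.append(0.0)
        return len(self.base_index) - 1


base = BaseEntities()
asm = Assemblers()
base.add(0.0, 0.0)               # base row 0: not an assembler
asm.add(base.add(7.0, 2.0))      # base row 1 is assembler row 0


def advance(dt: float) -> None:
    # Touches only assembler columns; base data is read only when needed.
    for j in range(len(asm.progress)):
        asm.progress[j] += dt
```

One trade-off with index references: if base rows are deleted and the arrays compacted (for example by moving the last row into the hole), every `base_index` that pointed at the moved row has to be updated.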