Entity/component model in prototypes
Posted: Wed Apr 27, 2016 11:34 pm
Hi,
I am currently writing my first mod for Factorio, and there are quite a few aspects of the API that I find cumbersome. First a bit of background about me: I have been programming for as long as I can remember (mostly C++) and my professional work is centered around 3D/game engines, so I do know how these things work (I also speak Czech). This is partly why the mod API surprised me a bit, since I assume it reflects how the game works internally.
As far as I understand from reading the forum, a "type" in Lua corresponds to a class in C++, where entities are probably stored in big arrays, one per entity type, and processed every tick (hopefully without abstract base classes, virtual functions, and the whole inheritance forest problem). This basically makes it impossible to change an entity's behavior without touching C++ code. It also means that if I want to have an electric boiler I also have to touch C++ code and either hardcode a new entity class or give the existing entity some new operation mode (I think electric boiler mods currently use a "pump" type to circumvent this, with lots of Lua hacks in on_tick). To me this indicates a rather inflexible system, both for mod devs *and* for the game devs.
I have also looked at other mods to get some insight and to see whether I'm doing something wrong. One particular pattern shows up repeatedly, especially when a mod tries to model a more complex entity. It listens to the on_built_entity event, creates some invisible internal entities, uses the on_tick event to modify those internal entities, and removes the secondary entities when the "primary" entity is removed. For example, I wanted a simple entity that consumes electricity and does something with it, but only a handful of entity types are capable of consuming electricity. So what I'm doing is creating an internal pump and removing "energy" from it every tick to simulate consumption. It seems others are doing similar things.
This "entity amalgamation" is so ubiquitous that I think it deserves attention.
There is an easy solution called the entity/component system (KSP calls it part/modules but it's the same thing). Are the devs familiar with that concept?
In this pattern an entity is nothing but a container for components (in some engines it is literally a container of pointers to components, in others it's just an ID). An entity doesn't do anything on its own; it is just an abstraction used to associate components with each other. It's the components which are responsible for behavior. Actually, even the components are just "plain old data" objects. Every component type has an associated system which does the real work. Every tick the system for component type A processes all components of type A, which are stored in one big array containing *all* components of type A active in the game. Then system B processes all components of type B, and so on. This arrangement has very good data and instruction cache efficiency. In such a design components are the smallest unit of "behavior" in the game.
Let me show this using the example of a boiler. Currently a boiler is a hardcoded entity that does one specific thing. But it can be broken down into multiple components/behaviors:
1. Inventory
2. Fluid box
3. Fluid connector(s)
4. Fuel burner
5. Energy buffer
6. Fluid heater
Now there is no longer a "boiler" per-se. The fact it is a boiler is the result of how its components interact. The update every tick goes through each system and manipulates all components of that type:
1. Fluid heater increases temperature of fluid box taking energy from buffer
2. Burner takes item from inventory and stores the contained energy in buffer if it has available capacity
3. Fluid system moves fluid
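To make the steps above a bit more concrete, here is a minimal C++ sketch of how such components and systems could look. This is purely my assumption of a possible layout, not how Factorio actually works: all struct names, fields and units are made up for illustration, fluid connectors are left out, and the fluid movement system (step 3) is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-data components; names and units are illustrative only.
struct Inventory    { int fuel_items; double energy_per_item; };
struct EnergyBuffer { double stored; double capacity; };
struct FluidBox     { double amount; double temperature; };

// Components refer to their siblings by index into the per-type arrays.
struct FuelBurner  { std::size_t inventory; std::size_t buffer; };
struct FluidHeater { std::size_t buffer; std::size_t fluid_box;
                     double energy_per_tick; double degrees_per_joule; };

// One array per component type, holding all components of that type.
struct World {
    std::vector<Inventory>    inventories;
    std::vector<EnergyBuffer> buffers;
    std::vector<FluidBox>     fluid_boxes;
    std::vector<FuelBurner>   burners;
    std::vector<FluidHeater>  heaters;
};

// System 1: every heater moves energy from its buffer into its fluid box.
void run_fluid_heaters(World& w) {
    for (const FluidHeater& h : w.heaters) {
        assert(h.buffer < w.buffers.size() && h.fluid_box < w.fluid_boxes.size());
        EnergyBuffer& buf = w.buffers[h.buffer];
        FluidBox& box = w.fluid_boxes[h.fluid_box];
        const double used = std::min(buf.stored, h.energy_per_tick);
        buf.stored -= used;
        box.temperature += used * h.degrees_per_joule;
    }
}

// System 2: every burner converts one fuel item if the buffer has room for it.
void run_fuel_burners(World& w) {
    for (const FuelBurner& b : w.burners) {
        Inventory& inv = w.inventories[b.inventory];
        EnergyBuffer& buf = w.buffers[b.buffer];
        if (inv.fuel_items > 0 && buf.stored + inv.energy_per_item <= buf.capacity) {
            --inv.fuel_items;
            buf.stored += inv.energy_per_item;
        }
    }
}

// One tick runs each system once over its whole array.
void tick(World& w) {
    run_fluid_heaters(w);
    run_fuel_burners(w);
}
```

Note that nothing in here is called "boiler". A boiler is simply whatever entity happens to own one component of each type, wired together through the indices.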
Now imagine you want an electric boiler. All you have to change are the components of the entity:
1. Power grid coverage
2. Fluid box
3. Fluid connector(s)
4. Power grid consumer
5. Energy buffer
6. Fluid heater
The update every tick then becomes:
1. Fluid heater increases temperature of fluid box taking energy from buffer
2. Consumer takes energy from power grid and stores it in energy buffer if it has available capacity
3. Fluid system moves fluid
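The only new piece of code the electric variant needs is the consumer system from step 2. A rough C++ sketch of it follows, under the assumption that a power grid can be reduced to "energy still available this tick". The names PowerGrid, GridConsumer and run_grid_consumers are hypothetical and not taken from the game. The fluid heater system would not have to change at all, since it only ever sees an energy buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-data components; names and units are illustrative only.
struct EnergyBuffer { double stored; double capacity; };
struct PowerGrid    { double available; };  // energy the grid can still hand out this tick
struct GridConsumer { std::size_t grid; std::size_t buffer; double max_draw_per_tick; };

// Consumer system: fills each buffer from its grid, limited by the grid's
// remaining energy, the consumer's draw rate and the buffer's free capacity.
void run_grid_consumers(std::vector<PowerGrid>& grids,
                        std::vector<EnergyBuffer>& buffers,
                        const std::vector<GridConsumer>& consumers) {
    for (const GridConsumer& c : consumers) {
        assert(c.grid < grids.size() && c.buffer < buffers.size());
        PowerGrid& grid = grids[c.grid];
        EnergyBuffer& buf = buffers[c.buffer];
        const double room = buf.capacity - buf.stored;
        const double drawn = std::min({grid.available, c.max_draw_per_tick, room});
        grid.available -= drawn;
        buf.stored += drawn;
    }
}
```

In this sketch consumers are served in array order, so an early consumer can starve a later one. A real power network would probably distribute energy more fairly; that part is deliberately left out.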
This approach gives you a lot more re-usability, as each component/system is plug-and-play. Some might now look at this and think "this is adding more things, therefore overhead, therefore must be slow", but actually the opposite is the case. Because the "fluid heater" system can process *all* fluid heater components in one batch, data and instruction cache utilization is much better (if arranged in arrays). So doing job 1 for all fluid heaters, then job 2 for all energy converters, then job 3 for all fluid boxes is friendlier to the CPU cache than doing jobs 1, 2, 3 for boiler 1, then jobs 1, 2, 3 for boiler 2, etc. This also means that if, for example, you have both boilers and electric boilers in the game, the "fluid heater" system runs for both boiler types because both have "fluid heater" components. (It doesn't iterate over entities searching for ones with associated fluid heaters; it iterates over the one big "fluid heater" array.) This is a really important aspect of this design.
The better cache utilization comes from the fact that the same operation (usually short, fitting well into the instruction cache) is executed for every array element, and once a component type is done it is not touched again and the cache moves on to the next array. The prefetcher plays a big role in this. Even if a system needs to process an array multiple times doing different things, the data is "hot" and likely still in L1/L2 cache (how important cache locality is can be seen in experiments where a simple quadratic sort such as bubble sort or insertion sort can beat asymptotically faster algorithms on small data sets, and where a linear search in a small array can be faster than a binary search).
Under this system an entity is defined by the components associated with it. From the game engine's perspective there is no difference between a coal-fueled car and an electric car (that can only move in range of power poles, wouldn't that be funny). There are no dedicated "fuel car" and "electric car" entities, because entities are just IDs. It's how you wire up the components that determines the behavior of an entity. What we know as a "prototype" would no longer reflect an actual C++ class but would be a blueprint telling the game engine which components to instantiate and how to connect them (often called an "archetype" in other engines).
So how would this affect mods? Let's assume I want to make a wind turbine. Currently I'd have to make an entity representing the actual turbine, which is the object the player places on the ground. Then I need to create an internal-only steam engine and fill it with a hot liquid based on wind speed. This means hooking into on_built_entity, on_entity_died, on_preplayer_mined_item and on_robot_pre_mined, creating the second entity, ensuring the second entity is invisible, invincible and cannot be marked for deconstruction (otherwise the whole thing falls apart), and removing it when the turbine dies or is mined. And I still won't get around the issue that the mouse-over info for my entity shows either nothing or a magic liquid in its fluidbox that doesn't make any sense. Instead what I want is:
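The "one big array per component type" idea from the last two paragraphs could be sketched in C++ roughly like this. ComponentArray is a hypothetical helper of my own, and the heater is reduced to two fields to keep it short. The point is only that the system walks a contiguous vector and never looks at entities.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using EntityId = std::uint32_t;

// Hypothetical dense storage for one component type: the components sit
// contiguously in `dense`; `index_of` is only consulted for lookups by entity.
template <typename T>
class ComponentArray {
public:
    void add(EntityId entity, const T& component) {
        assert(index_of.count(entity) == 0);  // one component of this type per entity
        index_of[entity] = dense.size();
        dense.push_back(component);
    }
    T& get(EntityId entity) { return dense[index_of.at(entity)]; }
    std::vector<T>& all() { return dense; }

private:
    std::vector<T> dense;
    std::unordered_map<EntityId, std::size_t> index_of;
};

// Simplified heater component, just enough to have something to update.
struct FluidHeater { double temperature; double degrees_per_tick; };

// The system walks the packed array front to back. It never asks which
// entity (coal boiler, electric boiler, ...) a heater belongs to.
void run_fluid_heaters(ComponentArray<FluidHeater>& heaters) {
    for (FluidHeater& h : heaters.all()) {
        h.temperature += h.degrees_per_tick;
    }
}
```

The hash map is only there for the occasional lookup by entity (from Lua, say). The hot loop touches nothing but the vector, which is what lets the prefetcher do its job.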
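As a rough illustration of "entities are just IDs" and "a prototype is a blueprint", here is a small C++ sketch. Everything in it (the ComponentType enum, Archetype, World::instantiate) is a hypothetical name chosen for this example, and the component data itself is left out so that only the bookkeeping remains.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using EntityId = std::uint32_t;

enum class ComponentType { Inventory, FuelBurner, PowerGridConsumer, EnergyBuffer, Wheels };
constexpr std::size_t kComponentTypeCount = 5;

// A hypothetical archetype: a name plus the component types to instantiate.
struct Archetype {
    std::string name;
    std::vector<ComponentType> components;
};

// Per-type storage reduced to "which entities own a component of this type".
struct World {
    EntityId next_id = 0;
    std::array<std::vector<EntityId>, kComponentTypeCount> owners;

    // Creating an entity only hands out a fresh ID and registers components.
    EntityId instantiate(const Archetype& archetype) {
        const EntityId id = next_id++;
        for (ComponentType type : archetype.components) {
            const std::size_t slot = static_cast<std::size_t>(type);
            assert(slot < kComponentTypeCount);
            owners[slot].push_back(id);
        }
        return id;
    }

    bool has(EntityId id, ComponentType type) const {
        for (EntityId owner : owners[static_cast<std::size_t>(type)]) {
            if (owner == id) return true;
        }
        return false;
    }
};
```

A coal car and an electric car would then be two Archetype values that differ only in whether they list FuelBurner plus Inventory or PowerGridConsumer. The engine code that drives the Wheels component is the same for both.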
1. Power grid coverage
2. Energy buffer
3. Power grid supplier
Then all I have to do in my on_tick is add energy to the buffer based on wind speed, since that is the "special" behavior the C++ code doesn't model. This is how it might look:
Code:
    data:extend({
      {
        -- note there is no "type" here
        name = "wind-turbine",
        ..., -- graphics, animations, stuff
        components = {
          grid = {
            type = "power-grid-coverage",
          },
          buffer = {
            type = "energy-buffer",
            capacity = "100kJ",
          },
          generator = {
            type = "power-grid-supplier",
            source = "buffer",
          },
        },
      },
    })
The names of components are identifiers used by the game engine to connect components, so the high-performance C++ code knows where to take stuff from and where to put it. With this more fine-grained control over the behavior of entities it becomes easier to build more complex entities without having to run huge amounts of Lua code that slow the game down (and we all know modders like doing that), or having to special-case every combination of components as a dedicated C++ class. Components become the basic building blocks. Many folks in the industry attest to how simple it is to extend a game with such a system (both internally and by modders) because a lot of functionality is already there and can easily be re-used in other entities. I remember the KSP devs giving a talk at GDC about how it boosted their productivity once they transitioned to this system.
Now, for all I know the game might already work this way and just not expose it in the mod API, making all this talk moot. Otherwise this might be some food for thought for the devs in case they aren't familiar with the entity/component design. It may or may not be a significant endeavor that may or may not be worth it (certainly not if only for the benefit of modders).
Thanks for reading. Now if you'll excuse me, I have a mod to make.