[0.13.8] [Pending] on_research_finished research crash.
Posted: Thu Jul 14, 2016 4:07 am
While trying to update the Research Queue mod (viewtopic.php?f=92&t=17219) to work on Factorio v0.13.8, I ran into a MEMORY_ACCESS_EXCEPTION.
Was looking into the LUA code trying to figure out what could cause this exception and nothing was sticking out, so opened up Visual Studio and decided to Debug Factorio with it to try and see what was happening and found the cause of this bug.
I will use the symbol names below that VS gave me to help isolate where the issue is.
I also found another related issue which wont cause a crash, but instead an infinite loop (or close enough) in Factorio. As they are both related I will talk about them both here, however I will talk about the MEMORY_ACCESS_EXCEPTION one first.
Quick background into the code
In the function ResearchManager::onResearchProgressChanged you do a call to Technology::setResearchAndApply. During that function call another call is made out to the LUA on_research_finished event. After Technology::setResearchAndApply finishes, you decrement a value (I will call it "Research Flag" from now on, it has no symbol associated with it), check Research Flag is zero and returns.
If a LUA script calls game.forces[event.research.force.name].current_research = (new value) then this will cause a new research to be started. As part of that process Research Flag is set to 1.
Memory Access Crash
The game does need to be in a specific state for this crash to occur, otherwise instead of crashing, it will just cause an infinite loop. The state the game needs to be in is:
No research is currently going on.
A LUA script calls game.forces[some force].current_research = (new value) anywhere in its code.
Once this happens, the following will cause the crash.
The issue occurs when the LUA script call to set the research is done during the on_research_finished event. In this case:
Technology::setResearchAndApply is called
The LUA script is called
The LUA script will set the new technology to research, Research Flag is set to 1.
Technology::setResearchAndApply returns
Research Flag's value is decremented by 1.
Research Flag's value is checked against zero, is found equal and the function returns.
However the game is now in a unusual state, because research is going on but ResearchFlag's value is set to zero. If the next time ResearchManager::onResearchProgressChanged() is called the LUA script again sets a new research, the above steps will again happen, the game will continue to run, but will still be in this unexpected state.
The next step of this issue occurs when the LUA script does not set a new research. In this case the following happens
Technology::setResearchAndApply is called
The LUA script is called
The LUA script does not set the new technology to research, Research Flag stays at 0.
Technology::setResearchAndApply returns
Research Flag's value is decremented by 1. It now is -1
Research Flag's value is checked against zero, is found not to be equal and the code continues.
Code reads from a pointer that is uninitialized (nullptr), this causes a MEMORY_ACCESS_EXCEPTION and the game crashes.
Infinite Loop
To set up this case, the game must already be doing a research when Research Queue is told to start up a new queue that has the same technology. (eg, you are researching Automation 1, and Research Queue is told to research everything up to Electronics 1).
In this case during the last stage of the MEMORY ACCESS crash, the following will occur:
Technology::setResearchAndApply is called
The LUA script is called
The LUA script does not set the new technology to research, Research Flag is stays at 0.
Technology::setResearchAndApply returns
Research Flag's value is decremented by 1. It now is -1
Research Flag's value is checked against zero, is found not to be equal and the code continues.
Code checks a pointer (but in this case it is initialized) and continues on, everything seems to be working - but we are still in a bad state.
After some time a new research is told to start. As part of the code to start a new research it will clear any existing researches and as it thinks there is existing research (Research Flag's value is != 0) it will cause an almost infinite loop by constantly trying to clear research until research flag is set back to zero.
Was looking into the LUA code trying to figure out what could cause this exception and nothing was sticking out, so opened up Visual Studio and decided to Debug Factorio with it to try and see what was happening and found the cause of this bug.
I will use the symbol names below that VS gave me to help isolate where the issue is.
I also found another related issue which wont cause a crash, but instead an infinite loop (or close enough) in Factorio. As they are both related I will talk about them both here, however I will talk about the MEMORY_ACCESS_EXCEPTION one first.
Quick background into the code
In the function ResearchManager::onResearchProgressChanged you do a call to Technology::setResearchAndApply. During that function call another call is made out to the LUA on_research_finished event. After Technology::setResearchAndApply finishes, you decrement a value (I will call it "Research Flag" from now on, it has no symbol associated with it), check Research Flag is zero and returns.
If a LUA script calls game.forces[event.research.force.name].current_research = (new value) then this will cause a new research to be started. As part of that process Research Flag is set to 1.
Memory Access Crash
The game does need to be in a specific state for this crash to occur, otherwise instead of crashing, it will just cause an infinite loop. The state the game needs to be in is:
No research is currently going on.
A LUA script calls game.forces[some force].current_research = (new value) anywhere in its code.
Once this happens, the following will cause the crash.
The issue occurs when the LUA script call to set the research is done during the on_research_finished event. In this case:
Technology::setResearchAndApply is called
The LUA script is called
The LUA script will set the new technology to research, Research Flag is set to 1.
Technology::setResearchAndApply returns
Research Flag's value is decremented by 1.
Research Flag's value is checked against zero, is found equal and the function returns.
However the game is now in a unusual state, because research is going on but ResearchFlag's value is set to zero. If the next time ResearchManager::onResearchProgressChanged() is called the LUA script again sets a new research, the above steps will again happen, the game will continue to run, but will still be in this unexpected state.
The next step of this issue occurs when the LUA script does not set a new research. In this case the following happens
Technology::setResearchAndApply is called
The LUA script is called
The LUA script does not set the new technology to research, Research Flag stays at 0.
Technology::setResearchAndApply returns
Research Flag's value is decremented by 1. It now is -1
Research Flag's value is checked against zero, is found not to be equal and the code continues.
Code reads from a pointer that is uninitialized (nullptr), this causes a MEMORY_ACCESS_EXCEPTION and the game crashes.
Infinite Loop
To set up this case, the game must already be doing a research when Research Queue is told to start up a new queue that has the same technology. (eg, you are researching Automation 1, and Research Queue is told to research everything up to Electronics 1).
In this case during the last stage of the MEMORY ACCESS crash, the following will occur:
Technology::setResearchAndApply is called
The LUA script is called
The LUA script does not set the new technology to research, Research Flag is stays at 0.
Technology::setResearchAndApply returns
Research Flag's value is decremented by 1. It now is -1
Research Flag's value is checked against zero, is found not to be equal and the code continues.
Code checks a pointer (but in this case it is initialized) and continues on, everything seems to be working - but we are still in a bad state.
After some time a new research is told to start. As part of the code to start a new research it will clear any existing researches and as it thinks there is existing research (Research Flag's value is != 0) it will cause an almost infinite loop by constantly trying to clear research until research flag is set back to zero.