mrvn wrote: With fork() on the other hand you just copy the page tables to mark memory copy-on-write.
Clarification: At a hardware level, the pages are not marked copy-on-write -- they are marked read-only. The OS remembers internally (purely in software) that these pages are to be treated as copy-on-write.
mrvn wrote: That would still mean it runs in parallel with the game.
No, that is incorrect. While the serialization work would execute in parallel with the game thread, the copy(-on-write) operations do not run in parallel with the game thread. The sequence that actually happens is:
1. fork() marks all the writable private pages as read-only (in both the parent and the child)
...
2. the game thread tries to update (write to) some memory (in a page it hasn't written since step #1)
3. that write operation page-faults (fails)
4. the hardware suspends the game thread and invokes the OS page fault handler (on x86, exception vector 14, #PF)
5. the OS page fault handler then examines the page fault information provided to it by the hardware and determines that the fault is due to a write to a copy-on-write page
(The details the hardware provides include that this was a write, the address of the instruction that was trying to do the write, and the data address that faulted, which for an unaligned write that straddles a page boundary could be slightly different from the address that was written, along with various other bits and bobs. The hardware does not tell the OS what value was being written.)
6. the OS copies the page and adjusts the game thread's page table to have a read-write page table entry pointing to the new copy
(It could in theory change everyone else to point to the new copy instead, but that would in general be more costly.)
7. the OS returns execution back to the game thread (which has been suspended for the entire copy operation)
8. the game thread reissues the write that previously failed, and this time it works
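The steps above can be sketched as a toy model. This is not how any real kernel is written; it is a minimal Python simulation, and the names `Frame`, `PTE`, `Process`, the 4096-byte page size, and the reference-counting shortcut (skip the copy when nobody else still shares the page) are all assumptions made for illustration. The point it shows is that the "hardware" only looks at the `writable` bit, the `cow` flag is pure OS bookkeeping, and the writer cannot proceed until the handler has finished.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class Frame:
    """A physical page frame plus an OS-side reference count."""

    def __init__(self, data):
        self.data = bytearray(data)  # private copy of the bytes
        self.refs = 1


class PTE:
    """Page table entry. Hardware checks only `writable`; `cow` is OS bookkeeping."""

    def __init__(self, frame, writable, cow=False):
        self.frame = frame
        self.writable = writable
        self.cow = cow


class Process:
    def __init__(self, num_pages):
        self.table = [PTE(Frame(bytes(PAGE_SIZE)), True) for _ in range(num_pages)]
        self.faults = 0  # page faults taken by this process

    def fork(self):
        """Step 1: share every frame and mark both mappings read-only."""
        child = Process(0)
        for pte in self.table:
            pte.writable, pte.cow = False, True
            pte.frame.refs += 1
            child.table.append(PTE(pte.frame, False, cow=True))
        return child

    def read(self, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        return self.table[page].frame.data[offset]

    def write(self, addr, value):
        """Steps 2-8: the write completes only after the fault handler returns."""
        page, offset = divmod(addr, PAGE_SIZE)
        pte = self.table[page]
        if not pte.writable:            # steps 3-4: the "hardware" raises a fault
            self._page_fault(pte)       # the writer is stalled for all of this
        pte.frame.data[offset] = value  # step 8: the write is reissued

    def _page_fault(self, pte):
        self.faults += 1
        if not pte.cow:                 # step 5: is this really copy-on-write?
            raise MemoryError("write to a genuinely read-only page")
        if pte.frame.refs > 1:          # step 6: copy only if still shared
            old = pte.frame
            old.refs -= 1
            pte.frame = Frame(old.data)
        pte.writable, pte.cow = True, False


game = Process(num_pages=4)
game.write(0, 1)               # before fork: no fault
saver = game.fork()
game.write(0, 2)               # first write to page 0 after fork: fault + copy
game.write(1, 3)               # same page again: no fault
game.write(PAGE_SIZE, 9)       # page 1: another fault + copy
print(game.faults, game.read(0), saver.read(0))  # prints "2 2 1"
```

In this model the copy happens inside `write()`, on the writer's own call path, which is the serialization being described: the saver keeps seeing the old byte, but the game pays for the copy.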
As you can see, the entire copy-on-write operation is serialized with the game thread, not in parallel. fork() is a low-performance solution (relative to the scheme I suggested) because:
1. For every single page copied you have to pay the price of a page fault, including two user/kernel transitions (which pollute your cache), not to mention time spent actually executing the OS code and updating the page table. (There are no page faults in my scheme.)
2. That copying is single-threaded. That's bad for performance because it generally takes more than one thread to fully saturate the memory system of a decent gaming box, so the copy will be much slower than necessary. (My scheme allows for a multithreaded copy operation.)
3. Data that doesn't need to be copied will be copied. E.g. when the game does graphics and sound work, it writes to memory, and those writes trigger copy-on-write. That data doesn't need to be copied, though, because it's not used by the save-game thread. (My scheme allows in-process separation of such data so it doesn't get copied. To do the same with fork you'd have to move all such data to yet another process, which would then likely have a significant, possibly crippling, performance impact on normal tick updates.)
4. It only copies a single page at a time, and single-page copies are slightly slower than copying superpages due to page-bank misses in the DRAM. (My scheme allows copying superpages. Yes, this one is minor, but I hate wasting performance.)
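To put a rough number on point 1, here is a back-of-the-envelope sketch. It is pure arithmetic, not a measurement: the helper name `cow_faults`, the 1 GiB region, and the 4 KiB / 2 MiB page sizes are assumptions for illustration, and it assumes a contiguous, page-aligned region in which every page gets written at least once after the fork.

```python
def cow_faults(bytes_dirtied, page_size=4096):
    """Pages spanned by a contiguous, page-aligned region: one fault per page."""
    return -(-bytes_dirtied // page_size)  # ceiling division


GIB = 1 << 30  # hypothetical amount of game state dirtied after fork()

print(cow_faults(GIB))           # 4 KiB pages: prints 262144
print(cow_faults(GIB, 2 << 20))  # if handled per 2 MiB superpage: prints 512
```

Whether a given OS actually handles copy-on-write at superpage granularity is OS- and configuration-specific; the second line only shows how strongly the fault count depends on the copy granularity.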
Oh, and fork() isn't available on Windows. My scheme works under all OSes. Why is anyone still talking about fork?
(The devs aren't going to do any of this so the discussion is not useful for that. But I was once again "triggered" by people posting *false* information.)