Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:03 pm
by Klonan

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:03 pm
by posila
What I wrote originally was pretty technical, and it was hard to digest even for the non-graphics programmers on the team, so we decided to lighten it up a lot. I decided to put some of the original content in this post, in case some of you are interested in reading it. I might add more info to this post if I feel something needs further explanation.

The vertex buffer streaming articles (and others on the same topic in general) suggest using a vertex buffer of fixed size. Map the buffer with the D3D11_MAP_WRITE_NO_OVERWRITE/GL_MAP_UNSYNCHRONIZED_BIT flag, copy your batch into a part that has not been used by previous draw calls, unmap it, and make a draw call. Once the buffer is full, map it with the D3D11_MAP_WRITE_DISCARD/GL_MAP_INVALIDATE_BUFFER_BIT flag. This lets the driver know you don’t care about the previous content of the buffer anymore. The driver either gives you the same chunk of memory, if there are no pending draw calls using the buffer, or allocates new memory for you and keeps the old buffer around until the pending draw calls have finished. Chances are the driver will end up reusing previously allocated buffers for subsequent discards, so the system stabilizes in a state where there are no more dynamic memory allocations. OpenGL refers to this pattern as buffer orphaning.
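To make the bookkeeping concrete, here is a minimal sketch of the offset tracking behind that pattern. It does not call any graphics API; `MapMode`, `StreamRegion` and `StreamCursor` are hypothetical names made up for illustration, and the returned mode only says which kind of map flag a real backend would pass.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the two kinds of map flags described above.
enum class MapMode { NoOverwrite, Discard };

// Where the next batch goes and how the buffer should be mapped for it.
struct StreamRegion {
    MapMode mode;
    std::size_t offset;  // byte offset inside the fixed-size buffer
};

// Tracks how much of a fixed-size streaming buffer has been handed out.
class StreamCursor {
public:
    explicit StreamCursor(std::size_t capacity) : capacity_(capacity) {}

    // Reserve `bytes` for the next batch.
    StreamRegion reserve(std::size_t bytes) {
        assert(bytes <= capacity_ && "batch larger than the whole buffer");
        if (used_ + bytes > capacity_) {
            // Not enough room left: discard (orphan) the buffer and
            // start writing from the beginning again.
            used_ = bytes;
            return {MapMode::Discard, 0};
        }
        // Still room: append after the data that pending draw calls
        // may be reading, without touching it.
        const std::size_t offset = used_;
        used_ += bytes;
        return {MapMode::NoOverwrite, offset};
    }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
};
```

With a capacity of 100 bytes, reserving 60 and then 30 bytes appends at offsets 0 and 60; a further 30-byte request no longer fits, so it comes back as a discard at offset 0.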
So we implemented this version of streaming, and it was pretty fast, but we noticed that mapping a buffer to system memory still took a lot of time even with the D3D11_MAP_WRITE_NO_OVERWRITE/GL_MAP_UNSYNCHRONIZED_BIT flags, as did the memcpy of a batch into the buffer. That’s why we map the buffer once, write vertices into it directly, and write as many batches as we can before we unmap it. Then we loop through the list of prepared batches and issue a draw call for each batch. This eliminated the map/unmap per batch and the unnecessary memcpy, and gave us a nice boost. We still use streaming though, because we often need to flush the buffer before it is full (when we change the render target, or render something other than sprites …)
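A rough sketch of the "map once, write many batches" idea, under stated assumptions: the `mapped` pointer stands in for whatever a single map call would return, and `Vertex`, `Batch` and `BatchWriter` are hypothetical names, not the actual engine types. Vertices are written straight into the mapped memory, and each batch is only remembered as a range to draw later.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, u, v; };

// A prepared batch: a contiguous range of vertices in the mapped buffer.
struct Batch {
    std::size_t first;
    std::size_t count;
};

class BatchWriter {
public:
    // `mapped` plays the role of the pointer obtained from one map call.
    BatchWriter(Vertex* mapped, std::size_t capacity)
        : mapped_(mapped), capacity_(capacity) {
        assert(mapped != nullptr || capacity == 0);
    }

    // Write one vertex in place; no staging array, no memcpy per batch.
    // Returns false when the buffer is full and must be flushed.
    bool push(const Vertex& v) {
        if (used_ == capacity_) return false;
        mapped_[used_++] = v;
        return true;
    }

    // Close the current batch (for example when the texture changes).
    // Empty batches are skipped.
    void endBatch() {
        if (used_ > batchStart_) {
            batches_.push_back({batchStart_, used_ - batchStart_});
            batchStart_ = used_;
        }
    }

    // After unmapping, the caller loops over these and issues one
    // draw call per entry.
    const std::vector<Batch>& batches() const { return batches_; }

private:
    Vertex* mapped_;
    std::size_t capacity_;
    std::size_t used_ = 0;
    std::size_t batchStart_ = 0;
    std::vector<Batch> batches_;
};
```

A flush would then unmap the buffer once, walk `batches()` to submit the draws, and start over with a fresh map.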

After all of this, the new rendering code was already faster than the old one, but we noticed that calling our draw functions carried a lot of overhead. At first we thought it was due to memory latency, but adding prefetching to this code sped it up only very little. After looking into the generated assembly code, we realized the draw function uses many CPU registers, the values of which are backed up on the stack when the function is entered and restored when it exits. This created a lot of overhead, because we were iterating through sprite draw commands and calling render on them one by one. We changed render to operate over a range of sprite draw commands instead and gained quite a large additional speedup.
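The change in calling convention can be sketched roughly like this. `SpriteCommand`, `QuadVertex` and `renderRange` are hypothetical names, and the vertex math is a placeholder; the point is only that the loop lives inside the function, so the register save/restore at entry and exit is paid once per range instead of once per sprite.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SpriteCommand { float x, y, w, h; };
struct QuadVertex { float x, y; };

// Range version: one call handles [first, last), so the function's
// entry/exit cost is shared by every sprite in the range.
void renderRange(const SpriteCommand* first, const SpriteCommand* last,
                 std::vector<QuadVertex>& out) {
    assert(first <= last);
    out.reserve(out.size() + 4 * static_cast<std::size_t>(last - first));
    for (const SpriteCommand* c = first; c != last; ++c) {
        // Emit the four corners of the sprite's quad.
        out.push_back({c->x,        c->y});
        out.push_back({c->x + c->w, c->y});
        out.push_back({c->x + c->w, c->y + c->h});
        out.push_back({c->x,        c->y + c->h});
    }
}
```

The per-sprite equivalent would be an outer loop calling a `render(command)` function for every element. A compiler may inline such a call on its own, so whether this helps depends on the actual function and is worth checking in the generated assembly, as described above.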

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:12 pm
by steinio
Nice to read about your progress but boring stuff.

Maybe add a random screenshot of the new GUI :)

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:15 pm
by Twinsen
steinio wrote:Nice to read about your progress but boring stuff.

Maybe add a random screenshot of the new GUI :)
You just earned yourself 2 more weeks about blueprint library, mister.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:16 pm
by MasterBuilder
Twinsen wrote:
steinio wrote:Nice to read about your progress but boring stuff.

Maybe add a random screenshot of the new GUI :)
You just earned yourself 2 more weeks about blueprint library, mister.
You say that like it's a bad thing :)

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:17 pm
by CakeDog
Excellent write-up. The technical aspects of optimization are often overlooked and rarely appreciated as much as they should be.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:28 pm
by fendy3002
CakeDog wrote:Excellent write-up. The technical aspects of optimization are often overlooked and rarely appreciated as much as they should be.
With machines getting stronger and stronger nowadays, optimization is often overlooked in favor of faster development. Bethesda games are a prime example. It's very nice to see Factorio still trying to keep optimization a high priority. I know it is hard, unfun, and often unrewarding, but it is very useful for the existing userbase.

On the other hand, today's topic is very hard indeed; hopefully there will be users who can give advice, ideas and input on it.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:30 pm
by Meddleman
And I was wondering what the hell I should do on my August holidays. Thanks for giving me a good excuse to visit Prague! If anyone wants to meet up for a few beers and a LAN session I'm more than happy to tag along.

Question for the devs, will these computers have 0.17 on them? Or the current 0.16 stable?

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:38 pm
by Klonan
Meddleman wrote:And I was wondering what the hell I should do on my August holidays. Thanks for giving me a good excuse to visit Prague! If anyone wants to meet up for a few beers and a LAN session I'm more than happy to tag along.

Question for the devs, will these computers have 0.17 on them? Or the current 0.16 stable?
0.16 of course :)

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:41 pm
by ratchetfreak
In newer opengl versions you can have buffers mapped persistently. Then you can do away with the map/unmap operation.

Did you also consider tessellation shaders? It's the more specialized little brother of geom shading that doesn't have its drawbacks.

It will let you expand a single vertex into a quad using a constant tess control shader (hull shader in D3D speak), and the tess eval shader (domain shader in D3D speak) is what the vertex shader used to be.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:45 pm
by eradicator
MasterBuilder wrote:
Twinsen wrote:
steinio wrote:Nice to read about your progress but boring stuff.
Maybe add a random screenshot of the new GUI :)
You just earned yourself 2 more weeks about blueprint library, mister.
You say that like it's a bad thing :)
Yea. I vote to extend to 4 weeks. Until you give in and give us directory trees :D.

How often are you guys at the Library?
I'm considering staying a day in Prague on July 30th (instead of going straight through it), but haven't yet entirely convinced myself.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:50 pm
by Vertigo
I know some of the words from posila's part of the post, but I'm happy that 4790k got some great results :lol:

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 3:55 pm
by posila
ratchetfreak wrote:In newer opengl versions you can have buffers mapped persistently. Then you can do away with the map/unmap operation.
I know, but we are really doing OpenGL just for legacy support, so it doesn't make sense for us to have two different backends for OGL 3.3 and 4.5. We will do Vulkan instead.
ratchetfreak wrote:Did you also consider tessellation shaders? It's the more specialized little brother of geom shading that doesn't have its drawbacks.
We didn't. We are targeting DirectX 10 class hardware, where tessellation is not available yet.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 4:01 pm
by DrNick
Calling it now, next FFF is "For a Few Frames More", followed by "The Good, The Bad, and the Poorly Optimized"

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 4:03 pm
by posila
DrNick wrote:Calling it now, next FFF is "For a Few Frames More", followed by "The Good, The Bad, and the Poorly Optimized"
;) I am still not sure about the last one :D

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 4:06 pm
by DaemosDaen
You mentioned that you saw better improvements on AMD cards. After reviewing the benchmarks, were there improvements for the NVidia cards? All I see are AMD and Intel adapters.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 4:13 pm
by Ohz
My question may sound stupid, but I'm not into computers:

What would be the best PC build for Factorio (what do you recommend)? Just aim for the most expensive CPU and GPU?

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 4:19 pm
by dasiro
Next we need to improve the GPU side of things, mainly excessive usage of video memory (VRAM),
YES PLEASE, (older) cards with less than 3GB are getting hit really hard so that would be awesome

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 4:22 pm
by dasiro
Klonan wrote:
Meddleman wrote:And I was wondering what the hell I should do on my August holidays. Thanks for giving me a good excuse to visit Prague! If anyone wants to meet up for a few beers and a LAN session I'm more than happy to tag along.

Question for the devs, will these computers have 0.17 on them? Or the current 0.16 stable?
0.16 of course :)
so "when it's done" is at least a few weeks to go :(

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 4:35 pm
by posila
DaemosDaen wrote:You mentioned that you saw better improvements on AMD cards. After reviewing the benchmarks, were there improvements for the NVidia cards? All I see are AMD and Intel adapters.
On the Y axis are CPU names, and we didn't mention the GPU because the optimization is supposed to affect just the CPU side of rendering. But we marked tests that ran on Intel integrated GPUs with an asterisk, because in those cases the CPU was already waiting on the GPU to finish rendering.

Here are benchmarks on my computer with the two different GPUs:
[Attached image: more-benchmarks2.png]