Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 4:56 pm
by Oktokolo
Twinsen wrote:You just earned yourself 2 more weeks about blueprint library, mister.
I'll take that, and one of those famous bots vs. belts FFFs for variety, please.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 5:00 pm
by DaemosDaen
posila wrote:
DaemosDaen wrote:You mentioned that you saw better improvements from AMD cards. After reviewing the benchmarks, were there improvements for the NVidia cards? All I see are AMD and Intel adapters.
On the Y axis are CPU names, and we didn't mention the GPU because the optimization is supposed to be just on the CPU side of the rendering. But we marked tests that ran on Intel integrated GPUs with an asterisk, because in those cases the CPU was already waiting on the GPU to finish rendering.
Fair enough, I misread the benchmark information then. That's what I get for reading this while working. :) Thanks for the follow-up benchmark.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 6:26 pm
by krystof1119
OK, so about the LAN party: when does it end, and if I'm unable to join, can I connect from home?

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 6:28 pm
by Jap2.0
Ohz wrote:What would be the best PC Build for Factorio (what do you recommend)? Just aim for the most expensive CPU and GPU?
You want good memory (RAM) latency (so a high speed helps; 16GB should be more than enough) and good single-thread CPU performance. Factorio isn't especially GPU-heavy - most any dedicated GPU with 3GB VRAM should be able to run it quite well. This was a good thread about which PC parts are good for Factorio.


Can we connect to the LAN party remotely? (I realize it's called LAN for a reason... but I'm about 4500 miles away.)

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 6:32 pm
by Tekky
Thank you very much for writing about the details of your problems with creating a good graphics engine.

As a programmer who is also interested in creating games, I find it very interesting to read about this in the Friday Facts.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 7:16 pm
by wren6991
What a great writeup! Thank you guys! (And also thanks to posila for posting the extra info in this thread). :D

Thanks also for the links to the articles, lots of interesting reading there

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 7:34 pm
by Omnifarious
Ohz wrote:My question may sound overly stupid, but I'm not into computers:

What would be the best PC Build for Factorio (what do you recommend)? Just aim for the most expensive CPU and GPU?
Just tell me how much money you have, and I'll sell you a PC for that amount of money. Want to spend a million dollars, I've definitely got the PC for you.

It'll come with Linux though. No negotiating on that point. But I will pre-load Factorio on it for you.

You might want to say "What would be the PC build on which I could run the largest possible factory with the most time-consuming to render stuff at a full 60/60 FPS/UPS? I don't care how much it costs as long as I'm not overcharged.". That would be better.

Otherwise, you know, I could tell you that a gold plated PC looked really nice and was excellent for playing Factorio on, but it'd cost you $50000. But, you said you wanted the most expensive one, and that is the most expensive one. Or, someone could tell you that you could run Factorio on a supercomputer cluster with 10000 nodes. That would probably be even more expensive, even though any nodes after the first wouldn't help you any (unless you ran Clusterio).

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 7:37 pm
by aland
One of the reasons I love Factorio is that it runs just fine on my antique Ubuntu laptop. I can't run it with all of the options at the best quality, but it runs well and is still a lot of fun. I appreciate all of the work put into optimization. It means Factorio will likely still keep working well, and maybe I can turn the quality options up a bit! :D

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 7:55 pm
by Tekky
Ohz wrote:My question may sound overly stupid, but I'm not into computers:

What would be the best PC Build for Factorio (what do you recommend)? Just aim for the most expensive CPU and GPU?
This has already been discussed in the following thread:

viewtopic.php?f=49&t=51532 Which PC for Factorio?

The thread also has a developer comment.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 9:02 pm
by ThePhantasm
posila -
I think the approach you want is implementing instancing directly in the shader. The drawbacks you mentioned are for the direct calls to ask the API/driver to do the instanced rendering.

What I have done in order to do faster sprite rendering in a modern API is to create a Vertex Buffer that contains the basic 6 vertices (assuming something like a (0,0,0)-(1,1,0) position) with the correct normal/texture coordinates. Then replicate that a few thousand times.
This is the primary rendering Vertex Buffer.

You then create a second Vertex Buffer Stream (DX terminology). In that, you only need 3 fields: New Translation, New Scale, New Texture Coordinates (assuming you're using a single texture for all your sprites).
You set the frequency of this vertex stream to match the size of your sprite (6 vertices).

Your vertex shader changes to look like:
pos = (vertex.pos * vertex_stream_1.scale) + vertex_stream_1.pos;
uv = (math to translate your texture # to the actual UV on the sprite-map).
If you pretend your sprite-map has 4 entries in a 2x2 grid, the UV coordinate math looks like:
u = (uv.u / 2) + ((vertex_stream_1.texture_num % 2) / 2.0); (modulo # of columns)
v = (uv.v / 2) + ((int)(vertex_stream_1.texture_num / 2) / 2.0); (divide by # of columns, throw away the fractional part, then scale by # of rows)

This is instancing implemented directly in your shaders and should save you quite a bit of CPU time.
The core vertex buffer never changes - you just change the second stream (instance data).
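
For anyone who wants to check the UV math above on the CPU, here is a minimal Python sketch of the same per-vertex computation. It is only an illustration under assumptions: the function names atlas_uv and instance_position are made up for this example, the sprite-map is assumed to be a square grid of equally sized cells numbered row by row, and the real thing would live in a vertex shader rather than in Python.

```python
def atlas_uv(u, v, texture_num, grid=2):
    """Map a unit-quad UV (0..1) into cell `texture_num` of a grid x grid atlas.

    Cells are numbered row by row (an assumption of this sketch), so the
    column is texture_num % grid and the row is texture_num // grid.
    """
    col = texture_num % grid
    row = texture_num // grid
    # Same as (u / grid) + (col / grid) and (v / grid) + (row / grid).
    return ((u + col) / grid, (v + row) / grid)


def instance_position(vertex_pos, scale, translation):
    """pos = (vertex.pos * scale) + translation, applied per component."""
    return tuple(p * s + t for p, s, t in zip(vertex_pos, scale, translation))
```

With a 2x2 grid, the top-left corner of texture 1 lands at (0.5, 0.0) and the top-left corner of texture 2 at (0.0, 0.5), which is what the per-instance stream is supposed to select.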

Re: Friday Facts #251 - A Fistful of Frames

Posted: Fri Jul 13, 2018 10:11 pm
by Light
I am pleased this news is about something more important to most if not all of us.

A long time ago I felt it odd that this game seemed to utilise the CPU more than the GPU, as I use multiple systems which are drastically different in hardware yet the better CPU seemed to always win out. The gaming oriented PC performed well enough, yet the workstation PC with a better CPU and shitty GPU managed to slightly overtake it.

Then with the introduction of the HD textures things started to change, as performance tanked on all of them except the one with the better GPU (likely due to higher VRAM). However, the difference between the high and medium presets in terms of performance was jaw-dropping, and I've since stuck with medium settings, despite being capable of using high, until things are better optimised.

I've since been wondering if I could ever return to high settings without such a strong performance hit and lost hope that it was something the devs were looking at until today. Now I have something to look forward to in the future update.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Sat Jul 14, 2018 12:11 am
by QGamer
I am so happy that you have taken the time to optimize your game.
There are some other games that I play that have framerate issues, but I've never had those issues with Factorio. Maybe when you're done with Factorio could you optimize those other games? ;)

But seriously, thank you. :D
Can't wait until next Friday!

Re: Friday Facts #251 - A Fistful of Frames

Posted: Sat Jul 14, 2018 12:28 am
by Griffork
posila wrote:What I wrote originally was pretty technical and it was pretty hard to digest even for other non-graphics programmers on the team so we decided to lighten it up a lot. I decided to put some of the original content in this post, in case some of you are interested to read it.
I'm glad you decided to write about this because (as someone who's written a renderer for work) I'm quite interested in this! I worked in WebGL, which doesn't have fancy things like geometry shaders. I've included the things that I've learnt that seem relevant, although I'm sure you already know a lot of this.

If you're fragment bound (like you might be on older/integrated GPUs):
- What order are you drawing your sprites in? You should be drawing them front to back (so trees and poles first, then buildings, then conveyors and tracks, and then ground).
- If you want to get really fancy, you can draw everything from the bottom of the screen to the top, add a z-index in the vertex shader (e.g. all buildings are z-index 0.2), and set things up so that the GPU can't write to a place whose z-index is the same as or greater than what you've already drawn, so ground would only get evaluated around what else has already been drawn.
- Assuming you're writing some of your own fragment shaders: if you have any complex fragment shaders, don't rely on if statements to give you a speedup. I can go into more detail on this, but in a lot of cases both sides (if and else) of an if statement are executed on a GPU.
- Try and make better fitting shapes than just a square for sparsely filled sprites (e.g. trees) because evaluating 20 pixels where nothing is being drawn will take longer than processing an extra 6 vertices.
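
To make the front-to-back point concrete, here is a small Python sketch that counts how many fragments would get shaded on a single row of pixels with a simple depth test. It is a toy model under assumptions: the sprites are fully opaque 1D spans, the function name shaded_fragments is invented for this example, and smaller depth means closer to the camera. Semi-transparent sprites would still need back-to-front blending, so this only applies to the opaque parts.

```python
def shaded_fragments(sprites, width, front_to_back):
    """Count fragment-shader runs for opaque 1D 'sprites' on a row of pixels.

    Each sprite is (start, end, depth); a smaller depth is closer to the camera.
    A fragment is shaded only if it is closer than what the depth buffer holds.
    """
    order = sorted(sprites, key=lambda s: s[2], reverse=not front_to_back)
    depth_buffer = [float("inf")] * width
    shaded = 0
    for start, end, depth in order:
        for x in range(start, end):
            if depth < depth_buffer[x]:  # depth test passes: shade and write
                depth_buffer[x] = depth
                shaded += 1
    return shaded


# Hypothetical scene: ground across the whole row, a building, and a tree on top.
scene = [(0, 10, 0.9), (2, 6, 0.5), (3, 5, 0.1)]
```

In this scene, drawing front to back shades 10 fragments while back to front shades 16, because the ground is no longer evaluated under the building and the tree.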

If you're draw call bound:
As you've already discovered, rebinding things or swapping things out takes time. The most expensive is to change a program, and the next most expensive is to change a texture. Try to make sure you draw everything with the same program all at once (e.g. all buildings, electricity poles (not cables), conveyor belts and rails together, and draw their shadows afterwards). Do reuse programs as much as possible (it seems like you're already doing this, but just in case); most of your sprites are pretty similar (some have a basic animation, but it's not a costly lookup and should be done in the vertex shader), so they should be able to use the same program.
If you're still draw call bound and you're not already doing so, you can definitely draw multiple types of things in the same draw call (e.g. belts and assemblers), just passing different values to the draw program (e.g. the location of the sprite on the atlas - assuming you're atlassing them together).
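
As a rough illustration of why grouping by program and then by texture helps, here is a Python sketch that counts state switches for a list of draws before and after sorting. The names count_state_changes and batch_order and the (program, texture) tuples are hypothetical, and the sketch assumes the draws can be reordered freely, which in practice only holds within a layer where draw order does not change the picture.

```python
def count_state_changes(draws):
    """Count program and texture switches for a list of (program, texture) draws."""
    program_switches = texture_switches = 0
    current_program = current_texture = None
    for program, texture in draws:
        if program != current_program:
            program_switches += 1
            current_program = program
        if texture != current_texture:
            texture_switches += 1
            current_texture = texture
    return program_switches, texture_switches


def batch_order(draws):
    """Group draws by program first (the costlier switch), then by texture."""
    return sorted(draws, key=lambda d: (d[0], d[1]))


# Hypothetical frame: sprites and shadows interleaved across two atlases.
frame = [("sprite", "atlas0"), ("shadow", "atlas1"), ("sprite", "atlas0"),
         ("shadow", "atlas1"), ("sprite", "atlas1")]
```

For this made-up frame the interleaved order needs 5 program switches and 4 texture switches, while the batched order needs 2 and 3.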

If you're vram bound:
This may not be as useful, but I'm curious as to whether you're running out of vram or if it's taking a while to access. If it's the latter then the only thing I can recommend which you're probably already doing is to render everything with the same texture at the same time if you can. I realised you're exceeding vram limits for low-end graphics cards, so nothing I said was relevant. Sorry!
This is not really a problem I've run into previously because my use-case is too different (when working in 3D you have mipping).

Very happy to talk more about this if you want and pass along any of my learnings if it's useful :)


EDIT:
ThePhantasm wrote:What I have done in order to do faster sprite rendering in a modern API is to create a Vertex Buffer that contains the basic 6 vertices (assuming something like a (0,0,0)-(1,1,0) position) with the correct normal/texture coordinates. Then replicate that a few thousand times.
Exactly what I was trying to get at but after reading their post I got the impression that they're already doing that (at least I hope so).
Actually, if you're already using an index buffer you probably want to have only 4 vertices in the buffer and use something like a triangle fan, with something like [3, 3, 1, 2, 4, 4] in the index buffer for each sprite (the duplicates at the start and end prevent the GPU from attempting to draw stuff between sprites), so that all of the supporting information only has to be written 4 times rather than 6 times. This is useful because often what causes a draw bottleneck is the speed of the bus between the CPU and GPU, and you want to limit the amount of data you're trying to send.
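
Here is a small Python sketch of the index pattern described above, to show how the duplicated indices behave. It treats the list as a triangle strip, which is an assumption (the duplicate-index trick is normally a strip technique, not a fan one), uses 0-based indices instead of the 1-based ones in the post, and the function names are invented for the example.

```python
def quad_strip_indices(quad_count):
    """Build one triangle-strip index list for quads that each own 4 vertices.

    Per quad the pattern is [2, 2, 0, 1, 3, 3] (0-based), i.e. the post's
    [3, 3, 1, 2, 4, 4]; the repeated first and last index create zero-area
    triangles that bridge from one quad to the next.
    """
    indices = []
    for q in range(quad_count):
        base = 4 * q
        indices += [base + 2, base + 2, base, base + 1, base + 3, base + 3]
    return indices


def visible_triangles(strip):
    """Decode a triangle strip and drop degenerate (repeated-index) triangles.

    Winding order is ignored here; only which vertices form a triangle matters.
    """
    triangles = []
    for i in range(len(strip) - 2):
        tri = (strip[i], strip[i + 1], strip[i + 2])
        if len(set(tri)) == 3:
            triangles.append(tri)
    return triangles
```

Decoding the strip for two quads leaves exactly two triangles per quad and nothing connecting them, while each quad only needs 4 vertices' worth of data in the vertex buffer.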

Actually, on that note, something that will cause your game to slow down is error checking or reading from a map (e.g. getPixel). Don't do either of those at run-time: error checking (gl.getError, or whatever it is called) will cause the CPU to wait for the GPU to finish, and the CPU should never have to wait on the GPU. Only do error checking at startup when you're making programs, buffers and textures, not during run-time.

It's been a couple of years since I did this stuff in detail so I might be a little rusty at points (function call names and whether to use fan or strip), but if it's going to be useful to you I can look up the correct info.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Sat Jul 14, 2018 12:32 am
by HolySmoke
On the topic of quad vertices, have you tried drawing them as triangle strips? Could eliminate the index buffer.

Re: Friday Facts #251 - A Fistful of Frames

Posted: Sat Jul 14, 2018 12:57 am
by Vandroiy
I'm confused about how there can be all this detailed performance knowledge going around while the topic is performance with 25k sprites.

I don't really know performance details like in posila's first reply, but I wrote a sprite renderer in OpenGL 4.3 and F# that can render a mix of hundreds of thousands of CPU-pushed sprites and millions of GPU-pushed particle sprites smoothly on fairly old hardware (as long as it's not limited by overdraw/fragment shader stuff).

The method was a vertex shader that gets an ID as input and then grabs everything from VRAM buffers. For example, a buffer that can fetch you texture page and position from a sprite ID. Where necessary, these buffers would hold IDs that reference into other buffers, with data structures chosen to minimize redundant updates and allow on-GPU movement of particles, which allows the big particle counts. The vertex shader to unpack everything always has to run six times per sprite, 'cause of the six vertices for a rectangle, but who cares? At mere millions of invocations per frame, isn't this just warmup for the gazillions of shader units GPUs have?
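
A rough CPU-side emulation of that "vertex shader gets an ID and fetches everything else" idea, written in Python purely for illustration. The buffer layouts, the CORNERS table and the name fetch_vertex are all hypothetical stand-ins, not the actual F#/OpenGL code described in the post.

```python
# Corner of the unit quad used by each of the six vertices (two triangles).
CORNERS = [(0, 0), (1, 0), (1, 1), (0, 0), (1, 1), (0, 1)]


def fetch_vertex(vertex_id, sprite_buffer, page_buffer):
    """Emulate a vertex shader that receives only an ID and pulls the rest.

    sprite_buffer[sprite_id] -> (x, y, page_id)
    page_buffer[page_id]     -> (width, height) of sprites on that texture page
    Both layouts are made-up stand-ins for the VRAM buffers in the post.
    """
    sprite_id, corner = divmod(vertex_id, 6)  # six invocations per sprite
    x, y, page_id = sprite_buffer[sprite_id]
    width, height = page_buffer[page_id]      # an ID referencing another buffer
    cx, cy = CORNERS[corner]
    return (x + cx * width, y + cy * height, page_id)


sprites = [(10, 20, 0), (50, 60, 1)]
pages = [(32, 32), (64, 16)]
```

The CPU only has to update the small per-sprite records; the six-fold expansion happens per vertex on the fetch side.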

There was nothing like fancy invalidate buffer flags or such, and the CPU code was the usual garbage collected .NET spam, plus the not-always-fast F# on top. Low-level performance is surely no match here. But five-digit sprite counts weren't an issue. Is there maybe something in this approach that could be useful for Factorio?

Re: Friday Facts #251 - A Fistful of Frames

Posted: Sat Jul 14, 2018 1:31 am
by Demongornot
I have never done any GPU computing or graphics engine code so I might be writing the most stupid thing ever here, but wouldn't an initial (X, Y) coordinate with an X and Y offset be better?
If the GPU can always finish faster than the CPU on your code, wouldn't this mean the CPU has to send less initial data, while the GPU, which can handle many calculations at once, transforms (X, Y) + offsets as follows
(guessing you render in the standard clockwise order starting at the top left): (X1 & X4, Y1 & Y4) = (X, Y); (X2, Y2) = (X + X offset, Y); (X3 & X5, Y3 & Y5) = (X + X offset, Y + Y offset); (X6, Y6) = (X, Y + Y offset).
So the (X, Y) of vertices 1, 2, 3 and 4, 5, 6 of the two polygons are all derived from the offsets, reducing what the CPU sends and combining the calculations per vertex?
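
The expansion proposed above can be written out as a short Python sketch, just to show the vertex order. The function name expand_sprite is made up, and on a real GPU this step would run in a shader rather than on the CPU.

```python
def expand_sprite(x, y, x_offset, y_offset):
    """Expand one (X, Y) corner plus offsets into the six vertices of two triangles."""
    top_left = (x, y)
    top_right = (x + x_offset, y)
    bottom_right = (x + x_offset, y + y_offset)
    bottom_left = (x, y + y_offset)
    # Vertices 1-3 form the first triangle, 4-6 the second; 1/4 and 3/5 coincide.
    return [top_left, top_right, bottom_right,
            top_left, bottom_right, bottom_left]
```

In this form the CPU would send 4 numbers per sprite instead of 12 coordinates, at the cost of a few extra additions per vertex.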

Re: Friday Facts #251 - A Fistful of Frames

Posted: Sat Jul 14, 2018 6:17 am
by Jerry Oak
Are computers in NTK accessible to all visitors, or do I have to pay the subscription?

Re: Friday Facts #251 - A Fistful of Frames

Posted: Sat Jul 14, 2018 7:05 am
by shaman
@posila:
For me as a graphics programmer, your post was very interesting.
So far, I have always implemented sprites in the Geometry Shader. But after your investigation, I think I'll try out the method with a fixed index buffer and the sprite transformation in the vertex shader the next time I need it.
More of that, please!

Re: Friday Facts #251 - A Fistful of Frames

Posted: Sat Jul 14, 2018 8:03 am
by ske
Do you have statistics on your players' hardware, how big their factories actually are and what FPS and UPS they are running the game at?

What percentage is pushing the limits with their big factories?

Re: Friday Facts #251 - A Fistful of Frames

Posted: Sat Jul 14, 2018 8:25 am
by lottery248
fam.

i got a question for you: there are not enough CPUs for overall testing, and the GPU test was not wide enough, like missing the 8-core tier (???)

by the way, the NTK will have Factorio, does that mean this game originated in the Czech Republic?