Friday Facts #251 - A Fistful of Frames

Regular reports on Factorio development.
Oktokolo
Filter Inserter
Posts: 875
Joined: Wed Jul 12, 2017 5:45 pm

Re: Friday Facts #251 - A Fistful of Frames

Post by Oktokolo »

Twinsen wrote:You just earned yourself 2 more weeks about blueprint library, mister.
I'll take that, and one of those famous bots vs. belts FFFs for variety, please.

DaemosDaen
Long Handed Inserter
Posts: 69
Joined: Sat May 16, 2015 4:39 am

Re: Friday Facts #251 - A Fistful of Frames

Post by DaemosDaen »

posila wrote:
DaemosDaen wrote: You mentioned that you saw better improvements from AMD cards. After reviewing the benchmarks, were there improvements for the NVidia cards? All I see are AMD and Intel adapters.
On the Y axis are CPU names, and we didn't mention the GPU because the optimization is supposed to affect just the CPU side of the rendering. But we marked tests that ran on Intel integrated GPUs with an asterisk, because in those cases the CPU was already waiting on the GPU to finish rendering.
Fair enough, I misread the benchmark information then. That's what I get for reading this while working. :) Thanks for the follow-up benchmark.

krystof1119
Manual Inserter
Posts: 4
Joined: Sun Apr 15, 2018 8:18 am

Re: Friday Facts #251 - A Fistful of Frames

Post by krystof1119 »

OK, so about the LAN party: when does it end, and if I'm unable to join in person, can I connect from home?

Jap2.0
Smart Inserter
Posts: 2333
Joined: Tue Jun 20, 2017 12:02 am

Re: Friday Facts #251 - A Fistful of Frames

Post by Jap2.0 »

Ohz wrote: What would be the best PC build for Factorio (what do you recommend)? Just aim for the most expensive CPU and GPU?
You want good memory (RAM) latency (so high-speed RAM helps; 16GB should be more than enough) and good single-thread CPU performance. Factorio isn't especially GPU-heavy - almost any dedicated GPU with 3GB of VRAM should be able to run it quite well. This was a good thread about which PC parts are good for Factorio.


Can we connect to the LAN party remotely? (I realize it's called LAN for a reason... but I'm about 4500 miles away.)
There are 10 types of people: those who get this joke and those who don't.

Tekky
Smart Inserter
Posts: 1036
Joined: Sun Jul 31, 2016 10:53 am

Re: Friday Facts #251 - A Fistful of Frames

Post by Tekky »

Thank you very much for writing about the details of your problems with creating a good graphics engine.

As a programmer who is also interested in creating games, I find it very interesting to read about this in the Friday Facts.

wren6991
Burner Inserter
Posts: 12
Joined: Sun Dec 17, 2017 3:56 am

Re: Friday Facts #251 - A Fistful of Frames

Post by wren6991 »

What a great writeup! Thank you guys! (And also thanks to posila for posting the extra info in this thread). :D

Thanks also for the links to the articles; lots of interesting reading there.

Omnifarious
Filter Inserter
Posts: 260
Joined: Wed Jul 26, 2017 3:24 pm

Re: Friday Facts #251 - A Fistful of Frames

Post by Omnifarious »

Ohz wrote: My question may sound very stupid, but I'm not into computers:

What would be the best PC build for Factorio (what do you recommend)? Just aim for the most expensive CPU and GPU?
Just tell me how much money you have, and I'll sell you a PC for that amount of money. Want to spend a million dollars? I've definitely got the PC for you.

It'll come with Linux though. No negotiating on that point. But I will pre-load Factorio on it for you.

You might want to say "What would be the PC build on which I could run the largest possible factory, with the most time-consuming stuff to render, at a full 60/60 FPS/UPS? I don't care how much it costs as long as I'm not overcharged." That would be better.

Otherwise, you know, I could tell you that a gold plated PC looked really nice and was excellent for playing Factorio on, but it'd cost you $50000. But, you said you wanted the most expensive one, and that is the most expensive one. Or, someone could tell you that you could run Factorio on a supercomputer cluster with 10000 nodes. That would probably be even more expensive, even though any nodes after the first wouldn't help you any (unless you ran Clusterio).

aland
Manual Inserter
Posts: 4
Joined: Thu Apr 28, 2016 6:27 pm

Re: Friday Facts #251 - A Fistful of Frames

Post by aland »

One of the reasons I love Factorio is that it runs just fine on my antique Ubuntu laptop. I can't run it with all of the options at the best quality, but it runs well and is still a lot of fun. I appreciate all of the work put into optimization. It means Factorio will likely keep working well, and maybe I can turn the quality options up a bit! :D

Tekky
Smart Inserter
Posts: 1036
Joined: Sun Jul 31, 2016 10:53 am

Re: Friday Facts #251 - A Fistful of Frames

Post by Tekky »

Ohz wrote: My question may sound very stupid, but I'm not into computers:

What would be the best PC build for Factorio (what do you recommend)? Just aim for the most expensive CPU and GPU?
This has already been discussed in the following thread:

viewtopic.php?f=49&t=51532 Which PC for Factorio?

The thread also has a developer comment.

ThePhantasm
Manual Inserter
Posts: 3
Joined: Thu Jun 22, 2017 6:08 pm

Re: Friday Facts #251 - A Fistful of Frames

Post by ThePhantasm »

posila -
I think the approach you want is implementing instancing directly in the shader. The drawbacks you mentioned are for the direct calls to ask the API/driver to do the instanced rendering.

What I have done in order to do faster sprite rendering in a modern API is to create a Vertex Buffer that contains the basic 6 vertices (assuming something like a (0,0,0)-(1,1,0) position) with the correct normal/texture coordinates. Then replicate that a few thousand times.
This is the primary rendering Vertex Buffer.

You then create a second Vertex Buffer Stream (DX terminology). In that, you only need 3 fields: New Translation, New Scale, New Texture Coordinates (assuming you're using a single texture for all your sprites)
You set the frequency of this vertex stream to match the size of your sprite (6 vertices).

Your vertex shader changes to look like:
pos = (vertex.pos * vertex_stream_1.scale) + vertex_stream_1.pos;
uv = (math to translate your texture # to the actual UV on the sprite-map).
If you pretend your sprite-map has 4 entries in a 2x2 grid, the UV coordinate math looks like:
u = (uv.u / 2) + ((vertex_stream_1.texture_num % 2) / 2.0); (modulo # of columns)
v = (uv.v / 2) + ((int)(vertex_stream_1.texture_num / 2) / 2.0); (divide by # of columns, throw away the fractional part, then scale by # of rows)

This is instancing implemented directly in your shaders and should save you quite a bit of CPU time.
The core vertex buffer never changes - you just change the second stream (instance data).
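
To make the per-vertex math above concrete, here is a minimal CPU-side sketch in Python. It is not shader code and not Factorio's renderer; it only simulates what the vertex shader would compute for each vertex of each instance. The names `UNIT_QUAD`, `Instance`, `transform_vertex` and `expand` are hypothetical, and a 2x2 sprite-map is assumed as in the example.

```python
from dataclasses import dataclass

# Shared "primary" buffer: one unit quad as two triangles, (x, y, u, v) per vertex.
UNIT_QUAD = [
    (0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 1.0, 0.0), (1.0, 1.0, 1.0, 1.0),
    (0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0), (0.0, 1.0, 0.0, 1.0),
]

@dataclass
class Instance:
    """Hypothetical per-sprite record from the second vertex stream."""
    pos: tuple        # translation (x, y)
    scale: tuple      # scale (sx, sy)
    texture_num: int  # index of the sprite inside the sprite-map

def transform_vertex(vertex, inst, cols=2, rows=2):
    """Simulate the vertex shader: scale and translate, then remap UVs into the atlas."""
    x, y, u, v = vertex
    px = x * inst.scale[0] + inst.pos[0]
    py = y * inst.scale[1] + inst.pos[1]
    col = inst.texture_num % cols    # column of the sprite in the grid
    row = inst.texture_num // cols   # row of the sprite in the grid
    return (px, py, (u + col) / cols, (v + row) / rows)

def expand(instances):
    """Every instance reuses the same six base vertices; only instance data differs."""
    return [transform_vertex(v, inst) for inst in instances for v in UNIT_QUAD]
```

With `Instance(pos=(10, 20), scale=(2, 4), texture_num=3)`, the first vertex lands at (10, 20) with UV (0.5, 0.5), which is the top-left corner of the fourth cell of the 2x2 grid.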

Light
Filter Inserter
Posts: 678
Joined: Mon Oct 10, 2016 6:19 pm

Re: Friday Facts #251 - A Fistful of Frames

Post by Light »

I am pleased this news is about something more important to most if not all of us.

A long time ago I felt it odd that this game seemed to utilise the CPU more than the GPU, as I use multiple systems which are drastically different in hardware yet the better CPU seemed to always win out. The gaming oriented PC performed well enough, yet the workstation PC with a better CPU and shitty GPU managed to slightly overtake it.

Then, with the introduction of the HD textures, things started to change: performance tanked on all of them except the one with the better GPU (likely due to higher VRAM). However, the difference in performance between the high and medium presets was jaw-dropping, and I've since stuck with medium settings, despite being able to use high, until things were better optimised.

I've since been wondering if I could ever return to high settings without such a strong performance hit, and until today I had lost hope that it was something the devs were looking at. Now I have something to look forward to in a future update.

QGamer
Fast Inserter
Posts: 207
Joined: Fri Apr 14, 2017 9:27 pm

Re: Friday Facts #251 - A Fistful of Frames

Post by QGamer »

I am so happy that you have taken the time to optimize your game.
There are some other games that I play that have framerate issues, but I've never had those issues with Factorio. Maybe when you're done with Factorio, you could optimize those other games? ;)

But seriously, thank you. :D
Can't wait until next Friday!

Griffork
Burner Inserter
Posts: 10
Joined: Sat Aug 05, 2017 2:17 am

Re: Friday Facts #251 - A Fistful of Frames

Post by Griffork »

posila wrote:What I wrote originally was pretty technical and it was pretty hard to digest even for other non-graphics programmers on the team so we decided to lighten it up a lot. I decided to put some of the original content in this post, in case some of you are interested to read it.
I'm glad you decided to write about this because (as someone who's written a renderer for work) I'm quite interested in it! I worked in WebGL, which doesn't have fancy things like geometry shaders. I've included the things I've learnt that seem relevant, although I'm sure you already know a lot of this.

If you're fragment bound (like you might be on older/integrated GPUs):
- What order are you drawing your sprites in? You should be drawing them front to back (so trees and poles first, then buildings, then conveyors and tracks, and then ground).
- If you want to get really fancy, you can draw everything from the bottom of the screen to the top and add a z-index in the vertex shader (e.g. all buildings are z-index 0.2), then set things up so the shader can't write to a place whose z-index is the same as or greater than what you've already drawn. That way, ground only gets evaluated around what has already been drawn.
- Assuming you're writing some of your own fragment shaders: if you have any complex fragment shaders, don't rely on if statements to give you a speedup. I can go into more detail on this, but in a lot of cases both sides (if and else) of an if statement are executed on a GPU.
- Try to make better-fitting shapes than just a square for sparsely filled sprites (e.g. trees), because evaluating 20 pixels where nothing is being drawn will take longer than processing an extra 6 vertices.
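
As a rough illustration of the front-to-back point, the following Python sketch counts how many fragments would be shaded for opaque rectangles with a depth test. It is a toy software model, not GPU code; `count_shaded_fragments` is a hypothetical name, smaller z is assumed to mean closer to the viewer, and transparency (which real sprites have) is ignored.

```python
def count_shaded_fragments(rects, width, height, depth_test=True):
    """Count fragment evaluations for opaque rects drawn in list order.

    rects: list of (x0, y0, x1, y1, z); smaller z is closer to the viewer.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    shaded = 0
    for x0, y0, x1, y1, z in rects:
        for y in range(y0, y1):
            for x in range(x0, x1):
                if depth_test and z >= depth[y][x]:
                    continue  # rejected before the fragment would be shaded
                depth[y][x] = z
                shaded += 1
    return shaded

# Hypothetical scene: 8x8 ground tile (far) with a 4x4 building (near) on top.
ground = (0, 0, 8, 8, 0.9)
building = (2, 2, 6, 6, 0.2)

back_to_front = count_shaded_fragments([ground, building], 8, 8)
front_to_back = count_shaded_fragments([building, ground], 8, 8)
```

In this toy scene, back-to-front shades 80 fragments (64 for the ground plus 16 for the building), while front-to-back shades 64, because the 16 ground pixels hidden by the building are skipped.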

If you're draw call bound:
As you've already discovered, rebinding things or swapping things out takes time. The most expensive is to change a program, and the next most expensive is to change a texture. Try to make sure you draw everything with the same program all at once (e.g. all buildings, electricity poles (not cables), conveyor belts and rails together, and draw their shadows afterwards). Do reuse programs as much as possible (it seems like you're already doing this, but just in case). Most of your sprites are pretty similar (some have a basic animation, but it's not a costly lookup and should be done in the vertex shader), so they should be able to use the same program.
If you're still draw call bound and you're not doing so already, you can definitely draw multiple types of things in the same draw call (e.g. belts and assemblers), just passing different values to the draw program (e.g. the location of the sprite on the atlas - assuming you're atlasing them together).
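
A small Python sketch of why grouping draws by program and texture helps: it counts how many program and texture switches a given submission order would cause. The function name and the program/texture labels are hypothetical, and the sketch assumes draw order can be changed freely, which in a layered 2D game is only true within a layer or when a depth test handles the ordering.

```python
def count_state_changes(draws):
    """Return (program_switches, texture_switches) for draws in submission order.

    draws: list of (program, texture) pairs; labels are arbitrary strings.
    """
    program_switches = texture_switches = 0
    current_program = current_texture = None
    for program, texture in draws:
        if program != current_program:
            program_switches += 1
            current_program = program
        if texture != current_texture:
            texture_switches += 1
            current_texture = texture
    return program_switches, texture_switches

# Hypothetical frame: sprites and shadows interleaved across two atlases.
draws = [
    ("sprite", "atlas0"), ("shadow", "atlas1"), ("sprite", "atlas0"),
    ("shadow", "atlas1"), ("sprite", "atlas1"),
]

interleaved = count_state_changes(draws)
grouped = count_state_changes(sorted(draws))  # group by program, then texture
```

Here the interleaved order costs 5 program switches and 4 texture switches, while the grouped order costs 2 and 3.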

If you're vram bound:
This may not be as useful, but I'm curious as to whether you're running out of VRAM or if it's taking a while to access. If it's the latter, then the only thing I can recommend, which you're probably already doing, is to render everything with the same texture at the same time if you can. (I've since realised you're exceeding VRAM limits on low-end graphics cards, so nothing I said here was relevant. Sorry!)
This is not really a problem I've run into previously because my use-case is too different (when working in 3D you have mipping).

Very happy to talk more about this if you want and pass along any of my learnings if it's useful :)


EDIT:
ThePhantasm wrote:What I have done in order to do faster sprite rendering in a modern API is to create a Vertex Buffer that contains the basic 6 vertices (assuming something like a (0,0,0)-(1,1,0) position) with the correct normal/texture coordinates. Then replicate that a few thousand times.
Exactly what I was trying to get at, but after reading their post I got the impression that they're already doing that (at least I hope so).
Actually, if you're already using an index buffer, you probably want to have only 4 vertices in the buffer, use something like triangle fan, and have something like [3,3,1,2,4,4] in the index buffer for each sprite (the duplicates at the start and end prevent the GPU from attempting to draw stuff between fans), so that all of the supporting information only has to be written 4 times rather than 6. This is useful because what often causes a draw bottleneck is the speed of the bus between the CPU and GPU, and you want to limit the amount of data you're trying to send.
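
A sketch of the duplicated-index idea in Python. Stitching primitives together with repeated indices is a triangle strip technique, so this sketch uses a strip rather than a fan (the post itself is unsure which one applies). Indices are 0-based and the vertex order per quad is assumed to be top-left, top-right, bottom-left, bottom-right, so the exact numbers differ from the [3,3,1,2,4,4] example. `strip_indices` and `strip_triangles` are hypothetical helpers, and winding order is ignored.

```python
def strip_indices(quad_count):
    """Build one strip index buffer covering quad_count quads of 4 vertices each."""
    indices = []
    for q in range(quad_count):
        base = 4 * q
        # First and last index are duplicated to create degenerate triangles
        # that separate this quad from its neighbours.
        indices += [base, base, base + 1, base + 2, base + 3, base + 3]
    return indices

def strip_triangles(indices):
    """Decode a triangle strip, dropping degenerate (zero-area) triangles."""
    triangles = []
    for i in range(len(indices) - 2):
        tri = tuple(indices[i:i + 3])
        if len(set(tri)) == 3:  # all three indices distinct
            triangles.append(tri)
    return triangles

indices = strip_indices(2)
triangles = strip_triangles(indices)
```

For two quads this yields the triangles (0, 1, 2), (1, 2, 3), (4, 5, 6) and (5, 6, 7): two per quad and none spanning the gap between quads. The index count is still 6 per quad, but per-vertex data is only written 4 times.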

Actually, on that note, something that will cause your game to slow down is error checking or reading from a map (e.g. getPixel). Don't do either of those at run-time: error checking (gl_getError or whatever it is) will cause the CPU to wait for the GPU to finish, and the CPU should never have to wait on the GPU. Only do error checking at startup, when you're making programs, buffers and textures, not during run-time.

It's been a couple of years since I did this stuff in detail so I might be a little rusty at points (function call names and whether to use fan or strip), but if it's going to be useful to you I can look up the correct info.

HolySmoke
Manual Inserter
Posts: 1
Joined: Sat Jul 14, 2018 12:26 am

Re: Friday Facts #251 - A Fistful of Frames

Post by HolySmoke »

On the topic of quad vertices, have you tried drawing them as triangle strips? Could eliminate the index buffer.

Vandroiy
Burner Inserter
Posts: 18
Joined: Fri Dec 15, 2017 9:54 pm

Re: Friday Facts #251 - A Fistful of Frames

Post by Vandroiy »

I'm confused: with all this detailed performance knowledge going around, how is the topic performance on 25k sprites?

I don't really know performance details like in posila's first reply, but I wrote a sprite renderer in OpenGL 4.3 and F# that can render a mix of hundreds of thousands of CPU-pushed sprites and millions of GPU-pushed particle sprites smoothly on fairly old hardware (as long as it's not limited by overdraw/fragment shader stuff).

The method was a vertex shader that gets an ID as input and then grabs everything from VRAM buffers. For example, a buffer that can fetch you texture page and position from a sprite ID. Where necessary, these buffers would hold IDs that reference into other buffers, with data structures chosen to minimize redundant updates and allow on-GPU movement of particles, which allows the big particle counts. The vertex shader to unpack everything always has to run six times per sprite, 'cause of the six vertices for a rectangle, but who cares? At mere millions of invocations per frame, isn't this just warmup for the gazillions of shader units GPUs have?

There was nothing like fancy invalidate buffer flags or such, and the CPU code was the usual garbage collected .NET spam, plus the not-always-fast F# on top. Low-level performance is surely no match here. But five-digit sprite counts weren't an issue. Is there maybe something in this approach that could be useful for Factorio?
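
The ID-based lookup described above can be sketched on the CPU like this. It is a Python stand-in for a vertex shader, not the poster's actual F#/OpenGL code; the buffer layouts and all names (`CORNERS`, `sprite_positions`, `sprite_texture`, `texture_entries`, `vertex_shader`) are assumptions made for illustration.

```python
# Corner offsets for the six vertices (two triangles) of one rectangle.
CORNERS = [(0, 0), (1, 0), (1, 1), (0, 0), (1, 1), (0, 1)]

# Hypothetical VRAM-resident buffers, indexed by ID.
sprite_positions = [(10.0, 20.0), (50.0, 60.0)]  # sprite ID -> position
sprite_texture = [0, 1]                          # sprite ID -> texture entry ID
texture_entries = [                              # texture entry ID -> page and size
    {"page": 0, "size": (32.0, 32.0)},
    {"page": 3, "size": (64.0, 16.0)},
]

def vertex_shader(vertex_id):
    """Derive everything from the vertex ID alone, fetching data from the buffers."""
    sprite_id, corner_id = divmod(vertex_id, 6)
    cx, cy = CORNERS[corner_id]
    x, y = sprite_positions[sprite_id]
    entry = texture_entries[sprite_texture[sprite_id]]  # ID referencing another buffer
    w, h = entry["size"]
    return (x + cx * w, y + cy * h, entry["page"])

# The "draw call" is just a range of vertex IDs: six invocations per sprite.
frame = [vertex_shader(i) for i in range(6 * len(sprite_positions))]
```

The CPU only has to update the small per-sprite buffers; the six-fold expansion happens in the shader invocations.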

Demongornot
Inserter
Posts: 21
Joined: Fri Oct 07, 2016 9:29 pm

Re: Friday Facts #251 - A Fistful of Frames

Post by Demongornot »

I have never done any GPU computing or graphics engine code, so I might be writing the most stupid thing ever here, but wouldn't an initial (X, Y) coordinate with X and Y offsets be better?
If the GPU can always finish faster than the CPU on your code, wouldn't this mean the CPU has to send less initial data, while the GPU, which can handle many calculations at once, transforms (X, Y) + offsets as:
(Guessing you render with the standard clockwise order starting at the top left): (X1 & X4, Y1 & Y4) = (X, Y); (X2, Y2) = (X + X offset, Y); (X3 & X5, Y3 & Y5) = (X + X offset, Y + Y offset); (X6, Y6) = (X, Y + Y offset).
So vertices 1, 2, 3 and 4, 5, 6 of the two triangles are all derived from (X, Y) using the offsets, reducing what the CPU sends and combining the calculations per vertex?
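
Written out as code, the expansion proposed above looks roughly like this. It is a Python sketch of the arithmetic only; in practice this would run on the GPU, and `expand_sprite` is a hypothetical name.

```python
def expand_sprite(x, y, x_offset, y_offset):
    """Expand one (X, Y) + offsets record into six vertices, clockwise from top left."""
    top_left = (x, y)                               # vertices 1 and 4
    top_right = (x + x_offset, y)                   # vertex 2
    bottom_right = (x + x_offset, y + y_offset)     # vertices 3 and 5
    bottom_left = (x, y + y_offset)                 # vertex 6
    return [top_left, top_right, bottom_right,
            top_left, bottom_right, bottom_left]

vertices = expand_sprite(10, 20, 4, 2)
```

The record sent per sprite shrinks from 12 numbers (six X, Y pairs) to 4, at the cost of a few additions per vertex.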

Jerry Oak
Manual Inserter
Posts: 2
Joined: Tue Jun 13, 2017 1:49 pm

Re: Friday Facts #251 - A Fistful of Frames

Post by Jerry Oak »

Are the computers in NTK accessible to all visitors, or do I have to pay the subscription?

shaman
Manual Inserter
Posts: 3
Joined: Sat Jul 22, 2017 5:53 pm

Re: Friday Facts #251 - A Fistful of Frames

Post by shaman »

@posila:
For me as a graphics programmer, your post was very interesting.
So far, I have always implemented sprites in the Geometry Shader. But after your investigation, I think I'll try out the method with a fixed index buffer and the sprite transformation in the vertex shader the next time I need it.
More of that, please!

ske
Filter Inserter
Posts: 409
Joined: Sat Oct 17, 2015 8:00 am

Re: Friday Facts #251 - A Fistful of Frames

Post by ske »

Do you have statistics on your players' hardware, how big their factories actually are, and what FPS and UPS they are running the game at?

What percentage is pushing the limits with their big factories?

lottery248
Inserter
Posts: 21
Joined: Thu Jan 04, 2018 9:41 am

Re: Friday Facts #251 - A Fistful of Frames

Post by lottery248 »

fam.

I've got a question for you: there weren't enough CPUs for overall testing, and the GPU test wasn't wide enough, e.g. the 8-core tier is missing (???)

By the way, the NTK will have Factorio; does that mean this game originated in the Czech Republic?
