Performance on AMD Bulldozer


wanne
Inserter
Posts: 26
Joined: Tue Jan 28, 2020 7:24 am

Re: Performance on AMD Bulldozer

Post by wanne »

Jap2.0 wrote: Wed Jan 29, 2020 5:35 pm
The conclusion I draw from that is as follows:

If you want to play factorio you need decent speed memory.

Faster memory gives better results, but you don't really need it. Compare these two:
https://factoriobox.1au.us/result/c5622 ... 67ebac3527
https://factoriobox.1au.us/result/973c5 ... cfe9639914
Olacken
Long Handed Inserter
Posts: 63
Joined: Wed Apr 17, 2019 1:37 pm

Re: Performance on AMD Bulldozer

Post by Olacken »

Maybe the second one was running 15 Chrome tabs at the same time; in other words, you should not draw conclusions from a single data point.

https://factoriobox.1au.us/result/e44c3 ... 04cd2ff3ae
Last edited by Olacken on Wed Jan 29, 2020 6:24 pm, edited 1 time in total.
quyxkh
Smart Inserter
Posts: 1031
Joined: Sun May 08, 2016 9:01 am

Re: Performance on AMD Bulldozer

Post by quyxkh »

That 3930K is almost certainly running quad-channel memory, and if so, its memory is nearly as fast as the Ryzen rig's dual-channel memory, which goes a long way toward explaining why its performance is nearly as good.
Jap2.0
Smart Inserter
Posts: 2423
Joined: Tue Jun 20, 2017 12:02 am

Re: Performance on AMD Bulldozer

Post by Jap2.0 »

Sure, I was exaggerating slightly to get my point across. But here's the data (AMD CPUs marked with an asterisk):

0.18.1


 400 MHz: 41 UPS
1333 MHz: 64 UPS
1600 MHz: 70 UPS
2400 MHz: 56 UPS*
2933 MHz: 101 UPS
3000 MHz: 83 UPS*
3266 MHz: 85 UPS*
0.17.79 (larger sample size)
NB: I discarded results from identical configurations that differed significantly from each other, results that were extreme outliers (e.g. 0 UPS or 15492 UPS), and results from maps other than "cb0cd35aa6893dfdae2ce574d345d2c2".


 400 MHz: 36 UPS
1333 MHz: 51 UPS, 53 UPS
1600 MHz: 54 UPS, 55 UPS, 59 UPS, 61 UPS
2400 MHz: 48 UPS*, 55 UPS*, 74 UPS
2666 MHz: 87 UPS
2933 MHz: 63 UPS*, 91 UPS
2954 MHz: 91 UPS
3000 MHz: 75 UPS*
3200 MHz: 62 UPS*, 72 UPS*, 76 UPS*, 78 UPS*, 80 UPS*
3466 MHz: 67 UPS*
3600 MHz: 84 UPS*, 87 UPS*, 104 UPS, 104 UPS
3700 MHz: 96 UPS
3733 MHz: 75 UPS*, 91 UPS*
3900 MHz: 99 UPS
4000 MHz: 102 UPS
0.16.59 just for fun:
(NB: this is map 7ddeeb9bacf723a81d420674714bca09)


 667 MHz: 35 UPS*
1600 MHz: 49 UPS, 55 UPS, 56 UPS
1867 MHz: 55 UPS
2133 MHz: 61 UPS, 67 UPS, 73 UPS
2400 MHz: 55 UPS
3000 MHz: 77 UPS
3200 MHz: 64 UPS, 72 UPS, 85 UPS, 90 UPS
3600 MHz: 79 UPS
3700 MHz: 79 UPS
3800 MHz: 87 UPS
3867 MHz: 69 UPS, 78 UPS
(I realize that this is a bad format; I'll see if I can graph it and calculate some trend lines later.)

In conclusion, I think it's fair to say that while there are other significant factors, faster memory will always improve performance significantly on the same system, up to at least 3600 MHz (and quite likely higher).
There are 10 types of people: those who get this joke and those who don't.
Jap2.0
Smart Inserter
Posts: 2423
Joined: Tue Jun 20, 2017 12:02 am

Re: Performance on AMD Bulldozer

Post by Jap2.0 »

Okay, graph time. Making graphs is way more convoluted than it should be in Google Docs, and unfortunately I don't have access to Excel at the moment, so you're getting lovely TI-84 graphs. Boxes are Intel, plusses are AMD. Trend lines are unreliable for smaller data sets. Also note that multiple identical points affect the trend but aren't visible.
The x-axis (memory speed) runs from 0 to 4000 MHz in increments of 400; the y-axis (UPS) runs from 0 to 120 in increments of 20.

0.18:
[Graph: 018.jpg]
0.17:
[Graph: 017.jpg]
0.16:
[Graph: 016.jpg]
And now I just realized that Office Online exists, so you might be getting some nicer graphs from there in a few minutes.
BlueTemplar
Smart Inserter
Posts: 3234
Joined: Fri Jun 08, 2018 2:16 pm

Re: Performance on AMD Bulldozer

Post by BlueTemplar »

Vintage graphs! :D

That 14% performance improvement when using core pinning is nothing to sneeze at!
Some related pinning discussions, featuring the single-core-hungry Supreme Commander 1:
"Optimizing your Experience with Forged Alliance - It works"
Even with the old "Core Maximizer", designed more for Windows XP...
"Speeding up that dinosaur CPU"

So, is this the OS poorly optimizing thread usage?
The issues with the Bulldozer having 2 cores share a single pipeline?
The combination of the above?

I'll have to try some benchmarks both on Win7 and Linux...

Bonus - Ashes of the Singularity performing rather poorly on Bulldozers
BobDiggity (mod-scenario-pack)
wanne
Inserter
Posts: 26
Joined: Tue Jan 28, 2020 7:24 am

Re: Performance on AMD Bulldozer

Post by wanne »

BlueTemplar wrote:
The issues with the Bulldozer having 2 cores share a single pipeline?
I think the problem is more that they do not share the L3 cache. Factorio communicates a lot with its RAM, and I think on Intel CPUs a lot of this is absorbed by the L3 cache. You don't have that on Bulldozers. If one Factorio thread communicates with another, the data has to go through RAM, which in addition has higher latency than on Intel and newer processors.
Thanks to their high clock frequency, FX chips are not that bad in single-core applications; I think if Factorio were single-threaded it wouldn't be that bad. And thanks to their many cores they are good for multiple processes doing integer work (compiling, x264 encoding...). But they suck at sequential multithreaded applications with many data dependencies, like Factorio.
BlueTemplar wrote: Thu Jan 30, 2020 12:42 am
So, is this the OS poorly optimizing thread usage?
The problem is: it can't. The OS just sees: "Oh, there are 10 threads for Factorio." For 90% of applications it would be awfully stupid to put all of them on one module, wasting 6 of 8 cores. Only for Factorio does it make perfect sense, since the application has many threads but is effectively sequential due to its dependencies.
Factorio could solve this by using pthread_setaffinity_np(), but they don't want to care about cache architectures. As BenSeidel said, most game developers (apart from the newer AAA games) don't give a shit about NUMA architectures. So Intel's focus on micro-parallelization (out-of-order instruction processing, vector extensions, branch prediction and all the other nifty things that make your sequential program run faster) wins in the end. This is what developers have been trained to optimize for over the last 70 years, and Intel follows where software developers lead.

There was a similar discussion about Windows and Bulldozer: Windows actively shuffles threads around to balance load, which results in a system that is easier to cool but also in a greatly decreased cache hit rate. So for optimized applications on properly cooled systems this was the absolute worst-case scenario for performance. This caused a huge shitstorm among the people who cared about performance. But I think in the end it was the right decision by Microsoft. In by far most cases where Windows runs on a Bulldozer, it is a cheap, badly cooled PC running an application that has never heard of NUMA or multi-core. In these scenarios you get a few more milliseconds at boost speed, and that is a good thing.

Since programmers can't write code for every CPU (with the exception of a few big projects with very small codebases, like OpenSSL or OpenWall/john), such problems may only be solved by libraries like OpenMP, or partly DX12, that give the programmer the ability to write programs that adapt to architectures, and give the OS/scheduler the ability to interact with the program so it adapts in the right way. And by a new generation of programmers that uses these possibilities. But I think no side (OS, CPU or program) can do this alone. Anything one of them does alone will only match the most common usage, not special cases like Factorio on Bulldozer or something similar.
wanne
Inserter
Posts: 26
Joined: Tue Jan 28, 2020 7:24 am

Re: Performance on AMD Bulldozer

Post by wanne »

Ah. And I think this is the big difference from 0.11-0.15 (or rather 0.14 and 0.15; until 0.13, if I remember right, Factorio was unplayable in multiplayer due to desyncs): those versions sucked on all architectures, so I could easily keep up with servers. But since then a lot of performance optimizations have been made. And while they say they don't optimize for any architecture, the reality is: you can't do that. If you don't know what is faster, you can't optimize for speed. They optimize with the mindset of a very primitive single-core machine. And this is what Intel is unbeatably good at: making single-core code parallel (through out-of-order instruction processing, vector extensions and branch prediction). This is not about running sequential code faster; it is about making sequential code parallel. In all these fields Intel is still leading. And a lot of Factorio could be processed in parallel; programmers just have no idea which parts.
So in the end all these Intel servers (I think Intel has 95% market share in servers) run much faster with 0.18. But on my Bulldozer, Factorio is still an extremely badly optimized game. So if it is always a race between server and clients, then for the games I prefer I'm standing much worse than two years ago, even if the hardware is the same. Which it probably isn't.