This is something I don't understand at all given modern hardware.
In the old days we had cathode ray tube monitors with an electron beam going left to right really fast and top to bottom less fast, so that it passed over every single pixel on the monitor. Both of those motions were driven by magnetic fields at a fixed rate. Later you had multi-sync monitors that could change those rates, but changing the rates takes a second or two and then you stick with the selected rates for every frame. That means for each frame you have to have the pixel data ready at the exact moment the electron beam passes over the pixel. You can't deliver it earlier or later.
Now fast forward to today. We have TFT monitors that have a big chunk of memory where they store a full frame and drive individual pixels from that internally to make an image appear. They can even scale the image, e.g. from 640x480 up to the full HD resolution the monitor actually has.
There also is no need for any kind of fixed refresh rate. It's not like the pixel fades if it isn't refreshed at 60Hz.
So why don't we have graphics cards that send out screen refreshes to the monitor when we tell them to? Why not send frame 1 after 10ms, frame 2 after 17ms, frame 3 after 12ms and so on? Why can't the monitor update when we tell it to, instead of at a fixed rate that we have to keep up with?
Friday Facts #264 - Texture streaming
Re: Friday Facts #264 - Texture streaming
mrvn wrote: ↑Thu Oct 18, 2018 2:47 pm
So why don't we have graphics cards that send out screen refreshes to the monitor when we tell them to? Why not send frame 1 after 10ms, frame 2 after 17ms, frame 3 after 12ms and so on? Why can't the monitor update when we tell it to, instead of at a fixed rate that we have to keep up with?

This technology exists. It's called FreeSync on AMD and Intel cards and G-Sync on Nvidia cards. They are two different, incompatible protocols. Both allow a varying refresh rate, as you suggest. The monitor also needs to support the protocol that the graphics card is using.
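Roughly, the difference for the program's frame loop looks like this (a minimal C++ timing sketch, not any vendor's API; render_frame() and present() are made-up stand-ins):

Code: Select all

// Fixed refresh vs. variable refresh, as seen from the render loop.
#include <chrono>
#include <thread>

// Hypothetical stand-ins; a real loop would draw and hand the frame to the driver.
void render_frame() { std::this_thread::sleep_for(std::chrono::milliseconds(12)); }
void present() {}

int main() {
    using namespace std::chrono;
    const auto period = microseconds(16667); // fixed 60Hz scanout

    // Fixed refresh: a finished frame still waits for the next sync boundary,
    // and a frame that misses it waits a whole extra period (stutter).
    auto deadline = steady_clock::now() + period;
    for (int i = 0; i < 3; ++i) {
        render_frame();
        std::this_thread::sleep_until(deadline);
        present();
        deadline += period;
    }

    // Variable refresh (FreeSync/G-Sync): the display scans out whenever the
    // frame is ready, whether it took 10ms, 17ms or 12ms.
    for (int i = 0; i < 3; ++i) {
        render_frame();
        present();
    }
    return 0;
}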
- bobingabout
- Smart Inserter
- Posts: 7352
- Joined: Fri May 09, 2014 1:01 pm
Re: Friday Facts #264 - Texture streaming
The annoying thing that I've noticed is that where a 32bit application will launch and use... 10 megabytes, the 64bit version of the same application will instantly take 300MB, and I've noticed that across the board. Firefox 54.0.1 32bit with 474 tabs open (though it only loads the ones you click on, so only 1 actually loads): one instance of 930MB plus one instance of 45MB (not sure why it runs dual instances), and the CPU idles around 20% (it's a low spec CPU). Close and reload the exact same tabs in the 64bit version, and it takes 1.2GB and 300MB, but idles around 17%.
I have no explanation for the additional memory usage, but in my specific case, with only 4GB of RAM on this system, the 3% additional CPU usage is worth it to save the half gig of extra RAM usage. I have no idea why the system does this, but it's annoying.
mrvn wrote: ↑Thu Oct 18, 2018 2:47 pm
So why don't we have graphics cards that send out screen refreshes to the monitor when we tell them to? Why not send frame 1 after 10ms, frame 2 after 17ms, frame 3 after 12ms and so on? Why can't the monitor update when we tell it to, instead of at a fixed rate that we have to keep up with?

That's called FreeSync, or if you go Nvidia it's G-Sync, because Nvidia want to charge you through the arse.
Re: Friday Facts #264 - Texture streaming
ske wrote: ↑Thu Oct 18, 2018 4:54 pm
mrvn wrote: ↑Thu Oct 18, 2018 2:47 pm
So why don't we have graphics cards that send out screen refreshes to the monitor when we tell them to? Why not send frame 1 after 10ms, frame 2 after 17ms, frame 3 after 12ms and so on? Why can't the monitor update when we tell it to, instead of at a fixed rate that we have to keep up with?
This technology exists. It's called FreeSync on AMD and Intel cards and G-Sync on Nvidia cards. They are two different, incompatible protocols. Both allow a varying refresh rate, as you suggest. The monitor also needs to support the protocol that the graphics card is using.

One is an actual DisplayPort standard (VESA Adaptive-Sync) and HDMI standard (HDMI 2.1 VRR) and free; the other is a closed, proprietary, walled-garden implementation that costs extra on top for both the monitor and the graphics card.
Re: Friday Facts #264 - Texture streaming
One is royalty-free, the other puts a $100 markup on your purchase. And so one gains traction while the other fades into obscurity. No amount of shilling can combat the invisible hand of the free market.
It's kinda the same deal as 16 bit vs 32 bit, isn't it? It has to do with the era during which the format was mainstream. Specifically, the compiler and the OS will put more essential (and not so essential, but still QoS-improving) stuff into a 64 bit program on the assumption that plenty of memory will be available anyway: a graphics kernel duplicate, an audio driver interface, etc. Beyond this bit tax, the programs will be the same size, plus 2x the pointer size, which shouldn't be significant. You can still write 64 bit programs in less than 2 kilobytes in assembly and they will run just fine, except maybe not as fast and/or secure as with all the bells and whistles.
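The pointer-size part is easy to see for yourself (a minimal illustration; the Node struct is made up, and you compile the same file once as 32 bit and once as 64 bit to compare):

Code: Select all

#include <cstdio>

struct Node {
    Node* next;  // 4 bytes in a 32 bit build, 8 bytes in a 64 bit build
    int   value; // 4 bytes in both
};

int main() {
    std::printf("sizeof(void*) = %zu\n", sizeof(void*)); // 4 vs 8
    std::printf("sizeof(Node)  = %zu\n", sizeof(Node));  // 8 vs 16, padding included
    return 0;
}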
Re: Friday Facts #264 - Texture streaming
gamah wrote: ↑Fri Oct 12, 2018 3:43 pm
16.6ms is eons in computer-speak, about 50 million CPU cycles at 3GHz... Instead of dropping a frame that can't be drawn, wouldn't it be more visually pleasing to have a subroutine hold on to whatever the current/last frame data was, and constantly feed that at 60 fps whether or not you can update the whole thing? So what if the bottom 200 lines in the next screen draw are from the previous frame... at least the first 880 can be updated with the current frame instead of abandoning all of that work entirely...

It's not. You're not executing instructions like that. You often have to wait for memory transfers, caches, bus activity, the GPU to be ready to accept data, other CPUs to flag that they are locking parts of memory that are needed, the memory controller itself to send you data, etc. None of those care how many CPU instructions it is; they care how fast the relevant buses and caches can pull in the data you want and push it back out. You're also dealing with a million things in the background... you could easily be pushing millions of instructions just for background OS tasks, and millions more lost just waiting for locking, memory contention, etc.
It's enough time to do things, but not naively. If you program naively, you can use up that entire budget just waiting for a USB mouse to tell you which buttons it currently has held down.
RAM does not necessarily mean main RAM. (But I'm from an era where it, in fact, did!) VRAM is just the same: double-/triple-buffering do the same thing in that instance, using VRAM as a holding place until you have the full frame, but then the flip still has to wait for the vsync to come around. Additionally, that STILL requires the CPU to tell the GPU what to put into the double buffer for that frame (which depends on what's showing in the game!) in time for the vsync to "switch pointers" so the GPU shows the data from your double buffer.
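For anyone who hasn't seen it spelled out, the pointer flip itself is trivial; the constraint is purely that it may only happen on the sync boundary (a sketch; Framebuffer and wait_for_vsync() are made-up stand-ins, not a real graphics API):

Code: Select all

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Framebuffer {
    std::vector<std::uint32_t> pixels;
    explicit Framebuffer(std::size_t n) : pixels(n, 0) {}
};

void render_into(Framebuffer&) { /* fill the buffer for the next frame */ }
void wait_for_vsync()          { /* block until the next vertical sync */ }

int main() {
    Framebuffer a(1920 * 1080), b(1920 * 1080);
    Framebuffer* front = &a; // what the display scans out
    Framebuffer* back  = &b; // what we are drawing into

    for (int frame = 0; frame < 3; ++frame) {
        render_into(*back);     // must finish in time, or the frame is dropped
        wait_for_vsync();       // the flip may only happen on the sync boundary
        std::swap(front, back); // "switch pointers": no pixels are copied
    }
    return 0;
}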
Re: Friday Facts #264 - Texture streaming
I have 16 GB RAM, 1 GB VRAM.
Tinkering with graphics settings does not seem to matter much, except that setting VRAM usage to 'all' seems to be a little better than the lower options.
When zoomed out on a complex factory, FPS drops to around 30. In other games that would still be quite usable because it is fairly consistent; Factorio, however, seems to drop frames in bursts when walking around, which makes it very apparent.
Debug info that outputs the number of consecutively dropped frames in the last few seconds might be useful as a performance metric.
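Something like this could produce that metric (a sketch assuming a 60Hz target; the on_frame() hook into the game loop is made up):

Code: Select all

#include <chrono>

class DroppedFrameCounter {
    using clock = std::chrono::steady_clock;
    clock::time_point last_ = clock::now();
    int consecutive_ = 0;
    int worst_burst_ = 0;
public:
    // Call once per presented frame.
    void on_frame() {
        const auto now = clock::now();
        const double dt =
            std::chrono::duration<double, std::milli>(now - last_).count();
        last_ = now;
        // Number of 60Hz refreshes that passed without a new frame.
        const int missed = static_cast<int>(dt / 16.667 + 0.5) - 1;
        consecutive_ = missed > 0 ? consecutive_ + missed : 0;
        if (consecutive_ > worst_burst_) worst_burst_ = consecutive_;
    }
    // Longest burst of consecutively dropped frames, e.g. for the debug overlay.
    int worst_burst() const { return worst_burst_; }
};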
posila wrote: ↑Wed Oct 17, 2018 9:52 pm
I added something like this while starting with virtual texture mapping just to figure out how big a portion of the atlas we need to render a frame. It was usually 10% or less; the maximum was about 20% when I zoomed out on a huge biter base with lots of biters and spitters.

I thought it would be something like that... if that 10% is spread out over most textures in the atlas, I can imagine my OpenGL driver running out of VRAM and having to move atlas textures from and to main memory a lot.
Being able to favour virtual textures over an atlas with the "Video memory usage" option as you said might indeed help.
posila wrote: ↑Wed Oct 17, 2018 9:52 pm
The disadvantage of replacing the entire atlas system with streaming is that all sprites would then have to be in RAM (unless streaming from HDD works well), which is a waste of RAM if you have a GPU with large VRAM (this could be solved by storing part of the virtual atlas in VRAM, though).

When loading all sprites into a virtual texture in RAM, the tiles that are permanently cached in VRAM are no longer needed in RAM and will eventually be swapped to HDD. So it is not so much a waste of RAM but more the waste of writing the swap file.
Another option to deal with RAM usage would be to keep virtual texture tiles only in RAM or VRAM, never in both: when a tile is needed in VRAM, space would be made by copying the least-recently-used tile back to a free slot in RAM, after which the required tile is copied to VRAM. The source slot of the required tile in RAM then becomes the new free slot. That way every tile is only in one place at a time.
This doubles the number of copies needed when the set of cached tiles changes. However, permanently storing part of the virtual texture in VRAM as you suggested (as well as keeping a large atlas in VRAM) would reduce the size of the tile cache in VRAM and thus increase the need for copy operations. I expect the latter has a bigger impact on performance. A rough sketch of the swap scheme is below.
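Something like this, as a sketch only (it assumes VRAM is already full, the required tile currently sits in RAM, and copy_ram_to_vram()/copy_vram_to_ram() are made-up stand-ins for real upload/readback calls):

Code: Select all

#include <list>
#include <unordered_map>

struct TileCache {
    std::list<int> lru; // tile ids resident in VRAM, most recently used first
    std::unordered_map<int, std::list<int>::iterator> in_vram;
    std::unordered_map<int, int> ram_slot; // tile id -> RAM slot, for tiles in RAM
    int free_ram_slot = 0;                 // exactly one free slot in RAM

    void copy_ram_to_vram(int /*tile*/) {}                // hypothetical upload
    void copy_vram_to_ram(int /*tile*/, int /*slot*/) {}  // hypothetical readback

    // Make `tile` resident in VRAM, evicting the least-recently-used tile.
    void require(int tile) {
        if (auto it = in_vram.find(tile); it != in_vram.end()) {
            lru.splice(lru.begin(), lru, it->second); // already resident: mark as used
            return;
        }
        const int victim = lru.back(); // LRU tile goes back to RAM...
        lru.pop_back();
        in_vram.erase(victim);
        copy_vram_to_ram(victim, free_ram_slot); // ...into the one free slot
        ram_slot[victim] = free_ram_slot;

        copy_ram_to_vram(tile);          // required tile moves up to VRAM
        free_ram_slot = ram_slot[tile];  // its old RAM slot becomes the free slot
        ram_slot.erase(tile);
        lru.push_front(tile);
        in_vram[tile] = lru.begin();
    }
};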
That sounds fair. Though with enough VRAM every tile would be cached, so the only performance overhead is the indirection table. If you can at some point measure the penalty of just the indirection in the shaders, without actually having to change cached tiles, I would love to know what it is.
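For reference, the indirection itself is just one dependent table lookup per sample. A CPU-side analogue of the idea (a sketch only, not Factorio's actual implementation):

Code: Select all

#include <cstdint>
#include <vector>

struct TileRef { std::uint16_t cache_x, cache_y; }; // tile's position in the cache texture

struct VirtualTexture {
    int tile_size;     // e.g. 128 texels per tile side
    int tiles_per_row; // width of the virtual texture, in tiles
    std::vector<TileRef> page_table; // one entry per virtual tile

    // The extra step a plain atlas doesn't have: resolve a virtual texel
    // coordinate through the page table before sampling the cache texture.
    void resolve(int vx, int vy, int& cx, int& cy) const {
        const TileRef& t =
            page_table[(vy / tile_size) * tiles_per_row + (vx / tile_size)];
        cx = t.cache_x * tile_size + vx % tile_size;
        cy = t.cache_y * tile_size + vy % tile_size;
    }
};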
Re: Friday Facts #264 - Texture streaming
raidho36 wrote: ↑Fri Oct 19, 2018 1:42 pm
One is royalty-free, the other puts a $100 markup on your purchase. And so one gains traction while the other fades into obscurity. No amount of shilling can combat the invisible hand of the free market.

It's kinda the same deal as 16 bit vs 32 bit, isn't it? It has to do with the era during which the format was mainstream. Specifically, the compiler and the OS will put more essential (and not so essential, but still QoS-improving) stuff into a 64 bit program on the assumption that plenty of memory will be available anyway: a graphics kernel duplicate, an audio driver interface, etc. Beyond this bit tax, the programs will be the same size, plus 2x the pointer size, which shouldn't be significant. You can still write 64 bit programs in less than 2 kilobytes in assembly and they will run just fine, except maybe not as fast and/or secure as with all the bells and whistles.

Linux has almost the same features for 32bit and 64bit and mostly the same code, so there is no change there. Also, you can start a 32bit Firefox on a 64bit Linux and still see the increase. Compiler optimization does make a difference, though, usually for the better on amd64 because x86 is so register starved. But not always: GCC and Clang add a lot of bloat to binaries because it runs faster. Still, binary size is normally small compared to data.
That leaves the real reason: double the size of pointers, and double the size needed to store the size of memory blocks. A language with garbage collection, like JavaScript (which is basically all the bad stuff in Firefox and web pages), stores the size of each block of memory at the start of the block, and blocks need to be aligned for fast access. So even storing a simple int (4 bytes) will need 16 bytes (or even 32 bytes) of memory. Lots of small things add up to large memory requirements quickly.
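As an illustration of that overhead (block layouts vary by collector; this is not any specific engine's object format), a boxed 4-byte int with an 8-byte size/tag header and 16-byte alignment already occupies 16 bytes:

Code: Select all

#include <cstdint>
#include <cstdio>

struct alignas(16) BoxedInt {
    std::uint64_t header; // block size / type tag the collector stores up front
    std::int32_t  value;  // the 4 bytes we actually wanted
    // 4 bytes of padding follow to reach the 16-byte alignment
};

int main() {
    std::printf("payload: %zu bytes, boxed: %zu bytes\n",
                sizeof(std::int32_t), sizeof(BoxedInt)); // 4 vs 16
    return 0;
}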
- Filter Inserter
- Posts: 454
- Joined: Tue Jun 20, 2017 1:56 am
Re: Friday Facts #264 - Texture streaming
interesting post about graphics engines
I'm glad the game runs OK on a 2GB Nvidia 540M and an Intel HD 3000 from 10 years ago lol
(also me from the mod portal - im not dustine lol) = https://mods.factorio.com/mods/Dustine/ ... ssion/9108
my 1st Mod Idea viewtopic.php?f=33&t=50256