News AMD researchers reduce graphics card VRAM capacity of 3D-rendered trees from 38GB to just 52 KB with work graphs and mesh nodes — shifting CPU work...

The article said:
AMD researchers reduce graphics card VRAM capacity of 3D-rendered trees from 38GB to just 52 KB
"Capacity"? Really?? Capacity is "the ability to hold something". The size of a data structure is the "something" that memory holds. It should have said either "footprint" or "utilization".

The article text is fine, though. So, how did this happen? If either a human or an AI is trying to "punch up" the headlines to get more clicks, you guys need to hold the line against anything that's factually inaccurate or simply nonsensical.

Tagging @PaulAlcorn
 
BTW, the idea of procedurally-generating geometry on-the-fly isn't terribly new. This is the same basic idea behind tessellation shaders and geometry shaders, but I guess just taking it another step.
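To make that concrete, here's a toy sketch of the idea (mine, not AMD's): store a few bytes of parameters and expand them into vertices on demand, instead of storing every vertex.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

// A few bytes of parameters stand in for megabytes of stored mesh data.
std::vector<Vertex> GenerateBranchRing(uint32_t seed, float radius,
                                       float height, int segments = 16)
{
    std::vector<Vertex> ring;
    ring.reserve(segments);
    for (int i = 0; i < segments; ++i) {
        float a = 6.2831853f * float(i) / float(segments);
        // Seed-driven jitter makes each branch unique without storing it.
        float r = radius * (1.0f + 0.1f * std::sin(float(seed * 31u + i)));
        ring.push_back({ r * std::cos(a), height, r * std::sin(a) });
    }
    return ring;
}
```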

And, in that vein, it's also a little funny to quote 38 GB, as the baseline - as if that represents the current state of the art.

The paper said:
When represented as a conventional static triangle mesh with positions, normals, and texture coordinates, the tree geometry would amount to 34.8 GiB.

Partly because of what I said above, that's not how engines currently work. It means their figure is a huge overestimate of the practical benefits this technique would offer.
 
Partly because of what I said above, that's not how engines currently work. It means their figure [is] a huge overestimate of the practical benefits this technique would offer.
That's the only bit that bugged me about this. It looks like they did something quite useful--a practical capability to significantly increase foliage density. Use better colors/textures and reduce wind speed and vibration rate and this is a game-changer!

So why did they lie to make something 100x better sound 100,000x better? It was already really impressive.
 
Actually this reads more to me like:

AMD: guys, you should start optimizing games again instead of brute-forcing them and dumping all the fancy effects into rendering something hidden behind 10 blocks of buildings on the fly
 
Sounds cool, but come back to me when you can show it to me in a game and can also give me other performance metrics, like FPS and latency. Also, how well will it run on midrange and older cards that don't have dedicated hardware? Because as good as it sounds, it is just a tech demo, and a tech demo isn't a game. Maybe I have seen too much fancy stuff like this that ended up never being in actual games, and all they had to show for it a decade later was still just tech demos. Either it practically didn't work well in the dynamic environment of a game, or the player base would have been limited to the latest cards, and before cards that could do it became common, something else had already taken over, or it was just old tech solving a problem of the past in a less backwards-compatible way.

But they are free to prove it viable in an actual game that is popular. Not just some Ashes of Benchmark you could use to tell when reviewers start testing new cards, because the player count spikes. :-D
 
how well will it run on midrange and older cards that don't have dedicated hardware? Because as good as it sounds, it is just a tech demo, and a tech demo isn't a game.
According to this introduction to work graphs in D3D:

"Any AMD Software: Adrenalin Edition™ driver 23.9.2 or newer for Windows has support for the Work Graphs API on AMD Radeon™ RX 7000 Series graphics cards"

It's unclear whether that was just where they focused their initial implementation, or whether it truly requires RDNA3. If I had to bet, I'd say probably the latter.

According to this, work graphs should be supported on Nvidia GPUs from RTX 3000 and later:
 
BTW, the idea of procedurally-generating geometry on-the-fly isn't terribly new. This is the same basic idea behind tessellation shaders and geometry shaders, but I guess just taking it another step.

And, in that vein, it's also a little funny to quote 38 GB, as the baseline - as if that represents the current state of the art.

Partly because of what I said above, that's not how engines currently work. It means their figure is a huge overestimate of the practical benefits this technique would offer.
I suppose if you wanted to generate the same procedurally generated scene as a conventional triangle mesh, it would cost 34.8 GB, which is why game engines don't render this way. The meshes are broken up, and the CPU has to issue draw calls to the GPU for each set, inevitably leading to CPU-limited performance (think Flight Simulator with a long draw distance and lots of LODs). AMD states the frame-to-frame time (or simply frametime) is 7.74 ms, or around 129 fps. Doing all of this via normal rendering techniques, you'd be lucky to see 25 fps, mostly due to memory restrictions, and you'd be using too many draw calls for geometry relative to the rest of the screen space. DX12 allows for more draw calls, but each draw call has a cost to issue.
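To illustrate that bottleneck, here's a rough sketch of my own (MeshChunk and RecordScene are made-up names, not any engine's code): the CPU walks every visible chunk and records a separate draw for each one, so the call count scales with scene complexity.

```cpp
#include <d3d12.h>
#include <vector>

// Hypothetical per-chunk data; real engines track far more state per draw.
struct MeshChunk {
    D3D12_VERTEX_BUFFER_VIEW vbv;
    D3D12_INDEX_BUFFER_VIEW  ibv;
    UINT indexCount;
};

void RecordScene(ID3D12GraphicsCommandList* cl,
                 const std::vector<MeshChunk>& visibleChunks)
{
    for (const MeshChunk& c : visibleChunks) {  // thousands of chunks...
        cl->IASetVertexBuffers(0, 1, &c.vbv);
        cl->IASetIndexBuffer(&c.ibv);
        // ...each one needing its own CPU-recorded draw call.
        cl->DrawIndexedInstanced(c.indexCount, 1, 0, 0, 0);
    }
}
```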

In the scenario AMD provides, the GPU issues the work itself and simply draws the trees and foliage directly. The point is more that if you tried to do this with a static triangle mesh, it would cost X amount of RAM along with X amount of CPU cycles. Many game devs are using tessellation in smarter ways, or even not at all, since GPUs can render and cull unique geometry so quickly now. This is also why trees and their foliage in most games are quite simple: the number of vertices needed for just one tree in AMD's example amounts to the vertex budget of an entire scene, ~8M. With ray tracing, this would balloon the BLAS in the BVH as well, since the BLAS is the geometry-level structure.
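For scale, here's a quick back-of-envelope on the paper's 34.8 GiB figure (my arithmetic, assuming a plain 32-byte vertex of position + normal + UV and ignoring index buffers and compression):

```cpp
#include <cstdio>

int main()
{
    const double bytesPerVertex = 12 + 12 + 8;            // position + normal + UV = 32 B
    const double totalBytes = 34.8 * 1024 * 1024 * 1024;  // 34.8 GiB
    const double vertices = totalBytes / bytesPerVertex;  // ~1.17e9 vertices
    std::printf("%.2e vertices -> ~%.0fx an 8M-vertex scene budget\n",
                vertices, vertices / 8e6);
}
```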

Mesh shaders were also supposed to help with procedural generation of complex geometry, but game devs are having issues getting improved performance vs. traditional geometry pipelines (especially vs. Nvidia's very fast, efficient geometry pipelines). Perhaps game engines also need to better manage RAM/VRAM and CPU/GPU work to realize the benefits, or do things differently to extract maximum performance from mesh shaders. Those shaders aren't good for all geometry, anyway.

Nvidia added a command processor to Blackwell to better support GPU-based (hardware) work scheduling and drawing. I found that interesting; Microsoft is clearly moving DX12 toward GPU-initiated drawing, offloading the CPU further.
 
I suppose if you wanted to generate the same procedurally generated scene as a conventional triangle mesh, it would cost 34.8 GB, which is why game engines don't render this way. The meshes are broken up, and the CPU has to issue draw calls to the GPU for each set
No, the CPU doesn't normally ship all the geometry over PCIe for every frame.

Also, like I said about tessellation, you can send the GPU a far smaller amount of data, in the form of patches, and it dynamically subdivides these into polygons as it draws them. This is also one place where you can implement LOD-based geometry reduction, since tessellation engines can generate only the amount of detail necessary for how big the patch is on screen.
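A minimal sketch of that LOD idea (my own made-up helper; a real hull shader would do this per patch edge in HLSL): pick a tessellation factor from the patch's projected screen size, so distant patches get fewer triangles.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: scale a patch's tessellation factor by its
// projected size on screen, so distant patches get fewer triangles.
float TessFactorFromScreenSize(float patchWorldRadius, float distToCamera,
                               float viewportHeightPx, float fovY)
{
    // Approximate projected height of the patch in pixels.
    float projectedPx = (patchWorldRadius / (distToCamera * std::tan(fovY * 0.5f)))
                        * (viewportHeightPx * 0.5f);
    // Target ~8 px per tessellated edge, clamped to D3D's 1..64 factor range.
    return std::clamp(projectedPx / 8.0f, 1.0f, 64.0f);
}
```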

This is also why trees and their foliage in most games are quite simple: the number of vertices needed for just one tree in AMD's example amounts to the vertex budget of an entire scene, ~8M.
You can use instancing to reduce this amount. So, basically, you have some common branch and leaf shapes, and you map out a forest in terms of trees composed of instances of the different branch types. Then you only need to send over the geometry for those archetypes, and the GPU can instance them on the fly.
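Here's a rough D3D12-flavored sketch of that (BranchMesh and friends are my own illustrative names, not any real engine's API): a few archetype meshes, each drawn once with many per-instance transforms, so VRAM only holds the archetypes.

```cpp
#include <d3d12.h>
#include <vector>

struct Instance { float world[16]; };  // what the instance buffer holds per copy

struct BranchMesh {  // one archetype's geometry, uploaded to VRAM once
    D3D12_VERTEX_BUFFER_VIEW vbv;
    D3D12_INDEX_BUFFER_VIEW  ibv;
    UINT indexCount;
};

void DrawForest(ID3D12GraphicsCommandList* cl,
                const std::vector<BranchMesh>& archetypes,
                const std::vector<UINT>& instanceCounts)  // per archetype
{
    for (size_t i = 0; i < archetypes.size(); ++i) {
        cl->IASetVertexBuffers(0, 1, &archetypes[i].vbv);
        cl->IASetIndexBuffer(&archetypes[i].ibv);
        // One draw per archetype, not per branch: the bound instance buffer
        // of Instance transforms places every copy in the forest.
        cl->DrawIndexedInstanced(archetypes[i].indexCount,
                                 instanceCounts[i], 0, 0, 0);
    }
}
```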
 
This article is actually really... ehh, how to put it. Look, this is just how procedural generation works. This is how geometry nodes and shader nodes in Blender work. And it saves a crazy amount of VRAM. The downside is that the GPU has to calculate everything all the time. You are trading away the lookup table.
 
This article is actually really... ehh, how to put it. Look, this is just how procedural generation works.
Exactly, although if you follow the trail of links back to the original paper, it does have the claim about static polygons right at the top. So, I don't consider AMD blameless in this.

This is how geometry nodes and shader nodes in Blender work. And it saves a crazy amount of VRAM. The downside is that the GPU has to calculate everything all the time. You are trading away the lookup table.
Yeah, the thing that's new here is work graphs. What they do is basically let the GPU compute a bunch of stuff in parallel and then feed it into more processing by the GPU. It's the GPU creating its own work, rather than relying on the host CPU to guide it along the way. That uses GPU resources more efficiently, since it avoids some of the gaps where big parts of the GPU would otherwise be waiting for the next host-driven command.
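For flavor, here's roughly what kicking one off looks like on the host side in D3D12 (a simplified sketch assuming the work-graph state object and backing memory are already created and bound; SeedRecord is a placeholder of mine, and you'd need recent Agility SDK headers for these types):

```cpp
#include <d3d12.h>

// Placeholder for whatever the graph's entry node expects as input.
struct SeedRecord { UINT gridWidth, gridHeight; };

void DispatchTreeGraph(ID3D12GraphicsCommandList10* cl, SeedRecord* seed)
{
    D3D12_DISPATCH_GRAPH_DESC desc = {};
    desc.Mode = D3D12_DISPATCH_MODE_NODE_CPU_INPUT;
    desc.NodeCPUInput.EntrypointIndex = 0;       // the graph's root node
    desc.NodeCPUInput.NumRecords = 1;            // one seed record from the CPU...
    desc.NodeCPUInput.pRecords = seed;
    desc.NodeCPUInput.RecordStrideInBytes = sizeof(SeedRecord);
    cl->DispatchGraph(&desc);                    // ...then nodes spawn all further work on-GPU
}
```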