News Blender 4.0 Released and Tested: New Features, More Demanding

Blender used to not use GPUs at all, then it used them very little, and later on rather well.

What I am trying to say is that it's far from trivial to redesign a rendering pipeline built for quality to exploit a GPU that is designed to fake convincingly enough. And the developers obviously ported the code paths for which GPUs would work well enough while leaving others alone, resulting in quite a few data dependencies and lost efficiency.

So if some scaling behavior "does not make sense", I understand the author's sentiment, but I can just imagine the groans of the developers who really tried as hard as they could.

I remember quite well that at one point Blender rendered a benchmark scene in exactly the same time on my 18-core Haswell and my RTX 2080 Ti. In the latest pre-4.0 releases, though, GPU renders have made nearly all CPU power irrelevant, even if Blender did a rather nice job of at least trying to keep the CPUs busy, even when they contributed little speed-up to the final result and were probably not efficient in terms of additional energy versus time saved.

Let's just remember that Blender is a product that needs to deliver quality and will sacrifice performance if it can't have both.

A computer game would make the opposite choice.
 
To be clear, the "doesn't make sense" on the A770 and A750 is more about Intel than it is about Blender, because I'm sure Intel is helping quite a bit with getting the oneAPI stuff working for Blender, just as AMD helps with HIP and Nvidia helps with OptiX. I mostly point it out because it is a clear discrepancy in performance, where a faster GPU is tied with a slower GPU. I suspect Intel will work with the appropriate people to eventually make the A770 perform better.

I would also suggest that GPUs aren't made to "fake convincingly enough." They're made to do specific kinds of math. For ray tracing calculations, they can offer a significant speed-up over doing the same calculations on a CPU. Doing ray/triangle intersections on dedicated BVH hardware will simply be way faster than running the same calculations on general-purpose hardware (which means both GPU shaders and CPU cores). For FP32 calculations, they're as precise as a CPU doing FP32 and can do them much, much faster. For FP64, you'd need a professional GPU to get decent performance.

It's really the software — games — that attempt to render graphics in a "convincing" manner. CPUs can "fake" calculations in the same fashion if they want. It's just deciding what precision you want.
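
To make the precision point concrete, here is a minimal sketch (plain C, not taken from Blender or Cycles) of a Möller–Trumbore ray/triangle intersection test. The arithmetic is exactly the same whether `real` is float (FP32) or double (FP64); the only decision is which precision you accept.

```c
/* Minimal sketch, not Blender/Cycles code: Moller-Trumbore ray/triangle
 * intersection. The math is identical in FP32 and FP64; only the typedef
 * below decides the precision. */
#include <math.h>
#include <stdio.h>

typedef float real;   /* FP32; switch to double for FP64 */

typedef struct { real x, y, z; } vec3;

static vec3 sub(vec3 a, vec3 b)   { return (vec3){ a.x - b.x, a.y - b.y, a.z - b.z }; }
static vec3 cross(vec3 a, vec3 b) { return (vec3){ a.y * b.z - a.z * b.y,
                                                   a.z * b.x - a.x * b.z,
                                                   a.x * b.y - a.y * b.x }; }
static real dot(vec3 a, vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Returns 1 and writes the hit distance to *t if the ray (orig, dir)
 * hits the triangle (v0, v1, v2); returns 0 otherwise. */
static int ray_triangle(vec3 orig, vec3 dir, vec3 v0, vec3 v1, vec3 v2, real *t)
{
    const real EPS = (real)1e-7;
    vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    vec3 p  = cross(dir, e2);
    real det = dot(e1, p);
    if (fabs((double)det) < EPS) return 0;   /* ray parallel to triangle plane */
    real inv = (real)1.0 / det;
    vec3 s = sub(orig, v0);
    real u = dot(s, p) * inv;                /* barycentric coordinate u */
    if (u < 0 || u > 1) return 0;
    vec3 q = cross(s, e1);
    real v = dot(dir, q) * inv;              /* barycentric coordinate v */
    if (v < 0 || u + v > 1) return 0;
    *t = dot(e2, q) * inv;                   /* distance along the ray */
    return *t > EPS;
}

int main(void)
{
    vec3 orig = { 0, 0, -1 }, dir = { 0, 0, 1 };
    vec3 v0 = { -1, -1, 0 }, v1 = { 1, -1, 0 }, v2 = { 0, 1, 0 };
    real t;
    if (ray_triangle(orig, dir, v0, v1, v2, &t))
        printf("hit at t = %f\n", (double)t);
    return 0;
}
```

On a CPU, flipping the typedef to double costs little; on a consumer GPU the same switch would slash throughput, which is exactly the FP64 point above. A GPU running this in FP32 is doing the same precision of math a CPU doing FP32 would, just much faster.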
 
What you observe is Blender not scaling with GPU performance, on Arc in this case.

And, yes, I agree that is obviously a software issue. Now, whether it's Blender not being eager enough to wring the last bit out of Arc (e.g. leaving too much on the CPU side), or Intel not being eager enough to provide fully usable, Blender-ready ray-tracing libraries, is a matter of perspective: I'd say we have a kind of DirectX 11 vs. Mantle situation here, and the "DX12 equivalent", an optimal Blender interface for all GPUs, is still missing.

To me it just makes complete sense: especially when you migrate from a pure CPU renderer to an accelerator, differences in the level of abstraction between GPUs will show up as scaling issues. So had you said that the missing improvement between the A750 and A770 hints at software scaling issues blocking optimal use of the hardware, I wouldn't have commented.

As for GPUs not being made to "fake convincingly enough" but to do specific kinds of math: I'd say faking convincingly is at least how they started, and precise, specialized math may be where they have come to be, after transitioning into more GPGPU-type hardware.

And there I might have predicted a continued split between real-time-oriented ("gaming-PU") and quality-oriented ("rendering-PU") designs, because at sufficient scale everything tends to split. It's like running LLMs on consumer GPUs: it's perfectly functional, but at scale the cost of running inference still makes the vastly more expensive HBM variants more economical, even at vastly lower prices per consumer GPU.

But that discussion may soon be outdated, because according to Mr. Huang all rendering, whether classic "real-time-first" GPU rendering or ray-traced "quality-first" rendering, will be replaced by AIs faking it all.

Why even bother with triangles, bump maps, anti-aliasing and ray tracing if an AI will just take a scene description (from the authoring AI) and turn it into a video, with "realism" simply part of the style prompt?

Things probably aren't going to get there in a single iteration, but if you aimed for a head start as a rendering startup, that's how far forward you'd have to jump.

Too bad Mr. Huang already got that covered...