But your comparison involved apples and oranges. If I understand it correctly, you compared a Gen 9.x iGPU with 24 EUs against a Gen 12 (Xe) iGPU with 96 EUs and concluded that, because the latter was 4x as fast, performance scaled linearly. Except the difference isn't only in the number of EUs.
Who said anything about CPU? I was talking about the removal of register scoreboarding + other changes made to the shader ISA.
Sounds plausible, but you should probably read up on the changes made to Gen11, before going too far down a speculative rabbit hole.
How do you know they're not power-throttling? Doubling the shader count doesn't mean they'll continue to run at the same clock speed as the baseline, especially under high load.
Hint: look for a tool called intel_gpu_top.
To make things clearer (less anecdotal stuff):
I compared three NUCs (NUC8, NUC10 and NUC11) I operate in a RHV/oVirt cluster today, before putting them into "production", using the very same Windows 10 image, updated with the latest drivers for each at the time.
They were purchased within six months of each other, each with the top i7 CPU and 64GB of DDR4-3200 memory (Kingston modules, which also support the 2400 and 2933 timings of the NUC8/10).
I got the NUC8 (i7-8559U/48EU) first in Summer 2019, I think, because it had fallen below €400 even with the Iris 655 iGPU, then got the NUC10 (i7-10710U/24EU) perhaps a month later at pretty much the same price (but with 2 extra cores), and finally I needed a third for a proper HCI cluster and got lucky six months later in early 2020 with a NUC11 (i7-1165G7/96EU), because those remained almost impossible to buy for a long time afterwards.
Both the CPU and the GPU portions of the NUC8 and NUC10 are largely unchanged in terms of architecture and fab process (14nm). They mostly trade silicon die area between the GPU (48 vs 24 EUs) and the CPU (4 vs 6 cores) parts of the chip, making them as comparable as it gets.
The NUC11 changes everything: 10nm process, redesigned CPU cores and GPU.
The NUCs have fully adjustable PL1/PL2 and TAU settings and a fan that is certainly capable of cooling 15 Watts, probably a bit more--if you can tolerate the noise... which I did for testing, but not for production. All NUCs (and most notebooks) will peak to a much higher PL2 for TAU seconds or until thermals kick in. HWiNFO and its graphs showed all the details on a remote observation system.
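For anyone unfamiliar with how PL1, PL2 and TAU interact: a common simplified description is that the package may draw up to PL2 until a running (exponentially weighted) average of its power, with time constant TAU, reaches PL1. Below is a minimal Python sketch of that model. It is an approximation for illustration only, not Intel's actual firmware algorithm, and the function name `seconds_at_pl2`, the 28 s TAU and the 5 W starting average are made-up assumptions.

```python
import math

def seconds_at_pl2(pl1_w, pl2_w, tau_s, start_avg_w):
    """Time until an exponentially weighted power average reaches PL1
    while the package draws a constant PL2.

    Simplified model (assumption): avg(t) = PL2 + (start - PL2) * exp(-t / tau).
    """
    if pl2_w <= pl1_w:
        return math.inf   # the average can never climb above PL1
    if start_avg_w >= pl1_w:
        return 0.0        # no turbo budget left to spend
    return tau_s * math.log((pl2_w - start_avg_w) / (pl2_w - pl1_w))

# Hypothetical example: PL1 = 15 W, PL2 = 50 W, TAU = 28 s, starting from a 5 W average.
print(f"{seconds_at_pl2(15, 50, 28, 5):.1f} s at PL2 before falling back to PL1")
```

In this model a TAU of 0 leaves no time at PL2 at all, and setting PL2 equal to PL1 makes the question moot.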
And to test peak power consumption and throttling I use Prime95 and Furmark, individually and in combination, like many others do.
Of course I used the "maximum performance" settings first for the benchmarks, and then tested various ways of dialing the NUCs and their fans down to the point where they still gave me short-term peak performance as well as acceptable noise levels for sustained loads, to ready them for production use under Linux.
All those tests showed that the iGPU portion of those SoCs is the last thing that ever throttles. CPU cores will always clock down first when PL2 runs beyond TAU or thermals kick in--unless the iGPU has no load.
So the iGPU generally runs privileged and graphics benchmarks are not much affected by TDP settings until you go really low. Anandtech has tested passive Tiger Lake NUCalikes, which suffer in graphics as the system heats up, but that's below 10 Watts.
On the high end the NUCs can go to 50 or even 64 Watts for the NUC11, but it's only the CPU cores that will really use that wattage; the iGPU never goes near that and I believe I've never seen it use more than 10 Watts. Furmark never sees them throttle or clock down, even at only 15 Watts PL1/PL2 with TAU=0. For that you need to add Prime95 or go below 10 Watts of permissible TDP or cooling capacity (see the Anandtech tests).
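For a quick check of the same thing under Linux, here is a rough Python sketch that compares the actual iGPU clock with its configured maximum. It assumes the i915 driver and that the iGPU is `card0` (both may differ on your system); `read_igpu_clocks`, `below_max` and the 50 MHz tolerance are names and values I made up for illustration.

```python
from pathlib import Path

def read_igpu_clocks(card_dir="/sys/class/drm/card0"):
    """Read actual and maximum iGPU clocks (MHz) from the i915 sysfs files.

    Assumption: the i915 driver exposes gt_act_freq_mhz and gt_max_freq_mhz here.
    """
    card = Path(card_dir)
    clocks = {}
    for key in ("gt_act_freq_mhz", "gt_max_freq_mhz"):
        clocks[key] = int((card / key).read_text().strip())
    return clocks

def below_max(clocks, tolerance_mhz=50):
    """True if the actual clock sits more than tolerance_mhz under the maximum."""
    return clocks["gt_act_freq_mhz"] < clocks["gt_max_freq_mhz"] - tolerance_mhz
```

Calling `below_max(read_igpu_clocks())` only tells you something while a GPU load such as Furmark is running, because an idle iGPU clocks down on its own.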
But with all that, the NUC8, which should have twice the graphics performance of the NUC10 (48 vs 24 EUs), only got a 50% uplift from 100% extra EUs and 128MB of eDRAM, while the NUC11 got 400% of the NUC10's graphics performance without eDRAM, which is pretty much linear scaling for 96 vs 24 EUs.
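To spell that arithmetic out, here is a small Python sketch that divides the observed speedup by the speedup perfectly linear EU scaling would give. The EU counts and the rounded relative scores (NUC10 = 1.0) are the figures quoted above, not raw benchmark results, and `scaling_efficiency` is just an illustrative name.

```python
# Rounded figures from the post, with the NUC10 as the 1.0 baseline.
systems = {
    "NUC10": {"eus": 24, "relative_score": 1.0},
    "NUC8": {"eus": 48, "relative_score": 1.5},
    "NUC11": {"eus": 96, "relative_score": 4.0},
}

def scaling_efficiency(name, baseline="NUC10"):
    """Observed speedup divided by the EU-count ratio (1.0 means linear scaling)."""
    base = systems[baseline]
    system = systems[name]
    eu_ratio = system["eus"] / base["eus"]
    speedup = system["relative_score"] / base["relative_score"]
    return speedup / eu_ratio

for name in systems:
    print(f"{name}: {scaling_efficiency(name):.2f}")
```

A value of 0.75 for the NUC8 and 1.00 for the NUC11 is all this says. It does not separate the effect of EU count from the architecture and process changes.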
I hope that's enough data to put your doubts about the quality of my measurements to rest.
To me it showed that both the 24 extra EUs on the NUC8 and the 2 extra cores on the NUC10 weren't really worth having because of diminishing returns. In the first case, the graphics performance increase just wasn't worth the technical effort that went into making it happen, and in the second case the 2 extra cores simply didn't pay off with only a 15 Watt TDP budget, because they needed to clock below the silicon knee even on truly parallel loads and wound up with performance pretty similar to the NUC8.
Only when I unleashed PL2 and TAU did they reach their potential, but the NUC fans become intolerable above 2000rpm. The Tiger Lake managed the same CPU performance using only 4 cores, thanks to the improved IPC, and with much less noise.