News Intel's Meteor Lake GPU Doubles Integrated Graphics Performance Per Watt

Status
Not open for further replies.

bit_user

Polypheme
Ambassador
@JarredWaltonGPU thanks for publishing this writeup. I missed it, at the time, but knew I'd come back to it when Meteor Lake finally launched.

each Xe-Core comes with 16 Vector Engines. Those have the usual assortment of supported number formats, including FP32, FP16, INT8, and FP64 — the 64-bit support is a new feature, incidentally.
The fp64 support is for scalars, only. Calling it a "new feature" is only true in that it was initially dropped from Xe, but existed in prior generations of iGPU. fp64 is useful for certain geometry processing cases. Although it can be emulated, doing so would incur quite a performance penalty. I'm guessing the reason they re-added it is based on feedback from developers or from all the time they've spent investigating the various performance problems encountered by the Alchemist graphics cards. Whatever the reason, I'm glad to see its return, if only in scalar form.
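To give a rough sense of what emulation costs, here is a minimal sketch of one common software approach, "double-single" (float-float) arithmetic, where a value is carried as an unevaluated sum of two fp32 numbers. This is only an illustration, not a claim about how Intel's driver or any particular shader compiler emulates fp64. The helper names (`f32`, `two_sum`, `ds_add`, and so on) are made up for this example, and the fp32 rounding is simulated with `struct` since Python floats are fp64. Also note this technique gives roughly 48 bits of significand and fp32's exponent range, so it isn't a true fp64 replacement.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum in fp32: s is the rounded sum, e the rounding error."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e  # 6 fp32 operations


def ds_from_float(v: float) -> tuple[float, float]:
    """Split a value into a (hi, lo) pair of fp32 numbers."""
    hi = f32(v)
    lo = f32(v - hi)
    return hi, lo


def ds_add(x: tuple[float, float], y: tuple[float, float]) -> tuple[float, float]:
    """Add two double-single values; about 11 fp32 operations in total."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    hi = f32(s + e)
    lo = f32(e - f32(hi - s))
    return hi, lo


def ds_to_float(x: tuple[float, float]) -> float:
    """Recombine a (hi, lo) pair into a single Python float."""
    return x[0] + x[1]


value = 1.0 + 2.0 ** -30                    # not representable in fp32
naive = f32(f32(value) + f32(value))        # plain fp32 loses the small term
pair = ds_from_float(value)
emulated = ds_to_float(ds_add(pair, pair))  # keeps it, at ~11x the op count
```

A single emulated add here takes about 11 fp32 operations instead of one, and multiply and divide are worse, which is the kind of penalty I mean.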

One of the interesting aspects of the Meteor Lake "disaggregated architecture" is how the various elements that would normally be part of the GPU get scattered around the other tiles. The graphics tile handles everything discussed above, while the Display Engine and Xe Media Engine are part of the main SOC tile, and then the physical display outputs are part of the IO tile.
I believe the driving motivation behind this redistribution of blocks is to be able to power down the GPU tile, during activities like video playback.

Also, if I'm not mistaken, I think some of the display outputs are on the SoC tile, while others are on the I/O tile. If true, maybe the I/O tile can be kept in a low-power state when those displays aren't being used.

CGRqJAqwbPhqsBdVSmHhFT.jpg


It includes encoding and decoding support for AVC, HEVC, VP9, and AV1 — the latter two being the new additions relative to prior Intel GPUs. It can handle up to 8K60 10-bit HDR decoding, and 8K 10-bit HDR encoding
It's funny, because their slide sort of implies it doesn't handle realtime AV1 encoding.

hG7R4mDrJLHfeL9hDv3pwS.jpg


Also, the slide uses the ever-slippery weasel words "up to", when describing resolution and frame rate.

P7TSeyiBfpxsSsMfBweEpS.jpg


The low power optimizations work to avoid waking up cores, memory, and display elements when it's not necessary.
It sure sounds to me like they even went as far as integrating a chunk of DRAM into the display controllers, in order to avoid memory traffic. I won't quote the slides, but three of them contain the following annotations:
  • "Display buffering for race-to-halt on memory access" - okay, buffering where?
  • "Skip fetching and sending non-updated pixels" - you can only do this if you've got the previous frame buffered somewhere.
  • "Queuing frame into display to reduce core wakeups" - again, the frame is being buffered somewhere!

So, am I right that their display controllers have builtin frame buffers, or is the buffer memory in panel controllers (with PSR) actually that flexible?
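To make the second annotation concrete, here is a minimal sketch of the selective-update idea: compare the new frame against a retained copy of the previous one and only send the rows that changed. The function names and the tiny 4x8 "frame" are hypothetical, and real hardware would presumably get the damaged region from the driver or compositor rather than diffing whole frames. The point is just that some copy of the previous frame has to live somewhere.

```python
def changed_rows(prev: list[bytes], curr: list[bytes]) -> list[int]:
    """Return the indices of rows that differ between two frames."""
    return [y for y, (p, c) in enumerate(zip(prev, curr)) if p != c]


def dirty_spans(rows: list[int]) -> list[tuple[int, int]]:
    """Coalesce sorted row indices into inclusive (first, last) spans."""
    spans: list[tuple[int, int]] = []
    for y in rows:
        if spans and y == spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], y)  # extend the current span
        else:
            spans.append((y, y))           # start a new span
    return spans


WIDTH, HEIGHT = 4, 8
prev_frame = [bytes(WIDTH) for _ in range(HEIGHT)]  # retained previous frame
curr_frame = list(prev_frame)
for y in (2, 3, 6):                                 # pretend three rows changed
    curr_frame[y] = bytes([255] * WIDTH)

rows = changed_rows(prev_frame, curr_frame)
spans = dirty_spans(rows)
rows_sent = sum(last - first + 1 for first, last in spans)
```

In this toy case only 3 of the 8 rows would need to be fetched and sent, in two spans.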

we're not sure what sort of graphics configurations will be present on the desktop chips
As you probably know, by now, the Meteor Lake desktop CPUs got canceled. Back when you published this article, perhaps it was still an unconfirmed rumor.

Here are some other questions I had, which I'm guessing you probably don't know the answers to, but I thought I'd mention, anyway.

8zzR2zKRVA7BgWYmcUvwnW.jpg

So, when they say pairs of vector engines run in lock-step, does that mean they're fed by a dual-issue in-order instruction stream, as I think previous Intel iGPUs have done? Or is it more like they act in tandem, as an extended form of SIMD?

Also, what are Extended Math instructions? Do these consist of transcendental functions? Any others?
 
@JarredWaltonGPU thanks for publishing this writeup. I missed it, at the time, but knew I'd come back to it when Meteor Lake finally launched.
It's all a bit fuzzy in my head now and I'd have to bone up on things again, but the slides are included for people like you. :)
The fp64 support is for scalars, only. Calling it a "new feature" is only true in that it was initially dropped from Xe, but existed in prior generations of iGPU. fp64 is useful for certain geometry processing cases. Although it can be emulated, doing so would incur quite a performance penalty. I'm guessing the reason they re-added it is based on feedback from developers or from all the time they've spent investigating the various performance problems encountered by the Alchemist graphics cards. Whatever the reason, I'm glad to see its return, if only in scalar form.
I think there's (almost?) always limited FP64 support included, and maybe I had this wrong in the text.
I believe the driving motivation behind this redistribution of blocks is to be able to power down the GPU tile, during activities like video playback. Also, if I'm not mistaken, I think some of the display outputs are on the SoC tile, while others are on the I/O tile. If true, maybe the I/O tile can be kept in a low-power state when those displays aren't being used.
Probably correct.
It's funny, because their slide sort of implies it doesn't handle realtime AV1 encoding. Also, the slide uses the ever-slippery weasel words "up to", when describing resolution and frame rate.
I do seem to recall asking about this. I guess I may need to follow up with Intel to confirm the lack of AV1 encoding. It would be very disappointing if Intel ditched that just to save die space, but from a cost-savings perspective the lack of AV1 encode probably doesn't matter that much.
It sure sounds to me like they even went as far as integrating a chunk of DRAM into the display controllers, in order to avoid memory traffic. I won't quote the slides, but three of them contain the following annotations:
  • "Display buffering for race-to-halt on memory access" - okay, buffering where?
  • "Skip fetching and sending non-updated pixels" - you can only do this if you've got the previous frame buffered somewhere.
  • "Queuing frame into display to reduce core wakeups" - again, the frame is being buffered somewhere!
So, am I right that their display controllers have builtin frame buffers, or is the buffer memory in panel controllers (with PSR) actually that flexible?
Good observation, I'm not sure on the details and I suspect Intel may be cagey about what exactly is going on for a reason (as in, protect its secrets).
Here are some other questions I had, which I'm guessing you probably don't know the answers to, but I thought I'd mention, anyway.
So, when they say pairs of vector engines run in lock-step, does that mean they're fed by a dual-issue in-order instruction stream, as I think previous Intel iGPUs have done? Or is it more like they act in tandem, as an extended form of SIMD?

Also, what are Extended Math instructions? Do these consist of transcendental functions? Any others?
Not sure on the first bit, but the Extended Math should be all the less common stuff like transcendental functions, square roots, and other complex bits and bobs. There's probably a white paper on it somewhere that says what the breakdown is for math that's "general" and supported on all the execution cores, and what stuff qualifies as "extended."
 

bit_user

Polypheme
Ambassador
I think there's (almost?) always limited FP64 support included, and maybe I had this wrong in the text.
Xe/Alchemist did 100% drop hardware fp64! Intel's architecture slides on Xe showed that fp64 was "optional", but I can say it's definitely not in the Xe iGPU in Alder Lake, at least. I have also read that it's absent from Alchemist, but have not confirmed it myself.

I suspect Intel may be cagey about what exactly is going on for a reason (as in, protect its secrets).
If they patented it, then maybe not? I could go trolling through the patent database, but there's a limit to my curiosity.

I do recall something about a patent indicating on-die DRAM and we speculated about it acting as an L4 cache. What if that was actually the frame buffer memory?

Thanks for the reply!
: )
 