@JarredWaltonGPU thanks for publishing this writeup. I missed it at the time, but knew I'd come back to it when Meteor Lake finally launched.
each Xe-Core comes with 16 Vector Engines. Those have the usual assortment of supported number formats, including FP32, FP16, INT8, and FP64 — the 64-bit support is a new feature, incidentally.
The fp64 support is for scalars only. Calling it a "new feature" is true only in the sense that it was initially dropped from Xe; it existed in prior generations of Intel iGPUs. fp64 is useful for certain geometry-processing cases. Although it can be emulated, doing so incurs quite a performance penalty. My guess is they re-added it based on feedback from developers, or from all the time they've spent investigating the various performance problems encountered by the Alchemist graphics cards. Whatever the reason, I'm glad to see its return, even if only in scalar form.
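For anyone curious what fp64 emulation on fp32 hardware roughly looks like (and why it's so much slower than native support), here's a sketch of the classic "double-single" technique, where a pair of fp32 values carries the extra precision. This is purely my illustration — the function names are made up, and I'm using numpy only to force fp32 rounding on every operation; it's not anything Intel actually does:

```python
import numpy as np

def two_sum(a, b):
    """Knuth's error-free transformation: returns (s, e) where
    s = fl(a + b) and s + e == a + b exactly (all ops in fp32)."""
    a, b = np.float32(a), np.float32(b)
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def ds_add(x_hi, x_lo, y_hi, y_lo):
    """Add two 'double-single' values, each held as a (hi, lo) pair
    of fp32s.  One emulated add costs roughly a dozen fp32 ops versus
    a single native fp64 add -- hence the performance penalty."""
    s, e = two_sum(x_hi, y_hi)
    e = e + np.float32(x_lo) + np.float32(y_lo)  # fold in low parts
    return two_sum(s, e)                         # renormalize
```

For example, adding 1.0 and 2^-30 — a sum plain fp32 would round away entirely — keeps the small term in the low half of the pair.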
One of the interesting aspects of Meteor Lake's "disaggregated architecture" is how the various elements that would normally be part of the GPU get scattered across the other tiles. The graphics tile handles everything discussed above, the Display Engine and Xe Media Engine sit on the main SoC tile, and the physical display outputs are on the I/O tile.
I believe the driving motivation behind this redistribution of blocks is to allow the graphics tile to be powered down entirely during activities like video playback.
Also, if I'm not mistaken, some of the display outputs are on the SoC tile, while others are on the I/O tile. If so, maybe the I/O tile can be kept in a low-power state when those displays aren't in use.
It includes encoding and decoding support for AVC, HEVC, VP9, and AV1 — the latter two being the new additions relative to prior Intel GPUs. It can handle up to 8K60 10-bit HDR decoding, and 8K 10-bit HDR encoding
It's funny, because their slide sort of implies it doesn't handle realtime AV1 encoding. Also, the slide uses the ever-slippery weasel words "up to" when describing resolution and frame rate.
The low power optimizations work to avoid waking up cores, memory, and display elements when it's not necessary.
It sure sounds to me like they went so far as to integrate a chunk of local memory into the display controllers, in order to avoid DRAM traffic. I won't quote the slides, but three of them contain the following annotations:
- "Display buffering for race-to-halt on memory access" - okay, buffering where?
- "Skip fetching and sending non-updated pixels" - you can only do this if you've got the previous frame buffered somewhere.
- "Queuing frame into display to reduce core wakeups" - again, the frame is being buffered somewhere!
So, am I right that their display controllers have built-in frame buffers, or is the buffer memory in panel controllers (with Panel Self-Refresh) actually that flexible?
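To make that second bullet concrete, here's the kind of bookkeeping a buffered display path could do: checksum each band of the previous frame and only fetch/send the bands that actually changed. This is purely my illustration of the idea — the band size, stride, and use of CRC32 are all made up, not Intel's actual mechanism:

```python
import zlib

TILE_ROWS = 16   # hypothetical band height, in scanlines
STRIDE = 256     # hypothetical bytes per scanline

def dirty_bands(prev, cur):
    """Compare per-band checksums of the previous and current frames,
    yielding (offset, data) only for bands whose pixels changed --
    i.e. 'skip fetching and sending non-updated pixels'."""
    band = TILE_ROWS * STRIDE
    for off in range(0, len(cur), band):
        if zlib.crc32(cur[off:off+band]) != zlib.crc32(prev[off:off+band]):
            yield off, cur[off:off+band]
```

If nothing changed between frames, nothing is fetched at all — which is exactly the case where you'd want to leave the cores and memory asleep.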
we're not sure what sort of graphics configurations will be present on the desktop chips
As you probably know by now, the Meteor Lake desktop CPUs were canceled. Back when you published this article, perhaps it was still an unconfirmed rumor.
Here are some other questions I had. I'm guessing you probably don't know the answers, but I thought I'd mention them anyway.
So, when they say pairs of vector engines run in lockstep, does that mean they're fed by a dual-issue, in-order instruction stream, as I believe previous Intel iGPUs have been? Or is it more like they act in tandem, as an extended form of SIMD?
Also, what are the Extended Math instructions? Do they consist of transcendental functions, or are there others?