News Intel's New AVX10 Brings AVX-512 Capabilities to E-Cores

I am just glad they have a strategy to unify ISA with the new hybrid chips. Seems really myopic this wasn't solved ages ago. So horrible to have to fuse off capable silicon features due to not working out the details how small and large cores would work together.
 
Is it me, or should this have been called AVX 4 instead of AVX 10?

There was AVX or
AVX 1
AVX 2 (Mostly 256b)
AVX 3 (512b Extensions)

This update should've been called AVX 4?

There is no such thing as AVX3. We only have AVX, AVX2 and AVX-512 x86 ISAs. AVX10 is just a superset of AVX-512. AVX10 will enable AVX-512 capabilities across both Performance and Efficient core designs with hybrid processors.

AVX10 contains all the richness of AVX-512, plus additional features and capabilities, while being able to run on both P-cores and E-cores.

Edit:

AVX10 actually has 2 subsets, AVX10/256, similar to AVX2, and AVX10/512 which is similar to AVX-512.
 
I am just glad they have a strategy to unify ISA with the new hybrid chips. Seems really myopic this wasn't solved ages ago. So horrible to have to fuse off capable silicon features due to not working out the details how small and large cores would work together.
I think this is mostly the manufacturing delays coming into play. RPL was never originally supposed to exist, and now we're getting a refresh of it.

I'm curious if MTL has any implementation (like ADL did with AVX 512) since the cores in it and Granite Rapids are the same.
 
Didn't people get workarounds to run AVX-512 on their hybrid chips, and then Intel said "no" and shut that method down?
Well, not really. You had to turn the hybrid into a classic CPU by turning off the E-cores, and for Intel it was more important for people to get used to the hybrid approach than for them to get AVX-512.
I am just glad they have a strategy to unify ISA with the new hybrid chips. Seems really myopic this wasn't solved ages ago. So horrible to have to fuse off capable silicon features due to not working out the details how small and large cores would work together.
It won't be unified, the e-cores will still only be able to do avx-256 and they will have the thread director or whatever make it work.
They could have done this on older CPUs, they could do this now on all hybrid CPUs.
Diving deeper, the AVX10 (Advanced Vector Extensions 10) ISA is a superset of AVX-512 and comes with all of the features of the AVX-512 ISA for processors with both 256-bit and 512-bit vector register sizes.
 
AVX 3 (512b Extensions)

This update should've been called AVX 4?
This is not really an extension of AVX-512, but rather a step back and a start-over.

Many in the programmer community have been asking for CPUs with a short-vector version of AVX-512 for a while, often using the provisional name "AVX-256".
The set isn't just about extending the width and number of registers. For instance, the use of boolean vectors for conditional load/store per lane is a big thing.
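To make the per-lane masking point concrete, here is a minimal scalar sketch of what a masked store does on a 256-bit vector of eight 32-bit lanes. It is only a reference model in plain C, not real SIMD code; the function names are made up for illustration. On hardware with AVX-512F and AVX-512VL, the equivalent operation is a single instruction, exposed in C as the `_mm256_mask_storeu_epi32` intrinsic.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit lanes in a 256-bit vector. */
enum { LANES_256_X32 = 8 };

/* Scalar reference model of a per-lane masked store (hypothetical helper).
 * Bit i of `mask` decides whether lane i is written; lanes whose bit is 0
 * leave the destination memory untouched. */
static void masked_store_epi32_model(int32_t *dst, uint8_t mask,
                                     const int32_t src[LANES_256_X32])
{
    for (size_t lane = 0; lane < LANES_256_X32; ++lane) {
        if ((mask >> lane) & 1u) {
            dst[lane] = src[lane];
        }
    }
}

/* Build the mask for a loop tail: the low `remaining` lanes are active.
 * This is the typical reason masks remove the need for a scalar
 * clean-up loop at the end of an array. */
static uint8_t tail_mask_x32(size_t remaining)
{
    if (remaining >= LANES_256_X32) {
        return 0xFFu;
    }
    return (uint8_t)((1u << remaining) - 1u);
}
```

With a tail of three elements, `tail_mask_x32(3)` gives `0x07`, so only the first three lanes are stored and the rest of the destination is left alone.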
 
Is it me, or should this have been called AVX 4 instead of AVX 10?

There was AVX or
AVX 1
AVX 2 (Mostly 256b)
AVX 3 (512b Extensions)

This update should've been called AVX 4?
The problem is that you think too logically. These things are decided by marketing people, and they probably feel like "AVX-512" sounds a lot like AVX5. So, they want to call it AVX10 to make it sound way better (even though it's not).

I mean, why did Nvidia go from 700, 900, 1000, 2000, 3000, 4000? It's because "one better" doesn't sound like much, once you get above 10. You want each generation to sound a lot better, even if it's not (as in this case).

Oh, and by the way, I wouldn't even call it AVX4. If we're being logical, then 10.1 is basically just a way to tell software whether the AVX registers are 256 bits or 512 bits, apart from whether or not the AVX-512 instruction set is itself supported. 10.1 really doesn't add any real functionality that doesn't already exist in AVX-512.
 
AVX10 actually has 2 subsets, AVX10/256, similar to AVX2, and AVX10/512 which is similar to AVX-512.
No, you had it right the first time. As it stands today, AVX-512 instructions can operate on 128 bit, 256 bit, or 512 bit operands. AVX10.1 is just rebranding AVX-512, while adding an additional variable to indicate whether the implementation supports all 3 operand sizes, or whether it supports only the first two.
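As a rough sketch of what that "additional variable" looks like to software: going by the initial AVX10 specification as I remember it, the version and supported vector lengths are reported in CPUID leaf 0x24 (sub-leaf 0), register EBX. Treat the bit positions below as assumptions to check against the current spec. The struct and function names are hypothetical, and the code only decodes a register value you pass in; it does not execute CPUID itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the AVX10 enumeration register.
 * ASSUMPTION: layout taken from the initial AVX10 spec
 * (CPUID leaf 0x24, sub-leaf 0, EBX); verify before relying on it. */
typedef struct {
    unsigned version; /* EBX[7:0]: AVX10 version, e.g. 1 for AVX10.1 */
    bool vl128;       /* EBX[16]: 128-bit vectors supported */
    bool vl256;       /* EBX[17]: 256-bit vectors supported */
    bool vl512;       /* EBX[18]: 512-bit vectors supported */
} avx10_info;

static avx10_info decode_avx10_ebx(uint32_t ebx)
{
    avx10_info info;
    info.version = ebx & 0xFFu;
    info.vl128 = ((ebx >> 16) & 1u) != 0;
    info.vl256 = ((ebx >> 17) & 1u) != 0;
    info.vl512 = ((ebx >> 18) & 1u) != 0;
    return info;
}

/* Widest vector length the implementation reports, in bits (0 if none). */
static unsigned avx10_max_vector_bits(avx10_info info)
{
    if (info.vl512) return 512;
    if (info.vl256) return 256;
    if (info.vl128) return 128;
    return 0;
}
```

Under these assumptions, an EBX value with version 1 and only the 128-bit and 256-bit flags set would describe an AVX10.1/256 part, while one with all three flags set would describe AVX10.1/512.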
 
RPL was never originally supposed to exist,
Really??? Source?

I'm curious if MTL has any implementation (like ADL did with AVX 512) since the cores in it and Granite Rapids are the same.
Yeah, good question... except that Meteor Lake's CPU tile is slated for the Intel 4 process node, while Granite Rapids is slated for Intel 3. I know the nodes are similar, but I don't know if they're layout-compatible.
 
This is not really an extension of AVX-512, but rather a step back and a start-over.

Many in the programmer community have been asking for CPUs with a short-vector version of AVX-512 for a while, often using the provisional name "AVX-256".
The sad part is that they didn't make it truly variable-length, like ARM's SVE. I believe each instruction still has an explicit operand length indicator, which means you need two versions of your code, to handle both the 256-bit and 512-bit cases.
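A minimal sketch of what "two versions of your code" tends to mean in practice: build one kernel per vector width and pick one at startup. The kernels here are plain-C stand-ins that only mimic 8-lane and 16-lane chunking (a real build would compile each variant with the matching intrinsics or compiler flags), and all names are hypothetical.

```c
#include <stddef.h>

typedef void (*add_kernel)(float *dst, const float *a, const float *b, size_t n);

/* Shared stand-in body: process `lanes` floats per step, then a scalar tail. */
static void add_chunked(float *dst, const float *a, const float *b,
                        size_t n, size_t lanes)
{
    size_t i = 0;
    for (; i + lanes <= n; i += lanes) {
        for (size_t j = 0; j < lanes; ++j) {
            dst[i + j] = a[i + j] + b[i + j];
        }
    }
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}

/* Variant written for 256-bit vectors: 8 floats per step. */
static void add_f32_256(float *dst, const float *a, const float *b, size_t n)
{
    add_chunked(dst, a, b, n, 8);
}

/* Variant written for 512-bit vectors: 16 floats per step. */
static void add_f32_512(float *dst, const float *a, const float *b, size_t n)
{
    add_chunked(dst, a, b, n, 16);
}

/* Pick a variant once, based on the widest vector length the CPU reports. */
static add_kernel select_add_kernel(unsigned max_vector_bits)
{
    return max_vector_bits >= 512 ? add_f32_512 : add_f32_256;
}
```

With a length-agnostic ISA in the style of SVE, the selection step and the duplicate kernel would not be needed, which is the complaint being made here.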
 
I'm betting that their hybrid CPUs with AVX10 will implement at 256-bits on both the P-cores and E-cores, unless you know differently. Going hybrid-ISA creates more headaches than it's worth.
That IS what I said.
e-cores will only be able to do 256 so it will not be unified, if you run 512 it will only run on the p-cores, if you run 256 or below it will probably run on all cores.
Unified would mean that all cores can do all the same things.
 
That IS what I said.
e-cores will only be able to do 256 so it will not be unified, if you run 512 it will only run on the p-cores, if you run 256 or below it will probably run on all cores.
Unified would mean that all cores can do all the same things.
Okay, well that's not consistent with the official Intel Technical paper, which says:

"The converged version of the Intel AVX10 vector ISA will include Intel AVX-512 vector instructions with an AVX512VL feature flag, a maximum vector register length of 256 bits, as well as eight 32-bit mask registers and new versions of 256-bit instructions supporting embedded rounding. This converged version will be supported on both P-cores and E-cores. While the converged version is limited to a maximum 256-bit vector length, Intel AVX10 itself is not limited to 256 bits, and optional 512-bit vector use is possible on supporting P-cores. Thus, Intel AVX10 carries forward all the benefits of Intel AVX-512 from the Intel® Xeon® with P-core product lines"
Source: https://cdrdv2.intel.com/v1/dl/getContent/784343

The key phrase is "converged version", which seems to be shorthand for AVX10/256. They are very clear about hybrid CPUs supporting this converged version, meaning even their P-cores will support only 256-bit.

It's the Xeon P-cores which they're saying will support 512-bit.
 
The problem is that you think too logically. These things are decided by marketing people, and they probably feel like "AVX-512" sounds a lot like AVX5. So, they want to call it AVX10 to make it sound way better (even though it's not).

I mean, why did Nvidia go from 700, 900, 1000, 2000, 3000, 4000? It's because "one better" doesn't sound like much, once you get above 10. You want each generation to sound a lot better, even if it's not (as in this case).

Oh, and by the way, I wouldn't even call it AVX4. If we're being logical, then 10.1 is basically just a way to tell software whether the AVX registers are 256 bits or 512 bits, apart from whether or not the AVX-512 instruction set is itself supported. 10.1 really doesn't add any real functionality that doesn't already exist in AVX-512.
This is why I "HATE" when marketing gets to make the naming decisions.

"Marketing" shouldn't be allowed to make names on the technical side of a product.

Leave that to the engineers and let them name it.

"Marketing" should only be about their job, selling it to the expected audience base.
 
Okay, well that's not consistent with the official Intel Technical paper, which says:
"The converged version of the Intel AVX10 vector ISA will include Intel AVX-512 vector instructions with an AVX512VL feature flag, a maximum vector register length of 256 bits, as well as eight 32-bit mask registers and new versions of 256-bit instructions supporting embedded rounding. This converged version will be supported on both P-cores and E-cores. While the converged version is limited to a maximum 256-bit vector length, Intel AVX10 itself is not limited to 256 bits, and optional 512-bit vector use is possible on supporting P-cores. Thus, Intel AVX10 carries forward all the benefits of Intel AVX-512 from the Intel® Xeon® with P-core product lines"

The key phrase is "converged version", which seems to be shorthand for AVX10/256. They are very clear about hybrid CPUs supporting this converged version, meaning even their P-cores will support only 256-bit.

It's the Xeon P-cores which they're saying will support 512-bit.
Did you like post the quote without reading all of it?!?!?!
This is the end of it, which says that the P-cores, and only those, will have full 512-bit support, unless you think that "supporting P-cores" won't be in the future desktop CPUs. Even though they laser-fused AVX-512 off in previous versions, was there any talk about having designed AVX-512 completely out of them? Because I didn't hear anything of the sort.

"[...] Intel AVX10 itself is not limited to 256 bits, and optional 512-bit vector use is possible on supporting P-cores. Thus, Intel AVX10 carries forward all the benefits of Intel AVX-512 from the Intel® Xeon® with P-core product lines"
 
Did you like post the quote without reading all of it?!?!?!
Yes, I read it. I think it's clear enough, but here's an excerpt from the Architecture Specification, providing further insight into their plans for 512-bit support:
"A “converged” version of Intel AVX10 with maximum vector lengths of 256 bits and 32-bit opmask registers will be supported across all Intel processors, while 512-bit vector registers and 64-bit opmasks will continue to be supported on some P-core processors."

I think "some P-core processors" means their P-core-only Xeons.
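The 32-bit versus 64-bit opmask split in that excerpt follows directly from lane counts: a mask needs one bit per element, and the narrowest element is a byte. A tiny worked sketch (the helper name is hypothetical):

```c
/* One mask bit per lane: mask width = vector width / element width.
 * Assumes element_bits is non-zero and divides vector_bits evenly. */
static unsigned opmask_bits_needed(unsigned vector_bits, unsigned element_bits)
{
    return vector_bits / element_bits;
}
```

So a 256-bit vector of bytes has 256 / 8 = 32 lanes and fits a 32-bit opmask, while a 512-bit vector of bytes has 512 / 8 = 64 lanes and needs the full 64-bit opmask.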

This is the end of it, which says that the P-cores, and only those, will have full 512-bit support, unless you think that "supporting P-cores" won't be in the future desktop CPUs
You seem to be overlooking this part:
"This converged version will be supported on both P-cores and E-cores. While the converged version is limited to a maximum 256-bit vector length"

There really doesn't seem to be any ambiguity, at this point. I think Intel just nailed shut the coffin on having 512-bit in their client processors.
 
Yeah, good question... except that Meteor Lake's CPU tile is slated for the Intel 4 process node, while Granite Rapids is slated for Intel 3. I know the nodes are similar, but I don't know if they're layout-compatible.
Seeing as it doesn't sound like it'll be ready for e-cores yet I'm sure it would probably be disabled if it's there, but still curious.
Really??? Source?
View: https://twitter.com/IanCutress/status/1569233082212818944?t=8h9BOCT0QX5Y-OE9EnCY-g&s=19


It has popped up elsewhere, but I don't think it ever got any of its own articles.
 
I think "some P-core processors" means their P-core-only Xeons.
The only reason they only mention the xeon CPUs here is that at the moment those are the only ones with active avx-512 support.
If desktop CPUs will have avx10 .1 .2 whatever then they will have avx-512, that's what carries forward means, it will carry over to anything with avx10 support.

"Thus, Intel AVX10 carries forward all the benefits of Intel AVX-512 from the Intel® Xeon® with P-core product lines"


Also here no distinction is being made, they don't say only on xeon p-cores.

Apart from a few special cases, those instructions will be supported at all vector lengths, with 128-bit and 256-bit vector lengths being supported across all processors, and 512-bit vector lengths additionally supported on P-core processors.
 
Since I don't have a Twitter account, can you please copy-and-paste the quote?
You should be able to view direct links without one, but (and PRL is just RPL mistype obviously):
OK sorry reconfirmed this.
#IntelTechTour : PRL only exists because MTL wasn't going to be ready on time. RPL dev started 2 yr ago. GPU RTL and IO RTL hasn't changed from ADL. 41% improved MT perf of RPL over ADL, 15% ST, based on SPECint207.- Isic Silas, Intel Corp VP of CCG.
Sep 12, 2022
 
Seems AnandTech also has a post about this, and Gavin posted this:

"
To clarify the two different quotes. AVX-512 will still be there as it's a superset, hence the backward compatibility that AVX10 offers. Having x86 backward compatibility is important.

AVX10 will replace AVX-512 going forward, and developers, where applicable, can recompile to ensure compatibility and leverage the efficiency and performance bonuses.

Intel has alluded to divulging whether or not 512-bit wide vectors will be supported on chips and cores going forward, but they have committed to support 256-bit at the very least. "
 
You should be able to view direct links without one, but (and PRL is just RPL mistype obviously):
That used to be the case, but didn't Elon start requiring a sign-in just to read tweets?

Anyway, I've been blocking Twitter in my routing tables for ages, since you can use their ad network without even visiting Twitter.com and I wanted to ensure they got no ad revenue from me. That goes back to the pre-Elon era, when they took a stance on targeted political ads I didn't like. That practice is poisonous for democracy.