News Intel's Arrow Lake for Desktops and Laptops Will Have Different Instruction Sets

If you look at the table in the Tweet that the article references, the distinction seems to follow the generation of E-core. For instance Sierra Forest matches Arrow Lake (mobile), while Clearwater Forest matches Arrow Lake-S. The difference isn't that big, however. Just a couple crypto instructions and AVX-VNNI-INT16 - which doesn't seem like much of a game-changer for desktops. You'll still want a dGPU for any serious inferencing.

This E-core revelation is very interesting. It could indicate that Arrow Lake (mobile) will use Intel 3, rather than Intel 20A!

Intel's Arrow Lake S to support new AMX, AVX, SHA512, and SM instructions.
I don't know why the article's subtitle said anything about AMX, as there's no indication that it does. Furthermore, the mention of AVX got my hopes up that they were doing AVX10, but it seems not.
 
This E-core revelation is very interesting. It could indicate that Arrow Lake (mobile) will use Intel 3, rather than Intel 20A!
It does not.

Just based on the chart then it would indicate Lunar Lake is also on Intel 3 based on your logic, and at the same time on Intel 18A because it's also on the same chart as Clearwater Forest.

Arrow Lake S doesn't have the LP E-cores, while the mobile versions do. That's why they need to disable the instructions on mobile: the LP E-cores are still based on Crestmont and don't support the latest instructions.

Arrowlake - N3 and 20A
Lunarlake - N3
 
Just based on the chart then it would indicate Lunar Lake is also on Intel 3 based on your logic, and at the same time on Intel 18A because it's also on the same chart as Clearwater Forest.
Maybe I didn't state that very clearly, because there's no inconsistency in my interpretation.

If you take another look, you'll see that Lunar Lake gets all the features of Arrow Lake (mobile), Arrow Lake S, and some additional features too. That's because it's new enough to inherit everything, by virtue of having yet newer-generation cores.

Arrow Lake S doesn't have the LP E-cores, while the mobile versions do. That's why they need to disable the instructions on mobile: the LP E-cores are still based on Crestmont and don't support the latest instructions.
Okay, fair point. So, the culprit is likely the SoC tile of Arrow Lake (mobile) being a hand-me-down from Meteor Lake. Makes sense.
 
Are those instruction sets Intel-specific, like AVX-VNNI-INT16? Which ones are used by AMD and Nvidia in their accelerators?
They're x86-64 instructions, so they have nothing to do with any GPUs.

I don't know what sort of arrangement Intel and AMD might have (or not), regarding new x86-64 ISA extensions. AMD has done a pretty good job of keeping just a couple generations behind Intel. I don't consider it a given that AMD will necessarily implement everything Intel does, however. Especially big things, like AMX.
 
The x86 ISA is already fragmented as hell, and this makes the situation with Intel CPUs worse. Programming for Intel is becoming an increasing nightmare.
A really good choice in AMD CPUs is not fragmenting their instruction set.
It seems that Intel is doing all it can to kill the x86 ISA.
 
A really good choice in AMD CPUs is not fragmenting their instruction set.
It seems that Intel is doing all it can to kill the x86 ISA.
So you are telling us that AMD CPUs are still using the x86 ISA from 1970 without any modification?!
CPUs are constantly evolving and every gen has, or at least can have, different extra instructions on it. AMD is no different here, as they get access to newer instructions they implement them in their CPUs as well.
AMD was the one that came up with x86_64 which was the biggest segmentation of x86 EVER.
 
So you are telling us that AMD CPUs are still using the x86 ISA from 1970 without any modification?!
CPUs are constantly evolving and every gen has, or at least can have, different extra instructions on it. AMD is no different here, as they get access to newer instructions they implement them in their CPUs as well.
AMD was the one that came up with x86_64 which was the biggest segmentation of x86 EVER.
I'm obviously saying that in AMD CPUs, for every generation, all market segments have the same instructions, and every generation has a superset of the previous one.
In Intel CPUs, server and desktop have had different instructions for years, and recently servers have also gained extensions for matrix math, crypto and so on. On recent desktops, the same CPU has cores with different sets of instructions (performance/efficiency).
Look at the mess made over the years with MMX, SSE, AVX.
AVX is so problematic that Intel itself now needs a new specification (AVX10), not to expand the AVX instructions but to group them.
x86_64 (or AMD64, as it was originally called) is not a fragmentation but an expansion of the x86 ISA, and I bet you know the difference. From my POV it was a very well done expansion, considering AMD's resources and other factors.
 
I'm obviously saying that in AMD CPUs, for every generation, all market segments have the same instructions, and every generation has a superset of the previous one.
In Intel CPUs, server and desktop have had different instructions for years, and recently servers have also gained extensions for matrix math, crypto and so on. On recent desktops, the same CPU has cores with different sets of instructions (performance/efficiency).
Look at the mess made over the years with MMX, SSE, AVX.
AVX is so problematic that Intel itself now needs a new specification (AVX10), not to expand the AVX instructions but to group them.
x86_64 (or AMD64, as it was originally called) is not a fragmentation but an expansion of the x86 ISA, and I bet you know the difference. From my POV it was a very well done expansion, considering AMD's resources and other factors.
AMD does not in fact have the same instruction set in every generation across market segments; this is trivially provable if you actually look at AMD specs. Different segments can and do have different feature sets. You can't use iGPU and decoding instructions on CPUs that don't have integrated graphics, you can't use certain AI and security processor instructions on CPUs that don't have them, and AMD Pro CPUs have different instruction sets than non-Pro CPUs for various management tasks and, more importantly for the people who want to buy them, have certain instructions disabled from being run. And yes, AMD has deprecated instructions, so no, not every AMD instruction set is a superset of some other instruction set.

AVX is not an Intel specific instruction set and AMD has the same fragmentation property for AVX as Intel across their supported products. What Intel has that AMD doesn't is CPU internally disaggregated instruction sets for AVX because some cores support some things and others don't. That's the purpose of AVX10. To create one instruction set with different features that can be present or not instead of AVX, AVX2, and AVX-512 and all the various additions to each being separate instruction sets. And that's something that will benefit AMD as well if they want to implement it because it would simplify their own instruction set situation allowing more rapid feature integration as well as simplify their own instruction aggregation that comes from doing AVX-512 by ganging AVX2 units.
 
Since SM3 and SM4 are meant for China, maybe they're not included because these chips will not be sold in China, so leaving them out wouldn't affect the rest of the world?
 
AMD does not in fact have the same instruction set in every generation across market segments, this is trivially provable if you actually look at AMD specs. Different segments can and do have different feature sets. You can't use iGPU and decoding instructions on CPUs that don't have integrated graphics,
There are no CPU instructions specifically for controlling the iGPU. It works like an independent core that sits on the same die and shares access to the same memory.

you can't use certain AI and security processor instructions on CPUs that don't have them,
The security processor is actually an ARM-licensed core. It's treated like a black box. Even the OS doesn't get to run custom code on it.

and AMD Pro CPUs have different instruction sets than non pro CPUs for various management tasks
Really? Such as?

AVX is not an Intel specific instruction set and AMD has the same fragmentation property for AVX as Intel across their supported products.
Not really. All Zen-era CPUs support AVX and AVX2. From Zen 4 onward, they all support AVX-512.

What Intel has that AMD doesn't is CPU internally disaggregated instruction sets for AVX because some cores support some things and others don't.
No, they don't support only certain instructions on certain cores. They're trying very hard to make all cores look the same to software, so that the operating system can freely schedule any thread on any core.

Yes, the P-cores of Gen 12+ client CPUs have/had hardware for doing AVX-512, but that's been fused off. So, it really doesn't count for anything.

That's the purpose of AVX10. To create one instruction set with different features that can be present or not instead of AVX, AVX2, and AVX-512 and all the various additions to each being separate instruction sets.
It indeed unifies AVX-512. However, it does add the wrinkle of whether support is at 256-bit or 512-bit width. In practice, this probably means not much AVX10/512 code will be written, because it will only work on Intel's server CPUs, just like the situation we're currently in with AVX-512 on Intel CPUs.

And that's something that will benefit AMD as well if they want to implement it because it would simplify their own instruction set situation
No, it won't simplify anything for AMD, because they opted to support AVX-512 on all Zen 4 cores.

as well as simplify their own instruction aggregation that comes from doing AVX-512 by ganging AVX2 units.
That's not really how they implement it.
 
AMD does not in fact have the same instruction set in every generation across market segments, this is trivially provable if you actually look at AMD specs. Different segments can and do have different feature sets. You can't use iGPU and decoding instructions on CPUs that don't have integrated graphics, you can't
The iGPU is a coprocessor; it has nothing to do with the x86 ISA.
use certain AI and security processor instructions on CPUs that don't have them, and AMD Pro CPUs have different
In the Pro parts, AMD has added an AI accelerator but has not modified the instruction set.
instruction sets than non pro CPUs for various management tasks and more importantly for the people who want to buy them disabled certain instructions from being run. And yes, AMD has deprecated instructions so no, not every AMD instruction set is a superset of some other instruction set.
Not from what I know.
AVX is not an Intel specific instruction set
I never said it is Intel-specific; I said that it is an Intel creation.
and AMD has the same fragmentation property for AVX as Intel across their supported products. What Intel has that
AMD is less fragmented because they choose to support a specific level from a specific generation and go forward, never back. For example, all Zen 4 CPUs, from monstrous servers to small laptops, support a specific feature set of AVX-512. Zen 5 will surely support the same and more, so developers will have nice days. 🙂
AMD doesn't is CPU internally disaggregated instruction sets for AVX because some cores support some things and others don't. That's the purpose of AVX10. To create one instruction set with different features that can be present or not instead of AVX, AVX2, and AVX-512 and all the various additions to each being separate instruction sets. And that's something that will benefit AMD as well if they want to implement it because it would simplify their own instruction set situation allowing more rapid feature integration as well as simplify their own instruction aggregation that comes from doing AVX-512 by ganging AVX2 units.
AVX10 is just a specification, with no new instructions, and while the purpose is good, the simple fact that it is needed demonstrates that Intel has lacked vision for a long time.
Maybe the most terrible thing is that AVX10 already has problems which undermine its usefulness.
No comment.
 
There are no CPU instructions specifically for controlling the iGPU. It works like an independent core that sits on the same die and shares access to the same memory.

Well, you should read better. I didn't say CPU instruction sets, I said instruction sets. I said this specifically because I was addressing a complaint that Intel CPUs were hard to code for because the instruction set was a mess but AMD's was not, while the person made a bunch of exaggerated claims. You should really understand a conversation and its context before opening your mouth and shoving your foot in it.
The security processor is actually an ARM-licensed core. It's treated like a black box. Even the OS doesn't get to run custom code on it.
Ahh! Why are people so dumb! You can specify in programming what encryption is used, and different CPUs have different availability of encryption sets based on if a security processor is present and what instruction sets it uses. And this is different across market segments!
No, they don't support only certain instructions on certain cores. They're trying very hard to make all cores look the same to software, so that the operating system can freely schedule any thread on any core.

Yes, the P-cores of Gen 12+ client CPUs have/had hardware for doing AVX-512, but that's been fused off. So, it really doesn't count for anything.
*headdesk* Why? Why would you say something this dumb and untrue! The different cores have different AVX instruction sets. That's just a fact. Worse yet, while there are some P cores with the AVX-512 physically fused off, it's not the case on the vast majority of them and certainly wasn't the case for the initial batch! For almost all P core implementations in client models the AVX-512 instructions are disabled in internal microcode.

It indeed unifies AVX-512. However, it does add the wrinkle of whether support is at 256-bit or 512-bit width. In practice, this probably means not much AVX10/512 code will be written, because it will only work on Intel's server CPUs, just like the situation we're currently in with AVX-512 on Intel CPUs.
No, it won't simplify anything for AMD, because they opted to support AVX-512 on all Zen 4 cores.
Not really. All Zen-era CPUs support AVX and AVX2. From Zen 4 onward, they all support AVX-512.
... AHHHH! These are both just nonsense! Not least of which is because you can't go back and change instruction sets on already released CPUs!

AVX has at least 3 different instruction sets for different features! AVX2 has something like a dozen different instruction sets that offer different features. AVX-512 is even more fractured! AVX on Zen 1 is a different instruction set than AVX on Zen 2. AVX2 on Zen 1 is a completely different instruction set from AVX2 on Zen2. And so on and so on. It's the same for the Intel side because the AVX series of instruction sets doesn't do optional features! Any additional changes create a whole new instruction set like SSE version numbers were different instruction sets. And Zen4 supports a very specific set of AVX-512 instructions but not others!

The big change with the AVX series was that, because of the nature of what was going on, they didn't just version-number it out, because they didn't want to do a USB3 gen1/2 thing. So even though it's called AVX, AVX2, and AVX-512, there are dozens of different instruction sets involved. And what instruction sets each generation actually supports changes as well. And if you want to do the most optimization, you have to set up optimizations per CPU, because not all AVX2 supporting CPUs run all of the AVX2 instructions! This is even more true with AVX-512, and yes, much worse on Intel's side because they have had AVX-512 longer.

And because of this, developers have just tended to stick with AVX series instructions supported across the whole spectrum, with some going so far as to do different compiles for different CPU generations, but it's not really that common because it's still a lot of time and effort. Most commonly, even AVX2 optimizations are forgone because no one wants to deal with having to make different compiles for Zen2, Zen3, and Zen4 CPUs and different compiles for all of Intel's CPUs as well.

What AVX10 does is unify the whole AVX, AVX2, and AVX-512 instruction set and allow hardware-level microcode and compilers to obfuscate these differences between AVX versions, AVX2 versions, and AVX-512 versions for the developers, so that they don't have to do as much work to write AVX series code. The first AVX10 specification is aimed at getting the individual AVX types to unify going forward. The second specification is supposed to provide even more obfuscation by unifying AVX types at the microcode level and compiler level, so that a dev doesn't have to care at all about what AVX series is available on the processor because the CPU will sort it out for them by making sure that AVX-512 instructions go only to the 512 units and AVX2 instructions go to AVX2 compatible units. The compilers and the CPUs will also have fall back instructions built in if the code calls for a feature that's not available. And in theory this is all supposed to be transparent to both the Dev and the End User.

So yes, AMD does have something to gain in theory by moving to AVX10 going forward. It simplifies the coding for their developers before they get to the point Intel is at, with over a dozen different AVX2 instruction sets, because Intel has been putting AVX2 on chips since 2014, quite a few of which are still in service! AMD would be well advised to head off this problem before they have to deal with Zen 6 and Zen 7 based CPUs being in operation alongside Zen 4 CPUs, and developers having to make 4 different versions of their AVX-512 code compiles or just defaulting to only the instructions in Zen 4's AVX-512 specification.

Intel needs AVX10 to alleviate its problems with having to build big cores and then not being able to use a huge amount of the transistors, because its cores natively come with different instruction sets, so it has to drop to only the instructions supported across the entire core stack. But going forward, AVX10 will also simplify other aspects of AVX series instructions that have been hamstringing Dev adoption for years, which both companies can benefit from.
 
If you take another look, you'll see that Lunar Lake gets all the features of Arrow Lake (mobile), Arrow Lake S, and some additional features too. That's because it's new enough to inherit everything, by virtue of having yet newer-generation cores.
Arrow and Lunar use the same cores, Lion Cove and Skymont. The tile approach and low-level details will be different, which is why Lunar is aimed at ultra low power.

There are some rumors that some Arrowlake might come as a "Meteorlake on N3".
 
Well you should read better. I didn't say CPU instruction sets, I said instruction sets. I said this specifically because I was addressing a complaint that Intel CPUs were hard to code for because the instruction set was a mess
This is an article about x86, though. Trying to drag the subject of iGPUs into the discussion is not only irrelevant but also wrong.

It's wrong because Intel changed their iGPU instruction set between Gen 9 and Gen 11, and then again with Gen 12. It's also irrelevant, because games don't generally contain native GPU code, but instead contain Direct3D HLSL shader code that gets compiled at runtime.

You should really understand a conversation and its context before opening your mouth and shoving your foot in it.
You saying that to me = 🤣

Ahh! Why are people so dumb! You can specify in programming what encryption is used, and different CPUs have different availability of encryption sets based on if a security processor is present and what instruction sets it uses. And this is different across market segments!
This is an article about x86 ISA differences. The security processor has nothing to do with that. What's "dumb" is to muddle a straightforward discussion of x86 ISA with discussion of all the other hardware engines a CPU might or might not have, because they don't interact with programs in the same (or even a meaningful) way.

What @NinoPino seemed concerned with was application developers having to support their programs on different CPUs, and the issues posed by Intel's x86 ISA shenanigans. These userspace programs don't have any direct interactions with the security processor - that's all handled by the OS, drivers, and BIOS. Therefore, it's not relevant to the discussion.

If you think someone else is dumb for not agreeing with you, maybe the problem is that you're just missing something they're not.

*headdesk* Why? Why would you say something this dumb and untrue! The different cores have different AVX instruction sets. That's just a fact.
The key word is "support". They implemented, but don't support AVX-512 on P-cores in Gen 12+ CPUs. They also said they didn't validate it, which means there could be bugs affecting those who have early CPU revs and motherboards which let them enable it.

BTW, I think you need to learn the distinction between "dumb" and "ignorant". See also: arrogant.

AVX on Zen 1 is a different instruction set than AVX on Zen 2. AVX2 on Zen 1 is a completely different instruction set from AVX2 on Zen2. And so on and so on. It's the same for the Intel side because the AVX series of instruction sets doesn't do optional features! Any additional changes create a whole new instruction set like SSE version numbers were different instruction sets.
You're confused. Note that I say "confused" and not "dumb".

AVX introduced a set of instructions which are present on every CPU advertising the feature in its CPUID flags. Likewise, AVX2 did the same. If you take a piece of software which uses AVX or AVX2, then it'll run on any CPU which advertises AVX or AVX2 support, respectively. It's really that simple.
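
To make that concrete, here is a minimal Python sketch of flag-based dispatch. It assumes Linux-style flag names (`avx`, `avx2`) as they appear on the `flags` line of `/proc/cpuinfo`. The helper names `parse_flags` and `pick_code_path` and the sample text are made up for illustration.

```python
def parse_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_code_path(flags):
    """Choose the widest code path the CPU advertises (hypothetical path names)."""
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    return "scalar"


# Made-up sample; on Linux the real text would come from open("/proc/cpuinfo").read().
SAMPLE = "model name\t: Example CPU\nflags\t\t: fpu sse sse2 avx avx2\n"
print(pick_code_path(parse_flags(SAMPLE)))
```

The check is a single yes/no per feature: once the flag is present, the whole AVX or AVX2 instruction set can be used.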

And Zen4 supports a very specific set of AVX-512 instructions but not others!
Zen4 implemented all AVX-512 subsets supported by Ice Lake, and then added Cooper Lake's BF16 instructions. Compared with Gen 12, the main things it's lacking are VP2INTERSECT and FP16. The former is a single instruction that you can easily emulate with a handful of others, while the latter is of limited value, given they already have BF16. Intel's Gen 12 P-core was the first to support either.


In other words, no. Zen4 isn't terribly lacking in the completeness of its AVX-512 support.

not all AVX2 supporting CPUs run all of the AVX2 instructions!
Since you're repeating this claim, I'm sure you'll be able to provide specific examples, should you decide to double-down on it.

This is even more true with AVX-512, and yes much worse on Intel's side because they have had AVX-512 longer.
What makes AVX-512 different is that they created distinct CPUID flags for each of the subsets. So, while we can talk about AVX-512 like it's one thing, a program has to actually check what subsets the CPU supports, in order to ensure binary compatibility.
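
A rough sketch of what that per-subset check looks like, in Python for brevity. The flag names follow the Linux `/proc/cpuinfo` spelling (`avx512f`, `avx512bw`, `avx512vl`, `avx512cd`). The two CPUs, the kernel requirements, and the function names are hypothetical and only there to illustrate the idea.

```python
def missing_subsets(cpu_flags, required):
    """Return the required AVX-512 subsets that the CPU does not advertise."""
    return sorted(set(required) - set(cpu_flags))


def can_run(cpu_flags, required):
    """A binary is only safe to run if every subset it uses is present."""
    return not missing_subsets(cpu_flags, required)


# Hypothetical kernel that uses instructions from three subsets.
KERNEL_NEEDS = {"avx512f", "avx512bw", "avx512vl"}

# Made-up flag sets for two imaginary CPUs.
cpu_a = {"avx2", "avx512f", "avx512cd", "avx512bw", "avx512vl"}
cpu_b = {"avx2", "avx512f", "avx512cd"}

print(can_run(cpu_a, KERNEL_NEEDS))
print(missing_subsets(cpu_b, KERNEL_NEEDS))
```

Both imaginary CPUs "have AVX-512" in the loose sense, but only the first can run this particular kernel.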

because of this, Developers have just tended to stick with AVX series instructions supported across the whole spectrum with some going so far as to do different compiles for different CPUs generations,
Until Ice Lake and Gen 11, there really wasn't much benefit to supporting any AVX-512 in non-server/workstation programs, because virtually none of the client CPUs had it.

The other big problem facing early adopters of AVX-512 is how inefficient it ran on 14 nm CPUs. This manifested as clock-throttling, in server CPUs, and very high power-usage in client CPUs. With the latest generation of CPUs to implement it, these downsides are nearly gone.

What AVX10 does is unify the whole AVX, AVX2, and AVX-512 instruction set
Eh, not more than AVX-512 already did. It could already operate on 128-bit and 256-bit operands, as well as 512-bit ones, so AVX10 really isn't changing anything, there. Nor is it deprecating AVX or AVX2. Essentially, all it's doing is saying:
  • prior AVX-512 subsets are now subsumed into a single feature.
  • 512-bit operand support is now optional.

Beyond that, there are a couple, minor changes it makes in flag-handling and (IIRC) load/store instructions vs. AVX-512.
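
As a sketch of how a check might look under that model (Python, with hypothetical names throughout): instead of many subset flags, software would look at one version number plus a maximum vector width. The `Avx10Support` type, the two example CPUs, and the assumption that a higher version includes everything in lower versions are mine for illustration, not taken from any real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Avx10Support:
    version: int          # 0 means AVX10 is not advertised at all
    max_vector_bits: int  # widest supported vector length, e.g. 256 or 512


def can_run_avx10(cpu, needed_version, needed_bits):
    """True if the CPU advertises at least the needed version and width.

    Assumes versions are cumulative (a newer version includes older ones).
    """
    return cpu.version >= needed_version and cpu.max_vector_bits >= needed_bits


# Imaginary CPUs: a "converged" 256-bit part and a 512-bit P-core-only part.
converged_256 = Avx10Support(version=1, max_vector_bits=256)
p_core_512 = Avx10Support(version=1, max_vector_bits=512)

print(can_run_avx10(converged_256, 1, 256))
print(can_run_avx10(converged_256, 1, 512))
print(can_run_avx10(p_core_512, 1, 512))
```

Code built for 512-bit operands still fails the check on the 256-bit part, so the width question doesn't go away.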

The first AVX10 specification is aimed at getting the individual AVX types to unify going forward
AVX10.1 is virtually just a CPUID-level change. Really minor stuff.

a dev doesn't have to care at all about what AVX series is available on the processor because the CPU will sort it out for them by making sure that AVX-512 instructions go only to the 512 units and AVX2 instructions go to AVX2 compatible units.
Intel has stated that hybrid client CPUs will not support different widths of AVX10 on different cores. AVX10/512 will be limited to P-core only CPUs.

Here's the quote:

A “converged” version of Intel AVX10 with maximum vector lengths of 256 bits and 32-bit opmask registers will be supported across all Intel processors, while 512-bit vector registers and 64-bit opmasks will continue to be supported on some P-core processors.

Source: https://cdrdv2-public.intel.com/784267/355989-intel-avx10-spec.pdf

BTW, it's not possible for the CPU to steer instructions to one core or another - that would need to be handled by the OS. All the CPU can do is raise an invalid-opcode (#UD) exception when an unsupported instruction is executed on a given core. And any scheme you come up with for having the OS steer threads to different cores, based on using unsupported instructions, has pitfalls and downsides that make it a problematic solution, at best.

The compilers and the CPUs will also have fall back instructions built in if the code calls for a feature that's not available. And in theory this is all supposed to be transparent to both the Dev and the End User.
You can't have a fallback for using 512-bit operands, if the CPU doesn't implement 512-bit registers.
 
Intel has stated that hybrid client CPUs will not support different widths of AVX10 on different cores. AVX10/512 will be limited to P-core only CPUs.
"A “converged” version of Intel AVX10 with maximum vector lengths of 256 bits and 32-bit opmask registers will be supported across all Intel processors, while 512-bit vector registers and 64-bit opmasks will continue to be supported on some P-core processors."
AVX10 expands what can use the instructions from AVX512 and I think that's probably the biggest advantage and we'll probably see a lot of 256-bit compiling.