News Nvidia and AMD to Develop Arm CPUs for Client PCs: Report

Status
Not open for further replies.

bit_user

Titan
Ambassador
I didn't really understand what this article was saying, until I followed the source link and read the original blog post:

I've seen rumors of this and benchmarks of Rosetta 2 are generally much improved on the M2, but I've never been able to confirm if Apple made changes specifically for Rosetta, or if it's just because the M2 has a larger L2 that is able to cache more translated instructions.
HUH? I'm pretty sure Rosetta2 works by doing JIT statically. Maybe they do some profile-guided optimization, too, but the impact of L2 size on Rosetta2 should be comparable to however much it helps native apps.
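For what it's worth, the "bigger cache holds more translated code" hypothesis can be illustrated with a toy model. This is not how Rosetta 2 is actually implemented; the cache here is just a plain LRU dictionary and all the sizes are made up:

```python
from collections import OrderedDict

class TranslationCache:
    """Toy LRU cache for translated code blocks (purely illustrative,
    not Rosetta 2's real mechanism)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, block_addr):
        if block_addr in self.cache:
            self.cache.move_to_end(block_addr)  # mark as recently used
            self.hits += 1
            return True
        self.misses += 1                        # would trigger (re)translation
        self.cache[block_addr] = "translated"
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict least recently used
        return False

def hit_rate(capacity, trace):
    c = TranslationCache(capacity)
    for addr in trace:
        c.lookup(addr)
    return c.hits / len(trace)

# A hot loop touching 12 blocks: a cache of 8 thrashes, a cache of 16 doesn't.
trace = list(range(12)) * 10
print(hit_rate(8, trace), hit_rate(16, trace))  # 0.0 vs 0.9
```

The point of the sketch is only that a capacity bump around the working-set size can look like a big "translation speedup" even though nothing about the translator changed.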
 
  • Like
Reactions: JamesJones44

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,325
847
20,060
...when Microsoft was forcing you to choose one or the other because the tooling didn't exist to easily support two architectures. These days the tooling exists to support both ARM and x86 simultaneously. It helps with competition, which is sorely needed in the desktop/laptop/server spaces.
RISC-V needs to take over ARM's entire position in the entire Computer industry, let's BANKRUPT ARM and force their IP to become public and available for everybody.
 

bit_user

Titan
Ambassador
Windows being available for smartphones did not change anything for competition and it won't in the future either...
Microsoft bungled Windows Mobile/Phone so badly, you almost can't believe it. However, the fundamental problem they couldn't get past is that Android is free. There simply wasn't enough demand for a Windows OS on people's phones to justify the added $50 or $100 cost that would've made it a profitable endeavor for Microsoft. Ballmer completely misjudged this, assuming people would demand a Windows phone the same way they demand an iPhone (i.e. so much that they're willing to pay more for it).

ARM has a place in the market and always will have but it's not a competitor to x86 except for the very very low end.
It's already competing very successfully in the cloud.
 
Last edited:
  • Like
Reactions: JamesJones44

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,325
847
20,060
Cloud gaming will be a thing,
LOL, no.

Laws of physics: latency and reliability. And they're not willing to deploy high-end data centers everywhere so that you experience low lag.


LOL, they can keep on trying to push "Cloud Gaming", but companies are just burning massive amounts of money.

Plenty of those current cloud gaming companies will go defunct sometime in the future when they run out of $$$.


Why so much hostility?
There can be only 1x major RISC player in the RISC world.

May it be RISC-V.
 
Last edited:

JamesJones44

Reputable
Jan 22, 2021
789
723
5,760
HUH? I'm pretty sure Rosetta2 works by doing JIT statically. Maybe they do some profile-guided-optimization, too, but the impact of L2 size on Rosetta2 should be comparable to however much it helps native apps.
Yes, that's what I meant: the larger L2 might just make it look as though Rosetta 2 got faster, when really it's just the general ability to cache more.
 
  • Like
Reactions: bit_user

bit_user

Titan
Ambassador
There can be only 1x major RISC player in the RISC world.

May it be RISC-V.
I'm leery of ARM, after their attempt to extort Qualcomm and I like an open and royalty-free ISA for the sake of open and fair competition. However, I'm more interested in getting past x86 than I am in avoiding ARM.

Also, I think RISC-V is a little over-hyped. Due to being royalty-free, it can't be the most innovative ISA, because you can bet there are patents on a lot of the newer & more cutting-edge ideas. It's still good to have a sort of lowest-common-denominator, however.

Not only for the patent-related reasons, but I think more generally RISC-V just isn't the kind of revolution in thinking that we're going to need for the next big leap in computing performance, security, reliability, etc.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,325
847
20,060
I'm leery of ARM, after their attempt to extort Qualcomm and I like an open and royalty-free ISA for the sake of open and fair competition. However, I'm more interested in getting past x86 than I am in avoiding ARM.
EVERYBODY who is an ARM licensee should be VERY LEERY of what ARM is trying to do.

I'm not hating on x86 like some folks here. I like x86; there's a certain beauty to its architecture.

Also, I think RISC-V is a little over-hyped. Due to being royalty-free, it can't be the most innovative ISA, because you can bet there are patents on a lot of the newer & more cutting-edge ideas. It's still good to have a sort of lowest-common-denominator, however.
Why do you think I want to liberate all of ARM's patents for the RISC-V world to use?

Not only for the patent-related reasons, but I think more generally RISC-V just isn't the kind of revolution in thinking that we're going to need for the next big leap in computing performance, security, reliability, etc.
I think it is: it's going to be the "Open Source" revolution, but coming to hardware.

For decades, people were poo-poo-ing "Open Source".

Guess what: the vast majority of the server market got taken over by Linux & open source.
 

setx

Distinguished
Dec 10, 2014
244
199
18,760
Yes, I will believe Intel's claims, because these seem very credible:
  • "APX-compiled code contains 10% fewer loads and more than 20% fewer stores"
  • there are 10% fewer instructions in APX-compiled code
I also believe those claims. But they tell us pretty much nothing about performance. And not even about code density.

The thing you're missing about move-elimination is that mov instructions still have costs, which correlate to that 10% figure:
  • Wasting memory bandwidth, since they have to be fetched from DRAM.
  • Wasting space in the instruction cache.
  • Wasting instruction decoder bandwidth.
And why do you assume that I've missed that? Let me quote: "Wow, such arrogance!" Re-read, please: "almost free mov". Again: the only real problem is instruction decoding. APX doesn't look very dense with that new prefix, so "10% fewer instructions" can as well be the same space.
That's tested easily enough.
For Intel, not for us.
You can do a simple experiment by restricting a compiler from using certain features on an existing CPU which has them, but you can also model your hypothetical CPU in a simulator and update the compiler to match.
Go ahead, please model that unreleased CPU that we know pretty much nothing about.
For the kind of enhancements that Intel is adding via APX, all you have to do is flip a switch and the compiler automatically utilizes them.
It's funny how exactly the same was said for AVX-512 when it was released. How it made compiler auto-vectorization so much easier and everyone would benefit soon.
So, if you like that, you should welcome our new ARM overlords!
I also wish them swift death.
Again, you're assuming (incorrectly, I might add) that AMD and Nvidia didn't have architectural licenses, before ARM started trying to extort Qualcomm!
You just love to think that your opponent doesn't know something... "Wow, such arrogance!"
Qualcomm didn't have a license? Maybe Nuvia didn't have it? ARM is set to extract as much profit as they can.
 

bit_user

Titan
Ambassador
I think it is, it's going to be the "Open Source" revolution, but coming to hardware.
Don't hold your breath. The economics of chip development don't favor open source.

RISC-V, as I'm sure you know, is open - but not open source. Indeed, that doesn't even make sense - open source applies to a specific implementation, not an entire ISA. And implementing the ISA does not require you to open source your chip's design.
 

bit_user

Titan
Ambassador
Let me quote: "Wow, such arrogance!"
That was in reference to you contradicting Intel. However, I am not Intel and neither are you.

Anyway, I think you're just in damage-control mode, upon seeing you were overly-dismissive of APX. The thing is, nobody cares about your face-saving. You're doing more damage to your reputation than if you'd simply drop the point.

Re-read, please: "almost free mov". Again: the only real problem is instruction decoding. APX doesn't look very dense with that new prefix, so "10% fewer instructions" can as well be the same space.
They actually said the mov reduction resulting from the third operand enabled them to recover the additional space needed for the extra prefix. It's right there, in the short article I linked.

"While the new prefixes increase average instruction length, there are 10% fewer instructions in APX-compiled code, resulting in similar code density as before."

It's funny how exactly the same was said for AVX-512 when it was released. How it made compiler auto-vectorization so much easier and everyone would benefit soon.
Anyone who knows anything about compilers wouldn't put these two things on the same level. Additional registers and 3-operand instructions take zero work for compilers to support. Plenty of CPUs already support the latter (even x86 has it, in some instructions). So, it effectively boils down to just changing two ISA-specific parameters that control what instructions the compiler emits.
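The mov savings from 3-operand forms are easy to sketch. Here is a toy "code generator" for `dst = src1 + src2` when `src1` must stay live; the register names and pseudo-assembly syntax are made up for illustration, not real compiler output:

```python
def emit_add(dst, src1, src2, three_operand):
    """Emit pseudo-assembly for dst = src1 + src2, keeping src1 live.
    Illustrative only: invented register names and syntax."""
    if three_operand:
        # Non-destructive form: source registers are untouched.
        return [f"add {dst}, {src1}, {src2}"]
    insns = []
    if dst != src1:
        insns.append(f"mov {dst}, {src1}")  # copy first, since add clobbers dst
    insns.append(f"add {dst}, {src2}")
    return insns

two_op = emit_add("r3", "r1", "r2", three_operand=False)
three_op = emit_add("r3", "r1", "r2", three_operand=True)
print(two_op)    # ['mov r3, r1', 'add r3, r2']
print(three_op)  # ['add r3, r1, r2']
```

The extra mov in the 2-operand case is exactly the kind of instruction the compiler no longer needs to emit once a destructive-destination constraint goes away, which is why supporting it is essentially a parameter change rather than new compiler technology.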

As for how well compilers utilize conditional instructions, I can't really say - though, they're also nothing new. However, Intel attached no specific metric to that part of the extensions. I'm guessing there are probably cost model parameters you can adjust to help it decide whether to use the former or to insert a branch.

You just love to think that your opponent doesn't know something...
No, because I don't enjoy having to correct wrong statements. I would much prefer to agree with people than correct them. You'll see I'm not shy about clicking the "Like" button, when I do.

Qualcomm didn't have a license? Maybe Nuvia didn't have it?
Sigh. You should read up on the litigation, if you want to get into all of that. Essentially, ARM is saying that Nuvia's architectural license got voided (due to terms in the license itself) when Qualcomm acquired them. It's not unusual to have non-transferability clauses in such contracts. That part is undisputed, as far as I understand. However, what Qualcomm claims is that they can use their own architectural license for Nuvia's designs, yet ARM disagrees. Hence, the lawsuit.

ARM is set to extract as much profit as they can.
Yes, prior to the IPO, I think ARM was desperately searching for ways it could boost its revenue projections, since its valuation is partially derived from that.
 

setx

Distinguished
Dec 10, 2014
244
199
18,760
That was in reference to you contradicting Intel.
Blind arrogance continues... Ok, where is the quote of me "contradicting Intel"?
Anyway, I think you're just in damage-control mode, upon seeing you were overly-dismissive of APX. The thing is, nobody cares about your face-saving. You're doing more damage to your reputation than if you'd simply drop the point.
You are absolutely the best at characterizing yourself. I don't even need to add anything.
"While the new prefixes increase average instruction length, there are 10% fewer instructions in APX-compiled code, resulting in similar code density as before."
Basically, you are saying I was completely correct about code density.
 

bit_user

Titan
Ambassador
Blind arrogance continues...
Oh, how uncharacteristically self-aware of you.

Ok, where is the quote of me "contradicting Intel"?

Pretty much the only definite advantage of ARM ISA is their simpler instruction decoding logic and APX, sadly, has nothing to do with that. Sure, more registers is nice but how much that affects performance in modern superscalars is hard to tell. 2op vs 3op arithmetic issue is already kind of solved with almost free mov.
  • Most of APX's features are equivalent to features in AArch64. To say ARM's only definite advantage is in decoding, which APX doesn't address, is to dismiss virtually everything in APX. So, that statement is a contradiction of APX's value proposition.
  • Intel states the value of more GPRs, which is to reduce loads by 10% and stores by 20%.
  • mov is not almost free, as Intel claims they reduced instruction bandwidth by enough to nullify the effect of the REX2 prefix. If you believe instruction or decoder bandwidth is almost free, you are wrong.

You are absolutely the best at characterizing yourself.
It's just one wrong statement after another, from you.

Basically, you are saying I was completely correct about code density.
The only thing you said about code density was:

they tell us pretty much nothing about performance. And not even about code density.

Which was not correct of Intel's full claims, because they indeed said that code density was unchanged.
 

setx

Distinguished
Dec 10, 2014
244
199
18,760
Oh, how uncharacteristically self-aware of you.
Oh, you've started to try, looks fun.

Now let's teach you to read other people's posts:
Most of APX's features are equivalent to features in AArch64. To say ARM's only definite advantage is in decoding, which APX doesn't address, is to dismiss virtually everything in APX. So, that statement is a contradiction of APX's value proposition.
That contradiction is only in your head. I've never dismissed value of APX, only questioned the size of it.

Intel states the value of more GPRs, which is to reduce loads by 10% and stores by 20%.
Unlike your statements, Intel's ones are smart: it tells us nothing about resulting performance changes. Clever techniques like mirroring memory operands can achieve similar results, but I have no idea whether they cost more or less than directly visible registers.

mov is not almost free, as Intel claims they reduced instruction bandwidth by enough to nullify the effect of the REX2 prefix. If you believe instruction or decoder bandwidth is almost free, you are wrong.
You don't even understand what you are trying to argue... Intel does not claim to reduce decoder bandwidth, re-read your own quote:
"While the new prefixes increase average instruction length, there are 10% fewer instructions in APX-compiled code, resulting in similar code density as before."
code density is similar, so decoder/cache/... bandwidth is also similar. They are actually trying to reduce needed decoder's width, the sore problem that I've mentioned first of all.
Which was not correct of Intel's full claims, because they indeed said that code density was unchanged.
Really desperate? That was a direct reply to your quote, not to some "Intel's full claims". And later I even added that "'10% fewer instructions' can as well be the same space" (= "the same code density"). So, you have to try harder to find contradictions.
 

bit_user

Titan
Ambassador
Unlike your statements, Intel's ones are smart: it tells us nothing about resulting performance changes. Clever techniques like mirroring memory operands can achieve similar results,
Again, you're focusing only on the back-end impact. Even if we assume that Zen 2-style memory-renaming implementation makes them free, that doesn't negate the fact that those extra loads & stores waste instruction bandwidth. Given what a bottleneck the decoders can be in x86 CPUs, instruction bandwidth is a real issue. Not to mention the benefits of reducing instruction bandwidth on the cache & memory subsystems.

Furthermore, memory-renaming isn't free to implement. It burns die space & power and adds complexity, which can be a source of bugs. There must be some reason they left it out of Zen 3. It's better just to have more ISA registers (like ARM and RISC-V), so you don't need to do it at all.

You don't even understand what you are trying to argue... Intel does not claim to reduce decoder bandwidth, re-read your own quote:

code density is similar, so decoder/cache/... bandwidth is also similar.
If we take the overhead of the REX2 as given, then mov-elimination does reduce instruction bandwidth relative to not doing it. Furthermore, instruction bandwidth isn't only measured in terms of bytes. The decoders in these CPUs are parallelized, with each port being able to handle a certain number (and type) of instructions per unit of time. So, getting rid of those register-to-register movs means they can spend that time decoding other instructions, instead.
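A toy model of the decode-slot point, assuming a hypothetical 4-wide decoder and a made-up instruction mix (the 15% mov figure is purely illustrative):

```python
import math

def decode_cycles(instructions, width=4):
    """Cycles for an idealized width-wide decoder that handles
    'width' instructions per cycle (ignores port restrictions)."""
    return math.ceil(len(instructions) / width)

# Hypothetical stream: 100 instructions, 15 of them register-to-register movs.
stream = ["mov r,r"] * 15 + ["other"] * 85
without_movs = [i for i in stream if i != "mov r,r"]
print(decode_cycles(stream), decode_cycles(without_movs))  # 25 vs 22
```

Even in this crude model, eliminating the movs frees decode slots for the remaining instructions, independent of how many bytes the code occupies in memory.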
 

setx

Distinguished
Dec 10, 2014
244
199
18,760
Again, you're focusing only on the back-end impact. Even if we assume that Zen 2-style memory-renaming implementation makes them free, that doesn't negate the fact that those extra loads & stores waste instruction bandwidth. Given what a bottleneck the decoders can be in x86 CPUs, instruction bandwidth is a real issue. Not to mention the benefits of reducing instruction bandwidth on the cache & memory subsystems.
I'll just quote your Intel's quote: "resulting in similar code density as before". You again and again fail to understand that statement. "Similar code density" = "similar instruction bandwidth".

Furthermore, memory-renaming isn't free to implement.
No one said it's free.

It's better just to have more ISA registers (like ARM and RISC-V), so you don't need to do it at all.
I don't have enough knowledge to judge that for CPUs on the level of modern x86, and you seem to have even less.

Furthermore, instruction bandwidth isn't only measured in terms of bytes.
Surprise: it's measured exactly in bytes per second (or bits). What you are trying to mention is called instruction throughput.

The decoders in these CPUs are parallelized, with each port being able to handle a certain number (and type) of instructions per unit of time. So, getting rid of those register-to-register movs means they can spend that time decoding other instructions, instead.
You've finally arrived at the point that I've mentioned long ago:
They are actually trying to reduce needed decoder's width, the sore problem that I've mentioned first of all.

Honestly, I'm quite tired educating you. Especially when you desperately repeat the same thing and don't re-read previous discussion after learning new bit.

This is my final reply in this thread.
 

bit_user

Titan
Ambassador
I'll just quote your Intel's quote: "resulting in similar code density as before". You again and again fail to understand that statement. "Similar code density" = "similar instruction bandwidth".
As I explained above, there are two ways to look at instruction bandwidth. The first is in terms of raw bytes per defined unit of work. This is what Intel said was unchanged.

The second is to look at the number of instructions per unit of work. Not only did they say movs, loads, and stores were reduced, but since we know each instruction is larger due to the prefix byte, it logically follows that if you hold the byte count the same, the number of instructions must be lower.

Why does instruction count matter? Because:
  1. It's instructions which are processed by the parallel decoder ports.
  2. It's instructions which occupy the pipelines.
  3. A load or store ties up the load/store units and performs energy-intensive cache lookups.

So, eliminating instructions before the decoders is a good thing, even if the memory footprint of the code is unchanged.
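The density arithmetic nets out like this. The averages below are assumptions for illustration, not Intel's measured numbers, but they show how 10% fewer, slightly longer instructions can land at roughly the same byte count while occupying fewer decode slots:

```python
def code_bytes(n_insns, avg_len):
    """Total code footprint in bytes for a given instruction count
    and average instruction length."""
    return n_insns * avg_len

baseline_insns, baseline_len = 1000, 4.0  # assumed averages, not measured data
apx_insns = baseline_insns * 0.90         # Intel's "10% fewer instructions"
apx_len = baseline_len + 0.44             # assumed prefix overhead per instruction

print(code_bytes(baseline_insns, baseline_len))  # 4000.0
print(code_bytes(apx_insns, apx_len))            # ~3996: similar density,
                                                 # but 100 fewer instructions to decode
```

Same memory footprint, fewer instructions through the front end: both statements can be true at once, which is what the Intel quote is saying.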

I don't have enough knowledge to judge that for CPUs on the level of modern x86,
Well, you could increase your knowledge if you'd do things like read the article I linked on APX.

I'm not depending on my own knowledge, but rather what Intel says. However, I do have experience optimizing C code until the point where the compiler starts generating lots of spills. I know, for a fact, that register pressure is a real issue limiting the code compilers can generate.

You've finally arrived at the point that I've mentioned long ago:
What you said was:

Pretty much the only definite advantage of ARM ISA is their simpler instruction decoding logic

...clearly referring to the simpler, fixed-length instruction format of AArch64. Not remotely the same thing as having to decode fewer instructions because 3-operand instructions let you emit fewer register-to-register movs.

Honestly, I'm quite tired educating you.
That's funny, because you have yet to post anything accurate that I didn't already know.

This is my final reply in this thread.
Excellent decision.
 
Last edited: