News AMD to Make Hybrid CPUs, Using AI for Chip Design: CTO Papermaster at ITF World

I'm *NOT* paying for useless little slow cores. Forget it. I do not want them. I do not want to be charged for them. They are completely worthless in every last one of my applications both at home and at the office. Give me my 16 fast cores and ah heck-off with the rest.
That's what people were saying about HTT/SMT more than 20 years ago...
If you have any software that can use 16 full cores, then you have software that can and will use the small cores.
 

abufrejoval

AMD's success in establishing extensions to the x86 architecture is limited to the one big 64-bit splash; all other attempts, including HSA, which would be the most comparable, fizzled. Not because they were all bad, but because they weren't leading the market.

On one hand, proprietary extensions are crucially important to cloud service providers; on the other hand, hitting a spot that is general-purpose enough for the wider audience will be fraught with compromise and the bane of backward compatibility, which the hyperscalers care much less about.

Going the ISA road with x86 is far tougher than with RISC-V, and going with really heterogeneous extensions has its own risks in terms of design errors, security, longevity, etc.

The fragmentation issue will only get bigger, but Intel is trying to shake AMD off by going beyond the ISA and outside the IP-sharing obligation.

I fear a lot of dead silicon ahead, like the AVX-512 blocks on recent Intel chips or 3DNow! on earlier AMD ones.
 

abufrejoval

I'm *NOT* paying for useless little slow cores. Forget it. I do not want them. I do not want to be charged for them. They are completely worthless in every last one of my applications both at home and at the office. Give me my 16 fast cores and ah heck-off with the rest.
Quite literally my thoughts, until I couldn't resist anyway and got myself an Alder Lake NUC-alike based on an i7-12700H, officially a 45 Watt SoC, very cheap: a full mainboard including the SoC for less than the original list price of the SoC alone.

Of course I'm running it mostly on EL8, nothing that even begins to understand E/P scheduling, but I have numactl in case I really care and it's not just all cores full ahead.

The real story is that those pesky little E-cores get quite a bit of work done; these aren't Atom cores any more, of which I have plenty right up to Jasper Lake. And they do it with far fewer Watts per core than the big P-cores.

They are still able to stretch their legs when the equivalent P-cores would be running below the CMOS knee, no longer delivering the best value per Watt. On my Ryzen 7 5800U at 15 Watts with 8 cores busy, the cores get so little juice that they have to clock below 2 GHz and cease to beat an Intel E-core in the same power envelope. And at that point Intel can draw even, or even move ahead, in truly parallel workloads, which obviously shouldn't really run on a notebook.

But in a µ-server using that same notebook chip, it gets compile jobs done significantly faster than its older Tiger Lake cousin, which is basically just 4 P-cores.

The big issue is obviously price: Intel unfortunately isn't giving away these E-cores for free, like they used to with their iGPUs. The surcharge for an i7 with a full set of Es is quite simply too much, especially when it's very hard to believe that the lesser SKUs are actually binned for defective E-cores: P-cores are much more likely to be stillborn.

In my case that was offset by the fact that Intel is on Raptor Lake now and sells Alder Lake to Chinese vendors at surplus prices. That puts the i7-12700H at an i5-12500T price (6P, zero E), so now I truly get the 8 E-cores for free!

And at that point, it's much cheaper than getting another box with an i3-N305 (0P8E).
 

Kamen Rider Blade

If we went by TechPowerUp's assessment of the i9-12900K:
  • P-cores only provided 152% over the E-core only baseline
  • P+E cores provided 222% over the E-core only baseline
4 E-cores take up roughly the same die space as a P-Core, so even if you got rid of the E-cores and put P-cores in their place (which would bring you up to 10 P-Cores), doing some basic math, you'd only get about 190% better performance over the E-core only baseline. And I believe 4 E-cores also take up the same power as a P-core.
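A quick back-of-the-envelope check of that arithmetic (a minimal sketch, using only the quoted 152%/222% figures, the 4-E-cores-per-P-core area claim, and naive linear scaling that real silicon won't follow exactly):

```python
# Rough sanity check of the quoted figures, assuming naive linear scaling
# (which real chips won't follow exactly) and the 4-E-per-P area ratio.
e_only = 100   # 8 E-cores: the baseline
p_only = 152   # 8 P-cores, relative to that baseline
hybrid = 222   # 8P + 8E, relative to that baseline

e_cores_per_p_area = 4                 # area claim: 4 E-cores ~ 1 P-core
extra_p = 8 // e_cores_per_p_area      # swap the 8 E-cores for 2 more P-cores
all_p_estimate = p_only * (8 + extra_p) / 8   # scale 8P linearly to 10P

print(f"10 P-cores, linear estimate: {all_p_estimate:.0f}%")  # ~190%
print(f"8P + 8E, measured:           {hybrid}%")              # 222%
```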

So if you'd rather have less performance overall, be my guest.
Now imagine if Intel split their DeskTop CPU line into pure P-Core & E-Core SKU's and left Hybrid to the Mobile CPU's where it makes the most sense.

Imagine having one CPU with 40x E-cores
Now imagine the other CPU with 12x P-cores.

You'd get CPU's suited for completely different tasks, specialists in their own little realm.

P-cores for gaming, ST work-loads, low # of Thread Counts.

E-cores for Massively MT professional work-loads, simple home server usage, personal render machines.

The big issue is obviously price: Intel unfortunately isn't giving away these E-cores for free, like they used to with their iGPUs. The surcharge for an i7 with a full set of Es is quite simply too much, especially when it's very hard to believe that the lesser SKUs are actually binned for defective E-cores: P-cores are much more likely to be stillborn.

In my case that was offset by the fact that Intel is on Raptor Lake now and sells Alder Lake to Chinese vendors at surplus prices. That puts the i7-12700H at an i5-12500T price (6P, zero E), so now I truly get the 8 E-cores for free!

And at that point, it's much cheaper than getting another box with an i3-N305 (0P8E).
No CPU/Silicon/Transistor should be given away for free.
It costs money to R&D / Manufacture everything.

It's how they segment their technology that is the problem.

Hybrid setups make a LOT of sense for Mobile where you're Battery & Thermal constrained.

On DeskTop, you're not Battery constrained, only Thermal constrained.

I would prefer a "Pure P-Core" or "Pure E-Core" CPU and let me make two different PC's.

Each one specializing in different tasks and focus.
 

Amdlova

These E-cores today have the same IPC as old Xeons had... why so much hate? Intel and AMD are done with big CPUs... AMD melting CPUs with motherboards... Intel melting the wall socket. We need more E-cores, lots of them.
 

bit_user

When it comes to the latest XDNA AI Engine (not specifically generative AI), the Ryzen AI co-processor might boost AI capabilities even on consumer Ryzen CPUs. But one major hurdle/red flag is the cost and overall value proposition of the chip, and using AI in the consumer space makes little sense.
Did you say the same things about Intel's GNA block, in their CPUs? How about the NPUs in Apple's SoCs or Qualcomm's Snapdragons? Your points should apply somewhat equally.

But adding Ryzen AI to a Threadripper chip with high core counts might be a good idea; even then, while it might be used for training purposes, it may not necessarily be used.
It sounds to me like you don't have a very good grasp of what XDNA is or what applications it's intended to address. It is not for training, and it's not even clear to me that it will be interesting for most of the generative AI stuff people are doing.

The major barrier to implementing Ryzen AI is the cost, and that there has to be a good enough, actual reason to put Ryzen AI in budget chips & even desktop SKUs.
Do you think power-efficient background noise & image removal from video calls is useful? How about highly-accurate speech recognition and synthesis? We could also see AI-optimized video compression becoming more commonplace.

I think the main driving force that can overcome the value & cost barrier will be the "software". As software evolves, makes better use of AI, and adds more value, there definitely becomes a good reason to have dedicated AI hardware blocks on your chips. Otherwise not.
Speaking of the cost, do you know how big it is? This slide suggests the cost should be minimal.
[AMD presentation slide]

Note the focus on real-time and efficiency, which helps explain why this is first appearing in a laptop CPU. They go on to give some further details:
[two further AMD presentation slides]
 
How do they get away with claiming that it's the first integrated AI engine on an x86 CPU??
Just because Intel did it on the iGPU?
Speaking of the cost, do you know how big it is? This slide suggests the cost should be minimal.
[AMD presentation slide]
That's not the only cost, and probably the smallest of them: they had to give up a large part of their company to get this tech. It will pay off if more people buy their CPUs for the AI, but if people would have preferred that space to be a few more cores, or just a cheaper chip because they have no use for AI, then it might not turn out that great. You can see how many people don't like the E-cores and think they're a waste of space, and those actually run x86 code.
 

waltc3

People may have missed the fact that Intel still uses a monolithic CPU design whereas AMD uses chiplets, and both Su and Papermaster have emphasized recently that AMD is forging ahead with its chiplet CPU architectures. So whatever AMD does, it will not be imitating Intel's E cores, you can be sure...;) It also seems forgotten that Intel's monolithic-design E-cores are more the product of its internal fab process node limitations than anything else. If Intel had not used E-cores then its top-performing CPUs would be sucking down 500W max, as opposed to the current 300W maximums.
 
If Intel had not used E-cores then its top-performing CPUs would be sucking down 500W max, as opposed to the current 300W maximums.
That is the dumbest nonsense ever....
You can't make a technology withstand more power than it can withstand just because you want to.
The 330-350W barrier is due to all the underlying tech not being able to support any more than that, no matter how many or few cores you use or how much power they need.

The Ryzen has the exact same 330-350W limit; there is just no way to cool a Ryzen enough to reach it without exotic cooling, which is the only reason you think it runs cooler.
7950x = 337W
 
So whatever AMD does, it will not be imitating Intel's E cores, you can be sure...;)
And yet Papermaster said (emphasis adjusted from the original article)
But what you'll also see is more variations of the cores themselves, you'll see high-performance cores mixed with power-efficient cores mixed with acceleration.
That's what an E-core is. It doesn't matter if the processor is using a monolithic design or a chiplet one.

However I doubt AMD's design will separate E-cores and P-cores into their own chiplets.
 

waltc3

That is the dumbest nonsense ever....
You can't make a technology withstand more power than it can withstand just because you want to.
The 330-350W barrier is due to all the underlying tech not being able to support any more than that, no matter how many or few cores you use or how much power they need.

The Ryzen has the exact same 330-350W limit; there is just no way to cool a Ryzen enough to reach it without exotic cooling, which is the only reason you think it runs cooler.
7950x = 337W
You don't have a clue... I'll just leave it at that.
 

waltc3

And yet Papermaster said (emphasis adjusted from the original article)

That's what an E-core is. It doesn't matter if the processor is using a monolithic design or a chiplet one.

However I doubt AMD's design will separate E-cores and P-cores into their own chiplets.
I don't care about that...;) Wait until the product rolls out, that's all I can say. Whatever AMD does, it will not be like Intel does it, I can promise you that, entirely...;)
 
My theory to AMD's approach is Performance core chiplet/s and e core chiplet/s.
Mix and match as wanted.
Low end 1 performance chiplet. 6/8 cores
Next 1 p and 1 small e chiplet. 6/8+4
Next 1 p and 1 large e chiplet. 6/8+8
Next 2 p core and 2 small or large e cores.
Just add more chiplets as needed/wanted till you get to the top SKU.

All the technical stuff just blows over my head :??:

I don't care how they make it, I just want to play with it.:)
 
My theory to AMD's approach is Performance core chiplet/s and e core chiplet/s.
Mix and match as wanted.
Low end 1 performance chiplet. 6/8 cores
Next 1 p and 1 small e chiplet. 6/8+4
Next 1 p and 1 large e chiplet. 6/8+8
Next 2 p core and 2 small or large e cores.
Just add more chiplets as needed/wanted till you get to the top SKU.
While that's how AMD would like us to think this thing works, you can't really just add more chiplets as desired just like that.

The major problem is what the I/O die supports. The consumer I/O dies only have ports for two chiplets. Sure you could add more ports, but then if they go unused, that eats into the value proposition of the I/O die. And you can't really do something like daisy chaining CPU chiplets because that adds latency and other architectural complexity.

They're probably going to make the CPU chiplet contain both the P and E cores. E-cores are also supposed to take up less space, which you can either use to "replace" a P-core without a loss of performance (assuming you have enough E-cores in there) or to fit them into spaces which would've otherwise gone unused or something (though that's me thinking a lot of this is built like playing nanoscale Tetris).
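As a rough illustration of that port constraint (a hypothetical sketch; the port count and the candidate configurations are assumptions, not confirmed AMD specs):

```python
# Hypothetical sketch of the port constraint: assume each CPU chiplet needs
# one Infinity Fabric port on the I/O die, and the consumer I/O die has two.
IF_PORTS = 2  # assumption, not a confirmed AMD figure

# (P-core chiplets, E-core chiplets) combinations someone might want
candidates = [(1, 0), (2, 0), (1, 1), (2, 2)]

for p, e in candidates:
    verdict = "fits" if p + e <= IF_PORTS else "needs a bigger I/O die"
    print(f"{p}x P-chiplet + {e}x E-chiplet: {verdict}")
```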
 
While that's how AMD would like us to think this thing works, you can't really just add more chiplets as desired just like that.

The major problem is what the I/O die supports. The consumer I/O dies only have ports for two chiplets. Sure you could add more ports, but then if they go unused, that eats into the value proposition of the I/O die. And you can't really do something like daisy chaining CPU chiplets because that adds latency and other architectural complexity.

They're probably going to make the CPU chiplet contain both the P and E cores. E-cores are also supposed to take up less space, which you can either use to "replace" a P-core without a loss of performance (assuming you have enough E-cores in there) or to fit them into spaces which would've otherwise gone unused or something (though that's me thinking a lot of this is built like playing nanoscale Tetris).
Not quite, they can do exactly what they want because of the IF. The only downside will be the latency penalty of going via the IF for cross-cache stuff and heavy context switches between the CCDs, but having the IF is precisely why they can mix and match so much stuff around the package.

Also, AMD's take on an "E" core is not quite the same as Intel from what I've seen. They'll just re-use the cores from the APUs (less cache, different efficiency curve) as "E" cores and leave regular beefed up Zen cores as what you'd call "P". How much space they would save, I have no clue, but a good estimation would be 1/3 less space per core? So it's realistic to think 8 big Zen cores in a CCD would be around 12 little Zen cores in the other? I doubt they want to trim too much out of whatever they come up with, since otherwise they'd have the same problems as Intel with the E-cores missing too much.

The rest about accelerators... That's a trickier one, as they could perfectly live inside the I/O Die, but that thing would become humongous rather soon, so it may as well live in the CCD or in a new chiplet neighbor? They most definitely have interesting options thanks to the IF and, at the same time, have a tricky balancing act to perform.

Regards.
 

abufrejoval

No CPU/Silicon/Transistor should be given away for free.
It costs money to R&D / Manufacture everything.
In an ideal market, that would probably be true. But whatever the reason, Intel has given away transistors for free.

And so much of it seems to have been for anti-competitive reasons that they pretty much permanently have to prove they aren't artificially culling chips just to fit every niche where a competitor might have a product. That came from a couple of comments on Anandtech some time ago.

It's the bigger-iGPU Skylake models which got me puzzled first: an Iris Plus iGPU with 48 EUs was about half the die area of the then dual-core mobile chips. And then they required a 64 or 128MB eDRAM chip on the die carrier to provide the extra bandwidth those extra EUs needed.

Yet Intel sold these vastly bigger and more complex SoCs at nearly the same list price as the normal chips... which would be a dramatically better deal in terms of transistors for the buck!

The only problem was that you either had to be Apple to buy them or you'd have to land a clearout deal on the left-overs.

The only other way to get them (or similar gen-8 SoCs) was to buy a NUC: Intel would put these special chips into NUCs even before they reached sunset.

If you compare die area/$, Intel has always charged much less for GPU than for CPU real-estate. IMHO the main reason was to kick out ATI and Nvidia way back then when those still sold chipsets often with integrated graphics: Intel wanted all that money for themselves and the only way to get there was to charge for the CPU and give iGPU and Northbridge away for free.

That's why I enjoyed the AMD Ryzen revenge so much: AMD took the iGPU real estate to double the cores and kicked Intel where they had hurt them as ATI.

Now Intel can fit about 4 E-cores into the real estate of a single P-core. Defect logic dictates that the likelihood of an E-core being hit by a defect should be far lower than that of a P-core. Yet binning culls E-cores by 8, 4 or none, while P-cores rarely seem affected by more than a speed reduction.

That's pure market segmentation, nothing technical that I can believe, and exactly what Intel is prohibited from doing: artificially culling chips to flood all market segments. Those deactivated E-cores on those i7s (4 instead of 8) and lesser i5s (0 out of 8) have very little chance of actually being defective (or of pushing the chip beyond its TDP limits).
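A toy yield model illustrates why the defect argument points at segmentation rather than salvage (a sketch with assumed numbers for defect density and core area, not Intel data):

```python
import math

# Toy Poisson yield model: P(core has >= 1 defect) = 1 - exp(-D * A).
# All numbers below are assumptions for illustration, not Intel data.
defect_density = 0.1            # defects per cm^2
p_core_area = 7.0 / 100         # cm^2 (~7 mm^2 assumed for a P-core)
e_core_area = p_core_area / 4   # the "4 E-cores per P-core" area claim

p_dead_p_core = 1 - math.exp(-defect_density * p_core_area)
p_dead_e_core = 1 - math.exp(-defect_density * e_core_area)

print(f"chance a given P-core is hit: {p_dead_p_core:.3%}")  # ~0.7%
print(f"chance a given E-core is hit: {p_dead_e_core:.3%}")  # ~0.17%, roughly 4x lower
```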

And if Intel charged per die area I would swap the deactivated iGPU from the "F" parts against 16 or so E-cores instead.

But they aren't charging according to the principles you quoted, so there's no chance I'd get a chip like that for what it costs them to manufacture.
It's how they segment their technology that is the problem.

Hybrid setups make a LOT of sense for Mobile where you're Battery & Thermal constrained.

On DeskTop, you're not Battery constrained, only Thermal constrained.

I would prefer a "Pure P-Core" or "Pure E-Core" CPU and let me make two different PC's.

Each one specializing in different tasks and focus.
Well, that's basically how I might use Alder Lake. But I don't run things on bare-metal boxes any more; I use VMs or containers. And for those I could partition my cores into less latency-sensitive and more interactive workloads, effectively creating the two types of machines you mention as VMs or cgroups, with quite a bit more flexibility and reconfigurability in case workloads change.

With numactl you can already very easily switch your hybrid system into only-P or only-E parts or control the mix. Cool for experimentation, but the practical advantages would require cloud scale or severe energy constraints to pay off the management effort, I'm afraid.
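As an illustration, a minimal sketch of that kind of partitioning from Python, assuming a hypothetical core numbering; numactl or taskset from the shell achieves the same effect:

```python
import os
import subprocess

# Assumed logical CPU layout for an i7-12700H-style part: 0-11 are the six
# P-cores with SMT, 12-19 are the eight E-cores. Check `lscpu -e` or
# /proc/cpuinfo on your own machine; the numbering varies by SKU and BIOS.
P_CORES = set(range(0, 12))
E_CORES = set(range(12, 20))

# Pin this process (and any children it spawns) to the E-cores only --
# roughly what `numactl --physcpubind=12-19 <cmd>` does from the shell.
os.sched_setaffinity(0, E_CORES)
subprocess.run(["make", "-j8"])      # batch job, happy on the E-cores

# Switch back to the P-cores for latency-sensitive, interactive work.
os.sched_setaffinity(0, P_CORES)
```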
 
Not quite, they can do exactly what they want because of the IF. The only downside will be the latency penalty of going via the IF for cross-cache stuff and heavy context switches between the CCDs, but having the IF is precisely why they can mix and match so much stuff around the package.
They can only add as many chiplets as there are IF ports on the I/O die. So to a certain extent, yes, they can mix and match however they want. But the thing is, how many ports there are limits how many configurations they can do. But if they wanted to be able to do something like 2 P-core chiplets and 2 E-core chiplets, then they'd need four IF ports on the I/O die to do so. And an IF port is an IF port, so if they don't populate enough ports with something, adding those extra ports hurts the value of the I/O die more. At least that's how I'd see it.

I also don't really see value in making separate chiplets for P or E cores exclusively. Especially as you pointed out there's a latency penalty if they have to share data.

Also, AMD's take on an "E" core is not quite the same as Intel from what I've seen. They'll just re-use the cores from the APUs (less cache, different efficiency curve) as "E" cores and leave regular beefed up Zen cores as what you'd call "P". How much space they would save, I have no clue, but a good estimation would be 1/3 less space per core? So it's realistic to think 8 big Zen cores in a CCD would be around 12 little Zen cores in the other? I doubt they want to trim too much out of whatever they come up with, since otherwise they'd have the same problems as Intel with the E-cores missing too much.
The APUs as far as I can tell still use the same core as the desktop counterparts. So there's no die space saving here. Also as far as I can tell, AMD hasn't made something to compete against the Atom since Puma.

Now imagine if Intel split their DeskTop CPU line into pure P-Core & E-Core SKU's and left Hybrid to the Mobile CPU's where it makes the most sense.
I don't see how leaving hybrid CPUs only for mobile applications makes the most sense. Energy consumption is a concern beyond what your little world demands. Imagine an office with hundreds or thousands of desktops. The handful of watts you start to shave off makes a lot more sense when you multiply it by that much. And if we want to go on an even larger scale, sure, individually saving maybe a handful of watts doesn't make sense, but when you multiply it by, say, a neighborhood, then it starts adding up. Or, put another way, you can increase the performance but keep the same power envelope.
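To put rough numbers on that multiplication (a sketch where the per-desktop saving, fleet size, and duty cycle are all assumptions for illustration):

```python
# Back-of-the-envelope fleet savings; every number here is an assumption
# picked just to show the multiplication, not a measurement.
watts_saved_per_desktop = 10        # shaved off per machine
desktops = 1_000                    # a mid-sized office
hours_per_year = 8 * 250            # 8 h/day, 250 working days

kwh_saved = watts_saved_per_desktop * desktops * hours_per_year / 1000
print(f"~{kwh_saved:,.0f} kWh saved per year")  # 20,000 kWh for these inputs
```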

Besides, I've run into a lot of occasions where I don't get a performance increase for the power draw anyway. Like it doesn't make sense to me at all to spend, say, 20W to do something that a 5W CPU can do in practically the same amount of time. Similarly, it doesn't make sense to let my video card go to 100% TBP in a game when there's zero performance gain after 75% TBP, and yes, it'll still happily gobble up all the power it wants.

Also, focusing on efficiency means you don't have to ask your desktop customers to buy elaborate cooling setups just to use their machines. I don't want an upper-midrange CPU to require a 360mm AIO just to be usable at its default settings. Though I guess both sides are throwing that out the window anyway.

Imagine having one CPU with 40x E-cores
Now imagine the other CPU with 12x P-cores.

You'd get CPU's suited for completely different tasks, specialists in their own little realm.

P-cores for gaming, ST work-loads, low # of Thread Counts.

E-cores for Massively MT professional work-loads, simple home server usage, personal render machines.
And it sounds like you're asking me to build multiple systems when that would cost even more money, especially for tasks that I may only do on occasion but still want to perform decently well. E.g., I'm not building a system to crunch Handbrake renders because I only need that functionality occasionally. Even if I'm not using those E-cores or P-cores for their supposed role, they still help overall with other tasks.

I tried doing the "having a system for a specific job" thing before. All it gives me is a headache trying to manage multiple machines, especially when it comes to data syncing as I'd like data that I care about at the moment to be local rather than rely on a network thing. The only one I'm willing to tolerate is having a dedicated NAS unit, because it's mostly set and forget. I log in maybe once a week to service any updates, but that's about it.
 

Kamen Rider Blade

Well, that's basically how I might use Alder Lake. But I don't run things on bare-metal boxes any more; I use VMs or containers. And for those I could partition my cores into less latency-sensitive and more interactive workloads, effectively creating the two types of machines you mention as VMs or cgroups, with quite a bit more flexibility and reconfigurability in case workloads change.

With numactl you can already very easily switch your hybrid system into only-P or only-E parts or control the mix. Cool for experimentation, but the practical advantages would require cloud scale or severe energy constraints to pay off the management effort, I'm afraid.
In an ideal world, VMs would work with all apps without issue.

But it isn't an ideal world, so I prefer to use "Bare Metal" for everything to avoid those weird app incompatibilities, bugs, & issues.

Current software is buggy enough, I don't need the VM to cause more issues on top of that.
 

bit_user

How do they get away with claiming that it's the first integrated AI engine on an x86 CPU??
Just because Intel did it on the iGPU?
I wouldn't count iGPUs that can merely be programmed for AI workloads as an "integrated AI engine", because both AMD and Intel have had programmable iGPUs for a long time.

However, as I already mentioned, Intel has had GNA (Gaussian Neural Accelerator) in a few generations of processors, already. So, I'm agreeing with you in that I think AMD's claim of being first is inaccurate.

they had to give up a large part of their company to get this tech,
If you're talking about the Xilinx acquisition, I think that was actually pretty smart. That business is one of the few profitable parts of AMD, right now. When your stock is over-valued, the smart CEO goes out and makes acquisitions to turn that share price into real revenue growth. That's what happened, here. And I think it's looking like a shrewd move.
 

bit_user

People may have missed the fact that Intel still uses a monolithic CPU design whereas AMD uses chiplets,
AMD's laptop and previous-gen iGPU-enabled desktop CPUs have been monolithic. Phoenix is monolithic, as well. So, it remains to be seen if they will utilize chiplets in all market segments, or just the higher-end ones.
 

bit_user

The Ryzen has the exact same 330-350W limit; there is just no way to cool a Ryzen enough to reach it without exotic cooling, which is the only reason you think it runs cooler.
7950x = 337W
It's no fair citing overclocking in arguments over power dissipation, or else we should be talking about this:

 
If you're talking about the Xilinx acquisition, I think that was actually pretty smart. That business is one of the few profitable parts of AMD, right now. When your stock is over-valued, the smart CEO goes out and makes acquisitions to turn that share price into real revenue growth. That's what happened, here. And I think it's looking like a shrewd move.
Oh, it was a smart move, but that doesn't change the fact that AMD gave up a large part of their shares for it. That's the price of that AI.
It's no fair citing overclocking in arguments over power dissipation, or else we should be talking about this:
This IS what I was talking about; that's the hardware you need for that much power. For 500W on an Intel platform you need at least liquid nitrogen; on the Ryzen platform you need it for 300-350W. An Intel CPU with 16 P-cores still couldn't use 500W, because it would need exotic cooling to manage that.
It would still run at 300-350W, because that's all the platform can handle with even a high-end AIO.
 

bit_user

Oh it was a smart move, but that doesn't change the fact that AMD gave up a large part of their shares for it.
AMD's stock price is about the same as it was, back when the deal closed. Their market cap is $173.8B and the deal was estimated at about $50B. So, what are you basing it on, when you say they "gave up a large part of their shares"?

That's a price for that AI.
Most of Xilinx' business is not AI. Even in the datacenter, FPGAs have other uses than that. AI might've partially motivated it, but one thing AMD got was a stable revenue stream and a greater presence in the embedded market.
 
