News Intel Raptor Lake Refresh, Arrow Lake CPU Performance Projections Leaked

I don't think that you really do. GDDR is bandwidth optimized at the cost of latency. CPU's care much more about latency whereas GPU's can mask high latency to an extent.

Just look at the 4700S review. The latency was much higher, and if you look at the benchmarks on the following pages, it doesn't bode well.



There were a few nearly non-existent desktop Broadwells. Only two were even LGA, IIRC, and they were quickly replaced by Skylake.

I said I wanted one, not that it would be better than anything else. Plenty of hardware out there that is interesting in concept but not amazing either due to price or performance.

I've bought AMD E-350, J and N Celeron class NUCs, and other niche products just to mess with them. This would be more of the same.

A bit disappointed Intel is ending their NUC division, but AMD NUCs are getting more and more enticing. Shouldn't be too long before some Z1 based NUCs show up.


And I am still mad they didn't make Broadwell compatible with Z87 motherboards; I would have totally bought one otherwise.
 
Whenever you have DRAM baked into the product, they UPCHARGE the price by A LOT.

WAY more than the actual cost of DRAM.
They don't upcharge for DRAM. They charge whatever they think the market will let them get away with for a given product until consumer pushback convinces them otherwise. If you look at more budget-oriented chromebooks, phones, tablets, etc. where everything is soldered and glued in, the total cost of parts if you were to buy them as one-offs often exceeds the cost of buying the things pre-made.

Once on-package DRAM becomes the norm, they won't be able to charge much of a premium for it anymore if they want to keep selling enough chips to remain in business. On-package memory already saves the cost of a DIMM PCB, on-DIMM buffer chips, DIMM connectors, DIMM pins on the CPU socket and substrate, DIMM traces on the motherboard, all of the support components like resistors and capacitors, separate encapsulation for every chip, etc. AMD and Intel can take their memory-side "profit" from eliminating that chunk of cost from their equation while keeping retail per-GB pricing about the same as DIMMs. Since the memory controller only needs to deal with 0.2mm TSVs to the memory chips instead of 10cm long traces, all of the termination and equalization circuitry in the memory controller can be eliminated, allowing further area, latency and power savings there.

At maturity, on-package DRAM will become a cost-cutting necessity, not a premium feature.
 
I said I wanted one, not that it would be better than anything else. Plenty of hardware out there that is interesting in concept but not amazing either due to price or performance.

I've bought AMD E-350, J and N Celeron class NUCs, and other niche products just to mess with them. This would be more of the same.

A bit disappointed Intel is ending their NUC division, but AMD NUCs are getting more and more enticing. Shouldn't be too long before some Z1 based NUCs show up.


And I am still mad they didn't make Broadwell compatible with Z87 motherboards; I would have totally bought one otherwise.

Who wouldn't want an E-350 netbook when they were all the rage and the competition was an in-order Atom?
 
They don't upcharge for DRAM. They charge whatever they think the market will let them get away with for a given product until consumer pushback convinces them otherwise.
That's even worse; that's literally Highway Robbery.


At maturity, on-package DRAM will become a cost-cutting necessity, not a premium feature.
And what happens if the customer wants to expand their memory or get new faster RAM?

Too bad, so sad?

What if the RAM goes bad? It's not like that's an unheard-of situation.

Replace the entire CPU?
 
And what happens if the customer wants to expand their memory or get new faster RAM?
If you want more memory, put it in an x16/x8/x4 slot. Speed-wise, you aren't going to beat direct-stacked on-package memory's TBps-scale bandwidth with off-package memory. At least not on a sub-$1000 motherboard and sub-$1000 CPU.

What if the RAM goes bad? It's not like that's an unheard-of situation.

Replace the entire CPU?
The same could be said of when IDE, floppy, parallel, serial, PS/2, etc. interfaces got aggregated from discrete logic on separate cards into low-density ASICs on the motherboard, then later consolidated into a single super-I/O chip, which has become today's south bridge. The same goes for the GPU slot and memory controller going from the north bridge to being integrated into the CPU. The likelihood of something going bad within a given chip increases every time you increase transistor count, whatever the reason might be.

Without integration, we'd still have XT-style motherboards where most stuff from serial ports through drive controllers is implemented using 74-series discrete logic chips and a few slightly fancier off-the-shelf ASICs in DIP sockets, struggling to run much faster than 10MHz.

Integration is inevitable if you want progress to continue. Both AMD and Intel's newest-and-greatest server chips have 64+GB of HBM. It is only a matter of time before on-package DRAM makes its way into everything that uses DRAM.
 
The same could be said of when IDE, floppy, parallel, serial, PS/2, etc. interfaces got aggregated from discrete logic on separate cards into low-density ASICs on the motherboard, then later consolidated into a single super-I/O chip, which has become today's south bridge. The same goes for the GPU slot and memory controller going from the north bridge to being integrated into the CPU. The likelihood of something going bad within a given chip increases every time you increase transistor count, whatever the reason might be.

Without integration, we'd still have XT-style motherboards where most stuff from serial ports through drive controllers is implemented using 74-series discrete logic chips and a few slightly fancier off-the-shelf ASICs in DIP sockets, struggling to run much faster than 10MHz.
But like the meme:
[meme image]

There's nothing holding us back technologically from having both:
On-Package Memory
&
Expandable Standard Memory connected via OMI
?

So why not have the best of both worlds?

Include some fast as hell On-Package Memory.

Give the Customer an option to expand past your On-Package Memory Capacity limits.

And we can still have MANY MORE pins available on the CPU for more PCIe lanes.


Integration is inevitable if you want progress to continue. Both AMD and Intel's newest-and-greatest server chips have 64+GB of HBM. It is only a matter of time before on-package DRAM makes its way into everything that uses DRAM.
Server-side products can afford the cost of HBM.

You won't be using that grade of DRAM on consumer chips generally.

You'd be going back down to bog-standard DRAM packages as you see in Apple products & consumer LapTops.

At that point, the performance advantage of on-board RAM is far smaller than with HBM.
 
At maturity, on-package DRAM will become a cost-cutting necessity, not a premium feature.
Not only that, but also a power-saving necessity - especially, if you want to continue scaling up bandwidth.

And what happens if the customer wants to expand their memory or get new faster RAM?
With HBM acting like a big cache, the speed of any external memory you add for capacity isn't a big deal. So, you might as well use CXL for that.

What if the RAM goes bad? It's not like that's an unheard-of situation.

Replace the entire CPU?
DDR5 has on-chip ECC, to deal with the single-bit errors. The DDR5 spec leaves plenty of flexibility for memory manufacturers to dial in the desired ratio of ECC to data bits, in order to hit their chosen reliability target. And if you can't extract enough reliability from on-chip ECC, then you can either add more dies or use in-band ECC.
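To make that data-to-ECC ratio concrete, here's a minimal sketch (Python, purely illustrative; real DDR5 on-die ECC codeword layouts are vendor-specific and aren't published in this detail) of how many parity bits a classic SEC-DED Hamming-style code needs as the protected word gets wider:

```python
# Minimal sketch: minimum parity bits for SEC-DED (single-error-correct,
# double-error-detect) protection of a k-bit data word.
# Purely illustrative; actual DDR5 on-die ECC codeword sizes are vendor-specific.

def secded_parity_bits(k: int) -> int:
    """Smallest r such that 2**r >= k + r + 1, plus one overall parity bit for DED."""
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r + 1

for k in (64, 128, 256):
    r = secded_parity_bits(k)
    print(f"{k:>3} data bits -> {r} ECC bits ({r / (k + r):.1%} of the codeword)")
```

The longer the codeword, the smaller the ECC overhead but the weaker the per-bit protection, which is exactly the knob the spec leaves for the manufacturers to turn.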
 
There's nothing holding us back technologically from having both:
On-Package Memory
&
Expandable Standard Memory connected via OMI
?
We will, except it'll be CXL.

You can rage all you want, but CXL beat OpenCAPI, just like blu-ray beat HD-DVD. You don't have to accept it - you're free to live in your own bubble, if you want. Just remember that the rest of us live in the real world.
 
So why not have the best of both worlds?
General usefulness vs cost.

When you have a mainstream CPU with 64GB of on-package memory, chances are that a memory-only slot will be wasted board space and CPU pins for the vast majority of people, including most power-users. May as well use those pins for extra PCIe slots that stand a much greater chance of being useful for something at some point, such as extra NVMe drives, a USB5 card, or anything else that might come up in the future. If you want more RAM, you can still get a CXL memory expansion card fitted with however much of whatever DRAM type you like that someone makes a controller chip for. If you need more than 64GB of RAM and 5.0x16 is still too slow for the amount of bandwidth you need out of your 256GB DDR5 RAM expansion, it is likely fair to say you should get an HEDT/workstation/server solution, not a mainstream PC.

Server-side products can afford the cost of HBM.

You won't be using that grade of DRAM on consumer chips generally.
The fundamental structure of DRAM chips hasn't changed since the days of FPM-DRAM: you still have a row address decoder that selects which memory row you want to activate, a row of sense amplifiers that detect whether the sense line of each column gets pulled high or low by the memory cell on activation to determine whether it is a 0 or 1, a row of D-latches storing the sense amplifiers' results, a column address mux and decoder to handle reads and writes to those D-latches, and write drivers to put the new values back into the cells when the memory row gets closed. The only thing that has changed in a meaningful way is the external interface, and even that isn't drastically different from FPM to HBM or GDDRx.

Which DRAM is cheapest at any given time is all about mass manufacturing. If direct-stacked memory was manufactured in the same volume as DDR5, it would become cheaper than DDR5 for a given capacity due to all of the extra packaging and assembly it eliminates.
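To illustrate the "structure hasn't changed" point, here's a toy behavioral model (Python; the class, sizes and names are made up for illustration, not any vendor's design) of the activate / column-access / precharge flow described above:

```python
# Toy behavioral model of a single DRAM bank: the row decoder activates one
# row into the sense-amp/latch row buffer, column accesses read or write those
# latches, and precharge restores the row to the cells and closes it.
# Illustrative only -- real devices add timing (tRCD, tCAS, tRP), refresh, etc.

class DramBank:
    def __init__(self, rows: int, cols: int):
        self.cells = [[0] * cols for _ in range(rows)]  # the capacitor array
        self.row_buffer = None   # sense amps + D-latches for the open row
        self.open_row = None

    def activate(self, row: int):
        """Row decoder selects a row; sense amps latch its contents."""
        if self.open_row is not None:
            self.precharge()
        self.row_buffer = list(self.cells[row])
        self.open_row = row

    def read(self, col: int) -> int:
        """Column mux reads one latch from the open row buffer."""
        assert self.open_row is not None, "row must be activated first"
        return self.row_buffer[col]

    def write(self, col: int, value: int):
        """Write driver updates one latch in the open row buffer."""
        assert self.open_row is not None, "row must be activated first"
        self.row_buffer[col] = value

    def precharge(self):
        """Restore the open row to the cells and close it."""
        if self.open_row is None:
            return
        self.cells[self.open_row] = list(self.row_buffer)
        self.open_row, self.row_buffer = None, None

bank = DramBank(rows=4, cols=8)
bank.activate(2)
bank.write(5, 1)
print(bank.read(5))  # 1
bank.precharge()
```

Swap the external interface bolted onto that row buffer and you go from FPM to DDR5 to HBM; the core flow stays the same.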
 
With HBM acting like a big cache, the speed of any external memory you add for capacity isn't a big deal. So, you might as well use CXL for that.
I put it the other way: the on-package memory is the primary system memory pool that the OS tries to keep all active data in, not a cache, and the external DRAM is where the least recently accessed pages get evicted to when the primary pool is running low, exactly like a conventional swapfile minus the file system and SSD/HDD overhead and latency.
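A minimal sketch of that arrangement (Python, purely illustrative; the pool names, page granularity and plain LRU policy are assumptions, not how any particular OS implements tiering): hot pages live in the on-package pool, and only the least recently used ones spill to the external pool.

```python
# Toy two-tier memory model: the on-package pool is primary; when it fills,
# the least recently used page is demoted to the external (CXL) pool.
# Touching a demoted page promotes it back. Illustrative only -- real OS
# tiering / NUMA demotion is far more involved.

from collections import OrderedDict

class TieredMemory:
    def __init__(self, fast_capacity_pages: int):
        self.fast = OrderedDict()   # page_id -> data, kept in LRU order
        self.slow = {}              # demoted pages
        self.capacity = fast_capacity_pages

    def touch(self, page_id, data=None):
        if page_id in self.fast:                # hot hit: refresh LRU position
            self.fast.move_to_end(page_id)
        else:
            if page_id in self.slow:            # promote from the slow tier
                data = self.slow.pop(page_id)
            self.fast[page_id] = data
            if len(self.fast) > self.capacity:  # demote the coldest page
                victim, victim_data = self.fast.popitem(last=False)
                self.slow[victim] = victim_data
        return self.fast[page_id]

mem = TieredMemory(fast_capacity_pages=2)
for p in ("a", "b", "c", "a"):   # "b" ends up demoted, "a" stays hot
    mem.touch(p, data=f"page {p}")
print(sorted(mem.fast), sorted(mem.slow))   # ['a', 'c'] ['b']
```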
 
We will, except it'll be CXL.

You can rage all you want, but CXL beat OpenCAPI, just like blu-ray beat HD-DVD. You don't have to accept it - you're free to live in your own bubble, if you want. Just remember that the rest of us live in the real world.

Having to go through the on-device memory controller, through the CXL protocol, and through all the layers of PCIe just to reach the CPU is always going to add latency.
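As a back-of-the-envelope illustration of that stack-up (every figure below is an assumed, illustrative number, not a measurement of any shipping platform):

```python
# Back-of-the-envelope CXL load-latency stack-up.
# All figures are illustrative assumptions, not measurements of any product.

local_dram_ns  = 90       # assumed load-to-use latency of direct-attached DDR5
pcie_phy_ns    = 2 * 30   # assumed round trip through the PCIe/CXL link layers
cxl_ctrl_ns    = 40       # assumed CXL controller + device-side memory controller
device_dram_ns = 90       # the DRAM on the expansion device itself

cxl_total = pcie_phy_ns + cxl_ctrl_ns + device_dram_ns
print(f"local DRAM ~{local_dram_ns} ns, CXL-attached ~{cxl_total} ns "
      f"(~{cxl_total / local_dram_ns:.1f}x)")
```

Whatever the exact numbers end up being on real hardware, the extra hops only ever add, never subtract.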

There will always be a need for RAM that is as fast and low latency as current Main Memory.

CXL memory is always going to play second fiddle to traditional main memory due to latency; its modularity is where the value will be.

While CXL might have beaten OpenCAPI and absorbed it, OMI is now part of the same family of standards.
 
General usefulness vs cost.

When you have a mainstream CPU with 64GB of on-package memory, chances are that a memory-only slot will be wasted board space and CPU pins for the vast majority of people, including most power-users. May as well use those pins for extra PCIe slots that stand a much greater chance of being useful for something at some point, such as extra NVMe drives, a USB5 card, or anything else that might come up in the future. If you want more RAM, you can still get a CXL memory expansion card fitted with however much of whatever DRAM type you like that someone makes a controller chip for. If you need more than 64GB of RAM and 5.0x16 is still too slow for the amount of bandwidth you need out of your 256GB DDR5 RAM expansion, it is likely fair to say you should get an HEDT/workstation/server solution, not a mainstream PC.
Or we could just not change the current paradigm and instead give you MORE DIMM slots to play with thanks to OMI, and use all the extra pins for PCIe connections as well.

Win/Win, minimal changes.

With OMI, using all 4x slots on a standard consumer MoBo wouldn't tank your DIMM slot speeds, due to separate memory controllers for every pair of DIMM slots. Ergo, you get the maximum Memory Speeds for each pair of DIMMs.

You get to keep your On-Package memory.

If you run out of DIMM slots, you can add in more memory via CXL.

Not everybody is going to want to spend the kind of $$$ on an HEDT / WorkStation / Server solution just to have more memory. That's very Bourgeois thinking, that the average person should have to spend that kind of money just to have access to more DIMM slots / memory.

Thinking that 64 GiB of on-package is "Good Enough" for you peasants who don't have the money for HEDT / WS / Server parts. We'll just take away your DIMM slots and you can have CXL in its place. While the stupid Video Card gets so large that it's now taking up 5 effing slots and blocking most of the MoBo.

The fundamental structure of DRAM chips hasn't changed since the days of FPM-DRAM: you still have a row address decoder that selects which memory row you want to activate, a row of sense amplifiers that detect whether the sense line of each column gets pulled high or low by the memory cell on activation to determine whether it is a 0 or 1, a row of D-latches storing the sense amplifiers' results, a column address mux and decoder to handle reads and writes to those D-latches, and write drivers to put the new values back into the cells when the memory row gets closed. The only thing that has changed in a meaningful way is the external interface, and even that isn't drastically different from FPM to HBM or GDDRx.

Which DRAM is cheapest at any given time is all about mass manufacturing. If direct-stacked memory was manufactured in the same volume as DDR5, it would become cheaper than DDR5 for a given capacity due to all of the extra packaging and assembly it eliminates.
All the automation, mass production, and sunk costs for DDR5 on DIMMs are done; it's there.

HBM has been around for quite some time, yet the costs still haven't gone down enough that companies would consider implementing it on consumer parts after the disastrously expensive VEGA 56 / 64 / Radeon VII.

And you know that the average mainstream consumer won't tolerate the level of "Profit Margins" that Enterprise-Level memory commands.
 
Real World, CXL Latency will never beat traditional Main Memory.
It doesn't need to. Like @InvalidError said, the HBM would be your main memory and you'd just treat the external RAM similar to swap.

While CXL might have beaten OpenCAPI and absorbed it, OMI is now part of the same family of standards.
I'm sure it's mainly the IP rights that the CXL consortium wanted. The standard itself is dead. I can't find any announcements of OMI or OpenCAPI-based products since then. Can you?

HBM has been around for quite some time, yet the costs still haven't gone down enough that companies would consider implementing it on consumer parts after the disastrously expensive VEGA 56 / 64 / Radeon VII.
HBM hasn't come to consumer CPUs or stayed in consumer GPUs because it simply wasn't needed. That's starting to change.

Also, the problems with Vega 10 and Vega 20 weren't due to HBM.
 
It doesn't need to. Like @InvalidError said, the HBM would be your main memory and you'd just treat the external RAM similar to swap.
Either way, they'll make it work somehow.

I'm sure it's mainly the IP rights that the CXL consortium wanted. The standard itself is dead. I can't find any announcements of OMI or OpenCAPI-based products since then. Can you?
The standard itself is used by IBM only. It's what IBM put out, but nobody else has jumped on because IBM keeps insisting on shoving the memory controller onto the DIMM, the same fundamental problem FB-DIMM had in the past when Intel tried to push it.

They went a "bridge too far" and it's come back to bite them.

They weren't willing to go to the compromise solution where the Memory controller is mounted on the MoBo and connected to the DIMM slots. That's a far more practical solution IMO.

HBM hasn't come to consumer CPUs or stayed in consumer GPUs because it simply wasn't needed. That's starting to change.
We'll see, I can see a few products that would need it in the future, but only a select few.

Also, the problems with Vega 10 and Vega 20 weren't due to HBM.
While the performance problems weren't because of HBM, the cost of the memory didn't help; it was a driving factor in the decision to go back to GDDR5/6.
 
Correct. Those are not the actual limitations. It's just like how you can burn 1900W with a Xeon W, if you're willing to clock it high enough:

If they had more cores, and could feed them with enough memory bandwidth, they could simply shave a couple hundred MHz off the all-core clock speed and it'd be fine.
Yeah, a 500W desktop CPU would go over real gangbusters...
As for the AIOs part of your comment, you know very well that the power isn't the issue, but rather their heatspreader. You can still cool it fine with a lesser heatsink and not lose much.
And you know very well that the end user doesn't care why they have to pay for extremely expensive cooling, only that they do have to.
Even if they kept it at 230W and had it run at half speed, they would have performance regressions in some things, which is never a good thing.
Yes, actually it is. DDR5 is the main reason why the 7950X is 45.5% faster than the 5950X, at multithreaded workloads:
[chart: 7950X vs. 5950X multithreaded benchmark comparison]

Since you like Intel so much, perhaps you'll find the DDR5 advantage on Alder Lake more persuasive:
[chart: Alder Lake DDR4 vs. DDR5 multithreaded results]

That's a 31.3% and 37.4% advantage, on multithreaded int and float workloads, for DDR5. Yes, the DDR4 is just running at stock speeds, but my point is merely to show how bandwidth-hungry these CPUs get when all the cores & threads are really cranking. And that CPU has only 24 threads.
Wasn't even part of my point but good on you trying to push your agenda...
this just backs up that they would be making a desktop part that would put their server sales in danger, and that's a bigger reason for them not doing it than having to figure out quad channel or whatever.
 
Yeah, a 500W desktop CPU would go over real gangbusters...
Like I said, you can keep the same power limit and just dial back the all-core clockspeed by a couple hundred MHz.

How do you think they made the 65W versions?

And you know very well that the end user doesn't care why they have to pay for extremely expensive cooling, only that they do have to.
They don't. You're the only one saying they do. This clearly shows the average performance loss from using a modest air cooler on the 7950X is only a minimal 2.6%:
[chart: 7950X performance scaling with a Noctua air cooler]

Wasn't even part of my point but good on you trying to push your agenda...
And what's my agenda? That we need more memory bandwidth? Because I'm from Big DRAM™?
: D

this just backs up that they would be making a desktop part that would put their server sales in danger,
I don't follow. Which scenario are you talking about: more cores or more memory channels?

I think they're not adding cores because 16c/32t is quite simply enough for the vast majority of mainstream users. Even Intel has maxed out at 32t, so no real difference there. Neither is adding memory channels, due to the platform costs it would add.

People who want more of either (cores or memory channels) can buy a workstation, and that will also provide them with more PCIe lanes.
 
Who wouldn't want an E-350 netbook when they were all the rage and the competition was an in-order Atom?
The E-350 was amazing for its time. Intel did not give Atom its due (resources; fear of competing with Core?) and it was really showing its age back when AMD came on the scene. AMD would repeat Intel's posturing toward this lower-margin/performance sector: the Bobcat-to-Kabini cores did not show much uplift and stagnated during the whole Bulldozer affair, and what was a promising start to the AM1 platform went nowhere. Back then AMD also made two distinctly different cores. AMD bet the farm on Zen, and now the small cores are the same Zen core, just stripped down with half the cache...

But Atom is still alive and kicking, from the N270 to the latest Gracemont E-cores that are found in Alder/Raptor Lake, or that you can find individually in an N100/N300 chip. It wasn't till Goldmont that Atom got OoO execution, a tremendous performance improvement in Chromebooks, as well as much better media accelerators.

The different strategies of AMD and Intel going forward are fascinating. Is AMD right to keep everything unified in instruction sets, making its smaller cores the same design with only a smaller cache? Or will Intel's strategy of making two distinctly different cores from the ground up be better received? The mobile wars of 2023+ :) Phoenix vs. Meteor Lake is the start, and AMD is taking an early lead with all these handheld platform wins and great reception. Is Meteor Lake really coming this year? 80% of success is showing up.
 
Like I said, you can keep the same power limit and just dial back the all-core clockspeed by a couple hundred MHz.
The Wraith at 60% is about 130W of power, and that gives you 4.3 out of 5.2 GHz.
They would lose a full GHz of speed turning this into a pure server CPU: great for Blender and CB, but not so great for general desktop work.
Unless they double down on game-mode and make people pay twice the money, and then have them turn off half or 3/4 of the CPU if they want to actually use it. Now that's choosing with your wallet...

(is the mem controller still on the CPU? is a quad channel one going to use a lot more power?)
[chart: Cinebench MT power consumption]

[table: all-core clock speeds]
 
All the automation, mass production, and sunk costs for DDR5 on DIMMs are done; it's there.
And all of those costs will come again with DDR6, LPDDR6, GDDR7, GDDR7X, GDDR7W, etc. and every other interface tweak that will come next. Many of those costs will reoccur multiple times during each generation too, as DRAM manufacturers tweak their designs to optimize speed, power and yields.

As I wrote earlier, DRAM prices are all about mass manufacturing. DDR4/5 chips get made by the billions each year because it is in everything from Pis to supercomputers. GDDR6 is more expensive because it is much lower volume and HBM is yet more expensive because of the added silicon interposer cost needed to slap it next to whatever uses it, which is only DC/AI stuff at the moment.

We're headed into tile-based architectures where mainstream CPUs have silicon interposers to tie everything together and dies are riddled with TSVs for backside power distribution. The added cost and complexity of stacking DRAM directly onto whatever needs it won't be relevant when everything non-trivial already has countless TSVs for power. The only meaningful extra cost to slapping DRAM on top at that point will be the DRAM itself, as every other cost-adding step is already present regardless.
 
And all of those costs will come again with DDR6, LPDDR6, GDDR7, GDDR7X, GDDR7W, etc. and every other interface tweak that will come next. Many of those costs will reoccur multiple times during each generation too, as DRAM manufacturers tweak their designs to optimize speed, power and yields.

As I wrote earlier, DRAM prices are all about mass manufacturing. DDR4/5 chips get made by the billions each year because it is in everything from Pis to supercomputers. GDDR6 is more expensive because it is much lower volume and HBM is yet more expensive because of the added silicon interposer cost needed to slap it next to whatever uses it, which is only DC/AI stuff at the moment.

We're headed into tile-based architectures where mainstream CPUs have silicon interposers to tie everything together and dies are riddled with TSVs for backside power distribution. The added cost and complexity of stacking DRAM directly onto whatever needs it won't be relevant when everything non-trivial already has countless TSVs for power. The only meaningful extra cost to slapping DRAM on top at that point will be the DRAM itself, as every other cost-adding step is already present regardless.
Then you wouldn't mind having On-Package DRAM + Standard Expandable Memory via DIMM slots.

Plenty of LapTops already do this; why can't we just do this on DeskTop as well?

Implementing OMI to get serialized memory would regain all those extra pins for PCIe lanes, which is what we all want.
 
Then you wouldn't mind having On-Package DRAM + Standard Expandable Memory via DIMM slots.
Because DIMMs would be an order of magnitude slower than direct-stacked memory would be, which means you'd run into performance issues with heterogeneous memory if you tried to treat them the same. When you have more memory on-package than most mainstream software will ever actively use (games are the biggest mainstream memory hogs and we're barely at the point where 16GB is getting too tight for comfort), most software won't care that extra memory is on PCIe instead of a dedicated external memory bus.

The most common example of high memory usage in the mainstream is video editing. While 8k video editing may benefit from having 200+GB of RAM to cache clips for timeline scrubbing, that is only a cache to avoid going all the way back through the OS and file system to storage, not memory that is continuously being accessed across a meaningful fraction of its space. Having 256GB of external RAM would still work perfectly fine for this and similar applications over a 5.0x4 link.
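A rough sanity check on that claim (the per-stream bitrate below is an assumed, illustrative figure, not a measurement of any particular codec):

```python
# Rough sanity check: usable bandwidth of a PCIe 5.0 x4 link vs. scrubbing
# cached video clips. The per-stream figure is an illustrative assumption.

GT_PER_S = 32                # PCIe 5.0 per-lane signalling rate (GT/s)
LANES = 4
ENCODING = 128 / 130         # 128b/130b line-encoding overhead

link_gb_s = GT_PER_S * LANES * ENCODING / 8      # ~15.8 GB/s usable

stream_gb_s = 1.0            # assumed ~1 GB/s for a high-bitrate 8K stream
print(f"PCIe 5.0 x4 ~{link_gb_s:.1f} GB/s -> "
      f"~{int(link_gb_s // stream_gb_s)} such streams before the link saturates")
```

Even with a generous per-stream figure, a x4 link leaves plenty of headroom for pulling cached clips out of an external RAM pool.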
 
Because DIMMs would be an order of magnitude slower than direct-stacked memory would be, which means you'd run into performance issues with heterogeneous memory if you tried to treat them the same. When you have more memory on-package than most mainstream software will ever actively use (games are the biggest mainstream memory hogs and we're barely at the point where 16GB is getting too tight for comfort), most software won't care that extra memory is on PCIe instead of a dedicated external memory bus.
Show me where direct-attached DIMMs are an order of magnitude slower than on-package.
So far, On-Package has an obvious latency advantage, but "Order of Magnitude" slower?

Where are you getting that?

Obviously you'd tier the memory when you're mixing direct-attached memory with DIMMs. But laptops that mix on-board RAM & DIMMs seem to work fine with the two co-existing.

The most common example of high memory usage in the mainstream is video editing. While 8k video editing may benefit from having 200+GB of RAM to cache clips for timeline scrubbing, that is only a cache to avoid going all the way back through the OS and file system to storage, not memory that is continuously being accessed across a meaningful fraction of its space. Having 256GB of external RAM would still work perfectly fine for this and similar applications over a 5.0x4 link.
You're telling me that it would work fine over a Higher Latency link on CXL compared to a lower latency direct attached DIMM? The same exact 256 GB of RAM, but slap on significantly more latency?

That people should ditch direct-attached DIMMs for CXL?
 
Show me where direct-attached DIMMs are an order of magnitude slower than on-package.
So far, On-Package has an obvious latency advantage, but "Order of Magnitude" slower?

Where are you getting that?
Not just on-package but DIRECT-STACKED, the logical evolution of HBM: eliminate the base die by integrating whatever essential functionality is still necessary directly into whatever the raw DRAM stack gets installed on top of, such as the IO die that contains the memory controller. HBM already goes to 1TB/s per stack, direct-stacked memory should be able to go even higher by ditching the base die and practically eliminating wiring stubs between the DRAM dies and memory controller. DDR5-8000 on the other hand is still only 64GB/s per aggregate channel.

1TB/s vs 64GB/s looks like a solid order of magnitude to me.
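The arithmetic behind those two figures, for anyone who wants to check it (the ~1TB/s-per-stack number is the one quoted above; per-stack HBM bandwidth varies by generation):

```python
# Quick check of the order-of-magnitude claim.
# DDR5-8000: 8000 MT/s on a 64-bit (2x32-bit) channel = 8 bytes per transfer.
ddr5_8000_gb_s = 8000e6 * 8 / 1e9        # = 64 GB/s per aggregate channel

# One HBM-class direct stack, using the ~1 TB/s figure from the post above.
hbm_stack_gb_s = 1000

print(f"DDR5-8000 channel: {ddr5_8000_gb_s:.0f} GB/s")
print(f"~1 TB/s stack is {hbm_stack_gb_s / ddr5_8000_gb_s:.0f}x faster")
```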
That people should ditch direct-attached DIMMs for CXL?
Once CPUs have enough direct-stacked memory to handle the active data set? Yes, because the extra memory beyond that is just a glorified swapfile, not seeing much traffic relative to how large it is.
 