News Intel Foundry Services Chief Steps Down

Kamen Rider Blade · Dec 19, 2022

bit_user said:
Yes, obviously. They do get wider and deeper as transistor budgets increase, but not as aggressively as Apple.

Apple went "HAM" on the transistor budget and I love them for it.
It shows what can be done when you have 192 KiB of L1$.
Something I've been wanting to see in CPUs for quite a while, but Apple was there first.

Meteor Lake and I think also Arrow Lake will still locate all of the CPU cores on a single tile. Maybe some future generation will separate them onto their own chiplets, but that carries a latency penalty.

Well, we'll see, it doesn't hurt to offer a variety of chiplet types based on what product you intend to cater to.

For sure. Otherwise, why do it?

As a continuation of their ATOM line, and as the eventual replacement for their primary P-cores down the line.
Also as a solution to their power consumption issues.

I'm saying it's still not enough cache to truly make up for the 128-bit memory interface.

They should do some testing to find out. Even some simulation time on a FPGA.

And I'm very interested to see how they're going to use it. I think the big win (requiring a big investment) will be to use it as a faster tier of DRAM, rather than a transparent cache.

L4 Cache is one of the optional modes, less code needs to be re-written.

Memory speeds increase, but so do CPU speeds and core counts. The extra headroom that DDR5 adds is already consumed by the Raptor Lake i9, as you can easily see in the DDR4 vs. DDR5 benchmarks.

And the E-cores would do just fine consuming that extra bandwidth offered by DDR5

As for matching the bandwidth of 4x DDR4-3200, you're looking at theoretical throughput of 102.4 GB/s. For 2x DDR5-5600 (the fastest supported by Raptor Lake), it's only 89.6 GB/s. And let's not go down a rabbit hole of OC memory. Intel decided DDR5-5600 was the fastest they would support, in this generation. If your hypothetical CPU existed today, that's also what Intel would've likely chosen.
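
For anyone who wants to check the arithmetic behind those two figures, here is a minimal sketch. It assumes a 64-bit (8-byte) data path per memory channel and decimal gigabytes; the function name is made up for illustration. It reports about 102.4 GB/s for quad-channel DDR4-3200 and 89.6 GB/s for dual-channel DDR5-5600.

```python
def peak_bandwidth_gb_s(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s.

    channels  : number of 64-bit memory channels (assumption: 8 bytes wide each)
    mt_per_s  : transfer rate in mega-transfers per second (e.g. 3200 for DDR4-3200)
    bus_bytes : bytes moved per transfer on one channel
    """
    # MT/s * bytes gives MB/s; divide by 1000 for decimal GB/s.
    return channels * mt_per_s * bus_bytes / 1000


quad_ddr4 = peak_bandwidth_gb_s(channels=4, mt_per_s=3200)
dual_ddr5 = peak_bandwidth_gb_s(channels=2, mt_per_s=5600)

print(f"4x DDR4-3200: {quad_ddr4:.1f} GB/s")
print(f"2x DDR5-5600: {dual_ddr5:.1f} GB/s")
```

These are theoretical peaks only; sustained bandwidth in real workloads is lower.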

That's fine, as long as the total number of nodes on the RingBus doesn't exceed 12x Quad E-Core clusters.

Somebody is going to make a new CPU to beat a 3-year-old one? And not long before a successor 2 generations newer is about to be introduced??

It just has to offer that level of performance, with lower power consumption, at a much cheaper price.

There's no bad products, just bad prices.

Bringing Enterprise Performance to the masses.

Memory capacity is also an issue. The 3000-series Threadrippers (non-Pro) supported up to 256 GB, while LGA 1700 can only support 128 GB, and that's at a performance deficit relative to 64 GB. BTW, I'm also reading that the 32-core TR 3970X has 144 MB of cache, which you're not going to be able to match.

While it might not match the L3$ size (I didn't expect it to), you can obviously improve support by limiting RAM support to only whatever you can fit into 4x DIMM slots.
Given that DDR5 has RAM Packages that intend to go from 1 GiB to 2 GiB, 3 GiB, ... up to 8 GiB,
future RAM support will increase as new, more memory-dense DIMMs come out, with the only limit being how much you can package onto 4x DIMM slots.
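
A rough sketch of how per-package density would scale total capacity across 4x DIMM slots. The layout here is an assumption for illustration, not something stated in the thread: non-ECC DIMMs with 8 packages per rank and 2 ranks per DIMM. The function name is hypothetical too.

```python
def platform_capacity_gib(
    package_gib: int,
    packages_per_rank: int = 8,   # assumption: x8 devices filling a 64-bit rank
    ranks_per_dimm: int = 2,      # assumption: dual-rank DIMMs
    dimm_slots: int = 4,
) -> int:
    """Total capacity in GiB if every slot holds the densest DIMM of this layout."""
    return package_gib * packages_per_rank * ranks_per_dimm * dimm_slots


# Package sizes taken from the 1 GiB ... 8 GiB range mentioned above.
for package_gib in (1, 2, 3, 4, 8):
    total = platform_capacity_gib(package_gib)
    print(f"{package_gib} GiB packages -> {total} GiB across 4 DIMM slots")
```

Under those assumptions, 2 GiB packages land on the 128 GB figure discussed above, and each step up in package density scales the total linearly, provided the memory controller can actually address and drive those DIMMs.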

bit_user · Dec 19, 2022

Kamen Rider Blade said:
It shows what can be done when you have 192 KiB of L1$.
Something I've been wanting to see in CPUs for quite a while, but Apple was there first.

The reason people don't usually make such big L1 caches is that there's a latency tradeoff.

However, with more general-purpose registers, ARM shouldn't have as many spills and can therefore better afford to take a slight hit on latency. Another huge factor is the lower clockspeed targets of Apple's cores. The latency stays constant in nanoseconds, but translates into fewer clock cycles @ a lower clock speed.
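
To make the nanoseconds-versus-cycles point concrete, here is a minimal sketch. The 1.0 ns access time and both clock speeds are made-up illustrative values, not measurements of any real core.

```python
def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a fixed access time in nanoseconds to clock cycles.

    One cycle lasts 1 / clock_ghz nanoseconds, so cycles = ns * GHz.
    """
    return latency_ns * clock_ghz


# Hypothetical numbers: the same cache array, clocked at two different speeds.
fixed_latency_ns = 1.0
for clock_ghz in (3.2, 5.5):
    cycles = latency_in_cycles(fixed_latency_ns, clock_ghz)
    print(f"{fixed_latency_ns} ns at {clock_ghz} GHz = {cycles:.1f} cycles")
```

The same physical access time costs more pipeline cycles at the higher clock, which is why a large L1 is easier to justify on a lower-clocked design.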

Kamen Rider Blade said:
They should do some testing to find out. Even some simulation time on a FPGA.

They could simulate it, but not using FPGAs. It's better to use an abstract chip model for this than an RTL-level one.

Kamen Rider Blade said:
And the E-cores would do just fine consuming that extra bandwidth offered by DDR5

But, if your point is to make a CPU that exceeds Raptor Lake, and Raptor Lake has already maxed-out the DDR5, then you'll plateau at about the same performance as it.

Kamen Rider Blade said:
It just has to offer that level of performance, with lower power consumption, at a much cheaper price.

I doubt it, because I doubt many are buying those old ThreadRippers, unless they really just need the PCIe lanes (which your LGA1700 wouldn't have).

Kamen Rider Blade said:
There's no bad products, just bad prices.

Bad products are ones which are annoying or troublesome to use, due to things like: heat, noise, usability issues, or bugs. If the deficiencies are bad enough, there might be no good price for them.

Kamen Rider Blade said:
Bringing Enterprise Performance to the masses.

Not if it's too bandwidth-starved to have good multithreaded performance, and has bad lightly-threaded performance because no P-cores. Plus, limited to 128 GB (with performance penalties) and only 24 PCIe lanes or whatever.

Kamen Rider Blade said:
you can obviously improve support by limiting RAM support to only whatever you can fit into 4x DIMM slots.

Raptor Lake is already limited to 2 DIMMs per channel, and performance suffers if you have more than 1 DIMM per channel.

Kamen Rider Blade said:
Given that DDR5 has RAM Packages that intend to go from 1 GiB to 2 GiB, 3 GiB, ... up to 8 GiB, future RAM support will increase as new, more memory-dense DIMMs come out, with the only limit being how much you can package onto 4x DIMM slots.

It only matters what DDR5 is supported while the CPU is on sale. Intel doesn't care beyond that, and the motherboard won't validate or support any RAM introduced after the motherboard stops being sold.

More importantly, CPUs are limited in terms of how many address bits they expose. Even if you can put more RAM in the DIMM slots, the CPU might not be able to address it.
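
A small sketch of that ceiling: with byte addressing, N physical address bits can reach at most 2^N bytes. The bit counts below are illustrative only and are not the actual figures for any particular CPU; the function name is made up.

```python
def max_addressable_gib(physical_address_bits: int) -> int:
    """Largest byte-addressable memory, in GiB, for a given address width."""
    if physical_address_bits < 30:
        raise ValueError("fewer than 30 bits addresses less than 1 GiB")
    # 2**bits bytes divided by 2**30 bytes per GiB.
    return 2 ** (physical_address_bits - 30)


# Hypothetical address widths, chosen only to show the scaling.
for bits in (36, 37, 39, 46):
    print(f"{bits} address bits -> up to {max_addressable_gib(bits)} GiB")
```

Each extra address bit doubles the ceiling, so a DIMM configuration that exceeds it cannot be fully used no matter how dense the modules are.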

Kamen Rider Blade · Dec 20, 2022

bit_user said:
The reason people don't usually make such big L1 caches is that there's a latency tradeoff.

However, with more general-purpose registers, ARM shouldn't have as many spills and can therefore better afford to take a slight hit on latency. Another huge factor is the lower clockspeed targets of Apple's cores. The latency stays constant in nanoseconds, but translates into fewer clock cycles @ a lower clock speed.

That's great for Apple, but I want to see Intel & AMD slowly migrate their designs to 192 KiB L1$ on both Instruction & Data.

That would allow for my vision of Dynamic SMT and appropriate Core Resource segmentation.

They could simulate it, but not using FPGAs. It's better to use an abstract chip model for this than an RTL-level one.

Ok, so be it, let's go with that.

But, if your point is to make a CPU that exceeds Raptor Lake, and Raptor Lake has already maxed-out the DDR5, then you'll plateau at about the same performance as it.

The goal is to make a CPU that excels in what the E-cores are naturally good at, massively parallel MT workloads.
They're such adorable little cores, working together in a Quad-Core cluster.

I doubt it, because I doubt many are buying those old ThreadRippers, unless they really just need the PCIe lanes (which your LGA1700 wouldn't have).

LGA1700 has DMI 4.0 x8 lanes, which is just PCIe with a fancy name.
But Z790 Chipset offers PCIe 4.0 x20 lanes & PCIe 3.0 x8 lanes.
Effectively PCIe 4.0 x24 in total lanes, but they forced 4x PCIe 4.0 lanes into 8x PCIe 3.0 lanes.
I think that's kind of asinine; they should let me, the end user, decide what to do with those lanes, including if I want to double them into PCIe 3.0.
But the bigger issue is getting MoBo vendors to move the PCIe x16 lane slot to the bottom of the MoBo and have the other PCIe slots be "Un-Obstructed" by the modern massive multi-slot video cards that exist today. (Seriously 4-5x slots are starting to become normal at the high end, WTF?)
And tell the MoBo makers to SERIOUSLY, knock it off with the excessive M.2 lanes.
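
The lane trade mentioned above (4x PCIe 4.0 lanes versus 8x PCIe 3.0 lanes) works out to the same raw bandwidth. A minimal sketch, using the per-lane transfer rates and 128b/130b encoding of PCIe 3.0 through 5.0; the helper names are made up, and protocol overhead beyond line encoding is ignored.

```python
# Per-lane signalling rate in GT/s for each PCIe generation.
PCIE_GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# PCIe 3.0 and later send 128 payload bits in every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def lane_gb_s(gen: int) -> float:
    """Approximate one-direction bandwidth of a single lane, in GB/s."""
    return PCIE_GT_PER_S[gen] * ENCODING_EFFICIENCY / 8  # 8 bits per byte


def link_gb_s(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth of a link with the given lane count."""
    return lane_gb_s(gen) * lanes


print(f"PCIe 4.0 x4: {link_gb_s(4, 4):.2f} GB/s")
print(f"PCIe 3.0 x8: {link_gb_s(3, 8):.2f} GB/s")
```

Equal bandwidth does not mean the lanes are interchangeable in hardware, though; whether a chipset can re-split its lanes is a design decision, not just arithmetic.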

PCIe slots can be converted to support M.2 via Add-in cards, it's a PITA to convert M.2 back to support regular PCIe slots.

Realistically, we only need 2x M.2 slots on any mobo, one on the front, another on the back, directly opposite of the one in the front.

That's it. One M.2 slot for OS Drive, the other for basic Data Drive.

Bad products are ones which are annoying or troublesome to use, due to things like: heat, noise, usability issues, or bugs. If the deficiencies are bad enough, there might be no good price for them.

But I doubt Intel would release that bad of a product in this day & age on the CPU side.

Not if it's too bandwidth-starved to have good multithreaded performance, and has bad lightly-threaded performance because no P-cores. Plus, limited to 128 GB (with performance penalties) and only 24 PCIe lanes or whatever.

We'll have to see how they fare in the tests, I don't see it having issues with only 10-12x slots on the RingBus being occupied by the E-core Quad-Core clusters.
Intel needs to get working on an improved IMC to support more DIMMs and at faster speeds along with higher capacities.
As for the 24x PCIe lanes off the chipset, see my rant above.

Raptor Lake is already limited to 2 DIMMs per channel, and performance suffers if you have more than 1 DIMM per channel.

Tell Intel to fix their IMC to perform better with more DIMMs.
Or finally follow IBM and use OMI, move the IMC into its own chip and mount it directly opposite of the DIMM slots for shortest possible trace paths.
And use a Serial Connection to link the IMC to the CPU.

It's time to finally move the last piece of PC Memory, off of a parallel connection and onto a Serial Connection.

It only matters what DDR5 is supported while the CPU is on sale. Intel doesn't care beyond that, and the motherboard won't validate or support any RAM introduced after the motherboard stops being sold.

If the IMC is designed properly, they would be able to support future larger DDR5 RAM Packages since they would be part of the DDR5 spec.

More importantly, CPUs are limited in terms of how many address bits they expose. Even if you can put more RAM in the DIMM slots, the CPU might not be able to address it.

Then Intel needs to raise the limit, that's a limit on their hardware side.
If Enterprise CPUs can address a massive amount of RAM currently, they can design DDR5 IMC to be "Future Ready" since we already know the road map and can predict how many "Maximum Density" DIMMs will come down the line and how much RAM we will eventually be getting in DDR5 world.

bit_user · Dec 20, 2022

Kamen Rider Blade said:
That's great for Apple, but I want to see Intel & AMD slowly migrate their designs to 192 KiB L1$ on both Instruction & Data.

Again, the reason not to is the extra latency hit that's amplified at the higher clockspeeds AMD and Intel use.

Kamen Rider Blade said:
But the bigger issue is getting MoBo vendors to move the PCIe x16 lane slot to the bottom of the MoBo and have the other PCIe slots be "Un-Obstructed" by the modern massive multi-slot video cards that exist today. (Seriously 4-5x slots are starting to become normal at the high end, WTF?)

The farther you move it from the CPU, the more expensive it gets. Especially because PCIe 5.0, right?

Kamen Rider Blade said:
And tell the MoBo makers to SERIOUSLY, knock it off with the excessive M.2 lanes.

Agreed, but I guess they're concerned about a few people who want to make a RAID of their M.2 SSDs.

Kamen Rider Blade said:
Or finally follow IBM and use OMI, move the IMC into its own chip and mount it directly opposite of the DIMM slots for shortest possible trace paths.
And use a Serial Connection to link the IMC to the CPU.

It's time to finally move the last piece of PC Memory, off of a parallel connection and onto a Serial Connection.

Didn't IBM give in and announce they're adopting CXL memory? That seems to be where the industry is headed.

Another way to go is to use RDIMMs, like servers do. The downside is a slight latency tradeoff and it makes the DIMMs slightly more expensive. I'll bet there's no technical reason why you couldn't support both UDIMMs and RDIMMs in a desktop platform. Intel traditionally has, in some of their workstation and server CPUs.

Kamen Rider Blade said:
If the IMC is designed properly, they would be able to support future larger DDR5 RAM Packages since they would be part of the DDR5 spec.

Intel has no incentive, though. They're launching Meteor Lake next year. Once it's on the market, any limitations in Raptor Lake will serve to create more incentives for upgrades and buying the latest and greatest model.

Kamen Rider Blade said:
Then Intel needs to raise the limit, that's a limit on their hardware side.

Also, more physical address bits means more levels of page table hierarchy. And that makes misses more costly. So, just wiring up the bits isn't "free".

Kamen Rider Blade said:
If Enterprise CPUs can address a massive amount of RAM currently, they can design DDR5 IMC to be "Future Ready"

But they support RDIMMs. There's a downside to driving more address pins, especially with UDIMMs, which is that it should increase the electrical load the IMC has to handle.

Kamen Rider Blade said:
since we already know the road map and can predict how many "Maximum Density" DIMMs will come down the line and how much RAM we will eventually be getting in DDR5 world.

You can't really validate with RAM that doesn't even exist, yet.
