News New Zen 5 128-core EPYC CPU wields 512MB of L3 cache

Status
Not open for further replies.
Didn't they just say that going from L3 to L3 on different chiplets takes about as long as going to RAM?

For that reason, "32 MB per CCX" remains the important number (96 MB for 3D cache models). 512 MB, or 640 MB with L2 cache included, are just marketing numbers or trivia.
 
Didn't they just say that going from L3 to L3 on different chiplets takes about as long as going to RAM?
No, they just said it's still faster than going to DRAM!

For that reason, "32 MB per CCX" remains the important number (96 MB for 3D cache models). 512 MB, or 640 MB with L2 cache included, are just marketing numbers or trivia.
Yes, AMD's L3 cache is segmented, unlike Intel's. So, what really matters is the L3 cache per CCD, but it's easier to compare if you look at it in terms of L3 cache per core.
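The per-CCX figure and the headline total are related by simple arithmetic. A quick sketch, assuming the topology implied above (16 CCDs of 8 Zen 5 cores each, 32 MB of L3 per CCD, 1 MB of L2 per core):

```python
# Assumed topology: 16 CCDs x 8 cores, 32 MB L3 per CCD, 1 MB L2 per core.
CORES = 128
CORES_PER_CCD = 8
L3_PER_CCD_MB = 32
L2_PER_CORE_MB = 1

ccds = CORES // CORES_PER_CCD        # 16 chiplets
total_l3 = ccds * L3_PER_CCD_MB      # 512 MB -- the headline number
total_l2 = CORES * L2_PER_CORE_MB    # 128 MB
print(total_l3, total_l3 + total_l2) # 512 640
```

No single core ever sees more than its CCD's 32 MB pool, which is why the 512 MB total is trivia.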

For me, this article is a nothing burger. All it does is confirm that they're keeping the same base configuration of 4 MiB of L3 per Zen 5 core that AMD has used since Zen 2! I mean, there's absolutely no way they'd realistically regress on that front (leaving C-cores aside).
 
All it does is confirm that they're keeping the same base configuration of 4 MiB of L3 per Zen 5 core that AMD has used since Zen 2!
I don't like that number either, because a core can use all of the L3 cache available to it in the CCX.

The big change between Zen 2 desktop and Zen 3 desktop was unifying the CCX and giving 32 MiB to any of the cores on the chiplet instead of 16 MiB.

Renoir to Cezanne was even more dramatic: It went from 4 MiB available to a whopping 16 MiB, a quadrupling.

The new Strix Point effectively has no improvement for big/fast cores: 16 MiB max, same as the previous generations since Cezanne. Then 8 MiB serving the C cores. So when someone inevitably highlights that it's 24 MiB now (!), there's nothing to be impressed by.

I look forward to 3D cache making it onto the mainstream APUs in the future.
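The generation-by-generation figures above can be laid out side by side. The pool sizes are as stated in the posts; the cores-per-pool counts are my assumption about each design, so treat this as a sketch:

```python
# (L3 pool in MiB, big cores sharing that pool) -- core counts are assumed.
l3_pools = {
    "Zen 2 desktop (per 4-core CCX)": (16, 4),
    "Zen 3 desktop (unified CCX)":    (32, 8),
    "Renoir (per CCX)":               (4, 4),
    "Cezanne (unified)":              (16, 8),
    "Strix Point (big-core cluster)": (16, 4),
}
for name, (pool, cores) in l3_pools.items():
    print(f"{name}: {pool} MiB pool, {pool / cores:.0f} MiB per core")
```

Which column matters is exactly the disagreement here: the pool a core can reach, or the per-core share that sets the silicon cost.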
 
I don't like that number either, because a core can use all of the L3 cache available to it in the CCX.

The big change between Zen 2 desktop and Zen 3 desktop was unifying the CCX and giving 32 MiB to any of the cores on the chiplet instead of 16 MiB.
That was an architectural optimization, but the amount of L3 per core is what costs money and impacts scaling. That's why I choose to focus on the amount per core.

Renoir to Cezanne was even more dramatic: It went from 4 MiB available to a whopping 16 MiB, a quadrupling.
Yes, I should've qualified that I'm talking just about server CPUs, here. The article was about server CPUs, so I took it as given, but it's fair to point out the ratio didn't hold for APUs.

The new Strix Point effectively has no improvement for big/fast cores: 16 MiB max,
See, that's where we disagree. It increases the number per Big core to 4 MiB, like the chiplet CCDs have, which I think is a material change that should be reflected in its performance.
 
As much as that seems like a lot, this is a server CPU, so even 512MB of L3 cache would get used; that would have easily measurable benefits in the data centre space.

On the other hand, that much cache in a desktop environment would be mostly wasted because there's nothing I can think of that could make use of that kind of cache.
 
Didn't they just say that going from L3 to L3 on different chiplets takes about as long as going to RAM?
Who said that? That seems impossible, because even cache on a different CCX would be physically closer (and therefore faster) than the fastest RAM. Compared to all levels of cache, RAM is painfully slow. I really think you read something wrong, because saying that cache is no faster than RAM is like saying that VRAM is no faster than RAM.

It just doesn't seem possible.
 
Who said that? That seems impossible, because even cache on a different CCX would be physically closer (and therefore faster) than the fastest RAM.
Yes, it takes time for electric signals to propagate, but consider that they do so on the order of 150 mm per nanosecond. DRAM latencies are on the order of 100 nanoseconds. Electric signals can propagate about 15 meters in that amount of time. So, the reason DRAM latencies are so long is not because of physical distances.

Where I see this flawed assumption arise most often is when people assume that on-package memory is lower latency because it's physically closer. Do the math, people. In fact, HBM traditionally has longer best-case latencies than DIMMs.
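The back-of-the-envelope math in the post above, written out (using the rough figures given: ~150 mm/ns signal propagation, ~100 ns DRAM latency):

```python
# How far a signal could travel in one DRAM access, per the rough numbers above.
signal_speed_mm_per_ns = 150   # ~order-of-magnitude propagation speed
dram_latency_ns = 100          # typical end-to-end DRAM latency

reach_mm = signal_speed_mm_per_ns * dram_latency_ns
print(reach_mm / 1000, "m")    # 15.0 m -- far beyond any package dimension
```

Since 15 meters dwarfs any on-package distance, wire length clearly isn't what dominates DRAM latency.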
 
Not sure why a few thought that L3 would be slow or have higher latency.

Check out:
https://www.nextplatform.com/2023/0...phire-rapids-xeon-sp-against-amd-genoa-epycs/
https://hothardware.com/news/cpu-cache-explained
https://www.phoronix.com/review/xeon-max-9468-9480-hbm2e/7

L3 latency is always lower than RAM's, and a large cache is sometimes better than a smaller one, sometimes not. It all depends on the workload; fortunately for some, gaming frequently benefits from a larger cache, but this is a thread about servers.

Where I would spend my money is on clock frequency. Having purchased a CPU with a reasonably sized cache (and not paid for -X), a faster clock helps you all the time, while an extremely large cache only helps to a lesser extent.

Still, it's great that they are making these enormous caches, as someday the price will come down. Till then I'll spend my cache on clocks.
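One conventional way to quantify that clock-versus-cache trade-off is average memory access time (AMAT). A minimal sketch with purely illustrative numbers (the hit times and miss rates below are made up, not measured):

```python
# AMAT = L3 hit time + L3 miss rate * DRAM penalty.
# A bigger L3 only lowers the miss rate (and may slightly raise hit time),
# whereas a clock bump speeds up every instruction, cached or not.
def amat_ns(l3_hit_ns: float, l3_miss_rate: float, dram_ns: float) -> float:
    return l3_hit_ns + l3_miss_rate * dram_ns

small = amat_ns(l3_hit_ns=10, l3_miss_rate=0.20, dram_ns=100)  # 30.0 ns
large = amat_ns(l3_hit_ns=12, l3_miss_rate=0.10, dram_ns=100)  # 22.0 ns
print(small, large)
```

The bigger cache wins here only because the workload's miss rate actually drops; a workload that already fits in the smaller cache sees no benefit at all, which is the point about diminishing returns.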
 
Not sure why a few thought that L3 would be slow or have higher latency.
I don't get it either. It's like saying that an HDD is faster than an SSD. It's so fundamentally incorrect that I didn't even know how to respond at first. It's like someone saying that the sky is red, you're stunned for a moment that anyone would un-ironically say that. 🤣
 