News AMD microcode improves cross-CCD latency on Ryzen 9000 CPUs — Ryzen 9 9900X and Ryzen 9 9950X cross-CCD latency cut in half to match previous-gen m...

Admin · Sep 17, 2024

AMD has released a new AGESA microcode update for Zen 5 that cuts the cross-CCD latency of the Ryzen 9 9950X and 9900X in half, matching the previous-gen Ryzen 9 7950X and 7900X.

phxrider · Sep 17, 2024

I thought I remembered reading cross CCD communication was supposed to be improved over Zen 4 somewhere?

helper800 · Sep 17, 2024

phxrider said:
I thought I remembered reading cross CCD communication was supposed to be improved over Zen 4 somewhere?

Did they change the I/O die or Infinity Fabric speed? I think that determines the latency, no? To be clear, I am more of a layman when it comes to the minutiae of CPU architecture.

The Historical Fidelity · Sep 17, 2024

helper800 said:
Did they change the I/O die or Infinity Fabric speed? I think that determines the latency, no? To be clear, I am more of a layman when it comes to the minutiae of CPU architecture.

It could be as simple as the data taking the scenic route between the CCDs lol.

On a serious note, it could be that, for the particular internal test load they were tuning for, sacrificing latency for increased bandwidth improved performance. Some workloads prefer large packets of data arriving all at once over small bits of data arriving really fast and then having to wait until the entire data set arrives. But I could be way off.
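
To make that trade-off concrete, here is a minimal sketch using the simple model transfer time = fixed latency + size / bandwidth. The two link configurations and all of their numbers are made-up illustrations, not measurements of any Zen 4 or Zen 5 part, and the function names are hypothetical.

```python
def transfer_time_ns(size_bytes, latency_ns, bandwidth_gb_s):
    """Time to move one block: fixed latency plus size / bandwidth.

    1 GB/s equals 1 byte per nanosecond, so size / bandwidth is in ns.
    """
    return latency_ns + size_bytes / bandwidth_gb_s


def crossover_bytes(low_lat, high_bw):
    """Block size at which both configurations take the same time."""
    latency_gap = high_bw["latency_ns"] - low_lat["latency_ns"]
    per_byte_gap = 1 / low_lat["bandwidth_gb_s"] - 1 / high_bw["bandwidth_gb_s"]
    return latency_gap / per_byte_gap


# ASSUMPTION: illustrative numbers only, chosen to show the shape of the trade-off.
LOW_LATENCY = {"latency_ns": 100.0, "bandwidth_gb_s": 32.0}
HIGH_BANDWIDTH = {"latency_ns": 200.0, "bandwidth_gb_s": 64.0}

if __name__ == "__main__":
    for size in (64, 4096, 1024 * 1024):
        fast = transfer_time_ns(size, **LOW_LATENCY)
        wide = transfer_time_ns(size, **HIGH_BANDWIDTH)
        print(f"{size:>8} B  low-latency: {fast:9.1f} ns  high-bandwidth: {wide:9.1f} ns")
    print("break-even size:", crossover_bytes(LOW_LATENCY, HIGH_BANDWIDTH), "bytes")
```

Under these assumed numbers, a single 64-byte cache line is far quicker on the low-latency configuration, while anything much larger than the break-even size favors the high-bandwidth one. A core-to-core latency benchmark passes tiny messages back and forth, so it only ever sees the latency term.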

bit_user · Sep 17, 2024

The article said:
one of AMD's architects revealed that the high latency regression in Zen 5 resulted from new tuning parameters they implemented to help boost performance in workloads the company was testing against. The only problem with AMD's tuning was that it reportedly did not account for synthetic benchmarks, which created the high core-to-core latencies seen on latency-sensitive benchmarks.

Zen 5's cross-CCD latency optimizations also reportedly took a long time to develop due to testing and validation, which is why we only see them now. Thankfully, it appears AMD's latency problems were mostly related to synthetic benchmarks, so there's a good chance we won't see significant performance variance with this new AGESA update in regular multi-threaded workloads.

Um... what?

They're saying they tuned some low-level parameters to optimize real workloads and didn't bother about synthetics. How can you then conclude that (apparently) reverting those changes to optimize synthetics won't affect performance on real workloads???

The only scenario I see where we don't see a regression in real workload performance is if the synthetics were hitting some corner case they could handle in relative isolation. That seems a little unlikely, but perhaps we'll find out. On that front:

"Users are reporting that they are getting up to 400-600 points improvement in Cinebench R23. A few users who own the Ryzen 9 9950X also report that both CPU-z and 3DMark CPU benchmarks have seen noticeable uplifts and the best part is that the BIOS runs flawlessly without issues."

Source: https://wccftech.com/amd-agesa-1-2-...-ryzen-9000-cpus-major-performance-increases/

So, it does sound all-around positive. I'm hoping Phoronix will run a broad test suite, where I expect we'll probably see at least a few regressions.

Deleted member 2986452 · Sep 17, 2024

bit_user said:
"Users are reporting that they are getting up to 400-600 points improvement in Cinebench R23. A few users who own the Ryzen 9 9950X also report that both CPU-z and 3DMark CPU benchmarks have seen noticeable uplifts and the best part is that the BIOS runs flawlessly without issues."

Source: https://wccftech.com/amd-agesa-1-2-...-ryzen-9000-cpus-major-performance-increases/

Interesting. My R23 multicore score was 43,323 the other day. Will run again once I get this update and see what kind of improvement I get. MSI is only showing 1.2.0.1 update and it's a trial mode beta version.
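
For a sense of scale, here is a rough calculation of what the quoted 400-600 point gain would mean against a 43,323 baseline. Pairing those two figures is an assumption: the quoted report does not give its own baseline, and the helper name is hypothetical.

```python
def relative_gain_percent(gain_points, baseline_points):
    """Return a score gain as a percentage of the baseline score."""
    return 100.0 * gain_points / baseline_points


# ASSUMPTION: applies the quoted 400-600 point range to the 43,323 score above.
BASELINE_R23 = 43_323

if __name__ == "__main__":
    for gain in (400, 600):
        pct = relative_gain_percent(gain, BASELINE_R23)
        print(f"+{gain} points on {BASELINE_R23} = +{pct:.2f}%")
```

On that assumed baseline the gain works out to roughly 0.9% to 1.4%, which is small enough that run-to-run variance could matter when comparing before and after.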

Mama Changa · Sep 17, 2024

helper800 said:
Did they change the I/O die or Infinity Fabric speed? I think that determines the latency, no? To be clear, I am more of a layman when it comes to the minutiae of CPU architecture.

No, they did not. That's coming in Strix Halo (new 3nm IO die) and Zen 6. Zen 6 apparently finally fixes the issues caused by having two CCDs and slow IF.

Bikki · Sep 17, 2024

This sounds more like a bug fix after a rushed release than an optimization.

usertests · Sep 17, 2024

Bikki said:
This sounds more like a bug fix after a rushed release than an optimization.

There's no more pretending that Ryzen 9000 wasn't broken at launch.

Steve Nord_ · Sep 18, 2024

That's great, what's a CCD? At least cite the AMD64 Programmer's Guide?

bit_user · Sep 18, 2024

usertests said:
There's no more pretending that Ryzen 9000 wasn't broken at launch.

No, "broken" means it flat-out doesn't work correctly. I think it's more accurate to say it was launched prematurely and without adequate testing & tuning.

Let's not forget that AMD just concurrently launched a new desktop & laptop CPU for the first time! They also added the new Strix Halo product tier, that teams are also working on. They sort of had this tier before, with Dragon Range, but this time it's more than just a desktop CPU in a BGA package.

Also, AMD made two versions of both Zen 5 and Zen 5C cores, this time: one for laptops and one for servers. In the case of the server Zen 5C cores, it seems like they're fabbing them on TSMC N3.

My point is that AMD appears to be adding product lines and diversifying the underlying technology at a pretty rapid pace. I think what we're seeing is an effect of that, with some resources being stretched thin. Hopefully, they'll be able to pick up some good talent from Intel, with the layoffs and others' stock options underwater.

NinoPino · Sep 18, 2024

The Historical Fidelity said:
It could be as simple as the data taking the scenic route between the CCDs lol.

On a serious note, it could be that, for the particular internal test load they were tuning for, sacrificing latency for increased bandwidth improved performance. Some workloads prefer large packets of data arriving all at once over small bits of data arriving really fast and then having to wait until the entire data set arrives. But I could be way off.

If that were the case, reducing the latency would decrease the bandwidth.
To me it seems more like a case of over-conservative settings chosen in the rush to release.

bit_user · Sep 18, 2024

Steve Nord_ said:
That's great, what's a CCD?

It's the compute chiplet, which contains the CPU cores and all the cache*. Currently, there are two types of dies in the multi-chip package versions of AMD CPUs: Compute Complex Dies (CCD) and the I/O Die (IOD).

* In the X3D models, one of the CCDs will have an extra die of 64 MiB L3 cache stacked on top of it.

Steve Nord_ said:
At least cite the AMD64 Programmer's Guide?

It might not be in there. That probably focuses on a software view of the system, whereas this is hardware. You're welcome to search it for the term.

NinoPino · Sep 18, 2024

Bikki said:
This sounds more like a bug fix after a rushed release than an optimization.

Something that only impacts performance cannot be considered a bug, particularly if the impact is as small as it seems.

bit_user · Sep 18, 2024

NinoPino said:
Something that only impacts performance cannot be considered a bug, particularly if the impact is as small as it seems.

That's a grey area, IMO. Whether something is a bug can really depend on the question of intentionality, especially when there's not a clear, objective definition of "broken".

If their original implementation worked as intended, but just disadvantaged some cases exposed by these synthetic tests, then I wouldn't consider it a bug. However, I can imagine some bugs that would manifest in higher latency, simply due to some code that wasn't implemented as intended.

If we take them at their word, then it wasn't a bug. However, if that's true, we should expect to see at least a few cases where the BIOS update results in performance regressions.

thestryker · Sep 18, 2024

bit_user said:
No, "broken" means it flat-out doesn't work correctly. I think it's more accurate to say it was launched prematurely and without adequate testing & tuning.

I think this is the worst launch of a product that is actually a good piece of technology I've ever seen. I know we'll likely never find out, but I would love to know why this launch was executed in such a seemingly slapdash manner.

bit_user · Sep 18, 2024

thestryker said:
I think this is the worst launch of a product that is actually a good piece of technology I've ever seen. I know we'll likely never find out, but I would love to know why this launch was executed in such a seemingly slapdash manner.

The rolling launch delays should've been a warning sign. I was hoping it was something as superficial as what AMD claimed, but it appears they might've been struggling to come to terms with how underbaked the product really was.

Technology-wise, I think reusing that IO Die was probably the most disappointing part. If Ryzen 9000 had been AMD really pulling out all the stops, then I think it would've been stiff competition for Arrow Lake. As it is, they probably made Intel's job a lot easier.

thestryker · Sep 18, 2024

bit_user said:
The rolling launch delays should've been a warning sign. I was hoping it was something as superficial as what AMD claimed, but it appears they might've been struggling to come to terms with how underbaked the product really was.

It absolutely seems like something that would have been better with an extra month of validation. Also, if they had marketed the AVX512/branch prediction/power envelope improvements and been transparent about which situations to expect uplift in, they'd have been in a substantially better position.

bit_user said:
Technology-wise, I think reusing that IO Die was probably the most disappointing part.

On a pure technology front I agree completely, but from a practicality standpoint I understand it. I think CUDIMM production is running behind, and CUDIMMs really seem necessary for memory frequencies to get high enough to offset mismatched ratios. To me this seems somewhat like Intel using the MTL graphics tile for ARL instead of moving forward. Of course, this has a much bigger potential real-world impact than integrated graphics does.

bit_user said:
If Ryzen 9000 had been AMD really pulling out all the stops, then I think it would've been stiff competition for Arrow Lake. As it is, they probably made Intel's job a lot easier.

I'm not sure they could have done much better unless higher clock scaling was on the table. They might have been able to squeeze some better memory performance, but I doubt it would have been set it and forget it like the current 6000/CL30 recommendation seems to be.

I have no doubt they've made Intel's job infinitely easier than it otherwise would have been. AMD released a product to negative/questioning press that now isn't selling well. If I was Intel I'd go press light into the launch and let ARL speak for itself (assuming it's as good as the leaks seem to indicate). Then do a marketing push with third party benchmarking to backstop.

Hotrod2go · Sep 18, 2024

bit_user said:
No, "broken" means it flat-out doesn't work correctly. I think it's more accurate to say it was launched prematurely and without adequate testing & tuning.

Let's not forget that AMD just concurrently launched a new desktop & laptop CPU for the first time! They also added the new Strix Halo product tier, that teams are also working on. They sort of had this tier before, with Dragon Range, but this time it's more than just a desktop CPU in a BGA package.

Also, AMD made two versions of both Zen 5 and Zen 5C cores, this time: one for laptops and one for servers. In the case of the server Zen 5C cores, it seems like they're fabbing them on TSMC N3.

My point is that AMD appears to be adding product lines and diversifying the underlying technology at a pretty rapid pace. I think what we're seeing is an effect of that, with some resources being stretched thin. Hopefully, they'll be able to pick up some good talent from Intel, with the layoffs and others' stock options underwater.

Strategically launched at the time to hit Intel hard where it hurts, after the RL fiasco & problems that platform had then.

mitch074 · Sep 18, 2024

From echoes I could find here and there, the firmware for Zen 5 was first developed from Zen 2's because the team was more used to that. It's not difficult to imagine that the engineers had a given number of cycles of latency set for inter-CCD communication that was tightened for Zen 3/4, but the tighter value wasn't carried over and the old one ended up in production.
Changing it might have been a matter of minutes, but validation is another matter entirely.

bit_user · Sep 18, 2024

thestryker said:
I'm not sure they could have done much better unless higher clock scaling was on the table. They might have been able to squeeze some better memory performance, but I doubt it would have been set it and forget it like the current 6000/CL30 recommendation seems to be.

In the ChipsAndCheese interview with Mike Clark, he mentioned that some interesting optimizations were left on the table that they could've done, had it only needed to be fabricated at N3.

A Video Interview with Mike Clark, Chief Architect of Zen at AMD (chipsandcheese.com)

thestryker said:
I have no doubt they've made Intel's job infinitely easier than it otherwise would have been. AMD released a product to negative/questioning press that now isn't selling well. If I was Intel I'd go press light into the launch and let ARL speak for itself (assuming it's as good as the leaks seem to indicate). Then do a marketing push with third party benchmarking to backstop.

What I hope will happen is that most reviewers will compare against a fully-updated Ryzen 9000 setup, and then the benchmarks will have to speak for themselves.

Also, I expect AMD to do some repricing when Arrow Lake launches, but maybe some of that already happened to try and address weak demand.

The Historical Fidelity · Sep 18, 2024

NinoPino said:
If that were the case, reducing the latency would decrease the bandwidth.
To me it seems more like a case of over-conservative settings chosen in the rush to release.

That could very well be the case; however, I was just hypothesizing based on AMD's stated reason of "optimizations for internal test loads."

stuff and nonesense · Sep 18, 2024

The chiplet design makes the AMD product stack scalable; its inherent weakness is its interconnect and the resulting latency. Someone somewhere screwed up the timings relative to Zen 4. The company will have learned from this. As people move on and the mess-up is slowly forgotten, hopefully there will be a few who remain around to be able to say… "we screwed this up before, check your work". With luck it won't be seen again.

bkuhl · Sep 18, 2024

What I would like to know is if this eliminates the need for core parking now? Zen4 didn’t require it and if they brought Zen5 latency down to the Zen4 levels, we should be able to cut this sh!t out…
