News AMD microcode improves cross-CCD latency on Ryzen 9000 CPUs — Ryzen 9 9900X and Ryzen 9 9950X cross-CCD latency cut in half to match previous-gen m...

Did they change the I/O die or infinity fabric speed? I think that is determinative of the latency, no? To be clear I am more of a layman with the minutiae of CPU architecture.
It could be as simple as the data taking the scenic route between the CCDs lol.

On a serious note, it could be that, due to the particular internal test load they were tuning for, it improved performance to sacrifice latency for increased bandwidth. Some workloads prefer large packets of data getting there at the same time rather than small bits of data getting there really fast and then having to wait until the entire data set arrives. But I could be way off.
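To put rough numbers on that trade-off, here is a minimal sketch using the simplest possible model: transfer time = fixed latency + size / bandwidth. The two "tunings" and all of their figures are made up for illustration; they are not AMD's actual parameters or measured Zen 5 values.

```python
def transfer_time_ns(size_bytes, latency_ns, bandwidth_gb_s):
    """Simple model: total time = fixed latency + size / bandwidth."""
    # 1 GB/s (decimal) equals 1 byte per nanosecond, so the division is in ns.
    return latency_ns + size_bytes / bandwidth_gb_s


# Hypothetical tunings (illustrative numbers only, not real Zen 5 figures).
LOW_LATENCY = {"latency_ns": 80.0, "bandwidth_gb_s": 32.0}
HIGH_BANDWIDTH = {"latency_ns": 180.0, "bandwidth_gb_s": 64.0}


def crossover_bytes(a, b):
    """Transfer size at which tunings a and b take the same time."""
    # a.lat + s / a.bw == b.lat + s / b.bw, solved for s
    return (b["latency_ns"] - a["latency_ns"]) / (
        1 / a["bandwidth_gb_s"] - 1 / b["bandwidth_gb_s"]
    )


if __name__ == "__main__":
    for size in (64, 4096, 1024 * 1024):  # cache line, page, 1 MiB
        low = transfer_time_ns(size, **LOW_LATENCY)
        high = transfer_time_ns(size, **HIGH_BANDWIDTH)
        print(f"{size:>8} B: low-latency {low:.0f} ns, high-bandwidth {high:.0f} ns")
    print("crossover at", crossover_bytes(LOW_LATENCY, HIGH_BANDWIDTH), "bytes")
```

Under these assumed numbers, a single 64-byte cache line (roughly what a core-to-core latency test bounces back and forth) is much slower on the high-bandwidth tuning, while anything larger than a few KiB is faster on it. That would be one way a tuning could look bad in synthetics and still help bulk transfers, if this is what actually happened.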
 

bit_user

Titan
Ambassador
The article said:
one of AMD's architects revealed that the high latency regression in Zen 5 resulted from new tuning parameters they implemented to help boost performance in workloads the company was testing against. The only problem with AMD's tuning was that it reportedly did not account for synthetic benchmarks, which created the high core-to-core latencies seen on latency-sensitive benchmarks.

Zen 5's cross-CCD latency optimizations also reportedly took a long time to develop due to testing and validation, which is why we only see them now. Thankfully, it appears AMD's latency problems were mostly related to synthetic benchmarks, so there's a good chance we won't see significant performance variance with this new AGESA update in regular multi-threaded workloads.
Um... what?

They're saying they tuned some low-level parameters to optimize real workloads and didn't bother about synthetics. How can you then conclude that (apparently) reverting the changes to optimize synthetics won't affect performance on real workloads???

The only scenario I see where we don't see a regression in real workload performance is if the synthetics were hitting some corner case they could handle in relative isolation. That seems a little unlikely, but perhaps we'll find out. On that front:

"Users are reporting that they are getting up to 400-600 points improvement in Cinebench R23. A few users who own the Ryzen 9 9950X also report that both CPU-z and 3DMark CPU benchmarks have seen noticeable uplifts and the best part is that the BIOS runs flawlessly without issues."

Source: https://wccftech.com/amd-agesa-1-2-...-ryzen-9000-cpus-major-performance-increases/

So, it does sound all-around positive. I'm hoping Phoronix will run a broad test suite, where I expect we'll probably see at least a few regressions.
 

TeamRed2024

Upstanding
Aug 12, 2024
"Users are reporting that they are getting up to 400-600 points improvement in Cinebench R23. A few users who own the Ryzen 9 9950X also report that both CPU-z and 3DMark CPU benchmarks have seen noticeable uplifts and the best part is that the BIOS runs flawlessly without issues."

Interesting. My R23 multicore score was 43,323 the other day. Will run again once I get this update and see what kind of improvement I get. MSI is only showing the 1.2.0.1 update, and it's a trial-mode beta version.
 

Mama Changa

Great
Sep 4, 2024
Did they change the I/O die or infinity fabric speed? I think that is determinative of the latency, no? To be clear I am more of a layman with the minutiae of CPU architecture.
No, they did not. That's coming in Strix Halo (new 3nm IO die) and Zen 6. Zen 6 apparently finally fixes the issues caused by having two CCDs and slow IF.
 

bit_user

Titan
Ambassador
There's no more pretending that Ryzen 9000 wasn't broken at launch.
No, "broken" means it flat-out doesn't work correctly. I think it's more accurate to say it was launched prematurely and without adequate testing & tuning.

Let's not forget that AMD just concurrently launched a new desktop & laptop CPU for the first time! They also added the new Strix Halo product tier, that teams are also working on. They sort of had this tier before, with Dragon Range, but this time it's more than just a desktop CPU in a BGA package.

Also, AMD made two versions of both Zen 5 and Zen 5C cores, this time: one for laptops and one for servers. In the case of the server Zen 5C cores, it seems like they're fabbing them on TSMC N3.

My point is that AMD appears to be adding product lines and diversifying the underlying technology at a pretty rapid pace. I think what we're seeing is an effect of that, with some resources being stretched thin. Hopefully, they'll be able to pick up some good talent from Intel, with the layoffs and others' stock options underwater.
 

NinoPino

Respectable
May 26, 2022
It could be as simple as the data taking the scenic route between the CCDs lol.

On a serious note, it could be that, due to the particular internal test load they were tuning for, it improved performance to sacrifice latency for increased bandwidth. Some workloads prefer large packets of data getting there at the same time rather than small bits of data getting there really fast and then having to wait until the entire data set arrives. But I could be way off.
If this were the case, then reducing the latency would decrease the bandwidth.
To me, it seems more like a case of over-conservative settings in the rush to release.
 

bit_user

Titan
Ambassador
That's great, what's a CCD?
It's the compute chiplet, which contains the CPU cores and all the cache*. Currently, there are two types of dies in the multi-chip package versions of AMD CPUs: Core Complex Dies (CCD) and the I/O Die (IOD).



* In the X3D models, one of the CCDs will have an extra die of 64 MiB L3 cache stacked on top of it.

At least cite the AMD64 Programmer's Guide?
It might not be in there. That probably focuses on a software view of the system, whereas this is hardware. You're welcome to search it for the term.
 

bit_user

Titan
Ambassador
A thing that impacts performance cannot be considered a bug, particularly if the impact is as small as it seems.
That's a grey area, IMO. Whether something is a bug can really depend on the question of intentionality, especially when there's not a clear, objective definition of "broken".

If their original implementation worked as intended, but just disadvantaged some cases exposed by these synthetic tests, then I wouldn't consider it a bug. However, I can imagine some bugs that would manifest in higher latency, simply due to some code that wasn't implemented as intended.

If we take them at their word, then it wasn't a bug. However, if that's true, we should expect to see at least a few cases where the BIOS update results in performance regressions.
 
No, "broken" means it flat-out doesn't work correctly. I think it's more accurate to say it was launched prematurely and without adequate testing & tuning.
I think this is the worst launch of a product that is actually a good piece of technology I've ever seen. I know we'll likely never find out, but I would love to know why this launch was seemingly slapdash in the manner in which it was executed.
 

bit_user

Titan
Ambassador
I think this is the worst launch of a product that is actually a good piece of technology I've ever seen. I know we'll likely never find out, but I would love to know why this launch was seemingly slapdash in the manner in which it was executed.
The rolling launch delays should've been a warning sign. I was hoping it was something as superficial as what AMD claimed, but it appears they might've been struggling to come to terms with how underbaked the product really was.

Technology-wise, I think reusing that IO Die was probably the most disappointing part. If Ryzen 9000 had been AMD really pulling out all the stops, then I think it would've been stiff competition for Arrow Lake. As it is, they probably made Intel's job a lot easier.
 
The rolling launch delays should've been a warning sign. I was hoping it was something as superficial as what AMD claimed, but it appears they might've been struggling to come to terms with how underbaked the product really was.
It absolutely seems like something that would have been better with an extra month of validation. Also, if they had marketed the AVX512/branch prediction/power envelope and been transparent about which situations to expect uplift in, they'd have been in a substantially better position.
Technology-wise, I think reusing that IO Die was probably the most disappointing part.
On a pure technology front I agree completely, but from a practicality standpoint I understand it. I think CUDIMM production, which really seems necessary for frequencies to get high enough to offset mismatched ratios, is running behind. To me this seems somewhat like Intel using the MTL graphics tile for ARL instead of moving forward. Of course, this has a much bigger potential real-world impact than integrated graphics does.
If Ryzen 9000 had been AMD really pulling out all the stops, then I think it would've been stiff competition for Arrow Lake. As it is, they probably made Intel's job a lot easier.
I'm not sure they could have done much better unless higher clock scaling was on the table. They might have been able to squeeze some better memory performance, but I doubt it would have been set it and forget it like the current 6000/CL30 recommendation seems to be.

I have no doubt they've made Intel's job infinitely easier than it otherwise would have been. AMD released a product to negative/questioning press that now isn't selling well. If I was Intel I'd go press light into the launch and let ARL speak for itself (assuming it's as good as the leaks seem to indicate). Then do a marketing push with third party benchmarking to backstop.
 

Hotrod2go

Prominent
Jun 12, 2023
No, "broken" means it flat-out doesn't work correctly. I think it's more accurate to say it was launched prematurely and without adequate testing & tuning.

Let's not forget that AMD just concurrently launched a new desktop & laptop CPU for the first time! They also added the new Strix Halo product tier, that teams are also working on. They sort of had this tier before, with Dragon Range, but this time it's more than just a desktop CPU in a BGA package.

Also, AMD made two versions of both Zen 5 and Zen 5C cores, this time: one for laptops and one for servers. In the case of the server Zen 5C cores, it seems like they're fabbing them on TSMC N3.

My point is that AMD appears to be adding product lines and diversifying the underlying technology at a pretty rapid pace. I think what we're seeing is an effect of that, with some resources being stretched thin. Hopefully, they'll be able to pick up some good talent from Intel, with the layoffs and others' stock options underwater.
It was strategically launched at that time to hit Intel hard where it hurts, after the Raptor Lake (RL) fiasco & the problems that platform had then.
 
From echoes I could find here and there, the firmware for Zen 5 was first developed from Zen 2's because the team was more used to that. It's not difficult to think that the engineers had a given number of cycles of latency set for inter-CCD communication that was tightened for Zen 3/4, but that change wasn't ported over, and the looser value ended up in production.
Changing it might have been a matter of minutes, but validation is another matter entirely.
 

bit_user

Titan
Ambassador
I'm not sure they could have done much better unless higher clock scaling was on the table. They might have been able to squeeze some better memory performance, but I doubt it would have been set it and forget it like the current 6000/CL30 recommendation seems to be.
In the ChipsAndCheese interview with Mike Clark, he mentioned that some interesting optimizations were left on the table that they could've done, had it only needed to be fabricated at N3.


I have no doubt they've made Intel's job infinitely easier than it otherwise would have been. AMD released a product to negative/questioning press that now isn't selling well. If I was Intel I'd go press light into the launch and let ARL speak for itself (assuming it's as good as the leaks seem to indicate). Then do a marketing push with third party benchmarking to backstop.
What I hope will happen is that most reviewers will compare against a fully-updated Ryzen 9000 setup, and then the benchmarks will have to speak for themselves.

Also, I expect AMD to do some repricing when Arrow Lake launches, but maybe some of that already happened to try and address weak demand.
 
Mar 10, 2020
The chiplet design makes the AMD product stack scalable; its inherent weakness is its interconnect and the latency that comes with it. Someone somewhere screwed up the timings relative to Zen 4. The company will have learned from this. As people move on and the mess-up is slowly forgotten, hopefully there will be a few who remain around to be able to say… “we screwed this up before, check your work”. With luck it won’t be seen again.
 

bkuhl

Distinguished
Dec 31, 2007
What I would like to know is if this eliminates the need for core parking now? Zen4 didn’t require it and if they brought Zen5 latency down to the Zen4 levels, we should be able to cut this sh!t out…
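For anyone who wants to keep a process on one CCD by hand in the meantime, here is a rough Linux-only sketch. It is not AMD's core parking mechanism, just a manual stand-in: it groups logical CPUs by which L3 cache they share (each CCD has its own L3) and pins the current process to the first group. It assumes the kernel exposes the L3 as `cache/index3` in sysfs, which is typical on x86 but not guaranteed; the helper names are made up for this example.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as '0-7,16-23' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def l3_domains(pattern="/sys/devices/system/cpu/cpu[0-9]*/cache/index3/shared_cpu_list"):
    """Group logical CPUs by shared L3 cache (assumed here to be index3)."""
    domains = set()
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                cpus = parse_cpu_list(f.read())
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry: skip it
        if cpus:
            domains.add(frozenset(cpus))
    # Sort by lowest CPU number so the order is stable between runs.
    return sorted(domains, key=min)


def pin_to_first_domain():
    """Pin the current process to the first L3 domain, if one can be found."""
    domains = l3_domains()
    if domains and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, domains[0])  # pid 0 means the calling process
        return domains[0]
    return None
```

Calling `pin_to_first_domain()` at the start of a latency-sensitive program keeps its threads from bouncing between CCDs. On systems without the sysfs cache files (or without `os.sched_setaffinity`) it does nothing and returns `None`.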
 