Discussion: Interesting musings and questions about the Ryzen 5000

So I've been reading the architecture papers on Ryzen 5000 (Zen 3).

  • The IO Die is unchanged.
  • Under Zen 2, when a cache miss occurred, the request went out to the IO Die and then back to the chiplet.
  • Under Zen 3, the request stays on the chiplet, since each chiplet has a unified L3 cache.
Question 1: Is the cross-core-complex penalty still there if a thread moves to the 2nd chiplet (5900X/5950X)? The obvious answer is yes, but how much does this affect IPC? A program rarely locks itself down to one core and will often spread out across chiplets.

Question 2: Since the IO Die is unchanged (AMD's wording) and the IO Die is responsible for bus communications, why is Smart Access Memory limited to the Ryzen 5000 series when the Ryzen 3000 series is PCIe 4.0 compliant? Is this an artificial limitation?
 
Question 1: Is the cross-core-complex penalty still there if a thread moves to the 2nd chiplet (5900X/5950X)? The obvious answer is yes, but how much does this affect IPC? A program rarely locks itself down to one core and will often spread out across chiplets.
I'd imagine it would be better to start with, since there's only one other L3 cache to populate rather than three more. But if the task runs long enough and the thread bounces around evenly across all cores, then eventually every L3 cache holds the data the thread wants. At that point it becomes a question of whether that thread fills up the 32MB of L3 on a chiplet.

However, this scenario is unlikely to happen, given that Windows understands that Zen 2 and beyond have "preferred" cores, and Linux apparently won't try to load-share (I could be wrong).

EDIT: I'm aware I'm grossly simplifying things, but I'd rather not try to account for the dozens of variables that would inevitably lead down a rabbit hole.
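If anyone wants to put a number on the cross-chiplet penalty themselves, here's a minimal ping-pong sketch (Linux, compile with `gcc -O2 -pthread`). Two threads bounce a cache line between two pinned cores; run it once with a pair on the same chiplet and once with a pair on different chiplets and compare the round-trip times. The core numbers below are assumptions -- check your own topology with `lscpu -e` first.

```c
/*
 * Rough core-to-core "ping-pong" microbenchmark sketch (Linux).
 * Two threads are pinned to chosen cores and bounce a cache line
 * back and forth; the round trip costs noticeably more when the
 * cores sit on different chiplets.
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS 1000000

static _Atomic int flag = 0;           /* the cache line the two threads fight over */

static void pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *ponger(void *arg)
{
    pin_to_core(*(int *)arg);
    for (int i = 0; i < ITERS; i++) {
        while (atomic_load_explicit(&flag, memory_order_acquire) != 1)
            ;                          /* wait for ping */
        atomic_store_explicit(&flag, 0, memory_order_release);  /* pong */
    }
    return NULL;
}

int main(void)
{
    int core_a = 0, core_b = 8;        /* assumption: 0 and 8 land on different CCDs */
    pthread_t t;
    struct timespec t0, t1;

    pthread_create(&t, NULL, ponger, &core_b);
    pin_to_core(core_a);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++) {
        atomic_store_explicit(&flag, 1, memory_order_release);  /* ping */
        while (atomic_load_explicit(&flag, memory_order_acquire) != 0)
            ;                          /* wait for pong */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_join(t, NULL);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg round trip: %.1f ns\n", ns / ITERS);
    return 0;
}
```

The gap between the same-chiplet and cross-chiplet numbers is exactly the penalty Question 1 is asking about.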

Question 2: Since the IO Die is unchanged (AMD's wording) and the IO Die is responsible for bus communications, why is Smart Access Memory limited to the Ryzen 5000 series when the Ryzen 3000 series is PCIe 4.0 compliant? Is this an artificial limitation?
Apparently this is just an existing feature (resizable BAR) rebranded by AMD: https://www.reddit.com/r/hardware/comments/jlhg7z/amd_smart_access_memory_what_it_is_and_how_it/


So it's likely just a way to make people upgrade. Zen 2 owners could get a backport; there's nothing technically stopping AMD from enabling the same thing on older parts.
 
Q1...so it seems obvious there's a latency penalty if a process moves between dies. But isn't that mitigated by CPPC and an 'architecture-aware' scheduler now? Even with Matisse...although it's not as effective without the monolithic cache architecture.

Q2...my vote is it's purely artificial, and that's what we'll find out as time moves on. It's already known that PCIe Gen 4 is artificially limited to 500-series chipsets when we know it will work just fine on 400-series (and 300-series) boards. It's merely convenient to tie these technologies to up-market platforms for market-segmentation purposes. We've seen far, far worse from Intel.
 

With pre-emptive OSes, the kernel scheduler determines which core a task goes to when it's swapped back in. There are different algorithms for this; Linux, for example, handles it differently than Windows, and each method has its benefits and costs.

But a single task does NOT have to be tied to a specific core, and will often bounce around (unless it's specifically pinned to a core, which is rare).
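To illustrate the point, here's a throwaway Linux sketch (nothing a real application would normally ship): left to the default policy the thread may wander between cores, and after sched_setaffinity() it stays put. The choice of CPU 2 is arbitrary.

```c
/*
 * Demo: a busy thread may be migrated between cores by the
 * scheduler; after sched_setaffinity() it stays on one core.
 * Compile with: gcc -O2 demo.c
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

static void burn(void) { for (volatile long i = 0; i < 200000000L; i++); }

int main(void)
{
    /* Phase 1: default policy -- the kernel is free to move us around. */
    for (int i = 0; i < 5; i++) {
        burn();
        printf("unpinned, now on CPU %d\n", sched_getcpu());
    }

    /* Phase 2: pin ourselves to CPU 2 (arbitrary choice). */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    for (int i = 0; i < 5; i++) {
        burn();
        printf("pinned,   now on CPU %d\n", sched_getcpu());
    }
    return 0;
}
```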

I'm thinking the best gaming benchmarks (with a few rare exceptions) will without a doubt come out of the 5800X, clock for clock, since all eight of its cores sit on a single chiplet and share one L3.
 
Applications are rarely hard-coded to a specific core, because they can't know ahead of time what kind of system they'll be running on. Strictly speaking, processor affinity is application-programmable (SetThreadAffinityMask on Windows, sched_setaffinity on Linux), but very few applications use it.

However, I'm also certain that OS schedulers are becoming more aware of heterogeneous processing within the CPU itself; that is, they can no longer expect all CPU cores to perform the same. Linux should certainly be aware of this, given that Android has to run on heterogeneous (big.LITTLE) processors. And given that Windows also understands Zen 2 has preferred cores (as well as the Core 0 issue discovered in early 2019), it's also likely that Microsoft made at least some attempt to avoid running threads on cores where locality would be an issue.
 
...it's also likely that Microsoft made at least some attempt to avoid running threads on cores where locality would be an issue.
I'm pretty certain that is the case. HWiNFO shows not only the core performance ranking fused in by AMD but also the CPPC preferred-core order that the scheduler uses. The CPPC order adjusts the core ranking so that the scheduler can keep a thread local to shared resources.
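For the curious on Linux, roughly the same data is visible without HWiNFO, assuming your kernel exposes ACPI CPPC in sysfs (recent kernels on Zen 2/3 systems do). A quick sketch that dumps each core's highest_perf value, which is the ranking the preferred-core logic works from:

```c
/*
 * Sketch: dump the per-core CPPC "highest_perf" ranking on Linux.
 * Assumes the kernel exposes ACPI CPPC under sysfs; higher values
 * mark the cores the scheduler should prefer.
 */
#include <stdio.h>

int main(void)
{
    for (int cpu = 0; ; cpu++) {
        char path[128];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/acpi_cppc/highest_perf", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                    /* no more CPUs (or no CPPC support) */
        int perf;
        if (fscanf(f, "%d", &perf) == 1)
            printf("cpu%-3d highest_perf = %d\n", cpu, perf);
        fclose(f);
    }
    return 0;
}
```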