News AMD Ryzen 5 3500X Rivals Intel Core i5-9400F in Early Listing

I really wish they would write the specs better. Stating 24 PCIe lanes is a bit misleading: it has 24 in total, but 4 are used for the link to the chipset, which leaves 20 lanes. Of those, 16 are dedicated to graphics, with the other 4 typically set aside for NVMe by most motherboard manufacturers. If they want to count it that way, then they should also list the i5 as having 20, since it has 16 for graphics plus 4 for DMI, the link to the chipset.
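Putting that lane math in one place (a rough sketch; the helper function and the choice to count DMI as a 4-lane equivalent are just my own illustration of the point):

```python
# Lane accounting under the convention complained about above.
# The spec-sheet lane counts are as published; everything else is illustrative.

def usable_lanes(total_cpu_lanes: int, chipset_link_lanes: int) -> int:
    """Lanes left for the GPU and NVMe after reserving the chipset link."""
    return total_cpu_lanes - chipset_link_lanes

# Ryzen 5 3500X: 24 CPU lanes, 4 of them reserved for the chipset link.
ryzen_usable = usable_lanes(24, 4)   # 20 (16 for graphics + 4 for NVMe)

# Core i5-9400F: 16 CPU lanes for graphics, plus DMI 3.0 (roughly a PCIe 3.0 x4
# equivalent link) to the chipset, so the same accounting gives 16 + 4.
intel_counted_the_same_way = 16 + 4  # 20

print(ryzen_usable, intel_counted_the_same_way)
```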

I also love the marketing terms like "Gamecache". 32MB seems a bit much and would take up a lot of die space. I doubt it will provide any tangible gaming performance gains, but I guess we will have to wait and see.

Ahhh marketing. It never fails to amuse.
 

InvalidError

Titan
Moderator
I also love the marketing terms like "Gamecache". 32MB seems a bit much and would take up a lot of die space. I doubt it will provide any tangible gaming performance gains, but I guess we will have to wait and see.
The 32MB "gamecache" is the same as on the 3600-3800, and AMD needs it, along with the doubling of the L2 caches, to mitigate the impact of the chiplets having 10-20ns worse memory latency than 2nd-gen Ryzen.
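A back-of-the-envelope way to see how a bigger L3 can offset that extra latency (the hit rates and latencies below are numbers I made up for illustration, not measured figures):

```python
# Average memory access time (AMAT) with a single cache level in front of DRAM.
def amat(l3_hit_rate, l3_latency_ns, dram_latency_ns):
    return l3_hit_rate * l3_latency_ns + (1 - l3_hit_rate) * dram_latency_ns

# Hypothetical 2nd-gen Ryzen-style setup: 16MB L3, DRAM relatively close.
zen_plus_like = amat(l3_hit_rate=0.80, l3_latency_ns=10, dram_latency_ns=70)   # 22.0 ns

# Hypothetical Zen 2-style setup: 32MB L3 catches more misses, DRAM ~15ns further away.
zen2_like = amat(l3_hit_rate=0.88, l3_latency_ns=10, dram_latency_ns=85)       # 19.0 ns

print(zen_plus_like, zen2_like)  # the extra hits can more than offset the added latency
```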
 
The 32MB "gamecache" is the same as on the 3600-3800, and AMD needs it, along with the doubling of the L2 caches, to mitigate the impact of the chiplets having 10-20ns worse memory latency than 2nd-gen Ryzen.

That latency is due to the IMC being on a chiplet instead of the CPU die, correct? So the cache is more of a way to make sure it performs at least on par in that regard.

Still, marketing is great. What a name for it.
 

InvalidError

Titan
Moderator
That latency is due to the IMC being on a chiplet instead of the CPU die, correct? So the cache is more of a way to make sure it performs at least on par in that regard.
Pretty much.

I don't think the IMC moniker still applies to Zen 2 though, as the memory controllers got divorced from the CPU cores and now reside in the on-package north bridge / MCH.
 
Pretty much.

I don't think the IMC moniker still applies to Zen 2 though, as the memory controllers got divorced from the CPU cores and now reside in the on-package north bridge / MCH.

I would agree. Sort of odd they would move the MC to the chiplet. I would think they would want the MC on the CPU die and the rest on the chiplet, as memory latency is always a key issue. However, it might have helped the cores themselves in terms of power draw and overall temperatures, so there was probably a benefit.

I am not sure Intel will ever do that. I can see them pulling I/O off onto an on-package chip, but the MC is always the one that gets affected most by latency.
 

InvalidError

Titan
Moderator
Sort of odd they would move the MC to the chiplet.
Not really that odd. Memory is just one particular type of IO and, just like any other IO, the front-end circuitry does not scale much with process. Also, one major problem with giving each CPU chiplet its own local memory is the heavier performance penalty when cores need to access non-local memory, which is particularly troublesome for consumer software that is generally oblivious to memory layout. Centralizing all memory controllers in the IO die eliminates issues with non-uniform memory access, albeit at the expense of 10-20ns worse memory latency for every core.
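As a quick sketch of why uniform-but-slower can come out ahead for NUMA-oblivious software (the latencies and the local-access fraction are made up purely for the sake of argument):

```python
# Expected memory latency under two hypothetical layouts; all numbers invented.
def expected_latency(local_fraction, local_ns, remote_ns):
    return local_fraction * local_ns + (1 - local_fraction) * remote_ns

# Per-chiplet local memory: fast when local, painful when the data lives on the
# other chiplet. Oblivious software might only be local ~50% of the time.
per_chiplet = expected_latency(0.50, local_ns=70, remote_ns=130)  # 100 ns on average

# Centralized controllers in the IO die: every access pays the same penalty.
centralized = 85  # ns, uniform for every core

print(per_chiplet, centralized)  # uniform-but-slower wins unless locality is very good
```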

Optimizing for one subset of workloads often requires beefing up parts of the architecture in ways that are detrimental to other workloads. For example, accommodating workloads with a larger cache footprint by using larger caches comes at the expense of increased L2/L3 latency, which hurts workloads with a small cache footprint that benefit more from low latency. That makes it impossible to design a chip that is simultaneously superior in every measurable way.

CPU design has always been a game of compromises.
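A toy comparison of that compromise (every number below is invented):

```python
# Neither cache configuration wins both workloads; all numbers invented.
def amat(hit_rate, cache_ns, dram_ns=80):
    return hit_rate * cache_ns + (1 - hit_rate) * dram_ns

# (label, L3 latency, hit rate with a small working set, hit rate with a large working set)
configs = [
    ("small/fast L3", 8, 0.95, 0.60),
    ("large/slow L3", 12, 0.95, 0.85),
]

for label, cache_ns, hit_small, hit_large in configs:
    print(label,
          round(amat(hit_small, cache_ns), 1),   # small-footprint workload
          round(amat(hit_large, cache_ns), 1))   # large-footprint workload

# small/fast L3 -> 11.6 ns and 36.8 ns; large/slow L3 -> 15.4 ns and 22.2 ns
```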
 
Not really that odd. Memory is just one particular type of IO and, just like any other IO, the front-end circuitry does not scale much with process. Also, one major problem with giving each CPU chiplet its own local memory is the heavier performance penalty when cores need to access non-local memory, which is particularly troublesome for consumer software that is generally oblivious to memory layout. Centralizing all memory controllers in the IO die eliminates issues with non-uniform memory access, albeit at the expense of 10-20ns worse memory latency for every core.

Optimizing for one subset of workloads often requires beefing up parts of the architecture in ways that are detrimental to other workloads. For example, accommodating workloads with a larger cache footprint by using larger caches comes at the expense of increased L2/L3 latency, which hurts workloads with a small cache footprint that benefit more from low latency. That makes it impossible to design a chip that is simultaneously superior in every measurable way.

CPU design has always been a game of compromises.

What would you say to stacked or on-die memory?
 

InvalidError

Titan
Moderator
What would you say to stacked or on-die memory?
On-die DRAM is generally a no-go since the low-leakage-current process tech required to make DRAM is not compatible with high-speed logic. You can make logic on a DRAM process if you don't mind much lower clock frequencies.

If you stack memory on CPU chiplets to use as local RAM and have more than one such stack in a CPU, you will run into the same NUMA issues as before if the OS isn't making sure to keep most data dependencies for processes running on a given core in the appropriate chiplet's RAM address space. The overall latencies would be lower, though, thanks to the direct connection between each chiplet's IMC and its stacked DRAM.
 

MasterMadBones

Distinguished
If you stack memory on CPU chiplets to use as local RAM and have more than one such stack in a CPU, you will run into the same NUMA issues as before if the OS isn't making sure to keep most data dependencies for processes running on a given core in the appropriate chiplet's RAM address space. The overall latencies would be lower, though, thanks to the direct connection between each chiplet's IMC and its stacked DRAM.

That's true, but to me it doesn't sound entirely unrealistic for AMD to make an on-package L4 cache somewhere in the future. That would be tied directly to the IO die, not to any of the CCDs. We've seen L4 caches before and they were not so effective, but as a method to bridge the large gap between the different L3 slices and DRAM, a single 2-8Gb DRAM die could prove very useful.
 

InvalidError

Titan
Moderator
That's true, but to me it doesn't sound entirely unrealistic for AMD to make an on-package L4 cache somewhere in the future.
The main problem with DRAM as an L4 cache is that DRAM of any type, regardless of the external interface, still has a ~10ns latency of its own from CAS to first data read, still has hefty penalties from having to write the row data register back into the DRAM cells whenever you want to address a different memory row (since reading DRAM cells is destructive), and still requires periodic refresh of every row, etc. Another issue is that having an extra tier of tag-RAM and cache for memory requests to filter through will increase total latency for everything that ultimately winds up in the memory controller queues.

Intel's 64-128MB eDRAM on Haswell-Skylake chips with Iris Pro IGP did wonders for IGP performance and for workloads with core datasets larger than the L3 yet smaller than the L4. It hurt most other things, though, so Intel scrapped it with the move to DDR4.
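Putting rough numbers on both of those outcomes, the win for working sets between L3 and L4 and the penalty for everything else (every latency and hit rate below is invented):

```python
# Toy three-tier model: L3 -> DRAM L4 -> DRAM. All numbers invented.
def amat_with_l4(l3_hit, l3_ns, l4_hit, l4_ns, l4_tag_ns, dram_ns):
    l3_miss = 1 - l3_hit
    hit_path = l4_ns                  # L4 tag hit: data comes from the DRAM cache
    miss_path = l4_tag_ns + dram_ns   # L4 tag miss: the lookup is wasted, then off to DRAM
    return l3_hit * l3_ns + l3_miss * (l4_hit * hit_path + (1 - l4_hit) * miss_path)

def amat_no_l4(l3_hit, l3_ns, dram_ns):
    return l3_hit * l3_ns + (1 - l3_hit) * dram_ns

baseline = amat_no_l4(0.70, 10, 80)  # 31.0 ns

# Working set that fits in the L4 but not the L3: most L3 misses are caught -> big win.
mid_sized = amat_with_l4(0.70, 10, l4_hit=0.90, l4_ns=45, l4_tag_ns=10, dram_ns=80)  # ~21.9 ns

# Huge/streaming working set: the L4 rarely hits, every miss pays the extra lookup -> slightly worse.
streaming = amat_with_l4(0.70, 10, l4_hit=0.10, l4_ns=45, l4_tag_ns=10, dram_ns=80)  # ~32.7 ns

print(baseline, mid_sized, streaming)
```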

The more likely outcome for embedded memory is that we'll have CPUs with 16+GB of embedded memory to run the system on and optional external DRAM/NVDIMM as a very-high-speed (50+GB/s) swapfile.