Question: Is CPU cache still relevant?

Roclemir

Honorable
So, with the advent of DDR5 we find ourselves in a position where RAM is getting faster than the CPU itself. I am seeing DDR5 sticks on sale rocking out to the tune of 6000 MHz. Given that CPUs are only cranking boost speeds of 5.7 GHz, is cache actually still relevant, since the CPU can get the data from the RAM in real time?

Perhaps my understanding of cache is wrong. Here is my (simplistic) understanding:
  • Level 1 cache matches the CPU speed so it can supply data at the rate the CPU consumes it. Historically, making volatile memory this fast was the most expensive, which is why it was always the tiniest. To combat the small amount of L1 cache, L2 cache was invented.
  • Level 2 cache was not as fast as L1 but was bigger than L1 and faster than RAM. So larger amounts of data are pulled into L2 cache, ready for L1 to take when needed.
  • Level 3 cache was brought in with multi-core CPUs, as the amount of data required for all those cores to process kept increasing. L1 remained attached to the core (there were multiple L1 caches, one per core). L2 was originally shared by all the L1 caches. Then processing requirements increased with the invention of SMT; now L1 AND L2 caches are per-core, and L3 is shared by all the L2 caches.
Now we have RAM that is as fast as, if not faster than, the CPU. For example, the Ryzen 9 7900X boasts a speed of 4.6 GHz, boosting to 5.7 GHz. RAM and motherboards are being made capable of handling 5.2 GHz, 5.4 GHz, even 6 GHz. If the RAM is as fast as or faster than the CPU, why not apportion a section of RAM for direct access by the CPU? No cache needed.

Interested in your thoughts TH.

EDIT: For the tech savvy, you can pretty much swap the word "data" with "processing instruction(s)". You can also assume I understand the concept of pre-fetching.
 
Two x4 PCIe 4.0 NVMe drives in RAID0 can move 15.75 GB/s, which is faster than a single channel of DDR3-1866. Would you rather use DDR3 or page to such a RAID0 array? Did RAID0 make RAM obsolete?

The reality is that a CPU at such a speed will be waiting hundreds of clocks for any access to main memory. DDR5-6000 actually runs at an internal clock rate of 750 MHz, so even ignoring latency you are looking at a CPU that effectively runs at 750 MHz instead of 5.7 GHz. Then add the 14 ns CAS delay for the number of CPU clocks wasted waiting for the first word (this is what prefetching into the cache attempts to paper over).

DDR5 has a minimum burst of 8 words, so it is essentially like a RAID0 of four DDR1 banks (750 x 4 = 3000 internally, doubled by DDR for an effective SDR equivalent of "6000"). And DDR5-6000 at CAS 42 is 14 ns, which is more latency than DDR1-400 at CAS 2, which is only 10 ns (both are the fastest JEDEC standard latencies). Sure, the 8th word from such DDR5 arrives twice as quickly as on DDR1, but that first-word delay is a performance killer.
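To put rough numbers on that first-word delay, here is a back-of-envelope sketch using the CAS figures above (14 ns is derived from CAS cycles over the I/O clock; the 5.7 GHz CPU clock is the boost figure quoted earlier):

```python
# First-word (CAS) latency in nanoseconds, from CAS cycles and DDR rate.
# DDR transfers twice per I/O clock, so the I/O clock is half the MT/s rating.
def cas_ns(cas_cycles, transfer_rate_mt):
    io_clock_mhz = transfer_rate_mt / 2
    return cas_cycles * 1000 / io_clock_mhz  # cycles / MHz -> ns

ddr5_ns = cas_ns(42, 6000)  # DDR5-6000 CAS 42 -> 14.0 ns
ddr1_ns = cas_ns(2, 400)    # DDR1-400  CAS 2  -> 10.0 ns

cpu_ghz = 5.7
stall_clocks = round(ddr5_ns * cpu_ghz)  # CPU clocks spent waiting
print(ddr5_ns, ddr1_ns, stall_clocks)    # 14.0 10.0 80
```

So a 5.7 GHz core burns roughly 80 clocks just waiting for the first word, before the burst even starts streaming.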

--fixed math
 

KyaraM

Admirable
And even without the excellent technical explanation above, just comparing AMD's 3D V-Cache chips against their non-3D counterparts should give you all the answer you need. The lowered clock speed might hurt productivity workloads as a whole, sure, but you can't really deny that the difference between the two chips is huge when cache is a factor, primarily but not only in gaming. So yes, even if it were only for gaming (which, btw, it isn't), that alone would make it worth keeping cache around, since gaming is the main use in the consumer CPU market.
 

kognak

Commendable
So, with the advent of DDR5 we find ourselves in a position where RAM is getting faster than the CPU itself. I am seeing DDR5 sticks on sale rocking out to the tune of 6000 MHz. Given that CPUs are only cranking boost speeds of 5.7 GHz, is cache actually still relevant, since the CPU can get the data from the RAM in real time?

Now we have RAM that is as fast as, if not faster than, the CPU. For example, the Ryzen 9 7900X boasts a speed of 4.6 GHz, boosting to 5.7 GHz. RAM and motherboards are being made capable of handling 5.2 GHz, 5.4 GHz, even 6 GHz. If the RAM is as fast as or faster than the CPU, why not apportion a section of RAM for direct access by the CPU? No cache needed.
You forget that it takes hundreds of cycles to get data from system memory (and the clock speed is half the DDR number). Just because the clock speeds look similar on the surface, it doesn't mean the two operate in sync. Meanwhile, L3 latency is around 40 to 50 cycles. In performance metrics, the latency gap is 5-6x, and in bandwidth it's more than 10x. Cache is that much faster. Removing cache would completely demolish CPU performance. But it's expensive, and not all data is time-critical for program execution, so it has to be used sparingly.
The big benefit of very-large-cache CPUs is that RAM performance becomes largely irrelevant, so there is very little point investing in more expensive, high-quality RAM. On desktop that mostly means gaming workloads.
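As a quick illustration of that latency gap, the arithmetic can be sketched like this (the cycle counts are assumed round numbers in line with the 40-50 cycle L3 figure above, not measurements; real values vary by CPU and RAM kit):

```python
# Illustrative latencies in CPU cycles: L3 cache vs a full trip to DRAM.
# Assumed ballpark figures, chosen to match the 5-6x gap described above.
l3_cycles = 45     # typical L3 hit, per the post
dram_cycles = 250  # hundreds of cycles for a main-memory access

gap = dram_cycles / l3_cycles
print(round(gap, 1))  # 5.6 -- inside the quoted 5-6x latency range
```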
 
Those AMD chips use quick SRAM for their extra L3 V-Cache, with a latency of 47 clocks on the 5800X3D. What's impressive is that the extra cache doesn't actually have to be that much quicker than main memory to make a huge difference. The last time Intel did this, with Broadwell, they added 128MB of cheap eDRAM as an L4 instead of using SRAM, so its latency wasn't all that much lower, at about 75% of main DDR3-1600 memory (roughly 150 CPU clocks instead of 200).

Even 5 years later it remained competitive in games with chips five generations newer, featuring twice the cores, running over 1 GHz faster, on twice as much DDR4-2933, which has nearly the same bandwidth as Broadwell's cache.

Of course, for streaming applications insensitive to latency (unzipping, encoding, rendering) it didn't keep up, due to its extremely low clock speed (it was even 700 MHz slower than the previous 4790K in those same charts) and RAM bandwidth. But I think if you want a gaming machine that lasts well into the future, getting a chip with extra cache can make more sense than upgrading platforms twice as often.
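For the bandwidth comparison, a small sketch: a 64-bit DDR channel moves 8 bytes per transfer, so dual-channel DDR4-2933 lands near the ~50 GB/s figure commonly quoted for Broadwell's eDRAM (the eDRAM number is an assumption from memory, not a spec citation):

```python
# Peak bandwidth of dual-channel DDR memory, in GB/s.
# Each 64-bit channel moves 8 bytes per transfer.
def ddr_peak_gbs(transfers_mt, channels=2, bus_bytes=8):
    return transfers_mt * channels * bus_bytes / 1000

ddr4 = ddr_peak_gbs(2933)  # dual-channel DDR4-2933
print(round(ddr4, 1))      # 46.9 -- close to eDRAM's assumed ~50 GB/s
```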
 

Roclemir

Honorable
You forget that it takes hundreds of cycles to get data from system memory (and the clock speed is half the DDR number). Just because the clock speeds look similar on the surface, it doesn't mean the two operate in sync. Meanwhile, L3 latency is around 40 to 50 cycles. In performance metrics, the latency gap is 5-6x, and in bandwidth it's more than 10x. Cache is that much faster. Removing cache would completely demolish CPU performance. But it's expensive, and not all data is time-critical for program execution, so it has to be used sparingly.
The big benefit of very-large-cache CPUs is that RAM performance becomes largely irrelevant, so there is very little point investing in more expensive, high-quality RAM. On desktop that mostly means gaming workloads.


First, thank you for responding. I appreciate it.
OK, I get 100% of what you are saying.
Cache has the benefit of being on-die, or at least very close to it.
What if the motherboard ran a similar architecture between the CPU and RAM? We know that electricity does not have latency, so calls to RAM could be processed in real time.

I really think what's holding us back here is the way we currently conceive of cache as the next-to-CPU memory.

If we intentionally designed a motherboard that allowed the CPU to access the memory directly, we would not need CPU cache with the new DDR5 memory sticks.

Also: I think I know the reason it takes hundreds of cycles to get info from RAM: it has to go through the DMC. If all the other intermediaries were eliminated, surely the RAM could serve instead of the cache.
 

Roclemir

Honorable
Thank you for your response. I appreciate it.

I think, though, we need to compare apples with apples.
I am talking about today's RAM speeds and today's CPU speeds.
I 100% agree with everything you said for back when CPU speeds were outstripping RAM.
Now RAM is faster than CPUs.

Should we rethink how we're using cache and how motherboards give CPUs access to RAM?
A little rethink would mean CPUs could have direct access to a portion of RAM that delivers at the same speed they need it, just like level 1 cache.
 
Now RAM is faster than CPUs.
Clock speed isn't really relevant here. The only things that matter are the bandwidth and/or latency between the CPU and RAM. You could achieve more or less the same bandwidth by, say, doubling the bus width and halving the clock speed. Not to mention DDR SDRAM operates at half the number in the DDR rating, so a DDR5-6000 module is only clocked at 3000 MHz. Things get a little more confusing with, say, GDDR RAM: GDDR6 clocks at an eighth of its transfer speed by using multiple clock sources, just phase-shifted.
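The width-vs-clock point can be sketched with peak figures only (this ignores latency entirely; the 128-bit bus is a hypothetical for illustration):

```python
# Peak bandwidth is bus width x transfer rate; clock alone tells you little.
def peak_gbs(bus_bits, transfers_mt):
    return bus_bits / 8 * transfers_mt / 1000  # bytes/transfer x MT/s -> GB/s

narrow_fast = peak_gbs(64, 6000)  # one 64-bit DDR5-6000 channel
wide_slow = peak_gbs(128, 3000)   # hypothetical: double width, half rate
print(narrow_fast, wide_slow)     # 48.0 48.0 -- identical peak bandwidth
```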

Should we rethink how we're using cache and how motherboards give CPUs access to RAM?
A little rethink would mean CPUs could have direct access to a portion of RAM that delivers at the same speed they need it, just like level 1 cache.
The problem with main memory is twofold:
  • It's using DRAM. DRAM requires refreshing, which cuts into the time you could otherwise use to access it. We could switch to SRAM, which doesn't require refreshing, but then bit density drops by at least a factor of five, because it takes more transistors to make an SRAM cell.
  • Locality. It's always going to take longer to access something outside the CPU than inside it. Part of that is physical distance, but another part is the mechanisms needed for data to cross clock-frequency domains.
 
Again, the DRAM chips in DDR5-6000 operate at 750 MHz in parallel banks. DRAM latencies have been regressing and are now worse than they were 20 years ago. It's the only performance metric of PCs that hasn't improved, and we should be thankful that process shrinks have made cache cheap enough to just throw at the problem to hide it.

I don't know how CPUs could have more direct access to DRAM, since the memory controllers were integrated into CPUs in 2003 for AMD and 2008 for Intel.
 

kognak

Commendable
Thank you for your response. I appreciate it.

I think, though, we need to compare apples with apples.
I am talking about today's RAM speeds and today's CPU speeds.
I 100% agree with everything you said for back when CPU speeds were outstripping RAM.
Now RAM is faster than CPUs.

Should we rethink how we're using cache and how motherboards give CPUs access to RAM?
A little rethink would mean CPUs could have direct access to a portion of RAM that delivers at the same speed they need it, just like level 1 cache.
Nope. RAM is still slow af. It's slow because in a single cell array it's possible to read or write only one bit at a time. The cells are extremely simple, just a transistor and a capacitor, laid out in arrays for high density (as said above), but the drawback is performance. SRAM cells are more complex, but they are individually addressable. This makes SRAM roughly 10-fold faster, but the cells take a lot more space, which is expensive. If you made an 8GB SRAM memory module, it would have a price tag of around $10k. AMD's V-Cache chip is just 64MB of SRAM and it's 41mm². A Zen 3 chiplet is twice as big. For 8GB you would need as much 7nm silicon as 64 of the 8-core Ryzen 5000 chiplets. This is the fundamental reason why one type is used as system memory and the other is used as cache.
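Scaling up those die-size numbers shows why nobody builds main memory out of SRAM (the per-die figures are taken from the post above; treat them as rough):

```python
# 64 MB of SRAM (the V-Cache die) is 41 mm^2 of 7nm silicon, per the post.
vcache_mb, vcache_mm2 = 64, 41
target_mb = 8 * 1024                # an 8 GB module

dies = target_mb // vcache_mb       # V-Cache-sized dies needed
total_mm2 = dies * vcache_mm2       # total silicon area
zen3_mm2 = 2 * vcache_mm2           # a Zen 3 chiplet is "twice as big"
print(dies, total_mm2, total_mm2 // zen3_mm2)  # 128 5248 64
```

That is 128 cache dies, or the silicon of 64 full Zen 3 chiplets, for a single 8 GB stick.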
 