News AMD Shows New 3D V-Cache Ryzen Chiplets, up to 192MB of L3 Cache Per Chip, 15% Gaming Improvement

Status
Not open for further replies.
SRAM may be technically simpler to make but it has ~1/6th the density of DRAM for single-ported SRAM. However, it is common for SRAM to have 2-4 total read+write ports, so you have to duplicate the entire address decoding and data RW matrix a couple of times, making the density much worse.
Multi-ported SRAM cells are very expensive in terms of area. Roughly speaking, the area of an SRAM cell grows proportionally to the square of the number of ports. Multi-ported SRAM is used for register files but not for cache. In general, multi-ported caches are implemented by splitting the cache into banks so that multiple concurrent accesses can be performed, provided they access different banks. To keep density good, the SRAM cells themselves are single-ported.
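To make the banking point concrete, here is a minimal Python sketch. The line size, bank count, and address-to-bank interleaving are assumptions picked for illustration, not figures for any real Ryzen cache, and the area function just applies the square-of-ports rule of thumb from the post above.

```python
from collections import defaultdict

LINE_BYTES = 64   # assumed cache line size
NUM_BANKS = 4     # assumed number of single-ported banks


def bank_of(address: int) -> int:
    """Pick a bank from the low-order line-address bits (assumed interleaving)."""
    return (address // LINE_BYTES) % NUM_BANKS


def cycles_needed(addresses: list[int]) -> int:
    """Cycles to serve a group of simultaneous accesses when every bank
    has a single port and so can serve only one access per cycle."""
    per_bank = defaultdict(int)
    for addr in addresses:
        per_bank[bank_of(addr)] += 1
    # The busiest bank sets the pace; an empty request list takes 0 cycles.
    return max(per_bank.values(), default=0)


def relative_cell_area(ports: int) -> int:
    """Rough cell area relative to a single-ported cell, using the
    square-of-ports rule of thumb (an approximation, not a layout result)."""
    return ports ** 2
```

With these assumed parameters, four accesses that land in four different banks complete together, while two accesses that map to the same bank have to be serialized. That conflict penalty is the price paid for keeping the cells single-ported.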
 
Looks like you have no clue what tagRAM actually does and how it is completely different from SRAM. tagRAM is content-addressable memory that stores the base address bits for a given cache line and continuously compares them against every address presented to the cache to match addresses with cache lines. That is billions of comparisons per second and uses considerable active power.
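A rough Python sketch of the tag-matching step described above. It assumes a set-associative organization where only the ways of one set are compared per lookup; the line size, set count, way count, and the `TagStore` class itself are hypothetical choices for illustration, not AMD's actual design. Hardware would do the per-way comparisons in parallel; the loop here only stands in for that.

```python
LINE_BYTES = 64    # assumed cache line size
NUM_SETS = 1024    # assumed number of sets
WAYS = 16          # assumed associativity


def split_address(address: int) -> tuple[int, int]:
    """Split an address into (tag, set index), dropping the line offset."""
    line = address // LINE_BYTES
    return line // NUM_SETS, line % NUM_SETS


class TagStore:
    """Hypothetical tag array: one stored tag per way per set."""

    def __init__(self) -> None:
        self.tags = [[None] * WAYS for _ in range(NUM_SETS)]

    def fill(self, address: int, way: int) -> None:
        """Record that the line containing `address` now lives in `way`."""
        tag, index = split_address(address)
        self.tags[index][way] = tag

    def lookup(self, address: int):
        """Return the matching way on a hit, or None on a miss."""
        tag, index = split_address(address)
        # One comparison per way; real hardware does these simultaneously.
        for way, stored in enumerate(self.tags[index]):
            if stored == tag:
                return way
        return None
```

Every lookup costs `WAYS` tag comparisons whether it hits or misses, which is where the active power in the argument above comes from.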

As opposed to what? Passive power!

Do you care to put an order-of-magnitude number to your words: O(0.1), O(1), O(10), or O(100), where the number in parentheses is in watts? Oh, and a citation or three as to how much power the current Ryzen 64MB L3 uses for tagRAM and for the L3 cache in total. Otherwise whatever you say here is mere speculation at this point in time.

Lookup tables likely use somewhere between O(1) and O(10) but very likely much closer to O(1) if I were to (S)WAG a number.
 
As opposed to what? Passive power!
Active power = power used to actively do something, e.g. read/write, which is equal to (activity%) x (effective switching CMOS capacitance of the circuit) x frequency x (voltage squared) / 2
"Passive" (leakage) power = power that gets used even when there is no activity whatsoever.
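The active-power formula in the post above can be plugged into a few lines of Python. This is only a sketch of that formula as written; the function name and any input values you try are assumptions for illustration, not measurements of any real cache.

```python
def active_power_watts(activity: float,
                       capacitance_farads: float,
                       frequency_hz: float,
                       voltage: float) -> float:
    """Active (switching) power per the formula quoted above:
    activity x effective switched capacitance x frequency x V^2 / 2.

    `activity` is the fraction of the circuit that switches per cycle
    (0.0 to 1.0). Leakage is not included; it is a separate term.
    """
    return activity * capacitance_farads * frequency_hz * voltage ** 2 / 2
```

For instance, with an assumed 10% activity, 1 nF of effective switched capacitance, 4 GHz, and 1.0 V, the formula gives roughly 0.2 W. Because voltage enters squared, raising it from 1.0 V to 1.2 V increases active power by about 44% with everything else held equal.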

For large caches, leakage power is a significant problem too. In fact, most of the results on cache power I have found are primarily concerned with leakage, which accounts for upward of 60% of power in caches of 4MB and up.
 
What's very curious and IMPORTANT to note here is they locked the processor at 4GHz. We all know the 5900X can run at considerably higher speeds here.

10:1 They are thermal throttling. Note the lack of thermals.

Call me crazy, but eventually sockets will have cooling from below and above working in a compression fashion, where the heat sinks (front and back) pull against each other.

View: https://www.youtube.com/watch?v=16s1s0B3eog

Ian Cutress asked AMD. They said the cache is not sitting on top of the cores, so thermals shouldn't be an issue.

Cooling has been an issue for a long time. It is only getting worse. I think someone was even looking into having small watercooled pipes running inside the silicon.
 
This has been a thing for over a decade. I don't know what the hold up is other than finding a way to implement this in an economical fashion.
The curse of something like 99% of Earth-shattering new tech: not economically viable, at least with currently available technology and materials. We know how to make logic out of graphene, except that doing so requires atomic precision in knocking carbon atoms out of a perfect graphene sheet and replacing them with phosphorus or another dopant, and you need to avoid isotopes that may decay into something else, since those could make the circuit fail. The science is there; the technology to make it happen outside the labs isn't.
 
I think someone was even looking into having small watercooled pipes running inside the silicon.

IBM looked into liquid cooling/power delivery. I think nothing came of it because it would mean a complete retooling of all the fabs.

Functional Electronic Packaging and Redox Flow Power were the two proposed solutions. First one was also a foray into optical computing.
 
I would imagine that, as significant as the 15% CPU improvement from this technology is for pure compute, it could create a much higher relative performance improvement in APUs in many use cases. Current-gen AMD Vega APUs don't support dedicated RAM like Intel(/AMD) Kaby Lake-G did, but moving all that video traffic off the main bus should improve both system and graphics performance by giving each core component a non-shared path to memory. I'd imagine the through-silicon vias could offer enough connectivity even for HBM2 connections (if HBM2 were economically feasible for APU-level graphics), and certainly enough for embedding DDR6 to run the display, if not a minimum of 2-4GB for game assets if you could squeeze that in.
Or, as part of Navi moving to APUs, it might instead be implemented as a modest Infinity Cache (even the RX 6600 is reportedly getting 32MB), reducing memory accesses and perhaps even improving power/performance, either by enabling effectively faster access or by allowing slower, lower-power RAM at the same effective rate.
 