News DDR5 Specification Released: Fast RAM With Built-In Voltage Regulators

A single DIMM now goes up to 4TB with DDR5 LRDIMM die-stacking? I wonder if Intel will still have an itch to charge extra for memory support beyond 1.5TB :)


Right, I was thinking the exact same thing. A few years ago, back in the DDR3 days, I saw some servers with around 1.5 TB of RAM. The whole chassis had RAM daughter boards ALL OVER THE PLACE! There were dozens and dozens of sticks of RAM. Now two to three times that much can fit on a single stick.
 
A few minutes ago the article said "Drve platforms"; now it reads "Server platforms".
It makes me wonder: what are "Drve platforms"?
And what other articles have been modified in the meantime... just like in "1984".
 
Currently, with DDR4, using 2 rows, double-sided, and 2 GiB per RAM package, they can only fit 64 GiB per DIMM.

You can buy 64 GiB DIMM Modules right now:
https://www.newegg.com/p/pl?N=100007611 601349177 601275379&Order=1

If they go with 1x-height DIMM specs, 2 rows, double-sided, and 8 GiB per RAM package, they should be able to fit 256 GiB per DIMM.

Remember those specialty "Double-Height" DIMMs?
https://www.gamersnexus.net/guides/3462-zadak-32gb-3200mhz-double-capacity-dimm

If memory manufacturers go with 2x-height DIMM specs, 4 rows, double-sided, and 8 GiB per RAM package, they should be able to fit 512 GiB per DIMM.

But I would bet that would be limited to Enterprise setups at best. Only they can truly benefit from that much RAM.
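The capacity arithmetic above can be checked with a few lines of Python. This is a minimal sketch assuming 8 DRAM packages per row per side (x8 chips on a 64-bit data path, ECC chips not counted); the function and layout labels are hypothetical, and real DIMM layouts vary.

```python
# Assumption: 8 x8 DRAM packages per row per side (64-bit data path, no ECC chips)
PACKAGES_PER_ROW = 8

def dimm_capacity_gib(rows_per_side: int, sides: int, gib_per_package: int) -> int:
    """Capacity of one DIMM from its physical package layout (hypothetical helper)."""
    return rows_per_side * sides * PACKAGES_PER_ROW * gib_per_package

layouts = {
    "DDR4 today: 2 rows, double-sided, 2 GiB packages": (2, 2, 2),
    "1x height: 2 rows, double-sided, 8 GiB packages": (2, 2, 8),
    "2x height: 4 rows, double-sided, 8 GiB packages": (4, 2, 8),
}
for label, layout in layouts.items():
    print(f"{label}: {dimm_capacity_gib(*layout)} GiB")
```

Under that assumption the three layouts come out to 64, 256 and 512 GiB, matching the figures in the post.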
 
I know this is more a corner(ish) case, but I'm wondering how AMD's APU graphics performance will be affected by this doubling of bandwidth.
DDR5-6400 will likely carry an eye-watering price tag for a while and not make much sense for a budget build. Performance-wise, it'll depend on whether AMD decides to scale up the IGP size to match. DDR5-4800 will likely be the mainstream speed for a while, so I could imagine AMD increasing the shader unit count by 50% on top of Navi IPC gains, which would translate to 60-70% net IGP performance gain.
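The 60-70% estimate compounds two factors: a 50% larger shader array multiplied by an IPC gain. A minimal sketch, assuming performance scales linearly with shader count (optimistic, since the bigger IGP also has to be fed with enough memory bandwidth) and using hypothetical IPC gains of 7-13%, which are illustrative values, not measured Navi figures:

```python
def net_gain(shader_scale: float, ipc_gain: float) -> float:
    """Combined relative gain, assuming linear scaling with shader count
    and an IPC gain that multiplies on top of it."""
    return shader_scale * (1.0 + ipc_gain) - 1.0

# Hypothetical IPC gains, not measured numbers
for ipc in (0.07, 0.10, 0.13):
    print(f"IPC +{ipc:.0%}: net IGP gain +{net_gain(1.5, ipc):.1%}")
```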
 
Could be, but previous benchmarks proved that past a certain threshold, iGPU performance didn't scale as well with RAM speed, regardless of the number of units (that was with the 2x00G-3x00G IGPs), which makes me think that AMD would have to reduce latency between the APU and the RAM for these new throughputs to really show an improvement. They may already be there with Renoir; we'll have to wait a few more months to be sure.
 
previous benchmarks proved that past a certain threshold, iGPU performance didn't scale as well with RAM speed
You can't scale RAM performance beyond the point where you have ~100% core utilization, and IGPs are under-powered by design because they have to leave a fair chunk of memory bandwidth for the CPU. Latency shouldn't be a major concern for GPUs since they have multiple mechanisms to hide it. However, with IGPs, you cannot test the IGP independently from the CPU, and the CPU is certainly far more latency-sensitive, enough so in most cases to explain away any IGP benchmark differences. You do get nearly perfect IGP performance scaling from single-channel to dual-channel and with memory clocks across the budget-friendly range, which clearly indicates that IGPs are typically heavily constrained by memory bandwidth.

(Well, it does vary quite a bit depending on the particular benchmark.)
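For a sense of the numbers behind the bandwidth argument, theoretical peak bandwidth is roughly transfer rate × bus width × channels. A minimal sketch, assuming a 64-bit (8-byte) data path per DIMM channel (a DDR5 DIMM splits this into two 32-bit subchannels, but the total width is the same) and ignoring refresh, bank conflicts and command overhead, so real-world figures will be lower; the configurations listed are just examples:

```python
# Assumption: 64-bit data path per DIMM channel, ECC bits excluded
BYTES_PER_TRANSFER = 8

def peak_bandwidth_gbs(mt_per_s: int, channels: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal units)."""
    return mt_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9

configs = [
    ("DDR4-2133", 2133, 1),
    ("DDR4-2133", 2133, 2),
    ("DDR4-3200", 3200, 2),
    ("DDR5-4800", 4800, 2),
    ("DDR5-6400", 6400, 2),
]
for name, rate, channels in configs:
    print(f"{name} x{channels}: {peak_bandwidth_gbs(rate, channels):.1f} GB/s")
```

Going from single to dual channel, or from DDR4-3200 to DDR5-6400, doubles the theoretical peak; how much of that an IGP can actually use is what the rest of the thread debates.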
 
Here we go again, more security holes from new hardware design.
I doubt there is any practical way of exploiting "rowhammer" type vulnerabilities since a successful exploit would require:
1- knowing the physical memory address of the memory block you want to attack
2- somehow managing to get a memory allocation for the physical memory rows immediately next to it
3- nuking the system's performance with cache evictions to prevent refresh from occurring normally
4- not accidentally corrupting data with under-refresh of the target memory bank and crashing the system before the attack succeeds
 
You can't scale RAM performance beyond the point where you have ~100% core utilization, and IGPs are under-powered by design because they have to leave a fair chunk of memory bandwidth for the CPU. Latency shouldn't be a major concern for GPUs since they have multiple mechanisms to hide it. However, with IGPs, you cannot test the IGP independently from the CPU, and the CPU is certainly far more latency-sensitive, enough so in most cases to explain away any IGP benchmark differences. You do get nearly perfect IGP performance scaling from single-channel to dual-channel and with memory clocks across the budget-friendly range, which clearly indicates that IGPs are typically heavily constrained by memory bandwidth.
I mean past a certain point. Going dual-channel gives a huge boost (45%), and going from 2133 to 3200 gives yet another (20%)... But going from 3200 to 3600 gives much less, in the 5% range or so, when it should be around 10%. Now I would agree with you if we got different percentages between a 3200G and a 3400G, but we don't. Something else is bottlenecking these IGPs, and I wouldn't be surprised to learn that latency became the biggest bottleneck (as it stays quite constant regardless of RAM throughput).
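One rough way to compare the steps mentioned above is to divide each performance gain by the corresponding increase in transfer rate. A minimal sketch using the approximate percentages quoted in the post (they are ballpark figures, so the output is only indicative); `scaling_efficiency` is a hypothetical helper:

```python
def scaling_efficiency(rate_from: int, rate_to: int, perf_gain: float) -> float:
    """Performance gain divided by transfer-rate gain (1.0 = perfect 1:1 scaling)."""
    rate_gain = rate_to / rate_from - 1.0
    return perf_gain / rate_gain

# Approximate gains quoted in the post, as fractions (0.20 == 20%)
steps = [(2133, 3200, 0.20), (3200, 3600, 0.05)]
for slow, fast, gain in steps:
    eff = scaling_efficiency(slow, fast, gain)
    print(f"DDR4-{slow} -> DDR4-{fast}: rate +{fast / slow - 1:.1%}, "
          f"perf +{gain:.0%}, efficiency {eff:.2f}")
```

A value of 1.0 would mean performance tracks memory speed one-to-one; comparing the values across steps shows whether returns shrink relative to the bandwidth added.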
 
Now I would agree with you if we got different percentages between a 3200G and a 3400G, but we don't.
From the looks of the curves over here: https://www.anandtech.com/show/12621/memory-scaling-zen-vega-apu-2200g-2400g-ryzen/3
The 2200G/2400G IGPs see their single largest bumps from 2133 to 2400, and that bump is almost exactly in line with Vega 11 being ~35% bigger than Vega 8. Basically, it looks like the Vega 11 IGP was designed for DDR4-2400, so you get rapidly diminishing returns from going faster since the IGP wasn't designed to make efficient and effective use of more memory bandwidth than that.

(Well, it does vary considerably between benchmarks, with some scaling quite well to 3200+ MT/s while others don't.)
 
Same as CPUs: you get a massive performance hit if your algorithm has wildly scattered dependencies that keep invalidating caches, or is organized in chunks larger than what the caches can hold.
Considering the way graphics APIs are made, I wouldn't be surprised if this problem was more prevalent with APU/GPU than with CPU, and GPUs mitigate it with very low latency access to their VRAM.
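To illustrate the first point, here is a toy cache model: a fully associative LRU cache with 128 lines of 64 bytes (8 KiB; both sizes are assumptions, real CPU and GPU caches are larger and set-associative). It counts misses for the same 256 KiB array read row by row versus column by column through a 256x256 row-major layout:

```python
from collections import OrderedDict

LINE_BYTES = 64    # assumed cache-line size
CACHE_LINES = 128  # assumed capacity: 128 x 64 B = 8 KiB
ELEM_BYTES = 4     # 4-byte elements (e.g. 32-bit floats)
SIDE = 256         # 256 x 256 elements = 256 KiB of data

def count_misses(addresses, capacity_lines):
    """Count misses in a fully associative cache with LRU replacement."""
    cache = OrderedDict()  # keys are line numbers, ordered oldest -> newest use
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)        # hit: mark most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict least recently used line
    return misses

# Row-major array: element (r, c) lives at byte offset (r * SIDE + c) * ELEM_BYTES
row_walk = [(r * SIDE + c) * ELEM_BYTES for r in range(SIDE) for c in range(SIDE)]
col_walk = [(r * SIDE + c) * ELEM_BYTES for c in range(SIDE) for r in range(SIDE)]

print("row-by-row misses:", count_misses(row_walk, CACHE_LINES))
print("column-by-column misses:", count_misses(col_walk, CACHE_LINES))
```

The row walk misses once per 64-byte line (4,096 misses), while each column touches 256 distinct lines, more than the cache holds, so by the time the walk returns to a line it has been evicted and every one of the 65,536 accesses misses.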
 
GPUs mitigate it with very low latency access to their VRAM.
DRAM manufacturers don't seem to be putting latency specs (and most other timing information for that matter) in public datasheets anymore. From memory, GDDRx had 10-15% worse overall first-word latency than same-generation DDR due to the extra internal pipelining stages required to hit GDDRx bit rates. Works fine for GPUs since GPUs mostly deal with relatively large data chunks (texture tiles) so the higher bandwidth makes up for latency by completing bursts faster.
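The latency-versus-bandwidth tradeoff can be sketched as fetch time = first-word latency + chunk size / bandwidth. The numbers below are hypothetical round figures chosen only to reflect a ~15% latency penalty paired with much higher bandwidth; they are not taken from any datasheet, and the 4 KiB chunk is an arbitrary stand-in for a larger tile-sized fetch:

```python
def fetch_time_ns(latency_ns: float, chunk_bytes: int, bandwidth_gbs: float) -> float:
    """Time to first word plus time to stream the whole chunk.
    1 GB/s (decimal) moves 1 byte per nanosecond."""
    return latency_ns + chunk_bytes / bandwidth_gbs

# Hypothetical parts: the GDDR-like one has 15% worse latency but 2.5x the bandwidth
ddr = {"latency_ns": 50.0, "bandwidth_gbs": 25.6}
gddr = {"latency_ns": 57.5, "bandwidth_gbs": 64.0}

for chunk in (64, 4096):  # one cache line vs. a larger, tile-sized chunk
    t_ddr = fetch_time_ns(ddr["latency_ns"], chunk, ddr["bandwidth_gbs"])
    t_gddr = fetch_time_ns(gddr["latency_ns"], chunk, gddr["bandwidth_gbs"])
    print(f"{chunk:5d} B: DDR-like {t_ddr:6.1f} ns, GDDR-like {t_gddr:6.1f} ns")
```

With these made-up figures the lower-latency part wins for a single 64-byte line (52.5 vs 58.5 ns), while the higher-bandwidth part finishes a 4 KiB chunk well ahead (121.5 vs 210 ns).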
 
Yeah, but there is neither sharing nor an intermediary memory controller. Word length and bus width discrepancies could be a problem, too.
 
Yeah, but there is neither sharing nor an intermediary memory controller.
Well, GPUs have multiple memory channels with their associated controllers, mechanisms that allow shader modules to access all memory, and arbitration logic to keep things fair between shader modules, video encoders and decoders, display outputs, SLI bridges where applicable, PCIe, and any other function blocks that may require memory access. So sharing memory and arbitrating accesses between multiple endpoints isn't unique to IGPs either.
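The fairness part can be illustrated with the simplest common scheme, a round-robin arbiter that grants one request per cycle and starts its next search just after the last winner. This is a generic sketch, not how any particular GPU arbitrates (real memory controllers typically also weigh priorities, bank state and QoS requirements); the requester names are hypothetical:

```python
from typing import Dict, List

def round_robin_arbiter(pending: Dict[str, int], cycles: int) -> List[str]:
    """Grant at most one queued request per cycle, rotating priority.

    `pending` maps each requester to its number of queued requests. After a
    grant, the search restarts just past the winner, so a requester with a
    deep queue cannot lock out the others."""
    names = list(pending)
    queued = dict(pending)  # copy so the caller's dict is not modified
    grants = []
    start = 0
    for _ in range(cycles):
        for offset in range(len(names)):
            idx = (start + offset) % len(names)
            if queued[names[idx]] > 0:
                queued[names[idx]] -= 1
                grants.append(names[idx])
                start = idx + 1  # next cycle starts searching after the winner
                break
        else:
            grants.append("idle")  # no requester had anything queued
    return grants

print(round_robin_arbiter({"shaders": 5, "display": 2, "video_decode": 1}, cycles=8))
```

Even though the shaders have the deepest queue, the display and video decoder each get a grant within the first three cycles.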
 
I doubt there is any practical way of exploiting "rowhammer" type vulnerabilities since a successful exploit would require:
1- knowing the physical memory address of the memory block you want to attack
2- somehow managing to get a memory allocation for the physical memory rows immediately next to it
3- nuking the system's performance with cache evictions to prevent refresh from occurring normally
4- not accidentally corrupting data with under-refresh of the target memory bank and crashing the system before the attack succeeds

SBRF (same-bank refresh) allows the system to use other banks while one is refreshing. Ring any bells? I am already thinking of an algorithm.