Calculating DDR4 specs to exhaust CPU's max memory bandwidth

spaolo

Jun 2, 2016
Let's say I have a single CPU, namely the 5930K.
Intel states the max memory bandwidth is 68 GB/s.

Considering:
a) no overclocking
b) quad channel DDR4 DIMMs (or dual channel if needed for the sake of optimization. I understand they don't exist as such, but imagine pairs or quads of DIMMs working together where available)
c) motherboard quad channel support
d) mobo has 8 DIMM slots
e) DDR4-2133 (as a reference to understand the equation, but feel free to swap this, the CL and the true latency for another spec)
f) CAS 15 (just for the sake of argument, I know it can be better)
g) therefore true latency = 14.06 ns (15 cycles at 0.9375 ns per cycle)
h) I am good with mathematics
i) X99 chipset
l) default settings for most mobos (no RAM sparing, mirroring etc.); Asus Deluxe if that helps
m) the difference between dual and single rank DIMMs can be overlooked, unless you are a genius and can work this into the equation as well.
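
The true latency in item g) follows from the CAS latency and the data rate. A minimal sketch in Python, assuming the usual convention that the memory clock is half the MT/s figure; the function name `true_latency_ns` is only illustrative:

```python
def true_latency_ns(cas_latency: int, data_rate_mts: float) -> float:
    """CAS latency converted from clock cycles to nanoseconds.

    DDR memory makes two transfers per clock, so one clock period
    lasts 2000 / (data rate in MT/s) nanoseconds.
    """
    clock_period_ns = 2000.0 / data_rate_mts
    return cas_latency * clock_period_ns


# Compare a few kits, including the CL15 vs CL10 case asked about below.
for cl, rate in [(15, 2133), (15, 2400), (10, 2400), (15, 2666)]:
    print(f"DDR4-{rate} CL{cl}: {true_latency_ns(cl, rate):.2f} ns")
```

This only covers the column access time. It says nothing about bandwidth, which depends on the data rate and the number of channels.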

The question is:

How do I mathematically calculate the theoretical (I know it will differ from actual due to a number of factors) optimal memory configuration to use most of those 68 GB/s? That is, the right combination of memory latency, number of DIMMs and speed to get as near as possible to those 68 GB/s (I guess capacity comes into play depending on the average size of the files I work on).

I have read many articles, but whenever I think I am close to it, another one comes up and creates confusion. Please help, I can't get my head around it today :/

I would like to know how you get there, formulas would be great so that I can get there myself next time :)

It is my understanding that:
- overclocking RAM brings no meaningful steady performance benefit (5% or more) in everyday use (heavy Excel, CAD, video and graphics) and is mostly done for sport, to squeeze everything out of it (right?)
- latency does count, but I'm not sure how much, as most reviews only compare very high CL DIMMs. What I would like to see is a real side-by-side bandwidth comparison between a standard 2400 CL15 kit and a 2400 CL10 kit to get a better idea, but I can't find one. If you could point me to an article that would be great; there is a lot on DDR3 but nothing on DDR4.
 
Solution


Bandwidth = <Number of IO pins per channel> * <Number of channels> * <Data rate per IO pin>

All x86 platforms have used a consistent 64 bit memory channel since the early 1990s.

<64 IOs per channel> * <4 channels> * <2,133 megabits per second per IO pin> = 64 * 4 * 2133E6 = 546 Gbit/s. Divide by 8 bits per byte and you get 68.3 GB/s, or about 63.6 GiB/s.

Intel's product listing shows 68 GB/s, which is this same figure in decimal gigabytes: four channels of DDR4-2133, the highest data rate Intel officially supports on this part. The listing doesn't say which base it uses (damn marketing departments), so keep GB and GiB apart when comparing it against benchmark output.
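
The formula above can be checked with a few lines of Python. This is a minimal sketch; the function name `peak_bandwidth_bytes_per_s` is made up for illustration, and the one step that is easy to miss is that the product of pins, channels and data rate is in bits per second, so it has to be divided by 8 to get bytes:

```python
def peak_bandwidth_bytes_per_s(channels: int, data_rate_mts: float,
                               bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in bytes per second."""
    bits_per_s = bus_width_bits * channels * data_rate_mts * 1e6
    return bits_per_s / 8  # 8 bits per byte


bw = peak_bandwidth_bytes_per_s(channels=4, data_rate_mts=2133)
print(f"{bw / 1e9:.2f} GB/s (decimal, 10^9 bytes)")
print(f"{bw / 2**30:.2f} GiB/s (binary, 2^30 bytes)")
```

Changing `data_rate_mts` or `channels` gives the theoretical ceiling for other configurations; measured throughput will land somewhat below it.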

CL is masked by interleaving memory operations. Don't get hung up on it. DDR4 was designed to operate with a high CL from the outset.
 
Ok, that is starting to make sense :) thanks

Now, in this article Corsair tested a 4790K processor, which (as stated by Intel) should only be able to reach 25.6 GB/s of max memory bandwidth. But in their test they reach up to 37 GB/s. How is that possible? I understand a small % of headroom, but that is almost 50% more! :/
In that same article, a comment even mentions a 40 GB/s transfer rate.

Most of all, what that same article is saying is that CL (as you said) makes no difference whatsoever.

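Therefore, to summarize: if I get DDR4-2666 (64 * 4 * 2666E6 bits/s, about 85 GB/s in theory) and have the mobo run it at 2666 (at whichever CL), the theoretical peak would sit above Intel's 68 GB/s, so in practice I should be able to reach roughly the full 68 GB/s (which is about what they achieved in the Corsair test).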

Lastly, I would assume that increasing the MT/s (by OC) would increase the bandwidth proportionally, but in the Corsair article this doesn't happen. I guess it depends on bottlenecks and other hardware limitations.
 


Intel's advertised max data rate is based on extremely conservative constraints designed to minimize bit error rates and maximize interoperability; it is this range of configurations that Intel warrants. Higher data rates are definitely possible, but this may result in some products exhibiting a bit error rate that Intel deems unacceptable. For mission-critical applications or in servers with oodles of memory this makes perfect sense; for an enthusiast it's nothing noteworthy.
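
To put numbers on the Corsair result mentioned earlier in the thread, the bandwidth formula can be inverted to find the lowest data rate that could possibly deliver a given throughput. This is a rough sketch, assuming the 4790K is a dual-channel part with 64-bit channels and that a measured figure cannot exceed the theoretical peak; the function name is hypothetical:

```python
def required_data_rate_mts(bandwidth_gb_s: float, channels: int,
                           bus_width_bits: int = 64) -> float:
    """Lowest data rate (MT/s) whose theoretical peak reaches the given GB/s."""
    bytes_per_transfer = channels * bus_width_bits / 8
    return bandwidth_gb_s * 1e9 / bytes_per_transfer / 1e6


# Intel's rated figure and the Corsair measurement, both on two channels.
for gb_s in (25.6, 37.0):
    rate = required_data_rate_mts(gb_s, channels=2)
    print(f"{gb_s} GB/s needs at least {rate:.1f} MT/s")
```

Under those assumptions the 25.6 GB/s rating corresponds to 1600 MT/s, while 37 GB/s needs more than 2300 MT/s, so the test memory must have been running well above the rated data rate rather than beating the formula.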

There does exist a point where the column read latency will be long enough that bus cycles will end up going unused due to the memory controller being unable to track the state of the outstanding operations internally. However, I don't know what this value is with respect to Intel's current memory controller design or whether it's even within the programmable CL range for DDR4.
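
A back-of-the-envelope way to see how much tracking is involved: the sketch below estimates how many reads must be in flight on one channel to keep the data bus busy despite the CAS latency. It is a deliberately simplified model and an assumption on my part, not a description of Intel's controller: it assumes DDR4's burst length of 8 (4 clocks of data per read), back-to-back reads, and it ignores every timing other than CL (tRCD, tRP, refresh, bank conflicts).

```python
import math

BURST_LENGTH = 8         # DDR4 BL8: 8 transfers per read burst
TRANSFERS_PER_CLOCK = 2  # double data rate


def reads_in_flight(cas_latency: int) -> int:
    """Reads that must be outstanding to hide CL on one channel (simplified)."""
    burst_clocks = BURST_LENGTH // TRANSFERS_PER_CLOCK  # 4 clocks per burst
    # Each read occupies the pipeline for CL clocks of waiting plus its burst.
    return math.ceil((cas_latency + burst_clocks) / burst_clocks)


for cl in (10, 15, 19):
    print(f"CL{cl}: about {reads_in_flight(cl)} reads in flight per channel")
```

In this model a higher CL only raises the number of operations the controller has to juggle, which is why bandwidth stays flat until that number exceeds what the controller can track.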
 
Gotcha! Many thanks for your time and patience Pinhedd :)

One more thing, if you don't mind. What about populating all 8 DIMM slots on the mobo, would that have any effect on the RAM bandwidth? I know that two DIMMs per channel are usually able to maintain the rated MT/s, but as I am planning to run them at 2666 using an embedded XMP profile, would they hold the overclocked data rate, or would it drop to the mobo/RAM stock speed or, even worse, below that? Any experience with this? The board is the Asus Deluxe.
 


Each rank that is added to a channel decreases the impedance on the command/address bus, which compromises signal integrity. Most consumer DIMMs are either single rank or dual rank. Four ranks per channel (two dual-rank DIMMs) is the practical upper limit for unbuffered DIMMs. When more than two ranks are installed, additional tweaking may be necessary above and beyond enabling XMP. This is rarely difficult, and usually involves little more than a slight boost to the supply voltage for the IMC and DRAM. The data rate is usually sustainable.