Understanding memory timings

MikeJRamsey

Jun 19, 2009
I have been trying to understand memory timing numbers and how they work with the Front Side Bus (FSB) and memory speeds to determine memory transfer rates to and from the CPU.

The following table shows the relationship between DDR3 speed grades, memory clock rates, I/O bus clock rates, and peak transfer rates. (The bus clock here is the memory module's own I/O bus clock, not the CPU's FSB.)

Code:
DDR Speed    Memory clock   Cycle time   I/O bus clock   Module name   Peak transfer rate
DDR3-800     100 MHz        10 ns        400 MHz         PC3-6400      6400 MB/s
DDR3-1066    133 MHz        7.5 ns       533 MHz         PC3-8500      8533 MB/s
DDR3-1333    166 MHz        6 ns         667 MHz         PC3-10600     10667 MB/s
DDR3-1600    200 MHz        5 ns         800 MHz         PC3-12800     12800 MB/s
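
Every column in the table can be derived from the DDR3 data rate alone. Here is a minimal Python sketch of that arithmetic, assuming the usual DDR3 relationships (8n prefetch, two transfers per bus clock, 64-bit wide module). The function name ddr3_row is made up for illustration.

```python
def ddr3_row(data_rate_mts):
    """Derive the table columns from a DDR3 data rate in MT/s (e.g. 1600)."""
    io_clock_mhz = data_rate_mts / 2        # double data rate: 2 transfers per bus clock
    memory_clock_mhz = data_rate_mts / 8    # 8n prefetch: 8 transfers per memory clock
    cycle_time_ns = 1000 / memory_clock_mhz
    peak_mb_s = data_rate_mts * 8           # 64-bit module = 8 bytes per transfer
    return memory_clock_mhz, cycle_time_ns, io_clock_mhz, peak_mb_s

# 1066 and 1333 are really 1066.67 and 1333.33 MT/s
for rate in (800, 1066.67, 1333.33, 1600):
    mem, cyc, io, peak = ddr3_row(rate)
    print(f"DDR3-{int(rate)}: {mem:.0f} MHz, {cyc:.1f} ns, {io:.0f} MHz, {peak:.0f} MB/s")
```

The peak rate is simply the data rate times 8 bytes, which is why the PC3 module name tracks it.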

DDR, DDR2, and DDR3 are examples of synchronous dynamic random access memory (SDRAM). Synchronous means that the memory timing is driven by the [strike]FSB[/strike] memory clock rate (with Nehalem, the FSB is no longer involved in memory access). So the memory timing numbers for SDRAMs are in units of clock cycles.

The memory timing numbers are a measure of the latency (i.e. delay) between when a memory action is requested and when it will finish. There are four memory actions whose latencies are indicated by memory timings. From left to right, the integers denoting the latency in the number of memory cycles are:

■Column address strobe latency - elapsed time in clock cycles between the moment a memory controller tells the memory module to access a particular column in a selected row, and the moment the data from the given array location is available on the module's output pins.

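■Row to column address delay - number of clock cycles between opening (activating) a row and being able to issue a read or write command to a column in that row

■Row Precharge time - number of clock cycles between issuing the precharge command that closes the currently open row and being able to open a different row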

■Row Active Time - the number of clock cycles taken between a bank active command and issuing the precharge command

So a memory timing of 7-7-7-20 means that it takes 7 clock cycles to perform each of the first three actions. The row active time (the 4th and last number) is approximately the sum of the first three numbers.
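
To make the four fields concrete, here is a small Python sketch that splits a timing string into named parts. The Timings type and parse_timings helper are hypothetical names, used only for illustration.

```python
from typing import NamedTuple

class Timings(NamedTuple):
    cl: int     # CAS latency
    trcd: int   # row to column address delay
    trp: int    # row precharge time
    tras: int   # row active time

def parse_timings(text):
    """Parse a string such as '7-7-7-20' into named fields."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) != 4:
        raise ValueError(f"expected four numbers, got {text!r}")
    return Timings(*parts)

t = parse_timings("7-7-7-20")
print(t)
# Compare the sum of the first three numbers with the fourth
print(t.cl + t.trcd + t.trp, "vs", t.tras)
```

For 7-7-7-20 the sum of the first three is 21, close to the row active time of 20.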

So how does DDR3-800 running at 6-6-6-15 timing compare to DDR3-1333 running at 9-9-9-24? The first chip has a slower clock rate (bad) but needs fewer clock cycles per action (good). The second chip has a faster clock rate (good) but needs more clock cycles per action (bad).

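■CAS Latency for DDR3-800 6-6-6-15 = 6 / 400 MHz = 1.5*10**-8 seconds (15 ns)

■CAS Latency for DDR3-1333 9-9-9-24 = 9 / 667 MHz = 1.35*10**-8 seconds (13.5 ns)

The timing cycles are counted against the bus clock (400 MHz and 667 MHz in the table above), not the slower 100 MHz and 166 MHz memory clock, so the bus clock is the number to divide by. In spite of its higher memory timing, DDR3-1333 9-9-9-24 is faster than the DDR3-800 6-6-6-15.

CAS latency is the best case number. The Row Active Time (RAT) is the worst case number.

■RAT Latency for DDR3-800 6-6-6-15 = 15 / 400 MHz = 3.75*10**-8 seconds (37.5 ns)
■RAT Latency for DDR3-1333 9-9-9-24 = 24 / 667 MHz = 3.6*10**-8 seconds (36 ns)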

So the advantage for the DDR3-1333 9-9-9-24 is less. But if you are moving billions of bytes per second then adding a little to a little adds up to a big savings.
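
The comparison can be reproduced with a short Python sketch. It assumes that timing cycles are counted against the bus clock, which runs at half the DDR data rate (400 MHz for DDR3-800), rather than the 100 MHz memory clock. Dividing by the memory clock instead gives figures four times larger, but the ranking of the two kits is the same either way. The helper name cycles_to_ns is made up for illustration.

```python
def cycles_to_ns(cycles, data_rate_mts):
    """Convert a timing in clock cycles to nanoseconds.

    Assumes the cycles are bus-clock cycles, i.e. the clock runs at
    half the DDR data rate (400 MHz for DDR3-800).
    """
    bus_clock_mhz = data_rate_mts / 2
    return cycles * 1000 / bus_clock_mhz

kits = {
    "DDR3-800 6-6-6-15": (800, 6, 15),
    "DDR3-1333 9-9-9-24": (4000 / 3, 9, 24),   # 1333.33 MT/s
}
for name, (rate, cl, tras) in kits.items():
    print(f"{name}: CAS {cycles_to_ns(cl, rate):.1f} ns, "
          f"row active {cycles_to_ns(tras, rate):.1f} ns")
```

On these assumptions the DDR3-1333 kit leads by about 1.5 ns on both CAS latency and row active time, which is a smaller share of the longer row active time.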

One other key point. A big difference between DDR2 and DDR3 is that DDR3 doubled the size of the data prefetch buffer from 4 bits to 8 bits per data line on each memory clock cycle. That is a 100% increase, and it is why DDR3 reaches twice the transfer rate of DDR2 at the same memory clock.
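
As a rough illustration of what the prefetch depth buys, the Python sketch below multiplies the memory clock by the prefetch depth. The 2-bit figure for original DDR is not from the post above; it is included only for comparison, and the names are invented for this example.

```python
# Bits fetched per data line on each memory clock cycle
PREFETCH = {"DDR": 2, "DDR2": 4, "DDR3": 8}

def data_rate_mts(generation, memory_clock_mhz):
    """Transfers per second (in MT/s) for a given memory clock."""
    return memory_clock_mhz * PREFETCH[generation]

for gen in PREFETCH:
    print(gen, data_rate_mts(gen, 200), "MT/s at a 200 MHz memory clock")
```

At the same 200 MHz memory clock, DDR2 reaches 800 MT/s and DDR3 reaches 1600 MT/s, matching the DDR3-1600 row of the table.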

[strike]A question that I still have is how does Intel's Quick Path Interconnect (QPI) technology affect all this? Quad cores trying to access shared memory has got to create some bottlenecks. Shared nothing architecture? A question for another night.[/strike] In Nehalem, Intel has moved memory access responsibility to a new Integrated Memory Controller (IMC). The IMC directly communicates between the L3 shared cache and the DDR3 triple channel memory, potentially allowing three concurrent memory accesses. In a single Quad-core CPU configuration, QPI provides a point to point connection between that CPU's L3 cache and the X58 IO Hub. The IO Hub handles communication with the PCIe 2.0 graphics card(s). In future multi-CPU configurations (i.e. servers) QPI would also directly link each CPU to every other CPU.
A QPI connection consists of two 20-pair point-to-point data links, one in each direction. This allows communication in both directions simultaneously. The old northbridge (prior to Nehalem) architecture defined one path and communication could occur in only one direction at a time (not simultaneously). Talk about S-L-O-W, especially since memory requests for the CPU were competing on this same path with traffic from the PCI graphics card(s); traffic jam!

Each Nehalem processor core has its own dedicated L1 and L2 cache. http://www.intel.com/Assets/PDF/manual/253665.pdf

BTW, I will correct any mistakes that the community finds in the above analysis. I am trying to understand; no guarantee that I do understand.

I used the following Wikipedia entries:
http://en.wikipedia.org/wiki/SDRAM
http://en.wikipedia.org/wiki/Front_side_bus
http://en.wikipedia.org/wiki/Memory_timings
http://en.wikipedia.org/wiki/CAS_latency
http://en.wikipedia.org/wiki/Precharge_interval

And this article from Benchmark Reviews, which I highly recommend:
http://benchmarkreviews.com/index.php?option=com_content&task=view&id=174&Itemid=1&limit=1&limitstart=2

EDIT: Added info about Nehalem's QPI. Added information about QPI being full-duplex. Noted that with Nehalem, the FSB is no longer directly involved with memory clock rates.
 
Nice post. I wish I had that link when I started.

I have given this post a Nehalem flavor (sorry AMD guys). Continuing along that line, I have learned that the timings within an Intel i7 CPU are determined by the BCLK (base clock) frequency. The CPU, the IMC, and the QPI clock rates are derived from the BCLK.

The BCLK frequency is set by default to 133 MHz. This means that if you do nothing, your DDR3-1600 memory runs not at the 200 MHz that you thought you were buying but at the same speed as that DDR3-1066 memory that you turned your nose up at. Don't believe me? Ask Intel: http://www.intel.com/support/processors/sb/CS-029913.htm

"What is the maximum frequency for DDR3 memory when used with Intel® Core™ i7 desktop processors?

These processors support DDR3 memory with a maximum frequency of 1066 MHz. If faster DDR3 memory is used (such as 1333 MHz or 1600 MHz), it will be down-clocked to operate at 1066 MHz."

And if the CAS timing on that DDR3-1600 stick was greater than 7, boy are you embarrassed. :cry:

We can also derive that the default memory multiplier for the i7 processors is 1066/133 = 8
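
The same division works for other speed grades. Here is a minimal Python sketch, assuming the memory rate is simply BCLK times the memory multiplier and that the nominal 133 MHz is really 133.33 MHz. The function names are made up for illustration.

```python
BCLK_MHZ = 400 / 3   # the nominal "133 MHz" default base clock

def memory_rate(multiplier, bclk_mhz=BCLK_MHZ):
    """Effective DDR3 rate (MT/s) for a memory multiplier and base clock."""
    return bclk_mhz * multiplier

def multiplier_for(target_rate, bclk_mhz=BCLK_MHZ):
    """Multiplier needed to reach a target DDR3 rate at a given base clock."""
    return round(target_rate / bclk_mhz)

print(memory_rate(8))          # default multiplier: roughly 1066
print(multiplier_for(1333))    # multiplier for DDR3-1333
print(multiplier_for(1600))    # multiplier for DDR3-1600
```

At the default BCLK this gives a multiplier of 10 for DDR3-1333 and 12 for DDR3-1600. Whether a given board exposes those multipliers is a separate question.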

To get to the higher memory cycle rate that you paid for, you have to overclock. For example, here: http://www.computerpoweruser.com/editorial/article.asp?article=articles%2Farchive%2Fc0903%2F24c03%2F24c03.asp

"Finally, there is the BCLK (base clock) to consider. The ultimate frequency of a Core i7 processor, its QPI link, and its memory speed are all derived from a BCLK frequency, which is set to 133MHz by default. Although the FSB is no more, it may help to think of the BCLK frequency as similar to the FSB clock. Raising the BCLK raises the CPU clock, QPI speed, and memory speed in lockstep. To keep all things running within stable limits, most X58-based motherboards for the Core i7 give users the ability to alter QPI and memory multipliers independently of the CPU, QPI, and memory clocks. All can be fine-tuned to some degree. Raising the base clock is how we overclocked our Core i7 920."

<Added by EDIT> If you owned DDR3-1333 memory like I do, the memory multiplier has to go to 10.

Mike
 
I want to ask about overclocking DDR3 vs. latencies. I want to build a gaming PC and want to know whether using DDR3 RAM with lower latencies (5-7-5 @ 1375 MHz) instead of (7-8-7-20 @ 2000 MHz) will help the computer's performance. The big problem is $$$: the CL5 DDR3 is $400 for 1x1GB or $800 for 2x1GB, but how does the price bump work? Will I be able to see a bandwidth increase worth all the $$$ for CL5, or is the CL7 worth it? I've seen people benchmark the (7-8-7-20 @ 2000 MHz) with around 48-52 GB/s transfer rate. I think I'm asking: will the lower latencies show a performance boost?