I have been trying to understand memory timing numbers and how they work with the Front Side Bus (FSB) and memory speeds to determine memory transfer rates to and from the CPU.
The following table shows the relationship between memory data rates, clock rates, and peak transfer rates.
Code:
DDR speed    Memory clock   Cycle time   I/O bus clock   Module name   Peak transfer rate
DDR3-800     100 MHz        10 ns        400 MHz         PC3-6400      6400 MB/s
DDR3-1066    133 MHz        7.5 ns       533 MHz         PC3-8500      8533 MB/s
DDR3-1333    166 MHz        6 ns         667 MHz         PC3-10600     10667 MB/s
DDR3-1600    200 MHz        5 ns         800 MHz         PC3-12800     12800 MB/s
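To see where the peak transfer rate column comes from, here is a small Python sketch (my own illustration, not from the sources below): a DDR3 module has a 64-bit (8-byte) data path, the DDR3 spec runs the I/O bus at 4x the memory clock, and data moves on both edges of the bus clock.
Code:
# Sketch: derive the table's peak transfer rates from the memory clock.
# Assumptions: standard 64-bit (8-byte) DDR3 channel; I/O bus clock fixed
# at 4x the memory clock; transfers on both clock edges (double data rate).

def peak_transfer_rate_mb_s(memory_clock_mhz):
    io_bus_clock_mhz = memory_clock_mhz * 4   # e.g. 100 MHz -> 400 MHz
    transfers_mt_s = io_bus_clock_mhz * 2     # both clock edges -> MT/s
    return transfers_mt_s * 8                 # 8 bytes per transfer -> MB/s

for mclk in (100, 133.33, 166.67, 200):
    print(f"{mclk:7.2f} MHz -> {peak_transfer_rate_mb_s(mclk):8.0f} MB/s")
# 100 -> 6400, 133.33 -> 8533, 166.67 -> 10667, 200 -> 12800 (matches the table)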
DDR, DDR2, and DDR3 are examples of synchronous dynamic random access memory (SDRAM). Synchronous means that the memory's operation is driven by a clock (with Nehalem, the FSB is no longer involved in memory access). So the memory timing numbers for SDRAM are expressed in clock cycles rather than absolute time; for DDR3, the relevant clock is the I/O bus clock in the table above.
The memory timing numbers measure the latency (i.e. delay) between when a memory action is requested and when it finishes. There are four memory actions whose latencies are given. From left to right, the integers denote the latency, in clock cycles, of:
■Column address strobe latency - elapsed time in clock cycles between the moment a memory controller tells the memory module to access a particular column in a selected row, and the moment the data from the given array location is available on the module's output pins.
■Row to column address delay (tRCD) - the number of clock cycles between opening (activating) a row and being able to issue a column access within it
■Row precharge time (tRP) - the number of clock cycles needed to close (precharge) the currently open row before a different row in the same bank can be opened
■Row active time (tRAS) - the number of clock cycles between a bank active (row open) command and issuing the precharge command
So a memory timing of 7-7-7-20 means that each of the first three actions takes 7 clock cycles. The row active time (the 4th and last number) is approximately the sum of the first three numbers. The sketch below unpacks the notation.
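To make the notation concrete, here is a tiny Python sketch (the field names are the standard abbreviations, not something from the sources) that unpacks a timing string into its four latencies:
Code:
# Unpack a "CL-tRCD-tRP-tRAS" timing string into named latencies (in clock cycles).
def parse_timings(timing):
    cl, trcd, trp, tras = (int(n) for n in timing.split("-"))
    return {"CL": cl, "tRCD": trcd, "tRP": trp, "tRAS": tras}

print(parse_timings("7-7-7-20"))
# {'CL': 7, 'tRCD': 7, 'tRP': 7, 'tRAS': 20}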
So how does DDR3-800 running at 6-6-6-15 timings compare to DDR3-1333 running at 9-9-9-24? The first module has a slower clock (bad) but fewer latency cycles (good). The second module has a faster clock (good) but more latency cycles (bad).
■CAS latency for DDR3-800 6-6-6-15 = 6 cycles / 400 MHz = 15 ns (1.5*10^-8 seconds)
■CAS latency for DDR3-1333 9-9-9-24 = 9 cycles / 667 MHz = 13.5 ns (1.35*10^-8 seconds)
(The timing numbers are counted in I/O bus clock cycles, so the divisor is the I/O bus clock from the table, not the memory clock.)
In spite of its higher timing numbers, the DDR3-1333 9-9-9-24 is faster than the DDR3-800 6-6-6-15.
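Here is the same calculation as a Python sketch, with the conversion from cycles to nanoseconds spelled out (the clock figures come from the table above):
Code:
# Convert a latency in clock cycles to nanoseconds, given the I/O bus clock.
def cycles_to_ns(cycles, io_bus_clock_mhz):
    return cycles / io_bus_clock_mhz * 1000.0  # 1/MHz = microseconds; * 1000 -> ns

print(cycles_to_ns(6, 400))     # DDR3-800  CL6 -> 15.0 ns
print(cycles_to_ns(9, 666.67))  # DDR3-1333 CL9 -> ~13.5 ns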
CAS latency is the best-case number. The row active time (tRAS, abbreviated RAT below) is the worst-case number.
■RAT latency for DDR3-800 6-6-6-15 = 15 cycles / 400 MHz = 37.5 ns (3.75*10^-8 seconds)
■RAT latency for DDR3-1333 9-9-9-24 = 24 cycles / 667 MHz = 36 ns (3.6*10^-8 seconds)
So in the worst case the advantage for the DDR3-1333 9-9-9-24 shrinks to about 4%, versus 10% on CAS latency. But if you are moving billions of bytes per second, those small per-access savings add up.
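A short sketch (same conversion as above; the percentage framing is my own) comparing the per-access savings in both cases:
Code:
# Compare per-access savings of DDR3-1333 9-9-9-24 over DDR3-800 6-6-6-15.
def cycles_to_ns(cycles, io_bus_clock_mhz):
    return cycles / io_bus_clock_mhz * 1000.0

cases = {
    "best case (CL)":    (cycles_to_ns(6, 400),  cycles_to_ns(9, 666.67)),
    "worst case (tRAS)": (cycles_to_ns(15, 400), cycles_to_ns(24, 666.67)),
}
for label, (ddr3_800, ddr3_1333) in cases.items():
    saving = (1 - ddr3_1333 / ddr3_800) * 100
    print(f"{label}: DDR3-1333 is {saving:.0f}% faster per access")
# best case (CL): 10% faster; worst case (tRAS): 4% faster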
One other key point: a big difference between DDR2 and DDR3 is that DDR3 doubled the prefetch buffer depth from 4 bits to 8 bits per data line, so each internal memory-clock cycle supplies twice as many transfers to the I/O bus.
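A quick arithmetic sketch of why the deeper prefetch matters (the clocks and data rates are the standard JEDEC figures; the function is just my illustration): it lets DDR3 reach the same or higher data rates from a slower internal array clock.
Code:
# Data rate = internal memory clock * prefetch depth (4n for DDR2, 8n for DDR3).
def data_rate_mt_s(memory_clock_mhz, prefetch_depth):
    return memory_clock_mhz * prefetch_depth

print(data_rate_mt_s(200, 4))     # DDR2-800:  200 MHz array clock * 4n -> 800 MT/s
print(data_rate_mt_s(100, 8))     # DDR3-800:  100 MHz array clock * 8n -> 800 MT/s
print(data_rate_mt_s(166.67, 8))  # DDR3-1333: ~1333 MT/s from a 166 MHz array clock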
In Nehalem, Intel has moved memory access responsibility to a new Integrated Memory Controller (IMC). The IMC communicates directly between the shared L3 cache and triple-channel DDR3 memory, potentially allowing three concurrent memory accesses. In a single quad-core CPU configuration, QPI provides a point-to-point connection between the CPU and the X58 I/O Hub, which handles communication with the PCIe 2.0 graphics card(s). In future multi-CPU configurations (i.e. servers), QPI would also directly link each CPU to every other CPU.
A QPI connection consists of two 20-lane point-to-point links, one in each direction, so communication can flow both ways simultaneously (full duplex). The old pre-Nehalem northbridge architecture defined a single shared path, and communication could occur in only one direction at a time. Talk about S-L-O-W, especially since the CPU's memory requests competed on that same path with traffic from the PCIe graphics card(s); traffic jam!
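For a sense of scale, here is a hedged back-of-the-envelope sketch of QPI's peak bandwidth. I am assuming the top 6.4 GT/s speed grade (lower Nehalem bins run at 4.8 GT/s) and that 16 of the 20 lanes in each direction carry data, i.e. 2 bytes per transfer:
Code:
# Peak QPI bandwidth: transfers/sec * data bytes per transfer, doubled for full duplex.
def qpi_peak_gb_s(transfers_gt_s=6.4, data_bytes_per_transfer=2):
    per_direction = transfers_gt_s * data_bytes_per_transfer  # GB/s one way
    return per_direction * 2                                  # both directions at once

print(qpi_peak_gb_s())  # 25.6 GB/s aggregate (12.8 GB/s each direction)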
Each Nehalem processor core has its own dedicated L1 and L2 cache. http://www.intel.com/Assets/PDF/manual/253665.pdf
BTW, I will correct any mistakes that the community finds in the above analysis. I am trying to understand; no guarantee that I do understand.
I used the following Wikipedia entries:
http://en.wikipedia.org/wiki/SDRAM
http://en.wikipedia.org/wiki/Front_side_bus
http://en.wikipedia.org/wiki/Memory_timings
http://en.wikipedia.org/wiki/CAS_latency
http://en.wikipedia.org/wiki/Precharge_interval
And this article from Benchmark Reviews, which I highly recommend:
http://benchmarkreviews.com/index.php?option=com_content&task=view&id=174&Itemid=1&limit=1&limitstart=2
EDIT: Added info about Nehalem's QPI. Added information about QPI being full-duplex. Noted that with Nehalem, the FSB is no longer directly involved with memory clock rates.