turkey3_scratch :
DDR4 does not result in higher IPC, the processor architecture does.
Both DDR2-1600 and DDR3-1600, for instance, in my mind would have an IO bus frequency of 800 MHz. That is a 4:1 ratio to the actual memory clock speed. Does this mean, Pinhedd, that essentially 4 bits can be transferred every cycle of the memory?
And if this is the case, what makes DDR2 different from DDR3 if the IO bus frequency and memory clock speed are the same?
Great questions. I've answered these in my memory tutorial at the top of the forum, but I'll summarize the answer here just for good measure.
DDR3 has an 8-word prefetch. Every time a read operation is executed, the command is sent to the selected bank. The bank reads 8 words from the open row (also called the open page) in parallel. The size of the word is equal to the number of IO pins on the DRAM chip. Most personal computers use 8-bit DRAM chips, but 16-bit DRAM chips can be found in compact devices and 4-bit DRAM chips are found in servers with truckloads of memory.
The 8 words that are selected are always right beside each other in the column address space. The lower 3 bits of the column address that accompanies the read command determine the order in which they are serialized. If the lower 3 bits are all 0 (by far the easiest case), the order is simply 0,1,2,3,4,5,6,7.
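To make the ordering concrete, here is a minimal sketch. It assumes the interleaved burst type, where word i of the burst is the starting index XOR i; that choice and the function name are mine for illustration, and other burst types order the words differently.

```python
def interleaved_burst_order(column_address):
    """Return the order in which the 8 prefetched words are serialized.

    Assumption: interleaved burst type, i.e. word i of the burst is
    (start XOR i). This is an illustration, not the only possible order.
    """
    start = column_address & 0b111  # the lower 3 bits pick the first word
    return [start ^ i for i in range(8)]


print(interleaved_burst_order(0b1010000))  # lower bits 000 -> [0, 1, 2, 3, 4, 5, 6, 7]
print(interleaved_burst_order(0b1010011))  # lower bits 011 -> [3, 2, 1, 0, 7, 6, 5, 4]
```

Whatever the starting index, all 8 words of the aligned block are sent exactly once; only the order changes.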
So, a read command on an 8-bit DDR3 chip selects 8 8-bit words that are spatially adjacent to each other, for a total of 64 bits that need to be transmitted back to the memory controller that issued the read request. Once the data is selected, it is fed into the shared IO logic which is connected to the IO bus. The actual transmission process takes 4 cycles; the timing parameter for this is Tccd, or Column-Command-Delay. This is the minimum number of cycles between consecutive read commands to the open row. Issuing a read command to a bank before Tccd cycles have elapsed would result in data that is queued for transmission getting clobbered.
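The arithmetic above can be sketched in a few lines. The function name is hypothetical; the only assumption beyond the text is that one word moves on each half-cycle (double data rate), so the burst occupies prefetch / 2 cycles.

```python
def read_burst(chip_width_bits, prefetch_words=8):
    """Return (bits moved per read, IO cycles the burst occupies) for one chip."""
    bits = chip_width_bits * prefetch_words
    cycles = prefetch_words // 2  # two words per clock cycle (one per half-cycle)
    return bits, cycles


for width in (4, 8, 16):
    bits, cycles = read_burst(width)
    print(f"x{width} DDR3 chip: {bits} bits per read over {cycles} cycles")
```

For the 8-bit chip this gives 64 bits over 4 cycles, which is where the Tccd of 4 comes from.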
Tcas is the number of cycles between the DRAM chip latching the read command and presenting the first word on the IO bus. Since Tccd, which represents the shared IO time, is fixed at 4 cycles for DDR3, Tcas is bounded below by Tccd. The lowest programmable Tcas for DDR3 is 5 cycles, which leaves only one cycle for the entire decoding, fetching, and serialization process. This is only practical at very low data rates; in practice, a Tcas between 9 and 11 cycles is common, which leaves between 5 and 7 cycles for the backend logic to complete. I believe that JEDEC requires compliant DDR3 to support programmable Tcas between 7 and 14 cycles, with the rest being optional.
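As a rough illustration of what those cycle counts mean in wall-clock time, the sketch below converts Tcas to nanoseconds and computes the cycles left over after Tccd. The DDR3-1600 data rate is just an example I picked, and the helper names are hypothetical.

```python
T_CCD = 4  # fixed for DDR3, per the text above


def cas_latency_ns(data_rate_mts, tcas_cycles):
    """Convert Tcas from cycles to nanoseconds.

    The command clock runs at half the data rate (two transfers per cycle).
    """
    clock_mhz = data_rate_mts / 2
    return 1000 * tcas_cycles / clock_mhz


def backend_cycles(tcas_cycles, tccd_cycles=T_CCD):
    """Cycles left for decoding, fetching, and serialization."""
    return tcas_cycles - tccd_cycles


for tcas in (5, 9, 11):
    print(f"Tcas {tcas}: {cas_latency_ns(1600, tcas):.2f} ns at DDR3-1600, "
          f"{backend_cycles(tcas)} backend cycles")
```

Note that a Tcas of 5 at this data rate would give the backend only 1.25 ns, which is why such a low setting is only practical at very low data rates.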
The neat point here is that memory operations can overlap.
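A small timeline sketch shows the overlap. It assumes a Tcas of 9 (an example from the common range above) and reads issued exactly Tccd cycles apart; the function name and the issue schedule are hypothetical.

```python
T_CAS = 9  # example value from the common 9-11 range
T_CCD = 4  # burst length on the IO bus, in cycles


def bus_windows(issue_cycles, tcas=T_CAS, tccd=T_CCD):
    """Return (first_cycle, end_cycle_exclusive) on the data bus for each read."""
    return [(t + tcas, t + tcas + tccd) for t in issue_cycles]


reads = [0, 4, 8]  # back-to-back reads to the open row, spaced Tccd apart
for issued, (start, end) in zip(reads, bus_windows(reads)):
    print(f"read issued at cycle {issued}: data on the bus during cycles {start}..{end - 1}")
```

The second and third reads are issued while the first is still being decoded and fetched, and each burst starts on the bus exactly where the previous one ends, so the IO bus never sits idle.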
So, each DDR3 DRAM chip transfers a single word each half-cycle for a total of 4 cycles as part of a single read operation. x86 DIMMs place a number of DRAM chips in parallel to form a 64-bit functional unit called a rank. On PCs this is typically (but not always) 8x8-bit chips located on one side of a PCB. Other rank forms are 4x16-bit chips found in compact devices, and 16x4-bit chips found in servers.
64 bits per rank * 8 transfers per command = 512 bits = 64 bytes transferred per command
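The same arithmetic, written out as a quick check for the three rank layouts mentioned above. The constant and function names are hypothetical, chosen only for this sketch.

```python
RANK_WIDTH_BITS = 64        # width of one rank, per the text above
TRANSFERS_PER_COMMAND = 8   # one transfer per prefetched word on DDR3


def chips_per_rank(chip_width_bits):
    """Number of DRAM chips needed in parallel to fill a 64-bit rank."""
    return RANK_WIDTH_BITS // chip_width_bits


def bytes_per_command(rank_width_bits=RANK_WIDTH_BITS,
                      transfers=TRANSFERS_PER_COMMAND):
    """Bytes delivered to the memory controller by one read command."""
    return rank_width_bits * transfers // 8


for width in (4, 8, 16):
    print(f"x{width} chips: {chips_per_rank(width)} per rank, "
          f"{bytes_per_command()} bytes per command")
```

The chip width changes how many chips make up the rank, but the rank always delivers 64 bytes per read command.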
One of the chief differences between DDR2 and DDR3 is that DDR2 has a 4-word prefetch compared to DDR3's 8-word prefetch. Similarly, Tccd for DDR2 is only 2 cycles rather than 4.
So you are correct that DDR2-1600 would have the same command clock frequency as DDR3-1600, but the DDR2-1600 core would be operating twice as fast as the DDR3-1600 core. (I'm not sure that any modules of this data rate were ever available; the highest that I own is DDR2-1200.)
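Here is that comparison as a minimal sketch. It uses a simplified model in which the IO clock is half the data rate and the core clock is the data rate divided by the prefetch depth; the function name is hypothetical.

```python
def clocks_mhz(data_rate_mts, prefetch_words):
    """Return (IO/command clock, core clock) in MHz under a simplified model.

    IO clock: two transfers per cycle, so half the data rate.
    Core clock: one prefetch of N words feeds N transfers.
    """
    io_clock = data_rate_mts / 2
    core_clock = data_rate_mts / prefetch_words
    return io_clock, core_clock


for name, prefetch in (("DDR2-1600", 4), ("DDR3-1600", 8)):
    io_clock, core_clock = clocks_mhz(1600, prefetch)
    print(f"{name}: IO clock {io_clock:.0f} MHz, core clock {core_clock:.0f} MHz")
```

Both come out at an 800 MHz IO clock, but the DDR2 core would need to run at 400 MHz versus 200 MHz for DDR3, which is the 4:1 ratio from the question in the DDR3 case and 2:1 in the DDR2 case.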
DDR4 retains the 8-word prefetch from DDR3.