[citation][nom]InvalidError[/nom]I'm comparing mainstream memory with mainstream memory. DDR1-400 was 2-3 cycles, DDR2-800 was 3-5 cycles, DDR3-1600 is 9-10 cycles. Modern GPUs have been engineered around GDDR5, which has 15-20 cycles of latency, so they have been designed to work with high latency and will not benefit much from shaving a cycle here or there as long as the memory subsystem can deliver bandwidth. DRAM chips are pipelined and burstable; commands can be issued while data transfers are in progress, so higher latency does not have a significant impact on usable bandwidth when transfers are optimized to account for that, which is exactly what GPUs have been designed to excel at.

The differences between a PC and a GPU:
1- the GPU is soldered to the motherboard; CPUs aren't - at least not until mainstream Broadwell
2- the GDDR5 chips are soldered to the motherboard; PC DDR4 is soldered on a DIMM, which inserts into a mechanical socket, which is soldered to the motherboard
3- GDDR5 chips are about an inch away from the GPU's BGA package; DIMMs are 2-3 inches away from the CPU socket, and the DIMM socket+PCB add another inch
4- address and control lines on GPUs fan out to 4-8 GDDR5 chips per channel; DIMM address and control lines fan out to 16 chips

So managing to do 64 bits through two mechanical interfaces over much longer distances is going to be considerably more difficult than doing it on GPUs. If cranking clock rates on parallel busses were as easy as it may sound, companies would prefer sticking with simplicity over the extra power and complexity of implementing high-speed TMDS links. The reality is that sockets and slots, particularly those of a very cost-sensitive nature such as mainstream PCs, are a signal integrity nightmare.[/citation]
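Before replying point by point, it helps to put the bandwidth side of that quote into rough numbers. A minimal sketch (the bus widths and per-pin rates below are illustrative round figures, not specs for any particular product):

```python
def peak_bandwidth_gbs(mega_transfers_per_s, bus_width_bits):
    """Peak theoretical bandwidth in GB/s: transfers per second
    times the number of bytes moved per transfer."""
    return mega_transfers_per_s * 1e6 * (bus_width_bits / 8) / 1e9

# GDDR5 on a 256-bit bus at 6 GT/s per pin (a typical high-end figure)
print(peak_bandwidth_gbs(6000, 256))   # 192.0 GB/s
# Dual-channel desktop DDR3-2400: two 64-bit channels
print(peak_bandwidth_gbs(2400, 128))   # 38.4 GB/s
```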
What was mainstream memory is irrelevant, because the DDR3 memory we are talking about, DDR3-2400, is not mainstream. Furthermore, what counts as mainstream memory has changed over time even within each generation of DDR. For example, DDR3-800 to DDR3-1066, and later on 1333, were mainstream when DDR3 first came out, but now even DDR3-1600 and DDR3-1866 are mainstream, and DDR3-2133 may become mainstream before DDR4 is common.
That GPUs are different doesn't change much of anything. We've had GDDR5 for a long time and will probably have a new GDDR6 or GDDR7 out in a year or two that far exceeds GDDR5. If we haven't improved enough in the several years that we've had GDDR5 to make system memory with even half its transfer rates using more advanced technology, then we have failed, especially since we've already got it worked out anyway. Besides, we have already been told that DDR4 makes improvements in latency as well as bandwidth, and anyone who looks at memory technologies over time will notice that your timing numbers for each generation of DDR are not necessarily accurate (I'll get back to that later in this post). We even have publicly available information on how DDR4 is to achieve this.
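Those per-generation cycle counts also look worse than they are, because a cycle gets shorter as the clock rises. Converting CAS latency to nanoseconds (using the quote's own cycle counts; the CL values chosen are common ones, not exhaustive) shows that absolute latency has actually been flat to slightly improving:

```python
def cas_latency_ns(effective_mt_s, cl_cycles):
    """CAS latency in ns. CL is counted in memory-clock cycles,
    and the memory clock is half the effective (DDR) transfer rate."""
    clock_mhz = effective_mt_s / 2
    return cl_cycles / clock_mhz * 1000  # cycles/MHz gives us, x1000 gives ns

print(cas_latency_ns(400, 3))    # DDR1-400 CL3:  15 ns
print(cas_latency_ns(800, 5))    # DDR2-800 CL5:  12.5 ns
print(cas_latency_ns(1600, 9))   # DDR3-1600 CL9: 11.25 ns
```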
That the GPU is soldered, the memory is soldered, and the differences in chip count are all insignificant to my point. They prove that it can be done, and in the past we had FB-DIMMs with huge transfer rates despite being DIMMs that were no closer to the CPU (which was also not soldered to the board) than memory is today. In fact, they were often much farther away.
DDR4's increased capacity per chip would make 2-4GB modules with four or eight chips easy. Heck, we can do that with current DDR3 chips. Chip count is absolutely not a problem.
As examples of what I said earlier about timings, let's look at some common CL-tRCD-tRP timings at official frequencies that overlap between DDR2 and DDR3.
800MHz-
DDR2= 4, 5, 6 (usually 5, but 4 and 6 are both common)
DDR3= 5, 6 (I can't find a lot of examples still around)
No serious loss there for DDR3. Sure, it's something of a loss, but not much of one, and these were all old modules rather than new ones.
1066MHz-
DDR2= 5, 6, 7 (usually 5)
DDR3= 6, 7, 8 (usually 7)
Again, no serious loss for DDR3. Furthermore, many newer high-frequency DDR3 modules with decent timings can be underclocked and have their timings tightened far beyond what DDR2 modules achieved. So, although DDR2 had a slight advantage at first, that advantage now belongs to DDR3. DDR4 makes far more changes and improvements to the technology than any previous DDR generation did over its predecessor, and it is expected to have such an advantage from the very start. Even if it doesn't, it is even more likely to have it by the time it gets into AMD's APU systems. And even if it doesn't by then, it's only a matter of time before it does.
Yes, the increased complexity of each DDR generation did mean higher timings initially, but technology improvements after the early models have always allowed the newer generation to overtake the older one, even if not until higher-frequency modules are out and have to be underclocked to match frequencies with the previous generation for an apples-to-apples comparison. It doesn't necessarily take expensive modules, either. For example, Samsung has a DDR3-1600 kit that overclocks ridiculously well and underclocks similarly well. Some low-voltage kits are also excellent in this regard.
Back to the chip density argument: DDR3 chips are made at up to 1GB per chip, and there may even be some 2GB chips in the server market. 512MB chips are common in 8GB DDR3 memory modules, and many video cards use 512MB chips as well. DDR4's increased density will probably make 1GB chips common by the time DDR4 trickles down to consumers, let alone into APU systems. Even if not, 512MB chips are plenty for making, say, four quad-chip modules for a quad-DIMM 8GB kit with great frequencies and timings.
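The module-capacity arithmetic is simple enough to sketch (the chip densities used here are the ones mentioned above; the function itself is just an illustration for single-rank modules):

```python
def module_capacity_gb(chip_count, chip_density_gb):
    """Capacity of a memory module: number of DRAM chips
    times the density of each chip, in GB."""
    return chip_count * chip_density_gb

print(module_capacity_gb(16, 0.5))  # 16 x 512MB chips: 8 GB, today's common 8GB DIMM
print(module_capacity_gb(4, 0.5))   # 4 x 512MB chips: a 2 GB quad-chip module
print(module_capacity_gb(8, 1.0))   # 8 x 1GB chips: 8 GB with only eight chips
```

Four of those 2GB quad-chip modules make exactly the 8GB quad-DIMM kit described above.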
You can say that companies would prefer sticking to simplicity, but the shift has already started. DDR4 might not have full sixteen-chip modules running at its top speeds, but it's very unlikely that it won't get modules at those top speeds at all, and regardless, it doesn't need to in order to prove my first point about DDR4-2400. That's an easy target.
Worst comes to worst, memory modules could take a more PCIe-like approach to achieving high speeds, with multiple serial lanes instead of one wide parallel bus. Rambus's XDR2 memory is another good example of pushing the limits of SDRAM-based memory even without that, granted it doesn't see much use given how Rambus is almost universally hated and/or not trusted as a business partner.
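For a rough sense of what a PCIe-style multi-lane serial approach buys, here's a sketch using PCIe 3.0-like numbers (8 GT/s per lane with 128b/130b encoding; the lane count is just an example, not a proposal for a specific memory interface):

```python
def serial_link_gbs(gt_per_s_per_lane, lanes, encoding_efficiency):
    """Usable one-direction bandwidth of a multi-lane serial link in GB/s:
    per-lane signaling rate, scaled by lane count and line-code overhead."""
    return gt_per_s_per_lane * lanes * encoding_efficiency / 8  # bits -> bytes

# A 16-lane link at PCIe 3.0-like rates with 128b/130b encoding
print(round(serial_link_gbs(8.0, 16, 128 / 130), 2))  # ~15.75 GB/s
```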
EDIT: Sorry this reply took so long; I wanted to be more thorough in backing up my claims.