Tradesman1 :
The base specs per JEDEC are simply a continuation of DDR3; the specs were being written at the same time, both starting around 2003/4. DDR3 originally ended with 1600 (1866 and 2133 were add-ons, and came long after the manufacturers already had 1866, 2133 and 2400 sticks out). The DDR4 specs (per JEDEC) picked up at 1600 and 1866 and went on up, and now that it's here, note that you don't see any of the lower-end DDR4 that was provided for. Where you see the biggest differences between the two is in the newer (higher-end) chips being used and, if trying to compare, the improved internal caches in the X99 CPUs and stronger MCs (memory controllers).
I'm not referring to the data rate, I'm referring to the logical design of the chip.
Each generational jump from SDR SDRAM -> DDR SDRAM -> DDR2 SDRAM -> DDR3 SDRAM doubled the size of the prefetch buffer on each bank. This allowed the IO interface to run faster with respect to the bus controller while allowing the DRAM core to operate at a comfortable power level.
SDR SDRAM has only a 1n prefetch, so each read or write command transfers a single word in a single cycle. DDR SDRAM has a 2n prefetch, so each read or write command transfers two words in a single cycle (that's where the DDR comes in). DDR2 SDRAM has a 4n prefetch, so each command transfers up to four words over two cycles. DDR3 SDRAM has an 8n prefetch, so each command transfers up to eight words over four cycles.
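To make the progression above concrete, here's a quick sketch that turns each prefetch depth into the number of I/O clock cycles one command occupies on the data bus. The names `PREFETCH` and `bus_cycles_per_burst` are just mine for illustration, and it assumes one word per clock for SDR and two words per clock (both edges) for every DDR generation.

```python
# Prefetch depth (words per read/write command) for each generation,
# taken from the paragraph above.
PREFETCH = {"SDR": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def bus_cycles_per_burst(generation: str) -> float:
    """I/O clock cycles needed to move one full prefetch over the data bus."""
    words = PREFETCH[generation]
    # Assumption: SDR moves one word per clock, DDR and later move two
    # (one on each clock edge).
    words_per_clock = 1 if generation == "SDR" else 2
    return words / words_per_clock


for gen in PREFETCH:
    print(f"{gen}: {PREFETCH[gen]}n prefetch, "
          f"{bus_cycles_per_burst(gen):g} cycle(s) per command")
```

The doubling shows up directly: each generation keeps the data bus busy for twice as long per command, which is what lets the DRAM core run slower than the I/O interface.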
When reading, the words need to be serialized from the row buffer into the IO gating logic, and when writing they need to be deserialized from the IO gating logic into the row buffer. As the prefetch depth grows, the complexity and power requirements of the decoding and IO gating logic grow exponentially. If DDR4 were to continue the same trend as its predecessors, it would need to have a prefetch depth of 16n per bank.
Since the prefetch depth of 8n interacts nicely with the 64-byte cache line size used by x86 and ARM microprocessors, there was industry resistance to altering the prefetch depth for reasons beyond power consumption and logic complexity. The decision was made to keep the prefetch at 8n and reorganize the bank architecture to allow for greater command throughput and, subsequently, higher [meaningful] data rates.
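The arithmetic behind that "interacts nicely" is short enough to write out. This is only a sketch, assuming a standard 64-bit (non-ECC) module data bus; `bytes_per_burst` is a name I made up for the example.

```python
BUS_WIDTH_BITS = 64   # assumption: standard non-ECC DIMM data bus
CACHE_LINE_BYTES = 64  # cache line size mentioned above


def bytes_per_burst(prefetch_depth: int, bus_width_bits: int = BUS_WIDTH_BITS) -> int:
    """Bytes delivered by one read/write command at a given prefetch depth."""
    return prefetch_depth * bus_width_bits // 8


print(bytes_per_burst(8))   # 64: exactly one cache line per command
print(bytes_per_burst(16))  # 128: two cache lines, one of which may be unwanted
```

With 8n, one command fills exactly one cache line. At 16n, every command would drag in 128 bytes whether or not the CPU wanted the second line.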
High data-rate DDR3 modules suffer from command starvation: design constraints limit total command throughput regardless of how fast the data bus runs, so DDR4 can often outperform DDR3 even at the same data rate.
For example, tFAW (the four-activate window) limits the number of row activate commands across the entire chip to four within a rolling time window. On JEDEC DDR2 this is around 37.5 ns for an x8 chip, and on JEDEC DDR3 it is around 30 ns for an x8 chip. On JEDEC DDR4 it has been reduced to around 20 ns, so the memory controller can issue row activate commands more frequently. Even if DDR3 data rates were to approach infinity, device performance would still be limited by its ability to activate the rows that it needs to activate in order to do work. If the command queue is full of commands to rows that need to be activated and the activation window hasn't rolled over yet, then the memory controller has no choice but to idle.
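Here's a rough back-of-the-envelope sketch of that ceiling. It assumes the worst case for the activation window: a single rank, a fully random access pattern where every 64-byte burst needs its own row activate, and no other timing limits (tRRD, tRC and so on) getting in the way. The function names are hypothetical, and the tFAW figures are the approximate ones quoted above.

```python
def activate_limited_gb_per_s(tfaw_ns: float, bytes_per_activate: int = 64) -> float:
    """Worst-case bandwidth when every burst needs its own row activate.

    At most four activates fit in each tFAW window, so throughput is
    4 * bytes_per_activate / tFAW. Bytes per nanosecond equals GB/s.
    """
    return 4 * bytes_per_activate / tfaw_ns


def peak_gb_per_s(data_rate_mt_s: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth of the data bus at a given transfer rate."""
    return data_rate_mt_s * 1e6 * (bus_width_bits // 8) / 1e9


for name, tfaw_ns in (("DDR3", 30.0), ("DDR4", 20.0)):
    limit = activate_limited_gb_per_s(tfaw_ns)
    print(f"{name}: tFAW {tfaw_ns:g} ns -> about {limit:.2f} GB/s activate-limited")

# The ceiling does not move when the data rate goes up:
for rate in (1600, 2133, 2400):
    print(f"{rate} MT/s peak: {peak_gb_per_s(rate):.2f} GB/s")
```

Under those assumptions the DDR3 figure stays pinned around 8.5 GB/s no matter how high the data rate goes, while the shorter DDR4 window lifts the same worst case to about 12.8 GB/s. Real workloads hit open rows often enough to do better than this, but it shows why the data rate alone doesn't tell the whole story.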