The problem with ever-faster DDR standards is that latency tends to increase faster than bandwidth.
HBM2 is interesting because it's:
Eventually, it might even be cheaper, since it'll save on board traces & manufacturing costs.
Bandwidth and latency have increased in lockstep with each other. 7ns is the floor for traditional DRAM designs because a signal needs to reach the end of the bus before a new signal can be put on that bus. The chips on a DRAM channel are chained together, and the more DRAM chips you have on a bus, the longer the buffer time on the signal. This is why 2x2 configurations perform better than 2x4 configurations with the same quality chips: half the chips per channel means less time needed for the signal to terminate, and so tighter timings. SDR through DDR3 kept the same general topology with various minor adjustments; GDDR memories were just their system counterparts with a specialized topology for a greatly expanded number of individual ranks and buses. HBM just takes the GDDR concept and amplifies it to an obscene level.
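To put rough numbers on "in lockstep", here is a minimal sketch that converts CAS latency from clock cycles to nanoseconds and compares it with the peak bandwidth of one 64-bit channel. The module timings are typical retail values I'm assuming for illustration, not figures from this thread, and CAS latency is only one part of total access time.

```python
# Illustrative, typical retail timings (assumed for this sketch):
# (name, data rate in MT/s, CAS latency in clock cycles)
MODULES = [
    ("DDR-400 CL3", 400, 3),
    ("DDR2-800 CL5", 800, 5),
    ("DDR3-1600 CL9", 1600, 9),
    ("DDR4-3200 CL16", 3200, 16),
]


def cas_latency_ns(data_rate_mts, cas_cycles):
    """CAS latency in nanoseconds."""
    # DDR transfers data twice per clock, so clock (MHz) = data rate / 2.
    clock_mhz = data_rate_mts / 2
    # One cycle lasts 1000 / clock_mhz nanoseconds.
    return cas_cycles * 1000 / clock_mhz


def peak_bandwidth_gbs(data_rate_mts, bus_bytes=8):
    """Peak bandwidth in GB/s for one channel (64-bit bus = 8 bytes)."""
    return data_rate_mts * bus_bytes / 1000


for name, rate, cl in MODULES:
    print(f"{name:15s} {peak_bandwidth_gbs(rate):5.1f} GB/s  "
          f"{cas_latency_ns(rate, cl):5.2f} ns")
```

With these assumed timings, the cycle count climbs roughly in proportion to the clock, so peak bandwidth goes up 8x while CAS latency in nanoseconds only moves from 15 to 10.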
And here is where the problem begins: large, wide configurations have lower clock rates than small, narrow configurations. This results in them having longer latencies because physics (giant discussion here if you want). Both GDDR and HBM have really high command latencies, usually 3-10x what a piece of DDR would have. In the graphics world this is fine, because a GPU's vector workload is so parallel that feeding it massive quantities of data matters more than waiting on a single command's return. The entire design is built around buffering a ton of data and sending it in a torrent to the cores for calculation, and once the torrent starts up it doesn't stop. In contrast, a system CPU needs to constantly alter and adjust its data stream based on the results of a previous instruction, which makes the ability to respond rapidly to commands more important than having a vast data torrent pouring in.
Short version: for GPUs, bandwidth >>> latency; for CPUs, real latency >>> bandwidth. That is why we use different memory technologies for each.
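A toy model of that trade-off, as a sketch only: the two memory profiles below are hypothetical numbers I picked so the wide memory has 5x the latency and 10x the bandwidth, and the model ignores caches, prefetching and queuing entirely.

```python
def dependent_time_ns(n_accesses, latency_ns):
    """CPU-style chain: each access waits on the previous result,
    so the full latency is paid every time."""
    return n_accesses * latency_ns


def streaming_time_ns(n_accesses, latency_ns, bytes_per_access, bandwidth_gbs):
    """GPU-style stream: requests are pipelined, so latency is paid once
    and bandwidth limits the rest. 1 GB/s equals 1 byte per nanosecond."""
    return latency_ns + n_accesses * bytes_per_access / bandwidth_gbs


# Hypothetical profiles, not measurements of any real part.
NARROW_FAST = {"latency_ns": 50.0, "bandwidth_gbs": 25.0}    # DDR-like
WIDE_SLOW = {"latency_ns": 250.0, "bandwidth_gbs": 250.0}    # GDDR/HBM-like

N_ACCESSES = 1_000_000
LINE_BYTES = 64

for label, mem in (("narrow/fast", NARROW_FAST), ("wide/slow", WIDE_SLOW)):
    dep_ms = dependent_time_ns(N_ACCESSES, mem["latency_ns"]) / 1e6
    stream_ms = streaming_time_ns(
        N_ACCESSES, mem["latency_ns"], LINE_BYTES, mem["bandwidth_gbs"]
    ) / 1e6
    print(f"{label:12s} dependent: {dep_ms:7.2f} ms   streaming: {stream_ms:6.3f} ms")
```

Under these assumptions the wide memory finishes the streaming workload about 10x sooner, while the dependent chain takes 5x longer on it, which is the same split as GPUs versus CPUs above.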