bit_user :
You missed my point. I'm not talking about a DRAM-based cache. If someone substituted HBM2 for L3, there'd be no cache lookup stage.
A cache, regardless of the underlying technology, still needs a map of which addresses are held at which location in it. The worst-case first-word access latency to DRAM is over 60ns plus that lookup time, versus a constant 10-15ns total for SRAM. If you meant replacing RAM with HBM, HBM is still much slower than L3 and has much higher latency. The only way they might get away with ditching L3 would be to make L2 bigger, but that isn't going to happen until a larger L2 can be built without hurting the latency between the CPU cores and L2.
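To illustrate the "map" point: whatever a cache is built from, every access first has to split the address into a set index and a tag and compare that tag against what the set holds, on hits and misses alike. Below is a minimal Python sketch of a set-associative cache with LRU replacement. The line size, set count, associativity, and replacement policy are illustrative assumptions, not the parameters of any particular CPU.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64   # bytes per cache line (assumed for illustration)
NUM_SETS = 1024  # number of sets (assumed)
WAYS = 16        # associativity, i.e. lines per set (assumed)


@dataclass
class SetAssociativeCache:
    # Each set is a list of tags ordered least to most recently used.
    sets: list = field(default_factory=lambda: [[] for _ in range(NUM_SETS)])

    def split(self, addr: int) -> tuple[int, int]:
        """Map a byte address to (set index, tag)."""
        line = addr // LINE_SIZE   # drop the byte offset within the line
        index = line % NUM_SETS    # which set the line is allowed to live in
        tag = line // NUM_SETS     # remaining bits identify the line within that set
        return index, tag

    def access(self, addr: int) -> bool:
        """Return True on a hit, False on a miss; the tag lookup happens either way."""
        index, tag = self.split(addr)
        ways = self.sets[index]
        if tag in ways:            # the lookup stage: compare against stored tags
            ways.remove(tag)
            ways.append(tag)       # mark as most recently used
            return True
        if len(ways) == WAYS:
            ways.pop(0)            # evict the least recently used line
        ways.append(tag)           # fill the line after the miss
        return False
```

For example, with a fresh cache, `access(0x1000)` misses and fills the line, and a following `access(0x1008)` hits because both addresses fall in the same 64-byte line. In hardware the tag compare is done in parallel across the ways, but it is still a step that sits in front of the data access.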
bit_user :
The silver lining is that when 3D XPoint hits mainstream DIMM slots, a few optimizations in the OS, a couple of web browsers, and a couple of game engines can deliver substantial returns for users.
You can achieve the same "benefits" simply by having enough RAM in your system to keep frequently used data there, and by updating software to keep data unpacked in RAM instead of discarding everything and reloading it from storage, as most games currently do when reloading a level. That's just inefficient software design, and it can be addressed without introducing any new technology by making better use of available memory.
Before introducing new technology to address a "problem", actually check that there is a problem in the first place.