gamerk316 :
I'm just trying to ask what can HSA be used to solve
Not much, honestly. It's a convenience, nothing more. Anything you can do with HSA you could already do with OpenCL/DirectCompute. Having everything in the same memory context is nice, but you ran into the same problem I said you'd have from day 1: slow system RAM. That's why you still have, and will continue to have, dGPUs, since the memory you can put on them will always be faster than what you use as main system memory. The only overhead is the initial transfer, which is small compared to the speedup of GDDR5 versus DDR3.
You have one thing good at serial workloads, and one thing good for parallel ones. Trying to mix and match will dilute both, and advance none.
We have one thing good at serial workloads (which we're using for both serial and parallel workloads) and one thing good at parallel ones (which we're using for only a handful of parallel workloads). The goal is not to make a serial processor do parallel work or vice versa - the goal is the opposite: to stop mixing and matching. And for many parallel tasks, the transfer overhead is far too high because the GPU requires different data for each computation (I remember reading about GPU-accelerated collision detection that would have been upwards of 100x faster if the data didn't have to be copied back and forth between GPU and system memory). Not all parallel tasks are low-data, high-processing - which is what dGPUs are best at.
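The tradeoff both sides of this argument are gesturing at can be roughed out with a back-of-envelope model. All the bandwidth figures below are illustrative assumptions (rough ballpark numbers for PCIe 3.0 x16, dual-channel DDR3-1600, and a high-end GDDR5 card), not measurements:

```python
# Toy model: shared-memory iGPU (no copy, slow RAM) vs. dGPU
# (one-time PCIe copy, then fast GDDR5). Assumed, illustrative GB/s figures.
PCIE3_X16_GBPS = 16.0   # host -> device transfer path
DDR3_GBPS = 25.6        # dual-channel DDR3-1600, approximate
GDDR5_GBPS = 320.0      # high-end dGPU memory, approximate

def total_time(data_gb, passes):
    """Seconds to stream `data_gb` of data over `passes` compute passes."""
    igpu_time = passes * data_gb / DDR3_GBPS
    dgpu_time = data_gb / PCIE3_X16_GBPS + passes * data_gb / GDDR5_GBPS
    return igpu_time, dgpu_time

# One pass over fresh data each time: the copy dominates, shared memory wins.
# Many passes over the same data: GDDR5 amortizes the copy, the dGPU wins.
```

With one pass over ever-changing data the copy dominates and the shared-memory setup comes out ahead; with many passes over the same data the GDDR5 bandwidth amortizes the copy, which is exactly the rendering-style workload where dGPUs win.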
I'm just imagining the nightmare that would occur if you tried to copy information about the state of a bunch of objects to the dGPU, have it check for collisions, copy the resulting collisions back to the CPU so it can determine which one occurred first and whether there are dependencies among the collisions (because you can't analyze data dependencies in parallel), then copy the ones that do have dependencies back to the dGPU to re-evaluate with the updated information, and repeat until all collisions for the given time frame have been evaluated. It would be a pain to program, and an even bigger pain to make efficient.
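The round-trip loop above can be sketched as a toy model. The helper names (`detect`, `find_dependent`) are hypothetical stand-ins for real GPU kernels and CPU-side analysis; the point is just counting where the copies land:

```python
# Toy model of the dGPU ping-pong described above. Each loop iteration
# costs two copies: object states up to the GPU, collision results back.
def resolve_collisions(objects, detect, find_dependent, max_iters=10):
    copies = 0
    pending = objects
    for _ in range(max_iters):
        copies += 1                     # CPU -> GPU: upload object states
        hits = detect(pending)          # GPU: parallel collision check
        copies += 1                     # GPU -> CPU: download results
        pending = find_dependent(hits)  # CPU: serial dependency analysis
        if not pending:
            break
    return copies
```

Under HSA's shared memory context both copy steps disappear: the CPU's dependency analysis reads the GPU's results in place, and only the synchronization between passes remains.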
I can imagine that working much better with HSA. But no, I don't think dGPUs are going to suddenly go extinct - GDDR5 system memory is still impractical, and the main use of graphics cards will remain rendering, one of the tasks where the initial transfer cost is well worth the high-speed RAM speedup. And I would imagine that trying to fit high-end graphics chips like the R9 290X (doesn't that have stock water cooling? Or am I thinking of the wrong one?) onto the same die as a CPU will remain impractical for a long time to come.
Summary: Graphics cards will remain around, to do graphics. Perhaps we should start thinking about APUs as... well... APUs instead of as a CPU + iGPU. Personally I would be thrilled to have an APU doing the main execution of a game I'm playing while a dGPU does the rendering (in terms of GFLOPS, doesn't the A10-7850K overall outdo Intel's higher-end parts?).