cdrkf :
If we look back at processor design, consider the number of formerly external components now on die: the FPU (maths co-processor, for anyone as old as me), level 2 cache, level 3 cache (super socket 7 + K6-III), PCI controllers, USB controllers, PCIe controllers, the entire north and south bridges... Why are graphics and main memory so sacred that they can never be integrated? Eventually the number of transistors available on die reaches a point where including these things becomes inconsequential.
I honestly think if you went back to the early 90's (thinking first 32 bit processors - 386 / 486 era) and showed people a schematic of a current CPU they'd think you were crazy...
a high end, high performance gpu is not low hanging fruit. that's why. by the time you integrate a "high performance" gpu by today's or a future standard, an even higher performance gpu will be possible on the same process, so the integrated gpu won't be "high end, high performance" anymore.
it sort of looks like you're getting emotional about what cannot or should not be put on die. before... pre-65nm i think, fabrication wasn't about trade offs. that was the p4 era iirc. now you can put 7-8 billion transistors on a big die, but you can't run it without trading off clock rate or thermals (heat generation, temperatures, i.e. physical limits). power management and turbo help, but even those have limits and are subject to design limitations as well. this is why everyone goes after the low hanging fruit first. current hsa, mantle, apu, pcie integration and fivr have all gradually become low hanging fruit.

but a high performance gpu is a very different thing. a gpu by itself is a standalone asic with its own processing units and memory hierarchy. you're not integrating an fpu or a pcie controller, you're integrating a full blown asic. heterogeneous computing tech is what makes the gpu usable for general purpose work. when you're running something like a 7850k (~245mm^2 die, ~2.4B transistors?), even in the current paradigm you're switching on a large portion of the i.c. under load, whereas on a cpu-only die you'd be switching on a far smaller portion of the i.c. keep this in mind because it'll become vital very shortly. also note that the soc needs very good power management to keep from overheating (real time load calculation and balancing).

now imagine the "big" soc with the high performance cpu and igpu you prefer: imagine how much of the i.c. you'd be turning on under load (on gpus, more cores are active at once due to parallel processing), and how much higher the heat generation per area would be under load (remember how this affected ivb and haswell, but with a much bigger impact). heat generation per area won't go down much, because you'd be packing more transistors per area and then switching them on. a newer, better process can reduce leakage OR improve performance at the expense of power use. in the latter case you'd have power use and heat generation issues; in the former case you won't get the high performance you'd expect from putting a high performance cpu and gpu together on die, but you'll get lower power use. and those are after you've fixed yield issues - which have plagued every foundry with each shrink. this is where hsa comes in - going after the current low hanging fruit of software and coding overheads.
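to make the "heat per area" point concrete, here's a back-of-envelope sketch. the only real physics in it is the CMOS dynamic power relation P ≈ activity × C × V² × f; the die area, capacitance, voltage and activity fractions are invented purely for illustration, not measurements of any real chip.

```python
# illustrative only: how active-die fraction drives power density.
# CMOS dynamic switching power: P = a * C * V^2 * f.

def dynamic_power(active_fraction, total_cap_nf, voltage, freq_ghz):
    """Dynamic switching power in watts for the active fraction of the die."""
    return active_fraction * (total_cap_nf * 1e-9) * voltage**2 * (freq_ghz * 1e9)

die_area_mm2 = 245.0  # hypothetical kaveri-sized die
cap_nf = 30.0         # assumed lumped switched capacitance, made up

# cpu-only load: a small fraction of the i.c. is switching
cpu_w = dynamic_power(0.25, cap_nf, 1.1, 3.4)
# cpu + igpu load: gpus light up far more of the die in parallel
apu_w = dynamic_power(0.70, cap_nf, 1.1, 3.4)

print(f"cpu-only load: {cpu_w:5.1f} W ({cpu_w / die_area_mm2:.2f} W/mm^2)")
print(f"cpu+gpu load:  {apu_w:5.1f} W ({apu_w / die_area_mm2:.2f} W/mm^2)")
```

same die, same clock, same voltage - the only thing that changed is how much of the i.c. is switching, and power density roughly tripled. that's the core of the thermal argument above.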
oh, and adding components on die adds to total cost, so there are economic concerns as well.
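the cost point isn't linear either, because yield falls as die area grows. a quick sketch using the classic poisson yield model Y = exp(-A × D0); the wafer cost, usable area and defect density here are invented round numbers, not foundry data.

```python
# illustrative only: why a bigger soc costs disproportionately more per die.
import math

WAFER_COST = 5000.0    # assumed cost of one 300mm wafer, $
WAFER_AREA = 70_000.0  # rough usable wafer area, mm^2
D0 = 0.002             # assumed defect density, defects per mm^2

def cost_per_good_die(area_mm2):
    dies = WAFER_AREA / area_mm2           # candidate dies (edge loss ignored)
    yield_frac = math.exp(-area_mm2 * D0)  # poisson yield model
    return WAFER_COST / (dies * yield_frac)

for area in (120, 245, 450):  # small cpu, kaveri-sized apu, "big" soc
    print(f"{area:3d} mm^2 -> ${cost_per_good_die(area):6.2f} per good die")
```

fewer dies per wafer AND a lower fraction of them working, so doubling the area far more than doubles the cost per good die.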