juanrga :
You start by partially misquoting me. My claim was that discrete GPUs will be killed by about 2020. The APUsilicon article gives some details on why this will happen.
You claim that APUs are only for the budget segment, but they are not. In fact, our claim is that high-performance APUs will replace top discrete cards. You say we cannot make 500 mm^2 APUs, but the design Nvidia engineers are working on occupies 650 mm^2 on a 7 nm node. AMD engineers don't give details about the size of their APU, but it would be similar.
Cross-firing will not help, because it will increase the problems described in the article.
It is funny that you claim we cannot combine different APUs to provide more performance, when this is exactly what will be done. The APUsilicon article gives a node representation with one central APU plus several assistant APUs.
This is about the tech and the underlying physics. In former posts I discussed the economic reasons why discrete GPUs have no future and will disappear.
The issue with this speculation is twofold:
1.) Whatever you can do with an APU die, you can fit significantly more cores for GPGPU functions onto a dedicated GPGPU die. You can also decrease power consumption, if that is your goal, by designing for this exascale push.
2.) Unless you are discussing TSV interposers, or some other similarly absurd interconnect scheme spanning all the APUs you claim will be used to make exascale computers, you are still not overcoming the issue of interconnects consuming the lion's share of the power. Even TSV interposers consume power, albeit less than normal interconnects.
The issue is that it would not be cost-effective to mount multiple APUs plus HBM/DRAM on interposers. And even while you are reducing the power consumption of the devices themselves, each interconnect still burns a few watts here and there to transfer data.
The biggest issue with larger HPC systems is not the power cost of the processing units themselves, but the interconnects needed to get all of them working together.
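To put rough numbers on the interconnect point, here's a quick back-of-envelope in Python. Every energy figure below is an illustrative assumption (picked to be in the ballpark quoted in exascale talks), not a measured value for any real machine:

```python
# Back-of-envelope: power budget of compute vs. off-chip data movement
# at exascale. All energy figures are illustrative assumptions.

EXAFLOP = 1e18              # target: 10^18 flops per second
PJ = 1e-12                  # one picojoule, in joules

E_FLOP = 10 * PJ            # assumed energy per double-precision flop
E_OFF_CHIP_BYTE = 100 * PJ  # assumed energy per byte over an off-chip link

# Assume each flop moves, on average, only 0.2 byte off-chip --
# an intentionally low byte/flop ratio.
BYTES_PER_FLOP = 0.2

compute_watts = EXAFLOP * E_FLOP
interconnect_watts = EXAFLOP * BYTES_PER_FLOP * E_OFF_CHIP_BYTE

print(f"compute:      {compute_watts / 1e6:.0f} MW")
print(f"interconnect: {interconnect_watts / 1e6:.0f} MW")
```

Even with these generous assumptions, the interconnect power (20 MW) is double the compute power (10 MW), which is the point being made: data movement, not arithmetic, dominates the budget.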
Horst Simon has given multiple presentations on this particular subject and on why it will not happen by 2020, or likely within a decade of that time frame:
http://www.top500.org/blog/no-exascale-for-you-an-interview-with-berkeley-labs-horst-simon/
There is a quasi-summary of one such presentation in interview (Q&A) form.
Also, data movement will cost more than flops (even on the chip). Limited amounts of memory and low memory/flop ratios will make processing virtually free. In fact, the amount of memory is relatively decreasing, scaling far worse than computation. This is a challenge that’s not being addressed and it’s not going to get less expensive by 2018.
A relevant point above.
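The shrinking memory/flop ratio is easy to illustrate with rough numbers. The machine specs below are assumptions for illustration only, not published figures for any specific system:

```python
# Illustrative memory-per-flop ratios for two generations of HPC systems.
# Specs are rough assumptions, not exact published numbers.

machines = {
    # name: (peak flops per second, total memory in bytes)
    "petascale (~2010)": (2e15, 3e14),   # ~2 PF peak, ~300 TB memory
    "projected exascale": (1e18, 1e16),  # ~1 EF peak, ~10 PB memory
}

ratios = {name: mem / flops for name, (flops, mem) in machines.items()}

for name, r in ratios.items():
    print(f"{name}: {r:.4f} bytes of memory per flop/s")
```

Under these assumptions the ratio falls from about 0.15 to 0.01 bytes per flop/s, a ~15x drop: memory capacity is scaling far worse than computation, exactly as the interview argues.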