AFAIK NVIDIA already has the tech to make something like this happen, and they already have an actual commercial product doing it: the DGX-2, where all 16 Tesla V100s are connected via NVSwitch so that, as NVIDIA pitches it, the system sees one massive GPU instead of 16. It was supposed to be transparent to the operating system and to any application using that "single" GPU. The only concern this kind of interconnect raises is latency. So far, what has been confirmed is that the new NVLink on Turing will let each card's VRAM be added to a shared pool instead of every card needing its own copy of the resources. The next step is most likely figuring out how to connect two different dies together, the way AMD did with Infinity Fabric.