[SOLVED] Can cache coherence be used to solve multi-GPU scaling? And how would you do/code it?

Dec 12, 2018
Since GPUs communicate through the L2 cache (https://devblogs.nvidia.com/wp-content/uploads/2018/09/image2.jpg), can cache coherence be used to connect two graphics cards together? (Something like this: https://www.top500.org/news/new-cac...es-aim-at-datacenter-accelerators/#rating-250)
Also, I don't believe latency would be a big problem, since GPUs are "high latency, high throughput processors" (slide 30 of http://download.nvidia.com/developer/cuda/seminar/TDCI_Arch.pdf).

So I'm asking two things:
1) Is this possible?
2) How would you go about implementing/coding this?