get a HEDT then?
16 PCIe 5.0 lanes are overkill for current GPUs. These GPUs take up 3 to 4 slots while desktop cases only offer up to 7 slots anyway.
Like what are you trying to run? 7x Apex Storage X21s?
Well, some people use GPUs for GPGPU workloads like machine learning, and then find that a PCIe 5.0 x16 link between two such GPUs is slightly cheaper and more readily available than NVLink. Yet few models tolerate such a drastic bandwidth reduction between the layers of an LLM, when 3 TB/s of HBM or 1 TB/s of GDDR6X already feels slow.
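To put rough numbers on that cliff, here's a back-of-envelope in Python (figures are ballpark from published specs, PCIe numbers are per direction, and the NVLink entry assumes the RTX 3090 generation, since the 40 series dropped the connector):

```python
# Rough bandwidth ladder from on-card memory down to inter-GPU links.
# All numbers are ballpark, per published specs; PCIe figures are per direction.

def pcie_gb_s(gt_s: float, lanes: int) -> float:
    """Usable GB/s of a PCIe link: GT/s per lane * lanes * 128b/130b encoding / 8."""
    return gt_s * lanes * (128 / 130) / 8

HBM3 = 3350.0  # H100 SXM, GB/s

links = {
    "HBM3 (H100 SXM)":     HBM3,
    "GDDR6X (RTX 4090)":   1008.0,
    "NVLink 3 (RTX 3090)": 112.5,             # commonly quoted bridge bandwidth
    "PCIe 5.0 x16":        pcie_gb_s(32, 16),  # ~63 GB/s
    "PCIe 4.0 x8":         pcie_gb_s(16, 8),   # ~15.8 GB/s, see the test further down
}

for name, gb_s in links.items():
    print(f"{name:22s} {gb_s:7.1f} GB/s  ({HBM3 / gb_s:6.1f}x below HBM3)")
```

That's roughly two orders of magnitude between HBM and the slot, which is the cliff a layer-split model has to live with.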
Just saying that GPUs aren't only about gaming, and today's fixed slot-to-lane allocations lack flexibility and leave capacity unusable. I'd prefer being able to freely allocate sets of 4-lane bundles between components, much like CXL seems to envisage.
Cables capable of PCIe 5.0 speeds will be terribly expensive, and connectors aren't as reliable as solder, but cables have a good chance of catching up thanks to precise run lengths vs. PCB traces: at these speeds, 1 mm of PCB may be about the distance a signal travels between clocks for all I know.
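The back-of-envelope physics supports the order of magnitude: what travels is the signal, not the electrons, at roughly half the speed of light in FR-4. A quick sanity check (the 0.5c propagation factor is a common rule of thumb, not a measured value):

```python
# How far does a PCIe 5.0 signal propagate along a PCB trace in one unit interval?
# Assumes ~0.5c signal speed in FR-4 dielectric (rule of thumb, not measured).

C = 3.0e8               # speed of light in vacuum, m/s
V_PROP = 0.5 * C        # typical signal speed in FR-4
GT_S = 32e9             # PCIe 5.0: 32 GT/s, i.e. 32e9 bits/s per lane

unit_interval_s = 1 / GT_S                     # ~31.25 ps per bit
distance_mm = V_PROP * unit_interval_s * 1000  # metres -> millimetres

print(f"unit interval:        {unit_interval_s * 1e12:.2f} ps")
print(f"trace length per bit: {distance_mm:.1f} mm")   # ~4.7 mm
```

So a bit occupies a few millimetres of trace, not quite 1 mm, but the same ballpark, which is why run-length matching gets brutal at gen 5.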
BTW I was completely shocked to discover
a) the RTX 2060m in my NUC11 Enthusiast was only using 4 lanes to the Tiger Lake i7-1165G7 (U-class mobile chips just don't have more than 8 lanes for a dGPU, and this one needs 4 of them for Thunderbolt support).
b) it didn't matter a bit for gaming performance, which was still extremely good, especially for a system that cost almost exactly the same as another NUC11 without the dGPU when I bought it (it was not attractive at its original price).
c) that was all PCIe v3, because the RTX 20 series can't do better, even though the Tiger Lake could (quick math below).
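The quick math on a) and c), same encoding overhead as the sketch above (textbook PCIe figures, not measurements from the NUC):

```python
# Link bandwidth the NUC11's dGPU actually had: PCIe 3.0 (8 GT/s) across 4 lanes.

def pcie_gb_s(gt_s: float, lanes: int) -> float:
    return gt_s * lanes * (128 / 130) / 8   # 128b/130b encoding, bits -> bytes

print(f"PCIe 3.0 x4:  {pcie_gb_s(8, 4):.1f} GB/s")    # ~3.9 GB/s to the RTX 2060m
print(f"PCIe 3.0 x16: {pcie_gb_s(8, 16):.1f} GB/s")   # ~15.8 GB/s a desktop card would get
```

Once the textures sit in VRAM, a game mostly streams small per-frame updates over the link, which presumably is why a quarter of the lanes never showed up in the frame rates.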
But that's basically my kind of Xbox, not a workstation.
Nvidia hates people combining GPUs to tackle bigger workloads, so they have OEMs make cards too wide for bifurcation to be usable.
Hackers hate Nvidia putting obstacles like this in their path to the best value for their money, so they swap out shrouds, convert GPUs to liquid cooling, or just use riser cables and do it anyway. Plenty of RTX 4090s are being hacked like that in China, and I would have wanted two of those to run Llama-2 70B at 4-bit quantization in my lab.
Instead I had to make do with PNYs, a 3-slot 4090 and a 2-slot 4070, which are just that crucial bit smaller so both fit into an X570 board without killing the warranty, to test how badly LLMs would suffer from a PCIe 4.0 x8 bottleneck between them at the different layers for distinct model variants.
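In case anyone wants to reproduce that kind of split, here's a minimal sketch using Hugging Face transformers with bitsandbytes 4-bit quantization; device_map="auto" spreads the layers over both cards, so activations cross the PCIe link at the layer boundary. The model name and memory caps are illustrative, not necessarily what I ran:

```python
# Minimal sketch: split a 4-bit quantized LLM across two GPUs so that
# activations cross the PCIe link between them at the layer boundary.
# Assumes transformers + accelerate + bitsandbytes are installed;
# model name and memory caps below are illustrative, not a recommendation.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-70b-hf"   # gated on the Hub; any large model works

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",                    # spread layers over cuda:0 and cuda:1
    max_memory={0: "22GiB", 1: "11GiB"},  # e.g. a 4090 plus a smaller card
)

inputs = tokenizer("The PCIe bottleneck shows up when", return_tensors="pt").to("cuda:0")
start = time.time()
out = model.generate(**inputs, max_new_tokens=128)
elapsed = time.time() - start
print(tokenizer.decode(out[0], skip_special_tokens=True))
print(f"{128 / elapsed:.1f} tokens/s")  # rough: assumes no early EOS stop
```

Re-running with CUDA_VISIBLE_DEVICES limited to one card (on a model small enough to fit) isolates the link penalty from the quantization penalty.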
There are dramatically diminishing returns, just in case you're interested, but prices for those A100s grow exponentially, so there may be a crossover point where hacked consumer cards still win on value.