Sigh, it's obvious you aren't reading the answers in this very thread.
Any PCIe switch chip will provide full performance if only one device is attached at each end. It's just like a gigabit switch with only two things plugged in: everything runs at full line speed. Attach more than that and the devices will need to share bandwidth whenever they're used simultaneously.
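To put rough numbers on the sharing, here is a minimal sketch. It assumes PCIe 3.0 (8 GT/s per lane, 128b/130b encoding, so roughly 0.985 GB/s per lane in each direction), treats the switch's upstream link as the only bottleneck, and assumes active devices split it evenly. Real switches and workloads are messier. The function names are hypothetical, chosen for illustration.

```python
# Simplified model of bandwidth sharing behind a PCIe switch.
# Assumptions: PCIe 3.0 links, the upstream link is the only bottleneck,
# and all simultaneously active devices split it evenly.

# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding -> ~0.985 GB/s per lane.
PCIE3_GBPS_PER_LANE = 8 * (128 / 130) / 8


def link_bandwidth(lanes: int) -> float:
    """Approximate one-direction PCIe 3.0 bandwidth in GB/s for a link width."""
    return lanes * PCIE3_GBPS_PER_LANE


def per_device_bandwidth(upstream_lanes: int, device_lanes: int, active_devices: int) -> float:
    """Bandwidth each active device gets behind a switch.

    Each device is capped by its own link, and all active devices
    share the upstream link equally.
    """
    if active_devices < 1:
        raise ValueError("need at least one active device")
    fair_share = link_bandwidth(upstream_lanes) / active_devices
    return min(link_bandwidth(device_lanes), fair_share)


if __name__ == "__main__":
    for n in (1, 2, 4):
        bw = per_device_bandwidth(upstream_lanes=16, device_lanes=16, active_devices=n)
        print(f"{n} active x16 device(s) behind an x16 uplink: {bw:.2f} GB/s each")
```

With one active device it gets the full x16 link. Each additional device that is busy at the same moment divides the uplink further, while idle devices cost nothing.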
The point is not whether dropping to x8 or sharing bandwidth reduces performance, but whether adding more cards anyway still improves things.
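A toy model can illustrate that trade-off. This is a sketch with hypothetical numbers, not a benchmark. It assumes each job needs a fixed GPU compute time plus a host-to-GPU transfer at the slot's PCIe 3.0 bandwidth, with no overlap between the two. The example compares two cards at x16/x16 against four cards at x16/x8/x8/x8. The workload figures and the `jobs_per_second` name are made up for illustration.

```python
# Toy model (hypothetical workloads): do more cards at x8 beat fewer at x16?
# Assumptions: each job = fixed compute time + one host->GPU transfer at the
# slot's PCIe 3.0 bandwidth, no overlap between transfer and compute, and
# every card has its own dedicated lanes (no switch, no host bottleneck).

PCIE3_GBPS_PER_LANE = 8 * (128 / 130) / 8  # ~0.985 GB/s per lane


def jobs_per_second(slot_lanes, compute_s, transfer_gb):
    """Aggregate jobs/s across cards, one card per entry in slot_lanes."""
    total = 0.0
    for lanes in slot_lanes:
        transfer_s = transfer_gb / (lanes * PCIE3_GBPS_PER_LANE)
        total += 1.0 / (compute_s + transfer_s)
    return total


if __name__ == "__main__":
    workloads = (("compute-heavy", 1.0, 0.5), ("transfer-heavy", 0.05, 2.0))
    for name, compute_s, transfer_gb in workloads:
        two = jobs_per_second([16, 16], compute_s, transfer_gb)
        four = jobs_per_second([16, 8, 8, 8], compute_s, transfer_gb)
        print(f"{name}: x16/x16 = {two:.2f} jobs/s, x16/x8/x8/x8 = {four:.2f} jobs/s")
```

Under these assumptions the x8 cards are individually slower, but the four-card layout still finishes more jobs per second in both cases. The conclusion can change if cards share lanes behind a switch or if transfers dominate even more heavily.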
X299 can do x16/x8/x8/x8 on most boards, but you may find one that can do x16/x16/x8.
Threadripper can do x16/x16/x16 just fine. You just need to either find an X399 board that does this when nothing is plugged into the 4th slot, or else use a board with only 3 slots like the ASRock X399M Taichi (that one's mATX, so you'd need an x16 extender to run a 3rd double-width card).
Intel can do x16/x16/x16 too if you use a 48-lane CPU like Skylake-W (Xeon W-2135, 2145 or 2155) on a C422 chipset board such as the Gigabyte MW51-HP0.
If you want to use more cards than that all at x16, then like I said you'd need multiple Skylake-SP Xeon processors or a single Epyc.
All of these Intel chips can do some form of AVX-512, but honestly, if your application can run in CUDA it would probably run faster on the GPUs. FWIW, if your application runs on the GPUs then CPU speed and core count should hardly matter, and the 4-core Xeon W-2125 has an MSRP of only $444, which is about the same as the current price of the 8-core TR 1900X. So for x16/x16/x16 you can choose between AVX-512 and higher single-thread performance, or twice as many cores with three times as many NVMe slots. You could always spend more on either platform to get more cores, too.
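The slot layouts discussed above can be sanity-checked against each platform's CPU lane budget. This is a rough sketch. The lane counts are the commonly quoted totals for these PCIe 3.0-era parts (44 for the top X299 Core i9 chips, 64 for first-gen Threadripper with 4 going to the chipset link, 48 for Xeon W-2100 and per Skylake-SP socket, 128 for a single first-gen Epyc). Board slot wiring, which is exactly what varies between boards, is ignored. Fitting the lane total is necessary but not sufficient; the board still has to route the lanes that way. The `CPU_LANES` table and helper names are hypothetical.

```python
# Rough PCIe lane-budget check for GPU slot layouts (PCIe 3.0-era parts).
# Assumptions: commonly quoted CPU lane totals, chipset-link lanes removed
# where noted, and board slot wiring ignored. Fitting the total is a
# necessary condition only; the board must also route the lanes that way.

CPU_LANES = {
    "X299 (44-lane Core i9)": 44,
    "Threadripper 1000 (X399)": 60,  # 64 total, 4 used for the chipset link
    "Xeon W-2100 (C422)": 48,
    "Dual Skylake-SP Xeon": 96,  # 48 per socket
    "Single Epyc 7001": 128,
}


def fits(slots, lanes_available):
    """True if the requested slot widths fit within the CPU's lanes."""
    return sum(slots) <= lanes_available


def check(layouts):
    """Print which platforms have enough CPU lanes for each layout."""
    for layout in layouts:
        label = "/".join(f"x{w}" for w in layout)
        ok = [name for name, lanes in CPU_LANES.items() if fits(layout, lanes)]
        print(f"{label} ({sum(layout)} lanes): {', '.join(ok) or 'none'}")


if __name__ == "__main__":
    check([(16, 8, 8, 8), (16, 16, 8), (16, 16, 16), (16, 16, 16, 16)])
```

On these numbers, x16/x16/x16 needs 48 lanes, which rules out X299 but fits Threadripper and Xeon W, while four cards at x16 (64 lanes) only fit the multi-socket Skylake-SP or Epyc options.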