Where is CrossFire??? They surely can make it in 2024 since they did it in 2014, 2004? The 8800XT will be around a 7900XT for $500 they say; well, make it CF and it will beat the 4090, no problem.
In both cases the big issue is the extra headaches and diminishing returns for the extra GPUs.
It's like baking a cake or building a house: adding another person doesn't always double the output or halve the time. When you add 5, 500, or 5,000, there is a good chance no cake or house will ever get finished, unless your problem doesn't suffer too badly from Amdahl's law and your solution process is redesigned to exploit that.
It's very hard to actually gain from the extra GPUs, because cross-communication at PCIe speeds versus local VRAM is like sending an e-mail in-house but having to hand-write and hand-deliver it as soon as it needs to reach someone in the next building.
In some cases, like mixture-of-experts models, there are natural borders you can exploit. I've also experimented with an RTX 4070 and an RTX 4090, because they were the only pair I could fit into a single workstation, for the likes of Llama-2. Some frameworks give you fine control over which layers of the network to load on which card, so you can exploit the points where layers are less tightly connected.
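To make that concrete, here is a minimal sketch of that kind of manual placement, assuming the Hugging Face transformers/accelerate stack (one of the frameworks that exposes this level of control) and a Llama-2-7B checkpoint; the module names and the 32-layer count are specific to that model, and the 12/20 split is just an illustration of favouring the bigger card:

```python
# Hand-written per-layer split across two unequal GPUs (a sketch, not a recipe).
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"  # assumes you already have access to the weights

# Embedding plus the first 12 transformer blocks on the smaller card (cuda:0,
# e.g. the RTX 4070); the remaining 20 blocks, the final norm and the LM head
# on the bigger card (cuda:1, e.g. the RTX 4090).
device_map = {"model.embed_tokens": 0, "model.norm": 1, "lm_head": 1}
for i in range(32):
    device_map[f"model.layers.{i}"] = 0 if i < 12 else 1

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device_map,  # activations crossing the 12/20 boundary travel over PCIe
    torch_dtype="auto",
)
```

A single boundary like this is the benign case: only one small activation hop per token crosses the bus. It gets ugly as soon as layers spill over to system RAM.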
But in most cases it just meant that token rates went down to the 5 tokens/s you also get with pure CPU inference, because that's just what an LLM on ordinary DRAM and PCIe 4.0 x16 will give you, no matter how much compute you put into the pile.
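A rough way to see where those 5 tokens/s come from, assuming decode is memory-bandwidth bound (every new token streams roughly the whole weight set through the slowest link in the chain) and ignoring KV cache, latency and overlap; the bandwidth figures are ballpark spec numbers:

```python
# Back-of-envelope decode ceilings: tokens/s ≈ link bandwidth / weight bytes per token.
weights_gb = 13.0  # e.g. ~13B parameters at 8-bit, or ~7B at fp16

links_gb_per_s = {
    "RTX 4090 GDDR6X (local VRAM)": 1008.0,
    "dual-channel DDR5-4800 (CPU inference)": 76.8,
    "PCIe 4.0 x16 (crossing the socket)": 32.0,
}

for name, bw in links_gb_per_s.items():
    print(f"{name}: ~{bw / weights_gb:.1f} tokens/s ceiling")
```

Local VRAM allows a ceiling near 78 tokens/s, DRAM about 6, and PCIe about 2.5, so the moment weights or activations have to live on the wrong side of the bus, the whole pile drops to single digits regardless of the compute attached.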
ML models or workloads need to be designed around very specific splits to suffer the least from a memory space that may be logically joined but is effectively partitioned behind tight bottlenecks. And so far that's a very manual job that doesn't even port to a slightly different setup elsewhere.
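Even when a framework derives the split for you, the inputs are the exact VRAM budgets of one particular box, so the result doesn't carry over. A sketch using accelerate's infer_auto_device_map, where the memory figures are placeholders you'd have to re-measure on every machine:

```python
# Derive a device map from per-device memory budgets (sketch; budgets are placeholders).
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
with init_empty_weights():  # build the model skeleton without allocating any weights
    model = AutoModelForCausalLM.from_config(config)

device_map = infer_auto_device_map(
    model,
    max_memory={0: "10GiB", 1: "22GiB", "cpu": "64GiB"},  # this box's budgets, nothing portable
    no_split_module_classes=["LlamaDecoderLayer"],        # keep each block on one device
)
print(device_map)  # swap a card and this map (and any tuning built on it) is invalid
```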