News: AMD enhances multi-GPU support in latest ROCm update: up to four RX or Pro GPUs supported, official support added for Pro W7900 Dual Slot

oofdragon

Where is CrossFire??? They could surely make it in 2024 since they did it in 2014, and even 2004. The 8800 XT will be around a 7900 XT for $500, they say; well, make it CrossFire and it will beat the 4090, no problem.
 

abufrejoval

oofdragon said:
Where is CrossFire??? They could surely make it in 2024 since they did it in 2014, and even 2004. The 8800 XT will be around a 7900 XT for $500, they say; well, make it CrossFire and it will beat the 4090, no problem.
In both cases the big issue is the extra headache and the diminishing returns from each extra GPU.

It's like baking a cake or building a house: adding another person doesn't always double the output or halve the time. When you add 5, 500 or 5,000 people, there is a good chance neither cake nor house will ever happen, unless your problem doesn't suffer too badly from Amdahl's law and your solution process is redesigned to exploit that (see the sketch below).
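
To put rough numbers on that: Amdahl's law caps the speedup at 1 / ((1 − p) + p/n) for a parallelizable fraction p and n workers. A minimal sketch, with an assumed p of 0.9:

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the
# parallelizable fraction of the work and n the number of workers (GPUs).
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% of the work parallelizable, 4 GPUs give ~3.1x, not 4x,
# and the ceiling as n grows is only 1 / (1 - p) = 10x.
for n in (2, 4, 8, 5000):
    print(f"{n} workers: {amdahl_speedup(0.9, n):.2f}x")
```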

It's very hard to actually gain from the extra GPUs, because cross-communication at PCIe speeds vs. local VRAM is like sending an e-mail in-house, but having to hand-write and hand-deliver it as soon as it needs to reach someone in the next building.

In some cases, like mixture-of-experts models, there are natural borders you can exploit. I've also experimented with an RTX 4070 and an RTX 4090, because they were the only ones I could fit into a single workstation, for the likes of Llama-2. Some frameworks give you fine control over which layers of the network to load on which card, so you can exploit the points where layers are less connected (see the sketch below).
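
A minimal sketch of what that looks like with Hugging Face transformers plus accelerate; the model id and the per-GPU memory caps are illustrative assumptions, not a recommendation:

```python
# Sketch: splitting an LLM's layers across two unequal GPUs with Hugging Face
# transformers + accelerate. Model id and memory caps are illustrative only.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",           # assumed (gated) model; any causal LM works
    device_map="auto",                     # let accelerate place layers per device
    max_memory={0: "21GiB", 1: "10GiB"},   # e.g. a 24GB 4090 and a 12GB 4070
)
# hf_device_map shows which transformer block landed on which GPU; activations
# only cross PCIe at the boundaries between devices.
print(model.hf_device_map)
```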

But in most cases it just meant that token rates went down to the same 5 tokens/s you also get with pure CPU inference, because that's just what an LLM running from ordinary DRAM over PCIe v4 x16 will give you, no matter how much compute you pile on.
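
That ceiling falls out of simple bandwidth math: autoregressive decoding streams roughly the whole model once per generated token, so tokens/s is bounded by bandwidth divided by model size. All figures below are round, illustrative numbers:

```python
# Back-of-envelope: tokens/s ceiling ~= effective bandwidth / model size,
# since decoding touches (roughly) every weight once per token.
def tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

model_gb = 8  # e.g. a 13B-class model quantized to ~4 bits
for name, bw in [("dual-channel DDR5 DRAM", 80),
                 ("PCIe 4.0 x16 link", 32),
                 ("RTX 4090 GDDR6X VRAM", 1000)]:
    print(f"{name:24s} ~{tokens_per_s(bw, model_gb):6.1f} tokens/s ceiling")
```

As soon as layers have to cross the PCIe link for every token, the ~32 GB/s line becomes the budget, which is why adding a second GPU can make things slower rather than faster.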

ML models or workloads need to be designed around very specific splits to suffer the least from a memory space that may be logically joined but is effectively partitioned behind tight bottlenecks. And so far that's a very manual job that doesn't even port to a slightly different setup elsewhere.
 

systemBuilder_49

Except AMD is so stingy with PCIe lanes (only 24 on the 1000X–9000X CPUs) that this feature is USELESS to all but Threadripper customers. Nice one, AMD, democratizing AI for nothing but their richest customers!
 

LabRat 891

oofdragon said:
Where is CrossFire??? They could surely make it in 2024 since they did it in 2014, and even 2004. The 8800 XT will be around a 7900 XT for $500, they say; well, make it CrossFire and it will beat the 4090, no problem.
I recommend you take a look @

'CrossFire' is gone, long gone now. Multi-GPU (mGPU) only works under Vulkan/DX12, where the game supports it.
However, AMD has already figured out how to 'bond' GPUs together over Infinity Fabric. The feature merely hasn't been offered to the consumer space.

I may be incorrect, but I believe Infinity Fabric inter-GPU communication is involved with ROCm multi-GPU, too (see the sketch below).
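
One way to check what you actually got, assuming a ROCm build of PyTorch (which exposes HIP devices through the torch.cuda API): ask whether each pair of GPUs can reach each other peer-to-peer, whichever interconnect ends up backing it:

```python
# Sketch: probe peer-to-peer reachability between GPUs under a ROCm build
# of PyTorch. P2P may be backed by XGMI/Infinity Fabric or by PCIe,
# depending on the hardware; this only shows whether it's available.
import torch

n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")
```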
 
I've always been AMD, but next gen I want 4090 performance for 1440p and/or at least 1.3x 4090 performance for 4K. Hope they make it somehow, or I'll have to go Nvidia 😭
I gave up on AMD... I took the worst card from Nvidia, the infamous RTX 4060 Ti 16GB :)

Don't wait to go green team.

Get one on the cheap before the new cards come out.
 

DS426

systemBuilder_49 said:
Except AMD is so stingy with PCIe lanes (only 24 on the 1000X–9000X CPUs) that this feature is USELESS to all but Threadripper customers. Nice one, AMD, democratizing AI for nothing but their richest customers!
Entry-level Threadripper (the 7960X) is not terribly expensive (~$1,400) and gives 88 usable PCIe lanes on the TRX50 platform. A quad 7900 XTX system will probably need that much CPU anyway, depending on the AI workload. We're still talking about performant, relatively lower-cost AI systems here, without going to a full-blown EPYC or Xeon server.

BTW, AM4 had 24 lanes, but AM5 has 28, and they are PCIe 5.0 capable. I don't know that a 7900 XTX needs more than PCIe 4.0 x8 bandwidth (or maybe there's a small bottleneck?), so at least a dual-GPU setup seems more than feasible to me, as 16 lanes minimum are dedicated to PCIe slots (see the sketch below).
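
For scale, per-direction PCIe bandwidth roughly doubles each generation, at about 1 GB/s per lane for Gen3, 2 GB/s for Gen4 and 4 GB/s for Gen5 (round figures, my assumptions):

```python
# Approximate per-direction PCIe bandwidth (GB/s) by generation and lane
# count; ~1/2/4 GB/s per lane for Gen3/4/5. Round, illustrative figures.
PER_LANE_GB_S = {3: 1.0, 4: 2.0, 5: 4.0}

def pcie_bandwidth(gen: int, lanes: int) -> float:
    return PER_LANE_GB_S[gen] * lanes

print(pcie_bandwidth(4, 8))    # ~16 GB/s: a 7900 XTX in a Gen4 x8 slot
print(pcie_bandwidth(4, 16))   # ~32 GB/s: full Gen4 x16
print(pcie_bandwidth(5, 8))    # ~32 GB/s: Gen5 x8 already matches Gen4 x16
```

So two cards at x8 each on an AM5 board with Gen5 slots still see as much link bandwidth apiece as a full Gen4 x16 card.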
 

abufrejoval

systemBuilder_49 said:
Except AMD is so stingy with PCIe lanes (only 24 on the 1000X–9000X CPUs) that this feature is USELESS to all but Threadripper customers. Nice one, AMD, democratizing AI for nothing but their richest customers!
Lanes come at a cost, actually a huge cost in terms of die area and power consumption.

AMD gives you options: make do with 16 lanes on APUs, 24–28 lanes on "desktop" SoCs, or plenty more with Threadripper and EPYC.

Not everyone wants to pay extra for extra lanes on lower tier SoCs.

And some may be able to make do without the full complement of lanes for every GPU: in GPU mining a single lane was good enough, while with LLMs even 64 PCIe v5 lanes may still be too slow to be useful.

In theory you could even employ PCIe switches, which is what those ASMedia chipset chips effectively are, too.

Whether you're stingy with your money or AMD is stingy on the lanes is a difference in perspective that complaining cannot bridge.