
Question: Advice on a motherboard/case capable of accommodating 4x RTX 4090 GPUs?

pjw

Distinguished
Feb 17, 2009
I'm interested in building a system initially with 2x 4090 GPUs, but with the ability to add two more later if the need arises. This is for running LLMs and perhaps fine-tuning small ones.

So far I have found a very limited number of motherboards that have more than 3 PCIe slots, and of those, the third is often only an x1 slot.

Also, those motherboards have closely spaced slots, which would force me to mount the 4090s elsewhere in the case.

I have no experience of mounting cards elsewhere in a case, and no idea what features to look for.

Any advice about a case, a motherboard, and how to mount cards elsewhere would be greatly appreciated!

As a secondary benefit, a quieter solution would be good, but everything I have read suggests GPUs work better with air cooling, so any and all advice is welcome!
 
My recommendation is to use Newegg's filters to find a motherboard with that spec. In a quick search just now, it appears this requirement is going to push you towards AMD, but I did not dig particularly deep into it.

I also agree that there are going to be some issues with mounting everything in a case. You may find a more suitable solution in one of the mining-rig-style open frames/benches for a requirement this specific.
 
Thanks for the reply. I'll have a close look at the mining cases, but a lot seem to be open-air frames, making them (I assume) very noisy.

The Newegg filters help, but once I get to 4 slots I get into seriously overpowered CPUs (for my needs). As I understand it, for LLMs the CPU is not a priority, since all the work is done on the GPU; a decent CPU and system RAM of about 1.5x the total GPU memory seem to be the main requirements.
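Quick back-of-the-envelope for that 1.5x rule of thumb (a sketch in Python; the 24 GB-per-4090 figure is real, but the 1.5x multiplier is just the heuristic mentioned here, not a hard requirement):

```python
# Sketch of the "system RAM ~= 1.5x total VRAM" rule of thumb from this thread.
def ram_target_gb(gpus, vram_per_gpu_gb=24, multiplier=1.5):
    """Suggested system RAM in GB for a given number of RTX 4090s."""
    return gpus * vram_per_gpu_gb * multiplier

print(ram_target_gb(2))  # 72.0 GB -> round up to a 96 GB kit
print(ram_target_gb(4))  # 144.0 GB -> round up to a 192 GB kit
```

In practice you'd round up to the next standard kit size, so 2 GPUs points at 96 GB and 4 GPUs at 192 GB.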
 
I know I'm late to this discussion, but I'm new to the forum and just came across this.

As someone who has messed around a lot with LLMs and is currently putting together a dual-GPU build specifically for this purpose, I want to point out that it is a myth that the CPU does not matter for LLMs.

For one, if you have a fast, high core-count CPU, you can run larger models by using CPU inference for the layers that don't fit into VRAM. For example, with 256GB of DDR5 and a Threadripper 7000-series CPU you could run an unquantized 70B model using the GPU for some layers and CPU/RAM for the rest. It would be slow, but still usable (a few tokens/sec).
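To make the split concrete, here's a naive sketch (all numbers are illustrative assumptions: ~80 transformer layers for a 70B model, fp16 at 2 bytes/param, 48 GB total VRAM from 2x 4090):

```python
# Rough GPU/CPU layer split for partial offloading (assumed numbers, not benchmarks).
def layer_split(params_b=70, bytes_per_param=2, n_layers=80, vram_gb=48):
    weights_gb = params_b * bytes_per_param   # ~140 GB of fp16 weights
    gb_per_layer = weights_gb / n_layers      # ~1.75 GB per layer
    gpu_layers = int(vram_gb // gb_per_layer) # layers that fit in VRAM
    return gpu_layers, n_layers - gpu_layers  # (on GPU, on CPU/RAM)

print(layer_split())  # (27, 53) with these assumptions
```

In reality some VRAM also goes to the KV cache and activations, so you'd offload a few fewer layers to the GPUs than this naive split suggests.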

Also, once you start running 4x 4090 cards, you are going to run into the PCIe limitations of consumer CPUs, and you will appreciate how higher-end CPUs like the Threadripper Pro 7000-series have the lanes to handle traffic between GPUs (especially since NVLink isn't possible with the RTX 4090).

Finally, if you are going to be doing any fine-tuning or training of your own LLM models, rather than pure inference, the pre-processing stage of training is ALL about CPU/RAM. When it comes to processing and moving around terabytes of text data, you're going to appreciate a fast CPU.
 
> it is a myth that CPU does not matter for LLMs

Fair enough, but as soon as you offload any layers to the CPU your performance plummets.

The vast bulk of my work will be inference on LLMs tuned elsewhere, so for me there's less need for CPU. That's the basis of "CPU not mattering": I agree it still matters to have a decent CPU for pre/post-processing, but there's not a lot of use for 64 cores and 128 threads, especially in my case (100% inference).
 
You won't find a standard case that can mount 4x RTX 4090s unless they're water cooled. I'm not sure how much PCIe bandwidth matters, but if it does, you cannot use a desktop CPU for this. The lane counts are extremely limited and you won't be able to connect all four cards directly to the CPU. If PCIe bandwidth doesn't matter, then you can look for a board where the additional slots run at x4, though these are usually fairly expensive.
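The lane math is simple enough to sketch (lane counts below are the commonly cited ballpark figures for each platform; check the exact CPU spec sheet):

```python
# Hypothetical PCIe lane budget check; lane counts are assumptions, verify per CPU.
def lanes_short(gpus=4, lanes_per_gpu=16, cpu_lanes=24):
    """How many CPU PCIe lanes you'd be short of for full x16 links."""
    needed = gpus * lanes_per_gpu
    return max(0, needed - cpu_lanes)

print(lanes_short(cpu_lanes=24))   # ~desktop Ryzen/Core class: 40 lanes short
print(lanes_short(cpu_lanes=128))  # ~Threadripper Pro class: 0 lanes short
```

Four cards at x16 want 64 lanes, which is why this workload pushes you off desktop platforms even before you hit the physical slot-spacing problem.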
 
> it is a myth that CPU does not matter for LLMs

Fair enough, but as soon as you offload any layers to the CPU your performance plummets.
Yes, it is absolutely the case that models that fit entirely into VRAM will be at least an order of magnitude faster than those where you're having to do CPU inference. And if all you're doing is inference, then you're right that having lots of VRAM is probably the most important thing, at least for 70B and smaller models. However, if you want to run really large models (e.g. Mixtral 8x22B, Miqu 120B, Command R+, etc.) without having to buy hundreds of GB of VRAM, then a powerful CPU and fast RAM (ideally 8 channels) will let you run these huge 100B+ models at usable speeds (3-10 tok/sec).
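A quick sanity check on that 3-10 tok/sec figure (a sketch; the numbers plugged in are assumptions): CPU inference on a dense model is roughly memory-bandwidth bound, so an upper bound on speed is bandwidth divided by the size of the quantized weights read per token.

```python
# Crude bandwidth-bound estimate of CPU inference speed (assumed inputs:
# 8-channel DDR5-4800, a 120B model quantized to ~4.5 bits per parameter).
def est_tok_per_sec(channels=8, mts=4800, params_b=120, bits_per_param=4.5):
    bandwidth_gbs = channels * mts * 8 / 1000  # 8 bytes/transfer per channel
    weights_gb = params_b * bits_per_param / 8 # quantized weight footprint
    return bandwidth_gbs / weights_gb          # tokens/sec upper bound

print(round(est_tok_per_sec(), 1))  # ~4.6 tok/s with these assumptions
```

That lands right in the 3-10 tok/sec range, and it also shows why 8 channels matters: a dual-channel desktop board has a quarter of the bandwidth and therefore roughly a quarter of the CPU-inference speed.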

To answer your original question though, you might want to look into the WRX90E-SAGE motherboard as one that has enough PCIe slots for 4 GPUs (although, as TheStryker pointed out, you won't have space for 4 GPUs unless they are water cooled). It also has 8 channels of DDR5 RAM, which is double the memory bandwidth of a quad-channel board.

I just built a 2x RTX 4090 system on the WRX90E-SAGE, and both cards are air cooled. Later, when I want to expand to 4x 4090s (when prices drop after the release of the 5090), I will convert them all to water cooling so that I can fit all 4 in the case. If you go with the WRX90E-SAGE, though, it will require the purchase of a Threadripper Pro CPU.
 