Build Advice: 3 x 4U server rack 4090 colocation build, any advice?

Jul 3, 2024
Hello everyone.

I started a business last year that generates AI video content. By now the business has grown and the need for powerful GPU servers has grown with it. However, the revenues aren't yet at a point where I could simply rent or buy powerful and overpriced NVIDIA enterprise GPUs to run my operation, yet the GPU power is needed to lower video processing time. I have therefore decided to build consumer-hardware setups and send them to colocation. I can NOT deal with unreliable cloud GPUs anymore. We currently use the automated on-demand service from paperspace.com, but fairly often they have issues due to a lack of GPUs. It's getting ridiculous, actually... and not to mention we are paying them about 1,500-2,000 EUR per month. Since reputable datacenters can't offer consumer cards (thanks, NVIDIA), customers like me are forced to pay high prices for enterprise cards. Can I fix this with consumer hardware? Maybe...

I am in no way a PC-building expert; I actually built my first computer ever last year, for home use only: training models 24/7. No other use, I rock a laptop for everything else.
mypc2.jpg



Specs:
i7-13700KF
Zotac 4090 (5-year warranty)
Vengeance 128 GB RAM (4 sticks), running stable at 5200 MHz; 5600 MHz resulted in occasional blue screens
Asus Prime Z790-P WiFi
DeepCool AK620
Crucial 4 TB NVMe
1200 W PSU
5x 140 mm intake fans, 1 exhaust fan
Can't remember the name of the case, but I can find out if someone needs it.

Now, almost a year later, the thing is rock solid. No issues; it's been working pretty much the full year with no breaks, just a restart a few times per month. I understand some parts are proper overkill, like the PSU and fans. During training the GPU doesn't go over 64 C, and after limiting it to 400 W it's usually even below 60 C. The thing I hate about it is the electricity bill: this PC adds an extra 150-200 EUR per month, as electricity is rather expensive in this part of the EU. Not to mention the need for AC during summer, as it does heat up the room.
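For anyone who wants to replicate the 400 W cap, something like the sketch below works from a startup script. It assumes one GPU at index 0, nvidia-smi on the PATH and admin/root rights; it's an illustration, not my exact setup.

```python
# Minimal sketch: apply a 400 W power cap at boot.
# Assumes nvidia-smi is on the PATH, one GPU at index 0, and admin/root rights.
import subprocess

GPU_INDEX = "0"
LIMIT_WATTS = "400"  # the cap discussed above

# Persistence mode keeps the setting applied until reboot (Linux; harmless to skip on Windows).
subprocess.run(["nvidia-smi", "-i", GPU_INDEX, "-pm", "1"], check=False)

# Set the board power limit in watts.
subprocess.run(["nvidia-smi", "-i", GPU_INDEX, "-pl", LIMIT_WATTS], check=True)
```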

So, after crunching the numbers, I figured that building 3 consumer-hardware GPU servers and sending them to colocation will actually pay for itself after ONLY 11 months, and that INCLUDES the cost of the hardware. If I decide to sell the servers after 1 year, that would be extra profit.

This is what I have now:
3server.jpg


What are they exactly?
4U Unykach UK4329 19" rack case - 101 EUR each
i7-14700KF - 325 EUR each
be quiet! Dark Rock TF 2 - 71 EUR each
MSI Pro Z790-P WiFi - 170 EUR each
Crucial Pro 64 GB DDR5 5600 MHz - 152 EUR each
Lexar NM790 4 TB NVMe - 209 EUR each
MSI 4090 Slim - 1,537 EUR each
be quiet! Pure Power 12 M 850 W Gold - 103 EUR each
be quiet! 12V-2x6/12VHPWR 90° cable - 10 EUR each
2x Noctua NF-F12 iPPC-3000 PWM 120 mm - 22 EUR each
3x Noctua NF-A8 PWM 80 mm - 15 EUR each
Missing rails!

Total price per server: 2,767 EUR (not including VAT)

Some explanations:
Dark Rock TF 2 - I could have saved some money here, but sadly most 4U cases can't take a CPU cooler taller than 150 mm, and guess what, most coolers are. I didn't want to take the risk. The TF 2 is 144 mm and fits well with quite a lot of room to spare.
MSI 4090 Slim - same issue as above: I could have bought cheaper options, but this is one of the smallest 4090s out there. Sadly I could not get my hands on any FEs.
12V-2x6/12VHPWR 90° cable - you can't build a 4U with a 4090 without this thing:
size.png


Still missing the PSUs; they should arrive today!

Sadly, I can NOT run these at home due to the noise, heat and electricity cost. BUT not all EU countries have high electricity prices yet; I found a Tier III datacenter with an SLA in the Czech Republic where I can send my servers for 149 EUR per month. This price includes:

- Electricity based on the PSU rating (850 W); with a 700 W PSU the price would be 129 EUR.
- Unlimited traffic on a 100 Mbps link, or 10 TB on 1 Gbps.
- 24-hour support and assistance with emergency part replacement, SLA response from 10 minutes.
- 1 IPv4 address.
- A warehouse in the datacenter to store your spare parts, tools and accessories.
- Reception of courier deliveries and storage of equipment.
- Writing ISO images to disk/flash drive without restrictions.
- UPS battery system and diesel generator.
- Repair/work for 50 EUR per hour.

By using this solution, I will have 3 bare-metal 4090 servers ready for processing videos, for a total cost of about 13k EUR per year.
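Rough first-year arithmetic behind that figure, assuming the 149 EUR/month colo price is per server and ignoring VAT, shipping and my existing "brain" server:

```python
# Rough first-year cost check.
# Assumptions: the 149 EUR/month colo price is per server; VAT and shipping are ignored.
servers = 3
hardware_per_server = 2767           # EUR, total from the parts list above
colo_per_server_month = 149          # EUR per month

hardware = servers * hardware_per_server            # 8,301 EUR
colo_year = servers * colo_per_server_month * 12    # 5,364 EUR
print(hardware + colo_year)                         # ~13,700 EUR for year one, i.e. the "13k" ballpark
```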

These servers don't really need fast internet or much security; they will only communicate with my ''brain'' server, which gives them jobs. A job: process this video using this model and upload the result to cloud storage. All model files will be placed on the servers, so the servers won't have to download them. I won't even log in to them often. All 3 will be kept in sync.
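To give an idea of how simple the workers are, the loop on each server is roughly the sketch below. The endpoint names, worker ID and the process_video.py script are made up for illustration; the real brain-server API will differ.

```python
# Rough sketch of a worker's job loop (names and endpoints are illustrative only).
import time
import subprocess
import requests

BRAIN_URL = "http://brain.internal:8000"   # hypothetical brain-server address
WORKER_ID = "colo-4090-01"

while True:
    # Ask the brain server for the next job assigned to this worker.
    resp = requests.get(f"{BRAIN_URL}/next-job", params={"worker": WORKER_ID}, timeout=30)
    if resp.status_code == 204:            # nothing to do right now
        time.sleep(15)
        continue
    job = resp.json()                      # e.g. {"id": ..., "input_url": ..., "model": ..., "output_url": ...}

    # Run the local processing script against the model files already stored on the NVMe.
    subprocess.run(
        ["python", "process_video.py",
         "--input", job["input_url"],
         "--model", job["model"],
         "--upload-to", job["output_url"]],  # result goes straight to cloud storage
        check=True,
    )

    # Report completion back to the brain server.
    requests.post(f"{BRAIN_URL}/jobs/{job['id']}/done", timeout=30)
```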

I have never used colocation before and I am wondering if someone has advice for me if I forgot or missed something with these builds? Or should I add something?
Each of them has a 4 TB NVMe, which should be good for some time, but what if I need to add more storage? What would be the smart solution? Or maybe I should throw a few SATA SSDs inside them already? How reliable is Windows network file sharing? Maybe I could just add SSDs to one of them and have the other 2 access the files over the network. Or is there some smarter solution for this? I understand my combo lacks PCIe lanes, so adding a second NVMe might limit the speed of the first one. Or maybe the smartest option would be to just wait until I actually need more storage and then send a simple 1U NAS server there?
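If I go the Windows file-sharing route, the other two boxes would just read the share over a UNC path before a job runs; a minimal sketch below (the host, share name and paths are placeholders):

```python
# Sketch of the file-sharing option: one server exposes its SATA SSDs as a Windows share,
# the other two copy what they need over UNC paths. Host, share and paths are placeholders.
import shutil
from pathlib import Path

SHARE = Path(r"\\SERVER1\models")      # hypothetical share on the box holding the SSDs
LOCAL_CACHE = Path(r"D:\cache")        # local NVMe cache on the worker

def fetch_model(name: str) -> Path:
    """Copy a model file from the shared SSDs to local NVMe if it isn't cached yet."""
    src = SHARE / name
    dst = LOCAL_CACHE / name
    if not dst.exists():
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)         # plain SMB copy; the network link is the bottleneck
    return dst
```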

All my model files are backed up in the cloud, and a pre-configured OS with all the libs and code is on a flash drive. Nothing important is on the servers if I lose an NVMe; they are just for processing.


Thank you for reading my long post. I am open to questions and suggestions.
 
I've used these cases before, not your exact one but the same layout with dual 80 mm fans in the back, and they just can't get the heat out of the case.

I switched to this case LINK for my latest build and have had no cooling issues. The only issue I have is that if the fans go to full speed, they sound like a server. I set a fan curve on them so they never spin over 50%, but there is enough mesh in the back of the case that the fans just force the air out the back. But if you're putting these in a datacenter, I would set the fan curve a lot higher to keep everything nice and cool.

RackChoice 4U Server Chassis
Asus Prime X670-P WiFi
Ryzen 7 7700X
G.Skill Trident Z5 6000 MT/s CL30
Noctua NH-D12L Low-Height Dual Tower
Asus TUF 4070 Ti Super OC
Sound Blaster ZxR
Asus ROG Thor 1200W Platinum II

The case also gives you the option of 8 drive bays in the front, if your motherboard has the SATA ports to support them. I only plugged in 4 drives on mine, but I have the option for 2 more, or I can add a PCIe card to support the other 4.


My last build was in a Silverstone RM42-502 4U case with an 8700K and a 2080 Ti, and I always fought cooling issues; at one point I actually just removed the lid of the case to get better cooling.
 
Ran thermal tests on all 3, no issues whatsoever (the 2x 120s do most of the work). The CPU gets slightly hot during Cinebench 2024, but after lowering the PLs and setting one of the 3,000 rpm 120 mm Noctua fans to go vroooom when the CPU is hot, it stayed within range and didn't throttle. I care less about the CPU anyway; I ran a 30-minute full-power test on the GPU and it didn't go over 64 C (same range as my desktop home PC, which has 5x 140 mm fans). I should mention that the room I've been running these tests in is 29 C at the moment. Brute-forcing the fans helps with temps. These industrial Noctuas are loud though; I could not see anyone being able to concentrate once they go over 50% rpm.
 
Here are the thermals for the CPU.

Room temp:
temp1.jpg


CPU - PL1 150 W, PL2 200 W
temp2.jpg


Cinebench 2024 scores:
Auto PL: 1880 (throttles a lot)
My PL1/PL2: 1740
The GPU is fine at 90% PL, staying at 60-65 C in a 30 C room (35k score with no PL, 34k at 90% PL) over a 12-hour test.
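For anyone repeating these long runs, a simple temperature/power logger is enough. A minimal sketch, assuming the nvidia-ml-py package (pip install nvidia-ml-py) and one GPU at index 0:

```python
# Minimal GPU temp/power logger for long soak tests.
# Assumes nvidia-ml-py is installed and there is one GPU at index 0.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

with open("gpu_log.csv", "w") as log:
    log.write("time,temp_c,power_w\n")
    while True:
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        power = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports milliwatts
        log.write(f"{time.time():.0f},{temp},{power:.0f}\n")
        log.flush()
        time.sleep(10)
```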