Question: Best value GPU with 27+GB VRAM for running LLMs?

Zork283

My use case: I have LLMs running on my GPU using the method described in this Reddit post. I want to run WizardLM-30B, which requires 27GB of memory. (I also have a quantized version with lower memory requirements, but I don't have an exact figure for it; I just know I can fit 40 of its 61 layers on my 16GB GPU, which implies I would need roughly 25GB to run it entirely on the GPU.) I cannot currently fit the model entirely on my Arc A770 16GB, so the rest of it runs from system RAM on the CPU, which slows the model down considerably. I would like a GPU with enough memory to hold the whole model, but every GPU I have found with more than 24GB is extremely expensive. The 24GB Radeon RX 7900 XTX is a thousand dollars, and I haven't been able to find anything with more memory for even double that price.
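
For reference, here is the back-of-envelope math behind that ~25GB figure. The layer counts are from my setup; the headroom allowance for context is just a guess on my part, not a measured number.

```python
# Rough VRAM estimate for running the quantized model entirely on GPU.
# Assumption: weights are spread roughly evenly across layers, so memory
# scales linearly with the number of layers offloaded.

total_layers = 61        # layers in the quantized WizardLM-30B
layers_that_fit = 40     # layers that fit on the 16GB Arc A770
vram_used_gb = 16        # VRAM those 40 layers occupy

per_layer_gb = vram_used_gb / layers_that_fit
full_model_gb = per_layer_gb * total_layers
print(f"all layers: ~{full_model_gb:.1f} GB")         # ~24.4 GB

# Allow some headroom for context/KV cache and scratch buffers
# (the 2 GB figure is a guess, not a measurement).
print(f"with headroom: ~{full_model_gb + 2:.1f} GB")  # ~26.4 GB
```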

Of course, this entire issue could be avoided if there were a way to split the LLM across multiple GPUs; then I could just get a second A770, which seems to offer very good VRAM per dollar. Unfortunately, I have not been able to find any resources on this. The closest I found is this post describing my exact issue, and it does not say whether it is possible: https://datascience.stackexchange.com/questions/121639/load-an-llm-in-multiple-gpus
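
Something like the sketch below is what I'm hoping is possible. llama-cpp-python does expose a tensor_split parameter, but I haven't confirmed whether it works across two Arc cards, so treat this as a hypothetical illustration; the model file name is made up.

```python
# Hypothetical sketch of a two-GPU split with llama-cpp-python.
# Whether the backend (a SYCL/Vulkan build of llama.cpp) actually
# uses two Arc A770s is an assumption on my part.
from llama_cpp import Llama

llm = Llama(
    model_path="wizardlm-30b.q4_k_m.gguf",  # hypothetical quantized file
    n_gpu_layers=-1,          # try to offload every layer to GPU
    tensor_split=[0.5, 0.5],  # split tensors roughly evenly across two cards
    n_ctx=2048,
)

out = llm("Q: Which GPU offers the most VRAM per dollar?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```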

Here is a thread about 24GB GPUs:
https://forums.tomshardware.com/thr...n-gaming-graphics-card.3806569/#post-22998565

Here is my previous thread on this topic, before I refined my requirements.

Does anyone know an affordable GPU with at least 27GB of VRAM?
 
Nothing from Nvidia on PCIe went past 24GB until Volta, which will still cost thousands unless you have a platform that can utilize OAM modules. Anything after that will cost even more. The P40 and M40 are probably the cheapest 24GB cards you could buy. AMD has had a few 32GB cards that you might be able to find, but most will still cost at least a thousand. I think the S9170 might be the only remotely affordable one.
 

Zork283

The FirePro S9170 does have enough VRAM, but it is an 8-year-old card on a 28nm process with only about 5 TFLOPS. Is that powerful enough for this use case?
 

Zork283

I have no clue, as I have no interest in LLMs or anything of the sort. You'd just asked about cards with an arbitrary VRAM amount, and PC hardware interests me.
Thank you. I appreciate your candor.

There is none. 24GB is the most VRAM you will find for now. You are looking at GAMING GPUs for AI and are then surprised that none fits your needs.
I was also looking at workstation and server GPUs, but all of the ones I looked at were well over three thousand dollars. The S9170 is the first one I have seen with enough memory that doesn't cost more than my entire rig.
 
Is this an absolute requirement, or is this something you'd simply like because it supposedly improves performance?

Because I'm pretty sure that while the amount of memory is important, the performance of the thing actually doing the processing is even more important.
 
Thank you. I appreciate your candor.


I was also looking at workstation and server GPUs, but all of the ones I looked at were well over three thousand dollars. The S9170 is the first one I have seen with enough memory that doesn't cost more than my entire rig.
And there's the rub... If you want a card with more than 24GB of VRAM, you're going to have to spend the money for a professional workstation card, and those make the RTX 4090 look like a bargain.
 

Zork283

Is this an absolute requirement, or is this something you'd simply like because it supposedly improves performance?

Because I'm pretty sure that while the amount of memory is important, the performance of the thing actually doing the processing is even more important.

If the GPU does not have enough VRAM to fit the entire model, then some of it must run from main system memory and use the CPU for processing rather than the GPU. This greatly slows down the LLM, as the CPU/system RAM becomes the bottleneck.
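
As a rough illustration of why: generating a token streams essentially the whole model's weights once, so the layers left in system RAM run at RAM speed rather than VRAM speed. The bandwidth figures below are assumed round numbers for illustration, not measurements from my machine.

```python
# Back-of-envelope: why partial CPU offload dominates time per token.
# All bandwidth figures below are assumed round numbers, not measurements.

model_gb = 25.0           # approx. size of the quantized 30B model
gpu_fraction = 40 / 61    # share of layers that fit on the 16GB card
cpu_fraction = 1 - gpu_fraction

gpu_bw_gbs = 500.0        # rough VRAM bandwidth of an A770-class card
cpu_bw_gbs = 50.0         # rough dual-channel system RAM bandwidth

# Generating one token streams roughly the whole model's weights once.
t_gpu_only = model_gb / gpu_bw_gbs
t_split = (model_gb * gpu_fraction) / gpu_bw_gbs + (model_gb * cpu_fraction) / cpu_bw_gbs

print(f"all on GPU : ~{t_gpu_only * 1000:.0f} ms/token")   # ~50 ms
print(f"40/61 split: ~{t_split * 1000:.0f} ms/token")      # ~200 ms
```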
 
