Hello,
I would appreciate some guidance on what hardware (GPU or otherwise) I should purchase to enable me to run LLMs locally on my machine.
Here are my system specs.
CPU: AMD Ryzen 9 7950X
Motherboard: ASRock X670E PG Lightning (AM5, ATX)
RAM: 64 GB DDR5-5200 (2× 32 GB DIMMs, leaving two slots free for expansion)
Case: Fractal Design Define 7
PSU: 1000W
GPU: Arc A770 16GB
As you can see by the oversized PSU and case, I have plenty of room for expansion.
I will start by describing my use case and then go from there:
I installed Nous-Hermes-13B-GGML & WizardLM-30B-GGML using the instructions in this reddit post. The main limitation on running a model on a GPU seems to be VRAM. Nous-Hermes-13B-GGML requires 12.26 GB, and I am able to offload the entire model to my A770, which makes it run much faster than when even some of its layers are left on the CPU. I have absolutely no complaints about how Nous-Hermes-13B-GGML runs; the model itself, however, clearly has limitations. WizardLM-30B-GGML requires 27 GB, so I can only send 40 of its 63 layers to the GPU. It runs very slowly, outputting only about 1 token per second, and the program crashes frequently, seemingly whenever any video demands are put on the GPU (even just generating basic screen output). As a result, WizardLM-30B-GGML takes several minutes to post a single reply.
Additionally, I would like to be able to run even larger LLMs as they become available, so I may just hold off on buying anything for now and wait until there is an open-source model that can rival GPT-4. Even then, I would still need to know how to choose components to handle the workload.
When launching koboldcpp.exe to run the LLM, I choose 16 threads, since the Ryzen 9 7950X has 16 physical cores, and select my only GPU to offload the 40 layers to. With 40 layers offloaded, VRAM usage on my GPU is 15.1 GB; while the model is running, however, GPU usage is typically only around 48%-55%.
At this point, I am going to just share my thoughts and musings on this subject, as I do not have any good answers.
I assume that I would need a GPU with more than 27GB of VRAM to run the whole model, but I have not seen any GPU with that much VRAM that isn't insanely expensive. If it is possible to divide the workload among multiple GPUs, then I could just get another A770 16GB because it seems to have very good VRAM per dollar.
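A rough way to size this, extrapolating from my own numbers above (15.1 GB of VRAM for 40 of 63 layers): this is only a back-of-envelope sketch that ignores KV-cache growth with context length and any per-card overhead, but it suggests the full model's layers would fit across two 16 GB cards if the backend can split them.

```python
# Back-of-envelope estimate from the observed figures above:
# 40 offloaded WizardLM-30B-GGML layers used 15.1 GB on the A770.
# Ignores KV-cache/context growth and per-card overhead, so treat as rough.
observed_layers = 40
observed_vram_gb = 15.1
total_layers = 63

gb_per_layer = observed_vram_gb / observed_layers      # ~0.38 GB per layer
full_offload_gb = gb_per_layer * total_layers          # ~23.8 GB for all 63 layers
layers_per_16gb_card = int(16 / gb_per_layer)          # ~42 layers per 16 GB card

print(f"{gb_per_layer:.2f} GB/layer, "
      f"{full_offload_gb:.1f} GB for full offload, "
      f"{layers_per_16gb_card} layers per 16 GB card")
```

By this estimate, two A770s (32 GB combined) would have headroom for all 63 layers, though whether that actually works depends on the backend supporting layer splitting across multiple GPUs.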
Are GPUs even the way to go for this workload? I remember a Veritasium video featuring a company called Mythic AI that was making analogue components for running Neural Networks efficiently. I went to their website, but didn't see any way to buy any of their products.
Any advice or suggestions on the subject would be appreciated. Thank you in advance.