This is a big misconception: you don't need a super fast GPU for inference unless you're selling that inference. And even then, the first and only thing that matters for inference is VRAM capacity, and that's it. There is no point in a super fast machine if it can only go a couple of miles on a full tank. But research and development people running inference workloads are certainly buying 4090s/4080s, and they will buy the 5090 too.
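A rough back-of-envelope sketch of the "full tank" point (my own illustrative numbers, not anything from the review): the first question is whether the weights fit at all, and no amount of compute helps if they don't.

```python
# Sketch: does a quantized model even fit in VRAM?
# All numbers here are illustrative assumptions, not measurements.

def model_vram_gb(params_billion: float, bits_per_weight: float,
                  overhead_gb: float = 2.0) -> float:
    """Approximate VRAM need: weight bytes plus a flat allowance for KV cache/activations."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# A ~70B model at ~4.5 bits/weight (roughly what llama.cpp Q4 formats use) vs a 24 GB card:
need = model_vram_gb(70, 4.5)
print(f"~{need:.0f} GB needed vs 24 GB available -> fits: {need <= 24}")
# Capacity, not speed, is the gate: if the weights don't fit, the card's compute is irrelevant.
```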
Yes, these people will buy them by the pallet and put 4-8 of them in a chassis, and the dual-slot design will help a lot with that. It's the mining fever all over again. But this has nothing to do with ordinary people, and it does not mean the GPU deserves 5 stars. You forget that AI has not yet turned a profit for anyone, including the buyers of H100/H200 cards who are now renting out their capacity at a loss - you can rent an H100 for a year for about $2,000. For home inference this is the stupidest purchase - you just need a GPU with at least 20 GB of VRAM, and that's it: they are all supported by llama.cpp and run at roughly the same speed.
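To illustrate why cards with enough VRAM end up at roughly the same speed (a rough estimate of mine, not a benchmark): single-stream token generation is mostly memory-bandwidth-bound, so tokens per second is approximately bandwidth divided by model size.

```python
# Rough upper-bound estimate: each generated token reads the weights roughly once,
# so tokens/s ~= memory bandwidth / model size. Illustrative numbers only.

model_gb = 18  # hypothetical ~30B-class model quantized to ~4-5 bits per weight

cards = [
    ("RTX 3090 (~936 GB/s)", 936),
    ("RTX 4090 (~1008 GB/s)", 1008),
]

for name, bw_gbs in cards:
    print(f"{name}: ~{bw_gbs / model_gb:.0f} tokens/s upper bound")
```

On this kind of estimate the two cards land within about 10% of each other, which is why the extra compute of the faster card barely shows up in local LLM inference.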