News $95 AMD CPU Becomes 16GB GPU to Run AI Software

> Unfortunately, he only provided demos for Stable Diffusion, an AI image generator based on text input. He doesn't detail how he got the Ryzen 5 4600G to work with the AI software on his Linux system.

Very typical in the home AI space: everyone is quick to show off their results, but nobody ever wants to detail exactly how they did it. That leaves the rest of us banging our heads against the keyboard over endless CUDA, PyTorch, and driver errors, never mind tracking down the correct commands, models, and file configurations.
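For anyone who wants to try reproducing it anyway, here's a minimal sketch of the setup commonly reported in community guides for Vega iGPUs like the 4600G's on Linux with a ROCm build of PyTorch. This is an assumption on my part, not the author's documented method; the override value is the usual community workaround for officially unsupported parts:

```python
# Sketch only: the commonly reported ROCm workaround for Vega iGPUs (gfx90c),
# not the article author's actual (undocumented) setup.
import os

# Tell the ROCm runtime to treat the unsupported iGPU as a gfx900 (Vega 10) part.
# Must be set before the ROCm runtime initializes, i.e. before importing torch.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "9.0.0")

import torch  # ROCm builds of PyTorch still expose the torch.cuda API

if torch.cuda.is_available():
    print("ROCm device:", torch.cuda.get_device_name(0))
else:
    print("No ROCm-visible GPU; check the amdgpu kernel driver and ROCm install.")
```

If that check passes, Automatic1111 or ComfyUI can usually be pointed at the same PyTorch install.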
 
Logically, the APU doesn't deliver the same performance as a high-end graphics card, but at least it won't run out of memory during AI workloads, as 16GB is plenty for non-serious tasks.
Stable Diffusion doesn't really run out of memory during AI workloads; at least the implementations I'm familiar with (Automatic1111 and ComfyUI) can work even on low-VRAM (≤4GB) GPUs by shuffling data between DRAM and VRAM, at a speed penalty. Kobold and similar programs can do the same for text generation, but in my experience the speed penalty there is so large that it isn't worthwhile.
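For reference, here's roughly what that DRAM-to-VRAM shuffling looks like if you drive Stable Diffusion through the Hugging Face diffusers library directly (a sketch under that assumption; the model ID and prompt are just examples):

```python
# Sketch of low-VRAM Stable Diffusion via diffusers: weights stay in system RAM
# and submodules are moved to the GPU only while they run, trading speed for memory.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model ID
    torch_dtype=torch.float16,
)

pipe.enable_sequential_cpu_offload()  # shuttle layers DRAM -> VRAM on demand (needs accelerate)
pipe.enable_attention_slicing()       # compute attention in chunks to cut peak VRAM

image = pipe("an astronaut riding a horse", num_inference_steps=20).images[0]
image.save("out.png")
```

That's exactly the trade-off: it finishes on small cards, just noticeably slower.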
 
This is a 0GB GPU. Shared RAM is nowhere near the same thing. It's like calling a 3.5in floppy disk a hard drive or a USB 2.0 flash drive an SSD.
When you set the RAM on an iGPU, you are reserving that amount of RAM specifically for the iGPU. That in turn means it is a 16GB GPU. Now you do not get the same performance as if it had its own VRAM due to shared bandwidth, but the frame buffer is the full 16GB.
 
That's exactly the HSA use case for which I bought my Kaveri, except that I didn't have ML in mind; I figured there might be a smarter spreadsheet that would use the 512 GPGPU cores for a recalc instead of 4 integer and 2 FP cores. The ability to switch between CPU and GPU code at the subroutine level was just mind-blowing, if only anyone had been able to exploit that potential!

The problem was that nobody really wanted to rewrite their spreadsheets or parametric design software to use GPGPU code back then. And while a unified memory space even allows GPUs to use "CPU RAM" today, the memory wall of reaching DRAM over PCIe would likely annihilate much of the benefit.

I hate to say it, because IMHO the last good Apple was a ][, but that is where the Mx designs, with their super-wide DDR RAM, could reopen the HSA potential, if only they were sold outside the walled iEnclave.

Even the current generation of consoles wouldn't do too badly, if only you could run Linux and ROCm on them...

APUs and GPGPU architectures won't conquer the leading edge, but their potential for usability towards the edge is huge. Too bad that's also where budgets are much smaller and trickle-down isn't a given.
 
Typically, 16GB is the maximum amount of memory you can dedicate to the iGPU. However, some user reports claim that certain ASRock AMD motherboards allow for higher memory allocation, rumored up to 64GB.

That is way more than I have ever seen a BIOS allow for iGPUs. I took a look; my Ryzen board only supports 2GB max.

But is simply allocating more memory in the BIOS really a "trick"? Is this like "one weird trick" or something? And is a simple BIOS setting change really worth a whole article? Was there an upgrade to CoreBOOT that I missed?
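For what it's worth, if someone does find a board that allows more, you can verify what the iGPU actually reports on Linux by reading the amdgpu sysfs nodes (a sketch; the card index may differ on your system):

```python
# Sketch: read the dedicated "VRAM" carve-out and the GTT (spill-over system
# memory) sizes that the amdgpu driver exposes, in GiB.
from pathlib import Path

card = Path("/sys/class/drm/card0/device")  # adjust card index if needed

for node in ("mem_info_vram_total", "mem_info_gtt_total"):
    path = card / node
    if path.exists():
        print(f"{node}: {int(path.read_text()) / 2**30:.1f} GiB")
    else:
        print(f"{node}: not exposed by this driver/kernel")
```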
 
We wonder if AMD's latest mobile Ryzen chips, like Phoenix that taps into DDR5 memory, can work and what kind of performance they bring.
It's worth noting that Phoenix and later desktop APUs would not only include substantially better iGPUs and DDR5 support, but also the XDNA AI accelerator. I don't know if it would be any faster than using the RDNA graphics, but it could allow you to use your APU to game while running Stable Diffusion or whatever on the accelerator at the same time.
 
It took me about 9 seconds to render a 512x512 image at 50 steps using Stable Diffusion.
The Stable Diffusion default for me is 512x512 at 20 steps, and that took 4 seconds.
Rendered on a GeForce 3080 Ti.
Impressive that it ran on a CPU... not impressive that it took thousands of percent longer!
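(For what it's worth, those two numbers are consistent with each other: 50 steps / 9 s ≈ 5.6 steps per second and 20 steps / 4 s = 5 steps per second, so the 3080 Ti is doing roughly 5-6 iterations per second either way.)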
 
There goes Tom's Hardware again. "You can turn a cheap CPU with iGPU into a GPU for AI with this one simple trick!" And it's just an OEM APU with 16 GB of slow DDR4.
 
If they could get it to allocate more than 16GB reliably, it could be quite good for some AI workloads, especially since many devs working on those programs are not willing to provide proper support for shared memory + dedicated VRAM. While shared memory causes a large performance drop in memory-hard computation, there are times when a user won't mind the slowdown: for example, tweaking some AI optical flow, then turning your 12GB card into a 24GB+ card so that it can churn through the optical flow on 4K footage while you go out for groceries or some other task.

Same for some Stable Diffusion models: many people would rather have them run 30-60% slower than not run at all, but many of those applications will not allow a spillover into shared system memory when there is not enough dedicated VRAM.
 
When you set the RAM on an iGPU, you are reserving that amount of RAM specifically for the iGPU. That in turn means it is a 16GB GPU. Now you do not get the same performance as if it had its own VRAM due to shared bandwidth, but the frame buffer is the full 16GB.
It still needs to access that RAM pool via the CPU, because the iGPU does not have its own memory interface.
And if you need to jump through the CPU to access RAM anyway, you may as well use a discrete GPU, so you have access to its internal memory plus an arbitrarily large pool of fenced-off RAM connected via the CPU.
 
It still needs to access that RAM pool via the CPU, because the iGPU does not have its own memory interface.
And if you need to jump through the CPU to access RAM anyway, you may as well use a discrete GPU, so you have access to its internal memory plus an arbitrarily large pool of fenced-off RAM connected via the CPU.
As far as I know the iGPUs have DMA to the RAM. Therefore it doesn't go through the CPU for access.
 
If the author or anyone else could provide information on which motherboards allow over 16GB (by linking to the discussion in question, for example), I would be very grateful, since I want to try some of the large-context large language models but I don't have the money for an A100 either. Since it's for learning, it's fine with me if I have to let the training process run for weeks. I am just trying to learn.
 
As far as I know the iGPUs have DMA to the RAM. Therefore it doesn't go through the CPU for access.
So can GPUs on the PCIe bus. But DMA is a logical process; in both cases the RAM is physically connected to the memory controller on the CPU die, and neither a dGPU nor an iGPU can access that RAM without going through the CPU's memory controller. This is not like Kaby Lake-G, where the GPU had its own independent memory controller.
 
There goes Tom's Hardware again. "You can turn a cheap CPU with iGPU into a GPU for AI with this one simple trick!" And it's just an OEM APU with 16 GB of slow DDR4.
There you go again, falling for the headline, when you should know it can't be quite true in the sense that you imagine...

Functionally, that's what's been done; in terms of actual usability it's like using Raspberry Pis for HPC, or arguing you can now run ChatGPT on one (while it's really just acting as a proxy).

You deserve to be disappointed if you made the mistake of getting your hopes up for something unrealistic.
 