News $95 AMD CPU Becomes 16GB GPU to Run AI Software


Firestone

Distinguished
Jul 11, 2015
101
20
18,585
> Unfortunately, he only provided demos for Stable Diffusion, an AI image generator based on text input. He doesn't detail how he got the Ryzen 5 4600G to work with the AI software on his Linux system.

Very typical in the home AI space; everyone is quick to show off their results, but no one ever wants to detail exactly how they did it, leaving the rest of us to bang our heads against the keyboard with endless CUDA, PyTorch, and driver errors, let alone having the correct commands, models, and file configurations.
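For what it's worth, here is a minimal sketch of how people often attempt this on Linux, assuming the ROCm build of PyTorch and Hugging Face's diffusers library are installed; the HSA_OVERRIDE_GFX_VERSION value and the model ID are assumptions for a Vega-based APU like the 4600G, not steps confirmed by the article:

Code:
# Set the gfx override before torch initializes the ROCm runtime (assumed value
# for an unsupported Vega iGPU target).
import os
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "9.0.0")

import torch
from diffusers import StableDiffusionPipeline

# ROCm exposes the APU through PyTorch's "cuda" device alias.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a cat wearing a spacesuit", num_inference_steps=20).images[0]
image.save("output.png")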
 

Ewitte

Distinguished
Nov 1, 2015
46
1
18,545
This is a 0GB GPU. Shared RAM is nowhere near the same thing. It's like calling a 3.5in floppy disk a hard drive or a USB 2.0 flash drive an SSD.
 
  • Like
Reactions: RedBear87

RedBear87

Commendable
Dec 1, 2021
150
114
1,760
Logically, the APU doesn't deliver the same performance as a high-end graphics card, but at least it won't run out of memory during AI workloads, as 16GB is plenty for non-serious tasks.
Stable Diffusion doesn't really run out of memory during AI workloads; at least the implementations I'm familiar with (Automatic1111 and ComfyUI) can work even with low-VRAM (≤4GB) GPUs, at a speed penalty, by moving data between DRAM and VRAM. Kobold and similar programs can do the same with text generation, but in my experience the speed penalty is so large that it isn't worthwhile.
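For the curious, a minimal sketch of that same offloading idea using the diffusers library, assuming it and a CUDA or ROCm build of PyTorch are installed (the model ID is illustrative):

Code:
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Roughly analogous to Automatic1111's --medvram mode: keep weights in system
# RAM and move each sub-model to the GPU only while it is needed, trading
# speed for a much smaller VRAM footprint.
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()

image = pipe("a lighthouse in a storm", num_inference_steps=20).images[0]
image.save("lowvram.png")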
 
Last edited:
This is a 0GB GPU. Shared RAM is nowhere near the same thing. It's like calling a 3.5in floppy disk a hard drive or a USB 2.0 flash drive an SSD.
When you set the RAM allocation for an iGPU, you are reserving that amount of RAM specifically for the iGPU. That in turn means it is a 16GB GPU. You do not get the same performance as if it had its own VRAM, due to shared bandwidth, but the frame buffer is the full 16GB.
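One way to sanity-check that, assuming a ROCm or CUDA build of PyTorch can see the iGPU, is to ask the driver what it reports; on an APU the figure should roughly match the UMA frame buffer size set in the BIOS:

Code:
import torch

# Print whatever the driver exposes as dedicated device memory.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB reported")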
 

abufrejoval

Reputable
Jun 19, 2020
615
454
5,260
That's exactly the HSA use case for which I bought my Kaveri, except that I didn't have ML in mind but figured there might be a smarter spreadsheet that would use 512 GPGPU cores to do a recalc instead of 4 integer and 2 FP cores. The ability to switch between CPU and GPU code at the subroutine level was just mind-blowing, if only anyone had been able to exploit that potential!

The problem was that nobody really wanted to rewrite their spreadsheets or parametric design software to actually use GPGPU code back then. And while the unified memory space even allows GPUs to use "CPU-RAM" today, the memory wall in front of DRAM over PCIe would likely annihilate much of the effort.

I hate to say it, because IMHO the last good Apple was a ][, but that is where the Mx designs with their super-wide DDR RAM could reopen the HSA potential, if only they were sold outside the walled iEnclave.

Even the current generations of consoles wouldn't do too badly, if only you could run Linux and ROCm on them...

APUs or GPGPU architectures won't conquer the leading edge, but their potential for usability towards the edge is huge. Too bad that's also where budgets are much smaller and trickle-down isn't that natural.
 
  • Like
Reactions: gg83

ezst036

Honorable
Oct 5, 2018
766
643
12,420
Typically, 16GB is the maximum amount of memory you can dedicate to the iGPU. However, some user reports claim that certain ASRock AMD motherboards allow for higher memory allocation, rumored up to 64GB.

That is way more than I have ever seen a BIOS allow for iGPUs. I took a look; my Ryzen board only supports 2GB max.

But is simply allocating more memory in the BIOS really a "trick"? Is this like "one weird trick" or something? And really, is a simple "trick" of a BIOS setting change worth a whole article? Was there an upgrade to CoreBOOT that I missed?
 
Last edited:

usertests

Distinguished
Mar 8, 2013
969
856
19,760
We wonder if AMD's latest mobile Ryzen chips, like Phoenix that taps into DDR5 memory, can work and what kind of performance they bring.
It's worth noting that Phoenix and later desktop APUs would not only include substantially better iGPUs and DDR5 support, but also the XDNA AI accelerator. I don't know if it would be any faster than using the RDNA graphics, but it could allow you to use your APU to game while running Stable Diffusion or whatever on the accelerator at the same time.
 
  • Like
Reactions: gg83
It took me about 9 seconds to render a 512x512 at 50 steps using Stable Diffusion.
The Stable Diffusion default for me is 512x512 at 20 steps, and that took 4 seconds.
Rendered on a GeForce 3080 Ti.
Impressive that it ran on a CPU ... not impressive that it took thousands of percent longer!
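If anyone wants to reproduce that kind of comparison, here is a rough timing sketch with the diffusers library; the model ID and prompt are placeholders, and the numbers will vary with hardware, scheduler, and precision:

Code:
import time
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

start = time.perf_counter()
pipe("a forest in autumn", height=512, width=512, num_inference_steps=20)
torch.cuda.synchronize()  # make sure queued GPU work has finished before stopping the clock
print(f"512x512, 20 steps: {time.perf_counter() - start:.1f} s")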
 

nuttynut

Distinguished
Jun 7, 2016
107
1
18,715
There goes Tom's Hardware again. "You can turn a cheap CPU with iGPU into a GPU for AI with this one simple trick!" And it's just an OEM APU with 16 GB of slow DDR4.
 

razor512

Distinguished
Jun 16, 2007
2,159
87
19,890
If they could get it to allocate more than 16GB reliably, then it could be quite good for some AI workloads, especially since many devs working on those programs are not willing to provide proper support for shared memory + dedicated VRAM. While shared memory causes a large performance drop in memory-hard computation, there are times when a user won't mind the slowdown: for example, tweaking some AI optical flow, then turning your 12GB card into a 24GB+ card so that you can run the optical flow on 4K footage while you go out for some grocery shopping or some other task.

The same goes for some Stable Diffusion models: many people would rather have them run 30-60% slower than not run at all, but many of those applications will not allow a spillover to shared system memory when there is not enough dedicated VRAM.
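A rough sketch of the kind of opt-in spillover being described here, assuming PyTorch and the diffusers library; the 6GB threshold is an arbitrary illustration, not a measured requirement:

Code:
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

free_bytes, _total = torch.cuda.mem_get_info()
if free_bytes < 6 * 1024**3:
    # Not enough dedicated VRAM: accept the slowdown and stream weights
    # from system memory instead of failing outright.
    pipe.enable_sequential_cpu_offload()
else:
    pipe.to("cuda")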
 

edzieba

Distinguished
Jul 13, 2016
589
594
19,760
When you set the RAM allocation for an iGPU, you are reserving that amount of RAM specifically for the iGPU. That in turn means it is a 16GB GPU. You do not get the same performance as if it had its own VRAM, due to shared bandwidth, but the frame buffer is the full 16GB.
It still needs to access that RAM pool via the CPU, because the iGPU does not have its own memory interface.
And if you need to jump via the CPU to access RAM anyway, you may as well use a discrete GPU, so you have access to its internal memory plus an arbitrarily large pool of fenced-off RAM connected via the CPU.
 
Last edited:
It still needs to access that RAM pool via the CPU, because the iGPU does not have its own memory interface.
And if you need to jump via the CPU to access RAM anyway, you may as well use a discrete GPU, so you have access to its internal memory plus an arbitrarily large pool of fenced-off RAM connected via the CPU.
As far as I know, the iGPUs have DMA to the RAM, so they don't go through the CPU for access.
 
  • Like
Reactions: greenreaper

JRRT

Prominent
Mar 25, 2022
41
0
530
If the author or anyone else could provide information on which motherboards allow over 16GB (by linking to the discussion in question, for example), I would be very grateful, since I want to try some of the large-token large language models but I don't have the money for an A100 either. Since it's for learning about this, it's fine with me if I have to let the training process run for weeks. I am just trying to learn.
 

edzieba

Distinguished
Jul 13, 2016
589
594
19,760
As far as I know, the iGPUs have DMA to the RAM, so they don't go through the CPU for access.
So can GPUs on the PCIe bus. But DMA is a logical process; in both cases the RAM is physically connected to the memory controller on the CPU die, and neither a dGPU nor an iGPU can access that RAM without using the CPU's memory controller. This is not like Kaby Lake G, where the GPU had its own independent memory controller.
 

abufrejoval

Reputable
Jun 19, 2020
615
454
5,260
There goes Tom's Hardware again. "You can turn a cheap CPU with iGPU into a GPU for AI with this one simple trick!" And it's just an OEM APU with 16 GB of slow DDR4.
There you go again, falling for the headline, when you should know it can't quite be true in the sense that you imagine...

Functionally, that's what's been done; in terms of actual usability, it's like using Raspberry Pis for HPC or arguing that you can now run ChatGPT on one (while it's just acting as a proxy).

You deserve to be disappointed if you made the mistake of getting your hopes up for something unrealistic.
 