It's good to see AMD offering either a refresh or a completely new AI accelerator each year.
The MI325X AI accelerator looks more like an interim solution: it is a beefed-up refresh of the current MI300X, composed of eight compute, four I/O, and eight memory chiplets stitched together.
So AMD is now confident that its MI325X system can support a 1-trillion-parameter model? But it is still focusing on FP16, which requires twice as much memory per parameter as FP8.
The MI300X does have hardware support for FP8, but AMD has generally focused on half-precision performance in its benchmarks. At least for inference with vLLM, the MI300X was stuck at FP16, since vLLM lacks proper support for FP8 data types.
This isn't an apples-to-apples comparison, though. It's great to have 288GB of capacity, but I worry the extra memory advantage gets overshadowed against Nvidia's H200 running a model at FP8, since the same model at FP16 on the MI325X would need twice the memory. That said, it appears AMD might have overcome this limitation with the new accelerator.
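The FP16-vs-FP8 memory math above is easy to sanity-check. Here's a minimal back-of-the-envelope sketch, assuming a weights-only footprint (no KV cache, activations, or optimizer state) and the announced 288GB (MI325X) and 141GB (H200) HBM capacities:

```python
# Back-of-the-envelope weight memory for a 1-trillion-parameter model.
# Assumes weights only; real deployments also need KV cache and activations.
PARAMS = 1_000_000_000_000

def weight_memory_gb(params: int, bytes_per_param: int) -> float:
    """Return the weight footprint in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

fp16_gb = weight_memory_gb(PARAMS, 2)  # FP16: 2 bytes/param -> 2000 GB
fp8_gb = weight_memory_gb(PARAMS, 1)   # FP8:  1 byte/param  -> 1000 GB

# Minimum accelerators needed just to hold the weights:
mi325x_count = fp16_gb / 288  # MI325X at FP16: ~7 accelerators
h200_count = fp8_gb / 141     # H200 at FP8:    ~8 accelerators
print(mi325x_count, h200_count)
```

Even with 288GB per card, running at FP16 needs roughly as many accelerators as an H200 running the same model at FP8, which is exactly the worry about the capacity advantage being cancelled out.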
I hope so!