News: AMD's Instinct MI355X accelerator will consume 1,400 watts

CDNA4 appears to be a decent improvement over CDNA3 as both the MI325X and "default" MI355X are rated at 1000W, yet perf is up considerably on the latter. I'm sure the AI zealots also appreciate the new data formats.
 
Curious about this bit of computer slang: how can chiplets be "reticle-sized"?

Wiki: A reticle, also known as a crosshair, is a pattern of fine lines or markings built into the eyepiece of an optical device, such as a telescopic sight, spotting scope, theodolite, or microscope, to provide a measurement reference during visual inspection.
 
Also curious: when making specialized GPUs by removing the HPC-oriented FP64 hardware from a chip, how much silicon is actually saved? If the saving is relatively small and negligible, just let it stay there. When all this AI hardware becomes obsolete, like a fire burning out, and heads to the city dumps en masse, then with FP64 on board it could at least still be reused for HPC.
 
Curious about this bit of computer slang: how can chiplets be "reticle-sized"?

Wiki: A reticle, also known as a crosshair, is a pattern of fine lines or markings built into the eyepiece of an optical device, such as a telescopic sight, spotting scope, theodolite, or microscope, to provide a measurement reference during visual inspection.
A reticle in this case is the photomask in a lithography machine: light passes through it and the projection optics image the chip pattern onto the wafer. The largest area a scanner can expose in one shot is about 850 mm², so that is the maximum size per chiplet.
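For a rough sense of what "reticle-sized" means numerically, here's a minimal sketch. It assumes the common 26 mm x 33 mm scanner exposure field (which is where the ~850 mm² figure comes from); the chiplet size used for the comparison is hypothetical.

```python
# Back-of-envelope reticle math (assumes the common 26 mm x 33 mm exposure
# field of current lithography scanners; exact fields vary by tool).
FIELD_WIDTH_MM = 26.0
FIELD_HEIGHT_MM = 33.0

reticle_limit_mm2 = FIELD_WIDTH_MM * FIELD_HEIGHT_MM  # ~858 mm^2
print(f"Reticle limit: {reticle_limit_mm2:.0f} mm^2")

# A single die or chiplet has to fit inside one exposure field, so this is
# the ceiling on per-chiplet size. 850 mm^2 here is a hypothetical value.
candidate_chiplet_mm2 = 850.0
print("Fits in one exposure:", candidate_chiplet_mm2 <= reticle_limit_mm2)
```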
Also curious: when making specialized GPUs by removing the HPC-oriented FP64 hardware from a chip, how much silicon is actually saved? If the saving is relatively small and negligible, just let it stay there. When all this AI hardware becomes obsolete, like a fire burning out, and heads to the city dumps en masse, then with FP64 on board it could at least still be reused for HPC.
It's significant. FP64 is fairly power-intensive (only while FP64 is actually being used) and die-area intensive. Looking at the whole chip, it would be in the range of 10-20% of the area. At the individual SM level it's even more significant, because chips also contain components that don't compute, such as memory controllers, IO connections, caches, and other accelerators; there it may be in the range of 20-30%.
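As a toy illustration of why the chip-level share comes out lower than the SM-level share, here's a quick back-of-envelope calculation; the percentages are just the rough estimates above, not measured figures.

```python
# Toy area accounting using the rough estimates above (not measured data).
# Assume FP64 units take ~25% of the compute (SM) area, and compute is ~60%
# of the whole die, the rest being memory controllers, IO, caches, etc.
fp64_share_of_sm = 0.25      # assumed, within the 20-30% range quoted above
compute_share_of_die = 0.60  # assumed compute vs. non-compute split

fp64_share_of_die = fp64_share_of_sm * compute_share_of_die
print(f"FP64 share of SM area:  {fp64_share_of_sm:.0%}")
print(f"FP64 share of die area: {fp64_share_of_die:.0%}")  # ~15%
```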

Full FP64 compliance is defined by IEEE, and no AMD/Nvidia GPUs implemented it until fairly recently, because compliance takes extra effort and carries power and transistor costs. Then "GPUs" started doing more than just running games and moved into supercomputers.

The difference between FP64 and FP32 is precision. In High Performance Computing (HPC), where you are simulating real-world phenomena, accuracy is important. In games, where you have millions of pixels moving and changing rapidly all the time... not so much.

AI sacrifices precision even more: FP16 and INT8, for example. I would not be surprised if this contributes significantly to AI hallucinations. The trade-off is that FP32 takes twice as many bits as FP16, and FP64 twice as many as FP32, so lower precision means more math per unit of hardware; that's why they use it.
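To make that bit-width/precision trade-off concrete, here's a quick NumPy check of the storage width and machine epsilon for each format. It's a sketch; the actual per-format throughput ratios vary by GPU.

```python
import numpy as np

# Storage width, decimal digits, and machine epsilon for common float formats.
# Each step down halves the bits, which is roughly why lower precision buys
# more math throughput per unit of silicon (real GPU ratios vary).
for dtype in (np.float64, np.float32, np.float16):
    info = np.finfo(dtype)
    print(f"{np.dtype(dtype).name}: {info.bits} bits, "
          f"~{info.precision} significant decimal digits, eps = {info.eps:.3g}")
```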

The precision losses are more significant than they might sound, though. It's like comparing 16-bit color to 8-bit color: one gives 2^16 = 65,536 colors, the other only 2^8 = 256. So by going from FP64 to FP32 you might go from practically zero error to suddenly a few percent error. And if the AI then relearns from that finished data, you are multiplying the errors. This isn't even taking into account problems with algorithms or the quality of the original data. It's like a printer that takes a 1200 dpi picture and prints it at 300 dpi.
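A minimal sketch of how those rounding errors show up in practice: sum the same synthetic data at each precision and compare against a float64 reference. Any result fed back into further computation carries its error with it.

```python
import numpy as np

# Sum 50,000 random values in [0, 1) at different precisions and compare to
# a float64 reference. The data is synthetic; the point is how the relative
# error grows as precision drops.
rng = np.random.default_rng(0)
values = rng.random(50_000)

reference = values.sum(dtype=np.float64)
for dtype in (np.float64, np.float32, np.float16):
    total = values.astype(dtype).sum(dtype=dtype)
    rel_err = abs(float(total) - reference) / reference
    print(f"{np.dtype(dtype).name}: sum = {float(total):.1f}, "
          f"relative error = {rel_err:.2e}")
```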
 
Still, using nuclear reactors to power supercomputers in the 2030s seems to be a more and more realistic possibility.

Great article. The typo in the last sentence is disconcerting, though. Not the best way to end it.