Probably lots of INT8 matrix operations
AMD CDNA 3 Roadmap: MI300 APU With 5X Performance/Watt Uplift : Read more
> If they claim 8x AI training, that is float matrix multiplication performance, something like bfloat16 and not int8.

AI training has been looking into lower precision alternatives for years. Google's TPUs focus mostly on INT8 from what we can tell. Nvidia has talked teraops (INT8) for a few years, and I think Intel or Nvidia even talked about INT2 for certain AI applications as having a benefit. Given that MI250X has the same teraflops for bfloat16, fp16, int8, and int4, there's no speedup right now. But if AMD reworks things so that two int8 or four int4 operations can execute in the same time as a single 16-bit operation, they get a 2x or 4x speedup.
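To put rough numbers on the packing argument, here is a minimal sketch in Python. The baseline of 512 16-bit MACs per clock is a made-up placeholder, not a real MI250X or MI300 figure; the point is only that splitting each 16-bit lane into two int8 or four int4 sub-operations scales peak throughput by 16/bits.

```python
# Back-of-envelope sketch of the packing argument above. The baseline is a
# hypothetical placeholder, not a published MI250X/MI300 specification.

def effective_macs_per_clock(baseline_16bit_macs: int, bits: int) -> int:
    """Peak MACs per clock if each 16-bit lane issues 16/bits narrower sub-operations."""
    assert 16 % bits == 0, "only 16-, 8- and 4-bit operands considered here"
    return baseline_16bit_macs * (16 // bits)

baseline = 512  # hypothetical 16-bit MACs per clock
for bits in (16, 8, 4):
    speedup = 16 // bits
    print(f"{bits}-bit operands: {effective_macs_per_clock(baseline, bits)} MACs/clock ({speedup}x)")
```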
> AI training has been looking into lower precision alternatives for years. Google's TPUs focus mostly on INT8 from what we can tell. Nvidia has talked teraops (INT8) for a few years, and I think Intel or Nvidia even talked about INT2 for certain AI applications as having a benefit.

Jarred, you're missing a key point: training vs. inference. @Bikki was pointing out that int8 isn't useful for training, which traditionally requires more range & precision, like BF16. It's really inference that uses the lower-precision data types you mentioned.
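A small numpy experiment makes the range-and-precision point concrete. Gradient-like values spread over several orders of magnitude survive a bfloat16-style mantissa truncation almost unchanged, while int8 quantization (a single per-tensor scale is assumed here purely for illustration) flushes a large share of them to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gradient-like values: small magnitudes spread over several orders of magnitude.
grads = (rng.normal(scale=1e-3, size=10_000) *
         10.0 ** rng.uniform(-3, 0, size=10_000)).astype(np.float32)

def to_bf16(x: np.ndarray) -> np.ndarray:
    """Simulate bfloat16 by truncating the float32 mantissa to 7 bits (bf16 keeps fp32's 8-bit exponent)."""
    bits = x.astype(np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

def to_int8(x: np.ndarray) -> np.ndarray:
    """Simulate symmetric int8 quantization with one per-tensor scale (an assumed, simplified scheme)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return (q * scale).astype(np.float32)

for name, approx in (("bf16", to_bf16(grads)), ("int8", to_int8(grads))):
    rel_err = np.abs(approx - grads) / np.abs(grads)
    print(f"{name}: median relative error {np.median(rel_err):.2e}, "
          f"flushed to zero {np.mean(approx == 0.0):.1%}")
```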
> Jarred, you're missing a key point: training vs. inference. @Bikki was pointing out that int8 isn't useful for training, which traditionally requires more range & precision, like BF16. It's really inference that uses the lower-precision data types you mentioned.

AFAIK, Nvidia and others are actively researching training as well as inference using lower precision formats. Some things do fine, others need the higher precision of BF16. If some specific algorithms can work with INT8 or FP8 instead of BF16/FP16, that portion of the algorithm can effectively run twice as fast. Nvidia's transformer engine is supposed to help with switching formats based on what is needed. https://blogs.nvidia.com/blog/2022/03/22/h100-transformer-engine/
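The per-operation format switching can be illustrated without any Transformer Engine code. The sketch below is plain numpy, not the actual TE API: it rounds matmul inputs to a crude FP8-like format (roughly e4m3's 3 mantissa bits and 448 max value, not bit-exact) while keeping the accumulation in float32, which is approximately the trade being automated per layer.

```python
import numpy as np

rng = np.random.default_rng(1)

def to_fp8_like(x: np.ndarray, mant_bits: int = 3, max_normal: float = 448.0) -> np.ndarray:
    """Crude FP8 (e4m3-style) simulation: round the mantissa to `mant_bits` bits
    and clamp to e4m3's max normal value. Illustration only, not bit-exact."""
    m, e = np.frexp(x.astype(np.float32))        # x = m * 2**e with 0.5 <= |m| < 1
    step = 2.0 ** (mant_bits + 1)
    m = np.round(m * step) / step
    return np.clip(np.ldexp(m, e), -max_normal, max_normal).astype(np.float32)

# The idea: run the expensive matmul on 8-bit-ish inputs, keep the
# accumulation (and the master copy of the weights) in higher precision.
A = rng.normal(size=(256, 512)).astype(np.float32)
B = rng.normal(size=(512, 128)).astype(np.float32)

ref = A @ B                                  # fp32 baseline
low = to_fp8_like(A) @ to_fp8_like(B)        # low-precision inputs, fp32 accumulation
print("relative error vs fp32:", np.linalg.norm(low - ref) / np.linalg.norm(ref))
```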
> AFAIK, Nvidia and others are actively researching training as well as inference using lower precision formats. Some things do fine, others need the higher precision of BF16. If some specific algorithms can work with INT8 or FP8 instead of BF16/FP16, that portion of the algorithm can effectively run twice as fast. Nvidia's transformer engine is supposed to help with switching formats based on what is needed. https://blogs.nvidia.com/blog/2022/03/22/h100-transformer-engine/

That link only mentions int8 in passing, but actually talks about using fp8 for training.