News AMD shares 30x25 progress: New AI machines are 28.3 times faster than 2020 machines

I'd be interested in some data points on FP8 vs INT8.

For inference it doesn't seem to make much of a difference in terms of quality from what I've seen, yet with training my gut feeling would be that it's way too much work figuring out which layers might tolerate the reduced precision.

So is FP8 actually used widely in the field? I can see how very specific dense models (e.g. vision) might still work and benefit from the speed advantage over bfloat16, but those sound like industrial and automotive use cases to me.
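To make the FP8 vs INT8 trade-off concrete, here's a rough pure-Python sketch (not any vendor's implementation). The E4M3 rounding follows the usual convention of 1 sign, 4 exponent, and 3 mantissa bits with bias 7 and a ±448 max; `quantize_int8` is a hypothetical per-tensor symmetric scheme with a caller-chosen scale:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (1 sign, 4 exp, 3 mantissa bits, bias 7)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 448.0)      # E4M3 max finite value is 448
    exp = max(math.floor(math.log2(mag)), -6)  # exponents below -6 become subnormals
    step = 2.0 ** (exp - 3)       # 3 mantissa bits -> spacing of 2^(exp-3)
    return sign * min(round(mag / step) * step, 448.0)

def quantize_int8(x: float, scale: float) -> float:
    """Symmetric per-tensor INT8 quantize + dequantize with a fixed scale."""
    q = max(-128, min(127, round(x / scale)))
    return q * scale

# FP8 keeps *relative* error roughly constant across magnitudes,
# while INT8 has one fixed absolute step for the whole tensor:
scale = 448.0 / 127
for v in (0.05, 0.5, 5.0, 50.0):
    fp8_err = abs(quantize_e4m3(v) - v) / v
    int8_err = abs(quantize_int8(v, scale) - v) / v
    print(f"{v:6.2f}  fp8 rel err {fp8_err:.3f}  int8 rel err {int8_err:.3f}")
```

The demo loop is the whole point: with one scale covering a wide dynamic range, INT8's relative error blows up for small values while FP8's stays bounded, which is (as I understand it) why FP8 is attractive for training, where gradients span many orders of magnitude.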

Dojo is really big on exploiting many permutations of reduced precision data types, but I've found nothing similar elsewhere.

Do any of the gaming use cases rely on FP8?
Without disclosing the 2020 comparison system, this is just a pick-your-desired-marketing-multiplier exercise. In 2020, AMD had chips ranging from the Radeon Instinct MI100 all the way down to the teeny little Athlon Gold 7220U. Where in that range you pick from to compare to today likely gives you an order of magnitude range to choose from depending on what performance comparison you want to stick in your powerpoint.
 
A 28x improvement in performance per watt within just 4 years is an incredible feat.
However, why does AMD not disclose the 2020 reference system that was used for this comparison? Without knowing what the current machine was compared against, the audience is left wondering why this information is being hidden.
 
So specific hardware is more efficient than general hardware. Good to know.
As far as apples-to-apples comparisons go, RDNA 2 came out in 2020, and AMD hasn't improved on that efficiency by anything close to this.

Edit: the Ryzen 9000 series is more efficient than the 5000 series from 2020, but not by 30x.
 
A 28x improvement in performance per watt within just 4 years is an incredible feat.
However, why does AMD not disclose the 2020 reference system that was used for this comparison? Without knowing what the current machine was compared against, the audience is left wondering why this information is being hidden.
Even 28x over the MI100 should not be surprising; the MI100 has no INT8 units.