News: AWS building ExaFLOPS-class supercomputer for AI with hundreds of thousands of Trainium2 processors

My prediction is that after we've squeezed enough performance out of low precision, we'll see precision go back up again. FP8 and even NF4 can produce similar results to FP16 and higher, but similar isn't exact. Chasing that last 5% is going to need precision, but we're still a long way away from that.
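For a concrete sense of what "similar isn't exact" means, here's a rough sketch that round-trips the same values through lower-precision formats and measures the error. It assumes a recent PyTorch (2.1+) for the float8_e4m3fn dtype, and the 4-bit branch is a hand-rolled integer simulation, not the actual NF4 format:

```python
import torch

x = torch.randn(1_000_000, dtype=torch.float32)

# FP32 -> FP8 (e4m3) -> FP32 round trip.
x_fp8 = x.to(torch.float8_e4m3fn).to(torch.float32)

# FP32 -> simulated symmetric 4-bit integer quantization -> FP32.
scale = x.abs().max() / 7          # signed 4-bit range is [-8, 7]
x_int4 = (x / scale).round().clamp(-8, 7) * scale

for name, approx in [("fp8 (e4m3)", x_fp8), ("4-bit sim", x_int4)]:
    err = (x - approx).abs()
    print(f"{name}: mean abs err {err.mean().item():.2e}, max abs err {err.max().item():.2e}")
```

The exact numbers don't matter; the point is that the round-trip error is real and grows quickly as the bit width drops.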
 
This is too simplistic. The value of low precision lies in a few different scenarios.

In the simplest case, you could use it to quickly bootstrap a model before fine-tuning it at higher precision. That could shave a lot off the overall training time. I don't know how much it would compromise the accuracy of the trained model, but it's worth considering.
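As a very rough sketch of what that could look like (toy model, made-up data and step counts; a real run would use GPUs and a proper schedule): do the bulk of the steps under bf16 autocast, then finish with full-precision updates.

```python
import torch
from torch import nn

# Toy model and optimizer; stand-ins for something much larger.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y, low_precision):
    opt.zero_grad()
    # Under autocast the matmuls run in bf16; with enabled=False it's plain fp32.
    with torch.autocast("cpu", dtype=torch.bfloat16, enabled=low_precision):
        loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# Phase 1: cheap low-precision bootstrap for most of the run.
for _ in range(90):
    train_step(torch.randn(64, 512), torch.randint(0, 10, (64,)), low_precision=True)

# Phase 2: full-precision fine-tuning to recover the last bit of accuracy.
for _ in range(10):
    train_step(torch.randn(64, 512), torch.randint(0, 10, (64,)), low_precision=False)
```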

Second, there are certain layers and weights where the impact of low precision might be negligible, in which case they could be substituted into the fully trained model. I think convolution layers, especially those involving larger convolutions, are probably a good example of this.
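Mechanically, that kind of per-layer substitution is easy to express. Here's a sketch that rounds only the conv weights through fp16 on a toy network and measures how much the output moves; whether real conv layers actually tolerate it is exactly the open question.

```python
import torch
from torch import nn

# Tiny stand-in network; imagine it has already been trained.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 32 * 32, 10),
)

x = torch.randn(1, 3, 32, 32)
ref = model(x)

# Round only the conv weights through fp16; leave the linear layer untouched.
with torch.no_grad():
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            m.weight.copy_(m.weight.half().float())
            if m.bias is not None:
                m.bias.copy_(m.bias.half().float())

print("max output drift:", (ref - model(x)).abs().max().item())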

Finally, the accuracy deficit of lower-precision weights can probably be compensated for by increasing the number of weights, though not to a degree that would cancel out the savings from using lower precision in the first place.
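Back-of-the-envelope, the trade-off looks like this: for a fixed weight-storage budget (the 80 GB figure below is arbitrary), halving the bits roughly doubles the parameter count, ignoring the extra scale/zero-point metadata that real quantization schemes carry.

```python
BUDGET_BYTES = 80 * 2**30  # e.g. one 80 GB accelerator's worth of weight storage

for name, bits in [("fp32", 32), ("fp16", 16), ("fp8", 8), ("nf4", 4)]:
    params = BUDGET_BYTES * 8 // bits
    print(f"{name:>4}: ~{params / 1e9:.0f}B parameters fit in the same budget")
```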

I'm sure the AI community takes a more nuanced view of reduced-precision arithmetic. The hardware developers are just touting raw performance vs. accuracy, but that doesn't mean (competent) users are employing reduced-precision arithmetic in exactly the same way as higher precision.
 
Today distillation is all the rage - training lower-precision models to mimic the results of higher-precision models - and there's a very valid use case for this.
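For anyone unfamiliar, the usual recipe is one extra loss term: the student matches the teacher's temperature-softened outputs on top of the normal hard-label loss. A minimal one-step sketch, with placeholder models and made-up hyperparameters (the bf16 student is just to echo the low-precision angle; a real setup would keep optimizer state in higher precision):

```python
import torch
from torch import nn
import torch.nn.functional as F

teacher = nn.Linear(512, 10).eval()              # stand-in for a large fp32 model
student = nn.Linear(512, 10).to(torch.bfloat16)  # stand-in for the cheap model
opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
T, alpha = 2.0, 0.5                              # temperature and loss weighting

x = torch.randn(64, 512)
y = torch.randint(0, 10, (64,))

with torch.no_grad():                            # teacher provides soft targets only
    t_logits = teacher(x)
s_logits = student(x.to(torch.bfloat16)).float()

# Temperature-scaled KL term (soft targets) plus the usual hard-label loss.
kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
              F.softmax(t_logits / T, dim=-1),
              reduction="batchmean") * T * T
loss = alpha * kd + (1 - alpha) * F.cross_entropy(s_logits, y)
loss.backward()
opt.step()
```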

No doubt this is a complicated field, and right now the lowest precision that is 'good enough' will be strongly preferred. I'm just making the general prediction that after the race to the bottom is complete, precision will start to head back the other way. As such, I don't see these current chip designs lasting as long as some of the architectures that came before them - they seem too hyper-focused on a narrow subset of workloads. That's great news for Nvidia, which will get to sell full datacenter refreshes faster than ever.