It all depends on how large of a numerical range your data needs to use. Some workloads need more than what the span of fp16 can offer.
But you don't need the full range that fp32 provides.
By using fp24, you use less data overall when your dataset doesn't need to span the full breadth of fp32.
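For concrete numbers: fp16 tops out at (2 - 2^-10) x 2^15 = 65504, while fp32 reaches (2 - 2^-23) x 2^127 ~ 3.4x10^38. Assuming the commonly cited 1-bit sign / 7-bit exponent (bias 63) / 16-bit mantissa layout for AMD's fp24 (treat that layout as my assumption here), its maximum would be around (2 - 2^-16) x 2^63 ~ 1.8x10^19 - far more headroom than fp16, in 25% less storage than fp32.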
I'm not asking about merely theoretical benefits, though. What I was asking is: if their GPU lacks hardware support for packing/unpacking and can only issue 1/cycle - the same as fp32 - then it's cumbersome to work with (due to manual packing/unpacking), so what's the benefit?
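To make "cumbersome" concrete, here's roughly what the software fallback looks like - a minimal sketch in C, assuming the commonly cited 1-bit sign / 7-bit exponent (bias 63) / 16-bit mantissa layout for fp24, with NaN/Inf/denormal/overflow handling left out for brevity:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fp24 layout: 1 sign, 7 exponent (bias 63), 16 mantissa bits.
 * NaN, Inf, denormals and exponent overflow/underflow are ignored for brevity. */

static uint32_t f32_to_f24(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);                            /* reinterpret the fp32 bits     */
    uint32_t sign = (u >> 31) & 0x1;
    int32_t  exp  = (int32_t)((u >> 23) & 0xFF) - 127;   /* unbias fp32 exponent          */
    uint32_t man  = (u >> 7) & 0xFFFF;                   /* keep the top 16 mantissa bits */
    return (sign << 23) | ((uint32_t)(exp + 63) << 16) | man;
}

static float f24_to_f32(uint32_t p)
{
    uint32_t sign = (p >> 23) & 0x1;
    int32_t  exp  = (int32_t)((p >> 16) & 0x7F) - 63;    /* unbias fp24 exponent          */
    uint32_t man  = p & 0xFFFF;
    uint32_t u = (sign << 31) | ((uint32_t)(exp + 127) << 23) | (man << 7);
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Without hardware support you also eat the 3-byte stride on every store. */
static void store_f24(uint8_t *dst, uint32_t p)
{
    dst[0] = (uint8_t)(p & 0xFF);
    dst[1] = (uint8_t)((p >> 8) & 0xFF);
    dst[2] = (uint8_t)((p >> 16) & 0xFF);
}
```

Every value sits on a 3-byte stride, so loads and stores straddle word boundaries - that, plus the extra shift/mask work per element, is exactly the overhead I mean.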
There's no real overhead when it's supported in hardware, which AMD had previously announced support for in their GPUs.
Look at the Vega ISA manual I linked a couple posts ago, and tell me
where it says they support 24-bit floating point in hardware.
Alternative floating point standards are already being used in the industry.
Look at BFloat16; it co-exists with standard fp16.
I know about BFloat16. It has many of the same advantages as fp16, except it trades some precision for more range (and easy conversion to/from fp32). But, we weren't talking about BFloat16!
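For anyone following along, the easy conversion comes from BFloat16 being just the top 16 bits of an fp32. A minimal sketch (truncating; real code would usually round to nearest even):

```c
#include <stdint.h>
#include <string.h>

/* BFloat16 is the top 16 bits of an fp32, so conversion is just a shift.
 * Truncating version; production code would normally round to nearest even. */

static uint16_t f32_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (uint16_t)(u >> 16);       /* drop the low 16 mantissa bits */
}

static float bf16_to_f32(uint16_t h)
{
    uint32_t u = (uint32_t)h << 16;   /* reinstated mantissa bits are zero */
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```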
Then there's nVIDIA's maddening 19-bit TensorFloat
Why on Earth would nVIDIA make a 19-bit floating point data-type?
It doesn't even byte-align, so 5 bits are wasted for every 3 bytes of storage.
According to this, it sounds like TF32 is just an in-register format? It probably just rounds the fraction to 10 bits.
The reason for reducing precision is to make the Tensor ALUs smaller, simpler, and more power-efficient. The size of an FP multiplier is supposed to grow as the square of the number of fractional bits. That was one of the main arguments for BFloat16, but I guess someone decided it could use another 3 bits of precision. I think that makes it applicable to a much larger problem domain, such as audio processing, but maybe it also helps improve model convergence times.
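Here's a quick back-of-the-envelope version of that argument, using the published field widths for each format; the "area" figure is just (significand bits)^2 as a relative measure, not real silicon numbers:

```c
#include <stdio.h>

/* Field widths are the published ones for each format; the "area" column is
 * (mantissa bits + 1 hidden bit)^2, a relative measure only, not real silicon. */

struct fmt { const char *name; int exp_bits, man_bits; };

int main(void)
{
    struct fmt fmts[] = {
        { "fp32",     8, 23 },
        { "TF32",     8, 10 },
        { "BFloat16", 8,  7 },
        { "fp16",     5, 10 },
    };
    for (int i = 0; i < 4; i++) {
        int sig = fmts[i].man_bits + 1;   /* significand incl. hidden bit */
        printf("%-8s  1+%d+%2d bits   relative multiplier area ~ %3d\n",
               fmts[i].name, fmts[i].exp_bits, fmts[i].man_bits, sig * sig);
    }
    return 0;
}
```

By that crude measure a TF32 multiplier is roughly 5x smaller than an fp32 one and about twice the size of a BFloat16 one, which matches the "another 3 bits of precision" trade-off.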
At least AMD's fp24 is a slight variant of Pixar's PXR24.
That's compressed using a dictionary scheme. If it's not supported in hardware (and I highly doubt it is), then decode performance on a GPU will be terrible. If you need to load or save images in PXR24, just convert them to/from a standard texture format on the CPU.
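As far as I understand the format, the CPU-side conversion amounts to rounding each fp32 to the 24 significant bits PXR24 keeps (the deflate stage it layers on top is lossless). A rough sketch, with NaN/Inf handling omitted:

```c
#include <stdint.h>
#include <string.h>

/* Round an fp32 to the 24 significant bits PXR24 keeps: same sign and 8-bit
 * exponent, mantissa cut from 23 to 15 bits. NaN/Inf handling omitted; the
 * real codec also runs lossless deflate over the result. */

static uint32_t f32_to_pxr24_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    u += 0x80;        /* round to nearest: add half of the discarded range   */
    return u >> 8;    /* keep sign, exponent, and the top 15 mantissa bits   */
}
```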
Adding support to your FPU for all those data types, when you already support the full range of IEEE 754 fp data types, is a good idea in this day and age, where people have different needs for data types of different sizes.
Not really, because modern GPUs have many thousands of ALUs, so any feature you add is replicated in every one of them, multiplying its die-area cost by that many thousands compared to adding it to a single ALU.
Think about it: Pixar has their own 24-bit fp data type that differs only very slightly from AMD's fp24.
Pixar has been around since the 1980's. There might be a lot of stuff out there with their name on it, that they themselves no longer even use.
But the bigger issue with AMD having some proprietary floating point format "just because" is that you need people to write AMD-specific shaders for it to serve any purpose, and game developers aren't going to do that if the benefits don't significantly outweigh the effort of using it. Even then, a lot of game developers still won't bother.
Back in the day, DX 9.0 required a minimum of fp24 precision to support the spec.
So it worked back then.
A "minimum" implies the implementation can go beyond. They probably specified it that way, so that DX 9 would run on old cards that
only had fp24.