Nice article. It's good to see this sort of content on the site.
a number thus contains three pieces of information: its sign, its mantissa (which itself can be positive or negative) and the exponent.
Um, I think you meant to say the exponent can itself be positive or negative. The sign bit applies to the mantissa, but the exponent is biased (i.e. stored so that an 8-bit exponent field of 127 means 0; anything less than that is negative, and anything greater is positive).
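If it helps, here's a quick Python sketch of my own (just for illustration) that unpacks an FP32 bit pattern into its fields, so you can see the 127 bias in action:

```python
import struct

def fp32_fields(x):
    """Unpack an IEEE 754 single-precision value into its sign, exponent and fraction fields."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    sign      = bits >> 31            # 1 bit
    exp_field = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    fraction  = bits & 0x7FFFFF       # 23 bits
    return sign, exp_field, exp_field - 127, fraction

# 1.0 stores an exponent field of 127 (true exponent 0),
# 0.5 stores 126 (true exponent -1), 2.0 stores 128 (true exponent +1).
for v in (1.0, 0.5, 2.0):
    print(v, fp32_fields(v))
```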
To make matters simple, IEEE has standardized several of these floating-point number formats for computers
You could list the IEEE standard (or even provide a link), so that people could do some more reading themselves. I applaud your efforts to explain these number formats, but grasping such concepts from quite a brief description is a lot to expect of readers without prior familiarity. To that end, perhaps the Wikipedia page is a reasonable next step for any who're interested:
en.wikipedia.org
Nvidia recognized this trend early on (possibly aided by its mobile aspirations at the time) and introduced half-precision support in Maxwell in 2014 at twice the throughput (FLOPS) of FP32.
That's only sort of true. Their Tegra X1 is the only Maxwell-derived architecture to have it. And of Pascal (the following generation), the only chip to have it was the server-oriented P100.
In fact, Intel was first to the double-rate fp16 party, with their Gen8 (Broadwell) iGPU!
AMD was a relative late-comer, only adding full support in Vega. However, a few generations prior, they had load/store support for fp16, so that it could be used as the in-memory representation while actual computations continued to use full fp32.
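For what it's worth, that pattern is easy to mimic in software. Here's a rough NumPy sketch of the idea - nothing to do with AMD's actual hardware, just showing fp16 as the storage format while the math is done in fp32:

```python
import numpy as np

# Bulky data kept in memory as fp16 (half the footprint of fp32)...
weights = np.random.randn(1024, 1024).astype(np.float16)
inputs  = np.random.randn(1024).astype(np.float16)

# ...but upcast to fp32 for the actual arithmetic, then optionally stored back as fp16.
result_fp32 = weights.astype(np.float32) @ inputs.astype(np.float32)
result_fp16 = result_fp32.astype(np.float16)

print(weights.nbytes, "bytes in memory vs",
      weights.astype(np.float32).nbytes, "if kept as fp32")
```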
It could be noted that use of fp16 in GPUs goes back about a decade further, when people had aspirations of using it for certain graphical computations (think shading or maybe Z-buffering, rather than geometry). And that format was included in the 2008 version of the standard. Unfortunately, there was sort of a chicken-and-egg problem: GPUs added only token hardware support for it, and therefore few games bothered to use it.
This is a nice diagram, but it would've been interesting to see FP16 aligned on the exponent-fraction boundary.
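Something like this is what I mean - a throwaway sketch, using the standard 1/8/23, 1/5/10 and 1/8/7 field widths for FP32, FP16 and BF16, lined up on that boundary:

```python
formats = {            # (sign bits, exponent bits, fraction bits)
    "FP32": (1, 8, 23),
    "BF16": (1, 8, 7),
    "FP16": (1, 5, 10),
}

widest_exp  = max(e for _, e, _ in formats.values())
widest_frac = max(f for _, _, f in formats.values())

for name, (s, e, f) in formats.items():
    # right-align sign+exponent, left-align fraction, so the boundary lines up in every row
    row = ("S" * s + "E" * e).rjust(1 + widest_exp) + "." + ("F" * f).ljust(widest_frac)
    print(f"{name:5} {row}")
```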
hardware area (number of transistors) scales roughly with the square of the mantissa width
Important point - thanks for mentioning it.
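To put rough numbers on it (my own back-of-the-envelope, assuming the multiplier dominates and counting the implicit leading bit, so 24/11/8 significand bits for FP32/FP16/BF16):

```python
significand_bits = {"FP32": 24, "FP16": 11, "BF16": 8}   # includes the implicit leading 1

fp32_cost = significand_bits["FP32"] ** 2
for name, bits in significand_bits.items():
    cost = bits ** 2                                      # multiplier area ~ width^2
    print(f"{name}: {bits}^2 = {cost:3d}  (~{fp32_cost / cost:.1f}x smaller than FP32)")
```

By that rule of thumb, an FP16 multiplier is roughly 5x smaller than FP32, and a BF16 one roughly 9x smaller.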
The elements of a Flexpoint tensor are (16-bit) integers, but they have a shared (5-bit) exponent whose storage and communication can be amortized over the whole tensor
As a side note, there are some texture compression formats like this. Perhaps that's where they got the idea?
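For anyone curious what the shared-exponent trick looks like, here's a toy block-floating-point sketch of my own - not Nervana's actual Flexpoint format - where one exponent is chosen for the whole tensor and the elements become plain 16-bit integers:

```python
import numpy as np

def encode_shared_exponent(x, int_bits=16):
    """Toy block floating point: one shared exponent, int16 mantissas for every element."""
    int_max = 2 ** (int_bits - 1) - 1                     # 32767 for int16
    max_mag = float(np.max(np.abs(x)))
    # pick the one exponent that lets the largest value fit in the integer range
    exp = int(np.ceil(np.log2(max_mag / int_max))) if max_mag > 0 else 0
    ints = np.round(x / 2.0 ** exp).astype(np.int16)      # stored per element
    return ints, exp                                       # exp stored once per tensor

def decode_shared_exponent(ints, exp):
    return ints.astype(np.float32) * 2.0 ** exp

x = np.random.randn(4, 4).astype(np.float32)
ints, exp = encode_shared_exponent(x)
print("shared exponent:", exp)
print("max abs error:", np.max(np.abs(decode_shared_exponent(ints, exp) - x)))
```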
ARM too has not followed FP32 rigorously and instead introduced some simplifications.
Specific to BFloat16 instructions, right? Otherwise, I believe ARMv8A is IEEE 754-compliant.
The new BF16 instructions will be included in the next update of the Armv8-A instruction set architecture. Albeit not yet announced, this would be ARMv8.5-A. They should find their way to ARM processors from its partners after that.
This strikes me as a bit odd. I just don't see people building AI training chips out of ARMv8A cores. I suppose people can try, but they're already outmatched.