"Casting you as the dissenter? But darling, YOU ARE the original dissenter... 'Okay, but AI doesn't use fp64.' 🧐"
Oh, am I? That statement wasn't disagreeing with anything; it was merely stating a relevant fact. There was no actual point in contention until you contradicted it.
"Which you never provided exhaustive supporting evidence for."
As you know, one can't prove a negative. However, let's look at some examples of AI accelerators lacking fp64:
- Tenstorrent - "On Grayskull this is a vector of 64 19 bit values while on Wormhole this is a vector of 32 32 bit values."
- Cerebras - "The CE’s instruction set supports FP32, FP16, and INT16 data types"
- Habana/Intel's Gaudi - "The TPC core natively supports the following data types: FP32, BF16, INT32, INT16, INT8, UINT32, UINT16 and UINT8."
- Movidius/Intel's NPU (featured in Meteor Lake) - "Each NCE contains two VLIW programmable DSPs that supports nearly all data-types ranging between INT4 to FP32."
- AMD's XDNA NPU (based on Xilinx Versal cores) - "Enhanced DSP Engines provide support for new operations and data types, including single and half-precision floating point and complex 18x18 operations."
Not to mention that Nvidia's inferencing-oriented L4 and L40 datacenter GPUs implement fp64 at just 1:64 the rate of their vector fp32 support (plus no fp64 tensor support). That's almost certainly a vestige of their client GPUs' fp64 scalar support, which is needed for the odd graphics task like matrix inversion.
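To make the matrix-inversion point concrete, here's a minimal sketch in plain NumPy (my own illustration, nothing vendor-specific): an ill-conditioned solve that fp64 handles fine falls apart completely in fp32.

```python
import numpy as np

# Hilbert matrices are a textbook example of ill-conditioning, which is
# exactly where fp64 earns its keep. Solve A @ x = b for a known solution
# x = ones(n) and compare the error at each precision.
n = 8
i, j = np.indices((n, n))
A = 1.0 / (i + j + 1)              # 8x8 Hilbert matrix, cond(A) ~ 1e10
x_true = np.ones(n)
b = A @ x_true

for dt in (np.float64, np.float32):
    x = np.linalg.solve(A.astype(dt), b.astype(dt))
    print(f"{dt.__name__}: max error = {np.max(np.abs(x - x_true)):.2e}")
```

On a typical run, the fp64 error stays tiny, while the fp32 "solution" is off by orders of magnitude.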
I don't know about you, but I'd expect fp64 support to be a heck of a lot more prevalent across so many purpose-built AI accelerators and AI-optimized GPUs, if it were at all relevant for AI. Instead, what we see is that even training mostly uses just 16 bits or less!
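For a sense of just how far below fp64 that is, a quick sketch of the precision budget each format carries (bf16 isn't in stock NumPy, but its mantissa is even shorter than fp16's):

```python
import numpy as np

# Machine epsilon is the relative rounding error of each format, i.e. how
# many significant decimal digits it carries. Training gets by on ~3.
for dt in (np.float64, np.float32, np.float16):
    fi = np.finfo(dt)
    print(f"{fi.dtype}: eps = {fi.eps:.1e} (~{-np.log10(fi.eps):.0f} decimal digits)")
```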
"Yet feel you can compel others to do so? 🤨"
Hey, you're the one who volunteered the oddly specific claim:
"Yes, it does, only it's mainly edge cases"
That seems to suggest specific knowledge of these "edge cases" where it's needed. If you don't actually know what some of those "edge cases" are, then how are you so sure it's needed for them?
"So to be clear, and to further summarize your position: You Disagree with Tom's, Tenstorrent, InspireSemi, and myself. Okay, Thanks ! 🤠🤙"
As I said, I'm not in disagreement with them over this. In your zeal to score internet points, it seems you didn't take the time to digest what the article relayed about InspireSemi's strategy:
"This is a major milestone for our company and an exciting time to be bringing this versatile accelerated computing solution to market," said Alex Gray, Founder, CTO, and President of InspireSemi. "Thunderbird accelerates many critical applications in important industries that other approaches do not,
So, they are taking a decidedly generalist approach, much more like Xeon Phi than Tenstorrent's chips. This is underscored by their point that:
"this processor can be programmed like a regular RISC-V CPU and supports a variety of workloads, such as AI, HPC, graph analytics, blockchain, and other compute-intensive applications. As a result, InspireSemi's customers will not have to use proprietary tools or software stacks like Nvidia's CUDA."
Above, I linked to Tenstorrent's TT-Metalium SDK. Writing compute kernels to run on their hardware requires specialized code, APIs, and tools, which is quite contrary to InspireSemi's pitch.