News AI engineers build new algorithm for AI processing, replacing complex floating-point multiplication with integer addition

yahrightthere

Prominent
Oct 26, 2023
27
5
535
Seeing is believing. Where's the white paper on this? I couldn't find it.
As for the load on the grid, I've seen many reports of data centers inking deals to get various nuclear sites back up and running online, as well as adding new nuclear sites, including small modular reactors, which would add to the grid's infrastructure and reduce the load.
It's understood that all of this will take time, money, and effort from all facets to accomplish.
 
  • Like
Reactions: bit_user

Li Ken-un

Distinguished
May 25, 2014
161
111
18,760
“Work smarter not harder.” 🙂

The operating cost of feeding the power-hungry algorithms should be convincing enough, if the 95% reduction is true.

What's the relatively fixed cost of investing in hardware and nuclear power plants, compared to the ongoing cost of feeding the less efficient algorithms?
 

JTWrenn

Distinguished
Aug 5, 2008
331
234
19,170
Not sure if this is promising or just a flare sent up in hopes of capital investment. The hedged wording and the apparent lack of a fully working product scream "please invest in us" to me.
 

AkroZ

Reputable
Aug 9, 2021
56
31
4,560
Here is the paper: https://arxiv.org/html/2410.00907v2

I have read it. It's interesting, but it lists only the advantages and not the drawbacks; basically, this is a paper written to attract investment.
They demonstrate higher precision than FP8 with theoretically less costly operations, but their implementation is FP32, meaning it uses four times more memory, and they do not calculate the potential energy drain of those memory operations.
This is not considered for inference but only for the execution of models (as memory is the main limiting factor), notably for AI processing units.
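
For anyone wondering how a multiplication can become an addition at all: adding the raw IEEE-754 bit patterns of two positive floats adds their exponents exactly and their mantissas approximately (Mitchell's approximation). Here's a minimal Python sketch of that generic bit-pattern trick; it's my illustration of the underlying idea, not the paper's exact ℒ-Mul, which as I read it also adds a small correction offset to the mantissa sum:

```python
import struct

def f2i(x: float) -> int:
    # raw IEEE-754 bit pattern of a float32
    return struct.unpack("<I", struct.pack("<f", x))[0]

def i2f(b: int) -> float:
    # reinterpret a 32-bit pattern as a float32
    return struct.unpack("<f", struct.pack("<I", b))[0]

BIAS = 127 << 23  # float32 exponent bias, aligned to the exponent field

def approx_mul(x: float, y: float) -> float:
    # one integer addition stands in for the multiply;
    # positive normal numbers only -- signs, zeros, and overflow
    # are deliberately left out of this sketch
    return i2f(f2i(x) + f2i(y) - BIAS)

print(approx_mul(3.0, 5.0))  # 14.0, where the exact answer is 15.0
```

The exponents come out exact; all of the error sits in the mantissa part, which is what the paper's offset term is meant to compensate for.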
 

bit_user

Titan
Ambassador
Thanks for this! @yahrightthere take note!

I have read it. It's interesting, but it lists only the advantages and not the drawbacks; basically, this is a paper written to attract investment.
They do list its limitations.

They demonstrate higher precision than FP8 with theoretically less costly operations, but their implementation is FP32, meaning it uses four times more memory, and they do not calculate the potential energy drain of those memory operations.
They merely prototyped it on existing hardware; Nvidia GPUs, to be precise. Nvidia doesn't support general arithmetic on data types of lower precision than FP32.

From briefly skimming the paper, I think they're actually proposing to implement it at 16-bit, but they also work out the implementation cost at 8-bit.
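
For anyone wondering how you prototype an 8-bit format on hardware that can't do 8-bit arithmetic: the usual trick is simulated quantization, where everything is stored and computed in FP32 but values keep getting snapped onto the low-precision grid. A rough Python sketch of the idea (my own illustration, not code from the paper; e4m3's subnormals, exponent clamping, and saturation rules are all ignored):

```python
import math

def round_to_3bit_mantissa(x: float) -> float:
    # snap x onto a grid with a 3-bit stored mantissa (plus the
    # implicit leading 1), a stand-in for float8-style precision
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)     # x = m * 2**e with 0.5 <= |m| < 1
    m = round(m * 16) / 16   # keep 4 significant bits total
    return math.ldexp(m, e)

print(round_to_3bit_mantissa(0.3))  # 0.3125, the nearest point on the coarse grid
```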

This is not considered for inference but only for the execution of models
"inference" is the term used for what I think you mean by "execution of models". Here's what the abstract says:

"We further show that replacing all floating point multiplications with 3-bit mantissa ℒ-Mul in a transformer model achieves equivalent precision as using float8_e4m3 as accumulation precision in both fine-tuning and inference."

So, they claim it's applicable to both inference and a subset of training work (i.e., fine-tuning).
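
If you want a feel for how rough the raw bit-pattern approximation is before any correction term, it's quick to measure. This is my own little harness, not anything from the paper:

```python
import random
import struct

def f2i(x): return struct.unpack("<I", struct.pack("<f", x))[0]
def i2f(b): return struct.unpack("<f", struct.pack("<I", b))[0]

BIAS = 127 << 23  # float32 exponent bias, aligned to the exponent field

def approx_mul(x, y):
    # one integer addition instead of a multiply (positive normals only)
    return i2f(f2i(x) + f2i(y) - BIAS)

random.seed(0)
worst = 0.0
for _ in range(100_000):
    x, y = random.uniform(0.5, 2.0), random.uniform(0.5, 2.0)
    worst = max(worst, abs(approx_mul(x, y) - x * y) / (x * y))

print(f"worst relative error: {worst:.3f}")  # about 0.11, Mitchell's known worst case
```

The correction offset in ℒ-Mul exists precisely to shave that mantissa bias down.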
 
Seeing is believing. Where's the white paper on this? I couldn't find it.
As for the load on the grid, I've seen many reports of data centers inking deals to get various nuclear sites back up and running online, as well as adding new nuclear sites, including small modular reactors, which would add to the grid's infrastructure and reduce the load.
It's understood that all of this will take time, money, and effort from all facets to accomplish.
Billions of $$ and decades to implement. I'm not holding my breath.
 
Oct 22, 2024
2
1
10
That's what we used to do back in the 8-bit CPU days, when floating-point chips and libraries didn't exist. Someone's always thinking they're the first to have an idea.
I don't think our work is the first (we do cite prior art, especially for reciprocal square root...). I do think we did it in a systematic way, with several error measures across multiple operations, primarily to have a baseline for all the weird stuff that shows up.

Did you ever publish your work in one form or another (paper? code?)? I'd be happy to cite it in upcoming work, as I believe in proper citation.
 
  • Like
Reactions: bit_user

bit_user

Titan
Ambassador
Welcome!

The basic idea of multiplying floating-point numbers using integer operations has already been published elsewhere, although not cited in the paper: https://www.diva-portal.org/smash/get/diva2:1636876/FULLTEXT02.pdf
Thanks for sharing! I plan to look at your paper, but I'm not sure when I'll have time, so I'm just going to mention this now...

I think their key innovation comes from specifically targeting fp8, which is so low-precision that it opens the door to a very crude approximation method. You should have a look at what they did, if you haven't already; see the link in @AkroZ's post, above.

BTW, I did the whole thing of optimizing certain FP arithmetic via integers about two decades ago. At that time, basic floating-point arithmetic was already pretty fast, but caches were small, and SSE2 still made fixed-point a win if you didn't need much precision. I was most proud of my atan() implementation, along with optimized, generic float <-> fixed conversion routines.
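
For the youngsters, the general shape of that kind of routine looks something like this Q16.16 sketch; it illustrates the technique, not my actual code from back then:

```python
FRAC_BITS = 16        # Q16.16: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    # float -> fixed: scale up and round to the nearest grid point
    return int(round(x * ONE))

def to_float(q: int) -> float:
    # fixed -> float: undo the scaling
    return q / ONE

def fx_mul(a: int, b: int) -> int:
    # fixed-point multiply: full-width integer product, then drop
    # the extra fractional bits with a shift
    return (a * b) >> FRAC_BITS

x, y = to_fixed(3.25), to_fixed(0.5)
print(to_float(fx_mul(x, y)))  # 1.625
```

On real hardware the win came from keeping the intermediate product in a wide register; Python's integers are arbitrary-precision, so the sketch only shows the arithmetic shape.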
 