News AI engineers build new algorithm for AI processing, replace complex floating-point multiplication with integer addition

Admin · Oct 17, 2024

AI engineers develop an algorithm that will replace floating-point multiplication with integer addition to make AI processing more efficient.

AI engineers build new algorithm for AI processing, replace complex floating-point multiplication with integer addition : Read more

yahrightthere · Oct 17, 2024

Seeing is believing, where's the white paper on this? As I could not find it.
As for the load on the grid, have seen many reports of data center's inking deals to get this that & the other nuclear sites back up & running on line, as well as adding new nuclear sites & including small modular reactors, which would add to the infrastructure of the grid & reduce the load.
It's understood that all this will take time, money & efforts from all facets to accomplish this.

ekio · Oct 17, 2024

If that can apply to ClosedAI, Meta, Google and co, that would be a game changer but without proof, no beliefs.

nitrium · Oct 17, 2024

"potentially up to 95%". I mean that corporate speak, for anywhere from 0% to 95%. The "Up to" number is not something anyone cares about. What's the average saving for typical AI workloads?

Mama Changa · Oct 17, 2024

They don't say at what level of precision. Is it like fp4, fp16 etc. Also, have they never heard of fixed point math?

Li Ken-un · Oct 17, 2024

“Work smarter not harder.” 🙂

The operating cost to feed the power-hungry algorithms should convince them if the 95% reduction is true.

What’s the relatively fixed cost of investment into the hardware and nuclear power plants compared to the ongoing cost of feeding the less efficient algorithms?

JTWrenn · Oct 17, 2024

Not sure if promising or just a flare for the hope of capital investment. The hedged wording and apparently no fully working product screams "please invest in us" to me.

AkroZ · Oct 17, 2024

Here is the paper: https://arxiv.org/html/2410.00907v2

I have read it, it's interesting but it is listing only the advantages and not the downfalls, basically this is a paper to ask for investments.
They demonstrate higher precision than FP8 with theorically less costly operations but their implementation is a FP32 meaning that it use 4 times more memory and they do not calculate the potential energy drain of those memory operations.
This is not considered for inference but only for the execution of models (as memory is the main limiting factor), notably for AI processor unit.

bit_user · Oct 18, 2024

AkroZ said:
Here is the paper: https://arxiv.org/html/2410.00907v2

Thanks for this! @yahrightthere take note!

AkroZ said:
I have read it, it's interesting but it is listing only the advantages and not the downfalls, basically this is a paper to ask for investments.

They do list its limitations.

AkroZ said:
They demonstrate higher precision than FP8 with theorically less costly operations but their implementation is a FP32 meaning that it use 4 times more memory and they do not calculate the potential energy drain of those memory operations.

They merely prototyped it on existing hardware. Nvidia GPUs, to be precise. Nvidia doesn't support general arithmetic on lower-precision data types than that.

From briefly skimming the paper, I think they're actually proposing to implement it at 16 bit, but they also work out the implementation cost at 8-bit.

AkroZ said:
This is not considered for inference but only for the execution of models

"inference" is the term used for what I think you mean by "execution of models". Here's what the abstract says:

"We further show that replacing all floating point multiplications with 3-bit mantissa ℒ-Mul in a transformer model achieves equivalent precision as using float8_e4m3 as accumulation precision in both fine-tuning and inference."

So, they claim that it's applicable to both inference and a subset of training work (i.e. fine-tuning).

bit_user · Oct 18, 2024

Mama Changa said:
They don't say at what level of precision.

The paper mostly focuses on comparing it against different fp8 number formats.

Mama Changa said:
Also, have they never heard of fixed point math?

What good would that do? The problem with fp multiplication is in the mantissa, which is actually cheaper than multiplying fixed-point, since it's fewer bits.

ex_bubblehead · Oct 18, 2024

yahrightthere said:
Seeing is believing, where's the white paper on this? As I could not find it.
As for the load on the grid, have seen many reports of data center's inking deals to get this that & the other nuclear sites back up & running on line, as well as adding new nuclear sites & including small modular reactors, which would add to the infrastructure of the grid & reduce the load.
It's understood that all this will take time, money & efforts from all facets to accomplish this.

Billions of $$ and decades to implement. I'm not holding my breath.

Oscar Gustafsson · Oct 22, 2024

The basic idea of multiplying floating-point numbers using integer operations has already been published elsewhere, although not cited in the paper: https://www.diva-portal.org/smash/get/diva2:1636876/FULLTEXT02.pdf

ex_bubblehead · Oct 22, 2024

Oscar Gustafsson said:
The basic idea of multiplying floating-point numbers using integer operations has already been published elsewhere, although not cited in the paper: https://www.diva-portal.org/smash/get/diva2:1636876/FULLTEXT02.pdf

That's what we used to do back in the 8-bit CPU days when floating point chips and libraries didn't exist. Someone's always thinking that they're the first to have an idea.

Oscar Gustafsson · Oct 22, 2024

ex_bubblehead said:
That's what we used to do back in the 8-bit CPU days when floating point chips and libraries didn't exist. Someone's always thinking that they're the first to have an idea.

I do not think our work is the first (we do cite other things, especially for reciprocal square root...). I do think that we did it in a systematic way and with several error measures for multiple operations though. Primarily to have a baseline over all weird stuff showing up.

Did you ever publish your stuff one way or the other (paper? code?). I'd be happy to cite that in upcoming work as I believe in proper citing.

bit_user · Oct 22, 2024

Welcome!

Oscar Gustafsson said:
The basic idea of multiplying floating-point numbers using integer operations has already been published elsewhere, although not cited in the paper: https://www.diva-portal.org/smash/get/diva2:1636876/FULLTEXT02.pdf

Thanks for sharing! I plan to look at your paper, but not sure when I'll have time. So, I'm just going to mention this now...

I think their key innovation is based on the fact that they're specifically targeting fp8, which is obviously super low-precision and thus opens the door to a very crude approximation method. You should have a look at what they did, if you haven't already. See the link in @AkroZ 's post, above.

BTW, I did the whole thing of optimizing certain FP arithmetic via integers, about 2 decades ago. At that time, basic floating-point arithmetic was already pretty fast, but caches were small and SSE2 helped make fixed-point still a win, if you didn't need much precision. I was most proud of my atan() implementation, as well as optimized and generic float <-> fixed conversion routines.

Search

News AI engineers build new algorithm for AI processing, replace complex floating-point multiplication with integer addition

Admin

Administrator

yahrightthere

Prominent

ekio

Reputable

nitrium

Distinguished

Mama Changa

Proper

Li Ken-un

Distinguished

JTWrenn

Distinguished

AkroZ

Reputable

bit_user

Titan

bit_user

Titan

ex_bubblehead

Titan

Oscar Gustafsson

ex_bubblehead

Titan

Oscar Gustafsson

bit_user

Titan

TRENDING THREADS

Latest posts

Moderators online

Share this page