News AI researchers found a way to run LLMs at a lightbulb-esque 13 watts with no loss in performance

Admin · Wednesday at 7:06 AM

Elimination of matrix multiplication from LLM processing can massively increase performance-per-watt with the correct optimizations, researchers from UC Santa Cruz demonstrate. It remains to be seen how applicable this approach is for AI in general.

AI researchers found a way to run LLMs at a lightbulb-esque 13 watts with no loss in performance : Read more

rluker5 · Wednesday at 7:34 AM

Excellent.

abufrejoval · Wednesday at 8:35 AM

Cool, now the SoC all existing NPU implementations are also architecturally outdated before vendors could ever prove why you would want Co-Pilot on your personal computer in the first place.

hotaru251 · Wednesday at 8:54 AM

This is great for efficiency standpoint but I do wonder if Nvidia will try to take this and just scale it to use more power for more performance. (since they long ago stopped caring about power usage)

hopefully it doesnt scale like that.

salgado18 · Wednesday at 10:01 AM

abufrejoval said:
Cool, now the SoC all existing NPU implementations are also architecturally outdated before vendors could ever prove why you would want Co-Pilot on your personal computer in the first place.

From the news:

The work was done using custom FGPA hardware, but the researchers clarify that (most) of their efficiency gains can be applied through open-source software and tweaking of existing setups

Which is very good news.

DS426 · Wednesday at 11:32 AM

Yep, hilarious as NPU's exist specifically for matrix math and now that might not be needed as a dedicated slice of silicon.

No one has a crystal ball but big tech has treated us mere peasants as complete fools for not going all in as fast as possible on "AI." This is what you get. I actually hope that this new technique doesn't scale much above 50% on existing AI-purpose-built GPU's as I feel like that crisp smack in the face is fully due for. I'm also not saying it doesn't have it's place and progress is good when responsible, but it hasn't been responsible since the news blew up over ChatGPT; when did big business become such big risk takers??

Oh and the best part: AI didn't figure out this optimization problem like all those hopefuls anticipated -- it was us with gray matter.

DS426 · Wednesday at 11:34 AM

OPEN SOURCE? Oh no, don't mention that part to nVidia.

frogr · Wednesday at 1:05 PM

How much energy is required to complete the computation? Watts x seconds = joules.
The comparison in watts is only valid if both methods take the same amount of time

abufrejoval · Wednesday at 1:30 PM

salgado18 said:
From the news:

Which is very good news.

Not sure what you mean... did you read the publication, too?

Because AFAIK these NPUs on the current SoCs aren't in fact FPGAs, but really GPGPU units that are optimized for MatMul on 4-16bit of precision with distinct variants of fixed or floating point weights. And this new implementation eliminates that to a very large degree or in many of those model layers, thus eliminating the benefits of those NPUs.

At the same time their new operations can't be efficiently implemented by some type of emulation via the CPU or even a normal GPU, it requires FPGAs or an entirely new ASIC or IP block.

If you look at the chip diagrams for this emerging wave of SoCs and see the rather large chunks that are being dedicated to NPUs these days, having those turn effectively into dark silicon before they are even coming to the market, isn't good news for the vendors, who've been banging about the need for AI PCs for some time now.

This can't be implemented or fixed in hardware that is currently being manufactured, if that's what you understood from the article.

New hardware implementing this would relatively cheap to make and operate with vastly higher efficencies, but it's new hardware that needs to fit somewhere. And while it should be easy to put on the equivalent of an Intel/Movidius Neural Stick or an M.2 equivalent, that's either clumsy or hard to fit in emerging ultrabooks.

It's good news for someone like me, who really wouldn't want to pay for the NPU because I don't want any Co-Pilot on my PCs, because currently manufactured chips might come down in price quickly.

But vendors won't be happy.

bit_user · Wednesday at 2:23 PM

DS426 said:
it hasn't been responsible since the news blew up over ChatGPT; when did big business become such big risk takers??

I think everyone was worried that AI will be game-changing, like the Internet was. Many companies who failed to predict or respond to the way their business was affected by the internet are no longer with us. And many of the big tech companies today are those which got their start during the internet boom (Google, Facebook, Amazon).

abufrejoval said:
But vendors won't be happy.

I have yet to read the paper, but I'd be cautious about overreacting. They talk specifically about language models, so it might not apply to other sorts of models, like those used for computer vision or image generation.

bit_user · Wednesday at 2:29 PM

frogr said:
How much energy is required to complete the computation? Watts x seconds = joules.
The comparison in watts is only valid if both methods take the same amount of time

The magazine article about it claims "> 50 times more efficient than typical hardware."

https://news.ucsc.edu/2024/06/matmul-free-llm.html

Another noteworthy quote from that article:

"On standard GPUs, the researchers saw that their neural network achieved about 10 times less memory consumption and operated about 25 percent faster than other models. Reducing the amount of memory needed to run a powerful large language model could provide a path forward to enabling the algorithms to run at full capacity on devices with smaller memory like smartphones."

This could be great for iGPUs and NPUs, which tend to have much less available memory bandwidth than dGPUs.

Also:

"With further development, the researchers believe they can further optimize the technology for even more energy efficiency."

So, I guess the upshot is: sell Nvidia, buy AMD (Xilinx)? Intel is probably going to kick themselves for spinning off Altera.

usertests · Wednesday at 8:45 PM

I feel like this was swept off the front page.

It's gigantic if it pans out. But Nvidia is fine for the moment. Check back in 6 months.

bit_user · Wednesday at 9:20 PM

usertests said:
It's gigantic if it pans out. But Nvidia is fine for the moment. Check back in 6 months.

Well, the research really focuses on transformer-type networks. I lack the expertise to say how much further the reach might be, but it's not the first time people have looked at super low-precision weights, even going so far as single-bit.

On the latter point, I'm reminded of delta-sigma modulation as an alternative to PCM. It makes me wonder whether they really need ternary encoding. Even if they do, perhaps the approach you could take is to encode matrices in a way that you just skip all the zeros and then you can just use a single bit to say whether the remaining weights are 1 or -1.

Finally, as it pertains to Nvidia, yeah it could bring their stock back to earth for a bit. However, it's not like production of giant FPGAs can be increased 100x overnight, and so anyone needing AI horsepower is going to have limited choices, in the short term. In the longer term, Nvidia and everyone else will just build this into their AI processors and merrily we'll roll along.

In case you haven't noticed, any efficiencies the tech industry finds tend to get reinvested into scaling up tech faster and deploying it with broader reach. It almost never means less tech. So, Nvidia will probably do just fine.

hotaru251 · Wednesday at 11:46 PM

bit_user said:
So, I guess the upshot is: sell Nvidia

if it was adopted over the traitional way nvidia owuld just rush making it themselves.

nvidia will chase every trend to stay at the bleeding edge of profiting in the ai space.

bit_user · Thursday at 2:12 AM

hotaru251 said:
if it was adopted over the traitional way nvidia owuld just rush making it themselves.

Well yes, I did say as much in post #13. Chip design takes time, though. Meanwhile, FPGAs exist today.

hotaru251 said:
nvidia will chase every trend to stay at the bleeding edge of profiting in the ai space.

Agreed, though I did mention in another thread that I find it strange they still haven't forked their high-end GPUs into separate AI and HPC models. That fp64 support is wasting a lot of silicon in AI users' hands. It also took them a while to ramp up the amount of on-die SRAM and they're also taking their sweet time following Cerbras and Tesla in doing any sort of wafer-scale designs.

I sometimes get the feeling they've gotten a bit complacent with their current architecture and are following a more evolutionary than revolutionary path. The Keller quote about CUDA being more of a swamp than a moat comes to mind.

Search

News AI researchers found a way to run LLMs at a lightbulb-esque 13 watts with no loss in performance

Admin

Administrator

rluker5

Distinguished

abufrejoval

Reputable

hotaru251

Splendid

salgado18

Distinguished

DS426

Great

DS426

Great

frogr

Distinguished

abufrejoval

Reputable

bit_user

Polypheme

bit_user

Polypheme

usertests

Distinguished

bit_user

Polypheme

hotaru251

Splendid

bit_user

Polypheme

TRENDING THREADS

Latest posts

Moderators online

Share this page