News AI researchers found a way to run LLMs at a lightbulb-esque 13 watts with no loss in performance

abufrejoval

Reputable
Jun 19, 2020
615
452
5,260
Cool, now all the existing SoC NPU implementations are architecturally outdated before vendors could ever prove why you would want Co-Pilot on your personal computer in the first place.
 

salgado18

Distinguished
Feb 12, 2007
981
438
19,370
Cool, now all the existing SoC NPU implementations are architecturally outdated before vendors could ever prove why you would want Co-Pilot on your personal computer in the first place.
From the news:
The work was done using custom FPGA hardware, but the researchers clarify that (most of) their efficiency gains can be applied through open-source software and tweaking of existing setups.
Which is very good news.
 
  • Like
Reactions: AndrewJacksonZA

DS426

Prominent
May 15, 2024
276
206
560
Yep, hilarious, as NPUs exist specifically for matrix math, and now that might not be needed as a dedicated slice of silicon.

No one has a crystal ball, but big tech has treated us mere peasants as complete fools for not going all in as fast as possible on "AI." This is what you get. I actually hope this new technique doesn't scale much above 50% on existing AI-purpose-built GPUs, as I feel that crisp smack in the face is fully deserved. I'm also not saying it doesn't have its place, and progress is good when responsible, but it hasn't been responsible since the news blew up over ChatGPT; when did big business become such big risk takers??

Oh and the best part: AI didn't figure out this optimization problem like all those hopefuls anticipated -- it was us with gray matter.
 

frogr

Distinguished
Nov 16, 2009
78
44
18,570
How much energy is required to complete the computation? Watts x seconds = joules.
The comparison in watts is only valid if both methods take the same amount of time.
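To put rough numbers on that (made-up figures just to illustrate the point, not anything from the paper):

```python
# Energy (J) = power (W) x time (s).
def energy_joules(power_watts: float, seconds: float) -> float:
    return power_watts * seconds

# Hypothetical runtimes for the same workload -- placeholders, not measurements:
low_power = energy_joules(13, 300)    # 13 W device that takes 300 s -> 3900 J
high_power = energy_joules(300, 10)   # 300 W device that takes 10 s -> 3000 J

print(f"13 W device: {low_power:.0f} J, 300 W device: {high_power:.0f} J")
# The lower-wattage option can still consume more energy overall if it's slow enough.
```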
 
  • Like
Reactions: bit_user

abufrejoval

Reputable
Jun 19, 2020
615
452
5,260
From the news:
The work was done using custom FPGA hardware, but the researchers clarify that (most of) their efficiency gains can be applied through open-source software and tweaking of existing setups.
Which is very good news.
Not sure what you mean... did you read the publication, too?

Because AFAIK these NPUs on the current SoCs aren't in fact FPGAs, but really GPGPU units that are optimized for MatMul at 4-16 bits of precision, with distinct variants of fixed- or floating-point weights. And this new implementation eliminates that to a very large degree, or at least in many of those model layers, thus eliminating the benefits of those NPUs.
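To illustrate what I mean (a toy sketch of the general ternary-weight idea, not the paper's actual kernel): once the weights are restricted to -1, 0 and +1, the "matrix multiply" collapses into additions, subtractions and skips, which is exactly the kind of work a MatMul-optimized NPU isn't built to exploit.

```python
# Toy sketch: matrix-vector product with ternary weights needs no multiplies at all.
def ternary_matvec(W, x):
    # W: rows of weights drawn from {-1, 0, +1}; x: list of activations
    out = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # add instead of multiply
            elif w == -1:
                acc -= xi      # subtract instead of multiply
            # w == 0: skip the element entirely
        out.append(acc)
    return out

print(ternary_matvec([[1, 0, -1], [0, 1, 1]], [0.5, 2.0, -1.5]))  # [2.0, 0.5]
```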

At the same time, their new operations can't be efficiently implemented by some type of emulation on the CPU or even a normal GPU; they require FPGAs, or an entirely new ASIC or IP block.

If you look at the chip diagrams for this emerging wave of SoCs, you'll see the rather large chunks of silicon being dedicated to NPUs these days. Having those turn effectively into dark silicon before they even come to market isn't good news for the vendors, who've been banging on about the need for AI PCs for some time now.

This can't be implemented or fixed in hardware that is currently being manufactured, if that's what you understood from the article.

New hardware implementing this would be relatively cheap to make and would operate with vastly higher efficiencies, but it's new hardware that needs to fit somewhere. And while it should be easy to put on the equivalent of an Intel/Movidius Neural Stick or an M.2 card, that's either clumsy or hard to fit into emerging ultrabooks.

It's good news for someone like me, who really doesn't want to pay for an NPU (I don't want any Co-Pilot on my PCs), because currently manufactured chips might come down in price quickly.

But vendors won't be happy.
 

bit_user

Titan
Ambassador
it hasn't been responsible since the news blew up over ChatGPT; when did big business become such big risk takers??
I think everyone was worried that AI would be game-changing, like the Internet was. Many companies that failed to predict or respond to the way the internet affected their business are no longer with us. And many of today's big tech companies are the ones that got their start during the internet boom (Google, Facebook, Amazon).

But vendors won't be happy.
I have yet to read the paper, but I'd be cautious about overreacting. They talk specifically about language models, so it might not apply to other sorts of models, like those used for computer vision or image generation.
 
Last edited:

bit_user

Titan
Ambassador
How much energy is required to complete the computation? Watts x seconds = joules.
The comparison in watts is only valid if both methods take the same amount of time.
The magazine article about it claims "> 50 times more efficient than typical hardware."

Another noteworthy quote from that article:

"On standard GPUs, the researchers saw that their neural network achieved about 10 times less memory consumption and operated about 25 percent faster than other models. Reducing the amount of memory needed to run a powerful large language model could provide a path forward to enabling the algorithms to run at full capacity on devices with smaller memory like smartphones."

This could be great for iGPUs and NPUs, which tend to have much less available memory bandwidth than dGPUs.
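One back-of-the-envelope way to read that ~10x figure (my own arithmetic, assuming an fp16 baseline and near-ideal packing of ternary weights, neither of which the article spells out):

```python
import math

bits_fp16 = 16
bits_ternary = math.log2(3)        # ~1.585 bits per weight at ideal packing
print(bits_fp16 / bits_ternary)    # ~10.1x smaller weight storage
# Practical packing is looser (e.g. 2 bits per weight -> 8x),
# but the order of magnitude matches the quoted ~10x.
```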

Also:

"With further development, the researchers believe they can further optimize the technology for even more energy efficiency."

So, I guess the upshot is: sell Nvidia, buy AMD (Xilinx)? Intel is probably going to kick themselves for spinning off Altera.
 
Last edited:
  • Like
Reactions: usertests

bit_user

Titan
Ambassador
It's gigantic if it pans out. But Nvidia is fine for the moment. Check back in 6 months.
Well, the research really focuses on transformer-type networks. I lack the expertise to say how much further the reach might be, but it's not the first time people have looked at super low-precision weights, even going so far as single-bit.

On the latter point, I'm reminded of delta-sigma modulation as an alternative to PCM. It makes me wonder whether they really need ternary encoding. Even if they do, perhaps you could encode the matrices in a way that just skips all the zeros, and then a single bit says whether each remaining weight is +1 or -1, as in the sketch below.
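Something like this toy sketch (my own illustration of that encoding idea, not anything from the paper):

```python
# Store only the positions of non-zero weights, plus one sign bit each.
def encode_ternary(row):
    positions, signs = [], []
    for i, w in enumerate(row):               # row holds values from {-1, 0, +1}
        if w != 0:
            positions.append(i)
            signs.append(1 if w > 0 else 0)   # 1 bit per surviving weight
    return positions, signs

def dot_encoded(positions, signs, x):
    # Accumulate with adds/subs only; the zeros are never even touched.
    acc = 0.0
    for i, s in zip(positions, signs):
        acc += x[i] if s else -x[i]
    return acc

pos, sgn = encode_ternary([1, 0, 0, -1, 1])
print(dot_encoded(pos, sgn, [0.5, 2.0, -1.0, 3.0, 1.0]))  # 0.5 - 3.0 + 1.0 = -1.5
```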

Finally, as it pertains to Nvidia, yeah it could bring their stock back to earth for a bit. However, it's not like production of giant FPGAs can be increased 100x overnight, and so anyone needing AI horsepower is going to have limited choices, in the short term. In the longer term, Nvidia and everyone else will just build this into their AI processors and merrily we'll roll along.

In case you haven't noticed, any efficiencies the tech industry finds tend to get reinvested into scaling up tech faster and deploying it with broader reach. It almost never means less tech. So, Nvidia will probably do just fine.
 

bit_user

Titan
Ambassador
if it was adopted over the traditional way nvidia would just rush making it themselves.
Well yes, I did say as much in post #13. Chip design takes time, though. Meanwhile, FPGAs exist today.

nvidia will chase every trend to stay at the bleeding edge of profiting in the ai space.
Agreed, though I did mention in another thread that I find it strange they still haven't forked their high-end GPUs into separate AI and HPC models. That fp64 support is wasting a lot of silicon in AI users' hands. It also took them a while to ramp up the amount of on-die SRAM, and they're taking their sweet time following Cerebras and Tesla in doing any sort of wafer-scale designs.

I sometimes get the feeling they've gotten a bit complacent with their current architecture and are following a more evolutionary than revolutionary path. The Keller quote about CUDA being more of a swamp than a moat comes to mind.