Google claims new AI training tech is 13 times faster and 10 times more power efficient — DeepMind's new JEST optimizes training data for massive g...

zsydeepsky

Prominent
Oct 12, 2023
56
51
610
Oh great. Let's see, it boils down to filtering out anything from the dataset that you don't agree with so you can train the models faster.

It amazes me that you somehow linked human behavior with LLM training methods for me. Not kidding.

I had to stop and think that maybe the reason most humans, once "matured", become stubborn and hard to move away from their biases is that the energy cost for the brain to keep tuning its "intelligence model" became far too expensive, so the brain simply stopped training/tuning.

maybe "ideology" boils down, is just a fundamental flaw in human minds...or even worse: hard physical world limits.
 

watzupken

Reputable
Mar 16, 2020
1,181
663
6,070
There is always some way to speed things up, also known as a shortcut. But every decision comes with tradeoffs. So yeah, it may sound more efficient, but you may get undesirable effects like worse AI hallucinations or responses. Google doesn't exactly have a good reputation for its AI to begin with.
 

bit_user

Titan
Ambassador
The article said:
a single ChatGPT request costs 10x more than a Google search in power
Huh. I'd have expected more like 100x. Anyway, nothing in the article suggests this new method will make the resulting models cheaper to inference, so that part will remain unchanged.

The article said:
much more likely is that the machine of capital will keep the pedal to the metal, using JEST methods to keep power draw at maximum for hyper-fast training output.
Yes, the tech industry pretty much always reinvests efficiency improvements into greater throughput, rather than net power-savings. Basically, they will spend as much on training and AI development as they can afford to.
 

bit_user

Titan
Ambassador
Would it be fair to call it a 130x performance/Watt improvement?
No. The 10x power-efficiency figure already covers the entire training run, including the 13x cut in iterations. For it to be 130x, they would've had to say it cuts the number of iterations to 1/13th and reduces the energy per iteration to 1/10th.

I think the reason it's not 1/13th and 1/13th is that it takes extra computation to train the small reference model and to apply it to grade the training data for the main model.
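To put numbers on that, here's a back-of-the-envelope check (my own made-up figures, just to illustrate the relationship; nothing below is from the article or the paper):

```python
# Toy arithmetic: suppose the baseline run takes 130 iterations at 1 energy
# unit each. All figures here are invented for illustration.
baseline_iters = 130
baseline_energy = baseline_iters * 1.0      # 130 units total

jest_iters = baseline_iters / 13            # "13x fewer iterations" -> 10.0
jest_energy = baseline_energy / 10          # "10x more efficient" -> 13.0 units

energy_per_iter = jest_energy / jest_iters  # 1.3 units, up from 1.0
print(energy_per_iter)                      # the reference-model overhead
```

The 10x energy figure already absorbs the 13x cut in iterations; the leftover 1.3x per iteration would be the cost of grading data with the reference model. So performance/Watt improves 10x, not 13 × 10 = 130x.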
 
  • Like
Reactions: usertests

bit_user

Titan
Ambassador
Oh great. Let's see, it boils down to filtering out anything from the dataset that you don't agree with so you can train the models faster.
The idea of losing training samples is unsettling, but what it should be doing is actually better representing the diversity of inputs it needs to handle and reducing redundancy between them. If they reduced real diversity, then it shouldn't perform as well on at least some of their benchmarks.
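For what it's worth, as I understand it, the selection rule in the paper boils down to "prioritize data the learner still gets wrong but a pretrained reference model gets right." Here's a minimal sketch of that idea; the loss functions and keep_ratio are made up for illustration, and the real method scores whole batches jointly rather than ranking individual examples:

```python
import random

def learnability(example, learner_loss, reference_loss):
    # High score: the learner still finds this example hard (high loss) while
    # a well-trained reference model finds it easy (low loss) -- i.e. the
    # example is learnable but not yet learned, so it's worth training on.
    return learner_loss(example) - reference_loss(example)

def select_training_data(pool, learner_loss, reference_loss, keep_ratio=0.1):
    # Score a large candidate pool and keep only the top fraction, so the
    # expensive main model trains on fewer, better examples.
    ranked = sorted(pool,
                    key=lambda ex: learnability(ex, learner_loss, reference_loss),
                    reverse=True)
    return ranked[:max(1, int(len(ranked) * keep_ratio))]

# Demo with random stand-in losses (purely illustrative).
random.seed(0)
pool = [f"example_{i}" for i in range(1000)]
learner = lambda ex: random.random()      # pretend per-example learner loss
reference = lambda ex: random.random()    # pretend per-example reference loss
kept = select_training_data(pool, learner, reference)
print(f"kept {len(kept)} of {len(pool)} candidates")
```

Note there's nothing in that rule about content or viewpoint; it drops whatever is redundant, too easy, or too noisy to be useful at this point in training. Whether that preserves real diversity is exactly what the benchmarks are supposed to check.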

I had to stop and think that maybe the reason most humans, once "matured", become stubborn and hard to move away from their biases is that the energy cost for the brain to keep tuning its "intelligence model" became far too expensive, so the brain simply stopped training/tuning.
It could have something to do with why you don't remember very much from when you were a small child. My uninformed belief is that neuroplasticity isn't free. The easier it is for you to learn new things, the easier it is for you to forget existing knowledge.

Evolution seems to have settled on the assumption that you can learn all of the fundamental skills you need by the time you're an adult. After that, it's more important that you not forget them than that you can easily pick up new skills.
 

zsydeepsky

Prominent
Oct 12, 2023
56
51
610
Evolution seems to have settled on the assumption that you can learn all of the fundamental skills you need by the time you're an adult. After that, it's more important that you not forget them than that you can easily pick up new skills.

I would say evolution simply chooses the path that leads to the maximum probability of survival.

"learning more skills" obliviously has a diminishing return, the more you have learned, the less benefit you gain from the next new skill. evolution tends to stop at an "economic-balance-point" for sure. in fact, as you mentioned, since retaining skills also has a cost, the brain even tends to forget skills that are not frequently used.

It's also easy to understand that this "balance point" definitely wouldn't be adequate for "ideology", since that only appeared in the last 200 years, almost nonexistent next to the millions of years of human brain evolution.
 

bit_user

Titan
Ambassador
I would say evolution simply chooses the path that leads to the maximum probability of survival.
Obviously. But, my point was that if there's a tradeoff between memory durability and the ability to learn new tasks, then the fact that learning slows as we age seems to have a certain logic to it that could've been selected for by evolution.

It's also easy to understand that this "balance point" definitely wouldn't be adequate for "ideology", since that only appeared in the last 200 years,
Ideology is something you learn, just like anything else. It encompasses a world view, which means it touches many of your beliefs and other knowledge. Even if you want to, you can't just switch ideologies in an instant. You need to relearn all of your beliefs and perceptions that it affects.

IMO, the only thing at all "new" about ideologies is that people began to describe them as an abstract concept that explains part of the difference in how different groups view the same facts or set of events. Ideologies have existed for as long as the modern human brain has, but it probably took larger & more complex societies for some people to see that not everyone shares the same outlook and then to try to pick apart those differences.
 

JRStern

Distinguished
Mar 20, 2017
177
67
18,660
This is very interesting stuff, but it appears to be what someone at Microsoft worked out two years ago, and it was adopted in part in the training of ChatGPT 4, so it's not exactly news to insiders.

It doesn't say so here, but it also allows much smaller LLMs; that was Microsoft's focus at the time. Maybe small enough to run on edge/client processors.

It's not surprising that training on better material can run much faster! Doh! OTOH, all of a sudden it's not automagic: it isn't magically learning just because it's a neural network, some true AI fit to take over the universe all by itself, plus or minus some scaling.

Maybe this is why we send kids to school and have them read books and not just try to figure everything out from social media. We do still send kids to school and have them read books, don't we? Just asking.