News Frontier trained a ChatGPT-sized large language model with only 3,000 of its 37,888 Radeon GPUs — the world's fastest supercomputer blasts through...

Admin · Jan 7, 2024

Frontier, the world's fastest supercomputer, can train a large language model like GPT-4 with just 8% of its GPUs.

Frontier trained a ChatGPT-sized large language model with only 3,000 of its 37,888 Radeon GPUs — the world's fastest supercomputer blasts through... : Read more

peachpuff · Jan 7, 2024

Soon the whole article will be in the headline...

hotaru251 · Jan 7, 2024

any developing tech should be focused on opensource tech not proprietary as if one day nvidia just decided "we are no longer making it" your out to sea w/o a paddle.

rluker5 · Jan 7, 2024

Frontier needs 3072 GPUs for a trillion parameter model, Aurora needs 384 GPUs (64 nodes*6GPUs apiece) for a trillion parameter model: https://www.tomshardware.com/news/intel-supercomputing-2023-aurora-xeon-max-gpu-gaudi-granite-rapids

And Intel is in second place.

More specifics are probably needed for an accurate comparison, but AMD looks to be trailing very badly. I wonder how much more power Frontier had to consume to perform the same task?

H4UnT3R · Jan 7, 2024

How long it took to train?

jthill · Jan 7, 2024

rluker5 said:
Frontier needs 3072 GPUs for a trillion parameter model, Aurora needs 384 GPUs (64 nodes*6GPUs apiece) for a trillion parameter model: https://www.tomshardware.com/news/intel-supercomputing-2023-aurora-xeon-max-gpu-gaudi-granite-rapids

And Intel is in second place.

More specifics are probably needed for an accurate comparison, but AMD looks to be trailing very badly. I wonder how much more power Frontier had to consume to perform the same task?

How many GPUs seems less relevant than how much time is needed.

And according to wikipedia,

[Aurora] has around 10 petabytes of memory and 230 petabytes of storage. The machine is estimated to consume around 60 MW of power. For comparison, the fastest computer in the world today, Frontier uses 21 MW while Summit uses 13 MW.

hotaru251 · Jan 7, 2024

rluker5 said:
Frontier needs 3072 GPUs for a trillion parameter model, Aurora needs 384 GPUs (64 nodes*6GPUs apiece) for a trillion parameter model: https://www.tomshardware.com/news/intel-supercomputing-2023-aurora-xeon-max-gpu-gaudi-granite-rapids

And Intel is in second place.

More specifics are probably needed for an accurate comparison, but AMD looks to be trailing very badly. I wonder how much more power Frontier had to consume to perform the same task?

from what I took from the article they only used as many GPU's as the memory needed (as more gpu wasnt beneficial but they still needed the memory)

bit_user · Jan 7, 2024

H4UnT3R said:
How long it took to train?

Exactly. The most crucial piece of information was omitted. I glanced at the paper, but a quick search didn't find any unit of time. Maybe someone wants to have a closer read?

https://arxiv.org/pdf/2312.12705.pdf

rluker5 said:
AMD looks to be trailing very badly. I wonder how much more power Frontier had to consume to perform the same task?

How can you say they're trailing, when you don't know how much time (or power) either used? If Intel used 1/10th the GPUs, but took 20x as long and used more power in the process, can we really count that as a win?

bit_user · Jan 7, 2024

hotaru251 said:
more gpu wasnt beneficial but they still needed the memory

Given that GPT-3 was rumored to take 10k A100 GPUs an entire month to train. I don't see how you can claim more GPUs aren't beneficial.

BTW, I'm sure GPT-3 was trained on a far larger dataset, which would explain the apparent contradiction in compute power used for that vs. these examples.

rluker5 · Jan 7, 2024

bit_user said:
Exactly. The most crucial piece of information was omitted. I glanced at the paper, but a quick search didn't find any unit of time. Maybe someone wants to have a closer read?

https://arxiv.org/pdf/2312.12705.pdf

How can you say they're trailing, when you don't know how much time (or power) either used? If Intel used 1/10th the GPUs, but took 20x as long and used more power in the process, can we really count that as a win?

That is all true, which is why I said more specifics are probably needed for an accurate comparison.
But this article, since it is lacking in any other information than just the number of GPUs used, makes it look like AMD needs 8x as many and makes AMD look like they are trailing badly.

When we do not know that to be the case. We don't know how long it took, if the models were equally complex, or even if the "training" was done to the same standards.

But still entertaining to put that in perspective with the headline with the loaded language: "Frontier trained a ChatGPT-sized large language model with only 3,000 of its 37,888 Radeon GPUs — the world's fastest supercomputer blasts through one trillion parameter model with only 8 percent of its MI250X GPUs"

When Intel used 0.64% of Aurora's GPUs to do what sounds like the same thing a few months back.

Since there is not enough information to even know which GPU is faster at this point I'll just gloat at throwing egg at the baiting headline.

DavidC1 · Jan 7, 2024

rluker5 said:
Frontier needs 3072 GPUs for a trillion parameter model, Aurora needs 384 GPUs (64 nodes*6GPUs apiece) for a trillion parameter model: https://www.tomshardware.com/news/intel-supercomputing-2023-aurora-xeon-max-gpu-gaudi-granite-rapids

And Intel is in second place.

More specifics are probably needed for an accurate comparison, but AMD looks to be trailing very badly. I wonder how much more power Frontier had to consume to perform the same task?

Each MI250X has 383TOPs Int8 capability while each Data Center GPU 1550 series has 1678TOPS Int8, meaning more than 4x as much per GPU. It's 500W for MI250X and 600W for GPU 1550. Also has 408MB of "Rambo Cache" while MI250X is only 16MB L2, which will have real world benefits.

Just by that metric it's like having 3072 Frontier GPUs versus 1536 Aurora GPUs.

Intel's GPU also has twice the amount of transistors, and much harder to fabricate. One would think it would be faster in something.

bit_user · Jan 8, 2024

DavidC1 said:
Each MI250X has 383TOPs Int8 capability while each Data Center GPU 1550 series has 1678TOPS Int8, meaning more than 4x as much per GPU.

That doesn't help with training, which is what they're talking about. For training, BF16 or FP16 are what you need.

hotaru251 · Jan 8, 2024

bit_user said:
I don't see how you can claim more GPUs aren't beneficial.

did you not read article?

"a new problem: parallelism. Throwing more GPUs at an LLM requires increasingly better communication to actually use more resources effectively. Otherwise, most or all of that extra GPU horsepower would be wasted."

George³ · Jan 8, 2024

hotaru251 said:
a new problem: parallelism.

This problem isn't new. Exist from the first multi cpu computers, then ported to multicore cpu/gpu architectures as well.

bit_user · Jan 8, 2024

hotaru251 said:
did you not read article?

"a new problem: parallelism. Throwing more GPUs at an LLM requires increasingly better communication to actually use more resources effectively. Otherwise, most or all of that extra GPU horsepower would be wasted."

"Strong scaling refers to increasing processor count without changing the size of the workload, and this tends to be where higher core counts become less useful"

With a more realistic training dataset, the workload would naturally increase by a lot. Their limited training data is probably why it didn't take them inordinately long to run this experiment with only 3k GPUs.

hotaru251 · Jan 8, 2024

George³ said:
This problem isn't new. Exist from the first multi cpu computers, then ported to multicore cpu/gpu architectures as well.

it wasnt new in that sense. (i.e. never been a thing ever before)

it was new in that at the point they got to it became a new problem for the goal. (i.e. a new problem appeared that wasnt an issue prior)

mindbreaker · Jan 8, 2024

Does this LLM have a name? Benchmarks? Without benchmarks, what is to hoot about?

purpleduggy · Jan 9, 2024

rluker5 said:
Frontier needs 3072 GPUs for a trillion parameter model, Aurora needs 384 GPUs (64 nodes*6GPUs apiece) for a trillion parameter model: https://www.tomshardware.com/news/intel-supercomputing-2023-aurora-xeon-max-gpu-gaudi-granite-rapids

And Intel is in second place.

More specifics are probably needed for an accurate comparison, but AMD looks to be trailing very badly. I wonder how much more power Frontier had to consume to perform the same task?

I own an RTX4090 and this is just astroturf. There are decent competing products from both Nvidia and AMD. being so supportive of only one side is cringe. these are multibillion dollar companies that you are simping for

rluker5 · Jan 9, 2024

purpleduggy said:
I own an RTX4090 and this is just astroturf. There are decent competing products from both Nvidia and AMD. being so supportive of only one side is cringe. these are multibillion dollar companies that you are simping for

Nvidia is definitely in first place in AI.

The headline of this article being so supportive of one side while completely ignoring the other two better performing companies is cringe and is what I was reacting to.
It practically reads -Amazing AMD blasts through AI model like no company ever has before- even though there are two companies that get little fanfare in the article for doing more, earlier.
And then the article leaves out specifics needed for an even comparison.

Nothing against you and your 4090. I would have bought one if I had a 20 series to upgrade from, I'm just tired of seeing biased simping for AMD.

AMD has some good stuff. I like how they get good results from speeding up data access with their large caches. Their desktop CPU chiplets are very efficient with continuous heavy loads. Their older GPUs have better compatibility with newer games than pre Maxwell GPUs, even if they predate Kepler. Their newer GPUs are easier on CPUs. They get a ton of cores in server chips. But AMD isn't perfect, it is a multibillion dollar company and shouldn't get hype and credit where it isn't due.

I was just going by the numbers that were available for comparison. The task performed was training of a trillion parameter large language AI model like ChatGPT3. AMD needed 3072 currently used GPUs and Intel needed 384. That is all the information we were given. On it's face it makes AMD look much worse than the sensationalist headline suggests.

Maybe if we knew the time each setup took to compete the task, the relative complexity of each setup's task a better comparison could be made. AMD's GPU performance would very likely look better with more information. But it is silly to hype AMD's AI prowess from an achievement that is underwhelming and late compared even to dGPU newcomer Intel, much less Nvidia.

purpleduggy · Jan 11, 2024

rluker5 said:
Nvidia is definitely in first place in AI.

I'm not sure you realize that AMD and Nvidia both have the same majority stake investors: Blackrock and Vanguard. There is no real competition. Both companies are just focused on different segments with some overlap, both made at the same fabs. I just buy what gives the bang for buck. I don't care about a side. Both are decent.

News Frontier trained a ChatGPT-sized large language model with only 3,000 of its 37,888 Radeon GPUs — the world's fastest supercomputer blasts through...

Administrator

Reputable

Splendid

Distinguished

Prominent

Honorable

Splendid

Titan

Titan

Distinguished

Distinguished

Titan

Splendid

Respectable

Titan

Splendid

Distinguished

Prominent

Distinguished

Prominent

Share this page