News: Intel Delivers 10,000 Aurora Supercomputer Blades, Benchmarks Against Nvidia and AMD

Kamen Rider Blade

Distinguished
Aurora was supposed to be finished in 2018.
It's 2023, and they're promising to finish by the end of the year?

WTF happened with the timetable?

The machine is estimated to consume around 60 MW.[17] For comparison, the fastest computer in the world today, Frontier, uses 21 MW, while Summit uses 13 MW.
So Frontier delivers slightly over 1 ExaFlop for 21 MW of power.

And this Aurora is going to deliver 2 ExaFlops for 60 MW?

Something went horribly wrong here.
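
If you want to sanity-check that, here's a rough back-of-the-envelope sketch in Python using the round numbers quoted above (Frontier at roughly 1.1 EF for 21 MW, Summit at roughly 0.15 EF for 13 MW, and Aurora's 2 EF target against the 60 MW figure). These are assumed ballpark values, not official Top500 submissions:

```python
# Rough perf-per-watt comparison using the round numbers from the post above.
# All figures are ballpark assumptions, not measured Top500 results.
systems = {
    "Frontier": {"exaflops": 1.1,  "megawatts": 21},  # "slightly over 1 ExaFlop for 21 MW"
    "Summit":   {"exaflops": 0.15, "megawatts": 13},  # ~150 PF, for context
    "Aurora":   {"exaflops": 2.0,  "megawatts": 60},  # 2 EF target at the 60 MW ceiling
}

for name, s in systems.items():
    gflops = s["exaflops"] * 1e9   # 1 exaflop = 1e9 gigaflops
    watts = s["megawatts"] * 1e6   # 1 MW = 1e6 W
    print(f"{name}: ~{gflops / watts:.0f} GFLOPS/W")

# Roughly: Frontier ~52 GFLOPS/W, Summit ~12, Aurora ~33.
# Though, as noted later in the thread, 60 MW is a facility ceiling,
# not necessarily the sustained draw.
```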
 
Seems like it'll be ready for November, so we'll see what the real-world performance looks like. Intel's HPC strategy definitely relies on this being a success, along with oneAPI. I'm looking forward to seeing how the CPU-based HBM is leveraged in the HPC space.

img #2:
Intel # of GPUs: 1, 3, 6
AMD & Nvidia # of GPUs: 1, 2, 4
They're vendor-provided benchmarks, so grain of salt, but it's likely based on available configurations. If you look at the numbers, Intel's 3 GPUs are faster than Nvidia's 4 and slightly slower than AMD's 4.
 

bit_user

Polypheme
Ambassador
img #2:
Intel # of GPUs: 1, 3, 6
AMD & Nvidia # of GPUs: 1, 2, 4
Good catch. But I think Intel's defense would be that it's a graph intended to show scaling linearity. That's what the shadows are meant to depict. Even then, Nvidia scales better @ 4 than Intel does @ 3. So that's not really such a great counterpoint.
 

bit_user

Polypheme
Ambassador
If you look at the numbers, Intel's 3 GPUs are faster than Nvidia's 4 and slightly slower than AMD's 4.
Do you even know what they're comparing? Polaris is using A100s, which launched 3 years ago and are made on TSMC N7. Ponte Vecchio uses a mix of TSMC N5, N7, and Intel 7.

The other point of note is where I started: cost. It's hard to get pricing details, especially ones not subject to current AI-driven market distortions, but I think you'd find the product cost of each Xe GPU Max is multiple times that of a single A100, even back when it launched.

Today, the only reason you wouldn't buy an H100 is because you can't.
 
Do you even know what they're comparing? Polaris is using A100s, which launched 3 years ago and are made on TSMC N7. Ponte Vecchio uses a mix of TSMC N5, N7, and Intel 7.

The other point of note is where I started: cost. It's hard to get pricing details, especially ones not subject to current AI-driven market distortions, but I think you'd find the product cost of each Xe GPU Max is multiple times that of a single A100, even back when it launched.

Today, the only reason you wouldn't buy an H100 is because you can't.
It doesn't matter what they're comparing, as I was just pointing out what the graph shows and hypothesizing configuration as the reason.
 
Of course, but when you are comparing specs between products, if it isn't 1:1 then you're already doing it wrong.

Not everything scales the same, so even if you do the math based on what's given, that doesn't mean it will actually hold true.
So Intel is supposed to design a custom blade to compare to AMD/Nvidia? That makes no sense at all, which is the entire point I was making. If you're not aware, Polaris is 2 CPUs/4 GPUs per node, Frontier is 1 CPU/4 GPUs, and Aurora is 2 CPUs/6 GPUs, so it's a matter of configuration, not just "compare 1:1".
 

bit_user

Polypheme
Ambassador
So Intel is supposed to design a custom blade to compare to AMD/Nvidia? That makes no sense at all, which is the entire point I was making. If you're not aware, Polaris is 2 CPUs/4 GPUs per node, Frontier is 1 CPU/4 GPUs, and Aurora is 2 CPUs/6 GPUs, so it's a matter of configuration, not just "compare 1:1".
Well, the graph isn't comparing in increments of 1 blade, is it? If you're trying to do a per-GPU scaling analysis, then the X-axis should be consistent.
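
To make that concrete, here's a minimal sketch of the kind of per-GPU normalization I mean. The throughput values are made-up placeholders (the slide only shows relative bars), so the point is the method, not the numbers:

```python
# Normalize throughput by GPU count so that mismatched x-axes (1/3/6 vs 1/2/4)
# no longer matter. The throughput values below are hypothetical placeholders,
# not numbers read off Intel's slide.

def scaling_efficiency(label, results):
    """results: list of (gpu_count, throughput) pairs, smallest config first."""
    base_gpus, base_tput = results[0]
    per_gpu_base = base_tput / base_gpus
    print(label)
    for gpus, tput in results:
        eff = (tput / gpus) / per_gpu_base  # 1.0 = perfectly linear scaling
        print(f"  {gpus} GPUs: throughput {tput:5.1f}, {eff:.0%} of linear")

scaling_efficiency("Vendor A (1/2/4 configs):", [(1, 10.0), (2, 19.0), (4, 36.0)])
scaling_efficiency("Vendor B (1/3/6 configs):", [(1, 12.0), (3, 33.0), (6, 60.0)])
```

Once everything is expressed as "% of linear per GPU", it no longer matters that one vendor's bars step 1/3/6 and the other's step 1/2/4.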
 
Info about financing this project and covering (and justifying) the operating cost caused by TENS of MEGAFREAKINGWATTS of extra power draw could be even more interesting than the struggles on the technical compute side of this mess.
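
For a very rough sense of what tens of extra megawatts cost to run, here's a back-of-the-envelope sketch. The electricity rate and utilization are pure assumptions (DOE labs negotiate their own power contracts), so treat it as order-of-magnitude only:

```python
# Order-of-magnitude annual energy cost of the extra power draw.
# The $/kWh rate and utilization are assumed placeholders, not real contract figures.
HOURS_PER_YEAR = 24 * 365

def annual_cost_millions(megawatts, usd_per_kwh=0.07, utilization=1.0):
    kwh = megawatts * 1000 * HOURS_PER_YEAR * utilization
    return kwh * usd_per_kwh / 1e6  # millions of USD

extra_mw = 60 - 21  # Aurora's 60 MW ceiling vs. the 21 MW Frontier figure above
print(f"Extra ~{extra_mw} MW, running flat out: ~${annual_cost_millions(extra_mw):.0f}M/year")
print(f"Same, at 60% average utilization: ~${annual_cost_millions(extra_mw, utilization=0.6):.0f}M/year")
```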
 
Well, the graph isn't comparing in increments of 1 blade, is it? If you're trying to do a per-GPU scaling analysis, then the X-axis should be consistent.
Can you come up with a single logical reason why Intel's numbers would be in a 1/3/6 configuration if they're not comparing using a single blade?

(keeping in mind this is an Aurora slide comparing third-party data)
 

bit_user

Polypheme
Ambassador
Can you come up with a single logical reason why Intel's numbers would be in a 1/3/6 configuration if they're not comparing using a single blade?
Because it visually compares more favorably than using the same intervals as the others.

For anyone who hasn't figured it out, this is what we're talking about (from the article):

[attached image: the benchmark comparison slide from the article]

 
Because it visually compares more favorably than using the same intervals as the others.

For anyone who hasn't figured it out, this is what we're talking about (from the article):
[attached image: the benchmark comparison slide from the article]
You and I have two very different definitions of logical, then. Opening themselves up to criticism just to have larger bars in a marketing graph where they'd have been ahead anyway is stupid.
 

bit_user

Polypheme
Ambassador
You and I have two very different definitions of logical, then.
Are you asking why they did it? Or are you asking me to try to find some valid justification for them doing it? Because I can answer the first, but I'm having trouble with the second. Why don't you answer the second, if you think there is one.

Opening themselves up to criticism just to have larger bars in a marketing graph where they'd have been ahead anyway is stupid.
Then you haven't looked at many marketing slides, because these kinds of games are exactly what companies do to flog an inferior product.

It even worked on me, because I didn't notice until @hotaru251 pointed it out. That slide had enough else going on that it distracted me from that fact, and I didn't spend too long on it before I flipped to the next one.
 
Are you asking why they did it? Or are you asking me to try to find some valid justification for them doing it? Because I can answer the first, but I'm having trouble with the second. Why don't you answer the second, if you think there is one.
I originally asked you to come up with a logical reason why they'd do it and you didn't.

Then you haven't looked at many marketing slides, because these kinds of games are exactly what companies do to flog an inferior product.

It even worked on me, because I didn't notice until @hotaru251 pointed it out. That slide had enough else going on that it distracted me from that fact, and I didn't spend too long on it before I flipped to the next one.
It's not an inferior product, though, so there goes that argument right out the window. It's data from a third party, which would only have access to Aurora blades. That's the most likely reason for the disparity, not some nefarious plan to pull the wool over the eyes of people who can't be bothered to read.
 

bit_user

Polypheme
Ambassador
I originally asked you to come up with a logical reason why they'd do it and you didn't.
You're asking me to defend the indefensible. Did you get confused which side I'm on?

It's not an inferior product
Oh, not according to Intel, it's not!

It's data from a third party, which would only have access to Aurora blades.
It's one of the oldest tricks in the book! More than 20 years ago, Microsoft contracted 3rd-party firms to run biased benchmark comparisons of Linux vs. Windows for powering back-office and web servers. They never said the firm was unbiased, but just used the fact of it being a separate firm to provide the appearance of a neutral 3rd party. Even if the firm was never explicitly instructed to favor one party vs. another, they know who's paying the bill, and having a reputation for making their clients look bad is bad for future business.

The other way that bias can creep in is if the firm truly is independent and unbiased, but Intel then cherry-picks their data and decides how to present it in a report. Hint: Intel made the presentation, not the 3rd party.
 
You're asking me to defend the indefensible. Did you get confused which side I'm on?
You quite literally said "Because it visually compares more favorably than using the same intervals as the others" is a logical reason for them to show 1/3/6, rather than it being due to blade configuration. That's simply not a logical reason to do it.
Oh, not according to Intel, it's not!
The data in their slide shows that they're not, so why would they need to manipulate data that already shows them ahead just to look better? You're making no sense at all...
It's one of the oldest tricks in the book! More than 20 years ago, Microsoft contracted 3rd-party firms to run biased benchmark comparisons of Linux vs. Windows for powering back-office and web servers. They never said the firm was unbiased, but just used the fact of it being a separate firm to provide the appearance of a neutral 3rd party. Even if the firm was never explicitly instructed to favor one party vs. another, they know who's paying the bill, and having a reputation for making their clients look bad is bad for future business.

The other way that bias can creep in is if the firm truly is independent and unbiased, but Intel then cherry-picks their data and decides how to present it in a report. Hint: Intel made the presentation, not the 3rd party.
It doesn't matter whether or not there's bias in the test. It simply doesn't make sense for Intel to show a 1/3/6 configuration versus 1/2/4 in something the data already showed them ahead in, unless it was due to configuration. You're conflating two separate issues: the validity of the data and the configuration used to present it.
 

bit_user

Polypheme
Ambassador
You quite literally said "Because it visually compares more favorably than using the same intervals as the others" is a logical reason for them to show 1/3/6, rather than it being due to blade configuration. That's simply not a logical reason to do it.
If you want to know why I think they did it, there you go. I can't think of a more plausible explanation. I don't know why you're expecting there to be one. This is marketing material, not academic research.

The data in their slide shows that they're not, so why would they need to manipulate data that already shows them ahead just to look better? You're making no sense at all...
Did it ever occur to you they might cherry-pick data to paint their product in the best light? Also, note how they didn't even try to compare it in terms of metrics like cost or energy-efficiency.

This is a dumb argument. I just can't believe you're trying to defend Intel's marketing slides. If you actually wanted to know how it compares, you should look for papers from the HPC research community comparing it to other solutions, or we can just sit back and wait for its Top500 score and normalize that by the node count.

I'm not really interested in continuing this debate, because either you have some vested interest or your ego is just way too wrapped up in this. Either way, your replies are coming awfully close to personal attacks. I have no beef against you, and I'd like to keep it that way.
 

bit_user

Polypheme
Ambassador
Okay, speaking of academic research papers, I've found two. There probably aren't more, because the GPU is so new and scarce. More will crop up in time.

The first examines performance on a molecular docking workload. There's an added complication: the app was originally written in CUDA, so the first thing they had to do was port it to SYCL. They found that SYCL was slower on the A100 in nearly all cases, but I'm not sure how much of that is simply down to the relative quality of the SYCL implementation on Nvidia hardware vs. the native CUDA runtime. Then they compare the Intel 1550 (the same one referenced in this article) vs. the A100. As I've said, it's a slightly mismatched comparison in terms of age, manufacturing node, and likely cost. What they found was that 2 of the 5 ligand-receptor test cases performed better with one algorithm (Solis-Wets) and all 5 performed better with another (ADADELTA).

It's also worth noting they spent some time optimizing the SYCL implementation, which they justified by the fact that the initial porting work was performed by an automated tool. I didn't read enough of the paper to see whether similar optimizations might be possible in the CUDA version.

Caveats: One of the 3 authors on that paper is listed as an Intel employee. ...and then there's this:

ACKNOWLEDGMENTS

This work has been supported by Intel under the oneAPI Center of Excellence Research Award granted to Technical University of Darmstadt.

Here's another one, which goes so far as to list "Intel Corporation" as a co-author, that analyzes the scalability of OpenMP offloading. Using OpenMP should make it more apples-to-apples, at least avoiding the added variable of porting & optimization. They used the 1100 model and compared it with an A100.

In the end, they were unsuccessful in outperforming the A100, although it looks like they at least managed to scale to the next larger dataset size. I didn't read enough of the paper to know whether the 20% larger HBM size of the PVC factored into this.

So, there you go, @thestryker . Although both had heavy 1st-party involvement, likely guiding their methodology, I at least trust them to provide transparency so that the appropriate questions can be raised. Perhaps some truly independent comparisons will emerge with time & greater public access to resources like Aurora.
 
Also, note how they didn't even try to compare it in terms of metrics like cost or energy-efficiency.
Because, at least for Aurora, these are completely irrelevant; all they care about is that everything fits into their building, stays below the 60 MW maximum that their power plant and cooling can provide, and, of course, reaches the exascale performance they wanted.
They'd also probably want their hardware not to blow up, either at the chips or at the connectors.

Here is what the head scientist in charge of Aurora has to say.
The delays aren't just because Intel, HPE, or Cray were late producing hardware; they realized before the fact that x86 cores would be terrible for the project and switched gears toward GPU cores, and of course that would take a lot of time.
“People realized that GPUs actually do calculations really quickly,” Martin explains. “And so we worked with Intel to help design the GPUs — they took out the graphics-rendering engine and all the ray tracing and put in more ability to do calculations. So these GPUs are calculation accelerators.” Papka dates this supercomputer innovation to the late ’90s: “The entire world needs to thank the 14-year-olds for their love of gaming, because that’s largely what drove it.”
Also, the max power draw is the maximum, not the regular draw; they'd be happy if it could always draw the max, because that would mean they're getting their work done as fast as possible.
Papka emphasizes that 60 megawatts is the absolute limit — Aurora will more likely run at around 50 to 54 megawatts, if that. “Our old system, Mira, had a peak of nine megawatts,” Papka says. “But if I’d look at the electric bill, it was more around three and a half. So if you target for peak — everything’s perfect, you’re using every piece of silicon on that chip — you’re gonna consume all that power. But you never get there.
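
Just to put numbers on that peak-vs-actual point, using only the figures from the quote (Mira: 9 MW peak, ~3.5 MW on the bill; Aurora: 60 MW limit, 50 to 54 MW expected), here's a trivial sketch:

```python
# Peak vs. typical draw, using only the figures quoted above.
mira_peak_mw, mira_typical_mw = 9.0, 3.5   # per Papka's comments
aurora_limit_mw = 60.0                      # absolute facility limit
aurora_expected_mw = (50.0, 54.0)           # the range Papka expects

print(f"Mira averaged ~{mira_typical_mw / mira_peak_mw:.0%} of its peak")
low, high = (mw / aurora_limit_mw for mw in aurora_expected_mw)
print(f"Aurora is expected to run at ~{low:.0%} to {high:.0%} of its 60 MW limit")
```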