News AMD-Powered Frontier Remains Fastest Supercomputer in the World, Intel-Powered Aurora Takes Second With Half-Scale Result

Status
Not open for further replies.

Admin

Administrator
Staff member
However, Aurora is expected to eventually reach up to 2 EFlop/s of performance when it comes fully online. ... The system has seen numerous redesigns and reschedules in the years since, with the new Aurora being announced in 2019 with one exaflop of performance to be delivered in 2021. Yet another rescheduling in late 2021 claimed the system would deliver two exaflops upon completion, which is now slated for next year
Wow. Maybe there should be a Late500 list, for the latest deliveries of supercomputing installations that were still completed by the original vendor/integrator!
:D
 
"at a total of 24.69 megawatts (MW) of energy"
MW is a measure of power, not energy. Power is the rate at which energy is consumed.
E.g., the total power consumption of the ~10,000 CPUs and ~30,000 GPUs is 24.69 MW.
The total energy consumed by a 10-second calculation would be 246.9 MJ. For a 1-hour calculation it would be 24.69 MWh, or 24.69 × 3600 = 88,884 MJ.
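To make the power/energy distinction concrete, here is a minimal Python sketch of the arithmetic above. The function names `energy_mj` and `mj_to_mwh` are made up for illustration, and it assumes a constant power draw for the whole run; the only input taken from the thread is the 24.69 MW figure.

```python
def energy_mj(power_mw: float, seconds: float) -> float:
    """Energy in megajoules for a constant power draw in megawatts.

    1 MW sustained for 1 s is 1 MJ, so energy = power * time.
    """
    return power_mw * seconds


def mj_to_mwh(energy: float) -> float:
    """Convert megajoules to megawatt-hours (1 MWh = 3600 MJ)."""
    return energy / 3600.0


POWER_MW = 24.69  # power figure quoted from the article

ten_seconds = energy_mj(POWER_MW, 10)    # energy for a 10-second run
one_hour = energy_mj(POWER_MW, 3600)     # energy for a 1-hour run

print(f"10 s run: {ten_seconds:.1f} MJ")
print(f"1 h run:  {one_hour:.0f} MJ = {mj_to_mwh(one_hour):.2f} MWh")
```

The point is only that MW tells you the rate; you need a duration before you can talk about energy.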
 
I'm surprised they let Intel continue its Aurora contract after how late it was.

I wouldn't be surprised if future supercomputer contracts have an opt-out clause for vendor lateness.

Especially given how bad Intel has been.
 
I'm surprised they let Intel continue its Aurora contract after how late it was.

I wouldn't be surprised if future supercomputer contracts have an opt-out clause for vendor lateness.

Especially given how bad Intel has been.
This was never going to happen once Intel made cost guarantees. At the end of the day the cost of Aurora is still ~$200m (to the gov't) which is significantly cheaper than anything else in its class (El Capitan and Frontier are ~$600m).
 
This was never going to happen once Intel made cost guarantees. At the end of the day the cost of Aurora is still ~$200m (to the gov't) which is significantly cheaper than anything else in its class (El Capitan and Frontier are ~$600m).
But Frontier proved to be on time and cheaper to operate, energy-wise.

That's money well spent IMO; can't say the same for Intel's Aurora.

Late, more power-hungry, and still being built out.
 
But Frontier proved to be on time and cheaper to operate, energy-wise.

That's money well spent IMO; can't say the same for Intel's Aurora.

Late, more power-hungry, and still being built out.
I'm not entirely sure what your point is here. Should the DOE have canceled Aurora? If so when should they have canceled it?

The power consumption is built into the contract and only becomes a problem if they fail to reach the performance target (they can't just increase power consumption). The time it's currently taking isn't a problem unless they miss their setup and acceptance timelines.

Has Aurora been a boondoggle? Absolutely, but everything was locked in when the dollar figures became guaranteed and hardware started arriving (I believe initial test hardware was late 2021). The real question is whether or not it delivers what was promised. Realistically speaking that is the only important part of systems like these.
 
I'm surprised they let Intel continue its Aurora contract after how late it was.
Other than cancelling it and letting someone else build a completely different machine, what else were they going to do?

Normally, I think you'd fire the integrator. However, if Intel was the primary party to the contract, then you have fewer and more drastic options.

I wouldn't be surprised if future supercomputer contracts have an opt-out clause for vendor lateness.
They had a financial penalty, and it was big enough to warrant a mention in one of Intel's quarterly reports to its investors.
 
I'm surprised they let Intel continue its Aurora contract after how late it was.

I wouldn't be surprised if future supercomputer contracts have an opt-out clause for vendor lateness.

Especially given how bad Intel has been.
Aurora changed from wanting half an exaflop of Xeon Phis to wanting 2 exaflops; how does anybody expect that to happen without any delays?!
Aurora realized that would be completely impossible with Xeon Phi, so they changed to a platform that wasn't even ready yet. The Max Series got delayed, and that didn't help, but it wasn't Intel that just changed the parameters; Aurora made changes that took a lot more time to be realized.
They had a financial penalty, and it was big enough to warrant a mention in one of Intel's quarterly reports to its investors.
Only they didn't mention anything like that, and you are just projecting your feelings onto irrelevant quotes...
You are talking about a ~$300 million payoff or write-off or whatever from Intel to Intel Federal... so your argument is that Intel gave Intel money, so that is a fine against Intel... makes complete sense, right?!
This could just as well have been Intel investing $300m into Intel Federal to get even more federal contracts.
If it were a fine, it would have had to go from Intel Federal to the government, not from Intel to Intel.
 
So it turns out no one has any real data on how much money has been spent, and how much more will be spent, by the time Aurora is online at the full performance intended in the latest spec revisions? We certainly know what the expected maximum value was (in 2019(?) dollars) and the amount of the late penalty imposed, so we can get the difference with simple subtraction. I.e., do we have no knowledge of the amounts on the actual invoices? What is the value in today's dollars, and does inflation over that period matter?
 
Aurora changed from wanting half an exaflop of Xeon Phis to wanting 2 exaflops; how does anybody expect that to happen without any delays?!
Aurora realized that would be completely impossible with Xeon Phi, so they changed to a platform that wasn't even ready yet. The Max Series got delayed, and that didn't help, but it wasn't Intel that just changed the parameters; Aurora made changes that took a lot more time to be realized.
You're conveniently overlooking the part about how the Sapphire Rapids + Ponte Vecchio version was meant to be delivered in 2021.

Only they didn't mention anything like that, and you are just projecting your feelings onto irrelevant quotes...
No, I'm just saying what was reported on this site.
 
for the same amount of work performed
These are supercomputers intended for specific tasks, not benchmark runs. System architecture and workload are generally designed in parallel. This is not like a desktop machine where you can just move a bit of software to another box and expect it to work faster.
 
You're conveniently overlooking the part about how the Sapphire Rapids + Ponte Vecchio version was meant to be delivered in 2021.
I'm conveniently overlooking it...by stating it?!
Sapphire Rapids + Ponte Vecchio is the Max Series, which I said got delayed.
No, I'm just saying what was reported on this site.
What was reported on this site?!
Did they report something different than everybody else?!
 
How do they plan to go from ~0.5 EFLOPS for 1/2 the system to 2.0 EFLOPS for the full system?
Pretty easily, as the system isn't running even remotely optimally yet (keep in mind the power allotted is up to 60 MW as well).

Supercomputers don't necessarily scale linearly due to interconnects and require software/hardware optimization over installation time (and potentially beyond). This is why Frontier increased performance when it was formally accepted compared to when it was initially run.
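A rough Python sketch of why the half-scale number alone doesn't settle the question. It uses only figures mentioned in this thread (~0.5 EFLOPS on half the system, a 2.0 EFLOPS target) and naively assumes perfectly linear scaling, which, as noted above, real machines don't achieve. The function names and the exact 0.5 installed fraction are assumptions for illustration.

```python
HALF_SYSTEM_EFLOPS = 0.5   # approximate half-scale result quoted in the thread
TARGET_EFLOPS = 2.0        # stated target for the full system
FRACTION_INSTALLED = 0.5   # assumption: the run used exactly half the machine


def linear_extrapolation(measured: float, fraction: float) -> float:
    """Scale a partial-system result to the full system, assuming perfect linear scaling."""
    return measured / fraction


def required_speedup(measured: float, fraction: float, target: float) -> float:
    """Per-node improvement factor still needed to hit the target at full scale."""
    return target / linear_extrapolation(measured, fraction)


full_linear = linear_extrapolation(HALF_SYSTEM_EFLOPS, FRACTION_INSTALLED)
gap = required_speedup(HALF_SYSTEM_EFLOPS, FRACTION_INSTALLED, TARGET_EFLOPS)

print(f"Linear extrapolation to full system: {full_linear:.2f} EFLOPS")
print(f"Per-node improvement needed for target: {gap:.2f}x")
```

Under these assumptions, whatever gap remains after doubling the node count would have to come from tuning, clocks, and power headroom. It also matters whether the 2 EFLOPS target is a peak figure or a measured benchmark result, since the two are not directly comparable.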
 
I read somewhere ... maybe nextplatform ... that the per GPU performance numbers in Aurora were around 31TF FP64 vs the ~52TF that Intel was putting in slides as the FP64 performance of their highest performance PVC GPUs. So I don't expect the 2EF performance numbers will be reached on Aurora.

However, it appears the highest performance PVC has over double the Tensor processing performance of the chips used in Frontier... so it appears to be a good fit for Argonne's trillion parameter AI models.

Ponte Vecchio: 839 TF (FP16)
MI250X: 383 TF (FP16)
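The ratios behind these numbers, as a quick Python check. All inputs are the figures quoted in this post as recalled by the poster, not verified specs, and the variable names are just for illustration.

```python
AURORA_FP64_TF = 31.0     # per-GPU FP64 reportedly seen in Aurora (from the post)
SLIDE_FP64_TF = 52.0      # approximate per-GPU FP64 from Intel's slides (from the post)
PVC_FP16_TF = 839.0       # Ponte Vecchio FP16 figure quoted in the post
MI250X_FP16_TF = 383.0    # MI250X FP16 figure quoted in the post

# Fraction of the slide FP64 number that the deployed GPUs reportedly deliver.
fp64_fraction = AURORA_FP64_TF / SLIDE_FP64_TF

# How many times higher the quoted PVC FP16 figure is than the MI250X one.
fp16_ratio = PVC_FP16_TF / MI250X_FP16_TF

print(f"Deployed FP64 vs. slide FP64: {fp64_fraction:.0%}")
print(f"PVC FP16 vs. MI250X FP16: {fp16_ratio:.2f}x")
```

So on the quoted numbers, the deployed parts run at roughly 60% of the headline FP64 figure, while the FP16 comparison does come out at a bit more than double.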
 
I read somewhere ... maybe nextplatform ... that the per GPU performance numbers in Aurora were around 31TF FP64 vs the ~52TF that Intel was putting in slides as the FP64 performance of their highest performance PVC GPUs. So I don't expect the 2EF performance numbers will be reached on Aurora.
It probably was Next Platform, as they were the ones who reported on the limited implementation of PVC. They had estimated something like 1.3-1.4 EF sustained (so ~2 EF peak), which means that even going by that, the current run is still limited by the system not being fully installed. The question I haven't seen answered is whether they're just power-limiting PVC or whether there are physical limits on the cards themselves. Realistically it could be either one, given the frankly stupid way they designed the first iteration of PVC, where they couldn't test individual components and effectively had to toss an entire chip if there was a failure.
 
I think you just misunderstood the original post as "financial penalty" was the wording used. I think that's a fair way of looking at it since $300m due to failed contracting is a pretty severe penalty even if they're doing it in writeoff form.
My point is that there is no evidence of that money actually leaving Intel.
It can only be a fine if that money left Intel and went to the government or whoever the damaged party is there.
 
My point is that there is no evidence of that money actually leaving Intel.
It can only be a fine if that money left Intel and went to the government or whoever the damaged party is there.
Nobody but you has used the word "fine" and nobody has implied such. If Intel wrote off $300m for Aurora that means it was likely hardware/installation related and given that the other exascale supercomputers have cost ~$600m that's a reasonable assumption.
 
If Intel wrote off $300m for Aurora
How would we know that it was for Aurora?!
Intel Federal is doing multiple things.
They are doing the two-phase immersion cooling that could get them lots of money; how do we know they didn't need that money to develop that?!
Or for anything else, for that matter. Maybe they just needed to hire a few more people. It COULD be for Aurora, but it could also be for anything else.

Intel Federal (Austin, TX) will seek to adapt a two-phase immersion cooling system to spread heat more effectively. (Award amount: $1,711,416)
 
How would we know that it was for Aurora?!
Intel Federal is doing multiple things.
They are doing the two-phase immersion cooling that could get them lots of money; how do we know they didn't need that money to develop that?!
Or for anything else, for that matter. Maybe they just needed to hire a few more people. It COULD be for Aurora, but it could also be for anything else.

Intel Federal (Austin, TX) will seek to adapt a two-phase immersion cooling system to spread heat more effectively. (Award amount: $1,711,416)
Sure, we have no clue what it could possibly be for, as I'm sure Gelsinger is blowing smoke:
Timothy Prickett Morgan, The Next Platform: I'm trying to understand the new Aurora system: the original machine was supposed to be north of [an exaflop] and $500 million. Now it's two Exaflops, or in excess of two Exaflops, and you've got a $300 million write-off for federal systems coming in the fourth quarter. Is that a write-off of the original investment, or is Argonne getting the deal of the century on a two [exaflop] machine?

PG:
[Since] the original concept of Aurora, we've had some redefinitions of the timelines and the specifications associated with the project efforts. Obviously some of those earlier dates when we first started talking about the Aurora project we've moved out and changed the timelines for a variety of reasons to get there. Some of those changes lead to the write-off that we're announcing right now. The way the contract is structured, part of it is that the moment that we deliver a certain thing, we will incur some of these write-offs simply from the accounting rules associated with it. As we start delivering it, some of those will likely get reversed next year as we start ramping up the yields of the products. So some of it just ends up being how we account for and how the contracts were structured.
 