News Supergroup: Nvidia Announces Support for Arm Processors

bit_user

Polypheme
Ambassador
Nvidia's effectively making its software hardware-agnostic by letting supercomputer makers use x86 offerings from Intel and AMD or Arm processors at their discretion.
Hmmm... what about POWER? Not long ago, Nvidia had a partnership with IBM, whose latest POWER CPUs are the first (and AFAIK only) CPUs to natively implement NVLink.

"As traditional compute scaling ends, power will limit all supercomputers. The combination of Nvidia's CUDA-accelerated computing and Arm's energy-efficient CPU architecture will give the HPC community a boost to exascale."

That may sound like a bunch of gobbledygook
A bunch of gobbledygook? Are you trying to write for, like, 6th-graders? It's fine if you want to break it down, but maybe don't demean readers.

And as for breaking it down, you skipped what they meant by "traditional compute scaling" - the idea that CPUs naturally get faster and more power-efficient with each new silicon process generation.
 

bit_user

Is anyone even interested in ARM for supercomputers?
A quick search of the Top500 shows only one ARM entry, in 156th place.
You're looking backwards, while they're looking forwards.

There's been a lot of coverage of both ARM itself and others using the ARMv8-A ISA in server and hyperscale applications. So far, the list of custom server cores includes: ARM Neoverse, Amazon Graviton, Ampere (formerly AMCC) X-Gene, Qualcomm Centriq (defunct?), and Cavium (now Marvell) ThunderX2. Huawei also has a few generations of ARM-based server CPUs deployed.

So, while you weren't looking, a lot of big money has been pouring into taking ARM into servers, the cloud, and beyond.
 

bit_user

I don't know why you would want ARM and Nvidia. ARM processors are generally not going to be able to utilize the graphics to do much
Hmmm... have you checked out the graphics capabilities of modern smartphones, lately?

https://benchmarks.ul.com/3dmark-android

And they manage that on only a couple watts. So, imagine what happens when you scale up to servers packed-full of even more powerful ARM cores!
 

setx

Distinguished
Dec 10, 2014
224
149
18,760
You're looking backwards, while they're looking forwards.
ARM has been well known for "looking forwards" to the server/desktop markets for many years now. Now they've added 'exascale supercomputing' to their PR slides.

So, while you weren't looking, a lot of big money has been pouring into taking ARM into servers, the cloud, and beyond.
I don't doubt big money has been poured in for many years, but where are the results? I pointed to that lone supercomputer because that's the real thing you can see now.

Look at AMD for example: they were developing Zen uarch for CPUs with x86 and ARM ISA variants. Where is the ARM variant now? Killed.

"Big money being poured in" absolutely doesn't mean it'll succeed; take, for example, Intel and phones, or more recently Intel and 5G.
 

bit_user

I don't doubt big money has been poured in for many years, but where are the results? I pointed to that lone supercomputer because that's the real thing you can see now.
It seems you've got the sequence wrong. First, you need support for a GPU software stack. Then, you'll start to see ARM-based supercomputers. Since AMD and Intel are still squarely in the x86 camp, while Nvidia already has ARM-based SoCs, it's natural to expect the first GPU stack supported on ARM to be Nvidia's.

Look at AMD for example: they were developing Zen uarch for CPUs with x86 and ARM ISA variants. Where is the ARM variant now? Killed.
Yes, or mothballed.

People have been predicting ARM's dominance of the cloud for a while. It's difficult to get the timing right on these predictions, but the fundamentals haven't really changed, and the trends do indeed seem to be moving in that direction.

"Big money being poured in" absolutely doesn't mean it'll succeed; take, for example, Intel and phones, or more recently Intel and 5G.
You should differentiate between one company making a bid with an existing technology it already has, vs. a whole industry making investments based on what logically makes the most sense.

Part of the motivation to switch to ARM was to break Intel's hegemony (a concern now alleviated by an ascendant AMD), but the other part is that ARM is a more efficient ISA. That's one of the main reasons Intel failed to penetrate mobile: x86's fundamental inefficiencies.

Because AMD's competitiveness gave big customers better bargaining power, that might've slowed the transition to ARM. But, as the article points out, x86 is going to run out of gas (in terms of perf/W) sooner than ARM. For that reason alone, it seems virtually inevitable.

The only wrinkle might be RISC-V. However, the analysis I've read doesn't reveal any big advantages in the ISA itself, and the chip makers aren't nearly as far along with implementations of it. For now, probably the biggest threat RISC-V poses is at the low-end.
 

setx

It seems you've got the sequence wrong. First, you need support for a GPU software stack. Then, you'll start to see ARM-based supercomputers. Since AMD and Intel are still squarely in the x86 camp, while Nvidia already has ARM-based SoCs, it's natural to expect the first GPU stack supported on ARM to be Nvidia's.
It seems you don't even understand what we are talking about. Just go to the Top500 and count how many systems are mostly GPU-based and how many are pure CPU.

but the other part is that ARM is a more efficient ISA. That's one of the main reasons Intel failed to penetrate mobile: x86's fundamental inefficiencies.
Again with this mystical "ARM is a more efficient ISA". Software optimization for a specific architecture and the efficiency of the main loop are way more important than some theoretical ISA comparison. A great indicator here is non-GPU supercomputers, where energy efficiency is extremely important.
 

bit_user

It seems you don't even understand what we are talking about. Just go to the Top500 and count how many systems are mostly GPU-based and how many are pure CPU.
Okay, I'm looking at the top 20 or so, and they fall into 3 categories:
  • GPU-based (including Xeon Phi)
  • Chinese (in which case they use their own GPU-equivalent)
  • Two pure CPU (Intel)

Virtually everyone seems to recognize that general purpose CPUs are not well-suited to HPC workloads. AFAIK, the pure CPU entries aren't even targeted at the same kinds of workloads as the others.

Again with this mystical "ARM is a more efficient ISA". Software optimization for a specific architecture and the efficiency of the main loop are way more important than some theoretical ISA comparison.
Huh? You can optimize software for ARM, too.

Given an equivalent degree of optimization, the x86 ISA is moribund. Intel and AMD have done a phenomenal job of continuing to milk performance out of it, but even that has certain overheads. There's a good reason why Intel's x86 efforts weren't competitive in mobile or IoT.

Furthermore, Intel's discontinuation of Xeon Phi and launch of its dGPU products is an acknowledgement that x86 isn't well-suited to the heavy lifting needed in HPC.
 

setx

Okay, I'm looking at the top 20 or so and they fall in 3 categories:
First of all, why are you looking at the top 20? Is there any ARM system there?
Look at the level that ARM can actually achieve: the whole Top500.

  • Chinese (in which case they use their own GPU-equivalent)
And from where did you pull that the Chinese systems are GPU-based? Where are the GPUs with the same or equivalent processors? You really like to make baseless conclusions...

Huh? You can optimize software for ARM, too.
But do I want to do that? No. And many, many other programmers feel the same, so far.

Given an equivalent degree of optimization...
Go read about the Itanium and how easy it is to get this "equivalent degree of optimization" in the real world.
 

bit_user

First of all, why are you looking at the top 20?
Because the further down the list you go, the older they tend to be. If you want to look at where things are headed, the top of the list should be the best indicator.

Plus, I'll be honest, you seem a lot more invested in this debate than I am. So, I'm not going to sift through the entire 500.

I do wonder why you seem to care quite so much...

Is there any ARM system there?
Look at the level that ARM can actually achieve: the whole Top500.
Again, the problem with this approach is that you're looking backwards, instead of forwards.

And from where did you pull that the Chinese systems are GPU-based? Where are the GPUs with the same or equivalent processors? You really like to make baseless conclusions...
Okay, let's look at the specifics.

#3: The Sunway SW26010 260C basically has 4 general-purpose cores and 256 GPU-like cores. To be more specific, the architecture is very much like IBM's Cell.

#4: The Matrix-2000 is a 128-core part, designed as a replacement for 1st-gen Xeon Phi processors. Like 1st-gen Xeon Phi, it uses in-order cores with big SIMD units bolted on. Furthermore, these are paired with conventional Xeon processors for control.

But do I want to do that? No. And many, many other programmers feel the same, so far.
A lot of libraries are already optimized for ARM. Plus, with GPUs doing the heavy lifting, the burden on the host processor is much lower. Finally, there's the question of cost, which breaks down into two parts: purchase price and operating costs.
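To illustrate that first point (a trivial sketch of my own, not anything from the article): NumPy is one such library. It dispatches to architecture-tuned SIMD/BLAS kernels at runtime, so the identical script gets optimized code whether the host CPU is x86 or ARM, with zero porting effort from the programmer.

```python
# Same high-level code runs architecture-tuned kernels on x86 or ARM,
# because NumPy and its BLAS backend pick SIMD paths at runtime.
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.ones_like(a)

# The dot product below dispatches to an optimized kernel (e.g. AVX on
# x86, NEON/SVE on ARM) with no per-ISA source changes.
total = float(a @ b)
print(total)  # sum of 0..999999 = 499999500000.0
```

This is exactly the situation in HPC, where most of the heavy numeric work already lives in tuned libraries (BLAS, FFT, MPI) rather than in hand-optimized application code.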

Because ARM's ISA is smaller and easier to decode, the cores can also be smaller. You don't need as much register-renaming logic, since the ISA's register file is bigger to begin with. You also don't need SMT to milk as much efficiency out of the cores, because it's easy to just add more of them. All of this makes the silicon smaller and cheaper to buy. The engineering costs are potentially lower, too, especially if you just take ARM's pre-designed cores, which they're now even tailoring for server/HPC workloads ( https://www.arm.com/company/news/2018/10/announcing-arm-neoverse ).

For many of these same reasons, ARM can also offer better energy efficiency. Since the energy costs (both for computation and cooling) of HPC are so large, this is not a minor point.
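To put rough numbers on why that matters (a back-of-envelope sketch; the power draw and utility rate below are illustrative assumptions of mine, not figures from the article or the Top500):

```python
# Back-of-envelope: annual electricity cost of an HPC system.
# All figures are illustrative assumptions, not real data.
POWER_MW = 15.0          # assumed total draw, compute + cooling
RATE_PER_KWH = 0.10      # assumed utility rate, USD per kWh
HOURS_PER_YEAR = 24 * 365

annual_cost = POWER_MW * 1000 * HOURS_PER_YEAR * RATE_PER_KWH
print(f"${annual_cost:,.0f} per year")  # $13,140,000 per year

# Even a 10% perf/W advantage at constant throughput saves:
print(f"${annual_cost * 0.10:,.0f} per year")  # $1,314,000 per year
```

At that scale, a single-digit percentage gain in perf/W pays for a lot of software tuning.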

Given the costs involved, a bit of tuning and customization is often worthwhile.

Go read about the Itanium and how easy it is to get this "equivalent degree of optimization" in the real world.
Wow, that's sure a non sequitur.

We're not talking about Itanium, in case you hadn't noticed. ARM is not VLIW, and HPC would tend to use out-of-order ARM cores, of which there are plenty.

ARM didn't come out of nowhere; it's not a new kid on the block, nor is it particularly exotic or difficult to program or optimize for. IMO, the Itanium comparison is baseless, at best.

The weird thing about this whole issue of optimization is that there are plenty of non-x86 architectures on the list: POWER, SPARC, and odd-ball Chinese stuff, just in the top 100. HPC wasn't always x86-dominated, yet you seem to think it must remain so forever. Moreover, I don't get why you're singling out ARM as deserving of such scorn. From what little I know of SPARC, I'd say ARM is a far more deserving HPC platform.