Japanese ARM-Based Supercomputer Fugaku Is Now World's Most Powerful

Now I see guys claiming ARM is more powerful than x86 or P9.
Summit has 202,752 cores at only 13 MW. Fugaku has 7,299,072 cores at 28 MW. Fugaku is 2.8 times more powerful but has 36 times more cores!
Also, Summit has 27,648 Nvidia V100s, each at about 300 W!
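Worth running the arithmetic on those claims. A quick sketch (the ~415.5 and ~148.6 PFLOPS Rmax figures are my assumption, taken from the June 2020 Top500 list; the power and core figures are the ones quoted above):

```python
# Sanity-check the ratios quoted above. Rmax numbers (~415.5 and ~148.6
# PFLOPS) are an assumption from the June 2020 Top500 list, not this post.
fugaku_pflops, summit_pflops = 415.5, 148.6
fugaku_cores, summit_cores = 7_299_072, 202_752
fugaku_mw, summit_mw = 28, 13

print(f"performance ratio: {fugaku_pflops / summit_pflops:.1f}x")  # ~2.8x
print(f"core-count ratio:  {fugaku_cores / summit_cores:.0f}x")    # 36x
print(f"PFLOPS per MW, Fugaku: {fugaku_pflops / fugaku_mw:.1f}")
print(f"PFLOPS per MW, Summit: {summit_pflops / summit_mw:.1f}")
```

Interestingly, by these numbers Fugaku actually delivers *more* PFLOPS per megawatt than Summit, huge core count notwithstanding.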
 

JarredWaltonGPU

Senior GPU Editor
Editor
Now I see guys claiming ARM is more powerful than x86 or P9.
Summit has 202,752 cores at only 13 MW. Fugaku has 7,299,072 cores at 28 MW. Fugaku is 2.8 times more powerful but has 36 times more cores!
Also, Summit has 27,648 Nvidia V100s, each at about 300 W!
It's worth noting that Top500 counts Nvidia SMs (in GPUs) as one "core" each. Fugaku has no GPUs but lots of CPUs. Summit has far fewer CPUs, but it also has six V100 GPUs per 2 Power9 CPUs, and each GPU counts as 80 'cores' -- so it still has 2,414,592 cores total, by Top500 metrics where 1 Nvidia SM = 1 core, 1 AMD CU = 1 core, and 1 CPU = 1 core.
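For anyone who wants to check that figure, here's the arithmetic as a sketch; the node counts (4,608 nodes, two 22-core Power9s and six V100s per node) are from Summit's publicly listed specs, not from this thread:

```python
# Reproducing Summit's Top500 core count under the convention described
# above (1 CPU core = 1 core, 1 Nvidia SM = 1 core). Node counts are from
# public Summit specs and are an assumption on my part.
nodes = 4_608
cpu_cores_per_node = 2 * 22   # two 22-core Power9 CPUs per node
gpu_sms_per_node = 6 * 80     # six V100 GPUs per node, 80 SMs each

total = nodes * (cpu_cores_per_node + gpu_sms_per_node)
print(total)  # 2414592
```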
 
more powerful yes , but at a cost .... how many cores again ? there should be performance/cores comparisons
What do you mean by "at a cost"? Cost is not the primary consideration here. The first real supercomputers required liquid-nitrogen cooling. Who puts that kind of money into a computer building's cooling requirements?

It's No. 1 in compute. Is it extravagant? Yes, but who cares!
No. 1.

Intel used to make CPUs you could cook an egg on once upon a time.

Good effort I say.
 

bit_user

Splendid
Ambassador
The Question...
Can it run Crysis?
No. Or, maybe in an emulator and badly.

It's not based on GPUs, so the graphics rendering backend would be running on CPU cores.

As for the main game logic, that would have to run in an x86 emulator.

Even so, I'd imagine it would be practically limited to running on just one 48-core chip. So, not even worth thinking about.
 

bit_user

Splendid
Ambassador
more powerful yes , but at a cost ....
Well, it uses a fully-custom CPU design, so that's going to skew costs by a lot.

For Japan, having their own homegrown HPC is surely a matter of strategic importance. So, they probably don't mind subsidizing it.

how many cores again ? there should be performance/cores comparisons
Top500 has more details.
 

bit_user

Splendid
Ambassador
That question is archaic. The new question, (you heard it here first), is: 'Can it be Crysis'?
AI Learns to be PacMan
My favorite part about that:

the AI network that generated the 50,000 Pac-Man games for training is actually really good at Pac-Man, so it rarely died. That caused GameGAN to not fully comprehend that a normal ghost can catch Pac-Man and kill it. At one point, the network would 'cheat' and turn a ghost purple when it reached Pac-Man, or allow the ghost to pass through Pacman with no ill effect, or other anomalous behavior. Additional training is helping to eliminate this.

...and we're talking Pac-Man, here. So, good luck with it learning to plausibly simulate anything much more complex.

Also:
The GameGAN version of Pac-Man also targets a low output resolution of only 128 x 128 pixels right now. That's an even lower resolution than the original arcade game (224 x 288).

...far from playing at 4K - I hope you don't mind squinting.

Suffice to say, I don't expect "being Crysis" is going to be a thing, anytime soon.

Maybe someone will create a far more sophisticated model that's specifically designed for 3D game simulation and is a lot easier to train, but that starts to feel more like programming and less like machine learning.
 
Suffice to say, I don't expect "being Crysis" is going to be a thing, anytime soon.

Maybe someone will create a far more sophisticated model that's specifically designed for 3D game simulation and is a lot easier to train, but that starts to feel more like programming and less like machine learning.
Deep learning at this point is basically throwing digital monkeys at the problem until they get something resembling the desired result.
It's popular because you can basically just supply the data and the desired result without much domain expertise, but it's horribly computationally inefficient at what it does.
 
Now I see guys claiming ARM is more powerful than x86 or P9.
Summit has 202,752 cores at only 13 MW. Fugaku has 7,299,072 cores at 28 MW. Fugaku is 2.8 times more powerful but has 36 times more cores!
Also, Summit has 27,648 Nvidia V100s, each at about 300 W!
You obviously don’t understand the problems of scaling them because the arm can scale much better than x86 with more cores
 
It's worth noting that Top500 counts Nvidia SMs (in GPUs) as one "core" each. Fugaku has no GPUs but lots of CPUs. Summit has far fewer CPUs, but it also has six V100 GPUs per 2 Power9 CPUs, and each GPU counts as 80 'cores' -- so it still has 2,414,592 cores total, by Top500 metrics where 1 Nvidia SM = 1 core, 1 AMD CU = 1 core, and 1 CPU = 1 core.
I'm not sure how Linpack and other supercomputer benchmarks treat GPU cores, or whether they use them at all. GPUs are very limited compared to CPUs. This is becoming problematic because such specialized hardware is much faster than any CPU.
For example, Intel's AI hardware is, of course, much faster than AMD's CPUs, and it's not correct to compare them.
 

bit_user

Splendid
Ambassador
Deep learning at this point is basically throwing digital monkeys at the problem until they get something resembling the desired result.
It's popular because you can basically just give the data and the desired result without much domain expertise but it's horribly computationally inefficient at what it does.
I think you're mostly right, but it's worth noting that when domain-expertise is combined with deep learning, the result can sometimes far surpass what was possible with conventional techniques.

In other words, if I know specifically what are the hard parts of a problem, how to pre-process the input, as well as approximately how many layers of which types, I'm more likely to get away with using a far simpler network, which is much easier to train and will have fewer errors (due to less chance of over-fitting).
 

JarredWaltonGPU

Senior GPU Editor
Editor
I'm not sure how Linpack and other supercomputer benchmarks treat GPU cores, or whether they use them at all. GPUs are very limited compared to CPUs. This is becoming problematic because such specialized hardware is much faster than any CPU.
For example, Intel's AI hardware is, of course, much faster than AMD's CPUs, and it's not correct to compare them.
I'd really love for an expert with experience coding / using a variety of supercomputers to chime in.

Theoretically, if a problem involves a lot of math, it doesn't matter too much that you have a ton of GPU cores doing the math. GPUs are great at multiplication, division, and other FP math. Logic is a weak point of GPUs, but many of the HPC workloads that supercomputers run are very much math dependent. It's a big part of getting them to scale to thousands and even millions of 'cores' -- you can't reasonably scale branchy, logic-driven code well. At least, that's my experience (which is woefully out of date -- I was a Computer Science graduate, back in the early 00s.)

In short, given that Top500 uses Linpack and has a LOT of GPU-based supercomputers, it's clear the industry at least feels GPUs are useful. A lot of the custom chips are also focusing on matrix operations (e.g., Tensor cores), which suggests they don't often need complex CPU-style execution resources and are perfectly happy with simplified GPU-style (or even less than that) programmability.
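For context on what Linpack actually measures, here's a toy single-machine sketch of the HPL workload: solving a dense system Ax = b via Gaussian elimination, scored at roughly (2/3)n^3 + 2n^2 floating-point operations. Pure Python on one core, so the number it prints is comically low; the point is the shape of the workload -- dense, regular FP math with almost no branching, exactly what GPUs are built for:

```python
import random
import time

def solve(A, b):
    """Gaussian elimination with partial pivoting; returns x with Ax = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix copy
    for col in range(n):
        # pick the largest pivot in this column for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

n = 120
random.seed(0)
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
b = [random.gauss(0, 1) for _ in range(n)]

t0 = time.perf_counter()
x = solve(A, b)
elapsed = time.perf_counter() - t0

flops = (2 / 3) * n**3 + 2 * n**2  # HPL's nominal operation count
print(f"~{flops / elapsed / 1e6:.2f} MFLOPS (pure Python, single core)")
```

Real HPL distributes the panel factorization across thousands of nodes and GPUs, but the kernel is this same dense elimination.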
 

bit_user

Splendid
Ambassador
Theoretically, if a problem involves a lot of math, it doesn't matter too much that you have a ton of GPU cores doing the math. GPUs are great at multiplication, division, and other FP math.
...if it's SIMD-friendly math. But graph analytics is a burgeoning field and not something traditional GPUs are good at. I think Ampere is supposed to be better, but I haven't delved into the nitty gritty or seen any benchmarks.

Logic is a weak point of GPUs,
I guess you mean branch-dense code? Funny enough, GPUs are awesome at branching, as long as you're talking about real branches - not predicated code, where you have to follow both paths.
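A toy cost model makes that distinction concrete. This is my own simplification of SIMT behavior, not any vendor's documented model: when all 32 lanes of a warp agree on a branch, it's a cheap real jump; when they disagree, the warp effectively executes both sides under predication:

```python
# Toy SIMT divergence model: a warp of 32 lanes either takes a real branch
# (all lanes agree) or pays for both paths (lanes disagree -> predication).
def warp_cost(lane_takes_branch, cost_if, cost_else):
    if all(lane_takes_branch):
        return cost_if                 # uniform branch: one path, cheap
    if not any(lane_takes_branch):
        return cost_else
    return cost_if + cost_else         # divergent: both sides executed

uniform = warp_cost([True] * 32, cost_if=10, cost_else=50)
divergent = warp_cost([i % 2 == 0 for i in range(32)], 10, 50)
print(uniform, divergent)  # 10 60
```

So "branchy" code is only a problem when neighboring lanes disagree; uniform branches across a warp are basically free.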

you can't reasonably scale branchy, logic-driven code well.
Depends on the granularity of your data-dependencies. For instance, it's not hard to scale a compiler to multiple threads, as long as you have enough compilation units to work on, in parallel. Once you get down to parallelizing processing of finer-grained structures, the synchronization overhead can swamp any potential gains from deploying more cores on the problem.
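To illustrate the granularity point, here's a sketch of the coarse-grained case; `compile_unit` is a made-up stand-in, not a real compiler API. Because each unit is independent, the workers never need to synchronize, which is why this kind of parallelism scales cleanly:

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up "compiler": each compilation unit is a pure, independent
# transformation, so workers share nothing and hold no locks.
def compile_unit(source: str) -> str:
    return source.upper()  # stand-in for real compilation work

units = [f"unit_{i}.c" for i in range(32)]

with ThreadPoolExecutor(max_workers=8) as pool:
    objects = list(pool.map(compile_unit, units))  # no shared state

print(objects[0])  # UNIT_0.C
```

Parallelizing *within* one unit (say, one AST) is the fine-grained case: threads would have to synchronize on shared nodes, and that overhead can easily swallow the gains.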
 
I think you're mostly right, but it's worth noting that when domain-expertise is combined with deep learning, the result can sometimes far surpass what was possible with conventional techniques.

In other words, if I know specifically what are the hard parts of a problem, how to pre-process the input, as well as approximately how many layers of which types, I'm more likely to get away with using a far simpler network, which is much easier to train and will have fewer errors (due to less chance of over-fitting).
Of course, it works best when the data is properly treated beforehand; the problem is that most wannabe data scientists watch Andrew Ng's AI for Everyone, misinterpret it, and just throw more hidden layers at the problem.
Kaggle (although by far not a completely accurate representation of reality) shows pretty well which algorithms tend to work best with certain kinds of data, nicely demonstrating the no-free-lunch theorem.

Heck, 10 years ago Microsoft was doing computer vision with the Kinect, powered by humble random forests...
 

bit_user

Splendid
Ambassador
most wannabe data scientists watch Andrew Ng's AI for Everyone, misinterpret it, and just throw more hidden layers at the problem.
The amount of training data must be matched to the size of the network. You can get away with throwing more hidden layers at the problem, if you have enough data (and compute power) to adequately train it. However, you'll still end up with a network that's more costly for inference.
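A back-of-the-envelope helper makes the cost side concrete. The layer sizes below are arbitrary examples, and the formula is just the standard weights-plus-biases count for a fully connected network:

```python
# Parameter count for a fully connected (MLP) network: each layer has
# fan_in * fan_out weights plus fan_out biases. More parameters means more
# training data to fit them and more work per inference.
def mlp_params(layer_sizes):
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))

small = mlp_params([128, 64, 10])        # modest network
big = mlp_params([128, 512, 512, 10])    # "just add hidden layers"
print(small, big)  # 8906 333834
```

Roughly a 37x jump in parameters for the "bigger is better" version, and every one of those parameters has to be paid for at both training and inference time.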

I haven't followed Dr. Ng in quite a while, but I recall him preaching the gospel of large networks plus truly massive amounts of training data. It should then come as no surprise that his lab was an early partner of Nvidia in developing their multi-GPU training systems. (I don't mean to imply a conflict of interest, as he was saying that even before.)

Heck, 10 years ago Microsoft was doing computer vision with the Kinect, powered by humble random forests...
A lot of real-world problems involve some sort of machine learning techniques to find and fit patterns. It doesn't have to be deep learning, but that has caught on due to its flexibility and power.
 
