News Yes, you can have too many CPU cores - Ampere's 192-core chips break ARM64 Linux kernel in two-socket systems, company requests higher core count s...

I am kind of surprised, considering what @kanewolf said. Also, @kanewolf, why do you think this is surprising? My surprise is that, since HPC has been on Linux for such a long time, nobody had thought to increase this limit before.
 
I am kind of surprised, considering what @kanewolf said. Also, @kanewolf, why do you think this is surprising? My surprise is that, since HPC has been on Linux for such a long time, nobody had thought to increase this limit before.
Until recently, 256 cores was not an issue for most hosts. The 1U or 2U server chassis that make up 99.999% of Linux 2-socket hosts couldn't come close to 256 cores. Only specialized hosts, generally used in HPC, pushed past 256 cores, and those specialized hosts could have vendor-specific kernels maintained for them. SGI supported 2048 sockets 10 years ago, but they had a custom Linux to support those extreme configs.
I am not sure if the proposed kernel changes will be approved. This is still a very small market for such a major change.
 
Until recently, 256 cores was not an issue for most hosts. The 1U or 2U server chassis that make up 99.999% of Linux 2-socket hosts couldn't come close to 256 cores.
Intel typically offers up to 8-socket scalability on select SKUs, at the top end of their Xeon product line. At least, that's what they support for cache-coherent configurations without additional glue logic. With 8x of the top-spec Sapphire Rapids CPUs, you can reach 480 cores.


[Image: Intel CPU spec listing]

Note where it says: "Scalability: S8S"
 
Intel typically offers up to 8-socket scalability on select SKUs, at the top end of their Xeon product line. At least, that's what they support for cache-coherent configurations without additional glue logic. With 8x of the top-spec Sapphire Rapids CPUs, you can reach 480 cores.

Note where it says: "Scalability: S8S"
Yes, but that is not a 2S config, and not a 1U or 2U host. The vast majority of datacenter hosts are 2S blades or 2S 1U or 2U hosts. Any company selling greater than 2S hosts has a tailored OS to support their hardware.
 
You guys are missing the point. HPC has a very different set of requirements than traditional micro-serviced applications, which also run in containers.

An always-on web app running via K8s is likely well-defined and is made up of many containers, volumes, a secret manager, etc.

An HPC container tends to be a big ol' fatty, containing everything needed.

These large compute- (and memory-) heavy HPC containers just aren't very "distributable" across large numbers of nodes, the way regular K8s apps are.

So, a company produces chips like these to PUSH the boundaries, so they can be the best at offering super-high core counts for even better HPC support.

The kernel needs this patch, no biggie.

Linux isn't behind the times or ahead of them. It's adapting to modern needs.


Now, if only HPC architects could figure out how to properly 'Kube some workloads!

😉
 
Yes, but that is not a 2S config, and not a 1U or 2U host. The vast majority of datacenter hosts are 2S blades or 2S 1U or 2U hosts. Any company selling greater than 2S hosts has a tailored OS to support their hardware.
While this is certainly true, until SPR there were no x86 systems that hit the 256-core mark: AMD has been exclusively 2S, ICL was 2S, and SKL 8S peaked at 28c (8 × 28 = 224 cores).
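For a concrete tally, here's a quick sketch of that arithmetic; the flagship core counts below are my own assumptions for illustration, not figures from the thread:

```c
/* Rough sanity check of the "no x86 system hit 256 cores until SPR" claim.
 * The per-generation flagship core counts are assumptions for illustration. */
#include <stdio.h>

struct platform {
    const char *name;
    int sockets;        /* max cache-coherent sockets */
    int cores_per_cpu;  /* assumed flagship core count */
};

int main(void)
{
    struct platform p[] = {
        { "Skylake-SP (8S)",      8, 28 },  /* 224 cores */
        { "Ice Lake-SP (2S)",     2, 40 },  /*  80 cores */
        { "AMD EPYC Milan (2S)",  2, 64 },  /* 128 cores */
        { "Sapphire Rapids (8S)", 8, 60 },  /* 480 cores */
    };

    for (unsigned i = 0; i < sizeof p / sizeof p[0]; i++)
        printf("%-22s %d x %2d = %3d cores\n",
               p[i].name, p[i].sockets, p[i].cores_per_cpu,
               p[i].sockets * p[i].cores_per_cpu);
    return 0;
}
```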
 
So, a company produces chips like these to PUSH the boundaries, so they can be the best at offering super-high core counts for even better HPC support.
AmpereOne is certainly not aiming at HPC. It lacks SVE and has only 2x 128-bit NEON SIMD. Compare that to AMD's Zen 4, which has 6x 256-bit issue ports (although only two of them are multiply-capable), or Intel's Golden Cove (similar width to Zen 4, but with 4 multiply-capable ports).

ARM's Neoverse V1 core is an example of an HPC-oriented ARM core. It has 2x 256-bit SVE. Fujitsu's A64FX is even wider, with 2x 512-bit SVE.

Furthermore, Ampere itself describes AmpereOne as a "Cloud Native" CPU and doesn't mention HPC anywhere:

 
I am surprised at the ignorance of the writer. The USA has had a supercomputer chip with 30,000 cores for a number of years. The chip was classified for military use, but the manufacturer fought a lengthy battle to undo that so it could be used for civilian purposes, such as super-fast DNA sequencing for cancer treatment and, lately, AI. You cannot buy this chip, as the manufacturer is extremely selective so they don't go to jail for violating export restrictions.
It is produced by Venray Technology in Dallas. These are the same guys that invented CPU clock timing.
 
The USA has had a supercomputer chip with 30,000 cores for a number of years.
Those must've been awfully simple cores, then. Do you mean like GPU "cores", the way Nvidia uses the term?

About 35 years ago, a company called Thinking Machines made the CM-2 - a machine with 65536 "CPUs", but they were almost as simple as you could possibly imagine. They each processed only a single bit of information. The most common way to use it was via software that glued them together and made them behave like 2048 32-bit CPUs. Of course, performance would've been a lot better if you'd just started by using actual 32-bit CPUs. That's what they eventually did, in the CM-5.

My point is, it matters precisely how you define a core. Nvidia's "cores" don't behave like general purpose CPU cores, and I'd wager neither do the "cores" in that 30k processor you mentioned. In contrast, the cores in these AmpereOne CPUs are truly general-purpose ARMv8.6-A CPU cores.
 
I'm also reminded of this:

It's a video processing chipset, based on some technology developed for military applications by a company called Teranex. It has 3072 processing elements and was used in an HDMI video processor I bought off eBay about 10 years ago. Even then, it was already long out of production.


When it was first released (2005), even GPUs didn't have that many "cores". They were still in the hundreds.
 
Those must've been awfully simple cores, then. Do you mean like GPU "cores", the way Nvidia uses the term?

About 35 years ago, a company called Thinking Machines made the CM-2 - a machine with 65536 "CPUs", but they were almost as simple as you could possibly imagine. They each processed only a single bit of information. The most common way to use it was via software that glued them together and made them behave like 2048 32-bit CPUs. Of course, performance would've been a lot better if you'd just started by using actual 32-bit CPUs. That's what they eventually did, in the CM-5.

My point is, it matters precisely how you define a core. Nvidia's "cores" don't behave like general purpose CPU cores, and I'd wager neither do the "cores" in that 30k processor you mentioned. In contrast, the cores in these AmpereOne CPUs are truly general-purpose ARMv8.6-A CPU cores.
I have spoken with the designer, and I would say you need to get more details from them. I think the chip is halfway between a CPU and a GPU. Because of the military applications, details are skimpy. I know the military wanted it for the brains of missiles, and the designer wanted it used for good. Hence it is currently used for super-fast DNA sequencing in cancer genomics. Massively parallel compute in real time. This is the type of chip AGI needs and, from my understanding, where they are going.
 
Those must've been awfully simple cores, then. Do you mean like GPU "cores", the way Nvidia uses the term?

About 35 years ago, a company called Thinking Machines made the CM-2 - a machine with 65536 "CPUs", but they were almost as simple as you could possibly imagine. They each processed only a single bit of information. The most common way to use it was via software that glued them together and made them behave like 2048 32-bit CPUs. Of course, performance would've been a lot better if you'd just started by using actual 32-bit CPUs. That's what they eventually did, in the CM-5.

My point is, it matters precisely how you define a core. Nvidia's "cores" don't behave like general purpose CPU cores, and I'd wager neither do the "cores" in that 30k processor you mentioned. In contrast, the cores in these AmpereOne CPUs are truly general-purpose ARMv8.6-A CPU cores.


I stand corrected after some research. It does seem that they are being positioned as just lower-cost x86 replacements.

 
I saw this coming a long time ago... maybe in the future AMD will have 192 cores / 384 threads, or two CPUs with 768 threads. More powa baby... will need fiber to storage and the RAM needed to feed all these cores lol
 
I stand corrected after some research. It does seem that they are being positioned as just lower-cost x86 replacements.

Altra is the previous generation. They maxed out at 128 cores. The new one is called AmpereOne and goes up to 192 cores.
 
This reminds me of the famous quote from none other than Bill Gates himself, "Who's going to need more than 640K of memory?"

History repeating itself all over again.

This kind of thinking on the part of the Linux kernel maintainers is typical of the general "we don't need it yet" mindset.
 
This reminds me of the famous quote from none other than Bill Gates himself, "Who's going to need more than 640K of memory?"
The actual quote is supposedly:

"640K ought to be enough for anybody."

However, he apparently denies ever saying it:


History repeating itself all over again.
There are some key details about that 640k quote and about this situation that you seem to be missing. So, please allow me to spell it out for you.

Regarding the "640k" quote, I'd heard this occurred when they were doing the memory layout for MS-DOS. A key point is that the CPU had already been designed and had a hard limit of just over 1 MB, due to the way addressing worked on the 8086. So, what they were deciding was how much of that address range would be available for normal programs and data. The other areas were reserved for the BIOS and memory-mapped devices.

[Image: MS-DOS memory map]

Source: https://www.slideserve.com/raleigh/lecture-19-16-bit-ms-dos-programming

So, no matter what they had decided, there's no way they could've allowed even up to 1 MB. Looking back, 640 kB sounds ridiculously small - but, when you put it in context, it was still the majority of the addressable space - and PCs of that era usually shipped with far less RAM, because it was expensive.

Given the design of the 8086 ISA, there was no way around having some limit below 1 MB. They would've known that fundamental CPU changes would be required to break the 1 MB barrier, at which point they probably assumed you'd just design a new operating system. And that's actually what happened, since Windows was easily able to surpass the 640k limit, once CPUs like the 80286 and 80386 launched. For pure DOS programs, there were so-called "DOS Extenders" that allowed DOS programs to access memory above 1 MB, after jumping through some hoops.
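To put a number on the "just over 1 MB" point, here's a small standalone sketch of the real-mode segment:offset arithmetic (my own illustration, not anything from the thread):

```c
/* Worked example of real-mode 8086 segment:offset addressing.
 * physical = segment * 16 + offset, so the largest address a program can
 * form is 0xFFFF:0xFFFF = 0x10FFEF -- just under 64 KB past the 1 MB mark.
 * (On an actual 8086, with only 20 address lines, those top addresses wrap
 * back to low memory; later CPUs exposed them as the HMA.) */
#include <stdio.h>
#include <stdint.h>

static uint32_t phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;  /* shift left 4 = multiply by 16 */
}

int main(void)
{
    uint32_t max = phys(0xFFFF, 0xFFFF);
    printf("highest segment:offset address = 0x%05X (%u bytes past 1 MiB)\n",
           max, max - (1u << 20));
    return 0;
}
```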

This kind of thinking on the part of the Linux kernel maintainers is typical of the general "we don't need it yet" mindset.
The kernel maintainers left open the door to adding it when needed. They didn't say "never", just "not yet".

The reason they said "not yet" is that each supported CPU core carries a finite resource cost in the size of kernel data structures. Essentially, it bumps into some minor scalability problems in the kernel. Thus, increasing the limit isn't free, so it makes sense not to do it prematurely.
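As a rough illustration of that cost, here's a standalone sketch (not actual kernel source) of how a compile-time CPU limit, in the spirit of CONFIG_NR_CPUS, turns into per-CPU bitmap storage that gets embedded throughout kernel data structures:

```c
/* Standalone sketch (not kernel code) of how a compile-time CPU limit,
 * in the spirit of CONFIG_NR_CPUS, becomes fixed bitmap storage that is
 * embedded in many kernel data structures. */
#include <stdio.h>
#include <limits.h>

#define NR_CPUS 512   /* hypothetical raised limit */

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct cpumask {
    unsigned long bits[BITS_TO_LONGS(NR_CPUS)];  /* one bit per possible CPU */
};

int main(void)
{
    /* Each mask costs roughly NR_CPUS/8 bytes; structures that embed
     * several of them all grow when the limit is raised. */
    printf("NR_CPUS = %d -> sizeof(struct cpumask) = %zu bytes\n",
           NR_CPUS, sizeof(struct cpumask));
    return 0;
}
```

At a 256-CPU limit each mask is 32 bytes; doubling the limit to 512 doubles that for every embedded mask, which is the kind of incremental cost the maintainers were weighing.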

However, it's easily done when there is a reason to do it. In that sense, it's definitely not like the MS-DOS case, where it basically required a bunch of OS-level + application code changes to go past 640k - although DOS' limit was really just due to the primitive nature of CPUs at the time.
 