News Yes, you can have too many CPU cores - Ampere's 192-core chips break ARM64 Linux kernel in two-socket systems, company requests higher core count s...

I am kind of surprised, considering what @kanewolf said. Also, @kanewolf, why do you think this is surprising? My surprise is that, since HPC has been on Linux for such a long time, nobody had thought to increase this limit before.
 
I am kind of surprised, considering what @kanewolf said. Also, @kanewolf, why do you think this is surprising? My surprise is that, since HPC has been on Linux for such a long time, nobody had thought to increase this limit before.
Until recently, 256 cores was not an issue for most hosts. The 1U or 2U server chassis that make up 99.999% of Linux 2-socket hosts couldn't come close to 256 cores. Only specialized hosts, generally used in HPC, pushed past 256 cores, and those specialized hosts could have vendor-specific kernels maintained for them. SGI supported 2048 sockets 10 years ago, but they had a custom Linux to support those extreme configs.
I am not sure if the proposed kernel changes will be approved. This is still a very small market for such a major change.
 
Until recently, 256 cores was not an issue for most hosts. The 1U or 2U server chassis that make up 99.999% of Linux 2-socket hosts couldn't come close to 256 cores.
Intel typically offers up to 8-socket scalability on select SKUs, at the top end of their Xeon product line. At least, that's what they support for cache-coherent configurations without additional glue logic. With 8x of the top-spec Sapphire Rapids CPUs, you can reach 480 cores.


[Image: Intel CPU spec listing]

Note where it says: "Scalability: S8S"
 
Intel typically offers up to 8-socket scalability on select SKUs, at the top end of their Xeon product line. At least, that's what they support for cache-coherent configurations without additional glue logic. With 8x of the top-spec Sapphire Rapids CPUs, you can reach 480 cores.

Note where it says: "Scalability: S8S"
Yes, but that is not a 2S config, and not a 1U or 2U host. The vast majority of datacenter hosts are 2S blades or 2S 1U or 2U hosts. Any company selling greater than 2S hosts has a tailored OS to support their hardware.
 
You guys are missing the point. HPC has a very different set of requirements than traditional micro-serviced applications, which also run in containers.

An always-on web app running via K8s is likely well-defined and is made up of many containers, volumes, a secret manager, etc.

An HPC container tends to be a big ol' fatty, containing everything needed.

These large compute- (and memory-) heavy HPC containers just aren't very "distributable" across large numbers of nodes, the way regular K8s apps are.

So, a company produces chips like these to PUSH the boundaries, so they can be the best at offering super-high core counts for even better HPC support.

The kernel needs this patch, no biggie.

Linux isn't behind the times or ahead of them. It's adapting to modern needs.


Now, if only HPC architects could figure out how to properly 'Kube some workloads!

😉
 
Yes, but that is not a 2S config, and not a 1U or 2U host. The vast majority of datacenter hosts are 2S blades or 2S 1U or 2U hosts. Any company selling greater than 2S hosts has a tailored OS to support their hardware.
While this is certainly true, until SPR there were no x86 systems that hit the 256-core mark: AMD has been exclusively 2S, ICL was 2S, and SKL 8S peaked at 28c (8 × 28 = 224 cores).
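For a concrete tally, here's a quick sketch of that arithmetic; the flagship core counts below are my own assumptions for illustration, not figures from the thread:

```c
/* Rough sanity check of the "no x86 system hit 256 cores until SPR" claim.
 * The per-generation flagship core counts are assumptions for illustration. */
#include <stdio.h>

struct platform {
    const char *name;
    int sockets;        /* max cache-coherent sockets */
    int cores_per_cpu;  /* assumed flagship core count */
};

int main(void)
{
    struct platform p[] = {
        { "Skylake-SP (8S)",      8, 28 },  /* 224 cores */
        { "Ice Lake-SP (2S)",     2, 40 },  /*  80 cores */
        { "AMD EPYC Milan (2S)",  2, 64 },  /* 128 cores */
        { "Sapphire Rapids (8S)", 8, 60 },  /* 480 cores */
    };

    for (unsigned i = 0; i < sizeof p / sizeof p[0]; i++)
        printf("%-22s %d x %2d = %3d cores\n",
               p[i].name, p[i].sockets, p[i].cores_per_cpu,
               p[i].sockets * p[i].cores_per_cpu);
    return 0;
}
```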
 
So, a company produces chips like these to PUSH the boundaries, so they can be the best at offering super-high core counts for even better HPC support.
AmpereOne is certainly not aiming at HPC. It lacks SVE and has only 2x 128-bit NEON SIMD. Compare that to AMD's Zen 4, which has 6x 256-bit issue ports (although only two of them are multiply-capable), or Intel's Golden Cove (similar width to Zen 4, but with 4 multiply-capable ports).

ARM's Neoverse V1 core is an example of an HPC-oriented ARM core. It has 2x 256-bit SVE. Fujitsu's A64FX is even wider, with 2x 512-bit SVE.

Furthermore, Ampere itself describes AmpereOne as a "Cloud Native" CPU and doesn't mention HPC anywhere:

 
I am surprised at the ignorance of the writer. The USA has had a supercomputer chip with 30,000 cores for a number of years. The chip was classified for military use, but the manufacturer fought a lengthy battle to undo that so it could be used for civilian purposes, such as super-fast DNA sequencing for cancer treatment and, lately, AI. You cannot buy this chip, as the manufacturer is extremely selective so they don't go to jail for violating export restrictions.
It is produced by Venray Technology in Dallas. These are the same guys that invented CPU clock timing.
 
The USA has had a supercomputer chip with 30,000 cores for a number of years.
Those must've been awfully simple cores, then. Do you mean like GPU "cores", the way Nvidia uses the term?

About 35 years ago, a company called Thinking Machines made the CM-2 - a machine with 65536 "CPUs", but they were almost as simple as you could possibly imagine. They each processed only a single bit of information. The most common way to use it was via software that glued them together and made them behave like 2048 32-bit CPUs. Of course, performance would've been a lot better if you'd just started by using actual 32-bit CPUs. That's what they eventually did, in the CM-5.

My point is, it matters precisely how you define a core. Nvidia's "cores" don't behave like general purpose CPU cores, and I'd wager neither do the "cores" in that 30k processor you mentioned. In contrast, the cores in these AmpereOne CPUs are truly general-purpose ARMv8.6-A CPU cores.
 
I'm also reminded of this:

It's a video processing chipset, based on some technology developed for military applications by a company called Teranex. It has 3072 processing elements and was used in an HDMI video processor I bought off eBay about 10 years ago. Even then, it was already long out of production.


When it was first released (2005), even GPUs didn't have that many "cores". They were still in the hundreds.
 
Those must've been awfully simple cores, then. Do you mean like GPU "cores", the way Nvidia uses the term?

About 35 years ago, a company called Thinking Machines made the CM-2 - a machine with 65536 "CPUs", but they were almost as simple as you could possibly imagine. They each processed only a single bit of information. The most common way to use it was via software that glued them together and made them behave like 2048 32-bit CPUs. Of course, performance would've been a lot better if you'd just started by using actual 32-bit CPUs. That's what they eventually did, in the CM-5.

My point is, it matters precisely how you define a core. Nvidia's "cores" don't behave like general purpose CPU cores, and I'd wager neither do the "cores" in that 30k processor you mentioned. In contrast, the cores in these AmpereOne CPUs are truly general-purpose ARMv8.6-A CPU cores.
I have spoken with the designer, and I would say you need to get more details from them. I think the chip is halfway between a CPU and a GPU. Because of the military applications, details are skimpy. I know the military wanted it for the brains of missiles, and the designer wanted it used for good. Hence it is currently used for super-fast DNA sequencing in cancer genomics. Massively parallel compute in real time. This is the type of chip AGI needs and, from my understanding, where they are going.
 
Those must've been awfully simple cores, then. Do you mean like GPU "cores", the way Nvidia uses the term?

About 35 years ago, a company called Thinking Machines made the CM-2 - a machine with 65536 "CPUs", but they were almost as simple as you could possibly imagine. They each processed only a single bit of information. The most common way to use it was via software that glued them together and made them behave like 2048 32-bit CPUs. Of course, performance would've been a lot better if you'd just started by using actual 32-bit CPUs. That's what they eventually did, in the CM-5.

My point is, it matters precisely how you define a core. Nvidia's "cores" don't behave like general purpose CPU cores, and I'd wager neither do the "cores" in that 30k processor you mentioned. In contrast, the cores in these AmpereOne CPUs are truly general-purpose ARMv8.6-A CPU cores.


I stand corrected after some research. It does seem that they are being positioned as just lower-cost x86 replacements.

 
I saw this coming a long time ago... maybe in the future AMD will have 192 cores / 384 threads, or two CPUs with 768 threads. More powa baby... will need fiber to storage and the RAM needed to feed all these cores lol
 
I stand corrected after some research. It does seem that they are being positioned as just lower-cost x86 replacements.

Altra is the previous generation. They maxed out at 128 cores. The new one is called AmpereOne and goes up to 192 cores.
 
This reminds me of the famous quote from none other than Bill Gates himself, "Who's going to need more than 640K of memory?"

History repeating itself all over again.

This kind of thinking on the part of the Linux kernel maintainers is typical of the general "we don't need it yet" mindset.
 
This reminds me of the famous quote from none other than Bill Gates himself, "Who's going to need more than 640K of memory?"
The actual quote is supposedly:

"640K ought to be enough for anybody."

However, he apparently denies ever saying it:


History repeating itself all over again.
There are some key details about that 640k quote and about this situation that you seem to be missing. So, please allow me to spell it out for you.

Regarding the "640k" quote, I'd heard this occurred when they were doing the memory layout for MS-DOS. A key point is that the CPU had already been designed and had a hard limit of just over 1 MB, due to the way addressing worked on the 8086. So, what they were deciding was how much of that address range would be available for normal programs and data. The other areas were reserved for the BIOS and memory-mapped devices.

[Image: MS-DOS memory map]

Source: https://www.slideserve.com/raleigh/lecture-19-16-bit-ms-dos-programming

So, no matter what they had decided, there's no way they could've allowed even up to 1 MB. Looking back, 640 kB sounds ridiculously small - but, when you put it in context, it was still the majority of the addressable space - and PCs of that era usually shipped with far less RAM, because it was expensive.

Given the design of the 8086 ISA, there was no way around having some limit below 1 MB. They would've known that fundamental CPU changes would be required to break the 1 MB barrier, at which point they probably assumed you'd just design a new operating system. And that's actually what happened, since Windows was easily able to surpass the 640k limit, once CPUs like the 80286 and 80386 launched. For pure DOS programs, there were so-called "DOS Extenders" that allowed DOS programs to access memory above 1 MB, after jumping through some hoops.
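To put a number on the "just over 1 MB" point, here's a small standalone sketch of the real-mode segment:offset arithmetic (my own illustration, not anything from the thread):

```c
/* Worked example of real-mode 8086 segment:offset addressing.
 * physical = segment * 16 + offset, so the largest address a program can
 * form is 0xFFFF:0xFFFF = 0x10FFEF -- just under 64 KB past the 1 MB mark.
 * (On an actual 8086, with only 20 address lines, those top addresses wrap
 * back to low memory; later CPUs exposed them as the HMA.) */
#include <stdio.h>
#include <stdint.h>

static uint32_t phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;  /* shift left 4 = multiply by 16 */
}

int main(void)
{
    uint32_t max = phys(0xFFFF, 0xFFFF);
    printf("highest segment:offset address = 0x%05X (%u bytes past 1 MiB)\n",
           max, max - (1u << 20));
    return 0;
}
```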

This kind of thinking on the part of the Linux kernel maintainers is typical of the general "we don't need it yet" mindset.
The kernel maintainers left open the door to adding it when needed. They didn't say "never", just "not yet".

The reason they said "not yet" is that each supported CPU core carries a finite resource cost in the size of kernel data structures. Essentially, it bumps into some minor scalability problems in the kernel. Thus, increasing the limit isn't free, so it makes sense not to do it prematurely.
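As a rough illustration of that cost, here's a standalone sketch (not actual kernel source) of how a compile-time CPU limit, in the spirit of CONFIG_NR_CPUS, turns into per-CPU bitmap storage that gets embedded throughout kernel data structures:

```c
/* Standalone sketch (not kernel code) of how a compile-time CPU limit,
 * in the spirit of CONFIG_NR_CPUS, becomes fixed bitmap storage that is
 * embedded in many kernel data structures. */
#include <stdio.h>
#include <limits.h>

#define NR_CPUS 512   /* hypothetical raised limit */

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct cpumask {
    unsigned long bits[BITS_TO_LONGS(NR_CPUS)];  /* one bit per possible CPU */
};

int main(void)
{
    /* Each mask costs roughly NR_CPUS/8 bytes; structures that embed
     * several of them all grow when the limit is raised. */
    printf("NR_CPUS = %d -> sizeof(struct cpumask) = %zu bytes\n",
           NR_CPUS, sizeof(struct cpumask));
    return 0;
}
```

At a 256-CPU limit each mask is 32 bytes; doubling the limit to 512 doubles that for every embedded mask, which is the kind of incremental cost the maintainers were weighing.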

However, it's easily done when there is a reason to do it. In that sense, it's definitely not like the MS-DOS case, where it basically required a bunch of OS-level + application code changes to go past 640k - although DOS' limit was really just due to the primitive nature of CPUs at the time.
 