Yes, you can have too many CPU cores - Ampere's 192-core chips break ARM64 Linux kernel in two-socket systems, company requests higher core count s...
Data center CPU manufacturer Ampere has requested raising the default Linux kernel CPU core limit from 256 to 512 to support its latest AmpereOne CPUs, which reach up to 384 cores in dual-socket configurations.
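For context, the limit in question is the kernel's compile-time ceiling on possible CPUs (the CONFIG_NR_CPUS build option), not something a running system can raise on the fly. A minimal sketch, using standard interfaces, of how to see the relevant numbers on a Linux box:

```c
/* Sketch: report how many CPUs the OS sees vs. how many are online.
 * sysconf(_SC_NPROCESSORS_CONF/_ONLN) are standard glibc/POSIX calls. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long configured = sysconf(_SC_NPROCESSORS_CONF); /* CPUs the OS knows about */
    long online     = sysconf(_SC_NPROCESSORS_ONLN); /* CPUs currently online */
    printf("configured: %ld, online: %ld\n", configured, online);
    /* The compile-time ceiling itself is exposed in sysfs:
     *   /sys/devices/system/cpu/kernel_max  ->  NR_CPUS - 1 */
    return 0;
}
```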
I am kind of surprised, considering what @kanewolf said. Also, @kanewolf, why do you think this is surprising? My surprise is that, since HPC has been on Linux for such a long time, nobody had thought to increase this limit before.
Until recently, 256 cores was not an issue for most hosts. The 1U or 2U server chassis that make up 99.999% of Linux 2-socket hosts couldn't come close to 256 cores. Only specialized hosts, generally used in HPC, pushed past 256 cores. Those specialized hosts could have vendor-specific kernels maintained. SGI supported 2048 sockets 10 years ago, but they had custom Linux to support those extreme configs.
I am not sure if the proposed kernel changes will be approved. This is still a very small market for such a major change.
Intel typically offers up to 8-socket scalability on select SKUs, at the top end of their Xeon product line. At least, that's what they support for cache-coherent configurations without additional glue logic. With 8x of the top-spec Sapphire Rapids CPUs (60 cores each), you can reach 480 cores.
Yes, but that is not a 2S config, and not a 1U or 2U host. The vast majority of datacenter hosts are 2S blades or 2S 1U or 2U hosts. Any company selling greater than 2S hosts has a tailored OS to support their hardware.
You guys are missing the point. HPC has a very different set of requirements than traditional micro-serviced applications, which typically run in containers.
An always on web app running via K8s is likely well-defined and is made up of many containers, volumes, a secret manager, etc.
An HPC container tends to be a big 'ole fatty, containing everything needed.
These large compute- (and memory-) hungry HPC containers just aren't very "distributable" across large numbers of nodes, the way regular K8s apps are.
So, a company produces chips like these to PUSH the boundaries, so they can be the best at offering super high core counts for even better HPC support.
The kernel needs this patch, no biggie.
Linux isn't behind the times or ahead of them. It's adapting to modern needs.
Now, if only HPC architects could figure out how to properly 'Kube some workloads!
Yes, but that is not a 2S config, and not a 1U or 2U host. The vast majority of datacenter hosts are 2S blades or 2S 1U or 2U hosts. Any company selling greater than 2S hosts has a tailored OS to support their hardware.
While this is certainly true, until SPR there were no x86 systems that hit the 256-core mark, since AMD has been exclusively 2S, ICL was 2S, and SKL's 8S parts peaked at 28 cores.
So, a company produces a chip like these to PUSH the boundaries so they can be the best at offering super high core counts for even better HPC support.
AmpereOne is certainly not aiming at HPC. It lacks SVE and has only 2x 128-bit NEON SIMD. Compare that to AMD, which has 6x 256-bit issue ports in Zen 4 (although only two of them are multiply-capable ports) and Golden Cove (similar width to Zen 4, but with 4 multiply-capable ports).
ARM's Neoverse V1 core is an example of an HPC-oriented ARM core. It has 2x 256-bit SVE. Fujitsu's A64FX is even wider, with 2x 512-bit SVE.
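A small sketch of why the SVE vs. NEON distinction matters for HPC-style code (assumes a compiler and CPU with SVE support, e.g. built with -march=armv8-a+sve): SVE is vector-length agnostic, so software queries the width at run time, whereas NEON is fixed at 128 bits.

```c
/* Sketch: print the SVE vector width on the running CPU.
 * svcntb() is a real ACLE intrinsic returning the number of 8-bit lanes. */
#include <arm_sve.h>
#include <stdio.h>

int main(void)
{
    printf("SVE vector length: %llu bits\n",
           (unsigned long long)svcntb() * 8);   /* 128..2048 bits, per implementation */
    printf("NEON vector length: 128 bits (fixed)\n");
    return 0;
}
```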
Furthermore, Ampere themselves describe AmpereOne as a "Cloud Native" CPU and don't mention HPC anywhere:
AmpereOne high performance Cloud Native Processor features a new architecture using Ampere’s custom ARM ISA compliant CPU core and up to 192 cores, and DDR5.
I am surprised at the ignorance of the writer. The USA has had a supercomputer chip with 30,000 cores for a number of years. The chip was classified for military use, but the manufacturer fought a lengthy battle to undo that so it could be put to civilian uses, such as sequencing DNA super fast for cancer treatment and, lately, AI. You cannot buy this chip, as the manufacturer is extremely selective so they don't go to jail for violating export restrictions.
Produced by Venray Technology in Dallas. These are the same guys that invented CPU clock timing.
Those must've been awfully simple cores, then. Do you mean like GPU "cores", the way Nvidia uses the term?
About 35 years ago, a company called Thinking Machines made the CM-2 - a machine with 65536 "CPUs", but they were almost as simple as you could possibly imagine. They each processed only a single bit of information. The most common way to use it was via software that glued them together and made them behave like 2048 32-bit CPUs (65,536 ÷ 32). Of course, performance would've been a lot better if you'd just started by using actual 32-bit CPUs. That's what they eventually did, in the CM-5.
My point is, it matters precisely how you define a core. Nvidia's "cores" don't behave like general purpose CPU cores, and I'd wager neither do the "cores" in that 30k processor you mentioned. In contrast, the cores in these AmpereOne CPUs are truly general-purpose ARMv8.6-A CPU cores.
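To make the bit-serial idea concrete, here's a toy illustration (not actual CM-2 code; bit_serial_add is a made-up name) of how software can gang 1-bit operations together to behave like a 32-bit adder:

```c
/* Toy sketch: emulate a 32-bit add using only 1-bit full-adder steps,
 * the way software on bit-serial machines built wider virtual CPUs. */
#include <stdio.h>
#include <stdint.h>

static uint32_t bit_serial_add(uint32_t a, uint32_t b)
{
    uint32_t result = 0, carry = 0;
    for (int i = 0; i < 32; i++) {          /* one "cycle" per bit position */
        uint32_t ai = (a >> i) & 1, bi = (b >> i) & 1;
        uint32_t sum = ai ^ bi ^ carry;     /* 1-bit full adder */
        carry = (ai & bi) | (ai & carry) | (bi & carry);
        result |= sum << i;
    }
    return result;
}

int main(void)
{
    printf("%u\n", bit_serial_add(40000, 25536));   /* prints 65536 */
    return 0;
}
```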
I'm not sure they ever actually made a product (I wasn't able to find anything and their website isn't even really a thing), but they were pushing an integrated CPU/DRAM design around a decade ago and those would have had to be pretty simple.
It's a video processing chipset, based on some technology developed for military applications by a company called Teranex. It has 3072 processing elements and was used in a HDMI video processor I bought off ebay like 10 years ago. Even then, it was already long out of production.
Those must've been awfully simple cores, then. Do you mean like GPU "cores", the way Nvidia uses the term?
About 35 years ago, a company called Thinking Machines made the CM-2 - a machine with 65536 "CPUs", but they were almost as simple as you could possibly imagine. They each processed only a single bit of information. The most common way to use it was via software that glued them together and make them behave like 2048 32-bit CPUs. Of course, performance would've been a lot better if you'd just started by using actual 32-bit CPUs. That's what they eventually did, in the CM-5.
My point is, it matters precisely how you define a core. Nvidia's "cores" don't behave like general purpose CPU cores, and I'd wager neither do the "cores" in that 30k processor you mentioned. In contrast, the cores in these AmpereOne CPUs are truly general-purpose ARMv8.6-A CPU cores.
I have spoken with the designer, and I would say you need to get more details from them. I think the chip is halfway in between a CPU and a GPU. Because of the military applications, details are skimpy. I know the military wanted it for the brains of missiles, and the designer wanted it used for good. Hence it is currently used in super fast genome cancer DNA sequencing - massively parallel compute in real time. This is the type of chip AGI needs and, from my understanding, where they are going.
I saw this coming a long time ago... maybe in the future AMD will have 192 cores / 384 threads, or two CPUs with 768 threads. More powa baby... you'll need fiber to storage to feed the RAM all these cores need lol
In 1981, when the IBM PC was introduced, Bill Gates supposedly said that 640KB of memory "ought to be enough for anybody." The quote has followed him through the years, despite a lack of solid evidence that he actually said it.
There are some key details about that 640k quote and about this situation that you seem to be missing. So, please allow me to spell it out for you.
Regarding the "640k" quote, I'd heard this occurred when they were doing the memory layout for MS DOS. A key point is that the CPU had already been designed, and had a hard limit of just over 1 MB, due to the way addressing worked on the 8086. So, what they were deciding was how much of that address range would be available for normal programs and data. The other areas were reserved for BIOS and memory-mapped devices.
So, no matter what they had decided, there's no way they could've allowed even up to 1 MB. Looking back, 640 kB sounds ridiculously small - but, when you put it in context, it was still the majority of the possible address space - and PCs of that era usually shipped with far less RAM, because it was expensive.
Given the design of the 8086 ISA, there was no way around having some limit below 1 MB. They would've known that fundamental CPU changes would be required to break the 1 MB barrier, at which point they probably assumed you'd just design a new operating system. And that's actually what happened, since Windows was easily able to surpass the 640k limit, once CPUs like the 80286 and 80386 launched. For pure DOS programs, there were so-called "DOS Extenders" that allowed DOS programs to access memory above 1 MB, after jumping through some hoops.
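For a concrete picture of that hard ceiling, here's a small sketch of 8086 real-mode addressing (real_mode_addr is just an illustrative helper): a 16-bit segment and a 16-bit offset combine into a roughly 20-bit physical address, which is where the 1 MB limit, and the 640 kB line DOS drew inside it, come from.

```c
/* Sketch: 8086 real-mode physical address = segment * 16 + offset. */
#include <stdio.h>
#include <stdint.h>

static uint32_t real_mode_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

int main(void)
{
    /* Highest reachable address: FFFF:FFFF = 0x10FFEF, just over 1 MB */
    printf("max = 0x%06X\n", (unsigned)real_mode_addr(0xFFFF, 0xFFFF));
    /* Top of conventional memory: A000:0000 = 0xA0000 = 640 KB */
    printf("640 KB boundary = 0x%05X\n", (unsigned)real_mode_addr(0xA000, 0x0000));
    return 0;
}
```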
The kernel maintainers left open the door to adding it when needed. They didn't say "never", just "not yet".
The reason why they said "not yet" is that for each supported CPU core, there's a finite resource cost in the size of kernel data structures. Essentially, it bumps into some minor scalability problems in the kernel. Thus, increasing the limit isn't free, so it makes sense not to do it prematurely.
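A userspace analogy of that per-CPU cost (just a sketch; the kernel's own cpumask bitmaps scale the same way with NR_CPUS): glibc's CPU_ALLOC_SIZE() reports how many bytes a CPU-affinity set needs for a given CPU count, so doubling the ceiling doubles every such bitmap.

```c
/* Sketch: structures sized "one bit per possible CPU" grow with the ceiling. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    printf("cpu_set_t for 256 CPUs: %zu bytes\n", CPU_ALLOC_SIZE(256));  /* 32 bytes */
    printf("cpu_set_t for 512 CPUs: %zu bytes\n", CPU_ALLOC_SIZE(512));  /* 64 bytes */
    return 0;
}
```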
However, it's easily done when there is a reason to do it. In that sense, it's definitely not like the MS-DOS case, which basically required a bunch of OS-level and application code changes to go past 640k - although DOS' limit was really just due to the primitive nature of CPUs at the time.