Intel's Oak Stream platform to adopt 9324-pin packaging for Diamond Rapids CPU.
ENIAC had over 5 million hand-soldered joints. I'm not impressed.
Bring this to the next HEDT platform!
Somebody must've blown the breaker on your humor circuit. I detected a distinct note of sarcasm in that post. Either way, it'd be pretty ridiculous!

HEDT isn't usually the path for the truly massive CPUs. At that level, you would just buy the enterprise platform.
HEDT is aimed at a price point between consumer and commercial/enterprise. Several thousand dollars, not tens of thousands per chip.
The article said:
> also up from the 7,529 contacts used by Intel's Xeon 5 Granite Rapids and Sierra Forest processors.

Oops, no. These are Xeon 6.
> Somebody must've blown the breaker on your humor circuit. I detected a distinct note of sarcasm in that post. Either way, it'd be pretty ridiculous!

Meant to be a response in kind, in a sense. We didn't exactly start this thread off on a serious note.
:D
> So, this would be a monolithic package of how many chips inside?

Monolithic... package? I don't understand the question. When are these packages ever not monolithic? It will obviously contain multiple dies/tiles/chiplets, but I assume you weren't asking about that.
"Monolithic" is the (old) term for stuffing many chips under one cover. Even so, maybe you can take it too far. If you have a 128 cores, how much do they even want or need to talk to each other? Maybe keeping "just" 16 or 32 of them in a package would cover 90% of any need, if there is any need? With mobs of cores like that they are still hardwired to whichever L1, L2 caches they are using so are already somewhat constrained on memory and data fragmentation or duplication, so putting them all under a single cover is already misleading.Monolithic... package? I don't understand the question. When are these packages ever not monolithic? It will obviously contain multiple dies/tiles/chiplets, but I assume you weren't asking about that.
> Is this a good idea?

Intel seems to think so. The downside is that the more components you cram into a single package, the greater the chance of failures. And if it does experience a catastrophic failure, the cost of replacing the whole thing is quite high.
On the other hand, if just a single core fails, you can simply take it offline. I recently saw some news about Intel working on field diagnostics, which are probably becoming a necessity for continuing to scale individual CPUs like this.
> Would anything be lost if it was split into four or eight smaller packages?

I/O bandwidth between packages is much lower, and the energy per bit transferred is higher. That's the main reason. If you keep all the chips on the same package, you can use much wider, faster, and more efficient links.
> Even so, maybe you can take it too far. If you have 128 cores, how much do they even want or need to talk to each other?

Modern software (both kernel and userspace) assumes memory accesses are cache-coherent. This means a certain amount of traffic between the cores to support cache coherency, even when they're not actually exchanging data.
> Maybe keeping "just" 16 or 32 of them in a package would cover 90% of any need, if there is any need?

For at least five years or so, AMD has supported partitioning EPYC CPUs into separate NUMA domains. I'm not sure, but I think it could be simply talking about de-interleaving memory accesses to an extent. Here's what the Zen 4 EPYC doc says about it:
> With mobs of cores like that they are still hardwired to whichever L1, L2 caches they are using so are already somewhat constrained on memory and data fragmentation or duplication, so putting them all under a single cover is already misleading.

Misleading? What's misleading about having distributed caches? Go back as far as you like, even to the days before multi-core CPUs. Back then, each CPU had its own cache(s). There was never a point, in probably at least the last three decades, when a multi-core system didn't have such a cache architecture.
> Modern software (both kernel and userspace) assumes memory accesses are cache-coherent. This means a certain amount of traffic between the cores to support cache coherency, even when they're not actually exchanging data.

I presume you need the NUMA domains even with all the processors in one package.
...
> Presume you need the NUMA domains even with all the processors in one package.

What NUMA actually means is somewhat fuzzy. As the text I quoted above from the Zen 4 EPYC manual says, the penalty for accessing the farthest memory controller wasn't terribly big and has only gotten smaller.
> What NUMA actually means is somewhat fuzzy. As the above text I quoted from the Zen 4 EPYC manual says, the penalty for accessing the farthest memory controller wasn't terribly big and has only gotten smaller.

NUMA is an attempt to have multiple channels to separate banks of memory, to avoid collisions, waiting, and other bandwidth problems. You need controllers, channels (conductors), and memory organized in banks, plus ways to cross the boundaries without too much penalty. But the penalties are substantial on their own, and what's worse is if six processors all start contending for one bank. Again, I wish I'd ever seen anything detailed about how SQL Server handles NUMA boundaries, but I never have; you can ask for this or that, but without knowing how the system is organized, you're guessing.
The main thing that's NUMA about modern multiprocessor servers is how each CPU has its own memory controller(s). Until the end of the Core 2 era, Intel had a centralized memory controller in the North Bridge chip.
> Back about, oh, twenty years ago, the standard servers had two or more processor chips on single motherboards.

I think 2P is still the standard server workhorse. Most of Intel and AMD's server CPU models support at least 2P configurations. As for the ARM world, it's the same with Ampere Altra (and I presume AmpereOne) and Nvidia's Grace.
> NUMA is an attempt to have multiple channels to separate banks of memory to avoid collisions, waiting, and other bandwidth problems.

I think even that's an overcomplicated definition. At its heart, what it says is that the path isn't symmetrical between all processing elements and all memories. This functional definition hints at the reasons you list for pursuing such an architecture, but there are others (e.g. modularity), and the rationale for NUMA really is an adjunct, as there are other potential solutions for addressing those motives.
> You need controllers, and channels (conductors), and memory organized in banks - and ways to cross the boundaries without too much penalty. But the penalties are substantial on their own and what's worse is if six processors all start contending for one bank.

Contention is pretty straightforwardly addressed via caches and queues, but there are further tricks like prefetching. None of this is specific to NUMA, either. Even more so than with the rationale, you quickly get bogged down if you over-specify what NUMA actually means, implementation-wise.