Ampere sneaks out a 192-core CPU with 12-channel DDR5 memory

Very exciting times for ARM. Give it another decade and maybe I won't have any more x86 software I'm dependent on keeping me from jumping ship. For those who can already do full ARM workloads, this sounds great.
 
828 gigabytes a second of memory bandwidth with DDR5-5600!!!
No, it's 537.6 GB/s (nominal), by my math.

For server memory bandwidth, Intel's Granite Rapids takes the cake. Its MRDIMM-8800 support yields a nominal bandwidth of 844.8 GB/s. However, x86 uses memory bandwidth less efficiently, except in rare cases, so for typical apps it will behave more like 633.6 GB/s or so.
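A quick way to check those figures: the sketch below multiplies transfer rate by 8 bytes per 64-bit channel and by 12 channels. The 0.75 efficiency factor is only the ratio implied by the numbers above (633.6 / 844.8), not a measured value, and the variable names are my own.

```python
def nominal_gbps(mts: int, channels: int, bus_bytes: int = 8) -> float:
    """Nominal bandwidth in GB/s: MT/s x bytes per transfer x channels / 1000."""
    return mts * bus_bytes * channels / 1000

granite_rapids = nominal_gbps(8800, 12)  # MRDIMM-8800, 12 channels
ampereone_m = nominal_gbps(5600, 12)     # DDR5-5600, 12 channels

# Assumed factor: the ratio implied by the post's own figures, not a benchmark.
X86_EFFICIENCY = 0.75

print(f"Granite Rapids nominal:   {granite_rapids:.1f} GB/s")
print(f"Granite Rapids effective: {granite_rapids * X86_EFFICIENCY:.1f} GB/s")
print(f"AmpereOne M nominal:      {ampereone_m:.1f} GB/s")
```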

This move basically just catches AmpereOne up to the DRAM width and capacity of Intel's Granite Rapids and AMD's Genoa and Turin. I think they both, at least theoretically, support 2 DPC (two DIMMs per channel), though probably not if each DIMM is quad-ranked. Anyway, Intel claims Granite Rapids can match AmpereOne M on memory capacity. I'm not sure about Turin, but I'd guess it does too.

BTW, when you put so much memory in a server, the DIMMs burn so much power that the CPU cores might as well go ahead and run a little faster. In such a configuration, I think Ampere's emphasis on power efficiency isn't quite the advantage they claim it to be.
 
Give it another decade and maybe I won't have any more x86 software I'm dependent on keeping me from jumping ship. For those who can already do full ARM workloads, this sounds great.
If you're more concerned about reducing operating costs than improving runtime, you could look at emulation. It might only run 70% as fast, but it would let you use much cheaper ARM instances, and you could probably more than make up for the performance loss by running a lot more of them.
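The cost argument boils down to work per dollar. Here's a rough sketch assuming hypothetical hourly prices (1.00 for x86, 0.60 for ARM; substitute real quotes) and the 70% emulation speed guessed above. Under those assumptions, emulation comes out ahead whenever the ARM/x86 price ratio is below the emulation speed.

```python
def throughput_per_dollar(relative_speed: float, hourly_price: float) -> float:
    """Work completed per dollar, with native x86 speed normalized to 1.0."""
    return relative_speed / hourly_price

# Hypothetical hourly prices for illustration only; plug in real quotes.
X86_PRICE = 1.00
ARM_PRICE = 0.60
EMULATION_SPEED = 0.70  # the "70% as fast" guess, not a measurement

native = throughput_per_dollar(1.0, X86_PRICE)
emulated = throughput_per_dollar(EMULATION_SPEED, ARM_PRICE)
instances_needed = 1.0 / EMULATION_SPEED  # ARM instances to match one x86 instance

print(f"native x86:      {native:.2f} units of work per $")
print(f"emulated on ARM: {emulated:.2f} units of work per $")
print(f"ARM instances per x86 instance: {instances_needed:.2f}")
# Emulation wins whenever ARM_PRICE / X86_PRICE < EMULATION_SPEED.
print("emulation cheaper:", ARM_PRICE / X86_PRICE < EMULATION_SPEED)
```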
 
828 gigabytes a second of memory bandwidth with DDR5-5600!!!
I'd expect about 400GB/s peak memory bandwidth on these 12-channel 5600MT/s AmpereOne platforms. The custom core they use has fairly poor memory concurrency compared to Apple's implementation of ARM, which drags down real-world performance.
 
I'd expect about 400GB/s peak memory bandwidth on these 12-channel 5600MT/s AmpereOne platforms. The custom core they use has fairly poor memory concurrency compared to Apple's implementation of ARM, which drags down real-world performance.
Well, you won't get 400 GB/s from a single core. But, they have 192 cores with which to saturate the memory subsystem.
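Why one core can't do it but many can: by Little's law, a core's bandwidth is roughly (cache misses in flight x line size) / latency, which is what "memory concurrency" limits. The sketch below uses hypothetical values (64-byte lines, 100 ns loaded latency, 20 outstanding misses per core), not AmpereOne specs, and assumes per-core bandwidth adds up linearly, which real memory controllers won't do perfectly.

```python
import math

LINE_BYTES = 64          # assumed cache-line size
LATENCY_NS = 100.0       # hypothetical loaded DRAM latency
OUTSTANDING_MISSES = 20  # hypothetical per-core miss concurrency

def per_core_gb_s(outstanding: int, line_bytes: int, latency_ns: float) -> float:
    """Little's law: throughput = requests in flight / latency. Bytes per ns == GB/s."""
    return outstanding * line_bytes / latency_ns

def cores_to_reach(target_gb_s: float, per_core: float) -> int:
    """Cores needed, assuming per-core bandwidth scales linearly up to the target."""
    return math.ceil(target_gb_s / per_core)

per_core = per_core_gb_s(OUTSTANDING_MISSES, LINE_BYTES, LATENCY_NS)
print(f"per core: {per_core:.1f} GB/s")
print(f"cores for 400 GB/s: {cores_to_reach(400, per_core)} of 192")
```

With these made-up numbers, a few dozen of the 192 cores would be enough to hit 400 GB/s.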
 
Well, you won't get 400 GB/s from a single core. But, they have 192 cores with which to saturate the memory subsystem.
Most definitely not. I doubt the single-core memory performance of the new AmpereOne is much higher than the 15GB/s that the Neoverse N1 cores Ampere used previously achieved.
This new Ampere CPU is at a pretty severe disadvantage by staying on the ancient Armv8 ISA: it's missing out on important SIMD instructions that Armv9 brought (e.g. SVE2), which coincidentally improve memory performance.
 
I have no idea how they're doing the math on that one, and their notes don't say. If I had to guess, I'd say it's some sort of overhead-normalized calculation based on a dual-channel system. Twelve channels make six dual-channel pairs, so that would make it 69GB/s × 6 by their calculations, which would be 414GB/s.
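Here's a sketch of that guess. The 69 GB/s per dual-channel pair is the figure above, whose origin is unknown; comparing it to the nominal 89.6 GB/s of dual-channel DDR5-5600 shows what derating it implies.

```python
DUAL_CHANNEL_NOMINAL = 5600 * 8 * 2 / 1000  # 89.6 GB/s for 2 x 64-bit DDR5-5600
DUAL_CHANNEL_GUESS = 69.0                   # per-pair figure from the post; origin unknown
CHANNELS = 12

pairs = CHANNELS // 2
estimate = DUAL_CHANNEL_GUESS * pairs
efficiency = DUAL_CHANNEL_GUESS / DUAL_CHANNEL_NOMINAL

print(f"{pairs} pairs x {DUAL_CHANNEL_GUESS} GB/s = {estimate:.0f} GB/s")
print(f"implied efficiency: {efficiency:.0%} of nominal")
```

If the guess is right, the 414 GB/s figure assumes roughly 77% of nominal bandwidth.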

This is how you calculate maximum memory bandwidth:

((A * 2) * B) * C = Maximum memory bandwidth in MB/s

A = DRAM I/O clock in MHz (you can also just use the MT/s rating and skip the * 2)
B = Bus width in bytes (64 bits, i.e. 8 bytes, is most common for DRAM modules)
C = Number of memory channels in question

So using the above formula we have:
((2800 * 2) * 8) * 12 = 537,600 MB/s, i.e. 537.6 GB/s
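The same formula as a small Python function, for anyone who wants to plug in other configurations. The function name is just for illustration.

```python
def max_bandwidth_mb_s(clock_mhz: float, width_bytes: int, channels: int) -> float:
    """((A * 2) * B) * C from the formula above.

    A = DRAM I/O clock in MHz (DDR moves data on both clock edges, so * 2 gives MT/s)
    B = bus width per channel in bytes (64 bits / 8 = 8 bytes)
    C = number of memory channels
    """
    mega_transfers_per_s = clock_mhz * 2
    return mega_transfers_per_s * width_bytes * channels

result = max_bandwidth_mb_s(2800, 8, 12)  # DDR5-5600 = 2800 MHz I/O clock, 12 channels
print(f"{result:,} MB/s = {result / 1000} GB/s")  # prints: 537,600 MB/s = 537.6 GB/s
```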
 
Sorry I missed this. @thestryker is right.

The easiest way to remember is to think of the number after DDR5 (i.e. 5600) as the bit rate per pin, in Mbps. When they say 12-channel, they mean 64 bits × 12 = 768 bits. So, it's very easy to just multiply it out (5600 * 768), then divide by 8 bits per byte, and then divide by 1000 to go from MB/s to GB/s.
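The per-pin method in code, as a sketch. Only the 64 data bits per channel count; ECC bits don't add usable bandwidth. DDR5 splits each 64-bit channel into two 32-bit subchannels, but the total width is the same.

```python
def nominal_gb_s(mbps_per_pin: int, data_bits: int) -> float:
    """Per-pin data rate (Mb/s) x number of data pins, converted to GB/s."""
    megabits_per_s = mbps_per_pin * data_bits
    return megabits_per_s / 8 / 1000  # bits -> bytes, then MB/s -> GB/s

# 12 channels x 64 data bits each (ECC bits excluded) = 768 bits.
data_bits = 64 * 12
print(nominal_gb_s(5600, data_bits))  # prints 537.6
```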

Of course, this doesn't account for things like refresh or DDR5 protocol overhead. That's why I call it a "nominal" number, because it's based on the name DDR5-5600. It's not even "theoretical", because there's no theory where you could ever reach that exact number.
 