News: IBM boosts mainframes with 50% more AI performance: z17 features Telum II chip with AI accelerators

If you thought Nvidia's DGX systems were expensive...

I think the real story on the AI acceleration is that it's for people who need to use mainframes for regulatory reasons, or maybe they're just deep-pocketed and afraid to break with tradition. For the latter set, AI was probably the one thing that would lure them out of the mainframe world and into the cloud. IBM was probably worried that once they started using the cloud for AI, they might decide they could migrate their other computing tasks there, as well.
 
I imagine the on-die AI capabilities are mostly just standard gen-on-gen improvement, but the add-in board certainly isn't. It would be interesting to see what the practical application ends up being, given how these systems are typically used. Ever since I first saw the improvements IBM makes with each z-series generation, they've been fascinating to watch. They have so much design innovation it's easy to forget that at their core they're still about allowing native execution of code that goes back decades.
 
I imagine the on-die AI capabilities are mostly just standard gen-on-gen improvement, but the add-in board certainly isn't.
They said it uses the same AI cores as the CPUs. They don't say how many AI cores the CPU has, but if we assume it's only 1, then the Spyre should do up to 768 TOPS, which is still peanuts compared to Nvidia.
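For what it's worth, that estimate can be reproduced with a quick back-of-the-envelope calculation. The per-core rating and core count below are my assumptions, not confirmed figures:

```python
# Rough Spyre throughput estimate, assuming (not confirmed) that the
# on-die Telum II AI accelerator counts as one "AI core" rated at about
# 24 TOPS, and that a Spyre card packs 32 of the same cores.
telum_ii_ai_tops = 24   # assumed rating of one AI core
spyre_core_count = 32   # assumed cores per Spyre card
spyre_tops = telum_ii_ai_tops * spyre_core_count
print(spyre_tops)  # 768
```

If either assumption is off, the 768 figure scales accordingly.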

There's some more detail about non-AI parts of Telum II, here:

They have so much design innovation it's easy to forget that at their core they're still about allowing native execution of code that goes back decades.
Each CPU has only 8 cores, which is sorta shocking in this day and age, considering how much they cost. That legacy code probably should've been rewritten ages ago. Leaving that aside, I wonder how fast it'd run in a JIT emulator on a modern x86 or ARM server CPU.

The part I find most interesting is that they still manage to pull enough revenue to fund the development of a custom microarchitecture and ...well, everything else.
 
How does the performance of the Spyre AI accelerator cards compare to AI processors from other companies?
If we take the above figure of 768 TOPS and consider it to mean dense int8 tensor throughput, here are comparable scores for a few recent Nvidia products:

| Model    | TOPS (int8, dense) | Max Power (W) |
|----------|--------------------|---------------|
| L4       | 485                | 72            |
| L40      | 1448               | 300           |
| RTX 5090 | 1676               | 575           |
| H200     | 1979               | 700           |
| B200     | 4500               | 1200          |

I'm guessing the 768 figure is an overestimate, because they apparently use only 75 W each, according to this:


Based on that, Nvidia's L4 is a rather useful point of comparison. I think they're also made on a similar process node and use similar memory (the Spyre appears not to use HBM-class DRAM). At 75 W, the Spyre's cores are probably clocked a fair bit lower than the AI unit incorporated into each Telum II processor. It could also have fewer AI cores than I assumed, since my estimate attributed one unit to each Telum II. So my guess of 768 might even be something like 3x the real figure. Whatever the case, I think there's no way it's faster than an L4.
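One way to see why 768 is probably too high: at 75 W it would imply an int8 efficiency well above the L4's, even though they're on comparable process nodes. A quick sanity check, using the Nvidia numbers from the table above plus my guessed Spyre figure:

```python
# Dense int8 TOPS and max board power (W). Nvidia numbers are from the
# table above; the Spyre entry is my 768 TOPS guess at its quoted 75 W.
cards = {
    "Spyre (guess)": (768, 75),
    "L4": (485, 72),
    "L40": (1448, 300),
    "RTX 5090": (1676, 575),
    "H200": (1979, 700),
    "B200": (4500, 1200),
}
for name, (tops, watts) in cards.items():
    print(f"{name:14s} {tops / watts:5.1f} TOPS/W")
```

That would put the Spyre at roughly 10 TOPS/W versus about 6.7 for the L4, which seems implausible on a similar node, so the real figure is likely lower.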

BTW, I just took another look at the die shot and it's quite clear the AI unit contains at least two units - maybe 4 or even more?

https://substack-post-media.s3.amazonaws.com/public/images/cde2001f-57f5-4d61-bb2f-e8c311271b25_1915x1000.jpeg

Source: https://chipsandcheese.com/p/telum-ii-at-hot-chips-2024-mainframe-with-a-unique-caching-strategy

However many you think it is, divide the 768 TOPS figure by that. Then, maybe tweak it further for power/clock speed.
 