News Tachyum Prodigy Chip Now Has 192 Universal Cores

Status
Not open for further replies.
As more compute-specific cores are being added to SoCs, I find it hard to believe this approach of universal cores will work. I hope they do work, though.
 
I've already learned to ignore any news of Tachyum, until the thing actually reaches the hands of end customers. It's pretty amazing the company is still in business, after so many years of shipping nothing but lofty promises.

Worst of all, the most innovative and daring aspects of their ISA seem to have disappeared in the last iteration of news about them. I don't know how much appetite there is for another "me too" ISA, these days. RISC-V seems to be consuming all the oxygen for "the next big thing", while LoongArch and Elbrus are probably both just floating along on Chinese and Russian government funding, respectively.

More troubling is that I don't see any news of them contributing patches towards Linux support. They seem to be focusing on FreeBSD, which is going to seriously limit their market.
 
Looking at their website, their last blog entry is dated June 2022.

The specs claim their CPUs provide "Out-of-Order, 4 instructions per clock", which is old hat. They also claim x86, ARM, and RISC-V "support", which probably means JIT translation, like Apple's Rosetta 2. So, we're basically looking at a core with the same dispatch rate per clock as Sandy Bridge (from 12 years ago) and the added tax of emulation/translation. That's going to get trashed on general-purpose workloads by AMD's Bergamo and Intel's Sierra Forest.

When it comes to AI/Deep learning, they claim 2x 1024-bit vector units per core. Golden Cove (server) and Zen 4 are both at about 1536 bits of total vector execution width, per core. So, not a huge advantage, and it still remains to be seen what their issue rate and latency are, and how rich those instructions are.

Next, it seems to have a rather paltry 1 MiB of L2 + L3 cache per core. Compare that to 5 MiB per core in regular Genoa and about 2 MiB per core in Sapphire Rapids. AMD's 3D V-Cache has shown us how sensitive some workloads are to cache capacity.

Finally, they tout a 4096-bit matrix processor per core, which I think is approximately half of Intel's AMX, though I wouldn't be surprised if it supported a wider variety of operations. For deep learning, Sapphire Rapids gains a lot from its optional HBM. Prodigy's 16-channel DDR5-7200 might be nearly comparable in bandwidth, but that also means it's unlikely to surpass Intel's performance there.
The 4096-bit matrix processor is mentioned on the spec sheet of the older, 48-core model.

As more compute-specific cores are being added to SoCs, I find it hard to believe this approach of universal cores will work. I hope they do work, though.
Since Intel added AMX, Sapphire Rapids is every bit as "universal" as theirs.

I think your skepticism is well-founded. Increasingly, people are going to be using special-purpose accelerators for AI workloads, due to not only the performance but also the efficiency benefits.

In spite of all my nay-saying, I suppose it would probably be quite an accomplishment for a Slovak company to build an entirely new CPU that's even in the same ballpark as AMD's and Intel's latest and greatest. It should compare favorably against the latest LoongArch CPUs, as well. I guess we should also mention some of the European CPU efforts in progress, such as SiPearl (although that one is Arm-based rather than RISC-V).

However, let's just see if they can actually get anything to market. We've seen this story play out so very many times. A CPU startup makes lofty claims, but underestimates the time and complexity involved in actually getting a working CPU to market. By the time they do, the mainstream players have pretty much already passed them by. The only remotely recent examples I can think of that bucked the trend were Japanese (Fujitsu's A64FX, PEZY Computing, and Preferred Networks).
 
I've read on another site that Tachyum has moved away from VLIW to a more traditional out-of-order architecture. Each instruction is four or eight bytes long.

Emulation of other architectures is supposed to use QEMU. Like on Apple's M1, a core can be switched from WMO (weak memory ordering) to TSO mode for running translated x86 code without memory-barrier instructions everywhere.

Personally, as a low-level systems geek, I'm not wowed by promised performance numbers. I'm more interested in whether its ISA has any benefits (or quirks) for compilers and operating systems compared to other ISAs, and whether there are any features that would make it easier to write secure programs.
IMHO, anything like that could give it a reason to exist beyond just being fast.
But there's so very little information available about it that we can't tell.
 