The DOE's buy-in on the project is incredibly important for Cerebras, as it signifies that the chips are ready for actual use in production systems.
Um, I think the DOE invests in a lot of experimental tech. I wouldn't assume its involvement means the tech is ready for end users yet.
Also, as we've seen time and again, trends in the supercomputing space often filter down to more mainstream usages, meaning further development could find Cerebras' WSE in more typical server implementations in the future.
Also, we've seen plenty of supercomputing tech that didn't filter down, like clustering, InfiniBand, silicon-germanium semiconductors, and other stuff that I honestly don't know much about, because it hasn't filtered down. In fact, the story of the past few decades has largely been about how much tech has filtered up from desktop PCs into HPC.
That's not to say nothing filtered down - it's gone both ways. But the supercomputing industry used to be built exclusively from exotic, custom tech and has been transformed by the use of PCs, GPUs, and a lot of other commodity technology (SSDs, PCIe, etc.). Interestingly, it seems to be headed back in the direction of specialization, as it reaches scales and levels of workload customization (such as AI) that make no sense for desktop PCs. I'd say this accelerator is a good example of that trend.
In particular, the problem with wafer-scale is that it will always be extremely expensive, because silicon costs roughly a fixed amount per unit of area. Better fault tolerance makes you less sensitive to yield, but all that die area still costs a lot of money, as does the exotic packaging.
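To put rough numbers on the yield point (purely illustrative figures of my own, not Cerebras'), the classic Poisson defect model shows why fault tolerance is the only way a wafer-sized die works at all, while doing nothing about the area cost:

    import math

    # Illustrative assumptions, not vendor data.
    defect_density = 0.1    # defects per cm^2
    die_area       = 460.0  # cm^2, roughly a wafer-scale footprint
    core_area      = 0.01   # cm^2 per small redundant core

    # Monolithic die: a single defect anywhere scraps the whole part.
    monolithic_yield = math.exp(-defect_density * die_area)

    # Fault-tolerant die: defective cores get fused off, so what matters is
    # the fraction of cores that come out clean.
    usable_core_fraction = math.exp(-defect_density * core_area)

    print(f"monolithic yield:     {monolithic_yield:.1e}")      # effectively zero
    print(f"usable core fraction: {usable_core_fraction:.3f}")  # ~0.999

Either way, you're still paying for a whole wafer's worth of silicon.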
Cerebras tells us that it can simply use multiple chips in tandem to tackle larger workloads because, unlike GPUs, which mirror memory across units (data parallel) when used in pairs (think SLI), the WSE runs in model parallel mode, meaning it can utilize twice the memory capacity when deployed in pairs, thus scaling linearly.
This is silly.
Of course you can scale models on GPUs in exactly the same way they're talking about: splitting the model itself across devices, so each one holds only part of the weights, is bog-standard model parallelism.
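For instance, here's a minimal sketch of layer-wise model parallelism in PyTorch (toy layer sizes and device names of my own choosing, not anyone's production setup): each GPU holds only its slice of the weights, so two cards give you roughly twice the usable model memory.

    import torch
    import torch.nn as nn

    class TwoGpuNet(nn.Module):
        """Toy model-parallel network: first half on cuda:0, second half on cuda:1."""
        def __init__(self):
            super().__init__()
            # Each device stores only its own portion of the parameters.
            self.front = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
            self.back = nn.Linear(4096, 16).to("cuda:1")

        def forward(self, x):
            x = self.front(x.to("cuda:0"))
            # Only activations cross the device boundary, not weights.
            return self.back(x.to("cuda:1"))

    model = TwoGpuNet()
    out = model(torch.randn(8, 4096))  # output lands on cuda:1

Real training stacks do far more sophisticated versions of this (tensor and pipeline parallelism), which is exactly the "use twice the memory across two devices" behavior being described.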
Cool tech - and fun to read about, no doubt - but this is exactly the sort of exotic tech that will remain the preserve of extreme high-end, high-budget computing installations.