The article said:
As Zeus is aimed at path tracing rendering technique as well as compute workloads, it does not seem to have traditional fixed-function GPU hardware like texture units (TMUs) and raster operation units (ROPs), so it has to rely on compute shaders (or similar methods) for texture sampling and graphics outputs. This saves precious silicon real estate for compute elements. Nonetheless, each Zeus GPU has one DisplayPort 2.1a and one HDMI 2.1b output.
Wow, I'm getting major déjà vu from Xeon Phi, here. Larrabee actually did have TMUs, but not much else. Intel demo'd pure software ray tracing on it, IIRC. And someone eventually found, in a dumpster outside Intel's labs, a later-model Xeon Phi that still had display interfaces on it.
The article said:
two PCIe Gen5 x16 slots with CXL 3.0 on top
BTW, CXL 3.0 uses the same PHY as PCIe 6.0, which makes it a little puzzling that they kept PCIe 5.0 for the "slots". It's not like there are any PCIe 6.0 platforms yet, to my knowledge, but the next gen of server boards probably will be.
Edit: In the doc I found (see post #23), there's no mention of CXL, so I don't know where that idea came from. They specifically discuss using a 400 GbE switch for multi-GPU communication. The diagram of their next-gen architecture shows the PCIe 5.0 interface bifurcating into as many as 8 PCIe 5.0 x4 links (still 32 lanes in total) and supporting 2x 800 Gb Ethernet links per I/O chiplet, though it's not clear whether those capabilities can be used concurrently.
The article said:
The quad-chiplet Zeus implementation is not a card, but rather is a server.
Unlike high-end GPUs that prioritize bandwidth, Bolt is evidently focusing on greater memory size to handle larger datasets for rendering or simulations.
Yeah, let's please just call this what it is, which is a server CPU with a built-in display controller.
The article said:
even the entry-level Zeus 1c26-32, offers significantly higher FP64 compute performance than Nvidia's GeForce RTX 5090 — up to 5 TFLOPS vs. 1.6 TFLOPS
That's a bad comparison, because client GPUs are intentionally designed with minimal fp64 compute: it's not generally useful for gaming or AI inference, so it would be a waste of silicon. If you want to compare Zeus to a client GPU, focus on the fp32 and AI inference horsepower, which shows this processor really isn't a GPU.
Also, their fp64 TFLOPS numbers don't compare well with proper datacenter GPUs. They claim up to 20 TFLOPS of fp64, which is indeed more than double that of a 192-core Zen 5 EPYC Turin, by my math. However, Nvidia's B200 claims 67 TFLOPS and AMD's MI300X claims 81.7 (163.4 matrix).
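For anyone checking the "by my math" bit, here's roughly how I'd reconstruct that Turin estimate. Just a sketch: the FLOP-per-cycle and sustained all-core clock values are my guesses, not AMD's published figures.

```cpp
#include <cstdio>

// Peak fp64 throughput: cores x FLOP/cycle x clock.
static double cpu_tflops(int cores, int flop_per_cycle, double clock_ghz) {
    return cores * flop_per_cycle * clock_ghz / 1e3;
}

int main() {
    // Zen 5 can issue 2x 512-bit FMAs/cycle (32 fp64 FLOP/cycle), but the
    // sustained all-core clock under that load is the big unknown, so two cases:
    std::printf("%.1f TFLOPS\n", cpu_tflops(192, 16, 3.0));   // ~9.2 if only 1 FMA/cycle sustains
    std::printf("%.1f TFLOPS\n", cpu_tflops(192, 32, 2.25));  // ~13.8 at base clock, 2 FMA/cycle
}
```

Either way, 20 TFLOPS is clearly ahead of the CPU and clearly behind the datacenter GPUs.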
So, it's indeed powerful for a server CPU, but pretty weak compared with GPUs. Also, you can't get around the memory bandwidth problem when you start slinging that much compute. Each fp64 is 8 bytes, so if your processor can only manage 1.45 TB/s, that works out to a mere 0.18T fp64 loads or stores per second. It's funny they mention FFTs as an application, because it seems like it'd be bandwidth-starved there. Where this really becomes a problem is for AI, which is why the big training "GPUs" use HBM and have up to like 4x that bandwidth.
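To put numbers on the bandwidth point, here's the back-of-envelope roofline version, using the article's 20 TFLOPS and 1.45 TB/s figures (the balance-point framing is mine):

```cpp
#include <cstdio>

int main() {
    const double peak_fp64 = 20e12;    // claimed peak, FLOP/s
    const double mem_bw    = 1.45e12;  // claimed bandwidth, bytes/s

    const double accesses = mem_bw / 8.0;        // fp64 loads+stores per second
    const double balance  = peak_fp64 / mem_bw;  // FLOP per byte needed to hit peak

    // For scale: a streamed daxpy (y = a*x + y) does 2 FLOP per 24 bytes
    // moved, i.e. ~0.08 FLOP/byte -- nowhere near the balance point.
    std::printf("%.2fT fp64 accesses/s\n", accesses / 1e12);        // 0.18T
    std::printf("%.1f FLOP/byte to stay compute-bound\n", balance); // ~13.8
}
```

Any kernel that streams its working set from memory sits way below ~13.8 FLOP/byte, so most of that 20 TFLOPS would go unused.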
The article said:
... and considerably higher path tracing performance: 77 Gigarays vs. 32 Gigarays.
This part is probably the most intriguing. I'd like to see how real-world performance compares. I wonder if Zeus' figure is a theoretical peak that becomes bandwidth-limited as soon as you hit it with a real dataset. Perhaps that's the reason Nvidia didn't put more RT cores in the RTX 5090.
The article said:
Unlike CUDA for Nvidia and ROCm for AMD, Bolt's Zeus lacks a mature, widely adopted software ecosystem. Since it is based on RISC-V, Zeus can potentially leverage existing open-source tools and libraries, but without strong developer support, adoption will be limited.
I think their approach is basically that it's just like a server CPU, so you can use existing threading libraries and techniques. That's not ideal, from a scalability perspective, but it does open them up to the vast majority of HPC/scientific software out there.
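If that's right, porting would look less like writing CUDA kernels and more like ordinary shared-memory code. A minimal sketch of what I mean, assuming Zeus runs a normal OS with a standard C++ toolchain (which, per my edit below, is not confirmed):

```cpp
#include <algorithm>
#include <cstdio>
#include <execution>
#include <vector>

int main() {
    std::vector<double> x(1 << 24, 1.0), y(1 << 24, 2.0);

    // axpy-style kernel; the standard parallel runtime spreads it across
    // however many hardware threads the chip exposes -- nothing vendor-specific.
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(), y.begin(), y.begin(),
                   [](double a, double b) { return 2.0 * a + b; });

    std::printf("%f\n", y[0]);  // 4.000000
}
```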
Edit: I'm now less sure about this, since they don't mention what OS or runtime the device is using. I had assumed it's just a massively-parallel RISC-V CPU, but it seems not. Check the doc for details.
The article said:
Bolt Graphics says that the first developer kits will be available in late 2025, with full production set for late 2026, which will give time for software developers to play with the hardware.
Ah, and here's the rub. What we usually see with upstart HPC makers is that, by the time they get something to market, the mainstream is already passing them by. Full production would land alongside Zen 6 and Diamond Rapids, which will have even more cores and even more memory & I/O bandwidth.
Well, good luck to them. It's cool to hear people still doing software ray tracing, in this day and age. I'd love to see that benchmarked.