News Startup claims its Zeus GPU is 10X faster than Nvidia's RTX 5090: Bolt's first GPU coming in 2026

The article said:
As Zeus is aimed at path tracing rendering technique as well as compute workloads, it does not seem to have traditional fixed-function GPU hardware like texture units (TMUs) and raster operation units (ROPs), so it has to rely on compute shaders (or similar methods) for texture sampling and graphics outputs. This saves precious silicon real estate for compute elements. Nonetheless, each Zeus GPU has one DisplayPort 2.1a and one HDMI 2.1b output.
Wow, I'm getting major déjà vu from Xeon Phi, here. Larrabee actually did have TMUs, but not much else. Intel demo'd pure software ray tracing on it, IIRC. And someone eventually found, in a dumpster outside of Intel's labs, a later-model Xeon Phi that still had display interfaces on it.

The article said:
two PCIe Gen5 x16 slots with CXL 3.0 on top
BTW, CXL 3.0 uses the same phy as PCIe 6.0, which makes it a little puzzling that they kept PCIe 5.0 for the "slots". It's not like there are any PCIe 6.0 platforms yet, to my knowledge, but the next gen of server boards probably will be.

Edit: In the doc I found (see post #23), there's no mention of CXL. So, I don't know where that idea came from. They specifically discuss using a 400 GbE switch for multi-GPU communication. The diagram of their next gen architecture shows the PCIe 5.0 being able to bifurcate into as many as 8 PCIe 5.0 x4 links (but that's still 32 lanes in total) and supporting 2x 800 Gb Ethernet links per I/O chiplet, though it's not clear if those capabilities can be used concurrently.

The article said:
The quad-chiplet Zeus implementation is not a card, but rather is a server.
Unlike high-end GPUs that prioritize bandwidth, Bolt is evidently focusing on greater memory size to handle larger datasets for rendering or simulations.
Yeah, let's please just call this what it is, which is a server CPU with a built-in display controller.

The article said:
even the entry-level Zeus 1c26-32, offers significantly higher FP64 compute performance than Nvidia's GeForce RTX 5090 — up to 5 TFLOPS vs. 1.6 TFLOPS
That's a bad comparison, because client GPUs are intentionally designed to have minimal fp64 compute, since it's not generally useful for gaming or AI inference and would therefore be a waste of silicon. If you want to compare it to a client GPU, focus on the fp32 and AI inference horsepower, which shows this Zeus processor really isn't a GPU at all.

Also, their fp64 TFLOPS numbers don't compare well with proper datacenter GPUs. They claim up to 20 TFLOPS of fp64, which is indeed more than double that of a 192-core Zen 5 EPYC Turin, by my math. However, Nvidia's B200 claims 67 TFLOPS and AMD's MI300X claims 81.7 (163.4 matrix).
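
To show where that "more than double" estimate comes from, here's a rough peak-FP64 sketch in Python. The all-core clock and the per-core FMA throughput are my own assumptions (shown for both one and two 512-bit FMA pipes per core), not official figures:

cores, clock_ghz, fp64_per_vec = 192, 2.25, 8   # assumed ~2.25 GHz all-core; 512-bit vectors
for fma_pipes in (1, 2):                        # assumption: one or two 512-bit FMA pipes per core
    tflops = cores * clock_ghz * fma_pipes * fp64_per_vec * 2 / 1e3   # FMA counts as 2 FLOPs
    print(f"{fma_pipes} FMA pipe(s)/core: ~{tflops:.1f} TFLOPS fp64")  # ~6.9 / ~13.8

Either way, Bolt's claimed 20 TFLOPS lands above a 192-core Turin, but well below the B200 and MI300X figures.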

So, it's indeed powerful for a server CPU, but pretty weak compared with GPUs. Also, you cannot get around the memory bandwidth problem when you start slinging that much compute. Each fp64 is 8 bytes, so if your processor can only manage 1.45 TB/s, that works out to a mere 0.18T loads or stores of fp64 values per second. It's funny they mention FFTs as an application, because it does seem like it'd be bandwidth-starved, there. Where this really becomes a problem is for AI, which is why the big training "GPUs" use HBM and have up to something like 4x that bandwidth.
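
To make that arithmetic explicit, here's the back-of-envelope as a quick Python sketch, using only the figures quoted above; the last line is just the same point restated as arithmetic intensity:

mem_bandwidth = 1.45e12   # claimed memory bandwidth, bytes/s
fp64_size = 8             # bytes per double

loads_per_second = mem_bandwidth / fp64_size
print(f"fp64 loads/stores per second: {loads_per_second / 1e12:.2f} T")  # ~0.18 T

# FLOPs you'd have to do per byte moved to sustain the claimed 20 TFLOPS fp64 peak
peak_fp64 = 20e12
print(f"FLOPs needed per byte moved: {peak_fp64 / mem_bandwidth:.1f}")   # ~13.8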

The article said:
... and considerably higher path tracing performance: 77 Gigarays vs. 32 Gigarays.
This part is probably the most intriguing. I'd like to see how real-world performance compares. I wonder if the Zeus processor is claiming theoretical peak, but once you hit it with a real dataset, it quickly becomes bandwidth-limited. Perhaps that's the reason Nvidia didn't put more RT cores in the RTX 5090.

The article said:
Unlike CUDA for Nvidia and ROCm for AMD, Bolt's Zeus lacks a mature, widely adopted software ecosystem. Since it is based on RISC-V, Zeus can potentially leverage existing open-source tools and libraries, but without strong developer support, adoption will be limited.
I think their approach is basically that it's just like a server CPU, so you can use existing threading libraries and techniques. That's not ideal, from a scalability perspective, but it does open them up to the vast majority of HPC/scientific software out there.

Edit: I'm now less sure about this, since they don't mention what OS or runtime the device is using. I had assumed it's just a massively-parallel RISC-V CPU, but it seems not. Check the doc, for details.

The article said:
Bolt Graphics says that the first developer kits will be available in late 2025, with full production set for late 2026, which will give time for software developers to play with the hardware.
Ah, and here's the rub. What we usually see with upstart HPC makers is that, by the time they can get something to market, mainstream is already passing them by. This should be concurrent with Zen 6 and Diamond Rapids, which will have even more cores and even more memory & IO bandwidth.

Well, good luck to them. It's cool to hear people still doing software ray tracing, in this day and age. I'd love to see that benchmarked.
 
Last edited:
This could literally destroy nVidia GPU dominance, a true disrupter finally.
It shouldn't be compared to a client GPU, like the RTX 5090. That makes about as much sense as comparing EPYC or Xeon to one.

The main difference between this and regular server CPUs is just that this thing has a built-in display engine. To me, that display engine seems like it's just there for the sake of integration, like putting a BMC right in your I/O die.

Edit: okay, it's more than just a server CPU with an iGPU. Check post #23 for more detailed analysis.
 
Last edited:
What an interesting card design.

Is that two PCIe connectors on it?
Is that an SFP and RJ-45 connector on the back?
The GPU has 4 memory chips nearby, but then an extra two SODIMMs further away.
 
What an interesting card design.

Is that two PCIe connectors on it?
Is that an SFP and RJ-45 connector on the back?
The GPU has 4 memory chips nearby, but then an extra two SODIMMs further away.
It certainly looks like two PCIe connectors and an RJ-45.

I wish these claims were even a tenth true. More competition would be good.
 
This could literally destroy nVidia GPU dominance, a true disrupter finally.
Yeah, because people don't complain about GPU prices already... this is going to be server-tier expensive.
It shouldn't be compared to a client GPU, like the RTX 5090. That makes about as much sense as comparing EPYC or Xeon to one.

The main difference between this and regular server CPUs is just that this thing has a built-in display engine. To me, that display engine seems like it's just there for the sake of integration, like putting a BMC right in your I/O die.
Pure ray tracing games will be a thing in the future, but it will take a good long while.
 
  • Like
Reactions: derekullo
I hope everything in the article is true and they totally eat Nvidia's compute lunch.

That means Jensen will have to actually focus on selling GPUs to people playing games.
 
I hope everything in the article is true and they totally eat Nvidia's compute lunch.

That means Jensen will have to actually focus on selling GPUs to people playing games.
The mismatch between their RT claims and their memory bandwidth is really bothering me. I suspect they just ran a benchmark where all the geometry fits in the cores' L2 caches. (Edit: the doc I found in post #23 says what scene they used and gives some details about how they tested.)

I was poking around, trying to find the memory density of geometry in ray tracing data structures, when I ran across a blog which cites a paper about a novel technique to compress the BVH + triangle data down to 5-8 bytes per triangle.


So, let's go with that rather optimistic claim. To achieve the 77 GRays/s claimed by their low-end config, you'd need 385 to 616 GB/s of memory bandwidth for a complex scene, yet the device claims up to 363 GB/s. Still plausible. However, the cache architecture of modern CPUs is such that you can't fetch just 8 bytes at a time. Cachelines are typically 64 bytes, due to the need to amortize cache overhead and the burst-oriented behavior of modern memory. Random access means most cache fetches will have poor efficiency, resulting in an 8x penalty in real bandwidth vs. the data actually needed. In that case, their 363 GB/s would look more like 45 GB/s, supporting anywhere from 5.6 to 9.0 GRays/s.
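
If anyone wants to check my arithmetic, here's the same calculation as a quick Python sketch. It just re-does the numbers above and assumes, like the estimate does, roughly one compressed triangle's worth of data touched per ray:

claimed_grays = 77e9       # claimed rays/s for the low-end config
mem_bw = 363e9             # claimed memory bandwidth, bytes/s
bytes_per_tri = (5, 8)     # compressed BVH + triangle, per the cited paper
cacheline, useful = 64, 8  # bytes fetched per random access vs. bytes actually needed

for b in bytes_per_tri:
    print(f"{b} B/tri: need {claimed_grays * b / 1e9:.0f} GB/s to hit 77 GRays/s")   # 385 / 616

effective_bw = mem_bw * useful / cacheline   # ~45 GB/s once the cacheline penalty bites
for b in bytes_per_tri:
    print(f"{b} B/tri: pessimistic ceiling of {effective_bw / b / 1e9:.1f} GRays/s")  # ~5.7 to ~9.1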

So, that's a pessimistic take. I'd expect the number to be a bit higher, because there is a high degree of spatial coherence in primary ray bounces.

BTW, I did poke around, trying to find a CPU + GPU raytracing benchmark that would let us compare scores on modern CPUs and GPUs, but the only one I found was V-Ray and it has different units for each class of device, seemingly making the results non-comparable. If anyone can find comparable raytracing data on CPUs vs. GPUs, I'd love to know how they compare these days!

Comparing CPU vs. CPU, my expectation is that this thing is probably in the same ballpark as a 128-core EPYC or Xeon. I just don't have any intuition of how those compare with GPUs.
 
Last edited:
the first paragraph feels like a contradiction, I can't say why... not because I don't want to, but because the forum system thinks my post is spam or inappropriate when I type in a message with no swearing or political opinions that only involves asking a question out of confusion
 
the first paragraph feels like a contradiction, I can't say why... not because I don't want to, but because the forum system thinks my post is spam or inappropriate when I type in a message with no swearing or political opinions that only involves asking a question out of confusion
Try it again and ping me. Or DM it to me.
 
In my PowerPoints, the GPUs I'm developing will also beat Nvidia products around 2028.

My planned GPU will not be exactly 10x better though.

12.4x better, because with the new level of compute available, marketers will no longer have to use such simple multipliers.

The only tricky part is financing, manufacturing and getting the right leather jacket look together.
 
Last edited:

Strong team with qualifications.

Remember the product they put out last year?

 
Feels like it would be better to describe products like this as an SPU (Scientific Processing Unit), as opposed to a GPU. Or something in that vein.
Yeah, it feels wrong to call something without actual graphics hardware a "GPU". Even GPGPU isn't quite right, because the second GP doesn't really belong there. When I can, I like to simply call them "processors", without qualifying them further. Or, sometimes I say HPC/AI GPUs, when I can't really avoid calling it a GPU.

A few years ago, Intel presented an interesting taxonomy that showed CPUs as being optimized for scalar performance, GPUs as optimized for vector processing, and I think NPUs as optimized for matrices & tensors, or something like that. However, as GPUs gain better matrix and tensor support, and CPUs continue to improve their vector support (not to mention AMX), this distinction is getting increasingly blurred.

A few other ways they differ:
  • GPUs are low-IPC and massively-parallel, supporting thousands of concurrent "threads".
  • HPC GPUs add HBM-class memory and vector fp64 support. Lots of tensor compute, for AI-optimized ones.
  • CPUs are high-IPC with low-to-medium parallelism.
  • NPUs tend to be very much like arrays of DSPs, each with lots of matrix and tensor hardware add-ons. NPUs are GPU-like, but designed around much more predictable latencies and data movement, with a heavy focus on low-precision arithmetic.

I'm still left feeling that, while the categories are fairly distinct and not hard to define, summarizing those key characteristics in a concise name is elusive.
 
Last edited:
Cool, now compare it to an H100 or H200 and show the entire system, not just a board. This is like comparing the speed of accessing data on a laptop vs. a server and claiming you have made a leap.

Now maybe they have, who knows, but you are not going up against a 5090 with this card. You are aiming somewhere between it and an H100 or H200, so where it actually falls really matters.

Doing things like this might get hype and investment, but I feel like it really damages the company's credibility with people in the industry.
 

Strong team with qualifications.

Remember the product they put out last year?

I note a touch of irony.

Seriously, though. Your link led me to the real story, here.

In there, it claims Zeus does have hardware texture mapping!
  • Supports OpenImageIO standard
  • Cached image buffers
  • Tiled and MipMapped textures
  • Procedural textures from OpenShadingLanguage
  • Direct support for USDImaging
  • OpenColorIO for color management
  • PTex support

The hardware architecture block diagram shows a block labeled "Accelerators", which is presumably accessed from an RVA23 core via RISC-V extensions.

Nothing is mentioned about whether the interconnect between cores is cache-coherent, which I had assumed, since it's essential for supporting industry-standard CPU threading models. They also don't mention or show CXL, so I don't know where that notion came from.

Also, they did state that they're doing full path tracing, and they specified that the scene they used is:

"Sponza (source) with curtains & ivy addons @ 1080p 100spp"
SPP -> Samples Per Pixel, which is rather a lot, for interactive rendering. With AI denoising, I think Nvidia does global illumination using values well under 10.

Do note they specify the hardware as "Xilinx U50 FPGA", which is a $3k FPGA accelerator board. I'm not sure how much of their chip they could synthesize on it, but I'm sure it's a subset and was running at a much lower clock speed than their final product is intended to use. If they extrapolated their results linearly, they could be underestimating the impact of memory latency and perhaps even bandwidth.

And yes, that's a very large scene. Definitely won't fit in cache!
: )
 
Last edited:
  • Like
Reactions: Loadedaxe
Ah... yes, what a bold announcement.

But as usual, you should not believe what the maker says; always wait for reviews.

Especially after what's been happening with the last paper launch from that maker who forgot to mention that power cables can catch on fire when used with their new product, and that maybe they forgot some internal parts... nothing serious, it may only lead to 5% to 10% less performance than what you paid for. That's not so bad, is it?
It's not like these new GPUs are very expensive anyway.
What's 2,000 to 4,000 American dollars for a complete GPU? What could you possibly get instead?
Useless things like a used car, or maybe 2-4 very good brand-new notebooks, or a brand-new bed + a kitchen + washing machine + OLED TV + one good notebook... or a 7-day trip to some very nice place for two people, or something like 4 to 7 brand-new PS5s, or 3 complete and very decent desktops...
Crazy times we are living in.
 
Last edited: