News Startup claims its Zeus GPU is 10X faster than Nvidia's RTX 5090: Bolt's first GPU coming in 2026

The article said:
As Zeus is aimed at path tracing rendering technique as well as compute workloads, it does not seem to have traditional fixed-function GPU hardware like texture units (TMUs) and raster operation units (ROPs), so it has to rely on compute shaders (or similar methods) for texture sampling and graphics outputs. This saves precious silicon real estate for compute elements. Nonetheless, each Zeus GPU has one DisplayPort 2.1a and one HDMI 2.1b output.
Wow, I'm getting major déjà vu from Xeon Phi, here. Larrabee actually did have TMUs, but not much else. Intel demo'd pure software ray tracing on it, IIRC. And someone eventually found, in a dumpster outside of Intel's labs, a later-model Xeon Phi that still had display interfaces on it.

The article said:
two PCIe Gen5 x16 slots with CXL 3.0 on top
BTW, CXL 3.0 uses the same phy as PCIe 6.0, which makes it a little puzzling that they kept PCIe 5.0 for the "slots". It's not like there are any PCIe 6.0 platforms yet, to my knowledge, but the next gen of server boards probably will be.

Edit: In the doc I found (see post #23), there's no mention of CXL. So, I don't know where that idea came from. They specifically discuss using a 400 GbE switch for multi-GPU communication. The diagram of their next gen architecture shows the PCIe 5.0 being able to bifurcate into as many as 8 PCIe 5.0 x4 links (but that's still 32 lanes in total) and supporting 2x 800 Gb Ethernet links per I/O chiplet, though it's not clear if those capabilities can be used concurrently.

The article said:
The quad-chiplet Zeus implementation is not a card, but rather is a server.
Unlike high-end GPUs that prioritize bandwidth, Bolt is evidently focusing on greater memory size to handle larger datasets for rendering or simulations.
Yeah, let's please just call this what it is, which is a server CPU with a built-in display controller.

The article said:
even the entry-level Zeus 1c26-32, offers significantly higher FP64 compute performance than Nvidia's GeForce RTX 5090 — up to 5 TFLOPS vs. 1.6 TFLOPS
That's a bad comparison, because client GPUs are intentionally designed to have minimal fp64 compute, since it's not generally useful for gaming or AI inference and would therefore be a waste of silicon. If you want to compare it to a client GPU, focus on the fp32 and AI inference horsepower, which shows this Zeus processor really isn't a GPU at all.

Also, their fp64 TFLOPS numbers don't compare well with proper datacenter GPUs. They claim up to 20 TFLOPS of fp64, which is indeed more than double that of a 192-core Zen 5 EPYC Turin, by my math. However, Nvidia's B200 claims 67 TFLOPS and AMD's MI300X claims 81.7 (163.4 matrix).
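
To show where that "more than double" estimate comes from, here's a rough peak-FP64 sketch in Python. The all-core clock and the per-core FMA throughput are my own assumptions (shown for both one and two 512-bit FMA pipes per core), not official figures:

cores, clock_ghz, fp64_per_vec = 192, 2.25, 8   # assumed ~2.25 GHz all-core; 512-bit vectors
for fma_pipes in (1, 2):                        # assumption: one or two 512-bit FMA pipes per core
    tflops = cores * clock_ghz * fma_pipes * fp64_per_vec * 2 / 1e3   # FMA counts as 2 FLOPs
    print(f"{fma_pipes} FMA pipe(s)/core: ~{tflops:.1f} TFLOPS fp64")  # ~6.9 / ~13.8

Either way, Bolt's claimed 20 TFLOPS lands above a 192-core Turin, but well below the B200 and MI300X figures.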

So, it's indeed powerful for a server CPU, but pretty weak compared with GPUs. Also, you cannot get around the memory bandwidth problem when you start slinging that much compute. Each fp64 is 8 bytes, so if your processor can only manage 1.45 TB/s, that works out to a mere 0.18T loads or stores of fp64 values per second. It's funny they mention FFTs as an application, because it does seem like it'd be bandwidth-starved, there. Where this really becomes a problem is for AI, which is why the big training "GPUs" use HBM and have up to something like 4x that bandwidth.
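
To make that arithmetic explicit, here's the back-of-envelope as a quick Python sketch, using only the figures quoted above; the last line is just the same point restated as arithmetic intensity:

mem_bandwidth = 1.45e12   # claimed memory bandwidth, bytes/s
fp64_size = 8             # bytes per double

loads_per_second = mem_bandwidth / fp64_size
print(f"fp64 loads/stores per second: {loads_per_second / 1e12:.2f} T")  # ~0.18 T

# FLOPs you'd have to do per byte moved to sustain the claimed 20 TFLOPS fp64 peak
peak_fp64 = 20e12
print(f"FLOPs needed per byte moved: {peak_fp64 / mem_bandwidth:.1f}")   # ~13.8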

The article said:
... and considerably higher path tracing performance: 77 Gigarays vs. 32 Gigarays.
This part is probably the most intriguing. I'd like to see how real-world performance compares. I wonder if the Zeus processor is claiming theoretical peak, but once you hit it with a real dataset, it quickly becomes bandwidth-limited. Perhaps that's the reason Nvidia didn't put more RT cores in the RTX 5090.

The article said:
Unlike CUDA for Nvidia and ROCm for AMD, Bolt's Zeus lacks a mature, widely adopted software ecosystem. Since it is based on RISC-V, Zeus can potentially leverage existing open-source tools and libraries, but without strong developer support, adoption will be limited.
I think their approach is basically that it's just like a server CPU, so you can use existing threading libraries and techniques. That's not ideal, from a scalability perspective, but it does open them up to the vast majority of HPC/scientific software out there.

Edit: I'm now less sure about this, since they don't mention what OS or runtime the device is using. I had assumed it's just a massively-parallel RISC-V CPU, but it seems not. Check the doc, for details.

The article said:
Bolt Graphics says that the first developer kits will be available in late 2025, with full production set for late 2026, which will give time for software developers to play with the hardware.
Ah, and here's the rub. What we usually see with upstart HPC makers is that, by the time they can get something to market, mainstream is already passing them by. This should be concurrent with Zen 6 and Diamond Rapids, which will have even more cores and even more memory & IO bandwidth.

Well, good luck to them. It's cool to hear people still doing software ray tracing, in this day and age. I'd love to see that benchmarked.
 
Last edited:
This could literally destroy nVidia GPU dominance, a true disrupter finally.
It shouldn't be compared to a client GPU, like the RTX 5090. That makes about as much sense as comparing EPYC or Xeon to one.

The main difference between this and regular server CPUs is just that this thing has a built-in display engine. To me, that display engine seems like it's just there for the sake of integration, like putting a BMC right in your I/O die.

Edit: okay, it's more than just a server CPU with an iGPU. Check post #23 for more detailed analysis.
 
Last edited:
What an interesting card design.

Is that two PCIe connectors on it?
Is that an SFP and RJ-45 connector on the back?
The GPU has 4 memory chips nearby, but then an extra two SODIMMs further away.
 
What an interesting card design.

Is that two PCIe connectors on it?
Is that an SFP and RJ-45 connector on the back?
The GPU has 4 memory chips nearby, but then an extra two SODIMMs further away.
It certainly looks like two PCIe connectors and an RJ-45.

I wish these claims were even a tenth true. More competition would be good.
 
This could literally destroy nVidia GPU dominance, a true disrupter finally.
Yeah, because people don't complain about GPU prices already... this is going to be server-tier expensive.
It shouldn't be compared to a client GPU, like the RTX 5090. That makes about as much sense as comparing EPYC or Xeon to one.

The main difference between this and regular server CPUs is just that this thing has a built-in display engine. To me, that display engine seems like it's just there for the sake of integration, like putting a BMC right in your I/O die.
Pure ray tracing games will be a thing in the future, but it will take a good long while.
 
  • Like
Reactions: derekullo
I hope everything in the article is true and they totally eat Nvidia's compute lunch.

That means Jensen will have to actually focus on selling GPUs to people playing games.
 
I hope everything in the article is true and they totally eat Nvidia's compute lunch.

That means Jensen will have to actually focus on selling GPUs to people playing games.
The mismatch between their RT claims and their memory bandwidth is really bothering me. I suspect they just ran a benchmark where all the geometry fits in the cores' L2 caches. (Edit: the doc I found in post #23 says what scene they used and gives some details about how they tested.)

I was poking around, trying to find the memory density of geometry in ray tracing data structures, when I ran across a blog which cites a paper about a novel technique to compress the BVH + triangle data down to 5-8 bytes per triangle.


So, let's go with that rather optimistic claim. To achieve the 77 GRays/s claimed by their low-end config, you'd need 385 to 616 GB/s of memory bandwidth for a complex scene, yet the device claims up to 363 GB/s. Still plausible. However, the cache architecture of modern CPUs is such that you can't fetch just 8 bytes at a time. Cachelines are typically 64 bytes, due to the need to amortize cache overhead and the burst-oriented behavior of modern memory. Random access means most cache fetches will have poor efficiency, resulting in an 8x penalty in real bandwidth vs. the data actually needed. In that case, their 363 GB/s would look more like 45 GB/s, supporting anywhere from 5.6 to 9.0 GRays/s.
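
If anyone wants to check my arithmetic, here's the same calculation as a quick Python sketch. It just re-does the numbers above and assumes, like the estimate does, roughly one compressed triangle's worth of data touched per ray:

claimed_grays = 77e9       # claimed rays/s for the low-end config
mem_bw = 363e9             # claimed memory bandwidth, bytes/s
bytes_per_tri = (5, 8)     # compressed BVH + triangle, per the cited paper
cacheline, useful = 64, 8  # bytes fetched per random access vs. bytes actually needed

for b in bytes_per_tri:
    print(f"{b} B/tri: need {claimed_grays * b / 1e9:.0f} GB/s to hit 77 GRays/s")   # 385 / 616

effective_bw = mem_bw * useful / cacheline   # ~45 GB/s once the cacheline penalty bites
for b in bytes_per_tri:
    print(f"{b} B/tri: pessimistic ceiling of {effective_bw / b / 1e9:.1f} GRays/s")  # ~5.7 to ~9.1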

So, that's a pessimistic take. I'd expect the number to be a bit higher, because there is a high degree of spatial coherence in primary ray bounces.

BTW, I did poke around, trying to find a CPU + GPU raytracing benchmark that would let us compare scores on modern CPUs and GPUs, but the only one I found was V-Ray and it has different units for each class of device, seemingly making the results non-comparable. If anyone can find comparable raytracing data on CPUs vs. GPUs, I'd love to know how they compare these days!

Comparing CPU vs. CPU, my expectation is that this thing is probably in the same ballpark as a 128-core EPYC or Xeon. I just don't have any intuition of how those compare with GPUs.
 
Last edited:
the first paragraph feels like a contradiction, I can't say why... not because I don't want to, but because the forum system thinks my post is spam or inappropriate when I type in a message with no swearing or political opinions that only involves asking a question out of confusion
 
the first paragraph feels like a contradiction, I can't say why... not because I don't want to, but because the forum system thinks my post is spam or inappropriate when I type in a message with no swearing or political opinions that only involves asking a question out of confusion
Try it again and ping me. Or DM it to me.
 
In my PowerPoints, the GPUs I'm developing will also beat Nvidia products around 2028.

My planned GPU will not be exactly 10x better though.

12.4x better, because with the new level of compute available, marketers will no longer have to use such simple multipliers.

The only tricky part is financing, manufacturing and getting the right leather jacket look together.
 
Last edited:

Strong team with qualifications.

Remember the product they put out last year?

 
Feels like it would be better to describe products like this as an SPU (Scientific Processing Unit), as opposed to a GPU. Or something in that vein.
Yeah, it feels wrong to call something without actual graphics hardware a "GPU". Even GPGPU isn't quite right, because the second GP doesn't really belong there. When I can, I like to simply call them "processors", without qualifying them further. Or, sometimes I say HPC/AI GPUs, when I can't really avoid calling it a GPU.

A few years ago, Intel presented an interesting taxonomy that showed CPUs as being optimized for scalar performance, GPUs as optimized for vector processing, and I think NPUs as optimized for matrices & tensors, or something like that. However, as GPUs gain better matrix and tensor support, and CPUs continue to improve their vector support (not to mention AMX), this distinction is getting increasingly blurred.

A few other ways they differ:
  • GPUs are low-IPC and massively-parallel, supporting thousands of concurrent "threads".
  • HPC GPUs add HBM-class memory and vector fp64 support. Lots of tensor compute, for AI-optimized ones.
  • CPUs are high-IPC with low-to-medium parallelism.
  • NPUs tend to be very much like arrays of DSPs, each with lots of matrix and tensor hardware add-ons. NPUs are GPU-like, but designed around much more predictable latencies and data movement, with a heavy focus on low-precision arithmetic.

I'm still left feeling that, while the categories are fairly distinct and not hard to define, summarizing those key characteristics in a concise name is elusive.
 
Last edited:
Cool, now compare it to an H100 or H200 and show the entire system, not just a board. This is like comparing the speed of accessing data on a laptop vs. a server and claiming you have made a leap.

Now maybe they have, who knows, but you are not going up against a 5090 with this card. You are aiming somewhere between it and an H100 or H200, so where it actually falls really matters.

Doing things like this might get hype and investment, but I feel like it really damages the company's credibility with people in the industry.
 

Strong team with qualifications.

Remember the product they put out last year?

I note a touch of irony.

Seriously, though. Your link led me to the real story, here.

In there, it claims Zeus does have hardware texture mapping!
  • Supports OpenImageIO standard
  • Cached image buffers
  • Tiled and MipMapped textures
  • Procedural textures from OpenShadingLanguage
  • Direct support for USDImaging
  • OpenColorIO for color management
  • PTex support

The hardware architecture block diagram shows a block labeled "Accelerators", which is presumably accessed from an RVA23 core via RISC-V extensions.

Nothing is mentioned about whether the interconnect between cores is cache-coherent, which I had assumed, since it's essential for supporting industry-standard CPU threading models. They also don't mention or show CXL, so I don't know where that notion came from.

Also, they did state that they're doing full path tracing, and they specified that the scene they used is:

"Sponza (source) with curtains & ivy addons @ 1080p 100spp"
SPP -> Samples Per Pixel, which is rather a lot, for interactive rendering. With AI denoising, I think Nvidia does global illumination using values well under 10.

Do note they specify the hardware as "Xilinx U50 FPGA", which is a $3k FPGA accelerator board. I'm not sure how much of their chip they could synthesize on it, but I'm sure it's a subset and was running at a much lower clock speed than their final product is intended to use. If they extrapolated their results linearly, they could be underestimating the impact of memory latency and perhaps even bandwidth.

And yes, that's a very large scene. Definitely won't fit in cache!
: )
 
Last edited:
  • Like
Reactions: Loadedaxe
Ah... yes, what a bold announcement.

But as usual, you should not believe what the maker says; always wait for reviews.

Especially after what's been happening with the last paper launch from that maker who forgot to mention that power cables can catch on fire when used with their new product, and that maybe they forgot some internal parts... nothing serious, it may only lead to 5% to 10% less performance than what you paid for. That's not so bad, is it?
It's not like these new GPUs are very expensive anyway.
What's 2,000 to 4,000 American dollars for a complete GPU? What could you possibly get instead?
Useless things like a used car, or maybe 2-4 very good brand-new notebooks, or a brand-new bed + a kitchen + washing machine + OLED TV + one good notebook... or a 7-day trip to some very nice place for two people, or something like 4 to 7 brand-new PS5s, or 3 complete and very decent desktops...
Crazy times we are living in.
 
Last edited: