News Startup claims its Zeus GPU is 10X faster than Nvidia's RTX 5090: Bolt's first GPU coming in 2026

I registered for an account just so I could let you know what a sleazy, cheap, clickbait garbage headline this is.
Yeah, pretty much. The Path Tracing performance claim is quite far out there, by my read of the docs.

Not only that, but the 10x number is for their top-spec 500W 4-cluster version, while they only showed a board containing the single-cluster version.

It's also pretty funny to see them claim that realtime path tracing requires 280 RTX 5090 GPUs. Yes, 280 GPUs teamed together to produce 4K @ 120 Hz at 100 spp, as if they didn't know that Nvidia is already doing global illumination with only a couple of rays per pixel. Coupled with DLSS 3+, that can let you hit > 60 fps at 4K on a single RTX 5090.

Meanwhile, they're saying you need 28 of their 500W models to hit that same aggregate performance with their approach. If you've only got the budget for a single one of their 500W cards, then you have to sacrifice framerate, image quality (spp), resolution, or some combination. A bullet point in their slides seems to imply that you can get by with the 2c version, using 8 spp, 5 bounces, and denoising.
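A quick sample-budget sketch, for a sense of why that comparison is apples-to-oranges. The brute-force scenario is the one from their slides; the DLSS-style internal resolution, samples per pixel, and framerate are purely illustrative assumptions on my part.

Code:
# Camera-sample budgets: brute-force path tracing vs. a DLSS-style approach.
# Only primary samples (pixels * fps * spp) are counted; bounce and shadow rays
# are ignored, so treat this strictly as an order-of-magnitude comparison.
def samples_per_second(width, height, fps, spp):
    return width * height * fps * spp

# The target Bolt uses to justify "280x RTX 5090": native 4K @ 120 Hz, 100 spp.
brute_force = samples_per_second(3840, 2160, 120, 100)

# Illustrative DLSS-style budget: render ~1080p internally at 60 fps with ~2 rays
# per pixel, then let the AI denoiser and upscaler reconstruct the 4K frame.
dlss_style = samples_per_second(1920, 1080, 60, 2)

print(f"{brute_force / 1e9:.0f} Gsamples/s vs. {dlss_style / 1e6:.0f} Msamples/s "
      f"(~{brute_force / dlss_style:.0f}x fewer samples traced)")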
 
Bottom line, this is not a GPU, this is just a specialized accelerator board for very specific and limited calculations.
Actually, if it implements the RISC-V Vector extension, as the article says, then it should be suited for pretty much any massively parallel computing task. It can do pretty much anything that AVX-512 or ARM SVE can.

According to info from others above, it has physical hardware units also specifically for graphics, accessible through RISC-V instruction extensions.

The only thing it seems to be missing is output to a display, but that can be added.
In a large data center or supercomputer, the GPUs typically don't have them anyway. I'm guessing they're looking for large orders of that kind to gain momentum and capital before they start doing retail graphics cards, which comes with its own set of problems.

RISC-V's Vector extension was designed by experts over many years to be a good foundation for exactly this kind of hardware. This is not the first time I've heard of it being used for GPUs, and it won't be the last.

It is also a strength of RISC-V that the standard allows for proprietary extensions.
Proprietary extensions tested and evaluated in the field can lead to valuable insights that get used to develop the official standard further.
 
If it can just straight handle video encode/decode then it's gonna be light years ahead of the competition, maybe. I'm so sick to death of all this "gaming" noise all the damn time; I couldn't care less about another useless gaming benchmark! Show me some real work: how many FPS can this thing do running Topaz Video AI (or the like)? Most have no idea how much brute-force power it takes to do this kind of real work. If this card can do upscaling in Video AI at a measly 10 FPS, then it would be about 500 times faster than a 4090, which would make this card incredibly valuable to me!
 

... "show me some real work", and then you talk about AI Video, upscaling and brute force ... sorry it made me smile a little bit.

But I do get your point.

Time will tell if they get anywhere near to a final product, and available for purchase.
 
According to info from others above, it has physical hardware units also specifically for graphics, accessible through RISC-V instruction extensions.
Yeah, I think the ray tracing must be hardware-accelerated. I had initially assumed it just implemented software ray tracing on thousands of simple, in-order RISC-V cores, but the doc on their website gives me the impression it has relatively few cores and relies more on special-purpose acceleration for that.

As for how many cores per cluster, there are a couple of stats we might use to work this out. We can probably get in the ballpark if we work back from the figure of 5 fp64 TFLOPS for the single-cluster version (shown in the picture). If we assume a modest clock of 2.5 GHz, that tells us we need to account for 2k floating point ops per cycle. Figure that they're talking about FMA, which gives us 2 ops per lane. So, 1k lanes * 64 bits = 64k bits worth of SIMD pipelines, which you can divide up among cores however you like.

SIMD width per RISC-V core (bits)   Number of RISC-V cores (approx)   Rays per cycle per RISC-V core
512                                 128                               0.24
1024                                64                                0.48
2048                                32                                0.96
4096                                16                                1.93

Note that I'm considering cumulative SIMD per core, which could be divided up amongst multiple pipelines. I don't even consider less than 512-bit, because Xeon Phi implemented two pipelines of AVX-512 per core almost a decade ago. Also, 512-bit is only SIMD-16 (fp32), which only Intel GPUs support; AMD and Nvidia haven't gone below SIMD-32. For a GPU or GPU-like architecture, wider SIMD makes more sense, because you have enough data parallelism and you want to keep down the overheads of things like instruction decoding.

If we consider AMD, RDNA uses Wave-32, but (last I checked) packs two of those engines per CU, giving the equivalent of the same 2048-bit SIMD per CU that they first introduced with GCN and have retained for CDNA. That said, CDNA is 64-bit native, so I guess when you combine that with their Wave-64 ISA, it should mean that CDNA SIMD throughput per cycle is actually 4096 bits.

Finally, I think Nvidia is still using 4 warp pipelines per SM, giving them the highest width at 4096 bits of SIMD throughput per cycle per SM.

There's nothing terribly exotic about a 16-core - or even 64-core - CPU, these days. These numbers are very believable.

Edit: I've gone back to add in the number of rays/cycle/core, based on these core counts and the figure of 77 GRays/s on the base model. Also, if that figure of 77 GRays/s is theoretical and not measured, then it suggests maybe the actual clock speed is about 2.6 GHz.
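Here's the same back-of-envelope arithmetic in code form, for anyone who wants to poke at the assumptions (the 2.5 GHz clock and the FMA accounting are my guesses; the 5 fp64 TFLOPS and 77 GRays/s figures come from their material):

Code:
import math

# Back-of-envelope estimate of core count and per-core ray rate.
# Assumptions: 2.5 GHz clock (a guess), FMA counted as 2 ops, fp64 lanes are 64 bits wide.
CLOCK_HZ     = 2.5e9
FP64_FLOPS   = 5e12     # 5 fp64 TFLOPS, single-cluster version
RAYS_PER_SEC = 77e9     # 77 GRays/s, base model

ops_per_cycle   = FP64_FLOPS / CLOCK_HZ      # ~2000 fp64 ops per cycle
fma_lanes       = ops_per_cycle / 2          # ~1000 lanes, counting FMA as 2 ops
total_simd_bits = fma_lanes * 64             # ~64k bits of SIMD, chip-wide
rays_per_cycle  = RAYS_PER_SEC / CLOCK_HZ    # ~31 rays per cycle, chip-wide

print(f"total SIMD width: ~{total_simd_bits:.0f} bits")
for simd_bits_per_core in (512, 1024, 2048, 4096):
    # Round the core count to the nearest power of two, as in the table above.
    cores = 2 ** round(math.log2(total_simd_bits / simd_bits_per_core))
    print(f"{simd_bits_per_core:>4}-bit cores: ~{cores:>3}, "
          f"{rays_per_cycle / cores:.2f} rays/cycle/core")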

The only thing it seems to be missing is output to a display, but that can be added.
They actually have that. HDMI and DisplayPort.

RISC-V's Vector extension had been designed by technology experts over many years to be a good foundation for technology such as this. This is not the first time I've heard about it being used for GPUs, and won't be the last.
Think-Silicon announced it, way back in 2022. However, since they've gotten absorbed into Applied Materials, all of the old links are dead and I have no idea whether that IP went anywhere. It would be a shame, since an embedded SoC having a wide vector array you could retask for running other sorts of compute threads makes a fair bit of sense to me.
 
If it can just straight handle video encode/decode then it's gonna be light years ahead of the competition, maybe.
Their PDF, which I've been going through, claims "2x 8K60 streams" of "AV1, H.264/265" video encoding throughput, on the base model. If you look at my above analysis of how many cores I believe it has, I think maybe they're just using a pure software implementation.

According to recent benchmarks from Phoronix, a 64-core Zen 4 Threadripper can achieve about 15, 64, or 222 fps of AV1 encoding throughput when processing a single stream of 4K video, depending on the quality settings. I'd bet they're claiming towards the faster end of the presets, so we can estimate it by dividing the 222 fps number by 4 (for the 4x pixel count of 8K vs. 4K) and by 2 (for the two streams). That gives us only about 28 fps. However, keep in mind that multi-stream encoding should scale better than single-stream. Furthermore, resolution scaling appears to be super-linear: the Phoronix data shows 1080p running only 2.62x to 2.93x as fast as the corresponding 4K runs, despite having a quarter of the pixels. If the same ratio held from 4K to 8K, we'd divide by ~2.9 instead of 4, which suggests my estimate should be more along the lines of 38 fps.
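A quick sketch of that estimate, with the scaling factor as the obvious knob to play with (the 222 fps figure is Phoronix's fast-preset 4K result; everything else is my assumption):

Code:
# Rough estimate of Bolt's "2x 8K60" AV1 claim vs. Threadripper software encoding.
fps_4k_single = 222.0     # Phoronix: single 4K stream, fast preset, 64-core Zen 4 Threadripper
streams = 2               # Bolt claims two simultaneous 8K60 streams
pixel_ratio = 4           # 8K has 4x the pixels of 4K

# Naive linear scaling with pixel count and stream count:
naive = fps_4k_single / (streams * pixel_ratio)     # ~28 fps

# Super-linear resolution scaling: 1080p ran only ~2.6-2.9x faster than 4K in the
# Phoronix data despite 4x fewer pixels, so assume ~2.9x applies from 4K to 8K too:
adjusted = fps_4k_single / (streams * 2.9)          # ~38 fps

print(f"naive: {naive:.0f} fps per stream, adjusted: {adjusted:.0f} fps per stream")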

I'm so sick to death of all this "gaming" noise all the damn time, I couldn't care less about another useless gaming benchmark!
It's funny you say that, because their PDF doesn't actually cite gaming performance (although they do talk about interactive rendering and make some extrapolations for 4k @ 120 fps). The scene it claims to use for RT benchmarking is like what movies or other production renders would use - not at all the sort of geometry you'd use for gaming.

Also, that presentation spends a few slides looking at professional computing applications, hence the focus on fp64 performance. BTW, they claim 300x accuracy vs. modern GPU and CPU fp64 arithmetic, although I think that's probably just because they support denormals. Either that, or they use higher-accuracy implementations of transcendental functions.

Show me some real work: how many FPS can this thing do running Topaz Video AI (or the like)? Most have no idea how much brute-force power it takes to do this kind of real work. If this card can do upscaling in Video AI at a measly 10 FPS, then it would be about 500 times faster than a 4090, which would make this card incredibly valuable to me!
They're sort of limited by what has been optimized for RISC-V. That assumes they can even run general-purpose CPU workloads on it (see my earlier point about the lack of any mention of what OS it runs or whether their interconnect is even cache-coherent).

Also, they don't claim to surpass the RTX 5090 on AI performance. So, if it's AI you want, then this isn't going to be your savior.

TBH, I don't believe their AI numbers reflect anything remotely close to real-world performance. AI is very bandwidth-intensive and that's one of their obvious weak spots.
 
Dwelling a bit more on the matter of cores & clock speeds, they do have a slide which mentions "Cache per FP32 core" and "Memory Bandwidth per FP32 core". This doesn't directly tell us how the SIMD is distributed among their CPU cores, but it does confirm the aggregate SIMD width.

The slide (page 32, if you're following along) says 64 kB per FP32 core. On page 36, they state the smallest config has 128 MB of cache, yielding a figure of 2000 FP32 "cores". That supports my estimate of 64k bits of total SIMD width.

Likewise, page 32 cites 177 MB/s of memory bandwidth per "FP32 core". That figure of 177.08 MB/s * 2k "FP32 cores" = 354.16 GB/s, which is a little shy of their page 36 claim of 363 GB/s.
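The arithmetic, for reference (the only assumption here is decimal units for the cache figures; if they're binary, the core count comes out slightly higher):

Code:
# Cross-checking the page 32 per-"FP32 core" figures against the page 36 totals.
cache_per_core_kB = 64        # cache per FP32 core (page 32)
bw_per_core_MBps  = 177.08    # memory bandwidth per FP32 core (page 32)
total_cache_MB    = 128       # smallest config (page 36)

cores     = total_cache_MB * 1000 / cache_per_core_kB   # 2000 "FP32 cores" (decimal units)
simd_bits = cores * 32                                  # 64,000 bits of total SIMD width
total_bw  = cores * bw_per_core_MBps / 1000             # ~354 GB/s vs. their 363 GB/s claim

# Note: with binary units (128 MiB / 64 KiB) you'd get 2048 cores instead, and
# 2048 * 177.08 MB/s is roughly 363 GB/s, which lands right on their quoted total.
print(f"{cores:.0f} cores, {simd_bits:.0f} SIMD bits, {total_bw:.2f} GB/s")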

On the matter of memory bandwidth, we should also point out that the majority of it is coming from the meager 32 GB of LPDDR5X. So, if that's your "high-bandwidth" memory, then it's really not much better off than an RTX 5090, capacity-wise.

So, after going through their PDF and extracting everything I can, I tried to do a little more searching, to see if we can learn anything about their RISC-V cores, which I have a hunch they probably licensed from someone like SiFive. Although I found no announcements of such a deal, I did run across Jon Peddie's coverage of this product announcement:


I think his is clearly the best take on this thing. He didn't dig into quite as many nooks and crannies as I did, but his focus made sense and I agree with everything he said.

What he didn't say is that Nvidia has invested substantial resources into their AI-based denoising and ray-sampling technologies, which I'm sure make quite a bit of difference when using their GPUs for path tracing, and make up for (and then some?) the supposed 10x deficit in raw ray-intersection performance claimed by the Bolt folks.
 
Also, they don't claim to surpass the RTX 5090 on AI performance. So, if it's AI you want, then this isn't going to be your savior.

TBH, I don't believe their AI numbers reflect anything remotely close to real-world performance. AI is very bandwidth-intensive and that's one of their obvious weak spots.

This is the kind of "real work" I'm referencing below:

https://youtu.be/naV-J1kfZmQ


When top end hardware can only muster 2.2 FPS, then clearly we need to change the way we do things. I think you misunderstood my gaming reference. Nobody can even talk about a GPU anymore it seems without using meaningless gaming benchmarks, as if games are the only thing in the world that matters. I'm sick to death of all the meaningless "gaming" noise everywhere all the freakin time!
 
I think you misunderstood my gaming reference. Nobody can even talk about a GPU anymore it seems without using meaningless gaming benchmarks, as if games are the only thing in the world that matters. I'm sick to death of all the meaningless "gaming" noise everywhere all the freakin time!
If you're talking about the article's author or people posting in these forums, I'd just point out that (for better or for worse) this site does have a bias towards gamers. I'm not really sure why that is, but maybe the more general computing enthusiasts fled to sites like ServeTheHome or Phoronix, which cater more to non-gaming interests.

If you saw the questionnaire the site just launched, it sounds like they're currently in the process of re-evaluating their priorities. That would probably be a good avenue to make your opinions known. Just be aware that (according to others - I have yet to open it) it does seem like a feeler for a premium subscription. I think you can still just fill out what you're comfortable with and maybe it will have a positive impact.
 

Thanks for the site suggestions.

But it's not this article or even this site that I'm complaining about, it's the industry as a whole that I'm referring to in regards to GPU testing and review across the board. It's all games all the time everywhere it seems, I'm just sick of it. Gaming is probably the least important thing a GPU is capable of doing and yet seems to be the singular focus by almost everyone almost everywhere almost every time.

And that's my rant for the day 😁.

Oh, and I did fill out their survey and yes, it does seem to be about my willingness to pay for a subscription. The answer provided was a rock solid hell no!
 
But it's not this article or even this site that I'm complaining about, it's the industry as a whole that I'm referring to in regards to GPU testing and review across the board. It's all games all the time everywhere it seems, I'm just sick of it. Gaming is probably the least important thing a GPU is capable of doing and yet seems to be the singular focus by almost everyone almost everywhere almost every time.
Yeah, I've followed the evolution of GPUs since the days when the only hardware acceleration of 3D graphics was in expensive UNIX workstations and definitely not for gaming. Back then, pixel shaders were only used in Hollywood movie production (see RenderMan) and were inconceivable in anything realtime. No doubt, gaming has driven the evolution of GPUs, but then we also have things like VR that have even surpassed the CAD and scientific visualization applications that were among their first drivers.

The thing about AI is that GPUs aren't really the best architecture for it. GPUs are basically the second most general type of processor, following CPUs. They don't care how coherent your data access is or how much communication you do between threads. If you can divide your workload into tons of threads and if it's SIMD-friendly, then it will work on a GPU. That's why it was a natural choice for neural networks.

NPUs are more specialized towards exactly the data access patterns and types of arithmetic operations that are needed by deep learning. That's why, per watt or per mm^2 of silicon, they're much more efficient than GPUs. I'd even say NPUs are more closely related to DSPs than to GPUs.

It will be interesting to see if AMD's upcoming UDNA architecture truly manages to bridge the gap. It is definitely weird that phones and now laptops have separate GPU and NPU blocks, in spite of the functional overlap. Seems like a waste of silicon.

And that's my rant for the day 😁.
Thanks for explaining your points.