Positron AI says its Atlas accelerator beats Nvidia H200 on inference in just 33% of the power — delivers 280 tokens per second per user with Llama...

ZzZzZ Give me a PCIe expansion card (not x16, as that slot is already used by the graphics card) that can do 1/10 of that and uses 1/10 of the power (or rather, stays under the power limit of the PCIe slot). And with an affordable price, then I may consider it.
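
To put rough numbers on that ask (my own back-of-envelope, not anything from the article): the PCIe CEM spec allows up to 75 W from an x16 slot and roughly 25 W from smaller slots, and 1/10 of the headline figure is 28 tokens/s.

```python
# Back-of-envelope for a slot-powered card. Assumptions: 75 W for an x16 slot,
# ~25 W for smaller slots (PCIe CEM spec); 280 tokens/s/user from the headline.
atlas_tokens_per_sec = 280
target_tokens_per_sec = atlas_tokens_per_sec / 10  # the "1/10 of that" ask

for slot_watts in (25, 75):
    needed = target_tokens_per_sec / slot_watts
    print(f"{slot_watts} W slot budget -> needs about {needed:.2f} tokens/s per watt")

# ~1.1 tok/s/W at 25 W, ~0.4 tok/s/W at 75 W. Whether that is plausible depends
# on Atlas's absolute power draw, which the headline alone does not give.
```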

No need for more than that for my desktop, as I'm not intending to build a datacentre.

BUT I'm sure the datacentre people will be interested in that!
 
For us consumers, not data centers, being able to fine-tune is where the real gold is. If you can only run a 32B model on your dual 3090s via SLI, it had better be fine-tuned for your task, otherwise Cline or OpenHands isn't going to be worth much to you.

From what I read as a layman, it seems these chips are hyper-optimized for inference only and probably aren't useful for training or fine-tuning.
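
For reference, the consumer fine-tuning the earlier comment describes usually means parameter-efficient tuning rather than full training. Here is a minimal QLoRA-style sketch, assuming a Hugging Face transformers + peft + bitsandbytes stack; the model name and hyperparameters are placeholders, not anything from the article:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-32B-Instruct"  # hypothetical 32B-class base model

# Load the base weights in 4-bit so a ~32B model fits across two 24 GB 3090s.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shards layers across both GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Train only small low-rank adapters instead of all ~32B parameters.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total params
```

The trainable adapters are small enough that the optimizer state fits alongside the quantized weights, which is what makes this feasible on consumer cards at all.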
 
The article makes no note of it, but this first-generation accelerator from Positron is not actually an ASIC - it is an FPGA-based accelerator. I was trying to figure out how a company founded only two years ago with a couple dozen engineers is already sampling cutting-edge ASIC hardware; it makes sense knowing that it is actually an FPGA platform, with the physical hardware presumably produced by AMD/Xilinx or Intel/Altera.

If Positron's metric claims are true, it is interesting to see how much more efficient their logical architecture is for inference than Nvidia's, especially given that FPGAs have a significant physical handicap in density and efficiency versus true ASICs and mainstream GPUs+CPUs.
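
One way to see why that could be plausible: batch-1 LLM decoding is dominated by memory bandwidth rather than logic density, since every generated token has to stream the full weight set. A rough estimate, assuming a Llama-3-8B-class model at 8-bit weights (the headline doesn't say which Llama variant or precision the 280 tokens/s figure refers to):

```python
# Rough bandwidth estimate for batch-1 decoding; model size and precision
# below are assumptions, not figures from the article.
params = 8e9          # assume a Llama-3-8B-class model
bytes_per_param = 1   # assume 8-bit weights
tokens_per_sec = 280  # the per-user figure from the headline

required_bandwidth = params * bytes_per_param * tokens_per_sec
print(f"~{required_bandwidth / 1e12:.1f} TB/s of weight traffic to hit 280 tokens/s for one user")

# ~2.2 TB/s: in the ballpark of an HBM-equipped accelerator, which is why a
# memory-centric design can plausibly compete on inference even with an FPGA's
# logic-density handicap versus an ASIC or GPU. Batching users changes the math.
```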
 
Newer FPGAs have hard-wired arithmetic pipelines, which go some way towards eliminating the efficiency deficit of FPGA vs. ASIC.

Furthermore, the XDNA NPUs in AMD's current laptop chips are close descendants of the Xilinx Versal design lineage. You can see them discussed and analyzed in the middle of this post:
 
...hands Positron a nail and hammer... while handing Nvidia a coffin.

ASICs will win in the end.
Here's hoping that the current brain-dead brute-force approach espoused by Nvidia, Microsoft, Meta, Tesla, etc. is soon outlawed on the basis of electricity consumption alone, allowing a new bunch of smarter players to thrive.
 
This market is just too capital-rich for me to trust any claim of such a huge leap over the market leader. I will wait to see how it works and whether it can scale; I bet that will be the issue. Per system it may be there, but I wonder whether it holds up across the whole infrastructure you need to run it. Also, every time I see one of these claims, it turns out to be a niche case or something.

I hope the efficiency claims are true, but I just don't trust this type of claim in this type of market.