News: MemryX launches $149 MX3 M.2 AI Accelerator Module capable of 24 TOPS compute power

This thing is virtually useless for most users. It won't run LLMs or Stable Diffusion. And no, adding two of them won't turn your machine into a "Copilot+" PC.

The main problem it faces is the lack of DRAM, and its connectivity is too poor for it to effectively fall back on host memory as a substitute. Even with PCIe 5.0, I think that would be an inadequate solution.
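
Some rough numbers to illustrate the gap (my own ballpark figures, not from MemryX's materials):

```python
# Back-of-envelope comparison: usable PCIe x4 bandwidth vs. typical on-board memory bandwidth.
# All figures are approximate assumptions for illustration only.
pcie_x4_gbps = {"PCIe 3.0 x4": 3.9, "PCIe 4.0 x4": 7.9, "PCIe 5.0 x4": 15.8}  # ~usable GB/s
gpu_mem_bw = 450.0  # GB/s, roughly a mid-range discrete GPU, just for scale

for gen, bw in pcie_x4_gbps.items():
    print(f"{gen}: {bw:5.1f} GB/s  (~{gpu_mem_bw / bw:.0f}x slower than ~{gpu_mem_bw:.0f} GB/s local memory)")
```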
 
This thing is virtually useless for most users. It won't run LLMs or Stable Diffusion. And no, adding two of them won't turn your machine into a "Copilot+" PC.

The main problem it faces is the lack of DRAM, and its connectivity is too poor for it to effectively fall back on host memory as a substitute. Even with PCIe 5.0, I think that would be an inadequate solution.
25 TOPS is enough for text-response AI, even with slower memory.

Sounds like someone has a horse in the race.
 
25 TOPS is enough for text-response AI, even with slower memory.
What good is 25 TOPS when you're limited to streaming in weights at a mere 4 GB/s (best case)? That PCIe 3.0 x4 interface would likely be such a bottleneck that, if your model doesn't fit in the on-chip memory, you might as well just run inference on your CPU.
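
A quick back-of-envelope, using assumed figures (a hypothetical 7B model at 4-bit; the datasheet says nothing about weight streaming at all):

```python
# Ceiling on LLM token rate if weights had to stream over PCIe 3.0 x4 for every token.
# Assumed figures for illustration only; nothing here comes from MemryX's datasheet.
link_gb_per_s = 4.0      # ~best case for PCIe 3.0 x4
params = 7e9             # a hypothetical 7B-parameter model
bytes_per_param = 0.5    # 4-bit quantization

weights_gb = params * bytes_per_param / 1e9
tokens_per_s = link_gb_per_s / weights_gb  # each generated token needs one full pass over the weights
print(f"{weights_gb:.1f} GB of weights -> at most {tokens_per_s:.1f} tokens/s, before any compute or other overhead")
```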

Furthermore, their datasheet doesn't even say it's capable of weight-streaming from host memory! It just says it supports models with 40M 8-bit parameters (or 80M 4-bit), period.
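
To put that budget in perspective, here's a rough sanity check against some commonly cited model sizes (approximate numbers, my own list):

```python
# Which kinds of models fit in the stated 40M 8-bit parameter budget?
# Parameter counts below are approximate, commonly cited figures.
budget_8bit = 40e6
models = {
    "MobileNetV2 (classification)": 3.5e6,
    "YOLOv8n (object detection)": 3.2e6,
    "ResNet-50 (classification)": 25.6e6,
    "TinyLlama-1.1B (about the smallest popular LLM)": 1.1e9,
}
for name, params in models.items():
    verdict = "fits" if params <= budget_8bit else "does NOT fit"
    print(f"{name}: {params / 1e6:,.1f}M params -> {verdict} in the 40M 8-bit budget")
```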

Most pointedly, they themselves don't even list LLMs among its possible applications. The only demo I've seen is object detection/classification.

Sounds like someone has a horse in the race.
I point out lots of weaknesses in products written about on this site. That doesn't mean I have a stake in competing products.

I found the article alarming, because the author seems to be essentially regurgitating a press release and utterly failed to point out critical weaknesses and limitations in the product. It's so bad that I found myself questioning whether the author even understands what they're writing about, or how an end user would actually use it. The reason for my comment is simply to warn others about these oversights.

Are you really so confident it can be used for LLM inferencing that people should just go ahead and buy it? Did you even look at the product's datasheet? If not, you might want to do that before replying.
 
This thing is virtually useless for most users. It won't run LLMs or Stable Diffusion. And no, adding two of them won't turn your machine into a "Copilot+" PC.

The main problem it faces is the lack of DRAM, and its connectivity is too poor for it to effectively fall back on host memory as a substitute. Even with PCIe 5.0, I think that would be an inadequate solution.
Maybe for your purposes, but for anyone currently using a Google Coral TPU (and given how long every version of it was consistently sold out, that's quite a few people), this would be a fairly significant upgrade.

I personally use a dual Coral TPU card in my home automation server to handle real-time security camera footage processing, since the idea of sending that footage to some company's cloud is all kinds of disturbing to me, and it does a great job with facial/license plate recognition, etc.

Point is, there are all kinds of uses for this kind of thing other than just running local LLMs, and it has more than enough processing power for many people's purposes.
 
I personally use a dual Coral TPU card in my home automation server to handle real-time security camera footage processing, since the idea of sending that footage to some company's cloud is all kinds of disturbing to me, and it does a great job with facial/license plate recognition, etc.
A lot of cameras now have that capability built in, even cheap ones. A friend mentioned that he recently bought some $40 cameras and thought their object detection & classification capability was pretty good.

Speaking of price, the dual-processor Coral board is only $40, while this board is $149. They say there's a version with only two chips, but I don't see it for sale. The MX3 offers 6 TOPS per chip, while Coral offers only 4, and it seems to have 10 MB of weights storage versus Coral's 8 MB. So, it's better than Coral, but also nearly twice as expensive per chip. However, Coral launched 5 years ago, so I'd say the MX3 isn't nearly where it should be, which would be about 10x as good as the Coral at roughly the same price.
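
The per-chip math behind that, if you want it spelled out (list prices and the specs cited above):

```python
# Per-chip comparison behind the "nearly twice as expensive per chip" claim.
coral_price, coral_chips, coral_tops, coral_sram_mb = 40, 2, 4, 8
mx3_price, mx3_chips, mx3_tops, mx3_sram_mb = 149, 4, 6, 10

print(f"Coral: ${coral_price / coral_chips:.0f}/chip, {coral_tops} TOPS, {coral_sram_mb} MB weights per chip")
print(f"MX3:   ${mx3_price / mx3_chips:.0f}/chip, {mx3_tops} TOPS, {mx3_sram_mb} MB weights per chip")
```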

I think there's also a possibility the MX3 won't support some layer in the model you might want to use, so that would be another thing to check. Google is very clear about which layer types Coral supports, and you can use it via TensorFlow Lite.
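
For reference, using Coral from TensorFlow Lite is about this simple (a minimal sketch; the model path is a placeholder and has to be a model compiled for the Edge TPU):

```python
# Minimal sketch of running a model on a Coral Edge TPU via TensorFlow Lite (tflite_runtime).
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(
    model_path="model_edgetpu.tflite",  # placeholder; must be compiled with the edgetpu_compiler
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy input matching the model's expected shape; real code would feed a camera frame here.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```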

Point is, there are all kinds of uses for this kind of thing other than just running local LLMs, and it has more than enough processing power for many people's purposes.
I would say not "all kinds", due to the severe model size limitation. For larger models, Coral can stream in weights, whereas nothing in the MX3 literature indicates it can do the same.
 