News Sohu AI chip claimed to run models 20x faster and cheaper than Nvidia H100 GPUs

Diogene7

Prominent
Jan 7, 2023
72
13
535
In theory, Intel seems to have shown that their Intel MESO (Magnetoelectric spin-orbit) logic, a post-CMOS technology related to spintronics, is much more amenable to both low-power logic and low-power neuromorphic computing.

For example, emerging Non-Volatile Memory (NVM) / Persistent Memory MRAM is already based on spintronics phenomena.

Therefore, far more R&D resources should be allocated to developing new manufacturing tools that improve MRAM manufacturing and lower its cost, and then to evolving those tools toward MESO manufacturing: this would be much, much more groundbreaking!

https://www.techspot.com/news/77688-intel-envisions-meso-logic-devices-superseding-cmos-tech.html

https://www.imec-int.com/en/press/i...how-record-low-switching-energy-and-virtually
 

JTWrenn

Distinguished
Aug 5, 2008
297
204
19,170
This was an interesting bit of tech news, but... from what I can tell they have yet to actually build a working chip. They just have an idea, and not even a fully fleshed-out design, which they will outsource. Seems like it could be amazing, but can anyone find anything saying they have built and tested one? Right now this feels like vaporware with a huge upside if it works but a ton of hurdles to get there.

Am I wrong on that? Or does this thing only exist on paper right now?
 

Diogene7

JTWrenn said:
This was an interesting bit of tech news, but... from what I can tell they have yet to actually build a working chip. They just have an idea, and not even a fully fleshed-out design, which they will outsource. Seems like it could be amazing, but can anyone find anything saying they have built and tested one? Right now this feels like vaporware with a huge upside if it works but a ton of hurdles to get there.

Am I wrong on that? Or does this thing only exist on paper right now?

To my knowledge, the European semiconductor research center IMEC was able to build a prototype, but not one that meets all the needed key metrics.

So indeed, there isn't any hard proof yet.

That's the reason why I would first allocate significant AI resources to help find appropriate materials that would help build some prototypes.

From there, more resources will be needed to build manufacturing tools, but some of them may be able to piggyback on MRAM manufacturing tools.

That's the reason why it is also important to scale up MRAM manufacturing to accelerate the transition to spintronics technology.

My personal belief is that we are currently at the start of the transition from silicon transistors to spintronics, much as in the 1950s and 1960s, when solid-state silicon transistors were emerging to replace vacuum tubes…
 

CmdrShepard

Prominent
Dec 18, 2023
432
317
560
This is so stupid, the time of fixed function hardware was in the 90s. At that time the software technology developed much slower so it made sense to use fixed function hardware to accelerate stuff that wasn't about to change with the next big fad.

Today we have transformers, but tomorrow? And what are datacenters going to do with all those useless ASICs then? More e-waste. At least with H100 you can program and run whatever you want.
 

wwenze1

Reputable
Mar 22, 2020
11
4
4,515
CmdrShepard said:
This is so stupid, the time of fixed function hardware was in the 90s. At that time the software technology developed much slower so it made sense to use fixed function hardware to accelerate stuff that wasn't about to change with the next big fad.

Today we have transformers, but tomorrow? And what are datacenters going to do with all those useless ASICs then? More e-waste. At least with H100 you can program and run whatever you want.
Crypto ASICs, anyone?
 

abufrejoval

Reputable
Jun 19, 2020
441
300
5,060
Diogene7 said:
To my knowledge, the European semiconductor research center IMEC was able to build a prototype, but not one that meets all the needed key metrics.

So indeed, there isn't any hard proof yet.

That's the reason why I would first allocate significant AI resources to help find appropriate materials that would help build some prototypes.

From there, more resources will be needed to build manufacturing tools, but some of them may be able to piggyback on MRAM manufacturing tools.

That's the reason why it is also important to scale up MRAM manufacturing to accelerate the transition to spintronics technology.

My personal belief is that we are currently at the start of the transition from silicon transistors to spintronics, much as in the 1950s and 1960s, when solid-state silicon transistors were emerging to replace vacuum tubes…
Hmm, I can almost picture where you're working (Sophia Antipolis or Grenoble?) or who for (CEA?)

I'm afraid you're preaching a reasonable message to the wrong crowd here...
But perhaps it will eventually reach the people who hold the purse strings.
 

abufrejoval

CmdrShepard said:
This is so stupid, the time of fixed function hardware was in the 90s. At that time the software technology developed much slower so it made sense to use fixed function hardware to accelerate stuff that wasn't about to change with the next big fad.

Today we have transformers, but tomorrow? And what are datacenters going to do with all those useless ASICs then? More e-waste. At least with H100 you can program and run whatever you want.
I'm pretty sure the H100 won't run LibreOffice Writer all that well, while I've longed for a GPU-redesigned LibreOffice Calc basically ever since AMD demoed an HSA variant for Kaveri.

I've managed to run games on V100s using VirtGL and Steam remote gaming on GPU pass-through VMs, just to prove it can be done.

But I still doubt that recycling H100 or B100 will be much better than trying to squeeze value from tons of K40s, P/V100s or in fact anything left over from supercomputers once they've finished their term.

Even at zero purchase cost, the operational expenses kill that older hardware within years and now perhaps months.

Sometimes I'm actually glad I only have kids, not shares.
 

bit_user

Polypheme
Ambassador
The article said:
Most large language models (LLMs) use matrix multiplication for the majority of their compute tasks and Etched estimated that Nvidia’s H100 GPUs only use 3.3% of their transistors for this key task. This means that the remaining 96.7% of the silicon is used for other tasks, which are still essential for general-purpose AI chips.
No, a significant chunk of it is used for fp64 and that's definitely not needed for AI!

This is the weird thing: by now, I fully expected Nvidia and AMD to further specialize their high-end silicon on AI and fork the HPC stuff. There's still a chance they might, by the time Etched can ever scale up production of their Sohu chip.

Anyway, this might ultimately be what rains on Etched's parade:
 

bit_user

abufrejoval said:
I've managed to run games on V100s using VirtGL and Steam remote gaming on GPU pass-through VMs, just to prove it can be done.
The V100 is a different animal. It was Nvidia's last to have a full contingent of TMUs and ROPs. The A100 had that stuff on only one SM and I haven't heard of Hopper featuring graphics silicon at all!

abufrejoval said:
Even at zero purchase cost, the operational expenses kill that older hardware within years and now perhaps months.
Yeah, for one thing you need a chassis that can host those SXM boards and those won't be cheap, quiet, low-power, or small. And then you've got to pay the electric bill.

Yes, PCIe versions of these GPUs exist, but I'm sure the vast majority of them are of the SXM variety. And the PCIe cards will need a fairly heavy-duty bolt-on cooler, when used outside of a high-airflow server case.
 

JTWrenn

bit_user said:
No, a significant chunk of it is used for fp64 and that's definitely not needed for AI!

This is the weird thing: by now, I fully expected Nvidia and AMD to further specialize their high-end silicon on AI and fork the HPC stuff. There's still a chance they might, by the time Etched can ever scale up production of their Sohu chip.

Anyway, this might ultimately be what rains on Etched's parade:
I think this is the main point. Nvidia has orders of magnitude more capital and engineers as well as experience in this. The idea that they haven't thought of this approach seems silly.

This is an arms race between companies. At this size with this level of money I don't see Nvidia not going a similar route without a damn good reason. Possibly because they see the fact that AI is still just starting and if you have a custom card for only one type of AI...you could suddenly have a totally obsolete card just by a new AI trend/standard coming out.
 

edzieba

Distinguished
Jul 13, 2016
494
480
19,060
bit_user said:
No, a significant chunk of it is used for fp64 and that's definitely not needed for AI!

This is the weird thing: by now, I fully expected Nvidia and AMD to further specialize their high-end silicon on AI and fork the HPC stuff. There's still a chance they might, by the time Etched can ever scale up production of their Sohu chip.

Anyway, this might ultimately be what rains on Etched's parade:
Nvidia keep the rest of the GPU gubbins because they are targeting training, not inference. The rest of the non-tensor bits are there for all the support tasks needed for training, from basic stuff like image decoding and scaling (taking high resolution JPEGs of random size and producing low resolution bitmaps of predetermined size to feed to the model) to producing training data from whole cloth, as in their autonomous navigation training using rendered environments (rasterization and raytracing) rather than recorded camera data.
 

bit_user

edzieba said:
Nvidia keep the rest of the GPU gubbins because they are targeting training, not inference. The rest of the non-tensor bits are there for all the support tasks needed for training, from basic stuff like image decoding and scaling (taking high resolution JPEGs of random size and producing low resolution bitmaps of predetermined size to feed to the model)
No, you don't need fp64 for training or image decoding. You don't even need nearly as much fp32 horsepower as it has.

edzieba said:
to producing training data from whole cloth, as in their autonomous navigation training using rendered environments (rasterization and raytracing) rather than recorded camera data.
First, I'm going to need a citation on that. Second, this would be devoting lots of silicon to cater to < 1% of their users. A far more cost effective solution would be to use pre-rendered content or mix some L40 nodes in those machines. Not least, because I'm pretty sure the Hopper GPUs have no ROPs, TMUs, or RT cores.

Again, significant fp64 horsepower isn't needed for rendering, which is why their gaming graphics cards have so little of it.
 

edzieba

bit_user said:
First, I'm going to need a citation on that.
Here's a demo from a few years ago.
bit_user said:
Second, this would be devoting lots of silicon to cater to < 1% of their users.
The vast majority of H100 buyers are buying them for NN training. It'd be nice if you could train models without ever needing to worry about having any of that pesky data to train them on, or performing any measurement of model output to feed back into weighting, but that's not the case. A surprising amount of your power budget goes on 'support' functions for the actual matrix math bit, shoving input and output data round and massaging it into the formats needed.
 

CmdrShepard

bit_user said:
Again, significant fp64 horsepower isn't needed for rendering, which is why their gaming graphics cards have so little of it.
No.

Their gaming cards have so little FP64 because of artificial consumer vs. professional product differentiation, nothing else.

Want FP64? Spring for a Quadro, which is essentially the same chip with a different PCI ID and FP64 that isn't gimped.
 

bit_user

Presumably you're not disagreeing that interactive rendering doesn't need fp64, no?

CmdrShepard said:
Their gaming cards have so little FP64 because of artificial consumer vs. professional product differentiation, nothing else.
It's hard to say that, when you consider the amount of silicon it requires. If they would put more fp64 horsepower in client GPUs, the effect would be worse perf/$ on interactive rendering tasks.

This fairly recent paper analyzed the silicon area needed for pipelined and non-pipelined fp32 and fp64 arithmetic. They found a pipelined fp64 ALU required 3.33x as much area as pipelined fp32 ALU and 3.37x as much area when not pipelined. This aligns pretty well with a heuristic I once heard that just the multipliers should have a ratio of 4.88x.
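
One plausible origin for that 4.88x heuristic, offered here as an assumption rather than anything stated in the paper: the area of an array multiplier grows roughly with the square of the significand width, and IEEE 754 fp32 and fp64 carry 24-bit and 53-bit significands (counting the implicit leading bit). A quick sketch of the arithmetic, where the function name is made up for illustration:

```python
# IEEE 754 significand widths in bits, counting the implicit leading 1.
FP32_SIGNIFICAND_BITS = 24
FP64_SIGNIFICAND_BITS = 53


def multiplier_area_ratio(wide_bits: int, narrow_bits: int) -> float:
    """Estimate relative multiplier area.

    ASSUMPTION: area scales with the square of the significand width
    (a rule of thumb for array multipliers, not a figure from the paper).
    """
    return (wide_bits / narrow_bits) ** 2


ratio = multiplier_area_ratio(FP64_SIGNIFICAND_BITS, FP32_SIGNIFICAND_BITS)
print(f"fp64/fp32 multiplier area ratio: {ratio:.2f}x")
```

A full ALU also contains adders, shifters, and exponent and rounding logic, which should scale closer to linearly with width. That would be consistent with the measured whole-ALU ratio (3.33x) landing below the multiplier-only estimate.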

In other words, fp64 is no free lunch. For something really to be a case of artificial product segmentation, it would have to be a feature that basically comes for free. Client GPUs typically provide scalar fp64, since some level of fp64 arithmetic support is required for OpenGL 4.x conformance and it's quite nice to have for the occasional matrix inversion, among other things. In fact, Intel entirely left out hardware fp64 support from the client Xe GPUs in Gen 12 iGPUs and Alchemist dGPUs, only to have to reverse course and re-add it in Battlemage.

It has been 5 years since we last had a client GPU (Radeon VII) with vector fp64 support. In all instances where this happened, it was a case of a datacenter GPU die that got repurposed for the consumer market. However, AMD's RDNA/CDNA schism slammed the door shut on that ever happening again. For Nvidia, the V100 got released as the Titan V, but I'm not sure you can really call a $3k GPU a consumer model, especially back in 2017. Nvidia also closed the door on this approach by dropping the rendering hardware from most of the SMs in the A100 and (AFAIK) leaving it completely out of the H100.

CmdrShepard said:
Want FP64? Spring for a Quadro, which is essentially the same chip with a different PCI ID and FP64 that isn't gimped.
LOL. Okay, tell us which models, then.

I can save you some trouble, though. The last one was 2018's Quadro GV100, which was basically a 32 GB version of the Titan V. Before that, the Quadro GP100 launched in 2017, but there was no consumer card based on the same die. And before that, 2013's Quadro K6000 featured the GK110 and was the workstation equivalent of the GTX 780 and above. The GTX 780/Ti was the last Nvidia silicon with vector fp64 ever sold in a consumer product for less than $1k.

So, your little conspiracy theory really hasn't had any truth to it for Nvidia for about a decade. I say "any truth", because even back then, most of their client GPUs never had the silicon for doing vector fp64. That timeframe roughly aligns with AMD, if we overlook the Radeon VII, because the prior example from them was 2013's Tahiti. And the reason we might not count the Radeon VII is that they only slightly gimped its fp64 horsepower, just cutting it in half vs. the workstation & server incarnations. I use the word "slightly" to put it in contrast to how Nvidia cut the GTX 780 (Ti)'s fp64 by a factor of 8!

While you weren't paying attention, Nvidia also dropped the Quadro branding. The last Quadro-branded cards launched in 2018.
 

bit_user

So what you are saying is that for example RTX A6000 and RTX 3090 Ti aren't the same chip
No, I didn't say any such thing. What I said is that neither of them supports vector fp64. If you follow your own links, you'll see they both have a 1:64 ratio between fp32 and fp64. That's what you get when all you have is a single scalar fp64 pipe per SM.

You claimed GPU makers are nerfing the fp64 capabilities of consumer products, but Nvidia hasn't done that since 2013's GTX 780 (Ti). AMD did it a little bit with Radeon VII (circa early 2019).