No; like OpenPOWER, it would come too late.
Well, it was really meant more as: if you want other people's crown jewels, you'd better be ready to offer some of your own...
You raise a good point about the billions of dollars being spent on accelerators that will just become e-waste as future AI models exceed their capacity and render them obsolete. Most of these aren't even built as PCIe cards you could put in a standard PC.
Microsoft is supposed to spend $50 billion in 2024 alone on AI data center buildout, and Google is investing similar numbers in its TPU infrastructure. That's somewhere near the GDP of Estonia for each of them, and that money needs to offer a sizeable return within, say, five years.
All of Nvidia's Hopper assembly capacity for 2024 is already allocated, with Microsoft and some of the other giants getting nearly all of it. The giants have gone all-in: they've bought near-total exclusivity and bet the future of their companies on AI becoming a viable commercial product for the masses. But I completely fail to see the value of AI under the family Christmas tree, no matter how much inference acceleration PC and mobile makers are putting into end-user devices.
And that exclusivity is not going to last past 2025, by which time the chips might be obsolete or the (x00 billion) bubble might have burst.
I've run all kinds of LLMs on my RTX 4090, up to 70B Llama-2 (albeit with somewhat ridiculous quantizations for that one), plus Mistral 7B, JAIS 13B and 30B, even the completely ridiculous Yi model from 01.ai, and they are laughable in how badly they hallucinate today.
How that's supposed to get better once you squeeze them into the edge, I just don't see.
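For anyone curious what that setup looks like, here's a minimal sketch, assuming llama-cpp-python built with CUDA support and a pre-quantized GGUF file; the file name and the layer split are illustrative, not a recommendation:

```python
# Minimal sketch: a heavily quantized 70B model on a single 24 GB card.
# Assumes llama-cpp-python with CUDA and a pre-quantized GGUF file;
# the path and layer count below are illustrative placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-70b.Q2_K.gguf",  # ~2-bit quant, the "ridiculous" end
    n_gpu_layers=40,  # offload as many layers as fit in 24 GB of VRAM
    n_ctx=2048,
)
out = llm("Q: What is the capital of Estonia?\nA:", max_tokens=32)
print(out["choices"][0]["text"])
```

The rest of the layers run on the CPU, which is exactly why the big quants are slow as well as lossy.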
This is much worse than the cryptocalypse, because much smarter people are gambling this time around...
However, some do still use the PCIe form factor. If anyone wants a deal on some previous-generation GPU compute hardware, check out Nvidia V100s on eBay. At 7 TFLOPS of fp64, that's still way more than you can get on consumer GPUs (if you actually need that sort of thing).
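If you want to sanity-check that fp64 gap yourself, a rough matmul timing in PyTorch will show it; this is back-of-the-envelope, not a proper benchmark:

```python
# Rough fp64 throughput check, assuming PyTorch with CUDA. On a V100 a
# large float64 matmul should land near its ~7 TFLOPS spec rate; most
# GeForce cards run fp64 at 1/32 or 1/64 of their fp32 rate, so expect
# a few hundred GFLOPS at best there.
import time
import torch

n = 8192
a = torch.randn(n, n, dtype=torch.float64, device="cuda")
b = torch.randn(n, n, dtype=torch.float64, device="cuda")

torch.cuda.synchronize()
t0 = time.time()
c = a @ b
torch.cuda.synchronize()
dt = time.time() - t0

flops = 2 * n**3  # multiplies + adds in an n x n matmul
print(f"{flops / dt / 1e12:.2f} TFLOPS fp64")
```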
I've got a couple of V100s in my corporate lab. Back when FP16 was the thing, they were pretty nice.
And just as a finger exercise I also ran the Superposition demo on them in fluid, interactive game mode across a 600 km distance, using my ultra-thin notebook as a remote screen. Most of my colleagues didn't even realize the graphics weren't rendered on my Whiskey Lake ultrabook, which has one of the weakest iGPUs of recent history...
Yes, remote gaming would be possible in theory, and more attractive if you don't pay the electricity bill.
For LLM training they are way too small; for inference they don't support the lower-precision data types and sparsity it takes to compete today. They are very nearly unusable for current AI, and an RTX 4070 will outclass them on pretty much every workload. No DLSS either, and gaming has just moved to the point where the eye candy needs it.
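To make the hardware line concrete, here's a rough sketch mapping CUDA compute capability to the features in question; the mapping is from Nvidia's published architecture specs (the V100 is sm_70, with fp16 tensor cores only):

```python
# Rough sketch, assuming PyTorch with CUDA. Feature thresholds per
# Nvidia's architecture specs: fp16 tensor cores with Volta (sm_70),
# int8 tensor cores with Turing (sm_75), bf16 and 2:4 structured
# sparsity with Ampere (sm_80), fp8 with Hopper (sm_90).
import torch

major, minor = torch.cuda.get_device_capability()
cc = major * 10 + minor
print(f"compute capability sm_{cc}")
print("fp16 tensor cores:  ", cc >= 70)
print("int8 tensor cores:  ", cc >= 75)
print("bf16 + 2:4 sparsity:", cc >= 80)
print("fp8:                ", cc >= 90)
```

Everything below sm_80 misses the data types modern inference stacks lean on, which is the whole problem.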
Consumers would also have to find ways of cooling them, because they come with passive shrouds only, no fans. Supermicro sold them in a workstation chassis with external fans, which at least weren't quite the turbines you get in the data center.
All these Hoppers are far worse at any other purpose, and all these TPUs are turning into e-waste as we speak. I've got a couple of K80s in storage, quite capable cards at FP64 too, but out of favor even with CUDA these days and with zero residual value.
I'm trying to imagine five years from now, and I'm just glad I'm only as invested in AI as I was in IoT: dabbling is fun, but I wouldn't want my pension fund in on that gamble, either.