You're confusing different things. What you're describing is iteration on a design that isn't burdened with the legacy of backward compatibility. In that case, you can remove or streamline things that don't work or are needlessly complex, find better ways to do things, and optimize the design to emphasize them.
What CUDA has to deal with is supporting 17+ years' worth of just about every idea they ever had, good or bad, regardless of how it interacted with the other features of the language/API. In that sense, it's the C++ of GPU compute APIs: heavily evolved and needlessly complex for what it does.
CUDA is basically like a dirty snowball, rolling down a hill, accumulating random cruft as it rolls along. You can't pull anything out of it, for fear of breaking some codebase or another, so it all just accumulates.
I did a little CUDA programming around 2010 or so, before I picked up a book on OpenCL and started dabbling with it. OpenCL immediately seemed so much cleaner and more self-consistent. It's like they took all of the core ideas that had been proven in CUDA and other GPGPU frameworks and re-implemented them with a blank slate. That's one thing I like about OpenCL and SYCL.
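To make that concrete, here's a minimal vector-add sketch (the kernel name vadd and the launch configuration are made up for illustration, not taken from the thread). The device-side code is nearly identical in CUDA and OpenCL C; what OpenCL rebuilt from a blank slate was mostly the host-side model (platforms, contexts, command queues, runtime compilation).

```cpp
// CUDA: one thread per element (hypothetical example kernel)
__global__ void vadd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}
// Host launch: vadd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

// The same kernel in OpenCL C -- almost the same on the device side;
// the real differences live in the host API, which OpenCL redesigned
// from scratch:
//
//   __kernel void vadd(__global const float *a, __global const float *b,
//                      __global float *c, int n) {
//       int i = get_global_id(0);
//       if (i < n) c[i] = a[i] + b[i];
//   }
```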
And what have you done to elevate yourself to such a vaunted status where we should take your word over that of such an industry luminary?
LinkedIn jobs for CUDA: 1699 (352 are Nvidia openings)
LinkedIn jobs for OpenCL: 333 (73 are Nvidia openings)
LinkedIn jobs for DirectCompute / DirectML: 3
If you want platform independence ... OpenCL or DirectCompute/DirectML.
If you want time to market, market availability, hosting, toolchain support, performance, and programming resources ... CUDA and OpenCL (NVIDIA GPUs are the only commercially hosted option for OpenCL) ... feature support in the OpenCL SDKs tends to lag CUDA, so if performance or TPU support matters to you, it's still CUDA (see the sketch after this list).
... AWS still hosts Kepler-era (~2012) instances, so there's no need to port working software.
... OpenCL on AWS is mostly supported via P3 instances (NVIDIA).
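Assuming "TPU support" above refers to NVIDIA's tensor cores, here's a rough sketch of the kind of feature that shows up in CUDA first and has no portable OpenCL counterpart: the warp-level WMMA API for tensor-core matrix multiply. The kernel name and fixed 16x16x16 tile are illustrative only; it needs sm_70 or newer hardware.

```cpp
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes a single 16x16x16 matrix-multiply-accumulate
// on tensor cores (C = A * B + C). Compile for sm_70 or newer.
__global__ void wmma_tile(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);          // start accumulator at zero
    wmma::load_matrix_sync(a_frag, a, 16);      // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
```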
Clean-slate ecosystems are nice, but the practical reality is that many have tried:
Graphcore
Larrabee
Close to Metal
ROCm
GPUOpen
Keller is a genius, but overcoming 18 years of commitment and ~10 learning cycles (major architecture iterations) will be difficult.