News: Jim Keller slams Nvidia's CUDA, x86 — 'CUDA's a swamp, not a moat. x86 was a swamp too'


CmdrShepard
They have a software stack they call TT-Buda
If I am reading that right, they side-stepped adding support for their hardware in those frameworks (which would've been an open-source-y thing to do) by converting existing frameworks and models to their proprietary format in order to run them on their hardware?

And the second option is C++ access to some "kernel" (supposedly well documented) in a roll-your-own fashion without the benefit of open-source collaboration?

Provided I am not mistaken that's:

1. Trying to get a vendor lock-in on customers
2. Being open-source in name but not in spirit (leeching from other frameworks, not giving anything back)

Thanks for posting that, now I dislike him even more.
 

bit_user
If I am reading that right, they side-stepped adding support for their hardware in those frameworks
I had some similar thoughts. It's probably worth actually diving into the details, because some of their comments suggest otherwise.

Anyway, to the extent someone cares, they can investigate further. Since I have no stake in the matter, I'm done with this.

And the second option is C++ access to some "kernel" (supposedly well documented) in a roll-your-own fashion without the benefit of open-source collaboration?
You're referring to Metalium? I believe that's how you program the hardware directly, since it doesn't support CUDA.

"The figure below shows the software layers that can be built on top of the TT-Metalium platform. With TT-Metalium, developers can write host and kernel programs that can implement a specific math operation (e.g., matrix multiplication, image resizing etc.), which are then packaged into libraries. Using the libraries as building blocks, various frameworks provide the user with a flexible high-level environment in which they can develop a variety of HPC and ML applications."

[Attached image: diagram of the software layers built on top of the TT-Metalium platform]


Source: https://tenstorrent.com/software/tt-metalium/
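
To picture the layering they describe, here's a tiny, self-contained C++ sketch of the idea: a "kernel" implementing one math op, a library function wrapping it, and a "framework"-level function composed from library ops. To be clear, none of these names are real TT-Metalium APIs; it's plain C++ with std::vector standing in for device buffers, purely to illustrate the structure.

C++:
#include <cstdio>
#include <cstddef>
#include <vector>

// Purely illustrative: std::vector<float> stands in for a device-resident buffer.
// None of these names come from TT-Metalium.
using Tensor = std::vector<float>;

// Layer 1: a "kernel" implementing one math operation (naive MxK * KxN matmul).
void matmul_kernel(const Tensor& a, const Tensor& b, Tensor& c,
                   std::size_t M, std::size_t K, std::size_t N) {
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k)
                acc += a[m * K + k] * b[k * N + n];
            c[m * N + n] = acc;
        }
}

// Layer 2: a library op that packages the kernel behind a convenient function.
Tensor matmul(const Tensor& a, const Tensor& b,
              std::size_t M, std::size_t K, std::size_t N) {
    Tensor c(M * N);
    matmul_kernel(a, b, c, M, K, N);
    return c;
}

// Layer 3: a "framework" composes library ops into model building blocks.
Tensor linear(const Tensor& x, const Tensor& w,
              std::size_t M, std::size_t K, std::size_t N) {
    return matmul(x, w, M, K, N);  // bias and activation omitted for brevity
}

int main() {
    Tensor x = {1, 2, 3, 4};   // 2x2 input
    Tensor w = {1, 0, 0, 1};   // 2x2 identity weight matrix
    Tensor y = linear(x, w, 2, 2, 2);
    std::printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]);   // prints: 1 2 3 4
}

Swap the std::vector layer for real device buffers and hardware kernels and you get the picture the diagram is painting.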

As for the "open-source" part, Jim had previously touted their intention to open source this stuff. I don't know if that's still the plan, and maybe time-to-market concerns just de-prioritized that aspect? Or, maybe they made a strategic decision to do otherwise. Would be interesting to know.

Provided I am not mistaken that's:

1. Trying to get a vendor lock-in on customers
2. Being open-source in name but not in spirit (leeching from other frameworks, not giving anything back)

Thanks for posting that, now I dislike him even more.
Okay, but what if you are mistaken? Why rush to a conclusion without lining up the facts?

Also, let's say it's no longer planned to be open-sourced. How would that be worse than CUDA?

Finally, let's not forget the context, here. Jim is concerned about AI accelerators, not HPC or other domains CUDA is intended to serve. CUDA is much more general than what you need for an AI accelerator, and that generality doesn't come without costs (i.e. in terms of performance, efficiency, and literal cost). I think that's another way to see the "swamp" analogy - that Nvidia is bogging down its AI accelerators with the generality needed to support CUDA.
 
That doesn't guarantee that what he says will make sense.

He is supposedly an expert in hardware architectures, not software architectures.

For APIs like CUDA (or even Win32) it is crucial to have backward compatibility because applications and services depend on it.

For CPUs, you can use RISC or any other architectural design internally as long as what you expose on the outside can still execute x86 code, so you have more freedom there.


Good point, thanks for reminding me to bring x64 up.

x64 was "piled up on x86" in much the same way (by adding an instruction prefix and widening the register file) as what Intel did when transitioning from 16-bit to 32-bit. Yet I don't hear anybody dissing Keller's own work on that particular dirty snowball.
True, Keller isn't infallible. But mostly, we're probably reading too much into him complaining about legacy compatibility.
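
For anyone curious what "an instruction prefix" means in practice, here's the gist at the machine-code level. The snippet is just a compilable C++ holder for the raw bytes; the encodings themselves are the standard x86 / x86-64 ones.

C++:
#include <cstdio>

// 32-bit form, essentially unchanged since the 386:
//   01 D8            add eax, ebx
const unsigned char add_eax_ebx[] = { 0x01, 0xD8 };

// x86-64 form: same opcode (01) and same ModRM byte (D8), with a one-byte
// REX.W prefix (0x48) in front to widen the operation to 64-bit registers.
// Other REX bits (R/X/B) are what expose the extra registers r8-r15.
//   48 01 D8         add rax, rbx
const unsigned char add_rax_rbx[] = { 0x48, 0x01, 0xD8 };

int main() {
    std::printf("add eax, ebx is %zu bytes; add rax, rbx is %zu bytes\n",
                sizeof(add_eax_ebx), sizeof(add_rax_rbx));
    return 0;
}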
 

CmdrShepard
I had some similar thoughts. It's probably worth actually diving into the details, because some of their comments suggest otherwise.
Talk is cheap.
As for the "open-source" part, Jim had previously touted their intention to open source this stuff. I don't know if that's still the plan, and maybe time-to-market concerns just de-prioritized that aspect? Or, maybe they made a strategic decision to do otherwise. Would be interesting to know.
Or it is what I suggested. A cash grab and posturing in front of investors.
Okay, but what if you are mistaken? Why rush to a conclusion without lining up the facts?
Because he presented no facts about CUDA and wanted me to take his word at face value? Why is that OK for him and not for me?
Also, let's say it's no longer planned to be open-sourced. How would that be worse than CUDA?
Ever heard of hypocrisy? "Those who are without sin should throw the first stone" and all that?
Finally, let's not forget the context, here. Jim is concerned about AI accelerators, not HPC or other domains CUDA is intended to serve. CUDA is much more general than what you need for an AI accelerator, and that generality doesn't come without costs (i.e. in terms of performance, efficiency, and literal cost). I think that's another way to see the "swamp" analogy - that Nvidia is bogging down its AI accelerators with the generality needed to support CUDA.
There's nothing wrong with CUDA being more general. I think it is an advantage. You were able to process AI workloads with CUDA cores even before NVIDIA added Tensor cores in hardware. You were also able to process raytracing workloads with CUDA cores even before NVIDIA added RT cores. If anything, having a unified, general architecture lets them figure out which parts of the workflow would benefit the most from hardware acceleration, and how. So far they are doing just fine by dedicating small bits of expensive silicon where it gets the biggest possible gains.
 

bit_user
Talk is cheap.
So are forum posts!
; )

Or it is what I suggested. A cash grab and posturing in front of investors.
If the investors care about their open source status & plans, I'm sure they can ask. I think it's probably customers who have more of a stake in the matter.

Because he presented no facts about CUDA and wanted me to take his word at face value? Why is that OK for him and not for me?
That feels like whataboutism.

Your concern about vendor lock-in via their proprietary API is certainly valid, for customers who need to access the hardware at that level. IMO, the main benefit of the tools being open-sourced is just to help avoid the hardware turning into a brick, if Tenstorrent ceases operations or undergoes a strategic shift that results in them prematurely dropping support for existing products.

There's nothing wrong with CUDA being more general.
Nothing morally wrong, but the main concern is whether it forces the hardware to be less optimized, cost-effective, or efficient for its intended use case.

I think it is an advantage.
It increases total addressable market size, but that only benefits the manufacturer and customers looking for a general solution. For those who have a very specific purpose and RoI case, the price of unnecessary generality might be significant.

You were able to process AI workloads with CUDA cores even before NVIDIA added Tensor cores in hardware. You were also able to process raytracing workloads with CUDA cores even before NVIDIA added RT cores.
CPUs could do these things and a whole lot more. See, generality has tradeoffs!

As for ray tracing, the only real value it had on GPUs without RT cores is as a development vehicle. Performance was generally too low to be playable, making it almost irrelevant for gamers.

If anything, having a unified, general architecture lets them figure out which parts of the workflow would benefit the most from hardware acceleration, and how. So far they are doing just fine by dedicating small bits of expensive silicon where it gets the biggest possible gains.
Generality is a win, if you're either doing a variety of stuff with the hardware or you aren't initially sure what your needs will ultimately be.

What matters for Tenstorrent's customers and investors is whether their products are successful and competitive in their ability to solve real customer problems. If vague concerns about CUDA are getting in the way of that, I think Jim's comments are justified.
 

CmdrShepard
CPUs could do these things and a whole lot more. See, generality has tradeoffs!
Could, but not at GPU speed.
As for ray tracing, the only real value it had on GPUs without RT cores is as a development vehicle. Performance was generally too low to be playable, making it almost irrelevant for gamers.
I feel like I should have clarified what I meant by ray tracing.

I was talking about professional 3D applications and renderers. Even with just CUDA cores they were at least an order of magnitude ahead of a CPU, if not much more.
Generality is a win, if you're either doing a variety of stuff with the hardware or you aren't initially sure what your needs will ultimately be.
Yeah, and if you decide you need some other form of number crunching in addition to tensors, with Tenstorrent hardware you wake up to the painful realisation that you have bought into an expensive single-purpose brick.
If vague concerns about CUDA are getting in the way of that, I think Jim's comments are justified.
His comment isn't justified because he didn't provide any justification to back up his claim.

I guess what I am trying to say, but it doesn't seem to be getting through to you, is this -- he doesn't have to push others into <Mod Edit> and stand on their shoulders in order to look cleaner.
 

bit_user
Could, but not at GPU speed.
Right. Hence, my point: the generality vs. efficiency tradeoff.

Yeah, and if you decide you need some other form of number crunching in addition to tensors, with Tenstorrent hardware you wake up to the painful realisation that you have bought into an expensive single-purpose brick.
It's sold as AI hardware, though. You don't buy a sports car and then throw a tantrum when you discover it has no way to mount a snow plow on it!

His comment isn't justified because he didn't provide any justification to back up his claim.
Failure to provide supporting evidence doesn't make a claim wrong; it just makes the argument a bad one.

I happen to agree with him. It's fine if you don't. I really don't care, either way.

I know two people who work at Nvidia, both of whom I really like and respect. I also think they have the best-in-class hardware and software, as well as some leading AI researchers. I can believe all of those things and still take issue with some of their business practices, including their strategy around CUDA. As a matter of fact, I have fewer issues with CUDA, itself, than I do with Nvidia's foot-dragging on OpenCL support, which is why my next dGPU will probably be Intel.

BTW, I could even imagine Jim complaining that OpenCL is a swamp, and I could accept that. Just because I prefer it as my GPGPU standard doesn't mean it's the most appropriate way to program every AI accelerator.
 

CmdrShepard
It's sold as AI hardware, though.
So are NVIDIA's CUDA-capable cards. Your point?

Regarding your car analogy, it is more like buying a sports car that can only be driven on one race track, not others, and absolutely can't be driven on regular roads.
Failure to provide supporting evidence doesn't make a claim wrong; it just makes the argument a bad one.
It also suggests that the argument has been made in bad faith. To me that's more important.
I can believe all of those things and still take issue with some of their business practices, including their strategy around CUDA.
You are shifting the goal posts.

First you had issue with CUDA, now it's about NVIDIA's business practices. What's next, you'll take issue with Jensen's leather jacket or kitchen fetish?
As a matter of fact, I have fewer issues with CUDA, itself, than I do with Nvidia's foot-dragging on OpenCL support, which is why my next dGPU will probably be Intel.
Apple has all but abandoned OpenCL. Khronos Group released the 3.0 specification back in 2020, I think? What exactly is your complaint about NVIDIA and OpenCL? Is there a feature their implementation is missing, or what? I hate vague, unsubstantiated complaints like Keller's about CUDA and x86, and yours, hence this discussion.

Maybe if he had explained it, I would even have agreed with him? Too bad I don't know what I am supposed to be agreeing with. As you have seen with "big name in X says Y is bad, trust him", I vehemently disagree because I wasn't raised to trust people on the basis of who they are (i.e. authority).
 

bit_user
You are shifting the goal posts.

First you had issue with CUDA, now it's about NVIDIA's business practices. What's next, you'll take issue with Jensen's leather jacket or kitchen fetish?
No, it's their business practices which I see underlying some of my key concerns surrounding CUDA - the fact that they've kept it closed source (unlike AMD's HIP and Intel's oneAPI) and the fact that (unlike Intel) they've dragged their feet on OpenCL support.

Apple has all but abandoned OpenCL.
That's been true for more than a decade. I'm not an Apple fan either, BTW.

Khronos Group released the 3.0 specification back in 2020, I think? What exactly is your complaint about NVIDIA and OpenCL?
Until OpenCL 3.0, they remained stuck at 1.2, in spite of having a beta implementation of 2.x that they never pushed over the line into "general release" status. The supposed reason for them remaining stuck at 1.2 was OpenCL 2.0's requirement of SVM, yet CUDA had been implementing the same sorts of features and someone even demonstrated an adapter that could run OpenCL 2.x code atop CUDA.

Furthermore, they didn't support OpenCL on their SoCs, which is even more shady.

Is there a feature their implementation is missing, or what? I hate vague, unsubstantiated complaints like Keller's about CUDA and x86, and yours, hence this discussion.
Even their 3.0 support is highly questionable, since what it does is essentially make a lot of previously-mandatory features optional.
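
To make the SVM point concrete, here's a minimal, untested sketch (assuming an OpenCL 2.0+/3.0 SDK and the standard CL/cl.h C API) of what shared virtual memory looks like from the host side, and of the kind of capability query that OpenCL 3.0 turns from "guaranteed" into "ask first":

C++:
// Minimal sketch: query SVM support, then allocate a buffer the host and
// device can address through the same pointer. Untested; error handling trimmed.
#define CL_TARGET_OPENCL_VERSION 300
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_platform_id platform;
    cl_device_id device;
    if (clGetPlatformIDs(1, &platform, nullptr) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr) != CL_SUCCESS) {
        std::printf("no OpenCL GPU found\n");
        return 1;
    }

    // Under OpenCL 3.0, SVM is optional, so a portable app has to ask.
    cl_device_svm_capabilities svm = 0;
    clGetDeviceInfo(device, CL_DEVICE_SVM_CAPABILITIES, sizeof(svm), &svm, nullptr);
    if (!(svm & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER)) {
        std::printf("device reports no SVM support\n");
        return 0;
    }

    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue queue =
        clCreateCommandQueueWithProperties(ctx, device, nullptr, nullptr);

    // Shared Virtual Memory: one pointer, usable by the host here and by a
    // kernel via clSetKernelArgSVMPointer() (kernel omitted for brevity).
    float* buf = static_cast<float*>(
        clSVMAlloc(ctx, CL_MEM_READ_WRITE, 1024 * sizeof(float), 0));
    if (buf) {
        clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_WRITE, buf, 1024 * sizeof(float),
                        0, nullptr, nullptr);
        buf[0] = 42.0f;                       // plain pointer write from the host
        clEnqueueSVMUnmap(queue, buf, 0, nullptr, nullptr);
        clFinish(queue);
        std::printf("SVM allocation works\n");
        clSVMFree(ctx, buf);
    }

    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}

If that capability query comes back empty on hardware advertising OpenCL 3.0, that's exactly the kind of "optional" I'm talking about.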

It's kinda funny that you attack me for even mentioning Nvidia's business practices, but then you leap at OpenCL like catnip. Well, if we're going to stay on topic, then let's stay on topic.

"big name in X says Y is bad, trust him"
He said it's a "swamp", which has negative connotations but it's a richer analogy than you suggest. And I never said "believe it because it's Jim K.". I always said two things:
  1. I agree that it's a swamp, as far as I understand the analogy (which I've explained).
  2. (in response to attacks on him) I defend his legitimacy to even make such statements.

I vehemently disagree because I wasn't raised to trust people on the basis of who they are (i.e. authority).
Well, you've made it abundantly clear that you don't consider him qualified to voice an opinion on CUDA, in spite of his experience leading self-driving chip development at Tesla and now running an AI startup. I accept you feel that way, so can we move on?
 

CmdrShepard
No, it's their business practices which I see underlying some of my key concerns surrounding CUDA - the fact that they've kept it closed source (unlike AMD's HIP and Intel's oneAPI) and the fact that (unlike Intel) they've dragged their feet on OpenCL support.
Parts of CUDA toolkit (PTX in particular) are closely tied to their GPU architecture. You don't really expect them to open-source that and reveal all their trade secrets?

I agree they could've open-sourced, say, the NPP libraries built on CUDA, but then again Intel didn't open-source the IPP libraries either, so... meh.
Even their 3.0 support is highly questionable, since what it does is essentially make a lot of previously-mandatory features optional.
You still didn't name a single feature you need supported in their OpenCL implementation which isn't supported.
It's kinda funny that you attack me for even mentioning Nvidia's business practices, but then you leap at OpenCL like catnip.
I am not attacking you at all.

I am just asking for citations, the same as I expected to see from Jim Keller.
Well, you've made it abundantly clear that you don't consider him qualified to voice an opinion on CUDA
I never said that so please stop putting words in my mouth.

All I said is he never offered any PROOF that he is qualified to voice an opinion on CUDA.

You know, by giving us some actual examples of what he considers a swamp instead of relying on his authority in the AI field to * on two extremely popular general architectures.
 

bit_user
Parts of CUDA toolkit (PTX in particular) are closely tied to their GPU architecture. You don't really expect them to open-source that and reveal all their trade secrets?
AMD and Intel open-sourced their entire GPU software stacks. AMD even publishes ISA documents for their GPUs, openly. I'm sure there's some non-public information that's only shared under NDA, but probably a lot can be gleaned by closely inspecting their drivers and toolchains.

I agree they could've open-sourced, say, the NPP libraries built on CUDA, but then again Intel didn't open-source the IPP libraries either, so... meh.
That's a CPU library, so not directly equivalent. x86 is an open architecture, so anyone can program for it without using IPP. The same is not true of GPUs, in general, and especially Nvidia GPUs.

All I said is he never offered any PROOF that he is qualified to voice an opinion on CUDA.

You know, by giving us some actual examples of what he considers a swamp instead of relying on his authority in the AI field to * on two extremely popular general architectures.
Okay, there it is, in your words.
 