Nvidia brags all the time about having more software engineers than hardware engineers. It's why CUDA is winning so badly right now, and why DLSS is winning. AMD always seems to fall back on hardware first. Lisa Su was involved with the PS3, a platform that was notoriously difficult to program for and was made so much worse by the lack of proper software support. And even now, over a decade later, when nearly everyone agrees that, yes, the PS3 was an interesting design, ahead of its time in some ways but severely lacking in software support... Su doesn't seem to want to acknowledge that.
I do think that's a reach.
Very interesting read! Thanks for sharing! TBH, I wasn't very surprised, although I had expected software support for their flagship CDNA product would be in better shape than what we see with ROCm on consumer GPUs.
I also read part of the interview the commentator is talking about. It's quite long, though. I think the commentator has some points, but I diverge partway through their post. I think it's most likely that Lisa was being guarded so as not to give away any mea culpa quotes. It's fair to say that AMD underestimated and under-resourced its software efforts, but I doubt there's as much lack of awareness or self-reflection as the commentator suggests.
Furthermore, it's quite obvious that AMD does care about ROCm and does care about FSR. And I think the ROCm vs. PyTorch quote was taken out of context.
IMO, the core problem is this: why did AMD think it could beat Nvidia at its own game? That has been notoriously difficult even in the 3D graphics realm, which has been a focus for far longer. And you really have to ask yourself some hard questions about your tactics if this is going to be your approach. It's a David and Goliath battle, meaning you have to be more clever, more resourceful, and try to find every possible advantage you can. To approach it as a mere slug-fest is a sure way to lose.
The solution, IMO, is that AMD needs to hire about 10X more software people for its GPU division. I don't know, maybe only 3X would suffice, but whatever it has right now isn't enough.
10x seems way over, but I could easily see them being off by a factor of 2. Maybe more, if their strategy really is just to beat Nvidia with its own playbook.
BTW, AMD did recently open new engineering offices in Serbia. I would note that a lot of tech talent has fled there from Russia over the past few years.
And even if it hired, say, 5,000 people today, it would take months to get them all up to speed. So this problem isn't going away any time soon...
I think Wall St. has figured this out, which is why we saw the correction in AMD's stock price, and it's been on a long, slow slide since last Oct.
Gee, I don't know why devs prefer CUDA to ROCm!
They always will. The fundamental problem is that ROCm's HIP will always be an off-brand CUDA wannabe. AMD will always be playing catch-up, and basically the best it can do is be just as good as CUDA - maybe a little faster.
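To make the "off-brand CUDA" point concrete, here's a minimal HIP sketch (illustrative, untested): the kernel body is line-for-line CUDA, and the runtime calls are the cuda* functions with the prefix swapped to hip*, which is why AMD's hipify tools can do much of a port by renaming.

    #include <hip/hip_runtime.h>  // HIP's analogue of cuda_runtime.h

    // Identical to a CUDA kernel: same __global__, blockIdx/blockDim/threadIdx.
    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x = nullptr, *y = nullptr;
        // hipMalloc/hipMemcpy/hipFree mirror cudaMalloc/cudaMemcpy/cudaFree 1:1.
        hipMalloc(&x, n * sizeof(float));
        hipMalloc(&y, n * sizeof(float));
        // (Host-to-device copies via hipMemcpy omitted for brevity.)
        // Even the triple-chevron launch syntax is carried over from CUDA.
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        hipDeviceSynchronize();
        hipFree(x);
        hipFree(y);
        return 0;
    }

A mirror API like this is a fine porting tool, but by construction it can only track where CUDA already went.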
That said, AMD isn't wrong to say that a lot of AI users don't directly use ROCm and don't really care what's under PyTorch, as long as it works and it's fast.
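And that abstraction is real, as far as I can tell: official ROCm builds of PyTorch expose HIP devices through the CUDA device type, so typical user code runs unchanged. A rough sketch of what that looks like from the C++ (libtorch) side, assuming either a CUDA or a ROCm build:

    #include <torch/torch.h>
    #include <iostream>

    int main() {
        // On a ROCm build this also reports true: PyTorch surfaces HIP
        // devices through the CUDA device type, so callers can't tell.
        auto device = torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;
        auto a = torch::randn({1024, 1024}, device);
        auto b = torch::matmul(a, a);  // dispatches to cuBLAS or rocBLAS underneath
        std::cout << b.sizes() << "\n";
        return 0;
    }

From that user's seat, the vendor only shows up as "does it run and how fast" - which is AMD's point.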
P.S. I think maybe AMD had the wrong strategy. Perhaps they could've sued Nvidia for using its monopoly power to lock customers out of other hardware. A victory there could've enabled AMD to integrate with Nvidia's CUDA stack at the PTX level; since PTX is Nvidia's intermediate GPU instruction set, an AMD backend for it could consume the output of the existing CUDA toolchain. That gets AMD mostly out of the software race, which is where they've had the most trouble. However, I see little chance of that happening now.