You're confusing different things. What you're describing is iteration on a design that isn't burdened with the legacy of backward compatibility. In that case, you can remove or streamline things that don't work or are needlessly complex, find better ways to do things, and optimize the design to emphasize them.
What CUDA has to deal with is supporting 17+ years' worth of just about every idea they ever had, good or bad, regardless of how it interacted with the other features of the language/API. In that sense, it's the C++ of GPU compute APIs: heavily evolved and needlessly complex for what it does.
CUDA is basically like a dirty snowball, rolling down a hill, accumulating random cruft as it rolls along. You can't pull anything out of it, for fear of breaking some codebase or another, so it all just accumulates.
I did a little CUDA programming around 2010 or so, before I picked up a book on OpenCL and started dabbling with it. OpenCL immediately seemed so much cleaner and more self-consistent. It's like they took all of the core ideas that had been proven in CUDA and other GPGPU frameworks and re-implemented them with a blank slate. That's one thing I like about OpenCL and SYCL.
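To make that concrete, here's a minimal vector-add sketch (the kernel name vadd and the launch configuration are made up for illustration, not taken from the thread). The device-side code is nearly identical in CUDA and OpenCL C; what OpenCL rebuilt from a blank slate was mostly the host-side model (platforms, contexts, command queues, runtime compilation).

```cpp
// CUDA: one thread per element (hypothetical example kernel)
__global__ void vadd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}
// Host launch: vadd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

// The same kernel in OpenCL C -- almost the same on the device side;
// the real differences live in the host API, which OpenCL redesigned
// from scratch:
//
//   __kernel void vadd(__global const float *a, __global const float *b,
//                      __global float *c, int n) {
//       int i = get_global_id(0);
//       if (i < n) c[i] = a[i] + b[i];
//   }
```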
And what have you done to elevate yourself to such a vaunted status where we should take your word over that of such an industry luminary?
LinkedIn jobs for CUDA: 1699 (352 are Nvidia openings)
LinkedIn jobs for OpenCL: 333 (73 are Nvidia openings)
LinkedIn jobs for DirectCompute / DirectML: 3
If you want platform independence ... OpenCL or DirectCompute/DirectML.
If you want time to market, market availability, hosting, toolchain support, performance, and programming resources ... CUDA and OpenCL (NVIDIA GPUs are the only commercially hosted option for OpenCL) ... feature support in the OpenCL SDKs tends to lag CUDA, so if performance or TPU support matters to you, it's still CUDA (see the sketch after this list).
... AWS still hosts Kepler-era (~2012) instances, so there's no need to port working software.
... OpenCL on AWS is mostly supported via P3 instances (NVIDIA).
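Assuming "TPU support" above refers to NVIDIA's tensor cores, here's a rough sketch of the kind of feature that shows up in CUDA first and has no portable OpenCL counterpart: the warp-level WMMA API for tensor-core matrix multiply. The kernel name and fixed 16x16x16 tile are illustrative only; it needs sm_70 or newer hardware.

```cpp
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes a single 16x16x16 matrix-multiply-accumulate
// on tensor cores (C = A * B + C). Compile for sm_70 or newer.
__global__ void wmma_tile(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);          // start accumulator at zero
    wmma::load_matrix_sync(a_frag, a, 16);      // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
```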
Clean-slate ecosystems are nice, but the practical reality is that many have tried:
Graphcore
Larrabee
Close to Metal
ROCm
GPUOpen
Keller is a genius, but overcoming 18 years of commitment and ~10 learning cycles (major architecture iterations) will be difficult.