The only issue I have is that software takes a very long time to catch up. Core utilization in the mainstream consumer market is not going to scale as fast as it does in HPC markets.
I still don't see an advantage to 16 cores for the vast majority, and very little for enthusiasts outside of gamers who also want to stream.
What's needed is a major boost to IPC until software, down to the core like the OS, can actually utilize multiple cores efficiently enough.
I think the mainstream market is actually doing pretty well within its technical limitations (Amdahl's Law).
Operating systems are actually using multiple threads, in that most kernels handle interrupts and system calls on every core/hardware thread without a global lock (for most operations), and auxiliary services are split into multiple processes that spread out across all available cores.
But an OS is just an enabler for the software people actually want to run, and most mainstream OSes do that just fine.
On the gaming front, I think we are seeing a pretty fast adoption of higher core counts. So much so, that the common recommendation for a minimum decent gaming system has gone from "quad core i5 with no HT is fine" to "at least 6c/6t but take 6c/12t if you can spare the cash" in just two years.
On the desktop front, we've seen all major browsers adopting multithreaded DOM rendering and Javascript execution, and offloaded auxiliary tasks (IO etc.) in recent years (usually with no more than one thread per page because that's good enough for now).
Besides that, most mainstream workloads barely need one core, so there's really no reason to scale out.
The ecosystem will work with what it gets. Moaning about the absence of 16-core optimization not even half a year after the first 16-core mainstream CPU ever launched is not fair.
The big problem with more cores is that tons of everyday algorithms don't lend themselves to being split into multiple threads cleanly or efficiently, and those will always favor CPUs with high single-thread performance (e.g. core control logic for games), regardless of how much more AVX-friendly work (like AI and physics) gets tacked on top of it.
Tons of everyday workloads actually can be multithreaded, because most of them are made up of more than just one instance of one sequential algorithm.
Even a single hardware thread is exploiting opportunities for parallelization in "singlethreaded" code. We see this in vector instructions, instruction reordering, pipelining and superscalar execution.
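To make the contrast concrete, here is a minimal sketch (the function names are mine, purely for illustration):

```cpp
#include <cstddef>
#include <vector>

// Each iteration depends on the previous result, so this can't simply be
// split across threads; only a faster single thread (or better ILP) helps.
double dependent_chain(const std::vector<double>& xs) {
    double state = 0.0;
    for (double x : xs)
        state = state * 0.5 + x;   // next value needs the previous one
    return state;
}

// Each iteration is independent, so the compiler can vectorize it and the
// range can trivially be split across threads.
void independent_scale(std::vector<double>& xs, double factor) {
    for (std::size_t i = 0; i < xs.size(); ++i)
        xs[i] *= factor;
}
```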
The biggest hurdles I see there are tools to make multithreaded development easily accessible, and efficient communication between threads.
The former is being addressed over time.
For example, Go has introduced an easy approach to multithreading with goroutines and channels that makes racy code much less likely.
C++ took until 2011 to finally provide a standardized threading library and has just laid the groundwork for "easy" multithreading with execution policies in C++17 (sketched below), with more on the way for C++20 and 23.
Unity's relatively new ECS and Job System provide the foundation to reorganize monolithic code into discrete jobs that can be executed by multiple threads.
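As a taste of the C++17 direction specifically, this is roughly what a parallel algorithm call looks like; it assumes a standard library whose parallel algorithms are actually implemented (e.g. MSVC's, or libstdc++ backed by TBB):

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> data(1'000'000);
    std::iota(data.begin(), data.end(), 0.0);

    // The execution policy lets the standard library decide how to spread the
    // work across threads (and possibly SIMD lanes); no manual thread handling.
    std::transform(std::execution::par_unseq,
                   data.begin(), data.end(), data.begin(),
                   [](double x) { return x * x; });

    std::sort(std::execution::par, data.begin(), data.end());
}
```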
The latter is something to keep in mind while developing. Communication between cores is relatively slow (in terms of CPU cycles), so there is a tradeoff between execution resources (cores) and communication overhead.
With more and more cores requiring more complex on-chip networks and longer wires, this is only going to get worse.
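A toy sketch of that tradeoff (again, the function names are mine): in the first version every increment bounces the shared cache line between cores, so adding threads can actually make it slower, while the second keeps the work thread-local and merges results once at the end.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// All threads hammer one shared atomic: constant inter-core communication.
void shared_counter(std::uint64_t n, unsigned threads) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (std::uint64_t i = 0; i < n; ++i)
                total.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
}

// Each thread works on its own counter: one write per thread at the very end.
std::uint64_t local_counters(std::uint64_t n, unsigned threads) {
    std::vector<std::uint64_t> partial(threads, 0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&partial, t, n] {
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < n; ++i)
                ++local;              // stays in a register / local cache
            partial[t] = local;
        });
    for (auto& th : pool) th.join();
    std::uint64_t total = 0;
    for (auto p : partial) total += p;
    return total;
}
```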
And then of course, there is necessity. Nothing gets done at scale unless it's beneficial in some way. A lot of everyday workloads are too simple to even bother with multithreading.
Now that even Intel embraces higher core counts again because they are hitting a brick wall, we'll see more investment in multithreading because there is just no other way.
And with availability, we'll see more adoption as well in industries that just target whatever is there (like gaming).
I think it is a major mistake for AMD not to support Intel's oneAPI. This could be the most significant change in software development in at least a decade: a single development API across CPU and GPU, and across multiple levels of demand.
Papermaster didn't specifically deny oneAPI support in the future. He just said that AMD is already well into this topic and they are doing basically the same with their ROCm work to assure the readers that AMD is no stranger to heterogeneous compute.
But oneAPI is a competitor's initiative, so AMD will want to see whether it takes off and then jump on the bandwagon if necessary like they did with CXL.
oneAPI is a high-level programming model and API specification, agnostic to the underlying implementation stack, so it can most likely be bolted on top of the ROCm stack (in parallel to or on top of HIP).
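To make that concrete, here is roughly what device-agnostic single-source code looks like in SYCL (the model oneAPI's DPC++ builds on, and what hipSYCL exposes on top of ROCm); a minimal sketch, not tied to either vendor's stack:

```cpp
#include <CL/sycl.hpp>
#include <vector>

namespace sycl = cl::sycl;

int main() {
    std::vector<float> data(1024, 1.0f);

    // One single-source kernel; the selector picks a CPU, GPU or other
    // accelerator at runtime, the code itself doesn't change.
    sycl::queue q{sycl::default_selector{}};
    {
        sycl::buffer<float, 1> buf(data.data(), sycl::range<1>(data.size()));
        q.submit([&](sycl::handler& h) {
            auto acc = buf.get_access<sycl::access::mode::read_write>(h);
            h.parallel_for<class scale_kernel>(
                sycl::range<1>(data.size()),
                [=](sycl::id<1> i) { acc[i] *= 2.0f; });
        });
    } // buffer goes out of scope: kernel finishes, results copy back to data
}
```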
The biggest problem with ROCm is that the entire stack is atrociously immature. For example, the entire HCC path was just dropped, Navi and APUs are still not supported, a SYCL wrapper is only provided by a third-party developer, and don't even think about Windows support.
It often feels like there is maybe one developer at AMD working on it part-time.