Feel free to believe all the marketing ads. I'd rather see the real independent tests first.
Yes, I will believe Intel's claims, because these seem very credible:
- "APX-compiled code contains 10% fewer loads and more than 20% fewer stores"
- there are 10% fewer instructions in APX-compiled code
The thing you're missing about move-elimination is that mov instructions still have costs, which correlate to that 10% figure:
- Wasting memory bandwidth, since they have to be fetched from DRAM.
- Wasting space in the instruction cache.
- Wasting instruction decoder bandwidth.
That's why it's more beneficial not to have them in the first place, even though you can mitigate their cost through register renaming. The further upstream you can eliminate an instruction, the better.
I don't doubt that APX is good. The questions are how good it is and how much impact it would have on real software.
That's tested easily enough. You can do a simple experiment by restricting a compiler from using certain features on an existing CPU which has them, but you can also model your hypothetical CPU in a simulator and update the compiler to match. I'm certain the latter is how they arrived at the above numbers, because it's standard practice to have a cycle-accurate software simulation of modern CPUs (usually written in a language like C++), before you build them. That should be easy enough to modify, as should a compiler like GCC or LLVM.
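As a rough sketch of the first kind of experiment (restricting the compiler on a CPU you already have), something like the following would do. It assumes GCC on x86-64, uses GCC's `-ffixed-<reg>` option to take four registers away from the allocator, and counts instructions in the generated assembly. The test kernel, the flag list, and the helper names are all made up for illustration, and the "mem" count is only a crude proxy for loads and stores (explicit memory operands, ignoring push/pop/call). With a new enough GCC you could presumably swap the restricted flags for `-mapxf` and compare in the other direction, but I haven't checked that here.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical register-hungry kernel; any function with many live values would do.
C_SOURCE = """
void kernel(const long *a, const long *b, long *out, long n) {
    for (long i = 0; i + 4 <= n; i += 4) {
        long s0 = a[i] * b[i + 3], s1 = a[i + 1] * b[i + 2];
        long s2 = a[i + 2] * b[i + 1], s3 = a[i + 3] * b[i];
        out[i] = s0 + s1; out[i + 1] = s1 - s2;
        out[i + 2] = s2 + s3; out[i + 3] = s3 - s0;
    }
}
"""

# Assumption: GCC on x86-64. These flags reserve four callee-saved registers.
RESTRICTED_FLAGS = ["-ffixed-r12", "-ffixed-r13", "-ffixed-r14", "-ffixed-r15"]


def summarize(asm):
    """Count instructions in AT&T-syntax assembly, skipping labels and directives."""
    total = movs = mem = 0
    for line in asm.splitlines():
        code = line.split("#", 1)[0].strip()  # drop trailing comments
        if not code or code.startswith(".") or code.endswith(":"):
            continue
        parts = code.split(None, 1)
        mnemonic = parts[0]
        operands = parts[1] if len(parts) > 1 else ""
        total += 1
        if mnemonic.startswith("mov"):
            movs += 1
        # lea uses memory-operand syntax but never touches memory
        if "(" in operands and not mnemonic.startswith("lea"):
            mem += 1
    return {"total": total, "mov": movs, "mem": mem}


def compile_to_asm(source, extra_flags, cc="gcc"):
    """Compile C source to assembly text with -O2 plus any extra flags."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "kernel.c"
        src.write_text(source)
        result = subprocess.run(
            [cc, "-O2", "-S", "-o", "-", *extra_flags, str(src)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout


if __name__ == "__main__":
    if shutil.which("gcc") is None:
        raise SystemExit("gcc not found; this sketch needs it on PATH")
    print("baseline:  ", summarize(compile_to_asm(C_SOURCE, [])))
    print("restricted:", summarize(compile_to_asm(C_SOURCE, RESTRICTED_FLAGS)))
```

Static counts like these only tell you about code size and instruction mix. For the dynamic load/store numbers Intel quotes, you'd need to run the binaries under a simulator or hardware counters.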
Here is a real example of how much new instructions can help speed, and how long it takes to actually introduce them in software, in exactly the Python you mentioned:
https://www.phoronix.com/news/Intel-AVX-512-Quicksort-Numpy That's just 7.5 years after the first hardware became available.
That's a terrible analogy, because they had to hand-code those optimized routines. For the kind of enhancements that Intel is adding via APX, all you have to do is flip a switch and the compiler automatically utilizes them.
Also, no one can guarantee that Intel won't do something stupid again like limiting APX to Xeons only like they do with TSX now.
I thought they cancelled TSX. Previously, it was available on Skylake-era client CPUs. They subsequently released microcode patches that disabled it for security reasons.
And look at TSX: it's a truly groundbreaking technology (unlike the merely incremental benefits of APX) that almost no one had/has.
Heh, ARM has similar functionality in TME (Transactional Memory Extensions), which is included in ARMv8.5-A and ARMv9-A
So, if you like that, you should welcome our new ARM overlords!
; )
Sadly, so little software uses it that almost no one noticed Intel removing it from client CPUs.
glibc used it to optimize pthreads mutexes and maybe some other stuff. Some databases used it. Maybe the kernel used it for futexes? I was sad to see it go.
The former is actually a far bigger risk nowadays – licensing risk. Just look at Qualcomm. Pretty much no one doubts the bright future of RISC-V; the only question is when.
Again, you're assuming (incorrectly, I might add) that AMD and Nvidia didn't have architectural licenses, before ARM started trying to extort Qualcomm! AMD definitely had one, since it would've been needed for designing the K12, which they were rumored to have even considered releasing as late as 2020. Nvidia had one for its Denver cores, and I remember reading they renewed it in the past couple of years. I've read those licenses are good for a decade or so.