I bet that the vote of confidence comes from AMD's aspirations of making ARM CPUs. The latest Linux benchmarks show that Cortex-A15 cores deliver about 75% of the per-core IPC of Intel's CPUs. Of course, an A15 core is well under 75% of the size of an Intel core, and well under 75% of the power consumption, so throw about 64 of those A15 cores into a 125 W server CPU using AMD's GPU memory tech, and Intel has no chance.
This is the inevitable failure of CISC vs. RISC. CISC relies on compilers to take code and find a way to fit it into instructions. Unfortunately, that's a losing proposition once you get into these giant 256- and 512-bit instructions: the compiler always finds itself trying to fit a square peg into a round hole. The FMA hardware can probably be pushed to 1024 bits and beyond for FP math like convolution, FFT, and FIR filtering, but that kind of code is never going to be more than 10% of any application, so you're investing huge amounts of effort into perhaps a 5% speedup for a select few apps.
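The 10%-of-the-app / 5%-speedup arithmetic is just Amdahl's law. A quick sketch (the numbers here are illustrative assumptions, not measurements):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of runtime is accelerated by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# Suppose wider FMA units double the speed of the vectorizable
# 10% of an application (hypothetical figures):
overall = amdahl_speedup(0.10, 2.0)
print(round(overall, 3))  # 1.053 -- roughly a 5% whole-program speedup
```

Even letting the vectorizable fraction get infinitely fast only caps the gain at about 11% here, which is why pouring silicon into ever-wider vector units pays off so poorly for general-purpose code.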