A microcode patch is an easy fix; it would not have taken half a year to investigate and a few more months to roll out. A bug in the voltage-control microcode may be a sufficient condition to cause the symptoms, but that does not yet make it "the" root cause. I believe there are more necessary conditions behind the root cause, and they are most likely physical rather than software/algorithmic.
One example I have in mind is how often these chips switch between ratio multipliers of ~20 and ~50, many times per second; the corresponding voltage jump is close to 1V, from ~0.6V to 1.56V. Since each of the 8 P-cores switches independently, the L3 cache and ring bus have to deal with all those independent voltages changing thousands of times per second across the cores. I suspect the ring bus and L3 cache simply cannot tolerate such large voltage jumps and differences all the time, so they degrade. Some say disabling the E-cores helped a bit. I suspect there is typically a large voltage gap at the point on the ring bus between the P-cores and E-cores, which is most likely where it gets hit hardest; disabling the E-cores keeps processes from crossing the defective site on the ring.
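To put rough numbers on the above, here is a small Python sketch. It only does arithmetic on the approximate figures quoted in this post (ratios ~20 to ~50, ~0.6V to 1.56V). The 100 MHz base clock, the helper names, and the per-core snapshot are my own assumptions for illustration, not measurements from a real chip.

```python
# Assumption: nominal 100 MHz base clock, so frequency = ratio * 0.1 GHz.
BCLK_GHZ = 0.1

# Approximate figures quoted in the post, not measured values.
LOW_RATIO, HIGH_RATIO = 20, 50
LOW_V, HIGH_V = 0.6, 1.56


def freq_ghz(ratio):
    """Core frequency in GHz for a given ratio multiplier."""
    return ratio * BCLK_GHZ


def worst_case_gap(core_voltages):
    """Largest voltage difference between any two cores in one snapshot."""
    return max(core_voltages) - min(core_voltages)


# Size of a single jump between the lowest and highest operating points.
swing = HIGH_V - LOW_V

# Hypothetical snapshot of 8 P-cores: two boosting, six near idle.
snapshot = [HIGH_V, LOW_V, LOW_V, HIGH_V, LOW_V, LOW_V, LOW_V, LOW_V]

print(f"frequency range: {freq_ghz(LOW_RATIO):.1f} to {freq_ghz(HIGH_RATIO):.1f} GHz")
print(f"single-core voltage swing: {swing:.2f} V")
print(f"worst-case gap between cores: {worst_case_gap(snapshot):.2f} V")
```

If the premise of independent per-core voltages holds, the gap between neighbours on the ring can be as large as the full single-core swing whenever one core boosts while another idles.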
The new microcode may help reduce the voltage differences across core nodes on the ring, but it will be more complicated to actually fix the frequency dynamics so that voltage changes put less stress on the ring, while still delivering the advertised 5.x GHz, which takes high voltage spikes to achieve.