patrickjp93 :
ARM can either make individual instructions take a bit less time (there's not much room for that, given how rudimentary the instruction set is in the first place) or improve the pipelining and branch prediction.
Don't forget out-of-order.
A lot of ARM's original power-efficiency fame came from ditching all this single-threaded performance stuff, but because so little software can leverage multi-threading properly, pretty much all ARM designers are going back to focusing on single-threaded performance and putting all that stuff back in.
Considering how kludgy the x86 instruction set is, I used to believe there would never be a half-decent x86 chip under 10W, but now Intel has lowered the bar to 5W.
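On the multi-threading point, a rough way to see why extra cores stop paying off is Amdahl's law. This is only a back-of-the-envelope sketch: the function name `amdahl_speedup` and the 50% parallel fraction are my own assumptions for illustration, not measurements of any real software.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Overall speedup when only parallel_fraction of the runtime scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: assume half of the runtime can be parallelized.
for cores in (1, 2, 4, 8):
    print(cores, "cores:", round(amdahl_speedup(0.5, cores), 2))
```

Under that assumption the speedup can never reach 2x no matter how many cores you add, whereas making a single core faster speeds up the serial half too.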