With speculative OOO, overlapping loop execution is standard.
You'll have to show me when that changed from theory to silicon, and you'll also have to show me an x86 machine that can do overlapping loop execution without hard-locking.
Well, a K7 can do that (and K8 and Core 2).
Perhaps even a humble Pentium 2.
It works through register renaming, thanks to Tomasulo's algorithm, plus speculative execution (the loop branch will be correctly predicted as taken except the first and last time).
Each loop iteration will use a different "shadow" register until its instructions retire.
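To make the "shadow" register point a bit more concrete, here is a toy sketch in C of a rename table. It is only an illustration under my own assumptions: the names (RenameTable, rename_write, rename_loop) and the two-instruction loop body are made up, there is no free list or retirement, and no real K7 or P6 core is being modelled. It just shows that every iteration's write to the same architectural register lands in a fresh physical one, which is what lets iterations overlap.

```c
#include <assert.h>
#include <stdio.h>

enum { NUM_ARCH_REGS = 4 };          /* hypothetical architectural registers r0..r3 */

typedef struct {
    int map[NUM_ARCH_REGS];          /* architectural reg -> current physical reg */
    int next_free;                   /* next unused physical reg (never recycled here) */
} RenameTable;

static void rename_init(RenameTable *t) {
    for (int r = 0; r < NUM_ARCH_REGS; r++)
        t->map[r] = r;               /* start with an identity mapping */
    t->next_free = NUM_ARCH_REGS;
}

/* A source operand reads whatever physical register currently backs `arch`. */
static int rename_read(const RenameTable *t, int arch) {
    assert(arch >= 0 && arch < NUM_ARCH_REGS);
    return t->map[arch];
}

/* A destination operand always gets a brand-new physical register. */
static int rename_write(RenameTable *t, int arch) {
    assert(arch >= 0 && arch < NUM_ARCH_REGS);
    t->map[arch] = t->next_free++;
    return t->map[arch];
}

/* Rename `iterations` copies of the loop body
 *     r1 = load [r0]
 *     r0 = r0 + 4
 * and record which physical register backs r1 in each iteration. */
static void rename_loop(RenameTable *t, int iterations, int dest_phys[]) {
    for (int i = 0; i < iterations; i++) {
        int src    = rename_read(t, 0);   /* address register for this iteration */
        int dst    = rename_write(t, 1);  /* fresh register for the loaded value */
        int new_r0 = rename_write(t, 0);  /* fresh register for the bumped pointer */
        dest_phys[i] = dst;
        printf("iter %d: p%d = load [p%d]; p%d = p%d + 4\n",
               i, dst, src, new_r0, src);
    }
}
```

Because r1 maps to a different physical register in each iteration, the only dependency left between iterations is the true one through r0.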
As for speculative execution, or as some of you are calling it, speculative OOO, or simply branch prediction: I've never heard anyone call speculative execution "speculative OOO", my bad.
With that in mind, yes, modern x86 machines are capable of software pipelining, but it is done at the compiler level, not at the machine level, which is the case with EPIC. It's noticeably more efficient on EPIC than on x86, because of x86's register limits as opposed to EPIC's.
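For anyone following along, this is roughly what software pipelining looks like when written out by hand in C. It is a minimal sketch, not what any particular compiler emits: the function names scale_simple and scale_pipelined are made up for illustration, and a real compiler would schedule at the instruction level with more stages. Note that the pipelined version needs an extra live value (next), which is where a small register file starts to hurt.

```c
#include <stddef.h>

/* Straightforward loop: load, multiply and store all belong to iteration i. */
void scale_simple(const int *src, int *dst, size_t n, int k) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Hand software-pipelined version: the load for iteration i+1 is issued
 * in the same loop pass as the multiply/store for iteration i. */
void scale_pipelined(const int *src, int *dst, size_t n, int k) {
    if (n == 0)
        return;

    int loaded = src[0];                /* prologue: load for iteration 0 */

    for (size_t i = 0; i + 1 < n; i++) {
        int next = src[i + 1];          /* kernel: load for iteration i+1 */
        dst[i] = loaded * k;            /* kernel: multiply/store for iteration i */
        loaded = next;
    }

    dst[n - 1] = loaded * k;            /* epilogue: finish the last iteration */
}
```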
It should also be noted that software pipelining is quite difficult to implement on x86, and in nearly all cases it is faster and much easier to do a straightforward loop, or to use Duff's technique (Duff's device) for your loop unrolling, which is quite a bit simpler.
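For reference, Duff's technique looks something like this in C. This is a sketch of the usual memory-to-memory variant with an 8x unroll; the name copy_duff and the zero-count guard are my own additions, and whether it actually beats a plain loop on a given compiler and CPU is something you would have to measure.

```c
#include <stddef.h>

/* Copy `count` ints, unrolled 8x. The switch jumps into the middle of
 * the do/while to handle the leftover count % 8 elements first, then the
 * loop runs full groups of 8. The case fall-through is intentional. */
void copy_duff(int *to, const int *from, size_t count) {
    if (count == 0)
        return;                          /* the classic form assumes count > 0 */

    size_t passes = (count + 7) / 8;     /* number of trips through the do/while */

    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```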
Why are you bringing Tomasulo's algorithm into the argument? I'm quite aware that it gave birth to out-of-order execution processors. Did you look that up on Wiki to make me look stupid?
Nope, OOO execution existed before Tomasulo's algorithm was invented, and it was based on a simpler algorithm, called Scoreboarding.
Tomasulo's algorithm allows dynamic loop unrolling at the HW level, that's why I mentioned it.
And if I find the time I might provide an example of this, but not tonight though (yes, it's night here).
And no, I didn't look it up in a Wiki, since I'm teaching it in a course on computer architecture at a college.

Did you mention me looking it up in a Wiki to make me look stupid?
😉