The system interface control logic contains an in-order queue (IOQ) and an out-of-order queue (OOQ), which track all transactions pending completion on the system interface. The IOQ tracks a request's in-order phases and is identical on all processors and the node controller. The OOQ holds only deferred processor requests. The IOQ can hold eight requests, and the OOQ can hold 18 requests.

The system interface logic also contains two 128-byte coalescing buffers to support write-coalescing stores. The buffers can coalesce store requests at byte granularity, and they strive to generate full-line writes for best performance. Writes of 1 to 8 bytes, 16 bytes, or 32 bytes are possible when holes exist in the coalescing buffers.
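For concreteness, the coalescing behavior described above can be modeled in software roughly as follows. This is a minimal sketch: the 128-byte line size and the partial-write sizes come from the text above, while the structure, names, and eviction policy are invented for illustration.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LINE_BYTES 128

/* One coalescing buffer: data plus per-byte valid bits (hypothetical model). */
typedef struct {
    uint64_t line_addr;          /* 128-byte-aligned base address */
    uint8_t  data[LINE_BYTES];
    bool     valid[LINE_BYTES];  /* which bytes have been written */
} coalesce_buf;

/* Merge a store of 'len' bytes at 'addr' into the buffer. */
static void coalesce_store(coalesce_buf *b, uint64_t addr,
                           const uint8_t *src, size_t len)
{
    size_t off = (size_t)(addr - b->line_addr);
    memcpy(&b->data[off], src, len);
    for (size_t i = 0; i < len; i++)
        b->valid[off + i] = true;
}

/* On eviction: one full-line write if every byte is valid; otherwise
   partial writes of the largest aligned chunks that are fully valid. */
static void coalesce_flush(const coalesce_buf *b)
{
    bool full = true;
    for (size_t i = 0; i < LINE_BYTES; i++)
        full = full && b->valid[i];
    if (full) {
        printf("full 128-byte line write @%#llx\n",
               (unsigned long long)b->line_addr);
        return;
    }
    static const size_t sizes[] = { 32, 16, 8, 4, 2, 1 };
    size_t i = 0;
    while (i < LINE_BYTES) {
        if (!b->valid[i]) { i++; continue; }
        size_t n = 1;   /* a 1-byte write is always possible for a valid byte */
        for (size_t s = 0; s < sizeof sizes / sizeof sizes[0]; s++) {
            size_t c = sizes[s];
            if (i % c || i + c > LINE_BYTES) continue;
            bool ok = true;
            for (size_t j = 0; j < c; j++)
                ok = ok && b->valid[i + j];
            if (ok) { n = c; break; }
        }
        printf("%zu-byte partial write @%#llx\n", n,
               (unsigned long long)(b->line_addr + i));
        i += n;
    }
}

int main(void)
{
    coalesce_buf b = { .line_addr = 0x1000 };
    uint8_t v[8] = { 0 };
    coalesce_store(&b, 0x1000, v, 8);   /* only bytes 0-7 valid...    */
    coalesce_flush(&b);                 /* ...holes -> one 8-byte write */
    return 0;
}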
Isn't this rather related to committing writes to main memory? There is nothing that would prevent OOO cores from using similar techniques, and AFAIK they do.
If you look at predication and branching in Itanium, there is less chance of a stall on Itanium than on x86.
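Roughly, predication lets the compiler if-convert short branches, so there is nothing to mispredict in the first place. Here is branch-free C as a stand-in for real Itanium predicated code (just an illustration, not actual IA-64 output):

/* Branchy version: a mispredicted branch costs a pipeline flush. */
int max_branchy(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

/* If-converted version: both values are computed and a predicate
   selects between them, so there is no branch to mispredict.  On
   Itanium this maps onto a compare-to-predicate plus predicated
   instructions; here it is just branch-free C for illustration. */
int max_predicated(int a, int b)
{
    int p = (a > b);              /* conceptually, a predicate register */
    return p * a + (1 - p) * b;   /* select without branching */
}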
I believe you are speaking about a different subject. If a branch misprediction happens, the pipeline is not stalled but canceled.
What we are dealing with when speaking about a pipeline stall is a cache miss. In that case any in-order core (like Itanium) has to wait for the data (the pipeline is stalled). That is why it is so important for Itanium to use explicit prefetch, and why it has such a big, low-latency cache.
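Something like this, I mean (a minimal sketch using the GCC/Clang __builtin_prefetch intrinsic; the prefetch distance of 16 is a made-up tuning knob, in reality it would be tuned to the miss latency):

#include <stddef.h>

#define PREFETCH_DIST 16   /* hypothetical tuning value */

/* Sum an array, prefetching ahead so an in-order core (which would
   otherwise stall on every cache miss) finds data already in cache.
   Itanium compilers emit the lfetch instruction for this purpose. */
long sum_with_prefetch(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&a[i + PREFETCH_DIST]);
        s += a[i];
    }
    return s;
}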
Meanwhile, an OOO core has a good chance to continue with another ready instruction from the reorder buffer. In the worst case it will not do any real processing in the execution units, but it will likely still continue fetching instructions and placing them into the reorder buffer.
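As a toy illustration of the principle (entirely made up, nothing like real hardware):

#include <stdbool.h>
#include <stdio.h>

/* Toy reorder-buffer entry: an instruction may execute once its
   operands are ready; a load that missed is simply not ready yet. */
typedef struct {
    const char *op;
    bool ready;   /* operands available? */
    bool done;    /* executed? */
} rob_entry;

int main(void)
{
    rob_entry rob[4] = {
        { "load r1, [miss]", false, false },  /* waiting on memory   */
        { "add  r2, r3, r4", true,  false },  /* independent         */
        { "mul  r5, r3, r6", true,  false },  /* independent         */
        { "add  r7, r1, r2", false, false },  /* depends on the load */
    };

    /* Execute: any ready instruction may run, regardless of order. */
    for (int i = 0; i < 4; i++)
        if (rob[i].ready) {
            rob[i].done = true;
            printf("executed out of order: %s\n", rob[i].op);
        }

    /* Retire: strictly in program order, so retirement blocks at the
       pending load -- but useful work was still done behind it. */
    for (int i = 0; i < 4 && rob[i].done; i++)
        printf("retired: %s\n", rob[i].op);
    return 0;
}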
(I hope pippero will correct any errors I made 😉)
Every response phase on Itanium can be followed by a deferred phase, in which the data phase can complete out of order.
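If I read the protocol right, the idea is something like this toy state machine (phase names loosely follow the published bus protocol; the code is only illustrative):

/* Toy model of a split (deferred) bus transaction. */
typedef enum {
    PHASE_REQUEST,
    PHASE_SNOOP,
    PHASE_RESPONSE,        /* normal or deferred */
    PHASE_DATA,            /* in-order data delivery */
    PHASE_DEFERRED_REPLY   /* data delivered later, out of order */
} bus_phase;

typedef struct {
    int id;
    bus_phase phase;
    int deferred;   /* nonzero if the response deferred the data */
} bus_txn;

/* Advance one transaction by one phase. */
void advance(bus_txn *t)
{
    switch (t->phase) {
    case PHASE_REQUEST:  t->phase = PHASE_SNOOP;    break;
    case PHASE_SNOOP:    t->phase = PHASE_RESPONSE; break;
    case PHASE_RESPONSE:
        /* A deferred response removes the transaction from in-order
           tracking (the IOQ in the article's terms) and hands it to
           out-of-order tracking (the OOQ) until the deferred reply
           delivers the data. */
        t->phase = t->deferred ? PHASE_DEFERRED_REPLY : PHASE_DATA;
        break;
    default:
        break;   /* DATA / DEFERRED_REPLY are terminal here */
    }
}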
I believe this is dealing with completion and committing results to memory. That is nontrivial, but still the easier part of the trouble...