Nice article, Paul!
The only block that strikes me as truly novel is the stack engine. I'm curious to know more about it, but perhaps the article already conveys everything they disclosed? The stack is an interesting area for optimization, but it seems that many such optimizations would bend or break ABI compatibility.
For instance, you could use a dedicated stack cache, so that stack accesses don't compete with instruction fetches and ordinary loads/stores. But no doubt there's code that loads values previously pushed onto the stack, and probably even some that uses push and pop interchangeably with load/store.
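To make that concern concrete, here's a rough toy model in Python. It is purely hypothetical and not based on anything disclosed about the actual stack engine: `ToyCore`, `stack_cache`, and the `coherent` flag are names I made up for illustration. It just shows that if pushes go only to a separate stack cache, an ordinary load of the same address misses the pushed value unless the two paths are kept in sync.

```python
# Toy model (hypothetical, not any real CPU's design): a dedicated stack
# cache sitting beside the ordinary load/store path.

class ToyCore:
    def __init__(self, coherent):
        self.memory = {}        # what the ordinary load/store path sees
        self.stack_cache = {}   # hypothetical dedicated stack cache
        self.sp = 0x1000        # stack pointer, grows downward
        self.coherent = coherent

    def push(self, value):
        self.sp -= 8
        self.stack_cache[self.sp] = value
        if self.coherent:
            # Write-through keeps the ordinary path in sync with the stack cache.
            self.memory[self.sp] = value

    def pop(self):
        value = self.stack_cache.pop(self.sp)
        self.sp += 8
        return value

    def load(self, addr):
        # Ordinary load: consults only the normal path, never the stack cache.
        return self.memory.get(addr)


def pushed_value_seen_by_load(coherent):
    """Push a value, then read it back with an ordinary load (like mov rax, [rsp])."""
    core = ToyCore(coherent)
    core.push(42)
    return core.load(core.sp)
```

With `coherent=True` the load sees the pushed value; with `coherent=False` it doesn't, which is the kind of breakage existing code would hit. Real hardware would obviously need something smarter than naive write-through, but that's the constraint in a nutshell.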
BTW, I'd love to read a side-by-side comparison with Skylake at this level of detail. I'm sure there are at least slight differences in the way things are put together, so even where the two look similar, some (possibly significant) variations probably still lurk.