[citation][nom]palladin9479[/nom]Here is another problem with VLIW and the Itanium architecture. The binary encoding of instructions is static and the HW is incapable of executing them out of order. The original Itanium had six execution units, so binaries compiled for that CPU can execute up to six instructions in one pass. But if the user later upgrades their CPU to one with eight or twelve instruction units, the binary could still only execute on six and the other units would be permanently stalled. You would have to go back and recompile ~everything~ to support the 12 instruction unit model. And if in the future they introduced a 16 or 24 unit model, then you'd have to do all that recompiling all over again. Now let's reverse it: let's say MS goes out and compiles W2K8 for the *newer* Itanium with 12 instruction units. All the application makers go out and do this too; your entire software base goes out and does this. Guess what happens should you try to run those binaries on the older 6 instruction unit hardware? They will not execute properly, if at all. They are statically encoded to send up to 12 instructions to a 6 instruction system. It will work just fine until the code tries to send a 7th simultaneous instruction, and suddenly you will get an exception which will cause a nonmaskable interrupt (NMI); most likely this will cause the system to crash. This would require the compiler to compile binary code for multiple instances of the CPU and then have the binary check and determine if it should execute 6, 12, 16 or 24 instruction code. This is all because a VLIW based architecture is perfect for when the software is being written directly to a very specific known architecture, stuff used in DSPs or GPUs. It's absolutely a bad idea in a general purpose CPU which can be upgraded or switched out, comes in multiple flavors, and may be expected to execute any random amount of code at any random time.[/citation]
Sadly, you don't know what you're talking about. The compiler optimizes how instructions are packed into bundles, but it does not dictate how many of them the hardware executes at once. On wider hardware you'd just be sending four packets per cycle instead of two, and code scheduled that way would still execute fine on older hardware, although possibly with a small performance penalty since it's not optimized for it.
The same is true for x86. Remember the Pentium 4 and how code had to be optimized for it to execute well?
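To make the point about packets concrete, here is a rough toy model in Python. It is only a sketch under loud assumptions: the names `run`, `BUNDLE_SLOTS` and `bundles_per_cycle` are made up for illustration, the "instructions" are trivial add-a-constant operations, and it ignores the real IA-64 encoding, templates, stop bits and dependency handling entirely. All it shows is that the same bundle stream gives the same result on a narrow or wide machine; only the cycle count changes.

```python
# Toy model only: a "bundle" is a tuple of three operations, and each
# operation is (destination register, source register, constant to add).
# Hypothetical names throughout; this is not the real IA-64 encoding.
BUNDLE_SLOTS = 3


def run(bundles, bundles_per_cycle):
    """Execute bundles in program order, issuing up to bundles_per_cycle
    bundles per simulated cycle.

    Returns (final register state, number of cycles used).
    """
    regs = {}
    cycles = 0
    for start in range(0, len(bundles), bundles_per_cycle):
        # Everything in this slice is issued in the same simulated cycle.
        for bundle in bundles[start:start + bundles_per_cycle]:
            assert len(bundle) == BUNDLE_SLOTS
            for dest, src, const in bundle:
                # Unwritten registers read as 0 in this toy model.
                regs[dest] = regs.get(src, 0) + const
        cycles += 1
    return regs, cycles


# Four bundles = 12 mutually independent operations (r0..r11).
program = [
    tuple((f"r{3 * b + s}", "r_zero", 3 * b + s) for s in range(BUNDLE_SLOTS))
    for b in range(4)
]

# "Older" machine: two bundles (6 instructions) per cycle.
narrow_regs, narrow_cycles = run(program, bundles_per_cycle=2)
# "Newer" machine: four bundles (12 instructions) per cycle.
wide_regs, wide_cycles = run(program, bundles_per_cycle=4)

print(narrow_regs == wide_regs, narrow_cycles, wide_cycles)
```

The narrow machine just takes more cycles to get through the same stream; nothing faults when a "7th instruction" shows up, because the hardware decides how many bundles to pull in per cycle.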