News Intel's Itanium Is Finally Laid To Rest After Linux Yanks IA-64 Support

I guess that the reports of Itanium's death ten years ago were greatly exaggerated. I honestly had no idea that they were still trying to run with that overpriced niche product.

Itanium is a perfect example of everything that's wrong with Intel and it amazes me that they tried to keep it alive for an extra ten years rather than just letting it die.
 
It was a proprietary 64-bit standard. It could not run x86 code at all, whereas the competing AMD64 (x86-64) could. Microsoft did compile some versions of Windows for IA64, but it wasn't widely adopted. It was mostly used in servers running bespoke software solutions sold with the hardware. But there were other architectures available for servers that had much wider support; SPARC comes to mind.
 
  • Like
Reactions: Order 66
What made Itanium so bad?
The real issue was Intel never made good compilers for Itanium. They just assumed they would get market share because they were Intel, and that developers would figure out how to make everything work. But this was back when cross compilers were largely just a dream. Hoping developers would fix the problems Intel introduced for them turned out to be the main reason Itanium never really had a chance.

Had Intel put more effort into helping developers it might have been a success.
 
  • Like
Reactions: Order 66
Article gets history wrong. Xeon existed before Itanium. It was simply co-opted into the x64 line once Intel released x64 processors.

The reason Itanium failed was because it eliminated all the hardware that makes CPUs fast, instead shifting that functionality onto compiler software, and then Intel never followed through with the promised compiler software.
 
  • Like
Reactions: Sleepy_Hollowed
It was a proprietary 64-bit standard. It could not run x86 code at all, whereas the competing AMD64 (x86-64) could. Microsoft did compile some versions of Windows for IA64, but it wasn't widely adopted. It was mostly used in servers running bespoke software solutions sold with the hardware. But there were other architectures available for servers that had much wider support; SPARC comes to mind.
Not SPARC, but PowerPC was the big competitor to Itanium. IBM would just about give away PowerPC servers to get the support tail.
 
Article gets history wrong. Xeon existed before Itanium. It was simply co-opted into the x64 line once Intel released x64 processors.

The reason Itanium failed was because it eliminated all the hardware that makes CPUs fast, instead shifting that functionality onto compiler software, and then Intel never followed through with the promised compiler software.
Came here to say this exact thing, lol. Xeon has existed since 1998 and was based on the Pentium II.

Addition: My first server was a dual Xeon Slot 2 500 MHz IBM motherboard with built-in SCSI RAID, lol.
 
What made Itanium so bad?
Afaik it was mostly the fact that it was based on the so-called "very long instruction word" (VLIW) approach, which in turn requires a lot of optimisation in the compilation phase. Nowadays the only comparable architecture that still relies on the same concept is the Russian Elbrus.
 
What made Itanium so bad?
You should search for that, if you really want to know.

But let's try to see if I can make it short:

When Intel and HP designed the Itanium, they did it under the assumption that the ISA (instruction set architecture) mattered a great deal for a CPU's performance. The background was that x86 had about the worst ISA imaginable, having started as a hodgepodge extension of the 8080, which was itself an updated design of the 8008. Among other things it had very few architectural registers, floating point was bolted on as a side building, and it supported BCD arithmetic and plenty of other oddball stuff instead of really useful things. Plenty of the instruction coding space was wasted on irrelevant features, and some of the newer goodies required prefixes, basically escape codes that lengthened the encoding of an instruction at the cost of code and data density.

VLIW tried to optimize instruction encodings and immediate data types so you could pack, almost compress, as much work together as possible, so that the input bandwidth of the CPU (RAM is slow) could be used as effectively as possible. Compilers struggled with that, an important angle I won't expand on here.
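To make the density point concrete, here's a rough C sketch (my own illustration, not Intel code) that unpacks the published 128-bit IA-64 bundle layout: a 5-bit template field plus three 41-bit instruction slots that the compiler had to keep filled. The type and function names are invented for the example.

```c
/* Minimal sketch, not production code: unpacking the 128-bit IA-64
 * instruction bundle described above.  A bundle holds a 5-bit template
 * plus three 41-bit instruction slots -- the "dense encoding" the
 * compiler had to fill.  Bit positions follow the published IA-64
 * layout; everything else (names, the demo data) is made up. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t lo;   /* bits   0..63  of the bundle */
    uint64_t hi;   /* bits  64..127 of the bundle */
} ia64_bundle;

/* Extract the bit field [pos, pos+len) from the 128-bit bundle. */
static uint64_t bundle_bits(ia64_bundle b, unsigned pos, unsigned len)
{
    uint64_t val;
    if (pos >= 64)
        val = b.hi >> (pos - 64);
    else if (pos + len <= 64)
        val = b.lo >> pos;
    else /* field straddles the two 64-bit halves */
        val = (b.lo >> pos) | (b.hi << (64 - pos));
    return val & ((len == 64) ? ~0ULL : ((1ULL << len) - 1));
}

int main(void)
{
    ia64_bundle b = { 0x0123456789abcdefULL, 0xfedcba9876543210ULL }; /* dummy data */

    uint64_t tmpl  = bundle_bits(b,  0,  5);  /* template: unit types + stop bits */
    uint64_t slot0 = bundle_bits(b,  5, 41);
    uint64_t slot1 = bundle_bits(b, 46, 41);
    uint64_t slot2 = bundle_bits(b, 87, 41);

    printf("template=%llx slot0=%llx slot1=%llx slot2=%llx\n",
           (unsigned long long)tmpl,  (unsigned long long)slot0,
           (unsigned long long)slot1, (unsigned long long)slot2);
    return 0;
}
```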

It made for a very complex silicon implementation even the first time around. But worse, it made it very difficult to rework the design, generation after generation, into something significantly better.

In the x86 world NexGen had launched another approach, which translated the x86 ISA into a far more modern and efficient internal ISA, which was then as fast and efficient as such a modern ISA would be natively. Better yet, these native internal ISAs could be swapped, improved, extended or shrunk almost completely independently of the x86 layer on top, allowing a constant stream of re-implementations that took advantage of additional transistor budgets or smart new IP blocks. It's how AMD and Intel x86 CPUs have been built since the K5 and Pentium Pro around 1995.
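For a feel of what that internal translation looks like, here's a toy C sketch of the idea (entirely my own invention, not any vendor's real cracking logic): a single x86-style read-modify-write instruction gets broken into simple load/add/store micro-ops that the core can then rename and schedule freely.

```c
/* Toy illustration of the NexGen/K5/Pentium Pro idea: a complex
 * x86-style instruction with a memory operand is cracked into simple
 * internal micro-ops.  All names and numbers here are invented. */
#include <stdio.h>

typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } uop_kind;

typedef struct {
    uop_kind kind;
    int dst, src1, src2;   /* internal (renamed) register numbers */
} uop;

/* Crack "ADD [addr_reg], src_reg" (read-modify-write on memory) into 3 uops. */
static int crack_add_mem(int addr_reg, int src_reg, uop *out)
{
    int tmp = 100;                                   /* pretend-renamed temp register */
    out[0] = (uop){ UOP_LOAD,  tmp, addr_reg, -1 };  /* tmp <- mem[addr_reg]           */
    out[1] = (uop){ UOP_ADD,   tmp, tmp, src_reg };  /* tmp <- tmp + src_reg           */
    out[2] = (uop){ UOP_STORE, -1,  addr_reg, tmp }; /* mem[addr_reg] <- tmp           */
    return 3;
}

int main(void)
{
    uop uops[3];
    int n = crack_add_mem(/*addr_reg=*/5, /*src_reg=*/7, uops);
    for (int i = 0; i < n; i++)
        printf("uop %d: kind=%d dst=%d src1=%d src2=%d\n",
               i, uops[i].kind, uops[i].dst, uops[i].src1, uops[i].src2);
    return 0;
}
```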

In theory such an approach could have also been used for the Itanium, but where the x86 might have the complexity of a golf cart, the Itanium was more of a passenger liner (say T-Itanic) and generational replacements huge and thus slower undertakings.

But that takes a huge lucrative market to enable and that market was drained dry first by 32-bit x86 designs that soon ran rings around the Itanium and then via the AMD64 ISA extension, which offered a smooth upgrade path to 64-bit.

And then there were still Power, SPARC, Alpha or even z/Arch on the other end for huge scale-up midrange or bigger machines, so Itanium didn't have any exclusive niche to survive in, apart from an artificial NonStop one that HP should have just moved to ARM or x86 long ago.

Today that x86 translation overhead and ISA legacy is seen as a large transistor budget liability, but just how large has been hotly debated for nearly 30 years.
 
  • Like
Reactions: mitch074
What made Itanium so bad?
Apart from the whole issue of poor x86 support...

Intel relied too much on software optimizations to make it go fast. They never made any out-of-order IA64 CPUs, although such a thing is actually possible. People incorrectly equate IA64 to VLIW (which cannot be executed out-of-order), but it's not.

Also, they never added vector instructions to it, similar to SSE, AVX, etc. which is a big reason x86 was able to pull ahead in scientific & technical computing.
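To show what I mean, here's a minimal C sketch using SSE intrinsics (the function and array names are just made up for the example): four single-precision adds per instruction, the kind of packed math IA64 never got.

```c
/* Rough sketch of the packed SIMD work being referred to: four
 * single-precision adds per instruction via SSE intrinsics.  Names and
 * data are invented for illustration. */
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics (x86) */

static void add_arrays_sse(const float *a, const float *b, float *out, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {                 /* 4 floats per iteration */
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)                           /* scalar tail */
        out[i] = a[i] + b[i];
}

int main(void)
{
    float a[8] = {1,2,3,4,5,6,7,8}, b[8] = {8,7,6,5,4,3,2,1}, out[8];
    add_arrays_sse(a, b, out, 8);
    for (int i = 0; i < 8; i++)
        printf("%.0f ", out[i]);
    printf("\n");
    return 0;
}
```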

Lastly, because Intel patented all aspects of the ISA so heavily, there could never be any competition. This meant that if you embraced IA64, you were tying yourself to a single supplier and whatever they felt like charging. Businesses & governments generally don't like to be in single-supplier situations. A large number of IA64's buyers used Java, which can be run on almost any CPU, thereby minimizing lock-in at the ISA level.
 
VLIW tried optimized instruction encodings and immediate data types so you could fit or almost compress as much of it together as possible,
VLIW isn't really about that, but IA64 isn't VLIW. What VLIW is concerned with is having all instruction scheduling & port-assignment at compile-time.

But worse, it made it very difficult to rework the design generation after generation for something significantly better.
Not true. IA64 used a philosophy Intel coined as EPIC - Explicitly Parallel Instruction Computing. What it meant is that they encoded the dependencies between triads in their headers. The fact that instructions were not compile-time scheduled is what allowed new hardware to have a wider implementation. You could safely execute a mix of instructions from different triads, so long as the dependency graph wasn't violated. The idea behind this was to simplify the hardware logic needed to perform runtime scheduling.
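Here's a toy C model of that idea (my own simplification, not real Itanium issue logic): the compiler marks stops between instruction groups, everything within a group is guaranteed independent, and a core of any issue width can grab as much of a group per cycle as it can handle, with no recompilation needed.

```c
/* Toy model of explicitly-parallel issue: "stop" bits separate groups of
 * independent instructions; hardware of any issue width issues within a
 * group but never past a stop in the same cycle.  Everything here is an
 * invented simplification for illustration. */
#include <stdio.h>

typedef struct {
    const char *text;   /* what the instruction "does" (printing only) */
    int stop_after;     /* compiler-inserted stop: next insn depends on this group */
} insn;

static void run(const insn *prog, int n, int issue_width)
{
    int cycle = 0, i = 0;
    printf("--- issue width %d ---\n", issue_width);
    while (i < n) {
        printf("cycle %d:", cycle++);
        int issued = 0;
        while (i < n && issued < issue_width) {
            printf("  [%s]", prog[i].text);
            issued++;
            if (prog[i++].stop_after)    /* can't cross a stop this cycle */
                break;
        }
        printf("\n");
    }
}

int main(void)
{
    /* Four independent adds form one group, then a stop, then a consumer. */
    insn prog[] = {
        { "r14 = r15 + r16", 0 },
        { "r17 = r18 + r19", 0 },
        { "r21 = r22 + r23", 0 },
        { "r24 = r25 + r26", 1 },   /* stop: the next instruction needs r14/r17 */
        { "r20 = r14 + r17", 1 },
    };
    run(prog, 5, 2);   /* narrow core: 3 cycles */
    run(prog, 5, 4);   /* wider core, same "binary": 2 cycles */
    return 0;
}
```

Running the same instruction stream at width 2 and width 4 shows the wider core finishing in fewer cycles without the code ever being rescheduled.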

In theory such an approach could have also been used for the Itanium,
Didn't the first couple of CPUs have some hardware x86 translation support? Even if they did, I remember x86 was very slow on it.

Today that x86 translation overhead and ISA legacy is seen as a large transistor budget liability, but just how much, has been hotly debated for nearly 30 years.
It mainly affects power-efficiency and how wide the front end can be. It's much harder to scale up a decoder of variable length instructions than fixed-length ones.
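A tiny C sketch of why (made-up byte streams, not real x86 or IA-64 encodings): with variable-length instructions, the start of instruction i+1 is only known after decoding the length of instruction i, a serial chain, while fixed-length offsets can all be computed independently and decoded in parallel.

```c
/* Illustration of the decoder-scaling point, using invented encodings. */
#include <stdio.h>

/* Variable-length: pretend the first byte of each instruction is its length. */
static void find_starts_variable(const unsigned char *code, int nbytes)
{
    int off = 0;
    while (off < nbytes) {
        printf("variable: insn at offset %d (len %d)\n", off, code[off]);
        off += code[off];   /* must decode this length before finding the next insn */
    }
}

/* Fixed-length: every instruction is 4 bytes, so all offsets are known up front. */
static void find_starts_fixed(int nbytes)
{
    /* Each offset is i * 4, independent of every other instruction, so a
     * wide front end can look at many of them in the same cycle. */
    for (int i = 0; i * 4 < nbytes; i++)
        printf("fixed: insn at offset %d\n", i * 4);
}

int main(void)
{
    unsigned char code[] = { 2, 0, 3, 0, 0, 1, 4, 0, 0, 0 }; /* lengths 2,3,1,4 */
    find_starts_variable(code, sizeof code);
    find_starts_fixed(12);
    return 0;
}
```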
 
  • Like
Reactions: mitch074
Apart from the whole issue of poor x86 support...

Intel relied too much on software optimizations to make it go fast. They never made any out-of-order IA64 CPUs, although such a thing is actually possible. People incorrectly equate IA64 to VLIW (which cannot be executed out-of-order), but it's not.

Also, they never added vector instructions to it, similar to SSE, AVX, etc. which is a big reason x86 was able to pull ahead in scientific & technical computing.

Lastly, because Intel patented all aspects of the ISA so heavily, there could never be any competition. This meant that if you embraced IA64, you were tying yourself to a single supplier and whatever they felt like charging. Businesses & governments generally don't like to be in single-supplier situations. A large number of IA64's buyers used Java, which can be run on almost any CPU, thereby minimizing lock-in at the ISA level.
Have any of you actually used or supported Itanium servers? I have. They weren't "bad". They were very reliable. Reliability was one of the goals of the Itanium implementation. They scaled to 256+ sockets in a single system. Terabytes of RAM? Yep, supported that when nobody else but PowerPC did.
Were they hot? YES. BUT if you wanted high performance computing with a single system image (not a cluster), it was the way to go.
Saying that they didn't support x86 didn't matter, because they were targeted at custom software. The large commercial software packages generally supported Itanium for a while; Oracle dropping support was the key to the demise. The Itaniums had enough RAM capacity that "in-memory" databases were possible in the early 2000s, when 4Gb Fibre Channel was a common interconnect. 10GE was a RARE interconnect.
Was Itanium the best CPU ever? No. It was eventually replaced by Xeons which had similar capabilities. But in the early 2000s the Xeon was not capable of the scale that the Itaniums were.
 
  • Like
Reactions: bit_user
What made Itanium so bad?
It wasn't x86 compatible and if you thought Intel x86 CPUs were overpriced, they paled in comparison to the cost of an Itanium. Intel had this big (and remarkably stupid) idea that people were so stupid and inept that they would continue to buy only Intel CPUs even if Intel abandoned x86 to avoid any competition from AMD or VIA. The pricing of Itaniums was astronomical, once again demonstrating Intel's delusional belief that people bought INTEL and not x86.

Enter Jim Keller....

A brilliant CPU designer at AMD named Jim Keller designed a CPU that AMD referred to as "K8". K8 (also known as the SledgeHammer core) was a 64-bit CPU architecture that was backwards compatible with all previous x86 CPUs. It was released on the market by AMD as the Athlon 64 using the AMD64 instruction set. Jim Keller became known as "The Father of x64", and he went back to AMD years later and designed the first Ryzen architecture.

Intel was caught flat-footed by this development because until that point, they believed that the performance of the Itanium would convince people to spend insane amounts of money for it. While technically not nearly as performant as the Itanium, the Athlon 64 was a fraction of the price, offered a staggering performance increase over the Pentium-4 and Athlon XP and most importantly, worked with ALL x86-compatible software (like Windows).

Intel scrambled to catch up with their own x86-64 architecture but ultimately failed and paid AMD licensing fees to use AMD64, just as AMD had paid Intel for the rights to use i386.

As a result, all x86 CPUs made today are based on the AMD64 architecture. People had stopped talking about the Itanium less than a year after it was released and I thought that Intel had just quietly retired it. Now I'm wondering what it was used for because it wasn't big in servers. I never saw it competing with Xeon, Opteron or EPYC in the server space. Its use-case must have been the very definition of "niche".
 
  • Like
Reactions: Order 66
What made Itanium so bad?
It was a VLIW architecture and, contrary to classic superscalar architectures, it relied heavily on the compiler to squeeze all the performance out of the CPU. So with a new CPU you needed to recompile the software to increase (or at least not decrease) performance.
The advantage is that you can have less circuitry, lower complexity and better efficiency.
Add two years of delay at launch (huge in this sector), the fact that legacy x86 code had to be emulated, AMD arriving unexpectedly with AMD64, Microsoft not making any effort, and Intel leaving HP alone as soon as possible (blahh).
 
  • Like
Reactions: Order 66
Have any of you actually used or supported Itanium servers? I have. They weren't "bad". They were very reliable. Reliability was one of the goals of the Itanium implementation. They scaled to 256+ sockets in a single system. Terabytes of RAM? Yep, supported that when nobody else but PowerPC did.
Were they hot? YES. BUT if you wanted high performance computing with a single system image (not a cluster), it was the way to go.
Saying that they didn't support x86 didn't matter, because they were targeted at custom software. The large commercial software packages generally supported Itanium for a while; Oracle dropping support was the key to the demise. The Itaniums had enough RAM capacity that "in-memory" databases were possible in the early 2000s, when 4Gb Fibre Channel was a common interconnect. 10GE was a RARE interconnect.
Was Itanium the best CPU ever? No. It was eventually replaced by Xeons which had similar capabilities. But in the early 2000s the Xeon was not capable of the scale that the Itaniums were.

The RAS features of Itanium were certainly impressive for Intel and as near to HP PA-RISC or even z/Arch as Intel had ever gone. Xeons didn't have that initially because Intel didn't want them to: they wanted to kill off AMD's free-riding on x86.

In an HP NonStop environment those RAS features even made a bit of sense, but they could have done all of that (and later mostly did) with x86-64, if perhaps not with Intel chipsets then with 3rd-party ones. I know that my former colleagues at Bull did x86 scale-up servers, even with their own fabric, and there was another vendor, whose name I've forgotten, that went much further: they all became irrelevant as scale-up servers became irrelevant, unless CXL manages to change that a bit.

What irked me most is that the Itanium was designed to excel at floating-point loops. It had none of the i860 "Cray-on-a-chip" genes, but it could still do floating-point loops very well, using its dense EPIC/VLIW instruction streams.

And it sucked badly when branches were mispredicted, stalling forever to recover, and it performed horrendously on very branchy, integer-only logic code: which is exactly what you'll find in a relational database.

Do you know that the main benefit of floating-point is sacrificing exact numbers for giant ranges?

And do you know how much floating-point there is in an Oracle database?

Or just how people react to inexact numbers in their bank account?

The Itanium was a floating-point loop machine, designed for engineering or HPC workloads. Running an Oracle database on it was about the worst thing you could do to that CPU, because it would never get to put all those transistors to the work they were designed for. Database servers run very branchy logic code with plenty of bitmaps used for optimizations; you could probably run them with the FPU disabled and never know the difference.
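As an aside on the inexact-numbers point above, this little C snippet (nothing Itanium-specific, just an illustration) shows why you don't want binary floating point anywhere near account balances, and why database engines lean on exact integer/decimal arithmetic instead.

```c
/* Binary floating point cannot represent most decimal cent values exactly,
 * so repeated "10 cent" deposits drift away from the true total. */
#include <stdio.h>

int main(void)
{
    float balance = 0.0f;
    for (int i = 0; i < 1000; i++)
        balance += 0.10f;               /* deposit 10 cents, a thousand times */
    printf("float total:   %.6f\n", balance);   /* not exactly 100.000000     */

    long cents = 0;                     /* exact: count whole cents instead   */
    for (int i = 0; i < 1000; i++)
        cents += 10;
    printf("integer total: %ld.%02ld\n", cents / 100, cents % 100);
    return 0;
}
```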

It was a "Fortran" machine and they ran nothing but "Cobol" on it, it's like using a "super reliable" Ferrari to plow a field and it hurt my heart visualizing all these endless stalls.
 
  • Like
Reactions: bit_user
VLIW isn't really about that, but IA64 isn't VLIW. What VLIW is concerned with is having all instruction scheduling & port-assignment at compile-time.


Not true. IA64 used a philosophy Intel coined as EPIC - Explicitly Parallel Instruction Computing. What it meant is that they encoded the dependencies between triads in their headers. The fact that instructions were not compile-time scheduled is what allowed new hardware to have a wider implementation. You could safely execute a mix of instructions from different triads, so long as the dependency graph wasn't violated. The idea behind this was to simplify the hardware logic needed to perform runtime scheduling.
VLIW has been used broadly and I'd subsume EPIC as a variant for simplicity, but the aim was to get more done using dense code and data representations, a kind of novel CISC-after-RISC design that was very complex both in hardware and for the compiler. They didn't repeat the mistakes of the early MIPS implementations, where erroneous instruction scheduling could lock up a CPU because that design pushed pipeline management to the compiler.

If you want an interesting follow-up on the Itanium, have a look at the Belt architecture from Mill Computing and Ivan Godard's lectures on it.
Didn't the first couple of CPUs have some hardware x86 translation support? Even if they did, I remember x86 was very slow on it.
It was a software binary translation package, much like QuickTransit, which was used for SPARC emulation on x86, PowerPC emulation on x86, and x86 emulation on z/Arch by IBM, who eventually bought the company and shelved the technology in the poison locker (perhaps because it could have just as well done z/Arch on x86...).

And the main reason it was so slow was that such code is very branchy logic code, at which the Itanium was extremely poor. It was designed as a heavy floating-point loop machine, not for nimble light logic, which is what most x86 code was as well. Even with binary translation, which unlike emulation can achieve near-native speeds, most of the x86 code run on these machines still wasn't the Fortran loop stuff the Itanium was really designed for.
It mainly affects power-efficiency and how wide the front end can be. It's much harder to scale up a decoder of variable length instructions than fixed-length ones.
The x86 ISA has so few registers and is so primitive that the most frequent instructions are easy to translate, in dozens of variants, and to improve in relatively small increments. A software-transparent redesign of the Itanium would always be a heavy undertaking; they should have had a look at the Alpha, which was precisely designed for many generations of implementations, being done by people who'd gone through having to re-implement the VAX and the PDP-11.

To me the most important lesson has been that you need to plan for a many-generation roadmap for things like CPUs, which might have a code base measured in decades. It's perhaps getting less true for hyperscalers today, but at the time Intel's inability to execute a constant cadence of generational improvements was just another nail in Itanium's coffin.

Ignoring the IBM 360 lesson because you want to get rid of the 8008 burden was just too iAPX 432.
 
  • Like
Reactions: bit_user
They scaled to 256+ sockets in a single system. Terabytes of RAM? Yep, supported that when nobody else but PowerPC did.
I think a lot of what HP brought to the partnership was that experience with such mainframe-tier systems.

Saying that they didn't support x86 didn't matter, because they were targeted at custom software.
Intel's plan seemed to be for IA64 to replace x86. Remember, Intel didn't design the 64-bit extensions for x86 - AMD did! Intel's plan was to sunset x86 and have IA64 become the new mainstream 64-bit computing platform. In order for that to succeed, x86 performance was a relevant consideration.

Those of us old enough to remember can look back at the Pentium Pro as Intel's first big misstep in transitioning faster than the market. It suffered a regression in 16-bit performance relative to the Pentium. It turned out that there was so much 16-bit code still in use that Intel decided not to market the P6 as a mainstream successor to the Pentium, but rather as more of a workstation/server CPU.
 
  • Like
Reactions: NinoPino