News Intel's Itanium Is Finally Laid To Rest After Linux Yanks IA-64 Support

If you thought Intel x86 CPUs were overpriced, they paled in comparison to the cost of an Itanium.
Eh, that's only because Itanium wasn't intended to be a mainstream CPU, though others were to follow in its wake. Those got cancelled, as Intel found it had to counter the performance threat posed by AMD and ultimately deliver a viable 64-bit option based on x86. Furthermore, considering how late, over-budget, and relatively slow Itanium turned out to be, Intel decided it couldn't leave the x86 market after all.

Intel had this big (and remarkably stupid) idea that people were so stupid and inept that they would continue to buy only Intel CPUs even if Intel abandoned x86 to avoid any competition from AMD or VIA.
There was supposedly a built-in x86 front end to deal with legacy code. However, I think Intel's hubris wasn't so much about its brand name as the expectation that IA64 would deliver on its technical promises and mature into something x86 simply couldn't. I think that experiment wasn't allowed to play out fully, but it indeed had some serious issues.

Enter Jim Keller....

A brilliant CPU designer at AMD named Jim Keller designed a CPU that AMD referred to as "K8". K8 (also known as the SledgeHammer core) was a 64-bit CPU architecture that was backwards compatible with all previous x86 CPUs. It reached the market in AMD's Opteron and Athlon 64 processors, using the AMD64 instruction set.
...
As a result, all x86 CPUs made today are based on the AMD64 architecture. People had stopped talking about the Itanium less than a year after it was released and I thought that Intel had just quietly retired it.
Let's consider the timeline.
  • Itanium launched in June 2001
  • Itanium 2 launched in July 2002
  • AMD Opteron launched in April 2003
  • Montecito (Itanium 2 9000) launched in July 2006

Itanium 2 launched against Northwood Pentium 4s. Their clock speeds were something like 900 MHz vs. 2.53 GHz. I think Intel probably started to have doubts about Itanium delivering the goods, even before they saw the performance of the K8. Otherwise, they wouldn't have waited 4 whole years to launch a successor to McKinley (Itanium 2).

Now I'm wondering what it was used for because it wasn't big in servers. I never saw it competing with Xeon, Opteron or EPYC in the server space. Its use-case must have been the very definition of "niche".
Itanium was Intel's only 64-bit CPU until the first Xeons based on the P4 Prescott microarchitecture, I think in 2004. Intel was trying to use it to capture the mainframe, datacenter, and HPC market, which was dominated by UNIX-based systems from DEC, Sun, IBM, HP, and SGI/MIPS. If you weren't around any of those other systems, then it's not surprising you didn't come into contact with any IA64 machines.

As for where it gained some staying power, I believe that was largely in the banking & financial server market. Typical places where you'd have previously found mainframes.
 
VLIW has been used broadly, I'd subsume EPIC as a variant for simplicity, but the aim was to get more done using dense code and data representations,
First, it wasn't VLIW. People need to stop saying that - it's simply not true. That includes you, @NinoPino . The key distinction is that VLIW has no runtime scheduling logic. EPIC simplifies runtime instruction scheduling, but the whole reason it encodes dependencies between the instruction packets is specifically to enable it!

Second, I don't know where you get the idea that it was about code density. What is it about instruction words or packets padded with NOPs that's consistent with that?

doing a type of novel CISC-after-RISC design that was very complex both in hardware and for the compiler.
The whole point of VLIW - and to a lesser extent, EPIC - is to simplify the hardware, by not having it perform dependency analysis of the instruction stream and dynamic, out-of-order scheduling. Well, EPIC does the latter, but by having the dependencies Explicitly specified, it simplifies the instruction decoder so it doesn't have to detect them. This also facilitated expanding the register file, substantially.
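To make that concrete, here's a rough sketch (in C, with a made-up instruction record and function names, not any real decoder) of the difference I'm describing. The EPIC-style path just honors stop bits the compiler already wrote, while a conventional superscalar decoder has to compare operands to find the dependencies itself:

#include <stdbool.h>
#include <stddef.h>

/* Toy instruction record -- purely illustrative, not a real encoding. */
typedef struct {
    int dst;        /* destination register number                */
    int src1, src2; /* source register numbers                    */
    bool stop;      /* EPIC-style stop bit: issue group ends here */
} insn_t;

/* EPIC-style grouping: honor the stop bits the compiler wrote, so no
 * operand comparison is needed in the decoder. */
size_t group_end_epic(const insn_t *code, size_t start, size_t n)
{
    size_t i = start;
    while (i < n && !code[i].stop)
        i++;
    return (i < n) ? i + 1 : n;   /* one past the last insn in the group */
}

/* Conventional superscalar grouping: compare each new instruction's
 * sources against the destinations already in the group (a RAW hazard
 * check) -- this is the detection logic EPIC pushes onto the compiler. */
size_t group_end_superscalar(const insn_t *code, size_t start, size_t n)
{
    for (size_t i = start; i < n; i++)
        for (size_t j = start; j < i; j++)
            if (code[i].src1 == code[j].dst || code[i].src2 == code[j].dst)
                return i;         /* dependency found: group ends before i */
    return n;
}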

They didn't repeat the mistake of the early MIPS implementations, whose design pushed pipeline management to the compiler to the point that erroneous instruction scheduling could lock up the CPU.

And the main reason it was so slow was that such code is very branchy logic code, at which the Itanium performed extremely poorly. It was designed as a heavy floating-point loop machine, not for nimble, light logic, which is what most x86 code is as well.
Yes, because Intel never made one that did speculative, out-of-order execution the way modern x86 cores do. Instead, they added SMT, which was fine for the sort of highly-multithreaded server apps that became its main workload.

Intel's inability to execute a constant cadence of generational improvements was just another nail in Itanium's coffin.
Nah, they simply didn't want to. Ever since Itanium 2, Intel was clearly putting minimal resources into IA64 and just refreshing it to fulfill contractual obligations.

The Itanium was a floating point loop machine, designed for engineering or HPC workloads.
Sadly, it was soon outclassed in many such workloads, because Intel never bothered to add SIMD/vector instructions to it. This is another point in the argument that Itanium could've gone a lot further than where Intel left it.
 
First, it wasn't VLIW. People need to stop saying that - it's simply not true. That includes you, @NinoPino . The key distinction is that VLIW has no runtime scheduling logic. EPIC simplifies runtime instruction scheduling, but the whole reason it encodes dependencies between the instruction packets is specifically to enable it
It was. Look at the meaning of VLIW and you'll see that it was simply an advanced VLIW.
EPIC is the word used to describe the pride of HP/Intel engineers, but the base doesn't change.
 
It was. Look at the meaning of VLIW and you'll see that it was simply an advanced VLIW.
No. I have firsthand experience with programming multiple VLIW processors. EPIC is not VLIW. Please don't spread misinformation.

EPIC is the word used to describe the pride of HP/Intel engineers, but the base doesn't change.
The "Explicitly Parallel" part of EPIC refers to the way the instruction bundles contain dependency information. You don't find that in any VLIW ISA, because it implies runtime scheduling, which is anathema to VLIW.

Another universal feature of VLIW is a fixed mapping between instruction word slots and instruction dispatch ports. In other words, there are restrictions on which types of instructions can go in which slots. The number of slots equals the maximum number of instructions that can be dispatched per cycle. Again, the reason for this fixed mapping is that VLIW CPUs don't do any dynamic scheduling.
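Just to picture what that fixed mapping looks like, here's a sketch of a hypothetical 4-slot VLIW (made-up unit names, not any real machine):

/* Hypothetical 4-slot VLIW: each slot in the instruction word can only
 * feed one kind of functional unit, so the slot count is also the issue
 * width. Real machines differ in the details; this is just the shape. */
enum unit { MEM_UNIT, INT_UNIT, FP_UNIT, BRANCH_UNIT };

static const enum unit slot_to_unit[4] = {
    MEM_UNIT,     /* slot 0: loads/stores only       */
    INT_UNIT,     /* slot 1: integer ALU ops only    */
    FP_UNIT,      /* slot 2: floating-point ops only */
    BRANCH_UNIT,  /* slot 3: branches only           */
};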

In the case of IA64, there's no constraint on which types of instructions can be placed into which slots of a bundle. The entire concept of a bundle exists merely for convenience.

The main reason for the difference is that VLIW code basically needs to be recompiled from one generation of CPU to the next. This isn't a problem for embedded processors, GPUs, etc. but it's a deal-breaker for general purpose computing. What Intel did with EPIC was to simplify the task of dynamic scheduling, in such a way that you could take code compiled for an older CPU and not only run it on a newer one, but also potentially avoid the "underutilization" problem that would result from a naive compatibility mode.

Finally, because all instruction scheduling in VLIW happens at compile time, VLIW instruction streams are rife with NOP instructions (these No OPeration opcodes are mere placeholders), since there are certain scheduling hazards that other CPUs normally resolve at runtime. Examples include: instructions with latency > 1 cycle, instructions with throughput > 1 cycle, and register bank conflicts. These NOPs waste instruction bandwidth, and I know EPIC avoids most of them.
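As a crude illustration of where those NOPs come from, here's a toy example (an imaginary 2-wide VLIW with a 2-cycle load-use latency, nothing real):

#include <stdio.h>

/* Toy opcodes for an imaginary 2-wide VLIW with a 2-cycle load latency. */
enum { NOP, LOAD, ADD, STORE };

/* With purely static scheduling, the compiler itself must cover the
 * load-use delay with NOPs, because there's no hardware interlock:
 *
 *   cycle 0:  LOAD  | NOP     (r1 <- [a])
 *   cycle 1:  NOP   | NOP     (waiting out the load latency)
 *   cycle 2:  ADD   | NOP     (r2 = r1 + r3, r1 now safe to use)
 *   cycle 3:  STORE | NOP     ([b] <- r2)
 *
 * A dynamically scheduled core would just stall or reorder and never
 * fetch those NOPs at all. */
static const int vliw_schedule[4][2] = {
    { LOAD,  NOP },
    { NOP,   NOP },
    { ADD,   NOP },
    { STORE, NOP },
};

int main(void)
{
    int wasted = 0, total = 0;
    for (int cycle = 0; cycle < 4; cycle++)
        for (int slot = 0; slot < 2; slot++, total++)
            if (vliw_schedule[cycle][slot] == NOP)
                wasted++;
    printf("%d of %d issue slots are NOPs\n", wasted, total);  /* 5 of 8 */
    return 0;
}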
 
I really wish somebody would've prototyped a truly modern IA64 core on a huge FPGA. I'm thinking maybe a research department could undertake such a project, at a university with a strong computer architecture program. And open source it, of course, to draw contributions from others.

I'll bet you could achieve competitive IPC by implementing the same techniques as the most modern x86 and ARM CPUs employ. Obviously, the clock speed would be terrible from using an FPGA and you could probably only make a single-core CPU, but just to prove that IA64 had a lot of gas left in the tank.

Even though this news article is about Linux purging the last of its IA64 support (some of which has been broken for a while), there's no reason you couldn't go back and take the last known-good distro to support IA64 and use that as a starting point. Linux has even been known to revive support for certain CPUs, such as when someone wanted to run it on a Nintendo 64.
 
I really wish somebody would've prototyped a truly modern IA64 core on a huge FPGA. I'm thinking maybe a research department could undertake such a project, at a university with a strong computer architecture program. And open source it, of course, to draw contributions from others.

I'll bet you could achieve competitive IPC by implementing the same techniques as the most modern x86 and ARM CPUs employ. Obviously, the clock speed would be terrible from using an FPGA and you could probably only make a single-core CPU, but just to prove that IA64 had a lot of gas left in the tank.

Even though this news article is about Linux purging the last of its IA64 support (some of which has been broken for a while), there's no reason you couldn't go back and take the last known-good distro to support IA64 and use that as a starting point. Linux has even been known to revive support for certain CPUs, such as when someone wanted to run it on a Nintendo 64.
If you want to invest your heartache into a better IA64, why not have a look at the Mill? Ivan Godard seems to make a rather good argument that the Mill is what a redesign of the Itanium should be.

But it's not exactly going forward, either. General-purpose CPUs fall short when nearly every server workload today is a scale-out use case that really wants its own purpose-built architecture.

I was quite taken by the Itanium at first, too. But when they started running Oracle databases and Non-Stop on it, I just felt the architecture was abused by HP at the cost of their user base.

It's quite funny, just about any fault-tolerant hardware I've worked with in my career, actually turned out to be less reliable than plain old x86. I've dealt with fault-tolerant HP-UX Stratus machines costing a Maybach apiece that were not only dog-slow but had something fail every other week (at least without stopping), and with Non-Stop systems being stopped for hours until somebody could decide which SAN was good.

My team ran the same service on x86 Linux on a penny budget and we only ever had minutes of outage, fully covered by an eventual consistency design of the application (CAP!).

The Itanium guys kept on ranting about the lack of x86 reliability, long after x86 had gotten way better than the very same cables, and the humans who plugged them in, on both architectures.

Hyperscalers prove that the reliability of x86 isn't perfect, when you deploy millions of them. But the chances of being hit with dual faulty x86 servers in an enterprise deployment were low enough not to affect us for almost 20 years: it was never the hardware.
 
I was quite taken by the Itanium at first, too. But when they started running Oracle databases and Non-Stop on it, I just felt the architecture was abused by HP at the cost of their user base.
Eh, with SMT (which they added @ 2-way in Montecito, circa 2006), it really shouldn't have been that bad.

Interestingly, I'm reading that Poulson (2012) was not only a 12-wide architecture, but had a RAS feature called "Instruction Replay".

It's quite funny, just about any fault-tolerant hardware I've worked with in my career, actually turned out to be less reliable than plain old x86.
Yeah, when I did a brief stint as a sysadmin, I tried to go with the most standard hardware for our Linux servers, under the assumption this would be the best-tested code path. Back in those days, you couldn't install Linux on just any machine and assume it'd work.

Hyperscalers prove that the reliability of x86 isn't perfect, when you deploy millions of them. But the chances of being hit with dual faulty x86 servers in an enterprise deployment were low enough not to affect us for almost 20 years: it was never the hardware.
Yeah, I'd read that commodity CPUs sometimes suffer from errors inside their ALUs & other places not protected by ECC or parity. That's why I'd never overclock any machine I actually care about using for real work. One thing that irks me about Intel's decision to stop selling Xeon-branded desktop processors is that the Xeons were supposedly binned for better reliability.
 
No. I have firsthand experience with programming multiple VLIW processors. EPIC is not VLIW. Please don't spread misinformation.
EPIC is an evolution of VLIW, but it is still VLIW at its origin.
Itanium has a 128-bit instruction word with 3 operations inside and, like any VLIW architecture, needs the compiler to do the work of the hardware.
This is a typical Very Long Instruction Word.
I understand all your objections about how much it has evolved from the basic concepts, but it remains a VLIW architecture at its roots.
The same thing happened to all the other architectures. Take RISC and CISC, for example: today they don't conform very well to the classic definitions of the '70s, but we all continue to call them RISC and CISC.
If you think that calling it VLIW is reductive, I understand, but for me, saying that it is not VLIW is wrong.
 
EPIC is an evolution of VLIW, but it is still VLIW at its origin.
Itanium has a 128-bit instruction word with 3 operations inside and, like any VLIW architecture, needs the compiler to do the work of the hardware.
As I said, they bundled the instructions for practical reasons. It's not like VLIW, where slots in the instruction word map to ALU dispatch ports.

A 128-bit bundle makes them byte-aligned and allows the dependency information to be specified at a granularity of 3 instructions, rather than per-instruction. Specifying dependencies at the granularity of every instruction would've been much higher overhead and probably wouldn't have yielded significant benefits.
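For anyone curious what that byte-aligned bundle actually looks like, here's a rough sketch of pulling one apart, going from my memory of the published encoding (a 5-bit template field plus three 41-bit slots); the field positions are worth double-checking against the manuals before relying on them:

#include <stdint.h>

/* A 128-bit IA-64 bundle, as I recall it: a 5-bit template plus three
 * 41-bit instruction slots. The template selects how the slots are
 * interpreted and where the stops (instruction-group boundaries) fall. */
typedef struct {
    uint8_t  template_id;  /* bits 0..4    */
    uint64_t slot[3];      /* 41 bits each */
} ia64_bundle_t;

static ia64_bundle_t decode_bundle(uint64_t lo, uint64_t hi)
{
    const uint64_t MASK41 = (1ULL << 41) - 1;
    ia64_bundle_t b;
    b.template_id = (uint8_t)(lo & 0x1F);              /* bits  0..4   */
    b.slot[0] = (lo >> 5) & MASK41;                    /* bits  5..45  */
    b.slot[1] = ((lo >> 46) | (hi << 18)) & MASK41;    /* bits 46..86  */
    b.slot[2] = (hi >> 23) & MASK41;                   /* bits 87..127 */
    return b;
}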

The reasons people tend to confuse it with VLIW are probably the bundles and the emphasis Intel put on compiler-level optimizations. However, as I've repeatedly said, IA64 is dynamically scheduled and VLIW is not. That's a fundamental difference.

These are major distinctions. Like, practically bigger distinctions than you have between CISC and RISC.

Your entire argument basically hinges on the fact that IA64 uses 3-instruction bundles. That's a superficial similarity, because the underlying reason IA64 uses them is different from the reason VLIW packs multiple operations into one instruction word. Given the fundamental differences I've outlined, such a similarity is utterly meaningless.