
Intel Releases Itanium 9500 Poulson Manual

[citation][nom]blazorthon[/nom]Cell is getting old. It'd probably need a refresh before it's a real competitor anymore. It's also hard to code well for.[/citation]


AFAIK, Cell's programming problems were mostly attributed to a lack of optimized development tools. Each core is different, and compilers/APIs can't easily place a given thread on the core best suited to that particular workload. Programmers had to write code per core based on what they knew the work was like, then pick and choose how to schedule those threads themselves. I think the development tools are now mature to the point that they can optimize easily and choose cores automatically, making programming easier. I do agree with you, though, that the hardware itself needs a refresh.
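
To make that placement problem concrete, here's a rough C/pthreads sketch (not the actual Cell SDK; the task split and function names are made up for illustration): on a heterogeneous chip like Cell, the programmer decides up front which kind of core gets which kernel, something a compiler for a homogeneous chip handles for you.

```c
/* Hypothetical sketch, not the Cell SDK: on a heterogeneous chip the
 * programmer, not the compiler, decides which kind of core runs what. */
#include <pthread.h>
#include <stdio.h>

/* Work that suits a general-purpose core (branchy, pointer-chasing). */
static void *control_task(void *arg)
{
    printf("control task on the general-purpose core\n");
    return NULL;
}

/* Work that suits a SIMD/streaming core (tight numeric loop). */
static void *streaming_task(void *arg)
{
    float acc = 0.0f;
    for (int i = 0; i < 1000000; i++)
        acc += (float)i * 0.5f;
    printf("streaming task result: %f\n", acc);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    /* On Cell, launching streaming_task would go through the SPE runtime
     * and the data would have to be moved into local store; plain threads
     * stand in here for that manual placement decision. */
    pthread_create(&t1, NULL, control_task, NULL);
    pthread_create(&t2, NULL, streaming_task, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```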
 


I'm just separating out the design philosophy (VLIW) from the implementation of that design (EPIC). It's very important to do that because there are other implementations of VLIW. Each implementation does something different or tailors the design to a specific application. VLIW itself was conceived back in the '70s when engineers were looking for ways to get more raw power out of the same silicon. Most designs took a serial approach, with the CPU processing one instruction at a time; VLIW's idea was to ship multiple instructions to the CPU simultaneously. An "instruction word" is just an engineering term for the standard size of a CPU's instruction, most being 16~32 bits in length. Sending four instructions as a single transaction results in a 64~128 bit word, and thus the term "Very Long Instruction Word" was born. It's like RISC: not a hard and fast standard but rather a set of principles and concepts to use when designing a CPU for a specific result.
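
A quick illustration of the idea in plain C (the slot layout in the comment is conceptual, not any particular VLIW encoding):

```c
/* Illustration only: four independent operations that a VLIW compiler
 * could pack into one wide instruction word and issue in a single cycle. */
void independent_ops(float *a, float *b, int *i, int *j)
{
    /* No data dependencies between these statements, so a VLIW scheduler
     * is free to place them side by side in the same wide word:
     *   | fmul | fadd | add | sub |   <- one "very long instruction word" */
    a[0] = a[0] * 2.0f;   /* slot 0: floating-point multiply */
    b[0] = b[0] + 1.0f;   /* slot 1: floating-point add      */
    *i   = *i + 4;        /* slot 2: integer add             */
    *j   = *j - 1;        /* slot 3: integer subtract        */
}
```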

GCN being the big exception, most modern GPUs handle instruction words via VLIW ISAs native to each design. The drivers compile graphics commands into the binary GPU language on the fly.

As for the comment about "no common VLIW," that's flat wrong. There's no common VLIW in general computing, yes, but every stereo mixer or DSP in the world uses a VLIW CPU. VLIW lends itself extremely well to digital multimedia processing. It sucks at general computing, though.
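
For example, the kind of kernel a DSP runs all day looks like this (illustrative C, not any vendor's DSP code), and it's exactly the branch-free multiply-accumulate pattern that keeps every VLIW slot busy:

```c
/* A classic DSP kernel: an FIR filter. Each tap is a multiply-accumulate
 * with no branches and predictable memory access, which is why audio and
 * DSP chips map so well onto VLIW execution slots. */
float fir(const float *samples, const float *coeff, int taps)
{
    float acc = 0.0f;
    for (int k = 0; k < taps; k++)
        acc += samples[k] * coeff[k];  /* unrolled copies of this line can
                                          fill every slot of a wide word */
    return acc;
}
```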
 


Radically different performance characteristics. The real question should be "why Itanium over SPARC / POWER," as that's the world they're competing in.

Most of those big iron RISC systems have insane I/O and parallel workload characteristics. My specialty is in SPARC so I'll demonstrate on a T2 and T4 CPU.

http://en.wikipedia.org/wiki/UltraSPARC_T2

1.6 GHz
Eight cores, each core able to handle eight threads simultaneously (think Intel's HT on crack). 64 threads per chip.
Four DDR2 memory controllers (this chip was designed before DDR3) per chip
Two 10GbE controllers integrated into each chip
One PCIe x8 interface
Hardware encryption / hashing

The T2+ update added glue-less SMP capability for up to four chips per system and removed one of the 10GbE controllers.

The newer offering is the T4 CPU:
3.0 GHz
Eight cores, each core doing eight threads, 64 threads per chip
Four DDR3 1066 MHz memory controllers per chip
Two 10GbE controllers per chip
Two PCIe 2.0 x8 interfaces
Hardware encryption / hashing
Glue-less SMP support for up to four sockets
Hardware priority scheduling (the CPU can prioritize thread scheduling on its own without intervention from the OS)

The system boards themselves have several PCIe bridges for multiple system buses.

When building these, it's a massive waste of space and money to buy single-socket. You tend to go with dual-socket for most things and quad for extremely heavy workloads. The other approach is to buy a T6000 chassis and fill it with dual-socket blades and a disk array module.

What you get is a system that excels at running 200+ tasks simultaneously. That many running tasks causes a ton of memory and bus I/O, especially with disk controllers.
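
As a rough sketch of that kind of workload (the task count and work function are made up for illustration), think of a thread-per-task server where the chip's 64 hardware threads hide the memory and I/O stalls:

```c
/* Minimal sketch of the "lots of concurrent tasks" workload a T2/T4 is
 * built for: one lightweight thread per task, far more tasks than cores. */
#include <pthread.h>
#include <stdio.h>

#define NTASKS 200

static void *handle_request(void *arg)
{
    long id = (long)arg;
    /* Real work here would be mostly stalls: parse a request, hit the
     * database, wait on the disk array, write a response. */
    printf("task %ld serviced\n", id);
    return NULL;
}

int main(void)
{
    pthread_t workers[NTASKS];
    for (long i = 0; i < NTASKS; i++)
        pthread_create(&workers[i], NULL, handle_request, (void *)i);
    for (int i = 0; i < NTASKS; i++)
        pthread_join(workers[i], NULL);
    return 0;
}
```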

The direct opposite to this approach is the "googleplex" concept where you run a ton of smaller / slower systems and assign each a different task.

Each concept is better at running different types of workloads. The HPC approach is for massive databases with multiple J2EE webapp front ends and high amounts of disk activity. The distributed approach is for anything that can effectively be broken into multiple prepackaged tasks with little dependency between them.
 
[citation][nom]ashinms[/nom]***, I think I just sparked a flame war... *facepalm* Either way, I was really just relating to the architectures being different. As far as Bulldozer being a bad architecture, all I can say is that mine does everything I wanted it to and I have no complaints.[/citation]
That's fine; I know they're different, but I was specifically highlighting the fact that Bulldozer performs rather poorly if you're not throwing a lot at it. Yes, you can disable cores in order to get the less heavily threaded performance up, but if somebody sells you a car with 8 cylinders, you'd be a bit annoyed if you had to use only 4 of them. Sure, you'd save fuel in general but you'd lose power; for pottering around town, though, it's better suited. I know... bad analogy.

Bulldozer certainly isn't a bad architecture, far from it; it's just a bad implementation. Over the next two to three years, it'll come good.
 


Funny, back when I was in high school, my mom bought a van that did exactly that: it went from 6 to 4 cylinders when cruising. I do agree about Bulldozer, though.
 
4 cylinders certainly makes sense if you're not pushing your foot to the floor. 😛 F1 cars also cut to 4 cylinders at low RPM, such as on the grid or in the pit lane, but I wasn't aware of how widespread it was in the consumer automotive industry.
 
Wow, all this bickering over EPIC vs. VLIW. Hopefully, I can shed a little light on this matter for anyone interested in a bit of enlightenment.

If you've worked directly with VLIW in the real world, it would be immediately clear what problem in classical VLIW Intel was trying to solve with EPIC: binary compatibility. Classical VLIW offloads the burden of instruction scheduling onto the compiler. It's all done statically, at compile time. Unless you're doing JIT compilation, this means the executable must be compiled for a specific generation of the chip.

What EPIC does is to simply encode the instruction dependencies (hence the "Explicitly Parallel" part of the name). That simplifies the task of scheduling, making it easier to perform at runtime. But, unlike VLIW, Itanium CPUs still do the scheduling at runtime. This lets the scheduler on a future CPU schedule more instructions in parallel, in order to take advantage of additional pipelines. They can also do a better job of scheduling around pipeline latencies, cache misses, branch predictor misses, etc. It takes a lot of the downside out of VLIW, though the tradeoff is more overhead and it doesn't scale quite as efficiently.
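
Here's a small illustration in C of what "explicitly parallel" buys you (the grouping comments are conceptual; only the ";;" stop notation is borrowed from real IA-64 assembly):

```c
/* Sketch of explicit parallelism: the compiler groups independent
 * operations and marks the dependence boundary; the CPU may issue
 * everything inside a group in parallel on however many pipelines it has. */
int epic_example(int a, int b, int c)
{
    int x = a + b;   /* independent of y: same instruction group        */
    int y = c << 2;  /* independent of x: same instruction group        */
    /* ---- dependence boundary (IA-64 marks this with a ";;" stop) ---- */
    int z = x * y;   /* needs both x and y, so it starts a new group    */
    return z;
}
```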
 
And as for the reason Itanium never went mainstream, it's a fact that Intel built a fortress of patents around the Itanium. Unlike x86, it's completely impractical for anyone to build a compatible CPU. Without a choice between vendors, many of the big businesses that drive adoption of new technologies decided to pass on Itanium, since Intel would have them over a barrel once they became dependent on it.

Add to that the fact that AMD was making some really competitive x86 offerings at the time and introduced the 64-bit extensions. Suddenly, Intel's case for why everyone should switch to Itanium evaporated, and they were left with only the mainframe market that's accustomed to long-term contracts for expensive, proprietary tech.
 