[citation][nom]Shadow703793[/nom]1. Why not use Cell... ...fore more acceleration.[/citation]
Contrary to popular belief, the Cell actually sucks for general-purpose computing. Each of the SPEs is utterly "dumb": it can't handle complex operations, and really only works well for streaming SIMD workloads.
Additionally, the performance numbers claimed by Sony are utterly bogus, just like their claim that the PS2 would be restricted and impossible to export from Japan for being "too technologically advanced." The Cell is NOT a 1-2 TFLOP chip; in the PS3 it tops out at a theoretical peak of 185.6 GigaFLOPS, and only if those are 32-bit operations. FP32 isn't very useful for heavy-math applications; gaming, physics, and anything a supercomputer handles usually deal in double-precision FP64, which is what the well-known LINPACK benchmark (the standard for ranking supercomputers) uses, as do popular distributed-computing projects like all those @home programs. On such a test, the Cell's design cripples it to a theoretical peak of around 51.2 GigaFLOPS, a number comparable to a readily affordable PC gaming CPU like the Core 2 Duo E8400, with the C2D having the advantage of being vastly more programmable and flexible, so it'd most likely score much higher on an actual LINPACK run. And even that peak assumes the 1/8th of the 64-bit power that lies in the PPE isn't being sapped by issuing commands to the SPEs, and that the 7/8ths that are the SPEs aren't sitting idle waiting on the PPE. That's a situation that'd be very hard to reach in the first place, and impossible to sustain outside of streaming SIMD for more than a couple of clock cycles, let alone the 3.2 billion cycles in a full second. Once you factor in the Cell's extreme power consumption, it simply doesn't make sense for supercomputing; it winds up LESS efficient than other designs.
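For the curious, here's the arithmetic that gets you those peaks. A minimal sketch, assuming the usual breakdown of one PPE plus seven active SPEs at 3.2 GHz; the exact per-unit flops-per-cycle figures are my assumptions chosen so the totals line up with the numbers above:

```python
CLOCK_GHZ = 3.2
SPES = 7  # active SPEs in the PS3's Cell (one of the eight is disabled)

# FP32: assume each SPE issues a 4-wide single-precision fused
# multiply-add per cycle (8 flops/cycle), and the PPE contributes a
# scalar FMA (2 flops/cycle).
spe_fp32 = SPES * 4 * 2 * CLOCK_GHZ        # 179.2 GFLOPS
ppe_fp32 = 2 * CLOCK_GHZ                   # 6.4 GFLOPS
print(round(spe_fp32 + ppe_fp32, 1))       # 185.6 GFLOPS peak, single precision

# FP64: assume every core (PPE + 7 SPEs) manages one double-precision
# FMA per cycle, i.e. 2 flops/cycle each.
fp64 = (SPES + 1) * 2 * CLOCK_GHZ
print(round(fp64, 1))                      # 51.2 GFLOPS peak, double precision
```

Note how the double-precision figure collapses to barely a quarter of the single-precision one; that's the gap LINPACK would expose.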
Basically, the Cell was designed to be really good at one thing: handling streaming media. Encoding/decoding is very intense on low-precision math and not very demanding on the instruction side, so the Cell is a perfect fit. It's the only chip out there (CPU, GPU, whatever) capable of handling SEVERAL fully high-definition video streams, complete with HDCP and even exotic formats like DivX/XviD, all simultaneously; that's an application that's largely impossible for most PC CPUs without a graphics card to help.
[citation][nom]Shadow703793[/nom]Why not use... ...GPUs fore more acceleration.[/citation]
The reason no supercomputers are built out of GPUs isn't the same problem the Cell has, but there are technical obstacles keeping them out as well. Three come to mind.
The first is memory latency. GPUs operate with very high latency to memory; since they handle relatively linear tasks and, when dealing with textures and shaders, always pull up very large sequential blocks of memory at a time, a CAS latency of 30, 40, or more clock cycles doesn't really matter; the GPU knows far in advance what it'll need next around 99% of the time. The same benefit applies to decoding media: being a streaming application, latency doesn't hurt it. Scientific applications are another story. There, the predominant bottleneck invariably winds up being data and instruction latency, which is made worse by GPUs' extremely skewed ratio of processing units to cache, a ratio vastly different from what's found in general-purpose CPUs.
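To put a rough number on that point, here's a toy cost model of my own (the latency figure and predictability fractions are illustrative assumptions, not vendor data): accesses the hardware can predict get prefetched and cost about a cycle, while unpredictable ones stall for the full memory latency.

```python
def avg_cycles_per_access(latency, predictable_fraction):
    """Predicted accesses are prefetched (~1 cycle); the rest stall
    for the full memory latency."""
    return predictable_fraction * 1 + (1 - predictable_fraction) * latency

LATENCY = 40  # cycles; a plausible GPU-memory CAS-style figure (assumed)

# Streaming texture/shader work: ~99% of accesses are sequential.
print(round(avg_cycles_per_access(LATENCY, 0.99), 2))  # 1.39 cycles/access

# Latency-bound scientific code: little predictability.
print(round(avg_cycles_per_access(LATENCY, 0.20), 1))  # 32.2 cycles/access
```

Same chip, same latency, a 20x difference in effective memory cost; that's why a design tolerable for streaming falls over on irregular scientific workloads.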
The second reason is the lack of a standard multi-GPU architecture that could support a large number of GPUs, even just for mathematical operations; the current limit for ANY design appears to be 4 GPUs, whether from nVidia or ATi/AMD. So while in theory you could produce the same floating-point capacity using only 1/7.5th the number of RV770s compared to what Sequoia uses (i.e., 13.3% as many chips), as of yet there is no way to actually build that assembly, so in practice it's a moot point.
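To make the moot point concrete: taking only the 1-to-7.5 ratio from above (the machine sizes below are hypothetical round numbers, not Sequoia's actual chip count), even a small CPU cluster translates into far more GPUs than any current 4-GPU setup can host.

```python
GPU_EQUIVALENT = 7.5  # one RV770 ~ 7.5 CPU chips in peak flops (ratio from above)
MAX_GPUS = 4          # largest SLI/CrossFire configuration available today

def gpus_needed(cpu_chips):
    # GPUs required to match a cpu_chips-sized machine's peak flops
    return cpu_chips / GPU_EQUIVALENT

for chips in (1_000, 10_000, 100_000):  # hypothetical machine sizes
    n = round(gpus_needed(chips))
    print(chips, "CPU chips ->", n, "GPUs; buildable:", n <= MAX_GPUS)
```

Even the smallest case needs over a hundred GPUs in one coherent system, two orders of magnitude past the ceiling.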
The final reason is power and heat. GPUs may have a very high degree of performance-per-watt efficiency when it comes to math, but they STILL have a very high TDP per chip. The cost of the actual chips is usually one of the minor parts of a supercomputer; far more care goes into providing enough power to stably run thousands upon thousands of nodes, with not just multiple CPUs per node but all the other components as well, all of which must be powered and cooled. With GPUs, heat production is concentrated in a far smaller number of chips, so you'd need more intensive cooling and likely greater spacing between GPUs, since you can't just blow hot air out the back of a case when there are more nodes in every direction. There's a good chance one would actually have to construct a LARGER facility to house an equally powerful supercomputer built from GPUs than one built from multi-core general-purpose CPUs.
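A back-of-the-envelope sketch of that concentration effect (every number here is an assumption of mine for illustration, not a measured spec): even when the GPU machine draws less total power, each package dissipates far more heat, which is what drives the cooling and spacing problem.

```python
# Assumed figures, for illustration only
CPU_TDP, GPU_TDP = 80.0, 160.0          # watts per chip
CPU_GFLOPS, GPU_GFLOPS = 50.0, 1000.0   # peak GFLOPS per chip

TARGET = 1_000_000.0  # GFLOPS the machine must deliver (1 PFLOP)

cpu_chips = TARGET / CPU_GFLOPS         # 20,000 chips
gpu_chips = TARGET / GPU_GFLOPS         # 1,000 chips

print(cpu_chips * CPU_TDP / 1e3, "kW total, CPU build")   # 1600.0 kW
print(gpu_chips * GPU_TDP / 1e3, "kW total, GPU build")   # 160.0 kW
print(GPU_TDP / CPU_TDP, "x the heat per package")        # 2.0
```

Total wattage favors the GPUs handily under these assumptions, but every individual GPU package is running twice as hot, and that per-chip density is what cooling has to be engineered around.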