[citation][nom]Hupiscratch[/nom]Could Tom's Hardware put up some single-precision GFLOPS numbers? And what's the i7 980's performance?[/citation]
Well, I can give you a few examples for reference:
Athlon X2 5200+ - (2 cores) * (2 adds + 2 muls per cycle) * (2.6 GHz) = 20.8 GFLOPS
Core 2 Duo E8400 - (2 cores) * (4 adds + 4 muls per cycle) * (3 GHz) = 48 GFLOPS
Core i7 980X - (6 cores) * (4 adds + 4 muls per cycle) * (3.33 GHz) = 159.8 GFLOPS
Cell - (1 PPE + 7 SPEs) * (4 madds per cycle) * (2 flops/madd) * (3.2 GHz) = 204.8 GFLOPS
Radeon 5870 - (1600 ALUs) * (1 madd per cycle) * (2 flops/madd) * (0.85 GHz) = 2720 GFLOPS
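All of these figures come from the same formula: number of execution units, times flops each unit can retire per clock, times clock speed in GHz. A minimal Python sketch that just redoes that arithmetic, using only the unit counts, per-cycle throughputs, and clocks listed above (the function and dictionary names are made up for illustration):

```python
# Peak theoretical GFLOPS = units * (flops per unit per cycle) * clock in GHz.
# All numbers below are the ones from the list above; nothing is measured.

def peak_gflops(units: int, flops_per_cycle: float, clock_ghz: float) -> float:
    """Peak GFLOPS, assuming every unit retires its maximum flops every cycle."""
    return units * flops_per_cycle * clock_ghz

CHIPS = {
    # name: (cores or ALUs, flops per cycle per unit, clock in GHz)
    "Athlon X2 5200+": (2, 2 + 2, 2.6),   # 2 adds + 2 muls
    "Core 2 Duo E8400": (2, 4 + 4, 3.0),  # 4 adds + 4 muls
    "Core i7 980X": (6, 4 + 4, 3.33),     # 4 adds + 4 muls
    "Cell": (1 + 7, 4 * 2, 3.2),          # 4 madds, 2 flops per madd
    "Radeon 5870": (1600, 1 * 2, 0.85),   # 1 madd, 2 flops per madd
}

if __name__ == "__main__":
    for name, (units, fpc, ghz) in CHIPS.items():
        print(f"{name}: {peak_gflops(units, fpc, ghz):.1f} GFLOPS")
```

Note that a multiply-add counts as two flops, which is why the Cell and Radeon lines carry an extra factor of 2.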
Of course, all of these values are peak theoretical single-precision figures. Most architectures can get close to their peak on convolutions or matrix multiplication, but in most real-world applications, reaching even half of peak would be impressive. In many cases, simply getting data into the processor fast enough is the bottleneck. Cell largely overcomes this problem by giving each SPE a small (256 KB) but very fast local memory in place of a traditional cache. This software-managed memory is what can make the Cell hard to program (as Carmack stated), but it is also what gives it its impressive performance, and not just on paper.
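One common way to see why feeding the processor matters is the roofline model (a standard textbook model, not something from the post): attainable throughput is the smaller of the peak compute rate and memory bandwidth times arithmetic intensity (flops performed per byte moved). The sketch below assumes a hypothetical 25 GB/s memory bandwidth and a placeholder intensity for blocked matrix multiply; neither is a spec for any chip above.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Roofline bound: limited either by compute or by how fast memory can feed the ALUs."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

# Assumed example values (hypothetical, for illustration only):
PEAK = 159.8       # GFLOPS, reusing the Core i7 980X peak figure from above
BANDWIDTH = 25.0   # GB/s, placeholder memory bandwidth

# SAXPY (y = a*x + y): 2 flops per element; reads x and y, writes y -> 12 bytes in single precision.
saxpy_intensity = 2 / 12
# Blocked matrix multiply reuses each loaded value many times; 10 flops/byte is a placeholder.
gemm_intensity = 10.0

if __name__ == "__main__":
    for name, ai in [("SAXPY", saxpy_intensity), ("blocked matmul", gemm_intensity)]:
        bound = attainable_gflops(PEAK, BANDWIDTH, ai)
        print(f"{name}: at most {bound:.1f} GFLOPS ({100 * bound / PEAK:.0f}% of peak)")
```

Under these assumptions a streaming kernel like SAXPY is capped at a few percent of peak no matter how many ALUs are available, while a matrix multiply with enough data reuse can reach the compute ceiling. Keeping working data in a fast local store is one way to raise the effective bandwidth side of that bound.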
I think the Cell was a decent compromise given the transistor budgets of the time, but the main thing it's good at (streaming computation) has since been encroached on by GPUs. For "general-purpose" code, modern x86 CPUs can run circles around it, so the architecture no longer has a clear niche, which is probably why IBM announced a while back that it was abandoning Cell development.