The memory architecture of the Cell is what really makes it shine. It's all about giving control back to the programmer. When doing high-performance optimization on a cache-based CPU you have to second-guess cache behavior, and that is more trouble than programming the Cell SPEs, where you move data into local store explicitly. The Cell also forces the programmer to think about main memory operations and to start them early.
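To make "start them early" concrete, here is a minimal sketch of the double-buffering pattern this style of programming pushes you toward: request the next chunk from main memory before you start computing on the current one. It is plain portable C, not real SPE code. `dma_get`, `dma_wait`, `CHUNK` and `sum_double_buffered` are names made up for this illustration, and the "DMA" is just a `memcpy`, so nothing actually overlaps here; only the structure of the loop is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 256  /* floats per local buffer; arbitrary size for this sketch */

/* Hypothetical stand-in for an asynchronous DMA "get" into local store.
 * Here it is a plain memcpy and completes immediately; on real hardware the
 * transfer would be queued and run while the core keeps computing. */
static void dma_get(float *local, const float *main_mem, size_t count)
{
    memcpy(local, main_mem, count * sizeof *local);
}

/* Hypothetical stand-in for waiting on the transfer into one buffer.
 * A no-op here because dma_get above is synchronous. */
static void dma_wait(int buffer_index)
{
    (void)buffer_index;
}

/* Sum n floats from "main memory" using two local buffers. */
float sum_double_buffered(const float *src, size_t n)
{
    float local[2][CHUNK];
    float total = 0.0f;
    size_t offset = 0;
    int cur = 0;

    assert(src != NULL || n == 0);
    if (n == 0)
        return 0.0f;

    /* Kick off the very first fetch before any compute happens. */
    size_t cur_count = n < CHUNK ? n : CHUNK;
    dma_get(local[cur], src, cur_count);

    while (cur_count > 0) {
        size_t next_offset = offset + cur_count;
        size_t remaining = n - next_offset;
        size_t next_count = remaining < CHUNK ? remaining : CHUNK;
        int next = cur ^ 1;

        /* Start fetching the next chunk into the other buffer first... */
        if (next_count > 0)
            dma_get(local[next], src + next_offset, next_count);

        /* ...then wait for the current chunk and compute on it. */
        dma_wait(cur);
        for (size_t i = 0; i < cur_count; i++)
            total += local[cur][i];

        offset = next_offset;
        cur_count = next_count;
        cur = next;
    }
    return total;
}
```

On an actual SPE the two stand-ins would presumably map onto the MFC calls from `spu_mfcio.h` (something like `mfc_get` plus a tag-status wait), with alignment and transfer-size rules this sketch ignores. With a cache you get none of this explicitness: you can only hope the prefetcher guesses the same access pattern.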
The Cell gave us the best of both worlds: a general-purpose CPU (the PPE) for the OS and "normal" software, while leaving the hard and trivial work to the SPEs. If the PPE had been a better performer (out-of-order execution etc.), the Cell would have been great in PCs (from netbooks to workstations).
I really hoped Larrabee could fill that gap. On the one hand, they want to make a GPU, for which the x86 and cache crap are useless, unless it is crucial to write drivers in x86 assembly.
On the other hand, the design stinks of threaded programming on too many cores. For the average Joe programmer, this will never work!
They could start fixing it by:
- dropping the cache coherence crap, or at least making it possible to turn it off and go Cell-style
- dropping x86 if it can reduce die size.
Then they would have a GPU that can accelerate general calculations as well (like CUDA). Considering how many Cells could fit into the 1.4 billion transistors of the GT200, this could be a killer.
Adding a modern x86 CPU to the die would make it a wonderful PC processor.