I'm not 100% sure what they're saying they did here. Are they suggesting that the CPU acted as a memory manager for the GPU? I'd be surprised if performance improved through that; if it did, it's a sign of where GPU engineers need to focus development: a 20% performance boost on the same SPs and cache, just from fixing issues with memory access, would be a huge thing.
What sounds more likely is that they're talking about integrating the GPU into the CPU's pipeline, at least virtually. That makes sense: it's something I was originally hoping to see with Llano. Anything that qualifies as an "embarrassingly parallel" problem would be best offloaded to the GPU (a sketch of the idea follows below). While the SSE/AVX unit on current CPUs may be fine for some math, performance would indeed be better if the CPU could simply hand the work off to another unit with vastly more power.
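To make that concrete, here's a minimal sketch in CUDA (purely illustrative; the kernel and its names are mine, not anything from the article) of the sort of embarrassingly parallel multiply-add work a CPU could hand off:

[code]
#include <cuda_runtime.h>

// One multiply-add per thread; the GPU runs thousands of these at once,
// where an SSE unit would chew through the array 4 floats at a time.
__global__ void muladd(const float *a, const float *b, const float *c,
                       float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a[i] * b[i] + c[i];
}

// Launched with enough 256-thread blocks to cover the whole array:
//   muladd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, d_out, n);
[/code]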
To put it into perspective, a single modern x86 CPU core (including any of Zambezi's EIGHT cores), using SSE, can execute a 4-wide 32-bit floating-point multiply and a 4-wide add each cycle, so the common "multiply-add" pattern counts as a grand total of 8 FP operations per core, per cycle. (4 multiplies, plus 4 adds) This makes a 4-core Sandy Bridge top out at 32 ops/cycle, for a theoretical maximum of 108.8 gigaFLOPS on a 3.4 GHz Core i7 2600K. That's VERY small once you put it side-by-side with a GPU, where each SP on an nVidia GPU, or each cluster of 4 SPs on an AMD GPU, can match the per-cycle math throughput of a whole x86 core.
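The arithmetic, spelled out as ordinary host code (the numbers are just the ones from the paragraph above):

[code]
#include <stdio.h>

int main(void)
{
    // 4-wide SSE multiply + 4-wide SSE add = 8 FP ops per core per cycle
    const int    flops_per_cycle = 8;
    const int    cores           = 4;    // Core i7 2600K
    const double clock_ghz       = 3.4;

    // 8 * 4 * 3.4 = 108.8 gigaFLOPS theoretical peak
    printf("%.1f GFLOPS\n", flops_per_cycle * cores * clock_ghz);
    return 0;
}
[/code]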
Now, in all honesty I actually DON'T believe that having the GPU on a separate, discrete expansion card prevents this from being done; it merely introduces a lot of latency. While that makes some uses less ideal, it's still quite possible for CPU tasks to be offloaded to the GPU when latency isn't a critical requirement (see the sketch below). It's quite possible that future architectures will give us a vastly lower-latency, more direct interface between the CPU and the GPU. After all, integration has already put the main memory controller on the CPU die and all but eliminated the Northbridge chipset.
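Here's what that offload pattern looks like (CUDA again; "scale" is a stand-in kernel I made up): the two memcpys are exactly where the expansion-bus latency bites, which is why the work handed over has to be big enough to be worth the round trip.

[code]
#include <cuda_runtime.h>

__global__ void scale(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;   // stand-in for real work
}

void offload(const float *host_in, float *host_out, int n)
{
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    // Each trip across the bus costs latency...
    cudaMemcpy(d_in, host_in, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    // ...and so does the trip back.
    cudaMemcpy(host_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
}
[/code]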
The more telling thing here, though, is that the original x87 line, before the 80486, was actually implemented as a separate chip on the motherboard, with its own socket, and it managed to work fine there. Granted, it DID simply sit on the same FSB as the CPU, but the physical distance proved not to be an issue. (similarly, cache used to be implemented on separate chips on the motherboard, which also worked, albeit with higher latency)
[citation][nom]loomis86[/nom]You are completely missing it. This research proves separate GPUs are STUPID. RAM and BIOS will be integrated on a single die someday also.[/citation]
That would be even stupider. Need to replace the BIOS? GL, there goes the CPU as well! There's a reason the BIOS has been a separate chip since the dawn of the CPU. (the Intel 4001 ROM served this purpose for Intel's 4004)
Ditto for RAM; the stuff needs to be quite variable. That, and by now the amount of silicon needed for a proper supply is huge: implementing all the components you speak of on a single die would require a massive piece of silicon that would be MORE expensive than the current arrangement. Cost climbs far faster than linearly as die surface area goes up: not only do you get fewer chips per wafer (since each one takes more surface area), but the failure rate ALSO rises. The number of defects per wafer tends to be constant, so 8 defects on a 100-chip wafer is a mere 8% failure rate, while those same 8 defects on a 25-chip wafer is a whopping 32%. (this is a lesson nVidia has learned the hard way again and again)
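You can run the numbers yourself; it's a toy model, assuming each defect kills exactly one chip and the defect count per wafer stays fixed:

[code]
#include <stdio.h>

int main(void)
{
    const int defects = 8;   // roughly constant per wafer

    // Small dies: 100 chips fit on the wafer
    printf("small dies: %d%% lost\n", 100 * defects / 100);  // 8%

    // Big dies: only 25 chips fit on the same wafer
    printf("big dies:   %d%% lost\n", 100 * defects / 25);   // 32%
    return 0;
}
[/code]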
The same applies to discrete GPUs: putting the GPU on the same die as the CPU is stupid as a one-size-fits-all solution. For a tablet or phone it may make perfect sense, but if you need high performance, you simply can't fit enough transistors on a single die. And no, you can't just wait for the next die shrink, because that shrink hands your competitor the same extra space to build a more complex and powerful GPU.
[citation][nom]alyoshka[/nom]Then they made the Sandy Bridge & The Llano which calculated the graphics with the help of a secondary Chip on board.[/citation]
Actually, the GPU portions of Sandy Bridge and Llano ARE on the same die as the CPU cores. They are not integrated onto the motherboard, nor even onto a separate die within the CPU's package.