It's a shame that you didn't list which companies' spokesmen claimed that the discrete video card was on its way out. It'd have been nice to know which motherboard makers to avoid.
Those who liken the GPU to the x87 FPU are missing several important points. The FPU, unlike the GPU, was never truly stand-alone: it was essentially another segment of the CPU, much like how processors were split across multiple ICs before Intel integrated everything onto one chip with the 4004. The truth is, the CPU and GPU are both exercises in engineering trade-offs: each takes a finite amount of resources (in this case price, silicon area, etc.) and chooses how best to trade one aspect off against another. Sure, if you raise your resource budget (by spending more money, getting a die shrink, etc.) you could build a design that incorporates the best of both... But guess what? The specialized designs could be improved by exactly the same amount.
Also, all these "people made wrong predictions about computers in the past!" arguments are pretty empty; the same sort of argument is popular with anyone advocating an unaccepted fringe theory. "Oh, people didn't believe Galileo in the 1600s! So my fringe theory automatically gets to don that mantle of martyrdom without any justification!" No. Just, no. Instead, try educating yourself about computer architecture. We CAN get predictions right: I distinctly recall a man making a prediction that's held true for half a century... His name was Gordon Moore, co-founder of Intel.
Here are some REAL reasons you can't simply have just CPUs:
- First off, it's clear that the enthusiast crowd cannot be satisfied. Already we're up to 1 CPU plus 2 or more GPUs; that's far too much silicon to put in one package, especially with chips getting progressively hotter and more power-hungry. And no, we very obviously AREN'T going to just wait (i.e. waste) two fabrication generations to get the same tech to fit on one chip; competition wouldn't allow it. If AMD halted its advances so it could cram a quad-core Bulldozer and a pair of 6870s onto a single chip in 2013, guess what Nvidia and Intel would be doing in the meantime?
- Secondly, the CPU and GPU fill different needs, and those needs come in different proportions. Crysis, for all its impressiveness, doesn't need a quad-core; a "fusion" chip built for it would best be a 2- or 3-core with tons of GPU hardware. On the flip side, StarCraft II doesn't need that much graphics hardware, but could certainly use more CPU horsepower. Hardly anyone plays just one game, so a combined CPU+GPU with one fixed split just won't cut it for a PC!
- Third, what is there to gain, performance-wise, by putting the GPU on the die? Nothing. With the FPU and cache, the CPU saw the benefit of reduced latency; that meant less time spent waiting to fetch that crucial operand, or waiting for an instruction to finish executing, which are the two chief bottlenecks any CPU ALWAYS faces. But with the GPU, once the CPU has finished setting up the scene and sent the data over, it's DONE. And it's a LOT of data, sent in a stream: so what if sending it over PCI-e takes a few extra clock cycles to get started? The link is going to be occupied for millions of 'em per frame (see the back-of-envelope below). Remember that PCI-e was a DOWNGRADE from AGP in terms of latency, but PCI-e won out because of higher bandwidth.
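To put rough numbers on that latency-vs-bandwidth point, here's a quick sketch. The link speed, per-frame data size, and added latency are all illustrative assumptions on my part, not measurements:

```python
# Back-of-envelope: why a little extra bus latency doesn't matter when the
# link is streaming a frame's worth of data anyway. All figures are assumed,
# illustrative round numbers, not measurements.

PCIE_BANDWIDTH = 4e9         # bytes/sec, roughly a PCIe 1.1 x16 link (assumed)
EXTRA_LATENCY  = 1e-6        # seconds of added startup latency vs. AGP (assumed)
FRAME_DATA     = 10 * 2**20  # bytes of command/vertex data per frame (assumed)

transfer_time = FRAME_DATA / PCIE_BANDWIDTH          # time the link stays busy
share = EXTRA_LATENCY / (transfer_time + EXTRA_LATENCY)

print(f"transfer time per frame: {transfer_time * 1e3:.2f} ms")
print(f"added latency as a share of that: {share:.3%}")
# With these numbers the startup latency is well under 0.1% of the time the
# link spends streaming -- bandwidth, not latency, is what the GPU cares about.
```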
Now, here are perhaps the REAL major reasons: the memory architectures needed for a CPU and a GPU are wildly different, and you can't satisfactorily meet both sets of requirements with one memory system. There's a lot to be said here:
- Without resorting to overclocking, 19.2 GB/sec is the highest memory bandwidth an enthusiast can get on a PC, and that takes a triple-channel Intel Core i7. Make it an AMD or an i5, and that drops to 12.8 GB/sec, since those are dual-channel setups. A modern high-end card from either Nvidia or ATi will readily pass 100 GB/sec. Obviously, a 2/3-channel DDR3 setup for main memory won't cut it for graphics (the arithmetic is sketched after this list).
- Sure, you could swap in GDDR5 for main memory... but it would bring very bad latency that drags the CPU down, defeating the point. Main CPU memory often comes with a CAS time under 5 nanoseconds; 10 ns is enough to make the CPU scream for mercy. Meanwhile, GDDR5 setups routinely pass 20-30 ns without the GPU so much as shrugging: texture fetches and block writes are big, predictable access patterns, so latency matters far less than raw bandwidth for a GPU. It's the opposite for a CPU, where 12.8 GB/sec is overkill because much of main memory's time is spent waiting to switch to the right bank rather than streaming data.
- Yes, this latency disadvantage can be mitigated by careful programming and good use of a CPU's caches. However, that tuning is best done on a per-model basis, which is impractical on PCs, where you have two manufacturers with a dozen product lines and hundreds of models (and if you had to factor in GPU permutations, that would rise to thousands). Hence, this is only done to a high degree on consoles, where there's only ever one CPU to tune for: an Xbox programmer knows that every 360 uses the same Xenon CPU.
- Lastly comes the matter of price. Sure, the solution for more bandwidth is more channels and faster speeds. However, DIMMs only provide 64 bits of interface per slot, and high-end cards run 256- to 512-bit buses, so that's 4 modules MINIMUM. Further, if you used GDDR5 for main memory, guess what? You'd be wasting money on GDDR5 capacity the non-graphics side doesn't need: a machine that needs 4 GB for the CPU and 2 GB for graphics has to buy 6 GB (or 8 GB to fill 4 channels!) of GDDR5, even though only 2 GB of it really needs to be that fast. Sure, the ultra-enthusiasts wouldn't mind, but what about those $500-750 gaming machines? Having to buy more expensive main memory makes an affordable gaming rig impossible. (Anyone remember Rambus RIMMs?)
- An alternative would be to have two memory controllers, one for DDR3 and one for GDDR5; but that means more complexity, more pins, and a bigger package on the CPU, plus extra complexity and cost on the motherboard... Guess what? You've basically just made a GPU-less design as complicated and expensive as simply having a discrete graphics card.
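For anyone who wants to see where those bandwidth and module-count figures come from, here's a rough sketch. The per-channel transfer rate and the GDDR5 bus parameters are assumptions I've picked to line up with the numbers quoted above:

```python
# Rough arithmetic behind the bandwidth and module-count figures above.
# The per-channel transfer rate and the GDDR5 bus are assumptions picked to
# line up with the numbers quoted in this post.

CHANNEL_WIDTH_BYTES = 8       # a 64-bit DDR3 channel moves 8 bytes per transfer
TRANSFER_RATE       = 800e6   # transfers/sec per channel (assumed)

per_channel = CHANNEL_WIDTH_BYTES * TRANSFER_RATE         # 6.4 GB/s
print("dual channel:  ", 2 * per_channel / 1e9, "GB/s")   # 12.8
print("triple channel:", 3 * per_channel / 1e9, "GB/s")   # 19.2

# A high-end card gets its bandwidth from a much wider, faster bus, e.g. an
# assumed 256-bit GDDR5 interface at 4.0 GT/s effective:
print("256-bit GDDR5: ", (256 // 8) * 4.0e9 / 1e9, "GB/s")  # 128

# And since a DIMM slot only exposes 64 bits, matching a card-style bus width
# on a motherboard takes:
print("modules for a 256-bit bus:", 256 // 64)   # 4
print("modules for a 512-bit bus:", 512 // 64)   # 8
```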
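And a similarly rough sketch of the latency point; the clock speed and CAS figures below are assumed round numbers, just to show the scale of a stall in CPU cycles:

```python
# Why the same latency hurts a CPU but not a GPU: the clock and CAS figures
# are assumed round numbers, chosen to match the ranges quoted above.

CPU_CLOCK = 3.0e9   # Hz (assumed)
CAS_DDR3  = 5e-9    # ~5 ns, as quoted above for CPU-oriented DDR3
CAS_GDDR5 = 25e-9   # ~25 ns, in the 20-30 ns range quoted for GDDR5

# A CPU thread that misses cache stalls for the whole round trip:
print("cycles lost on a  5 ns access:", round(CAS_DDR3  * CPU_CLOCK))   # 15
print("cycles lost on a 25 ns access:", round(CAS_GDDR5 * CPU_CLOCK))   # 75

# A GPU keeps thousands of threads in flight instead; while one batch waits on
# a texture fetch, others run, so the extra ~20 ns is hidden as long as there's
# enough parallel work and enough bandwidth to feed it.
```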
Overall, unlike the discrete sound card, the discrete graphics card has a secure place in the future. The sound card only ever offered a flat benefit in the face of exponential CPU growth. The graphics card isn't staying flat; its workload is growing exponentially too, potentially even faster than the CPU's. So, to sum up what the graphics card is not like:
- The graphics card is not like the FPU or cache. CPU->GPU latency isn't an issue, so there's nothing to gain by moving it on-die.
- The graphics card is not like the sound card or network card. Graphics are getting exponentially more complicated over time; audio and networking weren't changing much, and eventually became inconsequential tasks.
- The graphics card is not like the physics card. The physics card idea wasn't a good one to begin with (which is why the dedicated PhysX card never succeeded).
- The graphics card isn't just an overglorified extra CPU. The CPU and GPU sit at opposite extremes not only of processor design but of memory architecture as well.
- The technology will not "improve" to make integration make sense. Sure, in 4 years an AMD Fusion processor may match today's high-end discrete graphics card. But it'll look laughable next to the discrete graphics card of 4 years down the road (see the toy model below).
Above all, perhaps, is this lesson: Moore's law alone can't let you catch up with someone if they're using Moore's law too.
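If it helps, here's a toy model of that last point. The doubling period and the discrete card's head start are assumptions; the exact numbers don't matter, only that both sides ride the same exponential:

```python
# Toy model of the closing point: if the integrated part and the discrete card
# both double on the same schedule (an assumption), the gap never closes.

def perf(base, years, doubling_period=2.0):
    return base * 2 ** (years / doubling_period)

integrated_today = 1.0
discrete_today   = 4.0   # assumed 4x head start for the discrete card

for years in (0, 4, 8):
    ratio = perf(discrete_today, years) / perf(integrated_today, years)
    print(f"after {years} years, the discrete card is still {ratio:.0f}x ahead")

# The integrated part does eventually match *today's* card -- but never the
# card that ships alongside it.
```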