I'm kind of confused why you guys are jumping on 64-bit code not being common. There's no point for most applications, unless you like taking more memory and running slower. 32-bit code is denser, and therefore improves cache hit rates, and helps other apps have higher cache hit rates.
Unless you need more memory, or are adding numbers more than over 2 billion, there's absolutely no point in it. 8-bit to 16-bit was huge, since adding over 128 is pretty common. 16-bit to 32-bit was huge, because segments were a pain in the neck, and 32-bit mode essentially removed that. Plus, adding over 32K isn't that uncommon. 64-bit mode adds some registers, and things like that, but even with that, often times is slower than 32-bit coding.
SSE and SSE2 would be better comparisons. Four years after they were introduced, they had pretty good support.
It's hard to imagine discrete graphic cards lasting indefinitely. They will more likely go the way of the math co-processor, but not in the near future. Low latency should make a big difference, but I would guess it might not happen unless Intel introduces a uniform instruction set, or basically adds it to the processor/GPU complex, for graphics cards, which would allow for greater compiler efficiency, and stronger integration. I'm a little surprised they haven't attempted to, but that would leave NVIDIA out in the cold, and maybe there are non-technical reasons they haven't done that yet.