[citation][nom]kristoffe[/nom]Checking scores, it's almost as if the cuda cores are really just getting slammed in these 104's and not accessed properly in design. If they were a sign of proper parallel architecture, they would KILL the 560Ti, which I have 2 x 2gb in each of my rendering systems. It is not the case. Nvidia is simply engineering marketing now to keep up with ati's 'streams' when in fact the 560Ti 2gb was killing it and the power draw was reasonable.1344~1536 should show a parallel processing advantage of at least 4~5x that of the 560Ti and the scores on various sites are just pathetic. Hopefully someone comes out with a nice hack to enable or properly access the cores, otherwise, what is the point? And this new 660Ti with only 1.5gb, what they can't afford to put in parallel 2gb? ORLY? 4gb for a great custom 680 (which I have read about but never seen IRL)yawn[/citation]
That's not how it works. First off, these cores are not the same as the cores in the 560 Ti. They are optimized for single-precision math and aren't even capable of double-precision math. They are also only half as fast per core as the older cores (although much more power efficient, and not only because of the die shrink) because Nvidia abandoned the inefficient hot-clocking method used previously, so the cores now run at the core clock instead of a doubled shader clock. The GK104's double-precision capability comes only from a small number of 64-bit Kepler CUDA cores that don't do single-precision math. Since games run on single-precision math, double precision was not prioritized. This is why the Kepler cards are somewhat more power efficient than AMD's GCN-based Radeon 7000 cards. They are designed purely for gaming performance, and that is what they excel at when their VRAM's too-small bandwidth doesn't cause too severe of a problem.
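To put rough numbers on the hot-clocking point, here's a back-of-the-envelope sketch. The core counts and clocks are the published reference specs as I remember them (560 Ti: 384 cores at a 1644 MHz shader clock; GTX 670: 1344 cores at a 915 MHz base clock), so treat them as assumptions and plug in your own. It also ignores boost clocks and everything that isn't raw shader throughput.

```python
def peak_sp_gflops(cores, shader_clock_mhz):
    """Theoretical peak single-precision GFLOPS.

    Assumes each core retires one fused multiply-add (2 FLOPs) per clock.
    """
    return cores * shader_clock_mhz * 2 / 1000.0


# Assumed reference specs, not taken from the posts above.
gtx_560_ti = peak_sp_gflops(cores=384, shader_clock_mhz=1644)  # hot-clocked shaders
gtx_670 = peak_sp_gflops(cores=1344, shader_clock_mhz=915)     # shaders at core clock

core_ratio = 1344 / 384
throughput_ratio = gtx_670 / gtx_560_ti

print(f"core count ratio:      {core_ratio:.2f}x")
print(f"peak throughput ratio: {throughput_ratio:.2f}x")
```

So 3.5x the cores works out to a bit under 2x the theoretical peak, and that's before memory bandwidth or anything else gets in the way.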
Furthermore, there is 1.5GB because it has a 192-bit bus instead of a 256-bit bus... RAM chips have 32-bit buses. You do the math on how many chips a smaller bus can get. That's right, six. Six chips times 256MiB per chip means 1.5GB of VRAM. 512MiB chips are much more expensive than 256MiB chips. For example, 8GB DDR3 memory modules use 512MiB chips, and although their prices have improved substantially in the last few months, they are still oftentimes much more expensive than a similar 2x4GiB memory kit. Also, there is a GTX 670 4GiB at Newegg, not that it matters, because in any gaming situation where you can actually use that much VRAM, the memory bandwidth holds it back so badly that you'd hate to compare it to a multi-card 7950 OC or 7970 system... There might be a 4GB GTX 680 out by now, but I don't really care to check and, like I said, it doesn't really matter.
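If you want to check the bus math yourself, here's a quick sketch. The function name is made up for illustration, and it assumes one chip per 32-bit channel (no doubling up of chips on a channel).

```python
def vram_config(bus_width_bits, chip_bus_bits=32, chip_capacity_mib=256):
    """Return (chip count, total VRAM in MiB) for a given memory bus width.

    Assumes exactly one memory chip per 32-bit channel.
    """
    chips = bus_width_bits // chip_bus_bits
    return chips, chips * chip_capacity_mib


for bus in (192, 256):
    chips, total_mib = vram_config(bus)
    print(f"{bus}-bit bus: {chips} chips x 256MiB = {total_mib / 1024:.1f}GiB")
```

A 192-bit bus gets six chips and 1.5GiB, while a 256-bit bus gets eight chips and 2GiB. Going past that on the same bus means pricier 512MiB chips.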
So, there's not a problem with CUDA cores being improperly accessed... The problem is that you don't know the situation. Beyond that, you ignore the other factors in performance... I guess you didn't know that increasing the core count linearly does not give a linear increase in performance, or that there are other limits on performance, such as the memory. Heck, that's all ignoring any CPU bottlenecks and other bottlenecks not directly related to the graphics card that can hold back performance.