CUDA cores and memory interface width... help me understand their relation

satyamdubey

Distinguished
Hello everyone,

I want to know how much the interface width offsets the extra number of CUDA cores within a GPU family. In Tom's hierarchy chart, the GTX 580 sits three tiers above the GTX 650 Ti, yet the 580 has fewer CUDA cores (512) than the 650 Ti (768), while its interface width is more than double: 384-bit vs 128-bit. It also has a 1544 MHz clock vs the 650 Ti's 928 MHz.

So how do all these parameters work together to make a card faster, and why would NVIDIA put more cores in a GPU that is supposed to be less powerful? Could they have used 512 CUDA cores and a 192-bit interface width to achieve similar results?

appreciate your inputs.

many thanks
-Satyam
 
satyamdubey

Distinguished
Okay, I did not know Kepler cores were slower than Fermi cores. It would make sense to have more of the slower cores, and the way the core count jumped in the 6xx series cards kind of points to that fact. I mean, the 570 has 480 CUDA cores and the 670 has 1344.

What about the effect of interface width? How does that work? Thanks for taking the time to answer, Sumsum (your name is too long :)).

-Satyam
 
Bus width is like looking at a highway: two lanes move x number of cars per clock cycle, four lanes move twice as many. In video cards, the low end is a 64-bit wide controller, then 128... 192... 256... 512-bit for the top-end cards. With NVIDIA, the CUDA core count is the number of working cores on the chip. The 680/780 has the full CUDA count; the 670/770 count is lower because they're partially-failed 680/780 chips. NVIDIA won't scrap them, it just tags them with a lower-bin part number. The biggest issue is that NVIDIA uses three different GPU cores across the 600 line. The new 700 series is then a refresh of one of NVIDIA's GPU chip lines with higher clocks and faster RAM.
 
It's honestly not that important - all that counts is the end result (frames per second). There's been this stupid obsession with memory bus width ever since Igor's comments in his GTX 660 Ti review. Fact is, bus width is a means to an end (the end being bandwidth - divide bus width by eight (bits to bytes) and multiply by the effective memory clock frequency to get bandwidth in MB/s). Bandwidth is in turn another means to an end (framerates). Since nobody is capable of replacing/upgrading the memory architecture on a card they buy, it makes exactly zero sense to single it out as a performance factor any more important than cores, ROPs, fill rate or anything else.

Think about it - how does the 192-bit GTX 660 beat the 256-bit Radeon 7850? Hell, even the GTX 660 Ti is 192-bit and that beats the hell out of the 7850 despite the narrower bus. My old 8800 GTX from 2006 had a 384-bit bus (look it up if you don't believe me) and that would get absolutely trampled today even by a basic GTX 650. Since it used GDDR3 and not GDDR5, it managed half the data transfers per clock cycle, so half the effective bandwidth for the same bus width. Still just a means to an end.
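To put rough numbers on that formula (bus width in bits divided by eight, times effective memory data rate), here's a tiny host-side sketch. It compiles as CUDA but is plain arithmetic; the effective data rates are the reference specs as I remember them, so treat them as approximate:

// bandwidth.cu - rough peak memory bandwidth, compile with: nvcc bandwidth.cu
// Peak bandwidth = (bus width in bits / 8) bytes per transfer * effective data rate (MT/s)
#include <cstdio>

static double peak_bw_gbs(int bus_bits, double effective_mts)
{
    // bytes per transfer * million transfers/s = MB/s; divide by 1000 for GB/s
    return (bus_bits / 8.0) * effective_mts / 1000.0;
}

int main()
{
    // Approximate reference specs: bus width (bits), effective memory data rate (MT/s)
    printf("GTX 580    (384-bit, 4008 MT/s): %6.1f GB/s\n", peak_bw_gbs(384, 4008.0));
    printf("GTX 650 Ti (128-bit, 5400 MT/s): %6.1f GB/s\n", peak_bw_gbs(128, 5400.0));
    printf("GTX 660    (192-bit, 6008 MT/s): %6.1f GB/s\n", peak_bw_gbs(192, 6008.0));
    printf("HD 7850    (256-bit, 4800 MT/s): %6.1f GB/s\n", peak_bw_gbs(256, 4800.0));
    return 0;
}

With those figures the 580's wide bus gives it roughly 192 GB/s against the 650 Ti's ~86 GB/s, even though the 650 Ti's memory is clocked higher - which is exactly the "means to an end" point above.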
 

mapesdhs

Distinguished
satyamdubey writes:
> I want to know how much does the interface width offset the extra
> number of CUDA cores in any gpu family. ...

Memory bandwidth is critical for good CUDA performance, though it does
vary a bit by application. I refer to it as 'aggregate memory bandwidth
per CUDA core'. A GTX 780 has lots of cores, but it doesn't remotely have
the bandwidth to feed them, making it less efficient in how CUDA loads
can be parallelised. Thus, one 580 can beat a 780, and two 580s easily
beat a single Titan despite having less than half as many cores in total. See:

http://www.tomshardware.com/reviews/geforce-gtx-760-review-gk104,3542.html

Tom's doesn't have an AE CUDA test yet, but I'm hoping to help them out
with that in the Fall. In the meantime, see:

http://forums.creativecow.net/thread/2/1019120
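As a back-of-envelope illustration of that 'bandwidth per core' idea (reference specs from memory, so approximate), a quick sketch:

// bw_per_core.cu - rough "aggregate memory bandwidth per CUDA core", compile: nvcc bw_per_core.cu
#include <cstdio>

struct Card { const char *name; int cores; int bus_bits; double mem_mts; };

int main()
{
    // Approximate reference specs: CUDA cores, bus width (bits), effective memory rate (MT/s)
    const Card cards[] = {
        { "GTX 580",    512, 384, 4008.0 },
        { "GTX 780",   2304, 384, 6008.0 },
        { "GTX Titan", 2688, 384, 6008.0 },
    };
    for (int i = 0; i < (int)(sizeof(cards) / sizeof(cards[0])); ++i) {
        double bw = (cards[i].bus_bits / 8.0) * cards[i].mem_mts / 1000.0;  // GB/s
        printf("%-10s %6.1f GB/s / %4d cores = %.3f GB/s per core\n",
               cards[i].name, bw, cards[i].cores, bw / cards[i].cores);
    }
    return 0;
}

On those numbers the 580 ends up with roughly three times the bandwidth per core of a 780 or Titan, which is the imbalance I'm describing.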


> ... could they have used 512 CUDA cores and a 192 bit interface width
> to achieve similar results?

No; when the 580 was new, it was the top-end card, and NVIDIA knew that
providing 512 cores with a lot of bandwidth makes a big difference in
many situations, including gaming (heavy AA, high-res displays). What
NVIDIA has not done is make any kind of equivalent card today. The mem bw
of all the 700 series cards, and the Titan, really needs to be at least
2X higher to feed the vast number of cores present (512-bit minimum,
preferably more). In this respect, for CUDA, the 780 and Titan are
particularly disappointing. For the same cost as a single 780, one can
get four 580 1.5GB cards, or three 580 3GB cards, which would be much
better for AE. Other apps may vary, but Tom's results show the 580 doing
very well in all CUDA tests except Fluidmark, because that's an aggregate
result which includes normal 3D tests as well.

So, for example, if you're building an AE system, it boils down to
budget. If you can afford it, get multiple Titans and benefit from the
larger RAM. If not, get used 3GB 580s.

Btw, the last 580 1.5GB card I won on eBay only cost me 96.50 UKP
(about $150 US). My AE system has four 580 1.5GB cards, the total cost of
which was just 472 UKP - that's quite a bit less than one 780 card, but
massively faster for CUDA.


Btw, the GTX 460 only has 336 cores, but it already showed that having a
narrower bus can hurt performance in some situations (the standard 460
has a 256-bit bus, the V2 card has a 192-bit bus), so the difference is
clear even with far fewer cores than a 580. The irony of the 460, however,
is that the V2 card has higher clocks and creates much less heat, and thus
can be oc'd way more than a normal 460 - one of my 460 V2s runs at
1025MHz with a lower vcore than my original 850MHz EVGA FTWs.

Ian.

 
Solution

mapesdhs

Distinguished


They are faster for gaming (lots of cores, ROPs, etc. matter), but they're not faster for CUDA. I'm sure
NVIDIA would be the first to point out that if one were serious about CUDA applications then one ought
to be using Tesla, since that has ECC support, a full-speed PCIe return path (gamer cards don't do this),
proper 64-bit fp support and a lot more RAM. The reality of course is that a lot of solo professionals and
even small businesses can't afford Teslas. The ideal in a workstation would be a Quadro + three Teslas,
but without the budget for that the next best thing is four 580s. For some apps though, using any gamer
card is just not viable - the consequences of a RAM error are too serious, e.g. financial transaction
processing.
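If you're curious what any particular card reports for these things, the CUDA runtime exposes most of it through cudaGetDeviceProperties. A minimal sketch (the bandwidth line uses the usual reported-clock x 2 for DDR x bus-width calculation):

// devcheck.cu - list what each installed card reports, compile: nvcc devcheck.cu
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA devices found.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        // reported memory clock (kHz) * 2 (DDR) * bus width in bytes -> GB/s
        double bw = 2.0 * p.memoryClockRate * (p.memoryBusWidth / 8.0) / 1.0e6;
        printf("%s: compute %d.%d, %.0f MB RAM, %d-bit bus, ~%.0f GB/s, ECC %s\n",
               p.name, p.major, p.minor,
               p.totalGlobalMem / (1024.0 * 1024.0),
               p.memoryBusWidth, bw,
               p.ECCEnabled ? "on" : "off");
    }
    return 0;
}

On a gamer card the ECC line will always come back "off"; only Tesla/Quadro parts can enable it.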

Ian.

 

satyamdubey

Distinguished
Ian, thanks for an awesome explanation. I am, however, looking at CUDA more as an architecture and less as a programming standard (the word CUDA is used for two things, right?... I do not know a lot about it). But going by your aggregate-memory-bandwidth theory, if a game was specifically optimized for CUDA (architecture/programming standard/whichever is the more appropriate word), would a 580 still hammer a 680?

I understand you admit that having more ROPs and extra cores benefits games in general, but what would it mean if a game was optimized for CUDA the way AE is?
 

satyamdubey

Distinguished


Okay... I finally got it. Kepler/Fermi are CUDA-compliant architectures which can run CUDA-compiled code/applications, right?
 

mapesdhs

Distinguished
SomeoneSomewhere is right, NVIDIA's terminology is a bit skew-whiff.

Games primarily involve 3D functions, which is not the same as GPU acceleration of
other mathematical operations. The only aspects of games that can benefit from GPU
acceleration are things like physics engines, and that's handled by PhysX anyway,
which is much the same idea. To that extent, as I understand it, games and CUDA aren't
really related. A game uses a GPU directly for its main purpose: 3D operations.
Concepts like CUDA are just ways of exploiting the processing power of a GPU for
other tasks. Trying to think of a suitable analogy... umm... how about, saying a game
has been optimised for CUDA is a bit like saying a pair of skis has been optimised
for a sandy surface. :D Not, as it were, a ski's native area of strength, if you see
what I mean. A game simply uses the standard functions of a GPU directly.
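To make the distinction concrete, a CUDA workload is really just ordinary arithmetic handed to the GPU's cores, with nothing graphical about it. A toy sketch (a generic vector add, not taken from any particular application):

// vadd.cu - toy example of using the GPU for plain maths rather than 3D, compile: nvcc vadd.cu
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each GPU thread adds one pair of elements - no triangles, textures or shaders involved.
__global__ void vadd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;                  // a million elements
    const size_t bytes = n * sizeof(float);

    float *a = (float *)malloc(bytes), *b = (float *)malloc(bytes), *c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    vadd<<<(n + 255) / 256, 256>>>(da, db, dc, n);   // spread the work across the CUDA cores
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %.1f (expect 3.0)\n", c[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(a); free(b); free(c);
    return 0;
}

That's the sort of thing AE's ray-traced renderer or a physics solver does under the hood, and it's why memory bandwidth per core matters so much there.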

Note that I can send you some more detailed comments about CUDA by someone
I know who does a lot of CUDA programming. Just PM or email me.

Ian.

 

satyamdubey

Distinguished
> Trying to think of a suitable analogy... umm... how about, saying a game
> has been optimised for CUDA is a bit like saying a pair of skis has been optimised
> for a sandy surface. :D Not, as it were, a ski's native area of strength

Not really. It's like a passenger loco being repurposed to haul freight. It's still good at it, but there are better ones out there. ...
Am I getting you guys right in thinking that a CUDA core's main purpose is NOT gaming, rather it is productivity work... the area of focus of workstation graphics cards? Isn't the gaming industry the main revenue generator for GPUs?
 

satyamdubey

Distinguished
Thanks to all you guys. Smo and Sam, as always there was much to learn from you guys :). SomeoneSomewhere and especially Ian, thanks a lot for all the insights and knowledge. You guys have got me really excited about reading up more on this, and Ian, I hope I can bother you when I have more doubts about GPU tech. I hope that will be okay with you.

all of you have a great day. see you around :)

-Satyam

P.S. I can't see Smo's avatar either.
 

mapesdhs

Distinguished
satyamdubey writes:
> Thanks to all you guys. Smo and Sam, as always there was much to learn from you guys :).
> Someone Somewhere and especially Ian, thanks a lot for all the insights and knowledge.
> you guys have got me really excited about reading up more on this ...

Most welcome!!


> ... and Ian, I hope I can bother you when I have more doubts about GPU tech.
> I hope it will be okay with you.

Sure, feel free. My contact page is here.

Ian.