Marcelo Viana writes:
> It's relevant for common readers, and this site is full of them. But of course you clearly are not.
Careful, if my head gets any bigger it'll need its own post code... :}
> Very nice site mapesdhs, lots of information, thank you for sharing.
Thanks! Most welcome.
> As for Catia I totally agree with you, something must be holding it back. Wish I could have the cards to do it myself.
Hopefully I'll be able to work out something about this once I've tested
with my other systems, though that won't be an ideal comparison - for that I'd need
a newer 2-socket XEON system using two CPUs with 6+ cores each (can't see
that happening any time soon, too expensive; it was painful enough
building up the Dell T7500 from scratch).
> Very interesting point, and at least a different way to go. “Dig, dig, dig is always the way”.
Thing is though, even if I did find out what I mentioned above, the
conclusion may only be valid for that particular application and that
particular GPU. Let's say just as an example that the CUDA version of a
task is 30% quicker than an OpenCL implementation of the same task; can
one infer from this that a CUDA version of any task will be 30% faster?
Certainly not. Alas, though it would be great to have a range of data
points on this, atm there isn't even one example one can examine to gain
some insight into any API efficiency differences. There's an assumption
(and maybe it's valid) that given the option, CUDA is the better choice,
but where's the data?
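Just to illustrate what a single data point would even look like, here's a
minimal, hypothetical C++ timing-harness sketch (the two task lambdas are
empty placeholders, not real API calls): time the same workload built
against each API, warm up first, and report the median of several runs so
driver start-up and JIT costs don't skew the result.

// Hypothetical sketch only: a minimal harness for timing the same task
// implemented two ways (e.g. a CUDA build vs. an OpenCL build). The two
// run_task_* lambdas below are empty placeholders, not real API calls;
// in a real test they would launch the work and wait for it to finish.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <functional>
#include <vector>

// Warm up first (driver init, JIT, caches), then report the median of
// several timed runs rather than a single measurement.
static double median_ms(const std::function<void()>& task,
                        int warmups = 3, int runs = 10)
{
    for (int i = 0; i < warmups; ++i)
        task();

    std::vector<double> samples;
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        task();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}

int main()
{
    auto run_task_cuda   = []{ /* launch CUDA version of the task, then sync */ };
    auto run_task_opencl = []{ /* launch OpenCL version of the task, then sync */ };

    const double cuda_ms   = median_ms(run_task_cuda);
    const double opencl_ms = median_ms(run_task_opencl);
    std::printf("CUDA: %.3f ms  OpenCL: %.3f ms  OpenCL/CUDA: %.2fx\n",
                cuda_ms, opencl_ms, opencl_ms / cuda_ms);
    return 0;
}

Even then, the resulting ratio only describes that one task on that one
card/driver combination, which is exactly the point - one such measurement
says nothing about CUDA vs. OpenCL in general, hence the need for lots of
them.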
I'm a firm believer in the ethos espoused by David Kirkaldy, a man who
administered a 15m-long, 115-tonne testing machine in London in the late
1800s (eg. he was asked by the govt. to investigate structural problems
with parts recovered from the Tay Bridge disaster in 1879). An engraved
stone tablet above his works' entrance read, "Facts Not Opinions", an
attitude which annoyed many of his peers.
> ... The W9100 is the best card, more memory, more flops etc...
The memory angle presents a problem, something I was moaning about to a
friend this week.
Tasks such as volumetric imaging (medical), GIS, defense imaging, etc.
need a lot of main RAM and clearly will be much more effective if the GPU
has lots of RAM too, but the two Viewperf12 examples show that the real
apps used for such tasks put considerable demands on the main CPU(s)
as well, ie. in this case the single 4930K appears to be a bottleneck. If
so, then the potential performance of a card like the W9100 is being
wasted because the host system doesn't have the compute power to feed it
properly (same concept as there being no point putting a more powerful GPU
in the Red Devil budget gaming build presented on toms this week, because
the main CPU couldn't exploit it). SGI did a lot of work on these issues
20 years ago - it's why some Onyx setups needed 8+ CPUs even if only one gfx
pipe was present, because the application needed a lot of preprocessing
power, eg. the real-time visualisation of an oil rig database is a good
example I recall (this image dates from the early/mid-1990s):
http://www.sgidepot.co.uk/misc/oilrig.jpg
The proprietary oil rig data was converted to IRIS Performer for every
frame using various culling methods, giving a 10Hz update rate on Onyx
RE2, 60Hz on Onyx2 IR. The system was creating an entirely new scene graph
for every frame.
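For anyone who hasn't dealt with this kind of pipeline, here's a
deliberately simplified, hypothetical C++ sketch of that per-frame CPU
work - walk a big object database, cull what the camera can't see, rebuild
the draw list from scratch. None of this is Performer's actual API, just
the general shape of the problem:

// Hypothetical, heavily simplified sketch of per-frame CPU preprocessing.
// The names and structures are illustrative only - this is not IRIS
// Performer code - but it shows the kind of work the host CPUs must finish
// every frame before the graphics pipe has anything to render.
#include <vector>

struct BoundingSphere { float x, y, z, radius; };

struct Plane {
    float nx, ny, nz, d;           // plane equation: dot(n, p) + d = 0, normal points inward
};

struct Frustum {
    Plane planes[6];               // left, right, top, bottom, near, far

    // Sphere-vs-frustum test: reject if entirely outside any plane.
    bool intersects(const BoundingSphere& s) const {
        for (const Plane& p : planes) {
            float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
            if (dist < -s.radius)
                return false;
        }
        return true;
    }
};

struct SceneObject {
    BoundingSphere bounds;
    int            geometry_id;    // handle to GPU-resident geometry
};

// Rebuild the visible set from scratch for this frame - pure CPU work over
// the whole database, done before anything is submitted for drawing.
std::vector<int> build_draw_list(const std::vector<SceneObject>& database,
                                 const Frustum& view)
{
    std::vector<int> visible;
    visible.reserve(database.size() / 8);    // rough guess at the visible fraction
    for (const SceneObject& obj : database)
        if (view.intersects(obj.bounds))
            visible.push_back(obj.geometry_id);
    return visible;
}

With a database of millions of objects, that traversal has to complete on
the host CPUs every single frame before the graphics hardware gets anything
to draw, which is why the CPU side, not the graphics card, can end up
setting the frame rate ceiling.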
Or to put it another way, there's not much point in a card having as much
RAM as the W9100 if the system doesn't have the CPU power to drive it
properly. The 'problem' is the slow-as-mud improvements Intel has been
making with its CPUs in recent years. Although they've added more cores to
the XEON lines, some tasks need higher single-core performance, not more
cores, especially those which aren't coded to use more than 6 cores (this
has been especially painful for those still using ProE). IBM sorted this
out years ago (some very high clocks present in their Power CPUs), so why
hasn't Intel? If they accepted similarly high thermal limits, they ought to
be able to offer CPUs with 4 to 8 cores by now at much higher clock speeds
than are currently available. Instead, they've gone crazy with many-core
options, but the clock rates are too low. I'm certain that many pro users
would love the option of having 4.5GHz+ CPUs with only 4 to 8 cores max.
Such a config would speed up the typically single-threaded GUIs of most
pro apps as well.
> but in the real world, I'll go for the K6000. Not for the card itself but for CUDA. AMD must release
> a driver that really supports OpenCL on their cards on Windows and (in my case) Linux.
Me too, though also for the driver reliability. Not that NVIDIA is immune
to driver screwups, but I've had far fewer problems with NVIDIA drivers in
general. I agree with an earlier poster who said the W9100 needs to be a
lot cheaper than the K6000 to draw away those who might otherwise buy the
latter despite the moderate price difference. In many pro markets, it's
worth spending disproportionately more in order to achieve significantly
greater reliability. I talked to someone yesterday who told me a full
render of a movie they're working on is going to take about a week on
their GPU cluster, so it's obviously important that during all that time
the GPU drivers don't do anything dumb, otherwise it's a heck of a lot of
wasted power, delay, etc.
> On the other hand, if OpenCL 2 was already here (fully working, not buggy drivers) I'd have no doubt about
> going for the W9100: best card, more memory, better price.
In a way it reminds me a bit of what used to happen with pro cards 10+
years ago, ie. vendors never developed the drivers to get the best out of
a product before they moved on to the next product (SGI's approach was
very different, but costly). Someone in a position to know told me at the
time that few cards end up offering more than about a third of what they
could really do before optimisation work on the card is halted to allow
vendor staff to move on to the next product release. And it doesn't help
that sometimes driver updates can completely ruin the performance of an
older card, eg. way back in the early 200-series NV drivers, DRV-09
performance (Viewperf 7.1.1) was really good (check the numbers on my site
for a Quadro 600 with a simple oc'd i3 550), but then at some point an
update made it collapse completely. See my Viewperf page for details (last
table on the page).
In short then, more data please! 8)
Ian.