AMD FirePro W9100 Review: Hawaii Puts On Its Suit And Tie


CodeMatias

Reputable
Mar 12, 2014
"Nvidia’s sub-par OpenCL implementation"

Right... that's why in real-world functions (rather than the "perfect" functions used in benchmarks) the Nvidia cards are on par with or even better than the AMD ones... What the author fails to understand is that AMD is the one with a sub-par implementation of OpenCL, since half the language is missing from their drivers (which is why groups like Blender and LuxRender have to drop support for most features just to get their kernels to compile). Sure, the half of the language that is there is fast, but it's like driving a three-wheeled Ferrari!
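
To make that concrete, here is a rough, hypothetical sketch in plain OpenCL host C of the kind of check a renderer has to make before enabling an optional kernel path: look at the device's advertised extensions, try to build the full-featured kernel, and fall back when the driver's compiler chokes on it. The "shade" kernel and the choice of cl_khr_fp64 as the optional feature are made up for illustration; this is not Blender's or LuxRender's actual code.

/* Hypothetical sketch: probe a device before enabling an optional kernel path.
 * Not from any real renderer; it only illustrates the "does this driver build it?" check. */
#include <stdio.h>
#include <string.h>
#include <CL/cl.h>

/* A made-up kernel that needs double precision, an optional OpenCL feature. */
static const char *kSrc =
    "#pragma OPENCL EXTENSION cl_khr_fp64 : enable  \n"
    "__kernel void shade(__global double *out) {    \n"
    "    size_t i = get_global_id(0);               \n"
    "    out[i] = (double)i * 0.5;                  \n"
    "}                                              \n";

int main(void) {
    cl_platform_id plat; cl_device_id dev; cl_int err;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    /* 1) Check the advertised extensions (cl_khr_fp64 here). */
    char ext[4096] = {0};
    clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, sizeof(ext), ext, NULL);
    printf("cl_khr_fp64 advertised: %s\n",
           strstr(ext, "cl_khr_fp64") ? "yes" : "no");

    /* 2) Try to build the kernel and dump the compiler log on failure.
     * This is the point where a project decides to disable a feature on a given driver. */
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, NULL, &err);
    if (clBuildProgram(prog, 1, &dev, "", NULL, NULL) != CL_SUCCESS) {
        char log[16384] = {0};
        clGetProgramBuildInfo(prog, dev, CL_PROGRAM_BUILD_LOG,
                              sizeof(log), log, NULL);
        printf("Build failed, falling back to the reduced kernel:\n%s\n", log);
    } else {
        printf("Full-featured kernel built fine on this driver.\n");
    }
    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}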
 

Kekoh

Reputable
Mar 21, 2014
I'll be honest, I don't know anything about workstation graphics. I read this purely for knowledge. That being said, I can't help but pick up on the AMD bias in this article.
 

sha7bot

Distinguished
Dec 12, 2009
Amazing card, but I disagree with your thoughts on the price. Anyone in this segment will drop another 1k for NVIDIA's consistent reliability.

If AMD wants to take more market share from NVIDIA, it needs to lower its pricing to appeal to a larger audience; when the IT team is convincing purchasing, $1K isn't much in the long run. They need to drop their price enough that the card is hard to pass up.
 

Shankovich

Distinguished
Feb 14, 2010
A great card, to be honest. I had one sent to me by AMD and I've been tinkering with it today, running CFD software along with some of my own CFD code. It really sped things up a lot! The drivers need work, though.

My only complaint is that AMD really needs to beef up that cooler. A triple-slot design, perhaps (with a two-slot blower)? That thermal ceiling is holding a lot back.
 

Jarmen Kell

Reputable
Apr 22, 2014
With this performance the W9100 really is a great value; some tests feel like driving a fast, four-wheeled, fully OpenCL-accelerated McLaren F1. Nice review.
 

mapesdhs

Distinguished
The picture is incomplete though without comparing to how the Quadro would
perform when using its native CUDA for accelerating relevant tasks vs. the
FirePro using its OpenCL, eg. After Effects. Testing everything using OpenCL
is bound to show the FirePro in a more positive light. Indeed, based on the
raw specs, the W9100 ought to be a lot quicker than it is for some of the tests
(Igor, ask Chris about the AE CUDA test a friend of mine is preparing).

Having said that, the large VRAM should make quite a difference for medical/GIS
and defense imaging, but then we come back to driver reliability which is a huge
issue for such markets (sha7bot is spot on in that regard).

Ian.



 

wekilledkenny

Honorable
Jan 22, 2013
WTH is "Drawed Objects"? Even a rudimentary spell-check can catch this.

For the English irregular verb "to draw," the past participle is "drawn" (and the past tense is "drew").
For an organization claiming to be professional enough to review a professional-grade GPU, simple things like that can take away a lot of credibility.
 

Marcelo Viana

Honorable
Nov 25, 2013
mapesdhs writes:
The picture is incomplete though without comparing to how the Quadro would
perform when using its native CUDA for accelerating relevant tasks vs. the
FirePro using its OpenCL, eg. After Effects. Testing everything using OpenCL
is bound to show the FirePro in a more positive light. Indeed, based on the
raw specs, the W9100 ought to be a lot quicker than it is for some of the tests
(Igor, ask Chris about the AE CUDA test a friend of mine is preparing).

Having said that, the large VRAM should make quite a difference for medical/GIS
and defense imaging, but then we come back to driver reliability which is a huge
issue for such markets (sha7bot is spot on in that regard).

Ian.

Then put a box with 8 k6000(8 is the total of cards that the "Nvidia maximum" alow) against 4 w9100(4 is the total of cards that amd said that should put in one system).

Do you think it is fair? From the point of view of a render-farm owner, perhaps, because he doesn't look at a card but at a solution. Also don't forget that he has to deal with the price (8 x $5K = $40,000 against 4 x $4K = $16,000); maybe he'll find that the cheaper solution isn't the fastest one, but it may be fast enough.

But here they put one card against one card. And for me the only way is OpenCL, because it is open. You can't benchmark in a proprietary manner. You must use a tool that both contenders can read.
And yes, Nvidia doesn't give a shit about OpenCL, and I understand why, but I don't think it's wise. Time will tell.
 

mapesdhs

Distinguished
Marcelo Viana writes:
> Then put a box with 8 K6000s (8 is the number of cards that the "Nvidia maximum" allows) ...

You'd need to use a PCIe splitter to do that. Some people do this for sure, eg. the guy
at the top of the Arion table is using seven Titans, but PCIe splitters are expensive, though
they do offer excellent scalability, in theory up to as many as 56 GPUs per system using
8-way splitters on a 7-slot mbd such as an Asrock X79 Extreme11 or relevant server board.


> Do you think it is fair? ...

Different people would have varying opinions. Some might say the comparison should be based on a fixed
cost basis, others on power consumption or TCO, others on the number of cards, others might say 1 vs. 1
of the best from each vendor. Since uses vary, an array of comparisons can be useful. I value all data points.
Your phrasing suggests I would like to see a test that artificially makes the NVIDIA card look better, which is
nonsense. Rather, atm, there is a glaring lack of real data about how well the same NVIDIA card can run a
particular app which supports both OpenCL and CUDA; if the CUDA performance from such a card is not
sufficiently better than the OpenCL performance for running the same task, then cost/power differences
or other issues vs. AMD cards could mean an AMD solution is more favourable, but without the data one
cannot know for sure. Your preferred scope is so narrow as to be useless for making a proper
purchasing decision.


> But here they put a card against a card. And for me the only way is openCL because it is open. ...

That's ludicrous. Nobody with an NVIDIA card running After Effects would use OpenCL for GPU acceleration.


> ... You must use a tool that both contenders can read.

Wrong. What matters are the apps people are running. Some of them only use OpenCL, in which case
sure, run OpenCL tests on both cards, I have no problem with that. But where an NVIDIA card can offer
CUDA to a user for an application, then that comparison should be included as well. Not doing so is highly
misleading.

Otherwise, what you're saying is that if you were running AE with a bunch of NVIDIA cards then
you'd try to force them to employ OpenCL, a notion I don't believe for a microsecond.

Now for the apps covered here, I don't know which of them (if any) can make use of CUDA
(my research has been mainly with AE so far), but if any of them can, then CUDA-based
results for the relevant NVIDIA cards should be included, otherwise the results are not a
true picture of available performance to the user.

Atm I'm running my own tests with a K5000, two 6000s, 4000, 2000 and various gamer cards,
exploring CPU/RAM bottlenecks.


Btw, renderfarms are still generally CPU-based, because GPUs have a long way to go before they can
cope with the memory demands of complex scene renders for motion pictures. A friend at SPI told me one
frame can involve as much as 500GB of data, which is fed across their renderfarm via a 10GB/sec SAN. In
this regard, GPU acceleration of rendering is more applicable to small scale work with lesser data/RAM
demands, not for large productions (latency in GPU clusters is a major issue for rendering). The one
exception to this might be to use a shared memory system such as an SGI UV 2 in which latency is no
longer a factor even with a lot of GPUs installed, and at the same time one gains from high CPU/RAM
availability, assuming the available OS platform is suitable (though such systems are expensive).

Ian.

 

Marcelo Viana

Honorable
Nov 25, 2013
Good answer, mapesdhs, and I agree with almost everything you posted, but I still think you didn't get what I meant to explain in my reply.
You're saying that the point of view must be based on the software people use. Of course I'll base my decision whether or not to buy a card on the software I use. I totally agree with you on that (if that's what you mean), but benchmarking is another, completely different thing.

"You must use a tool that both contenders can read." isn't a wrong statement. My thing is render so i'll keep on that: I-Ray is a software to render on GPU, but use only cuda (unable to do this benchmark) VRay-RT is another software that can render on cuda and on openCL (still unable to do this benchmark unless you use openCL only).
If you gonna benchmark not the cards, but this two software ok, you can use a Nvidia card and benchmark this two software on cuda, and even that the card can read cuda and openCL, you must not use openCL, because one of the contenders(I-Ray) cannot read openCL.
In other way if you decide to use the software VRay-RT you can use a Nvidia card and benchmark using cuda and openCL to see what is better, but you can't use AMD card on that.

Of course, outside the benchmark world I can use an Nvidia card, an AMD card, Iray, V-Ray RT, whatever I want. But in this review they run benchmarks to compare two cards, for god's sake.
A benchmark means: software common to the contenders, used to judge those contenders.

I hope you understand the meaning of my post this time.
By the way: I understood your point of view and I agree with it, except regarding benchmarks.
 


mapesdhs

Distinguished
Marcelo Viana writes:
> You're saying that the point of view must be based on the software people use. ...

From the point of view of making a purchasing decision, yes, but I understand
the appeal of general benchmarking for its own sake, I do a lot of that myself.
Every data point helps. I don't agree with restricting the scope of a test
though just because not all contenders support a particular function or feature.
It's been common practice for years for sites to present benchmark results which
are only relevant to one particular type of product, be it a GPU type, CPU or
mbd issue, etc. Otherwise it'd be like saying that a mbd review shouldn't include
any USB3 results if even just one mbd in the lineup didn't have USB3 functionality;
people would still want to know how it fares on the boards which do though, and
that's what I'm getting at: I'd like to know how NVIDIA cards perform where it's
possible to use CUDA instead of OpenCL for those tasks which can use both.
Perfectly reasonable expectation IMO. Recent reviews looking at Mantle are a good
example; nobody would suggest that comparisons to NVIDIA cards shouldn't be done
merely because it's something NVIDIA cards don't support.

Not including CUDA results just because AMD cards don't support it is madness.


> If you're going to benchmark not the cards but these two pieces of software, OK: you can use ...

My point is simply this: if a card supports both APIs, then results for both
should be given, otherwise any conclusion is at best misleading or at worst
may be just plain wrong.


> On the other hand, if you decide to use V-Ray RT, you can use an Nvidia
> card and benchmark both CUDA and OpenCL to see which is better, but you can't
> use an AMD card in that comparison.

Of course, it depends on the task. That's what I meant about it being application
specific. However, this article allows one to infer conclusions about the cards
being tested which may be completely wrong for some other task. Without even one
example to which one can compare, how can one know? I still don't know how any
particular NVIDIA card performs for the same task when using OpenCL vs. CUDA,
because sites don't test it, which is annoying. All this article allows me to
infer with any certainty is that, on purely performance grounds, an NVIDIA card
is generally not the best option for OpenCL, but then that's been known for
years now, it's not new information. Dozens of existing reviews show this again
and again, but it's not really all that useful for someone who has an NVIDIA
card and is using it for a task that can use CUDA, such as AE. Indeed, this is
the perfect example: if someone is running AE on a system with a couple of
780Tis (CUDA-based RayTrace3D function), would the rendering be faster with an
OpenCL-based W9100? Nothing in this article helps one answer this question.

For the AMD cards, I can only come away with the same opinion I have for most
previous releases, namely that they're not as fast as on-paper specs would suggest.


NOTE: Viewperf 12 can be bottlenecked by CPU power, in which case the true potential
of all the cards might be unrealised with just a stock-speed 4930K (hard to
know if the subtests would use more than 6 cores if available). It would be wise
to run the tests again with a proper dual-socket XEON system, see what happens,
or at least compare to the 4930K running at 4.8GHz. Maybe the tests are clock-limited,
but atm we're running blind. I've been testing with a 5GHz 2700K, will test soon
with a 3930K (stock vs. oc'd) and other configs (dual-XEON X5570 Dell T7500). See:

http://www.sgidepot.co.uk/misc/viewperf.txt


To the author: there's a typo in both SiSoft diagrams on page 5, it says Sabdra
instead of Sandra.

Ian.

 

Marcelo Viana

Honorable
Nov 25, 2013
Well mapesdhs, one can do what you propose. For example: use V-Ray RT and see how long it takes to render a scene on CUDA (K6000), then see how long it takes to do the same on OpenCL (W9100). It won't be a benchmark, but it will give some numbers (results). And those numbers could be very dangerous for normal readers.

Rendering is my thing; I have a lot of knowledge about this specific task. It means I can read these numbers in a much more mature way than someone without that knowledge can.
If an Nvidia card shows a time of 1 minute while an AMD card shows a time of 10 minutes for the same task, I still can't claim Nvidia is faster. My knowledge leads me to think that perhaps the software is tuned to use all of CUDA's resources and isn't mature enough to use all of OpenCL's resources, for example. That shouldn't happen in a benchmark, because the code would be the same for both.
See how dangerous a test like this is?

I'm still figuring out why most of the sites I read declare the W9100 a clear winner when here it is not.
As an example, on those sites the K6000 leads in CATIA, but by a small margin; here it's the same result, but by a large margin.
On other sites almost everyone shows the W9100 leading by a large margin in Maya and SolidWorks, but here Maya leads by a narrow margin and SolidWorks is led by the K6000.
It isn't easy to read a benchmark; imagine reading the numbers without one.
But of course it could be done, to help the few who are able to read such a test in a professional way.
Cheers.
 

mapesdhs

Distinguished
Marcelo Viana writes:
> Well mapesdhs, one can do what you propose. For example: use V-Ray RT and see how long it takes to render
> a scene on CUDA (K6000), then see how long it takes to do the same on OpenCL (W9100). ...

That's not really what I want to know. :D I'd want to know how it compares for CUDA vs. OpenCL on just the K6000.
That will show whether a CUDA implementation of the same problem, at least for that particular application, is more
effective, and that helps make better use of knowing how the same OpenCL test runs on an AMD card.
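
For what it's worth, the per-API timing half of that comparison is easy enough to script. Below is a minimal sketch (plain OpenCL host C with a made-up SAXPY-style kernel, not any shipping renderer) that times a single kernel launch via OpenCL's own profiling events; the CUDA run of the same workload on the same card would be bracketed the same way with cudaEventRecord/cudaEventElapsedTime, which gives exactly the CUDA-vs-OpenCL-on-one-GPU number I'm after.

/* Minimal sketch: time one kernel launch with OpenCL profiling events.
 * The kernel and sizes are placeholders, not from any real application. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

static const char *kSrc =
    "__kernel void saxpy(__global float *y, __global const float *x, float a) {\n"
    "    size_t i = get_global_id(0);                                           \n"
    "    y[i] = a * x[i] + y[i];                                                \n"
    "}                                                                          \n";

int main(void) {
    const size_t n = 1 << 24;                 /* 16M elements */
    const size_t bytes = n * sizeof(float);
    float *x = malloc(bytes), *y = malloc(bytes);
    for (size_t i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    cl_platform_id plat; cl_device_id dev; cl_int err;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    /* Profiling must be enabled on the queue to read event timestamps. */
    cl_command_queue q = clCreateCommandQueue(ctx, dev, CL_QUEUE_PROFILING_ENABLE, &err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, NULL, &err);
    clBuildProgram(prog, 1, &dev, "", NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "saxpy", &err);

    cl_mem dx = clCreateBuffer(ctx, CL_MEM_READ_ONLY  | CL_MEM_COPY_HOST_PTR, bytes, x, &err);
    cl_mem dy = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, bytes, y, &err);
    float a = 2.0f;
    clSetKernelArg(k, 0, sizeof(cl_mem), &dy);
    clSetKernelArg(k, 1, sizeof(cl_mem), &dx);
    clSetKernelArg(k, 2, sizeof(float), &a);

    cl_event evt;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, &evt);
    clWaitForEvents(1, &evt);

    cl_ulong t0, t1;   /* nanoseconds, device clock */
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START, sizeof(t0), &t0, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,   sizeof(t1), &t1, NULL);
    printf("OpenCL kernel time: %.3f ms\n", (t1 - t0) / 1e6);

    /* The CUDA half (same card, same workload) would bracket the launch with
     * cudaEventRecord(start/stop) and read cudaEventElapsedTime(). */
    return 0;
}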

Btw, how a card processes OpenCL may not be the same at all between different brands, plus of course
OpenCL isn't fully implemented on consumer cards.

"Dangerous" isn't relevant. I seek information I'm not being given, without which one cannot form a full picture of
what is going on.

Note that I'm perfectly familiar with rendering concepts as well; check my SGI site. ;) I've been doing benchmark
research of all kinds for more than 20 years, eg. here's some of my old work from way back:

http://www.sgidepot.co.uk/r10kcomp.html


As for CATIA, see my page ref, many of these tests are CPU limited, so it could be that a stock 4930K
is holding both cards back, hard to say without further tests. A 5GHz 2700K definitely bottlenecks
many of the tests, especially Energy & Medical.

Ian.

 

Marcelo Viana

Honorable
Nov 25, 2013
mapesdhs writes:
“"Dangerous" isn't relevant.”
It's relevant for common readers, and this site is full of them. But you clearly are not one, of course.

Very nice site mapesdhs, lots of information, thank you for sharing.

As for CATIA, I totally agree with you that something must be holding it back. I wish I could have the cards to test it myself.
But I have to confess I don't yet know how to run the tests in a way that avoids the limitations you pointed out in others' testing.
mapesdhs writes:
“That's not really what I want to know. I'd want to know how it compares for CUDA vs. OpenCL on just the K6000.
That will show whether a CUDA implementation of the same problem, at least for that particular application, is more
effective, and that helps make better use of knowing how the same OpenCL test runs on an AMD card.”

Very interesting point, and at least a different way to go. “Dig, dig, dig is always the way”.

For me, with all the information I have gathered so far (nothing definitive, just an opinion), the W9100 is the best card: more memory, more FLOPS, etc. But in the real world I'd go for the K6000. Not for the card itself but for CUDA. AMD must release a driver that really supports OpenCL on their cards on Windows and (in my case) Linux.
On the other hand, if OpenCL 2 were already here (fully working, not buggy drivers), I would have no doubt about going for the W9100: best card, more memory, better price.
As I said, just an opinion.
 

mapesdhs

Distinguished
Marcelo Viana writes:
> It's relevant for common readers, and this site is full of them. But you clearly are not one, of course.

Careful, if my head gets any bigger it'll need its own post code... :}


> Very nice site mapesdhs, lots of information, thank you for sharing.

Thanks! Most welcome.


> As for CATIA, I totally agree with you that something must be holding it back. I wish I could have the cards to test it myself.

Hopefully I'll be able to work out something about this when I've tested
using my other systems, though not an ideal comparison - for that I'd need
a newer 2-socket XEON system using two CPUs with 6+ cores each (can't see
that happening any time soon, too expensive; it was painful enough
building up the Dell T7500 from scratch).


> Very interesting point, and at least a different way to go. “Dig, dig, dig is always the way”.

Thing is though, even if I did find out what I mentioned above, the
conclusion may only be valid for that particular application and that
particular GPU. Let's say just as an example that the CUDA version of a
task is 30% quicker than an OpenCL implementation of the same task; can
one infer from this that a CUDA version of any task will be 30% faster?
Certainly not. Alas, though it would be great to have a range of data
points on this, atm there isn't even one example one can examine to gain
some insight into any API efficiency differences. There's an assumption
(and maybe it's valid) that given the option, CUDA is the better choice,
but where's the data?

I'm a firm believer in the ethos espoused by David Kirkaldy, a man who
administered a 15m-long, 115-tonne testing machine in London in the late
1800s (eg. he was asked by the govt. to investigate structural problems
with parts recovered from the Tay Bridge disaster in 1879). An engraved
stone tablet above his works' entrance read, "Facts Not Opinions", an
attitude which annoyed many of his peers.


> ... The W9100 is the best card: more memory, more FLOPS, etc. ...

The memory angle presents a problem, something I was moaning about to a
friend this week.

Tasks such as volumetric imaging (medical), GIS, defense imaging, etc.
need a lot of main RAM and clearly will be much more effective if the GPU
has lots of RAM too, but the two Viewperf12 examples show that the real
apps used for such tasks put considerable demands on the main CPU(s)
as well, ie. in this case the single 4930K appears to be a bottleneck. If
so, then the potential performance of a card like the W9100 is being
wasted because the host system doesn't have the compute power to feed it
properly (same concept as there being no point putting a more powerful GPU
in the Red Devil budget gaming build presented on toms this week, because
the main CPU couldn't exploit it). SGI did a lot of work on these issues
20 years ago - it's why some Onyx setups needed 8+ CPUs even if only one gfx
pipe was present, because the application needed a lot of preprocessing
power, eg. the real-time visualisation of an oil rig database is a good
example I recall (this image dates from the early/mid-1990s):

http://www.sgidepot.co.uk/misc/oilrig.jpg

The proprietary oil rig data was converted to IRIS Performer for every
frame using various culling methods, giving a 10Hz update rate on Onyx
RE2, 60Hz on Onyx2 IR. The system was creating an entirely new scene graph
for every frame.


Or to put it another way, there's not much point in a card having as much
RAM as the W9100 if the system doesn't have the CPU power to drive it
properly. The 'problem' is the slow-as-mud improvements Intel has been
making with its CPUs in recent years. Although they've added more cores to
the XEON lines, some tasks need higher single-core performance, not more
cores, especially those which aren't coded to use more than 6 cores (this
has been especially painful for those still using ProE). IBM sorted this
out years ago (some very high clocks present in their Power CPUs), so why
hasn't Intel? If they followed the same higher thermal limits, they ought to
be able to offer CPUs with 4 to 8 cores by now at much higher clock speeds
than are currently available. Instead, they've gone crazy with many-cores
options, but the clock rates are too low. I'm certain that many pro users
would love the option of having 4.5GHz+ CPUs with only 4 to 8 cores max.
Such a config would speed up the typical single-thread GUI used by most
pro apps as well.


> but in the real world I'd go for the K6000. Not for the card itself but for CUDA. AMD must release
> a driver that really supports OpenCL on their cards on Windows and (in my case) Linux.

Me too, though also for the driver reliability. Not that NVIDIA is immune
to driver screwups, but I've had far fewer problems with NVIDIA drivers in
general. I agree with an earlier poster who said the W9100 needs to be a
lot cheaper than the K6000 to draw away those who might otherwise buy the
latter despite the moderate price difference. In many pro markets, it's
worth spending disproportionately more in order to achieve significantly
greater reliability. I talked to someone yesterday who told me a full
render of a movie they're working on is going to take about a week on
their GPU cluster, so it's obviously important that during all that time
the GPU drivers don't do anything dumb, otherwise it's a heck of a lot of
wasted power, delay, etc.


> On the other hand, if OpenCL 2 were already here (fully working, not buggy drivers), I would have no doubt
> about going for the W9100: best card, more memory, better price.

In a way it reminds me a bit of what used to happen with pro cards 10+
years ago, ie. vendors never developed the drivers to get the best out of
a product before they moved on to the next product (SGI's approach was
very different, but costly). Someone in a position to know told me at the
time that few cards end up offering more than about a third of what they
could really do before optimisation work on the card is halted to allow
vendor staff to move on to the next product release. And it doesn't help
that sometimes driver updates can completely ruin the performance of an
older card, eg. way back in the early 200-series NV drivers, DRV-09
performance (Viewperf 7.1.1) was really good (check the numbers on my site
for a Quadro 600 with a simple oc'd i3 550), but then at some point an
update made it collapse completely. See my Viewperf page for details (last
table on the page).


In short then, more data please! 8)

Ian.

 