OpenCL In Action: Post-Processing Apps, Accelerated

Guest · Feb 2, 2012

We've been bugging AMD for years now, literally: show us what GPU-accelerated software can do. Finally, the company is ready to put us in touch with ISVs in nine different segments to demonstrate how its hardware can benefit optimized applications.

DjEaZy · Feb 2, 2012

... OpenCL FTW!!!

amuffin · Feb 2, 2012

Will there be an OpenCL vs. CUDA article coming out anytime soon?

Guest · Feb 2, 2012

Hmmm...how do I win a 7970 for OpenCL tasks?

deanjo · Feb 2, 2012

[citation][nom]DjEaZy[/nom]... OpenCL FTW!!![/citation]

You're welcome.

--Apple

bit_user · Feb 2, 2012

[citation][nom]amuffin[/nom]Will there be an OpenCL vs. CUDA article coming out anytime soon?[/citation]At the core, they are very similar. I'm sure that Nvidia's toolchains for CUDA and OpenCL share a common backend, at least. Any differences between versions of an app coded for CUDA vs. OpenCL will have a lot more to do with the amount of effort its developers spent optimizing each one.

bit_user · Feb 2, 2012

Fun fact: the president of Khronos (the industry consortium behind OpenCL, OpenGL, etc.) and chair of its OpenCL working group is an Nvidia VP.

Here's a document laying out the parallels between CUDA and OpenCL (it's an OpenCL Jump Start Guide for existing CUDA developers):

NVIDIA OpenCL JumpStart Guide

I think they tried to make sure that OpenCL would fit their existing technologies, in order to give them an edge on delivering better support, sooner.
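To make the "very similar" point concrete, here is a rough sketch of how the two vocabularies line up. It is written from general knowledge of both APIs rather than copied from the Nvidia guide, and the names `CUDA_TO_OPENCL` and `to_opencl` are made up for illustration. It covers terminology only, not performance.

```python
# CUDA term -> closest OpenCL term. Terminology only; this is not an API.
CUDA_TO_OPENCL = {
    "thread": "work-item",
    "thread block": "work-group",
    "grid": "NDRange",
    "shared memory": "local memory",
    "local memory": "private memory",
    "__global__ function": "__kernel function",
    "__syncthreads()": "barrier(CLK_LOCAL_MEM_FENCE)",
    "threadIdx.x": "get_local_id(0)",
    "blockIdx.x": "get_group_id(0)",
}


def to_opencl(cuda_term: str) -> str:
    """Return the OpenCL counterpart of a CUDA term, or a note if unlisted."""
    return CUDA_TO_OPENCL.get(cuda_term, f"(no entry for {cuda_term!r})")


for cuda_term, opencl_term in CUDA_TO_OPENCL.items():
    print(f"{cuda_term:<22} -> {opencl_term}")
```

Note the one trap in the list: "local memory" means different things on each side.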

deanjo · Feb 2, 2012

[citation][nom]bit_user[/nom]I think they tried to make sure that OpenCL would fit their existing technologies, in order to give them an edge on delivering better support, sooner.[/citation]

Well, Nvidia did work very closely with Apple during the development of OpenCL.

nevertell · Feb 2, 2012

At last, an article to point to for people who love shoving a GTX 580 in the same box with a Celeron.

JPForums · Feb 2, 2012

Regarding testing the APU without a discrete GPU, you wrote:

However, the performance chart tells the second half of the story. Pushing CPU usage down is great at 480p, where host processing and graphics working together manage real-time rendering of six effects. But at 1080p, the two subsystems are collaboratively stuck at 29% of real-time. That's less than half of what the Radeon HD 5870 was able to do matched up to AMD's APU. For serious compute workloads, the sheer complexity of a discrete GPU is undeniably superior.

While the discrete GPU is superior, the architecture isn't all that different. I suspect the larger performance issue was the one stated in the interview earlier:

TH: Specifically, what aspects of your software wouldn’t be possible without GPU-based acceleration?

NB: ...you are also solving a bandwidth bottleneck problem. ... It’s a very memory- or bandwidth-intensive problem to even a larger degree than it is a compute-bound problem. ... It’s almost an order of magnitude difference between the memory bandwidth on these two [CPU/GPU] devices.

APUs may be bottlenecked simply because they have to share CPU-level memory bandwidth.

While APU memory bandwidth will never approach that of a discrete card, I am curious to see whether overclocking the memory on an APU will make a noticeable difference in performance. Intuition says that it will never approach a discrete card, and given the low-end compute performance, it may not make a difference at all. However, it would help to characterize the APU's performance balance a little better, i.e., does it make sense to push more GPU muscle onto an APU, or is the GPU portion constrained by memory bandwidth?
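For a sense of scale, here is a back-of-the-envelope sketch of theoretical peak bandwidth. The memory configurations are my own assumptions about typical setups (dual-channel DDR3 at two speeds, and a 256-bit GDDR5 card running 4800 MT/s, which is roughly HD 5870 class), not figures measured on the article's test systems, and real-world throughput will be lower than these peaks.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: float, bus_width_bits: int,
                       channels: int = 1) -> float:
    """Theoretical peak in GB/s: MT/s * bytes per transfer * channels / 1000."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000.0


# (MT/s, bus width in bits, channels) -- all assumed, illustrative values.
CONFIGS = {
    "DDR3-1333, dual channel": (1333, 64, 2),
    "DDR3-1866, dual channel": (1866, 64, 2),
    "GDDR5 4800 MT/s, 256-bit": (4800, 256, 1),
}

baseline = peak_bandwidth_gbs(*CONFIGS["DDR3-1333, dual channel"])
for name, cfg in CONFIGS.items():
    gbs = peak_bandwidth_gbs(*cfg)
    print(f"{name:<26} {gbs:6.1f} GB/s  ({gbs / baseline:.1f}x DDR3-1333)")
```

Under these assumptions, going from DDR3-1333 to DDR3-1866 buys about 40% more peak bandwidth, while the discrete card sits roughly seven times above the slower system memory, which fits the "almost an order of magnitude" remark in the interview.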

In any case, this is a great article. I look forward to the rest of the series.

Guest · Feb 2, 2012

What about power consumption? It's fine if we can lower CPU load, but not so much if total power consumption increases.

DjEaZy · Feb 2, 2012

[citation][nom]deanjo[/nom]You're welcome.--Apple[/citation]
... not just Apple... OK, they started it, but it's cross-platform...

mayankleoboy1 · Feb 2, 2012

Looking forward to this nine-part series.

salgado18 · Feb 2, 2012

Ever since AMD announced the Fusion concept, I understood that this is what they had in mind. And that's the reason I believe AMD is more on the right track than Intel, despite it looking like the opposite is true. Just imagine if OpenCL were widely used, and look at the APU-only benchmarks versus Sandy Bridge.

Of course, Intel has the resources to play catch-up real quick, or, if they want, just buy nVidia. (the horror!)

Really looking forward to the other parts of this article!

deanjo · Feb 2, 2012

[citation][nom]DjEaZy[/nom]... not just Apple... OK, they started it, but it's cross-platform...[/citation]

Umm, yeah, pretty much "just Apple," from creation to the open standard proposal to getting it accepted to influencing the hardware vendors to support it. Apple designed it to be cross-platform from the beginning; that was kind of the whole idea behind it.

memadmax · Feb 2, 2012

Since memory sharing seems to be a bottleneck, why not incorporate two separate memory controllers, each with its own lane to separate RAM chips? Imagine being able to upgrade your VRAM with a chip upgrade like back in the old days.

Guest · Feb 2, 2012

Glad to see AMD hit it this time....

Th-z · Feb 2, 2012

William, on the page "Benchmark Results: ArcSoft Total Media Theatre SimHD": after enabling GPU acceleration, most configurations actually show higher CPU utilization. That seems counter-intuitive. Can you explain why?

tmk221 · Feb 2, 2012

And that is what an APU should be about. Graphics cores should accelerate CPU cores. I just hope that more and more apps will take advantage of GPU cores.

razor512 · Feb 3, 2012

Please label the X axis on the graphs. The numbers do not mean much if we do not know what they are referring to.

bit_user · Feb 3, 2012

[citation][nom]JPForums[/nom]APUs may be bottlenecked simply because they have to share CPU level memory bandwidth.[/citation]Not just the sharing, but less overall.

I am curious to see whether overclocking the memory on an APU will make a noticeable difference in performance.

I'm sure it would, in most cases. Memory usage often depends on the type of workload and the kinds of memory optimizations done by the developers. Since discrete GPUs typically have so much bandwidth, developers will tend not to optimize for lower-bandwidth APUs. Furthermore, in most cases there's only so much a developer can do to work around memory bandwidth limitations.

Memory bandwidth is the biggest drawback of APUs. It's the reason I don't see the GPU add-in card disappearing anytime soon. At least, not until the industry closes the gap between CPU and GPU memory speeds.

alchemist07 · Feb 3, 2012

At the recent Financial Analyst Day, one of AMD's leads for the Heterogeneous System Architecture said that an APU should have a benefit over a discrete GPU: there is an ADVANTAGE in sharing the same memory space, since the GPU and CPU now write to the same space and don't need to transfer data between each other's memory spaces. I would guess that is why they don't want separate memory for the GPU.

The only drawback is that the memory bandwidth/speed is limited by the CPU tech. When will we see GDDR5 memory speeds for CPUs?! When this problem is solved, the APU should give you an edge over discrete cards with the same number of cores.
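As a rough illustration of what the shared address space saves, here is a sketch that estimates the cost of copying one uncompressed video frame to a discrete card and back. The 4 bytes per pixel and the 8 GB/s link figure (the nominal per-direction rate of a PCIe 2.0 x16 slot) are my assumptions, real transfers are slower than nominal, and the function names are made up for the example.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def copy_ms(num_bytes: int, link_gb_per_s: float) -> float:
    """Milliseconds to move num_bytes over a link of the given GB/s."""
    return num_bytes / (link_gb_per_s * 1e9) * 1e3


ASSUMED_LINK_GBS = 8.0        # nominal PCIe 2.0 x16, one direction (assumption)
FRAME_BUDGET_MS = 1000 / 30   # time available per frame at 30 fps

for label, (w, h) in {"480p": (854, 480), "1080p": (1920, 1080)}.items():
    size = frame_bytes(w, h)
    round_trip = 2 * copy_ms(size, ASSUMED_LINK_GBS)  # upload + download
    share = 100 * round_trip / FRAME_BUDGET_MS
    print(f"{label}: {size / 1e6:.1f} MB/frame, "
          f"{round_trip:.2f} ms round trip ({share:.1f}% of a 30 fps budget)")
```

With these numbers the copy is a small slice of the per-frame budget, so the zero-copy advantage matters most when data bounces between CPU and GPU many times per frame; the bandwidth limit in the second paragraph is still the bigger issue for a single pass.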

gc9 · Feb 3, 2012

[citation][nom]Razor512[/nom]Please label the X axis on the graphs. The numbers do not mean much if we do not know what they are referring to.[/citation]
I agree that the graphs on page 7 (TMT) were not well explained. I didn't understand what "CPU Utilization" meant, or why you would want low, not high, utilization numbers. I momentarily thought if you have low utilization, that means you spent too much on your workstation hardware, and you should try for just under 100%. I changed my mind on the next page.

The next page (vReveal) cleared that up, but is also a little confusing:

We’re going to examine our data in terms of CPU utilization, measuring system impact, as well as render speed. Rather than indicate frames per second (which pegs at 30 and stays there, telling us very little), vReveal spits back a percentage of real-time at which a render job is operating.

Those sound like two related numbers from two different ways of running the software: Either

A. run it in real time, 30 frames per second, using say 50% of the CPU.
OR
B. run it as fast as possible, so processing the video takes, say, 50% of the video's running time.

But the next sentence seems wrong:

For instance, if a one-minute video clip is rendering at 50%, the render job takes two minutes to complete.

Shouldn't it be 30 seconds?

Or is this a third way of running the software,
C. running it in slow motion?
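One reading that makes the article's sentence consistent is to treat the percentage as render speed relative to playback speed, so 50% means frames come out at half the playback rate. That is only an interpretation, not something confirmed by the vReveal documentation, and `render_seconds` is a made-up helper:

```python
def render_seconds(clip_seconds: float, percent_of_real_time: float) -> float:
    """Wall-clock render time if the job runs at the given % of real-time speed."""
    if percent_of_real_time <= 0:
        raise ValueError("percent_of_real_time must be positive")
    return clip_seconds / (percent_of_real_time / 100.0)


# A one-minute clip at a few speeds, including the article's 50% example.
for pct in (29, 50, 100, 200):
    print(f"{pct:>3}% of real-time -> {render_seconds(60, pct):.0f} s")
```

Under that reading, 50% gives 120 seconds (the article's two minutes), and the 30-second answer would correspond to 200% of real-time.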

Another possible error/typo on page 10 (vReveal on A8):

Sure enough, we see the non-accelerated 480p test shows that the FX-8150 enjoys 13% lower CPU utilization compared to the FX, while the 1080p clip lets the FX cruise around at 22% lower utilization.

Maybe the FX should be A8?

For many graphs, it was annoying that the text made comparisons between the different configurations, but the graphs did not. The reader has to switch back and forth between pages to make the same comparisons.

Great to see Tom's Hardware take on the topic of OpenCL and APU compute, however. Looking forward to future compute articles.

antilycus · Feb 3, 2012

@ Apple lovers

Apple isn't the savior on this one. Linux is. Apple is all about using closed systems just as much as MS (iTunes, Carbon, Cocoa, blah blah blah). Apple wants you to buy Apple and upgrade your Apple product every year. The reason Apple doesn't like Flash? Because Apple can't control it. If you want open standards, check out Debian or Ubuntu, because Apple, while popular, is only going to kill open standards so that you have to buy their product the next time you empty your wallet.

korogui · Feb 3, 2012

I would like to see some SVPMark benchmarks run on those A8 APUs.

SVP uses some AviSynth modules within MPC Homecinema/KMPlayer/Pot Player to do motion interpolation on your videos. With a decent CPU/GPU, the result is better than most "120hz True Motion/TrimensionDNM/Cinemasmooth" implementations.

www.svp-team.com
