NightAntilli
Let me explain some things regarding async, because people here seem to be completely lost... Once async is used properly, AMD will have the advantage. There's no question about this. It is basically a free performance increase, so for it not to be used would be quite a waste, and that's an understatement. I don't think getting developers to use async will be a problem. They've been talking about it for a long time, and it's regularly implemented on consoles, which already use GCN. Bringing it to AMD's GCN on PC would not be a huge task; the port itself is harder than getting the already-implemented async working. There are also multiple games due out this year that will use it.
Wouldn't be the first time, though, that an ATi/AMD technology gets ignored because of nVidia's influence in the gaming industry. When DX10.1 offered almost free anti-aliasing, it was ignored because nVidia hardware couldn't handle it. Even worse, it was removed from a game that already had it;
http://www.anandtech.com/show/2549/7
Back to async...
To do async the way it's supposed to be done, nVidia requires a preemption context switch. To explain what that means, I have to cover a few other things first so it's clear what's going on.
When people talk about AMD's async compute, what they actually mean is that graphics/shader tasks and compute tasks are processed in parallel, AND at the same time they are processed in a 'random' order. The latter part is not exactly accurate, but it makes it easy to understand. Some tasks (whether compute or graphics in nature) are long, and some are short, and what I mean by processing in this random order is that you can basically slot the short tasks in between the long tasks, keeping the GPU from idling. The GCN compute units can handle all the long and short graphics/shader tasks AND the long and short compute tasks mixed together like a soup. All the long, short, graphics/shader, and compute tasks are interchangeable with each other for processing within AMD's compute units. This blending pushes the efficiency of the GPU very high.
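The idea above can be sketched with a toy timing model. This is not a GPU simulator and all the numbers are made up for illustration; the only point is that long graphics tasks leave idle bubbles (stalls), and a scheduler that can mix queues hides independent compute work inside those bubbles:

```python
# Toy model of async compute: graphics tasks often stall (waiting on
# memory, fixed-function units, etc.), and an async-capable GPU can
# fill those idle bubbles with independent compute tasks.
# Illustrative numbers only; not a real GPU simulator.

def serial_time(graphics, compute):
    """Run everything back to back: stalls are pure wasted time."""
    return sum(busy + stall for busy, stall in graphics) + sum(compute)

def async_time(graphics, compute):
    """Overlap compute with graphics stalls (idealized mixed scheduling)."""
    busy_total = sum(busy for busy, _ in graphics)
    stall_total = sum(stall for _, stall in graphics)
    hidden = min(sum(compute), stall_total)  # compute hidden inside bubbles
    return busy_total + stall_total + sum(compute) - hidden

# (busy, stall) pairs for graphics work, plain durations for compute
graphics = [(8, 2), (6, 1), (10, 3)]  # long tasks with idle bubbles
compute  = [2, 1, 3]                  # short compute tasks

print(serial_time(graphics, compute))  # 36
print(async_time(graphics, compute))   # 30: all 6 units of compute hidden
```

In this toy case the compute work is completely free: it fits entirely inside the graphics stalls, which is the "free performance increase" being described.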
nVidia's hardware cannot do this in the same way. It can handle either the mixing of long and short graphics/shader tasks, OR the mixing of long and short compute tasks, but not both at once. This is what we mean when we say it requires a context switch: you have to keep switching between graphics/shader and compute tasks. That is obviously less efficient than AMD's hardware solution. And yes, being able to blend the short and long graphics/shader tasks is still more efficient than doing them in order, and the same goes for compute. But a context switch is costly. If you're doing async graphics now, you basically have to throw out your whole bowl of graphics soup to create a new compute soup, and that causes delay. What you gain by running the graphics/shader soup and the compute soup separately in an asynchronous manner is lost by having to switch between them.
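The switching penalty can be added to the same kind of toy model. Again, these are invented numbers, not measured preemption costs; the point is only that when the two workloads run in separate batches, stalls can't be filled AND every flip between contexts adds a fixed cost, so the overlap gain evaporates:

```python
# Toy comparison: concurrent mixed scheduling vs. preemption-based
# switching between a graphics context and a compute context.
# Illustrative numbers only; real preemption costs vary by hardware.

def concurrent_time(graphics_busy, graphics_stall, compute_total):
    """Graphics and compute share the GPU; compute hides in stalls."""
    hidden = min(compute_total, graphics_stall)
    return graphics_busy + graphics_stall + compute_total - hidden

def preemptive_time(graphics_busy, graphics_stall, compute_total,
                    switches, switch_cost):
    """Graphics and compute run as separate batches: stalls in the
    graphics batch stay idle, and every context flip pays a penalty."""
    return (graphics_busy + graphics_stall + compute_total
            + switches * switch_cost)

# Same workload as before: 24 busy + 6 stall of graphics, 6 of compute
print(concurrent_time(24.0, 6.0, 6.0))           # 30.0
print(preemptive_time(24.0, 6.0, 6.0, 4, 1.5))   # 42.0
```

With four context switches at a modest assumed cost, the preemptive path ends up slower than even the fully serial case (36 in the earlier numbers), which is why the gain from doing graphics and compute "asynchronously but separately" can be a net loss.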
Obviously, nVidia is claiming it can do async, and they're not necessarily lying, but it's completely different from AMD's approach, and it's borderline useless for performance gains. And they won't admit this, because they advertised their graphics cards as superior for DX12 on the strength of "DX12.1" (really feature level 12_1), even though no such API version exists. They would get some backlash if it turned out that graphics cards from 2011 (GCN 1.0) do some things (like async) better under DX12 than their "DX12.1" 2015 cards.
I hope this clarifies some things... Ashes of the Singularity is representative of DX12 performance + async. And I hope you understand now that it is futile to hope for nVidia's performance to be anywhere near AMD's when async is used. Eliminating the CPU overhead that AMD suffered under DX11 already gives their cards a huge boost in DX12. Add in async, and the only matchup where nVidia can maybe compete is the 980 Ti against the Fury X. At every other price point, AMD's cards will smoke nVidia's under DX12 + async.
nVidia has admitted that their preemption problem is still a long way from being solved; see page 23 of their GDC 2015 presentation:
http://www.reedbeta.com/talks/VR_Direct_GDC_2015.pdf
That's why I'm saying that Pascal won't fare any better, and that Polaris will be the cards to go for. Especially since Polaris is supposed to fix the front end, eliminating the DX11 issues that current GCN cards face. Combine that with the async benefits, and it's a no-brainer. That is, of course, if Polaris actually delivers what was promised.