Onus :
The Moderation team strongly prefers to not alter Best Answers selected by the Original Poster,
Very good that you mention that. I did not know that "Onus" was a moderator at the time I unselected the answer, since there was no visible indication of it. I refrained from choosing an answer, even though I could have chosen anti-duck's answer if I really wanted to play the AMD vs nVidia game. I unselected it because the choice was changed after almost a month, and I didn't expect a moderator to be the one who did that, so I left it without choosing a new one, and, well, you already know what happened. The rest I will send privately, to avoid further 'off-topic' discussions that will be pinned on me being at fault.
jimmysmitty :
And what do you have that shows that Async will benefit games? Nothing out there shows anything beyond a single game benchmark that was co-developed by AMD.
Oh wow. Really? From your reply, I can already tell you didn't do your homework. Let me explain...
First of all, the boosts AMD got were in both Ashes and Fable Legends, not just in 'the game AMD helped develop'. In Fable the Fury cards didn't perform great, probably because of driver issues; they are faster with an i3 than with an i7. The R9 300 cards all got a much larger boost than the nVidia cards did.
Secondly, as I already explained clearly in another thread with this post, AMD has a marketing deal with the publisher of Ashes of the Singularity, not a development deal with the developer. In fact, Oxide (the developer) worked more with nVidia than with AMD, and stated this in their own words.
Thirdly, we all know (I hope) that in DX11 AMD cards are driver limited while nVidia's aren't. If performance is equal in DX11, is it so preposterous to suggest that the API that will reduce driver overhead gives AMD a bigger boost and thus more longevity under DX12?
Fourthly, async compute benefits games because graphics and compute tasks can be executed at the same time. If your graphics work takes 16.7 ms and your compute work takes 16.7 ms, running them one after the other takes 33.3 ms per frame (33.3 ms = 30 fps). With async compute you can reduce that to, say, 20 ms, and the framerate becomes 50 fps instead of 30 fps. Same hardware, same everything; the idle parts of the GPU are put to work, boosting performance.
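To make the arithmetic concrete, here is a minimal sketch; the 20 ms overlapped figure is purely an assumed illustration, not a measured number:

```cpp
#include <cstdio>

int main() {
    // Hypothetical per-frame workloads from the example above (not measurements).
    double graphics_ms = 16.7;   // graphics work per frame
    double compute_ms  = 16.7;   // compute work per frame

    // Without async compute the two workloads run back to back.
    double serial_ms = graphics_ms + compute_ms;   // ~33.3 ms -> ~30 fps

    // With async compute they overlap; assume an imperfect overlap of ~20 ms total.
    double async_ms = 20.0;                        // assumed, not measured

    std::printf("serial: %.1f ms = %.0f fps\n", serial_ms, 1000.0 / serial_ms);
    std::printf("async : %.1f ms = %.0f fps\n", async_ms, 1000.0 / async_ms);
    return 0;
}
```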
Software has been developed to test async compute on AMD vs nVidia hardware. Here's a comparison:
What are we looking at here? I'll probably have to explain. First, the top pair: there are two graphs for each card, and on the left you see the 980 Ti. The graphics part and the compute part are measured separately here. I don't remember the exact numbers, but the compute tasks were programmed to be executed at four different loads. With the 980 Ti you see this clearly in the graph: the higher the load, the higher the bar, meaning the calculation takes longer to complete.
On the Fury-X side, you see that the load doesn't matter; the result comes back in the same time frame. That time frame is longer than any of the compute tasks on the 980 Ti, independent of load.
On the bottom part of the top graphs you see the graphics portion, with bar height again indicating time. So again, the 980 Ti is faster at the graphics work than the Fury-X.
Now let's go to the bottom two graphs. Here, Async compute was turned on, and the GPUs need to complete both tasks. What do we see? On the 980 Ti, the new graph is pretty much the time it takes to do the graphics calculations, added to the time it takes to do the compute tasks. Async is obviously not working, because the time is supposed to be reduced with it, but we see a simple addition.
On the Fury-X, what do we see? The time it takes to do both the compute tasks and the graphics tasks is pretty much the same as the compute tasks alone in the graphs above, only shooting slightly higher on a few occasions. Async is obviously working, and we're getting the whole graphics calculation for 'free'. So much so that at the highest compute load the Fury-X surpassed the 980 Ti, despite both separate tasks being faster on the 980 Ti.
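The pattern is easy to model: without async the total is the sum of the two tasks, with async it approaches whichever task is longer. A small sketch with made-up numbers (not the actual benchmark figures) that reproduces the behaviour described above:

```cpp
#include <algorithm>
#include <cstdio>

int main() {
    // Hypothetical per-frame times in ms, chosen only to illustrate the pattern.
    double gfx_980ti = 10.0, gfx_furyx = 14.0;             // 980 Ti faster at graphics alone
    double compute_980ti[4] = { 4.0, 8.0, 12.0, 16.0 };    // scales with load on the 980 Ti
    double compute_furyx[4] = { 17.0, 17.0, 17.0, 17.0 };  // roughly flat on the Fury-X

    for (int load = 0; load < 4; ++load) {
        // Observed behaviour in the test: the 980 Ti ends up with the sum of both tasks,
        // the Fury-X ends up close to whichever task is longer (graphics comes "for free").
        double total_980ti = gfx_980ti + compute_980ti[load];
        double total_furyx = std::max(gfx_furyx, compute_furyx[load]);
        std::printf("load %d: 980 Ti %.1f ms, Fury-X %.1f ms\n",
                    load, total_980ti, total_furyx);
    }
    return 0;
}
```

With these placeholder numbers the Fury-X overtakes the 980 Ti at the higher compute loads, which is exactly the crossover described above.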
Is it clear now why async will benefit games? If you actually want, there's a whole thread on Beyond3D with a bunch of people running the benchmarks and posting results:
DX12 Performance Discussion And Analysis Thread
jimmysmitty :
Personally I think it is all just rubbish until we see actual games. A GTX 980 doesn't support Async but does support FL 12.1 yet no AMD cards do. So what does that mean?
It means that they support conservative rasterization and rasterizer ordered views in hardware. Those are two very specific features. Async compute applies to graphics and compute tasks in general, meaning it has far more applications. The FL12_1 features can realistically also be done in software on the CPU; async, well, you're probably going to kill the CPU trying to do it at anywhere near the level the GPU can.
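If you want to see what a given card actually reports, the D3D12 API exposes these caps directly. A minimal sketch (default adapter, Windows 10 SDK assumed):

```cpp
// Query whether the installed GPU exposes the two FL12_1 features
// (conservative rasterization and rasterizer ordered views).
// Build with: cl /EHsc check_fl121.cpp d3d12.lib
#include <d3d12.h>
#include <wrl/client.h>
#include <cstdio>

int main() {
    Microsoft::WRL::ComPtr<ID3D12Device> device;
    // Default adapter, minimum feature level 11_0.
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device)))) {
        std::printf("No D3D12 device available\n");
        return 1;
    }

    D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &opts, sizeof(opts));

    std::printf("Conservative rasterization tier: %d\n", opts.ConservativeRasterizationTier);
    std::printf("Rasterizer ordered views (ROVs): %s\n", opts.ROVsSupported ? "yes" : "no");
    return 0;
}
```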
jimmysmitty :
It is much like the draw calls. Everyone is going crazy over them yet so far they are just a big number, much like TFLOPS, that don't show any sign that they will be indicative of game performance.
The draw call test that Futuremark released measures how much work can be scheduled and sent to the graphics card for calculation. That is separate from how quickly the GPU then executes the work itself.
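A rough way to picture that separation, with made-up per-call costs purely for illustration:

```cpp
#include <cstdio>

int main() {
    // Illustrative (made-up) per-draw-call CPU/driver overheads, not measured values.
    double frame_budget_ms   = 16.7;    // one 60 fps frame
    double dx11_call_cost_ms = 0.010;   // heavier driver work per call
    double dx12_call_cost_ms = 0.001;   // thinner driver, work spread over threads

    // How many draw calls the CPU side can even submit before the frame budget is gone.
    std::printf("DX11-style: ~%.0f calls/frame\n", frame_budget_ms / dx11_call_cost_ms);
    std::printf("DX12-style: ~%.0f calls/frame\n", frame_budget_ms / dx12_call_cost_ms);

    // Note: this says nothing about how fast the GPU executes each call once submitted;
    // submission throughput (what the Futuremark test measures) and GPU execution speed
    // are two separate limits.
    return 0;
}
```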
jimmysmitty :
Did you know that a R9 290X has the same compute performance as a GTX 980Ti? Yet the GTX 980Ti is much faster in games. So again, what does it mean?
Well, nVidia is able to issue many more draw calls under DX11 than AMD. So basically, the number of draw calls AMD can issue with their DX11 drivers is insufficient to make use of all that compute performance, which is exactly why the R9 cards get such a big boost under DX12.
jimmysmitty :
It means that all those big numbers that synthetic benchmarks spew out are pointless until we have a real world situation and game.
It only means that if you're unaware of what the available software already shows about what's going on inside the hardware. It's easier to say 'we don't know' and sit on the fence than to put in the time to investigate what's actually happening.
jimmysmitty :
And Maxwell is bindless, it is one of the features in FL 12.0 and in order to support a FL you have to support all the main features.
You're actually right, in the sense that I didn't specify it correctly enough. AMD cards are fully bindless, meaning the full descriptor heap is available. nVidia cards are limited to a specific number of slots (64 UAV slots, for example). That's the reason AMD is considered Tier 3 in resource binding, while Maxwell 2 is considered Tier 2. I'm not saying nVidia doesn't have any advantages; they have volume tiled resources while AMD doesn't, for example. It just so happens that AMD is missing the least important features. Look here:
The two things AMD does not have but nVidia does are FL12_1 and volume tiled resources (the Tier 3 vs Tier 2 of tiled resources in the list). If you ignore importance and just count, AMD has Tier 3 resource binding versus nVidia's Tier 2, the stencil reference value from the pixel shader which is absent in nVidia hardware, full-heap UAVs versus nVidia's 64 UAV slots, and async shaders. That's still a 4 vs 3 advantage on the AMD side, and that from an architecture that's supposedly too old and keeps getting complaints about being rebranded.
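For reference, the binding-related rows in that list come from the same D3D12 caps structure as before; a minimal sketch to query them on your own card:

```cpp
// Query resource binding tier, tiled resources tier and pixel-shader stencil ref support.
// Build with: cl /EHsc check_tiers.cpp d3d12.lib
#include <d3d12.h>
#include <wrl/client.h>
#include <cstdio>

int main() {
    Microsoft::WRL::ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device)))) {
        std::printf("No D3D12 device available\n");
        return 1;
    }

    D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &opts, sizeof(opts));

    std::printf("Resource binding tier : %d\n", opts.ResourceBindingTier);
    std::printf("Tiled resources tier  : %d\n", opts.TiledResourcesTier);
    std::printf("PS stencil reference  : %s\n", opts.PSSpecifiedStencilRefSupported ? "yes" : "no");
    return 0;
}
```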
Even funnier is when I say that AMD is better for longevity than nVidia and people think I'm crazy. That table holds more evidence of this: Maxwell 1 was released in 2014 and only achieves FL 11_0, while GCN, released in 2011, is FL 11_1. Was I spreading 'misinformation'?