Will AMD cpus become better for gaming than intel with direct x12

In the i3's case, you're seeing the effect of only having two cores. While it can keep up in terms of maximum FPS, the averages suffer somewhat: the CPU simply can't consistently keep the GPU fed at those settings, but even a small CPU performance increase (from DX12) would allow it to keep up. Pretty much everything above the FX-6350 is CPU bottlenecked.

The clock rate results you just posted are more proof of this, given that you get a significantly less than linear performance increase from raising the CPU clock speed. This further shows the GPU is the bottleneck capping performance.

What people seem to forget is the GPU driver is itself a limiting factor, and what you see in your The Division graphs is the effect of minor increases in CPU performance resulting in slight increases in GPU performance. In other words, you make the CPU ever so slightly faster, and you get slightly faster GPU driver performance, resulting in a slightly faster GPU result. This effect is minor though; we're talking single digit FPS gains. The primary bottleneck is on the GPU.
 
"Pretty much everything above the FX-6350 is CPU bottlenecked."

I think you meant GPU bottlenecked.

The FX-4320 has four threads and a higher clock speed than the FX-6350, and yet the FX-6350 is winning in both minimum and average FPS, albeit by a small margin. This is an indication of multiple threads being used more efficiently than before, and of six threads finally winning out over four threads, as they should.

It's not as if a game has either the CPU or the GPU bottlenecking all the time. These benchmarks indicate that most of the time the CPU is not bottlenecking the GPU, but in some cases it is, and in those cases the CPUs with fewer threads lose out to the ones with more. Account for IPC, and this is the most balanced performance chart of overall CPU power yet: the total throughput of the CPU is more important than per-core IPC, unlike in most older games.
 


Pretty much this. DX12 is not changing much as far as performance goes. Weaker CPUs will see a boost, but at the end of the day, if a single CPU core is already capable of keeping a GPU fed under DX11, it will continue to do so under DX12. The ability to scale between cores changes nothing, since the total workload would be the same.

Simple analogy: If a single lane highway is capable of transporting all traffic at full speed, increasing the highway to four lanes does nothing except flush money down the toilet. Sure, you can evenly distribute the traffic across all four lanes, but since one lane was already enough to handle all the traffic, there's no real benefit aside from having each lane doing nothing a larger percentage of the time.

Taking my analogy further, the only real advantage is adding capacity that could eventually be used sometime in the future, but as any city planner will tell you, that isn't a sure thing.
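To put rough numbers on that analogy: here's a minimal, purely illustrative sketch (all timings made up) of how frame rate is gated by whichever is slower, the busiest CPU core or the GPU. Spreading the same CPU work across more cores only changes the result when a CPU core, not the GPU, is the limiter.

```cpp
// Toy frame-time model (all numbers hypothetical, purely illustrative).
// Frame time is gated by the slowest participant: the busiest CPU core or the GPU.
#include <algorithm>
#include <cstdio>
#include <vector>

double fps(const std::vector<double>& core_ms, double gpu_ms) {
    double busiest = *std::max_element(core_ms.begin(), core_ms.end());
    return 1000.0 / std::max(busiest, gpu_ms);
}

int main() {
    double gpu_ms = 20.0; // GPU takes 20 ms per frame (50 FPS cap)

    // DX11-style: one core does 12 ms of work, the rest are nearly idle.
    std::printf("single-threaded CPU: %.1f FPS\n", fps({12.0, 1.0, 1.0, 1.0}, gpu_ms));

    // DX12-style: the same 12 ms spread over four cores.
    std::printf("spread across cores: %.1f FPS\n", fps({3.0, 3.0, 3.0, 3.0}, gpu_ms));

    // Both print 50 FPS: if one core already kept the GPU fed, spreading the
    // work changes nothing, exactly like adding lanes to an uncongested road.
    return 0;
}
```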
 



That response, mate, is seriously flawed.
 
I thought it was obvious by now what DX12 is about: it's about feeding GCN properly and removing the DX11 single-thread bottleneck in GCN. That's why we see AMD cards performing like their theoretical TFLOPS suggest they should. It was never about Bulldozer; it was all about GCN. If Pascal is Maxwell 1.1 like the rumors suggest, God help Nvidia, especially with Intel looking to license AMD graphics in the future.
 

Your argument is only from the Intel perspective, which is what makes it flawed. What if the CPU was unable to handle the total workload with a single core? It's obvious that if all cores were used properly, it then would be able to. THAT is the whole point.

Your analogy is incomplete. For Intel, you have one lane with cars capable of driving 100 mph, and for AMD, you have one lane with cars capable of driving 50 mph, on a road that is, say, 10 miles long. The next car can only leave after the previous one has reached its destination. Obviously, if only one lane is used, the Intel will be faster, because two of its cars will have arrived by the time one of AMD's has.

If the Intel has 4 lanes and the AMD has 8 lanes, and all of them are used, it's another story. The Intel cars arrive first, but by the time the AMD cars arrive, twice as many arrive at once, nullifying the difference. THAT is why DX12 matters, because it's much easier to use all eight lanes rather than just one or two.
 
That's incorrect. This alone explains it:

[Image: Multi-threaded Command Buffer Recording benchmark]
 


Not really reinforcing your point: the game code is running on the same threads in mostly the same proportion as DirectX 11, with DirectX 12 simply spreading the driver overhead across many cores and slightly improving the threading of the DirectX runtime. This does help performance on AMD hardware, due to their poor single-threaded CPU performance and the very high-overhead DirectX 11 driver on their GPUs. However, the game code itself hasn't really changed, so there is still a potential bottleneck there if the game code isn't heavily multithreaded, or if it's designed in such a way that it gets bogged down waiting on one core to finish something like AI.
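For reference, this is roughly what "spreading the driver overhead" looks like in code under D3D12: each worker thread records its own command list, and a single submission hands them all to the GPU. This is only a hedged skeleton, not a working renderer: the worker count is arbitrary, error handling is omitted, and the actual draw recording is elided.

```cpp
// Sketch of D3D12 multithreaded command-list recording (error handling and the
// actual draw recording omitted; link against d3d12.lib on Windows).
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

void RecordAndSubmit(ID3D12Device* device, ID3D12CommandQueue* queue, unsigned workerCount) {
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(workerCount);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(workerCount);
    std::vector<std::thread>                       workers;

    for (unsigned i = 0; i < workerCount; ++i) {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));
    }

    // Each worker records its slice of the scene into its own command list.
    // This is the part DX11 effectively serialized onto one thread.
    for (unsigned i = 0; i < workerCount; ++i) {
        workers.emplace_back([&, i] {
            // ... set pipeline state, root signature, and record the draw calls
            //     for this worker's chunk of the scene ...
            lists[i]->Close();
        });
    }
    for (auto& t : workers) t.join();

    // A single submission hands all recorded lists to the GPU at once.
    std::vector<ID3D12CommandList*> raw;
    for (auto& l : lists) raw.push_back(l.Get());
    queue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());
}

int main() {
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));

    RecordAndSubmit(device.Get(), queue.Get(), 4);
}
```

The recording pattern itself isn't new (DX11 had deferred contexts), but the point of DX12's model is that those per-thread lists no longer have to be funneled back through one heavy driver thread.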
 


No, that does not say anything. Have you ever programmed a game? There is a lot of code that goes into it for logic other than graphics. There are so many conditions the CPU has to check constantly, so many calculations to make, AI logic to compute. Hit detection. Physics engines. All of this has nothing to do with DX12.
 


Grr, I'm not talking about the DX12 API, I'm talking about other APIs, the largest being the .NET Framework. There is more to games than graphics, so I'll leave it at that. How a rock is rendered does not affect the programming of AI, physics engines, etc. You are trying to combine two entirely separate entities: graphics and game logic.
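A minimal sketch of that separation, with all names hypothetical: the simulation code below never touches a graphics API, and only the Renderer implementation would change if DX11 were swapped for DX12.

```cpp
// Game logic vs. rendering backend (all names hypothetical, purely illustrative).
#include <cstdio>
#include <vector>

struct Renderer {                       // could be backed by DX11, DX12, Vulkan...
    virtual void DrawRock(float x, float y) = 0;
    virtual ~Renderer() = default;
};

struct Rock { float x, y, vx, vy; };

void UpdatePhysics(std::vector<Rock>& rocks, float dt) {
    // Game logic: integrate positions. Nothing here cares which API renders it.
    for (auto& r : rocks) { r.x += r.vx * dt; r.y += r.vy * dt; }
}

void RenderFrame(const std::vector<Rock>& rocks, Renderer& gfx) {
    // Only this function talks to the graphics backend.
    for (const auto& r : rocks) gfx.DrawRock(r.x, r.y);
}

struct ConsoleRenderer : Renderer {     // stand-in backend for the sketch
    void DrawRock(float x, float y) override { std::printf("rock at %.1f,%.1f\n", x, y); }
};

int main() {
    std::vector<Rock> rocks{{0.f, 0.f, 1.f, 2.f}};
    ConsoleRenderer gfx;
    UpdatePhysics(rocks, 0.016f);       // AI/physics/etc. are API-agnostic
    RenderFrame(rocks, gfx);            // swapping the backend doesn't touch UpdatePhysics
}
```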
 
Yeah, the game logic itself is its own separate entity, and won't be affected by the DX12 API. A game's functional code is more or less independent of the display API of choice.

Here's a good example of what DX12 actually does on the CPU side:

[Image: DirectX 12 multithreading diagram]


Remember: ~16 ms = 60 FPS, which is your target. The actual game code gets done in about 11 ms or so, but the runtime/driver work extends that out to about 29 ms, or only about 35 FPS. Now, DX12 allows that massive render thread to be broken up into smaller units, allowing the game to reach its 60 FPS target.

Note in this particular case this could also be accomplished by ensuring the render thread isn't on the same CPU core as the main game executable, which would accomplish the same end result without needing a new API [In this case, you'd get 55-56 FPS].

Also note that DX12 resulted in a net INCREASE in total CPU usage, since the DX12 runtime needed to be run on all cores. This could result in some cases where the "improved" threading leads to performance loss.
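A quick back-of-the-envelope check of those figures (the ~18 ms for the "render thread on its own core" case is simply what the quoted 55-56 FPS implies):

```cpp
// Converting the frame times quoted above into FPS.
#include <cstdio>
int main() {
    std::printf("%.1f FPS\n", 1000.0 / 29.0); // game + serialized render/driver on one core -> ~35 FPS
    std::printf("%.1f FPS\n", 1000.0 / 18.0); // render/driver moved to its own core -> ~55-56 FPS
    std::printf("%.1f FPS\n", 1000.0 / 16.7); // render work split further under DX12 -> ~60 FPS
    return 0;
}
```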
 

Graphics and game logic both put load on the CPU. If the graphics instructions from the CPU to the GPU are limited to a single thread along with the physics, AI, etc., a CPU with a weak single core will be bottlenecked. If you can divide the graphics instructions across multiple threads, then even if your physics and AI are limited to a single thread, there is more room left on that thread, so everything can still be processed in time for the frame to be rendered. This makes a previously useless CPU viable, and depending on the number of cores, even superior.

It's the same reason a DX11 game 'ported' to DX12 won't really see benefits: it's still coded with a focus on a single thread. To program for multiple threads, you have to build the game engine around spreading as many instructions as possible across as many threads as possible.

@gamerk316: You just posted an image I posted a few posts before. What's your deal...? In any case, it wouldn't necessarily use more CPU for the reason you posted. The DX12 driver is indeed divided across the cores, but the total CPU time required is equal, as per that image. The higher CPU use is generally due to the synchronization required to combine the multiple threads into one final result, where some threads finish earlier than others and have to wait for the rest.
 
"It's the same reason a DX11 game 'ported' to DX12 won't really see benefits: it's still coded with a focus on a single thread. To program for multiple threads, you have to build the game engine around spreading as many instructions as possible across as many threads as possible."

Excess threading = bad.

As a general rule, you don't go out of your way to thread unless you have a reason for it, due to all the synchronization issues you can potentially run into. Generally, in games, each specific task (Audio, AI, Input, etc) is run sequentially, though each individual task can be threaded to some extent. This gives the best compromise between program complexity and performance.
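A rough sketch of that structure, with all function names hypothetical: the frame's tasks run in a fixed order, and only a stage whose work splits cleanly (here, render preparation) is fanned out across worker threads.

```cpp
// Sequential task stages per frame, with one stage threaded internally
// (hypothetical names, purely illustrative).
#include <thread>
#include <vector>

void UpdateInput()   { /* poll devices */ }
void UpdateAI()      { /* run behaviour trees, pathfinding, ... */ }
void UpdateAudio()   { /* mix and submit audio */ }
void PrepareRenderChunk(int chunk) { /* build draw data for one slice of the scene */ }

void RunFrame(unsigned workerCount) {
    // Sequential stages: simple to reason about, no cross-stage synchronization.
    UpdateInput();
    UpdateAI();
    UpdateAudio();

    // One stage threaded internally, because its work splits cleanly.
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < workerCount; ++i)
        workers.emplace_back(PrepareRenderChunk, static_cast<int>(i));
    for (auto& t : workers) t.join();
}

int main() { RunFrame(4); }
```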

Secondly, a stalled thread should take no CPU time, since the CPU should swap the thread out and run something else until it can continue [unless the developer is doing the equivalent of using a while loop to poll for status, or heaven forbid, a sleep statement, in which case he should be kept as far away from actual code as possible]. Granted, this is not ideal, since your app's performance is now at the mercy of the CPU scheduler, which, while OK, could use some tweaks going forward (such as taking average CPU core load into account when assigning threads, for example).
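To illustrate that stall point with a simplified, self-contained example: a properly blocked thread waits on a condition variable and consumes essentially no CPU, whereas a polling loop (or a sleep-based one) keeps waking up and burning time.

```cpp
// Blocking wait vs. busy-wait polling (simplified illustration).
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool dataReady = false;

void consumer_good() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return dataReady; });  // descheduled until notified: ~0% CPU
    // ... use the data ...
}

void consumer_bad() {
    // Polling loop: burns a core (or wakes constantly if you add a sleep),
    // and leaves you even more at the mercy of the scheduler.
    while (true) {
        std::lock_guard<std::mutex> lock(m);
        if (dataReady) break;
    }
}

int main() {
    std::thread t(consumer_good);
    { std::lock_guard<std::mutex> lock(m); dataReady = true; }
    cv.notify_one();
    t.join();
}
```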

You also have the case of multiple weak CPU cores each being assigned a high workload, where a stall on any single core could bring the application screeching to a halt because some other CPU heavy task decides it wants to run and bumps one of your threads. This is another reason why over-threading on a multitasking OS just asks for trouble.

But point being, I've seen plenty of cases where threading when it was not needed resulted in measurable performance loss. In this case, it makes sense to thread out the GPU thread, but to say "more threads = more performance" is overly simplistic.
 

I thought the point of DX12/Vulkan was that multiple cores could talk to the GPU, rather than actually splitting the workload? AFAIK DX11 could already split work across multiple cores, but only one could actually send data to the GPU, which in turn means DX12 will have fewer stalls in the main render thread even if, for example, unit counts increase. I honestly thought we'd see games that were not possible on DX11 due to its draw call limitations; everything so far has been meh tbh, but we'll see what happens in the future.
 
There is only 'excess' multi-threading when you're creating loopholes, or if the API has a limit on communication with the hardware. Obviously multi-threading everything is nonsense and will make your coding a nightmare; 4-8 threads is the sweet spot right now. Previously, developers had trouble coding for two threads, so we've made huge progress since the X360/PS3 era, and with the current consoles even more so. Not saying you're incorrect, just trying to clarify.

@con635, basically yes, that's the point. But it's not a quick switch for developers. It's quite evident when you look at performance of games. Multi-threaded games like Assassin's Creed Unity that were programmed on wide console CPUs had a bunch of problems working on DX11, and single-thread heavy games like Gears of War that were developed on a single-thread heavy CPU had a bunch of problems working under DX12.

It's a shame that people are underwhelmed by Ashes of the Singularity, because it's quite impressive. For the ones interested, watch this;
https://www.youtube.com/watch?v=OQOsbxaBVnw
 
In regards to draw calls, games optimized around the limit by simply dispatching a few REALLY big ones. Sure it was a hassle to manage, and yes, it wasn't optimal, but it wasn't the giant bottleneck people were proclaiming it to be.

And yes, by "split the workload", I was referring to allowing more than one thread to speak to the GPU.
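For anyone curious what "a few REALLY big ones" looks like in practice, here's a conceptual, API-free sketch (hypothetical types, no real graphics calls): objects sharing a mesh are grouped so the whole group costs a single instanced draw instead of one draw per object.

```cpp
// Conceptual draw-call batching sketch (no real graphics API calls).
#include <cstdio>
#include <map>
#include <vector>

struct Object { int meshId; float x, y, z; };

void SubmitDraws(const std::vector<Object>& objects) {
    // Group per-object transforms by mesh so each mesh costs one draw call.
    std::map<int, std::vector<const Object*>> batches;
    for (const auto& obj : objects) batches[obj.meshId].push_back(&obj);

    for (const auto& [meshId, instances] : batches) {
        // In a real renderer this would upload the instance data and issue one
        // instanced draw (e.g. DrawIndexedInstanced) for the whole batch.
        std::printf("draw mesh %d: %zu instances, 1 draw call\n",
                    meshId, instances.size());
    }
}

int main() {
    std::vector<Object> scene{{1, 0, 0, 0}, {1, 5, 0, 0}, {2, 1, 1, 1}};
    SubmitDraws(scene);  // 3 objects -> 2 draw calls
}
```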
 
A lot of conflicting info to be found.

In this Sept 2015 article, wccftech says the Frostbite engine already has DX12 built in, so games like BF4 would be able to use it.
http://wccftech.com/frostbite-supports-dx12-dice-technical-director/

In this Oct 2015 article, DICE said BF4 would NOT be getting a DX12 update.
http://bf4central.com/2015/10/battlefield-4-directx-12/

That wccftech article shows very different performance comparisons from other BF4 CPU benchmarks, where the 8350 gets beaten by an i3 and an i5.
http://www.hardwarepal.com/battlefield-4-benchmark-mp-cpu-gpu-w7-vs-w8-1/

Or this article, where an i5 had first one and then two cores disabled, showing little to no difference in FPS with a 290X.
http://www.bit-tech.net/hardware/graphics/2013/11/27/battlefield-4-performance-analysis/8

Why did wccftech use two different systems? The game drives were different, the OS drives were different, and obviously the RAM had to be different. Were the OSes the same or different? The AMD system says Win10 Pro 64-bit; the Intel system just says Win10 Pro (which also comes in a 32-bit version). Not to say any of these things had a major impact, but we all know that benchmarking legitimacy 101 begins with the SAME parts, not mixed builds. If the Win10 version used for the Intel rig was 64-bit, why leave that out? Is this the sort of lack of attention to detail the benchmarks were handled with?

My question isn't really AMD vs Intel at this point, but what kind of 'noobz' are working for wccftech? What sort of reliable data are they providing if they can't nail the simple things? There are too many questionable things here to give them much credit; they're not new to the process, so yes, I expect more from them.
 
synphul, Mantle works the same way as DX12, so BF4 was practically the first DX12 (low-level API) game.
Mantle in this game works as described above by gamerk316 and others: it splits up only the graphics work. Now, you might be shocked by this, but BF4 (Frostbite engine) runs all its logic on one single core*, and this is what dictates the FPS you will be getting in the game**. Disabling cores will influence frame times, since fewer cores have to do more work in the same amount of time, but FPS will stay the same.

*with at least a second thread needed to do the graphics preparation
**unless you are looking at a wall close up; there the game logic has nothing to do, so you will get as many FPS as your GPU can handle (same for in-game benches, since it's all GPU load)
 


Frostbite is one of the few engines that makes use of DX11's multithreading model, so it does use helper render threads. As a result, you get a distribution that looks something like this:

Main Game Thread: 50% on one CPU core
Main Render Thread: 60% on one CPU core
Helper Render Threads: 5% per thread (BF4 uses 10 total, based on an analysis I did some time back)
All other Threads: <1% combined load across 60+ threads

This explains why an i3 with Hyperthreading can beat an 8350: the total loading is low enough that the i3 isn't bottlenecked, and its ability to do more work in the same timespan as an 8350 results in higher FPS. Yes, the i3 is more sensitive to latency issues due to high core loading [any extra work results in a core being overloaded], but it can handle the game by itself.

What DX12 is allowing is that massive Render thread to be broken up into smaller chunks, among other improvements. From a CPU perspective, the smaller render threads will help prevent individual cores from being overloaded, which will help CPUs like the i3 better manage the workload without bottlenecking. But aside from that, the new threading model by itself won't give any extra performance via threading.

This is why the i3-4330 outperforms the FX-8370 in Ashes in DX12 across all benchmark settings. Despite the scaling, the i3 isn't bottlenecked, and is simply the faster CPU. When no core is bottlenecked, the faster CPU will be faster. Pretty much simple as that.
 

DX11's multithreading is confined to the driver part; there are many games where the driver is split up into several threads.
In Frostbite, they just use many threads to prepare as many frames as possible, so that the game is able to choose the most current version to display and never runs out of fresh frames. This part happens way earlier than what the driver does; the driver just takes the frame the game tells it to and does what it does to display it.
CryEngine/Crysis 3 does the same thing, spinning up a number of threads to prepare frames when there is nothing game-specific happening at the moment; that's why it has such extremely bad frame variance unless you bottleneck it on your GPU.
https://www.youtube.com/watch?v=lEafN99Q4K4
On a dual core, BF4 will not run as many threads: one main, one worker, and the driver thread on Nvidia cards; on an i3 it might run three worker threads instead of just one.

In most cases your thinking is correct: games will run the same number of threads, with the same balance between them, on any CPU, so indeed, when no core is bottlenecked, the faster CPU will be faster.