Discussion: AMD's last hope for survival lies in the Zen CPU architecture



Hard to say. The whole purpose behind DX12 is to lessen the impact that the CPU has on a GPU's performance, lowering the number of cycles the CPU has to spend before the GPU starts pushing frames.

If anything it should be more like the GTX 980 Ti and have very even performance across most CPUs, unless the game utilizes more cores for other functions, which would give CPUs with more cores an advantage.

If anything it could mean the drivers are still immature or that somehow Fiji is breaking the laws of physics.
 

Everyone should know by now that driver maturity can have a massive impact on performance on both sides of the AMD-Nvidia fence: both have shown 20-30% performance gains from driver optimizations in the past. I would not be surprised if Nvidia had some DX12 driver optimization surprises further down the pipeline.

For now though, AMD is hyping DX12 like there is no tomorrow and Nvidia is keeping pace. Current results are far too close and far too few to say AMD won.
 


The 4k results show the true difference in performance, which is about 2 FPS. Note, though, that Fury does have more responsive frame times, likely due to its faster VRAM, so it would likely be smoother despite a lower absolute FPS. At lower resolutions the trend remains, with the 980 Ti slightly faster.

What I find more impressive is that at 4k the i3 apparently isn't CPU bottlenecked, since FPS basically doesn't budge compared to the i7 numbers. That basically means that both NVIDIA and AMD are purely GPU bottlenecked at 4k, regardless of the CPU used.

As for the 1080p numbers, the i3 appears bottlenecked, given that AMD and NVIDIA perform the same. This is further evidenced by NVIDIA gaining some FPS as you move to the i5/i7. Not sure why AMD is gaining FPS though; odd driver bug, or noise in the dataset? Bears watching, but isn't conclusive by itself.
 


I side with gamerk on this one. The tests showing off DX12 are still very murky. Then you add driver optimization to the table, and you will most probably end up with a different picture.

But to answer directly: the 290X behavior is because of driver maturity and it being GCN 1.1. I don't want to focus on the 290X though, even though it does perform great compared to nVidia's 970. The simple reason is that Fiji is GCN 1.2+ (I'd call it 1.3, but it's not really a departure from Tonga) and it should be MUCH better, but instead a previous-gen GCN version actually keeps up with a GCN version that is on steroids. That is not how you do uArchs. Not in CPUs, not in GPUs. There is simply too much fat inside GCN, and that is why they were forced to do the internal shake-up in the graphics division. They are trying to keep alive a GPU uArch that is not keeping up with what the market wants. Not a bad uArch, but not what the market wants.

Cheers!
 


That's the thing. I suspect it is indeed an issue with the drivers for Fiji. It's the uncertain nature of Fiji that makes me not want to draw any conclusions about it, but if it's indeed superior to the older GCN architectures, there's no reason why this should be happening other than drivers. That doesn't justify buying the Fury X right now; it could also be a huge bug or design flaw that won't be solved. In that price range, I'd say the 980 Ti is the safer buy, but I digress.
The same thing was seen in the Ashes of the Singularity benchmark. The R9 390X got a huge boost, surpassing even the GTX 980, but the Fiji GPUs were again very disappointing, even though they actually carry a lot of punch if you look at the specs. So Fiji is not really the best example right now. If we look at the cards with stable drivers, AMD is actually in a very good position.

The point is actually the following: we as consumers want something that lasts as long as possible for as little money as possible. If people have been paying attention, and AMD's track record speaks for itself, they have the GPU architectures with the longest life. Leaving the reasons aside, they should be reinforcing this in their marketing. Someone who bought an HD 7970 or a similar GPU still doesn't really have to upgrade, since it has basically become an R9 280X.

I personally regret getting an HD 6000 series card rather than waiting for GCN and the HD 7000 series, but only because of DX12. Other than that, the card is still working great aside from its 1GB memory limit. This is why I'm waiting for the next-gen GPU architecture: I won't make the same mistake of buying an architecture just as it's about to be replaced by something better. And this time it'll include a nice die shrink too :)

But in any case, AMD should be pushing the idea that their cards' architecture is future proof. They might get a temporary boost in sales this way. They still need to account for the fact that it's temporary, because the longer people keep their cards, the fewer of them they will sell. Or they have to go the nVidia route and encourage more regular card upgrades in various ways.

One can argue about optimization and bias and everything, but GCN is actually still holding its own very well for such an old architecture. Even the newest consoles are using GCN rather than something else, because the console makers saw the flexibility and the depth of performance that can be squeezed out of it, particularly with async compute. AMD is the one that has to capitalize on this. Most people are oblivious to the strength of the GCN architecture.

The GPUs alone won't save AMD though, so Zen better have some punch. I don't think this should be a problem, but AMD has had blunders before, so...

As for an i3 not being a bottleneck at 4k, that's not really surprising. CPU load doesn't really scale with resolution, while the GPU gets taxed more at higher resolutions.
 
I'm sailing the other way from you guys. I own an Nvidia GPU and plan to upgrade to an R9 390. I was definitely debating between this and the GTX 970. For one thing, I know the 390 can draw nearly double the power, but after some calculations I realized it'll only cost me about $1.50 more a month in electricity, at most, compared to the 970. Secondly, 8GB of VRAM, which can store a lot of texture data that would otherwise need to be reloaded from system RAM via the CPU. Third, its performance at 1440p.
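Just to sketch the math behind that ~$1.50 figure (the wattage gap, gaming hours, and electricity rate here are assumed ballpark numbers, so plug in your own):

```python
# Rough monthly electricity-cost estimate for the extra power draw of an
# R9 390 vs. a GTX 970. All figures below are assumptions for illustration.
extra_watts = 130        # assumed average power gap while gaming (W)
hours_per_day = 3        # assumed gaming time per day
price_per_kwh = 0.12     # assumed electricity price ($/kWh)

extra_kwh_per_month = extra_watts / 1000 * hours_per_day * 30
extra_cost = extra_kwh_per_month * price_per_kwh
print(f"~${extra_cost:.2f} extra per month")  # ~$1.40 with these assumptions
```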

I would say AMD is very situational. If you have a high-quality, high-wattage power supply, live in an area with low ambient temps, and have cheap electricity, I see AMD as the better GPU choice over Nvidia. If you have a low-quality power supply, live in a hot area, or have expensive electricity, then Nvidia is the better option. But for me, I see no reason to buy a GTX 970 over the R9 390. My case has good airflow and should be able to handle the heat.

I know it's a rebrand, but I honestly don't care. If it's going to give me better FPS than a GTX 970 for the same price, and a lot more vRAM, I'm going to purchase the 390. AMD does one heck of a job with their rebrands, I'll give them that.
 

How often does a GPU with 3GB or more RAM need to reload textures from system RAM? If you look at benchmarks for a 4GB vs 8GB R9-290X with the same core and memory frequencies, the performance differences are typically less than 3% even at 4k.

By the time the GPU might have a use for more than 4GB RAM, the framerates are already deep into unplayable territory due to a combination of memory bandwidth and rendering bottlenecks.
 


It's moving in that direction though. There's already a handful of games using upwards of 4GB for texture data, and as the number of GPUs on the market with more than that grows, more devs will make use of it.
 

That's what people were telling me when I bought my HD 6850 and had to choose between 1GB and 2GB. Right now, the 1GB of memory is the limit, rather than the GPU itself. If it weren't for the memory, I could be running higher settings. So my personal rule is: for any card that's mid-range or higher, the more memory the better if you want to use it long term.
 
Regarding the memory debate: whether to go for more or less memory (when the option exists) comes down to a simple question. Is the GPU fast enough to keep the bigger memory buffer fed? In the context of games, of course.

And regarding the GTX 970 vs 290X: at first I have to say I gave the 970 the overall victory, but with the passing of time it's a repeat of the GTX 670 and 7970 GHz. I started with the GTX 670, which was cheaper at the time than the 7970 GHz, but when the 7970 GHz came down in price I made the switch. I sold the GTX 670 for basically the same price I paid for it (gotta give credit to nVidia for keeping prices up, haha) and swapped it for the 7970 GHz. The GTX 670 was no slouch and it was great for the money, but the 7970 GHz has aged wonderfully well. So yeah, credit where credit is due. I'm pretty sure the GTX 970 vs 290X/390 debate will head in the same direction in a few months, if it isn't there already.

Cheers!
 

I doubt there is any game out there that has anywhere near 4GB worth of unique game assets loaded at any given time. If you see a game "use" 4GB of GPU memory, more than half of it is frequently used assets being duplicated across multiple memory channels to reduce the likelihood of bottlenecking individual ones by spreading memory requests more evenly.
 


Shadow of Mordor, GTA V, and Batman Arkham Knight all do, and in those titles, my 2GB version of the GTX 770 basically can't hang. I have to run LOW settings in some games, simply because of VRAM limitations.
 

And none of this contradicts what I wrote: if you have a game with 1.5GB worth of unique assets loaded into GPU memory, that leaves next to no memory available on a 2GB card (Windows itself uses about 300MB) to duplicate frequently used assets across memory channels, and if any single memory channel starts bottlenecking, it brings the whole GPU down with it.
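Spelling out that headroom arithmetic with the figures above (1.5GB of unique assets and ~300MB for Windows on a 2GB card):

```python
# Headroom left for duplicated assets on a 2GB card, using the figures
# quoted above: ~1.5GB of unique game assets plus ~300MB used by Windows.
card_mb = 2048
unique_assets_mb = 1536
windows_mb = 300

headroom_mb = card_mb - unique_assets_mb - windows_mb
print(f"~{headroom_mb} MB left over for duplicating hot assets")  # ~212 MB
```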
 
Memory bandwidth is the limit in general, which is why the duplication is used. Maybe that's why the R9 390 got a big boost in DX12 over the R9 290: the bigger memory allows more duplicates, which reduces bottlenecks? Pure speculation on my part, but it would make sense, I guess.
 
With DX12 making better use of all your cores, if AMD launches at prices like this, nobody in their right mind will buy an i7.

$200 = 8 physical cores, 16 threads. 3.5 GHz, OC to 4.5 GHz
$300 = 12 physical cores, 24 threads. 3.7 GHz, OC to 4.5 GHz
$400 = 16 physical cores, 32 threads. 3.9 GHz, OC to 4.5 GHz


With 16 cores and 32 threads? Goodbye, overpriced Intel products. I'll accept losing 5-7 FPS in games in exchange for an insane number of cores.
 

Yes and no. Duplication only helps you so long as the assets being duplicated get accessed frequently enough to generate a large enough volume of concurrent accesses to justify it, such as the brick patterns that may get used 1000+ times each when rendering a hallway or wall. Resource duplication would be mostly wasted memory (and memory bandwidth) on 4k textures that get used only a handful of times per scene at most, at something like 1/10th scale.
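As a toy illustration of that trade-off (the channel count, texture sizes, and access counts below are made-up assumptions, not measured numbers):

```python
# Toy comparison: extra memory spent duplicating an asset on every memory
# channel vs. how often that asset is actually accessed. All sizes, counts
# and the channel number are made-up assumptions for illustration.
CHANNELS = 8

def duplication_cost_mb(asset_mb, channels=CHANNELS):
    """Extra memory used by keeping a copy of the asset on every channel."""
    return asset_mb * (channels - 1)

# A small tiling texture reused constantly (e.g. a brick pattern) vs. a big
# texture touched only a handful of times per scene.
assets = [("brick pattern", 4, 1000), ("large one-off texture", 64, 5)]

for name, size_mb, accesses_per_frame in assets:
    cost = duplication_cost_mb(size_mb)
    print(f"{name}: {cost} MB extra for {accesses_per_frame} accesses/frame "
          f"({accesses_per_frame / cost:.2f} accesses per extra MB)")
```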
 


Zen is set for up to 8 cores with 16 threads, and if it is a decent-performing chip it will be priced accordingly.

If anything, DX12 has shown that CPU bottlenecks will diminish, since it is set to cut the CPU cycles needed by quite a bit, so more cores might not even help as much unless developers code to use them, as is the case now.

Crysis needed a dual core for one reason: the sound was coded to run on a second core while the game ran on the primary core. Without the second core the game would crash, underperform, or have no sound.
 


There were hardly any changes made between those "generations" of GCN. They're so similar that calling them different generations is misleading, imo. "Steroids" is by far an exaggeration. Furthermore, Fury basically just threw more cores and faster memory at it; AMD didn't bother scaling the rest of the resources up. My guess is they were mostly making the cards to test HBM, and they hyped them up by calling them Fury.

Besides, once they fixed PowerTune in the Nano, it traded blows with Maxwell in efficiency. Even accounting for the power savings from HBM, that's still pretty damn impressive. And when you say GCN has too much fat, what fat are you referring to? If they could cut more and make it more efficient, then it would have beaten Maxwell once they fixed PowerTune... Again, we're talking about an architecture that has hardly changed since it first launched in late 2011/early 2012. They haven't done any serious restructuring like Nvidia did going from Kepler to Maxwell.
 


Yeah, if anything, DX12 is going to benefit Intel's i3 and i5 lineup. It looks like there's really going to be no reason to get a CPU with that many cores, since at least in SoM, a dual core with HTT is enough to push the GPU at 4k.
 


That is precisely what my rant was aimed at.

GCN was originally created to embrace HSA compliance and move forward with a jack-of-all-trades GPU arch; that's what I mean by the "fat" in it. If GCN only had to execute GPU-centric tasks, you can bet it would run more efficiently. The GPU division was subject to the requests of the CPU division to comply with HSA and, maybe, make it easier to integrate with CPU-centric logic. That's what I gather AMD's idea behind current GCN was, and it's a point I like to bring up as well: they should take another look at VLIW4-based archs and see if they could make mid/mainstream cards from it.

Cheers!
 


GCN was created to fix growing pains with VLIW and modern workloads (especially compute, but not only compute). AMD went from VLIW5 to VLIW4 because, on average, only 3.4 of the 5 slots in a group were doing anything, and that's in the gaming workloads VLIW is supposed to work well with. VLIW4 significantly reduced the issue compared to VLIW5, but that still wasn't enough. With more and more modern games using compute-related features, utilization would probably drop to under 50% even with VLIW4.
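Putting rough numbers on that utilization argument (the ~3.4 busy slots figure is from the paragraph above; the VLIW4 number simply assumes the same work fills a narrower group):

```python
# Back-of-the-envelope issue-slot utilization. The ~3.4 average occupied
# slots per group comes from the post above; the rest is simple arithmetic.
avg_busy_slots = 3.4

vliw5_utilization = avg_busy_slots / 5   # 5-wide groups -> ~68% busy
vliw4_utilization = avg_busy_slots / 4   # 4-wide groups -> ~85% busy

print(f"VLIW5: {vliw5_utilization:.0%} of issue slots doing work")
print(f"VLIW4: {vliw4_utilization:.0%} of issue slots doing work")
```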

Even ignoring such features, coding games and applications to work well with a VLIW compiler is very limiting in what programmers can do. Just look at Intel's Itanium for another example of how difficult VLIW-style compilers are to work with. There is no doubt in my mind that many of the driver pains ATI and AMD have had (specifically, poor performance in many games when they were new) were related to compiler difficulties.

http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute

http://www.anandtech.com/show/5261/amd-radeon-hd-7970-review

http://www.anandtech.com/show/9390/the-amd-radeon-r9-fury-x-review

http://www.anandtech.com/show/9621/the-amd-radeon-r9-nano-review

Also, with PowerTune working properly, the Nano caught the GTX 970 and 980 in power efficiency. Sure, it also needed HBM for that, but again, DX12 lessens the gap further and HBM doesn't make more than a 15% or so difference. AMD has plenty of issues, but GCN isn't one of them, seeing as it has proven able to compete with Maxwell in power efficiency when it isn't being held back by inadequate PowerTune control and by the inefficient higher-level drivers/APIs that VLIW needs.

We already know AMD is now capable of competing with Maxwell using GCN; what we need to see is whether they can compete with Pascal. AMD is in a good position to get a better start than they had this generation, since their primary problems are now solved. I don't think any VLIW4 derivative would be able to handle modern workloads or be compatible with DX12 without more work than AMD can afford. Even assuming it were feasible, AMD can't afford to support a more gaming-oriented arch and a compute-oriented arch at the same time. And that's assuming it would make a significant impact on gaming performance, which I have my doubts about. AMD certainly can't afford to lose what little professional graphics market share they have, given the high profit margins there.
 


That's exactly the path AMD has been taking for the past 5 years, and it's not exactly working well for them. Weak cores are weak cores, whether you have 8 of them, 16 of them, or 48 of them. People are buying Intel because, with half the hardware, it still outperforms AMD. They keep trying to find niches where their CPUs can keep up by using "all those threads", and those niches are becoming more and more scarce. That's the reason AMD has continually slashed prices: they're forced to, based on their performance. Not because they think the average consumer is a wonderful person who deserves some fantastic deal and AMD will take the profit hit for them.

They're priced accordingly, not by choice. No company wants to slash prices, as it cuts directly into profits. When your product doesn't live up to the expectations originally set and instead only performs as well as mid- to lower-range options, the market dictates that your price must follow suit, or you're stuck with a product that doesn't sell at all.

If (and this is a big if) Zen manages to compete head to head with, say, the i7, realistically it will be priced within $20 or so of it. Don't expect i7 performance at FX-83xx prices; that's just Business 101. There aren't $28k Ferraris and there aren't $180k Kias for a reason. It's not speculation, it's market behavior, and AMD has already shown in the past that their prices equal (or at times have exceeded) Intel's when they have a directly competing product. For the past 5 years AMD's prices have been on the low side, and that follows suit with their performance.

Unless they improve the strength of their CPU cores (IPC), which is where AMD falls flat repeatedly throughout their lineup, they won't have a chip worth $200 whether it has 8 cores or not, much less $300-400. Consoles use 8 weak cores as well, and outside of running highly limited games coded specifically for those 8 cores, I'd like to see how long someone would care to use one of those weak 8-core CPUs in a desktop before they pitched it out the window, because it has little more horsepower under the hood than a Speak & Spell. Case in point: core count is highly overrated. Cell phones have quad-core and higher processors in them; I surely wouldn't try to run a AAA game on one.
 

Once your reputation takes a hit from consistently under-performing under most circumstances, it takes a long time and a considerable amount of effort to prop it back up. If AMD wants to sell their chips within $20 of Intel, Zen has to succeed in narrowing the performance gap and AMD will need to continue gaining ground for the next few years.

If Zen is successful, AMD might be able to slash the pricing gaps in half.
 

Fair points indeed, and thanks for them, but I still think VLIW4 had more to offer down the road; comparing it to Itanium is kind of unfair as well. I won't dive deep into it, but VLIW was not meant for serial loads. Now, you're completely right in saying it would be an investment AMD can't afford right now, even if they dust off the plans from the 6900 gen. Compute-wise, I'll argue VLIW uArchs have their strengths, and those lie precisely in parallel workloads. I don't need to read the graphs to remember how they stand: serial-like tasks were a pain to compile and optimize for, but most parallel ones were easy and smooth to optimize, provided they were small enough (atomizable?). Like you say, with just one generation on VLIW4 they added, what was it, 20% efficiency for around a 10% performance loss? That's pretty darn good if you ask me, but yes, scaling it up was not easy. That's when they introduced dual-GPU cards that actually scaled amazingly well (thanks to VLIW in no small part). I don't know, maybe my memory is painting a brighter picture than reality, but I do remember them doing great before the focus shifted to "heterogeneous compute tasks".

Now, regarding PowerTune, yes, you're 100% correct. Most of the benefits nVidia gained were not from the uArch itself, but from tuning power management over generations. So, when AMD got it right, it shows. Still, I think I will be proven right when nVidia puts Pascal forward using HBM; you can trust me, they'll get all the awards for being efficient and whatnot. If AMD doesn't tune GCN towards lighter compute and better efficiency, they will trail nVidia again. They *do* need to come out with a cut-down version of GCN that takes the mid/mainstream market by storm, because they need it. I still believe VLIW would be the best option they have, but ugh, now that you mention it, they'd have to invest too much to get a second uArch, drivers, and all that for such different GPUs.

Cheers!
 