FX Vs. Core i7: Exploring CPU Bottlenecks And AMD CrossFire

[citation][nom]ojas[/nom]I only wished to see more CPU dependent games, because i thought the point of the article was exploring bottlenecks...[/citation]
This seems to be where all your frustrations are coming from, and the reason reading the seven pages of posts is nearly impossible now, with all of your charts.

If you want to get a good idea of bottlenecking, you have to explore both when things bottleneck and when they do not.
 
[citation][nom]Crashman[/nom]Hey, do you think there should be another article comparing these results with SLI results?[/citation]
Yup, I think that would be useful. You could add these CF results to that as well; otherwise we'd have to go back and forth to compare the charts...

[citation][nom]bystander[/nom]This seems to be where all your frustrations are coming from, and the reason reading the seven pages of posts is nearly impossible now, with all of your charts.[/citation]
Yes, that is where my frustration started, because of the intro to the article.

I've posted "my" charts on the 7th page, FFS, and I think once before. If you're going to be difficult for the heck of it, you might as well just stop.

Hell, getting those charts was an effort; I had to actually remember the sources and all. It's not much fun running around the web with a score of tabs open, trying to contribute something that doesn't really affect me in any way. I won't even be able to afford spending $800 on GPUs for a few years. The only reason I did it was for the reasons already mentioned, and because I was genuinely interested in this.

If you want, I can remove the img tags. You just have to ask nicely. :|

"If you want to get a good idea of bottlenecking, you have to explore both when and when they do not bottleneck."
Yeah, that's the point, you need to compare it with something, like 2 cards in SLI or a single GPU situation where it's clear there isn't already a bottleneck. That's the problem i could see here. I think you're under the impression i've been trying to look for ways to make AMD look better or something.
 
If this is a value comparison, why not use the 8320? I'm sure it will clock the same as, or very close to, the 8350, and that would help widen the value gap.
 
[citation][nom]cobra5000[/nom]If this is a value comparison, why not use the 8320? I'm sure it will clock the same as, or very close to, the 8350, and that would help widen the value gap.[/citation]Is the entire article a value comparison? Hardly. It's a competency evaluation.
 
AMD needs to wake up, as Intel's CPUs are very affordably priced at the moment. I've always bought AMD CPUs because of their good pricing, but I'm afraid to say, AMD, that an i5-3570K is in the post. I'll be sorry to leave you, but what else am I to do? At least your GPUs are still in my system, yet driver issues have really gotten on my nerves these last 12 months and I nearly went green!
 
I like it when I see benchmarks where the two contestants differ by, say, 10-20 FPS while the overall frame rate is above 100, and people make such strong comments about it... really funny.

Then it is even more interesting that, after comparing three different titles where both contestants are on par, along comes another one that shows such a great difference.

Well, I must say that I hardly ever read benchmarks because I find them fairly trivial for people with low-end PCs or those who do not spend a fortune on high-end ones. What I found so very interesting, and what has been verified for graphics cards too, is the magic word CODING. Those hints between the lines in the article are magic.

The war between these two companies is more subtle and devious than most people can perceive.
 
[citation][nom]kathiki[/nom]I hardly ever read benchmarks because I find them fairly trivial for people with low-end PCs[/citation]Well then, what value do your comments add to a high-end graphics article?
 
Actually, there is also one problem with this test: PCIe 2.0 vs. PCIe 3.0.
Other tests have shown (HardOCP, etc.) that PCIe 3.0 is approximately 5-10% faster than PCIe 2.0 with high-end graphics cards.
This gets worse when running in CrossFire, which is heavily dependent on the PCIe bus. So the bottleneck is actually the PCIe bus, not the graphics card.
 
The FX-8350 is a winner. After reading this article and seeing a video on YouTube of another independent study, I am convinced that the FX-8350 will work just fine for my gaming needs. I already have an AMD board that will accommodate it, so it will save me a lot of money not having to buy a different motherboard and whatnot. Thanks, Tom's - GO AMD
 
[citation][nom]hakesterman[/nom]The FX-8350 is a winner. After reading this article and seeing a video on YouTube of another independent study, I am convinced that the FX-8350 will work just fine for my gaming needs. I already have an AMD board that will accommodate it, so it will save me a lot of money not having to buy a different motherboard and whatnot. Thanks, Tom's - GO AMD[/citation]AMD is good enough for the needs of most gamers. This article specifically tested a super-high-end graphics system for a few reasons, one of which is that AMD owns the firm that produces the graphics cards.
 
[citation][nom]Crashman[/nom]I'm calling BS on this one because AMD's "eight cores" are actually four modules, on four front ends, with four FP units. Games have historically been limited by FP units specifically and front ends in general, no? What I'm seeing is that Intel's per-core IPC appears to be a little higher, when two different FOUR "full" CORE processors are compared.[/citation]

tl;dr warning for the less patient and/or people who don't care.

Let's think a little more about this. First off, FPU performance was historically important, but it is easily shown to be far less important for most games these days. For example, there is a very large difference in FPU performance and in support for modern FPU instructions between the Family 10h CPUs (Athlon II, Phenom II, Sempron, etc.) and FX, as well as Intel's modern CPU families, yet Phenom II is clearly able to hold its own against CPUs with similar integer performance despite its significant FPU deficit. So, FPU performance is clearly not even remotely as important as it used to be, at least in modern games.

Next, in the few gaming situations where the game can load up many threads effectively, performance on FX scales with higher core counts fairly similarly to how it does on the four- and six-core Phenom II models. Sure, we can clearly see that the front-end bottlenecks in Bulldozer and Piledriver are not to be ignored, but they most certainly aren't on the order of making a module act anything like a single core.

Furthermore, just because there are FPU and front-end bottlenecks does not mean that we should act as if modules are merely single cores. They do not act like single cores, nor do they perform like them. Yes, per-core performance on Bulldozer and even Piledriver is poor, but that's no excuse to pretend that each module is a single core. There shouldn't be any doubt that there are two cores in each module. There are significant bottlenecks in how AMD implemented it, but they should be seen for what they are: bottlenecks, not a reason to call a module containing two integer cores anything but two cores. Before there were FPUs and caches, CPUs were still single-core CPUs, and that's a fact, so just because AMD changed how the cache and FPUs are organized is no reason to change the meaning of what a core is in an x86 CPU. An x86 CPU's core is an integer processing core, even though the technology has evolved greatly over time.

Also, each FPU in an AMD module can process two 128-bit FP instructions at once instead of a single 128-bit FP instruction like the Family 10h CPUs that preceded FX, so although it's still a single FPU, it's arguably no worse than two of the older FPUs. Even more, it supports many instructions that couldn't be run on the old FPUs, so it's arguably even better than two of the old ones.

There is no denying that Intel's current CPUs are far better per core than AMD's competing models, and that Intel's far superior front ends are a huge part of that, but that has no impact on how we define the number of cores in AMD's CPUs. Like I said earlier, we didn't say that early CPUs were not single-core CPUs just because they didn't have one FPU per core (especially since many lacked an FPU completely). We also didn't say that, for example, AMD's and Intel's first dual-core models were not dual-core despite their huge front-end bottlenecks (especially for Intel) and very limited success in improving performance at the time.

It's not difficult to see how improving the multi-threaded utilization of CPUs with many cores (such as the six- and eight-core FX models) would improve their situation greatly. However, one thing that I wouldn't ignore (although I wouldn't call it a huge issue) is that even with AMD's CPUs being better utilized as a result, chances are that it wouldn't be a huge boost in FPS in most modern gaming situations. As we can plainly see from many of the benchmarks here at Tom's and elsewhere, most games are not so limited by the CPU that there is a huge difference in average FPS between AMD's best and Intel's best right now, even with very high-end graphics.
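
To make that more concrete, here's a quick, purely illustrative C++ sketch (nothing from the article; the workload, iteration count, and scaling are arbitrary placeholders) of what better multi-threaded utilization means: the same integer-heavy work split across however many hardware threads the CPU reports, instead of running on one core while the others sit idle.

[code]
// Purely illustrative: split an integer-heavy workload across all hardware
// threads. The loop body and iteration count are placeholders, not anything
// measured in the article.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

// Arbitrary integer-heavy work standing in for game logic (AI, physics, etc.).
uint64_t busy_work(uint64_t begin, uint64_t end) {
    uint64_t acc = 0;
    for (uint64_t i = begin; i < end; ++i)
        acc += (i * 2654435761u) ^ (i >> 3);
    return acc;
}

int main() {
    const uint64_t total = 400000000;  // total iterations to churn through
    const unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    std::vector<uint64_t> partial(n, 0);

    // Hand each thread an equal slice; a poorly threaded engine would run the
    // whole range on one core and leave the rest idle.
    for (unsigned t = 0; t < n; ++t) {
        uint64_t begin = total / n * t;
        uint64_t end = (t + 1 == n) ? total : total / n * (t + 1);
        workers.emplace_back([&partial, t, begin, end] {
            partial[t] = busy_work(begin, end);
        });
    }
    for (auto& w : workers) w.join();

    uint64_t sum = 0;
    for (uint64_t p : partial) sum += p;
    std::cout << "threads: " << n << ", checksum: " << sum << "\n";
}
[/code]

With the work actually spread like that, wall-clock time drops roughly with the number of cores doing useful work, which is exactly the kind of scaling a six- or eight-core FX is starved of in most of today's games.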

EDIT: That's not to say that there isn't a significant performance difference outside of games; I'm just saying that most modern games aren't as reliant on CPU performance improvements once you're already in the high end as they are when you compare low-end CPUs to mid-range and high-end CPUs. It's not like graphics cards, where there are almost always ways to use more graphics performance.

Still, the fact that software is generally not specifically optimized to work ideally on AMD's CPUs is no excuse for their poor per-core performance. AMD really should work on improving their front end a lot, among other things.
 
[citation][nom]blazorthon[/nom]Also, each FPU in an AMD module can process two 128-bit FP instructions at once instead of a single 128-bit FP instruction like the Family 10h CPUs that preceded FX, so although it's still a single FPU, it's arguably no worse than two of the older FPUs. Even more, it supports many instructions that couldn't be run on the old FPUs, so it's arguably even better than two of the old ones.[/citation]

Actually, the "FPU" can process two separate 128-bit instructions simultaneously, one from each "core". This allows each "core" to act as a full processor.

The real reason people keep seeing IPC differences is that the "cores" inside a PD CPU are slimmer and have fewer resources than an SB/IB core. SB/IB cores have three integer ALUs along with a single 256-bit SIMD unit (FPU). The FPU has three data paths that can execute one or two 128-bit SIMD instructions, depending on the type (ADD/MUL/BLEND are on different paths), or one 256-bit SIMD instruction.

BD/PD has two ALUs per "core", though they are dedicated ALUs, and AGU instructions are processed separately. AGU instructions tend to be in the minority, so those units are rarely working at full steam. The 256-bit SIMD unit is divided into two 128-bit units that can each process one 128-bit SIMD instruction, or the entire thing can process a single 256-bit SIMD instruction.

Looking at these basic design characteristics, one can easily see how the SB/IB "core" simply has more resources than an equivalent BD/PD "core". To make matters worse, Intel has a very large lead in branch prediction technology, and SB/IB has a small, fast, dedicated L2 cache for each "core" while a BD/PD module has to rely on a shared, "slow" L2 cache. These compounding effects are why we see such low "single core" performance from a BD/PD CPU. AMD deliberately took the design decision to trade per-core performance for raw core count. What BD/PD CPUs are particularly good at (design-wise) is running many independent programs at once. When places like Tom's do benchmarking, they tend to turn off everything except the single program being tested, which is kind of unrealistic. It creates a more controlled environment, but it also turns "single-player timed demo" benchmarks into little more than a new kind of synthetic. I prefer to do benchmarks while running two or more high-demand applications (converting two or more videos at once) and measure the system load.
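
To put the 128-bit vs. 256-bit distinction into something concrete, here's a small, purely illustrative C++ sketch using SSE/AVX intrinsics (my own toy example, not anything from the article; the comments about how a module would schedule these are assumptions based on the description above, not measurements). It needs AVX enabled to build, e.g. g++ -mavx.

[code]
// Purely illustrative: what "two 128-bit SIMD ops vs. one 256-bit SIMD op"
// looks like at the instruction level.
#include <immintrin.h>
#include <cstdio>

int main() {
    // Two independent 128-bit additions: per the description above, these could
    // be issued by the two cores of a module and handled by the two 128-bit
    // halves of the shared FPU at the same time (an assumption, not a measurement).
    __m128 a = _mm_set1_ps(1.0f), b = _mm_set1_ps(2.0f);
    __m128 c = _mm_set1_ps(3.0f), d = _mm_set1_ps(4.0f);
    __m128 r0 = _mm_add_ps(a, b);
    __m128 r1 = _mm_add_ps(c, d);

    // One 256-bit addition: occupies the whole shared FPU of a module.
    __m256 x = _mm256_set1_ps(1.0f), y = _mm256_set1_ps(2.0f);
    __m256 r2 = _mm256_add_ps(x, y);

    // Store and print a lane from each result so the compiler keeps the work.
    float lane128[4], lane256[8];
    _mm_storeu_ps(lane128, r0);
    _mm256_storeu_ps(lane256, r2);
    printf("%.1f %.1f %.1f\n", lane128[0], _mm_cvtss_f32(r1), lane256[0]);
}
[/code]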
 
Hold on a second: why are you only pitting a 3770K against an FX-8350? That's not really an apples-to-apples comparison, since they're in different price categories; please add in the 3570K. Don't compare "flagships"; you need to compare value for money in the SAME CATEGORY. Smells like some Intel bias here... Not really a great, balanced article, because the assumptions made don't reflect how most customers actually buy - we don't all seek to buy full systems by upgrading ALL components. Nowadays, after the CPU and GFX, it's memory and an SSD.
 


But the i5 doesn't perform appreciably worse than the i7 in games. If anything, the decision to use the i7 gives AMD an advantage in the value comparison.

Personally, I don't think I agree with the article's rationale, that we should be looking at flagship versus flagship, irrespective of cost -- but I understand where the author was trying to go: if you're looking to compare the gaming capabilities of one whole brand versus another, then you go with the highest-performance product from each.

The problem is, as you say, that consumers don't (or shouldn't) think in those terms. A rational consumer will compare the i5 to the 8350, if those are the two items in his price range. If the i7 is in the rational consumer's price range, then he should probably just buy it.
 
[citation][nom]palladin9479[/nom]Actually, the "FPU" can process two separate 128-bit instructions simultaneously, one from each "core". This allows each "core" to act as a full processor. The real reason people keep seeing IPC differences is that the "cores" inside a PD CPU are slimmer and have fewer resources than an SB/IB core. SB/IB cores have three integer ALUs along with a single 256-bit SIMD unit (FPU). The FPU has three data paths that can execute one or two 128-bit SIMD instructions, depending on the type (ADD/MUL/BLEND are on different paths), or one 256-bit SIMD instruction. BD/PD has two ALUs per "core", though they are dedicated ALUs, and AGU instructions are processed separately. AGU instructions tend to be in the minority, so those units are rarely working at full steam. The 256-bit SIMD unit is divided into two 128-bit units that can each process one 128-bit SIMD instruction, or the entire thing can process a single 256-bit SIMD instruction. Looking at these basic design characteristics, one can easily see how the SB/IB "core" simply has more resources than an equivalent BD/PD "core". To make matters worse, Intel has a very large lead in branch prediction technology, and SB/IB has a small, fast, dedicated L2 cache for each "core" while a BD/PD module has to rely on a shared, "slow" L2 cache. These compounding effects are why we see such low "single core" performance from a BD/PD CPU. AMD deliberately took the design decision to trade per-core performance for raw core count. What BD/PD CPUs are particularly good at (design-wise) is running many independent programs at once. When places like Tom's do benchmarking, they tend to turn off everything except the single program being tested, which is kind of unrealistic. It creates a more controlled environment, but it also turns "single-player timed demo" benchmarks into little more than a new kind of synthetic. I prefer to do benchmarks while running two or more high-demand applications (converting two or more videos at once) and measure the system load.[/citation]

Most of that is either something I'd already said or an expansion on what I said 😉 Honestly, I'm still not sure the ALU difference is really an issue, because the difference in cache performance and the other front-end bottlenecks are so huge, and the gap in performance between Bulldozer and Phenom II is much smaller than the difference in ALU count per core would suggest, especially considering those additional front-end bottlenecks. However, that is not to say that I'm unaware of these issues, or of the many other issues involved in Bulldozer and Piledriver CPUs.

My post above was simply to argue that the Bulldozer modules should be considered two cores rather than a single core, and that the large per-core performance difference between AMD and Intel in many workloads is not a viable excuse to change that, nor are the shared FPU and front-end designs. I was also pointing out that, although it was ultimately AMD's choice to go for many slow cores instead of a few fast ones (or even many fast ones) and I don't agree with that approach for a general-purpose CPU, software that fully utilizes all of those cores would run much better than software that doesn't, and that includes games. Do you see any fault in these two views?
 
[citation][nom]computertech82[/nom]It would make more sense to show SINGLE video cards (6770 and up), vs dual core and up cpus. Basically what is the best upgrade from a variety of setups. Not just for dual EXPENSIVE video cards.[/citation]

The point of this article was to test how much of a bottleneck the FX-8350 is in a variety of games compared to the i7-3770K, not to be a general guideline for people looking for upgrades from lower-end systems.
 



I wasn't arguing with you, only expanding on it to show the naysayers that each "core" inside a BD module is a real core and not an SMT/CMT design. People keep saying it has "one" FPU per module, which isn't exactly true; it's two FPUs that can bond together. I also wanted to lay out how AMD's "cores" have fewer resources per core in order to maximize the number of cores present on the die. The 2-ALU vs. 3-ALU difference has more to do with speculative processing: not only can you process the instruction you're on now, but in theory also the one you need to do next. Of course, if the CPU guesses wrong, then all that preemptive work needs to be tossed aside. It's one of those design features that could be good or bad depending on the situation it's used in.
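
As a side note, the cost of the CPU guessing wrong is easy to see from plain user code with a data-dependent branch. This is a generic, hypothetical C++ sketch of branch misprediction, not a measurement of the 2-ALU vs. 3-ALU difference specifically; the sizes and timing method are arbitrary.

[code]
// Generic, hypothetical sketch of branch misprediction cost: the same loop
// over random data (the predictor guesses wrong roughly half the time, so
// speculative work keeps getting thrown away) and over sorted data (the
// predictor is almost always right).
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

static long long sum_big_values(const std::vector<int>& v) {
    long long sum = 0;
    for (int x : v)
        if (x >= 128) sum += x;  // data-dependent branch the CPU must predict
    return sum;
}

static double time_ms(const std::vector<int>& v) {
    auto t0 = std::chrono::steady_clock::now();
    volatile long long s = sum_big_values(v);
    (void)s;
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    std::vector<int> data(1 << 24);
    std::mt19937 rng(42);
    for (int& x : data) x = rng() % 256;

    double unsorted = time_ms(data);   // mispredicts often
    std::sort(data.begin(), data.end());
    double sorted = time_ms(data);     // predicts almost perfectly

    printf("unsorted: %.1f ms, sorted: %.1f ms\n", unsorted, sorted);
}
[/code]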

For example, at work one of the systems we've been deploying uses T4-2 hardware. That's two sockets for 16 cores and 128 threads total. After it was up and running at full speed, we did some analysis and saw 700+ threads with over 12,000 LWPs. It was running eight or nine sparse child zones, each loaded with its appropriate software. On average we had 3-4 threads waiting in the processing queue for CPU time. That kind of load absolutely needs an insane number of "cores". Playing a timed loop demo of a largely single-threaded FPS game isn't going to do much for that type of CPU power.
 
You guys should do a 5760x1080 (or 3240x1920 in portrait) shootout, choosing the minimum AMD and Nvidia cards required to play each game, from WoW to Crysis 3.
 
[citation][nom]gallovfc[/nom]You guys should do a 5760x1080 (or 3240x1920 in portrait) shootout, choosing the minimum AMD and Nvidia cards required to play each game, from WoW to Crysis 3.[/citation]

That sounds like a lot of work, granted it is interesting 😉
 
[citation][nom]blazorthon[/nom]That sounds like a lot of work, granted it is interesting[/citation]
But I bet that's what everybody is looking for!! 😉
 