AMD CPU speculation... and expert conjecture


I've always suspected the lack of a modem and a low-powered iGPU were the primary factors, and that's before Intel started its [strike]bribe teh OEMs[/strike] contra revenue program.
I don't know much about immediate-mode rendering in desktop GPUs versus tile-based rendering in ULP SoCs... that could be another factor.

ARM is supplying AMD with an A57 core license, meaning they can use the core design in their own SoC with their own uncore, iGPU and other blocks, but AMD can't redesign the core itself; the custom, redesigned core would be the K12. AMD could easily (compared to something like a K12 SoC) put out an A57+GCN design, similar to what Nvidia likes to falsely advertise *cough* Tegra X1 *cough*.

AMD Will Release New Catalyst Linux Driver Update This Month
http://www.phoronix.com/scan.php?page=news_item&px=AMD-Catalyst-Linux-March-2015
 

8350rocks



http://openbenchmarking.org/prospect/1305170-UT-LLVMCLANG75/fd501a41a2adcc643acc832de94444f9fd7d9678

Sorry, it was a 3960X that it blew away... not the lowly 3930K.
 
OK, this has gone on long enough. I'm going to make it as clear as crystal to all participants:

A: Civility is required at ALL times in these forums. If you cannot be civil to one another then I suggest that you find somewhere else to argue.

B: It's fine to take umbrage at something someone has posted. However, you may only attack the message, NOT the messenger.

Acceptable: "I disagree with your position for the following reason/s.....", "Your post contains the following incorrect information"

Unacceptable: "You're stupid", "What are you smoking", "You're an idiot", "you always post nonsensical things that are designed to exacerbate a single thing."

Any further breaches of the rules will result in post deletion, some vacation time for the offenders, and likely thread closure.
 

juanrga



That PostgreSQL pgbench benchmark shows an i3 @ 1.8GHz being 54% faster than a six-core i7 @ 3.3GHz. What is your conclusion? That dual-core ULV chips are faster than six-core Extreme chips at database server workloads?

My conclusion is that there is something weird with that benchmark: either a misconfiguration, a compiler issue, or different hardware configurations affecting the results. The benchmark cannot be used to compare i3 vs i7 vs FX. I was curious about what the author who ran that benchmark had to say, and this is what I found:



http://www.phoronix.com/scan.php?page=article&item=llvm_clang33_3way&num=1

It was not an Intel vs AMD review but a three-way compiler review. The correct way to read the results is to compare benchmarks produced by different compilers on the same hardware and see whether the newest version of Clang improved or not. This is what the author who ran the review said about the PostgreSQL pgbench benchmark:



http://www.phoronix.com/scan.php?page=article&item=llvm_clang33_3way&num=4
 
Gamer, your math is slightly off because you're factoring in core count. Unless all cores are loaded at 100%, it won't be accurate. That is why you see such a disparity between the FX-8 and FX-6 models: the FX-8 has a ton of idle resources that aren't being used to produce work but are being used to divide the work done.

FX-8350
Performance = (IPC * Clock) * Number_of_Cores
104 = IPC * 4.0 * 8
IPC = 3.25

Here you're taking the absolute work done in one second, 104 frames, then dividing by clock rate and then dividing again by core count. This dilutes the performance of any chip not working at 100% of its capacity, and it isn't IPC but rather FPC (frames per clock).

In DA:I the 4.0GHz PD chip gets 104 FPS while the 3.9GHz PD chip gets 97 FPS; they both use the same architecture and are identical in all ways except that the FX-6 has a module factory-disabled. So the real difference between them would be:

104/4 = 26 FPC vs 97/3.9 = 24.871 FPC.

For the Intel i5

i5-3470
Performance = IPC * Clock * Number_of_Cores
141 = IPC * 3.2 * 4
IPC = 11.02

141/3.2 = 44.06 (69% better than the FX-8350)


i7-4960x
155 / 3.6 = 43

i7-4770K
148 / 3.5 = 42.28
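
For anyone who wants to redo the arithmetic, here is a small sketch of the frames-per-clock calculation using exactly the FPS and clock figures quoted above (the model labels are mine and purely illustrative):

[code]
# Frames-per-clock (FPC) as defined above: FPS divided by clock in GHz.
# Core count is deliberately ignored, since partially loaded cores would
# dilute the result. FPS/clock pairs are the DA:I numbers quoted above.
results = {
    "FX-8350 (4.0 GHz)":  (104, 4.0),
    "FX-6xxx (3.9 GHz)":  (97,  3.9),
    "i5-3470 (3.2 GHz)":  (141, 3.2),
    "i7-4960X (3.6 GHz)": (155, 3.6),
    "i7-4770K (3.5 GHz)": (148, 3.5),
}

fx8_fpc = results["FX-8350 (4.0 GHz)"][0] / results["FX-8350 (4.0 GHz)"][1]
for name, (fps, ghz) in results.items():
    fpc = fps / ghz
    print(f"{name}: {fpc:5.2f} FPC ({fpc / fx8_fpc - 1:+.0%} vs FX-8350)")
[/code]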

Honestly, after looking at those results I would call shenanigans because some of them don't make sense. After doing some research, it looks like DA:I had several bugs that would cause erratic performance with Nvidia GPUs and AMD CPUs, though it wasn't consistent and some people had no problems. They released a patch later on, and running it through GeForce Experience seems to alleviate the problem. So I would bet the game is trying to do something with scheduling and getting all sorts of issues. Personally I'd rather use a title with a bit more consistent performance metrics, because I think that one is a bit erratic.
 
^^ I agree; that's why I prefaced it with "A SPECIFIC APPLICATION", since core loading comes into play. Ideally, we'd have core loading numbers and could factor those in (or even better: 100% flatlines across the board, so we could factor out that entire bit).

The problem with benching games in particular is that the GPU tends to suppress the results to such a degree that you really can't make out a difference. DA:I was the only game I could find quickly that had a measurable difference between Intel and AMD.
 
The problem with benching games in particular is that the GPU tends to suppress the results to such a degree that you really can't make out a difference. DA:I was the only game I could find quickly that had a measurable difference between Intel and AMD.

And only when they turned it down to 1280x720 on low with a GTX 980. CPU isn't nearly as important as graphics in gaming, though you do want a minimum of three to four addressable cores just to keep the OS from interfering with you. Once they turned it up to 1920x1080, all the CPUs performed about equally, with only small differences. To be perfectly honest, the 8350 is a niche CPU, useful only to people who actually need eight separately addressable cores with their own dedicated integer units on a budget, which is basically folks doing VMware labs or rendering. Otherwise the FX-63xx offers pretty solid value on a budget, if only there were mini-ITX motherboards for them.
 

Cazalan



In a game like that I'd disable the odd cores of the 8350 to drop the module penalty.
 


Good idea. It would reduce power usage as well.
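
Something like the sketch below would approximate it without a trip to the BIOS. It assumes Linux, and it assumes the common FX numbering where logical CPUs (0,1), (2,3), (4,5), (6,7) share a module; check /proc/cpuinfo, since the mapping can differ. Pinning a process this way won't power-gate the unused cores, so the power savings from a real BIOS disable wouldn't apply.

[code]
# Restrict an already-running process (e.g. the game) to one core per module.
# Usage: python pin_even_cores.py <pid>   -- script name and core map are illustrative.
import os
import sys

EVEN_CORES = {0, 2, 4, 6}   # assumed: one logical CPU from each of the four modules

pid = int(sys.argv[1])                  # PID of the process to restrict
os.sched_setaffinity(pid, EVEN_CORES)   # Linux-only affinity call
print(f"PID {pid} now limited to CPUs {sorted(os.sched_getaffinity(pid))}")
[/code]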
 

noob2222

I said several pages ago that IPC != program-specific performance, yet I was told I was simply "wrong". Here we get an example of how IPC can magically change simply by shutting off 2 cores.

So that means with an FX-4320 vs a 5960X, in the above example, the 4320 will have higher "IPC" than Haswell-E... really...

So again, who thinks IPC = program-specific performance / cores / clock?
Or IPC x clock x cores x programming method = program-specific performance, where IPC is a constant and the only variable is the programming method.
 


That's the effect of CMT. Disable it, and you also remove the ~20% performance hit you incur when using the second core of a module, so disabling every other core would RAISE IPC by avoiding that penalty.

Hence it's really impossible to measure accurately, as Palladin noted above. That's why I always note you can only do the math per application, since various factors and processor loading come into play.

Likewise, FP IPC isn't the same as Integer IPC, which isn't the same as SSE2 IPC, and so on. Then you have cache dynamics, memory dynamics, and so on. In short: IPC isn't some flat number.
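
To put a toy number on that point: if you treat a workload as a mix of instruction classes that each sustain a different rate, the effective IPC works out to a weighted harmonic-style mean, so the same core scores very differently depending on the mix. The per-class rates below are invented purely for illustration, not measurements of any real chip.

[code]
# Toy model: effective IPC for a workload made of several instruction classes.
# total cycles = sum(instructions_i / ipc_i), so the effective IPC is a harmonic-style mean.
def effective_ipc(mix):
    """mix maps class name -> (fraction of instructions, sustained IPC for that class)."""
    cycles_per_instruction = sum(frac / ipc for frac, ipc in mix.values())
    return 1.0 / cycles_per_instruction

# Made-up per-class rates; the same "core" in both cases -- only the mix changes.
integer_heavy = {"integer": (0.8, 2.0), "fp": (0.1, 0.8), "memory": (0.1, 0.5)}
fp_heavy      = {"integer": (0.3, 2.0), "fp": (0.6, 0.8), "memory": (0.1, 0.5)}

print(f"integer-heavy workload: {effective_ipc(integer_heavy):.2f} IPC")
print(f"fp-heavy workload:      {effective_ipc(fp_heavy):.2f} IPC")
[/code]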
 

juanrga



http://en.wikipedia.org/wiki/Instructions_per_cycle

The number of instructions executed per clock is not a constant for a given processor; it depends on how the particular software being run interacts with the processor, and indeed the entire machine, particularly the memory hierarchy.

One thing is the maximum IPC allowed by the architecture; another is the practical IPC obtained for a given workload, which depends on the programmer's/compiler's ability to extract all the potential from the architecture.

As gamerk mentioned, FP IPC isn't the same as integer IPC. Some architectures are optimized for integer (e.g. Piledriver), others for FP (e.g. ACE); some architectures are optimized for memory-intensive workloads (the secondary APUs in AMD's exascale project), others for compute-intensive workloads (e.g. the main APU in AMD's exascale project); some architectures are optimized for latency (e.g. Broadwell Xeon), others for throughput (e.g. KNL Xeon); and there are architectures that are more balanced and run well enough on different workloads but don't shine at anything, etc.

You cannot say that IPC is a constant.
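
That "practical IPC for a given workload" is also something you can measure directly rather than back-calculate from FPS. A minimal sketch, assuming a Linux box with perf installed (the event parsing is simplified, and the benchmark command is whatever you pass in):

[code]
# Measure practical IPC (retired instructions / cycles) for one workload run.
# Usage: python measure_ipc.py <command> [args...]
import subprocess
import sys

cmd = sys.argv[1:]   # the workload to measure, e.g. a benchmark binary
result = subprocess.run(
    ["perf", "stat", "-x", ",", "-e", "instructions,cycles"] + cmd,
    capture_output=True, text=True,
)

counters = {}
for line in result.stderr.splitlines():      # perf stat prints counters to stderr
    fields = line.split(",")
    if len(fields) >= 3 and fields[0].isdigit():
        for name in ("instructions", "cycles"):
            if fields[2].startswith(name):
                counters[name] = int(fields[0])

print(f"practical IPC = {counters['instructions'] / counters['cycles']:.2f}")
[/code]

The same binary will report a different number depending on the input data, the compiler that built it, and everything else discussed in this thread, which is exactly the point.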
 

jdwii



I tested this theory and it only improved single-core performance by around 8%; I'm guessing you would have to do this at the hardware level to get 20% gains.
 

jdwii



Most here know how much I hate the term IPC; I always ask, with what instruction set? I generally favor average performance across many tasks.
 
But for the most part, guys, you can guesstimate and get a good idea. Intel cores are much faster per clock than AMD ones; we don't need a scientific dissertation to tell us that, and we see the effects. It's just that, on an application-to-application basis, we can't be sure how big that IPC gap is or how much it affects performance, that's all.
 

juanrga



I believe I have an explanation for that.

If two threads are independent, then scheduling them on cores in different modules eliminates the CMT penalty (~20%), but if the threads are data-dependent, then scheduling them on different modules adds a performance penalty from the cycles wasted copying data from the L2 cache of one module to the L2 cache of the other. Thus disabling cores and scheduling threads onto different modules doesn't always increase performance by ~20%.

FX-Scheduling.jpg
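
If anyone wants to poke at this themselves, here is a rough sketch of a same-module vs cross-module test: two processes bounce a buffer back and forth while pinned to chosen cores. It assumes Linux and the usual FX numbering (CPUs 0 and 1 sharing module 0, CPU 2 on module 1), and the pipe round-trip is only a crude proxy for the cache-to-cache traffic described above, so treat the output as directional rather than precise.

[code]
# Ping-pong two data-dependent processes pinned to chosen logical CPUs.
import os
import time
from multiprocessing import Pipe, Process

def bouncer(cpu, conn, rounds):
    os.sched_setaffinity(0, {cpu})      # pin this process to one logical CPU
    for _ in range(rounds):
        conn.send(conn.recv())          # receive a chunk, send it straight back
    conn.close()

def ping_pong(cpu_a, cpu_b, rounds=100_000):
    parent_end, child_end = Pipe()
    partner = Process(target=bouncer, args=(cpu_b, child_end, rounds))
    partner.start()
    os.sched_setaffinity(0, {cpu_a})    # pin the main process too
    payload = b"x" * 4096
    start = time.perf_counter()
    for _ in range(rounds):
        parent_end.send(payload)
        parent_end.recv()
    elapsed = time.perf_counter() - start
    partner.join()
    return elapsed

if __name__ == "__main__":
    print(f"same module : {ping_pong(0, 1):.2f} s")   # cores sharing a module/L2
    print(f"cross module: {ping_pong(0, 2):.2f} s")   # cores on different modules
[/code]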
 


You also have to consider Turbo, which kicks in more aggressively when using only one module versus two.

So yeah, it gets complicated, which all goes back to IPC not being a flat number.
 

blackkstar



You have to account for turbo in all the models. A review with a stock Intel chip leaves turbo enabled, and you simply don't know what clockspeed the chip is running at for the duration of the benchmark.

If the 4770K is 3.5GHz base and 3.9GHz turbo, and it is running on an open test bench with great ventilation and a large aftermarket cooler, who knows if it ever drops below 3.9GHz? That's an 11.4% variance in the potential frequency the chip is running at.
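
That variance feeds straight into any per-clock number you compute. As a quick illustration with the 4770K figure quoted earlier in the thread (assuming, hypothetically, that the chip held full turbo for the whole run):

[code]
# How much the per-clock figure is inflated if you divide by base clock
# while the chip actually ran at turbo. FPS is the DA:I number quoted above.
fps = 148
base_ghz, turbo_ghz = 3.5, 3.9

assumed = fps / base_ghz   # what you'd compute assuming the base clock
actual = fps / turbo_ghz   # what it really is if the chip held full turbo

print(f"assumed: {assumed:.2f} FPC, actual: {actual:.2f} FPC")
print(f"overestimate: {assumed / actual - 1:.1%}")   # ~11.4%
[/code]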

If you want to calculate IPC, you need to find test data that has turbo disabled completely and the chip locked to a specific frequency.

Here's a good sample. 4ghz locked.
http://www.hardocp.com/article/2012/10/22/amd_fx8350_piledriver_processor_ipc_overclocking/3

You can clearly see that x87 instructions per clock in SuperPi are absolutely abysmal for Piledriver. In wPrime, though, Piledriver is only 30% behind in IPC, assuming 4 Intel cores = 8 AMD cores. And as has been noted, core scaling is vastly different between the two, so it's difficult to tell. I don't think it's possible to extrapolate single-core performance.

Then again, all of this relies on the assumption that both CPUs are being fed the same instructions. There are too many variables that most reviews don't account for. And even if your math is right, if you're using the wrong data, you will get bad results.

If someone asks you what 2 + 2 is, and you tell them 7 + 8 is 15, you did your math right, you just used wrong data. That's what happens way too often with all these IPC calculations. I simply don't think it's a feasible measurement of performance.

You basically need the source code compiled in a fair way by a fair compiler that will feed both CPUs the same exact instructions with that program using every library compiled in a fair way as well.

I think that some of you are chasing something that's simply not worth the effort because there are way too many ways to have inconsistencies.

Are the CPUs turboing? Are they running the same instructions? Is one being gimped by a compiler? Do they have OS performance patches or changes?

Here's the thing. If someone did do something like compile Gentoo from source completely with the most generic, comparable cflags between all test chips, no one would care. They'd ask you why you're benching an OS they won't use. And if you run Gentoo, you'll optimize as much as possible.

Which is why I don't think IPC is worth anything. The only benchmarks that should matter are stock with turbo enabled and then benchmarks where you compare chips with average overclocks to each other. It's just a dumb marketing metric that has so many flaws that I doubt any serious CPU engineer would take it seriously.
 
^^ But again, that only matters if you're trying to find some magical absolute number, which doesn't exist for all the reasons you describe. That's why I ALWAYS calculate IPC on a per-application basis, then try to figure out why the numbers are what they are. I was doing this back when BD hit, and I was the first person to speculate that the module penalty was holding the chip back, months before MSFT released a patch to "fix" the issue (though I still think AMD should have just set the HTT CPUID flag to fix the problem on their end, but that's a different debate).

Point being, you can use IPC comparisons to find problems. That's why we do them.
 

con635


The only leaks were the fake 3DMark one and the most likely fake Chiphell leaks. Going optimistically on the rumored specs, I'd say it will be a little over 50% faster than the 290X. Could be a good battle with the Titan X if so.
 

juanrga



The basic performance equation of computer science is

Performance = IPC * Frequency.

Every CPU engineer knows this equation. It is routinely used to catalog different kinds of architectures: scalar vs superscalar, speed demon vs brainiac...

Not only do engineers work with the above equation, but they also work with models that describe the IPC of a given architecture (real or virtual) as a function of several parameters. By changing those parameters, engineers can improve a given architecture. In fact, most of the research in CPU architecture over the last decades went toward developing new ways of improving IPC, and the research continues today. A recent example is VISC:

http://techreport.com/news/27259/cpu-startup-claims-to-achieve-3x-ipc-gains-with-visc-architecture

As several of us stated before, IPC is not some kind of constant of the universe. But all the variability that affects IPC values also affects performance: What performance was measured? Instantaneous performance over an atomic block? Average performance for a given application? Average performance over a range of applications? Was performance measured with turbo enabled or disabled? What OS? What compiler flags? Was the computer case open or closed?

Using your own arguments, we would conclude that we cannot discuss the performance of computers because there are many variables involved; we would conclude that performance is "just a dumb marketing metric that has so many flaws". Both are incorrect conclusions, because your arguments were flawed.
 