AMD CPU speculation... and expert conjecture

bobbybamf12 · Apr 15, 2013

sarinaide :

lol that is true! and regardless that's still a beast system! Just never realized you could crossfire on an FM2 board.

JAYDEEJOHN · Apr 15, 2013

The growth in speed increases is outpacing Intel's.
It's been shown here that AMD's "IPC", if you will, isn't as bad as some thought, and compared to BD, PD showed huge gains.
SR hints at another set of nice gains, again outdistancing Intel's growth. And let's just look at the first APU iterations, where, with no clock speed increases, no tweaks, and using the old VLIW approach, it was miles ahead and being held back by BW, where the competition wasn't even fast enough to take advantage of the BW it had.
Upping it to GDDR5, with its BW, using GCN 1.1 if you will, improving clocks and power, then improving power management, where AMD has been further behind Intel, we will again start seeing faster growth here as well, allowing for a more sustainable turbo, which Intel already enjoys. Unless these chips are locked in at a given frequency, this advantage has so far been huge for Intel, whose growth is smaller: HSW adds a better gfx solution and in-house VRMs, but the TDP is up under high usage, and the "IPC", if you will, is very flat.
So, AMDs gfx solutions, huge upside.
CPU, nice upside.
Power management, huge upside there as well. We won't see the full growth of the BD design until EX, where perf is supposed to skyrocket, but I can't see why people won't be picking up these chips, and future ones as well, which is a lot more than could be said before RR, where they let BD out before it was truly ready. That's why we see the current downtrend, but those chips aren't PDs or SRs, and these are right around the corner, in varying iterations.

8350rocks · Apr 15, 2013

gamerk316 :

Pixels per second is not tied to IPC at all...IPC, without having access to the base code of an application, is a Unicorn, in that you and I cannot feasibly determine this number.

IPC has engineered maximums built into the architecture of the CPU. The variance between programs stems from applications operating below the engineered maximums to varying degrees.

There are some circumstances where a program could theoretically reach the engineered maximum IPC, though this would require an extremely simple program coded beyond humanly well, where all the single instructions processed eliminated larger groups of instructions as they were processed perfectly. I am not sure that any program currently coded today can achieve this utilization of architecture. This would be..."the perfect program". Nothing is perfect...and as I said, while it may be theoretically feasible...it's not at all likely.

http://www.cse.wustl.edu/~jain/cse567-06/ftp/processor_workloads/

That link shows CPU workloads...notice pixel rendering is not among the listed items?

That measures system capability to run said program at X speed. It gives you a metric to compare, but you can reasonably conclude nothing from it other than X CPU runs that program better than Y CPU based on the system's performance that it is installed in

http://en.wikipedia.org/wiki/Instructions_per_cycle

That talks specifically about IPC...note...they talk about values of IPC relatively, without specifics, but you can see that they clearly can say that newer architecture is a higher clock speed with lower IPC. Meaning while the AMD Athlon could theoretically process more instructions per cycle than the FX8350, the FX8350 is technically faster because 32 IPC @ 4 billion cycles per second is more than 36 IPC @ 1 billion cycles per second. In 1 second the FX8350 can compute an engineered maximum of 128 billion instructions based on architecture, while Intel's i7-3770k can theoretically process a maximum of 122.4 billion instructions per second (both figured at stock clocks). This shows 2 things...1.) Intel does not have an advantage comparing CPUs on a "raw hardware power" standpoint. 2.) AMD should actually pull ahead if they can get the wrinkles ironed out of their architecture by shortening pipelines, adding a second decoder on the front end, and decreasing time lost by executing instructions that were miscalculated.
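
A minimal sketch of the arithmetic in this post, assuming peak throughput is simply IPC multiplied by clock speed. The IPC values (32 and 36) are the figures claimed above, not measurements, and the function name is made up for illustration.

```python
# Peak instruction throughput = instructions per cycle * cycles per second.
# ASSUMPTION: the IPC inputs are the figures claimed in the post, not measured values.

def peak_giga_instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Theoretical ceiling, in billions of instructions per second."""
    return ipc * clock_ghz

fx8350_peak = peak_giga_instructions_per_second(ipc=32, clock_ghz=4.0)
low_clock_peak = peak_giga_instructions_per_second(ipc=36, clock_ghz=1.0)

print(f"32 IPC @ 4 GHz: {fx8350_peak:.1f} billion instructions/s")
print(f"36 IPC @ 1 GHz: {low_clock_peak:.1f} billion instructions/s")
```

This is only a ceiling under the stated assumption; it says nothing about how close real code gets to it.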

See, a CPU will make "calculated guesses" as to what needs to come next on the instruction path...if it guesses wrong...it flushes its pipeline and starts over. This is what determines efficiency of the CPU. If people code more specifically for your hardware or architecture design...you gain an advantage in efficiency in your CPU...making it appear faster when it's really not...it's more efficient.

This is why AMD architecture in consoles is such an enormous victory for AMD, that means developers will be coding specifically for their hardware. Which means not only is their design going to become more efficient, but the coding for it will become more efficient to maximize efficiency of instructions for the games being designed.

So, you're chasing a unicorn trying to prove Intel IPC is significantly higher than AMD. I gave you the engineered maximums...that is what it is. You cannot change it. Right now, Intel is likely 20% more efficient in general, especially in single threaded tasks. But when Steamroller comes with 44 IPC @ 4.5 billion cycles per second, with better optimization and better software utilization of architecture, you will see a shift in the balance of things.

I hope you're picking up what I am putting down...you're talking about system optimization and claiming it has anything to do with IPC. It does not.

lilcinw · Apr 15, 2013

gamerk316 :

8350rocks :

IPC is NOT a flat value, and varies by workload. Throw in the shared backend, and you can easily get pipeline stalls (which is blamed as one of the reasons for BD's poor single threaded performance). This is especially notable in FP workloads, as the FP scheduler is shared between the two cores of a module, unlike the integer ones.

So anyone claiming to have absolute numbers on IPC is kidding themselves. You can only solve IPC PER APPLICATION.

I'll use this as an example of the math:

As it's single threaded, I can safely disregard any pipeline stalls. I'm going to do the math assuming no turbo for simplicity's sake though, so these numbers won't be perfect...

I'll compare the 4300 and 2500k.

AMD FX-4300: 3.8GHz
Intel i5 2500k: 3.3GHz

Solving for IPC:

Time = NumberCores * Clockspeed * IPC

IPC = Time / (NumberCores * Clockspeed)

For AMD (remember: Single Threaded):

IPC = 236.3 / (1 * 3.8)
IPC = 236.3 / 3.8
IPC = 62

For Intel (2500k):

IPC = 274.9 / (1 * 3.3)
IPC = 274.9 / 3.3
IPC = 83

And just for kicks, the 3570k:

IPC = 302.2 / (1 * 3.4)
IPC = 302.2 / 3.4
IPC = 88
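
The single threaded division above can be sketched in Python. The function name is hypothetical, and the result is really benchmark score per core per GHz rather than instructions per cycle, so treat it as a relative figure only.

```python
# Normalise a benchmark score by core count and clock speed.
# ASSUMPTION: base clocks, no turbo, exactly as in the worked numbers above.

def score_per_core_ghz(score: float, cores: int, clock_ghz: float) -> float:
    """Benchmark score divided by (cores * GHz); a proxy, not true IPC."""
    return score / (cores * clock_ghz)

# (single threaded score, base clock in GHz) as quoted in the post
single_threaded = {
    "FX-4300": (236.3, 3.8),
    "i5-2500k": (274.9, 3.3),
    "i5-3570k": (302.2, 3.4),
}

per_clock = {
    name: score_per_core_ghz(score, 1, ghz)
    for name, (score, ghz) in single_threaded.items()
}

for name, value in per_clock.items():
    print(f"{name}: {value:.1f}")
```

The unrounded results land within a point of the whole numbers quoted in the post.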

Now lets solve multithreaded:

FX-4300:

IPC = 983.6 / (6 * 3.8)
IPC = 983.6 / 22.8
IPC = 43

Note how IPC decreased? That's likely either a failure to fully utilize all the cores, or the shared backend robbing performance.

2500k:

IPC = 1012.8 / (4 * 3.3)
IPC = 1012.8 / 13.2
IPC = 77

Hence why the i5 at a lower clock still beat the 4300: Superior IPC. But note how IPC only dropped by 6, compared to the drop of 19 for AMD: This indicates that the i5 actually scales better than BD/PD (again: could be pipeline stall, lack of core loading, and other factors).

3570k:

IPC = 1108.9 / (4 * 3.4)
IPC = 1108.9 / 13.6
IPC = 81.5

Same story here: IPC drops slightly (7.5), but less than half as much as AMD. Further indication there's a problem somewhere as workload starts to scale (not a good sign for an arch designed to scale).

As far as this PARTICULAR application goes: Intel has far superior IPC, and scales better in the multithreaded bench, even if it loses in pure performance.

Further, you can break down IPC per core, which nets this:
AMD FX-4300: 7
Intel i5 2500k: 19.25
Intel i5 3570k: 20.375

Hence why AMD only wins when clocked higher and when all the cores are used: Its per core performance is less than half as much. (Do remember that shared backend though).

So yeah, your IPC numbers are kinda worthless in hindsight. You can only solve PER APPLICATION, as loading factors will vary with different apps.

You did a bit of bait and switch (probably unintentional) with your numbers. For the multi-thread you used the 6300 instead of the 4300.

It should be IPC = 727.9 / (4 * 3.8)
IPC = 727.9 / 15.2
IPC = 47.9

Per core is 12 instead of 7.

Still not a rosy picture but not quite as bad as your original.
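
A rough sketch of the correction, reusing the same score / (cores * clock) formula from the quoted post. The helper name is made up, and the "per core" step divides by the core count a second time only because that is how both posts compute it.

```python
# Multithreaded figure for the FX-4300, before and after the correction.
# ASSUMPTION: base clock of 3.8 GHz and no turbo, as in the quoted math.

def score_per_core_ghz(score: float, cores: int, clock_ghz: float) -> float:
    """Benchmark score divided by (cores * GHz); a proxy, not true IPC."""
    return score / (cores * clock_ghz)

as_posted = score_per_core_ghz(983.6, 6, 3.8)   # FX-6300 score and core count used by mistake
corrected = score_per_core_ghz(727.9, 4, 3.8)   # FX-4300 score with its 4 cores
corrected_per_core = corrected / 4              # second division, mirroring the posts

print(f"as posted: {as_posted:.1f}")
print(f"corrected: {corrected:.1f}, per core: {corrected_per_core:.1f}")
```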

8350rocks · Apr 15, 2013

1 pixel rendered != 1 instruction...

http://blog.hvidtfeldts.net/index.php/2011/02/gpu-versus-cpu-for-pixel-graphics/

Look at how IPC is calculated...the IPC of the CPU in his example is 16...it's an intel...there is no such IPC as something in the 50's even, much less in the 70's...unless we're talking something on the order of high end server CPUs, though we're not.

You're not anywhere near the ballpark of IPC with pixels per second...If the planet Earth was IPC...you'd be in the andromeda galaxy, about 2 mil light years away.

EDIT: To give you an idea of theoretical performance...

These are maximum theoretical single precision GFLOPS capability for both the i7-3770k and FX 8350:

i7-3770k: 112
FX8350: 256

Maximum theoretical double precision GFLOPS:

i7-3770k: 28
FX8350: 64

Now maybe you can grasp what I am talking about since we are comparing "in intel terms".

As you can see, the raw muscle available is basically double in favor of the FX8350, even including hyperthreading.
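
For what it's worth, the totals above are consistent with the usual peak formula of cores * clock * FLOPs per cycle. The sketch below is an assumption-heavy reconstruction: the per-cycle widths (8 single precision, 2 double precision per core) are simply back-solved from the posted totals and may not match vendor specifications.

```python
# Peak GFLOPS = cores * clock (GHz) * floating point operations per cycle per core.
# ASSUMPTION: flops_per_cycle values are back-solved from the totals in the post
# (8 single precision, 2 double precision); they are not vendor figures.

def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    return cores * clock_ghz * flops_per_cycle

chips = {
    "i7-3770k": {"cores": 4, "clock_ghz": 3.5},
    "FX8350": {"cores": 8, "clock_ghz": 4.0},
}

single_precision = {n: peak_gflops(c["cores"], c["clock_ghz"], 8) for n, c in chips.items()}
double_precision = {n: peak_gflops(c["cores"], c["clock_ghz"], 2) for n, c in chips.items()}

print(single_precision)
print(double_precision)
```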

griptwister · Apr 15, 2013

@8350rocks: +1 for the link.

Correct me if I am wrong... But isn't Kaveri supposed to use DDR4 and a new FM3 Socket?

8350rocks · Apr 15, 2013

griptwister :

I have heard rumors about DDR4 and FM3 socket may be coming with Kaveri, no confirmation yet though.

JAYDEEJOHN · Apr 15, 2013

It would be nice to see Kaveri compete with an 8150.
Everyone wanted the old chips to undergo shrinks etc, but they aren't as scalable as Kaveri will be, and don't have the power/perf Kaveri will have etc.

$hawn · Apr 15, 2013

Damn!! A lot can get posted in 24 hours 😀

$hawn · Apr 15, 2013

noob2222 :

You are absolutely correct to state that I shouldn't have looked at a single benchmark to gauge IPC. Taking the average IPC of a few single threaded tests would have been more fair.

Where your logic fails however, is in the multi threaded benchmarks you have posted. You don't judge the IPC of a single core in any application based multi threaded benchmarks, especially when considering that the 2 architectures handle multiple threads differently ( HT vs the module concept.)

Again, for your single threaded cine bench test, considering turbo speeds,

Intel Core @3.8GHz => 312.4
AMD core @3.8GHz => 252x(3.8/4.2) = 228

IPC ratio in cinebench => 312.4/228 = 1.37.

In other words, here ALSO, at equal clock speeds, Intel is 37% faster, not a mere 15-20%.

AAC encoding might have been a worst case AMD scenario or something, that was my bad. But if you use the same logic on most other single threaded stuff, you'll find that intel is usually more than 30% faster at least.

Correct me if I'm wrong.

PS:- I intentionally left out turbo core from the previous calculations, because we were comparing the multi-threaded performance of a Centurion vs an SB-E, and all core turbos are usually not that dramatic. It helps to simplify guestimations by quite a bit. However, in real life turbo scenarios, Intel will pull even more ahead, as the SB-E turbos up by 18%, while the centurion can manage only 10% if the 5.5Ghz max turbo story is true.
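
The equal-clock comparison in this post can be sketched as follows. It assumes the score scales linearly with clock speed, which is the post's own simplification, and the function name is hypothetical.

```python
# Scale a score measured at one clock to a common clock, then compare.
# ASSUMPTION: performance scales linearly with frequency.

def normalize_to_clock(score: float, measured_ghz: float, target_ghz: float) -> float:
    return score * (target_ghz / measured_ghz)

intel_at_3_8 = 312.4                                   # already taken at 3.8 GHz in the post
amd_at_3_8 = normalize_to_clock(252.0, measured_ghz=4.2, target_ghz=3.8)
per_clock_ratio = intel_at_3_8 / amd_at_3_8

print(f"AMD scaled to 3.8 GHz: {amd_at_3_8:.1f}")
print(f"Intel / AMD at equal clocks: {per_clock_ratio:.2f}")
```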

$hawn · Apr 15, 2013

sarinaide :

It's pretty obvious that you're an AMD fanboy, but no worries, as I am actually a bit biased towards AMD myself. But to blindly close your eyes and not see the truth is wrong.

Your 15-20% faster is more like the lower limit, not the upper one!! I've just shown that with 2 calculations, in which Intel is 37% and 50% faster. It's a reproducible calculation with most single threaded tests, and you can calculate them yourself when you have time.

There's nothing wrong with having subpar single core performance, especially when you can more than make up for it in multi threaded workloads like AMD does.
For example, encoding a song is only a matter of seconds, and is irrelevant, but when re-encoding videos with x264, AMD usually provides much more performance per $, and that I like. The i3 vs FX 6300 is a shining example of this.

JAYDEEJOHN · Apr 16, 2013

But by today's demands, and certainly for tomorrow onwards, single core becomes less relevant.
You could say that having beefed-up single core perf is the wrong direction, if your pricing model gets beaten in ever-growing MT apps.
Now, going to more cores of course you would see an even greater gap, and wins by Intel where it loses, but then again, you have the price of doing so.

I guess what we need is the old car analogy, where everyone would love to have a McLaren, or at least its engine in their cars, but when a Porsche is so close.....

So again, it could be argued, Intel isn't offering better to most, but only a few; it all depends on how you look at it.

8350rocks · Apr 16, 2013

$hawn :

The issues lie in your "IPC" calculations...you cannot deduce IPC from anything anyone has posted...it's not feasibly possible. The limits for IPC are hard wired into the architecture because they are hardware limited. None of the numbers are anywhere near what some of you are talking about either. IPC over 40 is a bit crazy...yes, it is dictated by program's coding, but they never exceed the engineered maximum...that's why overclocking has become so popular...you can't increase IPC, but you can increase the clock cycles per second.

IPC has little relevance from software...the term is thrown about far too often. IPC is an architecture term from engineering as the maximum theoretical capabilities of the hardware itself. Somehow or another, it got attached to software, and it's actually nothing to do with software.

Coding efficiency and optimization of architecture are the advantages Intel has over AMD...those are the only advantages...

If you look at them on paper...the AMD is easily a LOT more CPU...

The issue is the hardware is not being advancely optimized by the coders...once that occurs...look out.

mayankleoboy1 · Apr 16, 2013

The issue is the hardware is not being advancely optimized by the coders...IF that occurs...look out.

Corrected it for you.

palladin9479 · Apr 16, 2013

mayankleoboy1 :

No he's correct, it's an eventuality not a probability. Commodity software is just now getting to the point where it can do multiple threads worth of work simultaneously.

The overuse and incorrect use of the term "IPC" is something I've been warning people about for ages. "Per-clock" means absolutely nothing, it's performance vs cost vs energy usage.

sarinaide · Apr 16, 2013

bobbybamf12 :

We have been saying this for a while: AMD's arch may not have been perfect, yet somehow it isn't as bad as made out to be, bar a few exceptions in synthetics which are latched onto like a pitbull on a chew toy. So if I have taken the minimum and you have stretched the maximum, it would be somewhat ironic if it's the in-between, i.e. 25%, but such is life.

I just said that SR just has to stay relatively close to Intel, i.e. within 10%, to provide a bit of a problem to Intel, and since improvements are carried out across the line, that will mean Kaveri will have similar per core performance, which may very well make it the arch that is the salvation of the desktop market. Competition is good and AMD innovation is exceptional; driven by Keller, anything is possible. But just for the record, I haven't said that AMD will be better, nor do I expect it; Intel's process is too advanced.

[edit] The quote was Shawn's not Bobby's dunno what happened there 😀

mayankleoboy1 :

I don't expect SR to beat Intel in flat per core performance, just "touch wood" close the gap to within acceptable levels. That will make 2014 a very exciting year.

mayankleoboy1 · Apr 16, 2013

If stock AMD gets within 10% of stock Intel, AMD gets my money.

griptwister · Apr 16, 2013

mayankleoboy1 :

My question to you, "Which Intel CPU are you talking about?" Because I'm more than positive AMD is going to put the hurt on the i3s, heck, maybe even i5s if their 6 core Kaveri is decently clocked.

BuddiLuva · Apr 16, 2013

bobbybamf12 :

de5_Roy · Apr 16, 2013

we've been 'getting there' ever since amd hex core cpus came out. when the heck will we actually 'get there'? it's 2013 and majority of thuban owners either switched to intel or changed to pd cpus, rendering thuban purchase decision at that time (for multicore 'revolution') pointless. at that time, what those guys got were low clocked power hungry cpus that didn't scale with workload and needed to be overclocked. amd hypes multicore, multithreading a lot but barely supports it in software as well - wouldn't have happened if programmers wanted to make multicore-friendly software or had any say in hardware/sdk development. for example, does anyone here know how codexl coming along? where is amd's own compiler? the continuous negligence from red, green and blue (in seeking instruction support/promoting, for mainstream ) resulted in current skeptical and cynical mindset - something amd needs to change. multicore/multithreading is badly needed now more than ever.

Rumored details of embedded Kabini APUs
http://www.cpu-world.com/news_2013/2013041501_Rumored_details_of_embedded_Kabini_APUs.html

JAYDEEJOHN · Apr 16, 2013

The way I see it is, as nodes become more expensive, less effective as well in some cases, thus becoming even more exotic (coatings etc), thus more expensive yet to retain perf/power, you will start seeing more and more SW doing MT and the like, better use of extensions etc.
Until recently, and still, cpus have been brute forcing it, especially Intel, which is why we see their smp doing so well.

This is sort of like extra money in the bank, brute force can do.
As SW becomes more demanding, it will either slow down, or become more innovative.
Since AMD is playing catch up, and doesnt enjoy nearly so much of this perf/power advantage, the ongoing tweaks will matter more to them, as still some low hanging fruit is there for them, allowing for some brute forcing of their own, as they have solutions for MT, where they compete quite well.
Going forwards, using bigger chips for DT, which is in decline, means business models have to change, where depending on ROI, it can and will have an effect.
Will we see a future where good enough has driven the DT market into SFF with connects, and DT as we know it today becomes expensive super computers, where discrete gfx and huge cpus still rule, but at costs much higher than we see now, only to sustain a smaller market, within which each node becomes more elaborate and expensive, with APUs smothering half of the current family of discrete etc?

I think it's on its way.
The current players and their respective business models will play into all this, as theyre mostly publicly held, and so must maintain certain levels of ROI.
Currently, if this is to be whats going to happen, look at current pricing in the cpu and gfx markets, and where each brand chooses to set their product price.
How is this looking going forwards, all things being somewhat equal?

gamerk316 · Apr 16, 2013

lilcinw :

Wow, did not see that. Good catch. That's why you show work 😀

Basic argument still holds though.

gamerk316 · Apr 16, 2013

8350rocks :

(I REALLY need to stop doing major posts when I'm exiting out the door, because I always screw something up)

Technically, correct. There is one value missing from the equation I used: The number of instructions the application executed. As a result, the IPC values used are higher than what is feasibly possible.

HOWEVER:

Assuming the code paths for the CPUs are more or less the same (which basically means limited use of CPU opcodes that are Intel/AMD only, and a non-idiotic compiler), the Number of Instructions should be more or less the same between Intel and AMD (within 5% or so), so while you can't measure IPC directly, you CAN get an estimate at the relative performance difference between them, since the Number of Instructions can be estimated as just a scalar value to the rest of the equation.

For the POV_Ray multithreaded benchmark, the FX-4300 (47.9) and Intel i5-2500k (77) numbers would need to be scaled by the Number of Instructions executed to get the real IPC value. Assuming this is more or less the same for both CPUs (which may or may not be a valid assumption), you can measure the relative performance difference in IPC, which nets you:

100 - (47.9 / 77 * 100) = ~38% IPC advantage in favor of Intel for this particular benchmark.

And yes, again, this is why I need to stop posting when I'm walking out the door at work.
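
A small sketch of the relative comparison above, under the same assumption that both CPUs execute roughly the same number of instructions. Note the percentage depends on which chip is the baseline: the formula in the post measures how far AMD sits below Intel, which is not the same number as how far Intel sits above AMD. Variable names are illustrative.

```python
# Relative per-clock difference from the two normalised scores in the post.
# ASSUMPTION: instruction counts are equal, so the unknown scalar cancels out.

amd_per_clock = 47.9     # FX-4300, corrected multithreaded figure
intel_per_clock = 77.0   # i5-2500k

# Formula used in the post: AMD's shortfall with Intel as the baseline.
amd_deficit_pct = 100 - (amd_per_clock / intel_per_clock * 100)

# Same data with AMD as the baseline: Intel's lead.
intel_lead_pct = (intel_per_clock / amd_per_clock - 1) * 100

print(f"AMD is {amd_deficit_pct:.1f}% below Intel")
print(f"Intel is {intel_lead_pct:.1f}% above AMD")
```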

noob2222 · Apr 16, 2013

$hawn :

nice try but crude math doesn't pay off. Look at the 2nd part that you missed. Intel cpus will never run at their rated speed unless you put them in the oven to increase the temperature to the point that it throttles. you also can't assume "max turbo" more on that later, but lets examine "turbo values" just for the sake of ignoring the actual value

Look at amd's numbers.

238 for the 4300, 3.8ghz base, 4.0 ghz turbo
242 for the 8320, 3.5 base, 4.0 turbo ...
252.1 for the 8350, 4.0 base. 4.2 turbo

so at turbo speeds, 238/4.0=59.8 per ghz for the 4300, and 240.7/4.0 = 60.7 for the 8320, and 252.1/4.2 = 60.2

That would mean that the 8320 is the best AMD cpu because its IPC is higher ... This is why you can't assume turbo speeds = actual speed.

now lets look at Intel side,
312.4 for the 3770k 3.9 turbo = 80.1
302.2 for the 3570k 3.8 turbo = 79.4
288 for the 3470 3.6 turbo = 80
266.7 for the 3220 no turbo 3.3ghz = 80.8

the ratio of 80 to 60 is 1.33 (33%), not 37%.
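
The per-GHz figures can be reproduced with a quick sketch like the one below, assuming every chip sits at its maximum turbo clock for the whole run (which this post itself argues is not a safe assumption). The 8320 is left out because the post gives two different scores for it, and the computed values may differ from the quoted ones by a few tenths.

```python
# Single threaded score per GHz at the quoted turbo clocks.
# ASSUMPTION: each chip holds the listed clock for the entire benchmark.

scores = {
    # name: (single threaded score, assumed clock in GHz)
    "FX-4300": (238.0, 4.0),
    "FX-8350": (252.1, 4.2),
    "i7-3770k": (312.4, 3.9),
    "i5-3570k": (302.2, 3.8),
    "i5-3470": (288.0, 3.6),
    "i3-3220": (266.7, 3.3),   # no turbo
}

per_ghz = {name: score / ghz for name, (score, ghz) in scores.items()}
for name, value in per_ghz.items():
    print(f"{name}: {value:.1f} per GHz")

# Using the rounded group averages from the post (80 for Intel, 60 for AMD):
intel_over_amd = 80 / 60
print(f"ratio: {intel_over_amd:.2f}")
```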

lets look back a page ". In other words, SB is 1/0.66 => 1.51x PD IPC, ie, its 51% faster."
Lets check that statement since you changed the comparison as IVY didn't gain much in Itunes.

274.9 for the 2500k at 3.7 turbo =74

so now you lost that argument even deeper (74/60 = 1.23, in other words 23% not 51%; you just lost 55% of your lead). Instead compare it to Ivy Bridge, since it's faster than SB. That's assuming the cpus run at their turbo speeds.

the part you misinterpreted about the multi-threaded test is that, in theory, if said program scales 100%, then IPC for said program = single core IPC X speed X cores. Scaling can then be calculated by comparing that figure to RL results.

3570k of 80 (ipc per ghz) * 3.4ghz * 4 = 1088. Actual = 1108.9, scaling of 102%. How can you scale over 100%? remember I said more on that subject, here it is. Intel cpus never run at their "rated speed". It's all a marketing gimmick to give you the assumption that the cpu is faster than it actually is if you assume the cpu is running at its "rated speed". Without monitoring the cpu speed throughout the test, determining core scaling accurately is near impossible; same problem with trying to calculate "IPC" (or in this case IPG(hz)).

but lets see what happens anyway using the flawed values.

lets see how well HT scales in comparison
3770k (80 * 3.5 ghz * 4 = 1120, actual 1363.6) = 120% scaling, minus the 102% that the 3570k showed = 18% ht boost.

what was that quote again? "Assuming HT brings in a 30% boost on average,"

don't think i have ever seen ht boost even close to 30%.

Lets see how the 8350 scales
60*4.0*8 = 1920 actual = 1504.4 = 78% (pretty much right where amd stated modules = 80% of a dual core)
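
The scaling estimates above boil down to actual score divided by (per-GHz figure * clock * cores). A hedged sketch, with a made-up function name and the same flawed assumption that the chips run at their rated base clock under full load:

```python
# Multithreaded scaling = actual score / ideal score at 100% scaling.
# ASSUMPTION: rated base clocks under load, and the rounded per-GHz
# figures from the post (80 for Intel, 60 for AMD).

def scaling(actual: float, per_ghz: float, clock_ghz: float, cores: int) -> float:
    ideal = per_ghz * clock_ghz * cores
    return actual / ideal

i5_3570k = scaling(1108.9, per_ghz=80, clock_ghz=3.4, cores=4)
i7_3770k = scaling(1363.6, per_ghz=80, clock_ghz=3.5, cores=4)   # 4 cores + HT
fx_8350 = scaling(1504.4, per_ghz=60, clock_ghz=4.0, cores=8)

for name, value in (("3570k", i5_3570k), ("3770k", i7_3770k), ("8350", fx_8350)):
    print(f"{name}: {value:.1%}")
```

Worked through without intermediate rounding, the 3770k lands a little above the rounded 120% quoted in the post, so the implied HT gain moves by a couple of points, still well short of 30%.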

so lets examine again what I said.

"After all, if its the 50% that you claim, this should never happen:"

So assuming what you claimed as the truth, 8350 actual numbers should have been (3770k ipc of 80) * 66% (1/0.66, remember, but we are going in reverse to calculate AMD) * 4.0 ghz * 8 cores * 78% scaling for amd's architecture = 1317.

Or to sb: 2500k = 74 * 66% * 4.0ghz * 8 cores * 78% = 1219

So if SB's IPC really were 50% higher than amd's, the maximum 8350 score should have been 1219, staying well behind the 3770k. (IE: it should never happen)

Clearly not the case, so clearly Intel's SB is not 50% ahead of AMD per clock, and neither is IB.
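
The "should never happen" check can be sketched as a prediction: take the Intel per-GHz figure, apply the claimed IPC deficit, and scale up to the 8350's clock, cores and measured scaling. The function name is hypothetical; all inputs are the rounded figures used in this post.

```python
# Predicted FX-8350 multithreaded score if Intel really had a 1.51x per-clock lead.
# ASSUMPTION: 0.66 relative IPC (1 / 1.51), 4.0 GHz, 8 cores, 78% scaling.

def predicted_score(intel_per_ghz: float, relative_ipc: float,
                    clock_ghz: float, cores: int, scaling: float) -> float:
    return intel_per_ghz * relative_ipc * clock_ghz * cores * scaling

from_ivy_bridge = predicted_score(80, 0.66, 4.0, 8, 0.78)
from_sandy_bridge = predicted_score(74, 0.66, 4.0, 8, 0.78)
actual_8350 = 1504.4

print(f"predicted from IB figure: {from_ivy_bridge:.0f}")
print(f"predicted from SB figure: {from_sandy_bridge:.0f}")
print(f"actual: {actual_8350}")
```

The measured score sits well above both predictions, which is the point being made.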

It should be IPC = 727.9 / (4 * 3.8)
IPC = 727.9 / 15.2
IPC = 47.9

this assumes 100% core scaling, can't calculate this direction on a multithreaded test as already shown, especially when HT or CMT are involved.

In other words, all these futile attempts to calculate "IPC" are all flawed. all we can do is take the values for what they are for any given benchmark. IPC is impossible to get any accuracy and carry it from one program to the next. sure, you could guesstimate to some margin of error, but does that make it correct?

sarinaide · Apr 16, 2013

It's why I refrain from getting into these "IPC" "Per Core" arguments: most are so redundant in daily operations for a normal user that you cannot realistically tell the difference, but factors such as "SMT" are dismissed when we are clearly moving to that, and then "HSA" is another very distinguishable facet that is overlooked so that we can beat on AMD's "per core" all day long. Some will throw the fanboy label around, but nobody ever said we don't want more or that we consider it where we want it. AMD is doing what they had to do all along when opting for a new type of system architecture: they had to roll through the road map and suffer the growing pains with it. Ultimately there are facets of AMD's arch that are very impressive, so perhaps this "fail" moniker is branded all too easily by people who seem to have an incentive in pandering Intel around as the best thing since sliced bread.

I do on the other hand use Intel for specific system build requirements. I run a dual Xeon hex core with 580's and a 7970 for purely crunching reasons and it does a splendid job at that. For regular consumer grade systems, gaming, media etc. I find the Intel lineup duller than a dead cow's eyes; when the temporal boredom sets in, they begin to agitate me, so I don't buy them anymore. If we are insistent on playing the "blind eye" game then why not focus on areas where AMD is actually doing well? Sure, the APU may be loosely defined as an x86 with graphics, but only a naive person would be so narrow in mindset. The APU represents the future of heterogeneous system architecture, which has in its limited time come on in leaps and bounds, already flexing its robust performance over pure x86. To do that you need a good graphics core, and well, AMD is giving Intel a very mighty spanking here. Should people focus a bit here, or are we an "AMD fail only zone" like the muppets at Anandtech?
