AMD Piledriver rumours ... and expert conjecture

Status
Not open for further replies.
We have had several requests for a sticky on AMD's yet to be released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post a question relevant to the topic, or information about the topic, or it will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame baiting comments about the blue, red and green team and they will be deleted.

Enjoy ...
 
These crazy, exaggerated [strike]clockspeed discrepancies[/strike] benchmark calculations being used in theoretical discussions involving IPC are a continuation of the woeful behaviour which saw gullible and/or young posters on forums all around the world getting sucked into buying [strike]AM3+ motherboard before Bulldozer came out[/strike] an Intel platform.

Cheers! 😗
 
For a single core:

Performance = Instructions Per Cycle * Number of Cycles

Speed (Hz) is half the equation, IPC is the other. They count equally.

If CPU x can do twice the amount of work per clock as CPU y, then CPU y needs to be clocked twice as high to offer the same performance. Simple. If the CPU is 20% slower in IPC, it needs to be clocked 25% higher to compensate (1 / 0.8 = 1.25).
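A quick sanity check on that compensation arithmetic, as a minimal Python sketch under the same simplified single-core model (performance = IPC * clock, fully CPU-bound). The clock multiplier needed is the reciprocal of the IPC ratio, which is why a 20% IPC deficit needs 25% more clock rather than 20%. The function name is made up for illustration.

```python
def required_clock_ratio(ipc_ratio: float) -> float:
    """Clock multiplier CPU y needs to match CPU x.

    ipc_ratio = IPC_y / IPC_x. Assumes the simplified model
    performance = IPC * clock (single core, fully CPU-bound).
    """
    if ipc_ratio <= 0:
        raise ValueError("ipc_ratio must be positive")
    return 1.0 / ipc_ratio


# CPU y does half the work per clock: it needs twice the clock.
print(required_clock_ratio(0.5))            # 2.0
# CPU y has 20% lower IPC: it needs 25% more clock, not 20%.
print(round(required_clock_ratio(0.8), 2))  # 1.25
```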

[I now stress IPC varies by workload, and other parts of the system come into play. I'm dumbing things down for the purpose of discussion].

Here's the issue: Due to heat/power draw, you don't see large frequency differences in CPUs any more. As a result, IPC is what separates a "good" CPU from a "bad" CPU.

More cores simply act as a multiplier to the above equation. When using SMT [be it HTT or CMT or some other implementation], you have some sort of scaling factor [10% for HTT, 80% for CMT, etc].

[I now stress this assumes a 100% full load where all the processing power of the CPU is used, which typically isn't true. If it was, BD would look much better than it currently does].

So to compare an i5 and a 4000-series BD, you get something that looks like this:

SB i5:
Performance = (IPC * Clockspeed) * 4

40xx BD:
Performance = ( ((IPC * Clockspeed) * 2) + (((IPC * Clockspeed) * 2) * 80%) ) [2 full cores, 2 CMT cores]

Let's assume both are clocked at 3GHz at stock and have the exact same IPC [which we know is false].

SB i5:
Performance = (1 * 3) * 4 = 12

40xx BD:
Performance = ( ((1 * 3) * 2) + (((1 * 3) * 2) * .8) )
Performance = ( (6) + (4.8) )
Performance = 10.8

Reducing to 1:
BD = 1
i5 = 1.111...

That's your performance difference. Even if BD and SB (i5) had the exact same IPC, due to the 20% performance hit of using CMT, you'd still be 10% slower overall. And since we know BD has a lower IPC than SB, that difference tends to be larger.
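The model above can be written as a small function. This is a sketch assuming 100% load and the post's rough 80% CMT scaling factor; the function and parameter names are hypothetical. It reproduces the 12 vs 10.8 figures.

```python
def modeled_throughput(ipc, clock_ghz, full_cores, shared_cores=0, shared_scaling=1.0):
    """Idealized throughput at 100% load, per the simplified model in the post.

    full_cores count once each; shared_cores (the extra CMT/SMT threads)
    count at shared_scaling, e.g. the post's rough 0.8 for CMT.
    """
    per_core = ipc * clock_ghz
    return per_core * full_cores + per_core * shared_cores * shared_scaling


# Same IPC (1) and clock (3 GHz) for both, as assumed in the post.
i5 = modeled_throughput(ipc=1, clock_ghz=3, full_cores=4)
bd = modeled_throughput(ipc=1, clock_ghz=3, full_cores=2, shared_cores=2, shared_scaling=0.8)
print(f"i5={i5:.1f} BD={bd:.1f} ratio={i5 / bd:.3f}")  # i5=12.0 BD=10.8 ratio=1.111
```

Swapping in a different shared_scaling (or a real per-benchmark IPC) is the whole point of the exercise; the 0.8 is only the post's ballpark.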

My point being, you can mathematically solve performance differences in any given benchmark to figure out the difference in IPC for that specific benchmark, since the performance difference for the benchmark and speed of the CPU are known. I'd like to see some review sites start to look into IPC differences on a per-benchmark basis, as it would be interesting...
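Rearranging the same simplified model gives the per-benchmark IPC ratio: the score ratio divided by the clock ratio. A minimal sketch, assuming a single-threaded, fully CPU-bound benchmark where a higher score means faster and both chips actually ran at the listed clocks (turbo ignored). The function name and the numbers in the example are hypothetical placeholders, not review data.

```python
def relative_ipc(score_a, clock_a_ghz, score_b, clock_b_ghz):
    """IPC of CPU A relative to CPU B for a single benchmark.

    Assumes: higher score = faster, single-threaded, fully CPU-bound,
    and both CPUs actually ran at the given clocks (no turbo).
    """
    perf_ratio = score_a / score_b
    clock_ratio = clock_a_ghz / clock_b_ghz
    return perf_ratio / clock_ratio


# Placeholder numbers: A scores 10% higher while clocked 20% lower.
print(f"{relative_ipc(110, 3.0, 100, 3.75):.3f}")  # 1.375
```

For time-based benchmarks (lower is better), the score ratio would have to be inverted first.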
I have no idea why you posted those calculations, as they are not very useful. Everyone knows the FX-4100 isn't as powerful as the i5, given its weaker IPC and CMT scaling. Here's a nice link on Bulldozer core scaling:

http://www.phoronix.com/scan.php?page=article&item=amd_bulldozer_scaling&num=2

IPC isn't something magic that can just be looked at like that. Some instructions are faster on some uarchs, and some uarchs lack certain instructions completely. The same goes for CMT scaling: it varies greatly from program to program, and that is something AMD can work on. Making the architecture more efficient at certain things will help greatly with both IPC and CMT scaling.

Performance also isn't as simple as IPC * clock speed. Some processors will not scale linearly with clock speed past a certain point; the gains from extra frequency always start to taper off at higher clocks. All of these things are part of design tradeoffs. The performance of different CPUs will always vary across tasks, so look at the tasks you need to see how effective a CPU is.
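A toy model of why clock scaling tapers off (an illustration, not a model of any real chip): if part of the runtime, such as waiting on main memory, doesn't shrink as the core clock rises, each extra GHz buys less. The 2 s / 1 s split below is an assumed example, and the function names are made up.

```python
def runtime_s(core_cycles, clock_ghz, stall_s):
    """Toy runtime: core-bound work scales with clock; stall time
    (e.g. waiting on main memory) is assumed fixed."""
    return core_cycles / (clock_ghz * 1e9) + stall_s


def speedup(core_cycles, stall_s, base_ghz, new_ghz):
    """Speedup from raising the clock from base_ghz to new_ghz."""
    return runtime_s(core_cycles, base_ghz, stall_s) / runtime_s(core_cycles, new_ghz, stall_s)


# Assumed example: at 3 GHz the task is 2 s of core work plus 1 s of stalls.
cycles = 2.0 * 3e9
for ghz in (3.0, 4.0, 5.0, 6.0):
    print(f"{ghz:.1f} GHz -> {speedup(cycles, 1.0, 3.0, ghz):.2f}x")
# Doubling the clock (3 -> 6 GHz) gives only 1.50x here, not 2x.
```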
 
In that case there are three socket 775 Intel variants ... the early models, the middle era and the last EOL variants.

Upgrading from Pentium to Core 2 could be a major issue there too.

Add those into the mix and both have about 15 sockets essentially due to power and chipset iterations.

mu could spell it all out in a masterful 1500 word post ... then again he might just say "yes".

Yeah, that's pretty much right. You had your first-gen 915/925 Prescott platform, then the 945/955 Pentium D platform, then the 965/975 Conroe platform, and then finally the 30/40-series Penryn platform. There is some overlap amongst the platforms, mainly the 945, which was the cheap platform of choice for Core 2s as well as Pentium Ds.
 
+1 on that. Now, if Sandy was clocked at 1.0GHz and BD was clocked at 3.2GHz, this would be a different story! I'm so sick of AMD and AMD fans downplaying single-core performance.

And like I keep saying, Intel is not giving us 1 core when AMD is giving us 8; they're giving us 4 powerful cores when AMD is giving us 8 mediocre cores (around the same price range), which have about 50% of Intel's per-core performance, maybe a little more. Even with all 8 cores active, AMD can usually (at best) only give 10% better performance than the 4-core from Intel, and since that rarely ever happens, it doesn't get that chance.
Bulldozer only needs to clock 30% higher to be competitive, as it would then be pushing the performance of the i7 Extreme platform on many of the tasks it has advantages in. It's not so hard to think that Piledriver will be clocked 30% higher than Ivy, since Ivy hasn't moved in clock and doesn't seem to OC as well as Sandy.

Some people might also ask why have 4 such fast cores, when programs that can't use more cores generally don't need them, and all programs that need more processing are moving toward parallelization of their workloads. In such a scenario, AMD wouldn't need to clock much higher than Intel, even with the IPC discrepancy.
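One way to put numbers on this argument is Amdahl's law, a standard model swapped in here (not something the poster used). A sketch comparing a hypothetical 4-core chip against 8 cores that each have 60% of its per-core speed; the 0.6 figure and the parallel fractions are made-up inputs, and perfect scaling of the parallel part is assumed.

```python
def amdahl_throughput(per_core_speed, cores, parallel_fraction):
    """Relative speed under Amdahl's law: the serial part runs on one core,
    the parallel part is spread perfectly across all cores."""
    serial = 1.0 - parallel_fraction
    time = serial / per_core_speed + parallel_fraction / (per_core_speed * cores)
    return 1.0 / time


# Hypothetical chips: 4 cores at relative speed 1.0 vs 8 cores at 0.6.
for p in (0.0, 0.5, 0.9, 1.0):
    fast4 = amdahl_throughput(1.0, 4, p)
    slow8 = amdahl_throughput(0.6, 8, p)
    print(f"parallel={p:.1f}  4 fast={fast4:.2f}  8 slower={slow8:.2f}")
```

With these made-up inputs, the 4 faster cores stay ahead until the workload is almost entirely parallel (the crossover is around 94%), so how far software actually parallelizes decides which design wins.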

IMHO, for most people buying a CPU comes down to marketing; even people who are "enthusiasts" buy into it. The difference between the CPUs wouldn't be much to scream about on an everyday basis. And many enthusiasts buy i7s over i5s when they do nothing that uses Hyper-Threading. All this screaming about IPC doesn't matter nearly as much as judging what you need the CPU for and how much CPU you actually need.
 
I feel so bad... 😛

seriously though.
when a Deneb C3 (X4) clocked @ 4.2-4.4GHz still gets housed by an i5-2400 running at stock, of course, there is a problem.
and the only upgrade path for AMD owners is the FX-81xx.
I mean c'mon man....

I'm trying to hold out for Piledriver, and having a 2500K unit as well has allowed me to make it this far.
but it's getting really irritating just thinking about it..
I could have an i5-2400 for my main box and the 2500K for gaming.

my AMD does serve its purposes well, but just knowing it's behind the curve with no real options to improve on,
what else is there?

oh AMD, you have given me a serious case of MEH when discussing you... :pfff:

If you're getting that relation in performance thanks to the Cinebench numbers, I'll tell you right away that the i5 doesn't feel as smooth as the X4 for heavy workloads. I think I said this before and will say it again.

This might be a lil' biased, but just take a quick look: http://www.amdzone.com/phpbb3/viewtopic.php?f=532&t=138922

Cheers!
 
As I said, several times: I measured IPC for a specific program. As I also noted, IPC varies depending on workload; it is NOT a flat value. I also noted up front I was assuming 100% core loading, which is almost certainly not true.

My entire point was to show that for a given benchmark, where the performance and processor speed are known, you can mathematically calculate IPC for that benchmark. Heck, you could factor CPU load too, to make an even more accurate calculation. In any case, do that across a variety of different CPU bound tests, and you get an idea how much IPC affects the results.

In terms of pure processing power at 100% load in non-CPU bound programs, BD is significantly more powerful than SB. Problem is, more often than not, that condition I just listed is not true, so the higher IPC of SB overcomes the core count and the increased number of CPU cycles BD offers.
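"Factoring CPU load" could look something like this: count only the core-cycles that were actually busy. A minimal sketch, assuming util is the average fraction of all cores that were loaded during the run, a higher score means faster, and clocks stayed fixed. It still inherits the caveats raised in the replies (SMT/CMT threads aren't full cores, and monitoring tools report utilization rather than busy execution cycles). The function name and the example numbers are hypothetical.

```python
def load_adjusted_ratio(score_a, clock_a_ghz, cores_a, util_a,
                        score_b, clock_b_ghz, cores_b, util_b):
    """Per-core, per-clock throughput of CPU A relative to CPU B,
    counting only busy cycles.

    util_* is the average fraction of the cores that were loaded (0..1).
    Assumes higher score = faster and fixed clocks (no turbo).
    """
    busy_a = clock_a_ghz * cores_a * util_a
    busy_b = clock_b_ghz * cores_b * util_b
    return (score_a / busy_a) / (score_b / busy_b)


# Placeholder: same score and clock, 8 cores half-loaded vs 4 cores fully loaded.
print(f"{load_adjusted_ratio(100, 3.6, 8, 0.5, 100, 3.6, 4, 1.0):.2f}")  # 1.00
```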
If you look at it that way, the IPC of the 41xx is much higher than the 81xx's, and a dual-core SB's is much higher than the 2600K's. That's not testing instructions per clock per core; that's testing instructions per CPU and dividing the results in a way that's not even remotely accurate.

I will say it again: IPC can't truly be tested as pure IPC, especially in lopsidedly optimized software. Claiming a per-core instructions-per-clock figure is false and irrelevant for today's programs. There is only overall performance for any given program, aside from a pure synthetic test.

Just for the record, SYSmark is not pure.
 
These crazy, exaggerated [strike]clockspeed discrepancies[/strike] benchmark calculations being used in theoretical discussions involving IPC are a continuation of the woeful behaviour which saw gullible and/or young posters on forums all around the world getting sucked into buying [strike]AM3+ motherboard before Bulldozer came out[/strike] an Intel platform.

Cheers! 😗
:lol: Do you want to point out where anyone who gave a damn about performance suffered for buying an Intel platform in recent years?
 
:lol: Do you want to point out where anyone who gave a damn about performance suffered for buying an Intel platform in recent years?

I'll give you that the price performance is just a tad better for Intel, but...

IPC, more cores, blah blah.

All that matters is Price/Performance in the applications YOU care about most.

The question is what is AMD doing about the known bottlenecks in BD?

This.

Cheers! 😛
 
If you look at it that way, the IPC of the 41xx is much higher than the 81xx's, and a dual-core SB's is much higher than the 2600K's. That's not testing instructions per clock per core; that's testing instructions per CPU and dividing the results in a way that's not even remotely accurate.

I will say it again: IPC can't truly be tested as pure IPC, especially in lopsidedly optimized software. Claiming a per-core instructions-per-clock figure is false and irrelevant for today's programs. There is only overall performance for any given program, aside from a pure synthetic test.

Just for the record, SYSmark is not pure.


IPC is not all that matters, but when it comes to BD, its issue is PURE CORE PERFORMANCE (which means IPC/CPI; BD is already clocked high enough!). Most people scream and say benchmarks aren't done right. First of all, I never take any type of synthetic benchmark too seriously, only programs that people use, such as video encoding, and even there BD at best ties with a 4-core processor from Intel that costs less money and produces less heat, and it costs AMD more to make as well, since it has a bigger die. Then I care about gaming, and in that regard BD is worse than the Phenom, and the Phenom can't even compete head-on with first-gen i7s and some high-end i5s.
I remember someone here complaining about skippy FPS in Skyrim when he had a 7970 + FX-8150; there is no excuse for that when an i3 can play it fine. Again, AMD does not need to have the same per-core performance as Intel. Heck, I would even take 20% less performance per core with twice the cores for the same price, for the future, but not 50% of the per-core performance (compared to Intel) with twice the cores.

I really hope people working at AMD know better than some of the people in this forum; if not, we won't ever see competition in CPUs again at any price point.
 
A revolution and new blood are needed to breathe some real life into this industry. To me, more and more things are stale, dried out, and dead. With only two horses pulling the wagon when there used to be more, the industry looks like it is being readied to merge into a one-corporation environment, while smaller companies are either pushed out or bought up. There is a desperate need for diversity and renewed vitality in this industry, with truly new ideas and avenues of progress. Instead, the x86 world has one big company dominating almost everything, a second that can't get much right, and a third that looks almost ready to be bought up or fail at any moment.

Look at this thread and others like it: we are going in circles over the same stuff day in and day out, and with few exceptions it hasn't gotten us anywhere.
 
that's not what I'm getting at.
BD has to be clocked so high because it's crap, slow data-crunching crap at that.
so it has to be overclocked to get any performance.
but who wants to crazily overclock crap when you can just run an Intel unit at stock?
heat, power, etc...
The architecture handles high frequencies very well. Without the transistor leakage and some of the screwed-up voltage management of the chip, it would be perfectly fine at higher frequencies.

Clock speed, amongst other things, is a design tradeoff, which is perfectly good when balanced. If the architecture is designed for high clocks, it will work as such.
 
Where are you getting 50% from? The only time it's even close to 50% is when it's compiled with Intel's help, but I guess that's AMD's fault too.


It takes 8 cores to tie a 4-core from Intel (with HT, and sometimes without) in HandBrake (which can use all 8 cores). Not to mention most games listed below have dramatically lower FPS. Plus, a lot of the single-core benchmarks are 50%+ lower than Intel's.

http://www.anandtech.com/bench/Product/434?vs=551


Let's also not forget that BD has 8MB of L2 cache to help it along, which is more than what Intel has.

Just to let you guys know: I have my processor clocked at 3.9GHz, and my friend has a Sandy Bridge mobile i7 clocked at 2.2GHz, and he gets around 5-7% better single-core performance than I do. We tested this with Cinebench as well as WPrime and the Fritz_Chess benchmark, with only 1 core running. This is when I have a 77% clock speed advantage. This just goes to show there is indeed an issue.

Thank god I still beat him by around 15-20% when all cores are stressed! Too bad real programs don't use all of my 6 cores.
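Working through the numbers in that anecdote: a 3.9 vs 2.2 GHz gap is indeed about 77%, and being 5-7% slower despite it implies the other chip does roughly 1.86-1.90x the work per clock. This sketch assumes both chips really ran at the stated clocks for the whole test; Sandy Bridge mobile i7s have Turbo Boost, so if the laptop boosted above 2.2 GHz, the real per-clock gap is smaller. The function name is made up.

```python
def implied_per_clock_ratio(their_speedup, my_clock_ghz, their_clock_ghz):
    """Per-clock advantage of the other CPU implied by a single-thread result.

    their_speedup: e.g. 1.05 for "5% faster". Assumes both chips ran at
    the stated clocks for the whole test (turbo ignored).
    """
    return their_speedup * (my_clock_ghz / their_clock_ghz)


clock_adv = 3.9 / 2.2
print(f"clock advantage: {clock_adv - 1:.0%}")  # clock advantage: 77%
for s in (1.05, 1.07):
    print(f"{s:.2f} faster -> {implied_per_clock_ratio(s, 3.9, 2.2):.2f}x per clock")
```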
 
The architecture handles high frequencies very well. Without the transistor leakage and some of the screwed-up voltage management of the chip, it would be perfectly fine at higher frequencies.

Clock speed, amongst other things, is a design tradeoff, which is perfectly good when balanced. If the architecture is designed for high clocks, it will work as such.


Well?? Well?? It takes a 125-watt chip to come close to a 95-watt chip from Intel that has fewer transistors. I'm truly missing something here.
 
IPC, more cores, blah blah.

All that matters is Price/Performance in the applications YOU care about most.

The question is what is AMD doing about the known bottlenecks in BD?

Pretty much what I've been saying this whole time.

Performance vs Cost vs Energy Usage vs Density. Currently the Intel i5-2500K takes the crown.

I'm waiting for the numbers to climb from "Blue deity is 50% bettar!! then false green deity" to "Blue deity is 100% better!!!!"
 
It takes 8 cores to tie a 4-core from Intel (with HT, and sometimes without) in HandBrake (which can use all 8 cores). Not to mention most games listed below have dramatically lower FPS. Plus, a lot of the single-core benchmarks are 50%+ lower than Intel's.

http://www.anandtech.com/bench/Product/434?vs=551


Let's also not forget that BD has 8MB of L2 cache to help it along, which is more than what Intel has.

Just to let you guys know: I have my processor clocked at 3.9GHz, and my friend has a Sandy Bridge mobile i7 clocked at 2.2GHz, and he gets around 5-7% better single-core performance than I do. We tested this with Cinebench as well as WPrime and the Fritz_Chess benchmark, with only 1 core running. This is when I have a 77% clock speed advantage. This just goes to show there is indeed an issue.

Thank god I still beat him by around 15-20% when all cores are stressed! Too bad real programs don't use all of my 6 cores.
OK, so for you, IPC = overall performance / core count, never mind scaling or optimizations that affect the outcome.

As for Cinebench, yes, there is some very strange behavior with that bench. Not knowing how the program is set up, I don't know what the issue is, but the first square and one other are rendered almost 1/8 as fast as the rest of the grid. Are those squares pure FPU, L2 cache loads, L3, main memory...? Whatever it is, that's where BD needs to be fixed. But that one problem automatically earns AMD a "fail" over all the benefits it may have.
 
Wider and slower, or narrower and faster:
Power and productivity are what's important.

Who here uses nVidia?
They just went wider with the 680.
Also, look at TSMC's 28nm process and its leakage vs Intel's 32 or 22nm processes.

An arch is designed for many uses, with a particular thermal window in mind.
Nowhere does it say that, going from BD, AMD can't do a lot better using the same arch.
It's narrower, hence the higher clock; now add a few tweaks for perf, up the clocks a little, and you see decent improvement.
It's not a one-way street.
 