AMD Piledriver rumours ... and expert conjecture

Status
Not open for further replies.
We have had several requests for a sticky on AMD's yet-to-be-released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post questions or information relevant to the topic, or your post will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame baiting comments about the blue, red and green team and they will be deleted.

Enjoy ...
 
with all the talk of Trinity, I'm actually a bit more excited about AMD's 2nd-gen FX series. Can they make a comeback after their initial fall? I'm really looking forward to this.

also, it's rumored that the next-gen consoles will be using a Piledriver CPU. What I wonder is how powerful we can expect it to be.

I don't see it. MAYBE for the next Xbox, if MS wants to unify their OS across all platforms. But for the next PS, it doesn't make sense not to stick with a PPC-based CPU. Remember that all consoles in recent years have been PPC based, and that's due mainly to PPC's lower power draw compared to x86.
 
I heard an FX-8150 or FX-8120 does great at Folding@home;
those 8 cores can do really well at it, especially under a Linux distro.

BTW I started a poll concerning Folding at Home
http://www.tomshardware.com/forum/34526-12-folding-home-forum-section-good


This is a link to our Tom's Folding@home thread, if you are interested in joining our team in helping to save lives in the future:
http://www.tomshardware.com/forum/forum2.php?config=tomshardwareus.inc&cat=28&post=268010&page=1&p=1&sondage=0&owntopic=1&trash=0&trash_post=0&print=0&numreponse=0&quote_only=0&new=0&nojs=0

Our members there would be glad to answer any questions for you.
 
I'm imagining Piledriver to be a decent boost in performance, efficiency and overclockability. But the Cyclos resonant clock mesh will not give much benefit above 5GHz (peak efficiency seems to be about 3.2-3.6GHz from the release documents), so its overclocked power consumption should be substantial, though probably still better than Bulldozer's.

I don't expect Piledriver to deliver more than Phenom II-level IPC, but overall multicore performance should be higher due to the increase in both clocks and cores. Most of what can be gleaned from Trinity shows that its improvements are in power consumption, so I'd think Piledriver will be able to hit much better efficiency.

AMD seems to be holding everything till Steamroller for large improvements. Piledriver looks to be a step in the right direction but nothing too special. AMD should be able to compete against Ivy Bridge on price at least.
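The overclocked-power-consumption point above can be sketched with the usual first-order dynamic power relation, P ≈ C·V²·f: higher clocks generally need a voltage bump, so power grows faster than linearly with frequency. The voltage/frequency pairs below are made-up illustrative operating points, not measured Bulldozer or Piledriver figures:

```python
# Toy dynamic-power model: P ~ C * V^2 * f.
# All numbers are illustrative assumptions, not real chip data.

def dynamic_power(freq_ghz, volts, cap=1.0):
    """Relative dynamic power at a given frequency (GHz) and core voltage."""
    return cap * volts**2 * freq_ghz

stock = dynamic_power(3.6, 1.25)   # hypothetical stock operating point
oc    = dynamic_power(5.0, 1.45)   # hypothetical overclocked point

print(f"relative power at stock: {stock:.2f}")
print(f"relative power at 5 GHz: {oc:.2f}")
print(f"power ratio: {oc / stock:.2f}x for a {5.0 / 3.6:.2f}x clock gain")
```

The point of the model: even a modest voltage increase makes the power ratio outrun the clock ratio, which is why overclocked power consumption climbs so quickly.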
 
AMD, sorry, but I have to tell you something...
this isn't working out, and I think we need to see other IPC solutions.
it's not you, it's me.
OK, I'm lying, it's all your fault, but I'm the one walking away.

I just got tired of the fact that even when I clock you to 4.2GHz you still got whooped by an i5-2400.
enough already, and I'm tired of people talking about us like we were to marry.
hell, I'm not even going to date sexy Kepler either.

with that being said, I have just packed and shipped you off to someone who desires you, or desired the price I pimped you for.
do not worry about my Windows install, for my install WILL transfer over without issues.
I thank you for caring.

I will gladly give you a ride to the post office; let's go.
my 2500K twin unit (another 2500K) is en route, and you don't need to see her; much jealousy you will have.
I didn't hold back with her and went for it, and she has a better frame...
ASUS P8P67 WS REVOLUTION LGA 1155 Intel P67 / NVIDIA NF200
($129.99 super sale)

cheers.


ROFL
Oh, c'mon, AMD was a good CPU to you.
Remember the days when you were treated like crap by Ms. Pentium 4?
Who was there in those rough times?
And remember when you and AMD hit 4GHz together, and what a fun day that was?
You two have such a history together.
Think of the kids!
Little APU will be heartbroken.
Maybe you two could talk to a counselor?
 
I'm imagining Piledriver to be a decent boost in performance, efficiency and overclockability. But the clock mesh will not give much benefit above 5GHz (peak efficiency seems to be about 3.2-3.6GHz from the release documents) ... AMD should be able to compete against Ivy Bridge on price at least.

Damn, I was right! 😀 Can you please give me the link to those documents which indicate max efficiency in the 3.6GHz range? I predicted this would happen! See my post on page 100 :)
 
AMD, sorry, but I have to tell you something... this isn't working out. I just got tired of the fact that even when I clock you to 4.2GHz you still got whooped by an i5-2400 ... my 2500K twin unit (another 2500K) is en route ...

Getting rather dramatic, comparing a 2008 chip to a 2011 chip, least of all when in 99% of instances, bar overclocking, a 2400 is the same thing as a 2500K; the 2400 just makes the comparison look more dramatic. In most cases it's almost impossible to tell the difference in performance outside synthetics, and even then the metrics of synthetics essentially come down to which metric the programmer decides fits which chip. In gaming, is there really any difference between an X4 and an X6, a 2100 and a 2700? Nothing noticeable, but synthetics will tell you there is a huge disparity.

You know Intel's roadmap: smaller and more efficient, lower overclocks and fragile silicon, which is either trimmed-down server chips or buffed-up mobile chips with the same flavour. How much did tri-gate really influence performance? None. If anything, Intel is already bouncing off the IPC limiter; efficiency is barely better than SB and there are no additional cores. While they have aced single-threaded performance, the future is not single-threaded, and they have done very little to address multithreaded performance, save that an 8150, while clock-for-clock weaker, hangs with and has victories over Intel's 8-threaded and more expensive chips. Then there is the IGP.

We can be bombarded with sales gimmicks like IPC, tri-gate... excuse me, 3D tri-gate transistors, 22nm, HD 4000, dynamic AA and Lucid transcoding, but at the end of the day a chip is a chip, and if it can do what you need it to do, that is enough.
 
Getting rather dramatic, comparing a 2008 chip to a 2011 chip ... at the end of the day a chip is a chip, and if it can do what you need it to do, that is enough.
To an engineer, possibly. To a marketing executive, it is a sales pitch to help stockholders.
 
Anandtech has a nice analysis of BD.

http://www.anandtech.com/print/5057

Quite excited I am for Piledriver.

If all goes well, it might just fit into my computer.

With the added efficiency, it might be a worthy upgrade.

From the article:

The Real Shortcomings: Branch Misprediction Penalty and Instruction Cache Hit Rate

Bulldozer is a deeply pipelined CPU, just like Sandy Bridge, but the latter has a µop cache that can cut the fetching and decoding cycles out of the branch misprediction penalty. The lower than expected performance in SAP and SQL Server, plus the fact that the worst performing subbenches in SPEC CPU2006 int are the ones with hard to predict branches, all points to there being a serious problem with branch misprediction.

Our Code Analyst profiling shows that AMD engineers did a good job on the branch prediction unit: the BPU definitely predicts better than the previous AMD designs. The problem is that Bulldozer cannot hide its long misprediction penalty, which Intel does manage with Sandy Bridge. That also explains why AMD states that branch prediction improvements in "Piledriver" ("Trinity") are only modest (1% performance improvements). As branch predictors get more advanced, a few tweaks here and there cannot do much.

It will be interesting to see if AMD will adopt a µop cache in the near future, as it would lower the branch prediction penalty, save power, and lower the pressure on the decoding part. It looks like a perfect match for this architecture.

Another significant problem is that the L1 instruction cache does not seem to cope well with 2-threads. We have measured significantly higher miss rates once we run two threads on the 2-way 64KB L1 instruction cache. It looks like the associativity of that cache is simply too low. There is a reason why Intel has an 8-way associative cache to run two threads.

For emphasis:

The problem is that Bulldozer cannot hide its long misprediction penalty, which Intel does manage with Sandy Bridge. That also explains why AMD states that branch prediction improvements in "Piledriver" ("Trinity") are only modest (1% performance improvements). As branch predictors get more advanced, a few tweaks here and there cannot do much.

So... if the branch predictor is being blamed as the primary cause of the performance woes, and AMD admits very little of PD's performance increase is coming from branch predictor improvements, where exactly is PD getting its IPC improvements from? Based on this analysis, you'd have to conclude that PD is basically going to gain performance through clock speed increases, which would NOT be a good sign, as it indicates AMD is either unable or unwilling to make major improvements to the underlying architecture.
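The trade-off the article describes, where a slightly better predictor matters far less than a shorter misprediction penalty, can be put into a back-of-the-envelope CPI model. The branch frequency, misprediction rate and penalty cycles below are illustrative assumptions, not published AMD or Intel figures:

```python
# Back-of-the-envelope model of how the misprediction penalty drags on IPC.
# All inputs are illustrative guesses, not measured microarchitecture data.

def effective_cpi(base_cpi, branch_freq, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once misprediction stalls are added."""
    return base_cpi + branch_freq * mispredict_rate * penalty_cycles

# Assume ~20% of instructions are branches and a 5% misprediction rate.
long_pipe = effective_cpi(1.0, 0.20, 0.05, 20)  # deep pipeline, full re-fetch/decode
short_pen = effective_cpi(1.0, 0.20, 0.05, 14)  # same predictor, shorter penalty
                                                # (e.g. a uop cache skipping decode)

print(f"CPI with 20-cycle penalty: {long_pipe:.2f}")
print(f"CPI with 14-cycle penalty: {short_pen:.2f}")
print(f"throughput gain from the shorter penalty: {long_pipe / short_pen:.2f}x")
```

Under these made-up numbers, shaving the penalty buys several percent of throughput with the predictor unchanged, which is consistent with the article's suggestion that a µop cache would help more than further predictor tweaks.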
 
You mean this article that only showed any difference in Sandra's synthetic test?

<snip>

Gaming performance between 2133 and 1600 = ... it's not 7%.

<snip>

Maybe SB-E does better. http://www.tomshardware.com/reviews/quad-channel-ddr3-memory-review,3100-10.html

... nope.

besides, outside benchmarks, the only time RAM specs seem relevant with AMD components is when they're being used with the APUs. I wonder if anyone even notices a performance increase from faster memory in a Zambezi-based PC.

considering I seem to be the only one here running a Zambezi PC on 2133 memory ... I must not know what I'm talking about; it's too hard to go into the BIOS, change it to 1333, and verify the results are accurate.

I've run quite a few tests of this very issue on my system (Opteron 6234, supports four channels of DDR3-1866) and apart from synthetic memory tests like Stream, I just about never see any decrease in performance that is even a full percentage point until I clocked the RAM down to DDR3-800. Anything poorly-threaded didn't even care if it ran at DDR3-800. The only applications/benchmarks that seem to care at all if I was running my RAM at DDR3-1333 or faster were a small number of HPC type of benchmarks, which had a decent bump in performance from DDR3-800 to DDR3-1066 and then a percentage point or so from 1066 -> 1333 and about half a percent from 1333 -> 1600. That very well meshes with your observations and THG's as well- there isn't much benefit from having RAM that's much faster than the fuddy-duddy 1333 or 1600 with non-APU CPUs. An old wise guy I used to know once said that with RAM, pick quantity over speed if you need to choose, because even the slowest RAM is still a heck of a lot faster than your hard drive (swap or pagefile.)
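For anyone who wants to reproduce the synthetic side of this at home, here is a minimal STREAM-style "copy" probe. Interpreter overhead means it understates real hardware bandwidth, so treat the number as a rough probe of whether memory clocks move anything, not a calibrated measurement:

```python
# Minimal STREAM-style copy bandwidth probe: the kind of synthetic test
# that *does* respond to memory clocks while most real workloads don't.
import time

N = 20_000_000  # bytes per buffer
src = bytearray(N)
dst = bytearray(N)

start = time.perf_counter()
dst[:] = src            # bulk copy, essentially memmove under the hood
elapsed = time.perf_counter() - start

# A copy touches 2*N bytes (read src, write dst).
gb_per_s = (2 * N / elapsed) / 1e9
print(f"copy bandwidth: {gb_per_s:.2f} GB/s")
```

Running it at different memory multipliers in the BIOS and comparing the printed number against, say, an application benchmark is exactly the experiment described above.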
 

but you dated that power hog super 😉


not always
I never use paging 😛
Then why DDR4? (other than power saving)


(assumption)
A 4GHz 4-core PD will be equal to a Phenom II core at 4GHz in IPC (single-core performance). But due to the module design, it will be equal to a 3.2GHz Phenom II (in highly multithreaded loads like the 7-Zip bench). But it may use 65-80W of power in comparison to the 125W of Phenom II 😗
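Restating that assumption as arithmetic (all numbers are the poster's guesses, not measurements):

```python
# The poster's assumption as arithmetic: IPC per core matches Phenom II,
# but the shared module front-end costs ~20% when both cores of a module
# are loaded, so a 4 GHz quad behaves like a 3.2 GHz Phenom II quad.
clock_ghz = 4.0
module_scaling = 0.80   # assumed throughput retained under full module load

effective_ghz = clock_ghz * module_scaling
print(f"effective multithreaded clock: {effective_ghz:.1f} GHz")  # 3.2 GHz
```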
 
(assumption)
4GHz 4-core PD will be equal to a Phenom II core at 4GHz in IPC ... but it may use 65-80W of power in comparison to the 125W of Phenom II 😗


Where did you get that from 😱 ......link please :hello:


 
I never use paging 😛

I always do. There is a memory overhead penalty when you disable paging, because ALL virtual memory addresses need to be backed by physical memory at creation time, instead of at assignment time.

For instance, let's say I create a 1000-element integer array. With paging enabled, my total RAM usage is still nada. It's not until I assign some value to an array element that it needs to be backed by RAM. With paging disabled, however, because every virtual address needs to be mapped to a physical address, all 1000 elements need to be backed by RAM upon object creation. This becomes a MAJOR waste of RAM if you have a program which creates some data structure based on "worst case" conditions. [And yes, I know a few programs where this happens.]

Never mind that you have the TLB on the CPU specifically to speed the process up.
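The reserve-versus-commit behaviour described above can be poked at with an anonymous memory map: the mapping consumes virtual address space immediately, but physical pages are only faulted in when written, which is the laziness you lose when the pagefile is disabled. A minimal sketch (the zero-fill read of untouched pages relies on standard anonymous-mmap semantics):

```python
# Reserve vs. commit: mapping anonymous memory claims virtual address
# space up front, while physical pages are demand-faulted only on write.
import mmap

MIB = 1024 * 1024

def reserve(n_mib):
    """Map n_mib MiB of anonymous memory: address space now, pages later."""
    return mmap.mmap(-1, n_mib * MIB)

region = reserve(64)            # 64 MiB of address space, no pages touched
region[0:MIB] = b"\x01" * MIB   # only the first MiB gets physical backing

# Untouched pages still read back as zeros without ever being written.
print(f"region size: {len(region) // MIB} MiB, first byte: {region[0]}")
```

With the pagefile disabled, the OS must be able to back the whole committed region in RAM up front, which is exactly the "worst case" waste described in the post.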
 
From the article: "The problem is that Bulldozer cannot hide its long misprediction penalty, which Intel does manage with Sandy Bridge." ... So, if the branch predictor is being blamed as the primary cause of the performance woes, and AMD admits very little of PD's performance increase is coming from branch predictor improvements, where exactly is PD getting its IPC improvements from?


Gamer, it's already been said AMD is also improving other aspects as well. Plus, look here:
http://www.tomshardware.com/forum/335164-28-trinity-4100

Trinity is already 10% faster per clock compared to Bulldozer, and this is without L3 cache. With this info alone we can hint that IPC will go up! There are going to be other improvements as well that will help performance, such as a higher clock speed. If AMD prices their parts right, Intel will have competition. Will it hurt their i7s or high-end i5s? No, I don't think it will.


Edit: look here as well

Piledriver_core_improvements.png
 
I've run quite a few tests of this very issue on my system (Opteron 6234, four channels of DDR3-1866) ... there isn't much benefit from having RAM that's much faster than 1333 or 1600 with non-APU CPUs.


Wait, are you telling me that some people here actually know the truth? Well, that's a first! :kaola:

This is what I was saying: there are some programs that do benefit, and usually they are multithreaded ones, such as WinZip.
 