AMD Piledriver rumours ... and expert conjecture

We have had several requests for a sticky on AMD's yet to be released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post questions relevant to the topic, or information about the topic; anything else will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame-baiting comments about the blue, red, or green teams and they will be deleted.

Enjoy ...
 
Actually, he probably does 😛. You might recall that AMD also said Bulldozer would have IPC improvements over Phenom II, and we all know how that one turned out.
I was never aware of any IPC claims from AMD; I was under the assumption the target per-core performance was Phenom II level and that IPC would be lower. AMD knew this before they launched, and it was part of the design. All the info from AMD pointed toward a big jump in total throughput and multithreaded integer performance.
 
I was never aware of any IPC claims from AMD; I was under the assumption the target per-core performance was Phenom II level and that IPC would be lower. AMD knew this before they launched, and it was part of the design. All the info from AMD pointed toward a big jump in total throughput and multithreaded integer performance.

AMD's original plan was to have higher IPC. Of course, that was the 2007 Bulldozer, the one on 45nm that was to come out after Barcelona, and it had a very different version of what they call CMT. It was supposed to be more like SMT but with more resources per core, in order to give more than the top-end 20% that SMT brings.

While SMT shares most resources and has an execution engine that can throw two tasks into a pipeline at once, BD was to have more dedicated parts. It wasn't originally the 80% figure, if I remember, but then again there wasn't much info to start with, and then it disappeared from the roadmaps for a while when Deneb came out, which became the 45nm CPU for AMD.

AMD's main goal with the current version of Bulldozer was to keep IPC the same and allow for higher clocks. The biggest problem, as Intel found with NetBurst, is that lengthening the pipeline tends to end up with lower IPC than previous designs, though it does allow for higher clocks. Pentium 4s hit 3.8GHz stock at their top. I was stating this, but people dismissed it. PD may be able to fix the IPC deficiency, but it will still be lower compared to SB and even lower compared to IB, meaning they will need higher clocks to sustain a competitive product. The downside is higher power usage, something AMD used to manage very well pre-Core 2.
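
To put rough numbers on the tradeoff (illustrative figures of my own, not anything AMD or Intel has published): per-thread performance is roughly IPC times clock, so a deeper pipeline only wins if the clock gain outruns the IPC loss.

```c
/* Back-of-the-envelope model with made-up numbers: per-thread
 * performance is roughly IPC * clock. A deeper pipeline that costs
 * 20% IPC must clock more than 25% higher just to break even,
 * which is the NetBurst trap described above. */
#include <stdio.h>

int main(void) {
    double base_ipc = 1.00, base_ghz = 3.0; /* assumed baseline */
    double deep_ipc = 0.80;                 /* assume a 20% IPC loss */
    double break_even_ghz = base_ipc * base_ghz / deep_ipc;

    printf("baseline: %.2f instructions/ns\n", base_ipc * base_ghz);
    printf("deep pipeline needs %.2f GHz just to match it\n", break_even_ghz);
    /* Prints 3.75 GHz - and every extra watt spent getting there is
       headroom the competition keeps for free. */
    return 0;
}
```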
 
The pipeline in Phenom was shorter than Sandy Bridge's; the pipeline isn't all that's important when looking at IPC.

No, not only, but it's a major part. If branch prediction sucks, then the pipeline has to be flushed to bring in the correct instructions. That takes time. BD's pipeline is longer than SB's and PhII's, and it seems to have bad branch prediction, among other things.
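
If anyone wants to see the flush cost first-hand, here's a small stand-alone sketch of mine (generic C, not specific to any of the chips discussed): the same loop runs over random data, where the branch mispredicts roughly half the time, and then over sorted data, where it predicts almost perfectly.

```c
/* Demo of branch misprediction cost: with sorted data the branch
 * below is predicted almost perfectly; with random data it
 * mispredicts ~50% of the time and every miss forces a pipeline
 * flush. The longer the pipeline, the more cycles each flush
 * throws away. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 20)

static long sum_if_big(const int *data, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++)
        if (data[i] >= 128)   /* ~50/50 and hard to predict on random data */
            sum += data[i];
    return sum;
}

static int cmp(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

int main(void) {
    int *data = malloc(N * sizeof *data);
    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;

    clock_t t0 = clock();
    long s1 = sum_if_big(data, N);          /* random: many mispredicts */
    clock_t t1 = clock();

    qsort(data, N, sizeof *data, cmp);      /* sorted: branch is predictable */
    clock_t t2 = clock();
    long s2 = sum_if_big(data, N);
    clock_t t3 = clock();

    printf("random: %ld (%.3fs)  sorted: %ld (%.3fs)\n",
           s1, (double)(t1 - t0) / CLOCKS_PER_SEC,
           s2, (double)(t3 - t2) / CLOCKS_PER_SEC);
    free(data);
    return 0;
}
```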

As for SB vs PhII, SB also has the ability to keep instructions resident in the L3 cache to be recalled at any time, while PhII and even FX still have to go to memory, which is much slower.
 
I am going to assume you missed what I wrote there. Sarcasm.

100% of the market does not have access to DDR4, which means making CPUs support it now, when 0% of people would use it, would be an utter waste of time.
When Vishera comes in 2013, no one knows for certain what will be available.
 
Nice job knowing nothing yet claiming you know more than the engineers at AMD. AMD has already said Piledriver will have IPC improvements.
Or do you magically know something AMD doesn't?

AMD said a lot of things about BD performance, few of which turned out to be true. I'm doing exactly what I did with BD: looking at the design decisions made, and projecting performance SOLELY based on that.

Will PD have improved IPC? Probably, due to small optimizations. Anything major? Not really. So when AMD says "IPC has improved" without explaining HOW and WHY, I take a very pessimistic view of how good those IPC improvements will be.
 
And as I noted above, you previously tried to say that SIMD instructions didn't work well in parallel in an attempt to convince people that AMD's Fusion idea was a bad one. That one statement pretty much threw your entire credibility away when it comes to microarchitecture. That you actually thought that and tried to use it in an argument shows the exact level of knowledge you have. And then, based on this knowledge, you say that AMD needs to redesign not their die, nor some specific component, but the ENTIRE architecture.

No, I simply pointed out that, as a whole, software does not tend to scale well. Period. As such, I viewed a massively multicore design as a way to improve overall performance as fundamentally flawed. I can say this because of years of doing SOFTWARE DESIGN. Same concept as why I bashed Intel's Larrabee when it was announced, as I found the concept of using a few serial processors to make a massively parallel chip close to idiotic, in all honesty. If Intel also went in the direction of a massively multicore chip, I wouldn't hesitate to bash them as well.

The fact is, since the majority of software written is not going to scale particularly well, adding more cores is not going to significantly improve performance. Further, since you get diminishing returns, one could argue that adding more cores is actually a waste of die space after a certain point. [I note servers are a different beast, since you tend to have more than one heavy-duty app going at one time. BD would be competitive in the server space, as I've said for a while now.]
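
Those diminishing returns are basically Amdahl's law. A minimal sketch of my own, assuming a made-up 70% parallel fraction:

```c
/* Illustrative sketch of Amdahl's law (my numbers, not measured):
 * if only a fraction p of a program's work can run in parallel,
 * the best-case speedup on n cores is
 *     S(n) = 1 / ((1 - p) + p / n)
 * Even a generously parallel desktop app hits a wall fast. */
#include <stdio.h>

static double amdahl(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    double p = 0.70;  /* assume 70% of the work parallelizes */
    for (int n = 1; n <= 16; n *= 2)
        printf("%2d cores -> %.2fx speedup\n", n, amdahl(p, n));
    /* 1 -> 1.00x, 2 -> 1.54x, 4 -> 2.11x, 8 -> 2.58x, 16 -> 2.91x:
       doubling from 8 to 16 cores buys barely 13% more. */
    return 0;
}
```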

As for the rest of my BD projections, I looked at the Pentium 4 for comparison, since BD used a LOT of design principles similar to the P4 approach [long pipeline for higher clocks at the expense of IPC, a form of SMT, etc.]. That gave me a baseline for the change in performance. Then I looked at how AMD implemented those concepts, and made my conclusions about the drop in performance. PS: I was right.

At the end of the day, I simply find the concept of adding more cores to increase performance fatally flawed on the desktop. On this point alone, I consider BD fundamentally flawed. Combine that with the reliance on high clocks, which are limited by power/thermal constraints, and this leaves BD with very little forward progress to work with. These two problems alone warrant a redesign in my mind, since even if every other problem with BD were magically fixed [which I doubt], BD still wouldn't be as good as SB.

Meanwhile, you make the argument that since BD is modular, AMD can simply swap out the offending components. You fail to ask some fundamental questions, though: Why is the scheduler so bad? Why are cache latencies so high? What design changes need to be done to fix them? What design tradeoffs will result? And most importantly: if AMD could simply have swapped out the offending components without negatively affecting the rest of the processor, why didn't AMD replace those components in the first place?
 
So a module shares all components except the integer units!? Then it is like Intel's HTT but with a separate integer unit.
Is that sharing causing the higher latency!?

(Using AIDA64, I found that the FX-6 has lower memory latency than SB. Also, a 1090T@4GHz with single-channel 1333 CL8 1T RAM shows 42ns, which is the lowest in the AIDA64 benchmark database that I have.)
 
So a module shares all components except the integer units!? Then it is like Intel's HTT but with a separate integer unit.
Is that sharing causing the higher latency!?

It's a bit more than that; HTT is basically a REALLY simple SMT approach, where only the registers and a few other items are duplicated. AMD's CMT approach is much more powerful than HTT, but as has been pointed out by a few people, the cores can't be fed properly.

As for cache latencies, when you have a large, shared cache, you tend to increase the time to traverse all of it. On the plus side, you increase the chance of whatever you are looking for being in the cache in the first place. It's a design tradeoff. Decreasing the size of the cache may improve latencies a bit, but you increase the chance you have to go to main memory, which is even slower. Hence why you can't simply "fix the latency"; you have to make some design change, which will typically involve some form of tradeoff.
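
One way to put numbers on that tradeoff is average memory access time (AMAT). The hit times, miss rates, and miss penalty below are made up purely for illustration:

```c
/* Illustrative AMAT (average memory access time) calculation with
 * made-up numbers, showing the cache tradeoff described above:
 *     AMAT = hit_time + miss_rate * miss_penalty
 * A bigger cache raises the hit time but lowers the miss rate;
 * whether that wins depends on how painful a trip to DRAM is. */
#include <stdio.h>

static double amat(double hit_cycles, double miss_rate, double miss_penalty) {
    return hit_cycles + miss_rate * miss_penalty;
}

int main(void) {
    double dram = 200.0;  /* assumed miss penalty in cycles */

    /* small, fast cache: quick hits, more misses */
    printf("small cache: %.1f cycles\n", amat(20.0, 0.10, dram));
    /* large, slow cache: slower hits, fewer misses */
    printf("large cache: %.1f cycles\n", amat(35.0, 0.04, dram));
    /* 40.0 vs 43.0 cycles here - nudge the miss rates slightly and
       the ranking flips, which is why "just fix the latency" isn't
       a free lunch. */
    return 0;
}
```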

Also, I again point out the complete news blackout that seems to surround PD. Do we even know what changes are being made to it yet, aside from the fact that it will clock higher?
 
AMD's original plan was to have higher IPC. Of course, that was the 2007 Bulldozer, the one on 45nm that was to come out after Barcelona, and it had a very different version of what they call CMT. It was supposed to be more like SMT but with more resources per core, in order to give more than the top-end 20% that SMT brings.

IIRC, just one year ago JF-AMD (VP of AMD server marketing, in case new people here don't recognize the user ID) was posting in the BD thread here and in numerous other threads at Anandtech, AMDZone, etc., assuring everybody that BD's IPC would be higher according to the engineers. Of course, Baron took that and immediately claimed a minimum of 30% higher and maybe 60% higher IIRC - it would surpass not only SB but IB too 😛.

This is the trouble with the new guys - they weren't here a year ago and thus have no memory of past AMD promises and subsequent failures to deliver, which is why many of us now favor Intel, since Intel usually comes much closer to the mark. It's not being a 'hater', just disillusioned after 5+ years spanning the Barcelona-to-Bulldozer era. I think we all acknowledge that AMD hit a home run with K8, and most of us were expecting another with Barcie and felt torpedoed amidships when it finally appeared - late, with extremely low clocks, and way underperforming what AMD's VP Randy Allen said it would be just months before the launch. The TLB bug I discount because it didn't affect most users, but the stupid patch AMD released as a 'quickie' fix, cutting the already-poor performance by another 10%, was just insult piled on top of injury.

And I really don't see much changing at AMD now or in the near future - still the company of much overhype and underachievement in CPUs. If that makes me a "hater", then so be it. At least I see the company through the lens of its present and past performance, rather than latching on to the latest news blurbs and believing the hype. If the AMD lovers want to believe the latest rumors and spend their $$ before independent 3rd-party reviews are out, then good luck with that 😛.

 
Intel wants to offer its 22nm fabs to help others

http://208.65.201.194/news/General-Tech/Intel-becomes-22nm-foundry-yes-other-people

IMO it's more Intel wanting to look at FPGAs (since the other company's product that Intel fabs is an FPGA).

However, I had to laugh at this comment on the article:

February 23, 2012 | 04:03 PM - Posted by Imperfectlink
Perhaps AMD should consider signing on as a customer. ;-)
 
From the front-page article comparing an APU to CPU + discrete, at the total component cost of $140: http://www.tomshardware.com/reviews/pentium-g620-amd-a8-3870k-radeon-hd-6670,3140-12.html

We can hardly be surprised by the outcome. Intel's Pentium G620 and AMD's Radeon HD 6670 achieve roughly 17% slower application performance than a stock A8-3870K. However, they offer roughly the same margin of advantage over AMD's APU when it comes to minimum frame rates in games. Average frame rates favor the discrete graphics card by nearly 25%.

Of course, overclocking helps the APU stretch to almost 40% faster than our Pentium processor, and average frame rates pull within 10% of the Intel-based system. That doesn't quite tell the whole story, though. Because the discrete Radeon's advantage was enough to win our game tests, we didn't overclock it. Almost certainly, a little additional tweaking would have pushed the add-in card's performance further in front of the APU's best effort.

What conclusions can be drawn from this data, then? Clearly, the A8-3870K is a better platform for general productivity, particularly when you run threaded applications (or do a lot of multi-tasking) able to leverage four physical cores. The Pentium G620 and discrete Radeon card combine to form a superior gaming system. We used a $140 budget to create as fair of a comparison as possible, but enthusiasts with a little more money to spend on graphics can get even better performance by dedicating additional funds to that subsystem. Meanwhile, the A8-3870K is already AMD's fastest APU, so there's not much room to scale up.

What about overclocking? If you're a value-seeker, eager to push stock components further, the A8-3870K is a fun toy. Asus' F1A75V-Pro motherboard managed to achieve a 3.3 GHz processor clock and 800 MHz graphics frequency through its automatic overclocking feature, and we managed a 3.6 GHz core clock and 960 MHz graphics setting through our own manual efforts. The result was a notable boost to application performance, along with a gaming speed-up that came closer to matching a stock Radeon HD 6670. Intel simply doesn’t have anything in the same price range able to match the A8-3870K’s blend of graphics performance, capacity for threaded apps, and overclocking headroom. It's just unfortunate that overclocking has such a negative effect on the APU's power consumption.

And how about each platform's upgrade path? This is an especially critical point for gamers. Out of six tested titles, two had to be run at 1024x768 in order for us to present playable performance. It's actually fairly impressive that two low-cost configurations can push 720p in most games at decent frame rates. But if you're serious about entertainment, low resolutions will limit the enjoyment you get out of modern titles. At some point, you'll want to upgrade. The good news is that a $120 graphics card is good enough for a smoother experience at 1080p.

But that's where the A8-3870K loses some of its appeal. As we already established, if you're using a Socket FM1-based motherboard and an A8-3870K, your only upgrade would be to a faster add-in graphics card. In that scenario, the APU basically becomes a $140 Athlon II X4. The upcoming Llano replacement, code-named Trinity, is expected to employ the incompatible Socket FM2 interface, so the -3870K could be as good as it gets on Socket FM1.

Perhaps surprising to critics of Intel's interface evolution, LGA 1155 seemingly has room to grow. Not only can you drop a more powerful multiplier-unlocked Core i5 or i7 into it today, but the upcoming Ivy Bridge-based processors should work with existing motherboards as well.

Of course, the G620 is a 2-core/2-thread CPU and isn't overclocked either. It would be interesting to compare using an i3-2100 for about $40 more, since the latter has Hyper-Threading.
 
The upcoming Llano replacement, code-named Trinity, is expected to employ the incompatible Socket FM2 interface, so the -3870K could be as good as it gets on Socket FM1.
This has me a bit worried. This is probably the fifth time I've read about Trinity being incompatible with Socket FM1. Desktop Llano owners would be left without an upgrade path.
On a positive speculative note, this could mean that AMD is building a PCIe 3.0 controller and a different (hopefully improved) IMC into Trinity. Possible result: the iGPU would be much less bandwidth-starved (check the Sandra memory bench), and it could handle PCIe 3.0 graphics cards well - compared to how FX and Llano handle current mid- and higher-end cards. Mmm... positive speculations...
 
Which APU will be going to FM2? Reading the front-page article comments and all this DDR4 speculation has me wondering if AMD will move to more memory channels to support higher bandwidth for the on-die GPU.

But from what others have said about AMD socket changes, it will be FM2/FM3 before that could happen, correct?
 
The upcoming Llano replacement, code-named Trinity, is expected to employ the incompatible Socket FM2 interface, so the -3870K could be as good as it gets on Socket FM1.
This has me a bit worried. This is probably the fifth time I've read about Trinity being incompatible with Socket FM1. Desktop Llano owners would be left without an upgrade path.
On a positive speculative note, this could mean that AMD is building a PCIe 3.0 controller and a different (hopefully improved) IMC into Trinity. Possible result: the iGPU would be much less bandwidth-starved (check the Sandra memory bench), and it could handle PCIe 3.0 graphics cards well - compared to how FX and Llano handle current mid- and higher-end cards. Mmm... positive speculations...

I'd think most DT users would go with Piledriver and a discrete card unless they are really price-constrained, especially if they already have an AM3+ board and assuming PD will actually drop into AM3+ boards.
 
Chad is pretty dead-on with the memory. As I have said many times before, DDR4 is specced and slated for release in 2014, per Intel, who will probably start using it a year before AMD, if not more.

Intel always moves to new memory technologies upon release, while AMD tends to wait till it's actually affordable.

Considering that the change from DDR3 to DDR4 uses a different approach to how the IMC connects to memory (a single module per channel instead of multiple DIMMs per channel), AMD might not want to wait, as it may show increases that benefit their IGP as well as their server parts.

But still, we have 2 years till it's possibly adopted by Intel, and even then probably 3-4 years until it's as affordable as DDR3 currently is (8GB of nice Corsair is about $50-$60 right now).

So no, PD won't use DDR4. It and many of its successors will use DDR3.

Haswell (1150, due next year) is DDR3-only, and I doubt they will replace 2011 that fast just to be able to use DDR4, but I'm guessing it might be 2015 before DDR4 becomes mainstream. By then AM3+ and 1155 will be long dead, so I don't know what will use it first; maybe AMD will get there, or Intel with its high-end server platform, but dumping 1150 (Haswell) and 2011 so fast won't sit well with most.
 