AMD Piledriver rumours ... and expert conjecture

Reynod · Oct 27, 2011

We have had several requests for a sticky on AMD's yet-to-be-released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post a question relevant to the topic, or information about the topic, or it will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame-baiting comments about the blue, red and green team and they will be deleted.

Enjoy ...

gamerk316 · Sep 5, 2012

sarinaide :

15% performance + 15% performance != 30% performance gain.

So yeah, lower expectations right now.

viridiancrystal · Sep 5, 2012

gamerk316 :

I feel that porting should not be made easier, as it only encourages putting even less effort into PC ports.

gamerk316 :

Nope. If it is two stages, then the first is 15% on 100% (the start). The second would be 15% on 115%, which would put you at about 132%. Doesn't seem like much, but if you keep going, five stages later you're looking at 175% of the original by your math, and about 200% by my math 😛. Big deal.
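A quick way to check the two readings of stacked gains: additive (each stage adds 15% of the original baseline) versus compounded (each stage adds 15% on top of the previous result). A minimal sketch, assuming every stage is a uniform 15% improvement:

```python
def additive(gain: float, stages: int) -> float:
    """Total performance if every stage adds `gain` of the ORIGINAL baseline."""
    return 1.0 + gain * stages


def compounded(gain: float, stages: int) -> float:
    """Total performance if every stage adds `gain` on top of the previous stage."""
    return (1.0 + gain) ** stages


for n in (1, 2, 5):
    print(f"{n} stage(s): additive {additive(0.15, n):.0%}, "
          f"compounded {compounded(0.15, n):.0%}")
```

Two stages come out at about 132% compounded versus 130% additive, and five stages at about 201% versus 175%, which matches the numbers in the post.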

esrever · Sep 5, 2012

de5_Roy :

That laptop has been out for months; why is Fud reporting it now?

de5_Roy · Sep 5, 2012

Samsung readies first 13-inch Trinity ultrathin
http://www.fudzilla.com/home/item/28621-samsung-readies-first-13-inch-trinity-ultrathin
HP has one based on the same APU (the Sleekbook)
http://www.newegg.com/Product/Product.aspx?Item=N82E16834158633

gamerk316 · Sep 5, 2012

viridiancrystal :

Question: if the console is using a PC OS and PC hardware, then guess what? It's a PC. The Xbox Next IS a PC, just one you can't configure, running a slimmed-down version of Windows specific to the console. Porting should be trivial to accomplish, with minimal performance loss.

Consoles are going the way I predicted: hardware-locked PCs running a proprietary OS. That puts the PS4/WiiU in a very odd spot, quite frankly, since you'd need significantly more work to port to those platforms...

viridiancrystal · Sep 5, 2012

gamerk316 :

That's the point. Even my PC is going to trump any console, by far, not just in performance but also in features. If you can simply change a little code and port a game to the PC market, developers will have little incentive, even less than now, to take advantage of what the PC has to offer.

gamerk316 · Sep 5, 2012

Just making another point on consoles, while we're on that topic:

I've said many times that you code for consoles differently than you do for a PC. Here's a real example: on another forum, a member noted that in Madden NFL 13 there is a limit to how many yards can be recorded in a single game: 1023. Since yards can be negative, the obvious conclusion is that a signed 11-bit integer was used to store in-game yardage (10 magnitude bits give 2^10 - 1 = 1023, plus a sign bit; in two's complement the full range is -1024 to +1023).
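The report only says the counter tops out at 1023; it doesn't say whether the game clamps at the cap or lets the value overflow, so both behaviors are sketched below. The 11-bit width follows the poster's inference, and all names here are hypothetical, not anything from Madden's code:

```python
YARDS_BITS = 11                                  # hypothetical width inferred from the 1023 cap
YARDS_MIN = -(1 << (YARDS_BITS - 1))             # -1024 in two's complement
YARDS_MAX = (1 << (YARDS_BITS - 1)) - 1          # 1023


def wrap_to_bits(value: int, bits: int) -> int:
    """Reinterpret `value` as a two's-complement integer `bits` wide (raw overflow)."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):                      # sign bit set -> negative
        value -= 1 << bits
    return value


def add_yards_saturating(current: int, delta: int) -> int:
    """Add a play's yardage, clamping to the field's range instead of wrapping."""
    return max(YARDS_MIN, min(YARDS_MAX, current + delta))


print(YARDS_MIN, YARDS_MAX)                      # -1024 1023
print(add_yards_saturating(1000, 80))            # 1023
print(wrap_to_bits(1000 + 80, YARDS_BITS))       # -968
```

A hard stop at exactly 1023, rather than yardage suddenly going negative, would point to clamping.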

Think about that: who in their right mind would use a custom 11-bit integer rather than a standard 16-bit integer for data storage? But you save 5 bits, which on consoles you need. On a PC, you don't care; you just declare 'int yards' and are done with it, because who cares about those 5 bits?
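Those 5 saved bits only pay off if something else gets packed into them. A minimal sketch, assuming a hypothetical 5-bit companion field (`flags`) stored alongside the 11-bit yardage in one 16-bit word; nothing in the post says Madden actually does this:

```python
def pack_stat(yards: int, flags: int) -> int:
    """Pack a signed 11-bit yardage and an unsigned 5-bit field into one 16-bit word."""
    if not -1024 <= yards <= 1023:
        raise ValueError("yards out of 11-bit signed range")
    if not 0 <= flags <= 31:
        raise ValueError("flags out of 5-bit range")
    return ((yards & 0x7FF) << 5) | flags        # upper 11 bits: yards, lower 5: flags


def unpack_stat(word: int) -> tuple[int, int]:
    """Inverse of pack_stat: recover (yards, flags) from a 16-bit word."""
    yards = (word >> 5) & 0x7FF
    if yards & 0x400:                            # sign bit of the 11-bit field
        yards -= 0x800
    return yards, word & 0x1F


word = pack_stat(-7, 3)
print(hex(word), unpack_stat(word))              # 0xff23 (-7, 3)
```

On a PC you would normally just use two separate integer fields and never write this shift-and-mask code.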

On another topic: is 'int' 16 bits? 32? 64? OS dependent? It's defined by the compiler and platform ABI: the C standard only guarantees at least 16 bits, and on both Win32 and Win64 it's 32 bits (Win64 widens pointers, not int). But on a PC with 2GB+ of RAM, you don't care. On a console, where you may have just over 200MB to play with, you do.
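Rather than guessing, you can ask the platform. A minimal sketch using Python's standard `ctypes` module, which reports the sizes of the C types on the ABI the interpreter was built for. Typical results on current PCs: int is 32 bits everywhere, while long is 32 bits on 64-bit Windows (LLP64) and 64 bits on 64-bit Linux (LP64):

```python
import ctypes

# Sizes in bits of the C integer types on the ABI this Python build uses.
c_types = {
    "short": ctypes.c_short,
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "long long": ctypes.c_longlong,
    "pointer": ctypes.c_void_p,
}
sizes = {name: ctypes.sizeof(t) * 8 for name, t in c_types.items()}

for name, bits in sizes.items():
    print(f"{name:>9}: {bits} bits")

# What the C standard actually guarantees, regardless of platform:
assert sizes["int"] >= 16 and sizes["long"] >= 32 and sizes["long long"] >= 64
assert sizes["short"] <= sizes["int"] <= sizes["long"] <= sizes["long long"]
```

If the exact width matters (as it does for a packed console stat), C code would use the fixed-width types from <stdint.h>, such as int16_t, instead of plain int.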

That's why there are often significant code changes that end up making the PC release LESS optimized than the console version: who wants to keep that 11-bit integer type around when an int16 will suffice on a PC? It's also why a game that runs fine on a PS3, with just over 211MB of RAM available, often requires more than 2GB on a PC. That's due to coding inefficiencies and to having an OS with a more abstract memory management scheme.

That's why I don't like the Xbox Next on principle: you are going to see console games coded exactly the way PC games would be. That yards value would be coded as a 16-bit integer, even though you don't need the full range an int16 provides. That won't be a problem at release, but two to three years later, when the hardware is dated, I expect the next generation of consoles to fall flat on their face because of these coding inefficiencies. On a PC, we'd upgrade the CPU/GPU; on a console, you can't.

-Fran- · Sep 6, 2012

Well, if you want the detail, the real size of "int" comes from the CPU and its platform ABI, haha. It usually tracks the natural register width, though on 64-bit x86 it stays at 32 bits.

And consoles have a major advantage for a programmer: a closed environment. It's a real PITA to program for *every* hardware combination around. On consoles, you don't have that problem... At least, not that much, haha.

And anyway, weren't games FPU-heavy?

Cheers!

fazers_on_stun · Sep 6, 2012

davemaster84 :

According to the roadmaps Haswell is due out in about 6 months - March of next year. And Steamy not until Q4 of next year or about 13-15 months from now. So if Steamy gets delayed a bit, due to all the changes AMD appears to be making to it, then it could be up against Bridgewell instead of Haswell.

IMO Steamy might catch up to SB or maybe even IB, if they do all the fixes for BD that have been mentioned. And I doubt PD will catch up to SB. But we shall see when PD is released, maybe next month. 😉

-Fran- · Sep 6, 2012

I think Jaguar will be a very good surprise to counter Haswell in the ULV area. And then, AMD can focus on getting Steamroller to not-suck-moose-gonads.

Cheers! 😛

fazers_on_stun · Sep 6, 2012

de5_Roy :

IIRC, a discrete GPU's rated power draw covers the whole card, including its 1-3 GB of DDR3 or GDDR5 memory, which, being overclocked, probably draws a lot more juice than typical DDR3 main memory (under 10W, I think). An APU's figure, however, shouldn't include the memory's power draw, since it uses main memory outside the chip.

davemaster84 · Sep 6, 2012

jaguarskx :

Perhaps I should have explained myself a little better. When we talk about improvement, it's necessary to specify whether it's an overall estimate or about just one aspect. I meant gaming, and probably video encoding. Right now I'm pretty sure my Phenom II is only about 5% behind a 2500K in games. I'm not sure about the rest, but I think PD may match SB there.

davemaster84 · Sep 6, 2012

fazers_on_stun :

OMG, Haswell is around the corner! Anyway, I just hope AMD releases Piledriver in October as planned; otherwise those 350 bucks I'm saving for the 8350 won't last much longer 😛

dattimr · Sep 6, 2012

fazers_on_stun :

The other day I found this very interesting article on the performance increase going from Nehalem to Ivy Bridge (which means three CPU generations, or three Ticks and two Tocks):

http://ixbtlabs.com/articles3/cpu/intel-ci7-123gen-p1.html

Overall, the average clock-for-clock gain was a less-than-impressive 10%. Therefore, with all the focus Haswell apparently puts on graphics, there seems to be a good chance for AMD to catch up with either Haswell or Bridgewell.
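Spread over several generations, that 10% is even thinner per step. A quick sketch, assuming the per-generation gains compound and trying both two and three generational steps, since the thread counts them differently:

```python
def per_generation_gain(total_gain: float, generations: int) -> float:
    """Average per-generation improvement that compounds to `total_gain` overall."""
    return (1.0 + total_gain) ** (1.0 / generations) - 1.0


for gens in (2, 3):
    print(f"10% over {gens} generations ~ "
          f"{per_generation_gain(0.10, gens):.1%} per generation")
```

That works out to roughly 4.9% per step over two steps, or about 3.2% over three.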

jdwii · Sep 6, 2012

dattimr :

Nice post, thanks. Looks like it isn't much; maybe Intel doesn't care about CPU performance that much anymore, and it seems to be only about power consumption and die-size reductions. Maybe AMD was right when they said 2010-2020 will be mostly about efficiency. If you look at 2000-2010, performance went WAY 😱 up!

sarinaide · Sep 6, 2012

Basically, all the "engineering samples", benches and die shots are either fake or irrelevant. The results being bandied around are from post-production BD cores, in which AMD tried to extract whatever performance remained in the chip prior to Piledriver engineering. As for the die shots and results, I think we can all be **** sure they are photoshopped.

AMD has Piledriver core layout slides on its website; these so-called "FX 8350" reviews basically show BD core layouts.

bonds2034 · Sep 6, 2012

OK, so my biggest question is: will the new Trinity line be able to compete with Intel, or is this just going to be another budget CPU? I was disappointed when Intel dropped L3 in the Celeron, so will not having any L3 hurt this processor?

sarinaide · Sep 6, 2012

Trinity is an APU in mobile and desktop trim. It will still be a budget setup, but the improvements over Llano are significant; that said, we will only know Trinity desktop performance once it launches, which should be relatively soon.

gamerk316 · Sep 6, 2012

dattimr :

Which makes sense. There really aren't too many ways left to increase efficiency on a per-clock basis. Why do you think extensions like AVX(2) are getting so much attention now?

But of course, given how most programs tend not to scale well, and clock speed is limited by heat, you realize we're fast running out of ways to increase CPU performance.

mayankleoboy1 · Sep 6, 2012

AMD has a much better solution:

Make a product that's slower than the previous generation. Then spend the next 4 generations 'improving' the mistakes. And lo, in each gen you have a 20% increase. So simple 😛

truegenius · Sep 6, 2012

^ more core 😉

And lo, in each gen you have a 20% increase.

hindi

gamerk316 · Sep 6, 2012

http://semiaccurate.com/2012/09/06/a-brief-look-at-amds-steamroller-core/

So Charlie thinks that Steamroller will be the big massive improvement, and PD will basically be a minor stepping. We'll see, but the logic makes sense...

fazers_on_stun · Sep 6, 2012

gamerk316 :

From the article:

At Hot Chips 24, AMD’s Mark Papermaster gave a keynote speech that had a few technical tidbits in it. Let’s take a look at two of these in particular, the Steamroller core and high density libraries.

There was a lot more to the speech, but since that is marketing, buzzwords, and related fluff, we will spare you a rehash of it. The only phrase that you really need to know is “Surround Computing”, AMD’s term for computing all around you, hopefully transparently. It is not just a five monitor game of Generic FPS #12: Wallet Lightening DLC Conveyance Addendum played in a dark room to tan both ears at once. Expect to see Surround Computing used a lot in future messaging from AMD.

Back to the interesting stuff, the Steamroller. If you recall, AMD’s cores are named Bulldozer, Piledriver, Steamroller, and Excavator. Bulldozer is out on the market in FX and Opteron guises, and Piledriver came out as the core in Trinity. The next variant is Steamroller, and that won’t come out until either Kaveri or the 2013 Opterons/FX chips break cover. Bulldozer was a radical architectural change from the status quo, it had a shared front end, shared FPU, and two distinct integer units that were somehow called ‘cores’. Piledriver cleaned up a lot of what made Bulldozer underwhelm, but the fundamental problems that hamstrung Bulldozer didn’t go away.

If you recall, that shared front end was supposed to be fast enough to feed both cores without bottlenecking either one. It wasn’t. It was supposed to have so much capacity that when one core was idle, the second would positively fly. It didn’t, but it did fall less flat with one unit idle, far less flat. The shared front end did the silicon equivalent of what that guy in the hockey mask does to wayward teenagers wandering outside of that cabin in the woods…..

The second revision called Piledriver fixes a lot of little problems, but can’t touch the architectural ones. If you think of Piledriver as Bulldozer 1.5, that is a far better description than a complete redo, it is simply evolutionary. A lot of things were cleaned up, and the most major change seems to be adding two MMX pipes to the FP unit. In the end, a lot of small bottlenecks were opened up, but that shared front end is still picking off the teenagers who went looking for their comrade, you know, the one that Bulldozer’s decoder got.

That brings us to the latest addition to the line, Steamroller, on paper it fixes a lot. Steamroller is the Bulldozer we were hoping to get a year and a half ago. Had it come out in 2011 instead of 2013, it very well might have set the world on fire, but it didn’t. Steamroller is the one kid of the group that makes it out of the forest alive. Why? Take a look at the front end, and compare it to the two prior architectures.

There are two things to note, the dropping of one MMX pipe in the FPU, and the two decoders in the front end. The one that matters is of course the decoders, and it explains why the teenager reading computer architecture books made it out of the forest without being strangled, it fixes _THE_ major problem in Bulldozer. No longer are the cores strangled. In theory. Let’s wait for silicon before we celebrate, someone in a hockey mask could still pop out of the cake in the last scene.

In a world where CPU architecture people would kill for a full percentage gain in the front end, and one or two fractional percentage gains from different areas are considered a clear win, AMD is claiming a 30% gain in ops delivered per cycle and 25% more max-width dispatches per thread. In short, they did the obvious, and it did the obvious, but 30% is a massive gain that is hard to overstate.

If nothing else gets in the way to hamstring performance, and at this point we would be fairly surprised if something did, then Steamroller should bring about a massive performance gain in single threaded code. To make things better, it is unlikely to fall flat when the second core in a pair is doing something strenuous like hosting a solitaire game. On paper, this is what we have been waiting for.

That brings up the other point, 30% is borderline crazy for an increase, especially one that directly relates to performance. The decoders were the main bottleneck in the architectural paradigm up to this point, so most of that should carry over to the end user on single threaded code. The problem? What was the starting point again? Oh yeah, not so hot. 30% increase in IPC from a current Intel core would be greeted with blank stares and incredulous looks from people who understand the tech. 30% from Bulldozer’s starting point is just enough to get AMD back in the game. That said, it’s about time.

So Steamy not starving the pipelines means they would increase IPC to about what SB or maybe IB have...

fazers_on_stun · Sep 6, 2012

dattimr :

Good find - thanks.

A couple things I noted - both the core and uncore clocks were held at 2.4GHz for all the CPU generations. So the improvements in the front ends of SB and IB to keep the pipes filled at their native or oc frequencies go for naught when underclocked like that. Would have liked to see the same tests done at 3.0 GHz, assuming Nehalem's uncore could be clocked higher or those of SB/IB downclocked to match.

Also, since IB is just a tick of SB, it's really 2 generations of CPU architecture, maybe 2.1 generations if ya wanna get exact 😛..

Finally, if it had been AMD marketing doing the graphics, they wouldda chopped each graph off at either the 100% mark or maybe the 120% mark, to make the differences seem a lot bigger than reality 😀..

cgner · Sep 6, 2012

dattimr :

Very interesting article, thanks for the link. My buddy has a C2D @ 3 GHz, and I keep telling him it's plenty powerful for today's games, because they hardly need more than two cores and a C2D has plenty of per-core power at 3 GHz.
