mayankleoboy1 :
But all these things are already using parallel computing on CPU, and some on the GPU too.
AFAIUI, the goal of HSA is to make even the sequential stuff parallel.
CPUs are not designed with parallel workloads in mind. Throwing more cores at them simply hides the scaling problem.
Back in the '80s, MIT tried to build a computer with over 60,000 CPUs. They stopped at 1,000, because they found that no matter what they did, software would not scale beyond a few dozen cores.
Some applications are reasonably parallel: loading PowerPoint slides, database management, and so on. When the objects within a program do not depend on other objects within the program, you can get very good scaling (around 80% efficiency) almost indefinitely as you add cores.
Most applications, however, are not designed like that, and as a result do not scale particularly well. You can parallelize PARTS of an application to an extent, but the overall program control remains sequential (do A, B, C, wait for results, do D and E based on the results of A, B, and C, wait for results, then do F with the final data set). This limits your scalability, and you drop off after about a dozen cores.
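That control-flow shape can be sketched in code. This is just an illustration: the stage functions (a, b, c, d, e, f) are made-up stand-ins for arbitrary chunks of work, not any real API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workload stages: a, b, c are independent of each other,
# d and e need the results of all three, and f needs the results of both.
def a(): return 1
def b(): return 2
def c(): return 3
def d(x, y, z): return x + y + z
def e(x, y, z): return x * y * z
def f(u, v): return u + v

def pipeline():
    with ThreadPoolExecutor() as pool:
        # Parallel phase: A, B, C can run concurrently.
        fa, fb, fc = pool.submit(a), pool.submit(b), pool.submit(c)
        x, y, z = fa.result(), fb.result(), fc.result()  # wait for results
        # Second phase: D and E depend on everything above.
        fd, fe = pool.submit(d, x, y, z), pool.submit(e, x, y, z)
        u, v = fd.result(), fe.result()  # wait again
    # Final sequential step: F combines the final data set.
    return f(u, v)

result = pipeline()
```

Note that no matter how many cores you have, at most three tasks (then two, then one) can ever run at once, so the extra cores just sit idle at each wait.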
Why are GPUs used for rendering? Because after you create a 3D geometry, most of the computations become independent down to the pixel level. And given that you have a few million pixels to compute, a 200-core architecture with weak cores will outperform a 4-core architecture with much more powerful cores. Have a GPU do general-purpose computing, however, and you see applications running slower than on a CPU, because those weak cores, combined with a serial workload, kill performance.
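That per-pixel independence is the key property, and it can be sketched with a toy "shader" (this is not real GPU code; `shade` is a made-up function, and a thread pool here only illustrates the independence, not actual GPU throughput):

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 640, 480

def shade(pixel):
    # Toy per-pixel computation: it depends only on this pixel's own
    # coordinates, never on any other pixel's result, so every call
    # is independent and can run concurrently with all the others.
    x, y = pixel
    return (x * 31 + y * 17) % 256

pixels = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]

# Because no call depends on another, a pool of workers (or a GPU with
# thousands of weak cores) can execute them in any order, in parallel,
# and still produce the same image.
with ThreadPoolExecutor() as pool:
    image = list(pool.map(shade, pixels))
```

Contrast this with the sequential pipeline above: here there is no "wait for A before starting B", which is why adding hundreds of weak cores actually pays off.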
And this is before hardware is considered. If a thread has to read data from RAM and can't continue for a while, it's going to be preempted by the OS. Threads need to communicate? Another roadblock. And so on and so forth. At the end of the day, hardware reads and writes are sequential operations. That's why you'll never see scaling much above 80% on PCs as currently designed.
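That leftover serial work is exactly what Amdahl's law quantifies. A quick sketch (the 95% figure is an illustrative assumption, not a measurement):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup = 1 / (serial + parallel/cores).
    The serial fraction caps total speedup no matter how many
    cores you add."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even if 95% of the work parallelizes, 12 cores give well under 12x:
print(amdahl_speedup(0.95, 12))      # roughly 7.7x, i.e. ~64% efficiency
# And as cores go to infinity, the ceiling is only 1 / 0.05 = 20x:
print(amdahl_speedup(0.95, 10**9))
```

This is why "drop off after about a dozen cores" is the typical experience: past that point, the serial 5% dominates and each extra core buys almost nothing.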