AMD Piledriver rumours ... and expert conjecture

We have had several requests for a sticky on AMD's yet-to-be-released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post a question or information relevant to the topic; anything else will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame-baiting comments about the blue, red, and green teams and they will be deleted.

Enjoy ...
 
AMD announced that Piledriver chips will give a 10-15% increase in IPC, but I think Piledriver will still lag behind the Core i5s, because if you look at a price-to-price comparison of the i3 2100 and the A8, the i3 comes out on top in gaming. I'm asking because I'm going to buy a budget gaming mouse, either the Logitech G300 or the Razer Abyssus. If I buy an AMD system I'd like all my system colors to be red, so I'm waiting for the FX Piledriver benchmarks before I even purchase a mouse.

Thing to note with those Trinity benchmarks is that Trinity has to worry about energy efficiency, not to mention sharing half its die with an IGPU and having no L3 cache. Vishera won't have to worry about those things, so 15% is easily doable and maybe 20% could even be a possibility. If it comes out at 15-20%, it will be within a reasonable distance of Intel, and so long as they don't price them stupidly like they did with the 8150 at nearly $300, it can be very competitive. If they bring their flagship out at $200, they've got something. $200 for a very overclockable 8-core processor that is only around 10% slower than Intel would not be a bad deal. Vishera doesn't have to be better, just competitive.


I'm not iffy, I just doubt I'll get past 4.2GHz 😛

If it's a minor upgrade, I won't be able to justify the $160 price tag of an FX 8120, in the same way I know everything I want to play runs fine on my HD 4770 @ 1080p, high-med settings.

It depends on what you're running now. I needed 4.3 to get my FX to perform better than my 4GHz 1090. If you've got something like an X4 running in the mid 3's, then 4.2 on an FX would indeed be a decent upgrade. Might not be worth $160 though, unless you count the coolness of having a shiny new toy to play with as being worth anything. 😀
 
Do we know when AMD will be using the GPU as a floating point processor? I think that is going to be with Steamroller, but I haven't heard anything on the timing for that yet.

And how are they going to do that, exactly? Never mind the latency issue, you'd need an entire backend API, as you don't want this done automatically on the CPU side. Never mind the resource contention should, for instance, the application in question already be making use of the GPU... [massive GPU bottleneck, anyone?]

So yeah, marketing fluff.
 
And how are they going to do that, exactly? Never mind the latency issue, you'd need an entire backend API, as you don't want this done automatically on the CPU side. Never mind the resource contention should, for instance, the application in question already be making use of the GPU... [massive GPU bottleneck, anyone?]

So yeah, marketing fluff.


+1

The closest we'll get to something like that is OpenCL. AMD's FPU is not why it's a slow performer; most of our apps don't even use the FPU. Again, it's the cutting of resources by 33% compared to Phenom, and also the high latency in the cache.

 
Again, in games, PD won't be faster than SB/IB/BD/X6, simply because you'll run into a GPU bottleneck long before you run into the CPU one. Anyone who is hoping for improvements on the gaming side is going to be disappointed for that reason alone.

Not necessarily. An AMD and an Intel chip will run neck and neck in almost all games at 1920x1080 with a single GTX 570, but crank it up to 2560x1600 or even higher and add a 2nd GTX 570 and you'll see the Intel start to pull ahead. High resolution and multi-GPU configurations are becoming more and more popular, and these systems will see the benefits of a faster processor.
 
I would say it might be something close to how QuickSync and OpenCL operate, a mixed bag inside that API. On one side, you'll have the API capable of getting resources to do specific tasks, and on the other side, you'll have hardware ready to help speed up the overall process.

That's why AMD's idea is not a bad one at all. It might not be the best of the best for CPU performance as a whole, but the tradeoff in complexity and die space might be worth it for server applications, where you don't need blazing fast FPU performance. It won't be as fast as having a strong FPU inside the CPU for all floating point instructions (AVX in the coming years, it seems), but you can get pretty good performance with a big enough chunk of processing power at the side; just like GPUs do the trick today in HPC.

For me it's still very obscure, but I'm sure more information on how exactly they're planning to do this will come and shed some light on it. Once that information is up, we'll be able to decide if it's indeed worthless (like Larrabee in its time) or something really cool (like the APUs turned out to be).

Cheers!

EDIT: Intel is planning to appeal the EU antitrust ruling (and the USD $1,450M fine) from 2009 in a few weeks/months' time, according to some news around the web, hahaha. I wonder how that will turn out.
 
+1

The closest we'll get to something like that is OpenCL. AMD's FPU is not why it's a slow performer; most of our apps don't even use the FPU. Again, it's the cutting of resources by 33% compared to Phenom, and also the high latency in the cache.

GamerK is incorrect in his assumptions. He's thinking about replacing SIMD SSE instructions with GPU-native instructions. What he didn't know, or conveniently forgot to mention, is that AMD's current generation GPUs (GCN) are capable of executing integer and SSE instructions. They simply haven't exposed those functions to the OS or the CPU yet. The next iteration will incorporate more CPU-like functionality, though they'll never get the massive decoder units that plague current x86 CPUs. Something else that wasn't talked about much is that the next generation of APU will allow the iGPU and CPU to talk in the same memory space and to read each other's memory contents.

While trying to use your PCIe GPU to do SIMD instructions would be a bad idea, replacing the CPU's SIMD execution unit with a GPU's array would be an advancement. There is no latency involved or API mess, as it's on the same die. A SIMD instruction comes in (SSE / AVX / XOP / MMX / etc.), and the instruction dispatcher / scheduler then sends it to the on-die GPU / SIMD array to be processed. Thus you're replacing the onboard FPU from the Pentium Pro / MMX days with a GCN (or whatever is next) SIMD array that may or may not double as a GPU. Also remember that SIMD instructions are by definition highly parallel in nature; doing two, three, six, or sixteen of them simultaneously isn't a problem provided you have a big enough array available.

This is the natural evolution of SIMD computing. Try to imagine that BD uArch picture, now remove the center "256-bit shared FPU" and replace it with a connection to the iGPU (APUs) or to a smaller SIMD array (iGPU without display processing components).
 
Before anyone gets their blue / green panties in a bunch, this isn't just about what AMD's doing. Intel is also taking the above-mentioned approach. The execution units inside the HD2/3/4K are basically just SIMD units. Eventually those execution units will replace Intel's own FPU. This is also why nearly every Intel CPU being made has an on-board iGPU, even if it's disabled. Eventually Intel will integrate those as a back-end "FPU" via SIMD arrays.
 
I think it's the way of the future. AMD is putting most of its energy into doing well in the APU and mobile market, as that's where the money is. Intel is doing the same thing. Ivy Bridge came out with barely a 5% improvement in speed over Sandy, but the IGPU improved a good bit. I think we enthusiasts are going to be playing second fiddle from now on, as they can make a ton more cash dominating the APU and mobile market than they can keeping PC gamers and overclockers happy. That's why I think it's so important for us to have a competitive CPU market, and why it's so important that Piledriver come out swinging.
 
I was talking about Trinity, and in this regard, yes, memory speed is holding back GPU performance on the APU without a single doubt.

For the GPU, to a point, yes. But ATI GPUs have always used faster memory, and DDR3 is a major drop from GDDR5. 20GB/s compared to my 288GB/s (I have my memory at 1500MHz right now) is a large change.

But I meant for CPUs. And still, APUs will benefit from stacked RAM before they will even from DDR4, as even DDR4 is only supposed to hit 51GB/s (about the same as the theoretical limit of SB-E), and I don't think we will see AMD CPUs with DDR4 until at least 2014 at the earliest.
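For reference, those figures fall out of the usual bus-width x transfer-rate arithmetic (this assumes dual-channel DDR3-1333 on the CPU side and a 384-bit GDDR5 card like the 7970 on the GPU side):

dual-channel DDR3-1333: 2 channels x 8 bytes x 1333 MT/s ≈ 21.3 GB/s
384-bit GDDR5 @ 1500MHz: 48 bytes x (1500MHz x 4 transfers per clock) = 288 GB/s

That's the ~20 vs 288 GB/s gap in a nutshell.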
 
Not necessarily. An AMD and an Intel chip will run neck and neck in almost all games at 1920x1080 with a single GTX 570, but crank it up to 2560x1600 or even higher and add a 2nd GTX 570 and you'll see the Intel start to pull ahead. High resolution and multi-GPU configurations are becoming more and more popular, and these systems will see the benefits of a faster processor.



I agree with gamerk's remark. An AMD rig will not really be much faster than an Intel rig in gaming, and to be honest I don't think AMD will ever be as good as Intel in gaming, which DOES favor IPC and clock speed over multiple threads.

A GPU is still the bottleneck in most situations. Except for a few games, Intel and AMD will usually both give you good performance at the most common resolution, which is 1080p. And with multiple screens the video card becomes a HUGE bottleneck, yes, even 7970 GHz Editions in CF or two 690s in SLI.

Reading this thread, 90% of everyone's hopes for Piledriver seem to be about gaming, and I really don't understand why. AMD will not beat Intel in this area, but I expect their 4-core Piledrivers to pack quite a punch in price/performance. Then again, AMD likes to price things a little too high at first, like with their Radeon HD 7000 series. No, I'm not talking about the 7950/7970; I'm talking about the 7770, which on release should have been $129.99-139.99. Whatever department decides pricing at AMD should be fired.

If the rumors are true about the 4-core Piledriver being clocked at 4.4-4.5GHz, and Piledriver really is 10-15% faster per clock on top of the efficiency improvements AMD made, that 4-core will be great. If the clock speed rumors are right, this processor will beat the crap out of the 980 in gaming and 90% of benchmarks.

Now, if AMD can get it out on time (ha ha ha).
 
For the GPU, to a point, yes. But ATI GPUs have always used faster memory, and DDR3 is a major drop from GDDR5. 20GB/s compared to my 288GB/s (I have my memory at 1500MHz right now) is a large change.

But I meant for CPUs. And still, APUs will benefit from stacked RAM before they will even from DDR4, as even DDR4 is only supposed to hit 51GB/s (about the same as the theoretical limit of SB-E), and I don't think we will see AMD CPUs with DDR4 until at least 2014 at the earliest.


Stacked RAM will be a huge benefit; someone would have to be crazy to disagree. AMD had better hope it's not as great as we think, since Intel is much closer to AMD on this one.

Before anyone gets their blue / green panties in a bunch, this isn't just about what AMD's doing. Intel is also taking the above-mentioned approach. The execution units inside the HD2/3/4K are basically just SIMD units. Eventually those execution units will replace Intel's own FPU. This is also why nearly every Intel CPU being made has an on-board iGPU, even if it's disabled. Eventually Intel will integrate those as a back-end "FPU" via SIMD arrays.

I just don't see it replacing the FPU completely, but I must say I'm pretty excited about how OpenCL improves performance, and for things that don't demand low latency, using the GPU's resources is efficient.

I can't wait for OpenCL support on Handbrake!
 
I agree with gamerk's remark. An AMD rig will not really be much faster than an Intel rig in gaming, and to be honest I don't think AMD will ever be as good as Intel in gaming, which DOES favor IPC and clock speed over multiple threads.

A GPU is still the bottleneck in most situations. Except for a few games, Intel and AMD will usually both give you good performance at the most common resolution, which is 1080p. And with multiple screens the video card becomes a HUGE bottleneck, yes, even 7970 GHz Editions in CF or two 690s in SLI.

Reading this thread, 90% of everyone's hopes for Piledriver seem to be about gaming, and I really don't understand why. AMD will not beat Intel in this area, but I expect their 4-core Piledrivers to pack quite a punch in price/performance. Then again, AMD likes to price things a little too high at first, like with their Radeon HD 7000 series. No, I'm not talking about the 7950/7970; I'm talking about the 7770, which on release should have been $129.99-139.99. Whatever department decides pricing at AMD should be fired.

If the rumors are true about the 4-core Piledriver being clocked at 4.4-4.5GHz, and Piledriver really is 10-15% faster per clock on top of the efficiency improvements AMD made, that 4-core will be great. If the clock speed rumors are right, this processor will beat the crap out of the 980 in gaming and 90% of benchmarks.

Now, if AMD can get it out on time (ha ha ha).
*sigh*

[Image: L4DCoreScaling.png — Left 4 Dead core scaling results]
 
Viridiancrystal

Benchmark using 8 cores and then benchmark using 4 cores; use the Affinity setting inside Task Manager.

I did this for GTA4 and I saw that the game used 3 cores, but the 4th core didn't make much of a difference, not even 1FPS. Then I did the same for some other games.
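If you'd rather not click through Task Manager every run, here's a minimal C sketch of the same trick using the Win32 SetProcessAffinityMask call (the 0x0F mask meaning "first four logical cores" is just an example value):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Each bit in the mask is one logical processor:
     * 0x0F = cores 0-3, 0xFF would be cores 0-7. */
    DWORD_PTR mask = 0x0F;

    if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Pinned to cores 0-3; run the benchmark pass now.\n");
    return 0;
}

On Windows 7 you can also get the same effect from a command prompt with something like: start /affinity F game.exe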
 
And how are they going to do that, exactly? Never mind the latency issue, you'd need an entire backend API, as you don't want this done automatically on the CPU side. Never mind the resource contention should, for instance, the application in question already be making use of the GPU... [massive GPU bottleneck, anyone?]

So yeah, marketing fluff.
And when you have an IGPU and a discrete GPU ... people who worry about GPU bottlenecks aren't using the IGPU, so why not put it to use?

http://www.anandtech.com/show/5831/amd-trinity-review-a10-4600m-a-new-hope/7

This is where AMD is headed, whether you like it or not, and it works.
 
I just don't see it replacing the FPU completely, but I must say I'm pretty excited about how OpenCL improves performance, and for things that don't demand low latency, using the GPU's resources is efficient.

Maybe you're getting your instruction sets confused. When we say "FPU" we're usually referring to the SIMD execution unit inside modern CPUs. "FPU" instructions are 80-bit x87 and are very old. The FPU was originally a co-processor that was used to execute specialized math functions on values that had a floating decimal place. Doing such math in integer registers is cumbersome and slow, thus the 80-bit FPU could do those transactions much faster. It was expensive and only used in specific circumstances. With the advent of "gaming", the use of floating point math became more common, and the 486s had their FPUs integrated into the CPU die instead of being a separate co-processor.

Today we have 64- and 128-bit Single Instruction Multiple Data (SIMD) units. SIMD was a way to do math on multiple data sets using a single instruction. Adding A to B, C, D in sequence is comparatively slow when you can add A to B, C, D in a single go; no need to clear registers, do multiple memory reads, or PUSH / POP on the stack. Early demand for multimedia services pushed the creation of what we now recognize as modern SIMD units (SIMD had been around a long time but was used for specialized computing tasks). MMX and 3DNow! were the initial x86 implementations of SIMD, though they were separate from the x87 processing unit. Flash forward, and SSE is faster than both MMX and x87. Execution units are now SIMD-native and merely emulate the processing of x87 FPU instructions.
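To make the "add A to B, C, D in a single go" idea concrete, here's a tiny C sketch using SSE intrinsics (the variable names are made up for illustration, but _mm_set1_ps / _mm_add_ps are the real compiler intrinsics behind the ADDPS instruction):

#include <stdio.h>
#include <xmmintrin.h> /* SSE intrinsics */

int main(void)
{
    float a = 2.0f;
    float bcd[4] = { 1.0f, 2.0f, 3.0f, 4.0f };

    __m128 va   = _mm_set1_ps(a);        /* broadcast: [a, a, a, a]      */
    __m128 vbcd = _mm_loadu_ps(bcd);     /* load all four floats at once */
    __m128 vsum = _mm_add_ps(va, vbcd);  /* one ADDPS does all four adds */

    float out[4];
    _mm_storeu_ps(out, vsum);
    printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);
    return 0;
}

One instruction does what a scalar loop would need four separate adds (plus the extra loads and stores) to accomplish.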

So when we say "FPU" we're really referring to SIMD instructions, not legacy 80-bit x87 (though they're still used). GCN can process SSE SIMD instructions along with general integer operations; no need for another ISA or special compiler support. GPUs started as raster accelerators but have since evolved into very powerful SIMD array processors. They've since eclipsed the combined x87 FPU + SSE SIMD units that are inside our BD / Core CPUs. The next step would be to fuse those SIMD units into the CPUs directly and cut out the now-useless FPU. In essence your iGPU becomes your FPU, no change in instruction sets required. Whereas before your program would issue an SSE4.1 instruction to the CPU, and the CPU would issue it to the SIMD FPU, now the CPU would issue it to the iGPU interface. The iGPU would then process it and send it back.

What does this mean?

Right now every "core" on a chip has its own SIMD FPU unit. They will remove all of those, and each core will utilize the iGPU's array. Remember, GPUs are the equivalent of 12~30+ FPUs, so even with eight cores sharing one SIMD iGPU there is plenty of power to go around. It's also a more efficient use of processing resources.

Honestly the "FPU" has already been replaced by SIMD units. We just refer to them both by the same word.

How is this different from OpenCL / GPGPU?

OpenCL / GPGPU are APIs that allow a software program to offload code to a special co-processor, namely the video card. You can use this to send work to your dGPU or iGPU. There is latency and it's not seamless; your program must be written specifically for those languages. What I was referring to previously is using the iGPU's SIMD arrays to process SSE / AVX / XOP / etc. instructions rather than individual FPU SIMD units. Now that GPUs have progressed to the point where they're just giant SIMD arrays, it's become a waste of silicon to include an individual SIMD FPU in every core on a CMT die. The i5-2500K, for example, has four SIMD FPU units; Intel could remove those and instead have each core dispatch SIMD FPU instructions to the HD2/3/4/5K unit. The BD / PD 81xx has four large FPU units, each comprised of two smaller FPUs; that is eight SIMD FPUs that could be removed and instead have the instructions sent to the SIMD array unit.
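For contrast, here's roughly what the OpenCL path looks like from the host side: a minimal vector-add sketch, assuming an OpenCL SDK is installed (error checking omitted for brevity). Note how much API plumbing and buffer copying sits between your data and the GPU's SIMD array:

/* minimal OpenCL vector add -- build against an OpenCL SDK */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void vadd(__global const float *a,"
    "                   __global const float *b,"
    "                   __global float *c) {"
    "    int i = get_global_id(0);"
    "    c[i] = a[i] + b[i];"
    "}";

int main(void)
{
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f; }

    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);

    /* data has to be copied across the API boundary -- this is the
     * latency/overhead being discussed above */
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(a), a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(b), b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof(c), NULL, NULL);

    clSetKernelArg(k, 0, sizeof(da), &da);
    clSetKernelArg(k, 1, sizeof(db), &db);
    clSetKernelArg(k, 2, sizeof(dc), &dc);

    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof(c), c, 0, NULL, NULL);

    printf("c[10] = %.1f (expect 12.0)\n", c[10]);
    return 0;
}

Compare that with the in-core SIMD path described above, where the same add is a single instruction with no API in sight.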
 
FX-8350 with Piledriver cores @ 4GHz stock! Will it finally be worth upgrading my 1100T?
uther39: I'm in the same boat as you. I bought a 990FX mb (Asus Sabertooth) in the hope that BD would be a big leap from my Phenom II 965. When it wasn't, I snatched an 1100T for a good price and I'm quite happy. Not a slouch, but not up with my Intel 2500K chips. Will the 8350 FX PD be the answer? I'll wait for the tests.
 
Viridiancrystal

Benchmark using 8 cores and then benchmark using 4 cores; use the Affinity setting inside Task Manager.

I did this for GTA4 and I saw that the game used 3 cores, but the 4th core didn't make much of a difference, not even 1FPS. Then I did the same for some other games.
Well, Left 4 Dead has no in-game benchmark, but I'll give you this: it was using about 30% of my CPU in that state, give or take two or three percent. That considered, moving it down to four cores shouldn't show any change, because four cores is still half the chip, more than the ~30% the game is actually using. I would have to shove it down to 2 cores (only 25% of the chip) to see any CPU bottleneck.

Put simply: that is a big GPU bottleneck. If the game is "thread jumping", there isn't really any way of showing how many cores it uses.
 
Well, Left 4 Dead has no in-game benchmark, but I'll give you this: it was using about 30% of my CPU in that state, give or take two or three percent. That considered, moving it down to four cores shouldn't show any change, because four cores is still half the chip, more than the ~30% the game is actually using. I would have to shove it down to 2 cores (only 25% of the chip) to see any CPU bottleneck.

Put simply: that is a big GPU bottleneck. If the game is "thread jumping", there isn't really any way of showing how many cores it uses.


If you use FRAPS to benchmark while playing and then go to Task Manager to lock the game to only 4 cores, you will be able to see whether it's really using 8 cores or not. Benchmark twice: once using 8 cores, then using only 4.
 