AMD Piledriver rumours ... and expert conjecture

Reynod · Oct 27, 2011

We have had several requests for a sticky on AMD's yet to be released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post a question relevant to the topic, or information about the topic, or it will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame baiting comments about the blue, red and green team and they will be deleted.

Enjoy ...

palladin9479 · Jun 5, 2012

I was implying that the Phenom is holding those cards back A LOT!
Just because you are getting "enough" FPS does not mean you are not getting held back by the CPU.

Heck, the CPU in my Dragon rig can barely keep the 480/580 fed in most games...
BF3 has quite a few stutter moments in multiplayer on the Phenom, and this is with a single card....

Not saying AMD is bad, I love my AMD rig, but I know which is more suitable for the task (multi-GPU)

It's much more complicated than that. CPUs aren't involved much with rendering; shaders have taken over the last of that. The only way a CPU could be "holding back" a GPU is if the game involved is poorly coded; console ports come to mind here. In that scenario locks and waits are being generated and the cards are being idled while waiting for the CPU to finish waiting on the keyboard buffer to time out, or for a USB driver to generate a response signaling no data. Those may sound trivial, but it's very common with console ports where the console doesn't have a proper I/O driver and the game is coded around having to handle I/O wait on its own.

Other than that, it's really not possible for a CPU to actually "hold back" GPUs, at least not without some serious playing with settings which tend to generate ridiculously high frame rates, aka your 90+ claim. Otherwise all the post processing and fancy lighting effects are more than enough to keep any GPU busy @1920x1080 with 2~4xAA.

Now to explain 3D gaming and how it relates to SLI. In most games you don't get 1:1 scaling with multi-GPUs; this is known to many enthusiasts. Software issues and timing conspire to prevent that 2nd GPU from being fully utilized. Thus even an i7 will "hold back" a multi-GPU setup in the sense that you won't get full utilization on all GPUs. The exception is 3D gaming.

In 3D gaming you're rendering each scene twice, using the exact same set of 3D data. The CPU does have some work to do, but this isn't done by the game and is instead done by the graphics driver in interpreting the graphics calls and computing the angles of the camera. It runs as part of the driver and thus can be put on different "cores". These calculations do not require anywhere near a full core's worth of processing power. Because the game is not required to feed the graphics card two sets of data, there are no further requirements CPU side (other than the aforementioned calculations on additional cores). Simply put, a 3D setup operating on an SLI rig will operate at the same frame rates as a non-3D setup on a single card rig (assuming all cards are the same).

In short, I do not experience any stuttering or slowdown with my PII + 2x580 Hydros. I'm already at 60+ fps stable, and that's per-eye (so technically 120fps). Remember this is a 3D gaming setup, not two cards running on a non-3D setup. Would putting in an i5 make it run 65~70+ fps (3D)? Possibly. Would that make any difference at all to me? No. Any game that would run at less than 60fps (per eye) is already GPU bound and wouldn't see much if any additional performance. It would be a waste of money to do so. Actually LGA1156 would technically be a downgrade platform-wise; Intel seems dead set on only including 16 PCIe lanes. I was hoping IB would have 32 lanes, but I was disappointed. Maybe my next platform revision next year will include an Intel chip; it all depends on what happens.

-=Note=-
3D Gaming by definition requires V-Sync to be enabled, typically with triple buffering. This is why I say 60+ because the actual possible frame rate is not known as it's locked to 60. There used to be a problem where in some scenes you'd get hit by a high graphics load that would bring you under 60, with V-Sync it would then wait an entire refresh cycle to display the lagged frame (30fps for a split second). NVidia's recent adaptive v-sync has largely fixed this issue.
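
The 30fps dip described in the note follows from how plain double-buffered V-Sync works: a frame that misses a refresh has to wait for the next one. A minimal Python sketch of that arithmetic, assuming a 60Hz refresh and a constant render time per frame. The function name is made up for illustration, and triple buffering and adaptive V-Sync behave differently and are not modelled here.

```python
import math

def vsync_fps(render_ms: float, refresh_hz: float = 60.0) -> float:
    """Displayed frame rate under plain double-buffered V-Sync.

    Assumes every frame takes render_ms to draw and can only be shown
    on a refresh boundary, so it occupies a whole number of refreshes.
    """
    refresh_ms = 1000.0 / refresh_hz
    # Small tolerance so a frame that exactly fits one refresh is not
    # pushed to the next one by floating-point rounding.
    refreshes_per_frame = max(1, math.ceil(render_ms / refresh_ms - 1e-9))
    return refresh_hz / refreshes_per_frame

for ms in (10.0, 16.0, 17.0, 30.0, 34.0):
    print(f"{ms:5.1f} ms per frame -> {vsync_fps(ms):.0f} fps shown")
```

Under these assumptions a frame that takes 17 ms instead of 16 ms is enough to halve the displayed rate, which is the split-second drop the note is talking about.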

jimmysmitty · Jun 5, 2012

Well I have to have the game...
Unfortunately I do not own it 🙁

I do in fact have 200+ games on Steam though...
Metro 2033, Resident Evil 5, etc. etc.

Too bad. It's a great game. I love the bullet time, always have, but the detail is amazing. Only problem is you need a lot of VRAM for it. At 2560x1600 maxed out and 8x MSAA, it wants 6GB of VRAM.

And palladin, not sure why you would be disappointed that IB only has 16 lanes when it's PCIe 3.0, which is going to be equal to 32 lanes of PCIe 2.0, and we all know that PCIe 2.0 is still not saturated and probably won't be for a few more GPU generations.

Either way, no mainstream Intel platform will have 32 lanes, as I don't think Intel will do that. If you wanted the dual GPU setup you can do a single GTX590 or GTX690 and get the same results on a full x16 slot. Of course that may change if Intel kills the "enthusiast" grade setup, but I doubt they will.

palladin9479 · Jun 5, 2012

And palladin, not sure why you would be disappointed that IB only has 16 lanes when it's PCIe 3.0, which is going to be equal to 32 lanes of PCIe 2.0, and we all know that PCIe 2.0 is still not saturated and probably won't be for a few more GPU generations.

Because all older (as in last generation) cards are PCIe 2.0, which won't receive any improvement from a 3.0 controller. Within the last year games have started saturating PCIe 2.0 x8 links; it's a rather recent development. Without a 32 lane PCIe controller I'd be taking a step backwards. Next year when I do a refresh cycle I'll most likely pick up 3.0 cards and then the whole problem goes away. 680 Hydros look really nice, but I dropped $700 on each of my current ones last year and want to stretch the investment out to 2 years at least.

sarinaide · Jun 5, 2012

Run efficiently...
Handle Multi GPU (400-600 series Nvidia)
(5000-7000 series AMD).

"More efficient" is a sales gimmick that makes not one bit of difference overall, and for an enthusiast efficiency is kind of like kissing my own sister.

I have run 6970s in CrossFire and currently run a 6970+6990 tri-fire setup, and I don't have any problems. Probably another Nvidia pitch, like Intel telling the misinformed that their chips can do everything the competitors can't, but oh well.

seriously dude.
overclocking feats should be more than the highest number as in clock speed.
overclocking AMD just makes it on par with stock Intel units in terms of IPC / data crunching.
same as HD Radeon

too bad in reality it's not all about clock speed....

Again, the point I made was in the gist of the argument that AMD have forsaken the ardent enthusiast's needs. I then drew light to the fact that professional and competitive overclocking is about achieving the highest stable clocks, not synthetics. Engineering-wise, AMD have structurally sound processors that can take infinitely more punishment and still work.

These may seem irrelevant to those that disregard the overclocking community, but again, to some it is a positive factor.

Just for fun, let me try to restate some of the issues with Bulldozer in regard to performance, from most important to least important, based on client/workstation performance, not server.

Longer pipeline = higher clock speed, which is something AMD failed to deliver
Branch prediction/prefetcher
L2 cache
L1 cache (WTH did they share it?)
2x ALUs and 2x AGUs for Bulldozer vs 3x ALUs/AGUs for the Phenom per core. Plus 2 IPC per core for BD vs 3 IPC for the Phenom, BUT the Bulldozer can handle these operations more efficiently than the Phenom.
CMT only has 80% scaling per core when the Phenom had 93% scaling (so good!)
The longer pipeline, which is not that big of a deal since the branch prediction was supposed to overcome this, but it does hint at what AMD was trying to do and it does cause some small latency issues.
How Windows handles CMT (based on Windows 8 vs 7 benchmarks this is only a 5-10% increase in performance and usually less than 5%)
You can easily see this in these articles below

This article suggests the L2/L1 cache is part of the problem.
http://www.extremetech.com/computing/100583-analyzing-bulldozers-scaling-single-thread-performance/2

Then if we take a look at this article we can see CMT can only scale 80%, so its multithreaded performance is also lower than a TRUE 8 core Phenom's would be.

http://www.legitreviews.com/article/1741/11/
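
As a rough illustration of the scaling argument, here is a small Python sketch. It assumes "80% scaling" means the second core in a module adds 0.8 of a core's throughput when both cores are busy; that reading, and the function name, are assumptions made for illustration rather than anything the linked articles define.

```python
def cmt_throughput(modules: int, second_core_scaling: float) -> float:
    """Throughput in 'single-core equivalents' for a CMT chip.

    Each module has two cores; the first counts as 1.0 and the second
    adds second_core_scaling (assumed reading of '80% scaling').
    """
    return modules * (1.0 + second_core_scaling)

fx_8150 = cmt_throughput(modules=4, second_core_scaling=0.80)
ideal_8_core = cmt_throughput(modules=4, second_core_scaling=1.00)

print(f"4 modules at 80% scaling: {fx_8150:.1f} core-equivalents")
print(f"8 fully independent cores: {ideal_8_core:.1f} core-equivalents")
print(f"Shortfall: {(1 - fx_8150 / ideal_8_core) * 100:.0f}%")
```

On that reading, an 8-core FX behaves like roughly 7.2 independent cores before clock speed and per-core IPC differences are taken into account.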

Then the Bulldozer is only clocked at 3.6GHz, which is lower than the Phenom 975/980 and is only 8% higher than the 1100T.

So what did AMD improve in Piledriver?
Clock mesh should improve clock speeds/power consumption. GlobalFoundries most likely improved their 32nm process as well, meaning Piledriver might be made on a newer stepping, which might make Piledriver a little smaller than Bulldozer was. GlobalFoundries should also have more Piledriver processors at launch than it had Bulldozer at launch, which I hope means AMD will have lower prices.
L1 cache has also been tweaked
The prefetching and branch prediction have been improved as well
Scheduler has also been improved

This most likely leads to a modest 15-20% boost in performance compared to the Bulldozer at stock, with IPC being around 7-10% better

What AMD left out and we hope they improve with "Steamroller"
Die shrink; even 28nm would be nice
L2 cache/L1 cache
2x ALUs and 2x AGUs for Bulldozer vs 3x ALUs/AGUs for the Phenom per core. Plus 2 IPC per core for BD vs 3 IPC for Phenom
Branch prediction/prefetcher
CMT only has 80% scaling, so I would like to see a 10 core Steamroller

So this concludes what I personally think is wrong with Bulldozer, some of the improvements they made with Piledriver, and the dreams I have for Steamroller.

Now what did AMD improve on when it comes to the Bulldozer vs the Phenom?
Number one is support for newer instruction sets (this is easily a performance boost in some areas). The memory controller is on par with the original i7 series. The FPU is better, since there are only 4 of them in BD vs 6 in the Phenoms and Bulldozer still beats the Phenom FPU in some cases. L3 cache speeds/size. Turbo Core. (Scratching my head right now.) I guess higher clock speeds with the FX-4170.
Since the Phenom had higher scaling and Bulldozer had lower performance per cycle, the Bulldozer is usually only 10% faster in multithreading while being 10-15% slower in single-core tasks, despite having a 9% higher clock speed with turbo on.

Agree with this. I am still going to say that BD was a relatively obsolete improvement on PII regardless of what is said. PII had successes in low-thread performance but dropped off drastically in multithreaded performance, and memory too; the FX is much stronger than PII in that regard. But again, we already know how strong FX is at high thread counts: an 8150 can match, hang with or beat a more expensive 3770 where HT is supported, and where HT is not supported, or with any form of firmware/software support, the FX is significantly better. Again, a fact overlooked in pursuit of the old and tedious IPC rhetoric.

I wouldn't judge what Haswell's graphics could do yet. However, I will say that ~30% from 3K to 4K was probably not a great sign for Intel. 300,000 extra transistors from SB to IB, most of it was for IGP, and they get 30%. I doubt they will get away with making the chip size too big, so I don't see them improving much more than 25-30% on Haswell either, if they spend more than a few minutes on the cpu improvements.

Advantage: AMD
GCN is already out, in full force. Drivers will be ready to take advantage of the GCN stream processors more than Intel's for their equivalent (IE's, I think. Probably not.) GCN is a fantastic arch, and it almost seems like it was designed for use in APU's.

Overall, based on what I've seen from Intel out of Ivy, I think AMD will be ahead in graphics performance for quite a while going forward.

I really don't see how HD4000 is impressive. With Lucid MVP assistance it is 30% better, but in general it's maybe worth 3-4 FPS over HD 3000 in a range of games that are regarded as mainstream. All are completely unplayable at any decent resolution. I don't think Haswell will be revolutionary in the IGPU stakes: it is not like the Fusion design, it will suffer the limitations of Intel pushing die shrinks, and drivers are non-existent. It is not like you can just make a GPU and drivers without the requisite experience in that field... the ATI acquisition gave AMD all the trump cards in this department. Steamroller, should all go to plan, will be a revolutionary heterogeneous chip, probably regarded as the first of its kind to combine a performance CPU with higher-end GPU performance. Haswell will still be a CPU trying to replicate a GPU; there are copyrights and other conformities that Intel will need to acquire to enter the GPU fray, so I can't see it being a blockbuster.

For the "power efficiency" mantra folk...AMD's fusion architecture will alleviate that a lot, when you can reduce the power consumption needs of a discrete card you have less power draw and heat....ultimately a good idea.

palladin9479 · Jun 5, 2012

The problem with Intel IGPs isn't one of hardware, and people need to realize that. Drivers are easily 50% of the equation; look at how much work ATI / NVidia put into their drivers. These two entities have been writing high performance 3D drivers for over a decade each, and this experience isn't something that can be accelerated. Intel has to build its own program database for tweaking and tuning, and they need to work through all the bugs, glitches and other artifacts introduced with any new 3D chip. They need to develop experience in writing 3D acceleration drivers.

It's drivers that have Intel behind ATI / NVidia, not hardware.

sarinaide · Jun 5, 2012

@ sarinaide - you do realize that depending on my mood swings.. 😗 I like to be a little 'Dennis the Menace' type.
and just like messin' with people....

unless I'm saying something for real, and I will state that (for real).....
otherwise I'm just having fun...
😛

I do like terrorizing 'rekon-uk' however and his 'gold plated' GTX 480's...
😗

Are you saying I am mr Wilson?....the missus doesn't think I am that old 😛

Recon loves the gold, and he loves the 480's....I like them too...you could make scrambled eggs at your PC, good ole Fermi fires.

gamerk316 · Jun 5, 2012

It's much more complicated than that. CPUs aren't involved much with rendering; shaders have taken over the last of that. The only way a CPU could be "holding back" a GPU is if the game involved is poorly coded; console ports come to mind here. In that scenario locks and waits are being generated and the cards are being idled while waiting for the CPU to finish waiting on the keyboard buffer to time out, or for a USB driver to generate a response signaling no data. Those may sound trivial, but it's very common with console ports where the console doesn't have a proper I/O driver and the game is coded around having to handle I/O wait on its own.

I'd LIKE to think most devs are at least smart enough to use Windows messaging and simply act on any I/O input, rather than spending "x" amount of time waiting to see if the user pressed a button. No reason to have a buffer to handle I/O, period. The days of the MFC message pump are long since dead.

I get a message that a button is pressed by the user. I then check to see if that button is mapped to any particular command. If it is, I execute that command. If not, I dump the message. Simple. I/O should be handled via the OS, and NOT the developer.
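
The flow described above can be sketched in a few lines of Python. This is a toy illustration, not Windows messaging code: the queue stands in for the OS message queue, and the key bindings and command names are invented for the example.

```python
import queue

# Hypothetical commands and key bindings, invented for illustration.
def jump() -> str:
    return "jump"

def fire() -> str:
    return "fire"

BINDINGS = {"SPACE": jump, "MOUSE1": fire}

def handle_message(key: str):
    """Run the command mapped to key, or drop the message (returns None)."""
    command = BINDINGS.get(key)
    return command() if command else None

def pump(messages: queue.Queue) -> list:
    """Process messages until QUIT, blocking on get() instead of busy-polling."""
    results = []
    while True:
        key = messages.get()  # sleeps until a message arrives
        if key == "QUIT":
            return results
        outcome = handle_message(key)
        if outcome is not None:
            results.append(outcome)

inbox = queue.Queue()
for key in ("SPACE", "F13", "MOUSE1", "QUIT"):
    inbox.put(key)
print(pump(inbox))
```

The unmapped key is simply discarded, and the loop spends no cycles while the queue is empty, which is the contrast with a game that spins waiting to see whether a button was pressed.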

Other than that, it's really not possible for a CPU to actually "hold back" GPUs, at least not without some serious playing with settings which tend to generate ridiculously high frame rates, aka your 90+ claim. Otherwise all the post processing and fancy lighting effects are more than enough to keep any GPU busy @1920x1080 with 2~4xAA.

You would think, but it IS possible. Physics is one example, where it's possible for it to be more expensive to perform on the CPU than rendering is on the GPU. But in general, I agree that with most everything about rendering being moved to shaders, you really shouldn't have a CPU bottleneck until you crank down the graphical settings to their bare minimum.

Now to explain 3D gaming and how it relates to SLI. In most games you don't get 1:1 scaling with multi-GPUs; this is known to many enthusiasts. Software issues and timing conspire to prevent that 2nd GPU from being fully utilized. Thus even an i7 will "hold back" a multi-GPU setup in the sense that you won't get full utilization on all GPUs. The exception is 3D gaming.

In 3D gaming you're rendering each scene twice, using the exact same set of 3D data. The CPU does have some work to do, but this isn't done by the game and is instead done by the graphics driver in interpreting the graphics calls and computing the angles of the camera. It runs as part of the driver and thus can be put on different "cores". These calculations do not require anywhere near a full core's worth of processing power. Because the game is not required to feed the graphics card two sets of data, there are no further requirements CPU side (other than the aforementioned calculations on additional cores). Simply put, a 3D setup operating on an SLI rig will operate at the same frame rates as a non-3D setup on a single card rig (assuming all cards are the same).

Agree. Note that *most* game profiles for CF/SLI use AFR, where one frame is computed by GPU #1 and the next by GPU #2, so there shouldn't be any increased CPU bottleneck, as the CPU isn't doing any more work. The only thing that changes is how many FPS the GPU(s) can pump out.
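
A minimal sketch of the alternate-frame idea, assuming a simple round-robin assignment. Real drivers also deal with frame pacing and inter-frame dependencies, which are ignored here, and the function name is only illustrative.

```python
def afr_schedule(frame_count: int, gpu_count: int) -> list:
    """Assign frames to GPUs round-robin: frame i goes to GPU i % gpu_count."""
    return [frame % gpu_count for frame in range(frame_count)]

# The same six frames are submitted either way; only the GPU that
# renders each one changes.
print(afr_schedule(6, 1))  # single card
print(afr_schedule(6, 2))  # two-card AFR
```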

In short, I do not experience any stuttering or slowdown with my PII + 2x580 Hydros. I'm already at 60+ fps stable, and that's per-eye (so technically 120fps). Remember this is a 3D gaming setup, not two cards running on a non-3D setup. Would putting in an i5 make it run 65~70+ fps (3D)? Possibly. Would that make any difference at all to me? No. Any game that would run at less than 60fps (per eye) is already GPU bound and wouldn't see much if any additional performance. It would be a waste of money to do so. Actually LGA1156 would technically be a downgrade platform-wise; Intel seems dead set on only including 16 PCIe lanes. I was hoping IB would have 32 lanes, but I was disappointed. Maybe my next platform revision next year will include an Intel chip; it all depends on what happens.

As long as FPS is stable, you can get away with 45 FPS. I only start looking at FPS if I get obvious stuttering, which with a single GTX 570 paired with a 2600k, I don't typically get. And as stated, because games are GPU limited, you typically don't get significant FPS increases when you increase CPU power. RTS's are the only type of game where that might not be the case...

-=Note=-
3D Gaming by definition requires V-Sync to be enabled, typically with triple buffering. This is why I say 60+ because the actual possible frame rate is not known as it's locked to 60. There used to be a problem where in some scenes you'd get hit by a high graphics load that would bring you under 60, with V-Sync it would then wait an entire refresh cycle to display the lagged frame (30fps for a split second). NVidia's recent adaptive v-sync has largely fixed this issue.

Agreed. Adaptive V-sync was LONG overdue. Great move by NVIDIA to finally handle the 30FPS Vsync problem. Though I note an even easier solution is a native 120Hz monitor [making Vsync more or less a non-issue 😀 ]

4745454b · Jun 5, 2012

Without a 32 lane PCIe controller I'd be taking a step backwards.

How so? 8x/8x PCIe 3.0 is 16x/16x PCIe 2.0. That's why PCIe 3.0 is important. And why it's bad AMD is delaying PCIe 3.0 on their boards. (or what Intel can do that AMD can't)
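
The x8 3.0 versus x16 2.0 equivalence can be checked with a little arithmetic. A short Python sketch using the published per-lane signalling rates and line encodings for PCIe 2.0 and 3.0; the function and table names are illustrative, and the figures are theoretical one-direction maxima, not measured throughput.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency.
PCIE_GEN = {
    2: (5.0, 8 / 10),      # PCIe 2.0: 8b/10b encoding
    3: (8.0, 128 / 130),   # PCIe 3.0: 128b/130b encoding
}

def link_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    gigatransfers, efficiency = PCIE_GEN[gen]
    # One transfer carries one bit per lane; divide by 8 for bytes.
    return gigatransfers * efficiency / 8 * lanes

for gen, lanes in ((2, 8), (2, 16), (3, 8), (3, 16)):
    print(f"PCIe {gen}.0 x{lanes}: {link_bandwidth_gbs(gen, lanes):.2f} GB/s")
```

By this calculation a 3.0 x8 link comes out at about 7.88 GB/s against 8.00 GB/s for 2.0 x16, so "equal" is a close approximation rather than an exact match.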

And the video work might be "done in shaders now", but those shaders can still only process data as fast as it's sent to them. If you have the powerful cards, you'll need a powerful enough CPU to send them data fast enough. You'll only stall them out if they have to wait for data from the CPU.

gamerk316 · Jun 5, 2012

And the video work might be "done in shaders now", but those shaders can still only process data as fast as it's sent to them. If you have the powerful cards, you'll need a powerful enough CPU to send them data fast enough. You'll only stall them out if they have to wait for data from the CPU.

But here's the thing: we are barely saturating PCIe 2.0 x8 right now, let alone x16. Bandwidth isn't the problem. And shaders are basically really weak individual CPUs. A 192 shader GPU is basically a CPU with 192 really weak cores. They aren't going to be waiting on the CPU any time soon.

That being said, with GPUs basically already general purpose processors, it doesn't make much sense to NOT allow them their own direct access to main memory. In theory, it should be perfectly possible to run an entire OS directly via shader programs. [Granted, it might be slow as sin, but it should be possible].

-Fran- · Jun 5, 2012

Wasn't GPU memory mapped to the regular memory space as RAM? And when in SLI, you had to page big chunks of that memory to go into the other card?

Am I out of date on how GPUs work today? XD

Cheers!

Cazalan · Jun 5, 2012

I don't think Haswell will be revolutionary in the IGPU stakes: it is not like the Fusion design, it will suffer the limitations of Intel pushing die shrinks, and drivers are non-existent. It is not like you can just make a GPU and drivers without the requisite experience in that field...

It will be if it includes the embedded RAM to go along with it.

gamerk316 · Jun 5, 2012

Wasn't GPU memory mapped to the regular memory space as RAM? And when in SLI, you had to page big chunks of that memory to go into the other card?

Am I out of date on how GPUs work today? XD

Cheers!

A good read here that I found on VRAM and addressing:

http://www.gamedev.net/topic/596716-gpu-and-system-ram/

skaughtz · Jun 5, 2012

Trinity desktops delayed until October.

http://www.pcauthority.com.au/News/303748,computex-2012-amd-trinity-desktops-delayed-until-october.aspx

^@*!&# :fou:

-Fran- · Jun 5, 2012

A good read here that I found on VRAM and addressing:

http://www.gamedev.net/topic/596716-gpu-and-system-ram/

Well, it was a little outdated in parts.

They're still mapping it, but it seems it should not be a concern in practice.

Cheers! xD

Cazalan · Jun 6, 2012

Trinity desktops delayed until October.

http://www.pcauthority.com.au/News/303748,computex-2012-amd-trinity-desktops-delayed-until-october.aspx

^@*!&#

Shouldn't Vishera (Piledriver) be out before then?
Not a big deal IMHO. Trinity is for laptops.

jdwii · Jun 6, 2012

Yeah, I really go out of my way to cripple things....
I just laughed a david bowie out of my ass....

Funny that my Phenom II rig

HERE:

AM3 Dragon.
============

Phenom II x4 B55@4.2GHZ 1.475V
MSI 790FX-GD70 mainboard
Gainward Geforce GTX 580 Phantom ED OC edition
4GB Geil Black Dragon DDR3 1600mhz CAS8
CoolIT ECO ALC 2x Coolermaster Sickleflow 120mm Push/Pull
Hitachi Deskstar 7200RPM 1TB
Xigmatek Utguard chassis
Corsair TX650W Power supply

Scores less than my 2500k system here:

Sandy Bridge.
==============

i5 2500k@4.5GHZ 1.3V
EVGA Z68 SLi
EVGA GTX 480 2-way SLi
Gelid Icy Vision rev 2.0 x2 for both 480's
8GB G.Skill RipjawsX DDR3 1600MHz CAS9
Corsair H100 4x Phobya G.silent 120mm Push/Pull
2x Samsung Spinpoint F3 7200RPM 1TB RAID-0 (media/games)
Crucial M4 64GB (Windows)
Silverstone Raven RV02-B chassis
Silverstone Strider 1000watt 80 plus silver Power supply

And yes... i run without SLi, as my new motherboard is..... just... a pain..
Games run much smoother on the SB based rig...

Scores? CPU or GPU? Well, if you only play at 1080p then yeah, because the 7970 is overkill and the 480 is just as good as the 570, which is perfect for 1080p, and if you have a 1.5GB version then you will even max Skyrim.

This statement is a contradiction. You can not state that AMD can't do SLI, then claim you're not using SLI.

Also can not use "smoother" as that's subjective and based on perception. If you wanted to see the AMD system skip / lag then you'll see it, whether it exists or not. If you wanted to see a single card i5 run "smoother" than a dual card PII, then you'll see it regardless of the actual truth. Just like people claim to recognize single images at 8.3ms.

I'm living, breathing proof that a PII @4.0GHz or higher (got mine to 4.5 but I don't like running it that hot) can run two 580's in SLI just fine. SC2 is about the only game that ends up being CPU bound, and only because it's got terrible core usage in single player (someone should do a multiplayer benchmark to see if it's like BF3).

Even though I agree somewhat, Tom's did an article proving that an AMD setup can be a bottleneck with two cards. I think it was one 5870 on an Intel machine vs two 5870s on an AMD machine. Sorry, I can't find the article.

just you....
just you.

Mal Mal Mal.....LOL

Piledriver will be AMD's Intel destruction. Modules optimized to be better than anything ever made.

AMD is going to win with Trinity and Piledriver. IPC and high clocks galore.

Oops..wrong account.

Yes, Intel will do that, and I'm sure Piledriver will be delayed until the 4th quarter since Trinity is coming out in the 3rd quarter.

I was addressing the laughable claim that the clock speeds a CPU attains when overclocked under LN2 are a meaningful positive in how well that CPU is going to sell to the general public.

That's it, that's all I was saying. There is no need to bring in completely unrelated matters.

Yes Chad, I know, I know..... 😀
Just thought I would troll you a bit since most people usually jump all over you.

esrever · Jun 6, 2012

Shouldn't Vishera (Piledriver) be out before then?
Not a big deal IMHO. Trinity is for laptops.

I'm guessing they are all going to come with Windows 8.

palladin9479 · Jun 6, 2012

I'd LIKE to think most devs are at least smart enough to use Windows messaging and simply act on any I/O input, rather than spending "x" amount of time waiting to see if the user pressed a button. No reason to have a buffer to handle I/O, period. The days of the MFC message pump are long since dead.

I get a message that a button is pressed by the user. I then check to see if that button is mapped to any particular command. If it is, I execute that command. If not, I dump the message. Simple. I/O should be handled via the OS, and NOT the developer.

Games built for PC, yeah; games ported from console land ... not always. Polling the OS looking for I/O rather than letting the OS signal when it has I/O for you is a ridiculous waste of cycles. It's another of those evils that we've inherited from console gaming. Consoles don't have proper OSes with I/O abstraction; most console games take direct control of I/O handling for performance reasons, and when they port it over to PC they don't always unlearn that part.

You would think, but it IS possible. Physics is one example, where it's possible for it to be more expensive to perform on the CPU than rendering is on the GPU. But in general, I agree that with most everything about rendering being moved to shaders, you really shouldn't have a CPU bottleneck until you crank down the graphical settings to their bare minimum.

Well, honestly I rarely think about physics, as that's been working on the GPUs for a while now. And yes, I agree that should it not be accelerated it could really eat up CPU cycles, especially if the developer wasn't smart enough to run it in its own thread, separate from everything else.

As long as FPS is stable, you can get away with 45 FPS. I only start looking at FPS if I get obvious stuttering, which with a single GTX 570 paired with a 2600k, I don't typically get. And as stated, because games are GPU limited, you typically don't get significant FPS increases when you increase CPU power. RTS's are the only type of game where that might not be the case...

Yep, SC2 is the only game that I don't max out the 580's on. It's much more CPU dependent than most games you'll be running into.

Agreed. Adaptive V-sync was LONG overdue. Great move by NVIDIA to finally handle the 30FPS Vsync problem. Though I note an even easier solution is a native 120Hz monitor [making Vsync more or less a non-issue 😀 ]

I've got a 120Hz monitor, haha; the issue was in 3D mode, when your refresh rate is locked to 60 per eye. You need V-Sync for 3D gaming to work, and should a huge graphics spike happen it can really mess up the effect, as both eyes have to slow down (the spike is duplicated on both frames). Adaptive V-sync helped tons, although in a different way: it allows each eye to get a slightly different FPS. You really have to be careful with your settings.

palladin9479 · Jun 6, 2012

Without a 32 lane PCIe controller I'd be taking a step backwards.

How so? 8x/8x PCIe 3.0 is 16x/16x PCIe 2.0. That's why PCIe 3.0 is important. And why it's bad AMD is delaying PCIe 3.0 on their boards. (or what Intel can do that AMD can't)

And the video work might be "done in shaders now", but those shaders can still only process data as fast as it's sent to them. If you have the powerful cards, you'll need a powerful enough CPU to send them data fast enough. You'll only stall them out if they have to wait for data from the CPU.

Read it very carefully. You'll notice I have 580's, which are PCIe 2.0 cards. They get absolutely nothing from PCIe 3.0 controllers, thus a SB/IB setup would still run me at 8/8 @ 2.0 speeds. For years we weren't saturating the bandwidth on an 8x slot; in the last year that's started to change, with more and more data being loaded into the card's memory. We had to get past the "PS3/360 limit" days of PC gaming for this to take place. I expect games in the next year to steadily increase their bandwidth requirements.
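
The point about PCIe 2.0 cards in a 3.0 slot comes down to one rule: a link runs at the highest generation both ends support. A small Python sketch using approximate per-lane, per-direction figures; the function name and the table are made up for the example.

```python
# Approximate usable bandwidth per lane, per direction, in GB/s.
GBS_PER_LANE = {1: 0.25, 2: 0.5, 3: 0.985}

def negotiated_bandwidth(card_gen: int, slot_gen: int, lanes: int) -> float:
    """Bandwidth of a link that trains at the lower of the two generations."""
    gen = min(card_gen, slot_gen)
    return GBS_PER_LANE[gen] * lanes

# A PCIe 2.0 card such as the GTX 580 in different slots, then a 3.0 card:
print(negotiated_bandwidth(card_gen=2, slot_gen=3, lanes=8))   # 2.0 card, 3.0 x8 slot
print(negotiated_bandwidth(card_gen=2, slot_gen=2, lanes=16))  # 2.0 card, 2.0 x16 slot
print(negotiated_bandwidth(card_gen=3, slot_gen=3, lanes=8))   # 3.0 card, 3.0 x8 slot
```

With these numbers the 2.0 card gets about 4 GB/s in a 3.0 x8 slot but about 8 GB/s in a 2.0 x16 slot, which is why only a new card makes the 8/8 PCIe 3.0 option attractive.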

My current cards were $700 each (EVGA GTX580 Hydro 2's); that's $1400 + S&H I had to spend for my current graphics solution. I would like to keep that investment as long as possible, meaning I'm going to upgrade next year at the earliest. Going to a SB/IB right ~now~ would be a waste of money: new mobo + CPU for little to no gain, and possibly some loss as new software gets released. Next year when I do a system refresh I'll scope out the landscape and make the decisions I need to make about a new platform. If I buy new GPUs (small chance) then I can accept an 8/8 PCIe 3.0 solution, otherwise I'll need a 16/16 PCIe 2.0 solution. The 970BE is getting long in the tooth; great CPU for its time and extremely flexible. I would have upgraded to an SB last year if Intel was offering a 16/16 PCIe 2.0 solution.

Short Answer:
I'm a forward planner, I try to plan for things that will be around tomorrow not what's happening today.

Shaders:
The CPU does little to no work regarding shaders. At most it parses it through a library (are shaders compiled to binary for direct load?) then sends it to the GPU via DMA.

Many years ago, during the 8-bit ISA era, a concept was introduced where memory transfers could happen directly between system memory and a peripheral device. Normally the system CPU would manually forklift each block of data from memory to device and vice versa; this is expensive and takes way too many cycles (2~3 cycles per block of data). DMA allows a transfer of an entire region at once with only 1 cycle to the CPU; it's about 100~1000x faster and doesn't lock the CPU up. It wasn't commonly used until the 32-bit PCI era, where it was incorporated into the PCI standard rather than being an add-on to the ISA standard.

Thus the CPU doesn't really "feed" shader programs to the GPU, the GPU gets them from memory after the program loads the whole batch in. Computationally very cheap to do.

palladin9479 · Jun 6, 2012

nVidia NF200 chipset..

Thought about it, but it's not available on the boards I wanted and it's got a 50/50 track record. It's just a bridge chip, so you're still limited to 16x total bandwidth from the CPU to the cards. Intel seems to not like the idea of that chip existing. I think they're deliberately keeping the "cheap" CPUs / boards at 16x for marketing / pricing reasons, using the larger PCIe bandwidth of the LGA2011 platform as a selling point to justify the immense markup on those boards / CPUs.

jimmysmitty · Jun 6, 2012

It will be if it includes the embedded RAM to go along with it.

Not only that, but there is also the (rumored) increase in EU count to 40 instead of 16, probably with a few optimizations. Plus AMD can only fit so many SPUs on a die before it becomes too large and is not cost effective to build, at least until 22nm, and last I checked GF wasn't well on their way to 22nm when they had only just introduced 32nm.

Of course it's all yet to be seen.

Trinity desktops delayed until October.

http://www.pcauthority.com.au/News/303748,computex-2012-amd-trinity-desktops-delayed-until-october.aspx

^@*!&#

Wasn't there a report where it was just delayed till September? Now October?

Delays..... what fun.

Possibly a yield issue? Maybe they want a stronger supply to start with?

nVidia NF200 chipset..

I forgot about that thing. Some of the really high end Asus Z68 mobos had them (I think anything above the P8Z68-V Deluxe such as the Rampage or whatever).

palladin9479 · Jun 6, 2012

I have one...
ASUS P8P67 WS REVOLUTION / NVIDIA NF200 (B3)
http://www.newegg.com/Product/Product.aspx?Item=N82E16813131714

http://www.hardocp.com/article/2011/01/05/asus_p8p67_ws_revolution_motherboard_review
http://www.tweaktown.com/reviews/3795/asus_p8p67_ws_revolution_intel_p67_express_motherboard/index.html

no complaints here.. 😉

Now you're making me think about buying one, damn it.

Did Asus release a BIOS update for this board? Heard rumors (*cough*) that Asus wasn't going to support all the new IB chips on their older performance boards.

bartholomew · Jun 6, 2012

AMD to release PD on Oct. 17th?

gamerk316 · Jun 6, 2012

Games built for PC yeah, games ported from console land ... not always. Polling the OS looking for I/O rather than letting the OS signal when it has I/O for you is a ridiculous waste of cycles. It's another of those evils that we've inherited from console gaming. Consoles don't have proper OSes with I/O abstraction, so most console games take direct control of I/O handling for performance reasons, and when they port it over to PC they don't always unlearn that part.
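A minimal sketch of the difference being described, assuming a hypothetical input queue that stands in for the OS delivering an input event. The function names are made up for illustration, and a real game would use the platform's own input or message API rather than a Python queue.

```python
import queue
import threading
import time

def producer(q, delay):
    """Stand-in for the OS delivering an input event after some delay."""
    time.sleep(delay)
    q.put("key_down")

def poll_for_input(q):
    """Busy-poll: keep asking whether input has arrived. Returns (event, polls)."""
    polls = 0
    while True:
        polls += 1
        try:
            return q.get_nowait(), polls
        except queue.Empty:
            pass  # nothing yet, so spin and ask again (burns CPU the whole time)

def wait_for_input(q):
    """Blocking wait: the thread sleeps until an event is put on the queue."""
    return q.get()

if __name__ == "__main__":
    q = queue.Queue()
    threading.Thread(target=producer, args=(q, 0.05)).start()
    event, polls = poll_for_input(q)
    print(f"polling got {event!r} after {polls} checks")

    q = queue.Queue()
    threading.Thread(target=producer, args=(q, 0.05)).start()
    print(f"blocking wait got {wait_for_input(q)!r} without spinning")
```

Both versions receive the same event; the polling version just spends the entire wait checking an empty queue, which is the wasted-cycles complaint above.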

True, sadly enough. But I'd like to think *most* major developers know better.

Well honestly I rarely think about physics, as that's been running on the GPUs for a while now. And yes, I agree that if it's not accelerated it could really eat up CPU cycles, especially if the developer wasn't smart enough to run it in its own thread separate from everything else.

Me? I'm thinking we should have moved to real-time physics engines over a decade ago, rather than holding onto the age-old "bullet type X fired from gun Y does Z damage" model. Please, build an engine that can handle all this automatically.

Hopefully, as progress on improving visuals stalls [pending the move to ray tracing/ray casting], we'll get some decent physics implementations.

I got a 120Hz monitor haha, the issue was in 3D mode, where your refresh rate is locked to 60 per eye. You need V-Sync for 3D gaming to work, and should a huge graphics spike happen it can really mess up the effect, as both eyes have to slow down (the spike is duplicated on both frames). Adaptive V-Sync helped tons, although in a different way: it allows each eye to get a slightly different FPS. You really have to be careful with your settings.
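A rough sketch of why a spike hurts so much with V-Sync on. It assumes the simple double-buffered model where a finished frame has to wait for the next refresh tick, and a 60 Hz per-eye rate as described above; the function name is made up for illustration and this does not model adaptive V-Sync.

```python
import math

def vsynced_rate(render_ms, refresh_hz=60.0):
    """Delivered frames per second when every frame waits for a refresh tick."""
    interval_ms = 1000.0 / refresh_hz
    # A frame that misses a tick is held until the next one.
    ticks = max(1, math.ceil(render_ms / interval_ms))
    return refresh_hz / ticks

if __name__ == "__main__":
    for render_ms in (10, 20, 40):
        print(f"{render_ms} ms per frame -> {vsynced_rate(render_ms):.0f} per eye")
```

Under that model, a frame that takes even slightly longer than one refresh interval drops the delivered rate from 60 straight to 30 per eye, rather than degrading gradually.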

Well, I avoid 3D like the plague. Think about it: What they are doing is taking a 3D image, putting it onto a 2D screen, then faking 3D. Am I the only person who realizes how stupid that is?

Make an actual 3D monitor, and find a way to send the actual 3D image across. Might as well skip rasterization entirely at that point...I see no reason to use "fake 3D".

de5_Roy · Jun 6, 2012

...
Well, I avoid 3D like the plague. Think about it: What they are doing is taking a 3D image, putting it onto a 2D screen, then faking 3D. Am I the only person who realizes how stupid that is?
...

no you're not. when 3d tried to be mainstream (again) i thought why would people want to go through so much trouble to game at 60fps on a sli setup. 😉
however, different people have different preferences.
i hope 4k will have better luck, i'd rather have higher resolution or fps than 3d.
