AMD Piledriver rumours ... and expert conjecture

We have had several requests for a sticky on AMD's yet to be released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post a question relevant to the topic, or information about the topic, or it will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame baiting comments about the blue, red and green team and they will be deleted.

Enjoy ...
 
The point is a simple one: if the GPU bottleneck is removed, where do CPUs fall in comparison to each other? How much more headroom does CPU X have over CPU Y? Will there be a CPU bottleneck with a CF/SLI setup in the future? These are the questions that need to be answered when looking at CPUs in gaming, and no one does that anymore; they just want a test that shows every CPU from an i3 to an i7 putting out 42 FPS in BF3. What are you REALLY proving? Taking the BF3 example, how many times have we heard people on this forum asking if an i3 is sufficient, because it puts out the same FPS? Then we have to explain that in MP, the i3 chokes to death. Whoops, so what does the BF3 benchmark actually prove?

If you're getting the same number for an i3 as an i7 in BF3, then you're doing it wrong. Take it out of single-player timed demo mode; you're not actually benchmarking anything.

It's already been proven that games don't run the same way in actual usage as they do in a pre-recorded single-player timed loop demo. You're only testing the graphics engine and not the rest of the game. MP benchmarks show the BD / i5 / i7 beating the i3 pretty handily, and by a large enough margin to be noticeable.

In the end I don't care what color people paint their bodies with; I only care about preventing the horribly wrong notion that a dual core is "all you need".

For BD,

BD's design wasn't bad; its implementation was. AMD really screwed up with the shared L2 cache; hopefully they fix that by Steamroller. BD got a really bad rap on single-core performance, which is all anyone really cared about upon release. A single BD "core" really only counts for about 0.75 of a core, which is why an FX-8150 seems to get the same numbers as a 1090T. This results from cutting out one ALU and from the arbitration issues with a shared front end and L2 caching system. AMD was hoping to overcome this limitation with sheer clock speed; they failed in this regard.

There are some tools that you can use for dynamic overclocking (the real way to do OCing in this situation). You can get a pretty hefty performance boost in applications that use one to two cores by creative use of the affinity flag, along with dynamically overclocking the two cores the application is assigned to while under-clocking / under-volting everything else. People doing OCing keep trying to OC all four modules at once; this is never going to work well, as that's eight cores consuming power / TDP at once. Work smarter, not harder. Of course this should all be automated, but Windows is schizophrenic about where it assigns threads.
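As a rough sketch of the affinity half of that trick (just a sketch, assuming the third-party psutil package is installed; "game.exe" is a hypothetical process name, not something from this thread):

```python
# Sketch only: pin a lightly threaded process to cores 0-1 (one BD module),
# leaving Windows free to park the remaining modules. Requires psutil
# (pip install psutil); "game.exe" is a hypothetical process name.
import psutil

def pin_to_module(process_name: str, cores: list) -> None:
    for proc in psutil.process_iter(["name"]):
        if proc.info["name"] == process_name:
            proc.cpu_affinity(cores)  # restrict scheduling to these cores only
            print(f"Pinned PID {proc.pid} to cores {cores}")

pin_to_module("game.exe", [0, 1])
```

The per-core clock changes themselves would still come from a tool like K10stat / PSCheck; this only handles the thread-placement half.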
 
Seeing Qualcomm and nVidia squawking about production availability from TSMC, the move to 450mm wafers is truly good news. Intel has held out as long as they can; they've proposed this move in the past but met opposition from the tool makers themselves, which is why they are now collaborating to make it happen.
The smaller wafers we see today simply bring no savings in tooling/R&D/process costs, as we've seen in the past.
This is good news for the entire industry indeed.
 
@palladin - question - does K10stat work with Phenom II Denebs?
Probably a noob question - not sure if Phenom II has P-states - I think I read that Thubans do?

Yes it does, and it's amazing. Software overclocking after bootup is so much faster than constantly playing with BIOS options. After having such success with my 3550MX, I tried using it on my 970BE and it worked exactly the same. I've hit the limit on my CPU @4.2 (stable on Hyper Pi); I can go higher, but it'll fail the stability test. I'm sure if I went into the BIOS and played with more voltage settings I could push it even higher than that, but honestly it's not worth it to me.

For Bulldozer CPUs use PSCheck; it does the exact same thing as K10stat.
 
Let's not call Bulldozer a good CPU; it isn't, and there are good reasons why it's not. AMD can turn Bulldozer into a good CPU, if not with Piledriver then perhaps with the next iteration of that architecture. You are only angering the Intel fanboys.

Understand that Bulldozer, had it been done well, was the CPU that I wanted. Intel won't dare let us have an 8-core CPU, pfft.

Intel was the first to push out a mainstream quad (Q6600) and a 6-core (the 9xxX i7s, even if they were too expensive), so I wouldn't be surprised if Intel releases the first "true" 8-core desktop CPU as well, since their process lead can allow it. They could even make an 8-core IB CPU if they wanted, but honestly it's near pointless for us enthusiasts as of now. It probably wouldn't have the best yields either, and thus would cost more per chip than anyone would want to spend.

Not sure with AMD though. It would take an 8-module CPU to give 8 full cores with 8 partial cores unless they revamp the modules. Not sure the die size would be worth it on GF's 32nm right now.
 
Seeing Qualcomm and nVidia squawking about production availability from TSMC, the move to 450mm wafers is truly good news. Intel has held out as long as they can; they've proposed this move in the past but met opposition from the tool makers themselves, which is why they are now collaborating to make it happen.
The smaller wafers we see today simply bring no savings in tooling/R&D/process costs, as we've seen in the past.
This is good news for the entire industry indeed.

There is only so much you can pull out of a wafer size or a process node before the benefits no longer outweigh the costs. I think it will be good: once Intel moves to it, others will slowly but surely follow. TSMC could use the extra wafer space, as their yields on 28nm seem to be rather meh.

Yes it does, and it's amazing. Software overclocking after bootup is so much faster than constantly playing with BIOS options. After having such success with my 3550MX, I tried using it on my 970BE and it worked exactly the same. I've hit the limit on my CPU @4.2 (stable on Hyper Pi); I can go higher, but it'll fail the stability test. I'm sure if I went into the BIOS and played with more voltage settings I could push it even higher than that, but honestly it's not worth it to me.

For Bulldozer CPUs use PSCheck; it does the exact same thing as K10stat.

I have used OS-based OCing, but I still can't move past good old BIOS-based OCing. Plus the new Asus UEFI BIOS is easier to look at and move around in. Maybe I am just old fashioned.
 
It would take an 8-module CPU to give 8 full cores with 8 partial cores unless they revamp the modules.

There is no such thing as a "full" or "partial" core on BD. It's exactly eight cores divided into four pairs, with each pair sharing a front end and L2 cache; the entire die shares 8MB of L3 cache. AMD did it that way to save space, as much of a bad idea as it was.

To reiterate, all "cores" on a BD module are identical; there is no "full", "partial", virtual, or other misleading nomenclature.

I have used OS-based OCing, but I still can't move past good old BIOS-based OCing. Plus the new Asus UEFI BIOS is easier to look at and move around in. Maybe I am just old fashioned.

Both do exactly the same thing: modify the MSRs (model-specific registers). The BIOS sets the initial parameters and leaves them that way; dynamic OCing allows you to change those parameters on the fly.

I'm not sure if there is any equivalent tool for Intel CPUs that does what K10stat / PSCheck does. The only reason they work is that AMD left the MSRs open for writing on their CPUs; you can actually blow a CPU up by setting their values wrong. Many LN2 overclockers have cooked their CPUs by setting them to 2.0v+ and running them too long at that setting. BIOS settings are also applied to all cores, whereas dynamic overclocking allows you to change speeds on a per-core basis. You can establish different OC profiles for different programs to customize your system.
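To make the MSR point concrete, here's a minimal read-only sketch (assuming a Linux box with the msr kernel module loaded, run as root; 0xC0010064 is, as far as I recall, the first family 10h P-state register of the kind K10stat pokes):

```python
# Sketch: read an AMD family 10h P-state MSR through Linux's
# /dev/cpu/*/msr interface (modprobe msr first, run as root).
# Deliberately read-only: writing bad values really can cook a CPU.
import struct

PSTATE0_MSR = 0xC0010064  # first P-state register on family 10h (assumption)

def read_msr(cpu: int, reg: int) -> int:
    with open(f"/dev/cpu/{cpu}/msr", "rb") as f:
        f.seek(reg)  # the MSR address doubles as the file offset
        return struct.unpack("<Q", f.read(8))[0]  # one 64-bit register

print(f"P-state 0 raw value on core 0: {read_msr(0, PSTATE0_MSR):#018x}")
```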
 
Intel was the first to push out a mainstream quad (Q6600) and a 6-core (the 9xxX i7s, even if they were too expensive), so I wouldn't be surprised if Intel releases the first "true" 8-core desktop CPU as well, since their process lead can allow it. They could even make an 8-core IB CPU if they wanted, but honestly it's near pointless for us enthusiasts as of now. It probably wouldn't have the best yields either, and thus would cost more per chip than anyone would want to spend.

The Q6600 was the first mainstream quad-core. I'll argue that AMD debuted the first mainstream six-core, as the Thuban Phenom II X6 was far less expensive and thus far more popular than the Westmere i7-980X. Even today very few people have an Intel six-core CPU, as the least expensive one is still around $400 (Xeon E5-2420). I wouldn't expect any six-core parts to become mainstream, as the information leaked from Intel about Haswell continues to show mainstream parts on LGA1150 being no more than quad-cores.

If you really, really want an 8 core Intel processor that may or may not work on a desktop board, plunk down $1100 for a Xeon E5-2650. It's a 2.0 GHz base clock Sandy Bridge part with 20 MB of L3 in LGA2011.

Not sure with AMD though. It would take an 8-module CPU to give 8 full cores with 8 partial cores unless they revamp the modules. Not sure the die size would be worth it on GF's 32nm right now.

AMD actually does make CPUs with 8 modules / 16 cores right now (the Opteron 6200 series), and they cost about what six-core Intel CPUs do, in the ~$500 to ~$1200 range. AMD manages this feat by taking a page out of Intel's old playbook and using two 315 mm^2 4-module dies per chip, except that it gives each die its own set of memory interfaces rather than tying both dies together over one bus like Intel did, so scaling is much better. The only downside to this approach is that you now have to use NUMA on a single-CPU system, and the Windows scheduler doesn't handle NUMA very well. Overall, though, the MCM is far preferable to making one enormous 600+ mm^2 die that would be darn near unyieldable and would probably have to sell for at least a grand for AMD to make any money.
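For what it's worth, software can at least see the split. A quick sketch (assuming Windows, calling the documented GetNumaHighestNodeNumber API through ctypes) that reports how many NUMA nodes the OS sees:

```python
# Sketch: ask Windows how many NUMA nodes it sees. On an Opteron 6200
# MCM this should report two nodes even in a single socket, which is
# exactly the case the scheduler handles poorly.
import ctypes

def numa_node_count() -> int:
    highest = ctypes.c_ulong(0)
    if not ctypes.windll.kernel32.GetNumaHighestNodeNumber(ctypes.byref(highest)):
        raise OSError("GetNumaHighestNodeNumber failed")
    return highest.value + 1  # the API returns the highest node *index*

print("NUMA nodes visible to the OS:", numa_node_count())
```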
 
The point is a simple one: if the GPU bottleneck is removed, where do CPUs fall in comparison to each other? How much more headroom does CPU X have over CPU Y? Will there be a CPU bottleneck with a CF/SLI setup in the future? These are the questions that need to be answered when looking at CPUs in gaming, and no one does that anymore; they just want a test that shows every CPU from an i3 to an i7 putting out 42 FPS in BF3. What are you REALLY proving? Taking the BF3 example, how many times have we heard people on this forum asking if an i3 is sufficient, because it puts out the same FPS? Then we have to explain that in MP, the i3 chokes to death. Whoops, so what does the BF3 benchmark actually prove?

If you're getting the same number for an i3 as an i7 in BF3, then you're doing it wrong. Take it out of single-player timed demo mode; you're not actually benchmarking anything.

It's already been proven that games don't run the same way in actual usage as they do in a pre-recorded single-player timed loop demo. You're only testing the graphics engine and not the rest of the game. MP benchmarks show the BD / i5 / i7 beating the i3 pretty handily, and by a large enough margin to be noticeable.

Here's the problem though: because MP performance is variable, you really can't give an in-depth benchmark using that either.
 
On Resonant Clocking:
http://spectrum.ieee.org/semiconductors/processors/powersaving-clock-scheme-in-new-pcs

Thanks for the link; I've missed these articles ever since my IEEE subscription expired :)

Btw looks like my predictions were correct :)

What’s more, although a resonant clock doesn’t have to be run directly at the resonant frequency of the LC circuit, its efficiency as an energy recycler goes down when the clock is run significantly faster or slower.

“Driving far from resonant frequency won’t save power, and at some point [the circuit] won’t work at all,” says Phillip Restle, a member of the research staff at the IBM Thomas J. Watson Research Center in Yorktown Heights, N.Y. Cyclos has inserted a switch that allows the AMD chip to turn the resonant part of the clock on and off. But Restle, who performed some of the earliest work on resonant clocks in microprocessors, says the switch isn’t a perfect fix because it adds to the power a chip consumes.
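For reference, the resonant frequency the article keeps coming back to is just the textbook LC tank relation (my addition, not from the article):

```latex
% natural frequency of an LC tank; running the clock far from f_res
% defeats the energy recycling, which is Restle's point above.
f_{\text{res}} = \frac{1}{2\pi\sqrt{LC}}
```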


I had talked about this here before, but it's a bit of a messy read 😀
http://www.tomshardware.com/forum/335420-28-cyclos-technology-overclocks

Another fact is that PD's 8-core die could possibly end up much larger than 315 mm^2; maybe that explains the rumours of a native 3-module die :)
AMD’s Piledriver CPU design—which forms the heart of the company’s new Trinity chips—uses 92 of these 100-micrometer-wide inductors, spread out over each dual-core processor module.
100 µm wide is quite a bit of space there, not to mention that the engineers had to re-design the layout all over again to incorporate the inductors within the module :)
 
And in other news, AMD expected to post a loss this quarter:

http://www.amd.com/us/press-releases/Pages/press-release-2012jul9.aspx

AMD (NYSE:AMD) today announced that revenue for the second quarter ended June 30, 2012 is expected to decrease approximately 11 percent sequentially. The company previously forecasted second quarter 2012 revenue to increase 3 percent, plus or minus 3 percent sequentially. The lower preliminary revenue results are primarily due to business conditions that materialized late in the second quarter, specifically softer-than-expected channel sales in China and Europe as well as a weaker consumer buying environment impacting the company's Original Equipment Manufacturer (OEM) business.

The company expects second quarter gross margin to be approximately in line with prior guidance. Operating expenses for the second quarter are expected to improve and to be approximately 8 percent less than prior guidance of approximately $605 million, a result of tightly controlled expenses in the quarter.

AMD will report second quarter 2012 results after market close on Thursday, July 19, 2012. AMD will hold a conference call for the financial community at 2:00 p.m. PT (5:00 p.m. ET) that day to discuss second quarter financial results and to provide information regarding expected third quarter results. AMD will provide a real-time audio broadcast of the teleconference on the Investor Relations page at http://www.amd.com. The webcast will be available for 10 days after the conference call.

 
Personally, if Piledriver does deliver those numbers, I think AM3+ won't be a wasted socket. PD is also rumored to be more energy efficient, which is a plus.

Well, like you said, "if". With Bulldozer AMD kind of exaggerated the performance, so let's hope they do it right with Piledriver. From what I've read about Piledriver, the top CPUs will still be 125 W (which is lame) but with higher clock speeds; then again, the lower-end Piledriver parts may have better power consumption. Let's hope Piledriver has good headroom for overclocking; I don't think AMD will mess that up at least. lol
 
Here's the problem though: because MP performance is variable, you really can't give an in-depth benchmark using that either.
Agreed! Also, BF3 can use up to 8 cores in multiplayer, so it beats the i3 pretty well there; but since single-player loads everything before you even play, the i3 will perform better in SP. Once games start using new engines, I think we'll see a lot more of them using more than 4 cores, so the i3 might not last as long as Bulldozer will, as far as gaming anyway.
 
With the focus on Fusion, AMD is pushing more performance per watt, which PD looks to be delivering.

Another big downside to BD was the extreme amount of power it started to draw when overclocked. Hopefully that is fixed, or at least helped, with Piledriver.
 
Agreed! Also, BF3 can use up to 8 cores in multiplayer, so it beats the i3 pretty well there; but since single-player loads everything before you even play, the i3 will perform better in SP. Once games start using new engines, I think we'll see a lot more of them using more than 4 cores, so the i3 might not last as long as Bulldozer will, as far as gaming anyway.

Hence you NEED a reliable CPU test for gaming, which is my entire POINT.
 
I think it's fairly intuitive that it was delayed to push out Trinity mobile as fast as possible, but let's hear your idea.

I think it's because they have to re-design the layout of the module for desktop frequencies, near the 4GHz range. They can then use this in PD and desktop Trinity.
They are using Cyclos tech, and for maximum efficiency the values, and hence sizes, of the inductors won't be the same as in the 2-3GHz-class mobile chips :) Read the IEEE article that gamerk316 posted a few posts back :)

Some people may laugh; doesn't matter :) Me, I'm going to sleep now. Good night :)
 
Maybe it's not as easy as ramping up the clocks on the mobile chips to make desktop Trinity. The power dissipation will quickly get out of hand beyond the resonant frequency :) Same thing I'd posted about a long time ago :)

There's one more implication of this, again, only if someone's ready to hear it :)
 
Hence you NEED a reliable CPU test for gaming, which is my entire POINT.
Yeah, well, there are benchmark tests that are good, though. You can't compare an 8-core to a dual core: in any application that uses 2 cores the i3 wins hands down, but in something that uses 8 cores the FX chip will win.
 
Maybe it's not as easy as ramping up the clocks on the mobile chips to make desktop Trinity. The power dissipation will quickly get out of hand beyond the resonant frequency :) Same thing I'd posted about a long time ago :)

There's one more implication of this, again, only if someone's ready to hear it :)

Well, sure, there's a certain amount of re-design that has to go into ramping your TDP up or down, especially at these minute process nodes. I think our points are one and the same. Had they first designed for desktop Trinity, then it would be shipping now and mobile would be delayed.

It's obvious that AMD is really changing their priorities, and I think it's a good move.

AMD: Mobile, Server, GPU (HPC), Desktop, Embedded

This really shows the massive difference in size between AMD and Intel. Compare AMD's product line to Intel's and you get a good idea of why AMD is still in the game: they focus on just a few categories, whereas Intel is all over the place.

Intel: Manufacturing, Desktop, Server, Notebook, Ultrabook, Smartphone, Tablets, HPC, Embedded, NAND, Chipsets, Security, Communications, Foundry, etc.
 
Hence you NEED a reliable CPU test for gaming, which is my entire POINT.
I'm pretty sure there are plenty of gaming tests out there. You're looking for a one-size-fits-all test, and there's no way that's possible. In a game that uses only 2 cores, the i3 will come out on top of the 8-core Bulldozer chips easily, since it has the better IPC; in games that use up to 8 cores, like BF3, Bulldozer will win easily. There are plenty of benchmarks out there for all the different games and CPUs, so all it takes is some googling. All games are made using different engines, so the best way to know is to look up that specific CPU with the game you want to play. That's reliable enough in my opinion.
 
Yes it does, and it's amazing. Software overclocking after bootup is so much faster than constantly playing with BIOS options. After having such success with my 3550MX, I tried using it on my 970BE and it worked exactly the same. I've hit the limit on my CPU @4.2 (stable on Hyper Pi); I can go higher, but it'll fail the stability test. I'm sure if I went into the BIOS and played with more voltage settings I could push it even higher than that, but honestly it's not worth it to me.

For Bulldozer CPUs use PSCheck; it does the exact same thing as K10stat.


Thank you, I will try it out tonight.
I assume that I have to set my BIOS back to default settings (currently OC'd to 4GHz on a 965BE).
 