AMD Piledriver rumours ... and expert conjecture

We have had several requests for a sticky on AMD's yet-to-be-released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post questions relevant to the topic, or information about the topic; anything else will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame-baiting comments about the blue, red, and green teams and they will be deleted.

Enjoy ...
 


I stand corrected, thanks for the links.

I have nothing against AMD, I loved my FX-55 😵
 


Not a traditional way to greatly improve performance, but die stacking seems promising if the technical difficulties can be overcome:

Die stacking has promise and problems

Note: This is Part 2 of 2

The three most pressing problems of modern silicon design (power, I/O pins, and yields) can all be addressed by chip stacking. A silicon-on-silicon 3D stack allows a designer to make interconnects at the scale of structures on the chips; you simply draw them like you would a metal layer. Connecting the same die to an organic or ceramic carrier makes the bumps larger by an order of magnitude; the minimum size there is much larger than for silicon on silicon. As Nvidia found out with Bumpgate, thermal stress and other factors mandate this size difference.

The picture above is a cross-section of a memory-on-logic stack from the Qualcomm presentation. As you can see, the size of the die-to-die 'microbumps' is ~20 microns; the die-to-package bumps are 5x larger, at 100 microns. The package-to-board balls/bumps are 5x larger than that, so 500 microns or so. This 1:5:25 ratio was consistent across the talks from all five companies, but the absolute values vary with the process used and the intended markets. The short story is that, given the same area, you can fit 25x as many connections in a stack as you can to a carrier. Add in the far lower thermal stresses because of very similar materials, and you have a win for stacking.
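To put a number on the "25x the connections" claim, here is a minimal back-of-envelope sketch (the pitch figures are the approximate values quoted above; the 10mm x 10mm region is an arbitrary choice for illustration):

```python
# Back-of-envelope: connection count per unit area scales as 1/pitch^2.
# Pitch values (in microns) are the approximate figures quoted above.
pitches_um = {
    "die-to-die microbump": 20,
    "die-to-package bump": 100,
    "package-to-board ball": 500,
}

area_um2 = 10_000 * 10_000  # a hypothetical 10mm x 10mm region

for name, pitch in pitches_um.items():
    count = area_um2 // (pitch * pitch)
    print(f"{name}: ~{count:,} connections at {pitch} um pitch")

# 250,000 vs 10,000 vs 400: each step down in the 1:5:25 pitch ladder
# costs (1/5)^2 = 1/25 the density, so a stack fits 25x the
# connections of a carrier in the same footprint.
```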

If you have multiple dies, not only can you get 25x the pins, but the wires are thinner, shorter, and potentially made from better-quality material. This lowers the RC (resistance-capacitance) product mentioned previously by large amounts, lowering the power used. In aggregate, 25x the I/Os between chips in a stack can consume far less net power than a fraction of that bandwidth going off package.
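As a rough illustration of the power argument, assuming hypothetical energy-per-bit figures (the pJ/bit values below are placeholders chosen for the comparison, not numbers from the article):

```python
# Illustrative only: link power = energy per bit * bits per second.
# Both pJ/bit values are assumptions for the sake of the comparison.
in_stack_pj_per_bit = 0.5      # short, fine-pitch die-to-die wires
off_package_pj_per_bit = 10.0  # long traces through package and board

bandwidth_bits_per_s = 100e9   # move 100 Gbit/s in both cases

in_stack_watts = in_stack_pj_per_bit * 1e-12 * bandwidth_bits_per_s
off_package_watts = off_package_pj_per_bit * 1e-12 * bandwidth_bits_per_s

print(f"in-stack:    {in_stack_watts:.2f} W")     # 0.05 W
print(f"off-package: {off_package_watts:.2f} W")  # 1.00 W
```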

Xilinx, the one company that is producing stacked chips in volume, has one variant called the Virtex-7 2000T. This uses four FPGA slices on an interposer, and the number of connections between slices is listed as '>10K'. The chip has a 45 x 45mm package, so the 1mm ball pitch would mean a maximum of 2025 pins out. For reference, the latest Intel Xeon CPUs have 2011 pins, so Xilinx is not far out of the mainstream here. Instead of going down in count, the first stacked chip on the market brings a 5x increase in I/Os, and progress is moving toward yet higher densities very quickly.
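The pin-count ceiling here is simple geometry, a full grid of balls at the stated pitch; a quick sketch using the figures from the paragraph above:

```python
# Maximum ball count for a full grid at a given pitch.
package_mm = 45    # Virtex-7 2000T package is 45 x 45 mm
ball_pitch_mm = 1  # 1 mm ball pitch

max_balls = (package_mm // ball_pitch_mm) ** 2
print(max_balls)  # 2025 external pins, vs ">10K" die-to-die
                  # connections on the interposer (roughly 5x more)
```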

In addition to the I/Os and power, a company gets all sorts of yield advantages from stacking. You can connect multiple smaller dies that each yield far higher than a large monolithic part. Better yet, you can put in many different types of chips, even ones made on incompatible processes. Logic and DRAM? No problem. Bleeding-edge-process logic coupled with an analog chip made with the process equivalent of crayons on toilet paper? Easy. Throw in high-voltage I/Os on a different die, and you have a worst case that is probably impossible to do on a monolithic part. On top of that, the latency between dies is far better than going off socket, a massive gain for memory bandwidth even without taking the potentially increased widths into account. Yes, it is the best of both worlds.

To throw some cold water on this happy picture, we should point out that there are problems. The first is that, well, 3D parts can't be made in high volume yet. There is talk, and there are proposed solutions, but no one is doing it at the moment; Xilinx is only 2.5D. This will change in a hurry: there have been some prototypes spotted here and there, and Intel is about to talk about Crystalwell next week, even if the Ivy Bridge variant never made it to market.
...

IIRC there was an earlier rumor of Haswell having maybe 64MB of stacked LPDDR3 for the GT3 or maybe GT4 iGPU, and I think some future APU from AMD will do likewise, giving a substantial performance boost. Hopefully Intel will be forthcoming with some facts & figures next week in their Crystalwell presentation.

The 2nd thing I note is that the FPGA companies (like Xilinx) are pretty far ahead of the game with stacked logic. IIRC Intel has been fabbing FPGAs on their 22nm trigate process for a couple of FPGA companies, although I don't think Xilinx is currently a customer. Anyway, people have been wondering what's in it for Intel to be a foundry for these FPGA customers. My bet is that Intel is looking into replacing the fixed-function QuickSync hardware in SB/IB with some sort of reprogrammable hardware based on FPGAs: almost as fast and far more capable. FPGAs are fast enough to be used for prototyping GPUs and CPUs, so this could be the next revolutionary step. And the fact that the interconnects between the chips are wide and low-latency might be enough to permit an extension of the ring bus to stacked logic, which fits in with Intel's architecture of putting all the processors on a ring.
 


FPS alone doesn't matter:

http://techreport.com/articles.x/23246

The main issue is latency. While a Core 2 is about as strong as a Phenom II clock for clock, it will be starved because it still relies on the FSB. So while per-core performance will be about the same, the PII is still a better CPU in most workloads due to lower latency.

What I would expect is something close to 45 FPS, with some significant stuttering from time to time in most games, due to uneven frame times.

It's a shame no one is willing to pull out a Q9550 and bench it for comparison's sake; given how many people still have LGA775 systems, a Q9550 could be a really cheap upgrade for a LOT of people.
 



True that, but it's still hardly worth spending $300 on a new CPU and mobo if your old system plays games just fine.
 
@ Gamer - yep, I'd agree with that. I still have my ancient Q6700 rig in the basement, and even with a more modern GPU (a 5770 replacing a dead 8800GTX), some older games that spawn a ton of enemies will just choke it to death. Like 2-3 fps with 100+ enemies on-screen 😛. At that point all I can do is exit the game, as it is 100% unplayable.

Will have to try those older games on my new rig with a 7970 and 3770K and see what happens, assuming they'll run on Win7 as opposed to XP. I'm about 50-50 running legacy stuff even with Win7 Professional's XP mode...
 

That's because people would remember that Intel isn't all-powerful and that even their CPUs LOST to the previous gen sometimes. But if AMD fails even one benchmark, it's a disaster.

http://www.anandtech.com/show/2658/19
http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/22151-intel-lynnfield-core-i5-750-core-i7-870-processor-review-17.html

[Benchmark chart: corei5_corei7_72.jpg]


Yep, that's the Q9550 in 2nd place, above Nehalem and Lynnfield by a considerable margin on one site and not on the other. Remember how the AMD FX had the same issue? The only sites that were "accurate" were the ones that showed AMD = fail.

[Benchmark chart: 19222.png]


Here is the interesting part: what settings do you have to change to bring the PII X4 from the top to the bottom in the same game? Why don't we still see comparisons to the Q9550? How many people would upgrade if they knew they were only gaining a few fps here and there and losing a few fps in other situations?

As far as "fps doesn't matter" ... latency = fps in time format.

FPS = frames per second
Latency = seconds per frame

They are just reciprocals of each other, nothing else. It's just easier to talk FPS.
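A minimal sketch of that reciprocal relationship, with frame times in milliseconds (the usual unit in frame-latency articles):

```python
# FPS and frame latency are reciprocals of each other.
def fps_to_frame_ms(fps: float) -> float:
    """Frame time in milliseconds at a given frame rate."""
    return 1000.0 / fps

def frame_ms_to_fps(ms: float) -> float:
    """Frame rate for a given frame time in milliseconds."""
    return 1000.0 / ms

print(fps_to_frame_ms(60))    # 16.67 ms per frame
print(fps_to_frame_ms(45))    # 22.22 ms per frame
print(frame_ms_to_fps(33.3))  # ~30 FPS
```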
 

I don't see the 30% IPC increase panning out.
 



It does look like some much-needed improvements to keep the pipes fed.

Maybe 2 years after the fact we'll see what Bulldozer should have been. That would put it 7 years into the design cycle, right? Bulldozer was supposedly a 5-year endeavor.
 
All I can say is I should have switched to Intel sooner and I doubt I'll be going back to AMD unless they really really improve. There really is a big difference and I pity anyone still gaming on a Core 2 or Phenom 2.
 



eh, 5 years of fooling around failing to improve on even existing tech. fail
 
Might want to read about the HDL again.

All I can say is I should have switched to Intel sooner and I doubt I'll be going back to AMD unless they really really improve. There really is a big difference and I pity anyone still gaming on a Core 2 or Phenom 2.

I pity anyone who thinks they can game better with 62 fps over 60. That 2 fps totally Pwns l33t skilz me 2 gud for u. Wonder how much improvement you got with the clean install of Windows (Vista?).
 


Windows 7, going from a 965 and 7950 to an i5-3470 and 7950. Big difference; look at the TechSpot article linked above. FPS shot up at least 10%+, minimum and max, in a lot of games. Not 62 vs 60, more like 35 vs 50.
 


These aren't really stacked. They're considered 2.5D, using an interposer, but they are impressive for the sheer transistor counts in a single package. These chips aren't cheap either; they can go for upwards of $12,000 each.

So far it looks like Haswell will be similar. There will be multiple die in the same package, but not stacked. They've been doing this for a while.

The first mainstream 3D chips from Intel would likely be Atom-based, for the cell phone/tablet market.
 

I doubt that is the sole cause, because their graphics division is doing fine; the best they've been in a while, actually.
 


What gamer and the article are referring to is not FPS or its inverse, average latency, but instead those annoying lags and delays in some percentage of frames during gameplay - i.e., the big spikes, up or down, in FPS or latency that are the antithesis of the oft-quoted AMD mantra of "smoothiness" 😛. And what gamer was saying is that Conroe was less "smoothier" than K8 or K10 due to the FSB occasionally causing latency issues (i.e., the bus turning into a parking lot for a few milliseconds).
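To make the distinction concrete, here is a small sketch with invented frame times; the "time spent beyond 50 ms" figure is in the spirit of the frame-latency metrics in the Tech Report article linked earlier, and it shows how two runs with the same average FPS can feel very different:

```python
# Hypothetical frame times in milliseconds -- not measured data.
smooth = [16.7] * 100          # steady ~60 FPS, no spikes
spiky = [15.0] * 99 + [185.0]  # same ~60 FPS average, one big hitch

def avg_fps(frames_ms):
    return 1000.0 / (sum(frames_ms) / len(frames_ms))

def time_beyond(frames_ms, threshold_ms=50.0):
    """Total time spent on frames slower than the threshold --
    the spikes you feel, which average FPS hides."""
    return sum(t - threshold_ms for t in frames_ms if t > threshold_ms)

for name, run in [("smooth", smooth), ("spiky", spiky)]:
    print(f"{name}: {avg_fps(run):.1f} avg FPS, "
          f"{time_beyond(run):.0f} ms spent beyond 50 ms")
# smooth: 59.9 avg FPS, 0 ms spent beyond 50 ms
# spiky:  59.9 avg FPS, 135 ms spent beyond 50 ms
```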

I would think that, as an AMD defender nonpareil, you would be agreeing heartily with his analysis, but maybe you misunderstood it :).
 


Heh, OK, I'll call them a 'sandwich' instead :). However, multiple die in the same package connected horizontally (such as Intel used with the iGPU in Clarkdale) isn't going to be anywhere comparable to using a TSV interposer and a 2nd chip aligned vertically.

IMO, given GF's statements about how they don't need no stinkin' trigate transistors, and no news I've heard about interposers & 2.5D stacking 😀, my guess is that they are somewhat behind the curve on those technologies (assuming they would be the ones to fab the interposers and the memory or logic chips to be stacked on the CPU or GPU). If not, then I can imagine all sorts of problems arising from trying to stack chips from different sources. A lack of good communication between AMD and GF probably contributed to the yield-ramping problems they had with BD and Llano - imagine bringing in a third or fourth party...
 



I never get those slowdowns, even when there is pure chaos on the screen with smoke, explosions, bullets, etc. I guess the secret is having 6 actual cores.
 



I had a 965BE at 4GHz, and I know that the equivalent Intel CPU in all the benches I did was the i7-920 at stock speeds.
Now would you say that an i7-920 is not good enough for gaming?
 


Yes. The 920 is far behind Ivy Bridge.
 
Guys, we can bounce off the walls at each other all day long and nothing will be gained. Apart from early reports of Piledriver's changes over Zambezi, nobody knows anything; AMD, for one, have been very tight-lipped, so right now it's all complete speculation. What did come out was that AMD stated they are targeting at the very least a 15% performance-per-clock improvement per release, and they did indicate they were quite pleased with Vishera's progress.

As to the ultimate king of processors: probably not, probably never again. When you consider AMD's level of technology compared with Intel's, it is highly unlikely that AMD will ever defeat Goliath in a matchup of pure brute force. But in an article released on THG recently about AMD's "Vision", AMD stated themselves that they are not looking to slug it out on one aspect of CPU architecture. The future is not single-threaded performance.

So we have gone on for 129 pages highlighting the only negative on Zambezi, but what about the promise? How about a little love 😛.

 