Don't worry. People will state "this will never happen" or "that will never happen". Then it happens.
I am sure Intel is looking at stacked RAM. In fact they are; it's just a matter of whether the benefits outweigh the risks, much like why Intel went gate-last for HK/MG (which, as I said before, AMD and the rest of the industry will follow).
I never said stacked memory wouldn't go on a CPU die, I said it wouldn't replace main memory. If the die has a low enough TDP and you know you'll be able to deal with the thermal issues, then there is no reason not to put some fast local memory on there. It won't replace desktop main memory, because the same stacking tech makes system memory just as cheap and dense. How dense will expansion memory sticks become if they can pack 4GB into the space of a single CPU die? DIMMs have four to sixteen modules on them. Assuming each of those modules contains the same amount of memory as is stacked on the CPU, you're looking at 4~16x more memory per stick, or 8~32x more memory in the typical dual-stick setup.
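To make the ratio concrete, here's a quick back-of-the-envelope calculation. The 4GB per stack, the module counts, and the dual-stick setup are just the assumptions from the paragraph above, not figures from any roadmap:

```python
# Back-of-the-envelope math for the stick-vs-stack comparison.
GB_PER_STACK = 4            # assumed capacity of one 3D stack (on-die or on a DIMM module)
MODULES_PER_DIMM = (4, 16)  # typical range of DRAM packages on a stick
STICKS = 2                  # typical dual-stick setup

for modules in MODULES_PER_DIMM:
    per_stick = modules * GB_PER_STACK
    system = per_stick * STICKS
    print(f"{modules} modules/stick: {per_stick} GB per stick "
          f"({per_stick // GB_PER_STACK}x the on-die stack), "
          f"{system} GB across {STICKS} sticks "
          f"({system // GB_PER_STACK}x the on-die stack)")
```

That works out to 16~64GB per stick (4~16x the on-die stack) and 32~128GB in a dual-stick system (8~32x).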
http://www.marketwatch.com/story/jedec-publishes-breakthrough-standard-for-wide-io-mobile-dram-2012-01-05
If you're looking at SoCs then it makes perfect sense; cell phones, handhelds, and even gaming consoles would all benefit tremendously from this. Pico-ITX solutions would also benefit from having the memory soldered directly onto the board or the chip.
Intel / AMD will take their time and only fully implement this in low-end solutions. Memory fragmentation problems go beyond what NUMA introduces. You end up with two separate memory buses, one for the fast local memory and the other for the slow(er) external memory. A similar situation happens with hybrid HDDs (the ones with SSD caches inside them): you have two locations where you can put something, and only the OS knows where stuff is.
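Here's a toy sketch of why that bookkeeping matters, assuming a hypothetical 4GB fast pool and 32GB slow pool; the names and the naive placement policy are made up for illustration, this is not how any real OS allocator works:

```python
# Two pools (fast on-package DRAM and slower external DRAM); every
# allocation has to be tracked so the OS knows which bus it lives on.
class TwoTierAllocator:
    def __init__(self, fast_mb, slow_mb):
        self.free = {"fast": fast_mb, "slow": slow_mb}
        self.where = {}  # allocation name -> which pool it landed in

    def alloc(self, name, size_mb):
        # Naive policy: prefer the fast pool, spill to the slow one.
        pool = "fast" if self.free["fast"] >= size_mb else "slow"
        if self.free[pool] < size_mb:
            raise MemoryError(name)
        self.free[pool] -= size_mb
        self.where[name] = pool
        return pool

mem = TwoTierAllocator(fast_mb=4096, slow_mb=32768)
for name, size in [("game_textures", 3000), ("browser", 2000), ("os_cache", 1500)]:
    print(name, "->", mem.alloc(name, size))
# Once the fast pool fills, later (possibly hotter) data is stuck on the
# slow bus unless something migrates it, which is the fragmentation headache.
```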
There are two solutions. One is to run the local memory at the same speed as the external memory and just treat it like you would any other DIMM. That solves the fragmented memory map issue, but you're wasting the potential of having such fast local memory, especially when you consider that by the time you get 4GB of memory on an Intel chip, the system will have 32-64GB of main memory installed. The other solution involves turning the fast local memory into a fast cache, similar to the older cache sticks from the early '90s. In this case the faster memory holds frequently accessed data and serves as a buffer between the CPU and system memory.
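A minimal sketch of that second option, treating the on-package memory as a managed cache in front of system RAM. The page granularity, capacity, and LRU policy here are illustrative assumptions, not how Intel or AMD would actually build it:

```python
from collections import OrderedDict

class FastMemoryCache:
    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # page number -> resident in fast memory

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)    # hit: served from fast local memory
            return "hit"
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[page] = True             # fill from slower system memory
        return "miss"

cache = FastMemoryCache(capacity_pages=3)
for page in [1, 2, 3, 1, 4, 1, 2]:
    print(f"page {page}: {cache.access(page)}")
# Frequently touched pages stay in the fast stack; everything else keeps
# living in ordinary system memory, so the memory map stays flat.
```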
In either case, 4GB looks big ~now~, but by the time you see this you'll have 32~64GB of memory made from the same stuff in your computer. You'll also have a slower / lower-power CPU than you would have had otherwise, due to the increased thermal insulation. That is the trade-off you pay.
Here is a picture of what it looks like:
http://blog.stericsson.com/wp-content/uploads/2011/12/WideIO.png
You have a package substrate with a CPU on top of it. On top of the CPU a heat spreader is plated to provide a thermal interface with the cooling solution. When a 3D memory stack is implemented, it's put on top of that chip, between the CPU and its thermal plate. Silicon has worse thermal transfer properties than metal, so in this design the memory stack acts as a thermal insulator between the CPU and its thermal plate. Not only that, but 100% of the heat generated by the CPU must pass through the 3D memory stack, which heats up the memory stack before the heat is dissipated by the thermal transfer plate. This means whatever thermal envelope you had for the CPU just shrank: not only must you deal with the added heat load of the memory stack, but also the decreased thermal transfer efficiency. This will manifest itself as lower clock speeds.
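To see why the envelope shrinks, here is a rough first-order series-resistance model. Every number in it (resistances, temperatures, memory power) is an illustrative assumption, not a measured value; the point is the shape of the result, not the exact figures:

```python
T_AMBIENT  = 35.0   # deg C at the cooler
T_JUNCTION = 95.0   # max allowed CPU junction temperature
R_PACKAGE  = 0.40   # deg C per watt, die -> heat spreader -> cooler (baseline path)
R_STACK    = 0.20   # extra deg C per watt added by the DRAM stack in that path
P_MEMORY   = 3.0    # watts dissipated by the stack itself

headroom = T_JUNCTION - T_AMBIENT

# Baseline: all CPU heat goes straight through the package resistance.
p_cpu_without_stack = headroom / R_PACKAGE

# With the stack: CPU heat crosses both resistances, and the shared path
# above the stack also has to carry the memory's own heat.
p_cpu_with_stack = (headroom - P_MEMORY * R_PACKAGE) / (R_PACKAGE + R_STACK)

print(f"Max CPU power without stack: {p_cpu_without_stack:.0f} W")
print(f"Max CPU power with stack:    {p_cpu_with_stack:.0f} W")
```

With those made-up numbers the sustainable CPU power drops from about 150 W to under 100 W, which is exactly the kind of headroom loss that shows up as lower clock speeds.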