AMD Piledriver rumours ... and expert conjecture

Page 91 - Tom's Hardware community forums
Status
Not open for further replies.
We have had several requests for a sticky on AMD's yet to be released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post a question relevant to the topic, or information about the topic, or it will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame baiting comments about the blue, red and green team and they will be deleted.

Enjoy ...
 
And now I'm going to make you look really stupid.

I've known about WideIO for a long time; you forget I happen to live in South Korea and have attended tech briefs about this. My Korean counterparts are employees of Samsung.

Your own links show the technology is becoming reality. Of course Samsung is talking about their mobile ARM processors because that's their forte. And what's happening there? They're going 64bit, they're going quad core, 8-core. The power consumption is ramping up in the mobile space, while the x86 space is ramping down.

Nothing you linked or wrote there presents a brick wall that the technology hasn't overcome. You're talking about 120W CPUs which even before the end of this year will be on the extreme end for a CPU.

I didn't allude to this showing up on server chips first, and certainly not today for x86 chips. This would be on Haswell type 25-35W chips or probably Atom first. We're talking APUs and mainstream chips in 2014/2015, not workstations and servers.

You talk about heat as if the industry wasn't aware of that when they started making these standards. How absurd! They're made to specifically address lowering overall power and heat, while also increasing density and bandwidth.

TSVs themselves (PWR/GND) conduct heat. Dummy TSVs grids/arrays can be added to conduct more heat. Thermally conductive epoxy for bonding the die. This stuff wasn't invented yesterday. It's been in development for the better part of a decade.

 
I just noticed the mistake Caz made when he read up on the news. He confused Gb with GB. 4Gb is only 512MB. Which, considering it's a low-heat SoC, is right in line with my saying 64~256MB of L4 cache is practical on a desktop CPU. 256MB is 2Gb, and when accounting for the need for thermal efficiency you would get ~128MB of useful memory, or one quarter of what's sorta (not actually in stores ~yet~) available now.

So who's for running Windows 8 with 256 ~ 512MB of memory?

I didn't confuse it. WideIO has "Support for up to 32 Gbit monolithic density", that's 4GB.

Elpida started sampling/shipping a 4Gbit version, which yes is 512MB.

That's just a first rev and companies can expand on that as they see fit.
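For anyone tripped up by the Gb/GB mixup being argued over here, the conversion is simple arithmetic. A quick Python sketch using the figures quoted in the thread:

```python
# Chip densities are quoted in gigabits (Gb); usable capacity is in bytes.
# 8 bits per byte, and 1 Gb = 1024 Mb.

def gbit_to_mbyte(gbit):
    """Convert a memory density in gigabits to megabytes."""
    return gbit * 1024 // 8

print(gbit_to_mbyte(4))   # Elpida's 4 Gb part -> 512 MB
print(gbit_to_mbyte(2))   # 2 Gb               -> 256 MB
print(gbit_to_mbyte(32))  # WideIO's 32 Gb max monolithic density -> 4096 MB (4 GB)
```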
 
there is still one great option for AMD IMHO
http://www.newegg.com/Product/Product.aspx?Item=N82E16819103995
the 960T Zosma
now granted it is not a BE but with my Deneb I got it from 2.8-3.4 pretty easily
so I would think I could get a Zosma up to 3.5 simply
I also have an Asus AM3 770 board I can unlock with
so hopefully have a 6 core 3.5 CPU
for $125
that is a great bargain for current AM3 owners
also I am very interested in the Thubans
I do a lot of video work using multithreaded software like Handbrake and PowerDirector
so would benefit from MOAH cores LOL
so a Thuban BE edition is a dream of mine
so AMD still currently has some interesting offers
I have seen the 960T going for $109 in some places
I could blow away an FX-4170, especially if I can unlock successfully
but it is a darn shame that AMD EOLd so many PHIIs

Zosmas are awesome. I had mine with all 6 cores at 4.0 @ 1.35v... the only thing that worried me was temps around 57-58 with Prime using a Hyper 212+, so I backed down. I can't remember exactly what, but with 4 cores it used even less voltage.

From everything I've seen online they have a huge success rate at unlocking.
 
@leper84 - to me the 960T BE is the only really good deal that AMD has right now
for AM3 owners with an unlocking board it is awesome
I have an Asus M4A77TD with ACC and Asus Unleashing Mode
I am not planning on a CPU upgrade unless it is an even swap (might sell my PHII 925 and buy a 965BE hopefully)
but if I was going to buy an AM3 CPU out of pocket the Zosma is an awesome deal
4 cores at 4GHz and possibly two unlocked cores!!!

but I probably will wait to get a BE Thuban much later on hopefully used on Ebay cheap
 
The FX-4100 at $100 isn't so bad. It's still a decent all-around CPU with pretty decent gaming performance when paired with a cheap GPU. The $20 difference between it and the i3 is the difference between the 6850 and 6870; pretty good deal with the FX imo.
 
I didn't confuse it. WideIO has "Support for up to 32 Gbit monolithic density", that's 4GB.

Elpida started sampling/shipping a 4Gbit version, which yes is 512MB.

That's just a first rev and companies can expand on that as they see fit.

Didn't Elpida just go out of business?

Zosmas are awesome. I had mine with all 6 cores at 4.0 @ 1.35v... the only thing that worried me was temps around 57-58 with Prime using a Hyper 212+, so I backed down. I can't remember exactly what, but with 4 cores it used even less voltage.

From everything I've seen online they have a huge success rate at unlocking.

AMD's 6-cores were pretty nice. And 57-58C is pretty good for that cooler. The CM 212 isn't the best one out there, but 57-58 at 4GHz on 6 cores is pretty darn good.

Still sluggish on the AMD news front. I am surprised, since IB will launch next month (the 29th of April has been confirmed); AMD should be trying to spoil the news with benchmarks and news of their own.
 
I didn't confuse it. WideIO has "Support for up to 32 Gbit monolithic density", that's 4GB.

Elpida started sampling/shipping a 4Gbit version, which yes is 512MB.

That's just a first rev and companies can expand on that as they see fit.


I was giving you the benefit of the doubt. Since you just admitted to knowing the difference, you also just admitted to thinking that people can have 4GB of main memory on a CPU. If you can fit a single 32Gb stack onto a CPU, then you can get 32Gb x 8 onto a cheap desktop memory stick. That's 32GB of system memory per stick, or 64 for a dual-stick setup, 128 for high-end setups. 4GB vs 64/128: yes, it's merely a fraction of what's available for system memory.

Which disproves your statement that 3D stacked memory would make main system memory obsolete. You made that statement during a discussion about memory buses on CPUs and the performance impact they have on IGAs.

Right now we have 512MB of memory on a small ARM chip; you won't be seeing 2GB for another two years (this is from Samsung, btw). This is in the mobile space, where such small amounts of memory are acceptable and the devices are low-power, low-heat devices.

Not only that, but you won't be seeing this anytime soon, if ever, on a desktop chip, for the reasons I've stated before. Reasons you have yet to refute: the silicon memory layer acts as a thermal insulator between the hot CPU and the thermal plate. That is the actual engineering challenge, not the memory density. The more memory you stack, the harder it becomes to remove heat from the underlying CPU. Since this is an enthusiast board, I won't insult people's intelligence about what that means for clock speeds and performance.

And you still haven't discussed what your idea means for memory architecture. So even assuming all those issues could magically be hand-waved away, you are still stuck with a segmented memory map, and x86 doesn't handle that very well. Low memory capacity relative to what's cheaply available, combined with lower performance expectations due to thermal issues, coupled with a disparate memory map, means it will be used for L4 cache in small amounts, nothing else.

Now what's really funny is that you're so caught up with vertical stacking that you never sat back and saw what ~is~ possible: horizontal placement of a 3D memory stack. They stack them vertically to better make use of limited physical space; this is a problem in mobile applications (and real estate out here). In a desktop computer you have plenty of room on the CPU mounting board. The CPU die is often 20% or less of the actual size of the socket, which means tons of horizontal space is being wasted. Instead of stacking memory on top of the CPU, you place it beside the CPU, with the interconnects on the bottom layer fusing to the side of the CPU. Fundamentally it's the same concept, two separate dies, but instead of connecting vertically, the bottom-most layers connect horizontally. This gives the thermal load from the CPU unimpeded access to the heat plate while leaving plenty of room for a larger memory stack. Still not enough to compete with main memory, but more than enough to be GPU memory or a large cache. It requires more engineering work, especially as you'll need a finely fitted heat plate, but it's more than doable.
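The 32Gb-stack vs 32GB-stick arithmetic from the start of this post checks out in a few lines. A Python sketch (the 8-chips-per-DIMM figure is the common consumer layout mentioned above):

```python
def dimm_capacity_gbyte(chip_density_gbit, chips_per_dimm=8):
    """DIMM capacity in GB: chip count times density in Gb, divided by 8 bits/byte."""
    return chip_density_gbit * chips_per_dimm // 8

on_die = 32 // 8                      # a single 32 Gb stack on the CPU: 4 GB
print(on_die)                         # 4
print(dimm_capacity_gbyte(32))        # 32 GB per 8-chip stick
print(dimm_capacity_gbyte(32) * 2)    # 64 GB dual-stick setup
print(dimm_capacity_gbyte(32) * 4)    # 128 GB high-end four-stick setup
```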
 
Your own links show the technology is becoming reality. Of course Samsung is talking about their mobile ARM processors because that's their forte. And what's happening there? They're going 64bit, they're going quad core, 8-core. The power consumption is ramping up in the mobile space, while the x86 space is ramping down.

Nothing you linked or wrote there presents a brick wall that the technology hasn't overcome. You're talking about 120W CPUs which even before the end of this year will be on the extreme end for a CPU.

I didn't allude to this showing up on server chips first, and certainly not today for x86 chips. This would be on Haswell type 25-35W chips or probably Atom first. We're talking APUs and mainstream chips in 2014/2015, not workstations and servers.

You talk about heat as if the industry wasn't aware of that when they started making these standards. How absurd! They're made to specifically address lowering overall power and heat, while also increasing density and bandwidth.

TSVs themselves (PWR/GND) conduct heat. Dummy TSVs grids/arrays can be added to conduct more heat. Thermally conductive epoxy for bonding the die. This stuff wasn't invented yesterday. It's been in development for the better part of a decade.

Thermodynamics doesn't work that way. If it did, then BD's thermal issues could be solved overnight and we'd have much faster CPUs than we do now.
 
Didn't Elpida just go out of business?



AMD's 6-cores were pretty nice. And 57-58C is pretty good for that cooler. The CM 212 isn't the best one out there, but 57-58 at 4GHz on 6 cores is pretty darn good.

Still sluggish on the AMD news front. I am surprised, since IB will launch next month (the 29th of April has been confirmed); AMD should be trying to spoil the news with benchmarks and news of their own.


Yes they did, and Samsung's been sampling 3D stacked memory since 2009. They're now on a 30nm process moving to a 20nm one. The first 20nm sticks should be hitting the market sometime this summer to late fall, assuming they don't have any hiccups. They have two goals: first, to make a cellphone CPU that has fast local memory of at least 1GB (512MB has been determined to be too small); second, to make cheap 8GB DDR3 sticks available. Eventually they plan on making 16 and 32GB DDR4 sticks for the consumer market, but those plans are a couple of years out.

That is why I know Caz has the wrong idea. On-chip memory isn't capable of replacing main desktop memory. Right now it's 512MB on a low-heat chip; could you run Windows 7/8 on 512MB of memory? Not comfortably, which destroys any performance advantage you'd get from it. In two to four years, when it becomes possible to put 2GB of memory on a slow chip, we'll be using 32~64GB of main memory and you'll be faced with the same problem. No matter how much you pile into the CPU, it'll always be cheaper and more economical to put it outside, and your requirements will grow such that they can't be met with what is technically available at that time. Need I remind everyone of Bill Gates's famous quote about 640KB being "enough" memory? Caz is basically making the same claim: that 2GB is "enough" when we'll be running 32~64GB.
 
I was giving you the benefit of the doubt. Since you just admitted to knowing the difference, you also just admitted to thinking that people can have 4GB of main memory on a CPU. If you can fit a single 32Gb stack onto a CPU, then you can get 32Gb x 8 onto a cheap desktop memory stick. That's 32GB of system memory per stick, or 64 for a dual-stick setup, 128 for high-end setups. 4GB vs 64/128: yes, it's merely a fraction of what's available for system memory.

Which disproves your statement that 3D stacked memory would make main system memory obsolete. You made that statement during a discussion about memory buses on CPUs and the performance impact they have on IGAs.

This is true. Main system memory will never be obsolete. But having the ability to throw RAM on the CPU would improve performance vastly, as it would run at speeds comparable to cache RAM.

And I doubt the average user will get to 64GB anytime soon. Of course I would expect stacked RAM to start small, probably 1GB, but it would be a major benefit for the IGP and when the IGP isn't using it it could help the CPU as well.
 
This is true. Main system memory will never be obsolete. But having the ability to throw RAM on the CPU would improve performance vastly, as it would run at speeds comparable to cache RAM.

And I doubt the average user will get to 64GB anytime soon. Of course I would expect stacked RAM to start small, probably 1GB, but it would be a major benefit for the IGP and when the IGP isn't using it it could help the CPU as well.


Samsung has it at 512MB right now; they want it at 1GB before the year's out for their phones and tablets. My point was that by the time 4GB is possible on a fast desktop CPU (as opposed to a slow mobile one), cheap 32GB sticks for consumer use will also be available. They use the exact same technology; thus if one's available, then the other must be available. You'll always be looking at a factor-of-8 difference, because 8 chips is the most common setup for consumer DIMM sticks. And this is completely ignoring the thermal issues presented.

I wouldn't doubt for a minute that both Intel and AMD are looking at 3D memory for their IGAs; it makes entirely too much sense, as those CPUs (APUs) are the closest to mobile CPUs. You'll run into the same thermal issues, but they're not a deal breaker under ~40W, as you can just have a better cooling system to compensate. CPUs won't be able to use it as main memory if they're configured to use off-die memory sticks, due to the different memory protocols. As is, it will require two different memory controllers (one WideIO and another DDR3 or DDR4) on the CPU, and the two have radically different bus widths (512-bit vs 64-bit). This is on top of it being a non-linear memory arrangement.
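To put the 512-bit vs 64-bit bus widths in perspective, peak bandwidth is just width times transfer rate. A quick sketch (the ~200 MT/s Wide I/O and DDR3-1600 clock figures are representative assumptions, not from the post):

```python
def peak_bw_gbs(bus_bits, transfers_per_sec):
    """Peak bandwidth in GB/s: bus width in bytes times transfer rate."""
    return bus_bits / 8 * transfers_per_sec / 1e9

# Wide I/O: very wide (512-bit) but slow-clocked (~200 MT/s, SDR) -- assumed figure
print(peak_bw_gbs(512, 200e6))    # 12.8 GB/s
# A single DDR3-1600 channel: narrow (64-bit) but fast (1600 MT/s)
print(peak_bw_gbs(64, 1600e6))    # 12.8 GB/s
```

Comparable bandwidth at an eighth of the transfer rate is exactly why the wide, slow bus wins on power.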

Honestly, if I had to design a system that used fast local memory in conjunction with remote memory, I'd have to use some form of hardware-based memory mapping to map the faster 4GB into the same address space as the 64GB of system memory. Thus you'd still only have 64GB of memory, but the OS could use the MMU to choose which portions are mapped to the fast local memory. You'd need one helluva OS to manage that, as I see all sorts of page faults happening. It would be about as efficient as those hybrid SSD/HDD combos they have now: faster at some things but no different at others.
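The MMU remapping idea above can be caricatured as a tiny page-placement policy: one flat address space, with the hottest pages promoted into the small fast region (a toy model; all names and numbers are invented):

```python
FAST_PAGES = 4  # toy capacity of the on-die region, in pages

class HybridMemory:
    """Toy model: hottest pages live in fast on-die memory, the rest stay remote."""
    def __init__(self):
        self.fast = set()   # pages currently backed by fast memory
        self.hits = {}      # per-page access counts

    def touch(self, page):
        self.hits[page] = self.hits.get(page, 0) + 1
        if page in self.fast:
            return "fast"
        if len(self.fast) < FAST_PAGES:
            self.fast.add(page)
            return "promoted"
        coldest = min(self.fast, key=lambda p: self.hits[p])
        if self.hits[page] > self.hits[coldest]:
            # demoting one page to promote another is where the
            # page-fault churn worried about above would show up
            self.fast.discard(coldest)
            self.fast.add(page)
            return "promoted"
        return "slow"
```

A hot page migrates in after enough touches, while a cold page keeps paying the slow path; the promotion/demotion traffic is the management cost the OS would have to eat.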
 
^But the speed of consumer system RAM vs SoC RAM would be vastly different.

And if Intel is already working on it, how do we know Haswell won't have it? Intel has the 3D transistor tech, others don't. Once it's out they can move forward with it.

I think it will happen well before 32GB consumer sticks become normal and affordable.
 
Well, for one, Intel doesn't make memory; IC production technology isn't universal, as Intel's venture into GPUs has demonstrated. Also, while the chips occupy the same packaging, they are not the same silicon; this means Intel would need to ramp up a new production line for these DRAM chips. Intel could eventually do this, they have more than enough money to pull it off, but like their GPUs it will take time, not something that can be done in a year. Better for them to license someone else's technology than to produce their own.

The 32GB number was just a point of reference because he chose to use 32Gb (4GB) as his target. The standard defines up to that size but that size isn't in production yet much less engineered onto something. 4Gb is currently the standard for chips which is why 4GB sticks are so cheap. Making an 8GB stick would require 16 4Gb chips which starts to get expensive and into complicated engineering to pass the DDR signal through all those chips. So if they had the technology to put a 32Gb memory stack onto a CPU die, then they have the technology to cheaply create a 32Gb x 8 memory stick (32GB).

Looking at those numbers should give you an idea of what's possible now, next year and over the course of the next four years.
 
Well, for one, Intel doesn't make memory; IC production technology isn't universal, as Intel's venture into GPUs has demonstrated. Also, while the chips occupy the same packaging, they are not the same silicon; this means Intel would need to ramp up a new production line for these DRAM chips. Intel could eventually do this, they have more than enough money to pull it off, but like their GPUs it will take time, not something that can be done in a year. Better for them to license someone else's technology than to produce their own.

The 32GB number was just a point of reference because he chose to use 32Gb (4GB) as his target. The standard defines up to that size but that size isn't in production yet much less engineered onto something. 4Gb is currently the standard for chips which is why 4GB sticks are so cheap. Making an 8GB stick would require 16 4Gb chips which starts to get expensive and into complicated engineering to pass the DDR signal through all those chips. So if they had the technology to put a 32Gb memory stack onto a CPU die, then they have the technology to cheaply create a 32Gb x 8 memory stick (32GB).

Looking at those numbers should give you an idea of what's possible now, next year and over the course of the next four years.

They may not make DRAM, but all of their processes start as SRAM, and they also work with Micron on NAND memory, so I doubt it would be hard for them to do.

As I said though, the point is speed and efficiency. Total system power usage would drop with on-die memory, and performance would jump up since latency would drop tremendously.

I am sure that if its possible, AMD as well would look to it as having the GPU rely on system RAM bottlenecks it pretty badly.
 
Also something I didn't mention earlier: the power savings of using on-chip memory come primarily from not having to create an external system memory bus and all the overhead associated with it. The moment you are forced to create one, those savings go out the window, as you now have to power not only the system memory bus but also the attached on-die memory. And while the on-die memory eats just a fraction of what a discrete memory chip would, it still eats power.

The primary attraction of small amounts of on-chip memory is for low power mobile computing devices.
 
As I said though, the point is speed and efficiency. Total system power usage would drop with on-die memory, and performance would jump up since latency would drop tremendously.

Total system power usage goes up if you have to have external system memory anyway. And if you need main system memory, then your performance/latency boost is diminished, as you'll still have to read from main memory. Now, if it's used just for GPU memory / L4 cache, then you'd retain all the performance advantages without frequent access to main memory.

It really boils down to the question: can you replace main system memory with on-die memory? I believe not, due to scaling and requirements. We simply need too much memory, and that need will only grow as applications migrate to 64-bit. 512MB sounded like plenty of memory years ago and was "acceptable" a few years ago; now it's 2GB at the low end. A couple of years from now it'll be 4GB minimum at the low end, with 16GB or more preferred. Try to imagine running Windows 7 with 512MB or 1GB of memory if you want an idea of the primary issue.

And yes, I agree that AMD should be looking into making GPU memory out of this. It would vastly improve the performance of their IGAs.
 
Is anyone interested in discussing TDP reduction in Trinity? If AMD has managed to get clock speeds above 3GHz in under a 40W TDP, then that can be a plus point for AMD to compete against Intel's lead (in a smaller fabrication process).

After this much power saving, how much clock speed are you assuming, at what TDP, and in which config (x2, x4, x6 or x8)?
 
