AMD Piledriver rumours ... and expert conjecture

Reynod · Oct 27, 2011

We have had several requests for a sticky on AMD's yet to be released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post a question relevant to the topic, or information about the topic, or it will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame baiting comments about the blue, red and green team and they will be deleted.

Enjoy ...

palladin9479 · Mar 15, 2012

Benchmarks of 64-bit FF vs 32-bit FF

http://www.extremetech.com/computing/90546-firefox-8-x64-has-64-bit-browsing-finally-come-of-age

Just as an example.

wh3resmycar · Mar 15, 2012

running eclipse + avd + favorite browser would eat a considerable amount of RAM (above 1gb) already.

imagine if you put extra services here and there, sql / apache, and somebody dares to put RAM inside a CPU? that's icky. and i doubt win8metro is going to soften up on eating RAM. you guys are not talkin about these creatures you call "general users" aren't you? they're moving onto tablets, to hell with them.

anyway. how many PCIE lanes will the next gen AMD chipset sport? i don't know if that has been mentioned before but i think that's an (err only) area AMD can actually compete on.

palladin9479 · Mar 15, 2012

Memory on CPU makes perfect sense for Phone / Tablets. Utilizing a small SoC with a lightweight OS provides a much smaller memory footprint with a full blown Desktop. That being said, Samsung is now shipping 1GB phones and will be putting more on Tablets.

dogman_1234 · Mar 15, 2012

I actually can see SoC as being good for servers as well.

Think Cray for a while and let it sink in...

palladin9479 · Mar 15, 2012

I actually can see SoC as being good for servers as well.

Think Cray for a while and let it sink in...

Absolutely not. Servers are now running 128 ~ 256MB of memory across four CPU's.

Supercomputers are using terabytes of memory not gigabytes. And that's today, not tomorrow.

palladin9479 · Mar 15, 2012

What drugs are you on? i want them because 32bit rules the land.

Huh? What rock have you been living under?

NT Kernel 32 limits applications virtual addressing space to 2GB, that was fine in the 90's, that's not fine anymore. Even with if it's compiled with LAA (Large Address Aware) as an option for 4GB of address space you take a performance hit with your inability to make system calls to the kernel. Programs are finally being compiled as both 64-bit and 32-bit binary's, or just to AnyCPU binary format (similar to what Apple did). In the next few years 64-bit versions will start to become standardized.

I'm not even going to attempt a debate here, I'll let your ignorance stand as a warning to all what drugs do to your head.

palladin9479 · Mar 15, 2012

http://www.amd.com/us-en/assets/content_type/DownloadableAssets/dwamd_AMD64_Porting_FAQ.pdf

Older presentation and technical details by AMD about AMD64 and how to work with it. Good reading for those who might not understand the finer points of x86 vs x64.

What I always that was interesting is how the x64 ISA (if it can be called its own) not only extended the x86 ISA but also introduced additional RISC style registers. x86 has 8 general purpose 32-bit long registers (EAX, EBX, ECX, ect..) that used to have specific functions on the 16-bit 8086 CPU (AX,BX,CX) but are more flexible now. X64 has those same eight registers, now 64 bits long (RAX, RBX, RCX) but also has eight more 64-bit length registers (r8, r9, r10, ect..). This allows software to be further optimized to store more values inside registers instead of having to play register-merry-go-round. Also removed v86 and several functions of the older 32-bit / 16-bit x86 world and enforces a flat memory model instead of a segmented one. Operands are still 32-bit lengths so as to save code space, but registers and address points are 64-bit. Also added eight more 128-bit SIMD registers.

Basically a piece of software that's been optimized for 64-bits running on a 64-bit OS will see a performance increase over that same program at 32-bits running on a 32-bit OS, most of the time.

gamerk316 · Mar 15, 2012

Memory needs are growing not shrinking. The biggest constraint over the past five years has been 32-bit code. On Windows NT x86 applications are only given 31-bits worth of address space, the final 32nd bit is used as an easy way to distinguish between shared kernel memory and local application memory. Every application gets 2GB of their own virtual address space but the kernel has a single 2GB of address space. This is a left-over from NT 4.0 days. NT Kernel x64 has no such limitation and applications have ridiculously large address space. If an applicaiton attempts to load more then 2GB worth of data into it's memory space it'll cause a page fault as it would attempt to write to the protected kernel memory space, it'll cause the application to crash. Anything over 1.8GB presents a risk that it would inadvertently cause a page fault by loading something slightly too big into memory. This is why you can have 16GB of memory in your system but Skyrim will only load a limited amount of data, the programmers didn't want to risk crossing that 2GB boundary.

As programs start to be compiled and released as 64bit executables this limitation will go away and you'll see programs start to load 4GB+ of data into memory. Games are already 4~10GB+ in size, so large that some of them take multiple DVD's to install. This combined with system cacheing points to more memory being required, not less.

Obsidian's developers went into detail about these problems after their made NWN2. NWN2 was a 32-bit executable but their aurora toolset they had made to create and develop for NWN2 would have big issues with crashing on their development machines. 2GB was simply not enough memory for their tool set to load all the required resources for development and they were forced to develop a 64-bit version of it so they could finish developing the game. NWN2's last official patch was 2009, it's now 2012.

Agreed. I wish apps would at LEAST be compiled with the Large Address Aware flag at release though; it really doesn't make much sense to not expand that upper limit to 4GB for 64-bit systems...

king smp · Mar 15, 2012

Actually a reference to me being a Chaotic/Good ranger yet some of the ways I would play were more inline with Paladins, hence the nickname.

hmm saw you as neutral
Chaotic Good
that tells me something about you

funny thing is I have been reading that 128bit OSes are in the near future
is that true?

de5_Roy · Mar 15, 2012

AMD Shows Live Demo of Trinity APU With Eyefinity Gaming
http://www.tomshardware.com/news/AMD-Trinity-Piledriver-VCE-Demo,15009.html
intel and apple should take note.
4 core cpu (apu) in ultrabook form factor,
eyefinity support,
playable fps in latest games,
vce (hopefully amd enables it in time)....

-Fran- · Mar 15, 2012

I already saw almost the same thing when Llano showed up in a closed presentation here in Chile from AMD. The only difference, is that Dirt3 wasn't running with high details and AA, it was med/low and no AA giving 30FPS+.

It looks good though, seems like Trinity might save the day for notebooks and give low end desktops a better P/P by making Intel and nVidia come up with better low end offerings; just like the Pentium G840 Toms found.

Cheers!

noob2222 · Mar 15, 2012

talking about overclocking the FX-Bulldozer, that's hotter and louder crap...
and I run my overclock on stock voltage..
keep the X6 and stay away until Piledriver, Bulldozer is not your friend...

you don't have to run 1.5-1.9V to overclock bd. At stock the turbo voltage is 1.41V. Currently my system is at 4.7Ghz at 1.34v, looking at it this way, thats an undervolt and overclock. And my massive hot temp is at 49C with 2 low speed fans (1300 rpm) going through a double 120 radiator ... ya that thing is silent. With high speed fans (2000 rpm) it drops the temp to 39C. Not worth the noise since its still well below the rated max of 62C with either fan setup.

Newegg isn't the best place to find "I tried it" reviews. But I will add this, there is no reason at all to go from a true quad to a dual module "quad-core" cpu.

noob2222 · Mar 15, 2012

that's a nice clock you have set.
have any radiator trouble (leakage).?

only time I did that was when I used a screw that was too long, nothing jb weld didn't fix.

king smp · Mar 15, 2012

Hey Mal
http://www.microcenter.com/single_product_results.phtml?product_id=0365791

best deal I have seen

Cazalan · Mar 15, 2012

Actually you claimed stacked memory, you didn't even know about horizontal stacking yet. It hasn't appeared in any news releases and won't for another year at least.

Lol. What else could "stacked on or next to" mean? Horizontal arrangement of die is nothing new. The previous picture clearly shows multiple stacks of varying size and type.

This is from 2007.

http://www.intel.com/technology/itj/2007/v11i3/3-bandwidth/6-architectures.htm

Not all research makes it to the front page news for long. 3D transistors were invented by IBM long ago. We're just getting them mass produced by Intel this year. GFlo won't have them until 2014.

jimmysmitty · Mar 15, 2012

AMD Shows Live Demo of Trinity APU With Eyefinity Gaming
http://www.tomshardware.com/news/AMD-Trinity-Piledriver-VCE-Demo,15009.html
intel and apple should take note.
4 core cpu (apu) in ultrabook form factor,
eyefinity support,
playable fps in latest games,
vce (hopefully amd enables it in time)....

The playable was demoed on the DT Llano, doesn't meaa same performance on the ultra thins. I actually do wonder what speed the CPU and IGP is to get 17w TDP.

I already saw almost the same thing when Llano showed up in a closed presentation here in Chile from AMD. The only difference, is that Dirt3 wasn't running with high details and AA, it was med/low and no AA giving 30FPS+.

It looks good though, seems like Trinity might save the day for notebooks and give low end desktops a better P/P by making Intel and nVidia come up with better low end offerings; just like the Pentium G840 Toms found.

Cheers!

It looks interesting. I still think reviews will be the best as closed doors means they control everything you see so there could be many factors such as the rest of the specs etc.

The 29% better in CPU still is one thing I question but it is being compared to a Athlon II, not Phenom II in the CPU part.

Cazalan · Mar 15, 2012

I want to hear, from you, what you actually expect to do with this 2GB desktop PC in a world that will be full of 16 ~ 32GB PC's. (2GB * 8 = 16GB per DIMM stick at approx $30~40 USD).

I'm not saying I would buy it, or anyone here would buy it. Clearly you misunderstand the low end market segment.

For the Mom&Pops and people that just want to surf the web, edit photos and Facebook people it's adequate. People are doing all that with an iPad with 512MB.

Clearly some people are happier with less. Not everyone can afford a $1000 PC. There's people in the world living on $1/day.

fazers_on_stun · Mar 15, 2012

Now what's really funny is that your so caught up with vertical stacking that you never sat back and saw what ~is~ possible, horizontal placement of a 3D memory stack. They stack them vertically to better make use of limited physical space, this is a problem in mobile applications (and real estate out here). In a desktop computer you have plenty of room on the CPU mounting board. The CPU die is often 20% or less of the actual size of the socket, which means tons of horizontal space is being wasted. Instead of stacking memory ontop of the CPU you put it horizontal with the CPU with the interconnects being on the bottom layer and fusing to the side of the CPU. Fundamentally it's the same concept, two separate die's but instead of them connecting in a vertical manor you have the bottom most layers connect in a horizontal manor. This allows the thermal load from the CPU unimpeded access to the heat plate while leaving you plenty of room for a larger memory stack. Still not enough to compete with main memory, but more then enough to be GPU memory or a large cache. Requires more engineering work, especially as you'll need a finely fitted heat plate, but more then doable.

http://en.wikipedia.org/wiki/Through-silicon_via

In electronic engineering, a through-silicon via (TSV) is a vertical electrical connection (via)(Vertical Interconnect Access) passing completely through a silicon wafer or die. TSVs are a high performance technique used to create 3D packages and 3D integrated circuits, compared to alternatives such as package-on-package, because the density of the vias is substantially higher, and because the length of the connections is shorter.

Going the MCM route with 2 die spaced horizontally in one package would mean the signal bus would be much longer than with 3D stacking, and require scaling up the bus drivers to handle the higher RC load. http://realworldtech.com/page.cfm?ArticleID=RWT050207213241&p=2

Interconnect Problems – Think Globally
While transistor switching performance has continued to improve by roughly a third each fabrication process generation, the wires that connect them throughout the chip, the metal interconnects have comparatively deteriorated in performance]. Interconnect can also draw up to a third of the power utilization of a modern microprocessor. Indeed, the energy required to drive an operand across a chip’s wires can dwarf the energy needed to operate on it by the computation logic]. There have been isolated one-time improvements to interconnects, such as slightly better insulating materials between interconnect layers to reduce parasitic capacitance. The switch from aluminum to copper interconnects reduced resistance, which similarly increases the performance of interconnects. However, the future of wire performance is clear and getting worse with each process generation.

Latency of on-chip wires is generally a product of their resistance and capacitance, the RC delay, which is near a factor of the speed of light. A wire’s RC propagation delay is quadratic in proportion to its length (i.e. a wire that is twice is long might have an RC delay 4x larger, or more). As process feature sizes shrink, the capacitance of shrunken wires decreases marginally. However, the cross-section of the wire is cut in half, which doubles the resistance, effectively doubling the propagation delay. Functional unit blocks will also shrink, which reduces the length of the local intra-block interconnect (i.e. wires between different stages of a multiplier). This tends to mitigate latency increases, at least for the local wires. Yet constant wire latency in the presence of enhanced transistor switching speeds effectively increases interconnect latency, in relative terms. This comparative increase is tolerable for intra-block wires as they are very short, contributing little latency to the overall cycle time of the chip. It is the inter-block and especially upper level global on-chip interconnect where the majority of the increasing delay is encountered. Assuming a constant microprocessor die size between process generations, the latency of the global wires that have to travel the length of the chip could double with each shrink. This would triple the relative difference between global wire and transistor performance every process generation, reducing the chip area a global signal can travel per narrowing clock cycle..
..
A time honored solution to alleviate this problem is inserting buffers and flip-flops to partition a long wire into segments, boosting the signal. Since wire delay is a quadratic function of the length of a wire, segmenting a wire into two equal sub-segments halves the total wire latency, although the buffer itself introduces a small delay. However, buffers and flip-flops are not free, as they consume additional power. The number of buffers needed to ameliorate the interconnect-transistor disparity would grow exponentially over different process generations, making it unsuitable as a long term solution.

The above is true for on-chip signals, but going off-chip requires an I/O driver to amplify the tiny on-chip bus signals by ramping them up through progressively larger stages until the final one which is connected to the bonding pad and then the off-chip bus wiring.

In contrast, 3D stacking results in lower latency and lower overall power:

The three dimensional integration of multiple device layers results in several advantages over the present regime. The chief benefit is that the interconnects between blocks are shorter, in some instances considerably so, as illustrated in Figure 6. This lowers power dissipation since fewer buffers and flip-flops are needed. Reducing the amount of metal that runs across the chip also reduces power dissipation. Lower inter-block latency reduces cycle time, increasing frequency and chip performance. Stacking layers also increases chip density, as more transistors are able to be placed per unit of volume and within one clock cycle of each other. Cost reduction is a byproduct of this as fewer pins are needed per chip to communicate with other nearby chips, compared to the prior arrangement, simplifying packaging.

While all these advantages are significant, the three dimensional integration also has its drawbacks. Overall power consumption is reduced since less interconnect is used; however power density can increase in parts of the 3D integrated circuit. Without careful attention early in the design and simulation of the chip, the resulting thermals could reach unacceptable levels, affecting device reliability and requiring expensive cooling solutions.

If the rumors about Haswell having at least 1GB of low-power GDDR stacked on the GPU turn out to be true, then with a die shrink or two main memory could be next..

Cazalan · Mar 15, 2012

The two big drawbacks, something Caz is dancing around and refuse's to answer, is that your limited to ONE memory chip at most and to lower clock speeds due to thermal insulation. Both of these are acceptable in the mobile space as total thermal output is what is measured not per-device thermal output. Removing the external system memory bus reduces heat more then the insulating factor of one semi-conductor on top of another.

4! The WideIO REVISION 1 spec is for 1 to 4 die stack. Again, this is just 1 of many types of 3D memory technology. IBM, Micron, Intel, Hynix and others are working on their own versions.

WideIO is indeed clocked slower (~200MHz), but it is 512 bits wide.
What it lacks in speed it makes up in width.

A DDR2/DDR3 DIMM is 64 bits wide (72bit w/ECC).

This is over a year old showing a 2 die stack. They've already demonstrated 8 die stacks with other memories.

http://www.ecnmag.com/News/Feeds/2011/03/applications-medical-electronics-samsung-wide-io-memory-for-mobile-products-a-dee/

jdwii · Mar 15, 2012

Huh? What rock have you been living under?

NT Kernel 32 limits applications virtual addressing space to 2GB, that was fine in the 90's, that's not fine anymore. Even with if it's compiled with LAA (Large Address Aware) as an option for 4GB of address space you take a performance hit with your inability to make system calls to the kernel. Programs are finally being compiled as both 64-bit and 32-bit binary's, or just to AnyCPU binary format (similar to what Apple did). In the next few years 64-bit versions will start to become standardized.

I'm not even going to attempt a debate here, I'll let your ignorance stand as a warning to all what drugs do to your head.

You mad Bro?

I think everyone knows 64bit is the future but most(90%) of the mainstream app's still use 32bit. And their is 3 main reasons why, number 1 not everything needs to be in 64bit, And number 2 it takes to long to make a 32bit program into 64bit and the end results are barley worth it half the time, And this is the major reason why-companies are cheap and some programmers are lazy.

Not to mention 32bit can support 4GB of ram and usually only 3.25GB on windows.

jdwii · Mar 15, 2012

you don't have to run 1.5-1.9V to overclock bd. At stock the turbo voltage is 1.41V. Currently my system is at 4.7Ghz at 1.34v, looking at it this way, thats an undervolt and overclock. And my massive hot temp is at 49C with 2 low speed fans (1300 rpm) going through a double 120 radiator ... ya that thing is silent. With high speed fans (2000 rpm) it drops the temp to 39C. Not worth the noise since its still well below the rated max of 62C with either fan setup.

Newegg isn't the best place to find "I tried it" reviews. But I will add this, there is no reason at all to go from a true quad to a dual module "quad-core" cpu.

I will try to out bench you if you like. I'm a beast or wait i mean my phenom.

Cazalan · Mar 15, 2012

Huh? What rock have you been living under?

I'm not even going to attempt a debate here, I'll let your ignorance stand as a warning to all what drugs do to your head.

jdwii was right. Far more 32bit CPU's are shipped per year than 64bit.

Windows 7 is increasing 64bit numbers but even close to half of Win7 clients are running 32bit.

You're too focused on the desktop and server market.

Cazalan · Mar 16, 2012

I thought today's article on Trinity was fairly impressive.
Doing triple-head display with an APU.

http://www.tomshardware.com/news/AMD-Trinity-Piledriver-VCE-Demo,15009.html

If Trinity is 50% better graphics wise than Llano and you crossfire with a similar card you're looking at a doubling in performance on the cheap.

Cazalan · Mar 16, 2012

The playable was demoed on the DT Llano, doesn't meaa same performance on the ultra thins. I actually do wonder what speed the CPU and IGP is to get 17w TDP.

They'd probably have to get into the E-350 clock speed range to get close to 17W.

That's 1.6Ghz or so.

noob2222 · Mar 16, 2012

might be sending you some PM's on liquid, I might try a simple Corsair solution first.
if you don't mind.?
thanks.

dont mind at all.

I will try to out bench you if you like. I'm a beast or wait i mean my phenom.

on a maximum stable or reliable overclock? I can reach 5.1 ghz but the temps and voltage required are more than I like. 1.46v, 56C after 1h of prime 95, even though thats still lower than some of toms testing, imo not worth 400 mhz, but doable.

AMD Piledriver rumours ... and expert conjecture

Administrator

Splendid

Distinguished

Splendid

Splendid

Splendid

Splendid

Splendid

Glorious

Splendid

Splendid

Glorious

Distinguished

Distinguished

Splendid

Distinguished

Champion

Distinguished

Splendid

Distinguished

Splendid

Splendid

Distinguished

Distinguished

Distinguished

Distinguished

Share this page