AMD Piledriver rumours ... and expert conjecture

We have had several requests for a sticky on AMD's yet-to-be-released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post questions or information relevant to the topic; anything else will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame baiting comments about the blue, red and green team and they will be deleted.

Enjoy ...
 
The thing is, one kind of program that very much does receive a boost from L3 cache is games: roughly 90% of them. http://www.xbitlabs.com/articles/cpu/display/phenom-athlon-ii-x2_7.html

If you're building a gaming computer, you don't want a chip without L3 cache; skipping it pretty much cripples you across the board.

Mostly, it looks like only video editing is unaffected.

Again, learn to read what I said. I said that the L3 cache is a waste due to the increased power draw, and all the space it takes up on the die that could otherwise be used to increase performance.
 
Again, learn to read what I said. I said that the L3 cache is a waste due to the increased power draw, and all the space it takes up on the die that could otherwise be used to increase performance.


Overall, it doesn't increase power draw, because data that doesn't reside in L2/L3 has to be fetched from main memory, which is more costly (power- and time-wise).

 
Again, learn to read what I said. I said that the L3 cache is a waste due to the increased power draw, and all the space it takes up on the die that could otherwise be used to increase performance.
How? By adding more cores? Maybe by trying to design a larger single core that takes up more die space? I'm not sure what your line of thinking is on this.

L3 cache is a simple design that's easy to implement. A CPU core can only go so fast; L3 or not, it's the same speed. If you're worried about power draw and not performance, use an Atom CPU.

Just something to think about: RCM (resonant clock mesh) has maximum efficiency at its resonant frequency. Target that frequency at the L3 cache speed and take a guess at what happens.
 
I rarely go by "up to" statements. L3 cache for client computers is a complete waste of space. "Up to" statements are purely for marketing in my eyes; I'm an average-performance-benefit guy, and L3 cache is a 5% boost or less in 90% of apps.

And it does take up quite a bit of the die that could be used for other things, such as better per-core performance.

You do have a point regarding "up to" statements. Still, if you want to talk about averages, the right number, according to what I have posted so far, would be 10%.

But increasing the performance of these modern CPU architectures, even if only by a few percent, is not trivial; that 10% is the result of a single design change.

This change comes at the cost of die space and power (which translates to heat), but it's probably one of the simplest and most effective ways to increase performance.

To say that having "3AGU/ALU over 2 with 6IPC per module" would be better than having L3 cache seems to be a very bold claim, to say the least. Can you really tell?



 
LOL - I remember that thread - some 60+ pages all because some Canadian guy didn't wanna spend $5 on a power cable for his "very old computer not working" or whatever the thread title was 😛. IIRC some 5 years ago. That guy gave Canadians everywhere a bad rep.

Come to think of it, "Triny" is Canadian too - wonder if there's a connection somewhere 😀..

j/k

I have found the link: http://www.tomshardware.com/forum/page-173814_28_150.html :lol:
 
Overall it doesn't increase power draw because if the data doesn't reside in L2/L3 it would have to go to main memory which is more costly (power and time wise).

Which is going to happen 99% of the time anyway. And time-wise, a 20-clock-cycle difference isn't going to make a massive performance difference one way or the other in the vast majority of workloads.

I'm not arguing that the L3 doesn't help; I'm arguing that the die space could be better used for some other purpose.
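
To put the "20 clock cycles" point in rough numbers, here's a back-of-envelope average memory access time (AMAT) calculation in C. All the latencies and hit rates are assumed, illustrative values (in the same ballpark as the cycle counts quoted in this thread), not measurements of any real CPU:

```c
/* Back-of-envelope AMAT (average memory access time) sketch.
 * All latencies (cycles) and hit rates below are assumed,
 * illustrative values -- not measurements of any real CPU. */
#include <stdio.h>

int main(void)
{
    const double l1 = 3.0, l2 = 20.0, l3 = 80.0, ram = 100.0; /* cycles */
    const double h1 = 0.90, h2 = 0.80, h3 = 0.50; /* assumed hit rates */

    /* With L3: L2 misses try the L3 before falling through to RAM. */
    double with_l3 = l1 + (1 - h1) * (l2 + (1 - h2) * (l3 + (1 - h3) * ram));

    /* Without L3: L2 misses go straight to RAM. */
    double without_l3 = l1 + (1 - h1) * (l2 + (1 - h2) * ram);

    printf("AMAT with L3:    %.2f cycles\n", with_l3);    /* 7.60 */
    printf("AMAT without L3: %.2f cycles\n", without_l3); /* 7.00 */
    return 0;
}
```

With an L3 only 20 cycles faster than RAM, the L3 barely moves the average (and with a low hit rate it can even lose, as above). Plug in an L3 that's 2x+ faster than RAM, as the AnandTech Bulldozer numbers cited a few posts down show, and the balance flips decisively in the L3's favor.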

L3 cache is a simple design that's easy to implement. A CPU core can only go so fast; L3 or not, it's the same speed. If you're worried about power draw and not performance, use an Atom CPU.

A quick rule: [rough numbers, but enough to get the point across]

If the data resides in a CPU register, the CPU can get it instantly
If the data resides in the L1 cache, the CPU can get it in 1-3 clock cycles
If the data resides in the L2 cache, the CPU can get it in 15-30 clock cycles
If the data resides in the L3 cache, the CPU can get it in about 80 clock cycles
If the data resides in RAM, the CPU can get it in about 100 clock cycles
If the data resides on the HDD, the CPU can get it in about 100,000 clock cycles

There isn't much performance benefit to the L3 compared to main memory. It's there, but I argue the space on the die could otherwise be used by some other technology that would give a larger performance benefit than the L3 currently does.

Also, understand that all levels of cache are basically a way to hide slow memory access times relative to the speed of the CPU. It's a speed hack, nothing more, nothing less.
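
For anyone who wants to see that ladder first-hand, here's a minimal pointer-chasing sketch in C. It walks a random cycle through buffers of growing size; once the working set spills out of each cache level, the ns/access figure steps up. The buffer sizes, step count, and timing method are assumptions for illustration, and exact numbers will vary by CPU; this is not a rigorous benchmark:

```c
/* Pointer-chasing sketch: average access latency vs. working-set size.
 * As the buffer outgrows L1, then L2, then L3, ns/access steps up.
 * Illustrative only -- not a rigorous benchmark. Build: gcc -O2 chase.c */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double chase(size_t n, size_t steps)
{
    size_t *next = malloc(n * sizeof *next);
    size_t i, j, tmp, p = 0;
    if (!next) return 0.0;

    /* Sattolo's shuffle: turns 0..n-1 into one random cycle, so each
     * access depends on the previous one (no prefetch-friendly stride). */
    for (i = 0; i < n; i++) next[i] = i;
    for (i = n - 1; i > 0; i--) {
        j = (size_t)rand() % i;                 /* j in [0, i) */
        tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < steps; i++) p = next[p];    /* the dependent chain */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    free(next);
    return ns / (double)steps + (p == (size_t)-1); /* keep p live */
}

int main(void)
{
    /* 32 KiB (fits in L1) up to 64 MiB (well past any L3, i.e. RAM). */
    for (size_t kib = 32; kib <= 64 * 1024; kib *= 2) {
        size_t n = kib * 1024 / sizeof(size_t);
        printf("%6zu KiB: %7.2f ns/access\n", kib, chase(n, 20000000));
    }
    return 0;
}
```

On a typical desktop chip you'll see plateaus around each cache capacity; converting ns to cycles at your clock speed gives numbers in the same spirit as the list above.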
 
http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/6

The L3 is more than 2x the speed of main memory. I don't know where you get your data.

Last time I had to do an in-depth analysis of memory access times was back in the DDR2 days. Guess the extra latency of DDR3 had a larger impact than I expected...

Point still stands though.
 
A quick rule: [rough numbers, but enough to get the point across]

If the data resides in a CPU register, the CPU can get it instantly
If the data resides in the L1 cache, the CPU can get it in 1-3 clock cycles
If the data resides in the L2 cache, the CPU can get it in 15-30 clock cycles
If the data resides in the L3 cache, the CPU can get it in about 80 clock cycles
If the data resides in RAM, the CPU can get it in about 100 clock cycles
If the data resides on the HDD, the CPU can get it in about 100,000 clock cycles

There isn't much performance benefit to the L3 compared to main memory. It's there, but I argue the space on the die could otherwise be used by some other technology that would give a larger performance benefit than the L3 currently does.
That technology doesn't exist yet. If it did, I'm sure we wouldn't have any L3 cache.

Like I said, I'm not sure what you're grasping at, but it's pointless to argue for something that's not realistic. ATM, and in the foreseeable future, the die space goes to either L3 cache or an IGP, or your favorite complaint, moar cores.

For an analogy: complaining that L3 cache is pointless is kinda like me saying software programmers need to quit complaining and just program for multiple cores already. Without the necessary means or research, it's not going to happen right away.

If I could design my own CPU somehow, I'd have one massively fast core surrounded by helper cores.
 
I'm not arguing that the L3 doesn't help; I'm arguing that the die space could be better used for some other purpose.

It's effective enough that Intel puts up to 30MB of it on their chips. They wouldn't waste that many transistors for no good reason. CPU architecture is a balanced design for all types of workloads.

And next year, L4 cache is coming on Haswell.
 
It's effective enough that Intel puts up to 30MB of it on their chips. They wouldn't waste that many transistors for no good reason. CPU architecture is a balanced design for all types of workloads.

And next year, L4 cache is coming on Haswell.

In servers. They do put a lot of L3 in desktop parts, and now they're going for L4, but I wonder how they'll balance it (die space) down the road.

Anyway, Vishera needs to come out to answer this question. I do believe L3 won't be worth a full 10%, but I hope I'm wrong and it is, in fact, more overall.

Cheers!

EDIT: Typo
 
You do have a point regarding "up to" statements. Still, if you want to talk about averages, the right number, according to what I have posted so far, would be 10%.

But increasing the performance of these modern CPU architectures, even if only by a few percent, is not trivial; that 10% is the result of a single design change.

This change comes at the cost of die space and power (which translates to heat), but it's probably one of the simplest and most effective ways to increase performance.

To say that having "3AGU/ALU over 2 with 6IPC per module" would be better than having L3 cache seems to be a very bold claim, to say the least. Can you really tell?


AMD cut these resources, and this is why Bulldozer is 25-33% slower in IPC when compared to the Phenom. On top of that, the L2 cache has high latency.
 
In servers. They do put a lot of L3 in desktop parts, and now they're going for L4, but I wonder how they'll balance it (die space) down the road.

Simple - go vertical :). At least that is what Intel is supposed to be doing with the "L4" cache, which is a separate die allegedly connected with through-silicon via (TSV) interposers. Seeing as how they've already gone "3D" with the transistor structure, I guess this is the next step. Anyway, it will shorten signal paths and thus provide a wide-bandwidth, low-latency memory connection. If the S/A rumors about Haswell's GPU having a 5x performance increase over IB are true, I'd think the L4 cache would be a major contributor to that increase, as the execution units are supposedly only going from 16 to 40, or 2.5x, for the high-end version anyway.

Personally I agree with the arguments that L3 cache is an effective way to increase performance. Neither Intel nor AMD would be using it if it did not yield "bang for the buck" performance increases, so arguing about whether it is an effective use of limited die area is kinda pointless IMO - the true experts (CPU designers) have already addressed the issue. IIRC Intel at least has said their design philosophy is that they won't include any feature that does not yield at least a 2% increase in performance per 1% increase in power draw. So obviously L3 cache falls into that category, since otherwise Intel would find another use for the die space.

Also, IIRC L3 cache snooping is a way for cores working on different threads to find data and results another core may have already fetched or written, so it should be even more beneficial in threaded apps, on top of the single-core benefit to small programs that fit entirely in the L3 cache.
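
A hedged sketch of that cross-core effect, using pthreads: one thread writes a block, a second thread then reads it. On parts with a shared (often inclusive) L3, the reader's misses can be serviced out of the L3 the writer filled instead of going all the way to DRAM. The buffer size and the claim about where the lines end up are assumptions; verifying it properly takes hardware performance counters:

```c
/* Cross-core sharing sketch: writer fills a buffer, reader sums it.
 * On CPUs with a shared L3, the reader's misses are typically filled
 * from L3 (where the writer's lines landed) rather than from DRAM.
 * Illustrative assumption -- confirming it needs perf counters.
 * Build: gcc -O2 -pthread share.c */
#include <pthread.h>
#include <stdio.h>

#define N (4u * 1024 * 1024 / sizeof(int))  /* 4 MiB: past L2, inside a typical L3 */

static int data[N];

static void *writer(void *arg)
{
    (void)arg;
    for (size_t i = 0; i < N; i++)
        data[i] = (int)i;       /* dirty lines land in this core's caches / the L3 */
    return NULL;
}

static void *reader(void *arg)
{
    long long sum = 0;
    for (size_t i = 0; i < N; i++)
        sum += data[i];         /* snoop / L3 hit instead of a DRAM round trip */
    *(long long *)arg = sum;
    return NULL;
}

int main(void)
{
    pthread_t t;
    long long sum = 0;

    pthread_create(&t, NULL, writer, NULL);
    pthread_join(t, NULL);      /* writer done; its data sits in cache */

    pthread_create(&t, NULL, reader, &sum);
    pthread_join(t, NULL);

    printf("sum = %lld\n", sum);
    return 0;
}
```

To make sure the transfer actually crosses cores, pin the two threads to different cores (e.g. with the GNU extension pthread_setaffinity_np); otherwise the scheduler may run both on the same core and the effect disappears.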
 
I'm still wondering about vertical memory allocation. More than L4 being the new curiosity from Intel, I'd say the real curiosity will be how they allocate it, like you say, fazers: "vertically".

I remember something about this being talked about not so long ago. I think Samsung had vertical allocation going on?

And also, I wonder if it's directly related to FinFETs, so that other foundries can do it as well.

Like gamerk says, if the "high latency" RAM trend continues, L4 will account for a lot of improvements memory-management-wise. Was DDR4 going to address some of that, or not?

Cheers!
 
An alternative viewpoint on why AMD fell short last quarter..

Decoding AMD's Q2 Execution Shortfall

Advanced Micro Devices (AMD) shareholders have seen a significant destruction of wealth over the last four months, with the stock plummeting from just over $8/share to the low $4's. The downtrend began after the Q1 2012 earnings report, in which the company beat estimates on the top and bottom lines. The stock was then demolished, going from the $6 range to the low $4's, after the company hit investors with a one-two punch of a revenue warning coupled with weak guidance at the quarterly report. So, what happened?

Listening to the Q2 report, it seems that the main issue was weak desktop channel sales that "started in China and then spread globally." Unfortunately, it seems that AMD's Llano sales in the channel are suffering from the Osborne effect. Let me explain:

When average users go to buy an OEM system from the likes of Dell (DELL) or Hewlett-Packard (HPQ), they're generally not interested in the specifics; that is, they likely won't read in-depth performance benchmarks. They'll see buzzwords and key specs such as '3.0GHz', 'Quad Core', and '8GB of memory', and simply buy the system that seems to best fit their price range. AMD's products are actually pretty strong in this regard: even their '4 core' Bulldozer chips, which more often than not underperform a '2 core' Intel (INTC) Sandy Bridge chip in CPU-intensive tasks, look attractive to the average buyer.

In the notebook space, the specifics of the hardware are even less important, with users favoring the right mix of portability, battery life, and cost. AMD's products also do well here, especially as low-cost alternatives to higher-priced Intel alternatives with the same 'checkbox features'.

But things got too hot for AMD in the channel. The products sold in this segment are usually systems from 'white-box' OEMs and chips sold directly to the end user. The white-box OEMs and the folks buying their CPUs directly from retailers and e-tailers are more "in the know" about the performance characteristics of the systems they're buying. Further, the folks selling are usually more knowledgeable about which products offer the best "bang for the buck" (this is one of the competitive advantages that small computer businesses have over companies like Dell).

This leads to the issue then: why should these folks buy a Llano-based desktop when they know that Trinity is on its way, and that it is a fairly significant step up from the Llano chips? AMD's channel partners will know not to load up too much on Llano, especially given what's in the pipeline; they wouldn't want to suffer inventory write-downs when the new stuff hits.

But to rub salt in the wound, AMD isn't just competing with itself; it's up against Intel. While AMD's Llano and Trinity chips still offer a very clear competitive advantage on the integrated GPU side of things over Intel's offerings, AMD finds itself in the following conundrum: channel partners trying to sell 'gaming capable' systems will most likely want to pair discrete graphics cards from either AMD or Nvidia (NVDA) with Intel's CPUs, due to Intel's generally superior gaming performance even with their low-end "Pentium" chips. And for systems that don't require gaming-class graphics performance, Intel's integrated graphics solutions will be adequate, paired with Intel's superior CPU performance.

Finally, there's the additional issue that desktops are just not all that fashionable these days. Power users and budget-minded folks alike still buy desktops for the performance per dollar (and, at the high end, the raw performance), but the general public is clearly flocking to portability. So I suspect that weakness in desktop sales in general hurt AMD here, especially since the company has only 19.1% of the x86 processor market but 43% of the desktop x86 market. Microsoft's (MSFT) upcoming Windows 8 operating system is particularly desktop-unfriendly, as it focuses on the touch-friendly Metro UI, so I don't expect it to help matters here.
 