AMD Piledriver rumours ... and expert conjecture

We have had several requests for a sticky on AMD's yet to be released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post questions or information relevant to the topic, or your post will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame baiting comments about the blue, red and green team and they will be deleted.

Enjoy ...
 
From what I have been reading here and elsewhere, this year isn't going to be as great as many of us were expecting, or at the very least hoping for, when it comes to next-gen upgrades :s

Pretty much stuck with two aging and decaying GTX 280s and a GTX 460 on the side that has outlasted most of the ones people bought from Gigabyte. For now I am looking at the 7850 as a good option for my Intel build, while my vintage relics will stay with my Phenom II X4 box.
 
You're spewing tons of BS about "effective" integer cores (WTF is "effective" supposed to mean - is that a concept you created in your own head with next to no engineering experience?). Four pipelines per core means eight per module, far more than two.

You're arguing over semantics and being quite rude about it. While there are 8 integer cores on the FX-8150 (two per module), they aren't being utilized effectively 100% of the time, or anywhere close to it. If they were, BD would not have been the disappointment it was.

The FX-4100 has 33% more integer cores than a Phenom-X4.
The FX-8150 has 78% more integer cores than a Phenom-X6.
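For what it's worth, those percentages only work out if you count integer pipelines rather than cores - presumably 4 per Bulldozer integer core (2 ALU + 2 AGU) against 3 per K10 core, which is an assumption on my part:

$$\text{FX-4100: } 4 \times 4 = 16 \text{ vs. Phenom X4: } 4 \times 3 = 12 \;\Rightarrow\; \tfrac{16}{12} - 1 \approx 33\%$$
$$\text{FX-8150: } 8 \times 4 = 32 \text{ vs. Phenom X6: } 6 \times 3 = 18 \;\Rightarrow\; \tfrac{32}{18} - 1 \approx 78\%$$

Counted as actual integer cores, the FX-4100 and the Phenom X4 both have four.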

In both comparisons the FX has a 32nm vs 45nm process advantage, more cache, more transistors and higher clock speeds. Performance-wise it can be slower or slightly faster depending on the benchmark.

Architecture-wise it should have a significant advantage. We can only hope that AMD learned something and Piledriver can redeem the architecture.

If 22nm Ivy Bridge doesn't perform better than 32nm Sandy Bridge it will be equally disappointing.
 
You're arguing over semantics and being quite rude about it. While there are 8 integer cores on the FX-8150 (two per module), they aren't being utilized effectively 100% of the time, or anywhere close to it. If they were, BD would not have been the disappointment it was.

The FX-4100 has 33% more integer cores than a Phenom-X4.
The FX-8150 has 78% more integer cores than a Phenom-X6.

In both comparisons the FX has a 32nm vs 45nm process advantage, more cache, more transistors and higher clock speeds. Performance-wise it can be slower or slightly faster depending on the benchmark.

Architecture-wise it should have a significant advantage. We can only hope that AMD learned something and Piledriver can redeem the architecture.

If 22nm Ivy Bridge doesn't perform better than 32nm Sandy Bridge it will be equally disappointing.



What? The 4-core has 4 full integer cores and 2 FP units?
 
What? The 4-core has 4 full integer cores and 2 FP units?

They're pulling and twisting numbers to try to make the FX-8150 into a four-core unit without actually knowing much about processor design.

AMD coined the term "integer core" not to mean integer units, which are components within a processor, but to represent a regular processor core that has had its FPU decoupled. The last CPU without an FPU was the 80486SX; from the Pentium onwards the FPU was integrated. Eventually it morphed from a standard 80-bit FPU into the 128-bit SIMD (64x2 or 128x1) units we see nowadays. AMD decided to remove the FPU and turn it into a single 256-bit AVX unit capable of two simultaneous 128-bit SIMD operations (2x64 each), addressed separately from the standard integer units.

Thus, technically, a single AMD "module" contains three "cores": two integer "cores", each complete with its own scheduler, four pipelines and its own L1 cache, and one SIMD "core" with its own scheduler. Since FPUs are rarely treated as full processors, it isn't counted as a core.
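If you want to see that shared SIMD "core" for yourself rather than argue about it, here's a minimal sketch of my own (assuming Linux, GCC and pthreads; the CPU numbers on the command line are purely illustrative): pin two FP-heavy threads either to the two integer cores of one module or to cores in different modules, and compare run times.

```c
/* fpu_share.c -- rough sketch, not a rigorous benchmark.
 * Pins two FP-heavy threads to the two logical CPUs given on the
 * command line and reports how long they take together.  The idea:
 * compare a "same module" placement against a "different modules"
 * placement on an FX chip.
 * Build: gcc -O2 -pthread fpu_share.c -o fpu_share
 * Run:   ./fpu_share 0 1   then   ./fpu_share 0 2                  */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ITERS 400000000UL

static void *fp_work(void *arg)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(*(int *)arg, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    /* eight independent multiply chains so one thread can saturate
       the FP pipes on its own; with two such threads a shared FPU
       becomes the bottleneck */
    double acc[8] = { 1, 1, 1, 1, 1, 1, 1, 1 };
    for (unsigned long i = 0; i < ITERS; i++)
        for (int k = 0; k < 8; k++)
            acc[k] *= 1.0000000001;

    double sum = 0;
    for (int k = 0; k < 8; k++) sum += acc[k];
    return (void *)(long)sum;               /* keep the work alive */
}

int main(int argc, char **argv)
{
    int cpus[2] = { 0, 1 };
    if (argc > 2) { cpus[0] = atoi(argv[1]); cpus[1] = atoi(argv[2]); }

    struct timespec t0, t1;
    pthread_t th[2];
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 2; i++)
        pthread_create(&th[i], NULL, fp_work, &cpus[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(th[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("cpus %d,%d: %.2f s\n", cpus[0], cpus[1],
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}
```

If the "same module" pairing runs noticeably slower than the "different modules" pairing, that's the shared FPU; do the same run with integer-only work and the gap should mostly disappear.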

To manage these three "cores" AMD decoupled the instruction decoder, branch predictor and instruction prefetch unit and gave it four x86 instruction schedulers. Those keep track of the x86 macro-ops, which are then broken down into several micro-ops each and dispatched to the internal integer units / FPU for processing. Each integer core's internal scheduler then tracks those micro-ops through its pipelines until the instruction is complete and the value is returned to the front-end decoder + predictor for evaluation and return to the program. It's complicated, much more so than a typical processor design, and it requires many independently moving parts. The L2 cache is shared amongst all those components, which explains the monstrous latency involved.

Anyhow, the concept of "cores" is useless when comparing processors. Modern processors have multitudes of processing resources, with miniature dedicated processors managing those resources. This is why we don't have 32 integer units on a CPU: it would be nearly impossible to track and schedule them efficiently unless you had 32 separate, non-dependent integer threads running. And that doesn't even touch SIMD / FPU instructions and memory operations.
 
They're pulling and twisting numbers to try to make the FX-8150 into a four-core unit without actually knowing much about processor design.

AMD coined the term "integer core" not to mean integer units, which are components within a processor, but to represent a regular processor core that has had its FPU decoupled. The last CPU without an FPU was the 80486SX; from the Pentium onwards the FPU was integrated. Eventually it morphed from a standard 80-bit FPU into the 128-bit SIMD (64x2 or 128x1) units we see nowadays. AMD decided to remove the FPU and turn it into a single 256-bit AVX unit capable of two simultaneous 128-bit SIMD operations (2x64 each), addressed separately from the standard integer units.

Thus, technically, a single AMD "module" contains three "cores": two integer "cores", each complete with its own scheduler, four pipelines and its own L1 cache, and one SIMD "core" with its own scheduler. Since FPUs are rarely treated as full processors, it isn't counted as a core.

To manage these three "cores" AMD decoupled the instruction decoder, branch predictor and instruction prefetch unit and gave it four x86 instruction schedulers. Those keep track of the x86 macro-ops, which are then broken down into several micro-ops each and dispatched to the internal integer units / FPU for processing. Each integer core's internal scheduler then tracks those micro-ops through its pipelines until the instruction is complete and the value is returned to the front-end decoder + predictor for evaluation and return to the program. It's complicated, much more so than a typical processor design, and it requires many independently moving parts. The L2 cache is shared amongst all those components, which explains the monstrous latency involved.

Anyhow, the concept of "cores" is useless when comparing processors. Modern processors have multitudes of processing resources, with miniature dedicated processors managing those resources. This is why we don't have 32 integer units on a CPU: it would be nearly impossible to track and schedule them efficiently unless you had 32 separate, non-dependent integer threads running. And that doesn't even touch SIMD / FPU instructions and memory operations.
Nice to know some people actually know something about Bulldozer's design. Most people seem to just spew random garbage collected from bits of information they don't understand.

The worst are the people who don't know the difference between the module and SMT and call the FX-81XX a 4-core. That makes no sense even by the benchmark numbers.
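(If anyone wants to check what their own OS actually reports, here's a quick sketch I put together - Linux-only, and bear in mind that whether a kernel shows an FX module's two integer cores as separate cores or as siblings depends on the kernel version, so treat the output as a starting point for the argument, not the end of it.)

```c
/* topo.c -- dump what Linux thinks the core layout is, so the
 * "module vs core vs SMT" argument at least starts from the same
 * numbers.  Build: gcc -O2 topo.c -o topo                            */
#include <stdio.h>

static int read_str(const char *path, char *buf, int len)
{
    FILE *f = fopen(path, "r");
    if (!f) { buf[0] = '\0'; return 0; }
    if (!fgets(buf, len, f)) buf[0] = '\0';
    fclose(f);
    for (char *p = buf; *p; p++)        /* strip trailing newline */
        if (*p == '\n') *p = '\0';
    return 1;
}

int main(void)
{
    char path[128], core[64], sib[64];
    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/topology/core_id", cpu);
        if (!read_str(path, core, sizeof core))
            break;                       /* ran out of CPUs */
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
                 cpu);
        read_str(path, sib, sizeof sib);
        printf("cpu%-3d core_id=%-3s thread_siblings=%s\n", cpu, core, sib);
    }
    return 0;
}
```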

One thing I never got is why they have such a small L1 when the L2 is so bad.
 
"...A leaked slide reveals AMD is preparing to release three new Zambezi-based FX CPUs with 95W TDP - FX-8140, FX-6120 and FX-4150."

http://vr-zone.com/articles/amd-to-release-three-new-fx-cpus-in-q1/14926.html

 
You're spewing tons of BS about "effective" integer cores (WTF is "effective" supposed to mean - is that a concept you created in your own head with next to no engineering experience?). Four pipelines per core means eight per module, far more than two.

You also fail to understand how instruction queuing and decoding work. That's an entire book in and of itself and far beyond the scope of this discussion. Suffice to say, you're horribly wrong if you think that four schedulers couldn't keep two integer cores and two 128-bit FPUs busy as part of a superscalar design. Eventually you'll realize there are internal schedulers in each core, and then you're going to ponder what those schedulers are for and the difference between the ones outside and the ones inside.

Hmm, I used to think that you were one of the more knowledgeable and reasonable posters here, but this post in particular, and your recent responses to others in general, have pretty much destroyed that perception. You seem more interested in your own bloated ego and protecting your "Internet credibility" than in discussing the issues.

I provided links to an S/A article that countered your position that BD's problems were mostly confined to cache. Either point out how S/A was wrong or don't bother responding. Attacking a poster rather than countering his or her argument is not acceptable under the TOS here.

Back on topic, it doesn't matter how many integer or FP pipes a "core" might have - if the shared scheduler can only keep two per core filled under CMT, then the core is only going to be able to execute two integer computations simultaneously, rather than 3 or 4 as in K8 and later, and Core2 and later respectively.
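To put rough numbers on that (my own back-of-the-envelope, counting ALU issue per cycle only and ignoring clocks, AGUs and everything else):

$$\text{FX-8150: } 8 \text{ cores} \times 2 \text{ ALUs} = 16 \text{ int ops/cycle}, \qquad \text{Phenom II X6: } 6 \times 3 = 18 \text{ int ops/cycle}$$

So despite the "8 cores", the peak per-cycle integer issue isn't actually higher than the previous generation's; real throughput then depends on clocks and on how well the front end keeps those ALUs fed.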

Pretty much you're just looking for reasons to hate on a product. This is the kind of blind hate that I can't stand to see spread, even on enthusiast websites.

Whatever. I was considering BD for my next build, and was disappointed to see AMD release yet another mediocre product. I don't hate the product or AMD, I just think it is a disappointment. I might point out that your own posts demonstrate you think it is not a stellar product as well.
 
You're arguing over semantics and being quite rude about it. While there are 8 integer cores on the FX-8150 (two per module), they aren't being utilized effectively 100% of the time, or anywhere close to it. If they were, BD would not have been the disappointment it was.

The FX-4100 has 33% more integer cores than a Phenom-X4.
The FX-8150 has 78% more integer cores than a Phenom-X6.

In both comparisons the FX has a 32nm vs 45nm process advantage, more cache, more transistors and higher clock speeds. Performance-wise it can be slower or slightly faster depending on the benchmark.

Architecture-wise it should have a significant advantage. We can only hope that AMD learned something and Piledriver can redeem the architecture.

If 22nm Ivy Bridge doesn't perform better than 32nm Sandy Bridge it will be equally disappointing.

IIRC he is a software guy and not an engineer himself, but that doesn't excuse his behavior.

Isn't PD slated for this summer or fall quarter? I dunno if AMD would have sufficient time to do an extensive redesign to get around the architectural flaws in BD. My understanding is that AMD realized the problem with the shared front end, but was counting on much higher clock speeds at release to compensate. Unfortunately for them, clock speeds are dependent on the process, and thus in GloFlo's hands..

IB likely won't offer much CPU improvement over SB, but will probably OC higher and have a much improved GPU. Plus support for PCIe 3, etc. So that is what I'm planning to use in a few months, if as you say it is not a disappointment according to the reviews.
 
I'm with you here.
I got my 990XA last year, and when BD was released and the reviews came out...
I upgraded to a 965BE instead: sold my 955BE C2 and for $20 more got the 965BE C3.
With the new hardware I was ready for BD; now my hopes rest on Piledriver.
I do have a 2500K rig, but I was AMD before I was Intel - even before my LGA 1156 I had AMD.

Originally I was interested in Barcelona as an upgrade from my P4 Northwood, but after the delays, the 'dancing in the aisles' and the other BS leaks that I learned about, including here in THG articles and the THG forums, I got a Q6700 shortly after it was released and never looked back. Ever since then, I have waited for the independent reviews to appear before plunking down $$ on components. I recall your buying that AM3+ mobo however 😀, but rest assured there are plenty of other posters here who did the same thing. Maybe you guys should file for refunds from AMD or from Baron 😛..

Most of us are here to learn, and appreciate hard facts and links. Unfortunately a few already 'know it all' and are here to pontificate on their preconceived notions instead. Take the discussion about the L3 cache being next to worthless, for instance. Intel has publicly stated that they don't incorporate features unless they return at least 2:1 in performance for the extra power consumption required. Since the L3 cache uses a lot of transistors, its power draw is clearly not negligible, yet its benefits must outweigh the extra power, seeing as Intel keeps making its L3 cache bigger with each generation. And I note that both AMD's and Intel's top-end CPUs have loads of L3. Plus, looking at AT's benchmarks for an Athlon X4 and a Phenom X4 at the same clock speeds - the only difference being that the first has no L3 while the second does - some benchmarks show a pretty significant increase with the L3 cache present.

While I have learned some things from Palladin and mostly appreciate his participating on the forums, lately he seems off his meds or something. At least he is not a blind AMD fanboy too, unlike some others..
 
Isn't PD slated for this summer or fall quarter? I dunno if AMD would have sufficient time to do an extensive redesign to get around the architectural flaws in BD. My understanding is that AMD realized the problem with the shared front end, but was counting on much higher clock speeds at release to compensate. Unfortunately for them, clock speeds are dependent on the process, and thus in GloFlo's hands..

The first "Piledriver" is in the Trinity APU so should be around summer. AMD was aware of the BD problems before launch but didn't have time to address them and still get chips out. Trinity was at a different design stage and they were able to get some fixes in.

I also think they were relying on higher clock speeds. It was easy for them to blame GloFlo but that's kind of a cop-out; there's more than just the process dictating the final speed. I believe the reports from engineers saying there was too much reliance on automated tools, contributing to higher gate counts. Automation is good for a lot of things, but any minor inefficiency can get replicated millions of times.

There's a reason all these companies shed their foundries - it's the hardest part of the job! It's the risk they take outsourcing chip fab to a new process node with a new architecture. Intel takes the safer route and does a die shrink before trying a new architecture.
 
Personally I think BD was, is and always will be a first step in a process. AMD, I suspect,
never put that much effort into it, and many lost their jobs over it.
But to say it's a failure or no good is not accurate.
They sold many, though no, it's not the best.
IMHO I doubt AMD will continue selling server chips as PC CPUs once their APUs have matured.
 
They're pulling and twisting numbers to try to make the FX-8150 into a four-core unit without actually knowing much about processor design.

AMD coined the term "integer core" not to mean integer units, which are components within a processor, but to represent a regular processor core that has had its FPU decoupled. The last CPU without an FPU was the 80486SX; from the Pentium onwards the FPU was integrated. Eventually it morphed from a standard 80-bit FPU into the 128-bit SIMD (64x2 or 128x1) units we see nowadays. AMD decided to remove the FPU and turn it into a single 256-bit AVX unit capable of two simultaneous 128-bit SIMD operations (2x64 each), addressed separately from the standard integer units.

Thus, technically, a single AMD "module" contains three "cores": two integer "cores", each complete with its own scheduler, four pipelines and its own L1 cache, and one SIMD "core" with its own scheduler. Since FPUs are rarely treated as full processors, it isn't counted as a core.

To manage these three "cores" AMD decoupled the instruction decoder, branch predictor and instruction prefetch unit and gave it four x86 instruction schedulers. Those keep track of the x86 macro-ops, which are then broken down into several micro-ops each and dispatched to the internal integer units / FPU for processing. Each integer core's internal scheduler then tracks those micro-ops through its pipelines until the instruction is complete and the value is returned to the front-end decoder + predictor for evaluation and return to the program. It's complicated, much more so than a typical processor design, and it requires many independently moving parts. The L2 cache is shared amongst all those components, which explains the monstrous latency involved.

Anyhow, the concept of "cores" is useless when comparing processors. Modern processors have multitudes of processing resources, with miniature dedicated processors managing those resources. This is why we don't have 32 integer units on a CPU: it would be nearly impossible to track and schedule them efficiently unless you had 32 separate, non-dependent integer threads running. And that doesn't even touch SIMD / FPU instructions and memory operations.

When I look at AMD's choices in the BD design, almost anyone can see that what AMD did resulted in a design that was far too complex for a time when current operating systems, as well as applications, are still in the stone age when it comes to scaling past two or four cores/threads. Very few scale past that outside the workstation and server arena. Sharing the FPU between two cores when the resources to manage it are so limited - surely anyone working on this early on should have noticed that something needed to be done before the product was due. They should have improved existing products, buying more time to improve the BD design, before ruining the company's reputation and losing market share.

Last but not least is the cache scheme. The way they chose to share the L2 cache in this manner is insane, when some portion should have been made exclusive to each core, allowing for lower latency, alongside a common cache for the complete module. Trying to make that work after the fact while keeping the existing scheme only comes out even more bloated than it already is. AMD should have adopted the same approach Intel took with SB, namely the L0.

"We start noticing significant changes in Sandy Bridge microarchitecture in the beginning of the pipeline already: when x86 instructions are decoded into simpler processor micro-ops. The actual decoder remained the same as in Nehalem – it processes 4 instructions per clock cycle and supports Micro-Fusion and Macro Fusion technologies that make the output instructions thread more even in terms of execution complexity. However, the processor instructions decoded into micro-operations are not just transferred to the next processing stage, but also cached. In other words, in addition to the regular 32 KB L1 cache for instructions that is a feature of almost any x86 processor, Sandy Bridge also has an additional “L0” cache for storing the decoding results. This cache is the first flashback from NetBurst microarchitecture, its general operation principles make it similar to the Execution Trace Cache." http://www.xbitlabs.com/articles/cpu/display/sandy-bridge-microarchitecture_3.html

Now comes the problem of the L2 cache for the module: since the L3 is shared across the entire enabled chip, no portion of that cache can be made exclusive. Traditionally the L2 cache is either shared or exclusive, meaning each core has its own portion that is not available to the remaining cores, as with Bloomfield, Lynnfield, SB etc. So that leaves a problem: from what I've seen over the years, one can't have a shared and an exclusive L2 at the same time with the current cache hierarchy. If they had gone the more traditional route of an FPU integrated into each core, rather than the single 256-bit decoupled FPU per module, the extra units needed to support the decoupled unit wouldn't have been necessary and the resulting module could have been even smaller. That would have solved the issue with the L2.

There are a few things that I do like about BD: they did go for a modular concept of cores and resources, which is ahead of its time but poorly executed. I wish AMD and Intel would build highly scalable modular architectures that are at the same time superior to past, more limited generations; then, years from now, we could see CPUs that scale easily beyond 8 physical cores while addressing every level of the x86 market, from the ULV low end all the way to the top-of-the-line server market.
 
Nice to know some people actually know something about Bulldozer's design. Most people seem to just spew random garbage collected from bits of information they don't understand.

The worst are the people who don't know the difference between the module and SMT and call the FX-81XX a 4-core. That makes no sense even by the benchmark numbers.

One thing I never got is why they have such a small L1 when the L2 is so bad.

The problem with larger caches in processor design is latency: as a cache is scaled up, latency tends to follow, and so does power consumption, as we all know. The L1, from the 386 days onward, has been intended to have the lowest latency possible, which means that cache will always remain very small. That is why L2 and L3 even exist. There is an L0, but it is rarely implemented for decoder results etc. The first CPU I've seen that made use of it was from Cyrix, but it didn't make much of an impact as it was tiny - only 256 bytes! SB, however, is the best example of an L0 in use today. Interesting, though, that Cyrix and Intel kept the usual L1 cache scheme - split into instruction and data - rather than what happened with NetBurst.
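That size/latency trade-off is easy to see for yourself. Here's a crude pointer-chasing sketch of the kind reviewers' latency graphs are built on (my own, not calibrated - the array sizes are arbitrary and the numbers will wobble with TLB effects, prefetch and clock ramping):

```c
/* cache_lat.c -- crude pointer-chase sketch: nanoseconds per dependent
 * load as the working set grows past L1, L2 and L3 sizes.
 * Build: gcc -O2 cache_lat.c -o cache_lat                             */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static volatile size_t sink;   /* keeps the chase from being optimized out */

static double chase(size_t bytes, size_t hops)
{
    size_t n = bytes / sizeof(size_t);
    size_t *next = malloc(n * sizeof(size_t));

    /* Sattolo's algorithm: one big cycle through the whole array,
       in random order, so the prefetchers can't guess the next line */
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t h = 0; h < hops; h++)
        p = next[p];                    /* each load depends on the last */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = p;

    free(next);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / hops;
}

int main(void)
{
    size_t sizes[] = { 16 << 10, 64 << 10, 1 << 20, 8 << 20, 64 << 20 };
    for (int i = 0; i < 5; i++)
        printf("%6zu KB: %6.2f ns per load\n",
               sizes[i] >> 10, chase(sizes[i], 20 * 1000 * 1000));
    return 0;
}
```

On a chip like the FX, the jump just past the 16 KB point is where that small L1 hands off to the slow L2 everyone complains about.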
 
Personally I think BD was, is and always will be a first step in a process. AMD, I suspect,
never put that much effort into it, and many lost their jobs over it.
But to say it's a failure or no good is not accurate.
They sold many, though no, it's not the best.
IMHO I doubt AMD will continue selling server chips as PC CPUs once their APUs have matured.

AMD needed what many in business culture call "new blood", but may never have gotten it, or it was already too late by the time they did, having hemorrhaged critical designers and programmers before the project could go to market. Sure, these chips will sell, but eventually people and businesses will stop buying once they learn they could have bought a better-performing product, whether it costs more or less than BD. In the server market BD is the Hindenburg for anyone who has to debate every watt of power needed to get a job or application done, or to decide between moving to Xeon and sticking with existing hardware. For data-center use these are like kryptonite.
 
The first "Piledriver" is in the Trinity APU so should be around summer. AMD was aware of the BD problems before launch but didn't have time to address them and still get chips out. Trinity was at a different design stage and they were able to get some fixes in.

Maybe so, although I think it takes longer than a year to design, test and then validate any major architectural changes.

I also think they were relying on higher clock speeds. It was easy for them to blame GloFlo but that's kind of a cop-out; there's more than just the process dictating the final speed. I believe the reports from engineers saying there was too much reliance on automated tools, contributing to higher gate counts. Automation is good for a lot of things, but any minor inefficiency can get replicated millions of times.

There's a reason all these companies shed their foundries - it's the hardest part of the job! It's the risk they take outsourcing chip fab to a new process node with a new architecture. Intel takes the safer route and does a die shrink before trying a new architecture.

I saw those same reports, although there were later reports stating the engineer never actually worked for AMD. IIRC AMD pretty much stated during their Q4 report, or maybe it was analyst day, that they would be increasing their reliance on automated SoC-type design tools. My understanding is that while that does reduce the time & cost of a new design, they also should hand-tune the critical speed paths on the chip for bumping up performance.

My guess is that it is indeed GF's process problems that are mostly responsible for the low clock speeds as well as the low yields. Most of the IBM fab club - including GF - will be abandoning gate-first HKMG at 22nm and going to the Intel/TSMC gate-last approach. While GF's 32nm is apparently pretty good at making leaky transistors (e.g., the recent world-record in overclocking BD), it's not so good at making low-leakage but high-clocking ones.
 
AMD needed what many in business culture call "new blood", but may never have gotten it, or it was already too late by the time they did, having hemorrhaged critical designers and programmers before the project could go to market. Sure, these chips will sell, but eventually people and businesses will stop buying once they learn they could have bought a better-performing product, whether it costs more or less than BD. In the server market BD is the Hindenburg for anyone who has to debate every watt of power needed to get a job or application done, or to decide between moving to Xeon and sticking with existing hardware. For data-center use these are like kryptonite.


Though it is beyond my understanding why some people will buy products blindly - God forbid you should ask them their
reasoning. If AMD wants to get serious about selling high-end they ought to switch to LGA like Opteron.
I can't overlook the possibility that BD was the result of failed server cores, an attempt to turn losses into gold.
 
Fermi 2.0 - Nvidia's monolithic design wins again. AMD doesn't seem to be having too many issues at ~378mm2, but Nvidia apparently can't get one to work at 550mm2. "Learning from the past is not good for the future" is apparently the Nvidiots' philosophy. But aside from that, what's this have to do with PD?

The bigger the die the worse the yields. Same for most processes.
Like a 30" LCD is way more expensive than a 24" LCD.
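The usual back-of-the-envelope for this is a Poisson yield model, $Y = e^{-D_0 A}$, where $D_0$ is the defect density and $A$ the die area. With a purely illustrative $D_0 = 0.4$ defects/cm²:

$$A = 3.78\ \text{cm}^2 \Rightarrow Y \approx e^{-1.5} \approx 22\%, \qquad A = 5.5\ \text{cm}^2 \Rightarrow Y \approx e^{-2.2} \approx 11\%$$

So going from a ~378mm2 die to a ~550mm2 one doesn't just cost a few candidates per wafer - under this model it roughly halves the fraction that come out clean.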
 
Though it is beyond my understanding why some people will buy products blindly - God forbid you should ask them their
reasoning. If AMD wants to get serious about selling high-end they ought to switch to LGA like Opteron.
I can't overlook the possibility that BD was the result of failed server cores, an attempt to turn losses into gold.

NetBurst sold very well but afterwards stained Intel's reputation, even now amongst some. It all comes down to the brand. People will buy assuming all is good if the previous product met their needs or expectations. We see examples of this in our everyday lives in just about everything imaginable: cars, shoes, fast food, shopping outlets. Brands are what people remember, so any company that knows how to exploit its brand sells well, and a good salesman can sell sh^t - but never again to the same people.
 
When I look at AMD's choices in the BD design, almost anyone can see that what AMD did resulted in a design that was far too complex for a time when current operating systems, as well as applications, are still in the stone age when it comes to scaling past two or four cores/threads. Very few scale past that outside the workstation and server arena. Sharing the FPU between two cores when the resources to manage it are so limited - surely anyone working on this early on should have noticed that something needed to be done before the product was due. They should have improved existing products, buying more time to improve the BD design, before ruining the company's reputation and losing market share.

Last but not least is the cache scheme. The way they chose to share the L2 cache in this manner is insane, when some portion should have been made exclusive to each core, allowing for lower latency, alongside a common cache for the complete module. Trying to make that work after the fact while keeping the existing scheme only comes out even more bloated than it already is. AMD should have adopted the same approach Intel took with SB, namely the L0.

"We start noticing significant changes in Sandy Bridge microarchitecture in the beginning of the pipeline already: when x86 instructions are decoded into simpler processor micro-ops. The actual decoder remained the same as in Nehalem – it processes 4 instructions per clock cycle and supports Micro-Fusion and Macro Fusion technologies that make the output instructions thread more even in terms of execution complexity. However, the processor instructions decoded into micro-operations are not just transferred to the next processing stage, but also cached. In other words, in addition to the regular 32 KB L1 cache for instructions that is a feature of almost any x86 processor, Sandy Bridge also has an additional “L0” cache for storing the decoding results. This cache is the first flashback from NetBurst microarchitecture, its general operation principles make it similar to the Execution Trace Cache." http://www.xbitlabs.com/articles/cpu/display/sandy-bridge-microarchitecture_3.html

Hmm, I thought BD did have something a bit similar. Both SB and BD borrowed some ideas first introduced with NetBurst. But this concept of storing the decoded micro-ops and then playing them back during loops or other repetitive execution, to save time and power compared with re-decoding the macro-ops, is certainly a plus. Maybe what I recall was somebody's wish list of BD features rather than an actually implemented one..
 
Then why do some 32" TVs cost the same as 23" monitors?

Standard monitors have higher pixel density than TVs with larger display panels. That is why TVs are cheap for their size, but up close you can tell: they look rough where a monitor doesn't. Personally I would like to see higher pixel densities on TVs for better and more consistent image quality.
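A quick worked example, assuming both panels are 1920x1080 (which most of the 32" TVs and 23" monitors in question are):

$$\text{PPI} = \frac{\sqrt{1920^2 + 1080^2}}{\text{diagonal}} \approx \frac{2203}{23} \approx 96 \text{ for the monitor}, \qquad \frac{2203}{32} \approx 69 \text{ for the TV}$$

The same panel resolution spread over a much bigger area is exactly why the TV looks fine from the couch and rough at desk distance.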
 
Hmm, I thought BD did have something a bit similar. Both SB and BD borrowed some ideas first introduced with NetBurst. But this concept of storing the decoded micro-ops and then playing them back during loops or other repetitive execution, to save time and power compared with re-decoding the macro-ops, is certainly a plus. Maybe what I recall was somebody's wish list of BD features rather than an actually implemented one..

I will have to look over BD again, but one thing I know for sure is that there is no L0 in BD, unlike SB. I wonder how much effort they really put into designing their CPUs nowadays. I know their main pitfall has been an almost total reliance on automated design software; one of their former engineers said a few months ago that designs produced by automated software are usually 20% larger and 20% slower than doing the work by hand. That is one reason the L3 takes up so much space. Eventually the industry is going to have to realize that every transistor added counts, and every transistor saved results in savings for everyone.
 
From what I have been reading here and elsewhere, this year isn't going to be as great as many of us were expecting, or at the very least hoping for, when it comes to next-gen upgrades :s

Pretty much stuck with two aging and decaying GTX 280s and a GTX 460 on the side that has outlasted most of the ones people bought from Gigabyte. For now I am looking at the 7850 as a good option for my Intel build, while my vintage relics will stay with my Phenom II X4 box.

The 28nm video cards coming out should do well - video cards have been stuck at 40nm for a while now. Considering how well the 7750/7770 overclock already, there's lots of headroom in this part.

Ivy looks promising. First 22nm part and first Tri-Gate part. Overclockers will probably have a field day with this chip.
 