AMD CPUs, SoC Rumors and Speculations Temp. thread 2

Status
Not open for further replies.


It would not really have an impact...I have a 9590, but the process is completely different, and so is the uarch. Nothing useful would be gleaned from looking at PD uarch and process at 3 GHz.
 



1) No. Consider a two-way branch. The predictor assumes branch A will be taken, so well before execution the front-end starts decoding instructions from branch A and storing them in the uop cache. Then the branch condition is evaluated, the prediction is found to be incorrect, and branch B is taken. The pipeline has to be flushed and the instructions from branch B have to be decoded. It is very unlikely that the uop cache (which contains instructions from the wrong branch) will contain the instructions from branch B; the whole point of branching is to run different code. The uop cache doesn't reduce the branch misprediction penalty. It wasn't designed for that, as explained to you before. In fact, canonical RISC microarchitectures don't use uop caches because they are useless when you have a 1:1 ratio between decoded instructions and uops.

2) As shown above, the uop cache doesn't reduce the branch misprediction penalty; therefore it plays no role in increasing pipeline stages. Intel's last speed-demon microarchitecture was P4E; everything since is brainiac. This is also the correct picture for Zen. Zen is similar to Sandy, Ivy, Haswell, Broadwell, and Skylake.

BD/PD were speed-demons. The problem of BD/PD was neither the long pipeline (18 stages) nor the branch predictor. The problem is that the K11 microarchitecture was designed to target much higher frequencies than could be achieved in practice. The pipeline targeted maximum theoretical frequencies of about 10GHz and relied on hype about SOI processes. IBM has been selling 5.5GHz CPUs (which could hit up to 6GHz) on 32nm, but BD could only hit 3.6GHz on GlobalFoundries 32nm. The cycle was slower than expected, and BD had an unexpected misprediction deficit in units of time.

Target: 18 cycles / 5 GHz = 3.6 ns.
Reality: 18 cycles / 3.6 GHz = 5 ns.
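
The two figures above follow from one relation: a cycle at f GHz lasts 1/f ns, so refilling an N-stage pipeline costs N/f ns. A minimal sketch of that arithmetic, assuming the whole 18-stage pipeline is refilled after a mispredict (the function name is made up for illustration):

```python
def flush_penalty_ns(pipeline_stages: int, clock_ghz: float) -> float:
    """Time lost to one full pipeline refill, in nanoseconds.

    One cycle at f GHz lasts 1/f ns, so N stages cost N/f ns.
    Assumes the penalty equals the full pipeline depth.
    """
    return pipeline_stages / clock_ghz


target = flush_penalty_ns(18, 5.0)    # clock the design was aiming for
reality = flush_penalty_ns(18, 3.6)   # clock actually reached on 32nm

print(f"target:  {target:.1f} ns")
print(f"reality: {reality:.1f} ns")
print(f"penalty is {reality / target - 1:.0%} longer than planned")
```

Same cycle count, but each mispredict costs noticeably more wall-clock time at the lower frequency.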

AMD learned the lesson and is moving towards a brainiac microarchitecture with Zen, no longer relying on magical SOI process nodes.

3) No. Pipelining is only a technique to help engineers design fast circuits. Instead of designing one complex circuit that does N units of work per unit of time, they design simpler circuits that each do 1 unit of work per unit of time and then use pipelining to overlap N of them to get nearly N units of work per unit of time. Like any other technique, pipelining cannot violate the laws of physics. The maximum clock achievable by a microarchitecture follows a physical law that depends on physical parameters such as geometry, signal propagation, substrate parameters, etc., which cannot be varied by pipelining. Take the substrate, for instance: if your process node is optimized for 2.5GHz, no matter how long the pipeline in your microarchitecture is, you will not hit 7GHz.
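
The "nearly N" part can be put in numbers with the textbook idealization: an N-stage pipeline needs N cycles to deliver the first result and one more cycle for each result after that. A rough sketch under loud assumptions (equal-length stages, no stalls or flushes, no latch overhead; the function name is hypothetical). It says nothing about which clock the process can actually reach, which is the separate point made above.

```python
def pipelined_speedup(stages: int, instructions: int) -> float:
    """Ideal throughput gain of an N-stage pipeline over doing the same
    work one instruction at a time, at the same clock.

    Assumptions: equal-length stages, no stalls, no latch overhead.
    """
    if stages < 1 or instructions < 1:
        raise ValueError("stages and instructions must be positive")
    unpipelined_cycles = stages * instructions     # each instruction runs alone
    pipelined_cycles = stages + instructions - 1   # fill once, then 1 per cycle
    return unpipelined_cycles / pipelined_cycles


for n_instructions in (1, 10, 1_000, 1_000_000):
    print(n_instructions, round(pipelined_speedup(18, n_instructions), 2))
```

The gain approaches N for long instruction streams but never reaches it, even before any real-world overhead is counted.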

4) Someone told me that Zen doesn't use a custom node but just plain 14LPP. Lisa Su confirmed that AMD no longer invests in custom processes. What you call negativity, I call realism. This is why most of what I have said about Zen has been confirmed.
 


He doesn't mean 3.5GHz base, but single-core turbo. He also wrote "Besides, until proved otherwise the 14nm Samsung / GlobalFoundries node is the biggest mistake AMD has ever made."



AMD could choose to talk about both IPC and performance, or both IPC and clocks. It seems evident that they are only talking about IPC gains because frequencies will be a weak point and part of the IPC improvement will be offset by a reduction in clocks.
 


Let me emphasize again that I am talking about "base clocks". I have said that people are expecting base clocks in the 2.6-2.8GHz range.

The 14LPP node is optimized for 2.5GHz. Anything above that and both power consumption and voltage start to skyrocket; this is especially true above the 3GHz mark. Yes, >3GHz is expected for single-core turbo, but some people are skeptical that AMD could hit 3.5GHz on that node. The Stilt wrote "At the moment I'd expect 2600MHz (±200MHz) base and 3200MHz (±200MHz) maximum boost."

Yes, AMD is talking about a substantial overall performance increase over an unspecified baseline. They resort to ambiguities such as "our current core", "our previous gen", and so on, and publish abstract performance graphs without labels, without baselines, and without specifying a concrete processor. They only give a concrete core, Excavator, and a concrete percentage when they talk about IPC improvements.
 


It is important to remark that offsetting a 40% IPC gain against the FX-9590 base clock of 4.7GHz (4.7 / 1.4) gives about 3.4GHz. And many people seem to agree on sub-3GHz base clocks for octo-core Zen.
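
For anyone checking the number: if performance is taken as IPC × clock (a simplification, and an assumption here), the clock at which a chip with 40% higher IPC merely matches the old one is the old clock divided by 1.4. A quick sketch, with a hypothetical function name:

```python
def break_even_clock_ghz(baseline_clock_ghz: float, ipc_gain: float) -> float:
    """Clock at which a core with `ipc_gain` higher IPC only equals the
    baseline, assuming performance = IPC x clock (a simplification)."""
    return baseline_clock_ghz / (1.0 + ipc_gain)


clock = break_even_clock_ghz(4.7, 0.40)   # FX-9590 base clock, +40% IPC
print(f"break-even clock: {clock:.2f} GHz")
```

Anything below that clock would be a net regression against the 4.7GHz part under this simple model; anything above it is a net gain.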

It is worth remarking that something similar happened with Kaveri: the IPC increase over Richland was offset by a reduction in clocks and less overclocking headroom.
 


A uop cache can reduce latency on mispredicted branches because you can run instruction loads to the uop cache without consuming processor cycles. This improves processor performance on mispredicted branches by reducing recovery time, thus eliminating wasted cycles spent retrieving and storing instructions.

You still have a mispredicted branch penalty, but you lose fewer cycles by integrating a uop cache.

Savvy?

EDIT: Here is a great link about pipelines: http://www.cs.cmu.edu/afs/cs/academic/class/15740-f03/public/doc/discussions/uniprocessors/technology/deep-pipelines-isca02.pdf

BD/PD were speed-demons. The problem of BD/PD was neither the long pipeline (18 stages) nor the branch predictor. The problem is that the K11 microarchitecture was designed to target much higher frequencies than could be achieved in practice. The pipeline targeted maximum theoretical frequencies of about 10GHz and relied on hype about SOI processes. IBM has been selling 5.5GHz CPUs (which could hit up to 6GHz) on 32nm, but BD could only hit 3.6GHz on GlobalFoundries 32nm. The cycle was slower than expected, and BD had an unexpected misprediction deficit in units of time.

Target: 18 cycles / 5 GHz = 3.6 ns.
Reality: 18 cycles / 3.6 GHz = 5 ns.

AMD learned the lesson and is moving towards a brainiac microarchitecture with Zen, no longer relying on magical SOI process nodes.

Shorter pipelines = slower clocks than longer pipelines. That is a fundamental rule of chip design. As you noted, the issue was that the process was not up to withstanding the leakage at reasonable temperatures and voltages.

3) No. Pipelining is only a technique to help engineers design fast circuits. Instead of designing one complex circuit that does N units of work per unit of time, they design simpler circuits that each do 1 unit of work per unit of time and then use pipelining to overlap N of them to get nearly N units of work per unit of time. Like any other technique, pipelining cannot violate the laws of physics. The maximum clock achievable by a microarchitecture follows a physical law that depends on physical parameters such as geometry, signal propagation, substrate parameters, etc., which cannot be varied by pipelining. Take the substrate, for instance: if your process node is optimized for 2.5GHz, no matter how long the pipeline in your microarchitecture is, you will not hit 7GHz.

Because of additional factors, sure. However, a chip on that same substrate with shorter pipelines and otherwise the same uarch will end up being a slower processor than a chip with longer pipelines.

4) Someone told me that Zen doesn't use a custom node but just plain 14LPP. Lisa Su confirmed that AMD no longer invests in custom processes. What you call negativity, I call realism. This is why most of what I have said about Zen has been confirmed.

Most of what you said has been confirmed?

Alright, I am calling your bluff on this one. Links to show AMD has confirmed anything you said so far...

 


Kaveri was a failure because they gained 15-20% IPC and lost 10-15% clockspeed.

That is a 5-10% improvement and well below what they needed to make any impact...
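
As a sanity check on how these percentages combine: if performance is modelled as IPC × clock (an assumption, and a crude one), the gain and the loss multiply rather than subtract. A small sketch over the corners of the ranges quoted above, with a made-up helper name:

```python
from itertools import product


def net_change(ipc_gain: float, clock_loss: float) -> float:
    """Net performance change when IPC rises by `ipc_gain` and clock
    falls by `clock_loss`, assuming performance = IPC x clock."""
    return (1.0 + ipc_gain) * (1.0 - clock_loss) - 1.0


# Corners of the ranges quoted in the post: IPC +15..20%, clock -10..15%.
for ipc_gain, clock_loss in product((0.15, 0.20), (0.10, 0.15)):
    change = net_change(ipc_gain, clock_loss)
    print(f"IPC +{ipc_gain:.0%}, clock -{clock_loss:.0%}: net {change:+.1%}")
```

Under this model the net figure comes out a bit lower than simply subtracting one percentage from the other, and the worst corner is slightly negative.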
 


Paraphrasing the Washington Post, there is math and there is fantasy math

Zen single thread scores obtained from where?

"PD" means anything from the efficient 3.2GHz chips to the Centurion 4.7GHz chips.

According to The Stilt, AMD has changed its claims lately and now claims "up to 40% over Excavator".

Where did you get that Excavator is 20% faster on ST?

Why do you mix ST scores with base clocks?
 


I gave you an explicit example showing no saving of cycles. I also pointed out to you that RISC designs with a 1:1 ratio don't use a uop cache because it is useless.



It is not true that "Shorter pipelines = slower clocks than longer pipelines". You are oversimplifying the topic.

I also wonder why you repeat what I have said about BD and how the 32nm process at Glofo didn't match the hype.



This is not true in general. You are oversimplifying the topic.
 


What improvement? Kaveri was slower than Richland both at stock clocks

A10-7850K_Cinebench-R15.jpg
 


There are several problems with multiquoting. One of them is some people reproducing the whole multi-author answer instead of cutting the text and answering only the relevant part (check the first post on this page: a whole piece of text is quoted only to reply to the last line).

About turbos, my experience in several forums is that most people expect lower clocks (both base and turbo) for Zen.
 


Not everything is public right now, juan. You should know that.

I have seen FX8350 versus Zen ST cinebench scores...Zen is 50% higher than 8350.
 



There you go pulling wccftech.
 


A lot of information "seen" or "heard" about Zen was eventually proven to be incorrect. There is even a set of fake slides about Zen still circulating on the Internet.

Not to mention that you didn't answer the rest of my questions.
 


Hum, my dislike for wccftech is twofold: (i) they have no idea about tech and often write nonsense and (ii) they often steal content from others (including forums).

I have no problem with linking to an image stored on their servers, but if you want the source, here it is:

https://www.pugetsystems.com/labs/articles/AMD-A10-7850K-Performance-Review-529/

You can find other benchmarks where Kaveri is slower than Richland

 
Yes, Kaveri was slower than Richland because the improvement was insufficient to cover the clock speed regression. I predicted that would occur prior to launch, if you will recall.

As for more AMD news: https://www.reddit.com/r/Amd/comments/4lpooo/psa_until_amd_gives_us_more_info_please_avoid/d3pkwrr

Amazing post there summarizing known information and mixing in a light bit of conjecture about Polaris.

Additionally, break out the salt: supposedly AMD is going to demo Zen at Computex, and paper launch: https://www.reddit.com/r/Amd/comments/4luw95/amd_zen_reported_to_be_ready_for_presentation_at/
 


I recall you stating "facts" about Steamroller later shown to have zero relation to reality: the non-existent "extra ALU" in the core, the non-existent "massive increase in floating point calculations", and my favorite "do you think haswell will be as good as steamroller?" ;-)

http://www.tomshardware.co.uk/forum/361854-28-steamroller-sandy/page-2#10677240

On the other hand I recall that my predictions about Steamroller and Kaveri were off by ~5% from measurements on final silicon, with my major fault being on frequencies, because the reduction in frequencies introduced by the new 28SHP node was ~300MHz higher than I had expected.



I stopped reading after "Observation 2", because the data he gives is incorrect.



Lovely reading, especially the guy who claims that the elongated shape of the Zen die indicates HBM integration. LOL

I would also like to know what Fuad means by "confirmed that AMD has Zen x86 prototype chips ready for demonstration". "Prototype" can mean anything from an FPGA-based RTL prototype to PC (Production Candidate) silicon.
 


Let us assume for an instant that that is true. Take the next graph:
67034.png


multiply the FX-8350 score by 1.5 and you obtain 144 points, which puts Zen just where many of us predicted or expected years ago. Where are your promises now?
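
To spell out the arithmetic: the graph's FX-8350 value is not reproduced in the text, so the sketch below only works backwards from the two numbers that are stated in the thread, the 144-point result and the claimed 50% lead over the 8350. The function name is hypothetical and the baseline it prints is an implied figure, not a measurement.

```python
def implied_baseline(projected_score: float, uplift: float) -> float:
    """Baseline score implied by a projected score and a relative uplift."""
    return projected_score / (1.0 + uplift)


# 144 points and "50% higher than 8350" are the figures quoted in the thread.
baseline = implied_baseline(144.0, 0.50)
print(f"implied FX-8350 ST score: {baseline:.0f} points")
print(f"check: {baseline:.0f} x 1.5 = {baseline * 1.5:.0f} points")
```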

Forget haswell, on par with skylake is coming...

JK says they will be better by skylake. My source says even if they fall short of predictions they will be "on par".

Jim Keller has had tons of time in the industry working on projects since K7...if he says it will be better, I have no reason to doubt. Though, like those I know at AMD, I consider it a win if they are within 5%.

Jim Keller said, "AMD are on track to catch Intel in high performance cores"

I said: Jim Keller expects them to be ahead by skylake, my source says he (conservatively) expects they will be about even by skylake, though he expected there would be within 5% difference in some workloads just because of differences in designs, and that would be fine...
 


Well, assuming that is true (Zen as released will score 144 points in ST, and at relatively modest clock speeds if the info we're getting is accurate), then Zen will be nigh on level pegging with Intel for IPC, albeit at a bit of a clock speed disadvantage. I mean the 4790k gets that score purely via higher clock rate, and we also know Skylake scores barely higher than Haswell. Whichever way you slice it, that means Zen is a *huge* improvement from a competitive standpoint over any point during the 'dozer' generation.

I personally would consider a Zen ST Cinebench score of 144 a big win; certainly anywhere in the 140+ region is fine. Assuming that kind of gain holds true across other single thread applications, AMD should actually be in a decent position (as they stand a chance of really challenging Intel in many of the MT benchmarks; certainly a 6 core Zen part would be considerably faster overall than Intel's quad core i7s).

Let's hope this is the case (I'd argue this is probably the best case; even an ST score in the 120-130 range would be an abnormally large generational jump in single thread performance). If it is, then going Intel or AMD would be an actual choice for enthusiasts and professional system builders again. That is what we all need. No, AMD won't be dominant as in the A64 days, but they will be close enough that those without a massive bias will actually have cause to consider them outside the few niche areas they are competing in today.
 


I would be absolutely fine with scores in line with the 4960x or 5960x, especially considering the fact that the higher intel chips are there because of their increased clock speeds.

That is probably going to fall in line with Broadwell-E 8 core parts as well...which is just fine, too.
 