AMD CPUs, SoC Rumors and Speculations Temp. thread 2

8350rocks · May 27, 2016

prtskg :

I remember Thevenin posting these from which something can be said about clocks -

"It seems that unlike with Bulldozer, AMD has created separate dies for server and consumer parts. The server version of the die has twice the cores, L3 cache and additional I/O controllers per die. I haven´t been able to disassemble one yet, however judging from the package size it is a MCM part. 14nm LPP process.

The relative power consumption is roughly the same as on Intel 14nm parts with similar configuration, but the clocks are quite low :/

40501415 "

"For a long time I actually like what I see. I´d say as long as the consumer Zen parts can reach high enough clocks (min. 3.5GHz), everything will be pretty good"

The last post is from 16th March. His post doesn't seem to imply it can't reach 3.5GHz, not that I'd expect it to reach such clocks.

I think AMD talking about IPC is obvious because it's there that they are weak with BD. Improvement in IPC helps with efficiency while the same cannot be said about frequency.

@ 8350 - If you are using FX 8350, can you report what kind of power consumption you get at sub 3ghz frequencies?

It would not really have an impact...I have a 9590, but the process is completely different, and so is the uarch. Nothing useful would be gleaned from looking at PD uarch and process at 3 GHz.

juanrga · May 27, 2016

8350rocks :

1) No. Consider a two-way branch. The predictor assumes branch A will be taken, so before the branch executes the front-end starts decoding instructions from branch A and storing them in the uop cache. Then the branch condition is executed and the prediction is found to be incorrect: branch B is taken. The pipeline has to be flushed and the instructions from branch B have to be decoded. It is very unlikely that the uop cache (which contains instructions from the wrong branch) will contain the instructions from branch B. The whole point of branching is to run different code. The uop cache doesn't reduce the branch misprediction penalty. It wasn't designed for that, as explained to you before. In fact, canonical RISC microarchitectures don't use uop caches because they are useless when you have a 1:1 ratio between decoded instructions and uops.

2) As shown above, the uop cache doesn't reduce the branch misprediction penalty; therefore it plays no role in increasing pipeline stages. Intel's last speed-demon microarchitecture was P4E; everything since is brainiac. This is also the correct picture for Zen. Zen is similar to Sandy, Ivy, Haswell, Broadwell, and Skylake.

BD/PD were speed-demons. The problem of BD/PD was neither the long pipeline (18 stages) nor the branch predictor. The problem is that the K11 microarchitecture was designed to target much higher frequencies than could be achieved in practice. The pipeline targeted max theoretical frequencies of about 10GHz and relied on hype about SOI processes. IBM has been selling 5.5GHz CPUs (which could hit up to 6GHz) on 32nm, but BD could only hit 3.6GHz on Globalfoundries 32nm. The cycle was slower than expected and BD had an unexpected misprediction deficit in units of time.

Target: 18 cycles / 5 GHz = 3.6 ns.
Reality: 18 cycles / 3.6 GHz = 5 ns.
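
As a rough illustration of the two lines above, the flush cost in wall-clock time can be computed directly. This is only a sketch: `flush_penalty_ns` is a made-up helper name, and it assumes, as the figures above do, that a misprediction costs the full 18-stage pipeline depth.

```python
def flush_penalty_ns(pipeline_stages: int, clock_ghz: float) -> float:
    """Time lost refilling the pipeline after a mispredicted branch.

    Assumes the penalty equals the full pipeline depth in cycles; real
    penalties depend on where in the pipeline the branch resolves.
    """
    # 1 GHz is one cycle per nanosecond, so cycles / GHz gives nanoseconds.
    return pipeline_stages / clock_ghz


target = flush_penalty_ns(18, 5.0)    # design target clock from the post
reality = flush_penalty_ns(18, 3.6)   # clock actually reached, per the post

print(f"target:  {target:.1f} ns")
print(f"reality: {reality:.1f} ns")
print(f"extra time per misprediction: {reality / target - 1:.0%}")
```

The same cycle count costs more time at the lower clock, which is the "deficit in units of time" described above.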

AMD learned the lesson and is moving towards a brainiac microarchitecture with Zen, no longer relying on magical SOI process nodes.

3) No. Pipelining is only a technique to help engineers design fast circuits. Instead of designing one complex circuit that does N units of work per unit of time, they design simpler circuits that each do 1 unit of work per unit of time and then use pipelining to overlap N of them, getting nearly N units of work per unit of time. Like any other technique, pipelining cannot violate the laws of physics. The maximum clock achievable by a microarchitecture follows a physical law that depends on physical parameters such as geometry, signal propagation, substrate parameters, etc., which cannot be varied by pipelining. Take the substrate, for instance: if your process node is optimized for 2.5GHz, no matter how long the pipeline in your microarchitecture is, you will not hit 7GHz.

4) Someone told me that Zen doesn't use a custom node but just plain 14LPP. Lisa Su confirmed that AMD no longer invests in custom processes. What you call negativity, I call realism. This is why most of what I have said about Zen has been confirmed.

juanrga · May 27, 2016

prtskg :

He doesn't mean 3.5GHz base, but single-core turbo. He also wrote "Besides, until proved otherwise the 14nm Samsung / GlobalFoundries node is the biggest mistake AMD has ever made."

prtskg :

AMD could choose to talk about both IPC and performance, or both IPC and clocks. It seems evident that they are only talking about IPC gains because frequencies will be a weak point and part of the IPC improvement will be offset by a reduction in clocks.

juanrga · May 27, 2016

cdrkf :

juanrga :

The process being optimised for 'sub 3ghz frequencies' isn't the same thing as saying the process cannot go above 3ghz. Modern mobile processors are now in the high 2ghz range, which tallies. Pushing the speed beyond 3ghz will just use proportionately more power. I'm not suggesting we'll see a 4ghz base frequency chip here- but looking at the hex core part in particular you still have the same power budget of 95W and 2 fewer cores (4 fewer threads) + a corresponding reduction in cache (I'm assuming here). That will give them some wiggle room- yeah, it's possible the 8 core part might be around (or just below) the 3ghz range for base clock. That said, with a 25% reduction in execution resources for the same power, the hex core should have a corresponding increase in base clock. Let's be conservative and say that, due to the sub 3ghz optimization, a 25% power reduction afforded by the reduction in core count translates to a 12.5% core clock increase- that would put a zen hex core part in the 3.3 - 3.4 ghz range of base clock.
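
The hex-core estimate above can be sketched in a few lines. Everything here is an assumption taken from the post rather than measured data: the octo-core base clock of about 3GHz, the 25% of power freed by dropping two cores, and the guess that only half of that turns into clock speed. `estimated_base_clock` is a hypothetical helper name.

```python
def estimated_base_clock(octo_base_ghz: float,
                         power_freed: float = 0.25,
                         clock_per_power: float = 0.5) -> float:
    """Scale a base clock by the share of freed power that becomes frequency.

    power_freed:     fraction of the power budget released by dropping cores
    clock_per_power: assumed clock gain per unit of freed power (0.5 is the
                     "conservative" guess from the post, not a measured value)
    """
    return octo_base_ghz * (1.0 + power_freed * clock_per_power)


# Octo-core base clock "around (or just below)" 3 GHz, per the post.
for octo in (2.95, 3.0):
    hexa = estimated_base_clock(octo)
    print(f"{octo:.2f} GHz octo-core -> {hexa:.2f} GHz hex-core")
```

Changing `clock_per_power` shows how sensitive the 3.3 - 3.4 ghz figure is to that one guess.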

I agree- AMD haven't talked much about clocks- but let's be clear here, they ARE talking about a substantial overall performance increase. If the chips cannot clock above 3ghz, that would negate pretty much all the performance uplift from the IPC improvements. My thinking is base clocks might be low, but so long as the Turbo can push at least some of the cores a lot higher then no problem. My guess would be circa 4ghz on single / dual threads, hopefully 3.8ghz on up to half the cores. That would put the speeds in a sensible range when running day to day tasks that don't use the whole chip.

Let me emphasize again that I am talking about "base clocks". I have said that people are expecting base clocks in the 2.6--2.8GHz range.

The 14LPP node is optimized for 2.5GHz. Anything above that and both power consumption and voltage start to skyrocket; this is especially true above the 3GHz mark. Yes, >3GHz is expected for single-core turbo, but some people are skeptical that AMD could hit 3.5GHz on that node. The Stilt wrote "At the moment I'd expect 2600MHz (±200MHz) base and 3200MHz (±200MHz) maximum boost."

Yes, AMD is talking about a substantial overall performance increase over an unspecified baseline. They resort to ambiguities such as "our current core", "our previous gen", and so on, and publish abstract performance graphs without labels, without baselines, and without specifying a concrete processor. They only name a concrete core, Excavator, and a concrete percentage when they talk about IPC improvements.

juanrga · May 27, 2016

cdrkf :

It is important to note that offsetting a 40% IPC gain against the FX-9590 base clock of 4.7GHz gives about 3.4GHz (4.7 / 1.4). And many people seem to agree on sub-3GHz base clocks for octo-core Zen.
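
One way to arrive at the 3.4GHz figure is to assume performance is simply IPC times clock and ask what clock a core with 40% higher IPC needs to match the FX-9590 at its 4.7GHz base. The sketch below does just that; `break_even_clock_ghz` is a hypothetical name, and the IPC x clock model is a simplification.

```python
def break_even_clock_ghz(old_clock_ghz: float, ipc_gain: float) -> float:
    """Clock at which a higher-IPC core matches the old core's throughput.

    Assumes performance = IPC x clock, which ignores memory and turbo effects.
    """
    return old_clock_ghz / (1.0 + ipc_gain)


# FX-9590 base clock and the 40% IPC figure, both as quoted in the thread.
print(f"{break_even_clock_ghz(4.7, 0.40):.2f} GHz")  # prints "3.36 GHz"
```

Any base clock below that break-even point would mean lower throughput than the FX-9590 under this simple model.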

It is worth noting that something similar happened with Kaveri: the IPC increase over Richland was offset by a reduction in clocks and less overclocking headroom.

8350rocks · May 27, 2016

juanrga :

uop cache can reduce latency on mispredicted branches because you can run instruction loads to uop cache without consuming processor cycles. This improves processor performance on mispredicted branches by reducing recovery time, thus eliminating wasted cycles spent retrieving and storing instructions.

You still have a mispredicted branch penalty, but you lose fewer cycles by integrating uop cache.

Savvy?

EDIT: Here is a great link about pipelines: http://www.cs.cmu.edu/afs/cs/academic/class/15740-f03/public/doc/discussions/uniprocessors/technology/deep-pipelines-isca02.pdf

BD/PD were speed-demons. The problem of BD/PD was neither the long pipeline (18 stages) nor the branch predictor. The problem is that the K11 microarchitecture was designed to target much higher frequencies than could be achieved in practice. The pipeline targeted max theoretical frequencies of about 10GHz and relied on hype about SOI processes. IBM has been selling 5.5GHz CPUs (which could hit up to 6GHz) on 32nm, but BD could only hit 3.6GHz on Globalfoundries 32nm. The cycle was slower than expected and BD had an unexpected misprediction deficit in units of time.

Target: 18 cycles / 5 GHz = 3.6 ns.
Reality: 18 cycles / 3.6 GHz = 5 ns.

AMD learned the lesson and is moving towards a brainiac microarchitecture with Zen, no longer relying on magical SOI process nodes.

Shorter pipelines = slower clocks than longer pipelines. That is a fundamental rule of chip design. As you noted, the issue was that the process was not up to withstanding the leakage at reasonable temperatures and voltages.

3) No. Pipelining is only a technique to help engineers design fast circuits. Instead of designing one complex circuit that does N units of work per unit of time, they design simpler circuits that each do 1 unit of work per unit of time and then use pipelining to overlap N of them, getting nearly N units of work per unit of time. Like any other technique, pipelining cannot violate the laws of physics. The maximum clock achievable by a microarchitecture follows a physical law that depends on physical parameters such as geometry, signal propagation, substrate parameters, etc., which cannot be varied by pipelining. Take the substrate, for instance: if your process node is optimized for 2.5GHz, no matter how long the pipeline in your microarchitecture is, you will not hit 7GHz.

Because of additional factors, sure. However....a chip on that same substrate with shorter pipelines and otherwise same uarch will end up being a slower processor than a chip with longer pipelines.

4) Someone told me that Zen doesn't use a custom node but just plain 14LPP. Lisa Su confirmed that AMD no longer invests in custom processes. What you call negativity, I call realism. This is why most of what I have said about Zen has been confirmed.

Most of what you said has been confirmed?

Alright, I am calling your bluff on this one. Links to show AMD has confirmed anything you said so far...

8350rocks · May 27, 2016

juanrga :

Kaveri was a failure because they gained 15-20% IPC and lost 10-15% clockspeed.

That is a 5-10% improvement and well below what they needed to make any impact...
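
A quick way to check the net effect of an IPC gain against a clock loss is to multiply the two factors. This is only a sketch under the assumption that performance scales as IPC times clock; `net_change` is a hypothetical helper, and the percentage ranges are the ones quoted above.

```python
def net_change(ipc_gain: float, clock_loss: float) -> float:
    """Net performance change, assuming performance = IPC x clock."""
    return (1.0 + ipc_gain) * (1.0 - clock_loss) - 1.0


worst = net_change(0.15, 0.15)  # smallest IPC gain, largest clock loss
best = net_change(0.20, 0.10)   # largest IPC gain, smallest clock loss

print(f"worst case: {worst:+.1%}")
print(f"best case:  {best:+.1%}")
```

Compounding the two percentages this way gives a range of about -2% to +8%, a little wider than simply subtracting one from the other.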

juanrga · May 27, 2016

8350rocks :

Paraphrasing the Washington Post, there is math and there is fantasy math

Zen single thread scores obtained from where?

"PD" means anything from the efficient 3.2GHz chips to the Centurion 4.7GHz chips.

According to The Stilt, AMD has changed its claims lately and now claims "up to 40% over Excavator".

Where did you get that Excavator is 20% faster on ST?

Why do you mix ST scores with base clocks?

juanrga · May 27, 2016

8350rocks :

I gave you an explicit example showing no saving of cycles. I also pointed out that RISC designs with a 1:1 ratio don't use a uop cache because it is useless.

8350rocks :

It is not true that "Shorter pipelines = slower clocks than longer pipelines". You are oversimplifying the topic.

I also wonder why you repeat what I have said about BD and how the 32nm process at Glofo didn't match the hype.

8350rocks :

This is not true in general. You are oversimplifying the topic.

juanrga · May 27, 2016

8350rocks :

What improvement? Kaveri was slower than Richland both at stock clocks

prtskg · May 27, 2016

Multiquote, people, multiquote! It's much more efficient and better looking. So how many people think Zen's single core won't reach 3.5GHz in turbo?

juanrga · May 28, 2016

prtskg :

There are several problems with multiquoting. One of them is some people reproducing the whole multi-author answer instead of cutting the text and answering only the relevant part (check the first post on this page: a whole piece of text is quoted only to reply to the last line).

About turbos, my experience in several forums is that most people expect lower clocks for Zen (both base and turbo).

8350rocks · May 28, 2016

juanrga :

8350rocks :

Paraphrasing the Washington Post, there is math and there is fantasy math

Zen single thread scores obtained from where?

"PD" means anything from the efficient 3.2GHz chips to the Centurion 4.7GHz chips.

According to The Stilt, AMD has changed its claims lately and now claims "up to 40% over Excavator".

Where did you get that Excavator is 20% faster on ST?

Why do you mix ST scores with base clocks?

Not everything is public right now, juan. You should know that.

I have seen FX8350 versus Zen ST cinebench scores...Zen is 50% higher than 8350.

8350rocks · May 28, 2016

juanrga :

There you go pulling wccftech.

juanrga · May 30, 2016

8350rocks :

juanrga :

Not everything is public right now, juan. You should know that.

I have seen FX8350 versus Zen ST cinebench scores...Zen is 50% higher than 8350.

A lot of information "seen" or "heard" about Zen was eventually proven to be incorrect. There is even a set of fake slides about Zen still circulating on the Internet.

Not to mention that you didn't answer the rest of my questions.

juanrga · May 30, 2016

8350rocks :

Hum, my dislike for wccftech is twofold: (i) they have no idea about tech and often write nonsense and (ii) they often steal content from others (including forums).

I have no problem with linking to an image stored on their servers, but if you want the source, here it is

https://www.pugetsystems.com/labs/articles/AMD-A10-7850K-Performance-Review-529/

You can find other benchmarks where Kaveri is slower than Richland

8350rocks · May 31, 2016

Yes, Kaveri was slower than Richland because the improvement was insufficient to cover the clock speed regression. I predicted that would occur prior to launch, if you will recall.

As for more AMD news: https://www.reddit.com/r/Amd/comments/4lpooo/psa_until_amd_gives_us_more_info_please_avoid/d3pkwrr

Amazing post there summarizing known information and mixing in a light bit of conjecture over polaris.

Additionally, break out the salt, supposedly AMD is going to demo Zen at computex, and paper launch: https://www.reddit.com/r/Amd/comments/4luw95/amd_zen_reported_to_be_ready_for_presentation_at/

juanrga · May 31, 2016

8350rocks :

I recall you stating "facts" about Steamroller that were later shown to have zero relation to reality: the non-existent "extra ALU" in the core, the non-existent "massive increase in floating point calculations", and my favorite, "do you think haswell will be as good as steamroller?" ;-)

http://www.tomshardware.co.uk/forum/361854-28-steamroller-sandy/page-2#10677240

On the other hand I recall that my predictions about Steamroller and Kaveri were off by ~5% from measurements on final silicon, with my major fault being on frequencies, because the reduction in frequencies introduced by the new 28SHP node was ~300MHz higher than I had expected.

8350rocks :

I stopped reading after "Observation 2", because the data he gives is incorrect.

8350rocks :

Lovely reading, especially the guy who claims that the elongated shape of the Zen die indicates HBM integration. LOL

I would also like to know what Fuad means by "confirmed that AMD has Zen x86 prototype chips ready for demonstration". "Prototype" can mean anything from an FPGA-based RTL prototype to PC (Production Candidate) silicon.

juanrga · Jun 1, 2016

The Zen prototype shown at Computex was an Engineering Sample.

-Fran- · Jun 1, 2016

And how was it? I wasn't able to see the presentation xD

Did they show anything at all? 😛

Cheers!

jimmysmitty · Jun 1, 2016

http://www.anandtech.com/show/10391/amd-briefly-shows-off-zen-summit-ridge-silicon

8350rocks · Jun 1, 2016

Live stream recording of computex presentation: https://www.youtube.com/watch?v=ZwlQvjwYFEM

juanrga · Jun 1, 2016

8350rocks :

Let us assume for an instant that is true. Take the next graph

multiply the FX-8350 score and you obtain 144 points, which puts Zen just where many of us predicted or expected years ago. Where are your promises now?

Forget haswell, on par with skylake is coming...

JK says they will be better by skylake. My source says even if they fall short of predictions they will be "on par".

Jim Keller has had tons of time in the industry working on projects since K7...if he says it will be better, I have no reason to doubt. Though, like those I know at AMD, I consider it a win if they are within 5%.

Jim Keller said, "AMD are on track to catch Intel in high performance cores"

I said: Jim Keller expects them to be ahead by skylake, my source says he (conservatively) expects they will be about even by skylake, though he expected there would be within 5% difference in some workloads just because of differences in designs, and that would be fine...

cdrkf · Jun 1, 2016

juanrga :

Well, assuming that is true (Zen as released will score 144 points in ST, and at relatively modest clock speeds if the info we're getting is accurate), then Zen will be nigh on level pegging with Intel for IPC, albeit at a bit of a clock speed disadvantage. I mean, the 4790k gets that score purely via higher clock rate, and we also know Skylake scores barely higher than Haswell. Whichever way you slice it, that means Zen is a *huge* improvement from a competitive standpoint over any point during the 'dozer' generation.

I personally would consider a zen ST cinebench score of 144 a big win, certainly anywhere in the 140+ region is fine. Assuming that kind of gain holds true across other single thread applications, AMD should actually be in a decent position (as they stand a chance of really challenging Intel in many of the MT benchmarks- certainly a 6 core Zen part would be considerably faster overall than intel's quad core i7s).

Let's hope this is the case (I'd argue this is probably best case, even an ST score in the 120 - 130 range would be an abnormally large generation jump in single thread performance). If it is- then going Intel or AMD would be an actual choice for enthusiasts and professional system builders again. That is what we all need- no, AMD won't be dominant as in the A64 days, but they will be close enough that those without a massive bias will actually have cause to consider them outside the few niche areas they are competing in today.

8350rocks · Jun 1, 2016

juanrga :

8350rocks :

Let us assume for an instant that is true. Take the next graph

multiply the FX-8350 score and you obtain 144 points, which puts Zen just where many of us predicted or expected years ago. Where are your promises now?

Forget haswell, on par with skylake is coming...

JK says they will be better by skylake. My source says even if they fall short of predictions they will be "on par".

Jim Keller has had tons of time in the industry working on projects since K7...if he says it will be better, I have no reason to doubt. Though, like those I know at AMD, I consider it a win if they are within 5%.

Jim Keller said, "AMD are on track to catch Intel in high performance cores"

I said: Jim Keller expects them to be ahead by skylake, my source says he (conservatively) expects they will be about even by skylake, though he expected there would be within 5% difference in some workloads just because of differences in designs, and that would be fine...

I would be absolutely fine with scores in line with the 4960x or 5960x, especially considering the fact that the higher intel chips are there because of their increased clock speeds.

That is probably going to fall in line with Broadwell-E 8 core parts as well...which is just fine, too.
