AMD CPUs, SoC Rumors and Speculations Temp. thread 2

Status
Not open for further replies.
The cache is split. The other cores will likely not be able to access it directly. Probably similar to the console SOCs with their L2s. It should work fine so long as the OS doesn't push threads from module to module all the time.
 


Seems like a unified large L3 would be more beneficial and using the same idea Intel had, where instructions are stored in the L3 cache in case they are needed again.

Who am I to say though.
 


Which at least on Windows it will. Windows allocates thread(s) on the first core that is open. If a thread was running on core 6, gets bounced by a higher-priority thread, and core 0 is free when it is rescheduled, the thread goes on core 0. That's one reason why BD did so poorly on Windows and why Microsoft had to modify the Windows scheduler after BD launched.

If the cache is private per module [module = four cores], then I could see some hit (5-10%) in performance when threads bump across modules. The more threads, the worse the performance loss becomes. In that regard, DX12's focus on increasing threading might end up hurting AMD's Zen CPUs...
 


It is 8MB per cluster with "one region of 4MB for each pair of cores."
 


As usual, they closely watched our work yesterday at the SA forums and copied our identifications of structures in the die and the discussion about Orochi. If you want a more detailed annotated die, check the following:

http://postimg.org/image/y6n0d75cb/
http://www.chip-architect.com/news/Zen_Summit_Ridge_First.jpg


WCCFTECH miscopied our discussion about performance. Orochi is usually associated with Bulldozer. Thus we compared performance to the FX-8150, not to the FX-8350.

Claims such as "this design is very similar to intels latest Broadwell EP cpu" and the rest of the nonsense found on the WCCFTECH page are their own, however.
 
A unified cache would be better but would also increase complexity. They can probably clock the cache higher by making it smaller and having it serve only 4 cores. It also makes the arch scalable to more cores without reworking the L3 cache to serve more cores. It is probably a sacrifice in the design.

 


Clocks would be the same (otherwise power would increase), but access latency would be significantly reduced on a smaller L3 cache.

It is not a sacrifice for scaling up, because they will always scale up by combining dies, but for scaling down. The reason the server-like die is not a monolithic 8-core cluster with a fully shared L3 is that AMD needs quad-core clusters for the future APUs.
 
This looks interesting....
http://www.electronicsweekly.com/news/business/information-technology/arm-amd-huawei-ibm-qualcomm-mellanox-and-xilinx-team-up-on-datacentre-2016-05/

My only thought, wasn't that the goal of HSA (or at least the HuMA aspect of it)?
 
With respect to the cache on Zen- from memory didn't AMD secure lots of patents relating to improving their cache designs whilst Keller was in charge? I'm sure I remember some headlines to that effect. If that is true the cache might prove to be more interesting than it appears at first.

One thought: as they have a pair of quad-core modules, each with an independent L3, I would think one solution to avoid stalls would be to duplicate everything in the two L3 blocks. That would reduce the overall L3 cache amount, but on the other hand, if the chip has 16MB of it, then I'd say that probably isn't much of an issue, especially on consumer-oriented workloads?
 


That would reduce the effective L3 size to one half, which would hurt performance a lot. Moreover, you would have a serious interconnect bandwidth problem, because the caches would be constantly communicating to synchronize their contents for both reads and writes. It is much cheaper to keep them separate, check the local L3 for the data, and if that misses, check the remote L3.
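To make the "local first, then remote" idea concrete, here is a toy sketch in Python. It is only an illustration: real cache lookups happen in hardware, and the class and function names (`L3Slice`, `load`, `effective_l3_mb`) are made up for this example. The 8MB-per-cluster figure is the one quoted earlier in the thread.

```python
# Toy model only: a dictionary stands in for each cluster's L3.
class L3Slice:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> cached data

    def lookup(self, addr):
        return self.lines.get(addr)


def load(addr, local, remote):
    """Check the local L3 first; probe the remote L3 only on a local miss."""
    data = local.lookup(addr)
    if data is not None:
        return data, "local hit"
    data = remote.lookup(addr)
    if data is not None:
        return data, "remote hit"
    return None, "miss (go to memory)"


def effective_l3_mb(per_cluster_mb, clusters, duplicated):
    """Mirroring the contents leaves only one cluster's worth of unique data."""
    return per_cluster_mb if duplicated else per_cluster_mb * clusters


if __name__ == "__main__":
    l3_a, l3_b = L3Slice("cluster A"), L3Slice("cluster B")
    l3_b.lines[0x40] = "payload"
    print(load(0x40, l3_a, l3_b))   # found in the other cluster's L3
    print(load(0x80, l3_a, l3_b))   # in neither L3
    print(effective_l3_mb(8, 2, duplicated=True),
          effective_l3_mb(8, 2, duplicated=False))
```

With separate caches, the remote probe is paid only on a local miss. With mirrored caches, every write would have to update both copies, and the unique capacity drops from 16MB to 8MB.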
 
Interesting tidbit regarding cache latency, and potential clock speeds.

AMD has implemented a mechanism similar to Intel's uop cache, which will supposedly reduce the penalty for mispredicted branches by a wide margin. That allows deep pipelines, so clock speed will not be dramatically impacted, while cutting latency and the single-thread penalty for a mispredicted code branch.

I missed it in an earlier discussion about cache patent filings...however...I was reading about it elsewhere, and it jumped out.

So, essentially, the architecture will not impose any undue restrictions on clock speed itself; process maturity will be the bigger culprit (think Intel's Sandy Bridge with deep pipelines and uop cache).
 


1) The main goal of the uop cache is to reduce the power and performance penalty of the x86 decoder and of the access to L1i.

2) Improving the branch prediction doesn't improve clock speeds, but improves performance via IPC.

3) The maximum achievable clock speed has been reduced because Piledriver was a speed-demon whereas Zen is a brainiac design. Zen is a wide design and thus cannot achieve the same maximum frequencies as a narrow core.

4) On top of that Zen will be made on a mobile-oriented process node, which will reduce clocks. Nobody would expect a 5GHz Zen CPU.

---------

I was very generous some time ago and claimed ~3.5GHz for the Zen octo-core. But now, with new info about the process node and details about the VRM/power management, several knowledgeable people are claiming that Zen will be clocked at sub 3GHz* and will be a bad overclocker.

* Base clocks of about 2.6GHz and 2.8GHz are being considered.
 


Indeed it is.

@Juanrga - If those clocks are right, there won't be much difference between server CPU clocks and desktop CPU clocks. I find it hard to agree with Stilt's post that the surface area of the CPU pins is so low that it won't support a potential higher than roughly 1V. I expect a desktop CPU base clock of around 3.2 GHz.
 


Well, looking at Haswell E (which Zen is purported to compete with), the base clock for the 8 core / 16 thread part is 3 GHz, with a 3.5 GHz turbo. I could see Zen at similar speeds given it's probably one of the most comparable chips out there to what Zen is supposed to be. Maybe AMD have managed to get a more aggressive Turbo mode on there?

The other thing of interest is that the Haswell E hex core part has a good bump in base clock, so I would imagine AMD could achieve something similar with Zen. The cheaper part would probably be the more interesting one to gamers anyway, and would benefit more from the higher single thread performance, whereas the 8 core part would be more suited to video encoding, rendering and other such tasks that can make use of all those threads.
 


1.) While power consumption may be one of the goals for the uop cache, it also reduces the latency penalty for mispredicted branches.

2.) Correct, however, integrating a uop cache allows AMD to implement a cache fix that will allow them to retain long pipelines, instead of having to implement shorter pipelines which would cause the uarch itself to suffer from lower clockspeeds. This is part of your "it's a brainiac uarch, so lower clockspeeds" argument, and it is incorrect. High IPC uarch can have high clockspeed (a la intel, for example), as long as the pipelines are long enough, and they have a mechanism in place to reduce the penalties for mispredicted branches (long pipelines with mispredicted code branch penalties were a massive part of BD/PD issues).

3.) As I posted in the previous point, this is incorrect. It is a general tenet of the differences between styles of designs...however...the biggest factors in a uarch being capable of high clocks are having the proper pipelines, being able to take the voltage needed to push high clock speeds, and having a process that can cope with the leakage and other things that come along with high clockspeeds.

4.) Zen will be made on a custom version of a node that was geared toward LPP. Sure...your negativity in this regard is expected, and not at all surprising. Process is the one area where you might actually have something of a point. What I hear, however, does not align with your thoughts...
 


His claims are based on an analysis of the socket and VRM/power management for Zen. He found that AMD spent an unusual amount of work on controlling voltages with very strict tolerances. That suggests issues with clocks and power consumption.

On the other hand Thevenin has access to Zen process node characteristics. He has claimed that the process is optimized for sub 3GHz frequencies.

Both claim sub 3GHz base frequencies for the 8C/16T variant: The Stilt claims 2.6GHz base, Thevenin claims 2.8GHz base, and both claim that Zen will be a bad overclocker.

My old claim of ~3.5GHz was based on my expectations for Zen core size (I predicted ~4mm²) and the use of a custom 14nm process optimized for Zen. People have analyzed the Zen die shot and claim that Zen must measure 4mm², but Thevenin told me that Zen is not using any custom process, just the standard 14LPP process. The Samsung 14LPP process node is optimized for mobile, for frequencies below 3GHz. Thus, I believe I was wrong about Zen frequencies and that both The Stilt and Thevenin must be closer to reality.
 


Well, AMD already has a more aggressive turbo on the last FX chips, just compare

FX-8350: 4.0GHz / 4.2GHz
FX-8370E: 3.3GHz / 4.3GHz

The six-core Haswell-E is not a good comparison. That chip is made on a version of the 22nm process tuned for high performance. The six-core can hit higher frequencies because the process node is not optimized for mobile. Zen is made on mobile oriented 14LPP process.

Why do you think that for the past two years AMD has talked only about IPC gains, never about frequencies?
 
I remember Thevenin posting these, from which something can be said about clocks:

"It seems that unlike with Bulldozer, AMD has created separate dies for server and consumer parts. The server version of the die has twice the cores, L3 cache and additional I/O controllers per die. I haven't been able to disassemble one yet, however judging from the package size it is a MCM part. 14nm LPP process.

The relative power consumption is roughly the same as on Intel 14nm parts with similar configuration, but the clocks are quite low :/

40501415 "

"For a long time I actually like what I see. I'd say as long as the consumer Zen parts can reach high enough clocks (min. 3.5GHz), everything will be pretty good"

The last post is from 16th March. His post doesn't seem to imply it can't reach 3.5GHz, not that I'd expect it to reach such clocks.

I think AMD talking about IPC is obvious because it's there that they are weak with BD. Improvement in IPC helps with efficiency while the same cannot be said about frequency.

@ 8350 - If you are using FX 8350, can you report what kind of power consumption you get at sub 3ghz frequencies?
 


The process being optimised for 'sub 3GHz frequencies' isn't the same thing as saying the process cannot go above 3GHz. Modern mobile processors are now in the high 2GHz range, which tallies. Pushing the speed beyond 3GHz will just use proportionately more power. I'm not suggesting we'll see a 4GHz base frequency chip here, but looking at the hex core part in particular, you still have the same power budget of 95W with 2 fewer cores (4 fewer threads) plus a corresponding reduction in cache (I'm assuming here). That will give them some wiggle room. Yes, it's possible the 8 core part might be around (or just below) the 3GHz range for base clock. That said, with a 25% reduction in execution resources for the same power, the hex core should have a corresponding increase in base clock. Let's be conservative and say that, due to the sub 3GHz optimization, the 25% power reduction afforded by the lower core count translates to a 12.5% core clock increase: that would put a Zen hex core part in the 3.3 - 3.4 GHz range of base clock.
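The hex core estimate above is simple enough to write down as a back-of-the-envelope calculation. The sketch below only restates the post's own assumptions (6 of 8 cores active, and half of the freed power fraction turning into clock); the function name and the 0.5 conversion factor are illustrative, not measured values.

```python
def scaled_base_clock(base_ghz, power_freed, clock_gain_per_power=0.5):
    """Estimate a base clock after freeing a fraction of the power budget.

    clock_gain_per_power=0.5 encodes the conservative guess that a 25%
    power saving buys only a 12.5% clock increase (an assumption, not data).
    """
    return base_ghz * (1.0 + power_freed * clock_gain_per_power)


if __name__ == "__main__":
    power_freed = 1.0 - 6 / 8  # two of eight cores removed -> 25%
    for eight_core_base in (2.9, 3.0):
        six_core = scaled_base_clock(eight_core_base, power_freed)
        print(f"8-core base {eight_core_base:.1f} GHz -> 6-core ~{six_core:.3f} GHz")
```

Starting from an 8-core base clock at or just below 3GHz, this lands in or near the 3.3 - 3.4 GHz range mentioned above.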

I agree: AMD haven't talked much about clocks. But let's be clear here, they ARE talking about a substantial overall performance increase. If the chips cannot clock above 3GHz, that would negate pretty much all the performance uplift from the IPC improvements. My thinking is base clocks might be low, but so long as the Turbo can push at least some of the cores a lot higher then no problem. My guess would be circa 4GHz on single / dual threads, hopefully 3.8GHz on up to half the cores. That would put the speeds in a sensible range when running day to day tasks that don't use the whole chip.
 
There is a flat-out easy answer to this: it's easier to increase frequency (rather than IPC) in an architecture through other improvements, for example by going to a die that wasn't meant for mobile or just a more mature process. I once said Zen 1.0 will be at Sandy-Ivy levels of performance. Given that, comparing an 8 core Zen to an 8150 (which, as Juan said, is the real baseline for the double-the-performance figure, not an 8350) would mean at least a 20-25% increase even at 3.0GHz for an 8 core Zen when compared to an 8350.

That is also at 95 watts compared to 125 watts. Juan (and other sources) said they will not have a 125 watt CPU model, but I simply think they might for a higher-end FX model, even if it's a super niche product. Even then it would only mean 200-300 MHz or slightly more.
 


True, although much of that 20 - 25% increase in multi threaded situations would be attributed to being a full 8 core + HT design. For Zen to meet expectations it needs to significantly best their current offerings in *single thread performance* as well. This does mean clocks can't be too low, at least when working with lower thread counts.

Thing is though if the chip features a strong (and sustainable) turbo mode, then low base clocks won't matter. I mean a 2ghz base speed wouldn't matter if the chip spins up to double of that most of the time (I almost think min / max frequencies are irrelevant these days, what matters is the *average* clock speed in specific scenarios).

To put it another way, if the 8 and 6 core parts really are that hampered on clocks, the performance uplift won't be there to justify them. If that's the case, I'd argue AMD would be focusing on a smaller quad core part that can clock higher. There's no point in gaining a 40% ipc increase, if that results in a 40% decrease in sustainable clock speed.
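The trade-off in that last sentence can be put into numbers with the usual rule of thumb that performance scales roughly as IPC times clock. This is a rough sketch under that assumption only; the function names are hypothetical, and it ignores memory, turbo and threading effects.

```python
def relative_performance(ipc_gain, clock_change):
    """Performance relative to baseline, assuming perf ~ IPC x clock."""
    return (1.0 + ipc_gain) * (1.0 + clock_change)


def break_even_clock_drop(ipc_gain):
    """Largest fractional clock drop that still matches baseline performance."""
    return 1.0 - 1.0 / (1.0 + ipc_gain)


if __name__ == "__main__":
    # +40% IPC with -40% clock: a net loss under this model.
    print(f"{relative_performance(0.40, -0.40):.2f}x baseline")
    # How far clocks could fall before a +40% IPC gain is fully cancelled.
    print(f"break-even clock drop: {break_even_clock_drop(0.40):.1%}")
```

Under this model a 40% IPC gain paired with a 40% clock loss ends up at about 0.84x the baseline, and the IPC gain is already cancelled once clocks fall by a little under 29%.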

My guess: the 8 core part will have a low base clock but a 1GHz+ turbo range to compensate. The real question then becomes 'what speed is the CPU running at on average'? Looking at most of Intel's current processors, they reliably run towards the higher end of their turbo ranges, even when under heavy load, provided they have sufficient cooling. Hopefully AMD have achieved similar with Zen, in which case no issue. Overclocking is a separate issue of course, although I'd rather have a decent performing stock part over a slower part that overclocks better any day.
 


Here is why I think you are wrong...

The cinebench single thread scores for Zen are 50% higher than PD.

If we take AMD at face value and assume that the 40% number is IPC over Carrizo, that puts PD about another 20% behind that. So, we are looking at a 60% difference on uarch alone. Now, consider that the IPC gains are 60%, and yet we have a straight 50% improvement in Cinebench single thread...?

A 10% clock speed deficit is the reason. This puts Zen's flagship part at 4.0*0.9 = ~3.6 GHz

Math is math.
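For anyone who wants to redo that estimate, here is a small sketch. It assumes single-thread performance scales as IPC times clock, and it takes the 60% IPC gain, the 50% Cinebench gain and the 4.0 GHz reference clock straight from the post. The function names are made up for illustration. The post subtracts the percentages; dividing the ratios is the other common reading and gives a slightly smaller deficit.

```python
def implied_clock_ratio(observed_gain, ipc_gain):
    """Clock ratio implied by the assumed model perf = IPC x clock."""
    return (1.0 + observed_gain) / (1.0 + ipc_gain)


def subtractive_clock_ratio(observed_gain, ipc_gain):
    """The post's shortcut: treat the percentage gap as the clock deficit."""
    return 1.0 - (ipc_gain - observed_gain)


REFERENCE_GHZ = 4.0  # reference clock used in the post

if __name__ == "__main__":
    for label, fn in (("ratio method", implied_clock_ratio),
                      ("subtraction ", subtractive_clock_ratio)):
        ratio = fn(0.50, 0.60)
        print(f"{label}: ratio {ratio:.4f} -> ~{REFERENCE_GHZ * ratio:.2f} GHz")
```

Either way the estimate lands in the mid-3 GHz range: about 3.6 GHz by subtraction and about 3.75 GHz by ratio, provided the leaked scores and the 60% figure hold.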
 


If you back out some of the leaked benches (assuming that they are completely legitimate...of course)...you get consistent figures that extrapolate into a range around 3.5-3.6 GHz for the flagship part. It makes sense in a lot of ways. If you gain 40% IPC, to lose 10% in clock speed, that is an acceptable trade...you are still coming out 30% ahead.
 


Why is it hard to believe that Zen would have a lower than expected clock speed but perform much better? Look at Core 2. When Core 2 came out even the lower-end E6600 @ 2.4GHz pounded the 3.73GHz Pentium 965 EE at stock speeds. Anything higher made it look even worse.

I find it highly possible that AMD has improved IPC enough that lower clocks are not going to hamper it as much, much like the Athlon 64 days, when their clock speeds were much lower but their IPC was much higher.

Will it beat out Intel? I don't think they are going to magically lead the current generation of Intel CPUs, as that is a massive leap to make, but they will be vastly more competitive and easier to recommend than they currently are.
 
'sup.
looked at the latest news. don't expect high clocks from 1st gen zen processors - which, i am guessing, is the first if not the only high performance processors fabbed by "14nm" ff process. first round of perf improvements will come mostly from the design improvements. clocks and power use will improve over time with process node maturity and fine tuning.

edit: by high clocks i don't just mean rated clockrate. i include turboboosting, sustained turbo clockrate, clockrate allocation around the processors etc.
 