AMD CPUs, SoC Rumors and Speculations Temp. thread 2

Status
Not open for further replies.


I get your point. I was just mentioning that AMD roadmaps only show 8-core configurations.

When a construction-core chip had a defective core, AMD had to disable the full module, but this is no longer true with Zen, and I doubt that they will have up to half the die defective. 6-core? It is possible. Look at Intel: there are no Haswell-E quad-cores.
 


Zen has been designed around a Lego-like model for the Semicustom business. This doesn't mean that AMD will design multiple dies with different core counts for consumer PC CPUs. They will only design one die for the server and will reuse, for the FX CPUs, the dies that did not pass server-class qualification.

The question is: will AMD design a second die for the future Zen APUs, or will it reuse the CPU die in a multi-die configuration?
 


Zen-IPC-Gain.jpg


The slide says "Excavator core" and "Zen core". There is also a footnote citation #1 on the slide. The footnote says

Based on internal AMD estimates for “Zen” x86 CPU core compared to “Excavator” x86 CPU core

My prediction for the Zen core was 40% higher IPC than the Piledriver core, and it didn't include SMT. In another series of posts I discussed multithreading performance.

Then we have the GCC patch and the micro-architecture derived from it. 4ALU+2AGU + private L2 would give single-thread integer performance between Sandy Bridge and Haswell core. The pair of 128bit FMA units on Zen core would give floating point performance between Sandy and Ivy core.

This is an estimate based on the little that we know about Zen. Final performance can vary for a number of reasons; for instance, if Zen has some cache bottleneck then the performance of the core could be reduced.
 


I see that but again what is considered a "core" can be anything from the basics to the full part.

I would hope for it to be per core per clock but I am not going to make any assumptions because every time AMD has something new assumptions are made, hype is built up and it falls short very hard. Happened with Phenom and with Bulldozer.
 


Technically, it would be 40% per core; IPC is a scalar factor, same as clock speed. IPC * clock speed gives you the performance of a single CPU core (assuming no processing bottlenecks exist), and multiplying that by the number of cores gives you total processor performance. This calculation assumes 100% core loading, however; less than full load will result in less than the expected gains.

So yes, if you have a single-threaded workload, and you improve performance of a core by 40%, I'd expect close to a 40% performance increase. However, if the application only takes 50% CPU time, I'd expect closer to 20% gains. This is why benchmarks are always much more sensitive to CPU performance increases than "real world" applications: benchmarks push CPU load to 100%, where most other tasks do not.
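A rough sketch of that arithmetic in Python. The function names are made up for illustration, and the 50% CPU-bound share is just the example figure from the post. The first function is the simple linear estimate used above; the second is the Amdahl's-law form, which comes out somewhat lower for the same inputs because only the CPU-bound share of the run time shrinks.

```python
def linear_gain(core_speedup: float, cpu_fraction: float) -> float:
    """Linear estimate from the post: gain scales with the share of time spent on the CPU."""
    return (core_speedup - 1.0) * cpu_fraction


def amdahl_gain(core_speedup: float, cpu_fraction: float) -> float:
    """Amdahl's-law estimate: only the CPU-bound share of the run time gets shorter."""
    new_time = (1.0 - cpu_fraction) + cpu_fraction / core_speedup
    return 1.0 / new_time - 1.0


if __name__ == "__main__":
    # 1.4 = a core that is 40% faster; fractions are hypothetical CPU-bound shares.
    for fraction in (1.0, 0.5):
        print(f"CPU-bound {fraction:.0%}: "
              f"linear {linear_gain(1.4, fraction):.1%}, "
              f"Amdahl {amdahl_gain(1.4, fraction):.1%}")
```

At 100% CPU load both estimates agree on 40%; at 50% the linear estimate gives 20% and the Amdahl form gives roughly 17%.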

Now, I'm assuming top-end stock speeds are going to decrease from the ~4.0GHz the top-end FX 8350 is clocked at. Assuming ~3.5GHz is the clock of the top-tier model (which is reasonable to assume), you get a ~13% clock speed delta between SR and Zen. Factor that into single-core performance, and we can reasonably expect closer to a 27% per-core performance increase, again assuming no bottlenecks. Now, let's assume a slight performance hit for SMT (scheduling, for example), so let's reduce that to about a 25% per-core performance increase. Now let's make another reasonable assumption: that non-benchmarks will only take about 50% CPU time at most. Now we're down to about a 13% performance increase.
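The chain of assumptions above can be sketched as a product of factors. This is an illustration only: the helper name is made up, the 2% SMT penalty is an assumed stand-in for the "slight performance hit", and the clocks are the post's guesses. Note that multiplying the factors gives a lower per-core figure (about 22.5%) than subtracting the percentages (27%), because the clock cut applies to the whole 1.4x and not only to the gain.

```python
def per_core_gain(ipc_gain: float, new_clock: float, old_clock: float,
                  smt_penalty: float = 0.0) -> float:
    """Net per-core gain when IPC, clock and an SMT penalty compound multiplicatively."""
    ratio = (1.0 + ipc_gain) * (new_clock / old_clock) * (1.0 - smt_penalty)
    return ratio - 1.0


if __name__ == "__main__":
    clock_only = per_core_gain(0.40, 3.5, 4.0)
    with_smt = per_core_gain(0.40, 3.5, 4.0, smt_penalty=0.02)  # 2% is an assumption
    print(f"IPC +40%, clock 4.0 -> 3.5 GHz: {clock_only:.1%}")
    print(f"...with an assumed 2% SMT penalty: {with_smt:.1%}")
    print(f"...at 50% CPU time (linear estimate): {with_smt * 0.5:.1%}")
```

Under these assumptions the final figure lands near 10% instead of 13%, which only strengthens the point that a headline 40% shrinks quickly.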

See how quickly I can reasonably reduce that 40% number down to just 13%, all the while making reasonable assumptions? That's why I remain very skeptical about what the final results are actually going to be in real-world situations. I fully expect 20%, 30% typical performance gains over SR, at most. So close to what Intel has now, but still likely behind similar priced Intel CPUs at the time Zen releases. That's my prediction, and I'm sticking to it.
 
@gamer, fine for single thread. However we are looking at an 8 core, 16 thread part.

On that basis I expect some Intel extreme edition matching numbers in seriously threaded workloads, certainly outpacing the top 'consumer' 4 core / 8 thread i7 by a decent margin.

Whilst Intel will probably still have the overall performance crown, I'm hopeful Zen packs enough punch to prompt Intel to offer more for the money (e.g. get some sensibly priced hex core parts out)....

As for game performance, I think the ipc boost could make amd a viable option again (above the i3) 😛
 
Personally? I'm predicting Haswell-level performance, which is OK, but not if AMD is going to charge $300 for its top CPU at launch. Which is what I suspect they'll try to do, leading to a lot of "good CPU, but too expensive to consider" posts.
 


only $300? nah zen will be priced to its performance and as 16 threads should for all intents destroy rendering performance on a 8 thread i7 it will be priced higher. I fully expect $500 to compete with lga 2011 socket "cheap" 6 core options.
 


@salgado&gamerk

Juan pointed out amd claims IPC over excavator. IPC means Instructions per clock. meaning 40% faster when both are clocked the same.

Excavator is said to be 10% better IPC than Piledriver.

Total improvement Per clock is 50% faster than Piledriver core.
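As a quick check on the compounding, here is a small Python sketch. The helper name is made up; the 10% and 40% inputs are the figures quoted above. Successive gains multiply, so the combined figure comes out slightly above the rounded 50%.

```python
def compound(*gains: float) -> float:
    """Combine successive fractional gains multiplicatively (0.10 means +10%)."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


if __name__ == "__main__":
    # Excavator over Piledriver (+10%, as stated above),
    # then Zen over Excavator (+40%, AMD's claim).
    print(f"Zen over Piledriver per clock: {compound(0.10, 0.40):.0%}")
```

That works out to 1.1 * 1.4 = 1.54, i.e. about 54% per clock over Piledriver if both claims hold.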

If we downclock 8350 to 3.5, and had a quad core zen clocked at 3.5 we will have a proper scalar (I can downclock my 8350 when I get home and run some benchmarks to give us this baseline)

amd claims 8 core 16 thread for zen.

meaning single core performance will not improve, but scaled performance goes from 1 to 8, to 1 to 16.
real world scaling performance of BD was roughly 6.6x single core perf.

If we assume Zen scales equally to intel's 16 thread cpu (as 8350 scaled better than 3770k and I assume worst case scenario) we get a 11x of the single core performance for all cores doing a render type application.

You may be able to see where im going with this...

no matter how much faster kaby lake is in single core performance, they won't be able to out-render zen when you compare an 8 thread to a 16 thread. this puts zen right in place between the top end mainstream i7 and the lga 2011 extreme cpus.

I am still worried for zen, as single core still has to be close to haswell/ivy bridge (@3.5ghz) for games to get the same frame rates.

 
my estimate before I run benchmarks tonight is zen will do 135 for single core perf and 1500 in multicore perf in cinebench R15, and 9000 in 3dmark 11 physics.
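The multi-core guess is basically the single-core guess times the assumed scaling factor. A sketch in Python, where the function names are made up and the 135 score and the 11x and 6.6x factors are the estimates from this post, not measurements:

```python
def multicore_score(single_score: float, scaling: float) -> float:
    """Multi-threaded score estimated as single-thread score times a scaling factor."""
    return single_score * scaling


def scaling_efficiency(scaling: float, threads: int) -> float:
    """Share of perfect scaling achieved across the given number of threads."""
    return scaling / threads


if __name__ == "__main__":
    print(f"Estimated multi-core score: {multicore_score(135, 11):.0f}")
    print(f"BD: 6.6x over 8 threads = {scaling_efficiency(6.6, 8):.1%} of perfect")
    print(f"Assumed Zen: 11x over 16 threads = {scaling_efficiency(11, 16):.1%} of perfect")
```

135 * 11 gives 1485, which is where the rounded 1500 comes from.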

anything else you want me to run?
 
The way I look at it, is you first figure out how much an individual core has improved in performance. Then you scale that across multiple cores to get "best case" performance improvements.

Assuming a 20% net per-core performance gain (I'm a pessimist), scaled to 8 cores (let's disregard SMT for a second), gets us a 160% (1.6x) best-case performance gain over SR. However, since we also need to consider that SR was hurt when scaling due to how CMT worked, we'll bump that to 192% (assuming a 20% hit when CMT was in effect), or just under double performance assuming full processor load, before considering SMT. But again, just 20% per core. The less scaling you get, the less improvement in performance you get. That's the downside to building wide instead of tall; Intel will still be measurably faster in certain workloads.

I'll make the same prediction that I (correctly, as I remind everyone, again) made for BD: benchmarking monster, but somewhat lacking in the real-world department. Like BD, I expect the total number-crunching ability of the CPU to actually be greater than Intel's at launch, but its design (wide rather than tall) will leave it starved most of the time.
 


you keep trying to count for cmt vs smt. amd claims 40% improvement from cmt to smt. meaning those assumed losses/gains are already accounted for.

I don't think your viewpoint of adding the same % improvement is a good point to use as it adds up way too much. I see zen getting 1.5X tops aka a 50% improvement. you show 1.6x and claim pessimist while that's more generous than my estimate.

back to my experiment for tonight to give us some baselines for PD cores running at 3.5. what benchmarks do you want to see?

EDIT: to keep RAM factored in, as zen will have ddr4, I will keep my RAM at 1866MHz, but expect zen to be a minor 0.1-0.2% faster due to ddr4 being faster RAM.
 
Uhm... Clock speed... If they move between 3.5 Ghz and 4.0 Ghz from factory, they will be fine and we will be happy campers. Their claims won't be in the mud.

In any case, any information from GF and/or Samsung in regards to the node? Have they been taping out samples at least of anything using high power 14nm? The new Exynos chips are 14nm, right?

Cheers!
 


So, when Intel announces that they have 8% gains from haswell to broadwell, and everyone says "Oh look Intel got 8% in benchmarks that stress the CPU 100% load for X time frame..." People hype it up...even though through reasonable assumptions, we can make that 8% into a number <1% before we get 5 minutes into discussion.

This, being AMD, everyone sees 40%, and instead of being reasonably impressed that they are making strides and getting into the realm of being competitive in single thread performance...people are nitpicking and trying to pick apart what AMD is doing before it even hits the ground.

They are doing this in a thread about AMD future products as well...

Hmm....maybe we should stop being overly pessimistic/optimistic until the product comes...?
 


You reminded me...

2931532.jpg


Plus, bold part: Hell no. The truth is always in the middle ground, so the more information we get, and the more we extrapolate from the few truths we have, the clearer the picture will be. We just need to keep more information coming and we will be able to speculate more accurately. Plus, most extreme points will have a counterargument (positive and negative), so it's healthy to keep discussing both.

Still, nothing speculated can be taken as truth, no matter how accurate we think that speculation is.

Oh well, from the information you guys have posted, I side with gamerk and Juan in regards to public availability going well into March 2017 for any Zen based product. In regards to IPC, we don't have much information, but my hunch goes into Integer performance or Integer heavy loads.

Cheers!
 


Difference is that Intel is normally never hyped. People have come to expect minor improvements in most areas, really good on new uArch in other areas. Their server market is different though as they tend to make good jumps there.

AMD, however, has tons of hype built up by the more diehard fanboys, which never helps anything. I remember the hype for Phenom (K10). It was insane. Everyone was acting like it was the second coming of K8, and it failed to impress. In fact, at launch it could barely outpace its predecessor. Of course people pointed to clock speed differences, but Phenom was having issues getting past 2.4GHz stable, so that was another issue.

That is why speculation is fine, but it is just speculation. AMD's last few CPUs have also failed to live up to expectations as good performers. Right now no one would really recommend an FX-series CPU for a gaming system unless the budget absolutely calls for it, while people with Sandy Bridge i5/i7s are still being told to hold off since their systems are still decent for gaming.

Either way, what will matter is the performance. For me, as long as AMD's marketing doesn't pull another BD with their performance previews (you know, where they cherry-pick the CPUs it competes against and only show the cases where it wins, that kind of stuff), I will be fine. But I have said it before, I will say it again and I will keep saying it: if Zen's 8c/16t is competitive with Intel's LGA2011 offerings, those who love AMD had better be ready to open up their wallets, dig deep and pull out every last penny, because it will not be $300. It will be priced near, at or above whatever it performs near, at or above. The proof is in their GPUs. Fury X was launched near the 980 Ti's price, and the Nano is as well, since it is the only high-end mITX GPU in its class.

AMD is a business first so don't ever be surprised if you have to shell out some hard earned cash for their good stuff.
 


No, AMD is giving the 40% figure for single-core performance, not related to core scaling. And as noted above, the scaling for PD was about 6.6x over 8 cores, which comes to just under a 20% performance loss from "perfect" scaling. You have to add that in at the end to account for that extra performance benefit. Granted, it's an overestimation (you won't ever get 100% scaling), but closer than the initial number without accounting for it.

I don't think your viewpoint of adding the same % improvement is a good point to use as it adds up way too much. I see zen getting 1.5X tops aka a 50% improvement. you show 1.6x and claim pessimist while that's more generous than my estimate.

Maximum Theoretical performance is a wonderful thing. You'll never actually see it though.

Here's the more realistic (and simple) case, which highlights why I think, realistically, 20% performance gains should be expected:

Performance = IPC * Clock * NumberOfCores

Lets assume for a minute IPC for PD for some random benchmark is a 10, for the purposes of establishing a baseline.

Performance = 10 * 4 * 8
Performance = 320

That's our baseline. Now lets add 20% to IPC, and reduce the clock to 3.5, like I suspect is likely for Zen:

Performance = 12 * 3.5 * 8
Performance = 336

Now lets add that 20% we won't lose due to how CMT works:

Performance = 336 + (336 * .2)
Performance = 336 + 67.2
Performance = 403.2

Performance Delta: 23.01% difference

Now let's take a different case, where the real IPC improvement is really 40% and clocks remain the same as PD, like some people are hoping:

Performance = 14 * 4 * 8
Performance = 448

Remove CMT penalty:

Performance = 448 + (448 * .2)
Performance = 448 + 89.6
Performance = 537.6

Performance Delta: 50.75%

This covers the maximum theoretical *best case* without factoring in SMT and assuming 100% perfect scaling at full loads.
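The model above is easy to reproduce in Python. This is only a sketch of the post's own formula; the function names are made up, and the IPC value of 10 is the arbitrary baseline chosen above. One thing it shows: the quoted deltas (23.01% and 50.75%) match the symmetric "percent difference" formula, which divides by the mean of the two values. Measured against the baseline, the same results are increases of 26% and 68%.

```python
def performance(ipc: float, clock_ghz: float, cores: int, cmt_bonus: float = 0.0) -> float:
    """Performance = IPC * Clock * NumberOfCores, plus the post's CMT adjustment."""
    base = ipc * clock_ghz * cores
    return base + base * cmt_bonus


def percent_increase(new: float, old: float) -> float:
    """Gain relative to the old (baseline) value, in percent."""
    return (new - old) / old * 100.0


def percent_difference(a: float, b: float) -> float:
    """Symmetric difference relative to the mean of the two values, in percent."""
    return abs(a - b) / ((a + b) / 2.0) * 100.0


if __name__ == "__main__":
    baseline = performance(10, 4.0, 8)                    # PD: 320
    cases = {
        "IPC +20% @ 3.5GHz": performance(12, 3.5, 8, 0.2),  # 403.2
        "IPC +40% @ 4.0GHz": performance(14, 4.0, 8, 0.2),  # 537.6
    }
    for name, value in cases.items():
        print(f"{name}: {value:.1f}, "
              f"difference {percent_difference(value, baseline):.2f}%, "
              f"increase over baseline {percent_increase(value, baseline):.2f}%")
```

Which of the two figures to quote depends on the question being asked; "how much faster than PD" is the increase over baseline.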

Once we get some actual benchmarks, I can start doing some math and look for trends, same as I did for BD. Without factoring in SMT (I can't really account for it without knowing the implementation details), 50% is about the most you could gain architecturally if typical case IPC improvements are 40%. If the typical IPC improvements are less, or the clock is reduced, the overall performance of the chip goes down with it.
 


The x86 ISA is non-scalable and close to its limit. No engineer (Intel, AMD, or VIA) knows how to design an x86 core with 50% higher IPC than Haswell while still hitting ~4GHz clocks.

Moreover that Anandtech review is about desktop chips or simple Skylake. The main muarch improvements such as new 512bit ISA extensions or new hybrid memory architecture are coming exclusively to Skylake Xeons. My only doubt is if the Skylake-E series will include those improvements as well.
 


My year-old prediction for Zen was clocks around 3.5GHz, because I was expecting a reduction in clocks from AMD using a mobile-oriented node this time.

However, in another forum a well-known user with inside info about foundries claims that the Samsung/Glofo 14LPP node provides about 70% of 32PDSOI clocks. He explicitly said he expects 70% of 4GHz for Zen. I did the computation and I got 2.8GHz, which sounds too low to me, but well he has inside foundry info I don't.
 


See the bold. I gave it the 50% ipc improvement from PD. Remember they claim 40% over excavator!

Im horrible with BB code sorry. I account for a 3.5 ghz clock and 50% improvements

 


I don't think AMD could justify releasing Zen at sub 3ghz clocks on the desktop (it wouldn't make sense as it would probably be a performance regression from top end PD based parts).

The thing with clock speeds and processes, though, is that it's all related to power consumption. I don't believe it will *max out* at 2.8GHz, although that is probably the 'sweet spot' beyond which power goes up disproportionately compared to clock speed. That would suggest that a desktop Zen design (which needs to push clocks for single-core performance) will probably not be all that power efficient, a bit like Kaveri. However, I expect they will release a lower-clocked version that is more efficient, like they did with Kaveri (the A8 giving 90% of the performance of the top A10 at only half the power), and server parts will probably be clocked lower as well.

Either that or they have a low base clock but a super aggressive turbo instead (so 2.8GHz base with 3.5GHz+ turbo on up to 4 cores)...

I mean a low clock with high core count would make sense in servers but I can't see a part like that appealing to desktop / workstation users much at all.
 
Please remember that BD's clock speed was limited by power consumption in the first place, not by the clock speed wall of the process (see FX 9590).

The sweet spot of 14LPP might be even lower than that of 32nm PD-SOI, but we're not talking about just a much faster core, but a much more efficient core as well... so clock speed remains to be seen.
I still see 4GHz as possible if 14LPP scales well, and the claimed 3.5 even more so.
 
BD was a speed-demon microarchitecture. The goal was to hit higher frequencies at the cost of IPC. BD initially targeted frequencies that were too high, and the Glofo process node couldn't support them. Only recently have a mature 32PDSOI and certain microarchitecture tweaks allowed Piledriver to hit 5GHz turbo. That is still far from the theoretical maximum allowed by the microarchitecture: ~10GHz.

The 28SHP process used in Kaveri cannot hit frequencies as high as 32PDSOI because it is a process node optimized for density, not performance.

The 14LPP process node cannot hit frequencies as high as 32PDSOI because it is a process node optimized for mobile applications. Phones run at sub-3GHz clocks; therefore, it is reasonable to think that 14LPP is not optimized for higher frequencies.

We could maybe expect ~2.8GHz base and ~3.5GHz turbo. What is ruled out beyond any reasonable doubt is the possibility that Zen hits a 4GHz base.
 