Discussion: AMD Ryzen



Can they double performance with compiler settings?
 
So Naples is a socketed SoC. They don't need a chipset anymore for glueless 2P servers. Nice!


[Image: 2P server board]


Edit: The AST2500 on there is a remote-management processor (BMC) common on server platforms. The Xilinx FPGA (Spartan-3, XC3S200AN) is likely just for debug purposes; it is much too low-end to do anything really useful. A Spartan-6 could support PCIe or SATA.
 


No, but they can reduce Broadwell's performance by nearly half, if you select 128-bit (SSE/AVX-128) flags and force Broadwell to use only half the width of its 256-bit FP units.
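
To illustrate (a minimal sketch of my own, not anything AMD published; the function and exact flags are just an example): the same loop compiled with SSE-only flags is limited to 128-bit registers, half the width AVX allows on the very same chip.

    /* saxpy.c -- how compiler flags control FP vector width (sketch).
     * gcc -O3 -mavx  saxpy.c  -> may vectorize with 256-bit ymm registers
     * gcc -O3 -msse2 saxpy.c  -> limited to 128-bit xmm registers,
     *                            i.e. half the FP width per cycle */
    void saxpy(int n, float a, const float *x, float *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];   /* simple auto-vectorizable loop */
    }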
 
Juan, the same can be said of Bulldozer/PD. Like I've been saying, in Linux, when compiled correctly, it's closer to Intel CPUs, or at least doesn't show the abysmal disparity it shows in Windows. I do remember linking some benchies from Phoronix to prove the point as well.

Also, about the FLOPS you quote using AVX2: can you give me a date when Windows software will actually use it, so I can give a damn? Thanks!

Cheers!
 


Juan, I think the case is that Blender doesn't support AVX2, in which case the theoretical throughput of both chips is the same (and, interestingly, they finish the run at about the same speed). So it's possibly less a case of special compiling/settings and more a case of a well-picked benchmark. To put it another way, you'd have to dig around to find anything that highlights that theoretical advantage in Broadwell, as not much software uses the new extension (yet)...
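
A back-of-the-envelope sketch of why (the pipe configurations here are my assumptions from what has been published: two 256-bit FP pipes on Broadwell, four 128-bit FP pipes on Zen):

    /* peak_flops.c -- rough per-core peak SP FLOPs/cycle (assumed configs).
     * Assumed, not from this thread: Broadwell = two 256-bit FMA-capable
     * pipes; Zen = two 128-bit mul/FMA pipes plus two 128-bit add pipes. */
    #include <stdio.h>

    int main(void)
    {
        const int sp128 = 4, sp256 = 8;     /* floats per vector register */

        /* AVX1 path (no FMA): one mul + one add issued per cycle */
        int bdw = sp256 + sp256;            /* 16 FLOPs/cycle */
        int zen = 2 * sp128 + 2 * sp128;    /* 16 FLOPs/cycle */

        /* AVX2+FMA path: Broadwell fuses mul+add on both 256-bit pipes */
        int bdw_fma = 2 * sp256 * 2;        /* 32 FLOPs/cycle */

        printf("AVX1:     BDW %d vs Zen %d FLOPs/cycle/core\n", bdw, zen);
        printf("AVX2+FMA: BDW %d FLOPs/cycle/core\n", bdw_fma);
        return 0;
    }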

Also, with respect to it being 6-issue vs 8, I think you need to keep in mind the balance of ALUs and AGUs (4 and 2 for AMD vs 4 and 4 for Intel). If I'm understanding this correctly, it's often the case that AGUs are used less than ALUs, so depending on the specific situation AMD is either level or slightly behind, depending on what resources are needed (but they are unlikely to be a full 33% slower in integer). AMD cites this as the optimum choice from a power/performance viewpoint (i.e. they expect the two additional AGUs would likely sit idle)...
 


The OS of choice has little bearing beyond the scheduler doing a decent job managing the threads and other loads facing the CPU. What really differentiates one software *build* from another is code optimization first and instruction set used second. This is where Juan is implying AMD "cheated" by capping Blender to AVX1 (or so I understood) instead of enabling AVX2 or better.

I don't think AMD would go through the trouble of compiling Blender (which is a PITA to do) and capping what it can do just to show Zen in a good light. I believe they just grabbed the current stable version (linked here) and ran the tests.

Cheers!
 




They wouldn't compile Blender to cripple the competition; that would be a huge bomb waiting to blow up after launch. I also don't think they would optimize Blender's code for Zen without contributing it back upstream (also too many resources for just a bit of hype).

However, I do think they not only cherry-picked the tool; they probably also cherry-picked the version. It may not be the latest, but instead the one that shaves even an extra second off the benchmark run.

Even then, it is impressive that IPC seems close to Broadwell-E (even though Blender was their best benchmark for Bulldozer too).
 


Auto-vectorization for AVX2 is likely junk unless you fit Intel's Parallel Studio XE suite into your toolchain.

Open-source software like Blender is going to use an open-source compiler. In some sense you could say the benchmark was cherry-picked, but it is also representative of what the vast majority of developers are going to see.

It is up to Intel how much effort they want to put into GCC. If they put all their resources into Parallel Studio, then 95% of developers will probably never see it. People in the HPC world will likely have good AVX2 support, but that is only a subset of the server market.
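
For what it's worth, you don't need Parallel Studio just to see what GCC manages on its own; a quick sketch using standard GCC options:

    /* vec_report.c -- asking GCC what it auto-vectorized (sketch).
     * gcc -O3 -march=haswell -fopt-info-vec vec_report.c
     *   prints a note for each loop GCC vectorized (AVX2 is available
     *   with -march=haswell); -fopt-info-vec-missed lists the loops
     *   it gave up on, and why. */
    void scale(int n, float a, float * restrict y, const float * restrict x)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i];    /* simple enough for GCC to vectorize */
    }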

So in this case AMD is properly designing to the lowest common denominator and should reap those benefits. This is in essence what all of the up-and-coming ARM server platforms are trying to do as well: provide a bunch of cores with gobs of memory but few bells and whistles. Anyone who thought this would work for ARM vendors is basically agreeing with AMD's strategy here.
 


Even better, then. I stand corrected (taking your previous post into account) on Blender: it actually uses all the bells and whistles from Intel, but doesn't gain much from them.

Interesting piece of information about PS:XE; I didn't know that. Does AVX2 get special treatment in compilers to maximize performance throughput on Intel CPUs?

Cheers!
 


Or a case of practicing what they preach. AMD has taken a much stronger open-source stance in recent years, so it makes perfect sense to use open-source programs for benchmarks.

http://developer.amd.com/tools-and-sdks/open-source/
 


Like when they promised a 1050 GFLOPS Kaveri but shipped a sub-900 GFLOPS Kaveri? Like when they promised the moon for Carrizo and showed several Carrizo design wins that later didn't appear in any store? Like when they said a certain card was an "overclockers' dream," just before reviews showed how bad the overclocking was? Like when they promised a 16-core Seattle but gave us an 8-core Seattle after a two-year delay? Like when they used benchmarks with odd settings to hype the 300 series? Like when they promised us again and again that Zen was a 2016 product (funnily enough, I have known for a while that it was 2017, and I have been saying so in forums), then DigiTimes posted a rumor about AMD delaying Zen to 2017, AMD reacted by attacking DigiTimes and saying the rumor was false, and now AMD confirms a delay of Zen to 2017? And that is the short list.

It is evident to me that they cherry-picked the benchmark for Zen vs Broadwell and that final reviews will show lots of benchmarks where Zen is outperformed. I am 100% certain of this. Mark my words.
 


I think I read somewhere that recent versions of Blender support both AVX and AVX2 and fall back to SSE when the chip doesn't support them.
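
That kind of fallback is typically done with a runtime CPU-feature check; a minimal sketch of the idea using GCC's builtins (not Blender's actual code):

    /* dispatch.c -- AVX2 -> AVX -> SSE fallback sketch (not Blender's code). */
    #include <stdio.h>

    int main(void)
    {
        __builtin_cpu_init();                  /* GCC runtime CPU detection */
        if (__builtin_cpu_supports("avx2"))
            puts("using the AVX2 kernel");
        else if (__builtin_cpu_supports("avx"))
            puts("using the AVX kernel");
        else
            puts("falling back to the SSE kernel");
        return 0;
    }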



Issue width is unrelated to the number of ALUs and AGUs:

Sandy/Ivy: 3 ALU + 2 AGU; 6-issue
Zen: 4 ALU + 2 AGU; 6-issue
Haswell/Broadwell: 4 ALU + 3 AGU; 8-issue
 


And if we continue reading, it says (bold is mine):

Makes rendering on Haswell CPUs a few percent faster, only benchmarked with clang on OS X though.

Add that version 2.76 added another small optimization for Haswell CPUs, and consider that the Broadwell microarchitecture introduced some optimizations to the AVX2 set (I don't know whether this speeds up Broadwell in Blender).
 
About power, I will give a relevant quote from https://www.pcper.com/reviews/Processors/AMD-Zen-Architecture-and-Performance-Preview:

comparable TDPs to Broadwell-E.

If they overclocked the engineering sample from 2.8GHz to 3GHz and underclocked the Broadwell chip from 3.2GHz to 3GHz, then TDPs would be similar. A rough computation gives:

8C Zen @3GHz ~117W
8C BDW @3GHz ~123W

With a more mature process, Zen could hit the target 95W for 3GHz.
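
For reference, that rough computation is just frequency scaling of the TDP; a sketch (the exponent is my assumption: dynamic power goes roughly with f*V^2, and since voltage tends to rise with frequency, something between f^2 and f^3 is plausible):

    /* tdp_scale.c -- rough TDP-vs-frequency scaling (sketch; gcc ... -lm).
     * Assumption: power scales with f^k, k between 2 (voltage held flat)
     * and 3 (voltage rising with frequency). */
    #include <math.h>
    #include <stdio.h>

    static double scale_tdp(double tdp, double f_old, double f_new, double k)
    {
        return tdp * pow(f_new / f_old, k);
    }

    int main(void)
    {
        /* Zen ES assumed at its 95W target at 2.8GHz; cube law: ~117W */
        printf("Zen @3GHz: ~%.0fW\n", scale_tdp(95.0, 2.8, 3.0, 3.0));
        /* 140W Broadwell-E at 3.2GHz pulled back to 3GHz; square law: ~123W */
        printf("BDW @3GHz: ~%.0fW\n", scale_tdp(140.0, 3.2, 3.0, 2.0));
        return 0;
    }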

That said, I also think that 95W for AMD doesn't mean the same as 95W for Intel, due to different ways of defining/measuring TDP. I think a 95W AMD chip will dissipate about 85% of what a 140W Intel chip does.
 


See, if you're so anti-AMD then maybe you should post elsewhere, Juan?

The last thing we need is another AMD vs Intel war.

Mark my words, you sir will be the first casualty ... along with any of the others who choose to start and engage in any slinging matches.

Since you don't actually have any facts beyond a video AMD released and a couple of slides, it seems highly unlikely that you can accurately extrapolate, in any way, shape or form ... the real performance of these new CPUs.

Whilst you are free to make predictions here, I feel it is important to point out that, going on your past history, when you were a committed AMD fanboi you didn't exactly prove to be an accurate predictor of the performance results in the end.

If you're still angry about being jilted in some way then you need to move on ... accept that all manufacturers spin things up a bit, and spend some time getting lost in a good game.

I recommend Freelancer ... because I am an oldskool geek.

I find the music soothing ...

:)
 


I agree with Juan, actually. AMD has a history of generating hype over future tech, especially when they don't have something exciting or new (Bulldozer and the Radeon 300 series, for example). Those are just facts; you can't deny them.

I also agree that, when external benchmarks appear, some will show Zen being outperformed by Broadwell-E, at least. Remember that Intel has huge resources and excellent chips, and is at least one generation ahead. The scale will tip in Intel's favor; that's also a fact.

However, Zen is shaping up to be a very competitive chip, and if it is slower, it won't be by much; probably within 5%, or 10% tops, of Intel's. Then AMD just needs to play a careful value game to be in good business.

People keep attacking Juan (even mods?), but all he does is 1) bring facts, many with excellent analysis (which is always good), and 2) talk about himself (which pisses people off). But we all have some ego, right? So we have to excuse him for talking about his stuff, because we do the same on different levels. Besides, if we read more into it when he talks about himself than when he talks about chips, that's our personal problem with him, and we should keep it to ourselves.
 


All companies generate hype, not just AMD. Falling short of expectations is something entirely personal and cannot be a "fact". Since manipulating data can be false advertising, no company does it; they do pick stuff that shows them in a positive light, because which company wouldn't?

To your two particular examples: Bulldozer was a new uArch and AMD fell short on several fronts, but that was in light of the competition of the time, and I don't remember them showing fake slides to prove their point. I do remember them benchmarking against the i7 980X (or 990X?) and the numbers were true for the benchmarks provided. As for the 300 series, they are faster than the 200 series, and AMD never said anything about improving the Hawaii, Tonga or Pitcairn core design with the 300 series; just faster clocks and call it a day. nVidia has done the same kind of refresh multiple times as well (GTX 680 -> GTX 770; GTX 580M -> GTX 675M; the OEM-only 800 series).

All the things Juan mentions are his own subjective interpretation of the data provided, just like I have my own and you have yours.

In particular, when you're oversensitive to a topic you lose objectivity and your analysis loses focus (sometimes it becomes more of a rant than a piece of analysis). There is not a single human being out there who is exempt from this, but some of us are at least aware of it.

Cheers!
 


By the same token, the heat is dissipated into the other parts of the die more effectively, because you have less overall heat generation.

Now, saying that, you will not get 6GHz on air... but instead of 4.0, you might get 4.5 by cutting down cores.
 


Blender does not utilize AVX2.

Just because instruction sets are out there does not mean every program adopts them. This, of course, is what I am referring to when I say "real-world performance" in regard to actual performance differences.

@gamerk: The details about the cache are quite interesting; I knew about the higher bandwidth, but I did not realize they grew the L2 so much. This actually looks really good.
 