http://wccftech.com/amd-zen-architecture-hot-chips/
Not sure how much is new, but here's some Zen info from Hot Chips. I await your analysis, guys.
There's a lot of interesting information in there, even keeping things in perspective since it's WTFBBQTech.
Unfortunately, I don't have any smart remarks about the high-level overview of Zen. The only thing that caught my eye was the logic around the 2 AGUs. I think the uop cache is paying off in the design, as is the reorganization of the L2 and L3 caches. Other than that, nothing stood out.
The micro-op cache is an interesting addition. Intel introduced a 1.5K uOp cache to Sandy Bridge, though it's hard to say how much of Sandy's IPC improvement directly resulted from it. What's ironic is that the AMD-K6 was the first x86 to have a micro-op cache, and it was a whopping 20K at that.
From what I remember from my university course on CPU design, when the ALUs are 100% symmetrical you need to match them to the number of AGUs, so that dispatched operations don't sit in a queue for long waiting for the AGUs to do the address calculation for all INT and memory ops. Since the ALUs are no longer symmetrical in current designs, the differences are usually balanced out with bigger op queues or a lower number of them; at least, that is what I can think of.
It's been 10 years since I took that course, though, so I might be outdated on this.
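The queueing effect described above can be illustrated with a toy model. This is a deliberate simplification, assuming each AGU computes one address per cycle and that address-generating ops arrive at a fixed rate; real schedulers, queues and port bindings are far more complex:

```python
def agu_backlog(addr_ops_per_cycle: int, agus: int, cycles: int) -> int:
    """Address ops still waiting after `cycles` cycles (hypothetical toy model).

    Each cycle the scheduler dispatches `addr_ops_per_cycle` ops that need an
    address, and each AGU can compute one address per cycle.
    """
    backlog = 0
    for _ in range(cycles):
        backlog += addr_ops_per_cycle      # new address work arrives
        backlog -= min(backlog, agus)      # AGUs drain what they can
    return backlog

# Balanced: 2 address ops per cycle, 2 AGUs -> the queue never grows.
print(agu_backlog(2, 2, 10))   # 0
# Unbalanced: 3 address ops per cycle, 2 AGUs -> one op piles up every cycle.
print(agu_backlog(3, 2, 10))   # 10
```

A bigger queue only delays the point at which the unbalanced case stalls; it does not change the growth rate.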
The argument made by David Kanter and me already considers asymmetric ALUs (that is what he means when he mentions branches).
Cazalan :
Melonious :
Utilizing SIMD is not magic, and compiler settings don't do a thing. You have to manually use assembly or intrinsics to make it happen, and most programs don't. So it is not cherry-picked, even if it is just a 'real world' result.
Artificial benchmarks don't mean a damned thing when real software doesn't take advantage of all the features. That is why they support the older extensions more, and so does ARM.
That has been the general complaint of x86 CPUs for the last several years. Minor IPC gains unless you use the special new instructions. AMD has compromised by sticking with 128 bit FP units. If the application doesn't take significant advantage of AVX2 then AMD is looking good.
The problem is not using 128bit units. AMD could have used 4x128bit units to get twice the peak throughput, but then they would have had to improve everything from the front end to the commit stage, including doubling the cache bandwidth, which would create very complex engineering problems.
I see the same mistake when people claim that Bulldozer performed badly because it has an FPU shared between the two cores in a module. This is not right. Bulldozer performed badly because the FPU was only 256bit wide. If the Bulldozer module had incorporated a 1024bit-wide unit, it would beat the best designs Intel has. But of course, a 4x bigger FPU would require lots of extra transistors and logic and power and...
And the excuse that Zen uses 128bit units because 256bit is not very popular doesn't hold up on close inspection, because (i) AMD is supporting 256bit on Zen by fusing the pair of 128bit units, and (ii) AMD has supported ISAs with much less popularity, including HSA.
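The throughput argument above is simple arithmetic. A sketch, assuming every FP unit is a fully pipelined FMA unit (2 FLOPs per 64-bit lane per cycle) and one memory operand per op; the unit counts below are illustrative, not a claim about Zen's or Intel's actual FPU layouts:

```python
def peak_dp_flops_per_cycle(fma_units: int, width_bits: int) -> int:
    """Peak double-precision FLOPs/cycle: each 64-bit lane does one FMA
    (a multiply plus an add, i.e. 2 FLOPs) per unit per cycle."""
    return fma_units * (width_bits // 64) * 2

def load_bytes_per_cycle(fma_units: int, width_bits: int,
                         mem_operands_per_op: int = 1) -> int:
    """Bytes the caches must deliver per cycle to keep every unit fed."""
    return fma_units * mem_operands_per_op * width_bits // 8

# Illustrative configurations only.
for units, width in ((2, 128), (4, 128), (2, 256)):
    print(f"{units}x{width}-bit: "
          f"{peak_dp_flops_per_cycle(units, width)} DP FLOPs/cycle, "
          f"{load_bytes_per_cycle(units, width)} B/cycle")
```

Doubling either the unit count or the width doubles peak FLOPs, and the operand traffic from the caches doubles with it, which is the bandwidth problem described above.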
The micro-op cache is an interesting addition. Intel introduced a 1.5K uOp cache to Sandy Bridge, though it's hard to say how much of Sandy's IPC improvement directly resulted from it.
Minimal. Instructions already stream from the instruction cache at the pipeline's full rate, so the new design saves cycles only when execution can be restarted from the L0 cache after a mispredicted branch. The major improvement introduced by the uop cache is in efficiency: the x86 decoders are power hungry and can be shut down when fetching from the uop cache. In Haswell the total power is reduced by about 12%.
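The power argument rests on the uop cache sitting after the decoders: on a hit, already-decoded uops are replayed and the legacy decoders can idle. A toy sketch of that placement; the class, the LRU policy and the capacities are all hypothetical, and real uop caches are organized very differently:

```python
from collections import OrderedDict

class UopCache:
    """Toy decoded-uop cache: fetch address -> decoded uops, LRU eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)      # mark as most recently used
            return self.lines[addr]
        return None

    def fill(self, addr, uops):
        self.lines[addr] = uops
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # evict least recently used

def run(fetch_trace, decode, cache):
    """Count how often the (power-hungry) legacy decoders must run."""
    decoder_activations = 0
    for addr in fetch_trace:
        if cache.lookup(addr) is None:
            decoder_activations += 1          # miss: wake the decoders
            cache.fill(addr, decode(addr))    # cache is filled *after* decode
    return decoder_activations

loop = [0x00, 0x10, 0x20] * 100               # a 3-block loop run 100 times
print(run(loop, lambda a: [f"uop@{a:#x}"], UopCache(4)))   # 3
print(run(loop, lambda a: [f"uop@{a:#x}"], UopCache(2)))   # 300
```

A loop that fits wakes the decoders only on its first pass; one that doesn't fit thrashes an LRU cache and is decoded every time.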
scuzzycard :
What's ironic is that the AMD-K6 was the first x86 to have a micro-op cache, and it was a whopping 20K at that.
Link? I just looked at the K6 diagram and there is no mention of that.
I just looked it up and this site calls it a pre-decode cache. Now maybe I'm getting my terminology mixed up, because that sounds more like the trace cache the P4 had.
But there isn't much for them to benefit by doing that level of mucking about. It'll just cause bad press and they would still have a lousy chip.
Like when they promised a 1050 GFLOPS Kaveri but shipped a sub-900 GFLOPS Kaveri? Like when they promised the moon for Carrizo and showed several Carrizo design wins that later didn't appear in any store? Like when they said that a certain card was an "overclockers' dream", just before reviews showed how bad the overclocking was? Like when they promised a 16-core Seattle but gave us an 8-core Seattle after a two-year delay? Like when they used benchmarks with odd settings to hype the 300 series? Like when they promised us again and again that Zen was a 2016 product (funnily enough, I have known for a while it was 2017 and I have been saying so in forums), then DigiTimes posted a rumor about AMD delaying Zen to 2017, AMD reacted by attacking DigiTimes and saying that the rumor was false, and now AMD confirms a delay of Zen to 2017? And that is the short list.
It is evident to me that they have cherry-picked the benchmark for Zen vs Broadwell, and that final reviews will show lots of benchmarks where Zen is outperformed. I am 100% certain of this. Mark my words.
See, if you're so anti-AMD, then maybe you should post elsewhere, Juan?
The last thing we need is another AMD vs Intel war.
Mark my words, you, sir, will be the first casualty ... along with any others who choose to start and engage in slinging matches.
Since you don't actually have any facts beyond a video AMD released and a couple of slides, it seems highly unlikely that you can accurately extrapolate, in any way, shape or form ... the real performance of these new CPUs.
Whilst you are free here to make predictions, I feel it is important to point out that, going on your past history when you were a committed AMD fanboi, you didn't exactly prove to be an accurate predictor of the final performance results.
If you're still angry about being jilted in some way, then you need to move on ... accept that all manufacturers spin things up a bit, and spend some time getting lost in a good game.
I recommend Freelancer ... because I am an oldskool geek.
I find the music soothing ...
I am not anti-AMD, see my signature, but I am anti-hype and anti-misinformation. My comments aren't meant to be interpreted as part of some AMD vs Intel war. I am replying to the Zen vs Broadwell demo made by AMD. If they had made a Zen vs Piledriver demo and claimed that Zen is ~3x faster clock-for-clock, I would be reacting in exactly the same way. To put things in perspective:
According to AMD, Zen @3GHz would be somewhere around the 100-second mark, whereas the FX-8350 takes 320 seconds. This implies that Zen would have ~4.26x higher IPC than Piledriver (recall AMD's own slides stating that Zen is ~2x faster than the FX-8350 clock-for-clock).
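The ~4.26x figure comes from normalizing both run times by clock speed. A minimal sketch, assuming the FX-8350 ran at its 4.0 GHz base clock (which clock was actually used isn't stated; at the 4.2 GHz turbo clock the ratio rises) and that both runs did identical work. Note this is whole-chip work per clock, so it folds in core and thread count differences, not just per-core IPC:

```python
def per_clock_ratio(time_a_s: float, clock_a_ghz: float,
                    time_b_s: float, clock_b_ghz: float) -> float:
    """Work-per-clock of chip B relative to chip A on the same fixed job.

    Speed is 1/time, so work per clock is 1/(time * clock); the ratio B/A
    is therefore (time_a * clock_a) / (time_b * clock_b).
    """
    return (time_a_s * clock_a_ghz) / (time_b_s * clock_b_ghz)

# FX-8350 (chip A): 320 s; Zen (chip B): ~100 s at 3.0 GHz.
at_base = per_clock_ratio(320, 4.0, 100, 3.0)    # assumes 4.0 GHz base clock
at_turbo = per_clock_ratio(320, 4.2, 100, 3.0)   # assumes 4.2 GHz turbo clock
print(f"{at_base:.2f}x, {at_turbo:.2f}x")        # 4.27x, 4.48x
```

The post's ~4.26x is the same base-clock figure, truncated rather than rounded.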
The Stilt has made a similar analysis. He downloaded the latest version of Blender and benchmarked his Piledriver and Haswell chips. Testing single-threaded, he found that a Zen core would have to be 140% faster than a PD core to match his Haswell. According to AMD, Zen is ~40% faster than XV; therefore XV would have to be about 70% faster than PD (2.40/1.40 ≈ 1.71). The numbers are off by a huge amount. He says he is puzzled by the result, and he reached conclusions similar to mine.
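Chaining The Stilt's figure with AMD's +40% claim is a division of ratios, not a subtraction of percentages. A minimal sketch using only the two figures in the post (2.40x for "140% faster", 1.40x for "~40% faster"):

```python
def implied_middle_ratio(zen_over_pd: float, zen_over_xv: float) -> float:
    """Speedups chain multiplicatively: Zen/PD = (Zen/XV) * (XV/PD),
    so XV/PD = (Zen/PD) / (Zen/XV)."""
    return zen_over_pd / zen_over_xv

# "140% faster" means 2.40x; "~40% faster" means 1.40x.
xv_over_pd = implied_middle_ratio(2.40, 1.40)
print(f"XV would need to be {100 * (xv_over_pd - 1):.0f}% faster than PD")
```

Subtracting the percentages would give 100%; dividing the ratios gives about 71%.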
We are not talking about a small "spin", a 10% here and a 5% there to put products in a better light. We are talking about huge gaps. What are we supposed to do? Say "yes" to anything published or advertised by companies and not use our brains? Then why have forums? The news section would be enough.
And my facts aren't a video demo, but the details of the microarchitectures and the good record of predictions that I have made about Zen. A good amount of the information in those slides you mention was posted by me here before marketing even made those slides.
That is actually not so hard to believe...
Consider this...
An 8C Zen has twice as many FPUs. So, assuming they are equal, we are already at +100% performance... now... we can assume +40% per AMD's estimates over Piledriver, and finally, we can also include the fact that Zen can run AVX2, but Piledriver cannot. That would put another theoretical 60% of blue-sky benchmark gains in there.
So, we end up at +200% total, which is not far off your calculations (20% difference...).
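The post above sums the three gains (+100% + 40% + 60% = +200%, i.e. 3x). If the factors are independent, they compound instead of adding. A quick check of both, using only the post's own assumed factors (not measured values):

```python
from math import prod

# The three factors assumed in the post above.
factors = [2.0,   # twice as many FPUs in an 8C Zen
           1.4,   # +40% per AMD's estimate
           1.6]   # +60% "blue sky" from the newer vector extensions

additive = 1 + sum(f - 1 for f in factors)   # +100% + 40% + 60% -> 3.0x
compounded = prod(factors)                   # 2.0 * 1.4 * 1.6   -> 4.48x
print(f"additive: {additive:.2f}x, compounded: {compounded:.2f}x")
```

The compounded 4.48x sits closer to the ~4.26x figure derived from AMD's Blender times than the additive 3x does; the arithmetic says nothing about whether the assumed factors themselves are right.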
Let me ask you two things, Juan, that bother me about those claims:
1. Are you considering that Zen is an 8-core, 16-thread CPU when comparing it against the 8-thread Piledriver? All numbers on Zen should be halved (or PD's doubled) for a fair comparison.
2. Did The Stilt match clocks between the CPUs? If not, just using an FX-8350 versus an FX-9590 would make the results a lot different. Also, against which Haswell? i5 or i7? We are talking IPC here, so obviously everything should be directly comparable.
1. Yes, I am considering SMT. SMT usually brings gains between 0% and 40%, depending on the code. I am using 20% as the average in my posts. I did it again just yesterday, when I explained why Zen would be ~2x faster (on average) than PD clock-for-clock on multithreaded applications.
SMT doesn't double performance (that is impossible because the execution units are shared). In any case, check the i5 and i7 in the Blender benchmark given above.
2. Yes, he tested at the same clocks and made other changes, such as disabling two memory channels on the Haswell side... He used a Xeon model.
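The first question above is about removing the SMT contribution from a 16-thread vs 8-thread comparison. A sketch of that adjustment, treating both chips as 8-core parts and assuming only Zen gets an SMT boost; `smt_yield` is an assumed parameter, where 0.2 is Juan's average and 1.0 corresponds to "halve the Zen numbers":

```python
def per_core_ratio(chip_ratio: float, smt_yield: float) -> float:
    """Remove an assumed SMT gain from a whole-chip ratio.

    Assumes both chips have the same core count and that only the newer
    chip runs two threads per core, gaining `smt_yield` (0.2 = +20%).
    """
    return chip_ratio / (1 + smt_yield)

for y in (0.0, 0.2, 0.4, 1.0):   # 0-40% is the range quoted; 1.0 = "halve it"
    print(f"SMT yield {y:.0%}: {per_core_ratio(4.26, y):.2f}x per core")
```

At the assumed 20% yield the ~4.26x chip-level figure becomes about 3.55x per core; halving it gives about 2.13x.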
SMT changes very little, basically nothing in FPU ops.
However, having double the physical number of FPUs on one chip is a huge difference.
That has been the general complaint of x86 CPUs for the last several years. Minor IPC gains unless you use the special new instructions. AMD has compromised by sticking with 128 bit FP units. If the application doesn't take significant advantage of AVX2 then AMD is looking good.
Pretty much... new x86 extensions make for some nice blue-sky numbers.
However, unless you hand-build your own Linux distro specifically for the maximum capability of the hardware you personally use, default compiler settings will show that the gap is not nearly as large as many people want you to believe.
Hi, does anyone know if the AMD Zen CPUs will come in versions at different prices, or are they all going to be €150 and up?
The initial launch of Zen will be aimed strictly at the enthusiast market.
There will be Zen offerings later with lower core counts and lower costs than the 8-core flagship, though.
The most mainstream part looks to be a 4 core CPU or APU.
It sure would be something if we had a repeat of the summer of '99. When the Athlon came out, I bought it on the very first day. Out went my Pentium III-500 and in went an Athlon-600, and some things ran almost TWICE as fast. Not only that, but that piece of launch-day silicon happily overclocked to 750 with the "gold fingers" device.
Maybe I'm being cynical, but I don't think that's going to happen this time around. The fact that they are only showing a 3GHz chip means they're either bluffing or actually struggling to get it to go faster. Time will tell.
That has been the general complaint of x86 CPUs for the last several years. Minor IPC gains unless you use the special new instructions. AMD has compromised by sticking with 128 bit FP units. If the application doesn't take significant advantage of AVX2 then AMD is looking good.
The problem is not on using 128bit units. AMD could have used 4x128bit units to get twice the max throughput, but then they would have to improve everything from the front end to the commit, including doubling the caches' BW, which would generate very complex engineering problems.
I expected readers to understand that I meant the same number of 128bit FP units. No need for this erroneous tangent.
lol
AMD is faster on blender, so how is it a failure?
The whole point is that artificial benchmarks showing Intel is 50x better are largely crap. It doesn't matter what extensions you support if no one uses them. This is ultimately why RISC is way better than CISC, and why real cores are better than hyperthreads.
Though I blame Microsoft's crappy compiler as well. In many cases there is no reason not to use them automatically, but it never does, and GCC is only a little better.
I just looked it up and this site calls it a pre-decode cache. Now maybe I'm getting my terminology mixed up, because that sounds more like the trace cache the P4 had.
The pre-decode cache on the K6 is not a uop cache. One of the requirements of a uop cache is that it has to be placed after the decode stage of the pipeline, not before, because the uop cache stores the RISC-like uops obtained from decoding the x86 (CISC) instructions.
lol
AMD is faster on blender, so how is it a failure?
Where do you read the word "failure"? It is certainly not found in my posts, which you are quoting.
Also, it is not proven that "AMD is faster on Blender". What has been demonstrated is much less: an overclocked ES of Zen was ~2% faster than an underclocked Broadwell chip under unknown settings (compiler? flags? platform?), using a custom image on an unknown version of Blender. Moreover, that "2% faster" is statistically insignificant because it is smaller than the margin of error, which means the measured "faster" could be just a random effect.
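Whether a ~2% gap is inside the noise depends on run-to-run spread, which a one-off demo doesn't reveal. A sketch of a rough check given repeated timings of each system; the function and the 2-standard-error threshold are assumptions, a rule of thumb rather than a formal hypothesis test:

```python
from math import sqrt
from statistics import mean, stdev

def gap_exceeds_noise(times_a, times_b, k: float = 2.0) -> bool:
    """Rough check: is the difference in mean run time larger than k
    combined standard errors? Each list needs at least two timings."""
    se = sqrt(stdev(times_a) ** 2 / len(times_a) +
              stdev(times_b) ** 2 / len(times_b))
    gap = abs(mean(times_a) - mean(times_b))
    return gap > k * se
```

Fed with, say, five timed runs per system, it reports whether the gap between the means stands out from the spread; with a single run per system there is no spread to estimate, and no conclusion either way.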
There is zero evidence an "overclocked ES" was used. Less misinformation please!
Nope. As everyone now knows, current Zen silicon runs at a 2.8GHz base clock. They had to overclock it to 3GHz. AMD also refused to answer questions from the audience about the TDP of that Zen sample.
SMT changes very little, basically nothing in FPU ops.
Just above, and quoted in your post, you can see a Blender benchmark showing the large performance gap between the i5 and i7. In his internal testing, The Stilt found that Blender has abnormally large gains from SMT. He tested the same Haswell chip with SMT enabled and disabled:
One more observation regarding Blender. The SMT yield in Blender appears to be unusually high. In similar applications, such as Cinebench the yield is around 27% on Haswell-E. In Blender the yield is > 59%. Blender BMW benchmark (at default resolution, 20x20 tiles) was completed in 127.98 seconds with 18C/18T while with SMT enabled the time was reduced to 90.07 seconds.
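Working the yield out from the two BMW times in the quote gives a useful cross-check. The quote does not say how its "> 59%" was derived; from these two times alone the speedup is about 1.42x, i.e. a ~42% yield, which is still well above the ~27% quoted for Cinebench:

```python
def smt_yield(time_smt_off_s: float, time_smt_on_s: float) -> float:
    """Throughput gain from SMT on a fixed job: speedup minus one."""
    return time_smt_off_s / time_smt_on_s - 1

# Blender BMW times quoted above: 18C/18T vs 18C/36T (SMT enabled).
print(f"SMT yield: {smt_yield(127.98, 90.07):.1%}")   # SMT yield: 42.1%
```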
There is zero evidence an "overclocked ES" was used. Less misinformation please!
Nope. As everyone now knows current Zen silicon runs at 2.8GHz base. They had to overclock to 3GHz. AMD also refused to answer questions from audience about which was the TDP of that Zen sample.
Why can't AMD have a 3GHz ES in their hands? Why couldn't it be final silicon they had to underclock to match Intel?
Also, I think they said disclosing the TDP was a strategic decision for them, so they wouldn't reveal it until final silicon is ready. I only have a vague memory of this last part, so I could be horribly wrong.
But there isn't much for them to benefit by doing that level of mucking about. It'll just cause bad press and they would still have a lousy chip.
Like when they promised a 1050GFLOPS Kaveri but shipped a sub-900 GFLOPS Kaveri? Like when they promised the moon for Carrizo and showed several designs Carrizo wins that latter didn't appear in any store? Like when they said that certain card was a "overclockers' dream", just before reviews showed how bad was the overclocking? Like when they promised a 16-core Seattle but gave us a 8-core Seattle after a two years delay? Like when they used odd settings benchmarks for hyping the 300 series? Like when they promised us again and again that Zen was a 2016 product, (funny enough I have known for a while it was 2017 and I have been saying it in forums), then Digitimes posted a rumor about AMD delaying Zen to 2017, AMD reacted attacking DigiTimes saying that the rumor was false, and now AMD confirms a delay of Zen to 2017? And that is the short list.
It is evident to me that they have cherry picked the benchmark for Zen vs Broadwell and that final reviews will show lots of benchmarks where Zen is outperformed. I have a 100% certainty on this. Mark my words.
See if your so anti-AMD then maybe you should post elswhere Juan?
The last thing we need is another AMD vs Intel war.
Mark my words: you, sir, will be the first casualty ... along with any others who choose to start and engage in any slinging matches.
Since you don't actually have any facts beyond a video AMD released and a couple of slides, it seems highly unlikely that you can accurately extrapolate, in any way, shape or form ... the real performance of these new CPUs.
Whilst you are free here to make predictions, I feel it is important to point out that, going by your past history when you were a committed AMD fanboi, you didn't exactly prove to be an accurate predictor of the final performance results.
If you're still angry about being jilted in some way, then you need to move on ... accept that all manufacturers spin things up a bit, and spend some time getting lost in a good game.
I recommend Freelancer ... because I am an oldskool geek.
I find the music soothing ...
I am not anti-AMD, see my signature, but I am anti-hype and anti-misinformation. And my comments aren't meant to be interpreted as some AMD vs Intel war. I am replying to the Zen vs Broadwell demo made by AMD. If they had made a Zen vs Piledriver demo and claimed that Zen is ~3x faster clock-for-clock, I would be reacting in exactly the same way. To put things in perspective:
According to AMD, Zen @3GHz would be somewhere around the 100-second mark, whereas the FX-8350 takes 320 seconds. This implies that Zen would have ~4.26x higher IPC than Piledriver (recall AMD's own slides stating that Zen is ~2x faster than the FX-8350 clock-for-clock).
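The ~4.26x figure comes from scaling the run-time ratio by the clock ratio. A minimal sketch of that arithmetic, assuming the FX-8350 ran at its 4.0 GHz base clock (the post doesn't state the clock used) and taking "around 100 seconds" literally; the function name is made up for illustration. Note this is whole-chip throughput per clock, not a per-core IPC measurement.

```python
# Back-of-the-envelope per-clock comparison from the Blender demo times.
# Assumption: FX-8350 at its 4.0 GHz base clock, Zen at the fixed 3.0 GHz shown.

def ipc_ratio(time_a_s, clock_a_ghz, time_b_s, clock_b_ghz):
    """Return how many times more work per clock chip B does than chip A.

    Work per second is proportional to 1/time, so work per clock is
    1 / (time * clock). The ratio B/A is (time_a * clock_a) / (time_b * clock_b).
    """
    return (time_a_s * clock_a_ghz) / (time_b_s * clock_b_ghz)

fx8350 = (320.0, 4.0)  # seconds, GHz (clock is an assumption)
zen = (100.0, 3.0)     # seconds, GHz ("around 100 s" per the demo)

ratio = ipc_ratio(*fx8350, *zen)
print(f"Zen vs FX-8350 per-clock throughput: {ratio:.2f}x")
```

This prints 4.27x, i.e. the ~4.26x above up to rounding (1280/300 = 4.266...).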
The Stilt has made a similar analysis. He downloaded the latest version of Blender and benchmarked his Piledriver and Haswell chips. He tested single-threaded performance and found that a Zen core would have to be 140% faster than a PD core to match his Haswell. According to AMD, Zen is ~40% faster than XV; therefore XV would have to be about 70% faster than PD (2.4/1.4 ~ 1.71). The numbers are off by a huge margin. He says he is puzzled by the result, and he reached conclusions similar to mine.
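The cross-check is just a division of two ratios. A minimal sketch using only the two figures quoted above (Zen needing to be 2.4x a PD core, and AMD's 1.4x over XV); the variable names are hypothetical.

```python
# Hypothetical reconstruction of the cross-check described above.
zen_over_pd_needed = 2.40   # Zen core must be ~140% faster than a PD core to match Haswell
zen_over_xv_claimed = 1.40  # AMD's claimed ~40% uplift over Excavator

# If both hold, Excavator must be this many times faster than Piledriver per clock.
xv_over_pd_implied = zen_over_pd_needed / zen_over_xv_claimed
print(f"Implied XV over PD: +{(xv_over_pd_implied - 1) * 100:.0f}%")
```

This prints +71%, the gap XV would need over PD for both numbers to be true at once.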
We are not talking about a small "spin", a 10% here and a 5% there to put products in better shape. We are talking about huge gaps. What are we supposed to do? Say "yes" to anything published/advertised by companies and not use our brains? Then why have forums? The news section would be enough.
And my facts aren't a video demo, but the details of the microarchitectures and the good track record of predictions I have made about Zen. A good amount of the information in those slides you mention was posted by me here before the marketing guys even made those slides.
That is actually not so hard to believe...
Consider this...
An 8C Zen has twice as many FPUs. So, assuming they are equal, we are already at +100% performance... now... we can assume +40% over Piledriver per AMD's estimates, and finally, we can also include the fact that Zen can run AVX, but Piledriver cannot, which would put another theoretical 60% of blue sky for benchmarks in there.
So, we end up at +200% total, which is not far off your calculations (20% difference...).
Doesn't work that way.
First, 200% more is not enough because the performance gap is higher than 4x.
Second, you are counting the same performance gains twice. Zen has 4 ALUs per core; Piledriver has 2 ALUs. This means that Zen can execute up to four integer instructions per cycle, which implies twice the throughput of Piledriver. But this is peak performance, because Zen only has 2 AGUs (the same as Piledriver) and cannot supply data to sustain four ALUs every cycle. This simply implies that the sustained performance gain will be less than 2x. We can do a cheap estimation:
Zen: 4 ALU + 2 AGU = 6 execution units
PD: 2 ALU + 2 AGU = 4 execution units
6/4 = 1.5, which implies Zen could be about 50% faster than Piledriver in sustained workloads. The actual computation is more complex and has to account for other details, including how often each unit (ALU, AGU) is used in real code, but ~50% over Piledriver is close to the expected performance for Zen.
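The "cheap estimation" can be written out directly. A minimal sketch, assuming (as the estimate itself does) that every ALU and AGU contributes equally to sustained throughput; real code weights them by how often each is used, so treat this as the crude version only.

```python
# Crude "count the execution units" estimate.
# Assumption: each ALU and AGU contributes equally to sustained throughput.

def execution_units(alus, agus):
    return alus + agus

zen_units = execution_units(4, 2)         # 6 execution units
piledriver_units = execution_units(2, 2)  # 4 execution units

peak_gain = 4 / 2                         # ALU count alone: 2x peak integer issue
sustained_gain = zen_units / piledriver_units
print(f"peak {peak_gain:.1f}x, crude sustained {sustained_gain:.1f}x")
```

It prints "peak 2.0x, crude sustained 1.5x", which is the gap between the peak and sustained numbers the argument relies on.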
Similar remarks apply to floating point. Zen does 16 FLOP/cycle per core and has 2x the peak throughput of Piledriver (8 FLOP/cycle per core), but on sustained floating-point workloads Zen will be ~70% faster than Piledriver. The same happens on the Intel side: a Haswell core has 2x the FP resources of an Ivy Bridge core, but it is only ~70% faster (sustained performance) clock for clock.
In code that mixes integer and floating point, we can take an average, ~60% over Piledriver, which roughly corresponds to the 40% over Excavator officially claimed by AMD.
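Blending the two estimates and dividing by AMD's claim shows what Excavator-over-Piledriver gap the ~60% figure implies. A minimal sketch, assuming a 50/50 integer/FP mix and a plain arithmetic mean (the post says only "an average"); the variable names are illustrative.

```python
# Blend the integer and floating-point estimates, then compare with AMD's claim.
int_gain_vs_pd = 1.5     # ~50% from the execution-unit count
fp_gain_vs_pd = 1.7      # ~70% sustained FP, by analogy with Haswell vs Ivy Bridge
zen_vs_xv_claimed = 1.4  # AMD's official ~40% over Excavator

# Assumption: equal integer/FP weighting and a simple arithmetic mean.
mixed_gain_vs_pd = (int_gain_vs_pd + fp_gain_vs_pd) / 2

# If Zen is ~1.6x PD and ~1.4x XV, this is the XV-over-PD gap it implies.
implied_xv_vs_pd = mixed_gain_vs_pd / zen_vs_xv_claimed
print(f"mixed: {mixed_gain_vs_pd:.2f}x over PD, implied XV over PD: {implied_xv_vs_pd:.2f}x")
```

It prints 1.60x and 1.14x, i.e. the blended estimate is consistent with Excavator being roughly 14% ahead of Piledriver per clock.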
You are taking the peak throughput gain (2x) and adding the average throughput gain (1.4x) on top of that. You are counting it twice.
Finally, I don't know what you mean by Piledriver not supporting AVX. AMD has supported AVX since Bulldozer.
There is zero evidence an "overclocked ES" was used. Less misinformation please!
Nope. As everyone knows by now, current Zen silicon runs at 2.8GHz base. They had to overclock it to 3GHz. AMD also refused to answer questions from the audience about the TDP of that Zen sample.
Why can't AMD have a 3GHz ES in their hands? Why couldn't it be final silicon they had to underclock to match Intel?
Also, I think they said the TDP was a strategic matter for them, so they wouldn't disclose it until final silicon is ready. I have a vague memory of this last part, so I could be horribly wrong.
Because the only known silicon is 2.8GHz.
Because they set the frequency to 3GHz without turbo when running Blender.
Because they had to underclock the Broadwell sample.
Because if they had final silicon, they wouldn't announce a six-month delay of the chip.
When pressed about the TDP of final silicon, AMD said "comparable to Broadwell".
None of this is terribly promising, IMO. If we look at what people are buying, it's the high-clocked, high-IPC quad-core chips that Intel sells the most of. Haswell-E and Broadwell-E absolutely crush consumer Skylake parts in multi-threaded performance, but that doesn't stop people from choosing the 6700K, a single SKU, more than 5 times as often as all Broadwell-E parts combined and more than 3 times as often as all Haswell-E parts combined (6700K market share 5.2%, all BW-E 0.9%, all HW-E 1.4%; source: Userbenchmark). They are going to have to do better than matching the performance profile of a CPU that no one is buying. If they could release a quad-core Zen clocked at 5GHz, they'd have a slam dunk, but it looks like we are nowhere near that.
I have to say that price is the primary deterrent for Broadwell-E and Haswell-E. With the pricing of the i7-5820K, only the $100-200 increase in board price keeps it out of the mainstream market.
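The "more than 5 times" and "more than 3 times" claims follow from the quoted Userbenchmark shares. A quick check using only those three percentages:

```python
# Check the market-share multiples quoted above (Userbenchmark figures from the post).
share = {"6700K": 5.2, "all Broadwell-E": 0.9, "all Haswell-E": 1.4}  # percent

multiples = {
    family: share["6700K"] / share[family]
    for family in ("all Broadwell-E", "all Haswell-E")
}
for family, multiple in multiples.items():
    print(f"6700K vs {family}: {multiple:.1f}x")
```

This prints 5.8x for Broadwell-E and 3.7x for Haswell-E, so both "more than" statements hold.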