AMD CPU speculation... and expert conjecture

8350rocks · Mar 24, 2015

Cache consumption is the culprit in Frostbite 3.

AMD has enough cache and seems to predict "good enough" for that particular engine, whereas Intel's redundancy model cache system prevents them from caching as much as the CPU can run.

Victim cache is not a bad thing; they just have to get better branch prediction in place.

-Fran- · Mar 24, 2015

Gamerk, there's a problem with "IPC" when you start de-glossing it into the innards of the CPU uArch... In particular, when you're not hitting cache, I'm sure the FX's "IPC" is pretty similar to Intel's, but when the program makes the CPU hit its cache, you see what happens. This is something beyond having "good" or "bad" IPC. It's about Intel having patents on IMC and cache techniques AMD can't use. Not blaming Intel for that, but AMD is pretty much screwed there.

Cheers!

de5_Roy · Mar 24, 2015

AMD Fiji, Grenada, Antigua, Trinidad and Tobago GPUs set to debut at Computex
http://vr-zone.com/articles/amd-fiji-grenada-antigua-trinidad-tobago-gpus-set-debut-computex/89325.html
DirectX12 games could be available by year-end
http://vr-zone.com/articles/directx12-games-available-year-end/89334.html
Former AMD boss ends up in [strike]H[/strike]Dell
http://www.fudzilla.com/news/37338-former-amd-boss-ends-up-in-dell

AMD Bets on DirectX 12 for Not Just GPUs, but Also its CPUs
http://www.techpowerup.com/210960/amd-bets-on-directx-12-for-not-just-gpus-but-also-its-cpus.html
promo slides galore.

Catalyst 15.3 betas add FreeSync support, more CrossFire profiles
http://techreport.com/news/28005/catalyst-15-3-betas-add-freesync-support-more-crossfire-profiles

AMD A8-7650k Kaveri APU Review
http://www.eteknix.com/amd-a8-7650k-kaveri-apu-review/

blackkstar · Mar 24, 2015

Fascinating, there's not a huge difference in CPU performance when benchmarking at 1080p on ultra with 4xMSAA. Never saw that coming.

The freesync FUD is going berserk, lol.

Also, in regards to gsync not accounting for future changes: you're assuming that Nvidia has customers' best interests in mind and is not making a move based on planned obsolescence or anything. I'm just going to call it now. There's going to be a "gsync 2" when panels with a wider refresh rate range come out, and monitor vendors are going to be grinning ear to ear because those monitors most people keep for 5+ years will now be obsolete within only a few. You just have to follow the money. Now Nvidia makes extra money on monitors as well as graphics cards, and they will encourage people to upgrade both instead of just one. Monitor makers are happy about this too. How many of you have monitors that you don't upgrade as often as the rest of your computer? My secondary monitor is a 10 year old LCD, lol.

And everyone is forgetting the latency that vsync adds at low refresh rates. That input latency is going to be absolutely awful at something like 20 Hz. You're looking at 50 ms for single buffering, 100 ms for double buffering, and 150 ms for triple buffering. I don't think even 50 ms of input lag is worth not having to look at tearing. But now that Nvidia is good at dynamic refresh rates, input lag is no longer a metric for tech media to judge products on. And you'll notice input lag is missing from the freesync reviews, because freesync is not using vsync (or it can be toggled), and a torn screen will not have that input lag. So they focus on ghosting.
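For anyone who wants to check those figures, here is a minimal sketch of the arithmetic, assuming the simple model the post implies: one refresh interval of delay per buffered frame. The function names are made up for illustration, and real pipelines add other sources of latency that this ignores.

```python
def frame_time_ms(refresh_hz: float) -> float:
    """Duration of one refresh interval in milliseconds."""
    return 1000.0 / refresh_hz


def buffered_latency_ms(refresh_hz: float, buffered_frames: int) -> float:
    """Assumed model: each buffered frame adds one full refresh interval."""
    return buffered_frames * frame_time_ms(refresh_hz)


if __name__ == "__main__":
    # 20 Hz is the refresh rate used in the post above.
    for label, frames in (("single", 1), ("double", 2), ("triple", 3)):
        latency = buffered_latency_ms(20, frames)
        print(f"{label} buffering at 20 Hz: {latency:.0f} ms")
```

Under the same assumed model, 60 Hz works out to roughly 17 ms per buffered frame, which is why the problem mostly shows up at low refresh rates.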

Tech media, lol. Sorry for everyone who takes what they say at face value still in 2015.

juanrga · Mar 24, 2015

-Fran- :

The Haswell core has ~60% higher IPC than the Piledriver core thanks to a wider execution engine (8 vs 5), better branch/prefetch hardware, more aggressive OoO (192-entry ROB), a shorter pipeline... Intel has better cache techniques because it needs to keep its faster cores fed.

juanrga · Mar 24, 2015

gamerk316 :

i3-4130 @3.4GHz gives about the same performance as FX-4320 @4.0GHz

Four threads vs four threads

~60% higher IPC per core plus the ~20% CMT penalty gives Haswell an advantage of ~92% over Piledriver clock per clock. Thus AMD needs four Piledriver cores to match two Haswell cores on total throughput.
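A minimal sketch of how the ~92% figure falls out, assuming (as the post does) that the ~60% IPC advantage and the ~20% CMT penalty are simply multiplied together. Both inputs are the poster's estimates, not measurements, and the variable names are only illustrative.

```python
def combined_advantage(ipc_ratio: float, cmt_ratio: float) -> float:
    """Multiply two per-clock speedup ratios into one overall ratio."""
    return ipc_ratio * cmt_ratio


# Poster's estimates: Haswell ~60% ahead on IPC, plus a ~20% CMT penalty
# on the Piledriver side, treated here as a second 1.20x factor.
haswell_vs_piledriver = combined_advantage(1.60, 1.20)

# Clock for clock, two Haswell cores are then worth this many Piledriver cores.
piledriver_cores_needed = 2 * haswell_vs_piledriver

if __name__ == "__main__":
    print(f"per-core advantage: {(haswell_vs_piledriver - 1) * 100:.0f}%")
    print(f"Piledriver cores to match 2 Haswell cores: {piledriver_cores_needed:.2f}")
```

Note that this ignores the clock difference between the two chips (3.4GHz vs 4.0GHz) and any gain from Hyper-Threading, so it is only a clock-for-clock comparison.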

-Fran- · Mar 24, 2015

juanrga :

That's a "self-fulfilling prophecy"...

You've been told time and time again about Bulldozer having issues with specific components of the CPU that keep it from reaching the theoretical numbers it was designed for. IPC is a consequence, not a reason. That is my simple point. Saying "because lower IPC" is like saying "that house burned because fire". Well, yeah... So?

Cheers!

de5_Roy · Mar 24, 2015

nice catch, yuka. 😀

juanrga · Mar 24, 2015

-Fran- :

No. Those "issues" that you mention cannot explain the ~100% IPC gap between Piledriver and Haswell. Those "issues" could account for about 20% of the gap; the rest of the IPC gap comes from Piledriver being a weak architecture compared to Haswell. A Haswell core is much more complex than a Piledriver core (check some details above). And your claim that FX IPC "is pretty similar to Intel's" couldn't be more divorced from reality.

Bulldozer was never designed to compete with Intel on IPC. In fact, AMD engineers expected Bulldozer IPC to be lower than K10. Bulldozer was optimized for throughput, and they expected to fill the single-thread performance gap by pushing very high clocks. In fact, the pipeline is designed for a theoretical maximum of 10GHz or so.

For your information, Keller is working on improving IPC via a new architecture from scratch; he is not working on fixing the "issues" of Bulldozer. Zen is not Bulldozer with a better cache ;-)

logainofhades · Mar 24, 2015

Faildozer is AMD's P4.

-Fran- · Mar 24, 2015

juanrga :

And how many cycles does Haswell take to complete an ADD instruction? How many cycles does Piledriver take? What about MOV? Let's get fancy and talk about a MUL as well. Each instruction has a certain number of promised cycles, giving the throughput per core of each instruction. But you're right; it was indeed a design decision.

And yeah, Keller wants to move away from Bulldozer, because fixing BD with what AMD has to compete with is hard to do (being positive) in order to raise the theoretical IPC numbers they had originally (higher clocks and lower latencies). Like you correctly state, they need to re-design a lot of things, but guess what! Intel will continue to have a better Cache mechanism and IMC due to the patents around them. So there will be software, with a lot of cache hits, that will perform poorly on AMD and great on Intel. Your house is going to continue to burn and you'll keep on saying "yeah, it's the fire alright". And you're absolutely right.

Cheers!

juanrga · Mar 24, 2015

-Fran- :

Fixing Bulldozer arch is simple for a person of the talent of Keller. But that hypothetical module would have about 20% better IPC per core and the same unreachable optimal frequencies, because of process node limits.

Thus, instead of wasting time and resources on fixing a speed-demon architecture that will hit a wall due to silicon limits, Keller took the nice approach of starting from scratch and pushing a brainiac design.

FYI, Keller has been working on cache and data prefetching since his return, and AMD filed many crucial patents that will be used in Zen.

The reason why AMD Zen will not crush Intel on performance is simple: Skylake will be a bigger and more complex core than Zen. Skylake will be about 15% faster than Haswell clock for clock on traditional x86 workloads, but on the new AVX-512 workloads, Skylake will be about 60-80% faster clock for clock due to doubling the SIMD width.

Why do you believe that 8 Zen cores are rated at only 95W, whereas 8 Skylake cores will be rated at about 150W?

-Fran- · Mar 24, 2015

juanrga :

At least we wholeheartedly agree on something: Keller is capable of reversing the situation.

I'm sure it was a simple call for Keller: a) fix the BD uArch or b) start from scratch (or improve on top of K7/K8 making a new one?). I'm sure Keller doesn't like the wide paradigm and he'd rather make simpler, yet performing, cores instead. Plus, if you're a chef, you don't make someone else's pasta better with your secret sauce and call it a day; you make your own pasta AND sauce, right? 😛

In any case, it's good they've been putting patents on Cache and Prefetchers. I'm sure he'll be able to tie those nicely to Zen. I'll be really glad if I'm wrong when I say that Intel has the upper hand patent-wise, but currently, they do have the upper hand. In any case, I'd love to eat my words.

And in regards to Skylake, I have no idea about its design or the idea behind it, so I can't comment on it. Actually, I know nothing about Zen either... Luckily the name 😛

Cheers!

8350rocks · Mar 24, 2015

juanrga :

Wait a minute, first it was ~60%, now you are claiming ~100%.

So, we should break this down. Present a real world suite of benchmarks from various tasks, not just single thread, not just multithread, and not memory benchmarks either. SiSoft is not something that measures CPU performance in any meaningful way.

Present a set of benchmarks to me where an i5-4670k in stock format is 60% faster on average against the FX-8350.

Once you have provided that evidence we can debate the issue. I am not going to bother to chase down benchmarks because the fallacy is not my claim to support.

8350rocks · Mar 24, 2015

juanrga :

You do realize that Zen has listed AVX-512 as supported...yes?

cdrkf · Mar 24, 2015

@8350, his premise is based on Core i3 = 2 cores, FX-4xxx = 4. If you then say 2 x 3.4 GHz Intel cores are as fast as 4 x 4 GHz FX cores, you get these exaggerated numbers.

When you look at the size and capabilities of a Haswell i3 core (which critically includes HT) and then compare it to AMD, it's comparable in most respects to 1 module. The FX-4xxx is directly comparable to an i3; it's only poor marketing on AMD's part that brands this a quad-core part.

Also, out of interest, the Haswell core is so wide that in games the Haswell i3 (dual core + HT) is keeping pace with the i5 with a small clock advantage (highest i3 vs entry i5). Does that mean the i5 is suddenly a bad design?

jdwii · Mar 24, 2015

I'd say 40-45%. I did a lot of tests, on just about anything I could. Again, that's in performance per clock, Juan. 60% did come up in some of my benchmarks, but it definitely wasn't the average.

-Fran- · Mar 24, 2015

8350rocks :

It's one thing having it listed and another thing making it perform on par with Intel's, which is part of my original point about BD's shortcomings and what Juan has been saying: Keller is trying to at least fix those shortcomings and carry the fixes into a brand new design. Plus, you need software support for that to actually matter 😛

Cheers!

juanrga · Mar 24, 2015

8350rocks :

Is this another instance where you don't read my posts but reply to them? This is from a couple of posts above yours:

I rounded 92% to 100%.

8350rocks :

Jaguar has AVX support. Ivy Bridge has AVX support. Are you claiming that Jaguar gives the same performance as Ivy Bridge? I hope not.

Another question, where has Zen listed AVX-512 support?

+ { "CPU_ZNVER1_FLAGS",
+ "Cpu186|Cpu286|Cpu386|Cpu486|Cpu586|Cpu686|CpuSYSCALL|CpuRdtscp|Cpu387|Cpu687|CpuFISTTP|CpuNop|CpuMMX|CpuSSE|CpuSSE2|CpuSSE3|CpuSSE4a|CpuABM|CpuLM|CpuFMA|CpuFMA4|CpuBMI|CpuF16C|CpuCX16|CpuClflush|CpuSSSE3|CpuSVME|CpuSSE4_1|CpuSSE4_2|CpuAES|CpuAVX|CpuPCLMUL|CpuLZCNT|CpuPRFCHW|CpuXsave|CpuXsaveopt|CpuFSGSBase|CpuAVX2|CpuMovbe|CpuBMI2|CpuRdRnd|CpuADX|CpuRdSeed|CpuSMAP|CpuSHA|CpuXSAVEC|CpuXSAVES|CpuClflushOpt|CpuCLZERO" },
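The flag list above can be checked mechanically rather than by eye. This is a minimal sketch that splits the string on "|" and looks for AVX-512 entries. It assumes that AVX-512 flags in this list would start with "CpuAVX512", which is my assumption about the naming rather than something the post states, and the string is copied from the post with the wrapping spaces removed.

```python
# Flag string from the znver1 entry quoted in the post (forum line-wrap spaces removed).
ZNVER1_FLAGS = (
    "Cpu186|Cpu286|Cpu386|Cpu486|Cpu586|Cpu686|CpuSYSCALL|CpuRdtscp|"
    "Cpu387|Cpu687|CpuFISTTP|CpuNop|CpuMMX|CpuSSE|CpuSSE2|CpuSSE3|"
    "CpuSSE4a|CpuABM|CpuLM|CpuFMA|CpuFMA4|CpuBMI|CpuF16C|CpuCX16|"
    "CpuClflush|CpuSSSE3|CpuSVME|CpuSSE4_1|CpuSSE4_2|CpuAES|CpuAVX|"
    "CpuPCLMUL|CpuLZCNT|CpuPRFCHW|CpuXsave|CpuXsaveopt|CpuFSGSBase|"
    "CpuAVX2|CpuMovbe|CpuBMI2|CpuRdRnd|CpuADX|CpuRdSeed|CpuSMAP|"
    "CpuSHA|CpuXSAVEC|CpuXSAVES|CpuClflushOpt|CpuCLZERO"
)

flags = ZNVER1_FLAGS.split("|")

# Assumed naming: any AVX-512 entry would begin with "CpuAVX512".
avx512_flags = [f for f in flags if f.upper().startswith("CPUAVX512")]
has_avx512 = bool(avx512_flags)

if __name__ == "__main__":
    print("AVX listed:     ", "CpuAVX" in flags)
    print("AVX2 listed:    ", "CpuAVX2" in flags)
    print("AVX-512 listed: ", has_avx512)
```

On this list the check finds AVX and AVX2 but no AVX-512 entry, which is the point of the question above.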

juanrga · Mar 24, 2015

jdwii :

Yes, the average IPC gap varies with the applications you use to measure. Using a different set can give a different percentage. Note however, we are close. My 60% is only 10-15% ahead of your 40-45%

1.40 x 1.15 = 1.61
1.45 x 1.10 = 1.595

etayorius · Mar 25, 2015

Does anyone know what types of multi-GPU setups Vulkan supports? I want to know if it will also support GeForce and Radeon cards working together.

de5_Roy · Mar 25, 2015

AMD Launching “Hierofalcon” 64bit ARM Embedded Chips in 1H 2015 – Zen and K12 Next Year
http://wccftech.com/amd-launching-arm-serves-year-wip/
AMD To Disclose 14/16nm CPU and GPU Roadmaps in May – Zen, K12 and Arctic Islands
http://wccftech.com/amd-future-gpu-cpu-roadmap/

gamerk316 · Mar 25, 2015

-Fran- :

You aren't looking at the whole picture. The actual instruction execution is only part of what goes into IPC. Cache latencies matter. Resource sharing matters. Even memory access times matter to some degree. And that's where the BD arch starts to fall behind.
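To make that concrete, here is a rough sketch using the common textbook model where memory stalls are added on top of a base CPI. Every number below is hypothetical and chosen only to show the shape of the effect. None of them are measurements of Bulldozer, Piledriver or Haswell, and the function name is made up for illustration.

```python
def effective_ipc(base_cpi: float,
                  mem_accesses_per_instr: float,
                  miss_rate: float,
                  miss_penalty_cycles: float) -> float:
    """IPC after adding average memory stall cycles to the base CPI."""
    stall_cpi = mem_accesses_per_instr * miss_rate * miss_penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)


if __name__ == "__main__":
    # Two hypothetical cores with identical execution resources (same base CPI)
    # that differ only in how many cycles a cache miss costs.
    for miss_rate in (0.0, 0.02, 0.10):
        fast = effective_ipc(0.5, 0.3, miss_rate, miss_penalty_cycles=20)
        slow = effective_ipc(0.5, 0.3, miss_rate, miss_penalty_cycles=40)
        print(f"miss rate {miss_rate:.2f}: "
              f"IPC {fast:.2f} (20-cycle penalty) vs {slow:.2f} (40-cycle penalty)")
```

With no misses the two hypothetical cores show the same IPC. The gap only opens up as the miss rate climbs, which is one way the same execution core can end up with very different measured IPC depending on the workload.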

When you look at the size and capabilities of a Haswell i3 core (which critically includes HT) and then compare it to AMD, it's comparable in most respects to 1 module. The FX-4xxx is directly comparable to an i3; it's only poor marketing on AMD's part that brands this a quad-core part.

To be fair, CMT is a relatively aggressive form of SMT, and far superior performance wise to HTT. Which brings us back to the "What makes a CPU core" argument? As far as Windows is concerned, both HTT enabled i3s and the FX-4xxx have four cores, even if HTT is treated differently by the OS.

Also, out of interest, the Haswell core is so wide that in games the Haswell i3 (dual core + HT) is keeping pace with the i5 with a small clock advantage (highest i3 vs entry i5). Does that mean the i5 is suddenly a bad design?

No, just that the i3s (at least the HTT-enabled variants) are fast enough to avoid a CPU bottleneck.

juanrga · Mar 25, 2015

de5_Roy :

WCCFTECH writes in big bold face: AMD Launching “Hierofalcon” 64bit ARM Server Chips in The First Half of 2015

Pardon? Hierofalcon is not for servers...

de5_Roy :

This news is a month old. My thoughts are found in the comment section.

gamerk316 · Mar 25, 2015

Only two socketed Broadwell CPUs:

http://www.techspot.com/news/60153-intel-reportedly-releasing-two-socketed-broadwell-cpus.html

Not a big issue, given Skylake is just around the corner anyways.
