Discussion: AMD Ryzen

juanrga · Jan 24, 2017

8350rocks :

Once again: NO. Also, the FX-6300's turbo clock is irrelevant, because that chip is not in the list:

http://www.anandtech.com/bench/CPU/1028

The score of 96 that you used in your earlier computations corresponds to the FX-8350 or the FX-6350. Both have a turbo of 4.2GHz, which is why both chips score 96 points in the Anandtech database.

sarinaide · Jan 24, 2017

jdwii :

BF1 does test IPC, as does any game; how sensitive it is depends on how well the game is coded. Dice is better, but it still shows that Sandy/Ivy is getting long in the tooth. You can't pick and choose benches or decide which is more tangible than the other. Dolphin is synthetic while Battlefield is real-world and compute intensive; each holds its own and no two benchmarks are the same. Some are more sensitive to IPC than others.

CPC tested across a number of games and productivity suites, so that result is itself viable.

Edit

Productivity benches:

H264 / H264 4K
wPrime
PovRay
Blender 3D
3DSMax
Corona
Mental Ray

Games:

Arma III - heavy single thread performance penalties
Farcry 4 - single threaded
Grid - Single threaded
Witcher 3 - Single threaded

sarinaide · Jan 24, 2017

[Image: Cities: Skylines game performance review, CPU scaling chart]

Cities: Skylines appears to be as multithreaded as Battlefield is.

gamerk316 · Jan 24, 2017

Be careful when using games to calculate IPC; the GPU will eventually be a bottleneck, compressing results at the top.

sarwar_r87 · Jan 24, 2017

^^ agreed. Besides GPU, there is a question of RAM bandwidth.

juanrga · Jan 24, 2017

sarwar_r87 :

Excavator wasn't released for the datacenter/HPC market. Piledriver is the last muarch used in Opteron/FX CPUs. AMD uses a different baseline depending on what it is discussing: sometimes Excavator, sometimes Piledriver.

This has generated some confusion among the media in the past. In datacenter slides AMD was using Piledriver as baseline, and this generated some wrong headlines in several media. When using Excavator as baseline, AMD mentions it explicitly as "Excavator core". When using Piledriver as baseline, AMD doesn't refer to it explicitly but instead uses terms like "previous generation", "current generation", "current AMD CPU core", or simply refers to it by the codename of the 8-core die: "Orochi".

In the above video and in the slide reported by Arstechnica, we again see "previous generation", which suggests AMD is using Piledriver when comparing the improvement in L3. Indeed this makes sense, because all Excavator implementations lack L3. All of this stuff about baselines was discussed broadly in forums. For instance:

https://forums.anandtech.com/threads/new-zen-microarchitecture-details.2465645/page-53#post-38240247
http://semiaccurate.com/forums/showpost.php?p=265353&postcount=3089

In any case, this discussion about whether the official IPC percentage uses a real Excavator CPU (without L3) or a fictitious Excavator CPU (with L3) has no bearing on my estimate of IPC using CB. As demonstrated in

http://www.tomshardware.co.uk/forum/id-2986517/discussion-amd-zen/page-36.html#19193405

Using Piledriver with L3 as baseline, we obtained that Haswell IPC is about 76% over Piledriver. Using Piledriver without L3 as baseline, we obtained that Haswell IPC is about 75% over Piledriver. The impact of L3 is minimal in this case.

The conclusion is that I can continue using the Excavator-based Athlon X4 845 as baseline to get an estimation of Zen IPC.

juanrga · Jan 24, 2017

salgado18 :

They mean Piledriver, which was the last core released with L3 in the datacenter and HEDT. Steamroller and Excavator only launched for mainstream desktop and mobile. Zen replaces different cores.

http://www.kitguru.net/wp-content/uploads/2015/06/amd_client_platform_roadmap.jpg

Zen replaces Piledriver in HEDT and Opteron.

Zen replaces Excavator in desktop APU and top mobile.

Note as well that Excavator replaced Jaguar/Puma cores in some markets. And since Zen replaces Excavator, Zen will be used in products previously served by Jaguar/Puma.

sarinaide · Jan 24, 2017

Of course, 40% may also mean 40% over the fastest AMD SKU irrespective of clock speed, i.e. FX9550 = 105, so clock for clock Ryzen may be more like double that. So a 3.4-3.8 Ryzen is around 147 (105 x 1.4), which very much puts it in the Haswell-type domain at the same clocks.

This image also implies something that nobody wants to address so I will.

Zen = 2x Orochi, i.e. 76 x 2 = 152, and that is very much in the Haswell performance domain, especially knowing that AMD is doing that at a lower clock speed.

sarwar_r87 · Jan 24, 2017

sarinaide :

No, it does not.

AMD said 40% IPC improvement, and IPC means instructions per cycle.

010TheMaster010 · Jan 24, 2017

@Juan how did you come to the XV with L3 performance metric? Just wondering
The direct quote is: AMD says the much-touted "40 percent higher IPC" over the Excavator core came from three design goals: core, cache, and power. For the core, AMD made everything bigger and wider, introducing a micro-op cache (something Intel has been using for some time), as well as a larger dispatch, larger retire, larger schedulers, and better branch prediction. On the cache side there's a faster prefetch, while L1 and L2 bandwidth has been doubled, and L3 more than quadrupled. Full details on the improvements are in the slides below.
They don't refer to previous core or anything, just XV

sarwar_r87 · Jan 24, 2017

where do I begin...

but I am glad you chose to respond 😛

juanrga :

As an engineer and researcher, that makes no sense whatsoever, and AMD is an engineering company.

juanrga :

I am afraid you are reading too much into the slides.

juanrga :

Thank you for your acknowledgement

juanrga :

I agree that it is not possible to compare to anything, as XV has no L3. I respect your way of interpreting the "40% uplift", where you say L3 has no performance benefit. But if that were indeed the case, AMD would simply drop L3, save die space and increase profitability.

Moreover, XV was designed for an APU, with components being compacted: the floating-point scheduler is 38% smaller, the fused multiply-accumulate (FMAC) units are compacted by 35%, and the instruction-cache controller is compacted by another 35%.

So, respectfully, your method is flawed. The most appropriate way of looking at it is:

XV-based APUs are 12% over PD-based APUs. Zen is 55% over XV.
zen_ipc = PD_IPC * XV_boost * Zen_boost = 1.55*1.12 * PD_IPC = 1.74 * PD_IPC.
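
To make the chaining in that formula explicit, here is a minimal sketch of how successive percentage uplifts compound. The function name `compound_gains` is made up for illustration, and the 12% and 55% inputs are simply the figures quoted in this post, not measurements.

```python
from math import prod

def compound_gains(*percent_gains):
    """Chain successive percentage uplifts into one overall multiplier."""
    return prod(1 + g / 100 for g in percent_gains)

# Figures quoted in the post: XV is 12% over PD, Zen is 55% over XV.
zen_over_pd = compound_gains(12, 55)
print(f"Zen IPC = {zen_over_pd:.2f} x PD IPC")
```

The point of the sketch is only that the uplifts multiply rather than add; whether 12% and 55% are the right inputs is exactly what the thread is arguing about.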

sarinaide · Jan 24, 2017

sarwar_r87 :

IPC is not a measurement at equal clocks. IPC is not generic, it is architectural; this is why a 4770 and a 5960X have identical baseline throughput. One is better suited to gaming, the other to production.

sarwar_r87 · Jan 24, 2017

010TheMaster010 :

His figures of 76% with L3 and 75% without L3 indicate that there is no benefit from L3. If that were the case, AMD would simply have dropped all L3 to save die space and reduce power and production cost.

Moreover, XV was 10% better than the Steamroller core, and Steamroller was 5% better than Piledriver. Despite that, the AMD (Vishera) FX-6350 (3.6 GHz) scores 96 while the Athlon 845 (at 3.5 GHz) scores 91.57? Unfortunately, that theory falls flat even before taking off.
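
A rough sketch of the comparison being made here, assuming the scores and clocks exactly as written in this post (swap in whichever clocks you consider correct). The helper `per_clock` is a hypothetical name; it just divides a score by the clock in GHz.

```python
def per_clock(score, ghz):
    """Benchmark score normalised by clock speed (points per GHz)."""
    return score / ghz

# Scores and clocks exactly as quoted in the post above.
fx_6350 = per_clock(96.0, 3.6)       # Piledriver
athlon_845 = per_clock(91.57, 3.5)   # Excavator

expected = 1.10 * 1.05               # XV over Steamroller, times Steamroller over PD
observed = athlon_845 / fx_6350

print(f"expected XV/PD per-clock ratio: {expected:.3f}")
print(f"observed XV/PD per-clock ratio: {observed:.3f}")
```

With these inputs the expected ratio is above 1 and the observed ratio is slightly below 1, which is the gap the post is pointing at.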

sarinaide · Jan 24, 2017

010TheMaster010 :

Level 3 cache has a significant advantage; however, with Bulldozer the native speed of the L3 cache was slower than physical RAM, which resulted in extremely high latency. This is why a Thuban/Deneb decimates a Bulldozer on a pure performance metric. The CMT design has too many penalties to overcome with clock speed alone.

sarwar_r87 · Jan 24, 2017

sarinaide :

Of course it's not measured at equal clocks. But IPC literally means the number of instructions a CPU processes per cycle, i.e. per clock cycle. So at the end of the day clocks do not matter, because they get normalized out when you compute something per clock. That means two cores with the same arch but different clocks will yield the same IPC.

EDIT: maybe you are confusing it with CB scores, so I will try to explain it this way:
If you are looking at Cinebench results, that is not IPC. To compute IPC from Cinebench, you need to know the number of instructions that were processed (which is constant, because the workload is constant) and the number of cycles a given CPU needed. A CPU with the same arch but a different clock speed will still need the same number of cycles.

The CB score is based on the time it takes to process a constant workload. So a higher-clocked CPU will have a higher CB score because each of its cycles has a shorter time period, meaning it takes less time to finish the benchmark and gets a better score in Cinebench. The only reason we use CB is that the workload is constant, so the time it takes can be used as an indication of IPC once it is normalized by clock.

i.e.:
IPC = workload_in_CB / no_of_cycles
    = workload_in_CB / (elapsed_time / time_period)
    = workload_in_CB * time_period / elapsed_time
    = workload_in_CB / (elapsed_time * CPU_frequency)
    = CB_score / CPU_frequency (taking CB_score as workload_in_CB / elapsed_time)
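
A small sketch of that normalization, with made-up numbers: the instruction count, the IPC of 2.0 and the two clock speeds are all hypothetical, chosen only to show that the same architecture at two clocks gives different run times but the same score-per-clock.

```python
def elapsed_seconds(instructions, ipc, freq_hz):
    """Time to retire a fixed workload at a given IPC and clock."""
    return instructions / (ipc * freq_hz)

def score_per_hz(instructions, elapsed_s, freq_hz):
    """CB-style score (work per second) divided by clock: an IPC proxy."""
    score = instructions / elapsed_s
    return score / freq_hz

WORKLOAD = 1.2e12   # hypothetical instruction count for the fixed workload
IPC = 2.0           # hypothetical, same architecture for both chips

for ghz in (3.0, 4.0):
    freq = ghz * 1e9
    t = elapsed_seconds(WORKLOAD, IPC, freq)
    proxy = score_per_hz(WORKLOAD, t, freq)
    print(f"{ghz} GHz: {t:.0f} s, score/clock = {proxy:.2f}")
```

The faster chip finishes sooner, so its raw score is higher, but dividing by the clock recovers the same per-cycle figure for both.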

this is why a 4770 and 5960X have the identical base line throughput. One is better suited to gaming the other to production

Additionally, IPC is always single-threaded; for multithreaded applications it is called throughput. The 4770 and 5960X have similar IPC, but their throughput is vastly different due to the core count difference. Gaming performance is NOT IPC, it is a workload. Both the 4770 and the 5960X take the same number of CPU cycles to complete it; it's just that the 4770 has a shorter time_period, making the time it takes to run a game shorter.

For multithreaded apps, the 8c16t simply has more pipelines with which it runs several instructions concurrently, therefore getting better throughput. But it still has the same IPC per pipeline.

juanrga · Jan 24, 2017

sarwar_r87 :

This proof is not valid.

In the first place, the math is wrong: 50% and 83% don't give 125% more, but 175% more.

1.5* 1.83 = 2.745

In the second place, one cannot apply the average IPC to any benchmark. For instance, Skylake is about 20% ahead of Sandy on average, but Skylake is about 70% ahead of Sandy on Dolphin, whereas only 8% ahead of Sandy on 7-Zip.

The same happens with SMT yields. Blender is a special benchmark that has abnormally large SMT yields. This was also discussed in this thread when AMD provided the first demo of Zen using Blender. I will repeat this:

One more observation regarding Blender. The SMT yield in Blender appears to be unusually high. In similar applications, such as Cinebench the yield is around 27% on Haswell-E. In Blender the yield is > 59%. Blender BMW benchmark (at default resolution, 20x20 tiles) was completed in 127.98 seconds with 18C/18T while with SMT enabled the time was reduced to 90.07 seconds.

If anything this gives extra weight to the hypothesis that Zen has better SMT yields due to a more distributed nature of the microarchitecture. It seems AMD chose Blender on purpose.

A last remark. CB has a standard workload and all the scores are directly comparable. Blender has different workloads and the benchmarks depend on the image and settings used. AMD used a custom workload for the demo. The results can vary with a different workload.

We could spend months discussing all this. For each possible explanation, there exists an alternative explanation, because we have a single equation but two unknowns: IPC and SMT. Even assuming your data is valid, we have

IPC + SMT = 125%

The only value of IPC that we have is the official claim made by AMD.

Is it so difficult to wait for a proper leak of IPC measurements or for third-party reviews?

jdwii · Jan 24, 2017

sarwar_r87 :

It's sad but true: AMD does use different CPU architectures to compare Ryzen to. I heard them do it at CES, and I've seen designs by their marketing team. Remember, AMD has an engineering department, but that doesn't always mean they are the ones talking.

sarwar_r87 · Jan 24, 2017

juanrga :

That's your view, and I respect that. But you have not said why.

If you prefer, I can run the AMD Zen Blender benchmark on my PC. I have a Xeon 4c8t and an FX 6300. Let me know which you prefer:

Since you already claimed that FX and XV have the same performance, it won't be hard to assume PD numbers for you to calculate your numbers.

juanrga :

Actually your equation is wrong, but so was mine. I just realized it now, thanks for pointing it out.
The i3 is 120% faster than the Athlon (i3 473 sec, Athlon 1040 sec; (1040-473)/473). Zen is 4% faster in Blender than Haswell, hence Zen is about 130% faster than the Athlon.
gain_IPC * gain_SMT = 2.30 (because 1.3 + 1)
if gain_IPC = 1.5
gain_SMT = 2.30/1.5 = 1.53, or 1.64 if we consider gain_IPC = 1.40

A 53-64% gain from SMT is still unreasonable, considering that even the CMT gain is barely 53% on average with many, many more transistors dedicated to the second thread.
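
As a rough check of the arithmetic in this post, here is a sketch that splits the combined Zen-over-Athlon multiplier into an IPC part and an SMT part. The helper names are hypothetical, the 2.30 combined figure is the rounded value used in the post, and the two IPC multipliers tried (1.50, and 1.40 to match the 40% figure AMD quoted) are assumptions, not measurements.

```python
def faster_by(slow_s, fast_s):
    """Fractional speed advantage of the quicker run, e.g. 1.2 = 120% faster."""
    return (slow_s - fast_s) / fast_s

def smt_gain(combined_gain, ipc_gain):
    """SMT multiplier left over once the IPC multiplier is divided out."""
    return combined_gain / ipc_gain

# Blender times quoted in the post (seconds).
i3_vs_athlon = faster_by(1040, 473)
print(f"i3 over Athlon: {i3_vs_athlon:.0%} faster")

COMBINED = 2.30  # rounded Zen-over-Athlon multiplier used in the post

# Candidate IPC multipliers; these are guesses, not measurements.
for ipc_gain in (1.50, 1.40):
    print(f"gain_IPC = {ipc_gain:.2f} -> gain_SMT = {smt_gain(COMBINED, ipc_gain):.2f}")
```

The split is only as good as the assumed IPC multiplier: one product, two unknowns, so every choice of gain_IPC yields a different gain_SMT.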

Also, my personal Xeon processor shows that the SMT gain on Intel in Blender is 35% (78 sec with HTT and 105 sec without HTT). That makes a 64-82% SMT gain on Zen even more unlikely, as it would make AMD's SMT 68% faster than Intel's!

juanrga :

I really am not. Those numbers are based on a specific benchmark by the same person. Read the references, please.

juanrga :

I think I answered that. But I have more results from my personal Xeon processor, which shows that the SMT gain on Intel in Blender is 35% (78 sec with HTT and 105 sec without HTT). That makes a 64% SMT gain on Zen even more unlikely, as it would make AMD's SMT 68-82% faster than Intel's.

Also, your number: 127/90 gives 41%, not > 59%.
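
For reference, a minimal sketch of how an SMT yield can be derived from two run times, assuming yield is defined as the throughput gain (time without SMT divided by time with SMT, minus one). Other definitions give other percentages. The function name is hypothetical; the timings are the ones quoted in this thread.

```python
def smt_yield_percent(time_without_smt, time_with_smt):
    """Throughput gain from enabling SMT, as a percentage."""
    return (time_without_smt / time_with_smt - 1) * 100

# Run times in seconds, as quoted in the thread.
runs = {
    "Haswell-E 18C, Blender BMW (quoted by juanrga)": (127.98, 90.07),
    "Xeon 4c8t, Blender (quoted by sarwar_r87)": (105.0, 78.0),
}

for label, (off_s, on_s) in runs.items():
    print(f"{label}: {smt_yield_percent(off_s, on_s):.1f}% SMT yield")
```

Under this definition both machines land well below 59%, which is the discrepancy being questioned here.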

juanrga :

64% is unrealistic, as that makes Zen's SMT 1.68-1.82 times faster. The CPC uarch benchmark begs to differ with you.

juanrga :

I really am not. Those numbers are based on a specific benchmark by the same person. Read the references, please.

juanrga :

I can ask the same of you too

sarwar_r87 · Jan 24, 2017

jdwii :

The guy giving the presentation was an engineer, but maybe marketing made him go along with this weird thing. But why? They gain nothing from this, except maybe confusing Intel to oblivion and catching them off guard in case they think along the same lines as Juan.

sarinaide · Jan 25, 2017

sarwar_r87 :

It is catching everyone off guard. Even the numbers posted by an independent third party are creating mental short circuits, where people are trying to put 6900K-like performance into the Sandy Bridge pigeonhole lol

juanrga · Jan 25, 2017

sarinaide :

That slide has been discussed multiple times in this thread. It is measuring CBMT performance of Zen vs Piledriver both at same clocks.

juanrga · Jan 25, 2017

010TheMaster010 :

In earlier posts I compared Piledriver with L3 and Piledriver without L3. The variation in the IPC gap was minimal for the benchmark used. From that I concluded that a hypothetical Excavator CPU with L3 wouldn't score much higher than current Excavator chips without L3.

I know that quote from Arstechnica. I mentioned before that the media got confused because AMD uses two baselines at once: Excavator and Piledriver. When using Excavator as baseline, AMD mentions it explicitly as "Excavator core". When using Piledriver as baseline, AMD uses terms such as "previous generation core" and variations. I provided above a pair of forum links where we discussed this confusion in some media.

Arstechnica also says that "details on the improvements are in the slides below." If one checks the slide that they reproduce right after, one can see that Excavator is not mentioned anywhere in the slide about performance and power improvements. That slide is clearly using both baselines, and that is why there is no mention of any specific core.

juanrga · Jan 25, 2017

sarwar_r87 :

This is clearly wrong. If AMD is giving a talk about datacenter products, they have to use Piledriver as baseline, because the current Opteron CPUs use Piledriver. It would be absurd for them to compare the future Zen Opteron to nonexistent Excavator Opterons.

In any case, it doesn't matter what we believe. AMD has used Piledriver as baseline. I provided you a slide where AMD compared Zen to Orochi (codename for the server Piledriver die).

If I recall correctly, 8350rocks told us that AMD showed him a slide with Zen compared to an FX-8350. I would have to search the thread to find his post.

sarwar_r87 :

No. I didn't say that L3 has no benefit. I said it is minimal on the bench used for the construction cores.

It is minimal because the cache design in Bulldozer and its successors was bad. In fact, AMD dropped the L3 cache from Bulldozer-family APUs precisely because it provided little benefit. The L3 cache on the Bulldozer family only provided real improvement in certain server workloads.

Zen is different. The L3 cache is part of the IPC improvement. You can see how bad the cache on Piledriver was from the fact that AMD has been able to increase the L3 BW by 5x. The Zen APUs have L3 as well.

Your computations are incorrect, and the mistakes have been pointed out many times before. It is not worth repeating them.

sarwar_r87 · Jan 25, 2017

juanrga :

So you are saying that, despite the 12% IPC enhancement in Excavator (please compare AMD's XV slides and benchmarks), the fact that Excavator is still 4% behind a Piledriver CPU sounds reasonable to you?

If anything, it shows that the impact of L3 is 15%, which is what we have been saying from the beginning.

juanrga :

I just realized that, without knowing it, I have been saying for months that AMD is mixing benchmarks. What it says is:
1. They are using Excavator because Excavator is the more advanced core, so they want to say that the IPC of the core, excluding the cache system, will itself give a 40% gain on top of Excavator.

2. Since Excavator does not have a full cache system, the performance comparison has to be made against a PROPER CPU, after taking into account the 12% IPC gain of XV over PD, i.e.:
XV-based APUs are 12% over PD-based APUs. Zen is 55% over XV.
zen_ipc = PD_IPC * XV_boost * Zen_boost = 1.55*1.12 * PD_IPC = 1.74 * PD_IPC.

If anything, this weird mixing and matching does not work for you UNLESS you consider L3 to give only marginal gains. That assumption is unreasonable and without any proof whatsoever.

juanrga :

Question: why did AMD not drop L3 from Piledriver? That would have cut the die size by almost half and improved their profitability.

Again, so you are saying that, despite the 12% IPC enhancement in Excavator (please compare AMD's XV slides and benchmarks), the fact that Excavator is still 4% behind a Piledriver CPU sounds reasonable to you?

If anything, it shows that the impact of L3 is 15%, which is what we have been saying from the beginning.

juanrga :

i can say exactly the same about your computations.

sarinaide · Jan 25, 2017

The graph happens to be titled Zen doubles FX8350 performance; to me, performance implies single-threaded performance.

The first half shows evolutionary improvements inside the Bulldozer uArch, and it is demarcated that way by stating Kaveri to Bristol Ridge.

The second half is completely different: it states "performance processors" and "performance", which implies that Zen is at the very least twice as fast as Orochi.
