And that is exactly the data they used as a starting point. 8C Zen on par with 8C Broadwell: ~107% vs 115%.
But we were comparing 6C Zen vs 4C Kabylake here.
The problem, which apparently everyone is ignoring, is the clocks. Everyone, myself included, was expecting the 4C Zen to have higher clocks than the 8C Zen. I expected about 13% higher clocks for the 65W 4C. But all the information and leaks claim that the 8C Zen has the highest clocks of the whole Zen family.
Even if Zen has an average throughput-per-clock (IPC+SMT) close to Broadwell, the 6C Zen has nearly 30% lower clocks than Kabylake.
Now the question is why the 6C and the 4C Zen have those lower clocks. I advanced two possible explanations: 14LPP and CCX. I will expand on both.
[[ 14LPP ]]
Despite people believing the contrary, 14LPP is a Low Power node. It is not 14HP. I expected about 3.0/3.5GHz clocks for the 95W 8C Zen. It seems fully confirmed that there exists a 3.4/3.8GHz PC/PR chip (F4 stepping). These clocks are about 11% higher than I expected.
There is a 3.6/4.0GHz PC/PR chip (also F4 stepping) but according to CPCHardware this is not a 95W chip. CPCHardware didn't give us the exact TDP, but if the available data is accurate we can try to estimate the TDP for that top SKU:
95W * (3.6/3.4)^2 = 106W
95W * (3.6/3.4)^3 = 113W
Let us assume it is a 110W chip, which agrees with CPCHardware's claim that the TDP is "above 100W". Ok.
By silicon laws, and even common sense, the 4C Zen should have higher clocks. I expected 3.4GHz (but my baseline was my former estimate of 3.0GHz for the 95W 8C). Using the new 3.4GHz 95W 8C chip as baseline, we obtain for the 65W 4C Zen:
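The TDP estimate above can be reproduced with a short script. This is only a sketch of the arithmetic, assuming that at a fixed core count power grows with the clock raised to an exponent between 2 and 3; the function name `scaled_tdp` is made up for the illustration.

```python
def scaled_tdp(base_tdp_w, base_clock_ghz, new_clock_ghz, exponent):
    """Scale a TDP, assuming power grows as clock**exponent (assumption)."""
    return base_tdp_w * (new_clock_ghz / base_clock_ghz) ** exponent


# Baseline: the 95W 8C chip at 3.4GHz base. Target: the 3.6GHz base SKU.
low = scaled_tdp(95.0, 3.4, 3.6, exponent=2)
high = scaled_tdp(95.0, 3.4, 3.6, exponent=3)

print(f"quadratic scaling: {low:.1f} W")
print(f"cubic scaling:     {high:.1f} W")
```

Both results land above 100W, and 110W sits between them.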
3.4GHz * ROOT2 [ 2* (65/95) ] = 4.0GHz
3.4GHz * ROOT3 [ 2* (65/95) ] = 3.8GHz
Let us take 3.9GHz as base clock for a 65W 4C Zen on F4. However, the 4C Zen is rumored by everyone to come with much lower clocks. A possible explanation is that AMD is selecting the 8C chips (a la FX-9000 series) to get the highest possible clocks, leaving the normal silicon (the silicon that cannot clock so high) for the rest of the (cheaper) chips.
This is a possible explanation for the discrepancy between my predictions for clocks and reported clocks. It is as if I had predicted the clocks for the FX-8350 (average silicon) but AMD had started by launching an FX-9590 (golden silicon). This analogy is not completely accurate, of course, because one is a 125W chip and the other is 220W, but you get the point.
If this explanation is correct, it tells us why the 6C Zen has lower clocks. A 95W 6C Zen would have clocks ~4GHz. This would be the common sense approach. This is what Intel does, with its 140W 6C Broadwell achieving higher clocks than its 140W 8C Broadwell. Indeed, taking the 3.2GHz base for the 8C Broadwell we obtain for a 6C Broadwell:
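The 3.9GHz figure for the 65W 4C Zen comes from scaling the 8C baseline by the power available per core. A minimal sketch of that calculation follows, assuming power per core scales with the clock to the second or third power; `scaled_clock` is a hypothetical helper, not anything taken from AMD or the leaks.

```python
def scaled_clock(base_clock_ghz, base_cores, base_tdp_w, cores, tdp_w, exponent):
    """Estimate a base clock from the per-core power budget.

    Assumption: power per core is proportional to clock**exponent.
    """
    per_core_ratio = (tdp_w / cores) / (base_tdp_w / base_cores)
    return base_clock_ghz * per_core_ratio ** (1.0 / exponent)


# Baseline: 8C Zen, 95W, 3.4GHz base. Target: 4C Zen at 65W.
optimistic = scaled_clock(3.4, 8, 95.0, 4, 65.0, exponent=2)
conservative = scaled_clock(3.4, 8, 95.0, 4, 65.0, exponent=3)

print(f"square-root law: {optimistic:.2f} GHz")
print(f"cube-root law:   {conservative:.2f} GHz")
```

The two exponents bracket the estimate, which is why the midpoint of about 3.9GHz is taken as the expected base clock.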
3.2GHz * ROOT2 [ 8/6 ] = 3.7GHz
3.2GHz * ROOT3 [ 8/6 ] = 3.5GHz
And, unsurprisingly, the i7-6850K has a base clock of 3.6GHz.
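The Broadwell sanity check can be scripted the same way. In this sketch the TDP is the same for both chips (140W), so only the core count enters; the helper name `clock_at_same_tdp` and the exponents 2 and 3 are assumptions of the illustration.

```python
def clock_at_same_tdp(base_clock_ghz, base_cores, cores, exponent):
    """Estimate the clock when the same TDP is spread over fewer cores.

    Assumption: power per core is proportional to clock**exponent.
    """
    return base_clock_ghz * (base_cores / cores) ** (1.0 / exponent)


# Baseline: 8C Broadwell at 3.2GHz base. Target: 6C Broadwell, same TDP.
upper = clock_at_same_tdp(3.2, 8, 6, exponent=2)
lower = clock_at_same_tdp(3.2, 8, 6, exponent=3)
actual = 3.6  # i7-6850K base clock quoted above

print(f"predicted range: {lower:.2f} to {upper:.2f} GHz, actual: {actual} GHz")
```

The real base clock falls inside the predicted range, which is the reason for trusting the same scaling for Zen.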
However, if the 8C Zen chips are already at the upper limit of the 14LPP silicon, then a 95W 6C Zen couldn't achieve ~4GHz clocks. The only option left for AMD would be to reduce the TDP to 65W and ship it with lower clocks: 3.3GHz.
The 8C Zen must be close to the 8C Broadwell, because clocks are close, but the 6C Zen is being "destroyed" by a higher-clocked 4C Kabylake on heavily multithreaded benches. Ouch!
[[ CCX ]]
Another possible explanation for the lower clocks is the weird CCX approach. It is weird because one doesn't waste time designing a modular approach just to then ignore the modules and treat each core independently. We don't know the fine details of the CCX, but we know that performance varies depending on the topology. Not all 4C Zen are the same: the 4+0 chips perform differently than the 2+2 chips and the 3+1 chips.
Maybe this weird combination of modules plus individual core treatment is the origin of the lower clocks on lower core models. Maybe, to avoid the imbalances generated by different active core topologies (4+0 is evidently not the same as 2+2), it is necessary to synchronize the pair of CCX in a special way, just to make the OS believe that all 4C chips are the same. If this special synchronization exists, it could add extra latency (from CCX-to-CCX communication) and hurt clocks. Recall that f_max is a function of the length of the critical datapath.
This could explain why only the chips with full modules achieve the higher clocks, whereas the chips with fewer than 8C have clock problems.
Time will tell...