I like the E-cores, I just don't want them to be "Hybridized" with my P-cores.
I want ALL "E-cores" or ALL "P-cores" on separate CPUs for Desktop.
But why?? Do you not trust the OS to schedule them properly? That's the only explanation I can imagine, but I haven't seen evidence to support it.
Intel needs to improve their Hyper-Threading game; IBM is able to Hyper-Thread up to 8 Threads per core with SMT8 on their latest POWER10.
Why do you assume more is better? Extra hardware threads come at a nonzero cost, so Intel & AMD are striking a balance between that cost and the benefits they provide in terms of latency-hiding and core utilization.
I trust they've done the math and concluded 2-per-core is best. Maybe 4 really would be cost-effective, but they're just gun-shy from all these side-channel attacks that seem to argue for disabling it altogether. In that case, maybe they don't want to burn silicon doubling-down on a feature some big customers might simply opt not to use.
BTW, there's a very good solution to the SMT side-channel attack problem, which Google has already implemented in Linux: only allowing threads from the same process to share a core. If you do that, there's no cause for concern. It could also be implemented at the hypervisor level, which might already be the case for all I know.
Especially given how wide AVX-512 has become along with all the extra registers needed to support it.
You can't skimp on ISA registers. The only registers you can partition are the shadow registers, and I think we don't know how many they have for vector.
FWIW, Zen3 has 64 FP scheduler slots, which I think means you're not going to have more than 64 FP shadow registers. Since AVX-512 already increased the FP ISA registers to 32, we're not talking about a lot to partition between many threads.
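To put rough numbers on that, here's a quick sketch in Python. It takes the figures above at face value (64 shadow registers is only my upper-bound guess, not a confirmed spec) and assumes the shadow registers are split evenly and statically between threads, which real cores may not do. The function name is made up for illustration.

```python
# Back-of-the-envelope register budget per SMT thread.
# ASSUMPTIONS: 64 shadow (rename) registers is a guess inferred from the
# FP scheduler size, and an even static split is a simplification.
SHADOW_REGS = 64       # guessed upper bound on FP/vector shadow registers
ISA_VECTOR_REGS = 32   # architectural vector registers per thread with AVX-512


def register_budget(smt_ways):
    """Return register counts for a core running `smt_ways` threads."""
    return {
        # ISA registers can't be shared: every thread needs its own full set.
        "architectural_total": ISA_VECTOR_REGS * smt_ways,
        # Only the shadow registers can be partitioned between threads.
        "shadow_per_thread": SHADOW_REGS // smt_ways,
    }


for ways in (1, 2, 4, 8):
    budget = register_budget(ways)
    print(f"SMT{ways}: {budget['architectural_total']} ISA regs total, "
          f"{budget['shadow_per_thread']} shadow regs per thread")
```

Under those assumptions, going from SMT2 to SMT8 quadruples the ISA registers you have to provide while leaving each thread a quarter of the shadow registers it had.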
I just don't see the value in a Hybrid solution that was clearly designed for Mobile to be forced onto Desktop users.
Huh? Do you think CPUs have an infinite silicon and power budget? Because that's the only universe in which E-cores don't also make sense for desktop.
As I said before, they give you 60% of P-core performance at 25% of the area and about 20% of the power. How is that not relevant for desktops, when we live in a world where all-core clocks are lower than peak single-thread clocks? Not to mention die area -> cost.
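If you want to see what those ratios imply, here's the arithmetic as a small Python sketch. The three ratios are just the ones I quoted; the variable and function names are only for illustration, and it ignores things like shared cache, interconnect, and scaling losses.

```python
# E-core figures relative to a P-core (P-core = 1.0), as quoted above.
E_PERF = 0.60    # relative performance
E_AREA = 0.25    # relative die area
E_POWER = 0.20   # relative power


def relative_efficiency(perf, cost):
    """Performance per unit of cost, relative to a P-core (which scores 1.0)."""
    return perf / cost


perf_per_area = relative_efficiency(E_PERF, E_AREA)
perf_per_watt = relative_efficiency(E_PERF, E_POWER)

# How many E-cores fit in one P-core's area, and their combined throughput
# on a perfectly parallel workload (an idealization).
ecores_per_pcore_area = 1 / E_AREA
throughput_same_area = ecores_per_pcore_area * E_PERF

print(f"perf per area vs. P-core: {perf_per_area:.1f}x")
print(f"perf per watt vs. P-core: {perf_per_watt:.1f}x")
print(f"E-cores per P-core area:  {ecores_per_pcore_area:.0f}")
print(f"throughput in that area:  {throughput_same_area:.1f}x")
```

So on a workload that scales across cores, the same silicon spent on E-cores gets you a multiple of the throughput, which matters on a desktop just as much as on a laptop.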
Never shall the two meet and suffer a Ringbus performance penalty because code touched the E-cores.
Ring bus doesn't really have anything to do with E-cores. It's outlived its usefulness, anyhow. IMO, they shouldn't use it in any CPU with more than 8 bus stops anyway. Mesh scales way better.
Never shall we waste die area for a Thread Director because we need a "Hybrid big.LITTLE solution".
I'm sure the Thread Director is like 0.1% of die area. The kinds of stats it keeps are mostly those already tracked by core performance counters.
Furthermore, you incorrectly assume the Thread Director is only useful for P- vs. E-core scheduling, but I assure you it's quite valuable for Hyper-Threading, because vector workloads scale so poorly with > 1 thread per core. It'll be telling if Intel leaves it in Sapphire Rapids.