I'm not saying you're wrong, and it could just be my lack of knowledge, but wouldn't replacing hyperthreading with E-cores result in worse performance, since they're lower power and thus lower clocked? Or would it result in more performance due to more physical cores? I could see both things happening.
It really depends on the workload.
First we have to understand what SMT really is: essentially a second x86 register set (a whole extra architectural thread context) bolted onto the same core. So, quick class on some basic superscalar uArch stuff.
Strictly speaking, there are no true x86 processors anymore, and there haven't been for a long time. Instead, both Intel and AMD build CPUs around their own proprietary internal instruction formats. The front-end instruction decoders accept x86 instructions and convert them into smaller proprietary micro-ops that get shipped to the scheduler, which then schedules them to execute on the core's internal resources. Basic integer operations are done on the Arithmetic Logic Units (ALUs), memory instructions are handled by the Address Generation Units (AGUs) and the load/store hardware behind the MMU, and floating point and SIMD (SSE/AVX/etc.) instructions are shipped off to the FPU / SIMD units. After the work is done, the result is written back to the architectural registers in a format identical to what native x86 execution would have produced.
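To make the "lots of execution units behind one instruction stream" point concrete, here's a toy C sketch of my own (nothing vendor-specific, and the exact overlap is an assumption that varies by core and compiler flags; build with something like gcc -O2): one dependent chain of operations has to wait on its own previous result every step, while several independent chains give the out-of-order machinery spare work to fill idle units and pipeline slots with, so roughly four times the work finishes in about the same wall time.

```c
/* Toy illustration of superscalar / out-of-order execution, not a real benchmark. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define N 200000000ULL

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    volatile uint64_t sink;   /* keeps the compiler from deleting the loops */

    /* One dependent chain: each step needs the previous result. */
    double t0 = now();
    uint64_t a = 1;
    for (uint64_t i = 0; i < N; i++)
        a = a * 3 + 1;
    sink = a;
    double t1 = now();

    /* Four independent chains: the out-of-order core can overlap them
       instead of making them wait in line behind each other. */
    uint64_t b0 = 1, b1 = 1, b2 = 1, b3 = 1;
    for (uint64_t i = 0; i < N; i++) {
        b0 = b0 * 3 + 1;
        b1 = b1 * 3 + 1;
        b2 = b2 * 3 + 1;
        b3 = b3 * 3 + 1;
    }
    sink = b0 ^ b1 ^ b2 ^ b3;
    double t2 = now();

    printf("1 chain : %.2f s\n", t1 - t0);
    printf("4 chains: %.2f s (about 4x the work)\n", t2 - t1);
    (void)sink;
    return 0;
}
```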
How does SMT fit into this? Well, just because the x86 programming model only describes one instruction stream at a time doesn't mean the core can't have multiples of those execution resources. CPU cores frequently have several ALUs, AGUs, load/store units, and FPUs, which means there are almost always some units sitting around doing nothing. If we introduce a second x86 register set on the front-end decoder, the core can accept two separate work streams, the decoder / scheduler can assign work from both to those idle units, and total throughput goes up. The downside is that, to the OS, the two hardware threads of one core just look like two more cores, with no built-in preference between even- and odd-numbered logical cores. If the OS assigns one busy thread to core 2 and another to core 3 (siblings on the same physical core), there's a high likelihood those threads end up fighting over that core's execution resources. There have been many methods to get around this; normally the OS scheduler sees the CPU family, looks up how to treat the different cores, and tries not to assign two busy threads to the same sibling pair.
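If you want to see those sibling pairs for yourself, here's a minimal Linux-only sketch (it assumes the standard sysfs layout, and the cap of 64 logical CPUs is an arbitrary choice of mine). Each thread_siblings_list entry lists the logical CPUs that share one physical core; a program pinning threads with something like pthread_setaffinity_np can use this to keep two busy threads off the same pair.

```c
/* Print which logical CPUs are SMT siblings of each other (Linux sysfs). */
#include <stdio.h>

int main(void) {
    char path[128], buf[64];

    for (int cpu = 0; cpu < 64; cpu++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                      /* no more CPUs */
        /* e.g. "2-3" or "2,3" means logical CPUs 2 and 3 share one core */
        if (fgets(buf, sizeof buf, f))
            printf("cpu%-3d siblings: %s", cpu, buf);
        fclose(f);
    }
    return 0;
}
```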
With the advent of heterogeneous computing, instead of an 8-core SMT CPU it might be better to build four heavy cores and sixteen thin cores. The heavy cores might end up with some unused execution resources, but lately it's been thermal dissipation that limits performance, not execution-resource availability.
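If you're curious how that split looks on a hybrid part, here's a rough Linux-only heuristic of mine (not an official way to detect core type): on current hybrid Intel chips the E-cores generally report a lower cpuinfo_max_freq, so dumping that per logical CPU makes the P/E split visible. Intel also exposes the core type directly through CPUID leaf 0x1A; this is just the lazy sysfs version, and again the 64-CPU cap is arbitrary.

```c
/* List each logical CPU's max frequency; on hybrid chips the slower
   group is usually the E-cores. */
#include <stdio.h>

int main(void) {
    char path[128];

    for (int cpu = 0; cpu < 64; cpu++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cpufreq/cpuinfo_max_freq", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                      /* no more CPUs (or no cpufreq driver) */
        long khz = 0;
        if (fscanf(f, "%ld", &khz) == 1)
            printf("cpu%-3d max %.1f GHz\n", cpu, khz / 1e6);
        fclose(f);
    }
    return 0;
}
```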