I'm not knowledgeable enough to go into much detail, but the 2 'threads' on a core share resources within that core.
To have >2 "threads" per core, you'd need more per-thread resources, taking up more space.
Now sure, in theory, 1 core with 8 threads would probably take up marginally less space than 4 cores with 8 threads in total.
The best analogy (although ridiculously over-simplified) I've ever heard for HT or SMT was:
HT = hands
Core = mouth
Data to process = food
The hands can bring food to your mouth, but you only have one mouth - the second hand is ready to go as soon as the mouth is free.
Having 8 hands for only 1 mouth doesn't seem particularly efficient.
Sure, food/data is always ready to go as soon as the mouth/core is free, but that's also true with 2 threads.
Essentially, 8 threads per core would just create a ridiculous, unnecessary 'bottleneck' in the process.
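
To put rough numbers on the hands/mouth picture, here's a toy sketch in Python. It's my own made-up model, not how any real CPU schedules work: assume each hand independently has food ready on a given cycle with probability p_ready, and the mouth is busy whenever at least one hand is ready. The function name core_utilization and the 0.7 figure are purely illustrative assumptions.

```python
def core_utilization(n_threads: int, p_ready: float) -> float:
    """Toy model: fraction of cycles in which at least one thread is ready.

    Assumes each thread is ready independently with probability p_ready
    (an illustrative assumption, not a measured value).
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    if not 0.0 <= p_ready <= 1.0:
        raise ValueError("p_ready must be between 0 and 1")
    # The core idles only when every thread is stalled at the same time.
    return 1.0 - (1.0 - p_ready) ** n_threads


if __name__ == "__main__":
    p = 0.7  # hypothetical chance that a single thread has work ready
    previous = 0.0
    for n in (1, 2, 4, 8):
        u = core_utilization(n, p)
        # Show how much extra busy time each step up in thread count buys.
        print(f"{n} thread(s): core busy {u:.2%} (gain {u - previous:.2%})")
        previous = u
```

Under those assumptions, the second hand closes most of the gap, and going from 2 to 8 adds comparatively little while still costing the extra per-thread resources. Pick a much lower p_ready and the extra hands start to look more useful, so treat it as an illustration of the shape of the curve only.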
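Another way to look at it is to picture the clock cycle as a "wave", swinging back and forth, with the "hyperthread" running on the return swing. That's a very super simplified look, and it's not really how it works: in actual SMT hardware, instructions from both threads can issue in the same cycle, with the second thread filling whatever execution slots the first leaves idle. But it's an easy way to envision it.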