That's not what SMT/HT is, and there is no such thing as "commands"; they're instructions. Superscalar uArch is what enabled prediction and pre-execution, and that came before SMT existed.
x86 cheat sheet that lists all the registers.
https://cs.brown.edu/courses/cs033/docs/guides/x64_cheatsheet.pdf
Essentially what we call a "core" is just a bunch of registers exposed to the OS via the BIOS. What we call "threads" are just streams of binary operations that get executed on those registers. Assembly lets you see what is actually happening on the hardware without all the abstraction.
Basic Hello World program in ASM, assembled with NASM.
https://www.devdungeon.com/content/hello-world-nasm-assembler
Code:
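; Build on 32-bit-capable Linux, assuming the file is saved as hello.asm:
;   nasm -f elf32 hello.asm && ld -m elf_i386 -o hello hello.o

; Define variables in the data section
SECTION .data                    ; NASM section names are case-sensitive
hello:     db 'Hello world!',10  ; the string plus a newline (ASCII 10)
helloLen:  equ $-hello           ; current address minus start of string

; Code goes in the text section
SECTION .text
GLOBAL _start
_start:
    mov eax,4         ; 'write' system call = 4
    mov ebx,1         ; file descriptor 1 = STDOUT
    mov ecx,hello     ; address of the string to write
    mov edx,helloLen  ; length of string to write
    int 80h           ; call the kernel

    ; Terminate program
    mov eax,1         ; 'exit' system call = 1
    mov ebx,0         ; exit with error code 0
    int 80h           ; call the kernel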
Under _start we see the x86 instructions. The first moves the value 4 into the 32-bit EAX register, the next moves 1 into EBX, then the memory address of the string labelled "hello" goes into ECX and the length of that string into EDX. Finally, int 80h raises software interrupt 80h, the interrupt-table entry Linux uses for 32-bit system calls. The kernel then runs its own machine code, which reads those register values (EAX = 4 selects 'write') and sends the string to STDOUT, which ends up on the screen.
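For comparison, the same 'write' request can be made from a higher-level language, where the register setup is hidden behind a library call. A minimal sketch in Python, assuming a Unix-like OS; say_hello is a made-up helper name, and how the call actually reaches the kernel (int 80h or a newer mechanism) depends on the platform:

```python
import os

# Same bytes as the 'hello' label in the NASM listing (10 = newline).
HELLO = b"Hello world!\n"

def say_hello(fd=1):
    """Ask the kernel to write HELLO to file descriptor fd (1 = STDOUT).

    os.write wraps the OS-level write call and returns the number of
    bytes written. say_hello itself is a hypothetical helper.
    """
    return os.write(fd, HELLO)

if __name__ == "__main__":
    say_hello()
```

The three arguments the assembly loads into EBX, ECX and EDX are all still here: the descriptor, the buffer, and the length (implied by the bytes object).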
For MS-DOS we would use a different function, called through INT 21h rather than INT 80h:
https://medium.com/ax1al/dos-assembly-101-4c3660957d25
If you wanted to do it without calling an OS function, you could instead take the memory address of the video memory representing the screen, known as the frame buffer, and do a MOV <address>,<variable> to copy its contents into video memory, where it would show up on the screen. For IBM PC BIOS compatible systems the VGA graphics frame buffer is a 64 KB window at segment A000, which is physical addresses A0000h through AFFFFh.
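The frame buffer idea boils down to address arithmetic. The sketch below is a model, not real hardware access: it assumes VGA mode 13h (320x200, one byte per pixel), uses a bytearray as a stand-in for video memory, and the names pixel_offset, pixel_address and put_pixel are invented for illustration.

```python
WIDTH, HEIGHT = 320, 200   # assumption: VGA mode 13h, one byte per pixel
FRAME_BASE = 0xA0000       # physical address where segment A000h begins
FRAME_SIZE = 0x10000       # 64 KB window, A0000h through AFFFFh

# Stand-in for video memory; real hardware would expose the region itself.
frame = bytearray(FRAME_SIZE)

def pixel_offset(x, y):
    """Byte offset of pixel (x, y) from the start of the frame buffer."""
    if not (0 <= x < WIDTH and 0 <= y < HEIGHT):
        raise ValueError("pixel outside the 320x200 screen")
    return y * WIDTH + x

def pixel_address(x, y):
    """Physical address a MOV would target for pixel (x, y)."""
    return FRAME_BASE + pixel_offset(x, y)

def put_pixel(x, y, colour):
    """Model of 'MOV <address>,<value>': store one colour byte."""
    frame[pixel_offset(x, y)] = colour & 0xFF

put_pixel(10, 2, 15)
print(hex(pixel_address(10, 2)))   # prints 0xa028a
```

A 320x200 screen needs 64000 bytes, which is why it fits inside the 64 KB window.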
That is how stuff actually gets done on the CPU. If there is only one set of registers, then it's impossible for more than one stream of instructions to be executed at any point in time. To execute another stream, the contents of the registers first have to be saved to memory (usually landing in cache), then the register values of the new stream are loaded in their place and execution continues from there. This is known as a context switch. If we instead create a second set of registers that the OS can see, the OS can send the other stream of instructions to that second set, avoiding the need for the expensive context switch. The OS genuinely believes there is a second valid CPU to process on.
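The difference between one register set and two can be modelled in a few lines. This is a toy simulation only: Machine, step and interleave are invented names, each "instruction" is just an increment of eax, and counting switches says nothing about real hardware or scheduler cost.

```python
def fresh_registers():
    """A blank register set for a stream that has never run."""
    return {"eax": 0, "ebx": 0, "ecx": 0, "edx": 0}

class Machine:
    """Toy CPU with a configurable number of OS-visible register sets."""

    def __init__(self, register_sets):
        self.sets = [{"owner": None, "regs": fresh_registers()}
                     for _ in range(register_sets)]
        self.saved = {}            # register contents parked in memory
        self.context_switches = 0

    def step(self, stream):
        """Run one 'inc eax' for the given stream and return its eax."""
        # Use the register set already holding this stream, if any.
        slot = next((s for s in self.sets if s["owner"] == stream), None)
        if slot is None:
            # Otherwise take an idle register set, if one exists.
            slot = next((s for s in self.sets if s["owner"] is None), None)
            if slot is None:
                # Every set is busy: save the old stream's registers first.
                slot = self.sets[0]
                self.saved[slot["owner"]] = slot["regs"]
                self.context_switches += 1
            slot["owner"] = stream
            # Reload this stream's saved registers, or start it fresh.
            slot["regs"] = self.saved.pop(stream, fresh_registers())
        slot["regs"]["eax"] += 1
        return slot["regs"]["eax"]

def interleave(register_sets, rounds=3):
    """Alternate two streams and count the context switches needed."""
    machine = Machine(register_sets)
    for _ in range(rounds):
        machine.step("A")
        machine.step("B")
    return machine.context_switches

print(interleave(1), interleave(2))   # prints: 5 0
```

With one register set, every change of stream after the first forces a save and reload; with two, each stream keeps its own set and no switch is counted.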