The instructions that actually execute inside the CPU are micro-ops, because no current x86 CPU executes the CISC x86 instruction set directly. Each x86 instruction is decoded into one or more simpler operations, which is why this used to be called RISC translation.
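To make the decoding idea concrete, here is a minimal toy sketch. It is only an illustration: real micro-op formats are undocumented and differ from chip to chip, and the `decode` function, the `tmp0`/`tmp1` temporaries and the `load`/`store` step names are all made up for this example.

```python
# Toy model only: real micro-op encodings are undocumented and chip-specific.
from typing import List


def decode(instruction: str) -> List[str]:
    """Split a simplified 'op dst, src' instruction into RISC-like steps."""
    op, operands = instruction.split(maxsplit=1)
    dst, src = [part.strip() for part in operands.split(",")]
    uops = []
    # A memory source needs a separate load step first.
    if src.startswith("["):
        uops.append(f"load tmp0, {src}")
        src = "tmp0"
    if dst.startswith("["):
        # Read-modify-write on memory: load, operate, store.
        uops.append(f"load tmp1, {dst}")
        uops.append(f"{op} tmp1, {src}")
        uops.append(f"store {dst}, tmp1")
    else:
        # Register destination: a single step is enough.
        uops.append(f"{op} {dst}, {src}")
    return uops
```

In this model, `decode("add rax, rbx")` stays a single step, while `decode("add [rbx], rax")` becomes a load, an add and a store.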
Microcode tells the CPU how to carry out the more complex instructions as sequences of these small RISC-like steps, hence its name. Generally the hardware defaults to the fastest path, but if an error is found in that path, a microcode update can tell the CPU to use a slower, longer workaround instead. You can think of microcode as a lookup table held in memory on the CPU itself (not in system RAM) that is consulted to find out how to do something, much like a File Allocation Table is consulted by the file system driver to find out where a file's data is.
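The lookup-table analogy can be sketched in a few lines. This is a toy model and not how any real CPU is organized; `FAST_PATH`, `patch_table`, `apply_update` and `dispatch` are hypothetical names chosen for the example.

```python
# Hypothetical names throughout; this models the idea, not any real CPU.

# Built-in default: each operation goes to its fastest hardware path.
FAST_PATH = {"mul": "hardware multiplier", "div": "hardware divider"}

# Entries a microcode update might add to route around a faulty fast path.
patch_table = {}


def apply_update(update):
    """Merge workaround entries from an update into the patch table."""
    patch_table.update(update)


def dispatch(op):
    """Return the path used for op; a patched entry takes priority."""
    if op in patch_table:
        return patch_table[op]
    return FAST_PATH[op]
```

Before any update, `dispatch("div")` picks the fast path. After `apply_update({"div": "slow microcoded sequence"})`, the same call returns the workaround, while `dispatch("mul")` is unaffected.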
A microcode update is loaded into the CPU during POST, and when the OS loads, if it has its own update it simply replaces the first one. Nothing is written permanently: the loaded copy is lost at power-off and the BIOS keeps its own copy, so both load again on the next boot.
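The load order described above can be modeled roughly as follows. This is a sketch under stated assumptions, not a description of any vendor's actual loader: `ToyCpu` and `boot` are invented for the example, the loaded table is treated as volatile state that a reset clears, and the "only a newer revision replaces the active one" rule is an assumption added here.

```python
class ToyCpu:
    """Holds the active update in volatile storage (toy model)."""

    def __init__(self):
        self.active_revision = None

    def power_cycle(self):
        # The loaded update is volatile, so a reset clears it.
        self.active_revision = None

    def load_update(self, revision):
        # Assumption: only a newer revision replaces the active one.
        if self.active_revision is None or revision > self.active_revision:
            self.active_revision = revision


def boot(cpu, bios_revision, os_revision=None):
    """Run one boot: firmware loads its update first, then the OS (if any)."""
    cpu.power_cycle()
    cpu.load_update(bios_revision)    # firmware stage, during POST
    if os_revision is not None:
        cpu.load_update(os_revision)  # OS stage, early in boot
    return cpu.active_revision
```

Calling `boot(cpu, 0x10, 0x12)` leaves revision `0x12` active, and booting the same machine again without an OS update, `boot(cpu, 0x10)`, falls back to the firmware's `0x10` because nothing was stored permanently.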
Programmers cannot see or write these micro-ops directly, but Intel's compiler is designed to optimize code to make the best use of them. To do that, the compiler has to target specific chips, and some designs such as the P4 performed terribly if code wasn't optimized just for them. Note that Prescott had a dedicated hardware integer multiplier which Northwood lacked (Northwood used the floating-point unit instead!), so compilers have to be that specific.