Indeed, we take care to write this code so that it is optimized enough. As for the SIMD sqrt, if we had to hand-code a SIMD version by reviewing instruction set manuals or writing assembly code, the whole point of "easy write" would be lost. A hand-coded Verilog implementation of hardware could be...