Well, having read the original Phoronix article and then the patch itself, I'd say it isn't so much an "optimization" as "removing a stupid limitation": an instruction available with AVX2 was restricted to AVX512 processors through a bunch of #ifdefs.
The patch is fewer than 20 lines long, and almost all of them replace an AVX512 check with an AVX2 one. That suggests the original programmer didn't bother checking which CPUs actually support the instruction; he only considered the CPU he happened to be running on. As far as I know, it's still bad practice to detect the platform instead of the feature. The only case where it's acceptable is when the feature is present but buggy and you have no efficient way to work around the bug.
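To make the platform-vs-feature distinction concrete, here's a minimal sketch in C. It is not the code from the patch; it assumes GCC or Clang on x86, and the names `add_u32`, `add_avx2`, `add_scalar` and the `HAVE_X86` macro are made up for illustration. The AVX2 path is chosen based on whether the CPU reports AVX2 at runtime (`__builtin_cpu_supports("avx2")`), not on whether it is some newer AVX512-class chip, and everything else falls back to plain scalar code.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1 /* hypothetical helper macro: x86 intrinsics are usable */
#endif

/* Portable fallback: runs on any CPU. Unsigned addition wraps on overflow. */
static void add_scalar(uint32_t *out, const uint32_t *a, const uint32_t *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

#ifdef HAVE_X86
/* Only this function is compiled with AVX2 enabled, so the same binary
   still runs on older CPUs as long as this path is never called there. */
__attribute__((target("avx2")))
static void add_avx2(uint32_t *out, const uint32_t *a, const uint32_t *b, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) { /* 8 x 32-bit lanes per 256-bit register */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_add_epi32(va, vb));
    }
    for (; i < n; i++) /* scalar tail for the leftover elements */
        out[i] = a[i] + b[i];
}
#endif

/* Dispatch on the feature the code actually needs (AVX2),
   not on the CPU family or on a newer ISA like AVX512. */
void add_u32(uint32_t *out, const uint32_t *a, const uint32_t *b, size_t n) {
#ifdef HAVE_X86
    if (__builtin_cpu_supports("avx2")) {
        add_avx2(out, a, b, n);
        return;
    }
#endif
    add_scalar(out, a, b, n);
}
```

Calling `add_u32(out, a, b, n)` gives the same result on every CPU; only the speed differs. A compile-time `#ifdef __AVX2__` would also count as feature detection; runtime dispatch just lets a single binary serve both old and new machines.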