gamerk316:
Boy, you people talk like using CPU extensions is trivial. They only apply in VERY specific situations, and most workloads aren't designed to easily allow these instructions to be used.
Half true, I'd say. I do concede that, broadly speaking, you can't optimize past a certain point: AVX, SSE or whatever won't work magic on its own, since you'd have to rework the code itself around a given "feature" before it can be used. BUT, for regular code that *already has* code paths that allow for this kind of "easy" speed-up, it is worth it. This debate goes back to the first x87 vs MMX talks.
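To make the distinction concrete, here's a rough sketch in C. The function names (saxpy, list_sum) and the node struct are made up for illustration, not taken from any real code base. The first loop is the "easy" kind: independent iterations over contiguous arrays, which compilers such as GCC or Clang can usually turn into SSE/AVX instructions on their own when optimizations are enabled (e.g. -O3). The second one chases pointers and carries a dependency from one step to the next, so it generally gets nothing from the extensions unless you redesign the data layout.

```c
#include <stddef.h>

/* "Easy" case (hypothetical example): each y[i] depends only on x[i] and y[i],
   and the data is contiguous. 'restrict' promises the arrays don't overlap,
   which helps the compiler decide that vectorizing is safe. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical linked-list node, used only for this illustration. */
struct node {
    float value;
    struct node *next;
};

/* "Hard" case: the next address is only known after loading the current
   node, so the elements can't simply be fetched several at a time.
   Getting a SIMD speed-up here would mean reworking the data structure
   (for example, storing the values in a plain array). */
float list_sum(const struct node *head)
{
    float sum = 0.0f;
    for (const struct node *p = head; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}
```

Whether the first loop actually gets vectorized depends on the compiler, flags and target CPU, so check the generated assembly or the compiler's vectorization report before assuming it did.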
Cheers!