nVidia and ATI have taken different routes in designing their GPUs. As a result, shader code that runs efficiently on ATI cards isn't always as efficient on nVidia cards, and vice versa. The performance of ATI cards in particular varies because their shader units are organized into groups of 5 (a VLIW design). Depending on the instructions they are given, only one of the 5 units may be active, or all 5 can be. How many end up busy depends on how many independent operations can be packed together, which in turn depends on things like the order in which the variables are multiplied. In the nVidia architecture, by contrast, each shader unit works through scalar instructions independently, so there is no group of 5 to fill. Of course, it's a bit more complicated than that, so if you really want to know you should research it.
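To make the "order of multiplication" point concrete, here is a toy model (not how ATI's actual shader compiler works) that greedily packs operations into bundles of 5 slots, where an operation can only go into a bundle after everything it depends on. It assumes all 5 slots are interchangeable, which real hardware doesn't quite do (the fifth slot was a special transcendental unit), and the names `pack_bundles` and `utilization` are made up for illustration. It compares computing `a*b*c*d*e*f` as a left-to-right chain versus as a balanced tree.

```python
from typing import Dict, List

SLOTS_PER_BUNDLE = 5  # toy assumption: 5 interchangeable slots per bundle


def pack_bundles(ops: Dict[str, List[str]]) -> List[List[str]]:
    """Greedily place each op in the earliest bundle with a free slot that
    comes after all of its dependencies. `ops` maps op name -> names it
    depends on and must be listed in dependency order."""
    bundle_of: Dict[str, int] = {}
    bundles: List[List[str]] = []
    for name, deps in ops.items():
        # An op can't share a bundle with (or precede) its inputs.
        b = max((bundle_of[d] + 1 for d in deps), default=0)
        # Skip bundles that are already full.
        while b < len(bundles) and len(bundles[b]) >= SLOTS_PER_BUNDLE:
            b += 1
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(name)
        bundle_of[name] = b
    return bundles


def utilization(bundles: List[List[str]], n_ops: int) -> float:
    """Fraction of issued slots that did useful work."""
    return n_ops / (len(bundles) * SLOTS_PER_BUNDLE)


# ((((a*b)*c)*d)*e)*f : every multiply waits on the previous one.
chain = {"m1": [], "m2": ["m1"], "m3": ["m2"], "m4": ["m3"], "m5": ["m4"]}

# ((a*b)*(c*d))*(e*f) : same 5 multiplies, but 3 can run side by side.
tree = {"t1": [], "t2": [], "t3": [], "t4": ["t1", "t2"], "t5": ["t4", "t3"]}

if __name__ == "__main__":
    for label, ops in (("chain", chain), ("tree", tree)):
        b = pack_bundles(ops)
        print(f"{label}: {len(b)} bundles, {utilization(b, len(ops)):.0%} of slots used")
```

In this model the chain needs 5 bundles with one busy slot each, while the tree finishes in 3. A purely scalar design issues the same 5 multiplies either way, which is why reordering matters much more on the grouped design.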