The reason tri-fire/SLI is less susceptible to microstutter than dual is pretty simple: it's less likely to fall into a consistently uneven frame output pattern, and the pattern isn't as damaging when it does.
Consider these simple graphs over time. I've used numbers to represent when each card # is outputting a frame. Each digit or dash, left to right, counts as one millisecond. Assume each card is capable of one frame every 10 milliseconds.
Assuming load is staying fairly consistent, a single card setup would have the following output pattern:
1---------1---------1---------1---------1-
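If you want to play with these patterns yourself, here's a quick throwaway Python snippet (my own, not anything from the article) that builds this kind of timeline string for any number of identical cards, one character per millisecond:

    def timeline(cards, render_ms=10, total_ms=42):
        # spread each card's frames as evenly as the 1 ms resolution allows
        line = ["-"] * total_ms
        for t in range(total_ms):
            for c in range(cards):
                # card c gets an evenly spaced slot inside each render_ms window
                if t % render_ms == round(c * render_ms / cards):
                    line[t] = str(c + 1)
        return "".join(line)

    print(timeline(1))   # 1---------1---------1---------1---------1-

timeline(2) and timeline(3) spit out roughly the evenly spaced dual and tri-card patterns shown further down.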
A dual card setup, assuming two of the same cards as in the last diagram, would optimally (in terms of smoothness) have an output pattern that looks like this:
1----2----1----2----1----2----1----2----1-
In order to produce this output, your driver must be able to force one card to 'wait' for the other, and it needs good algorithms to detect just WHEN such waiting should happen. As the article describes, the reason nV cards show less microstutter is simply that they use better/stricter algorithms for imposing those 'wait times' in order to maintain even frame output over time.
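Just to make the 'force one card to wait' idea concrete, here's a toy sketch in Python (my own simplification, definitely NOT the actual NVIDIA or AMD metering logic): a frame that finishes early gets held back until its evenly spaced slot before it's presented.

    def present_paced(ready_times_ms, target_interval_ms):
        presented = []
        next_slot = 0
        for ready in ready_times_ms:
            # never present before the frame exists, and never before its even slot
            t = max(ready, next_slot)
            presented.append(t)
            next_slot = t + target_interval_ms
        return presented

    ready = [0, 1, 10, 11, 20, 21]   # two cards finishing frames almost back to back
    print(present_paced(ready, 5))   # [0, 5, 10, 15, 20, 25], evenly spaced again

The hard part in a real driver is picking that target interval on the fly while the load keeps changing, which is exactly where the 'better algorithms' come in.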
Now, back to the dual/tri card thing.
With only two cards, it's totally possible for two equally powered cards to fall into the following output pattern:
12--------12--------12--------12--------12
This pattern produces the alternating very short/very long frametimes that create the horrible microstutter profile seen in the graphs with two cards in xfire. (Note that this pattern ALSO happens to produce the *best* possible FPS scaling, because neither card is ever 'forced to wait' by the driver to keep frame times evenly distributed.)
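You can see the difference in a few lines of Python (my own example timestamps, in milliseconds, matching the two dual-card diagrams above). Both lists average 2 frames per 10 ms, so the FPS counter reads the same either way:

    even      = [0, 5, 10, 15, 20, 25, 30, 35]   # 1----2----1----2----...
    clustered = [0, 1, 10, 11, 20, 21, 30, 31]   # 12--------12--------...

    def frametimes(stamps):
        return [b - a for a, b in zip(stamps, stamps[1:])]

    print(frametimes(even))        # [5, 5, 5, 5, 5, 5, 5]  -> smooth
    print(frametimes(clustered))   # [1, 9, 1, 9, 1, 9, 1]  -> the microstutter profile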
Optimal output from three of these same cards would look something like this:
1---2--3--1---2--3--1---2--3--1---2--3--1
And the worst possible would look something like this:
123-------123-------123-------123-------123-------
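Running the same frametime check on this three-card worst case (again my own example timestamps, in milliseconds):

    tri_clustered = [0, 1, 2, 10, 11, 12, 20, 21, 22]   # 123-------123-------...
    print([b - a for a, b in zip(tri_clustered, tri_clustered[1:])])
    # [1, 1, 8, 1, 1, 8, 1, 1]: the long wait shrinks from 9 ms to 8 ms, and only
    # one frame in three arrives after a long wait instead of one in two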
Two things to note about this tri-card worst case, relative to the worst-case scenario with two cards:
#1 is that the stretch of time during which no frames are being produced is shorter relative to the stretch when they are. With two cards, frames arrive during 2 of every 10 milliseconds (20% of the time), whereas in this tri-gpu scenario frames arrive during 3 of every 10 milliseconds (30% of the time). Might not sound like much, but 3 is actually 50% more than 2.
#2 is that with 3 cards rendering frames over the course of each 10 millisecond period, it's statistically less likely that the cards will fall into the 'worst case' output pattern. IOW, the likelihood of this pattern developing (with two cards):
12--------12--------12
is statistically considerably higher than the probability of this pattern developing (with three cards):
123-------123-------123
And as explained in #1, even when it DOES, it's not as detrimental of a phenomenon.
With three cards, all of which are the same speed, you simply have less chance of having large gaps in time when no frames are being produced.
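To put some rough numbers on both points, here's a toy simulation (mine, not from the article). It assumes each card delivers a frame every 10 ms at a random fixed offset and the driver does no metering at all; the longest stretch with no new frame is what you feel as stutter:

    import random

    def avg_worst_gap(num_cards, period_ms=10.0, trials=100000):
        total = 0.0
        for _ in range(trials):
            phases = sorted(random.uniform(0, period_ms) for _ in range(num_cards))
            gaps = [b - a for a, b in zip(phases, phases[1:])]
            gaps.append(period_ms - phases[-1] + phases[0])   # wrap-around gap
            total += max(gaps)
        return total / trials

    print(avg_worst_gap(2))   # ~7.5 ms
    print(avg_worst_gap(3))   # ~6.1 ms

With two cards the longest frameless gap averages around 7.5 ms and regularly gets close to the full 10 ms; with three it averages around 6.1 ms, and getting anywhere near the worst case needs all three cards to bunch up at once, which is exactly the 'less likely, and less painful when it does happen' point above.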