The author of this thread is currently away, but I will try to help you.
First off, if you have 3 slots and only plan to use two cards, running them at x8/x8 will be best, but the last slot will most likely have to be left empty.
These combinations exist because a system only has so many PCI-e lanes available.
So if you have 16 lanes (some boards have more or fewer to work with; boards with fewer clearly should not offer SLI or CrossFire as an option), you normally split them up just like your board does:
Single card: use all 16.
2 cards: use 8 per slot.
3 cards: use 8 for the top card and split the remaining 8 lanes between the last 2 cards (4 each). PCI-e does not work with non-standard configurations: link widths come in fixed sizes (x1, x2, x4, x8, x16), so a card cannot run at x5, for instance.
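To make the splits above concrete, here is a minimal sketch assuming a board with 16 lanes available for graphics. The split table and the names `split_lanes` and `is_valid_width` are hypothetical, made up for illustration; your board's manual is the real authority on which configurations it supports.

```python
# Hypothetical model of how a 16-lane board splits lanes between graphics cards.
# The table is an illustrative assumption, not taken from any specific board.

VALID_WIDTHS = {1, 2, 4, 8, 16}  # link widths commonly used on desktop boards

# Assumed split pattern: x16, x8/x8, x8/x4/x4.
SPLITS_16 = {
    1: [16],
    2: [8, 8],
    3: [8, 4, 4],
}

def split_lanes(num_cards: int, total_lanes: int = 16) -> list[int]:
    """Return the per-slot link widths for num_cards on a 16-lane board."""
    if total_lanes != 16:
        raise ValueError("this sketch only models 16-lane boards")
    if num_cards not in SPLITS_16:
        raise ValueError(f"no standard split for {num_cards} cards")
    widths = SPLITS_16[num_cards]
    # Every width must be a standard size, and the total must fit the board.
    assert all(w in VALID_WIDTHS for w in widths)
    assert sum(widths) <= total_lanes
    return widths

def is_valid_width(width: int) -> bool:
    """A card cannot run at a non-standard width such as x5."""
    return width in VALID_WIDTHS

for cards in (1, 2, 3):
    print(cards, "card(s):", " + ".join(f"x{w}" for w in split_lanes(cards)))
print("x5 valid?", is_valid_width(5))
```

Running it prints x16, x8 + x8 and x8 + x4 + x4 for one, two and three cards, and reports that x5 is not a valid width.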
Now, how much performance do you lose with fewer lanes? For that, I recommend these articles from TechPowerUp. The first one is a bit older, but the cards are still very fast and it gives you an idea of what to expect. These are single-card tests as well. If you would not lose much with a single card, you should not lose much in a multi-card setup either (again, with all but the fastest cards on the market).
Ivy Bridge PCI-Express Scaling with HD 7970 and GTX 680
and
GeForce GTX 980 PCI-Express Scaling
You will notice these tests also compare different PCI-e revisions. This mattered when people moved from 2.0 to 3.0, because many worried that the new 3.0 cards would be too slow on 2.0 systems. In practice, it takes a pretty fast card to need more bandwidth than that.
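A rough way to see why the revision matters: PCI-e 2.0 runs at 5 GT/s per lane with 8b/10b encoding, while 3.0 runs at 8 GT/s with 128b/130b encoding, so a 3.0 x8 link carries about as much as a 2.0 x16 link. The sketch below just does that arithmetic; it ignores packet and protocol overhead, so real-world throughput is somewhat lower, and the function name `link_bandwidth_gbs` is made up for illustration.

```python
# Approximate one-direction PCI-e link bandwidth, ignoring protocol overhead.

# Transfer rate (GT/s) and line-encoding efficiency per revision.
REVISIONS = {
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}

def link_bandwidth_gbs(revision: str, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s for a given revision and width."""
    rate_gt, efficiency = REVISIONS[revision]
    # Each transfer moves one bit per lane; divide by 8 to get bytes.
    return rate_gt * efficiency * lanes / 8

for rev, lanes in [("2.0", 16), ("3.0", 8), ("3.0", 16)]:
    print(f"PCI-e {rev} x{lanes}: {link_bandwidth_gbs(rev, lanes):.2f} GB/s")
```

This prints 8.00 GB/s for 2.0 x16, 7.88 GB/s for 3.0 x8 and 15.75 GB/s for 3.0 x16, which is why a 3.0 card in an x8 slot, or on a 2.0 x16 board, usually has bandwidth to spare.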
I hope this helps clear some things up.