One very significant piece of information is missing from this article: which particular ports on the controller are being used makes a big difference.
Most of the embedded chipsets (or external chipsets) carry a multiplexer between SATA and PCI Express. The CPU accepts PCI Express connections, not SATA, so a conversion must be made, and the SATA chipset does it. Each PCI Express 2.0 lane signals at 5 GT/s, which is roughly 4 Gb/s (about 500 MB/s) of usable bandwidth after 8b/10b encoding, and each PCI Express 3.0 lane signals at 8 GT/s, which is roughly 7.9 Gb/s (about 985 MB/s) usable.
Here's the problem I have seen with external expansion cards. They connect 4 SATA ports to a single PCI Express 2.0 lane. So potentially, four connected SATA 6 Gb/s drives, or 24 Gb/s of total I/O throughput, are being funneled into a single 5 Gb/s connection to the CPU. I don't care how good the SATA chipset is at processing and prioritizing I/O data, you are going to have an I/O bottleneck. Even four SATA 3 Gb/s drives add up to 12 Gb/s of throughput, more than a single PCI Express 2.0 lane can handle. SSDs can exceed 3 Gb/s, so it is not a theoretical bottleneck, it is a very real limitation.
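To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It assumes nominal line rates and only accounts for line encoding (8b/10b for SATA and PCI Express 2.0, 128b/130b for PCI Express 3.0); real controllers lose a bit more to protocol overhead, and the function names are just my own for illustration.

```python
# Nominal per-lane signalling rate (Gb/s) and line-encoding efficiency.
PCIE_GEN = {
    2: (5.0, 8 / 10),     # PCIe 2.0: 5 GT/s, 8b/10b encoding
    3: (8.0, 128 / 130),  # PCIe 3.0: 8 GT/s, 128b/130b encoding
}
SATA_ENCODING = 8 / 10    # SATA also uses 8b/10b encoding


def pcie_usable_gbps(gen, lanes=1):
    """Usable bandwidth of a PCIe link in Gb/s (encoding overhead only)."""
    raw, efficiency = PCIE_GEN[gen]
    return raw * efficiency * lanes


def sata_usable_gbps(line_rate_gbps, drives=1):
    """Combined usable bandwidth of several SATA ports in Gb/s."""
    return line_rate_gbps * SATA_ENCODING * drives


def oversubscription(drives, sata_rate_gbps, gen, lanes=1):
    """How many times the drives' peak demand exceeds the PCIe link."""
    return sata_usable_gbps(sata_rate_gbps, drives) / pcie_usable_gbps(gen, lanes)


# Four SATA 6 Gb/s drives behind one PCIe 2.0 lane: about 4.8x oversubscribed.
print(f"{oversubscription(4, 6.0, gen=2):.1f}x")
# Four SATA 3 Gb/s drives behind the same lane: about 2.4x oversubscribed.
print(f"{oversubscription(4, 3.0, gen=2):.1f}x")
```

Anything above 1.0x means the drives can collectively ask for more than the link can deliver, so the ratio only bites when several drives are busy at the same time.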
So, going back to the article: at most, I have seen 4 SATA ports connected to a single PCI Express 2.0 lane. I have seen 6 or 8 ports connected to either 2 discrete lanes or an x2 link (or an x4 link when talking about SAS); an x2 link carries approximately 10 Gb/s of total throughput. So depending on how the embedded chipset is implemented on the motherboard, it may be the PCI Express lanes giving you the throughput limitation and not the SATA chipset. Different ports may be connected to different x1 lanes or to an x2 link, giving you either two discrete paths to the CPU, maximizing throughput, or a larger pipeline to the CPU, which is better than an x1 lane but not nearly as good as discrete pathways.
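As a rough illustration of why the wiring matters, the sketch below estimates what each busy drive gets under a few layouts. It assumes PCI Express 2.0 lanes, SATA 6 Gb/s drives, encoding overhead only, and a perfectly even split of link bandwidth among busy drives, which real controllers will not match exactly; the layout names and the helper function are hypothetical.

```python
PCIE2_LANE_USABLE_GBPS = 5.0 * 8 / 10  # 5 GT/s with 8b/10b encoding
SATA3_USABLE_GBPS = 6.0 * 8 / 10       # 6 Gb/s with 8b/10b encoding


def per_drive_share_gbps(busy_drives, lanes):
    """Even split of the link among busy drives, capped at the SATA port rate."""
    link_gbps = PCIE2_LANE_USABLE_GBPS * lanes
    return min(SATA3_USABLE_GBPS, link_gbps / busy_drives)


# (busy drives sharing the link, PCIe 2.0 lanes in the link)
layouts = {
    "4 busy drives on one x1 link": (4, 1),
    "8 busy drives on one x2 link": (8, 2),
    "2 busy drives on one x2 link": (2, 2),
    "1 drive on its own x1 link": (1, 1),
}

for name, (busy_drives, lanes) in layouts.items():
    share = per_drive_share_gbps(busy_drives, lanes)
    print(f"{name}: {share:.1f} Gb/s per drive")
```

The point of the comparison is that the per-drive figure depends on how many busy drives share a link and how wide that link is, not on the SATA port speed alone.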
I have an external PCI Express controller with a few drives on my main system, and when transferring files from drives on the internal (motherboard) chipset to drives on the connected card, there is a noticeable throughput difference.