The document you're reading is correct.
To do ECC you need two things: circuitry to run the ECC algorithm, and space to store the check bits. Since the ECC circuitry is located on the memory controller, all you need is a memory controller that does ECC and you're set. The problem is that if you run ECC on a non-parity or non-ECC module, you lose space to the extra bits stored as redundancy. An ECC module comes with that extra space built in, so error detection and correction have somewhere to keep their bits. Basically, a non-ECC 128 MB module would give up some of its capacity to ECC, whereas an ECC 128 MB module has extra memory chips on it to hold the checking/correcting bits.
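To put rough numbers on the space trade-off, here is a small sketch. It assumes 8 check bits per 64 data bits, which is the usual 72-bit ECC layout but is not something the document states; the function names are made up for illustration.

```python
def usable_data_mb(raw_mb, data_bits=64, check_bits=8):
    """Data capacity left if the check bits are carved out of the same chips.

    Assumes (hypothetically) that every data_bits of data needs check_bits
    of redundancy stored alongside it.
    """
    return raw_mb * data_bits / (data_bits + check_bits)


def raw_mb_needed(data_mb, data_bits=64, check_bits=8):
    """Raw chip capacity an ECC module needs to offer data_mb of usable data."""
    return data_mb * (data_bits + check_bits) / data_bits


if __name__ == "__main__":
    # Non-ECC 128 MB module, with ECC bits squeezed into the same chips.
    print(round(usable_data_mb(128), 1))   # about 113.8 MB left for data
    # ECC 128 MB module: extra chips hold the check bits.
    print(raw_mb_needed(128))              # 144.0 MB of raw storage
```

So under that assumption the non-ECC module would lose about a ninth of its capacity, while the ECC module carries one-eighth more raw storage than its label says.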
Now if that's really possible in practice, I'm not sure. But on paper it should work.
I think the algorithm described is called a Hamming code. The document describes two different types of error checking: parity and Hamming codes. Parity is just one extra bit. It can detect a single-bit error (more precisely, any odd number of flipped bits), but it cannot correct anything and it misses an even number of flipped bits. Hamming codes use more check bits but can correct a single-bit error, and with one additional overall parity bit (SECDED) they can also detect double-bit errors.
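Here is a minimal sketch of both schemes, to make the difference concrete. It uses the textbook Hamming(7,4) code (4 data bits, 3 check bits), not whatever wider code a real memory controller implements, and the function names are my own.

```python
def parity_bit(bits):
    """Even parity: the extra bit that makes the total count of 1s even."""
    return sum(bits) % 2


def parity_ok(bits_with_parity):
    """True if the word (data + parity bit) still has an even number of 1s."""
    return sum(bits_with_parity) % 2 == 0


def hamming74_encode(data):
    """Encode [d1, d2, d3, d4] as the codeword p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data bits, error position); position 0 means no error seen.

    The three recomputed checks form a syndrome that spells out the
    1-based position of a single flipped bit, which is then flipped back.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos


if __name__ == "__main__":
    data = [1, 0, 1, 1]

    word = data + [parity_bit(data)]
    word[0] ^= 1
    print(parity_ok(word))   # False: one flip is detected, but not located
    word[1] ^= 1
    print(parity_ok(word))   # True: a second flip hides the error

    code = hamming74_encode(data)
    code[4] ^= 1             # corrupt position 5
    print(hamming74_decode(code))   # ([1, 0, 1, 1], 5): located and fixed
```

Parity only tells you that something went wrong; the Hamming syndrome tells you which bit, so the controller can flip it back.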
I'm not sure how it carries on beyond this, but I would think there is wasted bandwidth between the RAM and the memory controller. For example, to deliver 12 bits of data, the memory controller must actually read 16 bits in ECC mode, run the algorithm, and return the 12 bits. Those aren't exact numbers, just an example.
From the given information, I think we can deduce that running the ECC algorithm adds some overhead.
We can also speculate that there might be wasted bandwidth (assuming the bandwidth to the memory controller is the same as what the memory is rated for). But you could also fix this by adding a few extra lines to the bus for the extra bits. So if the standard is 12 bits non-ECC and 16 bits ECC, then you just need a 16-bit-wide bus and leave the extra 4 lines unused in non-ECC mode.
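The bus-width argument can be sketched with the same made-up 12/16 numbers. This is only a toy model: it assumes each codeword (data plus check bits) has to cross the bus whole, taking as many transfers as its width requires, and the helper name is hypothetical.

```python
import math


def data_bits_per_transfer(data_bits, check_bits, bus_bits):
    """Useful data bits delivered per bus transfer in this toy model.

    A codeword of data_bits + check_bits needs ceil(codeword / bus_bits)
    transfers; only data_bits of it are payload.
    """
    codeword = data_bits + check_bits
    transfers = math.ceil(codeword / bus_bits)
    return data_bits / transfers


if __name__ == "__main__":
    print(data_bits_per_transfer(12, 0, 12))   # 12.0: non-ECC on a 12-bit bus
    print(data_bits_per_transfer(12, 4, 12))   # 6.0: ECC squeezed onto the same bus
    print(data_bits_per_transfer(12, 4, 16))   # 12.0: ECC on a widened 16-bit bus
```

In this model, widening the bus by the number of check bits brings the data rate back to the non-ECC figure, which is the point of the extra lines.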
This is easy to test if you have an ECC module, obviously: just plug it in, run Sandra, and see whether the memory bandwidth differs from a normal non-ECC module. If not, then all that's left to test is the overhead from the ECC algorithm.