Tradesman1 :
Intel in Dual channel sees the DRAM as a 128 bit device, in tri channel on 1366 as a 192 bit device and in quad mode on 2011 as a 256 bit device
If they were ganged that would be true. However, Intel has used interleaved memory for quite some time, and as far as I know AMD supports it as well. Interleaving ping-pongs physical address assignments in cache-block-sized chunks (64 bytes) across banks, ranks, and channels (each level can be enabled independently).
No interleaving:
Addresses 0-63 will be on channel0, rank0, bank0
Addresses 64-127 will be on channel0, rank0, bank0
Addresses 128-191 will be on channel0, rank0, bank0
Addresses 192-255 will be on channel0, rank0, bank0
This pattern repeats until channel0, rank0, bank0 is full, then moves on to channel0, rank0, bank1, and so on.
Channel interleaving:
Addresses 0-63 will be on channel0, rank0, bank0
Addresses 64-127 will be on channel1, rank0, bank0
Addresses 128-191 will be on channel2, rank0, bank0
Addresses 192-255 will be on channel3, rank0, bank0
Addresses 256-319 will be on channel0, rank0, bank0
This pattern repeats until rank0, bank0 is full on each channel, then moves on to rank0, bank1, and so on.
Bank interleaving:
Addresses 0-63 will be on channel0, rank0, bank0
Addresses 64-127 will be on channel0, rank0, bank1
Addresses 128-191 will be on channel0, rank0, bank2
Addresses 192-255 will be on channel0, rank0, bank3
Addresses 256-319 will be on channel0, rank0, bank4
Addresses 320-383 will be on channel0, rank0, bank5
Addresses 384-447 will be on channel0, rank0, bank6
Addresses 448-511 will be on channel0, rank0, bank7
Addresses 512-575 will be on channel0, rank0, bank0
This pattern repeats until all of the banks on channel0, rank0 are full, and then moves on to channel0, rank1 (if installed) and eventually to channel1, rank0.
Channel and bank interleaving:
Addresses 0-63 will be on channel0, rank0, bank0
Addresses 64-127 will be on channel1, rank0, bank0
Addresses 128-191 will be on channel2, rank0, bank0
Addresses 192-255 will be on channel3, rank0, bank0
Addresses 256-319 will be on channel0, rank0, bank1
Addresses 320-383 will be on channel1, rank0, bank1
Addresses 384-447 will be on channel2, rank0, bank1
Addresses 448-511 will be on channel3, rank0, bank1
This pattern repeats through the 8 DDR3 banks until all are full, then moves on to the next rank.
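The four patterns above can be sketched as a small mapping function. This is a simplified illustration, not Intel's actual address decoder: the geometry constants, the `location` function, and the decision to consume channel bits before bank bits are all assumptions made for the example, and rank selection and rollover when a bank fills are omitted.

```python
# Sketch of the address-to-location mapping described above.
# Assumed geometry: 4 channels, 8 banks per rank, 64-byte interleave
# granule. Rank selection and rollover into the next rank/channel are
# omitted, so this only models the low addresses shown in the examples.
BLOCK = 64      # cache-block-sized interleave chunk
CHANNELS = 4    # e.g. quad channel on LGA 2011
BANKS = 8       # DDR3 banks per rank

def location(addr, channel_ilv=False, bank_ilv=False):
    """Map a physical address to a (channel, bank) pair."""
    block = addr // BLOCK
    channel = 0
    if channel_ilv:
        channel = block % CHANNELS
        block //= CHANNELS          # channel bits consumed first
    bank = block % BANKS if bank_ilv else 0
    return channel, bank

# Channel interleaving: address 256 wraps back to channel0, bank0
assert location(256, channel_ilv=True) == (0, 0)
# Bank interleaving: address 512 wraps back to bank0 on channel0
assert location(512, bank_ilv=True) == (0, 0)
# Channel + bank interleaving: address 256 lands on channel0, bank1
assert location(256, channel_ilv=True, bank_ilv=True) == (0, 1)
```

Running it reproduces the address lists above for each interleaving mode.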
The benefit of using four independent 64-bit DRAM channels over a single large 256-bit DRAM channel is reduced latency for small datasets, especially those that have little spatial locality.
Loading a physically contiguous 256-byte data set aligned on a 256-byte boundary using channel interleaving would be incredibly quick: just select rank0, bank0 on each channel and read the desired column address. However, loading the same data set using bank interleaving requires opening four rows on four banks of one channel (banks 0, 1, 2, 3) and burst transferring from all of them sequentially; multiple channels do not help here. Without any interleaving, the memory controller would have to perform four separate read operations from the same row (or two rows, if the data set crosses a row boundary) on a single bank.
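To put numbers on that comparison, here is a self-contained sketch that counts the distinct channel/bank pairs a contiguous read spans under each scheme. The geometry (4 channels, 8 banks, 64-byte granule) and the `touched` helper are assumptions made for the example; real controllers also juggle ranks, rows, and timing.

```python
# Count which (channel, bank) pairs a contiguous read touches, assuming
# a 4-channel, 8-bank geometry with a 64-byte interleave granule.
# Simplified illustration; rank selection and rollover are omitted.
BLOCK, CHANNELS, BANKS = 64, 4, 8

def touched(start, size, channel_ilv, bank_ilv):
    """Return the set of (channel, bank) pairs a read of size bytes spans."""
    pairs = set()
    for addr in range(start, start + size, BLOCK):
        block = addr // BLOCK
        channel = 0
        if channel_ilv:
            channel = block % CHANNELS
            block //= CHANNELS
        bank = block % BANKS if bank_ilv else 0
        pairs.add((channel, bank))
    return pairs

# Channel interleaving: bank0 on each of the four channels, read in parallel
assert touched(0, 256, True, False) == {(0, 0), (1, 0), (2, 0), (3, 0)}
# Bank interleaving: four banks on channel0, burst-transferred sequentially
assert touched(0, 256, False, True) == {(0, 0), (0, 1), (0, 2), (0, 3)}
```

The channel-interleaved read spreads across all four channels (which can transfer in parallel), while the bank-interleaved read stacks four banks onto one channel's data bus.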
Similarly, loading a physically contiguous 64-byte data set (a single cache block) using channel interleaving would be quick if the memory controller could operate all four channels independently, but if the channels are ganged together, 192 of the 256 bytes transferred will be masked off. As a result, loading two or more unrelated 64-byte data sets may block, incurring a latency penalty. Loading the same 64-byte data set using bank interleaving is very simple as well. However, if no interleaving is done, the memory controller may be blocked while it waits for another read operation on the same bank to complete. Ganging in a bank-interleaved configuration would be quite useless, as there is a non-unit stride between the addresses associated with each channel.
Channel and bank interleaving provides the best of both worlds and results in lower random access times across the board.