Hbm stacking vs gddr5

buxti8

Commendable
Mar 23, 2016
If I get 2 GPUs with 4GB of HBM each, will that give me 8GB of HBM, or will it be limited to 4GB like with GDDR5?
 
Math Geek

Titan
Ambassador
Each GPU will use the resources on its own card, so it is effectively 8GB of VRAM when both are working, though a single GPU won't use all 8GB when working alone.

They are still working on sharing resources like this at the enterprise level, and it will be a while before we see that kind of capability at the consumer level.
 

InvalidError

Titan
Moderator
Most of the resources on one GPU are duplicated on the other, which means putting two 4GB GPUs together still only gives you 4GB effective.

Accessing the other GPU's RAM would not be very effective anyway: HBM and 384+bit-wide GDDR5 are capable of 300+GB/s, while the PCIe 3.0 x16 interface is only capable of 16GB/s. Each GPU would have access to roughly 1/16th of the other's memory bandwidth, or 1/32 if you are doing CrossFire with x8/x8 PCIe. You would be better off using a single GPU and letting it access system RAM at 16GB/s, since DDR4-2133 can do 34GB/s.

The memory interface may be slightly different, but that changes nothing about off-GPU bottlenecks. Some other solution like NVLink is required if you want a high-bandwidth interconnect between chips for sharing local memory.
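The bandwidth gap can be sanity-checked with quick arithmetic. A minimal sketch using the round figures quoted above (real numbers vary by card and platform):

```python
# Rough bandwidth figures from the post above (GB/s); real hardware varies.
HBM_OR_WIDE_GDDR5 = 300   # local VRAM: HBM or 384+bit-wide GDDR5
PCIE3_X16 = 16            # PCIe 3.0 x16 link
PCIE3_X8 = 8              # PCIe 3.0 x8 (each card in x8/x8 CrossFire)
DDR4_2133_DUAL = 34       # dual-channel DDR4-2133 system RAM

# How much slower the other GPU's memory looks when accessed over the bus:
print(HBM_OR_WIDE_GDDR5 / PCIE3_X16)   # 18.75 -> roughly the "1/16th" above
print(HBM_OR_WIDE_GDDR5 / PCIE3_X8)    # 37.5  -> roughly the "1/32" above

# System RAM reads are capped by the PCIe link, not by DDR4 itself:
print(min(DDR4_2133_DUAL, PCIE3_X16))  # 16
```

Either way, anything that leaves the card moves at bus speed, which is why remote VRAM buys you so little.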
 
Solution