Question: Nvidia, does each SM have its own, independent 32-bit memory data bus?

80251
Jan 5, 2015
According to this https://forums.developer.nvidia.com/t/what-is-cores-per-sm/29997/4, CUDA cores are part of SMs. So does each SM provide its own MMU with a 32-bit memory data bus to specific VRAM ICs? Or is there an independent MMU that accesses all of VRAM on behalf of the individual SMs over some sort of ring bus or Infinity Fabric? In the past I thought I had read somewhere that each SM had its own independent, 32-bit memory data bus.
 
Thanks Eximo, your answer raises almost as many questions as it answers. Maybe each SM has its own MMU and they communicate with each other over some sort of bus? If each SM's MMU controlled a specific address space, then another SM's MMU needing data from that address space would know which MMU to send its request to. In that case the video card would be a NUMA device, despite all the SMs being on the same die? Are the L2 caches specific to each SM?
The waveforms for the GDDR6X and GDDR7 memory clocks were amazing to see -- way beyond simple edge-triggered or level-sensitive latches.
 
The L0 and L1 caches are on the SM, and the large L2 cache pool must be shared.
If each quadrant of the SM can pump out 32 threads per clock, perhaps there is an internal bus between them? They don't really go into detail on what the Load/Store blocks or the Special Function Units are capable of. That might be available in older white papers, though.
The Load/Store units might be able to hold enough data while waiting for the bus between the shaders to be free to transmit on, but the block diagram leaves much unspoken, I'm sure.