PC Doldrums: Quarterly Shipments Hit Lowest Levels Since 2007

Status
Not open for further replies.

Let's play a different game: which CPU cache tier is actually missing an analogue in GPUs?
- L3 in CPUs traditionally sits between the memory controllers and everything else. That is where L2 sits in GPUs, so adding an L3 would be redundant: there is already a cache at that position.
- L2 in CPUs sits at the boundary between the uncore and the core(s) it serves, which is where L1 sits in GPUs.
- L1 in CPUs has always been dedicated to individual cores. GPUs currently have no equivalent, because the task of getting data where it needs to be before thread batches execute is delegated to the thread scheduler and the massive data crossbars tying everything within the SM/CU together.

If GPUs had to operate at 4GHz, those crossbars and associated resources would need to get much smaller and more local to fit within the much shorter clock cycle.
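
To put a rough number on the 4GHz point, here is a minimal sketch of the per-cycle time budget. The 1.5 GHz GPU clock is an assumed illustrative figure, not something stated in this thread, and the helper name `cycle_time_ps` is made up for the example. It only computes clock periods; it says nothing about how far a signal can actually travel across a crossbar in that time.

```python
def cycle_time_ps(freq_ghz: float) -> float:
    """Return the clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


# Assumption for illustration only: a GPU core clock of about 1.5 GHz.
GPU_CLOCK_GHZ = 1.5
# The CPU-class clock mentioned in the post.
TARGET_CLOCK_GHZ = 4.0

gpu_budget = cycle_time_ps(GPU_CLOCK_GHZ)
target_budget = cycle_time_ps(TARGET_CLOCK_GHZ)

print(f"{GPU_CLOCK_GHZ} GHz: {gpu_budget:.0f} ps per cycle")
print(f"{TARGET_CLOCK_GHZ} GHz: {target_budget:.0f} ps per cycle")
print(f"Per-cycle budget shrinks by {gpu_budget / target_budget:.2f}x")
```

Under that assumption, anything that currently needs a full cycle to cross the SM/CU would have well under half the time to do it, which is the reason the structures would have to become smaller and more local.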
 
I already stated my case. I have nothing more to add.

If you require further insight, check out the above CUDA Programming Guide or AMD's OpenCL optimization guide from 2012:

http://developer.amd.com/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/
(the fonts now seem a bit off, particularly some of the headings)
 