News: AMD's Infinity Cache May Solve Big Navi's Rumored Mediocre Memory Bandwidth

Chung Leong

Reputable
Dec 6, 2019
GPUs deal with huge amounts of data. I don't know how caching would solve the memory bandwidth problem here ... it would just add latency if the data needs fetching from VRAM most of the time.

It'll help with ray tracing mainly. A cache large enough to store the upper levels of the BVH should greatly reduce the number of reads from VRAM.
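The idea can be sketched with a toy model (Python; the tree depth and cached-level counts below are illustrative assumptions, not real RDNA 2 figures): if the top levels of the BVH stay resident in on-die cache, each ray's root-to-leaf descent only touches VRAM for the lower levels.

```python
# Toy model: VRAM reads per ray during a BVH descent when the top
# levels of the tree are pinned in on-die cache. One node is touched
# per level; cached levels cost no VRAM read. Numbers are assumptions.

def vram_reads_per_ray(tree_depth: int, cached_levels: int) -> int:
    """Nodes fetched from VRAM for one root-to-leaf traversal."""
    return max(0, tree_depth - cached_levels)

depth = 20                                   # ~1M-leaf BVH (assumed)
no_cache = vram_reads_per_ray(depth, 0)      # every level hits VRAM
with_cache = vram_reads_per_ray(depth, 12)   # top 12 levels cached
print(no_cache, with_cache)
```

With those assumed numbers, caching the upper levels cuts VRAM traffic per ray by more than half, and the upper levels are exactly the nodes every ray touches, so they cache well.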
 
GPUs deal with huge amounts of data. I don't know how caching would solve the memory bandwidth problem here ... it would just add latency if the data needs fetching from VRAM most of the time.

Pre-emptive scheduling. It's no different from how a CPU prefetches data into its cache when the decode engine starts reading instructions and predicts which sections of memory it will read from.

<prefetch> Okay, I'm going to assign a CU to render a block in the upper-left corner. Let's grab all the predicted textures in advance and create a small frame buffer for it.
<CU> Setting up triangles and calculating lighting in advance. Also doing ray hit testing. (200 cycles)
<CU> Okay, I'm ready to apply texture 1. Luckily for me, it's already in the cache and I'm ready to go. (5-cycle penalty)
<CU> Okay, I'm ready to apply texture 2. Luckily for me, it's already in the cache and I'm ready to go. (5-cycle penalty)
<CU> Okay, I'm ready to apply texture 3. Luckily for me, it's already in the cache and I'm ready to go. (5-cycle penalty)

Old way:

<prefetch> Okay, I'm going to assign a CU to render a block in the upper-left corner.
<CU> Setting up triangles and calculating lighting in advance. Also doing ray hit testing. (200 cycles)
<CU> Okay, I'm ready to apply texture 1. Let me retrieve it from VRAM. (20-cycle penalty)
<CU> Okay, I need texture 2. Let me retrieve that from VRAM. (20-cycle penalty)
<CU> Okay, I need texture 3. Let me retrieve that from VRAM. (20-cycle penalty)
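Tallying the cycle figures from the two walkthroughs above makes the difference concrete (a minimal Python sketch; the 200/5/20-cycle numbers are the hypothetical values used in the walkthroughs, not measured hardware latencies):

```python
# Sum the illustrative cycle counts for one rendered block
# under the prefetched (cached) flow vs. the old VRAM-fetch flow.

SETUP = 200       # triangle setup, lighting, ray hit testing
CACHE_HIT = 5     # texture already resident in cache
VRAM_FETCH = 20   # texture fetched from VRAM on demand
TEXTURES = 3

prefetched = SETUP + TEXTURES * CACHE_HIT    # cached flow
old_way = SETUP + TEXTURES * VRAM_FETCH      # on-demand flow
print(prefetched, old_way)                   # 215 vs. 260 cycles
```

The per-texture saving compounds: the more texture fetches hide behind cache hits, the larger the gap grows relative to the fixed setup cost.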

CUs handle small blocks at a time, so you only need relatively small chunks of cache for them.

Plus, cross-CU cache-coherency delays are greatly reduced. This matters when one CU is reading/writing memory that another CU is working on in the same memory space. (It's also why CrossFire/SLI didn't work well and had glitches.)

See the diff?
 
I know what cache is and how it works but I don't know if 128MB is enough to offset a 128-bit bandwidth disadvantage. However, I also don't know that it isn't enough to offset a 128-bit bandwidth disadvantage. I'm sure that the ATi engineers do know (which is why they're ATi's engineers).

If it works, it's ingenious. If it doesn't, it's moronic. Having said that, it probably will.
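One way to reason about whether a big cache can offset a narrower bus is a back-of-envelope effective-bandwidth estimate (Python sketch; the bus bandwidth, cache bandwidth, and hit rates below are assumptions for illustration, not confirmed Big Navi specs):

```python
# Blend cache and VRAM bandwidth by hit rate to estimate the
# effective bandwidth the shader cores see. All figures assumed.

def effective_bandwidth(hit_rate: float, cache_bw: float, vram_bw: float) -> float:
    """Requests that hit are served at cache speed, misses at VRAM speed."""
    return hit_rate * cache_bw + (1.0 - hit_rate) * vram_bw

VRAM_BW = 512.0    # GB/s (assumed: 256-bit GDDR6 at 16 Gbps)
CACHE_BW = 1900.0  # GB/s (assumed on-die SRAM bandwidth)

for hr in (0.0, 0.5, 0.58):
    print(f"hit rate {hr:.0%}: ~{effective_bandwidth(hr, CACHE_BW, VRAM_BW):.0f} GB/s")
```

Under these assumptions, even a ~50% hit rate more than doubles the bandwidth the cores effectively see, which is why the hit rate at 4K, not the raw bus width, is the number that decides whether the design works.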
 
It wouldn't surprise me if the "rumours" of poor memory bandwidth originated in Nvidia's marketing department.
I wouldn't be surprised either. They're just shady enough to do it.
See the diff?
Yep. As long as 128MB is large enough to make a difference, it should work beautifully.
 

JayNor

Reputable
May 31, 2019
Intel came out with Rambo Cache for their Ponte Vecchio GPU. Perhaps AMD considered Rambo Infinity, but marketing had the last say.
 
I know what cache is and how it works but I don't know if 128MB is enough to offset a 128-bit bandwidth disadvantage. However, I also don't know that it isn't enough to offset a 128-bit bandwidth disadvantage. I'm sure that the ATi engineers do know (which is why they're ATi's engineers).

If it works, it's ingenious. If it doesn't, it's moronic. Having said that, it probably will.

What I do know is that Lisa Su said they are competing at the high end. So whatever the secret sauce ends up being, I can confidently say AMD will be competitive with Nvidia. Until Lisa Su fails to deliver, I'm going to trust what she says, as she has built up enough equity. I'm no AMD fanboy, even though I can see how it may come off that way; I simply have high confidence in AMD's execution under Lisa Su.
 
What I do know is that Lisa Su said they are competing at the high end. So whatever the secret sauce ends up being, I can confidently say AMD will be competitive with Nvidia. Until Lisa Su fails to deliver, I'm going to trust what she says, as she has built up enough equity. I'm no AMD fanboy, even though I can see how it may come off that way; I simply have high confidence in AMD's execution under Lisa Su.
Yeah, that's my line of thinking as well. She called it a "halo product," and AMD was apparently "underwhelmed" by the RTX 3090 (to be honest, so were most people), which is why I said it will probably work.