News AMD's Infinity Cache May Solve Big Navi's Rumored Mediocre Memory Bandwidth

Admin · Oct 5, 2020

AMD has patented Infinity Cache, lending credence to the rumors of its existence.

awolfe63 · Oct 5, 2020

You can't patent a term, name, or phrase in the U.S. They have a trademark.

nofanneeded · Oct 5, 2020

GPUs deal with a huge amount of data. I don't know how caching would solve the memory bandwidth problem here ... it would add latency if it has to fetch from VRAM most of the time.

Conahl · Oct 5, 2020

View: https://www.youtube.com/watch?v=PBUn4RmBRDc

Chung Leong · Oct 6, 2020

nofanneeded said:
GPUs deal with a huge amount of data. I don't know how caching would solve the memory bandwidth problem here ... it would add latency if it has to fetch from VRAM most of the time.

It'll mainly help with ray tracing. A cache large enough to hold the upper levels of the BVH should greatly reduce the number of reads from VRAM.
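
For a rough sense of scale, the sketch below counts how many complete top levels of a BVH would fit in a cache of a given size. The full binary tree shape, the 64-byte node size, and the 128 MiB capacity are all assumptions chosen for illustration, not figures from AMD, and `bvh_levels_that_fit` is a hypothetical helper.

```python
def bvh_levels_that_fit(cache_bytes, node_bytes=64):
    """Return how many complete top levels of a binary BVH fit in the cache.

    Assumes a full binary tree, so the top k levels hold 2**k - 1 nodes.
    node_bytes=64 is a hypothetical node size, not a hardware figure.
    """
    levels = 0
    # Keep adding one more level while the whole top portion still fits.
    while (2 ** (levels + 1) - 1) * node_bytes <= cache_bytes:
        levels += 1
    return levels


# Assumed cache capacity for illustration: 128 MiB, all of it given to the BVH.
cache_bytes = 128 * 1024 * 1024
levels = bvh_levels_that_fit(cache_bytes)
print(levels, 2 ** levels - 1)  # levels that fit, and the node count they hold
```

Every ray starts its traversal at the root, so the upper levels are the nodes touched most often, which is why keeping them on-chip would cut VRAM reads the most.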

Friesiansam · Oct 6, 2020

It wouldn't surprise me if the 'rumours' of poor memory bandwidth originated in Nvidia's marketing department.

JamesSneed · Oct 6, 2020

awolfe63 said:
You can't patent a term, name, or phrase in the U.S. They have a trademark.

AMD trademarked the name, but they also patented the concept behind it.

Likely patent: https://www.freepatentsonline.com/20200293445.pdf

digitalgriffin · Oct 6, 2020

nofanneeded said:
GPUs deal with a huge amount of data. I don't know how caching would solve the memory bandwidth problem here ... it would add latency if it has to fetch from VRAM most of the time.

Pre-emptive scheduling. It's no different from how a CPU prefetches data and puts it into a cache when the decode engine starts reading instructions and makes predictions about what sections of memory it will read from.

<prefetch> Okay, I'm going to assign a CU to render a block in the upper left-hand corner. Let's grab all the predicted textures in advance and create a small frame buffer for it.
<CU> Setting up triangles and calculating lighting in advance. Also doing ray hit testing. (200 cycles)
<CU> Okay, I'm ready to apply texture 1. Luckily for me, it's already in the cache and I'm ready to go. (5 cycle penalty)
<CU> Okay, I'm ready to apply texture 2. Luckily for me, it's already in the cache and I'm ready to go. (5 cycle penalty)
<CU> Okay, I'm ready to apply texture 3. Luckily for me, it's already in the cache and I'm ready to go. (5 cycle penalty)

Old way:

<prefetch> Okay, I'm going to assign a CU to render a block in the upper left-hand corner.
<CU> Setting up triangles and calculating lighting in advance. Also doing ray hit testing. (200 cycles)
<CU> Okay I'm ready to apply that texture. Let me retrieve texture 1 from VRAM (20 cycle penalty)
<CU> Okay I need texture 2. Let me retrieve that from VRAM (20 cycle penalty)
<CU> Okay I need texture 3. Let me retrieve that from VRAM (20 cycle penalty)

CUs handle small blocks at a time, so you only need relatively small chunks of cache for them.

Plus, cross-CU cache coherency delays are much lower. This matters when one CU block is reading from or writing to the same memory space as another. (It's also why CrossFire/SLI didn't work well and had glitches.)

See the diff?
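
Tallying the two walkthroughs makes the gap concrete. This is only a sketch: the cycle counts are the illustrative numbers from the steps above, not measured hardware latencies, and `total_cycles` is a hypothetical helper.

```python
# Illustrative cycle counts from the walkthrough above (not measured figures).
SETUP_CYCLES = 200        # triangle setup, lighting, ray hit testing
CACHE_HIT_PENALTY = 5     # texture already sitting in the cache
VRAM_FETCH_PENALTY = 20   # texture has to be fetched from VRAM


def total_cycles(texture_count, per_texture_penalty, setup=SETUP_CYCLES):
    """Setup work plus one penalty per texture applied."""
    return setup + texture_count * per_texture_penalty


with_prefetch = total_cycles(3, CACHE_HIT_PENALTY)
old_way = total_cycles(3, VRAM_FETCH_PENALTY)
saved = old_way - with_prefetch
print(with_prefetch, old_way, saved)  # 215 260 45
```

With these numbers the saving grows linearly with the number of textures a block needs, since the setup cost is the same in both cases.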

Avro Arrow · Oct 6, 2020

I know what cache is and how it works but I don't know if 128MB is enough to offset a 128-bit bandwidth disadvantage. However, I also don't know that it isn't enough to offset a 128-bit bandwidth disadvantage. I'm sure that the ATi engineers do know (which is why they're ATi's engineers).

If it works, it's ingenious. If it doesn't, it's moronic. Having said that, it probably will.
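
One back-of-envelope way to frame the question is to ask what cache hit rate a narrower bus would need in order to keep up with a wider one. The sketch below is only that: the 256-bit and 384-bit widths and the 16 Gbps per-pin data rate are hypothetical inputs, both function names are made up for illustration, and the model assumes the cache itself is never the bottleneck.

```python
def bus_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bytes per transfer times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def required_hit_rate(narrow_gb_s, wide_gb_s):
    """Hit rate h at which the narrow bus only has to carry what it can.

    If a fraction h of traffic is served from cache, VRAM sees (1 - h) of it,
    so matching the wide bus needs (1 - h) * wide <= narrow.
    """
    if narrow_gb_s >= wide_gb_s:
        return 0.0
    return 1.0 - narrow_gb_s / wide_gb_s


# Hypothetical configurations: same memory speed, buses 128 bits apart.
narrow = bus_bandwidth_gb_s(256, 16.0)
wide = bus_bandwidth_gb_s(384, 16.0)
print(narrow, wide, round(required_hit_rate(narrow, wide), 3))
```

Whether 128MB actually reaches the needed hit rate depends on the workload's working set, which this simple model does not capture.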

Avro Arrow · Oct 6, 2020

Friesiansam said:
It wouldn't surprise me if the 'rumours' of poor memory bandwidth originated in Nvidia's marketing department.

I wouldn't be surprised either. They're just shady enough to do it.

digitalgriffin said:
Pre-emptive scheduling. It's no different from how a CPU prefetches data and puts it into a cache when the decode engine starts reading instructions and makes predictions about what sections of memory it will read from.

<prefetch> Okay, I'm going to assign a CU to render a block in the upper left-hand corner. Let's grab all the predicted textures in advance and create a small frame buffer for it.
<CU> Setting up triangles and calculating lighting in advance. Also doing ray hit testing. (200 cycles)
<CU> Okay, I'm ready to apply texture 1. Luckily for me, it's already in the cache and I'm ready to go. (5 cycle penalty)
<CU> Okay, I'm ready to apply texture 2. Luckily for me, it's already in the cache and I'm ready to go. (5 cycle penalty)
<CU> Okay, I'm ready to apply texture 3. Luckily for me, it's already in the cache and I'm ready to go. (5 cycle penalty)

Old way:

<prefetch> Okay, I'm going to assign a CU to render a block in the upper left-hand corner.
<CU> Setting up triangles and calculating lighting in advance. Also doing ray hit testing. (200 cycles)
<CU> Okay I'm ready to apply that texture. Let me retrieve texture 1 from VRAM (20 cycle penalty)
<CU> Okay I need texture 2. Let me retrieve that from VRAM (20 cycle penalty)
<CU> Okay I need texture 3. Let me retrieve that from VRAM (20 cycle penalty)

CUs handle small blocks at a time, so you only need relatively small chunks of cache for them.

Plus, cross-CU cache coherency delays are much lower. This matters when one CU block is reading from or writing to the same memory space as another. (It's also why CrossFire/SLI didn't work well and had glitches.)

See the diff?

Yep. As long as 128MB is large enough to make a difference, it should work beautifully.

JayNor · Oct 6, 2020

Intel came out with Rambo Cache for their Ponte Vecchio GPU. Perhaps AMD considered Rambo Infinity, but marketing had the last say.

digitalgriffin · Oct 6, 2020

JayNor said:
Intel came out with Rambo Cache for their Ponte Vecchio GPU. Perhaps AMD considered Rambo Infinity, but marketing had the last say.

eDRAM first showed up in consumer hardware with the Xbox One, as far as I can remember.

JamesSneed · Oct 7, 2020

Avro Arrow said:
I know what cache is and how it works but I don't know if 128MB is enough to offset a 128-bit bandwidth disadvantage. However, I also don't know that it isn't enough to offset a 128-bit bandwidth disadvantage. I'm sure that the ATi engineers do know (which is why they're ATi's engineers).

If it works, it's ingenious. If it doesn't, it's moronic. Having said that, it probably will.

What I do know is that Lisa Su said they are competing at the high end. So whatever the secret sauce really ends up being, I can confidently say AMD will be competitive with Nvidia. Until Lisa Su doesn't deliver, I'm going to trust what she says, as she has built up enough equity. I'm no AMD fanboy, even though I can see how that may come off; I simply have high confidence in AMD's execution under Lisa Su.

Avro Arrow · Oct 8, 2020

JamesSneed said:
What I do know is that Lisa Su said they are competing at the high end. So whatever the secret sauce really ends up being, I can confidently say AMD will be competitive with Nvidia. Until Lisa Su doesn't deliver, I'm going to trust what she says, as she has built up enough equity. I'm no AMD fanboy, even though I can see how that may come off; I simply have high confidence in AMD's execution under Lisa Su.

Yeah, that's my line of thinking as well. She called it a "Halo Product", and AMD was apparently "underwhelmed" by the RTX 3090 (to be honest, so were most people), which is why I said that it will probably work.
