AMD's Navi 31 to Feature 384-Bit Memory Interface

Oh ok, good that AMD is realising dumping a huge cache isn't the way forward. This time they stop at 192MB.

It does work well at 1080p but starts dropping off once you hit 1440p and 4K. There is no way a small cache is enough at higher resolutions - you need extra memory bandwidth too!

Not to mention 5nm is a lot more expensive than 7nm, so you don't want to waste that silicon on just cache memory.
 

They're not abandoning the cache idea - it's both: the cache and the bus width. The cache capacity alone just isn't enough once (if) computing power has doubled.

Now, about spending expensive 5 nm silicon budget on cache: the cache is supposed to be on the IO die and, if I'm not mistaken, the IO die is 6 nm. A huge bus width makes the PCB expensive (more layers), so AMD is balancing between expensive cache, expensive GDDR and an expensive PCB.
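To put that trade-off in rough numbers, here's a back-of-napkin model of effective bandwidth as a hit-weighted mix of cache and GDDR bandwidth. This is a minimal sketch: the hit rates and bandwidth figures are assumed for illustration, not AMD's actual numbers.

```python
# Back-of-napkin effective-bandwidth model. All numbers are illustrative
# guesses, not AMD specs: the point is the shape of the curve, not the values.
CACHE_BW_GBPS = 2000.0   # assumed on-die cache bandwidth
GDDR_BW_GBPS = 576.0     # assumed 384-bit GDDR6 figure (48 bytes * 12 GT/s)

# Assumed hit rates: the working set outgrows the cache as resolution rises.
hit_rates = {"1080p": 0.75, "1440p": 0.60, "4K": 0.40}

for res, hr in hit_rates.items():
    eff = hr * CACHE_BW_GBPS + (1.0 - hr) * GDDR_BW_GBPS
    print(f"{res}: hit rate {hr:.0%} -> effective bandwidth ~{eff:.0f} GB/s")
```

The made-up values don't matter; the slope does - as the working set outgrows the cache at higher resolutions, you lean harder on the GDDR bus, which is the argument for widening it as well as growing the cache.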
 

I'm pretty certain I've seen projected specs showing the Infinity Cache sizes will be going up in the next-gen GPUs, so both are welcome.
 
That's actually nice - hopefully it helps performance quite a bit for bandwidth-heavy operations, and it matches the high-end RTX 3080 series (the 12 GB models?) and the 3090 series.
I would definitely imagine it costs more, and it might generate a bit more heat and consume a bit more power as well.
 
The new AMD cards seem like a nice balance; I really hope they deliver. I built my workstation as an SFF system, and when I look at Ampere, I know that will not happen for me.
The AMD side seems promising - there is nothing there that screams 300W idle... so I have hope that a dual-slot GPU won't be red-hot each time I recalculate something.
Fingers crossed, guys. I am running a 1060 right now and I really need something stronger; I take too many coffee breaks.
 
AMD's flagship RDNA 3 GPU will apparently feature a wider 384-bit memory interface to increase overall bandwidth, according to some driver patch code.

AMD's Navi 31 to Feature 384-Bit Memory Interface: Read more

It's an interesting design choice, and one I had not expected. When I did my paper-napkin block diagram, I sketched out a separate memory controller chip to handle the IO; the cache was on another chiplet and the CUs on another. I think AMD had several such designs on the table, including one with the compute clusters broken into two chips. CUs generate a lot of heat, as they are built around SIMD/MIMD matrix FP16 ops, so I thought it logical to break them out and bin them. I was more worried about CU -> memory interface -> CU coherency than about the memory-interface-to-memory path, given the inherent latency. But memory latency is a huge issue too: long access times followed by a burst of high bandwidth.

This one seems optimized for memory bandwidth (the MCDs) over compute. Memory interfaces tend to run cooler and slower because of trace and access latencies to the memory itself. Having independent memory controllers improves efficiency on ops that don't require 192-bit-wide access: fetching 32 bits of data over a 192-bit bus is a waste due to latency, and if the next data you need is non-sequential, you get an even larger stall in the pipe. With a narrow 32-bit bus on each MCD, the likelihood of that happening is considerably smaller, but you have to design for parallel workloads, each doing separate work, with the cache kept for local draw-call buffers. The final pass, where you composite the scene and flip the screen buffer, will be an interesting algorithm: while one section of the screen is being composited mostly out of Infinity Cache using a tiled approach, the IO controller will have to line up, in parallel, optimal accesses to GDDR for compositing the next section.
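A minimal sketch of that granularity point, simplified to single-beat transfers (real GDDR6 moves multi-beat bursts, so the actual granularity is larger, but the ratio argument is the same):

```python
# Sketch of the access-granularity argument. A fetch always moves a full
# channel-width transfer; anything you didn't ask for is wasted bandwidth.
def useful_fraction(request_bits: int, channel_bits: int) -> float:
    """Fraction of one full-width transfer that carries requested data."""
    return min(request_bits, channel_bits) / channel_bits

REQUEST_BITS = 32  # the small, non-sequential fetch from the example above

for width in (192, 64, 32):
    print(f"{width:>3}-bit channel: {useful_fraction(REQUEST_BITS, width):.0%} useful")
```

The narrower each independent channel, the less of every transfer is wasted on a small random fetch - which is the case for splitting the bus across independent MCDs.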

Driver optimizations in the past focused on keeping commonly accessed compiled shaders and assets in GDDR memory. This will lead to a new scheduling technique: AI can be used to predict which blocks of GDDR memory will be used before they are even executed. Then you have a "shopping line" bucket-sort scheduling problem - NP-complete - to optimize retrieval before the data is needed. Compute execution will then be ordered by whichever memory is ready first.
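For flavor, a toy version of that retrieval-ordering step. Everything here is invented for illustration - the block IDs, deadlines, and the earliest-deadline-first greedy rule standing in for an exact (NP-complete) solution:

```python
# Toy prefetch scheduler for predicted GDDR blocks. All values hypothetical:
# a real driver would use a cheap heuristic like this, not an exact solver.
from dataclasses import dataclass

@dataclass
class Prefetch:
    block_id: int       # predicted GDDR block
    deadline_us: float  # when compute is predicted to need it
    size_kb: int        # transfer size

predicted = [
    Prefetch(block_id=7, deadline_us=40.0, size_kb=256),
    Prefetch(block_id=2, deadline_us=10.0, size_kb=64),
    Prefetch(block_id=5, deadline_us=10.0, size_kb=512),
]

# Greedy rule: earliest deadline first, larger transfers first on ties so
# long fetches start early. Compute work is then issued as its data lands.
for p in sorted(predicted, key=lambda p: (p.deadline_us, -p.size_kb)):
    print(f"fetch block {p.block_id} ({p.size_kb} KB) before t={p.deadline_us} us")
```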
 
Does anyone else think this sentence is completely unnecessary?

"If AMD's Navi 31 indeed comes with a 384-bit memory bus, that would indirectly mean the company is positioning its upcoming flagship GPU higher than its current-generation Navi 21."

Is anyone seriously expecting the new-gen flagship to be slower than the current flagship?
 
You're thinking about performance, but Anton is referring to the target market. Navi 21 went for the high-end to just barely extreme price range ($580 to $1100). Nvidia GA102 went higher, $700 to $1500 (and even $2000 with the 3090 Ti launch). Of course Navi 31 will be faster than Navi 21, but we also think there's a good chance AMD will go after the absolute performance and extreme price crowd. That's speculation, but then so is a LOT of stuff with this news. So it could be a $1,500 part.

Frankly, I'm still VERY skeptical of the "guesswork" showing chiplet MCDs. That makes no sense to me. Each MCD would need to link to the GCD via, what, Infinity Fabric? But having six 64-bit MCDs would most likely mean having six Infinity Fabric links, which hasn't really simplified anything. The only thing that would accomplish is moving the cache off the GPU and onto a separate chip, which maybe works out okay. Anyway, I'm not saying AMD isn't going this route, but I still think a larger MCD + cache with a varying number of enabled memory links (binning), connecting to multiple GCDs, would be more sensible.

Also, the prospect of 3D V-Cache might seem tantalizing, but it basically added $100 to the cost of the 5800X3D. Maybe smaller cache chips only add $25 to $50 per MCD, but that's still up to $300 added to the raw bill of materials, which would mean the cards would have to cost basically $500 more.
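Spelling that arithmetic out (a sketch; the ~1.7x BOM-to-retail factor is an assumption chosen to land near the "$500 more" figure):

```python
# The cost math from the post, spelled out. Per-MCD adders are the post's
# $25-$50 guesses; the retail multiplier is my assumption, not a known figure.
MCDS = 6
for per_mcd in (25, 50):
    bom_adder = MCDS * per_mcd          # $150 or $300 added to the BOM
    retail_adder = bom_adder * 1.7      # assumed BOM-to-shelf-price factor
    print(f"${per_mcd}/MCD -> +${bom_adder} BOM, roughly +${retail_adder:.0f} retail")
```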
 
