News RDNA 3 Reportedly More Than Doubles the Core Count of the RX 6000 Series GPUs

I hope these numbers are not fantasy, and not a repeat of Nvidia changing how they count things: their "double the CUDA cores" claim was BS, since the cores were effectively just counted twice.
If AMD is really going the MCM route with RDNA 3, then AMD can put together 2-4 dies' worth of GPU without fab process yields being an issue. That would allow for larger GPUs without the price going up significantly for only a 5-10 percent performance increase.

Other than MCM, I hope AMD has put more engineering into per-"core" performance, so three times the core count for only double the price of the 6900 XT would be nice.
 
If AMD is really going the MCM route with RDNA 3, then AMD can put together 2-4 dies' worth of GPU without fab process yields being an issue. That would allow for larger GPUs without the price going up significantly for only a 5-10 percent performance increase.

Other than MCM, I hope AMD has put more engineering into per-"core" performance, so three times the core count for only double the price of the 6900 XT would be nice.
If AMD goes the MCM route, I'm curious to know how they'll overcome the issues of what's essentially CrossFire on a single card.
 
Will we continue to increase wattage? I thought die shrinks were supposed to increase performance while decreasing power usage?
This was an observed trend called Dennard scaling: much like Moore's Law, it held that the power density of an IC remains constant even as transistor density goes up. However, it broke down around the mid-2000s.

Basically every rule of thumb you might've heard about IC manufacturing improving efficiency has broken down at some point during the 2000s or 2010s.
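
If anyone wants to see why that kept power in check, here's a rough back-of-the-envelope sketch of the classic scaling rules (my own illustration with made-up scale factors, not anything from the article):

```python
def dennard_scaling(k):
    """Per-transistor power and power density on the new node, relative to the old one."""
    capacitance = 1 / k      # C shrinks with feature size
    voltage = 1 / k          # V was assumed to shrink too (this is the part that stopped)
    frequency = k            # f goes up as transistors get smaller/faster
    density = k ** 2         # transistors per unit area

    power_per_transistor = capacitance * voltage ** 2 * frequency   # P ~ C * V^2 * f
    power_density = power_per_transistor * density
    return power_per_transistor, power_density

for k in (1.0, 1.4, 2.0):
    p, pd = dennard_scaling(k)
    print(f"shrink factor {k}: per-transistor power {p:.2f}x, power density {pd:.2f}x")

# Ideal output: power density stays at 1.00x no matter the shrink factor. Once
# voltage stopped scaling (leakage current), that constant turned into growth,
# which is why wattage keeps climbing despite die shrinks.
```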
 
"It's possible AMD will do something similar to Nvidia's Ampere where the number of shaders per functional unit (WGP)." - That's some wild speculation, even if unintelligible.
 
I hope these numbers are not fantasy, and not a repeat of Nvidia changing how they count things: their "double the CUDA cores" claim was BS, since the cores were effectively just counted twice.
No, NVIDIA's definition of a "CUDA core" has remained consistent. Since the only compute spec that seems to matter is FLOPS, NVIDIA has only ever counted the units that perform floating point operations. For the longest time (probably since GeForce 8), each shader unit was both FP and INT capable. Pascal had 128 such units per SM. When Turing came around, NVIDIA split that into 64 FP and 64 INT units. For Ampere, there are 128 FP units, but only 64 of them are capable of INT workloads, and they are partitioned off from the other 64 FP units.

The reason they did this was so an SM could do concurrent INT and FP workloads. But again, since the only compute spec that seems to matter is FLOPS, the FP units are what matter more.
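
For anyone curious, here's roughly how the headline FLOPS number falls out of that counting (my own sketch; the unit counts and boost clocks are the published figures as I remember them, so double-check before quoting):

```python
def fp32_tflops(fp32_units, boost_clock_ghz):
    # Each FP32 unit can retire one fused multiply-add per clock = 2 FP ops.
    return fp32_units * 2 * boost_clock_ghz / 1000

# GTX 1080 (Pascal): 20 SMs x 128 FP32 units, all of them also INT capable
print(f"Pascal GP104: {fp32_tflops(2560, 1.73):.1f} TFLOPS")

# RTX 2080 (Turing): 46 SMs x 64 FP32 units; the 64 INT units per SM aren't counted
print(f"Turing TU104: {fp32_tflops(2944, 1.71):.1f} TFLOPS")

# RTX 3080 (Ampere): 68 SMs x 128 FP32 units, but only 64 per SM can also do INT
print(f"Ampere GA102: {fp32_tflops(8704, 1.71):.1f} TFLOPS")

# The "CUDA core" count doubled on Ampere because every FP32-capable unit is
# counted, even though half of them also have to pick up the INT work.
```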

while being weaker
I mean, if you want to count half of the shader units in an SM losing the ability to do INT workloads, which are still a minority of workloads by a wide margin, as "weaker", then sure, I guess.
 
This.
A card can be amazing spec-wise, but if it suffers from CrossFire/SLI's latency and dropped frames... it's basically dead at launch.
Also my big concern, but I'm pretty sure that's why they tested Infinity Cache on the last gen. A huge pool of fast cache that's "local" to all dies should go a long way toward keeping frame times consistent.
 
Also my big concern, but I'm pretty sure that's why they tested Infinity Cache on the last gen. A huge pool of fast cache that's "local" to all dies should go a long way toward keeping frame times consistent.
It could, but at the same time, I'm not confident it'll amount to much given the performance issues on the 6400/6500 when they run out of VRAM. Unless the memory controller can somehow handle servicing requests from multiple GPUs for the same data at once, the GPUs are going to be starved for data from memory at least every other cycle.

The other issue is whether something other than AFR can be used. AFR has microstuttering issues, and that's something you can't fix with more cache. And while something like "render every nth pixel" would be a nice compromise between AFR and SFR, it also breaks anything that needs to know what color its neighbors are, unless you spend another pass having the chiplets send each other their frame buffers.

The problem GPUs have that CPUs don't is that the workload we give GPUs is a real-time problem. A frame needs to be done within, say, 16 ms if you want 60 FPS. The workloads we give CPUs are more like "if it gets done sooner, that's nice."
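
Just to put numbers on the real-time part (a trivial sketch of my own, nothing from the article):

```python
def frame_budget_ms(target_fps):
    """Time each frame has to be finished in to sustain the target frame rate."""
    return 1000.0 / target_fps

for fps in (30, 60, 120, 144):
    print(f"{fps:>3} FPS -> {frame_budget_ms(fps):5.2f} ms per frame")

# With AFR, each chiplet owns every other frame, so any mismatch between them
# lands directly on these deadlines as microstutter; more cache doesn't move
# the deadline, it only helps each chiplet hit it.
```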
 
If AMD is really going the MCM route with RDNA 3, then AMD can put together 2-4 dies' worth of GPU without fab process yields being an issue. That would allow for larger GPUs without the price going up significantly for only a 5-10 percent performance increase.
How would the price not go up significantly?
Each die they use in this GPU is one fewer GPU they can sell; that should be a huge factor in pricing them.
It's not like they have lots of fabs where they can produce dies at cost and not have to worry about it; they can only get so many dies from TSMC.
 
How would the price not go up significantly?
Each die they use in this GPU is one fewer GPU they can sell; that should be a huge factor in pricing them.
It's not like they have lots of fabs where they can produce dies at cost and not have to worry about it; they can only get so many dies from TSMC.
The same thing could be said about, say, Ryzen 9, Threadripper, or EPYC. Those would've been 2, 4, 6, or 8 Ryzen 3/5/7 processors.
 
The same thing could be said about, say, Ryzen 9, Threadripper, or EPYC. Those would've been 2, 4, 6, or 8 Ryzen 3/5/7 processors.
The 3600X was $250 and the 3900X was $500: two times the price for two times the cores...
The 5600X was $300; that's $50 more for the same number of cores.
So yes, the same thing could be said and is being said. That's my point: prices are going to go up if they use multiple dies.
 
The 3600X was $250 and the 3900X was $500: two times the price for two times the cores...
The 5600X was $300; that's $50 more for the same number of cores.
So yes, the same thing could be said and is being said. That's my point: prices are going to go up if they use multiple dies.
I probably misunderstood your earlier argument, so let me present a counterexample: the Ryzen 2700X and 3700X had the same launch price, despite the 3700X having two dies.

The number of dies doesn't automatically imply a higher cost to produce. It depends on how many dies you get per wafer and how much you can sell them for. For example, you can make something like 500 Ryzen 7 5800X processors from two 300 mm wafers. At $450 each, assuming each wafer is $400 and ignoring all the R&D, testing, etc. costs for the sake of simplicity, you're looking at a profit of about $224,200. You can make about 110 Navi 21s for the 6900 XT with a single 300 mm wafer. At $1,000 each (although that inflates the actual price, since $1,000 is for the entire video card, not just the GPU), that's a profit of $109,600.
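
To make that math easy to poke at, here's a rough sketch (my own numbers: the ~520 mm^2 Navi 21 die size is approximate, the two-chiplet RDNA 3 configuration is purely hypothetical, and the $400 wafer is just the simplifying assumption from above; real 7nm wafers cost far more):

```python
import math

WAFER_DIAMETER_MM = 300

def dies_per_wafer(die_area_mm2):
    """Classic gross-die estimate: usable wafer area minus edge loss (ignores defect yield)."""
    d = WAFER_DIAMETER_MM
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

def profit(units, price, wafers, wafer_cost):
    # Ignoring everything except the wafer cost, same as the post above.
    return units * price - wafers * wafer_cost

# Navi 21 is roughly 520 mm^2, sold (very generously) at $1000 per GPU.
navi21 = dies_per_wafer(520)
print(f"Navi 21 per wafer: ~{navi21}, profit: ${profit(navi21, 1000, 1, 400):,.0f}")

# A hypothetical RDNA 3 GPU built from two ~260 mm^2 chiplets instead:
chiplets = dies_per_wafer(260)
gpus = chiplets // 2          # two chiplets per finished GPU
print(f"2-chiplet GPUs per wafer: ~{gpus}, profit: ${profit(gpus, 1000, 1, 400):,.0f}")

# Smaller dies waste less of the wafer edge (and yield better once defects are
# factored in), which is why two chiplets per GPU isn't automatically half the
# GPUs per wafer.
```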
 
You can make about 110 Navi 21s for the 6900 XT with a single 300 mm wafer. At $1,000 each (although that inflates the actual price, since $1,000 is for the entire video card, not just the GPU), that's a profit of $109,600.
And if you can now only make 55 GPUs (or even only ~25) because you're using up twice (or four times) the dies, you make half the profit or are forced to increase the price of each unit to compensate. What am I not getting here?
 
No, NVIDIA's definition of a "CUDA core" has remained consistent. Since the only compute spec that seems to matter is FLOPS, NVIDIA has only ever counted the units that perform floating point operations. For the longest time (probably since GeForce 8), each shader unit was both FP and INT capable. Pascal had 128 such units per SM. When Turing came around, NVIDIA split that into 64 FP and 64 INT units. For Ampere, there are 128 FP units, but only 64 of them are capable of INT workloads, and they are partitioned off from the other 64 FP units.

The reason they did this was so an SM could do concurrent INT and FP workloads. But again, since the only compute spec that seems to matter is FLOPS, the FP units are what matter more.


I mean, if you want to count half of the shader units in an SM losing the ability to do INT workloads, which are still a minority of workloads by a wide margin, as "weaker", then sure, I guess.
It's still a PR move to make it sound like they have "more".
 
It could, but at the same time, I'm not confident it'll amount to much given the performance issues on the 6400/6500 when they run out of VRAM. Unless the memory controller can somehow handle servicing requests from multiple GPUs for the same data at once, the GPUs are going to be starved for data from memory at least every other cycle.

The other issue is whether something other than AFR can be used. AFR has microstuttering issues, and that's something you can't fix with more cache. And while something like "render every nth pixel" would be a nice compromise between AFR and SFR, it also breaks anything that needs to know what color its neighbors are, unless you spend another pass having the chiplets send each other their frame buffers.

The problem GPUs have that CPUs don't is that the workload we give GPUs is a real-time problem. A frame needs to be done within, say, 16 ms if you want 60 FPS. The workloads we give CPUs are more like "if it gets done sooner, that's nice."
I think Infinity Cache did a good job of easing memory pressure on the 6800/6900, especially in light of how they compete against Nvidia's higher-bandwidth GDDR6X SKUs.

Here's a question born of my ignorance: how does the existing scheduler handle so many cores today? There must be multiple scheduler levels that break things down into groups, no? If so, could they extrapolate that same concept to a front-end scheduler that is shared by all dies and is smarter than an AFR-type scheduler?
 
If AMD is really going the MCM route with RDNA 3, then AMD can put together 2-4 dies' worth of GPU without fab process yields being an issue. That would allow for larger GPUs without the price going up significantly for only a 5-10 percent performance increase.

Other than MCM, I hope AMD has put more engineering into per-"core" performance, so three times the core count for only double the price of the 6900 XT would be nice.

I imagine what kind of nightmare we'll see on the software side if AMD goes this route. Even with the MI200/MI250X, AMD only combines two dies, and the system still sees two GPUs instead of one, according to AnandTech.
 
Will we continue to increase wattage? I thought die shrinks were supposed to increase performance while decreasing power usage?
A die shrink is not magic; it still has to obey the laws of physics. Reducing power consumption via die shrinks has been a problem since 28 nm. AMD and Nvidia skipped 20 nm and stuck with 28 nm for another generation because the power savings weren't worth the cost of the most expensive node at the time.