AMD Big Navi and RDNA 2 GPUs: Release Date, Specs, Everything We Know


bit_user

Polypheme
Ambassador
Raytracing is ultra GPU calc intensive - there are no shortcuts
Uh, not really. Ray-triangle intersection tests are fairly compute intensive, but BVH traversal is pretty cheap, computationally. The problem is more bandwidth limited, as it lacks good coherency.
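To illustrate what I mean, here's a minimal C++ sketch of an iterative BVH traversal (simplified, and obviously not how any vendor's hardware or driver actually does it). The per-node math is just a slab test -- a few multiplies and compares -- but every step depends on fetching a child node from memory, and with incoherent rays those fetches scatter all over the tree:

// A minimal BVH traversal sketch (illustrative only; not any vendor's real code).
// The ALU work per node is a handful of compares, but every pop chases a pointer
// into a node that is unlikely to be in cache when rays are incoherent -- which is
// why the workload tends to be bandwidth/latency bound rather than compute bound.
#include <cstdint>
#include <cstdio>
#include <vector>
#include <utility>
#include <algorithm>

struct Ray  { float o[3], d[3], tmax; };
struct AABB { float lo[3], hi[3]; };
struct Node { AABB box; int32_t left, right, triCount; }; // triCount > 0 marks a leaf

// Slab test: cheap arithmetic, no memory traffic beyond the node itself.
static bool hitAABB(const Ray& r, const AABB& b) {
    float t0 = 0.0f, t1 = r.tmax;
    for (int i = 0; i < 3; ++i) {
        float inv = 1.0f / r.d[i];
        float tn  = (b.lo[i] - r.o[i]) * inv;
        float tf  = (b.hi[i] - r.o[i]) * inv;
        if (tn > tf) std::swap(tn, tf);
        t0 = std::max(t0, tn);
        t1 = std::min(t1, tf);
        if (t0 > t1) return false;
    }
    return true;
}

// Iterative traversal: the dependent, scattered node fetches dominate the cost.
static int traverse(const std::vector<Node>& nodes, const Ray& r) {
    int leavesHit = 0, stack[64], sp = 0;
    stack[sp++] = 0;                                   // root
    while (sp > 0) {
        const Node& n = nodes[stack[--sp]];            // scattered read
        if (!hitAABB(r, n.box)) continue;
        if (n.triCount > 0) { ++leavesHit; continue; } // here you'd run ray-triangle tests
        stack[sp++] = n.left;
        stack[sp++] = n.right;
    }
    return leavesHit;
}

int main() {
    // Tiny 3-node tree: a root box split into two leaf boxes.
    std::vector<Node> nodes = {
        { {{-1,-1,-1},{ 1, 1, 1}},  1,  2, 0 },
        { {{-1,-1,-1},{ 0, 1, 1}}, -1, -1, 4 },
        { {{ 0,-1,-1},{ 1, 1, 1}}, -1, -1, 4 },
    };
    Ray r = { {-2, 0, 0}, { 1, 0, 0}, 100.0f };
    std::printf("leaves hit: %d\n", traverse(nodes, r)); // expect 2
    return 0;
}

When neighboring rays walk completely different paths through the tree, those node reads stop hitting cache, and the whole thing turns into a memory-latency problem rather than an ALU problem.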

The goal of ray tracing is to be able to play a Pixar-movie-level game in real time - all of those movies use ray tracing - some frames of the original Toy Story took a minimum of 8 hours of CPU time per frame (at 24 or 30 fps). Neither AMD nor Nvidia has a lock on ray tracing - and there are no shortcuts to full scene ray tracing - it is raw GPU power.
Except Toy Story wasn't ray traced! In fact, most movies didn't use ray tracing, until about 10-15 years ago.
 
  • Like
Reactions: bigpinkdragon286
@JarredWaltonGPU

I'm probably reading more into this than you meant, but will AMD support Global Illumination? That would be a big deal, because it's one of the applications of Nvidia's tensor cores (i.e. for denoising - an essential part of GI).
True full scene GI is basically just doing ray tracing on everything -- like what Minecraft RTX does, on some level. Theoretically, anything with full DXR support (or VulkanRT) can do full GI ... but most games using RTX are only focusing on one or a few subsets to keep performance up. Metro Exodus does RT GI, but it's actually indirect lighting and not the same as the full GI you're referring to.

Realistically, I think we need close to 10X the ray tracing hardware of today's GPUs to make modest full scene GI (or full ray tracing, if you prefer) viable. Maybe more than 10X. And right now, for games, Nvidia isn't even using the Tensor cores to do denoising AFAIK.
 
I don't know about that. I suspect they just plowed that dividend into adding more CUs.

That still nets you more fps, but fps per GFLOPS would stay constant.
We'll have to see. If AMD is saying RDNA2 has 50% higher perf/watt than RDNA1, I doubt it will be just from adding CUs. We know the CUs had to change (to add RT support at a minimum), and in the process I suspect AMD found some other items it could tweak to improve efficiency.
 
Agreed on the RX 590, but I think AMD was just trying to cash in on cryptomining.

With regard to the Radeon VII, in fact it does basically match the RTX 2080! And it was virtually free to release, because it's a slightly crippled datacenter GPU that they just sold to consumers as a bonus. Originally, they weren't planning to, but then they saw a market opportunity.
I really would love to know hard numbers on how many Radeon VII cards were produced and actually sold. I suspect the final number (not including Radeon Instinct) is extremely small -- less than 100K for sure, but probably less than 20K and maybe even less than 10K. Well, it might be higher because some imaging pros might have gone with Radeon VII instead of paying more for Radeon Instinct, but if we were able to look at GPUs actually used for playing games? I think the majority of Radeon VII parts used in gaming are only in the hands of reviewers and "Team Red" members. I don't know anyone who actually bought one, because it was overpriced and underperforming.
 
I don't know about that. I suspect they just plowed that dividend into adding more CUs.

That still nets you more fps, but fps per GFLOPS would stay constant.

Only time will tell. I'm curious if they gutted any int 8 or whatever to streamline for gaming. AMD are splitting GPU lines so they can sacrifice a bit of compute on the gaming cards like Nvidia does.
 
Only time will tell. I'm curious if they gutted any int 8 or whatever to streamline for gaming. AMD are splitting GPU lines so they can sacrifice a bit of compute on the gaming cards like Nvidia does.
My guess -- assuming Intel does multiple dies -- would be:

Xe HPC: huge die, INT8 + FP16 + FP32 + FP64 support
Xe HP: modest die (potentially two or three different die), FP16 and FP32 only, with 1/32 speed FP64 or something. INT8 might not require too much more die space, though.
Xe LP: Outside of the test vehicle, I don't think Xe LP will be used for anything other than iGPUs. Probably the same support as Xe HP, though -- so FP16 and FP32 for sure, possibly INT8, but FP64 only via emulation or whatever.

If Intel really tries to enter the dedicated GPU market, it will have three Xe LP variants -- similar to how AMD and Nvidia typically do multiple variants of each family. Xe HPC is basically a given, though, unless Raja posted a fake wafer picture, or Intel cans the 10nm monolithic design to wait for Ponte Vecchio (and I don't think either is at all likely).
 

bit_user

Polypheme
Ambassador
I think we found an Intel employee.
For that degree of partisan zeal, he'd either have to be a new employee (i.e. still in the "honeymoon" period) or an investor. I'm betting on #2. Probably a recent investor who's either under water or was betting it'll go up a lot more.

Based on his claims, it's a little hard to believe he doesn't have better things to be doing than writing those long screeds.
 

bit_user

Polypheme
Ambassador
@Deicidium369 > the slide that said the full screen real time RT will be done IN THE CLOUD..
Yeah, it doesn't even pass a basic sanity test. How could MS afford this or guarantee that the connection quality would support it, for all of the console's users?

What has Intel changed about its micro-architecture since Skylake? Absolutely nothing.
I didn't think so, but I wanted to check on it. Apparently, they did fix a couple bugs (Loop Stream Decoder) and add some in-silicon mitigations, but that's really it. And for models with higher core count, the total L3 cache increased, due to cache slices being part of the core tiles. But the cache-per-core ratio stayed constant.


So, basically, just: core count, mitigations, and fabrication tweaks.
 

bit_user

Polypheme
Ambassador
Metro Exodus does RT GI, but it's actually indirect lighting and not the same as the full GI you're referring to.

Realistically, I think we need close to 10X the ray tracing hardware of today's GPUs to make modest full scene GI (or full ray tracing, if you prefer) viable. Maybe more than 10X.
Well, I guess whatever it is that Nvidia is calling GI. Related:


And right now, for games, Nvidia isn't even using the Tensor cores to do denoising AFAIK.
Now that you mention it, that seems to be the case. When Turing launched, I'm sure I read something about how they were using the Tensor Cores for de-noising, though. I recall seeing demos of their AI denoising and how it was so good that they could even get away with just a couple lighting rays per image pixel.
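For a sense of why denoising matters so much at those sample counts: Monte Carlo noise only falls off as 1/sqrt(samples), so cleaning up the image by brute force gets expensive fast. A trivial back-of-the-envelope in C++ (the noise at 1 ray/pixel is just normalized to 1.0 here; absolute levels depend on the scene):

// Relative Monte Carlo noise vs. rays per pixel (standard error ~ 1/sqrt(N)).
// The 1 spp level is normalized to 1.0; absolute values depend on the scene.
#include <cstdio>
#include <cmath>
#include <initializer_list>

int main() {
    for (int spp : {1, 2, 4, 16, 64, 256, 1024})
        std::printf("%4d rays/pixel -> relative noise %.3f\n",
                    spp, 1.0 / std::sqrt(double(spp)));
    return 0;
}

Going from 2 to 200+ rays per pixel only cuts the noise by about 10x, which is why a good denoiser on a 1-2 spp image is so much more practical than throwing rays at the problem.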
 

bit_user

Polypheme
Ambassador
I don't know anyone who actually bought one, because it was overpriced and underperforming.
Well, it was priced comparably to RTX 2080 and GTX 1080 Ti, to which it's roughly equivalent.

Anyway, I got one for GPU compute: the FP64 throughput and 16 GB of HBM2 @ 1 TB/s. We're unlikely to see anything like it, in that price range, in the foreseeable future. However, I use it in Linux and have not run any games on it.
 

bit_user

Polypheme
Ambassador
I'm curious if they gutted any int 8
What's your deal with int8? You've mentioned it several times, but do you actually know what GCN had, in the way of int8, or what it's useful for?

It's not general-purpose, and is tailored closely to the needs of convolutional neural network forward-propagation.

Vega 7nm includes the additional instructions listed below:
...
* V_DOT4_I32_I8
* V_DOT4_U32_U8
...

Those were only added to GCN in Radeon VII. They're exactly analogous to the DP4A instruction, added in Nvidia's GP102, GP104, and GP106 Pascal GPUs, which is probably what prompted AMD to do it.
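If it helps, this is all a dot4 instruction does, expressed in plain C++ (the function name here is mine, purely for illustration -- the hardware does the whole thing in one instruction per lane):

// What V_DOT4_I32_I8 / DP4A boil down to: a 4-wide int8 dot product
// accumulated into a 32-bit integer. Plain C++ equivalent for illustration;
// the names here are not an actual API.
#include <cstdint>
#include <cstdio>

int32_t dot4_i32_i8(const int8_t a[4], const int8_t b[4], int32_t acc) {
    for (int i = 0; i < 4; ++i)
        acc += int32_t(a[i]) * int32_t(b[i]);
    return acc;
}

int main() {
    int8_t a[4] = { 1, -2, 3, 4 };
    int8_t b[4] = { 5, 6, -7, 8 };
    // This packed multiply-accumulate is the inner loop of CNN inference
    // (forward propagation), which is why the instruction exists at all.
    std::printf("%d\n", dot4_i32_i8(a, b, 0)); // 1*5 + (-2)*6 + 3*(-7) + 4*8 = 4
    return 0;
}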

 
Last edited:

bit_user

Polypheme
Ambassador
@JarredWaltonGPU
GCN issued one instruction per wave every four cycles; RDNA issues an instruction every cycle.
This makes it sound like the improvement from GCN to RDNA is bigger than it is. GCN would dispatch an instruction from one wave every cycle, but it rotated between feeding 4 different SIMD16 pipes from 4 different waves. So, when you got the pipelines full, you'd retire 64 results every cycle. It's explained here:


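In rough numbers (using the widely published SIMD widths -- GCN with four SIMD16 units per CU, RDNA with two SIMD32 units per CU), the peak per-CU throughput is the same; what changes is how quickly a single wave's instruction gets through:

// Back-of-the-envelope on issue rates, using the figures discussed above:
// GCN CU:  4x SIMD16, each fed one instruction from its wave64 every 4 cycles.
// RDNA CU: 2x SIMD32, each fed one instruction from a wave32 every cycle.
#include <cstdio>

int main() {
    // GCN: 4 SIMD16 pipes * 16 lanes = 64 results/cycle once the pipes are full,
    // even though any single wave only issues every 4th cycle.
    int gcn_per_cycle  = 4 * 16;
    // RDNA: 2 SIMD32 pipes * 32 lanes = 64 results/cycle, but a single wave's
    // instruction completes in far fewer cycles, improving latency.
    int rdna_per_cycle = 2 * 32;
    std::printf("GCN  CU peak: %d lane-results/cycle\n", gcn_per_cycle);
    std::printf("RDNA CU peak: %d lane-results/cycle\n", rdna_per_cycle);
    return 0;
}

So the win is lower latency and easier-to-fill pipelines (fewer waves needed to keep a CU busy), not a 4x jump in raw issue throughput.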
the PS4 clearly had the faster GPU. It had 18 CUs or 1152 GPU cores
AMD's terminology is to call these "shaders", which I feel is better, as it avoids promoting a misunderstanding of these cores as being comparable to CPU cores (which you've already said they're not).

Nvidia's terminology is designed to spread confusion and make their GPUs sound even more impressive than they are. So, especially when not even talking about Nvidia GPUs, please don't fall prey to their semantic chicanery and co-opt their terminology.

Arcturus could eventually make its way into consumer / prosumer graphics cards.
No, Phoronix has reported that driver changes indicate Arcturus will lack any 3D graphics blocks.


I bought my Radeon VII, under the assumption that it was the last hybrid HPC/3D Graphics card AMD will probably ever make.
 
Guys, should I hold off on buying a 5600 XT and wait for a 6600 XT with RT, or not?
What are you using now, and how much are you willing to spend? My expectation is that Nvidia and AMD will both launch high-end ($500+) graphics cards in September/October. It will probably be several more months at least before the 'mid-range' cards come out. AMD launched the RX 5600 XT in January, so it's not really due for a refresh until at least next January. Plus, it sounds like there's a lot of demand for TSMC 7nm so AMD might not have a ton of chips ready to go. Don't be surprised if an RX 6600 XT (or whatever it's called) ends up being priced closer to $350-$400. $400 is certainly my expectation for Nvidia's RTX 3060, because I'm pessimistic. :p

If you've got at least something at the RX 570 level (R9 390), I'd try to stretch that as long as you can until you get games that don't run fast enough at the settings you want. Once that happens, take the plunge on a new GPU. There are games where 1080p at high settings is going to fall below 60 fps on a 570, though, so it's at that point where I'd want to upgrade. Next month for Nvidia we should get Ampere, and then Navi 2x in October. Wait and see is the sage advice.
 
  • Like
Reactions: bit_user
Aug 6, 2020
2
2
15
What are you using now, and how much are you willing to spend? My expectation is that Nvidia and AMD will both launch high-end ($500+) graphics cards in September/October. It will probably be several more months at least before the 'mid-range' cards come out. AMD launched the RX 5600 XT in January, so it's not really due for a refresh until at least next January. Plus, it sounds like there's a lot of demand for TSMC 7nm so AMD might not have a ton of chips ready to go. Don't be surprised if an RX 6600 XT (or whatever it's called) ends up being priced closer to $350-$400. $400 is certainly my expectation for Nvidia's RTX 3060, because I'm pessimistic. :p

If you've got at least something at the RX 570 level (R9 390), I'd try to stretch that as long as you can until you get games that don't run fast enough at the settings you want. Once that happens, take the plunge on a new GPU. There are games where 1080p at high settings is going to fall below 60 fps on a 570, though, so it's at that point where I'd want to upgrade. Next month for Nvidia we should get Ampere, and then Navi 2x in October. Wait and see is the sage advice.
Thx for the response. I was using a GTX 970 until it broke. Now I have a backup GPU, an R5 240, with an i7-4770, 16GB of DDR3-1600, and a 21:9 1080p 60Hz monitor. I play mostly Apex, and that's not a very demanding game. I want to buy a 144Hz monitor, hence my interest in the 5600 XT. Otherwise a used RX 580 would be a good fit for my current monitor.
I wanted to buy a 5600 XT and later a 144Hz monitor, but then the news of the arrival of new consoles shook up my plans.
I decided to buy a used RX 580 now and a 144Hz monitor on Black Friday, and next year we'll see about a 5600/6600 XT GPU.
 

chris189

Distinguished
Jan 24, 2011
18
0
18,510
I believe that RDNA 2 is going to be bigger than 96 ROPs. I think it's going to have at least 128, and maybe even more, because I believe AMD is aiming to surpass the Titan (Ampere).

So, in my opinion, Tom's Hardware's estimates are low-balling AMD's supposed "HALO" product.
 
I believe that RDNA 2 is going to be bigger than 96 ROPs. I think it's going to have at least 128, and maybe even more, because I believe AMD is aiming to surpass the Titan (Ampere).

So, in my opinion, Tom's Hardware's estimates are low-balling AMD's supposed "HALO" product.
If we're low-balling, then so are most of the other rumors circulating. AMD hasn't had a chip bigger than 500mm2 since Fiji, which was not a great design overall. It did new stuff like HBM, but underperformed. If not for cryptocurrency mining, it wouldn't have sold many units at all -- and in fact I don't think the Fury cards ever showed up on the Steam Hardware Survey, which means they likely never even reached more than 0.15% of the gaming market.

Anyway, historically AMD has never gone quite as big as Nvidia and managed to pull it off. Fiji tried and failed, Vega tried and came up short. Hawaii by comparison was 'only' 438mm2. So now we have Navi 10 at 251mm2, and we're saying Navi 21 will double the CUs, add ray tracing hardware, and you think it will be even bigger than that? Navi 21 could very well be the largest GPU chip (not counting silicon interposer on Fiji and Vega) that AMD has ever created. I very much doubt it's going to be as large as GA100, however.

The reality is AMD doesn't sell as many cards into the professional space as Nvidia. It can't afford to make a huge chip that no one buys. That's the whole purpose of the chiplet approach for Zen 2. Maybe the Frontier supercomputer will change the dynamics and allow AMD to really go big, but it's a huge leap from Navi 10 to something that maxes out reticle size.
 

chris189

Distinguished
Jan 24, 2011
18
0
18,510
If we're low-balling, then so are most of the other rumors circulating. AMD hasn't had a chip bigger than 500mm2 since Fiji, which was not a great design overall. It did new stuff like HBM, but underperformed. If not for cryptocurrency mining, it wouldn't have sold many units at all -- and in fact I don't think the Fury cards ever showed up on the Steam Hardware Survey, which means they likely never even reached more than 0.15% of the gaming market.

Anyway, historically AMD has never gone quite as big as Nvidia and managed to pull it off. Fiji tried and failed, Vega tried and came up short. Hawaii by comparison was 'only' 438mm2. So now we have Navi 10 at 251mm2, and we're saying Navi 21 will double the CUs, add ray tracing hardware, and you think it will be even bigger than that? Navi 21 could very well be the largest GPU chip (not counting silicon interposer on Fiji and Vega) that AMD has ever created. I very much doubt it's going to be as large as GA100, however.

The reality is AMD doesn't sell as many cards into the professional space as Nvidia. It can't afford to make a huge chip that no one buys. That's the whole purpose of the chiplet approach for Zen 2. Maybe the Frontier supercomputer will change the dynamics and allow AMD to really go big, but it's a huge leap from Navi 10 to something that maxes out reticle size.

Yeah, that's so true. I just hope they yield more than 96 ROPs or they're going to get smoked by Nvidia. lol
 

noko

Distinguished
Jan 22, 2001
2,414
1
19,785
The chart says <1600MHz for Navi 21; clock speed would make a world of difference with an 80 CU part, if not a 72 CU one. 2000MHz is 1.25x, or 25% over 1600MHz. AMD probably has a lot of leeway on clock speeds to adjust and compete, and since Nvidia's 3080 is rated at 320W, that just allows AMD to crank them up if they want to. As for the RAM, they can use anything that will get the job done. If the PS5 is pushing 2.3GHz within a given power constraint, what will Navi 21 do? Bigger does not necessarily mean slower -- look at Nvidia's GPUs and previous AMD GPUs. Will 2.5GHz+ be on the table? Compared to 1.6GHz, that would be 1.56x from clock speed alone relative to the article's estimate.

AMD recently -- as in the last several years -- has been stating a goal or number and then exceeding it, as with the IPC gains on its CPUs. If AMD says 2x for RDNA 2 and 3x for RDNA 3 (performance relative to RDNA 1), they will try to exceed that, which makes for easier marketing, good press, and happier, more willing customers.

AMD is not shy about designing and selling high-power cards; the Vega 64 LC was rated at 345W, with a 2-slot liquid cooler that had no issue removing 400W+ of heat without dumping it into your case. If AMD wants to, and has the leeway in clock speeds (maybe that's why they are waiting a little), they can custom-tune to reach the goal of beating the Nvidia tiers they want to compete in.

HBM2e is still on the table: it makes for simpler board designs (cheaper), is easier to cool (cheaper), and takes less power (leaving more of the power budget for GPU performance). AMD has already designed successfully around ~500mm2 GPUs -- interposer, cooler design, card layout, etc. I still think that is a possibility; there's some rumor of Navi 21 having both HBM and GDDR6 memory controllers. 16GB of HBM2e, which could deliver over 900GB/s with two stacks, would be rather stout if Navi 21 can clock high with good efficiency.

AMD being rather quiet, to me, means they have a lot of options they can use to compete -- clock speeds, RAM, etc. I expect AMD to beat the 3080 in general rasterization; as for the bigger brother 3090, who knows.

I will wait for a sufficient number of Nvidia/AMD cards and AIB models to evaluate before I upgrade.
 
Last edited:

noko

Distinguished
Jan 22, 2001
2,414
1
19,785
There are three things AMD specifically called out going from RDNA 1 to RDNA 2:
  • IPC improvements
  • Logic enhancements to reduce complexity and switching power (50% better perf/W)
  • Physical optimization -> increased clock speeds
The PS5 does 2.3GHz, and that is with a CPU on the same die, plus the restrictions of a game console configuration. We do not know the uppermost end of the clock speed range with good efficiency and power (the clock/power curve). AMD may well have the option to push 2.3GHz+ out of a 72-80 CU GPU if they want to.

If AMD has the option, they should blow away Nvidia's Ampere line in rasterization performance, and then it will become a battle of useful features after that.
 
There are three things AMD specifically called out going from RDNA 1 to RDNA 2:
  • IPC improvements
  • Logic enhancements to reduce complexity and switching power (50% better perf/W)
  • Physical optimization -> increased clock speeds
The PS5 does 2.3GHz, and that is with a CPU on the same die, plus the restrictions of a game console configuration. We do not know the uppermost end of the clock speed range with good efficiency and power (the clock/power curve). AMD may well have the option to push 2.3GHz+ out of a 72-80 CU GPU if they want to.

If AMD has the option, they should blow away Nvidia's Ampere line in rasterization performance, and then it will become a battle of useful features after that.
PS5 has 36 CUs clocked at up to 2.23 GHz. Xbox Series X has 52 CUs clocked at up to 1.825 GHz. Notice how more CUs ended up with a much lower clockspeed? That's my assumption for a 72 CU RDNA2 GPU -- it will not clock as high as the PS5, and possibly not even as high as the XBSX. But we don't know for sure -- a higher TDP could certainly allow for higher clocks. 300W and 2.0-2.1GHz? Maybe!
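For reference, the standard FP32 throughput math (CUs x 64 shaders x 2 FLOPS per clock) reproduces the published console figures; the 72 and 80 CU rows in this little C++ snippet are purely hypothetical clock/CU combinations, not leaked specs:

// Standard FP32 formula for AMD GPUs: CUs * 64 shaders * 2 FLOPS/clock * clock.
// The console rows match the published 10.3 / 12.15 TFLOPS figures; the Navi 21
// rows are hypothetical clock/CU combinations for illustration only.
#include <cstdio>

double tflops(int cus, double ghz) { return cus * 64 * 2 * ghz / 1000.0; }

int main() {
    std::printf("PS5   36 CU @ 2.23 GHz   : %5.2f TFLOPS\n", tflops(36, 2.23));
    std::printf("XBSX  52 CU @ 1.825 GHz  : %5.2f TFLOPS\n", tflops(52, 1.825));
    std::printf("72 CU @ 2.0 GHz (hypoth.): %5.2f TFLOPS\n", tflops(72, 2.0));
    std::printf("80 CU @ 1.6 GHz (hypoth.): %5.2f TFLOPS\n", tflops(80, 1.6));
    return 0;
}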
 

noko

Distinguished
Jan 22, 2001
2,414
1
19,785
PS5 has 36 CUs clocked at up to 2.23 GHz. Xbox Series X has 52 CUs clocked at up to 1.825 GHz. Notice how more CUs ended up with a much lower clockspeed? That's my assumption for a 72 CU RDNA2 GPU -- it will not clock as high as the PS5, and possibly not even as high as the XBSX. But we don't know for sure -- a higher TDP could certainly allow for higher clocks. 300W and 2.0-2.1GHz? Maybe!
That is most likely down to power limits, form factor, cooling, and overall cost for a console. Still, we won't know until real hardware representing actual products is reviewed. As a note, 1600MHz would be slower than the Vega 64 LC, which clocked at 1700MHz. My 5700 XT does 2100MHz without too much effort. With a more mature process, experience, and time (plus less power needed for a given level of performance), it should clock rather high. Their stated goal was indeed faster clock speeds, and 1600MHz would really be a failure against that goal.
 
Last edited:

Dnaangel

Distinguished
Mar 9, 2011
26
1
18,540
Unless my math is wrong: 80 compute units, 96 raster operations, and 12.28 TFLOPS of single-precision floating point (FP32).

Not bad AMD. Not bad. Let's see what that translates to in the real world

That's just shy of the specs of the 2080 Ti, if memory serves. I mean, this would be nice if AMD had dropped it two years ago... the problem is they are poised to compete against Nvidia's top-tier Turing, while Ampere was just announced, lol. NV states the 3070 will match or outperform the 2080 Ti for only 500 bucks. AMD has a steep, steep hill to climb right now and is, seemingly yet again, a generation behind.