Nvidia GeForce RTX 4070 Ti Super review: More VRAM and bandwidth, slightly higher performance

I think that for us to see a change, people should start migrating to 4k. It's about time, it's worth it, and that would move the companies to load their cards with more VRAM, bandwidth and core power. While everyone is at HD or 1440p, card makers will continue sitting on their asses.
This is a really optimistic, and quite frankly out of touch, view of how companies operate. Even if 4k had 50% of the market tomorrow you're not suddenly going to see $300 16GB video cards with RTX 3090 levels of performance (what I consider to be minimum viable for 4k). Instead what we'd likely see is even greater emphasis on upscaling and frame generation than we do already with relatively linear actual performance increases.

We have a rather cynical duopoly right now where AMD is perfectly happy pricing around whatever nvidia does. Consumer graphics just isn't high enough up on AMD's growth priorities for them to be interested in aggressive pricing to gain marketshare. Intel is the chance that the market has, but who knows how much they're interested in trying to take marketshare versus using the technology for mobile and enterprise.
 
I think that for us to see a change, people should start migrating to 4k. It's about time, it's worth it, and that would move the companies to load their cards with more VRAM, bandwidth and core power. While everyone is at HD or 1440p, card makers will continue sitting on their asses.
People migrating to 4k is what spurred the development of DLSS and equivalent techniques by Intel and AMD.

We have a rather cynical duopoly right now where AMD is perfectly happy pricing around whatever nvidia does. Consumer graphics just isn't high enough up on AMD's growth priorities for them to be interested in aggressive pricing to gain marketshare.
You're too cynical by half. AMD's graphics division has been losing money for most of the past year. I think they can't afford to price most of their GPUs much lower.
 
You're too cynical by half. AMD's graphics division has been losing money for most of the past year. I think they can't afford to price most of their GPUs much lower.
How did you determine this? Their earnings reports put enterprise graphics in the data center segment and combine discrete graphics with semi-custom under gaming, and neither segment has reported any losses.

They cited semi-custom propping up gaming earnings in Q2 '23 and discrete doing so in Q3 '23. Now maybe you've got some other information that I'm not seeing, but nothing seems inherently unhealthy here.
 
How did you determine this? Their earnings reports put enterprise graphics in the data center segment and combine discrete graphics with semi-custom under gaming, and neither segment has reported any losses.

They cited semi-custom propping up gaming earnings in Q2 '23 and discrete doing so in Q3 '23. Now maybe you've got some other information that I'm not seeing, but nothing seems inherently unhealthy here.
If you want to be cynical, just think about that for a moment. Why would AMD combine custom solutions (PS5, Xbox, Steam Deck) with consumer graphics, if consumer graphics wasn't doing poorly? I'm quite sure bituser is correct that AMD's dedicated GPU division is losing money on its own, and the consoles are the only real bright spot right now.

Data center of course is a different story, and EPYC CPUs and Instinct GPUs are doing quite well right now by all accounts.
 
If you want to be cynical, just think about that for a moment. Why would AMD combine custom solutions (PS5, Xbox, Steam Deck) with consumer graphics, if consumer graphics wasn't doing poorly? I'm quite sure bituser is correct that AMD's dedicated GPU division is losing money on its own, and the consoles are the only real bright spot right now.

Data center of course is a different story, and EPYC CPUs and Instinct GPUs are doing quite well right now by all accounts.
It's impossible to know unless they release the numbers. If I had to guess, I'd say AMD is about breaking even with its dGPUs, but in the business world that's just as bad as losing money, which could be the very reason they combined the segments for reporting.
 
If you want to be cynical, just think about that for a moment. Why would AMD combine custom solutions (PS5, Xbox, Steam Deck) with consumer graphics, if consumer graphics wasn't doing poorly? I'm quite sure bituser is correct that AMD's dedicated GPU division is losing money on its own, and the consoles are the only real bright spot right now.
They reorganized all of their financial reporting starting in 2022 in a way that actually makes sense. Until then computing and graphics had been combined and semi custom was in with enterprise and embedded. That's not to say semi custom doesn't make up a majority of the combined revenue given that its volume is at least 5-6x that of discrete graphics. There just isn't a correlation between two business units being part of the same section and the smaller one losing money.
 
If you want to be cynical, just think about that for a moment. Why would AMD combine custom solutions (PS5, Xbox, Steam Deck) with consumer graphics, if consumer graphics wasn't doing poorly? I'm quite sure bituser is correct that AMD's dedicated GPU division is losing money on its own, and the consoles are the only real bright spot right now.

Data center of course is a different story, and EPYC CPUs and Instinct GPUs are doing quite well right now by all accounts.

One reason would be because they share the exact same R&D programs. Remember AMD was experimenting with automated modular chip design as far back as Bulldozer, and while that core design wasn't the greatest, the modular chip technology they built around it is fundamental to how they design chips now. The majority of costs for those products isn't in the manufacturing, it's in the R&D required to design them. By reusing these designs AMD can lower the cost of on-demand custom products, which is a very lucrative business model.
 
It's not easy. My girlfriend's old GPU had to be replaced last July, so I gave her my RTX 2070 and I got a 4070 Ti. It's a good card and it runs the games I play quite well in 4k (I usually play older games, 3 to 6 years old in general). I'm happy with it. It renders 3D very fast as well. But of course, the prices are madness.
I don't think we can really future-proof like before. GPU-wise, it's probably a question of 3 years maximum now... But honestly speaking, the latest games are really lacking, so on the other hand, there are no games that would really justify so much horsepower in my opinion.

I think that for us to see a change, people should start migrating to 4k. It's about time, it's worth it, and that would move the companies to load their cards with more VRAM, bandwidth and core power. While everyone is at HD or 1440p, card makers will continue sitting on their asses.
I had to downgrade from 2K to 1080p recently LMAO.
If you want people to upgrade to 4K so much, the Mr. Moneybags out there should stop buying GPUs at three times their prices, and maybe people who don't have free money could get to play at 4K.
Or maybe if you REALLY want people to upgrade we could start by sending you my PayPal link.
 
I'm not sure what else you spend your money on but inflation has affected every aspect of the economy. Just about everything I buy for a family of 5 is twice as expensive as it was a decade ago.
Apparently every other aspect of the economy doesn't include most other PC components that still have reasonable prices.
By the logic of the people in this thread a Ryzen 7600 should be 540€ yet somehow it is 200€.
 
I had to downgrade from 2K to 1080p recently LMAO.
That's unfortunate. If I were faced with such a decision, I'd at least use DLSS or FSR, because if you've got a 1440p monitor and are rendering at 1080p, those upscaling technologies are going to give you a better image than conventional upscaling.

If you want people to upgrade to 4K so much, the Mr. Moneybags out there should stop buying GPUs at three times their prices, and maybe people who don't have free money could get to play at 4K.
It's like I said, above: the margins on these cards aren't padded so much that they would start selling them at half price. If the market for such expensive GPUs collapsed, what would happen is they would only build the lower-tier models. We'd go back to not having a x090 tier and x080 (or lower) would be top of the line.

Nvidia could even just walk away from the gaming market, if it ceased to be very profitable.
 
Apparently every other aspect of the economy doesn't include most other PC components that still have reasonable prices.
By the logic of the people in this thread a Ryzen 7600 should be 540€ yet somehow it is 200€.
Ah, now we get to the real meat of the problem.

First, let's look at trends in GPU price & performance. Here's some data I compiled on the top (mainstream) tier of Nvidia cards. You could argue that maybe I should've used Titan cards or the RTX 3090 Ti, but my decision of which GPUs to include was based on how representative each was of its generation (i.e. neither an outlier in value nor in launch timing).

Model | Launch | Node | Area (mm^2) | M Transistors | GFLOPS | MSRP | mm^2/$ | MTr/$ | GFLOPS/$
GTX 980 Ti | 2015-06-01 | 28 nm | 601 | 8000 | 5632 | $649 | 0.926 | 12.33 | 8.68
GTX 1080 Ti | 2017-03-05 | 16 nm | 471 | 12000 | 10609 | $699 | 0.674 | 17.17 | 15.18
RTX 2080 Ti | 2018-09-27 | 12 nm | 754 | 18600 | 11750 | $999 | 0.755 | 18.62 | 11.76
RTX 3090 | 2020-09-24 | 8 nm | 628 | 28300 | 29280 | $1499 | 0.419 | 18.88 | 19.53
RTX 4090 | 2022-10-12 | 4 nm | 609 | 76300 | 73100 | $1599 | 0.381 | 47.72 | 45.72

One discontinuity worth noting is the GFLOPS increase for RTX 3000, which I believe reflects a SM redesign whereby the theoretical throughput per SM doubled but practical throughput didn't. Another detail the keen observer will notice is the relative lack of improvement between the 1000-series and 2000-series, which is largely due to Nvidia's decision to spend most of their additional transistor budget on Tensor cores and RT cores.

One thing that's truly impressive is just how much faster the RTX 4090 is than its predecessor. That's the product of its clockspeed increase (1.6x) and its increase in CUDA cores (1.56x). I think the big transistor increase is mostly from a huge increase in L2 cache. Overall, I think the improvements in this generation owe a lot to the fact that RTX 2000 and RTX 3000 were being held back by inferior process nodes.
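If you want to check that multiplication yourself, here's a minimal sketch using the published CUDA core counts and base clocks (the clock figures below are approximate and from memory, so treat them as assumptions rather than exact specs):

```python
# Rough FP32 throughput check for the table above (2 FP32 FLOPs per CUDA core
# per clock via FMA); base clocks are approximate, from memory.
def fp32_gflops(cuda_cores, base_clock_ghz):
    return cuda_cores * 2 * base_clock_ghz

gflops_3090 = fp32_gflops(10496, 1.395)   # ~29,284 GFLOPS, matches the table's 29,280
gflops_4090 = fp32_gflops(16384, 2.235)   # ~73,236 GFLOPS, close to the table's 73,100

print(round(gflops_4090 / gflops_3090, 2))  # ~2.5x = ~1.60 (clock) * ~1.56 (cores)
```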

A theme we can clearly see is that the area of these flagship GPU dies tends to sit just above 600 mm^2. This is a large die to make on a cutting edge node.
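To put rough numbers on why that matters, here's a sketch using the standard dies-per-wafer approximation and a simple Poisson yield model. The wafer size is the usual 300 mm; the defect density is an illustrative placeholder, not a real TSMC figure:

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    """Approximate gross dies per wafer: area term minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    return (math.pi * radius**2 / die_area_mm2
            - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100)

for area in (150, 600):  # small CPU-style compute die vs. big GPU die
    gross = dies_per_wafer(area)
    good = gross * poisson_yield(area, defects_per_cm2=0.1)  # assumed defect density
    print(f"{area} mm^2: ~{gross:.0f} gross dies, ~{good:.0f} defect-free dies per wafer")
```

A die that's 4x larger doesn't just give you 4x fewer candidates per wafer; a larger share of them also catch at least one defect, which is part of why the salvage binning discussed later in this thread matters so much.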

By contrast, let's look at flagship Ryzen CPUs.

Model | Launch | Node | Area* (mm^2) | M Transistors | Cores | Base Freq (GHz) | TFLOPS | MSRP | mm^2/$ | MTr/$ | GFLOPS/$
1800X | 2017-03-02 | 14 nm | 213 | 4800 | 8 | 3.6 | 0.46 | $499 | 0.427 | 9.62 | 0.92
2700X | 2018-04-19 | 12 nm | 192 | 4800 | 8 | 3.7 | 0.47 | $329 | 0.584 | 14.59 | 1.44
3950X | 2019-11-25 | 7 nm | 148 | 7600 | 16 | 3.5 | 1.79 | $749 | 0.198 | 10.15 | 2.39
5950X | 2020-11-05 | 7 nm | 166 | 8300 | 16 | 3.4 | 1.74 | $799 | 0.208 | 10.39 | 2.18
7950X | 2022-09-26 | 5 nm | 140 | 13140 | 16 | 4.5 | 2.30 | $699 | 0.200 | 18.80 | 3.30

First, I want to address a set of discontinuities between the 2700X and 3950X, which come from the fact that the former is a monolithic die while the latter's figures cover only the compute dies (hence the asterisk on the Area column). I didn't want to deal with trying to factor the I/O die into the calculations, and I believe the substantial majority of the cost is in the compute dies anyhow, since the I/O die is made on an older node. This is even more true of the dual-CCD CPUs. Doing it this way throws the mm^2/$ calculations slightly off, but they're probably still more comparable to the GPU data above.

The big takeaway is that mainstream CPU dies are much smaller. If you look at MTr/$ or mm^2/$, you're still getting a better value from GPUs, in spite of the fact that a CPU is basically the dies in a package, while a GPU includes VRMs, a PCB, a thermal solution, fans, and GBs of expensive GDDR memory!

With that said, Ryzen has stayed a little flatter in both area and transistor pricing. It's interesting to note that they've also stayed pretty flat in GFLOPS/$, although I'm not terribly confident about those theoretical GFLOPS numbers and will try to firm them up.
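For reference, the per-dollar columns in both tables are just the raw spec divided by launch MSRP. Here's a quick spot check of the last row of each, using the tables' own figures as inputs:

```python
# Spot check of the per-dollar columns, using the tables' own figures.
def per_dollar(area_mm2, m_transistors, gflops, msrp_usd):
    return {
        "mm^2/$":   round(area_mm2 / msrp_usd, 3),
        "MTr/$":    round(m_transistors / msrp_usd, 2),
        "GFLOPS/$": round(gflops / msrp_usd, 2),
    }

print(per_dollar(609, 76300, 73100, 1599))  # RTX 4090 -> 0.381, 47.72, 45.72
print(per_dollar(140, 13140, 2300, 699))    # 7950X    -> 0.2, 18.8, 3.29 (table rounds to 3.30)
```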

To gain some insight into why area and transistor pricing are breaking down, consider:

[chart: transistor/area cost trends by process node]


Why is that happening? Let's start by looking at wafer price trends:

[chart: wafer price trends by process node]


So, should we blame TSMC or ASML for being greedy, instead of Nvidia?

[chart: wafer production cost trends]


Maybe not. Newer wafers are objectively more resource-intensive to produce.

Furthermore, design costs are also increasing at a similar pace:

[chart: chip design costs by process node]


GPUs rely primarily on node improvements and scale to deliver better performance. As long as newer nodes continue getting more expensive to design & manufacture, the GPU perf/$ curve will inevitably flatten, either by new generations offering smaller performance gains, being even more expensive, or some combination.

Try looking at it this way: you're not losing anything. GPU perf/$ is better than it's ever been. It just won't improve at the same rate as before. That sense of loss you're feeling is because GPUs got a "free ride" on the Moore's Law bandwagon, and that ride is finally slowing down. It's sad to see a good thing end, but I think that's where we are.
 
With how little of a change there is with the Super refresh, if anyone has the card as well as Photoshop, could you try the card with Adobe Camera Raw's AI noise reduction?

Adobe usually has issues with supporting newer cards when it comes to fully utilizing the GPU for acceleration.

For the noise reduction, when Adobe's GPU acceleration craps out, it goes from taking a few seconds to something like 5 minutes.

Wondering if their habit of breaking acceleration will apply this time around.
 
Try looking at it this way: you're not losing anything. GPU perf/$ is better than it's ever been. It just won't improve at the same rate as before. That sense of loss you're feeling is because GPUs got a "free ride" on the Moore's Law bandwagon, and that ride is finally slowing down. It's sad to see a good thing end, but I think that's where we are.
When you put it that way, it makes a lot of sense. Well said!
 
With how little of a change there is with the Super refresh, if anyone has the card as well as Photoshop, could you try the card with Adobe Camera Raw's AI noise reduction?

Adobe usually has issues with supporting newer cards when it comes to fully utilizing the GPU for acceleration.

For the noise reduction, when Adobe's GPU acceleration craps out, it goes from taking a few seconds to something like 5 minutes.

Wondering if their habit of breaking acceleration will apply this time around.
Send me a PM with exactly what you want me to do. This isn't a new architecture so presumably it should be fine — it's usually stuff like when the 40-series first launched in 2022 that causes issues with apps from Adobe. But I do have Photoshop... and I never use Adobe Camera RAW stuff, so I'm not sure what needs to be done. I probably need some massive resolution file to use it on as well?
 
Agreed on the issues with a new launch. Photoshop and its related tools usually run into some issues with entirely new launches. In the past, when they added support for 3D models, a new generation of GPU would lead to stability issues with 3D models until Adobe could release an update. Later, when they added their "AI" features, that trend made a comeback, but instead of stability issues (things would remain fully stable), the behavior shifted from 100% GPU usage on the old card, with the AI features processing a raw file within a few seconds, to 15-20% usage on the new card and 10+ minutes per file.

Since Adobe has not updated from 16.1.1 yet, one silver lining of the RTX 40XX Super launch is that the changes are so small that users potentially won't have to wait for Adobe to release another update to restore full GPU acceleration, in software that is extremely picky about which cards it will fully accelerate. 😀

While for most people it will not be an issue whether it offers full acceleration for AI or not, if someone uses a camera like a Sony a7R V and needs to batch process a bunch of raw files above a certain ISO, then that can mean the difference between whether such a feature can be used in a practical sense or not.
Sample raw file if anyone wants to see Photoshop/ ACR take 15 minutes per image. https://www.dpreview.com/sample-galleries/0641980724/sony-a7r-v-sample-gallery/5943306850
 
This is not just about memory capacity, it's also about the additional 33% bandwidth and 33% larger L2 cache. Basically, we got 10% more compute and 33% more memory capacity, bandwidth, cache. I was expecting the general trend to be closer to 4080 than the 4070 Ti because of that, and in most of the tests that didn't happen — it's closer to the 4070 Ti.
Well you keep repeating this argument both in your review and in the comments but this is just not true...
Despite being an AD103 chip, the 4070 Ti Super doesn't have the 64MB of L2 cache the 4080/4080S have.
Nvidia cut it down to 48MB to match the AD104 4070 Ti and 4070S while keeping a considerable performance gap with the 4080/4080S.
This L2 cache downgrade is what explains the underwhelming bump in performance relative to the 4070 Ti non-Super and why the 4080s perform so much better. Bandwidth alone doesn't make that much of a difference.
Likewise the huge improvement of the 4070S over the 4070 comes mostly from an increase of the L2 cache (36MB to 48MB).
 
Well you keep repeating this argument both in your review and in the comments but this is just not true...
Despite being an AD103 chip, the 4070 Ti Super doesn't have the 64MB of L2 cache the 4080/4080S have.
Nvidia cut it down to 48MB to match the AD104 4070 Ti and 4070S while keeping a considerable performance gap with the 4080/4080S.
This L2 cache downgrade is what explains the underwhelming bump in performance relative to the 4070 Ti non-Super and why the 4080s perform so much better. Bandwidth alone doesn't make that much of a difference.
Likewise the huge improvement of the 4070S over the 4070 comes mostly from an increase of the L2 cache (36MB to 48MB).
Sorry, there was an issue in reported L2 cache size on the initial Nvidia documentation, and there were also four GPUs launched in a three week period. It appears that I made a mistake somewhere — the RTX 4070 Super has 48MB L2 cache (more than the 36MB on the non-Super), but the 4070 Ti Super is indeed 48MB L2 cache as you note. So it's 8x6MB rather than the maximum 8x8MB, and ends up the same as the 6x8MB that was on AD104. I'll see about correcting the text.

Keep in mind that L2 cache size is ultimately a "paper spec" item, which is why I don't worry too much about it. It can impact performance, but it's not something that is generally directly measurable unless you run some low-level test workloads aimed at determining cache size. Ultimately, performance is what matters, and all indications are that it's a lack of compute holding back the 4070 Ti Super, not a lack of real or effective bandwidth.
 
Sorry, there was an issue in reported L2 cache size on the initial Nvidia documentation, and there were also four GPUs launched in a three week period. It appears that I made a mistake somewhere — the RTX 4070 Super has 48MB L2 cache (more than the 36MB on the non-Super), but the 4070 Ti Super is indeed 48MB L2 cache as you note. So it's 8x6MB rather than the maximum 8x8MB, and ends up the same as the 6x8MB that was on AD104. I'll see about correcting the text.

Keep in mind that L2 cache size is ultimately a "paper spec" item, which is why I don't worry too much about it. It can impact performance, but it's not something that is generally directly measurable unless you run some low-level test workloads aimed at determining cache size. Ultimately, performance is what matters, and all indications are that it's a lack of compute holding back the 4070 Ti Super, not a lack of real or effective bandwidth.
That was indeed a frenzied month for hardware reviewers, I totally get that, and Nvidia did not make this any easier with the card specs all over the place^^
Thanks for addressing the cache size information.
I think the 48MB has a huge impact on the overall performance of the card. Nvidia did this purposely to keep a clear segmentation between the card tiers, justifying the higher price tag of the 4080S.
Giving the 4070 Ti Super the same 64MB of L2 cache would have brought the 2 cards way too close, hurting the sales of the bigger one^^
 
I think the 48MB has a huge impact on the overall performance of the card. Nvidia did this purposely to keep a clear segmentation between the card tiers, justifying the higher price tag of the 4080S. Giving the 4070 Ti Super the same 64MB of L2 cache would have brought the 2 cards way too close, hurting the sales of the bigger one.
I simply don't believe this is true — that the cache size was done for market segmentation. Or at least, not that it had a significant overall impact on the final performance. Nvidia absolutely wants market segmentation, but it did that with SMs and core counts more than cache sizes on the 192-bit and higher interface width GPUs.

Larger L2 caches increase cache hit rates, thus increasing effective bandwidth. The 16GB on a 256-bit interface already improved the base bandwidth by 33%. An extra 16MB of L2 cache might improve the hit rates something like 5~10 percent, which would improve effective bandwidth by 40~46 percent over the 4070 Ti non-Super. It simply doesn't need that bandwidth. It also has the same 112 ROPs as the 4080/4080 Super, which is 40% more than the 4070 Ti, so it's not ROPs holding back the 4070 Ti Super.
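As a rough sanity check of that arithmetic, here's the back-of-envelope version, treating the hit-rate improvement as a multiplicative factor on effective bandwidth and assuming the published 21 Gbps GDDR6X speed for both cards:

```python
# Back-of-envelope version of the effective-bandwidth argument above.
def raw_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    return bus_width_bits / 8 * data_rate_gbps

bw_4070ti       = raw_bandwidth_gbs(192, 21)  # ~504 GB/s
bw_4070ti_super = raw_bandwidth_gbs(256, 21)  # ~672 GB/s
raw_gain = bw_4070ti_super / bw_4070ti        # ~1.33x

# If a hypothetical extra 16MB of L2 added another 5-10% via higher hit rates:
for cache_factor in (1.05, 1.10):
    print(f"effective bandwidth gain: {raw_gain * cache_factor:.2f}x")
# -> roughly 1.40x to 1.47x, in line with the 40~46 percent figure above
```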

The reality is that the 4070 Ti Super has 66 SMs, the 4080 has 76 SMs (15% more), and the 4080 Super has 80 SMs (21% more). That translates almost directly into more compute. The Ada architecture is very much compute limited in most gaming workloads. It needs sufficient memory bandwidth, and the 128-bit interface on the 4060 Ti likely holds that card back a bit, but 4070 and above don't appear to be seriously bandwidth (or effective bandwidth) limited.
 
I simply don't believe this is true — that the cache size was done for market segmentation. Nvidia absolutely wants market segmentation, but it did that with SMs and core counts more than cache sizes. Or at least, it did that on the 192-bit and higher interface width GPUs.

Larger L2 caches increase cache hit rates, thus increasing effective bandwidth. The 16GB on a 256-bit interface already improved the base bandwidth by 33%. An extra 16MB of L2 cache might improve the hit rates something like 5~10 percent, which would improve effective bandwidth by 40~46 percent over the 4070 Ti non-Super. It simply doesn't need that bandwidth. It also has the same 112 ROPs as the 4080/4080 Super, which is 40% more than the 4070 Ti, so it's not ROPs holding back the 4070 Ti Super.

The reality is that the 4070 Ti Super has 66 SMs, the 4080 has 76 SMs (15% more), and the 4080 Super has 80 SMs (21% more). That translates almost directly into more compute. The Ada architecture is very much compute limited in most gaming workloads. It needs sufficient memory bandwidth, and the 128-bit interface on the 4060 Ti likely holds that card back a bit, but 4070 and above don't appear to be seriously bandwidth (or effective bandwidth) limited.
That makes sense^^
Thank you for the insight!
 
Current price, maybe. Not the launch price, though. In 14 months, I expect yield and pricing of these wafers both improved, especially if you consider when they would've actually purchased said wafer capacity (i.e. the initial shipments used capacity bought during the "chip crunch", while current wafers were probably reserved during the PC slump of the past year).
Why not?

They had AD103 dies at launch which didn't pass the binning and would have been fully trashed otherwise.

What they are doing is akin to a biscuit company picking up all the crumbled biscuits from the production line, grinding them up, and selling a package of ground biscuits at almost the same price as a package of whole, unbroken biscuits.
 
Why not?

They had AD103 dies at launch which didn't pass the binning and would have been fully trashed otherwise.

What they are doing is akin to a biscuit company picking up all the crumbled biscuits from the production line, grinding them up, and selling a package of ground biscuits at almost the same price as a package of whole, unbroken biscuits.
That's a flawed analogy, because it suggests the "bad" chips are actually less than they purport to be. A cookie that's made from crushed up pieces of other cookies wouldn't have the same texture or consistency.

This is more like a production line having some cars that can hit 160 MPH and others that have imperfections that limit the engine to 150 MPH. (Feel free to substitute horsepower or RPMs or some other metric.) The point is that the harvested chips are tested and determined to be unable to meet the higher requirements.

RTX 4070 Ti used a fully functional AD104 chip, with 60 SMs. If even one SM failed to run properly, it couldn't be a 4070 Ti. The same for the RTX 4080 Super now: It has all 84 SMs of AD103. If one SM or more is defective, it has to be downbinned to an RTX 4080. Except that's discontinued, so it will be an RTX 4070 Ti Super now, or some other AD103-based product.

There are probably data center GPUs that use slightly downbinned AD103, AD104, etc. as well, that just aren't listed in the normal places. Nvidia has probably half a dozen or more bins for each Ada GPU that determine where it can be used.

Now, if yields are all perfect, yes, some chips will just have functional bits fused off so they can be sold as lower-tier products. If you look at RTX 4070 as an example, which only has 46 of the potential 60 SMs enabled, I'm certain a lot of the chips have more than 46 SMs that are perfectly fine. So right now it's something like this:

AD104 binning:
Chips with all 60 SMs working go in the RTX 4500 Ada Generation pile. (Formerly also RTX 4070 Ti.)
58+ SMs goes into the L4 GPU accelerator bin with 58 SMs, or alternatively the RTX 4080 Laptop GPU bin.
56+ SMs go in the 4070 Super bin and are sold with 56 enabled SMs.
48+ SMs can be used in either RTX 4000 Ada Generation or the SFF variant, obviously catching a lot of more functional chips.
46+ SMs working can be RTX 4070, so there's overlap with the line above.
40+ SMs are required for RTX 3500 Ada Generation.

I suspect that probably accounts for 99% of the chips that come from an AD104 wafer. There will probably always be 1% of chips that are just too flawed to be useful, but all of the above require either all 192-bits of memory interface, or at least 160-bits. So if a chip has two defective memory channels, it would need to be a "special edition" RTX 4060 Ti or similar — and we'll probably see those at some point.
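If it helps to see that cascade end to end, here's a toy sketch of the harvesting logic. The SM thresholds and product names are the ones listed above; the sample die values are made up:

```python
# Toy model of the AD104 salvage cascade described above; thresholds and
# product names come from the list in this post, the sample dies are made up.
AD104_BINS = [  # (minimum working SMs, product)
    (60, "RTX 4500 Ada Generation"),
    (58, "L4 accelerator / RTX 4080 Laptop GPU"),
    (56, "RTX 4070 Super"),
    (48, "RTX 4000 Ada Generation (incl. SFF)"),
    (46, "RTX 4070"),
    (40, "RTX 3500 Ada Generation"),
]

def assign_bin(working_sms):
    """Place a die in the highest bin whose SM requirement it meets."""
    for min_sms, product in AD104_BINS:
        if working_sms >= min_sms:
            return product
    return "scrap or 'special edition' salvage"

for sms in (60, 57, 50, 45, 38):  # hypothetical dies off one wafer
    print(f"{sms} working SMs -> {assign_bin(sms)}")
```

In practice a die that qualifies for a higher bin can still be fused down to fill demand for a lower tier, which is the "functional bits fused off" case mentioned above.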
 
That's a flawed analogy, because it suggests the "bad" chips are actually less than they purport to be.
The analogy is good because it does explain what is going on.

1. You make dies for a full product
2. During binning you find that some percentage of those dies have 10% of parts non-functional
3. Instead of discarding them, you invent another product which has 10% fewer parts, effectively redefining what a "full product" means
4. You repeat this process with other dies which have even fewer functional parts, creating even more "full products"

That is all nice from the efficiency, ecology, waste management, ease of manufacturing, etc... perspective.

The problem I have with this process is that the prices of those lesser products (because let's face it, they are refuse compared to the fully functional die) don't seem to reflect that at all. They are just charging as much as possible at this point.
 
The analogy is good because it does explain what is going on.

1. You make dies for a full product
2. During binning you find that some percentage of those dies have 10% of parts non-functional
3. Instead of discarding them, you invent another product which has 10% fewer parts, effectively redefining what a "full product" means
4. You repeat this process with other dies which have even fewer functional parts, creating even more "full products"

That is all nice from the efficiency, ecology, waste management, ease of manufacturing, etc... perspective.

The problem I have with this process is that the prices of those lesser products (because let's face it, they are refuse compared to the fully functional die) don't seem to reflect that at all. They are just charging as much as possible at this point.
Mashing them up and putting them back together isn't what happens, though, not even remotely. And the downbinned chips are simply part of the design. You build in redundancies as well as the ability to fuse off parts of the chip so that you can actually make use of nearly all the dies that come from a wafer. To do otherwise would be wasteful and a poor approach to modern design.

Also, some chips will "fail" because they just need more of a lower tier — so they don't actually fail. Parts of some chips might also fail due to the voltages needed for the highest clocks. So a chip can be fully functional, but if it needs 1.3V to hit the desired 2.5 GHz or whatever while other chips can do that with 1.2V, they can turn off the bits that need more voltage.

Calling a chip "refuse" just because it's not fully enabled is, IMO, stupid. I guess all RTX 4090 cards are "refuse" because they only have 128 out of 144 SMs enabled? And all H100 and A100 chips are also "refuse" because they're not fully enabled?

I feel people are getting what the spec sheet says, nothing more, nothing less. With the cost of modern chips, it's simply the smart way of doing things. You don't buy a medium pizza and then complain that you didn't get a large pizza.